How to Convert Markdown (.md) files to Word (.docx) Using .NET C#

What is Markdown?

Markdown is a lightweight markup language used to create formatted text with any basic text editor. It has gained popularity because it is easy to learn, read, and write. It works differently from WYSIWYG editors such as Word, where you apply formatting by clicking buttons and instantly see the formatted result; in Markdown, formatting is expressed with plain-text syntax. The language is also popular for formatting web pages, as the user need not memorize many tags to format the page.

Despite these advantages, many still consider the Word format the standard for file sharing and long-term archival. Several document formatting features are also exclusive to Word, and Word is familiar to most developers and application users. As a result, users continue to look for ways to convert .md files to .docx files.

This blog discusses how to convert a markdown (.md) file to a Word (.docx) file using Document Solutions for Word API, previously GrapeCity Documents for Word API, and Markdig markdown parser in a C# application, following these steps:

  1. Configure the Application
  2. Define WordRenderer
  3. Define BlockRenderers
  4. Define InlineRenderers
  5. Convert .md file to Word

We will also learn to convert markdown to PDF, which lets us use the benefits and features of the PDF format (e.g., signing) on documents created in markdown.

Take total control of Word documents with the fastest Word API available for .NET 6. Download a FREE 30-Day Trial!

Use Case

Consider the scenario of converting a readme file shipped with a product to a Word file. This can help a user who is designing a public guide or writing an article on the subject. The converted file serves as a basis to which the user can add the remaining content, formatting, and images, instead of starting from scratch.

We will be working on converting the Document Solutions PDF Viewer readme file available as a .md file to a Word file. The screenshot below depicts the same:

[Screenshots: Markdown File (.md) and Word File (.docx)]

Convert Markdown (.md) to Word (.docx)

Step 1: Configure the Application

We will begin by creating a C# .NET Core console application and configuring it to use DsWord API, previously GcWord API, and Markdig by installing the following NuGet packages:

NuGet

The Markdig package will be used to parse the markdown content to generate a structured MarkdownDocument, enabling DsWord API to read each MarkdownObject and render it as an element in the Word document with the same formatting as defined in the .md file.

Next, we need to add three class files to the project: WordRenderer, BlockRenderers, and InlineRenderers. Together, they contain the classes that implement the conversion.

Follow the detailed steps to understand how to implement the classes consuming these packages to achieve the conversion.

Step 2: Define WordRenderer

Next, we will add a new class file named WordRenderer to the project and define two classes: WordRenderer and WordObjectRenderer.

The WordRenderer class defines the layout of the Word document and holds the converted document once all the elements and content from the .md file have been rendered. In addition, it provides helper methods that add elements such as headers, paragraphs, runs, hyperlinks, and lists to the Word document while the .md content is rendered.

The ToWord method defined in this class serves as the entry point of conversion and has two overloads, each serving a different purpose.

The first overload, GcWordDocument ToWord(string markdown, MarkdownPipeline pipeline = null), invokes the Parse method of the Markdig markdown parser to parse the content of a markdown file into an Abstract Syntax Tree (AST), represented by a MarkdownDocument, with the help of the MarkdownPipeline. The MarkdownDocument, which derives from ContainerBlock, holds a collection of Block elements and Inline elements that need to be rendered as Word elements to generate the Word document. Refer to the Markdig documentation to understand Markdig parsing and the Abstract Syntax Tree (AST) in detail.

The second overload, GcWordDocument ToWord(MarkdownDocument document), consumes the generated MarkdownDocument, fetches each element in the AST as a MarkdownObject, and passes it to a block renderer or an inline renderer, which generates the Word element corresponding to the MarkdownObject depending on its type, such as Heading, Paragraph, or List.
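The parsing step performed by the first overload can be tried with Markdig on its own. The snippet below is a minimal sketch, not the sample's code: the markdown text is made up for illustration, and the pipeline is built with UseAdvancedExtensions(), which is an assumption about what a converter might want rather than what WordRenderer actually configures.

```csharp
using System;
using Markdig;
using Markdig.Syntax;

// Illustrative markdown text (not taken from the sample).
const string markdown = "# Title\n\nSome *emphasized* text.\n\n- item one\n- item two\n";

// Build a pipeline; UseAdvancedExtensions() is an assumption made for this sketch.
MarkdownPipeline pipeline = new MarkdownPipelineBuilder()
    .UseAdvancedExtensions()
    .Build();

// Parse the text into the AST.
MarkdownDocument document = Markdown.Parse(markdown, pipeline);

// MarkdownDocument is a ContainerBlock, so its top-level blocks can be enumerated.
foreach (Block block in document)
{
    Console.WriteLine(block.GetType().Name);
}
```

For this input, the loop should list a heading block, a paragraph block, and a list block. These are the objects that the second overload hands over to the renderers.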

The WordObjectRenderer class is defined as an abstract class. It is meant to be inherited by the specific renderer classes, defined in the next steps, that render the different markdown objects to a Word document.

The code below depicts the implementation of the WordObjectRenderer class. Download the WordRenderer class file to understand the full implementation.

    // A base class for rendering Block and Inline Markdown objects to Word.    
    public abstract class WordObjectRenderer<TObject> : MarkdownObjectRenderer<WordRenderer, TObject> where TObject : MarkdownObject
    {
    }
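To see how this base-class pattern fits together with Markdig's renderer infrastructure, here is a small self-contained sketch that uses the same structure but collects heading text instead of producing Word elements. OutlineRenderer, OutlineObjectRenderer, HeadingOutlineRenderer, and LiteralOutlineRenderer are hypothetical names invented for this illustration; they are not part of the sample or of Markdig.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Markdig;
using Markdig.Renderers;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

var renderer = new OutlineRenderer();
renderer.Render(Markdown.Parse("# Intro\n\nBody text.\n\n## Details\n"));
foreach (string line in renderer.Lines)
    Console.WriteLine(line);

// Hypothetical stand-in for WordRenderer: collects an indented outline of headings.
public class OutlineRenderer : RendererBase
{
    public List<string> Lines { get; } = new List<string>();
    public StringBuilder Current { get; } = new StringBuilder();

    public OutlineRenderer()
    {
        // Register one object renderer per markdown object type we care about.
        ObjectRenderers.Add(new HeadingOutlineRenderer());
        ObjectRenderers.Add(new LiteralOutlineRenderer());
    }

    public override object Render(MarkdownObject markdownObject)
    {
        // Write dispatches to the registered renderer for the object's type;
        // containers without a renderer fall back to writing their children.
        Write(markdownObject);
        return Lines;
    }
}

// Same shape as WordObjectRenderer<TObject>, bound to the hypothetical renderer.
public abstract class OutlineObjectRenderer<TObject>
    : MarkdownObjectRenderer<OutlineRenderer, TObject> where TObject : MarkdownObject
{
}

public class HeadingOutlineRenderer : OutlineObjectRenderer<HeadingBlock>
{
    protected override void Write(OutlineRenderer renderer, HeadingBlock obj)
    {
        renderer.Current.Clear();
        if (obj.Inline != null)
            renderer.WriteChildren(obj.Inline); // visits the heading's inlines
        string indent = new string(' ', 2 * (obj.Level - 1));
        renderer.Lines.Add(indent + renderer.Current);
    }
}

public class LiteralOutlineRenderer : OutlineObjectRenderer<LiteralInline>
{
    protected override void Write(OutlineRenderer renderer, LiteralInline obj)
    {
        renderer.Current.Append(obj.Content.ToString());
    }
}
```

Because no renderer is registered for ParagraphBlock in this sketch, the paragraph is skipped and only the two headings are collected. A full converter such as WordRenderer registers a renderer for every block and inline type it needs to support.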

Step 3: Define BlockRenderers

Add another class file named BlockRenderers to define all the classes used to render the Block Elements of the MarkdownDocument, such as ListBlock, HeadingBlock, and ParagraphBlock, to the Word document with the help of the helper methods defined in the WordRenderer class.

The code below is an example of the implementation of one renderer class for the HeadingBlock element; for detailed implementation, refer to the BlockRenderers class file:

    // A Word renderer for a HeadingBlock.
    public class HeadingRenderer : WordObjectRenderer<HeadingBlock>
    {
        protected override void Write(WordRenderer renderer, HeadingBlock obj)
        {
            renderer.InHtmlBlock = false;
            int index = obj.Level - 1;
            Style style;
            if (index >= 0 && index < renderer.HeaderStyles.Length)
                style = renderer.HeaderStyles[index];
            else
                style = renderer.HeaderFallbackStyle;

            if (renderer.IsCurrBlockStyleDefault)
            {
                var prevBlockStyle = renderer.CurrBlockStyle;
                renderer.CurrBlockStyle = style;
                renderer.AddParagraph();
                renderer.WriteLeafInline(obj);
                renderer.CurrBlockStyle = prevBlockStyle;
            }
            else
            {
                var prevInlineStyle = renderer.CurrInlineStyle;
                var myInlineStyleProps = WordRenderer.InlineStyleProps.FromStyle(style);
                var myInlineStyle = myInlineStyleProps.MergeWith(prevInlineStyle);
                renderer.CurrInlineStyle = myInlineStyle;
                renderer.AddParagraph();
                renderer.WriteLeafInline(obj);
                renderer.CurrInlineStyle = prevInlineStyle;
            }
        }
    }

Step 4: Define InlineRenderers

Next, add another class file named InlineRenderers to define all the classes used to render the Inline Elements of the MarkdownDocument, such as CodeInline, LiteralInline, and LineBreakInline, to the Word document with the help of the helper methods defined in the WordRenderer class.

The code below depicts the implementation of one renderer class for the LiteralInline element; for detailed implementation, refer to the InlineRenderers class file:

    // A Word renderer for a LiteralInline.    
    public class LiteralInlineRenderer : WordObjectRenderer<LiteralInline>
    {
        protected override void Write(WordRenderer renderer, LiteralInline obj)
        {
            renderer.Write(ref obj.Content);
        }
    }
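The Content member passed by reference above is a Markdig StringSlice, a view over the original markdown text. The short sketch below, which uses an illustrative input string rather than anything from the sample, shows how the literal text can be read from the parsed inlines with ToString().

```csharp
using System;
using System.Collections.Generic;
using Markdig;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

// Illustrative input: one paragraph with plain and bold text.
MarkdownDocument doc = Markdown.Parse("Hello **bold** world");

// Collect the text of every LiteralInline found anywhere in the AST.
var literals = new List<string>();
foreach (LiteralInline literal in doc.Descendants<LiteralInline>())
{
    // Content is a StringSlice; ToString() copies the slice into a string.
    literals.Add(literal.Content.ToString());
}

foreach (string text in literals)
    Console.WriteLine($"[{text}]");
```

The bold marker itself does not appear in any literal; it is represented by an EmphasisInline that wraps the literal for "bold", which is why emphasis needs its own inline renderer.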

Step 5: Convert .md file to Word

Finally, define the CreateDocx method, which reads the contents of a .md file and invokes the ToWord method of the WordRenderer class to parse the markdown content and generate an Abstract Syntax Tree (AST), represented by a MarkdownDocument. This MarkdownDocument is then passed to the other overload of the ToWord method, which reads all the markdown objects in the generated AST and renders each one using the renderers defined in the BlockRenderers and InlineRenderers class files along with the helper methods defined in the WordRenderer class.

The code snippet below depicts this method. You can refer to the Md2Word.cs file in the sample for details:

    public GcWordDocument CreateDocx(string[] sampleParams)
    {
        // Path to access the .md file
        var fn = Path.Combine(sampleParams[3].Split('/'));

        // Read the contents of the .md file
        var markdown = File.ReadAllText(fn);

        // Invoke the ToWord method of the WordRenderer class to parse the .md file
        return WordRenderer.ToWord(markdown);
    }

Lastly, the method defined above is invoked from the Program.cs file, which also saves the generated Word file, as shown below:

    // Invoke the method to perform the .md to Word conversion
    GcWordDocument docx;
    docx = sample.CreateDocx(0);

    // Save the generated Word file
    docx.Save(docxName);

Convert .md file to PDF

We can convert a .md file to PDF by simply saving the Word (.docx) document generated above as a PDF file using the SaveAsPdf method of the GcWordLayout class. In the snippet below, wls and pdfs are the layout and PDF output settings objects defined in the sample:

    using (var layout = new GcWordLayout(docx, wls))
    {
        // Save the converted Word file to PDF
        layout.SaveAsPdf(pdfName, null, pdfs);
    }

Here is a quick view of the PDF file converted from the Word document generated above:

[Screenshot: the generated PDF file]

Download this sample as described above to dive into the details.

You can observe the sample in action here and explore other powerful features of Document Solutions for Word through demos and documentation.
