Skip to main content Skip to footer

How to Programmatically Parse Formulas in Excel Using C#

Formulas are the heart of Excel and are an integral part of creating an Excel file. Each formula in Excel is represented by a semantic model, letting users create or modify a formula.

Wouldn't it be interesting to get a hold of this semantic model to play around with the formula? This is possible by parsing a formula and generating its syntax tree to represent the semantic model.

Document Solutions for Excel (DsExcel, previously GcExcel) supports the Formula Parsing feature, which allows parsing of formulas to generate a syntax tree representing the semantic model of the formulas and the ability to format and modify the formulas.

It even creates a syntax tree to generate a formula. For example, refer to the following image, which depicts a formula syntax tree generated for a formula:

formula

GrapeCity.Documents.Excel.Expressions namespace has been added to DsExcel API to provide a formula parsing feature. The FormulaSyntaxTree class is the entry of formula expressions API.

The Formula Parser topic in the documentation provides a deep dive into the technical details. The DsExcel demos showcase all the operations that can be performed by parsing a formula.

The steps to parse an existing formula in an Excel spreadsheet and modify it as per the requirements in C# are as follows:

  1. Initialize Workbook with Sample Data
  2. Extract Formula
  3. Parse a Formula
  4. Modify the Formula
  5. Save Excel File

Ready to Try it Out? Download Document Solutions for Excel, .NET Edition, Today!

Use Case

Creating a sales report using Excel is a widespread scenario. In this blog, we will be making a straightforward sales report using the formula parsing feature offered in the Document Solutions for Excel, .NET Edition API.

The snapshot below has been captured from an Excel file. The left displays raw sales data consisting of the sales representative's name, region, product, and units sold.

The right shows the sales analysis result corresponding to a specific sales representative who has been extracted from raw data and progress toward monthly sales targets in each product region combination. These targets are derived by categorizing the Units Sold field values into the following three brackets:

  • Less than 2500: Below Target
  • More than 3000: On Target
  • More than 5000: Above Target

use case

This sales analysis on the right has been made using an Excel formula, which combines different Excel functions to get the expected results. The formula makes use of IF, ISNUMBER, and FILTER functions, as described below:

=IF(ISNUMBER(FILTER(A2:D19,A2:A19="Fritz")),IFS(FILTER(A2:D19,A2:A19="Fritz")>5000,"Above Target",FILTER(A2:D19,A2:A19="Fritz")>3000,"On Target",FILTER(A2:D19,A2:A19="Fritz")<2500,"Below Target"),FILTER(A2:D19,A2:A19="Fritz"))

A similar sales analysis result for each sales representative is added to the Excel file to complete the sales report. This is accomplished by modifying the above formula. We need to replace the sales representative's name in the above formula with the name of another sales representative whose sales analysis is required.

This is how the completed sales report looks after adding sales analysis results for each representative along with data formatting:

sales

To make such a modification to the formula, it is imperative to know where the name variable for the sales representatives is located in the formula. If one were to do this manually, it would be a tedious task and prone to errors.

This task can be simplified by parsing the formula and using the parsed syntax tree to replace the sales representative name easily.

This blog will learn how to accomplish formula parsing and modification using Document Solutions for Excel and C#. The steps ahead will guide you on using DsExcel API to parse the above-described sales analysis formula and modify it to get the expected result using C#.

Parse and Modify Excel Formula Using C#

To begin with, create a new C# console application (.NET Core) and install GrapeCity Documents for Excel package using NuGet Package Manager to get started and follow the steps ahead. Refer to DsExcel's Quick Start documentation for more details.

Note: Document Solutions was previously know as GrapeCity Documents, the older product name currently remains on our NuGet packages.

1: Initialize Workbook with sample data

Instantiate an instance of the Workbook class and import the sample data from the Excel file, as shown below. The code below even enables Dynamic Array support in DsExcel by setting the AllowDynamicArray property of the Workbook class to true. This is required as the formula used for calculation uses the FILTER function, a dynamic array function.

//Create a new workbook
var workbook = new GrapeCity.Documents.Excel.Workbook();           

//Load sample data from excel file
workbook.Open("SampleData.xlsx");

//Enable dynamic array formula
workbook.AllowDynamicArray = true;

2: Extract Formula

After the Workbook is loaded with sample data and the expected formula, we extract the required formula from the worksheet for parsing and modification using the Formula property.

The Formula Parser provided by DsExcel API expects the formula to be passed without the '=' (equal to) operator for successful formula parsing. Therefore, take note of how the formula has been extracted below without the "=" operator.

//Fetch worksheet
var worksheet = workbook.Worksheets[0];

//Fetch the original formula which needs to be parsed.
var originalFormula = worksheet.Range["H3"].Formula.Substring(1);

3: Parse a Formula

Invoke the Parse method of FormulaSynatxTree class to parse a formula and generate a syntax tree that helps you understand all different types of values, operators, and functions the formula contains.

Each of these tokens of the formula syntax tree is represented by other classes in DsExcel API, such as FunctionNode for a function, OperatorNode for an operator, etc.

Refer to the following documentation, which lists all the available classes for Formula Parser.

The code below parses the sales analysis formula extracted in the last step. It then appends the values from the generated FormulaSyntaxTree to a workbook, which is later saved as an Excel file to help you understand the syntax tree of the formula.

//Method to parse a formula and print the syntax tree
public static void ParseAndPrint(IWorksheet worksheet, string formula)
{
   // Get syntax tree
   var syntaxTree = FormulaSyntaxTree.Parse(formula);

   // Flatten nodes
   var displayItems = new List<(string TypeName, int IndentLevel, string Content)>();

   void flatten(SyntaxNode node, int level)
   {
      displayItems.Add((node.GetType().Name, level, node.ToString()));
      foreach (var child in node.Children)
      {
         flatten(child, level + 1);
      }
   }

   flatten(syntaxTree.Root, 0);

   // Output          
   worksheet.ShowRowOutline = false;
   worksheet.OutlineColumn.ColumnIndex = 1;

   // Header
   worksheet.Range["A1"].Value = "Formula";
   worksheet.Range["A3"].Value = "Syntax node";
   worksheet.Range["B3"].Value = "Part";

   // Values
   worksheet.Range["B1"].Value = "'=" + formula;
   for (var i = 0; i < displayItems.Count; i++)
   {
      var item = displayItems[i];
      var text = "'" + item.TypeName;

      worksheet.Range[i + 4, 0].Value = text;
      worksheet.Range[i + 4, 0].IndentLevel = item.IndentLevel;
      worksheet.Range[i + 4, 1].Value = "'" + item.Content;
   }

   //Apply styling
   worksheet.Range["A1:B3"].Interior.Color = System.Drawing.Color.FromArgb(68, 114, 196);
   worksheet.Range["A1:B3"].Font.Color = System.Drawing.Color.White;
   worksheet.Range["A1:B3"].Borders.Color = System.Drawing.Color.FromArgb(91, 155, 213);
   worksheet.Range["A1:B3"].Borders.LineStyle = BorderLineStyle.Thin;
   worksheet.Range["A1,A3,B3"].Font.Size = 14;
   worksheet.Range["A1,A3,B3"].Font.Bold = true;
   worksheet.Range["A:C"].EntireColumn.AutoFit();           
}

Here is a quick view of the generated FormulaSyntaxTree. Please note this is just a part of the complete syntax tree:

syntax

4: Modify the formula

From the syntax tree generated in the last step, you can see the sales representative name is represented as a TextNode and has multiple occurrences in the formula. We can replace all these occurrences with a simple find and replace operation as shown in the code below:

  1. To replace the sales representative names in the formula, we start with a list of their names. We filter out the list of unique names from raw data by using the UNIQUE function. And then use the result of this UNIQUE function to parse and modify the sales analysis formula for all sales representatives.
  2. We modify the sales representative name using TextNode class. The code below initializes an instance of TextNode class, passing the sales representative name to be searched for in the formula as a parameter. This instance can be termed as the Find Node.
  3. Next, we initialize another instance of TextNode class, passing the sales representative name to be replaced in the formula as a parameter. This instance can be termed as the Replace Node.
  4. A recursive function replaceNode has been defined in the code below to traverse all the child nodes of the syntax tree and replace each occurrence of Find node with Replace node. This action is repeated for each sales representative.
  5. Once the formula has been modified, the new formula is assigned to a cell in the Worksheet to generate the expected sales report.

The code below includes some formatting code to format the sales report content.

//Method to parse and modify the formula
public static void ModifyFormula(IWorksheet worksheet, string originalFormula)
{
    //Apply UNIQUE formula to get unique sales representatives list
    worksheet.Range["F1"].Value = "Unique Rep";
    worksheet.Range["F2"].Formula = "=UNIQUE(A2:A19)";
    var uniqueRep = worksheet.Range["F2#"];
    // Apply Styling
    worksheet.Range["F:F"].EntireColumn.AutoFit();
    worksheet.Range["F1"].Interior.Color = System.Drawing.Color.FromArgb(68, 114, 196);
    worksheet.Range["F1"].Font.Color = System.Drawing.Color.White;
    worksheet.Range["F2#"].Borders.Color = System.Drawing.Color.FromArgb(91, 155, 213);
    worksheet.Range["F2#"].Borders.LineStyle = BorderLineStyle.Thin;

    //Get syntax tree
    var syntaxTree = FormulaSyntaxTree.Parse(originalFormula);

    //Find
    var findText = new TextNode("Fritz");

    //Replacement
    var replaceText = new TextNode("");

    //Loop through names list to modify the formula for each sales representative
    for (int r = 0, resultRow = 3; r < uniqueRep.Cells.Count; r++, resultRow = resultRow + 4)
    {
       //Get name to be replaced in the formula
       var cval = uniqueRep.Cells[r].Value.ToString();

       if (findText.Value != cval)
       {
          //Assign name to be replaced to Replace TextNode
          replaceText.Value = cval;

          //Invoke the recursive method to perform find and replace operation
          replaceNode(syntaxTree.Root, findText, replaceText);

          //Assign the modified formula to a cell in the worksheet
          var resultRange = "H" + resultRow.ToString();
          worksheet.Range[resultRange].Formula = "=" + syntaxTree.ToString();
          worksheet.Range[resultRange + "#"].Borders.Color = System.Drawing.Color.FromArgb(91, 155, 213);
          worksheet.Range[resultRange + "#"].Borders.LineStyle = BorderLineStyle.Thin;

          //Update the value of Find node to perform find and replace operation for next sales representative name
          findText = replaceText;
       }
    }

    //Find and replace
    void replaceNode(SyntaxNode lookIn, SyntaxNode find, SyntaxNode replacement)
    {
       var children = lookIn.Children;

       for (var i = 0; i < children.Count; i++)
       {
          var child = children[i];
          if (child.Equals(find))
          {
             children[i] = replacement;
          }
          else
          {
             replaceNode(child, find, replacement);
          }
       }
    }
 }

Here is one of the modified formulas:

=IF(ISNUMBER(FILTER(A2:D19,A2:A19="Xi")),IFS(FILTER(A2:D19,A2:A19="Xi")>5000,"Above Target",FILTER(A2:D19,A2:A19="Xi")>3000,"On Target",FILTER(A2:D19,A2:A19="Xi")<2500,"Below Target"),FILTER(A2:D19,A2:A19="Xi"))

5: Save Excel File

Once all the modified formulas have been added to the worksheet, the Save method of the Workbook class is invoked to save the Excel file as depicted in the code below:

//Save modified Excel file
workbook.Save("ModifiedFormula.xlsx", SaveFileFormat.Xlsx);

Open the saved Excel file to observe the following output:

excel

Download the sample to implement the aforementioned scenario and receive a deep understanding of the concept.

Refer to documentation and demos for more details.

Download a trial of Document Solutions for Excel, .NET Edition, Today!


Tags:

comments powered by Disqus