TextParser Library | ComponentOne
    Using XML Template

    The TextParser library provides the TemplateBasedExtractor class, which sets up a template-based extractor. This extractor parses a plain text document according to any user-defined structure format.

    The structure format is a template that is specified declaratively in XML. The plain text input can contain many instances of the defined template, and all text that matches the template's specification can be extracted from the input.

    This section helps you get started defining custom templates for the template-based extractor.

    Defining a template

    The template used for text extraction is defined formally with XML elements and their properties. The root of every XML template definition must be a template element. You can define the extraction either by setting properties on the template element or by nesting template elements to describe complex user-defined structures. The following are the different template structures for the text extraction process:

    Applying a template

    To extract text with the TemplateBasedExtractor class, follow these steps:

    1. Open the XML file that contains the user-defined template.
      Stream templateStream = File.Open(@"Template.xml", FileMode.Open);
    2. Create an instance of the TemplateBasedExtractor class, passing the stream that contains the user-defined XML template to its constructor.
      TemplateBasedExtractor extractor = new TemplateBasedExtractor(templateStream);
    3. Open the plain text input file from which you want to extract text.
      Stream inputStream = File.Open(@"Source.txt", FileMode.Open);
    4. Extract the text from the input using the Extract method of the TemplateBasedExtractor class. This method returns an IExtractionResult object that contains the extraction results.
      IExtractionResult result = extractor.Extract(inputStream);
    5. To convert the extraction result to a JSON string, call the ToJsonString method of the IExtractionResult interface.
      Console.WriteLine(result.ToJsonString());
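    The five steps above can be combined into one method, sketched below. It uses only the calls shown in the steps (the TemplateBasedExtractor constructor, Extract, and ToJsonString) and wraps both streams in using blocks so the files are closed after extraction. The namespace C1.TextParser, the helper name ExtractToJson, and the file names Template.xml and Source.txt are assumptions for illustration; check them against your project and the installed package.

```csharp
using System;
using System.IO;
using C1.TextParser; // assumed namespace for TemplateBasedExtractor and IExtractionResult

public static class Program
{
    // Hypothetical helper: runs the template-based extraction and returns the JSON result.
    public static string ExtractToJson(string templatePath, string inputPath)
    {
        // Both streams are disposed once extraction has finished.
        using (Stream templateStream = File.OpenRead(templatePath))
        using (Stream inputStream = File.OpenRead(inputPath))
        {
            // Build the extractor from the user-defined XML template.
            var extractor = new TemplateBasedExtractor(templateStream);

            // Extract every match of the template from the plain text input.
            IExtractionResult result = extractor.Extract(inputStream);

            // Serialize the extraction result to a JSON string.
            return result.ToJsonString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(ExtractToJson("Template.xml", "Source.txt"));
    }
}
```

    File.OpenRead is used instead of File.Open with FileMode.Open because it opens the file read-only, which is all the extractor needs.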

    After you define the XML template and apply it in code, the extraction result is available as a JSON string that you can use for further processing of the text extracted from the input source.
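    Because the result is plain JSON, any JSON library can consume it. The sketch below uses System.Text.Json (part of .NET Core 3.0 and later) to list every leaf value in the string returned by ToJsonString together with its path. The class name JsonFlattener and the sample JSON are hypothetical; the sample is not actual TextParser output, since the real shape of the JSON depends on the template you define.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class JsonFlattener
{
    // Recursively collects "path = value" lines for every leaf value in a JSON document.
    public static List<string> Flatten(string json)
    {
        var lines = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            Walk(doc.RootElement, "$", lines);
        }
        return lines;
    }

    private static void Walk(JsonElement element, string path, List<string> lines)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                // Descend into each named property.
                foreach (JsonProperty property in element.EnumerateObject())
                    Walk(property.Value, path + "." + property.Name, lines);
                break;
            case JsonValueKind.Array:
                // Descend into each array item, recording its index.
                int index = 0;
                foreach (JsonElement item in element.EnumerateArray())
                    Walk(item, path + "[" + index++ + "]", lines);
                break;
            default:
                // Strings, numbers, booleans and null are leaves; GetRawText keeps their JSON form.
                lines.Add(path + " = " + element.GetRawText());
                break;
        }
    }

    public static void Main()
    {
        // Hypothetical JSON for illustration only.
        string json = "{\"Order\":[{\"Id\":\"1001\"},{\"Id\":\"1002\"}]}";
        foreach (string line in Flatten(json))
            Console.WriteLine(line);
    }
}
```

    In practice you would pass the string returned by ToJsonString to Flatten, or deserialize it into your own types once you know the structure your template produces.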