TextParser Library
    Using Regular Expressions

    The TextParser library provides the StartsAfterContinuesUntil class, an extractor that retrieves blocks of text from a plain text source based on regular expressions. It pulls out all the text placed between occurrences of a starting (StartsAfter) pattern and an ending (ContinuesUntil) pattern.
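    To make the StartsAfter/ContinuesUntil idea concrete, here is a minimal sketch of the technique in plain .NET using only System.Text.RegularExpressions. The ExtractBetween helper is hypothetical and is not part of the TextParser API. It assumes that each block ends at the first ContinuesUntil match after its StartsAfter match and that the delimiter text itself is excluded; the library's exact rules may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of TextParser): collects the text that starts
// right after each match of `startsAfter` and runs up to, but not including,
// the next match of `continuesUntil`. Delimiter matches are excluded.
static List<string> ExtractBetween(string text, string startsAfter, string continuesUntil)
{
    var results = new List<string>();
    var start = new Regex(startsAfter);
    var end = new Regex(continuesUntil);
    int pos = 0;

    while (pos <= text.Length)
    {
        Match s = start.Match(text, pos);
        if (!s.Success) break;                 // no more starting matches

        int blockStart = s.Index + s.Length;   // the block begins after the start match
        Match e = end.Match(text, blockStart);
        if (!e.Success) break;                 // assumption: an unterminated block is dropped

        results.Add(text.Substring(blockStart, e.Index - blockStart));

        // Resume after the ending match; always advance to avoid looping on empty matches.
        pos = Math.Max(e.Index + e.Length, pos + 1);
    }
    return results;
}

// Usage: "[" marks the start of each block and "]" marks its end.
List<string> blocks = ExtractBetween("[a]x[b]y[c]", @"\[", @"\]");
Console.WriteLine(string.Join(",", blocks)); // prints a,b,c
```

    Scanning with Regex.Match(text, startat) keeps the two patterns independent, which mirrors how the class takes them as two separate constructor arguments.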

    Extraction using regular expressions is the simplest technique and the easiest to get started with; however, keep the following facts about it in mind:


    Let us take an example to understand how to use TextParser with regular expressions.

    The following text, saved as JapanTouristSpots.txt, is the input source:

    Let’s look at the latest ranking of two best tourist spots in Japan.

    1. Fushimi Inari-taisha Shrine (Kyoto)

    Fushimi Inari Taisha Shrine fascinates the tourists most by numerous gates.

    2. Hiroshima Peace Memorial Museum (Hiroshima)

    Hiroshima Peace Memorial Museum is one of top visited tourist sites.

     

    Now, from this input we want to retrieve only the names of the tourist spots listed above. To extract the text between the StartsAfter and ContinuesUntil regular expressions with the StartsAfterContinuesUntil class, follow these steps:

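    1. Open the plain text input stream from which you want to extract the text.
      // Requires: using System.IO;
      // OpenRead opens the file read-only; the using declaration disposes the stream at the end of the scope.
      using Stream inputStream = File.OpenRead("JapanTouristSpots.txt");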
      
    2. Create an instance of the StartsAfterContinuesUntil class, passing the starting and ending regular expression patterns to its constructor. The extractor then returns all the text between matches of the two expressions.
       // StartsAfter: a list number from 1 to 10, followed by "." and a whitespace character.
       // ContinuesUntil: "\r", a carriage return (the end of a line in a file with CRLF line endings).
       StartsAfterContinuesUntil extractor = new StartsAfterContinuesUntil(@"([1-9]|10)[.]\s", "\r");

    3. The Extract method of the StartsAfterContinuesUntil class extracts the text from the input stream and returns an instance of the IExtractionResult interface containing the extraction results.
      IExtractionResult res = extractor.Extract(inputStream);
      
    4. The ToJsonString method of the IExtractionResult interface converts the extraction result to a JSON string.
      Console.WriteLine(res.ToJsonString());
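    The following is not the TextParser API. It is a rough approximation in plain .NET of which text the patterns from step 2 pick out of the sample input, assuming the library interprets them the way System.Text.RegularExpressions does and that JapanTouristSpots.txt uses CRLF line endings. The two patterns are folded into one regex: a lookbehind stands in for StartsAfter and a lookahead for ContinuesUntil.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The sample input above, assuming Windows (CRLF) line endings.
string input =
    "Let's look at the latest ranking of two best tourist spots in Japan.\r\n\r\n" +
    "1. Fushimi Inari-taisha Shrine (Kyoto)\r\n\r\n" +
    "Fushimi Inari Taisha Shrine fascinates the tourists most by numerous gates.\r\n\r\n" +
    "2. Hiroshima Peace Memorial Museum (Hiroshima)\r\n\r\n" +
    "Hiroshima Peace Memorial Museum is one of top visited tourist sites.\r\n";

// Same two patterns as in step 2: the lookbehind plays the StartsAfter role,
// the lookahead plays the ContinuesUntil role, and ".*?" is the extracted text.
var approx = new Regex(@"(?<=([1-9]|10)[.]\s).*?(?=\r)");

string[] names = approx.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
foreach (string name in names)
    Console.WriteLine(name);
// Fushimi Inari-taisha Shrine (Kyoto)
// Hiroshima Peace Memorial Museum (Hiroshima)

// With LF-only line endings there is no "\r" to stop at, so nothing matches.
int lfOnlyCount = approx.Matches(input.Replace("\r\n", "\n")).Count;
Console.WriteLine(lfOnlyCount); // 0
```

    The last two lines show why the "\r" ending pattern depends on the file's line endings; a pattern such as "\r?\n" would also match LF-only files, if the library accepts it.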
      

    The following image shows the parsed result in JSON string format:

    Parsed Result