TextParser Library
    Nested Template

    The parsing capabilities of the Template-Based extractor are not limited to simple templates. The extractor also allows the user to define complex structures such as nested templates, in which a template element contains other template elements as children. These children are defined using the "element" tag.

    To understand how nested templates are defined, let's take an example in which we want to extract an integer ("int") followed by a quoted string ("quotedString"), separated by a comma, with the whole structure delimited by "{" and "}". For the input source {7891, "hello"}, the following template definition matches:

    <template name="myTwoElementSequence" startingRegex="{" endingRegex="}" childrenSeparatorRegex="," childrenOrderMatter="true">
      <element name="myFirstElement" extractFormat="int"/>
      <element name="mySecondAndLastElement" extractFormat="quotedString"/>
    </template>
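To make the matching concrete, here is a rough Python emulation of what myTwoElementSequence describes. It is not how TextParser works internally. It simply hand-translates the template into one regular expression, assuming that white space around tokens is ignored and that a quotedString is a double-quoted string without escaped quotes. The function name match_two_element_sequence is hypothetical.

```python
import re

# Hand translation of the template (white space between tokens assumed ignored):
#   startingRegex="{"                      -> \{
#   myFirstElement, extractFormat="int"    -> (-?\d+)
#   childrenSeparatorRegex=","             -> ,
#   mySecondAndLastElement, "quotedString" -> "([^"]*)"
#   endingRegex="}"                        -> \}
TWO_ELEMENT_SEQUENCE = re.compile(r'\{\s*(-?\d+)\s*,\s*"([^"]*)"\s*\}')


def match_two_element_sequence(text):
    """Return (myFirstElement, mySecondAndLastElement) for every match in text."""
    return [
        (int(m.group(1)), m.group(2))
        for m in TWO_ELEMENT_SEQUENCE.finditer(text)
    ]


print(match_two_element_sequence('{7891, "hello"}'))  # [(7891, 'hello')]
```

Because childrenOrderMatter is "true" in this template, the sketch hard-codes the order: an input such as {"hello", 7891} produces no match.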
    

    The following properties of the "template" element apply when creating a template that consists of child elements; the first two are used in the example above:

    childrenSeparatorRegex
    Description: Defines the regular expression that must match between any two extracted child instances.
    Template definition: <template name="simpleEmailTemplate" startingRegex="{" childrenSeparatorRegex="," />

    childrenOrderMatter
    Description: Specifies whether the extracted child instances must follow the order in which the child elements are defined in the template specification. The default value of this property is "false".
    Template definition: <template name="simpleEmailTemplate" startingRegex="{" childrenOrderMatter="true" />

    ignoreWhiteSpaces
    Description: Controls whether white space in the input is ignored while matching. If not defined in a template, this property is "true" by default. However, if the template element is nested within another template element and does not define its own ignoreWhiteSpaces property, it inherits the value of this property from the parent template.
    Template definition: <template name="simpleEmailTemplate" startingRegex="{" ignoreWhiteSpaces="false" />
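The inheritance rule for ignoreWhiteSpaces amounts to walking up the template tree. The Python sketch below models it under stated assumptions: TemplateNode and effective_ignore_white_spaces are hypothetical names, not part of TextParser. "Inherit from the parent template" is read as "use the nearest enclosing template that defines the property, otherwise the default "true"".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemplateNode:
    """Hypothetical in-memory model of a template or element tag."""
    name: str
    ignore_white_spaces: Optional[bool] = None  # None means the attribute is not defined
    parent: Optional["TemplateNode"] = None


def effective_ignore_white_spaces(node: TemplateNode) -> bool:
    """Use the node's own value if set, otherwise the nearest ancestor's value,
    otherwise the default of True."""
    current = node
    while current is not None:
        if current.ignore_white_spaces is not None:
            return current.ignore_white_spaces
        current = current.parent
    return True


outer = TemplateNode("outer", ignore_white_spaces=False)
inner = TemplateNode("inner", parent=outer)
standalone = TemplateNode("standalone")
print(effective_ignore_white_spaces(inner))       # False (inherited from outer)
print(effective_ignore_white_spaces(standalone))  # True (default)
```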

    You can also search a text for multiple occurrences of the same child element. For example, to extract several integer values, you can either define a separate child element for each occurrence of an integer value or set the "occurs" property on the element tag defined for the integer value. This property defines how many occurrences of the child element must exist. Its value can be either a single integer or a range of integers.

    For example, suppose we want to extract sequences of 3 to 7 comma-separated integer values, each enclosed within "{" and "}", from the following input:

    {12 , 34} {65, 87, 34} {1, 34, 267,123} {123} {1, 23, 45, 67, 89, 12, 23}

    The template for this example can be defined as follows:

    <template name="myIntegerSequence" startingRegex="{" endingRegex="}" childrenSeparatorRegex="," childrenOrderMatter="true">
      <element name="myResult" extractFormat="int" occurs="3-7"/>
    </template>
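The occurs value is either a single integer or a range. The Python sketch below shows one way to interpret it, assuming a range is always written as "min-max" with both bounds inclusive, as in occurs="3-7". The helpers parse_occurs and occurs_satisfied are hypothetical, not library functions.

```python
def parse_occurs(value: str) -> tuple[int, int]:
    """Turn an occurs value such as "4" or "3-7" into an inclusive (min, max) pair.

    Assumption: ranges use the "min-max" form shown in this topic; any other
    syntax the library may accept is not handled here.
    """
    value = value.strip()
    if "-" in value:
        low, high = (int(part) for part in value.split("-", 1))
    else:
        low = high = int(value)
    if low < 0 or high < low:
        raise ValueError(f"invalid occurs value: {value!r}")
    return low, high


def occurs_satisfied(count: int, occurs: str) -> bool:
    """True if `count` extracted child instances satisfy the occurs constraint."""
    low, high = parse_occurs(occurs)
    return low <= count <= high
```

For instance, occurs_satisfied(2, "3-7") returns False, consistent with a two-integer group such as {12 , 34} not appearing in the extracted output.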
    

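    The output contains only the three sequences that have between 3 and 7 integers; {12 , 34} and {123} are skipped because they contain fewer than three values: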


    {
      "Extractor": "XMLTemplateBased",
      "Result": {
        "myIntegerSequence": [
          {
            "myResult": [65, 87, 34]
          },
          {
            "myResult": [1, 34, 267, 123]
          },
          {
            "myResult": [1, 23, 45, 67, 89, 12, 23]
          }
        ]
      }
    }
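The filtering behind this result can be approximated in plain Python. The sketch below is not the TextParser implementation. It assumes braces are never nested, white space around each integer is ignored, and a braced group is discarded unless every comma-separated part is an integer and the count falls within the occurs range. The function extract_integer_sequences and its low/high parameters are hypothetical.

```python
import re

INPUT = "{12 , 34} {65, 87, 34} {1, 34, 267,123} {123} {1, 23, 45, 67, 89, 12, 23}"

# startingRegex="{" and endingRegex="}" delimit each candidate sequence.
BRACED = re.compile(r"\{([^{}]*)\}")
# extractFormat="int", with surrounding white space ignored.
INT = re.compile(r"\s*(-?\d+)\s*")


def extract_integer_sequences(text, low=3, high=7):
    results = []
    for group in BRACED.finditer(text):
        # childrenSeparatorRegex="," separates the child instances.
        parts = group.group(1).split(",")
        matches = [INT.fullmatch(part) for part in parts]
        if not all(matches):
            continue  # something other than an int inside the braces
        values = [int(m.group(1)) for m in matches]
        # occurs="3-7": keep only sequences with 3 to 7 child instances.
        if low <= len(values) <= high:
            results.append({"myResult": values})
    return {"myIntegerSequence": results}


print(extract_integer_sequences(INPUT))
```

Run on the sample input, the sketch returns the same three myResult lists as the output above, without the "Extractor" wrapper.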

    Lastly, let's consider an example that uses several of the properties explained above to define a nested template for extracting ticket information from an email about customer support tickets:

    The text input source is as follows:


    Hi John,

    Below are all necessary information about the tickets which need to be resolved this week:

    customerName "Robert King",
    customerId 10,
    ticketId 200,
    ticketSubject "Change cell color in Grid",
    ticketContent "I am not able to figure out how to change the color of cells in a grid. Need your help"

    Also, last week we had resolved the query from Mr Andrew whose customerId is 20 but for some reason its ticket does not appear on our system. Please take care of the same as early as possible.

    customerId 20,
    customerName "Andrew Fuller",
    ticketId 230,
    ticketSubject "Adding sparkline to grid",
    ticketContent "Could you please let me know how could I add sparklines to a column in my grid"

    Regards,
    Nancy Jones

    The following nested XML template defines the structure of a customer support ticket and can be used to extract the customer support ticket information from the above input source:

    <template name="CustomerTicketInfo" childrenSeparatorRegex=",">
      <element name="customer_name" startingRegex="customerName" extractFormat="quotedString" />
      <element name="customer_id" startingRegex="customerId" extractFormat="int" />
      <element name="ticket_id" startingRegex="ticketId" extractFormat="int" />
      <element name="ticket_subject" startingRegex="ticketSubject" extractFormat="quotedString" />
      <element name="ticket_content" startingRegex="ticketContent" extractFormat="quotedString" />
    </template>
    

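    The template above uses the name, extractFormat, startingRegex and childrenSeparatorRegex properties. Because childrenOrderMatter is not set, it keeps its default value of "false", so the child elements can appear in any order. This allows the second ticket, in which customerId comes before customerName, to match the same template.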
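The order-independent matching this template relies on can also be sketched in Python. This is a hand-written approximation, not TextParser's algorithm. It assumes each child is its startingRegex keyword followed by white space and a value, quotedString values contain no escaped quotes, and a record is complete once all five children have appeared exactly once, in any order, with only "," between them. FIELDS, extract_tickets and SAMPLE (abridged from the input above) are hypothetical names.

```python
import re

# Hypothetical lookup mirroring the XML template:
# startingRegex keyword -> (element name, extractFormat)
FIELDS = {
    "customerName": ("customer_name", "quotedString"),
    "customerId": ("customer_id", "int"),
    "ticketId": ("ticket_id", "int"),
    "ticketSubject": ("ticket_subject", "quotedString"),
    "ticketContent": ("ticket_content", "quotedString"),
}

# One child: a keyword, white space, then a quoted string or an integer.
CHILD = re.compile(r'\b(' + "|".join(FIELDS) + r')\s+("[^"]*"|-?\d+)')
# Stands in for childrenSeparatorRegex="," (white space ignored).
SEPARATOR = re.compile(r"\s*,\s*")


def convert(fmt, raw):
    """Simplified extractFormat; returns None when raw has the wrong shape."""
    is_quoted = raw.startswith('"')
    if fmt == "int":
        return None if is_quoted else int(raw)
    return raw[1:-1] if is_quoted else None


def extract_tickets(text):
    tickets, current, last_end = [], {}, None
    for m in CHILD.finditer(text):
        name, fmt = FIELDS[m.group(1)]
        value = convert(fmt, m.group(2))
        # Start a new record unless only a separator lies between this child
        # and the previous one, and this child has not been seen yet.
        separated = (
            last_end is not None
            and SEPARATOR.fullmatch(text[last_end:m.start()]) is not None
        )
        if not separated or name in current:
            current = {}
        last_end = m.end()
        if value is None:
            current = {}
            continue
        current[name] = value
        # With childrenOrderMatter "false", any order is accepted; the record
        # is complete once every child has appeared once.
        if len(current) == len(FIELDS):
            tickets.append(current)
            current = {}
    return tickets


SAMPLE = '''customerName "Robert King",
customerId 10,
ticketId 200,
ticketSubject "Change cell color in Grid",
ticketContent "I am not able to figure out how to change the color of cells in a grid. Need your help"

Also, last week we had resolved the query from Mr Andrew whose customerId is 20.

customerId 20,
customerName "Andrew Fuller",
ticketId 230,
ticketSubject "Adding sparkline to grid",
ticketContent "Could you please let me know how could I add sparklines to a column in my grid"
'''

for ticket in extract_tickets(SAMPLE):
    print(ticket["ticket_id"], ticket["customer_name"])
```

The phrase "customerId is 20" in the prose is not picked up, because "is" is neither an integer nor a quoted string.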

    The following image shows the parsed result in JSON string format after applying the template:

    Result in JSON string format