Semi-Structured Text Files
The C1TextParser library parses semi-structured text files, such as HTML and plain text. Parsing converts a string of data into a different, structured type of data.
Output Format as JSON
The extraction result can be formatted as JSON or an object instance from a custom class.
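To illustrate the two output shapes, here is a minimal Python sketch. It is not the C1TextParser API, which is a .NET library. It takes a set of already-extracted fields and renders them either as a JSON string or as an instance of a custom class. The `Order` class, the helper functions, and the field names are all hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical extraction result; field names and values are made up.
extracted = {"order_id": "A-1001", "total": "42.50"}


@dataclass
class Order:
    """Custom class that the extracted fields are mapped onto."""
    order_id: str
    total: float


def to_json(fields: dict) -> str:
    """Format the extraction result as a JSON string."""
    return json.dumps(fields, sort_keys=True)


def to_object(fields: dict) -> Order:
    """Map the extraction result onto a typed object, converting types."""
    return Order(order_id=fields["order_id"], total=float(fields["total"]))


print(to_json(extracted))
print(to_object(extracted))
```

JSON suits handing the result to another system, while a typed object is convenient when the same program keeps working with the values.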
Types of Extractors
The C1TextParser library supports three different extractors: Starts-After-Continues-Until, HTML, and Template-based.
Extraction can be based on matched regular expressions, start after a matched word or phrase, or follow a defined script.
Extract and Parse Text
Use C1TextParser along with other ComponentOne components to extract text from more file types. Use C1Word or C1PdfDocumentSource to extract text from Microsoft Word and PDF files that C1TextParser can then parse.
The Starts-After-Continues-Until extractor is the simplest and the easiest to use.
- This extractor is designed to extract relevant text from a plain text source.
- To use it, define two parameters: where the text starts and where it ends (or continues until).
- Essentially, it extracts all the text contained between the occurrences of two regular expressions.
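The idea of extracting text between the occurrences of two regular expressions can be sketched in Python as follows. This illustrates the technique only and is not the C1TextParser implementation or API. The function name `extract_between` and the sample text are assumptions made for the example.

```python
import re
from typing import List


def extract_between(text: str, starts_after: str, continues_until: str) -> List[str]:
    """Return each span found after a match of `starts_after`
    and before the next match of `continues_until`."""
    start_re = re.compile(starts_after)
    end_re = re.compile(continues_until)
    results = []
    pos = 0
    while pos <= len(text):
        start = start_re.search(text, pos)
        if start is None:
            break
        end = end_re.search(text, start.end())
        if end is None:
            break
        results.append(text[start.end():end.start()].strip())
        # Always advance, even if both patterns matched an empty string.
        pos = max(end.end(), start.start() + 1)
    return results


sample = (
    "Name: Ada Lovelace\n"
    "Total: 42.50 USD\n"
    "Name: Alan Turing\n"
    "Total: 17.00 USD\n"
)
names = extract_between(sample, r"Name:", r"\n")
totals = extract_between(sample, r"Total:\s*", r"\s*USD")
print(names, totals)
```

Each call scans the whole source, so a text containing several occurrences yields one result per occurrence.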
The HTML Extractor is designed to help automate the extraction of specific data from emails and other HTML-structured files. Automated emails, such as travel itineraries, tickets, and e-commerce receipts, typically follow a repeated structure that C1TextParser can parse even if not every email follows exactly the same HTML structure. The HTML Extractor is similar to the template-based extractor; however, it is specialized for complex HTML documents and tolerates unexpected characters within the markup.
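As a rough sketch of extracting the same field from emails whose markup differs, the Python example below uses the standard library `html.parser` module. It ignores the tags entirely and returns the text node that follows a known label. This is a simplified stand-in for the idea, not how the HTML Extractor works internally. The class `TextNodes`, the function `extract_labeled`, and the sample emails are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List


class TextNodes(HTMLParser):
    """Collect the non-empty text nodes of an HTML document, ignoring tags."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.nodes.append(text)


def extract_labeled(html: str, labels: List[str]) -> Dict[str, str]:
    """For each known label, return the text node that immediately follows it."""
    parser = TextNodes()
    parser.feed(html)
    parser.close()
    found: Dict[str, str] = {}
    for i, node in enumerate(parser.nodes[:-1]):
        key = node.rstrip(":")
        if key in labels:
            found[key] = parser.nodes[i + 1]
    return found


# Two emails carrying the same field in different markup.
email_a = "<table><tr><td>Order number:</td><td><b>A-1001</b></td></tr></table>"
email_b = "<div><span>Order number:</span> <p>B-2002</p></div>"

print(extract_labeled(email_a, ["Order number"]))
print(extract_labeled(email_b, ["Order number"]))
```

Because the lookup depends on the label text and not on the surrounding tags, both layouts produce a result for the same field.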
The template-based extractor is the most generic: it parses data structures that follow a declarative XML template. Because the template can be provided as a separate file, users can supply both the template and the source to parse. The plain text source can contain many instances of the defined structure. Any text that does not match the template specification is simply ignored.
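The following Python sketch illustrates the general idea of a declarative XML template kept separate from the source text. The template schema shown here (`<template>` and `<field>` elements with `name`, `label`, and `value` attributes) is invented for the example and is not the C1TextParser template format. Each field describes one line of a record, and the fields are compiled into a single regular expression.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical template schema: each <field> describes one line of a record.
TEMPLATE = """
<template>
  <field name="name" label="Name: " value=".+"/>
  <field name="total" label="Total: " value="[0-9.]+ USD"/>
</template>
"""


def compile_template(template_xml: str) -> re.Pattern:
    """Turn the declarative template into one multi-line regular expression."""
    root = ET.fromstring(template_xml)
    lines = []
    for field in root.findall("field"):
        label = re.escape(field.get("label"))
        lines.append(f"^{label}(?P<{field.get('name')}>{field.get('value')})$")
    # Consecutive fields must appear on consecutive lines.
    return re.compile(r"\n".join(lines), re.MULTILINE)


source = (
    "Thank you for your purchase!\n"
    "Name: Ada Lovelace\n"
    "Total: 42.50 USD\n"
    "-- unrelated footer --\n"
    "Name: Alan Turing\n"
    "Total: 17.00 USD\n"
)

pattern = compile_template(TEMPLATE)
# Every instance of the structure is returned; non-matching text is skipped.
records = [m.groupdict() for m in pattern.finditer(source)]
print(records)
```

Keeping the template as data means the same extraction code can be reused with a different template file for a different document layout.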
Extract Important Email Information
Emails are a prevalent source of data for specific segments of a company (product, sales, marketing), and the data is often extracted manually. Whenever you receive emails with a similar, repeated structure, a parser can be useful. C1TextParser enables you to easily extract, store, and track this repeated type of data from emails. Once extracted, the data can be stored or passed to another destination.
Examples of emails that can be easily parsed include:
- Invoices and order forms
- Leads from an email submission form
- Customer support and requests
- Ticket and travel reservation confirmations
Process Resumes for Digital Analysis
Resumes are often formatted in a predictable manner that allows them to be easily read by a machine for parsing out important information. If a company has to deal with hundreds of resumes that would take too much time for humans to process, a text parsing service that first analyzes the resumes can help narrow the field or provide quick stats on the candidate pool by parsing out key requirements.
Improve Productivity with Smart Tags
C1TextParser can be used to parse written text in a CRM or editor and provide intelligence for it. Identify common patterns in text and add Smart Tags that allow the end user to perform actions quickly. Examples include parsing and formatting phone numbers to launch a phone calling app, formatting people's names to add to a contacts list, or formatting dates to add calendar events. This UI behavior is becoming more common as our software becomes smarter and helps us become more productive.
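A minimal Python sketch of the pattern-detection step behind Smart Tags is shown below. It scans free text for phone numbers and dates and reports where each one occurs, so a UI could attach an action to that span. It does not use C1TextParser. The `SmartTag` type, the `find_tags` function, and the two patterns (US-style phone numbers and ISO dates only) are assumptions made for illustration.

```python
import re
from typing import List, NamedTuple


class SmartTag(NamedTuple):
    kind: str   # which action to offer, e.g. "phone" or "date"
    text: str   # the matched text
    start: int  # offset of the match in the source text
    end: int    # offset just past the match


# Deliberately narrow patterns: 3-3-4 digit phone numbers and YYYY-MM-DD dates.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def find_tags(text: str) -> List[SmartTag]:
    """Return all recognized spans, ordered by position in the text."""
    tags = [
        SmartTag(kind, m.group(), m.start(), m.end())
        for kind, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(tags, key=lambda tag: tag.start)


note = "Call Ada at 412-555-0199 before 2024-05-17."
tags = find_tags(note)
for tag in tags:
    print(tag.kind, tag.text, tag.start, tag.end)
```

The offsets let an editor underline the matched span and offer the matching action, such as dialing the number or creating a calendar event.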