Document Solutions for PDF
    Text Search, Replace and Delete

    Search Text

    DsPdf can search a PDF document for all occurrences of a specified text. The library supports the common find options, including regular expressions, case-sensitive search, and whole-word search. Search also works across line breaks, so logically connected text that is rendered on different lines can be found. To search a document, use the FindText method of the GcPdfDocument class. The method accepts a FindTextParams object and an OutputRange object as parameters and finds all occurrences of the search string in the loaded document. The FindTextParams class represents the text to search for and also carries the search options discussed below.

    Search and Highlight Text

    The FindText method returns a list of all occurrences of the searched text. You can iterate through the list and highlight each result using the FillPolygon and DrawPolygon methods of the GcGraphics class.


    The example below shows how to search and highlight a text string in a PDF document:

    C#
    public void CreatePDF(Stream stream)
    {
        //load file
        var doc = new GcPdfDocument();
        using var fs = File.OpenRead("TimeSheet.pdf");
        doc.Load(fs);
    
        //define search parameters: whole word, not case-sensitive
        var findText = new FindTextParams("HOURS", true, false);
    
        //find all occurrences of the text
        IList<FoundPosition> findTextList = doc.FindText(findText);
    
        //highlight each occurrence
        foreach (FoundPosition text in findTextList)
        {
            //get the graphics of the page that contains the occurrence
            var g = doc.Pages[text.PageIndex].Graphics;
    
            //Bounds can hold several quadrilaterals (for example, when a match
            //wraps across lines), so outline and fill every one of them
            foreach (Quadrilateral quad in text.Bounds)
            {
                g.DrawPolygon(quad, Color.Yellow, 1);
                g.FillPolygon(quad, Color.FromArgb(100, Color.OrangeRed));
            }
        }
    
        //save the highlighted document while the source stream is still open
        doc.Save("FindText.pdf");
    
    }
    

    Case-sensitive Search

    Case sensitivity is one of the criteria you can apply when searching for a text string. With DsPdf, you can specify whether a text search is case-sensitive. To match case, set the matchCase parameter of the FindTextParams constructor to true.


    The example below shows how to search a PDF document for a string with a specific case:

    C#
    //find word “time”, the word “Time” or “TIME” will be ignored
    var findWord = new FindTextParams("time", false, true);
    var findText = doc.FindText(findWord);
    

    Whole Word Search

    DsPdf lets you search either for whole words or for any occurrence of the text, including occurrences inside longer words. To match whole words only, set the wholeWord parameter of the FindTextParams constructor to true.


    The example below shows how to search a PDF document for whole words only:

    C#
    //find the whole word “time” in any case; the word “overtime” will be ignored
    var findWord = new FindTextParams("time", true, false);
    var findText = doc.FindText(findWord);
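
    The wholeWord and matchCase flags combine independently. As a rough illustration of their semantics (not DsPdf's implementation), the sketch below mimics them with plain .NET regular expressions. SearchOptionSketch and CountMatches are hypothetical names introduced here, not part of DsPdf.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; it does not use or reproduce DsPdf.
public static class SearchOptionSketch
{
    // Counts occurrences of a literal term in 'text', honoring two flags that
    // mirror the wholeWord and matchCase options described above.
    public static int CountMatches(string text, string term, bool wholeWord, bool matchCase)
    {
        // Escape the term so it is treated as literal text, not as a pattern.
        string pattern = Regex.Escape(term);

        // Whole-word matching: require a word boundary on both sides.
        if (wholeWord)
            pattern = @"\b" + pattern + @"\b";

        // Case-insensitive unless matchCase is requested.
        RegexOptions options = matchCase ? RegexOptions.None : RegexOptions.IgnoreCase;

        return Regex.Matches(text, pattern, options).Count;
    }
}
```

    For the string "overtime time Time" and the term "time", CountMatches returns 3 with both flags false, 2 with only wholeWord set (the occurrence inside "overtime" is skipped), 2 with only matchCase set ("Time" is skipped), and 1 with both set.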
    

    Regular Expression Search

    Regular expressions are useful for finding variable text that follows a common pattern, such as a date, time, or email address, rather than a fixed word or phrase. To search with a regular expression, pass the pattern as the text string to the FindTextParams constructor and set its regex parameter to true.


    C#
    //finds all the dates present in PDF document, using regular expressions
    var findWord = new FindTextParams(@"\d+[/-]\w+[/-]\d\d", false, false, 72, 72, false, true);
    var findText = doc.FindText(findWord);
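
    To see what the date pattern above matches, it can be tried against plain strings with the .NET Regex class before running it on a PDF. This sketch assumes that DsPdf interprets patterns using .NET regular expression syntax. DatePatternSketch and FindDates are hypothetical names introduced here.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; it does not call DsPdf.
public static class DatePatternSketch
{
    // The same pattern that is passed to FindTextParams in the example above:
    // digits, a '/' or '-', word characters, a '/' or '-', then two digits.
    public const string Pattern = @"\d+[/-]\w+[/-]\d\d";

    // Returns every substring of 'text' that the pattern matches, in order.
    public static List<string> FindDates(string text)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
            results.Add(m.Value);
        return results;
    }
}
```

    For example, FindDates("Due 3/14/24 or 14-Mar-24, not March 14") yields 3/14/24 and 14-Mar-24. Because the pattern ends in \d\d, a date with a four-digit year such as 12/31/2024 is matched only up to 12/31/20.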
    

    For more information about implementing text search with DsPdf, see the DsPdf sample browser.

    Replace Text

    With DsPdf, you can replace text in a whole document or on a specific page by using the ReplaceText method, which is available on the GcPdfDocument and Page classes and on the ITextMap interface. The method accepts a FindTextParams object and the new text string, along with other parameters, and replaces all occurrences of the target text. It also adjusts the space required to accommodate the new text.

    The code below shows how to replace text throughout a document:

    C#
    // replace word ".NET Standard 2.0" with ".NET 6" in document
    using (FileStream fs = new FileStream(@"..\..\..\DotnetFramework.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        GcPdfDocument doc = new GcPdfDocument();
        doc.Load(fs);
        FindTextParams ftp = new FindTextParams(".NET Standard 2.0", true, false);
        doc.ReplaceText(ftp, ".NET 6", null, null, null);
        doc.Save("DotnetFramework_Document.pdf");
    }
    

    Delete Text

    DsPdf allows you to delete text from a whole document or from a specific page by using the DeleteText method, which is available on the GcPdfDocument and Page classes and on the ITextMap interface. The method accepts a FindTextParams object and a DeleteTextMode enumeration value. The DeleteTextMode enumeration provides two options, Standard and PreserveSpace, which represent the two modes of deleting text in a PDF document.

    In Standard mode, the text that follows the deleted text shifts to fill the gap. In PreserveSpace mode, the document keeps an empty space where the deleted text was, and the text after it does not move.

    The code below shows how to delete a text string from the first page of a PDF document using Standard mode:

    C#
    // delete word "wetlands" from the first page using DeleteTextMode.Standard
    using (FileStream fs = new FileStream(@"..\..\..\Wetlands.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        GcPdfDocument doc = new GcPdfDocument();
        doc.Load(fs);
        FindTextParams ftp = new FindTextParams("wetlands", true, false);
        doc.Pages[0].DeleteText(ftp, DeleteTextMode.Standard);
        doc.Save("wetlands_deleted.pdf");
    }
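
    PreserveSpace mode uses the same call with the other enumeration value. The following is a minimal sketch that reuses the file and search term from the example above. The using directives, the PreserveSpaceSketch wrapper, and the output file name are assumptions made for illustration, so adjust them to your DsPdf version and project.

```csharp
using System.IO;
// Assumed namespaces for the DsPdf types used below; they may differ by version.
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Pdf.TextMap;

// Hypothetical wrapper class, introduced only to make the sketch self-contained.
public static class PreserveSpaceSketch
{
    public static void Run()
    {
        using (FileStream fs = new FileStream(@"..\..\..\Wetlands.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            GcPdfDocument doc = new GcPdfDocument();
            doc.Load(fs);

            // Whole word, not case-sensitive, as in the Standard mode example.
            FindTextParams ftp = new FindTextParams("wetlands", true, false);

            // Delete the word from the first page but leave its space empty,
            // so the text that follows stays where it was.
            doc.Pages[0].DeleteText(ftp, DeleteTextMode.PreserveSpace);

            // Output file name is an assumption.
            doc.Save("wetlands_deleted_preserve_space.pdf");
        }
    }
}
```

    Comparing this output with the Standard mode output side by side is a quick way to check which mode suits a given layout.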