
Redacting Content in PDF Files in .NET Core

What is Redaction?

You might want to share a document inside or outside your organization without revealing the sensitive information it contains. Common examples of sensitive information are names, addresses, account numbers, phone numbers, and legal or financial records.

Simply covering the content with black rectangles is not enough: the text is still present in the file and can be extracted with text-extraction tools. This is where redaction comes in. PDF redaction wipes sensitive information from an existing PDF, so that the redacted content is not visible and cannot be extracted with PDF tools (including GcPdf itself).

In Documents for PDF 2019 v2, we introduced the GcPdf redact annotation, which lets you mark parts of a PDF document for redaction. Applying those redactions with GcPdf is planned for the next release (due in late October 2019).

To remove the marked areas from the document, you can open the PDF in Adobe Acrobat, review the redact annotations added by GcPdf, apply them and save the PDF.

RedactAnnotation class

GcPdf provides a class named RedactAnnotation, which can be used to specify the regions of a page that are intended to be removed. You can also use it to manipulate the existing redact annotations.

In the simplest case, you can assign a rectangle to be removed to the Rect property of the RedactAnnotation class. Alternatively, the Area property can be used to specify a list of quadrilaterals to be excluded, so that multiple areas can be added to a single redact annotation. You can also specify the outline color which should be used to highlight the annotation's border using the MarkBorderColor property. The color with which the redacted region should be filled can be set using the OverlayFillColor property.

For specifying the overlay text that should be drawn over the redacted region, you can make use of its OverlayText property. Additionally, you can specify whether the overlay text should be repeated to fill the redacted region or not using the OverlayTextRepeat property. Other settings such as the justification and appearance of the overlay text can be customized with the help of Justification and OverlayTextAppearanceString properties of this class.

For adding a redact annotation to a page, create an instance of RedactAnnotation class, specify its relevant properties and add this instance to the Annotations collection of the page.
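
As a rough sketch of the steps just described, the following marks one fixed rectangle on the first page for redaction. It assumes that the GcPdf NuGet package is referenced, that the types live in the GrapeCity.Documents.Pdf and GrapeCity.Documents.Pdf.Annotations namespaces, and that Rect takes a System.Drawing.RectangleF. The file names, coordinates and overlay text are made up for illustration.

```csharp
using System.Drawing;
using System.IO;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Pdf.Annotations;

class RedactSketch
{
    static void Main()
    {
        var doc = new GcPdfDocument();
        // The stream stays open until the document has been saved
        using (var fs = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
        {
            doc.Load(fs);

            var redact = new RedactAnnotation
            {
                // Hypothetical region (x, y, width, height); assumed to be a RectangleF
                Rect = new RectangleF(72, 72, 200, 24),
                // Outline that highlights the marked area
                MarkBorderColor = Color.Red,
                // Fill and text intended for the region once the redaction is applied
                OverlayFillColor = Color.LightGray,
                OverlayText = "REDACTED",
                OverlayTextRepeat = true
            };

            // Add the annotation to the first page and save the marked-up copy
            doc.Pages[0].Annotations.Add(redact);
            doc.Save("Marked.pdf");
        }
    }
}
```

This only marks the region; the underlying content stays in the file until the redaction is applied.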

Note: PDF viewers that support redact annotations (e.g. Adobe Reader) show the effect of specified redactions when the mouse hovers over the marked areas.

Adding Redact Annotation to a PDF

Adding Redact annotation to a PDF

Consider a scenario where you have a PDF invoice (as shown in the above image) and, before distributing it to others, you want to mark the customer’s address (static text) and all the mobile numbers (a pattern) in it for redaction.

Let's see how this can be achieved with the help of RedactAnnotation class.

Step 1: Load an existing PDF

Create an instance of the GcPdfDocument class and load the PDF invoice into it using its Load method.

// doc is a class-level GcPdfDocument field; the later steps use it as well
doc = new GcPdfDocument();
doc.Load(new FileStream("Invoice.pdf", FileMode.Open, FileAccess.Read));

Step 2: Create a method to add the redact annotations

Now we create a method named AddRedactAnnotation. In this method, we first find all the occurrences of the input text in the document using the FindText method, which returns the page index and bounds of each occurrence. If the page on which the text is found has no RedactAnnotation yet, we create one, add the bounds of the found text to its Area collection, and add the annotation to the page. If the page already has a redact annotation, we simply add the found text’s bounds to that annotation's Area collection.

// Requires: using System.Drawing; using System.Linq;
// using GrapeCity.Documents.Pdf; using GrapeCity.Documents.Pdf.Annotations;
private void AddRedactAnnotation(string txt)
{
    // Find all occurrences of the searched text (address or mobile number);
    // each result carries the page index and the bounds of the found text
    var found = doc.FindText(new FindTextParams(txt, true, false), null);

    foreach (var f in found)
    {
        var page = doc.Pages[f.PageIndex];

        // Reuse the page's existing redact annotation, if it has one
        RedactAnnotation redactAnnotation = page.Annotations.OfType<RedactAnnotation>().FirstOrDefault();

        if (redactAnnotation == null)
        {
            // No redact annotation on this page yet: create it and add it to the page
            redactAnnotation = new RedactAnnotation();
            redactAnnotation.OverlayFillColor = Color.Black;
            redactAnnotation.MarkBorderColor = Color.Red;
            page.Annotations.Add(redactAnnotation);
        }

        // Add the found text's area to the page's redact annotation
        redactAnnotation.Area.Add(f.Bounds);
    }
}

Step 3: Call the method with the text to be marked for redaction

Call the AddRedactAnnotation method created in the previous step, passing the customer’s address and each mobile number found in the document’s text. The method then searches for that text and marks every occurrence for redaction.

// Add a redact annotation for the customer address in the invoice
AddRedactAnnotation("410 Main Street, NY 10027");

// Search the document's text for a pattern (mobile numbers)
MatchCollection matches = Regex.Matches(doc.GetText(), @"\(?\d{3}\)?-? *\d{3}-? *-?\d{4}");

// AddRedactAnnotation already finds every occurrence of the text it is given,
// so Distinct() avoids marking a repeated number more than once
foreach (string number in matches.Cast<Match>().Select(m => m.Value).Distinct())
    AddRedactAnnotation(number);
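
To see what the mobile-number pattern picks up before running it against a real invoice, it can be tried on plain text. The sketch below uses only the .NET base library; the PhonePatternDemo class, the FindNumbers helper and the sample text are made up for illustration and stand in for doc.GetText().

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class PhonePatternDemo
{
    // Same pattern as in Step 3
    const string Pattern = @"\(?\d{3}\)?-? *\d{3}-? *-?\d{4}";

    // Returns each distinct match, in order of first appearance
    public static string[] FindNumbers(string text) =>
        Regex.Matches(text, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .Distinct()
             .ToArray();

    static void Main()
    {
        // Made-up sample text standing in for doc.GetText()
        string text = "Call (555) 123-4567 or 555-987-6543. Again: (555) 123-4567. Ref 20190001.";

        foreach (string number in FindNumbers(text))
            Console.WriteLine(number);
        // prints:
        // (555) 123-4567
        // 555-987-6543
    }
}
```

The pattern also matches any plain run of ten digits, such as a ten-digit account number, so it is worth reviewing the marked areas before the redactions are applied.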

Step 4: Save the PDF

Save the document using the Save method of the GcPdfDocument class.

doc.Save("Redacted_Invoice.pdf"); //Save the document

This is how the saved document will look:

Redacting annotations on a PDF

If you want different parts of a page to be formatted differently, create a separate RedactAnnotation instance for each style, set its properties, and add each instance to the page.

Step 5: Apply the Redactions

After the document has been processed by the code shown above, you can load the modified document into Adobe Acrobat, review the redact annotations created by the code and, if everything looks right, apply them and save the PDF. The content marked for redaction is then completely removed from the PDF.

In the next release, it will also be possible to remove the content marked for redaction using GcPdf itself.

Thanks for following along! If you have any questions about the new features, please leave them in the comments below.

Happy Coding!

Palak Bansal

Associate Software Engineer