Product Architecture

Packaging

DsPdf is a collection of cross-platform .NET class libraries written in C#. It provides an API for creating PDF files from scratch and for loading, analyzing, and modifying existing documents.

DsPdf is compatible with .NET Core 2.x/3.x, .NET Standard 2.x, .NET Framework 4.6.1 or higher, and .NET 6 or higher.

DsPdf and supporting packages are available on nuget.org:

DS.Documents.Pdf
DS.Documents.BarCode
DS.Documents.Imaging
DS.Documents.Imaging.Windows
DS.Documents.DX.Windows

To use DsPdf in an application, simply reference the DS.Documents.Pdf package. All other required packages that DsPdf utilizes will be installed automatically.

To render barcodes, install the DS.Documents.Barcode package (DsBarcode for short). It provides extension methods for drawing barcodes with DsPdf.

DS.Documents.DX.Windows gives DsPdf access to the native imaging APIs when it runs on a Windows system.

DsPdf API Overview

Classes and other types in the DsPdf and related libraries expose a PDF object model that closely follows the PDF specification version 1.7 published by Adobe. DsPdf is designed to provide, whenever feasible, direct access to all features of the PDF format, including the low-level ones. In addition, DsPdf provides a platform-independent text layout engine and other high-level features that make document creation easier.

Namespaces

Namespace	Description
GrapeCity.Documents.Drawing	Framework for drawing on the abstract GcGraphics surface.
GrapeCity.Documents.Pdf	Types used to create, process, and modify PDF documents, including GcPdfGraphics. Nested namespaces contain types supporting specific PDF spec areas: GrapeCity.Documents.Pdf.AcroForms, GrapeCity.Documents.Pdf.Actions, GrapeCity.Documents.Pdf.Annotations, GrapeCity.Documents.Pdf.Graphics, GrapeCity.Documents.Pdf.Log, GrapeCity.Documents.Pdf.Parser, GrapeCity.Documents.Pdf.Security.
GrapeCity.Documents.Text	Text processing sub-system.

GcPdfDocument

A PDF document in DsPdf is represented by an instance of the GrapeCity.Documents.Pdf.GcPdfDocument class. To create a new PDF, create an instance of GcPdfDocument, add content to it, and then call one of the GcPdfDocument.Save() overloads to write the document to a file or stream. The Save() method can be called multiple times on the same GcPdfDocument instance, so many (possibly different) PDF documents can be created from it.
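
As a minimal sketch of the create, add content, and save sequence described above, the following program builds a one-page document. It assumes the DS.Documents.Pdf package is referenced; the file name, the text, and the choice of the NewPage() helper and the StandardFonts.Times font are illustrative assumptions rather than requirements.

```csharp
using System.Drawing;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;

// Create an empty document and add a single page to it.
var doc = new GcPdfDocument();
var page = doc.NewPage();

// Draw one line of text one inch (72 points) from the top-left corner of the page.
var format = new TextFormat { Font = StandardFonts.Times, FontSize = 12 };
page.Graphics.DrawString("Hello, World!", format, new PointF(72, 72));

// Write the document to a file; the file name is arbitrary.
doc.Save("HelloWorld.pdf");
```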

GcPdfDocument also provides a Load() method for analyzing and modifying an existing PDF. When Load() is called on an instance of GcPdfDocument, the instance is cleared first. Load() accepts a Stream that the caller opens on the PDF to be loaded; the stream must be readable and must stay open for as long as the loaded document is in use. This is because Load() does not read the whole document into memory but loads the required parts on demand, which keeps the memory footprint to a minimum and improves performance. Note that Load() is a "read-only" method: GcPdfDocument never writes back to the loaded stream. To save any changes made to the document, call Save(), specifying the output file or stream; the result is written as a newly created document.
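
The stream lifetime rule can be sketched as follows. AppendBlankPage is a hypothetical helper introduced only for illustration: it keeps the input stream open inside a using block until Save() has finished, and writes the result to a separate output file instead of the input.

```csharp
using System.IO;
using GrapeCity.Documents.Pdf;

// Hypothetical helper: loads a PDF, appends a blank page, and saves the result
// to a different file. Returns the page count of the saved document.
static int AppendBlankPage(string inputPath, string outputPath)
{
    // The input stream must stay open for as long as the loaded document is used,
    // because content is fetched from it on demand.
    using (var input = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
    {
        var doc = new GcPdfDocument();
        doc.Load(input);

        // Modify the loaded document: add an empty page at the end.
        doc.NewPage();

        // Load() never writes back to the input stream, so save to a new file.
        doc.Save(outputPath);
        return doc.Pages.Count;
    }
}
```

A call such as AppendBlankPage("input.pdf", "output.pdf") would then leave input.pdf untouched and write the modified copy to output.pdf (both file names are placeholders).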

A number of properties and collections on the GcPdfDocument provide access to the content and properties of the document. The most important collection is Pages (see The Pages Collection), others include Outlines, AcroForm, Security and so on.

The Pages Collection

The Pages collection represents the collection of a document's pages. When a new GcPdfDocument is created, this collection is initially empty. The usual collection modifying methods are available and can be used to fetch, add, insert, remove or move pages around. When an existing PDF is loaded into a GcPdfDocument, the Pages collection is filled with the pages loaded from that document. It can then be modified in the same way as in a document created from scratch.
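
A short sketch of typical page manipulation is shown below. It assumes that the Pages collection exposes parameterless Add(), Insert(int index), and the usual list members RemoveAt(int), Count, and an indexer; check the API reference for the exact overloads.

```csharp
using System;
using GrapeCity.Documents.Pdf;

var doc = new GcPdfDocument();

// A new document starts with an empty Pages collection.
Console.WriteLine($"Initial page count: {doc.Pages.Count}");

// Append two pages, then insert a new page in front of them.
doc.Pages.Add();
doc.Pages.Add();
doc.Pages.Insert(0);

// Remove the last page again.
doc.Pages.RemoveAt(doc.Pages.Count - 1);

// Fetch a page by index and inspect its size (in points).
var first = doc.Pages[0];
Console.WriteLine($"Pages: {doc.Pages.Count}, first page size: {first.Size}");
```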

Modifying Existing Documents

Using the GcPdfDocument.Load() method, existing documents can be inspected and modified. The possible modifications include:

Changing the writable properties of the loaded document and its elements.
Adding arbitrary new content. Anything that can be added to a new document can also be added to a loaded one: pages, page content, annotations, fields, and so on.
Modifying collections on the document and document pages. Elements of the following collections can be moved around, removed or added:
- At the document level:
  - Pages
  - NamedDestinations
  - Outlines
  - AcroForm.Fields
- At the page level:
  - ContentStreams
  - Annotations

No other modifications are supported at this time. For example, it is currently not possible to replace existing text or graphics, except by removing existing and adding new content streams.

It should be noted again that when an existing document is loaded into a GcPdfDocument instance, the connection with the original document is read-only, i.e. content is fetched as needed from the underlying stream, but no attempt is made to write back the changes. The GcPdfDocument.Save() method should be called if preserving the changes is required.

Sequential (StartDoc/EndDoc) Mode

In addition to the Save() method mentioned above, GcPdfDocument provides a sequential mode for creating a PDF. To use this mode, start by calling the StartDoc() method on the document, specifying a writable Stream as the method's only parameter. After that, content can be added to the document as usual, subject to the limitations listed below. When done, call the EndDoc() method, which completes writing the document.

The limitations of the sequential method are as follows:

The only allowed modification of the Pages collection is adding a page to the end of it. Removing, inserting or moving pages is not allowed.
You can only draw on the last page of the Pages collection. Once another page has been added after it, modifying any of the preceding pages is not allowed.
Certain features (e.g. linearization) are not available in this mode.

The advantage of the sequential mode is that the pages of the document are written to the underlying stream as soon as they are completed. As a result, the memory footprint can be much smaller, especially when creating a very large PDF.
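
The sequential workflow can be sketched as below. The output file name, the page count, and the text drawn on each page are arbitrary; the sketch only ever appends a page and draws on the most recently added one, which is what this mode permits.

```csharp
using System.Drawing;
using System.IO;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;

var format = new TextFormat { Font = StandardFonts.Helvetica, FontSize = 12 };

using (var output = new FileStream("sequential.pdf", FileMode.Create))
{
    var doc = new GcPdfDocument();

    // Switch to sequential mode: completed pages are written to the stream as we go.
    doc.StartDoc(output);

    for (int i = 1; i <= 100; i++)
    {
        // Only appending is allowed, and only the last page may be drawn on.
        var page = doc.NewPage();
        page.Graphics.DrawString($"Page {i}", format, new PointF(72, 72));
    }

    // Finish writing the document to the stream.
    doc.EndDoc();
}
```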

Text

Text measuring and layout are supported by a specialized set of classes in the GrapeCity.Documents.Text namespace. These classes provide a rich object model that gives access to text elements at every level, from high-level elements (paragraphs) all the way down to individual font and glyph features. Text processing is completely platform-independent and does not rely on any operating system-provided APIs.

The most important class in the GrapeCity.Documents.Text namespace is TextLayout. It represents one or more paragraphs of text and supports the following features:

Layout of paragraphs in an arbitrary rectangular area using a specified text flow direction
Line wrapping according to the Unicode standard recommendations
OpenType, TrueType and WOFF fonts, including extensions for handling national languages
Individual formatting of text fragments using different fonts, font styles and colors (see TextFormat class)
Typography features such as tabs, text alignment, char and line spacing, etc.
Text flow around rectangular areas
Inline and anchored objects
Kashida text justification in Arabic scripts
Splitting of large bodies of text into several layouts (columns or pages), including support for column balancing and control over widow/orphan lines

All features are fully supported for vertical (Chinese or Japanese) and RTL/bidirectional text.

After a text has been added to, and processed by, an instance of the TextLayout class, a representation of the text is generated using the glyphs from the specified fonts, and coordinates of any fragment of the original text in the generated layout can be fetched, if necessary.

A TextLayout instance can also be directly rendered onto GcGraphics (see Graphics) using the DrawTextLayout method. Simple MeasureString/DrawString methods on GcGraphics are also provided for convenience.
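
The following sketch shows one way a TextLayout might be filled, laid out, and rendered. It assumes that the graphics object offers a CreateTextLayout() factory and that TextLayout exposes DefaultFormat, MaxWidth, Append(), PerformLayout(), and ContentHeight; the sample text, fonts, and margins are illustrative.

```csharp
using System;
using System.Drawing;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;

var doc = new GcPdfDocument();
var page = doc.NewPage();
var g = page.Graphics;

// Create a layout bound to this graphics and set the default text format.
var tl = g.CreateTextLayout();
tl.DefaultFormat.Font = StandardFonts.Times;
tl.DefaultFormat.FontSize = 12;

// Wrap lines so that one-inch (72 pt) margins remain on both sides of the page.
tl.MaxWidth = page.Size.Width - 2 * 72;

// Append fragments; the second one overrides the font for individual formatting.
tl.Append("TextLayout wraps long paragraphs automatically. ");
tl.Append("This fragment is bold.", new TextFormat(tl.DefaultFormat) { Font = StandardFonts.TimesBold });

// Calculate glyphs and line breaks, then render the result at the top-left margin.
tl.PerformLayout(true);
g.DrawTextLayout(tl, new PointF(72, 72));
Console.WriteLine($"Laid-out text height, in points: {tl.ContentHeight}");

doc.Save("text-layout.pdf");
```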

Graphics

DsPdf provides a graphics surface to draw on, represented by the GcPdfGraphics class, which is an implementation of the abstract GcGraphics base class. GcPdfGraphics provides a flexible and rich object model for measuring, stroking, and filling the usual graphic primitives such as lines, rectangles, polygons, ellipses, and so on. Shapes can be stroked with solid or dashed lines and filled with solid or gradient brushes. For examples of shape rendering methods, see the GcPdfGraphics.DrawEllipse() and GcPdfGraphics.FillEllipse() methods. Complex shapes can be created and rendered using graphic paths; for example, see the GcPdfGraphics.DrawPath() method.

Graphics transformations using 3x2 matrices are fully supported (including for text). For more information, see the GcPdfGraphics.Transform property.
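
A minimal drawing sketch follows. It assumes that DrawEllipse(), FillEllipse(), and DrawRectangle() have overloads taking a RectangleF and a Color (plus a line width for the stroking methods), and that Transform is a settable System.Numerics.Matrix3x2 value; colors, sizes, and the rotation angle are arbitrary.

```csharp
using System;
using System.Drawing;
using System.Numerics;
using GrapeCity.Documents.Pdf;

var doc = new GcPdfDocument();
var g = doc.NewPage().Graphics;

// Fill an ellipse and then stroke its outline with a 2 pt line.
var rect = new RectangleF(72, 72, 144, 72);
g.FillEllipse(rect, Color.LightSkyBlue);
g.DrawEllipse(rect, Color.DarkBlue, 2f);

// Rotate subsequent drawing by 15 degrees around the center of the ellipse.
var center = new Vector2(rect.X + rect.Width / 2, rect.Y + rect.Height / 2);
g.Transform = Matrix3x2.CreateRotation((float)(Math.PI / 12), center);
g.DrawRectangle(rect, Color.Red, 1f);

// Restore the default (identity) transformation before drawing anything else.
g.Transform = Matrix3x2.Identity;

doc.Save("shapes.pdf");
```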

Units of Measurement

The default units of measurement used by GcPdfGraphics and TextLayout are printer points (1/72 of an inch). If desired, these can be changed to an arbitrary resolution using the Resolution property available on both GcPdfGraphics and TextLayout classes.

Coordinates

Coordinates of all graphic objects are measured from the top left corner of the graphics surface (which in GcPdfGraphics is usually a page). GcPdfGraphics.Transform can be used to change that.

Page Graphics

To draw on a page in a PDF document, an instance of GcPdfGraphics must be used for each page. Each page in the GcPdfDocument.Pages collection has a Graphics property that fetches the graphics for that page. You can simply get that property and draw on the returned graphics instance. Initially, each page has just one graphics associated with it. If the page contains multiple content streams, each content stream has its own graphics, and the Page.Graphics property returns the graphics of the last (top-most) content stream. (All content streams of the page can be accessed via its ContentStreams collection.)

DsHtml API Overview

DsHtml is a utility library that renders HTML to a PDF file or to an image in PNG, JPEG, or WebP format. DsHtml uses a Chrome or Edge browser (already installed on the current system, or downloaded from a public web site) in headless mode. The browser runs in a separate process, and it does not matter whether your .NET application is built for the x64, x86, or AnyCPU platform target.
The DS.Documents.Html library consists of a platform-independent main package that exposes the HTML rendering functionality. The main package contains the following namespaces:

Namespace	Description
GrapeCity.Documents.Pdf	Provides the extension methods and the formatting attributes for rendering HTML to a PDF file. Classes: GcPdfGraphicsExt, HtmlToPdfFormat.
GrapeCity.Documents.Html	Provides methods for converting HTML to PDF or images and defines parameters for the PDF or image output. Classes: BrowserFetcher, GcHtmlBrowser, HtmlPage, ImageOptions, JpegOptions, LaunchOptions, PageOptions, PdfMargins, PdfOptions, PngOptions, TimeOutOptions, WebpOptions.
GrapeCity.Documents.Drawing	Provides the extension methods and the formatting attributes for rendering HTML to an image. Classes: GcBitmapGraphicsExt, HtmlToImageFormat.

GrapeCity.Documents.Html.BrowserFetcher

The BrowserFetcher class has two static methods, GetSystemChromePath() and GetSystemEdgePath(), which return the path to the executable file of the Chrome or Edge browser, respectively. Another option is to download and install Chromium into a local folder: create an instance of BrowserFetcher, passing information such as the host, platform, revision, and destination folder if needed, and then call the BrowserFetcher.GetDownloadedPath() method, which downloads Chromium if required and returns the path to its executable file.
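
One possible way to combine the three options is a simple fallback chain, sketched below. It rests on two assumptions that should be verified against the API reference: that the static methods return null when the browser is not installed, and that BrowserFetcher has a parameterless constructor with reasonable defaults.

```csharp
using System;
using GrapeCity.Documents.Html;

// Prefer an installed Chrome, then an installed Edge, and download Chromium
// only as a last resort. Assumes the static methods return null when the
// corresponding browser cannot be found.
string browserPath = BrowserFetcher.GetSystemChromePath()
    ?? BrowserFetcher.GetSystemEdgePath()
    ?? new BrowserFetcher().GetDownloadedPath();

Console.WriteLine($"Browser executable: {browserPath}");
```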

GrapeCity.Documents.Html.GcHtmlBrowser

The GcHtmlBrowser class provides methods for converting HTML to PDF and images. With the path to a Chrome, Chromium, or Edge executable obtained from the BrowserFetcher class, you can create an instance of the GcHtmlBrowser class, which runs the browser process in the background. The GcHtmlBrowser constructor also accepts a parameter of the LaunchOptions type, which provides various settings specific to launching the browser.

The class has two important methods: NewPage(Uri uri) and NewPage(string html). Both return an instance of the HtmlPage class, which represents a browser tab after navigating to the specified web address or file, or after loading the arbitrary HTML content. Both methods also accept a second parameter of the PageOptions type, which provides properties to apply to the new browser page, such as the user name and password for HTTP authentication, disabling JavaScript, lazy loading, etc.

Note:

We recommend using Chrome browser with GcHtmlBrowser class as Edge has some differences in the implementation of some DevTools features.
It is important to dispose every instance of the GcHtmlBrowser and HtmlPage classes after use.

GrapeCity.Documents.Html.HtmlPage

The HtmlPage class represents a browser tab after navigating to the specified web address, file, or arbitrary HTML content. The class has methods such as SaveAsPdf, SaveAsPng, SaveAsJpeg, and SaveAsWebp that save the current page as a PDF or as a raster image in PNG, JPEG, or WebP format, respectively. The first parameter of these methods specifies the destination file or stream. The second parameter passes additional options, such as rendering the HTML page as a single PDF page or setting the page size, margins, header and footer, etc.

The HtmlPage class also contains methods that help you interact with the HTML page content. For example, you can obtain the full HTML content of the page using the GetContent method, and update the HTML markup using the SetContent method. You can reload the web page with the Reload method, or execute a script in the browser context using the EvaluateExpression method. The WaitForNetworkIdle method helps with loading asynchronous web content.
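
Putting BrowserFetcher, GcHtmlBrowser, and HtmlPage together, a conversion of an HTML string to a PDF and a PNG might look like the sketch below. It assumes that Chrome is installed on the system and that the DS.Documents.Html package is referenced; the HTML markup and file names are placeholders. Both the browser and the page are disposed via using blocks.

```csharp
using GrapeCity.Documents.Html;

// Placeholder markup to render.
const string html = "<html><body><h1>Hello, DsHtml!</h1><p>Rendered by a headless browser.</p></body></html>";

// Assumes Chrome is installed; see BrowserFetcher for the alternatives.
var chromePath = BrowserFetcher.GetSystemChromePath();

// Both the browser and the page hold native resources and must be disposed.
using (var browser = new GcHtmlBrowser(chromePath))
using (var htmlPage = browser.NewPage(html))
{
    // Save the same page as a PDF and as a PNG, passing default options.
    htmlPage.SaveAsPdf("page.pdf", new PdfOptions());
    htmlPage.SaveAsPng("page.png", new PngOptions());
}
```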

GrapeCity.Documents.Html.PdfOptions

The PdfOptions class represents output settings for rendering HTML to PDF and defines parameters for the Chromium PDF exporter. PDF output does not support transparency.

If the PageWidth and PageHeight properties are not set, the Letter paper size (8.5 by 11 inches) is used by default. The Landscape property indicates the paper orientation and is ignored if the FullPage property is set to true. The Margins property specifies the page margins in inches; its default value is 0. The Scale property scales the PDF content by a factor from 0.1 to 2.0. When scaling, you might also need to provide scaled values for the PageWidth and PageHeight properties to keep the relative size of the resulting pages unchanged.

The PageRanges property allows you to limit the pages included in the output PDF file. Specify the desired page numbers as a string, such as "1-5, 8, 11-13". Invalid page ranges (e.g., "9-5") are ignored.

Setting the FullPage property to true exports the whole HTML as a single PDF page. All other layout settings (except Scale) are ignored in that case.
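
The properties described above might be combined as in the following sketch, which renders a placeholder HTML string to a landscape A4 PDF. The specific values are illustrative, and the PdfMargins constructor taking a single value for all four margins is an assumption.

```csharp
using GrapeCity.Documents.Html;

var pdfOptions = new PdfOptions
{
    // A4 paper, in inches; without these, Letter (8.5 x 11) is used.
    PageWidth = 8.27f,
    PageHeight = 11.69f,
    Landscape = true,
    // Half-inch margins on all sides (the default is 0).
    Margins = new PdfMargins(0.5f),
    // Shrink the content to 80%; the allowed range is 0.1 to 2.0.
    Scale = 0.8f,
    // Export at most the first five pages.
    PageRanges = "1-5",
    // Include CSS backgrounds in the output.
    PrintBackground = true
};

using (var browser = new GcHtmlBrowser(BrowserFetcher.GetSystemChromePath()))
using (var htmlPage = browser.NewPage("<html><body><h1>Report</h1></body></html>"))
{
    htmlPage.SaveAsPdf("report.pdf", pdfOptions);
}
```

To export the whole HTML as one page instead, set FullPage to true; the page size, orientation, margins, and page ranges are then ignored.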

GrapeCity.Documents.Pdf.HtmlToPdfFormat

The HtmlToPdfFormat class contains the formatting attributes for rendering HTML on a GcPdfGraphics using the DrawHtml extension methods (defined in the GcPdfGraphicsExt class). The HTML is drawn to a temporary PDF as a single page (if FullPage is true) or with the specified page size (MaxPageWidth, MaxPageHeight), Scale, and DefaultBackgroundColor. The temporary PDF is then loaded into a GcPdfDocument and trimmed to the actual size of the HTML content. The result is rendered on the GcPdfGraphics as a PDF Form XObject.

If the MaxPageWidth or MaxPageHeight property is not set explicitly, it is assumed to be equal to 200 inches. DefaultBackgroundColor is Color.White by default.

Other properties of HtmlToPdfFormat are mapped to the corresponding properties of the PageOptions/PdfOptions class:

HtmlToPdfFormat Property	PageOptions/PdfOptions Property
WindowSize	PageOptions.WindowSize
DefaultBackgroundColor	PageOptions.DefaultBackgroundColor
FullPage	PdfOptions.FullPage
DisplayBackgroundGraphics	PdfOptions.PrintBackground
Scale	PdfOptions.Scale
MaxPageWidth	PdfOptions.PageWidth
MaxPageHeight	PdfOptions.PageHeight

GcPdfGraphics Extension Methods

DsHtml provides four methods that extend GcPdfGraphics and allow you to render or measure an HTML text or page:

Draws an HTML text on this GcPdfGraphics at a specified position:
bool GcPdfGraphics.DrawHtml(GcHtmlBrowser browser, string html, float x, float y, HtmlToPdfFormat format, out SizeF size, bool loadLazyImages = false)
Draws an HTML page specified by a URI on this GcPdfGraphics at a specified position:
bool GcPdfGraphics.DrawHtml(GcHtmlBrowser browser, Uri htmlUri, float x, float y, HtmlToPdfFormat format, out SizeF size, bool loadLazyImages = false)
Measures an HTML text for this GcPdfGraphics:
SizeF GcPdfGraphics.MeasureHtml(GcHtmlBrowser browser, string html, HtmlToPdfFormat format, bool loadLazyImages = false)
Measures an HTML page specified by a URI for this GcPdfGraphics:
SizeF GcPdfGraphics.MeasureHtml(GcHtmlBrowser browser, Uri htmlUri, HtmlToPdfFormat format, bool loadLazyImages = false)
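
As a sketch of how the first DrawHtml overload might be used, the program below draws an HTML fragment one inch from the top-left corner of a new PDF page. The HTML content and file name are placeholders, Chrome is assumed to be installed, and the HtmlToPdfFormat constructor argument is assumed to correspond to the FullPage setting.

```csharp
using System;
using System.Drawing;
using GrapeCity.Documents.Html;
using GrapeCity.Documents.Pdf;

const string html = "<p style='font-family: sans-serif'>Hello from <b>HTML</b>.</p>";

var doc = new GcPdfDocument();
var g = doc.NewPage().Graphics;

using (var browser = new GcHtmlBrowser(BrowserFetcher.GetSystemChromePath()))
{
    // Assumption: the constructor argument sets FullPage; the width limit is in inches.
    var format = new HtmlToPdfFormat(false) { MaxPageWidth = 6.5f };

    // Draw the fragment at (72, 72) points and receive the size it occupied.
    bool ok = g.DrawHtml(browser, html, 72, 72, format, out SizeF size);
    Console.WriteLine(ok
        ? $"Rendered HTML, size in points: {size.Width} x {size.Height}"
        : "HTML rendering failed.");
}

doc.Save("html-in-pdf.pdf");
```

The returned size can be used to position whatever is drawn next, for example by placing following content at y = 72 + size.Height.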

Note: In DsImaging release version 6.0.0, the GcHtmlRenderer class was marked obsolete and replaced by the new GcHtmlBrowser class. This was done to avoid the GPL- or LGPL-licensed software that had to be used in the custom Chromium build. For tips about migrating from the obsolete GcHtmlRenderer class, see Tips to Migrate from Obsolete GcHtmlRenderer class.