Document Solutions for PDF
    Access Primitive and High-Level PDF Objects

    A PDF document consists of primitive and high-level PDF objects. At the lowest level, a PDF document can be interpreted as a graph of linked primitive PDF objects, where each object is one of nine primitive types defined in the PDF specification: array, boolean, dictionary, name, null, number, reference, stream, or string.

    All high-level PDF objects in the object model (such as Page, AnnotationBase, and Action) are implemented as wrappers around primitive PDF objects. A wrapper holds a reference to the underlying primitive PDF type (PdfDict, PdfArray, PdfDictObject, etc.) and provides methods and properties for accessing and manipulating that object. The root class for all high-level objects is PdfWrapperBase; it holds a reference to the underlying PDF primitive object, exposed as IPdfObject.
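
    The wrapper relationship described above can be pictured with a small, library-independent sketch. Every type below (ISketchObject, SketchString, SketchDict, SketchWrapperBase, SketchDocumentInfo) is a hypothetical stand-in invented for illustration; none of them is a DsPdf type, and the real classes expose a richer and possibly different surface.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are NOT DsPdf types.
interface ISketchObject { }

// A primitive string value.
sealed class SketchString : ISketchObject
{
    public string Value { get; }
    public SketchString(string value) => Value = value;
    public override string ToString() => Value;
}

// A primitive dictionary that maps names to other primitive objects.
sealed class SketchDict : ISketchObject
{
    public Dictionary<string, ISketchObject> Items { get; } =
        new Dictionary<string, ISketchObject>();
}

// Hypothetical wrapper root: it only holds a reference to the wrapped primitive.
abstract class SketchWrapperBase
{
    public ISketchObject Primitive { get; }
    protected SketchWrapperBase(ISketchObject primitive) => Primitive = primitive;
}

// Hypothetical high-level object: typed properties read and write the wrapped dictionary.
sealed class SketchDocumentInfo : SketchWrapperBase
{
    public SketchDocumentInfo(SketchDict dict) : base(dict) { }

    // Direct access to the underlying dictionary.
    public SketchDict Dict => (SketchDict)Primitive;

    public string Author
    {
        get => Dict.Items.TryGetValue("Author", out var v) ? v.ToString() : null;
        set => Dict.Items["Author"] = new SketchString(value);
    }
}

static class Program
{
    static void Main()
    {
        var info = new SketchDocumentInfo(new SketchDict());

        // High-level access through a typed property.
        info.Author = "Jane Doe";

        // Low-level access: add a custom key that the wrapper has no property for.
        info.Dict.Items["SourceModified"] = new SketchString("D:20240101");

        // Both entries live in the same underlying dictionary.
        foreach (var kvp in info.Dict.Items)
            Console.WriteLine($"{kvp.Key}: {kvp.Value}");
    }
}
```

    The point of the sketch is that the typed property and the raw dictionary are two views of the same data, so a value set through one is visible through the other.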

    DsPdf lets you work directly with the primitive objects that underlie every high-level entity in a PDF document (for example, DocumentInfo, which is a PDF dictionary). You do so through the following interfaces and classes, and their methods and properties, in the GrapeCity.Documents.Pdf.Spec namespace:

    Note: All the types and members mentioned below are intended for advanced users only. Readers should be familiar with the basics of the PDF specification, with direct and indirect PDF objects, and with how a PDF file is organized.
    Interface/Class | Description
    IPdfObject | The common interface supported by all PDF objects in a GcPdfDocument that are persisted in a PDF file. Its Indirect and ObjID properties let you identify indirect PDF objects and get the IDs of PDF objects.
    IPdfArray | The common interface implemented by the PdfArray, PdfArrayObject, and PdfArrayWrapper types.
    IPdfArrayExt | Contains extension methods for the IPdfArray interface.
    IPdfDict | The common interface implemented by the PdfDict, PdfDictObject, and PdfDictWrapper types.
    IPdfDictExt | Contains extension methods for the IPdfDict interface.
    IPdfName | The common interface for PdfName and PdfNameObject.
    IPdfNameExt | Contains extension methods for the IPdfName interface.
    IPdfNumber | The common interface for PdfNumber and PdfNumberObject.
    IPdfNumberExt | Contains extension methods for the IPdfNumber interface.
    IPdfRef | The common interface for PdfRef and PdfRefObject.
    IPdfRefExt | Contains extension methods for the IPdfRef interface.
    IPdfString | The common interface for PdfString and PdfStringObject.
    IPdfStringExt | Contains extension methods for the IPdfString interface.
    IPdfBool | The common interface for PdfBool and PdfBoolObject.
    IPdfBoolExt | Contains extension methods for the IPdfBool interface.
    IPdfNull | The common interface for PdfNull and PdfNullObject.
    IPdfNullExt | Contains extension methods for the IPdfNull interface.
    PdfArray | Represents a direct PDF array object.
    PdfArrayObject | Represents an indirect PDF array object.
    PdfArrayWrapper | Represents an array wrapper object.
    PdfDict | Represents a direct PDF dictionary object.
    PdfDictObject | Represents an indirect PDF dictionary object.
    PdfDictWrapper | Represents a dictionary wrapper object.
    PdfName | Represents a direct PDF name object. This class overrides the GetHashCode() and Equals(object) methods and defines the equality and inequality operators. This class is immutable.
    PdfNameObject | Represents an indirect PDF name object.
    PdfNumber | Represents a direct PDF number object. This class overrides the GetHashCode() and Equals(object) methods and defines the equality and inequality operators. This class is immutable.
    PdfNumberObject | Represents an indirect PDF number object.
    PdfStreamObjectBase | Represents a PDF stream. It is always an indirect object, because a stream cannot be a direct object in PDF.
    PdfRef | Represents a direct PDF reference object. This class overrides the GetHashCode() and Equals(object) methods. This class is immutable.
    PdfRefObject | Represents an indirect PDF reference object.
    PdfString | Represents a direct PDF string object. This class overrides the GetHashCode() and Equals(object) methods and defines the equality and inequality operators. This class is immutable.
    PdfStringObject | Represents an indirect PDF string object.
    PdfBool | Represents a direct PDF bool object. You cannot create instances of this class from user code; the two predefined instances are PdfBool.True and PdfBool.False. This class overrides the GetHashCode() and Equals(object) methods and defines the equality and inequality operators.
    PdfBoolObject | Represents an indirect PDF bool object.
    PdfNull | Represents a direct PDF null object. You cannot create instances of this class from user code; instead, use the predefined PdfNull.Instance instance. This class overrides the GetHashCode() and Equals(object) methods and defines the equality and inequality operators. This class is immutable.
    PdfNullObject | Represents an indirect PDF null object.

    Taking DocumentInfo as an example: the PDF specification defines the standard properties that can be present in this dictionary (Creator, Author, etc.), but PDF producers can add arbitrary custom properties, such as SourceModified, which appears in many real-world PDF files. Types from the GrapeCity.Documents.Pdf.Spec namespace let you read, write, and edit such custom elements.

    Because most high-level objects in a PDF file are PDF dictionaries, the corresponding objects in the DsPdf API derive from the PdfDictWrapper class, which in turn derives from PdfWrapperBase and uses IPdfDict as the underlying object. The GetPdfStream, GetPdfStreamInfo, and GetPdfStreamData methods of PdfWrapperBase retrieve data from the PDF stream associated with the PDF dictionary.

    Each high-level PDF object implements one of the primitive interfaces, depending on its type, so you can use the extension methods of the GrapeCity.Documents.Pdf.Spec namespace with these high-level objects.

    Refer to the following example code to get image properties from a PDF document:

    C#
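    using System;
    using System.Collections.Generic;
    using System.IO;
    using GrapeCity.Documents.Pdf;
    using GrapeCity.Documents.Pdf.Spec;

    // Open the source PDF; "sample.pdf" is a placeholder path.
    // Keep the stream open for as long as the document is in use.
    using FileStream fs = File.OpenRead("sample.pdf");

    // Initialize GcPdfDocument.
    GcPdfDocument doc = new GcPdfDocument();

    // Load PDF document.
    doc.Load(fs);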
    
    // Get image from the PDF document.
    var imgs = doc.GetImages();
    var pi = imgs[0].Image;
    
    // Write image ID.
    Console.WriteLine($"PdfImage object ID: {pi.ObjID}");
    
    /* The PdfImage is a descendant of PdfDictWrapper object and has a lot of methods
       that allow you to get properties and data from the underlying PDF stream object. */
    using (PdfStreamInfo psi = pi.GetPdfStreamInfo())
    {
        // Get image information such as stream length, filter name, filter decode parameters, etc.
        Console.WriteLine($"    Image stream length: {psi.Stream.Length}");
        Console.WriteLine($"        ImageFilterName: {psi.ImageFilterName}");
        Console.WriteLine($"ImageFilterDecodeParams: {psi.ImageFilterDecodeParams}");
        
        // Dump content of ImageFilterDecodeParams.
        foreach (var kvp in psi.ImageFilterDecodeParams.Dict)
        {
            Console.WriteLine($"{kvp.Key}: {kvp.Value}");
        }
        
        // Get value of BlackIs1.
        var blackIs1 = psi.ImageFilterDecodeParams.GetBool(PdfName.Std.BlackIs1, null);
        Console.WriteLine($"BlackIs1: {blackIs1}");
    }
                    
    // Dump properties of PdfImage dictionary.
    Console.WriteLine();
    Console.WriteLine("Properties of PdfImage dictionary:");
    foreach (KeyValuePair<PdfName, IPdfObject> kvp in pi.PdfDict.Dict)
    {
        Console.WriteLine($"{kvp.Key}: {kvp.Value}");
    }
                    
    // Get color space and bits per component.
    var cs = pi.Get<IPdfObject>(PdfName.Std.ColorSpace);
    // ColorSpace can be absent (for example, on image masks), so guard against null.
    Console.WriteLine($"ColorSpace: {cs?.GetType().Name} {cs}");
    var bpc = pi.Get<IPdfObject>(PdfName.Std.BitsPerComponent);
    Console.WriteLine($"BitsPerComponent: {bpc?.GetType().Name} {bpc}");