add document documentation

2025-08-20 06:38:07 +08:00 · 2019-10-09 12:10:45 +01:00 · 2019-10-09 12:10:45 +01:00 · 9306195038
commit 9306195038
parent bf4ed3a7e1
2 changed files with 111 additions and 2 deletions
--- a/Home.md
+++ b/Home.md
@@ -1,5 +1,3 @@
-# PdfPig Wiki #
-
 This wiki contains more detail on various aspects of the public API and the PDF document format.

 ## Getting Started ##
@@ -49,9 +47,12 @@ The resulting bytes are a valid PDF document and can be saved to the file system

 More details on the API can be found here.

+* [PdfDocument](https://github.com/UglyToad/PdfPig/wiki/PdfDocument)
 * [Letters](https://github.com/UglyToad/PdfPig/wiki/Letters)
 * [Document Creation](https://github.com/UglyToad/PdfPig/wiki/Document-Creation)

+Additional documentation generated automatically from doc-comments can be found on [DotNetApis](http://dotnetapis.com/pkg/PdfPig/0.0.9).
+
 ## Release Notes ##

 Release notes as well as downloadable packages can be found on the releases page [https://github.com/UglyToad/PdfPig/releases](https://github.com/UglyToad/PdfPig/releases).
--- a/PdfDocument.md
+++ b/PdfDocument.md
@@ -0,0 +1,108 @@
+
+Namespace - `UglyToad.PdfPig`
+
+The `PdfDocument` class is the root of the API for consuming document content.
+
+To create an instance of a `PdfDocument` you must first call `PdfDocument.Open`. There are 3 overloads for opening a document:
+
+    PdfDocument Open(byte[] fileBytes, ParsingOptions options = null);
+
+This opens a document from an array of bytes representing a PDF document.
+
+    PdfDocument Open(string filePath, ParsingOptions options = null);
+
+This opens a document from the filesystem at the provided path. This will load the entire file into memory at once. The alternative is to use the 3rd overload:
+
+    PdfDocument Open(Stream stream, ParsingOptions options = null);
+
+This opens a document from a stream of any kind, such as a `MemoryStream` or `FileStream`. If the stream is not buffered (e.g. a network stream), reading will be much slower. One workaround is to wrap the stream in a [BufferedStream](https://docs.microsoft.com/en-us/dotnet/api/system.io.bufferedstream?view=netframework-4.8), a framework class that adds buffering to another stream.
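+
+The buffering workaround might look like the following minimal sketch. `BufferedOpenExample`, `EnsureBuffered` and `CountPages` are hypothetical names introduced for illustration and are not part of PdfPig; only `PdfDocument.Open(Stream)` and `NumberOfPages` come from the library.
+
+```csharp
+using System.IO;
+using UglyToad.PdfPig;
+
+public static class BufferedOpenExample
+{
+    // Hypothetical helper: returns the stream unchanged if it is already
+    // in memory or already buffered, otherwise wraps it in a BufferedStream.
+    public static Stream EnsureBuffered(Stream stream)
+    {
+        if (stream is MemoryStream || stream is BufferedStream)
+        {
+            return stream;
+        }
+
+        return new BufferedStream(stream);
+    }
+
+    // Opens the document through the (possibly wrapped) stream and
+    // returns its page count. Managing the lifetime of `source` is
+    // left to the caller in this sketch.
+    public static int CountPages(Stream source)
+    {
+        using (PdfDocument document = PdfDocument.Open(EnsureBuffered(source)))
+        {
+            return document.NumberOfPages;
+        }
+    }
+}
+```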
+
+Any call to open should be wrapped in a `using` statement since `PdfDocument` implements `IDisposable`:
+
+    using (PdfDocument document = PdfDocument.Open(@"C:\docs\test.pdf"))
+    {
+    }
+
+## Parsing Options ##
+
+Parsing options control aspects of how the document is opened and allow the consumer to provide their own logger. The defaults should be sufficient unless the document is password protected, in which case the password must be provided in the `ParsingOptions.Password` property.
+
+`UseLenientParsing` controls how strictly the library interprets the PDF specification and how much error recovery it attempts where the document format is invalid or corrupt. The default is to attempt lenient parsing, but a stricter parsing mode can be enabled by passing the static `ParsingOptions.LenientParsingOff` instance.
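+
+As a rough sketch of both options in use, assuming `ParsingOptions.Password` has a public setter that can be used in an object initializer (the class and method names here are hypothetical):
+
+```csharp
+using UglyToad.PdfPig;
+
+public static class ParsingOptionsExample
+{
+    // Builds options for a password protected document.
+    // Assumption: Password is settable through an object initializer.
+    public static ParsingOptions BuildOptions(string password)
+    {
+        return new ParsingOptions { Password = password };
+    }
+
+    // Opens a password protected file and returns its page count.
+    public static int CountPagesInProtectedFile(string filePath, string password)
+    {
+        using (PdfDocument document = PdfDocument.Open(filePath, BuildOptions(password)))
+        {
+            return document.NumberOfPages;
+        }
+    }
+
+    // Opens a file with the stricter parsing mode described above.
+    public static int CountPagesStrictly(string filePath)
+    {
+        using (PdfDocument document = PdfDocument.Open(filePath, ParsingOptions.LenientParsingOff))
+        {
+            return document.NumberOfPages;
+        }
+    }
+}
+```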
+
+## Pages ##
+
+Once a `PdfDocument` has been obtained by calling `Open` the main use case is to inspect the pages that the document contains.
+
+The total number of pages in the document is provided by the `NumberOfPages` property:
+
+    int numberOfPages = document.NumberOfPages;
+
+Individual pages may then be opened using `GetPage`. This takes a 1-indexed page number as an argument:
+
+    using UglyToad.PdfPig.Content;
+    // ...
+    Page page1 = document.GetPage(1);
+    Page page2 = document.GetPage(2);
+    // etc.
+
+Calling `GetPage(i)` with a value of `i <= 0` is invalid.
+
+You can also enumerate all pages in a document in order using:
+
+    using System.Collections.Generic;
+    using UglyToad.PdfPig.Content;
+    // ...
+    IEnumerable<Page> pages = document.GetPages();
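+
+Because page numbers are 1-indexed, a manual loop runs from `1` to `NumberOfPages` inclusive. The sketch below illustrates this with hypothetical helper names; the upper bound in `IsValidPageNumber` is an assumption, since this page only states that values of `i <= 0` are invalid.
+
+```csharp
+using System;
+using UglyToad.PdfPig;
+using UglyToad.PdfPig.Content;
+
+public static class PageLoopExample
+{
+    // Hypothetical guard. The lower bound follows the rule above;
+    // the upper bound (pageNumber <= numberOfPages) is an assumption.
+    public static bool IsValidPageNumber(int pageNumber, int numberOfPages)
+    {
+        return pageNumber >= 1 && pageNumber <= numberOfPages;
+    }
+
+    // Calls `visit` once for every page, in order, using 1-indexed access.
+    public static void VisitPages(string filePath, Action<Page> visit)
+    {
+        using (PdfDocument document = PdfDocument.Open(filePath))
+        {
+            for (int i = 1; i <= document.NumberOfPages; i++)
+            {
+                visit(document.GetPage(i));
+            }
+        }
+    }
+}
+```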
+
+## XMP Metadata ##
+
+A PDF document can include general information about the document at the top level in the XML format defined by the Extensible Metadata Platform (XMP).
+
+If this optional XML data is present it may be obtained using the `TryGetXmpMetadata` method:
+
+    using System.Xml.Linq;
+    using UglyToad.PdfPig.Content;
+    // ...
+    if (document.TryGetXmpMetadata(out XmpMetadata metadata))
+    {
+        XDocument xmpDocument = metadata.GetXDocument();
+    }
+    else
+    {
+        // No XMP metadata was present.
+    }
+
+## Document Information ##
+
+In addition to XMP metadata, which allows for an extensible range of metadata, a PDF document may optionally contain an information dictionary. This defines a fixed set of fields such as author, title, etc.
+
+This can be accessed through the `Information` property:
+
+    using UglyToad.PdfPig.Content;
+    // ...
+    DocumentInformation information = document.Information;
+    string title = information.Title;
+    string author = information.Author;
+    // etc.
+
+Since all fields on the information dictionary are optional, they can be `null` and should be checked prior to access, e.g.:
+
+    DocumentInformation information = document.Information;
+    if (information.Author != null)
+    {
+        string upperAuthor = information.Author.ToUpper();
+    }
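+
+Where several fields are read together, one possible approach is a small fallback helper rather than repeated `null` checks. `InformationExample`, `OrFallback` and `Describe` are hypothetical names for this sketch, and the placeholder strings are arbitrary.
+
+```csharp
+using UglyToad.PdfPig;
+using UglyToad.PdfPig.Content;
+
+public static class InformationExample
+{
+    // Hypothetical helper: returns `fallback` when the field is
+    // null, empty or whitespace, otherwise the field value itself.
+    public static string OrFallback(string value, string fallback)
+    {
+        return string.IsNullOrWhiteSpace(value) ? fallback : value;
+    }
+
+    // Builds a one-line description from the optional Title and Author fields.
+    public static string Describe(PdfDocument document)
+    {
+        DocumentInformation information = document.Information;
+        string title = OrFallback(information.Title, "(untitled)");
+        string author = OrFallback(information.Author, "(unknown author)");
+        return title + " by " + author;
+    }
+}
+```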
+
+## Version ##
+
+There are multiple versions of the PDF specification, numbered `1.1`, `1.2`, `1.3`, etc. The version number of the current document can be retrieved with the `Version` property:
+
+    decimal version = document.Version;
+
+## IsEncrypted ##
+
+Documents can be encrypted using a number of different algorithms defined by the PDF specification. The `IsEncrypted` flag indicates whether a document is encrypted.
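+
+The `Version` and `IsEncrypted` properties could be combined into a short summary as sketched below. This assumes `IsEncrypted` is a `bool`; `VersionExample`, `IsAtLeast` and `Summarize` are hypothetical names.
+
+```csharp
+using System.Globalization;
+using UglyToad.PdfPig;
+
+public static class VersionExample
+{
+    // Hypothetical helper: compares a document version against a minimum.
+    public static bool IsAtLeast(decimal version, decimal minimum) => version >= minimum;
+
+    // Returns the version and encryption state as a single string.
+    // The invariant culture keeps the decimal separator as '.'.
+    public static string Summarize(PdfDocument document)
+    {
+        string version = document.Version.ToString(CultureInfo.InvariantCulture);
+        string encryption = document.IsEncrypted ? "encrypted" : "not encrypted";
+        return "PDF " + version + ", " + encryption;
+    }
+}
+```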
+
+## Structure ##
+
+The `Structure` property of a document provides access to the underlying PDF tokens that are used to construct the document.
+
+This is intended for advanced users and requires familiarity with the PDF specification.