Update README.md

2025-10-08 00:14:35 +08:00 · 2025-03-30 11:39:47 +01:00
parent 5fb36d452f
commit ede77c20f5
1 changed files with 21 additions and 23 deletions
--- a/README.md
+++ b/README.md
@@ -10,8 +10,6 @@ containing text and geometrical shapes.

 This project aims to port [PDFBox](https://github.com/apache/pdfbox) to C#.

-**Migrating to 0.1.6 from 0.1.x?** Use this guide: [migration to 0.1.6](https://github.com/UglyToad/PdfPig/wiki/Migration-to-0.1.6).
-
 ## Wiki
 Check out our [wiki](https://github.com/UglyToad/PdfPig/wiki) for more examples and detailed guides on the API.

@@ -55,7 +53,7 @@ An example of the output of this is shown below:

 Where for the PDF text ("Write something in") shown at the top the 3 words (in pink) are detected and each word contains the individual letters with glyph bounding boxes.

-### Ceate PDF Document
+### Create PDF Document
 To create documents use the class `PdfDocumentBuilder`. The Standard 14 fonts provide a quick way to get started:

 ```cs
@@ -77,10 +75,10 @@ The output is a 1 page PDF document with the text "Hello World!" in Helvetica ne

 ![Image shows a PDF document in Google Chrome's PDF viewer. The text "Hello World!" is visible](https://raw.githubusercontent.com/UglyToad/Pdf/master/documentation/builder-output.png)

-Each font must be registered with the PdfDocumentBuilder prior to use enable pages to share the font resources. Only Standard 14 fonts and TrueType fonts (.ttf) are supported.
+Each font must be registered with the `PdfDocumentBuilder` prior to use, to enable pages to share the font resources. Only Standard 14 fonts and TrueType fonts (.ttf) are supported.

 ### Advanced Document Extraction
-In this example a more advanced document extraction is performed. PdfDocumentBuilder is used to create a copy of the pdf with debug information (bounding boxes and reading order) added.
+In this example a more advanced document extraction is performed. `PdfDocumentBuilder` is used to create a copy of the PDF with debug information (bounding boxes and reading order) added.


 ```cs
@@ -183,7 +181,7 @@ The document contains the version of the PDF specification it complies with, acc

    decimal version = document.Version;

-### Document Creation (0.0.5)
+### Document Creation

 The `PdfDocumentBuilder` creates a new document with no pages or content.

@@ -256,7 +254,7 @@ string title = document.Information.Title;
 // etc...
 ```

-### Document Structure (0.0.3)
+### Document Structure

 The document now has a Structure member:

@@ -286,21 +284,21 @@ bool isA4 = size == PageSize.A4;

    string text = page.Text;

-There is a new (0.0.3) method which provides access to the words. This uses basic heuristics and is not reliable or well-tested:
+There is a method which provides access to the words. The default method uses basic heuristics. For advanced cases, you can also implement your own `IWordExtractor` or use the `NearestNeighbourWordExtractor`:

    IEnumerable<Word> words = page.GetWords();

-You can also (0.0.6) access the raw operations used in the page's content stream for drawing graphics and content on the page:
+You can also access the raw operations used in the page's content stream for drawing graphics and content on the page:

    IReadOnlyList<IGraphicsStateOperation> operations = page.Operations;

 Consult the PDF specification for the meaning of individual operators.

-There is also an early access (0.0.3) API for retrieving the raw bytes of PDF image objects per page:
+There is also an API for retrieving the PDF image objects per page:

-    IEnumerable<XObjectImage> images = page.ExperimentalAccess.GetRawImages();
+    IEnumerable<XObjectImage> images = page.GetImages();

-This API will be changed in future releases.
+Please read the [wiki on Images](https://github.com/UglyToad/PdfPig/wiki/Images).

 ### Letter

@@ -322,15 +320,15 @@ These letters contain:

 Letter position is measured in PDF coordinates where the origin is the lower left corner of the page. Therefore a higher Y value means closer to the top of the page.

-### Annotations (0.0.5)
+### Annotations

-Early support for retrieving annotations on each page is provided using the method:
+Annotations on each page can be retrieved using the method:

-    page.ExperimentalAccess.GetAnnotations()
+    page.GetAnnotations()

-This call is not cached and the document must not have been disposed prior to use. The annotations API may change in future.
+This call is not cached and the document must not have been disposed prior to use.

-### Bookmarks (0.0.10)
+### Bookmarks

 The bookmarks (outlines) of a document may be retrieved at the document level:

@@ -338,7 +336,7 @@ The bookmarks (outlines) of a document may be retrieved at the document level:

 This will return `false` if the document does not define any bookmarks.

-### Forms (0.0.10)
+### Forms

 Form fields for interactive forms (AcroForms) can be retrieved using:

@@ -350,15 +348,15 @@ The fields can be accessed using the `AcroForm`'s `Fields` property. Since the f

 Please note the forms are readonly and values cannot be changed or added using PdfPig.

-### Hyperlinks (0.1.0)
+### Hyperlinks

 A page has a method to extract hyperlinks (annotations of link type):

    IReadOnlyList<UglyToad.PdfPig.Content.Hyperlink> hyperlinks = page.GetHyperlinks();

-### TrueType (0.1.0)
+### TrueType

-The classes used to work with TrueType fonts in the PDF file are now available for public consumption. Given an input file:
+The classes used to work with TrueType fonts in the PDF file are available for public consumption. Given an input file:


 ```cs
@@ -372,7 +370,7 @@ TrueTypeFont font = TrueTypeFontParser.Parse(input);

 The parsed font can then be inspected.

-### Embedded Files (0.1.0)
+### Embedded Files

 PDF files may contain other files entirely embedded inside them for document annotations. The list of embedded files and their byte content may be accessed:

@@ -386,7 +384,7 @@ if (document.Advanced.TryGetEmbeddedFiles(out IReadOnlyList<EmbeddedFile> files)
 }
 ```

-### Merging (0.1.2)
+### Merging

 You can merge 2 or more existing PDF files using the `PdfMerger` class: