Updated Home (markdown)

changetocoding
2023-10-04 15:11:03 +01:00
parent f7c07955fb
commit 5dd8c93da9

16
Home.md

@@ -1,5 +1,21 @@
This wiki contains more detail on various aspects of the public API and the PDF document format.
## Features ##
- Extracts the position and size of letters from any PDF document. This enables access to the text and words in a PDF document.
- Allows the user to retrieve images from the PDF document.
- Allows the user to read PDF annotations, PDF forms, embedded documents and hyperlinks from a PDF.
- Provides access to metadata in the document.
- Exposes the internal structure of the PDF document.
- Creates PDF documents containing text and path operations.
- Reads content from encrypted files when the password is provided.
- Document Layout Analysis - PdfPig also includes tools for document layout analysis such as the Recursive XY Cut, Document Spectrum and Nearest Neighbour algorithms, among others. It also supports exporting page contents to the ALTO, PAGE XML and hOCR formats. See [Document Layout Analysis](https://github.com/UglyToad/PdfPig/wiki/Document-Layout-Analysis).
- Tables are supported through [Tabula Sharp](https://github.com/BobLd/tabula-sharp).
For some use-cases, PdfPig provides an alternative to commercial libraries such as [SpirePDF](https://www.e-iceblue.com/Introduce/pdf-for-net-introduce.html) and to copyleft libraries such as [iText 7](https://github.com/itext/itext7-dotnet) (AGPL).
The library does not support converting HTML or other document formats to PDF. For HTML to PDF, one good-quality option is [wkhtmltopdf](https://wkhtmltopdf.org/). It also does not currently support generating images from PDF pages. If you need this functionality, see if [docnet](https://github.com/GowenGit/docnet) meets your requirements.
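
As a rough illustration of the text extraction features listed above, the following is a minimal sketch that opens a document and prints each word with the position of its bounding box. It assumes the PdfPig NuGet package is installed; the default file name `document.pdf` is a hypothetical placeholder, not something the library provides.

```csharp
using System;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

public static class Program
{
    public static void Main(string[] args)
    {
        // Hypothetical input path: pass a real PDF file as the first argument.
        string path = args.Length > 0 ? args[0] : "document.pdf";

        using (PdfDocument document = PdfDocument.Open(path))
        {
            foreach (Page page in document.GetPages())
            {
                // Letters carry the position and size information for the page text.
                Console.WriteLine($"Page {page.Number}: {page.Letters.Count} letters");

                // Words are grouped from the letters by the default word extractor.
                foreach (Word word in page.GetWords())
                {
                    Console.WriteLine(
                        $"{word.Text} (left: {word.BoundingBox.Left}, bottom: {word.BoundingBox.Bottom})");
                }
            }
        }
    }
}
```

Coordinates are reported in PDF page units, with the origin at the bottom-left corner of the page.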
## Getting Started ##
PdfPig aims to provide two main areas of functionality: