# UglyToad.Pdf #

[![Build status](https://ci.appveyor.com/api/projects/status/ni7et2j2ml60pdi3?svg=true)](https://ci.appveyor.com/project/EliotJones/pdf)
[![codecov](https://codecov.io/gh/UglyToad/Pdf/branch/master/graph/badge.svg)](https://codecov.io/gh/UglyToad/Pdf)

The aim of this project is to port the [PdfBox](https://github.com/apache/pdfbox) code to C# in order to provide a permissively licensed (i.e. no copyleft) open source solution for inspecting PDF documents. The project uses the Apache 2.0 licence.

## Status ##

There is a lot left to do for this project. The initial minimum viable product, when released to alpha, will provide:

+ Page counts and sizes (in points) for a document.
+ Access to the text contents of each page. Note that since PDF has no concept of a "word", it will be up to the consumer of the text to work out where the words are within the text.
+ (Possibly) the locations and bounds of each letter on the page.
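
Because word detection is left to the consumer, a caller might start with something like the sketch below. It assumes the extracted text contains whitespace between words, which is not guaranteed for every PDF. `TextSplitter` and `SplitWords` are hypothetical names for illustration and are not part of this library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of UglyToad.Pdf.
public static class TextSplitter
{
    // Splits raw page text on runs of whitespace and drops empty entries.
    public static IReadOnlyList<string> SplitWords(string pageText)
    {
        if (string.IsNullOrEmpty(pageText))
        {
            return Array.Empty<string>();
        }

        // A null separator array tells Split to use whitespace as the delimiter.
        return pageText.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

A consumer would pass the page text to `TextSplitter.SplitWords`. Text whose letters are positioned individually, with no space characters, would need a position-based approach instead.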

For the initial alpha release, all files will be read fully into memory rather than streamed, so large files will not be supported.

Eventually the library should support all existing PdfBox operations, such as accessing graphical elements and form elements, as well as creating PDF documents.

## Usage ##

The initial public API will be as limited as possible to allow extensive refactoring to take place. The proposed usage is as follows:

    using (PdfDocument document = PdfDocument.Open(@"C:\my-file.pdf"))
    {
        int pageCount = document.NumberOfPages;

        Page page = document.GetPage(1);

        decimal widthInPoints = page.Width;
        decimal heightInPoints = page.Height;

        string text = page.Text;
    }
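
Since page sizes are reported in points, a caller may want to convert them to physical units. The sketch below assumes the default PDF user space unit of 1/72 inch. `PointConverter` is a hypothetical helper and not part of this library.

```csharp
// Hypothetical helper, not part of UglyToad.Pdf.
public static class PointConverter
{
    // Assumption: the default PDF user space unit, 72 points per inch.
    private const decimal PointsPerInch = 72m;
    private const decimal MillimetresPerInch = 25.4m;

    public static decimal ToInches(decimal points)
    {
        return points / PointsPerInch;
    }

    public static decimal ToMillimetres(decimal points)
    {
        return points / PointsPerInch * MillimetresPerInch;
    }
}
```

For example, a page width of 612 points corresponds to `PointConverter.ToInches(612m)`, which is 8.5 inches.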

`PdfDocument` will also support opening from byte arrays, and eventually from streams:

    byte[] fileBytes = File.ReadAllBytes(@"C:\my-file.pdf");
    using (PdfDocument document = PdfDocument.Open(fileBytes))
    {
        int numberOfPages = document.NumberOfPages;
    }
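
Until stream support is available, a caller holding a `Stream` could buffer it into a byte array first. This is a minimal sketch using only standard .NET APIs. `StreamBuffer.ReadFully` is a hypothetical helper, not part of this library, and it loads the whole stream into memory, so the same large-file limitation applies.

```csharp
using System.IO;

// Hypothetical helper, not part of UglyToad.Pdf.
public static class StreamBuffer
{
    // Copies the remaining contents of the input stream into a new byte array.
    public static byte[] ReadFully(Stream input)
    {
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```

The resulting array could then be passed to `PdfDocument.Open` in the same way as `fileBytes` above.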