support both xobject and inline images. adds entries for unsupported filters so that an exception is only thrown when the lazily evaluated image.bytes property is accessed, rather than when the page is opened.
treat all warnings as errors.
- Create a TextBlock class
- Create IPageSegmenter
- Add other useful distances: angle, etc.
- Update RecursiveXYCut
  - With IPageSegmenter and TextBlock
  - Make XYNode and XYLeaf internal
- Optimise NearestNeighbourWordExtractor for speed and isolate the clustering algorithms so they can be used outside of this class
- Implement a Docstrum-inspired page segmentation algorithm
Text edges are where words have either their BoundingBox's left, right or mid coordinate aligned on the same vertical line.
Useful to detect tables, justified text, lists, etc.
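A minimal sketch of the idea, not the library's implementation: words are grouped by their rounded left, right and mid x coordinate, and a group counts as an edge once it has enough words. The function name `find_text_edges`, the `(left, bottom, right, top)` tuple layout and the `min_words` and `precision` parameters are all assumptions made for illustration. The sketch also ignores whether the aligned words are vertically adjacent.

```python
from collections import defaultdict


def find_text_edges(boxes, min_words=3, precision=0):
    """Group word boxes by aligned x coordinate.

    boxes: iterable of (left, bottom, right, top) tuples (assumed layout).
    Returns {"left": {...}, "right": {...}, "mid": {...}} where each inner
    dict maps a rounded x coordinate to the boxes aligned on it.
    """
    extractors = {
        "left": lambda b: b[0],
        "right": lambda b: b[2],
        "mid": lambda b: (b[0] + b[2]) / 2,
    }
    edges = {}
    for name, get_x in extractors.items():
        groups = defaultdict(list)
        for box in boxes:
            # Rounding acts as a crude alignment tolerance.
            groups[round(get_x(box), precision)].append(box)
        # Keep only coordinates shared by enough words to form an edge.
        edges[name] = {x: g for x, g in groups.items() if len(g) >= min_words}
    return edges


words = [(10, 0, 30, 5), (10, 10, 40, 15), (10, 20, 25, 25), (50, 0, 70, 5)]
edges = find_text_edges(words)
```

Here three of the four words share a left coordinate of 10, so only a left edge is reported.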
an inline image in a pdf content stream starts with the BI operator, then ID declares the start of the image data and EI marks the end. attempting to parse the bytes after the ID operator as ordinary content stream tokens resulted in errors. this change adds special case handling for inline images.
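A rough sketch of the special casing, assuming a simplified scanner rather than the actual parser: after the image data operator, the bytes are treated as opaque until an end marker delimited by whitespace is found. The function name `extract_inline_image` and its return shape are hypothetical. Real image data can contain a whitespace-delimited end marker by chance, and the naive search for the start operators can match inside other tokens, so a production parser needs further checks.

```python
# PDF whitespace characters: NUL, tab, line feed, form feed, carriage return, space.
WHITESPACE = b"\x00\t\n\x0c\r "


def extract_inline_image(stream: bytes, start: int = 0):
    """Return (header_bytes, image_bytes, end_offset) or None if no image is found."""
    bi = stream.find(b"BI", start)
    if bi < 0:
        return None
    id_pos = stream.find(b"ID", bi + 2)
    if id_pos < 0:
        return None
    # Image data begins after "ID" and the single whitespace byte that follows it.
    data_start = id_pos + 3
    pos = data_start
    while True:
        ei = stream.find(b"EI", pos)
        if ei < 0:
            return None
        before_ok = ei > data_start and stream[ei - 1] in WHITESPACE
        after_ok = ei + 2 == len(stream) or stream[ei + 2] in WHITESPACE
        if before_ok and after_ok:
            header = stream[bi + 2:id_pos].strip()
            # Exclude the whitespace byte that separates the data from "EI".
            return header, stream[data_start:ei - 1], ei + 2
        # "EI" occurred inside the binary data; keep scanning.
        pos = ei + 1


content = b"q BI /W 4 /H 1 ID aEIb EI Q"
header, data, end = extract_inline_image(content)
```

The first "EI" in the example sits inside the image bytes and is skipped because it is not surrounded by whitespace.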
* trial azure pipelines
[skip ci]
* use vs2017
* build pr commits
* include codecov and update test nuget
* add codecov call
* add publish test results step
* include coverlet package for test coverage and allow coverlet dynamic public types
* add azure pipelines badge and remove appveyor badge
* add nuget pack step
* use build configuration variable for nuget pack and move after build
* fix path to package to pack
* change nuget to dotnet pack
* remove old codecov related tools
- begin adding support for extended graphics state (the 'gs' operator) including setting the font #39.
- apply page level rotation to the glyph bounding box and width to get correct glyph sizes #41.
- wrap page rotation in a value type to ensure the value is restricted to right angle rotations and provide convenience members #42.
- fix bug where system font finder never worked for truetype fonts because it began reading the file from the wrong offset.
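For illustration, a value type restricted to right angle rotations might look like the sketch below. The class name `PageRotation` and its members `swaps_height_and_width` and `rotate_size` are assumptions, not the library's API; the point is only that validation happens once at construction and that convenience members can then rely on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRotation:
    """Immutable rotation in degrees, restricted to multiples of 90."""

    degrees: int

    def __post_init__(self):
        if self.degrees % 90 != 0:
            raise ValueError(f"rotation must be a multiple of 90, got {self.degrees}")
        # Normalise to 0, 90, 180 or 270 (frozen dataclass, so bypass __setattr__).
        object.__setattr__(self, "degrees", self.degrees % 360)

    @property
    def swaps_height_and_width(self) -> bool:
        return self.degrees in (90, 270)

    def rotate_size(self, width: float, height: float):
        """Return (width, height) as seen after applying the rotation."""
        if self.swaps_height_and_width:
            return height, width
        return width, height


rotation = PageRotation(450)
glyph_size = rotation.rotate_size(10, 20)
```

In this sketch a rotation of 450 is normalised to 90, so a 10 by 20 glyph box is reported as 20 by 10.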
- In the Letter class:
- Renaming 'Location' to 'StartBaseLine' and adding 'EndBaseLine' for better localisation of the letter ('Location' is also kept).
- Adding TextDirection.
- Fixed Test
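One way a text direction could be derived from the two baseline points is sketched below. This is an assumption about the approach, not the committed code: the enum members, the function name `text_direction`, the `(x, y)` tuple layout and the tolerance are all hypothetical.

```python
from enum import Enum


class TextDirection(Enum):
    LEFT_TO_RIGHT = "left_to_right"
    RIGHT_TO_LEFT = "right_to_left"
    BOTTOM_TO_TOP = "bottom_to_top"
    TOP_TO_BOTTOM = "top_to_bottom"
    OTHER = "other"


def text_direction(start_base_line, end_base_line, tolerance=1e-6):
    """Classify the direction from the baseline start point to its end point."""
    dx = end_base_line[0] - start_base_line[0]
    dy = end_base_line[1] - start_base_line[1]
    if abs(dy) <= tolerance:
        if dx > tolerance:
            return TextDirection.LEFT_TO_RIGHT
        if dx < -tolerance:
            return TextDirection.RIGHT_TO_LEFT
    if abs(dx) <= tolerance:
        if dy > tolerance:
            return TextDirection.BOTTOM_TO_TOP
        if dy < -tolerance:
            return TextDirection.TOP_TO_BOTTOM
    # Skewed baselines and zero-length baselines fall through to OTHER.
    return TextDirection.OTHER
```

Keeping both baseline points makes this classification possible, which a single 'Location' point cannot provide.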
temporary 'safe' untested implementation of seac for type 1 charstrings.
make structure public
bump version of package and project to 0.0.3 (it had accidentally increased to 0.0.5)