Commit Graph

77 Commits

Author SHA1 Message Date
BobLd
b462c3bea4 update PublicApiScannerTests 2020-04-12 10:15:12 +01:00
BobLd
064fa4922a make Clipping internal
do not throw errors when CurrentPath is null
modify tests to match
2020-04-05 17:58:57 +01:00
Eliot Jones
5f45ee53bd #127 add basic pdf/a-1b level compliance to the document builder
adds color profiles/output intents and an xmp metadata stream to the document in order to be compliant with pdf/a-1b (basic). this compliance level is toggled on the builder since it will generate larger files and is set to 'off/none' by default. pdf/a documents are also not able to use standard fonts so using a standard font when the compliance level is not none will throw.
2020-03-29 16:43:52 +01:00
Eliot Jones
2193063809 fix tests for public api and merge conflict
the cross reference parser tests behaviour had changed, this fixes a compilation error from merge conflicts. also updates the merger tests to account for new version behaviour and checks the output document text. adds pdfmerger to the public api in the tests.
2020-03-02 17:00:16 +00:00
Eliot Jones
43574097f1 rename marked content elements and use factory
since the properties in marked content may be indirect references or belong to the page resources array, the value should be calculated during content processing. this change tidies up the marked content classes so they do not expose mutable data and uses the pdf token scanner overloads to load dictionary data.
2020-01-09 15:30:16 +00:00
BobLd
097692f1cb Move ArtifactType inside PdfArtifactMarkedContent 2020-01-09 11:24:32 +00:00
BobLd
7be36fdc58 Update PublicApiScannerTests 2 2020-01-08 11:07:27 +00:00
BobLd
4b929482cc Update PublicApiScannerTests 2020-01-08 10:46:49 +00:00
Eliot Jones
00bd285262 add support for quadpoints to annotations
highlight, link, strikeout, squiggly and underline annotation types may define a set of quadrilaterals using the quadpoints entry. this defines the regions to show/activate the annotation. the order of points in the quadpoints array does not match the specification so we provide a convenience class to access the point data rather than interpreting it as a rectangle: https://stackoverflow.com/questions/9855814/pdf-spec-vs-acrobat-creation-quadpoints.
2020-01-05 16:23:07 +00:00
Eliot Jones
b29354e3e6 move compact font format fonts to fonts project 2020-01-05 12:08:01 +00:00
Eliot Jones
d09b33af4d move tokens to new project 2020-01-05 10:07:01 +00:00
Eliot Jones
1c38a2ae8a move pdfline to the core project 2020-01-05 09:33:59 +00:00
Eliot Jones
15525acbaa move document layout analysis and export to new project 2020-01-05 09:19:58 +00:00
Eliot Jones
a6541f1cfc fix test references
update references for unit tests to reference new core and fonts projects. all tests except the public api scanner tests now run successfully.
2020-01-04 22:56:41 +00:00
Eliot Jones
cf1b8651d6 make adler32checksum public
there's no reason to keep adler32checksum internal so it is made public in case people find it useful.
2020-01-04 10:27:07 +00:00
Eliot Jones
b15a3a9b57 tidy up truetype tables
* improves the naming of truetype related classes.
* uses correct numeric type for the loca table.
* makes a few related classes public.
2020-01-04 10:27:07 +00:00
BobLd
07f51712c6 Update PublicApiScannerTests 2020-01-03 12:31:23 +00:00
BobLd
3a060d9769 Update PublicApiScannerTests 2019-12-28 14:43:09 +00:00
Eliot Jones
e984180b3d add method to retrieve any embedded files 2019-12-21 16:16:36 +00:00
Eliot Jones
7296c3c125 merge pull request #105 from BobLd/master
whitespace covering algorithm and #104
2019-12-20 11:57:31 +00:00
BobLd
6dba5bb2b4 update PublicApiScannerTests 2019-12-18 11:43:39 +00:00
Eliot Jones
1fb416eee3 add convenience method to retrieve all hyperlinks and their text from annotations on a page 2019-12-18 11:41:02 +00:00
BobLd
1656411fcb Improving Geometry classes with Tests 2019-12-14 11:41:11 +00:00
Eliot Jones
75a6260501 make cropbox public 2019-12-06 17:34:51 +00:00
Eliot Jones
2e5c995322 make external nodes different to document nodes and finish reimplementation 2019-12-05 13:21:19 +00:00
Eliot Jones
928347bcce merge pull request #84 from BobLd/master
add basic bookmarks extraction capabilities.
2019-12-04 14:24:10 +00:00
Eliot Jones
80f024dbed make form access public 2019-11-27 16:36:25 +00:00
Eliot Jones
910e22a4e9 wrap checkboxes and radiobuttons in their own form field types with access to the child collections 2019-11-26 16:33:24 +00:00
BobLd
89daa2818e update PublicApiScannerTests 2019-11-04 15:17:25 +00:00
BobLd
99f260befb Enhancing NearestNeighbourWordExtractor
- Making the code easier to read
- Using 20% of Width instead of 60%
- Making DefaultWordExtractor public
2019-10-21 20:51:27 +01:00
Eliot Jones
57dfee3211 move alto xml exporter to root export namespace 2019-10-17 10:46:43 +01:00
Eliot Jones
f14c52a05a fix tests for renaming and validating generate alto xml 2019-10-15 13:59:09 +01:00
BobLd
e76badaeaf Update PublicApiScannerTests with new public classes 2019-10-11 08:57:16 +01:00
BobLd
fe1a3c4b8b updated from comments
- still need to look at XmlWriter
2019-10-10 12:29:28 +01:00
Eliot Jones
2ef45f71d5 make missing acroform types public and start improving data
also changes pages to use a proper tree structure since this will be required for resource inheritance and for acroform widget dictionaries.
2019-10-09 14:28:37 +01:00
BobLd
9ab943e1f9 Merge branch 'master' of https://github.com/UglyToad/PdfPig 2019-10-08 14:16:59 +01:00
Eliot Jones
68bcaf3901 #55 move support for images to page and add inline images
support both xobject and inline images. adds unsupported filters so that exceptions are only thrown when accessing the lazily evaluated image.bytes property rather than when opening the page.

treat all warnings as errors.
2019-10-08 14:04:36 +01:00
BobLd
d939be1b9c update PublicApiScannerTests 2 2019-10-07 16:09:30 +01:00
BobLd
f4f2b0e3fd update PublicApiScannerTests 2019-10-07 16:02:11 +01:00
BobLd
93313118e9 Support for hOCR, AltoXml and PageXml output formats
Tested with:
- 'hocrjs' for hOCR (see https://unpkg.com/hocrjs)
- 'PAGE Viewer' for hOCR, AltoXml and PageXml (see http://www.primaresearch.org/tools/PAGEViewer)
2019-10-07 15:19:30 +01:00
Eliot Jones
f5e025aa70 merge pull request #58 from uglytoad/colors
adds colors to letters and prepares code to add colors to paths.
2019-08-13 20:50:06 +01:00
Eliot Jones
f55091f3d2 make color types public and add stream based tests to prevent future breaking as observed in #52 2019-08-13 20:48:22 +01:00
Eliot Jones
980e67fabe Merge pull request #56 from BobLd/master
Document Layout Analysis - IPageSegmenter, Docstrum
2019-08-11 14:04:39 +01:00
Eliot Jones
0349bedd3e #57 add access to document metadata and expose wrapper type 2019-08-11 12:42:30 +01:00
BobLd
c14d77e414 PublicApiScannerTests updated 2019-08-10 16:36:50 +01:00
BobLd
eb9a9fd00e Document Layout Analysis - IPageSegmenter, Docstrum
- Create a TextBlock class
- Creates IPageSegmenter
- Add other useful distances: angle, etc.
- Update RecursiveXYCut
 - With IPageSegmenter and TextBlock
 - Make XYNode and XYLeaf internal
- Optimise (faster) NearestNeighbourWordExtractor and isolate the clustering algorithms for use outside of this class
- Implement a Docstrum inspired page segmentation algorithm
2019-08-10 16:01:27 +01:00
BobLd
801ea3ba7f Modified PublicApiScannerTests 2019-08-07 14:22:39 +01:00
BobLd
83889cfb52 Document Layout Analysis - Text edges extractor
Text edges are where words have either their BoundingBox's left, right or mid coordinate aligned on the same vertical line.
Useful to detect tables, justified text, lists, etc.
2019-08-06 15:24:16 +01:00
Eliot Jones
0b9ae1db13 add color information to the operation context. create color classes for letters and paths to use 2019-08-04 16:47:47 +01:00
Eliot Jones
1d551d6de3 add and document core classes for colorspace information 2019-08-04 12:57:06 +01:00