Commit Graph

837 Commits

Author SHA1 Message Date
Eliot Jones
f1be6634a7 add a bunch more performance improvements
filter provider becomes single instance and no longer has constructor parameters.

tokenizers use list and stringbuilder pools to reduce allocations.

system font finder becomes static to preserve file cache across all documents.
2020-04-05 15:34:47 +01:00
Eliot Jones
7baa18b5dd add stringbuilder pool for tokenizers
we could replace these with spans in the next net core however for now our pools seem to increase performance by reducing gc load.
2020-04-04 18:31:55 +01:00
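The pooling idea in commit 7baa18b5dd can be sketched roughly as follows. This is an illustrative Python sketch, not the project's code: `BufferPool`, `lease`, and `read_name_token` are hypothetical names, and `io.StringIO` stands in for a string builder.

```python
import io
from contextlib import contextmanager


class BufferPool:
    """Hypothetical pool of reusable text buffers (illustration only)."""

    def __init__(self, max_size=8):
        self._free = []
        self._max_size = max_size
        self.created = 0  # number of buffers actually allocated

    def borrow(self):
        # Reuse an idle buffer when one exists, otherwise allocate a new one.
        if self._free:
            return self._free.pop()
        self.created += 1
        return io.StringIO()

    def give_back(self, buf):
        # Clear the buffer before reuse and cap how many idle buffers we keep.
        buf.seek(0)
        buf.truncate(0)
        if len(self._free) < self._max_size:
            self._free.append(buf)

    @contextmanager
    def lease(self):
        buf = self.borrow()
        try:
            yield buf
        finally:
            self.give_back(buf)


def read_name_token(pool, chars):
    # Build the token text in a pooled buffer instead of a fresh one.
    with pool.lease() as buf:
        for ch in chars:
            buf.write(ch)
        return buf.getvalue()
```

Tokenizing many tokens in sequence then touches a single buffer, which is the kind of allocation saving the commit message attributes to reduced gc load.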
Eliot Jones
cf46230c05 #127 add pdf/a2-b compliance to the builder 2020-04-04 17:49:27 +01:00
Eliot Jones
729234477a fix issue with null encodings for cid fonts 2020-04-04 17:30:06 +01:00
Eliot Jones
9abe9f4b2f #158 add strong naming to the solution 2020-04-04 16:59:51 +01:00
Eliot Jones
7f1bf094bc #127 pdf/a-1a compliance
adds struct tree and markinfo dictionaries to support pdf/a-1a compliance.
2020-03-29 17:55:02 +01:00
Eliot Jones
5f45ee53bd #127 add basic pdf/a-1b level compliance to the document builder
adds color profiles/output intents and an xmp metadata stream to the document in order to be compliant with pdf/a-1b (basic). this compliance level is toggled on the builder since it will generate larger files and set to 'off/none' by default. pdf/a documents are also not able to use standard fonts so using a font when the compliance level is not none will throw.
2020-03-29 16:43:52 +01:00
BobLd
7d52bc8be4 make Distances.FindIndexNearest public
add tests for Distances.FindIndexNearest
2020-03-24 19:52:01 +00:00
BobLd
8460b8bc1d Add test for Mode(float) 2020-03-24 19:52:01 +00:00
BobLd
0d786a1265 Add tests for KdTree, MathExtensions and Distances
Add reference to DLA project
Make KdTree public
Fix mode computation for multimodal
2020-03-24 19:52:01 +00:00
BobLd
5f0ddf131e make OrientedBoundingBox public 2020-03-24 19:52:01 +00:00
Eliot Jones
4ed1600cab version 0.1.1 0.1.1 2020-03-18 20:10:51 +00:00
Eliot Jones
0f91017613 fix issue with newlines in object start tokens #88
where we brute force the file and it contains newlines between object tokens we fix the parsing to prevent pseudo-infinite loops.
2020-03-17 20:09:47 +00:00
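One way to make a brute-force object scan tolerate newlines between the object number, generation number, and `obj` keyword is sketched below. This is an assumption-laden Python illustration using a regular expression, not the approach taken in commit 0f91017613; `find_object_offsets` is a hypothetical helper.

```python
import re

# Object number, generation number and "obj", separated by any ASCII whitespace
# (spaces or newlines). The lookbehind stops a match starting mid-number.
OBJ_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def find_object_offsets(data: bytes):
    """Map (object number, generation) to the byte offset of its header."""
    offsets = {}
    # finditer always advances past each match, so the scan cannot loop forever.
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        offsets[key] = match.start()  # a later definition replaces an earlier one
    return offsets
```

A real scanner would also need to skip stream contents and handle the other PDF whitespace characters, which this sketch ignores.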
Eliot Jones
5094f9d9d0 remove debugging code
this code was used to check content streams but should not be committed.
2020-03-16 19:32:57 +00:00
Eliot Jones
98bcc16e11 fix width and height order in jpeg parsing
height is before width, incorrect order caused adobe reader to draw image strangely.
2020-03-16 19:32:57 +00:00
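The field order behind commit 98bcc16e11 can be illustrated with a small reader for the JPEG start-of-frame segment, where the 2-byte height precedes the 2-byte width. This is a simplified Python sketch, not the project's parser: `jpeg_size` is a hypothetical helper and it assumes every segment before the frame header carries a length field.

```python
import struct

# Start-of-frame markers, excluding DHT (0xC4), JPG (0xC8) and DAC (0xCC).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_size(data: bytes):
    """Return (width, height) from the first start-of-frame segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing JPEG start-of-image marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Segment body: precision (1 byte), then HEIGHT, then WIDTH.
            _precision, height, width = struct.unpack(">BHH", data[pos + 4:pos + 9])
            return width, height
        pos += 2 + length  # skip the marker plus the length-prefixed body
    raise ValueError("no start-of-frame segment found")
```

Swapping the two unpacked fields reproduces the kind of bug the commit describes, with a non-square image declared at transposed dimensions.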
Eliot Jones
7212b9e38c enable re-use of jpeg images between or within pages
returns a reference to the added image object when calling addjpeg so that it can be shared between or within pages meaning the image is only written to the output file once but can appear multiple times.

this image doesn't seem to be displaying correctly in adobe reader.
2020-03-16 19:32:57 +00:00
Eliot Jones
19462d79f0 add support for jpeg images in pdf document builder
since jpegs can be trivially embedded in pdf documents without changes to the data stream this is the first image format we will support. currently this is a naive approach which doesn't share image resources between pages. ideally we will either de-duplicate images when added, return a re-usable key once an image is added, or both.
2020-03-16 19:32:57 +00:00
Eliot Jones
8ac4195b83 0.1.1-alpha001 0.1.1-alpha001 2020-03-15 16:52:28 +00:00
Eliot Jones
908d84ccc6 remove debug code from the test
i accidentally left this in when debugging the pdf merging.
2020-03-15 16:29:18 +00:00
BobLd
c1a1fa1f7f - fix minimum area rectangle algo and make it public
- add tests
- tidy up code
2020-03-15 17:20:36 +01:00
InusualZ
bd6b03c2e8 Removed the ability to deepCopy a token.
Most of the DataToken properties are readonly, so if you want to change something you have to create a new token anyway. Discussion: https://github.com/UglyToad/PdfPig/pull/150
2020-03-15 16:05:23 +01:00
InusualZ
3abe210c6f Add new flag to control whether we want to dispose the base stream.
Let the base stream throw when an unsupported method is used (e.g. Seek, Read, etc.)
2020-03-15 16:05:23 +01:00
InusualZ
02289d75ac Use ReferenceEquals to avoid infinite recursion
Fix InlineImageDataToken not checking Data length before comparing bytes
Use a more straightforward comparison of Dictionary content
2020-03-15 16:05:23 +01:00
InusualZ
26f92a9630 Add test case that checks that we lower the object count 2020-03-15 16:05:23 +01:00
InusualZ
b3f310a249 Make PdfMerger use the PdfStreamWriter 2020-03-15 16:05:23 +01:00
InusualZ
44ad5c8b0c PdfStreamWriter: Error Checking and Code Formatting 2020-03-15 16:05:23 +01:00
InusualZ
c533d47386 New class PdfStreamWriter
This class would allow us to lazily flush resources, so that we can still make changes to them (compress, merge, deduplicate, etc.) while new content is pushed.
2020-03-15 16:05:23 +01:00
InusualZ
be7716eeea Make IToken implement IEquatable<IToken>
This would allow us to deduplicate tokens, by comparing their content
2020-03-15 16:05:23 +01:00
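The content-based deduplication that commit be7716eeea aims to enable can be sketched as below. This is a Python illustration under stated assumptions: `NameToken`, `ArrayToken`, and `deduplicate` are hypothetical stand-ins, and frozen dataclasses play the role that value equality on tokens plays in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameToken:
    """Hypothetical token compared by value rather than by identity."""
    value: str


@dataclass(frozen=True)
class ArrayToken:
    """Hypothetical container token; equality recurses into its items."""
    items: tuple


def deduplicate(tokens):
    """Return the distinct tokens plus, per input token, the index of its canonical copy."""
    canonical = {}  # token content -> index into `unique`
    unique = []
    indices = []
    for token in tokens:
        if token not in canonical:
            canonical[token] = len(unique)
            unique.append(token)
        indices.append(canonical[token])
    return unique, indices
```

Two separately constructed tokens with the same content map to one entry, so a writer could emit the object once and point both uses at it.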
BobLd
9366aa2b37 Tidy up code 2020-03-15 15:39:19 +01:00
BobLd
5b8a2f2e38 handle k-nearest neighbours search
update DocstrumBB with kd-tree
2020-03-15 15:39:19 +01:00
BobLd
8cafda3577 handle nearest neighbour not found 2020-03-15 15:39:19 +01:00
Eliot Jones
aa9df30722 handle invalid charstring sequences
it is possible for a file with an adobe type 1 font to contain an invalid charstring sequence, if this happens we handle it and return false from trygenerate.
2020-03-08 14:33:26 +00:00
Eliot Jones
8df2f9cf6b generate all xml docs and pack them #148
after we split the solution into multiple projects the xml doc comments were no longer packed in the generated nuget package. in addition they were only generated for the net standard 2.0 target framework.

this change generates comments for all target frameworks and makes sure they're included in the generated package. it also adds missing doc comments where they weren't included on the public api and clears up a couple of minor formatting issues in the affected files.
2020-03-08 13:44:09 +00:00
Eliot Jones
24c5cbea4b support custom page sizes for document builder #147
page size custom is not supported for the document builder so a new overload which supports user defined page sizes is provided.
2020-03-07 16:48:19 +00:00
InusualZ
ab3779e644 Fix an issue where the root/Pages Count was incorrect
`/Pages` Count should reflect the number of leaf nodes (page objects) that are descendants of this node.
2020-03-07 16:47:35 +00:00
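The rule stated in commit ab3779e644 can be sketched as a recursive count over the page tree. This is an illustrative Python sketch that models nodes as plain dictionaries; `update_page_counts` is a hypothetical helper, not the project's implementation.

```python
def update_page_counts(node):
    """Set Count on every Pages node to its number of descendant Page leaves."""
    if node["Type"] == "Page":
        return 1  # a leaf contributes exactly one page
    # An intermediate node sums the leaf counts of its kids, not len(Kids).
    node["Count"] = sum(update_page_counts(kid) for kid in node["Kids"])
    return node["Count"]
```

For a root with one nested Pages node holding two pages plus one direct page, the root has two kids but its Count is three.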
BobLd
439dad9f35 fix depth for leaves 2020-03-06 16:53:18 +00:00
BobLd
dbdef7f195 kd-tree 2020-03-06 16:53:18 +00:00
BobLd
c4309ef31e fix kd-tree 2020-03-06 16:53:18 +00:00
BobLd
1e1a33d46e fix typo in kd-tree
replace Count() by Count or Length
2020-03-06 16:53:18 +00:00
BobLd
e477bc8d6d Tidy up and optimise kd-tree 2020-03-06 16:53:18 +00:00
Eliot Jones
bef68a0654 avoid infinite loop in brute-force searcher #88
fixes the case where the brute-force searcher becomes stuck in an infinite loop, it may be the case that the problem pdf from #88 has a newline or some other whitespace between its object and generation number so this may cause a failure elsewhere.
2020-03-03 15:49:17 +00:00
Eliot Jones
58972de7cb begin to rework cross-reference parsing
most of the cross-reference code is the earliest code in the project and hasn't been revisited since then. the issue #88 has been reopened due to a bug with brute-force searching so this tidies up the code in this area ahead of trying to fix the bug.
2020-03-03 15:21:11 +00:00
Eliot Jones
4b5c8d510e add test for comment in dictionary from #145
check that we correctly handle the case where a comment appears inside a dictionary, this was handled by commit 3084a9. use list internally to dictionary tokenizer to avoid interface performance penalties.
2020-03-03 11:36:01 +00:00
Eliot Jones
14599521f5 test the simple document merge in reverse order
check we can create the document in both merge orders.
2020-03-02 17:00:16 +00:00
Eliot Jones
2effedd3c5 add inheritable keys back into the copied pages node
keys such as resources, mediabox, cropbox, etc. can be inherited. we now copy them if they are present on the parent pages node.
2020-03-02 17:00:16 +00:00
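Copying inheritable keys, as commit 2effedd3c5 describes, might look like the following. This is a Python sketch with nodes modelled as dictionaries; `copy_inherited` is a hypothetical helper, and the key list is the set of inheritable page attributes from the PDF specification rather than anything taken from the project's code.

```python
# Page attributes that the PDF specification allows a page to inherit
# from an ancestor Pages node.
INHERITABLE_KEYS = ("Resources", "MediaBox", "CropBox", "Rotate")


def copy_inherited(parent, child):
    """Return a copy of `child` with missing inheritable keys filled from `parent`."""
    merged = dict(child)
    for key in INHERITABLE_KEYS:
        # A value set directly on the child takes precedence over the parent's.
        if key in parent and key not in merged:
            merged[key] = parent[key]
    return merged
```

Without this step a copied node loses anything it relied on its original parent for, such as the resources needed to render page content.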
Eliot Jones
c596bef024 rename to resharper conventions and test 2nd page
renames fields to match the expected conventions for resharper. removes fully qualified names for using statements since resharper marks these as not-required.

adds a check to the pdf merger test to make sure the second page has the expected content. this is not currently valid since we lose the resources node on the pages tree.
2020-03-02 17:00:16 +00:00
Eliot Jones
2193063809 fix tests for public api and merge conflict
the cross reference parser tests' behaviour had changed, this fixes a compilation error from merge conflicts. also updates the merger tests to account for new version behaviour and checks the output document text. adds pdfmerger to the public api in the tests.
2020-03-02 17:00:16 +00:00
Inusual
b560c73fa9 Set the version as the highest version found in the merged document
And reorganize the code a little bit
2020-03-02 17:00:16 +00:00
Inusual
ca250a8c6f IsLenientParsing as false 2020-03-02 17:00:16 +00:00
Inusual
439a186ed7 Remove dead code 2020-03-02 17:00:16 +00:00