526 Commits

Author SHA1 Message Date
Eliot Jones
bef68a0654 avoid infinite loop in brute-force searcher #88
fixes the case where the brute-force searcher becomes stuck in an infinite loop. the problem pdf from #88 may have a newline or some other whitespace between its object and generation numbers, so this may cause a failure elsewhere.
2020-03-03 15:49:17 +00:00
Eliot Jones
58972de7cb begin to rework cross-reference parsing
most of the cross-reference code is the earliest code in the project and hasn't been revisited since it was written. issue #88 has been reopened due to a bug in brute-force searching, so this tidies up the code in this area ahead of fixing that bug.
2020-03-03 15:21:11 +00:00
Eliot Jones
2effedd3c5 add inheritable keys back into the copied pages node
keys such as resources, mediabox, cropbox, etc. can be inherited. we now copy them from the parent pages node if they are present there.
2020-03-02 17:00:16 +00:00
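A minimal Python sketch of the copy described above, assuming plain dicts stand in for PDF dictionary tokens. The key list follows the PDF specification's inheritable page attributes (Resources, MediaBox, CropBox, Rotate); the function name `copy_inherited_keys` and the nearest-first `ancestors` list are hypothetical, not the project's API.

```python
# Hypothetical sketch: plain dicts stand in for PDF dictionary tokens.
# Per the PDF specification these page attributes may be inherited from
# an ancestor /Pages node.
INHERITABLE_KEYS = ("Resources", "MediaBox", "CropBox", "Rotate")


def copy_inherited_keys(page: dict, ancestors: list) -> dict:
    """Return a copy of `page` with missing inheritable keys filled in.

    `ancestors` is ordered nearest parent first, so the closest node that
    defines a key wins. Keys already present on the page are never overwritten.
    """
    result = dict(page)
    for key in INHERITABLE_KEYS:
        if key in result:
            continue
        for node in ancestors:
            if key in node:
                result[key] = node[key]
                break
    return result
```

Copying the values onto the page itself means the merged page no longer depends on the source document's pages tree.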
Eliot Jones
c596bef024 rename to resharper conventions and test 2nd page
renames fields to match resharper's expected conventions. removes fully qualified names where using statements already cover them, since resharper marks these as not required.

adds a check to the pdf merger test to make sure the second page has the expected content. this check does not currently pass because we lose the resources node on the pages tree.
2020-03-02 17:00:16 +00:00
Inusual
b560c73fa9 Set the version as the highest version found in the merged document
And reorganize the code a little bit
2020-03-02 17:00:16 +00:00
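A small Python sketch of picking the highest version, assuming versions are available as header strings such as "1.4" or "1.7" (the project may store them differently). `highest_version` is a hypothetical helper.

```python
def highest_version(versions):
    """Return the highest PDF version string, e.g. "1.7" from ["1.4", "1.7"].

    Versions are compared numerically as (major, minor) tuples rather than
    as strings or floats.
    """
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))
```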
Inusual
ca250a8c6f Set IsLenientParsing to false 2020-03-02 17:00:16 +00:00
Inusual
439a186ed7 Remove dead code 2020-03-02 17:00:16 +00:00
Inusual
932857cf8c Revert "Don't forget to copy /Pages properties too"
This reverts commit 224d9dc6e52245f9d16a22af460f386545403cd1.
2020-03-02 17:00:16 +00:00
Inusual
20ff625c2e Don't forget to copy /Pages properties too 2020-03-02 17:00:16 +00:00
Inusual
ec67ef76cd Remove ObjectsTree class since it's obsolete 2020-03-02 17:00:16 +00:00
Inusual
4ffc1620a1 Don't create unnecessary objects 2020-03-02 17:00:16 +00:00
Inusual
761e3d291e Rename DocumentBuilder to DocumentMerger
Remove IDisposable interface implementation
2020-03-02 17:00:16 +00:00
Inusual
6e0caee317 Add StreamToken as an exception to CopyToken
Add a summary to better explain CopyToken's purpose
2020-03-02 17:00:16 +00:00
Inusual
669742b6bd Fix for page object having its own object as parent
I moved the part that creates the page node into its own method. This made it easier to see where I was supposed to get the correct reference from.
2020-03-02 17:00:16 +00:00
Inusual
f485826751 PdfMerger: basic functionality implemented
It has a lot of unknowns and TODOs; please look at them.
2020-03-02 17:00:16 +00:00
Inusual
013cbd14e0 Make CrossReferenceTableParser a static class 2020-03-02 17:00:16 +00:00
Eliot Jones
b7a86f482f cache bounding boxes for composite fonts
caches the bounding box for each character code value in type 0 (composite) fonts to improve performance.
2020-02-28 16:36:06 +00:00
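A Python sketch of per-character-code memoization, assuming the expensive glyph lookup can be passed in as a function. `BoundingBoxCache` and `compute` are hypothetical names for illustration only.

```python
from typing import Callable, Dict, Tuple

Box = Tuple[float, float, float, float]


class BoundingBoxCache:
    """Memoizes bounding boxes by character code (hypothetical sketch)."""

    def __init__(self, compute: Callable[[int], Box]):
        # `compute` stands in for the expensive per-glyph lookup in the font.
        self._compute = compute
        self._cache: Dict[int, Box] = {}

    def get(self, code: int) -> Box:
        box = self._cache.get(code)
        if box is None:
            # First request for this code: compute once and remember it.
            box = self._compute(code)
            self._cache[code] = box
        return box
```

Text on a page reuses a small set of character codes many times, so each distinct code is computed once and later lookups are dictionary hits.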
Eliot Jones
4442a69a97 use tryget rather than lambdas for union type
avoid the allocations caused by lambda expressions for performance reasons.
2020-02-28 16:02:20 +00:00
Eliot Jones
4d911fb9d1 use transform x for widths to improve performance
when transforming the advance width inside a font, we transform only the x coordinate rather than making a new point to transform.
2020-02-28 15:15:35 +00:00
Eliot Jones
c864fa512c remove islenientparsing from page classes 2020-02-28 11:50:18 +00:00
Eliot Jones
48d166276d remove islenientparsing from contentstreamprocessor 2020-02-28 11:44:13 +00:00
Eliot Jones
6fdaf054cb remove islenientparsing from annotation provider 2020-02-28 11:39:56 +00:00
Eliot Jones
7b09999a3f remove islenientparsing from the font handlers
we're removing islenientparsing to make the code simpler to maintain and use as well as more resilient.
2020-02-28 11:37:18 +00:00
Eliot Jones
746cbfa30c remove lenient parsing from font related classes
lenient parsing gives us more code to maintain for no real benefit; parsing should always be as lenient as possible. this removes the flag from some of the font code.
2020-02-27 18:10:02 +00:00
Eliot Jones
ec9e425712 use length from stream dictionary if directly available
when brute forcing we use the length available in the stream's dictionary token if it is a direct number rather than an indirect reference.
2020-02-27 17:17:49 +00:00
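A Python sketch of this rule, assuming a parsed stream dictionary is a plain dict and that an unresolvable length shows up as a reference object. `IndirectReference` and `read_stream_data` are hypothetical; falling back to a search for `endstream` is the usual alternative when the length can't be trusted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndirectReference:
    """Hypothetical stand-in for an 'N G R' token we cannot resolve while brute-forcing."""
    number: int
    generation: int


def read_stream_data(data: bytes, start: int, dictionary: dict) -> bytes:
    """Read stream bytes starting at `start` (just after the 'stream' end-of-line)."""
    length = dictionary.get("Length")
    # Trust /Length only when it is a direct integer (not a bool) that fits the input.
    if type(length) is int and 0 <= length <= len(data) - start:
        return data[start:start + length]
    # Indirect reference, missing or bad value: scan for 'endstream' instead.
    end = data.find(b"endstream", start)
    if end == -1:
        raise ValueError("no endstream keyword found")
    chunk = data[start:end]
    # Drop the end-of-line marker that precedes 'endstream'.
    if chunk.endswith(b"\r\n"):
        return chunk[:-2]
    if chunk.endswith((b"\n", b"\r")):
        return chunk[:-1]
    return chunk
```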
Eliot Jones
c033a0b3fe handle missing end bytes for cmap code
when the byte input ends before a code of the full code length has been read, we return early.
2020-02-27 15:26:48 +00:00
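A Python sketch of the early return, assuming a single fixed code length (real CMaps define codespace ranges that can mix 1- to 4-byte codes). `read_codes` is a hypothetical helper.

```python
def read_codes(data: bytes, code_length: int) -> list:
    """Split `data` into big-endian character codes of `code_length` bytes.

    If the input ends part-way through a code, the incomplete trailing
    bytes are ignored and the codes read so far are returned.
    """
    codes = []
    for offset in range(0, len(data), code_length):
        chunk = data[offset:offset + code_length]
        if len(chunk) < code_length:
            break  # missing end bytes: return early instead of failing
        codes.append(int.from_bytes(chunk, "big"))
    return codes
```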
Eliot Jones
0fcc4e54c8 add istestproject setting to all projects
indicates to the test runner which projects are test projects.
2020-02-27 12:35:40 +00:00
Eliot Jones
4150881be9 recover from invalid acro-form references
we add a try/catch to the direct object finder's tryget method so it returns false rather than throwing.

if we have an acro-form reference in the catalog but no corresponding object in the document we instead scan all objects in the document to find form fields and reconstruct the acro-form dictionary.
2020-02-27 12:08:40 +00:00
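A Python sketch of both steps, assuming objects are held in a dict keyed by object number and references are plain ints. `try_get`, `get_acroform` and the "has /FT and no /Parent means top-level field" heuristic are assumptions; non-terminal fields may omit /FT because it is inheritable, so a real reconstruction needs more care.

```python
def try_get(objects: dict, number: int):
    """Return (True, value) or (False, None) instead of raising for a missing object."""
    try:
        return True, objects[number]
    except KeyError:
        return False, None


def get_acroform(catalog: dict, objects: dict):
    """Resolve the catalog's /AcroForm, rebuilding it if the reference dangles."""
    reference = catalog.get("AcroForm")
    if reference is None:
        return None  # the document has no form
    found, form = try_get(objects, reference)
    if found:
        return form
    # Dangling reference: treat dictionaries with a field type (/FT) and no
    # /Parent as top-level fields and rebuild a minimal /Fields array.
    fields = [
        number
        for number, obj in sorted(objects.items())
        if isinstance(obj, dict) and "FT" in obj and "Parent" not in obj
    ]
    return {"Fields": fields}
```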
Eliot Jones
f415c3116e cross reference offset is in the xref table we ignore the error
previously we checked that the offset was not inside the table (the correct thing to check); however, this is only a special case of the more general issue that cross-reference offsets can be wrong. we move handling for this into the pdf token scanner: if we attempt to read an object at an offset and it fails, we brute force the entire file to find the correct offsets. we also needed to make sure we don't attempt to use stream length tokens while brute-forcing, since we can't look up indirect references for the length.
2020-02-26 14:03:46 +00:00
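A Python sketch of "try the recorded offset, otherwise brute force", assuming the whole file is in memory and the cross-reference table is a dict from (number, generation) to offset. The regex, `brute_force_offsets` and `locate_object` are hypothetical; the pattern accepts any ASCII whitespace (including newlines) between the parts and records the offset of the object number's first digit.

```python
import re

# "<object number> <generation> obj" with any ASCII whitespace between the parts.
OBJECT_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def brute_force_offsets(data: bytes) -> dict:
    """Map (number, generation) to the offset of the object number's first digit."""
    offsets = {}
    # finditer always resumes after the previous match, so the scan cannot loop.
    for match in OBJECT_HEADER.finditer(data):
        # Later definitions (e.g. from incremental updates) replace earlier ones.
        offsets[(int(match[1]), int(match[2]))] = match.start()
    return offsets


def locate_object(data: bytes, xref: dict, key: tuple):
    """Use the cross-reference offset if it really points at `key`, else brute force."""
    offset = xref.get(key)
    if offset is not None and 0 <= offset < len(data):
        match = OBJECT_HEADER.match(data, offset)
        if match and (int(match[1]), int(match[2])) == key:
            return offset
    return brute_force_offsets(data).get(key)
```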
Eliot Jones
7d0d5806a9 fix reverse xref location search
when brute force searching for the start of the cross-reference table (xref) we read 5 byte buffers. previously, if the 'x' of 'xref' was the first character of the buffer, we skipped it; this change also checks for 'x' as the first character of the buffer.
2020-02-26 12:55:11 +00:00
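A Python sketch of a backwards search with a 5-byte window. The assumption that the window holds 'xref' plus one trailing whitespace byte is an interpretation, not necessarily the project's exact buffer logic. `find_last_xref` is hypothetical and also skips the 'xref' inside 'startxref'.

```python
# PDF whitespace characters: NUL, tab, line feed, form feed, carriage return, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_last_xref(data: bytes) -> int:
    """Return the offset of the last 'xref' keyword that is not part of 'startxref', or -1."""
    # Slide a 5-byte window backwards from the end of the input. The window
    # holds 'xref' plus the byte after it, which must be whitespace. Position 0
    # of each window is always tested, so an 'x' at the start is not skipped.
    for start in range(len(data) - 5, -1, -1):
        window = data[start:start + 5]
        if window[:4] == b"xref" and window[4] in PDF_WHITESPACE:
            if data[max(0, start - 5):start] == b"start":
                continue  # this is the 'startxref' keyword, keep searching
            return start
    return -1
```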
Eliot Jones
f07e2dfb84 more tolerant handling of endimage recovery
fixes the recorded offset the first time an endimage is recovered from. it was off by one, so if the subsequent end image was also the wrong tag, the second attempt at recovery failed.

also allows recovery when other tags appear after an endimage as long as they're not block ending operations (end text, perhaps pop/push in future).
2020-02-26 12:41:39 +00:00
Eliot Jones
43afac8f5d default to zero width characters in truetype for '.notdef'
when the character is not defined and the corresponding '.notdef' glyph isn't included in a truetype font, we now default to a zero width character. it might be that we should use the default/missing width instead, but this works for most use cases.
2020-02-26 12:39:12 +00:00
Eliot Jones
486ea446c5 #141 divide width by 1000 for adobe type 1 font
the width shouldn't be transformed by the font's matrix, instead the width is divided by 1000 by default.
2020-02-25 13:44:15 +00:00
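A worked Python sketch of the PDF specification's horizontal displacement formula, tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, with glyph widths given in 1/1000 text-space units. The TJ adjustment term is omitted here and `horizontal_displacement` is a hypothetical helper.

```python
def horizontal_displacement(width: float, font_size: float,
                            char_spacing: float = 0.0, word_spacing: float = 0.0,
                            horizontal_scaling: float = 1.0) -> float:
    """tx = ((w0 / 1000) * Tfs + Tc + Tw) * Th for a glyph width w0 in 1/1000 units.

    `horizontal_scaling` is Th = Tz / 100. Word spacing applies only to the
    single-byte code 32, so callers pass 0 for other codes.
    """
    return ((width / 1000.0) * font_size + char_spacing + word_spacing) * horizontal_scaling
```

For example, a glyph width of 500 at a 12 point font size advances 500 / 1000 * 12 = 6 units.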
Eliot Jones
d6d3869fe2 fix brute force searcher offsets
the brute force searcher offsets were off by one. this change means the offset returned is now aligned with the object number in the object number/generation/operator triple.
2020-02-24 12:24:18 +00:00
BobLd
0afaa19d15 Handle null CurrentPath 2020-02-24 11:20:56 +00:00
BobLd
1d095af974 Implement Modify Clipping operations 2020-02-24 11:20:56 +00:00
BobLd
b0eaccf56f Add PdfRectangle.Contains(PdfRectangle) and tests 2020-02-23 14:44:35 +00:00
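A Python sketch of rectangle-in-rectangle containment, assuming axis-aligned rectangles (the project's PdfRectangle may also describe rotated boxes, which this does not handle). `Rect` and the `include_border` flag are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned rectangle, not the project's PdfRectangle."""
    left: float
    bottom: float
    right: float
    top: float

    def contains(self, other: "Rect", include_border: bool = True) -> bool:
        """True if `other` lies entirely inside this rectangle."""
        if include_border:
            return (self.left <= other.left and other.right <= self.right
                    and self.bottom <= other.bottom and other.top <= self.top)
        return (self.left < other.left and other.right < self.right
                and self.bottom < other.bottom and other.top < self.top)
```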
BobLd
bbdb778d5e use Count == 0 instead of !Any() 2020-02-23 11:23:27 +00:00
BobLd
c6669679d6 use MemberData in Line.Length tests 2020-02-23 11:23:27 +00:00
BobLd
1b15238e31 avoid creating PdfLines in IntersectsWith 2020-02-23 11:23:27 +00:00
BobLd
49caa071ba improve length computation
tidy up IntersectsWith()
2020-02-23 11:23:27 +00:00
BobLd
7d7a7fc5ee remove useless code 2020-02-23 11:23:27 +00:00
BobLd
42245d70ca Improve PdfRectangle.GetWidthHeight();
Improve and simplify Word's oriented bounding box
2020-02-23 11:23:27 +00:00
BobLd
67c5abf2b6 fix Intersect(BezierCurve, Line) and add tests 2020-02-23 11:23:27 +00:00
BobLd
2878e74017 Add bezier curve test 2020-02-23 11:23:27 +00:00
BobLd
36566f42e6 Create generic methods for lines in GeometryExtensions 2020-02-23 11:23:27 +00:00
BobLd
6fe0ef0351 Add line tests and improve GeometryExtensions 2020-02-23 11:23:27 +00:00
BobLd
b8d1eba8ee PdfLine.Intersect() 2020-02-23 11:23:27 +00:00
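A Python sketch of segment intersection using the standard parametric line-line formula. This is a generic technique, not necessarily how the project's PdfLine computes it; `intersect` and the `eps` tolerance for parallel lines are hypothetical.

```python
def intersect(p1, p2, p3, p4, eps=1e-12):
    """Intersection point of segments p1-p2 and p3-p4, or None if they don't meet."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < eps:
        return None  # parallel or collinear: no single intersection point
    # t and u are the positions along each segment (0 = start, 1 = end).
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    u = ((x1 - x3) * (y1 - y2) - (y1 - y3) * (x1 - x2)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None  # the infinite lines cross outside at least one segment
```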
BobLd
d2ac2f598a update Centroid, GetWidthHeight and tests 2020-02-23 11:23:27 +00:00
BobLd
2c8c6cda87 add GeometryExtensions tests 2020-02-23 11:23:27 +00:00