Commit Graph

45 Commits

Author SHA1 Message Date
davebrokit
f3e37eafae Introduce IBlock and ILettersBlock interfaces (Round 2) (#1263)
* Code review changes
- Keep the Bounds property on the image classes so this isn't a major breaking API change
- Don't expose letters collection

* Minor fix

* Switch to using BoundingBox in the library

---------

Co-authored-by: davmarksman <david@brokit.co.uk>
2026-02-28 16:25:51 +00:00
Eliot Jones
9c0d6893e0 revert flate decode handling to more lenient processing (#1254)
* revert flate decode handling to more lenient processing

the change to use the zlib/adler checksum verification flow meant that
invalid flate streams would not be decoded correctly. this caused
issues for files that included invalid/missing checksums. this reverts
the processing to the old approach for files like #1235
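
A minimal sketch of the lenient idea described above, written in Python rather than the library's own language. It is not the library's implementation: the function name `inflate_lenient` is hypothetical, and it assumes the stream carries the usual 2-byte zlib header with no preset dictionary. The strict path verifies the Adler-32 checksum. The fallback inflates the raw deflate data and never looks at the checksum.

```python
import zlib


def inflate_lenient(data: bytes) -> bytes:
    """Decode a zlib-wrapped stream, tolerating a bad or missing checksum."""
    try:
        # Strict path: validates the zlib header and the trailing Adler-32.
        return zlib.decompress(data)
    except zlib.error:
        # Lenient path (assumption: a plain 2-byte zlib header is present).
        # Skip the header and inflate raw deflate data (wbits=-15); any
        # trailing checksum bytes are left unread in decoder.unused_data.
        decoder = zlib.decompressobj(wbits=-15)
        return decoder.decompress(data[2:])


payload = b"BT /F1 12 Tf (hello) Tj ET" * 4
good = zlib.compress(payload)
# Flip the last checksum byte to mimic a file with an invalid Adler-32.
bad = good[:-1] + bytes([good[-1] ^ 0xFF])
```

Here `inflate_lenient(bad)` still recovers the payload, whereas a plain `zlib.decompress(bad)` raises `zlib.error`.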

* fix object stream offset handling and track circular refs

* update tests

* normalize line endings for mac runner

* fixes for mac clownery

* add next pair to common crawl action

* add a test case for the root cause of the int overflow
2026-02-22 15:49:50 +00:00
BobLd
0a2b1e076f Improve HasFormXObjectCircularReference and fix #1250 2026-02-15 18:06:35 +00:00
BobLd
d6e86b057e Handle empty encoding in Type1FontSimple and fix #1248
2026-02-10 06:14:39 +00:00
BobLd
c27f1b6553 Only throw if ArrayToken length is less than 4 in ToRectangle() and fix #1238
2026-02-01 16:10:24 +00:00
BobLd
7c4f5e2424 Introduce StackDepthGuard class to check for stack depth in CoreTokenScanner and fix #1217
2025-12-23 16:24:04 +01:00
BobLd
ee0cb1dc4a Use file header offset when doing brute force find and fix #1223
2025-12-07 13:43:22 +00:00
BobLd
37a5dffcaa Skip in ReadCompositeGlyph when glyphIndex is out of range and fix #1213 2025-11-21 21:31:18 +00:00
BobLd
005e52783e Ensure no key ends up missing in ResolveInternal and fix #1209 2025-11-13 14:53:28 +00:00
BobLd
f4e7db5b5b Simply order by offset also when not doing brute force to fix #1208 2025-11-13 14:26:35 +00:00
BobLd
599ce04bef Add test that fails before revert of e11dc6b
2025-11-07 22:31:43 +00:00
BobLd
b49e5aa697 Check for index out of range in GlyphDataTable.ReadFlags() and fix #1199 2025-11-07 19:39:18 +00:00
Bert Huijben
3592fc8438 Use zlib information to verify compressed content before using it
2025-10-15 18:46:36 +01:00
BobLd
c9034f991c Only apply RemoveStridePadding() when bytes per pixel is one and fix #1183 2025-10-15 12:57:25 +01:00
Richard Flamsholt
d7d01f842e Update test Issue874: No longer missing a font
Some checks failed
Build, test and publish draft / build (push) Has been cancelled
Build and test [MacOS] / build (push) Has been cancelled
Run Common Crawl Tests / build (0000-0001) (push) Has been cancelled
Run Common Crawl Tests / build (0002-0003) (push) Has been cancelled
Run Common Crawl Tests / build (0004-0005) (push) Has been cancelled
Run Common Crawl Tests / build (0006-0007) (push) Has been cancelled
Run Integration Tests / build (push) Has been cancelled
Nightly Release / Check if this commit has already been published (push) Has been cancelled
Nightly Release / tests (push) Has been cancelled
Nightly Release / build_and_publish_nightly (push) Has been cancelled
Including the stream-xref means that the formerly missing font is no longer missing, so simply run the two test-cases under the (stricter) assumption of SkipMissingFonts=false.
2025-09-30 18:35:45 +01:00
Richard Flamsholt
33a8d829ee Update test Issue874: Also more text on page 2
Page two has had four more characters added, which is now detected by this xref-stream fix
2025-09-30 18:35:45 +01:00
Richard Flamsholt
57921c7e9b Update test Issue874: Now finds more text on page 1
With the fix for including associated streams, this test now finds more text on the first page. I've verified using Aspose.PDF and by viewing the ErcotFacts.pdf file being tested that yes, it was indeed missing part of the text before.
2025-09-30 18:35:45 +01:00
BobLd
304d7dde5a Use correct font matrix when transforming the width in Type 0 font and fix #1156 2025-09-14 08:22:58 +01:00
Eliot Jones
07df6fd740 read last line of ignore file (#1155)
* read last line of ignore file

- do not cancel other matrix jobs if one test fails
- read all lines of the ignore list even if it doesn't end with a newline
- add ignore list for 0008 and 0009

* support missing object numbers when brute-forcing

the file 10404 (ironically) contains not found references with number 43 0
for its info dictionary. changes brute-force code so that objects can be
entirely missing

* fix test since document is now opened successfully but mediabox is broken
2025-09-13 16:57:35 +02:00
BobLd
d5b97065bd Fix #1148 2025-09-13 10:38:35 +01:00
Eliot Jones
0afe021ad3 move file parsing to single-pass static methods (#1102)
* move file parsing to single-pass static methods

for the file 0002973.pdf in the test corpus we need to completely overhaul
how initial xref parsing is done since we need to locate the xref stream by
brute-force and this is currently broken. i wanted to take this opportunity to
change the logic to be more imperative and less like the pdfbox methods with
instance data and classes.

currently the logic is split between the xref offset validator and parser methods
and we call the validator logic twice, followed by brute-force searching again
in the actual parser. we're going to move to a single method that performs
the following steps:

1. find the first (from the end) occurrence of "startxref" and pull out the location
in bytes. this will also support "startref" since some files in the wild have that
2. go to that offset if found and parse the chain of tables or streams by /prev
reference
3. if any element in step 2 fails then we perform a single brute-force over the
entire file and, like pdfbox, treat xrefs found later in the file as the ultimate arbiter
of the object positions. while we do this we can potentially capture the actual
object offsets since the xref positions are probably incorrect too.

the aim with this is to avoid as much seeking and re-reading of bytes as
possible. while this won't technically be single-pass it gets us much closer. it
also removes the more strict logic requiring a "startxref" token to exist and be
valid, since we can repair this by brute-force anyway.

we will surface as much information as possible from the static method so that
we could in future support an object explorer ui for pdfs.

this will also be more resilient to invalid xref formats with e.g. comment tokens
or missing newlines.
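
Step 1 of the plan above can be sketched roughly as follows. This is a Python illustration, not the library's code: `find_last_startxref` and the regular expression are hypothetical names chosen for the example. It returns `None` when no marker is found or the offset lies outside the file, which is the point where the brute-force search of step 3 would take over.

```python
import re
from typing import Optional

# Matches "startxref" or the misspelt "startref", then a decimal byte offset.
_START_XREF = re.compile(rb"startx?ref\s+(\d+)")


def find_last_startxref(data: bytes) -> Optional[int]:
    """Return the offset named by the last startxref marker, or None."""
    matches = list(_START_XREF.finditer(data))
    if not matches:
        return None  # no marker at all: caller falls back to brute force
    offset = int(matches[-1].group(1))
    # An offset beyond the end of the file cannot be followed.
    return offset if offset < len(data) else None


sample = (
    b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n"
    b"startxref\n9\n%%EOF\n"
    b"2 0 obj\n<<>>\nendobj\n"
    b"startref\n29\n%%EOF\n"
)
last_offset = find_last_startxref(sample)
```

The sample has two markers, one of them misspelt, and the function picks the one closest to the end of the file.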

* move more parsing to the static classes

* plumb through the new parsing results

* plug in new parser and remove old classes, port tests to new classes

* update tests to reflect logic changes

* apply correction when file header has offset

* ignore console runner launch settings

* skip offsets outside of file bounds

* fix parsing tables missing a line break

* use brute forced locations if they're already present

* only treat line breaks and spaces as whitespace for stream content

* address review comments

---------

Co-authored-by: BobLd <38405645+BobLd@users.noreply.github.com>
2025-09-02 19:41:00 +01:00
BobLd
1031dcc221 Prevent StackOverflow in ParseTrailer and fix #1122 2025-08-09 08:46:04 +01:00
BobLd
813d3baa18 Track IndirectReference instead of only ObjectNumber when checking for cycles during indirect reference resolution and add test 2025-07-20 19:24:31 +01:00
BobLd
6a50160e65 Prevent RunLengthFilter malicious OOM 2025-06-29 13:57:01 +01:00
BobLd
d9b3891eb3 Do not throw if the Mask dictionary contains a ColorSpace key 2025-05-30 07:53:25 +01:00
BobLd
2b54a546d3 Check for infinite recursion in ObjectLocationProvider.TryGetOffset() and fix #1050 2025-05-28 20:24:31 +01:00
BobLd
5b566b53da Only reset missed attempts count if table is found in CrossReferenceParser.Parse() and fix #1047 2025-05-27 20:57:38 +02:00
BobLd
ca9f70ffb0 Skip control chars in CoreTokenScanner.MoveNext() and fix #1048 2025-05-27 20:57:38 +02:00
BobLd
67d3dde04a Handle TrueType case in CidFontFactory where the font is CFF, implement missing members in PdfCidCompactFontFormatFont and fix #554
2025-05-19 00:27:51 +01:00
BobLd
e4d7805a1f Add test to ensure #822 is fixed 2025-05-18 22:32:07 +01:00
BobLd
6911f31b49 Try to repair xref offset by looking for all startxref and fix #1040
2025-05-18 17:32:27 +01:00
BobLd
afdd1f8924 Fix issue #1013 2025-04-20 18:03:04 +01:00
BobLd
74d61bd985 Fix PatternColor Equals() method and fix #1016 2025-03-26 19:48:51 +00:00
BobLd
1b3c7bd355 Better handle integer overflow in DocstrumBoundingBoxes
2025-03-02 18:29:21 +00:00
BobLd
67d8f56366 Do not throw exception when lenient parsing in GetExtendedGraphicsStateDictionary() and improve StackDictionary.TryGetValue() to not throw on empty 2025-03-02 11:51:26 +00:00
BobLd
5a06e1e1cc Assess if transformedGlyphBounds and use transformedPdfBounds as fallback and fix #987
2025-02-20 00:11:06 +00:00
BobLd
fdb8835b37 CcittFaxDecodeFilter: do not check for input length, invert bitmap with ref byte and fix #982
2025-02-02 14:26:16 +00:00
BobLd
c4576e4ffa Do not throw error on Pop when stack size is 1 in lenient mode and fix #973 2025-01-19 11:19:32 +00:00
BobLd
50dca593da Do not throw exception when lenient parsing is ON in CrossReferenceParser and fix #959 2024-12-28 12:29:28 +01:00
BobLd
9ad51067b0 Handle odd ligatures names and fix #945 (#946) 2024-11-27 19:44:17 +00:00
BobLd
20804245d0 Handle alternate Unicode name representation cXXX and fix #943 (#944) 2024-11-24 20:24:36 +00:00
BobLd
40af401364 Default page number to 0 in ExplicitDestination when the Dest has no page number and fix #736
2024-10-25 21:28:11 +01:00
BobLd
e10609e4e1 Use pdfScanner in ReadVerticalDisplacements and fix #693 and return 0 in CMap on exception in ReadByte() if useLenientParsing is true and fix #692
2024-10-19 00:29:42 +01:00
BobLd
a258090e1c Fix GetTextOrientation by cleanly checking if rotation is divisible by 90 and fix #913
2024-10-07 20:08:24 +01:00
BobLd
5c168f9cd0 Handle null token in DirectObjectFinder, handle null state in SetNamedGraphicsState(), add test and fix #874 2024-09-29 16:43:50 +01:00