1774 Commits

Author SHA1 Message Date
Carlo Kok
ce563db133 Clarify handling of optional content group names
Updated comments to clarify handling of optional content group names in PDFs.
nightly-latest
2025-11-23 10:52:21 +00:00
BobLd
37a5dffcaa Skip in ReadCompositeGlyph when glyphIndex is out of range and fix #1213 2025-11-21 21:31:18 +00:00
EliotJones
aef0a78ee6 update release logic to check out master before commit
2025-11-14 09:50:07 +00:00
BobLd
005e52783e Ensure no key ends up missing in ResolveInternal and fix #1209 2025-11-13 14:53:28 +00:00
BobLd
f4e7db5b5b Simply order by offset also when not doing brute force to fix #1208 2025-11-13 14:26:35 +00:00
BobLd
52ecef0e28 Increment version to 0.1.13 2025-11-13 09:34:17 +00:00
BobLd
599ce04bef Add test that fails before revert of e11dc6b
v0.1.12
2025-11-07 22:31:43 +00:00
BobLd
2a6ee918b7 Revert "Avoid a lot of seeks by making most tokenizers no longer read to far by using seek."
This reverts commit e11dc6bf40.
2025-11-07 22:31:43 +00:00
BobLd
9d3cd0a429 Check for array size before slice in ColorSpaceDetailsByteConverter.Convert() 2025-11-07 20:02:39 +00:00
BobLd
b49e5aa697 Check for index out of range in GlyphDataTable.ReadFlags() and fix #1199 2025-11-07 19:39:18 +00:00
Alexander Vinnikov
181fa9d837 make transform stack consistent
2025-11-03 20:06:40 +00:00
BobLd
6ce6986d78 Update test run command to use Release configuration
2025-10-29 20:29:42 +00:00
BobLd
37c9fef20b Do not slice the stream to the length, as doing so breaks decoding in FlateDecode 2025-10-29 20:27:40 +00:00
Bert Huijben
e11dc6bf40 Avoid a lot of seeks by making most tokenizers no longer read to far by using seek.
Optimize the FirstPassParser to just fetch a final chunk before doing things char-by-char backwards.
2025-10-28 06:48:41 +00:00
BobLd
40bcc22ea1 Add CMap caching at document level and add MurmurHash3 hashing function
2025-10-26 16:20:27 +00:00
Bert Huijben
94d515061e Update NameToUnicodeConvertAglSpecification to test what was intended
2025-10-25 19:47:56 +01:00
BobLd
3555521634 Fix regression introduced in 3592fc8 where slicing the stream to the length breaks decoding
2025-10-23 19:08:05 +01:00
Bert Huijben
6fba565d66 Avoid doing a true file seek when simply peeking at the next char in the token parser
2025-10-20 06:33:34 +01:00
Bert Huijben
3592fc8438 Use zlib information to verify compressed content before using it
2025-10-15 18:46:36 +01:00
BobLd
c9034f991c Only apply RemoveStridePadding() when bytes per pixel is one and fix #1183 2025-10-15 12:57:25 +01:00
BobLd
255e70f0a7 Set Type 3 font ascent to Top instead of Height, see #1164
2025-10-14 12:18:52 +01:00
BobLd
2216ade1f2 Trim excess in long lived font collections
2025-10-14 10:26:41 +01:00
BobLd
cf0c33b1e0 Improve DfsIterative() performance
2025-10-13 12:35:21 +01:00
BobLd
ffba176060 Improve GroupIndexes() performance with #1178 2025-10-13 12:35:21 +01:00
BobLd
b14f45f59f Add more tests to NearestNeighbourWordExtractorTests
2025-10-12 19:51:04 +01:00
ricflams
c28d114b79 Guard against circular references in XRef tables/streams
- Detect and prevent an xref table/stream at a certain offset from being read twice; malformed xref tables with circular references could otherwise cause the table-reading to loop forever.
- Another approach could be to prevent TryReadTableAtOffset from changing the bytes' CurrentOffset to the lastObjPosition in its attempt to read a table (e.g. restore CurrentOffset after the attempt to read a table), so the outer bytes loop could continue its search through the entire bytes unaffected.
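A minimal Python sketch of the first approach, for illustration only; PdfPig itself is C#. It assumes a hypothetical `sections` dictionary that maps each xref byte offset to its entries and its /Prev offset, and `read_xref_chain` is likewise a made-up name. Each offset goes into a visited set, so a chain that points back at an already-read section stops instead of looping.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: each xref section read at a byte offset holds its
# entries (object number -> object offset) and its /Prev offset, or None.
XrefSection = Tuple[Dict[int, int], Optional[int]]


def read_xref_chain(sections: Dict[int, XrefSection], start: int) -> List[int]:
    """Follow /Prev links from `start`, reading each offset at most once."""
    visited = set()
    order: List[int] = []
    offset: Optional[int] = start
    # Stop when there is no /Prev, or it points outside any known section.
    while offset is not None and offset in sections:
        if offset in visited:
            # Circular reference: this section was already read, so stop
            # here instead of looping forever.
            break
        visited.add(offset)
        order.append(offset)
        _entries, offset = sections[offset]
    return order
```

For example, `read_xref_chain({100: ({}, 50), 50: ({}, 100)}, 100)` returns `[100, 50]` instead of hanging.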
2025-10-01 06:32:38 +01:00
Richard Flamsholt
d7d01f842e Update test Issue874: No longer missing a font
Including the stream-xref means that the formerly missing font is no longer missing, so simply run the two test-cases under the (stricter) assumption of SkipMissingFonts=false.
2025-09-30 18:35:45 +01:00
Richard Flamsholt
33a8d829ee Update test Issue874: Also more text on page 2
Page two has had four more characters added, which are now detected by this xref-stream fix
2025-09-30 18:35:45 +01:00
Richard Flamsholt
57921c7e9b Update test Issue874: Now finds more text on page 1
With the fix for including associated streams, this test now finds more text on the first page. I've verified using Aspose.PDF and by viewing the ErcotFacts.pdf file being tested that yes, it was indeed missing part of the text before.
2025-09-30 18:35:45 +01:00
ricflams
5a6b3970f0 Add table-xref's associated stream-xrefs
- If an XrefTable has an associated stream, as indicated via the XrefStm-property, then read and add that XrefStream
- Any table can have 0 or 1 such associated streams
- A caveat: such an associated stream might also theoretically be part of the Parts-sequence in which case it would be encountered both by looping through all those parts along with all the regular tables and now also by association to any of those tables. It doesn't seem harmful since the offsets are flattened eventually anyway and stored by their offset-key into a mapping-table.
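A minimal Python sketch of gathering the parts to read, for illustration only. It assumes a hypothetical `XrefPart` record in which a table part may carry one optional `xref_stm` offset, and `offsets_to_read` is a made-up name. Unlike the later flattening described in the caveat, it de-duplicates by offset up front, so a stream that appears both in the Parts sequence and via XrefStm is listed only once.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class XrefPart:
    offset: int                     # byte offset where this part starts
    is_stream: bool                 # True for an xref stream, False for a table
    xref_stm: Optional[int] = None  # hypothetical: a table's /XrefStm offset (0 or 1)


def offsets_to_read(parts: List[XrefPart]) -> List[int]:
    """List every part offset plus each table's associated stream, once each."""
    result: List[int] = []
    seen = set()
    for part in parts:
        candidates = [part.offset]
        if not part.is_stream and part.xref_stm is not None:
            candidates.append(part.xref_stm)
        for off in candidates:
            # The associated stream may also be in `parts` itself;
            # skip any offset that has already been listed.
            if off not in seen:
                seen.add(off)
                result.append(off)
    return result
```

With a table at 500 whose XrefStm is 300, followed by that same stream at 300 and another table at 100, the result is `[500, 300, 100]`.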
2025-09-30 18:35:45 +01:00
ricflams
397ccb15d6 Add xref-streams tied to any parts, not just the first
On a large sample of pdf-files PdfPig failed to read the correct StructTree-object for about 1% of them. The StructTree object was simply missing in the CrossReferenceTable.CrossReferenceTable.
It turned out that the constructed CrossReferenceTable could miss Stream-parts if there were multiple Table-parts because a stream will only be added if it's associated with the very first Table-part. The remedy would seem to be to check for and add streams that are associated with any of the Table-parts, not just the first one.
On a sample of 72 files where this failed, this change fixed the StructTree for all of them.
2025-09-30 18:35:45 +01:00
BobLd
ca284e0cb9 Use pageFactoryCache.Clear() in Pages dispose and fix #1170 2025-09-28 17:18:00 +01:00
BobLd
b2f4ca8839 Add GetDescent() and GetAscent() methods to IFont, improve font matrix for TrueTypeSimpleFont and TrueTypeStandard14FallbackSimpleFont and add loose bounding box to Letter
2025-09-21 15:07:52 +01:00
BobLd
008959457a Expose letter's font via GetFont(), mark Font property as obsolete and use FontDetails instead
2025-09-20 17:11:38 +01:00
BobLd
a53d96cb73 Use record struct in FileHeaderOffset 2025-09-20 13:45:50 +01:00
EliotJones
efdedb9495 handle case where offsets are out of range
default to returning empty glyph where the offset is out of the
file length range, this fixes file 12623 where the truetype file
is completely broken
2025-09-14 15:26:12 +01:00
BobLd
eb906a776d Handle non-seekable stream by copying it into a memory stream and fix #1146 2025-09-14 14:42:59 +01:00
BobLd
44e638ee4d Add initial support to process CFF fonts contained inside a TrueType font 2025-09-14 11:32:32 +01:00
BobLd
304d7dde5a Use correct font matrix when transforming the width in Type 0 font and fix #1156 2025-09-14 08:22:58 +01:00
Eliot Jones
07df6fd740 read last line of ignore file (#1155)
* read last line of ignore file

- do not cancel other matrix jobs if one test fails
- read all lines of the ignore list even if it doesn't end with a newline
- add ignore list for 0008 and 0009

* support missing object numbers when brute-forcing

the file 10404 (ironically) references object 43 0 for its info dictionary,
but that object cannot be found. changes brute-force code so that objects can be
entirely missing

* fix test since document is now opened successfully but mediabox is broken
2025-09-13 16:57:35 +02:00
Eliot Jones
c96880ac61 handle case where xobjects use same key as fonts (#1154)
in document 10122 the font and xobject names are the same, so the
xobject overwrote references to the font for the page content; separate
the dictionaries
2025-09-13 16:49:24 +02:00
EliotJones
77db6c6b54 add test jobs for common crawl 0000 to 0007 2025-09-13 14:52:04 +01:00
EliotJones
e886ae648f copy other parser behavior by treating end of stream as a valid end of an inline image
this file contains corrupt content following an inline image, but other parsers
just treat this content as part of the image and parse the rest of the file
successfully
2025-09-13 14:36:14 +01:00
BobLd
c4f442c0cd Properly fix #1148 by always parsing optional tables in TrueTypeFontParser and remove Type 0 font hack 2025-09-13 12:48:20 +01:00
BobLd
0ef120dc5c Properly handle CompactFontFormatCidFont font matrix and fix #1149 2025-09-13 10:38:35 +01:00
BobLd
d5b97065bd Fix #1148 2025-09-13 10:38:35 +01:00
BobLd
22eab422a3 First create the StreamInputBytes in PdfDocument.Open() to check the stream CanRead and CanSeek
2025-09-09 19:12:58 +01:00
Eliot Jones
8408c98aec Draft release on master build (#1145)
* remove alpha postfix, releases will increment version

* update the master build job to draft a release

* add publish action to publish full release

* enable setting assembly and file version

* bump assembly and file version for package project

---------

Co-authored-by: BobLd <38405645+BobLd@users.noreply.github.com>
2025-09-08 20:07:36 +01:00
Eliot Jones
dd5aa46c75 File buffering read stream investigation (#1140)
* add test for filebufferingreadstream

* #1124 do not trust reported stream length if bytes can be read at end

the filebufferingreadstream input stream does not report more than the read
length. the change to seek the xref in a sliding window from the end broke
because it assumed the reported length was correct. here we switch to
reading the window, and continue reading if we can read beyond the stream's
initially reported length, while seeking the startxref marker

* remove rogue newlines
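A minimal Python sketch of not trusting a stream's reported length, for illustration only. It assumes the input is any binary file-like object, and `find_startxref` is a hypothetical name. The sketch reads forward until `read()` returns no data, keeps only a tail window, and then searches that window for the last `startxref`. This is a simpler forward-reading variant, not the seek-from-the-end sliding window described above.

```python
import io
from typing import BinaryIO, Optional


def find_startxref(stream: BinaryIO, window: int = 1024, chunk: int = 4096) -> Optional[int]:
    """Return the absolute offset of the last b'startxref', or None.

    The reported length is never consulted: reading continues until read()
    returns no data, and only the last `window` bytes are retained.
    """
    tail = b""
    consumed = 0  # total bytes actually read so far
    while True:
        data = stream.read(chunk)
        if not data:
            break
        consumed += len(data)
        tail = (tail + data)[-window:]
    idx = tail.rfind(b"startxref")
    if idx < 0:
        return None
    # `tail` starts at absolute offset consumed - len(tail).
    return consumed - len(tail) + idx


if __name__ == "__main__":
    sample = b"%PDF-1.7\nstartxref\n0\n%%EOF\n"
    print(find_startxref(io.BytesIO(sample)))  # prints 9
```

The trade-off is that the whole input is read once. That is acceptable for a non-seekable source such as FileBufferingReadStream, but it is wasteful for a large seekable file.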
2025-09-07 14:39:46 +01:00
BobLd
e4ed4d1b39 Add early version of IOSSystemFontLister
2025-09-02 19:53:12 +01:00