Carlo Kok
ce563db133
Clarify handling of optional content group names
...
Updated comments to clarify handling of optional content group names in PDFs.
nightly-latest
2025-11-23 10:52:21 +00:00
BobLd
37a5dffcaa
Skip in ReadCompositeGlyph when glyphIndex is out of range and fix #1213
2025-11-21 21:31:18 +00:00
EliotJones
aef0a78ee6
update release logic to check out master before commit
2025-11-14 09:50:07 +00:00
BobLd
005e52783e
Ensure no key end up missing in ResolveInternal and fix #1209
2025-11-13 14:53:28 +00:00
BobLd
f4e7db5b5b
Simply order by offset also when not doing brute force to fix #1208
2025-11-13 14:26:35 +00:00
BobLd
52ecef0e28
Increment version to 0.1.13
2025-11-13 09:34:17 +00:00
BobLd
599ce04bef
Add test that fails before revert of e11dc6b
v0.1.12
2025-11-07 22:31:43 +00:00
BobLd
2a6ee918b7
Revert "Avoid a lot of seeks by making most tokenizers no longer read to far by using seek."
...
This reverts commit e11dc6bf40.
2025-11-07 22:31:43 +00:00
BobLd
9d3cd0a429
Check for array size before slice in ColorSpaceDetailsByteConverter.Convert()
2025-11-07 20:02:39 +00:00
BobLd
b49e5aa697
Check for index out of range in GlyphDataTable.ReadFlags() and fix #1199
2025-11-07 19:39:18 +00:00
Alexander Vinnikov
181fa9d837
make transform stack consistent
2025-11-03 20:06:40 +00:00
BobLd
6ce6986d78
Update test run command to use Release configuration
2025-10-29 20:29:42 +00:00
BobLd
37c9fef20b
Do not slice the stream to the length breaks decoding in FlateDecode
2025-10-29 20:27:40 +00:00
Bert Huijben
e11dc6bf40
Avoid a lot of seeks by making most tokenizers no longer read to far by using seek.
...
Optimize the FirstPassParser to just fetch a final chunk before doing things char-by-char backwards.
2025-10-28 06:48:41 +00:00
BobLd
40bcc22ea1
Add CMap caching at document level and add MurmurHash3 hashing function
2025-10-26 16:20:27 +00:00
Bert Huijben
94d515061e
Update NameToUnicodeConvertAglSpecification to test what was intended
2025-10-25 19:47:56 +01:00
BobLd
3555521634
Fix regression introduced in 3592fc8 where slicing the stream to the length breaks decoding
2025-10-23 19:08:05 +01:00
Bert Huijben
6fba565d66
Avoid doing a true file seek for simple peeking the next char in the token parser
2025-10-20 06:33:34 +01:00
Bert Huijben
3592fc8438
Use zlib information to verify compressed content before using it
2025-10-15 18:46:36 +01:00
BobLd
c9034f991c
Only apply RemoveStridePadding() when bytes per pixel is one and fix #1183
2025-10-15 12:57:25 +01:00
BobLd
255e70f0a7
Set Type 3 font ascent to Top instead of Height, see #1164
2025-10-14 12:18:52 +01:00
BobLd
2216ade1f2
Trim excess in long lived font collections
2025-10-14 10:26:41 +01:00
BobLd
cf0c33b1e0
Improve DfsIterative() performance
2025-10-13 12:35:21 +01:00
BobLd
ffba176060
Improve GroupIndexes() performance with #1178
2025-10-13 12:35:21 +01:00
BobLd
b14f45f59f
Add more tests to NearestNeighbourWordExtractorTests
2025-10-12 19:51:04 +01:00
ricflams
c28d114b79
Guard against circular references in XRef tables/streams
...
- Detect and prevent an xref table/stream at a certain offset from being read twice; malformed xref tables with circular references could otherwise cause the table-reading to loop forever.
- Another approach could be to prevent TryReadTableAtOffset from changing the bytes' CurrentOffset to the lastObjPosition in its attempt to read a table (e.g. restore CurrentOffset after the attempt to read a table) so that the outer bytes-loop could continue its search through the entire bytes unaffected.
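The first approach above can be illustrated with a minimal Python sketch. It is not PdfPig's code: `read_xref_chain`, the `read_table_at` callback, and the `"prev"` key are hypothetical names used only to show the idea of remembering which offsets have already been read.

```python
def read_xref_chain(start_offset, read_table_at):
    """Follow 'prev' links between xref sections, reading each offset at most once.

    read_table_at is a hypothetical callback that returns a dict for the
    section at that offset, or None when nothing can be parsed there.
    """
    seen_offsets = set()
    tables = []
    offset = start_offset
    while offset is not None and offset not in seen_offsets:
        seen_offsets.add(offset)
        table = read_table_at(offset)
        if table is None:
            break
        tables.append(table)
        offset = table.get("prev")  # None ends the chain
    return tables


# A malformed chain that points back at itself: 100 -> 200 -> 100.
sections = {100: {"prev": 200}, 200: {"prev": 100}}
chain = read_xref_chain(100, sections.get)
```

Without the `seen_offsets` check, the loop over this sample input would never terminate; with it, each section is read once and the walk stops when the chain revisits offset 100.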
2025-10-01 06:32:38 +01:00
Richard Flamsholt
d7d01f842e
Update test Issue874: No longer missing a font
...
Including the stream-xref means that the formerly missing font is no longer missing, so simply run the two test-cases under the (stricter) assumption of SkipMissingFonts=false.
2025-09-30 18:35:45 +01:00
Richard Flamsholt
33a8d829ee
Update test Issue874: Also more text on page 2
...
Page two has four more characters, which are now detected thanks to this xref-stream fix.
2025-09-30 18:35:45 +01:00
Richard Flamsholt
57921c7e9b
Update test Issue874: Now finds more text on page 1
...
With the fix for including associated streams, this test now finds more text on the first page. I've verified using Aspose.PDF and by viewing the ErcotFacts.pdf file being tested that yes, it was indeed missing part of the text before.
2025-09-30 18:35:45 +01:00
ricflams
5a6b3970f0
Add table-xref's associated stream-xrefs
...
- If an XrefTable has an associated stream, as indicated via the XrefStm-property, then read and add that XrefStream
- Any table can have 0 or 1 such associated streams
- A caveat: such an associated stream might also theoretically be part of the Parts-sequence in which case it would be encountered both by looping through all those parts along with all the regular tables and now also by association to any of those tables. It doesn't seem harmful since the offsets are flattened eventually anyway and stored by their offset-key into a mapping-table.
2025-09-30 18:35:45 +01:00
ricflams
397ccb15d6
Add xref-streams tied to any parts, not just the first
...
On a large sample of pdf-files PdfPig failed to read the correct StructTree-object for about 1% of them. The StructTree object was simply missing in the CrossReferenceTable.CrossReferenceTable.
It turned out that the constructed CrossReferenceTable could miss Stream-parts if there were multiple Table-parts because a stream will only be added if it's associated with the very first Table-part. The remedy would seem to be to check for and add streams that are associated with any of the Table-parts, not just the first one.
On a sample of 72 files where this failed, this change fixed the StructTree for all of them.
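A rough Python sketch of the remedy, under assumed data shapes: each table part is a dict with an `"objects"` mapping and an optional `"xref_stm"` offset, and `streams_by_offset` maps offsets to stream parts. These names are illustrative and do not mirror PdfPig's types.

```python
def collect_parts(tables, streams_by_offset):
    """Gather every table part plus the stream tied to it, for all tables."""
    parts = []
    for table in tables:  # every table part, not only tables[0]
        parts.append(table)
        stream_offset = table.get("xref_stm")
        if stream_offset in streams_by_offset:
            parts.append(streams_by_offset[stream_offset])
    return parts


def flatten(parts):
    """Merge all parts into one object-key -> offset mapping."""
    offsets = {}
    for part in parts:
        offsets.update(part["objects"])
    return offsets


tables = [
    {"objects": {1: 10}, "xref_stm": None},
    {"objects": {2: 20}, "xref_stm": 500},  # stream tied to the second table
]
streams_by_offset = {500: {"objects": {3: 30}}}
merged = flatten(collect_parts(tables, streams_by_offset))
```

In this sample, object 3 is only reachable through the stream tied to the second table, which is the case a first-table-only check would miss. Precedence between parts that define the same object is left out of the sketch.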
2025-09-30 18:35:45 +01:00
BobLd
ca284e0cb9
Use pageFactoryCache.Clear() in Pages dispose and fix #1170
2025-09-28 17:18:00 +01:00
BobLd
b2f4ca8839
Add GetDescent() and GetAscent() methods to IFont, improve font matrix for TrueTypeSimpleFont and TrueTypeStandard14FallbackSimpleFont and add loose bounding box to Letter
2025-09-21 15:07:52 +01:00
BobLd
008959457a
Expose letter's font via GetFont(), make Font property as obsolete and use FontDetails instead
2025-09-20 17:11:38 +01:00
BobLd
a53d96cb73
Use record struct in FileHeaderOffset
2025-09-20 13:45:50 +01:00
EliotJones
efdedb9495
handle case where offsets are out of range
...
default to returning empty glyph where the offset is out of the
file length range, this fixes file 12623 where the truetype file
is completely broken
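The fallback described here can be sketched in Python as a bounds check before slicing. `read_glyph` and `EMPTY_GLYPH` are hypothetical names, and real glyph records carry more structure than raw bytes.

```python
EMPTY_GLYPH = b""  # stand-in for an empty glyph record


def read_glyph(font_data, offset, length):
    """Return the glyph bytes, or an empty glyph if the range leaves the file."""
    if offset < 0 or length < 0 or offset + length > len(font_data):
        return EMPTY_GLYPH
    return font_data[offset:offset + length]
```

Returning a default instead of raising lets the rest of a badly broken font still be processed.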
2025-09-14 15:26:12 +01:00
BobLd
eb906a776d
Handle non seekable stream by copying it into a memory stream and fix #1146
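As a language-neutral illustration of the technique named in this commit title, here is a small Python sketch; `ensure_seekable` is a hypothetical helper, not a PdfPig API.

```python
import io


def ensure_seekable(stream):
    """Return the stream itself if it can seek, else an in-memory copy of it."""
    if stream.seekable():
        return stream
    # Drain the forward-only stream once; the copy supports random access.
    return io.BytesIO(stream.read())


class ForwardOnly(io.BytesIO):
    """Test double that reports itself as non-seekable."""

    def seekable(self):
        return False


wrapped = ensure_seekable(ForwardOnly(b"%PDF-1.7"))
```

The trade-off is memory: the whole input is buffered, which is acceptable for typical documents but worth keeping in mind for very large ones.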
2025-09-14 14:42:59 +01:00
BobLd
44e638ee4d
Add initial support to process CFF fonts contained inside a TrueType font
2025-09-14 11:32:32 +01:00
BobLd
304d7dde5a
Use correct font matrix when transforming the width in Type 0 font and fix #1156
2025-09-14 08:22:58 +01:00
Eliot Jones
07df6fd740
read last line of ignore file (#1155)
...
* read last line of ignore file
- do not cancel other matrix jobs if one test fails
- read all lines of the ignore list even if it doesn't end with a newline
- add ignore list for 0008 and 0009
* support missing object numbers when brute-forcing
the file 10404 (ironically) contains not found references with number 43 0
for its info dictionary. changes brute-force code so that objects can be
entirely missing
* fix test since document is now opened successfully but mediabox is broken
2025-09-13 16:57:35 +02:00
Eliot Jones
c96880ac61
handle case where xobjects use same key as fonts (#1154)
...
in document 10122 the font and xobject names are the same so the
xobject overwrote references to the font for the page content, separate
the dictionaries
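The name collision and its fix can be shown with a small Python sketch. `PageResources` and its fields are hypothetical and only illustrate keeping one namespace per resource type.

```python
class PageResources:
    """Keeps fonts and XObjects in separate maps so equal names cannot clash."""

    def __init__(self):
        self.fonts = {}
        self.xobjects = {}

    def add_font(self, name, reference):
        self.fonts[name] = reference

    def add_xobject(self, name, reference):
        self.xobjects[name] = reference


resources = PageResources()
resources.add_font("R1", "font object 12 0")
resources.add_xobject("R1", "xobject 40 0")  # same name, different namespace
```

With a single shared dictionary, the second call would have replaced the font entry for `R1`; with two dictionaries both lookups keep working.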
2025-09-13 16:49:24 +02:00
EliotJones
77db6c6b54
add test jobs for common crawl 0000 to 0007
2025-09-13 14:52:04 +01:00
EliotJones
e886ae648f
copy other parser behavior by treating end of stream as valid end inline image
...
this file contains corrupt content following an inline image but other parsers
just treat this content as part of the image and parse the rest of the file
successfully
2025-09-13 14:36:14 +01:00
BobLd
c4f442c0cd
Properly fix #1148 by always parsing optional tables in TrueTypeFontParser and remove Type 0 font hack
2025-09-13 12:48:20 +01:00
BobLd
0ef120dc5c
Properly handle CompactFontFormatCidFont font matrix and fix #1149
2025-09-13 10:38:35 +01:00
BobLd
d5b97065bd
Fix #1148
2025-09-13 10:38:35 +01:00
BobLd
22eab422a3
First create the StreamInputBytes in PdfDocument.Open() to check the stream CanRead and CanSeek
2025-09-09 19:12:58 +01:00
Eliot Jones
8408c98aec
Draft release on master build (#1145)
...
* remove alpha postfix, releases will increment version
* update the master build job to draft a release
* add publish action to publish full release
* enable setting assembly and file version
* bump assembly and file version for package project
---------
Co-authored-by: BobLd <38405645+BobLd@users.noreply.github.com>
2025-09-08 20:07:36 +01:00
Eliot Jones
dd5aa46c75
File buffering read stream investigation (#1140)
...
* add test for filebufferingreadstream
* #1124 do not trust reported stream length if bytes can be read at end
the filebufferingreadstream input stream does not report more than the read
length. the change to seek the xref in a sliding window from the end broke
with the assumption that the reported length was correct. here we switch to
reading the window or continue reading if we can read beyond the stream's
initially reported length while seeking the startxref marker
* remove rogue newlines
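The idea in the #1124 item above, not trusting the reported length while looking for the startxref marker, can be sketched in Python as follows. `find_startxref`, `reported_length`, and `window` are assumed names, and the sketch reads to the real end of the stream in one call instead of sliding a window repeatedly.

```python
import io


def find_startxref(stream, reported_length, window=1024):
    """Locate the last 'startxref' marker, even past the reported length."""
    start = max(0, reported_length - window)
    stream.seek(start)
    # read() with no size keeps going to the real end of the stream,
    # which may lie beyond reported_length.
    tail = stream.read()
    index = tail.rfind(b"startxref")
    return None if index < 0 else start + index


data = b"%PDF-1.7\n" + b"x" * 50 + b"startxref\n123\n%%EOF"
# Pretend the stream under-reported its length as 40 bytes.
position = find_startxref(io.BytesIO(data), reported_length=40, window=16)
```

A marker that begins before the window start would be missed by this simplified version; a fuller implementation would widen the window and retry.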
2025-09-07 14:39:46 +01:00
BobLd
e4ed4d1b39
Add early version of IOSSystemFontLister
2025-09-02 19:53:12 +01:00