Commit Graph

  • 7c4f5e2424 Introduce StackDepthGuard class to check for stack depth in CoreTokenScanner and fix #1217 master nightly-latest BobLd 2025-12-02 12:05:43 +00:00
  • bd573b2a1c Increment version to 0.1.14 BobLd 2025-12-23 10:48:11 +00:00
  • baeac0d0c3 Do not return glyph bbox and path in Type1Font if character name is '.notdef' 0.1.13 BobLd 2025-12-22 10:50:47 +00:00
  • ee0cb1dc4a Use file header offset when doing brute force find and fix #1223 BobLd 2025-12-07 13:28:32 +00:00
  • c70b343caa Minor Type1FontParser optimisations BobLd 2025-12-02 15:01:51 +00:00
  • ce563db133 Clarify handling of optional content group names Carlo Kok 2025-11-22 07:34:35 +01:00
  • 37a5dffcaa Skip in ReadCompositeGlyph when glyphIndex is out of range and fix #1213 BobLd 2025-11-21 21:19:36 +00:00
  • aef0a78ee6 update release logic to check out master before commit EliotJones 2025-11-13 21:07:41 -04:00
  • 3174289a14 update release logic to check out master before commit release-actions-fix-1 EliotJones 2025-11-13 21:07:41 -04:00
  • 005e52783e Ensure no key end up missing in ResolveInternal and fix #1209 BobLd 2025-11-13 14:30:03 +00:00
  • f4e7db5b5b Simply order by offset also when not doing brute force to fix #1208 BobLd 2025-11-13 13:26:24 +00:00
  • 52ecef0e28 Increment version to 0.1.13 BobLd 2025-11-13 09:17:02 +00:00
  • 599ce04bef Add test that fails before revert of e11dc6b v0.1.12 BobLd 2025-11-07 22:18:53 +00:00
  • 2a6ee918b7 Revert "Avoid a lot of seeks by making most tokenizers no longer read to far by using seek." BobLd 2025-11-07 22:13:19 +00:00
  • 9d3cd0a429 Check for array size before slice in ColorSpaceDetailsByteConverter.Convert() BobLd 2025-11-07 19:43:26 +00:00
  • b49e5aa697 Check for index out of range in GlyphDataTable.ReadFlags() and fix #1199 BobLd 2025-11-07 19:03:08 +00:00
  • 181fa9d837 make transform stack consistent Alexander Vinnikov 2025-10-23 15:50:22 +02:00
  • 6ce6986d78 Update test run command to use Release configuration BobLd 2025-10-29 20:25:49 +00:00
  • 37c9fef20b Do not slice the stream to the length breaks decoding in FlateDecode BobLd 2025-10-29 20:05:47 +00:00
  • e11dc6bf40 Avoid a lot of seeks by making most tokenizers no longer read to far by using seek. Bert Huijben 2025-10-16 11:36:49 +02:00
  • 40bcc22ea1 Add CMap caching at document level and add MurmurHash3 hashing function BobLd 2025-10-26 15:36:42 +00:00
  • 94d515061e Update NameToUnicodeConvertAglSpecification to test what was intended Bert Huijben 2025-10-18 18:58:08 +02:00
  • 3555521634 Fix regression introduced in 3592fc8 where slicing the stream to the length breaks decoding BobLd 2025-10-23 18:53:20 +01:00
  • 6fba565d66 Avoid doing a true file seek for simple peeking the next char in the token parser Bert Huijben 2025-10-16 10:32:14 +02:00
  • 3592fc8438 Use zlib information to verify compressed content before using it Bert Huijben 2025-10-15 16:41:06 +02:00
  • c9034f991c Only apply RemoveStridePadding() when bytes per pixel is one and fix #1183 BobLd 2025-10-14 17:40:40 +01:00
  • 255e70f0a7 Set Type 3 font ascent to Top instead of Height, see #1164 BobLd 2025-10-14 11:58:15 +01:00
  • 2216ade1f2 Trim excess in long lived font collections BobLd 2025-10-14 10:08:50 +01:00
  • cf0c33b1e0 Improve DfsIterative() performance BobLd 2025-10-12 21:21:17 +01:00
  • ffba176060 Improve GroupIndexes() performance with #1178 BobLd 2025-10-12 20:17:34 +01:00
  • b14f45f59f Add more tests to NearestNeighbourWordExtractorTests BobLd 2025-10-12 19:34:47 +01:00
  • c28d114b79 Guard against circular references in XRef tables/streams ricflams 2025-10-01 01:25:36 +02:00
  • d7d01f842e Update test Issue874: No longer missing a font Richard Flamsholt 2025-09-30 18:00:47 +02:00
  • 33a8d829ee Update test Issue874: Also more text on page 2 Richard Flamsholt 2025-09-30 16:18:07 +02:00
  • 57921c7e9b Update test Issue874: Now finds more text on page 1 Richard Flamsholt 2025-09-30 15:54:09 +02:00
  • 5a6b3970f0 Add table-xref's associated stream-xrefs ricflams 2025-09-29 17:46:52 +02:00
  • 397ccb15d6 Add xref-streams tied to any parts, not just the first ricflams 2025-09-26 20:36:43 +02:00
  • ca284e0cb9 Use pageFactoryCache.Clear() in Pages dispose and fix #1170 BobLd 2025-09-28 14:34:34 +01:00
  • b2f4ca8839 Add GetDescent() and GetAscent() methods to IFont, improve font matrix for TrueTypeSimpleFont and TrueTypeStandard14FallbackSimpleFont and add loose bounding box to Letter BobLd 2025-09-21 14:27:57 +01:00
  • 008959457a Expose letter's font via GetFont(), make Font property as obsolete and use FontDetails instead BobLd 2025-09-20 13:46:25 +01:00
  • a53d96cb73 Use record struct in FileHeaderOffset BobLd 2025-09-14 16:34:43 +01:00
  • efdedb9495 handle case where offsets are out of range EliotJones 2025-09-14 16:00:10 +02:00
  • 1b5927b2a5 handle case where offsets are out of range file-12623-true-type-error-handling EliotJones 2025-09-14 16:00:10 +02:00
  • eb906a776d Handle non seekable stream by copying it into a memory stream and fix #1146 BobLd 2025-09-14 08:43:06 +01:00
  • 44e638ee4d Add initial support to process CFF fonts contained inside a TrueType font BobLd 2025-09-14 11:17:33 +01:00
  • 304d7dde5a Use correct font matrix when transforming the width in Type 0 font and fix #1156 BobLd 2025-09-14 08:13:22 +01:00
  • 07df6fd740 read last line of ignore file (#1155) Eliot Jones 2025-09-13 16:57:35 +02:00
  • c96880ac61 handle case where xobjects use same key as fonts (#1154) Eliot Jones 2025-09-13 16:49:24 +02:00
  • 14af2a3858 fix test since document is now opened successfully but mediabox is broken fix-common-crawl-action EliotJones 2025-09-13 16:48:39 +02:00
  • 853ce8b93e support missing object numbers when brute-forcing EliotJones 2025-09-13 16:27:09 +02:00
  • c57cd5008b read last line of ignore file EliotJones 2025-09-13 16:20:01 +02:00
  • 472fcea60e handle case where xobjects use same key as fonts fix-10122-dictionary-conflicts EliotJones 2025-09-13 15:52:16 +02:00
  • 77db6c6b54 add test jobs for common crawl 0000 to 0007 EliotJones 2025-09-13 15:28:36 +02:00
  • e886ae648f copy other parser behavior by treating end of stream as valid end inline image EliotJones 2025-09-13 14:58:07 +02:00
  • c4f442c0cd Properly fix #1148 by always parsing optional tables in TrueTypeFontParser and remove Type 0 font hack BobLd 2025-09-13 12:28:12 +01:00
  • 0ef120dc5c Properly handle CompactFontFormatCidFont font matrix and fix #1149 BobLd 2025-09-13 10:24:15 +01:00
  • d5b97065bd Fix #1148 BobLd 2025-09-13 09:43:14 +01:00
  • 22eab422a3 First create the StreamInputBytes in PdfDocument.Open() to check the stream CanRead and CanSeek BobLd 2025-09-09 19:03:17 +01:00
  • 8408c98aec Draft release on master build (#1145) Eliot Jones 2025-09-08 21:07:36 +02:00
  • dd5aa46c75 File buffering read stream investigation (#1140) Eliot Jones 2025-09-07 15:39:46 +02:00
  • e4ed4d1b39 Add early version of IOSSystemFontLister BobLd 2025-09-02 19:35:26 +01:00
  • 0afe021ad3 move file parsing to single-pass static methods (#1102) Eliot Jones 2025-09-02 20:41:00 +02:00
  • 3650e27432 add container node support for BookmarksProvider.cs (#1133) Karl 2025-08-15 04:17:58 +08:00
  • a43b968ea9 Lower max search depth in preventing StackOverflow in ParseTrailer BobLd 2025-08-10 09:17:23 +01:00
  • 1031dcc221 Prevent StackOverflow in ParseTrailer and fix #1122 BobLd 2025-08-08 19:48:49 +01:00
  • 0f641774e6 Update build_and_test_macos.yml BobLd 2025-08-09 08:33:21 +01:00
  • a3edc926c8 Update build_and_test_macos.yml BobLd 2025-08-09 08:20:54 +01:00
  • f1923fcbcd Increase FlateFilter multiplier when preventing malicious OOM and fix #1125 BobLd 2025-08-08 18:52:09 +01:00
  • 7ff58893af only run tests if nightly publish needed EliotJones 2025-08-04 21:46:13 -05:00
  • bee6f13888 fix tag fetching and parse behavior EliotJones 2025-08-04 21:40:28 -05:00
  • e6dd2d15c2 use gemini to mark ched gpt's work and improve the action EliotJones 2025-08-04 21:00:12 -05:00
  • 7dd5d68be3 prevent duplicate package publish on manual run, attempt 1 EliotJones 2025-08-04 20:49:18 -05:00
  • bdf3b8e2b4 Update nightly_release.yml BobLd 2025-08-03 20:02:38 +01:00
  • c8dff885bd Update run_common_crawl_tests.yml BobLd 2025-08-03 08:56:05 +01:00
  • 0b228c57b7 Update run_integration_tests.yml BobLd 2025-08-03 08:52:14 +01:00
  • ef21227b3c Update run_integration_tests.yml BobLd 2025-08-03 08:46:24 +01:00
  • b9f2230a0a Add global.json in tools BobLd 2025-08-03 08:42:38 +01:00
  • b6950a5fb0 Update run_integration_tests.yml (#1117) BobLd 2025-08-03 08:34:50 +01:00
  • 1ed9e017f4 Performance improvements and .Net 9 support (#1116) Chuck B. 2025-08-01 17:24:16 -04:00
  • 83d6fc6cc2 allow missing catalog type definition for catalog dictionary EliotJones 2025-07-26 16:55:20 -05:00
  • febfa4d4b3 Fix usage of List.Contains theolivenbaum 2025-07-26 21:42:40 +02:00
  • 0ebbe0540d add nullability to core projec (#1111) Eliot Jones 2025-07-26 20:48:58 -05:00
  • 3131dae49e passing -r will repeat parsing the set n times, -f will run a single file support-console-runner-single-file-and-repeats EliotJones 2025-07-26 17:30:43 -05:00
  • 52c0635273 support performance profiling information in console runner EliotJones 2025-07-26 15:04:03 -05:00
  • dcb2c2bbe2 Merge branch 'master' into nullability-core-project nullability-core-project Eliot Jones 2025-07-26 13:44:58 -05:00
  • b6bd0a3169 bump version to 0.1.12-alpha001 EliotJones 2025-07-26 13:43:28 -05:00
  • 3d2e12cb16 version 0.1.11 v0.1.11 EliotJones 2025-07-26 13:16:01 -05:00
  • ef5366f117 add nullability to core projec EliotJones 2025-07-26 13:05:18 -05:00
  • 9cb3b71e62 update readme to avoid people using page.Text or asking about editing docs (#1109) Eliot Jones 2025-07-26 12:58:35 -05:00
  • 27df4af5f9 handle additional broken pdf files in the common crawl set EliotJones 2025-07-26 12:32:41 -05:00
  • 6b764e7c20 rogue tab rewrite-readme EliotJones 2025-07-26 12:51:40 -05:00
  • fed46965ff tabs to spaces EliotJones 2025-07-26 12:51:02 -05:00
  • 03fa53ece0 update readme to avoid people using page.Text or asking about editing docs EliotJones 2025-07-26 12:49:42 -05:00
  • 21f1cd5354 handle additional broken pdf files in the common crawl set additional-resilience-checks EliotJones 2025-07-26 12:32:41 -05:00
  • 50f878b2ba restore copy link func logic EliotJones 2025-07-24 19:00:53 -05:00
  • 2a10b6c285 make link copying more tolerant when adding page EliotJones 2025-07-23 20:56:26 -05:00
  • 85fc63d585 rework numeric tokenizer hot path EliotJones 2025-07-24 21:16:25 -05:00
  • 45cf9745a3 rework numeric tokenizer hot path improve-numeric-tokenizer-performance EliotJones 2025-07-24 21:16:25 -05:00
  • f8af7daeeb restore copy link func logic explore-issue-1082 EliotJones 2025-07-24 19:00:53 -05:00
  • d7f0fd96d8 make link copying more tolerant when adding page EliotJones 2025-07-23 20:56:26 -05:00