65 Commits

Author SHA1 Message Date
Eliot Jones
07df6fd740 read last line of ignore file (#1155)
* read last line of ignore file

- do not cancel other matrix jobs if one test fails
- read all lines of the ignore list even if it doesn't end with a newline
- add ignore list for 0008 and 0009

* support missing object numbers when brute-forcing

the file 10404 (ironically) references object 43 0 for its info dictionary,
and that object cannot be found. this changes the brute-force code so that
objects can be entirely missing

* fix test since document is now opened successfully but mediabox is broken
2025-09-13 16:57:35 +02:00
EliotJones
77db6c6b54 add test jobs for common crawl 0000 to 0007 2025-09-13 14:52:04 +01:00
Eliot Jones
8408c98aec Draft release on master build (#1145)
* remove alpha postfix, releases will increment version

* update the master build job to draft a release

* add publish action to publish full release

* enable setting assembly and file version

* bump assembly and file version for package project

---------

Co-authored-by: BobLd <38405645+BobLd@users.noreply.github.com>
2025-09-08 20:07:36 +01:00
Eliot Jones
0afe021ad3 move file parsing to single-pass static methods (#1102)
* move file parsing to single-pass static methods

for the file 0002973.pdf in the test corpus we need to completely overhaul
how initial xref parsing is done since we need to locate the xref stream by
brute-force and this is currently broken. i wanted to take this opportunity to
change the logic to be more imperative and less like the pdfbox methods with
instance data and classes.

currently the logic is split between the xref offset validator and parser methods
and we call the validator logic twice, followed by brute-force searching again
in the actual parser. we're going to move to a single method that performs
the following steps:

1. find the first (from the end) occurrence of "startxref" and pull out the location
in bytes. this will also support "startref" since some files in the wild have that
2. go to that offset if found and parse the chain of tables or streams by /prev
reference
3. if any element in step 2 fails then we perform a single brute-force over the
entire file and, like pdfbox, treat xrefs found later in the file as the ultimate
arbiter of the object positions. while we do this we can potentially capture the
actual object offsets since the xref positions are probably incorrect too.

the aim with this is to avoid as much seeking and re-reading of bytes as
possible. while this won't technically be single-pass it gets us much closer. it
also removes the more strict logic requiring a "startxref" token to exist and be
valid, since we can repair this by brute-force anyway.

we will surface as much information as possible from the static method so that
we could in future support an object explorer ui for pdfs.

this will also be more resilient to invalid xref formats with e.g. comment tokens
or missing newlines.
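
a minimal sketch of step 1 above, written in python rather than the project's own language. the helper name `find_start_xref_offset` and the regular expression are assumptions for illustration and are not PdfPig's api. for simplicity it scans forward and keeps the last match instead of reading backwards from the end of the file.

```python
import re
from typing import Optional

# Hypothetical pattern: matches "startxref" and the misspelled "startref",
# followed by whitespace and a decimal byte offset.
_START_XREF = re.compile(rb"startx?ref\s+(\d+)")


def find_start_xref_offset(data: bytes) -> Optional[int]:
    """Return the offset named by the last startxref marker, or None."""
    last = None
    for match in _START_XREF.finditer(data):
        last = match  # keep overwriting so the final occurrence wins
    if last is None:
        return None
    offset = int(last.group(1))
    # An offset outside the file cannot be followed, so report it as missing
    # and leave recovery to the brute-force pass (step 3).
    return offset if offset < len(data) else None
```

a caller would treat `None` as the signal to skip step 2 and go straight to the brute-force search.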

* move more parsing to the static classes

* plumb through the new parsing results

* plug in new parser and remove old classes, port tests to new classes

* update tests to reflect logic changes

* apply correction when file header has offset

* ignore console runner launch settings

* skip offsets outside of file bounds

* fix parsing tables missing a line break

* use brute forced locations if they're already present

* only treat line breaks and spaces as whitespace for stream content

* address review comments

---------

Co-authored-by: BobLd <38405645+BobLd@users.noreply.github.com>
2025-09-02 19:41:00 +01:00
BobLd
b9f2230a0a Add global.json in tools 2025-08-03 08:43:58 +01:00
EliotJones
52c0635273 support performance profiling information in console runner 2025-07-26 15:04:03 -05:00
EliotJones
b6bd0a3169 bump version to 0.1.12-alpha001 2025-07-26 13:43:28 -05:00
EliotJones
3d2e12cb16 version 0.1.11 2025-07-26 13:16:01 -05:00
BobLd
a4a0fe220a Bump version to 0.1.11-alpha001
2025-03-08 13:42:57 +00:00
BobLd
d36e9a900f version 0.1.10 2025-03-08 13:00:43 +00:00
BobLd
c4a235fb62 Update Microsoft NuGet packages for UglyToad.PdfPig.Package
2025-02-22 12:55:11 +00:00
BobLd
92d3439465 Update UglyToad.PdfPig.ConsoleRunner target framework to net8
2025-01-14 22:00:18 +00:00
BobLd
7ec4e692a9 Fix "Nightly Release" pipeline following csproj changes 2024-12-15 10:58:38 +00:00
Eliot Jones
c46722fa26 version 0.1.9
2024-10-06 15:40:24 +01:00
Richard Webb
995f287e0d Update the dependencies in UglyToad.PdfPig.Package (#835) 2024-05-07 20:21:04 +01:00
Jason Nelson
a412a239be Enable nullable annotations (#803)
* Enable nullable annotations

* Remove unused JetBrains annotations

* Ensure system using statements are first

* Improve nullability annotations

* Annotate encryptionDictionary is non-null when IsEncrypted is true

* Disable nullable for PdfTokenScanner.Get

* Improve nullability annotations for ObjectLocationProvider.TryGetCached

* Revert changes to RGBWorkingSpace

* Update UglyToad.PdfPig.Package with new framework targets (fixes nightly builds)
2024-03-17 18:51:40 +00:00
Eliot Jones
88a148374e add script to easily target a single framework 2024-01-10 21:34:35 +00:00
Eliot Jones
fcf09ac6b3 move nightly builds back to main feed 2023-08-05 16:22:55 +01:00
Eliot Jones
66248f19e4 include readme in package 2023-06-05 22:01:04 +01:00
Eliot Jones
2366d22719 move nightly builds to separate package id
while nightly builds are useful they also cause
a large amount of spam on the main project
nuget. here we try to change the package id
so that it will be hosted as a separate package
while having all the same code and namespaces
this means people can opt into the nightly builds
while keeping the version history of the released
package tidy.

no idea if this will work because actions and
yaml is my idea of hell and is impossible to
debug, but let's give it a go
2023-06-05 21:48:02 +01:00
Eliot Jones
2be4c69c16 set nightly release version
also toggles class back to public for nightly
versions #538
2023-06-05 21:37:34 +01:00
Eliot Jones
16ac297d10 version 0.1.8 2023-06-05 21:36:00 +01:00
Eliot Jones
c68c195ea8 update package target frameworks 2023-05-18 19:59:58 +01:00
Eliot Jones
23aeb66566 bump integration test runner version 2023-05-17 20:34:10 +01:00
BobLd
58b3394d01 Setting C# version to latest in Examples and ConsoleRunner projects (#582) 2023-03-25 19:15:01 +00:00
Eliot Jones
d2944e14e5 change alpha version for nightly builds 2023-01-08 12:01:48 -05:00
Eliot Jones
03692cf42f set version to alpha of 0.1.7 for future nightly builds 2022-04-25 10:06:46 -04:00
Eliot Jones
5597a8f38c version 0.1.6 2022-04-25 09:22:47 -04:00
Eliot Jones
d7898d851c add net 6 as a target framework and dual target tests 2022-04-03 14:48:50 -04:00
Eliot Jones
7ed985a023 move console runner to named file and clean output then run in action 2022-01-11 11:27:50 +00:00
Eliot Jones
a538aaf0de bump version so that nightly builds for pre-release have correct version 2022-01-10 12:40:47 +00:00
Eliot Jones
4c36f84a0d version 0.1.5 2021-09-17 11:16:43 -04:00
Eliot Jones
7176fc8814 nightly push action rework v10 2021-08-16 21:46:23 -04:00
Eliot Jones
e1716e0393 nightly push action rework v9 2021-08-16 21:40:34 -04:00
Eliot Jones
c82287119e nightly push action rework v7 2021-08-16 07:11:39 -04:00
Eliot Jones
50dc67c2ca nightly push action rework v5
include alpha in the generated nightly package version
i don't know which logic nuget uses to detect pre-release
so it's safer to include alpha in the nightly build version.
2021-08-14 13:21:27 -04:00
Eliot Jones
165bc74c53 try creating a nightly push to nuget script v1
this is intended to create nightly builds with package
versions of the form x.y.z-yyyyMMdd.sha and
push them to nuget automatically, only where there
are new changes per day. since devops is a nightmare
and entirely untestable we will have to go through 20
iterations to actually get this to work
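
a minimal sketch of the version format described above, in python for illustration only. the function name `nightly_version` and the choice to shorten the commit hash to 7 characters are assumptions, not taken from the actual build script.

```python
from datetime import date


def nightly_version(base: str, build_date: date, sha: str) -> str:
    """Build a version string of the form x.y.z-yyyyMMdd.sha."""
    # Assumption: the hash is shortened to its first 7 characters.
    return f"{base}-{build_date:%Y%m%d}.{sha[:7]}"


# Hypothetical inputs, chosen only to show the shape of the result.
example = nightly_version("1.2.3", date(2021, 8, 14), "abcdef1234")
```

with those inputs `example` is `1.2.3-20210814.abcdef1`.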
2021-08-14 13:01:46 -04:00
Eliot Jones
ea4889778d support specifying number of documents for console runner 2021-06-02 13:45:12 -04:00
Eliot Jones
16d26effc5 0.1.5-alpha002 2021-05-09 13:05:54 -04:00
Eliot Jones
ea9c2f045c change console runner to target net core 2.1 2021-03-02 12:24:50 -04:00
Eliot Jones
3437b48925 actions build and test v003
also adds a console runner for testing arbitrary files
2021-02-28 17:03:32 -04:00
Eliot Jones
b930924b9b 0.1.5-alpha001 2021-02-28 13:55:25 -04:00
Eliot Jones
237fd96f9e version 0.1.4 2020-11-29 14:02:42 -04:00
Eliot Jones
ad0fb4ec5b version 0.1.3 2020-11-15 12:08:37 -04:00
Eliot Jones
58ecfbf963 0.1.3-alpha001 2020-09-04 13:19:03 +01:00
Eliot Jones
6359ba5df1 handle objects without endobj markers #198 2020-08-21 18:15:30 +01:00
Eliot Jones
98af575ee3 version 0.1.2 2020-07-04 16:55:14 +01:00
Eliot Jones
5fb04582a7 0.1.2-alpha003 2020-06-20 12:54:31 +01:00
Eliot Jones
256c2833ab 0.1.2-alpha002 2020-05-10 16:36:14 +01:00
Eliot Jones
98dd736f94 0.1.2-alpha001 2020-04-25 15:20:07 +01:00