PdfPig

lsm/PdfPig

mirror of https://github.com/UglyToad/PdfPig.git synced 2025-10-07 16:14:03 +08:00

Author	SHA1	Message	Date
Eliot Jones	07df6fd740	read last line of ignore file (#1155) * read last line of ignore file - do not cancel other matrix jobs if one test fails - read all lines of the ignore list even if it doesn't end with a newline - add ignore list for 0008 and 0009 * support missing object numbers when brute-forcing the file 10404 (ironically) contains not found references with number 43 0 for its info dictionary. changes brute-force code so that objects can be entirely missing * fix test since document is now opened successfully but mediabox is broken	2025-09-13 16:57:35 +02:00
EliotJones	77db6c6b54	add test jobs for common crawl 0000 to 0007	2025-09-13 14:52:04 +01:00
Eliot Jones	8408c98aec	Draft release on master build (#1145) * remove alpha postfix, releases will increment version * update the master build job to draft a release * add publish action to publish full release * enable setting assembly and file version * bump assembly and file version for package project --------- Co-authored-by: BobLd <38405645+BobLd@users.noreply.github.com>	2025-09-08 20:07:36 +01:00
Eliot Jones	0afe021ad3	move file parsing to single-pass static methods (#1102) * move file parsing to single-pass static methods for the file 0002973.pdf in the test corpus we need to completely overhaul how initial xref parsing is done since we need to locate the xref stream by brute-force and this is currently broken. i wanted to take this opportunity to change the logic to be more imperative and less like the pdfbox methods with instance data and classes. currently the logic is split between the xref offset validator and parser methods and we call the validator logic twice, followed by brute-force searching again in the actual parser. we're going to move to a single method that performs the following steps: 1. find the first (from the end) occurrence of "startxref" and pull out the location in bytes. this will also support "startref" since some files in the wild have that 2. go to that offset if found and parse the chain of tables or streams by /prev reference 3. if any element in step 2 fails then we perform a single brute-force over the entire file and like pdfbox treat later in file-length xrefs as the ultimate arbiter of the object positions. while we do this we potentially can capture the actual object offsets since the xref positions are probably incorrect too. the aim with this is to avoid as much seeking and re-reading of bytes as possible. while this won't technically be single-pass it gets us much closer. it also removes the more strict logic requiring a "startxref" token to exist and be valid, since we can repair this by brute-force anyway. we will surface as much information as possible from the static method so that we could in future support an object explorer ui for pdfs. this will also be more resilient to invalid xref formats with e.g. comment tokens or missing newlines. * move more parsing to the static classes * plumb through the new parsing results * plug in new parser and remove old classes, port tests to new classes * update tests to reflect logic changes * apply correction when file header has offset * ignore console runner launch settings * skip offsets outside of file bounds * fix parsing tables missing a line break * use brute forced locations if they're already present * only treat line breaks and spaces as whitespace for stream content * address review comments --------- Co-authored-by: BobLd <38405645+BobLd@users.noreply.github.com>	2025-09-02 19:41:00 +01:00
BobLd	b9f2230a0a	Add global.json in tools	2025-08-03 08:43:58 +01:00
EliotJones	52c0635273	support performance profiling information in console runner	2025-07-26 15:04:03 -05:00
EliotJones	b6bd0a3169	bump version to 0.1.12-alpha001	2025-07-26 13:43:28 -05:00
EliotJones	3d2e12cb16	version 0.1.11	2025-07-26 13:16:01 -05:00
BobLd	a4a0fe220a	Bump version to 0.1.11-alpha001	2025-03-08 13:42:57 +00:00
BobLd	d36e9a900f	version 0.1.10	2025-03-08 13:00:43 +00:00
BobLd	c4a235fb62	Update Microsoft NuGet packages for UglyToad.PdfPig.Package	2025-02-22 12:55:11 +00:00
BobLd	92d3439465	Update UglyToad.PdfPig.ConsoleRunner target framework to net8	2025-01-14 22:00:18 +00:00
BobLd	7ec4e692a9	Fix "Nightly Release" pipeline following csproj changes	2024-12-15 10:58:38 +00:00
Eliot Jones	c46722fa26	version 0.1.9	2024-10-06 15:40:24 +01:00
Richard Webb	995f287e0d	Update the dependencies in UglyToad.PdfPig.Package (#835)	2024-05-07 20:21:04 +01:00
Jason Nelson	a412a239be	Enable nullable annotations (#803) * Enable nullable annotations * Remove unused Jetbrain annotations * Ensure system using statements are first * Improve nullability annotations * Annotate encryptionDictionary is non-null when IsEncrypted is true * Disable nullable for PdfTokenScanner.Get * Improve nullability annotations for ObjectLocationProvider.TryGetCached * Revert changes to RGBWorkingSpace * Update UglyToad.PdfPig.Package with new framework targets (fixes nightly builds)	2024-03-17 18:51:40 +00:00
Eliot Jones	88a148374e	add script to easily target a single framework	2024-01-10 21:34:35 +00:00
Eliot Jones	fcf09ac6b3	move nightly builds back to main feed	2023-08-05 16:22:55 +01:00
Eliot Jones	66248f19e4	include readme in package	2023-06-05 22:01:04 +01:00
Eliot Jones	2366d22719	move nightly builds to separate package id while nightly builds are useful they also cause a large amount of spam on the main project nuget. here we try to change the package id so that it will be hosted as a separate package while having all the same code and namespaces this means people can opt into the nightly builds while keeping the version history of the released package tidy. no idea if this will work because actions and yaml is my idea of hell and is impossible to debug, but let's give it a go	2023-06-05 21:48:02 +01:00
Eliot Jones	2be4c69c16	set nightly release version also toggles class back to public for nightly versions #538	2023-06-05 21:37:34 +01:00
Eliot Jones	16ac297d10	version 0.1.8	2023-06-05 21:36:00 +01:00
Eliot Jones	c68c195ea8	update package target frameworks	2023-05-18 19:59:58 +01:00
Eliot Jones	23aeb66566	bump integration test runner version	2023-05-17 20:34:10 +01:00
BobLd	58b3394d01	Setting C# version to latest in Examples and ConsoleRunner projects (#582)	2023-03-25 19:15:01 +00:00
Eliot Jones	d2944e14e5	change alpha version for nightly builds	2023-01-08 12:01:48 -05:00
Eliot Jones	03692cf42f	set version to alpha of 0.1.7 for future nightly builds	2022-04-25 10:06:46 -04:00
Eliot Jones	5597a8f38c	version 0.1.6	2022-04-25 09:22:47 -04:00
Eliot Jones	d7898d851c	add net 6 as a target framework and dual target tests	2022-04-03 14:48:50 -04:00
Eliot Jones	7ed985a023	move console runner to named file and clean output then run in action	2022-01-11 11:27:50 +00:00
Eliot Jones	a538aaf0de	bump version so that nightly builds for pre-release have correct version	2022-01-10 12:40:47 +00:00
Eliot Jones	4c36f84a0d	version 0.1.5	2021-09-17 11:16:43 -04:00
Eliot Jones	7176fc8814	nightly push action rework v10	2021-08-16 21:46:23 -04:00
Eliot Jones	e1716e0393	nightly push action rework v9	2021-08-16 21:40:34 -04:00
Eliot Jones	c82287119e	nightly push action rework v7	2021-08-16 07:11:39 -04:00
Eliot Jones	50dc67c2ca	nightly push action rework v5 include alpha in the generate nightly package version i don't know which logic nuget uses to detect pre-release so it's safer to include alpha in the nightly build version.	2021-08-14 13:21:27 -04:00
Eliot Jones	165bc74c53	try creating a nightly push to nuget script v1 this is intended to create nightly build with package versions of the form x.y.z-yyyyMMdd.sha and push them to nuget automatically, only where there are new changes per day. since devops is a nightmare and entirely untestable we will have to go through 20 iterations to actually get this to work	2021-08-14 13:01:46 -04:00
Eliot Jones	ea4889778d	support specifying number of documents for console runner	2021-06-02 13:45:12 -04:00
Eliot Jones	16d26effc5	0.1.5-alpha002	2021-05-09 13:05:54 -04:00
Eliot Jones	ea9c2f045c	change console runner to target net core 2.1	2021-03-02 12:24:50 -04:00
Eliot Jones	3437b48925	actions build and test v003 also adds a console runner for testing arbitrary files	2021-02-28 17:03:32 -04:00
Eliot Jones	b930924b9b	0.1.5-alpha001	2021-02-28 13:55:25 -04:00
Eliot Jones	237fd96f9e	version 0.1.4	2020-11-29 14:02:42 -04:00
Eliot Jones	ad0fb4ec5b	version 0.1.3	2020-11-15 12:08:37 -04:00
Eliot Jones	58ecfbf963	0.1.3-alpha001	2020-09-04 13:19:03 +01:00
Eliot Jones	6359ba5df1	handle objects without endobj markers #198	2020-08-21 18:15:30 +01:00
Eliot Jones	98af575ee3	version 0.1.2	2020-07-04 16:55:14 +01:00
Eliot Jones	5fb04582a7	0.1.2-alpha003	2020-06-20 12:54:31 +01:00
Eliot Jones	256c2833ab	0.1.2-alpha002	2020-05-10 16:36:14 +01:00
Eliot Jones	98dd736f94	0.1.2-alpha001	2020-04-25 15:20:07 +01:00
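
The xref recovery steps described in commit 0afe021ad3 can be illustrated with a short sketch. This is not PdfPig's code (PdfPig is a C# library, and its actual static methods are not shown on this page); it is a minimal Python illustration of steps 1 and 3 only. The function names find_startxref_offset and brute_force_objects are hypothetical, and the regular expressions are a simplifying assumption about how the tokens appear in a file.

```python
import re
from typing import Dict, Optional, Tuple

# Accepts both "startxref" and the malformed "startref" mentioned in the commit.
_START_XREF = re.compile(rb"startx?ref\s*(\d+)")
# An indirect object header: "<number> <generation> obj".
_OBJ_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def find_startxref_offset(data: bytes) -> Optional[int]:
    """Step 1: take the last xref pointer in the file, if it is usable."""
    last = None
    for match in _START_XREF.finditer(data):
        last = match  # keep the occurrence closest to the end of the file
    if last is None:
        return None
    offset = int(last.group(1))
    # Offsets outside the file bounds are skipped rather than trusted.
    return offset if offset < len(data) else None


def brute_force_objects(data: bytes) -> Dict[Tuple[int, int], int]:
    """Step 3 (fallback): scan the whole file once for object headers.

    Later occurrences overwrite earlier ones, so the position nearest the
    end of the file wins for any repeated object number.
    """
    locations: Dict[Tuple[int, int], int] = {}
    for match in _OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        locations[key] = match.start(1)
    return locations
```

A caller would try find_startxref_offset first and fall back to brute_force_objects only when the pointer is missing or the xref chain at that offset cannot be parsed. Step 2 (following the /prev chain) is omitted because it depends on a full xref table and stream parser.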


65 Commits