PdfPig

lsm/PdfPig

mirror of https://github.com/UglyToad/PdfPig.git synced 2025-07-18 15:40:11 +08:00

Author	SHA1	Message	Date
Eliot Jones	12ad8278e3	don't lose object stream offsets when brute-forcing offsets	2021-03-01 14:18:19 -04:00
Eliot Jones	becc772242	check for offsets exceeding file length for xref parsing #293	2021-02-21 12:04:17 -04:00
Eliot Jones	bf45602ac5	fix #176, allow startxref to appear earlier in the document	2020-05-31 17:01:38 +01:00
Eliot Jones	58972de7cb	begin to rework cross-reference parsing. most of the cross-reference code is the earliest code in the project and hasn't been revisited since then. the issue #88 has been reopened due to a bug with brute-force searching so this tidies up the code in this area ahead of trying to fix the bug.	2020-03-03 15:21:11 +00:00
Inusual	013cbd14e0	Make CrossReferenceTableParser a static class	2020-03-02 17:00:16 +00:00
Eliot Jones	f415c3116e	cross reference offset is in the xref table we ignore the error. previously we checked the offset was not inside the table (correct thing to check), however this is only a special case of the more general issue (cross reference offsets are wrong). we move handling for this into the pdf token scanner. if we attempt to read an object at an offset and it fails we brute force the entire file to find correct offsets. we also needed to add handling to make sure we don't attempt to use stream length tokens if we're brute-forcing since we can't look up indirect references for length.	2020-02-26 14:03:46 +00:00
Eliot Jones	7d0d5806a9	fix reverse xref location search. when brute force searching for the start of the cross-reference table (xref) we read 5 byte buffers, previously if the 'x' of 'xref' was the first character of the buffer we skipped it. this checks when 'x' is the first character of the buffer.	2020-02-26 12:55:11 +00:00
Eliot Jones	8ab2838063	recover from invalid cross reference position. if we are reading a cross reference offset which contains a number we assumed it was a stream object. if it's not we now brute-force the entire file looking for an 'xref' token. this should be combined with a search for cross-reference streams and should run when we read neither the numeric token nor an 'xref' token but for now this fixes the observed issue. also adds number of images to the page api to prevent consumers needing to enumerate.	2020-01-28 18:07:05 +00:00
Eliot Jones	693a3d5958	use offset to file header to correct cross references. if the %pdf version header comment is offset from the start of the file the cross reference offsets will also be wrong by that amount. this change updates the cross reference location logic to use the offset from the located version header.	2020-01-26 15:30:20 +00:00
Eliot Jones	a561c8954e	handle the format header being preceded by nonsense. some files seem to have the format header preceded by large amounts of junk but this appears to be valid for chrome and acrobat reader. this change ups the amount of nonsense to be read prior to the version header. also makes parsing of the version header culture invariant which may be related to #85.	2020-01-25 16:53:41 +00:00
Eliot Jones	903d71a93d	skip cross references outside file. if the previous cross-reference location points to an offset outside the file size we skip it. also makes cid font factory more resilient by skipping missing descriptors.	2020-01-07 12:37:41 +00:00
Eliot Jones	0b048fde57	handle eof further back in file. an %%eof for a pdf file may appear further back than the last 1024 bytes. this change doubles the search range. it also handles an empty differences array being defined for a font encoding. we also remove the old approach to dependency injection from the code since we are now favouring static classes where possible.	2020-01-07 11:48:09 +00:00
Eliot Jones	74774995d6	complete move of truetype, afm and standard14 fonts. the 3 font types mentioned are moved to the new fonts project, any referenced types are moved to the core project. most truetype classes are made public #8.	2020-01-04 22:39:13 +00:00
Eliot Jones	7c0ef111ea	move classes to new projects. to make the project more useful and expose more usable classes we're rearchitecting in the following way. code used to read fonts from external file formats like truetype, adobe font metrics (afm) and adobe type 1 fonts is moving to a new project which doesn't reference most of the pdf logic. the shared logic is moving to a new flat-structured project called core. this is a sort-of onion type architecture, with core being the... core, fonts being the next layer of the onion, pdfpig itself the next. this will then support additional libraries/projects as outer layers of the onion as well as releasing a standalone version of the font library as pdfbox does with fontbox.	2020-01-04 16:38:18 +00:00
Eliot Jones	d98b8b43c1	small performance tweaks and remove package license expression. package license url is deprecated in favour of package license expression but nuget doesn't seem to support expressions properly for published packages yet so we'll keep the deprecated url for the time being. having both url and expression causes the build to fail. small obvious performance improvements for file header parsing and getting the encoding information using the existing reverse name to code map.	2019-08-18 13:47:01 +01:00
Eliot Jones	0dfe742770	continue searching for xref tokens even if an %%eof is encountered #38	2019-07-06 14:26:38 +01:00
Eliot Jones	41eddca0bf	handle incorrect xref offsets #34. previously if the cross reference did not exist at exactly the provided offset we'd immediately throw, now we assume we can read a few more tokens to find the xref table or stream start. this won't work in the case where the provided offset is past the start of the table or nowhere near the table but in those cases there's not much we can do. there's some more work to do to provide a fallback xref parser which finds the xref tables and streams using a brute-force scan of the whole document.	2019-06-23 12:05:21 +01:00
Eliot Jones	caf1a0c233	use invariant culture for parsing all numbers #37	2019-06-18 19:12:51 +01:00
Eliot Jones	4d5518a599	move annotations to experimental access, support changing color state for document creation and update readme	2018-12-30 14:12:04 +00:00
Eliot Jones	2fa781b8e9	#10 make all token classes public and expose via a public structure member on pdf document	2018-11-24 19:02:06 +00:00
Eliot Jones	0f68dfeb19	#10 move tokens to the root namespace for discoverability. upgrade xunit versions. there is a bug with test discovery for stringtokenizertests	2018-11-16 20:00:12 +00:00
Eliot Jones	07161cef28	unify raw byte access method	2018-01-21 18:08:00 +00:00
Eliot Jones	3172596b7c	remove all old cos objects	2018-01-21 14:56:50 +00:00
Eliot Jones	e24a306c31	remove all old parsing logic	2018-01-21 14:48:49 +00:00
Eliot Jones	da7d83d863	finish the migration	2018-01-20 20:20:40 +00:00
Eliot Jones	7d90f4858a	continue migrating code to tokenizer	2018-01-20 18:42:29 +00:00
Eliot Jones	36c0eedd7c	move the usages of cos object key to indirect reference	2018-01-14 14:48:54 +00:00
Eliot Jones	ec62542b64	change the project name to something silly	2018-01-10 19:49:32 +00:00

28 Commits