PdfPig

lsm/PdfPig

mirror of https://github.com/UglyToad/PdfPig.git synced 2025-10-14 02:44:58 +08:00

Author	SHA1	Message	Date
Eliot Jones	fa5e37dc8c	handle presence of endobj markers in object stream #235	2020-11-22 12:51:38 -04:00
Eliot Jones	6359ba5df1	handle objects without endobj markers #198	2020-08-21 18:15:30 +01:00
romain v	5a82c36631	FIX: undefined references are a valid use case. I tried to mitigate the breaking change by continuing to throw in most uses of the changed method.	2020-08-17 11:10:44 +02:00
Eliot Jones	ec9e425712	use length from stream dictionary if directly available. when brute forcing, we use the length available in the stream's dictionary token if it is a direct number rather than an indirect reference.	2020-02-27 17:17:49 +00:00
Eliot Jones	f415c3116e	cross reference offset is in the xref table we ignore the error. previously we checked that the offset was not inside the table (the correct thing to check); however, this is only a special case of the more general issue (cross reference offsets are wrong). we move handling for this into the pdf token scanner. if we attempt to read an object at an offset and it fails, we brute force the entire file to find the correct offsets. we also needed to add handling to make sure we don't attempt to use stream length tokens if we're brute-forcing, since we can't look up indirect references for length.	2020-02-26 14:03:46 +00:00
Eliot Jones	693a3d5958	use offset to file header to correct cross references. if the %pdf version header comment is offset from the start of the file, the cross reference offsets will also be wrong by that amount. this change updates the cross reference location logic to use the offset from the located version header.	2020-01-26 15:30:20 +00:00
Eliot Jones	e588b2bc50	support documents without endobj for stream. some documents declare stream objects without an endobj marker at the end of the stream. if a new obj token is encountered after reading a stream, we reset the scanner to the object number token and return the stream.	2020-01-07 15:27:01 +00:00
Eliot Jones	10dc5a8eed	don't cache invalid offsets unless brute forced. don't cache parsed objects if their offset doesn't match the cross-reference offset, unless the object was parsed by a brute-force search operation. this is because 1 object may lie in 2 streams, 1 valid and 1 invalid. If the invalid stream is parsed first for another object, then the valid stream will never be read.	2020-01-07 14:54:12 +00:00
Eliot Jones	7c0ef111ea	move classes to new projects. to make the project more useful and expose more usable classes, we're rearchitecting in the following way. code used to read fonts from external file formats like truetype, adobe font metrics (afm) and adobe type 1 fonts is moving to a new project which doesn't reference most of the pdf logic. the shared logic is moving to a new flat-structured project called core. this is a sort-of onion type architecture, with core being the... core, fonts being the next layer of the onion, and pdfpig itself the next. this will then support additional libraries/projects as outer layers of the onion, as well as releasing a standalone version of the font library as pdfbox does with fontbox.	2020-01-04 16:38:18 +00:00
Eliot Jones	23c7e44fc8	handle stream length being an object stream value	2019-12-24 15:22:47 +00:00
Eliot Jones	3084a9aab6	support streams containing only carriage returns. handle comments in arrays and dictionaries. * while the pdf specification says stream data should follow a newline following a stream operator, some files have only a carriage return following the stream operator. * since comment tokens may appear inside an array or dictionary, we ignore them if they occur here since they would break interpretation of the dictionary or array contents.	2019-12-20 14:04:58 +00:00
Eliot Jones	68bcaf3901	#55 move support for images to page and add inline images. support both xobject and inline images. adds unsupported filters so that exceptions are only thrown when accessing the lazily evaluated image.bytes property rather than when opening the page. treat all warnings as errors.	2019-10-08 14:04:36 +01:00
Eliot Jones	bbe5409f94	#62 use length value of stream directly to read the full stream once	2019-08-20 21:08:06 +01:00
Eliot Jones	caf1a0c233	use invariant culture for parsing all numbers #37	2019-06-18 19:12:51 +01:00
Eliot Jones	98424b32aa	special case handling for faulty offsets in xref with missing whitespace between eof and object number	2019-06-14 20:40:24 +01:00
Eliot Jones	2b486dccab	prevent infinite loops where a stream token's length entry references itself. perform brute force scans in case of a faulty xref table #33	2019-06-08 16:45:02 +01:00
Eliot Jones	03af28ed6d	fix bugs with compact font format font matrix reading and where the endstream token is missed if immediately following 'e'	2019-05-10 20:02:29 +01:00
Eliot Jones	bad57763a1	finish initial support for rc4 encryption with blank user password	2019-05-06 15:41:29 +01:00
Eliot Jones	be394f5bba	start adding support for reading encrypted documents	2019-05-04 15:36:13 +01:00
Eliot Jones	2fa781b8e9	#10 make all token classes public and expose via a public structure member on pdf document	2018-11-24 19:02:06 +00:00
Eliot Jones	7d90f4858a	continue migrating code to tokenizer	2018-01-20 18:42:29 +00:00
Eliot Jones	3d2a66cbf9	fix bug with endstream appearing without line break	2018-01-20 11:53:24 +00:00
Eliot Jones	615ee88a46	start passing the pdf scanner in to read the type 1 files	2018-01-14 15:33:22 +00:00
Eliot Jones	b19b96604d	make the pdf object scanner work with streams	2018-01-14 10:53:01 +00:00
Eliot Jones	8dcea9b37f	create a pdf object scanner which sits on top of the core token scanner to provide complete object parsing	2018-01-13 22:30:15 +00:00

25 Commits