PdfPig

lsm/PdfPig

mirror of https://github.com/UglyToad/PdfPig.git synced 2025-11-28 17:47:12 +08:00

Author	SHA1	Message	Date
Eliot Jones	76f8222f74	start adding support for undocumented revision 6 encryption. revision 6 was added in the pdf 2.0 specification, iso 32000-2:2017. because iso are rent-seeking they charge money to view this specification, so it is effectively undocumented. this site details some of the algorithm: https://web.archive.org/web/20180311160224/esec-lab.sogeti.com/posts/2011/09/14/the-undocumented-password-validation-algorithm-of-adobe-reader-x.html. the code in this change ports the pdfbox logic line by line. it doesn't implement the correct behaviour for the owner password yet.	2019-06-24 20:37:25 +01:00
Eliot Jones	cc98bf1089	remove byte order marks from unicode strings #32	2019-06-23 15:22:37 +01:00
Eliot Jones	f86c2545bd	treat encryption entries as optional for revisions 5+ #34. the revision 5 and 6 encryption algorithms specify the presence of additional encryption material named 'oe' and 'ue'. it turns out these entries are not always present, so they now default to null if missing. this also adds support for those values being in hex rather than normal string format. tidies up some commenting on the xynode class, moves public methods below constructors and adds xy to the resharper list of abbreviations for the solution.	2019-06-23 13:52:12 +01:00
Eliot Jones	ff9e2ad83f	handle hex registry and ordering. decrypt hex tokens #34. cid fonts may contain a registry, ordering and supplement to identify the font. we were checking for string registry and ordering tokens but failing on hex tokens. for encrypted documents we now decrypt hex data.	2019-06-23 13:27:32 +01:00
Eliot Jones	0f103554fb	handle non-standard crypt dictionary type and use hex bytes for password #34. using an online tool to encrypt a simple document with aes-128 seems to add the dictionary type cryptalgorithm rather than cryptfilter. i couldn't find any references to cryptalgorithm in the spec or pdfbox, but it seems to work ok when treated as equivalent to cryptfilter. there are situations where the string derived from a hex token has a different length to the underlying bytes, for example if the hex token contains the '\0' byte. in those cases the encryption algorithm needs to use the raw bytes rather than the 'stringified' bytes, so this change passes raw bytes for hex tokens for both the user and owner password keys.	2019-06-23 13:12:47 +01:00
Eliot Jones	d259f89bd9	Merge pull request #40 from Numpsy/rw/unicode_hex_strings. add utf-16 parsing support to hextoken	2019-06-23 12:38:44 +01:00
Eliot Jones	41eddca0bf	handle incorrect xref offsets #34. previously, if the cross reference did not exist at exactly the provided offset we'd immediately throw; now we assume we can read a few more tokens to find the xref table or stream start. this won't work in the case where the provided offset is past the start of the table or nowhere near the table, but in those cases there's not much we can do. there's some more work to do to provide a fallback xref parser which finds the xref tables and streams using a brute-force scan of the whole document.	2019-06-23 12:05:21 +01:00
Eliot Jones	0c1b50fcc4	Merge pull request #36 from BobLd/master. Document Layout Analysis Tools	2019-06-23 11:32:50 +01:00
Richard Webb	b5b862e63f	unit tests for tokenizing UTF16 encoded hex strings.	2019-06-23 01:19:43 +01:00
Richard Webb	0432f703c4	extend HexToken to support UTF-16BE encoded hex strings	2019-06-23 01:18:48 +01:00
BobLd	00233fa5d0	Update with corrections - 2	2019-06-20 22:10:05 +01:00
Eliot Jones	7b96483664	include raw dictionary token in the document information class #38	2019-06-19 21:23:06 +01:00
Eliot Jones	b7b08fa881	add gitter badge	2019-06-19 18:50:48 +01:00
Eliot Jones	35b6c4f0eb	handle case where font metrics do not declare width or height #35	2019-06-19 18:47:50 +01:00
BobLd	080354dc54	Corrected PublicApiScannerTests	2019-06-18 21:32:14 +01:00
BobLd	f8d0883da5	Update with corrections	2019-06-18 20:48:49 +01:00
Eliot Jones	caf1a0c233	use invariant culture for parsing all numbers #37	2019-06-18 19:12:51 +01:00
BobLd	4416793f6d	Corrected PublicApiScannerTests	2019-06-16 19:19:44 +01:00
BobLd	2525cd243f	Typo correction	2019-06-16 14:03:12 +01:00
BobLd	a0c864e8af	Adding Document Layout Analysis: - Nearest Neighbour Word Extractor - Recursive X-Y Cut algorithm, useful for multi-column pdf documents	2019-06-16 13:57:30 +01:00
Eliot Jones	2c9a3d6e96	add test coverage for direct object finder	2019-06-14 20:57:46 +01:00
Eliot Jones	98424b32aa	special case handling for faulty offsets in xref with missing whitespace between eof and object number	2019-06-14 20:40:24 +01:00
Eliot Jones	4c716fcbd6	finish support for revision 5 encryption using aes 256 #34	2019-06-13 19:46:08 +01:00
Eliot Jones	d0a3cd398f	start adding support for revision 5 aes-256 encrypted documents #34	2019-06-09 13:27:03 +01:00
Eliot Jones	f3c8220ec4	add test coverage for invalid document from #33	2019-06-08 16:58:20 +01:00
Eliot Jones	2b486dccab	prevent infinite loops where a stream token's length entry references itself. perform brute force scans in case of a faulty xref table #33	2019-06-08 16:45:02 +01:00
Eliot Jones	21a4ba597e	add support for aes-128 decryption #34	2019-06-08 15:23:21 +01:00
Eliot Jones	a19122478d	begin adding support for in-document security handlers to support aes 128/256 encryption #34	2019-06-08 14:14:51 +01:00
Eliot Jones	39d05e6a47	support big endian and little endian utf 16 in string tokens #32	2019-06-05 18:03:20 +01:00
Eliot Jones	f375cb6f04	keep letters in word when using default word extractor	2019-05-30 20:07:52 +01:00
Eliot Jones	ef822b484d	0.0.6 - update version and sourcelink nuget version v0.0.6	2019-05-19 13:39:06 +01:00
Eliot Jones	31d12eb731	handle extraneous def token in some dictionaries and skip returning glyph bounds if not in font	2019-05-19 13:27:38 +01:00
Eliot Jones	e9e376c52a	update readme and make page dictionary public	2019-05-19 13:14:38 +01:00
Eliot Jones	872e338ecb	skip invalid commands in type 1 command definitions	2019-05-19 12:58:49 +01:00
Eliot Jones	7e8f3623a4	handle type 1 parser already being at def token when reading till next def token	2019-05-19 12:28:26 +01:00
Eliot Jones	ffa7b3bcc7	generate synthetic encoding where not present and use direct object finder to lookup cropbox and mediabox	2019-05-18 15:20:07 +01:00
Eliot Jones	8a74d5b2f3	use missing width for type 1 fonts when not in pdf array	2019-05-18 14:43:22 +01:00
Eliot Jones	7a3b89ece1	tidy up some doc comments	2019-05-18 12:28:42 +01:00
Eliot Jones	f884674807	Merge branch 'master' of https://github.com/UglyToad/Pdf	2019-05-18 12:25:57 +01:00
Eliot Jones	f3bc3a37b9	add lzw filter support	2019-05-18 12:25:47 +01:00
Eliot Jones	86c5478ddb	Merge pull request #31 from BobLd/master. add textline and pdfline	2019-05-15 22:34:01 +01:00
Eliot Jones	9a8becde3e	update the readme to reflect expanded capabilities	2019-05-15 20:05:05 +01:00
Eliot Jones	69b6958c9d	only declare a cff font to be a cid font if the registry ordering supplement (ros) is provided	2019-05-15 20:00:24 +01:00
BobLd	f4ec425bf0	- Correction of the PdfLine's length formula; - Moving Line to TextLine	2019-05-15 19:44:47 +01:00
BobLd	97f0f6fe75	Minor modifications and updates	2019-05-14 20:56:34 +01:00
Eliot Jones	5cf62eaa11	fix counting hintmask bytes where cntrmask is present in type 2 charstrings for cff fonts	2019-05-14 20:08:44 +01:00
BobLd	de421d65a1	Adding Line, PdfLine	2019-05-12 19:39:58 +01:00
BobLd	2011d504a7	In Content: - Adding a 'Line' of text object - Adding a 'TextDirection' property in the 'Word' object. In Geometry: - Adding a 'PdfLine' object - Making the 'PdfRectangle' creator public	2019-05-12 19:34:00 +01:00
Eliot Jones	55d34e3998	use standardencoding name for seac command in type 1 charstrings	2019-05-11 15:57:19 +01:00
Eliot Jones	5b5a0b7f55	fix null reference bug and handle escaped escape characters in string tokenization	2019-05-11 15:35:56 +01:00

573 Commits