38 Commits

Author SHA1 Message Date
Eliot Jones
e37e4c37b3 require end image token to be followed by at least 1 whitespace 2019-12-19 17:34:40 +00:00
Eliot Jones
82c2ee7026 handle ei end image token appearing in inline image data 2019-12-19 16:29:44 +00:00
Eliot Jones
dab64ec406 handle newlines before inline images and support larger data streams in brute force search 2019-12-18 12:02:07 +00:00
Eliot Jones
68bcaf3901 #55 move support for images to page and add inline images
support both xobject and inline images. adds unsupported filters so that exceptions are only thrown when accessing lazily evaluated image.bytes property rather than when opening the page.

treat all warnings as errors.
2019-10-08 14:04:36 +01:00
Eliot Jones
bbe5409f94 #62 use length value of stream directly to read the full stream once 2019-08-20 21:08:06 +01:00
Eliot Jones
364bd25fa8 #48 add handling of inline image data to pdf content parsing
an inline image in a pdf content stream starts with the bi tag, then id declares the start of image data and ei the end. attempting to parse the bytes after the id tag as usual resulted in errors. this change adds special case handling for inline images.
2019-08-03 15:42:19 +01:00
Eliot Jones
cc98bf1089 remove byte order marks from unicode strings #32 2019-06-23 15:22:37 +01:00
Eliot Jones
caf1a0c233 use invariant culture for parsing all numbers #37 2019-06-18 19:12:51 +01:00
Eliot Jones
98424b32aa special case handling for faulty offsets in xref with missing whitespace between eof and object number 2019-06-14 20:40:24 +01:00
Eliot Jones
2b486dccab prevent infinite loops where a stream token's length entry references itself. perform brute force scans in case of a faulty xref table #33 2019-06-08 16:45:02 +01:00
Eliot Jones
39d05e6a47 support big endian and little endian utf 16 in string tokens #32 2019-06-05 18:03:20 +01:00
Eliot Jones
31d12eb731 handle extraneous def token in some dictionaries and skip returning glyph bounds if not in font 2019-05-19 13:27:38 +01:00
Eliot Jones
5b5a0b7f55 fix null reference bug and handle escaped escape characters in string tokenization 2019-05-11 15:35:56 +01:00
Eliot Jones
03af28ed6d fix bug with compact font format font matrix reading and where endstream token is missed if immediately following 'e' 2019-05-10 20:02:29 +01:00
Eliot Jones
bad57763a1 finish initial support for rc4 encryption with blank user password 2019-05-06 15:41:29 +01:00
Eliot Jones
be394f5bba start adding support for reading encrypted documents 2019-05-04 15:36:13 +01:00
Eliot Jones
3a4b7b79d1 #21 change dictionarytoken to use explicit key type, finish os/2 table for truetype, first file creation using embedded truetype font 2018-12-08 14:38:27 +00:00
Eliot Jones
2fa781b8e9 #10 make all token classes public and expose via a public structure member on pdf document 2018-11-24 19:02:06 +00:00
Eliot Jones
0f68dfeb19 #10 move tokens to the root namespace for discoverability. upgrade xunit versions. there is a bug with test discovery for stringtokenizertests 2018-11-16 20:00:12 +00:00
Eliot Jones
1deefdc987 begin implementing support for compact font format fonts in type 1 font handling 2018-04-28 13:00:43 +01:00
Eliot Jones
1fe54c5f49 add xobjects to pages, fix parsing truetype fonts where the glyphs use the repeat flag. 2018-04-26 22:22:29 +01:00
Eliot Jones
e063ac45fe add support for parsing pfb files in type 1 fonts and an extra integration test 2018-04-12 22:34:38 +01:00
Eliot Jones
7af2b1bcb9 start adding code and tests for reading metrics of type 1 fonts 2018-04-11 22:51:31 +01:00
Eliot Jones
07161cef28 unify raw byte access method 2018-01-21 18:08:00 +00:00
Eliot Jones
3172596b7c remove all old cos objects 2018-01-21 14:56:50 +00:00
Eliot Jones
7d90f4858a continue migrating code to tokenizer 2018-01-20 18:42:29 +00:00
Eliot Jones
3d2a66cbf9 fix bug with endstream appearing without line break 2018-01-20 11:53:24 +00:00
Eliot Jones
c5e3ce7ec7 finish moving all parsing to token scanner 2018-01-20 00:49:53 +00:00
Eliot Jones
a0deab446b switch classes still using the cos object approach to the tokenization approach initially used for parsing cmap files. 2018-01-19 00:35:04 +00:00
Eliot Jones
0ead678a43 add tests for array token 2018-01-17 21:48:11 +00:00
Eliot Jones
54b6374e7d coverage for the name tokenizer reading hex 2018-01-15 21:16:36 +00:00
Eliot Jones
4443cde229 add very hacky parsing for type 1 font files in order to read the encoding 2018-01-14 18:59:03 +00:00
Eliot Jones
615ee88a46 start passing the pdf scanner in to read the type 1 files 2018-01-14 15:33:22 +00:00
Eliot Jones
36c0eedd7c move the usages of cos object key to indirect reference 2018-01-14 14:48:54 +00:00
Eliot Jones
b19b96604d make the pdf object scanner work with streams 2018-01-14 10:53:01 +00:00
Eliot Jones
8dcea9b37f create a pdf object scanner which sits on top of the core token scanner to provide complete object parsing 2018-01-13 22:30:15 +00:00
Eliot Jones
ba8d2f5b1d fix a bug with tokenization without spaces before string 2018-01-10 22:15:29 +00:00
Eliot Jones
ec62542b64 change the project name to something silly 2018-01-10 19:49:32 +00:00
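
Note on the inline image commits (364bd25fa8, 82c2ee7026, e37e4c37b3): the messages above say that inline image data starts after the id tag and ends at ei, that the bytes "ei" can also occur inside the image data itself, and that the end token must be followed by at least one whitespace character. The sketch below illustrates that idea only. It is not the repository's code: the function name `find_image_data_end`, the sample bytes, and the extra check that ei is also preceded by whitespace are assumptions made for illustration.

```python
# Whitespace bytes as defined for PDF syntax: NUL, tab, LF, form feed, CR, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_image_data_end(content: bytes, data_start: int):
    """Return the exclusive end offset of inline image data, or None.

    `data_start` is the offset of the first data byte, i.e. just after
    the `ID` operator and the single whitespace byte that follows it.
    An `EI` candidate is accepted only when it is preceded by whitespace
    (an assumption of this sketch) and followed by whitespace or the end
    of the content (the rule named in commit e37e4c37b3).
    """
    pos = data_start
    while True:
        ei = content.find(b"EI", pos)
        if ei == -1:
            return None  # no end token found
        end = ei + 2
        preceded = ei > data_start and content[ei - 1] in PDF_WHITESPACE
        followed = end == len(content) or content[end] in PDF_WHITESPACE
        if preceded and followed:
            # Exclude the whitespace byte that separates the data from EI.
            return ei - 1
        # The bytes "EI" were part of the image data; keep scanning.
        pos = ei + 1


# Hypothetical content stream: six bytes of image data that contain "EI".
prefix = b"q BI /W 6 /H 1 /BPC 8 /CS /G ID "
data = b"\x10 EI\xff\x07"
content = prefix + data + b"\nEI\nQ"
data_start = len(prefix)

data_end = find_image_data_end(content, data_start)
image_bytes = content[data_start:data_end]
```

Here the first "EI" inside the data is rejected because the byte after it is not whitespace, so `image_bytes` ends up equal to `data`. A heuristic like this can still be fooled by binary data that happens to contain whitespace, "EI", whitespace in sequence, which is presumably why several follow-up commits were needed.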