Eliot Jones
e37e4c37b3
require end image token to be followed by at least 1 whitespace
2019-12-19 17:34:40 +00:00
Eliot Jones
82c2ee7026
handle ei end image token appearing in inline image data
2019-12-19 16:29:44 +00:00
Eliot Jones
dab64ec406
handle newlines before inline images and support larger data streams in brute force search
2019-12-18 12:02:07 +00:00
Eliot Jones
68bcaf3901
#55 move support for images to page and add inline images
support both xobject and inline images. adds unsupported filters so that exceptions are only thrown when accessing lazily evaluated image.bytes property rather than when opening the page.
treat all warnings as errors.
2019-10-08 14:04:36 +01:00
Eliot Jones
bbe5409f94
#62 use length value of stream directly to read the full stream once
2019-08-20 21:08:06 +01:00
Eliot Jones
364bd25fa8
#48 add handling of inline image data to pdf content parsing
an inline image in a pdf content stream starts with the BI operator, then ID declares the start of the image data and EI marks the end. parsing the bytes after ID as ordinary tokens resulted in errors, because they are raw image data. this change adds special case handling for inline images.
2019-08-03 15:42:19 +01:00
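
The inline image handling described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's code: the function name `find_inline_image_data` and the `WHITESPACE` constant are hypothetical, the plain byte search for `BI` and `ID` is a simplification, and requiring whitespace before `EI` is an assumption. Requiring whitespace after `EI` follows the later commit e37e4c37b3.

```python
# PDF whitespace bytes: NUL, tab, line feed, form feed, carriage return, space.
WHITESPACE = b"\x00\t\n\x0c\r "


def find_inline_image_data(content: bytes, start: int = 0):
    """Return (image_bytes, index_after_EI) for the first inline image, or None.

    Hypothetical helper: a real parser would tokenize up to BI and ID
    instead of searching for the raw byte sequences.
    """
    bi = content.find(b"BI", start)
    if bi < 0:
        return None
    id_pos = content.find(b"ID", bi + 2)
    if id_pos < 0:
        return None

    # One whitespace byte separates ID from the raw image data.
    data_start = id_pos + 3
    pos = data_start
    while True:
        ei = content.find(b"EI", pos)
        if ei < 0:
            return None
        # The bytes "EI" may occur inside the image data itself, so only
        # accept a candidate that is delimited by whitespace on both sides
        # (or that ends the stream).
        preceded = content[ei - 1] in WHITESPACE
        followed = ei + 2 == len(content) or content[ei + 2] in WHITESPACE
        if preceded and followed:
            return content[data_start:ei - 1], ei + 2
        pos = ei + 1


sample = b"q BI /W 2 /H 2 ID \x01EI\x02EIx\x03 EI Q"
image_bytes, end = find_inline_image_data(sample)
```

Here the two `EI` byte pairs embedded in the data are skipped because neither is preceded by whitespace, which is the kind of case the 2019-12-19 commits address.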
Eliot Jones
cc98bf1089
remove byte order marks from unicode strings #32
2019-06-23 15:22:37 +01:00
Eliot Jones
caf1a0c233
use invariant culture for parsing all numbers #37
2019-06-18 19:12:51 +01:00
Eliot Jones
98424b32aa
special case handling for faulty offsets in xref with missing whitespace between eof and object number
2019-06-14 20:40:24 +01:00
Eliot Jones
2b486dccab
prevent infinite loops where a stream token's length entry references itself. perform brute force scans in case of a faulty xref table #33
2019-06-08 16:45:02 +01:00
Eliot Jones
39d05e6a47
support big endian and little endian utf 16 in string tokens #32
2019-06-05 18:03:20 +01:00
Eliot Jones
31d12eb731
handle extraneous def token in some dictionaries and skip returning glyph bounds if not in font
2019-05-19 13:27:38 +01:00
Eliot Jones
5b5a0b7f55
fix null reference bug and handle escaped escape characters in string tokenization
2019-05-11 15:35:56 +01:00
Eliot Jones
03af28ed6d
fix bug with compact font format font matrix reading and where endstream token is missed if immediately following 'e'
2019-05-10 20:02:29 +01:00
Eliot Jones
bad57763a1
finish initial support for rc4 encryption with blank user password
2019-05-06 15:41:29 +01:00
Eliot Jones
be394f5bba
start adding support for reading encrypted documents
2019-05-04 15:36:13 +01:00
Eliot Jones
3a4b7b79d1
#21 change dictionarytoken to use explicit key type, finish os/2 table for truetype, first file creation using embedded truetype font
2018-12-08 14:38:27 +00:00
Eliot Jones
2fa781b8e9
#10 make all token classes public and expose via a public structure member on pdf document
2018-11-24 19:02:06 +00:00
Eliot Jones
0f68dfeb19
#10 move tokens to the root namespace for discoverability. upgrade xunit versions. there is a bug with test discovery for stringtokenizertests
2018-11-16 20:00:12 +00:00
Eliot Jones
1deefdc987
begin implementing support for compact font format fonts in type 1 font handling
2018-04-28 13:00:43 +01:00
Eliot Jones
1fe54c5f49
add xobjects to pages, fix parsing truetype fonts where the glyphs use the repeat flag.
2018-04-26 22:22:29 +01:00
Eliot Jones
e063ac45fe
add support for parsing pfb files in type 1 fonts and an extra integration test
2018-04-12 22:34:38 +01:00
Eliot Jones
7af2b1bcb9
start adding code and tests for reading metrics of type 1 fonts
2018-04-11 22:51:31 +01:00
Eliot Jones
07161cef28
unify raw byte access method
2018-01-21 18:08:00 +00:00
Eliot Jones
3172596b7c
remove all old cos objects
2018-01-21 14:56:50 +00:00
Eliot Jones
7d90f4858a
continue migrating code to tokenizer
2018-01-20 18:42:29 +00:00
Eliot Jones
3d2a66cbf9
fix bug with endstream appearing without line break
2018-01-20 11:53:24 +00:00
Eliot Jones
c5e3ce7ec7
finish moving all parsing to token scanner
2018-01-20 00:49:53 +00:00
Eliot Jones
a0deab446b
switch classes still using the cos object approach to the tokenization approach initially used for parsing cmap files.
2018-01-19 00:35:04 +00:00
Eliot Jones
0ead678a43
add tests for array token
2018-01-17 21:48:11 +00:00
Eliot Jones
54b6374e7d
coverage for the name tokenizer reading hex
2018-01-15 21:16:36 +00:00
Eliot Jones
4443cde229
add very hacky parsing for type 1 font files in order to read the encoding
2018-01-14 18:59:03 +00:00
Eliot Jones
615ee88a46
start passing the pdf scanner in to read the type 1 files
2018-01-14 15:33:22 +00:00
Eliot Jones
36c0eedd7c
move the usages of cos object key to indirect reference
2018-01-14 14:48:54 +00:00
Eliot Jones
b19b96604d
make the pdf object scanner work with streams
2018-01-14 10:53:01 +00:00
Eliot Jones
8dcea9b37f
create a pdf object scanner which sits on top of the core token scanner to provide complete object parsing
2018-01-13 22:30:15 +00:00
Eliot Jones
ba8d2f5b1d
fix a bug with tokenization without spaces before string
2018-01-10 22:15:29 +00:00
Eliot Jones
ec62542b64
change the project name to something silly
2018-01-10 19:49:32 +00:00