Commit Graph

40 Commits

Author SHA1 Message Date
Eliot Jones
4150881be9 recover from invalid acro-form references
we add a try/catch to the direct object finder's tryget method so it returns false rather than throwing.

if we have an acro-form reference in the catalog but no corresponding object in the document we instead scan all objects in the document to find form fields and reconstruct the acro-form dictionary.
2020-02-27 12:08:40 +00:00
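The recovery described in the commit above can be sketched roughly as follows. This is not the project's code: the dictionary-based object store, the `Reference` tuple shape and both function names are assumptions made for illustration. Only the `AcroForm`, `Fields` and `FT` (field type) key names come from the PDF format itself.

```python
from typing import Any, Dict, Optional, Tuple

Reference = Tuple[int, int]  # (object number, generation); hypothetical shape


def try_get(objects: Dict[Reference, Any], reference: Reference) -> Tuple[bool, Optional[Any]]:
    """Return (found, value) instead of letting a failed lookup propagate."""
    try:
        return True, objects[reference]
    except (KeyError, ValueError):
        return False, None


def resolve_acroform(catalog: Dict[str, Any],
                     objects: Dict[Reference, Any]) -> Optional[Dict[str, Any]]:
    """Follow the catalog's AcroForm reference, rebuilding the form if it dangles."""
    reference = catalog.get("AcroForm")
    if reference is None:
        return None  # the document declares no form at all
    found, form = try_get(objects, reference)
    if found:
        return form
    # Dangling reference: scan every object and collect those that look like fields.
    fields = [ref for ref in sorted(objects)
              if isinstance(objects[ref], dict) and "FT" in objects[ref]]
    return {"Fields": fields}
```

A field whose type is inherited from a parent would be missed by this simple `FT` check, so a real scan would need to be more thorough.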
Eliot Jones
693a3d5958 use offset to file header to correct cross references
if the %pdf version header comment is offset from the start of the file the cross reference offsets will also be wrong by that amount. this change updates the cross reference location logic to use the offset from the located version header.
2020-01-26 15:30:20 +00:00
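A minimal sketch of the correction described in the commit above, assuming the whole file is available as bytes. The function names are hypothetical and error handling is reduced to the bare minimum; the point is only that the offset read after `startxref` is shifted by the position of the `%PDF-` header.

```python
import re


def find_header_offset(data: bytes) -> int:
    """Return the byte position of the %PDF- version header, or 0 if it is absent."""
    index = data.find(b"%PDF-")
    return index if index >= 0 else 0


def corrected_xref_offset(data: bytes) -> int:
    """Read the offset after the last 'startxref' and shift it by the header offset."""
    marker = data.rfind(b"startxref")
    if marker < 0:
        raise ValueError("no startxref marker found")
    match = re.match(rb"\s*(\d+)", data[marker + len(b"startxref"):])
    if match is None:
        raise ValueError("startxref is not followed by an offset")
    # Offsets in the file are relative to the header, not to byte zero.
    return int(match.group(1)) + find_header_offset(data)
```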
Eliot Jones
63b118b141 handle type1 fonts disguised as truetype
if the font descriptor uses the fromsubtype flag the actual type of the font can differ from that specified in the font dictionary. here a truetype font actually contains a type1c (compact font format) font, so we fall back to using the type1 parser.

also handles a closesubpath command appearing without any path construction operators.
2020-01-07 16:49:21 +00:00
Eliot Jones
0b048fde57 handle eof further back in file
an %%eof for a pdf file may appear further back than the last 1024 bytes. this change doubles the search range. it also handles an empty differences array being defined for a font encoding.

we also remove the old approach to dependency injection from the code since we are now favouring static classes where possible.
2020-01-07 11:48:09 +00:00
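The widened end-of-file search from the commit above might look like the sketch below. The function name and the in-memory `bytes` input are assumptions; the 2048 default simply reflects doubling the 1024-byte range mentioned in the message.

```python
def find_eof_marker(data: bytes, search_range: int = 2048) -> int:
    """Return the position of the last %%EOF within the final search_range bytes, or -1."""
    start = max(0, len(data) - search_range)
    # rfind with a start argument only considers matches at or after that position.
    return data.rfind(b"%%EOF", start)
```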
Eliot Jones
b29354e3e6 move compact font format fonts to fonts project 2020-01-05 12:08:01 +00:00
Eliot Jones
74774995d6 complete move of truetype, afm and standard14 fonts
the 3 font types mentioned are moved to the new fonts project; any referenced types are moved to the core project. most truetype classes are made public #8.
2020-01-04 22:39:13 +00:00
Eliot Jones
7c0ef111ea move classes to new projects
to make the project more useful and expose more usable classes we're rearchitecting in the following way. code used to read fonts from external file formats like truetype, adobe font metrics (afm) and adobe type 1 fonts is moving to a new project which doesn't reference most of the pdf logic. the shared logic is moving to a new flat-structured project called core. this is a sort-of onion type architecture, with core being the... core, fonts being the next layer of the onion, and pdfpig itself the next. this will then support additional libraries/projects as outer layers of the onion as well as releasing a standalone version of the font library as pdfbox does with fontbox.
2020-01-04 16:38:18 +00:00
Eliot Jones
4d697e3669 allow the user to supply multiple passwords for decryption
previously the only way to test if a password was correct was to supply a single password and throw if the value was incorrect. this was slow. now parsing options supports a list of passwords as well as a single password option (which is equivalent to a list with a single item). these passwords are all tested at the same time and an exception is only thrown once all passwords are tested.
2019-12-20 15:11:05 +00:00
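The behaviour described in the commit above, where an exception is raised only after every supplied password has been tried, can be sketched as follows. The `is_valid` callback stands in for the real password check, and all names here are hypothetical rather than the library's API.

```python
from typing import Callable, Iterable


class InvalidPasswordError(Exception):
    """Raised when none of the supplied passwords opens the document."""


def first_valid_password(candidates: Iterable[str],
                         is_valid: Callable[[str], bool]) -> str:
    """Return the first candidate accepted by is_valid; raise once all have failed."""
    tried = 0
    for password in candidates:
        tried += 1
        if is_valid(password):
            return password
    raise InvalidPasswordError(f"none of the {tried} supplied passwords were correct")
```

A single-password option then reduces to calling the same function with a one-item list.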
Eliot Jones
c30cd1b96d use cid font subroutines where applicable. add ucs 2 cmap support for type 1 fonts
* cid cff fonts have multiple sub-fonts and multiple private dictionaries, in addition to a top level font and private dictionary. this fix uses the specific sub-dictionary when getting local subroutines on a per-glyph basis.
* chinese, japanese or korean fonts can use a ucs-2 encoding cmap for retrieving unicode values.
* add support for the additional glyph list for unicode values in true type fonts. adds nonmarkingreturn mapping to carriage return.
* makes font parsing classes static where there's no reason for them to be per-instance.
2019-12-19 13:33:44 +00:00
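The per-glyph subroutine lookup in the first bullet of the commit above could be sketched like this. The data shapes are assumptions for illustration: `fd_select` maps a glyph index to a sub-font index, and each sub-font is a dictionary holding its own private dictionary. None of these names come from the project.

```python
from typing import Any, Dict, List, Sequence


def local_subroutines_for_glyph(glyph_index: int,
                                fd_select: Sequence[int],
                                font_dicts: Sequence[Dict[str, Any]],
                                top_level_subrs: List[bytes]) -> List[bytes]:
    """Pick the local subroutines that apply to one glyph.

    Fonts without sub-fonts use the top level private dictionary's subroutines;
    cid fonts use the private dictionary of the sub-font the glyph belongs to.
    """
    if not font_dicts:
        return top_level_subrs
    font_dict = font_dicts[fd_select[glyph_index]]
    return font_dict["private"].get("subrs", [])
```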
Eliot Jones
ecf0b8743b make bookmarknode immutable and use scanner when retrieving bookmarks 2019-12-05 12:03:30 +00:00
Eliot Jones
2ef45f71d5 make missing acroform types public and start improving data
also changes pages to use a proper tree structure since this will be required for resource inheritance and for acroform widget dictionaries.
2019-10-09 14:28:37 +01:00
Eliot Jones
68bcaf3901 #55 move support for images to page and add inline images
support both xobject and inline images. adds unsupported filters so that exceptions are only thrown when accessing the lazily evaluated image.bytes property rather than when opening the page.

treat all warnings as errors.
2019-10-08 14:04:36 +01:00
Eliot Jones
d98b8b43c1 small performance tweaks and remove package license expression
package license url is deprecated in favour of package license expression but nuget doesn't seem to support expressions properly for published packages yet so we'll keep the deprecated url for the time being. having both url and expression causes the build to fail.

small obvious performance improvements for file header parsing and getting the encoding information using the existing reverse name to code map.
2019-08-18 13:47:01 +01:00
Eliot Jones
0349bedd3e #57 add access to document metadata and expose wrapper type 2019-08-11 12:42:30 +01:00
Eliot Jones
23c033c788 implement validation of owner password and throw more descriptive exception for encrypted documents 2019-05-09 19:02:39 +01:00
Eliot Jones
bad57763a1 finish initial support for rc4 encryption with blank user password 2019-05-06 15:41:29 +01:00
Eliot Jones
be394f5bba start adding support for reading encrypted documents 2019-05-04 15:36:13 +01:00
Eliot Jones
245efae8ed fixes various font handling issues for type 1 and truetype fonts
handle "unionsq" and other tricky glyph names. log missing glyphs. ignore flexpoints in type 1 subroutines. improve system font performance and substitution. handle truetype fonts using standard 14 fonts.
2019-01-12 13:54:16 +00:00
Eliot Jones
20e843f5ae #24 start adding classes for the acroform api 2019-01-01 17:44:46 +00:00
Eliot Jones
47e49c4044 #9 fix bug with truetype fonts and start adding support for cid fonts using compact font format 2018-12-28 22:34:47 +00:00
Eliot Jones
ed3792c950 #20 support retrieval of named system fonts for truetype on windows 2018-12-22 18:28:49 +00:00
Eliot Jones
997979cc92 #11 early access to the raw xobjects for images.
temporary 'safe' untested implementation of seac for type 1 charstrings.
make structure public
bump version of package and project to 0.0.3 (it had accidentally increased to 0.0.5)
2018-11-26 19:46:41 +00:00
Eliot Jones
2fa781b8e9 #10 make all token classes public and expose via a public structure member on pdf document 2018-11-24 19:02:06 +00:00
Eliot Jones
0f68dfeb19 #10 move tokens to the root namespace for discoverability. upgrade xunit versions. there is a bug with test discovery for stringtokenizertests 2018-11-16 20:00:12 +00:00
Eliot Jones
b51ebfd70c fix a bug where the mediabox being defined on the pages node would throw. a lot more work on parsing compact font format 2018-04-29 14:42:54 +01:00
Eliot Jones
85d1f84965 more compact font format parsing 2018-04-28 19:33:50 +01:00
Eliot Jones
1deefdc987 begin implementing support for compact font format fonts in type 1 font handling 2018-04-28 13:00:43 +01:00
Eliot Jones
1fe54c5f49 add xobjects to pages, fix parsing truetype fonts where the glyphs use the repeat flag. 2018-04-26 22:22:29 +01:00
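For context on the repeat flag mentioned in the commit above: in a truetype simple glyph, a flag byte with bit 3 set is followed by a count of additional repetitions of that flag. The sketch below expands such a run. The function name and the return shape are assumptions, and it is not the project's parser.

```python
from typing import List, Tuple

REPEAT_FLAG = 0x08  # bit 3 of a simple glyph flag byte


def read_glyph_flags(data: bytes, point_count: int) -> Tuple[List[int], int]:
    """Expand the flag array for point_count points; return (flags, bytes consumed)."""
    flags: List[int] = []
    position = 0
    while len(flags) < point_count:
        flag = data[position]
        position += 1
        flags.append(flag)
        if flag & REPEAT_FLAG:
            # The next byte says how many more times this flag applies.
            repeat = data[position]
            position += 1
            flags.extend([flag] * repeat)
    return flags[:point_count], position
```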
Eliot Jones
7af2b1bcb9 start adding code and tests for reading metrics of type 1 fonts 2018-04-11 22:51:31 +01:00
Eliot Jones
c64e54d6c0 support opening from stream and improve performance of brute force searching since the seek operation is now slower. 2018-01-21 19:34:21 +00:00
Eliot Jones
07161cef28 unify raw byte access method 2018-01-21 18:08:00 +00:00
Eliot Jones
3172596b7c remove all old cos objects 2018-01-21 14:56:50 +00:00
Eliot Jones
e24a306c31 remove all old parsing logic 2018-01-21 14:48:49 +00:00
Eliot Jones
7d90f4858a continue migrating code to tokenizer 2018-01-20 18:42:29 +00:00
Eliot Jones
c5e3ce7ec7 finish moving all parsing to token scanner 2018-01-20 00:49:53 +00:00
Eliot Jones
a0deab446b switch classes still using the cos object approach to the tokenization approach initially used for parsing cmap files. 2018-01-19 00:35:04 +00:00
Eliot Jones
4443cde229 add very hacky parsing for type 1 font files in order to read the encoding 2018-01-14 18:59:03 +00:00
Eliot Jones
615ee88a46 start passing the pdf scanner in to read the type 1 files 2018-01-14 15:33:22 +00:00
Eliot Jones
36c0eedd7c move the usages of cos object key to indirect reference 2018-01-14 14:48:54 +00:00
Eliot Jones
ec62542b64 change the project name to something silly 2018-01-10 19:49:32 +00:00