665 Commits

Author SHA1 Message Date
Eliot Jones
ab5a357665 fix bugs with reading documents from microsoft print to pdf 2018-01-10 19:23:10 +00:00
Eliot Jones
674945206c handle case where no unicode value is found for type 1 fonts. 2018-01-09 22:32:18 +00:00
Eliot Jones
b2936660d7 update the readme and document public properties 2018-01-08 23:19:51 +00:00
Eliot Jones
10df612a00 throw informative exception when the document is encrypted 2018-01-08 22:43:48 +00:00
Eliot Jones
bf7d3868da check public api through reflection 2018-01-08 22:08:58 +00:00
Eliot Jones
b1fbcd0ccd support type 1 normal fonts and fix bug with fetching resource dictionary 2018-01-08 21:58:07 +00:00
Eliot Jones
133ab43d45 fix a bug with font reading order in multi-page documents 2018-01-07 20:19:17 +00:00
Eliot Jones
59c36a7ddd fix a bug with t* operator due to incorrect specification. add generic name to type 3 fonts. 2018-01-07 15:38:43 +00:00
Eliot Jones
70361973b3 add support for reading encoding differences from font dictionary. add type 3 font support. 2018-01-07 14:49:17 +00:00
Eliot Jones
18eeb896e0 more documentation and remove unused code 2018-01-07 12:41:55 +00:00
Eliot Jones
bb93484909 encapsulation for internal classes, remove old code, document public api 2018-01-07 12:37:48 +00:00
Eliot Jones
ad1ef0e167 make package nuget publishable 2018-01-07 12:09:23 +00:00
Eliot Jones
a6c3dba25a bug fix for indirect page link, bug fix for array in base font range in cmap 2018-01-07 11:51:18 +00:00
Eliot Jones
c75b9d10bd add test for unusual latin characters and different document producer. 2018-01-06 21:06:26 +00:00
Eliot Jones
d1aa390f01 support reading text from type 1 fonts which use standard 14 fonts 2018-01-06 20:51:20 +00:00
Eliot Jones
96d787e498 finish adobe font metrics parser 2018-01-06 19:48:07 +00:00
Eliot Jones
02845e8ebb handle case where contents is an array of objects 2018-01-06 18:25:47 +00:00
Eliot Jones
eb66611e55 checkpoint check in for adobe font metrics parsing 2018-01-06 14:11:14 +00:00
Eliot Jones
03f31a84e5 tests for end of line tokenizer and branch coverage for string tokenizer 2018-01-06 12:08:52 +00:00
Eliot Jones
bbcb5af2be fix bugs revealed by mortality metadata document. get references using the direct object finder. fix a bug with string tokenizer 2018-01-05 23:08:20 +00:00
Eliot Jones
2e7f9b8d76 add a more complicated type 2 font pdf and tests 2018-01-04 21:25:49 +00:00
Eliot Jones
6b4bd8689f start migrating cross reference parsing process to token scanner 2018-01-04 21:09:47 +00:00
Eliot Jones
1c41618950 cover missing line in file trailer parsing 2018-01-03 22:51:44 +00:00
Eliot Jones
1aacb14285 add test for multiple page pdf from libre office 2018-01-03 22:46:26 +00:00
Eliot Jones
21be34a938 substitute the token scanner into the file trailer parsing and test 2018-01-03 22:29:09 +00:00
Eliot Jones
f09ef85e5a make tokenizer classes internal and change the file header to use a scanner rather than the pdfbox type reader 2018-01-03 20:15:25 +00:00
Eliot Jones
72ffa1f308 make byte array input bytes behave correctly when seeking rather than forcing the consumer to work around it. 2018-01-03 19:17:12 +00:00
Eliot Jones
bfdca3079f change the itext document test to reflect its text being form content. fix readme typo 2018-01-03 19:13:12 +00:00
Eliot Jones
7a28b05372 update readme to provide more details on the api 2018-01-03 13:06:54 +00:00
Eliot Jones
0ef33f5215 move catalog parsing to its own factory. parse document information if present and expose publicly. add test for itext generated document 2018-01-02 23:26:58 +00:00
Eliot Jones
8b8f2941a5 undo debugging code 2018-01-02 22:23:50 +00:00
Eliot Jones
5ab8d69ea5 fix bug with computing text positions 2018-01-02 22:23:08 +00:00
Eliot Jones
d03c04cca1 test coverage for parsing glyph lists and fix bug with octal conversion 2018-01-01 18:39:18 +00:00
Eliot Jones
d7b9a9d559 implement glyph list for mapping from character code to name to unicode 2018-01-01 17:23:32 +00:00
Eliot Jones
c34bdac92a add test for non latin characters and use normal ints rather than octal in the encoding classes. 2018-01-01 13:49:24 +00:00
Eliot Jones
874f713566 load resources dictionary from pages as well as page node and throw informative error when the font is not found. 2018-01-01 10:49:05 +00:00
Eliot Jones
b77b7ec0d8 delete unused code and move cosboolean to pdfboolean 2017-12-31 16:04:50 +00:00
Eliot Jones
8f18a55c22 add very basic handler for simple truetype fonts 2017-12-31 15:30:47 +00:00
Eliot Jones
d668c4e892 make everything internal which does not need to be public 2017-12-31 14:23:36 +00:00
Eliot Jones
a77e8e1a56 implement the show text with positioning operator, fix bugs with parsing stream lengths contained in indirect objects. fix bug with parsing font dictionaries contained in indirect objects. 2017-12-31 14:11:13 +00:00
Eliot Jones
33c628e0c8 add page size and more tests for cross reference table parsing 2017-12-30 18:37:57 +00:00
Eliot Jones
cf41614cc2 add doc comments, make point size internal until I can learn matrices, add test case for ascii 85 filter, add test case for document text 2017-12-30 18:02:59 +00:00
Eliot Jones
88c86971d8 test coverage for cross reference table parsing 2017-12-30 17:47:41 +00:00
Eliot Jones
2a706af87b test coverage and fix bugs with ascii 85 filter 2017-12-30 13:09:46 +00:00
Eliot Jones
6adc0c169d fix bugs with reading cross reference tables. 2017-12-30 12:56:46 +00:00
Eliot Jones
f869bba72c add run length filter and delete old code 2017-12-29 11:08:59 +00:00
Eliot Jones
26e244371b add support for ascii hex encoding in streams 2017-12-28 18:10:18 +00:00
Eliot Jones
1feaf878cb make test data more tolerant to different environments 2017-12-28 17:06:02 +00:00
Eliot Jones
17d1d77abc add more documents to test font size and add tests to check our text positions against other providers 2017-12-28 16:58:52 +00:00
Eliot Jones
b1d28a5af8 encapsulate the internals better and improve the api for pdfdocument, delete old code and tidy tests. expand readme 2017-12-28 13:14:03 +00:00