Eliot Jones
935d182888
use doubles where calculations are being run
2019-12-24 12:22:17 +00:00
Eliot Jones
e984180b3d
add method to retrieve any embedded files
2019-12-21 16:16:36 +00:00
Eliot Jones
4d697e3669
allow the user to supply multiple passwords for decryption
...
previously the only way to test whether a password was correct was to supply a single password and have parsing throw if it was incorrect, which was slow. parsing options now support a list of passwords as well as the single password option (which is equivalent to a list with a single item). all supplied passwords are tested together and an exception is thrown only once every password has been tested.
2019-12-20 15:11:05 +00:00
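The behaviour described in the commit above (try every supplied password, throw only after all of them fail) can be illustrated with a minimal Python sketch. This is not the library's code: `find_working_password`, `passwords_from_options`, `WrongPasswordError` and the `is_valid` callback are hypothetical names, and the candidates are checked one after another in a single pass rather than by whatever mechanism the library uses.

```python
from typing import Callable, Iterable, List, Optional


class WrongPasswordError(Exception):
    """Raised only after every candidate password has been rejected."""


def passwords_from_options(single: Optional[str] = None,
                           many: Iterable[str] = ()) -> List[str]:
    # A single password option is treated as a list with one item.
    candidates = list(many)
    if single is not None:
        candidates.append(single)
    return candidates


def find_working_password(candidates: Iterable[str],
                          is_valid: Callable[[str], bool]) -> str:
    # is_valid is a hypothetical stand-in for the real decryption check.
    tried = 0
    for password in candidates:
        tried += 1
        if is_valid(password):
            return password
    # No early throw: we only fail once the whole list is exhausted.
    raise WrongPasswordError(f"none of the {tried} supplied passwords were accepted")
```

A caller would build the candidate list once and run a single check over it, instead of reopening the document once per guess and catching an exception each time.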
Eliot Jones
5e68720495
add support for type1c cid fonts
2019-12-20 14:46:25 +00:00
Eliot Jones
f401ab3ba0
handle case insensitive truetype table tags and missing tables for postscript fonts
2019-12-20 14:40:25 +00:00
Eliot Jones
3084a9aab6
support streams containing only carriage returns. handle comments in arrays and dictionaries
...
* the pdf specification says stream data should begin after a newline following the stream operator, but some files have only a carriage return after the stream operator.
* comment tokens may appear inside an array or dictionary. we now ignore them when they occur there, since they would otherwise break interpretation of the array or dictionary contents.
2019-12-20 14:04:58 +00:00
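Both points in the commit above can be sketched in a few lines of Python. This is an illustration under assumptions, not the library's tokenizer: `stream_data_start` and `strip_comment_tokens` are hypothetical helpers, and the second one assumes tokens have already been split into strings, with comment tokens starting with `%`.

```python
from typing import List


def stream_data_start(data: bytes, after_keyword: int) -> int:
    """Return the index of the first stream byte.

    after_keyword is the index just past the b"stream" keyword.
    """
    # Preferred forms: CR LF or a lone LF after the keyword.
    if data[after_keyword:after_keyword + 2] == b"\r\n":
        return after_keyword + 2
    # Lenient form: accept a lone carriage return as well as a lone line feed.
    if data[after_keyword:after_keyword + 1] in (b"\n", b"\r"):
        return after_keyword + 1
    return after_keyword


def strip_comment_tokens(tokens: List[str]) -> List[str]:
    # Comments carry no meaning inside an array or dictionary, so drop them
    # before pairing keys with values or collecting array items.
    return [token for token in tokens if not token.startswith("%")]
```

Accepting a lone carriage return has a cost: a stream whose first data byte happens to be a line feed would lose that byte, which is presumably why the stricter form is the default expectation.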
Eliot Jones
3e6fa4b694
correctly map character code to glyph id when retrieving bounding boxes for truetype fonts
...
previously we treated character codes as glyph ids when getting the bounding box from the truetype font program itself. this change uses the character code to glyph id mapping code from pdfbox, with some changes, to retrieve the correct bounding box where possible. since this relies in some places on the unicode value or name rather than the character code, we add a cache to each truetype font to store the character code to unicode mapping, which should also improve performance.
2019-12-20 12:48:00 +00:00
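The lookup chain in the commit above (character code, then unicode value, then glyph id, then bounding box, with a per-font cache on the first step) might look roughly like this in Python. Every name here is hypothetical, the real mapping logic ported from pdfbox is considerably more involved, and returning `None` when no mapping exists is an assumption of this sketch.

```python
from typing import Callable, Dict, Optional, Tuple

BoundingBox = Tuple[float, float, float, float]


class TrueTypeFontSketch:
    def __init__(self,
                 code_to_unicode: Callable[[int], Optional[str]],
                 unicode_to_glyph_id: Dict[str, int],
                 glyph_bounds: Dict[int, BoundingBox]) -> None:
        self._code_to_unicode = code_to_unicode
        self._unicode_to_glyph_id = unicode_to_glyph_id
        self._glyph_bounds = glyph_bounds
        # Per-font cache of character code -> unicode value.
        self._unicode_cache: Dict[int, Optional[str]] = {}

    def unicode_for(self, character_code: int) -> Optional[str]:
        # Only call the (possibly slow) lookup once per character code.
        if character_code not in self._unicode_cache:
            self._unicode_cache[character_code] = self._code_to_unicode(character_code)
        return self._unicode_cache[character_code]

    def bounding_box(self, character_code: int) -> Optional[BoundingBox]:
        unicode_value = self.unicode_for(character_code)
        if unicode_value is None:
            return None
        # The character code is not used as a glyph id directly.
        glyph_id = self._unicode_to_glyph_id.get(unicode_value)
        if glyph_id is None:
            return None
        return self._glyph_bounds.get(glyph_id)
```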
Eliot Jones
7296c3c125
merge pull request #105 from BobLd/master
...
whitespace covering algorithm and #104
2019-12-20 11:57:31 +00:00
Eliot Jones
e37e4c37b3
require end image token to be followed by at least 1 whitespace
2019-12-19 17:34:40 +00:00
Eliot Jones
03a28287e9
handle missing widths in cid fonts correctly
2019-12-19 16:59:17 +00:00
Eliot Jones
82c2ee7026
handle ei end image token appearing in inline image data
2019-12-19 16:29:44 +00:00
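The two inline image commits above (e37e4c37b3 and 82c2ee7026) concern telling the real EI end image token apart from the bytes "EI" occurring inside the image data. A minimal Python sketch of the whitespace check follows. `find_inline_image_end` is a hypothetical helper; requiring whitespace before the token as well as after it is an assumption of this sketch, and binary data can still defeat a heuristic like this.

```python
# The six whitespace bytes defined by the pdf specification.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_inline_image_end(data: bytes, start: int = 0) -> int:
    """Return the index of the 'E' in the terminating EI token, or -1."""
    index = data.find(b"EI", start)
    while index != -1:
        # Assumption: the token is separated from the image data by whitespace.
        preceded = index > start and data[index - 1] in PDF_WHITESPACE
        # The token must be followed by at least one whitespace byte.
        after = index + 2
        followed = after < len(data) and data[after] in PDF_WHITESPACE
        if preceded and followed:
            return index
        # Otherwise these bytes are part of the image data; keep searching.
        index = data.find(b"EI", index + 1)
    return -1
```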
Eliot Jones
528df5c396
handle malformed cmap base character listings
2019-12-19 15:27:12 +00:00
Eliot Jones
c30cd1b96d
use cid font subroutines where applicable. add ucs 2 cmap support for type 1 fonts
...
* cid cff fonts have multiple sub-fonts and multiple private dictionaries, in addition to a top level font and private dictionary. this fix uses the specific sub-dictionary when getting local subroutines on a per-glyph basis.
* chinese, japanese or korean fonts can use a ucs-2 encoding cmap for retrieving unicode values.
* add support for the additional glyph list for unicode values in truetype fonts. adds nonmarkingreturn mapping to carriage return.
* makes font parsing classes static where there's no reason for them to be per-instance.
2019-12-19 13:33:44 +00:00
Eliot Jones
a167d4c1dd
fix bug where hex tokens for document identifier lost bytes due to encoding
2019-12-18 14:54:56 +00:00
Eliot Jones
dab64ec406
handle newlines before inline images and support larger data streams in brute force search
2019-12-18 12:02:07 +00:00
BobLd
6dba5bb2b4
update PublicApiScannerTests
2019-12-18 11:43:39 +00:00
BobLd
47b4428562
Adding Whitespace covering algorithm
...
Adding support for MaxDegreeOfParallelism in DocumentLayoutAnalysis
2019-12-18 11:41:39 +00:00
Eliot Jones
1fb416eee3
add convenience method to retrieve all hyperlinks and their text from annotations on a page
2019-12-18 11:41:02 +00:00
Eliot Jones
777bf9b63d
version 0.0.11
0.0.11
2019-12-17 18:07:08 +00:00
Eliot Jones
0a7cad8f79
Merge branch 'master' of https://github.com/UglyToad/Pdf
2019-12-17 17:33:20 +00:00
Eliot Jones
57fb3d3e79
support system fonts without descriptors and also enable overridden widths #101
2019-12-17 17:32:22 +00:00
Eliot Jones
53e7c7d4f4
Merge pull request #102 from BobLd/master
...
improving geometry classes with tests
2019-12-16 14:42:46 +00:00
BobLd
5cf1f6c58c
Modifications and adding some tests
2019-12-16 14:36:52 +00:00
Eliot Jones
ec4f8ac5bf
merge pull request #103 from uglytoad/version-10-bugfixes
...
version 10 bugfixes
2019-12-16 13:24:21 +00:00
Eliot Jones
7a6b8ce6d6
Merge pull request #100 from huzhiguan/master
...
fix issue with leaf/nodes in RecursiveXYCut that contain only whitespace words.
2019-12-14 16:44:00 +00:00
BobLd
1656411fcb
Improving Geometry classes with Tests
2019-12-14 11:41:11 +00:00
Zhiguan Hu
9baa8c3ca0
Fix format as suggested.
2019-12-11 10:05:17 -06:00
Zhiguan Hu
6f3a30a723
Merge branch 'master' of https://github.com/huzhiguan/PdfPig
2019-12-10 15:04:07 -06:00
Zhiguan Hu
30247ba774
Fix the bug that happens when all the words in the current leaf for VerticalCut/HorizontalCut are white space.
2019-12-10 15:03:05 -06:00
Eliot Jones
3c0cd17a8b
use correct defaults for separation colorspace #89
2019-12-10 14:10:50 +00:00
Eliot Jones
c89928d976
remove inefficient approach to checking if content stream path has been added #99
2019-12-10 13:20:57 +00:00
Eliot Jones
d0443c6567
make operator token threadsafe #97
2019-12-10 11:31:02 +00:00
Eliot Jones
af1217f910
version 0.0.10
v0.0.10
2019-12-09 13:49:07 +00:00
Eliot Jones
4042649b46
update the readme for 0.0.10 release
2019-12-09 13:37:22 +00:00
Eliot Jones
6ee7c09788
merge pull request #93 from BobLd/master
...
improving clustering algorithm
2019-12-09 13:10:17 +00:00
Eliot Jones
d37149a8d7
support custom encodings for type 1 standard 14 fonts without metrics #95
2019-12-09 13:02:52 +00:00
Eliot Jones
f2ead37134
handle missing whitespaces before the start of the object #88
2019-12-09 12:24:20 +00:00
BobLd
b69c004548
Changing functions description to reflect changes
2019-12-07 22:49:10 +00:00
Eliot Jones
75a6260501
make cropbox public
2019-12-06 17:34:51 +00:00
BobLd
fd9efb5b5d
making FindIndexNearest() internal
2019-12-06 17:29:33 +00:00
Eliot Jones
e38da0a403
add support for alternative colorspace in separation colorspaces #89
2019-12-06 17:23:15 +00:00
BobLd
b5a0faaa3b
Improving clustering algorithm
2019-12-06 16:02:30 +00:00
BobLd
e0138c7ae1
Merge branch 'master' of https://github.com/UglyToad/PdfPig
2019-12-06 15:39:03 +00:00
Eliot Jones
c57d8446e4
merge pull request #92 from uglytoad/outlines-refactoring
...
outlines refactoring
2019-12-06 13:46:37 +00:00
Eliot Jones
2ff6a4bf11
handle rotations of 360 degrees or more #90
2019-12-06 13:36:45 +00:00
BobLd
439bd9bb53
Merge branch 'master' of https://github.com/UglyToad/PdfPig
2019-12-06 11:41:33 +00:00
Eliot Jones
a9b61d81fa
skip explicit destinations where the page cannot be found
2019-12-05 16:40:21 +00:00
Eliot Jones
d6e1dccc01
add support for standardencoding in type 1 fonts #78
2019-12-05 16:32:10 +00:00
Eliot Jones
e01d77b93a
add negative test case and make tests non-lenient
2019-12-05 13:56:12 +00:00
Eliot Jones
8ca947542f
skip unrelated entries in document name tree
2019-12-05 13:47:42 +00:00