Commit Graph

845 Commits

Author SHA1 Message Date
Eliot Jones
7296c3c125 merge pull request #105 from BobLd/master
whitespace covering algorithm and #104
2019-12-20 11:57:31 +00:00
Eliot Jones
e37e4c37b3 require end image token to be followed by at least 1 whitespace 2019-12-19 17:34:40 +00:00
Eliot Jones
03a28287e9 handle missing widths in cid fonts correctly 2019-12-19 16:59:17 +00:00
Eliot Jones
82c2ee7026 handle ei end image token appearing in inline image data 2019-12-19 16:29:44 +00:00
Eliot Jones
528df5c396 handle malformed cmap base character listings 2019-12-19 15:27:12 +00:00
Eliot Jones
c30cd1b96d use cid font subroutines where applicable. add ucs 2 cmap support for type 1 fonts
* cid cff fonts have multiple sub-fonts and multiple private dictionaries, in addition to a top level font and private dictionary. this fix uses the specific sub-dictionary when getting local subroutines on a per-glyph basis.
* chinese, japanese or korean fonts can use a ucs-2 encoding cmap for retrieving unicode values.
* add support for the additional glyph list for unicode values in true type fonts. adds nonmarkingreturn mapping to carriage return.
* makes font parsing classes static where there's no reason for them to be per-instance.
2019-12-19 13:33:44 +00:00
Eliot Jones
a167d4c1dd fix bug where hex tokens for document identifier lost bytes due to encoding 2019-12-18 14:54:56 +00:00
Eliot Jones
dab64ec406 handle newlines before inline images and support larger data streams in brute force search 2019-12-18 12:02:07 +00:00
BobLd
6dba5bb2b4 update PublicApiScannerTests 2019-12-18 11:43:39 +00:00
BobLd
47b4428562 Adding Whitespace covering algorithm
Adding support for MaxDegreeOfParallelism in DocumentLayoutAnalysis
2019-12-18 11:41:39 +00:00
Eliot Jones
1fb416eee3 add convenience method to retrieve all hyperlinks and their text from annotations on a page 2019-12-18 11:41:02 +00:00
Eliot Jones
777bf9b63d version 0.0.11 (tag: 0.0.11) 2019-12-17 18:07:08 +00:00
Eliot Jones
0a7cad8f79 Merge branch 'master' of https://github.com/UglyToad/Pdf 2019-12-17 17:33:20 +00:00
Eliot Jones
57fb3d3e79 support system fonts without descriptors and also enable overridden widths #101 2019-12-17 17:32:22 +00:00
Eliot Jones
53e7c7d4f4 Merge pull request #102 from BobLd/master
improving geometry classes with tests
2019-12-16 14:42:46 +00:00
BobLd
5cf1f6c58c Modifications and adding some tests 2019-12-16 14:36:52 +00:00
Eliot Jones
ec4f8ac5bf merge pull request #103 from uglytoad/version-10-bugfixes
version 10 bugfixes
2019-12-16 13:24:21 +00:00
Eliot Jones
7a6b8ce6d6 Merge pull request #100 from huzhiguan/master
fix issue with leaf/nodes in RecursiveXYCut that only contain whitespace words.
2019-12-14 16:44:00 +00:00
BobLd
1656411fcb Improving Geometry classes with Tests 2019-12-14 11:41:11 +00:00
Zhiguan Hu
9baa8c3ca0 Fix format as suggested. 2019-12-11 10:05:17 -06:00
Zhiguan Hu
6f3a30a723 Merge branch 'master' of https://github.com/huzhiguan/PdfPig 2019-12-10 15:04:07 -06:00
Zhiguan Hu
30247ba774 Fix the bug that happens when all the words in the current leaf for VerticalCut/HorizontalCut are whitespace. 2019-12-10 15:03:05 -06:00
Eliot Jones
3c0cd17a8b use correct defaults for separation colorspace #89 2019-12-10 14:10:50 +00:00
Eliot Jones
c89928d976 remove inefficient approach to checking if content stream path has been added #99 2019-12-10 13:20:57 +00:00
Eliot Jones
d0443c6567 make operator token threadsafe #97 2019-12-10 11:31:02 +00:00
Eliot Jones
af1217f910 version 0.0.10 (tag: v0.0.10) 2019-12-09 13:49:07 +00:00
Eliot Jones
4042649b46 update the readme for 0.0.10 release 2019-12-09 13:37:22 +00:00
Eliot Jones
6ee7c09788 merge pull request #93 from BobLd/master
improving clustering algorithm
2019-12-09 13:10:17 +00:00
Eliot Jones
d37149a8d7 support custom encodings for type 1 standard 14 fonts without metrics #95 2019-12-09 13:02:52 +00:00
Eliot Jones
f2ead37134 handle missing whitespaces before the start of the object #88 2019-12-09 12:24:20 +00:00
BobLd
b69c004548 Changing functions description to reflect changes 2019-12-07 22:49:10 +00:00
Eliot Jones
75a6260501 make cropbox public 2019-12-06 17:34:51 +00:00
BobLd
fd9efb5b5d making FindIndexNearest() internal 2019-12-06 17:29:33 +00:00
Eliot Jones
e38da0a403 add support for alternative colorspace in separation colorspaces #89 2019-12-06 17:23:15 +00:00
BobLd
b5a0faaa3b Improving clustering algorithm 2019-12-06 16:02:30 +00:00
BobLd
e0138c7ae1 Merge branch 'master' of https://github.com/UglyToad/PdfPig 2019-12-06 15:39:03 +00:00
Eliot Jones
c57d8446e4 merge pull request #92 from uglytoad/outlines-refactoring
outlines refactoring
2019-12-06 13:46:37 +00:00
Eliot Jones
2ff6a4bf11 handle rotations of 360 degrees or more #90 2019-12-06 13:36:45 +00:00
BobLd
439bd9bb53 Merge branch 'master' of https://github.com/UglyToad/PdfPig 2019-12-06 11:41:33 +00:00
Eliot Jones
a9b61d81fa skip explicit destinations where the page cannot be found 2019-12-05 16:40:21 +00:00
Eliot Jones
d6e1dccc01 add support for standardencoding in type 1 fonts #78 2019-12-05 16:32:10 +00:00
Eliot Jones
e01d77b93a add negative test case and make tests non-lenient 2019-12-05 13:56:12 +00:00
Eliot Jones
8ca947542f skip unrelated entries in document name tree 2019-12-05 13:47:42 +00:00
Eliot Jones
2e5c995322 make external nodes different to document nodes and finish reimplementation 2019-12-05 13:21:19 +00:00
Eliot Jones
2ea71ce3bb fix off-by-one error in format 4 cmap subtable for truetype #91 2019-12-05 12:21:58 +00:00
Eliot Jones
ecf0b8743b make bookmarknode immutable and use scanner when retrieving bookmarks 2019-12-05 12:03:30 +00:00
Eliot Jones
928347bcce merge pull request #84 from BobLd/master
add basic bookmarks extraction capabilities.
2019-12-04 14:24:10 +00:00
Eliot Jones
a967e0898a handle missing width and height correctly for compact font format fonts #75 2019-12-04 14:19:06 +00:00
Eliot Jones
8a51795e99 update codecov version for azure pipeline 2019-11-27 16:45:05 +00:00
Eliot Jones
80f024dbed make form access public 2019-11-27 16:36:25 +00:00