Eliot Jones
0c1b50fcc4
Merge pull request #36 from BobLd/master
Document Layout Analysis Tools
2019-06-23 11:32:50 +01:00
Richard Webb
b5b862e63f
unit tests for tokenizing UTF16 encoded hex strings.
2019-06-23 01:19:43 +01:00
Richard Webb
0432f703c4
extend HexToken to support UTF-16BE encoded hex strings
2019-06-23 01:18:48 +01:00
BobLd
00233fa5d0
Update with corrections - 2
2019-06-20 22:10:05 +01:00
Eliot Jones
7b96483664
include raw dictionary token in the document information class #38
2019-06-19 21:23:06 +01:00
Eliot Jones
b7b08fa881
add gitter badge
2019-06-19 18:50:48 +01:00
Eliot Jones
35b6c4f0eb
handle case where font metrics do not declare width or height #35
2019-06-19 18:47:50 +01:00
BobLd
080354dc54
Corrected PublicApiScannerTests
2019-06-18 21:32:14 +01:00
BobLd
f8d0883da5
Update with corrections
2019-06-18 20:48:49 +01:00
Eliot Jones
caf1a0c233
use invariant culture for parsing all numbers #37
2019-06-18 19:12:51 +01:00
BobLd
4416793f6d
Corrected PublicApiScannerTests
2019-06-16 19:19:44 +01:00
BobLd
2525cd243f
Typo correction
2019-06-16 14:03:12 +01:00
BobLd
a0c864e8af
Adding Document Layout Analysis:
- Nearest Neighbour Word Extractor
- Recursive X-Y Cut algorithm, useful for multi-column pdf documents
2019-06-16 13:57:30 +01:00
Eliot Jones
2c9a3d6e96
add test coverage for direct object finder
2019-06-14 20:57:46 +01:00
Eliot Jones
98424b32aa
special case handling for faulty offsets in xref with missing whitespace between eof and object number
2019-06-14 20:40:24 +01:00
Eliot Jones
4c716fcbd6
finish support for revision 5 encryption using aes 256 #34
2019-06-13 19:46:08 +01:00
Eliot Jones
d0a3cd398f
start adding support for revision 5 aes-256 encrypted documents #34
2019-06-09 13:27:03 +01:00
Eliot Jones
f3c8220ec4
add test coverage for invalid document from #33
2019-06-08 16:58:20 +01:00
Eliot Jones
2b486dccab
prevent infinite loops where a stream token's length entry references itself. perform brute force scans in case of a faulty xref table #33
2019-06-08 16:45:02 +01:00
Eliot Jones
21a4ba597e
add support for aes-128 decryption #34
2019-06-08 15:23:21 +01:00
Eliot Jones
a19122478d
begin adding support for in-document security handlers to support aes 128/256 encryption #34
2019-06-08 14:14:51 +01:00
Eliot Jones
39d05e6a47
support big endian and little endian utf 16 in string tokens #32
2019-06-05 18:03:20 +01:00
Eliot Jones
f375cb6f04
keep letters in word when using default word extractor
2019-05-30 20:07:52 +01:00
Eliot Jones
ef822b484d
0.0.6 - update version and sourcelink nuget version
v0.0.6
2019-05-19 13:39:06 +01:00
Eliot Jones
31d12eb731
handle extraneous def token in some dictionaries and skip returning glyph bounds if not in font
2019-05-19 13:27:38 +01:00
Eliot Jones
e9e376c52a
update readme and make page dictionary public
2019-05-19 13:14:38 +01:00
Eliot Jones
872e338ecb
skip invalid commands in type 1 command definitions
2019-05-19 12:58:49 +01:00
Eliot Jones
7e8f3623a4
handle type 1 parser already being at def token when reading till next def token
2019-05-19 12:28:26 +01:00
Eliot Jones
ffa7b3bcc7
generate synthetic encoding where not present and use direct object finder to lookup cropbox and mediabox
2019-05-18 15:20:07 +01:00
Eliot Jones
8a74d5b2f3
use missing width for type 1 fonts when not in pdf array
2019-05-18 14:43:22 +01:00
Eliot Jones
7a3b89ece1
tidy up some doc comments
2019-05-18 12:28:42 +01:00
Eliot Jones
f884674807
Merge branch 'master' of https://github.com/UglyToad/Pdf
2019-05-18 12:25:57 +01:00
Eliot Jones
f3bc3a37b9
add lzw filter support
2019-05-18 12:25:47 +01:00
Eliot Jones
86c5478ddb
Merge pull request #31 from BobLd/master
add textline and pdfline
2019-05-15 22:34:01 +01:00
Eliot Jones
9a8becde3e
update the readme to reflect expanded capabilities
2019-05-15 20:05:05 +01:00
Eliot Jones
69b6958c9d
only declare a cff font to be a cid font if the registry ordering supplement (ros) is provided
2019-05-15 20:00:24 +01:00
BobLd
f4ec425bf0
- Correction of the PdfLine's length formula;
- Moving Line to TextLine
2019-05-15 19:44:47 +01:00
BobLd
97f0f6fe75
Minor modifications and updates
2019-05-14 20:56:34 +01:00
Eliot Jones
5cf62eaa11
fix counting hintmask bytes where cntrmask is present in type 2 charstrings for cff fonts
2019-05-14 20:08:44 +01:00
BobLd
de421d65a1
Adding Line, PdfLine
2019-05-12 19:39:58 +01:00
BobLd
2011d504a7
In Content:
- Adding a 'Line' of text object
- Adding a 'TextDirection' property in the 'Word' object
In Geometry:
- Adding a 'PdfLine' object
- Making the 'PdfRectangle' creator public
2019-05-12 19:34:00 +01:00
Eliot Jones
55d34e3998
use standardencoding name for seac command in type 1 charstrings
2019-05-11 15:57:19 +01:00
Eliot Jones
5b5a0b7f55
fix null reference bug and handle escaped escape characters in string tokenization
2019-05-11 15:35:56 +01:00
Eliot Jones
72ef0f174a
handle hidden standard14 font with no encoding when document incorrectly indicates font is truetype
2019-05-11 14:53:39 +01:00
Eliot Jones
90e9c46373
correctly determine compact font format encoding where supplements are used and pass the font encoding for type 1 fonts to the encoding used to read the pdf
2019-05-11 11:56:01 +01:00
Eliot Jones
9afceed1c5
correctly delimit content streams when concatenating arrays
2019-05-11 10:49:04 +01:00
Eliot Jones
a108672ce8
implement missing methods for various compact font format charsets and fix bug with constructors
2019-05-11 10:23:57 +01:00
Eliot Jones
7d9bd46437
handle null name in tounicode cmap, handle indirect reference in widths array for cid font and handle empty data for private dictionary in compact font format
2019-05-11 10:00:04 +01:00
Eliot Jones
03af28ed6d
fix bug with compact font format font matrix reading and where endstream token is missed if immediately following 'e'
2019-05-10 20:02:29 +01:00
Eliot Jones
3396820d49
throw more descriptive exception when loading system font finder
2019-05-09 19:14:11 +01:00