Eliot Jones
80f024dbed
make form access public
2019-11-27 16:36:25 +00:00
Eliot Jones
ed53773c7b
handle checked state of radio buttons and checkboxes
2019-11-27 15:34:28 +00:00
Eliot Jones
910e22a4e9
wrap checkboxes and radiobuttons in their own form field types with access to the child collections
2019-11-26 16:33:24 +00:00
BobLd
9da0623fab
Merge branch 'master' of https://github.com/UglyToad/PdfPig
2019-11-26 12:16:43 +00:00
Eliot Jones
677d2b5e8f
#82 make resource store state local to the page and operation being processed
...
resources such as fonts are linked to page content operations using name labels, e.g. "/F1", these resource labels can be reassigned on different pages or inside form xobjects. we now clear the entire resource state for each page which is parsed and after form xobject operations which use resource dictionaries.
2019-11-25 14:34:02 +00:00
Eliot Jones
9028f932b2
#83 decrypt dictionary hex values
2019-11-25 12:42:32 +00:00
BobLd
a8559c1167
Add basic bookmarks extraction capabilities.
2019-11-04 15:11:54 +00:00
BobLd
99f260befb
Enhancing NearestNeighbourWordExtractor
...
- Making the code easier to read
- Using 20% of Width instead of 60%
- Making DefaultWordExtractor public
2019-10-21 20:51:27 +01:00
BobLd
0b2a0f4bc7
AltoDocument: make all xxxSpecified setter public to allow Deserialize.
2019-10-20 12:25:34 +01:00
Eliot Jones
80fc404b10
#47 improve performance by caching truetype bounding boxes
...
also uses less reflection when parsing the page content stream
2019-10-18 15:56:28 +01:00
Eliot Jones
84990722ca
#76 add infinite loop protection for brute force search
...
also treats 'm' or 'j' in endstream/endobj as a valid object number start character
2019-10-17 16:50:01 +01:00
Eliot Jones
efe7896824
#75 support vertical writing mode fonts
2019-10-17 15:57:04 +01:00
Eliot Jones
a2147902a0
merge pull request #72 from uglytoad/fix-export-formatting
...
fix export formatting
2019-10-17 11:28:06 +01:00
Eliot Jones
09b26c43e0
#74 add intersectswith method to rectangle
2019-10-17 11:21:49 +01:00
Eliot Jones
57dfee3211
move alto xml exporter to root export namespace
2019-10-17 10:46:43 +01:00
Eliot Jones
3f1321141a
#73 process xobject form content when extracting text and images
2019-10-16 14:59:16 +01:00
Eliot Jones
6174877892
#71 ignore malformed dates in true type header table. fix reading of dates from bytes
2019-10-16 10:51:02 +01:00
Eliot Jones
d68bd88824
format and tidy up alto export autogenerated code. tidy up docstrum
2019-10-14 18:30:18 +01:00
BobLd
e9b3db7102
Make ITextExporter implementations public
2019-10-11 08:55:03 +01:00
BobLd
f886411e12
Merge https://github.com/UglyToad/PdfPig
2019-10-10 16:52:45 +01:00
Eliot Jones
dec4c31a33
fix bug where cross reference stream subsections were skipped
...
a single cross-reference stream may contain multiple disjoint runs of object numbers; previously we only took the first run, now we load all objects.
adds indexer to array token for ease-of-use.
adds page number and bounds information to all form fields.
2019-10-10 16:05:21 +01:00
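The subsection fix above can be pictured with a minimal Python sketch. It is not the PdfPig code (which is C#), and the function names are hypothetical. It assumes the layout from the PDF specification: a cross-reference stream's `/Index` array holds pairs of (first object number, entry count), and defaults to `[0 Size]` when absent.

```python
def xref_subsections(index, size):
    """Split an /Index array into (first_object_number, count) pairs.

    `index` is the flat list from the stream dictionary, or None when the
    entry is absent, in which case the default [0, size] applies.
    """
    if index is None:
        index = [0, size]
    if len(index) % 2 != 0:
        raise ValueError("Index array must hold an even number of integers")
    return [(index[i], index[i + 1]) for i in range(0, len(index), 2)]


def object_numbers(index, size):
    """List every object number covered by all subsections, in order."""
    numbers = []
    for first, count in xref_subsections(index, size):
        # Reading only the first pair here would reproduce the old bug.
        numbers.extend(range(first, first + count))
    return numbers
```

For instance, an `/Index` of `[0 3 10 2]` describes two disjoint runs: objects 0 to 2 and objects 10 to 11.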
BobLd
a15f56a6ac
Better handling of UTF8 in XmlWriter
2019-10-10 14:14:05 +01:00
BobLd
fe1a3c4b8b
updated from comments
...
- still need to look at XmlWriter
2019-10-10 12:29:28 +01:00
Eliot Jones
2ef45f71d5
make missing acroform types public and start improving data
...
also changes pages to use a proper tree structure since this will be required for resource inheritance and for acroform widget dictionaries.
2019-10-09 14:28:37 +01:00
Eliot Jones
81ab414c56
add is supported flag to filters and add missing doc comment
2019-10-08 15:53:42 +01:00
BobLd
bf09aee99c
Adding images regions
2019-10-08 15:29:18 +01:00
BobLd
9ab943e1f9
Merge branch 'master' of https://github.com/UglyToad/PdfPig
2019-10-08 14:16:59 +01:00
Eliot Jones
77f968b6ea
merge pull request #70 from uglytoad/add-images
...
#55 move support for images to page and add inline images
2019-10-08 14:11:19 +01:00
Eliot Jones
68bcaf3901
#55 move support for images to page and add inline images
...
support both xobject and inline images. adds unsupported filters so that exceptions are only thrown when accessing lazily evaluated image.bytes property rather than when opening the page.
treat all warnings as errors.
2019-10-08 14:04:36 +01:00
BobLd
eb5400e01b
Correct PageXmlTextExporter's Height and Width
2019-10-08 12:00:04 +01:00
BobLd
93313118e9
Support for hOCR, AltoXml and PageXml output formats
...
Tested with:
- 'hocrjs' for hOCR (see https://unpkg.com/hocrjs )
- 'PAGE Viewer' for hOCR, AltoXml and PageXml (see http://www.primaresearch.org/tools/PAGEViewer )
2019-10-07 15:19:30 +01:00
BobLd
1c3519fd51
Update PdfPath.cs
...
Need to account for the case where a `Close` command is called but the first and last commands are not connected.
2019-10-06 12:47:12 +01:00
BobLd
1975db4752
correct typo
2019-10-04 14:50:22 +01:00
BobLd
5d3e4cd4e1
Improve PdfPath
...
- Determine if Closed path
- Determine if Clockwise or CounterClockwise
- Add Centroid
2019-10-04 14:37:41 +01:00
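The winding and centroid items above can be illustrated with the shoelace formula. This is a minimal Python sketch, not the PdfPath implementation; the function names are hypothetical. It assumes the path is a simple polygon made of straight segments (no Bezier curves) in a y-up coordinate system such as PDF user space.

```python
def signed_area(points):
    """Shoelace formula: positive for counterclockwise winding (y-up axes)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_clockwise(points):
    """A negative signed area means the vertices run clockwise."""
    return signed_area(points) < 0


def centroid(points):
    """Area-weighted centroid of a simple polygon."""
    area = signed_area(points)
    if area == 0:
        raise ValueError("degenerate polygon has no area centroid")
    cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (6.0 * area), cy / (6.0 * area)
```

The sign convention flips in a y-down coordinate system, so a caller working in image space would invert the clockwise test.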
Eliot Jones
e02e130947
#57 add creation and modified date to document information
...
this enables users to check if xmp metadata is outdated
2019-10-03 12:56:48 +01:00
Eliot Jones
38b6f8e812
add current geometry path to page content when it is not explicitly closed #66
2019-09-11 15:38:57 +01:00
BobLd
d36dee0e25
Adding handling when pageWords count = 0 for IPageSegmenters
2019-09-04 22:14:08 +01:00
BobLd
68e04603c0
Fix error in DocstrumBB
2019-09-02 19:07:27 +01:00
Eliot Jones
d089a34aa4
lazily evaluate page text and remove linq from word constructor
2019-08-25 15:06:37 +01:00
Eliot Jones
0cd7795bff
add method to get all pages from document
2019-08-23 19:09:33 +01:00
Eliot Jones
3fbfc1130e
lazily evaluate centroid of rectangle
2019-08-20 23:03:27 +01:00
Eliot Jones
6878d9a82d
#64 use decimal values directly rather than from array for transformation matrix
2019-08-20 22:51:00 +01:00
Eliot Jones
613af46472
#62 use byte array instance rather than interface for input bytes
2019-08-20 21:37:31 +01:00
Eliot Jones
bbe5409f94
#62 use length value of stream directly to read the full stream once
2019-08-20 21:08:06 +01:00
Eliot Jones
e0a32a701b
#63 make cache of parsed system fonts static and read the whole file up-front rather than using a filestream
2019-08-19 20:09:07 +01:00
Eliot Jones
0fa3b27ad3
#47 improve flate filter performance by streaming all data in single operation
...
also improves page constructor performance by removing linq and invoking stringbuilder directly. removes page rotation overhead by skipping multiplication for non-rotated pages and using cached transformation matrices for rotations. removes linq from filter provider and shares instances of filter types.
2019-08-19 19:48:02 +01:00
Eliot Jones
11b244eda1
remove thread-unsafe stringbuilder access from adobe font metrics parser
...
this also hoists the char arrays used for string splits since these will be allocated per call if declared inline
2019-08-18 14:10:38 +01:00
Eliot Jones
d98b8b43c1
small performance tweaks and remove package license expression
...
package license url is deprecated in favour of package license expression but nuget doesn't seem to support expressions properly for published packages yet so we'll keep the deprecated url for the time being. having both url and expression causes the build to fail.
small obvious performance improvements for file header parsing and getting the encoding information using the existing reverse name to code map.
2019-08-18 13:47:01 +01:00
Eliot Jones
3ff8637bb0
keep license url in the nuget info even though it is deprecated
2019-08-18 11:59:02 +01:00
Eliot Jones
4548ae934b
Merge pull request #61 from vadik299/master
...
Adding TextSequence number to each letter to determine if letters belong to the same Tj operation
2019-08-17 12:59:46 +01:00