Eliot Jones
68bcaf3901
#55 move support for images to page and add inline images
...
support both xobject and inline images. adds unsupported filters so that exceptions are only thrown when accessing lazily evaluated image.bytes property rather than when opening the page.
treat all warnings as errors.
2019-10-08 14:04:36 +01:00
BobLd
eb5400e01b
Correct PageXmlTextExporter's Height and Width
2019-10-08 12:00:04 +01:00
BobLd
d939be1b9c
update PublicApiScannerTests 2
2019-10-07 16:09:30 +01:00
BobLd
f4f2b0e3fd
update PublicApiScannerTests
2019-10-07 16:02:11 +01:00
BobLd
93313118e9
Support for hOCR, AltoXml and PageXml output formats
...
Tested with:
- 'hocrjs' for hOCR (see https://unpkg.com/hocrjs )
- 'PAGE Viewer' for hOCR, AltoXml and PageXml (see http://www.primaresearch.org/tools/PAGEViewer )
2019-10-07 15:19:30 +01:00
Eliot Jones
c3da10055b
Merge pull request #68 from BobLd/master
...
Improve PdfPath
2019-10-07 11:46:11 +01:00
BobLd
1c3519fd51
Update PdfPath.cs
...
Need to account for the case where a `Close` command is called but the first and last commands are not connected.
2019-10-06 12:47:12 +01:00
BobLd
1975db4752
correct typo
2019-10-04 14:50:22 +01:00
BobLd
5d3e4cd4e1
Improve PdfPath
...
- Determine if Closed path
- Determine if Clockwise or CounterClockwise
- Add Centroid
2019-10-04 14:37:41 +01:00
Eliot Jones
e02e130947
#57 add creation and modified date to document information
...
this enables users to check if xmp metadata is outdated
2019-10-03 12:56:48 +01:00
Eliot Jones
38b6f8e812
add current geometry path to page content when it is not explicitly closed #66
2019-09-11 15:38:57 +01:00
Eliot Jones
f822ad48ea
merge pull request #67 from BobLd/master
...
Fix error in DocstrumBB
2019-09-11 12:30:22 +01:00
BobLd
d36dee0e25
Adding handling when pageWords count = 0 for IPageSegmenters
2019-09-04 22:14:08 +01:00
BobLd
68e04603c0
Fix error in DocstrumBB
2019-09-02 19:07:27 +01:00
Eliot Jones
d089a34aa4
lazily evaluate page text and remove linq from word constructor
2019-08-25 15:06:37 +01:00
Eliot Jones
0cd7795bff
add method to get all pages from document
2019-08-23 19:09:33 +01:00
Eliot Jones
3fbfc1130e
lazily evaluate centroid of rectangle
2019-08-20 23:03:27 +01:00
Eliot Jones
6878d9a82d
#64 use decimal values directly rather than from array for transformation matrix
2019-08-20 22:51:00 +01:00
Eliot Jones
613af46472
#62 use byte array instance rather than interface for input bytes
2019-08-20 21:37:31 +01:00
Eliot Jones
bbe5409f94
#62 use length value of stream directly to read the full stream once
2019-08-20 21:08:06 +01:00
Eliot Jones
e0a32a701b
#63 make cache of parsed system fonts static and read the whole file up-front rather than using a filestream
2019-08-19 20:09:07 +01:00
Eliot Jones
0fa3b27ad3
#47 improve flate filter performance by streaming all data in single operation
...
also improves page constructor performance by removing linq and invoking stringbuilder directly. removes page rotation overhead by skipping multiplication for non-rotated pages and using cached transformation matrices for rotations. removes linq from filter provider and shares instances of filter types.
2019-08-19 19:48:02 +01:00
Eliot Jones
11b244eda1
remove thread-unsafe stringbuilder access from adobe font metrics parser
...
this also hoists the char arrays used for string splits since these will be allocated per call if declared inline
2019-08-18 14:10:38 +01:00
Eliot Jones
d98b8b43c1
small performance tweaks and remove package license expression
...
package license url is deprecated in favour of package license expression but nuget doesn't seem to support expressions properly for published packages yet so we'll keep the deprecated url for the time being. having both url and expression causes the build to fail.
small obvious performance improvements for file header parsing and getting the encoding information using the existing reverse name to code map.
2019-08-18 13:47:01 +01:00
Eliot Jones
3ff8637bb0
keep license url in the nuget info even though it is deprecated
2019-08-18 11:59:02 +01:00
Eliot Jones
4548ae934b
Merge pull request #61 from vadik299/master
...
Adding TextSequence number to each letter to determine if letters belong to the same Tj operation
2019-08-17 12:59:46 +01:00
Eliot Jones
8c100efe04
Merge pull request #60 from BobLd/master
...
Improve ClusteringAlgorithms.GroupIndexes() and add Equals() to PdfLine
2019-08-17 12:58:06 +01:00
vadik299
cc767b8cd6
Merge branch 'master' into master
2019-08-16 18:34:57 -04:00
BobLd
afa2b7baa1
Improve ClusteringAlgorithms.GroupIndexes()
...
Add Equals() to PdfLine
2019-08-14 19:58:31 +01:00
Eliot Jones
ac62b7247b
version 0.0.9
0.0.9
2019-08-13 21:24:54 +01:00
Eliot Jones
f5e025aa70
merge pull request #58 from uglytoad/colors
...
adds colors to letters and prepares code to add colors to paths.
2019-08-13 20:50:06 +01:00
Eliot Jones
f55091f3d2
make color types public and add stream based tests to prevent future breaking as observed in #52
2019-08-13 20:48:22 +01:00
Vasya
22278f64c4
Added TextSequence
2019-08-11 14:55:59 -04:00
Eliot Jones
980e67fabe
Merge pull request #56 from BobLd/master
...
Document Layout Analysis - IPageSegmenter, Docstrum
2019-08-11 14:04:39 +01:00
BobLd
9f13739add
correcting typo
2019-08-11 13:54:47 +01:00
BobLd
7e8b3bdc85
Update DocstrumBB to account for the middle point of the overlapping area distance. For this, use the distance between 2 lines.
2019-08-11 13:45:08 +01:00
Eliot Jones
0349bedd3e
#57 add access to document metadata and expose wrapper type
2019-08-11 12:42:30 +01:00
BobLd
c14d77e414
PublicApiScannerTests updated
2019-08-10 16:36:50 +01:00
BobLd
eb9a9fd00e
Document Layout Analysis - IPageSegmenter, Docstrum
...
- Create a TextBlock class
- Create IPageSegmenter
- Add other useful distances: angle, etc.
- Update RecursiveXYCut
  - With IPageSegmenter and TextBlock
  - Make XYNode and XYLeaf internal
- Optimise (faster) NearestNeighbourWordExtractor and isolate the clustering algorithms for use outside of this class
- Implement a Docstrum inspired page segmentation algorithm
2019-08-10 16:01:27 +01:00
vadik299
dadcccfa82
Merge pull request #1 from UglyToad/master
...
merging updates
2019-08-10 10:10:44 -04:00
Eliot Jones
fc2d532b82
use single instances of black and white for rgb/gray colors
2019-08-10 14:58:02 +01:00
Eliot Jones
2d6e49426a
Merge pull request #54 from BobLd/master
...
Improving PdfPoint
2019-08-10 14:37:16 +01:00
BobLd
9b24223190
Removing ToDouble()
2019-08-10 13:52:01 +01:00
BobLd
bd58879e32
Update from comments
2019-08-10 13:05:25 +01:00
BobLd
474ce9a442
Improving PdfPoint
2019-08-09 19:58:48 +01:00
Eliot Jones
f243117cfa
Merge pull request #53 from BobLd/master
...
Adding Centroid to PdfRectangle.
2019-08-09 18:59:29 +01:00
BobLd
5399456919
Making the RecursiveXYCut class static.
2019-08-09 18:50:20 +01:00
BobLd
ac065e988a
Adding Centroid to PdfRectangle.
2019-08-09 17:22:16 +01:00
Eliot Jones
d6757e69cb
Merge pull request #52 from Numpsy/rw/streaminputbytes
...
Change StreamInputBytes.Seek to reset isAtEnd to false
2019-08-09 09:40:19 +01:00
Richard Webb
54cd0ae516
Extend the ArrayAndStreamBehaveTheSame test to test seeking back to the start
2019-08-08 23:14:59 +01:00