642 Commits

Author SHA1 Message Date
Eliot Jones
bbe5409f94 #62 use length value of stream directly to read the full stream once 2019-08-20 21:08:06 +01:00
Eliot Jones
e0a32a701b #63 make cache of parsed system fonts static and read the whole file up-front rather than using a filestream 2019-08-19 20:09:07 +01:00
Eliot Jones
0fa3b27ad3 #47 improve flate filter performance by streaming all data in single operation
also improves page constructor performance by removing linq and invoking stringbuilder directly. removes page rotation overhead by skipping multiplication for non-rotated pages and using cached transformation matrices for rotations. removes linq from filter provider and shares instances of filter types.
2019-08-19 19:48:02 +01:00
Eliot Jones
11b244eda1 remove thread-unsafe stringbuilder access from adobe font metrics parser
this also hoists the char arrays used for string splits since these will be allocated per call if declared inline
2019-08-18 14:10:38 +01:00
Eliot Jones
d98b8b43c1 small performance tweaks and remove package license expression
package license url is deprecated in favour of package license expression but nuget doesn't seem to support expressions properly for published packages yet so we'll keep the deprecated url for the time being. having both url and expression causes the build to fail.

small obvious performance improvements for file header parsing and getting the encoding information using the existing reverse name to code map.
2019-08-18 13:47:01 +01:00
Eliot Jones
3ff8637bb0 keep license url in the nuget info even though it is deprecated 2019-08-18 11:59:02 +01:00
Eliot Jones
4548ae934b Merge pull request #61 from vadik299/master
Adding TextSequence number to each letter to determine if letters belong to the same Tj operation
2019-08-17 12:59:46 +01:00
Eliot Jones
8c100efe04 Merge pull request #60 from BobLd/master
Improve ClusteringAlgorithms.GroupIndexes() and add Equals() to PdfLine
2019-08-17 12:58:06 +01:00
vadik299
cc767b8cd6 Merge branch 'master' into master 2019-08-16 18:34:57 -04:00
BobLd
afa2b7baa1 Improve ClusteringAlgorithms.GroupIndexes()
Add Equals() to PdfLine
2019-08-14 19:58:31 +01:00
Eliot Jones
ac62b7247b version 0.0.9 0.0.9 2019-08-13 21:24:54 +01:00
Eliot Jones
f5e025aa70 merge pull request #58 from uglytoad/colors
adds colors to letters and prepares code to add colors to paths.
2019-08-13 20:50:06 +01:00
Eliot Jones
f55091f3d2 make color types public and add stream based tests to prevent future breaking as observed in #52 2019-08-13 20:48:22 +01:00
Vasya
22278f64c4 Added TextSequence 2019-08-11 14:55:59 -04:00
Eliot Jones
980e67fabe Merge pull request #56 from BobLd/master
Document Layout Analysis - IPageSegmenter, Docstrum
2019-08-11 14:04:39 +01:00
BobLd
9f13739add correcting typo 2019-08-11 13:54:47 +01:00
BobLd
7e8b3bdc85 Update DocstrumBB to account for middle point of the overlapping area distance. For this, using distance between 2 lines. 2019-08-11 13:45:08 +01:00
Eliot Jones
0349bedd3e #57 add access to document metadata and expose wrapper type 2019-08-11 12:42:30 +01:00
BobLd
c14d77e414 PublicApiScannerTests updated 2019-08-10 16:36:50 +01:00
BobLd
eb9a9fd00e Document Layout Analysis - IPageSegmenter, Docstrum
- Create a TextBlock class
- Creates IPageSegmenter
- Add other useful distances: angle, etc.
- Update RecursiveXYCut
 - With IPageSegmenter and TextBlock
 - Make XYNode and XYLeaf internal
- Optimise (faster) NearestNeighbourWordExtractor and isolate the clustering algorithms for use outside of this class
- Implement a Docstrum inspired page segmentation algorithm
2019-08-10 16:01:27 +01:00
vadik299
dadcccfa82 Merge pull request #1 from UglyToad/master
merging updates
2019-08-10 10:10:44 -04:00
Eliot Jones
fc2d532b82 use single instances of black and white for rgb/gray colors 2019-08-10 14:58:02 +01:00
Eliot Jones
2d6e49426a Merge pull request #54 from BobLd/master
Improving PdfPoint
2019-08-10 14:37:16 +01:00
BobLd
9b24223190 Removing ToDouble() 2019-08-10 13:52:01 +01:00
BobLd
bd58879e32 Update from comments 2019-08-10 13:05:25 +01:00
BobLd
474ce9a442 Improving PdfPoint 2019-08-09 19:58:48 +01:00
Eliot Jones
f243117cfa Merge pull request #53 from BobLd/master
Adding Centroid to PdfRectangle.
2019-08-09 18:59:29 +01:00
BobLd
5399456919 Making the RecursiveXYCut class static. 2019-08-09 18:50:20 +01:00
BobLd
ac065e988a Adding Centroid to PdfRectangle. 2019-08-09 17:22:16 +01:00
Eliot Jones
d6757e69cb Merge pull request #52 from Numpsy/rw/streaminputbytes
Change StreamInputBytes.Seek to reset isAtEnd to false
2019-08-09 09:40:19 +01:00
Richard Webb
54cd0ae516 Extend the ArrayAndStreamBehaveTheSame test to test seeking back to the start 2019-08-08 23:14:59 +01:00
Richard Webb
f70b7c69a0 Change StreamInputBytes.Seek to reset isAtEnd to false 2019-08-08 23:14:16 +01:00
Eliot Jones
31e15ea097 remove unused docstrum class 2019-08-08 21:21:27 +01:00
Eliot Jones
c5d03bca97 move application of transformation matrix outside path 2019-08-08 21:19:18 +01:00
Eliot Jones
fe270aa9bd merge pull request #50 from BobLd/master
document layout analysis - text edges extractor
2019-08-08 21:16:42 +01:00
BobLd
801ea3ba7f Modified PublicApiScannerTests 2019-08-07 14:22:39 +01:00
BobLd
7de6de3780 Updating with comments 2019-08-07 13:50:07 +01:00
BobLd
e19b03035e Updating with comments 2019-08-07 13:49:05 +01:00
BobLd
85d5bb7c7e Adding enum EdgeType 2019-08-07 13:45:57 +01:00
Eliot Jones
709294975b Merge pull request #51 from BenyErnest/patch-1
Update README.md
2019-08-06 19:48:48 +01:00
Benito E. Gómez
d6c4d62dac Update README.md
I think you need to pass the byte array to the File.WriteAllBytes method.
2019-08-06 11:34:54 -04:00
BobLd
9694b1f8e8 Update TextEdgesExtractor.cs 2019-08-06 15:27:16 +01:00
BobLd
83889cfb52 Document Layout Analysis - Text edges extractor
Text edges are where words have either their BoundingBox's left, right or mid coordinate aligned on the same vertical line.
Useful to detect tables, justified text, lists, etc.
2019-08-06 15:24:16 +01:00
Eliot Jones
4dde4ca0c1 add colors to letters based on current font and graphics state 2019-08-05 19:26:10 +01:00
Eliot Jones
0df35b8488 fix naming of color space to be 2 words 2019-08-05 18:32:44 +01:00
Eliot Jones
0b9ae1db13 add color information to the operation context. create color classes for letters and paths to use 2019-08-04 16:47:47 +01:00
Eliot Jones
1d551d6de3 add and document core classes for colorspace information 2019-08-04 12:57:06 +01:00
Eliot Jones
f07ab7d2c3 version 0.0.7 v0.0.7 2019-08-03 16:14:58 +01:00
Eliot Jones
364bd25fa8 #48 add handling of inline image data to pdf content parsing
an inline image in a pdf content stream starts with the bi tag, then id declares the start of image data and ei the end. attempting to parse the bytes after the id tag as usual resulted in errors. this change adds special case handling for inline images.
2019-08-03 15:42:19 +01:00
Eliot Jones
5ee9c49f8a merge pull request #49 from BobLd/master
check if 'fontProgram' is null in Type2CidFont.GetWidthFromFont()
2019-07-30 19:16:09 +01:00