BobLd
3ac26bb1bc
fix bbox for TextLine and TextBlock
2020-05-30 13:03:59 +01:00
BobLd
14454184ad
update RecursiveXYCutTests
2020-05-30 13:03:59 +01:00
BobLd
6d31ef80a7
add RecursiveXYCutTests
2020-05-30 13:03:59 +01:00
BobLd
aa0e75d768
update DocstrumBoundingBoxesTests
2020-05-30 13:03:59 +01:00
BobLd
208e1dd8f2
add DocstrumBoundingBoxesTests
2020-05-30 13:03:59 +01:00
BobLd
75e9046c16
add DlaHelper for tests and correct minor typos
2020-05-30 13:03:59 +01:00
BobLd
05d96cd9c4
add documents for tests
2020-05-30 13:03:59 +01:00
BobLd
465cf3f072
update word rotated bbox with previous PdfRectangle constructor order
2020-05-30 13:03:59 +01:00
BobLd
dacf816a86
add summary doc to Clipper
2020-05-30 13:03:59 +01:00
BobLd
f883b56e72
completely rework DocstrumBoundingBoxes, now handle rotated text
2020-05-30 13:03:59 +01:00
BobLd
a16f377d5a
update DefaultPageSegmenter to use DlaOptions
2020-05-30 13:03:59 +01:00
BobLd
1438fec741
update RecursiveXYCut to use DlaOptions
2020-05-30 13:03:59 +01:00
BobLd
5362a335f5
update XYLeaf with word separator
2020-05-30 13:03:59 +01:00
BobLd
79b78f486a
add ReadingOrderHelper
2020-05-30 13:03:59 +01:00
BobLd
ec613d337f
correct Word bounding box
2020-05-30 13:03:59 +01:00
BobLd
8f1ab2022f
update NearestNeighbourWordExtractor to use DlaOptions, stop ordering words
2020-05-30 13:03:59 +01:00
BobLd
43a68693ba
allow oriented bounding box for TextBlock
2020-05-30 13:03:59 +01:00
BobLd
5b0b0a6db3
allow oriented bounding box for TextLine
2020-05-30 13:03:59 +01:00
BobLd
bb94348127
add text Separator in TextBlock and TextLine
2020-05-30 13:03:59 +01:00
BobLd
5f75205e41
rename TextDirection into TextOrientation
2020-05-30 13:03:59 +01:00
BobLd
33ee66af42
add PageSegmenterOptions abstract class
2020-05-30 13:03:59 +01:00
BobLd
dd546dcfc8
update IPageSegmenter with DlaOptions
2020-05-30 13:03:59 +01:00
BobLd
3cf7c45994
add DlaOptions abstract class
2020-05-30 13:03:59 +01:00
BobLd
a00660cd6e
update nn word extractor for new clustering type
2020-05-30 13:03:59 +01:00
BobLd
78da925263
add AlmostEqualsToZero() and AlmostEquals().
2020-05-30 13:03:59 +01:00
BobLd
08300f6a3a
use IReadOnlyList<PdfPoint> i/o PdfPoint[] in KdTree
2020-05-30 13:03:59 +01:00
BobLd
ca4111ec1b
better parameters for FindIndexNearest
2020-05-30 13:03:59 +01:00
BobLd
d2c2a2f592
add angle bounding functions
2020-05-30 13:03:59 +01:00
BobLd
404d6621de
return grouped elements i/o grouped indexes in Clustering.NearestNeighbours
2020-05-30 13:03:59 +01:00
BobLd
78d57ad5f9
Add italic-bold test
2020-05-23 16:26:39 +01:00
BobLd
40afe977a3
check for ItalicAngle != 0
2020-05-23 16:26:39 +01:00
Eliot Jones
a7a2ef0630
remove old text from the readme
2020-05-10 16:40:05 +01:00
Eliot Jones
256c2833ab
0.1.2-alpha002
01.2-alpha002
2020-05-10 16:36:14 +01:00
BobLd
bb33741552
Fix KdTree.FindNearestNeighbours(k) returning the pivot itself
2020-05-10 15:49:00 +01:00
Eliot Jones
0512bb1e4f
handle indirect references appearing in cid font widths array #174
2020-05-10 15:46:38 +01:00
BobLd
f91acefcfa
Set ClipPaths to false if no ParsingOptions given (consistent behaviour)
2020-04-27 17:21:52 +01:00
Eliot Jones
09b951f667
expose font details on individual letters
also fixes a regression for image extraction
2020-04-25 17:15:26 +01:00
Eliot Jones
98dd736f94
0.1.2-alpha001
0.1.2-alpha001
2020-04-25 15:20:07 +01:00
Eliot Jones
ae62197178
merge pull request #167 from bobld/master
Improve NearestNeighbourWordExtractor
2020-04-25 15:07:18 +01:00
Eliot Jones
e264583c21
add merging to the readme
2020-04-25 12:06:11 +01:00
Eliot Jones
19047f62ae
fix name output for merged documents
2020-04-25 11:23:37 +01:00
Eliot Jones
391b650e3c
add more examples to the examples solution
2020-04-25 10:14:05 +01:00
BobLd
ae82c30a31
Merge branch 'master' of https://github.com/UglyToad/PdfPig
2020-04-25 10:10:08 +01:00
Eliot Jones
27e251f921
make filter provider and filter public and use tryget for image bytes
2020-04-25 09:42:24 +01:00
Eliot Jones
635c4b4c5e
formatting tidy-up
2020-04-25 09:11:16 +01:00
BobLd
c2de52423e
Make NearestNeighbours public
2020-04-25 08:40:43 +01:00
BobLd
d4210cd5d1
Make clustering algos public and use shorter names
2020-04-25 08:40:43 +01:00
Adam Busbin
00b9d416df
added check for bad fonts see 61ceca8376/fontbox/src/main/java/org/apache/fontbox/ttf/HorizontalMetricsTable.java
line 67 for matching code.
2020-04-25 08:40:12 +01:00
BobLd
0a6ec3946b
NearestNeighbourWordExtractor:
- Improve results by using PointSize
- Make 'filterFunction' public for ad hoc GetWords()
- Allow text in different direction
Make Letter.PointSize public and add warning (needed for NNWordExtractor)
Remove Page.GetPointSize(Letter letter)
2020-04-20 13:09:35 +01:00
BobLd
8eb50517dd
Merge branch 'master' of https://github.com/UglyToad/PdfPig
2020-04-20 00:08:09 +01:00