BobLd
0dad611cb1
Implement minimum bounding box algorithm
2020-01-31 16:24:59 +00:00
BobLd
380c36918b
Remove unnecessary code
2020-01-31 16:24:59 +00:00
BobLd
0cbf3434bc
Remove 'orderFunc' from 'NearestNeighbourWordExtractor' to use the order found by the clustering algorithm
2020-01-31 16:24:59 +00:00
BobLd
bc69376743
Increase max distance for TextDirection.Other in NearestNeighbourWordExtractor
2020-01-31 16:24:59 +00:00
BobLd
a326d7e9d9
TextDirection.Unknown -> TextDirection.Other
Improve NearestNeighbourWordExtractor for TextDirection.Other
2020-01-31 16:24:59 +00:00
BobLd
75821919a7
Fix NearestNeighbourWordExtractor for rotated text
2020-01-31 16:24:59 +00:00
BobLd
47672d3f90
Make TextBlock.SetReadingOrder(int) public
2020-01-13 09:25:57 +00:00
BobLd
e8216b29c5
Add reading order in PageXml export
2020-01-12 11:15:58 +00:00
BobLd
e7417be75a
ReadingOrderDetector and tidying DLA project
2020-01-11 11:18:11 +00:00
Eliot Jones
e064d39671
remove unused project references from document layout analysis
2020-01-05 15:44:02 +00:00
Eliot Jones
e0a45e3774
include dependencies as dlls in the published nuget
by default nuget pack does not include project dependencies. this is suboptimal since it would require managing at least 5 nuget packages. this uses a workaround detailed here https://github.com/nuget/home/issues/3891 to copy the dependent dlls to the generated nuget package. this doesn't resolve the issue of how we publish the documentlayoutanalysis project, since it is the top of the dependency tree and we publish its parent, rather than it.
2020-01-05 13:56:14 +00:00
Eliot Jones
b29354e3e6
move compact font format fonts to fonts project
2020-01-05 12:08:01 +00:00
Eliot Jones
bbde38f656
move tokenizers to their own project
since both pdfs and Adobe Type1 fonts use postscript type objects, tokenization is needed by the main project and the fonts project
2020-01-05 10:40:44 +00:00
Eliot Jones
15525acbaa
move document layout analysis and export to new project
2020-01-05 09:19:58 +00:00