14 Commits

Author SHA1 Message Date
BobLd
0dad611cb1 Implement minimum bounding box algorithm 2020-01-31 16:24:59 +00:00
BobLd
380c36918b Remove unnecessary code 2020-01-31 16:24:59 +00:00
BobLd
0cbf3434bc Remove 'orderFunc' from 'NearestNeighbourWordExtractor' to use the order found by clustering algo 2020-01-31 16:24:59 +00:00
BobLd
bc69376743 Increase max distance for TextDirection.Other in NearestNeighbourWordExtractor 2020-01-31 16:24:59 +00:00
BobLd
a326d7e9d9 TextDirection.Unknown -> TextDirection.Other
Improve NearestNeighbourWordExtractor for TextDirection.Other
2020-01-31 16:24:59 +00:00
BobLd
75821919a7 Fix NearestNeighbourWordExtractor for rotated text 2020-01-31 16:24:59 +00:00
BobLd
47672d3f90 Make TextBlock.SetReadingOrder(int) public 2020-01-13 09:25:57 +00:00
BobLd
e8216b29c5 Add reading order in PageXml export 2020-01-12 11:15:58 +00:00
BobLd
e7417be75a ReadingOrderDetector and tidying DLA project 2020-01-11 11:18:11 +00:00
Eliot Jones
e064d39671 remove unused project references from document layout analysis 2020-01-05 15:44:02 +00:00
Eliot Jones
e0a45e3774 include dependencies as dlls in the published nuget
by default nuget pack does not include project dependencies. this is suboptimal since it would require managing at least 5 nuget packages. this uses a workaround detailed here https://github.com/nuget/home/issues/3891 to copy the dependent dlls to the generated nuget package. this doesn't resolve the issue of how we publish the documentlayoutanalysis project, since it is the top of the dependency tree and we publish its parent, rather than it.
2020-01-05 13:56:14 +00:00
Eliot Jones
b29354e3e6 move compact font format fonts to fonts project 2020-01-05 12:08:01 +00:00
Eliot Jones
bbde38f656 move tokenizers to their own project
since both pdfs and Adobe Type1 fonts use postscript type objects, tokenization is needed by the main project and the fonts project
2020-01-05 10:40:44 +00:00
Eliot Jones
15525acbaa move document layout analysis and export to new project 2020-01-05 09:19:58 +00:00