Commit Graph

7 Commits

Author SHA1 Message Date
BobLd
d246bf5c74 - remove unnecessary casts
- make PageXmlTextExporter.Deserialize() public
2019-12-31 10:43:07 +00:00
BobLd
fd9efb5b5d making FindIndexNearest() internal 2019-12-06 17:29:33 +00:00
BobLd
b5a0faaa3b Improving clustering algorithm 2019-12-06 16:02:30 +00:00
BobLd
7e8b3bdc85 Update DocstrumBB to account for middle point of the overlapping area distance. For this, using distance between 2 lines. 2019-08-11 13:45:08 +01:00
BobLd
eb9a9fd00e Document Layout Analysis - IPageSegmenter, Docstrum
- Create a TextBlock class
- Create IPageSegmenter
- Add other useful distances: angle, etc.
- Update RecursiveXYCut
 - With IPageSegmenter and TextBlock
 - Make XYNode and XYLeaf internal
- Optimise (faster) NearestNeighbourWordExtractor and isolate the clustering algorithms for use outside of this class
- Implement a Docstrum inspired page segmentation algorithm
2019-08-10 16:01:27 +01:00
BobLd
f8d0883da5 Update with corrections 2019-06-18 20:48:49 +01:00
BobLd
a0c864e8af Adding Document Layout Analysis:
- Nearest Neighbour Word Extractor
- Recursive X-Y Cut algorithm, useful for multi-column pdf documents
2019-06-16 13:57:30 +01:00