improve page segmenter description

BobLd 2020-07-11 15:47:12 +01:00
parent 87e0c7463a
commit 8684acbfa4

@@ -110,7 +110,9 @@ Words are indicated by red rectangles:
![nearest neighbour word example](https://github.com/UglyToad/PdfPig/blob/master/documentation/Document%20Layout%20Analysis/nearest%20neighbour%20word%20example%20v2.png)
# Page segmenters
Page segmenters deal with the task of finding block of text in a page. 3 different methods are currently available:
Page segmenters deal with the task of finding blocks of text in a page. They return a list of `TextBlock`s that can be thought of as paragraphs. Each `TextBlock` contains the list of lines (`TextLine`) that belong to it. In turn, each `TextLine` contains the list of `Word`s that belong to it. Each of these elements has its own bounding box and text.
3 different methods are currently available:
- [__Default method__](https://github.com/UglyToad/PdfPig/wiki/Document-Layout-Analysis#default-method-1)
- [__Recursive XY Cut__](https://github.com/UglyToad/PdfPig/wiki/Document-Layout-Analysis#recursive-xy-cut-method) a top-down method
- [__Docstrum for bounding boxes__](https://github.com/UglyToad/PdfPig/wiki/Document-Layout-Analysis#docstrum-for-bounding-boxes-method) a bottom-up method
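
The block / line / word hierarchy described above can be pictured with a small language-neutral sketch. This is only an illustration in Python, not PdfPig's C# API: the class fields, the `union` helper, the `(left, bottom, right, top)` box layout, and the choice to join words with spaces and lines with newlines are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative stand-ins only: these are NOT PdfPig's C# classes.
BBox = Tuple[float, float, float, float]  # assumed layout: (left, bottom, right, top)


def union(boxes: List[BBox]) -> BBox:
    """Smallest rectangle that encloses every box in the list (hypothetical helper)."""
    lefts, bottoms, rights, tops = zip(*boxes)
    return (min(lefts), min(bottoms), max(rights), max(tops))


@dataclass
class Word:
    text: str
    bbox: BBox


@dataclass
class TextLine:
    words: List[Word]

    @property
    def text(self) -> str:
        # Assumption: words on a line are separated by a single space.
        return " ".join(w.text for w in self.words)

    @property
    def bbox(self) -> BBox:
        return union([w.bbox for w in self.words])


@dataclass
class TextBlock:
    lines: List[TextLine]

    @property
    def text(self) -> str:
        # Assumption: lines in a block are separated by a newline.
        return "\n".join(line.text for line in self.lines)

    @property
    def bbox(self) -> BBox:
        return union([line.bbox for line in self.lines])


# One block (a "paragraph") made of two lines of two words each.
block = TextBlock([
    TextLine([Word("Page", (10, 80, 40, 90)), Word("segmenters", (45, 80, 110, 90))]),
    TextLine([Word("return", (10, 65, 50, 75)), Word("blocks", (55, 65, 95, 75))]),
])
```

Walking `block.lines` and then each line's `words` mirrors the nesting a segmenter's output is described as having; every level exposes both its text and a bounding box that encloses its children.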