mirror of
https://github.com/UglyToad/PdfPig.git
synced 2025-08-20 09:37:44 +08:00
improve page segmenter description
parent 87e0c7463a
commit 8684acbfa4
@ -110,7 +110,9 @@ Words are indicated by red rectangles:
# Page segmenters
Page segmenters deal with the task of finding block of text in a page. 3 different methods are currently available:
Page segmenters deal with the task of finding blocks of text in a page. They return a list of `TextBlock`s that can be thought of as paragraphs. Each `TextBlock` contains the list of lines (`TextLine`) that belong to it. In turn, each `TextLine` contains the list of `Word`s that belong to it. Each of these elements has its own bounding box and text.
3 different methods are currently available:
- [__Default method__](https://github.com/UglyToad/PdfPig/wiki/Document-Layout-Analysis#default-method-1)
- [__Recursive XY Cut__](https://github.com/UglyToad/PdfPig/wiki/Document-Layout-Analysis#recursive-xy-cut-method) – a top-down method
- [__Docstrum for bounding boxes__](https://github.com/UglyToad/PdfPig/wiki/Document-Layout-Analysis#docstrum-for-bounding-boxes-method) – a bottom-up method
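
The block, line and word hierarchy returned by a page segmenter can be walked as in the minimal sketch below. It assumes the Docstrum segmenter and the page's default word extraction; the other segmenters should be usable in the same way. The file name `document.pdf` is a placeholder, not something taken from the wiki.

```csharp
using System;
using UglyToad.PdfPig;
using UglyToad.PdfPig.DocumentLayoutAnalysis.PageSegmenter;

// "document.pdf" is a hypothetical path used for illustration only.
using (var document = PdfDocument.Open("document.pdf"))
{
    foreach (var page in document.GetPages())
    {
        // Words are the input to every page segmenter.
        var words = page.GetWords();

        // Group the words into blocks (roughly, paragraphs).
        var blocks = DocstrumBoundingBoxes.Instance.GetBlocks(words);

        foreach (var block in blocks)
        {
            Console.WriteLine($"Block {block.BoundingBox}: {block.Text}");

            foreach (var line in block.TextLines)
            {
                Console.WriteLine($"  Line {line.BoundingBox}: {line.Text}");

                foreach (var word in line.Words)
                {
                    Console.WriteLine($"    Word {word.BoundingBox}: {word.Text}");
                }
            }
        }
    }
}
```

Each level exposes both `Text` and `BoundingBox`, so the same loop can be cut short at whichever granularity is needed, for example stopping at blocks when only paragraph-level text is wanted.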