Updated Document Layout Analysis (markdown)

davebrokit
2021-12-20 20:32:04 +00:00
parent 2fe2701979
commit 453e8700fb

@@ -558,7 +558,9 @@ Exporters deal with the task of transforming a pdf page into a text representati
### Description
The PAGE (Page Analysis and Ground-Truth Elements) Format is an _XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content._ [PRImA Research Lab](https://www.primaresearch.org/publications/ICPR2010_Pletschacher_PAGE)
You can use PRImA Research Lab's [LayoutEvalGUI](https://www.primaresearch.org/tools/PerformanceEvaluation) software to open the exported file.
You can use one of the following tools from PRImA Research Lab to open the exported file:
- [LayoutEvalGUI](https://www.primaresearch.org/tools/PerformanceEvaluation)
- [PAGEViewer](https://www.primaresearch.org/tools/PAGEViewer) (This might be easier for beginners)
- __The PAGE XML exporter is the only exporter supporting reading order (as of Jan. 2020).__ See [here](https://github.com/UglyToad/PdfPig/wiki/Document-Layout-Analysis/#result-2) for an example.
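
If you would rather inspect the reading order programmatically than in a viewer, the following is a minimal Python sketch. It assumes the usual PAGE layout (`ReadingOrder` containing `RegionRefIndexed` entries with `index` and `regionRef` attributes, and `TextRegion` elements with a `TextEquiv`/`Unicode` child). The sample XML is hand-written for illustration and is not actual PdfPig output; the helper names `local_name` and `regions_in_reading_order` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only (not real exporter output).
# The namespace URI is an assumption; the code below ignores namespaces.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page1.png" imageWidth="600" imageHeight="800">
    <ReadingOrder>
      <OrderedGroup id="g0">
        <RegionRefIndexed index="0" regionRef="r2"/>
        <RegionRefIndexed index="1" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1">
      <Coords points="10,100 200,100 200,300 10,300"/>
      <TextEquiv><Unicode>Body</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r2">
      <Coords points="10,10 200,10 200,50 10,50"/>
      <TextEquiv><Unicode>Title</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def regions_in_reading_order(page_xml):
    """Return (region id, text) pairs sorted by their reading-order index."""
    root = ET.fromstring(page_xml.encode("utf-8"))
    text_by_id = {}
    order = []
    for el in root.iter():
        name = local_name(el.tag)
        if name == "RegionRefIndexed":
            order.append((int(el.get("index")), el.get("regionRef")))
        elif name == "TextRegion":
            text = ""
            # Only look at the region's own TextEquiv, not those of nested lines or words.
            for child in el:
                if local_name(child.tag) == "TextEquiv":
                    for sub in child:
                        if local_name(sub.tag) == "Unicode":
                            text = sub.text or ""
            text_by_id[el.get("id")] = text
    return [(ref, text_by_id.get(ref, "")) for _, ref in sorted(order)]


if __name__ == "__main__":
    for region_id, text in regions_in_reading_order(SAMPLE):
        print(region_id, text)
```

Matching on local tag names keeps the sketch independent of the schema version declared in the file, at the cost of not validating the document against the schema.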