mirror of
https://github.com/UglyToad/PdfPig.git
synced 2025-08-20 09:37:44 +08:00
Updated Document Layout Analysis (markdown)
parent
3974295f5b
commit
87e0c7463a
@ -37,8 +37,7 @@ In order to decide whether two glyphs are _close enough_ from each other, the alg
If the measured distance between the two glyphs is below this threshold, they are deemed to be connected.
Once glyphs are connected, they are then grouped to form words via a [depth first search algorithm](https://en.wikipedia.org/wiki/Depth-first_search).
It seems that both [left-to-right and right-to-left](https://en.wikipedia.org/wiki/Right-to-left) scripts have their glyph `StartBaseLine` on the left and `EndBaseLine` on the right.
Once glyphs are connected, they are grouped into words via a [depth-first search algorithm](https://en.wikipedia.org/wiki/Depth-first_search). The extractor should work for both [left-to-right and right-to-left](https://en.wikipedia.org/wiki/Right-to-left) scripts, as it seems that in both cases the glyphs' `StartBaseLine` is on the left and `EndBaseLine` on the right of the bounding box.
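
The connect-then-group idea described above can be illustrated with a minimal Python sketch. It is not PdfPig's implementation: the tuple layout `(text, start, end)`, the function name `group_glyphs`, the `max_distance` parameter, and the choice to measure from one glyph's end point to another's start point are all assumptions made for illustration only.

```python
from math import hypot


def group_glyphs(glyphs, max_distance):
    """Group glyphs into words (illustrative sketch, not PdfPig's code).

    Each glyph is a hypothetical tuple (text, start, end), where start and
    end are (x, y) points standing in for StartBaseLine and EndBaseLine.
    """
    n = len(glyphs)

    def gap(a, b):
        # Assumed distance measure: end point of glyph a to start point of glyph b.
        (ax, ay), (bx, by) = glyphs[a][2], glyphs[b][1]
        return hypot(bx - ax, by - ay)

    # Step 1: connect every pair of glyphs whose gap is below the threshold.
    neighbours = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if min(gap(i, j), gap(j, i)) < max_distance:
                neighbours[i].append(j)
                neighbours[j].append(i)

    # Step 2: iterative depth-first search to collect connected components.
    seen = [False] * n
    words = []
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        stack, component = [root], []
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbours[node]:
                if not seen[other]:
                    seen[other] = True
                    stack.append(other)
        # Order glyphs by the x coordinate of their start point, which is on
        # the left for both script directions according to the text above.
        component.sort(key=lambda k: glyphs[k][1][0])
        words.append("".join(glyphs[k][0] for k in component))
    return words


glyphs = [
    ("H", (0.0, 0.0), (1.0, 0.0)),
    ("i", (1.1, 0.0), (1.5, 0.0)),
    ("y", (5.0, 0.0), (6.0, 0.0)),
    ("o", (6.1, 0.0), (7.0, 0.0)),
]
print(group_glyphs(glyphs, max_distance=0.5))
```

The pairwise comparison is quadratic in the number of glyphs; it keeps the sketch short, whereas a real extractor would likely use a spatial index to find candidate neighbours.
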
### Usage
#### Simple case