mirror of
https://github.com/UglyToad/PdfPig.git
synced 2026-03-10 00:23:29 +08:00
Updated Document Layout Analysis (markdown)
@@ -324,7 +324,8 @@ var blocks = recursiveXYCut.GetBlocks(words);
__NB__: Isolated bullet points can be handled by setting a minimum block width, e.g. `RecursiveXYCut.Instance.GetBlocks(words, new RecursiveXYCut.RecursiveXYCutOptions() { MinimumWidth = page.Width / 3.0 })`
__NB__: `DominantFontHeightFunc`: The examples above use the average letter glyph height for that page. Using the median glyph height generally produces better results, because the median is not skewed by extreme values. In some situations you may also want to use the median glyph height across all pages. For text where the gap between lines may be wider than the font height, consider using the mean or median distance between the text lines instead.
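To illustrate why the median can be the more robust choice, here is a minimal sketch that compares the mean and the median on a handful of glyph heights. The `Median` helper and the sample values are hypothetical and are not part of PdfPig; the snippet uses only the .NET base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a PdfPig API): median of a sequence of values.
// Returns 0 for an empty input so callers do not need a separate guard.
static double Median(IEnumerable<double> values)
{
    var sorted = values.OrderBy(v => v).ToArray();
    if (sorted.Length == 0) return 0;
    int mid = sorted.Length / 2;
    // Odd count: the middle element. Even count: average of the two middle elements.
    return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
}

// Made-up glyph heights for one page: body text near 10, plus one large title glyph.
double[] glyphHeights = { 9.8, 10.0, 10.1, 10.2, 36.0 };

// The single large glyph pulls the mean up, while the median stays at the body text height.
Console.WriteLine($"mean = {glyphHeights.Average():F2}, median = {Median(glyphHeights):F2}");
```

With these sample values the mean is about 15.2 while the median is 10.1, so a single title glyph no longer dominates the estimate. Assuming `DominantFontHeightFunc` accepts a function from the page's letters to a `double`, a helper like this could be supplied as `letters => Median(letters.Select(l => l.GlyphRectangle.Height))`; check the signature in the PdfPig version you use before relying on it.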
## [Docstrum for bounding boxes method](https://github.com/UglyToad/PdfPig/blob/master/src/UglyToad.PdfPig.DocumentLayoutAnalysis/DocstrumBoundingBoxes.cs)
### Description