Updated Document Layout Analysis (markdown)

davebrokit
2025-01-05 17:25:28 +00:00
parent 6dff7a6c88
commit e40f9f1302

@@ -324,7 +324,8 @@ var blocks = recursiveXYCut.GetBlocks(words);
__NB__: Isolated bullet points can be handled by setting a minimum block width, e.g. `RecursiveXYCut.Instance.GetBlocks(words, new RecursiveXYCut.RecursiveXYCutOptions() { MinimumWidth = page.Width / 3.0 })`
__NB__: DominantFontHeightFunc: The examples above use the average letter glyph height for that page. But using the median glyph height would generally produce better results (the median ignores extremes that can impact the results). You may also want to consider using the median glyph height across all the pages in certain situations.
__NB__: DominantFontHeightFunc: The examples above use the average letter glyph height for that page. But using the median glyph height would generally produce better results (the median ignores extremes that can impact the results). You may also want to consider using the median glyph height across all the pages in certain situations. For text where the gap between lines may be wider than the font height, consider using the mean or median distance between the text lines instead.
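
As a rough illustration of the median suggestion above, here is a minimal sketch of a median helper. `GlyphStatistics` and `Median` are hypothetical names introduced for this example and are not part of PdfPig. The commented-out wiring at the bottom assumes that `DominantFontHeightFunc` accepts a function from the page letters to a `double`; check this against the version of the library you are using.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of PdfPig.
public static class GlyphStatistics
{
    // Returns the median of the values, or 0 when the sequence is empty.
    public static double Median(IEnumerable<double> values)
    {
        var sorted = values.OrderBy(v => v).ToArray();
        if (sorted.Length == 0)
        {
            return 0;
        }

        int mid = sorted.Length / 2;

        // Odd count: the middle element. Even count: mean of the two middle elements.
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}

// Possible wiring (assumption: the option takes a Func<IEnumerable<Letter>, double>):
//
// var options = new RecursiveXYCut.RecursiveXYCutOptions()
// {
//     DominantFontHeightFunc = letters =>
//         GlyphStatistics.Median(letters.Select(l => l.GlyphRectangle.Height))
// };
// var blocks = RecursiveXYCut.Instance.GetBlocks(words, options);
```

A single oversized glyph (a drop cap or a large heading, for instance) shifts the average noticeably but leaves the median close to the body text height, which is why the median tends to be the more robust choice here.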
## [Docstrum for bounding boxes method](https://github.com/UglyToad/PdfPig/blob/master/src/UglyToad.PdfPig.DocumentLayoutAnalysis/DocstrumBoundingBoxes.cs)
### Description