12 Commits

Author SHA1 Message Date
EliotJones
85fc63d585 rework numeric tokenizer hot path
the existing numeric tokenizer involved allocations and string parsing. since
the number formats in pdf files are fairly predictable we can improve this
substantially
2025-07-25 18:12:43 +01:00
Jason Nelson
7f42a8d60c Reduce Allocations (#821)
* Introduce ValueStringBuilder

* Make NumericTokenizer and PlanTextTokenizer thread-safe

* Replace ListPool with ArrayPoolBufferWriter

* Seal ITokenizer classes

* Eliminate array allocation in Type1ArrayTokenizer

* Eliminate array allocation in AcroFormFactory

* Eliminate StringBuilder allocation in Page.GetText

* Optimize PdfSubpath.ToLines

* Eliminate various allocations when parsing CompactFontFormat

* Remove unused FromOctalInt helper

* Ensure Pdf.Content is not null

* Write ASCII values directly to stream (avoiding allocations)

* Avoid encoding additional ASCII values

* Eliminate allocations in TokenWriter.WriteName

* Eliminate allocation in TokenWriter.WriteNumber

* Add System.Memory reference to Fonts
2024-04-28 18:55:58 +01:00
BobLd
9f3d2745f6 Change NumericToken from IDataToken<decimal> to IDataToken<double> and fix #765 2024-02-18 14:53:38 +00:00
Eliot Jones
f2188729a3 #453 handle messed up number format 2022-06-17 20:35:21 -04:00
Eliot Jones
1b472f6992 handle messed up numbers in content #355 2021-08-11 20:56:06 -04:00
Plaisted
a0f0c4d6c7 switch to old syntax for build server 2021-01-19 18:53:44 -06:00
Plaisted
feb6117e1e fix EOL issues 2021-01-19 18:39:51 -06:00
Plaisted
9bfe69aef1 removing locking 2021-01-19 18:06:50 -06:00
Eliot Jones
db442194c3 use a mutable struct 2020-04-18 12:10:17 +01:00
Eliot Jones
7baa18b5dd add stringbuilder pool for tokenizers
we could replace these with spans in the next net core however for now our pools seem to increase performance by reducing gc load.
2020-04-04 18:31:55 +01:00
Eliot Jones
c6dc4d9eb8 handle tokenizing invalid numeric string correctly
rather than throwing when an invalid numeric string is read, our tokenizer now returns false so that error recovery methods can be attempted.
2020-02-21 11:16:31 +00:00
Eliot Jones
bbde38f656 move tokenizers to their own project
since both pdfs and Adobe Type1 fonts use postscript type objects, tokenization is needed by the main project and the fonts project
2020-01-05 10:40:44 +00:00