EliotJones
85fc63d585
rework numeric tokenizer hot path
the existing numeric tokenizer involved allocations and string parsing. since
the number formats in pdf files are fairly predictable, we can improve this
substantially.
2025-07-25 18:12:43 +01:00
Jason Nelson
7f42a8d60c
Reduce Allocations ( #821 )
* Introduce ValueStringBuilder
* Make NumericTokenizer and PlanTextTokenizer thread-safe
* Replace ListPool with ArrayPoolBufferWriter
* Seal ITokenizer classes
* Eliminate array allocation in Type1ArrayTokenizer
* Eliminate array allocation in AcroFormFactory
* Eliminate StringBuilder allocation in Page.GetText
* Optimize PdfSubpath.ToLines
* Eliminate various allocations when parsing CompactFontFormat
* Remove unused FromOctalInt helper
* Ensure Pdf.Content is not null
* Write ASCII values directly to stream (avoiding allocations)
* Avoid encoding additional ASCII values
* Eliminate allocations in TokenWriter.WriteName
* Eliminate allocation in TokenWriter.WriteNumber
* Add System.Memory reference to Fonts
2024-04-28 18:55:58 +01:00
BobLd
9f3d2745f6
Change NumericToken from IDataToken<decimal> to IDataToken<double> and fix #765
2024-02-18 14:53:38 +00:00
Eliot Jones
f2188729a3
#453 handle messed up number format
2022-06-17 20:35:21 -04:00
Eliot Jones
1b472f6992
handle messed up numbers in content #355
2021-08-11 20:56:06 -04:00
Plaisted
a0f0c4d6c7
switch to old syntax for build server
2021-01-19 18:53:44 -06:00
Plaisted
feb6117e1e
fix EOL issues
2021-01-19 18:39:51 -06:00
Plaisted
9bfe69aef1
removing locking
2021-01-19 18:06:50 -06:00
Eliot Jones
db442194c3
use a mutable struct
2020-04-18 12:10:17 +01:00
Eliot Jones
7baa18b5dd
add stringbuilder pool for tokenizers
we could replace these with spans in the next .NET Core release; however, for now our pools seem to increase performance by reducing GC load.
2020-04-04 18:31:55 +01:00
Eliot Jones
c6dc4d9eb8
handle tokenizing invalid numeric string correctly
rather than throwing when an invalid numeric string is read, our tokenizer now returns false so that error recovery methods can be attempted.
2020-02-21 11:16:31 +00:00
Eliot Jones
bbde38f656
move tokenizers to their own project
since both PDFs and Adobe Type1 fonts use PostScript-type objects, tokenization is needed by both the main project and the fonts project.
2020-01-05 10:40:44 +00:00