Commit Graph

12 Commits

Author SHA1 Message Date
Jason Nelson
7f42a8d60c Reduce Allocations (#821)
* Introduce ValueStringBuilder

* Make NumericTokenizer and PlanTextTokenizer thread-safe

* Replace ListPool with ArrayPoolBufferWriter

* Seal ITokenizer classes

* Eliminate array allocation in Type1ArrayTokenizer

* Eliminate array allocation in AcroFormFactory

* Eliminate StringBuilder allocation in Page.GetText

* Optimize PdfSubpath.ToLines

* Eliminate various allocations when parsing CompactFontFormat

* Remove unused FromOctalInt helper

* Ensure Pdf.Content is not null

* Write ASCII values directly to stream (avoiding allocations)

* Avoid encoding additional ASCII values

* Eliminate allocations in TokenWriter.WriteName

* Eliminate allocation in TokenWriter.WriteNumber

* Add System.Memory reference to Fonts
2024-04-28 18:55:58 +01:00
Jason Nelson
6d54355754 Spanify filters 2024-04-12 07:42:19 +01:00
BobLd
acfe8b5fdd Allow lenient parsing in DictionaryTokenizer and fix #791 2024-03-11 20:01:07 +00:00
Eliot Jones
6f59bed9a2 use pdfdocencoding when parsing strings 2023-06-04 16:40:43 +01:00
Eliot Jones
fc2f7b9325 add intelligent error recovery for known dictionaries #511
if we're parsing a known dictionary (e.g. all keys are required
and there are no additional optional keys) and parsing by looking
for the dictionary end token fails with a format exception, we
provide the possibility to recover by assuming a dictionary end
token once all required tokens have been consumed
2023-05-21 14:58:39 +01:00
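The recovery rule above can be sketched as follows. This is a minimal illustration, not the project's actual implementation. It assumes the tokens have already been split into strings and that every value is a single token, which is a simplification because real PDF values such as `1 0 R` span several tokens. `parse_known_dictionary` is a hypothetical name.

```python
def parse_known_dictionary(tokens, required_keys):
    """Parse a flat token list such as ['/Size', '5', '/Root', '1 0 R', '>>'].

    Returns (entries, index_after_dictionary). Hypothetical sketch: assumes
    every value is a single token, which real PDF syntax does not guarantee.
    """
    required = set(required_keys)
    entries = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == ">>":
            # Normal case: an explicit dictionary end token.
            return entries, i + 1
        if not token.startswith("/") or i + 1 >= len(tokens):
            # Format error: a name key (or its value) was expected here.
            if required.issubset(entries):
                # Recovery: assume the dictionary ended after the last required key.
                return entries, i
            raise ValueError(f"malformed dictionary at token {i}: {token!r}")
        entries[token[1:]] = tokens[i + 1]
        i += 2
    # Ran out of tokens without '>>': apply the same recovery rule.
    if required.issubset(entries):
        return entries, i
    raise ValueError("unterminated dictionary is missing required keys")
```

For example, `parse_known_dictionary(["/Size", "5", "/Root", "1 0 R", "endobj"], {"Size", "Root"})` stops at `endobj` and returns both entries instead of failing, because every required key was already read.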
Plaisted
4c807691b7 adding in PlainTokenizer to unpooled SB changes 2021-01-19 18:52:14 -06:00
Plaisted
feb6117e1e fix EOL issues 2021-01-19 18:39:51 -06:00
Plaisted
0b716a759f adding comment for non-static tokenizer 2021-01-19 18:18:33 -06:00
Plaisted
9bfe69aef1 removing locking 2021-01-19 18:06:50 -06:00
Eliot Jones
693a3d5958 use offset to file header to correct cross references
if the %PDF version header comment is offset from the start of the file, the cross-reference offsets will also be wrong by that amount. this change updates the cross-reference location logic to use the offset from the located version header.
2020-01-26 15:30:20 +00:00
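A minimal sketch of the correction described above, assuming the whole file is held in memory as bytes and the stated offset is the value written after `startxref`. The same shift would apply to each object offset in the cross-reference table. Both function names are hypothetical.

```python
def find_header_offset(data: bytes) -> int:
    """Return the byte position of the '%PDF-' version header comment."""
    offset = data.find(b"%PDF-")
    if offset < 0:
        raise ValueError("no %PDF- version header found")
    return offset


def corrected_xref_position(data: bytes, stated_offset: int) -> int:
    """Shift an offset written relative to the header by the header's real position."""
    return stated_offset + find_header_offset(data)
```

If a file has 9 junk bytes before `%PDF-1.4`, every stated offset is 9 bytes short, so `corrected_xref_position` adds 9 to land on the actual `xref` keyword.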
Eliot Jones
ba09a13d08 more end image recovery logic
since inline image data may contain the end image "EI" token inside the data stream, there's no reliable way to determine whether we've read all the data. for this reason, if we end up in an invalid state parsing operations after we've read the end image token, we try to recover by reading from the previous token to the next end image token, if any. we supply log information to let the consumer know this is what we're doing. it's still not bullet-proof, but it should be good enough.

also support negative page rotation values by adding them to 360 degrees, so a rotation of -90 degrees clockwise becomes 270 degrees clockwise.
2020-01-25 15:53:08 +00:00
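The two ideas in this commit can be sketched as follows. The code is illustrative, not the project's implementation, and both function names are hypothetical. `next_end_image` treats "EI" as a candidate end image token only when whitespace precedes it and whitespace or the end of the data follows it. This is a heuristic, because binary image data can still contain that byte pattern. `normalize_rotation` uses Python's `%` operator. It gives the same result as adding 360 for values from -90 to -270, and it also handles values outside one turn. PDF requires `/Rotate` to be a multiple of 90.

```python
import re

# Candidate end-image token: whitespace, then "EI", then whitespace or end of data.
_END_IMAGE = re.compile(rb"\sEI(?=\s|$)")


def next_end_image(data: bytes, start: int) -> int:
    """Return the index of the next candidate 'EI' at or after start, or -1."""
    match = _END_IMAGE.search(data, start)
    return match.start() + 1 if match else -1


def normalize_rotation(degrees: int) -> int:
    """Map any multiple of 90 onto 0, 90, 180 or 270 (clockwise)."""
    if degrees % 90:
        raise ValueError(f"page rotation must be a multiple of 90, got {degrees}")
    # Python's % returns a non-negative result for a positive modulus: -90 % 360 == 270.
    return degrees % 360
```

In the sample data `b"ID \x00EIx\x01 EI Q"`, the first "EI" is skipped because it is glued to binary bytes. The search stops at the second one, the position from which recovery would resume.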
Eliot Jones
bbde38f656 move tokenizers to their own project
since both PDFs and Adobe Type1 fonts use PostScript-type objects, tokenization is needed by both the main project and the fonts project
2020-01-05 10:40:44 +00:00