PdfPig

lsm/PdfPig

mirror of https://github.com/UglyToad/PdfPig.git synced 2026-03-10 00:23:29 +08:00

Author	SHA1	Message	Date
EliotJones	e886ae648f	copy other parser behavior by treating end of stream as valid end inline image this file contains corrupt content following an inline image but other parsers just treat this content as part of the image and parse the rest of the file successfully	2025-09-13 14:36:14 +01:00
Eliot Jones	8408c98aec	Draft release on master build (#1145) * remove alpha postfix, releases will increment version * update the master build job to draft a release * add publish action to publish full release * enable setting assembly and file version * bump assembly and file version for package project --------- Co-authored-by: BobLd <38405645+BobLd@users.noreply.github.com>	2025-09-08 20:07:36 +01:00
Chuck B.	1ed9e017f4	Performance improvements and .Net 9 support (#1116) * Refactor letter handling by orientation for efficiency Improved the processing of letters based on their text orientation by preallocating separate lists for each orientation (horizontal, rotate270, rotate180, rotate90, and other). This change reduces multiple calls to `GetWords` and minimizes enumerations and allocations, enhancing performance and readability. Each letter is now added to the appropriate list in a single iteration over the `letters` collection. * Update target frameworks to include net9.0 Expanded compatibility in `UglyToad.PdfPig.csproj` by adding `net9.0` to the list of target frameworks, alongside existing versions. * Add .NET 9.0 support and refactor key components Updated project files for UglyToad.PdfPig to target .NET 9.0, enhancing compatibility with the latest framework features. Refactored `GetBlocks` in `DocstrumBoundingBoxes.cs` for improved input handling and performance. Significantly optimized `NearestNeighbourWordExtractor.cs` by replacing multiple lists with an array of buckets and implementing parallel processing for better efficiency. Consistent updates across `Fonts`, `Tests`, `Tokenization`, and `Tokens` project files to include .NET 9.0 support. * Improve null checks and optimize list handling - Updated null check for `words` in `DocstrumBoundingBoxes.cs` for better readability and performance. - Changed from `ToList()` to `ToArray()` to avoid unnecessary enumeration. - Added `results.TrimExcess()` in `NearestNeighbourWordExtractor.cs` to optimize memory usage. --------- Co-authored-by: Chuck Beasley <CBeasley@kilpatricktownsend.com>	2025-08-01 22:24:16 +01:00
EliotJones	b6bd0a3169	bump version to 0.1.12-alpha001	2025-07-26 13:43:28 -05:00
EliotJones	3d2e12cb16	version 0.1.11	2025-07-26 13:16:01 -05:00
EliotJones	85fc63d585	rework numeric tokenizer hot path the existing numeric tokenizer involved allocations and string parsing. since the number formats in pdf files are fairly predictable we can improve this substantially	2025-07-25 18:12:43 +01:00
EliotJones	0586713da3	skip comments in pdf objects streams #926 the file provided in issue #926 contains the following syntax in pdf object streams: ``` % 750 0 obj << >> ``` currently we read the comment token and skip the rest however this producer is writing nonsense to the stream. comment tokens are only valid outside streams in pdf files so we align to the behavior of pdfbox here by skipping the entire line containing a comment inside a stream which fixes parsing this file.	2025-07-06 07:13:55 +01:00
BobLd	8f9194c9a4	Miscellaneous minor changes	2025-05-31 23:02:46 +01:00
BobLd	f84f2aceec	Improve memory allocation by changing IFilter.Decode() signature to use Memory<byte> instead of ReadOnlyMemory/ReadOnlySpan	2025-05-29 12:41:50 +01:00
BobLd	ca9f70ffb0	Skip control chars in CoreTokenScanner.MoveNext() and fix #1048	2025-05-27 20:57:38 +02:00
BobLd	a4a0fe220a	Bump version to 0.1.11-alpha001	2025-03-08 13:42:57 +00:00
BobLd	d36e9a900f	version 0.1.10	2025-03-08 13:00:43 +00:00
BobLd	bcc8ccecbe	Stop treating Warnings as Errors (#941)	2024-11-23 18:23:22 +00:00
Eliot Jones	c46722fa26	version 0.1.9	2024-10-06 15:40:24 +01:00
Jason Nelson	c6a7a2d0a2	Improve Code Quality (#831) * Introduce globals * Spanify TransformationMatrix.FromArray * Eliminate allocation in GeometryExtensions.ParametricPerpendicularProjection * Eliminate allocation in CrossReferenceTablePart.Parse * Optimize Adam7 (eliminate virtual calls) * Spanify QuadPointsQuadrilateral.Points to eliminate virtual calls * Eliminate allocation in PdfRectangle.Normalize * Format TransformMatrix * Pass TransformationMatrix by reference in TransformationMatrix.Multiply * Seal NoTextTokenWriter	2024-05-06 07:38:06 +01:00
Jason Nelson	da44e1a540	Improve code quality (#825) * Avoid encoding ASCII in more cases * Make Space a const * Use WriteWhiteSpace extension to eliminate possible virtual call * Use ASCII when encoding constrained character subset * Simplify pragmas * Revert Whitespace rename * Fix using statement order * Remove obsolete serialization support on .NET * Remove obsolete serialization support on .NET (part 2)	2024-05-03 07:36:19 +01:00
Jason Nelson	7f42a8d60c	Reduce Allocations (#821 ) * Introduce ValueStringBuilder * Make NumericTokenizer and PlanTextTokenizer thread-safe * Replace ListPool with ArrayPoolBufferWriter * Seal ITokenizer classes * Eliminate array allocation in Type1ArrayTokenizer * Eliminate array allocation in AcroFormFactory * Eliminate StringBuilder allocation in Page.GetText * Optimize PdfSubpath.ToLines * Eliminate various allocations when parsing CompactFontFormat * Remove unused FromOctalInt helper * Ensure Pdf.Content is not null * Write ASCII values directly to stream (avoiding allocations) * Avoid encoding additional ASCII values * Eliminate allocations in TokenWriter.WriteName * Eliminate allocation in TokenWriter.WriteNumber * Add System.Memory reference to Fonts	2024-04-28 18:55:58 +01:00
Jason Nelson	1ef2e127a6	Improve Code Quality (#818 ) * Make AdobeFontMetricsLigature a struct * Make AdobeFontMetricsCharacterSize a struct * Eliminate allocation in CompactFontFormatData * Pass TransformationMatrix by reference * Seal Encoding classes * Make SubTableHeaderEntry a readonly struct * Introduce StringSplitter and eliminate various allocations in GlyphListFactory * Eliminate a few substring allocations * Use char overload on StringBuilder * Eliminate virtual calls on stringIndex * Optimize ReadHelper ReadLong and ReadInt methods * Add additional readonly annotations to PdfRectangle * Optimize NameTokenizer * Eliminate allocation in TrueTypeGlyphTableSubsetter * Use empty arrays * Eliminate allocations in OperationWriteHelper.WriteHex * Use simplified DecryptCbc method on .NET 6+ * Fix windows-1252 encoding not working on net6.0 and 8.0 * Update int buffers to exact unsigned max length and eliminate additional byte allocation * Fix typo * Remove unused constant	2024-04-18 19:58:40 +01:00
Jason Nelson	6d54355754	Spanify filters	2024-04-12 07:42:19 +01:00
Jason Nelson	f62929eb7c	Spanify work 1 (#812 ) * Add GetString(ReadOnlySpan<byte>) polyfill * Add ArrayPoolBufferWriter * Use Utf8.IsValid & char.IsAsciiHexDigit on NET8.0+ * Optimize HexTokenizer * Eliminate various Tuple allocations * Eliminate List allocation in CrossReferenceTable * Eliminate various allocations in Ascii85Filter * Spanify HexToken * Spanify Palette * Spanify various Cmap & font methods * Spanify Type1Charstring classes * Spanify PdfDocEncoding.TryConvertBytesToString * Spanify OctalHelpers.FromOctalDigits * Add missing braces * React to HexToken.Byte type changes * Cleanup * [Tests] React to span changes * Add ArgumentNullException check back to Type1CharstringDecryptedBytes * Remove unsafe code * Seal HexToken * Avoid allocation when passing an empty span	2024-04-01 09:18:01 +01:00
Jason Nelson	907181d5ce	Bump .net4.7 target to .net4.7.1 and eliminate System.ValueTuple dependency	2024-03-15 13:10:25 +00:00
Jason Nelson	4653671b2d	Set LangVersion to 12	2024-03-15 13:10:25 +00:00
Jason Nelson	6da9c90042	Add .NET 8.0 target	2024-03-15 13:10:25 +00:00
Jason Nelson	8e0500e833	Drop unsupported .NET frameworks	2024-03-15 13:10:25 +00:00
BobLd	acfe8b5fdd	Allow lenient parsing in DictionaryTokenizer and fix #791	2024-03-11 20:01:07 +00:00
BobLd	9f3d2745f6	Change NumericToken from IDataToken<decimal> to IDataToken<double> and fix #765	2024-02-18 14:53:38 +00:00
Eliot Jones	129e69fdf9	Merge pull request #638 from UglyToad/pdfdocencoding Implement PdfDocEncoding for reading string tokens	2023-06-05 22:01:25 +01:00
Eliot Jones	2be4c69c16	set nightly release version also toggles class back to public for nightly versions #538	2023-06-05 21:37:34 +01:00
Eliot Jones	16ac297d10	version 0.1.8	2023-06-05 21:36:00 +01:00
Eliot Jones	6f59bed9a2	use pdfdocencoding when parsing strings	2023-06-04 16:40:43 +01:00
Eliot Jones	fc2f7b9325	add intelligent error recovery for known dictionaries #511 if we're parsing a known dictionary (e.g. all keys are required and there are no additional optional keys) and we encounter an error we provide the possibility to recover by assuming a dictionary end token after all required tokens are consumed if parsing by looking for dictionary end failed due to a format exception	2023-05-21 14:58:39 +01:00
Eliot Jones	35ff13732e	remove completely out of support net 4.5	2023-05-17 20:20:05 +01:00
Eliot Jones	d2944e14e5	change alpha version for nightly builds	2023-01-08 12:01:48 -05:00
Eliot Jones	f2188729a3	#453 handle messed up number format	2022-06-17 20:35:21 -04:00
Eliot Jones	03692cf42f	set version to alpha of 0.1.7 for future nightly builds	2022-04-25 10:06:46 -04:00
Eliot Jones	5597a8f38c	version 0.1.6	2022-04-25 09:22:47 -04:00
Eliot Jones	d7898d851c	add net 6 as a target framework and dual target tests	2022-04-03 14:48:50 -04:00
Eliot Jones	a538aaf0de	bump version so that nightly builds for pre-release have correct version	2022-01-10 12:40:47 +00:00
Eliot Jones	4c36f84a0d	version 0.1.5	2021-09-17 11:16:43 -04:00
Eliot Jones	1b472f6992	handle messed up numbers in content #355	2021-08-11 20:56:06 -04:00
Eliot Jones	16d26effc5	0.1.5-alpha002	2021-05-09 13:05:54 -04:00
Eliot Jones	b930924b9b	0.1.5-alpha001	2021-02-28 13:55:25 -04:00
Plaisted	a0f0c4d6c7	switch to old syntax for build server	2021-01-19 18:53:44 -06:00
Plaisted	4c807691b7	adding in PlainTokenizer to unpooled SB changes	2021-01-19 18:52:14 -06:00
Plaisted	feb6117e1e	fix EOL issues	2021-01-19 18:39:51 -06:00
Plaisted	0b716a759f	adding comment for non-static tokenizer	2021-01-19 18:18:33 -06:00
Plaisted	9bfe69aef1	removing locking	2021-01-19 18:06:50 -06:00
Eliot Jones	237fd96f9e	version 0.1.4	2020-11-29 14:02:42 -04:00
Eliot Jones	ad0fb4ec5b	version 0.1.3	2020-11-15 12:08:37 -04:00
Eliot Jones	58ecfbf963	0.1.3-alpha001	2020-09-04 13:19:03 +01:00

71 Commits