Commit Graph

191 Commits

Author SHA1 Message Date
BobLd
689c127cd9 Get correct text orientation when base line points are equal and fix #741 2024-09-29 16:43:21 +01:00
Arnaud TAMAILLON
cf45dcf6ad Support not finding the Pages dictionary in lenient mode (#897)
* Support not finding the Pages dictionary in lenient mode and support Kids object not referencing a page object in lenient mode

---------

Co-authored-by: Arnaud TAMAILLON <arnaud.tamaillon@younited-credit.fr>
2024-09-01 15:09:48 +01:00
BobLd
affc1ed8b5 Seal and update IFilters to return ReadOnlyMemory<byte> (#843)
* Avoid ToArray() in memoryFactory

* Seal and update IFilters to return ReadOnlyMemory<byte>

* Fix filter tests

* Seal and update IFilters to return ReadOnlyMemory<byte>
2024-06-08 06:16:09 +01:00
Jason Nelson
da44e1a540 Improve code quality (#825)
* Avoid encoding ASCII in more cases

* Make Space a const

* Use WriteWhiteSpace extension to eliminate possible virtual call

* Use ASCII when encoding constrained character subset

* Simplify pragmas

* Revert Whitespace rename

* Fix using statement order

* Remove obsolete serialization support on .NET

* Remove obsolete serialization support on .NET (part 2)
2024-05-03 07:36:19 +01:00
Jason Nelson
7f42a8d60c Reduce Allocations (#821)
* Introduce ValueStringBuilder

* Make NumericTokenizer and PlainTextTokenizer thread-safe

* Replace ListPool with ArrayPoolBufferWriter

* Seal ITokenizer classes

* Eliminate array allocation in Type1ArrayTokenizer

* Eliminate array allocation in AcroFormFactory

* Eliminate StringBuilder allocation in Page.GetText

* Optimize PdfSubpath.ToLines

* Eliminate various allocations when parsing CompactFontFormat

* Remove unused FromOctalInt helper

* Ensure Pdf.Content is not null

* Write ASCII values directly to stream (avoiding allocations)

* Avoid encoding additional ASCII values

* Eliminate allocations in TokenWriter.WriteName

* Eliminate allocation in TokenWriter.WriteNumber

* Add System.Memory reference to Fonts
2024-04-28 18:55:58 +01:00
Jason Nelson
1ef2e127a6 Improve Code Quality (#818)
* Make AdobeFontMetricsLigature a struct

* Make AdobeFontMetricsCharacterSize a struct

* Eliminate allocation in CompactFontFormatData

* Pass TransformationMatrix by reference

* Seal Encoding classes

* Make SubTableHeaderEntry a readonly struct

* Introduce StringSplitter and eliminate various allocations in GlyphListFactory

* Eliminate a few substring allocations

* Use char overload on StringBuilder

* Eliminate virtual calls on stringIndex

* Optimize ReadHelper ReadLong and ReadInt methods

* Add additional readonly annotations to PdfRectangle

* Optimize NameTokenizer

* Eliminate allocation in TrueTypeGlyphTableSubsetter

* Use empty arrays

* Eliminate allocations in OperationWriteHelper.WriteHex

* Use simplified DecryptCbc method on .NET 6+

* Fix windows-1252 encoding not working on net6.0 and 8.0

* Update int buffers to exact unsigned max length and eliminate additional byte allocation

* Fix typo

* Remove unused constant
2024-04-18 19:58:40 +01:00
Jason Nelson
c3a2e8c08f Rename TryGetMemory -> TryGetBytesAsMemory 2024-04-12 07:42:19 +01:00
Jason Nelson
a5e9b438cc Eliminate allocation in BasePageFactory 2024-04-12 07:42:19 +01:00
Jason Nelson
49c155cca2 Add IPdfImage.RawBytes 2024-04-12 07:42:19 +01:00
Jason Nelson
6d54355754 Spanify filters 2024-04-12 07:42:19 +01:00
Jason Nelson
f62929eb7c Spanify work 1 (#812)
* Add GetString(ReadOnlySpan<byte>) polyfill

* Add ArrayPoolBufferWriter

* Use Utf8.IsValid & char.IsAsciiHexDigit on NET8.0+

* Optimize HexTokenizer

* Eliminate various Tuple allocations

* Eliminate List allocation in CrossReferenceTable

* Eliminate various allocations in Ascii85Filter

* Spanify HexToken

* Spanify Palette

* Spanify various Cmap & font methods

* Spanify Type1Charstring classes

* Spanify PdfDocEncoding.TryConvertBytesToString

* Spanify OctalHelpers.FromOctalDigits

* Add missing braces

* React to HexToken.Byte type changes

* Cleanup

* [Tests] React to span changes

* Add ArgumentNullException check back to Type1CharstringDecryptedBytes

* Remove unsafe code

* Seal HexToken

* Avoid allocation when passing an empty span
2024-04-01 09:18:01 +01:00
Jason Nelson
a412a239be Enable nullable annotations (#803)
* Enable nullable annotations

* Remove unused JetBrains annotations

* Ensure system using statements are first

* Improve nullability annotations

* Annotate encryptionDictionary is non-null when IsEncrypted is true

* Disable nullable for PdfTokenScanner.Get

* Improve nullability annotations for ObjectLocationProvider.TryGetCached

* Revert changes to RGBWorkingSpace

* Update UglyToad.PdfPig.Package with new framework targets (fixes nightly builds)
2024-03-17 18:51:40 +00:00
Jason Nelson
95f0459900 Prefer is null to == null
ensures that an equals overload isn't used, and we don't compare structs
2024-03-16 12:37:51 +00:00
Jason Nelson
9859c2672b Use switch expressions 2024-03-16 12:37:51 +00:00
Jason Nelson
834fb350a3 Use Array.Empty 2024-03-15 13:10:25 +00:00
BobLd
ac0276f1bf Use double in fonts instead of decimals and tidy up remaining decimals 2024-03-06 20:53:11 +00:00
BobLd
3bdc9498de Use double for pdf version instead of decimal 2024-02-14 21:09:16 +00:00
BobLd
04fc8d696d Use double instead of decimal in IPdfImage's Decode property 2024-01-20 18:52:26 +00:00
Richard Webb
83519b27b1 Add a DynamicallyAccessedMembers attribute to AddPageFactory<TPage, TPageFactory> 2024-01-11 19:07:35 +00:00
BobLd
3a96af3dcd Add GetPage<TPage> and AddPageFactory<TPage, TPageFactory> methods 2023-11-18 16:46:23 +00:00
BobLd
3fbf8aaa6c Abstract away PageFactory main logic into BasePageFactory 2023-11-18 16:46:23 +00:00
BobLd
63096de210 Add IPageFactory to the public API, remove InternalParsingOptions 2023-10-25 20:03:02 +01:00
BobLd
ba865b340e Make IResourceStore part of the public API and pass InternalParsingOptions to the ResourceStore constructor 2023-10-22 19:16:41 +01:00
BobLd
7ab3a6a2cd Add more classes to the Public API (#717)
* Made interfaces and classes public

* Made NamedDestinations public and fixed namespace

* Fixed NamedDestinationsProvider namespace

* Expose more internal classes as public

* Update PublicApiScannerTests

* Make setters internal
2023-10-22 17:34:47 +01:00
BobLd
c6e2de1b0c Make all structs readonly when possible 2023-10-18 23:44:11 +01:00
BobLd
fe0e4db419 Properly handle page rotation for crop box and media box and fix #665 2023-10-18 21:27:56 +01:00
Eliot Jones
9d2b3f914d account for skipmissingfonts in positioned text #637 2023-06-04 11:47:30 +01:00
Eliot Jones
fba1cbc13c skip missing objects if skip fonts is true #298
if skip missing fonts is set we want to read the file
as much as possible so we will also skip any missing
xobjects like images, forms or postscript code
2023-05-27 10:46:29 +01:00
Eliot Jones
20d3cc9066 tidy up during investigation #600 2023-05-23 19:22:00 +01:00
BobLd
a4284aa5a8 Implement Pattern color space and Shading, seal IColor classes, stop using decimal in colors and use double instead 2023-05-18 20:24:55 +01:00
Yufei Huang
3898f09a5f Pdf merger support copy links 2023-04-22 13:54:31 +01:00
BobLd
42e4171c31 Fix integration tests for #579 (3) 2023-04-13 19:27:54 +01:00
BobLD
b8a98fbed2 Properly implement color spaces 2023-04-12 07:25:09 +01:00
mvantzet
0e39bc0b76 Annotations named destinations (#579)
* Add Named Destinations to Catalog so that bookmarks and links can access
them.

The named destinations require access to page nodes, so created Pages object
that is made using PagesFactory (which contains the page-related code from
Catalog).

* Further implementation of destinations:
- Implement NamedDestinations in AnnotationProvider, so that we can look
  up named destinations for annotations and turn them into explicit destinations.
  Reused existing code inside BookmarksProvider to get destinations/actions.
- Added GoToE action
- According to the PDF reference, destinations are also required for
  external destinations and hence for ExternalBookmarkNode. This allows us
  to push up DocumentBookmarkNode.Destination to BookmarkNode.

* Implemented stateful appearance streams and integration test

* Added AppearanceStream to public API because it is used in the (public)
Annotation constructor

* After #552, must push down ExplicitDestination to DocumentBookmarkNode since it
does not apply to UriBookmarkNode.

* Added actions, which fits the PDF model better and works well with the
new bookmarks code (after PR #552)

* Rename Action to PdfAction + removed unused using in ActionProvider.cs

---------

Co-authored-by: mvantzet <mark@radialsg.com>
2023-04-10 17:14:14 +01:00
mvantzet
76ce251a6e Merge branch 'UglyToad:master' into PageSizesAndRotation 2023-03-17 19:35:55 +01:00
mvantzet
a07fdb8d45 Follow suggestion by @BobLd: added 2 more test cases in case we want to
support more lenient page size parsing in the future.
2023-03-14 12:33:21 +01:00
mvantzet
0413f3f1bf Fix related to page sizes / rotation / coordinate transformations (issue 560):
The initial transformation matrix was incorrect, as it translated by the cropbox width/height
instead of by the cropbox left/bottom offsets. Also, it did not translate the results back into
the 1st quadrant so that (0,0) would (again) be the lower left corner origin for the cropped area.
Added unit tests in new file ContentStreamProcessorTests.

EFFECTIVE CHANGES:

- The coordinates used for letters etc. are different now for rotated and/or cropped pages,
  but as those were not very consistent anyway this is probably OK.

- The Page Size (A4, A3, Custom, etc.), Width and Height are now determined by the CropBox,
  not by the MediaBox; the CropBox ultimately determines what you see on screen and is printable.
  If no cropbox is defined in the PDF, it is set to the MediaBox; so in that case it is
  backwards compatible with the old code.

- The Page MediaBox and CropBox properties are no longer rotated according to Page.Rotation.
  The Page Width and Height do take rotation into account (kept it backward compatible).
2023-03-09 16:42:09 +01:00
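The coordinate fix described in commit 0413f3f1bf can be illustrated with a minimal sketch. It is not the library's code: the function name `to_cropped_coords`, the tuple layout of the crop box, and the handling of each rotation are assumptions made for illustration. It translates by the crop box's left/bottom offsets (not its width/height), applies the clockwise page rotation, and then shifts the result back into the first quadrant so that (0, 0) is again the lower-left corner of the visible area.

```python
def to_cropped_coords(x, y, crop_box, rotation=0):
    """Map a point from PDF user space into the cropped, rotated page.

    crop_box is (left, bottom, right, top); rotation is clockwise degrees.
    Hypothetical helper for illustration only.
    """
    left, bottom, right, top = crop_box
    width, height = right - left, top - bottom

    # Translate by the crop box offsets so its lower-left corner is the origin.
    x, y = x - left, y - bottom

    # Rotate clockwise, then translate back into the first quadrant.
    if rotation == 0:
        return (x, y)
    if rotation == 90:
        return (y, width - x)
    if rotation == 180:
        return (width - x, height - y)
    if rotation == 270:
        return (height - y, x)
    raise ValueError("rotation must be 0, 90, 180 or 270")


crop = (10, 20, 110, 220)  # a 100 x 200 crop box offset from the origin
print(to_cropped_coords(10, 20, crop))       # lower-left corner, unrotated
print(to_cropped_coords(10, 20, crop, 90))   # same corner on a page rotated 90 degrees
```

With no rotation the lower-left corner of the crop box maps to (0, 0); with a 90 degree clockwise rotation the same corner ends up at the top-left of the rotated page.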
mvantzet
3a0a6e1411 Resolving page sizes did not work when the page orientation was landscape, or when
side lengths were not integer or off by one. Added unit tests.
2023-03-09 16:09:14 +01:00
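One way to make page size resolution robust to landscape orientation and to side lengths that are fractional or off by one, as commit 3a0a6e1411 describes, is sketched below. The table of sizes, the name `resolve_page_size`, and the one-point tolerance are assumptions for illustration, not the library's actual values or API.

```python
# Known sizes in points, stored as (short side, long side). Illustrative subset.
KNOWN_SIZES = {
    "A3": (842, 1191),
    "A4": (595, 842),
    "Letter": (612, 792),
}


def resolve_page_size(width, height, tolerance=1.0):
    """Return the name of the matching page size, or "Custom".

    Sorting the sides makes the comparison independent of orientation;
    the tolerance absorbs non-integer and off-by-one side lengths.
    """
    short_side, long_side = sorted((width, height))
    for name, (known_short, known_long) in KNOWN_SIZES.items():
        if (abs(short_side - known_short) <= tolerance
                and abs(long_side - known_long) <= tolerance):
            return name
    return "Custom"


print(resolve_page_size(842.3, 595))  # landscape with a fractional side
print(resolve_page_size(596, 841))    # portrait with both sides off by one
print(resolve_page_size(500, 500))    # no match
```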
BobLD
c56705d4ff Implement pdf functions and add type 0, 2 and 4 function tests 2023-03-08 18:59:16 +00:00
mvantzet
06253966e4 Added Letter properties RenderingMode, StrokeColor, FillColor and added those as mandatory
constructor arguments. Kept property Color, which contains either StrokeColor (if rendering mode is Stroke)
or FillColor (for all other rendering modes).
In PdfPageBuilder opted for default text rendering mode "Fill" which seems like a sensible default.
2023-01-13 12:35:25 +01:00
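The rule for the retained Color property in commit 06253966e4 amounts to a single selection, sketched here. The enum members and the `letter_color` function are hypothetical stand-ins, not the library's types.

```python
from enum import Enum


class TextRenderingMode(Enum):
    # Illustrative subset of rendering modes.
    FILL = 0
    STROKE = 1
    FILL_THEN_STROKE = 2


def letter_color(rendering_mode, stroke_color, fill_color):
    """Pick the stroke color only for the Stroke mode, else the fill color."""
    if rendering_mode is TextRenderingMode.STROKE:
        return stroke_color
    return fill_color


print(letter_color(TextRenderingMode.STROKE, "red", "black"))
print(letter_color(TextRenderingMode.FILL, "red", "black"))
```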
Eliot Jones
37e31c40ae Merge pull request #522 from fnatzke/master
Fix #514 Print Character with ZapfDingbats font
2023-01-08 11:52:15 -05:00
Fred Natzke
8f13e2f11c Fix #375 Add checkmark from ZapfDingbats using unicode character. Letters extracted have unicode values. 2022-12-17 17:39:15 +10:00
Eliot Jones
c8874c5984 #483 make skip missing fonts even more resilient to nonsense files 2022-12-11 16:18:09 -05:00
Eliot Jones
2aed996319 Merge pull request #517 from fnatzke/master
Fixes for Issue#512, 516 and 519
2022-12-09 09:39:55 -05:00
Eliot Jones
95df15996b add ability to rotate page by number of degrees, make builder fluent 2022-12-09 08:44:56 -05:00
Fred Natzke
29adece983 Original #516 fix causes tests to fail. Alternative is to use pagesByNumber?.Count in Catalog for discovered pages. Some other unrelated tests failed due to the source using single rather than two-character newlines. Changed to test the string itself for '\r' rather than whether the environment is Unix 2022-11-30 11:10:52 +10:00
Fred Natzke
a7f64ec64b Fix Issue 516. Page Dictionary Count field has incorrect page count. Compare with PageTree children count. 2022-11-30 09:25:23 +10:00
Eliot Jones
e2246a88bb #482 add skip missing fonts option and pass parsing options to content stream processor
this doesn't fix the reported issue since the pdf itself is corrupted on page 8 however it will
allow recovery in some scenarios where text content isn't important.

also adds more informative error when stream unintentionally passed with non zero offset
2022-10-09 13:44:05 -04:00
Eliot Jones
c643facee0 #481 skip resource entry if null token 2022-10-09 13:06:04 -04:00
grinay
19962af011 Fix page number order.
When the root node references a page directly, the page order is incorrect.
For example, if the root node has the references
[2 0 R 3 0 R 10 0 R]
where 2 0 R is an intermediate node containing pages 1, 2, 3,
3 0 R is an intermediate node containing pages 4, 5, 6,
and 10 0 R is page 7,
then without this fix page 7 ends up in the array as page 1.
2022-07-29 18:27:56 +08:00
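The ordering problem in commit 19962af011 can be sketched with a depth-first traversal that numbers leaf pages in the order they are reached, whether they sit under an intermediate node or directly under the root. This is an illustration, not the library's code: the dictionary layout, the root reference "1 0 R", the leaf references "4 0 R" through "9 0 R", and the name `number_pages` are all hypothetical.

```python
def number_pages(objects, root_ref):
    """Assign 1-based page numbers by walking the page tree depth-first."""
    ordered = []

    def visit(ref):
        node = objects[ref]
        if node["Type"] == "Page":
            ordered.append(ref)
            return
        # Intermediate node: visit children in the order they are listed.
        for kid in node["Kids"]:
            visit(kid)

    visit(root_ref)
    return {ref: number for number, ref in enumerate(ordered, start=1)}


# The tree from the commit message; leaf references are made up.
objects = {
    "1 0 R": {"Type": "Pages", "Kids": ["2 0 R", "3 0 R", "10 0 R"]},
    "2 0 R": {"Type": "Pages", "Kids": ["4 0 R", "5 0 R", "6 0 R"]},
    "3 0 R": {"Type": "Pages", "Kids": ["7 0 R", "8 0 R", "9 0 R"]},
    "10 0 R": {"Type": "Page"},
}
for leaf in ["4 0 R", "5 0 R", "6 0 R", "7 0 R", "8 0 R", "9 0 R"]:
    objects[leaf] = {"Type": "Page"}

page_numbers = number_pages(objects, "1 0 R")
print(page_numbers["10 0 R"])  # the directly referenced page comes last
```

Because the direct page reference is visited only after both intermediate nodes, it receives number 7 rather than 1.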