If skip missing fonts is set, we want to read as much of the file
as possible, so we also skip any missing XObjects such as images,
forms or PostScript code.
- Pass in the initial matrix to the annotation provider, so that it can return
the correct rectangles / quad points.
- Changed and extended the Annotation class:
- ModifiedDate is now a DateTimeOffset instead of an unparsed string.
If the string is invalid, ModifiedDate is set to the default value.
- Added lookup for the "appearance streams"; every annotation should have
an "N" (normal) appearance, and may optionally have an "R" (roll-over/hover)
and a "D" (down/click) appearance. Did not expose the actual stream objects,
but added a flag indicating the existence of "R" / "D". At some point
we can consider doing something with the appearances.
- Changed signature of GetInitialMatrix / ContentStreamProcessor constructor
from PdfRectangle back to what it was earlier, namely MediaBox and CropBox,
to prevent accidentally mixing the two up in the caller.
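To illustrate the ModifiedDate change noted above, here is a minimal sketch of parsing a PDF date string of the form D:YYYYMMDDHHmmSSOHH'mm'. It is written in Python purely for illustration and is not the library's code. The helper name parse_pdf_date is hypothetical, None stands in for the "default value" on invalid input, and treating a missing offset as UTC is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Year is required; every later field is optional, as in the PDF date form.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def parse_pdf_date(raw: str) -> Optional[datetime]:
    """Hypothetical helper: return an offset-aware datetime, or None if invalid."""
    match = _PDF_DATE.match(raw.strip())
    if match is None:
        return None
    # Missing month/day default to 1, missing time fields default to 0.
    year, month, day, hour, minute, second = (
        int(group) if group is not None else default
        for group, default in zip(match.groups()[:6], (0, 1, 1, 0, 0, 0))
    )
    sign, off_h, off_m = match.group(8), match.group(9), match.group(10)
    # Assumption: no offset (or "Z") is treated as UTC.
    offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
    if sign == "-":
        offset = -offset
    try:
        return datetime(year, month, day, hour, minute, second,
                        tzinfo=timezone(offset))
    except ValueError:
        # Out-of-range fields such as month 13 count as invalid input.
        return None
```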
The initial transformation matrix was incorrect: it translated by the CropBox width/height
instead of by the CropBox left/bottom offsets. It also did not translate the results back into
the first quadrant, so that (0,0) would again be the lower-left corner of the cropped area.
Added unit tests in new file ContentStreamProcessorTests.
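The corrected idea can be sketched as follows: translate by the CropBox left/bottom offsets, then rotate and shift the result back into the first quadrant. This is an illustrative Python sketch, not the actual GetInitialMatrix implementation. The function names, the tuple layout (PDF order a b c d e f) and the clockwise rotation convention are assumptions made for the example.

```python
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]  # PDF order: a b c d e f


def multiply(first: Matrix, second: Matrix) -> Matrix:
    """Combine two affine matrices: apply `first`, then `second`."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


def transform(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a point through the matrix."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def initial_matrix(crop_box: Tuple[float, float, float, float],
                   rotation: int) -> Matrix:
    """Hypothetical sketch; crop_box is (left, bottom, right, top)."""
    left, bottom, right, top = crop_box
    width, height = right - left, top - bottom
    # Step 1: move the CropBox lower-left corner to (0, 0).
    to_origin: Matrix = (1, 0, 0, 1, -left, -bottom)
    # Step 2: rotate clockwise, then shift back into the first quadrant.
    if rotation == 0:
        rotate_and_shift: Matrix = (1, 0, 0, 1, 0, 0)
    elif rotation == 90:
        rotate_and_shift = (0, -1, 1, 0, 0, width)
    elif rotation == 180:
        rotate_and_shift = (-1, 0, 0, -1, width, height)
    elif rotation == 270:
        rotate_and_shift = (0, 1, -1, 0, height, 0)
    else:
        raise ValueError("rotation must be 0, 90, 180 or 270")
    return multiply(to_origin, rotate_and_shift)
```

With a CropBox of (100, 50, 400, 250), every rotation maps the box corners to non-negative coordinates, with one corner at (0,0).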
EFFECTIVE CHANGES:
- The coordinates used for letters etc. are different now for rotated and/or cropped pages,
but as those were not very consistent anyway this is probably OK.
- The Page Size (A4, A3, Custom, etc.), Width and Height are now determined by the CropBox,
not by the MediaBox; the CropBox ultimately determines what you see on screen and is printable.
If no cropbox is defined in the PDF, it is set to the MediaBox; so in that case it is
backwards compatible with the old code.
- The Page MediaBox and CropBox properties are no longer rotated according to Page.Rotation.
The Page Width and Height do take rotation into account (kept it backward compatible).
constructor arguments. Kept property Color, which contains either StrokeColor (if rendering mode is Stroke)
or FillColor (for all other rendering modes).
In PdfPageBuilder opted for default text rendering mode "Fill" which seems like a sensible default.
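The Color rule described above (StrokeColor when the rendering mode is Stroke, FillColor otherwise) can be sketched as below. This is an illustrative Python sketch, not the library's code. The enum member names follow the eight text rendering modes of the PDF specification, and effective_color is a hypothetical helper name.

```python
from enum import Enum


class TextRenderingMode(Enum):
    # The eight text rendering modes defined by the PDF specification.
    FILL = 0
    STROKE = 1
    FILL_THEN_STROKE = 2
    NEITHER = 3
    FILL_CLIP = 4
    STROKE_CLIP = 5
    FILL_THEN_STROKE_CLIP = 6
    NEITHER_CLIP = 7


def effective_color(mode, stroke_color, fill_color):
    """Hypothetical helper: stroke color only for Stroke, else fill color."""
    if mode is TextRenderingMode.STROKE:
        return stroke_color
    return fill_color
```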
This doesn't fix the reported issue, since the PDF itself is corrupted on page 8; however, it will
allow recovery in some scenarios where text content isn't important.
Also adds a more informative error when a stream is unintentionally passed with a non-zero offset.
fix 're' operator to reflect documentation
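For reference, the PDF specification defines "x y width height re" as equivalent to a move, three line segments and a close-path. The sketch below expands the operator that way. It is an illustrative Python sketch; the function name and the tuple representation are assumptions, and it makes no claim about what the original bug was.

```python
from typing import List, Tuple


def rectangle_subpath(x: float, y: float,
                      width: float, height: float) -> List[Tuple]:
    """Hypothetical helper: expand `re` into its equivalent path operators."""
    return [
        ("m", x, y),                   # move to the given corner
        ("l", x + width, y),           # line along the first edge
        ("l", x + width, y + height),  # line to the opposite corner
        ("l", x, y + height),          # line along the third edge
        ("h",),                        # close the subpath
    ]
```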
Update ContentStreamProcessor with fill, stroke and clip operations
Throw errors when currentPosition is null in PdfSubpath
An encoding array in an Adobe Type 1 font may be missing its declaration ending in 'for'. If we encounter 'dup' while looking for the 'for' token, we treat it as a special case and go straight into reading the encoding.
Also handles a case where the page content stream contains a path-closing operator without any path being active.
Since inline image data may contain the end image "EI" token inside the data stream, there's no reliable way to determine whether we've read all the data. For this reason, if we end up in an invalid state parsing operations after we've read the end image token, we try to recover by reading from the previous token to the next end image token, if any. We log this so the consumer knows what we're doing. It's still not bullet-proof, but it should be good enough.
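The ambiguity can be seen in a small sketch that lists every whitespace-delimited "EI" in a byte buffer; a recovery step like the one described would move from one candidate to the next. This is an illustrative Python sketch, not the library's parser. The function name and the whitespace set used as delimiter are assumptions.

```python
from typing import List

# PDF whitespace bytes: NUL, tab, line feed, form feed, carriage return, space.
WHITESPACE = b"\x00\t\n\x0c\r "


def end_image_candidates(data: bytes, start: int = 0) -> List[int]:
    """Hypothetical helper: offsets of every whitespace-delimited b'EI'."""
    positions = []
    index = data.find(b"EI", start)
    while index != -1:
        before_ok = index > 0 and data[index - 1] in WHITESPACE
        after = index + 2
        after_ok = after >= len(data) or data[after] in WHITESPACE
        if before_ok and after_ok:
            positions.append(index)
        index = data.find(b"EI", index + 1)
    return positions
```

When more than one candidate is returned, the first may lie inside the image data, which is why a parser cannot be sure it stopped at the right one.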
Also support negative page rotation values by adding them to 360 degrees, so a rotation of -90 degrees clockwise becomes 270 degrees clockwise.
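A minimal sketch of that normalization, in Python for illustration only (normalize_rotation is a hypothetical name). Python's % operator already yields a non-negative result for a positive modulus, which generalizes "add 360" to values below -360. In C# the % operator keeps the sign of the dividend, so an explicit addition would be needed there.

```python
def normalize_rotation(degrees: int) -> int:
    """Hypothetical helper: map any rotation onto the range 0..359."""
    # In Python, -90 % 360 evaluates to 270.
    return degrees % 360
```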