though a form dictionary should always contain fields (as required by the spec) it is possible for this entry to be missing. in this case we return false for trygetform.
though required by the spec an adobe type 1 font may be missing all width data. in this case we default to empty values and treat it like a normal adobe type 1 font.
the additional stored fields made the struct slower to copy and had an impact on the performance. this moves non-essential fields back to computed properties.
since the rectangle constructor is a hot path any calculations slow the library down considerably. for this reason we move calculations for the following properties into the property getter:
* width (cached)
* height (cached)
* rotation
* area
* centroid
where values are cached they set their backing field once calculated. this won't be thread safe if the same rectangle is accessed on multiple threads.
now that rectangle constructor uses the order [ llx, lly, urx, ury ] and does not apply correction for points constructor parameters must be passed in the correct order. this change fixes the hyperlink factory which was passing them in the wrong order.
in addition the pdfpath bounding box was using left, right, top and bottom to calculate the minimum bounding box. this produced incorrect values now individual path operator bounding boxes are rotated, since for a rotated rectangle top may be less than bottom.
the performance seems to have taken a hit due to these changes however.
if we are reading a cross reference offset which contains a number we assumed it was a stream object. if it's not we now brute-force the entire file looking for an 'xref' token. this should be combined with a search for cross-reference streams and should run when we read neither the numeric token or an 'xref' token but for now this fixes the observed issue.
also adds number of images to the page api to prevent consumers needing to enumerate.
an encoding array in an adobe type 1 font may be missing its declaration ending in 'for', if we encounter 'dup' while looking for the 'for' token we have a special case to go straight into reading the encoding.
also handles a case where the page content stream contains a path-closing operator without any path being active.
fonts can appear as dicitionary objects rather than indirect references in the resource dictionary for a page. if we encounter this we parse and store the font by name for retrieval during content parsing.
in order to recreate the valid bytes for use in decryption it is necessary to know which encoding was used to read a string token. this is because utf16-be encoding has a byte-order marker which should be included in the resulting bytes.
if the %pdf version header comment is offset from the start of the file the cross reference offsets will also be wrong by that amount. this change updates the cross reference location logic to use the offset from the located version header.
some files seem to have the format header preceded by large amounts of junk but this appears to be valid for chrome and acrobat reader. this change ups the amount of nonsense to be read prior to the version header.
also makes parsing of the version header culture invariant which may be related to #85.
it appears charstring definitions in adobe type 1 fonts can omit the charstring name. in this case we set the name to the string value of the charstring index.
since inline image data may contain the end image "ei" token inside the data stream there's no reliable way to actually determine if we've read all the data. for this reason if we end up with an invalid state parsing operations after we've read the end image token we try to recover by reading from the previous token to the next end image token if any. we supply log information to let the consumer know this is what we're doing. it's still not bullet-proof but it should be good enough.
also support negative page rotation values by adding them to a 360 degree rotation so -90 degrees clockwise is 270 degrees clockwise.
updates the information on the github pages site for the new api changes. includes some more seo friendly terms to improve discoverability, more engaging images as well as comprehensive code examples to improve onboarding.
our bounding rectangle values still seem to be wrong for rotated letters. this change adds some test cases for common transformation matrix operations on a rectangle, scale, translate and rotate.
if no widths array entry exists for the character and no font program is present for a true type simple font then use the 0 index glyph width if present in the widths array.