previously if no matching unicode was found for a character code we would return a null letter. instead we now map from the character code directly to a character. this seems to work for most documents, except where there are ligatures, e.g. fi or ff, but is still better than not returning anything.
point size was previously only calculated based on the transformation matrix but now uses the transformation matrix, the rotation matrix and the font matrix values. the calculated value still seems unlikely to be correct so it is exposed using the page's experimental access for now, rather than as a public getter.
- begin adding support for extended graphics state (the 'gs' operator) including setting the font #39.
- apply page level rotation to the glyph bounding box and width to get correct glyph sizes #41.
- wrap page rotation in a value type to ensure the value is restricted to right angle rotations and provide convenience members #42.
- fix bug where system font finder never worked for truetype fonts because it began reading the file from the wrong offset.
we need to apply rotation to the crop and media box and therefore find the correct width and height. but for now correctly deriving the rotation from the page tree should help consumers.
the revision 5 and 6 encryption algorithms specify the presence of additional encryption material named 'oe' and 'ue'. it turns out this is not always required so will now default to null if not present. this also adds support for those values being in hex rather than normal string format.
tidies up some commenting on the xynode class, moves public methods below constructors and adds xy to the resharper list of abbreviations for the solution.
cid fonts may contain a registry, ordering and supplement to identify the font. we were checking for string registry and ordering tokens but failing on hex tokens.
for encrypted documents we now decrypt hex data.
using an online tool to encrypt a simple document with aes-128 seems to add the dictionary type cryptalgorithm rather than cryptfilter. i couldn't find any references to cryptalgorithm in the spec or pdfbox but it seems to work ok when treated as equivalent to cryptfilter.
there are situations where the string derived from a hex token has a different length to the underlying bytes, for example if the hex token contains the '\0' byte, the encryption algorithm needs to use the raw bytes rather than the 'stringified' bytes. this change passes raw bytes for hex tokens for both the user and owner password keys.
previously if the cross reference did not exist at exactly the provided offset we'd immediately throw, now we assume we can read a few more tokens to find the xref table or stream start. this won't work in the case where the provided offset is past the start of the table or nowhere near the table but in those cases there's not much we can do. there's some more work to do to provide a fallback xref parser which finds the xref tables and streams using a brute-force scan of the whole document.