the lenient parsing flag gives us more code to maintain for no real benefit; parsing should always be as lenient as possible. remove the flag from some of the font code.
we add a try/catch to the direct object finder's tryget method so it returns false rather than throwing.
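a minimal python sketch of the try-get pattern described above. the codebase itself is c#, so this is only an analogue, and `try_get` and the `finder` callable are hypothetical names, not the library's api:

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def try_get(finder: Callable[[int], T], key: int) -> Tuple[bool, Optional[T]]:
    """Return (True, value) on success and (False, None) if the lookup raises."""
    try:
        return True, finder(key)
    except Exception:
        # Any failure while locating the object is reported as "not found"
        # instead of propagating to the caller.
        return False, None
```

catching every exception keeps callers simple but can also hide genuine bugs, so a narrower except clause may be preferable where the expected failure types are known.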
if the catalog contains an acro-form reference but the document has no corresponding object, we instead scan all objects in the document to find form fields and reconstruct the acro-form dictionary.
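a minimal sketch of the reconstruction, assuming objects are available as a map from (object number, generation) to parsed dictionaries, that a field is recognized by a /FT entry or by /T plus /Kids, and that only root fields (no /Parent) belong in /Fields. these heuristics and all function names are hypothetical, not the library's actual rules:

```python
from typing import Any, Dict, List, Tuple

ObjectKey = Tuple[int, int]  # (object number, generation)


def looks_like_field(obj: Dict[str, Any]) -> bool:
    # Terminal fields carry a field type (/FT); non-terminal fields may only
    # have a partial name (/T) and children (/Kids).
    return "FT" in obj or ("T" in obj and "Kids" in obj)


def reconstruct_acroform(objects: Dict[ObjectKey, Any]) -> Dict[str, List[ObjectKey]]:
    """Build a minimal acro-form dictionary from every object in the document."""
    fields: List[ObjectKey] = []
    for key in sorted(objects):
        obj = objects[key]
        if not isinstance(obj, dict) or not looks_like_field(obj):
            continue
        # Only root fields go in /Fields; children are reachable via /Kids.
        if "Parent" not in obj:
            fields.append(key)
    return {"Fields": fields}
```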
previously we checked that the offset was not inside the cross-reference table (the correct thing to check); however, this is only a special case of the more general issue (the cross-reference offsets are wrong). we move handling for this into the pdf token scanner: if we attempt to read an object at an offset and it fails, we brute force the entire file to find the correct offsets. we also needed to add handling to make sure we don't attempt to use stream length tokens while brute forcing, since we can't look up indirect references for the length.
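the easy-to-miss part is the stream length: while brute forcing, an indirect /Length cannot be resolved. a minimal python sketch of that decision, assuming a hypothetical `IndirectRef` type and `resolve` callback (neither is the library's api); when no usable length exists it falls back to scanning for `endstream`:

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class IndirectRef:
    """Hypothetical stand-in for an indirect reference such as '12 0 R'."""
    number: int
    generation: int


def read_stream_body(
    data: bytes,
    body_start: int,
    length: Union[int, IndirectRef, None],
    brute_forcing: bool,
    resolve: Optional[Callable[[IndirectRef], int]],
) -> bytes:
    """Return the raw bytes of a stream whose data begins at body_start."""
    if isinstance(length, IndirectRef):
        # Resolving the reference needs trustworthy offsets, which we do not
        # have while brute forcing, so ignore the /Length token entirely.
        length = None if brute_forcing or resolve is None else resolve(length)

    if isinstance(length, int):
        end = body_start + length
        # Only trust the length if 'endstream' follows it (after optional EOL).
        if data[end:end + 20].lstrip().startswith(b"endstream"):
            return data[body_start:end]

    # Fallback: scan forward for the keyword itself.
    end = data.find(b"endstream", body_start)
    if end == -1:
        raise ValueError("unterminated stream")
    body = data[body_start:end]
    # The end-of-line marker before 'endstream' is not part of the data.
    if body.endswith(b"\r\n"):
        return body[:-2]
    if body.endswith((b"\n", b"\r")):
        return body[:-1]
    return body
```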
when brute-force searching for the start of the cross-reference table (xref) we read 5-byte buffers. previously, if the 'x' of 'xref' was the first character of the buffer, we skipped it. this change also checks the case where 'x' is the first character of the buffer.
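a minimal sketch of the corrected check, assuming the file is read in fixed 5-byte buffers and each 'x' candidate is confirmed against the full data. `find_xref_offsets` is hypothetical and only illustrates why the inner loop must start at index 0:

```python
from typing import List

BUFFER_SIZE = 5


def find_xref_offsets(data: bytes) -> List[int]:
    """Return every offset where an 'xref' keyword (not 'startxref') begins."""
    offsets: List[int] = []
    for chunk_start in range(0, len(data), BUFFER_SIZE):
        buffer = data[chunk_start:chunk_start + BUFFER_SIZE]
        # Start at index 0: beginning at 1 would skip an 'x' that happens to
        # be the first character of the buffer.
        for i in range(len(buffer)):
            if buffer[i] != ord("x"):
                continue
            offset = chunk_start + i
            # Check the keyword against the full data so a match that spans
            # two buffers is still found.
            if data[offset:offset + 4] != b"xref":
                continue
            # 'startxref' also contains 'xref'; ignore it.
            if offset >= 5 and data[offset - 5:offset] == b"start":
                continue
            offsets.append(offset)
    return offsets
```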
fixes the offset recorded the first time we recover from an incorrect endimage. it was off by one, so if the subsequent endimage was also the wrong tag, the second attempt at recovery failed.
also allows recovery when other tags appear after an endimage, as long as they're not block-ending operations (end text, perhaps pop/push in future).
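one plausible reading of the two rules above, as a python sketch: an 'EI' candidate is accepted only if it is surrounded by whitespace and the operator after it is not block-ending; otherwise the search resumes one byte past the candidate's real offset. the acceptance rule, `BLOCK_ENDING_OPERATORS` and the function names are all hypothetical, and the library's actual logic may differ:

```python
import re
from typing import Optional

# Hypothetical set of operators that close an enclosing block; push/pop
# (q/Q) might be added later.
BLOCK_ENDING_OPERATORS = {b"ET"}

TOKEN = re.compile(rb"\s*(\S+)")


def next_token(data: bytes, pos: int) -> Optional[bytes]:
    match = TOKEN.match(data, pos)
    return match.group(1) if match else None


def find_inline_image_end(data: bytes, start: int) -> int:
    """Return the offset of the 'EI' operator that ends image data at start."""
    search_from = start
    while True:
        candidate = data.find(b"EI", search_from)
        if candidate == -1:
            raise ValueError("no end image operator found")
        before_ok = candidate == 0 or data[candidate - 1:candidate].isspace()
        after = data[candidate + 2:candidate + 3]
        after_ok = after == b"" or after.isspace()
        if before_ok and after_ok and next_token(data, candidate + 2) not in BLOCK_ENDING_OPERATORS:
            return candidate
        # Restart relative to this candidate's real offset; being off by one
        # here is what breaks a second recovery attempt.
        search_from = candidate + 1
```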
when a character is not defined and the corresponding '.notdef' glyph isn't included in a truetype font, we now default to a zero-width character. it might be that we should use the default/missing width instead, but this will work for most use cases.
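the fallback chain can be sketched as below, assuming simple maps from character code to glyph name and from glyph name to advance width. both maps and the function name are hypothetical:

```python
from typing import Dict


def glyph_advance_width(
    char_to_glyph: Dict[int, str], widths: Dict[str, float], char_code: int
) -> float:
    """Width for a character code, falling back to .notdef and then zero."""
    glyph = char_to_glyph.get(char_code)
    if glyph is not None and glyph in widths:
        return widths[glyph]
    # Undefined character: use the font's .notdef glyph if it has one.
    if ".notdef" in widths:
        return widths[".notdef"]
    # Neither is available: treat it as zero width (a default/missing width
    # might be more accurate).
    return 0.0
```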
the brute force searcher offsets were off by one. this change means the offset returned is now aligned with the object number in the object number/generation/operator triple.
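a minimal python sketch of the aligned scan, assuming objects are located with a regular expression over the raw bytes (the real searcher may work differently). the offset taken is the start of the object-number group, so it points at the first digit of the number/generation/operator triple:

```python
import re
from typing import Dict, Tuple

# 'N G obj' not preceded by another digit.
OBJECT_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def brute_force_offsets(data: bytes) -> Dict[Tuple[int, int], int]:
    """Map (object number, generation) to the offset of the object number."""
    offsets: Dict[Tuple[int, int], int] = {}
    for match in OBJECT_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        # group 1 starts at the first digit of the object number; later
        # definitions (e.g. from incremental updates) overwrite earlier ones.
        offsets[key] = match.start(1)
    return offsets
```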
multiple master fonts are an extension of the adobe type 1 font format. we don't have any special-case handling for them, so for now we default to using the adobe type 1 font handler. it may be that we need some special parsing logic, but the test file using the mmtype1 fonts didn't actually include any font bytes, so we can't check.
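the fallback amounts to an alias in the subtype dispatch. a minimal python sketch with hypothetical handler names standing in for the real handlers:

```python
from typing import Dict

# Hypothetical handler names keyed by the font dictionary's /Subtype.
HANDLERS: Dict[str, str] = {
    "Type0": "type0 handler",
    "Type1": "type1 handler",
    "TrueType": "truetype handler",
    "Type3": "type3 handler",
}

# No dedicated multiple master support yet: reuse the type 1 handler.
SUBTYPE_ALIASES: Dict[str, str] = {"MMType1": "Type1"}


def select_handler(subtype: str) -> str:
    subtype = SUBTYPE_ALIASES.get(subtype, subtype)
    try:
        return HANDLERS[subtype]
    except KeyError:
        raise ValueError(f"unsupported font subtype: {subtype}") from None
```

keeping the alias in a separate table means a dedicated multiple master handler could later replace it without touching the dispatch code.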
decimal numbers written to the output file depended on the current thread culture. this meant values like '70.679' were output as '70,679' for cultures that use a comma rather than a period as the decimal separator (much of the world), which caused the file to display incorrectly.
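in c# the usual remedy is to format with `CultureInfo.InvariantCulture`. python's `f` presentation type is already locale-independent (only the `n` type and the `locale` module consult the locale), so a sketch of an invariant number writer looks like this; `format_pdf_number` and its five-decimal limit are hypothetical choices:

```python
def format_pdf_number(value: float, max_decimals: int = 5) -> str:
    """Format a number for a PDF content stream, always using '.' as separator."""
    # The 'f' presentation type ignores the process locale, so the output is
    # the same on every machine.
    text = f"{value:.{max_decimals}f}"
    if "." in text:
        # Drop trailing zeros and a dangling separator: '70.67900' -> '70.679'.
        text = text.rstrip("0").rstrip(".")
    # Avoid emitting '-0' for tiny negative values.
    return "0" if text == "-0" else text
```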
though a form dictionary should always contain fields (as required by the spec) it is possible for this entry to be missing. in this case we return false for trygetform.
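a minimal sketch of that check, assuming the catalog has already been parsed into a dictionary with resolved references. `try_get_form` here is a hypothetical python analogue, not the library's signature:

```python
from typing import Any, Dict, Optional, Tuple


def try_get_form(catalog: Dict[str, Any]) -> Tuple[bool, Optional[Dict[str, Any]]]:
    """Return (True, acroform) only when the form dictionary has a /Fields array."""
    acroform = catalog.get("AcroForm")
    if not isinstance(acroform, dict):
        return False, None
    # /Fields is required by the spec but can be missing in real files;
    # report "no form" rather than raising.
    if not isinstance(acroform.get("Fields"), list):
        return False, None
    return True, acroform
```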
though the spec requires it, an adobe type 1 font may be missing all width data. in this case we default to empty values and treat it like a normal adobe type 1 font.
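defaulting the width entries might look like the sketch below, assuming the font dictionary is a parsed map using the pdf keys /FirstChar, /LastChar and /Widths. `char_width` and its `missing_width` parameter are hypothetical (a font descriptor's /MissingWidth entry is one value that could feed it):

```python
from typing import Any, Dict


def char_width(font: Dict[str, Any], code: int, missing_width: float = 0.0) -> float:
    """Look up a glyph width, tolerating missing /FirstChar, /LastChar and /Widths."""
    first_char = font.get("FirstChar", 0)
    widths = font.get("Widths", [])  # default to an empty list when absent
    last_char = font.get("LastChar", first_char + len(widths) - 1)
    index = code - first_char
    if first_char <= code <= last_char and 0 <= index < len(widths):
        return float(widths[index])
    return missing_width
```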
now that the rectangle constructor uses the order [llx, lly, urx, ury] and does not apply any correction to the points, constructor parameters must be passed in the correct order. this change fixes the hyperlink factory, which was passing them in the wrong order.
in addition, the pdfpath bounding box was using left, right, top and bottom to calculate the minimum bounding box. this produced incorrect values now that individual path operator bounding boxes are rotated, since for a rotated rectangle top may be less than bottom.
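a minimal python sketch of the safer approach: take the minimum and maximum over every corner instead of trusting each shape's nominal top/bottom. the helper names and the corner-list representation are hypothetical:

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def rectangle_corners(llx: float, lly: float, urx: float, ury: float) -> List[Point]:
    # Arguments are taken in [llx, lly, urx, ury] order with no correction.
    return [(llx, lly), (urx, lly), (urx, ury), (llx, ury)]


def rotate(points: List[Point], degrees: float) -> List[Point]:
    """Rotate points counter-clockwise about the origin."""
    r = math.radians(degrees)
    c, s = math.cos(r), math.sin(r)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def bounding_box(shapes: Iterable[List[Point]]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (llx, lly, urx, ury) containing every corner."""
    points = [p for shape in shapes for p in shape]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # min/max over all corners stays correct after rotation, when a shape's
    # nominal 'top' edge may lie below its 'bottom' edge.
    return (min(xs), min(ys), max(xs), max(ys))
```

for example, rotating the rectangle (0, 0, 2, 1) by 180 degrees moves its former top edge to y = -1, below its former bottom edge at y = 0, yet the min/max box still comes out as (-2, -1, 0, 0) up to floating-point error.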
however, performance seems to have taken a hit due to these changes.