I decided to move to his own method, the part that create the page node. This allowed me to visualize better, from where I was suppose to get the correct reference.
lenient parsing gives us more code to maintain for no real benefit, parsing should always be as lenient as possible. remove the flag from some of the font code.
when the close parenthesis is unbalanced and precedes a line break followed by '/' or '>' we assume the bracket to be unbalanced and finish reading the string.
each time we want to up the version number of the nuget package it involves opening every csproj and manually updating the version. this script updates the version for all projects, except the test project, in the 'src' folder.
we add a try/catch to the direct object finder's tryget method so it returns false rather than throwing.
if we have an acro-form reference in the catalog but no corresponding object in the document we instead scan all objects in the document to find form fields and reconstruct the acro-form dictionary.
previously we checked the offset was not inside the table (correct thing to check), however this is only a special case of the more general issue (cross reference offsets are wrong). we move handling for this into the pdf token scanner. if we attempt to read an object at an offset and it fails we brute force the entire file to find correct offsets. we also needed to add handling to make sure we don't attempt to use stream length tokens if we're brute-forcing since we can't look up indirect references for length.
when brute force searching for the start of the cross-reference table (xref) we read 5 byte buffers, previously if the 'x' of 'xref' was the first character of the buffer we skipped it. this checks when 'x' is the first character of the buffer.
fixes the recorded offset when an endimage is recovered from the first time. it was off by one so if the subsequent end image was also the wrong tag then the second attempt at recovery failed.
also allows recovery when other tags appear after an endimage as long as they're not block ending operations (end text, perhaps pop/push in future).
when the character is not defined and the corresponding '.notdef' glyph isn't included in a truetype font we now default to a zero width character. it might be that we should use the default/missing width instead but this will work ok for most use-cases.
the individual cff parser uses a cff dictionary reader inside it which has a per-instance operands list, for this reason it is not thread-safe and cannot be shared. this change creates a new individual font parser for each call to the top-level cff parser.
where the value is 28 the next two bytes indicate a short, not a 16 bit two's complement number, apparently, or i've misunderstood what the two's complement bit is about...
the byte value of 28 indicates the next 2 bytes are a 16 bit two's complement number rather than just a short. this changes the calculation to generate the two's complement value correctly.
the brute force searcher offsets were off by one. this change means the offset returned is now aligned with the object number in the object number/generation/operator triple.