This is a naive implementation, because if you copy multiple pages from the same document, the recipient document would be bloated with duplicated resources
You can create new content stream (NewContentStreamAfter, NewContentStreamBefore) and select (SelectContentStream) which one we are editing at the moment
returns a reference to the added image object when calling addjpeg so that it can be shared between or within pages meaning the image is only written to the output file once but can appear multiple times.
this image doesn't seem to be displaying correctly in adobe reader.
since jpegs can be trivially embedded in pdf documents without changes to the data stream this is the first image format we will support. currently this is a naive approach which doesn't share an image resources between pages. ideally we will either de-duplicated images when added, return a re-usable key once an image is added, or both.
when writing a new pdf document we now use the flate filter to compress the page content stream. we also move letters in the same word into the same showtext operation.
this starts to add logic for per-character mapping of unicode characters to byte values for truetype fonts in the pdf document builder. in order to support unicode characters outside the 0-255 range when creating new pdf documents without using composite fonts, we need to map values outside these range into this range. to do this we start at 1 and map each character we encounter to the next code, up to a maximum of 255. we provide a custom tounicode cmap in the font dictionary which maps these byte values, 0-255, back to unicode code points (short).
we also provide a custom firstchar, lastchar and widths array for the font mapping just the values we use.
since fonts no longer contain just the latin character set the font descriptor enum is set to have the symbolic flag set. this means values will be looked up in either the mac-roman (1, 0) or windows-symbol (3, 0) cmap tables (these cmap tables are distinct from cmap tables in the pdf file) inside the actual truetype font bytes. this means the currently generated font file is invalid, because while the widths array and tounicode cmap return the correct values the actual font itself returns whatever values where in those positions before the remapping occurred.
in order to fix this we will need to override the windows-symbol cmap contained in the underlying truetype font to match our mapping. this will be a lot of work and involve significant rewriting of the font file itself, in order to preserve checksum integrity.
when writing text content the current show text operator was just writing the unicode string value and hoping it produced the correct value in the resulting document despite the values being consumed in a different encoding. this change adds a method to retrieve the corresponding byte value for a unicode character and uses that to write a hex show text operator to the page content. this is only implemented for standard14 fonts in this change.
for standard14 fonts we look up the corresponding name for the unicode value from the adobe glyph list. once we find the corresponding glyph name we look up the code value in the encoding we have chosen when writing standard14 fonts (macromanencoding). this value is then the byte value written to the show text operator. if the value does not appear in any of the lookups we throw a not support exception.
this also adds a test case which will still fail for czech characters in a truetype font, the issue reported in #98.
- In the Letter class:
- Renaming 'Location' to 'StartBaseLine' and adding 'EndBaseLine' for better localisation of the letter ('Location' is also kept).
- Adding TextDirection.