PdfPig

lsm/PdfPig

mirror of https://github.com/UglyToad/PdfPig.git synced 2025-07-18 09:13:03 +08:00

Author	SHA1	Message	Date
Fred Natzke	324de1da67	Fix 540: copy page with inline image.	2023-01-16 14:27:04 +10:00
Eliot Jones	57e9acbc12	post merge tidy up	2023-01-08 12:00:35 -05:00
mvantzet	6ef6c4d780	Added a PdfTextRemover utility that uses a NoTextTokenWriter to output PDFs without text contents. Also added unit tests to check whether a custom ITokenWriter can be used with PdfDocumentBuilder and whether removing text works.	2022-12-20 21:31:15 +01:00
Jacob O'Toole	715e73ace8	Add SetTextRenderingMode method to PdfPageBuilder	2022-02-15 11:46:58 +00:00
Michael Plaisted	79b26bb434	strip link annotations to avoid corrupt links to non-existent pages; fix issue with acrobat not accepting the same content stream on multiple pages	2021-08-22 18:06:09 -05:00
Michael Plaisted	73b7bb61bc	AddPage fixes for streams incorrectly being copied to page dict, add inheritance for media/crop box and rotation	2021-08-02 11:17:49 -05:00
Plaisted	6e1cf89cf9	clean up pagebuilder, switch merger to use pdfdocumentbuilder	2021-02-08 12:37:09 -06:00
Plaisted	c6ed29bda4	cleanup stream writing to only write multiple when needed	2021-02-07 10:37:31 -06:00
Plaisted	442fa8fb6d	create page tree for builder to help with large pdfs	2021-02-06 20:35:01 -06:00
Plaisted	1db481164c	perf improvement for copying lots of pages from large documents	2021-02-06 18:04:13 -06:00
Plaisted	92f9af613f	more build system c# version fixes	2021-02-06 15:31:58 -06:00
Plaisted	44ee6d394f	cleanup font usage, fix some build system issues with older c# version	2021-02-06 15:13:22 -06:00
Plaisted	7f42ad0af9	refactored previous work to fit pr #250	2021-02-06 12:24:53 -06:00
InusualZ	7126564eef	Allow copying pages from another document. This is a naive implementation: if you copy multiple pages from the same document, the recipient document will be bloated with duplicated resources.	2020-12-20 19:13:19 +00:00
InusualZ	ba5bc1f031	Allow multiple content streams in a page. You can create a new content stream (NewContentStreamAfter, NewContentStreamBefore) and select which one is currently being edited (SelectContentStream).	2020-12-19 10:41:36 +00:00
Eliot Jones	d8e0263ec7	#215 support filling rectangles on pdf builder	2020-11-17 17:00:13 -04:00
Eliot Jones	9b7554c973	#203 enable utf-16 be strings to be written to the document builder	2020-08-27 09:06:14 +01:00
Eliot Jones	52104b6580	support conversion of pdf format images to png	2020-08-21 13:12:01 +01:00
Eliot Jones	8860e29191	tidy up png support	2020-08-21 12:11:27 +01:00
Eliot Jones	5ac7a957d0	add initial png support	2020-08-21 10:50:17 +01:00
Eliot Jones	635ae13a77	add pdf/a2-a support	2020-04-16 20:50:21 +01:00
Eliot Jones	cf46230c05	#127 add pdf/a2-b compliance to the builder	2020-04-04 17:49:27 +01:00
Eliot Jones	7f1bf094bc	#127 pdf/a-1a compliance: adds struct tree and markinfo dictionaries to support pdf/a-1a compliance.	2020-03-29 17:55:02 +01:00
Eliot Jones	5f45ee53bd	#127 add basic pdf/a-1b level compliance to the document builder. adds color profiles/output intents and an xmp metadata stream to the document in order to be compliant with pdf/a-1b (basic). this compliance level is toggled on the builder, since it generates larger files, and is set to 'off/none' by default. pdf/a documents are also not able to use standard fonts, so using a standard font when the compliance level is not none will throw.	2020-03-29 16:43:52 +01:00
Eliot Jones	98bcc16e11	fix width and height order in jpeg parsing. height comes before width; the incorrect order caused adobe reader to draw the image strangely.	2020-03-16 19:32:57 +00:00
Eliot Jones	7212b9e38c	enable re-use of jpeg images between or within pages. calling addjpeg returns a reference to the added image object so that it can be shared between or within pages, meaning the image is written to the output file only once but can appear multiple times. this image doesn't seem to be displaying correctly in adobe reader.	2020-03-16 19:32:57 +00:00
Eliot Jones	19462d79f0	add support for jpeg images in pdf document builder. since jpegs can be trivially embedded in pdf documents without changes to the data stream, this is the first image format we will support. currently this is a naive approach which doesn't share image resources between pages. ideally we will either de-duplicate images when added, return a re-usable key once an image is added, or both.	2020-03-16 19:32:57 +00:00
Eliot Jones	24c5cbea4b	support custom page sizes for document builder #147. page size custom is not supported for the document builder, so a new overload which supports user-defined page sizes is provided.	2020-03-07 16:48:19 +00:00
Eliot Jones	a6541f1cfc	fix test references. update references for unit tests to reference the new core and fonts projects. all tests except the public api scanner tests now run successfully.	2020-01-04 22:56:41 +00:00
Eliot Jones	90f8f97bfd	add simple test case for subsetting issue #98. adds a single test which proves that the invalid truetype subsetting with roboto is related to our font subsetting code; since we can subset the same text correctly with windows calibri, we must be reading roboto incorrectly.	2020-01-04 10:27:07 +00:00
Eliot Jones	fe315be2ef	fix truetype subsetting for composite glyphs #98. each glyph included in the subset must count towards the number of glyphs, the horizontal metrics and the maximum profile table for the output truetype font. each glyph must also lie on a 4-byte boundary in the output file. the output file is valid for the windows system font calibri containing accented characters, but the roboto subset files are still invalid. moves all subsetting-related classes into their own namespace, which will be made public.	2020-01-04 10:27:07 +00:00
Eliot Jones	336947db73	add writing methods to truetype tables #98. since we have verified that the problem with characters not appearing in acrobat reader isn't the checksum (other files also have invalid checksums but work), it seems likely the issue is with the os/2 table. this change moves the logic for writing out the cmap table, the format 6 cmap sub-table, truetype table headers and the os/2 table into the classes themselves. now that we can write an os/2 table and have tested that the output matches the input, we can overwrite the os/2 table in order to work out which of the os/2 errors is causing our font to be invalid. the writeable interface should be added to more and more parts of the codebase so that writing, editing and document creation become first-class citizens rather than hardcoded additions. this change also adds the macroman (1,0) cmap subtable to edited fonts so that it is present for consumers which expect it.	2020-01-04 10:27:07 +00:00
Eliot Jones	9fff879bd4	fix tests by using custom equality comparers. since we now round glyph widths for truetype fonts in the widths array of the pdf, some values differ by a very small amount from the expected value. since we don't care about such fractional inaccuracy, we use a custom comparer for these tests.	2020-01-04 10:27:07 +00:00
Eliot Jones	f319e7f4b5	adds per-character byte mapping to truetype #98. this starts to add logic for per-character mapping of unicode characters to byte values for truetype fonts in the pdf document builder. in order to support unicode characters outside the 0-255 range when creating new pdf documents without using composite fonts, we need to map values outside this range into this range. to do this we start at 1 and map each character we encounter to the next code, up to a maximum of 255. we provide a custom tounicode cmap in the font dictionary which maps these byte values, 0-255, back to unicode code points (short). we also provide a custom firstchar, lastchar and widths array for the font, mapping just the values we use. since fonts no longer contain just the latin character set, the font descriptor flags are set to include the symbolic flag. this means values will be looked up in either the mac-roman (1, 0) or windows-symbol (3, 0) cmap tables (these cmap tables are distinct from cmap tables in the pdf file) inside the actual truetype font bytes. this means the currently generated font file is invalid, because while the widths array and tounicode cmap return the correct values, the actual font itself returns whatever values were in those positions before the remapping occurred. in order to fix this we will need to override the windows-symbol cmap contained in the underlying truetype font to match our mapping. this will be a lot of work and involve significant rewriting of the font file itself, in order to preserve checksum integrity.	2020-01-04 10:27:07 +00:00
Eliot Jones	87528199c6	use byte values when showing text for document builder #98. when writing text content, the show text operator was just writing the unicode string value and hoping it produced the correct value in the resulting document, despite the values being consumed in a different encoding. this change adds a method to retrieve the corresponding byte value for a unicode character and uses it to write a hex show text operator to the page content. this is only implemented for standard14 fonts in this change. for standard14 fonts we look up the corresponding name for the unicode value in the adobe glyph list. once we find the corresponding glyph name, we look up the code value in the encoding we have chosen when writing standard14 fonts (macromanencoding). this value is then the byte value written to the show text operator. if the value does not appear in any of the lookups, we throw a not supported exception. this also adds a test case, which still fails, for czech characters in a truetype font (the issue reported in #98).	2019-12-28 14:42:27 +00:00
Eliot Jones	935d182888	use doubles where calculations are being run	2019-12-24 12:22:17 +00:00
Eliot Jones	abd9212862	fix document creation behaviour for multiple pages	2018-12-30 14:39:49 +00:00
Eliot Jones	4d5518a599	move annotations to experimental access, support changing color state for document creation and update readme	2018-12-30 14:12:04 +00:00
Eliot Jones	d9052e1388	update readme and document public api for document creation	2018-12-28 16:55:46 +00:00
Eliot Jones	9abf6e226c	#21 use win ansi encoding	2018-12-18 18:30:51 +00:00
Eliot Jones	39786ac00a	#21 add a test for accented characters and fill in more writing methods for content stream operators. the output is currently incorrect for accented characters	2018-12-14 18:33:01 +00:00
Eliot Jones	924fc7b37f	#21 support writing lines, curves and rectangles. add documentinformation to output. rename characterpath	2018-12-12 00:09:15 +00:00
Eliot Jones	29f9885fc4	#21 fix widths for system font baskerville old face	2018-12-11 21:29:39 +00:00
Eliot Jones	d1722dd23e	#21 use built in encoding to populate widths array	2018-12-11 20:03:18 +00:00
Eliot Jones	dc5d2b8fdd	#21 further changes to truetype to get accurate information out for creating documents	2018-12-08 18:04:02 +00:00
Eliot Jones	3a4b7b79d1	#21 change dictionarytoken to use explicit key type, finish os/2 table for truetype, first file creation using embedded truetype font	2018-12-08 14:38:27 +00:00
Eliot Jones	d6a896dcb0	#21 enable document creation using standard 14 font to test output	2018-12-02 16:14:55 +00:00
Eliot Jones	06bee446d8	#21 create first actual pdf document based on minimal example. writer for tokens. bump language version	2018-11-28 21:54:06 +00:00
Eliot Jones	eecb871ed1	#21 quick draft of minimal writing logic requirements	2018-11-27 20:00:38 +00:00
Eliot Jones	d5a50f2236	#8 tidy up truetype font internally. some more work on a potential document creation api	2018-11-25 13:56:27 +00:00

50 Commits
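Commit f319e7f4b5 describes how characters get byte codes. Codes start at 1 and each newly encountered character takes the next one, up to 255. The inverse mapping feeds a custom ToUnicode CMap, and FirstChar/LastChar cover only the codes in use. The Python sketch below shows only that bookkeeping. PdfPig itself is C#, and the class and method names here (ByteCodeMapper, code_for, to_unicode, first_last_char) are hypothetical illustrations, not PdfPig's API. The sketch does not compute widths or rewrite the TrueType cmap, which the commit notes would still be needed for a valid font.

```python
from __future__ import annotations


class ByteCodeMapper:
    """Hypothetical helper (not PdfPig's API): assigns single-byte codes
    1..255 to characters in the order they are first encountered."""

    MAX_CODE = 255

    def __init__(self) -> None:
        self._char_to_code: dict[str, int] = {}
        self._next_code = 1  # codes start at 1, as the commit message describes

    def code_for(self, ch: str) -> int:
        """Return the code already assigned to ch, or assign the next free one."""
        code = self._char_to_code.get(ch)
        if code is not None:
            return code
        if self._next_code > self.MAX_CODE:
            raise ValueError(f"no single-byte code left for {ch!r}")
        code = self._next_code
        self._char_to_code[ch] = code
        self._next_code += 1
        return code

    def encode(self, text: str) -> bytes:
        """Bytes that would be written by a show text operator for this font."""
        return bytes(self.code_for(ch) for ch in text)

    def to_unicode(self) -> dict[int, int]:
        """Inverse mapping (byte code -> Unicode code point) for a ToUnicode CMap."""
        return {code: ord(ch) for ch, code in self._char_to_code.items()}

    def first_last_char(self) -> tuple[int, int]:
        """FirstChar/LastChar for the font dictionary; needs at least one code."""
        codes = self._char_to_code.values()
        return min(codes), max(codes)
```

For example, with a fresh mapper, encoding "Čeč" yields b'\x01\x02\x03'. to_unicode() then maps codes 1, 2 and 3 back to U+010C, U+0065 and U+010D, so the Czech characters from #98 fit in one byte each without a composite font.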