PdfPig

lsm/PdfPig

mirror of https://github.com/UglyToad/PdfPig.git synced 2025-10-15 19:54:52 +08:00

Author	SHA1	Message	Date
Eliot Jones	efc258b0f0	use tokenscanner when converting array to rectangle. an array of 4 items representing a rectangle may define its values as indirect references. when converting to a rectangle we pass a pdf token scanner to resolve any indirect references.	2020-01-13 10:20:08 +00:00
Eliot Jones	0b048fde57	handle eof further back in file. an %%eof for a pdf file may appear further back than the last 1024 bytes. this change doubles the search range. it also handles an empty differences array being defined for a font encoding. we also remove the old approach to dependency injection from the code since we are now favouring static classes where possible.	2020-01-07 11:48:09 +00:00
Eliot Jones	bbde38f656	move tokenizers to their own project. since both pdfs and Adobe Type1 fonts use postscript type objects, tokenization is needed by the main project and the fonts project	2020-01-05 10:40:44 +00:00
Eliot Jones	74774995d6	complete move of truetype, afm and standard14 fonts. the 3 font types mentioned are moved to the new fonts project, any referenced types are moved to the core project. most truetype classes are made public #8.	2020-01-04 22:39:13 +00:00
Eliot Jones	7c0ef111ea	move classes to new projects. to make the project more useful and expose more usable classes we're rearchitecting in the following way. code used to read fonts from external file formats like truetype, adobe font metrics (afm) and adobe type 1 fonts is moving to a new project which doesn't reference most of the pdf logic. the shared logic is moving to a new flat-structured project called core. this is a sort-of onion type architecture, with core being the... core, fonts being the next layer of the onion, pdfpig itself the next. this will then support additional libraries/projects as outer layers of the onion as well as releasing a standalone version of the font library as pdfbox does with fontbox.	2020-01-04 16:38:18 +00:00
Eliot Jones	b355a31ae8	write valid zlib stream for flate. since c# only produces a deflate stream when compressing, it is necessary to provide the header and footer bytes to convert this to a valid zlib stream. this involves setting the correct 2 bytes for the header and appending a 4 byte adler checksum for the uncompressed data after the compressed data stream.	2020-01-04 10:27:07 +00:00
Eliot Jones	336947db73	add writing methods to truetype tables #98. since we have verified the problem with the characters not appearing in acrobat reader isn't the checksum (other files also have invalid checksums but work), it seems likely the issue is with the os/2 table. this change moves the logic for writing out the cmap table, the format 6 cmap sub-table, truetype table headers and the os/2 table into the classes themselves. now that we can write an os/2 table and have tested that the output matches the input, we can overwrite the os/2 table in order to work out which of the os/2 errors is causing our font to be invalid. the writeable interface should be added to more and more parts of the codebase so that writing, editing and document creation become first class citizens rather than hardcoded additions. this change also adds the macroman (1,0) cmap subtable to edited fonts so that it is present for consumers which expect it.	2020-01-04 10:27:07 +00:00
Eliot Jones	935d182888	use doubles where calculations are being run	2019-12-24 12:22:17 +00:00
Eliot Jones	a967e0898a	handle missing width and height correctly for compact font format fonts #75	2019-12-04 14:19:06 +00:00
Eliot Jones	677d2b5e8f	#82 make resource store state local to the page and operation being processed. resources such as fonts are linked to page content operations using name labels, e.g. "/F1". these resource labels can be reassigned on different pages or inside form xobjects. we now clear the entire resource state for each page which is parsed and after form xobject operations which use resource dictionaries.	2019-11-25 14:34:02 +00:00
BobLd	99f260befb	Enhancing NearestNeighbourWordExtractor - Making the code easier to read - Using 20% of Width instead of 60% - Making DefaultWordExtractor public	2019-10-21 20:51:27 +01:00
Eliot Jones	68bcaf3901	#55 move support for images to page and add inline images. support both xobject and inline images. adds unsupported filters so that exceptions are only thrown when accessing the lazily evaluated image.bytes property rather than when opening the page. treat all warnings as errors.	2019-10-08 14:04:36 +01:00
Eliot Jones	e02e130947	#57 add creation and modified date to document information. this enables users to check if xmp metadata is outdated	2019-10-03 12:56:48 +01:00
Eliot Jones	d98b8b43c1	small performance tweaks and remove package license expression. package license url is deprecated in favour of package license expression but nuget doesn't seem to support expressions properly for published packages yet so we'll keep the deprecated url for the time being. having both url and expression causes the build to fail. small obvious performance improvements for file header parsing and getting the encoding information using the existing reverse name to code map.	2019-08-18 13:47:01 +01:00
Eliot Jones	3c49371c68	test hex to string implementation and remove unused method	2019-07-07 17:30:54 +01:00
Eliot Jones	f375cb6f04	keep letters in word when using default word extractor	2019-05-30 20:07:52 +01:00
BobLd	65647febcf	- Adding a TextDirection enum. - In the Letter class: - Renaming 'Location' to 'StartBaseLine' and adding 'EndBaseLine' for better localisation of the letter ('Location' is also kept). - Adding TextDirection.	2019-04-19 21:33:31 +01:00
Eliot Jones	575953c0ed	add multi targeting frameworks in the same project for net 4.5 through net 4.7 and net standard 2.0	2019-01-06 11:06:02 +00:00
Eliot Jones	21aa964e0b	#24 add different field types and code to read them	2019-01-02 22:28:50 +00:00
Eliot Jones	20e843f5ae	#24 start adding classes for the acroform api	2019-01-01 17:44:46 +00:00
Eliot Jones	a5349dd77a	start adding retrieval of annotations	2018-12-20 18:18:32 +00:00
Eliot Jones	a5ce43774b	revert change to public api of letter. update readme	2018-11-26 20:18:00 +00:00
Eliot Jones	fdd48b25d8	#15 change default word extraction for latex test	2018-11-25 10:10:28 +00:00
Eliot Jones	17909f8565	#15 add classes to extract words and initial tests	2018-11-24 20:51:27 +00:00
Eliot Jones	2fa781b8e9	#10 make all token classes public and expose via a public structure member on pdf document	2018-11-24 19:02:06 +00:00
Eliot Jones	2c159f71e8	#6 rename some cff classes, change protection levels and start fixing bugs with charstrings which include hints in routine calls	2018-11-18 16:32:28 +00:00
Eliot Jones	0f68dfeb19	#10 move tokens to the root namespace for discoverability. upgrade xunit versions. there is a bug with test discovery for stringtokenizertests	2018-11-16 20:00:12 +00:00
Eliot Jones	904f773525	add code for drawing type 1 glyphs and converting to svg	2018-11-13 20:45:54 +00:00
Eliot Jones	1d4dc7767d	change type1 commands to be static and lazily evaluated and return the command sequences from the parser	2018-11-01 19:34:22 +00:00
Eliot Jones	e24a306c31	remove all old parsing logic	2018-01-21 14:48:49 +00:00
Eliot Jones	da7d83d863	finish the migration	2018-01-20 20:20:40 +00:00
Eliot Jones	7d90f4858a	continue migrating code to tokenizer	2018-01-20 18:42:29 +00:00
Eliot Jones	a0deab446b	switch classes still using the cos object approach to the tokenization approach initially used for parsing cmap files.	2018-01-19 00:35:04 +00:00
Eliot Jones	ec62542b64	change the project name to something silly	2018-01-10 19:49:32 +00:00

34 Commits
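
Commit b355a31ae8 describes turning a raw deflate stream into a valid zlib stream by adding a 2 byte header and a 4 byte Adler checksum of the uncompressed data. The following is a minimal sketch of that idea in Python, not the project's C# code. The function names are hypothetical, and the header bytes 0x78 0x9C are an assumption: they are one common valid choice, and the commit message does not say which bytes PdfPig writes.

```python
import struct
import zlib


def deflate_raw(data: bytes) -> bytes:
    """Produce a raw deflate stream (no zlib header or trailer).

    A negative wbits value asks zlib for headerless output, which stands in
    here for a compressor that only emits plain deflate data.
    """
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def wrap_as_zlib(raw_deflate: bytes, uncompressed: bytes) -> bytes:
    """Wrap raw deflate bytes so that they form a zlib stream.

    Header (assumed values): CMF = 0x78 (deflate, 32 KiB window) and
    FLG = 0x9C, chosen so that (CMF * 256 + FLG) is a multiple of 31.
    Trailer: Adler-32 of the *uncompressed* data, big-endian.
    """
    header = bytes([0x78, 0x9C])
    checksum = struct.pack(">I", zlib.adler32(uncompressed) & 0xFFFFFFFF)
    return header + raw_deflate + checksum


if __name__ == "__main__":
    content = b"BT /F1 12 Tf (Hello) Tj ET\n" * 10
    stream = wrap_as_zlib(deflate_raw(content), content)
    # A standard zlib decoder should accept the wrapped stream.
    print(zlib.decompress(stream) == content)
```

The checksum is computed over the original bytes rather than the compressed ones, which is why the wrapper takes both inputs.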