Commit Graph

79 Commits

Author SHA1 Message Date
mvantzet
06253966e4 Added Letter properties RenderingMode, StrokeColor, FillColor and added those as mandatory
constructor arguments. Kept property Color, which contains either StrokeColor (if rendering mode is Stroke)
or FillColor (for all other rendering modes).
In PdfPageBuilder opted for default text rendering mode "Fill" which seems like a sensible default.
2023-01-13 12:35:25 +01:00
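
The Color rule described in the commit above can be illustrated with a minimal sketch. This is Python written for illustration only, not the library's actual code: the `Letter` dataclass, its field names, and the `RenderingMode` enum here are hypothetical stand-ins, and only a subset of the PDF text rendering modes is listed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

RGB = Tuple[float, float, float]


class RenderingMode(Enum):
    # Subset of the PDF text rendering modes (the specification defines 0 to 7).
    FILL = 0
    STROKE = 1
    FILL_THEN_STROKE = 2
    INVISIBLE = 3


@dataclass(frozen=True)
class Letter:
    # Hypothetical stand-in: all three properties are mandatory constructor arguments.
    value: str
    rendering_mode: RenderingMode
    stroke_color: RGB
    fill_color: RGB

    @property
    def color(self) -> RGB:
        # Stroke colour only when the mode is Stroke; fill colour for every other mode.
        if self.rendering_mode is RenderingMode.STROKE:
            return self.stroke_color
        return self.fill_color
```

A caller that only needs "the colour the glyph appears in" can keep reading `color`, while code that cares about the distinction can read `stroke_color` and `fill_color` directly.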
Eliot Jones
c8874c5984 #483 make skip missing fonts even more resilient to nonsense files 2022-12-11 16:18:09 -05:00
Eliot Jones
e2246a88bb #482 add skip missing fonts option and pass parsing options to content stream processor
this doesn't fix the reported issue, since the pdf itself is corrupted on page 8; however, it will
allow recovery in some scenarios where text content isn't important.

also adds a more informative error when a stream is unintentionally passed with a non-zero offset
2022-10-09 13:44:05 -04:00
Eliot Jones
eb0758f050 only combine when it forms part of the same byte sequence 2022-04-14 20:22:49 -04:00
Eliot Jones
b5b15ee593 add handling for combining diacritics 2022-04-14 20:14:09 -04:00
Eliot Jones
9ae0a5ec15 allow stream filters to contain indirect references to name tokens 2021-04-25 16:22:22 -04:00
BobLd
eb85f67b18 Remove CurrentGraphicsState GetCurrentState() from IOperationContext. 2020-11-23 11:11:07 +00:00
BobLd
d07baa97d5 Remove reference from CurrentSubpath and CurrentPath in IOperationContext and add MoveTo(), BezierCurveTo(), LineTo() and Rectangle(). 2020-11-23 10:49:50 +00:00
BobLd
cd9ac6ac6c - fix letter's PointSize computation by applying the transform to a rectangle of height fontSize
- add test with rotated letters
2020-10-12 12:59:02 +01:00
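
One way to read the PointSize fix above is sketched below, under stated assumptions: the matrix is given as the six PDF values (a, b, c, d, e, f), the rectangle's height is represented by its left edge from (0, 0) to (0, fontSize), and the point size is taken as the length of that edge after transformation. The function names are hypothetical and the library's actual computation may differ.

```python
import math
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def transform_point(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    # PDF convention: x' = a*x + c*y + e, y' = b*x + d*y + f
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def point_size(font_size: float, m: Matrix) -> float:
    # Transform both ends of the rectangle's vertical edge and measure it,
    # so the result is independent of rotation.
    x0, y0 = transform_point(m, 0.0, 0.0)
    x1, y1 = transform_point(m, 0.0, font_size)
    return math.hypot(x1 - x0, y1 - y0)
```

With a matrix that scales by 2 and rotates by 90 degrees, `(0, 2, -2, 0, 0, 0)`, a font size of 12 still yields 24, whereas reading only the `d` component would give zero.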
BobLd
8f0f7769a6 fix clipping error when trying to fill a single line; add log; set EvenOdd as default when initiating CurrentClippingPath; add tests 2020-09-22 10:47:34 +01:00
BobLd
33f92cd11c handle page rotation by updating initial TransformationMatrix 2020-06-02 16:12:24 +01:00
BobLd
6e773446df simplify double cast 2020-06-01 14:55:45 +01:00
BobLd
2d9a4e5adb fix CurrentTransformationMatrix multiplication order in ProcessFormXObject 2020-06-01 14:00:17 +01:00
Eliot Jones
09b951f667 expose font details on individual letters
also fixes a regression for image extraction
2020-04-25 17:15:26 +01:00
Eliot Jones
f18bc0766a #161 handle zero point size by using rotated matrix 2020-04-19 10:28:11 +01:00
Eliot Jones
25314cc79d #161 change rotation to fix values and page size
this doesn't account for images and pdf paths yet.
2020-04-18 18:04:41 +01:00
Eliot Jones
db442194c3 use a mutable struct 2020-04-18 12:10:17 +01:00
Eliot Jones
e382e581ba add merge test for document with object stream 2020-04-16 20:57:57 +01:00
BobLd
ec2dcdc9f4 Check if CurrentSubpath is null in CloseSubpath() 2020-04-05 17:58:57 +01:00
BobLd
b923a42f9e Check if CurrentSubpath is null before CloseSubpath 2020-04-05 17:58:57 +01:00
BobLd
20c4b9594b Rename PdfSubpath.ClosePath() to PdfSubpath.CloseSubpath() to avoid confusion 2020-04-05 17:58:57 +01:00
BobLd
04300eb12c Add PdfSubpath comment 2020-04-05 17:58:57 +01:00
BobLd
064fa4922a make Clipping internal
do not throw errors when CurrentPath is null
modify tests to match
2020-04-05 17:58:57 +01:00
BobLd
51165dc11a Implement EndPath
Make path clipping optional
2020-04-05 17:58:57 +01:00
BobLd
983cfcb2f6 Simplify path construction operators
fix 're' operator to reflect documentation
Update ContentStreamProcessor with fill, stroke and clip operations
Throw errors when currentPosition is null in PdfSubpath
2020-04-05 17:58:57 +01:00
BobLd
3ee9ac7915 Implement FillStrokePath() operator and filling rule. 2020-04-05 17:58:57 +01:00
BobLd
43b40da5d5 Change Subpath to path where necessary 2020-04-05 17:58:57 +01:00
BobLd
6677641b37 Create PdfPath
Rename ClippingRule to FillingRule
Move FillingRule from Subpath to Path
2020-04-05 17:58:57 +01:00
BobLd
ab6a0f11fc Change name from PdfPath to PdfSubpath 2020-04-05 17:58:57 +01:00
Eliot Jones
48d166276d remove islenientparsing from contentstreamprocessor 2020-02-28 11:44:13 +00:00
Eliot Jones
7b09999a3f remove islenientparsing from the font handlers
we're removing islenientparsing to make the code simpler to maintain and use as well as more resilient.
2020-02-28 11:37:18 +00:00
BobLd
0afaa19d15 Handle null CurrentPath 2020-02-24 11:20:56 +00:00
BobLd
1d095af974 Implement Modify Clipping operations 2020-02-24 11:20:56 +00:00
BobLd
588648d30b Fix #133 Marked content extraction issue 2020-02-10 11:23:19 +00:00
Eliot Jones
29061b1fd2 handle unexpected adobe type 1 format
an encoding array in an adobe type 1 font may be missing its declaration ending in 'for'; if we encounter 'dup' while looking for the 'for' token, we have a special case to go straight into reading the encoding.

also handles a case where the page content stream contains a path-closing operator without any path being active.
2020-01-28 16:05:53 +00:00
Eliot Jones
ba09a13d08 more end image recovery logic
since inline image data may contain the end image "ei" token inside the data stream there's no reliable way to actually determine if we've read all the data. for this reason if we end up with an invalid state parsing operations after we've read the end image token we try to recover by reading from the previous token to the next end image token if any. we supply log information to let the consumer know this is what we're doing. it's still not bullet-proof but it should be good enough.

also support negative page rotation values by adding them to a 360 degree rotation so -90 degrees clockwise is 270 degrees clockwise.
2020-01-25 15:53:08 +00:00
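
The negative rotation handling mentioned above amounts to a small normalisation step. A minimal sketch in Python follows; the function name and the multiple-of-90 check are assumptions for illustration. Python's `%` already returns a non-negative result for a positive modulus, which for values between -360 and 0 is the same as adding the value to 360.

```python
def normalize_rotation(degrees: int) -> int:
    # Assumption for this sketch: page rotation is a multiple of 90 degrees.
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    # Maps -90 to 270, -270 to 90, and leaves 0, 90, 180, 270 unchanged.
    return degrees % 360
```

In a language where the remainder takes the sign of the dividend, the explicit `360 + degrees` form for negative inputs would be needed instead.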
Eliot Jones
b4d917dcdc merge pull request #122 from uglytoad/marked-content
marked content
2020-01-10 17:07:21 +00:00
Eliot Jones
41cc7abd1b prevent negative point size for fonts 2020-01-10 14:40:28 +00:00
Eliot Jones
17b7cf2f61 load images eagerly for marked content
when a marked content region contains an image we load it eagerly since we won't have access to the necessary classes at evaluation time. we also default image colorspace to the active graphics state colorspace if the dictionary doesn't contain a valid entry.
2020-01-10 13:52:21 +00:00
Eliot Jones
d011f37316 merge master 2020-01-09 15:32:10 +00:00
Eliot Jones
43574097f1 rename marked content elements and use factory
since the properties in marked content may be indirect references or belong to the page resources array, the value should be calculated during content processing. this change tidies up the marked content classes so they do not expose mutable data and uses the pdf token scanner overloads to load dictionary data.
2020-01-09 15:30:16 +00:00
Eliot Jones
4976fa1027 handle incorrect end image detected
since an inline image's data stream may contain the characters 'ei' as a result of compression, it's possible to read an end image operator mid-data; this results in the next operator also being end image and the content stream being in an invalid state. to recover, when we detect this situation we remove the previous operator, read to the current operator and replace the operator and data bytes in the list of operations.
2020-01-08 12:17:30 +00:00
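
The recovery idea above can be sketched on raw bytes. This is a simplified illustration, not the library's implementation: it assumes an end-image candidate is `EI` surrounded by whitespace, and it treats a second `EI` with no `BI` (begin image) before it as evidence that the first one was part of the image data. The function name and the regular expressions are hypothetical.

```python
import re
from typing import Tuple

# Candidate end-image token: whitespace, "EI", then whitespace or end of stream.
_END_IMAGE = re.compile(rb"\sEI(?=\s|$)")
# A begin-image operator delimited by whitespace.
_BEGIN_IMAGE = re.compile(rb"\sBI\s")


def read_inline_image(stream: bytes, data_start: int) -> Tuple[bytes, int]:
    """Return (image data, index just past the chosen EI token)."""
    end = _END_IMAGE.search(stream, data_start)
    if end is None:
        raise ValueError("no end image token found")
    while True:
        following = _END_IMAGE.search(stream, end.end())
        if following is None:
            break
        # A BI between the two tokens means the next EI closes a different image.
        if _BEGIN_IMAGE.search(stream, end.end(), following.start()):
            break
        # Otherwise the earlier EI was inside the data: extend to the later one.
        end = following
    return stream[data_start:end.start()], end.end()
```

As the commit messages note, no approach of this kind is fully reliable, since the data may legitimately contain any byte sequence.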
BobLd
84bab1b627 Add basic marked content extraction capabilities 2020-01-08 10:34:01 +00:00
Eliot Jones
7c0ef111ea move classes to new projects
to make the project more useful and expose more usable classes we're rearchitecting in the following way. code used to read fonts from external file formats like truetype, adobe font metrics (afm) and adobe type 1 fonts is moving to a new project which doesn't reference most of the pdf logic. the shared logic is moving to a new flat-structured project called core. this is a sort-of onion type architecture, with core being the... core, fonts being the next layer of the onion, pdfpig itself the next. this will then support additional libraries/projects as outer layers of the onion as well as releasing a standalone version of the font library as pdfbox does with fontbox.
2020-01-04 16:38:18 +00:00
Eliot Jones
935d182888 use doubles where calculations are being run 2019-12-24 12:22:17 +00:00
Eliot Jones
c89928d976 remove inefficient approach to checking if content stream path has been added #99 2019-12-10 13:20:57 +00:00
Eliot Jones
e38da0a403 add support for alternative colorspace in separation colorspaces #89 2019-12-06 17:23:15 +00:00
Eliot Jones
677d2b5e8f #82 make resource store state local to the page and operation being processed
resources such as fonts are linked to page content operations using name labels, e.g. "/F1"; these resource labels can be reassigned on different pages or inside form xobjects. we now clear the entire resource state for each page which is parsed and after form xobject operations which use resource dictionaries.
2019-11-25 14:34:02 +00:00
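
The scoping problem described above can be illustrated with a small sketch. It is a hypothetical Python model, not the library's resource store: font resources are plain name-to-font mappings, each page starts from a clean state, and a form xobject's resources are pushed while it is processed and discarded afterwards. Using a stack to restore the page's mappings is an assumption of this sketch.

```python
from typing import Dict, List


class ResourceStore:
    """Hypothetical store mapping labels such as "F1" to font names."""

    def __init__(self) -> None:
        self._stack: List[Dict[str, str]] = [{}]

    def load_page(self, fonts: Dict[str, str]) -> None:
        # Discard everything from the previous page so stale labels cannot leak.
        self._stack = [dict(fonts)]

    def push_form(self, fonts: Dict[str, str]) -> None:
        # A form xobject may reassign labels; its entries shadow the page's.
        scoped = dict(self._stack[-1])
        scoped.update(fonts)
        self._stack.append(scoped)

    def pop_form(self) -> None:
        if len(self._stack) == 1:
            raise RuntimeError("no form xobject resources to discard")
        self._stack.pop()

    def get_font(self, label: str) -> str:
        return self._stack[-1][label]
```

With this shape, the same label can resolve to one font on the page, a different one inside a form xobject, and a third on the next page, without any of those assignments affecting the others.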
Eliot Jones
efe7896824 #75 support vertical writing mode fonts 2019-10-17 15:57:04 +01:00
Eliot Jones
3f1321141a #73 process xobject form content when extracting text and images 2019-10-16 14:59:16 +01:00