Commit Graph

1317 Commits

Author SHA1 Message Date
mvantzet
9ff095c516 Fix typo 2023-03-13 18:15:33 +01:00
mvantzet
ea77156eb8 Changes for annotation positions:
- Pass in the initial matrix to the annotation provider, so that it can return
  the correct rectangles / quad points.
- Made changes / extensions to the Annotation class:
  - ModifiedDate is now a DateTimeOffset instead of unparsed string.
    If the string is invalid, ModifiedDate is set to the default value.
  - Added lookup for the "appearance streams"; all the annotations should have
    a "N" (normal) appearance, and optionally have a "R" (roll-over/hover)
    and "D" (down/click) appearance. Did not expose the actual stream objects,
    but added a flag indicating the existence of "R" / "D". At some point
    we can consider doing something with the appearances.
- Changed signature of GetInitialMatrix / ContentStreamProcessor constructor
  from PdfRectangle back to what it was earlier, namely MediaBox and CropBox,
  to prevent accidentally mixing the two up in the caller.
2023-03-13 18:15:24 +01:00
mvantzet
a439b43246 Added integration test for cropped document, and a cropped+rotated document
with an annotation as well.
Added annotations to visual verification test (blue outlines).
2023-03-13 18:08:20 +01:00
mvantzet
17681472cc Also apply optimizations (set sin/cos to integers) for e.g. -270 degrees. 2023-03-13 17:50:58 +01:00
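
Note on the optimization above: a minimal Python sketch of snapping sin/cos to exact integers for any multiple of 90 degrees, including negative angles such as -270. The function name `rotation_sin_cos` is hypothetical and this is not PdfPig's actual code; it only illustrates normalizing the angle before the lookup.

```python
import math


def rotation_sin_cos(degrees):
    """Return (sin, cos) for a rotation, exact for multiples of 90 degrees."""
    # Python's % always yields a non-negative result, so -270 becomes 90.
    normalized = degrees % 360
    exact = {0: (0, 1), 90: (1, 0), 180: (0, -1), 270: (-1, 0)}
    if normalized in exact:
        return exact[normalized]
    # Any other angle falls back to floating point trigonometry.
    radians = math.radians(degrees)
    return (math.sin(radians), math.cos(radians))
```

Without the normalization step, -270 would miss the lookup and return a cosine that is close to, but not exactly, zero.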
mvantzet
0413f3f1bf Fix related to page sizes / rotation / coordinate transformations (issue 560):
The initial transformation matrix was incorrect, as it translated by the cropbox width/height
instead of by the cropbox left/bottom offsets. Also, it did not translate the results back into
the 1st quadrant so that (0,0) would (again) be the lower left corner origin for the cropped area.
Added unit tests in new file ContentStreamProcessorTests.

EFFECTIVE CHANGES:

- The coordinates used for letters etc. are different now for rotated and/or cropped pages,
  but as those were not very consistent anyway this is probably OK.

- The Page Size (A4, A3, Custom, etc.), Width and Height are now determined by the CropBox,
  not by the MediaBox; the CropBox ultimately determines what you see on screen and is printable.
  If no cropbox is defined in the PDF, it is set to the MediaBox; so in that case it is
  backwards compatible with the old code.

- The Page MediaBox and CropBox properties are no longer rotated according to Page.Rotation.
  The Page Width and Height do take rotation into account (kept it backward compatible).
2023-03-09 16:42:09 +01:00
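
Note on the fix above: a rough Python sketch of the coordinate mapping the message describes, under the assumptions that the rotation is a multiple of 90 degrees and is applied clockwise. The function `to_cropped_page_space` and the tuple layout of the crop box are hypothetical; PdfPig builds a transformation matrix rather than mapping points one by one, so treat this as an illustration of the three steps only.

```python
def to_cropped_page_space(x, y, crop_box, rotation):
    """Map a point from PDF user space into the cropped, rotated page.

    crop_box is (left, bottom, right, top); rotation is in degrees clockwise.
    """
    left, bottom, right, top = crop_box
    width, height = right - left, top - bottom

    # Step 1: translate by the crop box left/bottom offsets (not its width/height).
    x, y = x - left, y - bottom

    # Steps 2 and 3: rotate clockwise, then translate back into the first
    # quadrant so that (0, 0) is again the lower left corner of the cropped area.
    normalized = rotation % 360
    if normalized == 0:
        return (x, y)
    if normalized == 90:
        return (y, width - x)
    if normalized == 180:
        return (width - x, height - y)
    if normalized == 270:
        return (height - y, x)
    raise ValueError("rotation must be a multiple of 90 degrees")
```

For a crop box of (10, 20, 110, 220), the lower left corner (10, 20) maps to (0, 0) when the page is not rotated, and to the upper left corner (0, 100) of the rotated page for a 90 degree rotation.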
mvantzet
3a0a6e1411 Resolving page sizes did not work when the page orientation was landscape, or when
side lengths were not integer or off by one. Added unit tests.
2023-03-09 16:09:14 +01:00
Eliot Jones
999f9ee7dc Merge pull request #551 from fnatzke/Issue549
Fix for Issue#549. Skip over Jpg segments rather than use brute force…
2023-02-18 10:50:44 +00:00
Eliot Jones
761bce8591 Merge pull request #539 from mvantzet/ExtendLetterProperties
Added Letter properties RenderingMode, StrokeColor, FillColor and add…
2023-02-18 10:48:47 +00:00
Fred Natzke
3a848c090c Fix for Issue#549. Skip over Jpg segments rather than use brute force to find segment start marker 2023-02-16 15:55:42 +10:00
Eliot Jones
88aaddcf26 Merge pull request #541 from fnatzke/Fix540
Fix 540 Copy page with inline image.
2023-01-16 15:25:35 -05:00
Fred Natzke
324de1da67 Fix 540 Copy page with inline image. 2023-01-16 14:27:04 +10:00
mvantzet
2acca32987 Added integration test to see if we can detect the presence of invisible text (text rendering mode = Neither),
visible text, the presence of images and the presence of paths.
Certain combinations thereof potentially must be run through OCR.
2023-01-13 14:11:13 +01:00
mvantzet
06253966e4 Added Letter properties RenderingMode, StrokeColor, FillColor and added those as mandatory
constructor arguments. Kept property Color, which contains either StrokeColor (if rendering mode is Stroke)
or FillColor (for all other rendering modes).
In PdfPageBuilder opted for default text rendering mode "Fill" which seems like a sensible default.
2023-01-13 12:35:25 +01:00
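
Note on the Color property above: a small Python sketch of the selection rule, assuming a simplified Letter with only the fields needed here. The class layout, the enum members other than Fill, Stroke and Neither, and the tuple representation of colors are hypothetical and do not mirror the real PdfPig types.

```python
from dataclasses import dataclass
from enum import Enum


class RenderingMode(Enum):
    FILL = "Fill"
    STROKE = "Stroke"
    NEITHER = "Neither"


@dataclass(frozen=True)
class Letter:
    value: str
    rendering_mode: RenderingMode
    stroke_color: tuple
    fill_color: tuple

    @property
    def color(self):
        # StrokeColor only when the rendering mode is Stroke;
        # FillColor for every other rendering mode.
        if self.rendering_mode is RenderingMode.STROKE:
            return self.stroke_color
        return self.fill_color
```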
Eliot Jones
65bc754f5b remove ci unsupported syntax 2023-01-08 15:38:05 -05:00
Eliot Jones
d5b196ff44 remove unsupported syntax 2023-01-08 14:18:09 -05:00
Eliot Jones
d2944e14e5 change alpha version for nightly builds 2023-01-08 12:01:48 -05:00
Eliot Jones
57e9acbc12 post merge tidy up 2023-01-08 12:00:35 -05:00
Eliot Jones
7b891edb69 Merge pull request #526 from fnatzke/Issue455-Issue_extracting_unicode_from_CJK_file
Fix #455 extracting unicode from CJK file
2023-01-08 11:53:32 -05:00
Eliot Jones
37e31c40ae Merge pull request #522 from fnatzke/master
Fix #514 Print Character with ZapfDingbats font
2023-01-08 11:52:15 -05:00
Eliot Jones
982f36647a Merge pull request #525 from mvantzet/ITokenWriter
Make TokenWriter non-static, implement ITokenWriter, injection in PdfDocumentBuilder, add PdfTextRemover
2023-01-08 11:49:56 -05:00
Fred Natzke
210c0dde50 Issue451_Type2CharStrings parsing/interpretation error 2022-12-26 17:08:03 +10:00
Fred Natzke
8b32a4d958 Type0Font better description of conversion to unicode. 2022-12-24 10:53:55 +10:00
Fred Natzke
4ba2a29aa0 Fix #455 extracting unicode from CJK file 2022-12-23 16:53:37 +10:00
mvantzet
94c62e1b65 Forgot to commit this updated test, to allow ITokenWriter and PdfTextRemover
to be public.
2022-12-21 09:35:29 +01:00
mvantzet
371e148c63 Remove unreachable code 2022-12-20 21:31:35 +01:00
mvantzet
6ef6c4d780 Added a PdfTextRemover utility that uses a NoTextTokenWriter, to output PDFs without text contents.
Also added unit tests to test:
- If we can use a custom ITokenWriter with PdfDocumentBuilder
- If removing text works.
2022-12-20 21:31:15 +01:00
mvantzet
9273a43965 Merge branch 'master' of https://github.com/mvantzet/PdfPig into ITokenWriter 2022-12-20 11:08:13 +01:00
mvantzet
6125c00089 Make it possible to inject a custom ITokenWriter in PdfDocumentBuilder. 2022-12-20 10:50:41 +01:00
Fred Natzke
2bcac59917 Minor cleanup of some texts. 2022-12-20 14:22:37 +10:00
Fred Natzke
7b441a3b0a Standard14WritingFontTests Add tests of invalid characters for each standard font 2022-12-20 14:17:02 +10:00
Fred Natzke
8f13e2f11c Fix #375 Add checkmark from ZapfDingbats using unicode character. Letters extracted have unicode values. 2022-12-17 17:39:15 +10:00
mvantzet
3594231c67 Introduce ITokenWriter / non-static TokenWriter class. This is the first step
in making it possible to override methods in the token writer, for example to filter
streams when writing using PdfDocumentBuilder.

The second step is injecting ITokenWriter into PdfDocumentBuilder.
2022-12-15 18:01:10 +01:00
Fred Natzke
620fa9b8cc Fix #514 Print Character with ZapfDingbats font 2022-12-13 14:22:30 +10:00
Eliot Jones
c8874c5984 #483 make skip missing fonts even more resilient to nonsense files v0.1.7 2022-12-11 16:18:09 -05:00
Eliot Jones
2aed996319 Merge pull request #517 from fnatzke/master
Fixes for Issue#512, 516 and 519
2022-12-09 09:39:55 -05:00
Eliot Jones
060c7bc728 Merge pull request #521 from eliotjones-roger/rotation-support-for-page-builder
add ability to rotate page by number of degrees, make builder fluent
2022-12-09 09:08:27 -05:00
Eliot Jones
6764d81958 Merge pull request #520 from mjolivet-lucca/master
Adding non regression unit test to cover PR 473
2022-12-09 08:56:22 -05:00
Eliot Jones
95df15996b add ability to rotate page by number of degrees, make builder fluent 2022-12-09 08:44:56 -05:00
Mathieu jolivet
998e768bba [ADD] adding non regression unit test to cover PR 473 2022-12-08 14:38:19 +01:00
Fred Natzke
f5fe39b285 Issue 512 revisited. Use of hashset to avoid reprocessing same token in infinite loop. 2022-12-07 17:27:34 +10:00
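
Note on the hashset approach above: a minimal Python sketch of guarding a tree walk with a visited set so that a kid list pointing back at an already processed node cannot loop forever. The dictionary layout (node id mapped to a list of kid ids, or None for a leaf page) and the name `collect_pages` are assumptions for illustration, not PdfPig's data model.

```python
def collect_pages(tree, root_id):
    """Return leaf page ids in document order, skipping repeated nodes."""
    pages = []
    visited = set()
    stack = [root_id]
    while stack:
        node_id = stack.pop()
        if node_id in visited:
            # Already processed, e.g. a kid list that contains its own parent.
            continue
        visited.add(node_id)
        kids = tree[node_id]
        if kids is None:
            pages.append(node_id)
        else:
            # Push in reverse so the first kid is processed first.
            stack.extend(reversed(kids))
    return pages
```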
Fred Natzke
9ef07b0176 Fix Issue 519 Break infinite loop parsing kid token list containing parent 2022-12-06 16:29:09 +10:00
Fred Natzke
29adece983 Original #516 fix causes tests to fail. Alternative: use pagesByNumber?.Count in Catalog for discovered pages. Some other unrelated tests failed because the source used single-character rather than two-character newlines; changed them to test the string itself for '\r' rather than whether the environment is Unix. 2022-11-30 11:10:52 +10:00
Fred Natzke
a7f64ec64b Fix Issue 516. Page Dictionary Count field has incorrect page count. Compare with PageTree children count. 2022-11-30 09:25:23 +10:00
Fred Natzke
afe473e10e Fix for Issue#512: Unable to open PDF. BruteForceSearcher::GetLastEndOfFileMarker() minimumEndOffset out by 1. 2022-11-29 17:31:23 +10:00
Eliot Jones
9c9c7c99ea ci new namespace file scope unsupported 2022-10-09 16:01:20 -04:00
Eliot Jones
e2246a88bb #482 add skip missing fonts option and pass parsing options to content stream processor
this doesn't fix the reported issue since the pdf itself is corrupted on page 8 however it will
allow recovery in some scenarios where text content isn't important.

also adds more informative error when stream unintentionally passed with non zero offset
2022-10-09 13:44:05 -04:00
Eliot Jones
c643facee0 #481 skip resource entry if null token 2022-10-09 13:06:04 -04:00
Eliot Jones
2f9a9ace9a Merge pull request #473 from grinay/master
Fix page number order.
2022-08-13 16:02:44 -04:00
grinay
19962af011 Fix page number order.
When the root node references a page directly, the page order was incorrect.
For example, if the root node has the references
[2 0 R 3 0 R 10 0 R]
where 2 0 R is an intermediate node containing pages 1, 2, 3,
3 0 R is an intermediate node containing pages 4, 5, 6,
and 10 0 R is page 7,
then without this fix page 7 ends up in the array as page 1.
2022-07-29 18:27:56 +08:00
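
Note on the page order fix above: a short Python sketch of numbering pages by a depth-first walk that follows the kid order, so a page referenced directly by the root is numbered after the pages inside the intermediate nodes that precede it. The dictionary layout, the function name `number_pages`, and the object numbers of the leaf pages under the intermediate nodes are hypothetical; only the root references come from the commit message.

```python
def number_pages(nodes, root_ref):
    """Return {page_number: reference}, numbering pages in kid order."""
    ordered = []

    def visit(ref):
        node = nodes[ref]
        if node["type"] == "Page":
            ordered.append(ref)
        else:
            # Intermediate node: descend into each kid in the listed order.
            for kid in node["kids"]:
                visit(kid)

    visit(root_ref)
    return {number: ref for number, ref in enumerate(ordered, start=1)}


nodes = {
    "1 0 R": {"type": "Pages", "kids": ["2 0 R", "3 0 R", "10 0 R"]},
    "2 0 R": {"type": "Pages", "kids": ["4 0 R", "5 0 R", "6 0 R"]},
    "3 0 R": {"type": "Pages", "kids": ["7 0 R", "8 0 R", "9 0 R"]},
    "10 0 R": {"type": "Page"},
}
for leaf in ["4 0 R", "5 0 R", "6 0 R", "7 0 R", "8 0 R", "9 0 R"]:
    nodes[leaf] = {"type": "Page"}

page_numbers = number_pages(nodes, "1 0 R")
```

Here `page_numbers[7]` is "10 0 R", matching the expected order in the example from the commit message.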
Eliot Jones
545d1a0793 Merge branch 'master' of github.com:UglyToad/PdfPig 2022-07-02 18:09:22 -04:00