882 Commits

Author SHA1 Message Date
BobLd
2c8c6cda87 add GeometryExtensions tests 2020-02-23 11:23:27 +00:00
Eliot Jones
c6dc4d9eb8 handle tokenizing invalid numeric string correctly
rather than throwing when an invalid numeric string is read, our tokenizer now returns false so that error recovery methods can be attempted.
2020-02-21 11:16:31 +00:00
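The behaviour described in c6dc4d9eb8 can be illustrated with a minimal Python sketch. The function name `try_read_number` and its tuple return value are hypothetical and are not taken from the library; the sketch only shows a reader that reports failure instead of raising, so that a caller can attempt error recovery.

```python
from typing import Optional, Tuple


def try_read_number(text: str) -> Tuple[bool, Optional[float]]:
    """Hypothetical helper: report failure instead of raising."""
    try:
        return True, float(text)
    except ValueError:
        # Invalid numeric string: signal failure so the caller can try recovery.
        return False, None
```

A caller checks the first element of the result and falls back to its own recovery logic when it is `False`.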
Eliot Jones
8d415fd162 use type 1 font handler for mmtype1
multiple master fonts are an extension of the adobe type 1 font format. we don't have any special case handling for them so for now we default to attempting to use the adobe type 1 font handler. it may be that we need some special parsing logic but the test file using the mmtype1 fonts didn't actually specify any font bytes so we can't check.
2020-02-21 10:49:29 +00:00
Eliot Jones
28faf1c22c default to .notdef for type 2 charstrings
if the glyph with a specific name isn't found in the set of type 2 charstrings we default to using the .notdef glyph if present.
2020-02-21 10:37:58 +00:00
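The fallback described in 28faf1c22c amounts to a two-step lookup. The following Python sketch assumes the charstrings are held in a plain dictionary keyed by glyph name; `find_charstring` and that data layout are illustrative assumptions, not the library's API.

```python
from typing import Dict, Optional

NOTDEF = ".notdef"


def find_charstring(charstrings: Dict[str, bytes], name: str) -> Optional[bytes]:
    """Hypothetical lookup: use the named glyph, else .notdef if present."""
    if name in charstrings:
        return charstrings[name]
    # Named glyph is missing: fall back to .notdef, or None when that is absent too.
    return charstrings.get(NOTDEF)
```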
Eliot Jones
c635b919d2 make numbers culture invariant in document builder
decimal numbers in the output file were dependent on the current thread culture. this meant values like '70.679' were output as '70,679' for cultures using a comma rather than a period as the decimal separator (i.e. the whole world). this resulted in the file displaying incorrectly.
2020-02-20 13:06:12 +00:00
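The idea behind c635b919d2 is that numbers written to a file must use one fixed format regardless of the user's regional settings. As a rough Python analogue (the function name `format_number` and the six-decimal precision are assumptions made for illustration), the `f` presentation type always emits a period, whereas only the `n` type consults the locale:

```python
def format_number(value: float) -> str:
    """Hypothetical helper: format a number with '.' as the decimal separator."""
    # The "f" presentation type ignores the process locale (only "n" is locale-aware).
    text = format(value, ".6f")
    # Trim trailing zeros and a dangling separator, e.g. "70.679000" -> "70.679".
    return text.rstrip("0").rstrip(".")
```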
BobLd
848d687910 Add kd tree and improve clustering 2020-02-19 11:12:54 +00:00
Eliot Jones
ed894baffc Merge pull request #135 from BobLd/rxycut-stackoverflow
Fix RecursiveXYCut
2020-02-15 10:52:20 +00:00
BobLd
3229574345 Correcting typo 2020-02-12 12:00:14 +00:00
BobLd
6a72ce655c Merge branch 'master' of https://github.com/UglyToad/PdfPig into rxycut-stackoverflow 2020-02-12 11:40:51 +00:00
BobLd
f0be3e99ee Add Projection class 2020-02-11 10:04:04 +00:00
BobLd
dd8baa0d2f Fix stack overflow error in RecursiveXYCut
Was linked to the new rotated rectangle methods
2020-02-10 13:53:59 +00:00
BobLd
b49740d093 add test for index increment when children are present 2020-02-10 11:23:19 +00:00
BobLd
ec68231ab3 remove useless using 2020-02-10 11:23:19 +00:00
BobLd
ac1e2c49ba Fix bounding box for artifact
Add tests
2020-02-10 11:23:19 +00:00
BobLd
588648d30b Fix #133 Marked content extraction issue 2020-02-10 11:23:19 +00:00
BobLd
6dfc7aea30 add test for index increment when children are present 2020-02-09 18:01:18 +00:00
BobLd
f66c25103f remove useless using 2020-02-09 17:54:20 +00:00
BobLd
905559e282 Fix bounding box for artifact
Add tests
2020-02-09 17:46:35 +00:00
BobLd
635693f032 Fix #133 Marked content extraction issue 2020-02-09 15:23:55 +00:00
BobLd
f05101ad07 TransformationMatrix: add comment and improve multiplication 2020-02-09 14:20:47 +00:00
BobLd
1a11e49124 Add TransformationMatrix.Inverse() tests 2020-02-09 14:20:47 +00:00
BobLd
df73206788 try rerun failed tests 2020-02-09 14:20:47 +00:00
BobLd
6a86cdab73 make GeometryExtensions.OrientedBoundingBox() internal 2020-02-09 14:20:47 +00:00
BobLd
288beab39d limit access to array and add comments 2020-02-09 14:20:47 +00:00
BobLd
7364e53bb9 Improve bounding box for word 2020-02-09 14:20:47 +00:00
Eliot Jones
f3fcd1b3a1 ignore form dictionaries that do not contain fields #131
though a form dictionary should always contain fields (as required by the spec) it is possible for this entry to be missing. in this case we return false for trygetform.
2020-02-05 10:56:01 +00:00
Eliot Jones
40dc80c281 handle type 1 font with no descriptor information #132
though required by the spec an adobe type 1 font may be missing all width data. in this case we default to empty values and treat it like a normal adobe type 1 font.
2020-02-05 10:46:39 +00:00
Eliot Jones
10ca77a034 move values back to computed properties
the additional stored fields made the struct slower to copy and had an impact on the performance. this moves non-essential fields back to computed properties.
2020-01-31 16:24:59 +00:00
BobLd
6dfbd45eb3 Add image example with new bounding box method 2020-01-31 16:24:59 +00:00
BobLd
6d0b14d2a7 Delete nearest neighbour example.png 2020-01-31 16:24:59 +00:00
BobLd
6dce4b1f8d Use double.NaN instead of double.MinValue
Only compute t, cos and sin once
2020-01-31 16:24:59 +00:00
Eliot Jones
b585fe9581 make width, height, area, rotation and centroid lazy
since the rectangle constructor is a hot path any calculations slow the library down considerably. for this reason we move calculations for the following properties into the property getter:
* width (cached)
* height (cached)
* rotation
* area
* centroid

where values are cached they set their backing field once calculated. this won't be thread safe if the same rectangle is accessed on multiple threads.
2020-01-31 16:24:59 +00:00
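The trade-off described in b585fe9581 can be sketched in Python as follows. The `Rectangle` class, its three-corner constructor and its field names are assumptions made for illustration and do not reproduce the library's type. Width and height are computed on first access and stored in a backing field, while area is recomputed each time; as the commit notes, this kind of caching is not thread safe.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


class Rectangle:
    """Hypothetical rectangle whose constructor does no calculations."""

    def __init__(self, bottom_left: Point, bottom_right: Point, top_left: Point) -> None:
        self.bottom_left = bottom_left
        self.bottom_right = bottom_right
        self.top_left = top_left
        self._width: Optional[float] = None   # backing field, set on first access
        self._height: Optional[float] = None  # backing field, set on first access

    @property
    def width(self) -> float:
        if self._width is None:
            dx = self.bottom_right[0] - self.bottom_left[0]
            dy = self.bottom_right[1] - self.bottom_left[1]
            self._width = math.hypot(dx, dy)
        return self._width

    @property
    def height(self) -> float:
        if self._height is None:
            dx = self.top_left[0] - self.bottom_left[0]
            dy = self.top_left[1] - self.bottom_left[1]
            self._height = math.hypot(dx, dy)
        return self._height

    @property
    def area(self) -> float:
        # Not cached: derived from the cached width and height on every call.
        return self.width * self.height
```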
Eliot Jones
0e84fa34a8 fix usages of rectangle constructor.
now that the rectangle constructor uses the order [ llx, lly, urx, ury ] and does not apply correction for points, constructor parameters must be passed in the correct order. this change fixes the hyperlink factory which was passing them in the wrong order.

in addition the pdfpath bounding box was using left, right, top and bottom to calculate the minimum bounding box. this produced incorrect values now that individual path operator bounding boxes are rotated, since for a rotated rectangle top may be less than bottom.

the performance seems to have taken a hit due to these changes however.
2020-01-31 16:24:59 +00:00
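The problem described in 0e84fa34a8 can be seen with a rectangle rotated by 180 degrees, whose "top" corners lie below its "bottom" corners. One way to get an axis-aligned bounding box that does not depend on those labels is to take the minimum and maximum over all corner points. The commit does not state that this is the approach it uses; the Python sketch below, including the function name `bounding_box`, is an assumption made for illustration.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def bounding_box(corners: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Hypothetical helper: return (llx, lly, urx, ury) covering all corners."""
    points = list(corners)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # min/max over every corner works whatever the rectangle's rotation.
    return (min(xs), min(ys), max(xs), max(ys))


# Corners of a rectangle rotated by 180 degrees, in the order
# bottom-left, bottom-right, top-left, top-right: "top" is below "bottom".
rotated = [(10.0, 10.0), (0.0, 10.0), (10.0, 0.0), (0.0, 0.0)]
box = bounding_box(rotated)
```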
BobLd
0e613fb526 Handle cases with not enough points in minimal bounding rectangle 2020-01-31 16:24:59 +00:00
BobLd
4c65cbc139 Improve minimum bounding box orientation v2 2020-01-31 16:24:59 +00:00
BobLd
bff18d81ca Improve minimum bounding box orientation 2020-01-31 16:24:59 +00:00
BobLd
483b30f44d Remove rounding 2020-01-31 16:24:59 +00:00
BobLd
253ae32193 Remove ordering from minimal bounding rectangle 2020-01-31 16:24:59 +00:00
BobLd
0dad611cb1 Implement minimum bounding box algorithm 2020-01-31 16:24:59 +00:00
BobLd
36c03459a7 first and last letter 2020-01-31 16:24:59 +00:00
BobLd
f221b58936 Remove useless code 2020-01-31 16:24:59 +00:00
BobLd
ea27820ca4 Improve Word bounding box TextDirection.Other case 2020-01-31 16:24:59 +00:00
BobLd
2e5fdb5867 Fix PdfRectangle's Centroid and Translate() 2020-01-31 16:24:59 +00:00
BobLd
adaccf97b3 Add files via upload 2020-01-31 16:24:59 +00:00
BobLd
380c36918b Remove unnecessary code 2020-01-31 16:24:59 +00:00
BobLd
0cbf3434bc Remove 'orderFunc' from 'NearestNeighbourWordExtractor' to use the order found by clustering algo 2020-01-31 16:24:59 +00:00
BobLd
3b90370f28 Using Math.Min(letter.Width, letter.GlyphRectangle.Width) for rotated 180 word bounding box 2020-01-31 16:24:59 +00:00
BobLd
c4b6bbc8e5 Using Math.Max(letter.Width, letter.GlyphRectangle.Width) for word bounding box 2020-01-31 16:24:59 +00:00
BobLd
6d8744e722 More decimals to Width and Height
+ handle the case where both bottom points are identical
2020-01-31 16:24:59 +00:00
BobLd
bc69376743 Increase max distance for TextDirection.Other in NearestNeighbourWordExtractor 2020-01-31 16:24:59 +00:00