Commit Graph

32 Commits

Author SHA1 Message Date
Eliot Jones
0fcc4e54c8 add istestproject setting to all projects
indicates which projects are test projects to the test runner.
2020-02-27 12:35:40 +00:00
Eliot Jones
7ac3fb2a39 remove old clipping rule code and make enum byte
removes the unused set winding rule method and makes the clipping rule enum a byte which will save 3 bytes per pdfpath instance.
2020-02-24 11:29:06 +00:00
BobLd
1d095af974 Implement Modify Clipping operations 2020-02-24 11:20:56 +00:00
BobLd
49caa071ba improve length computation
tidy up IntersectsWith()
2020-02-23 11:23:27 +00:00
BobLd
42245d70ca Improve PdfRectangle.GetWidthHeight();
Improve and simplify Word's oriented bounding box
2020-02-23 11:23:27 +00:00
BobLd
36566f42e6 Create generic methods for lines in GeometryExtensions 2020-02-23 11:23:27 +00:00
BobLd
b8d1eba8ee PdfLine.Intersect() 2020-02-23 11:23:27 +00:00
BobLd
d2ac2f598a update Centroid, GetWidthHeight and tests 2020-02-23 11:23:27 +00:00
BobLd
f05101ad07 TransformationMatrix: add comment and improve multiplication 2020-02-09 14:20:47 +00:00
BobLd
7364e53bb9 Improve bounding box for word 2020-02-09 14:20:47 +00:00
Eliot Jones
10ca77a034 move values back to computed properties
the additional stored fields made the struct slower to copy and had an impact on the performance. this moves non-essential fields back to computed properties.
2020-01-31 16:24:59 +00:00
BobLd
6dce4b1f8d Use double.NaN instead of double.MinValue
Only compute t, cos and sin once
2020-01-31 16:24:59 +00:00
Eliot Jones
b585fe9581 make width, height, area, rotation and centroid lazy
since the rectangle constructor is a hot path any calculations slow the library down considerably. for this reason we move calculations for the following properties into the property getter:
* width (cached)
* height (cached)
* rotation
* area
* centroid

where values are cached they set their backing field once calculated. this won't be thread safe if the same rectangle is accessed on multiple threads.
2020-01-31 16:24:59 +00:00
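The lazy-property approach described in the commit above can be sketched roughly as follows. This is an illustration in Python rather than the project's own code: the class name `LazyRectangle`, the tuple-based points and the `_width`/`_height` backing fields are all hypothetical. Width and height are cached on first access, while rotation and area are recomputed on every call.

```python
import math


class LazyRectangle:
    """Hypothetical rectangle that defers derived values to property getters."""

    def __init__(self, bottom_left, bottom_right, top_left):
        # The constructor only stores the corner points; no calculation happens here.
        self.bottom_left = bottom_left
        self.bottom_right = bottom_right
        self.top_left = top_left
        self._width = None   # backing field, set on first access
        self._height = None  # backing field, set on first access

    @property
    def width(self):
        # Cached: computed once, then served from the backing field.
        if self._width is None:
            self._width = math.hypot(
                self.bottom_right[0] - self.bottom_left[0],
                self.bottom_right[1] - self.bottom_left[1],
            )
        return self._width

    @property
    def height(self):
        # Cached in the same way as width.
        if self._height is None:
            self._height = math.hypot(
                self.top_left[0] - self.bottom_left[0],
                self.top_left[1] - self.bottom_left[1],
            )
        return self._height

    @property
    def rotation(self):
        # Not cached: angle of the bottom edge in degrees, recomputed per call.
        dx = self.bottom_right[0] - self.bottom_left[0]
        dy = self.bottom_right[1] - self.bottom_left[1]
        return math.degrees(math.atan2(dy, dx))

    @property
    def area(self):
        # Not cached: derived from the cached width and height.
        return self.width * self.height
```

As the commit message notes, the unsynchronised write to the backing field is what makes this pattern unsafe when one instance is shared across threads.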
Eliot Jones
0e84fa34a8 fix usages of rectangle constructor.
now that the rectangle constructor uses the order [ llx, lly, urx, ury ] and does not apply correction for points, constructor parameters must be passed in the correct order. this change fixes the hyperlink factory, which was passing them in the wrong order.

in addition the pdfpath bounding box was using left, right, top and bottom to calculate the minimum bounding box. this produced incorrect values now that individual path operator bounding boxes are rotated, since for a rotated rectangle top may be less than bottom.

the performance seems to have taken a hit due to these changes however.
2020-01-31 16:24:59 +00:00
BobLd
483b30f44d Remove rounding 2020-01-31 16:24:59 +00:00
BobLd
0dad611cb1 Implement minimum bounding box algorithm 2020-01-31 16:24:59 +00:00
BobLd
ea27820ca4 Improve Word bounding box TextDirection.Other case 2020-01-31 16:24:59 +00:00
BobLd
2e5fdb5867 Fix PdfRectangle's Centroid and Translate() 2020-01-31 16:24:59 +00:00
BobLd
6d8744e722 More decimals to Width and Height
+ handle the case where both bottom points are identical
2020-01-31 16:24:59 +00:00
BobLd
9bcafdaa98 Update word bounding box computation 2020-01-31 16:24:59 +00:00
BobLd
27edf6cf77 Handle Width and Height for rotated rectangles 2020-01-31 16:24:59 +00:00
BobLd
75821919a7 Fix NearestNeighbourWordExtractor for rotated text 2020-01-31 16:24:59 +00:00
Eliot Jones
8ab2838063 recover from invalid cross reference position
if the cross reference offset we are reading contains a number, we assumed it was a stream object. if it's not, we now brute-force the entire file looking for an 'xref' token. this should be combined with a search for cross-reference streams and should run when we read neither the numeric token nor an 'xref' token, but for now this fixes the observed issue.

also adds number of images to the page api to prevent consumers needing to enumerate.
2020-01-28 18:07:05 +00:00
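A minimal sketch of the brute-force search described in the commit above, written in Python for illustration only. The function name `find_xref_offsets` is hypothetical, and skipping matches that are part of `startxref` is an assumption about what a useful scan needs, not something stated in the commit.

```python
from typing import List


def find_xref_offsets(data: bytes) -> List[int]:
    """Scan the whole file and return the offset of every standalone 'xref' token."""
    offsets = []
    pos = data.find(b"xref")
    while pos != -1:
        # Assumption: 'startxref' also contains 'xref' but is not a table start.
        if data[max(0, pos - 5):pos] != b"start":
            offsets.append(pos)
        pos = data.find(b"xref", pos + 1)
    return offsets
```

A caller could try each returned offset as a candidate cross-reference table position when the recorded offset turns out to be invalid.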
Eliot Jones
6cf257a331 strings record encoding used to create them.
in order to recreate the valid bytes for use in decryption it is necessary to know which encoding was used to read a string token. this is because utf16-be encoding has a byte-order marker which should be included in the resulting bytes.
2020-01-26 17:07:58 +00:00
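The reason the encoding has to be recorded can be illustrated with a short Python sketch. The function name `string_token_bytes` and the `was_utf16_be` flag are hypothetical, and Latin-1 is used here as a simplified stand-in for the non-UTF-16 case.

```python
import codecs


def string_token_bytes(text: str, was_utf16_be: bool) -> bytes:
    """Recreate the bytes of a string token from its decoded text."""
    if was_utf16_be:
        # UTF-16BE strings start with the byte-order mark FE FF,
        # which must be restored to get the original bytes back.
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    # Assumption: Latin-1 stands in for the single-byte encoding.
    return text.encode("latin-1")
```

Without the recorded encoding, the same decoded text would map to two different byte sequences, and decryption would be applied to the wrong one.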
Eliot Jones
0ed4e58556 add test cases for rectangle transforms
our bounding rectangle values still seem to be wrong for rotated letters. this change adds some test cases for common transformation matrix operations on a rectangle: scale, translate and rotate.
2020-01-22 13:28:47 +00:00
Eliot Jones
e0a45e3774 include dependencies as dlls in the published nuget
by default nuget pack does not include project dependencies. this is suboptimal since it would require managing at least 5 nuget packages. this uses a workaround detailed here https://github.com/nuget/home/issues/3891 to copy the dependent dlls to the generated nuget package. this doesn't resolve the issue of how we publish the documentlayoutanalysis project, since it is the top of the dependency tree and we publish its parent, rather than it.
2020-01-05 13:56:14 +00:00
Eliot Jones
b29354e3e6 move compact font format fonts to fonts project 2020-01-05 12:08:01 +00:00
Eliot Jones
bbde38f656 move tokenizers to their own project
since both pdfs and Adobe Type1 fonts use postscript type objects, tokenization is needed by the main project and the fonts project
2020-01-05 10:40:44 +00:00
Eliot Jones
d09b33af4d move tokens to new project 2020-01-05 10:07:01 +00:00
Eliot Jones
1c38a2ae8a move pdfline to the core project 2020-01-05 09:33:59 +00:00
Eliot Jones
74774995d6 complete move of truetype, afm and standard14 fonts
the 3 font types mentioned are moved to the new fonts project, any referenced types are moved to the core project. most truetype classes are made public #8.
2020-01-04 22:39:13 +00:00
Eliot Jones
7c0ef111ea move classes to new projects
to make the project more useful and expose more usable classes we're rearchitecting in the following way. code used to read fonts from external file formats like truetype, adobe font metrics (afm) and adobe type 1 fonts is moving to a new project which doesn't reference most of the pdf logic. the shared logic is moving to a new flat-structured project called core. this is a sort-of onion type architecture, with core being the... core, fonts being the next layer of the onion, pdfpig itself the next. this will then support additional libraries/projects as outer layers of the onion as well as releasing a standalone version of the font library as pdfbox does with fontbox.
2020-01-04 16:38:18 +00:00