Commit Graph

48 Commits

Author SHA1 Message Date
BobLd
0dad611cb1 Implement minimum bounding box algorithm 2020-01-31 16:24:59 +00:00
Eliot Jones
b29354e3e6 move compact font format fonts to fonts project 2020-01-05 12:08:01 +00:00
Eliot Jones
1c38a2ae8a move pdfline to the core project 2020-01-05 09:33:59 +00:00
Eliot Jones
7c0ef111ea move classes to new projects
to make the project more useful and expose more usable classes we're rearchitecting in the following way. code used to read fonts from external file formats like truetype, adobe font metrics (afm) and adobe type 1 fonts is moving to a new project which doesn't reference most of the pdf logic. the shared logic is moving to a new flat-structured project called core. this is a sort-of onion type architecture, with core being the... core, fonts being the next layer of the onion, and pdfpig itself the next. this will then support additional libraries/projects as outer layers of the onion, as well as releasing a standalone version of the font library as pdfbox does with fontbox.
2020-01-04 16:38:18 +00:00
BobLd
d246bf5c74 - remove unnecessary casts
- make PageXmlTextExporter.Deserialize() public
2019-12-31 10:43:07 +00:00
Eliot Jones
935d182888 use doubles where calculations are being run 2019-12-24 12:22:17 +00:00
Eliot Jones
1fb416eee3 add convenience method to retrieve all hyperlinks and their text from annotations on a page 2019-12-18 11:41:02 +00:00
BobLd
5cf1f6c58c Modifications and adding some tests 2019-12-16 14:36:52 +00:00
BobLd
1656411fcb Improving Geometry classes with Tests 2019-12-14 11:41:11 +00:00
Eliot Jones
09b26c43e0 #74 add intersectswith method to rectangle 2019-10-17 11:21:49 +01:00
BobLd
fe1a3c4b8b updated from comments
- still need to look at XmlWriter
2019-10-10 12:29:28 +01:00
Eliot Jones
81ab414c56 add is supported flag to filters and add missing doc comment 2019-10-08 15:53:42 +01:00
BobLd
bf09aee99c Adding images regions 2019-10-08 15:29:18 +01:00
BobLd
93313118e9 Support for hOCR, AltoXml and PageXml output formats
Tested with:
- 'hocrjs' for hOCR (see https://unpkg.com/hocrjs)
- 'PAGE Viewer' for hOCR, AltoXml and PageXml (see http://www.primaresearch.org/tools/PAGEViewer)
2019-10-07 15:19:30 +01:00
BobLd
1c3519fd51 Update PdfPath.cs
Need to account for the case where a `Close` command is called but the first and last commands are not connected.
2019-10-06 12:47:12 +01:00
BobLd
1975db4752 correct typo 2019-10-04 14:50:22 +01:00
BobLd
5d3e4cd4e1 Improve PdfPath
- Determine if Closed path
- Determine if Clockwise or CounterClockwise
- Add Centroid
2019-10-04 14:37:41 +01:00
Eliot Jones
3fbfc1130e lazily evaluate centroid of rectangle 2019-08-20 23:03:27 +01:00
Eliot Jones
8c100efe04 Merge pull request #60 from BobLd/master
Improve ClusteringAlgorithms.GroupIndexes() and add Equals() to PdfLine
2019-08-17 12:58:06 +01:00
BobLd
afa2b7baa1 Improve ClusteringAlgorithms.GroupIndexes()
Add Equals() to PdfLine
2019-08-14 19:58:31 +01:00
Eliot Jones
f5e025aa70 merge pull request #58 from uglytoad/colors
adds colors to letters and prepares code to add colors to paths.
2019-08-13 20:50:06 +01:00
BobLd
9b24223190 Removing ToDouble() 2019-08-10 13:52:01 +01:00
BobLd
bd58879e32 Update from comments 2019-08-10 13:05:25 +01:00
BobLd
474ce9a442 Improving PdfPoint 2019-08-09 19:58:48 +01:00
BobLd
ac065e988a Adding Centroid to PdfRectangle. 2019-08-09 17:22:16 +01:00
Eliot Jones
c5d03bca97 move application of transformation matrix outside path 2019-08-08 21:19:18 +01:00
Eliot Jones
364bd25fa8 #48 add handling of inline image data to pdf content parsing
an inline image in a pdf content stream starts with the BI tag, then ID declares the start of image data and EI the end. attempting to parse the bytes after the ID tag as usual resulted in errors. this change adds special case handling for inline images.
2019-08-03 15:42:19 +01:00
vadimy
7d3a0929b6 Refactoring and fixing according to Eliot's comments 2019-07-24 00:00:00 -04:00
vadimy
b9d0cca2a6 Added "Paths" collection to Page object.
Added matrix transformation to path operators.
2019-07-16 00:35:29 -04:00
Eliot Jones
7a3b89ece1 tidy up some doc comments 2019-05-18 12:28:42 +01:00
BobLd
f4ec425bf0 - Correction of the PdfLine's length formula;
- Moving Line to TextLine
2019-05-15 19:44:47 +01:00
BobLd
97f0f6fe75 Minor modifications and updates 2019-05-14 20:56:34 +01:00
BobLd
2011d504a7 In Content:
- Adding a 'Line' of text object
- Adding a 'TextDirection' property in the 'Word' object

In Geometry:
- Adding a 'PdfLine' object
- Making the 'PdfRectangle' creator public
2019-05-12 19:34:00 +01:00
BobLd
214ef8a958 Fix issue with Width, Height and Area when the text orientation is not horizontal. 2019-04-19 20:12:31 +01:00
Eliot Jones
5c8a77bf33 #26 make almost all operators public 2019-01-03 22:20:53 +00:00
Eliot Jones
9a1879829d move path class and add doc comments to document creation api 2018-12-25 10:37:00 +00:00
Eliot Jones
d5a50f2236 #8 tidy up truetype font internally. some more work on a potential document creation api 2018-11-25 13:56:27 +00:00
Eliot Jones
3f3badb7b4 #12 performance optimizations for type 1 fonts and other tweaks 2018-11-25 11:37:00 +00:00
Eliot Jones
3a025052c9 add test for type 1 cff glyph locations and add missing doc comments 2018-11-19 21:43:22 +00:00
Eliot Jones
4d18a2478d add charset interface, create class to store cff font data. add the command logic for type 2 charstrings #6 2018-11-17 14:59:58 +00:00
Eliot Jones
9700ee03b8 Merge pull request #3 from GowenGit/master
letter boundaries - todo: review changes for non-latin characters. it seems like we need both the bounding box and the origin to be stored for each letter, since the origin is on the baseline while the bounding box can extend below.
2018-04-15 11:41:30 +01:00
modest-as
4570427a6f merge from upstream 2018-04-14 22:20:36 +01:00
Eliot Jones
8def7d7e0b unify truetype glyphs to a single class. build composite glyphs from elements 2018-04-14 22:16:26 +01:00
Eliot Jones
983933b6e8 tidy up glyph reading code, simple glyphs now contain glyph points rather than 3 related arrays 2018-04-14 15:44:07 +01:00
Eliot Jones
ebdda46098 make rectangle a struct. add infrastructure for handling composite glyphs 2018-04-14 14:11:10 +01:00
modest-as
564e32e072 Return bounding boxes for letters 2018-03-30 23:16:54 +01:00
Eliot Jones
70025edd79 truetype glyphs now contain the bounds 2018-03-30 21:26:21 +01:00
Eliot Jones
ec62542b64 change the project name to something silly 2018-01-10 19:49:32 +00:00