Commit Graph

15 Commits

Author SHA1 Message Date
davebrokit
f3e37eafae Introduce IBlock and ILettersBlock interfaces (Round 2) (#1263)
* Code review changes
- Keep the Bounds property on the image classes so this isn't a major breaking API change
- Don't expose letters collection

* Minor fix

* Switch to using BoundingBox in the library

---------

Co-authored-by: davmarksman <david@brokit.co.uk>
2026-02-28 16:25:51 +00:00
Jason Nelson
1d2777d59a [Tests] Enable implicit usings 2024-03-16 07:40:17 +00:00
Eliot Jones
407ee5ca51 add content order text extractor and example of use 2020-04-19 17:06:34 +01:00
Eliot Jones
75c04eb81c fix namespace using order 2020-04-05 17:58:57 +01:00
Eliot Jones
2a0a3fae69 add test for svg exporter and escape xml characters 2020-04-05 17:58:57 +01:00
Eliot Jones
a6541f1cfc fix test references
update references for unit tests to reference new core and fonts projects. all tests except the public api scanner tests now run successfully.
2020-01-04 22:56:41 +00:00
Eliot Jones
0349bedd3e #57 add access to document metadata and expose wrapper type 2019-08-11 12:42:30 +01:00
Eliot Jones
557d8bc948 map missing character codes directly #44
previously if no matching unicode was found for a character code we would return a null letter. instead we now map from the character code directly to a character. this seems to work for most documents, except where there are ligatures, e.g. fi or ff, but is still better than not returning anything.
2019-07-07 13:53:25 +01:00
Eliot Jones
55a0e6b646 move large strings from code files to avoid wrong language detected on github 2018-12-29 11:55:19 +00:00
Eliot Jones
243f3dc099 #13 handle special case cff file and reduce duplication in integration tests 2018-11-25 12:36:38 +00:00
Eliot Jones
fdd48b25d8 #15 change default word extraction for latex test 2018-11-25 10:10:28 +00:00
Eliot Jones
03938d8352 #5 assert against pdfbox positions and fix by implementing subroutines 2018-11-23 18:00:53 +00:00
Eliot Jones
b9c8e152c1 #16 change letter api to match the actual information 2018-11-22 19:32:16 +00:00
Eliot Jones
df0b60c2e1 port type 1 lexer from pdf box and add test data 2018-10-23 20:02:20 +01:00
Eliot Jones
4443cde229 add very hacky parsing for type 1 font files in order to read the encoding 2018-01-14 18:59:03 +00:00