Commit Graph

22 Commits

Author SHA1 Message Date
Eliot Jones
58ecfbf963 0.1.3-alpha001 2020-09-04 13:19:03 +01:00
Eliot Jones
98af575ee3 version 0.1.2 2020-07-04 16:55:14 +01:00
Eliot Jones
5fb04582a7 0.1.2-alpha003 2020-06-20 12:54:31 +01:00
Eliot Jones
256c2833ab 0.1.2-alpha002 2020-05-10 16:36:14 +01:00
Eliot Jones
98dd736f94 0.1.2-alpha001 2020-04-25 15:20:07 +01:00
Eliot Jones
db442194c3 use a mutable struct 2020-04-18 12:10:17 +01:00
Eliot Jones
f1be6634a7 add a bunch more performance improvements
filter provider becomes single instance and no longer has constructor parameters.

tokenizers use list and stringbuilder pools to reduce allocations.

system font finder becomes static to preserve file cache across all documents.
2020-04-05 15:34:47 +01:00
Eliot Jones
7baa18b5dd add stringbuilder pool for tokenizers
we could replace these with spans in the next net core; however, for now our pools seem to increase performance by reducing gc load.
2020-04-04 18:31:55 +01:00
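The pooling idea in the commit above can be illustrated with a minimal sketch. This is not the project's code: `BufferPool`, `borrow`, `give_back` and `max_size` are hypothetical names, and a Python list stands in for a string builder. It only shows the general technique of reusing cleared buffers instead of allocating a new one per token.

```python
class BufferPool:
    """Hypothetical pool that hands out reusable character buffers."""

    def __init__(self, max_size=8):
        # max_size is an assumed cap so the pool cannot grow without bound.
        self._free = []
        self._max_size = max_size

    def borrow(self):
        # Reuse a previously returned buffer when one is available.
        return self._free.pop() if self._free else []

    def give_back(self, buffer):
        # Clear the buffer so the next borrower starts empty, then keep it
        # only if the pool has room.
        buffer.clear()
        if len(self._free) < self._max_size:
            self._free.append(buffer)


def read_token(chars, pool):
    """Collect characters into a pooled buffer and return the joined text."""
    buffer = pool.borrow()
    try:
        for ch in chars:
            buffer.append(ch)
        return "".join(buffer)
    finally:
        pool.give_back(buffer)
```

Whether pooling pays off depends on the runtime and workload; the commit message itself only says the pools "seem to" help.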
Eliot Jones
9abe9f4b2f #158 add strong naming to the solution 2020-04-04 16:59:51 +01:00
Eliot Jones
4ed1600cab version 0.1.1 2020-03-18 20:10:51 +00:00
Eliot Jones
8ac4195b83 0.1.1-alpha001 2020-03-15 16:52:28 +00:00
Eliot Jones
8df2f9cf6b generate all xml docs and pack them #148
after we split the solution into multiple projects the xml doc comments were no longer packed in the generated nuget package. in addition they were only generated for the net standard 2.0 target framework.

this change generates comments for all target frameworks and makes sure they're included in the generated package. it also adds missing doc comments where they weren't included on the public api and clears up a couple of minor formatting issues in the affected files.
2020-03-08 13:44:09 +00:00
Eliot Jones
4b5c8d510e add test for comment in dictionary from #145
check that we correctly handle the case where a comment appears inside a dictionary; this was handled by commit 3084a9. use a list internally in the dictionary tokenizer to avoid interface performance penalties.
2020-03-03 11:36:01 +00:00
Eliot Jones
420daaac6e handle unbalanced parentheses for string tokenization
when the close parenthesis is unbalanced and precedes a line break followed by '/' or '>' we assume the bracket to be unbalanced and finish reading the string.
2020-02-27 17:01:15 +00:00
Eliot Jones
0fcc4e54c8 add istestproject setting to all projects
indicates which projects are test projects to the test runner.
2020-02-27 12:35:40 +00:00
Eliot Jones
c6dc4d9eb8 handle tokenizing invalid numeric string correctly
rather than throwing when an invalid numeric string is read, our tokenizer now returns false so that error recovery methods can be attempted.
2020-02-21 11:16:31 +00:00
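The "return false instead of throwing" pattern from the commit above might look like the sketch below. The function name `try_read_number` and the accepted number grammar are assumptions made for illustration; the project's actual tokenizer and its rules are not shown in this log.

```python
import re

# Assumed grammar for illustration: optional sign, then digits with an
# optional decimal point. re.ASCII restricts \d to 0-9.
_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)", re.ASCII)


def try_read_number(text):
    """Return (True, value) for a valid number, (False, None) otherwise.

    No exception is raised for malformed input, so the caller is free to
    attempt its own error recovery.
    """
    if _NUMBER.fullmatch(text) is None:
        return False, None
    return True, float(text)
```

A caller can then branch on the flag, for example `ok, value = try_read_number("1.2.3")`, and fall back to a recovery path when `ok` is false.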
Eliot Jones
6cf257a331 strings record encoding used to create them.
in order to recreate the valid bytes for use in decryption it is necessary to know which encoding was used to read a string token. this is because utf16-be encoding has a byte-order marker which should be included in the resulting bytes.
2020-01-26 17:07:58 +00:00
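A small sketch of why the encoding has to be recorded, as the commit above describes. `StringToken`, `read_string` and `to_bytes` are hypothetical names, and Latin-1 is used here only as a stand-in for the non-UTF-16 case. The point is that the byte-order mark is consumed while decoding, so the original bytes can only be rebuilt if the token remembers how it was read.

```python
import codecs
from dataclasses import dataclass


@dataclass(frozen=True)
class StringToken:
    text: str
    encoding: str  # which codec was used to decode the raw bytes


def read_string(raw: bytes) -> StringToken:
    # A leading FE FF byte-order mark signals big-endian UTF-16.
    if raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
        return StringToken(text, "utf-16-be")
    # Stand-in single-byte encoding, assumed for this illustration.
    return StringToken(raw.decode("latin-1"), "latin-1")


def to_bytes(token: StringToken) -> bytes:
    # Restore the byte-order mark that decoding stripped.
    if token.encoding == "utf-16-be":
        return codecs.BOM_UTF16_BE + token.text.encode("utf-16-be")
    return token.text.encode("latin-1")
```

Without the recorded encoding, the text "Hi" alone would not reveal whether the source bytes were `48 69` or `FE FF 00 48 00 69`.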
Eliot Jones
693a3d5958 use offset to file header to correct cross references
if the %pdf version header comment is offset from the start of the file the cross reference offsets will also be wrong by that amount. this change updates the cross reference location logic to use the offset from the located version header.
2020-01-26 15:30:20 +00:00
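The correction described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the project's logic: `find_header_offset`, `correct_offset` and the `search_limit` of 1024 bytes are hypothetical. The idea is that offsets stored in the file are relative to the version header, so any bytes before the header shift every offset by the same amount.

```python
def find_header_offset(data: bytes, search_limit: int = 1024) -> int:
    """Return the position of the '%PDF-' header comment, or 0 if not found.

    search_limit is an assumed bound on how far into the file to look.
    """
    index = data.find(b"%PDF-", 0, search_limit)
    return index if index >= 0 else 0


def correct_offset(declared_offset: int, header_offset: int) -> int:
    """Shift an offset declared relative to the header to a file position."""
    return declared_offset + header_offset
```

For example, if five junk bytes precede the header and an object is declared at offset 9, the object actually starts at byte 14 of the file.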
Eliot Jones
ba09a13d08 more end image recovery logic
since inline image data may contain the end image "ei" token inside the data stream there's no reliable way to actually determine if we've read all the data. for this reason if we end up with an invalid state parsing operations after we've read the end image token we try to recover by reading from the previous token to the next end image token if any. we supply log information to let the consumer know this is what we're doing. it's still not bullet-proof but it should be good enough.

also support negative page rotation values by adding them to a 360 degree rotation so -90 degrees clockwise is 270 degrees clockwise.
2020-01-25 15:53:08 +00:00
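The negative rotation handling mentioned at the end of the commit above amounts to mapping the value into the range 0 to 359. A minimal sketch, with the hypothetical name `normalize_rotation`:

```python
def normalize_rotation(degrees: int) -> int:
    """Map a clockwise page rotation, possibly negative, into [0, 360).

    Python's % operator already returns a non-negative result for a
    positive modulus. In languages where the remainder keeps the sign of
    the dividend, the equivalent step is to add 360 to a negative value,
    which is what the commit message describes.
    """
    return degrees % 360
```

With this, -90 becomes 270 and values already in range are unchanged.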
Eliot Jones
0183c0af5f add project for nuget package #119
in order to include all projects from the solution we create a new solution with an entry-point assembly which references all projects. calling dotnet pack on this single project then packages all assemblies into the produced nuget package.

also remove old glyph list references from the main project since they have moved to the fonts project.
2020-01-06 11:31:41 +00:00
Eliot Jones
e0a45e3774 include dependencies as dlls in the published nuget
by default nuget pack does not include project dependencies. this is suboptimal since it would require managing at least 5 nuget packages. this uses a workaround detailed here https://github.com/nuget/home/issues/3891 to copy the dependent dlls to the generated nuget package. this doesn't resolve the issue of how we publish the documentlayoutanalysis project, since it is the top of the dependency tree and we publish its parent, rather than it.
2020-01-05 13:56:14 +00:00
Eliot Jones
bbde38f656 move tokenizers to their own project
since both pdfs and Adobe Type1 fonts use postscript-type objects, tokenization is needed by both the main project and the fonts project.
2020-01-05 10:40:44 +00:00