Commit Graph

71 Commits

Author SHA1 Message Date
Eliot Jones
a4047247a8 Additional digital corpora testing (#1261)
* add additional testing pairs 0010-0011 for integration

some of these files required the skip missing fonts flag set to true

- propagate use lenient parsing for dictionaries inside arrays, handles a
corrupt file 0012710 not in this test set

* add pair 0012-0013
2026-02-22 21:01:03 +00:00
Eliot Jones
9c0d6893e0 revert flate decode handling to more lenient processing (#1254)
* revert flate decode handling to more lenient processing

the change to use a zlib/adler checksum verification flow meant that
invalid flate streams would not be decoded correctly. this caused
issues for files that included invalid/missing checksums. this reverts
the processing to the old approach for files like #1235

* fix object stream offset handling and track circular refs

* update tests

* normalize line endings for mac runner

* fixes for mac clownery

* add next pair to common crawl action

* add a test case for the root cause of the int overflow
2026-02-22 15:49:50 +00:00
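The lenient flate handling described in the commit above can be illustrated with a minimal sketch. It is written in Python rather than the project's own language, and `lenient_inflate` is a hypothetical name, not a function from the repository. It assumes the stream carries the usual 2-byte zlib header and falls back to decoding the raw deflate body when strict decoding fails, so an invalid or missing Adler-32 trailer is ignored.

```python
import zlib


def lenient_inflate(data: bytes) -> bytes:
    """Decode a zlib-wrapped stream, tolerating a bad or missing checksum.

    Hypothetical helper for illustration only; not the project's code.
    """
    try:
        # Strict path: verifies the trailing Adler-32 checksum.
        return zlib.decompress(data)
    except zlib.error:
        # Lenient path: skip the 2-byte zlib header (assumed present) and
        # decode the body as raw deflate. A negative wbits value tells zlib
        # to expect no header and no checksum trailer.
        raw = zlib.decompressobj(-15)
        return raw.decompress(data[2:])
```

With this fallback, a stream whose last four bytes are corrupted or truncated still yields its payload, whereas `zlib.decompress` alone would raise `zlib.error`.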
EliotJones
7e370a0eab confirm tag is on master before running publish
2026-02-15 18:39:45 +00:00
EliotJones
b9aa53166d replace release flow single job with pr process
since actions do not have permissions to push
directly to master and bot accounts to achieve
the same are hard to manage we will change the
release flow to work as follows.

1. manually invoke `prepare_release_pr.yml` action,
this creates a new branch with the version of all
project files updated and creates a pull request for
that version. this pr then should be merged using
rebase merge

2. `tag_release.yml` checks if the newest commit
name starts with the text "Release " and also verifies
if it changed the version of the package csproj. if
both those conditions are met it will create and
push a new tag, e.g. `v0.1.17` to master

3. `publish_nuget.yml` listens for new `v*` tags on
master and triggers the nuget deployment

this is all chat gpt code so who knows if it will work
2026-02-15 18:39:45 +00:00
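The tagging rule in step 2 of the commit above can be sketched as a small decision function. This is an illustration in Python only; the real logic lives in the `tag_release.yml` workflow, and `tag_for_release` and its parameters are hypothetical names. It assumes the caller has already read the newest commit's subject line and the package version before and after that commit.

```python
from typing import Optional


def tag_for_release(commit_subject: str,
                    previous_version: str,
                    current_version: str) -> Optional[str]:
    """Return the tag to create, or None if no tag should be pushed.

    Hypothetical helper for illustration; not taken from the workflow.
    """
    # Condition 1: the newest commit name must start with "Release ".
    if not commit_subject.startswith("Release "):
        return None
    # Condition 2: the commit must have changed the package version.
    if previous_version == current_version:
        return None
    # Both conditions met: tags take the form v<version>, e.g. v0.1.17.
    return f"v{current_version}"
```

For example, `tag_for_release("Release 0.1.17", "0.1.16", "0.1.17")` yields `"v0.1.17"`, while an ordinary commit or a "Release " commit that leaves the version untouched yields `None`.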
EliotJones
aef0a78ee6 update release logic to check out master before commit
2025-11-14 09:50:07 +00:00
BobLd
6ce6986d78 Update test run command to use Release configuration
2025-10-29 20:29:42 +00:00
Eliot Jones
07df6fd740 read last line of ignore file (#1155)
* read last line of ignore file

- do not cancel other matrix jobs if one test fails
- read all lines of the ignore list even if it doesn't end with a newline
- add ignore list for 0008 and 0009

* support missing object numbers when brute-forcing

the file 10404 (ironically) contains not found references with number 43 0
for its info dictionary. changes brute-force code so that objects can be
entirely missing

* fix test since document is now opened successfully but mediabox is broken
2025-09-13 16:57:35 +02:00
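The ignore-list fix in the commit above (reading the last line even when the file does not end with a newline) can be illustrated with a short sketch. It is written in Python under the assumption that the list holds one entry per line; `read_ignore_list` is a hypothetical name and the file names in the usage note are made up. The key step is flushing whatever remains in the buffer after the final newline.

```python
from typing import List


def read_ignore_list(text: str) -> List[str]:
    """Split an ignore list into entries, keeping an unterminated last line.

    Hypothetical helper for illustration; not the project's code.
    """
    entries: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch == "\n":
            entries.append("".join(current))
            current = []
        else:
            current.append(ch)
    # Without this flush, a final line lacking a trailing newline is lost.
    if current:
        entries.append("".join(current))
    # Drop blank lines and stray carriage returns from CRLF files.
    return [entry.strip() for entry in entries if entry.strip()]
```

Calling `read_ignore_list("a.pdf\nb.pdf")` returns both entries, matching the result for `"a.pdf\nb.pdf\n"`.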
EliotJones
77db6c6b54 add test jobs for common crawl 0000 to 0007 2025-09-13 14:52:04 +01:00
Eliot Jones
8408c98aec Draft release on master build (#1145)
* remove alpha postfix, releases will increment version

* update the master build job to draft a release

* add publish action to publish full release

* enable setting assembly and file version

* bump assembly and file version for package project

---------

Co-authored-by: BobLd <38405645+BobLd@users.noreply.github.com>
2025-09-08 20:07:36 +01:00
BobLd
0f641774e6 Update build_and_test_macos.yml 2025-08-09 08:33:34 +01:00
BobLd
a3edc926c8 Update build_and_test_macos.yml 2025-08-09 08:21:21 +01:00
EliotJones
7ff58893af only run tests if nightly publish needed 2025-08-04 21:46:13 -05:00
EliotJones
bee6f13888 fix tag fetching and parse behavior 2025-08-04 21:40:28 -05:00
EliotJones
e6dd2d15c2 use gemini to mark chat gpt's work and improve the action 2025-08-04 21:00:12 -05:00
EliotJones
7dd5d68be3 prevent duplicate package publish on manual run, attempt 1 2025-08-04 20:49:18 -05:00
BobLd
bdf3b8e2b4 Update nightly_release.yml 2025-08-03 20:03:13 +01:00
BobLd
c8dff885bd Update run_common_crawl_tests.yml 2025-08-03 08:56:17 +01:00
BobLd
0b228c57b7 Update run_integration_tests.yml 2025-08-03 08:52:27 +01:00
BobLd
ef21227b3c Update run_integration_tests.yml 2025-08-03 08:46:40 +01:00
BobLd
b6950a5fb0 Update run_integration_tests.yml (#1117) 2025-08-03 08:34:50 +01:00
BobLd
a5e92cd11c Update run_common_crawl_tests.yml 2025-07-19 12:21:10 +01:00
EliotJones
4bf746c747 add new action to run integration against common crawl corpus 2025-07-19 11:49:34 +01:00
BobLd
b8bd40e486 Create build_and_test_macos.yml 2025-04-06 12:04:24 +01:00
BobLd
f1f27a63e1 Update run_integration_tests.yml 2025-03-08 13:15:40 +00:00
Arnaud TAMAILLON
cd2a85e642 Ensure tests are reusing the previous build and run on release configuration 2024-09-03 05:09:03 +01:00
Richard Webb
937793bec7 Update Github actions in the CI build 2024-05-06 17:38:00 +01:00
BobLd
8163d9ff89 Update run_integration_tests.yml 2024-03-16 12:44:03 +00:00
Jason Nelson
f1ebaab26d [CI] Update windows to 2022, and install net8.0 (part 2) 2024-03-15 13:10:25 +00:00
Jason Nelson
03fd2832ac [CI] Install net8.0 2024-03-15 13:10:25 +00:00
Eliot Jones
fcf09ac6b3 move nightly builds back to main feed 2023-08-05 16:22:55 +01:00
Eliot Jones
32cf25e6b7 fix typo in accursed yaml 2023-06-06 19:36:29 +01:00
Eliot Jones
2366d22719 move nightly builds to separate package id
while nightly builds are useful they also cause
a large amount of spam on the main project
nuget. here we try to change the package id
so that it will be hosted as a separate package
while having all the same code and namespaces
this means people can opt into the nightly builds
while keeping the version history of the released
package tidy.

no idea if this will work because actions and
yaml is my idea of hell and is impossible to
debug, but let's give it a go
2023-06-05 21:48:02 +01:00
Eliot Jones
fc59d1e58f try making nightly release dependent on test passing 2023-05-18 20:05:06 +01:00
Eliot Jones
23aeb66566 bump integration test runner version 2023-05-17 20:34:10 +01:00
BobLd
0d15c395ea Revert "Update run_integration_tests.yml"
This reverts commit 92f47df9f6.
2023-04-16 19:29:15 +01:00
BobLd
92f47df9f6 Update run_integration_tests.yml
Update integration tests action dotnet version from 2.1.x to 3.1.x
2023-04-10 17:41:05 +01:00
Eliot Jones
e402a07105 add net 6 support in actions 2022-04-03 15:22:39 -04:00
Eliot Jones
6b08547481 use windows 2019 now 2016 is deprecated 2022-04-03 14:26:37 -04:00
Eliot Jones
b0a5f4c8d0 fix test, again 2022-02-19 16:27:21 -04:00
Eliot Jones
81e7861854 check if build works on windows 2016 2022-02-19 15:47:04 -04:00
Eliot Jones
77ac6caa64 use windows 2019 for actions 2022-02-19 12:57:33 -04:00
Eliot Jones
b89c8c577d add 3rd part of archive to integration tests 2022-01-11 17:39:14 +00:00
Eliot Jones
a342115b9c run build and test on pull requests 2022-01-11 13:56:13 +00:00
Eliot Jones
7ed985a023 move console runner to named file and clean output then run in action 2022-01-11 11:27:50 +00:00
Eliot Jones
16e1e5e52d fix workflow typos 2022-01-11 10:13:23 +00:00
Eliot Jones
bb16af1d9a add code to try unzip archives 2022-01-11 10:05:21 +00:00
Eliot Jones
41bfa1a054 let wget create archive folder if not exists 2022-01-10 23:07:18 +00:00
Eliot Jones
0bb77c1144 also download and cache part 1 2022-01-10 23:05:18 +00:00
Eliot Jones
20e1695c5f try caching downloaded file 2022-01-10 23:01:17 +00:00
Eliot Jones
9a46135de5 start writing an action for integration tests 2022-01-10 22:50:37 +00:00