Compare commits

19 Commits

Author SHA1 Message Date
Bert Huijben
3592fc8438 Use zlib information to verify compressed content before using it
2025-10-15 18:46:36 +01:00
BobLd
c9034f991c Only apply RemoveStridePadding() when bytes per pixel is one and fix #1183 2025-10-15 12:57:25 +01:00
BobLd
255e70f0a7 Set Type 3 font ascent to Top instead of Height, see #1164
2025-10-14 12:18:52 +01:00
BobLd
2216ade1f2 Trim excess in long lived font collections
2025-10-14 10:26:41 +01:00
BobLd
cf0c33b1e0 Improve DfsIterative() performance
2025-10-13 12:35:21 +01:00
BobLd
ffba176060 Improve GroupIndexes() performance with #1178 2025-10-13 12:35:21 +01:00
BobLd
b14f45f59f Add more tests to NearestNeighbourWordExtractorTests
2025-10-12 19:51:04 +01:00
ricflams
c28d114b79 Guard against circular references in XRef tables/streams
- Detect and prevent an xref table/stream at a given offset from being read twice; malformed xref tables with circular references could otherwise cause the table reading to loop forever.
- An alternative approach would be to stop TryReadTableAtOffset from moving the bytes' CurrentOffset to lastObjPosition while it attempts to read a table (e.g. by restoring CurrentOffset after the attempt), so that the outer loop over the bytes could continue its search through the entire input unaffected.
2025-10-01 06:32:38 +01:00
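The guard described in this commit can be sketched as a set of already-visited offsets. This is only an illustration under assumptions, not PdfPig's actual code: `XrefChainWalker`, `Walk`, and the `readPrevOffset` delegate are hypothetical names standing in for the real table/stream reader.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the real reader in PdfPig is structured differently.
public static class XrefChainWalker
{
    // Follows a chain of xref offsets starting at 'startOffset'.
    // 'readPrevOffset' stands in for "read the table/stream at this offset and
    // return the offset of the previous one, or null when there is none".
    public static List<long> Walk(long startOffset, Func<long, long?> readPrevOffset)
    {
        var visited = new HashSet<long>();
        var order = new List<long>();
        long? current = startOffset;

        while (current.HasValue)
        {
            // HashSet.Add returns false when the offset was already seen,
            // i.e. the chain is circular: stop instead of looping forever.
            if (!visited.Add(current.Value))
            {
                break;
            }

            order.Add(current.Value);
            current = readPrevOffset(current.Value);
        }

        return order;
    }
}
```

With a two-entry cycle (offset 100 pointing back to 50 and 50 pointing back to 100), `Walk` visits each offset once and then returns.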
Richard Flamsholt
d7d01f842e Update test Issue874: No longer missing a font
Including the stream-xref means that the formerly missing font is no longer missing, so simply run the two test-cases under the (stricter) assumption of SkipMissingFonts=false.
2025-09-30 18:35:45 +01:00
Richard Flamsholt
33a8d829ee Update test Issue874: Also more text on page 2
Page two has four more characters, which are now detected thanks to this xref-stream fix.
2025-09-30 18:35:45 +01:00
Richard Flamsholt
57921c7e9b Update test Issue874: Now finds more text on page 1
With the fix for including associated streams, this test now finds more text on the first page. I've verified using Aspose.PDF and by viewing the ErcotFacts.pdf file being tested that yes, it was indeed missing part of the text before.
2025-09-30 18:35:45 +01:00
ricflams
5a6b3970f0 Add table-xref's associated stream-xrefs
- If an XrefTable has an associated stream, as indicated via the XrefStm-property, then read and add that XrefStream
- Any table can have 0 or 1 such associated streams
- A caveat: such an associated stream could in theory also be part of the Parts sequence, in which case it would be encountered twice: once while looping through all the parts together with the regular tables, and again through its association with one of those tables. That seems harmless, since the offsets are eventually flattened anyway and stored by their offset key in a mapping table.
2025-09-30 18:35:45 +01:00
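The caveat about a stream being encountered twice can be illustrated with a small flattening sketch. Everything here is an assumption made for illustration: `XrefFlattener`, `Flatten`, and the choice of object number as dictionary key are hypothetical and not taken from PdfPig.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of why duplicate parts are harmless once flattened.
public static class XrefFlattener
{
    // Each part maps a key (assumed here: object number) to a byte offset.
    // Parts are applied in order; a part seen twice rewrites identical entries.
    public static Dictionary<long, long> Flatten(IEnumerable<Dictionary<long, long>> parts)
    {
        var merged = new Dictionary<long, long>();
        foreach (var part in parts)
        {
            foreach (var entry in part)
            {
                // The indexer overwrites an existing key instead of throwing.
                merged[entry.Key] = entry.Value;
            }
        }

        return merged;
    }
}
```

Passing the same part twice yields the same merged dictionary as passing it once, which is the property the commit message relies on.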
ricflams
397ccb15d6 Add xref-streams tied to any parts, not just the first
On a large sample of pdf-files PdfPig failed to read the correct StructTree-object for about 1% of them. The StructTree object was simply missing in the CrossReferenceTable.CrossReferenceTable.
It turned out that the constructed CrossReferenceTable could miss Stream-parts if there were multiple Table-parts because a stream will only be added if it's associated with the very first Table-part. The remedy would seem to be to check for and add streams that are associated with any of the Table-parts, not just the first one.
On a sample of 72 files where this failed, this change fixed the StructTree for all of them.
2025-09-30 18:35:45 +01:00
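The difference between the old and new behaviour described above can be sketched as follows. The types and names (`TablePart`, `StreamCollector`, `XRefStmOffset`) are hypothetical stand-ins chosen for this illustration; they are not PdfPig's real types.

```csharp
using System.Collections.Generic;

// Hypothetical shape: a table part that may point at one associated stream.
public sealed record TablePart(long Offset, long? XRefStmOffset);

public static class StreamCollector
{
    // Behaviour as described before the fix: only the first table is consulted.
    public static List<long> FirstTableOnly(IReadOnlyList<TablePart> tables)
    {
        var result = new List<long>();
        if (tables.Count > 0 && tables[0].XRefStmOffset is long first)
        {
            result.Add(first);
        }

        return result;
    }

    // Behaviour as described after the fix: every table part is consulted.
    public static List<long> AllTables(IReadOnlyList<TablePart> tables)
    {
        var result = new List<long>();
        foreach (var table in tables)
        {
            if (table.XRefStmOffset is long offset)
            {
                result.Add(offset);
            }
        }

        return result;
    }
}
```

When only the second table part carries a stream reference, `FirstTableOnly` returns nothing while `AllTables` returns that stream's offset, which mirrors the missing-object symptom reported in the commit message.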
BobLd
ca284e0cb9 Use pageFactoryCache.Clear() in Pages dispose and fix #1170 2025-09-28 17:18:00 +01:00
BobLd
b2f4ca8839 Add GetDescent() and GetAscent() methods to IFont, improve font matrix for TrueTypeSimpleFont and TrueTypeStandard14FallbackSimpleFont and add loose bounding box to Letter
2025-09-21 15:07:52 +01:00
BobLd
008959457a Expose letter's font via GetFont(), mark Font property as obsolete and use FontDetails instead
2025-09-20 17:11:38 +01:00
BobLd
a53d96cb73 Use record struct in FileHeaderOffset 2025-09-20 13:45:50 +01:00
EliotJones
efdedb9495 handle case where offsets are out of range
Default to returning an empty glyph when the offset is outside the range of the file length. This fixes file 12623, where the TrueType file is completely broken.
2025-09-14 15:26:12 +01:00
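The bounds check described in this commit can be sketched in isolation. This is a simplified illustration, not the parser itself: `GlyphOffsetCheck` and `FindReadableGlyphs` are hypothetical names, and the sketch only reports which glyphs would be read rather than parsing them.

```csharp
using System;

// Hypothetical sketch of the "substitute an empty glyph" decision.
public static class GlyphOffsetCheck
{
    // 'offsets' has one more entry than there are glyphs; glyph i spans
    // offsets[i] .. offsets[i + 1]. Returns true for glyphs that would be read,
    // false for glyphs that would be replaced by the empty glyph.
    public static bool[] FindReadableGlyphs(uint[] offsets, long dataLength)
    {
        int glyphCount = Math.Max(0, offsets.Length - 1);
        var readable = new bool[glyphCount];

        for (int i = 0; i < glyphCount; i++)
        {
            uint offset = offsets[i];

            // Zero-length (or negative-length) entry: empty glyph.
            if (offsets[i + 1] <= offset)
            {
                continue;
            }

            // Start offset beyond the end of the data: invalid table, empty glyph.
            if (offset >= dataLength)
            {
                continue;
            }

            readable[i] = true;
        }

        return readable;
    }
}
```

Note that only the start offset is validated here, matching the check shown in the diff; a glyph that starts inside the data but runs past its end is not caught by this test alone.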
BobLd
eb906a776d Handle non seekable stream by copying it into a memory stream and fix #1146 2025-09-14 14:42:59 +01:00
67 changed files with 1276 additions and 330 deletions

1
.gitignore vendored
View File

@@ -249,3 +249,4 @@ _Pvt_Extensions
/docs/doxygen
/tools/UglyToad.PdfPig.ConsoleRunner/Properties/launchSettings.json
/src/UglyToad.PdfPig.Tests/Images/Files/Pdf/indexed-png-with-mask.png

View File

@@ -224,9 +224,18 @@
[Pure]
public double TransformX(double x)
{
var xt = A * x + C * 0 + E;
return A * x + E; // + C * 0
}
return xt;
/// <summary>
/// Transform a Y coordinate using this transformation matrix.
/// </summary>
/// <param name="y">The Y coordinate.</param>
/// <returns>The transformed Y coordinate.</returns>
public double TransformY(double y)
{
return D * y + F;
}
/// <summary>

View File

@@ -216,71 +216,37 @@
yield return group.Select(i => elements[i]).ToList();
}
}
/// <summary>
/// Group elements using Depth-first search.
/// <para>https://en.wikipedia.org/wiki/Depth-first_search</para>
/// </summary>
/// <param name="edges">The graph. edges[i] = j indicates that there is an edge between i and j.</param>
/// <returns>A List of HashSets containing the grouped indexes.</returns>
internal static List<HashSet<int>> GroupIndexes(int[] edges)
internal static List<List<int>> GroupIndexes(int[] edges)
{
int[][] adjacency = new int[edges.Length][];
// Improved thanks to https://github.com/UglyToad/PdfPig/issues/1178
var adjacency = new List<int>[edges.Length];
for (int i = 0; i < edges.Length; i++)
{
HashSet<int> matches = new HashSet<int>();
if (edges[i] != -1) matches.Add(edges[i]);
for (int j = 0; j < edges.Length; j++)
{
if (edges[j] == i) matches.Add(j);
}
adjacency[i] = matches.ToArray();
adjacency[i] = new List<int>();
}
List<HashSet<int>> groupedIndexes = new List<HashSet<int>>();
// one pass O(n)
for (int i = 0; i < edges.Length; i++)
{
int j = edges[i];
if (j != -1)
{
// i <-> j
adjacency[i].Add(j);
adjacency[j].Add(i);
}
}
List<List<int>> groupedIndexes = new List<List<int>>();
bool[] isDone = new bool[edges.Length];
for (int p = 0; p < edges.Length; p++)
{
if (isDone[p]) continue;
groupedIndexes.Add(DfsIterative(p, adjacency, ref isDone));
}
return groupedIndexes;
}
/// <summary>
/// Group elements using Depth-first search.
/// <para>https://en.wikipedia.org/wiki/Depth-first_search</para>
/// </summary>
/// <param name="edges">The graph. edges[i] = [j, k, l, ...] indicates that there is an edge between i and each element j, k, l, ...</param>
/// <returns>A List of HashSets containing the grouped indexes.</returns>
internal static List<HashSet<int>> GroupIndexes(int[][] edges)
{
int[][] adjacency = new int[edges.Length][];
for (int i = 0; i < edges.Length; i++)
{
HashSet<int> matches = new HashSet<int>();
for (int j = 0; j < edges[i].Length; j++)
if (isDone[p])
{
if (edges[i][j] != -1) matches.Add(edges[i][j]);
continue;
}
for (int j = 0; j < edges.Length; j++)
{
for (int k = 0; k < edges[j].Length; k++)
{
if (edges[j][k] == i) matches.Add(j);
}
}
adjacency[i] = matches.ToArray();
}
List<HashSet<int>> groupedIndexes = new List<HashSet<int>>();
bool[] isDone = new bool[edges.Length];
for (int p = 0; p < edges.Length; p++)
{
if (isDone[p]) continue;
groupedIndexes.Add(DfsIterative(p, adjacency, ref isDone));
}
return groupedIndexes;
@@ -290,22 +256,33 @@
/// Depth-first search
/// <para>https://en.wikipedia.org/wiki/Depth-first_search</para>
/// </summary>
private static HashSet<int> DfsIterative(int s, int[][] adj, ref bool[] isDone)
private static List<int> DfsIterative(int s, List<int>[] adj, ref bool[] isDone)
{
HashSet<int> group = new HashSet<int>();
Stack<int> S = new Stack<int>();
List<int> group = new List<int>();
Stack<int> S = new Stack<int>(4);
S.Push(s);
isDone[s] = true;
while (S.Count > 0)
{
var u = S.Pop();
if (!isDone[u])
group.Add(u);
#if NET
var currentAdj = System.Runtime.InteropServices.CollectionsMarshal.AsSpan(adj[u]);
int count = currentAdj.Length;
#else
var currentAdj = adj[u];
int count = currentAdj.Count;
#endif
for (int i = 0; i < count; ++i)
{
group.Add(u);
isDone[u] = true;
foreach (var v in adj[u])
var v = currentAdj[i];
ref bool done = ref isDone[v];
if (!done)
{
S.Push(v);
done = true;
}
}
}
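The reworked grouping above builds the adjacency lists in a single pass and then runs an iterative depth-first search per unvisited node. A self-contained sketch of that idea follows; it is a simplified rendering (class name `Grouping` is hypothetical, and the `CollectionsMarshal.AsSpan` fast path is left out), not a copy of the library code.

```csharp
using System.Collections.Generic;

// Simplified sketch of one-pass adjacency building plus iterative DFS.
public static class Grouping
{
    // edges[i] = j means there is an edge between i and j; -1 means no edge.
    public static List<List<int>> GroupIndexes(int[] edges)
    {
        var adjacency = new List<int>[edges.Length];
        for (int i = 0; i < edges.Length; i++)
        {
            adjacency[i] = new List<int>();
        }

        // One pass: record each edge in both directions.
        for (int i = 0; i < edges.Length; i++)
        {
            int j = edges[i];
            if (j != -1)
            {
                adjacency[i].Add(j);
                adjacency[j].Add(i);
            }
        }

        var groups = new List<List<int>>();
        var isDone = new bool[edges.Length];
        for (int p = 0; p < edges.Length; p++)
        {
            if (isDone[p])
            {
                continue;
            }

            groups.Add(Dfs(p, adjacency, isDone));
        }

        return groups;
    }

    private static List<int> Dfs(int start, List<int>[] adjacency, bool[] isDone)
    {
        var group = new List<int>();
        var stack = new Stack<int>();
        stack.Push(start);
        isDone[start] = true;

        while (stack.Count > 0)
        {
            int u = stack.Pop();
            group.Add(u);

            foreach (int v in adjacency[u])
            {
                // Mark on push so a node is never pushed twice.
                if (!isDone[v])
                {
                    isDone[v] = true;
                    stack.Push(v);
                }
            }
        }

        return group;
    }
}
```

Marking a node as done when it is pushed, rather than when it is popped, is what lets the group be a plain list: no node can enter the stack twice, so no set is needed to remove duplicates.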

View File

@@ -68,25 +68,11 @@
// update textSequence?
// update font details to bold
var fontDetails = new FontDetails(letter.Font.Name, true, letter.Font.Weight, letter.Font.IsItalic);
var newLetter = new Letter(letter.Value,
letter.GlyphRectangle,
letter.StartBaseLine,
letter.EndBaseLine,
letter.Width,
letter.FontSize,
fontDetails,
letter.RenderingMode,
letter.StrokeColor,
letter.FillColor,
letter.PointSize,
letter.TextSequence);
// update markedContentStack?
// update letters
cleanLetters[duplicatesOverlappingIndex] = newLetter;
cleanLetters[duplicatesOverlappingIndex] = letter.AsBold();
}
}

View File

@@ -60,11 +60,12 @@
letter = new Letter(
" ",
letter.GlyphRectangle,
letter.GlyphRectangleLoose,
letter.StartBaseLine,
letter.EndBaseLine,
letter.Width,
letter.FontSize,
letter.Font,
letter.GetFont()!,
letter.RenderingMode,
letter.StrokeColor,
letter.FillColor,

View File

@@ -770,6 +770,8 @@ namespace UglyToad.PdfPig.Fonts.CompactFontFormat.CharStrings
}
}
values.TrimExcess();
return new Type2CharStrings.CommandSequence(values, commandIdentifiers);
}

View File

@@ -257,6 +257,10 @@
gidToStringIdAndNameMap[gid++] = pair;
}
#if NET
gidToStringIdAndNameMap.TrimExcess();
#endif
glyphIdToStringIdAndName = gidToStringIdAndNameMap;
}

View File

@@ -121,14 +121,23 @@
for (var i = 0; i < glyphCount; i++)
{
if (offsets[i + 1] <= offsets[i])
var offset = offsets[i];
if (offsets[i + 1] <= offset)
{
// empty glyph
result[i] = emptyGlyph;
continue;
}
data.Seek(offsets[i]);
// Invalid table, just sub in the empty glyph
if (offset >= data.Length)
{
result[i] = emptyGlyph;
continue;
}
data.Seek(offset);
var contourCount = data.ReadSignedShort();

View File

@@ -7,6 +7,7 @@
<GenerateDocumentationFile>true</GenerateDocumentationFile>
<SignAssembly>true</SignAssembly>
<AssemblyOriginatorKeyFile>..\pdfpig.snk</AssemblyOriginatorKeyFile>
<Nullable>annotations</Nullable>
</PropertyGroup>
<ItemGroup>
<None Remove="Resources\AdobeFontMetrics\*" />

View File

@@ -1,6 +1,7 @@
namespace UglyToad.PdfPig.Tests.ContentStream
{
using PdfPig.Core;
using System.Globalization;
public class IndirectReferenceTests
{
@@ -33,50 +34,59 @@
[Fact]
public void IndirectReferenceHashTest()
{
var reference0 = new IndirectReference(1574, 690);
Assert.Equal(1574, reference0.ObjectNumber);
Assert.Equal(690, reference0.Generation);
CultureInfo lastCulture = CultureInfo.CurrentCulture;
CultureInfo.CurrentCulture = new CultureInfo("en-US");
try
{
var reference0 = new IndirectReference(1574, 690);
Assert.Equal(1574, reference0.ObjectNumber);
Assert.Equal(690, reference0.Generation);
var reference1 = new IndirectReference(-1574, 690);
Assert.Equal(-1574, reference1.ObjectNumber);
Assert.Equal(690, reference1.Generation);
var reference1 = new IndirectReference(-1574, 690);
Assert.Equal(-1574, reference1.ObjectNumber);
Assert.Equal(690, reference1.Generation);
var reference2 = new IndirectReference(58949797283757, 16);
Assert.Equal(58949797283757, reference2.ObjectNumber);
Assert.Equal(16, reference2.Generation);
var reference2 = new IndirectReference(58949797283757, 16);
Assert.Equal(58949797283757, reference2.ObjectNumber);
Assert.Equal(16, reference2.Generation);
var reference3 = new IndirectReference(-58949797283757, ushort.MaxValue);
Assert.Equal(-58949797283757, reference3.ObjectNumber);
Assert.Equal(ushort.MaxValue, reference3.Generation);
var reference3 = new IndirectReference(-58949797283757, ushort.MaxValue);
Assert.Equal(-58949797283757, reference3.ObjectNumber);
Assert.Equal(ushort.MaxValue, reference3.Generation);
var reference4 = new IndirectReference(140737488355327, ushort.MaxValue);
Assert.Equal(140737488355327, reference4.ObjectNumber);
Assert.Equal(ushort.MaxValue, reference4.Generation);
var reference4 = new IndirectReference(140737488355327, ushort.MaxValue);
Assert.Equal(140737488355327, reference4.ObjectNumber);
Assert.Equal(ushort.MaxValue, reference4.Generation);
var reference5 = new IndirectReference(-140737488355327, ushort.MaxValue);
Assert.Equal(-140737488355327, reference5.ObjectNumber);
Assert.Equal(ushort.MaxValue, reference5.Generation);
var reference5 = new IndirectReference(-140737488355327, ushort.MaxValue);
Assert.Equal(-140737488355327, reference5.ObjectNumber);
Assert.Equal(ushort.MaxValue, reference5.Generation);
var ex0 = Assert.Throws<ArgumentOutOfRangeException>(() => new IndirectReference(140737488355328, 0));
Assert.StartsWith("Object number must be between -140,737,488,355,327 and 140,737,488,355,327.", ex0.Message);
var ex1 = Assert.Throws<ArgumentOutOfRangeException>(() => new IndirectReference(-140737488355328, 0));
Assert.StartsWith("Object number must be between -140,737,488,355,327 and 140,737,488,355,327.", ex1.Message);
var ex2 = Assert.Throws<ArgumentOutOfRangeException>(() => new IndirectReference(1574, -1));
Assert.StartsWith("Generation number must not be a negative value.", ex2.Message);
// We make sure object number is still correct even if generation is not
var reference6 = new IndirectReference(1574, int.MaxValue);
Assert.Equal(1574, reference6.ObjectNumber);
var reference7 = new IndirectReference(-1574, ushort.MaxValue + 10);
Assert.Equal(-1574, reference7.ObjectNumber);
var ex0 = Assert.Throws<ArgumentOutOfRangeException>(() => new IndirectReference(140737488355328, 0));
Assert.StartsWith("Object number must be between -140,737,488,355,327 and 140,737,488,355,327.", ex0.Message);
var ex1 = Assert.Throws<ArgumentOutOfRangeException>(() => new IndirectReference(-140737488355328, 0));
Assert.StartsWith("Object number must be between -140,737,488,355,327 and 140,737,488,355,327.", ex1.Message);
var reference9 = new IndirectReference(-140737488355327, ushort.MaxValue + 10);
Assert.Equal(-140737488355327, reference9.ObjectNumber);
var ex2 = Assert.Throws<ArgumentOutOfRangeException>(() => new IndirectReference(1574, -1));
Assert.StartsWith("Generation number must not be a negative value.", ex2.Message);
var reference10 = new IndirectReference(140737488355327, ushort.MaxValue * 10);
Assert.Equal(140737488355327, reference10.ObjectNumber);
// We make sure object number is still correct even if generation is not
var reference6 = new IndirectReference(1574, int.MaxValue);
Assert.Equal(1574, reference6.ObjectNumber);
var reference7 = new IndirectReference(-1574, ushort.MaxValue + 10);
Assert.Equal(-1574, reference7.ObjectNumber);
var reference9 = new IndirectReference(-140737488355327, ushort.MaxValue + 10);
Assert.Equal(-140737488355327, reference9.ObjectNumber);
var reference10 = new IndirectReference(140737488355327, ushort.MaxValue * 10);
Assert.Equal(140737488355327, reference10.ObjectNumber);
}
finally
{
CultureInfo.CurrentCulture = lastCulture;
}
}
[Fact]

View File

@@ -2,16 +2,23 @@
{
internal static class DlaHelper
{
private static readonly string DlaFolder = Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "..", "..", "..", "Dla", "Documents"));
private static readonly string IntegrationFolder = Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "..", "..", "..", "Integration", "Documents"));
public static string GetDocumentPath(string name, bool isPdf = true)
{
var documentFolder = Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "..", "..", "..", "Dla", "Documents"));
if (!name.EndsWith(".pdf") && isPdf)
{
name += ".pdf";
}
return Path.Combine(documentFolder, name);
string doc = Path.Combine(DlaFolder, name);
if (File.Exists(doc))
{
return doc;
}
return Path.Combine(IntegrationFolder, name);
}
}
}

View File

@@ -4,19 +4,84 @@
public class NearestNeighbourWordExtractorTests
{
[Fact]
public void Words2559Doc()
public static IEnumerable<object[]> DataWords => new[]
{
// Microsoft Word count of words = 2559
new object[]
{
"2559 words.pdf",
5118,
2559
},
new object[]
{
"fseprd1102849.pdf",
12903,
11177
},
new object[]
{
"90 180 270 rotated.pdf",
589,
292
},
new object[]
{
"complex rotated.pdf",
805,
403
},
new object[]
{
"no horizontal distance.pdf",
4,
2
},
new object[]
{
"no vertical distance.pdf",
22,
10
},
new object[]
{
"no vertical horizontal distance.pdf",
4,
2
},
new object[]
{
"Random 2 Columns Lists Hyph - Justified.pdf",
1191,
607
},
new object[]
{
"caly-issues-56-1.pdf",
184,
156
},
new object[]
{
"caly-issues-58-2.pdf",
49,
49
},
};
using (var document = PdfDocument.Open(DlaHelper.GetDocumentPath("2559 words.pdf")))
[SkippableTheory]
[MemberData(nameof(DataWords))]
public void WordCount(string path, int wordCount, int noSpacesWordCount)
{
using (var document = PdfDocument.Open(DlaHelper.GetDocumentPath(path)))
{
var page = document.GetPage(1);
var words = NearestNeighbourWordExtractor.Instance.GetWords(page.Letters).ToArray();
Assert.Equal(wordCount, words.Length);
var noSpacesWords = words.Where(x => !string.IsNullOrEmpty(x.Text.Trim())).ToArray();
Assert.Equal(2559, noSpacesWords.Length);
Assert.Equal(noSpacesWordCount, noSpacesWords.Length);
}
}
}

View File

@@ -1,14 +1,12 @@
namespace UglyToad.PdfPig.Tests.Dla
{
using System;
using PdfFonts;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using UglyToad.PdfPig.Content;
using UglyToad.PdfPig.Core;
using UglyToad.PdfPig.DocumentLayoutAnalysis;
using UglyToad.PdfPig.DocumentLayoutAnalysis.ReadingOrderDetector;
using UglyToad.PdfPig.Core;
public class UnsupervisedReadingOrderTests
{
@@ -62,10 +60,11 @@
private static TextBlock CreateFakeTextBlock(PdfRectangle boundingBox)
{
var letter = new Letter("a",
boundingBox,
boundingBox,
boundingBox.BottomLeft,
boundingBox.BottomRight,
10, 1, null, TextRenderingMode.NeitherClip, null, null, 0, 0);// These don't matter
10, 1, (FontDetails)null, TextRenderingMode.NeitherClip, null, null, 0, 0);// These don't matter
var leftTextBlock = new TextBlock(new[] { new TextLine(new[] { new Word(new[] { letter }) }) });
return leftTextBlock;
}

View File

@@ -4,6 +4,7 @@ using UglyToad.PdfPig.Tests.Dla;
namespace UglyToad.PdfPig.Tests.Fonts.SystemFonts
{
using PdfPig.Core;
using PdfPig.Geometry;
public class Linux
{
@@ -68,7 +69,10 @@ namespace UglyToad.PdfPig.Tests.Fonts.SystemFonts
Assert.Equal(expectedData.TopLeft.Y, current.GlyphRectangle.TopLeft.Y, 6);
Assert.Equal(expectedData.Width, current.GlyphRectangle.Width, 6);
Assert.Equal(expectedData.Height, current.GlyphRectangle.Height, 6);
Assert.Equal(expectedData.Rotation, current.GlyphRectangle.Rotation, 3);
Assert.Equal(expectedData.Rotation, current.GlyphRectangle.Rotation, 3);
Assert.True(current.GlyphRectangle.IntersectsWith(current.GlyphRectangleLoose));
Assert.Equal(current.GlyphRectangle.Rotation, current.GlyphRectangleLoose.Rotation, 3);
}
}
}

View File

@@ -229,5 +229,19 @@ namespace UglyToad.PdfPig.Tests.Fonts.TrueType.Parser
Assert.NotNull(font.TableRegister.NameTable);
Assert.NotEmpty(font.TableRegister.NameTable.NameRecords);
}
[Fact]
public void Parse12623CorruptFileAndGetGlyphs()
{
var bytes = TrueTypeTestHelper.GetFileBytes("corrupt-12623");
var input = new TrueTypeDataBytes(new MemoryInputBytes(bytes));
var font = TrueTypeFontParser.Parse(input);
Assert.NotNull(font);
font.TryGetPath(1, out _);
}
}
}

Binary file not shown (before; size: 79 KiB).

View File

@@ -9,5 +9,17 @@
using var document = PdfDocument.Open(path);
Assert.Equal(3, document.NumberOfPages);
}
[Fact]
public void CanReadDocumentWithCircularXRef()
{
string path = IntegrationHelpers.GetSpecificTestDocumentPath("B17-2000-transportation-fuels.pdf");
// If parser can't deal with xrefs that have circular references then
// opening the document will loop forever
using var document = PdfDocument.Open(path);
Assert.Equal(1, document.NumberOfPages);
}
}
}

Binary file not shown (after; size: 196 KiB).

View File

@@ -86,8 +86,8 @@
var letter = page.Letters[l];
var expected = DataBoldItalic[l];
Assert.Equal((string)expected[0], letter.Value);
Assert.Equal((bool)expected[1], letter.Font.IsBold);
Assert.Equal((bool)expected[2], letter.Font.IsItalic);
Assert.Equal((bool)expected[1], letter.FontDetails.IsBold);
Assert.Equal((bool)expected[2], letter.FontDetails.IsItalic);
}
}
}

View File

@@ -4,9 +4,76 @@
using DocumentLayoutAnalysis.PageSegmenter;
using DocumentLayoutAnalysis.WordExtractor;
using PdfPig.Core;
using SkiaSharp;
public class GithubIssuesTests
{
[Fact]
public void Issue1183()
{
var path = IntegrationHelpers.GetDocumentPath("test_a.pdf");
byte[] expected =
[
82, 85, 134, 255, 87, 90, 139, 255, 81, 84, 133, 255, 87, 89, 139, 255, 89, 91, 141, 255, 81, 83, 133,
255, 84, 86, 136, 255, 84, 86, 136, 255, 70, 59, 113, 255, 69, 62, 116, 255, 75, 73, 126, 255, 45, 48,
100, 255, 42, 48, 99, 255, 50, 55, 107, 255, 56, 59, 111, 255, 64, 66, 118, 255, 68, 63, 118, 255, 61,
56, 111, 255, 70, 64, 120, 255, 67, 62, 117, 255, 61, 56, 111, 255, 68, 63, 118, 255, 68, 62, 118, 255,
59, 54, 109, 255, 61, 60, 117, 255, 69, 65, 122, 255, 67, 59, 116, 255, 71, 62, 118, 255, 66, 60, 115,
255, 47, 49, 102, 255, 40, 51, 102, 255, 35, 51, 100, 255, 70, 58, 114, 255, 68, 56, 112, 255, 76, 65,
121, 255, 68, 58, 114, 255, 66, 58, 114, 255, 71, 64, 119, 255, 62, 56, 111, 255, 67, 62, 117, 255, 77,
61, 118, 255, 71, 56, 113, 255, 76, 63, 119, 255, 74, 63, 118, 255, 63, 55, 108, 255, 71, 64, 116, 255,
73, 68, 119, 255, 52, 49, 99, 255, 38, 51, 99, 255, 49, 62, 110, 255, 39, 51, 100, 255, 46, 55, 106,
255, 50, 55, 107, 255, 63, 62, 116, 255, 67, 60, 116, 255, 71, 60, 116, 255, 67, 58, 112, 255, 68, 61,
114, 255, 70, 67, 119, 255, 50, 50, 101, 255, 42, 47, 96, 255, 49, 59, 106, 255, 40, 54, 100, 255, 42,
57, 103, 255, 51, 51, 102, 255, 67, 60, 112, 255, 73, 62, 114, 255, 71, 65, 117, 255, 48, 53, 103, 255,
45, 55, 104, 255, 49, 55, 105, 255, 63, 63, 114, 255, 68, 59, 115, 255, 71, 59, 115, 255, 73, 59, 115,
255, 74, 61, 118, 255, 66, 58, 114, 255, 50, 51, 105, 255, 39, 51, 104, 255, 34, 52, 103, 255, 64, 60,
116, 255, 67, 64, 119, 255, 66, 66, 120, 255, 46, 49, 102, 255, 45, 51, 102, 255, 52, 61, 111, 255, 39,
51, 99, 255, 41, 54, 102, 255, 42, 54, 100, 255, 43, 53, 99, 255, 47, 55, 103, 255, 51, 56, 104, 255,
56, 57, 108, 255, 67, 65, 117, 255, 67, 63, 116, 255, 52, 47, 100, 255, 44, 55, 106, 255, 44, 56, 106,
255, 42, 54, 103, 255, 42, 54, 102, 255, 40, 52, 100, 255, 41, 52, 99, 255, 45, 57, 103, 255, 42, 53,
99, 255, 38, 54, 95, 255, 39, 55, 97, 255, 47, 64, 105, 255, 37, 53, 95, 255, 37, 53, 95, 255, 46, 63,
104, 255, 39, 55, 96, 255, 42, 58, 99, 255, 41, 55, 105, 255, 45, 55, 106, 255, 46, 51, 103, 255, 51,
51, 103, 255, 63, 61, 114, 255, 70, 68, 121, 255, 60, 60, 113, 255, 46, 48, 100, 255, 49, 51, 101, 255,
51, 52, 103, 255, 58, 58, 109, 255, 69, 66, 119, 255, 64, 60, 113, 255, 61, 55, 109, 255, 70, 62, 118,
255, 67, 58, 114, 255, 72, 59, 115, 255, 70, 58, 115, 255, 72, 62, 118, 255, 61, 55, 110, 255, 64, 62,
116, 255, 65, 65, 119, 255, 47, 50, 104, 255, 52, 56, 109, 255, 39, 53, 106, 255, 41, 54, 107, 255, 40,
50, 102, 255, 45, 51, 103, 255, 64, 66, 117, 255, 62, 61, 112, 255, 67, 63, 114, 255, 53, 47, 98, 255,
49, 54, 101, 255, 51, 56, 104, 255, 43, 48, 95, 255, 50, 55, 102, 255, 49, 54, 102, 255, 42, 47, 94,
255, 51, 56, 103, 255, 47, 52, 100, 255, 72, 62, 114, 255, 71, 62, 114, 255, 72, 67, 119, 255, 52, 52,
103, 255, 44, 48, 99, 255, 48, 57, 106, 255, 39, 52, 100, 255, 43, 58, 106, 255, 43, 51, 98, 255, 44,
52, 99, 255, 48, 57, 104, 255, 46, 55, 102, 255, 41, 50, 97, 255, 45, 55, 101, 255, 49, 59, 105, 255,
43, 53, 100, 255, 51, 57, 106, 255, 41, 49, 98, 255, 40, 52, 100, 255, 45, 60, 107, 255, 38, 53, 101,
255, 36, 48, 96, 255, 46, 54, 102, 255, 49, 55, 104, 255, 44, 55, 104, 255, 46, 56, 105, 255, 48, 58,
107, 255, 41, 49, 99, 255, 43, 50, 100, 255, 52, 59, 108, 255, 50, 55, 105, 255, 50, 55, 105, 255, 43,
54, 105, 255, 42, 51, 102, 255, 45, 53, 104, 255, 45, 49, 101, 255, 63, 63, 116, 255, 66, 63, 116, 255,
68, 63, 117, 255, 62, 55, 109, 255, 74, 60, 120, 255, 73, 59, 119, 255, 72, 58, 119, 255, 76, 62, 122,
255, 74, 60, 120, 255, 71, 57, 118, 255, 75, 61, 121, 255, 76, 62, 123, 255
];
using (var document = PdfDocument.Open(path, new ParsingOptions() { UseLenientParsing = true }))
{
var page = document.GetPage(16);
var images = page.GetImages().ToArray();
Assert.Single(images);
var image = images[0];
Assert.True(image.TryGetPng(out var bytes));
File.WriteAllBytes("test_a_16.png", bytes);
using (SKBitmap actual = SKBitmap.Decode(bytes, new SKImageInfo(431, 690, SKColorType.Bgra8888)))
{
var pixels = actual.GetPixelSpan();
Assert.Equal(1189560, pixels.Length);
Assert.Equal(expected, pixels.Slice(0, 4 * 200).ToArray());
}
}
}
[Fact]
public void Issue1156()
{
@@ -65,7 +132,7 @@
var path = IntegrationHelpers.GetSpecificTestDocumentPath("StackOverflow_Issue_1122.pdf");
var ex = Assert.Throws<PdfDocumentFormatException>(() => PdfDocument.Open(path, new ParsingOptions() { UseLenientParsing = true }));
Assert.StartsWith("Reached maximum search depth while getting indirect reference.", ex.Message);
Assert.Equal("The root object in the trailer did not resolve to a readable dictionary.", ex.Message);
}
[Fact]
@@ -124,7 +191,7 @@
{
var path = IntegrationHelpers.GetSpecificTestDocumentPath("SpookyPass.pdf");
var ex = Assert.Throws<PdfDocumentFormatException>(() => PdfDocument.Open(path, new ParsingOptions() { UseLenientParsing = true }));
Assert.Equal("Avoiding infinite recursion in ObjectLocationProvider.TryGetOffset() as 'offset' and 'reference.ObjectNumber' have the same value and opposite signs.", ex.Message);
Assert.Equal("The root object in the trailer did not resolve to a readable dictionary.", ex.Message);
}
[Fact]
@@ -289,7 +356,8 @@
using (var document = PdfDocument.Open(path, new ParsingOptions() { UseLenientParsing = true, SkipMissingFonts = true }))
{
var page = document.GetPage(13);
Assert.Throws<OverflowException>(() => DocstrumBoundingBoxes.Instance.GetBlocks(page.GetWords()));
// This used to fail with an overflow exception when we failed to validate the zlib encoded data
Assert.NotNull(DocstrumBoundingBoxes.Instance.GetBlocks(page.GetWords()));
}
}
@@ -472,19 +540,13 @@
{
var doc = IntegrationHelpers.GetDocumentPath("ErcotFacts.pdf");
using (var document = PdfDocument.Open(doc, new ParsingOptions() { UseLenientParsing = true, SkipMissingFonts = true }))
{
var page1 = document.GetPage(1);
Assert.Equal(1788, page1.Letters.Count);
var page2 = document.GetPage(2);
Assert.Equal(2430, page2.Letters.Count);
}
using (var document = PdfDocument.Open(doc, new ParsingOptions() { UseLenientParsing = true, SkipMissingFonts = false }))
{
var ex = Assert.Throws<ArgumentNullException>(() => document.GetPage(1));
Assert.StartsWith("Value cannot be null.", ex.Message);
var page1 = document.GetPage(1);
Assert.Equal(1939, page1.Letters.Count);
var page2 = document.GetPage(2);
Assert.Equal(2434, page2.Letters.Count);
}
}

View File

@@ -1,5 +1,7 @@
namespace UglyToad.PdfPig.Tests.Integration
{
using PdfPig.Geometry;
public class IntegrationDocumentTests
{
private static readonly Lazy<string> DocumentFolder = new Lazy<string>(() => Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "..", "..", "..", "Integration", "Documents")));
@@ -11,6 +13,36 @@
"cmap-parsing-exception.pdf"
];
[Theory]
[MemberData(nameof(GetAllDocuments))]
public void CheckGlyphLooseBoundingBoxes(string documentName)
{
// Add the full path back on; we removed it so the short name shows in the test explorer.
documentName = Path.Combine(DocumentFolder.Value, documentName);
using (var document = PdfDocument.Open(documentName, new ParsingOptions { UseLenientParsing = true }))
{
for (var i = 0; i < document.NumberOfPages; i++)
{
var page = document.GetPage(i + 1);
foreach (var letter in page.Letters)
{
var bbox = letter.GlyphRectangle;
if (bbox.Height > 0)
{
if (letter.GlyphRectangleLoose.Height <= 0)
{
_ = letter.GetFont().GetAscent();
}
Assert.True(letter.GlyphRectangleLoose.Height > 0, $"Page {i + 1}");
}
}
}
}
}
[Theory]
[MemberData(nameof(GetAllDocuments))]
public void CanReadAllPages(string documentName)

View File

@@ -53,6 +53,8 @@ namespace UglyToad.PdfPig.Tests.Integration
Assert.Contains(page.Letters, x => x.GlyphRectangle.Width != 0);
Assert.Contains(page.Letters, x => x.GlyphRectangle.Height != 0);
Assert.Contains(page.Letters, x => x.GlyphRectangleLoose.Width != 0);
Assert.Contains(page.Letters, x => x.GlyphRectangleLoose.Height != 0);
}
}
}

View File

@@ -147,6 +147,13 @@
Run(Type3FontZeroHeight, 1255);
}
[Fact]
public void test_a()
{
// Rendered glyphs are not correct, but we use the grid to assess
Run("test_a", 1584, 1);
}
private static void Run(string file, int imageHeight = 792, int pageNo = 1)
{
var pdfFileName = GetFilename(file);
@@ -193,6 +200,32 @@
d.SaveTo(fs);
}
}
using (var bitmap = SKBitmap.FromImage(image))
using (var graphics = new SKCanvas(bitmap))
{
foreach (var letter in page.Letters)
{
DrawRectangle(letter.GlyphRectangleLoose, graphics, violetPen, imageHeight, scale);
}
graphics.Flush();
var imageName = $"{file}_loose.jpg";
if (!Directory.Exists(OutputPath))
{
Directory.CreateDirectory(OutputPath);
}
var savePath = Path.Combine(OutputPath, imageName);
using (var fs = new FileStream(savePath, FileMode.Create))
using (SKData d = bitmap.Encode(SKEncodedImageFormat.Jpeg, 100))
{
d.SaveTo(fs);
}
}
}
}
@@ -220,7 +253,11 @@
pdf = pdf.Replace(".pdf", ".jpg");
return SKImage.FromEncodedData(pdf);
if (File.Exists(pdf))
{
return SKImage.FromEncodedData(pdf);
}
return SKImage.FromEncodedData(pdf.Replace(".jpg", ".png"));
}
}
}

View File

@@ -85,6 +85,31 @@
d.SaveTo(fs);
}
}
using (var picture = document.GetPage<SKPicture>(pageNo))
using (var image = SKImage.FromPicture(picture, size, ScaleMatrix))
using (var bmp = SKBitmap.FromImage(image))
using (var canvas = new SKCanvas(bmp))
{
Assert.NotNull(picture);
if (RenderGlyphRectangle)
{
foreach (var letter in page.Letters)
{
DrawRectangle(letter.GlyphRectangleLoose, canvas, redPaint, size.Height, Scale);
}
}
var imageName = $"{file}_{pageNo}_loose.png";
var savePath = Path.Combine(OutputPath, imageName);
using (var fs = new FileStream(savePath, FileMode.Create))
using (var d = bmp.Encode(SKEncodedImageFormat.Png, 100))
{
d.SaveTo(fs);
}
}
}
}

View File

@@ -109,10 +109,8 @@ public class FirstPassParserTests
%%EOF
""";
if (Environment.NewLine == "\n")
{
content = content.Replace("\n", "\r\n");
}
// Handle "\r\n" or "\n" in the source code in the same way
content = content.Replace("\r\n", "\n").Replace("\n", "\r\n");
var ib = StringBytesTestConverter.Convert(content, false);
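
The two chained replacements above make the test input independent of how the source file was checked out. A minimal sketch of the same idea, assuming a hypothetical helper class `NewlineSketch` that is not part of PdfPig:

```csharp
using System;

// Hypothetical helper for illustration only; not part of PdfPig.
public static class NewlineSketch
{
    // Collapse every "\r\n" to "\n" first, then expand every "\n" to "\r\n".
    // Doing it in this order avoids turning an existing "\r\n" into "\r\r\n".
    public static string ToCrLf(string text)
    {
        if (text is null)
        {
            throw new ArgumentNullException(nameof(text));
        }

        return text.Replace("\r\n", "\n").Replace("\n", "\r\n");
    }
}
```

Note that a lone "\r" is left untouched by this sketch, just as it is by the two replacements in the test.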

View File

@@ -10,6 +10,7 @@
<AssemblyOriginatorKeyFile>..\pdfpig.snk</AssemblyOriginatorKeyFile>
<RuntimeFrameworkVersion Condition="'$(TargetFramework)'=='netcoreapp2.1'">2.1.30</RuntimeFrameworkVersion>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>annotations</Nullable>
</PropertyGroup>
<ItemGroup>
@@ -39,11 +40,7 @@
</ItemGroup>
<ItemGroup>
<EmbeddedResource Remove="Fonts\TrueType\Andada-Regular.ttf" />
<EmbeddedResource Remove="Fonts\TrueType\google-simple-doc.ttf" />
<EmbeddedResource Remove="Fonts\TrueType\issue-258-corrupt-name-table.ttf" />
<EmbeddedResource Remove="Fonts\TrueType\PMingLiU.ttf" />
<EmbeddedResource Remove="Fonts\TrueType\Roboto-Regular.ttf" />
<EmbeddedResource Remove="Fonts\TrueType\*.ttf" />
<EmbeddedResource Remove="Fonts\Type1\AdobeUtopia.pfa" />
<EmbeddedResource Remove="Fonts\Type1\CMBX10.pfa" />
<EmbeddedResource Remove="Fonts\Type1\CMBX12.pfa" />
@@ -61,24 +58,12 @@
<Content Include="Fonts\CompactFontFormat\MinionPro.bin">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</Content>
<Content Include="Fonts\TrueType\Andada-Regular.ttf">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</Content>
<Content Include="Fonts\TrueType\google-simple-doc.ttf">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</Content>
<Content Include="Fonts\TrueType\issue-258-corrupt-name-table.ttf">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</Content>
<Content Include="Fonts\TrueType\PMingLiU.ttf">
<Content Include="Fonts\TrueType\*.ttf">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</Content>
<Content Include="Fonts\TrueType\Roboto-Regular.GlyphData.txt">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</Content>
<Content Include="Fonts\TrueType\Roboto-Regular.ttf">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</Content>
<Content Include="Fonts\Type1\AdobeUtopia.pfa" />
<Content Include="Fonts\Type1\CMBX10.pfa" />
<Content Include="Fonts\Type1\CMBX12.pfa" />
@@ -117,9 +102,18 @@
<None Update="Dla\Documents\90 180 270 rotated.pdf">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</None>
<None Update="Dla\Documents\caly-issues-56-1.pdf">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</None>
<None Update="Dla\Documents\caly-issues-58-2.pdf">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</None>
<None Update="Dla\Documents\complex rotated.pdf">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</None>
<None Update="Dla\Documents\fseprd1102849.pdf">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</None>
<None Update="Dla\Documents\no horizontal distance.pdf">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</None>

View File

@@ -1154,7 +1154,7 @@
{
location += letter.Location.X;
location += letter.Location.Y;
location += letter.Font.Name.Length;
location += letter.FontDetails.Name.Length;
}
}
}

View File

@@ -46,6 +46,12 @@
/// </summary>
public PdfRectangle GlyphRectangle { get; }
/// <summary>
/// The loose bounding box for the glyph. Contrary to the <see cref="GlyphRectangle"/>, the loose bounding box will be the same across all glyphs of the same font.
/// It takes into account the font Ascent and Descent.
/// </summary>
public PdfRectangle GlyphRectangleLoose { get; }
/// <summary>
/// Size as defined in the PDF file. This is not equivalent to font size in points but is relative to other font sizes on the page.
/// </summary>
@@ -54,12 +60,20 @@
/// <summary>
/// The name of the font.
/// </summary>
public string? FontName => Font?.Name;
public string? FontName => FontDetails?.Name;
/// <summary>
/// Details about the font for this letter.
/// </summary>
public FontDetails Font { get; }
public FontDetails FontDetails { get; }
/// <summary>
/// Details about the font for this letter.
/// </summary>
[Obsolete("Use FontDetails instead.")]
public FontDetails Font => FontDetails;
private readonly IFont? _font;
/// <summary>
/// Text rendering mode that indicates whether we should draw this letter's strokes,
@@ -100,12 +114,58 @@
/// <summary>
/// Create a new letter to represent some text drawn by the Tj operator.
/// </summary>
public Letter(string value, PdfRectangle glyphRectangle,
public Letter(string value,
PdfRectangle glyphRectangle,
PdfRectangle glyphRectangleLoose,
PdfPoint startBaseLine,
PdfPoint endBaseLine,
double width,
double fontSize,
FontDetails font,
IFont font,
TextRenderingMode renderingMode,
IColor strokeColor,
IColor fillColor,
double pointSize,
int textSequence) :
this(value, glyphRectangle, glyphRectangleLoose,
startBaseLine, endBaseLine,
width, fontSize, font.Details, font,
renderingMode, strokeColor, fillColor,
pointSize, textSequence)
{ }
/// <summary>
/// Create a new letter to represent some text drawn by the Tj operator.
/// </summary>
public Letter(string value,
PdfRectangle glyphRectangle,
PdfRectangle glyphRectangleLoose,
PdfPoint startBaseLine,
PdfPoint endBaseLine,
double width,
double fontSize,
FontDetails fontDetails,
TextRenderingMode renderingMode,
IColor strokeColor,
IColor fillColor,
double pointSize,
int textSequence):
this(value, glyphRectangle, glyphRectangleLoose,
startBaseLine, endBaseLine,
width, fontSize, fontDetails, null,
renderingMode, strokeColor, fillColor,
pointSize, textSequence)
{ }
private Letter(string value,
PdfRectangle glyphRectangle,
PdfRectangle glyphRectangleLoose,
PdfPoint startBaseLine,
PdfPoint endBaseLine,
double width,
double fontSize,
FontDetails fontDetails,
IFont? font,
TextRenderingMode renderingMode,
IColor strokeColor,
IColor fillColor,
@@ -114,11 +174,13 @@
{
Value = value;
GlyphRectangle = glyphRectangle;
GlyphRectangleLoose = glyphRectangleLoose;
StartBaseLine = startBaseLine;
EndBaseLine = endBaseLine;
Width = width;
FontSize = fontSize;
Font = font;
FontDetails = fontDetails;
_font = font;
RenderingMode = renderingMode;
if (renderingMode == TextRenderingMode.Stroke)
{
@@ -135,6 +197,43 @@
TextOrientation = GetTextOrientation();
}
/// <summary>
/// Creates a new <see cref="Letter"/> instance with the same properties as the current instance,
/// but with the font details set to bold.
/// </summary>
/// <returns>
/// A new <see cref="Letter"/> instance with bold font details.
/// </returns>
public Letter AsBold()
{
return new Letter(Value,
GlyphRectangle,
GlyphRectangleLoose,
StartBaseLine,
EndBaseLine,
Width,
FontSize,
FontDetails.AsBold(),
_font,
RenderingMode,
StrokeColor,
FillColor,
PointSize,
TextSequence);
}
/// <summary>
/// Retrieves the font associated with this letter, if available.
/// </summary>
/// <returns>
/// The <see cref="IFont"/> instance representing the font used for this letter,
/// or <c>null</c> if no font is associated.
/// </returns>
public IFont? GetFont()
{
return _font;
}
private TextOrientation GetTextOrientation()
{
if (Math.Abs(StartBaseLine.Y - EndBaseLine.Y) < 10e-5)

View File

@@ -182,16 +182,15 @@
public void Dispose()
{
foreach (var key in pageFactoryCache.Keys)
foreach (var factory in pageFactoryCache.Values)
{
var factory = pageFactoryCache[key];
pageFactoryCache.Remove(key);
if (factory is IDisposable disposable)
{
disposable.Dispose();
}
}
pageFactoryCache.Clear();
}
}
}

View File

@@ -56,16 +56,16 @@
// add this and follow chain defined by 'Prev' keys
xrefPartToBytePositionOrder.Add(firstCrossReferenceOffset);
// Get any streams that are tied to this table.
var activePart = currentPart;
var dependents = parts.Where(x => x.TiedToXrefAtOffset == activePart.Offset);
foreach (var dependent in dependents)
{
xrefPartToBytePositionOrder.Add(dependent.Offset);
}
while (currentPart.Dictionary != null)
{
// Get any streams that are tied to this table.
var activePart = currentPart;
var dependents = parts.Where(x => x.TiedToXrefAtOffset == activePart.Offset);
foreach (var dependent in dependents)
{
xrefPartToBytePositionOrder.Add(dependent.Offset);
}
long prevBytePos = currentPart.GetPreviousOffset();
if (prevBytePos == -1)
{

View File

@@ -0,0 +1,82 @@
namespace UglyToad.PdfPig.Filters
{
using System;
using System.IO;
internal sealed class Adler32ChecksumStream : Stream
{
private readonly Stream underlyingStream;
public Adler32ChecksumStream(Stream writeStream)
{
underlyingStream = writeStream ?? throw new ArgumentNullException(nameof(writeStream));
}
public override bool CanRead => underlyingStream.CanRead;
public override bool CanSeek => false;
public override bool CanWrite => underlyingStream.CanWrite;
public override long Length => underlyingStream.Length;
public override long Position { get => underlyingStream.Position; set => throw new NotImplementedException(); }
public override void Flush()
{
underlyingStream.Flush();
}
public override int Read(byte[] buffer, int offset, int count)
{
int n = underlyingStream.Read(buffer, offset, count);
if (n > 0)
{
UpdateAdler(buffer.AsSpan(offset, n));
}
return n;
}
public override long Seek(long offset, SeekOrigin origin)
{
throw new InvalidOperationException();
}
public override void SetLength(long value)
{
throw new InvalidOperationException();
}
public override void Write(byte[] buffer, int offset, int count)
{
underlyingStream.Write(buffer, offset, count);
if (count > 0)
{
UpdateAdler(buffer.AsSpan(offset, count));
}
}
public uint Checksum { get; private set; } = 1;
private void UpdateAdler(Span<byte> span)
{
const uint MOD_ADLER = 65521;
uint a = Checksum & 0xFFFF;
uint b = (Checksum >> 16) & 0xFFFF;
foreach (byte c in span)
{
a = (a + c) % MOD_ADLER;
b = (b + a) % MOD_ADLER;
}
Checksum = (b << 16) | a;
}
public override void Close()
{
underlyingStream.Close();
}
}
}
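
As a standalone illustration of the running checksum that `Adler32ChecksumStream` maintains, the sketch below applies the same two-sum update to a byte span. The `Adler32Sketch` class and its `Update` and `Compute` methods are hypothetical names used for illustration, not part of PdfPig.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of PdfPig.
public static class Adler32Sketch
{
    // Both sums are reduced modulo 65521, the largest prime below 2^16.
    private const uint ModAdler = 65521;

    // Continues a checksum from a previous value; a fresh checksum starts at 1.
    public static uint Update(uint checksum, ReadOnlySpan<byte> data)
    {
        uint a = checksum & 0xFFFF;         // low 16 bits: sum of bytes (plus 1)
        uint b = (checksum >> 16) & 0xFFFF; // high 16 bits: sum of the running 'a' values

        foreach (byte c in data)
        {
            a = (a + c) % ModAdler;
            b = (b + a) % ModAdler;
        }

        return (b << 16) | a;
    }

    public static uint Compute(ReadOnlySpan<byte> data) => Update(1, data);
}
```

For the ASCII bytes of "Wikipedia", `Adler32Sketch.Compute` returns `0x11E60398`. Feeding the same bytes through `Update` in two chunks gives the same value, which is the property that lets a stream wrapper accumulate the checksum across multiple `Read` or `Write` calls.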

View File

@@ -2,6 +2,7 @@
{
using Fonts;
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;
using Tokens;
@@ -43,6 +44,15 @@
var colors = Math.Min(parameters.GetIntOrDefault(NameToken.Colors, DefaultColors), 32);
var bitsPerComponent = parameters.GetIntOrDefault(NameToken.BitsPerComponent, DefaultBitsPerComponent);
var columns = parameters.GetIntOrDefault(NameToken.Columns, DefaultColumns);
var length = parameters.GetIntOrDefault(NameToken.Length, -1);
if (length > 0 && length < input.Length)
{
// Truncate any trailing "\r\n" or "\n" beyond the declared length so the Adler-32 checksum can be located at the end of the data. (Zlib itself uses framing for this.)
input = input.Slice(0, length);
}
return Decompress(input, predictor, colors, bitsPerComponent, columns);
}
catch
@@ -55,29 +65,83 @@
private static Memory<byte> Decompress(Memory<byte> input, int predictor, int colors, int bitsPerComponent, int columns)
{
using (var memoryStream = MemoryHelper.AsReadOnlyMemoryStream(input))
#if NET
using var memoryStream = MemoryHelper.AsReadOnlyMemoryStream(input);
try
{
// The first 2 bytes are the header which DeflateStream does not support.
memoryStream.ReadByte();
memoryStream.ReadByte();
try
using (var zlib = new ZLibStream(memoryStream, CompressionMode.Decompress))
using (var output = new MemoryStream((int)(input.Length * 1.5)))
using (var f = PngPredictor.WrapPredictor(output, predictor, colors, bitsPerComponent, columns))
{
using (var deflate = new DeflateStream(memoryStream, CompressionMode.Decompress))
using (var output = new MemoryStream((int)(input.Length * 1.5)))
using (var f = PngPredictor.WrapPredictor(output, predictor, colors, bitsPerComponent, columns))
{
deflate.CopyTo(f);
f.Flush();
zlib.CopyTo(f);
f.Flush();
return output.AsMemory();
}
}
catch (InvalidDataException ex)
{
throw new CorruptCompressedDataException("Invalid Flate compressed stream encountered", ex);
return output.AsMemory();
}
}
catch (InvalidDataException ex)
{
throw new CorruptCompressedDataException("Invalid Flate compressed stream encountered", ex);
}
#else
// Ideally we would like to use the ZLibStream class but that is only available in .NET 6+.
// We look at the raw data now
// * First we have 2 bytes, specifying the type of compression
// * Then we have the deflated data
// * Then we have a 4 byte checksum (Adler32)
// Would be so nice to have zlib do the framing here... but the deflate stream already reads data from the stream that we need.
using var memoryStream = MemoryHelper.AsReadOnlyMemoryStream(input.Slice(2, input.Length - 2 /* Header */ - 4 /* Checksum */));
// The first 2 bytes are the header, which DeflateStream can't handle.
var adlerBytes = input.Slice(input.Length - 4, 4).Span;
uint expected = BinaryPrimitives.ReadUInt32BigEndian(adlerBytes);
uint altExpected = expected;
// Sometimes the data ends with "\r\n", "\r" or "\n" and we don't know if it is part of the zlib data
// Ideally this would have been removed by the caller from the provided length...
if (adlerBytes[3] == '\n' || adlerBytes[3] == '\r')
{
if (adlerBytes[3] == '\n' && adlerBytes[2] == '\r')
{
// Now we don't know which value is the good one. The value could be ok, or padding.
// Let's allow both values for now. Allowing two out of 2^32 is much better than allowing everything
adlerBytes = input.Slice(input.Length - 6, 4).Span;
}
else
{
// Same but now for just '\n' or '\r' instead of '\r\n'
adlerBytes = input.Slice(input.Length - 5, 4).Span;
}
altExpected = BinaryPrimitives.ReadUInt32BigEndian(adlerBytes);
}
try
{
using (var deflate = new DeflateStream(memoryStream, CompressionMode.Decompress))
using (var adlerStream = new Adler32ChecksumStream(deflate))
using (var output = new MemoryStream((int)(input.Length * 1.5)))
using (var f = PngPredictor.WrapPredictor(output, predictor, colors, bitsPerComponent, columns))
{
adlerStream.CopyTo(f);
f.Flush();
uint actual = adlerStream.Checksum;
if (expected != actual && altExpected != actual)
{
throw new CorruptCompressedDataException("Flate stream has invalid checksum");
}
return output.AsMemory();
}
}
catch (InvalidDataException ex)
{
throw new CorruptCompressedDataException("Invalid Flate compressed stream encountered", ex);
}
#endif
}
/// <inheritdoc />
@@ -95,9 +159,10 @@
using (var compressStream = new MemoryStream())
using (var compressor = new DeflateStream(compressStream, CompressionLevel.Fastest))
using (var adlerStream = new Adler32ChecksumStream(compressor))
{
compressor.Write(data, 0, data.Length);
compressor.Close();
adlerStream.Write(data, 0, data.Length);
adlerStream.Close();
var compressed = compressStream.ToArray();
@@ -111,7 +176,7 @@
Array.Copy(compressed, 0, result, headerLength, compressed.Length);
// Write Checksum of raw data.
var checksum = Adler32Checksum.Calculate(data);
var checksum = adlerStream.Checksum;
var offset = headerLength + compressed.Length;
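
The fallback branch above relies on the zlib layout: a 2-byte header, the deflate data, then a 4-byte big-endian Adler-32 trailer, possibly followed by a stray end-of-line. The sketch below lists the trailer values that would be accepted under that reading. `ZlibTrailerSketch` and `CandidateChecksums` are hypothetical names for illustration, not the filter's actual API.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of PdfPig.
public static class ZlibTrailerSketch
{
    private const int HeaderLength = 2;
    private const int ChecksumLength = 4;

    // Returns the big-endian values that could be the Adler-32 trailer:
    // always the last 4 bytes, plus the 4 bytes before a trailing "\r\n", "\r" or "\n".
    public static IReadOnlyList<uint> CandidateChecksums(ReadOnlySpan<byte> zlibData)
    {
        if (zlibData.Length < HeaderLength + ChecksumLength)
        {
            throw new ArgumentException("Too short to be zlib data.", nameof(zlibData));
        }

        var candidates = new List<uint>
        {
            BinaryPrimitives.ReadUInt32BigEndian(zlibData.Slice(zlibData.Length - ChecksumLength, ChecksumLength))
        };

        byte last = zlibData[zlibData.Length - 1];
        if (last == '\n' || last == '\r')
        {
            // The end-of-line may be padding or may be genuine checksum bytes, so keep both readings.
            int eolLength = (last == '\n' && zlibData[zlibData.Length - 2] == '\r') ? 2 : 1;
            if (zlibData.Length >= HeaderLength + ChecksumLength + eolLength)
            {
                candidates.Add(BinaryPrimitives.ReadUInt32BigEndian(
                    zlibData.Slice(zlibData.Length - ChecksumLength - eolLength, ChecksumLength)));
            }
        }

        return candidates;
    }
}
```

A decoder would then compare the checksum of the inflated bytes against every candidate and reject the stream only when none match.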

View File

@@ -100,12 +100,6 @@ namespace UglyToad.PdfPig.Graphics
var transformedGlyphBounds = PerformantRectangleTransformer
.Transform(renderingMatrix, textMatrix, transformationMatrix, characterBoundingBox.GlyphBounds);
var transformedPdfBounds = PerformantRectangleTransformer
.Transform(renderingMatrix,
textMatrix,
transformationMatrix,
new PdfRectangle(0, 0, characterBoundingBox.Width, UserSpaceUnit.PointMultiples));
if (ParsingOptions.ClipPaths)
{
var currentClipping = currentState.CurrentClippingPath;
@@ -129,11 +123,12 @@ namespace UglyToad.PdfPig.Graphics
letter = new Letter(
newLetter,
attachTo.GlyphRectangle,
attachTo.GlyphRectangleLoose,
attachTo.StartBaseLine,
attachTo.EndBaseLine,
attachTo.Width,
attachTo.FontSize,
attachTo.Font,
attachTo.GetFont()!,
attachTo.RenderingMode,
attachTo.StrokeColor,
attachTo.FillColor,
@@ -151,14 +146,30 @@ namespace UglyToad.PdfPig.Graphics
// If we did not create a letter for a combined diacritic, create one here.
if (letter is null)
{
var transformedPdfBounds = PerformantRectangleTransformer
.Transform(renderingMatrix,
textMatrix,
transformationMatrix,
new PdfRectangle(0, 0, characterBoundingBox.Width, UserSpaceUnit.PointMultiples));
var looseBox = PerformantRectangleTransformer
.Transform(renderingMatrix,
textMatrix,
transformationMatrix,
new PdfRectangle(0,
font.GetDescent(),
characterBoundingBox.Width,
font.GetAscent()));
letter = new Letter(
unicode,
isBboxValid ? transformedGlyphBounds : transformedPdfBounds,
looseBox,
transformedPdfBounds.BottomLeft,
transformedPdfBounds.BottomRight,
transformedPdfBounds.Width,
fontSize,
font.Details,
font,
currentState.FontState.TextRenderingMode,
currentState.CurrentStrokingColor!,
currentState.CurrentNonStrokingColor!,
@@ -167,7 +178,6 @@ namespace UglyToad.PdfPig.Graphics
}
letters.Add(letter);
markedContentStack.AddLetter(letter);
}

View File

@@ -39,7 +39,16 @@
var strideWidth = decoded.Length / imageHeight / bytesPerPixel;
if (strideWidth != imageWidth)
{
decoded = RemoveStridePadding(decoded, strideWidth, imageWidth, imageHeight, bytesPerPixel);
if (bytesPerPixel > 1)
{
// Fixed thanks to / see discussion at https://github.com/UglyToad/PdfPig/issues/1183
// Unclear what should be done here, we assume we can just remove the trailing bytes
decoded = decoded.Slice(0, imageWidth * imageHeight * bytesPerPixel);
}
else
{
decoded = RemoveStridePadding(decoded, strideWidth, imageWidth, imageHeight, bytesPerPixel);
}
}
return details.Transform(decoded);

View File

@@ -1,35 +0,0 @@
namespace UglyToad.PdfPig.Images.Png
{
using System;
/// <summary>
/// Used to calculate the Adler-32 checksum used for ZLIB data in accordance with
/// RFC 1950: ZLIB Compressed Data Format Specification.
/// </summary>
internal static class Adler32Checksum
{
// Both sums (s1 and s2) are done modulo 65521.
private const int AdlerModulus = 65521;
/// <summary>
/// Calculate the Adler-32 checksum for some data.
/// </summary>
public static int Calculate(ReadOnlySpan<byte> data)
{
// s1 is the sum of all bytes.
var s1 = 1;
// s2 is the sum of all s1 values.
var s2 = 0;
foreach (var b in data)
{
s1 = (s1 + b) % AdlerModulus;
s2 = (s1 + s2) % AdlerModulus;
}
// The Adler-32 checksum is stored as s2*65536 + s1.
return s2 * 65536 + s1;
}
}
}

View File

@@ -1,8 +1,10 @@
namespace UglyToad.PdfPig.Images.Png
{
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;
using System.Text;
using UglyToad.PdfPig.Filters;
/// <summary>
/// Used to construct PNG images. Call <see cref="Create"/> to make a new builder.
@@ -121,9 +123,10 @@
const int checksumLength = 4;
using (var compressStream = new MemoryStream())
using (var compressor = new DeflateStream(compressStream, CompressionLevel.Fastest, true))
using (var adlerStream = new Adler32ChecksumStream(compressor))
{
compressor.Write(data, 0, data.Length);
compressor.Close();
adlerStream.Write(data, 0, data.Length);
adlerStream.Close();
compressStream.Seek(0, SeekOrigin.Begin);
@@ -143,15 +146,11 @@
}
// Write Checksum of raw data.
var checksum = Adler32Checksum.Calculate(data);
var checksum = adlerStream.Checksum;
var offset = headerLength + compressStream.Length;
result[offset++] = (byte)(checksum >> 24);
result[offset++] = (byte)(checksum >> 16);
result[offset++] = (byte)(checksum >> 8);
result[offset] = (byte)(checksum >> 0);
BinaryPrimitives.WriteUInt32BigEndian(result.AsSpan((int)offset, 4), checksum);
return result;
}
}

View File

@@ -1,11 +1,11 @@
namespace UglyToad.PdfPig.Images.Png
namespace UglyToad.PdfPig.Images.Png
{
using System.Diagnostics.CodeAnalysis;
using Content;
using Content;
using Graphics.Colors;
using UglyToad.PdfPig.Core;
internal static class PngFromPdfImageFactory
internal static class PngFromPdfImageFactory
{
private static bool TryGenerateSoftMask(IPdfImage image, [NotNullWhen(true)] out ReadOnlySpan<byte> maskBytes)
{
@@ -26,9 +26,9 @@
return false;
}
if (!mask.TryGetBytesAsMemory(out var maskMemory))
{
return false;
if (!mask.TryGetBytesAsMemory(out var maskMemory))
{
return false;
}
try
@@ -67,24 +67,24 @@
bytesPure[actualSize - 2] == ReadHelper.AsciiCarriageReturn &&
bytesPure[actualSize - 1] == ReadHelper.AsciiLineFeed);
}
public static bool TryGenerate(IPdfImage image, [NotNullWhen(true)] out byte[]? bytes)
{
bytes = null;
var hasValidDetails = image.ColorSpaceDetails != null && !(image.ColorSpaceDetails is UnsupportedColorSpaceDetails);
public static bool TryGenerate(IPdfImage image, [NotNullWhen(true)] out byte[]? bytes)
{
bytes = null;
var hasValidDetails = image.ColorSpaceDetails != null && !(image.ColorSpaceDetails is UnsupportedColorSpaceDetails);
var isColorSpaceSupported = hasValidDetails && image.ColorSpaceDetails!.BaseType != ColorSpace.Pattern;
if (!isColorSpaceSupported || !image.TryGetBytesAsMemory(out var imageMemory))
{
return false;
if (!isColorSpaceSupported || !image.TryGetBytesAsMemory(out var imageMemory))
{
return false;
}
var bytesPure = imageMemory.Span;
try
{
var bytesPure = imageMemory.Span;
try
{
bytesPure = ColorSpaceDetailsByteConverter.Convert(image.ColorSpaceDetails!, bytesPure,
image.BitsPerComponent, image.WidthInSamples, image.HeightInSamples);
@@ -108,7 +108,7 @@
}
}
var builder = PngBuilder.Create(image.WidthInSamples, image.HeightInSamples, hasMask);
var builder = PngBuilder.Create(image.WidthInSamples, image.HeightInSamples, hasMask);
if (!IsCorrectlySized(image, bytesPure))
{
@@ -183,17 +183,17 @@
}
}
}
}
bytes = builder.Save();
return true;
}
catch
}
bytes = builder.Save();
return true;
}
catch
{
// ignored.
}
return false;
// ignored.
}
return false;
}
}
}
}
}

View File

@@ -4,14 +4,14 @@
/// How many bytes precede the "%PDF-" version header in the file. In some files this 'junk' can
/// offset all following offset bytes.
/// </summary>
internal readonly struct FileHeaderOffset(int value)
internal readonly record struct FileHeaderOffset(int Value) : IEquatable<FileHeaderOffset>
{
public int Value => value;
public override string ToString() => Value.ToString();
public bool Equals(FileHeaderOffset other)
{
return Value == other.Value;
}
public override string ToString() => value.ToString();
public override bool Equals(object? obj) =>
obj is FileHeaderOffset other && value == other.Value;
public override int GetHashCode() => value.GetHashCode();
public override int GetHashCode() => Value.GetHashCode();
}

View File

@@ -153,6 +153,23 @@ internal static partial class FirstPassParser
{
results.Add(table);
nextLocation = table.GetPrevious();
// Also add any optional associated Stream
var xRefStm = table.GetXRefStm();
if (xRefStm is long xRefStmValue)
{
var stream = GetXrefStreamOrTable(
offset,
input,
scanner,
xRefStmValue,
log);
if (stream != null)
{
results.Add(stream);
}
}
}
else if (streamOrTable is XrefStream stream)
{

View File

@@ -16,6 +16,9 @@ internal static class XrefBruteForcer
{
var results = new List<IXrefSection>();
// Guard against circular references; only read xref at each offset once
var xrefOffsetSeen = new HashSet<long>();
var bruteForceObjPositions = new Dictionary<IndirectReference, long>();
DictionaryToken? trailer = null;
@@ -131,6 +134,14 @@ internal static class XrefBruteForcer
ClearQueues();
var potentialTableOffset = bytes.CurrentOffset - 4;
if (xrefOffsetSeen.Contains(potentialTableOffset))
{
log.Debug($"Skipping circular xref reference at {potentialTableOffset}");
continue;
}
xrefOffsetSeen.Add(potentialTableOffset);
var table = XrefTableParser.TryReadTableAtOffset(
new FileHeaderOffset(0),
potentialTableOffset,
@@ -152,15 +163,22 @@ internal static class XrefBruteForcer
{
ClearQueues();
if (!lastObjPosition.HasValue)
if (lastObjPosition is not long offset)
{
log.Error("Found an /XRef without having encountered an object first");
continue;
}
if (xrefOffsetSeen.Contains(offset))
{
log.Debug($"Skipping circular /XRef reference at {offset}");
continue;
}
xrefOffsetSeen.Add(offset);
var stream = XrefStreamParser.TryReadStreamAtOffset(
new FileHeaderOffset(0),
lastObjPosition.Value,
offset,
bytes,
scanner,
log);

View File

@@ -44,4 +44,14 @@ internal sealed class XrefTable : IXrefSection
return null;
}
public long? GetXRefStm()
{
if (Dictionary != null && Dictionary.TryGet(NameToken.XrefStm, out NumericToken xRefStm))
{
return xRefStm.Long;
}
return null;
}
}

View File

@@ -44,9 +44,23 @@
internal static PdfDocument Open(Stream stream, ParsingOptions? options)
{
var streamInput = new StreamInputBytes(stream, false);
var initialPosition = stream.Position;
StreamInputBytes streamInput;
long initialPosition;
if (stream is { CanRead: true, CanSeek: false })
{
// We need the stream to be seekable
var ms = new MemoryStream();
stream.CopyTo(ms); // Copy the non seekable stream in memory (seekable)
ms.Position = 0;
streamInput = new StreamInputBytes(ms, true); // The created memory stream will be disposed on document dispose
initialPosition = ms.Position;
}
else
{
streamInput = new StreamInputBytes(stream, false);
initialPosition = stream.Position;
}
try
{
@@ -225,6 +239,11 @@
var rootDictionary = DirectObjectFinder.Get<DictionaryToken>(trailer.Root, pdfTokenScanner)!;
if (rootDictionary is null)
{
throw new PdfDocumentFormatException("The root object in the trailer did not resolve to a readable dictionary.");
}
if (!rootDictionary.ContainsKey(NameToken.Type) && isLenientParsing)
{
rootDictionary = rootDictionary.With(NameToken.Type, NameToken.Catalog);

View File

@@ -112,10 +112,14 @@
/// <summary>
/// Creates a <see cref="PdfDocument"/> for reading from the provided stream.
/// <para>
/// If the stream provided is not seekable (<see cref="Stream.CanSeek"/> is <c>false</c>), the stream will be copied into a new <see cref="MemoryStream"/>.
/// </para>
/// The caller must manage disposing the stream. The created PdfDocument will not dispose the stream.
/// </summary>
/// <param name="stream">
/// A stream of the file contents; this must support reading.
/// <para>If the stream provided is not seekable (<see cref="Stream.CanSeek"/> is <c>false</c>), the stream will be copied into a new <see cref="MemoryStream"/>.</para>
/// The PdfDocument will not dispose of the provided stream.
/// </param>
/// <param name="options">Optional parameters controlling parsing.</param>

View File

@@ -7,6 +7,7 @@
using Parser.Parts;
using Tokenization.Scanner;
using Tokens;
using UglyToad.PdfPig.Util;
/// <summary>
/// Extensions for PDF types.
@@ -62,6 +63,18 @@
double totalMaxEstSize = stream.Data.Length * 100;
var transform = stream.Data;
var length = stream.StreamDictionary.GetIntOrDefault(NameToken.Length, -1);
// If a length is available and it's smaller than the actual data length, use that. This trims whitespace (e.g. newlines) that might have been introduced during transport.
// And with that it handles some issues before individual filters have to deal with it.
//
// Do this before the first filter (to handle cases like multiple filters, etc).
if (length > 0 && length < transform.Length)
{
transform = transform.Slice(0, length);
}
for (var i = 0; i < filters.Count; i++)
{
var filter = filters[i];
@@ -89,6 +102,18 @@
double totalMaxEstSize = stream.Data.Length * 100;
var transform = stream.Data;
var length = stream.StreamDictionary.GetIntOrDefault(NameToken.Length, -1);
// If a length is available and it's smaller than the actual data length, use that. This trims whitespace (e.g. newlines) that might have been introduced during transport.
// And with that it handles some issues before individual filters have to deal with it.
//
// Do this before the first filter (to handle cases like multiple filters, etc).
if (length > 0 && length < transform.Length)
{
transform = transform.Slice(0, length);
}
for (var i = 0; i < filters.Count; i++)
{
var filter = filters[i];


@@ -57,6 +57,10 @@
TransformationMatrix GetFontMatrix(int characterIdentifier);
double GetDescent();
double GetAscent();
/// <summary>
/// Returns the glyph path for the given character code.
/// </summary>


@@ -20,6 +20,10 @@
bool TryGetBoundingAdvancedWidth(int characterIdentifier, out double width);
double? GetDescent();
double? GetAscent();
bool TryGetPath(int characterCode, [NotNullWhen(true)] out IReadOnlyList<PdfSubpath>? path);
bool TryGetPath(int characterCode, Func<int, int?> characterCodeToGlyphId, [NotNullWhen(true)] out IReadOnlyList<PdfSubpath>? path);


@@ -42,6 +42,18 @@
public PdfRectangle? GetCharacterBoundingBox(string characterName) => fontCollection.GetCharacterBoundingBox(characterName);
public double? GetDescent()
{
// BobLd: we don't support ascent / descent for cff for the moment
return null;
}
public double? GetAscent()
{
// BobLd: we don't support ascent / descent for cff for the moment
return null;
}
public bool TryGetBoundingBox(int characterIdentifier, out PdfRectangle boundingBox)
{
boundingBox = new PdfRectangle(0, 0, 500, 0);


@@ -36,6 +36,16 @@
public int GetFontMatrixMultiplier() => font.GetUnitsPerEm();
public double? GetDescent()
{
return font.TableRegister.HorizontalHeaderTable.Descent;
}
public double? GetAscent()
{
return font.TableRegister.HorizontalHeaderTable.Ascent;
}
public bool TryGetFontMatrix(int characterCode, [NotNullWhen(true)] out TransformationMatrix? matrix)
{
// We don't have a matrix here


@@ -148,6 +148,38 @@
return fontProgram.TryGetFontMatrix(characterIdentifier, out var m) ? m.Value : FontMatrix;
}
public double GetDescent()
{
if (fontProgram is null)
{
return Descriptor.Descent;
}
double? descent = fontProgram.GetDescent();
if (descent.HasValue)
{
return descent.Value;
}
return Descriptor.Descent;
}
public double GetAscent()
{
if (fontProgram is null)
{
return Descriptor.Ascent;
}
double? ascent = fontProgram.GetAscent();
if (ascent.HasValue)
{
return ascent.Value;
}
return Descriptor.Ascent;
}
public bool TryGetPath(int characterCode, [NotNullWhen(true)] out IReadOnlyList<PdfSubpath>? path)
{
path = null;


@@ -132,6 +132,38 @@
return FontMatrix;
}
public double GetDescent()
{
if (fontProgram is null)
{
return Descriptor.Descent;
}
double? descent = fontProgram.GetDescent();
if (descent.HasValue)
{
return descent.Value;
}
return Descriptor.Descent;
}
public double GetAscent()
{
if (fontProgram is null)
{
return Descriptor.Ascent;
}
double? ascent = fontProgram.GetAscent();
if (ascent.HasValue)
{
return ascent.Value;
}
return Descriptor.Ascent;
}
public bool TryGetPath(int characterCode, [NotNullWhen(true)] out IReadOnlyList<PdfSubpath>? path) => TryGetPath(characterCode, cidToGid.GetGlyphIndex, out path);
public bool TryGetPath(int characterCode, Func<int, int?> characterCodeToGlyphId, [NotNullWhen(true)] out IReadOnlyList<PdfSubpath>? path)


@@ -81,11 +81,15 @@ namespace UglyToad.PdfPig.PdfFonts.Cmap
public CMap Build()
{
+#if NET
+BaseFontCharacterMap?.TrimExcess();
+#endif
return new CMap(GetCidSystemInfo(), Type, WMode, Name, Version,
BaseFontCharacterMap ?? new Dictionary<int, string>(),
-CodespaceRanges ?? new CodespaceRange[0],
-CidRanges ?? new CidRange[0],
-CidCharacterMappings ?? new CidCharacterMapping[0]);
+CodespaceRanges ?? Array.Empty<CodespaceRange>(),
+CidRanges ?? Array.Empty<CidRange>(),
+CidCharacterMappings ?? Array.Empty<CidCharacterMapping>());
}
private CharacterIdentifierSystemInfo GetCidSystemInfo()


@@ -22,6 +22,8 @@
= new Dictionary<int, CharacterBoundingBox>();
private readonly bool useLenientParsing;
private readonly double ascent;
private readonly double descent;
public NameToken Name => BaseFont;
@@ -57,6 +59,30 @@
?? FontDetails.GetDefault(Name.Data);
useLenientParsing = parsingOptions.UseLenientParsing;
ascent = ComputeAscent();
descent = ComputeDescent();
}
private double ComputeDescent()
{
double d = CidFont.GetDescent();
if (Math.Abs(d) > double.Epsilon)
{
return GetFontMatrix().TransformY(d);
}
return -0.25;
}
private double ComputeAscent()
{
double a = CidFont.GetAscent();
if (Math.Abs(a) > double.Epsilon)
{
return GetFontMatrix().TransformY(a);
}
return 0.75;
}
public int ReadCharacterCode(IInputBytes bytes, out int codeLength)
@@ -144,6 +170,16 @@
return CidFont.FontMatrix;
}
public double GetDescent()
{
return descent;
}
public double GetAscent()
{
return ascent;
}
public PdfVector GetPositionVector(int characterCode)
{
var characterIdentifier = CMap.ConvertToCid(characterCode);


@@ -3,7 +3,7 @@
/// <summary>
/// Summary details of the font used to draw a glyph.
/// </summary>
-public class FontDetails
+public sealed class FontDetails
{
/// <summary>
/// The normal weight for a font.
@@ -35,6 +35,8 @@
/// </summary>
public bool IsItalic { get; }
private readonly Lazy<FontDetails> _bold;
/// <summary>
/// Create a new <see cref="FontDetails"/>.
/// </summary>
@@ -44,6 +46,17 @@
IsBold = isBold;
Weight = weight;
IsItalic = isItalic;
_bold = isBold ? new Lazy<FontDetails>(() => this) : new Lazy<FontDetails>(() => new FontDetails(Name, true, Weight, IsItalic));
}
/// <summary>
/// An instance of <see cref="FontDetails"/> with the same properties as the current instance,
/// but with the <see cref="IsBold"/> property set to <c>true</c>.
/// </summary>
public FontDetails AsBold()
{
return _bold.Value;
}
internal static FontDetails GetDefault(string? name = null) => new FontDetails(name ?? string.Empty,
@@ -51,7 +64,7 @@
DefaultWeight,
false);
-internal FontDetails WithName(string name) => name != null
+internal FontDetails WithName(string? name) => name is not null
? new FontDetails(name, IsBold, Weight, IsItalic)
: this;


@@ -45,6 +45,24 @@
/// </summary>
TransformationMatrix GetFontMatrix();
/// <summary>
/// Retrieves the descent value of the font, adjusted by the font matrix.
/// </summary>
/// <returns>
/// A <see cref="double"/> representing the descent of the font,
/// which is the distance from the baseline to the lowest point of the font's glyphs.
/// </returns>
double GetDescent();
/// <summary>
/// Retrieves the ascent value of the font, adjusted by the font matrix.
/// </summary>
/// <returns>
/// A <see cref="double"/> representing the ascent of the font,
/// which is the distance from the baseline to the highest point of the font's glyphs.
/// </returns>
double GetAscent();
/// <summary>
/// Returns the glyph path for the given character code.
/// </summary>


@@ -33,6 +33,10 @@
private readonly bool isZapfDingbats;
private readonly TransformationMatrix fontMatrix;
private readonly double descent;
private readonly double ascent;
#nullable disable
public NameToken Name { get; }
#nullable enable
@@ -66,6 +70,37 @@
?? FontDetails.GetDefault(Name?.Data);
isZapfDingbats = encoding is ZapfDingbatsEncoding || Details.Name.Contains("ZapfDingbats");
// Set font matrix
double scale = 1000.0;
if (this.font?.TableRegister.HeaderTable is not null)
{
scale = this.font.GetUnitsPerEm();
}
fontMatrix = TransformationMatrix.FromValues(1.0 / scale, 0, 0, 1.0 / scale, 0, 0);
descent = ComputeDescent();
ascent = ComputeAscent();
}
private double ComputeDescent()
{
if (font is null)
{
return DefaultTransformation.TransformY(descriptor!.Descent);
}
return GetFontMatrix().TransformY(font.TableRegister.HorizontalHeaderTable.Descent);
}
private double ComputeAscent()
{
if (font is null)
{
return DefaultTransformation.TransformY(descriptor!.Ascent);
}
return GetFontMatrix().TransformY(font.TableRegister.HorizontalHeaderTable.Ascent);
}
public int ReadCharacterCode(IInputBytes bytes, out int codeLength)
@@ -195,14 +230,7 @@
public TransformationMatrix GetFontMatrix()
{
-var scale = 1000.0;
-if (font?.TableRegister.HeaderTable != null)
-{
-scale = font.GetUnitsPerEm();
-}
-return TransformationMatrix.FromValues(1 / scale, 0, 0, 1 / scale, 0, 0);
+return fontMatrix;
}
private PdfRectangle GetBoundingBoxInGlyphSpace(int characterCode, out bool fromFont)
@@ -338,6 +366,16 @@
return widths[index];
}
public double GetDescent()
{
return descent;
}
public double GetAscent()
{
return ascent;
}
/// <inheritdoc/>
public bool TryGetPath(int characterCode, [NotNullWhen(true)] out IReadOnlyList<PdfSubpath>? path)
{


@@ -19,6 +19,9 @@
private static readonly TransformationMatrix DefaultTransformation =
TransformationMatrix.FromValues(1 / 1000.0, 0, 0, 1 / 1000.0, 0, 0);
private readonly TransformationMatrix fontMatrix;
private readonly double ascent;
private readonly double descent;
private readonly AdobeFontMetrics fontMetrics;
private readonly Encoding encoding;
private readonly TrueTypeFont font;
@@ -45,8 +48,42 @@
// Assumption is ZapfDingbats is not possible here. We need to change the behaviour if not the case
System.Diagnostics.Debug.Assert(!(encoding is ZapfDingbatsEncoding || Details.Name.Contains("ZapfDingbats")));
// Set font matrix
if (this.font?.TableRegister.HeaderTable is not null)
{
var scale = (double)this.font.GetUnitsPerEm();
fontMatrix = TransformationMatrix.FromValues(1.0 / scale, 0, 0, 1.0 / scale, 0, 0);
}
else
{
fontMatrix = DefaultTransformation;
}
descent = ComputeDescent();
ascent = ComputeAscent();
}
private double ComputeDescent()
{
if (fontMetrics is not null)
{
return GetFontMatrix().TransformY(fontMetrics.Descender);
}
return GetFontMatrix().TransformY(font.TableRegister.HorizontalHeaderTable.Descent);
}
private double ComputeAscent()
{
if (fontMetrics is not null)
{
return GetFontMatrix().TransformY(fontMetrics.Ascender);
}
return GetFontMatrix().TransformY(font.TableRegister.HorizontalHeaderTable.Ascent);
}
public int ReadCharacterCode(IInputBytes bytes, out int codeLength)
{
codeLength = 1;
@@ -127,14 +164,17 @@
public TransformationMatrix GetFontMatrix()
{
if (font?.TableRegister.HeaderTable != null)
{
var scale = (double)font.GetUnitsPerEm();
return fontMatrix;
}
return TransformationMatrix.FromValues(1 / scale, 0, 0, 1 / scale, 0, 0);
}
public double GetDescent()
{
return descent;
}
return DefaultTransformation;
public double GetAscent()
{
return ascent;
}
/// <inheritdoc/>


@@ -36,7 +36,9 @@
private readonly ToUnicodeCMap toUnicodeCMap;
private readonly TransformationMatrix fontMatrix;
private readonly double ascent;
private readonly double descent;
private readonly bool isZapfDingbats;
public NameToken Name { get; }
@@ -83,6 +85,60 @@
Details = fontDescriptor?.ToDetails(name?.Data)
?? FontDetails.GetDefault(name?.Data);
isZapfDingbats = encoding is ZapfDingbatsEncoding || Details.Name.Contains("ZapfDingbats");
descent = ComputeDescent();
ascent = ComputeAscent();
}
private double ComputeDescent()
{
if (Math.Abs(fontDescriptor.Descent) > double.Epsilon)
{
return fontMatrix.TransformY(fontDescriptor.Descent);
}
/*
// BobLd: Should 'fontProgram' be used
if (fontProgram is not null)
{
if (fontProgram.TryGetFirst(out var t1))
{
}
if (fontProgram.TryGetSecond(out var cffCol))
{
}
}
*/
return -0.25;
}
private double ComputeAscent()
{
if (Math.Abs(fontDescriptor.Ascent) > double.Epsilon)
{
return fontMatrix.TransformY(fontDescriptor.Ascent);
}
/*
// BobLd: Should 'fontProgram' be used
if (fontProgram is not null)
{
if (fontProgram.TryGetFirst(out var t1))
{
}
if (fontProgram.TryGetSecond(out var cffCol))
{
}
}
*/
return 0.75;
}
public int ReadCharacterCode(IInputBytes bytes, out int codeLength)
@@ -206,7 +262,7 @@
{
var first = cffFont.FirstFont;
string characterName;
-if (encoding != null)
+if (encoding is not null)
{
characterName = encoding.GetName(characterCode);
}
@@ -232,6 +288,16 @@
return fontMatrix;
}
public double GetDescent()
{
return descent;
}
public double GetAscent()
{
return ascent;
}
/// <inheritdoc/>
public bool TryGetPath(int characterCode, [NotNullWhen(true)] out IReadOnlyList<PdfSubpath>? path)
{


@@ -26,6 +26,8 @@ namespace UglyToad.PdfPig.PdfFonts.Simple
public FontDetails Details { get; }
private readonly TransformationMatrix fontMatrix = TransformationMatrix.FromValues(0.001, 0, 0, 0.001, 0, 0);
private readonly double ascent;
private readonly double descent;
public Type1Standard14Font(AdobeFontMetrics standardFontMetrics, Encoding? overrideEncoding = null)
{
@@ -40,6 +42,28 @@ namespace UglyToad.PdfPig.PdfFonts.Simple
standardFontMetrics.Weight == "Bold" ? 700 : FontDetails.DefaultWeight,
standardFontMetrics.ItalicAngle != 0);
isZapfDingbats = encoding is ZapfDingbatsEncoding || Details.Name.Contains("ZapfDingbats");
descent = ComputeDescent();
ascent = ComputeAscent();
}
private double ComputeDescent()
{
if (Math.Abs(standardFontMetrics.Descender) < double.Epsilon)
{
return -0.25;
}
return fontMatrix.TransformY(standardFontMetrics.Descender);
}
private double ComputeAscent()
{
if (Math.Abs(standardFontMetrics.Ascender) < double.Epsilon)
{
return 0.75;
}
return fontMatrix.TransformY(standardFontMetrics.Ascender);
}
public int ReadCharacterCode(IInputBytes bytes, out int codeLength)
@@ -115,6 +139,16 @@ namespace UglyToad.PdfPig.PdfFonts.Simple
return fontMatrix;
}
public double GetDescent()
{
return descent;
}
public double GetAscent()
{
return ascent;
}
/// <summary>
/// <inheritdoc/>
/// <para>Not implemented.</para>


@@ -13,6 +13,8 @@
{
private readonly PdfRectangle boundingBox;
private readonly TransformationMatrix fontMatrix;
private readonly double ascent;
private readonly double descent;
private readonly Encoding encoding;
private readonly int firstChar;
private readonly int lastChar;
@@ -45,6 +47,18 @@
// Assumption is ZapfDingbats is not possible here. We need to change the behaviour if not the case
System.Diagnostics.Debug.Assert(!(encoding is ZapfDingbatsEncoding || Details.Name.Contains("ZapfDingbats")));
descent = ComputeDescent();
ascent = ComputeAscent();
}
private double ComputeDescent()
{
return 0;
}
private double ComputeAscent()
{
return fontMatrix.TransformY(boundingBox.Top);
}
public int ReadCharacterCode(IInputBytes bytes, out int codeLength)
@@ -106,6 +120,16 @@
return fontMatrix;
}
public double GetDescent()
{
return descent;
}
public double GetAscent()
{
return ascent;
}
/// <summary>
/// <inheritdoc/>
/// <para>Type 3 fonts do not use vector paths. Always returns <c>false</c>.</para>


@@ -0,0 +1,6 @@
#if !NET
namespace System.Runtime.CompilerServices
{
internal static class IsExternalInit { }
}
#endif


@@ -1069,6 +1069,7 @@
var letter = new Letter(
c.ToString(),
documentSpace,
documentSpace,
advanceRect.BottomLeft,
advanceRect.BottomRight,
width,