Friday, April 29, 2022

Poppler finally has support for embedding fonts in PDF files!

 Why would you want to embed fonts in PDF files are you probably asking yourself?

Short answer: It fixes issues when adding text to the PDF files.

Long answer:

Poppler has had the feature of being able to fill in forms, create annotations and more recently add Digital Signatures to existing PDF files.

This works relatively well if you limit yourself to entering 'basic' ASCII characters, but once you go to more 'complex' characters, things don't really work, from the outside it seems like it should be relatively simple to fix, but things related to PDF are never as simple as they may seem.

In PDF each bit of text is associated with a Font object. That Font generally only supports one kind of text encoding and at most 'only' 65535 characters (65535 may seem a lot, but once you start taking into account non latin-based languages, you quickly 'run out' of characters).

What Poppler used to do in the past was just save the text in the PDF file and say "This text is written in Helvetica font", without even really care to specify much what 'Helvetica font' meant,  and then let the PDF viewer (remember when we save the PDF file, it will not only be rendered by Poppler again, but potentially by Adobe Reader, Chrome, Firefox, etc.) try to figure out what to do with that information, which as said usually didn't go very well for the more 'complex' characters.

What we do now is for each character of new text that we add to the file is we make sure to embed a font for it. So if you're writing something like 'holaħŋ↓' we may end up adding a few fonts to the PDF file, and then instead of saying 'This is the text and it's in Helvetica, good luck', we will say something like 'This text is characters 4, 67, 83 and 98 of embedded Font X, characters 4 and 99 of embedded Font X2 and character 16574 of embedded Font X3'. This way when the file is opened by a PDF viewer it is 'very easy' for them to do the right thing and show what we wanted.

Enough of technical talk! Now some screenshots to show how this has been fixed for Text Annotations, Forms and Signatures :)

Writing "hello↓漢you" to a form

Before

imatge 

Now

 imatge 

Signing a PDF file with my name being "Albeŋŧ As漢tals Ciđ"

Before

image 

Now

image 

 

Writing hola↓漢字 in a Text Annotation

Before

 

 Now