In March, Tim Brody sent a patch to the poppler mailing list to make poppler extract accented characters from old PDFs generated by LaTeX (where the character and the accent were represented as two different glyphs) as a character plus a combining character. After testing the patch it seemed to me that it did not work at all, as it was changing the output of pdftotext from
R. L¨wen and B. Polster
o
to
R. Lowen and B. Polster
instead of
R. Löwen and B. Polster
Tim answered with a "WorksForMe" that left us quite puzzled for a while. It took me some time to realize it was actually Konsole's fault, as it did not understand combining characters. A bug about that had already been reported by Thiago back in 2005. As I had been bitten by that bug, I decided to have a go at fixing it, and after some days of coding I landed a patch to support Unicode combining characters in Konsole. Unfortunately it was already too late for KDE 4.7, so you'll have to wait until 4.8 to enjoy this goodness (or use master ;))
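To make the difference concrete, here is a minimal Python sketch. It is not taken from Tim's patch or from the Konsole fix. The two strings are illustrative, and `strip_combining` is a hypothetical helper that only approximates what a terminal without combining-character support ends up showing.

```python
import unicodedata

# Illustrative strings only; they are not produced by poppler here.
decomposed = "R. Lo\u0308wen and B. Polster"   # 'o' followed by U+0308 COMBINING DIAERESIS
precomposed = "R. L\u00f6wen and B. Polster"   # single code point U+00F6

def strip_combining(text):
    """Hypothetical helper: drop combining marks, roughly what a terminal
    that does not understand them would display (an assumption)."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))

# The two strings differ code point by code point...
print(decomposed == precomposed)
# ...but are canonically equivalent once normalized to NFC.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
# Without combining-character support the accent simply disappears.
print(strip_combining(decomposed))
```

So text that contains a base character plus a combining character can be perfectly valid and still look wrong, depending on what is displaying it.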
P.S.: Tim, if you are reading this, you never answered about the regressions caused by your patch, so it was never committed to poppler.
It cannot be part of 4.7.x?
@Egon: It could, but given that it was a rather intrusive patch, both the maintainer of Konsole and I agreed to give it some time to mature in master in case the patch was not totally perfect.
“P.S: Tim if you are reading this you never answered about the regressions caused by your patch so it was never committed to poppler”
Ah, too bad, I regularly run into this bug in Okular. At least it is great that Konsole has a fix for composed characters, thanks! The support for such characters is not that great in KDE yet (bug #143364, for instance). This is unfortunate, as this is now the recommended way to add characters with diacritics that are not already in Unicode.