Sometimes the decision of how a program should behave is not technically right or wrong, but depends on what the user expects.
In Okular we are asking ourselves what should happen when you have two lines with the following text
This is an ex-
ample
and copy it. Should it return "This is an ex-\nample" or "This is an ex-ample" or "This is an example"?
Head over to the KDE forums and vote!
Is it possible to distinguish, for example in PDF, between hyphens that are added because of line breaks and hyphens that belong to the word?
I'd go for 'example'.
I think that is the most common case of them all.
"example" all the way :)
Thanks for asking.
@Anonymous: No, a hyphen is a hyphen
@The rest: You are voting in the wrong place ;-)
I think the coolest would be if Okular ran a spellcheck on "example". If "example" turns out to be in the user's dictionary, it would be copied as that word, otherwise as "ex-ample".
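The dictionary idea above could be sketched roughly as follows. This is only an illustration, not Okular code: the function name `merge_hyphenated` and the `known_words` set are hypothetical stand-ins for whatever spellchecking backend would really be consulted.

```python
def merge_hyphenated(first_part, second_part, known_words):
    """Join two halves of a word that was split at a line-end hyphen.

    known_words is a hypothetical stand-in for the user's dictionary:
    a plain set of lowercase words.
    """
    candidate = first_part + second_part
    if candidate.lower() in known_words:
        # The merged form is a known word, so the hyphen was
        # probably introduced by the line break.
        return candidate
    # Unknown merged form: keep the hyphen, it may belong to the word.
    return first_part + "-" + second_part


known = {"example"}
print(merge_hyphenated("ex", "ample", known))
print(merge_hyphenated("guarda", "chuva", known))
```

The obvious weakness is that the result depends on which dictionaries the user has installed, so the same document could copy differently on two machines.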
It would be nice to have these options in the preferences dialog box, with the default option set to "example".
The PDF reference says:
Hyphenation. Among the artifacts introduced by text layout is the hyphen marking the incidental division of
a word at the end of a line. In Tagged PDF, such an incidental word division shall be represented by a soft
hyphen character, which the Unicode mapping algorithm (see “Unicode Mapping in Tagged PDF” in
14.8.2.4, “Extraction of Character Properties”) translates to the Unicode value U+00AD. (This character is
distinct from an ordinary hard hyphen, whose Unicode value is U+002D.) The producer of a Tagged PDF
document shall distinguish explicitly between soft and hard hyphens so that the consumer does not have
to guess which type a given character represents.
So Okular could at least distinguish between soft and hard hyphens and treat them separately.
For some languages hyphenation changes the spelling. A straightforward merging would then introduce errors.
So I think that removing soft hyphens but keeping others is the best solution.
(I comment here because it does not require an additional login)
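A minimal sketch of the "remove soft hyphens, keep the others" suggestion, assuming the extracted text really contains U+00AD for incidental breaks, as the quoted passage requires of Tagged PDF. The function name `join_lines_for_copy` is made up for illustration and is not part of Okular or Poppler.

```python
SOFT_HYPHEN = "\u00ad"  # incidental word division at a line end (Tagged PDF)
HARD_HYPHEN = "\u002d"  # ordinary hyphen, left untouched by this sketch


def join_lines_for_copy(lines):
    """Join extracted lines, dropping a soft hyphen at the end of a line.

    Lines ending in anything else, including a hard hyphen, keep
    their text and their line break.
    """
    pieces = []
    for index, line in enumerate(lines):
        is_last = index == len(lines) - 1
        if not is_last and line.endswith(SOFT_HYPHEN):
            # Drop the soft hyphen and glue the next line on directly.
            pieces.append(line[:-1])
        elif not is_last:
            pieces.append(line + "\n")
        else:
            pieces.append(line)
    return "".join(pieces)


print(repr(join_lines_for_copy(["This is an ex" + SOFT_HYPHEN, "ample"])))
print(repr(join_lines_for_copy(["This is an ex" + HARD_HYPHEN, "ample"])))
```

This only helps when the PDF producer made the distinction in the first place. For untagged documents, where every hyphen is U+002D, the original question remains open.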
This may be language dependent.
In Portuguese, if the line break happens where there is already a grammatical hyphen, then you should repeat the hyphen at the beginning of the next line.
Isto é um guarda-chuva.
should break as:
..... Isto é um guarda-
-chuva.
IOW, in both these cases:
guarda-
-chuva.
guarda-chu-
va.
when written on a single line, the word is
guarda-chuva
If the same rules apply to English (I don't know if they do), you return either "This is an ex-\nample" or "This is an example", but never "This is an ex-ample".
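The Portuguese convention described above makes the two cases distinguishable without a dictionary. Here is a rough sketch, assuming the text follows that convention consistently; `join_portuguese` is a hypothetical helper name, not an existing function.

```python
def join_portuguese(first_line, second_line):
    """Join two lines according to the repeated-hyphen convention."""
    if first_line.endswith("-") and second_line.startswith("-"):
        # Hyphen repeated on the next line: it is grammatical, keep one copy.
        return first_line + second_line[1:]
    if first_line.endswith("-"):
        # Hyphen only at the line end: it comes from the line break, drop it.
        return first_line[:-1] + second_line
    # No hyphen at the break: the lines are separate words.
    return first_line + " " + second_line


print(join_portuguese("guarda-", "-chuva."))
print(join_portuguese("guarda-chu-", "va."))
```

Both calls give back the single-line spelling "guarda-chuva." from the example above.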