TSDgeos' blog: Okular users we want your input!

Tuesday, February 07, 2012

Okular users we want your input!

Sometimes how a program should behave is not a question of being technically right or wrong, but of what users expect.

In Okular we are asking ourselves what should happen when you have two lines with the following text:

This is an ex-
ample

and copy it. Should it return "This is an ex-\nample" or "This is an ex-ample" or "This is an example"?
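To make the three options concrete, here is a minimal Python sketch. It is not Okular's code; the function names are made up for illustration, and it assumes the selected text reaches us as a plain string with "\n" between the two lines.

```python
import re

# The two-line selection from the example above.
SAMPLE = "This is an ex-\nample"


def copy_verbatim(text: str) -> str:
    """Option 1: keep the hyphen and the line break exactly as laid out."""
    return text


def copy_keep_hyphen(text: str) -> str:
    """Option 2: drop the line break after a trailing hyphen, keep the hyphen."""
    return re.sub(r"-\n", "-", text)


def copy_join_word(text: str) -> str:
    """Option 3: drop both the trailing hyphen and the line break."""
    return re.sub(r"-\n", "", text)


if __name__ == "__main__":
    for option in (copy_verbatim, copy_keep_hyphen, copy_join_word):
        print(option.__name__, repr(option(SAMPLE)))
```

Note that option 3 cannot tell a layout hyphen from one that belongs to the word, which is exactly why the question is about expectations rather than correctness.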

Head over to the KDE forums and vote!

8 comments:

Anonymous said...: Is it possible to distinguish for example in PDF between hyphens that are added because of line breaks and hyphens that belong to the word?; Wednesday, February 08, 2012 12:23:00 AM
Ivan Čukić said...: I'd go for 'example'.

This is the most common case of them all.; Wednesday, February 08, 2012 12:23:00 AM
Luis Román said...: "example" all the way :)

Thanks for asking.; Wednesday, February 08, 2012 12:31:00 AM
Albert Astals Cid said...: @Anonymous: No, a hyphen is a hyphen

@The rest: You are voting in the wrong place ;-); Wednesday, February 08, 2012 12:41:00 AM
Anonymous said...: I think the coolest would be if Okular tried a spellcheck on "example". If "example" turns out to be in the user's dictionary, it would be copied as that word, otherwise as "ex-ample".; Wednesday, February 08, 2012 7:55:00 AM
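A rough sketch of that spellcheck idea, under loud assumptions: KNOWN_WORDS is a hypothetical stand-in for the user's dictionary (a real implementation would query an actual spellchecker), and join_with_dictionary is an illustrative name, not an Okular function.

```python
import re

# Hypothetical stand-in for the user's spellcheck dictionary.
KNOWN_WORDS = {"this", "is", "an", "example"}


def join_with_dictionary(text: str, known_words=KNOWN_WORDS) -> str:
    """Join a word split by a hyphen at a line end only if the joined
    form is a known word; otherwise keep the hyphen and drop the break."""

    def replace(match: re.Match) -> str:
        head, tail = match.group(1), match.group(2)
        joined = head + tail
        if joined.lower() in known_words:
            return joined            # e.g. "ex-\nample" becomes "example"
        return head + "-" + tail     # unknown word: keep the hyphen

    return re.sub(r"(\w+)-\n(\w+)", replace, text)
```

The sketch only looks at the fragments directly around the line break, so a word that already contains a hyphen elsewhere is checked piecewise rather than as a whole.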
Uomo Ragno said...: It would be nice to have these options in the preferences dialog box, with default option set to "example".; Wednesday, February 08, 2012 8:53:00 AM
Christoph Bartoschek said...: The PDF reference says:

Hyphenation. Among the artifacts introduced by text layout is the hyphen marking the incidental division of
a word at the end of a line. In Tagged PDF, such an incidental word division shall be represented by a soft
hyphen character, which the Unicode mapping algorithm (see “Unicode Mapping in Tagged PDF” in
14.8.2.4, “Extraction of Character Properties”) translates to the Unicode value U+00AD. (This character is
distinct from an ordinary hard hyphen, whose Unicode value is U+002D.) The producer of a Tagged PDF
document shall distinguish explicitly between soft and hard hyphens so that the consumer does not have
to guess which type a given character represents.

So Okular could at least distinguish between soft and hard hyphens and treat them separately.

For some languages hyphenation changes the spelling. A straightforward merging would then introduce errors.

So I think that removing soft hyphens but keeping others is the best solution.

(I comment here because it does not require an additional login); Wednesday, February 08, 2012 10:05:00 AM
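As a minimal sketch of that suggestion, assuming the extracted text really contains U+00AD for incidental word divisions (which the quoted passage only guarantees for Tagged PDF), the merge could look like this. The function name is hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # incidental division at a line end, per the quoted passage
HARD_HYPHEN = "\u002d"  # ordinary hyphen that belongs to the word


def join_soft_hyphenated(text: str) -> str:
    """Remove a soft hyphen together with the line break that follows it.

    Hard hyphens, and any line break after them, are left untouched,
    so no guess is made about hyphens that may belong to the word.
    """
    return text.replace(SOFT_HYPHEN + "\n", "")
```

For untagged documents, where every hyphen arrives as U+002D, this function changes nothing and the original question remains open.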
Pedro Alves said...: This may be language dependent.

In Portuguese, if the line break happens where there is already a grammatical hyphen, then you should repeat the hyphen at the beginning of the next line.

Isto é um guarda-chuva.

should break as:

..... Isto é um guarda-
-chuva.

IOW, in both these cases:

guarda-
-chuva.

guarda-chu-
va.

when written on a single line, the word is

guarda-chuva

If the same rules apply to English (I don't know if they do), you return either "This is an ex-\nample" or "This is an example", but never "This is an ex-ample".; Wednesday, February 08, 2012 10:37:00 AM
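The Portuguese convention described above can be sketched in a few lines of Python. This is only an illustration under the assumption that the continuation line starts directly with the repeated hyphen (no leading whitespace); the function name is made up.

```python
import re


def join_portuguese(text: str) -> str:
    """Rejoin lines following the repeated-hyphen convention."""
    # A hyphen repeated at the start of the next line marks a
    # grammatical hyphen: keep exactly one and drop the line break.
    text = re.sub(r"-\n-", "-", text)
    # Any remaining end-of-line hyphen is a layout hyphen:
    # drop it together with the line break.
    return re.sub(r"-\n", "", text)
```

Both ways of breaking the word come back as the same single-line form, because the repeated hyphen carries the information that a lone end-of-line hyphen lacks.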