Bug report #6883

Curved placement for labels breaks strings in Indic scripts (Khmer, Lao, Nepali, Bengali, etc.)

Added by Mathieu Pellerin - nIRV over 11 years ago. Updated about 9 years ago.

Status: Closed
Priority: Normal
Assignee: Larry Shaffer
Category: Labelling
Affected QGIS version: master
Regression?: No
Operating System:
Easy fix?: No
Pull Request or Patch supplied: No
Resolution:
Crashes QGIS or corrupts data: No
Copied to github as #: 16011

Description

QGIS currently breaks Indic scripts when labelling lines with the curved placement.

I believe this is because the curved placement logic assumes "Latin" behavior for all strings. In a Latin-based language each char follows the previous one, but that is not the case with Indic scripts, which are organized into clusters. Unicode char data for a basic cluster looks like this: [consonant],[leg(s)],[vowel]. The consonant is used as the center glyph, while the leg(s) and vowel can be placed all around it.
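To make the cluster structure concrete, here is a rough C++ sketch that groups the code points of the Khmer word for "Khmer" (U+1781 U+17D2 U+1798 U+17C2 U+179A) into clusters. It is a toy model only: it assumes a "leg" is a subscript consonant encoded as COENG (U+17D2) followed by a consonant, it handles nothing outside the Khmer block, and it is neither the Unicode segmentation algorithm nor what harfbuzz or QGIS do. The names classifyKhmer and toyKhmerClusters are made up for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy classification of Khmer code points (assumption: Khmer block only).
enum class Role { Consonant, Coeng, DependentSign, Other };

Role classifyKhmer(char32_t cp) {
    if (cp >= 0x1780 && cp <= 0x17A2) return Role::Consonant;   // KA .. QA
    if (cp == 0x17D2) return Role::Coeng;                       // marks the next consonant as a "leg"
    if ((cp >= 0x17B6 && cp <= 0x17D1) || cp == 0x17D3)
        return Role::DependentSign;                             // dependent vowels and other signs
    return Role::Other;
}

// Groups code points into clusters shaped like [consonant],[leg(s)],[vowel].
// A code point joins the previous cluster when it is a COENG, a dependent
// sign, or a consonant that directly follows a COENG.
std::vector<std::u32string> toyKhmerClusters(const std::u32string& text) {
    std::vector<std::u32string> clusters;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const Role role = classifyKhmer(text[i]);
        const bool afterCoeng = i > 0 && classifyKhmer(text[i - 1]) == Role::Coeng;
        const bool attaches = !clusters.empty() &&
            (role == Role::Coeng || role == Role::DependentSign ||
             (role == Role::Consonant && afterCoeng));
        if (attaches)
            clusters.back().push_back(text[i]);
        else
            clusters.push_back(std::u32string(1, text[i]));  // starts a new cluster
    }
    return clusters;
}
```

Calling toyKhmerClusters(U"\u1781\u17D2\u1798\u17C2\u179A") groups the first four code points into one cluster and leaves the final consonant on its own, whereas a per-char split produces five separate pieces, three of which have no consonant of their own to attach to.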

You can spot the broken rendering by the dotted circle, which rendering engines such as harfbuzz and uniscribe use to indicate that legs and vowels are not attached to the consonant they need to form a cluster. This happens in QGIS because the curved placement breaks strings down into single chars and places each one separately.

label-correct-paralle_placement.jpg - parallel placement renders indic scripts correctly (45 KB) Mathieu Pellerin - nIRV, 2012-12-16 08:17 PM

label-broken-curved_placement.jpg - curved placement results in broken indic scripts (49.5 KB) Mathieu Pellerin - nIRV, 2012-12-16 08:17 PM

Associated revisions

Revision 2dc5d95f
Added by Nyall Dawson about 9 years ago

Fix broken rendering of curved labels for scripts which use >1 char
graphemes (fix #6883)

Revision 71b5a993
Added by Nyall Dawson about 9 years ago

Fix broken rendering of curved labels for scripts which use >1 char
graphemes (fix #6883)

Cherry-picked from 2dc5d95f00770c602497f633612a6dabb8be4962

History

#1 Updated by Mathieu Pellerin - nIRV over 11 years ago

Note: Hermann Kraus (https://github.com/herm) has recently implemented curved placement labels in mapnik, relying on harfbuzz-ng. He might be a good person to ask for advice. Behdad Esfahbod (https://github.com/behdad), the creator of harfbuzz, is also a great person to talk to.

#2 Updated by Mathieu Pellerin - nIRV over 11 years ago

This commit, applied to mapnik, ensures that clusters are respected and that all chars in a cluster use the same rotation angle: https://github.com/mapnik/mapnik/commit/f10d5b107f5fd62a2592cc1b0315fb9fcca38990 -- this might be useful in figuring out which functions are needed.
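As a rough illustration of the "same rotation angle per cluster" idea (not the mapnik code, which I have not reproduced here), the sketch below places whole clusters along a polyline and computes one position and one angle per cluster, taken at the cluster's midpoint. The Point and Placement types, the function names, and the use of precomputed cluster widths are all assumptions made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };
struct Placement { double x, y, angle; };  // angle in radians

// Returns the point at `distance` along the polyline, together with the
// direction of the segment it falls on. Distances past the end are
// extrapolated along the last segment.
Placement pointAt(const std::vector<Point>& path, double distance) {
    for (std::size_t i = 0; i + 1 < path.size(); ++i) {
        const double dx = path[i + 1].x - path[i].x;
        const double dy = path[i + 1].y - path[i].y;
        const double len = std::hypot(dx, dy);
        const bool lastSegment = (i + 2 == path.size());
        if (distance <= len || lastSegment) {
            const double t = len > 0 ? distance / len : 0.0;
            return { path[i].x + t * dx, path[i].y + t * dy, std::atan2(dy, dx) };
        }
        distance -= len;
    }
    return { 0.0, 0.0, 0.0 };  // fallback for paths with fewer than two points
}

// One placement per cluster: every glyph in a cluster shares the cluster's
// position and rotation instead of being rotated char by char.
std::vector<Placement> placeClusters(const std::vector<Point>& path,
                                     const std::vector<double>& clusterWidths) {
    std::vector<Placement> placements;
    double offset = 0.0;
    for (double width : clusterWidths) {
        placements.push_back(pointAt(path, offset + width / 2.0));
        offset += width;
    }
    return placements;
}
```

With this shape, the renderer draws each cluster as one unit at its placement, so legs and vowels stay attached to their consonant even where the line bends.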

#3 Updated by Paolo Cavallini about 11 years ago

  • Category set to Labelling

#4 Updated by Mathieu Pellerin - nIRV about 10 years ago

Good news, everyone!

It seems there's actually no need to talk to harfbuzz directly: a Qt function (QTextLayout::isValidCursorPosition) reports whether a given cursor position in a text string is valid (i.e., "In a Unicode context some positions in the text are not valid cursor positions, because the position is inside a Unicode surrogate or a grapheme cluster.").

Currently, in qgspallabeling.cpp, the curved text placement simply breaks a string into its individual chars (line 147: for ( int i = 0; i < mText.count(); i++ )) which breaks Indic-based scripts (and most probably other languages) which rely on clusters of characters that can't be dissociated.

The code would have to be reworked to call isValidCursorPosition and accumulate clusters of chars, so that it is compatible with non-Latin strings.
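A minimal sketch of that accumulation loop, written against the standard library only so it can be tried without Qt. The function name splitAtValidPositions is hypothetical, and the predicate parameter stands in for QTextLayout::isValidCursorPosition; in QGIS it could presumably be a lambda that forwards to a QTextLayout built from the label text.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Accumulates UTF-16 code units into clusters, starting a new cluster only
// at positions the predicate reports as valid cursor positions.
// `isValidCursorPosition` is a stand-in (assumption) for
// QTextLayout::isValidCursorPosition(int pos).
std::vector<std::u16string> splitAtValidPositions(
    const std::u16string& text,
    const std::function<bool(int)>& isValidCursorPosition) {
    std::vector<std::u16string> clusters;
    std::u16string current;
    for (int i = 0; i < static_cast<int>(text.size()); ++i) {
        // Position i lies just before text[i]; if a cursor may sit there,
        // the cluster accumulated so far is complete.
        if (i > 0 && isValidCursorPosition(i)) {
            clusters.push_back(current);
            current.clear();
        }
        current.push_back(text[static_cast<std::size_t>(i)]);
    }
    if (!current.empty())
        clusters.push_back(current);
    return clusters;
}
```

The curved placement would then iterate over the returned clusters instead of over mText one char at a time, measuring and placing each cluster as a unit. A predicate that always returns true reproduces the current per-char behavior, which makes the two easy to compare.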

#5 Updated by Mathieu Pellerin - nIRV about 10 years ago

  • Assignee set to Larry Shaffer
  • Target version set to Future Release - Lower Priority

#6 Updated by Nyall Dawson about 9 years ago

  • Status changed from Open to Closed
