Bug report #6883
Curved placement for labels breaks strings in Indic scripts (Khmer, Lao, Nepali, Bengali, etc.)
|Affected QGIS version:||master||Regression?:||No|
|Operating System:||Easy fix?:||No|
|Pull Request or Patch supplied:||No||Resolution:|
|Crashes QGIS or corrupts data:||No||Copied to github as #:||16011|
QGIS currently breaks Indic scripts when labelling lines with the curved placement.
I believe this is due to the curved placement logic assuming "Latin" behavior for all strings. In a Latin-based script each char follows the previous one, but this is not the case with Indic scripts, which are broken down into clusters. Unicode char data for a basic cluster looks like this: [consonant],[leg(s)],[vowel]. The consonant is used as the center glyph, while the leg(s) and vowel can be placed all around the center glyph / consonant.
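To make the cluster layout concrete, here is a minimal sketch that classifies the code points of one Khmer cluster. It relies only on the code point ranges of the Unicode Khmer block; the `KhmerClass` enum, the `classifyKhmer` function and the `kExampleCluster` constant are hypothetical names used for illustration and are not part of QGIS. In Khmer, a "leg" is encoded as the COENG sign (U+17D2) followed by a consonant.

```cpp
#include <string>

// Hypothetical classification, for illustration only (not QGIS code).
enum class KhmerClass { Consonant, Coeng, DependentVowel, Other };

// Classify a single code point using ranges from the Unicode Khmer block.
KhmerClass classifyKhmer( char32_t c )
{
  if ( c >= 0x1780 && c <= 0x17A2 )
    return KhmerClass::Consonant;       // base consonants
  if ( c == 0x17D2 )
    return KhmerClass::Coeng;           // marks the next consonant as a "leg"
  if ( c >= 0x17B6 && c <= 0x17C5 )
    return KhmerClass::DependentVowel;  // vowel signs drawn around the base
  return KhmerClass::Other;
}

// One cluster: consonant KHA, leg (COENG + consonant MO), vowel sign AE.
const std::u32string kExampleCluster = U"\u1781\u17D2\u1798\u17C2";
```

The four code points above render as a single unit, so placing each one separately along a curve detaches the leg and the vowel from their consonant.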
You can spot the broken rendering when you see the dotted circle, which rendering engines such as harfbuzz and uniscribe use to indicate that legs and vowels are not attached to the consonant they need to form a cluster. This is happening in QGIS because the curved placement breaks strings down into single chars and places each one separately.
Fix broken rendering of curved labels for scripts which use >1 char
graphemes (fix #6883)
#1 Updated by Mathieu Pellerin - nIRV almost 8 years ago
#2 Updated by Mathieu Pellerin - nIRV almost 8 years ago
This commit, applied to mapnik, ensures clusters are respected and use the same rotation angle: https://github.com/mapnik/mapnik/commit/f10d5b107f5fd62a2592cc1b0315fb9fcca38990 -- this might be useful in figuring out which functions are needed.
#4 Updated by Mathieu Pellerin - nIRV over 6 years ago
Good news, everyone!
It seems there's actually no need to talk to harfbuzz directly: a Qt function (QTextLayout::isValidCursorPosition) reports whether a given position in a text string is a valid cursor position (i.e., "In a Unicode context some positions in the text are not valid cursor positions, because the position is inside a Unicode surrogate or a grapheme cluster.").
Currently, in qgspallabeling.cpp, the curved text placement simply breaks a string into its individual chars (line 147: for ( int i = 0; i < mText.count(); i++ )) which breaks Indic-based scripts (and most probably other languages) which rely on clusters of characters that can't be dissociated.
The code would have to be reworked to call isValidCursorPosition and accumulate clusters of chars to be compatible with non-Latin strings.
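A rough sketch of what that accumulation could look like, written in standard C++ so it runs without Qt. The names `splitIntoClusters` and `toyIsClusterBoundary` are hypothetical, and the predicate is only a toy stand-in covering Khmer marks; in QGIS the boundary test would presumably be QTextLayout::isValidCursorPosition, which works on positions in a UTF-16 QString rather than the UTF-32 string assumed here.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for QTextLayout::isValidCursorPosition (assumption: Khmer only).
// A position starts a new cluster unless the char there is a Khmer combining
// mark (U+17B4..U+17D3) or directly follows the COENG sign (U+17D2).
bool toyIsClusterBoundary( const std::u32string &text, std::size_t pos )
{
  if ( pos == 0 || pos >= text.size() )
    return true;
  const char32_t c = text[pos];
  const bool isMark = c >= 0x17B4 && c <= 0x17D3;
  const bool followsCoeng = text[pos - 1] == 0x17D2;
  return !isMark && !followsCoeng;
}

// Accumulate chars into clusters instead of emitting one part per char.
std::vector<std::u32string> splitIntoClusters(
  const std::u32string &text,
  const std::function<bool( const std::u32string &, std::size_t )> &isBoundary )
{
  std::vector<std::u32string> clusters;
  for ( std::size_t i = 0; i < text.size(); ++i )
  {
    // Open a new cluster at the first char and at every reported boundary.
    if ( clusters.empty() || isBoundary( text, i ) )
      clusters.emplace_back();
    // Every other char is appended to the cluster currently being built.
    clusters.back().push_back( text[i] );
  }
  return clusters;
}
```

With this shape, the curved placement loop would iterate over the returned clusters rather than over individual chars, measuring and rotating each cluster as one unit. A Latin string still yields one cluster per char, so existing behavior for those labels should be unchanged.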