Bug report #13796

Attribute table swallows values

Added by dr - almost 5 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Attribute table
Affected QGIS version: master
Regression?: No
Operating System: Ubuntu
Easy fix?: No
Pull Request or Patch supplied: No
Resolution: end of life
Crashes QGIS or corrupts data: No
Copied to github as #: 21821

Description

If you open the attached shapefile (original encoding is windows-1251) and choose UTF-8, you see question marks, which is expected. But if you choose the System encoding, the value of the CLNAME attribute is empty.

poly.zip (1.08 KB) dr -, 2015-11-09 09:55 AM

10.png (68.3 KB) dr -, 2015-11-09 09:55 AM

encoding_master.PNG (30.2 KB) Saber Razmjooei, 2016-05-24 10:23 PM

History

#1 Updated by dr - almost 5 years ago

#2 Updated by Giovanni Manghi almost 5 years ago

  • Category set to Attribute table

#3 Updated by Saber Razmjooei over 4 years ago

Looks fine in the latest master under Windows.

#4 Updated by Giovanni Manghi over 4 years ago

  • Resolution set to worksforme
  • Status changed from Feedback to Closed

Closing for lack of feedback; please reopen if necessary.

#5 Updated by dr - over 4 years ago

  • Resolution deleted (worksforme)
  • Status changed from Closed to Reopened

Issue is still present.

#6 Updated by Jürgen Fischer over 4 years ago

  • Status changed from Reopened to Feedback

And what behavior do you expect? If you select "windows-1251" the value is "ЛЕСА" (forests). I suppose that would also occur if your "System" encoding was "windows-1251".

#7 Updated by dr - over 4 years ago

If I select "UTF-8", I see the field value as "����", but if I select "System", I see the field value as an empty string. Why do I get a different result if my system encoding is UTF-8?

#8 Updated by Jürgen Fischer over 4 years ago

dr - wrote:

If I select "UTF-8", I see the field value as "����", but if I select "System", I see the field value as an empty string. Why do I get a different result if my system encoding is UTF-8?

Qt's detection (or iconv's) of the system encoding is tricky. With LC_CTYPE=de_DE.UTF-8 the system codec is different from the UTF-8 codec.

#!/usr/bin/python

from PyQt4.QtCore import QTextCodec

sc = QTextCodec.codecForName( "System" )
lc = QTextCodec.codecForLocale()
uc = QTextCodec.codecForName( "utf-8" )
wc = QTextCodec.codecForName( "windows-1251" )

# four raw bytes: the windows-1251 encoding of the CLNAME value
a = "\313\305\321\300"
print "input:", a

b = sc.toUnicode(a)
print u"sc output: {} ({})".format(unicode(b), sc.name())

b = lc.toUnicode(a)
print u"lc output: {} ({})".format(unicode(b), lc.name())

b = uc.toUnicode(a)
print u"uc output: {} ({})".format(unicode(b), uc.name())

b = wc.toUnicode(a)
print u"wc output: {} ({})".format(unicode(b), wc.name())

produces

input: ����
sc output:  (System)
lc output:  (System)
uc output: ���� (UTF-8)
wc output: ЛЕСА (windows-1251)
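
For readers without PyQt4, the difference between the last two results can be reproduced with a minimal Python 3 sketch that uses only the standard library. It decodes the same four bytes three ways. The `errors="ignore"` case is included only because it shows the same visible symptom (an empty string); it is an analogy, not a claim about how Qt's System codec works internally.

```python
# Python 3, standard library only (no Qt involved).
raw = b"\xcb\xc5\xd1\xc0"  # the same four bytes as "\313\305\321\300"

# Correct codec: the bytes map to four Cyrillic letters.
as_cp1251 = raw.decode("windows-1251")

# Wrong codec, lossy but visible: each invalid byte becomes U+FFFD.
as_utf8_replace = raw.decode("utf-8", errors="replace")

# Wrong codec, lossy and silent: invalid bytes are dropped entirely.
as_utf8_ignore = raw.decode("utf-8", errors="ignore")

print("windows-1251:  ", as_cp1251)
print("utf-8, replace:", as_utf8_replace)
print("utf-8, ignore: ", repr(as_utf8_ignore))
```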

But I don't see much difference between the empty string and the question mark junk. What's the point of not using the actual encoding of the data?

#9 Updated by dr - over 4 years ago

I see a lot of difference between an empty string and question marks! Question marks are strongly associated with 'something is wrong'; empty strings, on the other hand, are common in the data and 'look' correct.
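
One way to keep a failed decode visible, sketched in Python 3 with the standard library only. The helper name `decode_with_flag` is hypothetical and is not part of QGIS or Qt; it illustrates the idea of returning replacement characters plus a flag rather than an empty string.

```python
from typing import Tuple


def decode_with_flag(raw: bytes, encoding: str) -> Tuple[str, bool]:
    """Decode raw attribute bytes and report whether the decode was clean.

    Hypothetical helper for illustration only.
    """
    try:
        # Strict mode raises UnicodeDecodeError on any invalid byte.
        return raw.decode(encoding), True
    except UnicodeDecodeError:
        # Fall back to U+FFFD markers so the value looks broken, not empty.
        return raw.decode(encoding, errors="replace"), False


text, ok = decode_with_flag(b"\xcb\xc5\xd1\xc0", "utf-8")
print(repr(text), ok)
```

With this approach a genuinely empty value (`b""`) still decodes cleanly, so an empty cell and a failed decode can be told apart.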

#10 Updated by Giovanni Manghi almost 4 years ago

  • Status changed from Feedback to Open

#11 Updated by Giovanni Manghi over 3 years ago

  • Regression? set to No
  • Easy fix? set to No

#12 Updated by Giovanni Manghi over 1 year ago

  • Resolution set to end of life
  • Status changed from Open to Closed