Bug report #4343

Shapefile, created in Qgis, encoding not recognized by Esri ArcGIS 10

Added by vvj - almost 8 years ago. Updated over 6 years ago.

Status: Closed
Priority: Low
Assignee: -
Category: Data Provider
Affected QGIS version: master
Regression?: No
Operating System:
Easy fix?: No
Pull Request or Patch supplied: No
Resolution: fixed
Crashes QGIS or corrupts data: No
Copied to github as #: 14280

Description

Hello,

I created a shapefile with UTF-8 attribute values. In QuantumGIS 1.7.1 they are displayed correctly, but in Esri ArcGIS 10 the encoding is lost.

The registry was modified according to http://support.esri.com/en/knowledgebase/techarticles/detail/21106 and a correct .cpg file was created, but the UTF-8 encoding is still not recognized.

Temporary workaround: open the .dbf with LibreOffice Calc (select "Unicode UTF8" as the encoding) and save it. Then create a .cpg file containing the correct encoding name ("UTF-8"). After these steps, the shapefile opens correctly in ArcGIS.
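The second step of the workaround (creating the .cpg file) can be sketched in Python. This is only an illustration: the helper name write_cpg is hypothetical, and it assumes the .cpg sidecar is a plain text file that holds nothing but the encoding name and sits next to the .shp with the same base name.

```python
from pathlib import Path


def write_cpg(shp_path, encoding_name="UTF-8"):
    """Write a .cpg sidecar next to the shapefile (hypothetical helper).

    Assumption: the sidecar contains only the encoding name as plain text.
    """
    cpg_path = Path(shp_path).with_suffix(".cpg")
    # The encoding name itself is plain ASCII, e.g. "UTF-8".
    cpg_path.write_text(encoding_name, encoding="ascii")
    return cpg_path
```

For example, write_cpg("test_utf.shp") would create test_utf.cpg in the same directory. It does not touch the .dbf, so it covers only the .cpg part of the workaround.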

Attached are two sample shapefiles:
  • test_utf - original QuantumGIS shapefile with manually created .cpg
  • test_utf_lo - shapefile whose .dbf was modified with LibreOffice Calc

There is also a screenshot showing the actual and the detected encoding.

test_utf8.zip - Sample shapefiles (90.8 KB) vvj -, 2011-10-03 01:41 AM


Related issues

Related to QGIS Application - Bug report #5911: Language Driver ID in dbf file of new shapefile Closed 2012-06-30

History

#1 Updated by Giovanni Manghi almost 8 years ago

  • Category set to Data Provider
  • Priority changed from Normal to Low

Does ArcGIS not allow the user to choose the encoding when opening a vector, as QGIS does?

#2 Updated by Giovanni Manghi almost 8 years ago

I'm not really sure this is a QGIS issue...

#3 Updated by vvj - almost 8 years ago

ArcGIS does not allow choosing the encoding. For UTF-8 support in shapefiles, several requirements must be met:
1. A registry key with the correct encoding must be set (otherwise, the default system encoding will be used).
2. "When opening a shapefile and dBASE file in ArcGIS Desktop, the Desktop programs look at the Language Driver ID (LDID) in the header of a dBASE file, or an associated *.CPG file, which are both used to define the code page, and to determine the code page of the file that is read".
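To check which LDID a given .dbf actually carries, the header byte can be read directly. The following is a minimal sketch, assuming the commonly documented dBASE header layout in which the LDID is the single byte at offset 29 of the 32-byte file header; the function name read_ldid is hypothetical.

```python
LDID_OFFSET = 29   # assumption: common dBASE header layout
HEADER_SIZE = 32   # fixed-size part of the dBASE file header


def read_ldid(dbf_path):
    """Return the Language Driver ID byte of a .dbf file as an int."""
    with open(dbf_path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        raise ValueError("file is too short to contain a dBASE header")
    return header[LDID_OFFSET]
```

Running hex(read_ldid(path)) on the .dbf of each attached sample would show whether the two files differ in this byte.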

In my case, the registry key and the .cpg file were created, but the encoding is not recognized until the .dbf file is opened with LibreOffice Calc (the encoding can be chosen upon opening) and saved under the same name. The .dbf saved with Calc is slightly different from the one created by QGIS.

I guess that QGIS sets the encoding in the .dbf header incorrectly, and yes, it IS a QGIS issue.

I have attached "bad" and "good" shapefile samples; maybe you can figure out what is different.

#4 Updated by Giovanni Manghi almost 8 years ago

  • Target version set to Version 1.7.4

#5 Updated by Paolo Cavallini over 7 years ago

  • Target version changed from Version 1.7.4 to Version 1.8.0
  • Crashes QGIS or corrupts data set to No
  • Affected QGIS version set to master

#6 Updated by Paolo Cavallini about 7 years ago

  • Target version changed from Version 1.8.0 to Version 2.0.0

#7 Updated by Borys Jurgiel over 6 years ago

  • Resolution set to fixed
  • Status changed from Open to Closed

Fixed in c0551a68c250489955c9831f5714f187df087d83

The difference between the files is that test_utf contains LDID=0x57, which various software recognizes either as "current codepage" or as "ISO-8859-1". It seems this value overrides the CPG declaration for ArcGIS. LibreOffice Calc resets the LDID value to 0, so the CPG file can take effect. QGIS will now reset the LDID and create a CPG file when creating a new layer.

For existing layers you can use the Shapefile Encoding Fixer plugin.
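For illustration, resetting the LDID of an existing .dbf by hand could look like the sketch below. It is not the code from the commit above and not what the plugin does internally. It assumes the commonly documented dBASE header layout with the LDID as the single byte at offset 29, and reset_ldid is a hypothetical name. It edits the file in place, so try it on a copy first.

```python
LDID_OFFSET = 29  # assumption: common dBASE header layout


def reset_ldid(dbf_path):
    """Set the LDID byte to 0 in place and return its previous value."""
    with open(dbf_path, "r+b") as f:
        f.seek(LDID_OFFSET)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError("file is too short to contain a dBASE header")
        # Overwrite only this one byte; the rest of the file is untouched.
        f.seek(LDID_OFFSET)
        f.write(b"\x00")
    return old[0]
```

With the LDID at 0, a .cpg file next to the shapefile is still needed so that readers know the attribute encoding.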
