Bug report #4343
Shapefile, created in Qgis, encoding not recognized by Esri ArcGIS 10
Status: | Closed | |
---|---|---|---
Priority: | Low | |
Assignee: | - | |
Category: | Data Provider | |
Affected QGIS version: | master | Regression?: | No
Operating System: | | Easy fix?: | No
Pull Request or Patch supplied: | No | Resolution: | fixed
Crashes QGIS or corrupts data: | No | Copied to github as #: | 14280
Description
Hello,
I have created a shapefile with UTF-8 attribute values. In QuantumGIS 1.7.1 they are displayed correctly, but in Esri ArcGIS 10 the encoding is lost.
The registry was modified according to http://support.esri.com/en/knowledgebase/techarticles/detail/21106 and a correct .cpg file was created, but the UTF-8 encoding is still not recognized.
Temporary workaround: open the .dbf with LibreOffice Calc (select "Unicode UTF8" as the encoding) and save it. Then create a .cpg file containing the correct encoding name ("UTF-8"). After these steps, the shapefile opens correctly in ArcGIS.
Attached are two sample shapefiles:
- test_utf - original QuantumGIS shapefile with a manually created .cpg
- test_utf_lo - shapefile whose .dbf was modified with LibreOffice Calc
There is also a screenshot showing the actual and the detected encoding.
History
#1 Updated by Giovanni Manghi about 13 years ago
- Category set to Data Provider
- Priority changed from Normal to Low
Doesn't ArcGIS allow the user to choose the encoding when opening a vector layer, as QGIS does?
#2 Updated by Giovanni Manghi about 13 years ago
I'm not really sure this is a QGIS issue...
#3 Updated by vvj - about 13 years ago
ArcGIS does not allow the user to choose the encoding. For UTF-8 support in shapefiles, several requirements must be met:
1. A registry key with the correct encoding must be set (otherwise, the default system encoding will be used).
2. "When opening a shapefile and dBASE file in ArcGIS Desktop, the Desktop programs look at the Language Driver ID (LDID) in the header of a dBASE file, or an associated *.CPG file, which are both used to define the code page, and to determine the code page of the file that is read".
In my case, the registry key and the .cpg file were created, but the encoding is not recognized until the .dbf file is opened with LibreOffice Calc (the encoding can be chosen upon opening) and saved under the same name. The .dbf saved with Calc differs slightly from the one created by QGIS.
I guess that QGIS sets the encoding in the .dbf header incorrectly, so yes, it IS a QGIS issue.
I have attached "bad" and "good" shapefile samples; maybe you can figure out what is different.
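One way to compare the two samples is to read the LDID byte directly from each .dbf header. The following is a minimal sketch, assuming the common dBASE III layout in which the Language Driver ID is stored at byte offset 29 of the 32-byte file header; the function name `read_ldid` is made up for this illustration and is not part of QGIS or ArcGIS.

```python
LDID_OFFSET = 29   # assumed position of the Language Driver ID in the dBASE header
HEADER_SIZE = 32   # fixed-size part of the dBASE file header


def read_ldid(dbf_path):
    """Return the Language Driver ID byte of a .dbf file as an integer."""
    with open(dbf_path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        raise ValueError(f"{dbf_path}: too short to be a dBASE file")
    return header[LDID_OFFSET]
```

Running `hex(read_ldid("test_utf.dbf"))` and `hex(read_ldid("test_utf_lo.dbf"))` on the attached samples should show whether the two headers declare different code pages.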
#4 Updated by Giovanni Manghi almost 13 years ago
- Target version set to Version 1.7.4
#5 Updated by Paolo Cavallini over 12 years ago
- Target version changed from Version 1.7.4 to Version 1.8.0
- Crashes QGIS or corrupts data set to No
- Affected QGIS version set to master
#6 Updated by Paolo Cavallini about 12 years ago
- Target version changed from Version 1.8.0 to Version 2.0.0
#7 Updated by Borys Jurgiel over 11 years ago
- Resolution set to fixed
- Status changed from Open to Closed
Fixed in c0551a68c250489955c9831f5714f187df087d83
The difference between the files is that test_utf contains LDID=0x57, which various software recognizes either as "current codepage" or as "ISO-8859-1". It seems this value overrides the CPG declaration in ArcGIS. LibreOffice Calc resets the LDID value to 0, so the CPG file can take effect. QGIS will now reset the LDID and create a CPG file when creating a new layer.
For existing layers you can use the Shapefile Encoding Fixer plugin.
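For illustration only, the same repair can be sketched by hand for an existing file: set the LDID byte to 0 and write a matching .cpg next to the .dbf. This is not the code from the commit above, nor the plugin's implementation. It assumes the LDID sits at byte offset 29 of the dBASE header, and the function name `reset_ldid_and_write_cpg` is hypothetical.

```python
from pathlib import Path

LDID_OFFSET = 29  # assumed position of the Language Driver ID in the dBASE header


def reset_ldid_and_write_cpg(dbf_path, encoding="UTF-8"):
    """Zero the LDID byte in place, write a sibling .cpg, return the old LDID."""
    dbf = Path(dbf_path)
    with open(dbf, "r+b") as f:
        f.seek(LDID_OFFSET)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError(f"{dbf}: too short to be a dBASE file")
        f.seek(LDID_OFFSET)
        f.write(b"\x00")  # LDID 0: no code page declared in the header
    # The .cpg file holds only the encoding name, e.g. "UTF-8".
    dbf.with_suffix(".cpg").write_text(encoding, encoding="ascii")
    return old[0]
```

The function modifies the .dbf in place, so it is safer to run it on a copy of the shapefile. It does not re-encode the attribute values; it only changes which code page the file declares.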