Bug report #4343
Shapefile, created in Qgis, encoding not recognized by Esri ArcGIS 10
|Affected QGIS version:||master||Regression?:||No|
|Operating System:||Easy fix?:||No|
|Pull Request or Patch supplied:||No||Resolution:||fixed|
|Crashes QGIS or corrupts data:||No||Copied to github as #:||14280|
I have created a shapefile with UTF-8 attribute values. In QuantumGIS 1.7.1 they are displayed correctly, but in Esri ArcGIS 10 the encoding is lost.
The registry was modified according to http://support.esri.com/en/knowledgebase/techarticles/detail/21106 and a correct .cpg file was created, but the UTF-8 encoding is still not recognized.
Temporary workaround: open the .dbf with LibreOffice Calc (select "Unicode (UTF-8)" as the encoding) and save it. Then create a .cpg file containing the correct encoding name ("UTF-8"). After these steps, the shapefile can be opened correctly in ArcGIS.
Attached are two sample shapefiles:
- test_utf - original QuantumGIS shapefile with manually created .cpg
- test_utf_lo - shapefile .dbf was modified with LibreOffice Calc
There is also a screenshot showing a sample of the actual and the detected encoding.
#3 Updated by vvj - almost 8 years ago
ArcGIS does not allow choosing the encoding. For UTF-8 support in shapefiles, several requirements must be met:
1. Registry key with correct encoding must be set (otherwise, default system encoding will be used).
2. "When opening a shapefile and dBASE file in ArcGIS Desktop, the Desktop programs look at the Language Driver ID (LDID) in the header of a dBASE file, or an associated *.CPG file, which are both used to define the code page, and to determine the code page of the file that is read".
In my case, the registry key and the .cpg file have been created, but the encoding is not recognized until the .dbf file is opened with LibreOffice Calc (the encoding can be chosen upon opening) and saved under the same name. The .dbf saved by Calc is slightly different from the one created by QGIS.
I guess that QGIS sets the encoding in the .dbf header incorrectly, and yes, it IS a QGIS issue.
I have attached "bad" and "good" shapefile samples; maybe you can figure out what is different.
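To compare a "bad" and a "good" sample, it may help to look at the two places ArcGIS is quoted as consulting: the LDID byte in the .dbf header and the sibling .cpg file. The following is only a minimal sketch, assuming the standard dBASE layout in which the language driver ID sits at byte offset 29 of the 32-byte file header; the function names `read_ldid` and `read_cpg` are made up for illustration and are not part of QGIS or ArcGIS.

```python
from pathlib import Path

# Assumption: standard dBASE header layout, LDID stored at byte offset 29.
LDID_OFFSET = 29


def read_ldid(dbf_path):
    """Return the language driver ID byte of a .dbf file as an int."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to contain a dBASE header")
    return header[LDID_OFFSET]


def read_cpg(dbf_path):
    """Return the text of the sibling .cpg file, or None if there is none."""
    cpg_path = Path(dbf_path).with_suffix(".cpg")
    if not cpg_path.exists():
        return None
    return cpg_path.read_text(encoding="ascii").strip()
```

Calling `hex(read_ldid("test_utf.dbf"))` and `read_cpg("test_utf.dbf")` on each attached sample would show whether the two files differ in the header byte, in the .cpg content, or in both.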
#7 Updated by Borys Jurgiel over 6 years ago
- Resolution set to fixed
- Status changed from Open to Closed
The difference between the files is that test_utf contains LDID=0x57, which is recognized by various software either as "current codepage" or "ISO-8859-1". It seems this value overrides the CPG declaration for ArcGIS. LibreOffice Calc resets the LDID value to 0, so the CPG file can take effect. Now QGIS will reset the LDID and create a CPG file when creating a new layer.
For existing layers you can use the Shapefile Encoding Fixer plugin.
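For illustration, the same effect (LDID reset to 0 plus a .cpg file naming the encoding) can be sketched by hand in a few lines. This is not the code used by QGIS or by the plugin; it is a minimal sketch that assumes the standard dBASE header layout with the LDID at byte offset 29, and the function name `reset_ldid_and_write_cpg` is hypothetical. It modifies the file in place, so it should only be tried on a copy.

```python
from pathlib import Path

# Assumption: standard dBASE header layout, LDID stored at byte offset 29.
LDID_OFFSET = 29


def reset_ldid_and_write_cpg(dbf_path, encoding_name="UTF-8"):
    """Set the LDID byte to 0 and write a sibling .cpg file.

    Returns the previous LDID value so the change can be reported.
    """
    dbf_path = Path(dbf_path)
    with open(dbf_path, "r+b") as f:
        f.seek(LDID_OFFSET)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError("file is too short to contain a dBASE header")
        # Overwrite only the single LDID byte; the records are left untouched.
        f.seek(LDID_OFFSET)
        f.write(b"\x00")
    # The .cpg file holds just the encoding name as plain text.
    dbf_path.with_suffix(".cpg").write_text(encoding_name, encoding="ascii")
    return old[0]
```

On a copy of the test_utf sample, `reset_ldid_and_write_cpg("test_utf.dbf")` would be expected to return the old value (0x57 according to the comment above) and leave a test_utf.cpg containing "UTF-8" next to it.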