Bug report #5911
Language Driver ID in dbf file of new shapefile
|Affected QGIS version:||1.8.0||Regression?:||No|
|Operating System:||Easy fix?:||No|
|Pull Request or Patch supplied:||No||Resolution:|
|Crashes QGIS or corrupts data:||Yes||Copied to github as #:||15355|
A shapefile created with QGIS has the value 0x57 in the LDID field of its dbf file, regardless of which encoding was selected in the dialog. LDID/87 (0x57) means ISO-8859-1, which is the default. See OGR driver: ESRI Shapefile. This issue causes character corruption in the attribute table.
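To check which LDID a given dbf file carries, the header byte can be read directly. The following is only a minimal sketch: it assumes the standard dBASE header layout, in which the language driver ID is the single byte at offset 29 of the 32-byte header, and the function name read_ldid is made up for illustration.

```python
from pathlib import Path

# Assumption: standard dBASE header layout, LDID stored at byte offset 29.
LDID_OFFSET = 29
HEADER_SIZE = 32


def read_ldid(dbf_path):
    """Return the language driver ID byte of a .dbf file as an integer."""
    with Path(dbf_path).open("rb") as handle:
        header = handle.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        raise ValueError("file is too short to contain a dBASE header")
    return header[LDID_OFFSET]
```

For a file affected by this ticket, read_ldid would be expected to return 87 (0x57) whatever encoding was chosen in the dialog.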
In detail, although createEmptyDataSource() receives the encoding as one of its parameters, the encoding is not used when the shapefile is created.
The LDID could be set to the codepage specified by the user, but a generated dataset with zero in the LDID field and an accompanying .cpg file might be easier for users to handle. This point should be discussed.
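The "zero LDID plus .cpg file" variant could look roughly like the sketch below. It is not QGIS or OGR code: it patches an existing dbf file after the fact, assumes the LDID byte sits at offset 29 of the dBASE header, and uses a hypothetical helper name (clear_ldid_and_write_cpg). The .cpg file is written as a plain text file holding only the encoding name.

```python
from pathlib import Path

# Assumption: standard dBASE header layout, LDID stored at byte offset 29.
LDID_OFFSET = 29
HEADER_SIZE = 32


def clear_ldid_and_write_cpg(dbf_path, encoding_name):
    """Zero the LDID byte of a .dbf file and write a sibling .cpg file.

    Returns the path of the .cpg file that was written.
    """
    dbf = Path(dbf_path)
    data = bytearray(dbf.read_bytes())
    if len(data) < HEADER_SIZE:
        raise ValueError("file is too short to contain a dBASE header")
    data[LDID_OFFSET] = 0  # 0 means "no language driver specified"
    dbf.write_bytes(bytes(data))

    # The .cpg sidecar shares the base name and contains the encoding name.
    cpg = dbf.with_suffix(".cpg")
    cpg.write_text(encoding_name, encoding="ascii")
    return cpg
```

A call such as clear_ldid_and_write_cpg("roads.dbf", "UTF-8") would leave roads.dbf with LDID 0 and create roads.cpg next to it; the file name here is only an example.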
#1 Updated by Jürgen Fischer about 7 years ago
Mapping between LDID values and codepages (LDID/87 means ISO-8859-1): http://trac.osgeo.org/gdal/browser/branches/1.9/gdal/ogr/ogrsf_frmts/shape/ogrshapelayer.cpp#L170
#2 Updated by Minoru Akagi almost 7 years ago
I propose three solutions:
1. Create a mapping that converts the MIBenum to the text of the cpg file (or to an LDID value).
QTextCodec::codecForName(encoding_name)->mibEnum() gives MIBenum.
- The encodings in the listbox of QgsEncodingFileDialog are those that QTextCodec supports.
- Mapping between MIBenum and character set name is at http://www.iana.org/assignments/character-sets
- Some encodings supported by QTextCodec are not supported by Shapefile.
- Shapefile supports code page values in http://resources.arcgis.com/fr/content/kbase?fa=articleShow&d=21106
- There are a few plug-ins which use QgsEncodingFileDialog.
- Some other dialogs (such as "Save Vector Layer As" dialog) also have the encoding listbox.
3. Generate a Shapefile dataset that has zero in the LDID field and no cpg file, regardless of the selected encoding. QGIS then opens the dataset with the specified encoding.
I guess number 3 is the easiest one.
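As a rough illustration of solution 1, the mapping could be a lookup that falls back to a cpg file when no LDID is known. The sketch below uses Python's codecs module to normalize encoding names instead of QTextCodec and MIBenum, so it is a stand-in for the idea rather than the proposed implementation. The table LDID_BY_CODEC and the function dbf_encoding_hint are hypothetical, and only the ISO-8859-1 entry is taken from this ticket; a real table would have to be filled from the GDAL mapping linked above.

```python
import codecs

# Hypothetical, deliberately incomplete table keyed by Python's canonical
# codec name. Only this entry (LDID/87 = ISO-8859-1) comes from the ticket.
LDID_BY_CODEC = {
    "iso8859-1": 0x57,
}


def dbf_encoding_hint(encoding_name):
    """Return (ldid, cpg_text) for the encoding chosen by the user.

    If an LDID is known, cpg_text is None. Otherwise the LDID is 0 and the
    encoding name itself is returned as the text for a .cpg file.
    Raises LookupError if Python does not know the encoding name.
    """
    canonical = codecs.lookup(encoding_name).name
    ldid = LDID_BY_CODEC.get(canonical)
    if ldid is not None:
        return ldid, None
    return 0, encoding_name
```

With this table, dbf_encoding_hint("latin1") resolves to the same LDID as "ISO-8859-1", while an encoding without an entry, such as "UTF-8", yields LDID 0 and the cpg text "UTF-8".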
#6 Updated by Marco Lechner almost 7 years ago
- Priority changed from Normal to High
- File encodingtest.zip added
I guess this should be high priority, because all shapefiles that have no LDID set or no cpg file (which surely is most of the shapefiles out there) are forced to latin1. The user's choice of encoding when loading a layer should always override the default. Otherwise users cannot understand why a shapefile's attribute table is always displayed wrongly, whether or not they try to define the encoding. This breaks the behavior of QGIS as users know it.
I have attached some shapefiles and qgs files for testing.
By the way, this depends on GDAL version 1.9.x.
#8 Updated by Minoru Akagi almost 7 years ago
Jef's fix works well for reading shapefiles and creating new shapefiles, so the issue in this ticket has been fixed. However, I've found that an encoding problem still exists in shapefiles generated via the "Save vector layer as" dialog or fTools. The OGR Shapefile driver converts the character encoding from UTF-8 to ISO-8859-1 and, in rare cases, garbles attribute strings.
See also GDAL #4808
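Why a UTF-8 to ISO-8859-1 conversion garbles only some strings can be shown with a small Python sketch. This is not the OGR driver's code; the function to_latin1_lossy is a made-up stand-in that recodes UTF-8 text to ISO-8859-1 and replaces anything that has no Latin-1 equivalent.

```python
def to_latin1_lossy(text):
    """Recode UTF-8 text to ISO-8859-1, replacing unrepresentable characters.

    Illustrative stand-in for a UTF-8 -> ISO-8859-1 conversion step.
    """
    utf8_bytes = text.encode("utf-8")      # what the application hands over
    decoded = utf8_bytes.decode("utf-8")   # the converter reads it as UTF-8
    # Characters outside Latin-1 cannot be stored and become "?".
    return decoded.encode("iso-8859-1", errors="replace")


print(to_latin1_lossy("café"))  # survives: every character exists in Latin-1
print(to_latin1_lossy("東京"))   # garbled: no Latin-1 equivalent exists
```

Text limited to Western European characters passes through unchanged, which may be why the problem shows up only for some datasets.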
#10 Updated by Minoru Akagi over 6 years ago
- Status changed from Closed to Reopened
I've noticed that the garbling occurs when saving a Spatialite/PostGIS layer to a Shapefile. The above fix means that if the option "Ignore shapefile encoding" is checked, the OGR Shapefile driver's encoding conversion is disabled when an OGR layer is loaded. Saving after editing a newly created layer causes no problem, because during layer creation QGIS generates an empty layer and then loads it. However, in the particular case I encountered, if no OGR layer has been loaded, the output is garbled.