Bug report #5911

Language Driver ID in dbf file of new shapefile

Added by Minoru Akagi over 12 years ago. Updated over 11 years ago.

Status: Closed
Priority: High
Assignee: -
Category: Data Provider
Affected QGIS version: 1.8.0
Regression?: No
Operating System:
Easy fix?: No
Pull Request or Patch supplied: No
Resolution:
Crashes QGIS or corrupts data: Yes
Copied to github as #: 15355

Description

A shapefile created with QGIS has the value 0x57 in the LDID (Language Driver ID) field of its .dbf file, regardless of which encoding was selected in the dialog. LDID/87 (0x57) means ISO-8859-1, which is the default. See OGR driver: ESRI Shapefile. This issue causes character corruption in the attribute table.
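To check which LDID a given file carries, the byte can be read directly from the .dbf header. The following is a minimal sketch, assuming the standard dBASE header layout in which the language driver ID is the single byte at offset 29; the function name `read_ldid` is hypothetical and not part of QGIS or OGR.

```python
def read_ldid(dbf_path):
    """Return the Language Driver ID byte from a .dbf header.

    Assumes the standard dBASE layout: a 32-byte file header with
    the LDID stored at byte offset 29.
    """
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file too short to contain a dBASE header")
    return header[29]
```

Calling `hex(read_ldid("new_layer.dbf"))` on a file affected by this issue would be expected to show the 0x57 value described above (the file name is only an example).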

In detail: although createEmptyDataSource() receives the encoding as one of its parameters, it does not use it when creating the shapefile.

The LDID could be set to the codepage specified by the user, but a generated dataset that has zero in the LDID field and includes a .cpg file might be easier for users to handle. This point should be discussed.
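The alternative mentioned here (zero in the LDID field plus a .cpg file) can be sketched as a post-processing step on an existing .dbf file. This is only an illustration and not the QGIS or OGR implementation: the function name `mark_encoding_with_cpg` is hypothetical, the LDID offset of 29 assumes the standard dBASE header, and the exact text a particular reader accepts in a .cpg file is an assumption.

```python
import os

LDID_OFFSET = 29  # assumed position of the LDID byte in the dBASE header


def mark_encoding_with_cpg(dbf_path, cpg_text):
    """Zero the LDID byte and write a .cpg sidecar naming the encoding.

    cpg_text is written as-is; which strings a given reader
    understands is an assumption of this sketch.
    """
    # Overwrite the single LDID byte in place with zero.
    with open(dbf_path, "r+b") as f:
        f.seek(LDID_OFFSET)
        f.write(b"\x00")
    # Write the sidecar next to the .dbf, same base name.
    cpg_path = os.path.splitext(dbf_path)[0] + ".cpg"
    with open(cpg_path, "w", encoding="ascii") as f:
        f.write(cpg_text)
    return cpg_path
```

For example, `mark_encoding_with_cpg("new_layer.dbf", "UTF-8")` would leave the header without a language driver and record the encoding in `new_layer.cpg`.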

Best regards.

qgsogrprovider3.patch - a patch for solution number 3 (1.02 KB) Minoru Akagi, 2012-08-01 11:04 PM

encodingtest.zip (6.16 KB) Marco Lechner, 2012-08-16 07:39 AM

japan_poly.zip - test data (Japanese main islands). (30.5 KB) Minoru Akagi, 2013-04-12 04:19 AM

shp-encoding-problem-cp1250.zip (238 KB) Ivan Mincik, 2013-04-18 02:02 AM


Related issues

Related to QGIS Application - Bug report #5900: QGIS 1.8.0 windows standalone ships with GDAL version tha... Rejected 2012-06-29
Related to QGIS Application - Bug report #5255: Wrong codepage of shapefile Closed 2012-03-29
Related to QGIS Application - Bug report #4343: Shapefile, created in Qgis, encoding not recognized by Es... Closed 2011-10-03
Related to QGIS Application - Bug report #5622: layer properties, general, provider-specific options, enc... Closed 2012-05-20
Related to QGIS Application - Bug report #5508: DBF encoding and cyrillic values Closed 2012-04-26
Related to QGIS Application - Bug report #5340: QGIS loses non-latin letters in new shapefiles Closed 2012-04-11
Related to QGIS Application - Bug report #5927: ESRI shapefile encoding problem Closed 2012-07-02
Related to QGIS Application - Bug report #5982: vector layer encoding default not saved not configurable Closed 2012-07-09
Related to QGIS Application - Bug report #6057: QGIS 1.8 Encoding problem with Bulgarian characters CP1251 Closed 2012-07-17
Related to QGIS Application - Bug report #13203: When opening Shapefile the .cpg file is ignored in Window... Closed 2015-08-10
Duplicated by QGIS Application - Bug report #6500: Language Encoding very broken in 1.8 Lisboa Closed 2012-10-11

Associated revisions

Revision 75dc85b4
Added by Jürgen Fischer over 12 years ago

allow to ignore (OGR's interpretation of ) shape file encoding (might fix #5911)

Revision 7fb46498
Added by Jürgen Fischer almost 12 years ago

also optionally apply SHAPE_ENCODING to layer creation (fixes #5911)

History

#1 Updated by Jürgen Fischer over 12 years ago

Mapping between LDID values and codepages (LDID/87 means ISO-8859-1): http://trac.osgeo.org/gdal/browser/branches/1.9/gdal/ogr/ogrsf_frmts/shape/ogrshapelayer.cpp#L170

#2 Updated by Minoru Akagi over 12 years ago

I propose three solutions:
1. Create a mapping that converts the MIBenum to the text of the .cpg file (or to an LDID value). QTextCodec::codecForName(encoding_name)->mibEnum() gives the MIBenum.

  • The encodings in the listbox of QgsEncodingFileDialog are those that QTextCodec supports.
  • The mapping between MIBenum and character set name is at http://www.iana.org/assignments/character-sets
  • Some encodings supported by QTextCodec are not supported by Shapefile.
2. Restrict the encodings in the listbox of QgsEncodingFileDialog to those that Shapefile supports.

3. Generate a Shapefile dataset that has zero in the LDID field and no .cpg file, regardless of the selected encoding. QGIS then opens the dataset with the specified encoding.

I guess number 3 is the easiest one.
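For comparison, the mapping described in solution 1 could look roughly like the sketch below. The MIBenum values are the ones registered by IANA for these character sets; the table name `MIB_TO_CPG`, the function name `cpg_text_for_mib`, and the strings chosen as .cpg text are assumptions for illustration only. An encoding without an entry returns `None`, which corresponds to the case where QTextCodec supports an encoding that Shapefile does not.

```python
# MIBenum (IANA) -> text to write into the .cpg file.
# The right-hand strings are illustrative assumptions, not a verified
# list of what Shapefile readers accept.
MIB_TO_CPG = {
    4: "ISO-8859-1",
    17: "Shift_JIS",
    106: "UTF-8",
    2250: "windows-1250",
    2251: "windows-1251",
    2252: "windows-1252",
}


def cpg_text_for_mib(mib_enum):
    """Return the .cpg text for a MIBenum, or None if it is unmapped."""
    return MIB_TO_CPG.get(mib_enum)
```

In QGIS itself the MIBenum would come from QTextCodec as described in solution 1; here it is passed in as a plain integer.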

#3 Updated by Minoru Akagi over 12 years ago

I attach a patch for solution number 3.

#5 Updated by Minoru Akagi over 12 years ago

If the default LDID used by the OGR Shapefile driver on dataset creation were changed to zero, the encoding problem of shapefiles generated via the "Save vector layer as" dialog would be solved as well, as would that of shapefiles generated by some plug-ins (e.g. fTools).

#6 Updated by Marco Lechner over 12 years ago

I guess this should be priority high, because all shapefiles that have neither the LDID set nor a .cpg file (which are surely most of the shapefiles out there) are forced to latin1. The user's choice of encoding when loading a layer should always override the default. Otherwise it cannot be understood why a shapefile's attribute table is always displayed wrong, whether or not the user tries to define the encoding. This breaks the behavior of QGIS as known by the user.
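The effect described here can be reproduced outside QGIS with plain byte decoding. The sketch below is not QGIS code: the function `decode_attribute` and its parameters are hypothetical, and it only illustrates that bytes written in one codepage are garbled when latin1 is forced, and readable when the user's choice takes precedence.

```python
def decode_attribute(raw, file_encoding="latin-1", user_encoding=None):
    """Decode raw .dbf field bytes.

    If the user selected an encoding, it wins over the file's default.
    """
    encoding = user_encoding or file_encoding
    return raw.decode(encoding)


# Attribute value as it would be stored by a cp932 (Japanese) writer.
raw = "東京".encode("cp932")

garbled = decode_attribute(raw)                         # latin1 forced
correct = decode_attribute(raw, user_encoding="cp932")  # user override
```

With latin1 forced, every byte decodes to some character, so no error is raised and the attribute table silently shows the wrong text; only the override recovers the original string.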

I am attaching some shapefiles and .qgs files for testing.

By the way, this depends on GDAL version 1.9.x.

#7 Updated by Jürgen Fischer over 12 years ago

  • Status changed from Open to Closed

#8 Updated by Minoru Akagi about 12 years ago

Jef's fix works well for reading shapefiles and creating new shapefiles, so the issue of this ticket has been fixed. However, I've found that an encoding problem still exists for shapefiles generated via the "Save vector layer as" dialog or fTools. The OGR Shapefile driver converts the character encoding from UTF-8 to ISO-8859-1 and in rare cases garbles attribute strings.

See also GDAL #4808

#9 Updated by Minoru Akagi about 12 years ago

Sorry,

Testing it again today, I no longer see any character corruption in shapefiles generated via either "Save vector layer as" or fTools. Maybe I had forgotten to check the option. The fix is very nice!

#10 Updated by Minoru Akagi almost 12 years ago

  • Status changed from Closed to Reopened

I've noticed that the garbling occurs when saving a Spatialite/PostGIS layer to Shapefile. The above fix means that if the option "Ignore shapefile encoding" is checked, the OGR Shapefile driver's encoding conversion is disabled when an OGR layer is loaded. Saving after editing a newly created layer causes no problem, because during layer creation QGIS generates an empty layer and then loads it. However, in the particular case I encountered, where no OGR layer has been loaded, the output is garbled.

#11 Updated by Jürgen Fischer almost 12 years ago

  • Status changed from Reopened to Closed

#12 Updated by Minoru Akagi almost 12 years ago

Thank you very much.

#13 Updated by Minoru Akagi over 11 years ago

#14 Updated by Ivan Mincik over 11 years ago

I am attaching a test Shapefile in cp1250 and the same in utf-8 for comparison. Both were made by QGIS 1.8 compiled with GDAL 1.7 (in Debian Squeeze).

#16 Updated by Minoru Akagi over 11 years ago

In master, when a shapefile is created, the LDID is set to zero and a .cpg file is added, except when the encoding is "System". Thank you Borys!

#17 Updated by Jürgen Fischer over 6 years ago

  • Related to Bug report #13203: When opening Shapefile the .cpg file is ignored in Windows 8.1 added
