Bug report #5255

Wrong codepage of shapefile

Added by Stanislaw Kapustka over 7 years ago. Updated about 7 years ago.

Status:Closed
Priority:Normal
Assignee:-
Category:-
Affected QGIS version:1.7.4
Regression?:No
Operating System:
Easy fix?:No
Pull Request or Patch supplied:No
Resolution:upstream
Crashes QGIS or corrupts data:No
Copied to github as #:14989

Description

When opening shapefiles, it doesn't matter which codepage you choose: it is always UTF-8 in QGIS 1.7.4, so Polish letters are displayed incorrectly (when the shapefile was saved in a codepage other than UTF-8, of course). Other encodings are in the list, but they do not work. In QGIS 1.7.3 it works perfectly. The same problem occurs in the master version.

chinese.zip (555 Bytes) Even Rouault, 2012-06-10 09:06 AM


Related issues

Related to QGIS Application - Bug report #5340: QGIS loses non-latin letters in new shapefiles Closed 2012-04-11
Related to QGIS Application - Bug report #5900: QGIS 1.8.0 windows standalone ships with GDAL version tha... Rejected 2012-06-29
Related to QGIS Application - Bug report #5911: Language Driver ID in dbf file of new shapefile Closed 2012-06-30
Related to QGIS Application - Bug report #13203: When opening Shapefile the .cpg file is ignored in Window... Closed 2015-08-10

History

#1 Updated by Alexander Bruy over 7 years ago

This is because 1.7.4 and master are now compiled against GDAL 1.9.0.

#2 Updated by zirneklitis - over 7 years ago

When the *.dbf file is re-saved with OpenOffice Calc, QGIS shows the correct characters with any given code page, until any edits are saved within QGIS: question marks are then saved in place of any non-Latin characters. It's impossible to switch the code page for any shape files created by QGIS.

#3 Updated by Giovanni Manghi over 7 years ago

zirneklitis - wrote:

It's impossible to switch the code page for any shape files created by QGIS.

It is not QGIS's fault, it is GDAL's. See:

http://ssrebelious.wordpress.com/2012/03/11/qgis-and-gdal1-9-encoding-issue-a-workaround/

That is why 1.7.3 works: it is compiled against an older release of GDAL.

#4 Updated by Alexander Bruy over 7 years ago

The bug has already been fixed in GDAL, see http://trac.osgeo.org/gdal/ticket/4650

#5 Updated by Giovanni Manghi over 7 years ago

  • Resolution set to upstream
  • Status changed from Open to Closed

#6 Updated by zirneklitis - over 7 years ago

Recompiled GDAL and QGIS:

QGIS version: 1.8.0-Lisboa, QGIS code revision: a1255fc, Compiled against GDAL/OGR: 2.0dev, Running against GDAL/OGR: 2.0dev.

Nothing has changed. The problem still remains.

OS: Fedora 14 x64.

#7 Updated by Alexander Bruy over 7 years ago

  • Status changed from Closed to Reopened

You can try the custom QGIS build from NextGIS (http://nextgis.ru/en/nextgis-qgis/), where this issue is solved.

#8 Updated by Giovanni Manghi over 7 years ago

  • Status changed from Reopened to Closed

zirneklitis - wrote:

Recompiled GDAL and QGIS:

QGIS version: 1.8.0-Lisboa, QGIS code revision: a1255fc, Compiled against GDAL/OGR: 2.0dev, Running against GDAL/OGR: 2.0dev.

Nothing has changed. The problem still remains.

OS: Fedora 14 x64.

Still a GDAL issue, not a QGIS one.

#9 Updated by zirneklitis - over 7 years ago

I insist that this is a QGIS issue.

GDAL 1.9.0 (and newer) tries to determine the encoding from the shapefile itself. When creating a new shapefile, “ENCODING” should be passed as a layer creation option, which obviously is not done.

Launching QGIS from a terminal makes it possible to track down warning messages. Saving non-Latin characters in a shapefile generates the following warning message: “Warning 1: One or several characters couldn't be converted correctly from UTF-8 to ISO-8859-1.
This warning will not be emitted anymore”.
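The question marks mentioned earlier are what a lossy conversion to ISO-8859-1 produces: characters that have no code point in that code page cannot be stored and are replaced. A minimal Python sketch of the effect using only the standard codecs; the helper name and the Latvian sample string are illustrative assumptions, not taken from QGIS or GDAL:

```python
def save_as_latin1(text: str) -> bytes:
    # Encode to ISO-8859-1; characters with no Latin-1 code point become b"?"
    return text.encode("iso-8859-1", errors="replace")


sample = "Rīga, Jūrmala"  # ī and ū do not exist in ISO-8859-1
stored = save_as_latin1(sample)
print(stored)  # b'R?ga, J?rmala'
```

Once the "?" bytes are written to the .dbf, the original characters cannot be recovered by choosing a different code page afterwards.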

On the other hand, most shapefiles in use have no character encoding byte, so QGIS has to rely on the environment variable “SHAPE_ENCODING”. At present the only solution is to use the same character encoding for the whole QGIS session, e.g.:

SHAPE_ENCODING=UTF-8
export SHAPE_ENCODING
qgis

The example above allows creating and editing shapefiles with UTF-8 as the character encoding (for Linux users; Windows users must use “SET SHAPE_ENCODING=UTF-8”).
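The same per-session setting can also be applied without touching the user's shell environment, by passing a modified environment only to the QGIS child process. A minimal Python sketch, assuming a `qgis` executable is on the PATH; `qgis_env` and `launch_qgis` are hypothetical helpers, not part of QGIS:

```python
import os
import subprocess


def qgis_env(encoding: str = "UTF-8") -> dict:
    # Copy the current environment and add SHAPE_ENCODING for the child only
    env = dict(os.environ)
    env["SHAPE_ENCODING"] = encoding
    return env


def launch_qgis(encoding: str = "UTF-8") -> subprocess.Popen:
    # GDAL inside the QGIS process falls back to environment variables
    # for config options such as SHAPE_ENCODING
    return subprocess.Popen(["qgis"], env=qgis_env(encoding))
```

The parent process environment is left unchanged, so different encodings can be used for different QGIS sessions started from the same script.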

------------------------------------------------
Excerpt from

http://trac.osgeo.org/gdal/wiki/ConfigOptions

In C/C++ configuration switches can be set programmatically like this:

#include "cpl_conv.h"
...
CPLSetConfigOption( "GDAL_CACHEMAX", "64" );

Normally a configuration option applies to all threads active in a program, but they can be limited to only the current thread this way:

CPLSetThreadLocalConfigOption( "GDAL_CACHEMAX", "64" );

#10 Updated by zirneklitis - over 7 years ago

The Linux example above should be as follows:

$ SHAPE_ENCODING=UTF-8
$ export SHAPE_ENCODING
$ qgis

#11 Updated by Alexander Bruy over 7 years ago

zirneklitis - wrote:

I insist that this is a QGIS issue.

This is a GDAL issue. GDAL always reports that the attributes it returns are UTF-8, even when they have a different encoding. The SHAPE_ENCODING environment variable doesn't work in most cases. This bug was partially fixed (see http://trac.osgeo.org/gdal/ticket/4650), but more fixes are needed.

#12 Updated by Jürgen Fischer over 7 years ago

Alexander Bruy wrote:

You can try the custom QGIS build from NextGIS (http://nextgis.ru/en/nextgis-qgis/), where this issue is solved.

how?

#13 Updated by Alexander Bruy over 7 years ago

Jürgen Fischer wrote:

how?

This is only a workaround, not a real fix. We simply reverted some parts of 2d0edcd7a2 (related to OLCStringsAsUTF8). With GDAL 2.0 everything works fine in most cases without this workaround, and we are working on a final fix for GDAL.
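The difference between trusting the driver's UTF-8 claim and honouring the encoding chosen in the layer dialog can be illustrated with a small Python sketch. `decode_attribute` and its parameters are hypothetical, not QGIS or GDAL API: `strings_as_utf8` merely stands in for the OLCStringsAsUTF8 capability, and the Polish sample assumes a CP1250-encoded .dbf:

```python
def decode_attribute(raw: bytes, strings_as_utf8: bool, user_encoding: str) -> str:
    # Provider claims UTF-8: the encoding chosen by the user is ignored
    if strings_as_utf8:
        return raw.decode("utf-8", errors="replace")
    # Workaround behaviour: decode with the encoding selected by the user
    return raw.decode(user_encoding, errors="replace")


raw = "Łódź, Gdańsk".encode("cp1250")  # bytes as stored in a CP1250 .dbf
print(decode_attribute(raw, True, "cp1250"))   # contains U+FFFD replacement characters
print(decode_attribute(raw, False, "cp1250"))  # Łódź, Gdańsk
```

When the UTF-8 claim is wrong, the first branch can only produce replacement characters, which matches the symptom described in this report.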

#14 Updated by Even Rouault over 7 years ago

Note that I've just pushed additional fixes to GDAL (see http://trac.osgeo.org/gdal/ticket/4650) that should make OLCStringsAsUTF8 more reliable.

#15 Updated by Tim Sutton over 7 years ago

Hi

Could you please provide a Free, minimal test dataset so that we can add a test to our test suite, along with an idea of how we can evaluate whether the test passes?

#16 Updated by Even Rouault over 7 years ago

I'm attaching a small shapefile generated by the following OGR Python script (it needs the latest GDAL trunk to support recoding of the field name from UTF-8 to CP936; reading should be OK with GDAL 1.9):

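from osgeo import ogr
import struct

# ENCODING=LDID/77 writes language driver ID 77 (CP936) into the .dbf header
ds = ogr.GetDriverByName('ESRI Shapefile').CreateDataSource('chinese.dbf')
lyr = ds.CreateLayer('chinese', options=['ENCODING=LDID/77'])
# Six UTF-8 bytes encoding two Chinese characters, decoded to a text string
chinese_str = struct.pack('B' * 6, 229, 144, 141, 231, 167, 176).decode('utf-8')
# Use the same string as field name and as field value
lyr.CreateField(ogr.FieldDefn(chinese_str, ogr.OFTString))
feat = ogr.Feature(lyr.GetLayerDefn())
feat.SetField(0, chinese_str)
lyr.CreateFeature(feat)
ds = None  # dereferencing the data source closes it and flushes the files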
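Regarding how such a test could be evaluated as passing: one option is to inspect the generated .dbf directly. In the dBase header, byte 29 holds the language driver ID (LDID), and field descriptors start at byte 32, each beginning with an 11-byte, NUL-padded field name. A minimal standard-library Python sketch under those assumptions; `check_dbf_header` and `check_chinese_dbf` are hypothetical helpers, and the expected values (LDID 77, field name recoded to CP936) follow from the script above:

```python
# Same two characters as in the OGR script, decoded from their UTF-8 bytes
EXPECTED_NAME = bytes([229, 144, 141, 231, 167, 176]).decode("utf-8")


def check_dbf_header(data: bytes) -> bool:
    # Need the 32-byte file header plus the first 32-byte field descriptor
    if len(data) < 64:
        return False
    ldid = data[29]  # language driver ID byte of the dBase header
    raw_name = data[32:43].split(b"\x00", 1)[0]  # 11-byte NUL-padded field name
    try:
        name = raw_name.decode("cp936")
    except UnicodeDecodeError:
        return False
    return ldid == 77 and name == EXPECTED_NAME


def check_chinese_dbf(path: str) -> bool:
    with open(path, "rb") as f:
        return check_dbf_header(f.read(64))
```

With a GDAL build that does not recode field names, the name bytes stay UTF-8 and the comparison fails, which is the regression such a test would catch.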

#17 Updated by zirneklitis - over 7 years ago

Who should create the .cpg files, GDAL or QGIS? A shapefile with a *.cpg* file present works as expected (partly: QGIS is not aware that this file exists). The attribute values are no longer corrupted. More about *.cpg files:

http://support.esri.com/en/knowledgebase/techarticles/detail/21106
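A .cpg file is a small plain-text sidecar next to the .shp/.dbf pair that names the code page (for example `UTF-8`). A minimal Python sketch of how a reader might look it up before falling back to a default; `normalize_cpg`, `read_cpg` and the mapping of bare numbers to Python `cp<N>` codec names are assumptions for illustration, not GDAL's or QGIS's actual logic:

```python
import os
from typing import Optional


def normalize_cpg(text: str) -> Optional[str]:
    # A .cpg holds a single code page name, e.g. "UTF-8" or "1250"
    value = text.strip()
    if not value:
        return None
    # Assumption: bare numbers are mapped to Python's "cp<N>" codec names
    return "cp" + value if value.isdigit() else value


def read_cpg(shp_path: str) -> Optional[str]:
    # Look for <basename>.cpg next to the .shp/.dbf pair
    cpg_path = os.path.splitext(shp_path)[0] + ".cpg"
    if not os.path.exists(cpg_path):
        return None
    with open(cpg_path, "r", encoding="ascii", errors="ignore") as f:
        return normalize_cpg(f.read())
```

Returning `None` when no .cpg exists leaves the caller free to fall back to the LDID byte or to the encoding chosen by the user.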

#18 Updated by Minoru Akagi about 7 years ago

I installed GDAL 1.9.1 by using OSGeo4W.

When I convert a shapefile whose dbf file has the value "19" (meaning "CP932") in the LDID field to KML format with ogr2ogr, the following message is shown:

Warning1: Recode from CP932 to UTF-8 not supported, treated as ISO8859-1 to UTF-8

The Japanese characters in the generated KML file are incorrect. This also results in character corruption in QGIS.
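The warning means GDAL could not convert from CP932 and fell back to treating the bytes as ISO-8859-1. Decoding CP932 bytes that way never fails, since every byte maps to some Latin-1 character, but the result is unreadable. A small Python illustration using the standard codecs; the sample word is an arbitrary example:

```python
sample = "東京"  # arbitrary Japanese sample text
raw = sample.encode("cp932")  # bytes as stored in a .dbf with LDID 19

wrong = raw.decode("iso-8859-1")  # the fallback named in the warning
right = raw.decode("cp932")       # what a correct CP932 to UTF-8 recoding yields

print(repr(wrong))
print(right)  # 東京
```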

I think recoding with the iconv library is currently not enabled in this GDAL build.
For testing, I built GDAL 1.9.1 with the HAVE_ICONV macro defined and linked against the iconv library.
With my build of ogr2ogr, the warning does not appear and a KML file with readable Japanese characters is generated.

As a Japanese user of this great software, I would like QGIS to use a GDAL build linked with the iconv library.

#19 Updated by Minoru Akagi about 7 years ago

I've also reported this recoding issue to OSGeo4W Trac.
http://trac.osgeo.org/osgeo4w/ticket/294

#20 Updated by Minoru Akagi about 7 years ago

Sorry, I noticed that the problem I had has already been solved in the latest GDAL trunk. There is no problem converting CP932 to UTF-8.

#21 Updated by Jürgen Fischer about 1 year ago

  • Related to Bug report #13203: When opening Shapefile the .cpg file is ignored in Windows 8.1 added
