Bug report #13203

When opening Shapefile the .cpg file is ignored in Windows 8.1

Added by Adrian Klink about 4 years ago. Updated 6 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Vectors
Affected QGIS version: 3.2.2
Regression?: No
Operating System: Windows
Easy fix?: No
Pull Request or Patch supplied: No
Resolution: no timely feedback
Crashes QGIS or corrupts data: No
Copied to github as #: 21264

Description

When opening a shapefile, the .cpg file is ignored and a default encoding is used instead (ISO8859-1 in my case when using drag-and-drop, UTF-8 when using the file open dialog).

I saved a shapefile (point, line, or polygon) with UTF-8 encoding (a .cpg file containing UTF-8 was created). When opening it via drag-and-drop, the .cpg file is ignored and the file is opened with the wrong encoding (ISO8859-1 instead of UTF-8), resulting in broken characters. When opening it via Add Vector Layer (Ctrl + Shift + V) using the open file dialog, UTF-8 is used as the default (which can be changed), but the .cpg file is ignored as well. I have to pick the proper encoding manually if the shapefile has an encoding other than UTF-8.

tested using:
Windows 8.1 64bit
+ QGIS 2.8.3-Wien (64bit)
+ QGIS 2.10.1-Pisa (32bit)

Windows 7 64bit (different machine)
+ QGIS 2.10.1-Pisa (64bit)

QGIS_2_10_Pisa_Layer_settings_contrast_stretch.png - Quantum GIS Option Ignore Shapefile Encoding (100 KB) Adrian Klink, 2015-08-11 08:14 AM

QGIS_2_10_Pisa_Ignore_Shapefile_Encoding.png - Quantum GIS Option Ignore Shapefile Encoding (58 KB) Adrian Klink, 2015-08-11 08:24 AM

windows 10 qgis encoding hell.zip - example shapefiles (5.96 KB) Johannes Kroeger, 2018-07-13 02:11 PM


Related issues

Related to QGIS Application - Feature request #18782: charset dialog window Open 2018-04-21
Related to QGIS Application - Bug report #5255: Wrong codepage of shapefile Closed 2012-03-29
Related to QGIS Application - Bug report #5911: Language Driver ID in dbf file of new shapefile Closed 2012-06-30
Related to QGIS Application - Feature request #21313: Consideration of CPG files for encoding shape files Open 2019-02-19

History

#1 Updated by Adrian Klink about 4 years ago

I have found the reason for my "bug" which seems to be a "feature". It is due to default Setting Ignore Shapefile Encoding in Quantum GIS Options that seems to have been introduced here:

#5911

[...]

Fixed in changeset https://issues.qgis.org/projects/quantum-gis/repository/revisions/75dc85b4d652116814873bb7674cab15ce6cde66

[...]

I've noticed that the garbling occurs when saving Spatialite/PostGIS layer to Shapefile. The above fix means that if the option "Ignore shapefile encoding" is checked, OGR Shapefile's encoding conversion will be disabled when a OGR layer is loaded.

[...]

I fixed my problem by disabling Ignore Shapefile Encoding in Options (see screenshot attached).

However, I find it irritating or confusing that this Option is enabled by default (I would have expected this Option to be disabled, in fact I didn't even know about this option before reading old bug reports and forum entries).

Why isn't it at least mentioned in QGIS documentation? e.g. here:

http://docs.qgis.org/2.8/de/docs/user_manual/working_with_vector/supported_data.html#esri-shapefiles

#2 Updated by Adrian Klink about 4 years ago

Wrong screenshot... replaced.

#3 Updated by Adrian Klink about 4 years ago

I understand that Ignore Shapefile Encoding has been set to true for newly created vector layers:

75dc85b

But why has it been set to true in general since QGIS 2.0?

ddb5117

This default true option leads to corrupted encoding if doing the following:

  1. Creating new vector layer, adding objects (with special characters in the attribute) and saving as Shapefile (UTF-8 is used as default, .cpg is automatically created)
  2. Later: Opening created shapefile using drag-and-drop (Encoding is then ignored upon loading since Ignore Shapefile Encoding is true by default)
  3. Selecting objects from vector layer and saving it to new shapefile (only selected objects), again using UTF-8 (as default encoding)

Surprise: special chars in file are now corrupted!
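The round trip in the three steps above can be reproduced outside QGIS. This is a minimal sketch in plain Python, not QGIS or OGR code. The function name `resave_after_misread` is made up for illustration, and it assumes the file is misread as ISO-8859-1, as in the report.

```python
def resave_after_misread(raw: bytes, wrong_encoding: str = "iso-8859-1") -> bytes:
    """Decode UTF-8 bytes with the wrong encoding, then write them as UTF-8 again."""
    misread = raw.decode(wrong_encoding)  # step 2: the declared encoding is ignored on load
    return misread.encode("utf-8")        # step 3: the layer is saved again as UTF-8


original = "äöü"
on_disk = original.encode("utf-8")        # step 1: the file is written as UTF-8
resaved = resave_after_misread(on_disk)

# The new file is valid UTF-8, but it now stores the misread characters.
print(resaved.decode("utf-8"))

# The damage can be undone only if the wrongly applied encoding is known.
repaired = resaved.decode("utf-8").encode("iso-8859-1").decode("utf-8")
print(repaired)
```

Each of the six original bytes is stored as a two-byte character after the second save, so the new file no longer contains the original text even though it is declared and encoded as UTF-8.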

Same happens when saving file to kml (always using UTF-8!).

In short: IMHO the default option is not compatible with drag-and-drop usage.

Please explain and/or add comments to the documentation, especially in the shapefile part mentioned above. I know there is a really short description in the options part, but this doesn't really help if someone does not expect such behaviour. It still seems to be a bug to me (according to what a user expects), although I am pretty sure this option was not set to true by accident. But I do not understand why it was. In QGIS 1.9 this option was originally set to false, which makes more sense to me.

#4 Updated by Adrian Klink about 4 years ago

Well, after investigating the issue further, I found the following discussion, going back to QGIS 1.8, about why this was changed: #6500

What would be the downside of setting QGIS to ignore GDAL's shapefile encoding by default?

Probably nothing. I also think it is preferable to disable encoding conversion of Shapefile layer in the present situation, for example, the handling of LDID/87 and OGR interface about encoding. All the non-ASCII characters using users will need the option, so it should be checked by default.

My question: Is this still true for current QGIS versions? As I mentioned above, there are downsides...

All the non-ASCII characters using users will need the option, so it should be checked by default.

All users using non-ASCII characters currently need to ignore GDAL shapefile encoding feature, and all users using ASCII characters will not be affected by turning this option on by default.

That's a win win solution :)

Again: IMHO it's no longer a win-win situation, since it interferes with drag-and-drop usage in QGIS 2.x.

Could anyone who took part in this discussion 3 years ago please check whether this default still makes sense? I could not see any problems when disabling this option (Ignore Shapefile Encoding), but if the option is active by default it interferes with drag-and-drop usage.

#5 Updated by Minoru Akagi about 4 years ago

Hi,

In Japan, we usually get shapefiles whose encoding is CP932 (the code page on Japanese Windows). The bad thing is that they sometimes have LDID/87 in the LDID field of the dbf file. The OGR shapefile driver handles LDID/87 as ISO-8859-1, so character corruption occurs.
Edited: 2015-09-05

When the "Ignore Shapefile Encoding" option is not checked and a shapefile to be loaded has non-zero LDID or a cpg file, the encoding selection on the open vector layer dialog is not applied to the layer. In this case, the user selection is ignored. Experienced users might be able to avoid the corruption by creating a cpg file or checking the option, but I am afraid of beginners' (or general users') confusion. So I think the option should be checked by default.
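The rule described in this paragraph can be written down as a small decision function. This is only a model of the behaviour as reported here, not code from QGIS or OGR. The name `effective_encoding` and its parameters are hypothetical, and the function models which encoding is used to interpret the attribute bytes, not what the layer properties dialog displays.

```python
from typing import Optional


def effective_encoding(ignore_declaration: bool,
                       declared: Optional[str],
                       user_selected: str) -> str:
    """Return the encoding used to interpret the attribute text.

    declared is the encoding named by a non-zero LDID or a .cpg file,
    or None when the shapefile declares nothing.
    """
    if not ignore_declaration and declared is not None:
        # The declaration wins and the selection made in the dialog is not applied.
        return declared
    # Either the option is checked or nothing is declared: the selection is used.
    return user_selected


print(effective_encoding(False, "CP932", "UTF-8"))
print(effective_encoding(True, "CP932", "UTF-8"))
```

With the option unchecked, a wrong declaration cannot be overridden from the dialog, which is the case that worries the commenter.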

IMHO, a more flexible way to deal with this issue:
- When a shapefile is opened by drag & drop, QGIS itself tries to read the LDID / cpg file and applies that encoding to the layer.
- The Ignore Shapefile Encoding option is always checked (and may later be removed), so that the layer encoding can be changed in the properties dialog.

#6 Updated by Adrian Klink about 4 years ago

I am not sure if I get it right, but if I understand correctly, the automatic shapefile encoding detection blocks manual selection of the encoding in the dialog (if not ignored). Therefore, if detection fails, e.g. for Japanese CP932 encoding (or other non-ASCII-compatible encodings), the user has to select the encoding manually, but cannot do so if "Ignore Shapefile Encoding" is not true.
What happens if a Japanese CP932 shapefile is opened by drag & drop? Will it fail? Does the "Ignore" option have any effect on it?

IMHO, same flexible way as mentioned by Minoru Akagi, but from different view:
- Moving the "Ignore Shapefile Encoding" option from general properties to the dialog (can be always checked unless deselected by user) so it will only have effect in the dialog (but not on drag & drop)
- Drag & Drop will not be affected by "Ignore Shapefile Encoding" at all (if shapefile is ISO-8859-1 encoding it should be detected properly, in any other case it is better to at least try to detect encoding instead of always using default ISO-8859-1 which may be wrong)

#7 Updated by Minoru Akagi about 4 years ago

Adrian Klink wrote:

What happens if a Japanese CP932 shapefile is opened by drag & drop? Will it fail? Does the "Ignore" option have any effect on it?

What the layer encoding is set to and whether characters in the attribute table are right or not when a CP932 shapefile is opened by drag & drop:

  • Ignore Shapefile Encoding option: ON
    OS\LDID   LDID/0               LDID/19
    Win       System (CP932) *     System (CP932) *
    Ubuntu    System (UTF-8) **    System (UTF-8) **
  • Ignore Shapefile Encoding option: OFF
    OS\LDID   LDID/0               LDID/19
    Win       System (CP932) *     UTF-8 (grayed-out) *
    Ubuntu    System (UTF-8) **    UTF-8 (grayed-out) *

*   Fine
**  Characters are wrong, but user can correct the encoding setting on the layer properties dialog.
*** Characters are wrong, and user cannot correct the encoding setting on the GUI.

LDID/19 means CP932.
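To check which LDID a given shapefile carries, the value can be read straight from the .dbf header, where the language driver ID is the single byte at offset 29. The sketch below is plain Python with made-up names (`read_ldid`, `LDID_NOTES`). The lookup table holds only the values mentioned in this thread and is not a complete LDID list.

```python
LDID_OFFSET = 29  # offset of the language driver ID byte in a dBASE header

# Only the values discussed in this thread; not a complete table.
LDID_NOTES = {
    0: "no encoding declared",
    19: "CP932",
    87: "handled as ISO-8859-1 by the OGR shapefile driver",
}


def read_ldid(dbf_header: bytes) -> int:
    """Return the language driver ID stored in the first 32 bytes of a .dbf file."""
    if len(dbf_header) < 32:
        raise ValueError("a dBASE header is at least 32 bytes long")
    return dbf_header[LDID_OFFSET]


# Synthetic header for demonstration; a real one would come from
# open("layer.dbf", "rb").read(32), where the file name is a placeholder.
header = bytearray(32)
header[LDID_OFFSET] = 19
ldid = read_ldid(bytes(header))
print(ldid, LDID_NOTES.get(ldid, "not covered here"))
```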

Edited: 2015-09-05

#8 Updated by Minoru Akagi about 4 years ago

@Adrian Klink, I'm sorry. I had a misunderstanding. I do not get CP932 shapefiles with LDID/87. I get shapefiles with LDID/0 or LDID/19. I've edited my above comments.

And there is a nice plugin to fix the encoding declaration on GUI: Shapefile Encoding Fixer

#9 Updated by Adrian Klink about 4 years ago

@Minoru Akagi: Thank you very much for investigating how Japanese CP932 shapefiles open by drag & drop with and without the Ignore Shapefile Encoding option. And thank you for the link to the Shapefile Encoding Fixer plugin. It is very useful.

So I think we agree that the "Ignore Shapefile Encoding" option should be handled differently for drag & drop (disabled) and for opening via the dialog (enabled), unlike the current implementation. The remaining question is how best to do this without any side effects.

#10 Updated by Peter Drexel over 2 years ago

I just ran into the same issue...

Opening shapefiles using drag and drop should definitely use the .cpg file settings.

So, as mentioned above, we should move "the "Ignore Shapefile Encoding" option from general properties to the dialog (can be always checked unless deselected by user) so it will only have effect in the dialog (but not on drag & drop)".

So "Drag & Drop will not be affected by "Ignore Shapefile Encoding" at all (if shapefile is ISO-8859-1 encoding it should be detected properly, in any other case it is better to at least try to detect encoding instead of always using default ISO-8859-1 which may be wrong)".

Thanks Peter

#11 Updated by Giovanni Manghi over 2 years ago

  • Regression? set to No
  • Easy fix? set to No

#12 Updated by Johannes Kroeger over 1 year ago

This is really unexpected behaviour. Ignoring the .cpg file seems weird to me. Users can override the encoding in the open dialog, so if they must, the option exists.

The option is not clear, what exactly does it mean if one disables "on-the-fly conversion to UTF-8"? Our files might be declared as UTF-8 in the .cpg, what happens if we untick the option?

Please at least notify the user in a message about the ignoring. If magic must happen, maybe use something to try to detect the encoding?
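One simple form such detection could take is a strict UTF-8 trial decode with a fallback. This is a heuristic sketch only, not something QGIS or OGR is known to do. The function name `guess_encoding` and the `cp1252` fallback are assumptions chosen for illustration.

```python
def guess_encoding(raw: bytes, fallback: str = "cp1252") -> str:
    """Guess "utf-8" if the bytes decode strictly as UTF-8, else the fallback."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


print(guess_encoding("äöü".encode("utf-8")))
print(guess_encoding("äöü".encode("cp1252")))
```

The check works because text in a single-byte encoding that contains non-ASCII characters is rarely valid UTF-8 by accident. It cannot tell single-byte encodings apart from each other, so a declared encoding in a .cpg file remains the more reliable source.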

.cpg files are there for a reason, it is not a good idea to assume that the system encoding is a sane, modern one (Hi Windows!). This just gave me flashbacks to ArcGIS...

#13 Updated by Quan Tum over 1 year ago

+1, same problem here on Windows...

#14 Updated by Johannes Kroeger over 1 year ago

As this continues to be a common source of frustration and student aversion against QGIS, I spent some time documenting what is happening and how unpredictable things are in this current state.

Pleeeeeaase:

- Disable the "Ignore shapefile encoding declaration" option by default.
- Do not force the user to load files with an override in the Data Source Manager. (Disable the Encoding setting there unless the user explicitly wants to override the Encoding for all the files they are loading at that moment.)

Essay ahead

I installed QGIS 3.2.0 on a fresh Windows 10 (free by Microsoft at https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/).

PS C:\Users\IEUser> [System.Text.Encoding]::Default
IsSingleByte : True
BodyName : iso-8859-1
EncodingName : Western European (Windows)
HeaderName : Windows-1252
WebName : Windows-1252
WindowsCodePage : 1252
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : True
IsMailNewsSave : True
EncoderFallback : System.Text.InternalEncoderBestFitFallback
DecoderFallback : System.Text.InternalDecoderBestFitFallback
IsReadOnly : True
CodePage : 1252
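
For comparison, a rough Python counterpart of the PowerShell query above. It is a sketch that only reports what the Python runtime considers the preferred locale encoding, which may not match what QGIS resolves "System" to. The helper name `system_encoding` is made up.

```python
import codecs
import locale


def system_encoding() -> str:
    """Return the normalized codec name of the locale's preferred encoding."""
    name = locale.getpreferredencoding(False)
    return codecs.lookup(name).name


print(system_encoding())
```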

QGIS' options menu shows "Ignore shapefile encoding declaration" checked as active.

I created a new Shapefile, "System encoding" in the dialog was chosen and I did not change it. I added a text field "text".

I then added one single feature, setting its "text" to "äöü". Drawing that as a label was no problem.

I then used right-click → Export → Save Features to save the data to new shapefiles, manually setting the encoding to:
- UTF-8
- Windows-1252
- ISO-8859-1
- latin1

This led to all files but the initial one ("System") having a .cpg file:

File           Content of CPG
System         No .cpg file
ISO-8859-1     88591
latin1         latin1
UTF-8          UTF-8
Windows-1252   1252
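
The four different spellings in the table show why a reader of .cpg files needs some normalization. The sketch below maps exactly the values observed above to Python codec names. It is illustrative only and is not the lookup that OGR or QGIS performs. `CPG_TO_CODEC`, `codec_from_cpg_text` and `codec_for_shapefile` are hypothetical names.

```python
import codecs
from pathlib import Path
from typing import Optional

# .cpg contents observed in the table above, mapped to Python codec names.
CPG_TO_CODEC = {
    "88591": "iso-8859-1",
    "latin1": "latin-1",
    "UTF-8": "utf-8",
    "1252": "cp1252",
}


def codec_from_cpg_text(text: Optional[str]) -> Optional[str]:
    """Map the content of a .cpg file to a normalized codec name, or None."""
    if text is None:
        return None
    name = CPG_TO_CODEC.get(text.strip())
    return codecs.lookup(name).name if name else None


def codec_for_shapefile(shp_path: str) -> Optional[str]:
    """Look for a .cpg file next to the .shp file and interpret its content."""
    cpg = Path(shp_path).with_suffix(".cpg")
    if not cpg.exists():
        return None  # corresponds to the "System" row: nothing is declared
    return codec_from_cpg_text(cpg.read_text(encoding="ascii", errors="replace"))


print(codec_from_cpg_text("88591") == codec_from_cpg_text("latin1"))
```

Note that "88591" and "latin1" name the same encoding, so two of the four declarations are equivalent once normalized.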

The files are attached to this comment.

The files were automatically loaded after creation and their labels are fine.

I saved the project. Text was fine after loading.

With "Ignore shapefile encoding declaration"

I made a new project and used drag and drop from the Windows Explorer to add the files.

For the UTF-8 file the text field contents are now shown as garbage: "Ã¤Ã¶Ã¼". The others are fine.

I made a new project and used the Data Source Manager to add the files. Encoding "ISO-8859-1" was pre-selected in the dialog (not "System", interestingly). Of course loading the files with that forced encoding leads to the UTF-8 one being garbled again.

Out of interest I tried setting the dialog to "UTF-8". As expected this made the UTF-8 file load fine and the others become garbage (���).
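
Both kinds of garbage seen so far can be reproduced with the test string alone. This is a plain Python sketch of the byte-level effect, under the assumption that the system encoding is Windows-1252 as shown in the PowerShell output. It does not involve QGIS.

```python
text = "äöü"

# UTF-8 bytes interpreted as Windows-1252: every character turns into two.
utf8_read_as_cp1252 = text.encode("utf-8").decode("cp1252")

# Windows-1252 bytes interpreted as UTF-8: the invalid bytes are replaced
# with U+FFFD, the replacement character.
cp1252_read_as_utf8 = text.encode("cp1252").decode("utf-8", errors="replace")

print(utf8_read_as_cp1252)
print(cp1252_read_as_utf8)
```

The first case keeps all the information and can be reversed. The second is lossy once the replacement characters have been written back to a file.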

I made a new project and used double-clicking in the Browser to load the files. They were magically loaded in UTF-8 mode, rendering all but the UTF-8 file garbage.

I looked around in the Browser to find its settings and I find none. I guess this forced override comes from my change in the Data Source Manager?

I enabled the Browser's Information Panel. Selecting the ISO-8859-1 file yielded the line "Encoding UTF-8" in it. Hell no, that file is NOT UTF-8! Still, probably from the forced override in the Data Source Manager.

I opened the Data Source Manager again, changed the Encoding to "System" and ... how do I save this ... clicked Close. No change in the Information Panel. I tried again, this time loading any random file. Now the Information Panel shows "System" for all my differently encoded files.

Without "Ignore shapefile encoding declaration"

Next I unticked the "Ignore shapefile encoding declaration" option in QGIS' options menu.

Checking the Information Panel again, it now shows "UTF-8" for all my files except the one with "System" encoding (the first one I created and used as base for the others). OK, I guess that is what was to be expected as the "Ignore shapefile encoding declaration" mentions some automatic conversion (not override!) from the original encoding to UTF-8 that OGR does. No idea why the one file is still shown as "System" though...

Loading the files via the Browser works fine for all of them.

Drag and drop from a Windows Explorer works fine for all of them.

Using the Data Source Manager (where the Encoding was set to "System" again) works fine for all of them. Wat.

Using the Data Source Manager where the Encoding was set to "UTF-8" works fine for all of them except for the "System" file which gives ���.

Using the Data Source Manager where the Encoding was set to "latin1" works fine for all of them. Wat².

#15 Updated by Adrian Klink over 1 year ago

I agree. The default for "IgnoreShapefileEncoding" causes way too much confusion! It should be off by default and enabled only if necessary.

@Minoru Akagi: Can you please make a pull request to revert commit ddb5117 for future QGIS versions?

Commit ddb5117:

src/providers/ogr/qgsogrprovider.cpp

CPLSetConfigOption( "SHAPE_ENCODING", settings.value( "/qgis/ignoreShapeEncoding", true ).toBool() ? "" : 0 );

src/app/qgsoptions.cpp
cbxIgnoreShapeEncoding->setChecked( settings.value( "/qgis/ignoreShapeEncoding", true ).toBool() );
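
The ternary in the first quoted line is easy to misread, so here is the same decision as a small Python function. This is a sketch that mirrors only the logic of the quoted line. The function name `shape_encoding_option` is made up, and the default of True reflects the default in the quoted commit.

```python
from typing import Optional


def shape_encoding_option(ignore_shape_encoding: bool = True) -> Optional[str]:
    """Mirror the ternary in the quoted C++ line.

    An empty string asks the shapefile driver not to recode attribute text.
    None leaves SHAPE_ENCODING unset, so the driver uses the declared encoding.
    """
    return "" if ignore_shape_encoding else None


# Assumption: with the GDAL Python bindings installed, the value could be
# applied with gdal.SetConfigOption("SHAPE_ENCODING", shape_encoding_option(flag)).
print(repr(shape_encoding_option()))
print(repr(shape_encoding_option(False)))
```

Reverting the commit would amount to changing the default argument from True to False.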

#16 Updated by Jürgen Fischer over 1 year ago

#17 Updated by Jürgen Fischer over 1 year ago

#18 Updated by Jürgen Fischer over 1 year ago

  • Related to Bug report #5911: Language Driver ID in dbf file of new shapefile added

#19 Updated by Jérôme Seigneuret about 1 year ago

Hi,
This problem also occurs on Windows 7 x64 with QGIS 3.2.2: drag & drop doesn't use the CPG file. I need to set the encoding manually in the source panel.

#20 Updated by Jérôme Seigneuret about 1 year ago

It's related to ignoreShapeEncoding.

Because the default value is true, there is no autodetection with drag & drop.

But if you set ignoreShapeEncoding=false in QGIS.ini, the encoding is UTF-8 and in the layer properties the encoding is not editable (the combobox is greyed out).

I edited this directly in the QGIS.ini text file because there is an error in the settings dialog box: #19741
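
For anyone who wants to check the current value without opening the options dialog, the setting can be read from the INI text. This sketch assumes the key is stored as `ignoreShapeEncoding` in a `[qgis]` section, which is how the settings key `/qgis/ignoreShapeEncoding` quoted earlier would normally appear in an INI file. A real QGIS.ini may contain entries that Python's `configparser` cannot parse. The function name is hypothetical.

```python
import configparser


def ignore_shape_encoding(ini_text: str) -> bool:
    """Return the ignoreShapeEncoding flag, defaulting to True when it is absent."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the case of keys instead of lowercasing them
    parser.read_string(ini_text)
    return parser.getboolean("qgis", "ignoreShapeEncoding", fallback=True)


sample = "[qgis]\nignoreShapeEncoding=false\n"
print(ignore_shape_encoding(sample))
```

The fallback of True mirrors the default discussed in this thread.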

#21 Updated by Giovanni Manghi about 1 year ago

  • Description updated (diff)

I edited this directly in the QGIS.ini text file because there is an error in the settings dialog box: #19741

That error is caused by a 3rd party plugin, not QGIS itself. If you remove/disable the plugin, does it work as expected?

#22 Updated by Jérôme Seigneuret about 1 year ago

Giovanni Manghi wrote:

that error is caused by a 3rd party plugin, not qgis itself. If you remove/disable the plugin it works as expected?

I have deactivated all third-party plugins.

I ran a test, activating and deactivating the ignore encoding checkbox. There is no crash, but there is no change either...

The encoding is greyed out all the time.

My test

  1. Save a layer in CP1252 > It is added to the list of layers
  2. Check the encoding in properties > The combobox is greyed out and the value is UTF8
  3. Save the project and change Ignore Shapefile Encoding: no effect > There is another problem here
  4. I set Ignore Shapefile Encoding to True in the properties panel
  5. Drag & drop my file onto the map
  6. Check properties > It is in encoding System, so it is not ignored? The values are reversed???
  7. I set Ignore Shapefile Encoding to False
  8. Drag & drop my file onto the map
  9. Check properties > It is in encoding utf-8, so it is not ignored? The values really are reversed!

So the option behaves not as "ignore shapefile encoding" but as "use shapefile encoding".

#23 Updated by Giovanni Manghi about 1 year ago

Jérôme Seigneuret wrote:

My test

  1. Save a layer in CP1252 > It is added to the list of layers
  2. Check the encoding in properties > The combobox is greyed out and the value is UTF8
  3. Save the project and change Ignore Shapefile Encoding: no effect > There is another problem here
  4. I set Ignore Shapefile Encoding to True in the properties panel
  5. Drag & drop my file onto the map
  6. Check properties > It is in encoding System, so it is not ignored? The values are reversed???
  7. I set Ignore Shapefile Encoding to False
  8. Drag & drop my file onto the map
  9. Check properties > It is in encoding utf-8, so it is not ignored? The values really are reversed!

So the option behaves not as "ignore shapefile encoding" but as "use shapefile encoding".

What QGIS version?

#24 Updated by Giovanni Manghi about 1 year ago

  • Category changed from GUI to Vectors
  • Status changed from Open to Feedback

#25 Updated by Jérôme Seigneuret about 1 year ago

QGIS version: 3.2.2-Bonn
QGIS code revision: 26842169e9
Compiled against Qt: 5.9.2
Running against Qt: 5.9.2
Compiled against GDAL/OGR: 2.2.4
Running against GDAL/OGR: 2.2.4
Compiled against GEOS: 3.6.1-CAPI-1.10.1
Running against GEOS: 3.6.1-CAPI-1.10.1 r0
PostgreSQL client version: 9.2.4
SpatiaLite version: 4.3.0
QWT version: 6.1.3
QScintilla2 version: 2.10.1
PROJ.4 version: 493

#26 Updated by Giovanni Manghi about 1 year ago

  • Affected QGIS version changed from 2.10.1 to 3.2.2

#27 Updated by Jürgen Fischer 8 months ago

  • Status changed from Feedback to Closed
  • Resolution set to no timely feedback

Bulk closing 82 tickets in feedback state for more than 90 days affecting an old version. Feel free to reopen if it still applies to a current version and you have more information that clarifies the issue.

#28 Updated by Johannes Kroeger 6 months ago

Related #21313

#29 Updated by Jürgen Fischer 6 months ago
