Bug report #18601

QGIS3: texts with leading zero get integers

Added by Tobias Wendorff over 2 years ago. Updated almost 2 years ago.

Status:Closed
Priority:Normal
Assignee:-
Category:Data Provider/Delimited Text
Affected QGIS version:3.0.0 Regression?:No
Operating System:Microsoft Windows 7, 64-bit Easy fix?:No
Pull Request or Patch supplied:No Resolution:
Crashes QGIS or corrupts data:No Copied to github as #:26489

Description

When loading delimited text without CSVT, fields with leading zeros (like German ZIP codes (f.e. 01099) or phone numbers) get smashed into integers.
I've written a pretty complex regex to identify numbers and numbers-alike strings. I think, for QGIS it's better to have an Excel or Calc like behavior to let the user chose, which format the columns are in?

Same behavior in 6ddd9ab0c7 and below.


Related issues

Related to QGIS Application - Bug report #18789: Import of CSV-Data Closed 2018-04-22

Associated revisions

Revision b4f2069c
Added by Chris Crook almost 2 years ago

Delimited text detectTypes flag - fixes #18601

[FEATURE] Adds a detectTypes flag to the delimited text provider url. If
set to "no" then type detection is not done and all attributes are
treated as text fields. Otherwise the original behaviour of
detecting field types is preserved.

[needs-docs] Adds a "Detect field types" check box to the record and
field options section of the delimited text provider GUI.

This addresses (at least partially) issue #18601. A more complete
solution would be to allow users to set field types.

History

#1 Updated by Tobias Wendorff over 2 years ago

Demo dataset (text file), which get parsed as numeric instead of string:

01234
+1555487218
001555487218
0123456789
01234567890123456789
012345678901234567890123456789
0123456789012345678901234567890123456789

#2 Updated by Tobias Wendorff about 2 years ago

Could a mod please edit my first post? Actually, it is destroying data. I missed to add this.

#3 Updated by Nyall Dawson about 2 years ago

  • Status changed from Open to Feedback

Is this using the delimited text provider? Or ogr? You've marked it as the delimited text provider yet only ogr supports csvt.

#4 Updated by Tobias Wendorff about 2 years ago

Sorry, it's often very hard to find the correct category. I don't have have a clue, which is the right one this time. You can find the dialog this way:

Menu: "Layer > Add Layer > Add Delimited Text Layer"
Data Source Manager: "Delimited Text"

I just released that double-quoting the values also cast the strings them to integer. As of RFC 4180, text-qualified strings should be enclosed in quotes.

#5 Updated by Jürgen Fischer about 2 years ago

#6 Updated by Jürgen Fischer about 2 years ago

  • Description updated (diff)

#7 Updated by Jürgen Fischer about 2 years ago

  • Status changed from Feedback to Open

#8 Updated by Tobias Wendorff about 2 years ago

This still has not been fixed. I think, we should rework this feature ... it should work like XLSX import, where you can set the field type of every column. Also, a "scan 25/50/100% of file" feature is needed to correctly identify the fields.

Right now, this function is broken. It's highly destructive. An easy work-around would be to cast every field to text in import.

#9 Updated by Chris Crook about 2 years ago

Hi Tobias. I hear your frustration.

Setting every field type would be good but is not a quick option, and harder to do well in a GUI that is already too busy. At the moment the parser identifies field type by trying to read each field as integer and float (and potentially scanning the whole file). This is further complicated as QGIS now accepts many more field types than it did, so to do this properly would require handling dates for example, which means providing some means of specifying the date format.

One option that could be added quickly though is to add an option to set every column type to text as you suggest. The expression builder could still be used to convert the text fields to other types if required. And the X and Y columns could then be converted to floats to generate geometry and return null geometries where they cannot be read as numbers. I think this needs to be an option rather than a default, as there is much value in having numeric fields available by default - most users find the "highly destructive" delimited text function very useful, and get a lot of value from having numeric data that can be used directly for rendering, statistics, etc without having to use the expression builder. However the delimited text GUI is good at remembering options you select.

This would be one extra checkbox in the GUI, and one extra parameter in the layer URL, which seems a good solution.

How does that sound to you?

#10 Updated by Chris Crook almost 2 years ago

  • % Done changed from 0 to 100
  • Status changed from Open to Closed

Also available in: Atom PDF