Bug report #18601
QGIS3: texts with leading zero get integers
|Category:||Data Provider/Delimited Text|
|Affected QGIS version:||3.0.0||Regression?:||No|
|Operating System:||Microsoft Windows 7, 64-bit||Easy fix?:||No|
|Pull Request or Patch supplied:||No||Resolution:|
|Crashes QGIS or corrupts data:||No||Copied to github as #:||26489|
When loading delimited text without CSVT, fields with leading zeros (like German ZIP codes (f.e. 01099) or phone numbers) get smashed into integers.
I've written a pretty complex regex to identify numbers and numbers-alike strings. I think, for QGIS it's better to have an Excel or Calc like behavior to let the user chose, which format the columns are in?
Same behavior in 6ddd9ab0c7 and below.
Delimited text detectTypes flag - fixes #18601
[FEATURE] Adds a detectTypes flag to the delimited text provider url. If
set to "no" then type detection is not done and all attributes are
treated as text fields. Otherwise the original behaviour of
detecting field types is preserved.
[needs-docs] Adds a "Detect field types" check box to the record and
field options section of the delimited text provider GUI.
This addresses (at least partially) issue #18601. A more complete
solution would be to allow users to set field types.
#4 Updated by Tobias Wendorff about 2 years ago
Sorry, it's often very hard to find the correct category. I don't have have a clue, which is the right one this time. You can find the dialog this way:
Menu: "Layer > Add Layer > Add Delimited Text Layer"
Data Source Manager: "Delimited Text"
I just released that double-quoting the values also cast the strings them to integer. As of RFC 4180, text-qualified strings should be enclosed in quotes.
#8 Updated by Tobias Wendorff about 2 years ago
This still has not been fixed. I think, we should rework this feature ... it should work like XLSX import, where you can set the field type of every column. Also, a "scan 25/50/100% of file" feature is needed to correctly identify the fields.
Right now, this function is broken. It's highly destructive. An easy work-around would be to cast every field to text in import.
#9 Updated by Chris Crook about 2 years ago
Hi Tobias. I hear your frustration.
Setting every field type would be good but is not a quick option, and harder to do well in a GUI that is already too busy. At the moment the parser identifies field type by trying to read each field as integer and float (and potentially scanning the whole file). This is further complicated as QGIS now accepts many more field types than it did, so to do this properly would require handling dates for example, which means providing some means of specifying the date format.
One option that could be added quickly though is to add an option to set every column type to text as you suggest. The expression builder could still be used to convert the text fields to other types if required. And the X and Y columns could then be converted to floats to generate geometry and return null geometries where they cannot be read as numbers. I think this needs to be an option rather than a default, as there is much value in having numeric fields available by default - most users find the "highly destructive" delimited text function very useful, and get a lot of value from having numeric data that can be used directly for rendering, statistics, etc without having to use the expression builder. However the delimited text GUI is good at remembering options you select.
This would be one extra checkbox in the GUI, and one extra parameter in the layer URL, which seems a good solution.
How does that sound to you?