Commit fab2c57

committed Apr 15, 2013
Merge pull request #527 from ccrook/master
Fix of delimited text provider to handle CSV files including quoted newlines properly
2 parents 82b41db + 632bfbb commit fab2c57

21 files changed

+3886
-1021
lines changed
 

‎src/core/qgsvectorlayer.h

Lines changed: 259 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -142,8 +142,252 @@ struct CORE_EXPORT QgsVectorJoinInfo
142142

143143

144144
/** \ingroup core
145-
* Vector layer backed by a data source provider.
145+
* Represents a vector layer which manages a vector based data set.
146+
*
147+
* The QgsVectorLayer is instantiated by specifying the name of a data provider,
148+
* such as postgres or wfs, and url defining the specific data set to connect to.
149+
* The vector layer constructor in turn instantiates a QgsVectorDataProvider subclass
150+
* corresponding to the provider type, and passes it the url. The data provider
151+
* connects to the data source.
152+
*
153+
* The QgsVectorLayer provides a common interface to the different data types. It also
154+
* manages editing transactions.
155+
*
156+
* Sample usage of the QgsVectorLayer class:
157+
*
158+
* \code
159+
* QString uri = "point?crs=epsg:4326&field=id:integer";
160+
* QgsVectorLayer *scratchLayer = new QgsVectorLayer(uri, "Scratch point layer", "memory");
161+
* \endcode
162+
*
163+
* The main data providers supported by QGis are listed below.
164+
*
165+
* \section providers Vector data providers
166+
*
167+
* \subsection memory Memory data provider (memory)
168+
*
169+
* The memory data provider is used to construct in memory data, for example scratch
170+
* data or data generated from spatial operations such as contouring. There is no
171+
* inherent persistent storage of the data. The data source uri is constructed. The
172+
* url specifies the geometry type ("point", "linestring", "polygon",
173+
* "multipoint","multilinestring","multipolygon"), optionally followed by url parameters
174+
* as follows:
175+
*
176+
* - crs=definition
177+
* Defines the coordinate reference system to use for the layer.
178+
* definition is any string accepted by QgsCoordinateReferenceSystem::createFromString()
179+
*
180+
* - index=yes
181+
* Specifies that the layer will be constructed with a spatial index
182+
*
183+
* - field=name:type(length,precision)
184+
* Defines an attribute of the layer. Multiple field parameters can be added
185+
* to the data provider definition. type is one of "integer", "double", "string".
186+
*
187+
* An example url is "Point?crs=epsg:4326&field=id:integer&field=name:string(20)&index=yes"
188+
*
189+
* \subsection ogr OGR data provider (ogr)
190+
*
191+
* Accesses data using the OGR drivers (http://www.gdal.org/ogr/ogr_formats.html). The url
192+
* is the OGR connection string. A wide variety of data formats can be accessed using this
193+
* driver, including file based formats used by many GIS systems, database formats, and
194+
* web services. Some of these formats are also supported by custom data providers listed
195+
* below.
196+
*
197+
* \subsection spatialite Spatialite data provider (spatialite)
198+
*
199+
* Access data in a spatialite database. The url defines the connection parameters, table,
200+
* geometry column, and other attributes. The url can be constructed using the
201+
* QgsDataSourceURI class.
202+
*
203+
* \subsection postgres Postgresql data provider (postgres)
204+
*
205+
* Connects to a postgresql database. The url defines the connection parameters, table,
206+
* geometry column, and other attributes. The url can be constructed using the
207+
* QgsDataSourceURI class.
208+
*
209+
* \subsection mssql Microsoft SQL server data provider (mssql)
210+
*
211+
* Connects to a Microsoft SQL server database. The url defines the connection parameters, table,
212+
* geometry column, and other attributes. The url can be constructed using the
213+
* QgsDataSourceURI class.
214+
*
215+
* \subsection sqlanywhere SQL Anywhere data provider (sqlanywhere)
216+
*
217+
* Connects to an SQLanywhere database. The url defines the connection parameters, table,
218+
* geometry column, and other attributes. The url can be constructed using the
219+
* QgsDataSourceURI class.
220+
*
221+
* \subsection wfs WFS (web feature service) data provider (wfs)
222+
*
223+
* Used to access data provided by a web feature service.
224+
*
225+
* The url can be an HTTP url to a WFS 1.0.0 server or a GML2 data file path.
226+
* Examples are http://foobar/wfs or /foo/bar/file.gml
227+
*
228+
* If a GML2 file path is provided the driver will attempt to read the schema from a
229+
* file in the same directory with the same basename + “.xsd”. This xsd file must be
230+
* in the same format as a WFS describe feature type response. If no xsd file is provided
231+
* then the driver will attempt to guess the attribute types from the file.
232+
*
233+
* In the case of an HTTP URL the ‘FILTER’ query string parameter can be used to filter
234+
* the WFS feature type. The ‘FILTER’ key value can either be a QGIS expression
235+
* or an OGC XML filter. If the value is set to a QGIS expression the driver will
236+
* turn it into OGC XML filter before passing it to the WFS server. Beware the
237+
* QGIS expression filter only supports the “=, !=, <, >, <=, >=, AND, OR, NOT, LIKE, IS NULL”
238+
* attribute operators, “BBOX, Disjoint, Intersects, Touches, Crosses, Contains, Overlaps, Within”
239+
* spatial binary operators and the QGIS local “geomFromWKT, geomFromGML”
240+
* geometry constructor functions.
241+
*
242+
* Also note:
243+
*
244+
* - You can use various functions available in the QGIS Expression list,
245+
* however the function must exist server side and have the same name and arguments to work.
246+
*
247+
* - Use the special $geometry parameter to provide the layer geometry column as input
248+
* into the spatial binary operators e.g. intersects($geometry, geomFromWKT('POINT (5 6)'))
249+
*
250+
* \subsection delimitedtext Delimited text file data provider (delimitedtext)
251+
*
252+
* Accesses data in a delimited text file, for example CSV files generated by
253+
* spreadsheets. The contents of the file are split into columns based on specified
254+
* delimiter characters. Each record may be represented spatially either by an
255+
* X and Y coordinate columns, or by a WKT (well known text) formatted column.
256+
*
257+
* The url defines the filename, the formatting options (how the
258+
* text in the file is divided into data fields), and which fields contain the
259+
* X,Y coordinates or WKT text definition. The options are specified as url query
260+
* items.
261+
*
262+
* At its simplest the url can just be the filename, in which case it will be loaded
263+
* as a CSV formatted file.
264+
*
265+
* The url may include the following items:
266+
*
267+
* - encoding=UTF-8
268+
*
269+
* Defines the character encoding in the file. The default is UTF-8. To use
270+
* the default encoding for the operating system use "System".
271+
*
272+
* - type=(csv|regexp|whitespace|plain)
273+
*
274+
* Defines the algorithm used to split records into columns. Records are
275+
* defined by new lines, except for csv format files for which quoted fields
276+
* may span multiple records. The default type is csv.
277+
*
278+
* - "csv" splits the file based on three sets of characters:
279+
* delimiter characters, quote characters,
280+
* and escape characters. Delimiter characters mark the end
281+
* of a field. Quote characters enclose a field which can contain
282+
* delimiter characters, and newlines. Escape characters cause the
283+
* following character to be treated literally (including delimiter,
284+
* quote, and newline characters). Escape and quote characters must
285+
* be different from delimiter characters. Escape characters that are
286+
* also quote characters are treated specially - they can only
287+
* escape themselves within quotes. Elsewhere they are treated as
288+
* quote characters. The defaults for delimiter, quote, and escape
289+
* are ',', '"', '"'.
290+
* - "regexp" splits each record using a regular expression (see QRegExp
291+
* documentation for details).
292+
* - "whitespace" splits each record based on whitespace (one or more whitespace
293+
* characters). Leading whitespace in the record is ignored.
294+
* - "plain" is provided for backwards compatibility. It is equivalent to
295+
* CSV except that the default quote characters are single and double quotes,
296+
* and there are no escape characters.
297+
*
298+
* - delimiter=characters
299+
*
300+
* Defines the delimiter characters used for csv and plain type files, or the
301+
* regular expression for regexp type files. It is a literal string of characters
302+
* except that "\t" may be used to represent a tab character.
303+
*
304+
* - quote=characters
305+
*
306+
* Defines the characters that are used as quote characters for csv and plain type
307+
* files.
308+
*
309+
* - escape=characters
310+
*
311+
* Defines the characters used to escape delimiter, quote, and newline characters.
312+
*
313+
* - skipEmptyFields=(yes|no)
314+
*
315+
* If yes then empty fields will be discarded (equivalent to concatenating consecutive
316+
* delimiters)
317+
*
318+
* - trimFields=(yes|no)
319+
*
320+
* If yes then leading and trailing whitespace will be removed from fields
321+
*
322+
* - skipLines=n
323+
*
324+
* Defines the number of lines to ignore at the beginning of the file (default 0)
325+
*
326+
* - useHeader=(yes|no)
327+
*
328+
* Defines whether the first record in the file (after skipped lines) contains
329+
* column names (default yes)
330+
*
331+
* - xField=column yField=column
332+
*
333+
* Defines the name of the columns holding the x and y coordinates for XY point geometries.
334+
* If the useHeader is no (ie there are no column names), then this is the column
335+
* number (with the first column as 1).
336+
*
337+
* - decimalPoint=c
338+
*
339+
* Defines a character that is used as a decimal point in the X and Y columns.
340+
* The default is '.'.
341+
*
342+
* - xyDms=(yes|no)
343+
*
344+
* If yes then the X and Y coordinates are interpreted as
345+
* degrees/minutes/seconds format (fairly permissively),
346+
* or degree/minutes format.
347+
*
348+
* - wktField=column
349+
*
350+
* Defines the name of the column holding the WKT geometry definition for WKT geometries.
351+
* If the useHeader is no (ie there are no column names), then this is the column
352+
* number (with the first column as 1).
353+
*
354+
* - geomType=(point|line|polygon|none)
355+
*
356+
* Defines the geometry type for WKT type geometries. QGis will only display one
357+
* type of geometry for the layer - any others will be ignored when the file is
358+
* loaded. By default the provider uses the type of the first geometry in the file.
359+
* Use geomType to override this type.
360+
*
361+
* geomType can also be set to none, in which case the layer is loaded without
362+
* geometries.
363+
*
364+
* - crs=crsstring
365+
*
366+
* Defines the coordinate reference system used for the layer. This can be
367+
* any string accepted by QgsCoordinateReferenceSystem::createFromString()
368+
*
369+
* - quiet
370+
*
371+
* Errors encountered loading the file will not be reported in a user dialog if
372+
* quiet is included (They will still be shown in the output log).
373+
*
374+
* \subsection gpx GPX data provider (gpx)
375+
*
376+
* Provider reads tracks, routes, and waypoints from a GPX file. The url
377+
* defines the name of the file, and the type of data to retrieve from it
378+
* ("track", "route", or "waypoint").
379+
*
380+
* An example url is "/home/user/data/holiday.gpx?type=route"
381+
*
382+
* \subsection grass Grass data provider (grass)
383+
*
384+
* Provider to display vector data in a GRASS GIS layer.
385+
*
386+
*
387+
*
146388
*/
389+
390+
147391
class CORE_EXPORT QgsVectorLayer : public QgsMapLayer
148392
{
149393
Q_OBJECT
@@ -235,7 +479,18 @@ class CORE_EXPORT QgsVectorLayer : public QgsMapLayer
235479
QList<GroupData> mGroups;
236480
};
237481

238-
/** Constructor */
482+
/** Constructor - creates a vector layer
483+
*
484+
* The QgsVectorLayer is constructed by instantiating a data provider. The provider
485+
* interprets the supplied path (url) of the data source to connect to and access the
486+
* data.
487+
*
488+
* @param path The path or url of the data source. Typically this encodes
489+
* parameters used by the data provider as url query items.
490+
* @param baseName The name used to represent the layer in the legend
491+
* @param providerLib The name of the data provider, eg "memory", "postgres"
492+
*
493+
*/
239494
QgsVectorLayer( QString path = QString::null, QString baseName = QString::null,
240495
QString providerLib = QString::null, bool loadDefaultStyleFlag = true );
241496

@@ -337,7 +592,7 @@ class CORE_EXPORT QgsVectorLayer : public QgsMapLayer
337592
* @see deselect(QgsFeatureIds)
338593
* @see deselect(QgsFeatureId)
339594
*/
340-
void modifySelection(QgsFeatureIds selectIds, QgsFeatureIds deselectIds );
595+
void modifySelection( QgsFeatureIds selectIds, QgsFeatureIds deselectIds );
341596

342597
/** Select not selected features and deselect selected ones */
343598
void invertSelection();
@@ -940,7 +1195,7 @@ class CORE_EXPORT QgsVectorLayer : public QgsMapLayer
9401195
*
9411196
* @see deselect(QgsFeatureId)
9421197
*/
943-
void deselect(const QgsFeatureIds& featureIds );
1198+
void deselect( const QgsFeatureIds& featureIds );
9441199

9451200
/**
9461201
* Clear selection
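
The delimitedtext url format documented in the class comment of this header can be illustrated with a small sketch. It is written in standard C++ so that it stands alone. `buildProviderUrl` and the column names `lon` and `lat` are hypothetical and not part of QGIS, and no percent-encoding is attempted. Inside QGIS the header comments suggest that QUrl query items serve this purpose.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a QGIS API): joins a file path and a list of
// query items into a provider url. An item with an empty value is written
// as a bare key, as for the "quiet" option. No percent-encoding is done.
std::string buildProviderUrl(
    const std::string &path,
    const std::vector<std::pair<std::string, std::string>> &items)
{
    std::string url = path;
    char separator = '?'; // first item follows '?', later items follow '&'
    for (const auto &item : items)
    {
        url += separator;
        url += item.first;
        if (!item.second.empty())
        {
            url += '=';
            url += item.second;
        }
        separator = '&';
    }
    return url;
}
```

A string built this way would be passed as the path argument of the QgsVectorLayer constructor, with "delimitedtext" as the provider name. With no items the helper returns the bare filename, which the documentation says is loaded as a CSV formatted file.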

‎src/providers/delimitedtext/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
SET (DTEXT_SRCS
66
qgsdelimitedtextfeatureiterator.cpp
77
qgsdelimitedtextprovider.cpp
8+
qgsdelimitedtextfile.cpp
89
qgsdelimitedtextsourceselect.cpp
910
)
1011

‎src/providers/delimitedtext/qgsdelimitedtextfeatureiterator.cpp

Lines changed: 22 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -13,8 +13,8 @@
1313
* *
1414
***************************************************************************/
1515
#include "qgsdelimitedtextfeatureiterator.h"
16-
1716
#include "qgsdelimitedtextprovider.h"
17+
#include "qgsdelimitedtextfile.h"
1818

1919
#include "qgsgeometry.h"
2020
#include "qgsmessagelog.h"
@@ -49,14 +49,12 @@ bool QgsDelimitedTextFeatureIterator::nextFeature( QgsFeature& feature )
4949
if ( mClosed )
5050
return false;
5151

52-
while ( !P->mStream->atEnd() )
52+
QStringList tokens;
53+
while ( true )
5354
{
54-
QString line = P->readLine( P->mStream ); // Default local 8 bit encoding
55-
if ( line.isEmpty() )
56-
continue;
57-
58-
// lex the tokens from the current data line
59-
QStringList tokens = P->splitLine( line );
55+
QgsDelimitedTextFile::Status status = P->mFile->nextRecord( tokens );
56+
if ( status == QgsDelimitedTextFile::RecordEOF ) break;
57+
if ( status != QgsDelimitedTextFile::RecordOk ) continue;
6058

6159
while ( tokens.size() < P->mFieldCount )
6260
tokens.append( QString::null );
@@ -74,7 +72,11 @@ bool QgsDelimitedTextFeatureIterator::nextFeature( QgsFeature& feature )
7472

7573
if ( !geom && P->mWkbType != QGis::WKBNoGeometry )
7674
{
77-
P->mInvalidLines << line;
75+
// Already dealt with invalid lines in provider - no need to repeat
76+
// removed code (CC 2013-04-13) ...
77+
// P->mInvalidLines << line;
78+
// In any case it may be a valid line that is excluded because of
79+
// bounds check...
7880
continue;
7981
}
8082

@@ -114,7 +116,7 @@ bool QgsDelimitedTextFeatureIterator::nextFeature( QgsFeature& feature )
114116

115117
// End of the file. If there are any lines that couldn't be
116118
// loaded, display them now.
117-
P->handleInvalidLines();
119+
// P->handleInvalidLines();
118120

119121
close();
120122
return false;
@@ -128,11 +130,7 @@ bool QgsDelimitedTextFeatureIterator::rewind()
128130
// Reset feature id to 0
129131
mFid = 0;
130132
// Skip to first data record
131-
P->mStream->seek( 0 );
132-
int n = P->mFirstDataLine - 1;
133-
while ( n-- > 0 )
134-
P->readLine( P->mStream );
135-
133+
P->resetStream();
136134
return true;
137135
}
138136

@@ -143,7 +141,6 @@ bool QgsDelimitedTextFeatureIterator::close()
143141

144142
// tell provider that this iterator is not active anymore
145143
P->mActiveIterator = 0;
146-
147144
mClosed = true;
148145
return true;
149146
}
@@ -152,24 +149,11 @@ bool QgsDelimitedTextFeatureIterator::close()
152149
QgsGeometry* QgsDelimitedTextFeatureIterator::loadGeometryWkt( const QStringList& tokens )
153150
{
154151
QgsGeometry* geom = 0;
155-
try
156-
{
157-
QString sWkt = tokens[P->mWktFieldIndex];
158-
// Remove Z and M coordinates if present, as currently fromWkt doesn't
159-
// support these.
160-
if ( P->mWktHasZM )
161-
{
162-
sWkt.remove( P->mWktZMRegexp ).replace( P->mWktCrdRegexp, "\\1" );
163-
}
152+
QString sWkt = tokens[P->mWktFieldIndex];
164153

165-
geom = QgsGeometry::fromWkt( sWkt );
166-
}
167-
catch ( ... )
168-
{
169-
geom = 0;
170-
}
154+
geom = P->geomFromWkt( sWkt );
171155

172-
if ( geom && geom->wkbType() != P->mWkbType )
156+
if ( geom && geom->type() != P->mGeometryType )
173157
{
174158
delete geom;
175159
geom = 0;
@@ -187,38 +171,26 @@ QgsGeometry* QgsDelimitedTextFeatureIterator::loadGeometryXY( const QStringList&
187171
{
188172
QString sX = tokens[P->mXFieldIndex];
189173
QString sY = tokens[P->mYFieldIndex];
174+
QgsPoint pt;
175+
bool ok = P->pointFromXY( sX, sY, pt );
190176

191-
if ( !P->mDecimalPoint.isEmpty() )
177+
if ( ok && boundsCheck( pt ) )
192178
{
193-
sX.replace( P->mDecimalPoint, "." );
194-
sY.replace( P->mDecimalPoint, "." );
195-
}
196-
197-
bool xOk, yOk;
198-
double x = sX.toDouble( &xOk );
199-
double y = sY.toDouble( &yOk );
200-
if ( xOk && yOk )
201-
{
202-
if ( boundsCheck( x, y ) )
203-
{
204-
return QgsGeometry::fromPoint( QgsPoint( x, y ) );
205-
}
179+
return QgsGeometry::fromPoint( pt );
206180
}
207181
return 0;
208182
}
209183

210-
211-
212184
/**
213185
* Check to see if the point is within the selection rectangle
214186
*/
215-
bool QgsDelimitedTextFeatureIterator::boundsCheck( double x, double y )
187+
bool QgsDelimitedTextFeatureIterator::boundsCheck( const QgsPoint &pt )
216188
{
217189
// no selection rectangle or geometry => always in the bounds
218190
if ( mRequest.filterType() != QgsFeatureRequest::FilterRect || ( mRequest.flags() & QgsFeatureRequest::NoGeometry ) )
219191
return true;
220192

221-
return mRequest.filterRect().contains( QgsPoint( x, y ) );
193+
return mRequest.filterRect().contains( pt );
222194
}
223195

224196
/**

‎src/providers/delimitedtext/qgsdelimitedtextfeatureiterator.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,7 @@ class QgsDelimitedTextFeatureIterator : public QgsAbstractFeatureIterator
4444
QgsGeometry* loadGeometryWkt( const QStringList& tokens );
4545
QgsGeometry* loadGeometryXY( const QStringList& tokens );
4646

47-
bool boundsCheck( double x, double y );
47+
bool boundsCheck( const QgsPoint &pt );
4848
bool boundsCheck( QgsGeometry *geom );
4949

5050
void fetchAttribute( QgsFeature& feature, int fieldIdx, const QStringList& tokens );

‎src/providers/delimitedtext/qgsdelimitedtextfile.cpp

Lines changed: 576 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 303 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,303 @@
1+
/***************************************************************************
2+
qgsdelimitedtextparser.h - File for delimited text file
3+
-------------------
4+
begin : 2004-02-27
5+
copyright : (C) 2013 by Chris Crook
6+
email : ccrook at linz.govt.nz
7+
***************************************************************************/
8+
9+
/***************************************************************************
10+
* *
11+
* This program is free software; you can redistribute it and/or modify *
12+
* it under the terms of the GNU General Public License as published by *
13+
* the Free Software Foundation; either version 2 of the License, or *
14+
* (at your option) any later version. *
15+
* *
16+
***************************************************************************/
17+
18+
#include <QStringList>
19+
#include <QRegExp>
20+
#include <QUrl>
21+
22+
class QgsFeature;
23+
class QgsField;
24+
class QFile;
25+
class QTextStream;
26+
27+
28+
/**
29+
\class QgsDelimitedTextFile
30+
\brief Delimited text file parser extracts records from a QTextStream as a QStringList.
31+
*
32+
*
33+
* The delimited text parser is used by the QgsDelimitedTextProvider to parse
34+
* a QTextStream into records of QStringList. It provides a number of variants
35+
* for parsing each record. The following options are supported:
36+
* - Basic whitespace parsing. Each line in the file is treated as a record.
37+
* Extracts all contiguous sequences of non-whitespace
38+
* characters. Leading and trailing whitespace are ignored.
39+
* - Regular expression parsing. Each line in the file is treated as a record.
40+
* The string is split into fields based on a regular expression.
41+
* - Character delimited, based on a delimiter character set, a quote character, and
42+
* an escape character. The escape treats the next character as a part of a field.
43+
* Fields may start and end with quote characters, in which case any non-escaped
44+
* character within the field is treated literally, including end of line characters.
45+
* The escape character within a string causes the next character to be read literally
46+
* (this includes new line characters). If the escape and quote characters are the
47+
* same, then only quote characters will be escaped (ie to include a quote in a
48+
* quoted field it is entered as two quotes). All other characters in quoted fields
49+
* are treated literally, including newlines.
50+
* - CSV format files - these are a special case of character delimited, in which the
51+
* delimiter is a comma, and the quote and escape characters are double quotes (")
52+
*
53+
* The delimiters can be encoded in and decoded from a QUrl as query items. The
54+
* items used are:
55+
* - delimiterType, one of plain (delimiter is any of a set of characters),
56+
* regexp, csv, whitespace
57+
* - delimiter, interpreted according to the type. For plain characters this is
58+
* a sequence of characters. The string \t in the sequence is replaced by a tab.
59+
* For regexp type delimiters this specifies the regular expression.
60+
* The field is ignored for csv and whitespace
61+
* - quoteChar, optional, a single character used for quoting plain fields
62+
* - escapeChar, optional, a single character used for escaping (may be the same as quoteChar)
63+
*/
64+
65+
// Note: this has been implemented as a single class rather than a set of classes based
66+
// on an abstract base class in order to facilitate changing the type of the parser easily
67+
// eg in the provider dialog
68+
69+
class QgsDelimitedTextFile
70+
{
71+
72+
public:
73+
74+
enum Status
75+
{
76+
RecordOk,
77+
InvalidDefinition,
78+
RecordEmpty,
79+
RecordInvalid,
80+
RecordEOF
81+
};
82+
83+
enum DelimiterType
84+
{
85+
DelimTypeWhitespace,
86+
DelimTypeCSV,
87+
DelimTypeRegexp,
88+
};
89+
90+
QgsDelimitedTextFile( QString url = QString() );
91+
92+
virtual ~QgsDelimitedTextFile();
93+
94+
/** Set the filename
95+
* @param filename the name of the file
96+
*/
97+
void setFileName( QString filename );
98+
/** Return the filename
99+
* @return filename the name of the file
100+
*/
101+
QString fileName()
102+
{
103+
return mFileName;
104+
}
105+
106+
/** Set the file encoding (default is UTF-8)
107+
* @param encoding the encoding to use for the fileName()
108+
*/
109+
void setEncoding( QString encoding );
110+
/** Return the file encoding
111+
* @return encoding The file encoding
112+
*/
113+
QString encoding() { return mEncoding; }
114+
115+
/** Decode the parser settings from a url as a string
116+
* @param url The url from which the delimiter and delimiterType items are read
117+
*/
118+
bool setFromUrl( QString url );
119+
/** Decode the parser settings from a url
120+
* @param url The url from which the delimiter and delimiterType items are read
121+
*/
122+
bool setFromUrl( QUrl &url );
123+
124+
/** Encode the parser settings into a QUrl
125+
* @return url The url into which the delimiter and delimiterType items are set
126+
*/
127+
QUrl url();
128+
129+
/** Set the parser for parsing CSV files
130+
*/
131+
void setTypeWhitespace();
132+
133+
/** Set the parser for parsing based on a regular expression delimiter
134+
@param regexp A string defining the regular expression
135+
*/
136+
void setTypeRegexp( QString regexp );
137+
/** Set the parser to use a character type delimiter.
138+
* @param delim The field delimiter character set
139+
* @param quote The quote character, used to define quoted fields
140+
* @param escape The escape character used to escape quote or delim
141+
* characters.
142+
*/
143+
void setTypeCSV( QString delim = QString( "," ), QString quote = QString( "\"" ), QString escape = QString( "\"" ) );
144+
145+
/* Set the number of header lines to skip
146+
* @param skiplines The maximum lines to skip
147+
*/
148+
void setSkipLines( int skiplines );
149+
/* Return the number of header lines to skip
150+
* @return skiplines The maximum lines to skip
151+
*/
152+
int skipLines()
153+
{
154+
return mSkipLines;
155+
}
156+
157+
/* Set reading column names from the first record
158+
* @param useheaders Column names will be read if true
159+
*/
160+
void setUseHeader( bool useheader = true );
161+
/* Return the option for reading column names from the first record
162+
* @return useheaders Column names will be read if true
163+
*/
164+
bool useHeader()
165+
{
166+
return mUseHeader;
167+
}
168+
169+
/* Set the option for discarding empty fields
170+
* @param discardEmptyFields Empty fields will be discarded if true
171+
*/
172+
void setDiscardEmptyFields( bool discardEmptyFields = true );
173+
/* Return the option for discarding empty fields
174+
* @return discardEmptyFields Empty fields will be discarded if true
175+
*/
176+
bool discardEmptyFields()
177+
{
178+
return mDiscardEmptyFields;
179+
}
180+
181+
/* Set the option for trimming whitespace from fields
182+
* @param trimFields Fields will be trimmed if true
183+
*/
184+
void setTrimFields( bool trimFields = true );
185+
/* Return the option for trimming whitespace from fields
186+
* @return trimFields Fields will be trimmed if true
187+
*/
188+
bool trimFields()
189+
{
190+
return mTrimFields;
191+
}
192+
193+
/** Return the column names read from the header, or default names
194+
* Col## if none defined. Will open and read the head of the file
195+
* if required, then reset.
196+
*/
197+
QStringList &columnNames();
198+
199+
/** Reads the next record from the stream and splits it into string fields.
200+
* @param fields The string list to populate with the fields
201+
* @return status The result of trying to parse a record. RecordOk
202+
* if read successfully, RecordEOF if reached the end of the
203+
* file, RecordEmpty if no data on the next line, and
204+
* RecordInvalid if the record is ill-formatted.
205+
*/
206+
Status nextRecord( QStringList &fields );
207+
208+
/** Return the line number of the start of the last record read
209+
* @return linenumber The line number of the start of the record
210+
*/
211+
int recordLineNumber()
212+
{
213+
return mRecordLineNumber;
214+
}
215+
216+
/** Reset the file to reread from the beginning
217+
*/
218+
Status reset();
219+
220+
/** Return a string defining the type of the delimiter as a string
221+
* @return type The delimiter type as a string
222+
*/
223+
QString type();
224+
225+
/** Check that provider is valid (filename and definition valid)
226+
*
227+
* @return valid True if the provider is valid
228+
*/
229+
bool isValid();
230+
231+
/** Encode characters - used to convert delimiter/quote/escape characters to
232+
* encoded form (eg replace tab with \t)
233+
* @param string The unencoded string
234+
* @return encstring The encoded string
235+
*/
236+
static QString encodeChars( QString string );
237+
238+
/** Decode characters - used to convert encoded character strings to
239+
* decoded form (eg replace \t with tab)
240+
* @param string The encoded string
241+
* @return decstring The decoded string
242+
*/
243+
static QString decodeChars( QString string );
244+
245+
246+
247+
248+
private:
249+
250+
/** Open the file
251+
*
252+
* @return valid True if the file is successfully opened
253+
*/
254+
bool open();
255+
256+
/** Close the text file
257+
*/
258+
void close();
259+
260+
/** Reset the status if the definition is changing (eg clear
261+
* existing column names, etc.)
262+
*/
263+
void resetDefinition();
264+
265+
/** Parse regular expression delimited fields */
266+
Status parseRegexp( QStringList &fields );
267+
/** Parse quote delimited fields, where quote and escape are different */
268+
Status parseQuoted( QStringList &fields );
269+
270+
/** Return the next line from the data file. If skipBlank is true then
271+
* blank lines will be skipped - this is for compatibility with previous
272+
* delimited text parser implementation.
273+
*/
274+
Status nextLine( QString &buffer, bool skipBlank = false );
275+
276+
// Pointer to the currently selected parser
277+
Status( QgsDelimitedTextFile::*mParser )( QStringList &fields );
278+
279+
QString mFileName;
280+
QString mEncoding;
281+
QFile *mFile;
282+
QTextStream *mStream;
283+
284+
// Parameters common to parsers
285+
bool mDefinitionValid;
286+
DelimiterType mType;
287+
bool mUseHeader;
288+
bool mDiscardEmptyFields;
289+
bool mTrimFields;
290+
int mSkipLines;
291+
int mMaxFields;
292+
293+
// Parameters used by parsers
294+
QRegExp mDelimRegexp;
295+
QString mDelimChars;
296+
QString mQuoteChar;
297+
QString mEscapeChar;
298+
299+
// Information extracted from file
300+
QStringList mColumnNames;
301+
int mLineNumber;
302+
int mRecordLineNumber;
303+
};
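
The class comment in this header describes the behaviour this commit is about: inside a quoted field, delimiters and newlines are literal, and a doubled quote stands for one quote character. The implementation itself lives in qgsdelimitedtextfile.cpp, whose diff is not rendered here, so what follows is only a minimal sketch of that rule in standard C++ under the CSV defaults (comma delimiter, double quote as both quote and escape). `nextCsvRecord` is a hypothetical name, and the sketch does not model Qt streams, encodings, or the Status codes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not the QGIS implementation: reads one record from
// `text` starting at `pos` and advances `pos` past the record's terminating
// newline. Returns false when no input remains.
bool nextCsvRecord(const std::string &text, std::size_t &pos,
                   std::vector<std::string> &fields,
                   char delim = ',', char quote = '"')
{
    fields.clear();
    if (pos >= text.size())
        return false;

    std::string field;
    bool inQuotes = false;
    while (pos < text.size())
    {
        const char c = text[pos++];
        if (inQuotes)
        {
            if (c != quote)
            {
                field += c; // delimiters and newlines are literal in quotes
            }
            else if (pos < text.size() && text[pos] == quote)
            {
                field += quote; // doubled quote is one literal quote
                ++pos;
            }
            else
            {
                inQuotes = false; // closing quote
            }
        }
        else if (c == quote)
        {
            inQuotes = true; // opening quote
        }
        else if (c == delim)
        {
            fields.push_back(field); // delimiter ends the current field
            field.clear();
        }
        else if (c == '\n')
        {
            break; // an unquoted newline ends the record
        }
        else if (c != '\r')
        {
            field += c; // carriage returns outside quotes are dropped
        }
    }
    fields.push_back(field);
    return true;
}
```

Because the record boundary is decided while scanning characters rather than by reading a line first, a record may span several physical lines. That is presumably why the class exposes recordLineNumber() separately from its internal line counter.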

‎src/providers/delimitedtext/qgsdelimitedtextprovider.cpp

Lines changed: 377 additions & 298 deletions
Large diffs are not rendered by default.

‎src/providers/delimitedtext/qgsdelimitedtextprovider.h

Lines changed: 40 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -23,31 +23,47 @@
2323

2424
class QgsFeature;
2525
class QgsField;
26+
class QgsGeometry;
27+
class QgsPoint;
2628
class QFile;
2729
class QTextStream;
2830

2931
class QgsDelimitedTextFeatureIterator;
32+
class QgsDelimitedTextFile;
3033

3134

3235
/**
3336
\class QgsDelimitedTextProvider
3437
\brief Data provider for delimited text files.
3538
*
3639
* The provider needs to know both the path to the text file and
37-
* the delimiter to use. Since the means to add a layer is farily
40+
* the delimiter to use. Since the means to add a layer is fairly
3841
* rigid, we must provide this information encoded in a form that
3942
* the provider can decipher and use.
40-
* The uri must contain the path and delimiter in this format:
41-
* /full/path/too/delimited.txt?delimiter=<delimiter>
4243
*
43-
* Example uri = "/home/foo/delim.txt?delimiter=|"
44+
* The uri must define the file path and the parameters used to
45+
* interpret the contents of the file.
46+
*
47+
* Example uri = "/home/foo/delim.txt?delimiter=|"
48+
*
49+
* For detailed information on the uri format see the QgsVectorLayer
50+
* documentation.
51+
*
52+
4453
*/
4554
class QgsDelimitedTextProvider : public QgsVectorDataProvider
4655
{
4756
Q_OBJECT
4857

4958
public:
5059

60+
/**
61+
* Regular expression defining possible prefixes to WKT string,
62+
* (EWKT srid, Informix SRID)
63+
*/
64+
static QRegExp WktPrefixRegexp;
65+
static QRegExp CrdDmsRegexp;
66+
5167
QgsDelimitedTextProvider( QString uri = QString() );
5268

5369
virtual ~QgsDelimitedTextProvider();
@@ -149,23 +165,26 @@ class QgsDelimitedTextProvider : public QgsVectorDataProvider
149165
*/
150166
bool boundsCheck( QgsGeometry *geom );
151167

152-
153-
static QString readLine( QTextStream *stream );
154-
static QStringList splitLine( QString line, QString delimiterType, QString delimiter );
155-
156168
private:
157169

170+
static QRegExp WktZMRegexp;
171+
static QRegExp WktCrdRegexp;
172+
173+
void clearInvalidLines();
174+
void recordInvalidLine( QString message );
158175
void handleInvalidLines();
176+
void resetStream();
177+
178+
QgsGeometry *geomFromWkt( QString &sWkt );
179+
bool pointFromXY( QString &sX, QString &sY, QgsPoint &point );
180+
double dmsStringToDouble( const QString &sX, bool *xOk );
181+
//! Text file
182+
QgsDelimitedTextFile *mFile;
159183

160-
//! Fields
184+
// Fields
161185
QList<int> attributeColumns;
162186
QgsFields attributeFields;
163187

164-
QString mFileName;
165-
QString mDelimiter;
166-
QRegExp mDelimiterRegexp;
167-
QString mDelimiterType;
168-
169188
int mFieldCount; // Note: this includes field count for wkt field
170189
int mXFieldIndex;
171190
int mYFieldIndex;
@@ -174,30 +193,27 @@ class QgsDelimitedTextProvider : public QgsVectorDataProvider
174193
// Handling of WKT types with .. Z, .. M, and .. ZM geometries (ie
175194
// Z values and/or measures). mWktZMRegexp is used to test for and
176195
// remove the Z or M fields, and mWktCrdRegexp is used to remove the
177-
// extra coordinate values.
196+
// extra coordinate values. mWktPrefix regexp is used to clean up
197+
// prefixes sometimes used for WKT (postgis EWKT, informix SRID)
178198

179199
bool mWktHasZM;
180-
QRegExp mWktZMRegexp;
181-
QRegExp mWktCrdRegexp;
200+
bool mWktHasPrefix;
182201

183202
//! Layer extent
184203
QgsRectangle mExtent;
185204

186-
//! Text file
187-
QFile *mFile;
188-
189-
QTextStream *mStream;
190-
191205
bool mValid;
192206

193207
int mGeomType;
194208

195209
long mNumberFeatures;
196210
int mSkipLines;
197-
int mFirstDataLine; // Actual first line of data (accounting for blank lines)
198211
QString mDecimalPoint;
212+
bool mXyDms;
199213

200214
//! Storage for any lines in the file that couldn't be loaded
215+
int mMaxInvalidLines;
216+
int mNExtraInvalidLines;
201217
QStringList mInvalidLines;
202218
//! Only want to show the invalid lines once to the user
203219
bool mShowInvalidLines;
@@ -215,8 +231,7 @@ class QgsDelimitedTextProvider : public QgsVectorDataProvider
215231
QgsCoordinateReferenceSystem mCrs;
216232

217233
QGis::WkbType mWkbType;
218-
219-
QStringList splitLine( QString line ) { return splitLine( line, mDelimiterType, mDelimiter ); }
234+
QGis::GeometryType mGeometryType;
220235

221236
friend class QgsDelimitedTextFeatureIterator;
222237
QgsDelimitedTextFeatureIterator* mActiveIterator;

‎src/providers/delimitedtext/qgsdelimitedtextsourceselect.cpp

Lines changed: 487 additions & 274 deletions
Large diffs are not rendered by default.

‎src/providers/delimitedtext/qgsdelimitedtextsourceselect.h

Lines changed: 20 additions & 3 deletions
@@ -20,6 +20,7 @@
2020
#include "qgisgui.h"
2121

2222
class QgisInterface;
23+
class QgsDelimitedTextFile;
2324

2425
/**
2526
* \class QgsDelimitedTextSourceSelect
@@ -35,18 +36,34 @@ class QgsDelimitedTextSourceSelect : public QDialog, private Ui::QgsDelimitedTex
3536
QStringList splitLine( QString line );
3637

3738
private:
38-
bool haveValidFileAndDelimiters();
39+
bool loadDelimitedFileDefinition();
3940
void updateFieldLists();
4041
void getOpenFileName();
41-
QStringList selectedChars();
42+
QString selectedChars();
43+
void setSelectedChars( QString delimiters );
44+
void loadSettings( QString subkey = QString(), bool loadGeomSettings = true );
45+
void saveSettings( QString subkey = QString(), bool saveGeomSettings = true );
46+
void loadSettingsForFile( QString filename );
47+
void saveSettingsForFile( QString filename );
48+
bool trySetXYField( QStringList &fields, QList<bool> &isValidNumber, QString xname, QString yname );
49+
50+
private:
51+
QgsDelimitedTextFile *mFile;
52+
int mExampleRowCount;
53+
QString mColumnNamePrefix;
54+
QString mPluginKey;
4255

4356
private slots:
4457
void on_buttonBox_accepted();
4558
void on_buttonBox_rejected();
46-
void on_buttonBox_helpRequested() { QgsContextHelp::run( metaObject()->className() ); }
59+
void on_buttonBox_helpRequested()
60+
{
61+
QgsContextHelp::run( metaObject()->className() );
62+
}
4763
void on_btnBrowseForFile_clicked();
4864

4965
public slots:
66+
void updateFileName();
5067
void updateFieldsAndEnable();
5168
void enableAccept();
5269

‎src/ui/qgsdelimitedtextsourceselectbase.ui

Lines changed: 827 additions & 366 deletions
Large diffs are not rendered by default.

‎tests/src/python/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ ADD_PYTHON_TEST(PyQgsVectorLayer test_qgsvectorlayer.py)
66
ADD_PYTHON_TEST(PyQgsRasterLayer test_qgsrasterlayer.py)
77
ADD_PYTHON_TEST(PyQgsRasterFileWriter test_qgsrasterfilewriter.py)
88
ADD_PYTHON_TEST(PyQgsMemoryProvider test_qgsmemoryprovider.py)
9+
ADD_PYTHON_TEST(PyQgsDelimitedTextProvider test_qgsdelimitedtextprovider.py)
910
ADD_PYTHON_TEST(PyQgsLogger test_qgslogger.py)
1011
ADD_PYTHON_TEST(PyQgsCoordinateTransform test_qgscoordinatetransform.py)
1112
ADD_PYTHON_TEST(PyQgsRectangle test_qgsrectangle.py)

‎tests/src/python/test_qgsdelimitedtextprovider.py

Lines changed: 874 additions & 0 deletions
Large diffs are not rendered by default.

‎tests/testdata/delimitedtext/test.csv

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
id,"description",data,info
2+
1,Basic unquoted record,Some data,Some info
3+
2,Quoted field,"Quoted data",Unquoted
4+
3,Escaped quotes,"Quoted ""citation"" data",Unquoted
5+
4,Quoted newlines,"Line 1
6+
Line 2
7+
8+
Line 4",No data
9+
5,Extra fields,data,info,message,,,
10+
6,Missing fields
11+
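
The test.csv fixture above covers the case named in the commit title: a quoted field that spans several lines, alongside doubled-quote escapes and records with extra or missing fields. As a rough illustration of the expected record boundaries, the sketch below reads a copy of that data with Python's standard csv module. It is not the QgsDelimitedTextFile parser added by this commit, and the names SAMPLE and read_records are made up for the example.

```python
import csv
import io

# Copy of the records in tests/testdata/delimitedtext/test.csv.
SAMPLE = (
    'id,"description",data,info\n'
    '1,Basic unquoted record,Some data,Some info\n'
    '2,Quoted field,"Quoted data",Unquoted\n'
    '3,Escaped quotes,"Quoted ""citation"" data",Unquoted\n'
    '4,Quoted newlines,"Line 1\n'
    'Line 2\n'
    '\n'
    'Line 4",No data\n'
    '5,Extra fields,data,info,message,,,\n'
    '6,Missing fields\n'
)


def read_records(text):
    """Parse CSV text, keeping newlines that occur inside quoted fields."""
    # newline="" stops the stream from translating line endings, so the
    # csv module sees the embedded line breaks exactly as written.
    return list(csv.reader(io.StringIO(text, newline="")))


records = read_records(SAMPLE)
header, rows = records[0], records[1:]

for row in rows:
    # A record may have more or fewer fields than the header.
    print(len(row), row)
```

Record 4 occupies four physical lines but comes back as a single record, which is why a provider that splits the file line by line before parsing cannot handle this input.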
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
id|description|data|info
2+
1|Using pipe delimiter|data 1|info 1
3+
2|Using backslash escape on pipe|data 2 \| piped|info2
4+
3|Backslash escaped newline|data3 \
5+
line2 \
6+
line3|info3
7+
4|Empty field||info4
8+
5|Quoted field|"More | piped data"|info5
9+
6|Escaped quote|"Field \"citation\" "|info6
10+
7|Missing fields|
11+
8|Extra fields|data8|info8|message8|more|||
12+
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
id,description,data,info
2+
1,Multiple quotes 1,"Quoted,data1",info1
3+
2,Multiple quotes 2,'Quoted,data2',info2
4+
3,Leading and following whitespace, 'Quoted, data3' ,info3
5+
4,Embedded quotes 1,"Quoted ''""'' data4",info4
6+
5,Embedded quotes 2,'Quoted ''""'' data5',info5
7+
6,Invalid quotes 1,Quote ' too late',info6
8+
7,Invalid quotes 2,'End quote too 'soon,info7
9+
8,Invalid quotes 3,'Embedded unescaped' quote',info8
10+
9,Final record,date9,info9
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
id description data info
2+
1 Simple_whitespace_file data1 info1
3+
2 Whitespace_at_start_of_line data2 info2
4+
3 Tab_whitespace data3 info3
5+
4 Multiple_whitespace_characters data4 info4
6+
5 Extra_fields data5 info5 message5 rubbish5
7+
6 Missing_fields
8+
9+
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
1,No headers,Some data 1,Some info
2+
2,Extra data,"Data 2",info2,message2
3+
3,Less data,data3
4+
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
id,description,lon,lat
2+
3+
1,"Basic DMS string","1 5 30.6","35 51 20"
4+
2,"Basic DMS string 2","1 05 30.6005","035 51 20"
5+
3,"Basic DMS string 3","1 05 30.6","35 59 9.99"
6+
7+
4,"Prefix sign 1","n1 05 30.6","e035 51 20"
8+
5,"Prefix sign 2","N1 05 30.6","E035 51 20"
9+
6,"Prefix sign 3","N 1 05 30.6","E 035 51 20"
10+
7,"Prefix sign 4","S1 05 30.6","W035 51 20"
11+
8,"Prefix sign 5","+1 05 30.6","+035 51 20"
12+
9,"Prefix sign 6","-1 05 30.6","-035 51 20"
13+
14+
10,"Postfix sign 1","1 05 30.6n","035 51 20e"
15+
11,"Postfix sign 2","1 05 30.6N","035 51 20E"
16+
12,"Postfix sign 3","1 05 30.6 N","035 51 20 E"
17+
13,"Postfix sign 4","1 05 30.6S","035 51 20W"
18+
14,"Postfix sign 5","1 05 30.6+","035 51 20+"
19+
15,"Postfix sign 6","1 05 30.6-","035 51 20-"
20+
21+
16,"Leading and trailing blanks 1"," 1 05 30.6","035 51 20 "
22+
17,"Leading and trailing blanks 2"," N 1 05 30.6","035 51 20 E "
23+
24+
18,"Alternative characters for D,M,S","1d05m30.6s S","35d51'20"
25+
19,"Degrees/minutes format",1 05.23,4 55.03
26+
27+
20,"Invalid DMS 1",1 65 30.6,35 51 20
28+
21,"Invalid DMS 2",1 05 30.6,35 61 20
29+
22,"Invalid DMS 3",1 05 30.6,35 51 200
30+
23,"Invalid DMS 4",1 05 30.6,35 51 020
31+
24,"Invalid DMS 5",12.234,35 51 20
32+
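
The fixture above lists degree/minute/second strings with a sign or hemisphere letter before or after the numbers, plus several values that should be rejected. The following sketch shows one way such strings could be converted to decimal degrees. It is an assumption-laden illustration, not the provider's dmsStringToDouble or its CrdDmsRegexp: the pattern and the name dms_to_degrees are hypothetical, and only whitespace-separated values are handled (the "1d05m30.6s" and degrees/decimal-minutes rows are deliberately out of scope and return None).

```python
import re

# Hypothetical pattern: optional sign or hemisphere letter before OR after
# whitespace-separated degrees, minutes and optional seconds.
_DMS = re.compile(
    r"^\s*(?P<pre>[+\-nsewNSEW])?\s*"
    r"(?P<deg>\d{1,3})\s+"
    r"(?P<min>\d{1,2})"
    r"(?:\s+(?P<sec>\d{1,2}(?:\.\d+)?))?"
    r"\s*(?P<post>[+\-nsewNSEW])?\s*$"
)


def dms_to_degrees(text):
    """Return decimal degrees for a DMS string, or None if it is not valid."""
    m = _DMS.match(text)
    if m is None:
        return None
    if m.group("pre") and m.group("post"):
        return None  # a sign on both sides is ambiguous
    minutes = int(m.group("min"))
    seconds = float(m.group("sec") or 0)
    if minutes >= 60 or seconds >= 60:
        return None
    value = int(m.group("deg")) + minutes / 60 + seconds / 3600
    sign = (m.group("pre") or m.group("post") or "+").upper()
    # South and west (and an explicit minus) are negative.
    return -value if sign in "-SW" else value


for sample in ["1 5 30.6", "S1 05 30.6", "035 51 20e", "1 65 30.6", "12.234"]:
    print(repr(sample), "->", dms_to_degrees(sample))
```

Rejecting minutes or seconds of 60 or more, and seconds written with three digits, matches the intent of the "Invalid DMS" rows, though the real provider may draw the line differently.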
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
id,description,geom_x,geom_y
2+
1,Basic point,10.0,20.0
3+
2,Integer point,11,22
4+
3,Invalid coordinate format,ten,20.0
5+
4,Final point,13.0,23.0
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
id|description|geom_wkt
2+
1|Point wkt|POINT(10 20)
3+
2|Multipoint wkt|MULTIPOINT(10 20,11 21)
4+
3|Linestring wkt|LINESTRING(10 20,11 21)
5+
4|Multiline string wkt|MULTILINESTRING((10 20,11 21),(20 30,21 31))
6+
5|Polygon wkt|POLYGON((10 10,10 20,20 20,20 10,10 10),(14 14,14 16,16 16,14 14))
7+
6|MultiPolygon wkt|MULTIPOLYGON(((10 10,10 20,20 20,20 10,10 10),(14 14,14 16,16 16,14 14)),((30 30,30 35,35 35,30 30)))
8+
7|Invalid wkt|POINT(10)
9+
8|EWKT prefix|SRID=1234;POINT(10 10)
10+
9|Informix prefix|1 POINT(10 10)
11+
10|Measure in point|POINTM(10 20 30)
12+
11|Measure in line|LINESTRING(10.0 20.0 30.0, 11.0 21.0 31.0)
13+
12|Z in line|LINESTRING Z(10.0 20.0 30.0, 11.0 21.0 31.0)
14+
13|Measure and Z in line|LINESTRING ZM(10.0 20.0 30.0 40.0, 11.0 21.0 31.0 41.0)
15+
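
Rows 8 and 9 of the WKT fixture above carry the two prefixes that the header comment for WktPrefixRegexp mentions: a PostGIS EWKT "SRID=1234;" prefix and an Informix-style leading integer. A minimal sketch of stripping such prefixes before handing the text to a WKT parser might look like this. The pattern below is an assumption made for illustration and is not copied from the provider source; WKT_PREFIX and strip_wkt_prefix are hypothetical names.

```python
import re

# Hypothetical pattern: either "SRID=<digits>;" (PostGIS EWKT) or a bare
# leading integer followed by whitespace (Informix-style SRID).
WKT_PREFIX = re.compile(r"^\s*(?:\d+\s+|SRID=\d+;)", re.IGNORECASE)


def strip_wkt_prefix(wkt):
    """Remove a leading SRID prefix, if any, and surrounding whitespace."""
    return WKT_PREFIX.sub("", wkt, count=1).strip()


for sample in ["SRID=1234;POINT(10 10)", "1 POINT(10 10)", "POINT(10 20)"]:
    print(repr(sample), "->", repr(strip_wkt_prefix(sample)))
```

Strings without a prefix pass through unchanged, so the same call can be applied to every row of the file.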
