<a href="#wkt">How WKT text is interpreted</a><br />
9
+
<a href="#attributes">Attributes in delimited text files</a><br />
9
10
<a href="#example">Example of a text file with X,Y point coordinates</a><br/>
10
11
<a href="#wkt_example">Example of a text file with WKT geometries</a><br/>
11
12
<a href="#python">Using delimited text layers in Python</a><br/>
@@ -68,15 +69,15 @@ It is safer to use an explicit coding if the QGis project needs to be portable.
68
69
affects the alignment of data into fields and is equivalent to treating consecutive delimiters as a
69
70
single delimiter. Quoted fields are never discarded.</li>
70
71
<li>Decimal point is comma: if selected then commas are used as the decimal point in real numbers. For
71
-
example "-51,354" is equivalent to -51.354.
72
+
example <tt>-51,354</tt> is equivalent to -51.354.
72
73
</li>
73
74
</ul>
74
75
<h5>Geometry definition</h5>
75
76
<p>The geometry can be defined as one of:</p>
76
77
<ul>
77
78
<li>Point coordinates: each feature is represented as a point defined by X and Y coordinates.</li>
78
79
<li>Well known text (WKT) geometry: each feature is represented as a well known text string, for example
79
-
"POINT(1.525622 51.20836)". See details of the <a href="#wkt">well known text</a> format.
80
+
<tt>POINT(1.525622 51.20836)</tt>. See details of the <a href="#wkt">well known text</a> format.</li>
80
81
<li>No geometry (attribute only table): records will not be displayed on the map, but can be viewed
81
82
in the attribute table and joined to other layers in QGis</li>
82
83
</ul>
@@ -88,7 +89,7 @@ It is safer to use an explicit coding if the QGis project needs to be portable.
88
89
or degrees/minutes. QGis is quite permissive in its interpretation of degrees/minutes/seconds.
89
90
A valid DMS coordinate will contain three numeric fields with an optional hemisphere prefix or suffix
90
91
(N, E, or + are positive, S, W, or - are negative). Additional non numeric characters are
91
-
generally discarded. For example "N41d54'01.54"" is a valid coordinate.
92
+
generally discarded. For example <tt>N41d54'01.54"</tt> is a valid coordinate.
92
93
</li>
93
94
</ul>
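<p>The degrees/minutes/seconds rules above can be sketched in Python. This is only an illustration of the described behaviour, not the QGis implementation: the function name <tt>dms_to_decimal</tt> is hypothetical, and the handling of malformed input is an assumption.</p>

```python
import re


def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds string to decimal degrees (sketch)."""
    stripped = text.strip()
    # The hemisphere or sign may be a prefix or a suffix.
    # S, W and - are negative; N, E and + are positive.
    negative = bool(
        re.match(r"[SsWw-]", stripped) or re.search(r"[SsWw-]$", stripped)
    )
    # Keep the numeric fields and discard everything else (d, ', " and so on).
    fields = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", stripped)]
    if len(fields) != 3:
        # Assumption: anything other than three numeric fields is rejected.
        raise ValueError("expected degrees, minutes and seconds: %r" % text)
    degrees, minutes, seconds = fields
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if negative else value
```

<p>For example, <tt>dms_to_decimal("W175 30 00")</tt> gives -175.5.</p>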
94
95
<p>For well known text geometry the following options apply:</p>
@@ -104,33 +105,41 @@ It is safer to use an explicit coding if the QGis project needs to be portable.
104
105
</ul>
105
106
106
107
<h4><a name="csv">How the delimiter, quote, and escape characters work</a></h4>
107
-
<p>Records are split into fields using three character sets: delimiter characters, quote characters,
108
-
and escape characters. Quote and escape characters cannot be the same as delimiter characters - they
108
+
<p>Records are split into fields using three character sets:
109
+
delimiter characters, quote characters, and escape characters.
110
+
Other characters in the record are considered as data, split into
111
+
fields by delimiter characters.
112
+
Quote characters occur in pairs and cause the text between them to be treated as data.
Escape characters cause the character following them to be treated as data.
113
+
</p>
114
+
<p>
115
+
Quote and escape characters cannot be the same as delimiter characters - they
109
116
will be ignored if they are. Escape characters can be the same as quote characters, but behave differently
110
117
if they are.</p>
111
118
<p>The delimiter characters are used to mark the end of each field. If more than one delimiter character
112
119
is defined then any one of the characters can mark the end of a field. The quote and escape characters
113
-
can override the delimiter character, so that it is treated as a normal character.</p>
120
+
can override the delimiter character, so that it is treated as a normal data character.</p>
114
121
<p>Quote characters may be used to mark the beginning and end of quoted fields. Quoted fields can
115
122
contain delimiters and may span multiple lines in the text file. If a field is quoted then it must
116
123
start and end with the same quote character. Quote characters cannot occur within a field unless they
117
124
are escaped.</p>
118
-
<p>Escape characters which are not quote characters force the following character to be treated normally
125
+
<p>Escape characters which are not quote characters force the following character to be treated as data
119
126
(that is, to stop it being treated as a new line, delimiter, or quote character).
120
127
</p>
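<p>The roles of the delimiter, quote, and escape characters can be illustrated with Python's standard <tt>csv</tt> module. This is an analogy only: it is not the parser QGis uses, the helper name <tt>split_record</tt> is hypothetical, and a backslash escape character is an assumption chosen so that the escape differs from the quote.</p>

```python
import csv
import io


def split_record(text, delimiter=";", quote='"', escape="\\"):
    """Split one record into fields using Python's csv module (sketch)."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar=quote,
        escapechar=escape,
    )
    return next(reader)


# A quoted field may contain the delimiter.
print(split_record('a;"b;c";d'))
# An escaped delimiter is treated as data rather than as the end of a field.
print(split_record("a\\;b;c"))
```

<p>The first call returns the three fields <tt>a</tt>, <tt>b;c</tt>, and <tt>d</tt>; the second returns the two fields <tt>a;b</tt> and <tt>c</tt>.</p>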
121
-
<p>If a quote character is also an escape character, then it can be represented in a quoted field by
122
-
entering it twice. For example if ' is a quote character and an escape character, then the string
123
-
'Smith''s Creek' will represent the value Smith's Creek.
128
+
<p>Escape characters that are also quote characters have a much more limited effect. They only apply within quotes and only escape themselves. For example, if
129
+
<tt>'</tt> is a quote and escape character, then the string
130
+
<tt>'Smith''s Creek'</tt> will represent the value Smith's Creek.
<p>Regular expressions are a mini-language used to represent character patterns. There are many variations
127
136
of regular expression syntax - QGis uses the syntax provided by the <a href="http://qt-project.org/doc/qt-4.8/qregexp.html">QRegExp</a> class of the <a href="http://qt.digia.com">Qt</a> framework.</p>
128
137
<p>In a regular expression delimited file each line is treated as a record. Each match of the regular expression in the line is treated as the end of a field.
129
-
If the regular expression contains grouped expressions (eg "(cat|dog)")
138
+
If the regular expression contains capture groups (eg <tt>(cat|dog)</tt>)
130
139
then these are extracted as fields.
131
-
If this is not desired then use non-capturing groups eg "(?:cat|dog)".
140
+
If this is not desired then use non-capturing groups (eg <tt>(?:cat|dog)</tt>).
132
141
</p>
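<p>Python's <tt>re.split</tt> treats capture groups in a similar way, so it can serve as a rough illustration of the difference between capturing and non-capturing groups. Note that this uses Python's regular expression syntax rather than QRegExp, and the helper name <tt>split_fields</tt> is hypothetical.</p>

```python
import re


def split_fields(line, pattern):
    """Split a line on every match of pattern (sketch using Python's re)."""
    # re.split also returns the text of any capture groups in the pattern.
    return re.split(pattern, line)


# Capture group: the matched delimiter text is kept as a field.
print(split_fields("1cat2dog3", r"(cat|dog)"))    # ['1', 'cat', '2', 'dog', '3']
# Non-capturing group: only the text between matches is kept.
print(split_fields("1cat2dog3", r"(?:cat|dog)"))  # ['1', '2', '3']
```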
133
-
<p>The regular expression is treated differently if it is anchored to the start of the line (that is, the pattern starts with "^".
142
+
<p>The regular expression is treated differently if it is anchored to the start of the line (that is, the pattern starts with <tt>^</tt>).
134
143
In this case the regular expression is matched against each line. If the line does not match it is discarded
135
144
as an invalid record. Each capture group in the expression is treated as a field. The regular expression
136
145
is invalid if it does not have capture groups. As an example this can be used as a (somewhat
@@ -143,19 +152,46 @@ expression
143
152
Lines less than 45 characters long will be discarded.
144
153
</p>
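<p>The anchored behaviour can be sketched in Python as follows. The function name <tt>parse_anchored</tt> and the pattern used in the usage note are hypothetical, and Python's <tt>re</tt> syntax stands in for QRegExp.</p>

```python
import re


def parse_anchored(lines, pattern):
    """Match an anchored pattern against each line (sketch)."""
    regex = re.compile(pattern)
    if regex.groups == 0:
        # An anchored expression without capture groups yields no fields.
        raise ValueError("anchored pattern must contain capture groups")
    records = []
    for line in lines:
        match = regex.match(line)
        if match is None:
            # Lines that do not match are discarded as invalid records.
            continue
        # Each capture group becomes one field of the record.
        records.append(list(match.groups()))
    return records
```

<p>For example, <tt>parse_anchored(["abcdefgh", "ab"], r"^(.{3})(.{4})")</tt> keeps the first line as the fields <tt>abc</tt> and <tt>defg</tt>, and discards the second line because it is too short to match.</p>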
145
154
155
+
146
156
<h4><a name="wkt">How WKT text is interpreted</a></h4>
147
157
<p>
148
158
The delimited text layer recognizes the following
149
159
<a href="http://en.wikipedia.org/wiki/Well-known_text">well known text</a> types -
150
-
POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, and MULTIPOLYGON. It will accept geometries with
151
-
a Z coordinate (eg "POINT Z"), a measure ("POINT M"), or both ("POINT ZM").
160
+
<tt>POINT</tt>, <tt>MULTIPOINT</tt>, <tt>LINESTRING</tt>, <tt>MULTILINESTRING</tt>, <tt>POLYGON</tt>, and <tt>MULTIPOLYGON</tt>.
161
+
It will accept geometries with
162
+
a Z coordinate (eg <tt>POINT Z</tt>), a measure (<tt>POINT M</tt>), or both (<tt>POINT ZM</tt>).
152
163
</p>
153
164
<p>
154
165
It can also handle the PostGIS EWKT variation, in which the geometry is preceded by a spatial reference
155
-
system id (eg "SRID=4326;POINT(175.3 41.2)"), and a variant used by Informix in which the WKT is
156
-
preceded by an integer spatial reference id (eg "1 POINT(175.3 41.2)").
166
+
system id (eg <tt>SRID=4326;POINT(175.3 41.2)</tt>), and a variant used by Informix in which the WKT is
167
+
preceded by an integer spatial reference id (eg <tt>1 POINT(175.3 41.2)</tt>).
157
168
In both cases the SRID is ignored.
158
169
</p>
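<p>As a rough sketch of what ignoring the SRID means, the following Python function removes either kind of prefix and leaves plain WKT. The name <tt>strip_srid</tt> and the exact prefix pattern are assumptions made for illustration; the actual QGis parsing may differ.</p>

```python
import re

# Matches an EWKT prefix such as "SRID=4326;" or an Informix style
# integer prefix such as "1 " at the start of the string.
_SRID_PREFIX = re.compile(r"^\s*(?:SRID=\d+;|\d+\s+)", re.IGNORECASE)


def strip_srid(wkt):
    """Return the WKT text with any leading SRID prefix removed (sketch)."""
    return _SRID_PREFIX.sub("", wkt, count=1).strip()


print(strip_srid("SRID=4326;POINT(175.3 41.2)"))  # POINT(175.3 41.2)
print(strip_srid("1 POINT(175.3 41.2)"))          # POINT(175.3 41.2)
```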
170
+
171
+
172
+
173
+
<h4><a name="attributes">Attributes in delimited text files</a></h4>
174
+
<p>Each record in the delimited text file is split into fields representing
175
+
attributes of the record. Usually the attribute names are taken from the first
176
+
data record in the file. However if this does not contain attribute names, then they will be named <tt>field_1</tt>, <tt>field_2</tt>, and so on. QGis may override
177
+
the names in the text file if they are numbers, or have names like <tt>field_#</tt>,
178
+
or are duplicated.
179
+
</p>
180
+
<p>
181
+
In addition to the attributes explicitly in the data file, QGis assigns a unique
182
+
feature id to each record. This is the line number in the source file on which
183
+
the record starts.
184
+
</p>
185
+
<p>
186
+
Each attribute also has a data type, one of string (text), integer, or real number.
187
+
The data type is inferred from the content of the fields - if every non blank value
188
+
is a valid integer then the type is integer, otherwise if every such value is a valid real
189
+
number then the type is real, otherwise the type is string. Note that this is
190
+
based on the content of the fields - quoting fields does not change the way they
191
+
are interpreted.
192
+
</p>
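<p>The type inference rule can be sketched in Python. The function name <tt>infer_type</tt>, the patterns used to recognise numbers, and the result for a column with no non blank values are all assumptions; QGis may accept a different range of number formats.</p>

```python
import re

# Assumed number formats, for illustration only.
_INTEGER = re.compile(r"[+-]?\d+")
_REAL = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def infer_type(values):
    """Infer the attribute type from the text of every field value (sketch)."""
    non_blank = [v.strip() for v in values if v.strip()]
    if not non_blank:
        # Assumption: a column with no non blank values is treated as string.
        return "string"
    if all(_INTEGER.fullmatch(v) for v in non_blank):
        return "integer"
    if all(_REAL.fullmatch(v) for v in non_blank):
        return "real"
    return "string"


print(infer_type(["1", "2", ""]))  # integer
print(infer_type(["1", "2.5"]))    # real
print(infer_type(["1", "abc"]))    # string
```

<p>The function sees only the text content of each field, which mirrors the point that quoting a field does not change its type.</p>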
193
+
194
+
159
195
<h4><a name="example">Example of a text file with X,Y point coordinates</a></h4>
160
196
<pre>
161
197
X;Y;ELEV<br />
@@ -167,7 +203,6 @@ X;Y;ELEV<br />
167
203
<ul>
168
204
<li> Uses <b>;</b> as delimiter. Any character can be used to delimit the fields.</li>
169
205
<li>The first row is the header row. It contains the field names X, Y and ELEV.</li>
170
-
<li>No quotes (") are used to delimit text fields.</li>
171
206
<li>The x coordinates are contained in the X field.</li>
172
207
<li>The y coordinates are contained in the Y field.</li>
<p>This could be used to load the second example file above.</p>
209
246
<p>The configuration of the delimited text layer is defined by adding query items to the uri.
210
247
The following options can be added
211
248
</p>
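<p>As a minimal sketch of adding query items to a uri, the options can be URL encoded with the Python standard library. The helper name <tt>build_uri</tt> and the file path in the usage note are hypothetical, and percent encoding of characters such as <tt>;</tt> is an assumption about what the provider accepts.</p>

```python
from urllib.parse import urlencode


def build_uri(path, **options):
    """Build a file uri with the given options as query items (sketch)."""
    # urlencode percent-encodes characters such as ; and : in option values.
    return "file://" + path + "?" + urlencode(options)
```

<p>For example, <tt>build_uri("/tmp/points.csv", delimiter=";", xField="X", yField="Y")</tt> produces a uri whose query items are <tt>delimiter</tt>, <tt>xField</tt>, and <tt>yField</tt>.</p>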
212
249
<ul>
213
-
<li><i>encoding=..</i> defines the file encoding. The default is "UTF-8"</li>
214
-
<li><i>type=(csv|regexp|whitespace)</i> defines the delimiter type. Valid values are csv,
215
-
regexp, and whitespace (which is just a special case of regexp). Default is csv.</li>
216
-
<li><i>delimiter=...</i> defines the delimiters that will be used for csv formatted files,
217
-
or the regular expression for regexp formatted files. Default is , for CSV files. There is
250
+
<li><tt>encoding=..</tt> defines the file encoding. The default is "UTF-8"</li>
251
+
<li><tt>type=(csv|regexp|whitespace)</tt> defines the delimiter type. Valid values are csv,
252
+
regexp, and whitespace (which is just a special case of regexp). The default is csv.</li>
253
+
<li><tt>delimiter=...</tt> defines the delimiters that will be used for csv formatted files,
254
+
or the regular expression for regexp formatted files. The default is , for CSV files. There is
218
255
no default for regexp files.</li>
219
-
<li><i>quote=..</i> (for csv files) defines the characters used to quote fields. Default is "</li>
220
-
<li><i>escape=..</i> (for csv files) defines the characters used to escape the special meaning of the next character. Default is "</li>
221
-
<li><i>skipLines=#</i> defines the number of lines to discard from the beginning of the file. Default is 0.</li>
222
-
<li><i>useHeader=(yes|no)</i> defines whether the first data record contains the names of the data fields. Default is yes.</li>
223
-
<li><i>trimFields=(yes|no)</i> defines whether leading and trailing whitespace is to be removed from unquoted fields. Default is no.</li>
224
-
<li><i>maxFields=#</i> defines the maximum number of fields that will be loaded from the file.
225
-
Additional fields in each record will be discarded. Default is 0 - display all fields.
256
+
<li><tt>quote=..</tt> (for csv files) defines the characters used to quote fields. The default is "</li>
257
+
<li><tt>escape=..</tt> (for csv files) defines the characters used to escape the special meaning of the next character. The default is "</li>
258
+
<li><tt>skipLines=#</tt> defines the number of lines to discard from the beginning of the file. The default is 0.</li>
259
+
<li><tt>useHeader=(yes|no)</tt> defines whether the first data record contains the names of the data fields. The default is yes.</li>
260
+
<li><tt>trimFields=(yes|no)</tt> defines whether leading and trailing whitespace is to be removed from unquoted fields. The default is no.</li>
261
+
<li><tt>maxFields=#</tt> defines the maximum number of fields that will be loaded from the file.
262
+
Additional fields in each record will be discarded. The default is 0 - include all fields.
226
263
(This option is not available from the delimited text layer dialog box).</li>
227
-
<li><i>skipEmptyFields=(yes|no)</i> defines whether empty unquoted fields will be discarded if they are empty (applied after trimFields). Default is no.</li>
228
-
<li><i>decimalPoint=.</i> specifies an alternative character that may be used as a decimal point in numeric fields. Default is a point (full stop) character.</li>
229
-
<li><i>wktField=fieldname</i> specifies the name or number (starting at 1) of the field containing a well known text geometry definition</li>
230
-
<li><i>xField=fieldname</i> specifies the name or number (starting at 1) of the field the X coordinate (only applies if wktField is not defined)</li>
231
-
<li><i>yField=fieldname</i> specifies the name or number (starting at 1) of the field the Y coordinate (only applies if wktField is not defined)</li>
232
-
<li><i>geomType=(auto|point|line|polygon|none)</i> specifies type of geometry for wkt fields, or none to load the file as an attribute-only table. Default is auto.</li>
233
-
<li><i>crs=...</i> specifies the coordinate system to use for the vector layer, in a format accepted by QgsCoordinateReferenceSystem.createFromString (for example "EPSG:4167"). If this is not
234
-
specified then a dialog box may request this information from the user.</li>
235
-
<li><i>quiet=(yes|no)</i> specifies whether errors encountered loading the layer are presented in a dialog box (they will be written to the QGis log in any case). Default is no.</li>
264
+
<li><tt>skipEmptyFields=(yes|no)</tt> defines whether empty unquoted fields will be discarded (applied after trimFields). The default is no.</li>
265
+
<li><tt>decimalPoint=.</tt> specifies an alternative character that may be used as a decimal point in numeric fields. The default is a point (full stop) character.</li>
266
+
<li><tt>wktField=fieldname</tt> specifies the name or number (starting at 1) of the field containing a well known text geometry definition</li>
267
+
<li><tt>xField=fieldname</tt> specifies the name or number (starting at 1) of the field containing the X coordinate (only applies if wktField is not defined)</li>
268
+
<li><tt>yField=fieldname</tt> specifies the name or number (starting at 1) of the field containing the Y coordinate (only applies if wktField is not defined)</li>
269
+
<li><tt>geomType=(auto|point|line|polygon|none)</tt> specifies type of geometry for wkt fields, or none to load the file as an attribute-only table. The default is auto.</li>
270
+
<li><tt>crs=...</tt> specifies the coordinate system to use for the vector layer, in a format accepted by QgsCoordinateReferenceSystem.createFromString (for example "EPSG:4167"). If this is not
271
+
specified then a dialog box may request this information from the user
272
+
when the layer is loaded (depending on QGis CRS settings).</li>
273
+
<li><tt>quiet=(yes|no)</tt> specifies whether errors encountered loading the layer are presented in a dialog box (they will be written to the QGis log in any case). The default is no.</li>