Bug report #14964

WFS provider chokes on non-ASCII character when using bbox clause

Added by Richard Duivenvoorde over 4 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee: Even Rouault
Category: Unknown
Affected QGIS version: master
Regression?: No
Operating System:
Easy fix?: No
Pull Request or Patch supplied: No
Resolution: wontfix
Crashes QGIS or corrupts data: No
Copied to github as #: 22913

Description

We have a set of national OWS services. One of them serves the municipality borders.

If I add a WFS server via this url:

https://geodata.nationaalgeoregister.nl/bestuurlijkegrenzen/wfs?request=GetCapabilities
or
https://geodata.nationaalgeoregister.nl/bestuurlijkegrenzen/wfs

I get this error:

Layer bestuurlijkegrenzen:gemeenten: Download of features for layer bestuurlijkegrenzen:gemeenten failed or partially failed: Error when parsing GetFeature response : Error: not well-formed (invalid token) on line 2096, column 20. You may attempt reloading the layer with F5

Looking at that line, it is a character encoding issue:

<bestuurlijkegrenzen:gemeentenaam>Súdwest-Fryslân</bestuurlijkegrenzen:gemeentenaam>

BUT exactly the same URL works perfectly fine in one of my plugins.

First I thought it had something to do with the headers sent by the WFS server:

curl -s -D - "https://geodata.nationaalgeoregister.nl/bestuurlijkegrenzen/wfs?SERVICE=WFS&REQUEST=GetFeature&VERSION=2.0.0&TYPENAMES=bestuurlijkegrenzen:gemeenten&COUNT=15000&SRSNAME=urn:ogc:def:crs:EPSG::28992&" -o /dev/null

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, GET, OPTIONS, HEAD
Access-Control-Max-Age: 1000
Access-Control-Allow-Headers: SOAPAction,X-Requested-With,Content-Type,Origin,Authorization,Accept
Content-Disposition: inline; filename=geoserver-GetFeature.text
X-Cnection: close
Content-Type: text/xml; subtype=gml/3.2
Transfer-Encoding: chunked
Date: Mon, 06 Jun 2016 13:47:20 GMT

curl -s -D - "https://geodata.nationaalgeoregister.nl/bestuurlijkegrenzen/wfs?request=GetCapabilities&SERVICE=WFS&REQUEST=GetFeature&VERSION=2.0.0&TYPENAMES=bestuurlijkegrenzen:gemeenten&COUNT=15000&SRSNAME=urn:ogc:def:crs:EPSG::28992&BBOX=99491.91838273697067052,532089.02270954963751137,216836.14587102248333395,582859.78376076021231711" -o /dev/null

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, GET, OPTIONS, HEAD
Access-Control-Max-Age: 1000
Access-Control-Allow-Headers: SOAPAction,X-Requested-With,Content-Type,Origin,Authorization,Accept
Content-Disposition: inline; filename=geoserver-GetFeature.text
X-Cnection: close
Content-Type: text/xml; subtype=gml/3.2
Transfer-Encoding: chunked
Date: Mon, 06 Jun 2016 13:58:18 GMT
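
Both responses above carry the same Content-Type value, and neither includes a charset parameter. As a small sketch (the helper name parse_content_type is made up for illustration, and only the Python standard library is used), the header can be split into its parts like this:

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type header value into (media type, subtype param, charset).

    parse_content_type is a hypothetical helper for this illustration only.
    """
    msg = Message()
    msg["Content-Type"] = value
    return (
        msg.get_content_type(),     # e.g. "text/xml"
        msg.get_param("subtype"),   # GeoServer-style "subtype" parameter, if present
        msg.get_content_charset(),  # None when no charset parameter is given
    )


# Header value copied from the two curl responses above.
media_type, subtype, charset = parse_content_type("text/xml; subtype=gml/3.2")
print(media_type, subtype, charset)
```

With no charset in the header, a client presumably has to rely on the encoding declared inside the XML document itself.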

BUT that turned out NOT to be the case.

Looking at the URLs that are fired, the difference apparently lies in the route taken inside the WFS provider.

Both of these requests return all municipalities in the Netherlands and produce the same XML on the command line. The first one has no bbox clause, the second one does:

curl "https://geodata.nationaalgeoregister.nl/bestuurlijkegrenzen/wfs?SERVICE=WFS&REQUEST=GetFeature&VERSION=2.0.0&TYPENAMES=bestuurlijkegrenzen:gemeenten&COUNT=15000&SRSNAME=urn:ogc:def:crs:EPSG::28992&" > wfsnobbox.xml

curl "https://geodata.nationaalgeoregister.nl/bestuurlijkegrenzen/wfs?request=GetCapabilities&SERVICE=WFS&REQUEST=GetFeature&VERSION=2.0.0&TYPENAMES=bestuurlijkegrenzen:gemeenten&COUNT=15000&SRSNAME=urn:ogc:def:crs:EPSG::28992&BBOX=0,300000,300000,600000" > wfswithbbox2.xml

The first one is the request I use programmatically (through the same WFS provider).
BUT the second one is the request I see in the QGIS debug output; it is what the WFS provider sends when the layer is added via the 'Add WFS Layer' route.

So my theory: the GML parsing route used for the 'with bbox' request is different from the one used for the 'old route'.

Could it be that this parsing (which I think exists to be able to load the features incrementally) does not take character encoding into account?
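
To illustrate why a streaming parser can fail partway through a download, here is a minimal sketch using Python's xml.parsers.expat. It is not the QGIS WFS provider code; the function feed_in_chunks, the tiny sample document and the chunk size are all assumptions made for the example. The document declares UTF-8, and one copy of it is deliberately encoded as Latin-1:

```python
import xml.parsers.expat


def feed_in_chunks(data, chunk_size=16):
    """Feed bytes to expat piece by piece.

    Returns (element names seen so far, ExpatError or None).
    feed_in_chunks is a hypothetical helper, not a QGIS function.
    """
    parser = xml.parsers.expat.ParserCreate()
    seen = []
    parser.StartElementHandler = lambda name, attrs: seen.append(name)
    try:
        for start in range(0, len(data), chunk_size):
            parser.Parse(data[start:start + chunk_size], False)
        parser.Parse(b"", True)  # signal end of input
    except xml.parsers.expat.ExpatError as exc:
        return seen, exc
    return seen, None


# Made-up miniature document; only the municipality name comes from the report.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<fc><m><n>Ede</n></m><m><n>Súdwest-Fryslân</n></m></fc>"
)

good_seen, good_err = feed_in_chunks(doc.encode("utf-8"))
bad_seen, bad_err = feed_in_chunks(doc.encode("latin-1"))  # mislabelled bytes

print(good_seen, good_err)
print(bad_seen, bad_err)
```

In this sketch the elements before the offending byte are reported normally and the error only surfaces once the parser reaches it, which would be consistent with a "failed or partially failed" message.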

Thanks for looking into this.

History

#1 Updated by Richard Duivenvoorde over 4 years ago

  • Subject changed from "WFS provider chokes on non-ASCII character when request=getcapabilities still in URL" to "WFS provider chokes on non-ASCII character when using bbox clause"

#2 Updated by Even Rouault over 4 years ago

This is a server issue

The output of

curl "https://geodata.nationaalgeoregister.nl/bestuurlijkegrenzen/wfs?request=GetCapabilities&SERVICE=WFS&REQUEST=GetFeature&VERSION=2.0.0&TYPENAMES=bestuurlijkegrenzen:gemeenten&COUNT=15000&SRSNAME=urn:ogc:def:crs:EPSG::28992&BBOX=0,300000,300000,600000" > wfswithbbox2.xml

is not valid XML:

$ xmllint -noout wfswithbbox2.xml
wfswithbbox2.xml:2368: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xFA 0x64 0x77 0x65
<bestuurlijkegrenzen:gemeentenaam>S�dwest-Frysl�n</bestuurlijkegrenzen:gemeenten
^
wfswithbbox2.xml:6290: parser error : xmlSAX2Characters: huge text node
44.036 580711.145 201364.266 580716.592 201404.123 580726.725 201411.415 580728.
^
wfswithbbox2.xml:6290: parser error : Extra content at the end of the document
44.036 580711.145 201364.266 580716.592 201404.123 580726.725 201411.415 580728.
^

The encoding seems to be Latin-1 despite being indicated to be UTF-8 in the header.
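
The four bytes reported by xmllint can be checked by hand. The following is a minimal sketch using only the Python standard library; the helper is_valid_utf8 and the one-element sample document are assumptions for illustration, not part of the server response:

```python
import xml.etree.ElementTree as ET

# Bytes copied from the xmllint message above.
bad_bytes = bytes([0xFA, 0x64, 0x77, 0x65])


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(bad_bytes))     # the bytes are not valid UTF-8
print(bad_bytes.decode("latin-1"))  # but they are readable as Latin-1

# Made-up one-element document that declares UTF-8 but is encoded as Latin-1.
declared_utf8 = '<?xml version="1.0" encoding="UTF-8"?><naam>Súdwest-Fryslân</naam>'
mislabelled = declared_utf8.encode("latin-1")

try:
    ET.fromstring(mislabelled)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False

# Re-encoding the payload so that it matches its declaration makes it parseable.
repaired = mislabelled.decode("latin-1").encode("utf-8")
repaired_name = ET.fromstring(repaired).text

print(parsed_ok, repaired_name)
```

Re-encoding on the client side as done here is only a diagnostic trick; it assumes the whole payload really is Latin-1, which the client cannot know in general.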

#3 Updated by Richard Duivenvoorde over 4 years ago

Mmm, seen that issue before :-(

But what, then, is the difference between the two requests, with and without the bbox?

Or why is this working:

qgis.utils.iface.addVectorLayer("https://geodata.nationaalgeoregister.nl/bestuurlijkegrenzen/wfs?SERVICE=WFS&VERSION=1.0.0&REQUEST=GetFeature&TYPENAME=bestuurlijkegrenzen:gemeenten&SRSNAME=EPSG:28992&BBOX=0,300000,300000,600000","wfs example","WFS")

Anyway, thanks for your time. I will probably create an issue at PDOK.

#4 Updated by Even Rouault over 4 years ago

Or why is this working:

qgis.utils.iface.addVectorLayer("https://geodata.nationaalgeoregister.nl/bestuurlijkegrenzen/wfs?SERVICE=WFS&VERSION=1.0.0&REQUEST=GetFeature&TYPENAME=bestuurlijkegrenzen:gemeenten&SRSNAME=EPSG:28992&BBOX=0,300000,300000,600000","wfs example","WFS")

Since the update of the WFS provider, a BBOX specified in the URL is ignored, so when you retrieve features, the whole layer is retrieved. If you want to use BBOX-based GetFeature requests, the URI syntax should be "restrictToRequestBBOX='1' srsname='EPSG:28992' typename='bestuurlijkegrenzen:gemeenten' url='https://geodata.nationaalgeoregister.nl/bestuurlijkegrenzen/wfs'", and then use queries like:

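from qgis.core import QgsFeatureRequest, QgsRectangle, QgsVectorLayer
uri = "restrictToRequestBBOX='1' srsname='EPSG:28992' typename='bestuurlijkegrenzen:gemeenten' url='https://geodata.nationaalgeoregister.nl/bestuurlijkegrenzen/wfs'"
vl = QgsVectorLayer(uri, "wfs example", "WFS")
extent = QgsRectangle(0, 300000, 300000, 600000)
request = QgsFeatureRequest().setFilterRect(extent)
features = list(vl.getFeatures(request))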

#5 Updated by Even Rouault over 4 years ago

  • Resolution set to wontfix

Closing. There is nothing we can reasonably do about a malformed XML file sent by the server.

#6 Updated by Nyall Dawson over 4 years ago

  • Status changed from Open to Closed

#7 Updated by Jürgen Fischer about 3 years ago

  • Category set to Unknown
