Bug report #3828

dissolve tool very slow

Added by Giovanni Manghi over 8 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Processing/QGIS
Affected QGIS version: master
Regression?: No
Operating System:
Easy fix?: No
Pull Request or Patch supplied: No
Resolution: wontfix
Crashes QGIS or corrupts data: No
Copied to github as #: 13886

Description

Download the following shapefile (175 MB):

http://www.igeo.pt/gdr/Downloads/ProdutosCLC/CLC06_PT.zip

and try dissolving using one of the fields, for example CODE_06. The tool takes hours (on my Core 2 Duo PC) to reach 50%.

GRASS takes under a minute to complete the operation. gvSIG also completes the same operation in one to two minutes.

History

#1 Updated by Paolo Cavallini over 8 years ago

  • Tracker changed from Feature request to 4
  • Start date set to 2011-07-25
  • Pull Request or Patch supplied set to No

#2 Updated by Giovanni Manghi almost 8 years ago

  • Target version changed from Version 1.7.0 to Version 1.7.4

#3 Updated by Paolo Cavallini almost 8 years ago

  • Affected QGIS version set to master
  • Priority changed from Low to Normal
  • Crashes QGIS or corrupts data set to No

Confirmed in current master. Furthermore, the dbf table is mostly filled up with NULLs.

#4 Updated by Giovanni Manghi almost 8 years ago

  • Category changed from Python plugins to 44
  • Crashes QGIS or corrupts data changed from No to Yes

The issue also applies to other vector geoprocessing tools (both the lack of speed and the corrupted attributes) and, in my opinion, is a show stopper.

#5 Updated by Giovanni Manghi almost 8 years ago

  • Subject changed from dissolve tool very inefficient (with large datasets?) to dissolve tool very slow

#6 Updated by Giovanni Manghi over 7 years ago

  • Tracker changed from 4 to Bug report

#7 Updated by Giovanni Manghi over 7 years ago

  • Operating System deleted (All)
  • Priority changed from Normal to High
  • Status info deleted (0)

#8 Updated by Paolo Cavallini over 7 years ago

  • Target version changed from Version 1.7.4 to Version 1.8.0

#9 Updated by Giovanni Manghi over 7 years ago

  • Assignee deleted (cfarmer -)

Hi Salvatore and Alex, are you willing/available to give this a look? You have done such a great job in the last few days that it would be great to also have this fixed in time for 1.8.

#10 Updated by Salvatore Larosa over 7 years ago

I think it is not possible before the upcoming 1.8!
Many fTools are slow with such heavy files!!

For me it is a target for version 2.0.0!

Waiting for feedback from Alexander et al.!

#11 Updated by Giovanni Manghi over 7 years ago

Salvatore Larosa wrote:

I think it is not possible before the upcoming 1.8!
Many fTools are slow with such heavy files!!

For me it is a target for version 2.0.0!

Waiting for feedback from Alexander et al.!

Too bad... because it is not just slow, it freezes QGIS. Thanks anyway!

#12 Updated by Alexander Bruy over 7 years ago

It seems the test dataset is not available anymore. Is it possible to upload it again?

#13 Updated by Giovanni Manghi over 7 years ago

Alexander Bruy wrote:

It seems the test dataset is not available anymore. Is it possible to upload it again?

Hi Alexander, the link in the description works for me. Does it not work for you?

#14 Updated by Alexander Bruy over 7 years ago

Giovanni Manghi wrote:

Hi Alexander, the link in the description works for me. Does it not work for you?

Giovanni, it doesn't work for me.

#15 Updated by Giovanni Manghi over 7 years ago

Alexander Bruy wrote:

Giovanni, it doesn't work for me.

Try here:

http://ubuntuone.com/3Je6Dqp082xBERVWalE48U

#16 Updated by Giovanni Manghi over 7 years ago

Giovanni Manghi wrote:

Alexander Bruy wrote:

Giovanni, it doesn't work for me.

Try here:

http://ubuntuone.com/3Je6Dqp082xBERVWalE48U

It still seems very slow, even with this patch:

https://github.com/qgis/Quantum-GIS/pull/175

#17 Updated by Alexander Bruy over 7 years ago

Giovanni Manghi wrote:

It still seems very slow, even with this patch:

https://github.com/qgis/Quantum-GIS/pull/175

This patch is for a very specific use case (when the geometries are simple but there are many unique values in the attribute used for the dissolve) and is useless for typical usage.
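
The patch itself is not reproduced in this thread, so the following is only a hypothetical illustration of why the number of unique attribute values can matter for a dissolve. It assumes features are plain dictionaries and that the slow variant would rescan the whole layer once per unique value; `group_by_attribute` and the sample records are invented for this sketch and are not taken from fTools.

```python
from collections import defaultdict


def group_by_attribute(features, field):
    """Group feature geometries by the value of `field` in a single pass.

    Hypothetical helper: one pass over N features costs O(N), whereas
    rescanning the layer for each of U unique values costs O(U * N).
    """
    groups = defaultdict(list)
    for feature in features:
        groups[feature[field]].append(feature["geom"])
    return dict(groups)


# Placeholder records; real geometries would be polygon objects.
features = [
    {"CODE_06": "111", "geom": "polygon-a"},
    {"CODE_06": "112", "geom": "polygon-b"},
    {"CODE_06": "111", "geom": "polygon-c"},
]
groups = group_by_attribute(features, "CODE_06")
```

Each list in `groups` would then be handed to a single union call per attribute value. The union step itself is left out here because it needs a geometry library.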

#18 Updated by Paolo Cavallini about 7 years ago

  • Target version changed from Version 1.8.0 to Version 2.0.0

#19 Updated by Giovanni Manghi almost 7 years ago

see also #5779

#20 Updated by Sandro Santilli over 6 years ago

Is this still an issue? Is it correct for this to be marked as "causes crash or corruption"?

#21 Updated by Jürgen Fischer over 6 years ago

  • Crashes QGIS or corrupts data changed from Yes to No

#22 Updated by Giovanni Manghi over 6 years ago

Sandro Santilli wrote:

Is this still an issue? Is it correct for this to be marked as "causes crash or corruption"?

Yes, even for not very large vector layers it is basically unusable.

#23 Updated by Giovanni Manghi over 6 years ago

Sandro Santilli wrote:

Is this still an issue? Is it correct for this to be marked as "causes crash or corruption"?

I don't know why the "causes crash" flag was removed, because this tool has a tendency to freeze the program.

#24 Updated by Pieter Roggemans over 6 years ago

Indeed very slow :-(.

#25 Updated by Giovanni Manghi over 6 years ago

Pieter Roggemans wrote:

Indeed very slow :-(.

In QGIS 2.0, with SEXTANTE, you have (fast) alternatives: both SAGA and GRASS have dissolve tools.

#26 Updated by Pieter Roggemans over 6 years ago

Yep, I saw and tried both, but they both gave the same type of error using the most recent qgis 1.9 (see below for the error).

I did the dissolve in ArcView now, so I'm fine, but just wanted to reconfirm the problem... should have mentioned that I already had a workaround.

Thanks anyway!

Oooops! SEXTANTE could not open the following output layers

Dissolved Polygons: C:\\Users\\Pieter\\AppData\\Local\\Temp\\sextante\\sagapolygondissolveallpolygonsac891687dc4e4bb89b8dc28e72b0cb16.shp
The above files could not be opened, which probably indicates that they were not correctly produced by the executed algorithm

Checking the log information might help you see why those layers were not created as expected

This algorithm requires SAGA to be run. A test to check if SAGA is correctly installed and configured in your system has been performed, with the following result:

SAGA seems to be correctly installed and configured

#27 Updated by Giovanni Manghi over 6 years ago

Pieter Roggemans wrote:

Yep, I saw and tried both, but they both gave the same type of error using the most recent qgis 1.9 (see below for the error).

I did the dissolve in ArcView now, so I'm fine, but just wanted to reconfirm the problem... should have mentioned that I already had a workaround.

Thanks anyway!

Did you configure SAGA correctly in the SEXTANTE options? In QGIS master you must use SAGA 2.1; then dissolve will run. I just tested it.

Unfortunately it seems that polygon dissolve in SAGA 2.1 is buggy (it works OK in SAGA 2.0.8), but the good news is that you can always use v.dissolve from GRASS.

#28 Updated by Pieter Roggemans over 6 years ago

I simply installed the QGIS dev version from OSGeo4W; I didn't do anything to the configuration. I only found SAGA 2.0.8 installed in the OSGeo4W dir, so I suppose that will be the version used by QGIS?

Anyway... GRASS gives exactly the same error as the one quoted in my previous post, only with SAGA replaced by GRASS...

#29 Updated by Giovanni Manghi over 6 years ago

Pieter Roggemans wrote:

I simply installed the QGIS dev version from OSGeo4W; I didn't do anything to the configuration. I only found SAGA 2.0.8 installed in the OSGeo4W dir, so I suppose that will be the version used by QGIS?

Anyway... GRASS gives exactly the same error as the one quoted in my previous post, only with SAGA replaced by GRASS...

OSGeo4W still ships SAGA 2.0.8, but you can manually download 2.1 and configure SEXTANTE to use it. GRASS probably also fails because of a recent update in OSGeo4W, but again you can fix this in the SEXTANTE configuration, and then both will work.

#30 Updated by Paolo Cavallini almost 6 years ago

  • Target version changed from Version 2.0.0 to Future Release - High Priority

#31 Updated by Giovanni Manghi almost 6 years ago

  • Priority changed from High to Normal

There is no crash or data corruption, so I'm lowering the priority, also because there are alternatives in the Processing toolbox.

#32 Updated by Giovanni Manghi over 5 years ago

  • Status changed from Open to Closed
  • Resolution set to wontfix

I'm closing this because the alternatives in the Processing toolbox are good enough and also because no one seems interested in improving the native QGIS tool.

#33 Updated by Giovanni Manghi over 2 years ago

The "ftools" category is being removed from the tracker, changing the category of this ticket to "Processing/QGIS" to not leave the category orphaned.

#34 Updated by Mike Bannister over 2 years ago

Giovanni Manghi wrote:

I'm closing this because the alternatives in the Processing toolbox are good enough and also because no one seems interested in improving the native QGIS tool.

I just ran into this issue with a relatively small dataset (9MB). VERY slow run time and appeared to corrupt the data. If the tool doesn't work should it be removed or have a pop up dialog that directs the user to the other options?

#35 Updated by Giovanni Manghi over 2 years ago

  • Regression? set to No
  • Description updated (diff)
  • Easy fix? set to No

Mike Bannister wrote:

Giovanni Manghi wrote:

I'm closing this because the alternatives in the Processing toolbox are good enough and also because no one seems interested in improving the native QGIS tool.

I just ran into this issue with a relatively small dataset (9MB). VERY slow run time and appeared to corrupt the data. If the tool doesn't work should it be removed or have a pop up dialog that directs the user to the other options?

Yes, use the GDAL-based one. In QGIS 3 the native QGIS tool will be much improved.

#36 Updated by Nyall Dawson over 2 years ago

Mike - which qgis version are you using? Can you attach your data?

#37 Updated by Giovanni Manghi over 2 years ago

Nyall Dawson wrote:

Mike - which qgis version are you using?

Nyall, apart from master, the version doesn't really matter when speaking about the native QGIS tool. It has always been dead slow since its ftools days, especially when dealing with large/complex input layers (and sometimes even with not so large/complex ones).

#38 Updated by Nyall Dawson over 2 years ago

Giovanni - have you tested since ae59b733c3473f375d80a3680e44a98ab4586bd7 (i.e. 2.18)?

If it's still slow/unstable, give me test data and I'll fix! I need DATA though! DATA!!!11!! maw data!!!1!! I NEED IT!!11!!1.1.1.1.1.1

#39 Updated by Giovanni Manghi over 2 years ago

Nyall Dawson wrote:

Giovanni - have you tested since ae59b733c3473f375d80a3680e44a98ab4586bd7 (i.e. 2.18)?

If it's still slow/unstable, give me test data and I'll fix! I need DATA though! DATA!!!11!! maw data!!!1!! I NEED IT!!11!!1.1.1.1.1.1

I will get you data ;) as well as for intersections, unions, differences et al.

#40 Updated by Giovanni Manghi over 2 years ago

Nyall Dawson wrote:

Giovanni - have you tested since ae59b733c3473f375d80a3680e44a98ab4586bd7 (i.e. 2.18)?

If it's still slow/unstable, give me test data and I'll fix! I need DATA though! DATA!!!11!! maw data!!!1!! I NEED IT!!11!!1.1.1.1.1.1

See the shapefile below and dissolve it by "distrito". On 2.18.10-nightly it takes minutes with the QGIS native tool and seconds with the GDAL one. On QGIS 3 the native tool is about as fast as the GDAL one.

https://www.dropbox.com/s/a3jl5z4a2z47jg8/conc.zip?dl=0

#41 Updated by Regis Haubourg over 2 years ago

Talking about the dissolve tool, we have been working over the last months to fix an issue with unaryUnion raising a topology exception and returning NULL geometry output in QGIS or PostGIS.
The next version, 3.6.2, is currently being released, and moving the QGIS packages to GEOS 3.6.2 would be a good move: there are many bugfixes already in there that are useful for processing. Many thanks to Sandro Santilli for that work.

When consolidating queries to finally isolate the bug, I had to apply many optimizations documented only in the GEOS source code or on Stack Exchange (or again in the ESRI docs!), such as dumping all multigeometries to single geometries before running ST_UnaryUnion, then collecting them, checking for validity, or adding a snaptogrid step for precision issues.
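
Two of the preprocessing steps named above, dumping multigeometries to single parts and snapping coordinates to a grid, can be sketched in plain Python. This is a toy illustration only: rings are assumed to be lists of (x, y) tuples, a multigeometry is assumed to be a list of rings, and `snap_to_grid` and `dump_parts` are hypothetical names rather than GEOS, PostGIS or QGIS functions. The rounding rule may also differ from ST_SnapToGrid at exact half-cell values.

```python
def snap_to_grid(ring, size):
    """Round every coordinate of a ring to the nearest multiple of `size`."""
    return [(round(x / size) * size, round(y / size) * size) for x, y in ring]


def dump_parts(multigeometries):
    """Flatten a list of multi-part geometries into a list of single parts."""
    return [part for multi in multigeometries for part in multi]


# Two multi-part geometries, each given as a list of rings (toy data).
ring_a = [(0.26, 0.74), (1.9, 0.1), (1.1, 2.2)]
ring_b = [(5.1, 5.1), (6.9, 5.1), (6.1, 6.9)]
ring_c = [(9.9, 9.9), (11.1, 9.9), (10.1, 11.1)]
multis = [[ring_a, ring_b], [ring_c]]

# Dump first, then snap each single part before any union is attempted.
parts = [snap_to_grid(part, 0.5) for part in dump_parts(multis)]
```

After these steps each entry in `parts` is a single ring on a 0.5-unit grid. A real pipeline would then check validity and pass the collected parts to a unary union, which this sketch does not attempt.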

https://geos.osgeo.org/doxygen/classgeos_1_1operation_1_1geounion_1_1UnaryUnionOp.html
https://gis.stackexchange.com/questions/204370/significant-performance-issues-with-postgis-unaryunion/204443

The validity check is already in master, IIRC.
I think it is worth porting this logic to the default algorithm in QGIS.
See the equivalent PostGIS query, which proved a lot more robust than the raw union operator:

select "tableA", "codeTableA",
        st_area(ST_UnaryUnion(st_collect(geom_intersection))) geomdissolvedarea, --Unary
        st_area(ST_UnaryUnion(st_collect(geom_intersection)))  / "areaObjectA" as ratio_couv

        FROM

        (
        Select
                'public.communes'::character varying "tableA",
                a.code::character varying  "codeTableA",
                b.code::character varying  "codeTableB",
                'public.couv4g'::character varying "tableB",
                a.area_full as "areaObjectA", -- assumption: area of the whole tableA object; this column was referenced but never defined in the original post
                st_intersection(st_snaptogrid(a.geom, 0.0001) , st_snaptogrid( ( CASE WHEN st_isvalid(b.geom) then b.geom ELSE st_makevalid(b.geom) END ), 0.0001))  as geom_intersection -- snaptogrid to avoid some precision issues
        FROM

                    -- >>>>> dump all objects in tableA >>>
                    (  select code_com::character varying code, st_area(geom) as area_full, (st_dump(geom)).geom geom  from public.communes   ) a

            JOIN
                (  -- >>>>> dump all objects in tableB >>>

                select code_bds as code, (st_dump(geom)).geom geom from  public.couv4g -- expose the dumped part as geom so that b.geom resolves

                 ) b
            ON  (st_intersects(a.geom, b.geom)) -- spatial relation and index clause

        ) as crosselem
GROUP BY  "tableA",  "codeTableA", "areaObjectA"

Any opinion on that? Nyall, could you ping us when your work is finished in that area?

#42 Updated by Nyall Dawson over 2 years ago

Can we open a GitHub page or something to collect these findings together? I'm having trouble keeping on top of the current state of these fundamental algorithm-related issues/improvements.

#43 Updated by Giovanni Manghi over 2 years ago

Nyall Dawson wrote:

Can we open a GitHub page or something to collect these findings together? I'm having trouble keeping on top of the current state of these fundamental algorithm-related issues/improvements.

Yes Nyall, I agree. I don't have permission to create the page on the QGIS GitHub account; you probably do. Once it is set up I will add sample data for the QGIS native geoprocessing tools to show slowness, incorrect results and other issues.

#45 Updated by Regis Haubourg over 2 years ago

Giovanni Manghi wrote:

Nyall Dawson wrote:

Can we open a GitHub page or something to collect these findings together? I'm having trouble keeping on top of the current state of these fundamental algorithm-related issues/improvements.

I added my tips there! thanks Nyall!
