Bug report #19973

QGIS 3.3 master delete duplicate geometries does not work

Added by salvatore fiandaca over 5 years ago. Updated over 5 years ago.

Status: Closed
Priority: High
Assignee: -
Category: Processing/Core
Affected QGIS version: 3.3 (master)
Regression?: No
Operating System: win 10 - osgeo4w
Easy fix?: No
Pull Request or Patch supplied: No
Resolution:
Crashes QGIS or corrupts data: No
Copied to github as #: 27795

Description

I extracted the vertices of the attached polygon layer, then ran the 'delete duplicate geometries' geoalgorithm. The process starts quickly, but then stays stuck at 99% and does not create any output.

It also does not work in 3.2.3.

delete.png - screenshot (33.6 KB) salvatore fiandaca, 2018-09-27 09:01 PM

duplicate.7z - vector polygon (46.3 KB) salvatore fiandaca, 2018-09-27 09:03 PM

delete2.png (32.7 KB) salvatore fiandaca, 2018-09-27 09:24 PM

Associated revisions

Revision 9698444f
Added by Nyall Dawson over 5 years ago

[processing] Fix inefficiencies in Delete Duplicate Geometries algorithm

..and make progress bar more accurate.

Use a spatial index to avoid comparing every feature to every other
feature, and only compare against features with intersecting bounding
boxes instead. Also optimise feature requests and loop logic.

Benchmarks:

Point layer, 6k features

Before: 30 seconds
After: 0.15 seconds

Point layer, 45k features

Before: > 10 minutes
After: 7 seconds

Fixes #19973

History

#1 Updated by salvatore fiandaca over 5 years ago

EDIT:
for a dataset of <6k points it takes 243 seconds

#2 Updated by Andrea Giudiceandrea over 5 years ago

On my system, core i5-460M, 8 GB RAM, Windows 7 64 bit:

vertices extracted: 5629 points

running "delete duplicate geometries" on 5629 points takes

~50 seconds with QGIS 2.18.23 64 bit

50 seconds with QGIS 3.2.3 64 bit

88 seconds with QGIS 3.3.0 (80723e89fd) 64 bit (probably slower because it is a debug build)

resulting in a 3948 points layer

All three versions, however, show the misleading "stuck at 99%" behaviour you reported.

#3 Updated by Nyall Dawson over 5 years ago

Not a regression - the algorithm is just extremely inefficient and doesn't scale to large layers (it compares EVERY feature with EVERY other). The solution here is probably to add a spatial index, so that only features with intersecting bounding boxes are tested for equality.
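
For illustration only, here is a minimal Python sketch of the idea in this comment. It is not the QGIS implementation: a uniform grid stands in for the spatial index, geometries are assumed to be plain tuples of (x, y) vertices, equality is exact tuple equality rather than a topological test, and the names bounding_box, boxes_intersect, grid_cells, delete_duplicates and the cell_size parameter are made up for the example.

```python
from math import floor


def bounding_box(geom):
    """Return (xmin, ymin, xmax, ymax) for a tuple of (x, y) vertices."""
    xs = [x for x, _ in geom]
    ys = [y for _, y in geom]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_cells(box, cell_size):
    """List every grid cell touched by the box (hypothetical stand-in for an index)."""
    x0, y0 = floor(box[0] / cell_size), floor(box[1] / cell_size)
    x1, y1 = floor(box[2] / cell_size), floor(box[3] / cell_size)
    return [(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)]


def delete_duplicates(geoms, cell_size=1.0):
    """Return (unique geometries in input order, number of equality tests made)."""
    index = {}   # grid cell -> positions in `kept`
    kept = []    # geometries retained so far
    boxes = []   # bounding box of each retained geometry
    tests = 0
    for geom in geoms:
        box = bounding_box(geom)
        cells = grid_cells(box, cell_size)
        # Only retained features sharing a grid cell are candidates at all.
        candidates = {i for cell in cells for i in index.get(cell, ())}
        is_duplicate = False
        for i in sorted(candidates):
            if not boxes_intersect(box, boxes[i]):
                continue  # bounding boxes are disjoint, so the geometries differ
            tests += 1
            if geom == kept[i]:  # assumption: exact vertex-by-vertex equality
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(geom)
            boxes.append(box)
            for cell in cells:
                index.setdefault(cell, []).append(len(kept) - 1)
    return kept, tests


points = [((0, 0),), ((5, 5),), ((0, 0),), ((5, 5),), ((9, 9),)]
unique, tests = delete_duplicates(points)
print(len(unique), tests)
```

On these five points the sketch keeps 3 geometries after only 2 equality tests, whereas comparing every feature with every other would consider 10 pairs. The gap grows roughly quadratically with layer size, which is why the original approach stalls on large layers.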

#4 Updated by Nyall Dawson over 5 years ago

(For reference -- there's a dedicated, optimised, 'remove duplicate vertices' algorithm which may be of use here)

#5 Updated by Nyall Dawson over 5 years ago

How's "Execution completed in 0.18 seconds" sound?

#6 Updated by Nyall Dawson over 5 years ago

  • Status changed from Open to In Progress

#7 Updated by salvatore fiandaca over 5 years ago

Nyall Dawson wrote:

How's "Execution completed in 0.18 seconds" sound?

wow, it sounds great
thank you

#8 Updated by Nyall Dawson over 5 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Closed
