QGIS Issue Tracking: QGIS Application - Bug report #21451: Quantile (Equal Count) on given dataset generates a lot of zero-classes https://issues.qgis.org/issues/21451?journal_id=101164 2019-03-03T22:38:26Z Nyall Dawson
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Feedback</i></li></ul><p>I don't think this is a bug: looking at your data distribution, it's impossible to partition it into 10 equal-sized groups.</p> https://issues.qgis.org/issues/21451?journal_id=101177 2019-03-04T07:54:31Z Richard Duivenvoorde
<ul></ul><p>@nyall: agreed.</p>
<p>I will leave it open for now, so that someone who knows R can check what R does in cases like this: merge the identical buckets, or keep all the buckets and spread the values across them.</p> https://issues.qgis.org/issues/21451?journal_id=101180 2019-03-04T09:50:47Z Pedro Venâncio pedrongvenancio@gmail.com
<ul></ul><p>Hi Richard,</p>
<p>This is the correct result.</p>
<p>The R output is:</p>
<pre>
> table_random <- read.csv("\\random.csv")
> quantile(table_random$value, probs = seq(0, 1, by= 0.1))
    0%    10%    20%    30%    40%    50%    60%    70%    80%    90%   100%
 -27.0    0.0    0.0    0.0    0.0    4.0   63.0  126.0  206.0  412.3 3580.0
</pre> https://issues.qgis.org/issues/21451?journal_id=101182 2019-03-04T10:04:18Z Richard Duivenvoorde
<ul></ul><p>Hi Pedro, <br />Cool, thanks! This is about creating the breaks, isn't it? So QGIS does exactly the same.<br />But how does R then distribute the values over the buckets? As you can see in the QGIS screenshot, all values are put in the first 'zero' bucket.<br />Or is this just a 'dumb' question, because you simply should not use this method for such data?</p> https://issues.qgis.org/issues/21451?journal_id=101185 2019-03-04T11:06:46Z Pedro Venâncio pedrongvenancio@gmail.com
<ul><li><strong>File</strong> <a href="/attachments/download/14510/random_abs_freq.ods">random_abs_freq.ods</a> added</li></ul><p>Hi Richard,</p>
<p>There are several ways to calculate quantiles. R's <code>quantile</code> function implements the types described here:</p>
<p><a class="external" href="https://www.rdocumentation.org/packages/stats/versions/3.5.2/topics/quantile">https://www.rdocumentation.org/packages/stats/versions/3.5.2/topics/quantile</a></p>
<p>The "problem" with your dataset is that the value "0" (zero) occurs far more often (2605 of 5408 values) than any other value.</p>
<p>Basically, what quantile does is split the sample into n parts such that each part has the same number of values. For instance, if you divide it into 5 parts, each part should end up with 20% of the sample (in your case, points).</p>
<p>The easiest way to check the percentiles/quantiles in a spreadsheet is to calculate the absolute frequency, then the relative frequency, and then the cumulative relative frequency. Once you have this, you can read off the breaks by finding the value that matches the cumulative relative frequency you are looking for (the percentile). Please see the attached spreadsheet with your data. For instance, the value that corresponds to a cumulative relative frequency of 0 is -27 (the minimum); 0.5 is 4; 0.6 is 63; and so on. But as 0 has a relative frequency of more than 0.48, it covers the 0.1, 0.2, 0.3 and 0.4 percentiles.</p>
<p>So, with this data distribution, either reduce the number of classes used by the quantile method, or use another classification method.</p>
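<p>The spreadsheet steps above can be sketched in Python. This is only a minimal illustration: the function name <code>quantile_breaks</code> and the ten sample values are made up for the example (they are not the data from this issue), and the lookup does not interpolate between values, so it will not always match R's default output.</p>

```python
from collections import Counter


def quantile_breaks(values, parts):
    """Return parts + 1 breaks read off the cumulative frequency table.

    Break k is the smallest value whose cumulative relative frequency
    reaches k / parts. Integer arithmetic avoids rounding problems.
    """
    counts = Counter(values)              # absolute frequency per value
    n = len(values)

    # Cumulative absolute frequency, in ascending order of value.
    cumulative = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        cumulative.append((value, running))

    breaks = []
    for k in range(parts + 1):
        for value, cum in cumulative:
            # cum / n >= k / parts, written without division
            if cum * parts >= k * n:
                breaks.append(value)
                break
    return breaks


# Hypothetical sample: half of the values are zero.
sample = [-3, 0, 0, 0, 0, 0, 2, 5, 9, 20]
print(quantile_breaks(sample, 5))  # [-3, 0, 0, 0, 5, 20]
```

<p>Because zero alone accounts for half of this sample, it is returned for the 20%, 40% and 60% breaks, which is the same effect as the repeated 0.0 breaks in the R output.</p>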
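<p>To illustrate why repeated breaks produce empty classes, and what reducing the number of classes does, here is a small Python sketch. The sample values, the breaks and the function name <code>class_counts</code> are hypothetical, and the rule that each class includes its upper break is an assumption made for the example, not a statement about how QGIS assigns features internally.</p>

```python
from bisect import bisect_left


def class_counts(values, breaks):
    """Count the values that fall into each class.

    Assumption: class i covers (breaks[i], breaks[i + 1]], and the first
    class also includes its lower bound. All values are expected to lie
    between the first and the last break.
    """
    counts = [0] * (len(breaks) - 1)
    for v in values:
        # Index of the first upper bound that is >= v (lower bound skipped).
        i = bisect_left(breaks, v, 1)
        counts[i - 1] += 1
    return counts


# Hypothetical sample and breaks: zero is repeated as a break three times.
sample = [-3, 0, 0, 0, 0, 0, 2, 5, 9, 20]
breaks = [-3, 0, 0, 0, 5, 20]

print(class_counts(sample, breaks))               # [6, 0, 0, 2, 2]
print(class_counts(sample, sorted(set(breaks))))  # [6, 2, 2]
```

<p>With the repeated breaks, every zero lands in the first class and the two classes bounded by 0 on both sides stay empty. Collapsing the duplicate breaks leaves fewer classes, none of them empty, but they are no longer equal in size.</p>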