Is this good news?

If a database contains the following statistics:
No. of Queries         Avg      Std Deviation
134744                  0.14     0.19
... would you consider that good or bad?

What I'm getting at is that the average time to execute those 130,000+ queries is very good for us: 0.14 seconds is more than enough to keep users happy. But does a standard deviation of 0.19 on this sample tell you that the 0.14 is fairly reliable, or are you likely to end up waiting a month of Sundays now and then?

Another database reports:
29247     0.266     0.9
That's a worse average (although everyone is still happy with 0.26 seconds), but does the standard deviation of 0.9 mean this result is much less reliable than the other one? Is it possible to quantify by how much?

Sorry: this isn't really a database question but a statistics one, but if someone could explain how to spot a "good" or "bad" deviation, and where one shades into the other, I'd be fascinated! I've read the Wikipedia material on variance and its square root (standard deviation), but I don't understand how the theory translates into simple practice. I should have paid more attention in math class, I guess...

First things first: the "unit of measure" of a standard deviation is the same as that of the thing being measured in the first place. So your numbers say that your query has an average speed of 0.266 seconds, with a standard deviation around that average of 0.9 seconds.

Second, statistics tells us that if the data are normally distributed, you can expect about 95% of results to fall within 2 standard deviations of the mean. In other words, 2 * 0.9 = 1.8, so the numbers suggest 95% of queries on your system take between -1.6 and +2.0 seconds. That's a spread of nearly four seconds around an average of only 0.266 seconds.
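The "mean ± 2 standard deviations" arithmetic above can be sketched in a few lines, using the figures from the question:

```python
# The two-sigma rule applied to the figures in the question.
mean = 0.266   # average query time, in seconds
sd = 0.9       # standard deviation, in seconds

# Under a normal-distribution assumption, ~95% of values fall
# within two standard deviations of the mean.
low = mean - 2 * sd
high = mean + 2 * sd
print(f"95% of queries between {low:.2f}s and {high:.2f}s")
# The negative lower bound is itself a hint the data aren't really normal.
```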

That's a big spread, perhaps, but not as big as on your old system, where your figures suggest queries took 4.5 ± (2 * 4.8) = -5.1 to +14.1 seconds. That's a bad average to begin with (assuming it's the same data your new query manages to extract in just 0.3 of a second), but it's also a very large spread of nearly 20 seconds around that average... which means the old query ran very unpredictably.

(By the way, the fact that your spread includes some queries that apparently returned before they were submitted suggests your data are skewed! I'll make the heroic assumption that the skew is the result of just one or two really bad queries, though, and plough on regardless!)
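If you have access to the raw timings (the list below is purely hypothetical), a quick sample-skewness check makes the point above concrete: query times usually have a long right tail, where a few very slow queries drag the mean and the deviation around.

```python
import statistics

# Hypothetical raw query timings, in seconds; in practice, pull these
# from your query log. A couple of slow outliers create a right tail.
timings = [0.1, 0.12, 0.15, 0.2, 0.18, 0.11, 4.5, 6.0]

mean = statistics.mean(timings)
sd = statistics.pstdev(timings)

# Sample skewness: mean cubed deviation divided by sd cubed.
# Roughly zero for symmetric data; a large positive value means a
# long right tail, which is typical of query-time distributions.
skew = sum((t - mean) ** 3 for t in timings) / len(timings) / sd ** 3
print(f"mean={mean:.2f}s sd={sd:.2f}s skewness={skew:.2f}")
```

A skewness well above zero is the signal that a plain "mean ± 2 sd" interval will mislead, exactly as the negative lower bound hinted.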

We can't tell you whether a 0.2-second query with a 4-second spread is good enough for your application. If the query powers a web site, for example, a 4-second wait may be just about acceptable. An average of 4 seconds with a spread of 20 sounds pretty bad, though.

On the other hand, a 4-second spread is roughly 15 times the 0.26-second average, while a 20-second spread is only about 5 times the 4-second average. So your new query is, proportionally, more unpredictable in its performance than the old one, even if the old one was just predictably bad! I wouldn't worry, though: when you're dealing with very fast queries, even slight variations can look relatively large, but waiting 0.4 seconds instead of 0.2 is still a much better result than waiting 8 seconds instead of 4!
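The proportional comparison above (spread relative to the mean) can be sketched directly from the figures already discussed:

```python
# Relative spread: the full ~95% range (four standard deviations)
# divided by the mean, for both systems discussed above.
new_mean, new_sd = 0.266, 0.9
old_mean, old_sd = 4.5, 4.8

new_rel = (4 * new_sd) / new_mean   # spread vs mean, new system
old_rel = (4 * old_sd) / old_mean   # spread vs mean, old system
print(f"new: ~{new_rel:.0f}x its mean, old: ~{old_rel:.0f}x its mean")
```

This is essentially the coefficient of variation, which lets you compare how "jittery" two systems are even when their absolute timings are on completely different scales.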

Anyway: I'm not a statistician and my analysis of your numbers may be completely wacky, but you raise an important point, which is that we need to know what a good average is - and how to tell a good spread from a bad one, regardless of its absolute value - when doing things like performance optimization. I may put together a more thoughtful blog piece on the subject. I'll let you know if I do.

Tags: Database
