How to calculate the 99th percentile of a stream of number...

Environment:

Oracle 11.2.0.3 EE on Solaris 10.5

I have a stream of numbers (for example, 1000 of them). I need to calculate the 99th percentile of the distribution of this stream so that, when a new number arrives, I can tell whether it is above the 99th percentile of the distribution.

I don't have a good idea of the nature of the distribution, if that matters.

I am NOT, I repeat NOT a statistician! :-)

I read the docs on the different functions available, ntile, percent_rank, percentile_cont, percentile_disc, etc. I also read a lot of articles referenced via Google.

The examples are not exactly what I'm trying and I was unable to get the result I need by trial and error (mostly!).

All suggestions are welcome!

If you need additional information, I will try to provide what I know.

-gary

Hi, Gary.

garywicke wrote:
Hi Frank

Well, in trying to simplify the description of the periods, I managed to confuse the issue a little more.

Sorry about that.

I need to use days, not months. The script will be run every day and will look at the period from 210 days ago to 31 days ago, followed by the period from 30 days ago to the present.

As for the expected result, I need to somehow calculate what sales amount represents the 99th percentile (or the 90th, whatever works in the example) and store that in my table for this customer.

Then post a CREATE TABLE statement for this table, and what you want it to contain, given the data posted in the sales table, after it is populated.

Another process will then look at the average for the last 30 days, determine whether that average exceeds the 99th percentile sales figure, and initiate an action (alert email, etc.) if it does.

My problem is that I don't know how to calculate this 99th percentile number.

From the sample data, I'm guessing it should be close to the 3 SD number, but after reading your comments, maybe that's not correct.

That's my guess. Sometimes standard deviations correspond to percentiles, and sometimes they don't. People choose one or the other, depending on what they know about their data and what they want to show about it. Your client has apparently chosen percentiles. Unfortunately, we do not seem to know much about the problem.
Before you can solve this problem for your customer, you must understand what the problem is. Talk to your client and find out exactly what is wanted here.

Here is one possible approach.

WITH got_month_avg AS
(
    SELECT    custno
    ,         AVG (amount)  AS month_avg
    FROM      sales
    WHERE     tran_dt  >  SYSDATE - 30
    AND       tran_dt  <= SYSDATE
    GROUP BY  custno
)
, got_threshold_amount AS
(
    SELECT DISTINCT
              custno
    ,         PERCENTILE_CONT (.4)          -- or .99
                  WITHIN GROUP (ORDER BY amount)
                  OVER (PARTITION BY custno)  AS threshold_amount
    FROM      sales
    WHERE     tran_dt  >  SYSDATE - 210
    AND       tran_dt  <= SYSDATE - 30
)
SELECT    a.custno
,         a.month_avg
,         t.threshold_amount
FROM      got_month_avg         a
JOIN      got_threshold_amount  t  ON  t.custno = a.custno
--WHERE   a.month_avg  >= threshold_amount
ORDER BY  custno
;

Output:

    CUSTNO  MONTH_AVG THRESHOLD_AMOUNT
---------- ---------- ----------------
         5      126.5           4812.6
        26 4602.44444           3102.2

I commented out the WHERE clause, just so that we could see the results for each custno. I also set the threshold extremely low (40th percentile) just so that some custno would be selected if you do use the WHERE clause.

The subqueries compute the average over the last 30 days and the 40th percentile for the 180 days before that. Let's take a look at custno = 5. There were two rows in the last 30 days (amounts 225 and 28), so the average for the last month is (225 + 28) / 2 = 253 / 2 = 126.5.
In the 180 days prior to that, custno 5 had 7 amounts, in ascending order:
1st (0th percentile): 647
2nd (17th percentile): 1869
3rd (33rd percentile): 3065
4th (50th percentile): 7434
5th (67th percentile): 8435
6th (83rd percentile): 9393
7th (100th percentile): 9712
Since (for now) we are interested in the 40th percentile, the result will be more than 3065 (33rd percentile) and less than 7434 (50th percentile), and somewhat closer to 3065 than to 7434, since 40% is somewhat closer to 33% than to 50%. This number turns out to be 4812.6.
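The interpolation described above can be checked by hand. The sketch below is a language-neutral illustration in Python, not Oracle code; it assumes the rule Oracle documents for PERCENTILE_CONT, where the (possibly fractional) row number is RN = 1 + P * (N - 1) and the result is interpolated linearly between the two neighbouring rows. The helper name percentile_cont is made up for this example.

```python
import math

def percentile_cont(values, p):
    """Linear-interpolation percentile using RN = 1 + p * (N - 1).

    Hypothetical helper that mimics the documented PERCENTILE_CONT rule.
    """
    ordered = sorted(values)
    rn = 1 + p * (len(ordered) - 1)        # 1-based, possibly fractional row number
    lo, hi = math.floor(rn), math.ceil(rn)
    low_val, high_val = ordered[lo - 1], ordered[hi - 1]
    # Move the fractional part of rn along the gap between the two rows
    return low_val + (rn - lo) * (high_val - low_val)

# The 7 amounts for custno = 5 in the 180-day window
amounts = [647, 1869, 3065, 7434, 8435, 9393, 9712]
threshold = percentile_cont(amounts, 0.4)

# Average of the two amounts in the last 30 days
month_avg = (225 + 28) / 2

exceeds = month_avg >= threshold
print(round(threshold, 1), month_avg, exceeds)
```

With P = 0.4 and N = 7 the row number is 3.4, so the result lies 40% of the way from the 3rd amount (3065) to the 4th (7434), which gives 4812.6. For custno = 5 the 30-day average of 126.5 is below that threshold, so the commented-out WHERE clause would not select this customer.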

Tags: Database

Similar Questions

  • How to calculate the percentile in obiee

    Hello

    I need to calculate the 25th, 50th, 75th, and 95th percentiles in Answers. How do I do that?

    Please help me.

    Published by: user10441472 on July 23, 2009 09:08

    Use NTILE. This gives you the ability to choose different percentiles. For example, if you want to see a decile report (10th percentile), use NTILE(measure, 10), and so on...

  • How to calculate the execution time of a SCTL in FPGA VI?

    Hello

    Can someone guide me on how to calculate the execution time of one iteration of an SCTL in an FPGA VI?

    Thank you and best regards,

    Rashid

    Hello Rashid,

    An SCTL always executes in one tick of the clock it is linked to. So, if you use a 40 MHz clock, the loop will run in 25 ns. If the code cannot complete in that time, or if it needs two clock ticks to do the calculation, your code will not compile, so you have the guarantee that this is always how long this piece of code takes to run.

  • How to calculate the value of dt for the Build Waveform.vi function

    Dear all

    Please guide me on how to calculate the value of dt for Build Waveform.vi.

    My sampling rate is 25000 and I take 200000 samples.

    Kind regards

    Muhammad Irfan

    It's all simple arithmetic. dt, the time between samples, is the inverse of the sampling rate. The number of samples is not relevant.
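As a quick check of that arithmetic with the figures from the question, here is a small Python sketch (only an illustration of the calculation, not LabVIEW code; the variable names are made up for the example):

```python
sampling_rate_hz = 25_000   # samples per second, from the question
num_samples = 200_000       # only affects the total duration, not dt

# dt is the time between two consecutive samples
dt_seconds = 1 / sampling_rate_hz

# Total length of the acquired waveform
duration_seconds = num_samples * dt_seconds

print(f"dt = {dt_seconds * 1e6:.0f} microseconds")
print(f"duration = {duration_seconds:.1f} seconds")
```

So dt is 40 microseconds regardless of how many samples are taken; the 200000 samples only determine that the whole waveform spans about 8 seconds.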

  • How to calculate the execution time of a loop?

    Hello

    Can someone guide me on how to calculate the execution time of one iteration of a loop?

    Thank you and best regards,

    Rashid

    I hope I have understood your question! See the attached screenshot.

  • How to calculate a polynomial fit of a waveform graph

    Hi all

    I am new to LabVIEW, so I need a little assistance with a problem I have right now.

    My problem is: how do I calculate a polynomial fit of waveform graph data? Do I need to convert the waveform to XY graph data and then use the built-in polynomial fit VI to calculate the fit?

    Detail: I have a waveform graph and I calculate the peaks and valleys, but because of noise, my peak and valley detection is sometimes not exact, so to smooth the graph I must apply a polynomial fit.

    If anyone can help me with this, I'll be very grateful.

    Thanks in advance

    Hi Omar,

    Have you seen Lynn's suggestion above?

    You already have the Y values (your array). Now you need to build the X array as indicated, only replacing the value of dt with your point-to-point spacing. Somewhere in your code you know that value, since you have an x-axis shown in milliseconds...

  • How to calculate CPU Ready for a DRS cluster via PowerCLI?

    Hi all!

    I have a vSphere DRS cluster. I want to know the CPU Ready value for my cluster.

    For example, if I get a value of 20% from PowerCLI, that is normal for the cluster, but if I get 100% or more, I have a problem.

    How can I achieve this via PowerCLI? And how do I calculate the percentage values correctly?

    I know I can get the CPU Ready values of all the VMs in the cluster, but that is not the same thing; I need the overall CPU Ready value.

    Thanks in advance!

    As far as I know you can get the cpu.ready.summation for ESXi nodes or VMs.

    For a cluster, you will need to get the value for each ESXi node in the cluster and then take the average.

    The metric cpu.ready.summation is expressed in milliseconds.

    To get a percentage, you need to calculate the ready time as a percentage of the interval over which it was measured.

    Something like this (this will give the current ready %)

    $clusterName = "mycluster"

    $stat = "cpu.ready.summation"

    $esx = Get-Cluster -Name $clusterName | Get-VMHost

    $stats = Get-Stat -Entity $esx -Stat $stat -Realtime -MaxSamples 1 -Instance ""

    $readyAvg = $stats | Measure-Object -Property Value -Average | Select -ExpandProperty Average

    $readyPerc = $readyAvg / ($stats[0].IntervalSecs * 1000)

    Write-Output "Cluster $($clusterName) - CPU Ready % $('{0:p}' -f $readyPerc)"
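The percentage step in the script is plain arithmetic: the ready time in milliseconds divided by the length of the sampling interval in milliseconds. The Python sketch below only illustrates that calculation; the function name, the per-host sample values, and the 20-second interval are assumptions made up for the example, not output from a real cluster.

```python
def cpu_ready_percent(ready_ms_per_host, interval_secs):
    """Average CPU ready time across hosts, as a percentage of the interval.

    ready_ms_per_host: one cpu.ready.summation value (milliseconds) per host.
    interval_secs: length of the sampling interval in seconds.
    """
    avg_ready_ms = sum(ready_ms_per_host) / len(ready_ms_per_host)
    interval_ms = interval_secs * 1000
    return avg_ready_ms / interval_ms * 100

# Hypothetical values for three hosts over an assumed 20-second interval
samples_ms = [400, 1200, 800]
percent = cpu_ready_percent(samples_ms, interval_secs=20)
print(f"CPU ready: {percent:.2f}%")
```

Here the hosts average 800 ms of ready time in a 20000 ms interval, which is 4%.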

  • How to calculate the cpu in the resource pool

    How to calculate the cpu in the resource pool

    And don't forget that shares on resource pools are not inherited by the virtual machines in the pools. The shares apply to the pool itself.

  • How to calculate the sum of two numeric form fields based on a checkbox selection.

    I have a form in Acrobat Pro that needs a custom calculation. How do I calculate the sum of two numeric form fields based on a checkbox selection? I have three number fields. Field A and Field B are simple one- or two-digit numbers. Field C is the sum, or total, field. I want Field C to have a check box which, when toggled, gives either just A or the sum A + B.

    Field A: 2

    Field B: 4

    [checked] Field C: 6

    [unchecked] Field C: 2

    Thank you

    The custom field C calculation script could be:

    (function () {
    
        // Get the values of the text fields, as numbers
        var v1 = +getField("A").value;
        var v2 = +getField("B").value;
    
        // Set this field's value based on the state of the check box named "CB"
        if (getField("CB").value !== "Off") {
            event.value = v1 + v2;
        } else {
            event.value = v1;
        }
    
    })();
    

    Replace 'A', 'B', and 'CB' with the real names of the fields.

  • How does Illustrator calculate the Em Box height?

    I am trying to determine how Illustrator calculates the baseline position from the top of a point text object's bounding box.  It seems to be the same distance as for an area text object with the First Baseline option set to Em Box Height.  I just do not know which property of the font that corresponds to.  Illustrator Help defines it as:

    "The top of the em box in Asian fonts touches the top of the type object. This option is available regardless of the Show Asian Options preference."

    For some fonts, this distance seems to be close to the font's ascender, but this is not true for every font.  Any clues as to how Illustrator calculates it?

    metric-example.png

    Here you go

    var idoc = app.activeDocument;
    var itext = idoc.selection[0];
    
    var top = itext.position[1];
    var base = itext.anchor[1];
    
    alert('Em? size: ' + (top-base));
    
  • How to calculate the size of a VMFS volume?

    Creating a new partition table after an ESXi 4 disk was wiped was pretty easy with fdisk.
    These days, with partedUtil and ESXi 5, it is not so trivial.

    Consider that we have a disk/LUN that looks like this

    ~ # partedUtil get /vmfs/devices/disks/mpx.vmhba1:C0:T4:L0
    93990 255 63 1509949440

    and assume that the original, now messed-up, volume was created with ESXi 5.
    How do I calculate the end sector value so that I can recreate the VMFS partition like this:

    partedUtil setptbl "/vmfs/devices/disks/mpx.vmhba1:C0:T4:L0" gpt "1 2048 ? AA31E02A400F11DB9590000C2911D1B8 0"


    In this case, the correct value is 1509949349, but how do I calculate it if I don't know the correct value?

    Hello

    The output of the command "partedUtil get /vmfs/devices/disks/mpx.vmhba1:C0:T4:L0" will be useful in your case; it is "93990 255 63 1509949440".

    The first three numbers give us the values for C/H/S (cylinders/heads/sectors).

    A VMFS partition must end on a cylinder boundary, so the last sector of the partition should be calculated using the formula endSector = (C * H * S - 1),

    which is 93990 * 255 * 63 - 1 = 1509949350 - 1 = 1509949349.

    Regards

    (Ref: http://www.virtuallyghetto.com/2011/07/how-to-format-and-create-vmfs-volume.html)
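The formula from the answer is easy to verify with the numbers in this thread. The Python sketch below only reproduces the arithmetic; the function name is made up for the example, and it assumes the "partedUtil get" line has the four fields shown above (cylinders, heads, sectors per track, total sectors).

```python
def vmfs_end_sector(partedutil_get_line):
    """Return the last sector that ends on a cylinder boundary.

    Expects the first line of `partedUtil get` output:
    "<cylinders> <heads> <sectors per track> <total sectors>".
    """
    cylinders, heads, sectors_per_track, total_sectors = map(
        int, partedutil_get_line.split()
    )
    end_sector = cylinders * heads * sectors_per_track - 1
    # Sanity check: the partition must fit on the disk
    if end_sector >= total_sectors:
        raise ValueError("computed end sector is beyond the end of the disk")
    return end_sector

print(vmfs_end_sector("93990 255 63 1509949440"))
```

For this disk, C * H * S is 1509949350, slightly less than the 1509949440 total sectors reported, so the end sector is 1509949349, matching the value quoted in the question.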

  • How to calculate the number of days/weeks/months between 2 dates?

    Hello

    I would like to know how to calculate the number of days/weeks/months between 2 dates in OBIEE 11g. For example, I have 26/05/2013 and 19/05/2013, and I want to get 7 days.

    Thank you!
    Jamie

    Hi Jamie,

    Go through these links...

    http://www.bravesoft.com/blog/?p=682
    http://twobiee.blogspot.in/2012/01/working-with-date-differences.html

    Mark as correct if it helps you...
    Thank you...
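For reference, the expected result for the two dates in the question can be checked with a few lines of Python. This is only an illustration of the date arithmetic, not OBIEE syntax; in OBIEE the usual tool for this is the TIMESTAMPDIFF function, which the linked articles cover. Counting weeks as whole 7-day blocks and months as calendar-month boundaries crossed are assumptions made for this sketch.

```python
from datetime import date

start = date(2013, 5, 19)
end = date(2013, 5, 26)

# Subtracting two dates gives a timedelta; .days is the whole-day difference
days_between = (end - start).days

# Whole weeks contained in that span
weeks_between = days_between // 7

# Calendar-month boundaries crossed between the two dates
months_between = (end.year - start.year) * 12 + (end.month - start.month)

print(days_between, weeks_between, months_between)
```

For 19/05/2013 and 26/05/2013 this gives 7 days, 1 week, and 0 months.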

  • How to calculate the size of HFM Cube in SQL Server 2005

    Hello

    How to calculate the size of the HFM Cube in SQL Server2005

    The queries below are what we used on Oracle. So what applies to SQL Server?

    ***********************************************************************************************************
    SQL> select sum(bytes/1024/1024) from dba_segments where segment_name like 'FINANCIAL_%' and owner = 'HFM';
    SUM(BYTES/1024/1024)

    SQL> select sum(bytes/1024/1024) from dba_segments where segment_name like 'FINANCIAL% HSV' and owner = 'HFM';

    SUM(BYTES/1024/1024)
    **********************************************************************************************************

    Regards
    Smilee

    What is your goal? The subcube in HFM is a concept that applies to the application layer, not so much to the database layer. The size of a subcube is the number of unique records (data values for January through December inclusive, for example) for a given entity, a currency triplet, or a Parent.Child node. You should also account for parent accounts and customs that do not exist in the database but are calculated in the RAM of the application layer.

    So if your goal is to find the largest subcubes, you could do this by querying the database and counting the number of records per entity value (DCE tables) or per parent.child entity combination (DCN tables). I am not versed in SQL, but I think the script you posted would give you just the size of the schema and not the sizes of the subcubes.

    See Accelatis.com for a third-party software product that can do this for you. The function is called the Subcube Analyzer and was written by the same team that wrote HFM, so they should know how :-)

    -chris

  • How to calculate the total memory used by oracle under Linux?

    Hi all

    One of my servers has 148 GB of total physical memory, and 12 databases are running on it. For capacity planning, I need to know how much memory Oracle uses on the server.

    How do I calculate the memory used by the Oracle databases? I was perplexed because top shows that 128 GB out of 148 GB are used.

    Below for your reference:
    ++++++++++++++++++++
    ==> top output
    top - 09:52:32 up 151 days, 15:20, 3 users, load average: 6.11, 6.00, 6.05
    Tasks: 1321 total, 7 running, 1314 sleeping, 0 stopped, 0 zombie
    Cpu(s): 37.8%us, 1.4%sy, 0.0%ni, 55.0%id, 5.6%wa, 0.1%hi, 0.2%si, 0.0%st
    Mem: 148290444k total, 129368792k used, 18921652k free, 1450292k buffers
    Swap: 50331632k total, 1296188k used, 49035444k free, 114733336k cached


    The memory allocated to each instance is (total SGA ==> 34 GB, total PGA ==> 48 GB):
    +++++++++++++++++++++++++++++

    SGA PGA
    4 GB 4 gb
    4 GB 4 gb
    2 GB, 4 GB
    2 GB, 4 GB
    2 GB, 4 GB
    2 GB, 4 GB
    2 GB, 4 GB
    3 gb 4 gb
    3 gb 4 gb
    4 GB 4 gb
    3 gb 4 gb
    3 gb 4 gb

    Thank you

    Try this - http://www.pythian.com/news/29703/oracle-instance-memory-usage/ for each of your instances.

    As far as I know, different instances share only the Oracle binary code and libraries, so you should be able to easily work out the amount of memory used by all 12 instances.

    Lordane Iotzov
    http://iiotzov.WordPress.com/
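As a quick sanity check on the totals quoted in the question, the per-instance figures can be summed. This Python sketch only adds up the configured SGA and PGA sizes from the table; it says nothing about actual usage, which may differ from the configured values and which the linked article measures per instance.

```python
# (SGA GB, PGA GB) for each of the 12 instances, copied from the table in the question
instances = [
    (4, 4), (4, 4), (2, 4), (2, 4), (2, 4), (2, 4),
    (2, 4), (3, 4), (3, 4), (4, 4), (3, 4), (3, 4),
]

total_sga_gb = sum(sga for sga, _ in instances)
total_pga_gb = sum(pga for _, pga in instances)
total_configured_gb = total_sga_gb + total_pga_gb

print(total_sga_gb, total_pga_gb, total_configured_gb)
```

The sums match the 34 GB SGA and 48 GB PGA stated in the question, for 82 GB configured across the 12 instances.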

  • How to calculate the month on month growth rates in an OBIEE application?

    Hi all

    I would like to ask for your help on how to calculate the month-on-month growth rate [(last month - previous month) / previous month * 100%] in an OBIEE query. This report should always be calculated for the last 2 available months.

    I have the following query:
                 Month0 | Month1 | Month2
    Product A       500 |    100 |    200
    Product B       600 |    300 |    150

    I would like to add a month-on-month column as follows:
                 Month0 | Month1 | Month2 | Month on Month (%)
    Product A       500 |    100 |    200 | +100.00%
    Product B       600 |    300 |    150 | -50.00%

    I tried to add a calculated item, but it failed because I couldn't find a way to show the calculated column as a percentage with 2 decimal places. In addition, I would ideally prefer it to update automatically, but as far as I understand, that cannot be done with a calculated item.

    I also tried to add a new column to the column area and to filter the results for the last month, then for the previous month, and then calculate the necessary ratio from those, but unfortunately it does not work.

    Thank you in advance for your advice

    Hello

    The best way to solve this problem is to use the AGO function. With it, you can create a logical column for the previous month. Then you will have 2 columns available with which you can do your calculations.

    If you want to do this with a calculated item (I don't know whether the formatting will work for you), you can make the calculation more general by using $1 for the first column and $2 for the second column in your calculation. The $x references will follow the columns in your report.

    Regards
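The formula from the question can be checked against its own sample rows. The Python sketch below only illustrates the arithmetic and the two-decimal formatting; it is not OBIEE syntax, and the function name is made up for the example. It takes the last two months in each row (Month1 as the previous month, Month2 as the last month), which is an assumption read from the expected output.

```python
def month_on_month_growth(previous_month, last_month):
    """(last month - previous month) / previous month, as a percentage."""
    return (last_month - previous_month) / previous_month * 100

# Sample rows from the question: Month0, Month1, Month2
rows = {
    "Product A": [500, 100, 200],
    "Product B": [600, 300, 150],
}

results = {}
for product, months in rows.items():
    growth = month_on_month_growth(months[-2], months[-1])
    results[product] = f"{growth:+.2f}%"   # explicit sign, 2 decimal places
    print(product, results[product])
```

Product A goes from 100 to 200, a growth of +100.00%, and Product B goes from 300 to 150, a growth of -50.00%, matching the expected column in the question.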
