Hash join spilling to temp space

I am using Oracle version 11.2.0.4.0. I have the below parameter settings from GV$PARAMETER:
pga_aggregate_target - 8 GB
hash_area_size - 128 KB
sort_area_size - 64 KB

Now the query with the plan below runs for ~1 hour and fails on temp space: ORA-01652 unable to extend temp segment by 128 in tablespace TEMP.
We have currently allocated ~200 GB to the temp tablespace. This query runs fine for the daily run with a nested loop and the required index, but for the monthly run the plan changes, I think because of the data volume, and it switches to a hash join, which I believe is a good decision by the optimizer.

AFAIK, a hash join spilling to temp will slow query response time, so I need expert advice: should we increase pga_aggregate_target so that the hash workarea is large enough to accommodate the driving table? How big should we make it - should it be the same as the size of the driving table? Or is there another workaround? Note: the size of the driving table B is ~400 GB.

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT               |                          |       |       |       |    10M(100)|          |       |       |
|   1 |  LOAD TABLE CONVENTIONAL       |                          |       |       |       |            |          |       |       |
|   2 |   FILTER                       |                          |       |       |       |            |          |       |       |
|   3 |    HASH JOIN                   |                          |  8223K|  1811M|       |    10M  (1)| 35:30:55 |       |       |
|   4 |     TABLE ACCESS STORAGE FULL  | A_GT                     |    82 |   492 |       |     2   (0)| 00:00:01 |       |       |
|   5 |     HASH JOIN                  |                          |  8223K|  1764M|   737M|    10M  (1)| 35:30:55 |       |       |
|   6 |      PARTITION RANGE ITERATOR  |                          |  8223K|   643M|       |    10M  (1)| 34:18:55 |   KEY |   KEY |
|   7 |       TABLE ACCESS STORAGE FULL| B                        |  8223K|   643M|       |    10M  (1)| 34:18:55 |   KEY |   KEY |
|   8 |      TABLE ACCESS STORAGE FULL | C_GT                     |    27M|  3801M|       |   118K  (1)| 00:23:48 |       |       |
-----------------------------------------------------------------------------------------------------------------------------------
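Before resizing anything, it may help to see how big the hash-join workarea actually got and whether it ran optimal, one-pass or multi-pass. A minimal sketch (the SQL_ID below is a placeholder - look up the real one in V$SQL first):

-- Placeholder SQL_ID: replace with the real one from V$SQL.
SELECT operation_type,
       last_memory_used/1024/1024        AS last_mem_mb,
       last_execution,                   -- OPTIMAL / ONE PASS / MULTI-PASS
       last_tempseg_size/1024/1024       AS last_temp_mb,
       estimated_optimal_size/1024/1024  AS est_optimal_mb
  FROM v$sql_workarea
 WHERE sql_id = 'abcd1234efgh5'
   AND operation_type LIKE 'HASH%';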


Finding plans by trial and error is not an efficient use of your time - and if it were a good idea to avoid hash joins, Oracle would not have implemented them in the first place. I can understand your DBA having a yen to avoid them, though, because any spill of a hash join to disc often has a (relatively) much bigger effect than you might expect.  In this case, however, you have a nested loop into A_GT which operates 39M times to access an 82-row table by index - clearly (a) the CPU work involved would be reduced if you included the table columns in the index definition, but more significantly the CPU cost of the A_GT/C_GT join would drop if you built an in-memory hash table from A_GT - that is, used a hash join.
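Purely as an illustration (the aliases and join columns below are assumptions, not taken from the posted SQL), the kind of hinting that asks for the small global temporary tables to be the in-memory build sides of hash joins might look like this:

-- Sketch only: join columns and aliases are assumed.
SELECT /*+ leading(a c b) use_hash(c b) */
       count(*)
  FROM a_gt a, c_gt c, b
 WHERE c.join_col1 = a.join_col1   -- assumed join condition
   AND b.join_col2 = c.join_col2;  -- assumed join condition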

What you are asking for is a description of how to optimise a data warehouse on an Exadata machine - a forum is not the right place for that discussion; all I can say is that you and your DBAs need to do some testing to find out the best way to match your queries to the features Exadata offers, and keep an eye on the queries the application produces in case usage patterns change.  There are a few trivial generalities that anyone could offer:

(a) partitioning by day is good, provided you can ensure that your queries can do partition elimination so that they visit only the days they want; even better if they only ever need to address a limited set of partitions.

(b) the I/O for large hash joins spilling to disc can be catastrophic compared to the underlying I/O for the tablescans that make the first pass of the data, which means that simple queries can give the impression that Exadata is incredibly fast (especially when storage indexes and the flash cache are effective), but slightly more complex queries can be surprisingly slow by comparison.

(c) once you have missed the cell server flash cache, single block reads are against big, slow discs - queries that do a lot of single-block I/O (viz: big reports using nested loop joins against randomly scattered data) can suffer very slow I/O.

You need to know the data, know the general structure of your queries, be ready to generate materialized views for complex derived data, and understand the strengths and weaknesses of Exadata.

Regards

Jonathan Lewis

Tags: Database

Similar Questions

  • Indexes and hash join

    Hi all

    I'll ask a very quick question: can a hash join access both tables by index, the way a nested loop join does?  Is this possible?

    For example:

    HASH JOIN

    TABLE ACCESS BY INDEX ROWID

    INDEX RANGE SCAN

    TABLE ACCESS BY INDEX ROWID

    INDEX RANGE SCAN


    (Edited)

    Thank you

    Of course you can, if you hint it:

    orclz> set autot traceonly exp

    orclz> create index emp_ename_i on emp(ename);

    Index created.

    orclz> create index dept_dname_i on dept(dname);

    Index created.

    orclz> select /*+ use_hash(emp dept) */ * from emp natural join dept where dname = 'SALES' and ename = 'MILLER';

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 937889317

    -----------------------------------------------------------------------------------------------------
    | Id  | Operation                             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                      |              |     1 |   117 |     4   (0)| 00:00:01 |
    |*  1 |  HASH JOIN                            |              |     1 |   117 |     4   (0)| 00:00:01 |
    |   2 |   TABLE ACCESS BY INDEX ROWID BATCHED | EMP          |     1 |    87 |     2   (0)| 00:00:01 |
    |*  3 |    INDEX RANGE SCAN                   | EMP_ENAME_I  |     1 |       |     1   (0)| 00:00:01 |
    |   4 |   TABLE ACCESS BY INDEX ROWID BATCHED | DEPT         |     1 |    30 |     2   (0)| 00:00:01 |
    |*  5 |    INDEX RANGE SCAN                   | DEPT_DNAME_I |     1 |       |     1   (0)| 00:00:01 |
    -----------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       1 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
       3 - access("EMP"."ENAME"='MILLER')
       5 - access("DEPT"."DNAME"='SALES')

    Note

    -----

    - dynamic statistics used: dynamic sampling (level=4)

  • Hash join

    Hi friends,


    If I have a table T1 and a table T2, where T1 has 100 rows and T2 has 20 rows: when a hash join is done, which table should be used to build the hash table, the larger or the smaller one, and why? If that data set is too small to matter, then assume table T1 has 10 million rows and table T2 has 1 million rows.




    Thanks as always :)

    If you are a developer: "the database optimizer chooses."

    If you are a student: http://tahiti.oracle.com - read the docs... we are not helping people cheat on tests.
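    For experimenting, the build side of a hash join can be steered with hints; a minimal sketch, assuming two tables T1 and T2 joined on a column C1:

    -- Ask for a hash join with T2 (the smaller table) as the in-memory build input.
    SELECT /*+ leading(t2 t1) use_hash(t1) */ *
      FROM t1, t2
     WHERE t1.c1 = t2.c1;

    -- If the join order must stay (t1, t2), swap_join_inputs can still make T2 the build side:
    -- /*+ leading(t1 t2) use_hash(t2) swap_join_inputs(t2) */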

  • Hash join buffered in memory

    Hi all

    What is the HASH JOIN BUFFERED row source operation? I know it uses hash distribution to send the data, but other than that, what is the difference between a plain hash join and a buffered hash join? I would like to know the working mechanism.

    First plan:
    ========
    The query coordinator scans, aggregates and hash-distributes ZIPXXX at line 13 to slave set 2.

    Slave set 1 scans in parallel and hash-distributes ZIPXXX at line 17 to slave set 2.

    Slave set 2 joins these at line 7, then hash-distributes (probably on a different column) back to slave set 1 - which is why the hash join at line 7 has to be buffered.

    Slave set 2 scans in parallel and hash-distributes REP_XXX at line 21 to slave set 1.

    Slave set 1 hash-joins them at line 4, then passes the result to the query coordinator to write to the table.

    It looks as though you should 'alter session enable parallel dml' to allow the insert as select at line 4 to load in parallel.

    Second plan:
    =========
    The plan is virtually identical, although gathering statistics seems to have changed the table names.

    The query coordinator scans, aggregates and BROADCASTS GEO_XXX at line 13 to slave set 2.

    Because of the very small broadcast result set, slave set 2 can scan and join GEO_SXXX at line 15 and then broadcast the result (probably keyed on a different column) back to slave set 1.

    Because of the very small broadcast result set, slave set 1 can scan and join REP_XXX at line 15, and then pass the results to the QC to write to the table.

    Regards
    Jonathan Lewis
    http://jonathanlewis.WordPress.com
    http://www.jlcomp.demon.co.UK
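    For what it's worth, a commonly quoted way of avoiding the buffered variant is to change the distribution so that the smaller row source is broadcast rather than hash-distributed, leaving only one active data flow; a sketch only, with assumed table names, aliases and join column:

    SELECT /*+ parallel(s 4) parallel(b 4)
               leading(s b)
               use_hash(b)
               pq_distribute(b broadcast none) */
           count(*)
      FROM small_table s, big_table b
     WHERE b.join_col = s.join_col;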

  • Changing the selected columns changes a hash join into a nested loop

    Hi all

    If "select * from...". «I "select table.* of...» "then plan changes.
    PLAN_TABLE_OUTPUT
    ----------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID  a4fgvz5w6b0z8, child number 0
    -------------------------------------
    select * from ofertas ofe, ofertas_renting ofer where ofer.codigodeempresa = ofe.codigodeempresa    AND ofer.numerooferta =
    ofe.numerooferta    AND ofe.captacion = '1'
    
    Plan hash value: 3056192218
    
    ----------------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation          | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
    ----------------------------------------------------------------------------------------------------------------------------------------
    |*  1 |  HASH JOIN         |                 |      1 |  23766 |  4032   (2)|  27421 |00:00:00.96 |    5444 |  9608K|  1887K|   10M (0)|
    |*  2 |   TABLE ACCESS FULL| OFERTAS         |      1 |  23969 |  1324   (2)|  27421 |00:00:00.14 |    2140 |       |       |          |
    |   3 |   TABLE ACCESS FULL| OFERTAS_RENTING |      1 |  71297 |   937   (2)|  72385 |00:00:00.22 |    3304 |       |       |          |
    ----------------------------------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("OFER"."CODIGODEEMPRESA"="OFE"."CODIGODEEMPRESA" AND "OFER"."NUMEROOFERTA"="OFE"."NUMEROOFERTA" AND
                  SYS_OP_DESCEND("OFER"."NUMEROOFERTA")=SYS_OP_DESCEND("OFE"."NUMEROOFERTA"))
       2 - filter("OFE"."CAPTACION"='1')
    
    
    22 rows selected.
    
    PLAN_TABLE_OUTPUT
    ----------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID  2410uqu059fgw, child number 0
    -------------------------------------
    select ofe.* from ofertas ofe, ofertas_renting ofer where ofer.codigodeempresa = ofe.codigodeempresa
    AND ofer.numerooferta = ofe.numerooferta    AND ofe.captacion = '1'
    
    Plan hash value: 4206210976
    
    ----------------------------------------------------------------------------------------------------------------
    | Id  | Operation          | Name               | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
    ----------------------------------------------------------------------------------------------------------------
    |   1 |  NESTED LOOPS      |                    |      1 |  23766 |  1333   (3)|  27421 |00:00:00.58 |   33160 |
    |*  2 |   TABLE ACCESS FULL| OFERTAS            |      1 |  23969 |  1324   (2)|  27421 |00:00:00.27 |    3910 |
    |*  3 |   INDEX UNIQUE SCAN| PK_OFERTAS_RENTING |  27421 |      1 |     0   (0)|  27421 |00:00:00.26 |   29250 |
    ----------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("OFE"."CAPTACION"='1')
       3 - access("OFER"."CODIGODEEMPRESA"="OFE"."CODIGODEEMPRESA" AND
                  "OFER"."NUMEROOFERTA"="OFE"."NUMEROOFERTA")
    
    
    22 rows selected.
    Why does the plan change if the cost of the full table access to OFERTAS is identical in both plans?

    Thank you very much.

    Joaquin Gonzalez

    Published by: Joaquín González on November 4, 2008 17:32

    Joaquín González wrote:
    Hello

    Perhaps the reason is that blevel = 0?

    "This is."
    some common cases that could result in a variation between the basic formula and the
    "result:

    ...

    "Index where the blevel is set to 1 (the index goes directly from the root block in the).
    leaf blocks). The optimizer ignores effectively the blevel if each column in the index
    appears in a predicate of equality. "

    Joaquin,

    you're referring to the section on "simple B-tree access", but this is a nested loop operation, so that does not apply. You can see that the "simple B-tree access" comes out at a cost of 1, as you have tested yourself.

    I think this is a special case: if you have a nested loop operation that uses a unique index access as the inner row source, then the cost of the unique index access is simply BLEVEL - 1. You might get a different cost if an additional table access by rowid is involved, which is usually the case. But even in that case the unique index access is still costed at BLEVEL - 1, and the table access by rowid is usually costed at 1 per iteration.

    You can see on page 313 (the "Nested Loops" chapter) that Jonathan used an example involving a unique index scan that also has a cost of 0.

    Kind regards
    Randolf

    Oracle related blog stuff:
    http://Oracle-Randolf.blogspot.com/

    SQLTools ++ for Oracle (Open source Oracle GUI for Windows):
    http://www.sqltools-plusplus.org:7676/
    http://sourceforge.NET/projects/SQLT-pp/
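    As a quick sanity check of the costing argument above, the blevel the optimizer is using for the unique scan can be read straight from the dictionary; a minimal sketch (index name taken from the plan above):

    SELECT index_name, blevel, leaf_blocks, clustering_factor
      FROM all_indexes
     WHERE index_name = 'PK_OFERTAS_RENTING';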

  • Parallel hash join always spills to TEMP

    Hello

    I've seen some strange behaviour on Oracle 9.2.0.5 recently: a simple hash-join query of two tables - the smaller one with 16k records / 1 MB in size and the bigger one with 2.5M records / 1.5 GB in size - spills to TEMP when run in parallel mode (4 PQ slaves). What is strange is that the serial run behaves as expected - the hash join happens in memory. I should add that both the parallel and the serial run correctly select the smaller table as the build input, but the parallel query always decides to buffer the data source (no matter what its size is).

    To be more precise - all table statistics are gathered, I have enough PGA memory assigned for queries (WORKAREA_SIZE_POLICY = AUTO, PGA_AGGREGATE_TARGET = 6 GB) and I have analysed the trace results. Even the hidden parameter _smm_px_max_size is set correctly, to about 2 GB; the problem is that the parallel execution still decides to spill (even though the build data for each slave is only about 220 KB).

    I dug into the traces (event 10104) and found a substantial difference between serial and parallel execution. It seems that some internal flag orders the PQ slaves always to buffer the data; here is what I found in a PQ slave trace:

    HASH JOIN STATISTICS (INITIALIZATION)
    Original memory: 4428800
    Memory after all overhead: 4283220
    Memory for slots: 3809280
    Calculated overhead for partitions and row/slot managers: 473940
    Hash-join fanout: 8
    Number of partitions: 9
    Number of slots: 15
    Multiblock IO: 31
    Block size(KB): 8
    Cluster (slot) size(KB): 248
    Hash-join fanout (manual): 8
    Cluster/slot size(KB) (manual): 280
    Minimum number of bytes per block: 8160
    Bit vector memory allocation(KB): 128
    Per partition bit vector length(KB): 16
    Maximum possible row length: 1455
    Estimated build size (KB): 645
    Estimated row length (includes overhead): 167
    Immutable flags:
      BUFFER the output of the join for Parallel Query
    kxhfSetPhase: phase=BUILD
    kxhfAddChunk: add chunk 0 (sz=32) to slot table
    kxhfAddChunk: chunk 0 (lb=800003ff640ebb50, slotTab=800003ff640ebce8) successfully added
    kxhfSetPhase: phase=PROBE_1

    The part in bold ("BUFFER the output of the join for Parallel Query") is not present in serial mode. Unfortunately I can't find anything that could help identify the reason, or the setting, that drives this behaviour :(

    Best regards
    Bazyli

    Published by: user10419027 on October 13, 2008 03:53

    Buzzylee wrote:
    Jonathan,

    After today's tests my understanding of the problem has not changed significantly - I still don't understand why Oracle spills the probe table to disc.
    The only news is that I can see it's not a typical "on disk" hash join, because the inner table is not written to TEMP. Moreover, you confirmed that this immutable flag is not what forces this kind of behaviour (BTW, thanks for that!).

    So maybe it's a bug? In the meantime, I have checked it against a newer version of the DB (9.2.0.8) - still the same behaviour.

    I copied your example - the behavior also appears in 10g and 11g.
    This probably isn't a bug, but it may be a case where a generic strategy is not appropriate.

    The extra partition is NOT the probe table, it is the result of the hash join. The result is buffered until that data flow can be sent on to the next 'slave set' (which happens to be the query coordinator in this case). Your memory allocation allowed for about 18 slots (multiblock IO batches) of 31 blocks each. You used 8 of them for the hash table; the rest are available to hold the result.

    Somewhere in your trace, around the point where you switch from writes to reads, you should see a summary for partition 8 and a count of "in-memory slots" which will tell you the size of the result.

    If the difference between the clusters and the in-memory slots is small, you may find that setting '_hash_multiblock_io_count' to a value lower than the 31 the optimizer selected frees up enough memory beyond the hash table for the result set to be built in memory.

    Another option - to get around this spill - is to switch to a (broadcast, none) distribution.

    Regards
    Jonathan Lewis
    http://jonathanlewis.WordPress.com
    http://www.jlcomp.demon.co.UK
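    A minimal sketch of the first suggestion (the value is purely illustrative, and changing an underscore parameter is something to test carefully and, for production use, to agree with Oracle Support):

    -- Illustrative value only: lower the hash join multiblock I/O count for this session,
    -- leaving more slots free to hold the buffered join result in memory.
    ALTER SESSION SET "_hash_multiblock_io_count" = 16;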

  • Hash join ANTI NA

    Just another simple day at the office...

    Here is what happened.
    A colleague contacted me saying that he had two similar queries: one of them returned data, the other did not.
    The "simplified" version of both queries looked like this:
    SELECT col1
      FROM tab1
     WHERE col1 NOT IN (SELECT col1 FROM tab2);
    This query returned no data, although he - and subsequently I as well - knew that there was an inconsistency in the data which should have caused rows to be returned.
    This was also proved/shown by the second query:
    SELECT col1
      FROM tab1
     WHERE NOT EXISTS
              (SELECT col1
                 FROM tab2
                WHERE tab1.col1 = tab2.col1);
    This query returned the expected difference. And this query is essentially identical to the first one!
    Even when we hard-coded an extra WHERE clause, the result was the same. No rows for:
    SELECT *
      FROM tab1
     WHERE  tab1.col1 NOT IN (SELECT col1 FROM tab2)
           AND tab1.col1 = 'car';
    and the correct rows for:
    SELECT *
      FROM tab1
     WHERE     NOT EXISTS
                  (SELECT 1
                     FROM tab2
                    WHERE tab1.col1 = tab2.col1)
           AND tab1.col1 = 'car';
    After an hour of searching and trying to reproduce the problem, I was almost ready to give up and raise it with Oracle Support as a bug.
    However, there was one difference I noticed that could be the cause of the problem.
    Although the statements are almost the same, the execution plans showed a slight difference. The execution plan of the NOT IN query looked like:
    Plan
    SELECT STATEMENT ALL_ROWS Cost: 5 Bytes: 808 Cardinality: 2
    3 HASH JOIN ANTI NA Cost: 5 Bytes: 808 Cardinality: 2
    1 TABLE ACCESS FULL TABLE PIM_KRG.TAB1 Cost: 2 Bytes: 606 Cardinality: 3 
    2 TABLE ACCESS FULL TABLE PIM_KRG.TAB2 Cost: 2 Bytes: 404 Cardinality: 2 
    Whereas the execution plan of the query with the NOT EXISTS looked like:
    Plan
    SELECT STATEMENT ALL_ROWS Cost: 5 Bytes: 808 Cardinality: 2
    3 HASH JOIN ANTI Cost: 5 Bytes: 808 Cardinality: 2
    1 TABLE ACCESS FULL TABLE PIM_KRG.TAB1 Cost: 2 Bytes: 606 Cardinality: 3 
    2 TABLE ACCESS FULL TABLE PIM_KRG.TAB2 Cost: 2 Bytes: 404 Cardinality: 2 
    See the difference?
    Not knowing what exactly a "HASH JOIN ANTI NA" was, I put it into the My Oracle Support knowledge base as a search term. Besides a few patch-set lists, I also found Document 1082123.1, which explains all about the HASH JOIN ANTI NULL_AWARE.

    In this document the behaviour we had seen is explained, the most important part being the note:
    "If t2.n2 contains NULL values, return no rows from t1 and terminate."

    And then it suddenly hit me why I had been unable to reproduce the case using my own test tables.

    In our case this meant that if tab2.col1 contained any row with a NULL value, the join between the two tables could not be performed on the basis of a 'NOT IN' clause.
    The query would end without any result!
    And that's exactly what we saw.

    The query with the NOT EXISTS does not use a null-aware anti join and therefore does return results.

    Also, the workaround mentioned there:
    alter session set "_optimizer_null_aware_antijoin" = false;
    does not seem to help. Although the execution plan changes:
    Plan
    SELECT STATEMENT ALL_ROWS Cost: 4 Bytes: 202 Cardinality: 1 
    3 FILTER 
    1 TABLE ACCESS FULL TABLE PIM_KRG.TAB1 Cost: 2 Bytes: 606 Cardinality: 3 
    2 TABLE ACCESS FULL TABLE PIM_KRG.TAB2 Cost: 2 Bytes: 404 Cardinality: 2 
    it still returns no rows!


    And now?

    Since there is a document explaining the behaviour, I doubt we can classify this as a bug. But in my opinion, developers who do not know about this strange behaviour will easily call it one.
    The 'problem' is easily solved (or worked around) by using NOT EXISTS, or NVL on the joined columns. However, I would expect the optimizer to sort this kind of thing out by itself.


    For all those who want to reproduce/investigate this case, I have listed my test code below.
    The database version we used was 11.1.0.7 on Windows 2008 R2; I don't know whether the operating system matters here.
    -- Create two tables, make sure they allow NULL values
    CREATE TABLE tab1 (col1 VARCHAR2 (100) NULL);
    CREATE TABLE tab2 (col1 VARCHAR2 (100) NULL);
    
    INSERT INTO tab1
    VALUES ('bike');
    
    INSERT INTO tab1
    VALUES ('car');
    
    INSERT INTO tab1
    VALUES (NULL);
    
    INSERT INTO tab2
    VALUES ('bike');
    
    INSERT INTO tab2
    VALUES (NULL);
    
    COMMIT;
    
    -- This query returns no results
    SELECT col1
      FROM tab1
     WHERE col1 NOT IN (SELECT col1 FROM tab2);
    
    -- This query returns results
    SELECT col1
      FROM tab1
     WHERE NOT EXISTS
              (SELECT col1
                 FROM tab2
                WHERE tab1.col1 = tab2.col1);
    I have also posted the text above as an article on http://managingoracle.blogspot.com

    If anyone has the true explanation of why the HASH JOIN ANTI NA terminates like this, please give details.
    Thank you

    Kind regards
    FJFranken

    As you have discovered, NOT IN and NOT EXISTS are NOT the same.

    This is expected behavior when comparing with the value NULL.

    See:
    http://jonathanlewis.WordPress.com/2007/02/25/not-in/
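    For completeness, the usual rewrite that keeps the NOT IN form is to exclude the NULLs from the subquery explicitly; a sketch against the test tables above (note that a row where tab1.col1 itself is NULL is still treated differently from the NOT EXISTS version):

    -- With the NULLs filtered out of the subquery, the anti-join can return rows again.
    SELECT col1
      FROM tab1
     WHERE col1 NOT IN (SELECT col1 FROM tab2 WHERE col1 IS NOT NULL);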

  • Join cardinality estimate

    I am using Oracle version 11.2.0.4.0.

    Below are the stats for the two tables; there are no histograms on the columns.

    Table T1 - NUM_ROWS - 8,900,759
    -------------------------------
    column_name   num_nulls   num_distinct   density
    C1            0           100800         9.92063492063492E-6
    C2            0           7184           0.000139198218262806


    Table T2 - NUM_ROWS - 28,835
    ----------------------------
    column_name   num_nulls   num_distinct   density
    C1            0           101            0.0099009900990099
    C2            0           39             0.0256410256410256

    Query:
    ------

    Select * from T1, T2
    WHERE t1.c1 = t2.c1;


    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 4149194932

    --------------------------------------------------------------------------------------------------------------
    | Id  | Operation                    | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
    --------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT             |      |  2546K|   675M|       | 65316   (1)| 00:13:04 |       |       |
    |*  1 |  HASH JOIN                   |      |  2546K|   675M|  5944K| 65316   (1)| 00:13:04 |       |       |
    |   2 |   TABLE ACCESS STORAGE FULL  | T2   | 28835 |  5603K|       |   239   (1)| 00:00:03 |       |       |
    |   3 |   PARTITION RANGE ALL        |      |  8900K|   670M|       | 26453   (1)| 00:05:18 |     1 |     2 |
    |   4 |    TABLE ACCESS STORAGE FULL | T1   |  8900K|   670M|       | 26453   (1)| 00:05:18 |     1 |     2 |
    --------------------------------------------------------------------------------------------------------------


    As the standard rule says:

    Join selectivity =
      ((num_rows(t1) - num_nulls(t1.c1)) / num_rows(t1)) *
      ((num_rows(t2) - num_nulls(t2.c1)) / num_rows(t2)) /
      greater(num_distinct(t1.c1), num_distinct(t2.c1))

    Join selectivity = (((28835-0)/28835) * ((8900759-0)/8900759)) / 100800

    Join cardinality = join selectivity * num_rows(t1) * num_rows(t2)
    = ((((28835-0)/28835) * ((8900759-0)/8900759)) / 100800) * (8900759 * 28835)
    = 2546164.54, which matches the output of the plan above.

    But when I add another join condition as below, I am not able to understand how the join cardinality becomes 28835. And how would it behave differently if histograms were present?

    Select * from T1, T2
    WHERE t1.c1 = t2.c1
    and t1.c2 = t2.c2;

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1645075573

    ----------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                     | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart | Pstop  |
    ----------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT              |         | 28835 |  7828K|       | 65316   (1)| 00:13:04 |        |        |
    |*  1 |  HASH JOIN                    |         | 28835 |  7828K|  5944K| 65316   (1)| 00:13:04 |        |        |
    |   2 |   PART JOIN FILTER CREATE     | :BF0000 | 28835 |  5603K|       |   239   (1)| 00:00:03 |        |        |
    |   3 |    TABLE ACCESS STORAGE FULL  | T2      | 28835 |  5603K|       |   239   (1)| 00:00:03 |        |        |
    |   4 |   PARTITION RANGE JOIN-FILTER |         |  8900K|   670M|       | 26453   (1)| 00:05:18 |:BF0000 |:BF0000 |
    |   5 |    TABLE ACCESS STORAGE FULL  | T1      |  8900K|   670M|       | 26453   (1)| 00:05:18 |:BF0000 |:BF0000 |
    ----------------------------------------------------------------------------------------------------------------------

    Total selectivity = selectivity of c1 * selectivity of c2

    = ((((28835-0)/28835) * ((8900759-0)/8900759)) / 100800) * ((((28835-101)/28835) * ((8900759-0)/8900759)) / 7184)


    Total cardinality = total selectivity * num_rows(t1) * num_rows(t2)

    =

    ((((28835-0)/28835) * ((8900759-0)/8900759)) / 100800) * ((((28835-101)/28835) * ((8900759-0)/8900759)) / 7184) * 8900759 * 28835 = 353.18, but that does not match the output above.

    --> C2 is the partitioning column of table T2. T1 is not partitioned.
    --> There are two range partitions for T2, and one of them is empty; the data resides in a single partition.
    --> As one partition is empty, only one partition needs to be visited for the final result.
    --> I used "set autotrace traceonly explain" to get the plan for the query.

    --> Here are the max and min of c1 and c2 for T2:

    Max(c1)   Min(c1)   Max(c2)               Min(c2)
    86        383759    2/28/2011 23:59:38    2/28/2011 12:00:02 AM

    Here are the max and min of c1 and c2 for T1:

    Max(c1)   Min(c1)   Max(c2)               Min(c2)
    4860      354087    2/28/2011 23:55:47    2/28/2011 12:07:49 AM

    --> Given below is the plan with the predicate section

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1645075573

    ----------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                     | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart | Pstop  |
    ----------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT              |         | 28835 |  8166K|       | 70364   (1)| 00:14:05 |        |        |
    |*  1 |  HASH JOIN                    |         | 28835 |  8166K|  5944K| 70364   (1)| 00:14:05 |        |        |
    |   2 |   PART JOIN FILTER CREATE     | :BF0000 | 28835 |  5603K|       |   239   (1)| 00:00:03 |        |        |
    |   3 |    TABLE ACCESS STORAGE FULL  | T1      | 28835 |  5603K|       |   239   (1)| 00:00:03 |        |        |
    |   4 |   PARTITION RANGE JOIN-FILTER |         |  8900K|   772M|       | 26453   (1)| 00:05:18 |:BF0000 |:BF0000 |
    |   5 |    TABLE ACCESS STORAGE FULL  | T2      |  8900K|   772M|       | 26453   (1)| 00:05:18 |:BF0000 |:BF0000 |
    ----------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       1 - access("T2"."C2"="T1"."C2" AND "T2"."C1"="T1"."C1")

    --> Below are the values in the current data for T2 having count(*) > 10,000:
    C1       C2                    Count(*)
    171966   2/28/2011 07:21:14    14990
    41895    2/28/2011 08:41:36    12193
    7408     2/28/2011 06:16:20    12158
    53120    2/28/2011 06:16:13    7931
    51724    2/28/2011 18:03:22    6783
    51724    2/28/2011 18:02:58    6757
    51724    2/28/2011 16:02:22    6451
    51724    2/28/2011 16:02:01    6388
    51724    2/28/2011 14:01:29    5979
    234233   2/28/2011 07:21:14    5975
    51724    2/28/2011 14:01:09    5917
    7408     2/28/2011 06:16:13    5355
    51724    2/28/2011 20:04:18    5074
    51724    2/28/2011 20:03:54    5058

    Below are the values in the current data set for T1 having count(*) > 75:
    C1      C2                    Count(*)
    4860    2/28/2011 19:33:45
    31217   2/28/2011 23:27:54
    31217   2/28/2011 23:48:14
    4860    2/28/2011 17:36:07
    4860    2/28/2011 20:00:11
    4860    2/28/2011 18:20:13
    4860    2/28/2011 14:35:39
    4860    2/28/2011 19:48:06
    4860    2/28/2011 12:30:29
    4860    2/28/2011 15:32:31
    4860    2/28/2011 17:48:05
    4860    2/28/2011 17:02:26
    4860    2/28/2011 22:27:02

    --> Yes, the join is targeted at the larger partition, because the other one is simply empty.
      
    --> Here are the stats and the plan after gathering extended statistics on the column group (c1, c2) of T1 (by converting it to a physical table), with no histogram. It now gives a better estimate, which is closer to the real cardinality. But the problem is that in reality table T1 is a global temporary table, so I am not able to gather extended stats on it. Is there any other workaround for this?

    column_name                       num_distinct   density                histogram

    SYS_STUMW3X8MDKZEJOG$AHPEND1W$    2699           0.000370507595405706   NONE
      
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1645075573

    ----------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                     | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart | Pstop  |
    ----------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT              |         |   432K|   124M|       | 70380   (1)| 00:14:05 |        |        |
    |*  1 |  HASH JOIN                    |         |   432K|   124M|  6280K| 70380   (1)| 00:14:05 |        |        |
    |   2 |   PART JOIN FILTER CREATE     | :BF0000 | 28835 |  5941K|       |   239   (1)| 00:00:03 |        |        |
    |   3 |    TABLE ACCESS STORAGE FULL  | T1      | 28835 |  5941K|       |   239   (1)| 00:00:03 |        |        |
    |   4 |   PARTITION RANGE JOIN-FILTER |         |  8900K|   772M|       | 26453   (1)| 00:05:18 |:BF0000 |:BF0000 |
    |   5 |    TABLE ACCESS STORAGE FULL  | T2      |  8900K|   772M|       | 26453   (1)| 00:05:18 |:BF0000 |:BF0000 |
    ----------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       1 - access("T2"."C2"="T1"."C2" AND "T2"."C1"="T1"."C1")

  • Partition-wise join possible with interval partitions?

    Hello

    I want to know whether a partition-wise join (PWJ) is possible with interval partitioning - I can't find an explicit statement that it isn't, but I can't make it work - so I built a simple test case to illustrate the issue.

    Below I have 2 create table scripts - one for the interval case and one for the hash case - and then a simple query on the 2 objects which should produce a PWJ.

    In the hash case it works fine (see the 2nd screenshot, with a single slave set); the first screenshot shows the interval case, where I end up with 2 slave sets and no PWJ.

    Any idea whether this is possible and I have just missed something?

    (for the test case, choose schema/tablespace names appropriate for your system)

    Oh, and the version (I almost forgot... :-)) is 11.2.0.4.1 on SLES 11.

    Cheers,

    Rich

    -- interval case

    CREATE TABLE "SB_DWH_IN"."TEST1"
    TABLESPACE "SB_DWH_INTEGRATION"
    PARTITION BY RANGE ("OBJECT_ID") INTERVAL (10000)
    (PARTITION "LESS_THAN_ZERO" VALUES LESS THAN (0) TABLESPACE "SB_DWH_INTEGRATION")
    AS SELECT * FROM dba_objects WHERE object_id IS NOT NULL;

    CREATE TABLE "SB_DWH_IN"."TEST2"
    TABLESPACE "SB_DWH_INTEGRATION"
    PARTITION BY RANGE ("OBJECT_ID") INTERVAL (10000)
    (PARTITION "LESS_THAN_ZERO" VALUES LESS THAN (0) TABLESPACE "SB_DWH_INTEGRATION")
    AS SELECT * FROM dba_objects WHERE object_id IS NOT NULL;

    -- hash case

    CREATE TABLE "SB_DWH_IN"."TEST1"
    TABLESPACE "SB_DWH_INTEGRATION"
    PARTITION BY HASH ("OBJECT_ID") PARTITIONS 8
    STORE IN ("SB_DWH_INTEGRATION")
    AS SELECT * FROM dba_objects WHERE object_id IS NOT NULL;

    CREATE TABLE "SB_DWH_IN"."TEST2"
    TABLESPACE "SB_DWH_INTEGRATION"
    PARTITION BY HASH ("OBJECT_ID") PARTITIONS 8
    STORE IN ("SB_DWH_INTEGRATION")
    AS SELECT * FROM dba_objects WHERE object_id IS NOT NULL;

    -- query to run

    SELECT /*+ PARALLEL(TEST2,8) PARALLEL(TEST1,8) */ *
      FROM "SB_DWH_IN"."TEST2", "SB_DWH_IN"."TEST1"
     WHERE test1.object_id = test2.object_id;

    nonPWJ.PNG

    pwjenabled.PNG

    It is expected behaviour, and a consequence of the arithmetic relating the number of partitions to the number of parallel slaves.

    At that degree of parallelism each slave does 3 passes (i.e. handles 3 partitions).

    Add one partition (per table) and one slave set will have to handle a 4th pass: the cost of the query using the PWJ would increase by 33 percent even though the change in the data volume is less than 0.8%.

    I guess that in production Oracle distributes your 1M rows for a hash join.

    Because the decision is cost-based, it is possible that a very extreme skew in partition sizes in billion-row tables might push the optimizer into a non-PWJ join - but I have not tested that.

    If you want to force the plan, John Watson's suggestion of a pq_distribute hint is the relevant one.  To cover all the bases, calling your tables SMALL and LARGE:

    /*+
        leading(small large)
        use_hash(large)
        no_swap_join_inputs(large)
        pq_distribute(large none none)
    */

    If it's legal, that should do it.

    Regards

    Jonathan Lewis
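    Applied to the interval-partitioned test tables above, the hinted query might look roughly like this (a sketch only - aliases are added here, and whether the interval partitions then give a full partition-wise join still has to be tested):

    SELECT /*+ parallel(t1,8) parallel(t2,8)
               leading(t1 t2)
               use_hash(t2)
               no_swap_join_inputs(t2)
               pq_distribute(t2 none none) */
           *
      FROM "SB_DWH_IN"."TEST1" t1, "SB_DWH_IN"."TEST2" t2
     WHERE t1.object_id = t2.object_id;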

  • Bug with an outer join, OR, and an analytic function (or rownum)

    Hello

    It seems that a combination of an outer join, OR and rownum confuses the CBO.

    The first query is without rownum, the second is with rownum.

    The second query estimates 203T rows and never finishes. It should behave the same as query 1, with 24M rows.

    Removing the OR clause from query 2 makes it behave like query 1, with 24M rows.

    Has anyone seen this before? Is there a solution?

    SELECT *
      FROM message i
      LEFT JOIN (SELECT hi.message_id, hi.update_dt
                   FROM message_hist hi) h ON (t.id = h.master_id
                                           AND(t.update_dt = h.update_dt OR h.update_dt <TO_DATE('150901','RRMMDD')));
          
    -----------------------------------------------------------------------------------------------                                                                                                                                                                                                              
    | Id  | Operation           | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                                                                                                                              
    -----------------------------------------------------------------------------------------------                                                                                                                                                                                                              
    |   0 | SELECT STATEMENT    |                         |    24M|    13G|   475G  (2)|999:59:59 |                                                                                                                                                                                                              
    |   1 |  NESTED LOOPS OUTER |                         |    24M|    13G|   475G  (2)|999:59:59 |                                                                                                                                                                                                              
    |   2 |   TABLE ACCESS FULL | MESSAGE                 |  8037K|  1318M| 29883   (2)| 00:06:59 |                                                                                                                                                                                                              
    |   3 |   VIEW              |                         |     3 |  1302 | 59136   (2)| 00:13:48 |                                                                                                                                                                                                              
    |*  4 |    TABLE ACCESS FULL| MESSAGE_HIST            |     3 |   168 | 59136   (2)| 00:13:48 |                                                                                                                                                                                                              
    -----------------------------------------------------------------------------------------------                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                                                 
    Predicate Information (identified by operation id):                                                                                                                                                                                                                                                          
    ---------------------------------------------------                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                 
       4 - filter("I"."MESSAGE_ID"="HI"."MESSAGE_ID" AND                                                                                                                                                                                                                                                         
                  ("HI"."UPDATE_DT"<TO_DATE('150901','RRMMDD') OR "I"."UPDATE_DT"="HI"."UPDATE_DT"))     
    ----------------              
    SELECT *
      FROM message i
      LEFT JOIN (SELECT hi.message_id, hi.update_dt
                      , ROWNUM
                   FROM message_hist hi) h ON (t.id = h.master_id
                                           AND(t.update_dt = h.update_dt OR h.update_dt <TO_DATE('150901','RRMMDD')));
         
    -------------------------------------------------------------------------------------------------                                                                                                                                                                                                            
    | Id  | Operation             | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                                                                                                                            
    -------------------------------------------------------------------------------------------------                                                                                                                                                                                                            
    |   0 | SELECT STATEMENT      |                         |   203T|   112P|   475G  (2)|999:59:59 |                                                                                                                                                                                                            
    |   1 |  NESTED LOOPS OUTER   |                         |   203T|   112P|   475G  (2)|999:59:59 |                                                                                                                                                                                                            
    |   2 |   TABLE ACCESS FULL   | MESSAGE                 |  8037K|  1318M| 29883   (2)| 00:06:59 |                                                                                                                                                                                                            
    |   3 |   VIEW                |                         |    25M|    10G| 59151   (2)| 00:13:49 |                                                                                                                                                                                                            
    |*  4 |    VIEW               |                         |    25M|    10G| 59151   (2)| 00:13:49 |                                                                                                                                                                                                            
    |   5 |     COUNT             |                         |       |       |            |          |                                                                                                                                                                                                            
    |   6 |      TABLE ACCESS FULL| MESSAGE_HIST            |    25M|  1355M| 59151   (2)| 00:13:49 |                                                                                                                                                                                                            
    -------------------------------------------------------------------------------------------------                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                 
    Predicate Information (identified by operation id):                                                                                                                                                                                                                                                          
    ---------------------------------------------------                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                 
       4 - filter("I"."MESSAGE_ID"="H"."MESSAGE_ID" AND ("I"."UPDATE_DT"="H"."UPDATE_DT" OR                                                                                                                                                                                                                          
                  "H"."UPDATE_DT"<TO_DATE('150901','RRMMDD')))         
     
    

    Rownum in a subquery is supposed to ensure that the subquery is evaluated completely before filtering - otherwise how could you work out the rownum?

    Your problem is compounded by the join condition, which forces a nested loop, which means the table has to be fully scanned once for each row of the driving rowsource. You can either transform the join into an equijoin and allow a hash join to run, or you could materialize the subquery once.

    Allow the hash join:

    SELECT count(*)
      FROM message i
      LEFT JOIN (SELECT hi.message_id, hi.update_dt
                      , ROWNUM
                   FROM message_hist hi) h ON (i.message_id = h.message_id
                                           AND i.update_dt = h.update_dt)
      LEFT JOIN (SELECT hi.message_id, hi.update_dt
                      , ROWNUM
                   FROM message_hist hi) h2 ON (i.message_id = h2.message_id
                                            AND h2.update_dt < TO_DATE('150901','RRMMDD')
                                            AND h2.update_dt <> i.update_dt)
    /

    ----------------------------------------------------------------------------------------
    | Id  | Operation                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT         |              |     1 |    66 |   211   (1)| 00:00:01 |
    |   1 |  SORT AGGREGATE          |              |     1 |    66 |            |          |
    |*  2 |   HASH JOIN RIGHT OUTER  |              |   800 | 52800 |   211   (1)| 00:00:01 |
    |*  3 |    VIEW                  |              |     1 |    22 |    70   (0)| 00:00:01 |
    |   4 |     COUNT                |              |       |       |            |          |
    |   5 |      TABLE ACCESS FULL   | MESSAGE_HIST |     1 |    22 |    70   (0)| 00:00:01 |
    |*  6 |    HASH JOIN RIGHT OUTER |              |   800 | 35200 |   141   (1)| 00:00:01 |
    |   7 |     VIEW                 |              |     1 |    22 |    70   (0)| 00:00:01 |
    |   8 |      COUNT               |              |       |       |            |          |
    |   9 |       TABLE ACCESS FULL  | MESSAGE_HIST |     1 |    22 |    70   (0)| 00:00:01 |
    |  10 |     TABLE ACCESS FULL    | MESSAGE      |   800 | 17600 |    70   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       2 - access("I"."MESSAGE_ID"="H2"."MESSAGE_ID"(+))
           filter("H2"."UPDATE_DT"(+)<>"I"."UPDATE_DT")
       3 - filter("H2"."UPDATE_DT"(+)<TO_DATE('150901','RRMMDD'))
       6 - access("I"."UPDATE_DT"="H"."UPDATE_DT"(+) AND
                  "I"."MESSAGE_ID"="H"."MESSAGE_ID"(+))

    Materialize the subquery:

    WITH h AS (SELECT /*+ MATERIALIZE */ hi.message_id, hi.update_dt
                    , ROWNUM
                 FROM message_hist hi)
    SELECT count(*)
      FROM message i
      LEFT JOIN h ON (i.message_id = h.message_id
                  AND (i.update_dt = h.update_dt OR h.update_dt < TO_DATE('150901','RRMMDD')))
    /

    ----------------------------------------------------------------------------------------------------------
    | Id  | Operation                  | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT           |                             |     1 |    22 |  1740   (0)| 00:00:01 |
    |   1 |  TEMP TABLE TRANSFORMATION |                             |       |       |            |          |
    |   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6810_5B8F6E67 |       |       |            |          |
    |   3 |    COUNT                   |                             |       |       |            |          |
    |   4 |     TABLE ACCESS FULL      | MESSAGE_HIST                |     1 |    22 |    70   (0)| 00:00:01 |
    |   5 |   SORT AGGREGATE           |                             |     1 |    22 |            |          |
    |   6 |    NESTED LOOPS OUTER      |                             |   800 | 17600 |  1670   (0)| 00:00:01 |
    |   7 |     TABLE ACCESS FULL      | MESSAGE                     |   800 | 17600 |    70   (0)| 00:00:01 |
    |   8 |     VIEW                   |                             |     1 |       |     2   (0)| 00:00:01 |
    |*  9 |      VIEW                  |                             |     1 |    22 |     2   (0)| 00:00:01 |
    |  10 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D6810_5B8F6E67 |     1 |    22 |     2   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       9 - filter("I"."MESSAGE_ID"="H"."MESSAGE_ID" AND ("I"."UPDATE_DT"="H"."UPDATE_DT" OR
                  "H"."UPDATE_DT"<TO_DATE('150901','RRMMDD')))

    You may need to adjust the first join condition to make sure you pick up the rows from the correct subquery.

    - edit

    Not able to show a plan, but you could move the second join condition into the subquery and then filter the result of the subquery with a predicate according to your requirement. That should remove the OR condition and leave only an equijoin (although the rowsource returned may be slightly larger than it ought to be).

    - Second edit: it did not work exactly as expected when I tried it.

    A hybrid of the previous two approaches, with a slight modification to how it mimics the OR:

    WITH h AS (SELECT /*+ MATERIALIZE */ hi.message_id, hi.update_dt
                    , ROWNUM rown
                 FROM message_hist hi)
    SELECT i.message_id
         , i.update_dt
         , COALESCE(h.message_id, h2.message_id) message_id
         , COALESCE(h.update_dt, h2.update_dt) update_dt
         , COALESCE(h.rown, h2.rown) rown
      FROM message i
      LEFT JOIN h ON (i.message_id = h.message_id
                  AND i.update_dt = h.update_dt)
      LEFT JOIN h h2 ON (DECODE(h.message_id, NULL, i.message_id) = h2.message_id   -- only try this join if the previous join returned NULL
                     AND h2.update_dt < TO_DATE('150901','RRMMDD'))
    /

    --------------------------------------------------------------------------------------------------------
    | Id  | Operation                  | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT           |                           |     1 |    66 |     8   (0)| 00:00:01 |
    |   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |            |          |
    |   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6605_28F27F |       |       |            |          |
    |   3 |    COUNT                   |                           |       |       |            |          |
    |   4 |     TABLE ACCESS FULL      | MESSAGE_HIST              |   150 |  3300 |     2   (0)| 00:00:01 |
    |   5 |   SORT AGGREGATE           |                           |     1 |    66 |            |          |
    |*  6 |    HASH JOIN RIGHT OUTER   |                           | 10497 |   676K|     6   (0)| 00:00:01 |
    |*  7 |     VIEW                   |                           |   150 |  3300 |     2   (0)| 00:00:01 |
    |   8 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6605_28F27F |   150 |  3300 |     2   (0)| 00:00:01 |
    |*  9 |     HASH JOIN OUTER        |                           |   328 | 14432 |     4   (0)| 00:00:01 |
    |  10 |      TABLE ACCESS FULL     | MESSAGE                   |   200 |  4400 |     2   (0)| 00:00:01 |
    |  11 |      VIEW                  |                           |   150 |  3300 |     2   (0)| 00:00:01 |
    |  12 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D6605_28F27F |   150 |  3300 |     2   (0)| 00:00:01 |
    --------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       6 - access("H2"."MESSAGE_ID"(+)=DECODE(TO_CHAR("H"."MESSAGE_ID"),NULL,"I"."MESSAGE_ID"))
       7 - filter("H2"."UPDATE_DT"(+)<TO_DATE('150901','RRMMDD'))
       9 - access("I"."UPDATE_DT"="H"."UPDATE_DT"(+) AND "I"."MESSAGE_ID"="H"."MESSAGE_ID"(+))

    (This plan is from another system, so the costs are not comparable.)

  • When does the join happen?

    Hi experts,

    When any type of join happens (hash join, merge join) - please correct me if I'm wrong - Oracle first has to store one data set before joining it to the other data set, right? Where is it stored? Is the PGA used for this operation? In other words, say the optimizer uses the hash join method (an in-memory hash table must be built for this operation): is it built in the PGA or the SGA?

    If it's in the PGA, what happens if the sorted data set does not fit in memory?

    Thank you

    Basically: yes.

    If the input set is smaller than the value (explicitly defined, or assigned automatically) for hash_area_size, then the hash join is done in PGA memory (an "optimal" workarea execution). If the set is bigger, the intermediate result has to be spilled to the TEMP tablespace (causing one-pass or even multi-pass executions). Jonathan Lewis describes the mechanism in detail in Cost-Based Oracle Fundamentals.
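
    You can check which category your hash joins actually fall into from the documented V$SQL_WORKAREA view. A minimal sketch (the filter and ordering are just one way of surfacing the spilling statements):

    -- Work areas that ran as hash joins: LAST_EXECUTION shows OPTIMAL, ONE PASS
    -- or n PASSES; the execution counters show how often each mode occurred.
    SELECT operation_type,
           last_execution,
           ROUND(last_memory_used / 1024 / 1024)  AS last_mem_mb,
           ROUND(last_tempseg_size / 1024 / 1024) AS last_temp_mb,
           optimal_executions,
           onepass_executions,
           multipasses_executions
      FROM v$sql_workarea
     WHERE operation_type = 'HASH-JOIN'
     ORDER BY multipasses_executions DESC, onepass_executions DESC;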

  • ORDER BY in the subselect, then a join

    I had a difficult time researching this or even proving it, so I came to the forum to ask for advice.

    My question is how Oracle handles the row ordering of a query in which I ORDER BY inside a subselect and then join:

    SELECT *
      FROM (  SELECT *
                FROM (    SELECT LEVEL c1,
                                   LEVEL
                                 * TRUNC(DBMS_RANDOM.VALUE(1,
                                                           4))
                                    val_rnd
                            FROM DUAL
                      CONNECT BY LEVEL <= 200)
            ORDER BY c1 ASC) a,
           (    SELECT TRUNC(  LEVEL
                             * DBMS_RANDOM.VALUE(1,
                                                 4))
                          val_rnd2
                  FROM DUAL
            CONNECT BY LEVEL <= 200) b
     WHERE a.c1 = b.val_rnd2(+)
    

    In this case, my subselect is ordered ascending by c1. That part I understand, and it makes my subquery 'a' come back ordered. My next step then joins this sorted result to a second table ('b').

    In this join, does Oracle always preserve my initial sort, or do I need to add a second ORDER BY a.c1 ASC after the join to guarantee that the result comes back in sorted order?

    My example above returns the rows the way I want, but I can't tell whether that is just luck and Oracle happens to return them correctly, or whether it is 100% guaranteed to work this way.

    I'm running on Oracle 11.2.0.4

    Oracle can transform your query into something that is easier to optimize but has the same semantics. Since the ORDER BY is in an inline view, I guess Oracle could even decide to ignore it (although I'm not sure of this point). As Justin and Frank already wrote: only an ORDER BY in the main query gives you the 100% guarantee. If I run your query on 11.2.0.1 the plan is:

    ----------------------------------------------------------------------------------------
    | Id  | Operation                       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                |      |     1 |    39 |     6  (34)| 00:00:01 |
    |   1 |  SORT ORDER BY                  |      |     1 |    39 |     6  (34)| 00:00:01 |
    |*  2 |   HASH JOIN OUTER               |      |     1 |    39 |     5  (20)| 00:00:01 |
    |   3 |    VIEW                         |      |     1 |    26 |     2   (0)| 00:00:01 |
    |*  4 |     CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
    |   5 |      FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
    |   6 |    VIEW                         |      |     1 |    13 |     2   (0)| 00:00:01 |
    |*  7 |     CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
    |   8 |      FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       2 - access("from$_subquery$_002"."C1"="B"."VAL_RND2"(+))
       4 - filter(LEVEL<=200)
       7 - filter(LEVEL<=200)

    So we can see that Oracle did a transformation and pulled the SORT ORDER BY up out of the subquery: with this plan you get the correct order - but there is no guarantee that Oracle will use this plan (and this transformation) in other versions / with different settings, etc.
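
    To make that ordering a guarantee rather than a side effect of the chosen plan, the ORDER BY has to sit on the outermost query block. A minimal sketch based on the original query (only the final ORDER BY is added):

    SELECT *
      FROM (SELECT LEVEL c1,
                   LEVEL * TRUNC(DBMS_RANDOM.VALUE(1, 4)) val_rnd
              FROM DUAL
            CONNECT BY LEVEL <= 200) a,
           (SELECT TRUNC(LEVEL * DBMS_RANDOM.VALUE(1, 4)) val_rnd2
              FROM DUAL
            CONNECT BY LEVEL <= 200) b
     WHERE a.c1 = b.val_rnd2(+)
     ORDER BY a.c1 ASC;   -- the only placement that guarantees the final row order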

  • Join cardinality estimate - an interesting result

    Hello Experts,

    I just read an article on Oracle join cardinality ("Table Functions And Join Cardinality Estimates"). Without gathered statistics (i.e. relying on dynamic sampling) the optimizer calculates the join cardinality badly - why? I mean, I'm trying to understand whether the join cardinality calculation has to rely on statistics. Even though I haven't gathered statistics, as you can see below, the optimizer estimates the table cardinalities correctly, but the join cardinality is completely wrong. What do you think about this behavior? Did dynamic sampling mislead the optimizer?

    drop table t1 purge;

    drop table t2 purge;

    create table t1 as
    select rownum id, mod(rownum, 10) + 1 as fk, rpad('x', 10) filler
      from dual connect by level <= 1000;

    create table t2 as
    select rownum + 20 id, rpad('x', 10) filler
      from dual connect by level <= 10;

    explain plan for
    select * from t1 join t2 on t1.fk = t2.id;

    select * from table(dbms_xplan.display);

    Plan hash value: 2959412835

    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |  1000 | 53000 |     8  (13)| 00:00:01 |
    |*  1 |  HASH JOIN         |      |  1000 | 53000 |     8  (13)| 00:00:01 |
    |   2 |   TABLE ACCESS FULL| T2   |    10 |   200 |     3   (0)| 00:00:01 |
    |   3 |   TABLE ACCESS FULL| T1   |  1000 | 33000 |     4   (0)| 00:00:01 |
    ---------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       1 - access("T1"."FK"="T2"."ID")

    Note
    -----
       - dynamic sampling used for this statement (level=2)

    exec dbms_stats.gather_table_stats (user, 'T1');

    exec dbms_stats.gather_table_stats (user, 'T2');

    explain plan for

    Select * from t1 join t2 on t1.fk = t2.id;

    Select * from table (dbms_xplan.display);

    Plan hash value: 2959412835

    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |    32 |     8  (13)| 00:00:01 |
    |*  1 |  HASH JOIN         |      |     1 |    32 |     8  (13)| 00:00:01 |
    |   2 |   TABLE ACCESS FULL| T2   |    10 |   140 |     3   (0)| 00:00:01 |
    |   3 |   TABLE ACCESS FULL| T1   |  1000 | 18000 |     4   (0)| 00:00:01 |
    ---------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       1 - access("T1"."FK"="T2"."ID")

    Thanks in advance.

    Taking a look at the CBO trace (event 10053) for your example, I see different calculations for the join cardinality:

    - without statistics:

    Join Card:  1000.000000 = outer (10.000000) * inner (1000.000000) * sel (0.100000)

    - with statistics:

    Join Card:  0.000000 = outer (10.000000) * inner (1000.000000) * sel (0.000000)

    The sel (0.100000) for the run with dynamic sampling corresponds to the standard formula: join selectivity = 1 / greater(num_distinct(t1.fk), num_distinct(t2.id)) - here 1 / greater(10, 10) = 0.1, so the join cardinality is 10 * 1000 * 0.1 = 1000. In the CBO trace for this run I also see information about the sampling:

    SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0), COUNT(DISTINCT C3), NVL(SUM(CASE WHEN C3 IS NULL THEN 1 ELSE 0 END),0) FROM (SELECT /*+ NO_PARALLEL("T1") FULL("T1") NO_PARALLEL_INDEX("T1") */ 1 AS C1, 1 AS C2, "T1"."FK" AS C3 FROM "T1" "T1") SAMPLESUB

    2013-12-13 14:14:19.584
    ** Executed dynamic sampling query:
        level : 2
        sample pct. : 100.000000
        actual sample size : 1000
        filtered sample card. : 1000
        orig. card. : 572
        block cnt. table stat. : 7
        block cnt. for sampling: 7
        max. sample block cnt. : 64
        sample block cnt. : 7
        ndv C3 : 10
            scaled : 10.00
        nulls C4 : 0
            scaled : 0.00
        min. sel. est. : -1.00000000
    ** Dynamic sampling col. stats.:
      Column (#2): FK
        AvgLen: 22 NDV: 10 Nulls: 0 Density: 0.100000
    ** Using dynamic sampling NULLs estimates.
    ** Using dynamic sampling NDV estimates.
       Scaled NDVs using cardinality = 1000.
    ** Using dynamic sampling card. : 1000
    ** Dynamic sampling updated table card.
      Table: T1  Alias: T1
        Card: Original: 1000.000000  Rounded: 1000  Computed: 1000.00  Non Adjusted: 1000.00
      Access Path: TableScan
        Cost:  2.00  Resp: 2.00  Degree: 0
          Cost_io: 2.00  Cost_cpu: 239850
          Resp_io: 2.00  Resp_cpu: 239850
      Best:: AccessPath: TableScan
             Cost: 2.00  Degree: 1  Resp: 2.00  Card: 1000.00  Bytes: 0

    The sampling query does not gather any information about the range of values in the join columns, so the optimizer has no choice but to use the standard formula for the join selectivity.

    With gathered statistics, the CBO knows the HIGH_VALUE and LOW_VALUE for the join columns and can see that the ranges do not overlap - so the join selectivity is set to 0 (resulting in a cardinality of 1).
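
    After the two dbms_stats calls above you can see exactly what the CBO has to work with by querying the dictionary; a minimal sketch (t1.fk holds the values 1..10 and t2.id holds 21..30, so the ranges cannot overlap):

    -- LOW_VALUE/HIGH_VALUE are stored in Oracle's internal raw format, but even
    -- NUM_DISTINCT already shows the 10-vs-10 NDV used by the selectivity formula.
    select table_name, column_name, num_distinct, num_nulls, low_value, high_value
      from user_tab_col_statistics
     where (table_name = 'T1' and column_name = 'FK')
        or (table_name = 'T2' and column_name = 'ID');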

  • What is the difference between the FILTER operation and a NL semi join?

    Hi all.

    The Oracle versions are 10gR2 and 11gR2.

    There are differences between the FILTER operation and HASH/MERGE semi joins in the treatment of subqueries.

    But I can't see the difference between the FILTER operation and a NL semi join.
    (The only thing I know is that the FILTER operation has a caching mechanism, but I'm not sure whether NL SEMI does.)

    What are the advantages and disadvantages of each of them?

    Thanks in advance.
    Best regards.

    >
    The Oracle versions are 10gR2 and 11gR2.
    >
    Those are not versions. What are the full 4-digit versions?
    >
    There are differences between the FILTER operation and HASH/MERGE semi joins in the treatment of subqueries.
    >
    What differences? When you make a statement like that, post the information (and links) that back it up.
    >
    But I can't see the difference between the FILTER operation and a NL semi join.
    (The only thing I know is that the FILTER operation has a caching mechanism, but I'm not sure whether NL SEMI does.)
    >
    We cannot "see the difference" either - you haven't posted anything for us to look at.
    >
    What are the advantages and disadvantages of each of them?
    >
    Each of what? Again, you need to be as specific as possible if you want specific comments.

    Besides, given that you generally shouldn't be using hints in production code anyway, what does it matter what the differences are? Oracle will make the choice.

    Without something specific to comment on, all we can do is provide several links that discuss joins and how to produce and examine them. These are all from Jonathan Lewis's blog:

    http://jonathanlewis.WordPress.com/?s=semi-join
    http://jonathanlewis.WordPress.com/2010/12/20/index-join-4/
    http://jonathanlewis.WordPress.com/2010/08/15/joins-MJ/ -- merge joins
    http://jonathanlewis.WordPress.com/2010/08/10/joins-HJ/ -- hash joins
    http://jonathanlewis.WordPress.com/2010/08/09/joins-NLJ/ -- nested loop joins
    http://jonathanlewis.WordPress.com/2011/06/08/how-to-hint-1/ -- how to hint joins

    Check out these articles. Pay attention to how, in each of them, he doesn't just give a text description but also provides the code and explains it.

    In other words, he provides specific information to illustrate what he is talking about. That's what you need to do if you want help with a specific topic: provide the query plans and output that you are using as the basis for your question. That lets us see EXACTLY what you're talking about.
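
    As a starting point for producing something concrete to post, here is a minimal sketch of the two plan shapes the question is about. The EMP/DEPT tables are only placeholders, and the hints merely request a shape - the optimizer may still decide otherwise:

    -- Shape 1: keep the subquery correlated, so the plan shows a FILTER
    -- operation (which caches results for repeated correlation values).
    select *
      from emp e
     where exists (select /*+ no_unnest */ null
                     from dept d
                    where d.deptno = e.deptno);

    -- Shape 2: allow unnesting, so the optimizer can turn the subquery into a
    -- semi join (NESTED LOOPS SEMI, HASH JOIN SEMI or MERGE JOIN SEMI).
    select *
      from emp e
     where exists (select /*+ unnest */ null
                     from dept d
                    where d.deptno = e.deptno);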

  • Inner join on a skewed column: SQL plan not optimal

    Hello

    I think that it is a FAQ, but I was unable to get a useful answer by Googling.

    Oracle version is 11.2.0 on Sparc64.

    Consider the following code:
    drop table x;
    PROMPT
    PROMPT Populate x with some users. Note the iduser PK
    PROMPT
    
    create table x as
    select 1 iduser, 'PUBLIC' owner from dual
    union
    select 2 iduser, 'SYSTEM' owner from dual
    union
    select 3 iduser, 'XDB' owner from dual
    union
    select 4 iduser, 'APPQOSSYS' owner from dual
    union
    select 5 iduser, 'SYS' owner from dual
    union
    select 6 iduser, 'OUTLN' owner from dual
    union
    select 7 iduser, 'DBSNMP' owner from dual
    /
    
    alter table x add constraint pk_x primary key(iduser);
    
    
    PROMPT
    PROMPT Create a table y from all_objects, but using the previous iduser 
    PROMPT as foreign key (owner column is not needed). Note the index
    PROMPT on column iduser
    PROMPT
    
    drop table y
    /
    create table y as
    select x.iduser, all_objects.*
    from all_objects, x
    where x.owner = all_objects.owner
    and x.owner in  ( 'PUBLIC', 'SYSTEM', 'XDB', 'APPQOSSYS',
        'SYS', 'OUTLN', 'DBSNMP')
    /
    alter table y drop column owner;
    alter table y add constraint y_fk foreign key(iduser) references x;
    create index idx_y on y(iduser);
    
    PROMPT
    PROMPT Take some stats. X stats are irrelevant, I think
    PROMPT
    exec dbms_stats.gather_table_stats( -
        USER, -
        'Y', -
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, -
        degree => DBMS_STATS.DEFAULT_DEGREE, -
        cascade => true, -
        method_opt => 'FOR COLUMNS IDUSER SIZE AUTO', -
        granularity => 'ALL' -
    )
    exec dbms_stats.gather_table_stats( -
        USER, -
        'X', -
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, -
        degree => DBMS_STATS.DEFAULT_DEGREE, -
        cascade => true, -
        method_opt => 'FOR COLUMNS IDUSER SIZE AUTO', -
        granularity => 'ALL' -
    )
    
    set autotrace trace exp
    
    PROMPT
    PROMPT APPQOSSYS has only 5 objects (well, your output may vary, but it should be
    PROMPT very similar), but the following query ignores the index and does a full scan on Y
    PROMPT
    select x.*, y.*
    from x, y
    where x.owner = 'APPQOSSYS'
    and y.iduser = x.iduser
    /
    
    PROMPT
    PROMPT Virtually equivalent to the previous one, but the explain plan is very different
    PROMPT and the index is used
    PROMPT
    
    select y.*
    from x, y
    where x.owner = 'APPQOSSYS'
    and y.iduser = 4
    /
    
    set autotrace off
    The result is:
    ............................ [snip] .........................
    
    ( EXPLAIN PLAN OF FIRST QUERY)
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1702571549
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |   559 | 58136 |   142   (1)| 02:23:19 |
    |*  1 |  HASH JOIN         |      |   559 | 58136 |   142   (1)| 02:23:19 |
    |*  2 |   TABLE ACCESS FULL| X    |     1 |     9 |     2   (0)| 00:02:02 |
    |   3 |   TABLE ACCESS FULL| Y    | 55883 |  5184K|   139   (0)| 02:20:47 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("Y"."IDUSER"="X"."IDUSER")
       2 - filter("X"."OWNER"='APPQOSSYS')
    
    
    
    ( EXPLAIN PLAN OF SECOND QUERY)
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2241001346
    
    -------------------------------------------------------------------------------------
    | Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |       |    10 |   950 |     2   (0)| 00:02:02 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| Y     |    10 |   950 |     2   (0)| 00:02:02 |
    |*  2 |   INDEX RANGE SCAN          | IDX_Y |    10 |       |     1   (0)| 00:01:01 |
    -------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("Y"."IDUSER"=4)
    Well, the question is very simple: is it possible to get the first query to use the index and avoid the full scan? Without hints, of course.

    All of your advice and comments will be welcome. Thanks in advance.

    Best regards

    jjuanino wrote:

    Well, the question is very simple: is it possible to get the first query to use the index and avoid the full scan? Without hints, of course.

    All of your advice and comments will be welcome. Thanks in advance.

    This is a known optimizer limitation - there is no realistic solution.

    The optimizer can recognize, with the help of a histogram, that the iduser column on table Y is very unevenly distributed, but it has no generic strategy for recognizing which owner in table X corresponds to which iduser in table Y.

    An approach that can help - even if I don't remember seeing it documented - is to rewrite the query with a subquery and use the /*+ precompute_subquery */ hint.
    I haven't tried it with your data, but something like:

    select * from y
    where iduser in (
        select /*+ precompute_subquery */ x.iduser
        from x
        where owner = 'APPQOSSYS'
        )
    ;
    
    Regards
    Jonathan Lewis
    
    P.S.  Found a reference note by Tanel Poder: http://blog.tanelpoder.com/2009/01/23/multipart-cursor-subexecution-and-precompute_subquery-hint/
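
    A related workaround, not mentioned in the thread: since the literal predicate "y.iduser = 4" already gets the index range scan, you can resolve the iduser separately and then query y with a bind variable. A minimal SQL*Plus sketch, assuming the owner lookup returns exactly one row:

    -- Step 1: look up the iduser for the owner of interest into a bind variable.
    variable v_iduser number
    exec select iduser into :v_iduser from x where owner = 'APPQOSSYS';

    -- Step 2: query y with the bind value; this has the same shape as the
    -- second query above, so the optimizer can use IDX_Y.
    select y.*
      from y
     where y.iduser = :v_iduser;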
    
