November | 2015 | Vinay's Notes

Dynamic Sampling (DS) /*+ dynamic_sampling(4) */ was introduced in Oracle Database 9i Release 2 to improve the optimizer’s ability to generate good execution plans. The most common misconception is that DS can be used as a substitute for optimizer statistics. The goal of DS is to augment the optimizer statistics; it is used when regular statistics are not sufficient to get good quality cardinality estimates.

During the compilation of a SQL statement, the optimizer decides whether to use DS or not by considering whether the available statistics are sufficient to generate a good execution plan. If the available statistics are not enough, dynamic sampling will be used. It is typically used to compensate for missing or insufficient statistics that would otherwise lead to a very bad plan. For the case where one or more of the tables in the query does not have statistics, DS is used by the optimizer to gather basic statistics on these tables before optimizing the statement. The statistics gathered in this case are not as high a quality or as complete as the statistics gathered using the DBMS_STATS package. This trade off is made to limit the impact on the compile time of the statement.

begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT', 
      estimate_percent => dbms_stats.auto_sample_size, 
      method_opt       => 'for all columns size skewonly', 
      degree           => 7
   );
end;
/

The second scenario where DS is used is when the statement contains a complex predicate expression and extended statistics are not available. Extended statistics were introduced in Oracle Database 11g Release 1 with the goal to help the optimizer get good quality cardinality estimates for complex predicate expressions. For example, if you had a simple query that has where clause predicates on two correlated columns, standard statistics would not be sufficient. Consider the following SQL statement and its execution plan :

Explain plan for
     Select *
          From Customers
          Where cust_city='Los Angeles'
          And cust_state_province='CA';

Select * from table(dbms_xplan.display());

With standard statistics the Optimizer estimates the cardinality as 90 rows. The actual number of rows returned by this query is 916. By setting OPTIMIZER_DYNAMIC_SAMPLING to level 4, the optimizer will use dynamic sampling to gather additional information about the complex predicate expression. The additional information provided by dynamic sampling allows the optimizer to generate a more accurate cardinality estimate and therefore a better performing execution plan.

Alter session set optimizer_dynamic_sampling=4;

Explain plan for
     Select *
          From Customers
          Where cust_city='Los Angeles'
          And cust_state_province='CA';

Select * from table(dbms_xplan.display());

You may now be wondering why we had to set the parameter OPTIMIZER_DYNAMIC_SAMPLING to 4 . Dynamic sampling is controlled by the parameter OPTIMIZER_DYNAMIC_SAMPLING, which can be set to different levels (0-10). These levels control two different things; when dynamic sampling kicks in and how large a sample size will be used to gather the statistics. The greater the sample size the bigger impact DS has on the compilation time of a query.

Level	When Dynamic Sampling will be used	Sample size (blocks)
0	Switches off dynamic sampling	N/A
1	At least one non-partitioned table in the statement has no statistics	32
2 (default)	One or more tables in the statement have no statistics	64
3	Any statement that meets level 2 criteria and any statement that has one or more expressions used in the where clause predicates e.g. Where substr(CUSTLASTNAME,1,3) or Where a + b =5	64
4	Any statement that meets level 3 criteria and any statement that has complex predicates. An OR or AND operator between multiple predicates on the same table	64
5	Any statement that meets level 4 criteria	128
6	Any statement that meets level 4 criteria	256
7	Any statement that meets level 4 criteria	512
8	Any statement that meets level 4 criteria	1024
9	Any statement that meets level 4 criteria	4086
10	All statements	All Blocks

There are two ways to use dynamic sampling:
(1) The OPTIMIZER_DYNAMIC_SAMPLING parameter can be set at the database instance level and can also be overridden at the session level with the ALTER SESSION command.
(2) The DYNAMIC_SAMPLING query hint can be added to specific queries.

Another Example Scenario

Poor cardinality estimates, looking at a quarter of the table data

SQL> select * from t where flag1 = 'N' and flag2 = 'N';

Execution Plan
----------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 17014 |  1744K|   292   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 17014 |  1744K|   292   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
----------------------------------------------------

   1 - filter("FLAG1" = 'N' AND "FLAG2" = 'N')

Above artifacts shows that when the predicate gets just a little more complex, the optimizer misses the estimated cardinality by a huge amount—it doesn’t know about the relationship between FLAG1 and FLAG2.

Good cardinality estimate, looking at only six rows

SQL> select /*+ dynamic_sampling(t 3) */ * from t where flag1 = 'N' and flag2 = 'N';

Execution Plan
-----------------------------
Plan hash value: 470836197

------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    6 |   630 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |    6 |   630 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |    6 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
----------------------------------------------------

   2 - access("FLAG1"='N' AND "FLAG2"='N')

Note the much better row estimate (6 instead of more than 17,000) in Listing 7, compared to Listing 6, and also note the very different (and now optimal) plan. Instead of a full table scan, the optimizer has decided to use the index, and this execution plan will definitely be optimal compared to the original full table scan, because in reality, no data will be returned. The index will discover that instantly.

Cost-Based Optimizer estimates of Join Cardinality

In the absence of column histograms, Oracle CBO must be able to “guess” this information, and it sometimes gets it wrong. This is one reason why the ORDERED hint is one of the most popular SQL tuning hints; using the ORDERED hint allows you to specify that the tables be joined together in the same order that they appear in the FROM clause, like this:

select /+ ORDERED */ student_name
     from student
          natural join course
          natural join professor
     where
          professor = 'Vipin Chauhan'
          and course = 'Oracle'

Vinay's Notes

Developer : Java | J2EE | Hadoop | Big Data

Month: November 2015

Dynamic Sampling in Oracle

ORA-01775 Looping chain of Synonyms