How To Monitor Oracle DBs On Docker Containers With EM 13c

This post continues the thread of DevOps and automation for the Oracle Database. During the last couple of months I’ve seen more interest in deploying Oracle Databases on Docker containers. Even more now that Oracle released full support of RAC Databases running on Docker.



After you deploy your first Oracle Database on Docker, several questions come into mind; like:

  • How do I monitor the status of the Oracle Database?
  • How do I manage the performance of the Oracle Database and the SQL’s being executed on it?
  • How do I alert on issues on a timely manner?

Well, if you have the same questions or concerns… you are not alone!

On this post, I’m trying to explain the process I followed in order to deploy an Oracle Enterprise Manager agent on a Docker container and how I do monitor and manage the Oracle Database running on it.

First things first. Just a quick recap of my environment.

  • Oracle Linux 7 host running Docker 19.03
  • Create a MACVLAN network on Docker. My network adapter is ens3 on the host running Docker.
sudo docker network create -d macvlan --subnet=192.168.56.0/24 --gateway=192.168.56.1 -o parent=ens3 orcl_nw
  • Create a Docker volume type “nfs” using an NFS share named /NFSSHARE
sudo docker volume create --driver local --opt type=nfs --opt o=addr=<nfs server>,rw,bg,hard,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0 --opt device=:/NFSSHARE agentstorage
  • Pull the 19.3 Docker image from container-registry.oracle.com
sudo docker pull container-registry.oracle.com/database/enterprise:19.3.0.0
  • Create the Docker container using the recently pulled image and mounting the volume “agentstorage” into “/u01” inside the container
sudo docker create -t -i \
--name ol7-orcl --hostname ol7-orcl --user oracle --ip 192.168.56.101 \
-e ORACLE_SID=ORCL \
-e ORACLE_PDB=PDB1 \
-v agentstorage:/u01 \
container-registry.oracle.com/database/enterprise:19.3.0.0
  • Disconnect the bridged Docker network from the container
sudo docker network disconnect bridge ol7-orcl
  • Connect the container to the MACVLAN network
sudo docker network connect orcl_nw --ip 192.168.56.101 ol7-orcl
  • Start our Docker container
sudo docker start ol7-orcl
  • Review the startup process and the creation of the database
sudo docker logs -f ol7-orcl

At this point we have our Docker container and the ORCL database up and running. But now I need a way to connect my ORCL database to the external world. Because I’m running a MACVLAN, I can create another Docker container in the same network and install Oracle Enterprise Manager (EM) in order to monitor it. Or I can also install Oracle EM on the host running Docker and setup a network link and a route to my Docker container. Let’s do the second option.

sudo ip link add mydocker-net link ens3 type macvlan mode bridge
sudo ip addr add 192.168.56.1/32 dev mydocker-net
sudo ip link set mydocker-net up
sudo ip route add 192.168.56.0/24 dev mydocker-net

I now have connectivity between my host and the Docker container.

$ ping 192.168.56.101
PING 192.168.56.101 (192.168.56.101) 56(84) bytes of data.
64 bytes from 192.168.56.101: icmp_seq=1 ttl=64 time=0.056 ms
64 bytes from 192.168.56.101: icmp_seq=2 ttl=64 time=0.070 ms
64 bytes from 192.168.56.101: icmp_seq=3 ttl=64 time=0.062 ms
64 bytes from 192.168.56.101: icmp_seq=4 ttl=64 time=0.073 ms
64 bytes from 192.168.56.101: icmp_seq=5 ttl=64 time=0.070 ms

sh-4.2$ ping 192.168.56.1
PING 192.168.56.1 (192.168.56.1) 56(84) bytes of data.
64 bytes from 192.168.56.1: icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from 192.168.56.1: icmp_seq=2 ttl=64 time=0.079 ms
64 bytes from 192.168.56.1: icmp_seq=3 ttl=64 time=0.043 ms
64 bytes from 192.168.56.1: icmp_seq=4 ttl=64 time=0.082 ms
64 bytes from 192.168.56.1: icmp_seq=5 ttl=64 time=0.077 ms

The next step is to have name resolution setup between both hosts. For the Docker container it was easy to setup the full host name into the external DNS service. For the host resolution inside the Docker container you can use static entries in the /etc/hosts file or use the Docker embedded DNS server. For my exercise, I setup an entry in my /etc/hosts file providing the host name and the IP address of the host where EM is running.

The rest of the process is very straight forward. I use the silent install option for the Oracle EM agent. Below link has the steps to download the agent binaries for your specific platform and version.



Once you have the ZIP file of the agent binaries. Then just copy it inside the NFS share and connect to the Docker container.

sudo docker exec -it --user oracle ol7-orcl /bin/sh

After you login to the container just follow the steps in the provided agent install guide. Update the response file with the required information and run agentDeploy.sh. Make sure that the Agent Home resides in the Docker volume created. For this exercise will be inside /u01 directory.

After the agent deploy succeeds disconnect from the Docker session and re-connect as root.

sudo docker exec -it --user root ol7-orcl /bin/sh

Execute root.sh inside the Agent Home directory.

Let’s now verify that our EM has a new host target named ol7-orcl and is up and running.

Docker Container ol7-orcl on Oracle Enterprise Manager

The next step is to discover the ORCL database in EM. You can change the DBSNMP password by logging to the Docker container, then use SQLPlus to change the password and unlock the account.

alter user dbsnmp identified by <password> account unlock;

After the database discovery, verify the database target in the EM console.

ORCL database home page

Now that you have both the host target and the database target you can set metrics, thresholds and notifications on them.

Another important option is that if you have Diagnostics and Tuning packs, you can also monitor performance, get AWR/ASH reports and use EM to graphically analyze performance issues.

ORCL performance information

Hope this post helps understand the options you have in order to setup the required monitoring for the Oracle databases running on Docker containers.

Thanks,
Alfredo

Oracle Database Performance Diagnostics, Tuning and Validation with Oracle Enterprise Manager 13c Workshops

Last month we held 2 workshops where we went through some capabilities of Oracle Enterprise Manager 13c that help with performance diagnostics, tuning and validation for the Oracle Database.

If you were not able to attend and you want to learn about these capabilities, I recommend you take a look at the recording using the link below.


Session 1: August 4th 2021


Session 2: August 25th 2021


Thanks,
Alfredo

Oracle Enterprise Manager Performance Hub

Oracle Enterprise Manager (EM) 13.4 RU 2 has just been released this month. You may noticed that starting from EM 13.3 PG, the Top Activity page has been decommissioned. The Performance Hub page is the one that is available and yes, is not using Flash. If you have concerns about no longer having access to your EM Flash based graphs once the Web browsers stop supporting it, then your best bet is to upgrade your EM system.



Let’s talk about the features available on the Performance Hub page.

First of all, keep in mind that this page requires the Database Diagnostics and Tuning packs. All the data used by this page is provided thanks to Oracle’s instrumentation and stored in the Automatic Workload Repository (AWR). In order to open the Performance Hub page, from the Database Home page navigate to Performance -> Performance Hub -> ASH Analytics.

Oracle Enterprise Manager Performance Menu
ASH Analytics Home Page

I’m dividing the ASH Analytics Home page into 5 main sections:

  1. Performance Hub main timeline (red)
  2. Tab selection between ASH Analytics and SQL Monitoring (blue)
  3. Average Active Sessions graph (yellow)
  4. SQL ID breakdown information (purple)
  5. User Session breakdown information (green)

The Performance Hub main timeline shows the current database performance divided in three classes; CPU utilization, User I/O and Wait. You may try to relate this to the old Top Activity graph, this one has additional functionality though.

Using ASH analytics you can customize your performance analysis by adjusting the timeline selector. Remember that in the Top Activity page this was fixed to 5 minutes.

Customize Your Performance Analysis

This becomes real handy when you try to isolate performance issues in the database. Once you select the desired timeline for you analysis, the bottom of the page will update based on you selection. Only data data falls under you selection will be displayed in sections 3,4 and 5.

On top of this, we still have the classic time related selectors and page refresh rate.

Page Refresh Rate – Time Selector

Using the tab selector you can quickly switch between ASH Analytics and SQL Monitoring.

Tab Selector

If you click on SQL Monitoring, sections 3, 4 and 5 of the ASH analytics page will be replaced with the SQL Monitoring session information table. Keep in mind that the information showed in the table will be limited by the timeline selector that we already discussed. Adjust you selection accordingly.

SQL Monitoring

Let’s now move to the Average Active Sessions section. This section displays detailed information of the current active sessions in the database during the selected period of time. You have the flexibility to adjust the graph by changing the graph’s dimensions. By default the graph will use the Wait Class a the main dimension.

Average Active Sessions

Apart of that you can also adjust the Maximum CPU limit line to show the limit based on the number of CPU’s, the number of CPU cores or you may decide to not show the limit at all.

Maximum CPU Limit Line

Another option is to only show foreground sessions (default) or to also include background sessions. You can even choose on whether show the standard load graph or to switch into a heat map like chart and select as many dimensions as needed.

Average Active Sessions Heat Map

The bottom of the page will display two tables. The one on the left side will show all the SQL information and the one on the right side the session information. You should be pretty familiar with these as the old Top Activity page was showing them as well.

You can also customize the view of these two tables and adjust the dimension. By default is set to the Wait Class.

SQL ID Dimension Selector

Last but not least is the ability to select a SQL ID from the table and either call SQL Tuning advisor or to create a SQL Tuning Set from the same page.

SQL Tuning Advisor / SQL Tuning Set Buttons

As you can see, this is not the old Top Activity page that we were used to, but definitively has more features and offers way more flexibility while working on monitoring or when trying to spot performance issues in the database.

Thanks,
Alfredo

Compression Advisor killed my database!

Over the weekend one of the databases hung due to the flash recovery area was 100% full. I noticed one J001 process consuming significant CPU and I/O resources. Turns out this process was the automatic segment advisor job that runs on the weekend maintenance window.
The SQL executed was something like:

CREATE TABLE .dbms_tabcomp_temp_uncmp
TABLESPACE NOLOGGING
AS
SELECT /*+ FULL(.

) */
*
FROM .

After reading Oracle note Id 13463481.8 and confirming this with an SR, this is related to a bug for 11.2.0.3 version and fixed in 11.2.0.4. This bug generates excessive amount of redo when running the compression advisor on a table with a LOB column in a database running in ARCHIVELOG mode.

As we can’t just apply the required patch to the ORACLE_HOME right away, we decided to perform the workaround of disabling the automatic segment advisor task. The compression advisor is part of the segment advisor and is not possible just to disable one or the other.

To disable the segment advisor:

SQL> BEGIN
dbms_auto_task_admin.disable(
client_name => ‘auto space advisor’,
operation => NULL,
window_name => NULL);
END;
/  2    3    4    5    6    7
PL/SQL procedure successfully completed.
After executing the procedure, verify that the “auto space advisor” is disabled.

SQL> SELECT client_name, status FROM dba_autotask_client;
CLIENT_NAME                                                      STATUS
—————————————————————- ——–
auto optimizer stats collection                                  ENABLED
auto space advisor                                               DISABLED
sql tuning advisor                                               ENABLED
Although the advisor will not automatically run, you can always run it manually on the segments or indexes you want to be analyzed.
Thanks,

Alfredo

OEM 12c very slow after upgrade to 12.1.0.4

I noticed that OEM 12c console was very slow a few hours after the upgrade to 12.1.0.4 version. 

Looking at the repository DB, found several OMS sessions consuming significant CPU resources.
Bug 19199023 explains that some SQL queries executed against the repository consume high CPU on the servers. This bug affects the DB plug-in 12.1.0.6 and the patch 19176910 should be applied to the plug-ins.

More information available on MOS note, 12.1.0.4 OEM: High CPU utilization on Repository Database due to SYSMAN query WITH TARGETGUID AS (SELECT target_guid, host_name FROM mgmt_targets (Doc ID 1912172.1)
Thanks,

Alfredo

Understanding Oracle SQL Plan Management SPM – Part 3

This is the third post of SQL Plan Management. In the previous post, Understanding SQL Plan Management– Part 2, I described the three main components of SPM, this time I want to show you how to manually capture plans from the cursor cache. In other words, manually capture plans that reside in memory.

In order to manually capture plans, I’m considering that automatic capture is disabled or OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES is set to FALSE. Oracle provides a function named DBMS_SPM.load_plans_from_cursor_cache to manual capture plans from cursor cache.

DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE (
   sql_id            IN  VARCHAR2,
   plan_hash_value   IN  NUMBER   := NULL,
   sql_text          IN  CLOB,
   fixed             IN  VARCHAR2 := ‘NO’,
   enabled           IN  VARCHAR2 := ‘YES’)
 RETURN PLS_INTEGER;
The common use for this function, is to specify the sql_id of the statement we want to capture. We also can specify the plan_hash_value or the sql_text. Two more attributes can be set within this function, fixed is ‘NO’ by default and enabledis ‘YES’ by default. If you want to load plans, but you don’t want the CBO to make use if them; set the attribute enabled to ‘NO’.

In the below example I’m capturing all the SQL plans for sql_id4c372tsuhtunm and printing the number of plans loaded:
SQL> SET SERVEROUTPUT ON
DECLARE
  l_plans_loaded  PLS_INTEGER;
BEGIN
  l_plans_loaded := DBMS_SPM.load_plans_from_cursor_cache(
    sql_id => ‘4c372tsuhtunm’);
  DBMS_OUTPUT.put_line(‘Plans Loaded: ‘ || l_plans_loaded);
END;
/SQL>   2    3    4    5    6    7    8    9
Plans Loaded: 8
PL/SQL procedure successfully completed.
To verify the SQL baselines created for this sql_id, I just need to query DBA_SQL_PLAN_BASELINES view:
SQL> select SIGNATURE,SQL_HANDLE,PLAN_NAME,ORIGIN,ENABLED,ACCEPTED,FIXED,REPRODUCED from dba_sql_plan_baselines;
            SIGNATURE SQL_HANDLE            PLAN_NAME                      ORIGIN         ENABLED ACCEPTED FIXED REPRODUCED
——————— ——————— —————————— ————– ——- ——– —– ———-
  5535156277878016190 SQL_4cd0d34ee73148be  SQL_PLAN_4tn6m9vmm2k5ye8c8b02d MANUAL-LOAD    YES     YES      NO    YES
  5535156277878016190 SQL_4cd0d34ee73148be  SQL_PLAN_4tn6m9vmm2k5yc568a49d MANUAL-LOAD    YES     YES      NO    YES
  5535156277878016190 SQL_4cd0d34ee73148be  SQL_PLAN_4tn6m9vmm2k5ybff74238 MANUAL-LOAD    YES     YES      NO    YES
  5535156277878016190 SQL_4cd0d34ee73148be  SQL_PLAN_4tn6m9vmm2k5y0310173f MANUAL-LOAD    YES     YES      NO    YES
  5535156277878016190 SQL_4cd0d34ee73148be  SQL_PLAN_4tn6m9vmm2k5y8e42f3cc MANUAL-LOAD    YES     YES      NO    YES
  5535156277878016190 SQL_4cd0d34ee73148be  SQL_PLAN_4tn6m9vmm2k5y25375ef9 MANUAL-LOAD    YES     YES      NO    YES
  5535156277878016190 SQL_4cd0d34ee73148be  SQL_PLAN_4tn6m9vmm2k5y9de69d5d MANUAL-LOAD    YES     YES      NO    YES
  5535156277878016190 SQL_4cd0d34ee73148be  SQL_PLAN_4tn6m9vmm2k5yb76f0084 MANUAL-LOAD    YES     YES      NO    YES
8 rows selected.
Look at the data for the ORIGIN column, it shows from where the plans were loaded, either from manual load or auto capture.

Another method is to manual load plans from SQL Tuning Sets (STS). The STS may contain plans that are not present in memory, like plans in the AWR repository. This method is very useful when you want to create baselines of plans that were created by the CBO few days ago and are not in the cursor cache at this time. Keep in mind that STS requires a special license in order to be used. Oracle provides the function DBMS_SPM.load_plans_from_sqlset to accomplish this task.
DBMS_SPM.LOAD_PLANS_FROM_SQLSET (
   sqlset_name      IN  VARCHAR2,
   sqlset_owner     IN  VARCHAR2 := NULL,
   basic_filter     IN  VARCHAR2 := NULL,
   fixed            IN  VARCHAR2 := ‘NO’,
   enabled          IN  VARCHAR2 := ‘YES’
   commit_rows      IN  NUMBER   := 1000)
RETURN PLS_INTEGER;
You just need to specify the sqlset_name and the sqlset_owner in order to load the plans contained in the STS. It also has the default parameters fixed, set to ‘NO’ and enabled,set to ‘YES’. Two interesting attributes can be set within this function. The basic_filter attribute allows you to select only the plans that meet this filter criteria, in other words, is like a ‘where’ in a query.
basic_filter => ‘sql_text like ”select /*LOAD_STS*/%”’ orbasic_filter => ‘sql_id=”4c372tsuhtunm“‘
The commit_rows attribute, allows you to commit after the value specified, which is 1000 by default. Let’s assume that inside this STS we have 30000 plans and we want to load all of them, SPM will commit every 1000 plans to help to reduce the undo log.

Keep in mind that once you manually load plans for a specific SQL (signature), the CBO will continue capturing plans for it, even if the automatic capture is disabled. Those automatic captured plans will not be used by the CBO until you verify and evolve them.

In the next post, I’m going to show you how to evolve plans in 11g and 12c versions.

Thanks,

Alfredo

Understanding Oracle SQL Plan Management SPM – Part 2

In my previous post Understanding SQL Plan Management – Part 1, I tried to cover some basic concepts of SQL Plan Management (SPM). This post will show you the main components of SPM and how they work in the different versions of Oracle.   

SQL plan management framework has three main components, plan capture, plan selection and plan evolution. These components allow administrators choose which plans should be executed by the database. Plan capture is the process of loading execution plans from different sources into the SQL management base. Plan selection is the process the database follows in order to choose the best plan available after considering many factors including SQL baselines. Plan evolution is the process of making baselines executable after verifying if any performance improvement is available.
Plan capture can be done in two forms, automatic and manual. Automatic plan capture takes place when the database initialization parameter OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES is set to TRUE. Keep in mind that the default value for this parameter is FALSE. If automatic capture is enabled, the CBO will log the SQL signature of any SQL statement executed for the very first time in the database. When the SQL statement is executed for the second time it will recognize it as a repeatable statement in the database and SQL baselines are automatically created and marked as “ACCEPTED” for each repeatable SQL statement. Be careful with this feature because the baseline will be created using the execution plan of the second execution whether is optimal or not. If the CBO produces a better execution plan after the baseline is created, the plan will be stored in the SMB but not used until the plan is evolved.
Manual plan capture is the most common method to capture plans. It can be done in conjunction or instead of automatic capture for a single SQL statement or a group of statements. Plans can be manually loaded by using either DBMS_SPM package or Oracle Enterprise Manager (OEM) from four different sources. Manual loaded plans are automatically “ACCEPTED” and the CBO will continue to load plans automatically for this plans even if automatically plan capture is disabled. Plans automatically loaded after the manual load are marked as not “ACCEPTED”. You can load plans from four sources. From cursor cache, which are the active plans in memory. From SQL tuning sets, which require the SQL tuning pack or real application testing license. From stored outlines and from another Oracle database system using datapump. In order to transfer baselines from one database system to another, you must pack, load and unpack the baseline by following the below steps:
1. On the original system, create a staging table using the DBMS_SPM.CREATE_STGTAB_BASELINE procedure.
2. Pack the SQL plan baselines you want to export from the SQL management base into the staging table using the DBMS_SPM.PACK_STGTAB_BASELINE function.
3. Export the staging table into a flat file using the export command or Oracle Data Pump.
4. Transfer this flat file to the target system.
5. Import the staging table from the flat file using the import command or Oracle Data Pump.
6. Unpack the SQL plan baselines from the staging table into the SQL management base on the target system using the DBMS_SPM.UNPACK_STGTAB_BASELINE function.
The CBO calculates the execution plan every time the SQL statement is parsed (compiled) and then proceeds to execute the statement. If the OPTIMIZER_USE_SQL_PLAN_BASELINES is set to TRUE, the CBO will check if a SQL baseline exists before the plan is executed. If there is no SQL signature that matches the parsed SQL statement then the CBO will log this signature in the statement log if the automatic capture is enabled or just proceeds to execute the statement if disabled. If a signature matches the statement and a baseline doesn’t exists, it will proceed to create a plan for verification but if a baseline already exists then the baseline will be executed. If more than one “ACCEPTED” plan exists in the baseline, the CBO costs each plan for the given SQL statement and picks the one with the lowest cost. If a baseline is marked as “FIXED”, the CBO picks the lowest cost fixed plan unless all fixed plans are marked as non-reproducible.
Plan evolution, in simple terms is the process to “ACCEPT” or “REJECT” plans after verifying that performance is better or not compared with the current plan. Plan evolution as plan capture can be done automatic and manual. Automatic plan evolution is managed by the SQL tuning advisor task in 11g versions, while is managed by the SPM Evolve Advisor in 12c.
Adaptive SQL Plan Management  is one of the new features of Oracle 12c. Adaptive SQL Plan Management, is just the new automatic evolve task SYS_AUTO_SPM_EVOLVE_TASK that runs in the nightly maintenance window and is enabled by default. This task ranks all unaccepted plans and runs the evolve process for them. If the plan performs 1.5x times better than the current plan in the SQL plan baseline, then the plan is automatically accepted and becomes usable by the optimizer. This ratio defined by the hidden parameter _PLAN_VERIFY_IMPROVEMENT_MARGIN and can be modified. After the evolve task is complete, a persistent report is generated with details on how the non-accepted plan perform compared to the accepted plan performance. Administrators can go back and check what plans were evolved to any point in time.
      
Manual plan evolution is bit different in every version of the database. In 11g release 1, is being controlled using the DBMS_SPM.ALTER_SQL_PLAN_BASELINE function and by changing the attribute_name ‘ACCEPTED’ to ‘YES’. Is the administrator’s responsibility verify that the plan is performing better than the current plan. In 11g release 2, the DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE function helps the adminstrator verify if the plan performs better than the current plan and if it does, the plan is marked as “ACCEPTED”. A new SPM Evolve Advisor API is available in 12c version. Create task, execute task and report the evolve taks are part of the three step process of evolving plans. Unaccepted plans are not manually evolved when using the SPM Evolve Advisor, therfore the plan must be manually accepted using DBMS_SPM.ACCEPT_SQL_PLAN_BASELINE function.   
In the next post I will show you how to use the different DBMS_SPM functions to capture and evolve plans.

Thanks,

Alfredo

Testing user calls

In the performance tuning world specially talking about Response Time Analysis (RTA), user calls is often used as a workload metric; however this instance statistic was not yet completely clear.

When this metric is incremented?, How this metric is instrumented?, Why sometimes user calls is less with the same SQL statement?; these were some questions rounding my mind.

My friend Craig Shallahamer did a great post in his blog “A Wider View” related to user calls (http://shallahamer-orapub.blogspot.mx/2010/05/understanding-user-calls.html), however looking into Tanel Poder’s blog today everything got clear.

Let’s start tracing an Oracle session using trace events ‘10046’ & ‘10051’. 

Event ‘10051’ will trace OPI calls, more information (http://blog.tanelpoder.com/2011/03/20/lobread-sql-trace-entry-in-oracle-11-2/), what are OPI calls?, well as per Oracle documentation:

Oracle Program Interface (OPI)

A networking layer responsible for responding to each of the possible messages sent by OCI. For example, an OCI request to fetch 25 rows would have an OPI response to return the 25 rows once they have been fetched.


Let’s do it!

SQL> ALTER SESSION SET EVENTS ‘10046 TRACE NAME CONTEXT FOREVER, LEVEL 12’;
ALTER SESSION SET EVENTS ‘10051 TRACE NAME CONTEXT FOREVER, LEVEL 12’;
Session altered.
SQL>
Session altered.
SQL> SELECT TRACEFILE FROM V$PROCESS WHERE ADDR =(SELECT PADDR FROM V$SESSION WHERE SID=49);
TRACEFILE
——————————————————————————–
/u01/app/oracle/diag/rdbms/rac/RAC1/trace/RAC1_ora_5393.trc

In another window open the trace file using tail –f in order to monitor the trace execution. Now let’s issue a small query:

SQL> select * from dual;
D
X
SQL>
Here’s the output of the trace file:

*** 2013-08-05 21:42:15.721
WAIT #3: nam=’SQL*Net message from client’ ela= 342028978 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1375756935721909
OPI CALL: type=105 argc= 2 cursor=  0 name=Cursor close all
CLOSE #3:c=0,e=18,dep=0,type=0,tim=1375756935722017
OPI CALL: type=94 argc=28 cursor=  0 name=V8 Bundled Exec
WAIT #1: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=-1 tim=1375756935722368
WAIT #1: nam=’ges message buffer allocation’ ela= 0 pool=0 request=1 allocated=0 obj#=-1 tim=1375756935722448
WAIT #1: nam=’library cache lock’ ela= 568 handle address=1287268616 lock address=1263092292 100*mode+namespace=65538 obj#=-1 tim=1375756935723089
WAIT #1: nam=’ges message buffer allocation’ ela= 1 pool=0 request=1 allocated=0 obj#=-1 tim=1375756935723137
WAIT #1: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=-1 tim=1375756935723205
WAIT #1: nam=’library cache pin’ ela= 473 handle address=1287268616 pin address=1263092428 100*mode+namespace=65538 obj#=-1 tim=1375756935723815
=====================
PARSING IN CURSOR #4 len=210 dep=1 uid=0 oct=3 lid=0 tim=1375756935724353 hv=864012087 ad=’4cb562e8′ sqlid=’96g93hntrzjtr’
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
PARSE #4:c=0,e=70,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=1375756935724351
BINDS #4:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=0040a79c  bln=22  avl=03  flg=05
  value=116
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=0040a778  bln=24  avl=02  flg=05
  value=1
EXEC #4:c=0,e=89,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=1375756935724524
FETCH #4:c=0,e=51,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=1375756935724594
STAT #4 id=1 cnt=1 pid=0 pos=1 obj=424 op=’TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=0 pw=0 time=0 us)’
STAT #4 id=2 cnt=1 pid=1 pos=1 obj=426 op=’INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=0 pw=0 time=0 us)’
CLOSE #4:c=0,e=40,dep=1,type=3,tim=1375756935724660
=====================
PARSING IN CURSOR #1 len=18 dep=0 uid=0 oct=3 lid=0 tim=1375756935725402 hv=942515969 ad=’4b5c8e54′ sqlid=’a5ks9fhw2v9s1′
select * from dual
END OF STMT
PARSE #1:c=3999,e=3334,p=0,cr=3,cu=0,mis=1,r=0,dep=0,og=1,plh=272002086,tim=1375756935725401
EXEC #1:c=0,e=13,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=272002086,tim=1375756935725457
WAIT #1: nam=’SQL*Net message to client’ ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1375756935725480
WAIT #1: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=-1 tim=1375756935725627
WAIT #1: nam=’gc cr grant 2-way’ ela= 1035 p1=1 p2=928 p3=4 obj#=116 tim=1375756935726935
WAIT #1: nam=’db file sequential read’ ela= 634 file#=1 block#=928 blocks=1 obj#=116 tim=1375756935737381
WAIT #1: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=116 tim=1375756935741146
WAIT #1: nam=’gc cr grant 2-way’ ela= 470 p1=1 p2=929 p3=1 obj#=116 tim=1375756935741693
WAIT #1: nam=’db file sequential read’ ela= 1312 file#=1 block#=929 blocks=1 obj#=116 tim=1375756935743074
FETCH #1:c=1999,e=17670,p=2,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=272002086,tim=1375756935743169
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=116 op=’TABLE ACCESS FULL DUAL (cr=3 pr=2 pw=0 time=0 us cost=2 size=2 card=1)’
WAIT #1: nam=’SQL*Net message from client’ ela= 221 driver id=1650815232 #bytes=1 p3=0 obj#=116 tim=1375756935743513
OPI CALL: type= 5 argc= 2 cursor=  1 name=FETCH
FETCH #1:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=272002086,tim=1375756935743579
WAIT #1: nam=’SQL*Net message to client’ ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=116 tim=1375756935743629
Snapper shows the below metrics for our session:

SQL> @snapper4 all 5 1 49
Sampling SID 49 with interval 5 seconds, taking 1 snapshots…
— Session Snapper v4.09 BETA – by Tanel Poder ( http://blog.tanelpoder.com ) – Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! 🙂
——————————————————————————————————————————————————————————————————————
 SID @INST, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
——————————————————————————————–
    49  @1, SYS       , STAT, opened cursors cumulative,             1,         .2,         ,             ,          ,           ,          1 per execution
    49  @1, SYS       , STAT, user calls,             3,        .59,         ,             ,          ,           ,          3 per execution
Now let’s create a dummy table and insert some rows:
SQL> insert into test_user (user_name) select username from dba_users;
36 rows created.

From trace file:
*** 2013-08-05 21:55:34.386
WAIT #1: nam=’SQL*Net message from client’ ela= 58636486 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1375757734386174
OPI CALL: type=105 argc= 2 cursor=  0 name=Cursor close all
CLOSE #1:c=0,e=15,dep=0,type=0,tim=1375757734386272
OPI CALL: type=94 argc=28 cursor=  0 name=V8 Bundled Exec
=====================
PARSING IN CURSOR #3 len=37 dep=1 uid=0 oct=3 lid=0 tim=1375757734387095 hv=1398610540 ad=’4cbf4cb0′ sqlid=’grwydz59pu6mc’
select text from view$ where rowid=:1
END OF STMT
PARSE #3:c=0,e=375,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1375757734387093
BINDS #3:
 Bind#0
  oacdty=11 mxl=16(16) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=00 csi=00 siz=16 off=0
  kxsbbbfp=1184df90  bln=16  avl=16  flg=05
  value=00002068.0000.0001
EXEC #3:c=2000,e=54991,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=3684871272,tim=1375757734442193
FETCH #3:c=0,e=34,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=4,plh=3684871272,tim=1375757734442283
STAT #3 id=1 cnt=1 pid=0 pos=1 obj=69 op=’TABLE ACCESS BY USER ROWID VIEW$ (cr=1 pr=0 pw=0 time=0 us cost=1 size=15 card=1)’
CLOSE #3:c=0,e=51,dep=1,type=0,tim=1375757734442357
WAIT #2: nam=’ges message buffer allocation’ ela= 4 pool=0 request=1 allocated=0 obj#=-1 tim=1375757734443204
WAIT #2: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=-1 tim=1375757734443382
WAIT #2: nam=’library cache lock’ ela= 549 handle address=1222331724 lock address=1220909652 100*mode+namespace=65538 obj#=-1 tim=1375757734444345
WAIT #2: nam=’KJC: Wait for msg sends to complete’ ela= 10 msg=1361877012 dest|rcvr=65536 mtype=12 obj#=-1 tim=1375757734444389
WAIT #2: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=-1 tim=1375757734444433
WAIT #2: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=-1 tim=1375757734444564
WAIT #2: nam=’library cache pin’ ela= 640 handle address=1222331724 pin address=1220909516 100*mode+namespace=65538 obj#=-1 tim=1375757734445329
WAIT #2: nam=’KJC: Wait for msg sends to complete’ ela= 9 msg=1361877012 dest|rcvr=65536 mtype=12 obj#=-1 tim=1375757734445367
=====================
PARSING IN CURSOR #6 len=210 dep=1 uid=0 oct=3 lid=0 tim=1375757734475342 hv=864012087 ad=’4cb562e8′ sqlid=’96g93hntrzjtr’
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
PARSE #6:c=0,e=73,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=1375757734475341
BINDS #6:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=0040210c  bln=22  avl=04  flg=05
  value=74760
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=004020e8  bln=24  avl=02  flg=05
  value=1
EXEC #6:c=0,e=91,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=1375757734475532
FETCH #6:c=0,e=15,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=1375757734475567
STAT #6 id=1 cnt=0 pid=0 pos=1 obj=424 op=’TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=2 pr=0 pw=0 time=0 us)’
STAT #6 id=2 cnt=0 pid=1 pos=1 obj=426 op=’INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=0 pw=0 time=0 us)’
CLOSE #6:c=0,e=1,dep=1,type=3,tim=1375757734475616
=====================
PARSING IN CURSOR #2 len=64 dep=0 uid=0 oct=2 lid=0 tim=1375757734477276 hv=1206268670 ad=’48d0a418′ sqlid=’4x5wmqj3yccry’
insert into test_user (user_name) select username from dba_users
END OF STMT
PARSE #2:c=10999,e=90955,p=0,cr=4,cu=0,mis=1,r=0,dep=0,og=1,plh=3422547789,tim=1375757734477275
WAIT #2: nam=’ges message buffer allocation’ ela= 5 pool=0 request=1 allocated=0 obj#=74760 tim=1375757734478040
WAIT #2: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=74760 tim=1375757734478200
WAIT #2: nam=’enq: TM – contention’ ela= 616 name|mode=1414332419 object #=74760 table/partition=0 obj#=74760 tim=1375757734478936
WAIT #2: nam=’asynch descriptor resize’ ela= 4 outstanding #aio=0 current aio limit=384 new aio limit=390 obj#=-1 tim=1375757734486405
WAIT #2: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=-1 tim=1375757734486564
WAIT #2: nam=’gc cr grant 2-way’ ela= 519 p1=1 p2=1976 p3=4 obj#=281 tim=1375757734487242
WAIT #2: nam=’db file sequential read’ ela= 788 file#=1 block#=1976 blocks=1 obj#=281 tim=1375757734488113
WAIT #2: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=281 tim=1375757734488224
WAIT #2: nam=’gc cr grant 2-way’ ela= 431 p1=1 p2=1977 p3=1 obj#=281 tim=1375757734488790
WAIT #2: nam=’db file sequential read’ ela= 889 file#=1 block#=1977 blocks=1 obj#=281 tim=1375757734489741
WAIT #2: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=281 tim=1375757734491203
WAIT #2: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=281 tim=1375757734491352
WAIT #2: nam=’enq: HW – contention’ ela= 1020 name|mode=1213661190 table space #=0 block=4282248 obj#=281 tim=1375757734492503
WAIT #2: nam=’ges message buffer allocation’ ela= 2 pool=0 request=1 allocated=0 obj#=281 tim=1375757734492596
WAIT #2: nam=’gc current grant 2-way’ ela= 422 p1=1 p2=87945 p3=33554433 obj#=74760 tim=1375757734493138
EXEC #2:c=5000,e=15461,p=2,cr=41,cu=7,mis=0,r=36,dep=0,og=1,plh=3422547789,tim=1375757734493365
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=0 op=’LOAD TABLE CONVENTIONAL  (cr=41 pr=2 pw=0 time=0 us)’
STAT #2 id=2 cnt=36 pid=1 pos=1 obj=0 op=’HASH JOIN  (cr=40 pr=2 pw=0 time=35 us cost=24 size=1092 card=13)’
STAT #2 id=3 cnt=36 pid=2 pos=1 obj=0 op=’HASH JOIN  (cr=37 pr=2 pw=0 time=35 us cost=22 size=1053 card=13)’
STAT #2 id=4 cnt=36 pid=3 pos=1 obj=0 op=’HASH JOIN  (cr=28 pr=2 pw=0 time=35 us cost=17 size=1014 card=13)’
STAT #2 id=5 cnt=36 pid=4 pos=1 obj=0 op=’HASH JOIN OUTER (cr=19 pr=2 pw=0 time=140 us cost=13 size=975 card=13)’
STAT #2 id=6 cnt=36 pid=5 pos=1 obj=0 op=’HASH JOIN  (cr=16 pr=2 pw=0 time=35 us cost=10 size=598 card=13)’
STAT #2 id=7 cnt=36 pid=6 pos=1 obj=0 op=’HASH JOIN  (cr=13 pr=0 pw=0 time=0 us cost=8 size=572 card=13)’
STAT #2 id=8 cnt=2 pid=7 pos=1 obj=0 op=’MERGE JOIN CARTESIAN (cr=6 pr=0 pw=0 time=3 us cost=4 size=16 card=1)’
STAT #2 id=9 cnt=1 pid=8 pos=1 obj=280 op=’TABLE ACCESS FULL PROFILE$ (cr=3 pr=0 pw=0 time=0 us cost=2 size=8 card=1)’
STAT #2 id=10 cnt=2 pid=8 pos=2 obj=0 op=’BUFFER SORT (cr=3 pr=0 pw=0 time=1 us cost=2 size=8 card=1)’
STAT #2 id=11 cnt=2 pid=10 pos=1 obj=280 op=’TABLE ACCESS FULL PROFILE$ (cr=3 pr=0 pw=0 time=1 us cost=2 size=8 card=1)’
STAT #2 id=12 cnt=36 pid=7 pos=2 obj=22 op=’TABLE ACCESS FULL USER$ (cr=7 pr=0 pw=0 time=0 us cost=3 size=1008 card=36)’
STAT #2 id=13 cnt=2 pid=6 pos=2 obj=281 op=’TABLE ACCESS FULL PROFNAME$ (cr=3 pr=2 pw=0 time=1 us cost=2 size=2 card=1)’
STAT #2 id=14 cnt=2 pid=5 pos=2 obj=297 op=’TABLE ACCESS FULL RESOURCE_GROUP_MAPPING$ (cr=3 pr=0 pw=0 time=1 us cost=2 size=29 card=1)’
STAT #2 id=15 cnt=7 pid=4 pos=2 obj=16 op=’TABLE ACCESS FULL TS$ (cr=9 pr=0 pw=0 time=48 us cost=4 size=21 card=7)’
STAT #2 id=16 cnt=7 pid=3 pos=2 obj=16 op=’TABLE ACCESS FULL TS$ (cr=9 pr=0 pw=0 time=12 us cost=4 size=21 card=7)’
STAT #2 id=17 cnt=9 pid=2 pos=2 obj=292 op=’TABLE ACCESS FULL USER_ASTATUS_MAP (cr=3 pr=0 pw=0 time=8 us cost=2 size=27 card=9)’
WAIT #2: nam=’SQL*Net message to client’ ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=74760 tim=1375757734493823
And from Snapper:
SQL> @snapper4 all 10 1 49
Sampling SID 49 with interval 10 seconds, taking 1 snapshots…
— Session Snapper v4.09 BETA – by Tanel Poder ( http://blog.tanelpoder.com ) – Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! 🙂
——————————————————————————————————————————————————————————————————————
 SID @INST, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
——————————————————————————————————————————————————————————————————————
    49  @1, SYS       , STAT, opened cursors cumulative,             3,        .33,         ,             ,          ,           ,          1 per execution
    49  @1, SYS       , STAT, user calls,             2,        .22,         ,             ,          ,           ,        .67 per execution
Now let’s commit:
SQL> commit;
Commit complete.
SQL>
From trace file:
*** 2013-08-05 22:01:49.281
WAIT #2: nam=’SQL*Net message from client’ ela= 374787222 driver id=1650815232 #bytes=1 p3=0 obj#=74760 tim=1375758109281068
OPI CALL: type=105 argc= 2 cursor=  0 name=Cursor close all
CLOSE #2:c=0,e=20,dep=0,type=0,tim=1375758109281167
OPI CALL: type=94 argc=28 cursor=  0 name=V8 Bundled Exec
=====================
PARSING IN CURSOR #4 len=6 dep=0 uid=0 oct=44 lid=0 tim=1375758109281299 hv=3480936638 ad=’0′ sqlid=’23wm3kz7rps5y’
commit
END OF STMT
PARSE #4:c=0,e=91,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1375758109281298
XCTEND rlbk=0, rd_only=0, tim=1375758109281345
EXEC #4:c=0,e=102,p=0,cr=0,cu=1,mis=0,r=0,dep=0,og=0,plh=0,tim=1375758109281434
WAIT #4: nam=’log file sync’ ela= 24619 buffer#=4018 sync scn=1381492 p3=0 obj#=74760 tim=1375758109306071
WAIT #4: nam=’SQL*Net message to client’ ela= 351 driver id=1650815232 #bytes=1 p3=0 obj#=74760 tim=1375758109307375
From Snapper:
SQL> @snapper4 all 6 1 49
Sampling SID 49 with interval 6 seconds, taking 1 snapshots…
— Session Snapper v4.09 BETA – by Tanel Poder ( http://blog.tanelpoder.com ) – Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! 🙂
——————————————————————————————————————————————————————————————————————
 SID @INST, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
——————————————————————————————————————————————————————————————————————
    49  @1, SYS       , STAT, opened cursors cumulative,             1,        .17,         ,             ,          ,           ,          ~ per execution
    49  @1, SYS       , STAT, user commits,             1,        .17,         ,             ,          ,           ,          ~ per execution
    49  @1, SYS       , STAT, user calls,             2,        .34,         ,             ,          ,           ,          ~ per execution
This is just a little example of how OPI CALL is related to user calls metric. You can play and test more scenarios of select statements to see how they are being affected by ARRAYSIZE setting in SqlPlus.
Craig posted a table containing the result of his testing which is having the same results as this test.

SQL
User Calls
Select 1 row
3
Insert
2
Commit
2
Thanks,

Alfredo

Why the execution plan changed? Case study

Overview

At some point in time the execution plan of the SQL_ID etp65bryqtd2d changed. The new execution plan drops on performance from an Avg. of 0.01 seconds per execution to an Avg. of 1.5 seconds per execution.
This increase on elapsed time per execution provoke that the application can’t process work orders fast enough, the work queue started to drastically grow leading the application team to escalate a severity 1 issue.

Analysis

Execution plan changes

The execution plan changes only when a parse is performed, this because the _optim_bind_peeking parameter is on default value “TRUE”. A hard parse is required due to cursor invalidation in the shared pool; a cursor can be invalidated for some reasons but the most happening is, object change (Index rebuild, DBMS_STATS, etc.).

Rolling Cursor Invalidation

Starting with Oracle10g cursors are marked for rolling invalidation instead of marked INVALID immediately. On the next execution of the query the Oracle server will generate a random number between 0 and the value of the _optimizer_invalidation_period parameter, which has a default value of 18000, and the cursor will remain valid for this number of seconds. Upon every following execution Oracle will check if this random selected timeout has expired. If that is the case then the cursor will be hard parsed again.
Things will change without notice! Cursor invalidation has changed in the past and the reason for this change is that previously all dependent cursors were invalidated immediately after gathering new object statistics. This massive invalidation might cause a serious degradation in performance right after statistics gathering due to a high number of hard parses. Rolling cursor invalidation spreads the invalidation of cursors out over a longer period of time thereby avoiding the performance degradation caused by hard parsing the invalidated cursors.

Reference: Metalink note:557661.1

Bind variable peeking

The query optimizer peeks at the values of user-defined bind variables on the first invocation of a cursor. This feature lets the optimizer determine the selectivity of any WHERE clause condition, as well as if literals have been used instead of bind variables. On subsequent invocations of the cursor, no peeking takes place, and the cursor is shared, based on the standard cursor-sharing criteria, even if subsequent invocations use different bind values.
In this case we can clearly see that the elapsed time per execution is not changing because of the bind peeking; when Oracle peeks a bad execution plan, all elapsed times are bad.

CBO trace event 10053

Behind the scenes, what is the CBO doing when it comes to how it comes up with an execution plan? This is where the 10053 trace event comes to play. Other tools or settings show us WHAT the CBO comes up with; the 10053 setting tells us HOW the CBO came to its decision (the final execution plan).

conn / as sysdba
oradebug setmypid
oradebug unlimit
oradebug event 10053 trace name context forever, level 1
…your statement here…
oradebug event 10053 trace name context off
oradebug tracefile_name

We have performed this CBO trace when the query was performing bad and when it was performing well. With this method we have 2 trace files to compare and find out what is making the CBO to take such bad plan.
Notice that this method will produce a trace file only in the hard parse stage. So, in order to force the CBO to hard parse the SQL we have run it with a comment hint.

variable N1 VARCHAR2(128)
exec :N1 :=’125′;
select /* comment*/…

After comparing both trace files we can clearly see when the CBO chose the good plan:
***********************
Best so far: Table#: 1  cost: 7.3522  card: 3973.0000  bytes: 218515
             Table#: 7  cost: 16.2850  card: 9.0178  bytes: 612
             Table#: 2  cost: 21.6884  card: 33.7888  bytes: 3094
             Table#: 5  cost: 48.9215  card: 3565.0298  bytes: 406410
             Table#: 4  cost: 1475.7155  card: 3565.4846  bytes: 556140
             Table#: 0  cost: 1483.2886  card: 3559.8976  bytes: 644360
             Table#: 6  cost: 2195.7103  card: 3599.4022  bytes: 770186
             Table#: 3  cost: 2197.9000  card: 3599.4022  bytes: 820572
***********************
In the other hand, the trace of the bad plan is having a huge cost:
***********************
Best so far: Table#: 1  cost: 7.3510  card: 3928.0000  bytes: 216040
             Table#: 7  cost: 16.2836  card: 8.9798  bytes: 612
             Table#: 2  cost: 21.6870  card: 32.6678  bytes: 3003
             Table#: 5  cost: 153.8561  card: 20168.2755  bytes: 2299152
             Table#: 4  cost: 8225.5481  card: 20170.8485  bytes: 3146676
             Table#: 0  cost: 8233.2391  card: 20139.2414  bytes: 3645159
             Table#: 6  cost: 12263.4248  card: 20362.7290  bytes: 4357682
             Table#: 3  cost: 12266.6610  card: 20362.7290  bytes: 4642764
***********************
In the trace files we can clearly see that cost of table#5 messed up our total cost. Why is that?
***************
Now joining: TABLE5[TL2]#5
***************
NL Join
  Outer table: Card: 32.67  Cost: 21.69  Resp: 21.69  Degree: 1  Bytes: 91
  Inner table: TABLE5  Alias: TL2
  Access Path: TableScan
Index: TABLE5_IDX3
    resc_io: 20.00  resc_cpu: 360839
    ix_sel: 3.9216e-04  ix_sel_with_filters: 3.9216e-04
    NL Join: Cost: 153.86  Resp: 153.86  Degree: 1
      Cost_io: 152.55  Cost_cpu: 18392646
      Resp_io: 152.55  Resp_cpu: 18392646
****** finished trying bitmap/domain indexes ******
  Best NL cost: 153.86
          resc: 153.86 resc_io: 152.55 resc_cpu: 18392646
          resp: 153.86 resp_io: 152.55 resp_cpu: 18392646
Join Card:  20168.28 = outer (32.67) * inner (1574304.00) * sel (3.9216e-04)
Join Card – Rounded: 20168 Computed: 20168.28
Index selectivity on TABLE5_IDX3 is very low (3.9216e-04) compared with the good plan (6.7020e-05).

Index selectivity

After looking into the Index selectivity, the question arises; why the Index selectivity is low?
B*TREE Indexes improve the performance of queries that select a small percentage of rows from a table. As a general guideline, we should create indexes on tables that are often queried for less than 15% of the table’s rows. This value may be higher in situations where all data can be retrieved from an index, or where the indexed columns can be used for joining to other tables.
The ratio of the number of distinct values in the indexed column / columns to the number of records in the table represents the selectivity of an index. The ideal selectivity is 1. Such selectivity can be reached only by unique indexes on NOT NULL columns.

Selectivity = Distinct Values / Total Number Rows

With this measure we can correlate the importance of the statistics in the objects. If we gather statistics only on the table, the total number of rows will change; leading to drastically change the Index selectivity. In the same manner, if we only gather statistics on the Index, the number of distinct values will change and the selectivity will dramatically change.
Apart of that, the accuracy of the statistics will also play an important role. The accurate the statistics, the accurate the Index selectivity.

Suggestions – How to fix this issue?

The suggestions came from the percept that Index selectivity is playing a bad game on this particular SQL_ID.
        As table and its indexes have a close relationship on how the CBO calculates the cost, we must gather statistics on the table and its indexes in the same time. This will avoid the Index selectivity to drastically change.
       When an index is rebuilt, we must gather statistics of the table too.
       Review the accuracy of the statistics. In some cases AUTO_SAMPLE_SIZE is not giving a good sample.

Some test on this SQL_ID have proven that gathering statistics to the TABLE5 table and its indexes with an estimate percent of 50%, plus invalidating the cursor immediately solved the issue.
The CBO was able to pick the good execution plan because the index selectivity was good enough.

Thanks,
Alfred