Compression Advisor killed my database!

Over the weekend one of our databases hung because the flash recovery area was 100% full. I noticed a J001 process consuming significant CPU and I/O resources. It turned out this process was the automatic segment advisor job that runs in the weekend maintenance window.
The SQL it was executing was something like this (the schema, tablespace and table names are left out):

CREATE TABLE .dbms_tabcomp_temp_uncmp
TABLESPACE NOLOGGING
AS
SELECT /*+ FULL(. ) */ *
FROM .

After reading Oracle note ID 13463481.8 and confirming it through an SR, I learned that this is caused by a bug in version 11.2.0.3 that is fixed in 11.2.0.4. The bug generates an excessive amount of redo when the compression advisor runs on a table with a LOB column in a database running in ARCHIVELOG mode. That redo ends up as archived logs, which is what filled the flash recovery area.
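
To see whether a database is hitting this kind of redo storm, it helps to look at how much archived redo was produced per hour and how full the recovery area is. The queries below are a minimal sketch and are not taken from the Oracle note. They only rely on the standard V$ARCHIVED_LOG, V$RECOVERY_FILE_DEST and V$FLASH_RECOVERY_AREA_USAGE views; the two-day window and the dest_id = 1 filter are assumptions to adjust for your environment.

```sql
-- Archived redo generated per hour over the last two days (illustrative window).
-- dest_id = 1 is an assumption, used to avoid double counting when several
-- archive destinations are configured.
SELECT TRUNC(completion_time, 'HH24')            AS log_hour,
       COUNT(*)                                  AS logs,
       ROUND(SUM(blocks * block_size) / 1048576) AS redo_mb
FROM   v$archived_log
WHERE  completion_time > SYSDATE - 2
AND    dest_id = 1
GROUP  BY TRUNC(completion_time, 'HH24')
ORDER  BY 1;

-- Overall size and usage of the flash recovery area.
SELECT name,
       ROUND(space_limit / 1048576)       AS limit_mb,
       ROUND(space_used / 1048576)        AS used_mb,
       ROUND(space_reclaimable / 1048576) AS reclaimable_mb
FROM   v$recovery_file_dest;

-- Breakdown of recovery area usage by file type.
SELECT file_type, percent_space_used, percent_space_reclaimable, number_of_files
FROM   v$flash_recovery_area_usage;
```

A sharp jump in redo_mb that lines up with the maintenance window would be consistent with the behaviour described above, although it does not prove the advisor is the cause on its own.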

As we can’t just apply the required patch to the ORACLE_HOME right away, we decided to use the workaround of disabling the automatic segment advisor task. The compression advisor is part of the segment advisor, so it is not possible to disable one without the other.

To disable the segment advisor:

SQL> BEGIN
  dbms_auto_task_admin.disable(
    client_name => 'auto space advisor',
    operation   => NULL,
    window_name => NULL);
END;
/

PL/SQL procedure successfully completed.

After executing the procedure, verify that the “auto space advisor” client is disabled:

SQL> SELECT client_name, status FROM dba_autotask_client;

CLIENT_NAME                                                      STATUS
---------------------------------------------------------------- --------
auto optimizer stats collection                                  ENABLED
auto space advisor                                               DISABLED
sql tuning advisor                                               ENABLED
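
The same setting can be cross-checked per maintenance window. This is a small sketch using the standard DBA_AUTOTASK_WINDOW_CLIENTS view; after the disable call I would expect the SEGMENT_ADVISOR column to read DISABLED for every window, but treat that expectation as an assumption to confirm on your own system.

```sql
-- Per-window status of the automated maintenance tasks.
SELECT window_name, autotask_status, segment_advisor
FROM   dba_autotask_window_clients
ORDER  BY window_name;
```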
Although the advisor will no longer run automatically, you can still run it manually on the tables or indexes you want analyzed.
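
A manual run could look roughly like the block below. It is a sketch built on the documented DBMS_ADVISOR calls, not the exact steps used here: the HR.EMPLOYEES table and the task name MANUAL_SEGADV_EMPLOYEES are hypothetical placeholders.

```sql
-- Manual Segment Advisor run against a single table.
-- HR.EMPLOYEES and the task name are hypothetical placeholders.
DECLARE
  l_task_id   NUMBER;
  l_object_id NUMBER;
  l_task_name VARCHAR2(100) := 'MANUAL_SEGADV_EMPLOYEES';
BEGIN
  -- Create an advisor task of type "Segment Advisor".
  dbms_advisor.create_task(
    advisor_name => 'Segment Advisor',
    task_id      => l_task_id,
    task_name    => l_task_name,
    task_desc    => 'Manual Segment Advisor run');

  -- Attach the table to analyze (attr1 = owner, attr2 = table name).
  dbms_advisor.create_object(
    task_name   => l_task_name,
    object_type => 'TABLE',
    attr1       => 'HR',
    attr2       => 'EMPLOYEES',
    attr3       => NULL,
    attr4       => NULL,
    attr5       => NULL,
    object_id   => l_object_id);

  -- Ask for all recommendations, then run the task.
  dbms_advisor.set_task_parameter(
    task_name => l_task_name,
    parameter => 'RECOMMEND_ALL',
    value     => 'TRUE');

  dbms_advisor.execute_task(l_task_name);
END;
/

-- Findings produced by that task.
SELECT message, more_info
FROM   dba_advisor_findings
WHERE  task_name = 'MANUAL_SEGADV_EMPLOYEES';
```

Whether a manual run sidesteps the redo problem on 11.2.0.3 is not covered here, so it seems prudent to start with a small table that has no LOB columns. Once the database is on 11.2.0.4, calling dbms_auto_task_admin.enable with the same three arguments should switch the automatic task back on.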
Thanks,

Alfredo

Using repvfy to find problems in the OEM 12c repository

The repvfy Kit is very useful when you are trying to diagnose a problem in OEM Cloud Control 12c. 

I noticed that some of the DBMS_SCHEDULER jobs weren’t running on time, which was creating a backlog in the repository.
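
Late jobs can be spotted straight from the data dictionary. The query below is a minimal sketch, assuming the repository jobs are owned by SYSMAN and that the scheduler log still holds the last two days of runs. It measures the delay in seconds as actual start minus requested start, which is my own choice of definition and unit.

```sql
-- Start delay per SYSMAN scheduler job over the last two days.
-- The date difference is an INTERVAL DAY TO SECOND, converted to seconds here.
SELECT job_name,
       COUNT(*)                 AS runs,
       ROUND(MIN(delay_sec), 2) AS min_delay,
       ROUND(MAX(delay_sec), 2) AS max_delay,
       ROUND(AVG(delay_sec), 2) AS avg_delay
FROM  (SELECT job_name,
              EXTRACT(DAY    FROM (actual_start_date - req_start_date)) * 86400
            + EXTRACT(HOUR   FROM (actual_start_date - req_start_date)) * 3600
            + EXTRACT(MINUTE FROM (actual_start_date - req_start_date)) * 60
            + EXTRACT(SECOND FROM (actual_start_date - req_start_date)) AS delay_sec
       FROM   dba_scheduler_job_run_details
       WHERE  owner = 'SYSMAN'
       AND    log_date > SYSTIMESTAMP - INTERVAL '2' DAY)
GROUP  BY job_name
ORDER  BY job_name;
```

Jobs whose max_delay is far above their avg_delay are the ones worth a closer look.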

To get more information about this issue, you can use the repvfy Kit. The installation is straightforward and is covered in Oracle Support Note 1426973.1. At the time of this post, repvfy version 2015.0622 is available.

Once it is installed, you can start running tests against individual modules or the entire OEM 12c repository.

Which modules can I test using repvfy?

$ repvfy -h4
Let’s say you want a complete test with all the details of the entire OEM 12c repository, then you may run:

$ repvfy -level 9 -details
Keep in mind that this run takes some time to finish, as it tests all available modules.

OK, now back to my problem with scheduler jobs not running on time. I decided to run the performance dump to get more details about what was going on in the repository. This is the command I used:

$ repvfy dump performance
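The report looks like this:
-- -- --------------------------------------------------------------------- --
-- -- REPVFY: 2015.0507     Repository: 12.1.0.4.0     29-Jul-2015 11:27:01 --
-- ---------------------------------------------------------------------------
[----- REPVFY Version details -----------------------------------------------]
COMPONENT          INFO
------------------ ----------------------------------------
EMDIAG Version     2015.0507
Repository Version 12.1.0.4.0
Database Version   11.2.0.4.0
Test Version       2015.0526
Repository Type    CENTRAL
5 rows selected.
[----------------------------------------------------------------------------]
[-- Database information ----------------------------------------------------]
[----------------------------------------------------------------------------]
[----- Database information -------------------------------------------------]
[----- Instance information -------------------------------------------------]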

[----- DBMS_SCHEDULER execution statistics (last two days) -----------------]
JOB_NAME                                       RUNS  MIN_DELAY  MAX_DELAY  AVG_DELAY
---------------------------------------- ---------- ---------- ---------- ----------
EM_AVAIL_UNKNOWN_STUCK                          169        .01       1.89        .43
EM_BEACON_GENSVC_AVAIL                          507        .01       1.87        .58
EM_BSLN_SET_THRESHOLDS                            8        .01       1.58        .38
EM_DERIV_RETRY_ACTIONS_JOB                      101        .01       1.79        .36
EM_ECM_VCPU_JOB                                   8        .02       1.72         .7
EM_GATHER_SYSMAN_STATS                            5        .05       1.66         .6
EM_GROUP_MEMBER_SYNCUP                          503        .01     113.34       2.04
EM_HEALTH_CALC_JOB                              507        .01       2.18        .58
EM_JOBS_STEP_SCHED                            11953          0       3.89        .35
EM_JOB_PURGE_POLICIES                             1        .04        .04        .04
EM_METBSLN_COMPUTE_STATS                         16        .01       1.08        .23
EM_PING_MARK_NODE_STATUS                       1014        .01       1.89        .44
EM_PURGE_POLICIES                                 1         .4         .4         .4
EM_REPOS_SEV_EVAL                             43077          0       6.94       1.06
EM_ROLLUP_SCHED_JOB                               1        .02        .02        .02
EM_SLM_COMP_SCHED_JOB                           507        .01       2.09        .58
EM_SYSTEM_MEMBER_SYNUP                          507        .01        1.9        .63
EM_TASK_RESUBMIT_FAILED                           8        .01       1.58        .37
EM_TASK_WORKER_23                               491        .02       1.94        .65
EM_TASK_WORKER_24                                 1       2.03       2.03       2.03
EM_TASK_WORKER_25                                17        .01       1.71        .59
EM_TASK_WORKER_26                                17        .02       1.92        .55
EM_TGT_PROP_CONF_PP                               1       1.67       1.67       1.67
23 rows selected.
[----- Worker thread count --------------------------------------------------]
CLASS                     WORKER_COUNT
------------------------- ------------
Short (0)                            1
Long (1)                             1
2 rows selected.
[----- Task worker backlog --------------------------------------------------]
CLASS                            CNT
------------------------- ----------
Short (0)                       3190
1 row selected.
Here we can clearly see that our single task worker for short tasks has built up a huge backlog (3190 tasks). Next, I decided to run a system dump to get all the EM infrastructure details.

$ repvfy dump system
Here’s another interesting finding:

[----- PL/SQL tracing levels ------------------------------------------------]
CONTEXT_TYPE_ID CONTEXT_TYPE                             TRACE_LEVEL     LAST_UPDATE_DATE
--------------- ---------------------------------------- --------------- --------------------
              1 EM_EVENT_RECEIVER                        4-OFF           12-MAY-2014 18:23:13
              2 EM_EVENT_MANAGER                         4-OFF           12-MAY-2014 18:23:13
              4 EM.DERIV                                 4-OFF           12-MAY-2014 18:23:13
              5 EM_EVENT_BUS                             4-OFF           12-MAY-2014 18:23:13
              6 EM_NOTIFY                                4-OFF           12-MAY-2014 18:23:13
              7 EM_PPC                                   4-OFF           12-MAY-2014 18:23:13
              8 DEFAULT                                  4-OFF           12-MAY-2014 18:23:13
              9 TRACER                                   4-OFF           12-MAY-2014 18:23:13
             10 LOADER                                   4-OFF           12-MAY-2014 18:23:13
             11 NOTIFICATION                             4-OFF           12-MAY-2014 18:23:13
             12 REPOCOLLECTION                           4-OFF           12-MAY-2014 18:23:13
             13 EMCLI                                    4-OFF           12-MAY-2014 18:23:13
             14 EM.JOBS                                  4-OFF           12-MAY-2014 18:23:13
             15 EM.BLACKOUT                              4-OFF           12-MAY-2014 18:23:13
             16 SVCTESTAVAIL                             4-OFF           12-MAY-2014 18:23:13
             17 COMPLIANCE_EVALUATION                    4-OFF           12-MAY-2014 18:23:13
             18 EM.ECM                                   4-OFF           12-MAY-2014 18:23:13
             19 EM_SLM_COMPUTATION                       4-OFF           21-MAR-2012 14:24:35
             20 EM_CNTR_QUEUE                            4-OFF           12-MAY-2014 18:23:13
             21 EMD_RAC                                  4-OFF           12-MAY-2014 18:23:13
             22 DB_SYSTEM                                4-OFF           12-MAY-2014 18:23:13
             23 EMD_DBSERVICE                            2-WARNING       17-MAR-2015 13:33:16
             24 EM_DBM                                   2-WARNING       17-MAR-2015 13:36:38
             25 CAT                                      4-OFF           12-MAY-2014 18:23:13
             26 EM_SSA_XAAS                              4-OFF           12-MAY-2014 18:23:13
             27 MGMT_COLLECTION.COLLECTION_SUBSYSTEM     4-OFF           12-MAY-2014 18:23:13
             28 SEVERITY_EVALUATION                      4-OFF           12-MAY-2014 18:23:13
             29 SEVERITY_TRIGGER                         4-OFF           12-MAY-2014 18:23:13
             30 EM.GDS                                   2-WARNING       09-SEP-2014 13:43:36
             31 BLK_TRACE                                2-WARNING       17-MAR-2015 12:22:15
             32 MET_BASELINE                             2-WARNING       17-MAR-2015 12:23:34
             33 METRIC_LOAD                              2-WARNING       17-MAR-2015 12:23:34
             34 USAGE_SUMMARY                            2-WARNING       17-MAR-2015 12:26:36
             35 JVMD_LOG_MODULE                          2-WARNING       17-MAR-2015 13:06:15
             36 EM_HEALTH_CALC                           2-WARNING       17-MAR-2015 13:06:20
             39 CRS_EVENT                                2-WARNING       09-JUN-2015 15:44:03
36 rows selected.
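As a best practice, we should have at least 2 task workers for each class (short and long), and tracing should be off for the PL/SQL packages unless we are troubleshooting an issue with them. In this repository there was only 1 worker per class, and several contexts were still set to 2-WARNING.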

At this point repvfy has helped us identify 2 issues in our OEM 12c repository. Now the question is: how do I fix them?

Well, repvfy can also fix problems related to those tests. In fact, if we want to check the recommended values and have them applied, we can run the following command:

$ repvfy execute optimize
This command will run tests against the internal task system, repository settings and the target system.

After the command finished, I checked again and found that the number of task workers had been changed to 2 for each class and that tracing was disabled for all the PL/SQL packages.
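
One extra sanity check is to list the worker jobs in the scheduler. This sketch assumes the workers keep the EM_TASK_WORKER_nn naming seen in the performance report and are owned by SYSMAN. The number of jobs listed may not map one-to-one to the worker count that repvfy reports, so treat it as a complement to the repvfy dump and not a replacement for it.

```sql
-- Task worker jobs currently defined in the scheduler (naming is an assumption
-- taken from the EM_TASK_WORKER_nn entries in the repvfy report).
SELECT job_name, enabled, state, last_start_date, next_run_date
FROM   dba_scheduler_jobs
WHERE  owner = 'SYSMAN'
AND    job_name LIKE 'EM\_TASK\_WORKER\_%' ESCAPE '\'
ORDER  BY job_name;
```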

Do you want more information about the execute optimize command? Check Courtney Llamas’ blog.

Thanks,

Alfredo