Using repvfy to find problems in the OEM 12c repository

The repvfy Kit is very useful when you are trying to diagnose a problem in OEM Cloud Control 12c. 

I noticed that some of the tasks from the dbms_scheduler weren’t running on time, hence creating a backlog in the repository.

In order to get more information about this issue, you can make use of the repvfy Kit. The installation is pretty straightforward and is covered in Oracle Support Note 1426973.1. At the time of this post, repvfy version 2015.0622 is available.

Once installed, you can start running tests against individual modules or the entire OEM 12c repository.

Which modules can I test using repvfy?

$ repvfy -h4
Let’s say you want a complete test with all the details of the entire OEM 12c repository, then you may run:

$ repvfy -level 9 -details
Keep in mind that this task is going to take some time to finish, as it tests all the available modules.

OK, now back to my problem with scheduler jobs not running on time. I decided to run the performance test to get more details about what is going on in the repository. This is the command used for the test:

$ repvfy dump performance
The report looks like this:
— — ——————————————————————— —
— — REPVFY: 2015.0507     Repository: 12.1.0.4.0     29-Jul-2015 11:27:01 —
— —————————————————————————
 [—– REPVFY Version details ———————————————–]
COMPONENT          INFO
—————— —————————————-
EMDIAG Version     2015.0507
Repository Version 12.1.0.4.0
Database Version   11.2.0.4.0
Test Version       2015.0526
Repository Type    CENTRAL
5 rows selected.
[—————————————————————————-]
[– Database information —————————————————-]
[—————————————————————————-]
[—– Database information ————————————————-]
[—– Instance information ————————————————-]

[—– DBMS_SCHEDULER execution statistics (last two days) ——————]
JOB_NAME                                       RUNS  MIN_DELAY  MAX_DELAY  AVG_DELAY
—————————————- ———- ———- ———- ———-
EM_AVAIL_UNKNOWN_STUCK                          169        .01       1.89        .43
EM_BEACON_GENSVC_AVAIL                          507        .01       1.87        .58
EM_BSLN_SET_THRESHOLDS                            8        .01       1.58        .38
EM_DERIV_RETRY_ACTIONS_JOB                      101        .01       1.79        .36
EM_ECM_VCPU_JOB                                   8        .02       1.72         .7
EM_GATHER_SYSMAN_STATS                            5        .05       1.66         .6
EM_GROUP_MEMBER_SYNCUP                          503        .01     113.34       2.04
EM_HEALTH_CALC_JOB                              507        .01       2.18        .58
EM_JOBS_STEP_SCHED                            11953          0       3.89        .35
EM_JOB_PURGE_POLICIES                             1        .04        .04        .04
EM_METBSLN_COMPUTE_STATS                         16        .01       1.08        .23
EM_PING_MARK_NODE_STATUS                       1014        .01       1.89        .44
EM_PURGE_POLICIES                                 1         .4         .4         .4
EM_REPOS_SEV_EVAL                             43077          0       6.94       1.06
EM_ROLLUP_SCHED_JOB                               1        .02        .02        .02
EM_SLM_COMP_SCHED_JOB                           507        .01       2.09        .58
EM_SYSTEM_MEMBER_SYNUP                          507        .01        1.9        .63
EM_TASK_RESUBMIT_FAILED                           8        .01       1.58        .37
EM_TASK_WORKER_23                               491        .02       1.94        .65
EM_TASK_WORKER_24                                 1       2.03       2.03       2.03
EM_TASK_WORKER_25                                17        .01       1.71        .59
EM_TASK_WORKER_26                                17        .02       1.92        .55
EM_TGT_PROP_CONF_PP                               1       1.67       1.67       1.67
23 rows selected.
[—– Worker thread count ————————————————–]
CLASS                     WORKER_COUNT
————————- ————
Short (0)                            1
Long (1)                             1
2 rows selected.
[—– Task worker backlog ————————————————–]
CLASS                            CNT
————————- ———-
Short (0)                       3190
1 row selected.
Here, we can clearly see that our Task Worker for Short tasks is building up a huge backlog. Next, I decided to run a system dump to get all the EM infrastructure details.

$ repvfy dump system
Here’s another interesting finding:

[—– PL/SQL tracing levels ————————————————]
CONTEXT_TYPE_ID CONTEXT_TYPE                             TRACE_LEVEL     LAST_UPDATE_DATE
————— —————————————- ————— ——————–
              1 EM_EVENT_RECEIVER                        4-OFF           12-MAY-2014 18:23:13
              2 EM_EVENT_MANAGER                         4-OFF           12-MAY-2014 18:23:13
              4 EM.DERIV                                 4-OFF           12-MAY-2014 18:23:13
              5 EM_EVENT_BUS                             4-OFF           12-MAY-2014 18:23:13
              6 EM_NOTIFY                                4-OFF           12-MAY-2014 18:23:13
              7 EM_PPC                                   4-OFF           12-MAY-2014 18:23:13
              8 DEFAULT                                  4-OFF           12-MAY-2014 18:23:13
              9 TRACER                                   4-OFF           12-MAY-2014 18:23:13
             10 LOADER                                   4-OFF           12-MAY-2014 18:23:13
             11 NOTIFICATION                             4-OFF           12-MAY-2014 18:23:13
             12 REPOCOLLECTION                           4-OFF           12-MAY-2014 18:23:13
             13 EMCLI                                    4-OFF           12-MAY-2014 18:23:13
             14 EM.JOBS                                  4-OFF           12-MAY-2014 18:23:13
             15 EM.BLACKOUT                              4-OFF           12-MAY-2014 18:23:13
             16 SVCTESTAVAIL                             4-OFF           12-MAY-2014 18:23:13
             17 COMPLIANCE_EVALUATION                    4-OFF           12-MAY-2014 18:23:13
             18 EM.ECM                                   4-OFF           12-MAY-2014 18:23:13
             19 EM_SLM_COMPUTATION                       4-OFF           21-MAR-2012 14:24:35
             20 EM_CNTR_QUEUE                            4-OFF           12-MAY-2014 18:23:13
             21 EMD_RAC                                  4-OFF           12-MAY-2014 18:23:13
             22 DB_SYSTEM                                4-OFF           12-MAY-2014 18:23:13
             23 EMD_DBSERVICE                            2-WARNING       17-MAR-2015 13:33:16
             24 EM_DBM                                   2-WARNING       17-MAR-2015 13:36:38
             25 CAT                                      4-OFF           12-MAY-2014 18:23:13
             26 EM_SSA_XAAS                              4-OFF           12-MAY-2014 18:23:13
             27 MGMT_COLLECTION.COLLECTION_SUBSYSTEM     4-OFF           12-MAY-2014 18:23:13
             28 SEVERITY_EVALUATION                      4-OFF           12-MAY-2014 18:23:13
             29 SEVERITY_TRIGGER                         4-OFF           12-MAY-2014 18:23:13
             30 EM.GDS                                   2-WARNING       09-SEP-2014 13:43:36
             31 BLK_TRACE                                2-WARNING       17-MAR-2015 12:22:15
             32 MET_BASELINE                             2-WARNING       17-MAR-2015 12:23:34
             33 METRIC_LOAD                              2-WARNING       17-MAR-2015 12:23:34
             34 USAGE_SUMMARY                            2-WARNING       17-MAR-2015 12:26:36
             35 JVMD_LOG_MODULE                          2-WARNING       17-MAR-2015 13:06:15
             36 EM_HEALTH_CALC                           2-WARNING       17-MAR-2015 13:06:20
             39 CRS_EVENT                                2-WARNING       09-JUN-2015 15:44:03
36 rows selected.
As a best practice, we should have at least 2 Task Workers for each class of task (Short and Long), and PL/SQL tracing should stay disabled unless we are troubleshooting an issue in those packages.

At this point, repvfy has helped us identify 2 issues in our OEM 12c repository. Now the question is: how do I fix them?

Well, repvfy can also fix problems related to those tests. In fact, if we want to check for the recommended values and have them applied, we can run the following command:
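Both findings can also be pulled out of the report text mechanically. Here is a minimal shell sketch, assuming the repvfy output has been saved to a text file and keeps the column layout shown above. The function names and the delay threshold are my own and are not part of repvfy.

```shell
#!/bin/sh
# Hypothetical helpers: they only parse report text read from stdin and
# assume the column layout shown in the dumps above.

# Print scheduler jobs whose MAX_DELAY (4th column) exceeds the given limit.
slow_jobs() {
  awk -v limit="$1" '/^EM_/ && NF == 5 && $4 + 0 > limit + 0 { print $1, $4 }'
}

# Print PL/SQL trace contexts whose level is anything other than 4-OFF.
traced_contexts() {
  awk 'NF == 5 && $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9]-[A-Z]+$/ && $3 != "4-OFF" { print $2, $3 }'
}
```

For example, slow_jobs 10 < performance_report.txt would list the jobs whose maximum delay is above 10, and traced_contexts < system_report.txt would list the contexts that still have tracing enabled (both file names are placeholders).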

$ repvfy execute optimize
This command will run tests against the internal task system, repository settings and the target system.

After the command finished, I checked again and found that my number of Task Workers was modified to 2 for each type and the trace was disabled for all the PL/SQL packages.

Do you want more information about the execute optimize command? Check Courtney Llamas's blog.

Thanks,

Alfredo

Deploy multiple plug-ins at once using OEM 12.1.0.4 console

Today’s post is about a neat Oracle EM 12c feature. At Collaborate 2015, I spoke about deploying multiple plug-ins at once using emcli to save time. I used emcli because the console didn’t have the option to do that. Guess what? The new release, 12.1.0.4, lets you do it from the console! This is especially handy when you don’t know how to use emcli, you need to deploy several plug-ins to the OMS, and you don’t want to spend a huge amount of time doing it one by one.

In order to do that, follow these steps:

      Click Setup
      Navigate to Extensibility -> Plug-ins
      Select one of the plug-ins you want to deploy
The next screen will ask you to add more plug-ins if required.

It will also tell you if any downtime is required for the plug-in deployment.
Click next and proceed as usual with the deployment process.
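
For comparison, this is roughly what the emcli route looks like. It is only a sketch from memory: I am assuming the deploy_plugin_on_server verb accepts several plug-in IDs separated by semicolons in a single -plugin argument, so please confirm with emcli help deploy_plugin_on_server before relying on it. The helper names are mine, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Join plug-in IDs with ';' so they fit in a single -plugin argument.
join_plugins() {
  ( IFS=';'; printf '%s\n' "$*" )
}

# Print (do not run) the emcli command so it can be reviewed first.
# The verb and option are assumed from memory; verify them with emcli help.
deploy_command() {
  printf 'emcli deploy_plugin_on_server -plugin="%s"\n' "$(join_plugins "$@")"
}

# Example plug-in IDs; replace them with the ones you need.
deploy_command oracle.sysman.db oracle.sysman.emas
```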
Thanks,

Alfredo

RMAN jobs not working after OEM upgrade to 12.1.0.4

If you are planning to upgrade your OEM to 12.1.0.4 and you have RMAN jobs scheduled in Cloud Control, you should consider applying patch 19519190 to the OMS. I noticed that most of the RMAN jobs were having issues and, even worse, some steps were empty!

Obviously, the jobs were succeeding because the steps were empty. In other words, the jobs were doing nothing.

It looks like this patch is not part of any PSU yet, but having a problem with hundreds of jobs, especially RMAN jobs, is very risky.

Take a look at EM 12c: RMAN Step Commands are Being Removed from Multi-step RMAN Script Jobs in Enterprise Manager 12.1.0.4 Cloud Control (Doc ID 1914916.1).
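
To check whether the patch is already in place, you can search the OPatch inventory for the patch number. This is a small sketch, assuming ORACLE_HOME points to the OMS home and that the inventory listing mentions applied patches by number; the function name is my own.

```shell
#!/bin/sh
# Succeeds when the given patch number appears as a whole word in the
# inventory listing read from stdin.
patch_listed() {
  grep -qw -- "$1"
}
```

A possible use would be: $ORACLE_HOME/OPatch/opatch lsinventory | patch_listed 19519190 && echo "patch 19519190 found"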

Thanks,

Alfredo

Using OMS DEBUG mode to troubleshoot OEM 12c problems

This time, I want to show you how to troubleshoot OEM problems by enabling DEBUG mode in the OMS. The virtual machine (VM) running my sandbox installation of OEM 12c 12.1.0.4 crashed during the night. After restarting the VM and all the OEM components, I wasn’t able to log in using the SYSMAN account. The error from the console was not very explicit: “Authentication failed. If problem persists, contact your system administrator.”

In order to get more details about the error, I decided to enable DEBUG mode for the OMS and reproduce the error. This is what I did to enable DEBUG mode.

$ cd /u01/app/oracle/oms/oms/bin
$ ./emctl set property -name log4j.rootCategory -value "DEBUG, emlogAppender, emtrcAppender" -module logging
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation.  All rights reserved.
SYSMAN password:
Property log4j.rootCategory has been set to value DEBUG, emlogAppender, emtrcAppender for all Management Servers
OMS restart is not required to reflect the new property value
After enabling DEBUG mode, I reproduced the error several times using the console. I also wrote down the approximate time of each error to ease the search in the log file. Searching the emoms.trc file located under /em/EMGC_OMS1/sysman/log/, I found an ORA-14400 error. MOS note 1493151.1 explains how to fix the issue by adding a new audit partition.

$ cd /u01/app/oracle/gc_inst/em/EMGC_OMS1/sysman/log/
$ view emoms.trc
java.sql.SQLException: ORA-14400: inserted partition key does not map to any partition
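
With DEBUG enabled, the trace file grows quickly, so instead of paging through it you may prefer a quick tally of the ORA- errors it contains. This is a minimal sketch that only assumes the errors appear in the usual ORA-nnnnn form; the function name is mine.

```shell
#!/bin/sh
# Count each distinct ORA- error code in the text read from stdin,
# most frequent first. Output format: "<code> <count>".
ora_error_counts() {
  grep -o 'ORA-[0-9][0-9]*' | sort | uniq -c | sort -k1,1nr -k2 | awk '{ print $2, $1 }'
}
```

Running ora_error_counts < emoms.trc from the log directory gives one line per error code, which makes it easy to spot the one to search for in MOS.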
The final step is to disable DEBUG mode for your OMS; otherwise the log files can grow very large and performance could be affected.

$ ./emctl set property -name log4j.rootCategory -module LOGGING -value "WARN, emlogAppender, emtrcAppender"
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation.  All rights reserved.
SYSMAN password:
Property log4j.rootCategory has been set to value WARN, emlogAppender, emtrcAppender for all Management Servers
OMS restart is not required to reflect the new property value
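
Since forgetting to switch back is easy, the two emctl calls can be wrapped in a small helper. This is a sketch under my own assumptions: the emctl path is the one from my sandbox, the function name is made up, and only the two levels used in this post are accepted. emctl will still prompt for the SYSMAN password.

```shell
#!/bin/sh
# Path taken from this post; set EMCTL beforehand for other installations.
EMCTL=${EMCTL:-/u01/app/oracle/oms/oms/bin/emctl}

# Set the OMS root log level. Only DEBUG and WARN are accepted here,
# because those are the two values shown in this post.
set_oms_log_level() {
  case "$1" in
    DEBUG|WARN) ;;
    *) echo "usage: set_oms_log_level DEBUG|WARN" >&2; return 1 ;;
  esac
  "$EMCTL" set property -name log4j.rootCategory \
    -value "$1, emlogAppender, emtrcAppender" -module logging
}
```

The idea is to call set_oms_log_level DEBUG, reproduce the problem, and then call set_oms_log_level WARN right away.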
I hope this information is useful to you next time you are troubleshooting an OEM 12c issue.
Thanks,

Alfredo

Oracle Enterprise Manager Security – Disable SYSMAN access

In Enterprise Manager 12c, the SYSMAN user is the schema owner, and as a best practice all users should log in using their own individual accounts. To enforce this, you can prevent SYSMAN from logging in to the console and/or emcli by setting SYSTEM_USER to -1 in the MGMT_CREATED_USERS table:
UPDATE MGMT_CREATED_USERS
SET SYSTEM_USER='-1'
WHERE user_name='SYSMAN';
COMMIT;
To re-enable access, just set it back to 1:
UPDATE MGMT_CREATED_USERS
SET SYSTEM_USER='1'
WHERE user_name='SYSMAN';
COMMIT;
Refer to Oracle Support’s note:
How To Disable SYSMAN & SYSTEM Users from Logging into Grid Console? (Doc ID 867360.1)
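
If you toggle this setting often, a tiny helper that prints the right statement avoids typing the value by hand. This is only a sketch: the function name is mine, it simply mirrors the two statements above, and the COMMIT is my own addition. It seems prudent to confirm that another super administrator account exists before disabling SYSMAN.

```shell
#!/bin/sh
# Print the UPDATE that disables (-1) or re-enables (1) SYSMAN logins.
# The SQL is only printed; review it before running it in SQL*Plus.
sysman_access_sql() {
  case "$1" in
    disable) value=-1 ;;
    enable)  value=1 ;;
    *) echo "usage: sysman_access_sql disable|enable" >&2; return 1 ;;
  esac
  cat <<EOF
UPDATE MGMT_CREATED_USERS
SET SYSTEM_USER='$value'
WHERE user_name='SYSMAN';
COMMIT;
EOF
}
```

For example, sysman_access_sql disable prints the first statement above, ready to be pasted into a SQL*Plus session connected to the repository.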
Thanks,

Alfredo

Oracle Enterprise Manager – Reducing the noise, Part 1

Enterprise Manager 12c is a great monitoring tool. With it you can monitor a wide range of target types, from databases to middleware. Although the out-of-the-box metrics can suit your monitoring requirements, they can also generate a considerable amount of white noise. In order to reduce this noise, you first have to identify the top alerts in your system. Cloud Control comes with several predefined reports that help you dig into multiple areas of your system; one of them, “20 Most Common Alerts”, shows you how often the most common alerts occur.


In the picture above, you can clearly see that the metric “Database Time Spent Waiting (%)” appears twice in my Top 3. Let’s check the metric settings for my DB targets; to do this, go to a DB home page and then Oracle Database -> Monitoring -> Metric and Collection Settings.

 
Wait a minute! Why am I receiving alerts if there are no thresholds set up for any of those metrics? This behavior is explained in MOS note 1500074.1, which describes a default warning threshold of 30% inside the database configuration. Let’s take a look at DBA_THRESHOLDS to confirm.
set lines 300
column METRICS_NAME format a30
column WARNING_OPERATOR format a30
column WARNING_VALUE format a30
column CRITICAL_OPERATOR format a30
column CRITICAL_VALUE format a30
SELECT METRICS_NAME,WARNING_OPERATOR ,WARNING_VALUE,CRITICAL_OPERATOR ,CRITICAL_VALUE FROM DBA_THRESHOLDS;
METRICS_NAME                        WARNING_OPERATOR               WARNING_VALUE                  CRITICAL_OPERATOR              CRITICAL_VALUE
———————————– —————————— —————————— —————————— ——————————
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             30                             NONE
Average Users Waiting Counts        GT                             30                             NONE
Blocked User Session Count          GT                             0                              NONE
Current Open Cursors Count          GT                             1200                           NONE
Database Time Spent Waiting (%)     GT                             30                             NONE
Database Time Spent Waiting (%)     GT                             30                             NONE
Database Time Spent Waiting (%)     GT                             30                             NONE
Database Time Spent Waiting (%)     GT                             30                             NONE
Database Time Spent Waiting (%)     GT                             30                             NONE
Database Time Spent Waiting (%)     GT                             30                             NONE
Database Time Spent Waiting (%)     GT                             50                             NONE
Database Time Spent Waiting (%)     GT                             50                             NONE
Logons Per Sec                      GE                             100                            NONE
Session Limit %                     GT                             90                             GT                             97
Tablespace Bytes Space Usage        DO NOT CHECK                   0                              DO_NOT_CHECK                   0
Tablespace Space Usage              GE                             85                             GE                             97
22 rows selected.
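
With this many repeated rows, a quick tally per warning value is easier to read than the raw listing. Here is a minimal shell sketch, assuming the query output has been saved as text with the layout above, where the metric name starts the line and the warning operator is a single word; the function name is my own.

```shell
#!/bin/sh
# Tally the WARNING_VALUE settings for one metric in query output read
# from stdin. Output format: "<warning value> <number of rows>".
warning_value_counts() {
  awk -v name="$1" '
    index($0, name) == 1 {
      # Drop the metric name; what remains starts with operator, then value.
      split(substr($0, length(name) + 1), f, " ")
      count[f[2]]++
    }
    END { for (v in count) print v, count[v] }
  ' | sort -n
}
```

Calling warning_value_counts "Database Time Spent Waiting (%)" < thresholds.txt (a placeholder file name) on the listing above should report six rows at 30 and two rows at 50.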
There you go! All the thresholds for “Database Time Spent Waiting (%)” are set to either 30% or 50%. The trick to disabling these alerts is to set the threshold to a different value, such as 99%; this will override the default value as follows:

  
Let’s look at the database setting again:
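set lines 300
column METRICS_NAME format a30
column WARNING_OPERATOR format a30
column WARNING_VALUE format a30
column CRITICAL_OPERATOR format a30
column CRITICAL_VALUE format a30
SELECT METRICS_NAME,WARNING_OPERATOR ,WARNING_VALUE,CRITICAL_OPERATOR ,CRITICAL_VALUE FROM DBA_THRESHOLDS;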
METRICS_NAME                        WARNING_OPERATOR               WARNING_VALUE                  CRITICAL_OPERATOR              CRITICAL_VALUE
———————————– —————————— —————————— —————————— ——————————
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             30                             NONE
Average Users Waiting Counts        GT                             30                             NONE
Blocked User Session Count          GT                             0                              NONE
Current Open Cursors Count          GT                             1200                           NONE
Database Time Spent Waiting (%)     GT                             99                             NONE
Database Time Spent Waiting (%)     GT                             99                             NONE
Database Time Spent Waiting (%)     GT                             99                             NONE
Database Time Spent Waiting (%)     GT                             99                             NONE
Database Time Spent Waiting (%)     GT                             99                             NONE
Database Time Spent Waiting (%)     GT                             99                             NONE
Database Time Spent Waiting (%)     GT                             99                             NONE
Database Time Spent Waiting (%)     GT                             99                             NONE
Database Time Spent Waiting (%)     GT                             99                             NONE
Database Time Spent Waiting (%)     GT                             99                             NONE
Database Time Spent Waiting (%)     GT                             99                             NONE
Logons Per Sec                      GE                             100                            NONE
Session Limit %                     GT                             90                             GT                             97
Tablespace Bytes Space Usage        DO NOT CHECK                   0                              DO_NOT_CHECK                   0
Tablespace Space Usage              GE                             85                             GE                             97
25 rows selected.
We successfully raised these thresholds to a very high value. At this point you can decide to stay at 99%, or you can remove the threshold in order to disable the alerts completely.

Now let’s confirm those settings in the database:
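set lines 300
column METRICS_NAME format a30
column WARNING_OPERATOR format a30
column WARNING_VALUE format a30
column CRITICAL_OPERATOR format a30
column CRITICAL_VALUE format a30
SELECT METRICS_NAME,WARNING_OPERATOR ,WARNING_VALUE,CRITICAL_OPERATOR ,CRITICAL_VALUE FROM DBA_THRESHOLDS;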
METRICS_NAME                        WARNING_OPERATOR               WARNING_VALUE                  CRITICAL_OPERATOR              CRITICAL_VALUE
———————————– —————————— —————————— —————————— ——————————
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             10                             NONE
Average Users Waiting Counts        GT                             30                             NONE
Average Users Waiting Counts        GT                             30                             NONE
Blocked User Session Count          GT                             0                              NONE
Current Open Cursors Count          GT                             1200                           NONE
Logons Per Sec                      GE                             100                            NONE
Session Limit %                     GT                             90                             GT                             97
Tablespace Bytes Space Usage        DO NOT CHECK                   0                              DO_NOT_CHECK                   0
Tablespace Space Usage              GE                             85                             GE                             97
14 rows selected.
The thresholds are not there anymore and, hopefully, neither are the alerts. The same behavior applies to the “Average Users Waiting Counts” metric; if you are receiving considerable white noise from it, you can disable it as well by following the same procedure. A good practice is to create a Monitoring Template to help you modify these thresholds for multiple targets at once.
Stay tuned for my next post about reducing OEM 12c noise.
Thanks,

Alfredo