THE FAILED LOGINS METRIC QUERY IS VERY SLOW BECAUSE OF SYS.AUD$ TABLE SIZE

Failed Logins metric in Oracle Enterprise Manage Cloud Control 13c is too slow.

SYMPTOMPS

An Oracle Enterprise Manage Cloud Control 13c is used to monitor an Oracle database. The Failed Logins metric is enabled by default. A query executed by the Failed Logins metric is very slow and causes high I/O in a monitored database.

DIAGNOSE

The Failed Logins metric is enabled for a monitored database

Based on a database version the Failed Logins metiric executes one of the following queries under DBSNMP user

c85zupw0dgrm4(19c)

SELECT TO_CHAR(current_timestamp AT TIME ZONE 'GMT', 'YYYY-MM-DD HH24:MI:SS TZD') AS curr_timestamp, 
SUM(failed_count) AS failed_count, TO_CHAR(MIN(first_occur_time), 'yyyy-mm-dd hh24:mi:ss') AS first_occur_time, 
TO_CHAR(MAX(last_occur_time), 'yyyy-mm-dd hh24:mi:ss') AS last_occur_time FROM ( SELECT COUNT(username) AS failed_count, 
MIN(timestamp) AS first_occur_time, MAX(timestamp) AS last_occur_time FROM sys.dba_audit_session WHERE returncode != 0 
AND timestamp >= current_timestamp - TO_DSINTERVAL('0 0:30:00') UNION SELECT COUNT(DBUSERNAME) AS failed_count, 
MIN(event_timestamp) AS first_occur_time, MAX(event_timestamp) AS last_occur_time FROM unified_audit_trail WHERE 
ACTION_NAME='LOGON' and return_code <> 0 AND event_timestamp >= current_timestamp - TO_DSINTERVAL('0 0:30:00'))

83jcj6at6ugq3 (12c)

SELECT TO_CHAR(current_timestamp AT TIME ZONE :"SYS_B_0", :"SYS_B_1") AS curr_timestamp, 
SUM(failed_count) AS failed_count, TO_CHAR(MIN(first_occur_time), :"SYS_B_2") AS first_occur_time, 
TO_CHAR(MAX(last_occur_time), :"SYS_B_3") AS last_occur_time FROM ( SELECT COUNT(username) AS failed_count, 
MIN(timestamp) AS first_occur_time, MAX(timestamp) AS last_occur_time FROM sys.dba_audit_session 
WHERE returncode != :"SYS_B_4" AND timestamp >= current_timestamp - TO_DSINTERVAL(:"SYS_B_5") 
UNION SELECT COUNT(DBUSERNAME) AS failed_count, MIN(event_timestamp) AS first_occur_time, 
MAX(event_timestamp) AS last_occur_time  FROM unified_audit_trail WHERE ACTION_NAME=:"SYS_B_6" 
and return_code <> :"SYS_B_7" AND event_timestamp >= current_timestamp - TO_DSINTERVAL(:"SYS_B_8"))

1c74605ddbvjt (11g)

SELECT TO_CHAR(current_timestamp AT TIME ZONE :"SYS_B_0", :"SYS_B_1") AS curr_timestamp, 
COUNT(username) AS failed_count, TO_CHAR(MIN(timestamp), :"SYS_B_2") AS first_occur_time, 
TO_CHAR(MAX(timestamp), :"SYS_B_3") AS last_occur_time FROM  sys.dba_audit_session 
WHERE returncode != :"SYS_B_4" AND ACTION_NAME = :"SYS_B_5" AND 
timestamp >= current_timestamp - TO_DSINTERVAL(:"SYS_B_6")

A Failed Logins metric query is very slow and most part of elapsed time of a query is IO wait time. It can be easily confirmed by either reviewing execution details of the query in the V$SQL view

Example of query statistics with slow performance

*************************************** : ******************
                                  SQL_ID : 83jcj6at6ugq3
                                  Child# : 0
                           Cursor Status : VALID
                     Is Cursor Obsolete? : N
                          Parsing Schema : DBSNMP
                        System Timestamp : 17-08-2020 17-21-21
                               First Run : 2020-01-06/12:53:04
                        Last Active Time : 17-08-2020 17-19-39
                   Elapsed Time Total(s) : 34318.82
                       CPU Time Total(s) : 3851.97
                   IO Wait Time Total(s) : 33462.37
          Concurrency Wait Time Total(s) : 3.4
          Application Wait Time Total(s) : .95
              Cluster Wait Time Total(s) : 0
                              Executions : 91
                Elapsed Time per Exec(s) : 377.13
                    CPU Time per Exec(s) : 42.33
                IO Wait Time per Exec(s) : 367.72
       Concurrency Wait Time per Exec(s) : .04
       Application Wait Time per Exec(s) : .01
           Cluster Wait Time per Exec(s) : 0
                                         :
                              Disk Reads : 352363760
                             Buffer Gets : 371205986
                           Direct Writes : 0
                           Row Processed : 91
*************************************** : ******************

According to V$SQL view there were 91 executions those took 9.5 hours to complete

or by reviewing the section SQL ordered by User I/O Wait Time in an AWR report

In this specific example, IO Wait Time of 83jcj6at6ugq3 query is almost 97,5% of total query elapsed time. A query is responsible for large part of I/O load (direct path read wait event) on a database

In this example, the direct path read is almost half time (48,7%) of dbtime.

CAUSE

The reason a Failed login metric query is too slow with high I/O impact on a database is that SYS.AUD$ table is too large. As by design there is no normal indexes on SYS.AUD$ table, a metric query is accessing SYS.AUD$ table through FULL TABLE SCAN. The more size of the table the more time it takes to read the table.

SET PAGES 999
SET LINES 300
COL OWNER FOR A10 
COL SEGMENT_NAME FOR A20 
COL SEGMENT_TYPE FOR A15 
COL MB FOR 9999999

SELECT OWNER,
       SEGMENT_NAME,
       SEGMENT_TYPE,
       ROUND(BYTES/1024/1024) MB 
FROM
       DBA_SEGMENTS
WHERE
       SEGMENT_NAME='AUD$';

OWNER      SEGMENT_NAME         SEGMENT_TYPE          MB
---------- -------------------- --------------- --------
SYS        AUD$                 TABLE              28780

NOTE

By design there is no normal indexes on SYS.AUD$ table

SET PAGES 999
SET LINES 300
COL OWNER FOR A15 
COL INDEX_NAME FOR A30
COL INDEX_TYPE FOR A20

SELECT OWNER,
       INDEX_NAME,
       INDEX_TYPE	   
FROM
       DBA_INDEXES 
WHERE 
       TABLE_OWNER='SYS' AND 
       TABLE_NAME='AUD$';

OWNER           INDEX_NAME                     INDEX_TYPE
--------------- ------------------------------ --------------------
SYS             SYS_IL0000000407C00040$$       LOB
SYS             SYS_IL0000000407C00041$$       LOB

SOLUTION

A proper solution is to regularly purge SYS.AUD$ table. Refer to SYS.AUD$ TABLE IS VERY LARGE for details about how to purge the table.

NOTE

If you need to keep all audit records in SYS.AUD$ table, then the only way to reduce impact of a metric query is to create aditional indexes

The Effect Of Creating Index On Table Sys.Aud$ (Doc ID 1329731.1)
Partitioning The AUD$ Table (Doc ID 1379831.1)
Audit Tablespace Datafile Fails To Extend When AUD$ Grows. (Doc ID 1310035.1)
DBUPGDIAG.SQL Takes Long Time On Selecting SYS.AUD$ (Doc ID 1941679.1)
Performance Issues Caused by DB Collector When AUD$ Is Very Large (Full Table Scan on AUD$) (Doc ID 1356088.1)

NOTE

To reduce impact of a metric query on a database the following 2 metrics can be disabled permanently

1 Failed Logins metric

A Target Database -> Monitoring -> Metric and Collection Settings -> Search for "Failed Logins" -> 
Click on "Every 30 minutes" -> Disable

2 Failed Logins (Historical) metric

A Target Database -> Monitoring -> Metric and Collection Settings -> Other Collected Items -> 
Failed Logins (Historical) -> Click on the Duration "Every 12 hours" -> Disable

REFERENCES

13.2 EM: "Failed Logins” and “Failed Logins (Historical)" Metrics Keep Running Despite Being Disabled (Doc ID 2459666.1)
EM13c : How to disable Failed Logins Metric Completely? (Doc ID 2429080.1)
EM 12c, 11g: High Cpu Utilization from Enterprise Manger Agent perl Process Running failedLogin.pl for audit_failed_logins metric (Doc ID 1265699.1)

Written At

07 SEPTEMBER 202014:30

Red Hat Release

7.x x64

Enterprise Manager

13.3.0.0.0

Database Release

11.2.0.4

Database Release

12.1.0.1

THE FAILED LOGINS METRIC QUERY IS VERY SLOW BECAUSE OF SYS.AUD$ TABLE SIZE

Tags In