SYMPTOMS

When Oracle Enterprise Manager Cloud Control 13c monitors a cluster database, it repeatedly sends 'Global Cache Blocks Lost' alerts, one every 5 minutes.

The alert message looks like this: EM Event: Warning:GPPRD_GPPRD1 – Total global cache block lost is 15.

Host=db-gp01.local 
Target type=Database Instance 
Target name=GPPRD_GPPRD1 
Categories=Error 
Message=Total global cache block lost is 15. 
Severity=Warning 
Event reported time=Aug 21, 2020 2:30:09 AM MSK 
Operating System=Linux
Platform=x86_64
Associated Incident Id=138883 
Associated Incident Status=New 
Associated Incident Owner= 
Associated Incident Acknowledged By Owner=No 
Associated Incident Priority=None 
Associated Incident Escalation Level=0 
Event Type=Metric Alert 
Event name=rac_global_cache:lost 
Metric Group=Global Cache Statistics
Metric=Global Cache Blocks Lost
Metric value=15
Key Value= 
Rule Name=ROOT_NOTIFICATION_RULE,ALL TARGET EVENTS 
Rule Owner=SYSMAN 
Update Details:
Total global cache block lost is 15. 

 
 
DIAGNOSIS

 
Your mailbox fills up with messages like EM Incident: Warning:New: – Total global cache block lost is 15.
The OMS repository database shows a high number of sent Global Cache Blocks Lost alerts for the target instance:

SELECT TO_DATE(COLLECTION_TIMESTAMP, 'DD-MM-YYYY') "RECEIVED AT",
       COUNT(COLLECTION_TIMESTAMP) "ALERTS"
FROM
       MGMT_VIEW.MGMT$ALERT_NOTIF_LOG
WHERE
       METRIC_NAME='rac_global_cache' AND
       METRIC_COLUMN='lost' AND
       COLUMN_LABEL = 'Global Cache Blocks Lost' AND
       TARGET_NAME = '&INSTANCE_NAME' 
GROUP BY 
       TO_DATE(COLLECTION_TIMESTAMP, 'DD-MM-YYYY')
ORDER BY 1;
Enter value for instance_name: GPPRD_GPPRD1
old   7:        TARGET_NAME = '&INSTANCE_NAME'
new   7:        TARGET_NAME = 'GPPRD_GPPRD1'

RECEIVED AT            ALERTS
------------------ ----------
14-AUG-20                 142
15-AUG-20                 202
16-AUG-20                 202
17-AUG-20                 202
18-AUG-20                 202
19-AUG-20                 202
20-AUG-20                 202
21-AUG-20                  64

8 rows selected.

 
The output shows that 202 alert messages are generated on every full day for the database instance GPPRD_GPPRD1 (the first and last days in the listing are partial).

During a given day, however, the instance loses few blocks or none at all. The alert messages for a single day show the value reported each time:
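The daily count above applies TO_DATE to COLLECTION_TIMESTAMP, which goes through an implicit string conversion and therefore depends on the session's NLS_DATE_FORMAT. A variant that groups on the truncated timestamp might look like the sketch below. It uses only the columns already shown in this note and has not been run against the repository.

```sql
-- Sketch: daily count of sent alerts for one instance.
-- TRUNC() drops the time part without any string conversion,
-- so the result does not depend on NLS_DATE_FORMAT.
SELECT TRUNC(COLLECTION_TIMESTAMP) "RECEIVED AT",
       COUNT(*) "ALERTS"
FROM
       MGMT_VIEW.MGMT$ALERT_NOTIF_LOG
WHERE
       METRIC_NAME = 'rac_global_cache' AND
       METRIC_COLUMN = 'lost' AND
       COLUMN_LABEL = 'Global Cache Blocks Lost' AND
       TARGET_NAME = '&INSTANCE_NAME'
GROUP BY
       TRUNC(COLLECTION_TIMESTAMP)
ORDER BY 1;
```

COUNT(*) is used instead of COUNT(COLLECTION_TIMESTAMP); the two differ only if the timestamp can be NULL.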

SET PAGES 999
SET LINES 300
COL MESSAGE FOR A60

SELECT TO_CHAR(COLLECTION_TIMESTAMP, 'HH24:MI DD-MM-YYYY') ALERTED_AT,
       MESSAGE 
FROM
       MGMT_VIEW.MGMT$ALERT_NOTIF_LOG 
WHERE
      TO_DATE(COLLECTION_TIMESTAMP, 'DD-MM-YYYY')=TO_DATE('&DDMMYYYY', 'DD-MM-YYYY') AND 
      TARGET_NAME = '&INSTANCE_NAME'
ORDER BY COLLECTION_TIMESTAMP;
Enter value for ddmmyyyy: 20-AUG-20
old   7:       TO_DATE(COLLECTION_TIMESTAMP, 'DD-MM-YYYY')=TO_DATE('&DDMMYYYY', 'DD-MM-YYYY') AND
new   7:       TO_DATE(COLLECTION_TIMESTAMP, 'DD-MM-YYYY')=TO_DATE('20-AUG-20', 'DD-MM-YYYY') AND
Enter value for instance_name: GPPRD_GPPRD1
old   8:       TARGET_NAME = '&INSTANCE_NAME'
new   8:       TARGET_NAME = 'GPPRD_GPPRD1'

ALERTED_AT       MESSAGE
---------------- ------------------------------------------------------------
00:00 20-08-2020 Total global cache block lost is 15.
00:00 20-08-2020 Total global cache block lost is 15.
00:05 20-08-2020 Total global cache block lost is 15.

...

23:50 20-08-2020 Total global cache block lost is 15.
23:50 20-08-2020 Total global cache block lost is 15.
23:55 20-08-2020 Total global cache block lost is 15.
23:55 20-08-2020 Total global cache block lost is 15.

202 rows selected.

 
In this example the reported value stays at 15 for the whole day (20-AUG-20, between 00:00 and 23:55), so no blocks were lost that day, yet I received alerts throughout the day.
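To confirm on the target database that the counter really did not move, the AWR history can be checked. The query below is a sketch under these assumptions: AWR snapshots are being taken, the Diagnostics Pack is licensed, and a look-back window of one day is enough. The alias names are mine.

```sql
-- Sketch: growth of 'gc blocks lost' between consecutive AWR snapshots.
-- LOST_IN_INTERVAL is NULL for the first snapshot in the window
-- and for the first snapshot after an instance restart.
SELECT s.instance_number,
       TO_CHAR(s.end_interval_time, 'HH24:MI DD-MM-YYYY') AS snap_end,
       st.value AS cumulative_lost,
       st.value - LAG(st.value) OVER (
           PARTITION BY st.dbid, st.instance_number, s.startup_time
           ORDER BY st.snap_id) AS lost_in_interval
FROM
       dba_hist_sysstat st
       JOIN dba_hist_snapshot s
         ON  s.snap_id = st.snap_id
         AND s.dbid = st.dbid
         AND s.instance_number = st.instance_number
WHERE
       st.stat_name = 'gc blocks lost' AND
       s.end_interval_time >= SYSDATE - 1
ORDER BY s.instance_number, st.snap_id;
```

If LOST_IN_INTERVAL is 0 in every row, no blocks were lost in that period, whatever the cumulative value says.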

 
 
SOLUTION

 
First, this note is not about how to troubleshoot and resolve a lost block issue. For that, follow Doc ID 563566.1 and Doc ID 2296681.1. This note explains why OEM keeps sending the alerts even when no blocks have been lost for hours, days, or weeks.

The Global Cache Blocks Lost alert is based on the Global Cache Blocks Lost metric in Enterprise Manager.

By default the metric has the following thresholds: 1 lost block for WARNING and 3 lost blocks for CRITICAL. To find the number of lost blocks, the metric uses the gc blocks lost statistic from the V$SYSSTAT (GV$SYSSTAT) view of the target instance.

SET PAGES 999
SET LINES 300
COL NAME FOR A20
COL VALUE FOR 999999
SELECT NAME, VALUE FROM V$SYSSTAT WHERE NAME='gc blocks lost';
NAME                   VALUE
-------------------- -------
gc blocks lost            15

 
NOTE: V$SYSSTAT is defined on top of GV$SYSSTAT, filtered to the current instance:

SET LINES 300
SET PAGES 999
COL VIEW_DEFINITION FOR A50 WORD_WRAPPED
SELECT VIEW_DEFINITION 
FROM
       V$FIXED_VIEW_DEFINITION
WHERE
       VIEW_NAME='V$SYSSTAT';

VIEW_DEFINITION
--------------------------------------------------
select  STATISTIC# , NAME , CLASS , VALUE,
STAT_ID, CON_ID from GV$SYSSTAT where inst_id =
USERENV('Instance')

 
The V$SYSSTAT view accumulates the number of lost blocks (statistic name gc blocks lost) since instance startup. The value can therefore only increase, and it is not reset until the next instance restart.
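To see how long the counter has been accumulating, the statistic can be shown next to the instance startup time. This is a minimal sketch to run on the target instance; the column aliases are mine.

```sql
-- Sketch: cumulative 'gc blocks lost' alongside the instance uptime.
SELECT i.instance_name,
       TO_CHAR(i.startup_time, 'HH24:MI DD-MM-YYYY') AS started_at,
       ROUND(SYSDATE - i.startup_time, 1) AS days_up,
       s.value AS gc_blocks_lost
FROM
       v$instance i
       CROSS JOIN v$sysstat s
WHERE
       s.name = 'gc blocks lost';
```

A small value spread over a long uptime suggests the blocks were lost long ago and are simply still being counted.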

Once the number of lost blocks (gc blocks lost) exceeds a threshold of the Global Cache Blocks Lost metric (1 block for warning, 3 blocks for critical), OEM starts sending alerts. Because the statistic cannot drop back below the threshold until the next instance restart, the metric stays in violation from then on.

That is why OEM keeps sending Global Cache Blocks Lost alerts every 5 minutes once a threshold has been exceeded.
Even if no blocks are lost for a long time (hours, days, weeks), you will still receive the Global Cache Blocks Lost messages.

If you are sure the cluster instance has no problem with losing blocks (gc blocks lost), disable the Global Cache Blocks Lost metric to stop it spamming your mailbox. To do so, clear the warning and critical thresholds for the metric.

 
 
REFERENCES
 

Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
WAITEVENT: “gc current/cr block lost” Reference Note (Doc ID 2296681.1)
Tuning Inter-Instance Performance in RAC and OPS (Doc ID 181489.1)

EM 13c: How to disable “Global Cache Blocks Lost Metric” Using EMCLI (Doc ID 2543134.1)
False increase of ‘Global Cache Blocks Lost’ or ‘gc blocks lost’ after upgrade to 12c (Doc ID 2096299.1)

 
NOTE: You can find the ratio of lost blocks to served blocks with the following query against the target instance:

SET PAGES 999
SET LINES 300
COL RATIO FOR 99999999

SELECT A.INST_ID "INSTANCE",
       A.VALUE "GC BLOCKS LOST",
       B.VALUE "GC CUR BLOCKS SERVED",
       C.VALUE "GC CR BLOCKS SERVED",
       A.VALUE/(B.VALUE+C.VALUE) RATIO
FROM
       GV$SYSSTAT A, 
       GV$SYSSTAT B,
       GV$SYSSTAT C
WHERE
       A.NAME='gc blocks lost' AND
       B.NAME='gc current blocks served' AND
       C.NAME='gc cr blocks served' AND
       B.INST_ID = A.INST_ID AND
       C.INST_ID = A.INST_ID;

  INSTANCE GC BLOCKS LOST GC CUR BLOCKS SERVED GC CR BLOCKS SERVED     RATIO
---------- -------------- -------------------- ------------------- ---------
         1             15             32576274               42979         0
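The RATIO column prints 0 because the column format 99999999 has no decimal places and the ratio is far below 1. A variant that reports lost blocks per million served blocks is sketched below. It reads GV$SYSSTAT once instead of joining it three times and guards against division by zero; the alias names are mine. For the figures above it would come to roughly 0.46 lost blocks per million served.

```sql
-- Sketch: lost blocks per million served blocks, per instance.
SELECT inst_id AS "INSTANCE",
       lost AS "GC BLOCKS LOST",
       cur_served AS "GC CUR BLOCKS SERVED",
       cr_served AS "GC CR BLOCKS SERVED",
       -- NULLIF avoids ORA-01476 when nothing has been served yet
       ROUND(lost * 1000000 / NULLIF(cur_served + cr_served, 0), 2)
           AS lost_per_million
FROM (
       -- One pass over GV$SYSSTAT, pivoting the three statistics into columns
       SELECT inst_id,
              MAX(CASE WHEN name = 'gc blocks lost' THEN value END) AS lost,
              MAX(CASE WHEN name = 'gc current blocks served' THEN value END) AS cur_served,
              MAX(CASE WHEN name = 'gc cr blocks served' THEN value END) AS cr_served
       FROM
              gv$sysstat
       WHERE
              name IN ('gc blocks lost',
                       'gc current blocks served',
                       'gc cr blocks served')
       GROUP BY inst_id
     )
ORDER BY inst_id;
```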

 
 

Version  : 20:53 23.08.2020
Database : 12.1.0.2, 12.2.0.1.0
OEM      : 13.3.0.0.0