AUDIT_TRAIL = DB, Portal and Grid Control

If your not familiar with Oracle’s auditing features then a good place to start is with Oracle’s Documentation.   I personally think Oracle’s documentation, especially for the database products is great.

The above documentation linked to above describes what auditing is, why you would need it, etc.   There’s no need for me to repeat it here.

Our environment is pretty small, with a limited number of people who have access to the databases, etc.   So normally I have the audit trail disabled, however for this particular database we did need to increase logging to investigate some issues.  After we were finished, we used NOAUDIT to disable the extra logging we enabled.   However we left AUDIT_TRAIL set to DB.

Move to a few months later and on a routine scan of database performance I noticed a query consuming a fair amount of resources. 

SQL Details: gh9pd08vhptgr
SELECT TO_CHAR(current_timestamp AT TIME ZONE 'GMT', 'YYYY-MM-DD HH24:MI:SS TZD') AS curr_timestamp, COUNT(username) AS failed_count
FROM sys.dba_audit_session
WHERE returncode != 0 AND timestamp >= current_timestamp - TO_DSINTERVAL('0 0:30:00')


By looking at the screenshot above it seems pretty nasty but in reality it was just a small spike on the Grid Control Top Activity chart.    As I mentioned, our environment is small and our servers have more than enough horsepower, so it didn’t have a significant impact on performance. 

From Metalink note: Slow Performance Of DBA_AUDIT_SESSION Query From EM [ID 829103.1]  “There is a known performance issue with DBA_AUDIT_SESSION table per non-published Bug 7633167”

However with AUDIT_TRAIL=DB, LOGON and LOGOFF’s were still be recorded.  Since this database hosts our Portal schema it mean upwards of 200k of entries going into the sys.aud$ daily, for a grand total of 32 million rows!

The note provides two options for the query above, the first is to disable the Failed Login Count Metric, the second is to purge the sys.aud$ table.  Note 73408.1 How to Truncate, Delete, or Purge Rows from the Audit Trail Table SYS.AUD$

Since at the moment we don’t require auditing to be enabled, then I am simply going to truncate the sys.aud$ table and disable auditing by setting the database initialization parameter AUDIT_TRAIL=none and restarting the database at the next maintenance window.

If you do require auditing then you should setup a purging strategy.  If you need to do this, some good blog articles to read are:

Receiving Clear alerts after a Blackout–Grid Control

Grid Control allows you to set blackouts for targets so that while your performing maintenance you won’t get notified    You can also use them to disable notifications while your working on an issue.  Nothing worse than being paged multiple times while your trying to fix an issue. 

One of our applications has a component which requires a quick nightly bounce schedule via cron.   So I setup a blackout in Grid Control to start 5 minutes before and extend to 5 minutes after the restart.   However, at 3am I received a lovely page letting me know that an alert has cleared:

Subject: EM Alert: Clear:MyApp PROD - Test MyApp Login Page is now up

Target Name=MyApp
Target type=Web Application
Occurred At=Mar 5, 2012 3:15:00 AM EST
Message=Test MyApp Login Page is now up: MyApp Login Page has status 6 since 03/05/12 03:15:00 till 03/05/12 03:15:00 in America/New_York. Beacon RCPSC Status: 1 from 02/28/12 09:54:31 till 03/05/12 03:17:13 in -05:00. No new severities found after the blackout Metric data found after blackout, using the latest severity Latest severity from the beacon is 15 at 02/29/12 21:13:04 Beacon votes up Beacon Tenzing Status: 1 from 02/28/12 10:20:19 till 03/05/12 03:16:48 in America/New_York. No new severities found after the blackout Metric data found after blackout, using the latest severity Latest severity from the beacon is 15 at 02/29/12 19:26:14 Beacon votes up The final status of the test is UP from 03/05/12 03:15:00 till 03/05/12 03:17:13
Metric=[Test Response] Status

Strange.    Initially I thought that there may have been a system time issue between the servers, or that the restart had taken longer than I expected.   Looking into it tho that was not the case. A search on metalink turns up that there is a bug:

Bug 10210193 WEB APPS SERVICE GENERATE ERROR WHEN BLACKOUT START the Notification rule used has the metric [Test Response] Status which is generating the notification.

The solution is to remove the remove the [Test Response] metric from the rule.    No sleep interruptions the following night!