Thursday

ORA-01555 and Automatic Undo

In the past couple of years of support Oracle E-Business Suite I have only hit ORA-01555: snapshot too old twice... About a year ago and again today.

Before you start to troubleshoot you need to gather some information:
  • If your unaware of the current settings for undo you should check. One way is to use the show parameter command but you can also check the database initialization parameter file or spfile.


    SQL> show parameter undo

    NAME TYPE VALUE
    ------------------------------------ ----------- -------------
    undo_management string AUTO
    undo_retention integer 1200
    undo_suppress_errors boolean FALSE
    undo_tablespace string APPS_UNDOTS1


  • Determine which time the error occurred from the alert log, which will help you narrow down the time for a query we will use later. You alert log should have an entry similiar to:

    Sat Aug 16 23:05:41 2008
    ORA-01555 caused by SQL statement below (Query Duration=1506 sec, SCN: 0x0000.a8d6d835):
    Sat Aug 16 23:05:41 2008
    SELECT X FROM Y WHERE A=B

Now that you the automatic undo settings and the time of the error execute the following query (substitute the timestamp of the error in your alert log in place of the one below, with some buffer at either end):

SQL> select to_char(begin_time,'MM/DD/YYYY HH24:MI') begin_time, UNDOBLKS, UNXPSTEALCNT, EXPSTEALCNT ,
SSOLDERRCNT, NOSPACEERRCNT, MAXQUERYLEN
from v$undostat
where begin_time between to_date('08/16/2008 22:30:00','MM/DD/YYYY HH24:MI:SS')
and to_date('08/16/2008 23:30:00','MM/DD/YYYY HH24:MI:SS')
order by begin_time;


BEGIN_TIME UNDOBLKS UNXPSTEALCNT EXPSTEALCNT SSOLDERRCNT NOSPACEERRCNT MAXQUERYLEN
---------------- -------- ------------ ----------- ----------- ------------- -----------
08/16/2008 22:32 10394 0 0 0 0 513
08/16/2008 22:42 5693 0 0 0 0 212
08/16/2008 22:52 10853 0 0 0 0 451
08/16/2008 23:02 8950 0 0 0 0 1653
08/16/2008 23:05 11284 0 0 1 0 1506

5 rows selected.


Based on this query we can determine a couple of things such as the number of transactions which required UNDO during the time period. How many undo blocks they required and whether there were any (successful) atempts to steal undo space from other transactions.

The query also shows which transactions received ora-1555 errors (SSOLDERRCNT) and whether or not there was enough space in the UNDO tablespace (NOSPACEERRCNT). If you look at the row above with a timestamp of 23:05, this is the transaction which produced my ora-1555 error. We know this because the SSOLDERRCNT column for this row has the value 1 and the timestamp is the same as the one in the log file. Since the MAXQUERYLEN for this transaction (1506) is greated than the undo_retention (1200) one solution would be to increase undo_retention.*

*There are other ways to solve this problem to try and avoid ora-1555 errors such as transaction tuning, etc. More detail can be found in Note:10630.1

For this particular case I will set it to 2000 and monitor the undo tablespace. By increasing undo_retention undo is kept for a longer period of time, so you should monitor the undo tablespace to make sure you have enough room. Oracle 9i+ has advisors which can help you determine the impact of raising undo_retention. The following screen shot is taken from Oracle 9i Enterprise Manager but you can also find better graphs in 10G Database or Grid Control:




*Note: this graph is not from same database as above. Its just for informational purposes.

We can determine from this graph if we were to double undo retention from 30 to 60 minutes we would need at most 70MB of space. This estimate is based on undo activity since the database was initially started.

If the UNDO tablespace was too small, you would see a value in the NOSPACEERRCNT column. Take a look at the following row with a timestamp of 23:22:


BEGIN_TIME UNDOBLKS UNXPSTEALCNT EXPSTEALCNT SSOLDERRCNT NOSPACEERRCNT MAXQUERYLEN
---------------- -------- ------------ ----------- ----------- ------------- -----------
08/16/2008 23:22 1034 10 0 1 1 1428

This row has a values for UNXPSTEALCNT and NOSPACEERRCNT. This means the transaction was not able to obtain space from the UNDO tablespace and attempted to steal 10 blocks (UNXPSTEALCNT) from other transactions. In this case we need to add space to the UNDO tablespace because it was not large enough to meet the undo_retention requirement.

No comments: