I’m in the process of studying for the OCP exam and I personally find it much easier to remember something if I write it down. What better way to do this than creating a blog post? Unfortunately its some pretty basic stuff so it might not garner much interest.
My test environment has the following redo log configuration:
1 select group#, sequence#, bytes/1024/1024 as "Size (MB)", members, status
2* from v$log
L> /
GROUP# SEQUENCE# Size (MB) MEMBERS STATUS
-------- ---------- ---------- ---------- ----------------
1 5 50 2 INACTIVE
2 6 50 2 CURRENT
3 4 50 2 INACTIVE
SQL> l
1 select group#, member
2 from v$logfile
3* order by 1
SQL> /
GROUP# MEMBER
---------- ----------------------------------------
1 /home/oracle/oradata/orcl/redo01.log
1 /home/oracle/oradata/orcl/redo01a.log
2 /home/oracle/oradata/orcl/redo02a.log
2 /home/oracle/oradata/orcl/redo02.log
3 /home/oracle/oradata/orcl/redo03a.log
3 /home/oracle/oradata/orcl/redo03.log
As you can see, there are 3 redo log groups with 2 members each. In order to simulate a failure I am going to remove one of the members.
SQL> !rm /home/oracle/oradata/orcl/redo01a.log
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
Oracle will not detect a problem until it tries to use the missing redo log file so I had to issue two logfile switches before redo01a.log was accessed. At that point the following errors show up in the alert.log.
Errors in file /home/oracle/product/10.2.0/db_1/admin/orcl/bdump/orcl_lgwr_2648.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/home/oracle/oradata/orcl/redo01a.log'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
Thu Jan 15 15:40:44 2009
Errors in file /home/oracle/product/10.2.0/db_1/admin/orcl/bdump/orcl_lgwr_2648.trc:
ORA-00321: log 1 of thread 1, cannot update log file header
ORA-00312: online log 1 thread 1: '/home/oracle/oradata/orcl/redo01a.log'
Thu Jan 15 15:40:45 2009
Errors in file /home/oracle/product/10.2.0/db_1/admin/orcl/bdump/orcl_lgwr_2648.trc:
ORA-00313: open failed for members of log group 1 of thread 1
As well, if we take another look at v$logfile, the missing redo log will have a status of INVALID:
SQL>
GROUP# MEMBER STATUS
---------- ---------------------------------------- -------
3 /home/oracle/oradata/orcl/redo03.log
2 /home/oracle/oradata/orcl/redo02.log
1 /home/oracle/oradata/orcl/redo01.log
1 /home/oracle/oradata/orcl/redo01a.log INVALID
2 /home/oracle/oradata/orcl/redo02a.log
3 /home/oracle/oradata/orcl/redo03a.log
In this particular case, the loss of redo01a.log file is not critical. Since the group is multiplexed (more than 1 member) Oracle can operate without impacting users. Each time Oracle tries to use the missing redo log file the errors will be displayed again in alert log.
There are two methods in order to recover form this scenario:
- Issue the CLEAR LOGFILE command.
- Drop the missing member and add a new one.
Which one should you choose? That depends on the circumstances surrounding why the redo log is missing. The CLEAR LOGFILE command reinitializes the logfile group, similiar to dropping and adding each member. If the failure is due to a corrupted disk, then this command will fail as it recreates the group in place. If this is the case, you will need to use the second method of dropping and adding a new member so you can specify a new location.
Lets try issuing the CLEAR LOGFILE command:
SQL> alter database clear logfile group 1;
Database altered.
Thats it, everything is back to normal.
Lets try the other method of dropping and adding a new member:
SQL> !rm /home/oracle/oradata/orcl/redo01a.log
SQL> alter database drop logfile member '/home/oracle/oradata/orcl/redo01a.log';
Database altered.
SQL> alter database add logfile member '/home/oracle/oradata/orcl/redo01a.log' to group 1;
Database altered.
So, what happens if you lose all members of a redo log group? The steps you have to take are dependent on the groups status and whether or not the database is in archivelog mode.
INACTIVE:
- If the affected redo log group has a status of INACTIVE it is no longer required for crash recovery. If you are in NOARCHIVELOG mode issue either CLEAR LOGFILE or recreate the group manually.
- If you are in ARCHIVELOG mode and the group has been archived, issue either of the steps above.
- If you are in ARCHIVELOG mode and the group hasn’t been archived then issue CLEAR UNARCHIVED LOGFILE. If you don’t specify the UNARCHIVED keyword you will receive an error. After you execute this command, you will have a gap in your archivelogs, so perform a complete backup (including the control file) ASAP.
ACTIVE:
- If the redo log group has a status of ACTIVE, it is still required for crash recovery. Issue the command
ALTER
SYSTEM
CHECKPOINT
, If successful then follow the steps above for INACTIVE.
- If the checkpoint fails, then you need to perform recovery. If you are in NOARCHIVELOG mode then you need to perform a complete recovery of the last cold backup.
- If the checkpoint fails and you are in ARCHIVELOG mode then you can perform an INCOMPLETE recovery up to the previous log file.
CURRENT:
The current redo log group is the one which Oracle is currently writing to. If you lose this group or files become corrupted, LGWR may terminate. For example:
Corrupted files:
Fri Jan 30 14:14:55 2009
Errors in file /apps/oravis/db/tech_st/10.2.0/admin/test/bdump/test_lgwr_25599.trc:
ORA-00316: log 1 of thread 1, type 8944 in header is not log file
ORA-00312: online log 1 thread 1: '/oradata/dbf2/testdata/test/redo01.log'
Fri Jan 30 14:14:55 2009
LGWR: terminating instance due to error 316
Instance terminated by LGWR, pid = 25599
Missing files:
Fri Jan 30 15:04:34 2009
Errors in file /apps/oravis/db/tech_st/10.2.0/admin/test/bdump/test_lgwr_1434.trc:
ORA-00320: cannot read file header from log 1 of thread 1
ORA-00312: online log 1 thread 1: '/oradata/dbf2/testdata/test/redo01.log'
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
SVR4 Error: 11: Resource temporarily unavailable
Additional information: 4
Additional information: 1
LGWR: terminating instance due to error 320
Instance terminated by LGWR, pid = 1434
- If LGWR terminates you have to recover the database:
- If you are in ARCHIVELOG mode, perform an incomplete recovery up to the previous log file.
- If you are in NOARCHIVELOG, perform a complete recovery of the last cold backup.
- If the database is still online, you can try and clear the logfile group. If the command is not successful then you will need to recover.