Thursday

Refreshing an 11i Database using Rman

In order to facilitate troubleshooting we maintain a test environment which is a nightly copy of our 11i production environment. Since this environment is usually used to test data fixes it has to be as up to date as possible. To perform the database refresh we use rman's duplicate feature.

The goal of this article isn't just to provide the entire set of scripts and send you on your way. I think its safe to say that most EBS environments aren't identical, so its not like you could take them and execute with no issues. Instead i'll highlight the steps we follow and some of the key scripts.

NOTE: This doesn't include any pre-setup steps such as, if this is the first time duplicating the database make sure you have the parameters db_file_name_convert and log_file_name_convert specified in your test environments init file.

  • Step 1: Shutdown the test environment. If you are using 10g then remove any tempfiles. In 10g, rman now includes tempfile information and if they exist you will encounter errors. Check this previous post. Startup the database in nomount mode.
  • Step 2: Build a Rman Script. There are a couple of ways to recover to a point in time and we have decided to use SCN numbers. Since this process needs to be automated, we query productions rman catalog and determine the proper SCN to use and build an rman script. Here it is:

    set feedback off
    set echo off
    set serverout on
    spool $HOME/scripts/prod_to_vis.sql
    declare
    vmax_fuzzy number;
    vmax_ckp number;
    scn number;
    db_name varchar2(3) := 'VIS';
    log_file_dest1 varchar2(30) := '/dbf/visdata/';
    begin
    select max(absolute_fuzzy_change#)+1,
    max(checkpoint_change#)+1
    into vmax_fuzzy, vmax_ckp
    from rc_backup_datafile;
    if vmax_fuzzy > vmax_ckp then
    scn := vmax_fuzzy;
    else
    scn := vmax_ckp;
    end if;
    dbms_output.put_line('run {');
    dbms_output.put_line('set until scn '||to_char(scn)||';');
    dbms_output.put_line('allocate auxiliary channel ch1 type disk;');
    dbms_output.put_line('allocate auxiliary channel ch2 type disk;');
    dbms_output.put_line('duplicate target database to '||db_name);
    dbms_output.put_line('logfile group 1 ('||chr(39)||log_file_dest1||'log01a.dbf'||chr(39)||',');
    dbms_output.put_line(chr(39)||log_file_dest1||'log01b.dbf'||chr(39)||') size 10m,');
    dbms_output.put_line('group 2 ('||chr(39)||log_file_dest1||'log02a.dbf'||chr(39)||',');
    dbms_output.put_line(chr(39)||log_file_dest1||'log02b.dbf'||chr(39)||') size 10m;}');
    dbms_output.put_line('exit;');
    end;
    /
    spool off;

    This script produces a spool file called, prod_to_vis.sql:

    run {
    set until scn 14085390202;
    allocate auxiliary channel ch1 type disk;
    allocate auxiliary channel ch2 type disk;
    duplicate target database to VIS
    logfile group 1 ('/dbf/visdata/log01a.dbf',
    '/dbf/visdata/log01b.dbf') size 10m,
    group 2 ('/oradata/dbf/visdata/log02a.dbf',
    '/dbf/visdata/log02b.dbf') size 10m;}
    exit;
    Note: Our production nightly backups are on disk which are NFS mounted to our test server.

  • Step 3: Execute the rman script. Launch rman, connect to the target, catalog, auxiliary and execute the script above:

    ie.
    rman> connect target sys/syspasswd@PROD catalog rmancat/catpasswd@REPO auxiliary /


    You may want to put some error checking around rman to alert you if it fails. We have a wrapper script which supplies the connection information and calls the rman script above. Our refresh is critical so if it fails we need to be paged.

    rman @$SCRIPTS/prod_to_vis.sql
    if [ $? != 0 ]
    then
    echo Failed
    echo "RMAN Dupcliate Failed!"|mailx -s "Test refresh failed" pageremail@mycompany.com
    exit 1
    fi


  • Step 4: If production is in archivelog mode but test isn't, then mount the database and alter database noarchivelog;
  • Step 5: If you are using a hotbackup for cloning then you need to execute adupdlib.sql. This updates libraries with correct OS paths. (Appendix B of Note:230672.1)
  • Step 6: Change passwords. For database accounts such as sys, system and other non-applications accounts change the passwords using alter user. For applications accounts such as apps/applsys, modules, sysadmin, etc use FNDCPASS to change their passwords.

    ie. To change the apps password:

    FNDCPASS apps/<production appspassword=> 0 Y system/<system_passwd> SYSTEM applsys <new apps passwd>

  • Step 7: Run autoconfig.
  • Step 8: Drop any database links that aren't required in the test environment, or repoint them to the proper test environments.
  • Step 9: Follow Section 3: Finishing Tasks of Note:230672.1
    • Update any profile options which have still reference the production instance.

      Example:

      UPDATE FND_PROFILE_OPTION_VALUES SET
      profile_option_value = REPLACE(profile_option_value,'PROD','TEST')
      WHERE profile_option_value like '%PROD%


      Specifically check the FND_PROFILE_OPTION_VALUES, ICX_PARAMETERS, WF_NOTIFICATION_ATTRIBUTES and WF_RESOURCES tables and look for production hostnames and ports. We also update the forms title bar with the date the environment was refreshed:

      UPDATE apps.FND_PROFILE_OPTION_VALUES SET
      profile_option_value = 'TEST:'||' Refreshed from '||'Production: '||SYSDATE
      WHERE profile_option_id = 125
      ;

    • Cancel Concurrent requests. We don't need concurrent requests which are scheduled in production to keep running in test. We use the following update to cancel them. Also, we change the number of processes for the standard manager.

      update fnd_concurrent_requests
      set phase_code='C',
      status_code='D'
      where phase_code = 'P'
      and concurrent_program_id not in (
      select concurrent_program_id
      from fnd_concurrent_programs_tl
      where user_concurrent_program_name like '%Synchronize%tables%'
      or user_concurrent_program_name like '%Workflow%Back%'
      or user_concurrent_program_name like '%Sync%responsibility%role%'
      or user_concurrent_program_name like '%Workflow%Directory%')
      and (status_code = 'I' OR status_code = 'Q');

      update FND_CONCURRENT_QUEUE_SIZE
      set min_processes = 4
      where concurrent_queue_id = 0;


  • Step 10: Perform any custom/environment specific steps. We have some custom modules which required some modifications as part of cloning.
  • Step 11: Startup all of the application processes. ($S_TOP/adstrtal.sh)

    NOTE: If you have an application tier you may have to run autoconfig before starting up the services.

Hopefully this article was of some use even tho it was pretty vague at times. If you have any questions feel free to ask. Any corrections or better methods don't hesitate to leave a comment either.

How to Determine your Forms Patchset Level

There are supposedly two ways to determine your forms patchset level. The first method is to launch a form and click on Help -> About Oracle Applications. According to Note:74647.1 you should see it in the window that pops up. Maybe its just me, but I don't see it there.

Another method is to view the output by f60gen. Note:283985.1 provides a script which will determine the patchset level for you. You don't really need a script for it tho.

Execute f60gen and look at the first line:

oravisf@myserver=> f60gen
Forms 6.0 (Form Compiler) Version 6.0.8.27.0 (Production)



Subtract 9 from 27 and you'll have your patchset level. So I am on patchset level 18.

Friday

Part 4: (TAF) Transparent Application Failover and testing.


TAF, Transparent Application Failover is the ability for a connection to failover and reconnect to a surviving node. Transactions are rolled back but TAF can be configured to resume SELECT operations. Before you can use TAF you have to create a Service. Initially services were used for connection load balancing but over the years has developed into a method to distribute and manage workload across database instances. For more information on services see Oracles documentation:

http://www.oracle.com/technology/obe/10gr2_db_vmware/ha/rac/rac.htm

There are a few ways to create a service. You can use Grid Control, srvctl or dbca. For this exercise I am going to use DBCA.

Launch DBCA. Select Oracle Real Application Clusters database and click on next:

clip_image002[1]

Select Services Management and click on Next:

clip_image004[1]

RACDB should be selected, so click the Next button:

clip_image006[1]

On the following screen click on the Add button. When prompted for a service name enter RAC and click on the OK button:

clip_image008[1]

On the following screen verify both RAC instances (RACDB1, RACDB2) have the preferred option selected. Preferred means that the service will run on that instance when the environment first starts. An instance flagged as available means it will not run the service unless the preferred instances fail. Since this is a RAC environment we want users to be distributed between both nodes, thus they need to have the preferred option set.

The TAF policy is the failover settings. Select Basic. Basic means that the session is reconnected when a node fails. Preconnect means that a shadow process is created on another node in the cluster, to reduce time if there is a failure.

clip_image010[1]

Once the changes above have been completed, click on the finish button. A popup will appear prompting you to configure the services. Select OK.

clip_image012[1]
clip_image014[1]

Once the service configuration has completed, you will be asked if you’d like to perform another operation. After you select No, dbca will close.

clip_image016[1]

A check of the $ORACLE_HOME/network/admin/tnsnames.ora file will show the following new entry:

clip_image018[1]

Lets break down this new entry and highlight a few key lines:
Line 5: (LOAD_BALANCE = yes)
This line indicates that Oracle will randomly select one of the addresses defined on the previously two lines and connect to that nodes listener. This is called client-side connect-time load balancing.

Lines 9 -13: These lines are the TAF settings.

Line 10: (TYPE = SELECT)
There are two types of failover. SELECT indicates that the session will be authenticated on a surviving node but as well SELECT statements will be re-executed. Rows already returned to the client are ignored.
The other option is SESSION. In a SESSION failover, only the users session is re-authenticated.
Line 11: (METHOD = BASIC)
BASIC means that the session will not reconnect to a surviving node until the failover occurs.
PRECONNECT means a shadow process is created on a backup instance to reduce failover time. There are some additional considerations when choosing this setting so be sure to read up on it.

Line 12 and 13: (RETRIES = 180)
(DELAY = 5)
Self-explanatory, the maximum number of retries to failover and the amount of time in seconds to wait between attempts.

Testing TAF


Session 1: Login to the database via the service created above:

clip_image020[1]

Session 2: Login as sysdba and query gv$session to determine which instance Scott is connected to:

clip_image022[1]

Scott is connected to instance 2 (which resides on raclinux2), so lets shutdown that instance and see what happens. As sysdba, connect to RACDB2 and shutdown immediate. Once the instance has shutdown re-execute the query above to see which instance Scott is now connected to:

clip_image024[1]

Why hasn’t the session failed over? Because when node 2 was shutdown scott wasn’t executing a query. Re-execute the select count(*) from emp statement, then query gv$session again:

clip_image026[1]

If a session is inactive, it will not failover until another statement is issued.