Monday, September 30, 2013

SOA Upgrade 11.1.1.3 -> 11.1.1.6

We upgraded our SOA install from 11.1.1.3 to 11.1.1.6 and hit a few issues. One of the main ones was that after upgrading DEV, our in-flight processes disappeared. We went back and forth with Oracle: carrying in-flight processes across this upgrade was supposed to be supported, but for some reason it wasn't working for us. (I know that in-flight processes are not supported as part of the upgrade from SOA 10g to SOA 11g.)

One of the ways we use BPEL is as the workflow engine for a custom application. A workflow can take months to complete, so there are always in-flight processes. There is never a quiet time, or a point at which all workflows are complete, in which to do the upgrade.

We put a plan in place to manually re-place all items back in the proper stage. It looked like a fair amount of effort, so for DEV we only re-placed a subset of items to make sure we had the process correct.

However, once we upgraded TEST, the in-flight processes somehow magically survived. We aren't sure why. There could be some small inconsistencies between DEV and TEST that caused it. Another reason could be that in DEV we upgraded to 11.1.1.6 first, then at a later point upgraded to 11.1.1.6.6, whereas in TEST we went directly to 11.1.1.6.6.

The SOA upgrade itself isn't a very resilient process. If the PSA (Patch Set Assistant) gets interrupted for some reason, it's not smart enough to recover. In DEV it wasn't a big issue: I had an export of the MDS and SOAINFRA schemas, so I just dropped and recreated them. For PROD, though, the amount of data, and thus the downtime, would prevent us from doing this. We only hit an issue once in DEV and TEST was smooth, so for PROD we all had our fingers crossed.

The PROD install was going smoothly until I had to run the PSA.     Shortly after I started the schema upgrade it failed and a dreaded error appeared.  In the $FMW_HOME/oracle_common/upgrade/logs directory I found:


[2013-08-21T19:28:09.239-04:00] [SOA] [ERROR] [] [upgrade.SOA.SOA1] [tid: 13] [ecid: 0000K2ZD5BJ9lZWpPws1yd1I5Klw000004,0] [[
oracle.sysman.assistants.common.dbutil.SQLFatalErrorException: java.sql.SQLException: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

The problem was a database session holding a lock in the SOAINFRA schema. We hit this problem in DEV and the solution was to shut down the applications that access SOAINFRA tables and change the schema password. This time, however, I didn't notice a hung database session held by one of our custom applications.

If you try to run the PSA again it says the SOAINFRA schema isn't valid and won't let you continue with the upgrade.   So I tried manually updating the registry to state the schema is valid.

update schema_version_registry set status='VALID' where mr_name='SOAINFRA';

Great!  The installer started up again but it quickly failed.  Checking the logs:

2013-08-21 19:36:19.378 rcu:Extracted SQL Statement: [ALTER TABLE BPM_CUBE_PROCESS ADD (SubType VARCHAR2(200), DeploymentInfo BLOB)]
2013-08-21 19:36:19.378 rcu:Statement Type: 'DDL Statement'
JDBC SQLException - ErrorCode: 1430SQLState:72000 Message: ORA-01430: column being added already exists in table

Since this was production, I opened a P1 SR with Oracle right away. In these situations I try to get Oracle involved as quickly as possible while I continue to do research on my end. Sometimes I find the solution quicker, sometimes Oracle Support does.

This time I found the following note:

Patch Set Assistant Failed for SOA 11.1.1.6.0 when Patch 13606871 is Applied on SOA 11.1.1.5.0 (Doc ID 1517404.1)

It provided a set of SQL statements to help roll back the schema upgrade. The only problem was that my upgrade was from 11.1.1.3, not 11.1.1.5. Knowing that Oracle now supported manually rolling back the upgrade, I looked at the script that was failing:

/u01/app/oracle/product/fmw11g_SOA/SOAHome_1/rcu/integration/soainfra//sql/upgrade_soainfra_111130_111140_oracle.tsql

After some trial and error I managed to write a script to undo all the changes. Unfortunately, if you missed one, which was easy to do as the script is a few thousand lines long, you had to start over. Finally I managed to find all the updated objects and continued with the upgrade.
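
Building the list of objects touched by the upgrade script is the error-prone part. Here is a minimal sketch of how the ALTER TABLE ... ADD statements could be pulled out of such a script, assuming each statement starts on its own line in the form shown in the PSA log above. The function name list_alter_add is made up for illustration, and the real script will also contain other kinds of changes (new tables, indexes and so on) that this does not catch.

```shell
#!/bin/sh
# Hypothetical helper: print "line-number<TAB>TABLE" for every line of an
# upgrade script that begins with ALTER TABLE <name> ADD ...
# Assumption: one statement per line, as in the PSA log excerpt.
list_alter_add() {
    awk 'toupper($1) == "ALTER" && toupper($2) == "TABLE" && toupper($4) ~ /^ADD/ {
        printf "%d\t%s\n", NR, toupper($3)
    }' "$1"
}
```

Running list_alter_add against the failing .tsql file gives one row per statement, which can be used as a checklist while writing the undo script.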

Another upgrade issue I encountered showed itself in the startup logs for soa_server1:

java.sql.SQLException: ORA-25226: dequeue failed, queue UPG_SOAINFRA.EDN_OAOO_QUEUE is not enabled for dequeue

The solution to this problem was to manually restart the queues:

SQL> show user
USER is "UPG_SOAINFRA"
SQL> exec dbms_aqadm.start_queue('IP_OUT_QUEUE',true,true);
PL/SQL procedure successfully completed.
SQL>  exec dbms_aqadm.start_queue('EDN_OAOO_QUEUE',true,true);
PL/SQL procedure successfully completed.

After the startup we waited anxiously to hear from the superusers about the status of the in-flight processes. To much celebration (more so from them than us, since they had a lot of potential work to do), the in-flight processes did not disappear.

Sunday, September 29, 2013

OpenWorld 2013


Another OpenWorld has come and gone. With all the keynotes, sessions and parti^H^H^H networking events yet to happen, it's almost unimaginable that it could go by so quickly. Yet here I am typing up the summary of my trip.


I arrived the Saturday before the conference and shortly after checking in I went down to Moscone West to register. This is the first time in a couple of years that I didn't attend under the Blogger program. Hopefully I'll be able to blog enough this year to be eligible for the program again.

Lately I've had a few issues with query performance, so I attended a number of sessions on the optimizer and SQL tuning. I also attended a couple on Grid Control so I could start planning our upgrade from 11g to 12c. With the new E-Business Suite 12.2 released just a short time before the conference, I attended some sessions about that product as well.

The last few years I did a day-by-day breakdown, but this year I'll just mention some of the sessions that I attended and that are worth downloading if you're interested in the topic.


"UGF9740  --  RDBMS Forensics: Troubleshooting with Active Session History"

"UGF3062  --  The Query Optimizer in Oracle Database 12c: What’s New?"

"UGF5498  --  Solving Critical Customer Issues with the Oracle Database 12c Optimizer"

"UGF9790  --  Where Did My CPU Go?"

"CON8482  --  Oracle E-Business Suite Technology: Latest Features and Roadmap"

"CON8707  --  Consolidating Databases with Oracle Database 12c"

"CON8711  --  Oracle RMAN in Oracle Database 12c: New Features and Best Practices"

"CON8460  --  Deployment and System Administration of Oracle E-Business Suite 12.2"

"CON8725  --  Behind the Scenes of Oracle Multitenant"

"CON8492  --  Oracle E-Business Suite Technology Certification Primer and Roadmap"

"CON8643  --  Oracle Optimizer Boot Camp: 10 Optimizer Tips You Can’t Do Without"

"CON9458  --  Oracle Linux Troubleshooting: Diagnostics and Best Practices"

"CON8883  --  Cloning and Snapshots with Oracle Database 12c"

"CON11637  --  What’s New in Oracle Database 12c"

"CON7457  --  SQL Tuning 101"

"CON8252  --  Best Practices for Maintaining Oracle Fusion Middleware"

"CON7752  --  Oracle Enterprise Manager Cloud Control 12c: Top 10 Features for DBAs"

There were a couple of dampers on the trip. Baggage handlers left my suitcase in a puddle somewhere at SFO or the Toronto airport, so half of my clothes were soaking wet. It wasn't a huge deal, as I just hung them up to dry. Luckily I had opened my suitcase when I arrived, instead of the next morning when I would have needed more clothes.

Larry Ellison didn't show up for his keynote. I had skipped a session and lined up early so I could get a good seat. It looks like I wasn't the only unhappy person there, because as soon as he was announced as a no-show quite a few people left. Another first is that I didn't know very many people attending the conference. I managed to meet up with a former co-worker a few times, which was great.

From a session/networking perspective, though, I had a great time. I learned a lot and met some great people. Hopefully I will be able to return again next year.

I liked how they didn't totally enclose Howard Street this year. If it rains it's nice to have the tents, but I found them to be pretty hot and stuffy when the weather is nice. It also opened up the space and made it more enjoyable to meet up with people.

I didn't take a lot of pictures, but the ones I did take can be found on Flickr:

http://www.flickr.com/photos/8020613@N05/sets/72157636031724556/

Tuesday, July 30, 2013

runInstaller java.lang.reflect.InvocationTargetException




Well, this one stumped me for a little bit today: runInstaller failed with java.lang.reflect.InvocationTargetException.

There was not much information on Metalink or Google. In the past, one issue that would crop up every now and then was /tmp being mounted with noexec. I checked for that but it looked OK. On a whim I decided to set my tmp dir to another location and it worked:

TMP=/u01/tmp; export TMP


Update: I recall why changing the TMP directory worked. The problem was indeed noexec as one of the mount options of /tmp. I had changed it in fstab but couldn't arrange downtime, so I used environment variables to repoint tmp. The server still hasn't been rebooted, so the change hasn't taken effect yet.
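
Since fstab only describes what will be mounted next time, the live mount table is the place to look for what is in effect right now. A minimal sketch of such a check follows, assuming a Linux server where /proc/mounts is available. The function name has_noexec is made up for illustration, and it only recognizes paths that are mount points in their own right.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given mount point is currently
# mounted with the noexec option. The second argument is the mount table to
# read and defaults to /proc/mounts; it is a parameter only to ease testing.
has_noexec() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }    # remember options of the last matching entry
        END {
            if (opts ~ /(^|,)noexec(,|$)/) exit 0
            exit 1
        }
    ' "${2:-/proc/mounts}"
}
```

For example, "if has_noexec /tmp; then TMP=/u01/tmp; export TMP; fi" would repoint TMP only when it is actually needed.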

Monday, June 17, 2013

error on line 1 at column 1: Document is empty

I'm not sure why this happens, but every now and then the invalidator password gets corrupted. When that happens, the following error is shown when you try to access Portal:

This page contains the following errors:
error on line 1 at column 1: Document is empty
Below is a rendering of the page up to the first error.

FIX:

Re-enter the invalidator password for WebCache and Portal.

[Screenshot: Portal]

[Screenshot: WebCache]

Friday, April 26, 2013

OAM and EBS Breaks after Cloning

We are in the process of integrating E-Business Suite with Oracle Access Manager (OAM). It's set up in DEV and TEST but not in PROD yet. There was a requirement to reclone DEV, and as a result the integration with OAM broke.

We ran all the configuration steps on the EBS side, such as:

  • $FND_TOP/bin/txkrun.pl -script=SetSSOReg -registerinstance=yes
  • Verifying that ASADMIN was active and had the correct password.
  • Profile options to configure authentication:
    • Applications SSO Type
    • Applications SSO Login Types
    • etc...
  • Business Events to make sure account creation and modifications synchronize correctly.
  • We skipped all of the tasks to be performed on the OAM side since they were already done.

However, when restarting Access Gate we would see the following error in the logs:

weblogic.application.ModuleException:
        at weblogic.jdbc.module.JDBCModule.prepare(JDBCModule.java:302)
        at weblogic.application.internal.flow.ModuleListenerInvoker.prepare(ModuleListenerInvoker.java:199)
        at weblogic.application.internal.flow.DeploymentCallbackFlow$1.next(DeploymentCallbackFlow.java:517)
        at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:52)
        at weblogic.application.internal.flow.DeploymentCallbackFlow.prepare(DeploymentCallbackFlow.java:159)
        Truncated. see log file for complete stacktrace
Caused By: weblogic.common.resourcepool.ResourceSystemException:
Could not connect to 'oracle.apps.fnd.ext.jdbc.datasource.AppsDataSource'.

The returned message is: ORA-01017: invalid username/password; logon denied


We triple-checked the datasource credentials defined in the Access Gate server but still nothing worked. It turns out that one of the steps to deploy Access Gate was to copy the dbc file from $FND_SECURE over to the $MW_HOME/appsutil/accessgate/dev directory. Since DEV was recloned, the APPL_SERVER_ID value had changed. We re-copied this file, bounced the Access Gate server and everything started working again.
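
A quick way to spot a stale copy after a clone is to compare the APPL_SERVER_ID in the two dbc files. The following is a minimal sketch, assuming the value sits on a line of the form APPL_SERVER_ID=<value>. The function names are made up for illustration, and the dbc file name itself depends on your instance.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the APPL_SERVER_ID entry of two dbc files.
# Assumption: the value sits on a line of the form APPL_SERVER_ID=<value>.

# Print the APPL_SERVER_ID value found in the dbc file given as $1.
dbc_server_id() {
    sed -n 's/^APPL_SERVER_ID=//p' "$1"
}

# Succeed only if both files carry the same, non-empty APPL_SERVER_ID.
same_server_id() {
    a=$(dbc_server_id "$1")
    b=$(dbc_server_id "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Calling same_server_id with the dbc file under $FND_SECURE and the copy under $MW_HOME/appsutil/accessgate/dev returns non-zero when the Access Gate copy needs to be refreshed.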

Since we don't have EBS integrated with OAM in PROD yet, I'm not sure if we will need to do any additional steps after a clone. I'm thinking all the steps above will apply, except maybe the business events, as they will already be configured properly in PROD. I'll update this post if I have to do anything else.