One of the ways we use BPEL is as the workflow engine for a custom application. A workflow can take months to complete, so there are always in-flight processes. There is never a quiet time, or a point at which all workflows are complete, in which to do the upgrade.
We put a plan in place to manually place all items back in the proper stage. It looked like a fair amount of effort, so for DEV we only re-placed a subset of items to make sure we had the process correct.
However, once we upgraded TEST, the in-flight processes somehow magically survived. We aren't sure why. There could be some small inconsistencies between DEV and TEST that caused it. Another reason could be that in DEV we upgraded to 220.127.116.11 first, then at a later point upgraded to 18.104.22.168.6. In TEST we went directly to 22.214.171.124.6.
The SOA upgrade itself isn't a very resilient process. If the PSA (Patch Set Assistant) gets interrupted for some reason, it's not smart enough to recover. In DEV it wasn't a big issue: I had an export of the MDS and SOAINFRA schemas, so I just dropped and recreated them. For PROD, though, the amount of data, and thus the downtime, would prevent us from doing this. We only hit an issue once in DEV and TEST was smooth, so for PROD we all had our fingers crossed.
The PROD install was going smoothly until I had to run the PSA. Shortly after I started the schema upgrade it failed and a dreaded error appeared. In the $FMW_HOME/oracle_common/upgrade/logs directory I found:
[2013-08-21T19:28:09.239-04:00] [SOA] [ERROR]  [upgrade.SOA.SOA1] [tid: 13] [ecid: 0000K2ZD5BJ9lZWpPws1yd1I5Klw000004,0] [[
oracle.sysman.assistants.common.dbutil.SQLFatalErrorException: java.sql.SQLException: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
The problem was a database session holding a lock in the SOAINFRA schema. We hit this problem in DEV, and the solution was to shut down the applications that access SOAINFRA tables and change the schema password. This time, however, I didn't notice a hung database session from one of our custom applications.
If you try to run the PSA again it says the SOAINFRA schema isn't valid and won't let you continue with the upgrade. So I tried manually updating the registry to state the schema is valid.
update schema_version_registry set status='VALID' where mr_name='SOAINFRA';
Great! The installer started up again but it quickly failed. Checking the logs:
2013-08-21 19:36:19.378 rcu:Extracted SQL Statement: [ALTER TABLE BPM_CUBE_PROCESS ADD (SubType VARCHAR2(200), DeploymentInfo BLOB)]
2013-08-21 19:36:19.378 rcu:Statement Type: 'DDL Statement'
JDBC SQLException - ErrorCode: 1430SQLState:72000 Message: ORA-01430: column being added already exists in table
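When the PSA logs run to thousands of lines, it can help to pull out just the ORA- errors before reading anything else. The following is a minimal Python sketch of that idea, not part of the PSA tooling. The function name `find_ora_errors` and the `SAMPLE_LOG` variable are made up for illustration, and the sample lines are simply the log excerpts quoted above.

```python
import re

# Matches an Oracle error code followed by its message text on the same line.
ORA_PATTERN = re.compile(r"(ORA-\d{5}): ([^\n]*)")


def find_ora_errors(log_text):
    """Return (line_number, code, message) for every ORA- error in the text."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for match in ORA_PATTERN.finditer(line):
            hits.append((lineno, match.group(1), match.group(2).strip()))
    return hits


# Sample input: the log lines quoted in this post (illustrative only).
SAMPLE_LOG = "\n".join([
    "[2013-08-21T19:28:09.239-04:00] [SOA] [ERROR]  [upgrade.SOA.SOA1] "
    "[tid: 13] [ecid: 0000K2ZD5BJ9lZWpPws1yd1I5Klw000004,0] [[",
    "oracle.sysman.assistants.common.dbutil.SQLFatalErrorException: "
    "java.sql.SQLException: ORA-00054: resource busy and acquire with "
    "NOWAIT specified or timeout expired",
    "JDBC SQLException - ErrorCode: 1430SQLState:72000 Message: "
    "ORA-01430: column being added already exists in table",
])

if __name__ == "__main__":
    for lineno, code, message in find_ora_errors(SAMPLE_LOG):
        print("line {}: {} ({})".format(lineno, code, message))
```

In practice you would read the files under the upgrade logs directory instead of a hard-coded string; the line numbers then tell you where to start reading in the full log.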
Since this was production, I opened a P1 SR with Oracle right away. In these situations I try to get Oracle involved as quickly as possible. I'll still continue to do research on my end. Sometimes I find the solution quicker, sometimes Oracle Support does.
This time I found the following note:
Patch Set Assistant Failed for SOA 126.96.36.199.0 when Patch 13606871 is Applied on SOA 188.8.131.52.0 (Doc ID 1517404.1)
It provided a set of SQL statements to help roll back the schema upgrade. The only problem was that my upgrade was from 184.108.40.206, not 220.127.116.11. Knowing that Oracle now supported manually rolling back the upgrade, I looked at the script that was failing:
After some trial and error I managed to write a script to undo all the changes. Unfortunately, if you missed one change, which was easy to do as the upgrade script is a few thousand lines, you had to start over. Finally I managed to find all the updated objects and continued with the upgrade.
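To give a feel for the kind of undo work involved, here is a minimal Python sketch that turns an `ALTER TABLE ... ADD (...)` statement, like the one in the log above, into the matching `ALTER TABLE ... DROP (...)` statement. This is not the script I used and not Oracle's rollback procedure. The names `split_top_level` and `rollback_for` are hypothetical, and the sketch assumes the simple `ADD (col type, ...)` form only. A real upgrade script also creates tables, indexes and other objects, and the sketch cannot tell which statements actually ran before the failure.

```python
import re

# Only handles the simple form: ALTER TABLE <name> ADD (<col> <type>, ...)
ADD_COLUMNS = re.compile(
    r"ALTER\s+TABLE\s+(\w+)\s+ADD\s*\((.*)\)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [part for part in parts if part]


def rollback_for(statement):
    """Return a DROP statement undoing an ADD-columns statement, else None."""
    match = ADD_COLUMNS.match(statement.strip())
    if match is None:
        return None  # not a statement this sketch understands
    table, body = match.groups()
    # The column name is the first word of each column definition.
    columns = [part.split()[0] for part in split_top_level(body)]
    return "ALTER TABLE {} DROP ({})".format(table, ", ".join(columns))


if __name__ == "__main__":
    failing = ("ALTER TABLE BPM_CUBE_PROCESS ADD "
               "(SubType VARCHAR2(200), DeploymentInfo BLOB)")
    print(rollback_for(failing))
```

Dropping a column discards its data, so generated statements like these need to be reviewed by hand against what the failed run actually changed before anything is executed.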
Another upgrade issue I encountered showed itself in the startup logs for soa_server1:
java.sql.SQLException: ORA-25226: dequeue failed, queue UPG_SOAINFRA.EDN_OAOO_QUEUE is not enabled for dequeue
The solution to this problem was to manually restart the queues:
SQL> show user
USER is "UPG_SOAINFRA"
SQL> exec dbms_aqadm.start_queue('IP_OUT_QUEUE',true,true);
PL/SQL procedure successfully completed.
SQL> exec dbms_aqadm.start_queue('EDN_OAOO_QUEUE',true,true);
PL/SQL procedure successfully completed.
After the startup we waited anxiously to hear from the superusers about the status of the in-flight processes. To much celebration (more so from them than from us, since they had a lot of potential work to do), the in-flight processes did not disappear.