Showing posts with label EBS Concurrent Processing. Show all posts
Showing posts with label EBS Concurrent Processing. Show all posts

Tuesday

Statistics, Concurrent requests and Terminations

There is a script on Metalink which can help you automate gathering statistics for an E-Business Suite environment. As you are probably aware in an EBS environment you have to use FND_STATS or some predefined concurrent programs such as "Gather Schema Statistics".


Subject: coe_stats.sql - Automates CBO Stats Gathering using FND_STATS and Table sizes
Doc ID: Note:156968.1


This note contains a script which verifies statistics and generates a script and report which lists all the tables that need to be analyzed. Beware that coe_stats.sql automatically executes the dynamic script coe_fix_stats.sql script at the end of its execution. So make sure you comment out the line "START coe_fix_stats.sql" if you don't want this to happen.

The report generated also contains some information about the table. If its partitioned, number of rows (at the time it was last analyzed), sample percent, etc.. As well as "Days since last analyze". Immediately I noticed that the time since the tables were last analyzed was over a week. Our main Gather Schema Statistics job is scheduled to run every weekend.

What does this have to do with terminations? In a previous post I mentioned that our environment crashed as a result of bug 5907779. This happened as our weekly concurrent request to gather statistics was running. After the environment was bounced this job restarted but it was now during business hours. Statistics should always be gathered at periods of low activity because it causes performance issues. So I canceled the request via OAM, which changed the status to Phase: Completed, Status: Terminated.

The side affect to terminating a job, which had slipped my mind* is that it will not be rescheduled. I'm not sure why that happens, it doesn't make much sense to me but it appears to be by design. The solution is to reschedule the job.

* Apparently 8hrs of sleep in the past 72hrs can have that affect. Partly due to a last night out while on vacation, traveling, falling ill and being paged at 3am.

In summary, the goal of this post was to point out a useful metalink note and refresh your memory to reschedule canceled requests if you need them to run again.

iProcurement: Catalog Bulk Load

Just over a year ago we implemented Shared Application Filesystem in our EBS environment. This reduced maintenance time by my guesstimate of 25-50%. Its hard to say exactly how much time we have saved but previously we would have to apply a patch to two application tiers (web, forms) and a database tier (cm, db). If patches took an equal amount of time per tier it would be easy to say how much time we have saved but typically patches on the database tier take longer to apply.

One of the modules we have implement is iProcurement and as part of that, periodically catalog data has to be uploaded. Catalog data consists of items available and their prices. To peform this task a user with the "Internet Procurement Catalog Administration/eContent Manager" responsibility has to select a file on their system and load it via the "Bulk Load Items & Price Lists" screen.

Behind the scenes the catalog data file is stored as a temporary file on the application tier. The POXCDXBL concurrent request starts and if it detects the node names for the cm tier and application tier are different it executes a file transfer, regardless of whether or not you are using a shared filesystem. In order for you to take advantage of the shared file system you need to apply patch 4656509 which upgrades RemoteFile.java to version 115.4 and set the profile option "POR: Catalog bulkload Directory" to a location on your share.

At the time we implemented Shared Application Filesystem we investigated this but it would have required alot of patching. At least 11 patches, with some of them being family packs. Even tho our POR profile option (listed above) was set to a local directory, catalog bulk load was working fine. So we decided not to change the configuration as it would require alot of testing.

Skip ahead almost two years, with alot of changes in between, and this configuration was still working until we applied the Oct 07, Jan 08 and Apr 08 security patches. During testing users reported that catalog bulk load failed:


POXCDXBL module: Catalog Bulk Load - Items & Price Lists failing with:

{true,/apps/appsvis/visappl/fnd/11.5.0/secure/myserver_vis.dbc,1,1,0}]
https://myappserver.cognos.com:8005/OA_CGI/FNDWRR.exe?temp_id=2964546434
null
oracle.apps.fnd.cp.request.FileAccessException
at oracle.apps.fnd.cp.request.RemoteFile.getFile(RemoteFile.java:441)
at oracle.apps.icx.loader.LoadRequest.runProgram(LoadRequest.java:51)
at oracle.apps.fnd.cp.request.Run.main(Run.java:161)
[6/12/08 3:11:02 PM] ********** Loader stopped **********


At first I assumed it was a cloning issue so I checked some typical problems we have experienced in the past. I noticed a few things that were wrong such as adovars.env settings, an error in our apache error_log file (below) and a few others but fixing them didn't resolve the error. I spent a fair bit of time on the apache SSL error because I had seen that in the past and it usually meant the txkrun.pl script to enable SSL failed for some reason. I re-executed it but the error did not go away (although, it did fix another problem). (As an aside, I have to say, debugging someone elses clones is a nightmare, especially if they didn't log everything.)


[Wed Jul 2 13:07:13 2008] [notice] caught SIGTERM, shutting down
[Wed Jul 2 13:07:24 2008] [notice] FastCGI: process manager initialized (pid 17232)
[Wed Jul 2 13:07:25 2008] [notice] Oracle HTTP Server Powered by Apache/1.3.19 configured -- resuming normal operations
[Wed Jul 2 13:10:34 2008] [error] mod_ssl: SSL handshake failed (server
myappserver.cognos.com:8005, client 10.69.92.24) (OpenSSL library error
follows)
[Wed Jul 2 13:10:34 2008] [error] OpenSSL: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown
Terminated



Moving on, Note:281530.1 talks about this feature and how it works behind the scenes in great detail. It also includes typical problems that may be encountered, workarounds and potential fixes. Using this note I narrowed down our problem to be at the point where the CM requests the catalog temp file from the application tier and places it in the CM tiers $APPLTMP directory. However, this file was of zero size.

At this point we decided to change the "POR: Catalog bulkload Directory" profile option to a directory on our shared filesystem and retest. This solved the issue. An SR with Oracle couldn't confirm whether or not the problem was caused by the security patches. So for now we will put this change into our release instructions since it is a better configuration (why transfer the file if can be available locally?). We have to apply the patches to one more environment before we go live and I will test catalog bulk loads before and after we apply the patches to see if they were the cause or it was indeed a cloning issue.

Wednesday

ORA-20100 and FIND_FILE

BEGIN
*
ERROR at line 1:
ORA-20006: ORA-20100: File l0005230.tmp creation for FND_FILE failed.
You will find more information on the cause of the error in request log.
ORA-06512: at "APPS.OUR_CUSTOM_CODE", line 949
ORA-06512: at line 2

Thats how I started my day. To say I hate ORA-20100 errors would not accurately portray my feelings. However, after reading Tom Kytes blog posting titled "Why do people do this?" I know why. Guess what is on line 949. ;)

Unlike previous occurrences of this error, its a concurrent manager temp file. So I know right away that the directory in question is pointed to by the environment variable APPLPTMP.

We have hit this error before and usually its the utl_file_dir database parameter. I guess some of us DBA's have fat fingers and even tho we double check this parameter, sometimes a typo sneaks through during cloning. I have a script which writes to each of our utl file directories. So I could rule this out as being a problem.

I can't remember where I obtained the code below, but here is what I use. I write to a file, then try and read from it. Exceptions below give me a good indication of what the problem may be.

CREATE OR REPLACE PROCEDURE utl_test IS
file_handle UTL_FILE.FILE_TYPE; -- file handle of OS flat file
col1 NUMBER; -- C1 retrieved from testtab table
retrieved_buffer VARCHAR2(100); -- Line retrieved from flat file
BEGIN
file_handle := UTL_FILE.FOPEN('dir1', 'filetest.txt', 'W');
UTL_FILE.PUT_LINE(file_handle, 'This is a test');
UTL_FILE.FCLOSE(file_handle);
file_handle :=utl_file.fopen('dir1', 'filetest.txt', 'R');
UTL_FILE.GET_LINE(file_handle, retrieved_buffer);
dbms_output.put_line(retrieved_buffer);
utl_file.fclose(file_handle);
EXCEPTION
WHEN NO_DATA_FOUND THEN
DBMS_OUTPUT.PUT_LINE('no_data_found');
UTL_FILE.FCLOSE(file_handle);
WHEN UTL_FILE.INVALID_PATH THEN
DBMS_OUTPUT.PUT_LINE('UTL_FILE.INVALID_PATH');
UTL_FILE.FCLOSE(file_handle);
WHEN UTL_FILE.READ_ERROR THEN
DBMS_OUTPUT.PUT_LINE(' UTL_FILE.READ_ERROR');
UTL_FILE.FCLOSE(file_handle);Troubleshooting ORA-20100 on Concurrent Processing
WHEN UTL_FILE.WRITE_ERROR THEN
DBMS_OUTPUT.PUT_LINE('UTL_FILE.WRITE_ERROR');
UTL_FILE.FCLOSE(file_handle);
END;



The next common resolution was to check the location to which the APPLPTMP environment variable points. We have multiple environments on our test dev servers and its not uncommon for one environment to own a tmp file, thus blocking another environment from writing to it. In this case, the file l0005230.tmp existedand was owned by the proper user but was zero size.

-rw-r--r-- 1 oravis dba 0 Feb 5 10:06 l0005230.tmp

We have hit OS fragmentation issues in the past. Just to be safe I copied a large file to the tmp directory without a problem. We have had partitions with 20GB free and have been unable to create a 100k file. A system reboot usually fixed that.

I was out of ideas. I have never gotten this far with this particular error before. So I fired up Metalink. Note:261693.1 Troubleshooting ORA-20100 on Concurrent Processing, listed the options I have tried above and a few more. One suggestion is to use FND_FILE which is the package used by the application to write to OS files.

SQL> exec FND_FILE.PUT_LINE(FND_FILE.LOG, 'THIS IS A TEST');

If everything is setup correctly, this will write to a file in the directory pointed to by APPLPTMP. Don't go to this directory looking for a file named FND_FILE.LOG. You won't find it. The data, "THIS IS A TEST", will be written to a temp file. So look at tmp files created around the time you executed that command.

$cat l0006340.tmp
THIS IS A TEST


So at this point, none of the suggested troubleshooting steps have pinpointed the problem. To summarize, we have verified:
  • that the UTL_FILE_DIR init parameter is correct and each location is writeable.
  • that the application should be able to write to the OS using FND_FILE. The file does get created but the app can't write data to it.
The only thing left was to try restarting services. A bounce of the concurrent managers didn't resolve the issue. So we bounced the entire application and database. This finally resolved the issue. Why? No idea. There were a bunch of processes that didn't shutdown properly so we killed them. Maybe one of those had a resource open that was blocking. If you have encountered this before and found the root cause, feel free to share the information. ;)

Thursday

Stuck Concurrent Requests

Every now and then users call us with a concurrent request that is running longer than normal and/or blocking other batch jobs because of incompatibilities. Upon investigation we'll see that there is no database session for the request. Since there isn't a database session users may not be unable to cancel the request themselves. The cancel button will be grayed out. The solution is to clean the fnd_concurrent_requests table.

Background: Concurrent programs may be incompatible with other programs which means they cannot execute at the same time. If the stuck concurrent request has such rules defined, then programs it is incompatible with will not run until the problem is solved.

There are 2 ways to do this, update the table manually or run the Oracle provided cmclean.sql script. Depending on the method you choose, you'll need the request id. This can be provided by the user or you can look at the running requests via Oracle Applications Manager (OAM). To navigate there click on Site Map on the top left hand corner of the page. Under Concurrent requests click on Running.




Once your in the Running requests screen you'll see which programs are currently being executed. With the help of your users, find the request id in question and make note of it. The recommended approach from Oracle will be:

1. Kill the database sessions for the requests. (In our case there weren't any.)
2. Shutdown the concurrent managers.
3. Run the cmclean.sql script Note: 134007.1
4. Start your concurrent managers.

The other method is to update the bad rows in the fnd_concurrent_requests table manually.

update fnd_concurrent_requests set STATUS_CODE='D',phase_code='C' where request_id=<request id>

STATUS_CODE of D means Cancelled and a phase_code of C is completed.

For a list of status, phase_codes and what they mean, refer to Note: 297909.1.

The benefit to updating the fnd_concurrent_requests table manually is that no downtime is required. If you are using cmclean.sql remember to shutdown the concurrent managers first!