Thursday

OTL Performance problems

For a while now we have been investigating a performance issue within OTL. Specifically in the Create Timecard screen when users select a week and click on the Go button. This query could take upwards of a couple of minutes to execute.

The applications team has been working with Oracle and tried a few suggestions. More aggressive statistics gathering, some data purging, etc. I had some free time so I decided to dig into it myself. The first thing I tried was to see if I could reproduce the issue in a test environment. The test environment is refreshed nightly from production via rman's duplicate feature. So, the environments are identical. I could not reproduce the issue so that meant a couple of things. Either the problem is load induced (production has hundreds of concurrent users) or the environments weren't identical.

Since I know the database is identical, I looked at init parameters. Oracle has a handy script which builds a report of initialization parameters and required values. This can be found in note: 174605.1

When I compared the production vs test report, I noticed some discrepancies. Fortunately the parameters in question were dynamic, so they could be changed and the issue re-tested. The problem was not resolved so I had to dig a little deeper.

I am by no means an expert in SQL tuning. Working with packaged applications doesn't really provide me with alot of opportunities to dig in. Here is the explain plan for test:


----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 96 | 5 |
| 1 | SORT ORDER BY | | 1 | 96 | 5 |
| 2 | FILTER | | | | |
| 3 | CONNECT BY WITH FILTERING | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID | HXC_TIME_BUILDING_BLOCKS | 2 | 240 | 162 |
| 5 | BITMAP CONVERSION TO ROWIDS | | | | |
| 6 | BITMAP OR | | | | |
| 7 | BITMAP CONVERSION FROM ROWIDS| | | | |
| 8 | SORT ORDER BY | | | | |
| 9 | INDEX RANGE SCAN | HXC_TIME_BUILDING_BLOCKS_PK | | | 3 |
| 10 | BITMAP CONVERSION FROM ROWIDS| | | | |
| 11 | INDEX RANGE SCAN | HXC_TIME_BUILDING_BLOCKS_FK2 | | | 14 |
| 12 | NESTED LOOPS | | | | |
| 13 | BUFFER SORT | | | | |
| 14 | CONNECT BY PUMP | | | | |
| 15 | TABLE ACCESS BY INDEX ROWID | HXC_TIME_BUILDING_BLOCKS | 1 | 96 | 4 |
| 16 | INDEX RANGE SCAN | HXC_TIME_BUILDING_BLOCKS_FK3 | 3 | | 3 |
| 17 | TABLE ACCESS FULL | HXC_TIME_BUILDING_BLOCKS | 5336K| 610M| 26435 |

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
14 consistent gets
0 physical reads
0 redo size
2024 bytes sent via SQL*Net to client
464 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed




Production:


------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 96 | 5 |
| 1 | SORT ORDER BY | | 1 | 96 | 5 |
| 2 | FILTER | | | | |
| 3 | CONNECT BY WITH FILTERING | | | | |
| 4 | TABLE ACCESS FULL | HXC_TIME_BUILDING_BLOCKS | 2 | 240 | 26485 |
| 5 | NESTED LOOPS | | | | |
| 6 | BUFFER SORT | | | | |
| 7 | CONNECT BY PUMP | | | | |
| 8 | TABLE ACCESS BY INDEX ROWID| HXC_TIME_BUILDING_BLOCKS | 1 | 96 | 4 |
| 9 | INDEX RANGE SCAN | HXC_TIME_BUILDING_BLOCKS_FK3 | 3 | | 3 |
| 10 | TABLE ACCESS FULL | HXC_TIME_BUILDING_BLOCKS | 5336K| 610M| 26435 |
------------------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
95708 consistent gets
94802 physical reads
0 redo size
2024 bytes sent via SQL*Net to client
464 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
1 rows processed



The query requires 14 consistent gets in test but 95708 in production! Ouch. I was pretty sure that was our problem right there but how to prove it. I noticed the test environment query was using indexes that productions wasn't, so I tried the query with some index hints. That didn't work. To make a long story short the cause is a bug:

Bug 6282093 - Cost based CONNECT BY does not OR expand START WITH
Note:6282093.8


The recommended action plan from Oracle is to upgrade to 10.2.0.3 and apply a one off-patch. There is a workaround, set "_optimizer_connect_by_cost_based" to false but i'm not sure if it will have an adverse impact on other areas of the application. Since its a dynamic parameter, I am going to ask that give it a try. Worst case, if performance tanks somewhere else I can quickly revert the change.

If you are on 11.5.10 CU2, DB 10.2.0.2 and have set this parameter, please let me know if you have experienced any issues. I'm still curious why I can't reproduce the issue in an identical environment tho. The only other variables in the equation are that the servers aren't identical (which could affect execution plans) or rman's duplicate feature isn't identical enough?

5 comments:

Shiv Saroj said...

Hi Dave,

I just happened to stumble upon your blog entry. Looks interesting but as per my knowledge there are many customers on this combination and even your test instance is working fine, so issue is something else. Can you provide more details on the following aspect:
A. Any customization on the create timecard page?
B. What is the OTL patchset?
C. Does it happen while opening saved or submitted timecards?

--Shiv

Unknown said...

No customizations in OTL and we are on HXT.H

No, viewing previously submitted or saved timecards aren't an issue... Its only when selecting the timecard week and clicking the go button.

Weird problem for sure...

Shiv Saroj said...

One thing for sure is that it is a very old code level for OTL. So it certainly won't be utilizing all the features of 10G. Moreover Oracle doesn't support this code level.

Which timecard layout is being used? Payroll or projects or something else?

Any customization on the timecard layout ldt?

--Shiv

Unknown said...

Yeah, we realize we are old and are in the process of upgrading... As far as I know its the projects layout and I don't believe there are any customizations... I am not very familiar with the functional side so i'll follow up with our devs.

Shiv Saroj said...

Also check on the Projects and Tasks related views used on the timecard.

--Shiv