Wednesday, May 20, 2009

Hyperion Essbase: Why Bother to Tune?

Issue

Recently a customer logged an SR. The issue was that a data load that in Production on 32-bit Essbase takes 20 minutes, and the same data load to the same outline using the same configuration parameters on a 64-bit version of Essbase takes nearly 5 hours to complete. “Holy batch windows Batman!”

Robin and I began our investigation by loading the data on our own 32 bit Essbase version - to see what we could see, as it were. The load completed in about ~4900 seconds.


Discussion
As I monitored the database statistics using esscmd, I noticed fragmentation statistics. The data was not sorted. (Not sorted?!? Good grief! Robin and I hadn’t seen that in a decade). When we performed a load with sorted data it completed in ~900 seconds. Repeating the customer’s load procedures, we noticed that data loaded efficiently up to approximately ~70,000 blocks but bogged down considerably from that point until all ~350,000 blocks were created.

When I mentioned fragmentation/sorting to the customer it was clear from his initial response to me that he was not entirely aware of the relation between sorting data and loading data to a BSO cube. Rather, he stated that he did not really care whether the load was optimal because a 20‑minute data load time fits easily within his batch window. Given that my test results did not speak to the issue of 64-bit load times, we agreed to convene the next workday, online, to examine the issue together.

After we confirmed every setting between the two cubes environments were identical, we initiated a load on the 64-bit server. We killed the data load after it was clear that it had bogged down, and began to crawl. It would have been more eventful to sit and watch the grass grow. Since watching Windows performance monitor is even more fun than watching grass grow, we opened up a session there - to see what we could see.

CPU activity was down at that grass growing pace. When we investigated further we noticed there was a disk bottleneck by observing the average disk queue metric. After the initial (~70K)data blocks were created, the queue dramatically increased per the image below.

Compare the above chart to the queue that is created during the sorted data load:

Checking the commit level (default 3000) caused me to experience internally the sound of a fingernail scratching across a blackboard. The reason that this was bothering me is that the negative impact of unsorted data was being exacerbated by the commit level. Together for this customer, they caused some sort of code issue to rear an ugly head in the 64‑bit version.

Say what? Here is what I think is happening.


Cause Isolated?
As blocks are being created and handed over to the Windows OS to be written, new data is being read from the source files. That the data is not sorted means that Essbase may have to fetch data already sent to disk in order to update that block with new (unsorted) input data.

What happens when the block fetch request is in fact still in the next batch of 3000 blocks waiting to be committed? I think that Windows will need to commit the block to disk before it can respond to the fetch request. In the meantime, the data load has stopped, and the CPUs are sitting around, probably having a smoke.

I don’t think it takes a rocket scientist to speculate that this situation creates a disk I/O nightmare that eventually impacts CPU utilization. The unacceptable disk queue is ultimately the result of not sorting the source data.

Notwithstanding the fact that something may be seriously wrong with the 64-bit Essbase data load routines, the disk I/O bottleneck really is occurring on the 32‑bit server, but only to a lesser extent.


Solution
On the customer’s test environment we were able to use 64‑bit Essbase to load the unsorted data in approximately 500 seconds. This was accomplished simply by setting the commit block parameter to 0 (zero) causing Essbase to delay commits until the load had successfully completed. Be aware that not all customer Essbase environments can accommodate this setting.


In any event, when we sorted the data, our data load was reduced to ~200 seconds!

You Tell Me...
You tell me why anybody would ever consider ensuring that their Essbase server is optimally tuned and configured? If it ain’t broked, don’t fix it! Right?

Yea. Right.


Essbase Timebombs?
How many customers out there have Essbase environments that are just waiting to self destruct because the next change to processing requirements happens?


I can count on none fingers the number of BSO applications that I have seen in the past two years that have been appropriately tuned and configured. And every BSO application log that I have reviewed on MetaLink has been filled with indicators of pathology.

Next month, I hope to do a web presentation to show what to look for in their Essbase application logs to determine whether they need to tweak tune, perform open heart surgery, or perhaps order a coffin. Interested?

1 comment:

Erik Ykema said...

Interesting quest by Robin. We use Essbase in our organization, and have not bothered at all with sorting before loading. I will give the team a hint.
Question: can one also use an Oracle database as a resource provider under Hyperion in stead of Essbase? Being an Oracle database user I am kinda amazed by some of the characteristics of the Essbase data layer, there are several things that in an Oracle DBMS one doesn't have to bother with, i.e. sorting etc. of data is not so relevant, and certainly not at load time.
best regards, Erik

Official, Youbetcha Legalese

This blog is provided for information purposes only and the contents hereof are subject to change without notice. This blog contains links to articles, sites, blogs, that are created by entities other than Oracle. These links may contain advice, information, and opinion that is incorrect or untested. This blog, links, and other materials contained or referenced in this blog are not warranted to be error-free, nor are they subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this blog, links and other materials contained or referenced in this blog, and no contractual obligations are formed either directly or indirectly by this blog, link or other materials. This blog may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission. The opinions and recommendations contained in this blog(including links) do not represent the position of Oracle Corporation.

Oracle, JD Edwards, PeopleSoft, and Siebel are registered trademarks of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.