Wednesday, December 12, 2007

Metric Baselining and Why It's Important

As companies build more and more services that are truly loosely coupled and trusted, those services become a source of revenue: interested parties can use them on a per-use basis. That means the owner of a service has to meet certain service levels to operate it, so they define service level agreements (SLAs) that the service has to meet for it to be acceptable.

Some of these SLAs might cover availability, performance, fault generation, etc. In this post, I would like to talk specifically about performance SLAs.

Setting a performance service level agreement means that the service provider agrees that, for any request sent to the service, the service will send a response back within a set time frame. So you set a static threshold, and if the response time exceeds that threshold, the service is breaking the service level.
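As a point of comparison for the dynamic approach, a static performance SLA check reduces to a single comparison per request. This is a minimal Python sketch; the function name `breaches_static_sla` and the 2-second default are hypothetical choices for illustration.

```python
# Minimal sketch of a static performance SLA check.
# STATIC_THRESHOLD_SECONDS is a hypothetical value chosen for illustration.
STATIC_THRESHOLD_SECONDS = 2.0


def breaches_static_sla(response_time_s: float,
                        threshold_s: float = STATIC_THRESHOLD_SECONDS) -> bool:
    """Return True if a single request's response time exceeds the fixed threshold."""
    return response_time_s > threshold_s


# The same threshold applies no matter when the request was made.
print(breaches_static_sla(1.5))  # False
print(breaches_static_sla(2.5))  # True
```

Because the threshold never changes, a request is judged the same way at peak load and at 3 am.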

This is a great way to set service levels if the load on the web service is the same all the time, but usually that is not the case. For example, a threshold of 2 seconds might be acceptable during busy times, but not during slow times, when responses normally come back much faster and a 2-second response points to a problem.

To avoid these kinds of service level "disagreements", we need a dynamic threshold, which is where baselining comes in. Instead of comparing the performance of the service for each message to a static threshold, we compare it to a dynamic, or "rolling", baseline. For example, let's say we have a 3-hour rolling baseline. A request comes in and a response is sent back, and the transaction took 2 seconds. With a 3-hour rolling baseline, we look at all the requests that came in during the last 3 hours and their performance. From those we calculate an average and a standard deviation (the baseline), and then we check whether the new measurement (2 seconds) falls within a certain number of standard deviations of the average. Because the baseline moves with the recent load, this type of baselining accounts for busy times, slow times, etc.
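The rolling baseline described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the class name `RollingBaseline`, the default of k = 2 standard deviations, and the minimum-history rule are hypothetical choices. The check is one-sided (only responses slower than mean + k·stdev count as outside the baseline), on the assumption that unusually fast responses do not break a performance SLA.

```python
import statistics
import time
from collections import deque
from typing import Deque, Optional, Tuple


class RollingBaseline:
    """Hypothetical rolling baseline over the last `window_seconds` of requests."""

    def __init__(self, window_seconds: float = 3 * 3600, k: float = 2.0,
                 min_samples: int = 2) -> None:
        self.window_seconds = window_seconds  # 3-hour window, as in the example
        self.k = k                            # allowed standard deviations (assumed)
        # statistics.stdev needs at least two data points
        self.min_samples = max(2, min_samples)
        # (timestamp, response_time_seconds) pairs, oldest first
        self.samples: Deque[Tuple[float, float]] = deque()

    def _evict(self, now: float) -> None:
        """Drop samples that have fallen out of the rolling window."""
        cutoff = now - self.window_seconds
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def check(self, response_time: float, now: Optional[float] = None) -> bool:
        """Return True if response_time <= mean + k * stdev of the current window.

        The new measurement is judged against the existing window first,
        then added so it becomes part of future baselines.
        """
        now = time.time() if now is None else now
        self._evict(now)
        history = [rt for _, rt in self.samples]
        if len(history) < self.min_samples:
            within = True  # not enough history yet to form a baseline
        else:
            mean = statistics.mean(history)
            stdev = statistics.stdev(history)
            within = response_time <= mean + self.k * stdev
        self.samples.append((now, response_time))
        return within


# Usage: the same 2-second response judged against a quiet and a busy window.
quiet = RollingBaseline()
for i, rt in enumerate([0.30, 0.35, 0.28, 0.32, 0.30]):
    quiet.check(rt, now=float(i))
print(quiet.check(2.0, now=10.0))  # False: far above the quiet-period baseline

busy = RollingBaseline()
for i, rt in enumerate([1.8, 2.1, 1.9, 2.2, 2.0]):
    busy.check(rt, now=float(i))
print(busy.check(2.0, now=10.0))  # True: normal for the busy period
```

The usage shows the point of the post: a 2-second response is flagged when recent traffic has been answering in about 0.3 seconds, but accepted when recent responses have been around 2 seconds.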

This baseline comparison can be taken a step further. For example, instead of looking at the last 3 hours for the baseline, we could look at the same 3-hour period of the week going back multiple weeks. That gives us a better baseline and also helps with trending and forecasting, so we know what the baselines are for any time period during the week and can set our SLAs accordingly.
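The week-over-week variant can be sketched the same way: collect response times from the same 3-hour window of the week in each of the previous weeks, and build the average and standard deviation from those. This is a minimal Python sketch; the function names, the default of 4 weeks of history, and k = 2 are hypothetical choices, and timestamps are assumed to be naive local times (no time-zone or daylight-saving handling).

```python
import statistics
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Sample = Tuple[datetime, float]  # (timestamp, response_time_seconds)


def seasonal_baseline(history: List[Sample], at: datetime,
                      window: timedelta = timedelta(hours=3),
                      weeks: int = 4) -> Optional[Tuple[float, float]]:
    """Mean and stdev of response times from the same time-of-week window
    in each of the previous `weeks` weeks. Returns None if too few samples."""
    samples: List[float] = []
    for w in range(1, weeks + 1):
        end = at - timedelta(weeks=w)  # same weekday and time, w weeks ago
        start = end - window
        samples.extend(rt for ts, rt in history if start < ts <= end)
    if len(samples) < 2:  # statistics.stdev needs at least two points
        return None
    return statistics.mean(samples), statistics.stdev(samples)


def within_seasonal_baseline(response_time: float, history: List[Sample],
                             at: datetime, k: float = 2.0) -> bool:
    """True if response_time <= mean + k * stdev of the seasonal baseline."""
    baseline = seasonal_baseline(history, at)
    if baseline is None:
        return True  # no comparable history yet
    mean, stdev = baseline
    return response_time <= mean + k * stdev


# Usage: a Wednesday 10 am to 1 pm window, compared with the previous 4 Wednesdays.
at = datetime(2007, 12, 12, 13, 0)
history: List[Sample] = []
for w in range(1, 5):
    end = at - timedelta(weeks=w)
    for minutes, rt in [(30, 1.9), (90, 2.1), (150, 2.0)]:
        history.append((end - timedelta(minutes=minutes), rt))
    # Outside the 3-hour window, so it does not affect the baseline.
    history.append((end - timedelta(hours=5), 10.0))

print(within_seasonal_baseline(2.05, history, at))  # True
print(within_seasonal_baseline(3.0, history, at))   # False
```

Scanning the full history for every check keeps the sketch short; a real monitoring system would more likely store pre-aggregated statistics per time-of-week bucket.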


AA
www.managedmethods.com
