Do we still need Capacity Management?
Deep knowledge of the capacity of your IT systems is crucial to delivering reliable service at a reasonable cost.
It’s tempting to think that capacity management is no longer needed in the cloud, but this simply is not the case.
Cloud delivers flexibility and scale, but at a price. In fact, poorly managed cloud services can cost far more than traditional on-premises systems.
| System | Why manage capacity? |
|---|---|
| Cloud | Cloud is expensive; controlling capacity moment by moment and scaling only when necessary prevents overspending. |
| On-Prem | Lead times for new equipment can be long, so there is a risk of outage if capacity isn’t managed correctly. |
Understanding and controlling those costs is key to remaining competitive.
How to even start?
Building a capacity management capability from scratch is difficult, and for an organisation that is just starting on the journey, it is best tackled in three stages of maturity.
Know Today
A surprisingly challenging first obstacle is getting a coherent view of the capacity currently available across your estate, both in the cloud and on-prem.
If you can confidently say that nothing is going to break today because it’s out of capacity, then that’s a good start.
Identify all the potential capacity constraints that matter to your business, and make sure you can see what usage looks like for each one on a day-to-day (or, ideally, moment-to-moment) basis.
Then start to collect and store this data so that you can begin to anticipate issues, which is the next stage of maturity in capacity management.
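As a minimal sketch of this first stage, the Python below models each constraint as current usage against total capacity, flags anything at or above a utilisation threshold, and appends each snapshot to a timestamped history. The constraint names, figures and the 80% threshold are all hypothetical; in practice the numbers would come from your monitoring tools and the history would live in a time-series store rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Constraint:
    """One capacity constraint: current usage against total capacity, in the same unit."""
    name: str
    used: float
    capacity: float

    @property
    def utilisation(self) -> float:
        return self.used / self.capacity

def at_risk(constraints, threshold=0.8):
    """Return the names of constraints at or above the utilisation threshold."""
    return [c.name for c in constraints if c.utilisation >= threshold]

def record(history, constraints):
    """Append a timestamped snapshot of every constraint to the history."""
    now = datetime.now(timezone.utc)
    history.extend((now, c.name, c.used, c.capacity) for c in constraints)

# Hypothetical snapshot of an estate, mixing cloud and on-prem constraints.
estate = [
    Constraint("san_pool_tb", used=410, capacity=500),
    Constraint("app_cluster_vcpu", used=96, capacity=256),
    Constraint("db_tablespace_gb", used=190, capacity=200),
]
history = []  # stand-in for a real time-series database
record(history, estate)
print(at_risk(estate))  # ['san_pool_tb', 'db_tablespace_gb']
```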
Predict Tomorrow
Once you’re collecting capacity data from across your estate, you should start to identify trends in that data.
Look out for systems or processes that consume capacity at a predictable rate; for these, it’s easy enough to calculate when more capacity will be required.
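For usage that grows at a steady rate, a least-squares straight line through the history is one simple way to estimate when capacity will run out. The sketch below assumes regular samples of a single metric; the sample data is invented for illustration, and workloads with seasonality or step changes would need a more careful model than a straight line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    return slope, mean_y - slope * mean_x

def days_until_full(days, usage, capacity):
    """Days after the last sample until the fitted trend reaches capacity.
    Returns None when the trend is flat or falling."""
    slope, intercept = fit_line(days, usage)
    if slope <= 0:
        return None
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - days[-1])

# Hypothetical weekly samples of a 500 GB volume growing by 2 GB a day.
days = [0, 7, 14, 21, 28]
usage_gb = [300, 314, 328, 342, 356]
print(days_until_full(days, usage_gb, capacity=500))  # 72.0
```

Here the fitted trend starts at 300 GB and grows 2 GB a day, so it reaches 500 GB on day 100, which is 72 days after the last sample.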
Once you’re able to foresee capacity issues based on historical usage patterns, it’s time to move on to forecasting usage based on input from the business.
Forecast the Future
A truly mature capacity management function will be able to fold in requirements from the business and model the impact they will have on the usage of IT systems.
For each metric tracked in the capacity management system, identify a driver, or set of drivers, that can be related back to business forecasts.
Then, using historical data as a guide, work out how each capacity metric reacts to changes in its driver. For example, how many extra application servers are needed to serve another 2,000 concurrent users?
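One way to relate a capacity metric to its driver is to regress the metric against the driver over historical data. The sketch below assumes paired samples of concurrent users and busy CPU cores across the application tier, and that load scales roughly linearly with users; the 16 cores per server and 70% target utilisation are hypothetical planning parameters, not recommendations.

```python
import math

def least_squares_slope(xs, ys):
    """Slope of the least-squares line through (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def extra_servers(users, busy_cores, extra_users, cores_per_server, target_util=0.7):
    """Servers needed for extra_users, assuming busy cores grow linearly with users
    and each server should run at no more than target_util of its cores."""
    cores_per_user = least_squares_slope(users, busy_cores)
    extra_cores = cores_per_user * extra_users
    return math.ceil(extra_cores / (cores_per_server * target_util))

# Hypothetical history: peak concurrent users vs busy cores across the tier.
users = [1000, 2000, 3000, 4000]
busy_cores = [30, 50, 70, 90]
print(extra_servers(users, busy_cores, extra_users=2000, cores_per_server=16))  # 4
```

The history implies about 0.02 busy cores per user, so 2,000 more users need roughly 40 cores, which at 70% of 16 cores per server rounds up to four servers.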
It may be necessary to translate between drivers to get a reasonable answer. In the application server example, the business may only be able to estimate how many customers will be served in a given week; it’s then up to the capacity management function to turn that into an estimate of concurrent users, and from there into the impact on the application servers.
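One way to sketch the step from a weekly customer count to concurrent users is Little’s law, which says that average concurrency equals the arrival rate multiplied by the average time spent in the system. The peak-hour share of weekly traffic, the average session length and the users-per-server figure below are all hypothetical and would need replacing with your own measurements; the model also assumes one session per customer and sessions starting at a steady rate through the peak hour.

```python
import math

def peak_concurrent_users(customers_per_week, peak_hour_share, avg_session_minutes):
    """Little's law (L = lambda * W): average concurrency equals the arrival
    rate multiplied by the average time each user spends in the system."""
    arrivals_per_minute = customers_per_week * peak_hour_share / 60
    return arrivals_per_minute * avg_session_minutes

def servers_needed(concurrent_users, users_per_server):
    """Round up, since a fraction of a server cannot be deployed."""
    return math.ceil(concurrent_users / users_per_server)

# Hypothetical inputs: the weekly figure from the business, the rest measured.
concurrent = peak_concurrent_users(120_000, peak_hour_share=0.02, avg_session_minutes=15)
print(round(concurrent))                                 # 600
print(servers_needed(concurrent, users_per_server=250))  # 3
```

With 2% of 120,000 weekly customers arriving in the busiest hour (40 a minute) and each staying 15 minutes, about 600 users are online at once.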
At the end of the day it’s all about ensuring you’re collecting the right data and, crucially, applying it appropriately to the capacity management problem.
Continuing the Journey
The sheer number of metrics available from systems today means it’s often difficult to work out which are most important to track, and it’s easy to worry that something will get missed.
There will be enough SAN capacity, enough of it will be allocated to the datastore, the thin-provisioned virtual machine will happily make use of it, the database tablespace will be big enough, and the server won’t run out of RAM.
But the system will still manage to overflow a small integer column on a long-forgotten table in the database, bringing the service to its knees.
Something will get missed. It’s inevitable.
However, deeply understanding each failure and building the lessons into the management system, learning, growing and adapting as you go, will give you the best chance of avoiding the next miss.
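A lesson like the overflowing integer column can be folded back in by treating it as a capacity metric in its own right. The sketch below compares the highest value currently stored in an integer key column (obtainable with a query such as `SELECT MAX(id) FROM some_table`) against the maximum of its signed type, using the limits of PostgreSQL’s `smallint`, `integer` and `bigint`. The table names, values and 75% threshold are hypothetical.

```python
# Upper limits of signed integer column types (as in PostgreSQL's smallint,
# integer and bigint; other databases and unsigned types may differ).
TYPE_MAX = {
    "smallint": 2**15 - 1,
    "integer": 2**31 - 1,
    "bigint": 2**63 - 1,
}

def fraction_used(column_type: str, current_max: int) -> float:
    """Fraction of the type's positive range consumed by the largest stored value."""
    return current_max / TYPE_MAX[column_type]

def columns_at_risk(columns, threshold=0.75):
    """columns: iterable of (name, column_type, current_max) tuples.
    Returns the names of columns at or above the threshold."""
    return [name for name, col_type, current_max in columns
            if fraction_used(col_type, current_max) >= threshold]

# Hypothetical readings, e.g. from SELECT MAX(id) on each table.
readings = [
    ("orders.id", "integer", 1_000_000_000),
    ("legacy_audit.seq", "smallint", 31_000),
    ("customers.id", "bigint", 5_000_000),
]
print(columns_at_risk(readings))  # ['legacy_audit.seq']
```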
2021-10-31 13:42