Predict Your Cloud Costs with Machine Learning for Free

Cloud computing is a core need for every business going digital, but companies often overpay for it. An estimate by Statista suggests that more than 30% of cloud expenditure is wasted every year.

According to Amazon, spot instances can cost up to 90% less than on-demand instances.

I have an unconventional solution that can help you optimize your cloud costs without hiring a consultant or taking on further unnecessary expenditure.

Cloud Spot Instance Prediction with kNN Regression

KNN Regression

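Regression is one of the oldest ways of estimating a numeric value from a large amount of data. A regression model is fitted to historical observations and then used to predict the value for new inputs. k-Nearest Neighbors (kNN) regression does this by finding the k historical observations closest to the new input and averaging their values.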
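To make the idea concrete, here is a minimal sketch of kNN regression in plain Python: the prediction for a new point is the mean target of its k closest training points. The function name `knn_regress` and the toy numbers are my own illustration, not part of the original experiment.

```python
def knn_regress(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean target of its k nearest training points."""
    # Pair each training point's distance to the query with its target, nearest first.
    by_distance = sorted(zip((abs(x - query) for x in train_x), train_y))
    nearest_targets = [y for _, y in by_distance[:k]]
    return sum(nearest_targets) / len(nearest_targets)


# Toy data invented for illustration: hour of day -> price in USD per hour.
hours = [1, 2, 3, 10, 11, 12]
prices = [0.10, 0.12, 0.11, 0.30, 0.32, 0.31]

# The three nearest hours to 2.5 are 2, 3 and 1, so the result is their mean price.
print(knn_regress(hours, prices, query=2.5, k=3))
```

A small k follows the nearest points closely, while a larger k smooths the prediction over more neighbors.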

For beginners: an instance is a virtual computer that runs on a cloud provider's server and supplies computing resources on request.

There are three main types of instances in cloud computing:

  • Reserved Instances: These allow users to purchase long-term instance capacity, often at low prices.
  • On-Demand Instances: These are more costly than reserved instances and can be purchased whenever users need them.
  • Spot Instances: These use spare EC2 capacity at steep discounts. They are cheaper than on-demand instances and are purchased via bidding.

Environment Setup

Amazon Elastic Compute Cloud (EC2) was chosen as the testbed for this exercise because it is the most popular cloud computing service and therefore a representative one.

The price history of EC2 spot instances was used to predict future prices by building a k-Nearest Neighbors regression model. In the experiment, this model yielded better results than Linear Regression, Multi-Layer Perceptron Regression, Support Vector Machine Regression, and Random Forest.
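One common way to frame price history as a regression problem is to use the last few prices as features and the next price as the target. The sketch below does that with a plain-Python kNN and checks it on a small holdout with mean absolute error. The window width, k, the helper names `make_windows` and `knn_forecast`, and the price series are all assumptions made for illustration; the original experiment may have used different features and settings. It needs Python 3.8 or later for `math.dist`.

```python
import math


def make_windows(series, width):
    """Turn a price series into (last `width` prices, next price) pairs."""
    features = [series[i:i + width] for i in range(len(series) - width)]
    targets = [series[i + width] for i in range(len(series) - width)]
    return features, targets


def knn_forecast(features, targets, query, k):
    """Average the next-price targets of the k training windows closest to `query`."""
    ranked = sorted(zip((math.dist(f, query) for f in features), targets))
    nearest = [t for _, t in ranked[:k]]
    return sum(nearest) / len(nearest)


# Invented hourly spot prices in USD, not taken from the article's dataset.
history = [0.10, 0.11, 0.12, 0.11, 0.10, 0.11, 0.13, 0.11,
           0.10, 0.12, 0.12, 0.11, 0.10, 0.11, 0.12, 0.11]

width, k, holdout = 3, 2, 3  # hypothetical settings
features, targets = make_windows(history, width)

# Keep the most recent windows aside so the model is scored on prices it has not seen.
train_f, train_t = features[:-holdout], targets[:-holdout]
test_f, test_t = features[-holdout:], targets[-holdout:]

predictions = [knn_forecast(train_f, train_t, window, k) for window in test_f]
mae = sum(abs(p - t) for p, t in zip(predictions, test_t)) / len(test_t)
print("mean absolute error:", mae)
```

Splitting by time rather than at random matters here, because a forecast should only ever learn from prices that came before the ones it predicts.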

The experiment used Python for the regression and Pandas, a Python data analysis library, for data manipulation.
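As a rough idea of what that data manipulation can look like, the sketch below loads a few price records with Pandas, keeps one instance type, and averages the irregular price updates into hourly values. The column names `Timestamp`, `InstanceType` and `SpotPrice` and the sample rows are assumptions for illustration; check them against the actual dataset before reusing this.

```python
import io

import pandas as pd

# Invented sample rows; the column names are assumed, not the real dataset's schema.
raw_csv = io.StringIO(
    "Timestamp,InstanceType,SpotPrice\n"
    "2017-01-01 00:40:00,m4.large,0.031\n"
    "2017-01-01 00:10:00,m4.large,0.029\n"
    "2017-01-01 01:20:00,m4.large,0.033\n"
    "2017-01-01 00:30:00,c4.large,0.025\n"
)

df = pd.read_csv(raw_csv, parse_dates=["Timestamp"])

# Keep one instance type and order its records chronologically.
m4 = df[df["InstanceType"] == "m4.large"].sort_values("Timestamp")

# Collapse irregular price updates into one mean price per hour.
hourly = m4.set_index("Timestamp")["SpotPrice"].resample("1h").mean()
print(hourly)
```

A regularly spaced series like `hourly` is easier to feed into a regression model than raw price-change events.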

The dataset for the experiment can be obtained from here.

You can run the regression in either Anaconda or Google Colab. I prefer Colab notebooks for the following reasons:

  • They are easy to share.
  • They are easy for beginners because they require no installation. They run on virtual machines.
  • They are user-friendly.

However, they are also a bit slower than a normal PC (if you have an Intel Core i5 or higher CPU).

Here is the experiment code. It was written in Python 3.5.

Cloud Cost for Spot Instances Experiment Setting

You can view the complete experiment in HCIS Journal here.