Basics of Machine Learning
Machine Learning is a buzzword amongst technology enthusiasts today, but what exactly does it mean?
Let me start with a simple example: a movie recommendation on Netflix. Now, that doesn’t really sound simple, does it? When you think about it, how does Netflix know what to recommend?
Let’s see what a movie recommendation has to do with ML. When you watch a certain movie on Netflix, it recommends other movies or series of a similar genre. That means it now knows, to a certain extent, what kind of movies you like or might like. But how? This is where ML comes in.
Let’s talk about what Machine Learning is.
Machine Learning is, at its core, teaching a machine how to do something by showing it examples rather than spelling out every rule.
In traditional programming, you provide the input and a formula (the rules) to your code, and it gives you the output. In ML, you provide inputs together with their outputs, and in turn the machine works out the relation between them.
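To make this contrast concrete, here is a minimal Python sketch. The Celsius-to-Fahrenheit example and the function names are hypothetical, chosen only for illustration. The first function is traditional programming: we write the formula ourselves. The second only sees example inputs and outputs and recovers the relation by fitting a straight line with ordinary least squares, which is one simple way of "learning" a relation, not the only one.

```python
# Traditional programming: we supply the rule (the formula) ourselves.
def to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# Machine Learning in miniature: we supply inputs and outputs,
# and the program works out the relation between them.
def learn_line(xs, ys):
    """Fit y = a*x + b to the examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

inputs = [0, 10, 20, 30, 40]
outputs = [to_fahrenheit(c) for c in inputs]  # pretend these were measured
a, b = learn_line(inputs, outputs)
print(a, b)  # approximately 1.8 and 32.0
```

The learned slope and intercept match the hand-written formula, even though `learn_line` was never told it.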
A more formal definition, given by Tom Mitchell, is:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Informally, Machine Learning is the ability of a computer to learn without being explicitly programmed.
That covers what Machine Learning is. Now let’s get a bit more technical.
As a developer, at some point in your career you will come across problems that cannot practically be solved with regular if-else statements.
For example, in a game of Tic-Tac-Toe, there are far too many board combinations to write an if-else rule for each one.
To solve these kinds of problems, we use ML algorithms. But before we go there, let’s talk about the three main paradigms of Machine Learning.
The three paradigms of Machine Learning are:
· Supervised Learning.
· Unsupervised Learning.
· Reinforcement Learning.
Supervised Learning
Supervised Learning is one of the major paradigms of Machine Learning. It simply means that the data we have is supervised, i.e. it is labelled. In other words, we know what we are predicting and what data we are using to predict it. This is unlike unsupervised learning where, broadly speaking, we don’t know what we are predicting, since the data is given directly with no labels attached. We have the correct answers for some inputs, and using those, we need to predict the outputs for other inputs.
In Supervised Learning, we have inputs and outputs for some data (the training set), and we try to predict the output for new, unseen data (the test set).
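A minimal sketch of how labelled examples might be divided into a training set and a test set, in plain Python. The helper name `split_train_test`, the 20% test fraction and the toy data are all hypothetical choices for illustration; libraries such as scikit-learn provide a ready-made `train_test_split` function in `sklearn.model_selection`.

```python
import random

def split_train_test(examples, test_fraction=0.2, seed=0):
    """Shuffle labelled (input, output) pairs and split them in two.

    The training set is used to learn the relation; the test set is held
    back to check how well that relation works on data it has not seen.
    """
    shuffled = examples[:]                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled data: (input, correct output) pairs.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = split_train_test(data)
print(len(train), len(test))  # 8 2
```

Shuffling before splitting matters when the data is sorted in some way; otherwise the test set might contain only, say, the largest houses.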
There are two main kinds of prediction problems under Supervised Learning:
· Regression
· Classification
Regression means predicting a continuous value by finding a relation between variables.
For example, predicting the price of a house.
In regression, we have a continuous function which is used to predict the value of something. ‘y’ in regression can be any real number (within the constraints of the problem). This function can be linear, polynomial, exponential, etc. When this function is linear, we call it Linear Regression.
Classification means sorting data into different classes and predicting the class to which a new input belongs.
For example, predicting whether an image is of a dog or a cat.
In classification, ‘y’ is discrete. In binary classification it is usually 0 or 1, with 0 for the negative class and 1 for the positive class. For example, if we want to classify an image as grey-scale or coloured, we can say that 0 is for coloured and 1 is for grey-scale. This is binary classification, as we have two classes: grey-scale and coloured. In multi-class classification, ‘y’ can take one of several values 0, 1, 2, …, n.
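As a rough illustration of a discrete output, here is a tiny binary classifier in Python. It is a one-feature threshold rule (a "decision stump"), chosen only because it is short; the feature, mean colour saturation, and the training data are hypothetical. Training picks the cut-off that misclassifies the fewest labelled images, and prediction always returns a class, 0 or 1, never a value in between.

```python
# Hypothetical training data: (mean colour saturation of an image, label).
# Labels follow the post: 0 = coloured, 1 = grey-scale.
training_set = [
    (0.02, 1), (0.05, 1), (0.01, 1),
    (0.40, 0), (0.65, 0), (0.30, 0),
]

def fit_threshold(data):
    """Choose the saturation cut-off that misclassifies the fewest training images."""
    values = sorted(x for x, _ in data)
    # Candidate cut-offs: midpoints between neighbouring saturation values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def mistakes(t):
        return sum((1 if x < t else 0) != y for x, y in data)

    return min(candidates, key=mistakes)

def predict(threshold, saturation):
    """Discrete output: 1 (grey-scale) below the cut-off, otherwise 0 (coloured)."""
    return 1 if saturation < threshold else 0

threshold = fit_threshold(training_set)
print(predict(threshold, 0.03), predict(threshold, 0.50))  # 1 0
```

Compare this with regression: here the answer is a label from a fixed set, whereas a regression model could output any real number.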
Let us understand this using a very simple example, which I consider to be the “Hello world” of Machine Learning. Suppose you have the areas of different houses and their prices, and you want to predict the price of a new house using that data.
So the data you have will look somewhat like the following table, with the area of each house in one column and its price in the other:
As you can see, the data here is labelled. The first column, “Area of house”, is called a feature. The second column, “Price”, is usually referred to by the variable ‘y’ and is our output.
Note: A feature is a measurable characteristic of the input data that is used to predict the output.
Now, if we want to predict the price of a house whose area is 3250 square meters, we will use a supervised learning algorithm. Since price is a continuous value, this is a regression problem, so we will be using Linear Regression here.
The obvious question here is, what is Linear Regression?
Linear Regression is an algorithm which is used to find a relation between variables by fitting a linear equation to input data.
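Here’s what a linear equation usually looks like:
$$h_\theta(x) = \theta_0 + \theta_1 x$$
where $h_\theta(x)$ is the hypothesis function, $x$ is the input, and $\theta_0$ and $\theta_1$ are the parameters we need to find.
In linear regression, our main aim is to find $\theta$, i.e. the values of $\theta_0$ and $\theta_1$ that fit the data with minimum error. This is where the cost function comes in.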
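Cost Function measures the error between the actual values and the predicted values. It is given by
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
Note: The summation runs from i = 1 to i = m, where m is the number of training examples and $(x^{(i)}, y^{(i)})$ is the i-th training example.
This is also called the Mean Squared Error function (the extra factor of 1/2 is a common convention that simplifies its derivative). Our goal is to minimize this cost, which is done using Gradient Descent.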
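A minimal Python sketch of the single-feature hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and the cost $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$. The toy data, which lies exactly on y = 1 + 2x, is hypothetical, so the "right" parameters give zero cost and any others give a positive cost.

```python
def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1 / 2m) * sum of squared prediction errors."""
    m = len(xs)
    squared_errors = ((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)

# Hypothetical toy data that lies exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(cost(1, 2, xs, ys))  # 0.0  (perfect fit)
print(cost(0, 2, xs, ys))  # 0.5  (every prediction is off by 1)
```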
Gradient Descent is an algorithm that is used to minimize the cost and find the values of $\theta$. In GD, we start with random values of $\theta$ and update them iteratively until the cost converges to its minimum.
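Below is a sketch of batch gradient descent for the same single-feature case. Each step moves both parameters against the gradient of J, scaled by a learning rate α (α and the iteration count are illustrative choices, not values from this post):
$$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$$
$$\theta_1 := \theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$$
These are the partial derivatives of the cost with the 1/(2m) factor. For reproducibility, the sketch starts from zero rather than from random values.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent on J."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # fixed start; random values work too
    for _ in range(iterations):
        # Prediction errors h(x_i) - y_i using the current parameters
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J = (1/2m) * sum(errors^2)
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # hypothetical data on y = 1 + 2x
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))  # 1.0 2.0
```

Both gradients are computed from the old parameter values before either parameter changes. If α is too large, the updates overshoot and the cost grows instead of shrinking; if it is too small, convergence is slow.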
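Note: In real-world applications, the data usually contains more than one feature (say n features). Hence, $\theta$ is usually a matrix of dimension (1, n+1), one parameter per feature plus $\theta_0$, and x is a matrix of dimension (n+1, m): one column per training example, with an extra row of ones ($x_0 = 1$) so that $\theta_0$ is included in the product $\theta x$.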
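A small NumPy sketch of these matrix shapes: with n features and m examples, stacking a row of ones on top of the n feature rows gives x the shape (n+1, m), so a single product θx computes every prediction at once, and the cost follows from it. All numbers are hypothetical.

```python
import numpy as np

# Hypothetical data: m = 4 examples, n = 2 features per example.
features = np.array([[1.0, 2.0, 3.0, 4.0],   # feature 1 for each example
                     [0.5, 0.0, 1.5, 1.0]])  # feature 2 for each example
n, m = features.shape

# Add the constant row x_0 = 1 so theta_0 acts as the intercept: shape (n+1, m).
X = np.vstack([np.ones((1, m)), features])

theta = np.array([[1.0, 2.0, -1.0]])          # shape (1, n+1)
predictions = theta @ X                       # shape (1, m): one prediction per example

y = np.array([[2.5, 5.0, 6.5, 8.0]])          # hypothetical targets, shape (1, m)
J = np.sum((predictions - y) ** 2) / (2 * m)  # same cost formula, vectorized

print(X.shape, theta.shape, predictions.shape)  # (3, 4) (1, 3) (1, 4)
print(J)  # 0.125
```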
Unsupervised Learning
Unsupervised Learning means that the training data is not supervised, i.e. it is not labelled. We don’t know what we are predicting, so we let the machine do all the work. There is no correct output here; we are trying to find patterns in the existing training data.
One of the major techniques in Unsupervised Learning is Clustering.
In Clustering, we form groups of training data based on certain similarities between them. For example, if our data is a collection of news articles, they can be grouped by domain, such as agriculture, natural disasters or business. We do not know in advance how many such groups can be formed. Hence, we let the program do its work and group similar news articles together.
For example, suppose we have data on 5 candidates as follows:
Our clustering algorithm will find a pattern in this data and form groups that correspond to interns, full-time employees (FTEs) and part-time employees, as shown in the image below:
This is a very basic example of Clustering. Clustering has wide applications in almost every industry.
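The post doesn’t name a specific clustering algorithm, so as one common example, here is a minimal pure-Python sketch of k-means (Lloyd’s algorithm). The candidate data, (hours worked per week, months with the company), is made up for illustration and is not taken from the example above. The algorithm only returns group numbers; interpreting a group as "interns" or "FTEs" is up to us.

```python
import math

def kmeans(points, k, iterations=100):
    """Group points into k clusters with Lloyd's k-means algorithm."""
    # Deterministic "farthest-first" start: begin with the first point, then
    # repeatedly add the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # no point changed group, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids

# Hypothetical candidates: (hours worked per week, months with the company).
candidates = [(40, 2), (38, 3),    # we might read this group as interns
              (40, 48), (42, 60),  # ... this one as full-time employees
              (20, 12), (18, 10)]  # ... and this one as part-time employees
labels, centres = kmeans(candidates, k=3)
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Unlike the news example, k-means needs the number of clusters k up front; algorithms such as DBSCAN do not require it. The farthest-first starting centroids are a simple deterministic choice for this sketch; k-means++ initialization is common in practice.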
Machine Learning is not limited to these algorithms; there are several other algorithms with many applications in various domains. In addition, many researchers are constantly working to make ML algorithms more efficient. The scope of Machine Learning goes beyond just prediction: it can help computers learn to do things that even humans can’t. That is the power of ML.
Now, you might remember me using the Netflix recommendation system as an example at the beginning of this post. Well, I’ll be explaining all about it in my next post, so stay tuned.
Hope you all enjoyed it!