People worry that computers will get too smart and take over the world, but the real problem is that they're too stupid and they've already taken over the world. — Pedro Domingos
It feels like Google or Facebook releases a new AI technology every week to speed up their products or improve the user experience. In this article, we'll cover what Machine Learning is and the main types of Machine Learning.
What is Machine Learning
Machine Learning (ML) is a method of data analysis that allows a system to learn without being explicitly programmed. Or, in the words of Tom Mitchell: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Machine Learning is based on the idea that analytic systems can learn to identify patterns and make decisions with minimal human involvement using statistics, linear algebra, and numerical optimization.
I like to think of Machine Learning as a way of writing programs whose business logic is generated from input data. We feed data to the algorithm, and the result of the program's execution is the logic for processing new data. It is a new way of writing software, a step away from the traditional development process.
To illustrate this even better, let's move on to the more familiar action of preparing dinner. Let's imagine that we want to make a pepperoni pizza. Pizza... the ultimate open-faced sandwich.
We have the ingredients and the recipe — these will be our inputs. We follow the recipe and by following the correct sequence of steps, we end up with a ready-to-eat pizza. This is traditional programming. To get the right result we need the recipe. We can write it ourselves or ask mom to write it for us.
With Machine Learning, it's a little different: we don't know the recipe, we can't or don't want to write it, but we still want the pizza. We have a bunch of ingredients and some idea of pizza. In our case, of course, it's an ad from Papa John's: epic-crust pizza with cheese and pepperoni topping. Yummy. We have no idea how to make it, but we can invite our friends over and learn from their feedback whether what we've made is close to the kind of pizza we want, and maybe even delicious!
A few angry reviews later, we realize that what we've made is not quite pizza. We continue to analyze pizza "metrics" over months or even years to determine which factors influenced the look, taste, and quality of the end result. At some point, we finally arrive at what our friends mean by real pepperoni pizza. We write down the sequence of steps as a recipe. Our algorithm is ready!
It is important to emphasize the data-driven nature of ML. This seems trivial, but it is surprising how often ML engineers forget it: they focus on building a better model rather than on improving the data the model is built on. This makes collecting representative data crucial to the success of machine learning.
We should train our models on data that correctly represents the conditions under which the model will work in the real world. If we want to identify spam, we need examples of spam messages. If we want to predict stock prices, we need a history of prices. If we want to know a user's interests, we need the user's click history. The data is collected in any way possible: some people do it manually, which takes longer and usually yields less data, but with fewer errors. There is a real hunt for good data sets; big companies sometimes reveal their algorithms, but very rarely their data.
Machine Learning types
Above, we defined the main things needed to specify an ML algorithm: the input data and the problem to be solved. As great people used to say, in order to possess something, you need to classify it. That's what we'll do.
High-level types of Machine Learning problems can be classified according to the data available. As always, the classification is somewhat arbitrary.
The data can come in the following forms:
- Labeled data
- Unlabeled data
- Reward signals
Accordingly, all tasks in machine learning fall into three groups (in my interpretation):
- supervised learning,
- unsupervised learning,
- reinforcement learning.
The field is fairly new, and new problems that differ by definition from the classical ones are invented every year. They are difficult to classify in any way, so for me there is a fourth group: anything you can think of.
In this post, we will look at the classical groups, but without explaining the algorithms in detail.
Supervised learning

In this type of ML, the data consists of a set of input records, each with a corresponding label, and the goal is to learn to classify unlabeled data. The output labels can be either categorical or numerical, which leads to the two types of supervised machine learning: classification and regression.
In classification, the goal is to split the data into a discrete set of classes. In binary classification, there are only two classes to predict, such as "hotdog" or "not hotdog". In multiclass classification, there are three or more classes, as in the problem of predicting the type of an alcoholic drink.
An algorithm often used for classification is the decision tree. It works like this: if we want to classify alcoholic drinks by type, the algorithm builds a tree of questions during training, such as "Is the drink stronger than 40%?"
- Yes — we immediately classify it as whiskey. All whiskey has a strength greater than 40%.
- No. Then we move on to the next condition.
During training, a tree of conditions is created; new data is passed through it and falls into one group or another. This algorithm is convenient from the point of view of business interpretation, because we can always see which features split the data best.
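The tiny tree from the example can be written out by hand. This is only a sketch: the thresholds (40% for whiskey, 10% for wine) and the drink names are made up for illustration, whereas a real decision-tree learner infers such splits from labeled examples.

```python
# Hand-built "decision tree" mirroring the questions from the text.
# Thresholds and drink names are illustrative, not learned from data.
def classify_drink(abv):
    """Classify a drink by its alcohol-by-volume percentage."""
    if abv > 40:       # "Is the drink stronger than 40%?"
        return "whiskey"
    if abv > 10:       # next condition in the tree
        return "wine"
    return "beer"

print(classify_drink(43))  # whiskey
print(classify_drink(5))   # beer
```

A real library learns the questions and thresholds from labeled examples instead of hard-coding them, but the resulting structure is exactly this chain of conditions.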
In regression, the goal is to assign a number to the input data: the price of an apartment, the expected income of a store for the next month, or a credit score estimating how long it will take a client to repay a loan.
The simplest algorithm here is linear regression. Imagine that our objects are points on a plane. Our task is to draw a regression line that lies as close as possible to all the points. By doing so, we find the linear coefficients that relate the input data to the output value.
For example, we want to find how the diameter of the dark circles under a programmer's eyes depends on the amount of code written. In this case, we are interested in the dependence of the value "diameter of dark circles under the eyes" on the value "number of lines of code written". By performing a regression analysis, we get a kind of "magic" formula: plug in your numbers, and it returns the average dark-circle diameter of people who write that much code.
Such an algorithm is simple and computationally cheap. It is convenient when we have many features and few objects.
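For a single feature, the "magic formula" has a closed form: the slope is the covariance of x and y divided by the variance of x. A minimal sketch, with entirely made-up numbers for the lines-of-code example:

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Made-up data: lines of code written vs. dark-circle diameter in mm
loc = [100, 200, 300, 400]
mm = [1.0, 2.0, 3.0, 4.0]
a, b = fit_line(loc, mm)
print(a * 250 + b)  # predicted diameter for 250 lines of code: 2.5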
Unsupervised learning

In the case of unsupervised learning, things get even more interesting. Here there are no answers and no labels. The goal of this group of algorithms is to discover the structure of the data, or the law that generates the data, when it is not initially evident that there is one.
What can we do in this case? We can observe similarities between objects and gather them into groups, or clusters. This is the clustering task. Clustering answers questions about how to divide the objects under study into groups and how similar the objects are: for example, dividing all customers of a mobile operator by ability to pay, or categorizing space objects into one class or another (planet, star, black hole, etc.).
The most popular clustering algorithm is k-means. Let's go back to alcohol. We want to divide our bad guys into four groups. Our drinks become points on the plane. We randomly choose the centers of the groups (centroids) and assign each point to the nearest centroid. Then we move each centroid to the mean of the points assigned to it and reassign every point to its now-nearest centroid. After several iterations of this, we have well-separated groups.
The difficulty with this algorithm is that objects do not always divide well into groups, so it is hard to assess the correctness of the result, even with special evaluation methods.
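The assign-and-move loop fits in a few lines. Everything here is illustrative: the points, the fixed number of iterations, and the use of plain 2D coordinates instead of real drink features.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means on 2D points: assign each point to its nearest
    centroid, then move each centroid to the mean of its points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        for i, cl in enumerate(clusters):  # update step
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return sorted(centroids)

# Two obvious blobs; centroids settle near (1.33, 1.33) and (8.33, 8.33)
pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
print(kmeans(pts, k=2))
```

Production implementations add smarter initialization and a convergence check instead of a fixed iteration count, but the two alternating steps are the same.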
Some objects may differ greatly from every cluster, so we can treat them as anomalies. This is anomaly detection. In anomaly detection tasks, we try to isolate the data objects that look "suspicious" compared to most other objects: for example, identifying atypical (suspicious) bank card transactions.
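A crude version of this idea: flag values that sit far from the mean, measured in standard deviations. The transaction amounts and the two-standard-deviation threshold below are made up for illustration; real detectors work on many features at once.

```python
def zscore_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations
    from the mean: a very crude one-dimensional anomaly detector."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) > threshold * std]

# Typical card transactions plus one suspicious outlier
print(zscore_anomalies([20, 25, 22, 30, 18, 24, 5000]))  # [5000]
```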
Now take another situation, where each object in the sample has a hundred different attributes. The main difficulty is then the visual representation of such a sample, so the number of features is reduced to two or three, making it possible to visualize them on a plane or in 3D. This is the problem of dimensionality reduction: transforming a large number of features into a smaller one (usually 2-3) for convenient visualization or further processing of the encoded data.
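One classic way to do this is principal component analysis: find the direction along which the data varies most and project onto it. Below is a toy sketch that finds that direction by power iteration on the covariance matrix; the data set is made up so the answer is obvious.

```python
def top_component(data, iters=100):
    """Find the first principal component of `data` (a list of
    equal-length rows) by power iteration on the covariance matrix."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):  # repeatedly apply cov and renormalize
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Points lying on the line y = x: the main direction is (0.707, 0.707)
print(top_component([[0, 0], [1, 1], [2, 2], [3, 3]]))
```

Projecting each centered row onto this direction (and onto the next one or two components) gives the 2D or 3D coordinates used for visualization.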
Reinforcement learning

Reinforcement learning is very different from the previous tasks, because here we have neither labeled nor unlabeled data.
The system consists of a model or agent and an environment. The agent will interact with the dynamic world to achieve a certain goal. The dynamic world will reward or punish the agent based on its actions. Over time, the agent will learn to navigate the dynamic world and achieve its goal(s) based on the rewards and punishments it receives.
In my mind, this is what artificial intelligence is all about.
Imagine that you are a robot in some bizarre place. You can perform actions and receive rewards from the environment. After each action, your behavior becomes more complex and smarter as you train yourself to behave in the most efficient way possible at each step. In biology, this is called adapting to the natural environment; in the human world, it is called social adaptation (meh).
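The agent-environment-reward loop can be shown in the simplest possible setting, a multi-armed bandit. Everything here is an illustrative assumption: the hidden payoffs, the exploration rate, and the reward noise.

```python
import random

def train_bandit(true_rewards, episodes=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly pull the best-looking arm (exploit),
    sometimes pull a random arm (explore), and learn from the rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)  # the agent's value estimates
    counts = [0] * len(true_rewards)
    for _ in range(episodes):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rewards))  # explore
        else:
            arm = estimates.index(max(estimates))   # exploit
        # the "environment" answers with a noisy reward
        reward = true_rewards[arm] + rng.gauss(0, 0.1)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

# The agent never sees these payoffs; it discovers that arm 2 is best
est = train_bandit([0.2, 0.5, 0.9])
print(est.index(max(est)))  # 2
```

Real reinforcement learning adds states and sequences of actions on top of this loop, but the core cycle of act, receive reward, update estimates is the same.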
Conclusions

- A machine learning algorithm discovers and formalizes the principles behind the data it sees. With this knowledge, the algorithm can "reason" about the properties of previously unseen examples.
- Machine learning offers a wide variety of approaches to solving a particular problem, not just one. These approaches have different characteristics and different problems they suit best.
- Machine Learning is not just glorified Statistics. Machine learning is a class of computational algorithms, and I would say it is more a field of computer science than of statistics, even though it works with and depends on data.