20 Algorithms Commonly Used in Machine Learning | Introducing Two Charms of Python

What is machine learning

Machine learning is a method in which data is fed into a computer and the computer analyzes that data by itself using an algorithm.

Artificial intelligence is an attempt to make computers behave like humans. It has been attracting attention all over the world in recent years, but for computers to make their own judgments without being given explicit instructions, they need the ability to learn.

Machine learning is one of the many elements that make up artificial intelligence.

What is an algorithm

An algorithm is a procedure or calculation method for solving a problem.

A defining property of an algorithm is that anyone who follows the procedure gets the same answer. Algorithms are generally implemented as computer programs, and machine learning uses algorithms to analyze data.

There are various types of algorithms used in machine learning.

Types of machine learning

There are three types of machine learning.

Depending on the type of data used, machine learning can be divided into supervised learning, unsupervised learning, and reinforcement learning.

Each type is introduced below for reference.

supervised learning

“Supervised learning” is a method of learning in which the computer is given data labeled with the correct answers.

By repeatedly learning from inputs paired with their correct answers, the computer becomes able to produce the correct answer even when new data is input.

For example, if you want a computer to distinguish whether an image shows a cat or a dog, you give it a large amount of image data labeled “cat” or “dog”. After training, it will be able to tell which is a cat and which is a dog.

unsupervised learning

“Unsupervised learning” is a method of giving unlabeled data to a computer and letting it learn features and tendencies by itself.

Because no correct labels are given, the computer is trained to group the input values by similarity, so that when new data is input it can tell which group the data belongs to.

Grouping in unsupervised learning is called “clustering” and is also used for marketing analysis.

reinforcement learning

“Reinforcement learning” is a method in which the computer’s outputs are scored and the computer learns to produce higher-scoring outputs.

A value (reward) is set for each result the computer outputs, and the computer repeats the task so as to maximize that value, gradually improving its output.

Reinforcement learning is used in artificial intelligence that plays games such as Go, where the program becomes stronger by trying various moves in order to win.

What is deep learning

Deep learning is machine learning that uses a “deep neural network”, which lets the computer automatically extract features from data when a sufficient amount of data is available.

As one of the machine learning methods, it is characterized by the computer itself discovering rules and patterns in the data and automatically setting and learning the feature values.

In addition, deep learning can learn features that humans cannot find.

What is the relationship between machine learning and deep learning?

Deep learning is one of many machine learning techniques.

Machine learning can classify new data and predict future values by discovering rules and patterns in a huge amount of data. In conventional machine learning, humans improve accuracy by judging which features to use and adjusting them.

In deep learning, by contrast, the computer itself discovers the features, so much less human judgment is needed at that step.

20 Commonly Used Algorithms for Machine Learning

Here are some algorithms that are commonly used in machine learning.

There are various types of algorithms, and even people who are interested in machine learning may not know what types exist.

The 20 algorithms below are often used in machine learning; use them to deepen your understanding of machine learning algorithms.

Algorithm 1 Used in Machine Learning: Linear Regression

Linear regression is an algorithm that represents a scattered data distribution with a single straight line.

Linear regression is a type of regression analysis. Regression is a form of “supervised learning” and is used to predict continuous quantities such as growth rates.

Linear regression is very simple: it gets its name from drawing the straight line that best represents scattered data points.
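As a minimal sketch of drawing that line, the following fits a slope and intercept by ordinary least squares in plain Python. The function name `fit_line` and the sample data are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample data scattered around a rising line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

In practice a library routine would normally be used; the hand-written version is only meant to show what the fitted line is.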

Algorithm 2 Used in Machine Learning: Regularization Method

Regularization is a technique used to prevent overfitting.

“Overfitting” means that a model responds more than necessary to biased training data. Regularization prevents this by penalizing models whose weights take extreme values.

Regularization adds a penalty that grows with model complexity to the training error; choosing the model that minimizes this sum improves generalization performance.
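One common form of this idea is an L2 (ridge) penalty. The sketch below assumes a one-feature model without an intercept, for which the penalized least-squares weight has a simple closed form; `ridge_weight` and the data are hypothetical names and values chosen for illustration.

```python
def ridge_weight(xs, ys, penalty):
    """Weight w minimizing sum((y - w*x)**2) + penalty * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # The penalty enlarges the denominator, shrinking the weight toward zero.
    return sxy / (sxx + penalty)


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
unpenalized = ridge_weight(xs, ys, penalty=0.0)
penalized = ridge_weight(xs, ys, penalty=14.0)
print(unpenalized, penalized)
```

A larger penalty gives a smaller weight, which is the trade-off between training error and model complexity described above.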

Algorithm 3 Used in Machine Learning: Decision Trees

A decision tree is an algorithm that performs classification and regression using a tree structure.

“Classification” is a method for determining the category or type of the data being analyzed. Decision trees are the building blocks of random forests, a typical classification method.

A decision tree is a threshold-based classification method that uses a tree model to make decisions. In addition, methods such as classification trees and regression trees are collectively called decision trees.
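To illustrate a threshold-based decision, here is a sketch of a single tree node (often called a decision stump) that searches for the best threshold on one feature. A full decision tree would apply such splits recursively; the function name and data are assumptions for this example.

```python
def fit_stump(values, labels):
    """Find the threshold that best splits 1-D values into two classes.

    Predicts class 1 when value > threshold, else class 0.
    Returns (threshold, errors). Assumes at least two distinct values.
    """
    points = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in candidates:
        errors = sum((v > threshold) != bool(y) for v, y in zip(values, labels))
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


# Hypothetical feature values with 0/1 class labels.
values = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]
threshold, errors = fit_stump(values, labels)
print(threshold, errors)
```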

Algorithm 4 Used in Machine Learning: Logistic Regression Analysis

Logistic regression analysis is an algorithm that models the probability that an event occurs.

Logistic regression analysis is an algorithm developed in the field of medicine and is an adaptation of the linear regression mentioned above. It predicts the probability of occurrence from the combination and magnitude of multiple factors.

Logistic regression analysis deals with problems whose answer is Yes or No. Where other analyses such as linear regression can produce negative values or values above 1, logistic regression always outputs a value between 0 and 1, which is easy to interpret as a probability.
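The following sketch shows how a logistic model turns a weighted sum of factors into a probability between 0 and 1. The coefficients are hand-picked, hypothetical values rather than the result of fitting real data, and the function names are assumptions.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Probability of the Yes class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical, hand-picked coefficients (not fitted to any data).
weights = [0.8, -0.5]
bias = -0.2
p = predict_probability([2.0, 1.0], weights, bias)
label = "Yes" if p >= 0.5 else "No"
print(f"{p:.3f} -> {label}")
```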

Algorithm 5 Used in Machine Learning: k-means

The k-means method is an algorithm that classifies data into a predetermined number of clusters (k) using the mean of each cluster.

The k-means method is a “non-hierarchical cluster analysis”, a classification method without a hierarchical structure. The number of clusters is decided in advance, and the data is classified into that many clusters.

Compared with hierarchical cluster analysis, it is better suited to large amounts of data. The k-means method is also called “k-means clustering”.
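A minimal sketch of the procedure for one-dimensional data is shown below. It alternates between assigning each value to the nearest cluster mean and recomputing the means. The starting centers are supplied by the caller, and the function name and data are assumptions for illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster 1-D values around len(centers) means.

    Returns (centers, clusters), where clusters holds the final assignment.
    """
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical data with two obvious groups, and k = 2 starting centers.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```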

Algorithm 6 Used in Machine Learning: k-Nearest Neighbors

The k-nearest neighbor method is an algorithm that performs classification and anomaly detection using the distance between data points.

The k-nearest neighbor method takes the k data points closest to the point to be classified and assigns that point to the class held by the majority of those neighbors. Therefore, the value of k greatly influences the result.

Among classification algorithms, the k-nearest neighbor method is a simple and easy-to-understand one.
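The majority vote can be sketched in a few lines. This example assumes Euclidean distance (via `math.dist`, available in Python 3.8 and later) and uses made-up two-dimensional points; `knn_classify` is a hypothetical name.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; points are coordinate tuples.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled points.
train = [
    ((1, 1), "cat"), ((1, 2), "cat"), ((2, 1), "cat"),
    ((6, 6), "dog"), ((7, 6), "dog"), ((6, 7), "dog"),
]
print(knn_classify(train, (2, 2)))
print(knn_classify(train, (6, 5)))
```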

Algorithm 7 Used in Machine Learning: Support Vector Machines

A support vector machine is an algorithm that draws boundaries to divide a data distribution into multiple classes.

It is a supervised learning method that can be used for both classification and regression, and is frequently used because of its high discrimination accuracy. Furthermore, with kernel functions it can also perform non-linear discrimination.

Support vector machines use different calculations for classification and regression, so the details differ even though the algorithm has the same name.

Algorithm 8 Used in Machine Learning: Naive Bayes

Naive Bayes is an algorithm that computes the estimated probability of every class given the data and outputs the class with the highest probability as the result.

Bayes’ theorem is used to determine which estimate is most plausible when a dataset supports multiple estimates.

Naive Bayes is also called a “simple Bayes classifier”. It is a simple algorithm based on Bayes’ theorem that treats the features as independent of one another.
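As a sketch of how the highest-probability class is chosen, the code below multiplies a class prior by per-feature likelihoods and normalizes the result. All probabilities are invented for illustration, and the function name is an assumption.

```python
def naive_bayes_posteriors(priors, likelihoods, observed):
    """Posterior probability of each class given the observed features.

    priors: {class: P(class)}
    likelihoods: {class: {feature: P(feature | class)}}
    observed: features present in the sample, treated as independent
              given the class (the "naive" assumption).
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature in observed:
            score *= likelihoods[cls][feature]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


# Hypothetical probabilities for a tiny spam filter.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}
posteriors = naive_bayes_posteriors(priors, likelihoods, ["free"])
prediction = max(posteriors, key=posteriors.get)
print(prediction, posteriors)
```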

Algorithm 9 Used in Machine Learning: Neural Networks

Neural network is a general term for models in which artificial neurons solve problems.

It is a mathematical model that forms a network by connecting artificial neurons, in the way synapses connect neurons in the human brain. It is sometimes called an “artificial neural network”.

Neural networks can be trained with either supervised or unsupervised learning, and are applied to pattern recognition and data mining.

Algorithm 10 Used in Machine Learning: AdaBoost

AdaBoost is an algorithm that constructs a strong classifier by combining weak classifiers.

AdaBoost applies a weak classifier that is only slightly more accurate than random guessing, increases the weights of the samples it misclassified, and then trains the next weak classifier to prioritize those heavily weighted samples.

Combining classifiers in this way makes accurate classification easier, but the algorithm can also be prone to overfitting.
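The reweighting step can be sketched as follows. This shows a single round only, using the standard AdaBoost formula for the classifier weight; training the weak classifiers themselves is omitted, and the function name and inputs are assumptions.

```python
import math


def adaboost_round(weights, correct):
    """One AdaBoost reweighting step.

    weights: current sample weights (summing to 1).
    correct: True where the weak classifier was right, False where wrong.
    Returns (alpha, new_weights); alpha is the classifier's vote strength.
    Assumes the weighted error is strictly between 0 and 1.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples are scaled up, correct ones are scaled down.
    scaled = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]


# Hypothetical round: four equally weighted samples, one misclassified.
alpha, new_weights = adaboost_round([0.25] * 4, [True, True, True, False])
print(alpha, new_weights)
```

After the update, the misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.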

Algorithm 11 Used in Machine Learning: Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) is an algorithm that constructs a Markov chain in order to sample from a probability distribution.

It is used to find quantities such as the mean or mode of a probability distribution, and a common application is the numerical computation of multiple integrals.

With the random walk method, we imagine particles moving around randomly and, every time a particle visits a point, add the value of the integrand at that point to a running total that estimates the integral.
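One widely used random-walk variant is the Metropolis algorithm. The sketch below, written under the assumption of a one-dimensional target known only up to a constant, samples from a standard normal distribution and averages a function over the visited points to estimate an integral. The function name, step size, and seed are illustrative choices.

```python
import math
import random


def metropolis(log_density, start, steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


# Target: standard normal, specified only up to a constant factor.
samples = metropolis(lambda x: -0.5 * x * x, start=0.0, steps=20000)
mean = sum(samples) / len(samples)
# Averaging x**2 over the visited points estimates the integral of x**2 * p(x).
second_moment = sum(s * s for s in samples) / len(samples)
print(mean, second_moment)
```

For this target the estimates should land near 0 and 1 respectively, with some sampling noise.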

Algorithm 12 Used in Machine Learning: Random Forest

Random forest is a typical algorithm that uses decision trees.

It is an algorithm used for classification, regression, and clustering; it uses many decision trees and takes a majority vote or an average of their outputs.

A random forest gathers many decision trees into a single classifier that is more accurate than any one tree. For regression, regression trees can be gathered in the same way.
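The sketch below covers only the final aggregation step, assuming the individual trees have already produced their predictions. Training the trees on random subsets of the data is omitted, and both function names are hypothetical.

```python
from collections import Counter


def forest_classify(tree_predictions):
    """Combine class predictions from many trees by majority vote."""
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_regress(tree_predictions):
    """Combine numeric predictions from many trees by averaging."""
    return sum(tree_predictions) / len(tree_predictions)


# Hypothetical outputs of three trees for one sample.
print(forest_classify(["cat", "dog", "cat"]))
print(forest_regress([2.0, 4.0, 6.0]))
```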

Algorithm 13 Used in Machine Learning: Perceptron

A perceptron is a type of neural network.

The perceptron is a neural network announced in 1958 and can be said to be one of the origins of machine learning. It forms a network by connecting multiple formal neurons.

A perceptron with two layers is called a simple perceptron, and a perceptron with three or more layers is called a multi-layer perceptron. The latter is now predominant.
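A simple perceptron can be sketched as a single neuron trained with the classic error-correction rule. The example below learns the logical AND function; the function names, learning rate, and epoch count are assumptions chosen for illustration.

```python
def predict(weights, bias, inputs):
    """Output 1 if the weighted sum exceeds zero, else 0."""
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Train a single neuron on (inputs, target) pairs with 0/1 targets."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # The weights change only when the prediction is wrong.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Truth table of logical AND, which is linearly separable.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, inputs) for inputs, _ in and_gate])
```

A single neuron like this can only learn linearly separable functions, which is one reason multi-layer perceptrons became the more common choice.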

Algorithm 14 Used in Machine Learning: Principal Component Analysis

Principal component analysis is an algorithm that compresses multidimensional data with many variables into fewer dimensions.

In machine learning, accuracy may deteriorate if there are too many features, so it is sometimes necessary to reduce them by dimensionality reduction.

Principal component analysis compresses multidimensional data by combining correlated variables while preserving as much of the original information as possible, which makes the data easier to grasp as a whole.
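For two-dimensional data the direction of greatest variance has a closed form, which allows a short sketch of compressing two dimensions into one. Higher-dimensional data would normally need an eigenvalue routine from a numerical library; the function names and data here are assumptions.

```python
import math


def first_principal_axis(points):
    """Return (unit axis, mean) for the direction of greatest variance in 2-D."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form angle of the leading eigenvector of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    return (math.cos(angle), math.sin(angle)), (mean_x, mean_y)


def project(points):
    """Compress each 2-D point to one coordinate along the first axis."""
    (ax, ay), (mx, my) = first_principal_axis(points)
    return [(x - mx) * ax + (y - my) * ay for x, y in points]


# Hypothetical points lying on the line y = x.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
axis, mean = first_principal_axis(points)
compressed = project(points)
print(axis, compressed)
```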

Algorithm 15 Used in Machine Learning: Non-negative Matrix Factorization

Non-negative matrix factorization is an algorithm that decomposes a matrix containing no negative values into a product of matrices that also contain no negative values.

Because non-negative matrix factorization does not use negative values, it is an addition-only approach.

When it is used to extract the features of a person’s face, the face is represented by adding parts together, much as one would draw an actual face, so the features can be captured well.

Algorithm 16 Used in Machine Learning: Topic Models

A topic model is an algorithm for describing what a document’s topic is.

Even when people write about the same subject, the words they use vary from person to person. A topic model is applied to a data set of documents to estimate the topic of each one.

In addition, topic models can be used not only for text but also for images and music.

Algorithm 17 Used in Machine Learning: Gaussian Mixture Model

The Gaussian mixture model is an algorithm, used for clustering, that combines multiple Gaussian distributions.

A Gaussian distribution has a bell-shaped curve. By preparing several of them, adjusting the mean and covariance of each, and adding them together with weights, a distribution can be approximated with arbitrary accuracy. A model approximated by adding Gaussian distributions in this way is called a Gaussian mixture model.
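Adding Gaussian distributions together can be sketched as a weighted sum of densities. The example is restricted to one dimension, where a standard deviation stands in for the covariance, and it only evaluates a given mixture rather than fitting one; the names and parameters are assumptions.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a Gaussian mixture.

    components: list of (weight, mean, std), with weights summing to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


# Hypothetical mixture with two equally weighted bumps at -2 and +2.
components = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]
for x in (-2.0, 0.0, 2.0):
    print(x, round(mixture_pdf(x, components), 4))
```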

Algorithm 18 Used in Machine Learning: Collaborative Filtering

Collaborative filtering is an algorithm that analyzes purchase patterns using the purchase data of the target person together with the purchase data of other people.

Collaborative filtering analyzes the similarity and co-occurrence of products from purchase patterns and, by relating them to the target person’s behavior history, makes it possible to present products personalized to each individual.
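A very small user-based variant can be sketched with cosine similarity between purchase vectors: find the most similar other customer and suggest what they bought that the target has not. The item names, purchase data, and function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length purchase vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(target, others, items):
    """Suggest items bought by the most similar user but not by the target."""
    best = max(others, key=lambda user: cosine_similarity(target, user))
    return [item for item, mine, theirs in zip(items, target, best)
            if theirs and not mine]


# Hypothetical purchase history: 1 = bought, 0 = not bought.
items = ["tea", "coffee", "mug", "kettle"]
target = [1, 0, 1, 0]
others = [
    [1, 0, 1, 1],  # similar taste, also bought a kettle
    [0, 1, 0, 0],  # different taste
]
suggestions = recommend(target, others, items)
print(suggestions)
```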

Algorithm 19 Used in Machine Learning: Self-Organizing Maps

A self-organizing map is a neural network model that expresses the similarity of input information as a distance on the map.

A self-organizing map is a data mining method; it is called self-organizing because it can cluster high-dimensional data without supervision.

It is an algorithm with strengths in data classification, summarization, and visualization.

Algorithm 20 Used in Machine Learning: Association Analysis

Association analysis is a well-known algorithm for data analysis in marketing.

It discovers relationships between the sales of products, such as products A and B, by analyzing customers’ purchase patterns and purchase histories.

By using association analysis, you can find rules in how products sell, identify slow-selling products, and implement measures to increase sales.
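Two measures commonly used to express such rules are support (how often items appear together) and confidence (how often B is bought when A is bought). The sketch below computes both for made-up shopping baskets; the function names and data are assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule A -> B."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
print(support(baskets, {"bread", "butter"}))
print(confidence(baskets, {"bread"}, {"butter"}))
```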

Two attractions of Python used for machine learning

Python is the programming language most often used for machine learning, but why is that?

Finally, here are two attractive features of Python.

Attraction of Python used for machine learning 1: Simplicity

Python is characterized by its very simple writing style.

Python is a programming language with a philosophy of simplicity. The code is short and simple, and there is little to remember, making it easy for beginners to learn.

In addition, since the code is short, it is easy to read, and even if an engineer other than the developer looks at the source code, it is easy to understand what is written, and it is less likely to cause bugs.

Attraction of Python used for machine learning 2: Abundant libraries and frameworks

Python has the advantage of offering a wide variety of libraries and frameworks.

An extensive standard library comes with Python from the beginning, and many third-party libraries and frameworks for machine learning can be added with a package installer.

Therefore, in Python you can implement advanced processing without writing much code yourself.

Learn algorithms commonly used in machine learning

Various algorithms are used in machine learning.

Deepen your understanding of machine learning by referring to the algorithms and the attractions of Python introduced in this article.