Tuesday, May 28, 2024

What is CNN (Convolutional Neural Network)? Easy-to-understand explanations using diagrams and examples!

In 2012, Professor Hinton of the University of Toronto in Canada ushered in a new spring for AI. The breakthrough was a machine learning technique called “deep learning”.

By building models that loosely mimic how the human brain works, deep learning has dramatically improved the accuracy of machine translation and speech recognition. Image recognition in particular is a field with many applications, such as self-driving technology.

In this article, I would like to introduce the CNN, which is often used for image recognition in deep learning.

Table of Contents

  • What is CNN?
    • What is image recognition?
    • Convolutional layer
    • Pooling layer
  • Applications of CNN
    • Unmanned cash register store
    • Object detection using drive recorder
    • Diagnostic imaging in medicine
  • In closing

What is CNN?

A CNN (Convolutional Neural Network) is a neural network structure that incorporates an operation called “convolution”. Its most important characteristic is “local feature extraction”.

To make this characteristic easier to understand, we will first explain “image recognition”.

What is image recognition?

Humans can look at an image from various angles and read what kind of objects it contains. For a computer, by contrast, an image is simply numerical data.

As shown in the figure, when a computer recognizes a “snowflake”, it divides the image into pixels and extracts the features of the image from the magnitudes of the pixel values.

The clear images that we usually see are represented by very fine pixels.
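To make the idea of "an image as numerical data" concrete, here is a minimal sketch using NumPy. The 5x5 grid and its pixel values are made up for illustration and are not taken from the article's figure; the sketch assumes a grayscale image in which 0 is black and 255 is white.

```python
import numpy as np

# A hypothetical 5x5 grayscale image (values invented for illustration).
# Each entry is one pixel: 0 is black, 255 is white.
image = np.array([
    [  0,   0, 255,   0,   0],
    [  0, 255, 255, 255,   0],
    [255, 255, 255, 255, 255],
    [  0, 255, 255, 255,   0],
    [  0,   0, 255,   0,   0],
], dtype=np.uint8)

print(image.shape)   # (5, 5): height and width in pixels
print(image[2, 2])   # 255: brightness of the centre pixel
print(image[0, 0])   # 0: brightness of the top-left pixel
```

A real photograph works the same way, only with far more pixels and, for color images, several values per pixel.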

Before the advent of CNNs, the problem was how to perform the “feature extraction” in image recognition efficiently. What made this possible is the CNN, a neural network structure that includes convolutional layers.

Convolutional layer

What should we do when we want image recognition to be more accurate, in other words, to recognize an image the way we intend?

For example, suppose we want image recognition to correctly identify a cat. If the input also contains irrelevant content such as the background, the amount of information becomes enormous, and accuracy drops because the model also tries to identify the irrelevant parts. This is easy enough to imagine.

The solution to this problem is the CNN, a neural network structure that incorporates a “convolutional layer”.

Let us illustrate a convolutional layer.

The sequence of steps is as follows:

  1. Prepare a relatively small grid of numerical data called a filter (or kernel) and a partial image (window) of the same size.
  2. Compute a single value from the filter and the window (multiply the corresponding entries and add them up), store it, shift the filter by the amount set by the stride, and compute again.
  3. When the calculation has been completed for the whole input, the set of values obtained is output as the locally extracted features.
  4. By passing these values through the neural network, it is possible to determine what kind of features the image has.

The flow is as above.
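The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `convolve2d`, the 4x4 input, and the 2x2 filter values are all hypothetical, no padding is used, and the filter is not flipped (which is how deep learning libraries usually define "convolution").

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and collect one value per window."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Step 1: take the window that has the same size as the filter.
            window = image[top:top + kh, left:left + kw]
            # Step 2: multiply entry by entry, sum, and store the value.
            output[i, j] = np.sum(window * kernel)
    # Step 3: the collected values are the locally extracted features.
    return output

# Hypothetical input and filter, chosen only to keep the arithmetic easy.
image = np.array([
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
], dtype=float)
kernel = np.array([
    [1, 0],
    [0, 1],
], dtype=float)

feature_map = convolve2d(image, kernel, stride=1)
print(feature_map)
# [[2. 4. 6.]
#  [0. 2. 4.]
#  [6. 0. 2.]]
```

With `stride=2` the same call visits only every second window position, so the output shrinks to 2x2.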

By placing the convolutional layer ahead of the rest of the neural network, the network receives as input data in which the features are already more pronounced.

To summarize the terminology:

Convolutional layer
A layer that locally extracts features from the image to be identified. The extracted features differ depending on the size and values of the filter.

Filter (kernel)
A grid of numerical data that is relatively small compared with the image to be identified. The feature extraction in the convolutional layer differs depending on the size of the filter and the values of its numerical data.

Window
The numerical data of a partial image, matching the size of the filter, within the image to be identified. It is computed directly with the filter.

Stride
The amount by which the window is moved. For example, if the stride is “2”, the window is shifted by 2 each time the calculation is performed.
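The effect of the stride on the number of window positions can be checked with a little arithmetic. The helper below is a hypothetical sketch, not something from the article; it assumes a square input, a square filter, and no padding.

```python
def output_size(input_size, filter_size, stride):
    """Number of window positions along one axis, assuming no padding."""
    return (input_size - filter_size) // stride + 1

# A 6-pixel-wide input with a 2-pixel-wide filter:
print(output_size(6, 2, 1))  # 5 positions when the window moves by 1
print(output_size(6, 2, 2))  # 3 positions when the window moves by 2
```

A larger stride therefore produces a smaller set of extracted features from the same image.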

Pooling layer

Alongside the convolutional layer, there is another technique for extracting features from the image to be identified: the “pooling layer”.

Pooling extracts features from the numerical data of the window itself, without using a filter. The pooling layer is also explained in the figure. This diagram illustrates MAX pooling.

In MAX pooling, the input is divided into equal-sized windows, and the highest value in each window is extracted as the feature value. In addition to MAX pooling, there is also average pooling.
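Both kinds of pooling can be sketched with NumPy as follows. The function name `pool2d`, its parameters, and the 4x4 input values are assumptions made for this illustration; windows are non-overlapping, and any rows or columns that do not fill a complete window are dropped.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Split `feature_map` into size x size windows and reduce each to one value."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop incomplete windows at the edges
    # Reshape so that axes 1 and 3 index the positions inside each window.
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "mean":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'mean'")

# Hypothetical values, chosen only for illustration.
feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
], dtype=float)

print(pool2d(feature_map, mode="max"))
# [[6. 4.]
#  [7. 9.]]
print(pool2d(feature_map, mode="mean"))
# [[3.75 1.75]
#  [2.5  6.  ]]
```

Max pooling keeps only the strongest response in each window, while mean pooling smooths the window into its average.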

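In CNNs, pooling layers are typically used in pairs with convolutional layers. The convolutional layer adds a bias to each computed value and applies an activation function, and its filter values and biases are what the network learns during training; the pooling layer itself has nothing to learn.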
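As a rough sketch of how such a pair fits together, the code below chains a convolution, a bias, an activation function, and max pooling. Several things here are assumptions rather than anything the article specifies: the function name `conv_block`, the choice of ReLU as the activation function, the hand-picked filter and bias (in a real CNN these would be learned), and the toy input with a vertical edge. It relies on `sliding_window_view`, available in NumPy 1.20 and later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_block(image, kernel, bias, pool=2):
    """Convolution (stride 1) -> add bias -> ReLU -> max pooling."""
    # All windows at once: shape (out_h, out_w, kernel_h, kernel_w).
    windows = sliding_window_view(image, kernel.shape)
    conv = (windows * kernel).sum(axis=(2, 3)) + bias
    activated = np.maximum(conv, 0.0)  # ReLU (assumed): negatives become 0
    h, w = activated.shape
    h, w = h - h % pool, w - w % pool
    blocks = activated[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))

# Toy input: dark on the left, bright on the right (a vertical edge).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Hand-picked filter that responds where brightness rises from left to right.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

result = conv_block(image, kernel, bias=-1.0)
print(result)
# [[1. 0.]
#  [1. 0.]]
```

The nonzero values in the left column of the output mark the region containing the edge, while the uniform region on the right produces 0.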

Applications of CNN

Finally, we will introduce three applications of image recognition technology using CNNs.

  1. Unmanned cash register store
  2. Object detection using drive recorder
  3. Diagnostic imaging in medicine

Each of them is explained below.

Unmanned cash register store

The most obvious example is “Amazon Go”, which has been called the unmanned convenience store of the future.

Amazon Go, which allows you to shop without going through a cash register, relies on cameras, microphones, sensors, and other devices on the ceiling. This system incorporates image recognition technology based on deep learning.

Object detection using drive recorder

Object detection technology is also applied to drive recorders.

For example, “Urban X Technology”, a venture company from the University of Tokyo, has its own smartphone app that functions as a drive recorder. The app supports the cycle of getting roads repaved.

It may be thanks to object detection that traffic safety is supported and the everyday life we take for granted is protected.

Diagnostic imaging in medicine

Image recognition technology is also active in the medical field. It has the potential to be useful there, for example in diagnosing medical conditions from huge amounts of X-ray data, or in determining whether a person has dementia from a photograph of their face alone.

In closing

This time, I have given a rough explanation of CNNs.

“The photographs I have been casually looking at until now are represented on a computer as numerical data.” Keeping this in mind will by itself deepen your understanding of CNNs.

Image recognition technology is expected to be applied ever more widely over the next 5 to 10 years. It would not hurt to have a firm understanding of it.


