In July 2021, LINE Corporation announced “HyperCLOVA,” the world’s first large-scale general-purpose language model specialized for Japanese.
In recent years, large-scale language models have been developed actively, mainly in the English-speaking world. This wave has now reached Japanese, and on November 25, 2020, LINE announced that it was developing a large-scale general-purpose language model.
As of January 2022, models with 6.7 billion, 13 billion, and 39 billion parameters have been announced, and a model with 82 billion parameters is under development.
We interviewed Mr. Toshinori Sato, manager of the NLP development team at LINE Corporation, about the story behind the development of “HyperCLOVA.”
- What is HyperCLOVA
- Blog data improves dialogue accuracy | Behind the scenes of HyperCLOVA development
- Social Impact of HyperCLOVA | Duality of Ethics and Business
- The future of HyperCLOVA | Multimodal is also key
What is HyperCLOVA
First, an overview of HyperCLOVA.
HyperCLOVA is the world’s first large-scale general-purpose language model specialized for Japanese, jointly developed by LINE Corporation and NAVER Corporation.
Given only a small amount of input text, the model, trained on a huge amount of data, can perform language processing appropriate to the context, enabling natural dialogue between agents and humans.
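This “small amount of input” pattern is what is usually called few-shot prompting: a handful of examples are placed before the query, and the model continues the pattern. HyperCLOVA’s API is not public, so the sketch below only builds the prompt; `complete` would be whatever text-completion call is available, and is a hypothetical placeholder here.

```python
# Minimal sketch of few-shot prompting (the usage pattern described above).
# The model call itself is omitted: a real completion API (a hypothetical
# `complete(prompt)`) would return text continuing the "A:" line.

def build_few_shot_prompt(examples, query):
    """Concatenate a few Q/A pairs, then the new question with an empty
    answer slot; the model is expected to continue the pattern."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)

examples = [
    ("Translate 'ありがとう' to English.", "Thank you"),
    ("Translate 'こんにちは' to English.", "Hello"),
]
prompt = build_few_shot_prompt(examples, "Translate 'さようなら' to English.")
print(prompt)
```

The same template works for classification, rewriting, or dialogue simply by changing the examples, which is what makes a single general-purpose model usable across tasks.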
Blog data improves dialogue accuracy | Behind the scenes of HyperCLOVA development
ーーWere there any difficulties in developing a large-scale general-purpose language model that had few precedents in the Japanese-speaking world?
The difficulty was that we could not directly apply our knowledge of English and Korean models to the Japanese model.
We built the models in order of size: 6.7 billion, 13 billion, then 39 billion parameters. At 13 billion parameters, accuracy dropped sharply, but after trial and error, such as reorganizing the training data, it rose sharply again at 39 billion.
There were many things we simply could not know until we tried them, like a treasure hunt or a hackathon, and each round of trial and error took two weeks to a month.
Creating the corpus was also difficult. For the training data, we basically cleared the rights first and built a corpus that takes into consideration not only the legal aspect but also how people feel about their data being used.
ーーWhich data was weighted most heavily in the training data?
For the 39-billion-parameter model, blog data made up the largest share of the training data.
It was surprising that blog data, of all things, could improve the accuracy of dialogue.
We entered a dialogue system using HyperCLOVA in the dialogue system live competition sponsored by the Japanese Society for Artificial Intelligence’s Special Interest Group on Language/Speech Understanding and Dialogue Processing, presenting a chat dialogue system that integrates knowledge bases and maintains persona consistency, and we finished in first place on both tracks.
At the time, other teams’ language models contained a lot of Twitter data. Our model, by contrast, used almost no Twitter data, and yet the dialogue system built on HyperCLOVA was rated as performing better.
It may be that data other than dialogue data is a necessary ingredient when building a dialogue system for evaluations like this competition.
Blogs and news articles contain well-structured interview pieces, so they may include many sentences that easily take the form of conversation.
For the 82-billion-parameter model we are currently building, which follows the 39-billion-parameter model, we are using Japanese websites that basically do not require login, along with data borrowed with permission from partner companies.
Social Impact of HyperCLOVA | Duality of Ethics and Business
ーーI think models like HyperCLOVA will have a big impact on society. How did you discuss this with business development members internally?
In addition to talking with members on LINE’s business side, we are also working with other companies on PoCs and trials for technology development, creating the technology together.
We are also in talks with another company under the same Z Holdings umbrella about developing the technology in more depth.
In many cases, though, each company has an issue it is currently facing, and the conversation often starts from whether HyperCLOVA can help improve that issue.
The problem here is that there are not many people who can handle large-scale general-purpose language models, so each company has trouble grasping what HyperCLOVA can actually do.
A long time ago, when search engines first appeared, people assumed that since search engines know everything, they should be able to answer anything. I think a similar misunderstanding is happening now.
ーーWhen it comes to large-scale language models, ethical issues have to be discussed in depth. How are you addressing ethics?
When it comes to implementing AI ethics, the reality is that it is difficult to build a system that itself possesses and embodies ethics. Instead, we are currently developing a filter that inspects whether inputs and outputs are appropriate.
The core of this filter starts as a rule base, but the problem is how far to take it.
For example, the same word can have different meanings in different domains, so deciding which information should be excluded in which domain depends on the purpose, and that is difficult.
Separately, depending on how words are combined, even perfectly harmless words can become problematic.
For example, “bean sprout” and “burdock” are perfectly innocent words on their own, but in Japanese, calling someone a “bean-sprout man” or a “burdock-like man” is a metaphor for a scrawny-looking man. Humans use language in complex ways, so words can be turned in many directions.
Depending on how they are applied, an expression may fail to meet certain ethical standards, which is what makes this difficult.
Our next step is to meet an even wider range of ethical requirements.
ーーAfter actually developing HyperCLOVA, how do you think it will impact society?
A large-scale general-purpose language model such as HyperCLOVA is most effective not for deftly handling individual, completely disparate requests, but for helping create intellectual data that many people can use over a long period.
For humans, producing large amounts of data, and continuing to produce it, is hard work. HyperCLOVA can help with these tasks by generating candidate answers to various problems, based on a model built from text data on a scale no human could read through.
For example, when we ran a light benchmark on general quiz questions, HyperCLOVA achieved an accuracy of about 70%. That means that when HyperCLOVA is used to generate quiz answers, humans only need to focus on the roughly 30% of questions it could not answer correctly. Concretely, people can work efficiently by filtering out wrong answers that fall within human understanding and examining in depth the cases where answers open up new directions. By concentrating on resolving these deeper issues, I think we can make progress on problems that were previously stuck.
It can also be used for data augmentation. HyperCLOVA is useful when you want to quickly bootstrap a new model-building pipeline at the very start of a business application, where no data exists yet. Being able to generate a large amount of new data, even if some of it is slightly wrong, is convenient.
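The bootstrapping idea above can be sketched as follows. This is a minimal illustration, not HyperCLOVA’s actual interface: the prompt asks for new labeled examples seeded with a few existing ones, and `fake_model` is a stand-in for a real completion API.

```python
# Sketch of LLM-based data augmentation: seed a prompt with a few labeled
# examples and have the model generate new (possibly noisy) ones.
# `fake_model` is a placeholder; a real system would call a completion API.

def augmentation_prompt(label, seed_texts, n_new=3):
    """Ask the model for n_new short variants carrying the same label."""
    header = f"Write {n_new} more short customer messages with intent '{label}':\n"
    shots = "\n".join(f"- {t}" for t in seed_texts)
    return header + shots + "\n-"

def fake_model(prompt):
    # Canned continuation standing in for a real model's output.
    return " I want to cancel my order\n- Please cancel the purchase\n- Cancel it"

prompt = augmentation_prompt("cancel_order", ["Cancel my order", "I'd like to cancel"])
completion = fake_model(prompt)
# Re-attach the "-" the prompt ended with, then split into one example per line.
new_examples = [line.lstrip("- ").strip() for line in ("-" + completion).split("\n")]
print(new_examples)
```

In practice, the generated examples would be spot-checked or filtered before training a downstream classifier, since, as noted above, some of them will be slightly wrong.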
In addition to this, I think it will be useful for text generation and summarization.
For example, there are various patterns: condensing an explanation into a catchphrase, or summarizing the minutes of a roundtable discussion into about three lines.
It should also be possible to take messy text that humans cannot easily read and reorganize it into text that is readable and points in a specific direction.
The future of HyperCLOVA | Multimodal is also key
ーーIn Japan, the field of natural language processing is gaining momentum, isn’t it?
I feel that each company is going through a lot of hardship, and probably exactly the same hardships we are.
I would like us to think about these problems together if possible, but that is probably difficult from a business standpoint.
So, outside of HyperCLOVA itself, we are open-sourcing models that can be released without problems.
Of course, there is a performance gap between the models LINE uses internally and the open-source models, but by increasing the number of models everyone can use freely, I think things that were once impossible will become possible.
We would like to regularly release models that let your implementations and services achieve higher performance than before.
ーーPlease tell us about the future prospects of HyperCLOVA.
I believe that the size and performance of models will continue to grow in the future.
On the other hand, recent work such as DeepMind’s RETRO suggests that the model itself can be small. At the same time, we also know that making a small model perform well requires having a large amount of data on hand.
Therefore, in 2022, we will put more effort into collecting and creating data than in 2021.
In the multimodal direction, image generation is very catchy, but conversely, we see a greater need for generating text from images. Some people may say, “that has been around for a long time.”
LINE’s NLP development team is also doing research and development on caption generation. I would like to be able to generate captions that are highly detailed and so natural that, listening to them as they are, you would feel as if you were listening to someone you know.
* RETRO: the “Retrieval-Enhanced Transformer,” a model that consults an external memory of text like a dictionary, greatly reducing training cost while rivaling the performance of neural networks 25 times its size.
It is said that human intelligence arises from linking information obtained through various senses to linguistic ability. In other words, without progress in natural language processing technology, we will not achieve general-purpose AI that connects diverse perceptual information.
Natural language is indispensable in our lives and work, and whether systems can handle it as humans do is important for technological development.
However, beyond the meaning of words there are legal and ethical concerns, and determining what general-purpose language models should be, through multifaceted discussion, will be important going forward.
AINOW explains natural language processing technology from a business perspective in an easy-to-understand way, and has also published an interview with Arisa Ema, associate professor at the University of Tokyo Future Vision Research Center and an authority in the field of AI and ethics.