ELYZA Co., Ltd., an AI startup from the University of Tokyo’s Matsuo Lab, has successfully developed a generative summarization model for Japanese.
In addition, on August 26, 2021, the company released “ELYZA DIGEST”, a summarization AI built on this model, to the public as a demo site.
Table of Contents
- “Generative” summary AI that summarizes text into 3 lines
- Promoting the Practical Use of a Giant Language Model for Japanese
- Challenging the complicated and difficult “Summary of Dialogue Text”
“Generative” summary AI that summarizes text into 3 lines
ELYZA DIGEST is a “generative” summary model: an AI that generates summary sentences from scratch based on the input text and condenses it into 3 lines. It can summarize not only well-organized texts such as books and novels, but also unstructured text such as meeting minutes and dialogue transcripts.
To summarize, enter the text itself, or enter a URL to generate a summary based on the full text of that page.
ELYZA DIGEST was developed using cutting-edge natural language processing (NLP) technology, and has been used in a demonstration experiment that began on July 1, 2021 with Sompo Holdings, Inc., a company listed on the First Section of the Tokyo Stock Exchange.
Promoting the Practical Use of a Giant Language Model for Japanese
With the development of voice recognition and image recognition technology, it is now possible to recognize voice data and text written on paper and convert them to text data. However, the accuracy of NLP in understanding and using the recognized text remained at a level that required human intervention.
Under these circumstances, the large-scale language model “BERT”, announced by Google in 2018, dramatically improved that accuracy, and services using NLP have begun to appear in the English-speaking world. In the Japanese-speaking world, however, practical application of BERT has not progressed, due to the high technical difficulty arising from the characteristics of the language and the lack of published data.
Aware of this problem, ELYZA developed “ELYZA Brain” in 2020, a Japanese-specific AI engine that uses a large-scale language model and the company’s own large-scale data set. The company then improved “ELYZA Brain” and released ELYZA DIGEST, which specializes in summarization, a task that comes up often in daily life and business.
Challenging the complicated and difficult “Summary of Dialogue Text”
ELYZA DIGEST continues to improve toward the practical use of “Summarization of Dialogue Text”. There are four major obstacles to summarizing dialogue text:
Summarization using AI is classified into “extraction type” and “compression type”, which pull parts out of the text; “template type”, which fits the content into a prepared template; and “generation type”, which writes the summary from scratch. Because ELYZA DIGEST is generative and can flexibly generate summaries, it has the potential to overcome the above four obstacles.
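To make the contrast between the types concrete, here is a minimal sketch of an “extraction type” summarizer in Python. It is an illustration only and is not ELYZA's method: the function names, the word-frequency scoring, and the English-only sentence splitting are all assumptions chosen for brevity (Japanese text would need a proper tokenizer). The point is that an extractive approach can only return sentences that already exist in the input, whereas a generative model writes new sentences.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive sentence splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extractive_summary(text, max_sentences=3):
    # Hypothetical helper: pick the highest-scoring sentences and return them verbatim.
    sentences = split_sentences(text)
    # Word frequencies over the whole text serve as a crude importance signal.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if not tokens:
            return 0.0
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Restore the original order of the selected sentences.
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]
```

Because the output is copied word for word from the input, interjections and speech-recognition errors in a dialogue transcript would be carried straight into such a summary, which is one reason a generative approach is attractive for dialogue text.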
When the company actually summarized dialogue text using ELYZA DIGEST, it found that the model was able to generate a reasonable summary, as shown in the figure below, even when the text contained interjections peculiar to colloquial speech such as “ah” and “er” and speech recognition errors.
To evaluate the accuracy of summaries produced by ELYZA DIGEST, the company compared them with human-written summaries on two evaluation axes.
The verification found that, for 90% of the articles, ELYZA DIGEST produced summaries with almost the same accuracy as humans, although it sometimes generated sentences that were not in the original text or that differed from the facts. On fluency, the percentage of outputs containing errors was high: grammatical errors and omitted subjects made some sentences difficult to read.
As for summarization time, a human takes about 5 minutes to summarize an article averaging 900 characters, whereas ELYZA DIGEST can do it in about 10 seconds.
The company is working to further improve the accuracy of its models through its in-house research and development, and at the same time, is rapidly promoting the practical application of NLP technology for various use cases, thereby increasing its impact on society.