Tuesday, May 28, 2024

Summary of the latest natural language processing technology announced at Google I/O 2022


Table of Contents 

  • Supports 24 languages using integrated learning data
    • Long-tail language problem
    • Creation of Integrated Learning Data and Its Advantages
    • Contributions of native speakers
    • Future tasks
  • Implemented automatic summarization model “PEGASUS” in Google Docs
    • Automatic summarization prior to PEGASUS
    • PEGASUS Innovation
    • PEGASUS improvements
    • Future tasks
  • The world’s largest language model “PaLM”
    • New AI design concept “Pathways”
    • A Breakthrough in Chained Inference
    • Supports code generation
    • Residual bias
    • Future tasks
  • Other AI technologies announced
    • Three evolutions of Google Maps
    • Two useful new features on YouTube
    • Improved image quality of Google Meet
    • Launch of AI Test Kitchen
  • Summary


From May 11 to 12, 2022, Google held its annual developer conference, “Google I/O 2022,” in a hybrid format. The article summarizing the keynote by CEO Sundar Pichai shows that a number of AI technologies were announced. In this article, I will pick out and explain the Google I/O announcements related to natural language processing in particular.

Supports 24 languages using integrated learning data

At Google I/O 2022, Google announced that Google Translate now supports 24 new languages. They include Assamese, spoken in northeastern India, and Kurdish, spoken by the Kurdish people (see the appendix at the end of this article for the full list of 24 newly supported languages). Realizing this feature required large-scale development of multilingual machine translation. An overview and details of this work are provided in a Google AI research blog post and paper.

Long-tail language problem

Machine translation of long-tail languages (niche languages with few speakers) is difficult because there is far less training data for them than for major languages such as English. And because little natural language processing research has been done on long-tail languages, even the methods for collecting their training data have not been established.

The graph below shows the amount of translation training data available for various languages. The horizontal axis represents languages, and the vertical axis represents the amount of training data. When the languages are sorted so that the one with the most training data is on the left, the long-tail languages fall on the right side of the graph. Because this distribution has the same shape as the “long tail” familiar from Internet business, niche languages with few speakers are called long-tail languages. The area colored red represents “parallel data,” meaning sentences paired with their translations in other languages, and the blue area represents “monolingual data,” meaning text in a single language without such correspondences. The graph shows that parallel data useful for machine translation is available for only a small fraction of the world’s languages.

Amount of training data for each language. Image source: Quoted from Google Research blog post

Creation of Integrated Learning Data and Its Advantages

To achieve machine translation of long-tail languages, the Google research team performed the following tasks.

  1. Developed algorithms for collecting (scraping) web text in long-tail languages.
  2. Collected training data for many languages using these algorithms.
  3. Integrated the training data of all these languages into a single training dataset. This integration enables transfer learning across languages in machine translation, which is more efficient than training on each language’s data separately.
  4. Generated additional training data for long-tail languages to increase the amount available.
  5. Had native speakers evaluate the machine translations.
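Step 3 is where the transfer learning comes from: a single model sees every language pair. A common way to set this up, and an assumption here since the post does not give Google’s exact recipe, is to prepend a target-language tag such as `<2xx>` to each source sentence (as in Johnson et al.’s multilingual neural MT) and shuffle all pairs into one dataset. The function names and the placeholder sentences below are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    source: str  # tagged source sentence fed to the encoder
    target: str  # reference translation


def tag_source(text: str, target_lang: str) -> str:
    # The tag tells one shared model which language to translate into.
    return f"<2{target_lang}> {text}"


def merge_corpora(
    corpora: dict[tuple[str, str], list[tuple[str, str]]], seed: int = 0
) -> list[Example]:
    """Merge per-language-pair parallel corpora into one shuffled training set."""
    merged = [
        Example(tag_source(src, tgt_lang), tgt)
        for (_src_lang, tgt_lang), pairs in corpora.items()
        for src, tgt in pairs
    ]
    # Shuffle so that high- and low-resource pairs are interleaved in training.
    random.Random(seed).shuffle(merged)
    return merged


corpora = {
    ("en", "as"): [("Good morning", "<Assamese translation>")],
    ("en", "ku"): [("Thank you", "<Kurdish translation>")],
    ("en", "fr"): [("Thank you", "Merci")],
}
for ex in merge_corpora(corpora):
    print(ex.source, "->", ex.target)
```

In practice, low-resource pairs are usually upsampled so they are not drowned out by the high-resource ones; this sketch leaves that step out.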

Through these steps, multilingual machine translation covering long-tail languages was realized. To evaluate the quality of the resulting translations, the Google research team calculated RTT LANGID CHRF (*Note 1), a translation quality metric the team developed itself. The results are shown in the graph below. The vertical axis shows the RTT LANGID CHRF value, and the horizontal axis shows the amount of training data. Red points represent languages with abundant training data, and blue points represent languages with little training data. The graph shows that some languages with little training data achieved translation quality comparable to that of languages with abundant training data.

(*Note 1) For the definition and details of RTT LANGID CHRF, see Section 4.3, “RTT LANGID CHRF,” of the paper “Building Machine Translation Systems for the Next Thousand Languages.”
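The metric can be sketched under a simplifying assumption about how it works: translate a sentence into the target language, give it a score of 0 if a language identifier does not recognize the output as the target language, and otherwise translate it back and compute chrF between the round-trip output and the original. chrF itself is the character n-gram F-score (here n = 1 to 6, β = 2, whitespace ignored). `translate` and `detect_language` are hypothetical stand-ins for a real MT model and LangID model; the paper’s Section 4.3 remains the authoritative definition.

```python
from collections import Counter
from typing import Callable


def char_ngrams(text: str, n: int) -> Counter:
    s = "".join(text.split())  # chrF ignores whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Character n-gram F-score (0-100), averaging precision/recall over n = 1..max_n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one string is too short for this n-gram order
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2  # beta = 2 weights recall more heavily than precision
    return 100 * (1 + b2) * p * r / (b2 * p + r)


def rtt_langid_chrf(
    source: str,
    source_lang: str,
    target_lang: str,
    translate: Callable[[str, str], str],  # hypothetical: (text, to_lang) -> text
    detect_language: Callable[[str], str],  # hypothetical LangID model
) -> float:
    forward = translate(source, target_lang)
    if detect_language(forward) != target_lang:
        return 0.0  # output is not in the target language: no credit
    round_trip = translate(forward, source_lang)
    return chrf(round_trip, source)


# Toy stand-ins: an "identity" translator that just copies its input.
identity = lambda text, lang: text
print(rtt_langid_chrf("hello world", "en", "xx", identity, lambda t: "xx"))  # -> 100.0
print(rtt_langid_chrf("hello world", "en", "xx", identity, lambda t: "en"))  # -> 0.0
```

The second call shows why the LangID check matters: a model that simply copies its input would get a perfect round trip, so the check gives it no credit when its output is not actually in the target language.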

Contributions of native speakers

Native speakers made significant contributions to evaluating the quality of machine translation for long-tail languages. This is because the training data for long-tail languages, which is still being developed, contains many errors, and these errors cannot be corrected without the cooperation of native speakers.

The Google research team also investigated a more fundamental question: do long-tail language communities want multilingual machine translation in the first place? The studies found that these communities tend to want machine translation even if its quality is low. This result suggests that developing machine translation for long-tail languages is highly worthwhile.

Future tasks

The Google research team lists the following three items as future challenges for improving the quality of multilingual machine translation.

  • Creating dictionaries for long-tail languages: Some long-tail languages do not have dictionaries. Creating dictionaries for these languages is an efficient way to improve machine translation quality.
  • Preparing training data by various means: For long-tail languages with little information on the Internet, collecting data manually may be preferable. And, as mentioned above, input from native speakers remains important.
  • Leveraging multimodal training data: Most of the world’s spoken languages lack a written form or standardized spelling conventions. To increase the number of languages that machine translation supports, training data that combines speech and text will be needed.

Implemented automatic summarization model “PEGASUS” in Google Docs

Google also announced that automatic summaries will be added to Google Docs. However, the feature is scheduled for release next year, and the supported languages have not been announced. The feature uses PEGASUS, a groundbreaking automatic summarization model. A Google AI research blog post summarizes the research history behind the model.

Automatic summarization prior to PEGASUS

Automatic summarization by an AI model is a sequence-to-sequence task: the model takes an arbitrary text as input and generates text that summarizes it. The RNNs used in early language AIs were not good at summarizing long texts.

The invention of the Transformer, and of Transformer-based language models such as BERT, took automatic summarization models to a new level. The Transformer made it possible to perform sequence-to-sequence tasks on long texts efficiently. Transformer-based language models can also be pre-trained on unlabeled data.

PEGASUS Innovation

PEGASUS, an automatic summarization model announced jointly by Google and Imperial College London in July 2020, adapts Transformer-based language models specifically for automatic summarization.

The innovation of PEGASUS lies in its use of GSP (Gap Sentence Prediction; the PEGASUS paper calls it Gap Sentences Generation, or GSG) for pre-training. In GSP, whole sentences are masked out of an unlabeled news article or web document, and the model is trained to generate the masked sentences from the remaining text.
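A minimal sketch of how one GSP training example could be built. Following the PEGASUS paper’s idea of masking the sentences most important to the document, each sentence is scored by ROUGE-1 F1 against the rest of the text, and the top-scoring ones are replaced with a mask token; the masked sentences, concatenated, become the target. The simplified whitespace-tokenized ROUGE, the `[MASK1]` token string, and the 30% gap ratio are assumptions for illustration.

```python
from collections import Counter

MASK_TOKEN = "[MASK1]"  # sentence-level mask token (illustrative)


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 with simple lowercase whitespace tokenization."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def make_gsg_example(sentences: list[str], gap_ratio: float = 0.3) -> tuple[str, str]:
    """Mask the most 'central' sentences; the model must generate them."""
    k = max(1, round(len(sentences) * gap_ratio))
    scores = []
    for i, sentence in enumerate(sentences):
        rest = " ".join(sentences[:i] + sentences[i + 1:])
        scores.append((rouge1_f1(sentence, rest), i))
    chosen = {i for _, i in sorted(scores, reverse=True)[:k]}
    model_input = " ".join(MASK_TOKEN if i in chosen else s for i, s in enumerate(sentences))
    target = " ".join(s for i, s in enumerate(sentences) if i in chosen)
    return model_input, target


doc = [
    "The cat sat on the mat.",
    "The cat is on the mat today.",
    "Stocks fell sharply.",
]
model_input, target = make_gsg_example(doc)
print("input: ", model_input)
print("target:", target)
```

The off-topic sentence shares no words with the rest of the document, so it scores lowest and stays visible in the input, while one of the two overlapping sentences is masked and becomes the target.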

Schematic diagram of GSP. Image source: Quoted from a paper discussing PEGASUS

PEGASUS improvements

Integrating PEGASUS into Google Docs required further refinement of the published model. The improvements include the following two items.

  • Preparing fine-tuning data: Early in development, the fine-tuning data contained summaries in many different styles. For example, a single dataset mixed long, detailed academic summaries with short, punchy summaries written for managers. This inconsistency confused PEGASUS, so Google cleaned the data to make it consistent and retrained the model, which produced better summaries.
  • Improved architecture: PEGASUS is a Transformer-based model, but a pure Transformer architecture causes long delays when generating long summaries. The Transformer decoder generates a summary one token at a time, and each new token attends to all previously generated tokens, so generation slows down as the summary grows. To reduce this latency, Google adopted a hybrid architecture that pairs a Transformer encoder with an RNN decoder.
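A rough cost model shows why a recurrent decoder helps with long outputs. The counts below cover only decoder self-attention versus recurrent state updates; they ignore cross-attention to the encoder, the constant per-step cost of each layer, and hardware parallelism, so they illustrate the growth rates rather than measure the Docs system.

```python
def transformer_self_attention_lookups(summary_len: int) -> int:
    """Past positions attended to while generating `summary_len` tokens.

    Step t attends to all t positions generated so far (even with a key/value
    cache), so the total is 1 + 2 + ... + n = n * (n + 1) / 2.
    """
    return summary_len * (summary_len + 1) // 2


def rnn_state_updates(summary_len: int) -> int:
    """An RNN decoder performs one fixed-size hidden-state update per token."""
    return summary_len


for n in (50, 200, 800):
    print(n, transformer_self_attention_lookups(n), rnn_state_updates(n))
# 50 1275 50
# 200 20100 200
# 800 320400 800
```

Quadrupling the summary length multiplies the attention work by roughly 16, while the recurrent decoder’s work only quadruples.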

Future tasks

These automatic summaries still have room for improvement. Three issues need to be addressed:

  • Further refinement of the training data: Because the fine-tuning data was cleaned for consistency as described above, the model can currently produce only a limited range of summary styles. Google plans to expand the training data to support more summary formats.
  • Collecting feedback from readers: How good a summary is likely varies from reader to reader. For example, a general reader given a technical summary will probably find it hard to follow. Gathering reader feedback is therefore essential for evaluating and improving summary quality.
  • Long-form summaries: Summarizing long documents (such as novels) is a major goal of automatic summarization and something readers want. However, high-quality summarization of long texts is still technically difficult, so it will require medium- to long-term research and development.

The world’s largest language model “PaLM”

Pichai’s keynote also mentioned PaLM, the world’s largest language model (as of May 2022). Announced in April 2022, the model’s official name is “Pathways Language Model,” and, as the name indicates, it adopts “Pathways,” the new AI design concept advocated by Google.

New AI design concept “Pathways”

According to the official Google blog post introducing Pathways, the design concept differs from conventional AI design as follows.

  • Task learning: Conventional models are trained from scratch for each task, and tasks cannot be combined to perform a new task. With Pathways, what is learned for one task can be reused for other tasks, and tasks can be combined to perform new tasks.
  • Modalities: Conventional models are basically unimodal (image recognition only, natural language processing only, and so on). Pathways is multimodal, handling images, sound, and language.
  • Parameter use: Conventional models are dense (all parameters are used when executing a task). Pathways models are sparse (only the parameters needed for a task are used) (*Note 2).
(*Note 2) For a comparison of sparse and dense models, see the section “Sparseness: GPT-4 Will Be a Dense Model” in the AINOW translated article “GPT-4 Is Coming Soon. What We Know About It.”
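To make the dense/sparse distinction concrete, here is a toy sparsely activated layer in the mixture-of-experts style, one common way to implement “use only the parameters needed”: a gate scores every expert, but only the top-k experts actually run. This is an illustrative sketch with hypothetical names and plain Python lists, not the Pathways implementation. A dense layer would instead run every expert on every input.

```python
import math
from typing import Callable

Vector = list[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_forward(
    x: Vector,
    gate_weights: list[Vector],
    experts: list[Callable[[Vector], Vector]],
    k: int = 1,
) -> tuple[Vector, list[int]]:
    """Run only the k highest-scoring experts and mix their outputs."""
    # Gate score for each expert: dot product of its gate vector with x.
    scores = [sum(w * xi for w, xi in zip(wv, x)) for wv in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for weight, i in zip(weights, top):
        y = experts[i](x)  # experts outside `top` are never evaluated
        out = [o + weight * yj for o, yj in zip(out, y)]
    return out, top


experts = [
    lambda v: [2 * a for a in v],
    lambda v: [a + 1 for a in v],
    lambda v: [-a for a in v],
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
out, used = sparse_forward([3.0, 1.0], gate_weights, experts, k=1)
print(used, out)  # -> [0] [6.0, 2.0]
```

With k=1, only one of the three experts’ parameters is touched per input; raising k trades more computation for a blend of experts.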

A Breakthrough in Chained Inference

PaLM, which was built with Pathways, has 540 billion parameters, the most of any language model as of May 2022. Note that although sparse activation is part of the Pathways concept, the PaLM paper describes PaLM itself as a densely activated Transformer; Pathways served as the system used to train it across many TPU chips. When measured on BIG-bench, a benchmark of more than 150 language tasks, PaLM outperformed previous models. In the graph below, the vertical axis shows BIG-bench performance and the horizontal axis shows model size. The graph shows that PaLM’s performance improves sharply once the model size exceeds 10 billion parameters, but even the largest model does not reach the best human score.

What is noteworthy about PaLM’s performance is its significant improvement in logical reasoning, a weak point of conventional language AIs, including GPT-3. The Google AI research blog post explaining this improvement includes a graph summarizing the results. From left to right, the bars show fine-tuned GPT-3, GPT-3 trained specifically for logical reasoning, standard PaLM, PaLM with “chain of thought” (described later), and PaLM with both chain of thought and “self-consistency,” one of the latest ensemble techniques. The rightmost configuration, PaLM with chain of thought and self-consistency, achieved the highest accuracy, at 75%.
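Self-consistency (Wang et al., 2022) replaces a single answer with a vote: sample several chain-of-thought outputs at a nonzero temperature and keep the most common final answer. The sketch below shows only the voting step. The sampled chains are made up, and it assumes each chain ends with the phrase “The answer is N,” which is how chain-of-thought exemplars are typically formatted.

```python
import re
from collections import Counter
from typing import Optional

ANSWER_RE = re.compile(r"The answer is\s*(-?\d+(?:\.\d+)?)")


def extract_answer(chain: str) -> Optional[str]:
    """Pull the final answer out of one sampled reasoning chain."""
    match = ANSWER_RE.search(chain)
    return match.group(1) if match else None


def self_consistency(chains: list[str]) -> Optional[str]:
    """Majority vote over the final answers of several sampled chains."""
    answers = [a for a in map(extract_answer, chains) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


sampled_chains = [
    "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.",
    "2 * 3 = 6 new balls, plus 5 is 11. The answer is 11.",
    "5 + 2 + 3 = 10. The answer is 10.",
]
print(self_consistency(sampled_chains))  # -> 11
```

One faulty chain is outvoted by two that agree, which is the intuition behind the technique: different correct reasoning paths tend to converge on the same answer.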

Comparison of PaLM’s logical reasoning ability. Image source: Taken from a Google Research blog post discussing the chain of thought in PaLM

As mentioned above, logical reasoning improved with a technique called “chain of thought.” In chain of thought, the model breaks a problem into intermediate reasoning steps at inference time and then combines them to reach its final answer. With conventional prompting, the model is shown examples that pair a question directly with its answer, so it tends to make errors when it must jump from the premises straight to the conclusion. With chain of thought, the examples also include the intermediate reasoning, so the model first generates intermediate conclusions from the premises and then derives the final conclusion from them. This closely resembles the way humans reason step by step.
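The difference between conventional and chain-of-thought prompting lies entirely in the few-shot examples placed in front of the question. The sketch below builds both prompt styles from the same exemplar; the exemplar follows the style of the chain-of-thought paper, and the function names are hypothetical. Sending the prompt to a model is not shown.

```python
def format_answer(exemplar: dict, chain_of_thought: bool) -> str:
    if chain_of_thought:
        # Show the intermediate steps before the final answer.
        return f"{exemplar['reasoning']} The answer is {exemplar['answer']}."
    # Conventional prompting: jump straight to the answer.
    return f"The answer is {exemplar['answer']}."


def build_prompt(exemplars: list[dict], question: str, chain_of_thought: bool = True) -> str:
    blocks = [
        f"Q: {ex['question']}\nA: {format_answer(ex, chain_of_thought)}"
        for ex in exemplars
    ]
    blocks.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(blocks)


exemplars = [{
    "question": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
                "Each can has 3 tennis balls. How many tennis balls does he have now?",
    "reasoning": "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 "
                 "tennis balls. 5 + 6 = 11.",
    "answer": "11",
}]
print(build_prompt(exemplars, "A cafeteria had 23 apples. It used 20 and bought 6 more. "
                              "How many apples are there?"))
```

Because the exemplar demonstrates intermediate steps, the model tends to imitate that format and write out its own steps before the final answer.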

Supports code generation

Like OpenAI Codex, PaLM also supports code generation. It can perform tasks such as generating code from comments, translating code from one programming language to another, and fixing compilation errors.

Residual bias

Like other large language models, PaLM produces output containing biases related to gender, occupation, and religion. For example, sentences about Islam are relatively more likely to be generated with negative words such as “terrorism.” The graph below visualizes the probability that generated sentences about each religion contain negative words; a longer colored band indicates a higher probability. It shows that negative words are relatively more likely to appear in sentences about denominationals, Muslims, and Jews.
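A measurement of this kind can be sketched as follows: generate continuations for prompts about each group, then compute the fraction that contain a word from a negative-word list. The word list, group names, and sample texts below are placeholders, and the actual PaLM evaluation used its own prompts and analysis, so this only illustrates the idea.

```python
import string


def contains_negative(text: str, negative_words: set[str]) -> bool:
    # Lowercase and strip surrounding punctuation before matching whole words.
    tokens = {t.strip(string.punctuation).lower() for t in text.split()}
    return bool(tokens & negative_words)


def negative_rate(continuations: list[str], negative_words: set[str]) -> float:
    """Fraction of generated continuations containing at least one negative word."""
    if not continuations:
        return 0.0
    hits = sum(contains_negative(t, negative_words) for t in continuations)
    return hits / len(continuations)


def rates_by_group(samples: dict[str, list[str]], negative_words: set[str]) -> dict[str, float]:
    return {group: negative_rate(texts, negative_words) for group, texts in samples.items()}


negative_words = {"violent", "terrorist", "dangerous"}  # hypothetical word list
samples = {  # hypothetical model continuations for two prompt groups
    "group_a": ["They are kind neighbors.", "They are violent."],
    "group_b": ["They are kind neighbors.", "They like music."],
}
print(rates_by_group(samples, negative_words))  # -> {'group_a': 0.5, 'group_b': 0.0}
```

Comparing these per-group rates is what a chart like the one above visualizes; real evaluations use many more samples per group so that the differences are meaningful.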

Future tasks

In developing large-scale models that adopt Pathways, such as PaLM, a key problem is “how to scale up appropriately.” The recent announcement of DeepMind’s language model “Chinchilla” showed that the relationship between the size and performance of dense language models needs to be reconsidered. It was conventionally believed that the larger the model, the better its performance, in proportion to its size. However, it turned out that not only model size but also the amount of training data is important for language model performance (*Note 3).
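The Chinchilla result is often summarized as a rule of thumb: for a fixed compute budget, train on roughly 20 tokens per parameter. Combined with the common approximation that training costs about 6 × N × D FLOPs for N parameters and D tokens, this gives a compute-optimal model size and data size for any budget. Both relations are approximations rather than exact laws; the sketch below simply solves them. Chinchilla itself (70 billion parameters, 1.4 trillion tokens) sits on that 20:1 ratio.

```python
import math

TOKENS_PER_PARAM = 20       # rule of thumb drawn from the Chinchilla results
FLOPS_PER_PARAM_TOKEN = 6   # common approximation: training FLOPs ~ 6 * N * D


def compute_optimal(flops: float) -> tuple[float, float]:
    """Return (parameters, training tokens) for a given training-compute budget."""
    # Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2.
    n_params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


# Compute budget of a 70B-parameter model trained on 1.4T tokens (Chinchilla's size).
budget = FLOPS_PER_PARAM_TOKEN * 70e9 * 1.4e12
n, d = compute_optimal(budget)
print(f"{n:.3g} parameters, {d:.3g} tokens")  # -> 7e+10 parameters, 1.4e+12 tokens
```

Because N grows only with the square root of compute, doubling the budget should increase model size and training data each by about 1.4 times, rather than putting all of it into a bigger model.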

Many unknowns remain about how to scale models like PaLM. The main factors involved in scaling include model size, the amount of training data, the computing power used for training, and the training batch size. The trade-offs among these factors will be investigated in the future.

(*Note 3) For the relationship between model size and training data in language models, see the section “Model Size: GPT-4 Won’t Be Super Large” in the AINOW translated article “GPT-4 Is Coming Soon. What We Know About It.”

Other AI technologies announced

Pichai’s keynote also covered Google’s latest AI technologies beyond the natural language processing work explained above. Below, we briefly introduce four of them.

Three evolutions of Google Maps

Google Maps has evolved in three ways thanks to AI technology. The first is the ability to detect buildings in satellite images using computer vision and neural networks, which makes the map more detailed. Since July 2020, the number of buildings on Google Maps in Africa has increased fivefold, from 60 million to about 300 million, and the number of buildings mapped in India and Indonesia has doubled this year. Buildings detected with this technology now account for more than 20% of the buildings on the map.

The second is Immersive View. With this new feature, if you want to visit the Palace of Westminster in London, for example, you can move seamlessly from a photorealistic bird’s-eye view of the palace into the interior of a nearby restaurant. These drone-like visuals are synthesized from still images collected by Google, using neural rendering, an AI technique for generating images. Immersive View will roll out later this year in Los Angeles, London, New York, San Francisco, and Tokyo, with more cities to follow.

The third is Live View. This feature uses AR to overlay arrows and other elements on camera images of the street to guide users to their destination. It also makes location-based games possible, such as one that displays a dragon in the cityscape. The feature relies on an AI technology called global localization.

Two useful new features on YouTube

Two new AI-powered features have also come to YouTube. The first is auto-generated chapters, introduced last year. Chapters let viewers easily jump to the parts they are interested in, even in long videos. As of May 2022, 8 million videos have auto-generated chapters, and YouTube plans to increase this to 80 million over the next year. The feature uses technology developed by DeepMind (*Note 4).

The second is machine translation of captions for YouTube videos watched on smartphones, covering 16 languages. From June 2022, auto-translated captions will also be extended to Ukrainian YouTube content, with the aim of providing accurate information about the invasion of Ukraine.

(*Note 4) Although Pichai’s keynote did not say so explicitly, the DeepMind technology used to auto-generate chapters for YouTube videos is thought to be the multimodal model Flamingo. Given images, video, or text as input, the model outputs text describing the input.

Improved image quality of Google Meet

Google Meet, Google’s online meeting tool, now uses AI technology to render people’s skin tones more accurately. The improvement addresses a problem in which the skin tones of people of color were not reproduced accurately, because doing so requires computer vision that can distinguish a wide range of skin tones.

Developed in collaboration with Dr. Ellis Monk, a sociologist at Harvard University, the improvement is based on the “Monk Skin Tone” scale, a skin tone scale (gradation) devised by Dr. Monk.

Launch of AI Test Kitchen

In May 2021, Google announced LaMDA, a language model specialized for conversation. The model has since been tested by thousands of Googlers, and its quality has improved significantly, with fewer inaccurate and offensive responses.

Building on these test results, Google launched AI Test Kitchen, a website that lets people outside Google take part in testing LaMDA. Through it, you can participate in three tests:

  • Imagine It: Name a scene (such as “deep sea exploration”), and LaMDA describes it in text.
  • List It: Enter a goal, and LaMDA breaks it down into the subtasks needed to achieve it and displays them as a list.
  • Talk About It: Chat with LaMDA about a given topic (such as “dogs”); LaMDA tries to stay on that topic.

AI Test Kitchen will open access gradually over the next few months. Participation will start with academics such as AI researchers, social scientists, and human rights experts, and the number of participants will be expanded over time.


Summary

As these announcements show, Google still leads the world in AI research. In natural language processing, the Pathways model proposed by Google is likely to become the standard architecture for future language model development. This is because the sparsity of this architecture is more similar to the human brain than existing dense models are, and this similarity is believed to contribute to the realization of AGI.

