GPT-4 will have 100 trillion parameters, 500 times more than GPT-3!

Table of Contents

Are there limits to large-scale neural networks?
Chip and Model – WSE-2 and GPT-4
What can we expect from GPT-4?

Are there limits to large-scale neural networks?

OpenAI was founded to tackle the challenge of achieving artificial general intelligence (AGI): an AI that can do anything humans can.

Such a technology would change the world as we know it. Used properly, it could benefit us all; used incorrectly, it could become one of the most destructive weapons. That’s why OpenAI took on the quest to realize AGI: to ensure that everyone can enjoy its benefits equally.

However, it is no exaggeration to say that this is one of the greatest scientific undertakings humanity has ever attempted. Even with all the advances in computer science and artificial intelligence, no one knows how, or when, the problem will be solved.

Some argue that deep learning alone is not enough to achieve AGI. Stuart Russell, a professor of computer science at Berkeley and an AI pioneer, said: “Looking at raw computational power misses the point entirely… Even if we had a computer the size of the universe, we don’t know how to build a truly intelligent machine.”

OpenAI, on the other hand, believes that large neural networks fed with large datasets and trained on huge computers are the best path to AGI. As OpenAI CTO Greg Brockman told the Financial Times: “I think the person with the biggest computer will get the biggest benefit.”

And OpenAI acted on that belief. To unlock the hidden power of deep learning, it began training ever larger models. The first modest steps were GPT and GPT-2. These large language models laid the foundation for GPT-3, the main subject of this article. GPT-3 is a language model with 175 billion parameters, more than 100 times larger than GPT-2.

At the time of its introduction, GPT-3 was the largest neural network ever built, and it remains the largest dense neural network today. Its command of language and its myriad capabilities surprised many people. And despite the skepticism of some experts, the large language model already felt strangely human. These achievements were enough to strengthen the OpenAI researchers’ conviction that AGI is a problem deep learning can solve.

(*Translation Note 1) This passage describing OpenAI’s guiding philosophy is quoted from “Introducing OpenAI,” a blog post the organization published on December 11, 2015.

(*Translation Note 2) According to “Billion Dollar Bet on Human-Level AI,” an article published in the Financial Times and reprinted on Medium, Stuart Russell, mentioned above, is not the only one to argue that AGI cannot be achieved merely by scaling up deep learning. Oren Etzioni, head of the Allen Institute for Artificial Intelligence, makes the same argument. Etzioni believes that taking deep learning to the next level will require breakthroughs, and that those breakthroughs won’t happen simply by throwing money at the problem.

(*Translation Note 3) In the Financial Times article mentioned above, Greg Brockman responded to the criticism that scaling up deep learning cannot produce AGI. He argued that increasing the amount of computation yields qualitatively different results.

(*Translation Note 4) The linked webpage is “GPT-3 Creative Fiction,” run by the American writer Gwern Branwen. The page collects a wide range of texts generated by GPT-3 and discusses the model’s weaknesses.

(*Translation Note 5) In August 2020, Gary Marcus, professor of psychology at New York University, and Ernest Davis, professor of computer science at New York University, published an article in the US edition of MIT Technology Review titled “GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about.” In it, they argued that although the model produces text that looks human-written, it does not actually understand the real world the way humans do. As evidence, they cite nonsensical sentences generated by the model, which are collected on the web page “An experiment testing GPT-3’s ability in commonsense reasoning: the results.” See also the AINOW translation article “Trying the Turing test on GPT-3,” which describes an attempt to probe the limits of the model’s abilities by putting it through the Turing test.

OpenAI believes in the “scaling hypothesis.” Given a scalable algorithm, in this case the Transformer, the architecture underlying the GPT family, there could be a direct path to AGI: training larger and larger models based on that algorithm.

But large models are just one piece of the AGI puzzle. Training them requires large datasets and a great deal of computing power.

Data stopped being a bottleneck once the machine learning community began to unlock the potential of unsupervised learning. Combine that with generative language models and few-shot task transfer, and OpenAI’s “large dataset” problem is solved.
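“Few-shot task transfer” means that a task-specific training set is not needed. A user writes a handful of solved examples into the prompt and lets a generative model continue the pattern. The sketch below only assembles such a prompt as a string. The helper name, the “Input:/Output:” layout and the translation task are illustrative assumptions, and no actual model or API is called.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt (hypothetical helper, not an OpenAI API).

    instruction: plain-language description of the task
    examples:    list of (input, output) pairs shown to the model as demonstrations
    query:       the new input the model is expected to complete
    """
    lines = [instruction, ""]
    for source, target in examples:
        # Each solved example demonstrates the task in place of training data.
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final entry is left open; a language model would generate the answer.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [("cheese", "fromage"), ("house", "maison")]
    prompt = build_few_shot_prompt("Translate English to French.", demos, "cat")
    print(prompt)
```

The model never sees a dedicated training set for this task. The two demonstrations inside the prompt are the only task-specific data.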

OpenAI believed all it still needed was massive computational resources to train and deploy its models. That’s why it partnered with Microsoft in 2019. In exchange for licensing some of OpenAI’s models to the big tech company for commercial use, OpenAI gained access to Microsoft’s cloud computing infrastructure and the powerful GPUs it needed.

However, GPUs are not built specifically for training neural networks. The gaming industry developed them for graphics processing, and the AI industry simply took advantage of their suitability for parallel computation. OpenAI wanted the best models, the best datasets, and the best computer chips. GPUs alone weren’t enough.

Many companies have realized this and have started building in-house chips specialized for training neural networks without sacrificing efficiency or capacity. But for a pure software company like OpenAI, integrating hardware design and manufacturing is nearly impossible. So it took another route: using third-party, AI-specific chips.

This is where Cerebras Systems comes in. In 2019, the chip company had already built the largest chip ever made for training large neural networks. Now it has managed to build a giant chip again, and OpenAI will take advantage of this remarkable piece of engineering.

(*Translation Note 6) For more information on how the Transformer works and on the relationship between that algorithm and language AI, see the AINOW translation article “…”.

Chip and Model – WSE-2 and GPT-4

Two weeks ago, WIRED published an article revealing two important pieces of news.

First, Cerebras has once again built the largest chip on the market, the Wafer Scale Engine Two (WSE-2). The WSE-2 measures about 22 cm on a side and contains 2.6 trillion transistors. By comparison, Tesla’s new training tiles have 1.25 trillion transistors.

Cerebras found an efficient way to condense computing power, giving the WSE-2 850,000 cores (computational units), compared with the few hundred found in typical GPUs. It also solved the heat problem with a new cooling system and achieved an efficient flow of data in and out of the chip.

There aren’t many uses for an ultra-specialized, super-expensive, mega-powerful chip like the WSE-2. Training large neural networks is one of them. So Cerebras talked to OpenAI.

Now for the second piece of news. Cerebras CEO Andrew Feldman told WIRED: “From talking to OpenAI, GPT-4 will be about 100 trillion parameters… That won’t be ready for several years.”

Since GPT-3, there have been high expectations for OpenAI’s next release. We now know it will arrive in a few years and that it will be huge: more than 500 times larger than GPT-3. You read that right: more than 500 times.
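The size comparisons in this article are simple ratios of parameter counts, and the sketch below reproduces them. Two assumptions apply. First, GPT-2’s 1.5 billion parameters (its largest released version) is an added input used for the GPT-2 comparison. Second, the GPT-4 figure is only the rumored “about 100 trillion” from the interview, not a confirmed number.

```python
# Parameter counts used for the comparison. The GPT-4 value is the rumored
# figure quoted by Cerebras, not an official OpenAI number.
GPT2_PARAMS = 1.5e9           # largest released GPT-2 model
GPT3_PARAMS = 175e9           # GPT-3
GPT4_RUMORED_PARAMS = 100e12  # "about 100 trillion"


def scale_factor(larger: float, smaller: float) -> float:
    """How many times more parameters the larger model has."""
    return larger / smaller


print(f"GPT-3 vs GPT-2: {scale_factor(GPT3_PARAMS, GPT2_PARAMS):.0f}x")
print(f"GPT-4 (rumored) vs GPT-3: {scale_factor(GPT4_RUMORED_PARAMS, GPT3_PARAMS):.0f}x")
```

This prints `117x` and `571x`. GPT-3 is a little over 100 times the size of GPT-2. The rumored GPT-4 would be roughly 570 times the size of GPT-3, which is why “more than 500 times” is the accurate reading rather than exactly 500.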

GPT-4 will be more than 500 times larger than the language model that shocked the world last year.

(*Translation Note 7) “Two weeks ago” in this article means two weeks before this article was published on September 12th; the WIRED articles were published on the 24th. The two articles appeared in Japanese translation on WIRED.jp as “Clustering huge chips: the potential of a technology that dramatically enhances AI’s capabilities” and “A semiconductor chip larger than an iPad accelerates AI research.”

What can we expect from GPT-4?

100 trillion parameters is a lot. To grasp how big that number is, compare it with our brain. The brain has about 80 to 100 billion neurons (the same order of magnitude as GPT-3’s parameter count) and about 100 trillion synapses.

GPT-4 will have as many parameters as the brain has synapses.

The sheer size of such a neural network could bring qualitative leaps over GPT-3 that we can only imagine. Current prompting methods may not even be able to test the system’s full potential.

But comparing artificial neural networks with the brain is tricky. The comparison looks fair at first glance, but only because we assume that artificial neurons are at least loosely based on biological ones. A recent study published in Neuron suggests otherwise. The researchers found that a neural network at least five layers deep is needed to simulate the behavior of a single biological neuron. That works out to about 1,000 artificial neurons for each biological neuron.
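The Neuron result can be turned into a back-of-the-envelope estimate. The sketch below takes the article’s figures at face value: 80 to 100 billion biological neurons, and roughly 1,000 artificial neurons to emulate each one. From these it computes how many artificial neurons a neuron-by-neuron emulation of the brain would need. Treating “1,000 per neuron” as a fixed multiplier is a simplifying assumption.

```python
# Approximate figures quoted in this article.
BIO_NEURONS_LOW = 80e9        # lower estimate of neurons in the human brain
BIO_NEURONS_HIGH = 100e9      # upper estimate
ARTIFICIAL_PER_BIO = 1_000    # artificial neurons per biological neuron (rough)
GPT4_RUMORED_PARAMS = 100e12  # rumored GPT-4 parameter count


def artificial_neurons_needed(bio_neurons: float,
                              per_neuron: int = ARTIFICIAL_PER_BIO) -> float:
    """Artificial neurons required if each biological neuron costs `per_neuron`."""
    return bio_neurons * per_neuron


low = artificial_neurons_needed(BIO_NEURONS_LOW)
high = artificial_neurons_needed(BIO_NEURONS_HIGH)
print(f"Artificial neurons needed: {low:.0e} to {high:.0e}")
print(f"Upper estimate / GPT-4 parameters: {high / GPT4_RUMORED_PARAMS:.1f}")
```

Under these assumptions the neuron count alone reaches 80 to 100 trillion. That is the same order as GPT-4’s rumored parameter count, before counting a single weight connecting those neurons. So matching the parameter count to the synapse count does not make the two systems comparable.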

Even if GPT-4 isn’t as powerful as the human brain, it will surely have some surprises in store. Unlike GPT-3, it likely won’t be just a language model. OpenAI chief scientist Ilya Sutskever hinted at this when he wrote about multimodality in December 2020:

“In 2021, language models will start to become aware of the visual world. Text alone can express a great deal of information about the world, but it is incomplete, because we also live in a visual world.”

DALL-E, a smaller version of GPT-3 (12 billion parameters) trained specifically on text-image pairs, already shows some signs of multimodality. At the time, OpenAI said that “manipulating visual concepts through language is now within reach.”

OpenAI is working nonstop to exploit GPT-3’s hidden capabilities. DALL-E is a special case of GPT-3, as is Codex. However, these are specializations rather than absolute improvements. GPT-4 promises more than that. It may combine the depth of specialized systems such as DALL-E (text-to-image) and Codex (coding) with the breadth of general systems such as GPT-3 (general language).

And what about other human-like capabilities, such as reasoning and common sense? On this point, Sam Altman says he is “optimistic” but not certain.

Many questions about AGI remain largely unanswered. No one knows whether it is possible. No one knows how to build it. No one knows whether larger neural networks will even bring us closer to it. But one thing is undeniable: GPT-4 will be something to watch.

(*Translation Note 8) In June, Romero, the author of this article, published a post on Medium titled “Software 3.0 – How Prompts Will Change the Rules of the Game.” Starting from the view that the rise of deep learning and the birth of GPT-3 will change how software is controlled, the post summarizes the historical evolution of software control as follows.

Three stages in the historical evolution of software control

Software 1.0: The programming culture before the third AI boom. Programmers describe the software’s behavior entirely in a programming language.

Software 2.0: The programming culture that emerged after the third AI boom. You give an AI model a goal (output) and training data (input) and train it to achieve that goal. The behavior by which the AI accomplishes its goal is not written by programmers but determined through learning. This leads to the “black box problem”: it is impossible to explain why the AI behaves the way it does.

Software 3.0: The programming culture revealed by the birth of GPT-3. In prompt-style interaction, arbitrary sentences are given to the model as input and the model outputs sentences it generates. Each run draws out the model’s linguistic potential anew, and each run can also be seen as a form of learning for the model.

To take advantage of software that runs on Software 3.0, you will need a way of using prompts, a grammar of sorts, that produces the expected output. According to Romero, GPT-3 generates nonsense because people don’t know the correct “prompt grammar.” Without establishing such a grammar, it may be impossible to gauge the potential of GPT-4.
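The three stages can be contrasted on a single toy task: labeling short reviews as positive or negative. The sketch below is illustrative only. The function names, the tiny dataset, and the use of a perceptron as the “training” step for Software 2.0 are assumptions chosen for brevity. The Software 3.0 function only builds the prompt text that would be sent to a language model; it does not call any model.

```python
from collections import defaultdict


# Software 1.0: the programmer writes the decision rule by hand.
def sentiment_v1(text: str) -> str:
    positive_words = {"good", "great", "excellent", "loved"}
    words = text.lower().split()
    return "positive" if any(w in positive_words for w in words) else "negative"


# Software 2.0: only labeled examples are given; a perceptron learns the
# per-word weights that determine the program's behavior.
def train_v2(data, epochs: int = 10):
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in data:
            target = 1 if label == "positive" else -1
            words = text.lower().split()
            score = sum(weights[w] for w in words)
            prediction = 1 if score > 0 else -1
            if prediction != target:
                # Nudge every word in the misclassified example toward the label.
                for w in words:
                    weights[w] += target
    return weights


def sentiment_v2(weights, text: str) -> str:
    score = sum(weights.get(w, 0.0) for w in text.lower().split())
    return "positive" if score > 0 else "negative"


# Software 3.0: the "program" is a prompt; its wording and demonstrations
# (the prompt grammar) steer what a language model would generate.
def sentiment_v3_prompt(text: str) -> str:
    return (
        "Decide whether each review is positive or negative.\n"
        "Review: The food was great\nSentiment: positive\n"
        "Review: Terrible service\nSentiment: negative\n"
        f"Review: {text}\nSentiment:"
    )


if __name__ == "__main__":
    training_data = [
        ("great movie", "positive"),
        ("awful plot", "negative"),
        ("loved it", "positive"),
        ("boring and awful", "negative"),
    ]
    weights = train_v2(training_data)
    print(sentiment_v1("a great film"))
    print(sentiment_v2(weights, "loved the movie"))
    print(sentiment_v3_prompt("Not worth the money"))
```

In each stage the behavior lives in a different place. In 1.0 it is the hand-written word list, in 2.0 the learned weights, and in 3.0 the wording and examples of the prompt.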
(*Translation Note 9) For DALL-E, please refer to the AINOW translation article “DALL-E explained in less than 5 minutes.”

(*Translation Note 10) For language models that convert natural language into programming languages, such as OpenAI Codex, see the AINOW translation articles “Will AI replace programmers?” and “If you use AI Copilot on GitHub, you may be sued.”

(*Translation Note 11) Sam Altman serves on the Board of Directors of OpenAI. He was previously involved with Y Combinator, which helps startups get off the ground, but is no longer affiliated with it.
