
The race to make AI smaller (and smarter)

May 30, 2023

Highlights: Teaching large language models fewer words could help them sound more human. A successful mini-model would be almost as capable as high-end models, but much smaller, more accessible and easier to interpret. A group of young academics working on natural language processing launched a challenge to try to reverse this paradigm, asking teams to create functional language models using datasets less than one-ten-thousandth the size of those used by the most advanced large language models.


Teaching large language models fewer words could help them sound more human.


When it comes to AI chatbots, bigger is usually better.

Large language models like ChatGPT and Bard, which generate conversational, original text, improve as they are fed more data.

Every day, bloggers take to the Internet to explain how the latest advances — an app that summarizes articles, AI-generated podcasts, a fine-tuned model that can answer any question related to professional basketball — will "change everything."

But creating larger, more capable AI requires processing power that few companies possess, and there are growing concerns that a small group, including Google, Meta, OpenAI and Microsoft, will exert near-total control over the technology.

In addition, larger language models are more difficult to understand.

They are often described as "black boxes," even by the people who design them, and leading figures in this field have expressed concern that AI's goals may not ultimately match our own.

If bigger is better, it's also more opaque and more exclusive.

In January, a group of young academics working on natural language processing – the branch of AI focused on linguistic understanding – launched a challenge to try to reverse this paradigm.

The group asked teams to create functional language models using datasets less than one-ten-thousandth the size of those used by the most advanced large language models.

A successful mini-model would be almost as capable as the high-end models, but much smaller, more accessible and more human-friendly.

The project is called the BabyLM Challenge.

"We challenge people to think small and focus more on building efficient systems that can be used by more people," says Aaron Mueller, a computer scientist at Johns Hopkins University and organizer of BabyLM.

Alex Warstadt, a computer scientist at ETH Zurich and another of the project's organizers, added:

"The challenge puts questions about learning human language at the center of the conversation, rather than 'How big can we make our models?'"

Large language models are neural networks trained to predict the next word in a given phrase or sentence.

They are trained for this task using a corpus of words gleaned from transcripts, websites, novels, and newspapers.

A typical model makes guesses based on example phrases and then adjusts based on how close it is to the correct answer.

By repeating this process over and over again, a model forms maps of how words relate to each other.
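
As a rough sketch, the idea can be boiled down to a toy, count-based predictor in Python (an illustrative stand-in, not how neural models are actually built, with a made-up mini-corpus):

    from collections import Counter, defaultdict

    # A made-up mini-corpus standing in for the transcripts, websites,
    # novels and newspapers a real model is trained on.
    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    # Map each word to a tally of the words observed right after it.
    following = defaultdict(Counter)
    for word, next_word in zip(corpus, corpus[1:]):
        following[word][next_word] += 1

    def predict_next(word):
        """Guess the most frequently observed follower of `word`."""
        candidates = following.get(word)
        return candidates.most_common(1)[0][0] if candidates else None

    print(predict_next("sat"))  # -> "on" (seen twice after "sat")
    print(predict_next("the"))  # followers all tie; returns "cat", the first seen

A real language model replaces these raw counts with billions of learned weights, nudging them after every wrong guess, but the underlying task of predicting the next word from context is the same.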

In general, the more words used to train a model, the better it will be; each sentence provides context to the model, and more context translates into a more detailed impression of what each word means.

OpenAI's GPT-3, released in 2020, was trained on 200 billion words; DeepMind's Chinchilla, launched in 2022, was trained on a trillion.

For Ethan Wilcox, a linguist at ETH Zurich, the fact that something non-human can generate language represents an exciting opportunity:

Could AI language models be used to study how humans learn language?

For example, nativism, an influential theory dating back to the early work of Noam Chomsky, holds that humans learn language quickly and effectively because they have an innate understanding of how language works.

But language models also learn fast and, apparently, without an innate understanding of how language works.

The problem is that language models learn very differently from humans.

Humans have a body, a social life and a wealth of sensations.

We can smell the mulch, feel the feathers, bump into the doors and taste the mint candies.

From a very young age we are exposed to simple spoken words and syntax that are often not represented in writing.

That's why, Wilcox concludes, a computer that produces language after being trained on billions of written words can't tell us much about our own linguistic process.

But if a language model were exposed only to the words a young human being encounters, it could interact with language in a way that might answer certain questions we ask about our own capabilities.

So, along with half a dozen colleagues, Wilcox, Mueller and Warstadt conceived the BabyLM Challenge, to try to bring language models slightly closer to human understanding.

In January, they sent out a call for teams to train language models on roughly the number of words a 13-year-old human has encountered: about 100 million, one-ten-thousandth of Chinchilla's trillion.

Candidate models would be tested for their ability to generate and capture the nuances of language, and a winner would be announced.
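
That word budget is simple to enforce in code. A minimal sketch (the file name and whitespace tokenization here are illustrative assumptions, not the challenge's official tooling):

    # Truncate a raw text file to a fixed word budget, BabyLM-style.
    WORD_BUDGET = 100_000_000  # roughly the words a 13-year-old has encountered

    def build_training_corpus(path, budget=WORD_BUDGET):
        """Keep whole lines of text until the word budget is exhausted."""
        kept_lines, words_kept = [], 0
        with open(path, encoding="utf-8") as f:
            for line in f:
                n = len(line.split())  # crude whitespace word count
                if words_kept + n > budget:
                    break
                kept_lines.append(line)
                words_kept += n
        return kept_lines, words_kept

    # corpus, total = build_training_corpus("raw_text.txt")  # hypothetical file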

Eva Portelance, a linguist at McGill University, discovered the challenge the day it was announced.

Portelance's research straddles the line between computer science and linguistics.

The first forays into AI, in the 1950s, were driven by the desire to model human cognitive abilities in computers; the basic unit of information processing in AI is the "neuron", and the first language models of the 1980s and 1990s were directly inspired by the human brain.

But as processors became more powerful and companies began working on marketable products, engineers realized that it was often easier to train linguistic models with huge amounts of data than to force them to create psychologically informed structures.

As a result, Portelance said, "we are given text that sounds human, but there is no connection between us and how these models work."

For scientists interested in understanding how the human mind works, these large models offer limited insight.

And because they require enormous processing power, few researchers can access them.

"Only a small number of industrial laboratories with enormous resources can afford to train models with billions of parameters in trillions of words," says Wilcox.


"Or even to charge them," Mueller added.

"This has made research in this field seem somewhat less democratic lately."

The BabyLM Challenge, according to Portelance, could be seen as a step away from the arms race for ever-larger language models and a step towards more accessible and intuitive AI.

The potential of such a research program has not been ignored by the large laboratories in the sector.

Sam Altman, CEO of OpenAI, recently stated that increasing the size of language models would not lead to the same kind of improvements seen in recent years.

Companies like Google and Meta have been investing in research on more efficient language models inspired by human cognitive structures.

After all, a model capable of generating language when trained with less data could potentially be scaled up.

Whatever the benefits of a successful BabyLM, for those behind the challenge the objectives are more academic and abstract. Even the prize subverts the practical.

"It's just pride," Wilcox says.

c.2023 The New York Times Company

Source: Clarín
