TODAYINFO

Scientists prove that large AI models can be trained on ethically sourced data

2025-06-08 21:13:38 HKT | todayinfo

Image: Getty / Futurism

According to The Washington Post, a team of more than 20 AI researchers from MIT, Cornell University, the University of Toronto and other institutions trained a large language model using only openly licensed or public domain data, providing a blueprint for ethical development of the technology.

But, as the creators readily admit, this is far from easy.

As they described in a peer-reviewed paper published this week, they quickly discovered that the bottleneck was not computing power but human labor.

WaPo explained that this is because the text in the dataset they assembled, which they call Common Pile v0.1, had to be manually cleaned and reformatted to make it suitable for AI training. On top of that came a great deal of extra work to double-check the copyright status of all the data, since many online works are improperly licensed.

"This is not a thing where you can just scale up the available resources," like access to more computer chips and a fancy web scraper, study co-author Stella Biderman told WaPo. "We use automated tools, but all of our stuff was manually annotated and checked by people at the end of the day. It's really hard."

Nevertheless, Biderman and her colleagues got the job done.

After the arduous process of creating the Common Pile, they used the guilt-free dataset to train an LLM with 7 billion parameters. The result? A respectable AI that can rival industry models such as Meta's Llama 1 and Llama 2 7B. That is impressive, but those versions were released more than two years ago, which is practically a lifetime in the AI race.

Of course, this was accomplished by a more or less ragtag team rather than a company with billions of dollars in resources, so the researchers had to make up the difference with resourcefulness. One particularly clever find was a set of more than 130,000 English-language books in the Library of Congress.

Copyright remains one of the biggest ethical and legal issues facing AI. Leaders like OpenAI and Google consumed unfathomable amounts of data from the surface web to get where they are, devouring everything from news articles to content as invasive as social media posts. Meta has been sued by authors who allege that it illegally used 7 million pirated copyrighted books to train its AI.

The tech industry has defended its voracious appetite for data by arguing that it all counts as fair use and, more existentially, that it would be "impossible" to develop the technology without hoovering up everyone's content for free.

This latest work is a rebuttal to that Silicon Valley line, although it does not resolve every ethical concern. It is still a large language model, a technology that fundamentally aims to destroy jobs, and perhaps not everyone whose work ended up in the public domain would be happy to see it regurgitated by an AI. That is, if they are not deceased artists whose copyright has expired.

Even assuming that AI companies are reined in and can only use works with permission or payment, which is a big assumption, the fact remains that as long as these companies stick around, copyright holders will face enormous pressure to allow AI training.

Biderman herself has no illusions that companies like OpenAI will suddenly turn over a new leaf and become models of ethical data sourcing. But she hopes that her work will at least get them to stop hiding what they use to train their AI models.

"Even partial transparency has great social value and moderate scientific value," she told WaPo.
