OSI explains what makes AI systems open, but most “open” models don’t comply

The highly respected Open Source Initiative, which has a reputation as one of the most prominent stewards of open-source software, has finally come up with an official definition of what makes AI models open or not.

The definition was immediately rejected by Meta Platforms Inc., whose popular Llama large language models fail to meet it.

OSI made the Open Source AI Definition v1.0 public at the All Things Open 2024 conference this week in Raleigh, North Carolina, saying it is the result of a years-long process of collaboration with various organizations and academia. The organization envisions the OSAID becoming the standard by which anyone can determine whether an AI system is truly open or not.

Standards for what makes traditional software “open” have long been agreed upon, but AI software is a different beast because it contains elements not covered by traditional licenses, such as the vital data used to train it.

That’s why OSI spent years coming up with a new definition for such systems, decreeing that for AI to be considered truly open source, it must provide the following three things:

  1. Full access to details about the data used to train the AI, so that others can understand and reproduce it.
  2. The complete codebase used to build and run the AI system.
  3. The settings and weights used during training that enable the AI to generate results.

Unfortunately for self-proclaimed “champions” of open-source AI such as Meta, Stability AI Ltd. and Mistral AI, the vast majority of their models do not meet the OSI definition.

For example, Meta’s Llama models carry restrictions on commercial use that prevent free use by applications with more than 700 million users. In addition, Meta does not provide access to Llama’s training datasets, nor complete information about that data, meaning the Llama models cannot be reproduced.

Stability AI, which specializes in image and video generation, has long insisted that its popular Stable Diffusion models are “open.” But they also fail to meet OSI’s definition, because the company requires businesses with more than $1 million in annual revenue to buy an enterprise license to use its models. Mistral AI likewise imposes restrictions on the use of its latest Ministral 3B and 8B models by certain commercial enterprises.

It’s likely that many more AI companies claiming to be open source will be upset by the OSI definition. A recent study by Carnegie Mellon, the AI Now Institute and the Signal Foundation found that the vast majority of “open source” models are actually far more secretive than the label suggests. For example, very few disclose the datasets used to train them, and most require massive amounts of computing power to train and run, putting them out of reach of most developers.

In the case of Llama, Meta says security concerns prevent it from sharing the underlying training data with the community, but few believe that’s the only reason. Meta is almost certainly using the vast amounts of content posted by users of platforms such as Facebook and Instagram, including content that is visible only to a user’s contacts.

In addition, Llama has likely learned from the vast amounts of copyrighted material published online, and Meta does not want to divulge the details. In April, the New York Times reported that Meta had admitted internally that Llama’s training data contained copyrighted content, because it was impossible to avoid collecting such material. The company has good reason to keep quiet, as it is currently embroiled in a litany of lawsuits brought by publishers, authors and other content creators.

Rather than challenge OSI head-on, Meta appears to have decided to “agree to disagree” with its definition of open-source AI. A company spokesperson said that while Meta agrees with OSI on many things, it disagrees with today’s announcement.

“There is no single definition of open source AI, and defining it is challenging because previous open source definitions do not capture the complexities of today’s rapidly evolving AI models,” the spokesperson said.

The problem for Meta is that most people are likely to accept the OSI definition because it’s based on fairly simple logic, Rob Enderle, an analyst at the Enderle Group, told SiliconANGLE.

“OSI is right in its assessment because without transparency of training data, you really don’t have an open platform,” Enderle said. “Training data is not a trivial thing because it defines how AI functions. Without access to it, an AI system cannot be open because the very nature of how it works is closed.”

Most experts who don’t have a stake in the big AI technology companies would probably agree with the OSI definition. The organization’s definition of open-source software is widely regarded as the standard that ensures software can be used freely without fear of lawsuits and licensing pitfalls. In addition, OSI spent more than two years working closely with various academics, AI developers and researchers to refine its definition of open-source AI.

The OSI definition also closely resembles an earlier attempt to figure out what makes AI open: the Linux Foundation published its own definition earlier this year, listing many of the same requirements.

Image: SiliconANGLE/Microsoft Designer
