NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing

May 9, 2026

Learn how NVIDIA's Star Elastic technology packs multiple AI models into one file, making AI more efficient and accessible. This new method trains models of different sizes together, saving time and resources while improving performance.

Introduction

Imagine a library that, instead of shelving three separate editions of the same book (full, abridged, and pocket), keeps one smart volume that can open as whichever edition you need. NVIDIA's new AI technology, called Star Elastic, does something similar with artificial intelligence models. Instead of training several models of different sizes separately, it trains them together and packs them into one file, which saves training time and storage.

What is Star Elastic?

Star Elastic is a new method developed by NVIDIA that stores multiple versions of an AI model inside a single file (a checkpoint), like the smart library book above. The versions differ in size, measured by parameter count: the number of learned values inside the model. In this case, Star Elastic stores three models: one with 30 billion parameters (30B), another with 23 billion (23B), and a third with 12 billion (12B). Generally, the larger versions are more capable, while the smaller ones need less memory and computing power to run.

Traditionally, creating three such models would require three separate training runs, each of which takes a lot of time and computing power. Star Elastic changes that by training all three together in one run. The smaller models can then be cut out of the same checkpoint without further training, which is what "zero-shot slicing" in the title refers to.
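To make the idea of several sizes living in one file more concrete, here is a minimal Python sketch. It is a toy, not NVIDIA's code: the "checkpoint" is a single small table of numbers, and the function names (make_checkpoint, slice_model, count_parameters) are made up for illustration. It also assumes a smaller model is simply the first few rows of the big one, which is a simplification of how a real system would choose what to keep.

```python
# Toy illustration only. All names are hypothetical, and a real checkpoint
# holds many tensors rather than one small table.

def make_checkpoint(hidden: int, inputs: int) -> list[list[float]]:
    """Build one 'full' weight table with `hidden` rows and `inputs` columns."""
    return [[(r + 1) * 0.1 + c * 0.01 for c in range(inputs)] for r in range(hidden)]

def slice_model(checkpoint: list[list[float]], hidden: int) -> list[list[float]]:
    """Return a smaller model by keeping only the first `hidden` rows.

    No retraining happens here: the small model reuses the big model's rows.
    """
    if hidden > len(checkpoint):
        raise ValueError("cannot slice a model larger than the checkpoint")
    return checkpoint[:hidden]

def count_parameters(model: list[list[float]]) -> int:
    """Count every number stored in the model."""
    return sum(len(row) for row in model)

full = make_checkpoint(hidden=30, inputs=4)
medium = slice_model(full, 23)
small = slice_model(full, 12)

print(count_parameters(full), count_parameters(medium), count_parameters(small))
# prints: 120 92 48

# The small model's first row is the very same object stored in the full model.
print(small[0] is full[0])
# prints: True
```

The point of the sketch is that nothing is copied or retrained when a smaller model is requested. It is read straight out of the weights the largest model already has.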

How Does Star Elastic Work?

Think of Star Elastic like a chef who can make cakes of different sizes from the same recipe and ingredients, simply by adjusting how much of each ingredient goes in. The chef doesn't need to re-learn the recipe every time they want a small or large cake. They just change the amount of batter.

In Star Elastic, the AI is trained once, in a single training run over a large amount of text (training data is measured in tokens, which are small pieces of words). During this run, it learns to perform well at several sizes at the same time. This is done using a framework called Nemotron Elastic, which trains the smaller models as parts of the largest one, so that they share its weights instead of keeping separate copies.
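The sketch below shows, in miniature, what it can mean for one training run to teach several sizes at once. It is a hedged toy and not the Nemotron Elastic procedure: the "model" is just a short list of weights, a model of size k is assumed to use only the first k of them, and every name (predict, train_nested, mean_squared_error) is invented for this example. At each step, the errors of all three sizes are added together, so one update improves them all.

```python
import random

# Hypothetical toy: a "model" is a list of weights, and a model of size k
# uses only the first k weights. This is not NVIDIA's training code.

def predict(weights, x, size):
    """Predict with the first `size` weights only, so small models nest inside large ones."""
    return sum(w * xi for w, xi in zip(weights[:size], x[:size]))

def train_nested(data, sizes, steps=2000, lr=0.01, seed=0):
    """One training run that improves every size at each step."""
    rng = random.Random(seed)
    weights = [0.0] * max(sizes)
    for _ in range(steps):
        x, y = rng.choice(data)
        grads = [0.0] * len(weights)
        for size in sizes:  # add up the squared-error gradient of each nested size
            error = predict(weights, x, size) - y
            for i in range(size):
                grads[i] += 2 * error * x[i]
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights

def mean_squared_error(weights, data, size):
    """Average squared error of the size-`size` model on the data."""
    return sum((predict(weights, x, size) - y) ** 2 for x, y in data) / len(data)

# Synthetic data: the target depends on three inputs with shrinking importance.
rng = random.Random(1)
data = []
for _ in range(200):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    data.append((x, 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[2]))

shared = train_nested(data, sizes=[1, 2, 3])
for size in (1, 2, 3):
    print(size, round(mean_squared_error(shared, data, size), 4))
```

Running it should show the largest size with the lowest error and the smallest with the highest, even though all three come from one shared set of weights produced by a single run.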

During inference (when the AI is actually used to answer questions), Star Elastic uses a smart system called elastic budget control. This means it uses a smaller version of the model to think through a problem and then switches to the full model to give the final answer — like using a rough draft to brainstorm and then writing the final, polished version.
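Here is a small Python sketch of why splitting the work this way can save effort. Everything in it is an assumption made for illustration: the names (Step, plan_generation, relative_cost), the choice of the 12B model for the thinking phase, the token counts, and especially the cost table, which simply pretends that each generated token costs an amount proportional to the model's size. Real costs depend on the hardware and the model's design.

```python
from dataclasses import dataclass

@dataclass
class Step:
    phase: str        # "thinking" or "answer"
    model_size: str   # which slice of the checkpoint handles this phase
    tokens: int       # how many tokens are generated in this phase

def plan_generation(thinking_tokens, answer_tokens,
                    thinking_model="12B", answer_model="30B"):
    """Assign a model size to each phase (hypothetical policy)."""
    return [
        Step("thinking", thinking_model, thinking_tokens),
        Step("answer", answer_model, answer_tokens),
    ]

def relative_cost(plan, cost_per_token):
    """Sum tokens times per-token cost over all phases, in arbitrary units."""
    return sum(step.tokens * cost_per_token[step.model_size] for step in plan)

# Assumed cost table: arbitrary units, proportional to parameter count.
COST_PER_TOKEN = {"12B": 12, "23B": 23, "30B": 30}

elastic = plan_generation(thinking_tokens=800, answer_tokens=200)
full_only = plan_generation(800, 200, thinking_model="30B")

print(relative_cost(elastic, COST_PER_TOKEN))    # prints: 15600
print(relative_cost(full_only, COST_PER_TOKEN))  # prints: 30000
```

Under these made-up numbers, most tokens are spent on thinking, so handing that phase to the smaller model roughly halves the total cost while the final answer still comes from the full model.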

Why Does It Matter?

Star Elastic matters because it makes AI more efficient and accessible. Instead of needing a powerful computer to run the largest model, users can pick a smaller version that runs faster and needs less memory. It also saves a lot of time and energy, since training is done once rather than separately for each size.

Additionally, the models are offered in compact, low-precision number formats (nested FP8 and NVFP4), which shrink the memory they need. This allows even regular consumer-grade GPUs (like those in gaming computers) to run these models, so more people and businesses can use advanced AI without needing expensive hardware.
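A quick back-of-the-envelope calculation shows why model size and number format matter so much for hardware. The sketch below estimates only the memory needed to hold the weights, using the bit widths in the format names: 8 bits per weight for FP8 and 4 bits for NVFP4, with 16-bit BF16 added as a common baseline for comparison (the article does not mention BF16). The function name weight_memory_gb is made up. The estimate is a lower bound: it ignores the small scaling factors that NVFP4 stores alongside the weights, as well as the working memory a model needs while it runs.

```python
# Bits used to store one weight in each format.
# BF16 is included only as a familiar baseline for comparison.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "NVFP4": 4}

def weight_memory_gb(parameters_billion: float, fmt: str) -> float:
    """Rough weight-only memory in gigabytes (1 GB = 1e9 bytes).

    billions of parameters * bits per weight / 8 bits per byte = gigabytes
    """
    return parameters_billion * BITS_PER_WEIGHT[fmt] / 8

for size in (30, 23, 12):
    row = ", ".join(f"{fmt}: {weight_memory_gb(size, fmt):g} GB"
                    for fmt in BITS_PER_WEIGHT)
    print(f"{size}B -> {row}")
# prints:
# 30B -> BF16: 60 GB, FP8: 30 GB, NVFP4: 15 GB
# 23B -> BF16: 46 GB, FP8: 23 GB, NVFP4: 11.5 GB
# 12B -> BF16: 24 GB, FP8: 12 GB, NVFP4: 6 GB
```

Going from the 30B model at 16 bits to the 12B model at 4 bits cuts the weight storage from about 60 GB to about 6 GB in this estimate, which is the difference between data-center hardware and the kind of memory found on a gaming graphics card.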

For example, imagine you're using an AI assistant to solve a math problem. With Star Elastic, the AI might first use a quick, smaller version to work out how to approach the problem. Then it switches to the full model to give you the final answer, all without you needing to choose a model yourself.

Key Takeaways

  • Star Elastic is a new way to train AI models that stores multiple versions of a model in one file.
  • It trains models of different sizes (30B, 23B, and 12B parameters) in one run, saving time and resources.
  • Its elastic budget control lets a smaller version do the thinking and the full model give the final answer.
  • This makes AI faster, more efficient, and accessible to more people and devices.
  • It uses low-precision number formats, nested FP8 and NVFP4, to help large models fit on consumer-grade GPUs.

Source: MarkTechPost
