NVIDIA Releases Nemotron 3 Super: Open 120B-Parameter Model Targets Enterprise Agentic AI
NVIDIA has released Nemotron 3 Super, a 120-billion-parameter open-weights model built on a hybrid Mamba-Transformer architecture with a one-million-token context window. The model delivers a 5x throughput improvement over its predecessor and is designed specifically for enterprise agentic AI workflows.
NVIDIA has released Nemotron 3 Super, a 120-billion-parameter open-weights model that represents a significant architectural departure from conventional transformer-only designs. Built on a hybrid Mamba-Transformer mixture-of-experts architecture with only 12 billion active parameters at inference time, the model delivers a 5x throughput improvement over NVIDIA's previous-generation models while supporting a one-million-token context window.
Hybrid Architecture
Nemotron 3 Super combines the Mamba selective state space model with traditional transformer attention layers in a mixture-of-experts configuration. This hybrid approach allows the model to handle long-context tasks — such as analyzing entire codebases or processing lengthy documents — with significantly less compute than pure transformer models of comparable capability. Because only 12B of the 120B total parameters are active, each forward pass touches just 10 percent of the model's weights, dramatically reducing inference costs.
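The active-parameter idea can be sketched with a toy mixture-of-experts router. Everything here is illustrative: the expert matrices and gating weights are tiny stand-ins, not Nemotron's actual layers, and real MoE models route per layer with learned gates.

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Route a token through only top_k of the available experts.

    x: (d,) token vector; experts: list of (d, d) matrices standing in
    for full feed-forward experts; gate_w: (d, n_experts) gating weights.
    """
    scores = x @ gate_w                      # one gating logit per expert
    top = np.argsort(scores)[-top_k:]        # indices of the top_k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over selected experts
    # Only the top_k expert matrices are ever multiplied; the rest stay idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
x = rng.standard_normal(d)
y = moe_forward(x, experts, gate_w, top_k=2)
# With top_k=2 of 8 experts, only a quarter of the expert parameters run
# per token — the same principle by which 12B of Nemotron 3 Super's
# 120B parameters are active per forward pass.
```

The design choice is that compute scales with active parameters while capacity scales with total parameters, which is why a 120B-total model can serve at roughly the cost of a much smaller dense one.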
NVIDIA has published both the model weights and the 10-trillion-token training dataset, making Nemotron 3 Super one of the most transparent large-scale model releases to date. The open training data is particularly notable — most model providers treat training data composition as a closely guarded secret.
Enterprise Agentic AI Focus
The model is explicitly designed for enterprise agentic AI workflows — multi-step tasks where an AI agent needs to plan, execute, and iterate autonomously. NVIDIA has optimized Nemotron 3 Super for tool calling, structured output generation, and multi-turn reasoning, the core capabilities required for agents that can navigate complex enterprise systems.
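As an illustration of what tool calling looks like in practice, the sketch below builds a request body in the OpenAI-compatible chat-completions schema that inference servers such as NIM commonly expose. The model identifier and the `lookup_invoice` tool are hypothetical, not part of any published Nemotron API.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" schema;
# the name, description, and parameters are illustrative only.
lookup_invoice = {
    "type": "function",
    "function": {
        "name": "lookup_invoice",
        "description": "Fetch an invoice record by ID from the ERP system.",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    },
}

request_body = {
    "model": "nvidia/nemotron-3-super",  # hypothetical model identifier
    "messages": [
        {"role": "user", "content": "Why was invoice INV-1042 rejected?"}
    ],
    "tools": [lookup_invoice],
    "tool_choice": "auto",  # let the model decide when to call the tool
}

print(json.dumps(request_body, indent=2))
```

In an agent loop, the server's response would either contain a final answer or a structured tool call whose JSON arguments the agent executes, appending the result as a tool message before the next turn — the multi-turn pattern the article describes.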
The model integrates natively with NVIDIA's NIM (NVIDIA Inference Microservices) platform, allowing enterprises to deploy it on their own infrastructure without sending data to external APIs. This on-premises capability is critical for regulated industries — healthcare, finance, defense — where data sovereignty requirements make cloud-hosted AI models impractical.
The Nemotron 3 Family
Nemotron 3 Super sits in the middle of a three-model family. Nemotron 3 Nano (8B parameters) targets edge and mobile deployments, while Nemotron 3 Ultra (200B+ parameters) is designed for the most demanding research and enterprise workloads. All three models share the same hybrid architecture and are trained with the same data pipeline, ensuring consistent behavior across deployment scales.
The release comes just days before NVIDIA's GTC 2026 conference, where the company is expected to provide deeper technical details on the Nemotron architecture and announce enterprise partnerships built around the model family.
Related Articles
NVIDIA GTC 2026 Keynote: Jensen Huang Unveils Vera Rubin Platform and Six New Chips
NVIDIA CEO Jensen Huang opened GTC 2026 in San Jose with the formal unveiling of the complete Vera Rubin GPU platform — six new chips featuring 288 GB of HBM4 memory, 336 billion transistors, and 50 PetaFLOPS of FP4 performance. Over 30,000 attendees from 190 countries gathered for the AI industry's most anticipated annual event.
OpenAI Acquires Promptfoo to Strengthen AI Agent Security and Red-Teaming
OpenAI has agreed to acquire Promptfoo, the open-source AI security and red-teaming platform used by over 25% of the Fortune 500, in a deal that will integrate the tool directly into OpenAI's enterprise agent platform. The acquisition signals OpenAI's growing focus on safety infrastructure as it pushes deeper into autonomous AI agent deployment.
NVIDIA GTC 2026 Preview: Jensen Huang Teases Chips That Will "Surprise the World"
With GTC 2026 just days away, NVIDIA CEO Jensen Huang promises to unveil "a few new chips the world has never seen before" — analysts expect the Vera Rubin GPU platform and first architectural details on Feynman, next-generation silicon designed for AI agent reasoning workloads.