DGX Spark vs Radeon RX 9060 XT vs M3 Ultra: One Million AI Tokens Performance Testing

7:45 am, January 5, 2026. By Julian Horsey

What does it really take to generate 1 million tokens quickly, efficiently, and without breaking the bank? Below, Alex Ziskind breaks down how five distinct computing systems stack up in this challenge, revealing surprising trade-offs between speed, energy consumption, and cost. From the fast DGX Spark to the budget-friendly AMD Radeon RX 9060 XT, each system reflects a different priority: raw performance, affordability, or sustainability. The results aren’t always what you’d expect, especially when a system like the Mac Studio M3 Ultra, known for its efficiency, struggles to keep pace on speed. If you’ve ever wondered how these machines perform under pressure, this feature offers a detailed look at their strengths and limitations.

By the end of this guide, you’ll know which system is the fastest, which one saves the most energy, and which might quietly drain your wallet over time. Whether you’re a tech enthusiast, a professional balancing performance with operational costs, or simply curious about the real-world impact of hardware choices, there is plenty here to think about. From the high throughput of the DGX Spark to the unexpected inefficiencies of the Beelink GTR9, the comparisons go beyond surface specs to show what matters in practice: speed, energy use, and running cost.

Token Generation System Comparison

TL;DR Key Takeaways:

The DGX Spark is the fastest and most energy-efficient of the four systems tested on the 4B model, ideal for high-throughput tasks, but requires a significant financial investment.
The AMD Radeon RX 9060 XT offers a budget-friendly option with competitive performance and low energy consumption, suitable for smaller-scale operations.
The Mac Studio M3 Ultra excels in idle energy efficiency but is slower and less efficient during intensive tasks, making it a better fit for machines that spend much of their time idle.
Software optimization plays a critical role, with tools like vLLM excelling in concurrency for Nvidia and AMD hardware, and MLX optimized for Apple Silicon.
The H200 Cluster delivers unmatched speed for enterprise-level tasks but at a high energy and financial cost, suitable only for users with substantial computational demands.

Test Setup: Hardware and Software Overview

To ensure a comprehensive and fair evaluation, five distinct systems were tested:

AMD Radeon RX 9060 XT: A budget-friendly GPU designed for moderate workloads.
DGX Spark: A high-performance system optimized for demanding computational tasks.
Beelink GTR9 (AMD Strix Halo): A compact system aimed at balancing performance and affordability.
Mac Studio M3 Ultra: Apple’s premium offering, designed for energy efficiency and creative workflows.
H200 Cluster: A large-scale, high-capacity system built for enterprise-level tasks.

Four of the systems were tasked with generating 1 million tokens using Qwen3 4B, a compact 4-billion-parameter model chosen for its compatibility across diverse platforms; the H200 Cluster was tested separately with a larger 480-billion-parameter model. To maximize performance, software tools such as llama.cpp, vLLM, and MLX were employed, with a focus on concurrency and cross-platform functionality. This combination of hardware and software allowed a comparison of speed, energy efficiency, and cost.

Performance Results: Speed Matters

The speed of token generation varied significantly across the systems, reflecting their design priorities and hardware capabilities:

DGX Spark: The fastest of the four systems on the 4B model, completing the task in just 6.7 minutes with a throughput of 2,451 tokens per second. This makes it ideal for high-throughput environments.
AMD Radeon RX 9060 XT: A strong contender in the budget category, generating tokens at 1,913 tokens per second and completing the task in 8.12 minutes.
Mac Studio M3 Ultra: While efficient in other areas, it required 26 minutes to generate 1 million tokens, reflecting its focus on energy savings rather than raw speed.
Beelink GTR9: The slowest of the group, taking 34 minutes to complete the task, highlighting its limitations in handling high-performance workloads.

The H200 Cluster, tested separately with a larger 480-billion-parameter model, achieved an impressive 2,609 tokens per second, surpassing the DGX Spark in speed. However, this performance came at a significantly higher energy and financial cost, making it suitable only for users with substantial computational demands.
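The throughput and completion-time figures above are linked by simple arithmetic, sketched below in Python. The helper names are illustrative and not part of the original test; only the 1 million token target and the quoted speeds and times come from the results.

```python
TOTAL_TOKENS = 1_000_000  # target used in the test


def seconds_for_tokens(total_tokens: int, tokens_per_second: float) -> float:
    """Time needed to generate total_tokens at a steady throughput."""
    return total_tokens / tokens_per_second


def tokens_per_second(total_tokens: int, minutes: float) -> float:
    """Average throughput implied by a completion time in minutes."""
    return total_tokens / (minutes * 60)


if __name__ == "__main__":
    # Quoted throughput -> implied completion time
    for name, tps in [("DGX Spark", 2451), ("Radeon GPU", 1913)]:
        minutes = seconds_for_tokens(TOTAL_TOKENS, tps) / 60
        print(f"{name}: {minutes:.1f} min at {tps} tokens/s")

    # Quoted completion time -> implied throughput
    for name, minutes in [("Mac Studio M3 Ultra", 26), ("GTR9", 34)]:
        tps = tokens_per_second(TOTAL_TOKENS, minutes)
        print(f"{name}: about {tps:.0f} tokens/s over {minutes} min")
```

By this arithmetic, the 26-minute and 34-minute runs correspond to roughly 641 and 490 tokens per second. At 1,913 tokens per second the same calculation gives about 8.7 minutes, a little above the 8.12 minutes quoted, so the published time may have been measured or rounded differently.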


Energy Efficiency: Balancing Power and Performance

Energy consumption was a critical factor in evaluating the systems, as it directly impacts both operational costs and environmental considerations:

DGX Spark: Demonstrated the highest energy efficiency during active processing, producing the most tokens per kilowatt-hour. This makes it a strong choice for users prioritizing both performance and sustainability.
Mac Studio M3 Ultra: Excelled in idle energy efficiency, consuming minimal power when not actively generating tokens. However, its energy efficiency during intensive tasks was less competitive.
Beelink GTR9: Consumed the most energy overall, reflecting inefficiencies in its design for high-performance tasks. This makes it less suitable for users concerned with energy costs.

For users seeking a balance between energy consumption and performance, the DGX Spark emerged as the most practical option. In contrast, the Beelink GTR9’s high energy usage underscores the importance of aligning hardware capabilities with workload requirements.
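Tokens per kilowatt-hour, the efficiency measure used above, follows directly from throughput and average power draw. The Python sketch below shows the conversion. The wattage values are hypothetical placeholders, since no power figures are quoted here, and the function name is illustrative.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 1000 W * 3600 s


def tokens_per_kwh(tokens_per_second: float, avg_watts: float) -> float:
    """Tokens generated per kilowatt-hour at a steady throughput and power draw."""
    joules_per_token = avg_watts / tokens_per_second
    return JOULES_PER_KWH / joules_per_token


if __name__ == "__main__":
    # Throughput is the quoted DGX Spark figure; the wattages are
    # HYPOTHETICAL and only show how power draw changes the result.
    for watts in (150, 300):
        print(f"{watts} W: {tokens_per_kwh(2451, watts):,.0f} tokens/kWh")
```

The relationship is linear in both directions: doubling throughput at the same power draw doubles tokens per kilowatt-hour, and doubling power draw at the same throughput halves it.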

Cost Analysis: Operational Expenses

Operational costs were calculated based on an energy rate of $0.20 per kilowatt-hour, providing a practical perspective on the financial implications of each system:

DGX Spark: Despite its high upfront cost, it proved cost-effective for large-scale workloads due to its speed and energy efficiency. This makes it a worthwhile investment for users with demanding computational needs.
AMD Radeon RX 9060 XT: Offered a more affordable alternative, balancing performance and cost for budget-conscious users. Its lower energy consumption further enhances its appeal for smaller-scale operations.
Beelink GTR9: While inexpensive upfront, its inefficiency led to higher energy costs over time, reducing its overall cost-effectiveness.

The choice between these systems ultimately depends on your workload and budget. For high-throughput tasks, the DGX Spark is an excellent option, while the AMD Radeon RX 9060 XT is better suited for users with limited resources.
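To see how the $0.20 per kilowatt-hour rate turns into a running cost, the Python sketch below estimates a daily electricity bill from active and idle power draw. Only the rate comes from the analysis above. The wattages and hours are hypothetical, and the function is an illustration rather than the method used in the test.

```python
RATE_USD_PER_KWH = 0.20  # energy rate used in the cost analysis


def daily_cost_usd(active_watts: float, active_hours: float,
                   idle_watts: float,
                   rate_usd_per_kwh: float = RATE_USD_PER_KWH) -> float:
    """Electricity cost for one day split between active and idle time."""
    idle_hours = 24 - active_hours
    kwh = (active_watts * active_hours + idle_watts * idle_hours) / 1000
    return kwh * rate_usd_per_kwh


if __name__ == "__main__":
    # HYPOTHETICAL machines: one with low idle draw, one with high idle draw,
    # both generating tokens for 2 hours a day.
    print(f"low idle:  ${daily_cost_usd(200, 2, idle_watts=10):.3f} per day")
    print(f"high idle: ${daily_cost_usd(200, 2, idle_watts=60):.3f} per day")
```

When a machine spends most of the day idle, idle draw can dominate the bill, which is why idle efficiency is reported separately from efficiency under load.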

Software Optimization: The Role of Tools

The performance of each system was significantly influenced by the software tools used, underscoring the importance of optimization:

vLLM: Excelled in concurrency, particularly on Nvidia and AMD hardware, making it the preferred choice for high-throughput tasks.
MLX: Optimized for Apple Silicon, it performed exceptionally well on the Mac Studio but lacked the cross-platform compatibility needed for broader applications.
llama.cpp: A versatile baseline tool, suitable for simpler setups. However, it was less effective for high-concurrency tasks compared to specialized software like vLLM.

Selecting the right software is as critical as choosing the hardware itself. Tools like vLLM and MLX demonstrate how software optimization can unlock the full potential of a system, enhancing both performance and efficiency.
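Concurrency raises aggregate throughput because many requests are in flight at once instead of queuing one after another. The Python sketch below illustrates the idea with a simulated request that simply sleeps. It does not call vLLM, MLX, or llama.cpp, and every name and number in it is a hypothetical stand-in. A real inference server also shares GPU resources between requests, which a sleep does not model.

```python
import asyncio
import time


async def fake_generate(tokens: int, delay_s: float) -> int:
    """Stand-in for one inference request; sleeps instead of calling a server."""
    await asyncio.sleep(delay_s)
    return tokens


async def run_batch(num_requests: int, tokens_per_request: int,
                    concurrency: int, delay_s: float = 0.01):
    """Run simulated requests with at most `concurrency` in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def limited() -> int:
        async with semaphore:
            return await fake_generate(tokens_per_request, delay_s)

    start = time.perf_counter()
    counts = await asyncio.gather(*(limited() for _ in range(num_requests)))
    elapsed = time.perf_counter() - start
    return sum(counts), elapsed


if __name__ == "__main__":
    for concurrency in (1, 4, 16):
        total, elapsed = asyncio.run(run_batch(16, 100, concurrency))
        print(f"concurrency={concurrency}: {total / elapsed:,.0f} simulated tokens/s")
```

In this toy model the total token count stays the same while elapsed time shrinks as the concurrency limit rises, so the reported tokens per second is an aggregate across all requests rather than the speed of any single one.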

H200 Cluster: Pushing the Limits

The H200 Cluster was tested with a larger 480-billion-parameter model to evaluate its capabilities under extreme conditions. It achieved a throughput of 2,609 tokens per second, outperforming the DGX Spark in speed. However, its energy consumption and operational costs were substantially higher, making it a viable option only for users with significant computational demands and budgets. This system is best suited for enterprise-level applications where performance outweighs cost considerations.

Key Takeaways: Choosing the Right System

This analysis highlights the diverse capabilities of modern computing systems for token generation tasks. Here are the main insights:

High-end systems: The DGX Spark delivers exceptional speed and energy efficiency, making it ideal for high-throughput environments, though it requires a significant financial investment.
Budget-friendly options: The AMD Radeon RX 9060 XT offers competitive performance at a fraction of the cost, making it a practical choice for users with limited resources.
Energy considerations: The Mac Studio M3 Ultra is highly efficient at idle but less competitive during intensive tasks, while the Beelink GTR9 incurs higher energy costs overall.
Software matters: Tools like vLLM and MLX highlight the importance of software optimization in maximizing hardware potential.

Ultimately, the best system for your needs will depend on your specific priorities, whether they are speed, energy efficiency, or cost-effectiveness. By understanding the strengths and limitations of each option, you can make an informed decision that aligns with your computational requirements and budget.

Media Credit: Alex Ziskind

