Introduction
In the ongoing tech rivalry between the US and China, AI chip performance and optimization have become critical battlegrounds. This tutorial will guide you through the process of benchmarking AI model performance across different hardware platforms, specifically focusing on how to evaluate and compare the performance of AI models on Nvidia and Huawei Ascend chips. Understanding these performance differences is crucial for developers and researchers who need to make informed decisions about hardware selection for AI workloads.
While this tutorial doesn't weigh in on those geopolitical questions directly, it provides the technical foundation for understanding why hardware choices matter so much in AI development. You'll learn how to set up performance testing environments, run AI workloads, and analyze results across different chip architectures.
Prerequisites
- Basic understanding of Python and machine learning concepts
- Access to a system with either Nvidia GPU or Huawei Ascend chip (or both for comparison)
- Installed deep learning frameworks (PyTorch or TensorFlow)
- Basic knowledge of Docker containers
- Understanding of performance benchmarking concepts
Step-by-Step Instructions
1. Set Up Your Development Environment
First, you need to create a consistent environment for testing AI models across different hardware. Start by installing the necessary libraries:
pip install torch torchvision torchaudio
pip install tensorflow
pip install numpy scipy matplotlib
docker pull nvidia/cuda:11.8.0-devel-ubuntu20.04
Why: Installing the core libraries ensures you have the necessary tools to run AI models. Docker containers provide consistency across different hardware platforms and help isolate your testing environment.
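Before running any benchmarks, it is worth recording exactly which framework versions are present in each environment, since results are only comparable if the software stack is known. The sketch below (a minimal, stdlib-only example; the package list is just a suggestion) collects that information without failing on machines where a framework is missing:

```python
import importlib
import platform
import sys

def report_environment():
    """Collect version info to attach to a benchmark report.

    Frameworks that are not installed are reported as 'not installed'
    rather than raising, so the same script runs on any test machine.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for pkg in ("torch", "tensorflow", "torch_npu"):
        try:
            mod = importlib.import_module(pkg)
            info[pkg] = getattr(mod, "__version__", "unknown")
        except ImportError:
            info[pkg] = "not installed"
    return info

if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key:12s} {value}")
```

Saving this output alongside each benchmark run makes it much easier to explain discrepancies between the two platforms later.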
2. Create a Sample AI Model for Benchmarking
Next, create a simple neural network that you can use for performance testing:
import torch
import torch.nn as nn

# Simple CNN for benchmarking
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(128 * 8 * 8, 512),  # 32x32 input halved twice -> 8x8
            nn.ReLU(inplace=True),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

# Initialize model
model = SimpleCNN()
Why: This simple model provides a consistent workload for testing performance across different hardware. It's complex enough to be meaningful but simple enough to run quickly.
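The `128 * 8 * 8` in the classifier is not arbitrary: each 3x3 convolution with `padding=1` preserves spatial size, and each 2x2 max-pool halves it, so a 32x32 input ends up 8x8 with 128 channels. A tiny helper (illustrative only; the parameter names are ours) makes the arithmetic explicit and shows how to adapt the layer for other input sizes:

```python
def flatten_size(input_hw=32, channels_out=128, num_pools=2):
    """Flattened feature size entering SimpleCNN's classifier.

    3x3 convs with padding=1 preserve height/width; each 2x2
    max-pool halves them, so two pools take 32x32 down to 8x8.
    """
    hw = input_hw
    for _ in range(num_pools):
        hw //= 2
    return channels_out * hw * hw

print(flatten_size())  # 128 * 8 * 8 = 8192
```

If you benchmark with a different input resolution, this is the one line of the model that has to change accordingly.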
3. Set Up Performance Monitoring
Install and configure performance monitoring tools:
pip install psutil gpustat
Then create a monitoring script:
import threading

import psutil

class PerformanceMonitor:
    def __init__(self):
        self.cpu_usage = []
        self.memory_usage = []
        self.running = False
        self._thread = None

    def start_monitoring(self):
        self.running = True
        self._thread = threading.Thread(target=self._monitor, daemon=True)
        self._thread.start()

    def stop_monitoring(self):
        self.running = False
        if self._thread is not None:
            self._thread.join()

    def _monitor(self):
        while self.running:
            # cpu_percent(interval=1) blocks for one second, so this
            # loop already samples once per second -- no extra sleep.
            self.cpu_usage.append(psutil.cpu_percent(interval=1))
            self.memory_usage.append(psutil.virtual_memory().percent)
Why: Monitoring helps you understand resource utilization during AI model execution, which is crucial for comparing performance across different hardware platforms.
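The same start/stop pattern can be exercised without any hardware-specific dependencies. The sketch below is a generic background sampler with an injectable sample function (a stand-in counter here, so it runs anywhere; in the real scripts you would pass `psutil.cpu_percent` or a GPU/NPU query instead):

```python
import threading
import time

class Sampler:
    """Background sampler; calls `sample_fn` once per interval.

    `sample_fn` is pluggable: psutil.cpu_percent for CPU load,
    or any callable returning a number. A constant stands in here.
    """
    def __init__(self, sample_fn, interval=0.05):
        self.sample_fn = sample_fn
        self.interval = interval
        self.samples = []
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self.samples.append(self.sample_fn())
            time.sleep(self.interval)

if __name__ == "__main__":
    s = Sampler(lambda: 1.0, interval=0.01)
    s.start()
    time.sleep(0.2)   # ...run the workload you want to profile here...
    s.stop()
    print(f"collected {len(s.samples)} samples")
```

Wrapping the timed model run between `start()` and `stop()` gives you a utilization trace aligned with the benchmark window.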
4. Create Hardware-Specific Test Scripts
Develop separate test scripts for each hardware platform:
# test_nvidia.py
import time

import torch

from model import SimpleCNN  # the Step 2 model, saved here as model.py

# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA device count: {torch.cuda.device_count()}")

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleCNN().to(device)
model.eval()

# Create dummy input
input_tensor = torch.randn(32, 3, 32, 32).to(device)

# Warm up once so one-time CUDA initialization isn't timed
with torch.no_grad():
    model(input_tensor)

# Time model execution; synchronize so asynchronous GPU work is included
if device.type == "cuda":
    torch.cuda.synchronize()
start_time = time.time()
with torch.no_grad():
    output = model(input_tensor)
if device.type == "cuda":
    torch.cuda.synchronize()
end_time = time.time()

print(f"NVIDIA execution time: {end_time - start_time:.4f} seconds")
For Huawei Ascend, you would create a similar script using Huawei's torch_npu adapter for PyTorch, which is installed alongside the Ascend CANN toolkit:
# test_ascend.py
import time

import torch
import torch_npu  # Huawei's Ascend adapter; registers the "npu" device

from model import SimpleCNN  # the Step 2 model, saved here as model.py

# Check if NPU is available
print(f"NPU available: {torch.npu.is_available()}")

# Set device
device = torch.device("npu" if torch.npu.is_available() else "cpu")
model = SimpleCNN().to(device)
model.eval()

# Create dummy input
input_tensor = torch.randn(32, 3, 32, 32).to(device)

# Warm up once so one-time NPU initialization isn't timed
with torch.no_grad():
    model(input_tensor)

# Time model execution; synchronize so asynchronous NPU work is included
if device.type == "npu":
    torch.npu.synchronize()
start_time = time.time()
with torch.no_grad():
    output = model(input_tensor)
if device.type == "npu":
    torch.npu.synchronize()
end_time = time.time()

print(f"Ascend execution time: {end_time - start_time:.4f} seconds")
Why: These scripts allow you to directly compare performance characteristics of different hardware platforms by running identical workloads.
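A single timed forward pass is noisy; real comparisons should warm up and average over many iterations. A device-agnostic helper (our own sketch, shown here timing a plain Python callable) looks like this; for GPU or NPU runs you would pass `torch.cuda.synchronize` or `torch.npu.synchronize` as `sync`:

```python
import time

def benchmark(fn, warmup=3, iters=10, sync=None):
    """Time `fn` with warm-up runs; return mean seconds per iteration.

    `sync` is an optional callable (e.g. torch.cuda.synchronize) that
    flushes asynchronous device queues before the clock is read, so
    the measurement covers the work actually done on the accelerator.
    """
    for _ in range(warmup):
        fn()
    if sync:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync:
        sync()
    return (time.perf_counter() - start) / iters

if __name__ == "__main__":
    mean = benchmark(lambda: sum(range(10000)))
    print(f"mean: {mean * 1e6:.1f} us/iter")
```

Using `time.perf_counter()` rather than `time.time()` also avoids clock-resolution artifacts on very short runs.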
5. Run Comparative Performance Tests
Execute your tests on both hardware platforms and collect data:
import json
import subprocess

# Run tests on different hardware
nvidia_result = subprocess.run(['python', 'test_nvidia.py'], capture_output=True, text=True)
ascend_result = subprocess.run(['python', 'test_ascend.py'], capture_output=True, text=True)

# Parse results: each timing line ends "... <value> seconds", so the
# second-to-last whitespace-separated token is the number we want
nvidia_time = float(nvidia_result.stdout.split()[-2])
ascend_time = float(ascend_result.stdout.split()[-2])

# Store results (a ratio above 1 means Ascend ran this workload faster)
results = {
    "nvidia_time": nvidia_time,
    "ascend_time": ascend_time,
    "nvidia_over_ascend_ratio": nvidia_time / ascend_time
}
print(json.dumps(results, indent=2))
Why: This comparative approach helps quantify performance differences, which is essential for understanding the implications of hardware choices in AI development.
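Pulling the number out by token position breaks as soon as a script prints anything extra after the timing line. A sturdier sketch (assuming each script prints a line of the form "... execution time: X seconds") parses with a regex and fails loudly when the line is missing:

```python
import re

TIME_RE = re.compile(r"execution time:\s*([0-9.]+)\s*seconds")

def parse_time(stdout):
    """Extract the seconds value from a benchmark script's output."""
    match = TIME_RE.search(stdout)
    if match is None:
        raise ValueError(f"no timing line found in: {stdout!r}")
    return float(match.group(1))

print(parse_time("NVIDIA execution time: 0.1250 seconds"))  # 0.125
```

An even more robust option is to have each test script print its results as a JSON line and load that directly.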
6. Analyze and Visualize Results
Create a visualization script to analyze your benchmarking data:
import matplotlib.pyplot as plt

# Illustrative results -- replace with your own measurements
hardware = ['NVIDIA', 'Huawei Ascend']
execution_times = [0.125, 0.185]  # seconds

# Create bar chart
fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.bar(hardware, execution_times, color=['blue', 'green'])

# Add value labels
for bar, t in zip(bars, execution_times):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.005,
            f'{t:.3f}s', ha='center', va='bottom')

ax.set_ylabel('Execution Time (seconds)')
ax.set_title('AI Model Performance Comparison')
plt.tight_layout()
plt.savefig('performance_comparison.png')
plt.show()
Why: Visualizing performance data makes it easier to understand and communicate the differences between hardware platforms, which is valuable for making informed decisions about AI infrastructure.
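Latency alone can be hard to interpret; converting it to throughput (samples per second for the batch size you timed) often communicates better in a report. A small sketch, using the same illustrative numbers as the chart above (not real measurements):

```python
def throughput(batch_size, seconds):
    """Samples processed per second for one timed batch."""
    return batch_size / seconds

# Illustrative latencies for a batch of 32, as in the chart above
for name, latency in [("NVIDIA", 0.125), ("Huawei Ascend", 0.185)]:
    print(f"{name:14s} {throughput(32, latency):7.1f} images/s")
```

Reporting both latency and throughput makes it clear whether a platform wins on responsiveness, on bulk processing, or on both.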
Summary
This tutorial provided a practical framework for benchmarking AI model performance across different hardware platforms, specifically focusing on the comparison between Nvidia and Huawei Ascend chips. By following these steps, you've learned how to set up test environments, create benchmarking scripts, and analyze performance differences.
The importance of these benchmarks extends beyond simple performance measurement. Hardware choices have significant implications for AI development and, increasingly, for geopolitical positioning. Understanding these performance characteristics helps developers make informed decisions about where to deploy their AI models and what hardware to invest in for future development.
While this tutorial focuses on the technical aspects of hardware comparison, it's important to recognize that the broader implications involve considerations of supply chain security, software ecosystem support, and long-term development strategy. The performance differences you've measured here reflect the complex interplay between hardware architecture, software optimization, and the broader ecosystem of AI development tools.