05/13 2025
Hello everyone!
In our previous article, we delved into various GPU parameters. For those interested, you can check out "Easy to Understand: A Fun Guide to GPU Core Parameters and Specifications!" for more details.
Recently, many of you have asked how the GPU computing power data presented in our tables is calculated. Why are there different expressions for FP32 and FP16?
Below, let's explore the method for calculating computing power and understand how GPU computing power is determined. Feel free to like and share this article.
GPU computing power is commonly expressed in FLOPS (Floating-point Operations Per Second), which reflects the efficiency of GPUs in performing complex computational tasks.
In simple terms, GPU computing power measures how many mathematical problems a GPU can solve per second. These aren't just basic arithmetic operations like addition, subtraction, multiplication, or division, but more complex floating-point operations (think of decimal calculations) and integer operations (think of counting with whole numbers).
For instance:
Floating-point operations: When GPUs perform scientific calculations (like weather forecasting) or AI training, it's akin to solving complex calculus problems, where speed is crucial.
Integer operations: During AI inference (such as image recognition), GPUs need to quickly count pixel points or determine classification results. GPUs excel in integer calculations, offering higher performance and efficiency, especially when handling large datasets and complex algorithms.
Before diving into the computing power formula, let's clarify two key terms: TFLOPS (Tera Floating-point Operations Per Second) and TOPS (Tera Operations Per Second).
TFLOPS: Measures the number of trillion floating-point operations a computer hardware (like a CPU or GPU) can complete in one second. It's essential for tasks requiring high-precision calculations, such as scientific research and graphics rendering.
TOPS: Measures the number of trillion operations per second, encompassing various types of computations, including integer operations and logical operations. It's particularly relevant in AI, where efficient integer operations are vital for tasks like inference and image recognition.
In summary, TFLOPS focuses on high-precision floating-point operations, while TOPS is broader, encompassing various types of computations. TFLOPS is commonly used to evaluate GPU performance, whereas TOPS is more relevant for NPU or dedicated AI chips.
Here's the core formula for calculating GPU computing power:
Computing Power (FLOPS) = CUDA Core Count × Boost Frequency × FLOPs per Core per Clock Cycle
For example, let's calculate the theoretical peak computing power of the NVIDIA A100 GPU. The A100 has 6912 CUDA cores and a boost frequency of 1.41 GHz, and each core executes one fused multiply-add (FMA) per cycle, which counts as two floating-point operations.
Applying the formula: A100 computing power (FP32 single precision) = 6912 × 1.41 GHz × 2 = 19491.84 GFLOPS ≈ 19.5 TFLOPS.
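The calculation above can be sketched in a few lines of Python, using the A100 figures from NVIDIA's published specs (6912 CUDA cores, 1.41 GHz boost clock):

```python
# Minimal sketch of the core formula:
# Computing Power (FLOPS) = CUDA cores x boost frequency x FLOPs per core per cycle

cuda_cores = 6912        # A100 FP32 CUDA core count
boost_ghz = 1.41         # boost frequency in GHz
flops_per_cycle = 2      # one fused multiply-add (FMA) = 2 floating-point ops

# Cores x GHz gives giga-operations per second, so the result is in GFLOPS
gflops = cuda_cores * boost_ghz * flops_per_cycle
print(f"A100 FP32 peak: {gflops:.2f} GFLOPS = {gflops / 1000:.1f} TFLOPS")
# -> A100 FP32 peak: 19491.84 GFLOPS = 19.5 TFLOPS
```

Because the frequency is expressed in GHz, the product comes out directly in GFLOPS; dividing by 1000 converts to TFLOPS.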
Another way to estimate GPU computing power is the peak computation method, which works at the SM (Streaming Multiprocessor) level. It multiplies the number of floating-point instructions each SM executes per clock cycle (F_clk), the operating frequency (F_req), and the number of SMs (N_SM).
Calculation formula: Peak computing power = F_clk × F_req × N_SM, with each fused multiply-add (FMA) instruction counted as two floating-point operations.
Application example (using NVIDIA A100):
Each A100 SM contains 64 FP32 CUDA cores, and each can execute one fused multiply-add (FMA) instruction per cycle. Counting each FMA as two floating-point operations: A100 peak computing power = 64 FMA/cycle × 1.41 GHz × 108 SMs × 2 = 19491.84 GFLOPS ≈ 19.5 TFLOPS.
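The per-SM method also explains why FP32 and FP16 figures differ so much, which is the question raised at the start. Below is a sketch of both calculations; the FP16 line assumes the figure from NVIDIA's A100 spec sheet of 1024 FP16 FMA operations per SM per cycle on Tensor Cores:

```python
# Peak computation (per-SM) method for the NVIDIA A100
flops_per_fma = 2   # one fused multiply-add = 2 floating-point operations
freq_ghz = 1.41     # operating frequency (F_req)
num_sms = 108       # number of SMs (N_SM)

# FP32 via CUDA cores: 64 FP32 units per SM, one FMA each per cycle
fp32_tflops = 64 * flops_per_fma * freq_ghz * num_sms / 1000
print(f"FP32 peak: {fp32_tflops:.2f} TFLOPS")              # ~19.49 TFLOPS

# FP16 via Tensor Cores: 1024 FP16 FMA ops per SM per cycle (per spec sheet)
fp16_tflops = 1024 * flops_per_fma * freq_ghz * num_sms / 1000
print(f"FP16 Tensor Core peak: {fp16_tflops:.2f} TFLOPS")  # ~311.87 TFLOPS
```

The FP16 result rounds to the 312 TFLOPS commonly quoted for the A100's dense Tensor Core throughput; only the per-cycle instruction count changes between the two rows of the spec sheet.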
NVIDIA GPU architecture upgrades are akin to upgrading mobile phone chips, with each generation optimizing computational efficiency.
No matter how high the computing power is, if data transmission can't keep up, it's like having too many cars on a narrow highway. Memory bandwidth determines how quickly the GPU can move data:
For example, RTX 4090's bandwidth of 1008 GB/s is equivalent to 10 trucks transporting data simultaneously, whereas A100's bandwidth of 2039 GB/s is like 20 trucks.
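A quick back-of-envelope sketch makes the bandwidth comparison concrete: the time to stream a given amount of data from memory once is simply bytes divided by bandwidth. The 10 GB working set below is a hypothetical example, not a figure from the article:

```python
def transfer_time_ms(data_gb: float, bandwidth_gb_s: float) -> float:
    """Time (ms) to move data_gb gigabytes once at bandwidth_gb_s GB/s."""
    return data_gb / bandwidth_gb_s * 1000

data_gb = 10.0  # hypothetical 10 GB working set
print(f"RTX 4090 (1008 GB/s): {transfer_time_ms(data_gb, 1008):.2f} ms")
print(f"A100     (2039 GB/s): {transfer_time_ms(data_gb, 2039):.2f} ms")
```

The A100 finishes in roughly half the time, matching the 10-trucks-versus-20-trucks analogy: doubling the bandwidth halves the time the compute units spend waiting for data.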
When evaluating GPU computing power, theoretical peak figures and the peak computation method are only part of the picture; real-world performance also depends on several other factors.
GPU computing power is akin to a car's horsepower, determining its speed. However, the overall experience also hinges on factors like memory bandwidth (road width) and software optimization (driving skills). When selecting a GPU, consider your task requirements (gaming, training, inference) and budget comprehensively.