GPU Benchmark Explorer
Interactive performance analysis for LLaMA inference (FP16/INT4), toy training, and diffusion.
8B LLaMA 3.1 (FP16)
  Metric: Requests / Sec, Tokens / Sec, Output Tokens / Sec
70B LLaMA 3.1 Quantized (INT4)
  Metric: Requests / Sec, Tokens / Sec, Output Tokens / Sec
Training (Toy)
  Metric: Throughput (samples/s), Latency (ms), Peak VRAM (GB), Est. Bandwidth (GB/s)
Diffusion
  Metric: Iterations / Sec