Think one GPU is pretty much the same as another? Think again. It turns out there's surprising variability in the performance delivered by chips of the same model. That can make getting your money's worth by renting time on a GPU from a cloud provider a real roll of the dice, according to research from the College of William & Mary, Jefferson Lab, and Silicon Data.
"It's called the silicon lottery," says Carmen Li, founder and CEO of Silicon Data, which tracks GPU rental prices and benchmarks cloud-computing performance.
The silicon lottery's existence has been known since at least 2022, when researchers at the University of Wisconsin tied it to variations in the performance of GPU-dependent supercomputers. Li and her colleagues figured that the effect would be even more pronounced for AI cloud customers.
Performance varies for GPU models in the cloud
So they ran 6,800 instances of the index firm's benchmark test on 3,500 randomly selected GPUs operated by 11 cloud-computing providers. The 3,500 GPUs comprised 11 models of Nvidia GPU, the most advanced being the Nvidia H200 SXM. (The team wasn't just picking on Nvidia; the GPU giant makes up most of the rental cloud market.)
The benchmark, called SiliconMark, is intended to provide a snapshot of a GPU's ability to run large language models, or LLMs. It tests 16-bit floating-point computing performance, measured in trillions of operations per second, and a GPU's internal-memory bandwidth, measured in gigabytes per second. The results showed that computing performance varied for all models, but for the 259 H100 PCIe GPUs it differed by as much as 34.5 percent, and the memory bandwidth of the 253 H200 SXM GPUs varied by as much as 38 percent.
SOURCE: SILICON DATA
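A figure like "varied by as much as 34.5 percent" boils down to comparing the best and worst results for one GPU model. The exact formula Silicon Data used isn't spelled out here, so this Python sketch assumes one plausible definition: the gap between the fastest and slowest result, expressed as a share of the slowest. The readings are made-up placeholders, not Silicon Data's measurements.

```python
from statistics import median


def spread_percent(results: list[float]) -> float:
    """Gap between best and worst result as a percentage of the worst.

    This definition of "spread" is an assumption for illustration only.
    """
    if not results:
        raise ValueError("need at least one result")
    lo, hi = min(results), max(results)
    if lo <= 0:
        raise ValueError("results must be positive")
    return (hi - lo) / lo * 100.0


# Hypothetical FP16 throughput readings (TFLOPS) from several instances
# of the same GPU model; not real benchmark data.
readings = [600.0, 650.0, 700.0, 750.0]
print(f"median: {median(readings):.1f} TFLOPS")
print(f"spread: {spread_percent(readings):.1f}%")
```

With these placeholder numbers the slowest chip delivers 600 TFLOPS and the fastest 750, so the sketch reports a 25 percent spread. Measuring relative to the slowest chip gives a larger number than measuring relative to the fastest, which is one reason headline variation figures are worth reading carefully.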
Differences in how the GPU is cooled, how cloud operators configure their computers, and how much use the chip has seen can all contribute to differences in the performance of otherwise identical chips. But Silicon Data's analysis showed that the real culprit was differences in the chips themselves, likely due to manufacturing variation.
Such randomness has real dollars-and-cents consequences, the researchers argue, because there's a chance that a pricier, more advanced GPU won't deliver better performance than an older-model chip.
So what should GPU renters do? "The most practical approach is to benchmark the actual rental they receive," says Jason Cornick, head of infrastructure at Silicon Data. "Running a benchmark tool [such as SiliconMark] allows them to compare their specific instance's performance against a broader corpus of data."
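Comparing one rental against a broader corpus can be as simple as asking what fraction of previously measured chips scored lower. The sketch below is a hypothetical illustration in Python: `percentile_rank` is not part of SiliconMark or any Silicon Data tool, and the corpus values are invented placeholders standing in for whatever reference data a renter has access to.

```python
from bisect import bisect_left


def percentile_rank(corpus: list[float], value: float) -> float:
    """Percentage of corpus results strictly below `value` (hypothetical helper)."""
    if not corpus:
        raise ValueError("corpus must not be empty")
    ordered = sorted(corpus)
    below = bisect_left(ordered, value)  # count of entries < value
    return below * 100.0 / len(ordered)


# Invented memory-bandwidth results (GB/s) from other instances of the
# same GPU model; a real comparison would use a much larger corpus.
corpus = [3900.0, 4100.0, 4300.0, 4500.0, 4700.0]
mine = 4200.0  # result from benchmarking the instance you actually rented
print(f"Faster than {percentile_rank(corpus, mine):.0f}% of the corpus")
```

Here the rented instance beats two of the five reference results, so it lands at the 40th percentile. A renter could use a cutoff like this to decide whether to keep an instance or ask for a different one, though where to set that cutoff is a judgment call.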
