CUDA Proves Nvidia Is a Software Company

Forgive me for starting with a cliché, a piece of financial jargon that has recently entered the tech lexicon, but I’m afraid I have to talk about “moats.” Popularized decades ago by Warren Buffett to refer to a company’s competitive advantage, the term found its way onto Silicon Valley pitch decks when a memo purportedly from Google, titled “We Have No Moat, and Neither Does OpenAI,” leaked, fretting that open source AI would breach Big Tech’s stronghold.
Several years on, the castle walls remain safe. Despite the brief panic when DeepSeek first appeared, open source AI models have not outperformed proprietary ones. Still, none of the frontier labs (OpenAI, Anthropic, Google) has a moat to speak of.
The company with the moat is Nvidia. CEO Jensen Huang has called it his most valuable “treasure.” It is not, as you might expect of a chip company, a piece of hardware. It’s something called CUDA. What sounds like a chemical compound banned by the FDA may be the only real moat in AI.
CUDA technically stands for Compute Unified Device Architecture, but much like laser or scuba, no one bothers to expand the acronym; we just say “KOO-duh.” So what is this precious treasure good for? If forced to give a one-word answer: parallelization.
Here is a simple example. Suppose we give a machine the task of completing a 9×9 multiplication table. On a single-core computer, all 81 multiplications are executed one by one. But a GPU with nine cores can divide the work so that each core takes a different column (one from 1×1 to 1×9, another from 2×1 to 2×9, and so on), achieving a ninefold speedup. Modern GPUs can be even smarter. For example, if they are programmed to recognize commutativity (7×9 = 9×7), they can avoid duplicate work, reduce 81 tasks to 45, and cut the workload almost in half. If a single training run costs a hundred million dollars, every improvement counts.
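For readers who like to see the arithmetic, the multiplication-table example can be sketched in a few lines of Python. This is only an illustration, not how CUDA itself is written: the function names are made up for this sketch, and the nine Python threads merely stand in for GPU cores to show how the work is divided (they will not produce a real ninefold speedup).

```python
from concurrent.futures import ThreadPoolExecutor

N = 9  # size of the multiplication table


def fill_column(a):
    """One worker's share of the job: the products a*1 through a*N."""
    return [a * b for b in range(1, N + 1)]


def table_by_columns():
    """Split the 81 multiplications across nine workers, one column each."""
    with ThreadPoolExecutor(max_workers=N) as pool:
        # pool.map preserves input order, so columns[a-1][b-1] == a*b
        return list(pool.map(fill_column, range(1, N + 1)))


def table_with_symmetry():
    """Multiply only the pairs with a <= b, then mirror, since a*b == b*a."""
    table = [[0] * N for _ in range(N)]
    multiplications = 0
    for a in range(1, N + 1):
        for b in range(a, N + 1):
            product = a * b
            multiplications += 1
            table[a - 1][b - 1] = product
            table[b - 1][a - 1] = product  # reuse the result, no second multiply
    return table, multiplications
```

Calling `table_with_symmetry()` returns the full table along with a count of how many multiplications were actually performed, which is where the 45 comes from: nine entries on the diagonal plus 36 pairs above it.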
Nvidia’s GPUs were originally designed to render graphics for video games. In the early 2000s, a Stanford PhD student named Ian Buck, who first got into GPUs as a gamer, realized that their architecture could be repurposed for high-performance computing. He created a programming language called Brook, was hired by Nvidia, and, with John Nickolls, led the development of CUDA. If AI ushers in an age of permanent white-collar unemployment and autonomous weapons, just know that it will all be because someone, somewhere, playing Doom thought a demon’s scrotum should move at 60 frames per second.
CUDA is not a programming language itself but a “platform.” I use that weasel word because, not unlike how The New York Times is a newspaper that’s also a games company, CUDA has, over the years, become a sprawling collection of software libraries for AI. Each function shaves nanoseconds off individual math operations; added up, they make GPUs, in industry parlance, go brrr.
A modern graphics card is not just a circuit board full of chips and memory and fans. It is an elaborate combination of cache layers and specialized units called “tensor cores” and “streaming multiprocessors.” In that sense, what the chip companies sell is like a professional kitchen, and more cores are like more grill stations. But even a kitchen with 30 grill stations won’t run fast without a skilled chef assigning tasks intelligently, as CUDA does for GPU cores.
To extend the metaphor, hand-tuned CUDA libraries optimized for a single matrix operation are the equivalent of kitchen tools designed for one task and nothing else (a cherry pitter, a shrimp deveiner), which are an indulgence for home cooks but not when you have 10,000 shrimp to gut. Which brings us back to DeepSeek. Its developers went beneath this already deep layer of abstraction to work directly in PTX, a kind of assembly language for Nvidia GPUs. Let’s say the task is to peel garlic. An unoptimized GPU would go: “Peel off the skin with your fingers.” CUDA would command: “Crush the clove with the flat of a knife.” PTX lets you dictate the smallest detail: “Lift the blade 2.35 inches above the cutting board, align it with the clove’s axis, and strike down with the palm of your hand with 36.2 pounds of force.”


