Nvidia’s meteoric rise to become the world’s most valuable company has drawn attention across the tech stack to the datacenter, where the company makes 83% of its revenue. But the effects of AI’s seemingly unlimited compute needs are rippling throughout the ecosystem: from crypto mining companies pivoting into AI-datacenters-as-a-service, to the stellar IPO performance of AI hardware maker Astera Labs. The catalyst is the proliferation of foundation model companies (the top nine of which, including OpenAI, Anthropic, Cohere, and Mistral, have raised a combined $28.67B) that collectively spend billions on compute to train and serve their models. The general consensus is that the current transformer architecture hasn’t hit its scaling limits: quality and performance keep improving with more parameters, more data, and of course, more compute. Therefore the companies with the best access to training data and the largest-scale compute (hello Nvidia) win.
However, another framing we think about: if many well-funded companies race to compete at the model layer, the question of “who’s better” becomes “who’s more cost effective.” Foundation model providers will shift from optimizing performance to optimizing efficiency to drive competitive margins. In this framing, energy cost per unit of output is the main differentiator. It took roughly 50 GWh of electricity to train GPT-4, about 50x more than GPT-3. That’s equivalent to the annual electricity consumption of ~5,000 homes. Inference is also power-hungry, requiring upwards of 2.9 Wh per inference call for an intensive task like generating an image. For context, a typical cell phone battery holds 15 Wh, so one inference call is comparable to ~20% of a full charge.
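To sanity-check those figures, here’s a quick back-of-the-envelope calculation in Python. The ~10 MWh/year per-home consumption is our assumption (roughly the US average); everything else comes from the numbers above:

```python
# Back-of-the-envelope check on the energy figures above.
# Assumption (not stated in the text): a typical US home uses ~10 MWh/year.

GPT4_TRAINING_WH = 50e9      # 50 GWh to train GPT-4
HOME_ANNUAL_WH = 10e6        # ~10 MWh per home per year (assumed)
IMAGE_CALL_WH = 2.9          # energy for one image-generation inference call
PHONE_BATTERY_WH = 15        # typical cell phone battery capacity

print(f"Training ≈ {GPT4_TRAINING_WH / HOME_ANNUAL_WH:,.0f} homes' annual usage")  # ~5,000
print(f"One call ≈ {IMAGE_CALL_WH / PHONE_BATTERY_WH:.0%} of a phone battery")     # ~19%
```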
The improvement of overall performance, coupled with cost efficiencies around training and inference, is the perfect storm for market expansion for AI companies.
Data Center and Models
At the datacenter and model layer, we’re excited by startups building hardware that’s more efficient than the state of the art, as well as novel ways to generate power. While startups may not be able to compete directly with the OpenAIs of the world on scale (or balance sheet), we’re excited by companies building new model architectures that improve on the limitations of the transformer. For example, State Space Models like Mamba address the sequence-length scaling challenges inherent to the attention mechanism, as sketched below.
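To make the scaling difference concrete, here’s a minimal sketch (this is not Mamba itself, whose selective scan makes A, B, and C input-dependent; all shapes and values here are illustrative) of why an SSM’s recurrent form grows linearly with sequence length while attention grows quadratically:

```python
import numpy as np

def attention_scores(x):
    """Self-attention materializes an n x n score matrix: O(n^2) in sequence length n."""
    return x @ x.T  # (n, d) @ (d, n) -> (n, n)

def ssm_scan(x, A, B, C):
    """Linear state space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    One fixed-size state update per token -> O(n) time, constant extra memory in n.
    (Mamba additionally makes A, B, C input-dependent; this sketch keeps them fixed.)
    """
    n, d = x.shape
    s = A.shape[0]
    h = np.zeros(s)
    ys = np.empty((n, d))
    for t in range(n):
        h = A @ h + B @ x[t]   # (s, s) @ (s,) + (s, d) @ (d,) -> (s,)
        ys[t] = C @ h          # (d, s) @ (s,) -> (d,)
    return ys

# Toy usage: the scan never builds an n x n matrix, unlike attention_scores.
n, d, s = 1024, 16, 32
x = np.random.randn(n, d)
A = 0.9 * np.eye(s)             # stable, decaying state transition
B = 0.1 * np.random.randn(s, d)
C = 0.1 * np.random.randn(d, s)
y = ssm_scan(x, A, B, C)        # y.shape == (1024, 16)
```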
Infrastructure Software
At the infra layer, we think there will be room for companies that push us away from expensive, generalized foundation models and towards more efficient, task-specific models. We see opportunities ranging from startups enabling edge inference to those building inference-specific hardware. We are also seeing the research community produce novel efficiency techniques: knowledge distillation, efficient fine-tuning, model merging, new RAG architectures, and more. There’s a lot more here that deserves its own post, so stay tuned for more on this infra layer. Beyond infrastructure for AI, we think more efficient models are a major unlock for automating many of the manual workflows of developers, security teams, data engineers, and others.
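As a taste of one of those techniques, here’s a minimal sketch of knowledge distillation, where a small student model is trained to match a larger teacher’s softened output distribution. The temperature and weighting values are illustrative assumptions, not prescriptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft-target KL term (teacher knowledge) with hard-label cross-entropy.

    T softens both distributions; alpha weights the two terms.
    Both values here are illustrative assumptions.
    """
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    log_p_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    # KL(teacher || student); the T^2 factor keeps gradient scale comparable to CE
    kd = F.kl_div(log_p_student, log_p_teacher,
                  reduction="batchmean", log_target=True) * T**2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy usage with random tensors standing in for real model outputs.
student_logits = torch.randn(8, 1000, requires_grad=True)  # small student model
teacher_logits = torch.randn(8, 1000)                      # frozen large teacher
labels = torch.randint(0, 1000, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```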
Vertical Applications of AI
At the application layer, as underlying foundation models become more commoditized and cost effective, the winners will be those that build data moats by contextualizing off-the-shelf models to improve their performance. Just as the cloud providers catalyzed the decade-plus market expansion of SaaS through cheaper and better infrastructure, we think we are in the early innings of foundation model companies doing the same for AI-first applications. Companies can take advantage of the decreasing costs and increasing quality of these models and, in doing so, build a strong competitive edge by acquiring unique, high-quality datasets to power their use case. This “data moat” lets them offer better, more differentiated experiences than standard models can. We’re particularly excited about companies diving deep into specific workflows and vertical-specific applications. Areas like legal, healthcare, and manufacturing are well suited to AI-first solutions because they not only boost productivity but also directly impact the bottom line. For example, in industries where companies are paid on output rather than time, AI-driven software that delivers 10x productivity gains is perfectly aligned with customer incentives and ripe for adoption.
Last year we wrote about the compounding effects of a Data Trojan Horse for AI-first companies: better data means better models, which means better user experiences, which drive more data. One update to that flywheel: we believe better energy efficiency will drive better contribution margins, which ultimately drive more scale. If you’re a builder in the AI stack and think about the compounding loop of data, models, compute, and efficiency - from hardware to application layer - we’d love to chat.