The Long Road to Genuine AI Mastery

11 September 2024 at 17:46

In the early 1970s, programming computers involved punching holes in cards and feeding them to room-size machines that would produce results through a line printer, often hours or even days later.

This is what computing had looked like for a long time, and it was against this backdrop that a team of 29 scientists and researchers at the famed Xerox PARC created the more intimate form of computing we know today: one with a display, a keyboard, and a mouse. This computer, called Alto, was so bewilderingly different that it necessitated a new term: interactive computing.

Alto was viewed by some as absurdly extravagant because of its expensive components. But fast-forward 50 years, and multitrillion-dollar supply chains have sprung up to transform silica-rich sands into sophisticated, wondrous computers that live in our pockets. Interactive computing is now inextricably woven into the fabric of our lives.

Silicon Valley is again in the grip of a fervor reminiscent of the heady days of early computing. Artificial general intelligence (AGI), an umbrella term for the ability of a software system to solve any problem without specific instructions, has become a tangible revolution almost at our doorsteps.

The rapid advancements in generative AI inspire awe, and for good reason. Just as Moore’s Law charted the trajectory of personal computing and Metcalfe’s Law predicted the growth of the internet, an exponential principle underlies the development of generative AI. The scaling laws of deep learning postulate a direct correlation between the capabilities of an AI model and the scale of both the model itself and the data used to train it.

Over the past two years, the leading AI models have undergone a staggering 100-fold increase in both dimensions, with model sizes expanding from 10 billion parameters trained on 100 billion words to 1 trillion parameters trained on over 10 trillion words.

The results are evocative and useful. But the evolution of personal computing offers a salutary lesson. The trajectory from the Alto to the iPhone was a long and winding path. The development of robust operating systems, vibrant application ecosystems, and the internet itself were all crucial milestones, each of which relied on other subinventions and infrastructure: programming languages, cellular networks, data centers, and the creation of security, software, and services industries, among others.

AI benefits from much of this infrastructure, but it also marks an important departure. For instance, large language models (LLMs) excel in language comprehension and generation but struggle with reasoning, which is crucial for tackling complex, multistep tasks. Solving this challenge may necessitate the creation of new neural network architectures or new approaches for training and using them, and the rate at which academia and research are generating new insights suggests we are in the early innings.

The training and serving of these models, something that we at Together AI focus on, is both a computational wonder and a quagmire. The bespoke AI supercomputers, or training clusters, created mostly by Nvidia, represent the bleeding edge of silicon design. Comprising tens of thousands of high-performance processors interconnected via advanced optical networking, these systems function as a unified supercomputer. However, their operation comes at a significant cost: they consume an order of magnitude more power than traditional CPUs and generate correspondingly more heat. The consequences are far from trivial. A recent paper published by Meta, detailing the training process of the Llama 3.1 model family on a 16,000-processor cluster, revealed a striking statistic: the system was inoperable for roughly 10% of its operational time.

As silicon technology continues to advance in accordance with Moore’s Law, innovations will be needed to optimize chip performance while minimizing energy consumption and mitigating the attendant heat generation. By 2030, data centers may undergo a radical transformation, necessitating fundamental breakthroughs in the underlying physical infrastructure of computing.

Already, AI has emerged as a geopolitically charged domain, and its strategic significance is likely to intensify, potentially becoming a key determinant of technological preeminence in the years to come. As it improves, the transformative effects of AI on the nature of work and the labor market are also poised to become an increasingly contentious societal issue.

But a lot remains to be done, and we get to shape our future with AI. We should expect a proliferation of innovative digital products and services that will captivate and empower users in the coming years. In the long run, artificial intelligence will bloom into superintelligent systems, and these will be as inextricably woven into our lives as computing has managed to become. Human societies have absorbed new disruptive technologies over millennia and remade themselves to thrive with their aid—and artificial intelligence will be no exception.