
Mistral AI CEO Arthur Mensch on Microsoft, Regulation, and Europe’s AI Ecosystem

Arthur Mensch, Mistral AI

Over the past year, Paris-based Mistral AI—one of the TIME100 Most Influential Companies of 2024—has rapidly risen as a homegrown European AI champion, earning the praise of French President Emmanuel Macron. The startup has released six AI language models that can answer questions, produce code, and carry out basic reasoning.


In June Mistral said it had raised $645 million in a funding round that reportedly values the company at more than $6 billion. That followed an announcement in February that Mistral had struck a deal with Microsoft to make its models available to the U.S. tech giant’s customers in exchange for access to Microsoft’s computational resources.

Mistral’s co-founder and CEO Arthur Mensch has been vocal in debates over the E.U.’s landmark AI law, arguing that rather than regulating general-purpose AI models like Mistral’s, lawmakers should focus on regulating how others use those models. He also opposes limitations on AI developers freely sharing their creations. “I don’t see any risk associated with open sourcing models,” he says. “I only see benefits.”

TIME spoke with Mensch in May about attracting scarce AI talent, how Mistral plans to turn a profit, and what’s missing from Europe’s AI ecosystem.

This interview has been condensed and edited for clarity.

Florian Bressand, chief business officer at Mistral, told CNBC a few months ago that more than half of the team that developed Llama now work for Mistral. How have you managed to draw so many talented researchers away from well-resourced companies like Meta?

At first, we hired our friends. We could do it because we had made some meaningful contributions to the field, and so people knew that it was interesting to work with us. Later on, starting in December, we started to hire people that we knew less well. That owed to the strategy we follow of pushing the field in a more open direction. That mission speaks to a lot of scientists who, for similar reasons as we do, liked the way it was before, when communication and information circulated freely.

There are so few people around the world who have trained the sorts of AI systems that Mistral does. I know France has a thriving AI scene, but do you think you managed to hire some significant proportion of the people who know how to do this—perhaps even all of them?

Not all of them. There’s a couple of them, friends of ours, at Google, at OpenAI, a few of them remain at Meta. But for sure, we attracted, let’s say, 15 people that knew how to train these models. It’s always hard to estimate the size of the pool, but I think it’s probably been maybe 10% of the people that knew how to work on these things at the time.

Mistral has been fundraising. What are you spending the money on?

We’re spending the money on mostly compute. It’s an industry which is structured differently than software, in the sense that the investment that you need to do at the beginning to get the scientific team running, to get models that are on the frontier of technology, is quite significant. Today, we are still running on our seed compute, but finally getting access to the compute that we raised money for in December.

Executives at almost all the other foundation model companies have talked about how they expect to spend $100 billion on compute in the coming years. Do you have similar expectations?

What we’ve shown is that we burn a little north of 25 million [euros] in 12 months, and that has brought us to where we are today, with distribution that is very wide across the world, with models that are on the frontier of performance and efficiency. Our thesis is that we can be much more capital efficient, and that the kind of technology we’re building is effectively capital intensive, but with good ideas [it] can be made with less spending than our competitors. We showed this to be true in 2023-2024, and we expect it to remain true in 2024-2025. With, obviously, the fact that we will be spending more. But we will still be spending a fraction of what our competitors are spending.

Are you profitable at the moment?

Of course not. The investment that we have to make is quite significant. The investment that we made and the revenue that we have are actually not completely decorrelated, unlike at other companies. So it’s not the case [that Mistral is profitable], but it’s also not expected for a 12-month-old startup to be profitable.

What’s the plan for turning a profit? What is your business model?

Our business model is to build frontier models and to bring them to developers. We’re building a developer platform that enables [developers] to customize AI models and to make AI applications that are differentiated and in their hands, in the sense that they can deploy the technology where they want to deploy it, potentially not using public cloud services, and that they can customize the models much more than they can today with general-purpose models behind closed, opaque APIs [application programming interfaces]. And finally, we are also focusing a lot on model efficiency: for a certain reasoning capacity, making the models as fast as possible, as cheap as possible.

This is the product that we are building: the developer platform that we host ourselves, and then serve through APIs and managed services, but that we also deploy with our customers that want to have full control over the technology, so that we give them access to the software, and then we disappear from the loop. So that gives them sovereign control over the data they use in their applications, for instance.

Is it fair to say that your plan is to make AI models that almost match those of your competitors, at a lower cost to you and your customers, and that are more openly available? Or do you hope to match your competitors’ most advanced models, or “frontier models,” in terms of absolute capabilities?

We also intend to compete on the frontier. There’s some phenomenon of saturation of performance that has enabled us to catch up, and we intend to continue catching up, and eventually to be as competitive as the others. But effectively the business model we have is a business model that the others do not have. We are much more open about sharing and customizing and deploying our technology in places where we stop having control.

Recently, your most capable models did move behind an API, whereas you started with all of your models being open. Why did you change your approach?

This is not something that we changed. We always had the intention to have leading models in the open source world, but also to have some premium features that could only be accessed through monetized services.

We have a very significant part of our offer which is open source, and that enables developers to adopt the technology and to build whatever they need to build with it. And then eventually, when you want to move the workloads that they’ve built into production, or if you want to make them better, more efficient, better managed with lower maintenance costs, then these developers come and use our platforms, use potentially our optimized models to have an increase in performance and speed in reasoning capacities.

It’s always going to stay that way. The open source path is very important to us. We are building the developer platform on top of it, which is obviously going to be monetized because we do need to have a valid business model. But we expect to bring extra value to developers that are using our open-source models.

You have often argued that Europe can’t be dependent on American AI companies and needs a homegrown AI champion. Mistral is one of the most prominent European AI companies but has a partnership with a U.S.-based “hyperscaler,” Microsoft, to access the computational power it needs. Does Mistral’s dependency on Microsoft for compute limit its ability to play the role of sovereign AI champion?

We have four cloud providers. We are cloud independent by design, and that has been our strategy from day one. Our models are available through Microsoft Azure, but also through [Amazon Web Services], and also through [Google Cloud Platform]. We use the three of them as cloud providers. We also use different cloud providers—CoreWeave, in particular—for training. We have built our stack of technology and our distribution channels to build the independence that we believe our customers need.

Should Europe try to build its own sovereign compute infrastructure in addition to having AI labs based in Europe?

I think it would be beneficial for the ecosystem. But Europe isn’t some actor that takes independent decisions and builds something out of thin air. There’s an ecosystem aspect to it, of how do you make sure that effectively some of the infrastructure for compute is available in Europe. 

This is super important for our customers, because some of them are European customers, and they do want to have some form of sovereignty over the cloud infrastructure they use. In that respect, there is already some availability, and our inference, our platform is actually deployed in Europe. But there could be some improvement. It’s not Europe that decides it. It’s an ecosystem that needs to realize that there are some needs that could be addressed. We would love to have some European cloud partners in the near future.

Cedric O, former Minister of State for Digital Affairs of France and one of your co-founders, has warned that the E.U. AI Act could “kill” Mistral. The Act passed, but the codes of conduct for general purpose AI models are yet to be written. What should those look like?

Generally, the AI Act is something that is very workable, in the sense that the constraints that we have are constraints that we already meet. We already document the way we use our models and the way we evaluate our models, and this has become a requirement for frontier models. So this is fine.

There’s some discussions to be had on the aspect of transparency for training data sets, which is something that we’d really love to enable, but which is something that needs to be measured against trade secrets. A lot of our [intellectual property] is also in the way we treat the data and select the data. This is also the IP of others. As a small company, we are defensive about our IP, because this is the only thing we have. And so in that perspective, we are confident that we can find a way for things to be acceptable for every side. 

We’ve been asked to participate in the technical specifications and to give our input. We also expect that Europe should, in all independence, make its choice to enable the development of the ecosystem and also to enable everybody to be happy.

There are a lot of sound bites out there from executives at your competitors about how AI is going to change the world over the next five or 10 years and the things that they’re worried about, the various transformational things that they think might happen as a result of the development of more powerful AI systems. Do you have predictions for how AI might change the world?

We make a powerful technology, but I think there’s a tendency to assume that this powerful technology solves every problem. At Mistral, we’re very focused on making sure that our technology brings productivity gains and reasoning capacity to certain verticals, to certain fields, so that we will have societal benefits.

Everything that humanity has been building are tools, and we’re bringing a new tool that brings new abstract capabilities. So in a sense, you can see it as a more abstract programming language. For 50 years, we have programmed in computer-understandable languages. Today, we are able to create systems by talking to them in English or in French, or in any language. That brings a new abstraction to workers, to developers, and that obviously changes the way we’re going to work in the next 10 years.

I think if we do things properly and we ensure that this tool is in the hands of everyone—and this is really why we created Mistral—we can ensure that this brings improvement to everybody’s lives, across the world, across the spectrum of socio-economic status. The way to do it, for us, is first to enable strongly differentiated applications in useful sectors like health care and education. It’s also very important to ensure that the population is trained and has access to the technology; providing the technology in a more open way than the others is a way to accelerate that. It’s not sufficient, in the sense that political decision makers also have to create enablement programs to accelerate access to the internet in parts of the world where it’s not accessible. But I think what we’re building has a positive effect in enabling the population to access that new tool, which is generative AI.

Is there any scenario you can imagine in the future, where you’ve developed an AI model, or you’re in the process of developing one, and you notice certain capabilities. Would there ever be a scenario where you decided it’s better not to open source the model and to keep it behind an API, or not even deploy it behind an API?

Not in the foreseeable future. The models we build have predictable capabilities. The only way that we found collectively to govern software and the way it is used in a collective way is through open source. This has been true for cybersecurity. This has been true for operating systems. And so the safest technologies today are the open source technologies. 

In a sense, AI is not changing anything about software. It’s only a more abstract way of defining software. So I don’t see any risk associated with open sourcing models. I only see benefits. This is a neutral tool that can be used to do anything. We haven’t banned the C [computer programming] language, because you could build malware with the C language. There is nothing different about the models that we release. And so it remains important to control the quality of applications that are put on the market. But the technology that is used to make these applications is not the one thing that can be regulated.

Anthropic CEO Dario Amodei on Being an Underdog, AI Safety, and Economic Inequality

Hanging on the wall of Anthropic’s offices in San Francisco in early May, a stone’s throw from the conference room where CEO Dario Amodei would shortly sit for an interview with TIME, was a framed meme. Its single panel showed a giant robot ransacking a burning city. Underneath, the image’s tongue-in-cheek title: Deep learning is hitting a wall. That’s a refrain you often hear from AI skeptics, who claim that rapid progress in artificial intelligence will soon taper off. But in the image, an arrow points to the robot, labeling it “deep learning.” Another points to the devastated city: “wall.”


The cheeky meme sums up the attitude inside Anthropic, an AI company where most employees seem to believe that AI progress isn’t slowing down—and could potentially result in dangerous outcomes for humanity. Amodei, who was on the cover of TIME last month as his company was named one of TIME’s 100 Most Influential Companies of 2024, says Anthropic is devoted to studying cutting-edge AI systems and developing new safety methods. And yet Anthropic is also a leading competitor to OpenAI, releasing powerful tools for use by the public and businesses, in what even Amodei himself has worried might be a perilous race that could end badly.

On June 20, Anthropic fired its latest broadside in that race, releasing the latest version of its Claude chatbot: Claude 3.5 Sonnet. The model, according to the company, sets new industry standards on reasoning, coding and some types of math, beating GPT-4o, the most recent offering from OpenAI. “With today’s launch, we’re taking a step towards what we believe could be a significant shift in how we interact with technology,” Amodei said in a statement on June 20. “Our goal with Claude isn’t to create an incrementally better [large language model] but to develop an AI system that can work alongside people and software in meaningful ways.”

TIME’s cover story about Amodei last month delved into the question of whether Anthropic can succeed at its safety mission while under pressure to compete with OpenAI, as well as tech giants like Microsoft, Google and Amazon—the latter two of whom are significant investors in Anthropic. Today, TIME is publishing a longer excerpt from the interview conducted in early May for that piece. It has been condensed and edited for clarity.

Anthropic is the youngest “frontier” AI lab. It’s the smallest. Its competitors have more cash. And in many ways, it’s the most expressly committed to safety. Do you think of yourselves as underdogs?

Certainly all the facts you cite are true. But it’s becoming less true. We have these big compute deals. We have high single-digit billions in funding. Whereas our competitors have low double-digits to, somewhat frighteningly, triple-digits. So I’m starting to feel a little bit awkward about calling ourselves the underdogs. But relative to the heavyweights in the field, I think it’s absolutely true.

The AI “scaling laws” say that as you train systems with more computing power and data, they become predictably more capable, or powerful. What capabilities have you seen in systems that you’re not able to release yet?

We released our last model relatively recently, so it’s not like there’s a year of unreleased secrets. And to the extent that there are, I’m not going to go into them. But we do know enough to say that the progress is continuing. We don’t see any evidence that things are leveling off. The reality of the world we live in is that it could stop any time. Every time we train a new model, I look at it and I’m always wondering—I’m never sure in relief or concern—[if] at some point we’ll see, oh man, the model doesn’t get any better. I think if [the effects of scaling] did stop, in some ways that would be good for the world. It would restrain everyone at the same time. But it’s not something we get to choose—and of course the models bring many exciting benefits. Mostly, it’s a fact of nature. We don’t get to choose, we just get to find out which world we live in, and then deal with it as best we can.

What did you set out to achieve with Anthropic’s culture?

We have a donation matching program. [Anthropic allows employees to donate up to 25% of their equity to any charity, and will match the donation.] That’s inclined to attract employees for whom public benefit is appealing. Our stock grants look like a better deal if that’s something you value. It doesn’t prevent people from having financial incentives, but it helps to set the tone.

In terms of the safety side of things, there’s a little bit of a delta between the public perception and what we have in mind. I think of us less as an AI safety company, and more as a company that’s focused on public benefit. We’re not a company that believes a certain set of things about the dangers that AI systems are going to have. That’s an empirical question. I want Anthropic to be a company where everyone is thinking about the public purpose, rather than a one-issue company that’s focused on AI safety or the misalignment of AI systems. Internally, I think we’ve succeeded at that: we have people with a bunch of different perspectives, but what they share is a real commitment to the public purpose.

There’s a real chance that Donald Trump gets elected at the end of this year. What does that mean for AI safety?

Look, whoever the next president is, we’re going to work with them to do the best we can to explain both that the U.S. needs to stay ahead of its adversaries in this technology, but also that we need to provide reasonable safeguards on the technology itself. My message is going to be the same. Obviously different administrations are going to have different views, and I expect different [government] policies depending on the outcome of the election. All I can really do is say what I think is true about the world.

One of our best methods for making AI systems safer is “reinforcement learning,” which steers models in a certain direction—to be helpful and harmless—but doesn’t remove their dangerous capabilities, which can still be accessible through techniques like jailbreaking. Are you worried by how flimsy that method seems?

I don’t think it’s inherently flimsy—I would more say that we’re early in the science of how to steer these systems. Early in that they hallucinate, early in that they can be jailbroken. Early in the sense that, every time we want to make our model more friendly sounding, then it’ll turn out that in some other context, it gives responses that are too long. You’re trying to make it do all the good things, and not all the bad things, but that’s just hard to do. I think it’s not perfect, and I think it’ll probably never be perfect. We have a Swiss cheese model: you can have one layer that has holes in it, but if you have 10 layers of Swiss cheese, it’s hard for anything to fly through all 10 layers. There’s no silver bullet that’s going to help us steer the models. We’re going to have to put a lot of different things together.

Is it responsible to use a strategy like that if the risks are so high?

I’d prefer to live in an ideal world. Unfortunately in the world we actually live in, there’s a lot of economic pressure, there’s not only competition between companies, there’s competition between nations. But if we can really demonstrate that the risks are real, which I hope to be able to do, then there may be moments when we can really get the world to stop and consider for at least a little bit. And do this right in a cooperative way. But I’m not naive—those moments are rare, and halting a very powerful economic train is something that can only be done for a short period of time in extraordinary circumstances.

Meta is open-sourcing models that aren’t too far behind yours, in terms of capabilities. What do you make of that strategy? Is it responsible?

Our concern is not primarily about open-source models, it’s about powerful models. In practice, what I think will happen with most of these companies—I suspect it will happen with Meta, but I’m not certain—is that they’ll stop open-sourcing below the level where I have concerns. I ultimately suspect that open-source as an issue is almost a red herring. People treat open-source models as if they undermine our business model. The truth is, our competitors are anyone who can produce a powerful model. When someone hosts one of these models—and I think we should call them open weights models—they get put on the cloud and the economics are the same as everything else.

Microsoft recently said it is training a large language model in-house that might be able to compete with OpenAI, which they’ve invested in. Amazon, which is one of your backers, is doing the same. So is Google. Are you concerned that the smaller AI labs, which are currently out in front, might just be in a temporary lead against these much better-resourced tech companies?

I think a thing that has been prominent in our culture has been “do more with less.” We always try to maintain a situation where, with less computational resources, we can do the same or better than someone who has many more computational resources. So in the end, our competitive advantage is our creativity as researchers, as engineers. I think increasingly in terms of creative product offerings, rather than pure compute. I think you need a lot of pure compute. We have a lot, and we’re gonna get more. But as long as we can use it more efficiently, as long as we can do more with less, then in the end, the resources are going to find their way to the innovative companies.

Anthropic has raised around $7 billion, which is enough money to pay for the training of a next-generation model, which you’ve said will likely cost in the single-digit billions of dollars. To train the generation after that, you’re starting to have to look at raising more cash. Where do you think that will come from?

I think we’re pretty good for the one after the next one as well, although in the history of Anthropic, we’re going to need to raise more. It’s hard to say where that [will come from]. There are traditional investors. There are large compute providers. And there are miscellaneous other sources that we may not have thought of.

One source of funding that doesn’t get talked about very much is the government. They have this scale of money—it might be politically difficult, but is that something you’ve thought about, or even had conversations about?

The technology is getting powerful enough that government should have an important role in its creation and deployment. This is not just about regulation. This is about national security. At some point, should at least some aspects of this be a national project? I certainly think there are going to be lots of uses for models within government. But I do think that as time goes on, it’s possible that the most powerful models will be thought of as national-security-critical. I think governments can become much more involved, and I’m interested in having democratic governments lead this technology in a responsible and accountable way. And that could mean that governments are heavily involved in the building of this technology. I don’t know what that would look like. Nothing like that exists today. But given how powerful the thing we’re building is, should individuals or even companies really be at the center of it once it gets to a certain strength? I could imagine in a small number of years—two, three, five years—that could be a very serious conversation, depending on what the situation is.

Do you think the current level of inequality in the world is too much?

I would say so. We have billions of people who still live on less than $1 per day. That just doesn’t seem good.

One of the things you hear from someone like Sam Altman is, if we can raise the floor for those people, then he’s comfortable with the existence of trillionaires. How do you think about the power imbalance that comes with that level of inequality?

There’s two separate things there. There’s lifting the floor, and then there’s power within the world. I think I’m a little less sanguine about this concentration of power. Even if we materially provide for everyone, it’s hard for democracies and democratic decision-making to exist and to be fair, if power is concentrated too much. Financial concentration of power is one form of power, and there are others. But there were periods in U.S. history—look at the Gilded Age—where industrialists essentially took over government. I think that’s probably too extreme. I’m not sure the economy can function without some amount of inequality. But I’m concerned we might be in one of those eras, where it’s trending towards being too extreme to be healthy for a democratic polity.

Ideas around guaranteed basic income—if we can’t think of anything better, I certainly think that’s better than nothing. But I would much prefer a world in which everyone can contribute. It would be kind of dystopian if there are these few people that can make trillions of dollars, and then the government hands it all out to the unwashed masses. It’s better than not handing it out, but I think it’s not really the world we want to aim for. I do believe that if the exponential [rate of AI progress] is right, AI systems will be better than most humans, maybe all humans, at doing most of the things humans do. And so we’re really going to need to rethink a lot. In the short run, I always talk about complementarity—we should make sure humans and AIs work together. That’s a great answer in the short run. I think in the long run, we’re really going to need to think about, how do we organize the economy, and how humans think about their lives? One person can’t do that. One company can’t do that. That’s a conversation among humanity. And my only worry is, if the technology goes fast, we’ll have to figure it out fast.
