An AI coding company on how computers as we know them will change

Jan 15, 2025, 9:37pm EST
Tech / North America
Replit’s CEO Amjad Masad. (Replit/YouTube)

The Scene

AI coding company Replit has hit an inflection point, thanks to a breakthrough that propelled its new product, called Agent, which creates software apps from a natural language prompt.

Agent has been wildly successful and changed the focus of the company, from one that serves software developers to one that caters to people who know very little about coding.

In the long run, Replit could be more like a new kind of computer operating system. Today, computers can run software that other people create. But we’re heading into an era where a computer writes software to automate tasks for our specific needs.

Obviously, Replit has competition and Agent is by no means where it needs to be for that vision to come true — something CEO Amjad Masad fully acknowledges. But some of the takeaways from this edited conversation below are pretty profound.

The View From Amjad Masad

Reed Albergotti: What made you want to move in the Agent direction?

Amjad Masad: There was a bit of a founder mode story there, where the company has gotten big, the culture has gotten a little slow, a little soft. So I created this thing called the Agent Task Force. The best people at the company. We’re going to change what we’re doing entirely. Replit is no longer about coding. Replit is about making software. It’s about being able to conjure up software just using natural language.

It wasn’t clear that it was possible at the time, but everyone there was motivated. We did the layoffs, and the people that weren’t part of this Agent Task Force were kind of demoralized and didn’t feel like they were working on something important, so a lot of people ended up leaving. We lost half of the team. We were 130, now we’re about 65. But at the same time, our revenue has grown five times in the past five to six months.

Wow. Where’s that coming from?

Agent. It was a huge hit. We launched it in September, and it’s basically the first at-scale working software agent you can try in the world today. And it’s the only one I would say.

Wasn’t that supposed to be Devin (a coding agent made by Cognition)?

They just recently announced a $500 plan, and I don’t think they’re at the scale we’re at. With Replit, you can pay $25, you get quite a bit of quota. You can make 30-40 micro SaaS apps with $25 a month, and then you can pay as much as you want to continue building. People are making apps for $5-$10. A lot of people are just replacing their internal tools, replacing SaaS products entirely from their stack. They’re saving a lot of money for their company, or if they’re small entrepreneurs, they’re saving a lot of money for themselves.

Is there one that’s the most popular, specific tool that everybody’s making?

No, the interesting thing about it is that it’s a long-tail thing. If there was one popular thing, someone would make it, and everyone else would copy or fork it. But the cool thing is that it is maximally customizable. If you go to Typeform, there are some settings and knobs you can click. With Replit Agent, you can use the power of code, which is infinitely generalizable, to change whatever you want about it. So you can build a tool that fits exactly what you want to do.

I’m working with a sleep doctor and he gave me a printed form and said, ‘I want you to record your diaries.’ If I print it on a piece of paper, I’m just going to lose it. So I took the PDF that he sent me, I put it into Replit Agent to make it into an app that I could record my sleep on. Over time, I’ve been adding things to it and adding ways to chart the sleeping data and analyze it. It’s like having a junior developer in your pocket.

You have to have a database somewhere in the cloud. Does it do all that stuff for you, or do you have to know how to do that?

The cool thing about Replit is we spent the last seven, eight years building that. It does it all for you. It’s all hosted.

Before you announced the Agent, you said you had this advantage because you could see the code creation from conception to actual use. Did that end up being the key to the Agent Task Force?

The big unlock was a lot of engineering that we did here to make the Agent work. But the model, Anthropic’s Claude 3.5 Sonnet, was a huge unlock for the entire industry. Cursor and others started really working after this model came out. That’s the interesting thing about the field of AI coding: Every time there’s one model that creates a leap, it allows all these companies that are building on top of it to create massively more interesting experiences.

What does that tell you about the way this is all going to progress? You had this amazing training data and it turns out, this model is big enough and powerful enough that it just does it.

I went back and forth on this. Data still matters. But the thing is, they really lived up to their name as foundation models, meaning they really became these general-purpose models that can be tuned to whatever downstream task that you’d like. We’re working on fine tuning using our data. So I think our data is going to be so valuable. But just the fact that we’re able to get here without using our data poses a lot of questions for the industry. There is something about these foundation model companies that feels more durable now than it felt maybe a year ago, because they’re truly powerful and generalizable.

If you could use Anthropic, doesn’t that mean there’s not much of a moat for you?

In the past seven, eight years, we’ve been working on a platform that can create a development environment that can run the code, that can install the packages, orchestrate the databases, and deploy. We have a one-click deploy. This is the value that we’re creating. It’s actually not a durable moat. It is a head start. As long as we keep the rate of innovation and the rate of progress, and we keep deepening that, I think we can continue to be ahead. The business question is, what is the durable moat?

It’s a real change from what you predicted was going to happen. The first time we did an interview, I asked, ‘Is this the thing — anybody can just create an app?’ And you said, ‘Oh, that’s not going to happen for the foreseeable future.’ I feel like that was yesterday.

I gave a TED Talk in October [2023], and I said, ‘Here’s where the future is headed, but I don’t know if it’s going to happen this decade.’ Literally, everything from that TED talk we have today. That was a huge change for me. I knew all this stuff was coming. I just didn’t think it was going to come this fast.

It still requires quite a bit of a human touch and creativity, and skill. We already know that prompting is a skill: Being able to take a high-level idea and break it down into individual components, being able to look at a system and give it feedback on how well it did on that. All these things make it so that it’s not fully automated. You’re actually doing quite a bit of work. That’s an innovation we made in that … there’s a human in the loop here.

Do you think that’s something that will endure?

I still think that you would need to direct it throughout the process. The problem we have with agents today is that it’s very easy for them to drift. That’s why we have this human-in-the-loop system, where it goes and does some work [then] comes back to you. You’re sort of babysitting it, in a way. You’re bringing it back to the task you wanted to get done. Can it be creative and come up with ideas? I think so. But right now, that’s not the bottleneck. The bottleneck is that it just can drift.

Is the reasoning capability that test-time compute offers helping to solve that?

The insight from test-time compute is that, yes, LLMs can be random. They can drift and be wrong in all these ways. The way to go around that is you sort of run a parallel search, and then pick the best trajectory. We are looking at doing similar things. Give it this command, and it kicks off five or 10 agents competing on making the application. So, if we wanted to release “Replit o1” [a reference to OpenAI’s o1 model], it would literally be doing that: forking off 10 processes and having some way to judge which is the best one to give back to you.
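
Masad is describing a best-of-n search at inference time. Here is a minimal sketch of that pattern, assuming hypothetical run_agent and judge helpers (illustrative stand-ins, not Replit or OpenAI APIs): fork several independent attempts at the same prompt, score each result, and keep the highest-scoring trajectory.

```python
# Best-of-n parallel search, a minimal sketch of the idea described above.
# run_agent() and judge() are hypothetical placeholders, not real Replit APIs.
from concurrent.futures import ThreadPoolExecutor


def run_agent(prompt: str, seed: int) -> str:
    """Hypothetical: one independent agent attempt at building the app."""
    return f"candidate app built from '{prompt}' (attempt {seed})"


def judge(candidate: str) -> float:
    """Hypothetical reward model: score a finished candidate build."""
    return float(len(candidate))  # placeholder scoring logic


def best_of_n(prompt: str, n: int = 10) -> str:
    # Fork n competing attempts in parallel, then return the best-judged one.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: run_agent(prompt, s), range(n)))
    return max(candidates, key=judge)


print(best_of_n("build me a sleep-diary app", n=5))
```

The extra cost here is pure inference, n attempts instead of one, which is why this kind of quality gain is framed as test-time compute rather than training.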

You’re using Claude 3.5 Sonnet. What if you put o1 in there?

The way o1 works today is that it’s a little solipsistic. It thinks in its own head, and gives you some kind of final answer. You can give it a problem statement and it’ll go think really hard about it. It’s generating a lot of potential thoughts, and then having a reward model judge those thoughts. With programming, it is actually in some way easier, because you have ground truth. Your reward model could be trained on actual programming output. Unlike natural language, you can get the answer from the environment. With o1, if they give us hooks into its thinking process, where we could say, ‘run this program in Python’ or ‘read this file,’ it could get environmental feedback. In its current form, I don’t think it’s very good for Agent, because it doesn’t give us hooks into its thinking process to get feedback from the environment. You can imagine an alternative system similar to o1, where it can give you the tool calls within its thinking. It’s thinking ‘oh, I wonder what that file is’ and it reads that file. ‘I wonder if I ran this Python program, what it does.’
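
His point that programming has ground truth available can be made concrete with a toy verifier that executes a candidate program against a test instead of only judging the text. This is an illustrative sketch of the general idea, not an OpenAI or Replit mechanism; the helper name and example are assumptions of mine.

```python
# Check a candidate program by running it, so the "answer comes from the
# environment" (the exit code of the tests) rather than from a text-only judge.
# verify_by_execution() is an illustrative helper, not a real API.
import subprocess
import sys
import tempfile


def verify_by_execution(candidate_code: str, test_code: str) -> bool:
    """Write candidate + tests to a temp file, run it, and report pass/fail."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n" + test_code + "\n")
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True)
    return result.returncode == 0  # 0 means no assertion failed


candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5"
print(verify_by_execution(candidate, tests))  # True: the environment agrees
```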

The value of test-time compute for you would be to create all these different apps with the whole thing, connecting it to Python and connecting the database, but OpenAI can’t do that because it’s behind their wall.

That’s why [it’s] probably solipsistic. It’s in their data centers, like, looping over itself.

Have you experimented with your own test-time compute implementation?

Yes. Replit Agent already has some ideas from test-time compute, where we do quite a bit of thinking. We’ve generated a lot of tokens to reason through problems in the form of, ‘I saw this error in the console. Let me reflect on that and think about how to solve that. Maybe if I go look at this file. Let’s see what’s in there. Let me revert that change, and then let me rerun it now.’ Action-observation is a form of test-time compute, because the standard way of using LLMs is request-response. The human sends a request, the assistant sends a response. What we’re doing with Agent is: You send it a request, it can loop a number of times, and it determines the termination condition.
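
A minimal sketch of that action-observation loop, contrasted with one-shot request/response: the model proposes an action, the platform executes it and feeds the observation back, and the agent itself decides when to stop. The llm_step and run_in_sandbox helpers below are hypothetical stubs, not Replit’s actual implementation.

```python
# Request -> act -> observe loop with an agent-chosen termination condition.
# llm_step() and run_in_sandbox() are hypothetical stubs for illustration.
def llm_step(history: list) -> dict:
    """Hypothetical: ask the model for its next action given the transcript."""
    # A real implementation would call an LLM; this stub finishes immediately.
    return {"type": "done", "summary": "app scaffolded and deployed"}


def run_in_sandbox(action: dict) -> str:
    """Hypothetical: execute the action and capture console output or errors."""
    return "ok"


def agent_loop(request: str, max_steps: int = 20) -> str:
    history = [("user", request)]
    for _ in range(max_steps):
        action = llm_step(history)    # e.g. edit a file, run a program
        if action["type"] == "done":  # the agent decides the termination condition
            return action["summary"]
        history.append(("observation", run_in_sandbox(action)))
    return "stopped: step budget exhausted"


print(agent_loop("build a sleep-diary app"))
```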

Could you create iOS apps with this too?

We’re working on it right now.

What’s the challenge there?

Apple is a very locked-down ecosystem. It’s just the infrastructure challenges: How do you create the apps and make them run in the browser, so that people can see the results? And then how do you install them, deploy them, test them?

Have you seen anybody create a commercially viable product yet?

It’s early, it’s three months in, and I don’t think anything has gained tremendous traction yet. We already have these success stories on Replit, pre-Agent. With Agent, there are all these promising applications. If you ask me six months from now, I’ll probably have one that’s adding millions in ARR, but right now, it’s just people launching things.

People are excited about apps for work, apps for life, apps for family. Startup entrepreneurs building fully functioning applications is on the more challenging side, because the moment you get a little bit of traction, the demands on a system like that become exponential because there’s a lot of users, a lot of things that could go wrong. But if you have an app that’s taking off, at that point you can afford to invest in a developer.

I’d say in the next six months, that will change. We’ll probably see a number of startups fully running in Replit and Replit Agent. Over the next year or two, I do think this billion-dollar solo entrepreneur startup becomes increasingly possible.

There’s been a lot of people writing about how we’ve hit a wall, it’s not getting any better. Where do you think that’s coming from?

If you take a fully functional view on it, it is not the case at all. Are there things we couldn’t do last year that we could do today? If you’re in the coding space, that’s absolutely true. If you look at computer use, it’s absolutely true. Models now can use a computer, a mouse, and keyboard. That’s a huge unlock.

Capabilities are advancing really fast. The talk around the wall is perhaps a little more academic: we haven’t seen the next generation of models. It’s more like we’re expanding the capabilities of the current generation of models. Going from GPT-2 to 3, or 3 to 4, that hasn’t happened in a while. People expected it to happen in 2024. There are rumors coming out of OpenAI that it’s a year late, so there’s some skepticism as to whether it’s going to be possible at all. But again, this is engaging too much with the academic details of it. I would like to take a step back and look at either the benchmark results or the capabilities, and they’re all going in the right direction.

I’m playing devil’s advocate here and just saying there is some truth to the idea that we haven’t seen a leap in a while. On the other hand, as someone who’s building real applications, I don’t give a shit. What I care about is, this thing continues to unlock more useful things for me to build around.

Do you think about infrastructure and building this insane amount of compute capacity? Do you try to plan how much compute you are going to be able to use in a year?

We’ve actually run into limits with Google Cloud and how much quota they can give us on some of their models. We do run into limits quite often. We’re scaling fast. We are among the top three largest Anthropic Claude users on Google Cloud, and we ran out of quota. There are clear limits. At some point, training was limited; inference is now becoming more limited, especially as you project forward with test-time compute and scaling. Do I worry about it? It’s like anything that’s outside of my control. I don’t worry about it as much. I try to understand it because it informs my worldview of where things are headed. But they should build all of that out, and I think all of it will get utilized.

Can you plan on the idea that in six months, an experiment that’s too expensive today won’t be anymore?

I don’t think all capabilities will come from scaling. Some of them will come from algorithmic changes. Some of them from data, from ways of training and ways of collecting data, ways of buying data. There’s a lot of buying data that’s going around. There are a lot of things to scale. You can scale data, you can scale training compute, you can scale inference compute. But it’s hard for me to come up with some kind of thing that will guide my business.

Instead, it needs to be almost a faith-based or belief-based approach. I came up with this law, for example, named after me. I said, every six months, the return on learning a little bit of coding doubles. Every six months, learning a little bit about how to go into Replit and do something becomes more and more valuable. If you become a little more tech savvy (being able to debug a little bit more, read a little bit of code, and understand a little bit more), the return on just a little bit of skill is exponentially greater.
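
As a rough formalization of that claim (my framing, not Masad’s exact wording), with $t$ measured in months and $R_0$ the return today:

$$R(t) = R_0 \cdot 2^{\,t/6}, \qquad R(24) = 2^{4} R_0 = 16\,R_0,$$

i.e. roughly a sixteen-fold return on the same small amount of skill after two years.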

Is that law holding true?

I think so. September was the big Agent launch. I would say sometime in the next few months, we’ll have another launch that could be another step change in how easy it is and how much more capable it is.

You can think about it from another point of view. What do you lose if you don’t engage in these things?

Is it getting so easy that, at some point, you don’t have to learn it?

I do think there’s skill. That skill looks very, very different. And I think that skill is much more pure. The skill of working with AI systems and systems like Replit is more pure. What’s left is: Are you able to generate good ideas, and are you able to turn those ideas into instructions for LLMs to follow? And then are you able to react to whatever result the LLM can give you, and be able to direct it in the correct way? That is a real skill. People call it prompting. I don’t think it’s just prompting. Part of it is clear thinking. Part of it is just general tech savviness, being able to understand how these systems work. If you’re able to do all of that, I think you’re going to be hugely successful.

Let’s say someone is a brilliant attorney: a clear thinker, but they know nothing about this stuff. A year from now, can they jump into it because they have that thinking ability, or are they behind?

They’re probably a little behind, because these systems are different, and you need to build an intuition for them. And it takes a little bit of time to build an intuition for them. I built an intuition for LLMs. By 2022, I sort of understood how to talk to them, and by the time ChatGPT came out, I was already sort of an expert at prompting. Maybe someone who started around when ChatGPT came out now has similar mental models that I have today. It takes time, but I do think that the precondition is thinking clearly.

So if you’re not building your own software, building apps for your family, you’re falling behind?

Finding software problems in your life is also a skill. Looking at a problem and saying, ‘a piece of software could solve that’ is a skill. Just remembering that even something so simple as keeping a sleep diary can be a useful app. A chore app, a travel planner.

The really exciting thing here is that the difference between programmers and non-programmers is collapsing rapidly, and lay people will have the opportunity to have as much command over computers as hackers do today. That’s been the big divide in technology ever since we invented graphical user interfaces and things like that. You have the people who make the apps, and the people who use them. That wasn’t the original vision of computing. The idea was that you’re the programmer, but it was too hard. We’re going back to that now.

When will the mouse go away? When will every user interface just be natural language?

I struggle with the “when” question because even if you get the timelines correct, the question of adoption is extremely hard, especially when you’re thinking about government systems, education, military systems, all these regulated industries. There are still computers running COBOL, Pascal, what have you. But if you remove that as a limiting factor, I do think that consumer software is probably changing drastically over the next two years. Audiovisual inputs are going to be the predominant way we use computers in the next two years.

What’s the trajectory for Replit? Do you see it as the place where professional coders now go?

I’d say we don’t care about professional coders anymore. The interesting thing about desktop interfaces is they took a population of 10 million users of computers to a population of 1 billion users of computers, and then mobile phones took it to 5 billion. There was this order of magnitude jump in how many people could use it because it’s that much easier.

Basically, if you’re able to use a spreadsheet, and there’s like a billion spreadsheet users globally, then you should be able to use something like Replit Agent to make software.

We’re going to release a version of our mobile app that’s optimized for making software using the Agent. Pretty soon it’s going to be consumer-grade. Kids, lay people who are not really related to the tech industry, should be able to use Replit to make applications to make their lives a lot better. It is similar to the PC revolution, the personal software revolution. I think that’s going to happen pretty quick. We’re adding customers very, very quickly. It’s very different than Replit previously, where we just added more and more users. Now we’re adding people who are spending money on the platform to make applications, which is great for us. But people are getting way more value out of the platform instead of just coming in, learning code, trying to build things, and really not making as much progress. Even by next month, it’s going to be a lot easier to make applications on Replit.

