The News
There’s a new prize making waves in the AI research community. Andy Konwinski, a co-founder of Databricks and Perplexity, is offering up to $1 million to anyone who can get an open-source large language model to score at least 90% on a new benchmark that tests AI coding ability.
The contest, called the K Prize, is the latest in a wave of similar awards for people who can solve hard problems with AI, something Semafor covered in June. Nat Friedman’s contest to decipher words on ancient scrolls carbonized by the eruption of Mount Vesuvius nearly 2,000 years ago led to new historical knowledge of the period.
Konwinski’s prize is designed to entice independent researchers and smaller teams to experiment with new methods and techniques for training foundation models or building atop existing ones. AI has already had an impact on software development, but the models are a long way from replacing human developers.
Step Back
The problem with AI benchmarks is that the scores are often inflated. That’s because training data used by AI models can be contaminated with the benchmarks themselves — akin to giving someone the answers to a test before they take it.
Despite the grade inflation, a coding test called SWE-bench has proven challenging for AI models. According to its website, the highest-performing model currently scores only 55% on the evaluation, which consists of real-world software problems drawn from the popular code-hosting site GitHub.
Konwinski said in an interview that the concern with SWE-bench was that its coding problems could simply be downloaded from the internet. He worked with the SWE-bench team and the machine learning site Kaggle, which regularly hosts similar contests, to design a test that couldn’t be gamed, he said.
Essentially, Konwinski and the SWE-bench team will build the test from problems that don’t yet exist at the time the AI models are submitted, ensuring the answers can’t be included in any model’s training data.
The contest should provide the most accurate assessment yet of how well AI models can code.
“Better benchmarks could be very much at the heart of better technology,” he said.
Know More
To encourage independent developers to participate, Konwinski is limiting the prize money to open models, as opposed to closed models like those made by OpenAI and Anthropic.
The prize money is coming out of his own pocket, he said. (Konwinski has been wildly successful: Databricks was just valued at $62 billion, and Perplexity is on course to reach a valuation in the tens of billions of dollars.)
Kaggle will provide compute resources to smaller developers who can’t necessarily afford or access the GPU power needed to train and test their models.
Now What?
Konwinski believes offering prize money to small developers will encourage “small AI.” While big companies have piled data and compute power into larger and larger AI models, Konwinski said there’s room for more elegant innovation and breakthroughs that don’t require the same kind of scaling.
Indeed, some AI researchers believe the human brain, capable of running on bread and water, is evidence that it’s possible to make AI models much smaller.
“Our target is not to displace something. It’s to get people to stay up late and try to make progress on the problem,” he said. “You’ll see that making AI smaller has this positive side effect of reinvigorating research.”
Konwinski announced the prize at the Neural Information Processing Systems (NeurIPS) conference last week.
The first person to hit 90% on the benchmark will get $1 million, he said.
As one commenter on Hacker News pointed out, “If your AI can do this, it’s worth several orders of magnitude more. Just FYI.”
The top submission will receive at least $50,000, even if it doesn’t hit the 90% mark. “The goal isn’t necessarily to win the world championship of this thing. It’s just to catalyze energy and breakthroughs,” Konwinski said.