What AI Builders Are Actually Doing in 2025


The narrative around AI often drifts into hype, but the reality inside product teams tells a different story. A recent survey of AI builders provides a clearer picture of actual workflows, tradeoffs, and experiments happening behind the scenes.

AI development is maturing. Teams are moving past the initial excitement and are now focused on operationalizing, evaluating, and integrating AI into core systems.

Companies that build solid foundations today—focusing on strong evals, model strategies, and reliable agent tooling—will be the ones shipping the most compelling AI products in the years ahead.

Here are the nine themes that stood out from the data.

1. Teams Are in the “Try Everything” Phase

Teams are currently testing every approach available. From agents and retrieval-augmented generation (RAG) to synthetic data and new tools like the Model Context Protocol (MCP), builders are in a necessary exploration phase to determine what works best for their specific use cases.

2. Open Source Is Winning Where It Matters

Open-source models are dominating real-world usage. While closed-source APIs are still used as “precision tools” for specific latency or accuracy needs, teams prefer open source for control, affordability, and customizability.

Here’s how teams describe their model strategies:

  • 21.8% → exclusively open source
  • 44.6% → mostly open source
  • 66.4% combined → open source as the default

3. The Biggest AI Impact Isn’t Customer-Facing

Contrary to the focus on customer-facing features, internal tools are currently the primary driver of AI adoption.

  • 65.6% → using AI to build internal tools
  • 61.3% → using AI to improve existing products
  • 57.1% → building internal AI experiences

This makes sense: internal workflows have fewer constraints, less regulatory pressure, and usually a clearer ROI. So while everyone talks about the shiny customer-facing features, the real productivity boost is happening behind the scenes.

4. Agents Are Getting Real System Access

Agents are moving beyond prototypes. Teams are granting them meaningful access to databases, web search, and file systems. This shift signals that agents are being integrated into real systems with real responsibilities, making safety and oversight increasingly critical.

  • 72.4% → database access
  • 59.1% → web search
  • 55% → memory systems and file systems
  • 45.8% → code interpreter capabilities
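Granting agents this kind of access usually means gating each tool behind an explicit allow-list so that access stays auditable. Here is a minimal sketch of that pattern; all names (`run_tool`, `TOOLS`, `ALLOWED`) are hypothetical stand-ins, not from the survey.

```python
# Sketch: gate an agent's system access behind an explicit allow-list.
# The lambdas are stand-ins for real database, search, and file calls.

TOOLS = {
    "db_query": lambda sql: f"rows for: {sql}",
    "web_search": lambda q: f"results for: {q}",
    "read_file": lambda path: f"contents of {path}",
}

ALLOWED = {"db_query", "web_search"}  # e.g. file access not yet granted

def run_tool(name: str, arg: str) -> str:
    """Execute a tool call requested by the agent, refusing anything
    outside the allow-list so every grant is a deliberate decision."""
    if name not in ALLOWED:
        raise PermissionError(f"agent may not call {name!r}")
    return TOOLS[name](arg)
```

Keeping the allow-list separate from the tool registry makes the oversight question ("what can this agent actually touch?") answerable at a glance.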

5. Evaluations Are Everywhere

Trust requires verification. Over 99% of respondents run some form of evaluation, ranging from AI-based evals to human judgment and automated checks. Evaluations are no longer just a QA step; they are central to the entire engineering lifecycle.

Types of evals:

  • AI-based evals: 66.6%
  • User surveys: 60%
  • A/B testing: 47.7%
  • Human judgment: 41.2%

And for automated checks:

  • Verification: 52.5%
  • Error analysis: 50.8%
  • Regression tests: 49.4%
  • Unit tests: 40.2%

6. RLFT Is Showing Strong Results

Reinforcement learning fine-tuning (RLFT) is showing significant results, with many teams reporting performance lifts of over 30%. This suggests RLFT will likely become standard practice for optimizing model performance.

  • 74.6% → saw >16% performance lift
  • 30.5% → saw >30% lift
  • A handful → reported >45% lift

7. Fine-Tuning Is Mainstream, But Startups Are Behind

While fine-tuning is mainstream among enterprises (who have the data and regulatory pressure), startups often stick to base models for speed and cost-efficiency. This gap highlights how organizational maturity influences technical strategy.

  • 52.4% of startups → not fine-tuning
  • Only 17% of enterprises → not fine-tuning

8. MCP Adoption Is Rising Fast

The Model Context Protocol (MCP) is seeing early but promising adoption for connecting data and bridging internal tools. It resembles the early days of APIs-as-products, with adoption expected to accelerate.

  • 16.7% → using MCP servers
  • 32.9% → using MCP via LLM clients
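Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages; invoking a server-side tool uses the spec's `tools/call` method. A sketch of that envelope (the tool name `query_db` and its arguments are illustrative, not from the survey):

```python
import json

# Sketch: the JSON-RPC 2.0 message an MCP client sends to invoke a
# server-side tool via the protocol's tools/call method.

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a tools/call request for an MCP server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = make_tool_call(1, "query_db", {"sql": "SELECT 1"})
```

Because the envelope is plain JSON-RPC, the same server can sit behind any LLM client that speaks the protocol, which is why the "via LLM clients" number already outpaces direct server adoption.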

9. Synthetic Data Is Becoming Standard

Synthetic data has graduated from an experiment to an everyday tool for evaluations and fine-tuning. Teams are generating their own data to stress-test agents and bootstrap behaviors, reducing reliance on perfect real-world datasets.

  • 63% → using it for evaluations
  • 22.3% → using it for fine-tuning
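A common starting point for the evaluation use case is templating: cross a prompt template with entity lists to generate far more test inputs than any curated real-world set would cover. A minimal sketch with illustrative entities:

```python
import itertools

# Sketch: bootstrap synthetic eval cases from a template, rather than
# waiting for a perfect real-world dataset. Entities are illustrative.

TEMPLATE = "What is the refund policy for {product} in {region}?"
PRODUCTS = ["Basic", "Pro"]
REGIONS = ["EU", "US"]

def synthetic_cases():
    """Cross every entity combination through the template to
    stress-test an agent across the full input grid."""
    return [TEMPLATE.format(product=p, region=r)
            for p, r in itertools.product(PRODUCTS, REGIONS)]
```

The same generated cases can later be paired with reference answers and reused as fine-tuning data, which is likely why the two usage numbers overlap.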