What AI Builders Are Actually Doing in 2025


The narrative around AI often drifts into hype, but the reality inside product teams tells a different story. A recent survey of AI builders provides a clearer picture of actual workflows, tradeoffs, and experiments happening behind the scenes.

AI development is maturing. Teams are moving past the initial excitement and are now focused on operationalizing, evaluating, and integrating AI into core systems.

Companies that build solid foundations today—focusing on strong evals, model strategies, and reliable agent tooling—will be the ones shipping the most compelling AI products in the years ahead.

Here are the nine themes that stood out from the data.

1. Teams Are in the “Try Everything” Phase

Teams are currently testing every approach available. From agents and retrieval-augmented generation (RAG) to synthetic data and new tools like the Model Context Protocol (MCP), builders are in a necessary exploration phase to determine what works best for their specific use cases.

2. Open Source Is Winning Where It Matters

Open-source models are dominating real-world usage. While closed-source APIs are still used as “precision tools” for specific latency or accuracy needs, teams prefer open source for control, affordability, and customizability.

Here’s how teams describe their model strategies:

  • 21.8% → exclusively open source
  • 44.6% → mostly open source
  • 66.4% combined → open source as the default

3. The Biggest AI Impact Isn’t Customer-Facing

Contrary to the focus on customer-facing features, internal tools are currently the primary driver of AI adoption.

  • 65.6% → using AI to build internal tools
  • 61.3% → using AI to improve existing products
  • 57.1% → building internal AI experiences

This makes sense: internal workflows have fewer constraints, less regulatory pressure, and usually a clearer ROI. So while everyone talks about the shiny customer-facing features, the real productivity boost is happening behind the scenes.

4. Agents Are Getting Real System Access

Agents are moving beyond prototypes. Teams are granting them meaningful access to databases, web search, and file systems. This shift signals that agents are being integrated into real systems with real responsibilities, making safety and oversight increasingly critical.

  • 72.4% → database access
  • 59.1% → web search
  • 55% → memory systems and file systems
  • 45.8% → code interpreter capabilities
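Granting agents this kind of access usually means gating each tool behind an explicit allow-list so that access stays auditable. Here is a minimal sketch of that pattern; all names (`run_tool`, `TOOLS`, `ALLOWED`) are hypothetical stand-ins, not from the survey.

```python
# Sketch: gate an agent's system access behind an explicit allow-list.
# The lambdas are stand-ins for real database, search, and file calls.

TOOLS = {
    "db_query": lambda sql: f"rows for: {sql}",
    "web_search": lambda q: f"results for: {q}",
    "read_file": lambda path: f"contents of {path}",
}

ALLOWED = {"db_query", "web_search"}  # e.g. file access not yet granted

def run_tool(name: str, arg: str) -> str:
    """Execute a tool call requested by the agent, refusing anything
    outside the allow-list so every grant is a deliberate decision."""
    if name not in ALLOWED:
        raise PermissionError(f"agent may not call {name!r}")
    return TOOLS[name](arg)
```

Keeping the allow-list separate from the tool registry makes the oversight question ("what can this agent actually touch?") answerable at a glance.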

5. Evaluations Are Everywhere

Trust requires verification. Over 99% of respondents run some form of evaluation, ranging from AI-based evals to human judgment and automated checks. Evaluations are no longer just a QA step; they are central to the entire engineering lifecycle.

Types of evals:

  • AI-based evals: 66.6%
  • User surveys: 60%
  • A/B testing: 47.7%
  • Human judgment: 41.2%

And for automated checks:

  • Verification: 52.5%
  • Error analysis: 50.8%
  • Regression tests: 49.4%
  • Unit tests: 40.2%

6. RLFT Is Showing Strong Results

Reinforcement learning fine-tuning (RLFT) is showing significant results, with many teams reporting performance lifts of over 30%. This suggests RLFT will likely become standard practice for optimizing model performance.

  • 74.6% → saw >16% performance lift
  • 30.5% → saw >30% lift
  • A handful → reported >45% lift

7. Fine-Tuning Is Mainstream, But Startups Are Behind

While fine-tuning is mainstream among enterprises (who have the data and regulatory pressure), startups often stick to base models for speed and cost-efficiency. This gap highlights how organizational maturity influences technical strategy.

  • 52.4% of startups → not fine-tuning
  • Only 17% of enterprises → not fine-tuning

8. MCP Adoption Is Rising Fast

The Model Context Protocol (MCP) is seeing early but promising adoption for connecting data and bridging internal tools. It resembles the early days of APIs-as-products, with adoption expected to accelerate.

  • 16.7% → using MCP servers
  • 32.9% → using MCP via LLM clients
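Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages; invoking a server-side tool uses the spec's `tools/call` method. A sketch of that envelope (the tool name `query_db` and its arguments are illustrative, not from the survey):

```python
import json

# Sketch: the JSON-RPC 2.0 message an MCP client sends to invoke a
# server-side tool via the protocol's tools/call method.

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a tools/call request for an MCP server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = make_tool_call(1, "query_db", {"sql": "SELECT 1"})
```

Because the envelope is plain JSON-RPC, the same server can sit behind any LLM client that speaks the protocol, which is why the "via LLM clients" number already outpaces direct server adoption.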

9. Synthetic Data Is Becoming Standard

Synthetic data has graduated from an experiment to an everyday tool for evaluations and fine-tuning. Teams are generating their own data to stress-test agents and bootstrap behaviors, reducing reliance on perfect real-world datasets.

  • 63% → using it for evaluations
  • 22.3% → using it for fine-tuning
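A common starting point for the evaluation use case is templating: cross a prompt template with entity lists to generate far more test inputs than any curated real-world set would cover. A minimal sketch with illustrative entities:

```python
import itertools

# Sketch: bootstrap synthetic eval cases from a template, rather than
# waiting for a perfect real-world dataset. Entities are illustrative.

TEMPLATE = "What is the refund policy for {product} in {region}?"
PRODUCTS = ["Basic", "Pro"]
REGIONS = ["EU", "US"]

def synthetic_cases():
    """Cross every entity combination through the template to
    stress-test an agent across the full input grid."""
    return [TEMPLATE.format(product=p, region=r)
            for p, r in itertools.product(PRODUCTS, REGIONS)]
```

The same generated cases can later be paired with reference answers and reused as fine-tuning data, which is likely why the two usage numbers overlap.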