Error Analysis, Not Fancy Tools, Drives Highest ROI in AI Development

"Most AI teams focus on the wrong things," declares AI consultant Hamel Husain, who has observed dozens of companies investing weeks into building complex AI systems while being unable to determine if their changes help or hurt. "Teams get caught up in architecture diagrams, frameworks, and dashboards while neglecting the process of actually understanding what's working and what isn't."
After helping more than 30 companies build AI products, Husain has discovered that successful teams barely talk about tools at all, End of Miles reports. Instead, they obsess over measurement and iteration through systematic error analysis, which he calls "consistently the highest-ROI activity" in AI development.
The "tools trap" leading teams astray
The experienced consultant identifies the "tools first" mindset as the most common mistake in AI development, one that creates an illusion of progress while the real problems go unmeasured.
"Generic metrics are worse than useless—they actively impede progress in two ways: First, they create a false sense of measurement and progress. Teams think they're data-driven because they have dashboards, but they're tracking vanity metrics that don't correlate with real user problems." Hamel Husain, AI Consultant
Husain notes that he's seen teams celebrate improving their "helpfulness score" by 10% while their actual users were still struggling with basic tasks. "It's like optimizing your website's load time while your checkout process is broken—you're getting better at the wrong thing," he explains.
What effective error analysis looks like
The AI expert points to apartment-industry startup Nurture Boss as an exemplary case study. Their team built a simple viewer to examine conversations between their AI and users, with space for open-ended notes about failure modes.
After annotating dozens of conversations, clear patterns emerged—their AI was struggling with date handling, failing 66% of the time when users said things like "Let's schedule a tour two weeks from now."
"Instead of reaching for new tools, they looked at actual conversation logs, categorized the types of date-handling failures, built specific tests to catch these issues, and measured improvement on these metrics. The result? Their date handling success rate improved from 33% to 95%." Husain
Bottom-up analysis reveals what really matters
The consultant contrasts two approaches to error analysis. The more common "top-down" method starts with generic metrics like "hallucination" or "toxicity." While convenient, this approach often misses domain-specific issues.
The more effective "bottom-up" approach forces teams to examine actual data and let metrics naturally emerge. At Nurture Boss, the team started with a simple spreadsheet where each row represented a conversation, wrote open-ended notes on undesired behaviors, and then used an LLM to build a taxonomy of common failure modes.
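Once open-ended notes have been mapped to a small taxonomy (the article describes using an LLM for that mapping step), the final tally is simple counting. A minimal sketch, with invented rows and label names for illustration:

```python
# Bottom-up error analysis, last step: count how often each failure mode
# (already assigned to annotated conversations) occurs. Rows are invented.
from collections import Counter

annotations = [
    {"conversation_id": "c1", "failure_modes": ["conversation_flow"]},
    {"conversation_id": "c2", "failure_modes": ["handoff"]},
    {"conversation_id": "c3", "failure_modes": ["rescheduling", "handoff"]},
    {"conversation_id": "c4", "failure_modes": ["conversation_flow"]},
    {"conversation_id": "c5", "failure_modes": []},  # no issues found
]

counts = Counter(
    mode for row in annotations for mode in row["failure_modes"]
)
total = sum(counts.values())
for mode, n in counts.most_common():
    print(f"{mode}: {n} ({n / total:.0%} of labeled failures)")
```

With real data, a tally like this is what reveals that a handful of failure modes dominate, which is exactly the Nurture Boss finding described below.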
The AI specialist found the results striking—just three issues accounted for over 60% of all problems: conversation flow issues, handoff failures, and rescheduling problems. "The impact was immediate. Jacob's team had uncovered so many actionable insights that they needed several weeks just to implement fixes for the problems we'd already found," Husain reports.
Why custom data viewers transform development speed
According to Husain, the single most impactful investment any AI team can make isn't a fancy evaluation dashboard—it's building a customized interface that lets anyone examine what their AI is actually doing.
"Teams with thoughtfully designed data viewers iterate 10x faster than those without them. And here's the thing: These tools can be built in hours using AI-assisted development. The investment is minimal compared to the returns." Hamel Husain
The tech consultant emphasizes that effective data viewers show all context in one place, make feedback trivial to capture, support quick filtering and sorting, and leave room for open-ended notes. "It doesn't matter what web frameworks you use—use whatever you're familiar with. The key is starting somewhere, even if it's simple," he advises.
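In that "start simple" spirit, the core of such a viewer can be sketched without any web framework at all: render a conversation with its full context, append judgments plus open-ended notes to a JSONL file, and filter the results. The field names and file layout here are illustrative assumptions, not a standard.

```python
# Minimal data-viewer sketch: show context, capture feedback, filter results.
# Field names ("id", "user", "turns", "passed", "note") are illustrative.
import json
from pathlib import Path

def render(convo: dict) -> str:
    """Show all context in one place: metadata plus the full transcript."""
    lines = [f"conversation {convo['id']}  (user: {convo['user']})"]
    lines += [f"  {turn['role']}: {turn['text']}" for turn in convo["turns"]]
    return "\n".join(lines)

def annotate(path: Path, convo_id: str, passed: bool, note: str) -> None:
    """Append one judgment per line (JSONL) so capture stays trivial."""
    with path.open("a") as f:
        f.write(json.dumps({"id": convo_id, "passed": passed, "note": note}) + "\n")

def failures(path: Path) -> list[dict]:
    """Quick filtering: pull back only conversations marked as failures."""
    rows = [json.loads(line) for line in path.read_text().splitlines()]
    return [r for r in rows if not r["passed"]]
```

A few hours wrapping functions like these in whatever UI the team already knows is the "minimal investment" Husain describes; the returns come from how fast annotated failures accumulate.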
Having observed patterns across dozens of AI implementations, Husain is emphatic that successful teams aren't the ones with the most sophisticated tools or advanced models. "They're the ones that master the fundamentals of measurement, iteration, and learning," concludes the AI consultant, urging teams to prioritize systematic error analysis before investing in complex technical solutions.