    Evaluation Platform for a Conversational AI Startup

    Happyverse · San Francisco, California · AI / SaaS · 6 weeks

    Built an evaluation playground integrating 30+ AI providers across LLM, TTS, STT, and video avatars with real-time benchmarking and side-by-side comparison.

    Scope of work

    Multi-Provider Integration

    Real-Time Benchmarking

    Voice Cloning Pipeline

    Stack Builder & Presets

    Tech stack

    Next.js

    Python

    FastAPI

    WebSocket

    Docker

    Google Cloud

    Overview

    Sales engineers and product teams pick any combination of LLM, TTS, STT, and video avatar providers, run them side-by-side, and see exactly where each one wins or loses. Voice cloning lets users compare their own voice across providers. Every decision is backed by real-time latency and quality data.

    Challenge

    Happyverse builds lifelike video avatar products for enterprise clients, but it had no systematic way to compare AI providers. Evaluation was ad hoc and subjective, and sales engineers spent hours configuring each demo.

    Approach

    Multi-provider integration layer: Unified abstraction connecting 30+ providers across four categories: LLMs, text-to-speech, speech-to-text, and video avatars. Custom streaming integrations for providers lacking framework support.
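
    A minimal sketch of what such a unified abstraction could look like, assuming a single streaming interface per provider and a registry keyed by category and name. The names StreamingProvider, ProviderRegistry, and EchoLLM are hypothetical and not taken from the Happyverse codebase; EchoLLM is a stand-in that calls no real vendor API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Protocol, Tuple


class StreamingProvider(Protocol):
    """Hypothetical common interface that every provider adapter implements."""

    name: str
    category: str  # assumed to be one of "llm", "tts", "stt", "avatar"

    def stream(self, payload: str) -> Iterator[str]: ...


@dataclass
class EchoLLM:
    """Stand-in adapter for illustration; a real adapter would call a vendor API."""

    name: str = "echo-llm"
    category: str = "llm"

    def stream(self, payload: str) -> Iterator[str]:
        # Yield the input back one word at a time to mimic token streaming.
        for token in payload.split():
            yield token


class ProviderRegistry:
    """Maps (category, name) to a factory so callers never import adapters directly."""

    def __init__(self) -> None:
        self._factories: Dict[Tuple[str, str], Callable[[], StreamingProvider]] = {}

    def register(self, category: str, name: str,
                 factory: Callable[[], StreamingProvider]) -> None:
        self._factories[(category, name)] = factory

    def create(self, category: str, name: str) -> StreamingProvider:
        try:
            return self._factories[(category, name)]()
        except KeyError:
            raise LookupError(f"no {category} provider named {name!r}") from None

    def names(self, category: str) -> List[str]:
        return sorted(n for (c, n) in self._factories if c == category)


registry = ProviderRegistry()
registry.register("llm", "echo-llm", EchoLLM)
tokens = list(registry.create("llm", "echo-llm").stream("hello from the playground"))
```

    With this shape, adding a provider means writing one adapter and one register call; the rest of the platform only ever sees the shared interface.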

    Voice cloning testing: Users clone their own voice and compare results across TTS providers, making trade-offs between emotion/prosody and latency visible and measurable.

    Real-time benchmarking dashboard: Each conversation captures per-component metrics: STT latency, LLM response time, TTS latency, avatar rendering, and end-to-end round-trip. Dashboards show distributions, not just averages.
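
    As an illustration of per-component timing with distribution summaries, the sketch below times each stage with time.perf_counter (a monotonic, high-resolution clock in the Python standard library) and reports percentiles rather than a mean. The TurnMetrics class and the component labels are assumptions made for this example, not the platform's actual code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import quantiles


class TurnMetrics:
    """Collects latency samples in milliseconds, grouped by pipeline component."""

    def __init__(self):
        self.samples_ms = defaultdict(list)

    @contextmanager
    def timed(self, component):
        # Time the wrapped block and store the elapsed time even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples_ms[component].append(elapsed_ms)

    def record(self, component, latency_ms):
        # Accept a latency measured elsewhere, e.g. reported by a provider.
        self.samples_ms[component].append(latency_ms)

    def summary(self, component):
        data = self.samples_ms.get(component, [])
        if len(data) < 2:
            raise ValueError("need at least two samples to compute percentiles")
        # n=100 yields 99 cut points; index 49 is the median, index 94 is p95.
        cuts = quantiles(data, n=100, method="inclusive")
        return {"p50": cuts[49], "p95": cuts[94], "max": max(data)}
```

    Wrapping a stage looks like `with metrics.timed("stt"): ...`, and `metrics.summary("stt")` then returns the median, 95th percentile, and worst case, which exposes tail latency that an average would hide.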

    Configurable stack builder: Sales engineers assemble any provider combination, save presets, and launch live demos in seconds. A/B testing runs in parallel with real-time metric comparison.
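
    One way to picture a saved preset, assuming it is simply a named choice of one provider per category that round-trips through JSON. StackPreset, differing_slots, and the provider names below are hypothetical placeholders for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StackPreset:
    """A named provider combination: one choice per category."""

    name: str
    llm: str
    tts: str
    stt: str
    avatar: str

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "StackPreset":
        return cls(**json.loads(raw))


def differing_slots(a: StackPreset, b: StackPreset) -> dict:
    """Return the categories where two presets differ, as (a_value, b_value) pairs."""
    return {
        slot: (getattr(a, slot), getattr(b, slot))
        for slot in ("llm", "tts", "stt", "avatar")
        if getattr(a, slot) != getattr(b, slot)
    }


# Placeholder provider names; an A/B pair that varies only the TTS provider.
preset_a = StackPreset("demo-a", llm="llm-x", tts="tts-x", stt="stt-x", avatar="avatar-x")
preset_b = StackPreset("demo-b", llm="llm-x", tts="tts-y", stt="stt-x", avatar="avatar-x")
```

    Keeping presets immutable and serializable makes an A/B pair easy to store and reload, and differing_slots shows which component a metric difference between the two runs can be attributed to.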

    Results

    Metric                           | Before                         | After
    Providers integrated             | 5–6 tested individually        | 30+ in unified platform
    Time to configure a new provider | 3–4 hours                      | < 30 min
    Latency measurement              | Subjective ("felt fast")       | Sub-millisecond precision
    Provider evaluation cycle        | ~1 week of ad hoc testing      | Same-day side-by-side comparison
    Voice cloning comparison         | Manual, one provider at a time | Side-by-side across all TTS providers

    "Alex built our evaluation platform from scratch, integrating 30+ AI providers into a single benchmarking tool. He picks up new technologies fast, ships quickly, and regularly flagged things we hadn't thought of yet. I'd work with him again without hesitation."

    Val Avdeenko

    Co-Founder, Happyverse