<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Ahoj Metrics]]></title><description><![CDATA[Ahoj Metrics]]></description><link>https://blog.ahojmetrics.com</link><generator>RSS for Node</generator><lastBuildDate>Thu, 09 Apr 2026 18:07:37 GMT</lastBuildDate><atom:link href="https://blog.ahojmetrics.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Your Lighthouse Score Is Only Half the Story]]></title><description><![CDATA[A Lighthouse score of 95 feels great. Until you check what your actual users experience and find that 40% of them are getting a Poor LCP.
How? Because Lighthouse runs in a controlled environment. Fixe]]></description><link>https://blog.ahojmetrics.com/your-lighthouse-score-is-only-half-the-story</link><guid isPermaLink="true">https://blog.ahojmetrics.com/your-lighthouse-score-is-only-half-the-story</guid><category><![CDATA[Web Perf]]></category><category><![CDATA[Lighthouse]]></category><category><![CDATA[corewebvitals]]></category><category><![CDATA[SEO]]></category><dc:creator><![CDATA[Yuri Tománek]]></dc:creator><pubDate>Sun, 22 Feb 2026 04:21:42 GMT</pubDate><enclosure url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/69905bd4195f3a483587495d/2736385d-a360-4c29-8112-1396c431a576.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A Lighthouse score of 95 feels great. Until you check what your actual users experience and find that 40% of them are getting a Poor LCP.</p>
<p>How? Because Lighthouse runs in a controlled environment. Fixed CPU, fixed network, no browser extensions, cold cache. Your real users are on old Android phones, congested Wi-Fi, with 12 Chrome extensions installed. The test and reality can be very different.</p>
<p>We just shipped <strong>Field Data</strong> in Ahoj Metrics to close that gap. You can now look up real Chrome user experience data for any domain or URL, right alongside your Lighthouse audits.</p>
<h2>What Is Field Data?</h2>
<p>The data comes from Google's Chrome User Experience Report (CrUX). It's an aggregated, anonymised dataset of real performance timings collected from Chrome users who have opted in to sharing usage statistics.</p>
<p>When someone visits your site in Chrome, their browser quietly measures how long things take to load, how quickly the page responds to clicks, and how much the layout shifts around. Google aggregates this data across all opted-in Chrome users and makes it available through the CrUX API.</p>
<p>A few important details about how CrUX works:</p>
<p><strong>28-day rolling window.</strong> The data represents the last 28 days of real user visits. No single bad day can spike the numbers. No single good day can hide persistent problems.</p>
<p><strong>75th percentile (p75).</strong> The reported value isn't the average. It's the experience of someone at the 75th percentile, meaning 75% of your visitors had a better experience than this number, and 25% had a worse one. This is intentional. Google wants you to optimize for the tail, not the middle.</p>
<p><strong>Good / Needs Improvement / Poor distribution.</strong> Every page load gets classified against Google's thresholds. You can see what percentage of your users fall into each bucket. A site might have 80% Good, 12% Needs Improvement, and 8% Poor for LCP. That distribution tells you more than any single number.</p>
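<p>Under the hood, a lookup like this is a single POST to Google's CrUX API. A minimal Ruby sketch (the endpoint and request fields come from the public CrUX API; the API key is a placeholder):</p>

```ruby
require "json"
require "net/http"

CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"

# Build the query body: pass origin: for a whole domain, url: for one page.
def crux_request_body(origin: nil, url: nil, form_factor: nil)
  body = {}
  body[:origin] = origin if origin
  body[:url] = url if url
  body[:formFactor] = form_factor if form_factor  # "PHONE", "DESKTOP", "TABLET"
  body
end

def query_crux(api_key, **target)
  uri = URI("#{CRUX_ENDPOINT}?key=#{api_key}")
  res = Net::HTTP.post(uri, JSON.generate(crux_request_body(**target)),
                       "Content-Type" => "application/json")
  JSON.parse(res.body)  # a 404 means CrUX has no record for this target
end
```

<p>An origin-level query (<code>origin:</code>) aggregates every page on the domain, which is why it's more likely to return data than a query for a single URL.</p>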
<h2>Lab Data vs Field Data</h2>
<p>This is the core concept. Both are useful. Neither is complete on its own.</p>
<p><strong>Lab data (Lighthouse)</strong> tests your site in a controlled environment. Same CPU, same network throttling, same browser config, every time. It's reproducible. It's great for finding issues, comparing before/after a deployment, and running automated tests in CI/CD. But it's synthetic. It doesn't represent any real user.</p>
<p><strong>Field data (CrUX)</strong> measures what your actual visitors experience. Real devices, real networks, real browser configurations. It's messy and variable, but it's the truth. It's also what Google uses for Core Web Vitals in Search ranking.</p>
<p>Here's where it gets interesting: these two numbers can disagree significantly.</p>
<p>A site might score 68 on Lighthouse (worrying) but show 85% Good LCP in CrUX (fine in practice). Why? Maybe most of your users are on fast connections with warm caches, so the real experience is better than what the lab predicts.</p>
<p>Or the reverse: a Lighthouse score of 92 (looks great) but only 55% Good LCP in CrUX (a real problem). Maybe your audience skews toward mobile users in regions with slower connectivity, and the lab test doesn't capture that.</p>
<p>Neither number is "right." Lab data tells you what's wrong. Field data tells you the impact. You need both to make good decisions about where to spend your optimization time.</p>
<h2>The Five Metrics</h2>
<p>Field Data in Ahoj Metrics shows five metrics:</p>
<p><strong>LCP (Largest Contentful Paint)</strong> measures how quickly the main content loads. This is usually the hero image, a large heading, or a video thumbnail. Google considers under 2.5 seconds "Good."</p>
<p><strong>INP (Interaction to Next Paint)</strong> measures how responsive the page is to user input. When someone taps a button or clicks a link, how long before something visibly happens? Under 200ms is "Good." INP replaced FID (First Input Delay) as a Core Web Vital in 2024.</p>
<p><strong>CLS (Cumulative Layout Shift)</strong> measures how much the layout jumps around while loading. You know when you're about to tap a button and an ad loads above it, pushing everything down? That's layout shift. Under 0.10 is "Good."</p>
<p><strong>FCP (First Contentful Paint)</strong> measures how quickly the first piece of content appears. Not the main content (that's LCP), just anything: text, an image, the background color. Under 1.8 seconds is "Good."</p>
<p><strong>TTFB (Time to First Byte)</strong> measures how quickly the server responds to the browser's request. Under 800ms is "Good."</p>
<p>LCP, INP, and CLS are Google's three Core Web Vitals. These are the metrics that directly feed into Google's search ranking signals. If you can only focus on three things, focus on these.</p>
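<p>Those thresholds fit in a small lookup table. A sketch of the classification logic (the "Good" bounds are the ones listed above; the "Poor" boundaries are Google's published values: 4s LCP, 500ms INP, 0.25 CLS, 3s FCP, 1800ms TTFB):</p>

```ruby
# Upper bounds for "Good" and "Needs Improvement" per metric.
# Time-based metrics are in milliseconds; CLS is unitless.
THRESHOLDS = {
  lcp:  [2500, 4000],
  inp:  [200,  500],
  cls:  [0.1,  0.25],
  fcp:  [1800, 3000],
  ttfb: [800,  1800]
}.freeze

def rate(metric, p75)
  good, needs_improvement = THRESHOLDS.fetch(metric)
  if p75 <= good
    "Good"
  elsif p75 <= needs_improvement
    "Needs Improvement"
  else
    "Poor"
  end
end
```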
<h2>How to Use It</h2>
<p>Go to <strong>Field Data</strong> in the Ahoj Metrics sidebar. Enter any domain (like <code>https://stripe.com</code>) or a specific URL. Hit <strong>Look Up Field Data</strong>.</p>
<p>You'll see the p75 value and the Good/Needs Improvement/Poor distribution for all five metrics. Instant results, no audit credits used.</p>
<p>A few things to know:</p>
<p><strong>It works for any public site.</strong> You can look up your competitors, your clients, or any site you're curious about. The data is public.</p>
<p><strong>Not every URL has data.</strong> CrUX needs a meaningful amount of Chrome traffic to generate a record. If you look up an internal tool, a brand new site, or a low-traffic page, Google won't have data for it. You'll see a clear message when that happens. Origin-level lookups (the whole domain) are more likely to have data than individual URLs.</p>
<p><strong>It's available to all users.</strong> Free tier, paid plans, everyone. Field Data lookups don't count against your audit quota. The CrUX API is free from Google, and we saw no reason to gate it.</p>
<h2>How This Changes Your Workflow</h2>
<p>Before, an Ahoj Metrics workflow looked like this:</p>
<ol>
<li><p>Run Lighthouse audit from multiple regions</p>
</li>
<li><p>See scores and recommendations</p>
</li>
<li><p>Fix issues</p>
</li>
<li><p>Run another audit to verify</p>
</li>
</ol>
<p>Now it looks like this:</p>
<ol>
<li><p>Check Field Data for a baseline of what real users experience</p>
</li>
<li><p>Run Lighthouse audit from multiple regions to find specific issues</p>
</li>
<li><p>Fix issues</p>
</li>
<li><p>Run another audit to verify the fix</p>
</li>
<li><p>Wait for field data to update (28-day rolling window) to confirm the real-world impact</p>
</li>
</ol>
<p>Field data gives you the "why" behind your optimization work. You're not fixing things because a synthetic test says so. You're fixing things because 30% of your real users are getting a Poor LCP.</p>
<h2>Why Not Just Use PageSpeed Insights?</h2>
<p>Google's PageSpeed Insights already shows CrUX data. It's free and it works. So why look at it in Ahoj Metrics?</p>
<p>Context. In PSI, field data lives on Google's website, separate from everything else. You look up a URL, see the numbers, close the tab. In Ahoj Metrics, field data lives next to your Lighthouse audits, your monitors, and your historical data. You can see how your lab scores compare to real-world experience for the same site, in the same tool, without switching between tabs.</p>
<p>PSI also doesn't save history, doesn't compare across sites, and doesn't integrate into a monitoring workflow. It's a snapshot tool. Ahoj Metrics is trying to be the place where all your performance data lives together.</p>
<h2>Technical Details</h2>
<p>For anyone curious about the implementation:</p>
<p>We built a thin Ruby wrapper around the CrUX API (<code>ahojmetrics/crux-api</code>). Results are cached server-side for 12 hours using Solid Cache (PostgreSQL-backed, same as the rest of our infrastructure). Repeat lookups for the same URL are instant.</p>
<p>The API response from Google is verbose. Metric names are long (<code>largest_contentful_paint</code>), CLS comes back as a string float, and the structure is nested. Our serializer normalizes everything into a clean JSON shape with short keys (<code>lcp</code>, <code>inp</code>, <code>cls</code>) that the frontend can work with easily.</p>
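<p>The normalization itself is straightforward. A simplified sketch, not our exact serializer (the long names are the CrUX API's metric keys, including the <code>experimental_</code> prefix the API uses for TTFB):</p>

```ruby
METRIC_KEYS = {
  "largest_contentful_paint"        => :lcp,
  "interaction_to_next_paint"       => :inp,
  "cumulative_layout_shift"         => :cls,
  "first_contentful_paint"          => :fcp,
  "experimental_time_to_first_byte" => :ttfb
}.freeze

def normalize(metrics)
  metrics.each_with_object({}) do |(name, data), out|
    key = METRIC_KEYS[name] or next  # skip metrics we don't display
    p75 = data.dig("percentiles", "p75")
    out[key] = {
      p75: p75.is_a?(String) ? Float(p75) : p75,  # CLS arrives as a string float
      histogram: data["histogram"]&.map { |bin| bin["density"] }
    }
  end
end
```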
<p>Authentication is the same as every other Ahoj endpoint. Standard JWT/session auth, no separate API key needed.</p>
<h2>What's Next</h2>
<p>Field Data is a lookup tool today. You search for a URL and see the current CrUX data. We're thinking about:</p>
<ul>
<li><p><strong>Historical field data tracking.</strong> Store CrUX snapshots over time so you can see trends, not just the current 28-day window.</p>
</li>
<li><p><strong>Field data alongside monitors.</strong> When your automated Lighthouse monitor runs, also pull the CrUX data for that URL and display them together.</p>
</li>
<li><p><strong>Field vs lab comparison view.</strong> A side-by-side showing your Lighthouse lab metrics and CrUX field metrics for the same URL, highlighting where they agree and where they diverge.</p>
</li>
</ul>
<p>If any of those would be particularly useful to you, I'd love to hear about it.</p>
<h2>Try It</h2>
<p>Sign in to <a href="https://ahojmetrics.com">Ahoj Metrics</a> and go to Field Data in the sidebar. Look up your own site, look up your competitors, look up anything. No credits used, no limits.</p>
<p>If you don't have an account, the free tier gives you 20 Lighthouse audits per month plus unlimited Field Data lookups.</p>
<hr />
<p><em>Ahoj Metrics is a performance monitoring tool that runs Lighthouse audits from 18 global regions and now shows real Chrome user experience data via CrUX. Built with Rails 8, Solid Queue, and Fly.io.</em></p>
]]></content:encoded></item><item><title><![CDATA[How We Run Lighthouse from 18 Regions in Under 2 Minutes]]></title><description><![CDATA[Most performance monitoring tools test your site from one location, or run tests sequentially across regions. That means testing from 18 locations can take 20+ minutes.
We needed something faster. Ahoj Metrics tests from 18 global regions simultaneou...]]></description><link>https://blog.ahojmetrics.com/how-we-run-lighthouse-from-18-regions-in-under-2-minutes</link><guid isPermaLink="true">https://blog.ahojmetrics.com/how-we-run-lighthouse-from-18-regions-in-under-2-minutes</guid><category><![CDATA[Rails]]></category><category><![CDATA[fly.io]]></category><category><![CDATA[Lighthouse]]></category><category><![CDATA[Web Perf]]></category><category><![CDATA[infrastructure]]></category><dc:creator><![CDATA[Yuri Tománek]]></dc:creator><pubDate>Sat, 14 Feb 2026 11:58:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/V4DKsVkXIcs/upload/3aca5101e7619b19bdb9eba533736d1c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most performance monitoring tools test your site from one location, or run tests sequentially across regions. That means testing from 18 locations can take 20+ minutes.</p>
<p>We needed something faster. Ahoj Metrics tests from 18 global regions simultaneously in about 2 minutes. Here's how.</p>
<h2 id="heading-the-architecture">The Architecture</h2>
<p>The core idea is simple: don't keep workers running. Spawn them on demand, run the test, destroy them.</p>
<p>We use <a target="_blank" href="https://fly.io/docs/machines/">Fly.io's Machines API</a> to create ephemeral containers in specific regions. Each container runs a single Lighthouse audit, sends the results back via webhook, and destroys itself.</p>
<p>Here's how a request flows through the system:</p>
<pre><code class="lang-mermaid">sequenceDiagram
    actor User
    participant App as Rails App
    participant DB as PostgreSQL
    participant SQ as Solid Queue
    participant Fly as Fly API
    participant W1 as Worker (Sydney)
    participant W2 as Worker (London)
    participant W3 as Worker (Tokyo)

    User-&gt;&gt;App: Run Audit (3 regions)
    App-&gt;&gt;DB: Create ReportRequest
    App-&gt;&gt;DB: Create 3 Report records

    App-&gt;&gt;SQ: Enqueue spawn jobs

    par Spawn workers simultaneously
        SQ-&gt;&gt;Fly: POST /machines (Sydney)
        Fly-&gt;&gt;W1: Boot container
        SQ-&gt;&gt;Fly: POST /machines (London)
        Fly-&gt;&gt;W2: Boot container
        SQ-&gt;&gt;Fly: POST /machines (Tokyo)
        Fly-&gt;&gt;W3: Boot container
    end

    par Run Lighthouse in parallel
        W1-&gt;&gt;W1: Run Lighthouse (~2 min)
        W2-&gt;&gt;W2: Run Lighthouse (~2 min)
        W3-&gt;&gt;W3: Run Lighthouse (~2 min)
    end

    par Report results back
        W1-&gt;&gt;App: POST results (webhook)
        Note right of W1: Auto-destroys
        W2-&gt;&gt;App: POST results (webhook)
        Note right of W2: Auto-destroys
        W3-&gt;&gt;App: POST results (webhook)
        Note right of W3: Auto-destroys
    end

    App-&gt;&gt;DB: Update Reports
    App-&gt;&gt;DB: Aggregate stats on ReportRequest
    App-&gt;&gt;User: Dashboard updated
</code></pre>
<p>The key design decision: <strong>one audit = one ReportRequest</strong>, regardless of how many regions you test. Test from 1 region or 18 - it's the same user action.</p>
<h2 id="heading-spawning-machines-with-the-flyio-api">Spawning Machines with the Fly.io API</h2>
<p>Here's the actual code that creates a machine in a specific region:</p>
<pre><code class="lang-ruby"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">FlyMachinesService</span></span>
  API_BASE_URL = <span class="hljs-string">"https://api.machines.dev/v1"</span>

  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">self</span>.<span class="hljs-title">create_machine</span><span class="hljs-params">(<span class="hljs-symbol">region:</span>, <span class="hljs-symbol">env:</span>, <span class="hljs-symbol">app_name:</span>)</span></span>
    url = <span class="hljs-string">"<span class="hljs-subst">#{API_BASE_URL}</span>/apps/<span class="hljs-subst">#{app_name}</span>/machines"</span>

    body = {
      <span class="hljs-symbol">region:</span> region,
      <span class="hljs-symbol">config:</span> {
        <span class="hljs-symbol">image:</span> ENV.fetch(<span class="hljs-string">"WORKER_IMAGE"</span>, <span class="hljs-string">"registry.fly.io/am-worker:latest"</span>),
        <span class="hljs-symbol">size:</span> <span class="hljs-string">"performance-8x"</span>,
        <span class="hljs-symbol">auto_destroy:</span> <span class="hljs-literal">true</span>,
        <span class="hljs-symbol">restart:</span> { <span class="hljs-symbol">policy:</span> <span class="hljs-string">"no"</span> },
        <span class="hljs-symbol">stop_config:</span> {
          <span class="hljs-symbol">timeout:</span> <span class="hljs-string">"30s"</span>,
          <span class="hljs-symbol">signal:</span> <span class="hljs-string">"SIGTERM"</span>
        },
        <span class="hljs-symbol">env:</span> env,
        <span class="hljs-symbol">services:</span> []
      }
    }

    response = HTTParty.post(
      url,
      <span class="hljs-symbol">headers:</span> headers,
      <span class="hljs-symbol">body:</span> body.to_json,
      <span class="hljs-symbol">timeout:</span> <span class="hljs-number">30</span>
    )

    <span class="hljs-keyword">if</span> response.success?
      Response.new(<span class="hljs-symbol">success:</span> <span class="hljs-literal">true</span>, <span class="hljs-symbol">data:</span> response.parsed_response)
    <span class="hljs-keyword">else</span>
      Response.new(
        <span class="hljs-symbol">success:</span> <span class="hljs-literal">false</span>,
        <span class="hljs-symbol">error:</span> <span class="hljs-string">"API error: <span class="hljs-subst">#{response.code}</span> - <span class="hljs-subst">#{response.body}</span>"</span>
      )
    <span class="hljs-keyword">end</span>
  <span class="hljs-keyword">end</span>
<span class="hljs-keyword">end</span>
</code></pre>
<p>A few things worth noting:</p>
<p><strong><code>auto_destroy: true</code></strong> is the magic. The machine cleans itself up after the process exits. No lingering containers, no zombie workers, no cleanup cron jobs.</p>
<p><strong><code>performance-8x</code></strong> gives us 8 dedicated vCPUs and 16GB RAM. Lighthouse is resource-hungry - it runs a full Chrome instance. Underpowered machines produce inconsistent scores because Chrome competes for CPU time. We tried smaller sizes and the variance was too high.</p>
<p><strong><code>restart: { policy: "no" }</code></strong> means if Lighthouse crashes, the machine just dies. We handle the failure on the Rails side by checking for timed-out reports.</p>
<p><strong><code>services: []</code></strong> means no public ports. The worker doesn't need to accept incoming traffic. It runs Lighthouse and POSTs results back to our API. That's it.</p>
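<p>The service above leans on two helpers we didn't show: the auth <code>headers</code> method and the <code>Response</code> value object. They're roughly this (a sketch; <code>FLY_API_TOKEN</code> is a placeholder env var name):</p>

```ruby
class FlyMachinesService
  # Minimal success/error wrapper returned by service methods.
  Response = Struct.new(:success, :data, :error, keyword_init: true) do
    def success?
      success
    end
  end

  def self.headers
    {
      "Authorization" => "Bearer #{ENV.fetch('FLY_API_TOKEN')}",  # placeholder name
      "Content-Type"  => "application/json"
    }
  end
end
```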
<h2 id="heading-the-worker">The Worker</h2>
<p>Each Fly.io machine runs a Docker container that does roughly this:</p>
<ol>
<li>Read environment variables (target URL, callback URL, report ID)</li>
<li>Launch headless Chrome</li>
<li>Run Lighthouse audit</li>
<li>POST the JSON results back to the Rails API</li>
<li>Exit (machine auto-destroys)</li>
</ol>
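<p>Those five steps are thin enough to sketch. This is a simplified Ruby rendering of the flow, not our actual worker code (the env var names are illustrative):</p>

```ruby
require "json"
require "net/http"
require "open3"

# Build the Lighthouse CLI invocation for a target URL.
def lighthouse_cmd(target_url)
  ["lighthouse", target_url, "--output=json", "--quiet",
   "--chrome-flags=--headless --no-sandbox"]
end

# Shape of the webhook payload POSTed back to the Rails API.
def callback_payload(report_id, success, raw_json)
  { report_id: report_id,
    status: success ? "completed" : "failed",
    results: success ? JSON.parse(raw_json) : nil }
end

def run_worker!
  out, status = Open3.capture2(*lighthouse_cmd(ENV.fetch("TARGET_URL")))
  payload = callback_payload(ENV.fetch("REPORT_ID"), status.success?, out)
  Net::HTTP.post(URI(ENV.fetch("CALLBACK_URL")), JSON.generate(payload),
                 "Content-Type" => "application/json")
  # Process exits here; auto_destroy tears the machine down.
end
```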
<p>The callback is a simple webhook. The worker doesn't need to know anything about our database, user accounts, or billing. It just runs a test and reports back.</p>
<h2 id="heading-handling-results">Handling Results</h2>
<p>On the Rails side, each Report record tracks its own status:</p>
<pre><code class="lang-ruby"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ReportRequest</span> &lt; ApplicationRecord</span>
  has_many <span class="hljs-symbol">:reports</span>

  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_completion!</span></span>
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">unless</span> reports.all?(&amp;<span class="hljs-symbol">:completed?</span>)

    update!(<span class="hljs-symbol">status:</span> <span class="hljs-string">"completed"</span>)
    update_cached_stats!
    check_monitor_alert <span class="hljs-keyword">if</span> site_monitor.present?
  <span class="hljs-keyword">end</span>
<span class="hljs-keyword">end</span>
</code></pre>
<p>When a worker POSTs results, the corresponding Report is updated. After each update, we check if all reports for the request are done. If so, we aggregate the results, calculate averages, and update the dashboard.</p>
<p>Each report is independent. If the Sydney worker fails but the other 17 succeed, you still get 17 results. The failed region shows as an error without blocking everything else.</p>
<h2 id="heading-cost-math">Cost Math</h2>
<p>This is the part that makes ephemeral workers compelling. Compare two approaches:</p>
<p><strong>Persistent workers (18 regions, always-on):</strong></p>
<ul>
<li>18 performance-8x machines running 24/7</li>
<li>Based on Fly.io's pricing calculator: ~$2,734/month</li>
<li>Mostly sitting idle waiting for audit requests</li>
</ul>
<p><strong>Ephemeral workers (our approach):</strong></p>
<ul>
<li>Machines run for ~2 minutes per audit</li>
<li>performance-8x costs roughly $0.0001344/second</li>
<li>One 18-region audit costs about $0.29</li>
<li>100 audits/month = ~$29</li>
</ul>
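<p>The arithmetic, for anyone checking:</p>

```ruby
RATE_PER_SECOND = 0.0001344   # performance-8x, USD
AUDIT_SECONDS   = 120         # ~2 minutes of runtime per machine

per_machine = RATE_PER_SECOND * AUDIT_SECONDS   # ~$0.016
per_audit   = (per_machine * 18).round(2)       # 18 regions
monthly     = (per_audit * 100).round           # 100 audits/month
```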
<p>At low volume, ephemeral is dramatically cheaper. The crossover point where persistent workers become more cost-effective is well beyond our current scale.</p>
<p>The tradeoff is cold start time. Each machine takes a few seconds to boot. For our use case (users expect a 1-2 minute wait anyway), that's invisible.</p>
<h2 id="heading-the-background-job-layer">The Background Job Layer</h2>
<p>We use Solid Queue (Rails 8's built-in job backend) for everything. No Redis, no Sidekiq.</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># config/recurring.yml</span>
<span class="hljs-attr">production:</span>
  <span class="hljs-attr">monitor_scheduler:</span>
    <span class="hljs-attr">class:</span> <span class="hljs-string">MonitorSchedulerJob</span>
    <span class="hljs-attr">queue:</span> <span class="hljs-string">default</span>
    <span class="hljs-attr">schedule:</span> <span class="hljs-string">every</span> <span class="hljs-string">minute</span>
</code></pre>
<p>The MonitorSchedulerJob runs every minute, checks which monitors are due for testing, and kicks off the Fly.io machine spawning. Monitor runs are background operations - they don't count toward the user's audit quota.</p>
<p>This keeps the architecture simple. One PostgreSQL database handles the queue (via Solid Queue), the application data, and the cache. No Redis to manage, no separate queue infrastructure to monitor.</p>
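<p>The "is this monitor due?" check is ordinary time math. A sketch of the core predicate (the argument names mirror hypothetical columns on our monitors table):</p>

```ruby
# A monitor is due when it has never run, or when its interval has elapsed.
def monitor_due?(last_run_at, interval_minutes, now: Time.now)
  return true if last_run_at.nil?
  now - last_run_at >= interval_minutes * 60
end
```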
<h2 id="heading-what-we-learned">What We Learned</h2>
<p><strong>Lighthouse needs consistent resources.</strong> When we first used shared-cpu machines, scores would vary by 15-20 points between runs of the same URL. Bumping to performance-8x brought variance down to 2-3 points. The extra cost per audit is worth the consistency.</p>
<p><strong>Timeouts need multiple layers.</strong> We set timeouts at the HTTP level (30s for API calls), the machine level (stop_config timeout), and the application level (mark reports as failed after 5 minutes). Belt and suspenders.</p>
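<p>The application-level layer reduces to a predicate that a recurring sweep job applies to in-flight reports, roughly (a sketch; the status string is illustrative):</p>

```ruby
STALE_AFTER = 5 * 60  # seconds before an in-flight report is considered lost

# True when a report is still waiting on its webhook past the deadline.
def timed_out?(status, created_at, now: Time.now)
  status == "running" && now - created_at > STALE_AFTER
end
```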
<p><strong>Region availability isn't guaranteed.</strong> Sometimes a Fly.io region is temporarily unavailable. We handle this gracefully - the report for that region shows an error, but the rest of the audit completes normally.</p>
<p><strong>Webhook delivery can fail.</strong> If our API is temporarily unreachable when the worker finishes, we lose the result. We're adding a retry mechanism and considering having workers write results to object storage as a fallback.</p>
<h2 id="heading-the-numbers">The Numbers</h2>
<p>After running this in production since January 2026:</p>
<ul>
<li>Average audit time: ~2 minutes (single region or all 18)</li>
<li>P95 audit time: ~3 minutes</li>
<li>Machine boot time: 3-8 seconds depending on region</li>
<li>Success rate: ~97% (3% are timeouts or region availability issues)</li>
<li>Cost per audit: $0.01-0.29 depending on regions selected</li>
</ul>
<h2 id="heading-try-it">Try It</h2>
<p>You can test this yourself at <a target="_blank" href="https://ahojmetrics.com">ahojmetrics.com</a>. Free tier gives you 20 audits/month - enough to see how your site performs from Sydney, Tokyo, São Paulo, London, and more.</p>
<p>If you have questions about the architecture, ask in the comments. Happy to go deeper on any part of this.</p>
<hr />
<p><em>Built with Rails 8.1, Solid Queue, Fly.io Machines API, and PostgreSQL. Frontend is React + TypeScript on Cloudflare Pages.</em></p>
]]></content:encoded></item></channel></rss>