Case Study

When 'Safe for Your Dog' Beats 'pH 7.2': Designing AI for Real Users

Alexander Snyder · 6 min

The data pipeline that powers CheckYourTap has been essentially the same since the second iteration. It takes in real-time water quality readings, normalizes them by ZIP code, and runs anomaly detection against historical baselines.
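To make "anomaly detection against historical baselines" concrete, here is a minimal sketch using a z-score check. It is an illustration, not the CheckYourTap implementation: the function name, the threshold of 3.0, and the sample ZIP code and values are all assumptions made for the example.

```python
from statistics import mean, stdev

def is_anomalous(reading: float, baseline: list[float], z_threshold: float = 3.0) -> bool:
    """Return True when a reading sits far from the historical baseline.

    The z-score approach and the default threshold are assumptions for
    illustration; a production system may use a different method.
    """
    if len(baseline) < 2:
        return False  # not enough history to judge
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return reading != mu  # flat history: any change is unusual
    return abs(reading - mu) / sigma > z_threshold

# Hypothetical pH history, keyed by a made-up ZIP code.
ph_baselines = {"00001": [7.0, 7.1, 7.2, 7.1, 7.0, 7.2]}

normal = is_anomalous(7.1, ph_baselines["00001"])
spike = is_anomalous(9.0, ph_baselines["00001"])
```

With this sample history, a reading of 7.1 is not flagged and a reading of 9.0 is.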

The first four product iterations failed. Not because the data was wrong. Because the product was answering the wrong question.

The question we were actually answering

The question we were answering through the first four iterations: "What does the water quality data show?"

The question users were actually asking: "Is my water safe?"

These seem like they should produce the same answer. They don't.

"What does the water quality data show?" produces pH measurements, turbidity readings, conductivity levels, bacterial indicators, and trend lines. Accurate, technically impressive, and completely meaningless to someone standing at their kitchen tap.

"Is my water safe?" requires knowing what safe means to that specific person, in their specific situation, with their specific concerns. For a new parent, it means: is this water safe for formula? For someone with a dog, it means: can my dog drink this? For someone on a restricted diet, it means: does this water interact with anything I'm managing medically?

The data pipeline couldn't answer "is my water safe?" without knowing what "safe" meant to each person. We had to build that layer.
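One way to picture that layer is a lookup from a household concern to the ranges that count as "safe" for it. The sketch below is hypothetical: the `Concern` values, the `Reading` fields, and especially the numeric limits are placeholders chosen for illustration, not health guidance and not the thresholds CheckYourTap uses.

```python
from dataclasses import dataclass
from enum import Enum

class Concern(Enum):
    INFANT_FORMULA = "infant formula"
    PETS = "pets"
    GENERAL = "general household use"

@dataclass(frozen=True)
class Reading:
    ph: float
    turbidity_ntu: float
    bacteria_detected: bool

# Placeholder limits for illustration only. These are NOT health guidance.
LIMITS = {
    Concern.INFANT_FORMULA: {"ph": (6.5, 8.5), "max_turbidity_ntu": 0.5},
    Concern.PETS: {"ph": (6.0, 9.0), "max_turbidity_ntu": 1.0},
    Concern.GENERAL: {"ph": (6.5, 8.5), "max_turbidity_ntu": 1.0},
}

def answer(reading: Reading, concern: Concern) -> str:
    """Turn raw measurements into an answer framed around one concern."""
    limits = LIMITS[concern]
    low, high = limits["ph"]
    within_range = (
        not reading.bacteria_detected
        and low <= reading.ph <= high
        and reading.turbidity_ntu <= limits["max_turbidity_ntu"]
    )
    if within_range:
        return f"Water quality in your area is within safe ranges for {concern.value}."
    return f"Current readings in your area are outside the ranges we use for {concern.value}."

reading = Reading(ph=7.2, turbidity_ntu=0.8, bacteria_detected=False)
pet_answer = answer(reading, Concern.PETS)
formula_answer = answer(reading, Concern.INFANT_FORMULA)
```

The same reading produces two different answers here, because the placeholder turbidity limit for infant formula is stricter than the one for pets. That is the point of the layer: the measurement does not change, the question does.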

The five iterations

Iteration 1: Infrastructure intelligence. We built a leak detection product that used operational data to identify potential water main issues before they became emergencies. Technically elegant. Users don't think about water mains. They think about the water that comes out of their tap. Zero emotional resonance.

Iteration 2: Contamination alerts. We pivoted to real-time contamination detection: bacteria, chemicals, regulated compounds. The framing was: we'll tell you when there's a problem before you find out through official channels.

The problem: telling someone there's potential bacterial contamination in their water supply produces panic, not engagement. The correct emotional response to "your water may be contaminated" is to stop drinking the water immediately. That's not a relationship you can build a product around.

Iteration 3: Lab testing. We positioned the product as a health checkup for water: proactive testing on a schedule, like getting a physical. We added softer language, cleaner design, less alarming terminology.

The problem was speed. Lab results take days. By the time results arrived, the curiosity that prompted someone to request a test had completely dissipated. Engagement rates were near zero.

Iteration 4: Real-time dashboard. We removed the lab testing entirely and built a clean dashboard showing real-time measurements. pH, turbidity, conductivity, updated continuously, displayed clearly.

Beautiful data. Still meaningless to anyone who didn't know what the numbers meant. A pH of 7.2 means something to someone with a chemistry background and nothing to everyone else.

Iteration 5: Personalized context. The version that worked combined real-time data with contextual translation:

  • "Water quality in your area is within safe ranges for pets and infants." (Not raw measurements.)
  • "Seasonal runoff in your region typically affects turbidity during May-June. Here's what that means for your household." (Not just a turbidity number.)
  • "Recent measurements show no bacterial indicators above action thresholds." (Not the indicator values themselves.)

The system now processes 50,000+ calls a month.

What changed and what didn't

The data pipeline is essentially the same one we have run since the second iteration. The measurement sources are the same. The update frequency is the same. The anomaly detection logic evolved, but the fundamental architecture is unchanged.

What changed: the translation layer. The system no longer outputs measurements. It outputs meaning.

The translation is personalized by ZIP code, which allows the system to provide context specific to the regional water source, seasonal patterns, and local treatment characteristics. Someone in a rural area with a private well gets different context than someone in an urban area on a municipal system. Same data infrastructure, different output.
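A rough sketch of ZIP-level context might look like the following. Everything in it is assumed for illustration: the `RegionProfile` fields, the made-up ZIP codes, and the wording of the notes. The May and June runoff months come from the example message earlier in this post.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class RegionProfile:
    water_source: str      # e.g. "municipal system" or "private well"
    runoff_months: tuple   # calendar months (1-12) with seasonal runoff

# Hypothetical profiles keyed by made-up ZIP codes.
REGION_PROFILES = {
    "00001": RegionProfile("municipal system", (5, 6)),
    "00002": RegionProfile("private well", ()),
}

def regional_notes(zip_code: str, today: date) -> list[str]:
    """Return context notes for a ZIP code, or an empty list if it is unknown."""
    profile = REGION_PROFILES.get(zip_code)
    if profile is None:
        return []
    notes = [f"Water source on file for your ZIP code: {profile.water_source}."]
    if today.month in profile.runoff_months:
        notes.append(
            "Seasonal runoff in your region typically affects turbidity at this time of year."
        )
    return notes

may_notes = regional_notes("00001", date(2024, 5, 15))
january_notes = regional_notes("00001", date(2024, 1, 15))
well_notes = regional_notes("00002", date(2024, 5, 15))
```

The readings feeding the system are untouched. Only the notes attached to them vary with region and season.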

The design principle

Every AI system produces some output: a recommendation, a measurement, a prediction, an alert. The assumption most builders make is that output quality is determined by the quality of the underlying model or data.

It's not. Output quality is determined by whether the output is interpretable and actionable by the specific person receiving it.

A pH reading of 7.2 is highly accurate data that most people cannot interpret. "Your water quality is within normal parameters" carries less information but gives people a usable mental model.

The design principle: identify the decision or action the user needs to take, then design the output to support that decision directly. Don't output data that requires the user to apply expertise they don't have.

This sounds obvious. It's consistently the last thing builders think about, because builders have the expertise that users don't. When you understand pH, outputting pH values seems natural. When you don't, and you just want to know whether the water is safe to give your dog, the pH value is noise.

The empathy problem in AI product design

The harder version of this challenge: users often can't tell you what they actually need.

If you ask someone what they want from a water quality product, they'll say: accurate data, real-time updates, easy to use. These are inputs. What they actually need is a clear answer to a specific safety question that maps to their household situation, delivered in terms they don't need to translate.

Getting from "accurate data" to "is this safe for my dog?" requires not just asking users what they want, but observing how they actually behave. Users who requested a test and never opened the results were not quietly satisfied with the product. They were demonstrating that the product had failed to create urgency. The iteration that produced 50,000+ calls a month created urgency because it made the safety question concrete and personal.
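The behavioral signal described here can be measured with very little machinery. The sketch below assumes a hypothetical event log with `user_id` and `type` fields and two made-up event names. It is one simple way to compute how many users who requested a test went on to open their results.

```python
def results_open_rate(events: list[dict]) -> float:
    """Share of users who requested a test and later opened the results.

    The event names "test_requested" and "results_opened" are assumptions
    for this example, not names from a real analytics schema.
    """
    requested = {e["user_id"] for e in events if e["type"] == "test_requested"}
    opened = {e["user_id"] for e in events if e["type"] == "results_opened"}
    if not requested:
        return 0.0
    return len(requested & opened) / len(requested)

# Made-up sample data: four users request a test, one opens the results.
sample_events = [
    {"user_id": "u1", "type": "test_requested"},
    {"user_id": "u2", "type": "test_requested"},
    {"user_id": "u3", "type": "test_requested"},
    {"user_id": "u4", "type": "test_requested"},
    {"user_id": "u1", "type": "results_opened"},
]

rate = results_open_rate(sample_events)
```

A number like this says nothing about why users disengaged, but it shows the gap between what people ask for and what they do with it.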

You find the right framing by building the wrong ones, watching what happens, and pivoting until users actually engage with the product the way you need them to. Five iterations is not unusual. It's the number it takes for some problems.


PurviewX builds AI products that work for real users, not just technically.