System Design | March 17, 2024

Distributed Data Architectures

The Fallacy of the Single Source of Truth

Database, Architecture, System Design, Data Systems, Data

There's a belief that runs deep in data culture: if you just build the right warehouse, model the right schema, establish the right governance, you'll arrive at a Single Source of Truth. One authoritative place where the business can go to know what's real.

It's a compelling idea. It's also fiction.

Not because the goal is wrong. Because the architecture required to achieve it in the real world makes it impossible before you even finish the sentence. We've been selling organizations a destination that doesn't exist, and then billing them hourly when they can't find it.

The Hidden Topology

Here's what most "simple" data stacks actually look like under the hood.

A transactional database. A reporting replica of that database. A nightly export to the warehouse. A copy in the BI layer. A third-party SaaS tool with its own copy of customer records. An event stream that may or may not be caught up. A cache somewhere that hasn't been invalidated since last Thursday.

Every one of those is a copy of truth. Every one starts drifting the moment data is written.
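To make the drift concrete, here's a rough sketch of that topology as a graph, where each copy points at the system it is fed from. The system names and refresh lags are made-up assumptions for illustration, not measurements from any real stack. The point is only that staleness adds up along the path back to the primary.

```python
from datetime import timedelta

# Hypothetical topology: copy -> (upstream system, worst-case refresh lag).
# Every name and lag value here is an illustrative assumption.
TOPOLOGY = {
    "reporting_replica": ("transactional_db", timedelta(seconds=30)),
    "warehouse": ("reporting_replica", timedelta(hours=24)),
    "bi_layer": ("warehouse", timedelta(hours=1)),
    "cache": ("transactional_db", timedelta(minutes=15)),
}


def worst_case_staleness(copy: str, topology=TOPOLOGY) -> timedelta:
    """Sum the refresh lags along the path from a copy back to the primary."""
    total = timedelta(0)
    seen = set()
    # Walk upstream until we reach a system with no upstream (the primary).
    while copy in topology:
        if copy in seen:
            raise ValueError(f"cycle detected at {copy!r}")
        seen.add(copy)
        upstream, lag = topology[copy]
        total += lag
        copy = upstream
    return total


if __name__ == "__main__":
    for name in ("cache", "warehouse", "bi_layer"):
        print(name, worst_case_staleness(name))
```

With these assumed numbers, the BI layer can lag the transactional database by a little over a day even when every individual hop is working exactly as designed.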

The SSoT was never a Single Source. It was the primary source in a distributed system that nobody formally named. That's a very different thing. Pretending otherwise is how you end up with four teams confidently citing four different revenue numbers in the same meeting, all of them technically correct, none of them the same number.

I've sat in that meeting. It's not a data quality problem. It's an architectural honesty problem.

Why We Keep Building It Wrong

The fantasy of centralization is seductive because it feels like rigor. One place. One number. One version of events. Governance people love it. Executives love it. It sounds like control.

What it actually is: a naming convention applied to a distributed system and then never revisited. You called it a warehouse. You called it the source of truth. You moved on. Nobody went back to draw what the system actually looked like six months later, after three new integrations got bolted on.

And here's the part that stings: the distribution wasn't an accident. You needed the replica so reporting didn't hammer production. You needed the SaaS tool because the product team picked it. You needed the cache because latency was unacceptable otherwise. Every copy existed for a reason. The problem wasn't the copies. The problem was calling the whole thing a Single Source and then being surprised when it behaved like a distributed system.

The More Useful Question

The goal shouldn't be eliminating distribution. That ship sailed the moment you added a second system. The goal is designing for it deliberately instead of discovering it during an incident at 11pm on a Tuesday.

Stop asking "where is the truth?" Start asking "what are the consistent contracts between my truths?"

Which systems need to be in sync, within what window, and what actually breaks if they aren't? That's the real architectural conversation. Not "how do we centralize everything" but "how do we make the boundaries explicit so we stop lying to ourselves about what we've built."
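One way to make those boundaries explicit is to write the contracts down as data instead of tribal knowledge. The sketch below is a minimal illustration under assumed names: `ConsistencyContract`, `breached`, and the example systems and windows are all hypothetical, not a reference to any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ConsistencyContract:
    """Hypothetical record of one agreed boundary between two systems."""
    upstream: str
    downstream: str
    max_staleness: timedelta      # the window the two must stay within
    impact_if_breached: str       # what actually breaks if they don't


def breached(contracts, last_synced, now):
    """Return the contracts whose downstream copy is older than allowed."""
    out = []
    for c in contracts:
        synced_at = last_synced.get(c.downstream)
        # A copy that has never reported a sync is treated as breaching.
        if synced_at is None or now - synced_at > c.max_staleness:
            out.append(c)
    return out


if __name__ == "__main__":
    contracts = [
        ConsistencyContract("transactional_db", "warehouse",
                            timedelta(hours=24), "daily revenue report is wrong"),
        ConsistencyContract("transactional_db", "cache",
                            timedelta(minutes=15), "users see stale prices"),
    ]
    now = datetime(2024, 3, 17, 12, 0, tzinfo=timezone.utc)
    last_synced = {
        "warehouse": now - timedelta(hours=30),  # missed its nightly export
        "cache": now - timedelta(minutes=5),
    }
    for c in breached(contracts, last_synced, now):
        print(f"{c.upstream} -> {c.downstream}: {c.impact_if_breached}")
```

The code itself is trivial. The useful part is that someone had to fill in `max_staleness` and `impact_if_breached` for every pair, which is the architectural conversation in written form.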

Eventual consistency isn't a failure mode. It's a reality you're either designing around or ignoring. The systems that hold up over time are the ones where someone made those tradeoffs visible instead of papering over them with a marketing term.

What This Actually Changes

Mostly it changes the conversation. Which turns out to matter a lot.

When you stop pretending you have a Single Source of Truth, you start asking better questions: about staleness windows, about which downstream system owns reconciliation, about what "correct" even means in the context of a specific query. Those are the questions that actually improve your architecture.
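As a small illustration of what owning reconciliation can look like, here's a sketch that compares a primary's records against one copy and reports exactly where they disagree. The `reconcile` function and the sample records are hypothetical, and a real job would work on query results rather than in-memory dicts.

```python
def reconcile(primary: dict, copy: dict) -> dict:
    """Compare two keyed record sets and report where the copy has drifted."""
    shared = primary.keys() & copy.keys()
    return {
        # Keys the primary has that never reached the copy.
        "missing": sorted(primary.keys() - copy.keys()),
        # Keys the copy still holds that the primary no longer has.
        "extra": sorted(copy.keys() - primary.keys()),
        # Keys present in both, but with different values.
        "mismatched": sorted(k for k in shared if primary[k] != copy[k]),
    }


if __name__ == "__main__":
    # Illustrative data only: order id -> amount in cents.
    transactional_db = {"ord-1": 1000, "ord-2": 2500, "ord-3": 400}
    warehouse = {"ord-1": 1000, "ord-2": 2000, "ord-4": 999}
    print(reconcile(transactional_db, warehouse))
```

A report like this won't say which side is right. That still depends on which system the contract names as the owner for that field.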

The teams I've seen do this well don't have simpler stacks. They just have honest ones. They know what they built. They know where the seams are. And when something breaks, they're not surprised by the shape of the failure.

The SSoT is a fine north star. As a forcing function it keeps you from proliferating data carelessly. But as a description of reality it's a comfortable lie most orgs are still telling themselves.

TL;DR: Stop calling your data systems a single source of truth and just say, "We've got about 100 sources of truth converged into one point." At least then everyone in the room knows what you actually built.

Distributed Data Architectures - Tyler Fletcher