What 'Privacy-First' Actually Means in Your Analytics Stack

The phrase is everywhere, but most tools using it still store identifiers, fingerprints, or hashed emails. Here is what a technically sound privacy-first data model looks like.

"Privacy-first" has become a marketing phrase. Tools that use it range from those that genuinely store no personal data to those that simply moved their cookies to server-side storage and called themselves compliant. To evaluate the claim, you need to look at three specific things: what is collected, how it is stored, and whether it can be reversed.

Level 1: What gets collected

Every analytics tool collects something. The question is whether any of it qualifies as personal data under GDPR, LGPD, or CCPA. The following are generally safe — they are aggregate or non-identifiable on their own:

  • Page URL and path
  • Referrer domain (not the full URL)
  • Country (inferred from the IP address at the edge; the IP itself is never stored)
  • Device type (mobile or desktop, inferred from screen width and broad User-Agent patterns)
  • Browser family (Chrome, Safari, Firefox, and other coarse families derived server-side from the request User-Agent)

What makes these safe is that none of them, individually or combined, reliably identifies a specific person. A referrer domain tells you that someone came from Hacker News — not which user of Hacker News they are.
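As a rough illustration of the list above, the following Python sketch reduces a raw request to coarse fields before anything is written to storage. The function names, the 768-pixel cutoff, and the User-Agent token list are assumptions made for this example, not Monoid's actual implementation.

```python
from urllib.parse import urlsplit


def referrer_domain(referrer: str) -> str:
    """Keep only the host of the referrer; drop path, query, and fragment."""
    if not referrer:
        return ""
    return (urlsplit(referrer).hostname or "").lower()


def device_type(screen_width: int, user_agent: str) -> str:
    """Coarse two-way split. The 768px cutoff is an illustrative assumption."""
    if "Mobi" in user_agent or screen_width < 768:
        return "mobile"
    return "desktop"


def browser_family(user_agent: str) -> str:
    """Map a User-Agent string to a coarse family and discard the version."""
    # Order matters: Chrome's UA also contains "Safari/", and Edge's
    # contains "Chrome/", so the more specific tokens are checked first.
    for token, family in (
        ("Firefox/", "Firefox"),
        ("Edg/", "Edge"),
        ("Chrome/", "Chrome"),
        ("Safari/", "Safari"),
    ):
        if token in user_agent:
            return family
    return "Other"


def coarse_event(url: str, referrer: str, user_agent: str,
                 screen_width: int, country: str) -> dict:
    """Build the record to store. The raw UA and full referrer are not kept."""
    return {
        "path": urlsplit(url).path or "/",
        "referrer": referrer_domain(referrer),
        "country": country,
        "device": device_type(screen_width, user_agent),
        "browser_family": browser_family(user_agent),
    }
```

The point of the sketch is that the raw User-Agent and the full referrer URL are only ever function arguments; neither appears in the returned record.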

The line is crossed when you start storing IP addresses, full user agents, device fingerprints, or any kind of persistent identifier — even a hashed one that you retain across sessions.

Level 2: How data is stored

Collecting data safely is different from storing it safely. Many tools claim they don't use cookies, then store a visitor ID in a server-side session tied to an IP address. The IP address is personal data under GDPR. The fact that the cookie moved server-side doesn't change what's being tracked.

A genuinely privacy-first store contains only what's listed above — and a visitor identifier that cannot be linked back to any individual. Monoid's approach is a daily one-way hash:

visitor_hash = SHA-256(IP + UA + SALT_SECRET + YYYY-MM-DD)

Three properties make this safe:

  • One-way: SHA-256 cannot be inverted, so the IP address cannot be read back out of the hash.
  • Salted: the server-side SALT_SECRET means the hash cannot be precomputed in a rainbow table, or brute-forced over the small IPv4 address space, even if the algorithm is known.
  • Daily: the date in the input means the same visitor produces a different hash tomorrow. There is no identifier that persists beyond the day.

The hash is not useful for tracking a person over time. It is only useful for de-duplicating visitors within a single day, which is the only thing it needs to do.
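A minimal Python sketch of the daily hash described above, using the standard-library hashlib module. The "|" separator between fields and the choice to pass the date in explicitly are assumptions of this example; the formula in the text only specifies which inputs are combined.

```python
import hashlib
from datetime import date


def visitor_hash(ip: str, user_agent: str, salt_secret: str, day: date) -> str:
    """Daily one-way visitor hash: SHA-256 over IP, UA, secret salt, and date."""
    # The "|" separator is an addition in this sketch so that field
    # boundaries stay unambiguous when the inputs are joined.
    material = "|".join([ip, user_agent, salt_secret, day.isoformat()])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Same visitor, same day: identical hash, so repeat views de-duplicate.
first = visitor_hash("203.0.113.7", "Mozilla/5.0", "example-secret", date(2024, 5, 1))
again = visitor_hash("203.0.113.7", "Mozilla/5.0", "example-secret", date(2024, 5, 1))

# Same visitor, next day: a different hash, so the two days cannot be linked.
next_day = visitor_hash("203.0.113.7", "Mozilla/5.0", "example-secret", date(2024, 5, 2))
```

The IP and User-Agent exist only as arguments while the hash is computed; only the returned hex digest would be written to the record.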

Level 3: Can it be reversed?

This is the test that separates genuine privacy-first tools from marketing claims. If a sufficiently motivated adversary — including a government with a legal order — obtained your analytics database, what could they learn?

With Monoid's data model: they could learn which pages were visited, from which countries, on which devices, and on which days. They could not learn which specific individual visited any specific page. Recomputing a hash requires the original IP, the original user agent, and the secret salt. The date can be read from the record's timestamp, but the IP and user agent are never stored, and the salt is a server-side secret that does not appear in any record.

Compare this to "anonymized" GA4 data, which retains client IDs (persistent cookie-based identifiers), event timestamps with millisecond precision, and device fingerprint components. That data is not anonymous — it is pseudonymous at best, and linkable to real users with moderate effort.

What the database actually looks like

A Monoid pageview record contains: site_id, path, referrer, country, device, browser_family, visitor_hash (the one-way daily hash), and a timestamp. That is the complete record. There is no IP address column, full User-Agent string, browser version, persistent user ID, or session token. There is nothing in the schema that maps to a real person.
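The record described above could be modelled as a small Python dataclass. The field names come from the text; the types, and the idea that country is a short code, are assumptions for illustration.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class Pageview:
    site_id: str
    path: str
    referrer: str        # referrer domain only, never the full URL
    country: str         # assumed here to be a short country code
    device: str          # "mobile" or "desktop"
    browser_family: str  # coarse family, no version
    visitor_hash: str    # one-way daily hash
    timestamp: datetime


# The complete list of stored columns, in order.
COLUMNS = [f.name for f in fields(Pageview)]
```

Reading the field list is a quick audit: there is no slot where an IP address, a full User-Agent string, or a long-lived user ID could be stored.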

That is what privacy-first looks like at the data model level. Everything else — dashboards, real-time counts, country breakdowns — is computed from those fields.
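To show how dashboard numbers can be derived from those fields alone, here is a short Python sketch that computes daily unique visitors and a country breakdown. The records are plain dicts with made-up sample values, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def unique_visitors_per_day(records):
    """Count distinct visitor_hash values for each calendar day."""
    seen = defaultdict(set)
    for r in records:
        seen[r["timestamp"].date()].add(r["visitor_hash"])
    return {day: len(hashes) for day, hashes in seen.items()}


def pageviews_by_country(records):
    """Count pageviews per country."""
    return dict(Counter(r["country"] for r in records))


# Illustrative sample data, not real traffic.
sample = [
    {"timestamp": datetime(2024, 5, 1, 9, 0), "visitor_hash": "a", "country": "DE"},
    {"timestamp": datetime(2024, 5, 1, 10, 0), "visitor_hash": "a", "country": "DE"},
    {"timestamp": datetime(2024, 5, 1, 11, 0), "visitor_hash": "b", "country": "BR"},
    {"timestamp": datetime(2024, 5, 2, 9, 0), "visitor_hash": "c", "country": "DE"},
]
daily = unique_visitors_per_day(sample)
countries = pageviews_by_country(sample)
```

Because the hash changes every day, unique counts are only meaningful per day. Adding daily figures together counts a returning visitor once for each day they appear, which is the trade-off the daily hash accepts.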

Why the distinction matters in practice

If your analytics tool stores personal data, you are a GDPR data controller with obligations: you must identify a lawful basis for processing and disclose it to visitors, maintain records of processing activities, and respond to data subject access requests. You also need a consent mechanism if your lawful basis is consent.

If your analytics tool stores only non-personal aggregate data, those obligations don't apply — because there is no personal data to control. The legal overhead disappears along with the consent banner.