Re-identification is not a parlor trick; it’s an industry

Shad Khattab
July 15, 2025

"That’s not drama—it’s a supply chain."

“Anonymized” data isn’t nameless; it’s name-adjacent. Strip out direct identifiers (name, email) and what’s left (ZIP code, birth date, device fingerprints, movement trails, purchase timestamps) still behaves like a fingerprint. Link that “anonymous” fingerprint to a few public crumbs and you’ve got a person. Think of it like guessing your neighbor from three facts: the car they drive, the time they leave, and the dog that hates Thursdays. You don’t need a badge, just a cross-reference. Classic research by Latanya Sweeney in the late 1990s showed how Massachusetts Governor William Weld’s “de-identified” hospital record could be linked back to him using voter rolls. Ancient history that still lands.
Sources: EPIC; UCB-UMT.
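Here is a minimal sketch of that kind of linkage, assuming a toy "de-identified" hospital table and a toy public voter roll. Every record, field name, and the `link` helper below is made up for illustration; this is not the method used in the original research, just the general shape of a join on quasi-identifiers.

```python
from collections import defaultdict

# Hypothetical "de-identified" rows: names removed, quasi-identifiers kept.
hospital = [
    {"zip": "02138", "dob": "1950-03-02", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "dob": "1980-01-15", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "dob": "1980-01-15", "sex": "F", "diagnosis": "condition C"},
]

# Hypothetical public list that carries names next to the same fields.
voter_roll = [
    {"name": "Person A", "zip": "02138", "dob": "1950-03-02", "sex": "M"},
    {"name": "Person B", "zip": "02139", "dob": "1980-01-15", "sex": "F"},
    {"name": "Person C", "zip": "02139", "dob": "1980-01-15", "sex": "F"},
    {"name": "Person D", "zip": "02140", "dob": "1972-11-09", "sex": "F"},
]

QUASI_IDS = ("zip", "dob", "sex")


def key(row):
    """Build the join key from the quasi-identifier fields."""
    return tuple(row[field] for field in QUASI_IDS)


def link(deidentified, public):
    """Return (name, diagnosis) pairs where the key matches exactly one person."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[key(person)].append(person["name"])

    matches = []
    for row in deidentified:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # an unambiguous match is a re-identification
            matches.append((names[0], row["diagnosis"]))
    return matches


matches = link(hospital, voter_roll)
```

In this toy data only the first hospital row links to a single name. The other two rows share a key with two voters, so they stay ambiguous, which is exactly the crowd that good de-identification is supposed to guarantee for everyone.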

The receipts: re-ID works disturbingly well

Mobility traces are unique. A landmark 2013 study found four random spatiotemporal points (where/when you were) uniquely identified 95% of people in a 1.5M-user dataset. Your commute is basically a signature. 
Source: PubMed.
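To get a feel for why a handful of points is enough, here is a rough sketch on synthetic data. It is not the study's dataset or code: the trace generator, the grid of 40 cells by 24 hours, and the `unicity` helper are all assumptions chosen to keep the example small.

```python
import random


def make_toy_traces(n_users=300, points_per_user=20, n_cells=40, seed=42):
    """Synthetic traces: each user gets a random set of (cell, hour) points."""
    rng = random.Random(seed)
    all_points = [(cell, hour) for cell in range(n_cells) for hour in range(24)]
    return {
        user: set(rng.sample(all_points, points_per_user))
        for user in range(n_users)
    }


def unicity(traces, k, trials=200, seed=0):
    """Estimate how often k known points from one user match nobody else."""
    rng = random.Random(seed)
    users = sorted(traces)
    hits = 0
    for _ in range(trials):
        user = rng.choice(users)
        # The "attacker" knows k random points from this user's trace.
        known = set(rng.sample(sorted(traces[user]), k))
        # Everyone whose trace contains all k known points is a candidate.
        candidates = [u for u, points in traces.items() if known <= points]
        if candidates == [user]:
            hits += 1
    return hits / trials


traces = make_toy_traces()
one_point = unicity(traces, k=1)
four_points = unicity(traces, k=4)
```

On this toy data a single point almost never singles anyone out, while four points almost always do. Real traces are far less random than these, so treat the numbers as an illustration of the mechanism, not a reproduction of the 95% result.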

Shopping metadata is just as telling. With three months of credit-card records for 1.1M people, four purchases (times/places) re-identified 90% of individuals—even when the data lacked names. 
Sources: DSpace@MIT; ResearchGate; Science.

Ratings, likes, and niche tastes can out you. Researchers linked “anonymous” Netflix Prize ratings to IMDb activity and identified users—revealing sensitive preferences in the process. Translation: your 2 a.m. documentary binge is not a secret handshake. 
Sources: UT Austin CS; arXiv.

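Old-school demographics are enough. The combination of ZIP code, full birth date, and gender uniquely identifies a majority of Americans: Latanya Sweeney’s widely cited estimate was 87%, and a later reanalysis put it closer to 63%. It’s been replicated, explained, and used as a teaching example for decades.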
Sources: EPIC; aboutmyinfo.org; johndcook.com.
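A back-of-envelope calculation shows why three boring fields go so far. The inputs below are loud assumptions, not measured values: an "average" ZIP code of about 8,000 people, birth dates spread evenly over 80 years, and two gender values spread evenly as well. Real populations are lumpier than this, so the result is only a ballpark.

```python
# Assumptions for illustration only (not taken from the studies cited above).
AVG_ZIP_POPULATION = 8_000   # rough size of one ZIP code
YEARS_OF_BIRTH = 80          # span of birth years in the population
GENDER_VALUES = 2            # values recorded in the gender field

# Number of distinct (birth date, gender) combinations inside one ZIP code.
bins = 365 * YEARS_OF_BIRTH * GENDER_VALUES

# Chance that none of the other residents lands in your exact bin,
# if everyone were spread uniformly at random across the bins.
p_unique = (1 - 1 / bins) ** (AVG_ZIP_POPULATION - 1)

print(f"{bins:,} bins per ZIP, P(unique) is about {p_unique:.2f}")
```

Under these assumptions there are 58,400 bins for roughly 8,000 people, so most residents sit alone in theirs and the probability comes out around 0.87. That is comfortably "majority" territory, though the close fit owes as much to the chosen inputs as to anything else.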

It’s not theoretical—it leaks into real life

NYC taxi data fiasco (2014): “Anonymized” trip logs let sleuths tie rides to celebrities and even estimate tips by cross-matching paparazzi photos. If you can find Bradley Cooper’s fare, you can find anyone’s. 
Sources: Fast Company; Gawker; mathbabe.

Strava heatmap (2018 → ongoing cautionary tale): A public fitness “heat map” exposed patrol routes and locations of sensitive military sites worldwide. That wasn’t an exploit; it was default sharing plus easy linkage. 
Sources: The Guardian; WIRED.

Follow the money: there’s a full market for this

Re-ID isn’t a hobby; it’s how a multi-hundred-billion-dollar data-broker economy stitches profiles together from ad trackers, SDKs, credit headers, geolocation pings, loyalty programs, and public records. Even the U.S. FTC has spent years warning that data brokers compile and sell massive dossiers with minimal transparency. Recent enforcement has targeted location data sellers precisely because those feeds can be linked to sensitive places—clinics, shelters, places of worship—i.e., instant re-identification in context. That’s not “maybe”; that’s the sales pitch. 
Source: Federal Trade Commission.

If you want a taste of 2025 reality: the FTC is still litigating against Kochava over the sale of precise geolocation data; courts let the case proceed this year, and the agency has already barred other brokers (X-Mode/Outlogic; later, Gravy Analytics and Mobilewalla) from selling sensitive location datasets. Translation: regulators know linking is trivial—and commercial. 
Sources: Federal Trade Commission; Hunton Andrews Kurth; The Verge; Reuters.

How the sausage gets made (a 60-second schematic)

Collect: SDKs inside everyday apps hoover GPS, Wi-Fi, accelerometer, ad IDs, and more; websites drop cookies and grab browser/device fingerprints.

Clean & stitch: Brokers and ad-tech vendors unify streams using stable keys (MAIDs, hashed emails, credit headers) and fuzzier, probabilistic signals (behavioral similarities, home/work location).

Enrich: Public records, purchases, and third-party lists get fused to create “audience segments.”

Sell & score: Insurers, marketers, political operatives, “risk intelligence” shops, and—yes—government buyers get access. That’s the industry. Not a magic trick; a pipeline. 
Source: Federal Trade Commission.
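The "clean & stitch" step is easier to picture with a toy example. The sketch below assumes two made-up data sources that each carry a SHA-256 hash of a normalized email address; the field names (`hem`, `maid`, `home_cell`), the records, and the `stitch` helper are hypothetical, not any vendor's actual schema or process.

```python
import hashlib


def hashed_email(email: str) -> str:
    """Normalize an email (trim, lowercase) and return its SHA-256 hex digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical feed from an app SDK: no name, no raw email.
app_events = [
    {"hem": hashed_email("Pat@Example.com "), "maid": "maid-1234", "home_cell": "cell-17"},
]

# Hypothetical feed from a loyalty program: also "just a hash".
loyalty = [
    {"hem": hashed_email("pat@example.com"), "purchases": ["pharmacy", "pet store"]},
]


def stitch(*sources):
    """Merge records from every source that share the same hashed email."""
    profiles = {}
    for source in sources:
        for record in source:
            profile = profiles.setdefault(record["hem"], {})
            profile.update({k: v for k, v in record.items() if k != "hem"})
    return profiles


profiles = stitch(app_events, loyalty)

# Anyone who already knows the address can recompute the key and look it up.
lookup = profiles.get(hashed_email("pat@example.com"))
```

Because the hash is deterministic, both feeds collapse into one profile, and anyone holding the plain email can find it. Hashing here works as a join key, not as anonymization.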

“But it was anonymized!”—why that promise flops

Uniqueness: Human patterns (movement, shopping, streaming) are sparse and distinctive. You don’t need all the data; just a few anchor points. 
Sources: PubMed; DSpace@MIT.

Auxiliary data is everywhere: Voter files, property records, social media, breach dumps, paparazzi shots—linkage fuel forever. The Netflix and NYC taxi cases only needed public crumbs. 
Sources: UT Austin CS; Fast Company.

Anonymization ≠ immunity: Even NIST’s guidance documents catalog repeated failures of naïve de-identification in the wild. “We removed names” is about as protective as removing your license plate and leaving your VIN on the windshield. 
Source: NIST Publications.

Why you should care (even if you’re “boring”)

Because decisions get made about you using data like yours:

Eligibility & pricing: Insurance, lending, housing, and dynamic pricing systems sort you by patterns, not personality. Re-ID makes those patterns person-level and portable. 
Source: Federal Trade Commission.

Safety & stigma: Location linkage to sensitive places enables targeted harassment, stalking, and discrimination. Regulators keep citing exactly these risks when they crack down. 
Source: Federal Trade Commission.

Okay, so what do you do?

No need to move to a cabin; just stop being an all-you-can-eat buffet.

Kill easy linkers:

Reset/limit advertising IDs; deny “always” location; turn off precise location for apps that don’t need it.

Use a modern privacy browser with tracker blocking and isolation; install uBlock Origin; separate profiles/containers.

Use email aliases and a password manager; enable MFA/passkeys so one leak doesn’t link everything.

Starve the broker pipeline:

Opt out of major people-finder sites and freeze your credit; it won’t make you invisible, but it lowers the resale value of your profile.

Audit smart devices; put IoT on a separate SSID; use DNS filtering to block the worst telemetry.

Be boring in public: Post on a delay, shrink your audience, and skip broadcasting school/work/home routines. Your future self says thanks.

The New York one-liner version

“Anonymized data is like ‘boneless wings’—rebranded, still chicken.”

“Your commute is a barcode; your shopping run is the price check.”

“If data is the new oil, re-identification is the refinery.”

“You’re not hiding; you’re negotiating—stop giving the other side your notes.”

Bottom line: Re-identification persists because it pays. There’s steady demand, mature tooling, and a regulatory game of whack-a-mole. Treat anonymization promises like umbrella drinks—cute, sweet, and best enjoyed with a healthy dose of skepticism. Then build layers so when your data leaks (and it will), it drips, not floods.
