Re-identification is not a parlor trick; it’s an industry
"That’s not drama—it’s a supply chain."
“Anonymized” data isn’t nameless; it’s name-adjacent. Strip out direct identifiers (name, email) and what’s left (ZIP code, birth date, device fingerprints, movement trails, purchase timestamps) still behaves like a fingerprint. Link that “anonymous” fingerprint to a few public crumbs and you’ve got a person. Think of it like guessing your neighbor from three facts: the car they drive, the time they leave, and the dog that hates Thursdays. You don’t need a badge, just a cross-reference. Latanya Sweeney’s classic late-1990s research showed how Massachusetts Governor William Weld’s “de-identified” hospital record could be linked back to him using voter rolls. Ancient history, and it still lands.
EPIC
UCB-UMT
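To make the cross-referencing concrete, here is a minimal sketch of a linkage attack. Everything in it is made up for illustration: the records, the names, and the helper `link` are hypothetical, and a real attack would deal with messier data. The point is only that a join on ZIP, birth date, and sex is enough when the combination is rare.

```python
from collections import defaultdict

# Hypothetical "de-identified" hospital rows: names removed, quasi-identifiers kept.
hospital = [
    {"zip": "02138", "dob": "1950-02-14", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "dob": "1980-01-15", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "dob": "1992-03-02", "sex": "F", "diagnosis": "condition C"},
]

# Hypothetical public voter roll: names plus the same quasi-identifiers.
voters = [
    {"name": "Voter One", "zip": "02138", "dob": "1950-02-14", "sex": "M"},
    {"name": "Voter Two", "zip": "02139", "dob": "1980-01-15", "sex": "F"},
    {"name": "Voter Three", "zip": "02139", "dob": "1980-01-15", "sex": "F"},
    {"name": "Voter Four", "zip": "02140", "dob": "1971-11-09", "sex": "M"},
]

QUASI_IDS = ("zip", "dob", "sex")


def link(records, roll, keys=QUASI_IDS):
    """Return (record, name) pairs where exactly one roll entry shares the keys."""
    index = defaultdict(list)
    for person in roll:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = []
    for rec in records:
        names = index.get(tuple(rec[k] for k in keys), [])
        if len(names) == 1:  # an unambiguous match is a re-identification
            matches.append((rec, names[0]))
    return matches


for rec, name in link(hospital, voters):
    print(name, "->", rec["diagnosis"])
```

In this toy data only the first hospital row links to a single voter. The second matches two voters, so it stays ambiguous, and the third matches nobody.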
The receipts: re-ID works disturbingly well
Mobility traces are unique. A landmark 2013 study found four random spatiotemporal points (where/when you were) uniquely identified 95% of people in a 1.5M-user dataset. Your commute is basically a signature.
PubMed
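One way to get a feel for "unicity" is to measure it on fake data. The sketch below is not the study's method or dataset: the traces are uniformly random (real movement is far more structured), and `unicity` and `synthetic_traces` are hypothetical helper names. It estimates how often k points drawn from one person's trace match nobody else's.

```python
import random


def unicity(traces, k, trials=100, seed=0):
    """Estimate the share of users pinned down by k random points from their own trace.

    traces: dict mapping user id -> set of (place, hour) points.
    """
    rng = random.Random(seed)
    users = [u for u, pts in traces.items() if len(pts) >= k]
    if not users:
        return 0.0
    unique = 0
    for _ in range(trials):
        user = rng.choice(users)
        sample = set(rng.sample(sorted(traces[user]), k))
        # Everyone whose trace contains all k sampled points is a candidate.
        candidates = [u for u, pts in traces.items() if sample <= pts]
        unique += candidates == [user]
    return unique / trials


def synthetic_traces(n_users=500, n_places=50, n_hours=24 * 7, points=30, seed=1):
    """Build uniformly random (place, hour) traces; an assumption, not real mobility."""
    rng = random.Random(seed)
    return {
        u: {(rng.randrange(n_places), rng.randrange(n_hours)) for _ in range(points)}
        for u in range(n_users)
    }


traces = synthetic_traces()
for k in (1, 2, 4):
    print(f"{k} point(s): {unicity(traces, k):.0%} of sampled users were unique")
```

Even on this crude stand-in, a single point is often shared with someone else while a handful of points rarely is, which is the shape of the published result.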
Shopping metadata is just as telling. With three months of credit-card records for 1.1M people, four purchases (times/places) re-identified 90% of individuals—even when the data lacked names.
DSpace@MIT
ResearchGate
Science
Ratings, likes, and niche tastes can out you. Researchers linked “anonymous” Netflix Prize ratings to IMDb activity and identified users—revealing sensitive preferences in the process. Translation: your 2 a.m. documentary binge is not a secret handshake.
UT Austin CS
arXiv
Old-school demographics are enough. The combo of ZIP + full birth date + gender uniquely identifies the majority of Americans: Latanya Sweeney’s widely cited estimate is 87%, and a later re-analysis put it closer to 63%. It’s been replicated, explained, and used as a teaching example for decades.
EPIC
aboutmyinfo.org
johndcook.com
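A back-of-the-envelope calculation shows why that trio works. The sketch below assumes birth dates and sexes are uniform and independent within a ZIP code and that lifespans cover about 80 years; none of that is exactly true, and `p_unique` is a hypothetical helper, so treat the numbers as rough.

```python
def p_unique(zip_population, lifespan_years=80, sexes=2):
    """Chance a resident shares (birth date, sex) with nobody else in their ZIP.

    Assumes birth dates and sexes are uniform and independent, which real
    populations are not, so this is only a rough estimate.
    """
    bins = 365 * lifespan_years * sexes  # possible (birth date, sex) combinations
    # Each of the other residents must land in a different bin.
    return (1 - 1 / bins) ** (zip_population - 1)


for n in (1_000, 10_000, 30_000, 100_000):
    print(f"{n:>7} residents: {p_unique(n):.0%} expected to be unique")
```

Under these assumptions there are tens of thousands of (birth date, sex) bins, so uniqueness only starts to fade once a ZIP code holds a comparable number of people.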
It’s not theoretical—it leaks into real life
NYC taxi data fiasco (2014): “Anonymized” trip logs let sleuths tie rides to celebrities and even estimate tips by cross-matching paparazzi photos. If you can find Bradley Cooper’s fare, you can find anyone’s.
Fast Company
Gawker
mathbabe
Strava heatmap (2018 → ongoing cautionary tale): A public fitness “heat map” exposed patrol routes and locations of sensitive military sites worldwide. That wasn’t an exploit; it was default sharing plus easy linkage.
The Guardian
WIRED
Follow the money: there’s a full market for this
Re-ID isn’t a hobby; it’s how a multi-hundred-billion-dollar data-broker economy stitches profiles together from ad trackers, SDKs, credit headers, geolocation pings, loyalty programs, and public records. Even the U.S. FTC has spent years warning that data brokers compile and sell massive dossiers with minimal transparency. Recent enforcement has targeted location data sellers precisely because those feeds can be linked to sensitive places—clinics, shelters, places of worship—i.e., instant re-identification in context. That’s not “maybe”; that’s the sales pitch.
Federal Trade Commission
If you want a taste of 2025 reality: the FTC is still litigating against Kochava over the sale of precise geolocation data; courts let the case proceed this year, and the agency has already barred other brokers (X-Mode/Outlogic; later, Gravy Analytics and Mobilewalla) from selling sensitive location datasets. Translation: regulators know linking is trivial—and commercial.
Federal Trade Commission
Hunton Andrews Kurth
The Verge
Reuters
How the sausage gets made (a 60-second schematic)
Collect: SDKs inside everyday apps hoover GPS, Wi-Fi, accelerometer, ad IDs, and more; websites drop cookies and grab browser/device fingerprints.
Clean & stitch: Brokers and ad-tech vendors unify streams using deterministic keys (MAIDs, i.e. mobile advertising IDs; hashed emails; credit headers) and probabilistic signals (behavioral similarities, inferred home/work locations).
Enrich: Public records, purchases, and third-party lists get fused to create “audience segments.”
Sell & score: Insurers, marketers, political operatives, “risk intelligence” shops, and—yes—government buyers get access. That’s the industry. Not a magic trick; a pipeline.
Federal Trade Commission
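The "clean & stitch" step can be sketched with a union-find over shared identifiers. This is an illustration under assumptions, not any vendor's actual pipeline: the records, field names (`maid`, `email_hash`), and helpers are hypothetical. It also shows why a hashed email is still a join key: the same address always hashes to the same value.

```python
import hashlib


def hash_email(email):
    """Normalize, then SHA-256 an email; the same address always yields the same digest."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def stitch(records):
    """Group record sources that share any identifier value (ad ID or hashed email)."""
    uf = UnionFind()
    for i, rec in enumerate(records):
        uf.find(("rec", i))  # register records that carry no identifiers at all
        for field in ("maid", "email_hash"):
            if rec.get(field):
                uf.union(("rec", i), (field, rec[field]))
    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(uf.find(("rec", i)), []).append(rec["source"])
    return sorted(sorted(g) for g in groups.values())


# Hypothetical feeds from different collectors.
records = [
    {"source": "weather_app", "maid": "maid-123", "email_hash": None},
    {"source": "loyalty_card", "maid": None, "email_hash": hash_email("Pat@Example.com ")},
    {"source": "game_sdk", "maid": "maid-123", "email_hash": hash_email("pat@example.com")},
    {"source": "news_site", "maid": "maid-999", "email_hash": None},
]

for profile in stitch(records):
    print(profile)
```

Here the game SDK record carries both the ad ID and the hashed email, so it bridges the weather app and the loyalty card into one profile, while the news site record stays separate.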
“But it was anonymized!”—why that promise flops
Uniqueness: Human patterns (movement, shopping, streaming) are sparse and distinctive. You don’t need all the data; just a few anchor points.
PubMed
DSpace@MIT
Auxiliary data is everywhere: Voter files, property records, social media, breach dumps, paparazzi shots—linkage fuel forever. The Netflix and NYC taxi cases only needed public crumbs.
UT Austin CS
Fast Company
Anonymization ≠ immunity: Even NIST’s guidance documents catalog repeated failures of naïve de-identification in the wild. “We removed names” is about as protective as removing your license plate and leaving your VIN on the windshield.
NIST Publications
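A common way to quantify how weak "we removed names" is, not named in the text above but standard in the de-identification literature, is k-anonymity: the size of the smallest group of rows that share the same quasi-identifiers. The sketch below uses made-up rows and hypothetical helpers (`k_anonymity`, `generalize`); k = 1 means at least one row is unique on those fields.

```python
from collections import Counter


def k_anonymity(rows, quasi_ids):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(groups.values()) if groups else 0


def generalize(row):
    """Coarsen a row: keep 3 ZIP digits and only the decade of the birth year."""
    return {**row, "zip": row["zip"][:3] + "**", "birth": row["birth"][:3] + "0s"}


# Hypothetical rows with names already stripped.
rows = [
    {"zip": "10001", "birth": "1984-05-02", "sex": "F"},
    {"zip": "10002", "birth": "1986-11-20", "sex": "F"},
    {"zip": "10003", "birth": "1981-01-09", "sex": "F"},
    {"zip": "10011", "birth": "1975-07-30", "sex": "M"},
    {"zip": "10012", "birth": "1979-03-14", "sex": "M"},
]

fields = ("zip", "birth", "sex")
print("raw k:", k_anonymity(rows, fields))
print("generalized k:", k_anonymity([generalize(r) for r in rows], fields))
```

Coarsening raises k in this toy table, but it is not a cure: k-anonymity says nothing about what an attacker can do with extra fields, such as the movement or purchase trails described earlier.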
Why you should care (even if you’re “boring”)
Because decisions about you get made with data like yours:
Eligibility & pricing: Insurance, lending, housing, and dynamic pricing systems sort you by patterns, not personality. Re-ID makes those patterns person-level and portable.
Federal Trade Commission
Safety & stigma: Location linkage to sensitive places enables targeted harassment, stalking, and discrimination. Regulators keep citing exactly these risks when they crack down.
Federal Trade Commission
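The location risk is easy to see in code. The sketch below is a hypothetical illustration with made-up coordinates and device IDs: it uses the standard haversine formula to list which devices pinged within a given radius of one place. Swap in a clinic or a shelter and the "anonymous" device ID becomes a sensitive label.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def devices_near(pings, place, radius_m=100):
    """Return the sorted device IDs with at least one ping within radius_m of place."""
    return sorted(
        {
            p["device"]
            for p in pings
            if haversine_m(p["lat"], p["lon"], place["lat"], place["lon"]) <= radius_m
        }
    )


# Hypothetical pings keyed by an advertising-style device ID.
pings = [
    {"device": "device-a", "lat": 40.7500, "lon": -73.9900},
    {"device": "device-b", "lat": 40.7600, "lon": -73.9900},
    {"device": "device-a", "lat": 40.7800, "lon": -73.9700},
]
place = {"lat": 40.7503, "lon": -73.9901}

print(devices_near(pings, place))
```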
Okay, so what do you do?
No need to move to a cabin; just stop being an all-you-can-eat buffet.
Kill easy linkers:
Reset/limit advertising IDs; deny “always” location; turn off precise location for apps that don’t need it.
Use a modern privacy browser with tracker blocking and isolation; install uBlock Origin; separate profiles/containers.
Use email aliases and a password manager; enable MFA/passkeys so one leak doesn’t link everything.
Starve the broker pipeline:
Opt out of major people-finder sites and freeze your credit; it won’t make you invisible, but it lowers the resale value of your profile.
Audit smart devices; put IoT on a separate SSID; use DNS filtering to block the worst telemetry.
Be boring in public: Post on a delay, shrink your audience, and skip broadcasting school/work/home routines. Your future self says thanks.
The New York one-liner version
“Anonymized data is like ‘boneless wings’—rebranded, still chicken.”
“Your commute is a barcode; your shopping run is the price check.”
“If data is the new oil, re-identification is the refinery.”
“You’re not hiding; you’re negotiating—stop giving the other side your notes.”
Bottom line: Re-identification persists because it pays. There’s steady demand, mature tooling, and a regulatory game of whack-a-mole. Treat anonymization promises like umbrella drinks—cute, sweet, and best enjoyed with a healthy dose of skepticism. Then build layers so when your data leaks (and it will), it drips, not floods.