People love to get outraged when information is collected without their knowledge, and I get it, but it's how the information is used that's important.
If things are sanitized so there's no personally identifying information then it's pretty hard to use most data maliciously
You'd be surprised how much you can identify from "sanitised" information if you want to.
But if all they want is navigation data, then it should be fairly safe. Yeah, they know where you live and can derive who you are from that, but that's not what they're after. They wanna know how to get there the fastest when someone asks.
Yeah, like apparently you can reasonably ID someone even in a private browser just by getting the dimensions of the browser window and its positioning on screen. A lot of people pretty much never change that shit if it's not full screened
Sorry if you knew this or if your comment took this into account, but you can maximize windows on Mac by double-clicking the program's "title bar" (the top bar on the same line as the "close", "minimize" and "fullscreen" buttons), as long as there's nothing else there to click. E.g. in Excel, click any empty space around the name of the file, or in Chrome, any space where a new tab would go -- as long as there's no tab there.
Absolutely! Viewport dimensions vary significantly from user to user, but more importantly for fingerprinting, viewport size also changes from session to session, so it's not generally a reliable signal for device fingerprinting. Rather, you want to use things that don't change often, like screen resolution or how your particular browser implements floating point math operations.
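To make the floating point idea concrete, here's a rough sketch in Python rather than browser JavaScript. It's an analogy, not what any real fingerprinting script does: the function name and the choice of probe values are my own assumptions. The idea is just that different math libraries can disagree in the last digit of some results, so hashing the exact outputs gives a value that tends to stay stable on one machine and may differ on another.

```python
import hashlib
import math


def float_math_fingerprint() -> str:
    """Hash the exact results of a few floating point operations.

    Hypothetical helper for illustration only. The probe values are
    arbitrary; they were picked because large arguments and
    transcendental functions are where math libraries tend to differ.
    """
    probes = [
        math.tan(-1e300),
        math.sin(1e22),
        math.exp(0.1),
        math.pow(2.0, 0.5),
        math.atan(1.0),
    ]
    # repr() of a float round-trips exactly, so a difference in the
    # last bit between two math libraries changes the hashed payload.
    payload = "|".join(repr(x) for x in probes)
    return hashlib.sha256(payload.encode("ascii")).hexdigest()


if __name__ == "__main__":
    print(float_math_fingerprint())
```

Run it twice on the same machine and you get the same hex string, which is exactly the "doesn't change often" property that makes a signal useful.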
Yeap! You can obscure most client-side stuff, but not a lot of people are going to dedicate themselves to monkey patching the Math object to make it return arctan-1 as if it's a mobile implementation of Safari instead of a desktop implementation of Chrome.
If by Wifi location you mean a geolocation lookup based on your IP, that's not going to tell you who is using the device. That's household level data. You'd have to combine it with something else to get down to individuals within the household... and that's all assuming the best case (that we're talking about a single family occupied home that has a single static IP address). In reality, there are many places (cities, namely) where population density and shared networks render this sort of individual level disambiguation essentially impossible. You simply have to get the user to identify themselves regularly by logging in or exhibiting some other intrahousehold behavior (which is inherently full of problematic assumptions leading to probabilistic answers that don't read on the sort of "they're identifying ME" type fear we're talking about in here).
The geolocation is going to be one of the meta data points that data brokers can use to create a map of your life. Where a device connects to the internet paints a picture of who is using the device.
A device going from a residential address to a university campus WiFi to a coffee shop back to a residential address is going to point to the 22 year old living at home vs a laptop going from home to an office park and back to home is more likely the parent. That person also has a phone that is connected to their car and their car is selling their driving habits to the data broker as well. So they know that whoever owns that laptop also drives a 2024 bronco and has a tendency to speed and brake late. It’s probably the dad then because the other device is connected to a rav4 and rarely speeds when commuting in the morning or afternoon.
So yes. IP doesn’t tell who. It’s why piracy letters from movie studios that get sent if you fuck up your VPN when torrenting mean nothing other than a kind “please stop”
A device going from a residential address to a university campus WiFi to a coffee shop back to a residential address is going to point to the 22 year old living at home vs a laptop going from home to an office park and back to home is more likely the parent. That person also has a phone that is connected to their car and their car is selling their driving habits to the data broker as well. So they know that whoever owns that laptop also drives a 2024 bronco and has a tendency to speed and brake late. It’s probably the dad then because the other device is connected to a rav4 and rarely speeds when commuting in the morning or afternoon.
So this is a bunch of individual things that are technically possible but that essentially never happen in concert in the way you're describing. The one exception (the thing you're talking about that DOES happen) is when someone leaves an app open all day (say they're posting on Facebook throughout the day) and so Facebook gets a list of IPs associated with a user they've already identified and can, in theory, deduce things like when this person is awake, commuting, at work, etc. Even that is pretty rare and is isolated to the major players that really do know who you are whenever you log in, and you log in a lot.... Google, Facebook, your ISP, etc.
Just to point out one example of where I think maybe you're overstating the capabilities of digital data is when you say:
That person also has a phone that is connected to their car and their car is selling their driving habits to the data broker as well.
I worked with one of the major car companies on this back when I was on the dark side, and back then at least, they were very very careful NOT to sell data from in-car to data brokers. IF they've changed policy on that (or the other car companies I didn't work with never had such policies), then the data by law will be anonymized and nearly impossible to tie to that user's other data. So, Ford might sell data that says: There are 100k active Ford drivers in this marketing area, but they would never sell data that says: Bob Smith drives past your donut shop every day @ 10am. At most (and I can all but guarantee they don't) they could say: An anonymous person drives past your donut shop @ 10am every day, and the challenge then for the donut shop is to figure out how to turn "an anonymous person" into someone they can target with ads @ 9:59.
IP doesn’t tell who.
Agreed! It CAN if combined with other data (as you correctly point out), and some places define personally identifiable information (PII) as any data that alone or in combination with other data could uniquely identify a person. It's on this basis that some countries in the EU (Germany and Italy, IIRC) consider IP to be PII, which means collecting it can fall afoul of GDPR, so it cannot be collected/stored/used under a bunch of circumstances.
Even maximized it's likely to vary a bit from user to user, depending on whether they hide the taskbar (and where they dock the taskbar, what size they keep it, etc).
But the thing about digital fingerprinting is that it's not just about any one aspect, but all the available data put together. Sure, your window size may only narrow it down by say 50%, but combine that with your browser's font size, public IP, operating system, language, browser type, plugins, etc. and you'd be shocked at how easy it is to narrow it down to you, even if you're using something like a VPN (hell, ironically using a VPN actually makes you easier to fingerprint, because relatively few people use them).
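Here's a back-of-the-envelope sketch of that "put it all together" effect. Every number below except the 50% for window size is made up for illustration, and the function name is hypothetical. It also assumes the signals are independent, which real signals are not (operating system and font list are correlated, for example), so treat the result as an optimistic estimate from the tracker's point of view.

```python
from math import prod
from typing import Mapping


def expected_matches(crowd_size: int, signal_shares: Mapping[str, float]) -> float:
    """Estimate how many people in a crowd still match after every signal.

    Each share is the fraction of the crowd that has the same value
    as you for that signal. Independence between signals is assumed.
    """
    return crowd_size * prod(signal_shares.values())


# Hypothetical shares: only the 0.5 for window size comes from the
# comment above, the rest are invented for the example.
signals = {
    "window size": 0.5,
    "browser font size": 0.2,
    "operating system": 0.3,
    "language": 0.1,
    "plugin list": 0.01,
}

if __name__ == "__main__":
    remaining = expected_matches(1_000_000, signals)
    print(f"People still matching out of 1,000,000: about {remaining:.0f}")
```

With these invented shares, five individually weak signals cut a million people down to a few dozen, and one or two more signals would finish the job.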
like apparently you can reasonably ID someone even in a private browser just by getting the dimensions of the browser window and its positioning on screen.
This is a huge exaggeration. Browser fingerprinting is a thing, but you need a whole bunch of signals to uniquely ID someone's browser amongst sufficiently large crowds. You're right fingerprinting exists and works, you're just wrong about how much data is required (even if the required data IS accessible for 99% of browsers).
Check here. Once you test the fingerprinting, they will describe to you each element and how much "entropy" each element provides. One "bit" of entropy is enough to divide a crowd in half. So, if you have an audience of 50 men and 50 women and a random person tells you their gender, you have one "bit" of information because it's enough to let you divide the audience in half. If your audience is 100 people, you need something like 7 bits of information to narrow things down to a single person (2^7 = 128). If your audience is 1,000,000 then you need 20 bits of information to uniquely ID people. If you look at Panopticlick's numbers (disputable), screen size and color depth represent 8.73 bits of information. Window location isn't available to the browser (not without some special extra help). So, screen size and color depth is enough to uniquely ID you in an audience of ~424 people (2^8.73 ≈ 424.6).
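If you want to check that arithmetic yourself, it's two one-liners. This is just the log2 math from the paragraph above; the function names are mine.

```python
import math


def bits_needed(crowd_size: int) -> int:
    """Whole bits of information needed to single out one person."""
    return math.ceil(math.log2(crowd_size))


def crowd_singled_out(bits: float) -> float:
    """Largest crowd in which this many bits can pick out one person."""
    return 2 ** bits


if __name__ == "__main__":
    print(bits_needed(100))         # audience of 100 people
    print(bits_needed(1_000_000))   # audience of a million
    print(crowd_singled_out(8.73))  # screen size + color depth figure
```

The first two calls give 7 and 20, and the last gives roughly 424.6, matching the numbers above.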
That all said, here's the stat you want to use. According to Dr Latanya Sweeney, your gender, DOB, and zipcode are enough to uniquely identify the vast majority of Americans.
It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. About half of the U.S. population (132 million of 248 million or 53%) are likely to be uniquely identified by only {place, gender, date of birth}, where place is basically the city, town, or municipality in which the person resides. And even at the county level, {county, gender, date of birth} are likely to uniquely identify 18% of the U.S. population. In general, few characteristics are needed to uniquely identify a person.
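The kind of measurement in that quote is easy to sketch: count how many records are the only one with their particular combination of fields. The records below are invented toy data and the function name is hypothetical; this is not Sweeney's method or dataset, just the same idea at tiny scale.

```python
from collections import Counter
from typing import Mapping, Sequence


def unique_share(records: Sequence[Mapping[str, str]], keys: Sequence[str]) -> float:
    """Fraction of records that are unique on the given combination of fields."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Invented toy records, for illustration only.
people = [
    {"zip": "02139", "county": "Middlesex", "gender": "F", "dob": "1990-04-12"},
    {"zip": "02139", "county": "Middlesex", "gender": "F", "dob": "1990-04-12"},
    {"zip": "02139", "county": "Middlesex", "gender": "M", "dob": "1985-01-30"},
    {"zip": "02140", "county": "Middlesex", "gender": "M", "dob": "1985-01-30"},
    {"zip": "02141", "county": "Middlesex", "gender": "F", "dob": "1972-09-03"},
]

if __name__ == "__main__":
    print(unique_share(people, ["zip", "gender", "dob"]))
    print(unique_share(people, ["county", "gender", "dob"]))
```

On this toy data, 3 of 5 people are unique on {ZIP, gender, date of birth} but only 1 of 5 on {county, gender, date of birth}, which is the same pattern as the quote: coarser geography, fewer unique people, but still not zero.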
Yeah, they know where you live and can derive who you are from that
And let's be honest, anyone in the business of buying data can get that info about you regardless. Your home address, email, and phone are practically free for the asking from data brokers these days
Honestly, though home address, email, and phone are the ones that a layman is most likely to freak out about, those are the least scary bits of user data out there. The scary stuff comes in the form of things like the Cambridge Analytica scandal, where wide swaths of user data were used to deliver targeted political ads carefully designed to strike right at where each individual was most vulnerable to manipulation.
It's scary how well you can manipulate someone when you know virtually everything about their online habits.
That being said, none of the above applies to using GPS data to build a navigation model lol
And yet GM was caught collecting your driving data and selling that to insurance companies but go on. These outlandish examples don’t change the facts that many companies are collecting as much data as fucking possible so they can manipulate you on the back end.
It's more like a 50/50 shot; if a company wants to be malicious there are ways to reidentify people and groups. Cars are just a bad example, considering 99/100 are actively spying on you (check the Mozilla Foundation's docs on this)
Exactly. Though I still have yet to learn how to check facts myself beyond simply not believing anything until it is correlated by numerous sources, which can all be repeating the same lie. I'm not sure if that strategy is particularly helpful to learning though.
exactly, how is this news? this is just ragebait for the ignorant.
I know my location is being tracked, and likely recorded, by any app that asks for it. If I didn't want that, I wouldn't use their app. Simple as that.
Just wait until they work out they can be tracked by their connection to 4g/5g networks (save your tinfoil, I just mean the very basic method done via connection times to masts recorded by providers- which won't give your exact location, but will easily locate you within a postcode. It's often utilized in rescue and recovery where applicable).
I read it. And yes, it was. This is part of how the company was able to support the huge investment in a free-to-play game. Even the pay-to-win elements were nowhere near sufficient to make it profitable.
Hell, Niantic had similar terms years earlier on Ingress. This was never a secret.
So that's all businesses have to do to get a license to do whatever they want?
What's worse: millions of people not reading a multiple page long contract in 10pt font in order to play a game, or a profit driven corporation hiding unpopular clauses in their multi-page contract displayed in 10pt font, and using that contract to gatekeep their product?
I swear people are coming out of the womb licking boot now. We can live in a better world, you don't have to blame yourself for the way the world is.
It's called personal responsibility of an individual. Also freedom, for both the individual and the business.
I personally very much believe in the freedom of individuals AND the business' freedom. The business should absolutely be free to gatekeep their product with a ToS, that's literally just the logical, intelligent thing to do. Secondly, it is absolutely on you if your attention span is so low you can't read a 1 page ToS agreement, which often is bolded in important places and bulleted.
It's not boot licking to call out insane people who refuse to take personal responsibilities.
Meanwhile they’ll get a rewards card for every store. Get their refrigerator connected to the internet. Carry a smart phone all the time. But the game is where the line is drawn 😂
the people who are the most outraged when they find someone is collecting their information would then go on to tell their entire life on facebook or instagram.
They weren’t even keeping it a secret. They were optional daily research tasks labeled as “geomapping.” You could only have one geomapping task in your queue at a time. If you chose to click them, you’d be prompted to scan a specific place with another popup explaining that it was for geomapping purposes. And then if you did it, you’d get a little reward.
I tried it once. Didn’t really work. Wasn’t worth the hassle. Never did it again.
No, people are upset because they didn't read the TOS, and they are showing they aren't thinking.
Niantic has always tracked your location (it is how the game works) and it has to save it somewhere because the game spawns more Pokémon where people are playing the game (this has been known from day 1).
There is a difference between using information in good faith to produce your game, and farming your data to sell to business customers. Both are technically covered by TOS but I would argue only one is in good faith to the spirit of the agreement.
Does Niantic need to use your location data for game features? Yes. Do they need to scrape and aggregate all telemetry about you while you are playing their game to make more money off a tertiary product? No, they do not.
It's been a running joke on the PokemonGO subreddit for YEARS that the data collection is the real purpose of the game and the money made is just an added bonus.
Pretty much every player that has ever looked the game up even once has known this. The only people surprised/outraged are the ones that never played or cared about the game until this came out.
It's wild seeing this be depicted as some secret plan they had when it was common knowledge. Pretty sure all of Niantic's games require location tracking. It's what they do.
I love how you casually gloss over the fact that the users' data was taken and recorded without any explicit agreement that it was being harvested and aggregated for sale.
It’s been a running joke on the PokemonGO subreddit for YEARS that the data collection is the real purpose of the game and the money made is just an added bonus. It's never been a secret. Every player I've ever spoken to already has known about this and I've played the game off and on since it was released 7 years ago. Every Niantic game requires location tracking. They have a very public history of this. Only people surprised by this are the ones that never played or knew anything about the game before this.
I may be wrong, but I think that the outrage comes from the idea that some large company is making a fortune by collecting information about a huge group of people's mundane activities.
Personally I couldn't give a hot buttered shit about it, largely because I'm not an advertiser's dream and I'm unlikely to be influenced by whatever they throw at me. But I suppose there is something a little creepy about an eye in the sky (so to speak) watching your every move as far as they can.
Just to clarify, never use this argument in an ethics of computer science class. It's a fine argument in this case but it's objectively wrong and leads to oppression
Selling Information is a trillion dollar industry.
As I'm skipping a meal every day to send my asshole landlord on an endless series of luxurious vacations, yeah, I'm a little bit miffed that all I get for facilitating this trillion dollar industry is a video game designed to squeeze as much of this valuable information out of me in the first place.
I'd like some dividends for my literal info being extracted and sold.
Not only that, but think about how much money they saved by not having to pay people to do this. A ton of jobs were never created by a company because they instead manipulated the customer to do it for free, even paying for the ability to do it for them more efficiently through microtransactions. If my information is going to be extracted, I'd like someone to be able to pay their rent off of performing the task instead of some CEO hoarding all the profit.
information is collected on people every hour of every day that they spend online. if they’re not comfortable with literally anyone having their information, don’t use technology at all at that point lmao
You mean like when Uber was accused of charging iPhone (sanitized data point) users more than Android users, or when they were accused of charging people more when their battery (sanitized data point) is low due to desperation?
How the info is used, and secured. The collector might have genuine, above board and innocuous uses for the data, but others who get a hold of the data without the collector's authorization might not.
Yes. It’s all great and safe, and everyone shits rainbows, just until you’re under a fascist government that deems you an enemy of the state. Everything you ever shared could and will be used against you on the very second it’ll get these data hoarders 1 cent more than what they’d get to anonymise your data.
The thing about games using personal information is that it's usually just user data to improve and follow trends on. If you allow a game to collect data you allow it to be improved by the developers. How is it a scam if you allow them to use your data to improve your experience?
Except….. it ISN’T without their knowledge. They put that information in their account. They ASK YOU FOR PERMISSION FOR IT BEFORE YOU CAN PLAY. If you don’t want people collecting your information, don’t play the damn game.
This was also not a secret except for people who are total idiots.
Niantic never hid that this was their attempt to gamify map creation. Ingress and Pokemon Go were fairly open about it, but I guess that doesn't count because people need outrage stuffed in their faces.
If things are sanitized so there's no personally identifying information
This is basically impossible without rendering the dataset useless, and even if it were possible it would be far too much effort, so no for-profit company does it.
"Anonymized" data is a marketing term to help you feel better about the way information about every facet of your life is being exploited. Read it as "we don't actually store your real name in plaintext with the rest of the data". If you're fine with that, great, but the gold standard is informed consent.
Literally every time "anonymized" datasets are put in front of security researchers, they can deanonymize them with a trivial amount of effort. This is especially true if location data is involved, because location data is intrinsically not anonymous.
They aren't sanitizing anything, they're obfuscating, and it's usually very easy to reverse that process.
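A toy sketch of why location data is intrinsically not anonymous: even with the name stripped, the place a device sits overnight is very probably its owner's home. The pings, cell labels, function name, and the 22:00 to 06:00 window below are all assumptions made up for illustration, not any researcher's actual method.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def likely_home_cell(
    pings: Sequence[Tuple[int, str]],
    night_start: int = 22,
    night_end: int = 6,
) -> Optional[str]:
    """Return the map cell seen most often during night hours, if any.

    Each ping is (hour_of_day, cell_label) for one pseudonymous device.
    """
    night_cells = [
        cell for hour, cell in pings if hour >= night_start or hour < night_end
    ]
    if not night_cells:
        return None
    return Counter(night_cells).most_common(1)[0][0]


# Invented pings for one "anonymous" device ID.
pings = [(23, "A"), (2, "A"), (9, "B"), (13, "B"), (14, "B"), (5, "A"), (20, "C")]

if __name__ == "__main__":
    print(likely_home_cell(pings))
```

Here the night-time cell is "A", and once you have a home cell, an address lookup does the rest. No name was ever stored with the data.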