You can be identified online based on your browsing history

Suramya's Blog : Welcome to my crazy life…

August 29, 2020


Filed under: Computer Related,Computer Software,My Thoughts,Tech Related — Suramya @ 7:29 PM

Reliably identifying people online is a bedrock of the million-dollar advertising industry. As more and more users become privacy conscious, browsers have been adding features to protect their privacy and reduce the probability of them being identified online. Users can already be identified by cookies, supercookies and similar techniques. Now there is a research paper (Replication: Why We Still Can’t Browse in Peace: On the Uniqueness and Reidentifiability of Web Browsing Histories) that claims users can be identified based on their browsing histories. It builds on an earlier paper, Why Johnny Can’t Browse in Peace: On the Uniqueness of Web Browsing History Patterns, re-validating its findings and extending them.

We examine the threat to individuals’ privacy based on the feasibility of reidentifying users through distinctive profiles of their browsing history visible to websites and third parties. This work replicates and extends the 2012 paper Why Johnny Can’t Browse in Peace: On the Uniqueness of Web Browsing History Patterns[48]. The original work demonstrated that browsing profiles are highly distinctive and stable. We reproduce those results and extend the original work to detail the privacy risk posed by the aggregation of browsing histories. Our dataset consists of two weeks of browsing data from ~52,000 Firefox users. Our work replicates the original paper’s core findings by identifying 48,919 distinct browsing profiles, of which 99% are unique. High uniqueness holds even when histories are truncated to just 100 top sites. We then find that for users who visited 50 or more distinct domains in the two-week data collection period, ~50% can be reidentified using the top 10k sites. Reidentifiability rose to over 80% for users that browsed 150 or more distinct domains. Finally, we observe numerous third parties pervasive enough to gather web histories sufficient to leverage browsing history as an identifier.
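To get a feel for what "distinct profiles, of which 99% are unique" means, here is a minimal Python sketch. It is not the paper's code. It assumes a profile is simply the set of domains a user visited, optionally truncated to the first n entries of a ranked list of popular sites. The user names, domains and function names are all made up for illustration.

```python
from collections import Counter


def truncate_profile(domains, top_sites, n):
    """Keep only the domains that appear in the first n entries of top_sites."""
    allowed = set(top_sites[:n])
    return frozenset(d for d in domains if d in allowed)


def uniqueness(histories, top_sites, n):
    """Return (number of distinct profiles, fraction of them held by one user)."""
    profiles = [truncate_profile(h, top_sites, n) for h in histories.values()]
    # Users with an empty truncated profile are ignored (an assumption).
    counts = Counter(p for p in profiles if p)
    if not counts:
        return 0, 0.0
    unique = sum(1 for c in counts.values() if c == 1)
    return len(counts), unique / len(counts)


# Hypothetical toy data: a ranked site list and four users' visited domains.
top_sites = ["google.com", "youtube.com", "wikipedia.org", "github.com", "reddit.com"]
histories = {
    "alice": {"google.com", "github.com", "reddit.com"},
    "bob": {"google.com", "youtube.com"},
    "carol": {"google.com", "youtube.com", "wikipedia.org"},
    "dave": {"google.com", "github.com", "reddit.com"},
}

print(uniqueness(histories, top_sites, 5))
print(uniqueness(histories, top_sites, 2))
```

With the full list of five sites, alice and dave share a profile while bob and carol are each unique. Truncating to the top two sites collapses the profiles so that nobody is unique any more. The paper's point is that with real browsing data this collapse mostly does not happen, even at 100 sites.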

Original paper

Olejnik, Castelluccia, and Janc [48] gathered data in a project aimed at educating users about privacy practices. For the analysis presented in [48] they used the CSS :visited browser vulnerability [8] to determine whether various home pages were in a user’s browsing history. That is, they probed users’ browsers for 6,000 predefined “primary links” such as www.google.com and got a yes/no for whether that home page was in the user’s browsing history. A user may have visited that home page and then cleared their browsing history, in which case they would not register a hit. Additionally a user may have visited a subpage e.g. www.google.com/maps but not www.google.com in which case the probe for www.google.com would also not register a hit. The project website was open for an extended period of time and recorded profiles between January 2009 and May 2011 for 441,627 unique users, some of whom returned for multiple history tests, allowing the researchers to study the evolution of browser profiles as well. With this data, they examined the uniqueness of browsing histories.
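The yes/no probing described above can be modelled with a small Python sketch. This only simulates the outcome of the probes, not the CSS :visited technique itself, and it assumes a probe hits only when the exact link is in the history. The `normalize` and `probe` helpers and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to (host, path), ignoring the scheme and a trailing slash."""
    parts = urlsplit(url if "//" in url else "//" + url)
    return parts.netloc.lower(), parts.path.rstrip("/")


def probe(history, primary_links):
    """Return one yes/no answer per primary link: is that exact page in history?"""
    visited = {normalize(u) for u in history}
    return [normalize(link) in visited for link in primary_links]


# Hypothetical probe list and browsing history.
primary_links = ["www.google.com", "www.wikipedia.org", "www.example.com"]
history = ["https://www.google.com/maps", "https://www.wikipedia.org/"]

print(probe(history, primary_links))  # subpage visit does not count as a hit
print(probe([], primary_links))       # cleared history: no hits at all
```

The first call shows the subpage case from the quote: visiting www.google.com/maps does not register a hit for www.google.com. The resulting list of booleans is the kind of profile whose uniqueness the original paper studied.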

This brings to mind a project I saw a few years ago that used JavaScript and some script-fu to show you which of the top 1k websites you had visited in the past. Unfortunately I can’t find the link right now, as I don’t remember the site’s name and a generic search only returns unrelated sites. If I find it I will post it here, as it was quite interesting.

Well this is all for now. Will post more later.

– Suramya


