Trackography
You never read alone
Centre for Internet and Society (CIS), India, 6th March 2015
press ESC to display slide tree
Online tracking
“When governments collect data we call it surveillance, but when companies do the same, we mistakenly call it user services”
Marek Tuszynski.
Why did we develop Trackography?
- We are interested in the online tracking business because it is pretty opaque
- We have seen through the Snowden revelations that intelligence agencies have tapped into the data collected by tracking companies
- We do not control the profiles created about us by online tracking companies - which can lead to abuse
- This is a project for advocacy and transparency
Why are we focusing on media websites?
- One of the most common things we do is read the news online
- The business model of the media is largely dependent on advertising
- The type of news we read show more about us in the long-term than what we think
What do we mean by 'media'?
We mean websites which cover the news, are of public interest and which are regularly updated
Every country has its own media websites which are visited by individuals from that specific country
We compiled lists of media websites for each country we examined and distinguished between global (e.g. theguardian.co.uk), national (covers the entire nation), regional (covers a region) media websites and blogs
The creation of such media lists requires local knowledge - which is why we need you to collaborate with us!
Developing a Script to Track the Trackers
Our script is designed to:
-
Perform an HTTP connection (using phantomjs) to every media website under analysis
-
Collect all the third party URLs which are included in the media websites under analysis
-
Perform a traceroute for every URL included in the media websites under analysis
-
Identify the countries which host the network infrastructure and perform a GeoIP conversion of all the included IP addresses in the network path
But the script does NOT work in conjunction with your browser - it just performs a connection to the media and sends us the results
Data collected so far
curl https://trackography.org/countries
38 countries around the world...including India!
Number of media websites: at least 3,242
github.com/vecna/trackmap/verified_media$ grep http * | wc -l
3242
What does Trackography show when we select media websites?
- The blue countries host the servers of the media websites you have selected
- The purple countries host the network infrastructure required to access the media websites you have selected
- The red countries host the servers of the companies that track you when you access the media websites you have selected
As for the arcs?
- The blue arcs show your connection to the media websites you have selected
- ...while the red arcs show your connection to tracking companies
User Vulnerability: Network Topology
- When we access media websites, our connections travel through the network infrastructure of foreign states
- When unencrypted connections pass though the network infrastructure of ISPs, they have access to the HTTP referer, cookies, and other identifiable information
- When unencrypted connections pass though the network infrastructure of ISPs, they can redirect traffic to malicious servers
Do you remember FoxAcid ?
Or FinFly ISP?
They are based on the interception of HTTP connections which are redirected to other servers and subsequently injected with a browser exploit or tampered with a download on-the-fly
Geopolitics of data: Developing world
The media servers of developing countries are hosted in the datacentres of developed countries
Geopolitics of data: India
User Vulnerability: Profiling
When we access media websites, third parties track us and create profiles about us - which may or may not be accurate
Company presence percentage
Trackers' Business Model
Third party trackers (a.k.a tracking companies) engage in (one or more of) the following:
- Advertising
- Profiling
- Market Research
- Web Analytics
- Web Crawling
Third party trackers: India
But how do the Trackers handle our data?
We collected the following fields of data from the privacy policies of some of the globally prevailing tracking companies:
-
The types of data they collect
-
Whether they provide safeguards to prevent the full identification of users' IP addresses
-
Whether users can opt-out from their tracking
-
Whether they support Do Not Track (DNT)
-
The types of tracking technologies they use
-
Whether they comply with the US – EU Safe Harbour Framework
Globally prevailing tracking companies
-
22 out of 25 are based in the U.S.
-
19 out of 25 state that they collect personally identifiable information
-
Only 3 out of 25 support Do Not track (DNT)
-
Only 11 out of 25 disclose how long they retain data for
Opt-Out ?
Largely conditional because in some cases:
-
Users can only opt-out if their browser is not configured to block third party cookies
-
Users can only opt-out by cancelling their account with a service
-
Users need to opt-out from every device that they use
-
Users can only opt-out from the browser that they are using
-
If users opt-out they will have restricted access to content, features and services
-
...and lets not forget the various default online tracking settings browsers have...
Through which media websites are we tracked the most?
Not so easy to say!
Tracking changes across time, but more importantly, it changes depending on the location of the client!
Tracker heatmap
How can we block and circumvent online tracking?
Help us Track the Trackers
Contribute by helping us further review India's media list and please pull request.
Contribute by running our software
wget https://github.com/vecna/trackmap/blob/master/setup.sh && sh ./setup.sh
cd trackmap
./perform_analysis.py -c India
Thanks! Questions?
pub 3200R/0x94E7EF47 2014-08-05 [expires: 2015-08-30]
Key fingerprint = ABC2 7639 5EE3 3245 A0A1 3973 40E2 6C25 94E7 EF47
uid TrackMap project <trackmap@tacticaltech.org>
sub 3200R/0x504DEBDF 2014-08-05 [expires: 2015-08-30]
Project twitter @trackography_
Access Trackography through Tactical Tech's Me & My Shadow project