
Downloading your Google data

I've wanted to download all of my (Google) mail and look at things like how frequently I get mails, how frequently I send mails, how long it takes me to reply to mails, how long my mails are and how long the mails I get are. Start pondering what interesting things your mail archive can reveal and I'm sure you'll come up with more interesting questions. The only problem is that the Google mail archive is bound to be a large one, especially given that I've been actively using Gmail for over 7 years now. At some point, I found the link to create and download my Gmail archive and a day later, Google notified me that it was 4.7 GB in size. Note the GB. As I was at home, I didn't (rather couldn't) download that monster.
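To make the "how frequently do I get mails" question concrete, here is a minimal sketch of how it could be answered. It assumes the Gmail archive is a single mbox file (the post itself doesn't say what format the archive comes in), and the function name `messages_per_day` is just something I made up for illustration. It only uses Python's standard library.

```python
import mailbox
from collections import Counter
from email.utils import parsedate_to_datetime


def messages_per_day(mbox_path):
    """Count messages per calendar day, based on each message's Date header.

    Assumption: mbox_path points to an mbox file. The day is taken in
    whatever timezone offset the Date header itself carries.
    """
    counts = Counter()
    # create=False so that a wrong path raises instead of creating an empty file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            raw_date = message.get("Date")
            if raw_date is None:
                continue  # no Date header, nothing to count
            try:
                sent = parsedate_to_datetime(raw_date)
            except (TypeError, ValueError):
                continue  # skip dates that cannot be parsed
            counts[sent.date()] += 1
    finally:
        box.close()
    return counts
```

Calling `messages_per_day("path/to/archive.mbox")` (the path is a placeholder) gives a `Counter` keyed by date, which can then be sorted or plotted. For a multi-GB archive this will take a while, since every message gets parsed.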

I was finally reminded of it yesterday evening when I was chatting with one of my mentors here at Enthought India about why I'm interested in learning and using programming. And when I went looking for the archive again, I realised that I could not only download my mail archive but also a bunch of other things that Google has on me.

Go here to grab the data Google has on you. If that link doesn't work, for whatever reason, go to your Google Personal Info & privacy page, specifically to the takeout option. You should see an option to Download your data or Create Archive or something along those lines. Once you click on it, you will be shown everything Google has on you and can choose what to download!

Personally, I chose to download

  • Google Fit : For those of you who aren't Android users, Google Fit is an app that tracks your movement, counts how much distance you walk/run/cycle/drive every day, how many calories you burn in the process and how long you sit on your ass. It's an awesome and easy way to track your daily activity!
  • Google Location History : Again, I don't know how many of you are Android users. Actually, I don't know if this feature is limited to Android phones. Anyway, if you tell it to, Google can prompt you to do things when you get home or when you get to work. This is because over time, based on your GPS location, Google can infer when you're at work and when you're home. After downloading the data, I realised that Google has more than a year of my Location History!
  • YouTube Videos : I don't know about you guys but I use YouTube every day. Multiple times a day. To listen to songs. To watch trending videos. To watch movie trailers and to watch awesome videos over and over again. And Google, of course, has a list of all the searches you've made and all of the things you've watched while you were logged into your account.
  • Gmail : Do I really have to explain what this is?
  • Hangouts : Ditto.

Anyway, I downloaded the data and have gone through the Fit, Location History and Hangouts data sets. The YouTube, Location History and Hangouts data are stored in JSON format, which was new to me and took a while to understand/use, but I got the hang of it in the end. The Fit data was in the form of simple CSV files and some other weird XML-derived format. I've only dug through the CSV files for now. I have basic histograms and plots for activity and chats. I'll dig through these data sets more thoroughly over the week to find some interesting metrics. Until then ...
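For anyone else new to JSON, here is a rough sketch of the kind of thing I mean by a basic histogram. The field names (`locations`, `timestampMs`) and the sample values are assumptions made up for illustration, not the documented layout of any of the downloaded files, so check your own files and adjust the key. The helper names are hypothetical too.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def hourly_counts(records, key="timestampMs"):
    """Count records per hour of day (UTC).

    Assumption: each record stores a Unix timestamp in milliseconds,
    as a string or a number, under `key`.
    """
    counts = Counter()
    for record in records:
        moment = datetime.fromtimestamp(int(record[key]) / 1000, tz=timezone.utc)
        counts[moment.hour] += 1
    return counts


def print_histogram(counts, width=40):
    """Print one text bar per hour, scaled so the busiest hour fills `width`."""
    peak = max(counts.values(), default=0)
    for hour in range(24):
        n = counts.get(hour, 0)
        bar = "#" * (n * width // peak) if peak else ""
        print(f"{hour:02d} {bar} {n}")


# Made-up sample in the assumed layout; a real file would be read with
# json.load(open(path)) instead.
sample = json.loads(
    '{"locations": ['
    '{"timestampMs": "1420452000000"},'
    '{"timestampMs": "1420453800000"},'
    '{"timestampMs": "1420455600000"}]}'
)
counts = hourly_counts(sample["locations"])
```

Running `print_histogram(counts)` then prints 24 rows, one per hour. The hours are in UTC here; converting to local time before counting would probably make more sense for spotting daily routines.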
