A preliminary look at my activity on Facebook.

Because I have nothing better to do on a Friday night, I downloaded all the data that Facebook had on me. You can do the same by going to this part of Facebook and clicking on "Download a copy of your Facebook data". It might take a couple of minutes, but you'll eventually get a zipped file that contains a "wall.htm" file.

The contents of this "wall.htm" file are what I'll restrict myself to for now. Here's a small part of the file to give you an idea of the kind of information it contains.

<p>
 <div class="meta">
  Monday, January 11, 2016 at 1:13am UTC+05:30
 </div>
 Rahul Poruri shared Lunarbaboon&#039;s photo.
</p>
<p>
 <div class="meta">
  Sunday, January 10, 2016 at 6:32pm UTC+05:30
 </div>
 Rahul Poruri shared a link.
</p>

As you can see above, between the <p></p> HTML tags there are <div></div> HTML tags whose contents are timestamps indicating when each post was shared on my wall. We have the day of the week, date, month, year and time of day available, so let's see what we can do with that information.
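Here is a minimal sketch of how one of these timestamp strings could be turned into a Python datetime. It is not taken from the notebook; the name `parse_timestamp` and the format string are my own assumptions based on the sample above. It also assumes Python 3.7 or later (for the colon in the UTC offset) and an English locale (for the day and month names).

```python
from datetime import datetime, timedelta

# Assumed layout of the timestamps in the "meta" divs, e.g.
# "Monday, January 11, 2016 at 1:13am UTC+05:30".
# %z accepts the colon-separated offset on Python 3.7 and later.
TIMESTAMP_FORMAT = "%A, %B %d, %Y at %I:%M%p UTC%z"


def parse_timestamp(text):
    """Turn one timestamp string from wall.htm into a timezone-aware datetime."""
    return datetime.strptime(text.strip(), TIMESTAMP_FORMAT)


stamp = parse_timestamp("Monday, January 11, 2016 at 1:13am UTC+05:30")
# Every field used later (weekday, day, month, year, hour) is now an attribute.
print(stamp.year, stamp.month, stamp.day, stamp.hour, stamp.weekday())
```

Once the string is a datetime, `stamp.weekday()` gives 0 for Monday through 6 for Sunday, which is the order used for the day-of-week numbers further down.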

I'm not going to get into the details of how I parsed this "wall.htm" file. In short, I used Python and the BeautifulSoup library to parse the file and extract the string contents of the <div></div> HTML tags. For the exact steps involved in reading the file, understanding its contents, extracting all of the timestamps and then grouping them by the various units of time to display their frequency, you can take a look at this Jupyter Notebook file. If you want to download the Jupyter Notebook yourself, you can do so from my GitHub here. If you have no idea what a Jupyter Notebook is but are comfortable running Python scripts, you can download this Python script, which does pretty much the same thing as the notebook.
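To give a rough idea of the extraction step without opening the notebook, here is a small sketch. Note that it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without installing anything; it is not the code from the notebook. The class name `MetaCollector` is made up for this example, and the sample string is the snippet shown earlier.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the text inside every <div class="meta"> element."""

    def __init__(self):
        super().__init__()
        self.timestamps = []
        self._in_meta = False

    def handle_starttag(self, tag, attrs):
        # Start recording when a div with class="meta" opens.
        if tag == "div" and dict(attrs).get("class") == "meta":
            self._in_meta = True

    def handle_endtag(self, tag):
        # The meta divs in the sample hold no nested divs, so any </div> ends recording.
        if tag == "div":
            self._in_meta = False

    def handle_data(self, data):
        if self._in_meta and data.strip():
            self.timestamps.append(data.strip())


sample = """
<p>
 <div class="meta">
  Monday, January 11, 2016 at 1:13am UTC+05:30
 </div>
 Rahul Poruri shared Lunarbaboon&#039;s photo.
</p>
<p>
 <div class="meta">
  Sunday, January 10, 2016 at 6:32pm UTC+05:30
 </div>
 Rahul Poruri shared a link.
</p>
"""

collector = MetaCollector()
collector.feed(sample)
collector.close()
print(collector.timestamps)
```

For the real file, the same collector could be fed the text of "wall.htm" instead of `sample`, assuming the file has been read into a string first.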

Enough talk. Let's get to the actual numbers.

347 348 361 297 339 278 397

is the total number of times I posted on Facebook on each day of the week, starting with Monday and ending with Sunday. As expected, I use Facebook the most on Sundays, with a total of 397 posts. It is also interesting that I used Facebook less on Thursdays (297 posts) and least of all on Saturdays (278 posts).
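Counting by day of the week can be sketched as below, assuming the timestamps have already been converted to datetime objects. The function name `posts_per_weekday` is hypothetical, and the five dates are only the ones that appear in the snippets in this post, not the full data.

```python
from collections import Counter
from datetime import datetime


def posts_per_weekday(stamps):
    """Return seven counts, ordered Monday to Sunday."""
    counts = Counter(stamp.weekday() for stamp in stamps)  # 0 = Monday, 6 = Sunday
    return [counts.get(day, 0) for day in range(7)]


# The five timestamps visible in the wall.htm snippets in this post.
stamps = [
    datetime(2016, 1, 11, 1, 13),   # Monday
    datetime(2016, 1, 10, 18, 32),  # Sunday
    datetime(2013, 2, 6, 0, 1),     # Wednesday
    datetime(2013, 2, 3, 16, 28),   # Sunday
    datetime(2013, 2, 2, 17, 31),   # Saturday
]

print(posts_per_weekday(stamps))
```

The same pattern works for the other groupings by swapping `stamp.weekday()` for `stamp.month`, `stamp.year`, `stamp.day` or `stamp.hour`.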

122 422 183 115 166 219 183 144 247 220 115 231

is the total number of posts in each month, starting with January and ending with December. What stands out in that set of numbers is how many posts appeared in February (422). For those of you who know me, the reason should be obvious: my birthday is in February, and the large number comes from the birthday wishes that people post on my wall. Skipping that, the lowest activity is in April (115) and November (115), when end-semester exams are held. Activity is high in June (219) and December (231), the peak holiday periods. Interestingly, September (247) sees a lot of activity too, and I don't have an explanation for that yet. I'll have to go through the whole file to see if any particular reason or year is causing this aberration.

51, 225, 474, 495, 661, 252, 204, 5

is the total number of posts per year, from 2009 to 2016. 2009 was when I joined Facebook, and also the year I joined IIT Madras. My activity rose sharply until 2013, after which it tanked in 2014 and 2015. There's a good reason for that as well: most of the people I hung out with were undergraduates who graduated in 2013. This was easier to make into a histogram, so here's one:


[Figure: Total number of posts per year on my wall]

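As a quick sanity check, the three sets of numbers above (by day of the week, by month and by year) should all add up to the same total, since each one counts every post exactly once. The variable names below are just for illustration; the numbers are copied from the lists above.

```python
# Counts copied from the three lists above.
by_weekday = [347, 348, 361, 297, 339, 278, 397]  # Monday to Sunday
by_month = [122, 422, 183, 115, 166, 219, 183, 144, 247, 220, 115, 231]  # January to December
by_year = [51, 225, 474, 495, 661, 252, 204, 5]  # 2009 to 2016

totals = {
    "weekday": sum(by_weekday),
    "month": sum(by_month),
    "year": sum(by_year),
}
print(totals)  # each total is 2367

# If the grouping had dropped or double-counted a post, these would differ.
assert len(set(totals.values())) == 1
```
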
Now, let's look at the day of the month. As can be seen in the following two charts, there's a peak in activity on the 6th, because my birthday is on the 6th. Neglecting that, my activity by day of the month seems stable, except for two weird lows around the 5th and 20th and two weird highs on the 17th and 26th. I can't explain those.

[Figure: Total number of posts by day of the month, as a histogram]

[Figure: Total number of posts by day of the month, as a line plot]

And finally, we get to my behaviour according to the time of day. Again, the histogram looks as expected. Activity slowly starts rising after 5 AM, peaks at 7 AM and drops at 8 AM, which is roughly when classes usually started. After that it slowly rises to a peak at around 6 PM, i.e., before dinner, and peaks again at 11 PM, just before I go to sleep, after which it falls off drastically.

[Figure: Total number of posts by time of day]

And that's pretty much all I could think of doing with the "wall.htm" file. While I had hoped to look at which of my friends appear the most on my wall, that information is not as easy to extract as the timestamps were. For example, here's a different portion of the "wall.htm" file that contains comments. Notice that there's no mention of who posted each comment and no special way of identifying friends. A human reader can tell that Srinikethan is a friend, that I am Rahul and that Sivaramakrishnan is a friend, but how do I tell the computer this? And how do I make it automatically extract such names from all of the <div></div> elements with class="comment" in the file? I don't know.

<p>
 <div class="meta">
  Wednesday, February 6, 2013 at 12:01am UTC+05:30
 </div>
 <div class="comment">
  Happy birthday Rahul! Have a good one :)
 </div>
</p>
<p>
 <div class="meta">
  Sunday, February 3, 2013 at 4:28pm UTC+05:30
 </div>
 <div class="comment">
  epic! Srinikethan, you should check this out!
 </div>
</p>
<p>
 <div class="meta">
  Saturday, February 2, 2013 at 5:31pm UTC+05:30
 </div>
 <div class="comment">
  excerpt from the article - &quot;The basic problem can be stated very simply: A student&#039;s grandmother is far more likely to die suddenly just before the student takes an exam, than at any other time of year.&quot; 
This article deserves an Ignobel via Sivaramakrishnan :D...
 </div>
</p>
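One partial workaround, offered only as a sketch, is to give the computer a hand-written list of friends' names and count how many comments mention each one. This assumes such a list can be supplied by hand; `known_names` and `count_mentions` are hypothetical names for this example. It only finds names mentioned in the text and still cannot tell who wrote a comment.

```python
import re
from collections import Counter


def count_mentions(comments, known_names):
    """Count how many comments mention each name from a hand-supplied list."""
    counts = Counter()
    for comment in comments:
        for name in known_names:
            # \b keeps a name from matching inside a longer word.
            if re.search(r"\b" + re.escape(name) + r"\b", comment):
                counts[name] += 1
    return counts


# Comment texts taken from the snippet above (the last one is shortened).
comments = [
    "Happy birthday Rahul! Have a good one :)",
    "epic! Srinikethan, you should check this out!",
    "This article deserves an Ignobel via Sivaramakrishnan :D...",
]
known_names = ["Srinikethan", "Sivaramakrishnan"]  # assumed to be typed in by hand

mentions = count_mentions(comments, known_names)
print(mentions.most_common())
```

The obvious weakness is that the list has to exist before anything can be counted, so this does not answer the question of how to discover the names automatically.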

So this is where it stops for the moment. Next week, I'll take another dataset that also contains information on how frequently I used an online service: say YouTube, Gmail, Hangouts, Twitter, Google Fit or GitHub. And maybe, just maybe, I'll be able to get a little more information out of those datasets. Until then,

PS: As always, any comments, suggestions and criticism are welcome and highly appreciated. Thank you.
Note: The highlighted HTML code was embedded using hilite.me.
