
Using my internet usage to understand my daily activity cycle

Well, I had a little bit of work to do on Friday night, mainly because I didn't get much done during the day. I tried a couple of other ideas I had in mind, but they didn't exactly pan out, so I finally settled on this.

Coming to the point: I use Google and Stack Overflow daily at work. In fact, it can crudely be said that if I'm using the internet, I'm working, and if I'm not using the internet, I'm either sleeping or watching something. I know what you're going to say: what if I'm wasting time on the internet, reading articles or, worse, watching videos on YouTube? To that I'd say that, from the internet usage alone, I should be able to distinguish between using the internet for work (i.e. Google/SO) and wasting my time watching videos on YouTube. Now, let's look at how this can be done. I'm going to talk in the context of OSX and Linux-based systems. I didn't try this on Windows. Maybe another time.

while :; do netstat -ib | grep -e "en0" -m 1 | awk '{print $7}'; sleep 60; done >> ~/network.log

The -ib arguments to netstat make it list every network interface along with its incoming and outgoing data in bytes. We pipe the output of this command to grep to pick the first line containing en0, which on my machine is the wired connection. The interface might be called something different on your system; run netstat -i or ifconfig to figure out what the wired and wireless connections are called on your OS. We then pipe that line to awk, which prints the 7th column: the incoming bytes. This number is a running total since the interface came up, not the amount downloaded in the last minute. The bash loop that wraps this command runs it every 60 seconds and appends the output to a file, so the log ends up as a single column of integers, one reading per minute.
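The awk '{print $7}' step depends on the exact column layout of the netstat output. Here is a minimal Python sketch of the same extraction, assuming the macOS netstat -ib output has a header row containing an Ibytes column. It looks the column up by name and counts it from the right-hand end of each row, on the assumption that the numeric counters at the end are always filled in while left-hand fields such as Address can be empty. The function names received_bytes and read_en0_bytes are made up for illustration.

```python
import subprocess


def received_bytes(netstat_output: str, interface: str = "en0") -> int:
    """Return the Ibytes counter from the first row for `interface`.

    Assumes the macOS `netstat -ib` layout: a header row containing an
    "Ibytes" column, followed by one row per interface/address. Some rows
    (e.g. lo0) have an empty Address column, which shifts the left-hand
    fields, so the column position is counted from the right-hand end.
    """
    lines = netstat_output.strip().splitlines()
    header = lines[0].split()
    # how many places from the end of a row the Ibytes column sits
    from_right = len(header) - header.index("Ibytes")
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == interface:
            return int(fields[-from_right])
    raise ValueError(f"interface {interface!r} not found in netstat output")


def read_en0_bytes() -> int:
    """Run `netstat -ib` (macOS) and return the current Ibytes value for en0."""
    out = subprocess.run(
        ["netstat", "-ib"], capture_output=True, text=True, check=True
    ).stdout
    return received_bytes(out, "en0")
```

Calling read_en0_bytes() once a minute and appending the result to ~/network.log would reproduce what the bash loop does.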


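Note that the above command works on OSX. If you're running a Linux system, you will have to drop the b from the netstat arguments and change $7 to $4 in the awk print statement. Be aware that on Linux this column (RX-OK) counts received packets rather than bytes, and its position can differ between versions of netstat, so check the header line that netstat -i prints.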
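On Linux, a way to get actual byte counts without parsing netstat is the kernel's per-interface counter file /sys/class/net/<interface>/statistics/rx_bytes. This is a different data source from the netstat command used in this post, swapped in here as a minimal sketch. The function names and the example interface name eth0 are hypothetical; use whatever name ifconfig reports on your machine.

```python
import time
from pathlib import Path


def linux_rx_bytes(interface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the cumulative received-bytes counter for `interface` (Linux sysfs)."""
    counter = Path(sysfs_root) / interface / "statistics" / "rx_bytes"
    return int(counter.read_text())


def log_rx_bytes(interface: str, log_path: str = "~/network.log", interval: int = 60) -> None:
    """Append one reading every `interval` seconds, like the bash loop. Runs forever."""
    path = Path(log_path).expanduser()
    while True:
        with open(path, "a") as log:
            log.write(f"{linux_rx_bytes(interface)}\n")
        time.sleep(interval)
```

For example, log_rx_bytes("eth0") would write the same kind of one-number-per-minute log, with bytes rather than packets.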

Now, we can use a simple Python script to look at how much data was downloaded on a minute-by-minute basis. The following script does two things: it first creates a list of integers from the log file, and then creates a new list holding the difference between every two consecutive elements of the first list, i.e. the bytes downloaded during each minute.

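import matplotlib.pyplot as plt

# each line of network.log is a cumulative byte count, one reading per minute
readings = [int(line) for line in open('network.log') if line.strip()]
# bytes downloaded during each minute = difference of consecutive readings
time_deltas = [b - a for a, b in zip(readings, readings[1:])]

plt.plot(time_deltas)
plt.show()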

And this is what the output finally looks like. The gap between 50 and 250 on the x-axis is because I was sleeping and therefore not using the network. The x-axis represents the time in minutes and the y-axis the amount of data, in bytes, downloaded each minute. 1e7 bytes on the y-axis is roughly 10 MB downloaded in a single minute, most likely because I was watching YouTube videos.
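To make the "work vs. YouTube" distinction concrete, here is a minimal sketch that converts each per-minute delta into an average rate and labels the minute. The conversion is plain arithmetic: 1e7 bytes in 60 seconds is 1e7 × 8 / 60 ≈ 1.33 Mbit/s. The two thresholds and the labels are hypothetical values chosen for illustration, not something measured in this post; you would tune them against your own log.

```python
def mbit_per_s(bytes_per_minute: int) -> float:
    """Convert bytes downloaded in one minute to an average rate in Mbit/s."""
    return bytes_per_minute * 8 / 60 / 1_000_000


# Hypothetical thresholds (assumptions, tune to your own data).
IDLE_MAX_MBIT = 0.01      # below this, assume nobody is using the machine
BROWSING_MAX_MBIT = 0.5   # below this, assume ordinary browsing (Google/SO)


def classify_minute(bytes_per_minute: int) -> str:
    """Label one minute as 'idle', 'browsing' or 'streaming' by its average rate."""
    rate = mbit_per_s(bytes_per_minute)
    if rate < IDLE_MAX_MBIT:
        return "idle"
    if rate < BROWSING_MAX_MBIT:
        return "browsing"
    return "streaming"
```

Applied to the list of deltas, [classify_minute(d) for d in time_deltas] gives one label per minute.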
Having plotted it, I think a log-scaled y-axis would make more sense, since the data contains both extremely heavy and very light usage.
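With matplotlib, switching the y-axis is a one-line change, plt.yscale('log'), but minutes with zero bytes cannot be placed on a log axis. A quick way to see how many orders of magnitude the data spans, and how many such minutes there are, is to bucket each delta by the floor of its base-10 logarithm. This is a small illustrative sketch; the function name decade_histogram is made up.

```python
import math
from collections import Counter


def decade_histogram(deltas):
    """Count minutes by order of magnitude (floor of log10) of bytes downloaded.

    Zero or negative deltas go under the key None: log10 is undefined there,
    which is also why such points can't be drawn on a log axis. A negative
    delta usually means the cumulative counter was reset, e.g. by a reboot.
    """
    counts = Counter()
    for d in deltas:
        counts[None if d <= 0 else math.floor(math.log10(d))] += 1
    return counts
```

A result such as {3: ..., 4: ..., 7: ...} means the data mixes minutes of a few kilobytes with minutes of about ten megabytes, which a linear axis squashes flat.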

Now, there's another way to achieve the same thing. Instead of manually running the above loop, I can use cron to run the command every minute and append the output to a log file. cron is the standard way to schedule recurring commands on Unix-based systems.

crontab -e

opens the cron jobs associated with your user account in an editor. Add the following line to the end of the file and you should get the same results as before.

*/1 * * * * /usr/sbin/netstat -ib | /usr/bin/grep -e "en0" -m 1 | /usr/bin/awk '{print $7}' >> ~/network.log

The meaning of the *s in the line is explained in the comments of the file that opens after you run the crontab command. The */1 tells the system to run the command every minute, and the *s that follow tell it to run the command every hour of the day, every day of the month, every month of the year and every day of the week. You will notice that the commands are written with their full paths. As I understand it, unlike an interactive bash shell, cron runs jobs with a minimal PATH, so it may not find netstat, grep or awk by name. Running which netstat (and likewise which grep and which awk) shows where each one lives on your system.
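To make the field order explicit, here is a small illustrative sketch that splits a crontab line into its five schedule fields and the command. It is not a cron parser: it does not validate values or handle special entries such as @reboot, and the function name split_cron_line is made up.

```python
CRON_FIELDS = ("minute", "hour", "day of month", "month", "day of week")


def split_cron_line(line: str):
    """Split a crontab entry into a dict of its five schedule fields and the command."""
    parts = line.split(None, 5)  # five fields, then the rest of the line as-is
    if len(parts) < 6:
        raise ValueError("expected 5 schedule fields followed by a command")
    return dict(zip(CRON_FIELDS, parts[:5])), parts[5]
```

For the line above, the minute field comes back as "*/1" and the other four as "*".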

Now, I need to figure out how to create a new, appropriately named file every day in which to log the output. I wasn't able to spend enough time on it this weekend, which for the most part went to scrolling through Twitter and Facebook, so maybe next weekend. Until next time ...
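One possible approach, sketched under the assumption that a file per calendar day named like network-YYYY-MM-DD.log is what you want (the naming scheme and function names are hypothetical). In the crontab itself, a similar effect would come from redirecting to ~/network-$(date +\%F).log; the % has to be escaped because cron turns unescaped % signs in the command into newlines. The Python version:

```python
import datetime
from pathlib import Path


def daily_log_path(log_dir="~", day=None) -> Path:
    """Return a per-day log path such as <log_dir>/network-2017-01-06.log."""
    day = day or datetime.date.today()
    return Path(log_dir).expanduser() / f"network-{day.isoformat()}.log"


def append_reading(value: int, log_dir="~", day=None) -> Path:
    """Append one reading to that day's log file and return the file's path."""
    path = daily_log_path(log_dir, day)
    with open(path, "a") as log:
        log.write(f"{value}\n")
    return path
```

One side effect of splitting by day: the first reading in each file has no previous reading to subtract from, so the delta for the minute just after midnight needs the last line of the previous day's file.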

