Using my internet usage to understand my daily activity cycle

Well, I had a little bit of work to do on Friday night, mainly because I didn't get much work done during the day. And I tried a couple of other ideas I had in mind but they didn't exactly pan out so I finally settled on this.

Coming to the point, I use Google and Stack Overflow on a daily basis at work. In fact, it can be crudely said that if I'm using the internet, I'm working, and if I'm not using the internet, I'm either sleeping or watching something. I know what you're going to say: what if I'm wasting time on the internet, reading articles or, worse, watching videos on YouTube? To that I say that, from my internet usage, I should be able to distinguish between using the internet for work (i.e. Google/SO) and wasting my time watching videos on YouTube. Now, let's look at how this can be done. I'm going to talk in the context of OSX and Linux-based systems. I didn't play around with Windows systems. Maybe another time.

while :; do netstat -ib | grep -e "en0" -m 1| awk '{print $7}'; sleep 60; done >> ~/network.log

The -ib argument to the netstat command displays the incoming and outgoing data in bytes for each network interface. We pipe the output of this command to grep to look for en0, which represents the wired connection (-m 1 keeps only the first matching line). This might be called different things on different systems. You'll have to run netstat -i or ifconfig to figure out what the wired and wireless connections are called on your OS. We then pipe this to the awk statement that picks the 7th field, which is the incoming bytes. The bash loop that wraps this command runs it every 60 seconds and appends the output to a file. The output looks like

......................
49760820760
49771505187
49771559221
......................

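Note that the above command works on OSX. If you're running a Linux system, you will have to drop the b argument to the netstat command and change 7 to 4 in the print statement of awk. Be aware that on Linux this column (RX-OK) counts packets rather than bytes, and its position can differ between netstat versions, so check the header line of netstat -i first.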
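For anyone who would rather stay in Python, here is a rough sketch of what the grep and awk steps do. The function name incoming_bytes and the sample netstat text are made up for illustration (the en0 row reuses a reading from the log above); real output on your machine will look different.

```python
# Hypothetical helper: mimics `grep -e "en0" -m 1 | awk '{print $7}'`.
def incoming_bytes(netstat_output, interface="en0", column=7):
    """Return the integer in the given 1-based column of the first line
    that mentions the interface, or None if there is no such line."""
    for line in netstat_output.splitlines():
        if interface in line:              # like grep: any line containing the text
            fields = line.split()          # like awk: split on runs of whitespace
            if len(fields) >= column:
                return int(fields[column - 1])
            return None                    # first match has too few fields
    return None


# Made-up sample in the shape of `netstat -ib` output on OSX.
SAMPLE = """\
Name  Mtu   Network     Address            Ipkts Ierrs      Ibytes   Opkts Oerrs    Obytes  Coll
lo0   16384 <Link#1>                        1200     0      340000    1200     0    340000     0
en0   1500  <Link#4>    aa:bb:cc:dd:ee:ff  90000     0 49760820760   60000     0  81000000     0
"""

reading = incoming_bytes(SAMPLE)
```

To feed it live data, the text could come from subprocess.run(["netstat", "-ib"], capture_output=True, text=True).stdout instead of SAMPLE.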

Now, we can use a simple Python script to look at how much data was downloaded, on a minute-by-minute basis. The following script simply does two things: it first creates a list of integers from the input log file and then creates a new list that is the difference of every two consecutive elements in the first list.

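import matplotlib.pyplot as plt

# Each line of network.log is the cumulative incoming byte count at that minute.
with open('network.log') as log_file:
    byte_counts = [int(line) for line in log_file if line.strip()]

# Bytes downloaded per minute: difference between consecutive readings.
byte_deltas = [byte_counts[i+1] - byte_counts[i] for i in range(len(byte_counts)-1)]

plt.plot(byte_deltas)
plt.show()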

And this is what the output finally looks like. The gap between 50 and 250 on the x-axis is because I was sleeping and therefore not using the network. Note that the x-axis represents the time in minutes and the y-axis represents the amount of data, in bytes, downloaded every minute. 1e7 bytes on the y-axis is about 10 MB of data downloaded in a minute, which was probably me watching YouTube videos.
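To make the work-versus-YouTube idea a bit more concrete, here is a minimal sketch that labels each minute by how much was downloaded. The thresholds (10 KB and 5 MB per minute) and the label names are arbitrary assumptions for illustration, not measured values, and the three readings are the sample values from network.log shown earlier, assumed to be consecutive minutes.

```python
# Hypothetical thresholds, in bytes per minute. Tune them against your own log.
IDLE_BELOW = 10_000        # assumption: below ~10 KB/min nobody is at the keyboard
HEAVY_ABOVE = 5_000_000    # assumption: above ~5 MB/min is probably streaming video


def label_minute(bytes_downloaded):
    """Put one minute of traffic into a coarse bucket."""
    if bytes_downloaded < IDLE_BELOW:
        return "idle"
    if bytes_downloaded >= HEAVY_ABOVE:
        return "heavy"
    return "light"


# Sample cumulative readings taken from the log excerpt.
readings = [49760820760, 49771505187, 49771559221]

# Per-minute downloads are the differences between consecutive readings.
deltas = [later - earlier for earlier, later in zip(readings, readings[1:])]
labels = [label_minute(d) for d in deltas]

for delta, label in zip(deltas, labels):
    print(f"{delta / 1e6:.2f} MB -> {label}")
```

With these guessed thresholds, a minute of Google/SO searches would land in "light" and a minute of video in "heavy", but where exactly to draw the lines is something only your own data can tell you.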
Having plotted it, I think it'll make more sense to use a log-scaled y-axis, since the data contains both extremely heavy and very light usage.
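A log-scaled version of the plot could be sketched as follows. Two choices here are assumptions rather than anything from the original script: negative differences are clamped to zero (on the guess that a negative value means the counter was reset, for example after a reboot), and 1 is added to every value before plotting so that idle minutes with 0 bytes still show up on a log axis.

```python
def bytes_per_minute(counters):
    """Differences between consecutive cumulative readings.
    Negative differences are clamped to 0 (assumed counter reset)."""
    return [max(later - earlier, 0) for earlier, later in zip(counters, counters[1:])]


def plot_log_scaled(deltas):
    """Plot per-minute downloads on a logarithmic y-axis."""
    # Imported here so bytes_per_minute can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    plt.plot([d + 1 for d in deltas])   # +1 keeps zero-byte minutes plottable
    plt.yscale('log')
    plt.xlabel('time (minutes)')
    plt.ylabel('bytes downloaded per minute (+1)')
    plt.show()


# Usage, assuming network.log sits in the current directory:
#   with open('network.log') as f:
#       counters = [int(line) for line in f if line.strip()]
#   plot_log_scaled(bytes_per_minute(counters))
```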

Now, there's another way to achieve the same thing. Instead of manually running the above loop, I can use cron to run the command every minute and append the output to a log file. cron is a way to schedule and automate things on Unix-based systems.

crontab -e

will open the cron jobs associated with your user account in an editor. Add the following line to the end of the file and you should get the same results as before.


*/1 * * * * /sbin/netstat -ib | /bin/grep -e "en0" -m 1 | /usr/bin/awk '{print $7}' >> ~/network.log

The meaning of the *s in the line is explained in the file that is displayed after you run the crontab command. The */1 tells the system to run the command every minute, and the *s that follow tell the system to run the command every hour of the day, every day of the month, every month of the year and every day of the week. You will notice that we changed netstat to /sbin/netstat. As I understand it, unlike bash, cron runs commands with a minimal PATH and so is not able to automatically figure out where the netstat command lives. Similarly, grep changed to /bin/grep and awk to /usr/bin/awk for the same reason. Running which netstat (and likewise for grep and awk) will tell you the right paths on your system.
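If the five fields are hard to keep straight, this small sketch splits a crontab line into its named schedule fields and the command. The function name split_cron_line and the example entry (which just runs /bin/date into a hypothetical ~/cron-test.log) are illustrative assumptions, and the sketch only splits the line; it does not validate or interpret the schedule.

```python
# Order of the five schedule fields in a crontab line.
FIELD_NAMES = ["minute", "hour", "day of month", "month", "day of week"]


def split_cron_line(line):
    """Split a crontab entry into ({field name: value}, command)."""
    # Split on whitespace at most 5 times so the command keeps its own spaces.
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(FIELD_NAMES, parts[:5]))
    return schedule, parts[5]


# Hypothetical entry, used only to show the field layout.
entry = "*/1 * * * * /bin/date >> ~/cron-test.log"
schedule, command = split_cron_line(entry)

for name, value in schedule.items():
    print(f"{name:>12}: {value}")
print(f"{'command':>12}: {command}")
```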

Now, I need to figure out how to create a new file every day, appropriately named, in which the output of the command is logged. I wasn't able to spend enough time this weekend, which for the most part was spent scrolling through Twitter and Facebook so maybe next weekend. Until next time ...

Note : As always, highlighting was done using hilite.me.
