
Using my internet usage to understand my daily activity cycle

Well, I had a little bit of work to do on Friday night, mainly because I didn't get much work done during the day. And I tried a couple of other ideas I had in mind but they didn't exactly pan out so I finally settled on this.

Coming to the point, I use Google and Stack Overflow on a daily basis at work. In fact, it can crudely be said that if I'm using the internet, then I'm working, and if I'm not using the internet, I'm either sleeping or watching something. I know what you're going to say: what if I'm wasting time on the internet, reading articles or, worse, watching videos on YouTube? To that I'll say that, from the internet usage itself, I should be able to distinguish between using the internet for work, i.e. Google/SO, and wasting my time watching videos on YouTube. Now, let's look at how this can be done. I'm going to talk in the context of OS X and Linux-based systems. I didn't play around with a Windows system. Maybe another time.

while :; do netstat -ib | grep -e "en0" -m 1 | awk '{print $7}'; sleep 60; done >> ~/network.log

The -ib arguments to the netstat command display the incoming and outgoing data in bytes. We pipe the output of this command through grep to look for en0, which represents the wired connection. This might be called different things on different systems; you'll have to run netstat -i or ifconfig to figure out what the wired and wireless connections are called on your OS. We then pipe that line to awk, which picks out the 7th column, the cumulative count of incoming bytes. The bash loop that wraps this command runs it every 60 seconds and appends the output to a file. The output looks like


Note that the above command works on OS X. If you're running a Linux system, you will have to drop the b argument to the netstat command and change the 7 to a 4 in awk's print statement.
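As an aside, on Linux there's also a way to get the same counter without netstat at all: the kernel exposes per-interface traffic statistics in /proc/net/dev. Here's a rough sketch of reading it directly; the interface name is an assumption, so substitute whatever ifconfig reports on your machine.

```python
# Linux keeps per-interface traffic counters in /proc/net/dev,
# so we can read the cumulative received-byte count directly.
def incoming_bytes(interface):
    with open("/proc/net/dev") as stats:
        for line in stats:
            name, _, counters = line.partition(":")
            if name.strip() == interface:
                # the first counter after the colon is cumulative received bytes
                return int(counters.split()[0])
    raise ValueError("no interface named " + interface)

# "lo" is the loopback interface, which every Linux box has;
# replace it with your wired/wireless interface, e.g. "eth0"
print(incoming_bytes("lo"))
```

Appending this number to a file once a minute gives you the same log the netstat loop produces.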

Now, we can use a simple Python script to look at how much data was downloaded on a minute-by-minute basis. The following Python simply does two things: it first creates a list of integers from the input log file and then creates a new list that is the difference of every two consecutive elements in the first list.

import matplotlib.pyplot as plt

# cumulative incoming byte counts, one sample per minute
byte_counts = [int(line) for line in open('network.log')]
# bytes downloaded during each one-minute interval
byte_deltas = [byte_counts[i + 1] - byte_counts[i] for i in range(len(byte_counts) - 1)]

plt.plot(byte_deltas)
plt.show()


And this is what the output finally looks like. The gap between 50 and 250 on the x-axis is because I was sleeping and therefore not using the network. Note that the x-axis represents the time in minutes and the y-axis represents the amount of data, in bytes, downloaded every minute. 1e7 bytes on the y-axis translates to roughly 10 MB downloaded in a minute, which I was probably doing because I was watching YouTube videos.
Having plotted it, I think it'll make more sense to use a log-scaled y-axis to better represent data that contains both extremely heavy and very light usage.
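A quick sketch of what that log-scaled plot would look like in code. The byte counts below are made up for illustration, and the Agg backend is only there so the script can run headless; in an interactive session you'd just call plt.show() instead of saving to a file.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs without a display
import matplotlib.pyplot as plt

# made-up per-minute byte counts mixing heavy (video) and light (idle) usage
byte_deltas = [250, 12_000_000, 4_300, 0, 9_800_000, 600]

plt.plot(byte_deltas)
plt.yscale("log")  # zero-byte minutes simply fall off the bottom of a log axis
plt.xlabel("Time (minutes)")
plt.ylabel("Bytes downloaded per minute")
plt.savefig("network_usage_log.png")
```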

Now, there's another way to achieve the same thing. Instead of manually running the above command, I can use cron to run it every minute and append the output to a log file. cron is a way to automate things on Unix-based systems.

crontab -e

will open the cron jobs associated with your user account in an editor. Add the following line to the end of the file and you should get the same results as before.

*/1 * * * * /sbin/netstat -ib | /bin/grep -e "en0" -m 1 | /usr/bin/awk '{print $7}' >> ~/network.log

The meanings of the *s in the line are explained in the file that opens when you run the crontab command. The */1 tells the system to run the command every minute, and the *s that follow tell it to run the command every hour of the day, every day of the month, every month of the year and every day of the week. You will also notice that we changed netstat to /sbin/netstat. As I understand it, unlike bash, cron is not able to automatically figure out where the netstat command is defined. Similarly, grep changed to /bin/grep for the same reason.
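If you're not sure what the absolute paths are on your own machine, `which` will tell you. Shown here for grep and awk; run the same for netstat, since its location varies between systems.

```shell
# cron runs with a minimal PATH, so every command in the crontab line
# should be spelled out by absolute path; `which` prints that path
which grep
which awk
```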

Now, I need to figure out how to create a new, appropriately named file every day, into which the output of the command is logged. I wasn't able to spend enough time on it this weekend, which for the most part was spent scrolling through Twitter and Facebook, so maybe next weekend. Until next time ...

Note : As always, highlighting was done using
