
A tale of two trains: The Indian Railways

Last week, I started collecting the running status of a few (<10) trains every day. I wrote a blog post last week about how I was collecting the data, if you want to know more. Now, let's look at what I've collected so far.

(Open the following images in a new page to take a better look at which stations are the most problematic and to understand the general trend better.)

Train 18029 - runs from Lokmanyatilak (Mumbai) to Shalimar. This train is mostly representative of what happens with the rest of the trains discussed below. There are stations en route where the train makes up for lost time, and then it loses any gains made. But, for the most part, I guess the delays are acceptable, given that they're within an hour of the expected arrival time.
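To make "makes up for lost time" a bit more concrete, here is a minimal sketch of how the change in delay between consecutive stations could be computed. This is not the code behind the plots; the function name `delay_changes`, the station labels and the delay values are all made up for illustration.

```python
def delay_changes(stops):
    """Return (station, change_in_delay) for every stop after the first.

    `stops` is a list of (station, delay_in_minutes) pairs in route order.
    A negative change means the train recovered time on the way into
    that station; a positive change means it lost more time.
    """
    return [(cur[0], cur[1] - prev[1]) for prev, cur in zip(stops, stops[1:])]


# Hypothetical delays (in minutes) for a single run; not real data.
stops = [("A", 0), ("B", 25), ("C", 10), ("D", 40), ("E", 35)]

changes = delay_changes(stops)

# Stations where the train arrived less late than at the previous stop.
recovered_at = [station for station, change in changes if change < 0]

# The "acceptable" check from the text: every delay within an hour.
within_an_hour = all(delay <= 60 for _, delay in stops)
```

With the sample values, the train recovers time into C and E but loses more than it gained on the leg into D, which is the gain-then-loss pattern described above.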

Train 12809 - runs from Mumbai CST to Howrah JN. This train was a little surprising because it's different from the rest of the lot. The train almost always makes up for delays at the start of the route. There are a few places where there's a drastic reduction in delay, but the gains are offset a few stations later (this happens thrice)!

Train 12322 - runs from Mumbai CST to Howrah JN. This train displays two interesting trends. The first is that even though there are stations en route where the train makes up for lost time (twice), it gets delayed again almost immediately. The second interesting trend is that beyond a certain point en route, the delay persists, and in 2 of 4 cases, the train can't make up for lost time.

Train 12622 - runs from New Delhi to Chennai Central. Can't complain about this train.

Train 12616 - runs from Delhi to Chennai Central. The interesting thing to note here is that there are points en route where the train makes up for lost time, but it gets delayed again almost immediately, negating any reduction in delay.

Train 12424 - runs from New Delhi to Dibrugarh Town via Guwahati. This train is just sad. If it's late, at no point en route does it show any prospect of making up lost time.

Train 14056 - runs from Delhi to Dibrugarh via Guwahati. The running status of the train looks a little weird, doesn't it? After a certain point, the delays become very predictable instead of random. That is because I was asking for the running status of the train at the wrong time - when the train was still en route. Of course, if I ask for the running status while a train is en route, all I will get is the estimated delay at future stations, which is the reason behind the long horizontal lines followed by dips.

Train 15910 - runs from Lalgarh JN to Dibrugarh via Guwahati. The running status of this train shows the same behavior as the earlier one (14056), i.e. asking for the running status while the train is still en route WILL give me faulty estimates of delay beyond the current position of the train. And of course, it's in the Indian Railways' best interests to estimate no delay instead of providing more accurate estimates.
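One way to keep those estimates out of the plots would be to drop every station the train could not have reached yet when the status was fetched. The snippet below is only a sketch of that idea, not the collection code. It assumes each record carries a scheduled arrival time and a delay in minutes, and the field names (`station`, `scheduled_arrival`, `delay_minutes`), the function name `observed_only` and the sample values are all hypothetical.

```python
from datetime import datetime, timedelta


def observed_only(records, queried_at):
    """Keep only the stations the train had reached by `queried_at`.

    A station counts as reached when its scheduled arrival plus the
    reported delay is not later than the time of the query. Anything
    after that moment can only be an estimate, so it is dropped.
    """
    kept = []
    for rec in records:
        arrival = rec["scheduled_arrival"] + timedelta(minutes=rec["delay_minutes"])
        if arrival <= queried_at:
            kept.append(rec)
    return kept


# Hypothetical running status fetched at 11:00 while the train is en route.
records = [
    {"station": "S1", "scheduled_arrival": datetime(2017, 1, 1, 8, 0), "delay_minutes": 20},
    {"station": "S2", "scheduled_arrival": datetime(2017, 1, 1, 10, 0), "delay_minutes": 45},
    {"station": "S3", "scheduled_arrival": datetime(2017, 1, 1, 12, 0), "delay_minutes": 45},
]
queried_at = datetime(2017, 1, 1, 11, 0)

reached = [rec["station"] for rec in observed_only(records, queried_at)]
```

Here S3 is discarded because its arrival (12:45) is still in the future at query time. The simpler fix is to ask for the running status only after the train has finished its journey.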

That's all for now, folks. I know, we didn't learn too much about why the delays are being caused or what routes lead to the most delay, but we'll get there. I think. I'll try. I'll post the code I used to analyze the data and generate the plots tomorrow. If you can glean anything more from the plots above, or have any other comments that you'd like to pass on to me, I'm all ears.
