Dude, Where's my Train? - The Indian Railways.
I have a friend who was traveling from New Delhi to Guwahati and, due to certain constraints, she had to take train number 12502. She had planned her onward travel from Guwahati on the assumption that she would reach there at the expected arrival time. You all know where this is going. The train was late and she had to change her travel plans. We have all either known someone who went through this or gone through it ourselves. Usually, trains run by the Indian Railways are not more than an hour late. There are ones that run perfectly on time too. And then there are trains that are multiple hours late, sometimes even more than 6 hours! Which I don't think is acceptable. And because I had nothing better to do on a Sunday, I set about doing something about it.
If you guys have read a few of my earlier blog posts, you know where this is going. I'm going to write some code that will help me automate something. Or get some data. Or make a plot or a map. This too will be more of the same, in the context of trains run by the Indian Railways.
Let's start with something simple - trains that are cancelled. Every day, the Indian Railways announces Fully/Partially Cancelled Trains for that day. It doesn't state a reason as to why they were cancelled. And I never really had a reason to check this list before. Until now.
Let me first state what I want to do. I want to see if there are trains that have been cancelled every day. And then look for reasons as to why they might be. Think about it. Why would the Indian Railways even list a train if it's being cancelled every day? Are there costs involved with taking a train off of service or rotation? Are there costs involved with maintaining a train, even though it's being cancelled every day?
To find out, let's write some code. One other way to find out would be to manually make a list of trains cancelled every day from this website but I'm wayyy too lazy for that. In order to automate the task of getting every day's list of cancelled trains, I used the RailwayAPI website. These people are not affiliated with the Indian Railways but they seem to be offering most of the information I need. I've cross-checked that the list RailwayAPI returns is the same as the one the Indian Railways displays, so the data from RailwayAPI seems to be correct. With that, let's move on to some code.
from api_key import API_KEY as MY_API_KEY
import urllib
import json
from datetime import datetime

today = datetime.today()
day = today.day
month = today.month
year = today.year

URL_TEMPLATE = "http://api.railwayapi.com/cancelled/date/{day}-{month}-{year}/apikey/{APIKEY}/"
url = URL_TEMPLATE.format(day=day, month=month, year=year, APIKEY=MY_API_KEY)

response = urllib.urlopen(url)
json_response = response.read()
response = json.loads(json_response)

filename = "{day}-{month}-{year}.can".format(day=day, month=month, year=year)
with open(filename, 'w') as f:
    for train in response['trains']:
        f.write("{} \n".format(train['train']['number']))
Let me explain what is happening in the above code. The API_KEY in the first line is what lets me access their data. The same way Facebook recognizes you using a username and password, the RailwayAPI people recognize me using the API_KEY. I had to register with them to get my API_KEY and I can't display it publicly, which is why I'm importing it as a variable that was defined in another file. You too can get one by signing up with them.
Moving on. The URL_TEMPLATE is what I need to query to get information on cancelled trains. You can see that there are placeholders for the date and the API key in the URL, which are filled in in the next step. The request is made using the urllib library available in the Python Standard Library. The next few steps are simply making the request, reading the response and converting it from JSON text into a Python dictionary.
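The script above is written for Python 2, where urlopen lives directly in urllib. If you are on Python 3, the same request step could look roughly like the sketch below. It assumes the same URL layout as the script; build_url and fetch_cancelled are names I made up for illustration, and the timeout value is arbitrary.

```python
import json
from datetime import date
from urllib.request import urlopen

# Same URL layout as the original script.
URL_TEMPLATE = "http://api.railwayapi.com/cancelled/date/{day}-{month}-{year}/apikey/{APIKEY}/"


def build_url(day: date, api_key: str) -> str:
    """Fill the date and key placeholders of URL_TEMPLATE for one calendar day."""
    return URL_TEMPLATE.format(
        day=day.day, month=day.month, year=day.year, APIKEY=api_key
    )


def fetch_cancelled(day: date, api_key: str, timeout: float = 30):
    """Request the cancelled-train list for `day` and decode the JSON body.

    Hypothetical helper: it assumes the service answers with UTF-8 JSON.
    """
    with urlopen(build_url(day, api_key), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a real key and network access):
# data = fetch_cancelled(date.today(), MY_API_KEY)
```

Keeping the URL building separate from the network call makes it easy to check the URL by eye before spending an API request on it.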
The last few steps involve writing the response to a file. The complete response contains information about where the train departs from and what its destination is, what its number is and so on. I don't need all of that information. I just want the train number. Which is what I'm writing to the file in the last line. You will need to look at the response yourself to understand its structure.
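To make the shape of the response a little more concrete, here is a minimal sketch of the extraction step on its own. The only structure I'm relying on is what the script already uses, a 'trains' list whose entries carry a 'train' dictionary with a 'number'. The sample_response below is a hand-written stand-in with every other field left out, not a real API reply, and the function names are made up for illustration.

```python
def extract_train_numbers(response):
    """Return the train numbers found in a decoded cancelled-trains response."""
    return [str(entry["train"]["number"]) for entry in response.get("trains", [])]


def write_numbers(numbers, filename):
    """Write one train number per line, like the .can files from the script."""
    with open(filename, "w") as f:
        for number in numbers:
            f.write("{}\n".format(number))


# Hand-written stand-in for a decoded response (hypothetical, trimmed to
# the two keys the script actually reads).
sample_response = {
    "trains": [
        {"train": {"number": "18235"}},
        {"train": {"number": "18236"}},
    ]
}

numbers = extract_train_numbers(sample_response)
```

Converting each number with str() means a numeric value in the JSON ends up as text too, so the later comparisons between days don't trip over mixed types.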
All I had to do now was run this code every day. But because I'm too lazy to run it manually every day, I automated that too. Cron comes to the rescue. I mentioned cron in one of my earlier posts. It's a way to run tasks/jobs periodically on Linux/OS X. I simply had to add
01 00 * * * /usr/bin/python /path/to/script/query_for_cancelled.py
to my list of automated cron tasks/jobs, which can be accessed using
crontab -e
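Again, all I'm doing is calling the script query_for_cancelled.py using Python (/usr/bin/python is the full path to the executable) at 00:01 (that is, 12:01 AM) every day. The first two fields of the cron entry are the minute and the hour, and the three asterisks mean every day of the month, every month and every day of the week. There ends the code.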
Now for a preliminary look at the results. I queried for the trains cancelled on August 03, 04, 05 and 06 and then picked out the trains that were cancelled on all of those days, a few of which are - 18235, 18236, 19943, 19944, 22183, 22184, 22185, 22186, 24041, 24042. Notice that there are 5 pairs in total, where the two numbers in a pair differ only in the last digit. If you don't know already, a change in the last digit differentiates the train from A to B from the one from B to A. All of those trains are a little weird. Especially the last one - 24041 and 24042. It's a train that is supposed to travel 25 km. Umm, what?
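Getting the trains cancelled on all of those days is just a set intersection over the daily files. Here is a rough sketch of how that could be done, assuming one .can file per day with one train number per line, as the script writes them. The function names are made up for illustration, and the pairing rule (neighbouring numbers that share everything but the last digit, which differs by one) is only my heuristic for spotting A to B / B to A pairs.

```python
def read_cancelled(path):
    """Read one day's .can file into a set of train-number strings."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def cancelled_every_day(paths):
    """Return the train numbers that appear in every one of the given files."""
    daily_sets = [read_cancelled(path) for path in paths]
    if not daily_sets:
        return set()
    return set.intersection(*daily_sets)


def pair_up(numbers):
    """Pair neighbouring numbers that differ only in the last digit, by one.

    Heuristic: numbers are kept as strings so leading zeros survive; this
    assumes all of them have the same number of digits.
    """
    ordered = sorted(numbers)
    pairs = []
    i = 0
    while i < len(ordered) - 1:
        a, b = ordered[i], ordered[i + 1]
        if a[:-1] == b[:-1] and int(b[-1]) - int(a[-1]) == 1:
            pairs.append((a, b))
            i += 2  # both members are used up
        else:
            i += 1
    return pairs


# Usage with hypothetical file names for the four days:
# common = cancelled_every_day(["3-8-YYYY.can", "4-8-YYYY.can", ...])
# print(pair_up(common))
```

Pairing neighbours in sorted order, rather than grouping by the first four digits, keeps 22183/22184 and 22185/22186 as two separate pairs instead of lumping them into one group of four.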