A network of physicists at IIT Madras

I started to find networks interesting, especially because of the insights they can provide into the systems they describe. Earlier, I worked on making a network of universities based on co-authorship of publications. Studying such networks and their evolution can be helpful. For example, if an ongoing multi-university collaboration is succeeding without the knowledge and support of the host universities, such an analysis can be a way to lobby for official support.

Along similar lines, I created a new network of physicists at the Department of Physics at the Indian Institute of Technology Madras (which is my alma mater). They revamped the department's website, specifically the Recent Publications page, which is updated with publications of the faculty in the department. As you can see from the table, each row/paper contains a list of authors. By collecting such lists, we can make a network which shows who collaborates with whom and, without prior knowledge, take a guess at which labs collaborate with which others, and so on.
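To make the idea concrete, here is a minimal sketch of how per-paper author lists turn into co-authorship links. The author names and the `coauthor_edges` helper are made up for illustration and are not taken from the department's page; the sketch also counts how many papers each pair shares, which is an extra and not something the network in this post records.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author lists, one list per paper (names are made up).
papers = [
    ["a. rao", "b. iyer", "c. nair"],
    ["a. rao", "b. iyer"],
    ["d. menon"],
]


def coauthor_edges(papers):
    """Count how many papers each unordered pair of authors shares."""
    edges = Counter()
    for authors in papers:
        # sorted(set(...)) drops duplicate names and fixes the order
        # of each pair, so (x, y) and (y, x) are counted together.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


edges = coauthor_edges(papers)
for (first, second), shared in sorted(edges.items()):
    print(first, "--", second, "shared papers:", shared)
```

A single-author paper such as the last one contributes no links at all, so that author only shows up in the network if they are added as an isolated node.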

I used the following Python code to extract the lists and construct a network. The network can be seen on my website. It was easier to display it there than in this blog post. Hover over the individual nodes on the graph to get the name of the faculty member or student the node represents.

import json

import pandas
import networkx as nx
from networkx.readwrite import json_graph


url_template = 'https://physics.iitm.ac.in/researchinfo?page={}'
authorlist = []


for i in range(8):
    # we can pass a url as the first argument to pandas.read_html
    # and it returns a list of data frames
    df_list = pandas.read_html(url_template.format(i),
                               header=0,
                               index_col=0
    )
    df = df_list[0]

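    # column containing author names needs to be cleaned
    df.Authors = df.Authors.str.lower()
    df.Authors = df.Authors.str.strip()
    # regex=False so that '*' and '&' are treated as literal characters
    df.Authors = df.Authors.str.replace('*', ' ', regex=False)
    # match "and" only as a whole word, so names containing "and" survive
    df.Authors = df.Authors.str.replace(r'\band\b', ',', regex=True)
    df.Authors = df.Authors.str.replace('&', ',', regex=False)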

    # Split column containing authors on ","
    # split is a data frame i.e 2D array
    split = df['Authors'].str.split(u',', expand=True)
    split.columns = ['Authors_split_{0}'.format(i)
                     for i in range(len(split.columns))]
   
    # strip author names of whitespaces
    for column in split:
        split[column] = split[column].str.strip()

    # each row contains the authors of a paper
    # the row might contain NaNs, which is why we use dropna
    # note: the range stops one short, so the last row of each page is skipped
    for row in range(len(split)-1):
        authorlist.append(list(split.iloc[row].dropna()))


G = nx.Graph()

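# link each author to the other authors on each paper
for authors in authorlist:
    # there might be empty strings or whitespaces in the author list
    authors = [name for name in authors if name.strip()]
    # add every author as a node so single-author papers are not lost
    G.add_nodes_from(authors)
    for pos, node1 in enumerate(authors):
        # start at pos + 1 so that an author is not linked to themselves
        for node2 in authors[pos + 1:]:
            if node1 != node2:
                G.add_edge(node1, node2)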

# label each node with the author's name
# (G.nodes replaces G.node, which was removed in networkx 2.4)
for n in G:
    G.nodes[n]['name'] = n

# draw the graph using networkx's Graph object
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, node_size=100, node_color='blue')
nx.draw_networkx_edges(G, pos, edge_color='green')
nx.draw_networkx_labels(G, pos, font_color='red')

# convert the Graph object into a JSON-serialisable dict
# the resulting JSON file is what D3 reads to draw the network
d = json_graph.node_link_data(G)
with open('force.json', 'w') as f:
    json.dump(d, f)

The code is highlighted and formatted using hilite.me.
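As a first, rough guess at which authors belong together, one can group the network into connected components, i.e. sets of authors linked by some chain of co-authorships. The sketch below does this in plain Python on made-up names; `collaboration_groups` is a hypothetical helper, and a connected component is only a crude proxy for a lab, since a single joint paper is enough to merge two groups.

```python
from collections import defaultdict, deque


def collaboration_groups(edges):
    """Group authors into connected components using breadth-first search."""
    adjacency = defaultdict(set)
    for first, second in edges:
        adjacency[first].add(second)
        adjacency[second].add(first)

    seen = set()
    groups = []
    for start in adjacency:
        if start in seen:
            continue
        # explore everything reachable from this unvisited author
        seen.add(start)
        queue = deque([start])
        group = set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        groups.append(group)

    # largest groups first
    return sorted(groups, key=len, reverse=True)


# Hypothetical co-authorship links (names are made up).
toy_edges = [
    ("a. rao", "b. iyer"),
    ("b. iyer", "c. nair"),
    ("d. menon", "e. das"),
]
groups = collaboration_groups(toy_edges)
for group in groups:
    print(sorted(group))
```

On the networkx graph built above, `nx.connected_components(G)` yields the same kind of grouping directly.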
