### A network of physicists at IIT Madras

I have come to find networks interesting, especially for the insights they can give into the system they describe. Earlier, I built a network of universities based on co-authorship of publications. Studying such networks and how they evolve can be useful. For example, if a multi-university collaboration is thriving without the knowledge or support of the host universities, this kind of analysis can help make the case for official support.

Along similar lines, I created a network of physicists at the Department of Physics at the Indian Institute of Technology Madras (my alma mater). The department revamped its website, in particular the Recent Publications page, which lists the publications of the department's faculty. As the table on that page shows, each row (one per paper) contains a list of authors. By collecting these lists, we can build a network that shows who collaborates with whom and, without any prior knowledge, make a guess at which labs collaborate with each other.
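To make the idea concrete, here is a minimal sketch on made-up data. The author names are hypothetical placeholders, and using `greedy_modularity_communities` to guess at labs is my assumption for illustration, not something the analysis in this post relies on.

```python
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical author lists, one list per paper (not real data).
papers = [
    ["a", "b", "c"],
    ["a", "b"],
    ["b", "c"],
    ["x", "y"],
    ["y", "z"],
    ["x", "z"],
    ["c", "x"],  # a single paper bridging the two groups
]

G = nx.Graph()
for authors in papers:
    G.add_nodes_from(authors)
    # every pair of co-authors on a paper gets an edge
    G.add_edges_from(combinations(authors, 2))

# Group authors into densely connected clusters, a rough proxy for labs.
communities = [set(c) for c in greedy_modularity_communities(G)]
print(communities)
```

On real data the clusters will be much less clean, so treat them as a starting guess rather than a list of labs.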

I used the following Python code to extract the author lists and construct the network. The network itself can be seen on my website, since it was easier to display it there than in this blog post. Hover over a node in the graph to see the name of the faculty member or student it represents.

```python
import json

import matplotlib.pyplot as plt
import networkx as nx
import pandas
from networkx.readwrite import json_graph

url_template = 'https://physics.iitm.ac.in/researchinfo?page={}'

authorlist = []
for page in range(8):
    # we can pass a url as the first argument to pandas.read_html
    # and it returns a list of data frames
    df_list = pandas.read_html(url_template.format(page), header=0, index_col=0)
    df = df_list[0]

    # column containing author names needs to be cleaned
    # (regex=False so that '*' is treated as a literal character)
    df.Authors = df.Authors.str.lower()
    df.Authors = df.Authors.str.strip()
    df.Authors = df.Authors.str.replace('*', ' ', regex=False)
    # ' and ' with a leading space, so names such as "anand" stay intact
    df.Authors = df.Authors.str.replace(' and ', ',', regex=False)
    df.Authors = df.Authors.str.replace('&', ',', regex=False)

    # split column containing authors on ","
    # split is a data frame, i.e. a 2D array
    split = df['Authors'].str.split(',', expand=True)
    split.columns = ['Authors_split_{0}'.format(j) for j in range(len(split.columns))]

    # strip author names of whitespace
    for column in split:
        split[column] = split[column].str.strip()

    # each row contains the authors of a paper
    # the row might contain NaNs, which is why we use dropna
    for row in range(len(split)):
        authorlist.append(list(split.iloc[row].dropna()))

G = nx.Graph()

# link each author to the other authors on each paper
for authors in authorlist:
    # there might be empty strings in the author list
    authors = [a for a in authors if a != '']
    G.add_nodes_from(authors)
    for pos, node1 in enumerate(authors):
        for node2 in authors[pos + 1:]:
            G.add_edge(node1, node2)

# label each node with the author's name
for n in G:
    G.nodes[n]['name'] = n

# draw the graph using networkx's drawing functions
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, node_size=100, node_color='blue')
nx.draw_networkx_edges(G, pos, edge_color='green')
nx.draw_networkx_labels(G, pos, font_color='red')
plt.show()

# convert the Graph object into a JSON object
# the JSON file is then used with D3
d = json_graph.node_link_data(G)
with open('force.json', 'w') as f:
    json.dump(d, f)
```
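The name-cleaning step is the easiest part to get wrong, so it can help to try it on a single string before pointing it at the live page. The sketch below is a plain-Python variant of that step, not the pandas code used above. The helper name `split_authors` and the sample names are hypothetical.

```python
import re


def split_authors(raw):
    """Split one raw author string into cleaned, lower-case names."""
    text = raw.lower().replace('*', ' ')
    # treat ',', '&' and the standalone word 'and' as separators
    parts = re.split(r',|&|\band\b', text)
    # collapse internal whitespace and drop empty fragments
    names = [' '.join(p.split()) for p in parts]
    return [n for n in names if n]


print(split_authors("A. Kumar*, B. Anand and C. Rao & D. Das"))
```

The word boundary `\b` around `and` keeps a name like "Anand" in one piece, which a bare substring replacement would not.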

The code is highlighted and formatted using hilite.me.