Arxiv author affiliations - Part II

Well, one part of the project is complete: I now have lists of the affiliations of co-authors for a number of papers. Next, I need to convert these into an edge-weighted graph, where the edge weights convey how connected various universities are in terms of co-authorship. Previously, I mentioned that I had information on all of the author affiliations but needed to group them by individual paper. A small tweak in the code was all that was needed to get that information. The necessary code follows.

# Written for Python 2 with BeautifulSoup 3
import urllib
from BeautifulSoup import BeautifulStoneSoup

# Fetch up to 1000 results matching "astro" from the arXiv API (Atom XML)
url = 'http://export.arxiv.org/api/query?search_query=all:astro&start=0&max_results=1000'
data = urllib.urlopen(url).read()
soup = BeautifulStoneSoup(data)

# Each <entry> element corresponds to one paper
entries = soup.findAll('entry')

# Build one list of affiliations per paper, skipping papers that list none
affiliationList = []
for entry in entries:
    affiliations = [tag.string for tag in entry.findAll('arxiv:affiliation')]
    if affiliations:
        affiliationList.append(affiliations)

There were some interesting preliminary results. I expected there to be more papers with two authors than papers with just a single author, but that turned out to be wrong. Barely 50% of the papers contained author affiliations. I have an IPython notebook at hand, but I'm trying to figure out a better way to present this information.
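
For anyone who wants to reproduce that kind of tally, here is a minimal sketch of how it might be done. It counts how many papers list one affiliation, two affiliations, and so on. The function name `affiliations_per_paper` and the sample data are made up for illustration; in practice the input would be `affiliationList` from the code above. Note that this counts listed affiliations, which is only a rough proxy for the number of authors.

```python
from collections import Counter


def affiliations_per_paper(affiliation_lists):
    """Map 'number of affiliations on a paper' -> 'number of such papers'."""
    return Counter(len(affiliations) for affiliations in affiliation_lists)


# Made-up sample standing in for affiliationList (not real arXiv data)
sample = [["MIT"], ["MIT", "Caltech"], ["ESO"]]
size_counts = affiliations_per_paper(sample)
```

Here `size_counts[1]` is the number of papers with a single listed affiliation, `size_counts[2]` the number with two, and so on.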

Either way, on to the next part of the project: I need to figure out how to convert the lists of affiliations first into a connected graph and then into an edge-weighted graph. After that, I will place the universities according to their location to better visualize things like continental differences.
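
One possible way to approach the edge-weighted graph, as a rough sketch rather than a finished solution: treat every pair of institutions appearing on the same paper as an edge, and use the number of shared papers as the weight. The function name `build_edge_weights` and the sample data are assumptions made for illustration, and the sketch assumes affiliation strings are already spelled consistently, which real arXiv data probably is not.

```python
from collections import Counter
from itertools import combinations


def build_edge_weights(affiliation_lists):
    """Count how many papers each pair of institutions shares."""
    weights = Counter()
    for affiliations in affiliation_lists:
        # Drop empty entries and duplicates within one paper, then sort so
        # each pair always appears in the same (alphabetical) order.
        unique = sorted(set(a.strip() for a in affiliations if a))
        for pair in combinations(unique, 2):
            weights[pair] += 1
    return weights


# Made-up sample standing in for affiliationList (not real arXiv data)
sample = [
    ["MIT", "Caltech"],
    ["MIT", "Caltech", "ESO"],
    ["ESO"],
]
edge_weights = build_edge_weights(sample)
```

Each key of `edge_weights` is an alphabetically ordered pair of institutions and each value is the number of papers they share. Single-affiliation papers contribute no edges. A dictionary like this could later be loaded into a graph library for layout and plotting.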
