Arxiv author affiliations - Part II
Well, one part of the project is complete now that I have lists of the affiliations of co-authors on a number of papers. I now need to convert these into an edge-weighted graph, where the edge weights convey how connected various universities are in terms of co-authorship. Previously, I mentioned that I had information on all of the author affiliations but needed to group them by individual paper. A small tweak to the code was all that was needed to get that information. The necessary code follows.
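import urllib  # Python 2; on Python 3, urlopen lives in urllib.request
from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3 XML parser

# Query the arXiv API for up to 1000 entries matching "astro".
url = 'http://export.arxiv.org/api/query?search_query=all:astro&start=0&max_results=1000'
data = urllib.urlopen(url).read()
soup = BeautifulStoneSoup(data)

# Each <entry> element in the feed corresponds to one paper.
test = soup.findAll('entry')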
affiliationList = []
for entry in test:
    # Keep only the papers that list at least one affiliation.
    affiliations = entry.findAll('arxiv:affiliation')
    if affiliations:
        affiliationList.append([tag.string for tag in affiliations])

There were some interesting preliminary results. I expected there to be more papers with two authors than papers with just a single author, but that turned out to be wrong. Barely 50% of the papers contained author affiliations. I have an IPython notebook at hand, but I'm trying to figure out a better way to present this information.
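As a rough illustration of how the per-paper counts mentioned above could be tallied, here is a minimal sketch. The function name affiliation_count_histogram and the sample data are made up for the example and are not real arXiv results. Note that it counts affiliation tags per paper, which is only a proxy for the number of authors, since an author may list several affiliations or none.

```python
from collections import Counter


def affiliation_count_histogram(affiliation_lists):
    """Map 'number of affiliations on a paper' to 'number of such papers'."""
    return Counter(len(affils) for affils in affiliation_lists)


# Hypothetical sample in the same shape as affiliationList:
# one inner list of affiliation strings per paper.
sample = [
    ["MIT"],
    ["MIT", "Caltech"],
    ["Caltech", "ESO", "MIT"],
    ["ESO"],
]

histogram = affiliation_count_histogram(sample)
for n_affiliations in sorted(histogram):
    print(n_affiliations, histogram[n_affiliations])
```

Passing affiliationList instead of sample should give the same kind of breakdown for the real data.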
Either way, on to the next part of the project, where I need to figure out how to convert the lists of affiliations first into a connected graph and then into an edge-weighted graph. After that, I will place the universities according to their location to better visualize things like continental differences.
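One possible way to get from the per-paper affiliation lists to edge weights is to count, for every pair of institutions, how many papers they share. The sketch below is only that, a sketch: the function name coauthorship_edge_weights and the sample data are hypothetical, and it assumes that identical strings mean the same institution. Real affiliation strings vary a lot in spelling, so they would likely need normalizing first.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edge_weights(affiliation_lists):
    """Count how many papers each pair of institutions shares.

    Returns a Counter keyed by (institution_a, institution_b) tuples,
    with the two names in sorted order so each pair has a single key.
    """
    weights = Counter()
    for affils in affiliation_lists:
        # Drop empty values and duplicates so that two authors from the
        # same institution on one paper do not produce a self-loop.
        institutions = sorted(set(a.strip() for a in affils if a))
        for pair in combinations(institutions, 2):
            weights[pair] += 1
    return weights


# Hypothetical sample in the same shape as affiliationList.
sample_papers = [
    ["MIT", "Caltech"],
    ["Caltech", "MIT", "ESO"],
    ["ESO"],
    ["MIT", "MIT"],
]

edge_weights = coauthorship_edge_weights(sample_papers)
for (a, b), weight in sorted(edge_weights.items()):
    print(a, b, weight)
```

The resulting dictionary is effectively an edge list with weights, so it could be loaded into whichever graph library ends up being used for the visualization.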