arXiv author affiliations using Python

So, I wanted to get author affiliation information from papers on arXiv. arXiv provides an API for querying its database in bulk. The API returns an Atom (XML) feed rather than HTML, so after fetching it I look for the 'arxiv:affiliation' tags in the XML data. Here's the code:

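# Python 2, using BeautifulSoup 3 (BeautifulStoneSoup is its XML parser)
import urllib
from BeautifulSoup import BeautifulStoneSoup

# Ask the arXiv API for the first 1000 results matching "astro"
url = 'http://export.arxiv.org/api/query?search_query=all:astro&start=0&max_results=1000'
data = urllib.urlopen(url).read()
soup = BeautifulStoneSoup(data)
#print(soup.prettify())
#list = soup.findAll('arxiv:affiliation')
#for i in range(len(list)):
#        print list[i].contents

# Collect the text of every affiliation tag in the whole feed
test = [tag.string for tag in soup.findAll('arxiv:affiliation')]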

Now, the problem I'm having is that this gives me the affiliations of all authors in one flat list, whereas I want them split into sets, one set per paper. That's the part I'm stuck on at the moment. Once I get it working, I can move on to the next part of this pet project: displaying the relations between universities based on authors.
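
One possible way to do the per-paper split is to walk the feed one entry at a time instead of searching the whole document at once. The sketch below is not what the code above does: it assumes Python 3 and the standard library's xml.etree.ElementTree in place of BeautifulSoup, and it assumes each paper is an Atom entry element whose author elements carry the 'arxiv:affiliation' tags. The function name affiliations_by_paper and the SAMPLE_FEED data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces used in the arXiv Atom feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Hypothetical, hand-written feed fragment (not real arXiv data).
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <title>Paper A</title>
    <author><name>A. One</name><arxiv:affiliation>Univ X</arxiv:affiliation></author>
    <author><name>B. Two</name><arxiv:affiliation>Univ Y</arxiv:affiliation></author>
    <author><name>C. Three</name><arxiv:affiliation>Univ X</arxiv:affiliation></author>
  </entry>
  <entry>
    <title>Paper B</title>
    <author><name>D. Four</name></author>
  </entry>
</feed>
"""


def affiliations_by_paper(feed_xml):
    """Return one (title, sorted unique affiliations) pair per entry."""
    root = ET.fromstring(feed_xml)
    papers = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS).strip()
        affiliations = set()
        # Only look inside this entry's authors, so papers stay separate.
        for author in entry.findall("atom:author", NS):
            for aff in author.findall("arxiv:affiliation", NS):
                if aff.text and aff.text.strip():
                    affiliations.add(aff.text.strip())
        papers.append((title, sorted(affiliations)))
    return papers


for title, affiliations in affiliations_by_paper(SAMPLE_FEED):
    print(title, affiliations)
```

To try this against the live API, the bytes returned by urllib.request.urlopen(url).read() in Python 3 should be usable as feed_xml directly. Authors who list no affiliation simply contribute nothing, so some papers end up with an empty list.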
