arXiv author affiliations using Python

So, I wanted to get author affiliation information from papers on arXiv. arXiv provides an API for bulk-querying its database. I query it and then look for the 'arxiv:affiliation' tag in the XML data that comes back. Here's the code:

# Python 2, using BeautifulSoup 3
import urllib
from BeautifulSoup import BeautifulStoneSoup
url = 'http://export.arxiv.org/api/query?search_query=all:astro&start=0&max_results=1000'
data = urllib.urlopen(url).read()
soup = BeautifulStoneSoup(data)
#print(soup.prettify())
#list = soup.findAll('arxiv:affiliation')
#for i in range(len(list)):
#        print list[i].contents
# Collect the text of every affiliation tag in the whole feed
test = [tag.string for tag in soup.findAll('arxiv:affiliation')]

Now, the problem I'm having is that I get the affiliations of all authors in one flat list, whereas I want them split into sets, one set per paper. That's where I'm stuck at the moment. Once I get that part working, I can move on to the next part of this pet project: displaying the relationships between universities based on their authors.
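
One possible way to get per-paper sets is to walk the feed one entry at a time instead of searching the whole document at once. The sketch below is not the code above: it assumes Python 3 and uses the standard library's xml.etree.ElementTree rather than BeautifulSoup. The function name affiliations_by_paper and the small inline sample feed are made up for illustration. It relies on the API returning an Atom feed where each paper is an <entry> and each <arxiv:affiliation> sits inside an <author>.

```python
import xml.etree.ElementTree as ET

# Namespaces used in arXiv API Atom feeds.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def affiliations_by_paper(atom_xml):
    """Return a list of (title, set_of_affiliations), one item per <entry>."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS).strip()
        # Only look at affiliations nested under this entry's authors.
        affiliations = {
            aff.text.strip()
            for aff in entry.findall("atom:author/arxiv:affiliation", NS)
            if aff.text and aff.text.strip()
        }
        papers.append((title, affiliations))
    return papers


# Tiny hand-written feed (hypothetical data) standing in for the API response.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <title>Paper A</title>
    <author><name>X</name><arxiv:affiliation>Univ One</arxiv:affiliation></author>
    <author><name>Y</name><arxiv:affiliation>Univ Two</arxiv:affiliation></author>
  </entry>
  <entry>
    <title>Paper B</title>
    <author><name>Z</name><arxiv:affiliation>Univ One</arxiv:affiliation></author>
    <author><name>W</name></author>
  </entry>
</feed>"""

for title, affs in affiliations_by_paper(SAMPLE):
    print(title, sorted(affs))
```

To try this on real data, the bytes returned by the API request could be passed in place of SAMPLE (in Python 3 that would be urllib.request.urlopen(url).read()). Authors without an affiliation are simply skipped, so a paper can end up with an empty set.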
