Arxiv author affiliations using Python

December 28, 2015

So, I wanted to get author affiliation information from papers on arXiv. arXiv provides an API for bulk-querying its database. The response comes back as an Atom XML feed, so after fetching it I look for the 'arxiv:affiliation' elements in the returned data. Here's the code:

# Python 2, with BeautifulSoup 3
import urllib
from BeautifulSoup import BeautifulStoneSoup

url = 'http://export.arxiv.org/api/query?search_query=all:astro&start=0&max_results=1000'

data = urllib.urlopen(url).read()
soup = BeautifulStoneSoup(data)
#print(soup.prettify())
#list = soup.findAll('arxiv:affiliation')
#for i in range(len(list)):
#    print list[i].contents

# Collect the text of every affiliation tag in the feed
test = [tag.string for tag in soup.findAll('arxiv:affiliation')]

Now, the problem I'm having is that I get the affiliations of all authors as one flat list, whereas I want them split into one set of affiliations per paper. That's where I'm stuck at the moment. Once I get that part working, I can move on to the next part of this pet project: displaying the relations between universities based on their authors.
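One possible way around the grouping problem is to walk the feed one paper at a time and only look for affiliations inside each paper's own entry. Here's a rough sketch of that idea, not a finished solution. Note that it swaps in Python 3's standard-library xml.etree.ElementTree instead of BeautifulSoup, and it assumes each paper is an Atom `<entry>` whose `<author>` elements may carry an `<arxiv:affiliation>` child. The function name `affiliations_by_paper` and the sample feed (with made-up papers and universities) are mine, purely for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespaces used in the arXiv API's Atom response.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def affiliations_by_paper(feed_xml):
    """Return one (title, set of affiliations) pair per <entry> in the feed."""
    root = ET.fromstring(feed_xml)
    papers = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS).strip()
        # Search only inside this entry, so affiliations stay tied to their paper.
        affiliations = {
            tag.text.strip()
            for tag in entry.findall("atom:author/arxiv:affiliation", NS)
            if tag.text and tag.text.strip()
        }
        papers.append((title, affiliations))
    return papers


# Hand-written sample feed (hypothetical data), not real arXiv output.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <title>Paper One</title>
    <author><name>A. Author</name><arxiv:affiliation>University X</arxiv:affiliation></author>
    <author><name>B. Author</name><arxiv:affiliation>University Y</arxiv:affiliation></author>
  </entry>
  <entry>
    <title>Paper Two</title>
    <author><name>C. Author</name></author>
  </entry>
</feed>
"""

for title, affiliations in affiliations_by_paper(SAMPLE_FEED):
    print(title, sorted(affiliations))
```

Papers whose authors list no affiliation simply end up with an empty set, so they are easy to filter out later. To try it on real data, the bytes returned by the API query above could be passed in place of SAMPLE_FEED.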
