arXiv author affiliations using Python
I want to get author affiliation information for papers on arXiv. arXiv provides an API for bulk-querying its database. I query it and then look for the 'arxiv:affiliation' element in the returned XML data. Here's the code:

    import urllib
    from BeautifulSoup import BeautifulStoneSoup

    url = 'http://export.arxiv.org/api/query?search_query=all:astro&start=0&max_results=1000'
    data = urllib.urlopen(url).read()
    soup = BeautifulStoneSoup(data)
    #print(soup.prettify())

    #list = soup.findAll('arxiv:affiliation')
    #for i in range(len(list)):
    #    print list[i].contents

    # collect the text of every affiliation tag in the whole feed
    test = [tag.string for tag in soup.findAll('arxiv:affiliation')]

The problem is that this gives me the affiliations of all authors in one flat list, whereas I want them split into sets, one set per paper, and that is where I'm stuck at the moment. Once I get that part, I can move on to the next part of this pet project, displaying these
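For illustration, here is a minimal sketch of the per-paper grouping described above. It assumes the API response is an Atom feed in which each paper is an entry element containing author elements, each with zero or more 'arxiv:affiliation' children. It uses Python 3 and the standard-library ElementTree parser rather than BeautifulSoup, and the sample feed and the function name affiliations_by_paper are hypothetical, made up so the example runs without network access.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for arXiv API Atom responses.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Hand-written sample standing in for a real API response (not real data).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <title>Hypothetical paper A</title>
    <author>
      <name>Author One</name>
      <arxiv:affiliation>Institute X</arxiv:affiliation>
    </author>
    <author>
      <name>Author Two</name>
      <arxiv:affiliation>Institute Y</arxiv:affiliation>
    </author>
  </entry>
  <entry>
    <title>Hypothetical paper B</title>
    <author>
      <name>Author Three</name>
    </author>
  </entry>
</feed>"""


def affiliations_by_paper(feed_xml):
    """Return one (title, [affiliations]) pair per <entry> in the feed."""
    root = ET.fromstring(feed_xml)
    papers = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS).strip()
        # Search only inside this entry, so affiliations stay grouped by paper.
        affiliations = [
            aff.text.strip()
            for aff in entry.findall("atom:author/arxiv:affiliation", NS)
            if aff.text
        ]
        papers.append((title, affiliations))
    return papers


for title, affiliations in affiliations_by_paper(SAMPLE_FEED):
    print(title, affiliations)
```

The key point is iterating over entries first and searching for affiliations inside each one, instead of searching the whole document at once. Entries whose authors list no affiliation simply yield an empty list.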