Posts

Showing posts from December, 2015

Pocket reading stats

Using the same Python code I used to look at my bookmarks, I looked at my reading habits on Pocket, and here are the results. I installed the app in 2014, so the number of articles I read in 2014 is lower than the number I read in 2015. The 2015 numbers should probably be my benchmark from this point on. These are my monthly reading habits. Apparently, I didn't read as much in February and March as I did in the following months. I had dug deep into two of my courses at that time, General Relativity and Ultrafast Lasers, which could be the reason. And I might have read less in September/October than in August or November because I was apping in those months. Maybe. Now we come to days of the month. I can't write anything meaningful about this. It would be better to look at weekly behavior, i.e. Monday through Sunday, and, if I'm correct, see that I read a lot more on Friday/Saturday/Sunday than on the other days of the week. And we finally...
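The weekly view mentioned above could be sketched roughly like this. It is only a sketch in Python 3, assuming each saved article comes with a Unix-epoch timestamp in seconds; the `sample` values and the function name `reads_per_weekday` are made up for illustration, and the days are computed in UTC so the result does not depend on the machine's time zone.

```python
from collections import Counter
from datetime import datetime, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def reads_per_weekday(timestamps):
    """Count items per weekday (Monday..Sunday) from epoch-second timestamps."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).weekday()  # 0 = Monday
        for ts in timestamps
    )
    # Always return all seven days, including days with zero items.
    return {name: counts.get(i, 0) for i, name in enumerate(DAY_NAMES)}


# Hypothetical timestamps, not real Pocket data.
sample = [1402120115, 1402206515, 1402638515]
weekday_counts = reads_per_weekday(sample)
```

Plotting `weekday_counts` as a bar chart would show directly whether Friday through Sunday dominate.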

Arxiv author affiliations - Part II

Well, one part of the project is complete: I now have lists of the affiliations of co-authors on a number of papers. I now need to convert these into an edge-weighted graph, where the edge weights convey how strongly connected various universities are in terms of co-authorship. Previously, I had mentioned that I had all of the author affiliations but needed to group them by individual paper. A small tweak in the code was all that was needed to get that information. The necessary code follows.

    import urllib
    from BeautifulSoup import BeautifulStoneSoup

    url = 'http://export.arxiv.org/api/query?search_query=all:astro&start=0&max_results=1000'
    data = urllib.urlopen(url).read()
    soup = BeautifulStoneSoup(data)
    test = [tag for tag in soup.findAll('entry')]

    affiliationList = []
    for i in range(len(test)):
            if test[i].findAll('arxiv:affiliation') != []:
                    affiliati...
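For the graph step, one possible approach is to count, for every pair of institutions, how many papers list both. The sketch below is Python 3 and assumes the per-paper affiliation lists already exist; the function name `coauthorship_edges` and the sample institutions are hypothetical, and this is just one way to define the edge weight.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(papers):
    """Map each unordered pair of affiliations to the number of shared papers.

    `papers` is a list of lists, one list of affiliation strings per paper.
    """
    edges = Counter()
    for affiliations in papers:
        # De-duplicate within a paper and sort so (a, b) and (b, a) coincide.
        unique = sorted(set(affiliations))
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


# Made-up example input, not real arXiv data.
papers = [
    ["MIT", "Caltech"],
    ["Caltech", "MIT", "IUCAA"],
    ["IUCAA"],
]
edges = coauthorship_edges(papers)
```

Each key in `edges` is an edge and each value its weight, which is the form most graph libraries accept as a weighted edge list.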

Arxiv author affiliations using Python

So, I wanted to get author affiliation information from papers on arXiv. arXiv provides an API to bulk query its database and get information. Following that, I look for the 'arxiv:affiliation' element in the XML data that comes back. Here's the code -

    import urllib
    from BeautifulSoup import BeautifulStoneSoup

    url = 'http://export.arxiv.org/api/query?search_query=all:astro&start=0&max_results=1000'

    data = urllib.urlopen(url).read()
    soup = BeautifulStoneSoup(data)
    #print(soup.prettify())
    #list = soup.findAll('arxiv:affiliation')
    #for i in range(len(list)):
    #        print list[i].contents

    test = [tag.string for tag in soup.findAll('arxiv:affiliation')]

Now, the problem I'm having is that I get the affiliations of all authors in one list, whereas I want to split them into sets of affiliations per paper, which I'm stuck on at the moment. Once I get that part, I can move on to the next part of this pet project, displaying these...
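One way to get the per-paper grouping is to walk the feed entry by entry and collect the affiliations inside each entry. The sketch below is Python 3 and uses the standard library's ElementTree instead of BeautifulStoneSoup, so it is not the code from this post. `SAMPLE_FEED` is a made-up stand-in for the API response, and the namespace URIs are the ones I understand the arXiv Atom feed to declare; treat both as assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes assumed to match the arXiv Atom feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Hypothetical miniature feed standing in for the real API response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <title>Paper A</title>
    <author><name>A. One</name><arxiv:affiliation>Univ X</arxiv:affiliation></author>
    <author><name>B. Two</name><arxiv:affiliation>Univ Y</arxiv:affiliation></author>
  </entry>
  <entry>
    <title>Paper B</title>
    <author><name>C. Three</name></author>
  </entry>
</feed>"""


def affiliations_per_paper(feed_xml):
    """Return one list of affiliation strings per entry that has any."""
    root = ET.fromstring(feed_xml)
    papers = []
    for entry in root.findall("atom:entry", NS):
        affiliations = [
            el.text.strip()
            for el in entry.findall(".//arxiv:affiliation", NS)
            if el.text
        ]
        if affiliations:  # skip entries without affiliation data
            papers.append(affiliations)
    return papers


papers = affiliations_per_paper(SAMPLE_FEED)
```

Searching inside each `entry` rather than across the whole document is what keeps the affiliations of different papers apart.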

Looking at my bookmarking habits

So, I have finally been able to get the time stamps out of the HTML file into which I had exported my bookmarks. But the preliminary analysis doesn't make much sense. I used the BeautifulSoup and time Python libraries to extract and make sense of the time stamps. Chrome stores the time stamps in this format - '1402120115'. You can look at this if you want to understand the time stamps. Now here are the weird parts:

1. It thinks that all of the bookmarks were made in 2014.
2. It thinks that all of the bookmarks were made in the 6th month, i.e. June.
3. It doesn't show any bookmarks on days 1-6 of the month.
4. The hour stamps are in GMT and not GMT+0530, which is Indian Standard Time.

The HTML file is available on the blog and here's the code -

    import numpy
    import matplotlib.pyplot as plt
    import BeautifulSoup
    import time

    soup = BeautifulSoup.BeautifulSoup(open('bookmarks.html'))
    allAttrs = [tag.attrs for tag in soup.findAll('a')]
    dates = [...
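As a quick sanity check on the hour issue, a value like '1402120115' can be read as seconds since the Unix epoch and converted with an explicit time zone. This is a Python 3 sketch using `datetime` rather than the `time` module used in the post; the helper name `parse_bookmark_timestamp` is made up, and reading the value as epoch seconds is an assumption about the export format.

```python
from datetime import datetime, timedelta, timezone

# Indian Standard Time is a fixed offset of +05:30 from UTC.
IST = timezone(timedelta(hours=5, minutes=30))


def parse_bookmark_timestamp(raw, tz=IST):
    """Convert an epoch-seconds string into a timezone-aware datetime."""
    return datetime.fromtimestamp(int(raw), tz=tz)


stamp_ist = parse_bookmark_timestamp("1402120115")
stamp_utc = parse_bookmark_timestamp("1402120115", tz=timezone.utc)
```

Under that assumption, the sample value lands on 7 June 2014, at 05:48 in UTC and 11:18 in IST, so passing the time zone explicitly shifts the hour stamps by the expected five and a half hours.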