Posts

Showing posts from January, 2017

On the market for assembled desktops, laptops and phones.

I am currently on my third laptop in 8 years. The one I have now is pretty powerful, but the last two weren't as powerful as I wished. They didn't have a dedicated GPU. They didn't have SSDs. Their CPUs were a generation old. Don't get me wrong, they were still very helpful, especially given that most of my time was spent programming and running long simulations. But I would've loved to be able to swap out the CPU or the SSD on my old laptop, instead of having to throw it away and buy a new one. If I had a desktop instead of a laptop, swapping out the old parts for new ones might have been easier. There are a number of people who still prefer assembling their desktop from scratch. And there are businesses that cater specifically to these customers. The good thing about assembling your own desktop, as I mentioned earlier, is the fact that you can swap out the old CPU/GPU with the new ones and replace the existing magnetic hard drive wit

On teaching Python to students at IIT Madras.

If you've been following my blog, you would've noticed my efforts to give talks on Python and get better at public speaking. In early December, I organized two workshops at SciPy India at IIT Bombay. In late December, I gave a talk at a local Python meetup. In early January, I attended the local PyLadies event. I wanted to do something similar at my alma mater, IIT Madras. It also gave me a reason to go back and visit some of my classmates who are now grad students at IITM and the professor I worked with as an undergrad. Coming to the point, I gave a number of sessions on Friday night and Saturday. The session on Friday night was on Git & GitHub. I used the same slides that I had used during the SciPy workshop, but we went further at IITM because we had internet connectivity. Towards the end, everyone had created a GitHub repository and everyone had pushed their local changes to GitHub. I didn't get time to get into what I call the advanced workflow, which involves branches and merging

Pandas download statistics, PyPI and Google BigQuery - Daily downloads and downloads by latest version

Inspired by this blog post: https://langui.sh/2016/12/09/data-driven-decisions/ , I wanted to play around with Google BigQuery myself. And the blog post is pretty awesome because it has sample queries. I mixed and matched the examples mentioned in the blog post, intent on answering two questions - 1. How many people download the Pandas library on a daily basis? Actually, if you think about it, it's more a question of how many times the Pandas library was downloaded in a single day, because the same person could've downloaded it multiple times. Or a bot could've. This was just a fun first query/question. 2. What is the adoption rate of different versions of the Pandas library? You might have come across similar graphs which show the adoption rate of various versions of Windows. Answering this question is actually important because the developers should have an idea of what the most popular versions are, see whether or not users are adopting new features/cha

Visualizing the PyPI Pandas download statistics using Tableau - Downloads by location

For some background, read my previous posts on the topic - simple queries to start understanding Pandas downloads ( https://rahulporuri.blogspot.in/2017/01/on-whos-downloading-pandas.html ), building up queries incrementally to understand Pandas downloads ( https://rahulporuri.blogspot.in/2016/12/pandas-download-statistics-pypi-and.html ) and using Tableau Public to visualize Pandas downloads by version over the last 6 months ( https://rahulporuri.blogspot.in/2017/01/visualizing-pypi-pandas-download.html ). Having visualized the total number of downloads per version per month of Pandas in the last post, we now come to the total number of downloads per month by location. The relevant query is SELECT STRFTIME_UTC_USEC(timestamp, "%Y-%m") AS yyyymm, country_code, COUNT(*) as total_downloads, FROM TABLE_DATE_RANGE([the-psf:pypi.downloads], DATE_ADD(CURRENT_TIMESTAMP(), -6, "month"), CURRENT_TIMESTAMP()) WHERE file.pro

Visualizing the PyPI pandas download statistics using Tableau - Downloads by version

For some background, read my previous posts on the topic - https://rahulporuri.blogspot.in/2017/01/on-whos-downloading-pandas.html and https://rahulporuri.blogspot.in/2016/12/pandas-download-statistics-pypi-and.html . Having looked at the total number of downloads for all versions of Pandas and downloads by month in the last post, we now come to the total number of downloads by month by version. The relevant query is SELECT STRFTIME_UTC_USEC(timestamp, "%Y-%m") AS yyyymm, file.version, COUNT(*) as total_downloads, FROM TABLE_DATE_RANGE([the-psf:pypi.downloads], DATE_ADD(CURRENT_TIMESTAMP(), -6, "month"), CURRENT_TIMESTAMP()) WHERE file.project = 'pandas' GROUP BY file.version, yyyymm ORDER BY total_downloads DESC which returns a data set that can be downloaded as a CSV file. The file is available at https://drive.google.com/file/d/0BxwQdgnuTo6JYzR1dUI0Zm5jbWs/view?usp=sharing . I v
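As a rough illustration of what can be done with a CSV export like the one described above, here is a minimal Python sketch that groups downloads by month and computes each version's share. The column names (yyyymm, file_version, total_downloads) are an assumption about how the export is labelled, and the rows in SAMPLE_CSV are made-up placeholder numbers, not real download counts.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the assumed export layout; the counts are invented
# for illustration and are NOT real PyPI download numbers.
SAMPLE_CSV = """yyyymm,file_version,total_downloads
2016-12,0.19.1,300
2016-12,0.18.1,200
2017-01,0.19.2,500
2017-01,0.19.1,100
"""


def downloads_by_month(csv_text):
    """Return {month: {version: downloads}} parsed from CSV text."""
    totals = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["yyyymm"]][row["file_version"]] = int(row["total_downloads"])
    return dict(totals)


def version_shares(per_version):
    """Convert {version: downloads} into {version: fraction of that month's total}."""
    total = sum(per_version.values())
    return {version: count / total for version, count in per_version.items()}


monthly = downloads_by_month(SAMPLE_CSV)
for month in sorted(monthly):
    shares = version_shares(monthly[month])
    # List the most downloaded version first within each month.
    for version, share in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{month}  {version:<8} {share:.0%}")
```

To try it on an actual export, one would read the downloaded file's text in place of SAMPLE_CSV and adjust the column names to whatever the header row contains.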

On who's downloading Pandas - Total, monthly and version-specific downloads of Pandas.

For those of you who don't know, Pandas ( http://pandas.pydata.org/pandas-docs/stable/ ) is a data analysis/manipulation library in Python. Most people download it using pip ( https://pip.pypa.io/en/stable/ ), which is the PyPA (Python Packaging Authority) recommended tool for installing Python packages. pip downloads the library from PyPI ( https://pypi.python.org/pypi ), which is the Python Package Index. Now, having introduced you to the jargon, let me get to the point. Because most people install Pandas using pip, PyPI has a count of the total number of Pandas downloads. Well, not just Pandas downloads but pretty much every Python library installed using pip. And, you know what, all of the data is available publicly via Google BigQuery ( https://bigquery.cloud.google.com/table/the-psf:pypi.downloads ). Think about all the data. Think about all the questions. For now, I'm going to ask a few questions, specific to the Pandas library. 1. How many people have downloaded

On helping college students/grads understand bias.

I've written about bias in the workplace once before, here: ( https://rahulporuri.blogspot.in/2016/12/on-bias-at-work.html ). I mentioned how I wished that we had been made aware of the various implicit biases we have as we entered the professional sector, especially because most of us dealt with a variety of clients, all of us worked with a variety of people and some of us were also involved in choosing new colleagues. The spectrum of people we interact with on a day-to-day basis at work is why I feel the need to educate college graduates about their implicit biases. Now, the question is, how do we educate college students about their implicit biases? One thing that I could think of was to put the students through tests aimed at bringing out these implicit biases. What better way to educate a student than to put a mirror in front of him and help him understand what he is seeing. I haven't looked up what exactly these tests/questionnaires are that help us understand impl

PyLadies Pune at Red Hat.

I attended the PyLadies meetup yesterday at the Red Hat offices in Magarpatta City. TL;DR: It was an awesome bunch of people and a pretty interesting meetup. For those of you who didn't know that a PyLadies chapter existed in Pune, follow them on Twitter at https://twitter.com/PyLadiesPune , join Meetup and start following the meetup group at https://www.meetup.com/PyLadies-Pune/ . The events are sponsored by Red Hat and always happen at the Red Hat offices in Tower X, Magarpatta City, which is awesome for me because I stay/work 10 mins away. It was supposed to start at 4:30 PM. Kushal Das, one of the organizers and speakers for the day, slowly started talking about the basics of systems programming using Python, starting with the os module and how it can be used to get the current working directory, to change the current working directory, to make new directories and get environment variables. Short on time, the systems programming talk ended fast. The second talk w
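The os module operations mentioned in the talk summary above can be sketched roughly as follows. This is not the speaker's code, just a minimal illustration using a temporary scratch directory; the function name demo_os_basics, the "project/data" folder and the choice of the HOME variable are hypothetical and only there for the example.

```python
import os
import tempfile


def demo_os_basics():
    """Exercise the os calls from the talk inside a throwaway directory."""
    original_dir = os.getcwd()  # get the current working directory

    with tempfile.TemporaryDirectory() as scratch:
        os.chdir(scratch)  # change the current working directory
        try:
            # Check that we really are inside the scratch directory now.
            changed_ok = os.path.samefile(os.getcwd(), scratch)

            # Make a new directory, including the missing parent.
            os.makedirs(os.path.join("project", "data"))
            created = os.path.isdir(os.path.join(scratch, "project", "data"))
        finally:
            # Go back before the scratch directory is cleaned up.
            os.chdir(original_dir)

    # Read an environment variable, with a fallback if it is not set.
    home = os.environ.get("HOME", "<not set>")

    return {
        "original_dir": original_dir,
        "changed_ok": changed_ok,
        "created": created,
        "home": home,
    }


if __name__ == "__main__":
    for key, value in demo_os_basics().items():
        print(f"{key}: {value}")
```

os.environ behaves like a dictionary, so using .get with a default avoids a KeyError on systems where the variable does not exist.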

On the infinite capabilities of a programmer - Law and the Judiciary

Before I get to my point, let me take you on a detour. I was having breakfast on the 1st of Jan with a bunch of friends and the niece of one of those friends. The niece just finished her 8th standard and is moving to the 9th. Every single person at the dining table was from a different career direction - an MBA, a physics student turned programmer, a graduate student studying ecology, an MBBS student and an engineer turned entrepreneur. And we were all talking to her about academics in general and about specific subjects like mathematics and science. Eventually, when it was my turn to talk, I just asked her to learn how to program. No matter what she decides to become, a doctor or an engineer, learning how to program will open new doors for her. Now, let me get to my main point, which is the infinite new possibilities that arise from being a programmer. Programming is becoming an integral part of every profession and learning how to program will open doors that you didn't even kno

On MOOCs.

For those of you who don't know, MOOC stands for Massive Open Online Course. The internet is an awesome thing. It's making education free for all. Well, mostly free. But the breadth and depth of courses being offered online is surprising. And it looks like they are also having an impact on students, especially those from universities that are not top ranked. Students in all parts of the world can now get a first class education experience, thanks to courses offered by Stanford, MIT, Caltech, etc. I'm talking about MOOCs because one of my new year resolutions is to take online courses, at least 2 per semester (6 months). And I've chosen the following two courses on edX - Analyzing Big Data with Microsoft R Server and Data Science Essentials for now. I looked at courses on Coursera but I couldn't find any that were both worthwhile and free. There are a lot more MOOC providers out there but let's start here. And I feel like the two courses are relevant to whe