
Showing posts from 2017

No moar books!

This is the last blogpost in a series of three listing all the graphic novels I read over the last 6 weeks, while I was visiting the Austin office of my company. Most of these books were suggested to me by my colleagues, which really surprised me. I didn't know so many of my work colleagues were into graphic novels and comic books. Joe The Barbarian - One of the best graphic novels I've read these past two months. It's the story of a hypoglycemic kid and it's wonderfully told. It's one I recommend everyone read. Watchmen - You might've seen the movie. I liked the graphic novel better than the movie, even though the movie was in the back of my mind the whole time. I liked it better because of the full-page scenes and how much deeper the story is compared to the movie. It's a classic and I'd definitely recommend it if you're interested in graphic novels. 300 - Again, you might've watched the movie. Ag

Moar books!

In the last post , I talked about some of the graphic novels and books I'd read over the last 6 weeks. Most of the books in that post were multiple volumes of a series I'd started reading. A few of the books listed here might be part of a series, but I haven't (yet) read the other volumes. Volume I of Maus . Maus is the story of a survivor of the Nazi concentration camps, as retold by his son. It's visually different from anything else I've read, especially because the Jews are drawn as mice and the Nazis as cats. People of other nationalities are drawn as other animals; the Polish, for example, are drawn as pigs, if I remember correctly. It's a sad but powerful read. Snowpiercer: The Escape is part I of a 3-part series about a fictional future where all life on the planet has been wiped out by severe cold. Earth is blanketed with snow and humans survive on a train that chugs along the cont

Books!

To give you some context, I've been in Austin, TX since August 3rd. Starting the first weekend of August, I've been voraciously reading graphic novels or, as you might call them, comic books. Loads and loads of comic books. And what an awesome time it has been. Over the course of the last 6 weeks, I have read a total of 37 books, 35 of them comic books and 2 of them regular books. Let me get the regular books out of the way first. I read The Postmortal , the story of a future where people can take a shot to prevent aging. It's an interesting read, not just because of the premise but also because of the way it's written. I wouldn't mind reading it again and I would definitely suggest it to anyone interested in science fiction. The second book I read was When Breath Becomes Air , an autobiography of the Stanford neurosurgeon Paul Kalanithi and his life before and after he was diagnosed with lung cancer. It's a beautiful story and

300: The graphic novel

I just read 300. Yes. Read, not watched. The movie 300 is based on the graphic novel 300 by Frank Miller. If you are interested in comic books and graphic novels, you've probably heard of Frank Miller. You'd also be familiar with his work if you're a fan of Batman. Now, coming to the actual book: it was beautifully drawn and interesting to read. I kept comparing it to the movie in the back of my head. Without a doubt, I prefer the book to the movie. The book felt more visually appealing, and it was better at grabbing and holding my attention. I don't know why, but I much prefer reading the story to watching it, even if the movie had followed the story exactly. Coming to deviations between the book and the movie, the story arc is probably the most prominent, at least in the beginning of the book. The book takes a non-linear arc where we see the 300 prepare for war as we learn about how Leonidas became king and how the war started

on Institutions of National Importance

I reordered the list from http://mhrd.gov.in/institutions-national-importance and put it up on GitHub at https://github.com/rahulporuri/INI/blob/master/README.md

the IIITs setup under PPP

For those of you who don't know, IIIT stands for Indian Institute of Information Technology and PPP stands for Public-Private Partnership. There are a total of 23 IIITs in India, and the 18 new ones have been set up under the PPP mode, where the Central government contributes 50% of the cost of setting up the institute, the state 35% and private/industry partners 15%. I don't know about you, but this was news to me. I wanted to know what the industries were getting in return, what their angle was in investing in the IIITs. I wanted to know who was investing in the first place. Alas, there was no single location where I could see the private partners of each IIIT; I'd have to dig through the Wikipedia pages in some cases or go through the IIIT webpages. In two cases, I couldn't find the webpages for the IIITs in question. I got bored yesterday, so I took the time to look through the Wikipedia list of IIITs and make note of the private partners of each IIIT. Like I mentioned, in

Why did Facebook choose this default?

I removed a lot of personal information from Facebook on Friday night and ended up with a small box on the left-hand side of my Wall asking for personal information, prodding me to complete my profile. One of those questions was about where I went to college. The options I saw were a list of colleges my friends went to. What I found interesting was the last option, which read "I didn't go to university." Why is this option Private by default? I wonder why Facebook chose to make not having gone to university Private by default. Is it because of the usual stigma associated with not being a college graduate? Just wanted to point it out.

Where Facebook thinks I work

So, I removed most personal information from Facebook and ended up with a box on my Wall asking me to fill in some information. One of the questions was about where I worked in the past. The options Facebook gave me surprised me, to say the least. The reason I'm surprised is how they differ from the options I got for other questions. For example, the options for where I currently live were a list of places my friends currently live in. Similarly, the options for where I studied were a list of colleges my friends studied at. The only company on this list that my friends work at is Ather. I cannot recall a single person in my Facebook friend circle who works at TCS, Cognizant or HCL. Why Facebook decided to add those items to the list will remain a mystery.

Stack Overflow dev survey - Cleaning up numbers

If you haven't read my last post , I am taking a look at Stack Overflow's yearly developer survey. After the survey is complete, they release the collected data publicly, which can be found here . At first, I was only looking at numbers from the 2017 survey, but there were a few questions I wanted to ask using survey numbers from 2011-2017, such as the ratio of male/female developers among those who responded to the survey, and the change in developer salaries over the last 7 years. Which is when I hit a roadblock - the actual survey data. Real-life data is a mess, from the weird formats it is saved in to the weird data saved in it. For example, the developer's country/region has been a consistent question over the years, but the possible answers have changed, e.g. from "United States of America" to "United States". The column name for this data has changed as well, from "country" to "Country" to "What country
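A minimal sketch of the kind of clean-up this involves, using Pandas. The per-year DataFrames and the alias tables here are hypothetical stand-ins, not the actual survey schemas; the idea is simply to map each year's column names and value spellings onto one canonical form before combining.

```python
import pandas as pd

# Hypothetical excerpts from two survey years: both the column name
# and the spelling of the values change between releases.
survey_2011 = pd.DataFrame({"country": ["United States of America", "India"]})
survey_2017 = pd.DataFrame({"Country": ["United States", "India"]})

# Map every year's column name onto one canonical name...
COLUMN_ALIASES = {"country": "country", "Country": "country"}
# ...and every spelling variant onto one canonical value.
VALUE_ALIASES = {"United States of America": "United States"}

def normalize(df):
    df = df.rename(columns=COLUMN_ALIASES)
    df["country"] = df["country"].replace(VALUE_ALIASES)
    return df

combined = pd.concat([normalize(survey_2011), normalize(survey_2017)],
                     ignore_index=True)
print(combined["country"].value_counts())
```

With both years normalized, cross-year questions (gender ratios, salaries) reduce to ordinary group-bys on the combined frame.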

Stack Overflow Dev survey - India

Prelude If you didn't know, Stack Overflow is a community of 7.3 million programmers, just like you, helping each other. It's a Q&A forum with probably the most comprehensive knowledge of the intricacies of programming languages and associated libraries, knowledge contributed by volunteers. Being part of the programming community, they run a survey every year. The survey is developer-centric and asks questions on what language you use, what language you want to learn/use, how much you get paid, what your title is, where you work from, what your gender is, and so on. It is one of the largest surveys of its kind, as far as I know. This blogpost looks at data from the most recent SO survey, from 2017. Specifically, it looks at data submitted by Indian developers. Before we take a peek, the data is freely available at this URL . You will also find raw data from the surveys from 2011-2017. You might also want to take a look at this blogpost from the SO folks regarding

Scientific Computing 101 using Python

On 21 May, I conducted a workshop on Scientific Computing 101 using Python. Over the course of 3 hours, I introduced the NumPy and Pandas packages. The participants practiced using the packages in Jupyter Notebooks. Making the slides and delivering the material was a great learning experience for me. I did a workshop at IIT Madras earlier on the same topic, but this one was better planned, with cleaner content. The content was delivered using Jupyter Notebooks, which can be found in my GitHub repository . I have been working on the notebooks to add documentation, references and a few exercises, and will keep adding more exercises in time. Going forward, the participants were interested in a followup workshop with a focus on data science, which I presume means machine learning using scikit-learn. Let's see how that goes. Until next time.
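To give a flavour of the material (this is an illustrative sketch, not an excerpt from the actual notebooks): the central idea in NumPy is vectorized arithmetic on whole arrays instead of explicit Python loops.

```python
import numpy as np

# Vectorized arithmetic: the multiplication applies elementwise
# to the whole array, with no explicit Python loop.
samples = np.arange(10, dtype=float)   # 0.0, 1.0, ..., 9.0
scaled = samples * 2.5
print(scaled.mean(), scaled.max())
```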

On Virtual Environments and Environment Managers in Python.

Last weekend, I gave a brief talk on Virtual Environments in Python at the April meet of the PythonPune meetup group . Yesterday, I conducted a workshop on the same topic at the May meet of the PyLadies meetup group . The slides can be found in a GitHub repository , which also contains slides for most of the other talks/workshops I've given/conducted. To give you a brief overview of the slides, I introduce what a virtual environment is, why we should use them and, finally, how to go about using them. Along with virtual environments, I also introduce environment managers such as conda and edm, and talk about how the environments they create differ from virtual environments. Neither virtual environments nor environment managers help you understand the Python language better, but they are crucial to using Python in enterprise/open-source projects. A large number of open-source Python libraries suggest using virtual environments if you are interested in contributing.
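For a concrete taste (a minimal sketch, not taken from the slides; the directory name demo-env is arbitrary), the standard library's venv module can create a virtual environment programmatically:

```python
import os
import venv

# Create a bare virtual environment in ./demo-env.
# with_pip=False skips installing pip, which keeps this quick;
# pass with_pip=True to get pip inside the environment as well.
env_dir = "demo-env"
venv.create(env_dir, with_pip=False)

# The environment gets its own interpreter and site-packages,
# isolated from the interpreter that created it. Its binaries live
# under bin/ on POSIX and Scripts/ on Windows.
bin_dir = "Scripts" if os.name == "nt" else "bin"
print(sorted(os.listdir(os.path.join(env_dir, bin_dir))))
```

On the command line, the equivalent is `python -m venv demo-env`, activated with `source demo-env/bin/activate` on POSIX or `demo-env\Scripts\activate` on Windows.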

On packages in Python

Yesterday, I conducted a workshop on Python packages at the May meet of the PyLadies meetup group . The slides I used for the workshop can be found in the GitHub repository , which also contains slides I used for other talks/workshops I've given/conducted. To give you a brief overview of the workshop, I first introduced what a Python package is, why we should use them and, finally, how to go about creating one. Python packages make distributing and installing your work easy, which is crucial to getting others to use it. Two references I highly recommend are the sample project published by PyPA and the Packaging User Guide, also published by PyPA .
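As a minimal illustration of the idea (a hypothetical sketch, not from the workshop slides; the name mypackage is made up), a package can be made installable with a short setup.py using setuptools:

```python
# setup.py -- a minimal, hypothetical packaging script for a layout like:
#   mypackage/
#       __init__.py
#   setup.py
from setuptools import setup, find_packages

setup(
    name="mypackage",          # hypothetical project name
    version="0.1.0",
    packages=find_packages(),  # discovers mypackage/ automatically
    install_requires=[],       # runtime dependencies go here
)
```

With this in place, `pip install .` installs the package locally, and the PyPA sample project linked above shows the fuller, recommended layout.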

On satellite imagery and sand mafia

The cost of launching satellites is going down, be it in the US or in India. Smaller and easier-to-develop micro-satellites are the latest trend, usually developed with a specific goal in mind, e.g. satellite imagery. These, and a lot of other factors, have contributed to an increase in the number of satellites hovering over Earth in recent years, a number which is bound to keep increasing. ISRO recently put 88 (micro) satellites belonging to Planet Labs into orbit, satellites which Planet Labs will use to image the Earth every day. ISRO and other government space organizations have their own satellites that image the Earth and/or their respective countries on a regular basis. And a number of these organizations are releasing their data publicly. With that context, I realized a while back that daily imagery can help identify, and possibly curb, the sand mafia, specifically in India . The boom in infrastructure, specifically housing and office construction in India, was one of the reasons

On Drones, Caves and the Ocean.

My fascination with drones is growing day after day. Of all the areas drones can be used in, I want to talk about exploration for the moment. You might be wondering: what exploration? All corners of the Earth have been explored and there's nothing left to discover. Au contraire. The surface of the Earth has been explored, but not what's underneath. NASA and other space organizations have been sending out satellites and telescopes to understand what's out there, but similar efforts aren't being made to understand unexplored caves and the depths of the ocean. Two articles I read recently drove this point home for me; one on exploring a cave formation in Uzbekistan and another on exploring the Indian Ocean's floor . The first is an account of a team of scientists' and explorers' attempt at exploring a cave formation and discovering new regions of the cave. The second is an account of how the search for the missing MH370 flight also led to a detaile

On translation and AI

This article on Baidu and its bet on AI sparked a thought in my head. Well, the thought was basically to copy what Baidu is doing in China and do the same in India. I wrote earlier on the language barrier in India and how AI can help solve the problem. The crux of what Baidu is doing in China is employing a bunch of people to translate English technical documents into Chinese (Mandarin, probably). By doing so, not only is the industry able to communicate with Chinese customers better, Baidu now also has a big database of English-Chinese (Mandarin) translations. Don't you think doing the same with Indian languages would be awesome? Instead of hackathons and docathons, we could have translate-athons (?) where volunteers translate documents into the numerous Indian languages and, in the process, help compile an impressive English-Indian language dataset, a dataset that can then be used to train AI to better translate between English and Indian languages and maybe even betwe

On Cython and speeding up Python code

I gave a brief talk on Cython during the February meet of the PythonPune meetup group . The talk I had in mind when I proposed the session and the one I ended up delivering at the meet were very different, mainly because I wanted to limit the scope of the talk. Initially, I wanted to dive into Cython and give numerous code examples. However, when I finally sat down to make the presentation, I chose to step back and give an overview of why Cython is needed in the first place and of other ways to speed up Python code. I talked about how Cython can be used to speed up Python routines and to work with existing C/C++ code bases. I took a detour and asked the audience why Python is slow in the first place, in comparison to C/C++/Java/JavaScript. After driving home the difference between interpreted, compiled and JIT-compiled languages, I introduced JIT compilers available in Python land, e.g. Psyco, PyPy and Pyjion. PyPy is the most popular alternative compiler for the Python language but isn
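To give a flavour of what the fuller, code-heavy version of the talk might have shown (a hypothetical sketch, not from the slides), here is the classic kind of Cython example: the cdef declarations give the loop variables C types, so the generated C code runs the loop without boxing Python integers on every iteration.

```cython
# fib.pyx -- a hypothetical example; needs to be compiled (e.g. via
# cythonize fib.pyx) before it can be imported from Python.
def fib(int n):
    cdef int i
    cdef long a = 0, b = 1
    for i in range(n):
        a, b = b, a + b
    return a
```

The same function in plain Python works unchanged under Cython too, just slower; adding the type declarations is what unlocks most of the speedup.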

On the language barrier and AI.

Natively, I speak Telugu. I've learnt to speak Hindi and English. I currently live in a city where most people speak Marathi. I used to live in a city where most spoke Tamil. And I'm not atypical. Most Indians cross state lines over the course of their lives, for work or other reasons. And the language barrier is the biggest problem when doing so. Personally, even though I lived in Chennai for over 6 years, I still can't speak or understand Tamil, and Marathi might as well be Arabic to me, even after living in Pune for just over a year. (I'm clearly not trying hard enough to learn the language.) Unlike me, most people who stay at a location for long (1+ years) learn the local language, because it makes day-to-day life easier. But what about travelers? What about those who stay for a short period of time? What about those not educated in a common language? You can argue that a majority of Indians can speak and understand Hindi. Actually, no. And no, the majority of India

On the market for assembled desktops, laptops and phones.

I am currently on my third laptop in 8 years. The one I have now is pretty powerful, but the last two weren't as powerful as I wished. They didn't have a dedicated GPU. They didn't have SSDs. Their CPUs were a generation old. Don't get me wrong, they were still very helpful, especially given that most of my time was spent programming and running long-running simulations. But I would've loved to be able to swap out the CPU or the SSD on my old laptop, instead of having to throw it away and buy a new one. If I had a desktop instead of a laptop, throwing out the old parts and replacing them with new ones would have been easier. There are a number of people who still prefer assembling their desktop from scratch. And there are businesses that cater specifically to these customers. The good thing about assembling your own desktop, as I mentioned earlier, is that you can swap out the old CPU/GPU for new ones and replace the existing magnetic hard drive wit

On teaching Python to students at IIT Madras.

If you've been following my blog, you'll have noticed my efforts to give talks on Python and get better at public speaking. In early December, I organized two workshops at SciPy India at IIT Bombay. In late December, I gave a talk at a local Python meetup. In early January, I attended the local PyLadies event. I wanted to do something similar at my alma mater, IIT Madras. It also gave me a reason to go back and visit some of my classmates who are now grad students at IITM, and the professor I worked with as an undergrad. Coming to the point, I gave a number of sessions on Friday night and Saturday. The session on Friday night was on Git & GitHub. I used the same slides I had for the SciPy workshop, but we went further at IITM because we had internet connectivity. Towards the end, everyone had created a GitHub repository and pushed their local changes to GitHub. I didn't get time to get into what I call the advanced workflow, which involves branches and merging

Pandas download statistics, PyPI and Google BigQuery - Daily downloads and downloads by latest version

Inspired by this blog post : https://langui.sh/2016/12/09/data-driven-decisions/ , I wanted to play around with Google BigQuery myself. The blog post is pretty awesome because it has sample queries. I mixed and matched the examples mentioned in the blog post, intent on answering two questions - 1. How many people download the Pandas library on a daily basis? Actually, if you think about it, it's more a question of how many times the pandas library was downloaded in a single day, because the same person could've downloaded it multiple times. Or a bot could've. This was just a fun first query/question. 2. What is the adoption rate of different versions of the Pandas library? You might have come across similar graphs showing the adoption rate of various versions of Windows. Answering this question is actually important because the developers should have an idea of what the most popular versions are, to see whether or not users are adopting new features/cha

Visualizing the PyPI Pandas download statistics using Tableau - Downloads by location

For some background, read my previous posts on the topic - simple queries to start understanding Pandas downloads ( https://rahulporuri.blogspot.in/2017/01/on-whos-downloading-pandas.html ), building up queries incrementally to understand Pandas downloads ( https://rahulporuri.blogspot.in/2016/12/pandas-download-statistics-pypi-and.html ) and using Tableau Public to visualize Pandas downloads by version over the last 6 months ( https://rahulporuri.blogspot.in/2017/01/visualizing-pypi-pandas-download.html ). Having visualized the total number of downloads per version per month of Pandas in the last post, we now come to the total number of downloads per month by location. The relevant query is

SELECT STRFTIME_UTC_USEC(timestamp, "%Y-%m") AS yyyymm, country_code, COUNT(*) AS total_downloads, FROM TABLE_DATE_RANGE([the-psf:pypi.downloads], DATE_ADD(CURRENT_TIMESTAMP(), -6, "month"), CURRENT_TIMESTAMP()) WHERE file.pro

Visualizing the PyPI pandas download statistics using Tableau - Downloads by version

For some background, read my previous posts on the topic - https://rahulporuri.blogspot.in/2017/01/on-whos-downloading-pandas.html and https://rahulporuri.blogspot.in/2016/12/pandas-download-statistics-pypi-and.html . Having looked at the total number of downloads for all versions of Pandas and downloads by month in the last post, we now come to the total number of downloads by month by version. The relevant query is

SELECT STRFTIME_UTC_USEC(timestamp, "%Y-%m") AS yyyymm, file.version, COUNT(*) AS total_downloads, FROM TABLE_DATE_RANGE([the-psf:pypi.downloads], DATE_ADD(CURRENT_TIMESTAMP(), -6, "month"), CURRENT_TIMESTAMP()) WHERE file.project = 'pandas' GROUP BY file.version, yyyymm ORDER BY total_downloads DESC

which returns a data set that can be downloaded as a CSV file. The file is available at https://drive.google.com/file/d/0BxwQdgnuTo6JYzR1dUI0Zm5jbWs/view?usp=sharing . I v
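As a rough alternative to Tableau, the exported CSV could also be reshaped with Pandas itself. This is a sketch over made-up rows shaped like the query output (columns yyyymm, version, total_downloads); the real export's column names and values may differ.

```python
import pandas as pd

# Hypothetical rows shaped like the query output: one row per
# (month, version) pair with a download count (numbers are made up).
df = pd.DataFrame({
    "yyyymm": ["2016-11", "2016-11", "2016-12", "2016-12"],
    "version": ["0.19.0", "0.18.1", "0.19.1", "0.18.1"],
    "total_downloads": [120, 80, 150, 60],
})

# Pivot to a months-by-versions table, which is essentially the
# shape of the chart drawn in Tableau.
table = df.pivot(index="yyyymm", columns="version", values="total_downloads")
print(table)
```

Missing (month, version) combinations show up as NaN in the pivoted table, which is exactly what you'd expect for versions released partway through the window.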

On who's downloading Pandas - Total, monthly and version-specific downloads of Pandas.

For those of you who don't know, Pandas ( http://pandas.pydata.org/pandas-docs/stable/ ) is a data analysis/manipulation library in Python. Most people download it using pip ( https://pip.pypa.io/en/stable/ ), which is the PyPA (Python Packaging Authority) recommended tool for installing Python packages. pip downloads the library from PyPI ( https://pypi.python.org/pypi ), the Python Package Index. Now, having introduced the jargon, let me get to the point. Because most people install Pandas using pip, PyPI has a count of the total number of Pandas downloads. Well, not just Pandas downloads, but downloads of pretty much every Python library installed using pip. And, you know what, all of the data is available publicly via Google BigQuery ( https://bigquery.cloud.google.com/table/the-psf:pypi.downloads ). Think about all the data. Think about all the questions. For now, I'm going to ask a few questions specific to the Pandas library. 1. How many people have downloaded

On helping college students/grads understand bias.

I've written about bias in the workplace once before, here : ( https://rahulporuri.blogspot.in/2016/12/on-bias-at-work.html ). I mentioned how I wished we were made aware of the various implicit biases we have as we entered the professional sector, especially because most of us dealt with a variety of clients, all of us worked with a variety of people, and some of us were also involved in choosing new colleagues. The spectrum of people we interact with on a day-to-day basis at work is why I feel the need to educate college graduates about their implicit biases. Now, the question is, how do we educate college students about their implicit biases? One thing I could think of was to put the students through tests aimed at bringing out these implicit biases. What better way to educate students than to put a mirror in front of them and help them understand what they're seeing. I haven't looked up what exactly these tests/questionnaires are that help us understand impl

PyLadies Pune at Red Hat.

I attended the PyLadies meetup yesterday at the Red Hat offices in Magarpatta City. TL;DR: it was an awesome bunch of people and a pretty interesting meetup. For those of you who didn't know that a PyLadies chapter existed in Pune, follow them on Twitter at https://twitter.com/PyLadiesPune and join the meetup group at https://www.meetup.com/PyLadies-Pune/ . The events are sponsored by Red Hat and always happen at the Red Hat offices in Tower X, Magarpatta City, which is awesome for me because I stay/work 10 mins away. It was supposed to start at 4:30 PM. Kushal Das, one of the organizers and speakers for the day, slowly started talking about the basics of systems programming using Python, starting with the os module and how it can be used to get the current working directory, change the current working directory, make new directories and get environment variables. Short on time, the systems programming talk ended fast. The second talk w
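The os-module basics mentioned above fit in a few lines; this is my own sketch rather than code from the talk, and the directory name scratch is arbitrary.

```python
import os

start = os.getcwd()                     # get the current working directory
os.makedirs("scratch", exist_ok=True)   # make a new directory
os.chdir("scratch")                     # change the working directory
print(os.getcwd())                      # now ends with 'scratch'
os.chdir(start)                         # go back to where we started

# Environment variables live in os.environ, a dict-like mapping.
path = os.environ.get("PATH", "")
print(len(path) > 0)
```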

On the infinite capabilities of a programmer - Law and the Judiciary

Before I get to my point, let me take you on a detour. I was having breakfast on the 1st of Jan with a bunch of friends, and one friend's niece. The niece just finished her 8th standard and is moving on to the 9th. Every single person at the dining table was from a different career direction - an MBA, a physics student turned programmer, a graduate student studying ecology, an MBBS student and an engineer turned entrepreneur. And we were all talking to her about academics in general and about specific subjects like mathematics and science. Eventually, when it was my turn to talk, I just asked her to learn how to program. No matter what she decides to become, a doctor or an engineer, learning how to program will open new doors for her. Now, let me get to my main point, which is the infinite number of new possibilities that arise from being a programmer. Programming is becoming an integral part of every profession and learning how to program will open doors that you didn't even kno

on MOOCs.

For those of you who don't know, MOOC stands for Massive Open Online Course. The internet is an awesome thing. It's making education free for all. Well, mostly free. And the breadth and depth of the courses being offered online is surprising. It looks like they are also having an impact on students, especially those from universities that are not top-ranked. Students in all parts of the world can now get a first-class education experience, thanks to courses offered by Stanford, MIT, Caltech, etc. I'm talking about MOOCs because one of my new year resolutions is to take online courses, at least 2 per semester (6 months). And I've chosen the following two courses on edX - Analyzing Big Data with Microsoft R Server and Data Science Essentials for now. I looked at courses on Coursera but I couldn't find any that were worthwhile and free. There are a lot more MOOC providers out there, but let's start here. And I feel like the two courses are relevant to whe