Skip to main content

Week 3 - PYTHON!!! and Data Analysis.

Yes. The exclamation marks, all of three of them are necessary!
Because i cannot tell you in words how blown away i am having used python for the last week. Don't get me wrong, i've used/tried python before, learning for a couple of days but eventually abandoning it! And the reason i love (yes, love. not just like. ever-lasting love) it so much is because of what i've been able to accomplish. Before i start with what happened this week, I'm sure there are ways to do the same using other programming languages : C, Java, Mathematica and what not but they (probably) don't have the extremely extensive documentation and gazillions of examples.

So, as I said last week, my work for the summer was the reproduce the results of a certain paper by Richards, Gordon T (2000) which analyses the color-redshift relation of 2625 quasars. I've extensively explained the details of the paper and the results i am to reproduce in last week's post. And as i said, last week ended with data in hands, acquired from the SDSS archive through SQL Query, which i was to refine and analyze. There are actually 2 data sets i worked on, one is a data set of ~900 quasars, part of the original 2625 quasars analyzed in the paper and the second one what what i acquired from the SDSS.

The original data set came with a README file, which gave me this information -

Byte-by-byte Description of file: table2.dat
--------------------------------------------------------------------------------
   Bytes    Format    Units   Label    Explanations
--------------------------------------------------------------------------------
   1- 30    A30          ---      Name    Quasar name
  32- 36    F5.3         ---        z         Redshift
  38- 39    I2              h       RAh     Right Ascension (J2000)
  51- 52    I2            deg     DEd      Declination (J2000)
  67- 71    F5.2        mag     u*mag   The SDSS u* band photometry (1)
  79- 83    F5.2        mag     g*mag   The SDSS g* band photometry (1)
  91- 95    F5.2        mag     r*mag   The SDSS r* band photometry (1)
 103-107  F5.2        mag     i*mag    The SDSS i* band photometry (1)
 115-119  F5.2        mag     z*mag   The SDSS z* band photometry (1)

i removed a couple of the lines from the file but well that was the part of the file that i needed! And let me tell you now why this readme file was necessary! When i started using python to extract the information from this .dat file, each lines was stored as a string, a string where the bits 32:36 are the red shift of the object and so on. You get the point. So, in order to extract the red shift from each of these strings, i needed to know the exact byte location of the different columns i needed to extract. Again, i'm sure i'd have eventually figured out what the structure of the file is but hey, this made my life easier.

Moral of the story : Have README files. in each and every directory. For your sake and for someone else's sake who might be using your system or your data later on.

so, once i'd extracted the red shift and the 5 magnitudes, i calculated the 4 colors - u-g, g-r, r-i, i-z - if you don't understand what i mean by color, go read last week's post! and then i go about plotting the a histogram of the redshift and the colors as a fn of redshift.

Here's how they looks like -
NOTE : you could open the images in a new tab, to see the details better.



I'd say they came out pretty good, you can compare them from the original plots, you can check them from the previous post! So, as you can see, the color-red shift relation isn't random, there is a clear relation between the color of a quasar and the red shift it is at! All 4 of the colors show a beautiful evolution.

Now, we go about calculating the median redshift of each of these colors.
We take bins at intervals of 0.05 on the x axis - the red shift - where the bin width is 0.075. Now, we look at all of the quasars in this bin and calculate the median color for these quasars. Once we've done the same for all of the bins, we plot the median color against red shift.

Here's how that looks like -


The blue line you see is the median color - redshift relation.

Now, let's get to why i'm doing all of this!
As i've said the last time, ted shifts of quasars are obtained by studying the spectra of these objects. And spectroscopy is in general very time taking, in comparison to photometry of the quasar. So, if there is a method to estimate the red shift of a quasar using just photometry, then it'd make life simpler. Also, by estimating a quasar's red shifts using photometry (called PhotoZ from hereon), we can better select candidates that need to be studied using spectroscopy - quasars with high estimated PhotoZ can be selected for spectroscopy instead of having to study each quasar! It basically makes our object selection more efficient!

Estimating PhotoZ, as you probably see by now, i done by looking at where a quasar falls in the 4 color-red shift plots above.

So, that's the year 2000. This telescope has been observing day and night (ohh wait, telescopes can't observe in the day, unless they're radio telescopes) and identifying more and more quasars. The quasar count this time is ~300,000 and the # of quasars whose red shift has been calculated using spectroscopy is ~150,000! Now, i'm basically trying to replicate the same process i mentioned above - calculating color, plotting color-red shift and calculating median color - with data on the ~150,000 quasars!

And the reason why i was talking about the README file in the previous case is because the new data is in a .csv format, it did NOT come with a readme file and it was a bitch to look for ways to open and extract useful information out of it!

Anyway, here's how the preliminary results look like -
(i'm running the code to calculate the median colors as i write)

the 4 color-redshift plots
enhanced -
the median u-g color vs red shift. Partial plot. Further work needed here. 

I'll write about the different python libraries i've come across over the week, in search of ways to read and analyse the files (the original .dat and the new .csv) in a separate post but well, that's all for now. 

In the mean while, try figure out how to calculate the median color. To elaborate, let's say i have a file with 2 columns * 898 rows worth of data - one is the color and the second is the redshift. To note, the columns aren't ordered - neither by color nor by red shift. The range of red shifts though is 0-5. 
Now, as i mentioned, take bins at red shift intervals of 0.05 whose size is 0.075 
(yes, two consecutive bins do overlap! don't ask me why i chose this size, it's from the paper!)
and extract all of the quasars in this red shift bin. calculate their median color and move on to the next bin. There is a brute force way to do this, which i'm running right now cos i need results right now. And then there's an elegant way i'm trying to code. Let's see if it works cos OHMYGOD the brute force is taking too long! 

So, there you go. 
Work so far this week.
Later...

Popular posts from this blog

Animation using GNUPlot

Animation using GNUPlotI've been trying to create an animation depicting a quasar spectrum moving across the 5 SDSS pass bands with respect to redshift. It is important to visualise what emission lines are moving in and out of bands to be able to understand the color-redshift plots and the changes in it.
I've tried doing this using the animate function in matplotlib, python but i wasn't able to make it work - meaning i worked on it for a couple of days and then i gave up, not having found solutions for my problems on the internet.
And then i came across this site, where the gunn-peterson trough and the lyman alpha forest have been depicted - in a beautiful manner. And this got me interested in using js and d3 to do the animations and make it dynamic - using sliders etc.
In the meanwhile, i thought i'd look up and see if there was a way to create animations in gnuplot and whoopdedoo, what do i find but nirvana!

In the image, you see 5 static curves and one dynam…

on MOOCs.

For those of you who don't know, MOOC stands for Massively Open Online Course.

The internet is an awesome thing. It's making education free for all. Well, mostly free. But it's surprising at the width and depth of courses being offered online. And it looks like they are also having an impact on students, especially those from universities that are not top ranked. Students in all parts of the world can now get a first class education experience, thanks to courses offered by Stanford, MIT, Caltech, etc.

I'm talking about MOOCs because one of my new year resolutions is to take online courses, atleast 2 per semester (6 months). And I've chosen the following two courses on edX - Analyzing Big Data with Microsoft R Server and Data Science Essentials for now. I looked at courses on Coursera but I couldn't find any which was worthy and free. There are a lot more MOOC providers out there but let's start here. And I feel like the two courses are relevant to where I …

On programmers.

I just watched this brilliant keynote today. It's a commentary on Programmers and the software development industry/ecosystem as a whole.



I am not going to give you a tl;dr version of the talk because it is a talk that I believe everyone should watch, that everyone should learn from. Instead, I am going to give my own parallel-ish views on programmers and programming.
As pointed out in the talk, there are mythical creatures in the software development industry who are revered as gods. Guido Van Rossum, the creator of Python, was given the title Benevolent Dictator For Life (BDFL). People flock around the creators of popular languages or libraries. They are god-like to most programmers and are treated like gods. By which, I mean to say, we assume they don't have flaws. That they are infallible. That they are perfect.
And alongside this belief in the infallibility of these Gods, we believe that they were born programmers. That programming is something that people are born wit…