
Week 2 - Data Mining & Colors of Quasars

So, here's what I'm working on.
I'm reproducing a paper published in 2000 by Richards et al. on the colors of 2625 quasars at redshifts 0 < z < 5, and on the empirical relation between the color and the redshift of a quasar. I don't know how much of that you understand, but well, that's my work for the summer.

Let me break it down.

The color of a quasar is the difference between its (AB) magnitudes in two different bands; for the SDSS, those bands are u, g, r, i and z. This definition of color is no different from the one you and I use every day to describe objects. Refer to this awesome comic by The Oatmeal. The color receptors mentioned in the comic are sensitive to the R, G and B bands, which is why those are our primary colors (for humans, specifically). We do something similar here: take a CCD camera (the counterpart of our retina) and fit it with 5 filters instead of just 3 (u, g, r, i and z for the SDSS). Each filter covers a different part of the CCD array, so we can record the brightness of a quasar or star in each wavelength band by looking at the object in that part of the camera.


Now, as I said earlier, the color of a quasar is defined as the difference between its magnitudes in different bands. Mag_u - Mag_g is one of the colors, and so on; well, you get the point. So I now have 4 colors for a quasar. (There's a trick question here regarding the number of colors. If you had the same question I did in the beginning, comment and ask. Otherwise, just move on.)
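A minimal sketch of that arithmetic in Python: an AB magnitude comes from a flux density via m_AB = -2.5 log10(f_ν / 3631 Jy), and the four colors are differences of neighbouring bands. The magnitudes below are made-up numbers, and `ab_magnitude` and `adjacent_colors` are hypothetical helper names, not part of any SDSS tool.

```python
import math

BANDS = ("u", "g", "r", "i", "z")  # SDSS filters, bluest to reddest

def ab_magnitude(flux_density_jy):
    """AB magnitude from a flux density in janskys (zero point 3631 Jy)."""
    return -2.5 * math.log10(flux_density_jy / 3631.0)

def adjacent_colors(mags):
    """Colors of neighbouring bands, e.g. 'u-g' = mag_u - mag_g."""
    return {f"{blue}-{red}": mags[blue] - mags[red]
            for blue, red in zip(BANDS, BANDS[1:])}

# Made-up magnitudes for one object, for illustration only.
example = {"u": 19.8, "g": 19.5, "r": 19.3, "i": 19.2, "z": 19.1}
print(adjacent_colors(example))
print(ab_magnitude(3631.0 / 100))  # 100x fainter than the zero point: ~5 mag
```

Remember that magnitudes run backwards: a larger u - g means the object is fainter in u relative to g, i.e. "redder".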

Now, the Sloan Digital Sky Survey has been running since the early 2000s and has been really successful. A quick SQL search yielded ~150,000 quasars whose spectra were recorded (and probably more quasars whose spectra haven't been recorded yet) and 500,000 galaxies whose magnitudes in the 5 bands were known (or maybe 500,000 is just what the output is limited to; it's too perfect a number). Plus a heck of a lot more stars, Seyfert galaxies, BAL galaxies, CEL galaxies and what not. So yeah, it's a pretty awesome survey.

So, after a year of observations, they decided to study the color-redshift relation of quasars. A 1999 paper by Fan had already given a relation between the color and the redshift of a quasar, and the SDSS team wanted to see if that relation held true. (Fan was actually part of the SDSS team, I think.) Having found ~1500 quasars themselves, and with ~1000 quasars already catalogued in NED and SIMBAD, they went about plotting color-redshift and color-color diagrams of quasars. And whaddaya know, it fits. To a good extent.

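LLS (Lyman limit system) refers to an absorber along the line of sight that soaks up nearly all the light blueward of the Lyman limit (912 Å in the absorber's rest frame), which lies blueward of the Ly alpha line (1216 Å)!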



(On a different note, I'm not reproducing these figures with permission from the author. Ohh, I sure do hope I won't get into any trouble.)

So, see, it's a good enough fit. 
Now, the reason to do all of this isn't just to prove the color-redshift relation; it's for a much bigger cause: to estimate redshifts for other quasars which haven't been spectroscopically studied yet (and might never be). The redshifts of all the quasars in the paper were measured using spectroscopy. There are specific lines in the spectrum of a quasar which can be used to identify its redshift. Here's a sample spectrum of a quasar. If you want to know why some of the elements are written with [ ] or just ] around them, comment and ask.


So, to calculate the redshift of a quasar (or any object in general), the most accurate method is to study the spectrum of said object, identify the different emission lines (this is a fun task!), and measure the observed wavelength of each line (which differs from its rest wavelength because of the redshift). Since we know the rest wavelength of the line, we can then calculate the redshift.
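The arithmetic is just z = λ_observed / λ_rest - 1. Here is a minimal Python sketch, assuming the lines have already been identified; the rest wavelengths are approximate vacuum values, and the observed wavelengths are made up to correspond to z = 2.8. Real pipelines typically fit whole spectral templates rather than averaging a couple of lines, so treat this as the idea only.

```python
# Approximate rest-frame (vacuum) wavelengths in Angstroms.
REST_WAVELENGTHS = {"Lya": 1215.67, "CIV": 1549.06}

def redshift_from_line(observed, rest):
    """Redshift implied by one line: z = observed / rest - 1."""
    return observed / rest - 1.0

def mean_redshift(observed_lines):
    """Average redshift over several identified lines.

    `observed_lines` maps a line name in REST_WAVELENGTHS to its observed wavelength.
    """
    zs = [redshift_from_line(obs, REST_WAVELENGTHS[name])
          for name, obs in observed_lines.items()]
    return sum(zs) / len(zs)

# Made-up observed wavelengths, chosen to match z = 2.8.
print(round(mean_redshift({"Lya": 4619.5, "CIV": 5886.4}), 2))  # prints 2.8
```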

Easy peasy.

But alas, spectroscopy is a very arduous and time-consuming task. Compared to photometry (measuring the magnitudes of objects in different bands), spectroscopy takes a lot more telescope time to reach the same signal-to-noise ratio (SNR) per pixel, because the light is spread over many more pixels. So people use other methods to estimate the redshifts of objects.
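A back-of-the-envelope version of that argument, under idealized assumptions (noise from source photons only; no sky background, read noise or throughput differences; a flat spectrum across the filter): a broadband filter of width Δλ_F collects photons at some rate ṅ, while a spectrograph pixel of width δλ receives only about a fraction δλ/Δλ_F of them.

```latex
\[
\mathrm{SNR}_{\mathrm{phot}} = \sqrt{\dot n \, t_{\mathrm{phot}}}, \qquad
\mathrm{SNR}_{\mathrm{pix}} \approx \sqrt{\dot n \, \frac{\delta\lambda}{\Delta\lambda_F} \, t_{\mathrm{spec}}}
\]
\[
\mathrm{SNR}_{\mathrm{pix}} = \mathrm{SNR}_{\mathrm{phot}}
\;\Longrightarrow\;
\frac{t_{\mathrm{spec}}}{t_{\mathrm{phot}}} \approx \frac{\Delta\lambda_F}{\delta\lambda}
\]
```

With a filter ~1000 Å wide and pixels ~1 Å wide (illustrative numbers, not SDSS specifications), that is roughly a factor of 1000 in exposure time for the same SNR per pixel.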

This is where things get interesting. 


Source: Richards et al. (2000).

So, this image conveys three things. 
1. the passbands (u, g, r, i, z) and their transmission as a function of wavelength,
2. the spectrum of a quasar at redshift z = 2.8, and
3. the spectrum of an F5V star.

If you can't tell the quasar and the star apart, see the spectrum with a sharp peak at ~4600 Å and a lot of jagged lines to its left? That's the quasar spectrum; the other one is the star. That sharp peak is the Lyman alpha emission line, and the zig-zag lines to its left are collectively referred to as the Lyman alpha forest!

Now look at the spectra of quasars at various redshifts:
Source: Richards et al. (2000)

As you can see, each emission line can be considered a spectral feature, and with increasing redshift some of them move out of the observable wavelength range (like the Balmer alpha line, H alpha) while new ones (like the Lyman alpha line) move into it.
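A small Python sketch of this shifting, assuming crude non-overlapping top-hat ranges for the SDSS filters (the real filter curves overlap and have tails, so these edges are assumptions) and approximate rest wavelengths: each line lands at λ_rest × (1 + z), and `band_of` (a hypothetical helper) reports which filter, if any, it falls in.

```python
# Crude top-hat stand-ins for the SDSS filters, in Angstroms (assumed edges).
SDSS_BANDS = [("u", 3000, 4000), ("g", 4000, 5500), ("r", 5500, 7000),
              ("i", 7000, 8500), ("z", 8500, 10500)]

# Approximate rest-frame wavelengths in Angstroms.
LINES = {"Lya": 1215.67, "CIV": 1549.06, "MgII": 2798.75, "Halpha": 6562.8}

def observed_wavelength(rest, z):
    """Wavelength at which a line emitted at `rest` is observed from redshift z."""
    return rest * (1.0 + z)

def band_of(wavelength):
    """Return the (approximate) SDSS band containing `wavelength`, or None."""
    for name, lo, hi in SDSS_BANDS:
        if lo <= wavelength < hi:
            return name
    return None

for z in (0.1, 1.0, 2.8):
    placed = {line: band_of(observed_wavelength(rest, z)) for line, rest in LINES.items()}
    print(z, placed)
```

With these numbers, H alpha has already dropped out of the z band by z = 1, Lyman alpha only enters the u band around z ≈ 1.5, and at z = 2.8 it sits in g.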

Now, if you perform photometry on these quasars, a trend emerges: as you look at more and more distant quasars, their colors change in a systematic way, which is beautifully explained in the same Richards et al. (2000) paper.
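To get a feel for where such a trend comes from, here is a toy simulation in Python, not the paper's model. It assumes a pure power-law continuum f_ν ∝ ν^α (α = -0.5 is an assumed, typical-ish value), multiplies all flux blueward of the observed Lyman alpha line by a constant 0.6 as a crude stand-in for the Lyman alpha forest, and "observes" it through crude top-hat stand-ins for the SDSS filters (assumed edges). Magnitudes are AB-style averages of f_ν per unit ln ν, so the zero point cancels in the colors. Emission lines are deliberately left out.

```python
import numpy as np

C_ANGSTROM_PER_S = 2.998e18  # speed of light in Angstrom/s
LYA_REST = 1215.67           # Angstrom

# Crude top-hat stand-ins for the SDSS u, g, r, i, z filters (assumed edges).
BANDS = {"u": (3000, 4000), "g": (4000, 5500), "r": (5500, 7000),
         "i": (7000, 8500), "z": (8500, 10500)}

# Grid uniform in ln(lambda): a plain mean over a band is then an average
# per unit d(ln nu), as AB synthetic photometry requires.
WAVE = np.geomspace(2000.0, 12000.0, 20000)

def toy_quasar_fnu(z, alpha=-0.5, forest_transmission=0.6):
    """Power law f_nu ~ nu**alpha, dimmed blueward of observed Lya (toy forest)."""
    nu = C_ANGSTROM_PER_S / WAVE
    fnu = nu ** alpha
    blueward = WAVE < LYA_REST * (1.0 + z)
    return np.where(blueward, fnu * forest_transmission, fnu)

def toy_colors(z, **kwargs):
    """Adjacent-band colors of the toy spectrum (relative magnitudes)."""
    fnu = toy_quasar_fnu(z, **kwargs)
    mags = {}
    for band, (lo, hi) in BANDS.items():
        in_band = (WAVE >= lo) & (WAVE < hi)
        mags[band] = -2.5 * np.log10(fnu[in_band].mean())
    names = list(BANDS)
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in zip(names, names[1:])}

for z in (1.0, 2.5, 3.5):
    print(z, {k: round(float(v), 2) for k, v in toy_colors(z).items()})
```

In this toy, colors sit at their power-law values until the forest reaches a filter: u - g reddens as observed Lyman alpha sweeps through u (roughly 1.5 ≲ z ≲ 2.3 with these edges), and g - r reddens as it sweeps through g. Emission lines moving through the filters add the wiggles seen in the real relation.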

Things are slowly starting to fall into place. 

As you can see, just by performing photometry on quasars (which is easier and faster than spectroscopy), we can estimate their redshifts (to a precision of ~0.3!).

Isn't that just beautiful?! 
MIND = BLOWN!

Read the paper further to understand exactly how they estimate the redshift of a quasar: they use a least chi-squared (χ² minimization) method.
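A minimal Python sketch of the χ² idea, assuming a table of median colors on a redshift grid: for each grid redshift, χ² = Σ_k (c_k - c_k,model(z))² / σ_k² over the four colors, and the estimate is the grid redshift with the smallest χ². The table values below are invented for illustration (not from the paper), and the paper's actual procedure may differ in detail.

```python
import numpy as np

# Hypothetical color-redshift table: each row holds median (u-g, g-r, r-i, i-z)
# at the matching redshift. Invented numbers, for illustration only.
Z_GRID = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
CZR = np.array([
    [0.20, 0.15, 0.10, 0.05],
    [0.15, 0.20, 0.15, 0.10],
    [0.25, 0.10, 0.20, 0.15],
    [0.20, 0.15, 0.10, 0.20],
    [0.60, 0.20, 0.10, 0.10],
    [1.50, 0.40, 0.15, 0.10],
])

def photo_z_chi2(colors, errors):
    """Grid redshift minimizing chi^2 = sum(((c - c_model) / sigma)**2)."""
    colors = np.asarray(colors, dtype=float)
    errors = np.asarray(errors, dtype=float)
    # Broadcasting compares the observed colors against every row of CZR.
    chi2 = np.sum(((colors - CZR) / errors) ** 2, axis=1)
    best = int(np.argmin(chi2))
    return Z_GRID[best], chi2

z_best, chi2 = photo_z_chi2([0.58, 0.22, 0.11, 0.09], [0.05, 0.03, 0.03, 0.04])
print(z_best, chi2.round(1))
```

The grid spacing limits the precision: a finer table (or interpolating χ² between grid points) gives a smoother estimate.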

So, well, that's what I intend to reproduce: the relation between the color and redshift of a quasar, obtained empirically and through simulation.

Well, actually, I thought I could just download the magnitudes and redshifts of the 2625 quasars used by Richards et al., but the SIMBAD and NED links give me just 898 objects! I don't know where to find the rest. Probably in the first SDSS Data Release!

But hey, I thought I could just perform an SQL query (I'm calling this data mining. Why?! 'Cos it sounds cool, that's why!) with the same constraints as those used in the paper and, whoopdedoo, I get data for ~140,000 quasars: ~140,000 quasars with 10 columns each, containing the 5 magnitudes, object ID, classification, RA and Dec and what not. I'm sorry, but if that number isn't impressive, I don't know what is!
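Here is a sketch of what such a query can look like, built as a Python string. The table and column names (`PhotoObj`, `SpecObj`, `psfMag_u`, `bestObjID`, `class`) follow my understanding of the SDSS CAS/SkyServer schema, but check them against the schema browser for your data release. Using PSF magnitudes is an assumption, and the redshift cut is a placeholder, not the paper's exact selection. `build_quasar_query` is a hypothetical helper; it returns 10 columns: object ID, RA, Dec, the 5 magnitudes, redshift and classification.

```python
def build_quasar_query(z_min=0.0, z_max=5.0, top=None):
    """Return a SkyServer-style SQL string selecting spectroscopic quasars.

    The cuts are placeholders, not the selection used by Richards et al.
    """
    top_clause = f"TOP {int(top)} " if top is not None else ""
    return (
        f"SELECT {top_clause}"
        "p.objID, p.ra, p.dec, "
        "p.psfMag_u, p.psfMag_g, p.psfMag_r, p.psfMag_i, p.psfMag_z, "
        "s.z AS redshift, s.class "
        "FROM PhotoObj AS p "
        "JOIN SpecObj AS s ON s.bestObjID = p.objID "
        f"WHERE s.class = 'QSO' AND s.z BETWEEN {float(z_min)} AND {float(z_max)}"
    )

print(build_quasar_query(top=10))
```

The `s.z AS redshift` alias matters because in `PhotoObj` a bare `z` refers to a z-band magnitude, not a redshift. If astroquery is installed, `SDSS.query_sql(...)` from `astroquery.sdss` should be able to send the string to SkyServer; pasting it into the SkyServer SQL search form works too.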

Now, my job for this week (and for the last couple of days) is to plot the same color-redshift relation and derive an empirical relation between the color and redshift of a quasar, but instead of just 2625 quasars, I have ~140,000! I'm so looking forward to this week!
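One simple way to turn ~140,000 (redshift, color) pairs into an empirical relation is to take the median color in narrow redshift bins, which is robust to outliers. A NumPy sketch: the bin width and minimum count are arbitrary choices (not necessarily the paper's), `median_color_vs_redshift` is a hypothetical helper, and the data below are synthetic with a made-up trend, standing in for the real query output.

```python
import numpy as np

def median_color_vs_redshift(redshift, color, bin_width=0.1, min_count=5):
    """Median color in redshift bins of width `bin_width`.

    Bins holding fewer than `min_count` objects are skipped.
    Returns (bin_centers, median_colors) as NumPy arrays.
    """
    redshift = np.asarray(redshift, dtype=float)
    color = np.asarray(color, dtype=float)
    z0 = redshift.min()
    # Integer bin index for every object, counted from the lowest redshift.
    idx = np.floor((redshift - z0) / bin_width).astype(int)
    centers, medians = [], []
    for k in np.unique(idx):
        in_bin = idx == k
        if in_bin.sum() >= min_count:
            centers.append(z0 + (k + 0.5) * bin_width)
            medians.append(np.median(color[in_bin]))
    return np.array(centers), np.array(medians)

# Synthetic stand-in for the query output (NOT real SDSS data): u-g flat at 0.2,
# then rising above z = 2.2, plus Gaussian scatter.
rng = np.random.default_rng(0)
z = rng.uniform(0.0, 5.0, 140_000)
u_minus_g = 0.2 + 0.3 * np.clip(z - 2.2, 0.0, None) + rng.normal(0.0, 0.1, z.size)

centers, medians = median_color_vs_redshift(z, u_minus_g)
print(len(centers), medians[:3])
```

Plotting `medians` against `centers` on top of a scatter of the individual quasars gives the same kind of color-redshift diagram as in the paper.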

Well, that's most of what happened last week: trying to understand the Richards et al. (2000) paper and a couple of others cited in it, learning SQL queries (from here) and retrieving data on quasars, and studying the theory behind quasar emission, including the characteristic power-law spectrum and the emission and absorption features of a quasar spectrum. This post, which is already too long, will go on forever if I get started on those as well. So I shall stop here.

Until next week then...
