
Week 2 - Data Mining & Colors of Quasars

So, here's what I'm working on.
I'm reproducing a paper published in 2000 by Richards et al. on the colors of 2625 quasars at red shifts in the range 0<z<5 and the empirical relation between the color and the red shift of a quasar. I don't know how much of that you understand but well, that's my work for the summer.

Let me now break it down.

The color of a quasar is the difference between its (AB) magnitudes in two different bands - u, g, r, i, z to be specific in the case of the SDSS. The definition of color in this case is no different from what you and I use every day to describe objects. Refer to this awesome comic by The Oatmeal. The color receptors mentioned in the comic are specifically sensitive to the R, G & B bands, which is why they are our primary colors. For humans specifically. Now, we do something similar here - take a CCD (which is our brain) and fit it with 5 (instead of just 3) filters (which are u, g, r, i, z for the SDSS). Light from each of these filters is spread over the CCD and we can record the strength of the quasar or stellar object in different wavelength bands by looking at the object in that specific part of the CCD.

Now, as I said earlier, the color of a quasar is defined as the difference between its magnitudes in two different bands. Mag_u - Mag_g is one of the colors, Mag_g - Mag_r is the next and so on, well, you get the point. So, I now have 4 colors for a quasar. (There's a trick question here regarding the # of colors. If you have the same question as I did in the beginning, comment and ask. Otherwise, just move on.)
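Here's a minimal sketch in Python of how the 4 colors fall out of the 5 magnitudes. The function name `sdss_colors` and the magnitudes are made up for illustration; they aren't from the paper.

```python
def sdss_colors(mags):
    """Return the four adjacent-band colors from a dict of u, g, r, i, z magnitudes."""
    bands = ["u", "g", "r", "i", "z"]
    # Pair each band with the next redder one: (u, g), (g, r), (r, i), (i, z).
    return {f"{blue}-{red}": mags[blue] - mags[red]
            for blue, red in zip(bands, bands[1:])}


# Made-up magnitudes, purely for illustration.
example = {"u": 19.4, "g": 19.1, "r": 18.9, "i": 18.8, "z": 18.7}
colors = sdss_colors(example)

for name, value in colors.items():
    print(f"{name} = {value:.2f}")
```

A smaller (or more negative) color means the object is relatively brighter in the bluer band of the pair, because magnitudes run backwards.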

Now, the Sloan Digital Sky Survey has been running since the early 2000s and has been really successful - a quick SQL search yielded ~150,000 quasars whose spectra were recorded, probably more whose spectra haven't been recorded yet, and 500,000 galaxies whose magnitudes in the 5 bands are known (or maybe 500,000 is what the output is limited to. It's just too perfect a number). Plus a heck of a lot more stars, Seyfert galaxies, BAL galaxies, CEL galaxies and what not. So, yeah, it's a pretty awesome survey.

So, after a year of observations, they decided to study the color-red shift relation of quasars. A paper from 1999 by Fan had given a relation between the color of a quasar and its red shift, and the SDSS guys wanted to see if the relation held true. (Fan was actually part of the SDSS team, I think.) Having found ~1500 quasars themselves and with ~1000 quasars already catalogued in NED and SIMBAD, they went about plotting color-red shift and color-color diagrams of quasars. And whatdoyaknow, it fits. To a good extent.

LLS - Lyman Limit System - refers to absorption blueward of the Lyman limit (912 A in the rest frame), which lies even further to the blue than the Ly alpha line!

(On a different note, I'm reproducing these figures without permission from the authors. Ohh, I sure do hope I won't get into any trouble.)

So, see, it's a good enough fit. 
Now, the reason to do all of this isn't just to prove the color-red shift relation, it's for a much bigger cause: to calculate red shifts for other quasars which haven't been spectroscopically studied yet (and might never be). The red shifts of all the objects in the paper were measured using spectroscopy. There are specific lines in the spectrum of a quasar which can be used to identify the red shift of the quasar. Here's a sample spectrum of a quasar. If you want to know why there are [] and ] around some of the elements, comment and ask.

So, to calculate the red shift of a quasar (or any object in general), the most accurate method is to study the spectrum of said object, identify the different emission lines (this is a fun task!), look at the observed wavelength of the lines (which is different from the rest wavelength because of the red shift) and, because we know the rest wavelength of each line, calculate the red shift.
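In numbers, the red shift follows from 1 + z = (observed wavelength) / (rest wavelength). A tiny sketch, assuming we have already identified a line as Lyman alpha; the observed wavelength below is a made-up measurement, not a value read off a real spectrum:

```python
LYA_REST = 1215.67  # rest wavelength of Lyman alpha in Angstrom


def redshift(lambda_obs, lambda_rest):
    """Red shift from the observed and rest wavelengths of one identified line."""
    return lambda_obs / lambda_rest - 1.0


# Hypothetical measurement: Lyman alpha observed at 4619.5 Angstrom.
z = redshift(4619.5, LYA_REST)
print(f"z = {z:.2f}")
```

In practice you would do this for several identified lines and check that they all agree on the same z.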

Easy peasy.

But alas, spectroscopy is a very arduous and time-consuming task. Compared to photometry (studying the magnitudes of an object in different bands), spectroscopy takes a lot more time (in order to reach the same SNR per pixel as in photometry). So, people use other methods to calculate the red shift of objects.

This is where things get interesting. 

Source : the Richards et al 2000  paper. 

So, this image conveys three things. 
1. the pass bands - u, g, r, i, z - and their wavelength transmission! 
2. The spectrum of a quasar which is at a red shift (z) = 2.8 and
3. The spectrum of a F5V star. 

If you aren't able to differentiate between the quasar and the star, see the spectrum with a sharp peak at ~4600 A (that's where Lyman alpha, 1216 A at rest, lands at z = 2.8)?! With a lot of jagged lines to the left?! Well, that's the quasar spectrum and the other one is the star. That sharp peak you see is the Lyman alpha line and the zig-zag lines to the left of it are collectively referred to as the Lyman alpha forest!

Now, just look at the spectra of quasars at various red shifts:
source : Richards et al (2000)

As you can see, each emission line can be considered a spectral feature and with increasing red shift, some of them move out of the observable band (like the Balmer alpha, i.e. H alpha, line) and new lines (like the Lyman alpha line) move into the observable band.
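To get a feel for which line sits in which filter, here is a rough sketch. The band edges are my own crude approximations (the real SDSS filter curves overlap and have sloping edges), and the rest wavelengths are rounded, so treat the output as illustrative only. Careful: `z` is used both for the red shift and for the name of the reddest band.

```python
# Crude, non-overlapping band edges in Angstrom (an assumption for illustration).
BAND_EDGES = [("u", 3000, 4000), ("g", 4000, 5500), ("r", 5500, 7000),
              ("i", 7000, 8500), ("z", 8500, 10500)]

# Rounded rest wavelengths of some strong quasar emission lines, in Angstrom.
REST_LINES = {"Ly alpha": 1216, "C IV": 1549, "C III]": 1909,
              "Mg II": 2798, "H beta": 4861, "H alpha": 6563}


def band_of(wavelength):
    """Name of the band containing this observed wavelength, or None if outside all of them."""
    for name, low, high in BAND_EDGES:
        if low <= wavelength < high:
            return name
    return None


def lines_in_bands(z):
    """Map each line to the band it falls in at red shift z (None if shifted out)."""
    return {line: band_of(rest * (1 + z)) for line, rest in REST_LINES.items()}


for z in (0.3, 2.8):
    print(z, lines_in_bands(z))
```

At z = 0.3, H alpha is still inside the reddest band and Lyman alpha is far too blue to be seen; by z = 2.8, H alpha has left the optical entirely and Lyman alpha has moved into g.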

Now, if you perform photometry on these quasars, there is a certain trend to be seen. As you look at quasars at higher and higher red shifts, the emission lines move from one band to the next and the colors change accordingly, which is beautifully explained here, in the same Richards et al 2000 paper.

Things are slowly starting to fall into place. 

As you can see, just by performing photometry on quasars (which is easier and faster than spectroscopy), we can calculate redshifts (to a precision of ~0.3!). 

Isn't that just beautiful?! 

Read the paper further to understand exactly how the red shift of a quasar is calculated - they use a chi-squared minimization to estimate the red shift.
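The general idea of a chi-squared photometric red shift can be sketched like this. This is my own toy version, not the paper's exact statistic: the "median color-red shift relation" below is a made-up smooth function, and all names (`photo_z`, `model_colors`, `model_scatter`) are hypothetical.

```python
import numpy as np


def photo_z(obs_colors, obs_err, z_grid, model_colors, model_scatter):
    """Pick the grid red shift whose model colors best match the observed colors.

    model_colors and model_scatter have shape (n_redshifts, n_colors);
    obs_colors and obs_err have shape (n_colors,).
    """
    variance = obs_err ** 2 + model_scatter ** 2
    chi2 = np.sum((obs_colors - model_colors) ** 2 / variance, axis=1)
    return z_grid[np.argmin(chi2)], chi2


# Toy color-red shift relation: 4 made-up colors on a red shift grid of step 0.1.
z_grid = np.linspace(0.0, 5.0, 51)
model_colors = np.column_stack([0.2 * z_grid, 0.1 * np.sin(z_grid),
                                0.05 * z_grid ** 2, 0.1 * np.cos(z_grid)])
model_scatter = np.full_like(model_colors, 0.1)

# Pretend we observed exactly the model colors at grid index 20 (z = 2.0).
obs_colors = model_colors[20]
obs_err = np.full(4, 0.05)

z_best, chi2 = photo_z(obs_colors, obs_err, z_grid, model_colors, model_scatter)
print(f"best-fit red shift: {z_best:.1f}")
```

With real data the chi-squared curve can have more than one dip, because different red shifts can produce similar colors, so it is worth looking at the whole curve and not just its minimum.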

So, well, that's what I intend to reproduce:
the relation between the color & red shift of a quasar, obtained empirically and through simulation.

Well, actually, I thought I could just download the magnitudes and red shifts for the 2625 quasars used by Richards et al but the SIMBAD & NED links give me just 898 objects! I don't know where to find the rest. Probably in the 1st Data Release of SDSS!

But hey, I thought I could just perform an SQL query (I'm calling this data mining. Why?! Cos it sounds cool, that's why!) given the same constraints as those used in the paper and whoopdedaa, I get data for ~140,000 quasars. ~140,000 quasars with 10 columns each, containing information regarding the 5 magnitudes, object ID, classification, RA & Dec and what not. I'm sorry but if that number isn't impressive, I don't know what is!

Now, my job (since the last couple of days and) for this week is to plot the same color-red shift relation and draw an empirical relation between the color and red shift of a quasar but instead of just 2625 quasars, I have ~140,000! I'm so looking forward to this week!
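One simple way to turn a cloud of (red shift, color) points into an empirical relation is to take the median color in red shift bins. A minimal sketch with NumPy, assuming the red shifts and one color are already loaded into arrays; the function name and the toy numbers are mine, not from the paper:

```python
import numpy as np


def median_color_vs_redshift(redshift, color, bin_edges):
    """Median color in each red shift bin; empty bins are returned as NaN."""
    redshift = np.asarray(redshift, dtype=float)
    color = np.asarray(color, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)

    # np.digitize returns 1 for the first bin, so shift to 0-based bin indices.
    index = np.digitize(redshift, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

    medians = np.full(n_bins, np.nan)
    for k in range(n_bins):
        in_bin = color[index == k]
        if in_bin.size:
            medians[k] = np.median(in_bin)
    return centers, medians


# Toy data, purely for illustration.
toy_z = [0.1, 0.2, 0.3, 1.1, 1.2]
toy_color = [1.0, 2.0, 3.0, 10.0, 20.0]
centers, medians = median_color_vs_redshift(toy_z, toy_color, [0.0, 1.0, 2.0])
print(centers, medians)
```

The median is less sensitive than the mean to the odd outlier (a reddened quasar, a bad magnitude), which is handy with a sample this big.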

Well, that's most of what was happening last week: trying to understand the Richards et al 2000 paper and a couple of others which were cited in it, learning to write SQL queries (from here) and retrieving data on quasars, studying the theory behind quasar emission, and understanding the characteristic power-law spectrum and the emission and absorption features in a quasar spectrum. This post, which is already too long, will go on forever if I get started on those as well. So, I shall stop here.

Until next week then...
