Skip to main content

Colors of Quasars- Data Acquisition and Analysis

Over the course of my project, i've had to work with multiple data sets. Two of the important data sets were the original data set used by Richards et al in their paper and a new data set that i acquired - data set whose samples are constrained by the same conditions as mentioned in the richards et al paper.
Part of the original data used in the paper can be acquired from browsed from the vizier archived data, SIMBAD object list and NED object list. These data sets can be found through a simple search for the paper on SAO/NASA ADS.
As mentioned, this is just a part of the original data set i.e data pertaining to only 898 of the 2625 quasars used in the original paper are available for download through these links. The location for the rest of the sample set is still elusive - maybe they were released as part of the SDSS Data Release 1 which is why they weren't specifically included in this data set. This conclusion is based on the fact that none of these 898 objects are classified as SDSS - although SDSSp doesn't appear here and there.
The new data set was acquired by querying the Data Release 9 using an SQL Query. Constraints defined in the original paper were applied while querying DR9. A total of 146,659 query were found to satisfy the query.
The AB magnitude constraints set in the original paper are
  • u - 22.3
  • g - 22.6
  • r - 22.7
  • i - 22.4
  • z - 20.5
The reason behind setting these constraints might be to increase the efficiency of the photo-z estimation. As mentioned, the photo-z method isn't a particularly new method and it's effectiveness depends highly on accurate photometry of the objects. Therefore, the less accurate the photometry, the less accurate the photo-z estimation. Depending on the telescope's physical limitations, the authors might've concluded that the error in photometry for objects beyond these AB magnitude limits is too large.
The SQL query used to retrieve the new data set -
    plateX.plate, plateX.mjd, specObj.fiberID, 

    PhotoObj.modelMag_u, PhotoObj.modelMag_g, 
    PhotoObj.modelMag_r, PhotoObj.modelMag_i, PhotoObj.modelMag_z,
    PhotoObj.ra, PhotoObj.dec,

    specObj.z, PhotoObj.ObjID

    PhotoObj, specObj, plateX

    specObj.bestObjid = PhotoObj.ObjID 
    AND plateX.plateID = specObj.plateID 
    AND specObj.class = 'qso' 
    AND specObj.zWarning = 0
    AND PhotoObj.modelMag_u < 22.3  
    AND PhotoObj.modelMag_g < 22.6
    AND PhotoObj.modelMag_r < 22.7
    AND PhotoObj.modelMag_i < 22.4
    AND PhotoObj.modelMag_z < 20.5
As you can see, the query is for objects which have been spectroscopically confirmed as quasars within the AB magnitude limits. The magnitude of the object in the 5 filters - u, g, r, i, z are acquired along with the red shift of the quasar - obtained spectroscopically, the object ID of the quasar and the plate number, fiber ID & mean julian date on which the observation was made.
A sample of how the data looks -
plate mjd fiberID Mu Mg Mr Mi Mz ra dec z ObjID
1257 52944 91 19.02 16.99 15.85 15.26 14.87 71.32 25.64 0.967008 1237660760537100000
4997 55738 704 21.68 20.67 19.95 19.76 19.51 256.87 32.02 0.969408 1237655501888550000
1251 52964 27 20.06 17.89 16.65 15.99 15.57 66.13 27.99 0.967218 1237660553302510000
1256 52902 470 19.85 17.72 16.50 15.85 15.44 69.05 25.90 0.967194 1237660559209400000
1251 52964 24 19.74 17.70 16.57 16.00 15.67 65.83 28.24 0.96773 1237660750872310000
This new data has been uploaded to my google drive and is available for download. Note that the file is 16.4 MB in size. You can query the DR9 of SDSS yourself here.
Analysis of the data sets was done using Python. There are two data sets - the original data set which has a .dat file extension and the new data set which has a .csv file extension.
The .dat file has been extracted as follows -
>>> fin = open('foo.dat')
>>> data = fin.readlines()
>>> fin.close()
The .dat file came with a ReadMe file that gave instructions as to the formatting of the data in various columns -
Byte-by-byte Description of file: table2.dat
Bytes Format Units Label Explanations
1- 30 A30 Name Quasar name
32- 36 F5.3 z Redshift
38- 39 I2 h RAh Right Ascension (J2000)
41- 42 I2 min RAm Right Ascension (J2000)
44- 48 F5.2 s RAs Right Ascension (J2000)
50 A1 ± DE- Sign of the Declination (J2000)
51- 52 I2 deg DEd Declination (J2000)
54- 55 I2 arcmin DEm Declination (J2000)
57- 60 F4.1 arcsec DEs Declination (J2000)
62- 65 F4.2 arcsec DelPos Difference between SDSS and cataloged positions
67- 71 F5.2 mag umag The SDSS u band photometry (1)
73- 77 F5.3 mag eu Uncertainty in umag
79- 83 F5.2 mag gmag The SDSS g band photometry (1)
85- 89 F5.3 mag eg Uncertainty in gmag
91- 95 F5.2 mag rmag The SDSS r band photometry (1)
97-101 F5.3 mag er Uncertainty in rmag
103-107 F5.2 mag imag The SDSS i band photometry (1)
109-113 F5.3 mag ei Uncertainty in imag
115-119 F5.2 mag zmag The SDSS z band photometry (1)
121-125 F5.3 mag ez Uncertainty in zmag
127-130 F4.2 mag RedVec Reddening vector in E(gr) (2)
132-134 A3 rName Source catalog of each object (3)
  • SDSS magnitudes are based on the AB magnitude system Oke & Gunn (1983ApJ...266..713O), and are given as asinh magnitudes Lupton, Gunn, & Szalay (1999AJ....118.1406L).
  • As determined from Schlegel et al. (1998ApJ...500..525S).
  • Source catalog of each object -1 = NED -2 = SDSS.
using this, we can extract the relevant columns as
>>> data[i][31:36]
# to extract red shift
>>> data[i][66:71]
# to  extract the u band magnitude
>>> u, g, r, i, z = [], [], [], [], []
>>> redshift = []
>>> for j in xrange(len(data)):
...     tmp = data[j][31:36]
...     redshift.append(tmp)
...     tmp = data[j][31:36]
...     u.append(tmp)
...     tmp = data[j][78:83]
...     g.append(tmp)
...     tmp = data[j][90:95]
...     r.append(tmp)
...     tmp = data[j][102:107]
...     i.append(tmp)
...     tmp = data[j][114:119]
...     z.append(tmp)
>>> u_g, g_r, r_i, i_z = [], [], [], []
>>> for j in xrange(len(redshift)):
...     tmp = u[j] - g[j]
...     u_g.append(tmp)
...     tmp = g[j] - r[j]
...     g_r.append(tmp)
...     tmp = r[j] - i[j]
...     r_i.append(tmp)
...     tmp = i[j] - z[j]
...     i_z.append(tmp)
>>> subplot(221)
>>> scatter(redshift, u_g, s=5)
>>> subplot(222)
>>> scatter(redshift, g_r, s=5)
>>> subplot(223)
>>> scatter(redshift, r_i, s=5)
>>> subplot(224)
>>> scatter(redshift, i_z, s=5)
We have extracted the magnitude of the quasar in u, g, r, i, z bands and calculated the colors of the quasar - u-g, g-r, r-i and i-z. We then plotted the colors with respect to redshift.
Data extracted has been visualized as -
Histogram of the redshift
color vs redshift plots
Similarly, the new data set has been read using the pandas library
>>> import pandas as pd
>>> data = pd.read_csv('foo.csv')
>>> data['column_name']
# prints the data in the specific column. 
>>> data
# a summary of the data set, including the column identifiers.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 146659 entries, 0 to 146658
Data columns (total 12 columns):
plate         146659  non-null values
mjd           146659  non-null values
fiberID       146659  non-null values
modelMag_u    146659  non-null values
modelMag_g    146659  non-null values
modelMag_r    146659  non-null values
modelMag_i    146659  non-null values
modelMag_z    146659  non-null values
ra            146659  non-null values
dec           146659  non-null values
z             146659  non-null values
ObjID         146659  non-null values
dtypes: float64(8), int64(4)
>>> u,g,r,i,z = data['modelMag_u'], data['modelMag_g'], data['modelMag_r'], data['modelMag_i'], data['modelMag_z']
Histogram of the new redshift data samples
color vs redshift plots of samples from the new data set
Enhanced version of the previous plot
Though it is not obvious, you can see that the color-redshift plots for quasars from the old sample set and for those from the new sample set follow a similar pattern wrt red shift!
We are now interested in obtaining a median color vs redshift relation for each of the 4 colors - u-g, g-r, r-i, i-z.
The motivation behind using the median color instead of a mean color is due to the presence of a large number of outliers in our data set - outliers in the form of reddened quasars, seyfert galaxies & broad emission line (BAL) quasars, outliers which may skew the mean.
Following the paper 'Photometric Redshifts of Quasars' by Richards et al 2001, for each of the 4 colors - the red shift range is first binned with centers at 0.05 redshift intervals with a bin width of 0.075 for z 2.15, 0.2 for 2.15z2.5 and 0.5 for z>2.5 centered at the 0.05. We now calculate the median colors for all of the quasars lying within these bins.
color vs redshift plot with the median color plotted
Here, the median color has been calculated using a bin width of 0.05 over the whole red shift interval. As you can see, the median color plotted in turquoise varies sharply beyond z=3. The reason for this, i believe, is because of the rapid fall in the number of quasars available in the red shift bins beyond this redshift, as is obvious from the histogram shown above! And I speculate that this is the reason why the authors chose aforementioned bin values.
Thus, I've been able to partially replicate and extend the paper richards et al 2001.
Further, having extended the results to a new sample set with 146,659 quasars, interesting features can be seen in the color-redshift plots. Vertical clustering of quasar colors can be seen in the color redshift plots of the new data set - clustering indicative of a wide color range for quasars at roughly the same redshift. Three such features are clearly visible in the u-g color vs redshift plot - at z~1, z~1.3 and z~1.6. Some of these features are visible in the other color-redshift plots.
Looking at the spatial projection, it can be seen that most of these quasars fall into two specific clusters. These fields are being further investigated as to understand their peculiar properties.
projection of quasars at the z~1 vertical feature
projection of quasars at the z~1.6 vertical feature
>>> ra,dec = data['ra'],data['dec']
...     for j in xrange(len(ra)):
...     if ra[j]>180:
...         ra[j] = ra[j] - 360
...     ra[j] = ra[j]*np.pi/180
...     dec[j] = dec[j]*np.pi/180
# the ra and dec should first be converted into radians and the ra and dec values should be in the range -180:180 and -90:90 respectively
>>> subplot(111,projection="mollweide")
>>> scatter(ra,dec,s=5)
>>> grid(True)
>>> title("mollweide projection of quasars in the 1st vertical feature")
The spectra of these quasars have been obtained and for the quasars in the vertical feature at z~1, this is how they look
spectra of quasars on the first feature
>>> import astropy as ap
# has wayy more functionality than pyfits but for now, pyfits will suffice
>>> import pyfits as pf

>>> hdulist ='foo.fits')
# the number of blocks in the fits file etc
>>> hdulist[i].header
# the header for the ith block
>>> hdulist[i].data
# data from the ith block
# function spec_plot() to extract and plot the spectrum of a fits file. 
>>> import pyfits as pf
>>> plot.hold(True)
>>> spec_plot(file):
...     hdulist =
...     print
...     print len(hdulist[1].data)
...     flux = [] 
...     loglam = [] 
...     for i in xrange(len(hdulist[1].data)):
...         tmp = (hdulist[1].data)[1][0:1]
...         flux.append(tmp)
...         temp = (hdulist[1].data)[1][1:2]
...         loglam.append(temp)
# in this specific case, the 1st block contains the data pertaining to the spectrum and the 1st value in this block is flux while the 2nd value is the log(lambda)
...     print len(flux)
...     print len(loglam)
...     plot(loglam, flux, lw=0.3)
The reason i included the command
>>> plt.hold(True)
in the beginning to be able to plot multiple spectra in the same plot window - as shown above. In order to automate this task, i include a small routine on top of spec_plot() that will give this function the files to extract and plot spectra from. I've created such a file using
$ ls >> fits_list
so, the delimiters are ',\n'. Now, i can read this file 'fits_list' but i can't explicitly extract just the file name because the delimiter ',\n' is attached to it. In order to get over this problem, we need the freedom to explicitly mention the delimiters ourselves. And we can do so using the function read() from the library astropy. We need to edit this function and set the delimiters ourselves so that we can explicitly retrieve just the file names from 'fits_list' which can then be fed to spec_plot()
the function read() should be in
or just do
$ locate astropy/io/ascii/
and it should show up! Now, edit the function as such
rdr = ascii.get_reader(Reader=ascii.Basic)
# rdr.header.splitter.delimiter = ' ' = ',\n '
# rdr.header.start_line = 0 = 0 = None
# rdr.header.comment = r'\s*#'
# = r'\s*#'
Having edited the function read() to use the delimiters as ',\n'
>>> fin = open('fits_list')
>>> list = fin.readlines()
>>> fin.close()
>>> list[i]
# should explicitly print the ith file name
as for the ability to plot the spectra of all the files in a directory, the file names of which are in fits_list
>>> def spec_all(file):
...     fin = open(file)
...     list = fin.readlines()
...     fin.close()
...     for k in xrange(len(list)):
...         tmp = list[k]
...         spec_plot(tmp)
should do the trick.
emission line wavelength(rest frame)
Ly alpha 1.22E+03
NV 1240 1.24E+03
CIV 1549 1.55E+03
HeII 1640 1.64E+03
CIII] 1908 1.91E+03
MgII 2799 2.80E+03
OII 3725 3.73E+03
OII 3727 3.73E+03
NeIII 3868 3.87E+03
Hepsilon 3.89E+03
NeIII 3970 3.97E+03
Hdelta 4.10E+03
Hgamma 4.34E+03
OIII 4363 4.36E+03
HeII 4685 4.69E+03
Hbeta 4.86E+03
OIII 4959 4.96E+03
OIII 5007 5.01E+03
HeII 5411 5.41E+03
OI 5577& 5.58E+03
OI 6300 6.30E+03
SIII 6312 6.31E+03
OI 6363 & 6.37E+03
NII 6548 6.55E+03
Halpha 6.56E+03
NII 6583 6.59E+03
SII 6716 6.72E+03
SII 6730 6.73E+03
ArIII 7135 7.14E+03
Written with StackEdit.

Popular posts from this blog

Animation using GNUPlot

Animation using GNUPlotI've been trying to create an animation depicting a quasar spectrum moving across the 5 SDSS pass bands with respect to redshift. It is important to visualise what emission lines are moving in and out of bands to be able to understand the color-redshift plots and the changes in it.
I've tried doing this using the animate function in matplotlib, python but i wasn't able to make it work - meaning i worked on it for a couple of days and then i gave up, not having found solutions for my problems on the internet.
And then i came across this site, where the gunn-peterson trough and the lyman alpha forest have been depicted - in a beautiful manner. And this got me interested in using js and d3 to do the animations and make it dynamic - using sliders etc.
In the meanwhile, i thought i'd look up and see if there was a way to create animations in gnuplot and whoopdedoo, what do i find but nirvana!

In the image, you see 5 static curves and one dynam…

on MOOCs.

For those of you who don't know, MOOC stands for Massively Open Online Course.

The internet is an awesome thing. It's making education free for all. Well, mostly free. But it's surprising at the width and depth of courses being offered online. And it looks like they are also having an impact on students, especially those from universities that are not top ranked. Students in all parts of the world can now get a first class education experience, thanks to courses offered by Stanford, MIT, Caltech, etc.

I'm talking about MOOCs because one of my new year resolutions is to take online courses, atleast 2 per semester (6 months). And I've chosen the following two courses on edX - Analyzing Big Data with Microsoft R Server and Data Science Essentials for now. I looked at courses on Coursera but I couldn't find any which was worthy and free. There are a lot more MOOC providers out there but let's start here. And I feel like the two courses are relevant to where I …

On programmers.

I just watched this brilliant keynote today. It's a commentary on Programmers and the software development industry/ecosystem as a whole.

I am not going to give you a tl;dr version of the talk because it is a talk that I believe everyone should watch, that everyone should learn from. Instead, I am going to give my own parallel-ish views on programmers and programming.
As pointed out in the talk, there are mythical creatures in the software development industry who are revered as gods. Guido Van Rossum, the creator of Python, was given the title Benevolent Dictator For Life (BDFL). People flock around the creators of popular languages or libraries. They are god-like to most programmers and are treated like gods. By which, I mean to say, we assume they don't have flaws. That they are infallible. That they are perfect.
And alongside this belief in the infallibility of these Gods, we believe that they were born programmers. That programming is something that people are born wit…