Extracting JWST data
Firstly, you’ll need to download your data! Navigate to MAST and use the search boxes to find your target / program. Add your desired observations to your download basket and then make sure to download the uncal.fits
files, by marking the check box next to “UNCAL” in the “Group” filter within the download basket. These are what we will use for the Stage 1 reduction. You don’t need to download any more files than the uncal.fits
files.
Alternatively, you can download the rateints.fits files if you don’t want to perform stage 1 extraction yourself (not recommended) and jump straight to “Stage 2” below.
1. Stage 1
1.1 Running the jwst
pipeline on uncal.fits
files
After downloading and unpacking the JWST data, navigate to your downloaded directory. You will see that the JWST files are divided into separate segment subdirectories jw......-seg001
, jw.........-seg002
,… . I tend to leave these subdirectories as they are and run the below stage 1 steps separately within each segment sub-directory.
At this point, I like to take a quick look at the data and to make an initial bad pixel mask, which we will use later.
For the quick look and bad pixel mask creation, look at the example jupyter notebook under Tiberius/src/reduction_utils/JWST_utils/reduction_notebooks/0_quick_look_data.ipynb
.
Now you will want to run the relevant stage 1 executable found under Tiberius/src/reduction_utils/JWST_utils/stage1_*
. These are just executable text files that string together the relevant commands from the jwst
pipeline, following the procedure outlined here.
Note
The stage 1 files include a 1/f correction by default and saturation flagging override for PRISM data as default, following the procedures outlined in Rustamkulov et al. 2023. If you don’t wish to perform these corrections, you’ll need to comment out these lines within the relevant “stage1_*” file. If you do want to perform a 1/f correction, see the following sub-section, otherwise skip ahead.
1.1.1 Performing a 1/f correction
In JWST observations, there is column-dependent (for NIRCam it’s row-dependent) noise which can increase the noise in your extracted light curves. As shown in Rustamkulov et al. 2023, this noise is best-removed at the group stage. In order to do this accurately, it is necessary to determine the locations of the trace so that it can be masked from the background estimation.
To do this, first make a “master” uncal.fits
file (a median frame of all uncal.fits
files – see Tiberius/src/reduction_utils/JWST_utils/0_quick_look_data.ipynb
for an example. You’d then want to run Tiberius/src/reduction_utils/spectral_extraction.py
on this master uncal.fits
file to determine the location of the trace (see section 1.3 below to see how to do this).
Then in the relevant line in your stage1_
file, you would have to change the following:
python $HOME/python/Tiberius/src/reduction_utils/JWST_utils/1overf_subtraction.py $1_darkcurrentstep.fits --trace_location /path-to-master_uncal.fits/pickled_objects/x_positions_1.pickle --extraction_input /path-to-master_uncal.fits/extraction_input.txt
Now you are ready to run stage 1, which is described in more detail below.
1.1.2 Running stage1
files
In stage1_NIRCam
, I demonstrate how to override the reference files that jwst
will try to use by default, as I’ve found that it won’t always use the most recent reference files. You can download the latest JWST reference files here.
If you do download new reference files, I recommend putting them in:
$HOME/crds_cache/jwst_pub/references/jwst/[instrument-name-in-lower-case]/
Once you are happy with your stage1_*
file, you can run the stage 1 extraction by doing the following (assuming you have installed jwst
into a new conda environment called JWST
). Note if you copy/make a new stage1_*
file, you’ll need to make it executable by doing:
chmod +x stage1_*
Then you can proceed as follows:
conda activate JWST
export CRDS_PATH=$HOME/crds_cache/jwst_pub
export CRDS_SERVER_URL=https://jwst-crds-pub.stsci.edu
. /path/to/stage1_* [file-name-of-jwst-fits-file-truncated-before-_uncal.fits]
e.g.,
. /path/to/stage1_PRISM jw01366004001_04101_00001-seg001
This will produce a series of fits files, with the main one of interest being the gainscalestep.fits
files which is what we will work with in Stage 2. By default, Tiberius’ stage1_*
executables clean the subdirectories of other intermediate fits files, otherwise you can quickly run out of storage! You can prevent this behaviour by commenting out the relevant lines (rm jw......fits
) in the stage1_
text files.
1.2 Cleaning the cosmic rays / telegraph pixels
With the gainscalestep.fits
in hand, you’re ready to proceed with cleaning the fits files of cosmic rays.
Within the parent directory of your segment subdirectories, first make a list of the gainscalestep.fits
files:
ls **/*gainscalestep.fits > cosmic_file_list
Then run reduction_utils/locate_cosmics.py
which will locate the cosmic rays and telegraph pixels by calculating medians for every pixel in the time-series and comparing each pixel to its respective median. Flagged outliers will then be replaced by the median for that pixel in the time-series.
I have set the default arguments to sensible values but you will want to experiment on a case-by-case basis to see whether these need altering. In most cases with Tiberius, adding -h
as a command line argument will print help for that particular script along with argument definitions.
After generating cosmic_file_list
do:
python /path/to/Tiberius/src/reduction_utils/locate_cosmics.py cosmic_file_list -jwst -h
Once you have looked at the parameter definitions, run the above again without the -h
parameter.
This will calculate all pixel medians and then plot all integrations that have a total number of flagged pixels greater than the threshold set by -frame_clip
(default = 3, which might plot a lot of frames!).
For every frame that exceeds this threshold, it will ask you in the terminal:
Reset mask for integration N? [y/n]
This gives you an opportunity to overwrite all pixel flags for a whole integration if you suspect the outlier detection was too aggressive. If you have the settings right, this should just plot integrations with massive cosmics, for which you can reply n
to the command line question.
Once you have vetted all these flagged frames, it will ask you one last question (try not to be too hasty with your n
key!!).
Replace cosmic values with median and save to new fits? [y/n]:
Providing you are happy with everything up to this point, you can hit y
which will replace all flagged pixels in the time-series with the medians and save the cleaned integrations to a new directory called cosmic_cleaned_fits/
. If you are not happy, hit n
and play around with the command line arguments for locate_cosmics.py
.
1.3 Extracting stellar spectra
Now we have our cosmic-cleaned integration level fits files, we are ready to run aperture photometry on these to extract our stellar spectra.
I recommend you make a new directory (reduction01, reduction02,...
) for each test reduction you perform (e.g., different aperture and background widths).
In each new reduction directory, you will need to make a new extraction_input.txt
file (which can be copied from a previous reduction or from /path/to/Tiberius/src/reduction_utils/extraction_input.txt
). You will also need to make a text file with a list of filenames defining the fits files you will be running the extraction over. Assuming you’re working with the cosmic-cleaned fits files, this can be made like so:
ls /path/to/cosmic_cleaned_fits/*.fits > science_list
You then need to define the path to this science_list
in your extraction_input.txt
file. I don’t explain the different parameters in extraction_input.txt
at this point as they are each explained within the example extraction_input.txt
bundled in the Tiberius
download.
One thing I do recommend, however, is that every time you run a reduction for the first time, or with a new set of extraction parameters, that you set verbose = -2
in extraction_input.txt
. This will plot a number of helpful plots for every integration and allow you to check whether the parameters you’ve selected are sensible. If they are, then you can quit the extraction and set verbose = -1
(for no plots) or verbose = 0
(which will only show plots for a particular integration if something has gone wrong with that integration).
Note
Tiberius
needs to have the dispersion/spectral direction along the vertical axis. That means for NIRSpec, NIRCam and NIRISS data you need to set rotate_frame = 1
in extraction_input.txt
.
To actually run the extraction, you will need to run the following from within your reduction directory where you have put extraction_input.txt
and science_list
:
python /path/to/Tiberius/src/reduction_utils/spectral_extraction.py
This will loop through all integrations, performing aperture photometry, and print out its progress.
After running spectral_extraction.py
, you will see that two new sub-directories have been made:
pickled_objects/
which contains the extracted stellar flux (star1_flux.pickle
), flux uncertainty (star1_error.pickle
), time stamps (time.pickle
==int_mid_BJD_TDB
from the FITS headers), measured FWHM (fwhm_1.pickle
), x position (x_positions_1.pickle
) and measured background (background_avg_star1.pickle
) as pickled numpy arrays.initial_WL_fit/
which contains the extracted white light light curve (initial_WL_flux.pickle
), white light light curve error (initial_WL_err.pickle
) and white light curve time arrays (initial_WL_time.pickle
). These can be fitted withTiberius
’s light curve fitting tools (read on to see how) to check the quality of your reduction.
1.3.1 A note on background subtraction
During the spectral_extraction.py
step, you have the option to perform a background subtraction at the integration level, using the background parameters in extraction_input.txt
. Tiberius
can fit any order of polynomial (or use a median) across two regions either side of the trace, as defined in extraction_input.txt
. I have found an additional background subtraction step to be advantageous even if you performed a 1/f correction at the group stage. This is because the background may have structure that is not well-described by the median that was used in the 1/f step.
1.3.2 A note on oversampling
Tiberius
allows you to oversample an integration’s flux along the spatial dimension. This is done via a flux-conserving linear interpolation onto an axis with N times the original number of pixels. The motivation for this step is to be able to use sub-pixel apertures, which is particularly beneficial for curved and/or undersampled PSFs (e.g., PRISM). In tests on ERS PRISM data, setting oversampling_factor = 10
in extraction_input.txt
led to an improvement in white light scatter of 14%.
1.4 Post-processing the spectra
After you’ve extracted the spectra using spectral_extraction.py
, you’re ready to perform the wavelength calibration, correct for any shifts in the spectra and create your wavelength bins and light curves. These steps are done using a serious of Jupyter notebooks, with examples included in Tiberius/src/reduction_utils/JWST_utils/reduction_notebooks/
.
I tend to copy the example reduction_notebooks
directory into each of my reductionNN/
directories. I go through each of these notebooks below.
0_quick_look_data.ipynb
: I use this notebook to look at the uncal.fits files and make bad pixel masks1_cosmic_removal.ipynb
: this notebook describes how you can check for and remove residual cosmic rays and bad pixels from your extracted spectra. Typically, if you’ve runlocate_cosmics.py
this step is not necessary.2_spectra_resampling.ipynb
: this notebook cross-correlates each spectrum in the time-series with an reference spectrum from the time-series to determine how the spectra shift in the dispersion axis. You can then use these shifts to resample the spectra onto a common (sub)pixel grid. This is not strictly necessary given the shifts are typically << 1 pixel.3_wavelength_calibration.ipynb
: this notebook shows you how to get the wavelength solution from theextract2d.fits
files.4_light_curve_creation.ipynb
: this notebook shows you how to make your spectroscopic light curves from your selected wavelength bins5_reformatting_results.ipynb
: an example notebook about how to reformat the outputs fromTiberius
for easier comparison with other pipelines.
1.5 Outcome
At this stage, you should have extracted 2D stellar spectra and light curves (as pickled numpy arrays) and you’re able to move onto light curve fitting!