Converting data to BIDS using HeuDiConv

For a guided tutorial of everything on this page, check out the video recordings from our Pygers Workshop.

Do this once: Setting up your directory structure for a new study

Getting data off the scanner in a BIDS-formatted way requires that your directories are set up in a specific way. To make things simpler, we have made a template folder with the directory structure already in place. Copy the folder new_study_template, which can be found in /jukebox/norman/pygers/handbook/new_study_template, to wherever you would like your new study directory to live.

The specific steps to copy the folder using the terminal are as follows:

# log into spock or scotty (pick one)
$ ssh -XY username@spock.pni.princeton.edu
# OR
$ ssh -XY username@scotty.pni.princeton.edu

# copy the study template to your own directory
$ cp -r /jukebox/norman/pygers/handbook/new_study_template /jukebox/YOURLAB/USERNAME/

# rename new_study_template with your study name
$ cd /jukebox/YOURLAB/USERNAME
$ mv new_study_template [STUDYNAME]

# if you are working with our sample data, call it sample_project:
$ cd /jukebox/YOURLAB/USERNAME
$ mv new_study_template sample_project
# from now on, replace YOURSTUDY in path examples with 'sample_project'

Here is the hierarchy of the folders in the new_study_template folder.

$ tree

└── new_study_template               # copy this directory to set up the entire directory structure for a new project
    ├── code
    │   ├── analysis                 # [example] analysis code can live here
    │   ├── preprocessing            # this is where heudiconv, fmriprep, mriqc scripts live
    │   └── task                     # [example] task code can live here
    └── data
        ├── behavioral               # [example] other data (e.g., behavioral) can live at this level
        ├── bids                     # this is where raw BIDS data will be saved by HeuDiConv
        │   ├── sub-001              # these sub directories don't exist until you run HeuDiConv
        │   ├── sub-002
        │   ├── sub-003
        │   ├── derivatives          # this is where everything "derived" from BIDS data will live
        │   │   ├── deface           # defaced T1 images go here
        │   │   │   └── logs         # slurm logs
        │   │   ├── fmriprep         # fmriprep-preprocessed data will go here
        │   │   │   └── logs         # slurm logs
        │   │   ├── freesurfer       # fmriprep will also run freesurfer reconstruction and outputs go here
        │   │   └── mriqc            # mriqc output will go here
        │   │       └── logs         # slurm logs
        │   └── .bidsignore          # similar to .gitignore, list all files/directories you don't want checked by the bids-validator
        ├── dicom                    # raw dicoms copied from the scanner go here
        │   └── check_volumes        # outputs checking that all dicoms transferred
        └── work                     # work files generated by fMRIPrep and MRIQC

IMPORTANT NOTES:

  • You need to decide where your intermediate work files will live. One option is to send them to the data/work directory included in new_study_template. Another is to send them to the scratch volume on the PNI server. Scratch can be a good choice because intermediate work files take up a lot of space, and scratch is not backed up like the rest of the server, so keeping work files there saves backup space. You most likely won’t need the work files after MRIQC and fMRIPrep have finished running for a given subject; worst case, if you lose your work files and later realize you need them, you can simply re-run MRIQC or fMRIPrep. To move forward with the scratch option, you will need to set up a directory for work files on scratch.

# navigate to scratch
$ cd /jukebox/scratch

# make yourself a directory if you don't already have one
$ mkdir <USERNAME>

# move into your personal directory and make a work directory for this project
$ cd <USERNAME>
$ mkdir -p work/YOURSTUDY

  • After copying the template directory for your new study, you need to update the paths in globals.sh. Open globals.sh and update the following three directories:
    • scanner_dir (see note)

    • project_dir (path to your root study directory)

    • scratch_dir (path to where you want your work files to live for this project)
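Once edited, globals.sh might contain lines like these (the values shown are illustrative, using the sample project; substitute your own paths):

```shell
# globals.sh -- example values (illustrative; substitute your own paths)
scanner_dir=/jukebox/norman/pygers/conquest                  # sample project's "fake" scanner directory
project_dir=/jukebox/YOURLAB/USERNAME/sample_project         # root study directory
scratch_dir=/jukebox/scratch/USERNAME/work/sample_project    # where intermediate work files go
```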

Note

scanner_dir is the path to the conquest directory where your files end up after transferring off the scanner.

If you are following these steps to practice using BIDS with our sample project, you should make sure scanner_dir is set to copy the sample dataset from our “fake” scanner directory:

scanner_dir=/jukebox/norman/pygers/conquest

Otherwise, if you are setting this up for your own study, scanner_dir should point to the directory where your raw data are sent when you transfer data off the scanner. At PNI, if you scanned on Skyra, this is:

scanner_dir=/jukebox/dicom/conquest/Skyra-AWP45031/YOURLAB/YEAR

OR if you scanned on Prisma:

scanner_dir=/jukebox/dicom/conquest/Prisma-MSTZ400D/YOURLAB/YEAR

  • Before running fMRIPrep for the first time, you will need to download a FreeSurfer license file and save it in your /code/preprocessing/ directory. If you decide to save it somewhere else (which is totally fine!), then you will need to update the --fs-license-file line of run_fmriprep.sh with the correct license file location.

  • Anatomical images need to be defaced before they can be shared publicly. We recommend defacing images as you collect data and saving them in /data/derivatives/deface, so they are available when you need them (e.g., data visualization in notebooks that may be shared publicly). Depending on the goals of your study, it may not be a good idea to preprocess your data using defaced images (e.g., it might introduce registration problems), so that is why we have them set aside in the derivatives directory here.

Convert DICOMs to BIDS-formatted NIfTI

Step 1: Convert your dicoms into nifti files using HeuDiConv

This step will use the following four scripts (all of which can be found in /code/preprocessing):

  • step1_preproc.sh

  • number_of_files.py

  • run_heudiconv.py

  • deface.sh

The script step1_preproc.sh will do five things for you:

  1. copy your DICOM files from “conquest” and place them in your study directory (/data/dicom/)

  2. count the number of volumes in each run so you can check that your data transfer was successful (the output of this step can be found in /data/dicom/check_volumes, and will also be printed out in your terminal window)

  3. unzip the DICOMs in your study directory

  4. run HeuDiConv to convert your DICOMs (.dcm) to BIDS-formatted Nifti files (.nii)

  5. Deface your T1w anatomical image and set it aside in your derivatives directory (/data/bids/derivatives/deface)

HeuDiConv is a flexible DICOM converter for organizing brain imaging data into structured directory layouts.
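Under the hood, run_heudiconv.py assembles a HeuDiConv call for you. For orientation, a direct command-line invocation looks roughly like the sketch below; the DICOM template and heuristic filename are illustrative, not the exact arguments run_heudiconv.py passes:

```shell
# illustrative HeuDiConv call (not the exact command run_heudiconv.py issues):
#   -d      template for locating DICOM files ({subject} is filled in by heudiconv)
#   -s/-ss  subject and session IDs
#   -f      heuristic file mapping scan names to BIDS names
#   -c      converter to use (dcm2niix); -b writes BIDS metadata
#   -o      output directory
heudiconv -d "data/dicom/{subject}/*/*.dcm" -s 001 -ss 01 \
    -f code/preprocessing/heuristic.py -c dcm2niix -b -o data/bids
```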

You should run step1_preproc.sh for each subject and each session separately. You can run step1_preproc.sh as soon as your data have finished transferring off the scanner to the conquest directory (i.e., ~20 min after you finish scanning).

The script takes three inputs:

  • subjectID

  • sessionID

  • the name of the data folder that contains your DICOM-images for that subject/session (at Princeton, this is in the “conquest” directory). You can get this information by listing the files in the conquest directory:

    • from Skyra: ls /jukebox/dicom/conquest/Skyra-AWP45031/YOURLAB/YEAR

    • from Prisma: ls /jukebox/dicom/conquest/Prisma-MSTZ400D/YOURLAB/YEAR

    • sample project: ls /jukebox/norman/pygers/conquest

Tip

Add the above ls command as an alias in your .bashrc file to easily get this info when you need it:

alias 'conquest'='ls /jukebox/dicom/conquest/Skyra-AWP45031/YOURLAB/YEAR'

Then instead of typing out the full conquest path every time you want to see the files in that directory, you can simply type conquest on your command line!

Whatever subjectID you use as your first input will correspond to how your BIDS subject folders are named (e.g., inputting 999 will result in a directory called sub-999).

SessionID (second input) should match how your runs were named on the scanner (e.g., input 01 for sessionID if your runs were named func_ses-01_task-study_run-01). If your study doesn’t include multiple sessions per subject, you will need to make some modifications to these scripts to remove the session information.
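Put together, the subjectID and sessionID inputs determine the BIDS names of the converted files. For example (hypothetical filenames, following the run-naming example above):

```shell
# ./step1_preproc.sh 999 01 [conquest folder]  would yield files such as:
#   data/bids/sub-999/ses-01/anat/sub-999_ses-01_T1w.nii.gz
#   data/bids/sub-999/ses-01/func/sub-999_ses-01_task-study_run-01_bold.nii.gz
#   data/bids/sub-999/ses-01/func/sub-999_ses-01_task-study_run-01_bold.json
```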

Tip

If you need to, run step1_preproc.sh line by line to check that the correct paths will go into run_heudiconv.py. If there is a problem with your paths, check your globals.sh file.

We recommend running step1_preproc.sh in a tmux window so you don’t run into issues if you lose your connection to the server. After ssh-ing into the server, create a new tmux window OR attach to an existing tmux window. After creating a new window, you can attach to that specific window/session in the future; in other words, you don’t have to create a new window every time you run step1_preproc.sh.

  • Create a new tmux window: tmux new -s [name]

  • Attach to an existing window: tmux a -t [name]

  • NOTE: replace [name] with whatever you want to name your tmux window – we recommend naming it step1.

  • tmux tip page

  • tmux cheatsheet

# create a new tmux window
$ tmux new -s step1

# OR attach to an existing tmux window
$ tmux a -t step1

# make sure you are in your study's code/preprocessing directory
$ cd /jukebox/YOURLAB/USERNAME/YOURSTUDY/code/preprocessing

# list files available in conquest directory to get data folder name for input 3
$ ls /jukebox/dicom/conquest/Skyra-AWP45031/YOURLAB/YEAR
# OR
$ ls /jukebox/dicom/conquest/Prisma-MSTZ400D/YOURLAB/YEAR
# OR (sample project)
$ ls /jukebox/norman/pygers/conquest

# run the script step1_preproc.sh for subject XXX, session xx
# replace XXX with your subject ID
# replace xx with your session ID
$ ./step1_preproc.sh XXX xx [conquest folder]

# NOTE: For the sample project, use the following command:
$ ./step1_preproc.sh 001 01 0219191_mystudy-0219-1114

Tip

If HeuDiConv is failing, check that your original dicoms are only zipped once (a single .gz extension, not .gz.gz). If your dicoms are zipped multiple times (sometimes this happens!), add another gunzip step. Repeat until your files have only the .dcm extension.
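One way to handle this is a loop that simply keeps gunzipping until no .gz files remain (the directory path in the comment is illustrative):

```shell
# run this inside the subject's dicom directory (e.g., data/dicom/001/01);
# it gunzips repeatedly, so doubly zipped files (.dcm.gz.gz) end up as .dcm
while ls *.gz >/dev/null 2>&1; do
    gunzip -- *.gz
done
```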

Step 2: Get your data ready to pass the bids-validator

This step will use the step2_preproc.sh script. We recommend running this step after data for all sessions for a given subject have been acquired and run through step1_preproc.sh.

This script will carry out all the “cleanup” steps that need to be taken to make sure your data are BIDS-valid and ready for MRIQC and fMRIPrep:

  1. delete extra files (e.g., scouts, duplicate runs)

  2. rename fieldmaps (if necessary)

  3. add the IntendedFor field to the fieldmap .json files so that fieldmaps can be used for susceptibility distortion correction on your functional data
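For reference, after this step each fieldmap .json contains an IntendedFor entry listing the functional runs that fieldmap should correct. The filenames below are illustrative, following the run-naming example used elsewhere on this page:

```json
{
    "IntendedFor": [
        "ses-01/func/sub-001_ses-01_task-study_run-01_bold.nii.gz",
        "ses-01/func/sub-001_ses-01_task-study_run-02_bold.nii.gz"
    ]
}
```

Note that IntendedFor paths are written relative to the subject directory (i.e., they start at ses-XX/), per the BIDS specification.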

The script takes one input:

  • subjectID

Note

  • This script will need to be customized for your study! Edit this script once at the beginning of your project so that all the filenames match your naming scheme, and so the fieldmaps are being applied to the correct functional runs. If you did not collect fieldmaps, then you can ignore the steps specific to fieldmaps.

  • If an individual subject deviates from your standard (e.g., has an extra set of fieldmaps or is missing functional runs), then you will need to edit step2_preproc.sh again to accommodate these differences.

  • Sample project: The sample dataset does NOT include fieldmaps. Therefore, when you edit the step2_preproc.sh for the sample project, you can comment out the lines of code dealing with the fieldmaps. You should still run step2_preproc.sh to delete the extra (scout and dup) files.

If you run bids-validator and get any warnings and/or errors, put any modifications you need to make to pass the validator into this script so you can easily get subjects ready for BIDS apps as you collect more subjects. Again, this script should be customized for your experiment and not just run without editing.

# run the script (step2_preproc.sh), e.g. for subject XXX
$ ./step2_preproc.sh XXX

# NOTE: For our sample project, use the following command
$ ./step2_preproc.sh 001

Step 3: Run the BIDS validator

Run the BIDS validator to make sure everything is set up correctly. You should check your BIDS validation as soon as possible (i.e., after collecting your first subject’s data) so that you can fix any problems if they exist!

Any non-BIDS formatted files should go into your ../bids/derivatives directory which is automatically ignored by the BIDS validator; if you (deliberately) have non-BIDS formatted files outside of the derivatives folder, then you can add them to a .bidsignore file.
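A .bidsignore file uses the same pattern syntax as .gitignore. A hypothetical data/bids/.bidsignore might look like this (entries are examples, not files from this template):

```
# non-BIDS files that intentionally live inside the bids directory
notes/
*_backup.tsv
```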

You can run the BIDS validator from your browser.

OR (recommended) you can install the bids-validator in a conda environment and run it directly on the server or locally:

If you have already set up a pygers conda environment following the instructions on our conda tip page, then you are good to go! The pygers conda environment already has the bids-validator package installed.

If you have another conda environment and you want to add the bids-validator to that conda environment, follow these steps:

$ conda activate <myenv>

# first, update or install nodejs
$ conda install -c conda-forge nodejs=11
$ node -v #check node version (11.14.0)

# install bids-validator
$ npm install -g bids-validator
$ which bids-validator #shows your installation location
$ bids-validator -v #1.5.7 as of Dec-10-2020

In order to run the bids-validator, you need to give it a bids dataset as the input. Make sure you have your conda environment activated, and navigate to your project directory.

$ conda activate <myenv>
$ cd /jukebox/YOURLAB/USERNAME/YOURSTUDY
$ bids-validator data/bids

Read the red “errors” and yellow “warnings”. You should try to fix the red “errors” before you continue. Re-run until the bids-validator is appeased. Note that “warnings” can be ignored for now, but you’ll probably want to fix them at some point.

Step 4: Deface anatomical images

IMPORTANT: The defacing step is included in step1_preproc.sh! We include additional instructions here in case you would like to run it separately. If you ran step1_preproc.sh as is, you do not need to repeat this step.

Eventually, if you want to share de-identified data, you will need to deface anatomical images. You do not want to use the defaced images for any further preprocessing step (unless you are certain it won’t mess up a downstream preprocessing or analysis step). So after defacing the images, we will set them aside in the ../data/bids/derivatives/deface directory so they are available whenever you need them.

The deface.sh script will run pydeface to deface the T1w structural images and move the defaced image into your ../data/bids/derivatives/deface directory. It takes two inputs:

  • subjectID

  • sessionID

Running pydeface on the cluster:

To run pydeface on the head node, we recommend using a tmux window (it takes ~9 min to deface one image).

# open a new tmux window called deface
tmux new -s deface

# OR  attach to a previously opened window called deface
tmux a -t deface

# move into your code directory
cd /jukebox/YOURLAB/USERNAME/YOURSTUDY/code/preprocessing

# call deface script
./deface.sh XXX xx #example is subject XXX, session xx

You can also run pydeface using Slurm, which is especially useful if you want to run this step for multiple subjects and/or multiple sessions all at once. The script that we will call to submit a Slurm job is code/preprocessing/slurm_deface.sh.

  • Update lines in slurm_deface.sh:
    • Line 7: set the array to the subject number(s) you want to run the script on; if you enter multiple, they will all run in parallel (e.g., array=001,002,003)

    • Lines 23-24: update these if you want to receive email updates about the job’s status

    • Line 34: change this if you want to run a session other than session 1
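The lines in question are standard Slurm directives. They look roughly like this (line numbers refer to the original script, and the email address is a placeholder):

```shell
#SBATCH --array=001,002,003            # Line 7: subject numbers (run in parallel)
#SBATCH --mail-type=BEGIN,END,FAIL     # Lines 23-24: when to send email updates
#SBATCH --mail-user=you@princeton.edu  #              where to send them
```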

Tip

In Slurm scripts, lines that start with #SBATCH are Slurm directives, not comments! All other lines that start with # are regular comments.

To submit the job:

# move into your code directory
cd /jukebox/YOURLAB/USERNAME/YOURSTUDY/code/preprocessing

# submit the job
sbatch slurm_deface.sh

Note you don’t have to include the subjectID and sessionID inputs here because you defined this information in the slurm_deface.sh script itself.
