Getting Started – DIA Preprocesing
DO-MS is an application to visualize mass-spec data both in an interactive application and static reports generated via. the command-line. In this document we’ll walk you through analyzing an example dataset in the interactive application.
Table of Contents
We have provided an example data set online, which contains parts of the MS2 number optimization in the paper. You can download a .zip bundle of it here: https://drive.google.com/file/d/1bjFzKqTFLk7ECUJOTy8LNxsCOD0Xf96Q/view?usp=share_link. The contents of the archive are the main dia-nn report and the corresponding raw files.
Your folder should look like this:
Please make sure that you installed DO-MS as descibed in the installation section. For using the preprocessing pipeline it is necessary to install the ThermoRawFileParser and the Dinosaur feature detection.
Processing Raw Data
Open a terminal and enter the base folder of your DO-MS installation. Make sure that your DO-MS environment is set up and activate it.
conda activate doms
For processing, the piplline module located at
pipeline/processing.py will be called with the following parameters.
python pipeline/processing.py /location/to/example/report_filtered.tsv
the following additional options will be included:
# Activate Mono if using Mac or Linux. Mono is required to run the Thermo Raw File Parser on Linux and OSX. -m # location of the ThermoRawFileParser executeable --raw-parser-location /location/to/ThermoRawFileParser1.4.2/ThermoRawFileParser.exe # location of the Dinosaur .jar file --dinosaur-location /Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/Dinosaur-1.2.0.free.jar # location of the example raw data -r /location/to/example
The full command needs to be a single line and will look like:
python pipeline/processing.py /location/to/example/report_filtered.tsv -m --raw-parser-location /location/to/ThermoRawFileParser1.4.2/ThermoRawFileParser.exe --dinosaur-location /Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/Dinosaur-1.2.0.free.jar -r /location/to/example
After processing, the additional files should be part of your folder:
Temporary .mzML files can be deleted.
Command Line Interface
The documentation for the various command line options can be found by typing
python pipeline/processing.py -h
usage: processing.py [-h] --raw-parser-location RAW_PARSER_LOCATION [--dinosaur-location DINOSAUR_LOCATION] [-m] [-d] [-v] [-t TEMPORARY_FOLDER] [-r RAW_FILE_LOCATION] [--no-feature-detection] [--no-fill-times] [--no-tic] [--no-sn] [--no-mzml-generation] [--mz-bin-size MZ_BIN_SIZE] [--rt-bin-size RT_BIN_SIZE] [--resolution RESOLUTION] [-p PROCESSES] [--isotopes-sn] report Command line tool for feature detection in shotgun MS experiments. Can be used together with DIA-NN to provide additional information on the peptide like features identified in the MS1 spectra. positional arguments: report Location of the report.tsv output from DIA-NN which should be used for analysis. options: -h, --help show this help message and exit --raw-parser-location RAW_PARSER_LOCATION Path pointing to the ThermoRawFileParser executeable. --dinosaur-location DINOSAUR_LOCATION Path pointing to the dinosaur jar executeable. -m, --mono Use mono for ThermoRawFileParser under Linux and OSX. -d, --delete Delete generated mzML and copied raw files after successfull feature generation. -v, --verbose Show verbose output. -t TEMPORARY_FOLDER, --temporary-folder TEMPORARY_FOLDER Input Raw files will be temporarilly copied to this folder. Required for use with Google drive. -r RAW_FILE_LOCATION, --raw-file-location RAW_FILE_LOCATION By default, raw files are loaded based on the File.Name column in the report.tsv. With this option, a different folder can be specified. --no-feature-detection All steps are performed as usual but Dinosaur feature detection is skipped. No features.tsv file will be generated. --no-fill-times All steps are performed as usual but fill times are not extracted. No fill_times.tsv file will be generated. --no-tic All steps are performed as usual but binned TIC is not extracted. No tic.tsv file will be generated. --no-sn Signal to Noise ratio is not estimated for precursors --no-mzml-generation Raw files are not converted to .mzML. Nevertheless, mzML files are expected in their theoretical output location and loaded. Should be only be carefully used for repeated calulcations or debugging --mz-bin-size MZ_BIN_SIZE Bin size over the mz dimension for TIC binning. --rt-bin-size RT_BIN_SIZE Bin size over the RT dimension for TIC binning in minutes. If a bin size of 0 is provided, binning will not be applied and TIC is given per scan. --resolution RESOLUTION Set the resolution used for estimating counts from S/N data -p PROCESSES, --processes PROCESSES Number of Processes --isotopes-sn Use all isototopes from the same scan as the highest intensity datapoint for estimating the SN and copy number.