Quickstart
A quick-reference summary of how to obtain a copy of this workflow and prepare an environment for analysis.
Prepare working environment:
- First you will need to download a copy of the pipeline to a location where you can configure and execute it. Navigate to our GitHub repository and retrieve the latest tag information.
- Next, you can use Git to clone a copy of the pipeline into your working environment:
    git clone https://github.com/Tuks-ICMM/Pharmacogenetic-Analysis-Pipeline .
  Tags are available on our GitHub repository under the releases page.
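The clone step can also be pinned to a specific release tag. The following is a minimal sketch, not part of the pipeline itself: the helper names `clone_command` and `clone_release` are hypothetical, and the tag you pass must be one that actually exists on the releases page. It relies on the fact that `git clone --branch` accepts tag names as well as branch names.

```python
import subprocess

REPO_URL = "https://github.com/Tuks-ICMM/Pharmacogenetic-Analysis-Pipeline"


def clone_command(tag, destination="."):
    """Build a `git clone` command that checks out a single release tag.

    `tag` is supplied by the caller (hypothetical here); look it up on the
    releases page first.
    """
    # `--branch` accepts a tag name; git then checks out that tag (detached HEAD).
    return ["git", "clone", "--branch", tag, REPO_URL, destination]


def clone_release(tag, destination="."):
    """Run the clone; raises CalledProcessError if git exits with an error."""
    subprocess.run(clone_command(tag, destination), check=True)
```

Calling `clone_release(tag)` with the default destination clones into the current directory, which Git only permits when that directory is empty.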
Prepare data and metadata
- In order to execute the Pharmacogenetics Analysis Workflow, you will need to configure the pipeline and provide information about the analysis you wish to perform. This involves the following files:
  Configuration files:
  - config/config.json (General configuration)
  Metadata files:
  - input/datasets.csv (Dataset declarations)
  - input/samples.csv (Sample metadata)
  - input/locations.csv (Genomic location metadata)
  - input/transcripts.csv (Transcript selection)
- Following configuration, you will need to provide the input data files themselves:
  - VCF files can be compressed (.vcf.gz) but must be accompanied by a tabix index file.
  - .fasta.gz files for reference sequences must be accompanied by a sequence dictionary file (.dict), a FASTA index file (.fa.gz.fai or .fasta.gz.fai) and a BGZIP index (.fa.gz.gzi).
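Before launching the workflow, it can help to confirm that the files listed above are actually in place. The sketch below is a hypothetical pre-flight check, not something shipped with the pipeline. Two naming conventions in it are assumptions that this page does not state: the tabix index is taken to use tabix's default `.tbi` suffix, and the sequence dictionary is taken to replace the FASTA extension with `.dict`.

```python
from pathlib import Path

# Configuration and metadata files named in the list above,
# relative to the pipeline's root directory.
REQUIRED_FILES = [
    "config/config.json",
    "input/datasets.csv",
    "input/samples.csv",
    "input/locations.csv",
    "input/transcripts.csv",
]


def expected_companions(data_file):
    """Return the index files expected next to one input data file."""
    name = str(data_file)
    if name.endswith(".vcf.gz"):
        # Assumption: tabix's default ".tbi" index name.
        return [name + ".tbi"]
    for ext in (".fa.gz", ".fasta.gz"):
        if name.endswith(ext):
            # Assumption: the dictionary swaps the FASTA extension for ".dict".
            return [name[: -len(ext)] + ".dict", name + ".fai", name + ".gzi"]
    return []


def missing_inputs(root, data_files):
    """List required files, data files and index companions absent under `root`."""
    root = Path(root)
    wanted = list(REQUIRED_FILES)
    for data_file in data_files:
        wanted.append(str(data_file))
        wanted.extend(expected_companions(data_file))
    return [name for name in wanted if not (root / name).is_file()]
```

For example, `missing_inputs(".", ["input/cohort.vcf.gz", "input/reference.fa.gz"])` (both file names are placeholders) returns an empty list only when every configuration, metadata, data and index file is present.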
Execute analysis
- To execute the analysis, the metadata must be compiled and a suitable queue-able script generated for the batch scheduler. To do this, you can use the run.py script, which generates and queues a hidden script, .run.sh, written for your environment. For example:
    module load python-3.8.2
    run.py
    watch -t -d qstat
There is currently a known bug with run.py on some systems where the Python instance does not have the requisite permissions and configuration to execute Bash commands. As a result, the script compiles .run.sh successfully and then hangs without executing it. In such cases, you can terminate the script (CTRL + C) and queue the generated script manually:
    module load python-3.8.2
    run.py
    # CTRL + C after confirming .run.sh has been created
    qsub .run.sh
    watch -t -d qstat
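The manual fallback described above amounts to two checks followed by a submission: confirm that .run.sh was generated, confirm that the scheduler's submit command is available, then queue the script. A minimal sketch of that logic follows. The function `queue_generated_script` is hypothetical and is not part of run.py; it simply wraps the same `qsub .run.sh` call shown above.

```python
import shutil
import subprocess
from pathlib import Path


def queue_generated_script(script=".run.sh", submit_cmd="qsub"):
    """Submit an already-generated script to the batch scheduler.

    Returns whatever the submit command prints to stdout
    (for qsub, this is normally the job identifier).
    """
    script = Path(script)
    if not script.is_file():
        # run.py has not (yet) written the script, so there is nothing to queue.
        raise FileNotFoundError(f"{script} not found; run run.py first")
    if shutil.which(submit_cmd) is None:
        raise RuntimeError(f"'{submit_cmd}' is not available on PATH")
    result = subprocess.run(
        [submit_cmd, str(script)],
        check=True,           # raise if the scheduler rejects the job
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Keeping the existence check separate from the submission means a missing .run.sh is reported as such, rather than surfacing as a less specific scheduler error.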
Support for non-PBS-Torque schedulers
Currently, the Pharmacogenetics Analysis Workflow only includes support for PBS-Torque. Dedicated integration of additional environments is intended for future releases. Generic Snakemake profiles are currently available should you wish to manually integrate a different environment's profile. Please be advised that we have made alterations to the standard Snakemake rule format to accommodate resource declarations. For more information, please see the section on the PBS/Torque batch scheduler.
Documentation is available for Snakemake Profiles and a repository of standardized profiles is available on GitHub.