Quickstart

A quick-reference summary of how to obtain a copy of this workflow and prepare an environment for analysis.


Table of contents

Prepare working environment
Prepare data and Metadata
Execute analysis

Prepare working environment

First, download a copy of the pipeline to a location where you can configure and execute it. Navigate to our GitHub repository and note the latest release tag.
Next, use Git to clone a copy of the pipeline into your working directory:
```
 git clone https://github.com/Tuks-ICMM/Pharmacogenetic-Analysis-Pipeline .
```
Tags are available on our GitHub repository under the releases page.
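
If you want to pin the working copy to a release rather than the default branch, the tag can be checked out after cloning. The following is a minimal sketch, not part of the pipeline itself: `latest_tag` and `checkout_latest_tag` are hypothetical helper names, and the sketch assumes that release tags sort sensibly as version numbers.

```shell
# Hypothetical helper: print the highest version-sorted tag in the current repository.
latest_tag() {
    git tag --list --sort=-v:refname | head -n 1
}

# Hypothetical helper: switch the working copy to the newest tag, if any exist.
checkout_latest_tag() {
    tag="$(latest_tag)"
    if [ -z "$tag" ]; then
        echo "No tags found in this repository" >&2
        return 1
    fi
    git checkout --quiet "$tag"
}
```

Run `checkout_latest_tag` from inside the cloned directory. To use a specific release instead, pass the tag name shown on the releases page directly to `git checkout`.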

Prepare data and Metadata

To execute the Pharmacogenetics Analysis Workflow, you will need to configure the pipeline and provide information about the analysis you wish to perform. This involves the following files:

Configuration files:
- config/config.json (General configuration)

Metadata files:
- input/datasets.csv (Dataset declarations)
- input/samples.csv (Sample metadata)
- input/locations.csv (Genomic location metadata)
- input/transcripts.csv (Transcript selection)

After configuration, you will need to provide the input data files themselves:
- VCF files may be compressed (.vcf.gz) but must be accompanied by a tabix index file (Discussion here).
- .fasta.gz files for reference sequences must be accompanied by a sequence dictionary file (.dict), a FASTA index file (.fa.gz.fai or .fasta.gz.fai) and a BGZIP index (.fa.gz.gzi) (Discussion here).
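
Before queueing an analysis, it can help to confirm that these companion files are in place. The sketch below is a hypothetical helper (`missing_indexes` is not part of the pipeline) that lists missing index files in a directory. It assumes tabix indexes use the `.tbi` extension and that FASTA indexes are named by appending `.fai` and `.gzi` to the compressed file name. The `.dict` file is not checked because its expected name is not specified here.

```shell
# Hypothetical helper: list index files that are missing next to the inputs in a directory.
# Assumption: tabix indexes end in .tbi; FASTA indexes append .fai and .gzi to the file name.
missing_indexes() {
    dir="$1"
    for vcf in "$dir"/*.vcf.gz; do
        [ -e "$vcf" ] || continue              # the glob matched nothing
        [ -e "$vcf.tbi" ] || echo "$vcf.tbi"
    done
    for fasta in "$dir"/*.fasta.gz "$dir"/*.fa.gz; do
        [ -e "$fasta" ] || continue            # the glob matched nothing
        [ -e "$fasta.fai" ] || echo "$fasta.fai"
        [ -e "$fasta.gzi" ] || echo "$fasta.gzi"
    done
}
```

For example, `missing_indexes input` prints one path per missing index and prints nothing when every file is accounted for. Indexes of this kind are commonly created with `tabix -p vcf` and `samtools faidx`.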

Execute analysis

To execute the analysis, the metadata must be compiled and a queue-able script generated for the batch scheduler. The run.py script does both: it generates a hidden script, .run.sh, written for your environment, and then queues it. For example:
```
 module load python-3.8.2
 run.py
 watch -t -d qstat
```

There is currently a known bug with run.py on some systems where the Python instance does not have the requisite permissions and configuration to execute Bash commands. On these systems, the script compiles .run.sh successfully and then hangs without executing it. In such cases, you can terminate the script (CTRL + C) and queue the generated script manually, as follows:
```
 module load python-3.8.2
 run.py   # CTRL + C after confirming .run.sh has been created
 qsub .run.sh
 watch -t -d qstat
```
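
When queueing manually, it is worth confirming that .run.sh was actually written before it is handed to the scheduler. A minimal sketch of such a guard follows. `queue_generated_script` is a hypothetical helper name, and the check only tests that the file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper: queue the generated script only if it exists and is non-empty.
queue_generated_script() {
    script="${1:-.run.sh}"
    if [ ! -s "$script" ]; then
        echo "$script is missing or empty; re-run run.py first" >&2
        return 1
    fi
    qsub "$script"
}
```

Calling `queue_generated_script` with no arguments targets .run.sh in the current directory.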

Support for non-PBS-Torque schedulers

Currently, the Pharmacogenetics Analysis Workflow only includes support for PBS-Torque. Dedicated integration of additional environments is intended for future releases. Generic Snakemake profiles are available should you wish to manually integrate a different environment's profile. Please be advised that we have made alterations to the standard Snakemake rule format to accommodate resource declarations. For more information, please see the section on the PBS/Torque batch scheduler.

Documentation is available for Snakemake Profiles and a repository of standardized profiles is available on GitHub.