Methodology
A breakdown of the process used in this workflow and how it has been implemented.
Rule Map/Diagram
---
title: Population Structure Workflow
config:
flowchart:
defaultRenderer: elk
---
flowchart TB
subgraph population_structure_workflow[Population Structure Workflow]
direction TB
classDef bcftools stroke:#FF5733,fill:#D3D3D3,stroke-width:4px,color:black;
classDef plink stroke:#36454F,fill:#D3D3D3,stroke-width:4px,color:black;
classDef python stroke:#FEBE10,fill:#D3D3D3,stroke-width:4px,color:black;
classDef admixture stroke:#333,fill:#D3D3D3,stroke-width:4px,color:black;
classDef tabix stroke:#023020,fill:#D3D3D3,stroke-width:4px,color:black;
classDef gatk stroke:#007FFF,fill:#D3D3D3,stroke-width:4px,color:black;
START(((Input)))
END(((Output)))
extract_provided_region[[<b>extract_provided_region</b>: <br>Extract the provided region <br>coordinates for clustering]]
remove_rare_variants[[<b>remove_rare_variants</b>: <br>Remove all variants which are <br>not good indicators of population <br>structure by nature]]
plink_pca[[<b>plink_pca</b>: <br>Perform a <br>PLINK-2.0 PCA]]
report_fixation_index_per_cluster[[<b>report_fixation_index_per_cluster</b>: <br>Report Fixation-index for the <br>provided clusters]]
class remove_rare_variants,plink_pca,report_fixation_index_per_cluster,extract_provided_region plink;
START --> extract_provided_region --> remove_rare_variants --> plink_pca & report_fixation_index_per_cluster
plink_pca & report_fixation_index_per_cluster --> END
end
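The rule order drawn above can also be written down as a small dependency map. The following is a minimal sketch, not part of the workflow itself: the dictionary simply restates the arrows in the diagram, and `execution_order` is a hypothetical helper name. It assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each rule maps to the set of rules it depends on, as drawn in the diagram.
RULE_DEPENDENCIES = {
    "extract_provided_region": set(),
    "remove_rare_variants": {"extract_provided_region"},
    "plink_pca": {"remove_rare_variants"},
    "report_fixation_index_per_cluster": {"remove_rare_variants"},
}


def execution_order(dependencies):
    """Return one valid order in which the rules can run."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for rule in execution_order(RULE_DEPENDENCIES):
        print(rule)
```

The two final rules do not depend on each other, so their relative order may differ between runs; only the first two positions are fixed.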
extract_provided_region
flowchart TD
extract_provided_region[[<b>extract_provided_region</b>: <br>Extract the provided region <br>coordinates for clustering]]
classDef plink stroke:#36454F,fill:#D3D3D3,stroke-width:4px,color:black;
class extract_provided_region plink;
- Function
- Extract the requested coordinates to be used for population clustering, as provided in the sample.csv file.
- Command
plink2 --threads {threads} --pfile {params.input} vzs --from-bp {params.fromBP} --to-bp {params.toBP} --chr {params.chr} --make-pgen vzs --out {params.output}
- Parameters
-
--threads {threads}
- Used to set the number of CPU threads used during this calculation.
--pfile {params.input} vzs
- Used to provide plink with the location of a plink-2 binary fileset (.pgen, .pvar and .psam files). The vzs modifier tells plink to expect a Zstandard-compressed .pvar file (.pvar.zst).
--from-bp {params.fromBP}
- The base-pair position at which the extracted region starts.
--to-bp {params.toBP}
- The base-pair position at which the extracted region ends.
--chr {params.chr}
- The chromosome on which the coordinates can be found.
--make-pgen vzs
- Save output to a plink-2 binary fileset with a Zstandard-compressed .pvar file.
--out {params.output}
- Provide the file name and path for output creation.
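As an illustration of how the placeholders in the command above are filled in, here is a minimal sketch that assembles the same argument list in Python. The function name, the file prefixes and the coordinates are hypothetical; in the workflow itself the values come from the rule's params and the sample.csv file.

```python
import shlex


def build_extract_region_command(input_prefix, output_prefix, chrom, from_bp, to_bp, threads=1):
    """Return the plink2 argument list that extracts one region."""
    if from_bp > to_bp:
        raise ValueError("from_bp must not be greater than to_bp")
    return [
        "plink2",
        "--threads", str(threads),
        "--pfile", input_prefix, "vzs",
        "--from-bp", str(from_bp),
        "--to-bp", str(to_bp),
        "--chr", str(chrom),
        "--make-pgen", "vzs",
        "--out", output_prefix,
    ]


if __name__ == "__main__":
    # Hypothetical prefixes and coordinates, for illustration only.
    command = build_extract_region_command(
        "results/all_samples", "results/region", "22", 42_000_000, 42_200_000, threads=4
    )
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids quoting problems when a path contains spaces.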
remove_rare_variants
flowchart TD
remove_rare_variants[[<b>remove_rare_variants</b>: <br>Remove all variants which are <br>not good indicators of population <br>structure by nature]]
classDef plink stroke:#36454F,fill:#D3D3D3,stroke-width:4px,color:black;
class remove_rare_variants plink;
- Function
- Remove singletons, as these do not contribute towards an understanding of clusters: a variant observed only once can only separate a sample from a possible cluster, never group samples together.
- Command
plink2 --threads {threads} --pfile {params.input} vzs --pheno {input.sample_metadata} --mac 2 --make-pgen vzs --out {params.output}
- Parameters
-
--threads {threads}
- Used to set the number of CPU threads used during this calculation.
--pfile {params.input} vzs
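- Used to provide plink with the location of a plink-2 binary fileset (.pgen, .pvar and .psam files). The vzs modifier tells plink to expect a Zstandard-compressed .pvar file (.pvar.zst).
--pheno {input.sample_metadata}
- Loads phenotype values from the sample metadata file, annotating the samples with the provided annotations.
--mac 2
- Remove any variants with a minor allele count of less than 2, which excludes singletons (and monomorphic sites).
--make-pgen vzs
- Save output to a plink-2 binary fileset with a Zstandard-compressed .pvar file.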
--out {params.output}
- Provide the file name and path for output creation.
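To make the effect of the minor allele count threshold concrete, the toy sketch below counts alleles for a single biallelic variant. It is an illustration only and not how plink2 is implemented: it assumes diploid samples, genotypes coded as alternate-allele dosages (0, 1 or 2), and `None` for a missing call. Both function names are hypothetical.

```python
def minor_allele_count(dosages):
    """Count copies of the rarer allele among the called genotypes."""
    called = [d for d in dosages if d is not None]
    alt_count = sum(called)
    total_alleles = 2 * len(called)  # two allele copies per diploid sample
    return min(alt_count, total_alleles - alt_count)


def passes_mac(dosages, minimum=2):
    """Return True when the variant would survive a threshold like --mac 2."""
    return minor_allele_count(dosages) >= minimum


if __name__ == "__main__":
    singleton = [0, 0, 1, 0]   # one copy of the alternate allele
    shared = [0, 1, 1, 0]      # two samples carry the alternate allele
    print(passes_mac(singleton), passes_mac(shared))
```

Note that a single homozygous carrier contributes two allele copies, so such a variant passes a threshold of 2 even though only one sample carries it.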
plink_pca
flowchart TD
plink_pca[[<b>plink_pca</b>: <br>Perform a <br>PLINK-2.0 PCA]]
classDef plink stroke:#36454F,fill:#D3D3D3,stroke-width:4px,color:black;
class plink_pca plink;
- Function
- Perform dimensionality reduction (PCA) on the samples provided to indicate possible population structure, and additionally report the principal components as allele weights.
- Command
plink2 --threads {threads} --pfile {params.input} vzs --pca allele-wts --out {params.output}
- Parameters
-
--threads {threads}
- Used to set the number of CPU threads used during this calculation.
--pfile {params.input} vzs
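- Used to provide plink with the location of a plink-2 binary fileset (.pgen, .pvar and .psam files). The vzs modifier tells plink to expect a Zstandard-compressed .pvar file (.pvar.zst).
--pca allele-wts
- Run the PCA and generate the eigenvector (.eigenvec) and eigenvalue (.eigenval) files. The allele-wts modifier additionally requests an .eigenvec.allele file, which expresses the principal components as allele weights.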
--out {params.output}
- Provide the file name and path for output creation.
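A common first look at the PCA output is the relative size of the reported eigenvalues. The sketch below assumes the .eigenval file holds one eigenvalue per line and takes its contents as a string; the function name is hypothetical. Because only the top principal components are reported, the shares are relative to the reported components, not to the total variance in the data.

```python
def relative_eigenvalue_shares(eigenval_text):
    """Return each eigenvalue divided by the sum of all reported eigenvalues."""
    values = [float(line) for line in eigenval_text.splitlines() if line.strip()]
    total = sum(values)
    if total == 0:
        raise ValueError("eigenvalues sum to zero")
    return [value / total for value in values]


if __name__ == "__main__":
    # Made-up eigenvalues, for illustration only.
    example = "4.0\n3.0\n2.0\n1.0\n"
    for index, share in enumerate(relative_eigenvalue_shares(example), start=1):
        print(f"PC{index}: {share:.2f}")
```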
report_fixation_index_per_cluster
flowchart TD
report_fixation_index_per_cluster[[<b>report_fixation_index_per_cluster</b>: <br>Report Fixation-index for the <br>provided clusters]]
classDef plink stroke:#36454F,fill:#D3D3D3,stroke-width:4px,color:black;
class report_fixation_index_per_cluster plink;
- Function
- To generate a fixation index (Fst) report for the provided clusters.
- Command
plink2 --threads {threads} --pfile {params.input} vzs --fst {wildcards.cluster} report-variants zs --out {params.output}
- Parameters
-
--threads {threads}
- Used to set the number of CPU threads used during this calculation.
--pfile {params.input} vzs
- Used to provide plink with the location of a plink-2 binary fileset (.pgen, .pvar and .psam files). The vzs modifier tells plink to expect a Zstandard-compressed .pvar file (.pvar.zst).
--fst {wildcards.cluster} report-variants zs
- Perform the requested fixation index calculations, using {wildcards.cluster} as the categorical phenotype that defines the populations. The report-variants modifier requests variant-level Fst results and the zs modifier requests the output to be Zstandard-compressed.
--out {params.output}
- Provide the file name and path for output creation.
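For readers unfamiliar with the fixation index, the sketch below computes a Hudson-style Fst estimate for a single biallelic variant between two populations. It is an illustration under stated assumptions, not a reproduction of plink2's output: `p1` and `p2` are the sample allele frequencies in each population, `n1` and `n2` are taken to be the number of observed allele copies, and the function name is hypothetical.

```python
import math


def hudson_fst(p1, p2, n1, n2):
    """Hudson-style Fst estimate for one biallelic variant.

    Returns NaN when the denominator is zero, i.e. when both populations
    are fixed for the same allele.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each population needs at least two allele copies")
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1 - p1) / (n1 - 1)
        - p2 * (1 - p2) / (n2 - 1)
    )
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        return math.nan
    return numerator / denominator


if __name__ == "__main__":
    # Populations fixed for different alleles are maximally differentiated.
    print(hudson_fst(1.0, 0.0, 20, 20))
    # Identical frequencies give a small negative estimate due to the sampling correction.
    print(hudson_fst(0.5, 0.5, 11, 11))
```

Values near 0 indicate little differentiation between the two clusters at that variant, and values near 1 indicate strong differentiation.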