# WDL Deep Technical Documentation Template

Use this template to produce a thorough technical document that describes a WDL workflow in full detail. This document is intended for developers, bioinformaticians, platform engineers, and technical reviewers who need to understand, maintain, extend, or troubleshoot the workflow.
## 1. Document Control

| Field | Details |
| --- | --- |
| Document Title | [Workflow name — Technical Documentation] |
| Document Version | [e.g. 1.0.0] |
| WDL Version | [1.0 / 1.1 / development] |
| Workflow Version | [e.g. 2.3.1 — use semantic versioning] |
| Author(s) | [Names and roles] |
| Last Updated | [YYYY-MM-DD] |
| Status | [Draft / In Review / Approved / Deprecated] |
| Repository | [URL to source repository] |
| Licence | [e.g. MIT / BSD-3 / Proprietary] |
## 2. Workflow Overview

### 2.1 Purpose

Provide a concise technical summary of what this workflow does and the problem it solves.

[Enter purpose here]
### 2.2 Workflow Identifier

| Field | Value |
| --- | --- |
| Workflow Name | [e.g. germline-variant-calling] |
| Namespace | [e.g. org.broadinstitute.pipelines] |
| Entry Point | [e.g. GermlineVariantCalling] |
| Source File | [e.g. germline_variant_calling.wdl] |
### 2.3 Changelog

| Version | Date | Author | Summary of Changes |
| --- | --- | --- | --- |
| [2.3.1] | [YYYY-MM-DD] | [Name] | [Upgraded GATK to 4.5.0] |
| [2.3.0] | [YYYY-MM-DD] | [Name] | [Added BQSR step] |
| [...] | [...] | [...] | [...] |
## 3. Architecture

### 3.1 Workflow Diagram

Include a visual representation of the workflow. Use Mermaid, draw.io, or a similar tool.

```
[Input Files] --> [Task 1: QC] --> [Task 2: Trim] --> [Task 3: Align] --> [Task 4: Sort/Mark Duplicates]
                                                                                     |
                                                                                     v
                        [Task 6: Merge VCFs] <-- [Task 5: Call Variants (scattered)]
                                 |
                                 v
                        [Task 7: Filter] --> [Final Outputs]
```

Replace the above with an accurate diagram for your workflow.
### 3.2 Workflow Structure

| Component | File | Description |
| --- | --- | --- |
| Main Workflow | [main.wdl] | [Orchestrates all tasks and sub-workflows] |
| Sub-workflow: QC | [qc.wdl] | [Quality control sub-workflow] |
| Task Library | [tasks/alignment.wdl] | [Reusable alignment tasks] |
| Struct Definitions | [structs.wdl] | [Custom struct type definitions] |
| [...] | [...] | [...] |
### 3.3 Import Dependencies

```wdl
import "tasks/alignment.wdl" as Alignment
import "tasks/variant_calling.wdl" as VariantCalling
import "structs.wdl" as Structs
```

List all imported WDL files and their purposes.

| Import Alias | Source File | Purpose |
| --- | --- | --- |
| [Alignment] | [tasks/alignment.wdl] | [BWA-MEM and post-alignment processing tasks] |
| [...] | [...] | [...] |
## 4. Input Specification

### 4.1 Workflow-Level Inputs

| Input Name | WDL Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| [sample_name] | String | Yes | — | [Unique identifier for the sample] |
| [input_bam] | File | Yes | — | [Input BAM file for processing] |
| [reference_fasta] | File | Yes | — | [Reference genome FASTA file] |
| [reference_fasta_index] | File | Yes | — | [.fai index for reference genome] |
| [call_regions] | File? | No | null | [Optional BED file to restrict calling regions] |
| [scatter_count] | Int | No | 24 | [Number of shards for parallel variant calling] |
| [...] | [...] | [...] | [...] | [...] |
### 4.2 Custom Struct Definitions

```wdl
struct SampleInfo {
  String sample_id
  String patient_id
  File input_bam
  File input_bam_index
  String? library_name
}
```

Document each custom struct used in the workflow.

| Struct Name | Field | Type | Description |
| --- | --- | --- | --- |
| [SampleInfo] | [sample_id] | String | [Unique sample identifier] |
| [SampleInfo] | [patient_id] | String | [Associated patient identifier] |
| [SampleInfo] | [input_bam] | File | [Path to input BAM] |
| [...] | [...] | [...] | [...] |
### 4.3 Example Input JSON

```json
{
  "GermlineVariantCalling.sample_name": "NA12878",
  "GermlineVariantCalling.input_bam": "gs://bucket/samples/NA12878.bam",
  "GermlineVariantCalling.reference_fasta": "gs://bucket/references/hg38.fa",
  "GermlineVariantCalling.reference_fasta_index": "gs://bucket/references/hg38.fa.fai",
  "GermlineVariantCalling.scatter_count": 24
}
```
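Inputs JSON files for Cromwell-style engines must namespace every key under the workflow entry point, as in the example above. A small pre-flight check can catch missing or mis-namespaced keys before submission; this is an illustrative sketch (the `REQUIRED_INPUTS` set and `check_inputs` helper are hypothetical, not part of any engine's API):

```python
import json

# Hypothetical pre-flight check: every key must be prefixed with the
# entry-point name ("GermlineVariantCalling", from the example above),
# and all required inputs must be present.
REQUIRED_INPUTS = {"sample_name", "input_bam", "reference_fasta", "reference_fasta_index"}

def check_inputs(inputs_json: str, workflow_name: str) -> list:
    """Return a list of problems found in the inputs JSON (empty if OK)."""
    inputs = json.loads(inputs_json)
    prefix = workflow_name + "."
    problems = []
    for key in inputs:
        if not key.startswith(prefix):
            problems.append(f"key not namespaced under {workflow_name}: {key}")
    provided = {k[len(prefix):] for k in inputs if k.startswith(prefix)}
    for missing in sorted(REQUIRED_INPUTS - provided):
        problems.append(f"missing required input: {missing}")
    return problems

example = """{
  "GermlineVariantCalling.sample_name": "NA12878",
  "GermlineVariantCalling.input_bam": "gs://bucket/samples/NA12878.bam",
  "GermlineVariantCalling.reference_fasta": "gs://bucket/references/hg38.fa",
  "GermlineVariantCalling.reference_fasta_index": "gs://bucket/references/hg38.fa.fai",
  "GermlineVariantCalling.scatter_count": 24
}"""
print(check_inputs(example, "GermlineVariantCalling"))  # → []
```

A check like this pairs well with `womtool inputs` (Section 12.2), which generates the template the keys are validated against.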
## 5. Task Specifications

Document every task in the workflow. Repeat this section for each task.

### 5.1 Task: [TaskName]

**Purpose:** [What this task does in one sentence]

#### Command Block

```wdl
command <<<
  set -euo pipefail
  ~{tool_path} \
    --input ~{input_file} \
    --output ~{output_prefix}.bam \
    --reference ~{reference_fasta} \
    --threads ~{cpu}
>>>
```
#### Inputs

| Input | WDL Type | Required | Source | Description |
| --- | --- | --- | --- | --- |
| [input_file] | File | Yes | [Workflow input / Previous task output] | [Description] |
| [...] | [...] | [...] | [...] | [...] |
#### Outputs

| Output | WDL Type | Filename Pattern | Description |
| --- | --- | --- | --- |
| [aligned_bam] | File | `~{output_prefix}.bam` | [Aligned BAM file] |
| [alignment_log] | File | `~{output_prefix}.log` | [Tool log output] |
| [...] | [...] | [...] | [...] |
#### Runtime Attributes

| Attribute | Value | Notes |
| --- | --- | --- |
| docker | [broadinstitute/gatk:4.5.0.0] | [Source and version justification] |
| cpu | [`cpu`] | [Default: 4] |
| memory | [`memory_gb + " GB"`] | [Default: 16 GB] |
| disks | [`"local-disk " + disk_size + " SSD"`] | [Calculated from input size] |
| preemptible | [`preemptible_tries`] | [Default: 2] |
| maxRetries | [`max_retries`] | [Default: 1] |
#### Disk Size Calculation

```wdl
Int disk_size = ceil(size(input_file, "GB") * 2.5) + 20
```

Explain the rationale for the disk calculation (e.g., input size × 2.5 for intermediate files + 20 GB headroom).
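The same arithmetic can be sanity-checked outside WDL before committing to a multiplier. A minimal Python mirror of the expression above (the default multiplier and headroom are taken from the template, not from any engine):

```python
import math

def disk_size_gb(input_size_gb: float, multiplier: float = 2.5, headroom_gb: int = 20) -> int:
    """Mirror of the WDL expression: ceil(size(input_file, "GB") * 2.5) + 20.

    multiplier covers intermediate files written alongside the input;
    headroom_gb absorbs reference data, logs, and container scratch space.
    """
    return math.ceil(input_size_gb * multiplier) + headroom_gb

print(disk_size_gb(40.0))  # 40 GB BAM → ceil(100.0) + 20 = 120
print(disk_size_gb(33.3))  # ceil(83.25) + 20 = 104
```

Tabulating this function over your largest expected inputs is a quick way to verify the multiplier before a production run hits a disk-exhausted failure (Section 10.1).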
[Repeat Section 5.1 for each task in the workflow]
## 6. Workflow Logic and Control Flow

### 6.1 Task Execution Order

Describe the directed acyclic graph (DAG) of task dependencies.

1. FastQC (independent — can run in parallel with step 2)
2. TrimReads
3. AlignReads (depends on: TrimReads)
4. MarkDuplicates (depends on: AlignReads)
5. ScatteredHaplotypeCaller (depends on: MarkDuplicates) [SCATTER]
6. MergeVCFs (depends on: ScatteredHaplotypeCaller) [GATHER]
7. FilterVariants (depends on: MergeVCFs)
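WDL engines derive this DAG automatically from input/output references, but it can be useful to model it explicitly when documenting which tasks may run concurrently. A sketch using Python's standard-library `graphlib`, with the dependency map transcribed from the list above:

```python
from graphlib import TopologicalSorter

# Dependency map for the example DAG above (task -> set of prerequisites).
deps = {
    "FastQC": set(),
    "TrimReads": set(),
    "AlignReads": {"TrimReads"},
    "MarkDuplicates": {"AlignReads"},
    "ScatteredHaplotypeCaller": {"MarkDuplicates"},
    "MergeVCFs": {"ScatteredHaplotypeCaller"},
    "FilterVariants": {"MergeVCFs"},
}

ts = TopologicalSorter(deps)
ts.prepare()
while ts.is_active():
    ready = sorted(ts.get_ready())
    print(ready)  # tasks in this batch have no unmet dependencies and may run in parallel
    ts.done(*ready)
```

The first batch printed is `['FastQC', 'TrimReads']`, matching the note in step 1 that FastQC runs in parallel with trimming; every later batch contains a single task because the rest of the example DAG is a chain.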
### 6.2 Scatter Operations

| Scatter Variable | Type | Source | Tasks Scattered | Gather Method |
| --- | --- | --- | --- | --- |
| [interval] | Array[File] | [SplitIntervals output] | [HaplotypeCaller] | [MergeVCFs — concatenation] |
| [...] | [...] | [...] | [...] | [...] |
### 6.3 Conditional Execution

```wdl
if (defined(call_regions)) {
  call RestrictToRegions { input: bed_file = select_first([call_regions]) }
}
```

| Condition | Evaluates | Tasks Affected | Behaviour When False |
| --- | --- | --- | --- |
| [defined(call_regions)] | [Whether BED file is provided] | [RestrictToRegions] | [Uses whole genome] |
| [...] | [...] | [...] | [...] |
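The `defined()`/`select_first()` pair in the snippet above is the standard WDL idiom for unwrapping an optional once a branch guarantees it is set. Its semantics can be paraphrased in Python, treating WDL `null` as `None` (the `plan_calling` helper is purely illustrative):

```python
from typing import Optional

def select_first(values: list):
    """Paraphrase of WDL select_first(): first non-None element, error if all are None."""
    for v in values:
        if v is not None:
            return v
    raise ValueError("select_first: all values were null")

def plan_calling(call_regions: Optional[str]) -> str:
    # Mirrors: if (defined(call_regions)) { call RestrictToRegions ... }
    if call_regions is not None:            # defined(call_regions)
        bed = select_first([call_regions])  # safe: the branch guarantees non-null
        return f"restrict calling to {bed}"
    return "call over the whole genome"

print(plan_calling("regions.bed"))  # → restrict calling to regions.bed
print(plan_calling(None))           # → call over the whole genome
```

The point of `select_first([call_regions])` in WDL is type coercion: it converts `File?` to `File` so the task input type-checks, and it is only safe because `defined()` already guarded the branch.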
## 7. Output Specification

### 7.1 Final Outputs

| Output Name | WDL Type | Source Task | File Pattern | Description |
| --- | --- | --- | --- | --- |
| [final_vcf] | File | [FilterVariants] | `~{sample_name}.filtered.vcf.gz` | [Filtered variant calls] |
| [final_vcf_index] | File | [FilterVariants] | `~{sample_name}.filtered.vcf.gz.tbi` | [VCF index file] |
| [qc_report] | File | [FastQC] | `~{sample_name}_fastqc.html` | [Quality control report] |
| [...] | [...] | [...] | [...] | [...] |
### 7.2 Output Validation

Describe how outputs can be validated for correctness.

| Output | Validation Method | Expected Result |
| --- | --- | --- |
| [final_vcf] | [bcftools stats] | [Non-zero variant count; valid VCF format] |
| [final_bam] | [samtools flagstat] | [>95% mapping rate] |
| [...] | [...] | [...] |
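Threshold checks like the >95% mapping rate above are easy to automate by parsing the tool's text report. A sketch that extracts the mapping rate from `samtools flagstat` output; the helper function and the embedded sample report are illustrative, not part of samtools:

```python
# Hypothetical validation helper: compute the mapping rate from `samtools
# flagstat` text output and compare it to the >95% threshold in the table above.
FLAGSTAT_SAMPLE = """\
100000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
97500 + 0 mapped (97.50% : N/A)
"""

def mapping_rate(flagstat_text: str) -> float:
    """Return the percentage of mapped reads parsed from flagstat text."""
    total = mapped = None
    for line in flagstat_text.splitlines():
        if not line.strip():
            continue
        count = int(line.split()[0])  # flagstat lines start with "N + M ..."
        if "in total" in line:
            total = count
        elif " mapped (" in line and mapped is None:
            mapped = count
    if not total or mapped is None:
        raise ValueError("could not parse flagstat output")
    return 100.0 * mapped / total

rate = mapping_rate(FLAGSTAT_SAMPLE)
print(f"{rate:.2f}% mapped; passes >95% threshold: {rate > 95.0}")  # 97.50% mapped; passes >95% threshold: True
```

In a real validation step the text would come from running `samtools flagstat` on the output BAM rather than from an embedded string.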
## 8. Docker Containers and Dependencies

### 8.1 Container Inventory

| Container Image | Tag | Size | Tools Included | Used By Tasks |
| --- | --- | --- | --- | --- |
| [broadinstitute/gatk] | [4.5.0.0] | [~1.8 GB] | [GATK, Samtools, Picard] | [MarkDuplicates, HaplotypeCaller, FilterVariants] |
| [biocontainers/bwa] | [0.7.17] | [~200 MB] | [BWA] | [AlignReads] |
| [...] | [...] | [...] | [...] | [...] |
### 8.2 Container Build and Maintenance

| Field | Details |
| --- | --- |
| Dockerfile Location | [e.g. docker/ directory in repo] |
| Build Process | [e.g. CI/CD automatic build on tag push] |
| Vulnerability Scanning | [e.g. Trivy / Snyk / Manual] |
| Update Cadence | [e.g. Quarterly or on tool version bump] |
## 9. Performance Characteristics

### 9.1 Benchmarks

Provide benchmark data from representative runs.

| Dataset | Samples | Total Runtime | Total Cost (est.) | Platform |
| --- | --- | --- | --- | --- |
| [30x WGS NA12878] | [1] | [~6 hours] | [~$12 USD] | [GCP — Cromwell on Terra] |
| [30x WGS cohort] | [100] | [~18 hours] | [~$950 USD] | [GCP — Cromwell on Terra] |
| [...] | [...] | [...] | [...] | [...] |
### 9.2 Per-Task Performance Breakdown

| Task | Avg. Runtime | Avg. CPU Utilisation | Peak Memory | Avg. Disk Used |
| --- | --- | --- | --- | --- |
| [AlignReads] | [45 min] | [85%] | [14 GB] | [80 GB] |
| [HaplotypeCaller] | [30 min/shard] | [70%] | [6 GB] | [10 GB] |
| [...] | [...] | [...] | [...] | [...] |
### 9.3 Scaling Considerations

Describe how the workflow scales with increasing data volume, sample count, or complexity.

- [e.g. Cost scales linearly with sample count, while wall-clock time stays roughly flat thanks to per-sample scatter parallelism]
- [e.g. Memory for joint genotyping scales with cohort size — recommend increasing memory for >500 samples]
- [...]
## 10. Error Handling and Troubleshooting

### 10.1 Common Failure Modes

| Error Symptom | Root Cause | Resolution |
| --- | --- | --- |
| [Task fails with OOM (exit code 137)] | [Insufficient memory allocation] | [Increase memory_gb input parameter] |
| [Non-zero exit from tool X] | [Corrupt or truncated input file] | [Verify input file integrity; re-upload if needed] |
| [Disk space exhausted] | [Disk multiplier too low for large inputs] | [Increase disk_multiplier parameter] |
| [...] | [...] | [...] |
### 10.2 Retry Logic

| Task | Max Retries | Preemptible Tries | Retry Behaviour |
| --- | --- | --- | --- |
| [AlignReads] | [1] | [2] | [Retries on preemption; fails on tool error] |
| [...] | [...] | [...] | [...] |
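When choosing a `preemptible` value, it can help to estimate the expected cost of the preemptible-then-on-demand strategy. The sketch below is an illustrative model only: the 30% discount, the per-attempt preemption probability, and the simplification that a preempted attempt is billed in full are all assumptions, not provider billing rules.

```python
def expected_cost(base_cost: float, preempt_prob: float, preemptible_tries: int,
                  preemptible_discount: float = 0.3) -> float:
    """Illustrative model: each preemptible attempt costs base_cost * discount
    and is preempted with probability preempt_prob; after exhausting the
    preemptible tries, the task runs once on-demand at full price."""
    cost = 0.0
    p_reach = 1.0  # probability execution reaches this attempt
    for _ in range(preemptible_tries):
        cost += p_reach * base_cost * preemptible_discount
        p_reach *= preempt_prob  # continue only if this attempt was preempted
    cost += p_reach * base_cost  # on-demand fallback
    return cost

# 2 preemptible tries at 30% of on-demand price, 20% preemption rate:
print(round(expected_cost(1.0, 0.2, 2), 4))  # → 0.4 (0.3 + 0.2*0.3 + 0.04*1.0)
```

Under these assumptions, two preemptible tries cut the expected spend to roughly 40% of a straight on-demand run, which is why `preemptible: 2` is a common default for short, restartable tasks.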
### 10.3 Log File Locations

| Log Type | Location / Pattern | Description |
| --- | --- | --- |
| [stdout] | [execution/stdout] | [Standard output from command block] |
| [stderr] | [execution/stderr] | [Standard error — primary debugging log] |
| [tool-specific] | [`~{sample_name}.tool.log`] | [Detailed tool-level logging] |
## 11. Platform-Specific Configuration

### 11.1 Cromwell

```json
{
  "backend": "PAPIv2",
  "options": {
    "jes_gcs_root": "gs://bucket/cromwell-executions",
    "default_runtime_attributes": {
      "zones": "us-central1-a us-central1-b",
      "preemptible": 2
    }
  }
}
```
### 11.2 miniWDL

```ini
[scheduler]
container_backend=docker

[docker]
image_cache=/tmp/miniwdl_cache
```
### 11.3 Terra / DNAnexus / AWS HealthOmics

Include any platform-specific notes, workspace setup instructions, or configuration overrides.

| Platform | Configuration Notes |
| --- | --- |
| [Terra] | [Upload inputs JSON via workspace Data tab; configure method with this WDL] |
| [DNAnexus] | [Compile with dxCompiler v2.x; set instance types in extras.json] |
| [AWS HealthOmics] | [Package as a private workflow; configure ECR container references] |
## 12. Testing

### 12.1 Test Strategy

| Test Type | Description | Data | Expected Outcome |
| --- | --- | --- | --- |
| Unit Test | [Individual task validation] | [Minimal synthetic inputs] | [Correct output format and content] |
| Integration Test | [Full workflow end-to-end] | [Downsampled real data (~1 GB)] | [Workflow completes; outputs match truth set] |
| Regression Test | [Compare outputs across versions] | [Frozen test dataset] | [Outputs are identical or within tolerance] |
| Scale Test | [Run at production volume] | [Full-size production data] | [Completes within time/cost budget] |
### 12.2 Validation Commands

```shell
# Validate WDL syntax
womtool validate workflow.wdl

# Generate an inputs template
womtool inputs workflow.wdl > inputs.json

# Run the workflow locally with Cromwell
java -jar cromwell.jar run workflow.wdl -i inputs.json --options options.json
```
## 13. Security and Compliance

| Field | Details |
| --- | --- |
| Data Classification | [Public / Internal / Confidential / Restricted] |
| Encryption at Rest | [e.g. GCS default encryption / Customer-managed keys] |
| Encryption in Transit | [e.g. TLS 1.2+] |
| Access Controls | [e.g. IAM roles, service accounts, VPC-SC] |
| Audit Logging | [e.g. Cloud Audit Logs enabled] |
| Compliance Frameworks | [e.g. HIPAA BAA in place / GDPR DPA signed / GxP validated] |
| Data Residency | [e.g. All processing in us-central1] |
## 14. Maintenance and Support

| Field | Details |
| --- | --- |
| Owning Team | [Team name and contact] |
| Support Channel | [e.g. Slack #wdl-support / JIRA project XYZ] |
| On-Call Rotation | [e.g. PagerDuty schedule link] |
| Review Cadence | [e.g. Quarterly review of tool versions and performance] |
| Deprecation Policy | [e.g. Prior versions supported for 6 months after new release] |
## Appendix A: Complete Input Reference

An auto-generated or manually maintained complete list of every input parameter with full descriptions, types, defaults, and valid ranges.
## Appendix B: Glossary

| Term | Definition |
| --- | --- |
| WDL | Workflow Description Language — a specification for describing data processing workflows |
| Scatter | A WDL construct that parallelises a task across an array of inputs |
| Gather | The implicit collection of scattered task outputs back into an array |
| Preemptible | A cloud VM instance that can be reclaimed by the provider at any time, offered at reduced cost |
| [...] | [...] |
## Appendix C: References