WDL Requirements Intake Template
Use this template to capture the information a WDL developer needs to design, build, and validate a workflow. Complete each section as thoroughly as possible: the more detail you provide upfront, the fewer iterations development will require.
1. Project Overview
| Field | Details |
| --- | --- |
| Project Name | [Enter project or workflow name] |
| Requestor Name | [Name and role of the person requesting the WDL] |
| Date Submitted | [YYYY-MM-DD] |
| Target Completion Date | [YYYY-MM-DD or "No deadline"] |
| Priority | [Critical / High / Medium / Low] |
| Business Justification | [Why is this workflow needed? What problem does it solve?] |
2. Workflow Purpose and Scope
2.1 High-Level Description
Provide a plain-language summary of what the workflow should accomplish from start to finish.
[Enter description here]
2.2 Scientific or Business Context
Describe the domain context. For bioinformatics workflows, include the biological question being addressed. For data processing workflows, describe the data domain.
[Enter context here]
2.3 Scope Boundaries
| In Scope | Out of Scope |
| --- | --- |
| [What the workflow WILL do] | [What the workflow will NOT do] |
| [...] | [...] |
3. Input Specifications
3.1 Primary Inputs
For each input file or parameter the workflow requires, complete the following:
| Input Name | Type | Format / Extension | Required? | Example Value | Description |
| --- | --- | --- | --- | --- | --- |
| [e.g. sample_bam] | [File / String / Int / Float / Boolean / Array] | [e.g. .bam] | [Yes / No] | [e.g. sample001.bam] | [What this input represents] |
| [...] | [...] | [...] | [...] | [...] | [...] |
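As a rough illustration of how rows in the table above might translate into WDL, the sketch below declares one input of each listed type. All names (`example_inputs`, `sample_id`, `threads`, `downsample_fraction`, `extra_bams`) are hypothetical, and WDL 1.0 is assumed. Note that an Array input needs an element type, so it helps to record that in the Type column (for example "Array of File").

```wdl
version 1.0

# Hypothetical workflow: input names are illustrative, not prescribed by the template.
workflow example_inputs {
  input {
    File sample_bam                 # Required? = Yes, Type = File (.bam)
    String sample_id                # Required? = Yes, Type = String
    Int threads = 4                 # Required? = No, a default is supplied
    Float? downsample_fraction      # Required? = No, may be left unset
    Boolean paired_end = true       # Boolean with a default
    Array[File] extra_bams = []     # Array inputs must name their element type
  }

  output {
    # Echo one input back so the sketch has an observable result.
    String echoed_id = sample_id
  }
}
```

Inputs marked "No" in the Required? column usually become either a declaration with a default or an optional type (`Float?`), so stating the intended default in the Example Value column saves a round of questions.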
3.2 Reference Data and Databases
| Reference Name | Source / Version | File Format | Size (approx.) | Update Frequency | Access Location |
| --- | --- | --- | --- | --- | --- |
| [e.g. hg38 reference genome] | [UCSC / Ensembl / NCBI] | [.fa / .fasta] | [e.g. 3.2 GB] | [Static / Quarterly / Annual] | [URL or file path] |
| [...] | [...] | [...] | [...] | [...] | [...] |
3.3 Input Validation Rules
Describe any validation criteria that should be enforced on inputs before processing begins.
- [e.g. BAM files must be coordinate-sorted and indexed]
- [e.g. FASTQ files must be paired-end with matching read counts]
- [...]
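A validation rule such as "BAM files must be coordinate-sorted and indexed" could be enforced by a small task that runs before any processing. The following is a minimal sketch rather than a definitive implementation: the task name `validate_bam` and the `docker_image` input are hypothetical, and the image is assumed to provide `samtools`.

```wdl
version 1.0

# Hypothetical validation task, assuming an image with samtools on the PATH.
task validate_bam {
  input {
    File bam
    File bam_index
    String docker_image
  }

  command <<<
    set -euo pipefail

    # Fail fast if the BAM is truncated or missing its EOF marker.
    samtools quickcheck ~{bam}

    # Write the header to a file, then require SO:coordinate on the @HD line.
    samtools view -H ~{bam} > header.txt
    if ! grep -q '^@HD.*SO:coordinate' header.txt; then
      echo "BAM is not declared coordinate-sorted: ~{bam}" >&2
      exit 1
    fi

    # Require a non-empty index file.
    if ! test -s ~{bam_index}; then
      echo "BAM index is missing or empty: ~{bam_index}" >&2
      exit 1
    fi
  >>>

  output {
    # Only evaluated when the command exits with status 0.
    Boolean valid = true
  }

  runtime {
    docker: docker_image
  }
}
```

This checks only what the header declares; it does not verify that the records are actually in sorted order. If a stricter check is needed, say so in the rule.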
4. Expected Outputs
4.1 Primary Outputs
| Output Name | Format / Extension | Description | Downstream Consumer |
| --- | --- | --- | --- |
| [e.g. aligned_bam] | [.bam] | [Aligned and sorted BAM file] | [Variant calling pipeline / Manual review] |
| [...] | [...] | [...] | [...] |
4.2 Intermediate Outputs
List any intermediate files that should be preserved (not just final outputs).
| Output Name | Format | Reason to Retain |
| --- | --- | --- |
| [e.g. duplicate_metrics] | [.txt] | [QC reporting] |
| [...] | [...] | [...] |
4.3 Quality Control Outputs
| QC Metric / Report | Format | Pass/Fail Criteria |
| --- | --- | --- |
| [e.g. alignment rate] | [.txt / .html] | [>95% reads mapped] |
| [e.g. duplication rate] | [.txt] | [<30% duplicates] |
| [...] | [...] | [...] |
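Pass/fail criteria like the examples above can be expressed directly as Boolean expressions once the metrics are available as numbers. The sketch below assumes the metrics have already been parsed into fractions between 0 and 1; the workflow name and all input names are hypothetical, and the thresholds simply mirror the example rows.

```wdl
version 1.0

# Hypothetical QC gate: metric values are assumed to be fractions in [0, 1].
workflow qc_gate {
  input {
    Float mapped_fraction          # e.g. 0.97 for 97% reads mapped
    Float duplicate_fraction       # e.g. 0.12 for 12% duplicates
    Float min_mapped = 0.95        # from ">95% reads mapped"
    Float max_duplicates = 0.30    # from "<30% duplicates"
  }

  Boolean alignment_ok = mapped_fraction > min_mapped
  Boolean duplication_ok = duplicate_fraction < max_duplicates

  output {
    # True only when every criterion is met.
    Boolean qc_pass = alignment_ok && duplication_ok
  }
}
```

Stating each criterion with an explicit comparison operator and unit (percent versus fraction) lets the developer encode it without guessing.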
5. Workflow Steps and Logic
5.1 Step-by-Step Process
Describe each major processing step in order. For each step, include the tool, version, and key parameters.
| Step | Task Name | Tool / Software | Version | Key Parameters | Description |
| --- | --- | --- | --- | --- | --- |
| 1 | [e.g. FastQC] | [FastQC] | [v0.11.9] | [--threads 4] | [Raw read quality assessment] |
| 2 | [e.g. Trim] | [Trimmomatic] | [v0.39] | [LEADING:3 TRAILING:3] | [Adapter and quality trimming] |
| 3 | [...] | [...] | [...] | [...] | [...] |
5.2 Conditional Logic
Describe any branching or conditional steps in the workflow.
- [e.g. If sample is paired-end, run step X; if single-end, run step Y]
- [e.g. If QC fails, halt pipeline and notify]
- [...]
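To show how a rule such as "if paired-end, run step X; if single-end, run step Y" might look in WDL, here is a minimal sketch assuming WDL 1.0, which has `if` blocks but no `else`. The task and workflow names are hypothetical, and the `echo` commands stand in for real tools.

```wdl
version 1.0

# Placeholder tasks: the echo commands stand in for real tools.
task process_paired {
  input {
    File read1
    File read2
  }
  command <<<
    echo "paired: ~{read1} ~{read2}"
  >>>
  output {
    File report = stdout()
  }
}

task process_single {
  input {
    File read1
  }
  command <<<
    echo "single: ~{read1}"
  >>>
  output {
    File report = stdout()
  }
}

workflow branch_on_layout {
  input {
    File read1
    File? read2    # Left unset for single-end samples
  }

  # A defined read2 is treated as the signal that the sample is paired-end.
  if (defined(read2)) {
    call process_paired { input: read1 = read1, read2 = select_first([read2]) }
  }
  if (!defined(read2)) {
    call process_single { input: read1 = read1 }
  }

  output {
    # Outputs of calls inside an if block are optional; take whichever branch ran.
    File report = select_first([process_paired.report, process_single.report])
  }
}
```

Because each branch is a separate `if`, it helps to state the condition for every branch explicitly, including what should happen when none applies.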
5.3 Scatter/Gather Operations
Describe any steps that should be parallelised across samples, chromosomes, or other units.
| Scatter Variable | Scatter Over | Gather Method |
| --- | --- | --- |
| [e.g. sample_id] | [Array of sample BAMs] | [Merge output VCFs] |
| [e.g. chromosome] | [Array of chromosome intervals] | [Concatenate results] |
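A row in the table above typically maps to a `scatter` block followed by a gather task. The sketch below, with hypothetical task and workflow names and placeholder shell commands, scatters over an array of interval names and concatenates the per-interval results.

```wdl
version 1.0

# Placeholder per-interval task: writes one small result file.
task process_interval {
  input {
    String interval
  }
  command <<<
    echo "processed ~{interval}" > result.txt
  >>>
  output {
    File result = "result.txt"
  }
}

# Gather step: concatenates the per-interval files in scatter order.
task concatenate {
  input {
    Array[File] parts
  }
  command <<<
    cat ~{sep=" " parts} > merged.txt
  >>>
  output {
    File merged = "merged.txt"
  }
}

workflow scatter_gather {
  input {
    Array[String] intervals
  }

  # Each iteration runs independently, so the engine may run them in parallel.
  scatter (interval in intervals) {
    call process_interval { input: interval = interval }
  }

  # Outside the scatter, process_interval.result is an Array[File].
  call concatenate { input: parts = process_interval.result }

  output {
    File merged = concatenate.merged
  }
}
```

The gathered array preserves the order of the scattered input, which matters when the gather method is order-sensitive (for example, concatenating by chromosome).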
6. Compute and Runtime Requirements
6.1 Per-Task Resource Estimates
| Task Name | CPU Cores | Memory (GB) | Disk (GB) | GPU Required? | Estimated Runtime |
| --- | --- | --- | --- | --- | --- |
| [e.g. BWA-MEM alignment] | [8] | [16] | [100] | [No] | [~2 hours per sample] |
| [e.g. GATK HaplotypeCaller] | [4] | [8] | [50] | [No] | [~1 hour per sample] |
| [...] | [...] | [...] | [...] | [...] | [...] |
6.2 Execution Environment
| Field | Details |
| --- | --- |
| Target Platform | [Cromwell / miniWDL / Terra / DNAnexus / AWS HealthOmics / Other] |
| Backend | [Local / HPC (Slurm/PBS) / Cloud (GCP/AWS/Azure)] |
| Container Registry | [Docker Hub / GCR / ECR / Quay.io / Custom] |
| Preemptible/Spot Instances | [Yes / No / Where possible] |
| Maximum Retry Attempts | [e.g. 2] |
6.3 Docker Containers
| Task Name | Docker Image | Tag / Version | Registry URL |
| --- | --- | --- | --- |
| [e.g. BWA alignment] | [broadinstitute/bwa] | [0.7.17] | [docker.io/broadinstitute/bwa:0.7.17] |
| [...] | [...] | [...] | [...] |
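The resource estimates, retry settings, and container details in this section usually end up in a task's `runtime` block. The sketch below is illustrative only: the task name and input names are hypothetical, the defaults mirror the example rows in 6.1 and 6.2, and `disks`, `preemptible`, and `maxRetries` are assumed to follow Cromwell's runtime attribute conventions. Other engines may ignore them or expect different keys.

```wdl
version 1.0

# Hypothetical task showing where the section 6 values would be used.
task align_placeholder {
  input {
    File reads
    String docker_image          # e.g. the Registry URL recorded in 6.3
    Int cpu = 8                  # CPU Cores
    Int memory_gb = 16           # Memory (GB)
    Int disk_gb = 100            # Disk (GB)
    Int preemptible_tries = 0    # 0 disables preemptible/spot instances
    Int max_retries = 2          # Maximum Retry Attempts
  }

  command <<<
    set -euo pipefail
    # Placeholder for the real tool invocation.
    echo "would align ~{reads} with ~{cpu} threads"
  >>>

  output {
    File log = stdout()
  }

  runtime {
    docker: docker_image
    cpu: cpu
    memory: "~{memory_gb} GB"
    disks: "local-disk ~{disk_gb} HDD"    # Cromwell-style disk request (assumption)
    preemptible: preemptible_tries        # Cromwell-specific (assumption)
    maxRetries: max_retries               # Cromwell-specific (assumption)
  }
}
```

Exposing the resource values as inputs with defaults lets them be tuned per run without editing the task, which is useful when the estimates in 6.1 are uncertain.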
7. Error Handling and Edge Cases
7.1 Known Edge Cases
Describe any known edge cases or special conditions the workflow must handle.
- [e.g. Empty input files — workflow should skip processing and log a warning]
- [e.g. Very large samples (>100GB) may require increased disk allocation]
- [...]
7.2 Failure Modes and Recovery
| Failure Scenario | Expected Behaviour | Recovery Action |
| --- | --- | --- |
| [e.g. Tool exits with non-zero code] | [Task fails] | [Retry up to N times, then halt] |
| [e.g. Disk space exhausted] | [Task fails] | [Increase disk multiplier and retry] |
| [...] | [...] | [...] |
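A "disk multiplier" recovery action, as in the example row above, generally assumes that disk is sized from the input rather than hard-coded. One way this might be sketched is shown below. The task and input names are hypothetical, the multiplier and padding values are placeholders, and the `disks` and `maxRetries` attributes are assumed to follow Cromwell conventions.

```wdl
version 1.0

# Hypothetical task that sizes its disk request from the input file.
task sized_task {
  input {
    File bam
    String docker_image
    Float disk_multiplier = 2.0    # Raise this and re-run if disk is exhausted
    Int disk_padding_gb = 10       # Fixed headroom for temporary files
  }

  # size() returns a Float; ceil() rounds up to a whole number of GB.
  Int disk_gb = ceil(size(bam, "GB") * disk_multiplier) + disk_padding_gb

  command <<<
    set -euo pipefail
    # Placeholder for the real tool invocation.
    echo "requested ~{disk_gb} GB of disk for ~{bam}"
  >>>

  output {
    File log = stdout()
  }

  runtime {
    docker: docker_image
    disks: "local-disk ~{disk_gb} HDD"    # Cromwell-style disk request (assumption)
    maxRetries: 2                         # Cromwell-specific (assumption)
  }
}
```

In this sketch the retry re-runs the task with the same multiplier, so a disk failure is recovered by resubmitting with a larger `disk_multiplier`. If automatic escalation on retry is expected, state that in the Recovery Action column.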
8. Testing and Validation
8.1 Test Data
| Dataset Name | Size | Location | Description |
| --- | --- | --- | --- |
| [e.g. NA12878 downsampled] | [~1 GB] | [gs://bucket/test-data/] | [Standard reference sample for validation] |
| [...] | [...] | [...] | [...] |
8.2 Acceptance Criteria
Define what "done" looks like. How will you validate the workflow produces correct results?
- [e.g. Output VCF matches expected variants in truth set with >99% sensitivity]
- [e.g. Workflow completes successfully on all 5 test samples]
- [e.g. Runtime is within 20% of estimated duration]
- [...]
9. Metadata and Governance
| Field | Details |
| --- | --- |
| Data Classification | [Public / Internal / Confidential / Restricted] |
| Compliance Requirements | [HIPAA / GDPR / GxP / None] |
| Data Retention Policy | [e.g. Retain outputs for 7 years] |
| Audit Trail Required? | [Yes / No] |
| Version Control Repository | [e.g. github.com/org/repo] |
| WDL Version | [1.0 / 1.1 / development] |
10. Stakeholders and Approvals
| Role | Name | Email | Sign-off Required? |
| --- | --- | --- | --- |
| Requestor | [Name] | [email] | [Yes] |
| WDL Developer | [Name] | [email] | [Yes] |
| Scientific Lead | [Name] | [email] | [Yes / No] |
| DevOps / Platform | [Name] | [email] | [Yes / No] |
| Data Governance | [Name] | [email] | [Yes / No] |
11. Additional Notes
Include any additional context, links to related documentation, diagrams, or references that would help the WDL developer.
[Enter any additional notes here]
Revision History
| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0 | [YYYY-MM-DD] | [Name] | [Initial submission] |
| [...] | [...] | [...] | [...] |