12 changes: 12 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
@@ -81,6 +81,7 @@ serde_json = { version = "=1.0.148", features = ["preserve_order", "indexmap", "
serde_repr = "=0.1.20"
serde_stacker = { version = "=0.1.14" }
serde_yaml = "=0.9.34"
smart-default = "=0.7.1"
strsim = "=0.11.1"
strum = "=0.27.2"
strum_macros = "=0.27.2"
57 changes: 57 additions & 0 deletions docs/user/input-files/05-pathogen-config.md
@@ -220,6 +220,63 @@ In addition, a "default" value can be specified for amino acid mutations that ar

If the score is only relevant for specific clades, you can specify which clades are to be ignored.

#### Nucleotide mutation pattern detection (`mutationPatterns`)

Nextclade can detect named groups of private nucleotide substitutions. This is useful for reporting mutation patterns such as RNA editing signatures separately from the generic SNP cluster QC rule.

Pattern detection is configured with `mutationPatterns.patterns`. Each pattern has an `id`, a display `name`, an optional `description`, one or more `events`, and optional clustering parameters. Currently, the only supported event type is `nucSubstitution`.

```json
"mutationPatterns": {
  "patterns": [
    {
      "id": "adar",
      "name": "ADAR-like RNA editing",
      "description": "ADAR-mediated A-to-I editing observed as A>G and complementary T>C",
      "events": [
        {
          "type": "nucSubstitution",
          "ref": ["A"],
          "qry": ["G"]
        },
        {
          "type": "nucSubstitution",
          "ref": ["T"],
          "qry": ["C"]
        }
      ],
      "cluster": {
        "windowSize": 50,
        "cutoff": 3
      }
    },
    {
      "id": "apobec",
      "name": "APOBEC-like cytosine deamination",
      "description": "APOBEC-like cytosine deamination observed as G>A in a reference motif",
      "events": [
        {
          "type": "nucSubstitution",
          "ref": ["G"],
          "qry": ["A"],
          "motifs": ["[CT]G[ACT]"]
        }
      ],
      "cluster": {
        "windowSize": 50,
        "cutoff": 3
      }
    }
  ]
}
```

The `ref` and `qry` arrays use Nextclade nucleotide symbols, including IUPAC ambiguity codes such as `N`, `R`, and `Y`. A substitution matches when both the reference and query nucleotide match one of the configured symbols.
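
The matching rule above can be illustrated with a minimal sketch. It assumes that an ambiguity code in the configuration stands for the set of concrete bases it denotes in the standard IUPAC table; the source does not say whether Nextclade expands codes this way or compares symbols literally. The helper names `iupac_bases`, `symbol_set_matches`, and `event_matches` are hypothetical and are not part of Nextclade.

```rust
/// Concrete bases denoted by an IUPAC nucleotide symbol (standard IUPAC table).
/// Unknown symbols map to the empty set.
fn iupac_bases(symbol: char) -> &'static str {
    match symbol.to_ascii_uppercase() {
        'A' => "A",
        'C' => "C",
        'G' => "G",
        'T' => "T",
        'R' => "AG",
        'Y' => "CT",
        'S' => "CG",
        'W' => "AT",
        'K' => "GT",
        'M' => "AC",
        'B' => "CGT",
        'D' => "AGT",
        'H' => "ACT",
        'V' => "ACG",
        'N' => "ACGT",
        _ => "",
    }
}

/// Hypothetical helper: true if `nuc` is covered by any configured symbol.
/// Assumption: configured ambiguity codes are expanded to concrete bases.
fn symbol_set_matches(configured: &[char], nuc: char) -> bool {
    let nuc = nuc.to_ascii_uppercase();
    configured.iter().any(|&s| iupac_bases(s).contains(nuc))
}

/// Hypothetical helper: a substitution matches an event only when both the
/// reference and the query nucleotide match the configured symbols.
fn event_matches(ref_syms: &[char], qry_syms: &[char], ref_nuc: char, qry_nuc: char) -> bool {
    symbol_set_matches(ref_syms, ref_nuc) && symbol_set_matches(qry_syms, qry_nuc)
}

fn main() {
    // ADAR-like event from the config above: ref ["A"], qry ["G"].
    println!("{}", event_matches(&['A'], &['G'], 'A', 'G')); // prints "true"
    println!("{}", event_matches(&['A'], &['G'], 'A', 'T')); // prints "false"
    // Illustrative event using ambiguity codes: any purine to any pyrimidine.
    println!("{}", event_matches(&['R'], &['Y'], 'G', 'C')); // prints "true"
}
```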

The `motifs` array contains regular expressions matched against the reference sequence. A motif qualifies a substitution when the regex match interval contains the substituted reference position. Motifs are regular expressions over the reference letters, so use regex character classes such as `[CT]` instead of IUPAC ambiguity symbols when matching multiple reference letters inside a motif.
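
The interval rule above can be sketched as follows. To stay within the Rust standard library, this sketch replaces the regex engine with a tiny matcher that understands only literal letters and `[...]` character classes, which is enough for a motif such as `[CT]G[ACT]` but is not a general regular expression implementation. It also tries every start position, so overlapping matches are considered; whether Nextclade does the same is not stated in the source. The names `parse_motif` and `motif_covers` are hypothetical, and positions here are 0-based.

```rust
/// Parses a motif into one set of allowed reference letters per position.
/// Simplification: only literal letters and `[...]` classes are supported.
/// Returns `None` for an unterminated class.
fn parse_motif(motif: &str) -> Option<Vec<Vec<char>>> {
    let mut positions = Vec::new();
    let mut chars = motif.chars();
    while let Some(c) = chars.next() {
        if c == '[' {
            let mut class = Vec::new();
            loop {
                match chars.next()? {
                    ']' => break,
                    x => class.push(x),
                }
            }
            positions.push(class);
        } else {
            positions.push(vec![c]);
        }
    }
    Some(positions)
}

/// Hypothetical helper: true if some motif match in `reference` spans the
/// 0-based position `pos`, i.e. the match interval contains the substitution.
fn motif_covers(reference: &str, motif: &str, pos: usize) -> bool {
    let Some(pattern) = parse_motif(motif) else {
        return false;
    };
    let seq: Vec<char> = reference.chars().collect();
    let len = pattern.len();
    if len == 0 || len > seq.len() || pos >= seq.len() {
        return false;
    }
    // Only windows that contain `pos` need to be checked.
    let first = pos.saturating_sub(len - 1);
    let last = pos.min(seq.len() - len);
    (first..=last).any(|start| {
        pattern
            .iter()
            .zip(&seq[start..start + len])
            .all(|(class, c)| class.contains(c))
    })
}

fn main() {
    let reference = "ACGATTGCA";
    // "CGA" at positions 1..=3 matches the motif, so the G at position 2 qualifies.
    println!("{}", motif_covers(reference, "[CT]G[ACT]", 2)); // prints "true"
    // No motif match spans position 4.
    println!("{}", motif_covers(reference, "[CT]G[ACT]", 4)); // prints "false"
}
```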

The optional `cluster` object reports clusters among the mutations matched by that pattern. It does not replace `qc.snpClusters`, which remains the generic global SNP cluster QC rule over all private nucleotide substitutions.
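
The source does not define how `windowSize` and `cutoff` interact, so the following is only a sketch under stated assumptions: a window qualifies when at least `cutoff` matched positions fit inside a span of `windowSize` reference positions, and qualifying windows that share positions are merged into one cluster (so a merged cluster may be longer than one window). The inclusive comparison, the `Cluster` struct, and the function `find_clusters` are illustrative and are not Nextclade's implementation.

```rust
#[derive(Debug, PartialEq)]
struct Cluster {
    start: usize, // first matched position in the cluster
    end: usize,   // last matched position in the cluster
    count: usize, // number of matched positions in the cluster
}

/// Hypothetical helper: groups matched positions into clusters with a
/// two-pointer sliding window. Assumption: `count >= cutoff` triggers a cluster.
fn find_clusters(positions: &[usize], window_size: usize, cutoff: usize) -> Vec<Cluster> {
    if window_size == 0 || cutoff == 0 {
        return Vec::new();
    }
    let mut sorted = positions.to_vec();
    sorted.sort_unstable();
    sorted.dedup();

    // Index ranges (into `sorted`) of qualifying windows, merged when they overlap.
    let mut ranges: Vec<(usize, usize)> = Vec::new();
    let mut left = 0;
    for right in 0..sorted.len() {
        // Shrink the window until it spans fewer than `window_size` positions.
        while sorted[right] - sorted[left] >= window_size {
            left += 1;
        }
        if right - left + 1 >= cutoff {
            match ranges.last_mut() {
                Some(last) if left <= last.1 => last.1 = right,
                _ => ranges.push((left, right)),
            }
        }
    }

    ranges
        .into_iter()
        .map(|(i, j)| Cluster { start: sorted[i], end: sorted[j], count: j - i + 1 })
        .collect()
}

fn main() {
    // Five nearby matches and one isolated match, with the settings from the config above.
    let matched = [3003, 3005, 3007, 3009, 3011, 4000];
    for c in find_clusters(&matched, 50, 3) {
        println!("{}-{}:{}", c.start, c.end, c.count); // prints "3003-3011:5"
    }
}
```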

#### Amino acid motif detection (`aaMotifs`)

Nextclade can detect and report specific motifs in translated amino acid sequences. This feature is currently being used to highlight changes in glycosylation or cleavage sites, but the feature itself is generic.
10 changes: 9 additions & 1 deletion docs/user/output-files/04-results-tsv.md
@@ -100,6 +100,15 @@ Every row in tabular output corresponds to 1 input sequence. The meaning of colu
| qc.stopCodons.totalStopCodons | Total number of detected stop codons in "Stop codons" QC rule | non-negative integer | 2 |
| qc.stopCodons.score | Score for "Stop codons" QC rule | float | 0.5 |
| qc.stopCodons.status | Status for "Stop codons" QC rule | string: `good | mediocre |bad` | bad |
| mutationPatterns.id | Mutation pattern identifier, or multiple identifiers separated by `\|` | string | adar |
| mutationPatterns.name | Mutation pattern display name, or multiple names separated by `\|` | string | ADAR-like RNA editing |
| mutationPatterns.description | Mutation pattern description, or multiple descriptions separated by `\|` | string | ADAR-mediated A-to-I editing |
| mutationPatterns.counts.matches | Total number of events matched by mutation patterns | non-negative integer | 14 |
| mutationPatterns.counts.clustered | Total number of matched events that occur in mutation pattern clusters | non-negative integer | 9 |
| mutationPatterns.counts.clusters | Total number of mutation pattern clusters | non-negative integer | 2 |
| mutationPatterns.eventTypeCounts | Matched event type counts for mutation patterns | comma separated list of strings | nucSubstitution:A>G:5 |
| mutationPatterns.clusters | Matched mutation pattern cluster ranges and event counts | comma separated list of strings | 3003-3011:5 |
| mutationPatterns.clusterEvents | Events in matched mutation pattern clusters | comma separated list of strings | 3003-3011:nucSubstitution:A3003G |
| isReverseComplement | Whether query sequences were transformed using reverse complement operation before alignment | boolean | false |
| errors | List of errors during processing | comma separated list of strings | |
| warnings | List of warnings during processing | comma separated list of strings | |
@@ -123,4 +132,3 @@ The table can contain additional columns for every clade-like attribute defined
> <br/>
>
> See descriptions of individual outputs and [Errors and warnings](./errors-and-warnings.md) section for more details.
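
For downstream processing of the `mutationPatterns.clusters` column described in the table above, a cell could be parsed roughly as follows. The `start-end:count` entry format and the comma separator are inferred only from the single example value `3003-3011:5` and the "comma separated list" type in the table, so treat the format as an assumption. The `ClusterCell` struct and `parse_clusters_cell` function are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
struct ClusterCell {
    start: usize,
    end: usize,
    count: usize,
}

/// Hypothetical helper: parses a cell such as "3003-3011:5" into structured
/// entries. Returns `None` if any entry does not follow the assumed
/// `start-end:count` shape. An empty cell yields an empty list.
fn parse_clusters_cell(cell: &str) -> Option<Vec<ClusterCell>> {
    cell.split(',')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(|entry| {
            let (range, count) = entry.split_once(':')?;
            let (start, end) = range.split_once('-')?;
            Some(ClusterCell {
                start: start.parse().ok()?,
                end: end.parse().ok()?,
                count: count.parse().ok()?,
            })
        })
        .collect()
}

fn main() {
    // Example value taken from the table above.
    println!("{:?}", parse_clusters_cell("3003-3011:5"));
}
```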

2 changes: 2 additions & 0 deletions packages/nextclade-cli/src/cli/nextclade_loop.rs
@@ -138,6 +138,7 @@ pub fn nextclade_run(mut run_args: NextcladeRunArgs) -> Result<(), Report> {
clade_node_attr_key_descs,
phenotype_attr_descs,
aa_motif_keys,
mutation_pattern_keys,
ref_nodes,
..
} = nextclade.get_initial_data();
@@ -148,6 +149,7 @@ pub fn nextclade_run(mut run_args: NextcladeRunArgs) -> Result<(), Report> {
&phenotype_attr_descs,
&ref_nodes,
&aa_motif_keys,
&mutation_pattern_keys,
&csv_column_config,
&run_args.outputs,
&nextclade.params,
3 changes: 3 additions & 0 deletions packages/nextclade-cli/src/cli/nextclade_ordered_writer.rs
@@ -44,6 +44,7 @@ impl NextcladeOrderedWriter {
phenotype_attr_key_desc: &[PhenotypeAttrDesc],
ref_nodes: &AuspiceRefNodesDesc,
aa_motifs_keys: &[String],
mutation_pattern_keys: &[String],
csv_column_config: &CsvColumnConfig,
output_params: &NextcladeRunOutputArgs,
params: &NextcladeInputParams,
@@ -73,6 +74,7 @@ impl NextcladeOrderedWriter {
&phenotype_attr_keys,
ref_nodes,
aa_motifs_keys,
mutation_pattern_keys,
csv_column_config,
)
})?;
@@ -85,6 +87,7 @@ impl NextcladeOrderedWriter {
&phenotype_attr_keys,
ref_nodes,
aa_motifs_keys,
mutation_pattern_keys,
csv_column_config,
)
})?;