Pipeline Commands
The following subcommands are built into the container image and are invoked as
skimindex <command>. For global options, runtime detection, and bind-mount
configuration, see Entry Point.
All pipeline commands share a common set of flags inherited from the
SkimCommand base class:
| Flag | Description |
|---|---|
| --list | Print available sections (datasets or divisions) as CSV and exit. |
| --dry-run | Show what would be done without executing anything. |
| --help | Show command help and exit. |
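The way shared flags can be inherited by every subcommand may be easier to see in code. The sketch below is not the actual SkimCommand implementation, which is not shown on this page; it assumes an argparse-based CLI and uses a hypothetical `build_parser` helper to attach `--list` and `--dry-run` to two of the subcommands through a common parent parser (`--help` is added by argparse itself).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: build a parser whose subcommands share common flags."""
    # Shared flags live on a parent parser. add_help=False prevents the
    # parent from registering a second, conflicting --help option.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--list", action="store_true",
                        help="Print available sections as CSV and exit.")
    common.add_argument("--dry-run", action="store_true",
                        help="Show what would be done without executing anything.")

    parser = argparse.ArgumentParser(prog="skimindex")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Each subcommand inherits the shared flags via parents=[...].
    for name in ("download", "decontam"):
        subparsers.add_parser(name, parents=[common])
    return parser


args = build_parser().parse_args(["download", "--dry-run"])
print(args.command, args.dry_run, args.list)
```

Defining the flags once on a parent parser keeps their names and help text identical across subcommands, which is the behaviour the table above describes.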
download — Download raw data
Download GenBank flat-file releases and NCBI reference genome assemblies.
skimindex download # download everything
skimindex download genbank [options] # GenBank flat-files only
skimindex download ncbi [options] # NCBI genome assemblies only
download genbank
Downloads GenBank flat-file divisions declared in [source.genbank].
| Option | Description |
|---|---|
| --division DIV | Process a single GenBank division (e.g. pln, bct). |
| --status | Show download status without downloading. |
| --list | Print available divisions and exit. |
| --dry-run | Show what would be downloaded without executing. |
download ncbi
Downloads the NCBI reference genome assemblies for data sections declared with
source = "ncbi".
| Option | Description |
|---|---|
| --dataset NAME | Process a single NCBI dataset (e.g. human, plants). |
| --taxon TAXON | Query assemblies for a taxon and display results (no download). |
| --one-per species\|genus | Keep only one assembly per species or genus. |
| --assembly-level LEVEL | Filter by assembly level (e.g. complete, chromosome). |
| --assembly-source SOURCE | Filter by assembly source (refseq, genbank). |
| --assembly-version VERSION | Filter by assembly version (e.g. latest). |
| --reference | Filter to reference assemblies only. |
| --status | Show download status without downloading. |
| --list | Print available datasets and exit. |
| --dry-run | Show what would be downloaded without executing. |
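To illustrate what "one assembly per species or genus" means, here is a conceptual sketch rather than skimindex's own logic. The `keep_one_per` function, the record fields, and the placeholder accession labels are all hypothetical, and the choice to keep the first assembly encountered is an assumption: this page does not say which assembly the real command retains.

```python
def keep_one_per(assemblies: list[dict], rank: str) -> list[dict]:
    """Keep the first assembly seen for each distinct species or genus."""
    if rank not in ("species", "genus"):
        raise ValueError("rank must be 'species' or 'genus'")
    seen = set()
    kept = []
    for assembly in assemblies:
        key = assembly[rank]
        if key not in seen:  # first assembly for this taxon wins (assumption)
            seen.add(key)
            kept.append(assembly)
    return kept


# Placeholder records; the accession labels are made up for illustration.
records = [
    {"accession": "ASM_A", "species": "Quercus robur", "genus": "Quercus"},
    {"accession": "ASM_B", "species": "Quercus robur", "genus": "Quercus"},
    {"accession": "ASM_C", "species": "Quercus petraea", "genus": "Quercus"},
]

print([r["accession"] for r in keep_one_per(records, "species")])
print([r["accession"] for r in keep_one_per(records, "genus")])
```

Filtering by genus is the coarser of the two settings, so it never keeps more assemblies than filtering by species.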
decontam — Prepare decontamination filter
Prepare reference sequences for building the decontamination k-mer filter.
skimindex decontam # run full pipeline (prepare + count)
skimindex decontam prepare [options] # split genomes into fragments
skimindex decontam count [options] # count k-mers in fragments
decontam prepare
Splits reference genomes into overlapping fragments using the
[processing.prepare_decontam] pipeline.
| Option | Description |
|---|---|
| --dataset NAME | Process a single decontamination dataset (e.g. human, fungi). |
| --list | Print available datasets and exit. |
| --dry-run | Show what would be processed without executing. |
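Splitting a genome into overlapping fragments can be pictured with a short conceptual sketch. It does not reproduce the [processing.prepare_decontam] pipeline: the `split_overlapping` function and the fragment size and overlap values are assumptions chosen for illustration, and the real pipeline takes its parameters from the configuration file.

```python
def split_overlapping(seq: str, size: int, overlap: int) -> list[str]:
    """Split seq into fragments of at most `size` bases sharing `overlap` bases."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap  # distance between consecutive fragment starts
    fragments = []
    for start in range(0, len(seq), step):
        fragments.append(seq[start:start + size])
        if start + size >= len(seq):  # this fragment reached the end
            break
    return fragments


print(split_overlapping("ACGTACGTAC", size=4, overlap=2))
```

In this sketch the final fragment may be shorter than `size` when the sequence length is not a multiple of the step.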
decontam count
Counts k-mers in prepared fragments using [processing.count_kmers_decontam].
| Option | Description |
|---|---|
| --dataset NAME | Process a single decontamination dataset. |
| --list | Print available datasets and exit. |
| --dry-run | Show what would be processed without executing. |
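For readers unfamiliar with k-mer counting, the idea can be sketched in a few lines. This is a conceptual illustration only, not the [processing.count_kmers_decontam] step itself; the `count_kmers` function and the value of k are hypothetical, and the real pipeline may treat reverse complements or ambiguous bases differently.

```python
from collections import Counter


def count_kmers(fragments: list[str], k: int) -> Counter:
    """Count every length-k substring across all fragments."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for fragment in fragments:
        # Fragments shorter than k yield an empty range and contribute nothing.
        for i in range(len(fragment) - k + 1):
            counts[fragment[i:i + k]] += 1
    return counts


counts = count_kmers(["ACGT", "GTAC"], k=2)
print(sorted(counts.items()))
```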
validate — Validate configuration
Loads config/skimindex.toml, runs all validation rules, and reports errors.
skimindex validate [--config PATH]
| Option | Description |
|---|---|
| --config PATH | Path to the config file (default: /config/skimindex.toml). |
Exits with code 0 if valid, 1 if errors are found.
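Because the result is reported through the exit code, validation is easy to gate in a script. The sketch below assumes a hypothetical `config_is_valid` wrapper; so that it runs without skimindex installed, it is exercised with stand-in commands that exit with 0 and 1.

```python
import subprocess
import sys


def config_is_valid(cmd: list[str]) -> bool:
    """Run a validation command and treat exit code 0 as 'valid'."""
    return subprocess.run(cmd).returncode == 0


# Stand-in commands that mimic the two documented exit codes.
exits_zero = [sys.executable, "-c", "raise SystemExit(0)"]
exits_one = [sys.executable, "-c", "raise SystemExit(1)"]

print(config_is_valid(exits_zero), config_is_valid(exits_one))
```

Inside the container, the same wrapper would be called as `config_is_valid(["skimindex", "validate", "--config", "/config/skimindex.toml"])`.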