Tools installed in the container image
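This page documents the tools bundled in the skimindex container image: third-party software plus the in-house kmerasm assembler.
Versions are set in docker/Makefile and substituted at build time via --build-arg. Some tools are pinned to an exact release; the bioconda packages track the latest release.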
OBITools4
Bioinformatics suite for the analysis of DNA metabarcoding data. Provides sequence manipulation, demultiplexing, taxonomic assignment, and quality filtering.
| Version | 4.4.29 (pinned; OBITOOLS_VERSION in docker/Makefile) |
| Binary location | /usr/local/bin/obi* |
| Architecture | Compiled from source for each target platform (builder stage) |
| Language | Go |
| GitHub | https://github.com/metabarcoding/obitools4 |
| Documentation | https://obitools4.metabarcoding.org/ |
| Institution | LECA / Université Grenoble Alpes |
Reference
Boyer F., Mercier C., Bonin A., Le Bras Y., Taberlet P., Coissac E. (2016). obitools: a unix-inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16(1), 176–182. https://doi.org/10.1111/1755-0998.12428
kmindex
Fast and memory-efficient k-mer indexing and querying across large collections of genomic datasets. skimindex uses it to build and query the k-mer index used for decontamination.
| Version | latest bioconda release (bioconda::kmindex) |
| Binary location | /opt/conda/bin/kmindex |
| Architecture | linux/amd64, linux/arm64 |
| Language | C++ |
| GitHub | https://github.com/tlemane/kmindex |
| Documentation | https://tlemane.github.io/kmindex |
References
Lemane T., Lezzoche N., Lecubin J., Pelletier E., Sunagawa S., Bork P., Hingamp P., Lavigne R., Chikhi R., Peterlongo P. (2024). Indexing and real-time user-friendly queries in terabyte-sized complex genomic datasets with kmindex and ORA. Nature Computational Science, 4, 104–109. https://doi.org/10.1038/s43588-024-00596-6
Lemane T., Medvedev P., Chikhi R., Peterlongo P. (2022). kmtricks: efficient and flexible construction of Bloom filters for large sequencing data collections. Bioinformatics Advances, 2(1), vbac029. https://doi.org/10.1093/bioadv/vbac029
ntCard
Streaming algorithm for estimating k-mer frequency histograms from sequencing data. Used to estimate genome size and coverage prior to k-mer counting.
| Version | latest bioconda release (bioconda::ntcard) |
| Binary location | /opt/conda/bin/ntcard |
| Architecture | linux/amd64, linux/arm64 |
| Language | C++ |
| GitHub | https://github.com/bcgsc/ntCard |
| Website | https://bcgsc.ca/resources/software/ntcard |
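A k-mer frequency histogram can be turned into a rough genome size estimate by dividing the total number of k-mer occurrences by the depth of the main coverage peak. The sketch below illustrates that arithmetic only. It assumes the histogram is already loaded as a dictionary mapping multiplicity to the number of distinct k-mers and does not parse ntCard's output file. The function name, the error cutoff of 2, and the "tallest bin" peak heuristic are illustrative assumptions, not a description of what skimindex does.

```python
def estimate_genome_size(hist: dict[int, int], min_multiplicity: int = 2) -> tuple[int, float]:
    """Return (peak k-mer depth, estimated genome size in k-mers).

    hist maps a multiplicity m to the number of distinct k-mers seen m times.
    Bins below min_multiplicity are treated as sequencing errors (an assumption).
    """
    solid = {m: n for m, n in hist.items() if m >= min_multiplicity}
    if not solid:
        raise ValueError("no histogram bins at or above the error cutoff")

    # Crude peak detection: the multiplicity with the most distinct k-mers.
    peak = max(solid, key=lambda m: solid[m])

    # Total k-mer occurrences among the retained bins.
    total_occurrences = sum(m * n for m, n in solid.items())

    return peak, total_occurrences / peak


if __name__ == "__main__":
    toy_hist = {1: 1000, 9: 50, 10: 100, 11: 50}  # made-up numbers
    depth, size = estimate_genome_size(toy_hist)
    print(f"peak k-mer depth: {depth}, estimated size: {size:.0f} k-mers")
```

The peak is a k-mer depth, not a base coverage: for reads of length L, base coverage is larger by a factor of L / (L - k + 1).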
Reference
Mohamadi H., Khan H., Birol I. (2017). ntCard: a streaming algorithm for cardinality estimation in genomics data. Bioinformatics, 33(9), 1324–1330. https://doi.org/10.1093/bioinformatics/btw832
NCBI SRA Toolkit
Suite of tools for downloading, validating, and converting sequencing data
from the NCBI Sequence Read Archive (SRA). Provides prefetch, fasterq-dump,
sam-dump, and many others.
| Version | latest bioconda release, currently 3.2.0 (bioconda::sra-tools) |
| Binary location | /opt/conda/bin/prefetch, /opt/conda/bin/fasterq-dump, … |
| Architecture | linux/amd64, linux/arm64 (via bioconda + conda-forge::ossuuid) |
| Language | C / C++ |
| GitHub | https://github.com/ncbi/sra-tools |
| Documentation | https://github.com/ncbi/sra-tools/wiki |
| Website | https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/ |
| Institution | NCBI / National Library of Medicine |
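A typical retrieval is a two-step call: prefetch downloads the run, then fasterq-dump converts it to FASTQ. The sketch below only builds the two command lines so they can be inspected or handed to subprocess. The helper name, the thread count, and the accession SRR000001 are placeholders chosen for illustration.

```python
import shlex


def sra_fetch_commands(accession: str, threads: int = 4) -> list[list[str]]:
    """Return the prefetch and fasterq-dump invocations for one run accession."""
    return [
        # Step 1: download the run into ./<accession>/ in the working directory.
        ["prefetch", accession],
        # Step 2: convert the downloaded run to FASTQ using several threads.
        ["fasterq-dump", accession, "--threads", str(threads)],
    ]


if __name__ == "__main__":
    for command in sra_fetch_commands("SRR000001"):
        print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, check=True). Both commands should run from the same working directory so that fasterq-dump picks up the prefetched run instead of downloading it again.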
Reference
Leinonen R., Sugawara H., Shumway M., International Nucleotide Sequence Database Collaboration (2011). The sequence read archive. Nucleic Acids Research, 39(Database issue), D19–D21. https://doi.org/10.1093/nar/gkq1019
NCBI Datasets CLI
Command-line interface for downloading genome assemblies, gene sequences, and
associated metadata directly from NCBI. Provides datasets and dataformat.
| Version | v18.21.0 (pinned; NCBI_DATASETS_VERSION in docker/Makefile) |
| Binaries | /usr/local/bin/datasets, /usr/local/bin/dataformat |
| Architecture | linux/amd64, linux/arm64 (official NCBI releases) |
| Language | Go |
| GitHub | https://github.com/ncbi/datasets |
| Documentation | https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install/ |
| Institution | NCBI / National Library of Medicine |
IBM Aspera Transfer SDK (ascp)
High-speed file transfer client using the FASP protocol. Used by the SRA Toolkit to accelerate downloads from NCBI when available, falling back transparently to HTTPS otherwise.
| Version | 1.1.7 (pinned; ASPERA_SDK_VERSION in docker/Makefile) |
| Binary | /usr/local/bin/ascp |
| Architecture | linux/amd64 (linux-amd64), linux/arm64 (linux-aarch64) |
| Distribution | IBM CloudFront CDN (d3pgwzphl5a0ty.cloudfront.net) |
| Documentation | https://www.ibm.com/docs/en/ahts/4.4.x |
| SDK | https://developer.ibm.com/apis/catalog/aspera--aspera-transfer-sdk/ |
| Website | https://www.ibm.com/products/aspera |
| Institution | IBM |
| SRA integration | Configured via vdb-config --set /TOOLS/ascp-path=/usr/local/bin/ascp |
Note: ARM64 support was introduced in SDK 1.1.4 (transferd 1.1.4).
kmerasm
Simple de Bruijn unitig assembler for 31-mers. Reads canonical 31-mers (one per line) and outputs unitigs in FASTA format, built from non-branching paths in the de Bruijn graph.
| Version | in-house (built from src/kmerasm/) |
| Binary | /app/bin/kmerasm |
| Architecture | Compiled for each target platform (builder stage) |
| Language | C |
| Source | src/kmerasm/kmerasm.c in this repository |
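kmerasm expects canonical k-mers, one per line. To illustrate what such input looks like, the sketch below extracts canonical k-mers from a sequence. It assumes the usual convention that a k-mer is represented by the lexicographically smaller of itself and its reverse complement; this page does not state which convention kmerasm uses, so check src/kmerasm/kmerasm.c before relying on it. The helper names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = reverse_complement(kmer)
    return kmer if kmer <= rc else rc


def canonical_kmers(seq: str, k: int = 31) -> list[str]:
    """Return the sorted set of distinct canonical k-mers found in seq."""
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing N or any other non-ACGT symbol.
        if set(kmer) <= set("ACGT"):
            found.add(canonical(kmer))
    return sorted(found)


if __name__ == "__main__":
    # k=4 keeps the demo readable; kmerasm itself works on 31-mers.
    for kmer in canonical_kmers("ACGTTGCAAC", k=4):
        print(kmer)
```

Collapsing each k-mer and its reverse complement into one representative means both strands of the same sequence map to the same node when the de Bruijn graph is built.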
Base environment
Miniconda3 / conda
| Base image | continuumio/miniconda3:latest |
| Python | /opt/conda/bin/python3 |
| Website | https://docs.conda.io |
Tools installed via conda are in /opt/conda/bin/ and available on PATH.
System utilities (apt)
Installed via apt-get in the container:
| Tool | Purpose |
|---|---|
| pigz | Parallel gzip compression / decompression |
| jq | JSON processor |
| curl | HTTP/FTP client |
| make | Build automation |
| less | Pager |
| sudo | Privilege escalation (user skimindex has NOPASSWD:ALL) |
Version management
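Tool versions are centralised in docker/Makefile. The bioconda entries name a package without a version, so they resolve to the latest release at build time: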
OBITOOLS_VERSION := 4.4.29
NCBI_DATASETS_VERSION := v18.21.0
KMINDEX_VERSION := bioconda::kmindex
NTCARD_VERSION := bioconda::ntcard
SRATOOLS_VERSION := bioconda::sra-tools
ASPERA_SDK_VERSION := 1.1.7
To update a tool, change the corresponding variable and rebuild with make all.
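To see how these variables could reach the image build, the sketch below reads `NAME := value` assignments from Makefile text and turns each `*_VERSION` variable into a `--build-arg` pair. It assumes the Dockerfile ARG names match the Makefile variable names, which this page does not confirm. The function names are hypothetical, and the real docker/Makefile may assemble its arguments differently.

```python
import re

# Matches simple Makefile assignments such as "NAME := value" or "NAME = value".
ASSIGNMENT = re.compile(r"^([A-Z][A-Z0-9_]*)\s*:?=\s*(.*?)\s*$")


def parse_versions(makefile_text: str) -> dict[str, str]:
    """Collect variables whose name ends in _VERSION from Makefile text."""
    versions = {}
    for line in makefile_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = ASSIGNMENT.match(line)
        if match and match.group(1).endswith("_VERSION"):
            versions[match.group(1)] = match.group(2)
    return versions


def build_args(versions: dict[str, str]) -> list[str]:
    """Turn each variable into a '--build-arg NAME=value' pair."""
    args = []
    for name, value in versions.items():
        args += ["--build-arg", f"{name}={value}"]
    return args


if __name__ == "__main__":
    sample = "OBITOOLS_VERSION := 4.4.29\nASPERA_SDK_VERSION := 1.1.7\n"
    print(" ".join(build_args(parse_versions(sample))))
```

Printing the generated arguments before a rebuild is a quick way to confirm that an edited variable will actually be passed to the build.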