skimindex.unix.ncbi
NCBI tools wrapper module using plumbum.
Provides Pythonic interfaces to the NCBI CLI tools installed in the image:
- datasets: Download datasets from NCBI (genome sequences, etc.)
- dataformat: Convert dataset formats (e.g., JSON to FASTA)
Two API styles
- Flexible: datasets("download", "genome", ...)
- Convenient: datasets_download_genome(...)
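The relationship between the two styles can be illustrated with a small, self-contained sketch. It is not the module's actual implementation: `make_tool` and `with_subcommand` are hypothetical helpers, and they build plain argument lists rather than plumbum commands. The sketch only shows the idea that a convenience wrapper behaves like the flexible call with the subcommand tokens filled in first.

```python
from typing import Callable, List

# A "command" here is just a callable returning an argv list (an assumption
# made for illustration; the real wrappers return LoggedBoundCommand objects).
Command = Callable[..., List[str]]


def make_tool(executable: str) -> Command:
    """Hypothetical helper: build the argv for `executable` plus arguments."""
    def tool(*args: str) -> List[str]:
        return [executable, *args]
    return tool


def with_subcommand(tool: Command, *prefix: str) -> Command:
    """Hypothetical helper: always insert `prefix` before the caller's arguments."""
    def wrapper(*args: str) -> List[str]:
        return tool(*prefix, *args)
    return wrapper


# Stand-ins for the flexible and convenient entry points.
datasets_argv = make_tool("datasets")
datasets_download_genome_argv = with_subcommand(datasets_argv, "download", "genome")

flexible = datasets_argv("download", "genome", "--taxon", "human", "--reference")
convenient = datasets_download_genome_argv("--taxon", "human", "--reference")

# Both styles describe the same command line.
print(flexible == convenient)
```

Under this reading, the flexible style reaches any subcommand, while the convenient style trades that generality for shorter calls on the common paths.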
Example
from skimindex.ncbi import datasets, dataformat, datasets_download_genome, help
from plumbum import FG
Flexible API
datasets("download", "genome", "--taxon", "human", "--reference", "--assembly-level", "chromosome") & FG
Convenient API
datasets_download_genome("--taxon", "human", "--reference") & FG
Convert JSON output to FASTA
dataformat("convert", "json-to-fasta", "--input-file", "data.json") & FG
datasets
datasets(*args) -> LoggedBoundCommand
Execute a datasets command.
Common subcommands
- download: Download datasets from NCBI
- summary: Get summary information about datasets
Examples:
datasets("download", "genome", "--taxon", "human", "--reference")
datasets("summary", "genome", "--taxon", "Spermatophyta")
datasets("download", "protein", "--taxon", "human")
Full documentation: datasets --help
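Because every wrapper takes positional string arguments, callers that hold their options in Python variables may find a small flag builder convenient. The sketch below is an assumption, not part of the module: `build_flags` is a hypothetical helper that turns keyword options into a flag list which could then be unpacked into `datasets(...)`.

```python
from typing import List, Optional, Union


def build_flags(**options: Optional[Union[str, int, bool]]) -> List[str]:
    """Hypothetical helper: turn keyword options into CLI flag tokens.

    - underscores in names become hyphens (assembly_level -> --assembly-level)
    - True emits a bare flag, False/None emits nothing
    - any other value is emitted as "--flag value"
    """
    args: List[str] = []
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            args.append(flag)
        elif value is False or value is None:
            continue
        else:
            args.extend([flag, str(value)])
    return args


# Keyword order is preserved, so the flags appear in the order written.
flags = build_flags(taxon="human", reference=True, assembly_level="chromosome")
print(flags)
```

The resulting list would be passed as `datasets("download", "genome", *flags)`. Whether a given flag is accepted is decided by the datasets CLI itself, so `datasets --help` remains the authority.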
dataformat
dataformat(*args) -> LoggedBoundCommand
Execute a dataformat command.
Common subcommands
- convert: Convert between formats (json-to-fasta, json-to-gff3, etc.)
- fasta: Extract/convert to FASTA format
- tsv: Extract/convert to TSV format
- gff3: Extract/convert to GFF3 format
Examples:
dataformat("convert", "json-to-fasta", "--input-file", "data.json")
dataformat("convert", "json-to-gff3", "--input-file", "data.json")
dataformat("fasta", "--input-file", "data.json", "--seq-type", "nucl")
Full documentation: dataformat --help
datasets_download
datasets_download(*args) -> LoggedBoundCommand
Download datasets from NCBI.
datasets_download_genome
datasets_download_genome(*args) -> LoggedBoundCommand
Download genome sequences (datasets download genome).
datasets_download_gene
datasets_download_gene(*args) -> LoggedBoundCommand
Download gene sequences (datasets download gene).
datasets_download_protein
datasets_download_protein(*args) -> LoggedBoundCommand
Download protein sequences (datasets download protein).
datasets_summary
datasets_summary(*args) -> LoggedBoundCommand
Get summary information about datasets without downloading.
datasets_summary_genome
datasets_summary_genome(*args) -> LoggedBoundCommand
Get summary of genome datasets.
datasets_summary_gene
datasets_summary_gene(*args) -> LoggedBoundCommand
Get summary of gene datasets.
datasets_summary_protein
datasets_summary_protein(*args) -> LoggedBoundCommand
Get summary of protein datasets.
dataformat_convert
dataformat_convert(*args) -> LoggedBoundCommand
Convert between dataset formats.
dataformat_fasta
dataformat_fasta(*args) -> LoggedBoundCommand
Extract or convert to FASTA format.
dataformat_tsv
dataformat_tsv(*args) -> LoggedBoundCommand
Extract or convert to TSV format.
dataformat_gff3
dataformat_gff3(*args) -> LoggedBoundCommand
Extract or convert to GFF3 format.
help
help(tool_name: str) -> str
Return the --help output for an NCBI CLI tool.
Supports multi-word subcommands by splitting on spaces, e.g.
"datasets download" invokes datasets download --help.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tool_name | str | Tool name or space-separated subcommand path. | required |
Returns:
| Type | Description |
|---|---|
| str | Help text string, or an error message if the tool is not found. |
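The space-splitting behaviour described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the module's code: `help_argv` and `tool_help` are hypothetical names, the error message wording is invented, and `subprocess` stands in for whatever the module uses to run the tool.

```python
import shutil
import subprocess
from typing import List


def help_argv(tool_name: str) -> List[str]:
    """Split a space-separated subcommand path and append --help."""
    return [*tool_name.split(), "--help"]


def tool_help(tool_name: str) -> str:
    """Hypothetical stand-in for help(): return help text or an error message."""
    argv = help_argv(tool_name)
    if len(argv) == 1:
        # Only "--help" is present, so the tool name was empty or whitespace.
        return "Error: empty tool name"
    if shutil.which(argv[0]) is None:
        # The executable is not on PATH; report it instead of raising.
        return f"Error: tool not found: {argv[0]}"
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    # Some CLIs print help to stderr, so fall back to it when stdout is empty.
    return result.stdout or result.stderr


print(help_argv("datasets download"))
```

Returning a message rather than raising matches the documented return type of `str`, which keeps the function safe to call from interactive sessions where the tool may not be installed.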