
skimindex.unix.ncbi

Wrapper module for NCBI command-line tools, built on plumbum.

Provides Pythonic interfaces to the NCBI CLI tools installed in the image:
  • datasets: Download datasets from NCBI (genome sequences, etc.)
  • dataformat: Convert dataset formats (e.g., JSON to FASTA)
Two API styles
  1. Flexible: datasets("download", "genome", ...)
  2. Convenient: datasets_download_genome(...)
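The two styles appear to differ only in where the subcommand words are supplied. The minimal sketch below assumes that each convenience function simply pre-binds its subcommand words in front of the caller's arguments. `run_tool` is a hypothetical stand-in that returns the argv instead of executing anything, so the sketch runs without the NCBI tools or plumbum installed.

```python
from functools import partial


def run_tool(tool: str, *args: str) -> list[str]:
    """Hypothetical stand-in for executing a CLI: returns the argv it would run."""
    return [tool, *args]


# Flexible style: bind only the executable; callers pass every subcommand word.
datasets = partial(run_tool, "datasets")

# Convenient style: additionally pre-bind the subcommand words.
datasets_download_genome = partial(datasets, "download", "genome")

flexible = datasets("download", "genome", "--taxon", "human", "--reference")
convenient = datasets_download_genome("--taxon", "human", "--reference")

# Both styles produce the same command line.
assert flexible == convenient
print(convenient)
```

Layering one `partial` on another keeps each convenience function a one-liner, which matches how the wrapper names mirror the CLI's subcommand path.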
Example

```python
from plumbum import FG

# Note: importing `help` shadows Python's built-in help().
from skimindex.unix.ncbi import datasets, dataformat, datasets_download_genome, help

# `& FG` runs each command in the foreground, attached to the terminal.

# Flexible API
datasets("download", "genome", "--taxon", "human", "--reference", "--assembly-level", "chromosome") & FG

# Convenient API
datasets_download_genome("--taxon", "human", "--reference") & FG

# Convert JSON output to FASTA
dataformat("convert", "json-to-fasta", "--input-file", "data.json") & FG
```

datasets

datasets(*args) -> LoggedBoundCommand

Execute a datasets command.

Common subcommands
  • download: Download datasets from NCBI
  • summary: Get summary information about datasets

Examples:

```python
# Each call returns a bound command; append `& FG` to run it in the foreground.
datasets("download", "genome", "--taxon", "human", "--reference")
datasets("summary", "genome", "--taxon", "Spermatophyta")
datasets("download", "protein", "--taxon", "human")
```

Full documentation: datasets --help
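The signature returns a `LoggedBoundCommand`, which this page does not define. Judging by its name and the `& FG` usage, it is presumably a plumbum bound command that logs its command line before running. The stdlib-only sketch below illustrates that idea under this assumption; `LoggedCommandSketch`, `datasets_sketch`, and the logger name are hypothetical, not part of skimindex.

```python
import logging
import shlex
import subprocess
from dataclasses import dataclass

logger = logging.getLogger("ncbi_sketch")  # hypothetical logger name


@dataclass(frozen=True)
class LoggedCommandSketch:
    """Hypothetical stand-in for LoggedBoundCommand: an argv bound but not yet run."""

    argv: tuple[str, ...]

    def __str__(self) -> str:
        # Shell-quoted form, suitable for logging or pasting into a terminal.
        return shlex.join(self.argv)

    def run(self) -> subprocess.CompletedProcess:
        # Log the exact command line before executing it.
        logger.info("running: %s", self)
        # check=True raises CalledProcessError on a non-zero exit status.
        return subprocess.run(self.argv, check=True, capture_output=True, text=True)


def datasets_sketch(*args: str) -> LoggedCommandSketch:
    # Binding only builds the command; nothing runs until .run() is called.
    return LoggedCommandSketch(("datasets", *args))


cmd = datasets_sketch("summary", "genome", "--taxon", "Spermatophyta")
print(cmd)
```

Separating binding from execution is what lets the same object be logged, inspected, or run later, which is the behavior the `-> LoggedBoundCommand` return type suggests.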

dataformat

dataformat(*args) -> LoggedBoundCommand

Execute a dataformat command.

Common subcommands
  • convert: Convert between formats (json-to-fasta, json-to-gff3, etc.)
  • fasta: Extract/convert to FASTA format
  • tsv: Extract/convert to TSV format
  • gff3: Extract/convert to GFF3 format

Examples:

```python
dataformat("convert", "json-to-fasta", "--input-file", "data.json")
dataformat("convert", "json-to-gff3", "--input-file", "data.json")
dataformat("fasta", "--input-file", "data.json", "--seq-type", "nucl")
```

Full documentation: dataformat --help
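Because `dataformat` post-processes what `datasets` produces, the two tools are often chained. This page does not say whether `LoggedBoundCommand` supports plumbum's `|` operator, so the sketch below connects two plain argv lists with `subprocess`, following the "replacing shell pipeline" recipe from the Python docs. `pipe` is a hypothetical helper, and the demo substitutes Python one-liners for the NCBI tools so it runs anywhere.

```python
import subprocess
import sys


def pipe(producer: list[str], consumer: list[str]) -> str:
    """Run the equivalent of `producer | consumer` and return the consumer's stdout."""
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(
        consumer, stdin=first.stdout, stdout=subprocess.PIPE, text=True
    )
    # Drop the parent's handle so `first` gets SIGPIPE if `second` exits early.
    first.stdout.close()
    out, _ = second.communicate()
    first.wait()
    # Like `set -o pipefail`: fail if either stage exited non-zero.
    for proc, argv in ((first, producer), (second, consumer)):
        if proc.returncode != 0:
            raise subprocess.CalledProcessError(proc.returncode, argv)
    return out


# Self-contained demo: two Python one-liners stand in for datasets and dataformat.
upper = pipe(
    [sys.executable, "-c", "print('genome')"],
    [sys.executable, "-c", "import sys; print(sys.stdin.read().strip().upper())"],
)
print(upper.strip())  # GENOME
```

With the real tools, the two argv lists would be a `datasets` invocation and a `dataformat` invocation that reads standard input; check both tools' `--help` output for the exact arguments your installed versions accept.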

datasets_download

datasets_download(*args) -> LoggedBoundCommand

Download datasets from NCBI (datasets download).

datasets_download_genome

datasets_download_genome(*args) -> LoggedBoundCommand

Download genome sequences (datasets download genome).

datasets_download_gene

datasets_download_gene(*args) -> LoggedBoundCommand

Download gene sequences (datasets download gene).

datasets_download_protein

datasets_download_protein(*args) -> LoggedBoundCommand

Download protein sequences (datasets download protein).

datasets_summary

datasets_summary(*args) -> LoggedBoundCommand

Get summary information about datasets without downloading them (datasets summary).

datasets_summary_genome

datasets_summary_genome(*args) -> LoggedBoundCommand

Get a summary of genome datasets (datasets summary genome).

datasets_summary_gene

datasets_summary_gene(*args) -> LoggedBoundCommand

Get a summary of gene datasets (datasets summary gene).

datasets_summary_protein

datasets_summary_protein(*args) -> LoggedBoundCommand

Get a summary of protein datasets (datasets summary protein).

dataformat_convert

dataformat_convert(*args) -> LoggedBoundCommand

Convert between dataset formats (dataformat convert).

dataformat_fasta

dataformat_fasta(*args) -> LoggedBoundCommand

Extract or convert to FASTA format (dataformat fasta).

dataformat_tsv

dataformat_tsv(*args) -> LoggedBoundCommand

Extract or convert to TSV format (dataformat tsv).

dataformat_gff3

dataformat_gff3(*args)

Extract or convert to GFF3 format (dataformat gff3).

help

help(tool_name: str) -> str

Return the --help output for an NCBI CLI tool.

Supports multi-word subcommands by splitting on spaces: for example, help("datasets download") runs datasets download --help.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| tool_name | str | Tool name or space-separated subcommand path, e.g. "datasets" or "dataformat convert". | required |

Returns:

| Type | Description |
|------|-------------|
| str | Help text string, or an error message if the tool is not found. |
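Only the contract of `help()` is documented here. The sketch below is one plausible stdlib implementation of that contract, not the module's actual code: it splits the name on whitespace, checks that the executable is on `PATH` with `shutil.which`, and runs the parts followed by `--help`. `tool_help` is a hypothetical name chosen to avoid shadowing Python's built-in `help`, and the exact wording of the error message is an assumption.

```python
import shutil
import subprocess


def tool_help(tool_name: str) -> str:
    """Hypothetical re-implementation of help(): return `<tool_name> --help` output."""
    # "datasets download" -> ["datasets", "download"]
    parts = tool_name.split()
    if not parts or shutil.which(parts[0]) is None:
        # Assumed error format; the real module's message may differ.
        return f"Error: tool {tool_name!r} not found"
    # check=False: a non-zero exit from `--help` should still return its text.
    result = subprocess.run(
        [*parts, "--help"], capture_output=True, text=True, check=False
    )
    # Some CLIs print help on stderr; fall back to it when stdout is empty.
    return result.stdout or result.stderr


print(tool_help("dataformat convert"))
```

Returning an error string instead of raising keeps the function safe to call interactively, which matches the documented return contract.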