skimindex.config

Config module for skimindex — reads TOML configuration and provides typed access to all sections, path helpers, and environment variable export.

The configuration file is expected at /config/skimindex.toml by default; this can be overridden via the SKIMINDEX_CONFIG environment variable.

Section types (identified by TOML prefix):

[local_directories], [logging], [processed_data], [indexes], [stamp] → root-level configuration sections
[source.X] → data origin parameters
[role.X] → data usage / pipeline parameters
[processing.X] → pipeline step definitions (atomic or composite)
[data.X] → dataset declarations (require source + role)

Environment variable schema

SKIMINDEX__LOGGING__LEVEL            (root section)
SKIMINDEX__SOURCE__NCBI__DIRECTORY   (prefixed section)
SKIMINDEX__DATA__HUMAN__TAXON        (prefixed section)
SKIMINDEX__ROOT                      (mount root, default "/")
SKIMINDEX__REF_TAXA                  (space-separated list)
SKIMINDEX__REF_GENOMES               (space-separated list)
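The naming scheme above can be reproduced with a few lines of Python. This is a minimal sketch inferred from the listed examples; `env_var_name` is a hypothetical helper, not part of the skimindex API.

```python
def env_var_name(section: str, key: str) -> str:
    """Build a SKIMINDEX__SECTION__KEY name from a dotted section path and a key."""
    # Assumption: dots in the section path become double underscores and
    # every component is upper-cased, as the examples in the schema suggest.
    parts = ["SKIMINDEX", *section.split("."), key]
    return "__".join(part.upper() for part in parts)


print(env_var_name("logging", "level"))
print(env_var_name("source.ncbi", "directory"))
print(env_var_name("data.human", "taxon"))
```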

Config

Config(
    path: Path = DEFAULT_CONFIG,
    *,
    apply_logging: bool = True,
    export_env: bool = True,
)

Parse and provide typed access to skimindex TOML configuration.

Reads a TOML file (default /config/skimindex.toml, overridable via the SKIMINDEX_CONFIG environment variable), exports all scalar values as SKIMINDEX__SECTION__KEY environment variables, and provides typed accessors for every config section.

Precedence for any value: env var > TOML > built-in default.

Example
from skimindex.config import config

cfg = config()
print(cfg.get("logging", "level"))   # "INFO"
print(cfg.processed_data_dir())      # Path("/processed_data")

Load configuration from a TOML file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Path | Path to the TOML configuration file. | DEFAULT_CONFIG |
| apply_logging | bool | If True, configure the logging system from [logging] immediately after loading. | True |
| export_env | bool | If True, export all config values as SKIMINDEX__ environment variables (existing vars are never overwritten). | True |
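The "never overwritten" rule for export_env can be pictured as follows. This is a sketch of the behaviour described above, not the actual implementation; the function name `export_without_overwrite` and its `environ` parameter are assumptions made so the example can run against a plain dict.

```python
import os
from collections.abc import MutableMapping


def export_without_overwrite(
    variables: dict[str, str],
    environ: MutableMapping[str, str] = os.environ,
) -> list[str]:
    """Set each variable unless it is already defined; return the names that were set."""
    exported = []
    for name, value in variables.items():
        if name not in environ:  # pre-existing values take priority
            environ[name] = value
            exported.append(name)
    return exported


# Demonstration on a plain dict instead of the real process environment.
fake_env = {"SKIMINDEX__LOGGING__LEVEL": "DEBUG"}
newly_set = export_without_overwrite(
    {"SKIMINDEX__LOGGING__LEVEL": "INFO", "SKIMINDEX__ROOT": "/"},
    fake_env,
)
print(newly_set)
print(fake_env["SKIMINDEX__LOGGING__LEVEL"])
```

Leaving existing variables untouched is what lets a value set in the shell win over the TOML file, in line with the stated precedence.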

sources property

sources: dict[str, dict[str, Any]]

All [source.X] sections, keyed by X.

roles property

roles: dict[str, dict[str, Any]]

All [role.X] sections, keyed by X.

processing property

processing: dict[str, dict[str, Any]]

All [processing.X] sections, keyed by X.

datasets property

datasets: dict[str, dict[str, Any]]

All [data.X] sections, keyed by X.

ref_taxa property

ref_taxa: list[str]

Names of all datasets with source 'ncbi' or 'genbank'.

ref_genomes property

ref_genomes: list[str]

Names of all datasets with source 'ncbi' (downloadable via NCBI datasets CLI).

sra_datasets property

sra_datasets: list[str]

Names of all datasets with source 'sra'.
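The three derived lists above are all filters over the [data.X] sections by their source. A minimal sketch, assuming each dataset dict carries a "source" key; the helper `names_with_source` and the sample datasets are hypothetical:

```python
def names_with_source(datasets: dict[str, dict], sources: set[str]) -> list[str]:
    """Return dataset names whose 'source' value is one of the given sources."""
    return [
        name
        for name, params in datasets.items()
        if params.get("source") in sources
    ]


# Hypothetical [data.X] sections, keyed by X.
datasets = {
    "human": {"source": "ncbi"},
    "plants": {"source": "genbank"},
    "reads": {"source": "sra"},
}

ref_taxa = names_with_source(datasets, {"ncbi", "genbank"})
ref_genomes = names_with_source(datasets, {"ncbi"})
sra_datasets = names_with_source(datasets, {"sra"})
print(ref_taxa, ref_genomes, sra_datasets)
```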

root property

root: Path

Container/runtime root path from SKIMINDEX_ROOT env var (default '/').

data property

data: dict[str, Any]

Read-only copy of the raw parsed TOML data.

path property

path: Path

Absolute path to the TOML configuration file.

source_dir

source_dir(name: str) -> Path

Return the mount path for a named source (root / sources[name][directory]).

processed_data_dir

processed_data_dir() -> Path

Return the processed data root (root / [processed_data].directory).

indexes_dir

indexes_dir() -> Path

Return the indexes root (root / [indexes].directory).

stamp_dir

stamp_dir() -> Path

Return the stamp root (root / [stamp].directory).

scratch_dir

scratch_dir() -> Path

Return the scratch root (root / [scratch].directory).

log_file

log_file() -> Path

Return the log file path (root / [logging].directory / [logging].file).

raw_data_dir

raw_data_dir() -> Path

Return the internal/raw data root (source_dir('internal')).
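Each path helper above is described as the root joined with one or more configured directory names. The sketch below shows that composition with pathlib; `under_root` and the sample directory names are hypothetical, and `PurePosixPath` is used only so the result is the same on every platform.

```python
from pathlib import PurePosixPath


def under_root(root: str, *parts: str) -> PurePosixPath:
    """Join relative directory components onto the mount root."""
    return PurePosixPath(root).joinpath(*parts)


print(under_root("/", "processed_data"))
print(under_root("/mnt/skim", "log", "skimindex.log"))

# Caveat of this sketch: pathlib discards everything before an absolute
# component, so configured directories need to be relative paths.
print(under_root("/mnt/skim", "/indexes"))
```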

get

get(section: str, key: str, default: str = '') -> str

Get a config value with env-var / TOML / default precedence.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| section | str | Section path, dotted for prefixed sections (e.g. "logging", "source.ncbi", "data.human"). | required |
| key | str | Key name within the section. | required |
| default | str | Fallback value when neither an env var nor a TOML entry exists. | '' |

Returns:

| Type | Description |
| --- | --- |
| str | The resolved value as a string. |

Example
cfg.get("logging", "level")          # "INFO"
cfg.get("source.ncbi", "directory")  # "genbank"
cfg.get("data.human", "taxon")       # "human"
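The env var > TOML > default precedence can be sketched as a standalone lookup. This is an illustration of the documented rule under stated assumptions, not the library's code: `resolve`, its `toml_data` argument, and the `environ` parameter are hypothetical, and the variable naming follows the schema listed earlier.

```python
import os
from collections.abc import Mapping


def resolve(
    section: str,
    key: str,
    toml_data: dict,
    default: str = "",
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Look up a value: environment variable first, then TOML, then the default."""
    path = section.split(".")

    # 1. Environment variable, e.g. SKIMINDEX__SOURCE__NCBI__DIRECTORY.
    env_name = "__".join(["SKIMINDEX", *path, key]).upper()
    if env_name in environ:
        return environ[env_name]

    # 2. TOML entry: walk the nested tables along the dotted section path.
    node = toml_data
    for part in path:
        node = node.get(part, {}) if isinstance(node, dict) else {}
    if isinstance(node, dict) and key in node:
        return str(node[key])

    # 3. Built-in default.
    return default


toml_data = {"logging": {"level": "INFO"}, "source": {"ncbi": {"directory": "genbank"}}}
print(resolve("logging", "level", toml_data, environ={}))
print(resolve("logging", "level", toml_data, environ={"SKIMINDEX__LOGGING__LEVEL": "DEBUG"}))
print(resolve("stamp", "directory", toml_data, default="stamp", environ={}))
```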

sections

sections() -> list[str]

Return all top-level section names.

env_vars

env_vars() -> dict[str, str]

Return all SKIMINDEX__ environment variables as a plain dict.

Does not touch os.environ — suitable for inspection, testing, or generating a shell export snippet. Derived special variables (SKIMINDEX__REF_TAXA, SKIMINDEX__REF_GENOMES) are included.

Returns:

| Type | Description |
| --- | --- |
| dict[str, str] | Mapping of variable name → serialised string value. |
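One way to picture how a nested configuration becomes a flat name → value mapping is a recursive walk over the parsed tables. The sketch below is an assumption about the general shape, not the actual serialisation code: `flatten` is a hypothetical helper, and rendering lists as space-separated strings is inferred from the SKIMINDEX__REF_TAXA description.

```python
def flatten(data: dict, prefix: str = "SKIMINDEX") -> dict[str, str]:
    """Flatten nested tables into PREFIX__SECTION__KEY -> string pairs."""
    out: dict[str, str] = {}
    for key, value in data.items():
        name = f"{prefix}__{key.upper()}"
        if isinstance(value, dict):
            out.update(flatten(value, name))            # descend into sub-tables
        elif isinstance(value, (list, tuple)):
            out[name] = " ".join(str(v) for v in value)  # assumed list format
        else:
            out[name] = str(value)
    return out


flat = flatten(
    {
        "logging": {"level": "INFO"},
        "data": {"human": {"taxon": "human"}},
    }
)
for name, value in flat.items():
    print(name, "=", value)
```

Because the result is a plain dict, it can be compared in tests without touching os.environ.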

dump_env

dump_env() -> str

Return a shell snippet that exports all SKIMINDEX__ variables.

Variables already present in os.environ are skipped so that pre-existing environment values take priority. The output is safe to pass directly to eval in bash:

eval "$(python3 -m skimindex.config)"

Returns:

| Type | Description |
| --- | --- |
| str | A newline-separated string of export VAR=value statements. |
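A shell snippet of this kind could be generated along the following lines. This is a hedged sketch rather than the module's implementation: `render_exports` is a hypothetical name, and quoting with `shlex.quote` and sorting by name are assumptions made to keep the output safe for `eval` and deterministic.

```python
import os
import shlex
from collections.abc import Mapping


def render_exports(
    variables: dict[str, str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Render export statements, skipping names already set in the environment."""
    lines = [
        f"export {name}={shlex.quote(value)}"  # quote values containing spaces etc.
        for name, value in sorted(variables.items())
        if name not in environ
    ]
    return "\n".join(lines)


variables = {
    "SKIMINDEX__REF_TAXA": "human mouse",
    "SKIMINDEX__LOGGING__LEVEL": "INFO",
}
print(render_exports(variables, environ={}))
```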

load

load(path: Path = DEFAULT_CONFIG) -> Config

Load and return a Config instance.

config

config() -> Config

Get the module-level singleton Config (lazy-initialized).
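Lazy initialization here means the object is built on the first call and reused afterwards. The source does not say how skimindex caches the instance; the sketch below shows one common pattern using `functools.lru_cache`, with a hypothetical `Settings` class standing in for Config.

```python
from functools import lru_cache


class Settings:
    """Hypothetical stand-in for an expensive-to-build configuration object."""

    instances_created = 0

    def __init__(self) -> None:
        Settings.instances_created += 1


@lru_cache(maxsize=1)
def settings() -> Settings:
    """Create the Settings object on first call and return the same one afterwards."""
    return Settings()


first = settings()
second = settings()
print(first is second)
print(Settings.instances_created)
```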

root

root() -> Path

Return the container root path (SKIMINDEX_ROOT env var, default '/').

source_dir

source_dir(name: str) -> Path

Return the mount path for a named source.

processed_data_dir

processed_data_dir() -> Path

Return the processed data root.

indexes_dir

indexes_dir() -> Path

Return the indexes root.

stamp_dir

stamp_dir() -> Path

Return the stamp root.

raw_data_dir

raw_data_dir() -> Path

Return the internal/raw data root (source_dir('internal')).

scratch_dir

scratch_dir() -> Path

Return the scratch root.