skimindex.config
Config module for skimindex: reads the TOML configuration and provides typed access to all sections, path helpers, and environment-variable export.
The configuration file is expected at /config/skimindex.toml by default, overridable via SKIMINDEX_CONFIG environment variable.
Section types (identified by TOML prefix):

- [local_directories], [logging], [processed_data], [indexes], [stamp] → root-level configuration sections
- [source.X] → data origin parameters
- [role.X] → data usage / pipeline parameters
- [processing.X] → pipeline step definitions (atomic or composite)
- [data.X] → dataset declarations (require source + role)
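To make the prefix rule concrete, here is a minimal sketch of how a parsed TOML document splits into root-level sections and the four prefixed groups. It assumes the parser (for example Python's `tomllib`) nests dotted headers, so `[source.ncbi]` arrives as `data["source"]["ncbi"]`. The helper `split_sections` is hypothetical, and so are the sample values other than `genbank` and `human`.

```python
from typing import Any

# Prefixes listed in the section-type table; everything else is root-level.
PREFIXES = ("source", "role", "processing", "data")


def split_sections(
    data: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, dict[str, Any]]]:
    """Hypothetical helper: split root-level sections from [prefix.X] groups."""
    root: dict[str, Any] = {}
    prefixed: dict[str, dict[str, Any]] = {p: {} for p in PREFIXES}
    for name, body in data.items():
        if name in PREFIXES:
            prefixed[name] = dict(body)  # inner keys are the X names
        else:
            root[name] = body
    return root, prefixed


# Illustrative parsed TOML; the role value "reference" is made up.
example = {
    "logging": {"level": "INFO"},
    "processed_data": {"directory": "processed_data"},
    "source": {"ncbi": {"directory": "genbank"}},
    "data": {"human": {"source": "ncbi", "role": "reference", "taxon": "human"}},
}
root, prefixed = split_sections(example)
print(sorted(root))        # ['logging', 'processed_data']
print(prefixed["source"])  # {'ncbi': {'directory': 'genbank'}}
```

Under this reading, the `sources`, `roles`, `processing` and `datasets` properties correspond to the four entries of `prefixed`.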
Environment variable schema
- SKIMINDEX__LOGGING__LEVEL (root section)
- SKIMINDEX__SOURCE__NCBI__DIRECTORY (prefixed section)
- SKIMINDEX__DATA__HUMAN__TAXON (prefixed section)
- SKIMINDEX__ROOT (mount root, default "/")
- SKIMINDEX__REF_TAXA (space-separated list)
- SKIMINDEX__REF_GENOMES (space-separated list)
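The naming rule can be sketched as follows: the section path (split on dots for prefixed sections), then the key, each upper-cased and joined with double underscores after the SKIMINDEX prefix. This is an illustration under assumptions, not the module's code. `env_name` and `flatten` are hypothetical names, only scalar values are exported (lists are skipped), and `str()` is assumed as the serialisation.

```python
from typing import Any


def env_name(section: str, key: str) -> str:
    """Hypothetical: ("source.ncbi", "directory") -> SKIMINDEX__SOURCE__NCBI__DIRECTORY."""
    parts = [*section.split("."), key]
    return "SKIMINDEX__" + "__".join(part.upper() for part in parts)


def flatten(data: dict[str, Any]) -> dict[str, str]:
    """Hypothetical: walk nested tables and name every scalar leaf."""
    out: dict[str, str] = {}

    def walk(path: list[str], table: dict[str, Any]) -> None:
        for key, value in table.items():
            if isinstance(value, dict):
                walk([*path, key], value)  # a nested table adds one level
            elif not isinstance(value, list):  # scalars only (assumption)
                out[env_name(".".join(path), key)] = str(value)

    for section, table in data.items():
        if isinstance(table, dict):
            walk([section], table)
    return out


print(flatten({
    "logging": {"level": "INFO"},
    "source": {"ncbi": {"directory": "genbank"}},
    "data": {"human": {"taxon": "human"}},
}))
# {'SKIMINDEX__LOGGING__LEVEL': 'INFO', 'SKIMINDEX__SOURCE__NCBI__DIRECTORY': 'genbank', 'SKIMINDEX__DATA__HUMAN__TAXON': 'human'}
```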
Config
Config(
path: Path = DEFAULT_CONFIG,
*,
apply_logging: bool = True,
export_env: bool = True,
)
Parse and provide typed access to skimindex TOML configuration.
Reads a TOML file (default /config/skimindex.toml, overridable via
SKIMINDEX_CONFIG environment variable), exports all scalar values as
SKIMINDEX__SECTION__KEY environment variables, and provides typed
accessors for every config section.
Precedence for any value: env var > TOML > built-in default.
Example
```python
from skimindex.config import config
cfg = config()
print(cfg.get("logging", "level"))  # INFO
print(cfg.processed_data_dir())     # /processed_data
```
Load configuration from a TOML file.
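Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to the TOML configuration file. | DEFAULT_CONFIG |
| apply_logging | bool | If True, apply the [logging] configuration on load. | True |
| export_env | bool | If True, export scalar values as SKIMINDEX__SECTION__KEY environment variables. | True |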
sources
property
sources: dict[str, dict[str, Any]]
All [source.X] sections, keyed by X.
roles
property
roles: dict[str, dict[str, Any]]
All [role.X] sections, keyed by X.
processing
property
processing: dict[str, dict[str, Any]]
All [processing.X] sections, keyed by X.
datasets
property
datasets: dict[str, dict[str, Any]]
All [data.X] sections, keyed by X.
ref_taxa
property
ref_taxa: list[str]
Names of all datasets with source 'ncbi' or 'genbank'.
ref_genomes
property
ref_genomes: list[str]
Names of all datasets with source 'ncbi' (downloadable via NCBI datasets CLI).
sra_datasets
property
sra_datasets: list[str]
Names of all datasets with source 'sra'.
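All three list properties filter the [data.X] declarations by their `source` value. A minimal sketch, assuming each declaration is a table with a `source` key and that declaration order is preserved. The dataset names `plants` and `skims` and the helper `names_with_source` are hypothetical. The last line assumes SKIMINDEX__REF_TAXA is the space-separated form of `ref_taxa`.

```python
from typing import Any

# Hypothetical [data.X] declarations; only "source" is consulted here.
datasets: dict[str, dict[str, Any]] = {
    "human": {"source": "ncbi"},
    "plants": {"source": "genbank"},
    "skims": {"source": "sra"},
}


def names_with_source(decls: dict[str, dict[str, Any]], *sources: str) -> list[str]:
    """Return dataset names whose 'source' is one of `sources`, in declaration order."""
    return [name for name, decl in decls.items() if decl.get("source") in sources]


ref_taxa = names_with_source(datasets, "ncbi", "genbank")  # ['human', 'plants']
ref_genomes = names_with_source(datasets, "ncbi")          # ['human']
sra_datasets = names_with_source(datasets, "sra")          # ['skims']
print(" ".join(ref_taxa))  # human plants
```

Because `ref_genomes` only accepts 'ncbi', every name in it is also in `ref_taxa`.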
root
property
root: Path
Container/runtime root path from SKIMINDEX_ROOT env var (default '/').
data
property
data: dict[str, Any]
Read-only copy of the raw parsed TOML data.
path
property
path: Path
Absolute path to the TOML configuration file.
source_dir
source_dir(name: str) -> Path
Return the mount path for a named source (root / sources[name][directory]).
processed_data_dir
processed_data_dir() -> Path
Return the processed data root (root / [processed_data].directory).
indexes_dir
indexes_dir() -> Path
Return the indexes root (root / [indexes].directory).
stamp_dir
stamp_dir() -> Path
Return the stamp root (root / [stamp].directory).
scratch_dir
scratch_dir() -> Path
Return the scratch root (root / [scratch].directory).
log_file
log_file() -> Path
Return the log file path (root / [logging].directory / [logging].file).
raw_data_dir
raw_data_dir() -> Path
Return the internal/raw data root (source_dir('internal')).
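Every path helper above follows one pattern: the runtime root joined with one or more directory values read from the config. A minimal `pathlib` sketch, assuming the configured directories are relative and a POSIX runtime. The helper `under_root` is hypothetical, and so are the values "logs", "skimindex.log" and "raw_data". Note that `pathlib` discards the left operand when the right one is absolute, so an absolute directory in the TOML would escape the root.

```python
from pathlib import Path
from typing import Any

# Hypothetical config values; "logs", "skimindex.log" and "raw_data" are made up.
cfg: dict[str, Any] = {
    "processed_data": {"directory": "processed_data"},
    "logging": {"directory": "logs", "file": "skimindex.log"},
    "source": {"internal": {"directory": "raw_data"}},
}


def under_root(root: Path, *parts: str) -> Path:
    """Join config-relative parts onto the runtime root."""
    return root.joinpath(*parts)


root = Path("/")
processed = under_root(root, cfg["processed_data"]["directory"])
log_file = under_root(root, cfg["logging"]["directory"], cfg["logging"]["file"])
raw_data = under_root(root, cfg["source"]["internal"]["directory"])  # like source_dir('internal')
print(processed, log_file, raw_data)  # /processed_data /logs/skimindex.log /raw_data

# An absolute directory replaces the root entirely:
print(Path("/mnt") / "/data")  # /data
```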
get
get(section: str, key: str, default: str = '') -> str
Get a config value with env-var / TOML / default precedence.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| section | str | Section path, dotted for prefixed sections (e.g. "source.ncbi"). | required |
| key | str | Key name within the section. | required |
| default | str | Fallback value when neither the env var nor the TOML entry exists. | '' |
Returns:

| Type | Description |
|---|---|
| str | The resolved value as a string. |
Example
```python
cfg.get("logging", "level")          # "INFO"
cfg.get("source.ncbi", "directory")  # "genbank"
cfg.get("data.human", "taxon")       # "human"
```
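The env var > TOML > default order can be sketched as a standalone function. It is not the module's implementation. `resolve` is a hypothetical name, the variable name follows the SKIMINDEX__SECTION__KEY schema described at the top of this page, and TOML values are stringified with `str()` as an assumption.

```python
import os
from typing import Any


def resolve(data: dict[str, Any], section: str, key: str, default: str = "") -> str:
    """Hypothetical sketch: env var, then TOML table, then default."""
    parts = section.split(".")
    env = "SKIMINDEX__" + "__".join(p.upper() for p in [*parts, key])
    if env in os.environ:  # 1. the environment wins
        return os.environ[env]
    table: Any = data
    for part in parts:  # 2. walk dotted sections such as "source.ncbi"
        if not isinstance(table, dict) or part not in table:
            return default
        table = table[part]
    if isinstance(table, dict) and key in table:
        return str(table[key])
    return default  # 3. built-in default


data = {"logging": {"level": "INFO"}, "source": {"ncbi": {"directory": "genbank"}}}
os.environ.pop("SKIMINDEX__LOGGING__LEVEL", None)
print(resolve(data, "logging", "level"))  # INFO
os.environ["SKIMINDEX__LOGGING__LEVEL"] = "DEBUG"
print(resolve(data, "logging", "level"))  # DEBUG
print(resolve(data, "source.ncbi", "missing", "n/a"))  # n/a
```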
sections
sections() -> list[str]
Return all top-level section names.
env_vars
env_vars() -> dict[str, str]
Return all SKIMINDEX__ environment variables as a plain dict.
Does not touch os.environ — suitable for inspection, testing,
or generating a shell export snippet. Derived special variables
(SKIMINDEX__REF_TAXA, SKIMINDEX__REF_GENOMES) are included.
Returns:

| Type | Description |
|---|---|
| dict[str, str] | Mapping of variable name → serialised string value. |
dump_env
dump_env() -> str
Return a shell snippet that exports all SKIMINDEX__ variables.
Variables already present in os.environ are skipped so that
pre-existing environment values take priority. The output is safe to
pass directly to eval in bash:
```bash
eval "$(python3 -m skimindex.config)"
```
Returns:

| Type | Description |
|---|---|
| str | A newline-separated string of export statements. |
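A minimal sketch of how such a snippet can be produced, assuming the variables come from a dict like the one env_vars() returns. `render_exports` is a hypothetical name. Sorting and `shlex.quote` are choices made here, not documented behaviour. Quoting keeps values that contain spaces, such as the space-separated SKIMINDEX__REF_TAXA, intact under `eval`.

```python
import os
import shlex


def render_exports(variables: dict[str, str]) -> str:
    """Hypothetical: one `export NAME=value` line per variable not already set."""
    lines = [
        f"export {name}={shlex.quote(value)}"
        for name, value in sorted(variables.items())
        if name not in os.environ  # pre-existing values take priority
    ]
    return "\n".join(lines)


snippet = render_exports({
    "SKIMINDEX__REF_TAXA": "human plants",
    "SKIMINDEX__LOGGING__LEVEL": "INFO",
})
print(snippet)
# With neither variable already set, this prints:
# export SKIMINDEX__LOGGING__LEVEL=INFO
# export SKIMINDEX__REF_TAXA='human plants'
```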
load
load(path: Path = DEFAULT_CONFIG) -> Config
Load and return a Config instance.
config
config() -> Config
Get the module-level singleton Config (lazy-initialized).
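Lazy initialization means the instance is built on the first call and reused afterwards. One common way to get that in Python is `functools.lru_cache(maxsize=1)` on a zero-argument function. The sketch below uses a stand-in `Config` class that only stores a path and ignores SKIMINDEX_CONFIG. It is not necessarily the mechanism this module uses, which this page does not describe.

```python
from functools import lru_cache
from pathlib import Path


class Config:
    """Stand-in for skimindex.config.Config; it only records the path."""

    def __init__(self, path: Path = Path("/config/skimindex.toml")) -> None:
        self.path = path


@lru_cache(maxsize=1)
def config() -> Config:
    # The first call constructs the Config; later calls return the cached instance.
    return Config()


print(config() is config())  # True
```

A cached function keeps the singleton out of module globals. Calling `config.cache_clear()` forces a rebuild, which can be handy in tests.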
root
root() -> Path
Return the container root path (SKIMINDEX_ROOT env var, default '/').
source_dir
source_dir(name: str) -> Path
Return the mount path for a named source.
processed_data_dir
processed_data_dir() -> Path
Return the processed data root.
indexes_dir
indexes_dir() -> Path
Return the indexes root.
stamp_dir
stamp_dir() -> Path
Return the stamp root.
raw_data_dir
raw_data_dir() -> Path
Return the internal/raw data root (source_dir('internal')).
scratch_dir
scratch_dir() -> Path
Return the scratch root.