Environment Variables

skimindex exports its configuration values as shell environment variables, so bash scripts, Makefile rules, and container entrypoints can read them without re-parsing the TOML file.

Variables are populated automatically when __skimindex_config.sh is sourced. Inside the container, the Python package reads them through os.environ and applies the same precedence rules.


Naming Convention

Every variable derived from a TOML key follows the pattern:

SKIMINDEX__{SECTION}__{KEY}

where SECTION and KEY are upper-cased and dots in section names are replaced by double underscores:

TOML section                       Key        Variable
[logging]                          level      SKIMINDEX__LOGGING__LEVEL
[source.ncbi]                      directory  SKIMINDEX__SOURCE__NCBI__DIRECTORY
[role.decontamination]             run        SKIMINDEX__ROLE__DECONTAMINATION__RUN
[processing.count_kmers_decontam]  kmer_size  SKIMINDEX__PROCESSING__COUNT_KMERS_DECONTAM__KMER_SIZE
[data.human]                       taxon      SKIMINDEX__DATA__HUMAN__TAXON
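The naming rule fits in a few lines of Python. The sketch below uses env_var_name, a hypothetical helper written for illustration; it is not a function exported by skimindex. It assumes section and key names contain only ASCII letters, digits, underscores, and dots.

```python
def env_var_name(section: str, key: str) -> str:
    """Return the environment variable name for a TOML (section, key) pair.

    Hypothetical helper: upper-cases both parts and replaces the dots in
    the section name with double underscores.
    """
    section_part = section.replace(".", "__").upper()
    return f"SKIMINDEX__{section_part}__{key.upper()}"


# Two rows from the table above
print(env_var_name("logging", "level"))
# SKIMINDEX__LOGGING__LEVEL
print(env_var_name("processing.count_kmers_decontam", "kmer_size"))
# SKIMINDEX__PROCESSING__COUNT_KMERS_DECONTAM__KMER_SIZE
```

Single underscores are kept as-is. As a result, a section genome_skims and a hypothetical nested section genome.skims map to different names (GENOME_SKIMS versus GENOME__SKIMS).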

Precedence: environment > config file > built-in defaults. A variable already set in the environment is never overwritten.
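The precedence rule amounts to two steps: merge the layers from lowest to highest priority, then never replace a variable that is already set. The sketch below makes a few assumptions:

- The config-file and default values have already been converted to variable names and strings.
- export_env is a hypothetical name.
- The values in the usage lines are illustrative, not skimindex's real defaults.

It writes into a plain dict rather than os.environ, so it can be tried without touching the process environment.

```python
import os
from typing import Mapping, MutableMapping


def export_env(file_values: Mapping[str, str],
               defaults: Mapping[str, str],
               environ: MutableMapping[str, str] = os.environ) -> None:
    """Apply the precedence environment > config file > built-in defaults.

    Hypothetical sketch: both mappings are keyed by variable name and
    already hold serialised string values.
    """
    merged = {**defaults, **file_values}  # config file overrides defaults
    for name, value in merged.items():
        environ.setdefault(name, value)   # an existing variable is never overwritten


env = {"SKIMINDEX__LOGGING__LEVEL": "DEBUG"}  # already set by the caller
export_env(
    file_values={"SKIMINDEX__LOGGING__LEVEL": "INFO"},
    defaults={"SKIMINDEX__LOGGING__LEVEL": "WARNING",
              "SKIMINDEX__LOGGING__MIRROR": "false"},
    environ=env,
)
print(env)
# {'SKIMINDEX__LOGGING__LEVEL': 'DEBUG', 'SKIMINDEX__LOGGING__MIRROR': 'false'}
```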

Value serialisation:

  • TOML arrays → space-separated string (["bct", "pln"] → "bct pln")
  • TOML booleans → lowercase string (true → "true", false → "false")
  • Numeric and string values → str(value)
  • Complex values (inline-table arrays such as steps) are not exported.
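The serialisation rules can be written as one small function. This is a sketch, not skimindex's own code:

- serialise is a hypothetical name.
- It returns None for values that are not exported.
- Treating arrays of arrays as complex (like arrays of inline tables) is an assumption beyond what is documented here.

```python
from typing import Any, Optional


def serialise(value: Any) -> Optional[str]:
    """Convert a TOML value to its exported string, or None if not exported."""
    # Check bool first: bool is a subclass of int, and str(True) is "True".
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, list):
        # Arrays of inline tables (such as steps) and nested arrays are skipped.
        if any(isinstance(item, (dict, list)) for item in value):
            return None
        return " ".join(serialise(item) for item in value)
    if isinstance(value, dict):
        return None  # a table is a section of its own, not a single value
    return str(value)


print(serialise(["bct", "pln"]))        # bct pln
print(serialise(False))                 # false
print(serialise(31))                    # 31
print(serialise([{"name": "split"}]))   # None (hypothetical steps entry)
```

One consequence of the space-separated form: an array element that itself contains a space cannot be recovered by splitting the string again.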

Special Variables

These variables are not direct reflections of a single TOML key.

Variable                Description
SKIMINDEX_ROOT          Container/runtime root path. Read from the environment; defaults to /. All path helpers prepend this value.
SKIMINDEX__REF_TAXA     Space-separated names of all datasets whose source is ncbi or genbank.
SKIMINDEX__REF_GENOMES  Space-separated names of all datasets whose source is ncbi (downloadable via NCBI Datasets CLI).
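Both derived lists are filters over the [data.X] sections. The sketch below assumes the datasets are available as a dict keyed by dataset name, in configuration order. reference_lists is a hypothetical helper, and the plants and my_skims datasets are made-up examples.

```python
from typing import Dict, Mapping


def reference_lists(datasets: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Build SKIMINDEX__REF_TAXA and SKIMINDEX__REF_GENOMES from dataset sources."""
    taxa = [name for name, ds in datasets.items()
            if ds.get("source") in ("ncbi", "genbank")]
    genomes = [name for name, ds in datasets.items()
               if ds.get("source") == "ncbi"]
    return {
        "SKIMINDEX__REF_TAXA": " ".join(taxa),
        "SKIMINDEX__REF_GENOMES": " ".join(genomes),
    }


datasets = {
    "human": {"source": "ncbi"},
    "plants": {"source": "genbank"},     # hypothetical dataset
    "my_skims": {"source": "internal"},  # hypothetical dataset
}
print(reference_lists(datasets))
# {'SKIMINDEX__REF_TAXA': 'human plants', 'SKIMINDEX__REF_GENOMES': 'human'}
```

Every ncbi dataset also qualifies for REF_TAXA, so REF_GENOMES is always a subset of REF_TAXA.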

Variable Reference by Section

[local_directories]

Each key <k> produces one variable whose value is the container-side mount path /<k> (the host-side path from the TOML is discarded inside the container).

Variable                                      Default value
SKIMINDEX__LOCAL_DIRECTORIES__GENBANK         /genbank
SKIMINDEX__LOCAL_DIRECTORIES__INDEXES         /indexes
SKIMINDEX__LOCAL_DIRECTORIES__RAW_DATA        /raw_data
SKIMINDEX__LOCAL_DIRECTORIES__PROCESSED_DATA  /processed_data
SKIMINDEX__LOCAL_DIRECTORIES__CONFIG          /config
SKIMINDEX__LOCAL_DIRECTORIES__LOG             /log
SKIMINDEX__LOCAL_DIRECTORIES__STAMP           /stamp
SKIMINDEX__LOCAL_DIRECTORIES__USERCMD         /usercmd
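Only the key survives, so the mapping from [local_directories] to variables can be sketched in a few lines. local_directory_vars is a hypothetical helper, and the host paths in the usage example are made up. The point is that they do not appear in the output.

```python
from typing import Dict, Mapping


def local_directory_vars(local_directories: Mapping[str, str]) -> Dict[str, str]:
    """Map each [local_directories] key <k> to the container mount path /<k>.

    The TOML values (host-side paths) are deliberately ignored.
    """
    return {
        f"SKIMINDEX__LOCAL_DIRECTORIES__{key.upper()}": f"/{key}"
        for key in local_directories
    }


host_side = {"genbank": "/mnt/big/genbank", "raw_data": "~/skims/raw"}  # hypothetical
print(local_directory_vars(host_side))
# {'SKIMINDEX__LOCAL_DIRECTORIES__GENBANK': '/genbank', 'SKIMINDEX__LOCAL_DIRECTORIES__RAW_DATA': '/raw_data'}
```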

[logging]

Variable                        Example value
SKIMINDEX__LOGGING__DIRECTORY   log
SKIMINDEX__LOGGING__FILE        skimindex.log
SKIMINDEX__LOGGING__LEVEL       INFO
SKIMINDEX__LOGGING__MIRROR      true
SKIMINDEX__LOGGING__EVERYTHING  true
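A consumer reading these variables has to undo the string serialisation itself. The sketch below decodes the level into a numeric logging level and the two flags into booleans. A few caveats:

- logging_settings is a hypothetical helper.
- The fallback values ("INFO" and "false") are assumptions.
- It only decodes mirror and everything, without interpreting what they do.

```python
import logging
import os
from typing import Any, Dict, Mapping


def logging_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Decode the [logging] variables into Python types (sketch)."""
    def flag(name: str) -> bool:
        # Booleans are exported as the lowercase strings "true" / "false".
        return environ.get(name, "false") == "true"

    level_name = environ.get("SKIMINDEX__LOGGING__LEVEL", "INFO").upper()
    level = logging.getLevelName(level_name)  # an int for known level names
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {level_name!r}")
    return {
        "level": level,
        "mirror": flag("SKIMINDEX__LOGGING__MIRROR"),
        "everything": flag("SKIMINDEX__LOGGING__EVERYTHING"),
    }


env = {
    "SKIMINDEX__LOGGING__LEVEL": "INFO",
    "SKIMINDEX__LOGGING__MIRROR": "true",
    "SKIMINDEX__LOGGING__EVERYTHING": "true",
}
print(logging_settings(env))
# {'level': 20, 'mirror': True, 'everything': True}
```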

[processed_data], [indexes], [stamp]

Variable                              Example value
SKIMINDEX__PROCESSED_DATA__DIRECTORY  processed_data
SKIMINDEX__INDEXES__DIRECTORY         indexes
SKIMINDEX__STAMP__DIRECTORY           stamp

[source.X]

One set of variables per source section.

Variable                                Example value
SKIMINDEX__SOURCE__NCBI__DIRECTORY      genbank
SKIMINDEX__SOURCE__GENBANK__DIRECTORY   genbank
SKIMINDEX__SOURCE__GENBANK__DIVISIONS   bct pln
SKIMINDEX__SOURCE__INTERNAL__DIRECTORY  raw_data
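Array values such as DIVISIONS come back as a single space-separated string. Below is a sketch of reading them in Python; env_list is a hypothetical helper.

```python
import os
from typing import List, Mapping


def env_list(name: str, environ: Mapping[str, str] = os.environ) -> List[str]:
    """Split an exported TOML array back into a list (empty if unset)."""
    # str.split() with no argument splits on runs of whitespace and drops empties.
    return environ.get(name, "").split()


env = {"SKIMINDEX__SOURCE__GENBANK__DIVISIONS": "bct pln"}
print(env_list("SKIMINDEX__SOURCE__GENBANK__DIVISIONS", env))  # ['bct', 'pln']
```

In a shell script, an unquoted expansion such as for div in $SKIMINDEX__SOURCE__GENBANK__DIVISIONS gives the same split under the default IFS.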

[role.X]

One set of variables per role section.

Variable                                     Example value
SKIMINDEX__ROLE__DECONTAMINATION__DIRECTORY  decontamination
SKIMINDEX__ROLE__DECONTAMINATION__RUN        prepare_decontam
SKIMINDEX__ROLE__GENOMES__DIRECTORY          genomes_15x
SKIMINDEX__ROLE__GENOMES__KMER_SIZE          31
SKIMINDEX__ROLE__GENOME_SKIMS__DIRECTORY     skims
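Each [role.X] section yields its own group of variables, so a script can discover the configured roles by scanning the environment. The sketch makes two assumptions:

- Names below the top-level kind are single-level (role.genome_skims, data.human), so each variable splits into section and key at the double underscore.
- The TOML names were lower-case.

sections is a hypothetical helper.

```python
import os
from typing import Dict, Mapping


def sections(kind: str,
             environ: Mapping[str, str] = os.environ) -> Dict[str, Dict[str, str]]:
    """Group SKIMINDEX__{KIND}__{NAME}__{KEY} variables as {name: {key: value}}.

    Names and keys are returned lower-cased. Variables whose remainder does
    not split into exactly one name and one key are skipped.
    """
    prefix = f"SKIMINDEX__{kind.upper()}__"
    grouped: Dict[str, Dict[str, str]] = {}
    for var, value in environ.items():
        if not var.startswith(prefix):
            continue
        parts = var[len(prefix):].split("__")
        if len(parts) != 2:
            continue  # deeper nesting is not handled in this sketch
        name, key = parts
        grouped.setdefault(name.lower(), {})[key.lower()] = value
    return grouped


env = {
    "SKIMINDEX__ROLE__DECONTAMINATION__RUN": "prepare_decontam",
    "SKIMINDEX__ROLE__GENOMES__KMER_SIZE": "31",
    "SKIMINDEX__ROLE__GENOME_SKIMS__DIRECTORY": "skims",
    "SKIMINDEX__LOGGING__LEVEL": "INFO",
}
roles = sections("role", env)
print(sorted(roles))                  # ['decontamination', 'genome_skims', 'genomes']
print(roles["genomes"]["kmer_size"])  # 31
```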

[processing.X]

One set of variables per processing section. Scalar keys only — the steps array (list of inline tables) is not exported.

Variable                                                Example value
SKIMINDEX__PROCESSING__PREPARE_DECONTAM__ROLE           decontamination
SKIMINDEX__PROCESSING__PREPARE_DECONTAM__DIRECTORY      parts
SKIMINDEX__PROCESSING__COUNT_KMERS_DECONTAM__TYPE       kmercount
SKIMINDEX__PROCESSING__COUNT_KMERS_DECONTAM__ROLE       decontamination
SKIMINDEX__PROCESSING__COUNT_KMERS_DECONTAM__INPUT      prepare_decontam
SKIMINDEX__PROCESSING__COUNT_KMERS_DECONTAM__DIRECTORY  kmercount
SKIMINDEX__PROCESSING__COUNT_KMERS_DECONTAM__KMER_SIZE  29
SKIMINDEX__PROCESSING__COUNT_KMERS_DECONTAM__THREADS    10
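Numeric keys such as KMER_SIZE and THREADS arrive as strings and must be converted back explicitly. The example values also suggest that INPUT names another [processing.X] section (count_kmers_decontam reads from prepare_decontam). The sketch below follows that link to the upstream DIRECTORY. Treat that relationship as an assumption inferred from the example, not a documented guarantee. int_setting and input_directory are hypothetical helpers, and the directory is returned raw, without resolving it against any root.

```python
import os
from typing import Mapping, Optional


def _processing_var(section: str, key: str) -> str:
    return f"SKIMINDEX__PROCESSING__{section.upper()}__{key.upper()}"


def int_setting(section: str, key: str,
                environ: Mapping[str, str] = os.environ) -> Optional[int]:
    """Read a numeric key; every exported value is a string, so convert it."""
    raw = environ.get(_processing_var(section, key))
    return int(raw) if raw is not None else None


def input_directory(section: str,
                    environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the DIRECTORY of the section named by this section's INPUT.

    Assumption: INPUT holds the name of another [processing.X] section.
    """
    upstream = environ.get(_processing_var(section, "input"))
    if upstream is None:
        return None
    return environ.get(_processing_var(upstream, "directory"))


env = {
    "SKIMINDEX__PROCESSING__PREPARE_DECONTAM__DIRECTORY": "parts",
    "SKIMINDEX__PROCESSING__COUNT_KMERS_DECONTAM__INPUT": "prepare_decontam",
    "SKIMINDEX__PROCESSING__COUNT_KMERS_DECONTAM__KMER_SIZE": "29",
    "SKIMINDEX__PROCESSING__COUNT_KMERS_DECONTAM__THREADS": "10",
}
print(int_setting("count_kmers_decontam", "kmer_size", env))  # 29
print(input_directory("count_kmers_decontam", env))          # parts
```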

[data.X]

One set of variables per dataset. The exact keys depend on the dataset's source and role; see Configuration Format for the full key reference.

# Example — data.human
SKIMINDEX__DATA__HUMAN__DIRECTORY=Human
SKIMINDEX__DATA__HUMAN__SOURCE=ncbi
SKIMINDEX__DATA__HUMAN__ROLE=decontamination
SKIMINDEX__DATA__HUMAN__TAXON=human
SKIMINDEX__DATA__HUMAN__REFERENCE=true
SKIMINDEX__DATA__HUMAN__ASSEMBLY_LEVEL=chromosome
SKIMINDEX__DATA__HUMAN__ASSEMBLY_VERSION=latest
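The lines above are what the naming and serialisation rules produce for a [data.human] table. The sketch below walks a parsed configuration and yields the same assignments. Some notes:

- A literal dict stands in for the TOML file; its key names are taken from the example.
- flatten is a hypothetical helper.
- It repeats a compact serialiser so that it runs on its own.
- It ignores keys that sit outside any section.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


def _serialise(value: Any) -> Optional[str]:
    # Booleans first (bool is an int subclass), then arrays, then str().
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, list):
        if any(isinstance(item, (dict, list)) for item in value):
            return None  # complex arrays are not exported
        return " ".join(_serialise(item) for item in value)
    return str(value)


def flatten(table: Dict[str, Any], section: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (variable, value) pairs for every exportable key in a TOML table."""
    for key, value in table.items():
        if isinstance(value, dict):
            # A sub-table extends the dotted section name: data -> data.human
            yield from flatten(value, f"{section}.{key}" if section else key)
        elif section:
            text = _serialise(value)
            if text is not None:
                prefix = section.replace(".", "__").upper()
                yield f"SKIMINDEX__{prefix}__{key.upper()}", text


config = {"data": {"human": {
    "directory": "Human", "source": "ncbi", "role": "decontamination",
    "taxon": "human", "reference": True,
    "assembly_level": "chromosome", "assembly_version": "latest",
}}}
for name, value in flatten(config):
    print(f"{name}={value}")
```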

How Variables Are Loaded

Inside the container

__skimindex_config.sh delegates all TOML parsing to the Python module:

eval "$(python3 -m skimindex.config)"

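python3 -m skimindex.config also validates the configuration before printing any variables. If validation fails, it prints the errors to stderr and exits with status 1 without printing any assignments, so the eval defines nothing and the errors remain visible on the terminal. Note that eval of an empty string returns 0, so this line on its own does not pass the failure status on to the calling script.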
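The contract described here is: validate first, and on error print nothing to stdout and exit with status 1. It can be sketched as follows, with these assumptions:

- This section does not document the exact output format of python3 -m skimindex.config. The export NAME='value' form below is an assumption.
- render_exports, main, and the validate callback are hypothetical names.

shlex.quote keeps a space-separated value such as bct pln together as one word when the output is passed to eval.

```python
import shlex
import sys
from typing import Callable, List, Mapping


def render_exports(variables: Mapping[str, str]) -> str:
    """Format variables as shell assignments that are safe to eval."""
    # shlex.quote adds single quotes only when the value needs them.
    return "\n".join(f"export {name}={shlex.quote(value)}"
                     for name, value in variables.items())


def main(variables: Mapping[str, str],
         validate: Callable[[Mapping[str, str]], List[str]]) -> int:
    """Validate first; print assignments only if there are no errors."""
    errors = validate(variables)
    if errors:
        for message in errors:
            print(f"error: {message}", file=sys.stderr)
        return 1  # stdout stays empty, so eval receives an empty string
    print(render_exports(variables))
    return 0


demo = {"SKIMINDEX__SOURCE__GENBANK__DIVISIONS": "bct pln",
        "SKIMINDEX__LOGGING__LEVEL": "INFO"}
print(render_exports(demo))
# export SKIMINDEX__SOURCE__GENBANK__DIVISIONS='bct pln'
# export SKIMINDEX__LOGGING__LEVEL=INFO
```

A real entry point would end with sys.exit(main(...)) so that the status reaches the shell.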

In development (outside the container)

The same script detects the project .venv and injects its site-packages into PYTHONPATH automatically, so no manual activation is needed:

source scripts/__skimindex_config.sh

In Python code

from skimindex.config import config

cfg = config()
# Values are in os.environ after Config.__init__ calls _export_env().
# Use cfg.get() for typed access with the same precedence rules:
level = cfg.get("logging", "level")          # → "INFO"
taxa  = cfg.get("data.human", "taxon")       # → "human"