skimindex.sources
skimindex.sources — source registry and directory helpers.
A source ([source.X] in TOML) is an external data provider with a configured directory under SKIMINDEX_ROOT.
Usage
from skimindex.sources import (
    source_dir, dataset_download_dir,
    output_dir, dataset_output_dir,
)
genbank_root = source_dir("genbank")
human_dl_dir = dataset_download_dir("human")
decontam_root = output_dir("role", "decontamination")
human_out_dir = dataset_output_dir("human")
source_dir
source_dir(source: str) -> Path
Root directory for a named source.
Reads [source.
Example
source_dir("genbank") → Path("/data/genbank")
source_dir("ncbi") → Path("/data/genbank")  # shares dir with genbank
dataset_download_dir
dataset_download_dir(dataset_name: str) -> Path
Download output directory for a named dataset.
Resolves: source_dir(dataset.source) / dataset.directory where dataset.directory defaults to dataset_name if not set.
Example
dataset_download_dir("human") → Path("/data/genbank/human")
dataset_download_dir("betula_nana") → Path("/data/raw_data/Betula_nana")
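
The resolution rule above can be illustrated with a minimal sketch. It is not the module's implementation: the `SOURCES` and `DATASETS` dictionaries, the source name `internal`, and the function name `sketch_download_dir` are hypothetical stand-ins for whatever the parsed TOML configuration actually provides.

```python
from pathlib import Path

# Hypothetical stand-ins for the parsed configuration; the real registry,
# its key names, and the source that "betula_nana" belongs to may differ.
SOURCES = {
    "genbank": Path("/data/genbank"),
    "internal": Path("/data/raw_data"),
}
DATASETS = {
    "human": {"source": "genbank"},  # no "directory" key: falls back to the name
    "betula_nana": {"source": "internal", "directory": "Betula_nana"},
}


def sketch_download_dir(dataset_name: str) -> Path:
    """Join the source root with the dataset directory (default: dataset name)."""
    dataset = DATASETS[dataset_name]
    directory = dataset.get("directory", dataset_name)
    return SOURCES[dataset["source"]] / directory


print(sketch_download_dir("human"))
print(sketch_download_dir("betula_nana"))
```

With this toy configuration the two calls reproduce the paths shown in the example: the first relies on the default directory name, the second on an explicit `directory` value.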
output_dir
output_dir(section_kind: str, section_name: str) -> Path
Processing output directory for a named config section.
Reads the section's 'directory' key and resolves it under the appropriate root for the section kind:
- "role" → processed_data_dir() / section.directory
- "index" → indexes_dir() / section.directory
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| section_kind | str | "role" or "index" | required |
| section_name | str | Name of the sub-section, e.g. "decontamination" | required |
Examples:
output_dir("role", "decontamination") → /processed_data/decontamination
output_dir("role", "genomes") → /processed_data/genomes_15x
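
As a rough illustration of the dispatch on `section_kind`, the sketch below maps each kind to a root and appends the section's directory. The root paths (in particular `/indexes`), the `SECTIONS` table, the function name `sketch_output_dir`, and the `ValueError` for an unknown kind are all assumptions made for the example; the documentation above does not specify them.

```python
from pathlib import Path

# Hypothetical roots standing in for processed_data_dir() and indexes_dir().
PROCESSED_DATA = Path("/processed_data")
INDEXES = Path("/indexes")  # assumed location, not stated in the docs

# Hypothetical parsed sections, keyed by (kind, name).
SECTIONS = {
    ("role", "decontamination"): {"directory": "decontamination"},
    ("role", "genomes"): {"directory": "genomes_15x"},
}


def sketch_output_dir(section_kind: str, section_name: str) -> Path:
    """Pick the root for the section kind, then append the section directory."""
    roots = {"role": PROCESSED_DATA, "index": INDEXES}
    if section_kind not in roots:
        # Assumed behaviour; the real function may handle this differently.
        raise ValueError(f"unknown section kind: {section_kind!r}")
    section = SECTIONS[(section_kind, section_name)]
    return roots[section_kind] / section["directory"]


print(sketch_output_dir("role", "decontamination"))
print(sketch_output_dir("role", "genomes"))
```

Note that the section name and its directory need not match: in the example, the `genomes` role resolves to `genomes_15x`.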
dataset_output_dir
dataset_output_dir(dataset_name: str) -> Path
Processing output directory for a dataset, resolved under its role.
Resolves: output_dir("role", dataset.role) / dataset.directory
Example
dataset_output_dir("human") → /processed_data/decontamination/human
dataset_output_dir("plants") → /processed_data/decontamination/Plants
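
The composition described above (role output directory first, dataset directory second) might look like the following sketch. The `ROLE_DIRS` and `DATASETS` tables and the name `sketch_dataset_output_dir` are hypothetical; both datasets carry an explicit `directory` here because the documentation does not say whether the dataset-name default applies to output directories as well.

```python
from pathlib import Path

# Hypothetical role output directories, i.e. what output_dir("role", name)
# would return for each role.
ROLE_DIRS = {"decontamination": Path("/processed_data/decontamination")}

# Hypothetical dataset entries with explicit role and directory.
DATASETS = {
    "human": {"role": "decontamination", "directory": "human"},
    "plants": {"role": "decontamination", "directory": "Plants"},
}


def sketch_dataset_output_dir(dataset_name: str) -> Path:
    """Resolve the dataset directory under the output directory of its role."""
    dataset = DATASETS[dataset_name]
    return ROLE_DIRS[dataset["role"]] / dataset["directory"]


print(sketch_dataset_output_dir("human"))
print(sketch_dataset_output_dir("plants"))
```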
resolve_artifact
resolve_artifact(
value: str | dict, dataset_subdir: Path | None = None
) -> Path
Resolve an artifact reference to an absolute path.
Accepted forms
"parts@decontamination" → processed_data/{role_dir}/{dataset_subdir}/parts
"parts@idx:decontamination" → indexes/{role_dir}/{dataset_subdir}/parts
"@idx:decontamination" → indexes/{role_dir}/ (meta-index, no subdir)
{"role": "decontamination", "dir": "parts"} → same as string form
{"role": "idx:decontamination", "dir": ""} → meta-index
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str \| dict | Artifact reference — string ( | required |
| dataset_subdir | Path \| None | Relative subdir within the role tree (e.g. | None |
Returns:
| Type | Description |
|---|---|
| Path | Absolute |
Raises:
| Type | Description |
|---|---|
| ValueError | If value is a string without an |
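
To make the accepted forms of resolve_artifact concrete, here is a minimal parsing sketch. It is an illustration under stated assumptions, not the library's code: the `ROOTS` and `ROLE_DIRS` tables and the names `parse_reference` and `sketch_resolve` are hypothetical, and the sketch assumes that the `ValueError` case concerns a string lacking the `@` separator, which the truncated description above does not confirm.

```python
from pathlib import Path

# Hypothetical roots and role-to-directory mapping.
ROOTS = {"processed": Path("/processed_data"), "index": Path("/indexes")}
ROLE_DIRS = {"decontamination": "decontamination"}


def parse_reference(value):
    """Split an artifact reference into (root_kind, role, directory)."""
    if isinstance(value, dict):
        role, directory = value["role"], value.get("dir", "")
    else:
        if "@" not in value:
            # Assumed error condition; see the lead-in.
            raise ValueError(f"no '@' in artifact reference: {value!r}")
        directory, role = value.split("@", 1)
    if role.startswith("idx:"):
        return "index", role[len("idx:"):], directory
    return "processed", role, directory


def sketch_resolve(value, dataset_subdir=None):
    """Build the path: root / role_dir [/ dataset_subdir / directory]."""
    kind, role, directory = parse_reference(value)
    base = ROOTS[kind] / ROLE_DIRS[role]
    if not directory:
        return base  # meta-index form: role directory only, no subdir
    if dataset_subdir is not None:
        base = base / dataset_subdir
    return base / directory


print(sketch_resolve("parts@decontamination", Path("human")))
print(sketch_resolve("@idx:decontamination"))
```

In this sketch the string and dict forms go through the same code path after parsing, so `{"role": "decontamination", "dir": "parts"}` and `"parts@decontamination"` resolve identically.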