skimindex.unix.kmindex

kmindex wrapper module using plumbum.

Provides a Pythonic interface to kmindex v0.6.0 — a tool for indexing and querying kmtricks Bloom filter / counting Bloom filter matrices.

Two API styles:

Flexible: pass any subcommand and its flags directly.

kmindex("build", "-i", "/indexes/main", "-f", "fof.txt", ...)

Shortcuts: one function per subcommand, with typed keyword arguments.

kmindex_build(index="/indexes/main", fof="fof.txt", run_dir="@inplace", register_as="human", kmer_size=31, threads=8)

All functions return a LoggedBoundCommand (a plumbum BoundCommand). Nothing is executed until the command is run with & FG or & BG, or piped with |.
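To make the relationship between the two styles concrete, the sketch below maps a few shortcut keyword arguments onto the flat argument list that the flexible style would receive. `FLAG_MAP` and `to_argv` are hypothetical names used only for illustration and are not part of skimindex; the flags are the ones listed in the `kmindex_build` parameter table. The wrapper's real implementation may differ.

```python
# Illustrative only: FLAG_MAP and to_argv are hypothetical, not skimindex API.
FLAG_MAP = {
    "index": "-i",
    "fof": "-f",
    "run_dir": "-d",
    "register_as": "-r",
    "kmer_size": "-k",
    "threads": "-t",
}

def to_argv(subcommand, **kwargs):
    """Turn keyword arguments into a flat argument list, skipping None values."""
    argv = [subcommand]
    for name, value in kwargs.items():
        if value is None:
            continue  # unset options are left to kmindex's own defaults
        argv += [FLAG_MAP[name], str(value)]
    return argv

argv = to_argv(
    "build",
    index="/indexes/main",
    fof="fof.txt",
    run_dir="@inplace",
    register_as="human",
    kmer_size=31,
    threads=8,
)
print(argv)
```

Under these assumptions, the resulting list is what one would unpack into the flexible call, for example `kmindex(*argv)`.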

Example

from skimindex.unix.kmindex import kmindex_build, kmindex_query
from plumbum import FG

kmindex_build(
    index="/indexes/decontam",
    fof="samples.fof",
    run_dir="@inplace",
    register_as="bacteria",
    kmer_size=29,
    threads=16,
) & FG

kmindex_query(
    index="/indexes/decontam",
    fastx="sample.fa.gz",
    output="results/",
    zvalue=3,
    threads=8,
) & FG

kmindex

kmindex(*args: str) -> LoggedBoundCommand

Execute a kmindex command with arbitrary arguments.

Parameters:

Name	Type	Description	Default
`*args`	`str`	Subcommand and flags, e.g. `"build"`, `"-i"`, `"/path"`.	`()`

Returns:

Type	Description
`LoggedBoundCommand`	A plumbum `BoundCommand` ready to execute.

kmindex_build

kmindex_build(
    *,
    index: str | Path,
    fof: str | Path,
    run_dir: str | Path = "@inplace",
    register_as: str,
    from_index: str | None = None,
    km_path: str | Path | None = None,
    kmer_size: int | None = None,
    minim_size: int | None = None,
    hard_min: int | None = None,
    nb_partitions: int | None = None,
    bloom_size: int | None = None,
    nb_cell: int | None = None,
    bitw: int | None = None,
    threads: int | None = None,
    cpr: bool = False,
    verbose: str | None = None,
) -> LoggedBoundCommand

Build a kmindex sub-index from a kmtricks file-of-files.

Parameters:

Name	Type	Description	Default
`index`	`str \| Path`	Global index path (`-i`).	required
`fof`	`str \| Path`	kmtricks input file — file-of-files (`-f`).	required
`run_dir`	`str \| Path`	kmtricks runtime directory. Use `"@inplace"` to build inside the global index directory (`-d`).	`'@inplace'`
`register_as`	`str`	Name under which the sub-index is registered (`-r`).	required
`from_index`	`str \| None`	Re-use parameters from a pre-registered sub-index (`--from`).	`None`
`km_path`	`str \| Path \| None`	Path to the `kmtricks` binary; searched in `$PATH` if omitted (`--km-path`).	`None`
`kmer_size`	`int \| None`	k-mer length in `[8, 255]`, default 31 (`-k`).	`None`
`minim_size`	`int \| None`	Minimizer length in `[4, 15]`, default 10 (`-m`).	`None`
`hard_min`	`int \| None`	Minimum abundance to keep a k-mer, default 2 (`--hard-min`).	`None`
`nb_partitions`	`int \| None`	Number of partitions, 0 = auto (`--nb-partitions`).	`None`
`bloom_size`	`int \| None`	Bloom filter size for presence/absence indexing (`--bloom-size`).	`None`
`nb_cell`	`int \| None`	Number of cells per counting Bloom filter for abundance indexing (`--nb-cell`).	`None`
`bitw`	`int \| None`	Bits per cell for abundance indexing, default 2 (`--bitw`). Abundances are stored as log₂ classes: `2^bitw` classes.	`None`
`threads`	`int \| None`	Number of threads (`-t`).	`None`
`cpr`	`bool`	Compress intermediate files (`--cpr`).	`False`
`verbose`	`str \| None`	Verbosity level: `debug`, `info`, `warning`, `error` (`-v`).	`None`

Returns:

Type	Description
`LoggedBoundCommand`	A plumbum `BoundCommand` ready to execute.
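The ranges in the table can be checked before a command is built. The following is a minimal sketch, assuming the caller wants to fail early in Python rather than let kmindex reject the value; `check_ranges` and `abundance_classes` are hypothetical helpers, not part of skimindex. The second helper only restates the `2^bitw` relationship given for `bitw`.

```python
# Hypothetical pre-flight checks; the ranges come from the parameter table above.
RANGES = {"kmer_size": (8, 255), "minim_size": (4, 15)}

def check_ranges(**params):
    """Raise ValueError if a documented range is violated.

    A value of None means "let kmindex use its default" and is not checked.
    """
    for name, value in params.items():
        if value is None:
            continue
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")

def abundance_classes(bitw=2):
    """Number of abundance classes available with `bitw` bits per cell (2^bitw)."""
    return 2 ** bitw

check_ranges(kmer_size=29, minim_size=None)  # passes silently
print(abundance_classes())        # documented default, bitw=2
print(abundance_classes(bitw=4))  # wider cells give more classes
```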

kmindex_register

kmindex_register(
    *,
    global_index: str | Path,
    name: str | None = None,
    index_path: str | Path | None = None,
    from_file: str | Path | None = None,
    mode: str = "symlink",
    verbose: str | None = None,
) -> LoggedBoundCommand

Register an existing kmtricks run as a sub-index.

Provide either name and index_path to register a single sub-index, or from_file to register several at once. When from_file is set, name and index_path are ignored.

Parameters:

Name	Type	Description	Default
`global_index`	`str \| Path`	Global index path (`-i`).	required
`name`	`str \| None`	Sub-index name; ignored when `from_file` is set (`-n`).	`None`
`index_path`	`str \| Path \| None`	Path to a kmtricks run directory; ignored when `from_file` is set (`-p`).	`None`
`from_file`	`str \| Path \| None`	Tab-separated file with `index_name<tab>index_path` per line (`-f`).	`None`
`mode`	`str`	Registration mode: `symlink`, `copy`, or `move` (`-m`), default `symlink`.	`'symlink'`
`verbose`	`str \| None`	Verbosity level (`-v`).	`None`

Returns:

Type	Description
`LoggedBoundCommand`	A plumbum `BoundCommand` ready to execute.
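For batch registration, the `from_file` input is a tab-separated file with one `index_name<tab>index_path` pair per line. A minimal sketch of writing such a file follows; `write_register_file`, the sub-index names, and the paths are made up for illustration.

```python
import tempfile
from pathlib import Path

def write_register_file(entries, path):
    """Write one `index_name<TAB>index_path` pair per line (hypothetical helper)."""
    lines = [f"{name}\t{index_path}\n" for name, index_path in entries.items()]
    Path(path).write_text("".join(lines))
    return Path(path)

# Example names and paths only; substitute real kmtricks run directories.
entries = {"bacteria": "/runs/bacteria", "fungi": "/runs/fungi"}

with tempfile.TemporaryDirectory() as tmp:
    tsv = write_register_file(entries, Path(tmp) / "subindexes.tsv")
    content = tsv.read_text()

print(content)
```

The path of the written file would then be passed as `from_file`, leaving `name` and `index_path` unset.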

kmindex_query

kmindex_query(
    *,
    index: str | Path,
    fastx: str | Path,
    output: str | Path = "output",
    names: str | None = None,
    zvalue: int | None = None,
    threshold: float | None = None,
    single_query: str | None = None,
    format: str | None = None,
    batch_size: int | None = None,
    aggregate: bool = False,
    fast: bool = False,
    threads: int | None = None,
    verbose: str | None = None,
) -> LoggedBoundCommand

Query a kmindex index with a FASTA/FASTQ file.

Use kmindex_query2 instead when the index contains hundreds or thousands of sub-indexes.

Parameters:

Name	Type	Description	Default
`index`	`str \| Path`	Global index path (`-i`).	required
`fastx`	`str \| Path`	Input FASTA/FASTQ file, supports gz/bzip2 (`-q`).	required
`output`	`str \| Path`	Output directory, default `"output"` (`-o`).	`'output'`
`names`	`str \| None`	Comma-separated list of sub-indexes to query; all if omitted (`-n`).	`None`
`zvalue`	`int \| None`	Findere z value — index s-mers, query `(s+z)`-mers. Enables approximate matching against an index built with size `K` by querying with size `K+z` (`-z`).	`None`
`threshold`	`float \| None`	Minimum shared k-mer fraction in `[0.0, 1.0]` to report a hit (`-r`).	`None`
`single_query`	`str \| None`	Query identifier — treat all sequences as a single query (`-s`).	`None`
`format`	`str \| None`	Output format: `json`, `matrix`, `json_vec`, `jsonl`, `jsonl_vec` (`-f`).	`None`
`batch_size`	`int \| None`	Size of query batches; 0 = auto (`-b`).	`None`
`aggregate`	`bool`	Aggregate batch results into one file (`-a`).	`False`
`fast`	`bool`	Keep more pages in cache for faster repeated queries (`--fast`).	`False`
`threads`	`int \| None`	Number of threads (`-t`).	`None`
`verbose`	`str \| None`	Verbosity level (`-v`).	`None`

Returns:

Type	Description
`LoggedBoundCommand`	A plumbum `BoundCommand` ready to execute.
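The interplay of `zvalue` and `threshold` can be illustrated on exact sets. The sketch below treats a `(s+z)`-mer as found only when every `s`-mer it contains is in the index, then reports the fraction of query `(s+z)`-mers found. This is a simplified reading of the table above, not kmindex's implementation: kmindex works on Bloom filters rather than Python sets, and the exact definition of the reported fraction is an assumption here. All function names are hypothetical.

```python
def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def shared_fraction(query, indexed_smers, s, z=0):
    """Fraction of the query's (s+z)-mers whose s-mers are all indexed."""
    big = kmers(query, s + z)
    if not big:
        return 0.0
    found = sum(
        all(smer in indexed_smers for smer in kmers(big_mer, s))
        for big_mer in big
    )
    return found / len(big)

s = 4
indexed = set(kmers("ACGTACGGA", s))   # stand-in for an index of s-mers

exact = "ACGTACGG"     # substring of the indexed sequence
mutated = "ACGTTCGG"   # same, with one substitution

frac_exact = shared_fraction(exact, indexed, s, z=2)
frac_mut_z0 = shared_fraction(mutated, indexed, s, z=0)
frac_mut_z2 = shared_fraction(mutated, indexed, s, z=2)

threshold = 0.5
print(frac_exact >= threshold, frac_mut_z0 >= threshold, frac_mut_z2 >= threshold)
```

In this toy case only the exact query passes a threshold of 0.5. The mutated query keeps a single matching `s`-mer at `z=0` and no matching `(s+z)`-mer at `z=2`.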

kmindex_query2

kmindex_query2(
    *,
    index: str | Path,
    fastx: str | Path,
    output: str | Path = "output",
    names: str | None = None,
    zvalue: int | None = None,
    threshold: float | None = None,
    single_query: str | None = None,
    format: str | None = None,
    batch_size: int | None = None,
    aggregate: bool = False,
    fast: bool = False,
    threads: int | None = None,
    verbose: str | None = None,
) -> LoggedBoundCommand

Query a kmindex index — optimised for large numbers of sub-indexes.

Drop-in replacement for kmindex_query when the global index contains hundreds or thousands of sub-indexes.

Parameters:

Name	Type	Description	Default
`index`	`str \| Path`	Global index path (`-i`).	required
`fastx`	`str \| Path`	Input FASTA/FASTQ file, supports gz/bzip2 (`-q`).	required
`output`	`str \| Path`	Output directory, default `"output"` (`-o`).	`'output'`
`names`	`str \| None`	Comma-separated list of sub-indexes to query (`-n`).	`None`
`zvalue`	`int \| None`	Findere z value (`-z`).	`None`
`threshold`	`float \| None`	Minimum shared k-mer fraction (`-r`).	`None`
`single_query`	`str \| None`	Treat all sequences as a single query (`-s`).	`None`
`format`	`str \| None`	Output format (`-f`).	`None`
`batch_size`	`int \| None`	Batch size; 0 = auto (`-b`).	`None`
`aggregate`	`bool`	Aggregate batch results (`-a`).	`False`
`fast`	`bool`	Keep more pages in cache (`--fast`).	`False`
`threads`	`int \| None`	Number of threads (`-t`).	`None`
`verbose`	`str \| None`	Verbosity level (`-v`).	`None`

Returns:

Type	Description
`LoggedBoundCommand`	A plumbum `BoundCommand` ready to execute.

kmindex_merge

kmindex_merge(
    *,
    index: str | Path,
    new_name: str,
    new_path: str | Path,
    to_merge: list[str],
    rename: str | None = None,
    delete_old: bool = False,
    threads: int | None = None,
    verbose: str | None = None,
) -> LoggedBoundCommand

Merge sub-indexes into a new combined sub-index.

Sub-indexes containing identical sample identifiers cannot be merged without renaming — use the rename parameter in that case.

Parameters:

Name	Type	Description	Default
`index`	`str \| Path`	Global index path (`-i`).	required
`new_name`	`str`	Name for the merged sub-index (`-n`).	required
`new_path`	`str \| Path`	Output path for the merged sub-index (`-p`).	required
`to_merge`	`list[str]`	Sub-index names to merge, passed as a comma-separated list (`-m`).	required
`rename`	`str \| None`	Rename strategy for sample identifiers (`-r`). Two argument forms are accepted: `"f:id1.txt,id2.txt,..."`, one identifier file per sub-index (one id per line), or `"s:prefix_{}"`, a format string in which `{}` is replaced by an integer. The alternative to passing `rename` is to edit the `kmtricks.fof` files manually, which is not recommended.	`None`
`delete_old`	`bool`	Delete old sub-index files after merging (`-d`).	`False`
`threads`	`int \| None`	Number of threads (`-t`).	`None`
`verbose`	`str \| None`	Verbosity level (`-v`).	`None`

Returns:

Type	Description
`LoggedBoundCommand`	A plumbum `BoundCommand` ready to execute.
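The two `rename` forms and the comma-separated merge list are plain strings, so they are easy to assemble programmatically. A minimal sketch, with hypothetical helper names and placeholder sub-index and file names:

```python
def rename_from_files(id_files):
    """Build the "f:" form: one identifier file per sub-index, in merge order."""
    return "f:" + ",".join(str(path) for path in id_files)

def rename_from_format(prefix):
    """Build the "s:" form: a format string whose {} is replaced by an integer."""
    return f"s:{prefix}_{{}}"

# Placeholder names; use the sub-indexes registered in the global index.
to_merge = ["index1", "index2"]
merge_arg = ",".join(to_merge)  # the form in which the names reach -m

print(merge_arg)
print(rename_from_files(["id1.txt", "id2.txt"]))
print(rename_from_format("sample"))
```

Either returned string could be passed as `rename=...`; per the table, `to_merge` itself is given as a Python list and joined by the wrapper.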

kmindex_index_infos

kmindex_index_infos(
    *, index: str | Path, verbose: str | None = None
) -> LoggedBoundCommand

Print information about a kmindex global index.

Parameters:

Name	Type	Description	Default
`index`	`str \| Path`	Global index path (`-i`).	required
`verbose`	`str \| None`	Verbosity level (`-v`).	`None`

Returns:

Type	Description
`LoggedBoundCommand`	A plumbum `BoundCommand` ready to execute.

kmindex_compress

kmindex_compress(
    index: str | Path, *args: str
) -> LoggedBoundCommand

Compress a kmindex index.

Parameters:

Name	Type	Description	Default
`index`	`str \| Path`	Global index path (`-i`).	required
`*args`	`str`	Additional flags passed directly to `kmindex compress`.	`()`

Returns:

Type	Description
`LoggedBoundCommand`	A plumbum `BoundCommand` ready to execute.

kmindex_sum_index

kmindex_sum_index(*args: str) -> LoggedBoundCommand

Build a lightweight summarised index (experimental).

At query time, a summarised index reports only the number of samples containing each k-mer.

Parameters:

Name	Type	Description	Default
`*args`	`str`	Flags passed directly to `kmindex sum-index`.	`()`

Returns:

Type	Description
`LoggedBoundCommand`	A plumbum `BoundCommand` ready to execute.

kmindex_sum_query

kmindex_sum_query(*args: str) -> LoggedBoundCommand

Query a summarised index (experimental).

Parameters:

Name	Type	Description	Default
`*args`	`str`	Flags passed directly to `kmindex sum-query`.	`()`

Returns:

Type	Description
`LoggedBoundCommand`	A plumbum `BoundCommand` ready to execute.