skimindex.unix.kmindex

kmindex wrapper module using plumbum.

Provides a Pythonic interface to kmindex v0.6.0 — a tool for indexing and querying kmtricks Bloom filter / counting Bloom filter matrices.

Two API styles:

  1. Flexible. Pass any subcommand and flags directly:

    kmindex("build", "-i", "/indexes/main", "-f", "fof.txt", ...)

  2. Shortcuts. One function per subcommand, with typed keyword arguments:

    kmindex_build(index="/indexes/main", fof="fof.txt", run_dir="@inplace", register_as="human", kmer_size=31, threads=8)

All functions return a plumbum BoundCommand that can be run with & FG, & BG, or piped with |.
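To make the relationship between the two styles concrete, the sketch below shows one way keyword arguments could be translated into the flag list that the flexible style takes. The helper `to_argv` and the `FLAGS` table are hypothetical and are not part of skimindex. Only the flag names come from the kmindex_build parameter descriptions, and the sketch assumes that `None` values are omitted and that boolean options are emitted as bare switches.

```python
# Flag names taken from the kmindex_build parameter descriptions.
FLAGS = {
    "index": "-i",
    "fof": "-f",
    "run_dir": "-d",
    "register_as": "-r",
    "kmer_size": "-k",
    "threads": "-t",
    "cpr": "--cpr",
}


def to_argv(subcommand: str, **options) -> list[str]:
    """Hypothetical helper: turn keyword options into a CLI argument list."""
    argv = [subcommand]
    for name, value in options.items():
        if value is None or value is False:
            continue  # unset options are left to kmindex's own defaults
        flag = FLAGS[name]
        if value is True:
            argv.append(flag)  # boolean switch, takes no value
        else:
            argv.extend([flag, str(value)])
    return argv


args = to_argv(
    "build",
    index="/indexes/main",
    fof="fof.txt",
    run_dir="@inplace",
    register_as="human",
    kmer_size=31,
    threads=None,  # omitted from the argument list
    cpr=True,
)
print(args)
```

A list built this way has the same shape as the positional arguments accepted by kmindex(*args).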

Example
from skimindex.unix.kmindex import kmindex_build, kmindex_query
from plumbum import FG

kmindex_build(
    index="/indexes/decontam",
    fof="samples.fof",
    run_dir="@inplace",
    register_as="bacteria",
    kmer_size=29,
    threads=16,
) & FG

kmindex_query(
    index="/indexes/decontam",
    fastx="sample.fa.gz",
    output="results/",
    zvalue=3,
    threads=8,
) & FG

kmindex

kmindex(*args: str) -> LoggedBoundCommand

Execute a kmindex command with arbitrary arguments.

Parameters:

Name Type Description Default
*args str

Subcommand and flags, e.g. "build", "-i", "/path".

()

Returns:

Type Description
LoggedBoundCommand

A plumbum BoundCommand ready to execute.

kmindex_build

kmindex_build(
    *,
    index: str | Path,
    fof: str | Path,
    run_dir: str | Path = "@inplace",
    register_as: str,
    from_index: str | None = None,
    km_path: str | Path | None = None,
    kmer_size: int | None = None,
    minim_size: int | None = None,
    hard_min: int | None = None,
    nb_partitions: int | None = None,
    bloom_size: int | None = None,
    nb_cell: int | None = None,
    bitw: int | None = None,
    threads: int | None = None,
    cpr: bool = False,
    verbose: str | None = None,
) -> LoggedBoundCommand

Build a kmindex sub-index from a kmtricks file-of-files.

Parameters:

Name Type Description Default
index str | Path

Global index path (-i).

required
fof str | Path

kmtricks input file — file-of-files (-f).

required
run_dir str | Path

kmtricks runtime directory. Use "@inplace" to build inside the global index directory (-d).

'@inplace'
register_as str

Name under which the sub-index is registered (-r).

required
from_index str | None

Re-use parameters from a pre-registered sub-index (--from).

None
km_path str | Path | None

Path to the kmtricks binary; searched in $PATH if omitted (--km-path).

None
kmer_size int | None

k-mer length in [8, 255], default 31 (-k).

None
minim_size int | None

Minimizer length in [4, 15], default 10 (-m).

None
hard_min int | None

Minimum abundance to keep a k-mer, default 2 (--hard-min).

None
nb_partitions int | None

Number of partitions, 0 = auto (--nb-partitions).

None
bloom_size int | None

Bloom filter size for presence/absence indexing (--bloom-size).

None
nb_cell int | None

Number of cells per counting Bloom filter for abundance indexing (--nb-cell).

None
bitw int | None

Bits per cell for abundance indexing, default 2 (--bitw). Abundances are binned into 2^bitw log₂ classes.

None
threads int | None

Number of threads (-t).

None
cpr bool

Compress intermediate files (--cpr).

False
verbose str | None

Verbosity level: debug, info, warning, error (-v).

None

Returns:

Type Description
LoggedBoundCommand

A plumbum BoundCommand ready to execute.
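The documented ranges for kmer_size and minim_size, and the 2^bitw class count, can be checked before a build is launched. The following is a minimal sketch; `check_build_params` and `abundance_classes` are hypothetical helpers, not part of skimindex, and whether the wrapper performs any such validation itself is not stated here.

```python
def check_build_params(kmer_size=None, minim_size=None):
    """Hypothetical pre-flight check; returns a list of problems found."""
    problems = []
    # Ranges as documented for -k and -m; None means "use kmindex's default".
    if kmer_size is not None and not 8 <= kmer_size <= 255:
        problems.append(f"kmer_size={kmer_size} is outside [8, 255]")
    if minim_size is not None and not 4 <= minim_size <= 15:
        problems.append(f"minim_size={minim_size} is outside [4, 15]")
    return problems


def abundance_classes(bitw: int = 2) -> int:
    """Number of abundance classes that fit in `bitw` bits per cell."""
    return 2 ** bitw


print(check_build_params(kmer_size=29, minim_size=10))
print(check_build_params(kmer_size=7, minim_size=16))
print(abundance_classes(2))
```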

kmindex_register

kmindex_register(
    *,
    global_index: str | Path,
    name: str | None = None,
    index_path: str | Path | None = None,
    from_file: str | Path | None = None,
    mode: str = "symlink",
    verbose: str | None = None,
) -> LoggedBoundCommand

Register an existing kmtricks run as a sub-index.

Either provide name + index_path for a single sub-index, or from_file for batch registration.

Parameters:

Name Type Description Default
global_index str | Path

Global index path (-i).

required
name str | None

Sub-index name; ignored when from_file is set (-n).

None
index_path str | Path | None

Path to a kmtricks run directory; ignored when from_file is set (-p).

None
from_file str | Path | None

Tab-separated file with index_name<tab>index_path per line (-f).

None
mode str

Registration mode: symlink, copy, or move (-m), default symlink.

'symlink'
verbose str | None

Verbosity level (-v).

None

Returns:

Type Description
LoggedBoundCommand

A plumbum BoundCommand ready to execute.
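For batch registration, the file given as from_file holds one index_name<tab>index_path pair per line. A small sketch of producing such a file with the standard library follows; the helper `write_register_file`, the sub-index names, and the run directories are all illustrative assumptions.

```python
import csv
import tempfile
from pathlib import Path


def write_register_file(path, entries):
    """Write one 'index_name<TAB>index_path' line per sub-index."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for name, index_path in entries.items():
            writer.writerow([name, str(index_path)])


# Hypothetical sub-index names and kmtricks run directories.
entries = {
    "bacteria": "/runs/bacteria",
    "human": "/runs/human",
}

with tempfile.TemporaryDirectory() as tmp:
    tsv = Path(tmp) / "subindexes.tsv"
    write_register_file(tsv, entries)
    content = tsv.read_text()

print(content)
```

The path of a file written this way is what would be supplied as from_file, in which case name and index_path are ignored.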

kmindex_query

kmindex_query(
    *,
    index: str | Path,
    fastx: str | Path,
    output: str | Path = "output",
    names: str | None = None,
    zvalue: int | None = None,
    threshold: float | None = None,
    single_query: str | None = None,
    format: str | None = None,
    batch_size: int | None = None,
    aggregate: bool = False,
    fast: bool = False,
    threads: int | None = None,
    verbose: str | None = None,
) -> LoggedBoundCommand

Query a kmindex index with a FASTA/FASTQ file.

Use kmindex_query2 instead when the index contains hundreds or thousands of sub-indexes.

Parameters:

Name Type Description Default
index str | Path

Global index path (-i).

required
fastx str | Path

Input FASTA/FASTQ file, supports gz/bzip2 (-q).

required
output str | Path

Output directory, default "output" (-o).

'output'
names str | None

Comma-separated list of sub-indexes to query; all if omitted (-n).

None
zvalue int | None

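Findere z value: the index stores s-mers and queries use (s+z)-mers, so an index built with k-mer size K is queried with (K+z)-mers. A queried word counts as present only when all of its constituent s-mers are found, which lowers the false-positive rate (-z).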

None
threshold float | None

Minimum shared k-mer fraction in [0.0, 1.0] to report a hit (-r).

None
single_query str | None

Query identifier — treat all sequences as a single query (-s).

None
format str | None

Output format: json, matrix, json_vec, jsonl, jsonl_vec (-f).

None
batch_size int | None

Size of query batches; 0 = auto (-b).

None
aggregate bool

Aggregate batch results into one file (-a).

False
fast bool

Keep more pages in cache for faster repeated queries (--fast).

False
threads int | None

Number of threads (-t).

None
verbose str | None

Verbosity level (-v).

None

Returns:

Type Description
LoggedBoundCommand

A plumbum BoundCommand ready to execute.
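The threshold parameter is easiest to read as a cut-off on the fraction of a query's k-mers found in a sample. The sketch below illustrates that idea with an exact Python set standing in for the Bloom filter; it is a conceptual illustration only, and the function names, sequences, and the choice to count repeated k-mers individually are assumptions, not kmindex's implementation.

```python
def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def shared_fraction(query: str, indexed: set[str], k: int) -> float:
    """Fraction of the query's k-mers that are found in `indexed`."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0  # query shorter than k
    hits = sum(1 for kmer in query_kmers if kmer in indexed)
    return hits / len(query_kmers)


# Toy "sample": an exact set of 4-mers instead of a Bloom filter.
sample = set(kmers("ACGTACGTTG", 4))

fraction = shared_fraction("ACGTACGA", sample, 4)
reported = fraction >= 0.5  # threshold=0.5 would keep this hit

print(fraction, reported)
```

Here four of the five query 4-mers are present, so the hit passes a threshold of 0.5 but would be dropped at 0.9.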

kmindex_query2

kmindex_query2(
    *,
    index: str | Path,
    fastx: str | Path,
    output: str | Path = "output",
    names: str | None = None,
    zvalue: int | None = None,
    threshold: float | None = None,
    single_query: str | None = None,
    format: str | None = None,
    batch_size: int | None = None,
    aggregate: bool = False,
    fast: bool = False,
    threads: int | None = None,
    verbose: str | None = None,
) -> LoggedBoundCommand

Query a kmindex index — optimised for large numbers of sub-indexes.

Drop-in replacement for kmindex_query when the global index contains hundreds or thousands of sub-indexes.

Parameters:

Name Type Description Default
index str | Path

Global index path (-i).

required
fastx str | Path

Input FASTA/FASTQ file, supports gz/bzip2 (-q).

required
output str | Path

Output directory, default "output" (-o).

'output'
names str | None

Comma-separated list of sub-indexes to query (-n).

None
zvalue int | None

Findere z value (-z).

None
threshold float | None

Minimum shared k-mer fraction (-r).

None
single_query str | None

Treat all sequences as a single query (-s).

None
format str | None

Output format (-f).

None
batch_size int | None

Batch size; 0 = auto (-b).

None
aggregate bool

Aggregate batch results (-a).

False
fast bool

Keep more pages in cache (--fast).

False
threads int | None

Number of threads (-t).

None
verbose str | None

Verbosity level (-v).

None

Returns:

Type Description
LoggedBoundCommand

A plumbum BoundCommand ready to execute.
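The zvalue parameter refers to the findere scheme, in which s-mers are indexed and (s+z)-mers are queried. A simplified sketch of the acceptance rule is given below, again with an exact set in place of a Bloom filter. The function names and sequences are illustrative assumptions, and this is not kmindex's code.

```python
def smers(seq: str, s: int) -> list[str]:
    """All overlapping s-mers of seq, in order."""
    return [seq[i:i + s] for i in range(len(seq) - s + 1)]


def findere_contains(word: str, indexed_smers: set[str], s: int) -> bool:
    """Accept a (s+z)-mer only if every one of its z+1 s-mers is indexed."""
    return all(sm in indexed_smers for sm in smers(word, s))


s, z = 4, 2
indexed = set(smers("ACGTACGTTG", s))  # toy index of 4-mers

# Queries are (s+z)-mers, here 6-mers made of z+1 = 3 overlapping 4-mers.
hit = findere_contains("ACGTAC", indexed, s)   # ACGT, CGTA, GTAC all indexed
miss = findere_contains("ACGTAA", indexed, s)  # GTAA is not indexed

print(hit, miss)
```

With a real Bloom filter, one false-positive s-mer alone is not enough to accept a queried word, because all z+1 s-mers have to be found.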

kmindex_merge

kmindex_merge(
    *,
    index: str | Path,
    new_name: str,
    new_path: str | Path,
    to_merge: list[str],
    rename: str | None = None,
    delete_old: bool = False,
    threads: int | None = None,
    verbose: str | None = None,
) -> LoggedBoundCommand

Merge sub-indexes into a new combined sub-index.

Sub-indexes containing identical sample identifiers cannot be merged without renaming — use the rename parameter in that case.

Parameters:

Name Type Description Default
index str | Path

Global index path (-i).

required
new_name str

Name for the merged sub-index (-n).

required
new_path str | Path

Output path for the merged sub-index (-p).

required
to_merge list[str]

Names of the sub-indexes to merge, given as a Python list and passed to kmindex as a comma-separated list (-m).

required
rename str | None

Rename strategy for sample identifiers (-r). Three options:

  • "f:id1.txt,id2.txt,...": one identifier file per sub-index (one id per line).
  • "s:prefix_{}": format string ({} replaced by an integer).
  • Manual editing of the kmtricks.fof files (not recommended).

None
delete_old bool

Delete old sub-index files after merging (-d).

False
threads int | None

Number of threads (-t).

None
verbose str | None

Verbosity level (-v).

None

Returns:

Type Description
LoggedBoundCommand

A plumbum BoundCommand ready to execute.
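The two string forms accepted by rename can be assembled programmatically. In the sketch below, `rename_from_files` and `rename_from_pattern` are hypothetical helpers, the file and sub-index names are made up, and the preview assumes the integer substituted for {} starts at 0, which is not confirmed here.

```python
def rename_from_files(id_files) -> str:
    """Build the 'f:' form: one identifier file per merged sub-index."""
    return "f:" + ",".join(str(path) for path in id_files)


def rename_from_pattern(pattern: str) -> str:
    """Build the 's:' form from a format string containing '{}'."""
    if "{}" not in pattern:
        raise ValueError("pattern must contain '{}' as the integer placeholder")
    return "s:" + pattern


to_merge = ["bacteria", "fungi"]        # hypothetical sub-index names
merge_arg = ",".join(to_merge)          # comma-separated form used by -m

rename_files = rename_from_files(["bacteria_ids.txt", "fungi_ids.txt"])
rename_pattern = rename_from_pattern("sample_{}")

# Preview of identifiers produced by the pattern (start index is assumed).
preview = ["sample_{}".format(i) for i in range(3)]

print(merge_arg, rename_files, rename_pattern, preview)
```

Either string could then be supplied as rename when the sub-indexes share sample identifiers.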

kmindex_index_infos

kmindex_index_infos(
    *, index: str | Path, verbose: str | None = None
) -> LoggedBoundCommand

Print information about a kmindex global index.

Parameters:

Name Type Description Default
index str | Path

Global index path (-i).

required
verbose str | None

Verbosity level (-v).

None

Returns:

Type Description
LoggedBoundCommand

A plumbum BoundCommand ready to execute.

kmindex_compress

kmindex_compress(
    index: str | Path, *args: str
) -> LoggedBoundCommand

Compress a kmindex index.

Parameters:

Name Type Description Default
index str | Path

Global index path (-i).

required
*args str

Additional flags passed directly to kmindex compress.

()

Returns:

Type Description
LoggedBoundCommand

A plumbum BoundCommand ready to execute.

kmindex_sum_index

kmindex_sum_index(*args: str) -> LoggedBoundCommand

Build a lightweight summarised index (experimental).

At query time reports only the number of samples containing each k-mer.

Parameters:

Name Type Description Default
*args str

Flags passed directly to kmindex sum-index.

()

Returns:

Type Description
LoggedBoundCommand

A plumbum BoundCommand ready to execute.
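What the summarised index reports can be pictured as collapsing a k-mer by sample presence matrix into one count per k-mer. The toy data below is invented for illustration, and the sketch says nothing about the on-disk format or the flags of kmindex sum-index.

```python
# Toy presence/absence matrix: one row per k-mer, one column per sample.
presence = {
    "ACGT": [True, False, True],
    "CGTA": [False, False, True],
    "GTAC": [False, False, False],
}

# A summarised view keeps only how many samples contain each k-mer,
# not which samples they are.
summary = {kmer: sum(row) for kmer, row in presence.items()}

print(summary)
```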

kmindex_sum_query

kmindex_sum_query(*args: str) -> LoggedBoundCommand

Query a summarised index (experimental).

Parameters:

Name Type Description Default
*args str

Flags passed directly to kmindex sum-query.

()

Returns:

Type Description
LoggedBoundCommand

A plumbum BoundCommand ready to execute.