skimindex.unix.kmindex
kmindex wrapper module using plumbum.
Provides a Pythonic interface to kmindex v0.6.0 — a tool for indexing
and querying kmtricks Bloom filter / counting Bloom filter matrices.
Two API styles:

- Flexible: pass any subcommand and flags directly::

      kmindex("build", "-i", "/indexes/main", "-f", "fof.txt", ...)

- Shortcuts: one function per subcommand with typed keyword arguments::

      kmindex_build(index="/indexes/main", fof="fof.txt", run_dir="@inplace", register_as="human", kmer_size=31, threads=8)

All functions return a plumbum BoundCommand that can be run with
& FG, & BG, or piped with |.
Example
from skimindex.unix.kmindex import kmindex_build, kmindex_query
from plumbum import FG

kmindex_build(
    index="/indexes/decontam",
    fof="samples.fof",
    run_dir="@inplace",
    register_as="bacteria",
    kmer_size=29,
    threads=16,
) & FG

kmindex_query(
    index="/indexes/decontam",
    fastx="sample.fa.gz",
    output="results/",
    zvalue=3,
    threads=8,
) & FG

kmindex
kmindex(*args: str) -> LoggedBoundCommand
Execute a kmindex command with arbitrary arguments.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `*args` | `str` | Subcommand and flags. | `()` |

Returns:

| Type | Description |
|---|---|
| `LoggedBoundCommand` | A plumbum bound command. |

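Because the flexible style accepts only plain strings, optional settings have to be flattened into an argument list before the call. The sketch below shows one way this might be done. `to_cli_args` is a hypothetical helper, not part of skimindex, and the rule used to derive long flag names (underscores become hyphens, prefixed with `--`) is an assumption that should be checked against the kmindex help output.

```python
def to_cli_args(subcommand: str, **options: object) -> list[str]:
    """Flatten keyword options into a list of strings (hypothetical helper)."""
    args = [subcommand]
    for name, value in options.items():
        if value is None or value is False:
            continue  # unset option or disabled switch: emit nothing
        flag = "--" + name.replace("_", "-")  # assumed naming rule
        if value is True:
            args.append(flag)  # boolean switch carries no value
        else:
            args.extend([flag, str(value)])  # ints and paths become strings
    return args


args = to_cli_args("build", index="/indexes/main", threads=8, cpr=True, verbose=None)
print(args)
```

Under these assumptions, the resulting list could be unpacked into the flexible call as `kmindex(*args)`.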
kmindex_build
kmindex_build(
    *,
    index: str | Path,
    fof: str | Path,
    run_dir: str | Path = "@inplace",
    register_as: str,
    from_index: str | None = None,
    km_path: str | Path | None = None,
    kmer_size: int | None = None,
    minim_size: int | None = None,
    hard_min: int | None = None,
    nb_partitions: int | None = None,
    bloom_size: int | None = None,
    nb_cell: int | None = None,
    bitw: int | None = None,
    threads: int | None = None,
    cpr: bool = False,
    verbose: str | None = None,
) -> LoggedBoundCommand

Build a kmindex sub-index from a kmtricks file-of-files.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `str \| Path` | Global index path. | required |
| `fof` | `str \| Path` | kmtricks input file (file-of-files). | required |
| `run_dir` | `str \| Path` | kmtricks runtime directory. | `'@inplace'` |
| `register_as` | `str` | Name under which the sub-index is registered. | required |
| `from_index` | `str \| None` | Re-use parameters from a pre-registered sub-index. | `None` |
| `km_path` | `str \| Path \| None` | Path to kmtricks. | `None` |
| `kmer_size` | `int \| None` | k-mer length. | `None` |
| `minim_size` | `int \| None` | Minimizer length. | `None` |
| `hard_min` | `int \| None` | Minimum abundance to keep a k-mer, default 2. | `None` |
| `nb_partitions` | `int \| None` | Number of partitions, 0 = auto. | `None` |
| `bloom_size` | `int \| None` | Bloom filter size for presence/absence indexing. | `None` |
| `nb_cell` | `int \| None` | Number of cells per counting Bloom filter for abundance indexing. | `None` |
| `bitw` | `int \| None` | Bits per cell for abundance indexing, default 2. | `None` |
| `threads` | `int \| None` | Number of threads. | `None` |
| `cpr` | `bool` | Compress intermediate files. | `False` |
| `verbose` | `str \| None` | Verbosity level. | `None` |

Returns:

| Type | Description |
|---|---|
| `LoggedBoundCommand` | A plumbum bound command. |

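The `bloom_size` parameter trades memory for accuracy in presence/absence indexing. As a rough guide, the textbook false-positive estimate for a Bloom filter, (1 - e^(-h·n/m))^h for n items, m bits and h hash functions, can be evaluated before a build. In the sketch below, `bloom_fpr` is a hypothetical helper, and the default of one hash function per filter is an assumption rather than something this page states.

```python
import math


def bloom_fpr(n_items: int, n_bits: int, n_hashes: int = 1) -> float:
    """Textbook Bloom filter false-positive estimate (hypothetical helper)."""
    if n_bits <= 0 or n_hashes <= 0:
        raise ValueError("n_bits and n_hashes must be positive")
    return (1.0 - math.exp(-n_hashes * n_items / n_bits)) ** n_hashes


# Hypothetical sample containing 50 million distinct k-mers.
for bits in (100_000_000, 500_000_000, 1_000_000_000):
    print(f"{bits:>13,d} bits -> estimated FPR {bloom_fpr(50_000_000, bits):.3f}")
```

The estimate only describes a single filter in isolation; it does not account for partitioning or for any query-time filtering the tool may apply.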
kmindex_register
kmindex_register(
    *,
    global_index: str | Path,
    name: str | None = None,
    index_path: str | Path | None = None,
    from_file: str | Path | None = None,
    mode: str = "symlink",
    verbose: str | None = None,
) -> LoggedBoundCommand

Register an existing kmtricks run as a sub-index.
Provide either name + index_path for a single sub-index, or
from_file for batch registration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `global_index` | `str \| Path` | Global index path. | required |
| `name` | `str \| None` | Sub-index name; ignored when `from_file` is given. | `None` |
| `index_path` | `str \| Path \| None` | Path to a kmtricks run directory; ignored when `from_file` is given. | `None` |
| `from_file` | `str \| Path \| None` | Tab-separated file used for batch registration. | `None` |
| `mode` | `str` | Registration mode. | `'symlink'` |
| `verbose` | `str \| None` | Verbosity level. | `None` |

Returns:

| Type | Description |
|---|---|
| `LoggedBoundCommand` | A plumbum bound command. |

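The two registration styles are mutually exclusive in practice, so a caller may want to check the arguments before building the command. The following is a minimal sketch of such a check. `check_register_args` is a hypothetical helper, and treating `from_file` as taking precedence is an assumption drawn from the parameter descriptions above.

```python
def check_register_args(name=None, index_path=None, from_file=None) -> str:
    """Return 'batch' or 'single' depending on which style applies (hypothetical)."""
    if from_file is not None:
        return "batch"  # name and index_path are assumed to be ignored here
    if name is not None and index_path is not None:
        return "single"
    raise ValueError("provide name and index_path, or from_file")


print(check_register_args(name="bacteria", index_path="/runs/bacteria"))
print(check_register_args(from_file="subindexes.tsv"))
```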
kmindex_query
kmindex_query(
    *,
    index: str | Path,
    fastx: str | Path,
    output: str | Path = "output",
    names: str | None = None,
    zvalue: int | None = None,
    threshold: float | None = None,
    single_query: str | None = None,
    format: str | None = None,
    batch_size: int | None = None,
    aggregate: bool = False,
    fast: bool = False,
    threads: int | None = None,
    verbose: str | None = None,
) -> LoggedBoundCommand

Query a kmindex index with a FASTA/FASTQ file.
Use `kmindex_query2` instead when the index contains hundreds or
thousands of sub-indexes.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `str \| Path` | Global index path. | required |
| `fastx` | `str \| Path` | Input FASTA/FASTQ file, supports gz/bzip2. | required |
| `output` | `str \| Path` | Output directory. | `'output'` |
| `names` | `str \| None` | Comma-separated list of sub-indexes to query; all if omitted. | `None` |
| `zvalue` | `int \| None` | Findere z value; the index holds s-mers. | `None` |
| `threshold` | `float \| None` | Minimum shared k-mer fraction. | `None` |
| `single_query` | `str \| None` | Query identifier; treat all sequences as a single query. | `None` |
| `format` | `str \| None` | Output format. | `None` |
| `batch_size` | `int \| None` | Size of query batches; 0 = auto. | `None` |
| `aggregate` | `bool` | Aggregate batch results into one file. | `False` |
| `fast` | `bool` | Keep more pages in cache for faster repeated queries. | `False` |
| `threads` | `int \| None` | Number of threads. | `None` |
| `verbose` | `str \| None` | Verbosity level. | `None` |

Returns:

| Type | Description |
|---|---|
| `LoggedBoundCommand` | A plumbum bound command. |

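To make the `threshold` parameter concrete, the sketch below computes a shared k-mer fraction between a query sequence and one sample. It is only an illustration of the quantity being thresholded: it uses exact Python sets instead of Bloom filters, ignores canonical k-mers, and all function names are hypothetical.

```python
def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def shared_fraction(query: str, indexed: set[str], k: int) -> float:
    """Fraction of the query's k-mers that are found in `indexed`."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0  # query shorter than k
    hits = sum(km in indexed for km in query_kmers)
    return hits / len(query_kmers)


sample = set(kmers("ACGTACGTGA", 4))           # stand-in for one indexed sample
fraction = shared_fraction("ACGTACGTTT", sample, 4)
print(f"{fraction:.3f}", fraction >= 0.5)      # compare against a chosen threshold
```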
kmindex_query2
kmindex_query2(
    *,
    index: str | Path,
    fastx: str | Path,
    output: str | Path = "output",
    names: str | None = None,
    zvalue: int | None = None,
    threshold: float | None = None,
    single_query: str | None = None,
    format: str | None = None,
    batch_size: int | None = None,
    aggregate: bool = False,
    fast: bool = False,
    threads: int | None = None,
    verbose: str | None = None,
) -> LoggedBoundCommand

Query a kmindex index, optimised for large numbers of sub-indexes.
Drop-in replacement for `kmindex_query` when the global index
contains hundreds or thousands of sub-indexes.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `str \| Path` | Global index path. | required |
| `fastx` | `str \| Path` | Input FASTA/FASTQ file, supports gz/bzip2. | required |
| `output` | `str \| Path` | Output directory. | `'output'` |
| `names` | `str \| None` | Comma-separated list of sub-indexes to query. | `None` |
| `zvalue` | `int \| None` | Findere z value. | `None` |
| `threshold` | `float \| None` | Minimum shared k-mer fraction. | `None` |
| `single_query` | `str \| None` | Treat all sequences as a single query. | `None` |
| `format` | `str \| None` | Output format. | `None` |
| `batch_size` | `int \| None` | Batch size; 0 = auto. | `None` |
| `aggregate` | `bool` | Aggregate batch results. | `False` |
| `fast` | `bool` | Keep more pages in cache. | `False` |
| `threads` | `int \| None` | Number of threads. | `None` |
| `verbose` | `str \| None` | Verbosity level. | `None` |

Returns:

| Type | Description |
|---|---|
| `LoggedBoundCommand` | A plumbum bound command. |

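The `zvalue` parameter refers to the findere technique. As a general illustration of that idea, and not of kmindex internals, a k-mer of length s + z is reported as present only when all z + 1 of its overlapping s-mers are found, which filters out many isolated false positives. The sketch uses a plain Python set in place of a Bloom filter, and every name in it is hypothetical.

```python
def findere_present(kmer: str, indexed_smers: set[str], z: int) -> bool:
    """True if every s-mer of `kmer` (s = len(kmer) - z) is in the index."""
    s = len(kmer) - z
    if z < 0 or s <= 0:
        raise ValueError("z must satisfy 0 <= z < len(kmer)")
    # A k-mer of length s + z contains exactly z + 1 overlapping s-mers.
    return all(kmer[i:i + s] in indexed_smers for i in range(z + 1))


# Index the 4-mers of a toy sequence, then query 6-mers with z = 2.
indexed = {"ACGTACG"[i:i + 4] for i in range(4)}   # ACGT, CGTA, GTAC, TACG
print(findere_present("ACGTAC", indexed, z=2))      # all three 4-mers indexed
print(findere_present("ACGTAA", indexed, z=2))      # GTAA is missing
```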
kmindex_merge
kmindex_merge(
    *,
    index: str | Path,
    new_name: str,
    new_path: str | Path,
    to_merge: list[str],
    rename: str | None = None,
    delete_old: bool = False,
    threads: int | None = None,
    verbose: str | None = None,
) -> LoggedBoundCommand

Merge sub-indexes into a new combined sub-index.
Sub-indexes containing identical sample identifiers cannot be merged
without renaming; use the rename parameter in that case.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `str \| Path` | Global index path. | required |
| `new_name` | `str` | Name for the merged sub-index. | required |
| `new_path` | `str \| Path` | Output path for the merged sub-index. | required |
| `to_merge` | `list[str]` | Sub-index names to merge, passed as a comma-separated list. | required |
| `rename` | `str \| None` | Rename strategy for sample identifiers. | `None` |
| `delete_old` | `bool` | Delete old sub-index files after merging. | `False` |
| `threads` | `int \| None` | Number of threads. | `None` |
| `verbose` | `str \| None` | Verbosity level. | `None` |

Returns:

| Type | Description |
|---|---|
| `LoggedBoundCommand` | A plumbum bound command. |

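Since a merge fails on clashing sample identifiers unless `rename` is supplied, it can help to look for clashes first. The sketch below assumes the caller already has the sample identifiers of each sub-index in a dictionary; how they are obtained is outside its scope, and `duplicated_samples` is a hypothetical helper. It also shows the comma-separated form that `to_merge` is described as being passed in.

```python
from collections import Counter


def duplicated_samples(sub_indexes: dict[str, list[str]]) -> set[str]:
    """Sample identifiers that occur in more than one sub-index (hypothetical)."""
    counts = Counter(
        sample for samples in sub_indexes.values() for sample in set(samples)
    )
    return {sample for sample, n in counts.items() if n > 1}


# Hypothetical sub-indexes and their sample identifiers.
sub_indexes = {"batch1": ["S1", "S2"], "batch2": ["S2", "S3"]}
to_merge = list(sub_indexes)

print(",".join(to_merge))               # comma-separated sub-index names
print(duplicated_samples(sub_indexes))  # non-empty: a rename strategy is needed
```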
kmindex_index_infos
kmindex_index_infos(
    *, index: str | Path, verbose: str | None = None
) -> LoggedBoundCommand

Print information about a kmindex global index.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `str \| Path` | Global index path. | required |
| `verbose` | `str \| None` | Verbosity level. | `None` |

Returns:

| Type | Description |
|---|---|
| `LoggedBoundCommand` | A plumbum bound command. |

kmindex_compress
kmindex_compress(
    index: str | Path, *args: str
) -> LoggedBoundCommand

Compress a kmindex index.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `str \| Path` | Global index path. | required |
| `*args` | `str` | Additional flags passed directly to the underlying subcommand. | `()` |

Returns:

| Type | Description |
|---|---|
| `LoggedBoundCommand` | A plumbum bound command. |

kmindex_sum_index
kmindex_sum_index(*args: str) -> LoggedBoundCommand
Build a lightweight summarised index (experimental).
At query time reports only the number of samples containing each k-mer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `*args` | `str` | Flags passed directly to the underlying subcommand. | `()` |

Returns:

| Type | Description |
|---|---|
| `LoggedBoundCommand` | A plumbum bound command. |

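To illustrate the kind of answer a summarised index gives, the sketch below counts, for each k-mer, how many samples contain it. It works on exact in-memory sets and says nothing about how kmindex stores or computes the summary; `samples_per_kmer` and the sample data are hypothetical.

```python
from collections import Counter


def samples_per_kmer(samples: dict[str, set[str]]) -> Counter:
    """Number of samples containing each k-mer (hypothetical helper)."""
    counts: Counter = Counter()
    for kmer_set in samples.values():
        counts.update(kmer_set)  # each sample contributes at most 1 per k-mer
    return counts


samples = {"A": {"ACG", "CGT"}, "B": {"CGT"}}
counts = samples_per_kmer(samples)
print(counts["CGT"], counts["ACG"], counts["TTT"])
```

Which samples contain a given k-mer is not recoverable from such counts, which is the trade-off a summary of this kind makes.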
kmindex_sum_query
kmindex_sum_query(*args: str) -> LoggedBoundCommand
Query a summarised index (experimental).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `*args` | `str` | Flags passed directly to the underlying subcommand. | `()` |

Returns:

| Type | Description |
|---|---|
| `LoggedBoundCommand` | A plumbum bound command. |