vllm.v1.attention.ops

Modules:

Name	Description
`chunked_prefill_paged_decode`
`common`
`dcp_alltoall`	DCP All-to-All communication backend for attention.
`deepseek_v4_ops`
`flashmla`
`merge_attn_states`
`mqa_logits_triton`	Triton fallback for DeepGEMM's `fp8_mqa_logits` / `fp8_paged_mqa_logits`.
`rocm_aiter_mla_sparse`
`triton_attention_helpers`	Shared `@triton.jit` helpers used by the unified attention kernel.
`triton_decode_attention`	Memory-efficient attention for decoding.
`triton_mla_sparse_kernel`	Triton sparse MLA attention with split-KV for low-batch decode.
`triton_prefill_attention`	Memory-efficient attention for prefill.
`triton_reshape_and_cache_flash`
`triton_turboquant_decode`	Triton fused TurboQuant decode attention.
`triton_turboquant_store`	Fused Triton kernels for TurboQuant KV store.
`triton_unified_attention`
`vit_attn_wrappers`	Ops that make ViT attention compatible with `torch.compile`.
`xpu_mla_sparse`