vllm.v1.attention.ops
Modules:
| Name | Description |
|---|---|
| chunked_prefill_paged_decode | |
| common | |
| dcp_alltoall | DCP All-to-All communication backend for attention. |
| deepseek_v4_ops | |
| flashmla | |
| merge_attn_states | |
| mqa_logits_triton | Triton fallback for DeepGEMM's fp8_mqa_logits / fp8_paged_mqa_logits. |
| rocm_aiter_mla_sparse | |
| triton_attention_helpers | Shared |
| triton_decode_attention | Memory-efficient attention for decoding. |
| triton_mla_sparse_kernel | Triton sparse MLA attention with split-KV for low-batch decode. |
| triton_prefill_attention | Memory-efficient attention for prefill. |
| triton_reshape_and_cache_flash | |
| triton_turboquant_decode | Triton fused TurboQuant decode attention. |
| triton_turboquant_store | Fused Triton kernels for TurboQuant KV store. |
| triton_unified_attention | |
| vit_attn_wrappers | Ops for ViT attention that are compatible with torch.compile. |
| xpu_mla_sparse | |