

Accelerating Milvus on Arm: SVE-Enhanced Vector Search for RAG Applications
Wednesday, June 24, 2026 3:45 PM to 5:15 PM · 1 hr. 30 min. (Europe/Berlin)
Foyer D-G - 2nd Floor
Project Poster
Mixed Precision · Optimizing for Energy and Performance · Parallel Numerical Algorithms · Performance Measurement
Information
Poster is on display.
Efficient Retrieval-Augmented Generation (RAG) systems serve as a cornerstone for numerous AI applications, such as natural language processing and recommendation systems. These systems demand fast and scalable vector search capabilities to perform real-time data retrieval and inference. Milvus, an open-source vector database, has been recognized as a leader in the domain for its robust handling of large-scale vectors and rapid similarity searches. However, its performance on Arm CPUs often encounters limitations due to sub-optimal use of advanced vector instructions, which are critical for maximizing throughput in these high-computation environments.
Our project focuses on addressing these limitations by leveraging Arm's Scalable Vector Extension (SVE), a powerful set of SIMD instructions designed to enhance processing efficiency. We have integrated SVE-optimized distance kernels into Milvus' core vector execution engine, Knowhere. This integration targets key vector search operations, enhancing their computational efficiency and ensuring they fully utilize the capabilities of Arm CPUs.
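To illustrate the approach, the sketch below shows a vector-length-agnostic squared-L2 distance kernel written with Arm SVE intrinsics. It reflects the general style of the kernels contributed to Knowhere rather than the exact upstream code; the function name and build flags are illustrative only.

    #include <arm_sve.h>
    #include <cstddef>

    // Sketch of an SVE squared-L2 distance kernel (illustrative, not the
    // verbatim Knowhere implementation). The loop is vector-length agnostic,
    // so the same binary uses the full SVE register width of the host CPU.
    float l2_sqr_sve(const float* x, const float* y, std::size_t dim) {
        svfloat32_t acc = svdup_n_f32(0.0f);
        for (std::size_t i = 0; i < dim; i += svcntw()) {
            // The predicate covers only the remaining elements, so the tail
            // needs no separate scalar loop.
            svbool_t pg = svwhilelt_b32_u64(i, dim);
            svfloat32_t vx = svld1_f32(pg, x + i);
            svfloat32_t vy = svld1_f32(pg, y + i);
            svfloat32_t d  = svsub_f32_x(pg, vx, vy);
            acc = svmla_f32_m(pg, acc, d, d);   // acc += d * d on active lanes
        }
        return svaddv_f32(svptrue_b32(), acc);  // horizontal reduction
    }
    // Example build on an SVE-capable machine: g++ -O3 -march=armv8-a+sve

Because the loop is predicated rather than unrolled to a fixed width, the same code runs unchanged across SVE implementations with different vector lengths, such as the 256-bit SVE units on AWS Graviton3.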
The innovation presented in our work delivers significant performance improvements, achieving up to 2x faster retrieval times for key index types, including IVF_FLAT, IVF_SQ8, HNSW, and AutoIndex, across multiple data types (FP32, FP16, BF16, and INT8). This acceleration translates into several tangible benefits: higher query throughput allows greater volumes of data to be handled with reduced latency; lower inference latency ensures quicker AI response times, which is crucial for real-time applications; and greater hardware efficiency maximizes the utility of existing infrastructure, leading to cost savings and improved sustainability.
Furthermore, our optimizations were evaluated using VectorDBBench, the official Milvus benchmarking tool, on a 32-core AWS Graviton3 system with a 1M-vector dataset, and they are fully upstreamed in the open-source ecosystem, making them readily accessible to developers and researchers working on SVE-capable Arm hardware. This collaborative approach not only benefits the immediate community of Milvus users but also sets a precedent for future enhancements, fostering innovation across the wide array of applications that rely on vector search technologies.
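As a note on portability, libraries that ship such kernels typically guard them behind a runtime CPU-feature check on Linux/AArch64 so that non-SVE machines fall back to existing code paths. The minimal sketch below shows that common pattern; it is not necessarily Knowhere's exact dispatch mechanism.

    #include <sys/auxv.h>   // getauxval, AT_HWCAP
    #include <asm/hwcap.h>  // HWCAP_SVE (AArch64 Linux)

    // Minimal sketch: report whether the running CPU exposes SVE, so the
    // optimized kernels are dispatched only on SVE-capable Arm hardware
    // (e.g., AWS Graviton3). Illustrative; not Knowhere's exact dispatch code.
    bool cpu_supports_sve() {
        return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
    }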
In summary, our work represents a pivotal advancement in harnessing Arm's SVE capabilities within Milvus, underscoring the transformative potential of optimized vector processing in accelerating RAG workloads. By enabling roughly 2x faster and more efficient vector searches, we pave the way for more responsive and capable AI systems, pushing forward the boundaries of performance in high-dimensional data environments.
Format
on-demand · on-site
