DeepSeek AI Releases DeepEP: An Open-Source EP Communication Library for MoE Model Training and Inference

The Mixture-of-Experts (MoE) architecture lets large language models grow in capacity without a proportional increase in computation, because only a subset of experts is activated for each token. That same sparsity, however, makes efficient data exchange between devices critical: activated experts may sit on different GPUs, and tokens must travel to them and back. Conventional all-to-all communication can become a bottleneck, increasing latency and leaving GPU resources underutilized. In latency-sensitive settings such as real-time inference, even minor delays degrade overall performance, and while low-precision formats like FP8 reduce memory and bandwidth demands, they require careful optimization to maintain model quality. These constraints call for a communication library designed specifically for expert parallelism.
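
To make the communication problem concrete, the sketch below shows the token-routing step that generates this all-to-all traffic in a typical MoE layer. It is a minimal illustration with assumed shapes and names, not DeepSeek's implementation: each token selects its top-k experts, and any token whose experts live on another GPU must be shipped there and back.

```python
import torch

def topk_route(hidden: torch.Tensor, gate_weight: torch.Tensor, k: int = 8):
    """Minimal top-k MoE routing sketch (assumed shapes, not DeepSeek's code).

    hidden:      [num_tokens, hidden_dim] token activations
    gate_weight: [hidden_dim, num_experts] router projection
    """
    logits = hidden @ gate_weight  # [num_tokens, num_experts]
    weights, expert_ids = logits.softmax(dim=-1).topk(k, dim=-1)
    # expert_ids[t] lists the k experts token t must reach; with experts
    # sharded across GPUs, every id owned by another rank implies an
    # all-to-all transfer -- exactly the traffic a library like DeepEP
    # is built to accelerate.
    return weights, expert_ids
```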

DeepSeek AI has introduced DeepEP, a communication library tailored to MoE models and expert parallelism (EP). DeepEP tackles the inefficiencies of token dispatch and aggregation across GPUs, providing high-throughput, low-latency all-to-all GPU kernels (the MoE dispatch and combine kernels) that streamline data exchange during training and inference. It also supports low-precision operation, including FP8, in line with techniques described in the DeepSeek-V3 paper, and it addresses the challenges of scaling MoE architectures in both intranode and internode environments.
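
The sketch below shows the dispatch/combine pattern itself, written against PyTorch's stock all_to_all_single collective. This is not DeepEP's API; it is the baseline pattern that DeepEP's fused kernels replace with NVLink- and RDMA-optimized implementations. For simplicity it assumes one expert per rank and top-1 routing.

```python
import torch
import torch.distributed as dist

def dispatch_and_combine(x: torch.Tensor, dest_rank: torch.Tensor,
                         expert_fn, world_size: int) -> torch.Tensor:
    """x: [num_tokens, hidden]; dest_rank: [num_tokens] owning rank per token."""
    # Sort tokens by destination so each rank's slice is contiguous.
    order = torch.argsort(dest_rank)
    x_sorted = x[order]
    send_counts = torch.bincount(dest_rank, minlength=world_size)

    # Exchange per-rank counts so receivers can size their buffers.
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)

    # Dispatch: every token travels to the rank hosting its expert.
    recv = x.new_empty(int(recv_counts.sum()), x.size(1))
    dist.all_to_all_single(recv, x_sorted,
                           output_split_sizes=recv_counts.tolist(),
                           input_split_sizes=send_counts.tolist())

    out = expert_fn(recv)  # expert computation on the receiving rank

    # Combine: reverse the exchange to return results to token owners.
    back = torch.empty_like(x_sorted)
    dist.all_to_all_single(back, out,
                           output_split_sizes=send_counts.tolist(),
                           input_split_sizes=recv_counts.tolist())
    return back[torch.argsort(order)]  # restore original token order
```

Even this naive version makes the cost structure visible: a sort, a size exchange, and two full collectives per MoE layer. DeepEP fuses these steps into dedicated dispatch and combine kernels, and its FP8 support shrinks the bytes on the wire.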

Technical Overview and Benefits

DeepEP features two main types of kernels designed for different operational needs:

  • Normal Kernels: Optimized for high-throughput scenarios, these kernels efficiently forward data across GPUs using NVLink and RDMA technologies. Tests on Hopper GPUs with NVLink have shown throughput of approximately 153 GB/s for intranode communication, while internode tests using CX7 InfiniBand achieve stable performance around 43–47 GB/s.
  • Low-Latency Kernels: For tasks requiring quick responses, these kernels utilize RDMA and are designed for the small batch sizes typical of real-time inference, achieving latencies as low as 163 microseconds for dispatch operations involving eight experts (a back-of-the-envelope check on this figure follows the list).

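As a quick plausibility check on the figures above, the following back-of-the-envelope calculation uses illustrative assumptions: a 7168-dimensional hidden state (as in DeepSeek-V3) and FP8 dispatch payloads at one byte per element.

```python
# Back-of-the-envelope check on the cited figures. Illustrative
# assumptions: hidden size 7168 (as in DeepSeek-V3), FP8 dispatch
# payloads (1 byte per element).
tokens, topk, hidden, bytes_per_elem = 128, 8, 7168, 1

payload_bytes = tokens * topk * hidden * bytes_per_elem
latency_s = 163e-6  # reported low-latency dispatch time

print(f"payload: {payload_bytes / 1e6:.2f} MB")                            # ~7.34 MB
print(f"effective bandwidth: {payload_bytes / latency_s / 1e9:.0f} GB/s")  # ~45 GB/s
```

Under these assumptions, the implied ~45 GB/s lands inside the 43–47 GB/s internode range quoted above, consistent with the dispatch being bandwidth-bound at this batch size.
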
DeepEP also offers adaptive configurations, allowing users to adjust parameters like the number of streaming multiprocessors (SMs) in use and manage traffic isolation. Adaptive routing in low-latency kernels helps distribute network traffic evenly under heavy loads, enhancing robustness.
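
The repository exposes such tuning through its Python interface; the snippet below is only a hypothetical sketch of what capping the SM budget might look like, and its names are assumptions rather than confirmed API (see the DeepEP repository for the actual interface).

```python
# Hypothetical sketch: the names below are assumptions, not confirmed
# DeepEP API. The idea is to cap how many SMs the communication kernels
# occupy, leaving the rest of the GPU to concurrent compute kernels.
import deep_ep  # module name assumed from the DeepEP repository

deep_ep.Buffer.set_num_sms(24)  # assumed hook: budget 24 SMs for communication
```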

Performance Insights and Practical Outcomes

In the published benchmarks, normal kernels reach intranode throughput of up to 153 GB/s over NVLink, while internode setups sustain roughly 43–47 GB/s over RDMA. Low-latency kernels handle a decoding-style batch of 128 tokens routed to eight experts with dispatch latency as low as 163 microseconds. In practice, these optimizations mean faster response times during inference decoding and higher throughput in training, allowing larger batch sizes and smoother computation-communication overlap.
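
One way to picture the computation-communication overlap mentioned above is the classic two-stream pattern sketched below. This is a generic PyTorch illustration rather than DeepEP's own mechanism, and the function names are placeholders.

```python
import torch

comm_stream = torch.cuda.Stream()  # side stream dedicated to communication

def overlapped_step(dispatch_fn, expert_fn, x_curr, x_next):
    """Overlap the next micro-batch's dispatch with current expert compute.

    dispatch_fn / expert_fn stand in for the communication and expert
    steps of an MoE layer (placeholders, for illustration only).
    """
    with torch.cuda.stream(comm_stream):
        dispatched_next = dispatch_fn(x_next)  # communication on side stream
    y_curr = expert_fn(x_curr)                 # compute on the default stream
    # Make later work that consumes dispatched_next wait for the transfer.
    torch.cuda.current_stream().wait_stream(comm_stream)
    return y_curr, dispatched_next
```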

Conclusion

DeepEP is a significant advancement in large-scale language model deployment. By addressing key communication bottlenecks in MoE architectures, it enhances training and inference efficiency. Its dual-kernel approach—one for high throughput and another for low latency—provides flexibility for various applications. With support for low-precision operations and adaptive configuration mechanisms, DeepEP serves as a practical tool for optimizing expert parallelism.

In summary, DeepSeek AI’s release of DeepEP represents a well-engineered solution that balances performance with resource efficiency, paving the way for more scalable and responsive AI models in both academic research and real-world applications.
