DeepSeek AI Releases DeepEP: An Open-Source EP Communication Library for MoE Model Training and Inference

The Mixture-of-Experts (MoE) architecture lets large language models grow in capacity without a proportional increase in computation, because only a subset of experts is activated for each token. That same sparsity, however, makes efficient data exchange between devices critical: activated experts may sit on different GPUs, and tokens must travel to them and back. Conventional all-to-all communication can become a bottleneck, increasing latency and leaving GPU resources underutilized. In latency-sensitive settings such as real-time inference, even minor delays degrade overall performance, and while low-precision formats like FP8 reduce memory and bandwidth demands, they require careful optimization to maintain model quality. These constraints call for a communication library designed specifically for expert parallelism.
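
To make the communication problem concrete, the sketch below shows the token-routing step that generates this all-to-all traffic in a typical MoE layer. It is a minimal illustration with assumed shapes and names, not DeepSeek's implementation: each token selects its top-k experts, and any token whose experts live on another GPU must be shipped there and back.

```python
import torch

def topk_route(hidden: torch.Tensor, gate_weight: torch.Tensor, k: int = 8):
    """Minimal top-k MoE routing sketch (assumed shapes, not DeepSeek's code).

    hidden:      [num_tokens, hidden_dim] token activations
    gate_weight: [hidden_dim, num_experts] router projection
    """
    logits = hidden @ gate_weight  # [num_tokens, num_experts]
    weights, expert_ids = logits.softmax(dim=-1).topk(k, dim=-1)
    # expert_ids[t] lists the k experts token t must reach; with experts
    # sharded across GPUs, every id owned by another rank implies an
    # all-to-all transfer -- exactly the traffic a library like DeepEP
    # is built to accelerate.
    return weights, expert_ids
```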

DeepSeek AI has introduced DeepEP, a communication library tailored to MoE models and expert parallelism (EP). DeepEP tackles the inefficiencies of token dispatch and aggregation across GPUs, providing high-throughput, low-latency all-to-all GPU kernels (the MoE dispatch and combine kernels) that streamline data exchange during training and inference. It also supports low-precision operation, including FP8, in line with techniques described in the DeepSeek-V3 paper, and it addresses the challenges of scaling MoE architectures in both intranode and internode environments.
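
The sketch below shows the dispatch/combine pattern itself, written against PyTorch's stock all_to_all_single collective. This is not DeepEP's API; it is the baseline pattern that DeepEP's fused kernels replace with NVLink- and RDMA-optimized implementations. For simplicity it assumes one expert per rank and top-1 routing.

```python
import torch
import torch.distributed as dist

def dispatch_and_combine(x: torch.Tensor, dest_rank: torch.Tensor,
                         expert_fn, world_size: int) -> torch.Tensor:
    """x: [num_tokens, hidden]; dest_rank: [num_tokens] owning rank per token."""
    # Sort tokens by destination so each rank's slice is contiguous.
    order = torch.argsort(dest_rank)
    x_sorted = x[order]
    send_counts = torch.bincount(dest_rank, minlength=world_size)

    # Exchange per-rank counts so receivers can size their buffers.
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)

    # Dispatch: every token travels to the rank hosting its expert.
    recv = x.new_empty(int(recv_counts.sum()), x.size(1))
    dist.all_to_all_single(recv, x_sorted,
                           output_split_sizes=recv_counts.tolist(),
                           input_split_sizes=send_counts.tolist())

    out = expert_fn(recv)  # expert computation on the receiving rank

    # Combine: reverse the exchange to return results to token owners.
    back = torch.empty_like(x_sorted)
    dist.all_to_all_single(back, out,
                           output_split_sizes=send_counts.tolist(),
                           input_split_sizes=recv_counts.tolist())
    return back[torch.argsort(order)]  # restore original token order
```

Even this naive version makes the cost structure visible: a sort, a size exchange, and two full collectives per MoE layer. DeepEP fuses these steps into dedicated dispatch and combine kernels, and its FP8 support shrinks the bytes on the wire.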

Technical Overview and Benefits

DeepEP features two main types of kernels designed for different operational needs:

  • Normal Kernels: Optimized for high-throughput scenarios, these kernels efficiently forward data across GPUs using NVLink and RDMA technologies. Tests on Hopper GPUs with NVLink have shown throughput of approximately 153 GB/s for intranode communication, while internode tests using CX7 InfiniBand achieve stable performance around 43–47 GB/s.
  • Low-Latency Kernels: For tasks requiring quick responses, these kernels utilize RDMA and are designed for the small batch sizes typical of real-time inference, achieving latencies as low as 163 microseconds for dispatch operations involving eight experts (a back-of-the-envelope check on this figure follows the list).

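As a quick plausibility check on the figures above, the following back-of-the-envelope calculation uses illustrative assumptions: a 7168-dimensional hidden state (as in DeepSeek-V3) and FP8 dispatch payloads at one byte per element.

```python
# Back-of-the-envelope check on the cited figures. Illustrative
# assumptions: hidden size 7168 (as in DeepSeek-V3), FP8 dispatch
# payloads (1 byte per element).
tokens, topk, hidden, bytes_per_elem = 128, 8, 7168, 1

payload_bytes = tokens * topk * hidden * bytes_per_elem
latency_s = 163e-6  # reported low-latency dispatch time

print(f"payload: {payload_bytes / 1e6:.2f} MB")                            # ~7.34 MB
print(f"effective bandwidth: {payload_bytes / latency_s / 1e9:.0f} GB/s")  # ~45 GB/s
```

Under these assumptions, the implied ~45 GB/s lands inside the 43–47 GB/s internode range quoted above, consistent with the dispatch being bandwidth-bound at this batch size.
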
DeepEP also offers adaptive configurations, allowing users to adjust parameters like the number of streaming multiprocessors (SMs) in use and manage traffic isolation. Adaptive routing in low-latency kernels helps distribute network traffic evenly under heavy loads, enhancing robustness.
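
The repository exposes such tuning through its Python interface; the snippet below is only a hypothetical sketch of what capping the SM budget might look like, and its names are assumptions rather than confirmed API (see the DeepEP repository for the actual interface).

```python
# Hypothetical sketch: the names below are assumptions, not confirmed
# DeepEP API. The idea is to cap how many SMs the communication kernels
# occupy, leaving the rest of the GPU to concurrent compute kernels.
import deep_ep  # module name assumed from the DeepEP repository

deep_ep.Buffer.set_num_sms(24)  # assumed hook: budget 24 SMs for communication
```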

Performance Insights and Practical Outcomes

In the published benchmarks, normal kernels reach intranode throughput of up to 153 GB/s over NVLink, while internode setups sustain roughly 43–47 GB/s over RDMA. Low-latency kernels handle a decoding-style batch of 128 tokens routed to eight experts with dispatch latency as low as 163 microseconds. In practice, these optimizations mean faster response times during inference decoding and higher throughput in training, allowing larger batch sizes and smoother computation-communication overlap.
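
One way to picture the computation-communication overlap mentioned above is the classic two-stream pattern sketched below. This is a generic PyTorch illustration rather than DeepEP's own mechanism, and the function names are placeholders.

```python
import torch

comm_stream = torch.cuda.Stream()  # side stream dedicated to communication

def overlapped_step(dispatch_fn, expert_fn, x_curr, x_next):
    """Overlap the next micro-batch's dispatch with current expert compute.

    dispatch_fn / expert_fn stand in for the communication and expert
    steps of an MoE layer (placeholders, for illustration only).
    """
    with torch.cuda.stream(comm_stream):
        dispatched_next = dispatch_fn(x_next)  # communication on side stream
    y_curr = expert_fn(x_curr)                 # compute on the default stream
    # Make later work that consumes dispatched_next wait for the transfer.
    torch.cuda.current_stream().wait_stream(comm_stream)
    return y_curr, dispatched_next
```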

Conclusion

DeepEP is a significant advancement in large-scale language model deployment. By addressing key communication bottlenecks in MoE architectures, it enhances training and inference efficiency. Its dual-kernel approach—one for high throughput and another for low latency—provides flexibility for various applications. With support for low-precision operations and adaptive configuration mechanisms, DeepEP serves as a practical tool for optimizing expert parallelism.

In summary, DeepSeek AI’s release of DeepEP represents a well-engineered solution that balances performance with resource efficiency, paving the way for more scalable and responsive AI models in both academic research and real-world applications.
