Researchers have proposed SMPLer-X, a generalist foundation model for 3D/4D human motion capture from monocular inputs. The model generalizes well across scenarios and outperforms previous results on major benchmarks. The research highlights the need for more diverse and extensive datasets for accurate human pose and shape estimation. The researchers also emphasize the value of combining multiple datasets and offer advice for future dataset collection. SMPLer-X serves as a basis for future research in the field and can be adapted into strong domain-specific specialist models.
Introducing SMPLer-X: A Generalist Foundation Model for 3D/4D Human Motion Capture from Monocular Inputs
Expressive human pose and shape estimation (EHPS) from monocular images or videos has the potential to benefit industries such as animation, gaming, and fashion. To represent the body, face, and hands accurately, the field relies on parametric human models such as SMPL-X. However, current state-of-the-art approaches are trained on only a small number of datasets, which hurts their performance across varied scenarios and limits their ability to generalize.
To address this challenge, researchers from Nanyang Technological University, SenseTime Research, Shanghai AI Laboratory, The University of Tokyo, and the International Digital Economy Academy (IDEA) have conducted in-depth research to build reliable and globally applicable models for EHPS. They analyzed 32 datasets and created the first systematic benchmark for EHPS, highlighting the inconsistencies between benchmarks and the need for data scaling to address domain gaps.
Their research emphasizes the value of using multiple datasets with complementary characteristics and provides helpful advice for future dataset gathering. They also developed SMPLer-X, a generalist foundation model trained using a variety of datasets, which demonstrates remarkable performance and generalization capabilities across various scenarios.
SMPLer-X outperforms previous results and challenges the practice of training on a restricted set of datasets, reducing primary errors on major benchmarks from over 110 mm to below 70 mm. It also adapts successfully to new scenarios. The researchers further show that fine-tuning their generalist foundation models turns them into domain-specific experts with strong performance across the board.
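To give a feel for what an error in millimetres means here, the following is a minimal sketch, assuming the error is a mean Euclidean distance between predicted and ground-truth 3D points (joints or mesh vertices) expressed in metres. The function names are hypothetical, and the sketch omits the alignment steps and exact metric definitions used in the paper; it is an illustration, not the authors' evaluation code.

```python
import math
from typing import Sequence

Point = Sequence[float]  # one (x, y, z) position, assumed to be in metres


def mean_point_error_mm(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean Euclidean distance between matched 3D points, in millimetres.

    Hypothetical helper: no root alignment or Procrustes alignment is applied.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return 1000.0 * total / len(pred)  # metres -> millimetres


def relative_reduction(before_mm: float, after_mm: float) -> float:
    """Fraction by which an error shrank, e.g. 0.25 for a 25% reduction."""
    return (before_mm - after_mm) / before_mm


# Toy example with two points: one is off by 10 cm, the other by 5 cm.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.1), (1.0, 0.05, 0.0)]
print(round(mean_point_error_mm(pred, gt), 1))  # 75.0

# The figures quoted above: dropping from 110 mm to 70 mm.
print(f"{relative_reduction(110.0, 70.0):.0%}")  # 36%
```

Because the article says "over 110 mm" and "below 70 mm", the relative reduction computed from those two endpoints (about 36%) is a lower bound on the improvement described.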
In addition to their contributions in constructing a systematic benchmark and investigating data and model scaling, the researchers employ a data selection methodology that enables their specialized models to achieve state-of-the-art performance on specific benchmarks.
To learn more about the research, see the paper, project page, and GitHub repository. The work shows how a generalist model such as SMPLer-X can be applied to 3D/4D human motion capture from monocular inputs.