Streamlining ETL data processing at Talent.com with Amazon SageMaker

Talent.com, founded in 2011, runs a unified job search platform that aggregates more than 30 million job listings across more than 75 countries, spanning multiple languages and industries. Working with AWS, it developed a job recommendation engine based on deep learning. The engine is fed by a large-scale data processing pipeline that reads raw JSON Lines files from Amazon S3 and extracts and refines features for the recommendation model. The pipeline significantly shortened the time needed to bring the ML system to production.

Solution Overview

Talent.com, in collaboration with AWS, has built a job recommendation engine using Amazon SageMaker. The engine handles more than 30 million job listings from various sources and uses deep learning to provide personalized job recommendations to users. To process this volume of data, the team developed a three-phase ETL (extract, transform, and load) pipeline that uses Amazon SageMaker Processing, AWS Glue, Amazon Athena, and Python libraries for feature extraction and data management.

Phase 1: Process Raw JSONL Files

In the first phase, Amazon SageMaker Processing jobs handle the raw JSONL files for a specified day, performing feature extraction and data compaction. The JSONL files are processed in parallel, and the extracted features are saved as Parquet files and uploaded to Amazon S3. Storing the features as Parquet enables efficient crawling and SQL queries in the subsequent phases.
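
As a rough illustration of this phase, the sketch below reads every JSONL file in a directory in parallel and flattens each record into a feature row. It is not the authors' code: the field names (id, title, country), the function names, and the use of a thread pool are assumptions made for the example, and the Parquet step assumes pandas and pyarrow are installed in the processing container.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_features(record: dict) -> dict:
    """Hypothetical feature extraction: keep a few fields and normalise them."""
    return {
        "job_id": record.get("id"),
        "title": (record.get("title") or "").strip().lower(),
        "country": (record.get("country") or "").upper(),
    }


def process_jsonl_file(path: Path) -> list:
    """Read one JSON Lines file and return one feature row per non-empty line."""
    rows = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                rows.append(extract_features(json.loads(line)))
    return rows


def process_day(input_dir: Path, max_workers: int = 4) -> list:
    """Process all *.jsonl files for one day in parallel and merge the rows.

    A thread pool keeps the sketch simple; CPU-heavy feature extraction
    would more likely use a process pool.
    """
    files = sorted(input_dir.glob("*.jsonl"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_file = list(pool.map(process_jsonl_file, files))
    return [row for rows in per_file for row in rows]


def write_parquet(rows: list, output_path: str) -> None:
    """Compact the rows into a single Parquet file (assumes pandas and pyarrow)."""
    import pandas as pd  # third-party, imported lazily

    pd.DataFrame(rows).to_parquet(output_path, index=False)
```

Inside a SageMaker Processing job, process_day would typically be pointed at the directory where the job's S3 input is mounted, and write_parquet at the directory that the job uploads back to Amazon S3; both paths are set in the job configuration.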

Phase 2: Crawl Processed Data Using AWS Glue

Once the raw data for multiple days has been processed, an AWS Glue crawler crawls the processed data in Amazon S3 and creates a table that can be queried with Amazon Athena. Exposing the processed data as a table makes it straightforward to manage large volumes of features for subsequent model training.
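
One way to set up such a crawler is through the AWS Glue API in boto3. The sketch below is an assumption about how this could look, not the configuration used by Talent.com: the crawler name, IAM role ARN, database name, and S3 path are placeholders. The Glue client is passed in as a parameter so that the helper can be exercised without AWS credentials.

```python
def build_crawler_config(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build the keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # Point the crawler at the S3 prefix that holds the Parquet files.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client, config: dict) -> None:
    """Register the crawler and start a run; the table appears when the run ends."""
    glue_client.create_crawler(**config)
    glue_client.start_crawler(Name=config["Name"])


# Example usage (requires boto3 and AWS credentials; all values are placeholders):
#
#   import boto3
#   config = build_crawler_config(
#       name="processed-features-crawler",
#       role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
#       database="example_features_db",
#       s3_path="s3://example-bucket/processed/",
#   )
#   create_and_start_crawler(boto3.client("glue"), config)
```

Because new days land under the same S3 prefix, the crawler can be started again after each batch of processing jobs to pick up the new data.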

Phase 3: Load Processed Features for Training

Processed features for a specified date range are loaded from the Athena table using SQL and fed directly into training of the job recommender model. The solution simplifies feature processing and loading, giving both data scientists and ML engineers a quick path to production.
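
A minimal sketch of the date-range load might look like the following. The table name, the partition column dt (assumed to hold ISO-formatted date strings), and the use of the AWS SDK for pandas (awswrangler) to run the query are all assumptions for illustration; the original pipeline may query Athena differently.

```python
from datetime import date


def build_feature_query(table: str, start: date, end: date, date_column: str = "dt") -> str:
    """Build a SQL query selecting features between two dates, inclusive.

    Assumes date_column stores ISO-formatted strings such as '2023-01-31',
    so string comparison matches chronological order.
    """
    for identifier in (table, date_column):
        # Identifiers are interpolated into the SQL text, so keep them simple.
        if not identifier.isidentifier():
            raise ValueError(f"invalid identifier: {identifier!r}")
    if start > end:
        raise ValueError("start date must not be after end date")
    return (
        f"SELECT * FROM {table} "
        f"WHERE {date_column} BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"
    )


def load_features(sql: str, database: str):
    """Run the query on Athena and return a pandas DataFrame (assumes awswrangler)."""
    import awswrangler as wr  # third-party, imported lazily

    return wr.athena.read_sql_query(sql=sql, database=database)
```

For example, build_feature_query("job_features", date(2023, 1, 1), date(2023, 1, 31)) produces a query covering January 2023, and the resulting DataFrame can be handed to the training code.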

Solution Benefits

The implemented solution offers several advantages: simplified implementation, a quick path to production, reusability, efficiency, and support for incremental updates. It enables Talent.com to process large volumes of data, using the ETL pipeline to create training data and deploy the recommendation system into production within a short timeframe. The resulting recommendation system delivered an 8.6% increase in clickthrough rate in A/B testing, highlighting its tangible impact on connecting users with relevant job opportunities.

Conclusion

The ETL pipeline outlined in this post has played a crucial role in enabling Talent.com to build and deploy its job recommendation system efficiently. Using Amazon SageMaker Processing jobs, the pipeline has streamlined feature extraction and provided the necessary infrastructure for developing and deploying ML models at scale. The authors encourage readers to explore how this pipeline could apply to other use cases, emphasising its reusability and efficiency in streamlining AI and ML workflows.

About the Authors

The team behind this solution includes experts from both the Amazon Machine Learning Solutions Lab and Talent.com, with experience in AI, machine learning, and technology solutions. Their collaboration produced a practical AI solution that helps Talent.com connect users with relevant jobs and improves user engagement.
