Efficient feature selection via genetic algorithms

Genetic algorithms are an efficient tool for feature selection in large datasets: they minimize an objective function via population-based evolution and selection. A comparison with other methods illustrates both the potential and the computational demands of genetic algorithms. The full article below provides the details.

Efficient Feature Selection via Genetic Algorithms

Using evolutionary algorithms for fast feature selection with large datasets

This is the final part of a two-part series about feature selection. Part 1 will be linked here when it’s published.

Brief recap: when fitting a model to a dataset, you may want to select a subset of the features (as opposed to using all features), for various reasons. But even if you have a clear objective function to rank combinations of features, the search may take a long time if the number of features N is large: there are 2^N possible subsets, so for N = 50 that is already about 10^15 combinations. Brute-force search generally does not work beyond a few dozen features; heuristic algorithms are needed to perform a more efficient search.

If you have N features, what you’re looking for is an N-length vector [1, 1, 0, 0, 0, 1, …] with values from {0, 1}. Each vector component corresponds to a feature. 0 means the feature is rejected, 1 means the feature is selected. You need to find the vector that minimizes the cost / objective function you’re using.
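To make this concrete, here is a minimal sketch of how such a vector (a mask) could drive the objective function. The names objective() and fit_and_score() are hypothetical placeholders, not the article's actual code, and X, y stand for the dataset:

```python
import numpy as np

# Hypothetical placeholder for the real objective: keep only the columns
# where the mask is 1, fit the model on them, and return a score to
# minimize (e.g., an information criterion or a validation error).
def objective(mask):
    selected = X[:, np.asarray(mask, dtype=bool)]  # X, y defined elsewhere
    return fit_and_score(selected, y)              # assumed helper, lower is better

mask = [1, 1, 0, 0, 0, 1]  # with N=6, this selects features 0, 1, and 5
```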

In this article, we will look at another evolutionary technique: genetic algorithms. The context (dataset, model, objective) remains the same.

GA — Genetic Algorithms

Genetic algorithms are inspired by biological evolution and natural selection. In nature, living beings are (loosely speaking) selected for the genes (traits) that facilitate survival and reproductive success, in the context of the environment where they live.

Here’s GA in a nutshell. Start by generating a population of individuals (vectors), each vector of length N. The vector component values (genes) are randomly chosen from {0, 1}. After the population is created, evaluate each individual via the objective function. Now perform selection: keep the individuals with the best objective values, and discard those with the worst values. Once the best individuals have been selected, and the less fit ones have been discarded, it’s time to introduce variation in the gene pool via two techniques: crossover, which recombines the genes of two parents to produce a child, and mutation, which randomly flips some genes.

After all that, the algorithm loops back: the individuals are evaluated again via the objective function, selection occurs, then crossover and mutation, and so on. Various stopping criteria can be used: the loop may break if the objective function stops improving for some number of generations, or you may set a hard stop on the total number of generations. Either way, the individuals with the best objective values are the “solutions” to the problem.
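Here is a minimal sketch of that loop in plain NumPy, assuming single-point crossover, bit-flip mutation, and patience-based early stopping; all parameter values are illustrative, not the article's:

```python
import numpy as np

rng = np.random.default_rng(42)

def run_ga(objective, n_features, pop_size=100, n_keep=50,
           crossover_prob=0.7, mutation_prob=0.05,
           max_generations=1000, patience=50):
    """Sketch of a GA loop; objective maps a 0/1 mask to a float (lower is better)."""
    # Random initial population: one 0/1 gene per feature
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    best_val, stall = np.inf, 0

    for _ in range(max_generations):
        # Evaluate every individual via the objective function
        scores = np.array([objective(ind) for ind in pop])

        # Selection: keep the n_keep individuals with the best objective values
        order = np.argsort(scores)
        pop = pop[order[:n_keep]]

        # Early stopping if the best objective stops improving
        if scores[order[0]] < best_val:
            best_val, stall = scores[order[0]], 0
        else:
            stall += 1
            if stall >= patience:
                break

        # Refill the population with children made via crossover + mutation
        children = []
        while len(children) < pop_size - n_keep:
            i, j = rng.integers(n_keep, size=2)
            child = pop[i].copy()
            if rng.random() < crossover_prob:               # single-point crossover
                cut = rng.integers(1, n_features)
                child[cut:] = pop[j][cut:]
            flips = rng.random(n_features) < mutation_prob  # bit-flip mutation
            child[flips] ^= 1
            children.append(child)
        pop = np.vstack([pop, np.array(children)])

    return pop[0], best_val  # best individual seen and its objective value
```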

GA has several hyperparameters you can tune: population size, mutation probabilities, crossover probability, selection strategies, etc.

GA for feature selection, in code

Here’s simplified GA code that can be used for feature selection. It uses the deap library, which is very powerful but whose learning curve may be steep. This simple version, however, should be clear enough.

The code creates the objects that define an individual and the whole population, along with the strategies used for evaluation (objective function), crossover / mating, mutation, and selection. It starts with a population of 300 individuals, and then calls eaSimple() (a canned sequence of crossover, mutation, selection) which runs for only 10 generations, for simplicity.
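A sketch of that setup follows, assuming the placeholder objective() from earlier and a hypothetical N_FEATURES constant; the actual notebook code differs in its details:

```python
import random

import numpy as np
from deap import algorithms, base, creator, tools

N_FEATURES = 100  # placeholder: the number of features in the dataset

def evaluate(individual):
    # deap expects a tuple of fitness values; objective() is the
    # placeholder cost function from earlier (lower is better)
    mask = np.array(individual, dtype=bool)
    if not mask.any():
        return (np.inf,)  # reject the empty feature set
    return (objective(mask),)

# Single-objective minimization: the fitness weight is -1.0
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
# An individual is a list of N random 0/1 genes
toolbox.register("attr_bool", random.randint, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_bool, n=N_FEATURES)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

# Strategies for evaluation (objective function), crossover / mating,
# mutation, and selection
toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=1.0 / N_FEATURES)
toolbox.register("select", tools.selTournament, tournsize=3)

# Population of 300 individuals, evolved by eaSimple() for 10 generations
pop = toolbox.population(n=300)
pop, _log = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2,
                                ngen=10, verbose=False)

best = tools.selBest(pop, k=1)[0]
print("best objective:", best.fitness.values[0])
```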

This simple code is easy to understand, but inefficient. Check the notebook in the repository for a more complex, optimized version of the GA code, which I am not going to quote here. The GA results in the comparison below come from running that optimized code for 1000 generations.

Comparison between methods

We’ve tried three different techniques: SFS, CMA-ES, and GA. How do they compare in terms of the best objective found, and the time it took to find it?

These tests were performed on an AMD Ryzen 7 5800X3D (8/16 cores) machine, running Ubuntu 22.04, and Python 3.11.7. SFS and GA are running the objective function via a multiprocessing pool with 16 workers. CMA-ES is single-process — running it multi-process did not seem to provide significant improvements, but I’m sure that could change if more work is dedicated to making the algorithm parallel.
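With deap, parallel evaluation amounts to replacing the toolbox's map function with a multiprocessing pool's; a sketch of that wiring, assuming the toolbox from the setup above:

```python
import multiprocessing

if __name__ == "__main__":
    # Distribute objective evaluations across 16 worker processes:
    # deap calls toolbox.map() whenever it evaluates a population.
    pool = multiprocessing.Pool(processes=16)
    toolbox.register("map", pool.map)

    # ... run eaSimple() or the custom GA loop here ...

    pool.close()
    pool.join()
```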

These are the results, side by side. For SFS, the run time is the total run time; for CMA-ES and GA, it is the time to the best solution. Lower is better in every column.

| Method | Run time (sec) | Objective invocations | Best objective value |
|--------|----------------|-----------------------|----------------------|
| SFS    | 44.850         | 22791                 | 33708.9860           |
| GA     | 157.604        | 600525                | 33705.5696           |
| CMA-ES | 46.912         | 20000                 | 33703.0705           |

CMA-ES, running in a single process, found the best objective value of all. Its run time was on par with SFS. It invoked the objective function only 20k times, the fewest of all methods. And it could probably be improved even more.

GA beat SFS on the objective value, running the objective function on as many CPU cores as were available, but it’s by far the slowest method. It invoked the objective function more than an order of magnitude more times than the other methods. Further hyperparameter tuning may improve the outcome.

SFS is quick (running on all CPU cores), but its performance is modest. It’s also the simplest algorithm by far.

If you just want a quick estimate of the best feature set, using a simple algorithm, SFS is not too bad.

On the other hand, if you want the absolute best objective value, CMA-ES seems to be the top choice.
