Graph methods can be used for inference on the tabular datasets common in machine learning tasks. Representing a table as a graph opens up new possibilities for prediction and inference. This article demonstrates the approach through examples and highlights the advantages of using graphs in data science.
**Graph Data Science for Tabular Data: A Practical AI Solution for Middle Managers**
Graph methods are not limited to data with an obvious graphical structure. They can also be applied to the tabular datasets used in machine learning tasks, opening up new possibilities for inference. By representing tabular data as a graph, we can use the rich network of relationships between instances to improve estimates of the probability distribution of an unknown attribute value.
To demonstrate this, let’s consider the example of the Credit Approval dataset. The objective is to predict the value of Approval based on the values of other attributes. Instead of using traditional classification algorithms, let’s explore how we can approach this using graphs.
**Graph Representation**
To represent the data as a graph, we assign one node to each instance and one node to each possible attribute value. An edge connects an instance node to an attribute value node whenever the instance takes that value in the table. Two instances that share an attribute value are therefore linked through that value's node, so the graph captures how similar instances are. Here is the graph representation of the Credit Approval dataset.
![Graph representation for Credit Approval dataset](image-link)
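As a minimal sketch of this representation, the Python snippet below builds the instance/value graph as a plain adjacency dictionary. The four-row table is a hypothetical stand-in, not the actual Credit Approval data, and the names `build_graph` and `shared_values` are illustrative rather than part of any library.

```python
from collections import defaultdict

# Hypothetical toy table (NOT the article's Credit Approval data).
rows = [
    {"Income": "Low",  "Education": "Graduate", "Approval": "No"},
    {"Income": "Low",  "Education": "College",  "Approval": "No"},
    {"Income": "High", "Education": "Graduate", "Approval": "Yes"},
    {"Income": "High", "Education": "College",  "Approval": "Yes"},
]

def build_graph(rows):
    """Return an undirected graph {node: set(neighbours)}.

    Instance nodes are ("instance", i); value nodes are (attribute, value).
    """
    graph = defaultdict(set)
    for i, row in enumerate(rows):
        inst = ("instance", i)
        for attr, value in row.items():
            if value is None:  # assumption: a missing value simply gets no edge
                continue
            graph[inst].add((attr, value))
            graph[(attr, value)].add(inst)
    return dict(graph)

def shared_values(graph, i, j):
    """Attribute value nodes that instances i and j are both connected to."""
    return graph[("instance", i)] & graph[("instance", j)]

graph = build_graph(rows)
print(sorted(graph[("Income", "Low")]))  # instances whose Income is Low
print(shared_values(graph, 0, 1))        # what instances 0 and 1 have in common
```

Because every edge joins an instance node to a value node, the graph is bipartite, and the number of value nodes two instances share gives a simple measure of their similarity.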
**Message Passing**
To predict unknown attribute values, we use the concept of message passing. The procedure is as follows:
1. Initiate a message with a value of 1 at the starting node.
2. Let the starting node pass the message to each connected node.
3. Each node that receives a message passes it (diluted by a factor k) to other connected nodes.
4. Continue message passing until a target node is reached or there are no further nodes to pass the message to.
After message passing is completed, each node in the graph will have received zero or more messages. Sum these values for each node belonging to the target attribute and normalize them. Interpret the normalized values as probabilities. These probabilities can be used to predict the unknown attribute value or impute a random value drawn from the distribution.
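The procedure above can be sketched in Python as follows. This is one possible reading of the steps, not a reference implementation. It assumes that a message never revisits a node already on its own path, that "diluted by a factor k" means multiplied by k (0.5 here), and that a `max_hops` cap is acceptable to bound the work. The toy table and the function names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy table (NOT the article's Credit Approval data).
rows = [
    {"Income": "Low",  "Education": "Graduate", "Approval": "No"},
    {"Income": "Low",  "Education": "College",  "Approval": "No"},
    {"Income": "High", "Education": "Graduate", "Approval": "Yes"},
    {"Income": "High", "Education": "College",  "Approval": "Yes"},
]

def build_graph(rows):
    """Instance nodes are ("instance", i); value nodes are (attribute, value)."""
    graph = defaultdict(set)
    for i, row in enumerate(rows):
        for attr, value in row.items():
            graph[("instance", i)].add((attr, value))
            graph[(attr, value)].add(("instance", i))
    return dict(graph)

def message_passing(graph, start, target_attr, k=0.5, max_hops=6):
    """Return normalized message totals received by target_attr's value nodes."""
    received = defaultdict(float)
    # Stack entries: (node, message value on arrival, nodes on this path so far).
    stack = [(start, 1.0, frozenset([start]))]
    while stack:
        node, value, path = stack.pop()
        hops = len(path) - 1
        if hops >= max_hops:          # assumed cap, not part of the article
            continue
        # Step 2: the start node passes the message on undiluted.
        # Step 3: every later node passes it on diluted by k.
        out = value if hops == 0 else value * k
        for nxt in graph[node]:
            if nxt in path:           # assumption: no revisiting along a path
                continue
            if nxt[0] == target_attr:
                received[nxt[1]] += out   # step 4: stop at a target node
            else:
                stack.append((nxt, out, path | {nxt}))
    total = sum(received.values())
    return {v: m / total for v, m in received.items()} if total else {}

graph = build_graph(rows)
probs = message_passing(graph, ("Income", "Low"), "Approval")
print(probs)

# Impute a random value drawn from the estimated distribution.
# (The k=1 here is random.choices' sample size, unrelated to the dilution factor.)
imputed = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(imputed)
```

In this toy table every Low-income instance has Approval "No", yet "Yes" still receives a small share of the probability because messages reach it through shared Education values.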
**Example 1**
Let’s predict the value of Approval given that Income is Low. The arrows on the graph illustrate the message-passing procedure, and the thickness of each arrow represents the message value, which is diluted at each hop. The resulting probabilities for Approval, conditional on Income being Low, are:
- Prob(Approval is ‘Yes’ | Income is Low): 20%
- Prob(Approval is ‘No’ | Income is Low): 80%
These probabilities differ from those a count-based prediction from the table would give. A count-based estimate uses only the rows that match the condition exactly, whereas the message-passing procedure also takes into account the attribute values shared between instances, which can yield a more accurate probability estimate.
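For comparison, a count-based estimate can be sketched in a few lines of Python. The table below is a hypothetical stand-in for the Credit Approval data, and `count_based` is an illustrative name.

```python
from collections import Counter

# Hypothetical toy table (NOT the article's Credit Approval data).
rows = [
    {"Income": "Low",  "Education": "Graduate", "Approval": "No"},
    {"Income": "Low",  "Education": "College",  "Approval": "No"},
    {"Income": "High", "Education": "Graduate", "Approval": "Yes"},
    {"Income": "High", "Education": "College",  "Approval": "Yes"},
]

def count_based(rows, target, given):
    """Relative frequencies of `target` among rows matching every condition."""
    matches = [r for r in rows if all(r.get(a) == v for a, v in given.items())]
    counts = Counter(r[target] for r in matches)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()} if total else {}

probs = count_based(rows, "Approval", {"Income": "Low"})
print(probs)

# No row matches this condition, so counting gives no estimate at all.
empty = count_based(rows, "Approval", {"Income": "Low", "Education": "PhD"})
print(empty)
```

Counting assigns zero probability to any outcome that does not appear among the exactly matching rows, and it returns nothing when no row matches. That gap is where passing messages through shared attribute values can help.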
**Example 2**
The message-passing procedure can also be used when conditioning on more than one attribute. In this case, we initiate a message at each node corresponding to the attribute values we are conditioning on. The graph below shows the result of predicting the value of Approval given Income is Low and Education is Graduate.
![Estimating the distribution of Approval given Income is Low and Education is Graduate](image-link)
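A hedged sketch of conditioning on several attributes is shown below. It assumes that the messages started at each conditioning node are propagated independently and that their contributions are simply summed before normalizing. The article does not spell out how they are combined, so treat that as an assumption. The toy table, the `max_hops` cap, the no-revisit rule, and all function names are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical toy table (NOT the article's Credit Approval data).
rows = [
    {"Income": "Low",  "Education": "Graduate", "Approval": "No"},
    {"Income": "Low",  "Education": "College",  "Approval": "No"},
    {"Income": "High", "Education": "Graduate", "Approval": "Yes"},
    {"Income": "High", "Education": "College",  "Approval": "Yes"},
]

def build_graph(rows):
    """Instance nodes are ("instance", i); value nodes are (attribute, value)."""
    graph = defaultdict(set)
    for i, row in enumerate(rows):
        for attr, value in row.items():
            graph[("instance", i)].add((attr, value))
            graph[(attr, value)].add(("instance", i))
    return dict(graph)

def propagate(graph, node, value, path, target_attr, k, hops_left, received):
    """Send `value` from `node` to each neighbour not already on this path."""
    if hops_left == 0:
        return
    for nxt in graph.get(node, ()):
        if nxt in path:
            continue
        if nxt[0] == target_attr:
            received[nxt[1]] += value  # absorbed at a target node
        else:
            # The receiving node forwards the message diluted by k.
            propagate(graph, nxt, value * k, path | {nxt},
                      target_attr, k, hops_left - 1, received)

def conditional_distribution(graph, given, target_attr, k=0.5, max_hops=6):
    """Start one message of value 1 at every conditioning node, then normalize."""
    received = defaultdict(float)
    for attr, value in given.items():
        start = (attr, value)
        propagate(graph, start, 1.0, frozenset([start]),
                  target_attr, k, max_hops, received)
    total = sum(received.values())
    return {v: m / total for v, m in received.items()} if total else {}

graph = build_graph(rows)
single = conditional_distribution(graph, {"Income": "Low"}, "Approval")
both = conditional_distribution(
    graph, {"Income": "Low", "Education": "Graduate"}, "Approval")
print(single)
print(both)
```

In this toy table Education carries no information about Approval, so adding the Graduate condition pulls the estimate back towards an even split relative to conditioning on Income alone.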
**The UNCRi Framework**
At Skanalytix, we have developed a graph-based computational framework called Unified Numerical/Categorical Representation and Inference (UNCRi). This framework combines a unique graph-based data representation with a flexible inference procedure. It can be used for tasks such as classification, regression, missing value imputation, anomaly detection, and synthetic data generation. The framework is robust to extremes in the data: it can handle categorical variables of varying cardinality, numerical variables with different distributions, and high missing-value ratios.
**Conclusion**
Graph methods offer a powerful and flexible alternative to traditional vector-based approaches in AI. By applying graph methods to tabular data, we can not only predict attribute values but also generate synthetic datasets with similar distributions. Graph Data Science for Tabular Data provides a practical AI solution for middle managers to improve decision-making processes, automate customer engagement, and drive business outcomes.