Create a Knowledge Graph from Unstructured Medical Data Using LLMs

Creating a Knowledge Graph Using an LLM

One practical application of large language models is building Knowledge Graphs from unstructured data. This article walks through constructing a Knowledge Graph from a medical log using a Large Language Model (LLM), here GPT-4o-mini. Traditional Natural Language Processing (NLP) pipelines can struggle with messy free text; an LLM can use the surrounding context to identify entities and relationships, which makes it well suited to this kind of extraction.

Getting Started

Before building the Knowledge Graph, we need to set up the environment by installing the dependencies with pip. The leading ! runs the command from a notebook cell (such as Jupyter or Colab); omit it in a regular terminal:

!pip install "mirascope[openai]" matplotlib networkx

Obtaining an OpenAI API Key

To use GPT-4o-mini, you’ll need an OpenAI API key. This can be obtained by visiting the OpenAI API Keys page and generating a new key. Keep in mind that new users may need to enter billing details and make an initial payment of $5 to activate their API access.

import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass('Enter OpenAI API Key: ')

Defining a Graph Schema

Next, we need a way to structure the information we want to extract. A simple schema can be defined using Pydantic, encompassing nodes, edges, and the overall Knowledge Graph. Here’s how we can represent our schema:

from pydantic import BaseModel

class Edge(BaseModel):
    source: str
    target: str
    relationship: str

class Node(BaseModel):
    id: str
    type: str
    properties: dict | None = None

class KnowledgeGraph(BaseModel):
    nodes: list[Node]
    edges: list[Edge]

Defining the Patient Log

To generate our Knowledge Graph, we need unstructured data. Let’s take a look at a sample patient log for a patient named Mary:

patient_log = """
Mary called for help at 3:45 AM, reporting that she had fallen while going to the bathroom. This marks the second fall incident within a week. She complained of dizziness before the fall.
Earlier in the day, Mary was observed wandering the hallway and appeared confused when asked basic questions. She was unable to recall the names of her medications and asked the same question multiple times.
Mary skipped both lunch and dinner, stating she didn't feel hungry. When the nurse checked her room in the evening, Mary was lying in bed with mild bruising on her left arm and complained of hip pain.
Vital signs taken at 9:00 PM showed slightly elevated blood pressure and a low-grade fever (99.8°F). Nurse also noted increased forgetfulness and possible signs of dehydration.
This behavior is similar to previous episodes reported last month.
"""

Generating the Knowledge Graph

With the schema and patient log defined, we can use the LLM to extract structured data from the unstructured text. The following function sends the log to GPT-4o-mini and, through the response_model argument, has Mirascope parse the reply into a KnowledgeGraph instance containing the identified entities and relationships:

from mirascope.core import openai, prompt_template

@openai.call(model="gpt-4o-mini", response_model=KnowledgeGraph)
@prompt_template(
    """
    SYSTEM:
    Extract a knowledge graph from this patient log.
    Use Nodes to represent people, symptoms, events, and observations.
    Use Edges to represent relationships like "has symptom", "reported", "noted", etc.

    The log:
    {log_text}
    """
)
def generate_kg(log_text: str): ...

# Run the extraction; the result is parsed into a KnowledgeGraph instance.
kg = generate_kg(patient_log)
print(kg)
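
Because the graph comes from a model, it may be worth checking it before relying on it. The sketch below is one possible check and is not part of the original tutorial: it flags edges whose source or target does not match any declared node id. The dataclasses are stand-ins for the Pydantic models above so the example runs on its own, and find_dangling_edges and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


# Stand-ins for the article's Pydantic models so this sketch runs on its own;
# the checker only relies on attribute names shared with those models.
@dataclass
class Node:
    id: str
    type: str
    properties: Optional[dict] = None


@dataclass
class Edge:
    source: str
    target: str
    relationship: str


@dataclass
class KnowledgeGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)


def find_dangling_edges(graph) -> list:
    """Return edges whose source or target is not a declared node id."""
    known_ids = {node.id for node in graph.nodes}
    return [
        edge
        for edge in graph.edges
        if edge.source not in known_ids or edge.target not in known_ids
    ]


# Hand-written sample (not real model output) with one deliberately broken edge.
sample = KnowledgeGraph(
    nodes=[Node("Mary", "Patient"), Node("Dizziness", "Symptom")],
    edges=[
        Edge("Mary", "Dizziness", "has symptom"),
        Edge("Mary", "Hip pain", "has symptom"),  # "Hip pain" is not declared
    ],
)

dangling = find_dangling_edges(sample)
for edge in dangling:
    print(f"Dangling edge: {edge.source} -[{edge.relationship}]-> {edge.target}")
```

Since the helper only reads the nodes and edges attributes, it should also accept the kg object returned by generate_kg.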

Querying the Graph

Once the Knowledge Graph is created, we can query it to find out more about the patient’s health risks or concerns. Below is a function that takes a natural language question and the structured graph, allowing us to retrieve informative responses:

@openai.call(model="gpt-4o-mini")
@prompt_template(
    """
    SYSTEM:
    Use the knowledge graph to answer the user's question.

    Graph:
    {knowledge_graph}

    USER:
    {question}
    """
)
def run(question: str, knowledge_graph: KnowledgeGraph): ...

question = "What health risks or concerns does Mary exhibit based on her recent behavior and vitals?"
print(run(question, kg))
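
The prompt above interpolates the graph object directly into the text. As a possible alternative, and only as a sketch under assumptions, the graph could first be flattened into one line per relationship, which tends to be easier to read when inspecting the prompt. The helper name to_triples and the sample data are hypothetical; SimpleNamespace objects stand in for a KnowledgeGraph instance.

```python
from types import SimpleNamespace


def to_triples(graph) -> str:
    """Render each edge as one 'source (type) -[relationship]-> target (type)' line."""
    types_by_id = {node.id: node.type for node in graph.nodes}
    lines = []
    for edge in graph.edges:
        # Nodes that an edge mentions but the graph never declares get "Unknown".
        source_type = types_by_id.get(edge.source, "Unknown")
        target_type = types_by_id.get(edge.target, "Unknown")
        lines.append(
            f"{edge.source} ({source_type}) -[{edge.relationship}]-> "
            f"{edge.target} ({target_type})"
        )
    return "\n".join(lines)


# Hand-written stand-in for a KnowledgeGraph instance (not real model output).
sample = SimpleNamespace(
    nodes=[
        SimpleNamespace(id="Mary", type="Patient"),
        SimpleNamespace(id="Dizziness", type="Symptom"),
        SimpleNamespace(id="Fall", type="Event"),
    ],
    edges=[
        SimpleNamespace(source="Mary", target="Dizziness", relationship="has symptom"),
        SimpleNamespace(source="Mary", target="Fall", relationship="reported"),
    ],
)

triples_text = to_triples(sample)
print(triples_text)
```

The resulting string could be passed to run in place of the graph object, in which case the type hint on its knowledge_graph parameter would need to become str.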

Visualizing the Graph

To better understand the relationships in our Knowledge Graph, we can visualize it. The function below builds a directed graph with NetworkX and draws it as a static plot with Matplotlib:

import matplotlib.pyplot as plt
import networkx as nx

def render_graph(kg: KnowledgeGraph):
    G = nx.DiGraph()

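    for node in kg.nodes:
        # Merge properties first so a "label" key inside them cannot clash
        # with the explicit label keyword (which would raise a TypeError).
        attrs = {**(node.properties or {}), "label": node.type}
        G.add_node(node.id, **attrs)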

    for edge in kg.edges:
        G.add_edge(edge.source, edge.target, label=edge.relationship)

    plt.figure(figsize=(15, 10))
    pos = nx.spring_layout(G, seed=42)  # fixed seed keeps the layout reproducible
    nx.draw_networkx_nodes(G, pos, node_size=2000, node_color="lightgreen")
    nx.draw_networkx_edges(G, pos, arrowstyle="->", arrowsize=20)
    nx.draw_networkx_labels(G, pos, font_size=12, font_weight="bold")
    edge_labels = nx.get_edge_attributes(G, "label")
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_color="blue")
    plt.title("Healthcare Knowledge Graph", fontsize=15)
    plt.show()

render_graph(kg)
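
A plot of a dense graph can be hard to read, so a short text summary may help alongside it. The sketch below, which is an addition and not part of the original tutorial, counts how many edges touch each node; for a NetworkX DiGraph the same figure is available through G.degree. The helper name connection_counts and the sample data are hypothetical, with SimpleNamespace objects standing in for a KnowledgeGraph instance.

```python
from collections import Counter
from types import SimpleNamespace


def connection_counts(graph) -> Counter:
    """Count incoming plus outgoing edges for every node id."""
    # Start every declared node at zero so isolated nodes still appear.
    counts = Counter({node.id: 0 for node in graph.nodes})
    for edge in graph.edges:
        counts[edge.source] += 1
        counts[edge.target] += 1
    return counts


# Hand-written stand-in for a KnowledgeGraph instance (not real model output).
sample = SimpleNamespace(
    nodes=[
        SimpleNamespace(id="Mary", type="Patient"),
        SimpleNamespace(id="Dizziness", type="Symptom"),
        SimpleNamespace(id="Fall", type="Event"),
        SimpleNamespace(id="Hip pain", type="Symptom"),
    ],
    edges=[
        SimpleNamespace(source="Mary", target="Dizziness", relationship="has symptom"),
        SimpleNamespace(source="Mary", target="Fall", relationship="reported"),
        SimpleNamespace(source="Mary", target="Hip pain", relationship="has symptom"),
    ],
)

counts = connection_counts(sample)
for node_id, degree in counts.most_common():
    print(f"{node_id}: {degree} connection(s)")
```

In a patient-centred graph like this one, the patient node would be expected to have the highest count; a different result might indicate that the model split one entity across several ids.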

Conclusion

In summary, creating a Knowledge Graph from unstructured medical logs using LLMs like GPT-4o-mini offers a structured way to surface the entities and relationships buried in free text. With the data organized as nodes and edges, practitioners can review patient conditions and risks more easily. The same approach can be adapted to other fields that work with unstructured text.

FAQs

What is a Knowledge Graph? A Knowledge Graph is a structured representation of information, where entities are connected by relationships, making it easier to understand complex data.
How do LLMs improve the extraction of information? LLMs excel in understanding context and semantics, which allows them to extract relevant entities and relationships more accurately from unstructured data.
What is the importance of defining a schema? Defining a schema is crucial as it provides a framework for organizing information and ensures that the data extracted is meaningful and usable.
Can this method be applied to other fields? Yes, the principles outlined can be adapted to various fields such as finance, legal research, and customer support to extract and visualize insights from unstructured data.
How can I visualize the Knowledge Graph? You can use libraries like Matplotlib and NetworkX in Python to create clear graphical representations of your Knowledge Graph.
