Understanding Positional Biases in Large Language Models
Evaluating Large Language Models (LLMs) realistically requires complex tasks over lengthy input sequences, sometimes exceeding 200,000 tokens. In response, modern LLMs have extended their context windows to as much as 1 million tokens. However, researchers have identified persistent challenges, most notably the "Lost in the Middle" effect, in which models struggle to use information located in the middle of long inputs. Traditional assessments also assumed that the relevant information sits at a single absolute position in the context, but in practice it is often scattered across the input, exposing a second kind of bias: sensitivity to the relative positions (the spacing) of the relevant pieces.
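To make the two notions concrete, here is a minimal sketch (illustrative only; the filler text, fact strings, and function name are ours, not from the paper): the absolute position is where the block of relevant facts sits within the long context, while the relative position is the spacing between the facts themselves.

```python
def build_context(facts, n_chunks=1000, absolute_offset=0.5, spacing=1):
    """Assemble a long context from filler chunks, placing `facts` so that
    the block starts at fraction `absolute_offset` of the context and the
    facts sit `spacing` chunks apart (their relative position)."""
    filler = "This sentence is irrelevant padding for the long context. "
    chunks = [filler] * n_chunks
    start = int(absolute_offset * (n_chunks - spacing * len(facts)))
    for i, fact in enumerate(facts):
        chunks[start + i * spacing] = fact  # overwrite filler with a relevant fact
    return "".join(chunks)

facts = [f"Key fact {i}: the value of item {i} is {i * 7}. " for i in range(4)]
# Same absolute region (the middle), but dense vs. sparse relative spacing:
dense = build_context(facts, absolute_offset=0.5, spacing=1)
sparse = build_context(facts, absolute_offset=0.5, spacing=200)
```

A "lost in the middle" probe varies `absolute_offset` while holding `spacing` fixed; a relative-position probe holds the region fixed and varies `spacing`.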
Introducing LongPiBench
Researchers from Tsinghua University and ModelBest Inc. developed LongPiBench, a benchmark designed to evaluate positional biases in LLMs. The benchmark assesses both the absolute and the relative positions of relevant information across tasks of varying complexity and context lengths from 32k to 256k tokens. LongPiBench includes:
- Three tasks: Table SQL, Timeline Reordering, and Equation Solving.
- Four context lengths: 32k, 64k, 128k, and 256k.
- Sixteen levels of absolute and relative positions.
The evaluation process annotates a small set of seed examples and then systematically varies the positions of the relevant information, making it possible to measure model performance as a function of both position types; a hypothetical sketch of such an instance generator follows.
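One plausible shape for that generator, sweeping sixteen absolute and sixteen relative position levels per seed example, is sketched below. The dictionary keys, defaults, and filler text are our assumptions, not LongPiBench's actual code.

```python
def make_instances(seed_facts, question, n_chunks=8000, n_levels=16):
    """For one seed example, emit a (context, question) instance for every
    combination of absolute-position level and relative-position level."""
    filler = "Irrelevant filler sentence used to pad the context. "
    instances = []
    for a_level in range(n_levels):
        for r_level in range(n_levels):
            # relative level -> spacing between facts, in chunks
            spacing = 1 + r_level * (n_chunks // (n_levels * len(seed_facts)))
            # absolute level -> fractional offset of the fact block
            offset = a_level / (n_levels - 1)
            chunks = [filler] * n_chunks
            start = int(offset * (n_chunks - spacing * len(seed_facts)))
            for i, fact in enumerate(seed_facts):
                chunks[start + i * spacing] = fact
            instances.append({
                "context": "".join(chunks),
                "question": question,
                "gold_facts": list(seed_facts),
                "abs_level": a_level,
                "rel_level": r_level,
            })
    return instances
```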
Key Findings from LongPiBench
The research team tested 11 prominent LLMs, discovering that while newer models are fairly robust to the "Lost in the Middle" effect, they still show biases based on the spacing of relevant information. Notable models assessed included Llama-3.1-Instruct, GPT-4o-mini, Claude-3-Haiku, and Gemini-1.5-Flash. Results indicated:
- Top models struggled with timeline reordering and equation solving, achieving only about 20% accuracy.
- Commercial and larger open-source models performed well with absolute positioning but faced significant challenges with relative positioning.
- Relative positioning biases alone caused recall to drop by roughly 30%, even in simple retrieval tasks (one way to trace such a recall curve is sketched after this list).
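As a rough illustration of how that recall curve can be measured (the scoring rule and `model_fn` signature below are placeholders of ours, not the paper's metric), one can group instances by relative-position level:

```python
from collections import defaultdict

def recall_by_relative_level(model_fn, instances):
    """Mean recall per relative-position level.

    `model_fn(context, question)` stands in for a call to whichever LLM is
    under test and returns its answer as a string. Recall here is the
    fraction of gold facts echoed verbatim in the answer -- a crude proxy
    for the benchmark's task-specific scoring.
    """
    totals = defaultdict(int)
    hits = defaultdict(float)
    for ex in instances:
        answer = model_fn(ex["context"], ex["question"])
        found = sum(fact in answer for fact in ex["gold_facts"])
        hits[ex["rel_level"]] += found / len(ex["gold_facts"])
        totals[ex["rel_level"]] += 1
    return {lvl: hits[lvl] / totals[lvl] for lvl in sorted(totals)}
```

A 30% relative-position drop would show up here as recall at the widest spacings falling roughly 0.3 below recall at the densest spacing.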
The Importance of Addressing Positional Biases
LongPiBench emphasizes the critical need to address relative positioning biases in modern LLMs. If left unresolved, these biases could significantly hinder the effectiveness of long-text language models in real-world applications.
Leverage AI for Your Business
To stay competitive, let findings like those from LongPiBench inform how you evaluate and deploy AI:
- Identify Automation Opportunities: Find key customer interaction points that can benefit from AI.
- Define KPIs: Ensure your AI initiatives have measurable impacts on business outcomes.
- Select an AI Solution: Choose tools that fit your needs and allow for customization.
- Implement Gradually: Start with a pilot project, gather data, and expand AI usage wisely.
For AI KPI management advice, contact us at hello@itinai.com.
Discover how AI can transform your sales processes and customer engagement at itinai.com.