LongPiBench: A Comprehensive Benchmark That Explores How Even the Top Large Language Models Have Relative Positional Biases

Understanding Positional Biases in Large Language Models

Evaluating large language models (LLMs) realistically means testing them on complex tasks with very long inputs, sometimes exceeding 200,000 tokens. In response, LLMs now support context lengths of up to 1 million tokens. However, researchers have identified weaknesses in how models use that context, notably the “lost in the middle” effect, where models struggle to use information located in the middle of a long input. Earlier evaluations generally assumed that the relevant information is concentrated in one area of the input. In practice it is often scattered across the input, and performance can then depend on how far apart the relevant pieces are, that is, on their relative positions.

Introducing LongPiBench

Researchers from Tsinghua University and ModelBest Inc. developed LongPiBench, a benchmark designed to evaluate positional biases in LLMs. It measures the effect of both the absolute position of relevant information and the relative spacing between pieces of relevant information, across tasks of varying complexity and context lengths from 32k to 256k tokens. LongPiBench includes:

Three tasks: Table SQL, Timeline Reordering, and Equation Solving.
Four context lengths: 32k, 64k, 128k, and 256k.
Sixteen levels of absolute and relative positions.

The evaluation process involves annotating seed examples and varying the positions of relevant information to understand model performance better.
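As a rough illustration of what varying absolute and relative positions can mean in practice, the following Python sketch places a few relevant items among filler items. It is not the LongPiBench implementation: the function name `place_relevant_items` and its parameters `start_fraction` (absolute position of the first relevant item) and `gap` (number of filler items between consecutive relevant items) are hypothetical, chosen only for this example.

```python
from typing import List


def place_relevant_items(
    fillers: List[str],
    relevant: List[str],
    start_fraction: float,
    gap: int,
) -> List[str]:
    """Build a context by inserting relevant items among filler items.

    start_fraction: where the first relevant item goes (0.0 = start, 1.0 = end).
    gap: how many filler items separate consecutive relevant items.
    Both parameters are illustrative assumptions, not benchmark settings.
    """
    if not relevant:
        raise ValueError("need at least one relevant item")
    if not 0.0 <= start_fraction <= 1.0:
        raise ValueError("start_fraction must be in [0, 1]")
    if gap < 0:
        raise ValueError("gap must be non-negative")

    # Fillers consumed by the gaps between relevant items.
    needed = gap * (len(relevant) - 1)
    if needed > len(fillers):
        raise ValueError("not enough filler items for this gap")

    # Remaining fillers are split before and after the relevant span.
    free = len(fillers) - needed
    lead = int(start_fraction * free)

    context = list(fillers[:lead])
    idx = lead
    for i, item in enumerate(relevant):
        context.append(item)
        if i < len(relevant) - 1:
            context.extend(fillers[idx:idx + gap])
            idx += gap
    context.extend(fillers[idx:])
    return context
```

Sweeping `start_fraction` while holding `gap` fixed probes absolute position, and sweeping `gap` while holding `start_fraction` fixed probes relative position. Both sweeps use the same seed example, so any change in accuracy comes from placement alone.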

Key Findings from LongPiBench

The research team tested 11 prominent LLMs and found that, while newer models are somewhat resistant to the “lost in the middle” effect, they still show biases that depend on the spacing between pieces of relevant information. The models assessed included Llama-3.1-Instruct, GPT-4o-mini, Claude-3-Haiku, and Gemini-1.5-Flash. Results indicated:

Top models struggled with timeline reordering and equation solving, achieving only about 20% accuracy.
Commercial and larger open-source models performed well with absolute positioning but faced significant challenges with relative positioning.
Relative positioning biases led to a 30% drop in recall rates, even in simple retrieval tasks.
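To make the recall figures concrete, here is a minimal Python sketch of how recall and a drop in recall could be computed for a retrieval-style task. The helper names `recall` and `recall_drop` are hypothetical, and this is not the benchmark's scoring code. The summary above does not say whether the 30% drop is measured in absolute points or relative to a baseline, so the sketch returns both.

```python
from typing import Iterable, Set, Tuple


def recall(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Fraction of relevant items that appear in the model's output."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = len(set(retrieved) & relevant)
    return hits / len(relevant)


def recall_drop(baseline: float, observed: float) -> Tuple[float, float]:
    """Return (absolute drop, relative drop) between two recall values."""
    if baseline <= 0:
        raise ValueError("baseline recall must be positive")
    absolute = baseline - observed
    relative = absolute / baseline
    return absolute, relative
```

For example, if a model recovers 3 of 4 relevant items when they are adjacent and 2 of 4 when they are spread out, `recall_drop(0.75, 0.5)` gives the difference both in recall points and as a share of the baseline.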

The Importance of Addressing Positional Biases

LongPiBench emphasizes the critical need to address relative positioning biases in modern LLMs. If left unresolved, these biases could significantly hinder the effectiveness of long-text language models in real-world applications.
