Artificial Intelligence
Researchers have introduced the Listwise Preference Optimization (LiPO) framework, which reframes language model alignment as a listwise ranking problem. LiPO-λ emerges as a powerful method that leverages listwise preference data to enhance alignment, bridging LM preference optimization and Learning-to-Rank, setting new benchmarks, and opening directions for future research. This approach signals a new direction in language model development.
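To make the listwise idea concrete, here is a minimal ListNet-style listwise ranking loss in Python (PyTorch). It assumes you already have model scores (e.g. length-normalized log-probabilities) and human ratings for K candidate responses per prompt; it illustrates the family of ranking objectives LiPO draws on, not LiPO-λ's exact lambda-weighted loss.

```python
import torch
import torch.nn.functional as F

def listwise_preference_loss(policy_scores: torch.Tensor,
                             label_scores: torch.Tensor) -> torch.Tensor:
    """ListNet-style loss: match the model's ranking distribution over
    K candidate responses to the distribution implied by human labels."""
    target = F.softmax(label_scores, dim=-1)        # labels -> list distribution
    log_pred = F.log_softmax(policy_scores, dim=-1)
    return -(target * log_pred).sum(dim=-1).mean()  # listwise cross-entropy

# Toy example: two prompts, four scored responses each (hypothetical values).
scores = torch.randn(2, 4, requires_grad=True)      # model scores per response
labels = torch.tensor([[3.0, 2.0, 1.0, 0.0],        # human ratings per response
                       [1.0, 0.0, 2.0, 3.0]])
listwise_preference_loss(scores, labels).backward()
```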
Adobe introduces AI Assistant in Adobe Acrobat, a generative AI technology integrated into document workflows. This powerful tool offers productivity benefits for a wide range of users, from project managers to students. Adobe emphasizes responsible AI development and outlines a vision for future AI-powered document experiences, including intelligent creation and collaboration support.
Gary Marcus, a prominent AI researcher and critic of deep learning, discusses AI’s current state during a walk in Vancouver. He’s unimpressed with new AI models such as Google DeepMind’s Gemini and OpenAI’s Sora, criticizing their lack of understanding and the potential for exploitation. Marcus advocates for clearer rules and ethical practices in AI.
Researchers from Stanford University and Bauplan have developed NegotiationArena, a framework to evaluate Large Language Models' (LLMs) negotiation capabilities. The study demonstrates LLMs' evolving sophistication, adaptability, and strategic successes, while also highlighting their irrational missteps. This research offers insights into creating more reliable and human-like AI negotiators, paving the way for future applications.
Large language models (LLMs) offer powerful language processing but require significant resources. Binarization, which reduces model weights to a single bit, cuts computational demand, but existing quantization techniques struggle at such low bit widths. Researchers introduced BiLLM, a 1-bit post-training quantization scheme for LLMs that achieves ultra-low-bit quantization without significant loss of accuracy.
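As a rough illustration of what weight binarization means, the NumPy sketch below implements the classic sign-plus-scale baseline, W ≈ α·sign(W) with α the per-row mean absolute value. BiLLM's actual scheme is considerably more sophisticated (it treats salient weights specially), so treat this only as the starting point it improves on.

```python
import numpy as np

def binarize_weights(w: np.ndarray):
    """Approximate a weight matrix row-wise as alpha * sign(W), where
    alpha (per-row mean |w|) minimizes the L2 error of a +/-1 code."""
    alpha = np.abs(w).mean(axis=1, keepdims=True)  # per-row scale factor
    codes = np.where(w >= 0, 1.0, -1.0)            # the 1-bit weights
    return alpha, codes

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
alpha, codes = binarize_weights(w)
w_hat = alpha * codes
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```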
Mathematical reasoning is essential for solving complex real-world problems, yet developing large language models (LLMs) specialized in this area is challenging due to the scarcity of diverse datasets. Existing approaches rely on closed-source datasets, but a research team from NVIDIA has introduced OpenMathInstruct-1, a novel open-licensed dataset comprising 1.8 million problem-solution pairs. The dataset has shown significant promise for training competitive open math models.
The intersection of artificial intelligence and chess has long been a testing ground for computational strategy and intelligence. Google DeepMind's study trained a transformer model with 270 million parameters on 10 million chess games, using large-scale data and modern neural architectures. The model achieves grandmaster-level play without traditional search algorithms, demonstrating the critical role of training at scale.
Researchers from Huawei Noah's Ark Lab and Peking University, in collaboration with Huawei Consumer Business Group, have developed PanGu-π Pro, a tiny language model for mobile devices. The model achieves high performance through strategic optimization, tokenizer compression, and architectural adjustments, setting new benchmarks for compact language models. This innovation opens new avenues for AI on mobile and edge devices.
Hydragen is a notable advance in optimizing large language model (LLM) inference. Developed by research teams from Stanford University, the University of Oxford, and the University of Waterloo, Hydragen's attention decomposition method significantly improves computational efficiency in shared-prefix scenarios, showing up to a 32x improvement in LLM throughput while remaining applicable to a variety of settings.
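The decomposition at Hydragen's core is an exact identity: softmax attention over a concatenated [prefix; suffix] sequence equals a weighted combination of attention over each chunk, with weights derived from each chunk's log-sum-exp normalizer. The single-head NumPy sketch below verifies that identity; Hydragen's contribution is exploiting it so shared-prefix attention is computed once and batched across sequences.

```python
import numpy as np

def attend(q, k, v):
    """Softmax attention over one chunk, also returning the per-query
    log-sum-exp normalizer so chunks can be merged exactly later."""
    logits = q @ k.T / np.sqrt(q.shape[-1])
    lse = np.log(np.exp(logits).sum(-1))
    return np.exp(logits - lse[:, None]) @ v, lse

rng = np.random.default_rng(0)
d = 16
q = rng.normal(size=(3, d))
k_pre, v_pre = rng.normal(size=(10, d)), rng.normal(size=(10, d))  # shared prefix
k_suf, v_suf = rng.normal(size=(5, d)), rng.normal(size=(5, d))    # unique suffix

o_pre, l_pre = attend(q, k_pre, v_pre)
o_suf, l_suf = attend(q, k_suf, v_suf)
w = 1.0 / (1.0 + np.exp(l_suf - l_pre))    # = exp(l_pre) / (exp(l_pre) + exp(l_suf))
merged = w[:, None] * o_pre + (1 - w[:, None]) * o_suf

# Reference: attention over the full concatenated sequence matches exactly.
full, _ = attend(q, np.concatenate([k_pre, k_suf]), np.concatenate([v_pre, v_suf]))
assert np.allclose(merged, full)
```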
OpenAI’s innovative text-to-video model, Sora, is transforming digital content creation. It offers unparalleled capabilities to generate, extend, and animate high-quality videos with remarkable detail. By leveraging spacetime patches and recaptioning techniques, Sora demonstrates diverse applications, showcasing potential for AGI and simulating real-world dynamics. Despite limitations, Sora represents a significant leap forward in AI-driven video generation.
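OpenAI has not released Sora's implementation, but the "spacetime patch" idea from its technical report is the video analogue of ViT patchification: split the video into non-overlapping space-time blocks and flatten each into a transformer token. A generic sketch:

```python
import numpy as np

def spacetime_patches(video: np.ndarray, pt: int, ph: int, pw: int) -> np.ndarray:
    """Split a video (T, H, W, C) into non-overlapping spacetime patches,
    returning one flattened token vector per patch."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)        # group the patch axes together
    return v.reshape(-1, pt * ph * pw * C)

video = np.zeros((8, 64, 64, 3), dtype=np.float32)  # 8 frames of 64x64 RGB
tokens = spacetime_patches(video, pt=2, ph=16, pw=16)
print(tokens.shape)   # (64, 1536): 4*4*4 patches, each 2*16*16*3 values
```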
AI development is evolving from static, task-centric models to dynamic, adaptable agent-based systems suitable for various applications. Recent research proposes the Interactive Agent Foundation Model, a multi-modal system with unified pre-training to process text, visual data, and actions. It demonstrates promising efficacy across diverse domains, showing potential for generalist agents in AI advancement.
Nomic AI's nomic-embed-text-v1 model advances long-context text embeddings, supporting a sequence length of 8,192 tokens and surpassing its predecessors in performance evaluations. Open-source under an Apache 2.0 license, it emphasizes transparency and accessibility, setting new standards for the AI community; its development process prioritizes auditability and replication.
Researchers from Fudan University, Ohio State University, Pennsylvania State University, and Meta AI have developed TravelPlanner, an AI benchmark to evaluate agents' planning skills in realistic scenarios. It challenges AI agents to plan multi-day travel itineraries, highlighting limitations in current AI models. TravelPlanner aims to advance AI planning capabilities and bridge the gap between theoretical planning benchmarks and real-world scenarios.
MeetKai, an influential player in conversational AI, has introduced Functionary, an open-source language model for function calling. In contrast to larger models like GPT-4, Functionary offers faster, more cost-effective inference with high accuracy. It integrates seamlessly with OpenAI's platform and aligns with MeetKai's vision for the metaverse, inviting developers to shape the future of applied generative AI.
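Function calling itself rests on a simple host-side contract: the model emits a structured call, and the application parses and executes it. The Python sketch below shows that loop in miniature; the `get_weather` tool and the exact JSON payload are invented for illustration and are not Functionary's actual wire format (which follows the OpenAI-compatible convention).

```python
import json

# Hypothetical tool registry: names the model may call, mapped to functions.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def dispatch(model_output: str):
    """Parse a model-emitted call like
    {"name": "get_weather", "arguments": {"city": "Berlin"}}
    and execute the matching registered function."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}'))
```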
Large multimodal models (LMMs) have expanded rapidly, typically using CLIP for vision encoding and LLMs for multi-modality reasoning. Scaling up CLIP is crucial to this progress, leading to the EVA-CLIP-18B model with 18 billion parameters. It achieves remarkable zero-shot top-1 accuracy across 27 benchmarks and demonstrates effectiveness in various image tasks, underlining progress in open-source AI models.
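For readers unfamiliar with how a CLIP-family model performs zero-shot classification, the sketch below shows the mechanism with random vectors standing in for real encoder outputs: embed the image once, embed one text prompt per class, and pick the class with the highest cosine similarity.

```python
import numpy as np

def zero_shot_classify(image_emb: np.ndarray, text_embs: np.ndarray) -> int:
    """Return the index of the class whose text embedding is most
    cosine-similar to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return int(np.argmax(txt @ img))

rng = np.random.default_rng(0)
image = rng.normal(size=512)           # stand-in for an image-encoder output
prompts = rng.normal(size=(3, 512))    # e.g. "a photo of a {cat, dog, car}"
print(zero_shot_classify(image, prompts))
```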
Graph Neural Networks (GNNs) leverage graph structures to perform inference on complex data, addressing the limitations of traditional ML algorithms. Google’s TensorFlow GNN 1.0 (TF-GNN) library integrates with TensorFlow, enabling scalable training of GNNs on heterogeneous graphs. It supports supervised and unsupervised training, subgraph sampling, and flexible model building for diverse tasks.
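To see what a single GNN layer computes, here is a framework-agnostic NumPy sketch of one mean-aggregation message-passing step (illustrative only, not the TF-GNN API): each node combines its own state with the average state of its in-neighbors.

```python
import numpy as np

def gnn_layer(h, edges, w_self, w_nbr):
    """One message-passing step:
    h_v' = ReLU(W_self h_v + W_nbr * mean of incoming neighbor states)."""
    agg = np.zeros_like(h)
    deg = np.zeros((h.shape[0], 1))
    for src, dst in edges:              # accumulate messages along edges
        agg[dst] += h[src]
        deg[dst] += 1
    agg /= np.maximum(deg, 1)           # mean over in-neighbors
    return np.maximum(h @ w_self.T + agg @ w_nbr.T, 0.0)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))                        # 4 nodes, 8-dim states
edges = [(0, 1), (2, 1), (1, 3)]                   # directed src -> dst
w_self, w_nbr = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
print(gnn_layer(h, edges, w_self, w_nbr).shape)    # (4, 8)
```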
Vision Language Models (VLMs) leverage the strength of Large Language Models to comprehend visual data, demonstrating capability in visual question answering and optical character recognition. A study by Tsinghua University and Zhipu AI introduces Chain of Manipulations (CoM), which enables VLMs to reason about images through step-by-step visual operations, leading to competitive performance on various benchmarks and highlighting the potential to accelerate VLM research.
DeepSeekMath, developed by DeepSeek-AI, Tsinghua University, and Peking University, revolutionizes mathematical reasoning using large language models. With a dataset of over 120 billion tokens of math-related content and innovative training using Group Relative Policy Optimization, it achieves a top-1 accuracy of 51.7% on the MATH benchmark, setting a new standard for AI-driven mathematics.
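The "group relative" part of GRPO is compact enough to sketch: sample several answers per question, score them, and use each answer's reward normalized within its group as the advantage, which removes the need for a separate learned value model. A minimal version of that advantage computation:

```python
import numpy as np

def grpo_advantages(rewards):
    """Advantage of each sampled answer = its reward standardized
    against the other answers drawn for the same question."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Four sampled solutions to one problem, scored 1.0 if correct.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))   # -> [ 1., -1., -1.,  1.]
```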
State-space models (SSMs) are being explored as an alternative to Transformer networks in AI research. SSMs aim to address computational inefficiencies in Transformer networks and have led to the proposal of MambaFormer, a hybrid model combining SSMs and Transformer attention blocks. MambaFormer demonstrates superior in-context learning capabilities, offering new potential for AI advancement.
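At the heart of an SSM layer is a discretized linear recurrence, h_t = A h_{t-1} + B x_t with output y_t = C h_t. The NumPy sketch below runs that scan over a scalar input signal; selective SSMs such as Mamba go further by making the parameters input-dependent, which this toy version omits.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run the linear state-space recurrence over a scalar sequence x:
    h_t = A @ h_{t-1} + B * x_t,  y_t = C @ h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:               # sequential scan over time steps
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)             # stable state-transition matrix
B, C = rng.normal(size=4), rng.normal(size=4)
y = ssm_scan(rng.normal(size=32), A, B, C)
print(y.shape)                  # (32,)
```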
Large Language Models, like GPT-3, have revolutionized Natural Language Processing by scaling to billions of parameters and incorporating extensive datasets. Researchers have also introduced Speech Language Models directly trained on speech, leading to the development of SPIRIT-LM. This multimodal language model seamlessly integrates text and speech, demonstrating potential impacts on various applications.
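Conceptually, a text-speech model like SPIRIT-LM trains on a single stream in which spans of text tokens and spans of speech-unit tokens are interleaved, with special markers flagging each modality switch. The sketch below is a loose illustration of that interleaving, with invented marker and unit names rather than Meta's actual tokenization:

```python
TEXT, SPEECH = "[TEXT]", "[SPEECH]"   # hypothetical modality markers

def interleave(spans):
    """Flatten alternating (modality, tokens) spans into one training
    stream, inserting a marker at every modality switch."""
    stream, current = [], None
    for modality, tokens in spans:
        if modality != current:
            stream.append(TEXT if modality == "text" else SPEECH)
            current = modality
        stream.extend(tokens)
    return stream

spans = [("text", ["the", "cat"]),
         ("speech", ["unit_12", "unit_7", "unit_99"]),   # made-up speech units
         ("text", ["sat"])]
print(interleave(spans))
# ['[TEXT]', 'the', 'cat', '[SPEECH]', 'unit_12', 'unit_7', 'unit_99', '[TEXT]', 'sat']
```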