Large language model
MIT and ETH Zurich researchers have developed a data-driven machine-learning technique to enhance the solving of complex optimization problems. By integrating machine learning into traditional MILP solvers, companies can tailor solutions to specific problems and achieve a significant speedup ranging from 30% to 70%, without compromising accuracy. This breakthrough opens new avenues for tackling complex…
The EU reached a historic agreement on the AI Act, set to come into effect in 2024. It establishes comprehensive laws to regulate AI, following intense negotiation. The legislation covers governance, enforcement, rights protection, prohibited practices, and penalties. The Act classifies high-impact AI systems and mandates regulations for their developers. This landmark decision is pivotal…
Researchers at Anthropic have addressed Claude 2.1’s hesitation in answering questions about individual sentences within its 200K token context. By introducing a prompt containing the sentence “Here is the most relevant sentence in the context,” they significantly improved the model’s recall capacity, with an increase in accuracy for single-sentence queries by 90%. This inventive solution…
“NeRFiller,” a 3D inpainting approach from Google Research and UC Berkeley, innovatively completes missing portions in 3D captures by controlling the process through reference examples. It enhances scenes by addressing reconstruction failures or lack of observations, surpassing object-removal baselines, and demonstrating effectiveness in 3D scene completion. (Word count: 50)
Recent research investigates the effectiveness of fine-tuning in Large Language Models (LLMs). It challenges the common industry practice of alignment tuning for AI assistants and proposes URIAL, a new tuning-free alignment technique based on in-context learning. The study suggests that URIAL can achieve comparable results to fine-tuning-based strategies, emphasizing the role of linguistic style and…
MIT CSAIL and FAIR Meta have introduced Representation-Conditioned Image Generation (RCG) framework, pioneering high-quality image generation without human annotations. This self-supervised approach leverages Representation Diffusion Model and pre-trained encoders to achieve state-of-the-art results in class-unconditional and conditional image generation, promising significant advancements in self-supervised image synthesis.
Notus, a new language model, builds on Zephyr’s success by fine-tuning data curation, prioritizing high-quality data from UltraFeedback and emphasizing user preference alignment. Implementing a meticulous curation process, Notus aims to elevate language model performance by reiterating response generation and AI ranking stages. These efforts have resulted in competitive performance and a commitment to open-source…
A team from Columbia University and Google has introduced ‘ReconFusion,’ an artificial intelligence method for achieving high-quality 3D reconstructions from a limited number of images. It effectively addresses challenges such as artifacts and catastrophic failures in reconstruction, providing robustness even with sparse input views. This advancement holds promise for various applications.
Axel Springer, a major German publishing house, has announced the closure of its news outlet, Upday, which will be relaunched as an AI-driven trend news generator, marking a significant shift from traditional journalism to AI-led content creation. This move signals the company’s commitment to exploring AI’s potential in journalism, despite resulting job cuts.
Researchers from FNii CUHKSZ and SSE CUHKSZ have introduced MVHumanNet, a vast dataset for multi-view human action sequences with comprehensive annotations, such as human masks, camera parameters, 2D and 3D key points, SMPL/SMPLX parameters, and textual descriptions. MVHumanNet is poised to drive innovation in large-scale 3D human-centric tasks like action recognition, human NeRF reconstruction, and…
Google researchers developed a new fine-tuning strategy, called chain-of-thought (CoT), to improve language models’ performance in generating correct answers. The CoT technique aims to maximize the accuracy of responses, surpassing other methods like STaR and prompt-tuning. The study also introduces a control-variate technique and outlines future research directions for further advancements.
Apple recently released MLX, a machine learning framework designed for Apple silicon. Inspired by existing frameworks, it offers a user-friendly design, Python and C++ APIs, composable function transformations, and lazy computations. MLX supports multiple devices, high-level packages like mlx.optimizers and mlx.nn, and has various applications, aiming to simplify complex model building and democratize machine learning.
Google’s demo video of its new model Gemini was impressive, but it fell short of the marketing hype. The video showcased interactions that were actually based on detailed text prompts and still images, not live demonstrations. Google’s claims about Gemini’s capabilities raise questions about AI innovation and future developments compared to existing models like GPT-4.
Deep learning’s wide-ranging applications, including robotics, face challenges due to its reliance on pre-existing data. PyPose, developed on the PyTorch framework, introduces a novel approach blending deep learning with physics-based optimization. This versatile toolkit aids in building and testing various robotic tools efficiently, enhancing performance and adaptability in challenging tasks. Researchers emphasize its revolutionary impact…
Researchers from multiple universities and NVIDIA have developed Dolphins, a vision-language model for autonomous vehicles. Dolphins excel in providing driving instructions by combining language reasoning with visual understanding, exhibiting human-like features such as rapid learning and interpretability. The model addresses challenges in achieving full autonomy in vehicular systems and emphasizes the importance of computational efficiency.
NVIDIA’s paper introduces Diffusion Vision Transformers (DiffiT), enhancing generative learning by combining a hybrid hierarchical architecture with a U-shaped encoder and decoder. Utilizing time-dependent self-attention for conditioning, DiffiT achieves state-of-the-art performance in image and latent space generation, setting a new record with an impressive FID score of 1.73 on ImageNet-256. Future research will explore alternative…
Google faced criticism for a promotional video of its Gemini multi-modal AI, pitted as a competitor to OpenAI’s GPT-4. The video highlighted Gemini’s capabilities, prompting excitement, but was later revealed to be heavily edited, sparking debate on AI marketing ethics. The incident underscores the blurred lines between profit-making and public service in the AI industry.
New text-to-image models have advanced, enabling revolutionary applications like creating images from text. However, existing approaches struggle to consistently produce content across zoom levels. A study by the University of Washington, Google, and UC Berkeley introduces a text-conditioned multi-scale image production method, allowing users to control content at different zoom levels through text prompts. The…
Neural Radiance Fields (NeRF) use neural networks to render detailed 3D scenes without explicit 3D model storage. However, they are limited in dynamic scenes. Shanghai Tech University proposes VideoRF, a real-time streaming solution for dynamic radiance fields on mobile devices. It leverages novel neural modeling and deferred rendering to enable seamless viewing experiences. The approach…
In late November 2023, following Sam Altman’s dismissal from OpenAI, Microsoft’s proposal to employ the entire OpenAI team was met with little enthusiasm. Employees cited concerns about corporate culture, financial losses, and the bureaucratic nature of Microsoft. They saw Microsoft as a less dynamic company, preferring to seek opportunities with other AI startups.