Artificial Intelligence
A team from Columbia University and Google has introduced ‘ReconFusion,’ an artificial intelligence method for achieving high-quality 3D reconstructions from a limited number of images. It effectively addresses challenges such as artifacts and catastrophic failures in reconstruction, providing robustness even with sparse input views. This advancement holds promise for various applications.
Axel Springer, a major German publishing house, has announced the closure of its news outlet, Upday, which will be relaunched as an AI-driven trend news generator, marking a significant shift from traditional journalism to AI-led content creation. This move signals the company’s commitment to exploring AI’s potential in journalism, despite resulting job cuts.
Researchers from FNii CUHKSZ and SSE CUHKSZ have introduced MVHumanNet, a vast dataset for multi-view human action sequences with comprehensive annotations, such as human masks, camera parameters, 2D and 3D key points, SMPL/SMPLX parameters, and textual descriptions. MVHumanNet is poised to drive innovation in large-scale 3D human-centric tasks like action recognition, human NeRF reconstruction, and…
Google researchers developed a new fine-tuning strategy, called chain-of-thought (CoT), to improve language models’ performance in generating correct answers. The CoT technique aims to maximize the accuracy of responses, surpassing other methods like STaR and prompt-tuning. The study also introduces a control-variate technique and outlines future research directions for further advancements.
Apple recently released MLX, a machine learning framework designed for Apple silicon. Inspired by existing frameworks, it offers a user-friendly design, Python and C++ APIs, composable function transformations, and lazy computations. MLX supports multiple devices, high-level packages like mlx.optimizers and mlx.nn, and has various applications, aiming to simplify complex model building and democratize machine learning.
Google’s demo video of its new model Gemini was impressive, but it fell short of the marketing hype. The video showcased interactions that were actually based on detailed text prompts and still images, not live demonstrations. Google’s claims about Gemini’s capabilities raise questions about AI innovation and future developments compared to existing models like GPT-4.
Deep learning’s wide-ranging applications, including robotics, face challenges due to its reliance on pre-existing data. PyPose, developed on the PyTorch framework, introduces a novel approach blending deep learning with physics-based optimization. This versatile toolkit aids in building and testing various robotic tools efficiently, enhancing performance and adaptability in challenging tasks. Researchers emphasize its revolutionary impact…
Researchers from multiple universities and NVIDIA have developed Dolphins, a vision-language model for autonomous vehicles. Dolphins excel in providing driving instructions by combining language reasoning with visual understanding, exhibiting human-like features such as rapid learning and interpretability. The model addresses challenges in achieving full autonomy in vehicular systems and emphasizes the importance of computational efficiency.
NVIDIA’s paper introduces Diffusion Vision Transformers (DiffiT), enhancing generative learning by combining a hybrid hierarchical architecture with a U-shaped encoder and decoder. Utilizing time-dependent self-attention for conditioning, DiffiT achieves state-of-the-art performance in image and latent space generation, setting a new record with an impressive FID score of 1.73 on ImageNet-256. Future research will explore alternative…
Google faced criticism for a promotional video of its Gemini multi-modal AI, pitted as a competitor to OpenAI’s GPT-4. The video highlighted Gemini’s capabilities, prompting excitement, but was later revealed to be heavily edited, sparking debate on AI marketing ethics. The incident underscores the blurred lines between profit-making and public service in the AI industry.
New text-to-image models have advanced, enabling revolutionary applications like creating images from text. However, existing approaches struggle to consistently produce content across zoom levels. A study by the University of Washington, Google, and UC Berkeley introduces a text-conditioned multi-scale image production method, allowing users to control content at different zoom levels through text prompts. The…
Neural Radiance Fields (NeRF) use neural networks to render detailed 3D scenes without explicit 3D model storage. However, they are limited in dynamic scenes. Shanghai Tech University proposes VideoRF, a real-time streaming solution for dynamic radiance fields on mobile devices. It leverages novel neural modeling and deferred rendering to enable seamless viewing experiences. The approach…
In late November 2023, following Sam Altman’s dismissal from OpenAI, Microsoft’s proposal to employ the entire OpenAI team was met with little enthusiasm. Employees cited concerns about corporate culture, financial losses, and the bureaucratic nature of Microsoft. They saw Microsoft as a less dynamic company, preferring to seek opportunities with other AI startups.
NeurIPS, the world’s largest AI conference, will occur in New Orleans from December 10-16, 2023. Google DeepMind teams will present over 150 papers.
Gemini AI, an advanced NLP model, is designed to exceed current benchmarks due to its multimodal capabilities, scalability, and potential for integration with Google’s ecosystem, marking a substantial advancement in AI technology.
Meta is rolling out over 20 generative AI updates to its platforms, introducing features like AI-enhanced search, invisible watermarking, and improvements to Meta AI. This update boosts user experience in areas such as messaging, social media interaction, and content creation, with further advancements expected in the upcoming year.
The “KnowNo” model teaches robots to ask for clarification on ambiguous commands to ensure they act correctly and minimize unnecessary human interaction. It combines language models with confidence scores to determine if intervention is needed. Tested on robots, it achieved consistent success and reduced the need for human aid.
Neosync is an open-source platform helping software development teams anonymize and generate synthetic data for testing while maintaining data privacy. It connects to production databases to facilitate data synchronization across environments and offers features like automatic data generation, schema-based synthetic data, and database subsetting. With its GitOps approach, asynchronous pipeline, and support for various databases…
MIT researchers developed an automated onboarding system that improves human-AI collaboration accuracy by training users when to trust AI assistance. Their method uses natural language to teach rules based on the user’s past interactions with AI, leading to a 5% improvement in image prediction tasks.
Generative AI in academia spurs debate without clear answers on its role, plagiarism, and permissible use. A study shows students and educators divided, seeking policy clarity. Concerns include detection of AI use, the risk of mental enfeeblement, equitable access, and the potential for false positives in AI-written work detection.