-
DAI#13 – DevDay hangovers, Nvidia flex, and sketchy AI pics
This week’s AI news roundup covers debates over AI’s potential control over humans, the EU AI Act, and new hardware such as Humane’s “AI Pin” and Nvidia’s H100 and H200 chips. It also looks at deployment challenges, such as the DDoS attack on OpenAI’s ChatGPT servers, and ethical concerns, including AI-generated child…
-
Text-to-image AI models can be tricked into generating disturbing images
Researchers have developed a method called “SneakyPrompt” that can bypass safety filters in popular text-to-image AI models, allowing them to generate inappropriate and disturbing images. The researchers highlight the ease with which AI models can be manipulated and the difficulty in preventing such content generation. Existing safety filters are inadequate, prompting the need for stronger…
-
Runway’s New ‘Motion Brush’ Feature in Gen-2 Will Let You Add Controlled Movement to Your Generations
Runway’s Gen-2 is a video generation tool. Its new Motion Brush function lets users direct the movement of generated content with simple brush strokes. This reduces the need for complex text inputs and extensive editing, making video creation more intuitive and accessible. Gen-2 faithfully restores…
-
Meet Google’s Project Open Se Cura: An Open-Source Framework to Accelerate the Development of Secure, Scalable, Transparent, and Efficient AI Systems
Project Open Se Cura is an open-source framework introduced by Google to enhance the development of secure and efficient AI systems. It aims to bridge the gap between hardware breakthroughs and advances in machine learning models and software development. The collaborative effort with partners like VeriSilicon, Antmicro, and lowRISC focuses on creating open-source design tools…
-
NetEase Youdao Open-Sources EmotiVoice: A Powerful and Modern Text-to-Speech Engine
NetEase Youdao has released an open-source text-to-speech (TTS) engine called EmotiVoice (“Yi Mo Sheng”). It offers web and script interfaces and supports batch generation, making it suitable for applications that require emotionally expressive speech synthesis. The engine supports over 2,000 timbres, covers Chinese and English, and includes a unique emotion synthesis feature. Another competitor in the…
-
This AI Paper Introduces a Deep Learning Model for Classifying Stages of Age-Related Macular Degeneration Using Real-World Retinal OCT Scans
A recent research paper presents a deep learning-based classifier for age-related macular degeneration (AMD) stages using retinal optical coherence tomography (OCT) scans. The model classifies macula-centered 3D volumes into four stages: Normal, early/intermediate AMD (iAMD), geographic atrophy (GA), and neovascular AMD (nAMD). The study highlights the significance of accurate AMD staging for timely treatment initiation and emphasizes…
-
This AI Paper from MIT Explores the Scaling of Deep Learning Models for Chemistry Research
Researchers from MIT investigated the scaling behavior of large chemical language models, including generative pre-trained transformers (GPT) for chemistry and graph neural network force fields (GNNs). They introduced the concept of neural scaling, examining the impact of model and data size on pre-training loss. The study also explored hyperparameter optimization using a technique called Training…
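As a rough illustration of what "examining the impact of model size on pre-training loss" can look like in practice, the sketch below fits a power law of the form loss ≈ a · N^(−alpha) by linear regression in log-log space. This is not the paper's code: the function name `fit_power_law`, the model sizes, and the loss values are all hypothetical, and the losses are synthetic numbers generated from an exact power law purely to show the fitting step.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) with least squares on log-transformed data.

    Hypothetical helper for illustration; returns (a, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Ordinary least-squares slope and intercept of log(loss) vs. log(size).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, made-up data: parameter counts and losses from an exact power law.
sizes = [1e5, 1e6, 1e7, 1e8]
losses = [5.0 * n ** -0.3 for n in sizes]

a, alpha = fit_power_law(sizes, losses)
print(f"fitted prefactor a = {a:.3f}, scaling exponent alpha = {alpha:.3f}")
```

On real training runs the points would be noisy, so the fitted exponent is only an estimate, and the same procedure can be repeated with dataset size in place of model size.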
-
This AI Research from China Introduces 4K4D: A 4D Point Cloud Representation that Supports Hardware Rasterization and Enables Unprecedented Rendering Speed
Dynamic view synthesis is a technique used in computer vision and graphics to reconstruct dynamic 3D scenes from videos. Traditional methods have limitations in terms of rendering speed and quality. However, a new approach called 4K4D has been introduced, which utilizes a 4D point cloud representation and a hybrid appearance model to achieve faster rendering…
-
This AI Paper Introduces Learning from Mistakes (LeMa): Enhancing Mathematical Reasoning in Large Language Models through Error-Driven Learning
A team of researchers from Xi’an Jiaotong University, Peking University, and Microsoft has developed a method called LeMa that improves the mathematical reasoning abilities of large language models (LLMs) by teaching them to learn from mistakes. They fine-tune the LLMs using mistake-correction data pairs generated by GPT-4. LeMa consistently improves performance across various LLMs and tasks,…
-
Improved DDIM Sampling with Moment Matching Gaussian Mixtures
In this research, a Gaussian Mixture Model (GMM) is proposed as a reverse transition operator in the Denoising Diffusion Implicit Models (DDIM) framework. By constraining the GMM parameters to match the first- and second-order central moments of the forward marginals, samples of equal or better quality than the original DDIM with Gaussian kernels can…
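To make the idea of moment matching concrete, here is a toy one-dimensional sketch. It is not the paper's sampler and makes no claim about how the authors parameterize their mixtures. It only shows one simple way to constrain a Gaussian mixture so that its overall mean and variance equal given targets. The function names, the shared component variance, and the example numbers are all assumptions made for illustration.

```python
def moment_matched_gmm(target_mean, target_var, offsets, weights):
    """Build a 1D Gaussian mixture whose mean and variance match the targets.

    Hypothetical construction: components share one variance and sit at
    target_mean + centered offsets. Returns (weights, means, component_var).
    """
    total = sum(weights)
    weights = [w / total for w in weights]
    # Center the offsets so the weighted mixture mean equals target_mean.
    shift = sum(w * d for w, d in zip(weights, offsets))
    offsets = [d - shift for d in offsets]
    # Variance contributed by the spread of the component means.
    spread = sum(w * d * d for w, d in zip(weights, offsets))
    if spread >= target_var:
        raise ValueError("offsets are too spread out for the target variance")
    component_var = target_var - spread
    means = [target_mean + d for d in offsets]
    return weights, means, component_var


def gmm_moments(weights, means, component_var):
    """Return the mean and variance of a shared-variance 1D Gaussian mixture."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (component_var + m * m) for w, m in zip(weights, means))
    return mean, second_moment - mean * mean


# Example values chosen arbitrarily for the illustration.
weights, means, component_var = moment_matched_gmm(
    target_mean=0.5, target_var=2.0, offsets=[-1.0, 0.0, 1.0], weights=[1.0, 1.0, 1.0]
)
mix_mean, mix_var = gmm_moments(weights, means, component_var)
print(f"mixture mean = {mix_mean:.6f}, mixture variance = {mix_var:.6f}")
```

The mixture here shares its first two moments with a single Gaussian of the same mean and variance, yet it can still differ in shape (for example, it can be multimodal), which is the extra flexibility a mixture kernel offers over a Gaussian one.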