-
How to Scale Your EMA
Preserving training dynamics across batch sizes is important for practical machine learning. One tool for achieving this is scaling the learning rate linearly with the batch size. Another tool is the use of model EMA, which creates a functional copy of a target model that gradually moves towards the parameters of the target model using…
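
As a rough illustration of the two tools named in this summary, here is a minimal sketch in plain Python. The function names, parameters, and example values are hypothetical and are not taken from the paper; the sketch shows only the generic linear learning-rate scaling rule and a standard EMA update, not the paper's own recipe for scaling the EMA.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: the learning rate grows in proportion to the batch size."""
    return base_lr * new_batch_size / base_batch_size


def ema_update(ema_params, target_params, momentum):
    """Standard EMA step: each EMA parameter moves a fraction
    (1 - momentum) of the way toward the matching target parameter."""
    return [
        momentum * e + (1.0 - momentum) * t
        for e, t in zip(ema_params, target_params)
    ]


# Hypothetical values, chosen only for illustration.
new_lr = scale_learning_rate(base_lr=0.1, base_batch_size=256, new_batch_size=1024)
ema = ema_update(ema_params=[0.0, 2.0], target_params=[1.0, 2.0], momentum=0.9)
```

With a momentum close to 1 the EMA copy changes slowly, which is why it trails the target model rather than matching it at every step.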
-
Diffusion Models as Masked Audio-Video Learners
Recently, a paper on the use of audio-visual synchronization for learning audio-visual representations was accepted at the Machine Learning for Audio Workshop at NeurIPS 2023. The paper discusses the effectiveness of unsupervised training frameworks, particularly the Masked Audio-Video Learners (MAViL) framework, which combines contrastive learning with masked autoencoding.
-
Agnostically Learning Single-Index Models using Omnipredictors
This text introduces a new approach to agnostically learning Single-Index Models (SIMs) with arbitrary monotone and Lipschitz activations. Unlike previous methods, it does not rely on predetermined settings or knowledge of the activation function. Additionally, it only requires the marginal to have bounded second moments, instead of stronger distributional assumptions. The algorithm is based on…
-
PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model
Autoregressive text-generation models often produce repetitive, low-quality output because errors accumulate during generation. This is commonly attributed to exposure bias, the mismatch between the conditions a model sees in training and at inference. Denoising diffusion models offer an alternative by letting a model revise its output, but they are computationally expensive and less fluent on longer text.
-
Improving Vision-inspired Keyword Spotting Using a Streaming Conformer Encoder With Input-dependent Dynamic Depth
This text proposes an architecture capable of processing streaming audio using a vision-inspired keyword spotting framework. By extending a Conformer encoder with trainable binary gates, the approach improves detection and localization accuracy on continuous speech while maintaining a small memory footprint. The inclusion of gates also reduces the average amount of processing without affecting performance.
-
Realistic talking faces created from only an audio clip and a person’s photo
Researchers have created a program called DIRFA that generates realistic videos by combining audio and a face photo. The program uses artificial intelligence to create 3D videos that accurately show the person’s facial expressions and head movements.
-
YouTube continues foray into AI with upcoming creative tools
YouTube is introducing new AI-powered features that allow users to compose music using the voices of popular artists and convert hummed melodies into songs. One feature, called “Dream Track,” allows users to generate songs in the styles of licensed artists, while another tool, “Music AI Tools,” supports musicians in their creative processes. These innovations are…
-
Microsoft joins the AI hardware market with a pair of custom chips
Microsoft has introduced its first custom AI chips, the Microsoft Azure Maia 100 AI Accelerator and the Microsoft Azure Cobalt 100 CPU. These chips are designed for AI and cloud computing applications and will be used in Microsoft’s data centers to power the Bing AI chatbot, Copilot, and Azure OpenAI. The goal is to meet the…
-
The Other Side of Data Contracts: Awakening Consumer Responsibility
Data organisations often overlook the responsibilities of data consumers in data contracts. To maximize the value of data, data contracts should outline the consumer’s obligations in analyzing and applying the data. Neglecting consumer commitments can reduce the business impact of data contracts. Consumer commitments should go beyond compliance and focus on value creation. Structured approaches,…
-
SCD2 — Semantics and Styles
This text discusses the semantics of slowly changing dimension type 2 (SCD2) techniques in dimensional modeling. It covers the importance of choosing appropriate reference dates and the impact of different row-versioning methods on access patterns. Three options for reference dates are discussed: extract timestamps, source system timestamps, and business timestamps. Additionally, the format of valid_to…
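
To make the row-versioning idea concrete, here is a minimal SCD2 sketch in Python. Everything in it is an assumption made for illustration: the `CustomerVersion` record, the `apply_change` and `as_of` helpers, the half-open `[valid_from, valid_to)` interval, and the use of `None` as the `valid_to` of the current row. It is not the article's recommended format, and the reference date passed in could be any of the three timestamp options the summary lists.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]  # assumption: None marks the current, open-ended row


def apply_change(history: List[CustomerVersion], customer_id: int,
                 new_city: str, reference_date: date) -> List[CustomerVersion]:
    """Close the current row (if any) and append a new version."""
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == new_city:
                return history  # attribute unchanged, so no new version
            row.valid_to = reference_date
    history.append(CustomerVersion(customer_id, new_city, reference_date, None))
    return history


def as_of(history: List[CustomerVersion], customer_id: int,
          when: date) -> Optional[CustomerVersion]:
    """Return the version valid at `when`, using a half-open interval."""
    for row in history:
        if (row.customer_id == customer_id
                and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row
    return None


history: List[CustomerVersion] = []
apply_change(history, 1, "Oslo", date(2023, 1, 1))
apply_change(history, 1, "Bergen", date(2023, 6, 1))
```

The `valid_to` convention is what drives the access pattern: with an open-ended `None` a point-in-time lookup needs the extra null check shown in `as_of`, whereas a far-future sentinel date would allow a plain range comparison.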