Meta Unveils Open‑Source AI Tool for Easy Audio Cleanup
Meta has released SAM Audio, an open‑source AI model that separates specific sounds from complex recordings based on a description of the target sound. Instead of working through complicated editing software, users can extract voices, instruments, or background noises with a simple text prompt. The model is available through Meta's Segment Anything Playground, which also hosts Meta's other prompt‑based image and video editing tools.
How SAM Audio Works
SAM Audio is designed to identify the requested sound and separate it cleanly from the rest of the mix. With a short text description of the target sound, creators can isolate vocals from a band recording, remove traffic noise from a podcast, or take a barking dog out of a recording.
Supported Prompt Types
The multimodal model supports three types of prompts:
- Text prompts – simple natural‑language descriptions of the sound.
- Visual prompts – images or video frames that point to the object or person producing the target sound.
- Time‑based prompts – the time range in which the target sound occurs.
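To make the three prompt types more concrete, here is a minimal Python sketch of how a client application might represent them. All class, field, and method names (`TextPrompt`, `VisualPrompt`, `TimeSpanPrompt`, `to_samples`) are hypothetical and do not describe SAM Audio's actual interface. The sketch mainly shows how a time‑based prompt maps onto sample positions, assuming a 48 kHz recording.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical prompt types for illustration; SAM Audio's real interface may differ.

@dataclass
class TextPrompt:
    description: str  # natural-language description, e.g. "barking dog"

@dataclass
class VisualPrompt:
    frame_path: str  # path to an image or video frame showing the sound source

@dataclass
class TimeSpanPrompt:
    start_s: float  # where the target sound begins, in seconds
    end_s: float    # where the target sound ends, in seconds

    def to_samples(self, sample_rate: int) -> tuple[int, int]:
        """Convert the time span to [start, end) sample indices."""
        if not 0 <= self.start_s < self.end_s:
            raise ValueError("need 0 <= start_s < end_s")
        return round(self.start_s * sample_rate), round(self.end_s * sample_rate)

Prompt = Union[TextPrompt, VisualPrompt, TimeSpanPrompt]

# Combining prompt types, which Meta reports gives the best results.
prompts: list[Prompt] = [
    TextPrompt("acoustic guitar"),
    TimeSpanPrompt(start_s=12.5, end_s=14.0),
]
print(prompts[1].to_samples(48_000))  # (600000, 672000)
```

At 48,000 samples per second, the span from 12.5 s to 14.0 s covers samples 600,000 up to (but not including) 672,000.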
SAM Audio uses Meta's Perception Encoder Audiovisual engine to analyze sounds before separating them. Meta also released two companion tools: SAM Audio‑Bench, a benchmark for evaluating separation performance on speech, music, and sound‑effects tasks, and SAM Audio Judge, which rates how natural and accurate the separated audio is.
Performance & Limitations
According to Meta, SAM Audio performs best when prompt types are combined, and it can process audio faster than real time, even at scale. However, the model has a few limitations:
- It does not support audio‑based prompts.
- Some form of prompting is always required for separation.
- It struggles with highly overlapping, similar sounds (e.g., isolating a single voice from a choir).
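The claim that SAM Audio processes audio "faster than real time" is commonly expressed as a real-time factor (RTF): processing time divided by the duration of the audio. A value below 1.0 means the model finishes before the clip would finish playing. Meta's claim is not quantified in this article, so the numbers in this minimal Python sketch are purely illustrative and are not measurements of SAM Audio.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration.

    A result below 1.0 means processing ran faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds

# Illustrative numbers only: 15 s of processing for a 60 s clip.
rtf = real_time_factor(processing_seconds=15.0, audio_seconds=60.0)
print(rtf, rtf < 1.0)  # 0.25 True
```

An RTF of 0.25 means one minute of audio would take about 15 seconds to process. At scale, the same ratio applies to each hour of audio in a large batch.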
Future Directions
Meta plans to address these limitations and is exploring real‑world uses, including partnerships with hearing‑aid manufacturers and with organizations that support people with disabilities. SAM Audio is part of Meta's broader AI work, which includes improving voice clarity on its AI glasses, developing next‑generation mixed‑reality glasses, and building conversational AI systems.
Impact
SAM Audio lowers the barrier to cleaning up noisy recordings: instead of editing waveforms by hand, users describe the sound they want to keep or remove. If Meta continues to refine the model and addresses its current limitations, it could reshape workflows in music production, podcasting, film and television, accessibility tools, and research.
