OpenVINO AI effects for Audacity

151 points
a month ago
by wazoox

Comments


mmastrac

I tried the Audacity noise-removal plugin recently and it's complete crap. I fed a high-quality audio stream from a Rode mic into a few different options to see which could remove the noise of my server rack. iMovie made the voice sound like a robot and Audacity barely did anything. The only thing that worked was DeepFilterNet and it's free, open-source and cargo installable.

There's no reason to lock yourself into an intel-only solution. Just use DeepFilterNet. The results of this on my noisy server room were insanely good. Almost no voice dropout with 100% fan noise removal.

https://github.com/Rikorose/DeepFilterNet

EDIT: Even more interesting, it looks like OpenVino is just DeepFilterNet glued to Whisper.cpp and tied to Intel hardware.

https://github.com/intel/openvino-plugins-ai-audacity/tree/m...

a month ago

dragonwriter

OpenVINO is an Intel toolkit for deploying AI models. This particular project is an Intel project using the OpenVINO toolkit to package several existing models as Audacity plugins.

a month ago

refulgentis

> an intel-only solution.

> OpenVino is just DeepFilterNet glued to Whisper.cpp and tied to Intel hardware.

Well, no.

When you want to run a model on a truly wide set of devices, you end up sort of wedged into ONNX, OpenVINO, TensorFlow Lite, or one of a few other frameworks.

They're all FOSS, and they're software libraries.

YMMV on which is best, of course, but broadly and widely: where are your users, mostly? Desktop? OpenVINO. Web? TensorFlow. Mobile and desktop? ONNX. This isn't entirely accurate because, e.g., I reach for ONNX every time because that is what I'm familiar with. All of them make an effort to reach every platform, e.g. OpenVINO supports ARM, and not in a trivial manner.

That all being said, TL;DR:

It is "not even wrong", in the Pauli sense, to imply OpenVINO is Intel-only, and to describe OpenVINO as "just glu[ing a model to inference code]"

You're describing 3 different components (a hardware acceleration library, an inference library, and a model) and suggesting the hardware-accelerated inference library just glues together a model-specific inference library and a model. The matryoshka doll is inverted: whisper.cpp uses OpenVINO to accelerate its model-specific inference code.

a month ago

zeckalpha

The Noise Removal plugin takes a bit of getting used to, but I have had great results from it. I don't mean to point to operator error... it's got too many options for someone new to it to tune.

a month ago

sorenjan

This only works with Intel GPUs, and CPUs and NPUs. No Nvidia support for instance.

https://docs.openvino.ai/2024/about-openvino/release-notes-o...

a month ago

Aromasin

I used to work at Intel doing OpenVINO stuff. Should work on AMD too; it's just not validated for it so there might be quirks.

a month ago

vient

They have some NVIDIA support in the form of external project: https://github.com/openvinotoolkit/openvino_contrib/tree/mas...

a month ago

VTimofeenko

Nvidia has Broadcast which is Windows-only:

https://www.nvidia.com/en-us/geforce/broadcasting/broadcast-...

a month ago

sorenjan

Doesn't work on files so can't be used in Audacity, only on live mic audio.

a month ago

VTimofeenko

Fair point! I don't suppose Windows allows pulse audio style routing of audio streams to use denoising on a fake microphone.

25 days ago

BizarroLand

I ran it on my AMD 5950X with an RTX 3090. It crashed when I attempted GPU processing but ran fine on CPU. YMMV

25 days ago

idunnoman1222

So use your cpu

a month ago

htsh

A lot of us have ryzen / nvidia combos... hopefully, soon, though.

a month ago

bigbones

OpenVINO runs fine on AMD last I checked

a month ago

_carbyau_

Maybe it does; however, the system requirements page makes it look like it supports everything BUT AMD.

https://docs.openvino.ai/2024/about-openvino/release-notes-o...

a month ago

7speter

It supports AMD cpus because, if I understand correctly, AMD licenses x86 from Intel, so it shares the same bits needed to run openVINO as Intel’s cpus.

Go look at CPU benchmarks on Phoronix; AMD Ryzen CPUs regularly trounce Intel CPUs using OpenVINO inference.

a month ago

dragonwriter

Or use the underlying open-source models directly; this is just several existing open models packaged by an Intel-specific deployment framework and wrapped as Audacity plugins.

a month ago

7speter

This is a great suggestion and all, but don’t you need a frontend/pipeline to run data through these models?

a month ago

dragonwriter

There are existing frontends for these models that aren't tied to Intel hardware. It may be somewhat less convenient than having them packaged as audacity plugins, but they certainly exist, for people who would want to use them but do not want to be limited to Intel hardware.

a month ago

7speter

It might work with AMD CPUs too

a month ago

jogu

I've used this plugin on an AMD CPU, it definitely works.

a month ago

smusamashah

Is there a tool that can remove a song from a very noisy audio recording using the actual song as a reference?

I found a very old audio cassette from my childhood with me and some other kids talking while a song is playing in the background. I tried subtracting the song using Audacity, but for that to work the reference song and the recording must align "perfectly", which is very, very hard. Not just the timing (which I found can be a problem with cassettes); the loudness/frequency distribution must also align perfectly.

Found Smartsubtract https://oxfordwaveresearch.com/products/smartsubtract/ which seems to do exactly that, but it's not available for download.

Is there any (AI even?) tool that might do that? I tried an online AI tool which claimed it could extract voices, but it returned silence. I want to try OpenVINO but I'm not sure it will be useful with faint spoken words in a noisy environment with a song.

a month ago

regularfry

I don't know if there's an available tool to do it, but "assuming these sources are aligned in time, remove this reference B from that recording A" would be quite a nice undergrad problem. You'd do something like cross-correlate A with B, multiply B by the correlation coefficient and then subtract the result from A. In the frequency domain, because that makes things a little easier.

The next question on the problem would be "Give at least three reasons why this doesn't perfectly remove the reference sound," of course.
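A rough sketch of the align, scale, and subtract idea described above, in plain Python. It works in the time domain rather than the frequency domain to stay short, and it assumes a single constant lag and a single gain for the whole recording, which a stretched or warbling cassette will not satisfy. The function names `best_lag` and `subtract_reference` and the synthetic test signal are made up for illustration.

```python
import math
import random

def best_lag(recording, reference, max_lag):
    """Return the shift (in samples) of `reference` that best matches `recording`."""
    def score(lag):
        # Cross-correlation at this lag: sum of recording[n] * reference[n - lag].
        total = 0.0
        for n, sample in enumerate(recording):
            k = n - lag
            if 0 <= k < len(reference):
                total += sample * reference[k]
        return total
    return max(range(-max_lag, max_lag + 1), key=score)

def subtract_reference(recording, reference, max_lag=50):
    """Align `reference` to `recording`, scale it by a least-squares gain, subtract."""
    lag = best_lag(recording, reference, max_lag)
    # Shifted copy of the reference, zero-padded where it has no samples.
    shifted = [
        reference[n - lag] if 0 <= n - lag < len(reference) else 0.0
        for n in range(len(recording))
    ]
    energy = sum(s * s for s in shifted)
    # Gain that minimises the energy of (recording - gain * shifted).
    gain = sum(r * s for r, s in zip(recording, shifted)) / energy if energy else 0.0
    residual = [r - gain * s for r, s in zip(recording, shifted)]
    return residual, lag, gain

# Synthetic demo (assumed data): noise stands in for the song, a short tone for the voices.
rng = random.Random(0)
song = [rng.gauss(0.0, 1.0) for _ in range(2000)]
voice = [0.1 * math.sin(0.3 * n) if 500 <= n < 600 else 0.0 for n in range(2000)]
recording = [(0.5 * song[n - 7] if n >= 7 else 0.0) + voice[n] for n in range(2000)]

residual, lag, gain = subtract_reference(recording, song, max_lag=20)
print("estimated lag:", lag, "estimated gain:", round(gain, 3))
```

On real tape the lag drifts over time and the playback chain colours the song differently at each frequency, so a single lag and gain would leave a lot of the music behind; that is part of the answer to the follow-up question.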

a month ago

BizarroLand

I would try openvino, it might get you part of the way there.

Other things to do would be to fix any tape warble or flutter, normalize the volume, and do simple things like high-pass filtering anything below 75 Hz (as most voices don't make audible volume at those frequencies, especially children's voices).

Then I would get a spectrum analyzer plugin and see if there are any spots that are clearly music vs children speaking and zap them out.

(Audition is pretty good software for this, you might still be able to find the download and mass unlock serial key that Adobe released for version 3.0 somewhere on the internet, of course, this is only for people who bought it as a perpetual license back in the day and need to activate it now that the activation servers have gone offline, so no pirating it!)

I'm not gonna say it will be perfect but you might do well enough to be able to hear what everyone is saying and it not sound too bizarre.
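For anyone curious what the 75 Hz high-pass step amounts to, here is a minimal sketch in plain Python. It is a first-order (6 dB per octave) filter, much gentler than what an audio editor would normally apply, and the function name `high_pass` and its parameters are made up for illustration.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz=75.0):
    """First-order (RC-style) high-pass filter; attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for x in samples:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out

# Assumed demo signal: 20 Hz rumble plus a 1 kHz tone, one second at 8 kHz.
rate = 8000
mixed = [
    math.sin(2 * math.pi * 20 * n / rate) + math.sin(2 * math.pi * 1000 * n / rate)
    for n in range(rate)
]
filtered = high_pass(mixed, rate)
```

A single pass only knocks the rumble down somewhat; running the filter several times, or using a higher-order design, gives a steeper cut.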

25 days ago

smusamashah

I tried Photosounder, which works on the FFT of the sounds. I expected it to have many of the powers of Photoshop to crop/copy/subtract the spectrum, which I couldn't do with it. It has the option of layers, but it's not as intuitive as Photoshop-like tools.

Also, looking at the frequency spectrum and removing sounds from there, I learned that music and speech both contain a bunch of different frequencies. To completely eliminate the music, I have to remove all of it, which is not easy to see or trace manually.

From what you are saying, given Audition is an Adobe tool, it should be sufficient.

25 days ago

oDot

Try iZotope RX

a month ago

smusamashah

Thanks for the suggestion. Some quick lookup suggests it should be able to do it. Will give this a try.

a month ago

kmfrk

I'm a big fan of RTX Voice, but it seems like the kind of feature you can only use in real time as virtual audio and not as postprocessing. Anyone know if Nvidia makes this possible?

a month ago

pabs3

Wonder if these are open models like RNNoise now is.

https://github.com/xiph/rnnoise

a month ago