Denoising with Machine Learning
Until the last few years, the main techniques for removing noise from signals were based on spectral subtraction. Machine learning (ML) has now become a powerful alternative. It takes advantage of the fact that we know roughly what the denoised signal should sound like. I have a paper on this topic here (I'll add a paper download link).
The general principle is that we train the ML model to convert (noise + speech) -> speech. I use a database of 140,000 seconds of "books on tape." I add random noise to 1.024-second clips of speech and then ask the model to reverse, or remove, the noise.
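To make the (noise + speech) -> speech setup concrete, here is a minimal sketch of how one training pair could be built. It is not my actual training script: the function name make_training_pair, the 16 kHz sample rate (which makes a 1.024-second clip exactly 16,384 samples), and the choice to measure "louder" as an RMS amplitude ratio are all assumptions made for illustration.

```python
import numpy as np

SAMPLE_RATE = 16000                    # assumed sample rate, not stated in the post
CLIP_SAMPLES = 16384                   # 1.024 s at the assumed 16 kHz


def make_training_pair(speech, noise_ratio, rng):
    """Return (noisy, clean) for one clip.

    White Gaussian noise is scaled so that its RMS amplitude is
    noise_ratio times the RMS amplitude of the speech clip.
    """
    speech = np.asarray(speech, dtype=np.float64)
    speech_rms = np.sqrt(np.mean(speech ** 2))

    noise = rng.standard_normal(speech.shape)
    noise_rms = np.sqrt(np.mean(noise ** 2))
    noise *= noise_ratio * speech_rms / noise_rms

    # The model input is the noisy mix; the training target is the clean clip.
    return speech + noise, speech


# Usage: a sine tone stands in for a real speech clip.
rng = np.random.default_rng(0)
t = np.arange(CLIP_SAMPLES) / SAMPLE_RATE
clip = np.sin(2 * np.pi * 220.0 * t)
noisy, clean = make_training_pair(clip, noise_ratio=3.0, rng=rng)
```

The model would then be trained to map noisy back to clean, with a fresh random noise draw for every clip.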
What I've found is that I can remove white noise that is up to 3x louder than the underlying speech signal. Unfortunately, based on listening carefully to hardware-produced white noise, I feel that spirit voices are at least 20x quieter than the background noise. So, if I make a model that can remove 3x white noise, I must apply this model multiple times to a white noise source, and after a few iterations, it'll produce something akin to human speech. In fact, I'm fairly certain this method works. However, the voice quality is not clear enough: when sharing audio clips with other researchers, we generally can't agree on much of what is being said. Simply put, removing additive noise is not enough to solve the clear speech ITC problem; however, it may be enough for a dedicated ITC researcher to work with.
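The "apply the model multiple times" idea can be sketched as follows. This is an illustration, not my real pipeline: denoise_fn is a hypothetical stand-in for a trained model, and the pass count rests on the idealized assumption that every pass cuts the noise-to-speech ratio by the same factor (3x here), which real models will not do exactly.

```python
import math

import numpy as np


def passes_needed(target_ratio, per_pass_ratio):
    """Smallest n such that per_pass_ratio ** n >= target_ratio.

    Assumes (idealized) that each pass reduces the noise-to-speech
    ratio by a constant factor of per_pass_ratio.
    """
    return math.ceil(math.log(target_ratio) / math.log(per_pass_ratio))


def denoise_iteratively(signal, denoise_fn, n_passes):
    """Feed the output of denoise_fn back into itself n_passes times."""
    out = np.asarray(signal, dtype=np.float64)
    for _ in range(n_passes):
        out = denoise_fn(out)
    return out


# Noise 20x louder than the voice, 3x removed per pass (idealized).
n = passes_needed(20, 3)
print(n)  # 3, since 3**2 = 9 < 20 <= 27 = 3**3
```

In practice the number of passes is something to tune by ear, since each pass can also distort whatever speech-like content is present.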
Another major drawback of ML is that my models, in their current form, are computationally expensive. It is common to run ML on graphics processing units (GPUs), typically from NVIDIA. I estimate that the NVIDIA GTX 1650 is the minimum hardware needed to run multiple iterations of my ML models in real time. Budget gaming laptops like the Acer Nitro series, at around $600-$700, have the requisite GPU. An alternative is to host the machine learning models in the "cloud" to yield a single stream for people to tune into. Once again, someone needs to be willing to spend the money to buy a similar computer and host it 24/7 (electricity, AC, etc.).
If you have an ML-capable GPU and would like to explore ML-based processing for ITC, let me know, and I will share my Python scripts and trained model files with you. The spirits are always excited about expanding "the network."