Beats To Rap On Experience

The Sonic Dissection: How AI Stem Splitting Is Reshaping Music Creation

Chet

Discover how AI stem splitting is revolutionizing music. We break down the tech behind vocal isolation, beat extraction, and remixing—used by DJs, producers, and indie artists alike. From the science of spectrograms to cultural implications and ethical debates, this episode dives deep into how AI is transforming sound itself. Featuring insights from BeatsToRapOn and tools like Spleeter, Valkyrie AI Mastering, and more.

Okay, let's unpack this. Think about that one song you just can't get out of your head. Yeah. Now imagine an AI going in with laser-like precision, pulling apart every single ingredient: the singer's voice, that rumbling bassline, the actual beat that gets your foot tapping. It's almost eerie how well it can work now, right? This whispering revolution you mentioned, it's happening because of AI audio stem splitting. Exactly, a disruption so subtle yet seismic. That's what we're diving into. And what's really fascinating here is that this tech, AI audio stem splitting, basically takes a finished song, a complete piece of music, and dissects it, breaking it down into its core components, those individual sound layers we call stems. Like reverse-engineering a sonic puzzle. That's a great way to put it.

So look, whether you're a musician wanting to mess around with a track, a producer hunting for that perfect little snippet, a DJ ready to spin a completely unique mix, or honestly just someone fascinated by how tech is shaking up music, hopefully in a cool way, you're in the right place. We're going to try to break down this really exciting technology without getting totally lost in the weeds or the complicated terms, and hopefully there will be some aha moments along the way. That's the plan. Our goal for this deep dive is really to get to grips with AI audio stem splitting: the basic ideas, the science powering it, how it's actually being used out there in the real world, some of the, well, interesting discussions and frankly questions it's raising. There are definitely questions. And ultimately, what does it all mean for the future of making music and even just how we listen to it? We're drawing on some really insightful pieces from BeatsToRapOn, so you get both the tech side and the kind of cultural shifts.
That it's causing, that's right. They've got some good perspectives on this. You know, for years and years, if you wanted to isolate, let's say, just the vocals from an old recording... Oh yeah, nightmare. Total headache, right. It often meant digging up the original multi-track tapes, if they even existed. Yeah, and those could be a real tangled mess. Absolutely: time-consuming, expensive, sometimes impossible. But here's where it gets really interesting. AI models like Spleeter have completely flipped the script. Totally changed the game. It's like going from painstakingly trying to untangle a massive knot by hand to suddenly having an incredibly precise digital tool that can separate each strand almost instantly.

Yeah, the core idea behind AI stem splitting is using deep learning models. They're essentially smart algorithms trained to analyze a piece of music and then intelligently separate out the different bits: vocals, bass, drums, maybe other instruments too. The "intricate digital scalpel blades" analogy from the source is pretty good, isn't it? It really is. They're designed to isolate sounds with, well, remarkable accuracy, sometimes. And one of the big breakthroughs, you mentioned this U-Net thing, sounds a bit sci-fi. Oh yeah, it does a bit, but it started somewhere completely different. Like medical images? Exactly. It originated in biomedical image segmentation, finding tumors in scans, that kind of thing. Wow. So how did that jump from looking at pictures to listening to music?
Well, the underlying structure, this U-Net architecture, has this process of, let's call it, contraction and expansion, and it's been cleverly adapted for audio. The contraction phase sort of compresses the audio information, finding the key sonic features. Got it. Then the expansion phase rebuilds the audio, but ideally with the different stems neatly separated out. Think of it like a symphonic journey where the music is broken down and then rebuilt piece by piece. So the AI isn't just, like, vaguely listening, it's really getting into the sonic details. How does it know what's a drum and what's a vocal? It works by looking at the spectral components. Imagine shining light through a prism: you see all the colors, right? Yeah. The AI does something similar with sound, looking at all the different frequencies present at any moment. It learns to recognize patterns, pitch, rhythm, how long notes last, using things called convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Okay, those acronyms again, CNNs and RNNs. Yeah, think of CNNs as helping the AI see the shape of a sound in the frequency data, like a visual pattern, and RNNs as helping it understand how sounds change over time, the sequence. Right, right.

So it's pattern recognition, essentially. But I'm guessing, you know, it's not perfect? It doesn't always nail it? Oh, absolutely not, and that's a really important point. This isn't magic. Okay. It's machine learning. It needs huge amounts of music data, carefully labeled data, to train on. Right, someone has to teach it first. Exactly, and even then the results aren't always pristine. You might hear, like the sources say, "ghostly remnants" or artifacts, little bits of, say, guitar leaking into the vocal track. Precisely, or maybe the drums sound a bit washy. It happens. But still, there are some pretty cool tools out there making this tech way more accessible, right?
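As a rough sketch of that contraction-and-expansion flow, here is a shapes-only toy in plain NumPy. A real U-Net learns convolutional filters and concatenates skip connections; this stand-in just shrinks and re-grows a fake spectrogram to show the data flow, so treat it as an illustration of the architecture's shape, not an implementation of it.

```python
import numpy as np

# Fake spectrogram: 128 frequency bins x 64 time frames.
spec = np.random.rand(128, 64)

def contract(x):
    """Halve both axes by averaging 2x2 blocks (a stand-in for pooling)."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def expand(x):
    """Double both axes by repeating values (a stand-in for upsampling)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Contraction path: compress down to a compact representation.
h1 = contract(spec)    # (64, 32)
h2 = contract(h1)      # (32, 16)

# Expansion path: rebuild to the original resolution.
u1 = expand(h2)        # (64, 32)
u2 = expand(u1 + h1)   # (128, 64); adding h1 mimics a skip connection

print(spec.shape, h2.shape, u2.shape)
```

The point is simply that the output comes back at the input's resolution after passing through a bottleneck, which is what lets the network emit a full separated stem (or a mask for one) per input spectrogram.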
Spleeter was mentioned as a big one. Yeah, Spleeter, developed by Deezer, is definitely a major player, especially in the open-source world. Open source meaning free to use and modify? Generally, yes. And it's known for being relatively easy to use. Plus you can choose how many stems you want; the common four-stem mode gives you vocals, drums, bass, and "other". That's really handy. Okay, let's dive a bit deeper into the science stuff: the Fourier transform and spectrograms. What's that all about? Okay, so the Fourier transform, think of it as a mathematical tool. It takes the audio signal, which we hear changing over time, and translates it into its frequency components: what frequencies are present and how loud they are at each moment. Okay, breaking it down by pitch, sort of. Sort of, yeah. And this frequency information is often visualized as a spectrogram. It's like a picture of the sound? Exactly. It's a visual representation, almost an input image, that the neural networks can analyze. The AI gets trained by seeing tons of these spectrograms for isolated instruments, so it learns what a drum looks like in the spectrogram view versus a vocal versus a bass guitar. Okay, that makes sense. It's seeing the sound patterns. Right. And then there are things called loss functions during training, which basically tell the AI how well it's doing and help it minimize the interference, the bleed, between the separated stems. Got it.

But you mentioned limitations earlier. Are there kinds of music where these AI models just get confused? Oh, definitely. Current models often have a harder time with music that doesn't follow, let's say, standard pop or rock structures. Like what? Think about really complex jazz improvisation, or maybe some experimental electronic music with unusual sounds. Even live recordings can be tricky, because sounds bleed into different microphones naturally, right?
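That Fourier-transform-to-spectrogram step can be sketched in a few lines of NumPy. This is a deliberately minimal version: non-overlapping frames and a steady synthetic tone, whereas production pipelines use overlapping windows and log-scaled magnitudes. The numbers (8 kHz sample rate, 256-sample frames, a 500 Hz tone) are arbitrary choices for the demo.

```python
import numpy as np

sr = 8000                               # sample rate (Hz)
t = np.arange(sr) / sr                  # one second of time stamps
signal = np.sin(2 * np.pi * 500 * t)    # a steady 500 Hz test tone

# Slice into frames, window each frame, take magnitude of its FFT.
frame_len = 256
n_frames = len(signal) // frame_len
frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
window = np.hanning(frame_len)

# Rows = time frames, columns = frequency bins: the "picture of the sound".
spectrogram = np.abs(np.fft.rfft(frames * window, axis=1))

# Every frame's energy should peak at the bin nearest 500 Hz.
freqs = np.fft.rfftfreq(frame_len, 1 / sr)   # bin spacing = 8000/256 = 31.25 Hz
peak_bin = spectrogram[0].argmax()
print(freqs[peak_bin])                       # ~500.0
```

A separation network is trained on exactly this kind of 2-D array, learning which frequency patterns belong to which instrument at each moment in time.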
Right, the separation isn't clean to begin with. Exactly. It reminds us that, while AI is amazing at analysis, it doesn't yet understand music the way a human does. The emotional context, the cultural nuances, that's still beyond it. Okay, so we've got a decent handle on what it is and the science, sort of, under the hood. Where are we actually seeing this stuff used in the real world? Well, commercial uses are already pretty widespread, actually. Yeah, like major record labels? And streaming platforms. They're using stem splitting for things like improving music recommendations, maybe enhancing the sound quality of older tracks where the original masters are poor, or even creating more personalized listening experiences. Interesting. But it's not just the big players, right? The article mentioned independent artists getting empowered too. Absolutely, that's a huge aspect. Platforms like, well, like BeatsToRapOn are actually offering AI stem splitters as a tool for artists. How does that work? It means an independent musician can take, say, a royalty-free beat they like and easily pull out just the drums or just the bassline to sample it, remix it, build something totally new on top of it. I see, so it lowers the barrier for creativity. Exactly, and they mention features like saving the split tracks, even getting data files about peaks and lows for more precise editing. That's real control. Yeah, that sounds powerful.
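To make the "pull out just the bassline" idea concrete, here is a toy sketch in plain NumPy. Real splitters like Spleeter apply masks that a neural network has learned over spectrograms; this hand-picked frequency mask on a mix of two synthetic tones only illustrates the underlying masking principle on the easiest possible input.

```python
import numpy as np

sr = 8000                      # sample rate (Hz)
t = np.arange(sr) / sr         # one second of audio

bass = np.sin(2 * np.pi * 80 * t)      # low "bassline" at 80 Hz
vocal = np.sin(2 * np.pi * 440 * t)    # higher "vocal" tone at 440 Hz
mix = bass + vocal                     # the finished "song"

# Move to the frequency domain (the Fourier transform step).
spectrum = np.fft.rfft(mix)
freqs = np.fft.rfftfreq(len(mix), 1 / sr)

# A hard mask: everything below 200 Hz goes to the bass stem,
# everything above it to the "vocal" stem.
bass_mask = freqs < 200
bass_stem = np.fft.irfft(spectrum * bass_mask)
vocal_stem = np.fft.irfft(spectrum * ~bass_mask)

# On this synthetic mix the recovered stems match the originals.
print(np.max(np.abs(bass_stem - bass)))    # ~0
print(np.max(np.abs(vocal_stem - vocal)))  # ~0
```

Real instruments overlap heavily in frequency, which is exactly why a fixed cutoff fails on actual music and a learned, time-varying mask is needed.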
So it's not just taking things apart, it's enabling people to build new things. And I guess the obvious uses are DJs and karaoke. Oh, for sure. For DJs, imagine being able to instantly grab just the vocal a cappella from one track and layer it over the instrumental of another. A total game-changer for live mixing. Yeah, endless possibilities. And karaoke: forget searching for specific backing tracks, you could potentially remove the main vocal from almost any song. Wow. Okay, now I know BeatsToRapOn did, like, a head-to-head comparison, their splitter versus others like Voice.ai and VocalRemover.org, specifically on drums. They did, and well, based on their analysis, they're claiming their proprietary splitter comes out on top. In what way? They're saying it delivers a cleaner low end for the bass drum and bass, sharper transients, you know, the initial crack of the snare drum. Right, the attack. Exactly. Higher bandwidth, meaning it captures more of the high frequencies, and fewer unwanted artifacts, less of that bleed we talked about. And they back this up somehow? Yeah, they mention using tools like SoX, which can generate those spectrograms we discussed, to visually compare the frequency content. They looked at metrics like bandwidth, the clarity of those transient peaks, the overall dynamic range, and the level of noise or unwanted sounds. Okay. They're highlighting things like full-range capture, purity in the low frequencies, really crisp transients, good headroom, meaning less distortion, and just a cleaner overall sound from their tool. That's quite the claim. And I saw a connection mentioned between this stem splitting and AI mastering. How do they link up?
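Those kinds of measurements (bandwidth, transient attack, dynamic range) can be approximated with simple signal statistics. The formulas below are illustrative stand-ins computed on a synthetic drum hit, since the exact metrics used in that comparison aren't spelled out; real evaluations would run them on actual separated stems.

```python
import numpy as np

sr = 8000
t = np.arange(sr // 4) / sr                           # 250 ms of audio
rng = np.random.default_rng(0)
drum = np.exp(-30 * t) * np.sin(2 * np.pi * 120 * t)  # decaying 120 Hz thump
drum += 0.1 * np.exp(-200 * t) * rng.standard_normal(len(t))  # noisy attack

spectrum = np.abs(np.fft.rfft(drum))
freqs = np.fft.rfftfreq(len(drum), 1 / sr)

# "Bandwidth": highest frequency still holding meaningful energy
# (here, anything above 1% of the spectral peak).
bandwidth = freqs[spectrum > 0.01 * spectrum.max()].max()

# "Transient sharpness": how quickly the waveform reaches its peak.
attack_samples = int(np.abs(drum).argmax())

# "Dynamic range": peak level versus the tail of the decay, in dB.
peak = np.abs(drum).max()
tail = np.abs(drum[-sr // 20:]).max() + 1e-12   # last 50 ms
dynamic_range_db = 20 * np.log10(peak / tail)

print(bandwidth, attack_samples, round(dynamic_range_db, 1))
```

A sharper transient shows up as a smaller attack time, and a cleaner separation shows up as a larger peak-to-tail range and less spurious high-frequency energy.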
Well, it makes sense if you think about it. If you feed an AI mastering tool really clean, well-separated stems, it gives the mastering AI a much better starting point to work with. It can potentially do a better job balancing the levels, adjusting the EQ for each part, optimizing the overall loudness and punch of the final track. Because it's not fighting against sounds bleeding into each other. Exactly. BeatsToRapOn even mentions their own AI mastering system, Valkyrie, describing it as this sort of swarm of specialized AI agents that analyze different parts of the sound. Having clean stems would presumably make that process more effective.

It all sounds incredibly powerful, almost, hmm, scary sometimes, which brings up the ethical side. Yeah, that's definitely part of the conversation. There are fears about, you know, jobs. Will this replace traditional audio engineers or producers? Right. And then there are big questions about ownership and copyright. If AI can perfectly extract, say, a vocal from a famous song, who owns that extracted stem? What can you legally do with it? Hmm, consent and creative integrity become really tricky. Absolutely. There's this tension between, on the one hand, democratizing music creation, making it easier for anyone to participate, which sounds good, and on the other hand, the potential for misuse, or maybe even a more homogenized sound if everyone is just endlessly remixing the same separated parts. Yeah, that's a concern. Does it stifle originality in some ways? It's a valid debate. And sometimes, the sources mention, the AI doesn't get it perfect, and these artifacts or surreal results emerge. The imperfections. Yeah, and maybe those imperfections themselves can be interesting, like windows into hidden dimensions of sound, offering unexpected starting points for creative experimentation. Using the glitches almost like a sound of resistance against perfectly polished music. That's one way to frame it.
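The mastering link mentioned above can be illustrated with a tiny NumPy sketch: once the parts exist as separate stems, a tool can set each part's level independently before summing, instead of fighting a single mixed waveform. This is a toy of that one idea (per-stem loudness balancing), not a description of how Valkyrie actually works, since its internals aren't public.

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
stems = {
    "vocals": 0.1 * np.sin(2 * np.pi * 440 * t),   # recorded too quiet
    "bass":   0.9 * np.sin(2 * np.pi * 80 * t),    # recorded too loud
}

def rms(x):
    """Root-mean-square level, a simple proxy for perceived loudness."""
    return np.sqrt(np.mean(x ** 2))

# Scale every stem to the same target loudness, then sum to a master.
target_rms = 0.2
balanced = {name: x * (target_rms / rms(x)) for name, x in stems.items()}
master = sum(balanced.values())

for name, x in balanced.items():
    print(name, round(rms(x), 3))   # each stem now sits at 0.2 RMS
```

With bleed between stems, the same gain change would drag fragments of other instruments up or down with it, which is why cleaner separation gives any downstream mastering step more room to work.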
Yeah, embracing the messy, unpredictable side of creativity that AI might inadvertently reveal. So, looking ahead then, what are the next big challenges? Where's this tech heading? Well, one major technical hurdle is phase reconstruction. Phase? What's that? Okay, so sound waves have amplitude (loudness) and frequency (pitch), but also phase, which relates to the timing of the wave cycles. Right. Getting the phase right is really important for high-fidelity audio, for making it sound natural and clear. Current models often focus more on loudness and frequency, but accurately recreating the phase is still tricky. They're working on algorithms like Griffin-Lim for that. Okay, so better sound quality is one goal. What else? Genre diversity is another big one. As we said, models struggle with certain styles. Yeah. So creating a truly universal model that works perfectly on everything from classical to hip-hop to avant-garde jazz is tough. We'll likely see more models fine-tuned for specific genres, or techniques like transfer learning being used, where a model trained on one genre applies what it has learned to another. Sort of, yeah, adapting its knowledge.

And then there's the sheer computational power needed, right? Training these things must take huge computers. It does, so there's a lot of research into making the models smaller and more efficient, model compression, so they could potentially run in real time, maybe even on your phone eventually. Wow, real-time stem separation on a phone. That's the dream for some applications. And we're also seeing hybrid approaches: models that combine different processing techniques, looking at both the raw waveform and the frequency domain together. It really feels like this is fundamentally changing the role of the artist or producer, doesn't it?
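The phase problem described above is easy to demonstrate: two signals can share the exact same magnitude spectrum yet have very different waveforms. The NumPy sketch below throws the phase away and rebuilds the signal from magnitudes alone; it shows the problem Griffin-Lim tries to solve (iteratively estimating a plausible phase), not the Griffin-Lim algorithm itself.

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
# Two sinusoids with particular phases make up the "original" audio.
signal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 330 * t + 1.0)

spectrum = np.fft.rfft(signal)
magnitude_only = np.abs(spectrum)   # keep loudness per frequency, drop phase

# Rebuild pretending every component has zero phase.
rebuilt = np.fft.irfft(magnitude_only)

# The magnitude spectra still match exactly...
print(np.allclose(np.abs(np.fft.rfft(rebuilt)), magnitude_only))  # True
# ...but the waveform itself is now quite different from the original.
print(np.max(np.abs(rebuilt - signal)) > 0.5)                     # True
```

A model that predicts only magnitude spectrograms therefore needs some phase estimate before it can output audio, which is where reconstruction algorithms come in.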
I think so. It's definitely making sophisticated tools more accessible, lowering barriers. The skill set might evolve: future producers might need to blend traditional skills, music theory, sound engineering, with an understanding of data science and how to best leverage these AI tools. So, not necessarily replacement, but augmentation. Collaboration, exactly. A hybrid future where humans and AI work together seems the most likely, and maybe the most interesting, outcome. AI handling the heavy lifting of separation, humans providing the creative direction, the taste, the context.

Well, this has been a really, really fascinating deep dive. So, just to quickly wrap up: AI audio stem splitting is a super powerful technology that breaks songs down into their parts, and it's seriously impacting music production, remixing, sampling, even just how we listen. Yeah, and it brings up all these crucial debates around creativity, ownership, and the future shape of the music industry itself. It really makes you think, doesn't it? That balance between the sheer power of the technology and, well, the artistic soul. That's a perfect way to put it. It poses some big questions, like: as AI gets scarily good at taking sound apart and putting it back together, what are the uniquely human things that become even more valuable in making music? Hmm, that's a great question. Or, you know, will this easy stem splitting lead to an amazing explosion of creative reuse and innovation we haven't even imagined yet? Or could it maybe flatten things out, lead to a kind of sonic homogenization, because everyone's pulling from the same well? Yeah, innovation versus homogenization. That's the core tension. Definitely something for you to think about next time you're deep into your favorite playlist. Yeah, maybe even find one of these AI stem splitters online and give it a whirl yourself. You might be surprised what you find hiding in the mix. Exactly. See what hidden layers you can uncover.