I’m making an application that could really benefit from a way to differentiate between speech and music.
So far I have looked into voice recognition, but everything I found only identifies the words. Nothing I saw actually distinguishes music from speech. Technically I could use it this way: if words are detected, the audio is speech rather than music. That breaks down for songs, though, and since so much music has vocals, I don’t think it would be effective unless it’s combined with something else. Besides, I couldn’t find an open-source “API” for voice recognition.
The next thing I considered is pitch accuracy and duration. If I take the current frequencies and compare them against the frequencies of musical notes (with some tolerance, of course), I could say, perhaps not very reliably, that the audio is music if it consistently lines up with note frequencies. The problem, besides the obvious flaws, is how to tell when it stops being music and starts being speech, since I’m recognizing patterns over time rather than judging a single instant.
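A minimal sketch of the note-alignment idea above, in Python, assuming per-frame fundamental-frequency estimates are already available from some pitch tracker (for example librosa’s `pyin`; that step is not shown). It maps each frequency to the nearest 12-tone equal-temperament note relative to A4 = 440 Hz, measures the offset in cents (the “buffer”), and then slides a window over the frames so the label can change from “music” to “speech” over time. The function names, the 20-cent tolerance, the window/hop sizes and the 70% threshold are all hypothetical choices, not tuned values, and labeling every off-pitch window as “speech” is a deliberately crude simplification.

```python
import math

A4_HZ = 440.0  # assumed tuning reference for 12-tone equal temperament


def cents_from_nearest_note(freq_hz, a4_hz=A4_HZ):
    """Return (midi_note, cents_offset) for the nearest equal-tempered note."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)  # fractional MIDI note number
    nearest = round(midi)
    cents = (midi - nearest) * 100.0  # 100 cents per semitone
    return nearest, cents


def is_on_note(freq_hz, tolerance_cents=20.0):
    """True if the frequency is within tolerance_cents of some note."""
    _, cents = cents_from_nearest_note(freq_hz)
    return abs(cents) <= tolerance_cents


def label_windows(pitches, window=20, hop=10, min_fraction=0.7, tolerance_cents=20.0):
    """Slide a window over per-frame pitches in Hz (None/NaN = unvoiced).

    Hypothetical heuristic: a window is 'music' if at least min_fraction of
    its voiced frames sit on a note, otherwise 'speech'; windows with no
    voiced frames are 'unknown'. Returns a list of (start_frame, label).
    """
    labels = []
    for start in range(0, max(len(pitches) - window, 0) + 1, hop):
        frames = pitches[start:start + window]
        # `f > 0` is False for NaN, so NaN frames are dropped as unvoiced too
        voiced = [f for f in frames if f is not None and f > 0]
        if not voiced:
            labels.append((start, "unknown"))
            continue
        on_note = sum(is_on_note(f, tolerance_cents) for f in voiced)
        fraction = on_note / len(voiced)
        labels.append((start, "music" if fraction >= min_fraction else "speech"))
    return labels
```

For example, 20 frames at exactly 440 Hz followed by 20 frames at 450 Hz (about 39 cents sharp of A4) give `[(0, "music"), (10, "speech"), (20, "speech")]`. Sung vocals with vibrato or sliding pitch, and fairly monotone speech, can both fool this kind of heuristic, so it is probably best treated as one feature among several rather than a detector on its own.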
If you have any ideas, please let me know. Also, if there’s already a way to do this with a plugin, framework, API, or library import, please point me to it, and forgive my lack of syntactical know-how.
What would be the best language to program this in?
Oh! I also saw an article on Speech/Music Discrimination and then found this on GitHub. If anyone knows more about it, please explain it to me.