api – Is there a way to differentiate speech and music?

I’m making an application that could really benefit from having a way to differentiate speech and music.

So far I have looked into voice recognition, but nothing I found differentiates music from speech; it only identifies the words. Technically I could use that to decide that audio is speech rather than music, since if there are words it isn’t music unless it’s a song. But given how much music is song, I don’t think that would be an effective method unless it’s coupled with something else. Besides, I couldn’t really find an open source “api” for voice recognition.

The next thing I considered is pitch accuracy and duration. If I take the current frequencies and compare them against the musical note frequencies (with a tolerance buffer, of course), I could, perhaps ineffectively, say that if they consistently align with note frequency values, it’s music. The problem with that, besides the obvious faults, is deciding when it stops being music and starts being speech, because I’m only recognizing patterns.
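To make the note-frequency idea concrete, here is a minimal Python sketch. It assumes you already have one pitch estimate in Hz per audio frame from some pitch tracker (not shown), and that notes follow 12-tone equal temperament with A4 = 440 Hz. The function names and the 15-cent tolerance are illustrative choices, not taken from any library.

```python
import math

A4_HZ = 440.0  # assumed reference pitch (standard concert tuning)


def cents_from_nearest_note(freq_hz: float) -> float:
    """Signed distance, in cents, from freq_hz to the nearest equal-tempered note."""
    # Distance from A4 in (fractional) semitones.
    semitones = 12.0 * math.log2(freq_hz / A4_HZ)
    # Keep only the fractional part relative to the nearest whole semitone.
    return 100.0 * (semitones - round(semitones))


def note_alignment_ratio(freqs_hz, tolerance_cents: float = 15.0) -> float:
    """Fraction of pitched frames that fall within tolerance_cents of a note.

    Frames with a non-positive frequency are treated as unpitched and skipped.
    """
    pitched = [f for f in freqs_hz if f > 0]
    if not pitched:
        return 0.0
    aligned = sum(
        1 for f in pitched
        if abs(cents_from_nearest_note(f)) <= tolerance_cents
    )
    return aligned / len(pitched)
```

A ratio near 1.0 over a window of frames would suggest sustained, in-tune pitches, while gliding speech pitch would tend to score lower. Where to put the threshold is exactly the open question above, and this sketch does not answer it.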

If you have any ideas, please let me know. Also, if there’s already a way to do this with a plugin, framework, API, or import, I’d like to hear about it; forgive my lack of syntactical know-how.

What would be the best language to program this in?

Oh! I also saw an article on Speech/Music Discrimination and then found this on GitHub. If anyone knows more about this, please explain it to me.

azure – Sentiment analysis for speech in MS Teams meetings (classrooms to be more specific)

I’m looking for a solution for MS Teams for Education that could perform real-time sentiment analysis during classroom discussions in synchronous online learning. I assume Azure’s Speech to Text could capture what is said, but I’m not sure what tools are available for the analysis.
Comments by individual students are not as important as gauging the overall sentiment in the classroom. Ideally it would go past simply positive/negative to detecting mood, words or phrases that are being used frequently, lapses between questions and answers, etc.
Ideally the data could then be fed into Power BI for providing real-time indicators of what is happening in the class.
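
Two of the indicators mentioned above, frequently used words and lapses between questions and answers, can be sketched in plain Python once a transcript exists. This assumes each utterance is a dict with `text`, `start`, and `end` (seconds); that shape is hypothetical and would need to be adapted to whatever the speech-to-text service actually returns. Treating any utterance ending in "?" as a question, and the small stopword list, are also simplifying assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "what", "in", "it"}


def frequent_words(utterances, top_n=5):
    """Return the top_n most common non-stopword words across all utterances."""
    counts = Counter()
    for utterance in utterances:
        words = re.findall(r"[a-z']+", utterance["text"].lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


def question_answer_lapses(utterances):
    """Return the silence (seconds) between each question and the next utterance.

    An utterance counts as a question if its text ends with '?'.
    Utterances are assumed to be sorted by start time.
    """
    lapses = []
    for current, following in zip(utterances, utterances[1:]):
        if current["text"].rstrip().endswith("?"):
            lapses.append(following["start"] - current["end"])
    return lapses
```

Both functions return plain lists, so the results could be written to a table or CSV for a dashboard to read. Sentiment or mood scoring itself is not covered by this sketch.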

Any suggestions?

applescript – Use Google speech to text instead of Apple’s

Google currently has much higher quality speech transcription. You can try it easily by going to doc.new, pressing Cmd-Shift-S, and talking. Compare that with the built-in system from Apple (pressing Fn twice).

Is there any way to somehow use this instead of the built in transcription, e.g. with a keyboard shortcut?

Ideally there would be a good way to do this, but if that doesn’t work, hacky is the next best thing. An AppleScript or Hammerspoon script that instantly pops up a Google Docs window, and then inserts the result is better than nothing.

Given that things like this https://dictation.io/speech exist, it seems like it should be possible to at least have an extension that can interface with other applications to enable dictation input, but I haven’t found any yet.

