speech recognition – Speaker Diarization language impact

Does anyone know whether the language on which a speaker diarization system is trained impacts the accuracy of the model?

Many speaker diarization models are trained mainly on English data:
http://www.robots.ox.ac.uk/~vgg/data/voxceleb/

As many Western languages have Latin influence, I am wondering whether a pre-trained speaker diarization model like pyannote.audio or Resemblyzer also works well for African or Asian languages.
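For what it's worth, the quickest check is to run the pre-trained pipeline on your own non-English audio and compare against a manual annotation. A minimal sketch, assuming pyannote.audio 2.x, a Hugging Face access token, and a local file audio.wav (all placeholders):

from pyannote.audio import Pipeline

# Load the pre-trained diarization pipeline; it was trained largely on
# English-dominated corpora such as VoxCeleb.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization", use_auth_token="HF_TOKEN")

# Apply it to a recording in the target language.
diarization = pipeline("audio.wav")

# Print who speaks when, to compare against a manual annotation.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")

Measuring the diarization error rate on a few manually annotated clips per language would answer the question more directly than any general claim.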

Receive the used language when using the Google Speech-to-Text API in conjunction with "alternativeLanguageCodes"

So I run a speech transcription and supply some alternativeLanguageCodes in addition to the main languageCode.

For example:

languageCode: "en"
alternativeLanguageCodes: ["fr", "it"]

Google Speech automatically returns the one transcription whose confidence is high enough that it is probably in the language the speaker in the audio file actually uses. Great!

But … I also want to know which language (code) it actually used. Is there any way to make the API return such metadata (like lang: 'fr') for that result alternative?
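If I remember the API surface correctly, each SpeechRecognitionResult carries an output-only languageCode field when alternative languages are configured, so the detected language should already be in the response. A sketch assuming the google-cloud-speech Python client and a placeholder GCS URI:

from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en",
    alternative_language_codes=["fr", "it"],
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/audio.wav")  # placeholder

response = client.recognize(config=config, audio=audio)
for result in response.results:
    # language_code reports which of the configured languages was detected.
    print(result.language_code, result.alternatives[0].transcript)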

Google Speech recognition doesn't consistently enter the speech into the app text field

I cannot figure out the whole pattern, but much of the time these days, if I use Google Speech to recognise some text to put into a text field (e.g. Facebook's or Messenger's), it successfully recognises the text (it shows it to me), but when the popup displaying the text goes away, it leaves the text field unchanged instead of entering the recognised text. It seems to happen mostly if I do some speech recognition, then edit or move the cursor, then do some more speech recognition.

Is there any way to force the recognised text into that field or get it into “paste” so I can paste it in?

(Using a Huawei P20 Pro running Android 10 and Microsoft SwiftKey to "activate" Google Speech.)

api – Is there a way to differentiate speech and music?

I’m making an application that could really benefit from having a way to differentiate speech and music.

So far I have looked into voice recognition, but I don't really see anything that differentiates music from speech; it just identifies the words. I could technically use that to detect speech (if there are words, it isn't music), except that a song has words too, and given how much music is sung, I don't think it would be effective unless coupled with something else. Besides, I couldn't really find an open-source "API" for voice recognition.

The next thing I considered is pitch accuracy and duration: if I take the current frequencies and run them against the musical note frequencies (with a buffer, of course), I could, perhaps ineffectively, say that if the signal is consistently aligned with note frequency values, it's music. The problem with that, besides the looming faults, is how I can tell when it stops being music and starts being speech, since I'm recognizing patterns.
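To make that second idea concrete, here is a rough sketch in Python with NumPy: estimate the dominant frequency of each short frame and measure how often it lands near an equal-tempered note. The frame size, hop, and cents tolerance are arbitrary choices, not tuned values:

import numpy as np

# Equal-tempered note frequencies from roughly 27.5 Hz to 4.2 kHz:
# f(n) = 440 * 2^(n/12)
NOTE_FREQS = 440.0 * 2.0 ** (np.arange(-48, 40) / 12.0)

def fraction_near_notes(samples, sr, cents_tol=30, frame=2048, hop=1024):
    """Fraction of frames whose dominant frequency lies within
    cents_tol cents of an equal-tempered note."""
    hits, frames = 0, 0
    for start in range(0, len(samples) - frame, hop):
        window = samples[start:start + frame] * np.hanning(frame)
        spectrum = np.abs(np.fft.rfft(window))
        peak = np.argmax(spectrum[1:]) + 1          # skip the DC bin
        f0 = peak * sr / frame
        # Distance in cents to the nearest note frequency.
        cents = np.min(np.abs(1200.0 * np.log2(f0 / NOTE_FREQS)))
        hits += cents < cents_tol
        frames += 1
    return hits / max(frames, 1)

A high fraction sustained over several seconds would hint at music; spoken pitch drifts continuously, so speech should align with the note grid far less often. That said, published speech/music discriminators usually combine several features (zero-crossing rate, spectral flux, energy modulation) with a simple classifier rather than relying on pitch alone.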

If you have any ideas, please let me know; also, if there's already a way to do this with a plugin, framework, API, or imports, forgive my lack of syntactical know-how.

What would be the best language to program this in?

Oh! I also saw an article on speech/music discrimination and then found this on GitHub; if anyone knows more about it, please explain it to me.

Trump's Mt. Rushmore speech was the most watched TV event in history?

It was one of the most sickening displays of divisiveness the country has seen. He took a national holiday celebration and turned it into a campaign rally for himself, bad-mouthing everyone but those who support him and barely touching upon the over 130,000 COVID deaths in the US since February.

You enjoyed it along with the other Trump suck-ups, but the world once again saw him for what he is: a lying, dangerous POS.

azure – Sentiment analysis for speech in MS Teams meetings (classrooms to be more specific)

I'm looking for a solution that could be used with MS Teams for Education for real-time sentiment analysis during classroom discussions in synchronous online learning. I assume Azure's Speech to Text could capture what is said, but I'm not sure what tools are available for the analysis.
Comments by individual students are not as important as gauging the overall sentiment in the classroom, and ideally it would go past simple positive/negative to detecting mood, words or phrases that are being used frequently, lapses between questions and answers, etc.
Ideally the data could then be fed into Power BI to provide real-time indicators of what is happening in the class.

Any suggestions?
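Not a full answer, but a minimal sketch of the kind of pipeline I would expect, assuming the azure-cognitiveservices-speech and azure-ai-textanalytics Python packages; the key, region, and endpoint values are placeholders:

import azure.cognitiveservices.speech as speechsdk
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Speech-to-Text from the default microphone (placeholder credentials).
speech_config = speechsdk.SpeechConfig(subscription="SPEECH_KEY", region="eastus")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

# Text Analytics client for sentiment scoring (placeholder endpoint/key).
text_client = TextAnalyticsClient(
    endpoint="https://<resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("TEXT_KEY"),
)

def on_recognized(evt):
    text = evt.result.text
    if not text:
        return
    doc = text_client.analyze_sentiment([text])[0]
    # Aggregate these per time window rather than per student to gauge
    # the overall mood of the class.
    print(text, doc.sentiment, doc.confidence_scores)

recognizer.recognized.connect(on_recognized)
recognizer.start_continuous_recognition()
input("Listening; press Enter to stop.\n")
recognizer.stop_continuous_recognition()

From there, pushing windowed aggregates of the scores to a Power BI streaming dataset (its REST push API accepts small JSON rows) could give the real-time dashboard you describe.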

Discussion Time – Dedicated to Free Speech | Forum Promotion

Discussion Time is a general discussion/debate forum that is dedicated to freedom of speech. It was founded as an alternative to a particular website that stifled honest debate and discussion.

Freedom of Speech is so important, especially when many websites ban you for unpopular opinions, no matter how harmless they are.

You are free to speak your mind. We, as a forum, will not ban you no matter how controversial your opinion is. We expect people to self-moderate – use common sense, basically.

You don't even have to sign up. Guests are free to participate in discussions and post new ones. However, there are members-only sections that guests will not be able to participate in.

Currently we have Pictionary-style forum games where, if you guess correctly enough times, you will win forum prizes and Amazon gift vouchers.