Overlapped Speech Detection in Multi-Party Meetings
Detecting simultaneous speech in meeting recordings is a difficult problem, owing both to the complexity of the meeting itself and to the environment surrounding it. The proposed system uses linear predictor coefficients derived from gammatone-like spectrograms of distant-microphone channel data as features for overlap detection. Model performance was assessed on the Augmented Multiparty Interaction (AMI) conference corpus. The proposed system improves on baseline feature-set models for classification.
Copyright (c) 2020 International Journal of Computer (IJC)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.