IJCAI-97 AI Challenge-5
"Understanding Three Simultaneous Speeches"


JSAI Special Interest Group on AI Challenges

Understanding Three Simultaneous Speeches
Hiroshi G. Okuno, Tomohiro Nakatani, and Takeshi Kawabata
(NTT Basic Research Laboratories)

Understanding three simultaneous speeches is proposed as a challenge problem to foster artificial intelligence, speech and sound understanding or recognition, and computational auditory scene analysis research. Automatic speech recognition under noisy environments is attacked by speech enhancement techniques such as noise reduction and speaker adaptation. However, the signal-to-noise ratio of speech in two simultaneous speeches is too poor to apply these techniques.
Therefore, novel techniques need to be developed. One candidate is to use speech stream segregation as a front-end of automatic speech recognition systems. Preliminary experiments on understanding two simultaneous speeches show that the proposed challenge problem will be feasible with speech stream segregation.
The detailed plan of the research on and benchmark sounds for the proposed challenge problem is also presented.

[Proc. of IJCAI-97, pp.30-35]


o Call For Challengers


o Challenge Paper Proposed at IJCAI-97

  1. Hiroshi G. Okuno, Tomohiro Nakatani, and Takeshi Kawabata:
    "Understanding Three Simultaneous Speeches", Proc. of the 15th International Joint Conference on Artificial Intelligence (IJCAI-97), Vol.1, pp.30-35, IJCAI, Nagoya, Aug. 1997. Paper in postscript
  2. Hiroshi G. Okuno, Tomohiro Nakatani, and Takeshi Kawabata:
    "Challenge Problem for Computational Auditory Scene Analysis: Understanding Three Simultaneous Speeches", Proc. of IJCAI-97 Workshop on Computational Auditory Scene Analysis (CASA'97), pp.61-68, IJCAI, Nagoya, Aug. 1997. Paper in postscript

o Related topics in AI Challenge-5


o Related topics in Computational Auditory Scene Analysis (CASA)

  1. CASA'97: The Second IJCAI Workshop on Computational Auditory Scene Analysis, Aug. 1997, Nagoya, Japan.
  2. CASA'95: The First IJCAI Workshop on Computational Auditory Scene Analysis, Aug. 1995, Montreal, Canada.
  3. "Computational Auditory Scene Analysis" by D. F. Rosenthal and H. G. Okuno (Eds.), Lawrence Erlbaum Associates, 1998. ISBN 0-8058-2283-6.
  4. Dan Ellis's City Sound Scene Organization Problem
  5. Dr. Hideki Kawahara's page on his "Hearing voice" presentation.
  6. Mr. Masataka Goto's page on Real-Time Beat Tracking for Musical Acoustic Signals.
  7. Researchers at Sheffield University and ATR have created the ShATR Multi-Simultaneous-Speaker Corpus.



Last Update: Wed Jun 24 09:52:21 1998
by Hiroshi G. Okuno