Keynote Talks

1. Shigeki Sagayama: Automatic Music Composition: Issues and Limitations
2. Yi-Hsuan Yang: Automatic Music Generation with the Transformers
3. Tatsuya Daikoku: Neural and computational understanding of statistical learning of the brain in musical emotion and the creativity

Shigeki Sagayama

Professor Emeritus, Graduate School of Information Science and Technology, The University of Tokyo; Visiting Researcher, Graduate School of Informatics and Engineering, The University of Electro-Communications

Lecture Title: Automatic Music Composition: Issues and Limitations

Shigeki Sagayama was born in Hyogo, Japan, in 1948. He received the B.E., M.E., and Ph.D. degrees from the University of Tokyo, Tokyo, Japan, in 1972, 1974, and 1998, respectively, all in mathematical engineering and information physics.

He joined Nippon Telegraph and Telephone Public Corporation (currently, NTT) in 1974 and started his career in speech analysis, synthesis, and recognition at NTT Laboratories, Musashino, Japan. From 1990 to 1993, he was Head of the Speech Processing Department, ATR Interpreting Telephony Laboratories, Kyoto, Japan, pursuing an automatic speech translation project. From 1993 to 1998, he was responsible for speech recognition, synthesis, and dialog systems at NTT Human Interface Laboratories, Yokosuka, Japan. In 1998, he became a Professor of the Graduate School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), Ishikawa, Japan. In 2000, he was appointed Professor of the Graduate School of Information Science and Technology (formerly Graduate School of Engineering), University of Tokyo (UT), Tokyo, Japan. In 2013, he became a Professor Emeritus of UT and Project Professor at the National Institute of Informatics (NII), Tokyo, Japan. In 2014, he became a full professor at Meiji University, School of Interdisciplinary Mathematical Sciences, Nakano, Tokyo. His major research interests include processing and recognition of speech, music, acoustic signals, handwriting, and images. He was the leader of the anthropomorphic spoken dialog agent project (Galatea Project) from 2000 to 2003.

He is known as one of the pioneers of numerous important concepts in speech processing, including Lag Window (1975), widely used in speech codecs; cepstrum-based speech recognition, later deployed in the first spoken dialog system in public telephone service (“ANSER”) in 1981; Delta-Cepstrum (1979), which became essential in speech recognition; statistical speech synthesis (1997), related to modern HMM-based speech synthesis; Tree-based Allophone Clustering (1986), before K-F Lee et al.’s phonetic tree-based allophone clustering; Hidden Markov Network (1991) for precise modeling of allophonic variations, currently employed in an NTT DoCoMo mobile phone service; Vector Field Smoothing (1990) for speaker adaptation, which became one of the major methods of speaker adaptation before MLLR was proposed; Tree-structured Speaker Modeling (1992); Jacobian Adaptation (1996?) for rapid adaptation to environmental noise in speech recognition; and many others. He also promoted stroke-HMM-based online handwritten Kanji character recognition (1999); the Anthropomorphic Spoken Dialog System (2001), combining speech recognition, synthesis, facial expression generation, and dialog control, with 10 research institutes; an early (probably the first) use of the Hidden Markov Model (1999) in music processing; and automatic music composition from Japanese lyrics (2006).

Prof. Sagayama is a Fellow of IEICEJ, a life member of IEEE, and a member of the Acoustical Society of Japan (ASJ) and IPSJ. He received the National Invention Award from the Institute of Invention of Japan in 1991, the Chief Official’s Award for Research Achievement from the Science and Technology Agency of Japan in 1996, and other academic awards including Paper Awards from the Institute of Electronics, Information, and Communications Engineers (IEICEJ), Japan, in 1996 and from the Information Processing Society of Japan (IPSJ) in 1995, and the Achievement Award from ASJ in 2021.

Yi-Hsuan Yang

Full professor, College of Electrical Engineering and Computer Science, National Taiwan University

Lecture Title: Automatic Music Generation with the Transformers

Dr. Yi-Hsuan Yang received the Ph.D. degree in Communication Engineering from National Taiwan University. Since February 2023, he has been with the College of Electrical Engineering and Computer Science, National Taiwan University, where he is a Full Professor. Prior to that, he was the Chief Music Scientist at Taiwan AI Labs, an industrial lab, from 2019 to 2023, and an Associate/Assistant Research Fellow of the Research Center for IT Innovation, Academia Sinica, from 2011 to 2023. His research interests include automatic music generation, music information retrieval, artificial intelligence, and machine learning. His team developed music generation models such as MidiNet, MuseGAN, Pop Music Transformer, KaraSinger, and MuseMorphose. He was an Associate Editor for the IEEE Transactions on Multimedia and IEEE Transactions on Affective Computing, both from 2016 to 2019. Dr. Yang is a senior member of the IEEE.

Tatsuya Daikoku

Project Assistant Professor, International Research Center for Neurointelligence, The University of Tokyo

Lecture Title: Neural and computational understanding of statistical learning of the brain in musical emotion and the creativity

I joined The University of Tokyo after postdoctoral positions at the University of Oxford, the Max Planck Institute, and the University of Cambridge. I’m interested in an interdisciplinary understanding of human and artificial intelligence and the associated creativity. In particular, I investigate universality and specialty in music and language. Further, I am trying to devise a computational model of creativity in the brain based on neurophysiological data, and to understand the origin of creativity and its developmental process. Using this model, I’m trying to develop a novel music theory that covers both mathematical and neural phenomena, and to compose contemporary music.