
A Human-Computer Duet System for Music Performance

Published: 12 October 2020
Abstract

    Virtual musicians have become a remarkable phenomenon in contemporary multimedia arts. However, most virtual musicians today are not endowed with the ability to create their own behaviors or to perform music together with human musicians. In this paper, we create a virtual violinist who can collaborate with a human pianist to perform chamber music automatically, without any human intervention. The system incorporates techniques from several fields, including real-time music tracking, pose estimation, and body movement generation. The virtual musician's behavior is generated from the given music audio alone, which makes the system a low-cost, efficient, and scalable way to produce co-performances of human and virtual musicians. The proposed system has been validated in public concerts. Objective quality assessment approaches and possible ways to systematically improve the system are also discussed.
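    The abstract names real-time music tracking as one of the system's core components. As a concrete illustration, the sketch below shows one common approach from the score-following literature: online dynamic time warping over chroma features, here computed with librosa. This is a minimal sketch under stated assumptions, not the authors' actual tracker; the cosine distance, the unconstrained position estimate, and all parameter values are illustrative choices.

```python
# Minimal sketch of real-time music tracking via online dynamic time
# warping over chroma features. Illustrative only; the paper's tracker,
# features, and step constraints may differ.
import numpy as np
import librosa

def chroma_frames(y, sr, hop=2048):
    """Return a (frames, 12) chroma matrix for an audio buffer."""
    return librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop).T

class OnlineTimeWarp:
    """Incrementally align incoming live frames to a reference recording."""
    def __init__(self, ref_chroma):
        self.ref = ref_chroma                           # (N, 12) reference
        self.norms = np.linalg.norm(ref_chroma, axis=1) + 1e-9
        self.cost = np.full(len(ref_chroma), np.inf)    # cumulative costs
        self.cost[0] = 0.0

    def step(self, frame):
        """Consume one live chroma frame; return the estimated score frame."""
        # Cosine distance between the live frame and every reference frame.
        d = 1.0 - (self.ref @ frame) / (self.norms * (np.linalg.norm(frame) + 1e-9))
        new = np.empty_like(self.cost)
        new[0] = self.cost[0] + d[0]
        for i in range(1, len(d)):
            # DTW recurrence: stay on the same reference frame, advance
            # diagonally, or advance within the current live frame.
            new[i] = d[i] + min(self.cost[i], self.cost[i - 1], new[i - 1])
        self.cost = new
        return int(np.argmin(self.cost))
```

    A production tracker would restrict the search to a window around the last reported position and convert the position trajectory into a tempo estimate that drives both the accompaniment and the animation.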

    Supplementary Material

    MP4 File (3394171.3413921.mp4)
    How can humans co-perform music with virtual musicians? In this video, we present how we developed a low-cost human-computer duet system for music performance. In particular, the virtual violinist is fully automated, is driven only by the music, and can follow the human musician's performance tempo. In addition, the virtual violinist's body movements are automatically generated from the music.
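
    To make the last point concrete, here is a hypothetical sketch of one way such a mapping can be built: a small recurrent network that maps per-frame audio features to 3D skeleton keypoints, in the spirit of audio-to-body-dynamics approaches. The joint count, feature dimension, and architecture are illustrative assumptions, not the network described in the paper.

```python
# Hypothetical sketch: generate skeleton keypoints from audio features
# with a small LSTM. All sizes here are illustrative assumptions.
import torch
import torch.nn as nn

N_JOINTS = 15     # assumed skeleton size (e.g., upper body and arms)
FEAT_DIM = 20     # assumed per-frame audio feature dimension (e.g., MFCCs)

class AudioToPose(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(FEAT_DIM, hidden, batch_first=True)
        self.head = nn.Linear(hidden, N_JOINTS * 3)

    def forward(self, feats):
        """feats: (batch, frames, FEAT_DIM) -> (batch, frames, N_JOINTS, 3)"""
        h, _ = self.rnn(feats)
        return self.head(h).view(feats.size(0), feats.size(1), N_JOINTS, 3)

model = AudioToPose()
feats = torch.randn(1, 300, FEAT_DIM)   # ~10 s of features at ~30 fps
poses = model(feats)                    # (1, 300, 15, 3) keypoint sequence
```

    The predicted keypoint sequence can then be retargeted onto a rigged virtual character and played back at the tempo reported by the music tracker, keeping the avatar's motion synchronized with the human performer.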


    Cited By

    • (2024) Virtual Instrument Performances (VIP): A Comprehensive Review. Computer Graphics Forum, 43(2). DOI: 10.1111/cgf.15065. Online publication date: 30-Apr-2024.
    • (2022) Traditional Javanese Membranophone Percussion Play Formalization for Virtual Orchestra Automation. In 2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), pages 381--386. DOI: 10.1109/CyberneticsCom55287.2022.9865292. Online publication date: 16-Jun-2022.
    • (2021) Developing an Online Music Teaching and Practicing Platform via Machine Learning: A Review Paper. In Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments, pages 95--108. DOI: 10.1007/978-3-030-78095-1_9. Online publication date: 3-Jul-2021.

          Published In

          MM '20: Proceedings of the 28th ACM International Conference on Multimedia
          October 2020
          4889 pages
          ISBN: 9781450379885
          DOI: 10.1145/3394171
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 12 October 2020


          Author Tags

          1. animation
          2. automatic accompaniment
          3. body movement generation
          4. computer-human interaction
          5. music information retrieval

          Qualifiers

          • Research-article

          Conference

          MM '20

          Acceptance Rates

          Overall acceptance rate: 995 of 4,171 submissions (24%)
