
A Human-Computer Duet System for Music Performance

Published: 12 October 2020
Abstract

    Virtual musicians have become a remarkable phenomenon in contemporary multimedia arts. However, most virtual musicians today are not endowed with the ability to create their own behaviors or to perform music together with human musicians. In this paper, we create a virtual violinist who can collaborate with a human pianist to perform chamber music automatically, without any human intervention. The system incorporates techniques from several fields, including real-time music tracking, pose estimation, and body movement generation. The virtual musician's behavior is generated from the given music audio alone, which makes the system a low-cost, efficient, and scalable way to produce co-performances of human and virtual musicians. The proposed system has been validated in public concerts. Objective quality assessment approaches and possible ways to systematically improve the system are also discussed.
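    The abstract names real-time music tracking as one of the system's core components. As a concrete illustration, the sketch below shows one common approach from the score-following literature: online dynamic time warping over chroma features, here computed with librosa. This is a minimal sketch under stated assumptions, not the authors' actual tracker; the cosine distance, the unconstrained position estimate, and all parameter values are illustrative choices.

```python
# Minimal sketch of real-time music tracking via online dynamic time
# warping over chroma features. Illustrative only; the paper's tracker,
# features, and step constraints may differ.
import numpy as np
import librosa

def chroma_frames(y, sr, hop=2048):
    """Return a (frames, 12) chroma matrix for an audio buffer."""
    return librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop).T

class OnlineTimeWarp:
    """Incrementally align incoming live frames to a reference recording."""
    def __init__(self, ref_chroma):
        self.ref = ref_chroma                           # (N, 12) reference
        self.norms = np.linalg.norm(ref_chroma, axis=1) + 1e-9
        self.cost = np.full(len(ref_chroma), np.inf)    # cumulative costs
        self.cost[0] = 0.0

    def step(self, frame):
        """Consume one live chroma frame; return the estimated score frame."""
        # Cosine distance between the live frame and every reference frame.
        d = 1.0 - (self.ref @ frame) / (self.norms * (np.linalg.norm(frame) + 1e-9))
        new = np.empty_like(self.cost)
        new[0] = self.cost[0] + d[0]
        for i in range(1, len(d)):
            # DTW recurrence: stay on the same reference frame, advance
            # diagonally, or advance within the current live frame.
            new[i] = d[i] + min(self.cost[i], self.cost[i - 1], new[i - 1])
        self.cost = new
        return int(np.argmin(self.cost))
```

    A production tracker would restrict the search to a window around the last reported position and convert the position trajectory into a tempo estimate that drives both the accompaniment and the animation.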

    Supplementary Material

    MP4 File (3394171.3413921.mp4)
    How can humans co-perform music with virtual musicians? In this video, we present how we developed a low-cost human-computer duet system for music performance. In particular, the virtual violinist is fully automated, is driven only by the music, and can follow the human musician's performance tempo. In addition, the virtual violinist's body movements are automatically generated from the music.
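
    To make the last point concrete, here is a hypothetical sketch of one way such a mapping can be built: a small recurrent network that maps per-frame audio features to 3D skeleton keypoints, in the spirit of audio-to-body-dynamics approaches. The joint count, feature dimension, and architecture are illustrative assumptions, not the network described in the paper.

```python
# Hypothetical sketch: generate skeleton keypoints from audio features
# with a small LSTM. All sizes here are illustrative assumptions.
import torch
import torch.nn as nn

N_JOINTS = 15     # assumed skeleton size (e.g., upper body and arms)
FEAT_DIM = 20     # assumed per-frame audio feature dimension (e.g., MFCCs)

class AudioToPose(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(FEAT_DIM, hidden, batch_first=True)
        self.head = nn.Linear(hidden, N_JOINTS * 3)

    def forward(self, feats):
        """feats: (batch, frames, FEAT_DIM) -> (batch, frames, N_JOINTS, 3)"""
        h, _ = self.rnn(feats)
        return self.head(h).view(feats.size(0), feats.size(1), N_JOINTS, 3)

model = AudioToPose()
feats = torch.randn(1, 300, FEAT_DIM)   # ~10 s of features at ~30 fps
poses = model(feats)                    # (1, 300, 15, 3) keypoint sequence
```

    The predicted keypoint sequence can then be retargeted onto a rigged virtual character and played back at the tempo reported by the music tracker, keeping the avatar's motion synchronized with the human performer.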


    Cited By

    • (2024) Virtual Instrument Performances (VIP): A Comprehensive Review. Computer Graphics Forum, 43(2). DOI: 10.1111/cgf.15065. Online publication date: 30-Apr-2024.
    • (2022) Traditional Javanese Membranophone Percussion Play Formalization for Virtual Orchestra Automation. In 2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), pages 381--386. DOI: 10.1109/CyberneticsCom55287.2022.9865292. Online publication date: 16-Jun-2022.
    • (2021) Developing an Online Music Teaching and Practicing Platform via Machine Learning: A Review Paper. In Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments, pages 95--108. DOI: 10.1007/978-3-030-78095-1_9. Online publication date: 3-Jul-2021.

          Published In

          MM '20: Proceedings of the 28th ACM International Conference on Multimedia
          October 2020
          4889 pages
          ISBN: 9781450379885
          DOI: 10.1145/3394171
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 12 October 2020


          Author Tags

          1. animation
          2. automatic accompaniment
          3. body movement generation
          4. computer-human interaction
          5. music information retrieval

          Qualifiers

          • Research-article

          Conference

          MM '20

          Acceptance Rates

          Overall acceptance rate: 995 of 4,171 submissions (24%)
