
MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos

Published: 23 June 2024
Abstract

    Spatial audio offers more immersive video consumption experiences to viewers; however, creating and editing spatial audio is often expensive and requires specialized hardware and skills, posing a high barrier for amateur video creators. We present Mimosa, a human-AI co-creation tool that enables amateur users to computationally generate and manipulate spatial audio effects. For a video with only monaural or stereo audio, Mimosa automatically grounds each sound source to the corresponding sounding object in the visual scene and enables users to further validate and fix errors in the locations of the sounding objects. Users can also augment the spatial audio effect by flexibly manipulating the sound source positions and creatively customizing the audio effect. The design of Mimosa exemplifies a human-AI collaboration approach that, instead of utilizing state-of-the-art end-to-end “black-box” ML models, uses a multistep pipeline that aligns its interpretable intermediate results with the user’s workflow. A lab user study with 15 participants demonstrates Mimosa’s usability, usefulness, expressiveness, and capability in creating immersive spatial audio effects in collaboration with users.
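
    The abstract only sketches how Mimosa's pipeline works. As a rough illustration of the final rendering step, the Python sketch below pans a separated mono source toward the on-screen position of its grounded sounding object using simple interaural level and time differences. This is a minimal, hypothetical sketch: the paper does not publish this code, the function and parameter names here are invented for illustration, and a production renderer would typically convolve the signal with measured HRTFs rather than use crude panning. Only NumPy is assumed.

    ```python
    # Hypothetical illustration (not from the paper): render a separated mono
    # source as stereo, panned toward its sounding object's horizontal screen
    # position. Real spatial audio rendering would use HRTF convolution; this
    # sketch uses constant-power level panning plus a crude interaural delay.
    import numpy as np

    def spatialize_mono(source: np.ndarray, x_norm: float,
                        sr: int = 44100, max_itd_ms: float = 0.6) -> np.ndarray:
        """source: 1-D mono samples; x_norm: horizontal object position in
        [0, 1] (0 = left edge, 1 = right edge). Returns an (N, 2) stereo array."""
        azimuth = (x_norm - 0.5) * (np.pi / 2)      # map to [-45 deg, +45 deg]
        # Constant-power pan law: the gains always satisfy L^2 + R^2 = 1.
        pan = (azimuth + np.pi / 2) / 2
        left_gain, right_gain = np.cos(pan), np.sin(pan)
        # Fake the interaural time difference by delaying the far ear.
        itd = int(abs(np.sin(azimuth)) * max_itd_ms * 1e-3 * sr)
        delayed = np.concatenate([np.zeros(itd), source])[: len(source)]
        if azimuth >= 0:                            # object on the right
            left, right = left_gain * delayed, right_gain * source
        else:                                       # object on the left
            left, right = left_gain * source, right_gain * delayed
        return np.stack([left, right], axis=-1)

    # Example: an object detected near the right edge of the frame.
    # stereo = spatialize_mono(mono_track, x_norm=0.85)
    ```

    A source grounded near the right edge of the frame would thus be rendered louder and slightly earlier in the right channel, which is the basic cue the ear uses to localize sounds horizontally.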



      Published In

      C&C '24: Proceedings of the 16th Conference on Creativity & Cognition
      June 2024
      718 pages
      ISBN:9798400704857
      DOI:10.1145/3635636


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. creator tools
      2. multimodal
      3. sound effects
      4. video

      Qualifiers

      • Research-article
      • Research
      • Refereed limited


      Conference

      C&C '24
      Sponsor:
      C&C '24: Creativity and Cognition
      June 23-26, 2024
      Chicago, IL, USA

      Acceptance Rates

      Overall Acceptance Rate 108 of 371 submissions, 29%
