
MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos

Published: 23 June 2024
Abstract

    Spatial audio offers more immersive video consumption experiences to viewers; however, creating and editing spatial audio is often expensive and requires specialized hardware and skills, posing a high barrier for amateur video creators. We present Mimosa, a human-AI co-creation tool that enables amateur users to computationally generate and manipulate spatial audio effects. For a video with only monaural or stereo audio, Mimosa automatically grounds each sound source to the corresponding sounding object in the visual scene and enables users to further validate and fix errors in the locations of the sounding objects. Users can also augment the spatial audio effect by flexibly manipulating the sound source positions and creatively customizing the audio effect. The design of Mimosa exemplifies a human-AI collaboration approach that, instead of utilizing state-of-the-art end-to-end “black-box” ML models, uses a multistep pipeline that aligns its interpretable intermediate results with the user’s workflow. A lab user study with 15 participants demonstrates Mimosa’s usability, usefulness, expressiveness, and capability in creating immersive spatial audio effects in collaboration with users.
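
    The abstract only sketches how Mimosa's pipeline works. As a rough illustration of the final rendering step, the Python sketch below pans a separated mono source toward the on-screen position of its grounded sounding object using simple interaural level and time differences. This is a minimal, hypothetical sketch: the paper does not publish this code, the function and parameter names here are invented for illustration, and a production renderer would typically convolve the signal with measured HRTFs rather than use crude panning. Only NumPy is assumed.

    ```python
    # Hypothetical illustration (not from the paper): render a separated mono
    # source as stereo, panned toward its sounding object's horizontal screen
    # position. Real spatial audio rendering would use HRTF convolution; this
    # sketch uses constant-power level panning plus a crude interaural delay.
    import numpy as np

    def spatialize_mono(source: np.ndarray, x_norm: float,
                        sr: int = 44100, max_itd_ms: float = 0.6) -> np.ndarray:
        """source: 1-D mono samples; x_norm: horizontal object position in
        [0, 1] (0 = left edge, 1 = right edge). Returns an (N, 2) stereo array."""
        azimuth = (x_norm - 0.5) * (np.pi / 2)      # map to [-45 deg, +45 deg]
        # Constant-power pan law: the gains always satisfy L^2 + R^2 = 1.
        pan = (azimuth + np.pi / 2) / 2
        left_gain, right_gain = np.cos(pan), np.sin(pan)
        # Fake the interaural time difference by delaying the far ear.
        itd = int(abs(np.sin(azimuth)) * max_itd_ms * 1e-3 * sr)
        delayed = np.concatenate([np.zeros(itd), source])[: len(source)]
        if azimuth >= 0:                            # object on the right
            left, right = left_gain * delayed, right_gain * source
        else:                                       # object on the left
            left, right = left_gain * source, right_gain * delayed
        return np.stack([left, right], axis=-1)

    # Example: an object detected near the right edge of the frame.
    # stereo = spatialize_mono(mono_track, x_norm=0.85)
    ```

    A source grounded near the right edge of the frame would thus be rendered louder and slightly earlier in the right channel, which is the basic cue the ear uses to localize sounds horizontally.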



      Published In

      C&C '24: Proceedings of the 16th Conference on Creativity & Cognition
      June 2024
      718 pages
      ISBN:9798400704857
      DOI:10.1145/3635636


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. creator tools
      2. multimodal
      3. sound effects
      4. video

      Qualifiers

      • Research-article
      • Research
      • Refereed limited


      Conference

      C&C '24
      Sponsor:
      C&C '24: Creativity and Cognition
      June 23-26, 2024
      Chicago, IL, USA

      Acceptance Rates

      Overall Acceptance Rate 108 of 371 submissions, 29%
