Showing 1–9 of 9 results for author: Hawley, S H

  1. arXiv:2407.01499  [pdf, other]

    cs.SD cs.LG eess.AS

    Pictures Of MIDI: Controlled Music Generation via Graphical Prompts for Image-Based Diffusion Inpainting

    Authors: Scott H. Hawley

    Abstract: Recent years have witnessed significant progress in generative models for music, featuring diverse architectures that balance output quality, diversity, speed, and user control. This study explores a user-friendly graphical interface enabling the drawing of masked regions for inpainting by an Hourglass Diffusion Transformer (HDiT) model trained on MIDI piano roll images. To enhance note generation…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 6 pages text + 2 pages references, 10 figures

    ACM Class: J.5; I.2.0; I.4.0

  2. arXiv:2406.02699  [pdf, other]

    cs.LG cs.SD eess.AS

    Operational Latent Spaces

    Authors: Scott H. Hawley, Austin R. Tackett

    Abstract: We investigate the construction of latent spaces through self-supervised learning to support semantically meaningful operations. Analogous to operational amplifiers, these "operational latent spaces" (OpLaS) not only demonstrate semantic structure such as clustering but also support common transformational operations with inherent semantic meaning. Some operational latent spaces are found to have…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures. Accepted to AES International Symposium on AI and the Musician

    ACM Class: I.2.4; J.5

  3. arXiv:2402.04825  [pdf, other]

    cs.SD cs.LG eess.AS

    Fast Timing-Conditioned Latent Audio Diffusion

    Authors: Zach Evans, CJ Carr, Josiah Taylor, Scott H. Hawley, Jordi Pons

    Abstract: Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding. Further, most previous works do not tackle that music and sound effects naturally vary in their duration. Our research focuses on the efficient generation of long-form, variable-length stereo music and sounds at 44.1kHz using text prompts with a generative model. Stable Audio is based on latent diffusion,…

    Submitted 13 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024. Code: https://github.com/Stability-AI/stable-audio-tools. Metrics: https://github.com/Stability-AI/stable-audio-metrics. Demo: https://stability-ai.github.io/stable-audio-demo

  4. arXiv:2304.04394  [pdf, other]

    eess.AS cs.SD

    Leveraging Neural Representations for Audio Manipulation

    Authors: Scott H. Hawley, Christian J. Steinmetz

    Abstract: We investigate applying audio manipulations using pretrained neural network-based autoencoders as an alternative to traditional signal processing methods, since the former may provide greater semantic or perceptual organization. To establish the potential of this approach, we first establish if representations from these models encode information about manipulations. We carry out experiments and p…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted as Express Paper for AES Europe 2023, https://aeseurope.com/

  5. arXiv:2110.12261  [pdf, other]

    cs.CV eess.IV

    espiownage: Tracking Transients in Steelpan Drum Strikes Using Surveillance Technology

    Authors: Scott H. Hawley, Andrew C. Morrison, Grant S. Morgan

    Abstract: We present an improvement in the ability to meaningfully track features in high speed videos of Caribbean steelpan drums illuminated by Electronic Speckle Pattern Interferometry (ESPI). This is achieved through the use of up-to-date computer vision libraries for object detection and image segmentation as well as a significant effort toward cleaning the dataset previously used to train systems for…

    Submitted 23 October, 2021; originally announced October 2021.

    Comments: 6 pages, 5 figures, submitted to NeurIPS 2021 Workshop on Machine Learning and the Physical Sciences

    ACM Class: I.4.6; I.4.9

  6. arXiv:2102.00632  [pdf, other]

    cs.CV cs.LG physics.app-ph physics.ins-det

    ConvNets for Counting: Object Detection of Transient Phenomena in Steelpan Drums

    Authors: Scott H. Hawley, Andrew C. Morrison

    Abstract: We train an object detector built from convolutional neural networks to count interference fringes in elliptical antinode regions in frames of high-speed video recordings of transient oscillations in Caribbean steelpan drums illuminated by electronic speckle pattern interferometry (ESPI). The annotations provided by our model aim to contribute to the understanding of time-dependent behavior in suc…

    Submitted 6 September, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

    Comments: 13 pages, 9 figures, accepted for J. Acoust. Soc. Am. (JASA) Special Issue on Machine Learning in Acoustics

    ACM Class: I.4.7

  7. arXiv:2006.05584  [pdf, other]

    eess.AS cs.LG cs.SD

    Exploring Quality and Generalizability in Parameterized Neural Audio Effects

    Authors: William Mitchell, Scott H. Hawley

    Abstract: Deep neural networks have shown promise for music audio signal processing applications, often surpassing prior approaches, particularly as end-to-end models in the waveform domain. Yet results to date have tended to be constrained by low sample rates, noise, narrow domains of signal types, and/or lack of parameterized controls (i.e. "knobs"), making their suitability for professional audio enginee…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: 7 pages, 5 figures

    ACM Class: I.2.6

  8. arXiv:1905.11928  [pdf, other]

    eess.AS cs.LG cs.SD

    SignalTrain: Profiling Audio Compressors with Deep Neural Networks

    Authors: Scott H. Hawley, Benjamin Colburn, Stylianos I. Mimilakis

    Abstract: In this work we present a data-driven approach for predicting the behavior of (i.e., profiling) a given non-linear audio signal processing effect (henceforth "audio effect"). Our objective is to learn a mapping function that maps the unprocessed audio to the processed by the audio effect to be profiled, using time-domain samples. To that aim, we employ a deep auto-encoder model that is conditioned…

    Submitted 29 May, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: 9 pages, 10 figures. v2: typos & references fixed

    ACM Class: I.2.6

  9. arXiv:1903.03171  [pdf]

    cs.CY cs.AI

    Challenges for an Ontology of Artificial Intelligence

    Authors: Scott H. Hawley

    Abstract: Of primary importance in formulating a response to the increasing prevalence and power of artificial intelligence (AI) applications in society are questions of ontology. Questions such as: What "are" these systems? How are they to be regarded? How does an algorithm come to be regarded as an agent? We discuss three factors which hinder discussion and obscure attempts to form a clear ontology of AI:…

    Submitted 25 February, 2019; originally announced March 2019.

    Comments: 20 pages, accepted for publication in Journal of the American Scientific Affiliation. In press, expected publication March 2019

    ACM Class: I.2.0; K.4.0