Showing 1–2 of 2 results for author: Fernández, M P

Search v0.5.6 released 2020-02-24

arXiv:2406.09940 [pdf, other]

q-bio.NC cs.AI cs.NE

Implementing engrams from a machine learning perspective: XOR as a basic motif

Authors: Jesus Marco de Lucas, Maria Peña Fernandez, Lara Lloret Iglesias

Abstract: We have previously presented the idea of how complex multimodal information could be represented in our brains in a compressed form, following mechanisms similar to those employed in machine learning tools, like autoencoders. In this short comment note we reflect, mainly with a didactical purpose, upon the basic question for a biological implementation: what could be the mechanism working as a los… ▽ More We have previously presented the idea of how complex multimodal information could be represented in our brains in a compressed form, following mechanisms similar to those employed in machine learning tools, like autoencoders. In this short comment note we reflect, mainly with a didactical purpose, upon the basic question for a biological implementation: what could be the mechanism working as a loss function, and how it could be connected to a neuronal network providing the required feedback to build a simple training configuration. We present our initial ideas based on a basic motif that implements an XOR switch, using few excitatory and inhibitory neurons. Such motif is guided by a principle of homeostasis, and it implements a loss function that could provide feedback to other neuronal structures, establishing a control system. We analyse the presence of this XOR motif in the connectome of C.Elegans, and indicate the relationship with the well-known lateral inhibition motif. We then explore how to build a basic biological neuronal structure with learning capacity integrating this XOR motif. Guided by the computational analogy, we show an initial example that indicates the feasibility of this approach, applied to learning binary sequences, like it is the case for simple melodies. In summary, we provide didactical examples exploring the parallelism between biological and computational learning mechanisms, identifying basic motifs and training procedures, and how an engram encoding a melody could be built using a simple recurrent network involving both excitatory and inhibitory neurons. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 9 pages, short comment
arXiv:2210.02833 [pdf, other]

cs.IR cs.CL cs.LG cs.SD eess.AS

Matching Text and Audio Embeddings: Exploring Transfer-learning Strategies for Language-based Audio Retrieval

Authors: Benno Weck, Miguel Pérez Fernández, Holger Kirchhoff, Xavier Serra

Abstract: We present an analysis of large-scale pretrained deep learning models used for cross-modal (text-to-audio) retrieval. We use embeddings extracted by these models in a metric learning framework to connect matching pairs of audio and text. Shallow neural networks map the embeddings to a common dimensionality. Our system, which is an extension of our submission to the Language-based Audio Retrieval T… ▽ More We present an analysis of large-scale pretrained deep learning models used for cross-modal (text-to-audio) retrieval. We use embeddings extracted by these models in a metric learning framework to connect matching pairs of audio and text. Shallow neural networks map the embeddings to a common dimensionality. Our system, which is an extension of our submission to the Language-based Audio Retrieval Task of the DCASE Challenge 2022, employs the RoBERTa foundation model as the text embedding extractor. A pretrained PANNs model extracts the audio embeddings. To improve the generalisation of our model, we investigate how pretraining with audio and associated noisy text collected from the online platform Freesound improves the performance of our method. Furthermore, our ablation study reveals that the proper choice of the loss function and fine-tuning the pretrained models are essential in training a competitive retrieval system. △ Less

Submitted 6 October, 2022; originally announced October 2022.

Comments: 5 pages, 2 figures. Accepted at Detection and Classification of Acoustic Scenes and Events 2022 (DCASE2022)

Search v0.5.6 released 2020-02-24