Showing 1–50 of 325 results for author: Sophia

  1. arXiv:2407.09468  [pdf, other]

    cs.LG

    Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures

    Authors: Sophia Sanborn, Johan Mathe, Mathilde Papillon, Domas Buracas, Hansen J Lillemark, Christian Shewmake, Abby Bertics, Xavier Pennec, Nina Miolane

    Abstract: The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently non-Euclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-tim…

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.08877  [pdf, other]

    q-bio.NC cs.HC

    Analyzing Speech Motor Movement using Surface Electromyography in Minimally Verbal Adults with Autism Spectrum Disorder

    Authors: Wazeer Zulfikar, Nishat Protyasha, Camila Canales, Heli Patel, James Williamson, Laura Sarnie, Lisa Nowinski, Nataliya Kosmyna, Paige Townsend, Sophia Yuditskaya, Tanya Talkar, Utkarsh Oggy Sarawgi, Christopher McDougle, Thomas Quatieri, Pattie Maes, Maria Mody

    Abstract: Adults who are minimally verbal with autism spectrum disorder (mvASD) have pronounced speech difficulties linked to impaired motor skills. Existing research and clinical assessments primarily use indirect methods such as standardized tests, video-based facial features, and handwriting tasks, which may not directly target speech-related motor skills. In this study, we measure activity from eight fa…

    Submitted 11 July, 2024; originally announced July 2024.

  3. arXiv:2407.07655  [pdf, other]

    cs.LG

    The Selective G-Bispectrum and its Inversion: Applications to G-Invariant Networks

    Authors: Simon Mataigne, Johan Mathe, Sophia Sanborn, Christopher Hillar, Nina Miolane

    Abstract: An important problem in signal processing and deep learning is to achieve \textit{invariance} to nuisance factors not relevant for the task. Since many of these factors are describable as the action of a group $G$ (e.g. rotations, translations, scalings), we want methods to be $G$-invariant. The $G$-Bispectrum extracts every characteristic of a given signal up to group action: for example, the sha…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 9 pages

    MSC Class: 68T01; 68T07; 68R01; 20K01

  4. arXiv:2407.05467  [pdf, other]

    cs.DC cs.AI

    The infrastructure powering IBM's Gen AI model development

    Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba, et al. (121 additional authors not shown)

    Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla

  5. arXiv:2407.04667  [pdf, other]

    stat.ME cs.LG

    The diameter of a stochastic matrix: A new measure for sensitivity analysis in Bayesian networks

    Authors: Manuele Leonelli, Jim Q. Smith, Sophia K. Wright

    Abstract: Bayesian networks are one of the most widely used classes of probabilistic models for risk management and decision support because of their interpretability and flexibility in including heterogeneous pieces of information. In any applied modelling, it is critical to assess how robust the inferences on certain target variables are to changes in the model. In Bayesian networks, these analyses fall u…

    Submitted 5 July, 2024; originally announced July 2024.

  6. arXiv:2407.04352  [pdf, other]

    cs.HC cs.LG

    UpStory: the Uppsala Storytelling dataset

    Authors: Marc Fraile, Natalia Calvo-Barajas, Anastasia Sophia Apeiron, Giovanna Varni, Joakim Lindblad, Nataša Sladoje, Ginevra Castellano

    Abstract: Friendship and rapport play an important role in the formation of constructive social interactions, and have been widely studied in educational settings due to their impact on student outcomes. Given the growing interest in automating the analysis of such phenomena through Machine Learning (ML), access to annotated interaction datasets is highly valuable. However, no dataset on dyadic child-child…

    Submitted 5 July, 2024; originally announced July 2024.

  7. arXiv:2406.16192  [pdf, other]

    cs.CV

    HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis

    Authors: Guillaume Jaume, Paul Doucet, Andrew H. Song, Ming Y. Lu, Cristina Almagro-Pérez, Sophia J. Wagner, Anurag J. Vaidya, Richard J. Chen, Drew F. K. Williamson, Ahrong Kim, Faisal Mahmood

    Abstract: Spatial transcriptomics (ST) enables interrogating the molecular composition of tissue with ever-increasing resolution, depth, and sensitivity. However, costs, rapidly evolving technology, and lack of standards have constrained computational methods in ST to narrow tasks and small cohorts. In addition, the underlying tissue morphology as reflected by H&E-stained whole slide images (WSIs) encodes r…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Under review

  8. arXiv:2406.15647  [pdf, other]

    cs.SD cs.LG eess.AS

    Generating Music with Structure Using Self-Similarity as Attention

    Authors: Sophia Hager, Kathleen Hablutzel, Katherine M. Kinnaird

    Abstract: Despite the innovations in deep learning and generative AI, creating long term structure as well as the layers of repeated structure common in musical works remains an open challenge in music generation. We propose an attention layer that uses a novel approach applying user-supplied self-similarity matrices to previous time steps, and demonstrate it in our Similarity Incentivized Neural Generator…

    Submitted 25 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  9. arXiv:2406.14949  [pdf, other]

    cs.AI

    CEASEFIRE: An AI-powered system for combatting illicit firearms trafficking

    Authors: Ioannis Mademlis, Jorgen Cani, Marina Mancuso, Caterina Paternoster, Emmanouil Adamakis, George Margetis, Sylvie Chambon, Alain Crouzil, Loubna Lechelek, Georgia Dede, Spyridon Evangelatos, George Lalas, Franck Mignet, Pantelis Linardatos, Konstantinos Kentrotis, Henryk Gierszal, Piotr Tyczka, Sophia Karagiorgou, George Pantelis, Georgios Stavropoulos, Konstantinos Votis, Georgios Th. Papadopoulos

    Abstract: Modern technologies have led illicit firearms trafficking to partially merge with cybercrime, while simultaneously permitting its off-line aspects to become more sophisticated. Law enforcement officers face difficult challenges that require hi-tech solutions. This article presents a real-world system, powered by advanced Artificial Intelligence, for facilitating them in their everyday work.

    Submitted 21 June, 2024; originally announced June 2024.

  10. arXiv:2406.11328  [pdf, other]

    cs.CL

    Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams

    Authors: Zheheng Luo, Chenhan Yuan, Qianqian Xie, Sophia Ananiadou

    Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated their potential in delivering accurate answers to questions about world knowledge. Despite this, existing benchmarks for evaluating LLMs in healthcare predominantly focus on medical doctors, leaving other critical healthcare professions underrepresented. To fill this research gap, we introduce the Examinations for Medical Person…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 15 pages, 4 figures

  11. arXiv:2406.11186  [pdf, other]

    cs.CY cs.HC

    An Initial Study Review of Designing a Technology Solution for Women in Technologically Deprived Areas or Low Resource Constraint Communities

    Authors: Jones Yeboah, Sophia Bampoh, Annu Sible Prabhakar

    Abstract: In the West African country of Ghana, depression is a significant issue affecting a large number of women. Despite its importance, the issue received insufficient attention during the COVID-19 pandemic. In developed countries, mobile phones serve as a convenient medium for accessing health information and providers. However, in Ghana, women's access to mobile phones is limited by cultural, social,…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 15 pages, 1 figure

  12. arXiv:2406.11093  [pdf, other]

    cs.CL

    RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning based on Emotional Information

    Authors: Zhiwei Liu, Kailai Yang, Qianqian Xie, Christine de Kock, Sophia Ananiadou, Eduard Hovy

    Abstract: Misinformation is prevalent in various fields such as education, politics, health, etc., causing significant harm to society. However, current methods for cross-domain misinformation detection rely on time- and resource-consuming fine-tuning and complex model structures. With the outstanding performance of LLMs, many studies have employed them for misinformation detection. Unfortunately, they focu…

    Submitted 16 June, 2024; originally announced June 2024.

  13. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  14. arXiv:2406.08216  [pdf, ps, other]

    cs.SE

    A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks

    Authors: Sinclair Hudson, Sophia Jit, Boyue Caroline Hu, Marsha Chechik

    Abstract: Large Language Models (LLMs) are rapidly becoming ubiquitous both as stand-alone tools and as components of current and future software systems. To enable usage of LLMs in the high-stakes or safety-critical systems of 2030, they need to undergo rigorous testing. Software Engineering (SE) research on testing Machine Learning (ML) components and ML-based systems has systematically explored many topic…

    Submitted 12 June, 2024; originally announced June 2024.

  15. Mind Mansion: Exploring Metaphorical Interactions to Engage with Negative Thoughts in Virtual Reality

    Authors: Julian Rasch, Michelle Johanna Zender, Sophia Sakel, Nadine Wagener

    Abstract: Recurrent negative thoughts can significantly disrupt daily life and contribute to negative emotional states. Facing, confronting, and noticing such thoughts without support can be challenging. To provide a playful setting and leverage the technical maturation of Virtual Reality (VR), our VR experience, Mind Mansion, places the user in an initially cluttered virtual apartment. Here we utilize esta…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: To appear in Proceedings of the Designing Interactive Systems Conference (DIS '24), July 1-5, 2024, IT University of Copenhagen, Denmark

  16. arXiv:2406.04287  [pdf, other]

    cs.CV cs.RO

    SpectralZoom: Efficient Segmentation with an Adaptive Hyperspectral Camera

    Authors: Jackson Arnold, Sophia Rossi, Chloe Petrosino, Ethan Mitchell, Sanjeev J. Koppal

    Abstract: Hyperspectral image segmentation is crucial for many fields such as agriculture, remote sensing, biomedical imaging, battlefield sensing and astronomy. However, the challenge of hyperspectral and multispectral imaging is its large data footprint. We propose both a novel camera design and a vision transformer-based (ViT) algorithm that alleviate both the captured data footprint and the computational load…

    Submitted 6 June, 2024; originally announced June 2024.

  17. arXiv:2406.01528  [pdf, other]

    cs.LG

    Physics-Informed Neural Networks for Dynamic Process Operations with Limited Physical Knowledge and Data

    Authors: Mehmet Velioglu, Song Zhai, Sophia Rupprecht, Alexander Mitsos, Andreas Jupke, Manuel Dahmen

    Abstract: In chemical engineering, process data are expensive to acquire, and complex phenomena are difficult to fully model. We explore the use of physics-informed neural networks (PINNs) for dynamic processes with incomplete mechanistic semi-explicit differential-algebraic equation systems and scarce process data. In particular, we focus on estimating states for which neither direct observational data nor…

    Submitted 7 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: manuscript (32 pages, 9 figures, 11 tables), supporting materials (14 pages, 4 figures, 5 tables)

  18. arXiv:2405.20195  [pdf, other]

    cs.HC

    Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations

    Authors: Zilin Ma, Susannah Su, Nathan Zhao, Linn Bieske, Blake Bullwinkel, Yanyi Zhang, Sophia Yang, Ziqing Luo, Siyao Li, Gekai Liao, Boxiang Wang, Jinglun Gao, Zihan Wen, Claude Bruderlein, Weiwei Pan

    Abstract: Humanitarian negotiations in conflict zones, called \emph{frontline negotiation}, are often highly adversarial, complex, and high-risk. Several best-practices have emerged over the years that help negotiators extract insights from large datasets to navigate nuanced and rapidly evolving scenarios. Recent advances in large language models (LLMs) have sparked interest in the potential for AI to aid d…

    Submitted 30 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  19. arXiv:2405.18536  [pdf, other]

    cs.LG

    Data-Driven Simulator for Mechanical Circulatory Support with Domain Adversarial Neural Process

    Authors: Sophia Sun, Wenyuan Chen, Zihao Zhou, Sonia Fereidooni, Elise Jortberg, Rose Yu

    Abstract: We propose a data-driven simulator for Mechanical Circulatory Support (MCS) devices, implemented as a probabilistic deep sequence model. Existing mechanical simulators for MCS rely on oversimplifying assumptions and are insensitive to patient-specific behavior, limiting their applicability to real-world treatment scenarios. To address these shortcomings, our model Domain Adversarial Neural Process (DANP) employs a neural process archit…

    Submitted 28 May, 2024; originally announced May 2024.

  20. arXiv:2405.13949  [pdf, other]

    cs.CV

    PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery

    Authors: Runlong He, Mengya Xu, Adrito Das, Danyal Z. Khan, Sophia Bano, Hani J. Marcus, Danail Stoyanov, Matthew J. Clarkson, Mobarakol Islam

    Abstract: Visual Question Answering (VQA) within the surgical domain, utilizing Large Language Models (LLMs), offers a distinct opportunity to improve intra-operative decision-making and facilitate intuitive surgeon-AI interaction. However, the development of LLMs for surgical VQA is hindered by the scarcity of diverse and extensive datasets with complex reasoning tasks. Moreover, contextual fusion of the i…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures

  21. arXiv:2405.07111  [pdf, other]

    cs.CL

    Designing and Evaluating Dialogue LLMs for Co-Creative Improvised Theatre

    Authors: Boyd Branch, Piotr Mirowski, Kory Mathewson, Sophia Ppali, Alexandra Covaci

    Abstract: Social robotics researchers are increasingly interested in multi-party trained conversational agents. With a growing demand for real-world evaluations, our study presents Large Language Models (LLMs) deployed in a month-long live show at the Edinburgh Festival Fringe. This case study investigates human improvisers co-creating with conversational agents in a professional theatre setting. We explore…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 13 pages, 7 figures, accepted for publication at the International Conference on Computational Creativity 2024

  22. arXiv:2405.04324  [pdf, other]

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  23. arXiv:2405.00146  [pdf, other]

    quant-ph cs.ET

    Averting multi-qubit burst errors in surface code magic state factories

    Authors: Jason D. Chadwick, Christopher Kang, Joshua Viszlai, Sophia Fuhui Lin, Frederic T. Chong

    Abstract: Fault-tolerant quantum computation relies on the assumption of time-invariant, sufficiently low physical error rates. However, current superconducting quantum computers suffer from frequent disruptive noise events, including cosmic ray impacts and shifting two-level system defects. Several methods have been proposed to mitigate these issues in software, but they add large overheads in terms of phy…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 13 pages, 12 figures

  24. arXiv:2404.19264  [pdf, other]

    cs.RO

    DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

    Authors: Xiaoyu Huang, Yufeng Chi, Ruofeng Wang, Zhongyu Li, Xue Bin Peng, Sophia Shao, Borivoje Nikolic, Koushil Sreenath

    Abstract: This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation domains. However, scaling up learning for legged rob…

    Submitted 30 April, 2024; originally announced April 2024.

  25. arXiv:2404.18796  [pdf, other]

    cs.CL cs.AI

    Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

    Authors: Pat Verga, Sebastian Hofstatter, Sophia Althammer, Yixuan Su, Aleksandra Piktus, Arkady Arkhangorodsky, Minjie Xu, Naomi White, Patrick Lewis

    Abstract: As Large Language Models (LLMs) have become more advanced, they have outpaced our abilities to accurately evaluate their quality. Not only is finding data to adequately probe particular model properties difficult, but evaluating the correctness of a model's freeform generation alone is a challenge. To address this, many evaluations now rely on using LLMs themselves as judges to score the quality o…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  26. arXiv:2404.15236  [pdf, other]

    cs.SE

    Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models

    Authors: Aidan Z. H. Yang, Sophia Kolak, Vincent J. Hellendoorn, Ruben Martins, Claire Le Goues

    Abstract: Language models have improved by orders of magnitude with the recent emergence of Transformer-based Large Language Models (LLMs). LLMs have demonstrated their ability to generate natural code that is highly similar to code written by professional developers. One intermediate value an LLM can emit is entropy, which measures the naturalness of a token of code. We hypothesize that entropy can be used…

    Submitted 23 April, 2024; originally announced April 2024.

  27. arXiv:2404.14789  [pdf, other]

    cs.MA cs.LO

    Opinion Update in a Subjective Logic Model for Social Networks

    Authors: Mário S. Alvim, Sophia Knight, José C. Oliveira

    Abstract: Subjective Logic (SL) is a logic incorporating uncertainty and opinions for agents in dynamic systems. In this work, we investigate the use of subjective logic to model opinions and belief change in social networks. In particular, we work toward the development of a subjective logic belief/opinion update function appropriate for modeling belief change as communication occurs in social networks. We…

    Submitted 23 April, 2024; originally announced April 2024.

  28. arXiv:2404.14040  [pdf, other]

    cs.CV

    Surgical-DeSAM: Decoupling SAM for Instrument Segmentation in Robotic Surgery

    Authors: Yuyang Sheng, Sophia Bano, Matthew J. Clarkson, Mobarakol Islam

    Abstract: Purpose: The recent Segment Anything Model (SAM) has demonstrated impressive performance with point, text or bounding box prompts, in various applications. However, in safety-critical surgical tasks, prompting is not possible due to (i) the lack of per-frame prompts for supervised learning, (ii) it is unrealistic to prompt frame-by-frame in a real-time tracking application, and (iii) it is expensi…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 8 pages, 2 figures

  29. arXiv:2404.14027  [pdf, other]

    cs.CV cs.LG

    OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

    Authors: Sophia Sirko-Galouchenko, Alexandre Boulch, Spyros Gidaris, Andrei Bursuc, Antonin Vobecky, Patrick Pérez, Renaud Marlet

    Abstract: We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to th…

    Submitted 12 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024, Workshop on Autonomous Driving

  30. arXiv:2404.09220  [pdf, other]

    cs.CL

    Compass: Large Multilingual Language Model for South-east Asia

    Authors: Sophia Maria

    Abstract: Large language models have exhibited significant proficiency in languages endowed with extensive linguistic resources, such as English and Chinese. Nevertheless, their effectiveness notably diminishes when applied to languages characterized by limited linguistic resources, particularly within the Southeast Asian linguistic landscape, such as Indonesian. The scarcity of linguistic resources for the…

    Submitted 14 April, 2024; originally announced April 2024.

  31. arXiv:2404.06309  [pdf, other]

    cs.CV

    Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models

    Authors: David Kurzendörfer, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

    Abstract: Audio-visual zero-shot learning methods commonly build on features extracted from pre-trained models, e.g. video or audio classification models. However, existing benchmarks predate the popularization of large multi-modal models, such as CLIP and CLAP. In this work, we explore such large pre-trained models to obtain features, i.e. CLIP for visual features, and CLAP for audio features. Furthermore,…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: CVPRw 2024 (L3D-IVU)

  32. arXiv:2404.06128  [pdf, other]

    cs.CV

    Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction

    Authors: Sierra Bonilla, Shuai Zhang, Dimitrios Psychogyios, Danail Stoyanov, Francisco Vasconcelos, Sophia Bano

    Abstract: Within colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of precancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface which can help minimize the missing regions and reinspection for pre-ca…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 12 pages, 5 figures

  33. arXiv:2404.05022  [pdf, other]

    cs.CV cs.LG

    DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology

    Authors: Valentin Koch, Sophia J. Wagner, Salome Kazeminia, Ece Sancar, Matthias Hehr, Julia Schnabel, Tingying Peng, Carsten Marr

    Abstract: In hematology, computational models offer significant potential to improve diagnostic accuracy, streamline workflows, and reduce the tedious work of analyzing single cells in peripheral blood or bone marrow smears. However, clinical adoption of computational models has been hampered by the lack of generalization due to large batch effects, small dataset sizes, and poor performance in transfer lear…

    Submitted 7 April, 2024; originally announced April 2024.

  34. arXiv:2403.17141  [pdf, other]

    cs.CL cs.AI

    MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models

    Authors: Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Tianlin Zhang, Sophia Ananiadou

    Abstract: Recent advancements in large language models (LLMs) aim to tackle heterogeneous human expectations and values via multi-objective preference alignment. However, existing methods are parameter-adherent to the policy model, leading to two key limitations: (1) the high-cost repetition of their alignment algorithms for each new target model; (2) they cannot expand to unseen objectives due to their sta…

    Submitted 6 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Work in progress

  35. arXiv:2403.16760  [pdf]

    cs.HC cs.AI cs.SD eess.AS

    As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli

    Authors: Di Cooke, Abigail Edwards, Sophia Barkoff, Kathryn Kelly

    Abstract: As synthetic media becomes progressively more realistic and barriers to using it continue to lower, the technology has been increasingly utilized for malicious purposes, from financial fraud to nonconsensual pornography. Today, the principal defense against being misled by synthetic media relies on the ability of the human observer to visually and auditorily discern between real and fake. However,…

    Submitted 4 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: For study pre-registration, see https://osf.io/fnhr3

    MSC Class: 68T01

    ACM Class: I.2

  36. arXiv:2403.15243  [pdf, other]

    q-fin.CP cs.LG q-fin.MF q-fin.PM

    Robust Utility Optimization via a GAN Approach

    Authors: Florian Krach, Josef Teichmann, Hanna Wutte

    Abstract: Robust utility optimization enables an investor to deal with market uncertainty in a structured way, with the goal of maximizing the worst-case outcome. In this work, we propose a generative adversarial network (GAN) approach to (approximately) solve robust utility optimization problems in general and realistic settings. In particular, we model both the investor and the market by neural networks (…

    Submitted 22 March, 2024; originally announced March 2024.

    MSC Class: 91-08; 68T07; 91G10; 91G60

  37. arXiv:2403.13313  [pdf, other]

    cs.AI cs.CL

    Polaris: A Safety-focused LLM Constellation Architecture for Healthcare

    Authors: Subhabrata Mukherjee, Paul Gamble, Markel Sanz Ausin, Neel Kant, Kriti Aggarwal, Neha Manjunath, Debajyoti Datta, Zhengliang Liu, Jiayuan Ding, Sophia Busacca, Cezanne Bianco, Swapnil Sharma, Rae Lasko, Michelle Voisard, Sanchay Harneja, Darya Filippova, Gerry Meixiong, Kevin Cha, Amir Youssefi, Meyhaa Buvanesh, Howard Weingram, Sebastian Bierman-Lytle, Harpreet Singh Mangat, Kim Parikh, Saad Godil, et al. (1 additional author not shown)

    Abstract: We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM works in healthcare focusing on tasks like question answering, our work specifically focuses on long multi-turn voice conversations. Our one-trillion parameter constellation system is composed of several multibillion parameter LLMs as co-operative agents: a stateful pr…

    Submitted 20 March, 2024; originally announced March 2024.

  38. arXiv:2403.11743  [pdf, other]

    cs.LG stat.ML

    PARMESAN: Parameter-Free Memory Search and Transduction for Dense Prediction Tasks

    Authors: Philip Matthias Winter, Maria Wimmer, David Major, Dimitrios Lenis, Astrid Berg, Theresa Neubauer, Gaia Romana De Paolis, Johannes Novotny, Sophia Ulonska, Katja Bühler

    Abstract: In this work we address flexibility in deep learning by means of transductive reasoning. For adaptation to new tasks or new data, existing methods typically involve tuning of learnable parameters or even complete re-training from scratch, rendering such approaches inflexible in practice. We argue that the notion of separating computation from memory by the means of transduction can act as a steppi…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: preprint, 27 pages, 8 figures

  39. arXiv:2403.06765  [pdf, other]

    cs.CL

    ConspEmoLLM: Conspiracy Theory Detection Using an Emotion-Based Large Language Model

    Authors: Zhiwei Liu, Boyang Liu, Paul Thompson, Kailai Yang, Sophia Ananiadou

    Abstract: The internet has brought both benefits and harms to society. A prime example of the latter is misinformation, including conspiracy theories, which flood the web. Recent advances in natural language processing, particularly the emergence of large language models (LLMs), have improved the prospects of accurate misinformation detection. However, most LLM-based approaches to conspiracy theory detectio…

    Submitted 16 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Work in progress

  40. arXiv:2403.06249  [pdf, other]

    cs.CE cs.CL

    No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks

    Authors: Gang Hu, Ke Qin, Chenhan Yuan, Min Peng, Alejandro Lopez-Lira, Benyou Wang, Sophia Ananiadou, Wanlong Yu, Jimin Huang, Qianqian Xie

    Abstract: While the progression of Large Language Models (LLMs) has notably propelled financial analysis, their application has largely been confined to singular language realms, leaving untapped the potential of bilingual Chinese-English capacity. To bridge this chasm, we introduce ICE-PIXIU, seamlessly amalgamating the ICE-INTENT model and ICE-FLARE benchmark for bilingual financial analysis. ICE-PIXIU un…

    Submitted 16 April, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: 24 pages, 5 figures, 12 tables, including Appendix

  41. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  42. arXiv:2403.05098  [pdf, ps, other]

    cs.HC cs.CY cs.RO

    Love, Joy, and Autism Robots: A Metareview and Provocatype

    Authors: Andrew Hundt, Gabrielle Ohlson, Pieter Wolfert, Lux Miranda, Sophia Zhu, Katie Winkle

    Abstract: Previous work has observed how Neurodivergence is often harmfully pathologized in Human-Computer Interaction (HCI) and Human-Robot interaction (HRI) research. We conduct a review of autism robot reviews and find the dominant research direction is Autistic people's second to lowest (24 of 25) research priority: interventions and treatments purporting to 'help' neurodivergent individuals to conform…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 3 pages; In Assistive Applications, Accessibility, and Disability Ethics (A3DE) workshop at the Human Robot Interaction (HRI) Conference 2024; https://sites.google.com/view/love-joy-and-autism-robots/home

  43. arXiv:2403.01353  [pdf, other]

    quant-ph cs.AR cs.ET

    Spatially parallel decoding for multi-qubit lattice surgery

    Authors: Sophia Fuhui Lin, Eric C. Peterson, Krishanu Sankar, Prasahnt Sivarajah

    Abstract: Running quantum algorithms protected by quantum error correction requires a real time, classical decoder. To prevent the accumulation of a backlog, this decoder must process syndromes from the quantum device at a faster rate than they are generated. Most prior work on real time decoding has focused on an isolated logical qubit encoded in the surface code. However, for surface code, quantum program…

    Submitted 6 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  44. arXiv:2402.19106  [pdf, other]

    eess.AS cs.IR cs.SD

    A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval

    Authors: Andreea-Maria Oncescu, João F. Henriques, Andrew Zisserman, Samuel Albanie, A. Sophia Koepke

    Abstract: Video databases from the internet are a valuable source of text-audio retrieval datasets. However, given that sound and vision streams represent different "views" of the data, treating visual descriptions as audio descriptions is far from optimal. Even if audio class labels are present, they commonly are not very detailed, making them unsuited for text-audio retrieval. To exploit relevant audio in…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 9 pages, 2 figures, 9 tables, Accepted at ICASSP 2024

  45. arXiv:2402.17615  [pdf, other]

    cs.MA cs.SI

    A Multi-Agent Model for Opinion Evolution under Cognitive Biases

    Authors: Mário S. Alvim, Artur Gaspar da Silva, Sophia Knight, Frank Valencia

    Abstract: We generalize the DeGroot model for opinion dynamics to better capture realistic social scenarios. We introduce a model where each agent has their own individual cognitive biases. Society is represented as a directed graph whose edges indicate how much agents influence one another. Biases are represented as the functions in the square region $[-1,1]^2$ and categorized into four sub-regions based o…

    Submitted 27 February, 2024; originally announced February 2024.

  46. arXiv:2402.13758  [pdf, other]

    cs.CL

    Factual Consistency Evaluation of Summarisation in the Era of Large Language Models

    Authors: Zheheng Luo, Qianqian Xie, Sophia Ananiadou

    Abstract: Factual inconsistency with source documents in automatically generated summaries can lead to misinformation or pose risks. Existing factual consistency (FC) metrics are constrained by their performance, efficiency, and explainability. Recent advances in Large language models (LLMs) have demonstrated remarkable potential in text evaluation but their effectiveness in assessing FC in summarisation rem…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 5 figures

  47. arXiv:2402.13498  [pdf, other]

    cs.CL

    The Lay Person's Guide to Biomedicine: Orchestrating Large Language Models

    Authors: Zheheng Luo, Qianqian Xie, Sophia Ananiadou

    Abstract: Automated lay summarisation (LS) aims to simplify complex technical documents into a more accessible format to non-experts. Existing approaches using pre-trained language models, possibly augmented with external background knowledge, tend to struggle with effective simplification and explanation. Moreover, automated methods that can effectively assess the `layness' of generated summaries are lacki…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 18 pages, 4 figures

  48. arXiv:2402.12659  [pdf, other]

    cs.CL cs.AI cs.CE

    FinBen: A Holistic Financial Benchmark for Large Language Models

    Authors: Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yijing Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, Zhiyang Deng, Yuechen Jiang, Zhiyuan Yao, Haohang Li, Yangyang Yu, Gang Hu, et al. (9 additional authors not shown)

    Abstract: LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical…

    Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 26 pages, 11 figures

  49. arXiv:2402.10456  [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    Generative Modeling for Tabular Data via Penalized Optimal Transport Network

    Authors: Wenhui Sophia Lu, Chenyang Zhong, Wing Hung Wong

    Abstract: The task of precisely learning the probability distribution of rows within tabular data and producing authentic synthetic samples is both crucial and non-trivial. Wasserstein generative adversarial network (WGAN) marks a notable improvement in generative modeling, addressing the challenges faced by its predecessor, generative adversarial network. However, due to the mixed data types and multimodal…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 37 pages, 23 figures

  50. arXiv:2402.10035  [pdf, other]

    cs.CV cs.DC

    Investigation of Federated Learning Algorithms for Retinal Optical Coherence Tomography Image Classification with Statistical Heterogeneity

    Authors: Sanskar Amgain, Prashant Shrestha, Sophia Bano, Ignacio del Valle Torres, Michael Cunniffe, Victor Hernandez, Phil Beales, Binod Bhattarai

    Abstract: Purpose: We apply federated learning to train an OCT image classifier simulating a realistic scenario with multiple clients and statistical heterogeneous data distribution where data in the clients lack samples of some categories entirely. Methods: We investigate the effectiveness of FedAvg and FedProx to train an OCT image classification model in a decentralized fashion, addressing privacy conc…

    Submitted 15 February, 2024; originally announced February 2024.