Showing 1–50 of 98 results for author: Díaz, M

  1. arXiv:2406.11757

    cs.AI cs.CL cs.CY cs.HC

    STAR: SocioTechnical Approach to Red Teaming Language Models

    Authors: Laura Weidinger, John Mellor, Bernat Guillen Pegueroles, Nahema Marchal, Ravin Kumar, Kristian Lum, Canfer Akbulut, Mark Diaz, Stevie Bergman, Mikel Rodriguez, Verena Rieser, William Isaac

    Abstract: This research introduces STAR, a sociotechnical framework that improves on current best practices for red teaming safety of large language models. STAR makes two key contributions: it enhances steerability by generating parameterised instructions for human red teamers, leading to improved coverage of the risk surface. Parameterised instructions also provide more detailed insights into model failur…

    Submitted 10 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures, 5 pages appendix. * denotes equal contribution

  2. A short review on graphonometric evaluation tools in children

    Authors: Belen Esther Aleman, Moises Diaz, Miguel Angel Ferrer

    Abstract: Handwriting is a complex task that involves the coordination of motor, perceptual and cognitive skills. It is a fundamental skill for the cognitive and academic development of children. However, the technological, and educational changes in recent decades have affected both the teaching and assessment of handwriting. This paper presents a literature review of handwriting analysis in children, incl…

    Submitted 7 June, 2024; originally announced June 2024.

    Journal ref: Computer Science, vol 14285. Springer, Cham, 2024

  3. From operculum and body tail movements to different coupling of physical activity and respiratory frequency in farmed gilthead sea bream and European sea bass. Insights on aquaculture biosensing

    Authors: Miguel A. Ferrer, Josep A. Calduch-Giner, Moises Díaz, Javier Sosa, Enrique Rosell-Moll, Judith Santana Abril, Graciela Santana Sosa, Tomás Bautista Delgado, Cristina Carmona, Juan Antonio Martos-Sitcha, Enric Cabruja, Juan Manuel Afonso, Aurelio Vega, Manuel Lozano, Juan Antonio Montiel-Nelson, Jaume Pérez-Sánchez

    Abstract: The AEFishBIT tri-axial accelerometer was externally attached to the operculum to assess the divergent activity and respiratory patterns of two marine farmed fish, the gilthead sea bream (Sparus aurata) and European sea bass (Dicentrarchus labrax). Analysis of raw data from exercised fish highlighted the large amplitude of operculum aperture and body tail movements in European sea bass, which were…

    Submitted 6 June, 2024; originally announced June 2024.

    Journal ref: Computers and Electronics in Agriculture, vol. 175, pp. 105531, 2020

  4. Writing Order Recovery in Complex and Long Static Handwriting

    Authors: Moises Diaz, Gioele Crispo, Antonio Parziale, Angelo Marcelli, Miguel A. Ferrer

    Abstract: The order in which the trajectory is executed is a powerful source of information for recognizers. However, there is still no general approach for recovering the trajectory of complex and long handwriting from static images. Complex specimens can result in multiple pen-downs and in a high number of trajectory crossings yielding agglomerations of pixels (also known as clusters). While the scientifi…

    Submitted 5 June, 2024; originally announced June 2024.

    Journal ref: International Journal of Interactive Multimedia and Artificial Intelligence, Volume 7, number 4, Pages 171-184, 2022

  5. On the use of first and second derivative approximations for biometric online signature recognition

    Authors: Marcos Faundez-Zanuy, Moises Diaz

    Abstract: This paper investigates the impact of different approximation methods in feature extraction for pattern recognition applications, specifically focused on delta and delta-delta parameters. Using MCYT330 online signature database, our experiments show that 11-point approximation outperforms 1-point approximation, resulting in a 1.4% improvement in identification rate, 36.8% reduction in random forg…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Advances in Computational Intelligence. IWANN 2023. pp 461 to 472

    Journal ref: Lecture Notes in Computer Science, vol 14134, 2023

  6. Uniform vs. Lognormal Kinematics in Robots: Perceptual Preferences for Robotic Movements

    Authors: Jose J. Quintana, Miguel A. Ferrer, Moises Diaz, Jose J. Feo, Adam Wolniakowski, Konstantsin Miatliuk

    Abstract: Collaborative robots or cobots interact with humans in a common work environment. In cobots, one under-investigated but important issue is related to their movement and how it is perceived by humans. This paper tries to analyze whether humans prefer a robot moving in a human or in a robotic fashion. To this end, the present work lays out what differentiates the movement performed by an industrial…

    Submitted 29 May, 2024; originally announced May 2024.

    Journal ref: Applied Sciences Volume 12 Issue 23 (2022)

  7. MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification

    Authors: Miguel A. Ferrer, Abhijit Das, Moises Diaz, Aythami Morales, Cristina Carmona-Duarte, Umapada Pal

    Abstract: Script identification plays a vital role in applications that involve handwriting and document analysis within a multi-script and multi-lingual environment. Moreover, it exhibits a profound connection with human cognition. This paper provides a new database for benchmarking script identification algorithms, which contains both printed and handwritten documents collected from a wide variety of scri…

    Submitted 29 May, 2024; originally announced May 2024.

    Journal ref: Cognitive Computation, Volume 16, pages 131 to 157 (2024)

  8. Graphomotor and Handwriting Disabilities Rating Scale (GHDRS): towards complex and objective assessment

    Authors: Jiri Mekyska, Katarina Safarova, Tomas Urbanek, Jirina Bednarova, Vojtech Zvoncak, Jana Marie Havigerova, Lukas Cunek, Zoltan Galaz, Jan Mucha, Christine Klauszova, Marcos Faundez-Zanuy, Miguel A. Ferrer, Moises Diaz

    Abstract: Graphomotor and handwriting disabilities (GD and HD, respectively) could significantly reduce children's quality of life. Effective remediation depends on proper diagnosis; however, current approaches to diagnosis and assessment of GD and HD have several limitations and knowledge gaps, e.g. they are subjective, they do not facilitate identification of specific manifestations, etc. The aim of this…

    Submitted 28 May, 2024; originally announced May 2024.

    Journal ref: Australian Journal of Learning Difficulties, Routledge, 1-34, 2024

  9. A Machine Learning Approach to Analyze the Effects of Alzheimer's Disease on Handwriting through Lognormal Features

    Authors: Tiziana D'Alessandro, Cristina Carmona-Duarte, Claudio De Stefano, Moises Diaz, Miguel A. Ferrer, Francesco Fontanella

    Abstract: Alzheimer's disease is one of the most incisive illnesses among the neurodegenerative ones, and it causes a progressive decline in cognitive abilities that, in the worst cases, becomes severe enough to interfere with daily life. Currently, there is no cure, so an early diagnosis is strongly needed to try and slow its progression through medical treatments. Handwriting analysis is considered a pote…

    Submitted 27 May, 2024; originally announced May 2024.

    Journal ref: IGS 2023. Lecture Notes in Computer Science, vol 14285. Springer (2023)

  10. CowScreeningDB: A public benchmark dataset for lameness detection in dairy cows

    Authors: Shahid Ismail, Moises Diaz, Cristina Carmona-Duarte, Jose Manuel Vilar, Miguel A. Ferrer

    Abstract: Lameness is one of the costliest pathological problems affecting dairy animals. It is usually assessed by trained veterinary clinicians who observe features such as gait symmetry or gait parameters as step counts in real-time. With the development of artificial intelligence, various modular systems have been proposed to minimize subjectivity in lameness assessment. However, the major limitation in…

    Submitted 24 May, 2024; originally announced May 2024.

    Journal ref: Computers and Electronics in Agriculture, vol.216, pp.108500, 2024

  11. Investigating the Common Authorship of Signatures by Off-Line Automatic Signature Verification Without the Use of Reference Signatures

    Authors: Moises Diaz, Miguel A. Ferrer, Soodamani Ramalingam, Richard Guest

    Abstract: In automatic signature verification, questioned specimens are usually compared with reference signatures. In writer-dependent schemes, a number of reference signatures are required to build up the individual signer model while a writer-independent system requires a set of reference signatures from several signers to develop the model of the system. This paper addresses the problem of automatic sig…

    Submitted 23 May, 2024; originally announced May 2024.

    Journal ref: IEEE Transactions on Information Forensics and Security, vol.15, no.1, pp. 487 to 499 (2019)

  12. A Perspective Analysis of Handwritten Signature Technology

    Authors: Moises Diaz, Miguel A. Ferrer, Donato Impedovo, Muhammad Imran Malik, Giuseppe Pirlo, Rejean Plamondon

    Abstract: Handwritten signatures are biometric traits at the center of debate in the scientific community. Over the last 40 years, the interest in signature studies has grown steadily, having as its main reference the application of automatic signature verification, as previously published reviews in 1989, 2000, and 2008 bear witness. Ever since, and over the last 10 years, the application of handwritten si…

    Submitted 22 May, 2024; originally announced May 2024.

    Journal ref: ACM Computing Surveys (CSUR), vol.51, no 6, pp. 117:1-117:39 (2018)

  13. Dynamically enhanced static handwriting representation for Parkinson's disease detection

    Authors: Moises Diaz, Miguel Angel Ferrer, Donato Impedovo, Giuseppe Pirlo, Gennaro Vessio

    Abstract: Computer aided diagnosis systems can provide non-invasive, low-cost tools to support clinicians. These systems have the potential to assist the diagnosis and monitoring of neurodegenerative disorders, in particular Parkinson's disease (PD). Handwriting plays a special role in the context of PD assessment. In this paper, the discriminating power of "dynamically enhanced" static images of handwritin…

    Submitted 22 May, 2024; originally announced May 2024.

    Journal ref: Pattern Recognition Letters, vol. 128, pp. 204-210 (2019)

  14. Explainable offline automatic signature verifier to support forensic handwriting examiners

    Authors: Moises Diaz, Miguel A. Ferrer, Gennaro Vessio

    Abstract: Signature verification is a critical task in many applications, including forensic science, legal judgments, and financial markets. However, current signature verification systems are often difficult to explain, which can limit their acceptance in these applications. In this paper, we propose a novel explainable offline automatic signature verifier (ASV) to support forensic handwriting examiners.…

    Submitted 21 May, 2024; originally announced May 2024.

    Journal ref: Neural Computing and Applications, Volume 36, pages 2411 to 2427 (2024)

  15. Online Signature Recognition: A Biologically Inspired Feature Vector Splitting Approach

    Authors: Marcos Faundez, Moises Diaz, Miguel Angel Ferrer

    Abstract: This research introduces an innovative approach to explore the cognitive and biologically inspired underpinnings of feature vector splitting for analyzing the significance of different attributes in e-security biometric signature recognition applications. Departing from traditional methods of concatenating features into an extended set, we employ multiple splitting strategies, aligning with cognit…

    Submitted 21 May, 2024; originally announced May 2024.

    Journal ref: Cognitive Computation, vol. 16, pages 265 to 277 (2024)

  16. SM-DTW: Stability Modulated Dynamic Time Warping for signature verification

    Authors: Antonio Parziale, Moises Diaz, Miguel A. Ferrer, Angelo Marcelli

    Abstract: Building upon findings in computational model of handwriting learning and execution, we introduce the concept of stability to explain the difference between the actual movements performed during multiple execution of the subject's signature, and conjecture that the most stable parts of the signature should play a paramount role in evaluating the similarity between a questioned signature and the re…

    Submitted 20 May, 2024; originally announced May 2024.

    Journal ref: Pattern Recognition Letters, Volume: 121, Pages 113-122 (2019)

  17. arXiv:2404.10857

    cs.CL

    D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation

    Authors: Aida Mostafazadeh Davani, Mark Díaz, Dylan Baker, Vinodkumar Prabhakaran

    Abstract: While human annotations play a crucial role in language technologies, annotator subjectivity has long been overlooked in data collection. Recent studies that have critically examined this issue are often situated in the Western context, and solely document differences across age, gender, or racial groups. As a result, NLP research on subjectivity has overlooked the fact that individuals within de…

    Submitted 16 April, 2024; originally announced April 2024.

  18. arXiv:2404.03084

    cs.LG cs.AI cs.GT

    Rethinking Teacher-Student Curriculum Learning through the Cooperative Mechanics of Experience

    Authors: Manfred Diaz, Liam Paull, Andrea Tacchetti

    Abstract: Teacher-Student Curriculum Learning (TSCL) is a curriculum learning framework that draws inspiration from human cultural transmission and learning. It involves a teacher algorithm shaping the learning process of a learner algorithm by exposing it to controlled experiences. Despite its success, understanding the conditions under which TSCL is effective remains challenging. In this paper, we propose…

    Submitted 3 April, 2024; originally announced April 2024.

  19. arXiv:2402.06811

    cs.AI

    Discipline and Label: A WEIRD Genealogy and Social Theory of Data Annotation

    Authors: Andrew Smart, Ding Wang, Ellis Monk, Mark Díaz, Atoosa Kasirzadeh, Erin Van Liemt, Sonja Schmer-Galunder

    Abstract: Data annotation remains the sine qua non of machine learning and AI. Recent empirical work on data annotation has begun to highlight the importance of rater diversity for fairness, model performance, and new lines of research have begun to examine the working conditions for data annotation workers, the impacts and role of annotator subjectivity on labels, and the potential psychological harms from…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 18 pages

  20. Capturing waste collection planning expert knowledge in a fitness function through preference learning

    Authors: Laura Fernández Díaz, Miriam Fernández Díaz, José Ramón Quevedo, Elena Montañés

    Abstract: This paper copes with the COGERSA waste collection process. Up to now, experts have manually designed the process using a trial and error mechanism. This process is not globally optimized, since it has been progressively and locally built as council demands appear. Planning optimization algorithms usually solve it, but they need a fitness function to evaluate a route planning quality. The dra…

    Submitted 2 February, 2024; originally announced February 2024.

    Journal ref: Engineering Applications of Artificial Intelligence 2021 Volume 99 104113

  21. Static and Dynamic Synthesis of Bengali and Devanagari Signatures

    Authors: Miguel A. Ferrer, Sukalpa Chanda, Moises Diaz, Chayan Kr. Banerjee, Anirban Majumdar, Cristina Carmona-Duarte, Parikshit Acharya, Umapada Pal

    Abstract: Developing an automatic signature verification system is challenging and demands a large number of training samples. This is why synthetic handwriting generation is an emerging topic in document image analysis. Some handwriting synthesizers use the motor equivalence model, the well-established hypothesis from neuroscience, which analyses how a human being accomplishes movement. Specifically, a mot…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted version. Published on IEEE Transactions on Cybernetics [ISSN 2168-2267], v. 48(10), p. 2896-2907

    Journal ref: IEEE Transactions on Cybernetics, v. 48(10), p. 2896-2907, 2018

  22. Extending the kinematic theory of rapid movements with new primitives

    Authors: Miguel A. Ferrer, Moises Diaz, Jose J. Quintana, Cristina Carmona-Duarte

    Abstract: The Kinematic Theory of rapid movements, and its associated Sigma-Lognormal, model 2D spatiotemporal trajectories. It is constructed mainly as a temporal overlap of curves between virtual target points. Specifically, it uses an arc and a lognormal as primitives for the representation of the trajectory and velocity, respectively. This paper proposes developing this model, in what we call the Kinema…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted version: published on Pattern Recognition Letters [ISSN 0167-8655], v. 167, p. 181-188 (March 2023)

    Journal ref: Pattern Recognition Letters, 167, 181-188, 2023

  23. Synthesis of 3D on-air signatures with the Sigma-Lognormal model

    Authors: Miguel A. Ferrer, Moises Diaz, Cristina Carmona-Duarte, Jose J. Quintana Hernandez, Rejean Plamondon

    Abstract: Signature synthesis is a computation technique that generates artificial specimens which can support decision making in automatic signature verification. A lot of work has been dedicated to this subject, which centres on synthesizing dynamic and static two-dimensional handwriting on canvas. This paper proposes a framework to generate synthetic 3D on-air signatures exploiting the lognormality princ…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted Version. Published on Knowledge-Based Systems

    Journal ref: Knowledge-Based Systems, Vol. 265, 2023

  24. iDeLog: Iterative Dual Spatial and Kinematic Extraction of Sigma-Lognormal Parameters

    Authors: Miguel A. Ferrer, Moises Diaz, Cristina Carmona-Duarte, Rejean Plamondon

    Abstract: The Kinematic Theory of rapid movements and its associated Sigma-Lognormal model have been extensively used in a large variety of applications. While the physical and biological meaning of the model have been widely tested and validated for rapid movements, some shortcomings have been detected when it is used with continuous long and complex movements. To alleviate such drawbacks, and inspired by…

    Submitted 7 February, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

    Comments: Accepted Version published by Transactions on Pattern Analysis and Machine Intelligence

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(1), pp. 114-125, 2020

  25. The illusion of artificial inclusion

    Authors: William Agnew, A. Stevie Bergman, Jennifer Chien, Mark Díaz, Seliem El-Sayed, Jaylen Pittman, Shakir Mohamed, Kevin R. McKee

    Abstract: Human participants play a central role in the development of modern artificial intelligence (AI) technology, in psychological science, and in user research. Recent advances in generative AI have attracted growing interest to the possibility of replacing human participants in these domains with AI surrogates. We survey several such "substitution proposals" to better understand the arguments for and…

    Submitted 5 February, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI 2024)

  26. arXiv:2312.06861

    cs.CY cs.CL

    Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates

    Authors: Aida Davani, Mark Díaz, Dylan Baker, Vinodkumar Prabhakaran

    Abstract: Perception of offensiveness is inherently subjective, shaped by the lived experiences and socio-cultural values of the perceivers. Recent years have seen substantial efforts to build AI-based tools that can detect offensive language at scale, as a means to moderate social media platforms, and to ensure safety of conversational AI technologies such as ChatGPT and Bard. However, existing approaches…

    Submitted 11 December, 2023; originally announced December 2023.

  27. arXiv:2311.17259

    cs.LG cs.CY

    SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata

    Authors: Mark Díaz, Sunipa Dev, Emily Reif, Emily Denton, Vinodkumar Prabhakaran

    Abstract: The unstructured nature of data used in foundation model development is a challenge to systematic analyses for making data use and documentation decisions. From a Responsible AI perspective, these decisions often rely upon understanding how people are represented in data. We propose a framework designed to guide analysis of human representation in unstructured data and identify downstream risks. W…

    Submitted 1 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  28. arXiv:2311.14654

    hep-ph cs.LG

    JetLOV: Enhancing Jet Tree Tagging through Neural Network Learning of Optimal LundNet Variables

    Authors: Mauricio A. Diaz, Giorgio Cerro, Jacan Chaplais, Srinandan Dasmahapatra, Stefano Moretti

    Abstract: Machine learning has played a pivotal role in advancing physics, with deep learning notably contributing to solving complex classification problems such as jet tagging in the field of jet physics. In this experiment, we aim to harness the full potential of neural networks while acknowledging that, at times, we may lose sight of the underlying physics governing these models. Nevertheless, we demons…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: Accepted at the NeurIPS 2023 workshop: Machine Learning and the Physical Sciences

  29. arXiv:2311.05074

    cs.CL cs.AI

    GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives

    Authors: Vinodkumar Prabhakaran, Christopher Homan, Lora Aroyo, Aida Mostafazadeh Davani, Alicia Parrish, Alex Taylor, Mark Díaz, Ding Wang, Gregory Serapio-García

    Abstract: Human annotation plays a core role in machine learning -- annotations for supervised models, safety guardrails for generative models, and human feedback for reinforcement learning, to cite a few avenues. However, the fact that many of these human annotations are inherently subjective is often overlooked. Recent work has demonstrated that ignoring rater subjectivity (typically resulting in rater di…

    Submitted 13 June, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Presented as a long paper at NAACL 2024 main conference

    Journal ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics

  30. arXiv:2309.15432

    cs.PL

    ComPile: A Large IR Dataset from Production Sources

    Authors: Aiden Grossman, Ludger Paehler, Konstantinos Parasyris, Tal Ben-Nun, Jacob Hegna, William Moses, Jose M Monsalve Diaz, Mircea Trofin, Johannes Doerfert

    Abstract: Code is increasingly becoming a core data modality of modern machine learning research impacting not only the way we write code with conversational agents like OpenAI's ChatGPT, Google's Bard, or Anthropic's Claude, the way we translate code from one language into another, but also the compiler infrastructure underlying the language. While modeling approaches may vary and representations differ, t…

    Submitted 30 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

  31. arXiv:2307.11899

    cs.LG cs.DC cs.SE

    Project Florida: Federated Learning Made Easy

    Authors: Daniel Madrigal Diaz, Andre Manoel, Jialei Chen, Nalin Singal, Robert Sim

    Abstract: We present Project Florida, a system architecture and software development kit (SDK) enabling deployment of large-scale Federated Learning (FL) solutions across a heterogeneous device ecosystem. Federated learning is an approach to machine learning based on a strong data sovereignty principle, i.e., that privacy and security of data is best enabled by storing it at its origin, whether on end-user…

    Submitted 21 July, 2023; originally announced July 2023.

  32. arXiv:2306.11530

    cs.HC

    Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety

    Authors: Christopher M. Homan, Greg Serapio-Garcia, Lora Aroyo, Mark Diaz, Alicia Parrish, Vinodkumar Prabhakaran, Alex S. Taylor, Ding Wang

    Abstract: Conversational AI systems exhibit a level of human-like behavior that promises to have profound impacts on many aspects of daily life -- how people access information, create content, and seek social support. Yet these models have also shown a propensity for biases, offensive language, and conveying false information. Consequently, understanding and moderating safety risks in these models is a cri…

    Submitted 20 June, 2023; originally announced June 2023.

  33. arXiv:2306.11247

    cs.HC

    DICES Dataset: Diversity in Conversational AI Evaluation for Safety

    Authors: Lora Aroyo, Alex S. Taylor, Mark Diaz, Christopher M. Homan, Alicia Parrish, Greg Serapio-Garcia, Vinodkumar Prabhakaran, Ding Wang

    Abstract: Machine learning approaches often require training and evaluation datasets with a clear separation between positive and negative examples. This risks simplifying and even obscuring the inherent subjectivity present in many tasks. Preserving such variance in content and diversity in datasets is often expensive and laborious. This is especially troubling when building safety datasets for conversatio…

    Submitted 19 June, 2023; originally announced June 2023.

  34. arXiv:2306.06327

    cs.LG math.RT stat.ML

    Any-dimensional equivariant neural networks

    Authors: Eitan Levin, Mateo Díaz

    Abstract: Traditional supervised learning aims to learn an unknown mapping by fitting a function to a set of input-output pairs with a fixed dimension. The fitted function is then defined on inputs of the same dimension. However, in many settings, the unknown mapping takes inputs in any dimension; examples include graph parameters defined on graphs of any size and physics quantities defined on an arbitrary…

    Submitted 29 April, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 21 pages, 2 figures

    Journal ref: International Conference on Artificial Intelligence and Statistics. PMLR, 2024. Available from https://proceedings.mlr.press/v238/levin24a.html

  35. arXiv:2305.10403

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  36. arXiv:2305.09903

    cs.LG cs.CR cs.IT math.OC

    Privacy Loss of Noisy Stochastic Gradient Descent Might Converge Even for Non-Convex Losses

    Authors: Shahab Asoodeh, Mario Diaz

    Abstract: The Noisy-SGD algorithm is widely used for privately training machine learning models. Traditional privacy analyses of this algorithm assume that the internal state is publicly revealed, resulting in privacy loss bounds that increase indefinitely with the number of iterations. However, recent findings have shown that if the internal state remains hidden, then the privacy loss might remain bounded.…

    Submitted 16 May, 2023; originally announced May 2023.

  37. arXiv:2305.09011

    eess.IV cs.CV

    The Brain Tumor Segmentation (BraTS) Challenge 2023: Brain MR Image Synthesis for Tumor Segmentation (BraSyn)

    Authors: Hongwei Bran Li, Gian Marco Conte, Syed Muhammad Anwar, Florian Kofler, Ivan Ezhov, Koen van Leemput, Marie Piraud, Maria Diaz, Byrone Cole, Evan Calabrese, Jeff Rudie, Felix Meissen, Maruf Adewole, Anastasia Janas, Anahita Fathi Kazerooni, Dominic LaBella, Ahmed W. Moawad, Keyvan Farahani, James Eddy, Timothy Bergquist, Verena Chung, Russell Takeshi Shinohara, Farouk Dako, Walter Wiggins, Zachary Reitman, et al. (43 additional authors not shown)

    Abstract: Automated brain tumor segmentation methods have become well-established and reached performance levels offering clear clinical utility. These methods typically rely on four input magnetic resonance imaging (MRI) modalities: T1-weighted images with and without contrast enhancement, T2-weighted images, and FLAIR images. However, some sequences are often missing in clinical practice due to time const…

    Submitted 28 June, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: Technical report of BraSyn

  38. arXiv:2302.04337

    cs.LG

    (Re)Defining Expertise in Machine Learning Development

    Authors: Mark Díaz, Angela D. R. Smith

    Abstract: Domain experts are often engaged in the development of machine learning systems in a variety of ways, such as in data collection and evaluation of system performance. At the same time, who counts as an 'expert' and what constitutes 'expertise' is not always explicitly defined. In this project, we conduct a systematic literature review of machine learning research to understand 1) the bases on whic…

    Submitted 8 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2022 Workshop on Data-Centric AI, 2 pages

  39. arXiv:2302.00115

    cs.DC

    On Memory Codelets: Prefetching, Recoding, Moving and Streaming Data

    Authors: Dawson Fox, Jose Monsalve Diaz, Xiaoming Li

    Abstract: For decades, memory capabilities have scaled up much slower than compute capabilities, leaving memory utilization as a major bottleneck. Prefetching and cache hierarchies mitigate this in applications with easily predictable memory accesses or those with high locality. In other applications like sparse linear algebra or graph-based applications, these strategies do not achieve effective utilizatio…

    Submitted 31 January, 2023; originally announced February 2023.

  40. arXiv:2301.09406

    cs.HC

    The Reasonable Effectiveness of Diverse Evaluation Data

    Authors: Lora Aroyo, Mark Diaz, Christopher Homan, Vinodkumar Prabhakaran, Alex Taylor, Ding Wang

    Abstract: In this paper, we present findings from a semi-experimental exploration of rater diversity and its influence on safety annotations of conversations generated by humans talking to a generative AI-chat bot. We find significant differences in judgments produced by raters from different geographic regions and annotation platforms, and correlate these perspectives with demographic sub-groups. Our work…

    Submitted 23 January, 2023; originally announced January 2023.

    Comments: 5 pages

    Journal ref: 2022

  41. arXiv:2301.05560  [pdf, other]

    cs.SE cs.LG

    OpenTwins: An open-source framework for the design, development and integration of effective 3D-IoT-AI-powered digital twins

    Authors: Julia Robles, Cristian Martín, Manuel Díaz

    Abstract: Although digital twins have recently emerged as a clear alternative for reliable asset representations, most of the solutions and tools available for the development of digital twins are tailored to specific environments. Furthermore, achieving reliable digital twins often requires the orchestration of technologies and paradigms such as machine learning, the Internet of Things, and 3D visualizatio…

    Submitted 12 January, 2023; originally announced January 2023.

  42. Power to the People? Opportunities and Challenges for Participatory AI

    Authors: Abeba Birhane, William Isaac, Vinodkumar Prabhakaran, Mark Díaz, Madeleine Clare Elish, Iason Gabriel, Shakir Mohamed

    Abstract: Participatory approaches to artificial intelligence (AI) and machine learning (ML) are gaining momentum: the increased attention comes partly with the view that participation opens the gateway to an inclusive, equitable, robust, responsible and trustworthy AI. Among other benefits, participatory approaches are essential to understanding and adequately representing the needs, desires and perspective…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: To appear in the proceedings of EAAMO 2022

  43. arXiv:2209.06083  [pdf, other]

    cs.DC

    Chiplets and the Codelet Model

    Authors: Dawson Fox, Jose M Monsalve Diaz, Xiaoming Li

    Abstract: Recently, hardware technology has rapidly evolved pertaining to domain-specific applications/architectures. Soon, processors may be composed of a large collection of vendor-independent IP specialized for application-specific algorithms, resulting in extreme heterogeneity. However, integrating multiple vendors within the same die is difficult. Chiplet technology is a solution that integrates multip…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 11 pages, 4 figures, 2 tables

  44. arXiv:2207.13394  [pdf, other]

    cs.LG cs.CV

    BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection

    Authors: Daniel DeAlcala, Aythami Morales, Ruben Tolosana, Alejandro Acien, Julian Fierrez, Santiago Hernandez, Miguel A. Ferrer, Moises Diaz

    Abstract: This work proposes a data driven learning model for the synthesis of keystroke biometric data. The proposed method is compared with two statistical approaches based on Universal and User-dependent models. These approaches are validated on the bot detection task, using the keystroke synthetic data to improve the training process of keystroke-based bot detection systems. Our experimental framework c…

    Submitted 11 April, 2023; v1 submitted 27 July, 2022; originally announced July 2022.

    Comments: Paper accepted in IEEE Computer Society Workshop on Biometrics (CVPRw) 2023

  45. arXiv:2207.04173  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Approximation with Decision-Dependent Distributions: Asymptotic Normality and Optimality

    Authors: Joshua Cutler, Mateo Díaz, Dmitriy Drusvyatskiy

    Abstract: We analyze a stochastic approximation algorithm for decision-dependent problems, wherein the data distribution used by the algorithm evolves along the iterate sequence. The primary examples of such problems appear in performative prediction and its multiplayer extensions. We show that under mild assumptions, the deviation between the average iterate of the algorithm and the solution is asymptotica…

    Submitted 13 March, 2024; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: 49 pages, 1 figure. v2: revised asymptotic optimality results and reworked exposition. v3: minor updates

    MSC Class: 90C15; 90C25

    Journal ref: Journal of Machine Learning Research, 25(90):1-49, 2024

  46. CrowdWorkSheets: Accounting for Individual and Collective Identities Underlying Crowdsourced Dataset Annotation

    Authors: Mark Diaz, Ian D. Kivlichan, Rachel Rosen, Dylan K. Baker, Razvan Amironesei, Vinodkumar Prabhakaran, Emily Denton

    Abstract: Human annotated data plays a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into dataset annotation have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into ethical considerations around crowdsourced dataset annotation. We synthesize these in…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: 11 pages, Accepted at 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT). arXiv admin note: text overlap with arXiv:2112.04554

  47. arXiv:2204.02311  [pdf, other]

    cs.CL

    PaLM: Scaling Language Modeling with Pathways

    Authors: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, et al. (42 additional authors not shown)

    Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Tran…

    Submitted 5 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  48. arXiv:2203.13789  [pdf, other]

    cs.LG

    FLUTE: A Scalable, Extensible Framework for High-Performance Federated Learning Simulations

    Authors: Mirian Hipolito Garcia, Andre Manoel, Daniel Madrigal Diaz, Fatemehsadat Mireshghallah, Robert Sim, Dimitrios Dimitriadis

    Abstract: In this paper we introduce "Federated Learning Utilities and Tools for Experimentation" (FLUTE), a high-performance open-source platform for federated learning research and offline simulations. The goal of FLUTE is to enable rapid prototyping and simulation of new federated learning algorithms at scale, including novel optimization, privacy, and communications strategies. We describe the architect…

    Submitted 14 November, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: 14 Pages, 3 Figures, 11 Tables

  49. Handwriting Biometrics: Applications and Future Trends in e-Security and e-Health

    Authors: Marcos Faundez-Zanuy, Julian Fierrez, Miguel A. Ferrer, Moises Diaz, Ruben Tolosana, Réjean Plamondon

    Abstract: Background- This paper summarizes the state-of-the-art and applications based on online handwriting signals with special emphasis on e-security and e-health fields. Methods- In particular, we focus on the main achievements and challenges that should be addressed by the scientific community, providing a guide document for future research. Conclusions- Among all the points discussed in this article…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: 24 pages

    Journal ref: Cognitive Computation 12, 2020

  50. arXiv:2201.08239  [pdf, other]

    cs.CL cs.AI

    LaMDA: Language Models for Dialog Applications

    Authors: Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vincent Zhao, et al. (35 additional authors not shown)

    Abstract: We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvement on safety and factual grounding. We demonstrate that fine-tuning with annotat…

    Submitted 10 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.