Showing 1–5 of 5 results for author: Earnshaw, B

  1. arXiv:2404.10242  [pdf, other]

    cs.CV cs.AI cs.LG

    Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology

    Authors: Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Dominique Beaini, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw

    Abstract: Featurizing microscopy images for use in biological research remains a significant challenge, especially for large-scale experiments spanning millions of images. This work explores the scaling properties of weakly supervised classifiers and self-supervised masked autoencoders (MAEs) when training with increasingly larger model backbones and microscopy datasets. Our results show that ViT-based MAEs…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Highlight. arXiv admin note: text overlap with arXiv:2309.16064

  2. arXiv:2309.16064  [pdf, other]

    cs.CV cs.AI cs.LG

    Masked Autoencoders are Scalable Learners of Cellular Morphology

    Authors: Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw

    Abstract: Inferring biological relationships from cellular phenotypes in high-content microscopy screens provides significant opportunity and challenge in biological research. Prior results have shown that deep vision models can capture biological signal better than hand-crafted features. This work explores how self-supervised deep learning approaches scale when training larger models on larger microscopy d…

    Submitted 27 November, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Spotlight at NeurIPS 2023 Generative AI and Biology (GenBio) Workshop

  3. arXiv:2301.05768  [pdf, other]

    cs.CV

    RxRx1: A Dataset for Evaluating Experimental Batch Correction Methods

    Authors: Maciej Sypetkowski, Morteza Rezanejad, Saber Saberian, Oren Kraus, John Urbanik, James Taylor, Ben Mabey, Mason Victors, Jason Yosinski, Alborz Rezazadeh Sereshkeh, Imran Haque, Berton Earnshaw

    Abstract: High-throughput screening techniques are commonly used to obtain large quantities of data in many fields of biology. It is well known that artifacts arising from variability in the technical execution of different experimental batches within such screens confound these observations and can lead to invalid biological conclusions. It is therefore necessary to account for these batch effects when ana…

    Submitted 13 January, 2023; originally announced January 2023.

  4. arXiv:2211.02657  [pdf, other]

    q-bio.QM cs.LG q-bio.BM

    MolE: a molecular foundation model for drug discovery

    Authors: Oscar Méndez-Lucio, Christos Nicolaou, Berton Earnshaw

    Abstract: Models that accurately predict properties based on chemical structure are valuable tools in drug discovery. However, for many properties, public and private training sets are typically small, and it is difficult for the models to generalize well outside of the training data. Recently, large language models have addressed this problem by using self-supervised pretraining on large unlabeled datasets…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Accepted at Learning Meaningful Representations of Life workshop, NeurIPS 2022

  5. arXiv:2012.07421  [pdf, other]

    cs.LG

    WILDS: A Benchmark of in-the-Wild Distribution Shifts

    Authors: Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang

    Abstract: Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in the real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchma…

    Submitted 16 July, 2021; v1 submitted 14 December, 2020; originally announced December 2020.