subscribe to arXiv mailings

Building a Verifiable Logical Clock for P2P Networks

Authors: Guangda Sun, Tianyang Tao, Yanpei Guo, Michael Yiqing Hu, Jialin Li

Abstract: Logical clocks are a fundamental tool to establish causal ordering of events in a distributed system. They have been applied in weakly consistent storage systems, causally ordered broadcast, distributed snapshots, deadlock detection, and distributed system debugging. However, prior logical clock constructs fail to work in an open network with Byzantine participants. In this work, we present Chrono… ▽ More Logical clocks are a fundamental tool to establish causal ordering of events in a distributed system. They have been applied in weakly consistent storage systems, causally ordered broadcast, distributed snapshots, deadlock detection, and distributed system debugging. However, prior logical clock constructs fail to work in an open network with Byzantine participants. In this work, we present Chrono, a novel logical clock system that targets such challenging environment. We first redefine causality properties among distributed processes under the Byzantine failure model. To enforce these properties, Chrono defines a new validator abstraction for building fault-tolerant logical clocks. Furthermore, our validator abstraction is customizable: Chrono includes multiple backend implementations for the abstraction, each with different security-performance trade-offs. We have applied Chrono to build two decentralized applications, a mutual exclusive service and a weakly consistent key-value store. Chrono adds only marginal overhead compared to systems that tolerate no Byzantine faults. It also out-performs state-of-the-art BFT total order protocols by significant margins. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2404.06214 [pdf, other]

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Authors: Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

Abstract: After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-… ▽ More After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques. Second, we are relaxing the rules around pretraining data, and will now allow participants to construct their own datasets provided they stay within the 100M-word or 10M-word budget. Third, we introduce a multimodal vision-and-language track, and will release a corpus of 50% text-only and 50% image-text multimodal data as a starting point for LM model training. The purpose of this CfP is to provide rules for this year's challenge, explain these rule changes and their rationale in greater detail, give a timeline of this year's competition, and provide answers to frequently asked questions from last year's challenge. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2402.03618 [pdf, other]

Comparing Abstraction in Humans and Large Language Models Using Multimodal Serial Reproduction

Authors: Sreejan Kumar, Raja Marjieh, Byron Zhang, Declan Campbell, Michael Y. Hu, Umang Bhatt, Brenden Lake, Thomas L. Griffiths

Abstract: Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction allows us to study how people construe the world through a paradigm similar to the game of telephone, where one person observes a stimulus and reproduces it for the next to form a chain of reproductions. Past serial reproduction experiments typically employ a single sensory modality, but humans often commu… ▽ More Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction allows us to study how people construe the world through a paradigm similar to the game of telephone, where one person observes a stimulus and reproduces it for the next to form a chain of reproductions. Past serial reproduction experiments typically employ a single sensory modality, but humans often communicate abstractions of the world to each other through language. To investigate the effect language on the formation of abstractions, we implement a novel multimodal serial reproduction framework by asking people who receive a visual stimulus to reproduce it in a linguistic format, and vice versa. We ran unimodal and multimodal chains with both humans and GPT-4 and find that adding language as a modality has a larger effect on human reproductions than GPT-4's. This suggests human visual and linguistic representations are more dissociable than those of GPT-4. △ Less

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2308.09543 [pdf, other]

Latent State Models of Training Dynamics

Authors: Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho

Abstract: The impact of randomness on model training is poorly understood. How do differences in data order and initialization actually manifest in the model, such that some training runs outperform others or converge faster? Furthermore, how can we interpret the resulting training dynamics and the phase transitions that characterize different trajectories? To understand the effect of randomness on the dyna… ▽ More The impact of randomness on model training is poorly understood. How do differences in data order and initialization actually manifest in the model, such that some training runs outperform others or converge faster? Furthermore, how can we interpret the resulting training dynamics and the phase transitions that characterize different trajectories? To understand the effect of randomness on the dynamics and outcomes of neural network training, we train models multiple times with different random seeds and compute a variety of metrics throughout training, such as the $L_2$ norm, mean, and variance of the neural network's weights. We then fit a hidden Markov model (HMM) over the resulting sequences of metrics. The HMM represents training as a stochastic process of transitions between latent states, providing an intuitive overview of significant changes during training. Using our method, we produce a low-dimensional, discrete representation of training dynamics on grokking tasks, image classification, and masked language modeling. We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence. △ Less

Submitted 19 January, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: Accepted at TMLR 2023. Updated Jan 19, 2024 with erratum

arXiv:2205.11558 [pdf, other]

Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines

Authors: Sreejan Kumar, Carlos G. Correa, Ishita Dasgupta, Raja Marjieh, Michael Y. Hu, Robert D. Hawkins, Nathaniel D. Daw, Jonathan D. Cohen, Karthik Narasimhan, Thomas L. Griffiths

Abstract: Strong inductive biases give humans the ability to quickly learn to perform a variety of tasks. Although meta-learning is a method to endow neural networks with useful inductive biases, agents trained by meta-learning may sometimes acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and programs… ▽ More Strong inductive biases give humans the ability to quickly learn to perform a variety of tasks. Although meta-learning is a method to endow neural networks with useful inductive biases, agents trained by meta-learning may sometimes acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and programs induced to generate such tasks guides them toward more human-like inductive biases. Human-generated language descriptions and program induction models that add new learned primitives both contain abstract concepts that can compress description length. Co-training on these representations result in more human-like behavior in downstream meta-reinforcement learning agents than less abstract controls (synthetic language descriptions, program induction without learned primitives), suggesting that the abstraction supported by these representations is key. △ Less

Submitted 5 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), winner of Outstanding Paper Award

Showing 1–5 of 5 results for author: Hu, M Y