Showing 1–50 of 183 results for author: Lee, N

  1. arXiv:2407.10542 [pdf, other]

    cs.CV cs.AI

    3D Geometric Shape Assembly via Efficient Point Cloud Matching

    Authors: Nahyuk Lee, Juhong Min, Junha Lee, Seungwook Kim, Kanghee Lee, Jaesik Park, Minsu Cho

    Abstract: Learning to assemble geometric shapes into a larger target structure is a pivotal task in various practical applications. In this work, we tackle this problem by establishing local correspondences between point clouds of part shapes in both coarse- and fine-levels. To this end, we introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matchin…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to ICML 2024

  2. arXiv:2407.10461 [pdf, ps, other]

    cs.IT

    Multibeam Satellite Communications with Massive MIMO: Asymptotic Performance Analysis and Design Insights

    Authors: Seyong Kim, Jinseok Choi, Wonjae Shin, Namyoon Lee, Jeonghun Park

    Abstract: To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed beams without relying on CSI, then selects a suitable user set for each beam. Building on this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by…

    Submitted 15 July, 2024; originally announced July 2024.

  3. arXiv:2407.09043 [pdf, other]

    cs.AI

    Molecule Language Model with Augmented Pairs and Expertise Transfer

    Authors: Namkyeong Lee, Siddhartha Laghuvarapu, Chanyoung Park, Jimeng Sun

    Abstract: Understanding molecules and their textual descriptions via molecule language models (MoLM) has recently attracted a surge of interest among researchers. However, unique challenges exist in the field of MoLM due to 1) a limited amount of molecule-text paired data and 2) missing expertise that occurred due to the specialized areas of focus among the experts. To this end, we propose AMOLE, which 1) augment…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: ACL 2024 Workshop on Languages and Molecule

  4. arXiv:2406.15524 [pdf, other]

    cs.CL cs.LG

    Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization

    Authors: Sungbin Shin, Wonpyo Park, Jaeho Lee, Namhoon Lee

    Abstract: This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). This practice follows a divide-and-conquer approach: split the model into submodels, sequentially prune them, and reconstruct predictions of the dense counterparts on small calibration data one at a time; the final model is obtained simply by putting the resulting sparse submodels together. While this…

    Submitted 21 June, 2024; originally announced June 2024.

  5. arXiv:2406.09948 [pdf, other]

    cs.CL

    BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages

    Authors: Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Perez-Almendros, Abinew Ali Ayele, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Hwaran Lee, Shamsuddeen Hassan Muhammad, Kiwoong Park, Anar Sabuhi Rzayev, Nina White, Seid Muhie Yimam, Mohammad Taher Pilehvar, Nedjma Ousidhoum, Jose Camacho-Collados, Alice Oh

    Abstract: Large language models (LLMs) often lack culture-specific knowledge of daily life, especially across diverse regions and non-English languages. Existing benchmarks for evaluating LLMs' cultural sensitivities are limited to a single language or collected from online sources such as Wikipedia, which do not reflect the mundane everyday lifestyles of diverse regions. That is, information about the food…

    Submitted 14 June, 2024; originally announced June 2024.

  6. arXiv:2406.06424 [pdf, other]

    cs.CV

    Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

    Authors: Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong

    Abstract: Modern alignment techniques based on human preferences, such as RLHF and DPO, typically employ divergence regularization relative to the reference model to ensure training stability. However, this often limits the flexibility of models during alignment, especially when there is a clear distributional discrepancy between the preference data and the reference model. In this paper, we focus on the al…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Preprint

  7. arXiv:2406.05761 [pdf, other]

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  8. arXiv:2404.14276 [pdf, other]

    stat.ML cs.LG

    A Bayesian Approach for Prioritising Driving Behaviour Investigations in Telematic Auto Insurance Policies

    Authors: Mark McLeod, Bernardo Perez-Orozco, Nika Lee, Davide Zilli

    Abstract: Automotive insurers increasingly have access to telematic information via black-box recorders installed in the insured vehicle, and wish to identify undesirable behaviour which may signify increased risk or uninsured activities. However, identification of such behaviour with machine learning is non-trivial, and results are far from perfect, requiring human investigation to verify suspected cases.…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: International Congress of Actuaries (2023)

  9. arXiv:2404.01954 [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  10. arXiv:2403.18932 [pdf, other]

    cs.CL cs.AI

    Measuring Political Bias in Large Language Models: What Is Said and How It Is Said

    Authors: Yejin Bang, Delong Chen, Nayeon Lee, Pascale Fung

    Abstract: We propose to measure political bias in LLMs by analyzing both the content and style of their generated content regarding political issues. Existing benchmarks and measures focus on gender and racial biases. However, political bias exists in LLMs and can lead to polarization and other harms in downstream applications. In order to provide transparency to users, we advocate that there should be fine…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 16 pages

  11. arXiv:2403.16372 [pdf, other]

    cs.LG cs.DC eess.SP

    SignSGD with Federated Voting

    Authors: Chanho Park, H. Vincent Poor, Namyoon Lee

    Abstract: Distributed learning is commonly used for accelerating model training by harnessing the computational capabilities of multiple edge devices. However, in practical applications, the communication delay emerges as a bottleneck due to the substantial information exchange required between workers and a central parameter server. SignSGD with majority voting (signSGD-MV) is an effective distributed lear…

    Submitted 24 March, 2024; originally announced March 2024.

  12. arXiv:2403.15692 [pdf, other]

    cs.IT eess.SP

    Block Orthogonal Sparse Superposition Codes for $\mathsf{L}^3$ Communications: Low Error Rate, Low Latency, and Low Power Consumption

    Authors: Donghwa Han, Bowhyung Lee, Min Jang, Donghun Lee, Seho Myung, Namyoon Lee

    Abstract: Block orthogonal sparse superposition (BOSS) code is a class of joint coded modulation methods, which can closely achieve the finite-blocklength capacity with a low-complexity decoder at a few coding rates under Gaussian channels. However, for fading channels, the code performance degrades considerably because coded symbols experience different channel fading effects. In this paper, we put forth n…

    Submitted 22 March, 2024; originally announced March 2024.

  13. arXiv:2403.15042 [pdf, other]

    cs.CL

    LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

    Authors: Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipalli, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

    Abstract: Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. While many real-world applications still require fine-tuning to reach satisfactory levels of performance, many of them are in the low-data regime, making fine-tuning challenging. To address this, we propose LLM2LLM, a targeted and iterative data augmentation st…

    Submitted 13 July, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: ACL 2024

  14. arXiv:2403.11762 [pdf, other]

    cs.IT eess.SP

    Full-Duplex MU-MIMO Systems with Coarse Quantization: How Many Bits Do We Need?

    Authors: Seunghyeong Yoo, Seokjun Park, Mintaek Oh, Namyoon Lee, Jinseok Choi

    Abstract: This paper investigates full-duplex (FD) multi-user multiple-input multiple-output (MU-MIMO) system design with coarse quantization. We first analyze the impact of self-interference (SI) on quantization in FD single-input single-output systems. The analysis elucidates that the minimum required number of analog-to-digital converter (ADC) bits is logarithmically proportional to the ratio of total re…

    Submitted 18 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  15. arXiv:2403.07821 [pdf, other]

    cs.SE

    Augmenting Interpolation-Based Model Checking with Auxiliary Invariants (Extended Version)

    Authors: Dirk Beyer, Po-Chun Chien, Nian-Ze Lee

    Abstract: Software model checking is a challenging problem, and generating relevant invariants is a key factor in proving the safety properties of a program. Program invariants can be obtained by various approaches, including lightweight procedures based on data-flow analysis and intensive techniques using Craig interpolation. Although data-flow analysis runs efficiently, it often produces invariants that a…

    Submitted 12 March, 2024; originally announced March 2024.

  16. arXiv:2403.07691 [pdf, other]

    cs.CL cs.AI

    ORPO: Monolithic Preference Optimization without Reference Model

    Authors: Jiwoo Hong, Noah Lee, James Thorne

    Abstract: While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building…

    Submitted 14 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Preprint

  17. arXiv:2402.09155 [pdf, ps, other]

    eess.SP cs.IT

    Joint and Robust Beamforming Framework for Integrated Sensing and Communication Systems

    Authors: Jinseok Choi, Jeonghun Park, Namyoon Lee, Ahmed Alkhateeb

    Abstract: Integrated sensing and communication (ISAC) is widely recognized as a fundamental enabler for future wireless communications. In this paper, we present a joint communication and radar beamforming framework for maximizing a sum spectral efficiency (SE) while guaranteeing desired radar performance with imperfect channel state information (CSI) in multi-user and multi-target ISAC systems. To this end…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: submitted for possible IEEE publication

  18. arXiv:2402.07381 [pdf, other]

    cs.IT

    RIS-Empowered LEO Satellite Networks for 6G: Promising Usage Scenarios and Future Directions

    Authors: Mesut Toka, Byungju Lee, Jaehyup Seong, Aryan Kaushik, Juhwan Lee, Jungwoo Lee, Namyoon Lee, Wonjae Shin, H. Vincent Poor

    Abstract: Low-Earth orbit (LEO) satellite systems have been deemed a promising key enabler for current 5G and the forthcoming 6G wireless networks. Such LEO satellite constellations can provide worldwide three-dimensional coverage, high data rate, and scalability, thus enabling truly ubiquitous connectivity. On the other hand, another promising technology, reconfigurable intelligent surfaces (RISs), has eme…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: 18 pages, 5 figures, Paper accepted by IEEE Communications Magazine

  19. arXiv:2402.04248 [pdf, other]

    cs.LG

    Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

    Authors: Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos

    Abstract: State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention. Although SSMs exhibit competitive performance, their in-context learning (ICL) capabilities, a remarkable emergent property of mo…

    Submitted 25 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Changes in v2: experiments on formal language ICL and explorations of width vs. depth on ICL; code repo available (24 pages, 10 figures)

  20. arXiv:2402.01340 [pdf, ps, other]

    cs.LG cs.CR eess.SP

    SignSGD with Federated Defense: Harnessing Adversarial Attacks through Gradient Sign Decoding

    Authors: Chanho Park, Namyoon Lee

    Abstract: Distributed learning is an effective approach to accelerate model training using multiple workers. However, substantial communication delays emerge between workers and a parameter server due to massive costs associated with communicating gradients. SignSGD with majority voting (signSGD-MV) is a simple yet effective optimizer that reduces communication costs through one-bit quantization, yet the co…

    Submitted 2 February, 2024; originally announced February 2024.

  21. arXiv:2401.05193 [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Experiment Planning with Function Approximation

    Authors: Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill

    Abstract: We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where there is a significant overhead to deploying adaptive algorithms -- for example, when the execution of the data collection policies is required to be distributed, or a human in the loop is needed to implement these policies -- producing in advance a set of policies for data coll…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 10 pages main

  22. arXiv:2312.13289 [pdf, other]

    cond-mat.mtrl-sci cs.LG

    Stoichiometry Representation Learning with Polymorphic Crystal Structures

    Authors: Namkyeong Lee, Heewoong Noh, Gyoung S. Na, Tianfan Fu, Jimeng Sun, Chanyoung Park

    Abstract: Despite the recent success of machine learning (ML) in materials science, its success heavily relies on the structural description of crystals, which is itself computationally demanding and occasionally unattainable. Stoichiometry descriptors can be an alternative approach, which reveals the ratio between elements involved to form a certain compound without any structural information. However, it i…

    Submitted 17 November, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 AI4Science Workshop

  23. arXiv:2312.04511 [pdf, other]

    cs.CL

    An LLM Compiler for Parallel Function Calling

    Authors: Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

    Abstract: The reasoning capabilities of recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for function calling oft…

    Submitted 4 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: ICML 2024

  24. arXiv:2312.03901 [pdf, other]

    cs.CY

    Redrawing the 2012 map of the Maryland congressional districts

    Authors: Noah Lee, Hyunwoo Park, Sangho Shim

    Abstract: Gerrymandering is the practice of drawing biased electoral maps that manipulate the voter population to gain an advantage. The most recent time gerrymandering became an issue was in 2019, when the U.S. Supreme Court decided that the court does not have the authority to dictate how to draw the district map and state legislators are the ones who should come up with an electoral district plan. We…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 8 pages, to be submitted to IISE 2024 Annual Conference Proceedings

    MSC Class: 90

  25. arXiv:2311.18172 [pdf, other]

    cs.IT eess.SP

    Multi-Rate Variable-Length CSI Compression for FDD Massive MIMO

    Authors: Bumsu Park, Heedong Do, Namyoon Lee

    Abstract: For frequency-division-duplexing (FDD) systems, channel state information (CSI) should be fed back from the user terminal to the base station. This feedback overhead becomes problematic as the number of antennas grows. To alleviate this issue, we propose a flexible CSI compression method using variational autoencoder (VAE) with an entropy bottleneck structure, which can support multi-rate and vari…

    Submitted 29 November, 2023; originally announced November 2023.

  26. arXiv:2311.17539 [pdf, other]

    cs.LG math.OC stat.ML

    Critical Influence of Overparameterization on Sharpness-aware Minimization

    Authors: Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee

    Abstract: Training an overparameterized neural network can yield minimizers of different generalization capabilities despite the same level of training loss. Meanwhile, with evidence that suggests a strong correlation between the sharpness of minima and their generalization errors, increasing efforts have been made to develop optimization methods to explicitly find flat minima as more generalizable solution…

    Submitted 19 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  27. arXiv:2311.12856 [pdf, other]

    cond-mat.mtrl-sci cs.AI cs.LG

    Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer

    Authors: Namkyeong Lee, Heewoong Noh, Sungwon Kim, Dongmin Hyun, Gyoung S. Na, Chanyoung Park

    Abstract: The density of states (DOS) is a spectral property of crystalline materials, which provides fundamental insights into various characteristics of the materials. While previous works mainly focus on obtaining high-quality representations of crystalline materials for DOS prediction, we focus on predicting the DOS from the obtained representations by reflecting the nature of DOS: DOS determines the ge…

    Submitted 22 November, 2023; v1 submitted 24 October, 2023; originally announced November 2023.

    Comments: NeurIPS 2023. arXiv admin note: text overlap with arXiv:2303.07000

  28. ezBIDS: Guided standardization of neuroimaging data interoperable with major data archives and platforms

    Authors: Daniel Levitas, Soichi Hayashi, Sophia Vinci-Booher, Anibal Heinsfeld, Dheeraj Bhatia, Nicholas Lee, Anthony Galassi, Guiomar Niso, Franco Pestilli

    Abstract: Data standardization has become one of the leading methods neuroimaging researchers rely on for data sharing and reproducibility. Data standardization promotes a common framework through which researchers can utilize others' data. Yet, as of today, formatting datasets that adhere to community best practices requires technical expertise involving coding and considerable knowledge of file formats an…

    Submitted 1 November, 2023; originally announced November 2023.

  29. arXiv:2311.03285 [pdf, other]

    cs.LG cs.AI cs.DC

    S-LoRA: Serving Thousands of Concurrent LoRA Adapters

    Authors: Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

    Abstract: The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in a substantial collection of LoRA adapters derived from one base model. We observe that this paradigm presents significant opportunities for batched in…

    Submitted 5 June, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  30. arXiv:2311.02236 [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Robust Fine-Tuning of Vision-Language Models for Domain Generalization

    Authors: Kevin Vogt-Lowell, Noah Lee, Theodoros Tsiligkaridis, Marc Vaillant

    Abstract: Transfer learning enables the sharing of common knowledge among models for a variety of downstream tasks, but traditional methods suffer in limited training data settings and produce narrow models incapable of effectively generalizing under distribution shifts. Foundation models have recently demonstrated impressive zero-shot inference capabilities and robustness under distribution shifts. However…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: In proceedings of the 27th IEEE High Performance Extreme Computing Conference

  31. arXiv:2311.01817 [pdf, other]

    cs.CL

    Mitigating Framing Bias with Polarity Minimization Loss

    Authors: Yejin Bang, Nayeon Lee, Pascale Fung

    Abstract: Framing bias plays a significant role in exacerbating political polarization by distorting the perception of actual events. Media outlets with divergent political stances often use polarized language in their reporting of the same event. We propose a new loss function that encourages the model to minimize the polarity difference between the polarized input articles to reduce framing bias. Specific…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: 11 pages, EMNLP2023

  32. arXiv:2310.07101 [pdf, other]

    cs.IT eess.SP

    Hybrid Arrays: How Many RF Chains Are Required to Prevent Beam Squint?

    Authors: Heedong Do, Namyoon Lee, Robert W. Heath Jr, Angel Lozano

    Abstract: With increasing frequencies, bandwidths, and array apertures, the phenomenon of beam squint arises as a serious impairment to beamforming. Fully digital arrays with true time delay per antenna element are a potential solution, but they require downconversion at each element. This paper shows that hybrid arrays can perform essentially as well as digital arrays once the number of radio-frequency cha…

    Submitted 10 October, 2023; originally announced October 2023.

  33. arXiv:2310.06271 [pdf, other]

    cs.CL cs.AI

    Towards Mitigating Hallucination in Large Language Models via Self-Reflection

    Authors: Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, Pascale Fung

    Abstract: Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks. However, the practical deployment still faces challenges, notably the issue of "hallucination", where models generate plausible-sounding but unfaithful or nonsensical information. This issue becomes particularly critical in the medical domain due to the uncommon pro…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Accepted by the findings of EMNLP 2023

  34. How Helpful do Novice Programmers Find the Feedback of an Automated Repair Tool?

    Authors: Oka Kurniawan, Christopher M. Poskitt, Ismam Al Hoque, Norman Tiong Seng Lee, Cyrille Jégourel, Nachamma Sockalingam

    Abstract: Immediate feedback has been shown to improve student learning. In programming courses, immediate, automated feedback is typically provided in the form of pre-defined test cases run by a submission platform. While these are excellent for highlighting the presence of logical errors, they do not provide novice programmers enough scaffolding to help them identify where an error is or how to fix it. To…

    Submitted 7 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Experience report accepted by the International Conference on Teaching, Assessment, and Learning for Engineering (TALE'23)

    Journal ref: Proc. TALE'23. IEEE, 2023

  35. arXiv:2309.14381 [pdf, other]

    cs.CL cs.AI

    Survey of Social Bias in Vision-Language Models

    Authors: Nayeon Lee, Yejin Bang, Holy Lovenia, Samuel Cahyawijaya, Wenliang Dai, Pascale Fung

    Abstract: In recent years, the rapid advancement of machine learning (ML) models, particularly transformer-based pre-trained models, has revolutionized Natural Language Processing (NLP) and Computer Vision (CV) fields. However, researchers have discovered that these models can inadvertently capture and reinforce social biases present in their training datasets, leading to potential social harms, such as une…

    Submitted 24 September, 2023; originally announced September 2023.

  36. arXiv:2309.01150 [pdf, other]

    cs.LG cs.AI

    FedFwd: Federated Learning without Backpropagation

    Authors: Seonghwan Park, Dahun Shin, Jinseok Chung, Namhoon Lee

    Abstract: In federated learning (FL), clients with limited resources can disrupt the training efficiency. A potential solution to this problem is to leverage a new learning procedure that does not rely on backpropagation (BP). We present a novel approach to FL called FedFwd that employs a recent BP-free method by Hinton (2022), namely the Forward Forward algorithm, in the local training process. FedFwd can…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: ICML 2023 Workshop (Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities)

  37. arXiv:2308.16705 [pdf, other]

    cs.CL cs.AI

    Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis

    Authors: Nayeon Lee, Chani Jung, Junho Myung, Jiho Jin, Jose Camacho-Collados, Juho Kim, Alice Oh

    Abstract: Warning: this paper contains content that may be offensive or upsetting. Most hate speech datasets neglect the cultural diversity within a single language, resulting in a critical shortcoming in hate speech detection. To address this, we introduce CREHate, a CRoss-cultural English Hate speech dataset. To construct CREHate, we follow a two-step procedure: 1) cultural post collection and 2) cross-…

    Submitted 3 April, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted to NAACL 2024 Main Conference

  38. arXiv:2308.11189 [pdf, other]

    cs.CL cs.AI cs.LG

    Diversity Measures: Domain-Independent Proxies for Failure in Language Model Queries

    Authors: Noel Ngu, Nathaniel Lee, Paulo Shakarian

    Abstract: Error prediction in large language models often relies on domain-specific information. In this paper, we present measures for quantification of error in the response of a large language model based on the diversity of responses to a given prompt - hence independent of the underlying application. We describe how three such measures - based on entropy, Gini impurity, and centroid distance - can be e…

    Submitted 22 August, 2023; originally announced August 2023.

    Report number: Accepted to IEEE ICSC '24

  39. arXiv:2308.07317 [pdf, other]

    cs.CL

    Platypus: Quick, Cheap, and Powerful Refinement of LLMs

    Authors: Ariel N. Lee, Cole J. Hunter, Nataniel Ruiz

    Abstract: We present Platypus, a family of fine-tuned and merged Large Language Models (LLMs) that achieves the strongest performance and currently stands in first place on HuggingFace's Open LLM Leaderboard as of the release date of this work. In this work we describe (1) our curated dataset Open-Platypus, that is a subset of other open datasets and which…

    Submitted 14 March, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Workshop on Instruction Tuning and Instruction Following at NeurIPS 2023

  40. arXiv:2308.04058 [pdf, ps, other]

    eess.SP cs.IT

    Finding Globally Optimal Configuration of Active RIS in Linear Time

    Authors: Heedong Do, Namyoon Lee

    Abstract: This paper presents an algorithm for finding the optimal configuration of active reconfigurable intelligent surface (RIS) when both transmitter and receiver are equipped with a single antenna each. The resultant configuration is globally optimal and it takes linear time for the computation. Moreover, there is a closed-form expression for the optimal configuration when the direct link vanishes, whi…

    Submitted 8 August, 2023; originally announced August 2023.

  41. arXiv:2308.03004 [pdf, other]

    cs.IT cs.LG

    Deep Polar Codes

    Authors: Geon Choi, Namyoon Lee

    Abstract: In this paper, we introduce a novel class of pre-transformed polar codes, termed deep polar codes. We first present a deep polar encoder that harnesses a series of multi-layered polar transformations with varying sizes. Our approach to encoding enables a low-complexity implementation while significantly enhancing the weight distribution of the code. Moreover, our encoding method offers flexibil…

    Submitted 5 August, 2023; originally announced August 2023.

  42. arXiv:2307.16778 [pdf, other]

    cs.CL cs.AI

    KoBBQ: Korean Bias Benchmark for Question Answering

    Authors: Jiho Jin, Jiseon Kim, Nayeon Lee, Haneul Yoo, Alice Oh, Hwaran Lee

    Abstract: The Bias Benchmark for Question Answering (BBQ) is designed to evaluate social biases of language models (LMs), but it is not simple to adapt this benchmark to cultural contexts other than the US because social biases depend heavily on the cultural context. In this paper, we present KoBBQ, a Korean bias benchmark dataset, and we propose a general framework that addresses considerations for cultura…

    Submitted 25 January, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: TACL 2024 (pre-MIT Press publication version)

  43. arXiv:2307.03381  [pdf, other]

    cs.LG

    Teaching Arithmetic to Small Transformers

    Authors: Nayoung Lee, Kartik Sreenivasan, Jason D. Lee, Kangwook Lee, Dimitris Papailiopoulos

    Abstract: Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction objective. This study investigates how small transformers, trained from random initialization, can efficiently learn arithmetic operations such as add…

    Submitted 7 July, 2023; originally announced July 2023.

  44. arXiv:2307.03343  [pdf, other]

    cs.IT eess.SP eess.SY

    Unified Modeling and Rate Coverage Analysis for Satellite-Terrestrial Integrated Networks: Coverage Extension or Data Offloading?

    Authors: Jeonghun Park, Jinseok Choi, Namyoon Lee, François Baccelli

    Abstract: With the growing interest in satellite networks, satellite-terrestrial integrated networks (STINs) have gained significant attention because of their potential benefits. However, due to the lack of a tractable network model for the STIN architecture, analytical studies allowing one to investigate the performance of such networks are not yet available. In this work, we propose a unified network mod…

    Submitted 3 February, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: submitted to IEEE journal

  45. arXiv:2306.17848  [pdf, other]

    cs.CV

    Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

    Authors: Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz

    Abstract: Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting propert…

    Submitted 30 June, 2023; originally announced June 2023.

  46. arXiv:2306.14892  [pdf, other]

    cs.LG cs.AI

    Supervised Pretraining Can Learn In-Context Reinforcement Learning

    Authors: Jonathan N. Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill

    Abstract: Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities of transformers in decision-making problems, i.e., reinforcement learning (RL) for bandits and Markov decision processes. To do so, we introduce…

    Submitted 26 June, 2023; originally announced June 2023.

  47. arXiv:2306.12978  [pdf, other]

    cs.IT eess.SP

    Rate-Splitting Multiple Access for 6G Networks: Ten Promising Scenarios and Applications

    Authors: Jeonghun Park, Byungju Lee, Jinseok Choi, Hoon Lee, Namyoon Lee, Seok-Hwan Park, Kyoung-Jae Lee, Junil Choi, Sung Ho Chae, Sang-Woon Jeon, Kyung Sup Kwak, Bruno Clerckx, Wonjae Shin

    Abstract: In the upcoming 6G era, multiple access (MA) will play an essential role in achieving high throughput performances required in a wide range of wireless applications. Since MA and interference management are closely related issues, the conventional MA techniques are limited in that they cannot provide near-optimal performance in universal interference regimes. Recently, rate-splitting multiple acce…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 17 pages, 6 figures, submitted to IEEE Network Magazine

  48. arXiv:2306.08997

    cs.CL cs.AI cs.LG

    Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

    Authors: Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando Solar-Lezama, Iddo Drori

    Abstract: We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that…

    Submitted 24 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Did not receive permission to release the data or model fine-tuned on the data

  49. arXiv:2306.01792  [pdf, other]

    cs.IR cs.AI cs.LG

    Task Relation-aware Continual User Representation Learning

    Authors: Sein Kim, Namkyeong Lee, Donghyun Kim, Minchul Yang, Chanyoung Park

    Abstract: User modeling, which learns to represent users into a low-dimensional representation space based on their past behaviors, got a surge of interest from the industry for providing personalized services to users. Previous efforts in user modeling mainly focus on learning a task-specific user representation that is designed for a single task. However, since learning task-specific user representations…

    Submitted 23 August, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: KDD 2023

  50. Task-Equivariant Graph Few-shot Learning

    Authors: Sungwon Kim, Junseok Lee, Namkyeong Lee, Wonjoong Kim, Seungyoon Choi, Chanyoung Park

    Abstract: Although Graph Neural Networks (GNNs) have been successful in node classification tasks, their performance heavily relies on the availability of a sufficient number of labeled nodes per class. In real-world situations, not all classes have many labeled nodes and there may be instances where the model needs to classify new classes, making manual labeling difficult. To solve this problem, it is impo…

    Submitted 24 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: KDD 2023