Showing 1–50 of 65 results for author: Zhou, E

  1. arXiv:2407.06153  [pdf, other]

    cs.SE cs.CL

    What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

    Authors: Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Weikang Zhou, Muling Wu, Mingxu Chai, Jessica Fan, Caishuang Huang, Yunbo Tao, Yan Liu, Enyu Zhou, Ming Zhang, Yuhao Zhou, Yueming Wu, Rui Zheng, Ming Wen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang

    Abstract: The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundar…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 17 pages, 7 figures

  2. arXiv:2407.03750  [pdf, other]

    cs.DB

    GriDB: Scaling Blockchain Database via Sharding and Off-Chain Cross-Shard Mechanism

    Authors: Zicong Hong, Song Guo, Enyuan Zhou, Wuhui Chen, Huawei Huang, Albert Zomaya

    Abstract: Blockchain databases have attracted widespread attention but suffer from poor scalability due to underlying non-scalable blockchains. While blockchain sharding is necessary for a scalable blockchain database, it poses a new challenge named on-chain cross-shard database services. Each cross-shard database service (e.g., cross-shard queries or inter-shard load balancing) involves massive cross-shard…

    Submitted 4 July, 2024; originally announced July 2024.

  3. arXiv:2406.18118  [pdf, other]

    cs.CR cs.CL

    SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

    Authors: Caishuang Huang, Wanxu Zhao, Rui Zheng, Huijie Lv, Shihan Dou, Sixian Li, Xiao Wang, Enyu Zhou, Junjie Ye, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks (i.e., efforts to bypass security protocols) often suffer from limited adaptability, restricted general capability, and high cost. To address these challenges, w…

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.17642  [pdf, other]

    cs.CL cs.AI

    Banishing LLM Hallucinations Requires Rethinking Generalization

    Authors: Johnny Li, Saksham Consul, Eda Zhou, James Wong, Naila Farooqui, Yuxin Ye, Nithyashree Manohar, Zhuxiaona Wei, Tian Wu, Ben Echols, Sharon Zhou, Gregory Diamos

    Abstract: Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional a…

    Submitted 25 June, 2024; originally announced June 2024.

  5. arXiv:2406.11190  [pdf, other]

    cs.CL cs.AI

    Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

    Authors: Rong Bao, Rui Zheng, Shihan Dou, Xiao Wang, Enyu Zhou, Bo Wang, Qi Zhang, Liang Ding, Dacheng Tao

    Abstract: In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and societal values, and provide accurate preference feedback based on these. Current AI feedback methods rely on powerful LLMs, carefully designed specific principles t…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures

  6. arXiv:2405.00438  [pdf, other]

    cs.LG cs.CL

    MetaRM: Shifted Distributions Alignment via Meta-Learning

    Authors: Shihan Dou, Yan Liu, Enyu Zhou, Tianlong Li, Haoxiang Jia, Limao Xiong, Xin Zhao, Junjie Ye, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: The success of Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM). However, as the training process progresses, the output distribution of the policy model shifts, leading to the RM's reduced ability to distinguish between responses. This issue is further compounded when the RM, trained on a specific data…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 11 pages, 6 figures. arXiv admin note: text overlap with arXiv:2401.06080

  7. arXiv:2404.16947  [pdf, other]

    cs.SE

    Fuzzing MLIR by Synthesizing Custom Mutations

    Authors: Ben Limpanukorn, Jiyuan Wang, Hong Jin Kang, Eric Zitong Zhou, Miryung Kim

    Abstract: Multi-Level Intermediate Representation (MLIR) is an effort to enable faster compiler development by providing an extensible framework for downstream developers to define custom IRs with MLIR dialects. MLIR dialects define new IRs that are tailored for specific domains. The diversity and rapid evolution of these IRs make it impractical to pre-define custom generator logic for every available diale…

    Submitted 25 April, 2024; originally announced April 2024.

  8. arXiv:2404.16331  [pdf, other]

    cs.CV cs.AI

    IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks

    Authors: Zitong Huang, Ze Chen, Bowen Dong, Chaoqi Liang, Erjin Zhou, Wangmeng Zuo

    Abstract: Model Weight Averaging (MWA) is a technique that seeks to enhance a model's performance by averaging the weights of multiple trained models. This paper first empirically finds that 1) vanilla MWA can benefit class-imbalanced learning, and 2) performing model averaging in the early epochs of training yields a greater performance improvement than doing so in later epochs. Inspired by these t…

    Submitted 25 April, 2024; originally announced April 2024.

  9. arXiv:2403.12037  [pdf, other]

    cs.CV

    MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

    Authors: Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao

    Abstract: It is a long-lasting goal to design a generalist-embodied agent that can follow diverse instructions in human-like ways. However, existing approaches often fail to steadily follow instructions due to difficulties in understanding abstract and sequential natural language instructions. To this end, we introduce MineDreamer, an open-ended embodied agent built upon the challenging Minecraft simulator…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: https://sites.google.com/view/minedreamer/main

  10. arXiv:2403.00675  [pdf, other]

    cs.LG math.OC

    Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate

    Authors: Yifan Lin, Yuhao Wang, Enlu Zhou

    Abstract: Reinforcement learning provides a mathematical framework for learning-based control, whose success largely depends on the amount of data it can utilize. The efficient utilization of historical trajectories obtained from previous policies is essential for expediting policy optimization. Empirical evidence has shown that policy gradient methods based on importance sampling work well. However, existi…

    Submitted 1 March, 2024; originally announced March 2024.

  11. arXiv:2402.15513  [pdf, other]

    cs.MM cs.LG eess.SP physics.med-ph

    Investigating the Generalizability of Physiological Characteristics of Anxiety

    Authors: Emily Zhou, Mohammad Soleymani, Maja J. Matarić

    Abstract: Recent works have demonstrated the effectiveness of machine learning (ML) techniques in detecting anxiety and stress using physiological signals, but it is unclear whether ML models are learning physiological features specific to stress. To address this ambiguity, we evaluated the generalizability of physiological features that have been shown to be correlated with anxiety and stress to high-arous…

    Submitted 23 January, 2024; originally announced February 2024.

    Journal ref: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2023, pp. 4848-4855

  12. arXiv:2402.01391  [pdf, other]

    cs.SE cs.CL

    StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

    Authors: Shihan Dou, Yan Liu, Haoxiang Jia, Limao Xiong, Enyu Zhou, Wei Shen, Junjie Shan, Caishuang Huang, Xiao Wang, Xiaoran Fan, Zhiheng Xi, Yuhao Zhou, Tao Ji, Rui Zheng, Qi Zhang, Xuanjing Huang, Tao Gui

    Abstract: The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code generation quality. However, the lengthy code generated by LLMs in response to complex human requirements makes RL exploration a challenge. Also, since the unit te…

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 13 pages, 5 figures

  13. arXiv:2401.15188  [pdf, other]

    cs.AI

    CAREForMe: Contextual Multi-Armed Bandit Recommendation Framework for Mental Health

    Authors: Sheng Yu, Narjes Nourzad, Randye J. Semple, Yixue Zhao, Emily Zhou, Bhaskar Krishnamachari

    Abstract: The COVID-19 pandemic has intensified the urgency for effective and accessible mental health interventions in people's daily lives. Mobile Health (mHealth) solutions, such as AI Chatbots and Mindfulness Apps, have gained traction as they expand beyond traditional clinical settings to support daily life. However, the effectiveness of current mHealth solutions is impeded by the lack of context-aware…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: MOBILESoft 2024

  14. arXiv:2401.11144  [pdf, other]

    cs.CV

    Towards Open-World Gesture Recognition

    Authors: Junxiao Shen, Matthias De Lange, Xuhai "Orson" Xu, Enmin Zhou, Ran Tan, Naveen Suda, Maciej Lazarewicz, Per Ola Kristensson, Amy Karlson, Evan Strasnick

    Abstract: Static machine learning methods in gesture recognition assume that training and test data come from the same underlying distribution. However, in real-world applications involving gesture recognition on wrist-worn devices, data distribution may change over time. We formulate this problem of adapting recognition models to new tasks, where new data patterns emerge, as open-world gesture recognition…

    Submitted 20 January, 2024; originally announced January 2024.

  15. arXiv:2401.06080  [pdf, other]

    cs.AI

    Secrets of RLHF in Large Language Models Part II: Reward Modeling

    Authors: Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, et al. (2 additional authors not shown)

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology for aligning language models with human values and intentions, enabling models to produce more helpful and harmless responses. Reward models are trained as proxies for human preferences to drive reinforcement learning optimization. While reward models are often considered central to achieving high performance, they f…

    Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  16. Designing a Socially Assistive Robot to Support Older Adults with Low Vision

    Authors: Emily Zhou, Zhonghao Shi, Xiaoyang Qiao, Maja J Matarić, Ava K Bittner

    Abstract: Socially assistive robots (SARs) have shown great promise in supplementing and augmenting interventions to support the physical and mental well-being of older adults. However, past work has not yet explored the potential of applying SAR to lower the barriers of long-term low vision rehabilitation (LVR) interventions for older adults. In this work, we present a user-informed design process to valid…

    Submitted 6 January, 2024; originally announced January 2024.

    Comments: Published in Social Robotics: 13th International Conference, ICSR 2021. Springer International Publishing

  17. arXiv:2401.01598  [pdf, other]

    cs.CV

    Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning

    Authors: Zitong Huang, Ze Chen, Zhixing Chen, Erjin Zhou, Xinxing Xu, Rick Siow Mong Goh, Yong Liu, Wangmeng Zuo, Chunmei Feng

    Abstract: Few-shot Class-Incremental Learning (FSCIL) aims to continuously learn new classes based on very limited training data without forgetting the old ones encountered. Existing studies rely solely on pure visual networks, while in this paper we solve FSCIL by leveraging a Vision-Language model (e.g., CLIP) and propose a simple yet effective framework, named Learning Prompt with Distribution-based…

    Submitted 5 April, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  18. arXiv:2312.09979  [pdf, other]

    cs.CL

    LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

    Authors: Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. Increasing instruction data substantially is a direct solution to align the model with a broader range of downstream tasks or notably improve its performance on a specific task. However, we find that large-scale increase…

    Submitted 8 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 14 pages, 7 figures

  19. arXiv:2312.07472  [pdf, other]

    cs.CV

    MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

    Authors: Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao

    Abstract: It is a long-lasting goal to design an embodied system that can solve long-horizon open-world tasks in human-like ways. However, existing approaches usually struggle with compound difficulties caused by the logic-aware decomposition and context-aware execution of these tasks. To this end, we introduce MP5, an open-ended multimodal embodied system built upon the challenging Minecraft simulator, whi…

    Submitted 26 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR2024

  20. arXiv:2311.04474  [pdf, other]

    cs.AI

    Emergent Communication for Rules Reasoning

    Authors: Yuxuan Guo, Yifan Hao, Rui Zhang, Enshuai Zhou, Zidong Du, Xishan Zhang, Xinkai Song, Yuanbo Wen, Yongwei Zhao, Xuehai Zhou, Jiaming Guo, Qi Yi, Shaohui Peng, Di Huang, Ruizhi Chen, Qi Guo, Yunji Chen

    Abstract: Research on emergent communication between deep-learning-based agents has received extensive attention due to its inspiration for linguistics and artificial intelligence. However, previous attempts have hovered around emerging communication under perception-oriented environmental settings, which force agents to describe low-level perceptual features within image or symbol contexts. In this work, in…

    Submitted 8 November, 2023; originally announced November 2023.

  21. arXiv:2310.11227  [pdf, other]

    cs.CL cs.AI

    RealBehavior: A Framework for Faithfully Characterizing Foundation Models' Human-like Behavior Mechanisms

    Authors: Enyu Zhou, Rui Zheng, Zhiheng Xi, Songyang Gao, Xiaoran Fan, Zichu Fei, Jingting Ye, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Reports of human-like behaviors in foundation models are growing, with psychological theories providing enduring tools to investigate these behaviors. However, current research tends to directly apply these human-oriented tools without verifying the faithfulness of their outcomes. In this paper, we introduce a framework, RealBehavior, which is designed to characterize the humanoid behaviors of mod…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of EMNLP 2023

  22. G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations

    Authors: Haoyang Zhang, Yirui Eric Zhou, Yuqi Xue, Yiqi Liu, Jian Huang

    Abstract: To break the GPU memory wall for scaling deep learning workloads, a variety of architecture and system techniques have been proposed recently. Their typical approaches include memory extension with flash memory and direct storage access. However, these techniques still suffer from suboptimal performance and introduce complexity to the GPU memory management, making them hard to meet the scalability…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: This paper is accepted at The 56th IEEE/ACM International Symposium on Microarchitecture (MICRO'23). *Co-primary authors

  23. arXiv:2309.07864  [pdf, other]

    cs.AI cs.CL

    The Rise and Potential of Large Language Model Based Agents: A Survey

    Authors: Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, et al. (4 additional authors not shown)

    Abstract: For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training stra…

    Submitted 19 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: 86 pages, 12 figures

  24. arXiv:2308.01734  [pdf, other]

    cs.CL cs.HC

    Ambient Adventures: Teaching ChatGPT on Developing Complex Stories

    Authors: Zexin Chen, Eric Zhou, Kenneth Eaton, Xiangyu Peng, Mark Riedl

    Abstract: Imaginative play is an area of creativity that could allow robots to engage with the world around them in a much more personified way. Imaginary play can be seen as taking real objects and locations and using them as imaginary objects and locations in virtual scenarios. We adopted the story generation capability of large language models (LLMs) to obtain the stories used for imaginary play with hum…

    Submitted 3 August, 2023; originally announced August 2023.

  25. arXiv:2307.06167  [pdf, other]

    cs.LG physics.comp-ph

    Auxiliary-Tasks Learning for Physics-Informed Neural Network-Based Partial Differential Equations Solving

    Authors: Junjun Yan, Xinhai Chen, Zhichao Wang, Enqiang Zhou, Jie Liu

    Abstract: Physics-informed neural networks (PINNs) have emerged as promising surrogate models for solving partial differential equations (PDEs). Their effectiveness lies in the ability to capture solution-related features through neural networks. However, original PINNs often suffer from bottlenecks, such as low accuracy and non-convergence, limiting their applicability in complex physical contexts. To allev…

    Submitted 12 July, 2023; originally announced July 2023.

  26. arXiv:2306.04118  [pdf]

    cs.LG cs.AI

    M$^3$Fair: Mitigating Bias in Healthcare Data through Multi-Level and Multi-Sensitive-Attribute Reweighting Method

    Authors: Yinghao Zhu, Jingkun An, Enshen Zhou, Lu An, Junyi Gao, Hao Li, Haoran Feng, Bo Hou, Wen Tang, Chengwei Pan, Liantao Ma

    Abstract: In the data-driven artificial intelligence paradigm, models heavily rely on large amounts of training data. However, factors like sampling distribution imbalance can lead to issues of bias and unfairness in healthcare data. Sensitive attributes, such as race, gender, age, and medical condition, are characteristics of individuals that are commonly associated with discrimination or bias. In healthca…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: 4 pages, 1 table, Beijing Health Data Science Summit 2023

  27. arXiv:2305.11300  [pdf, other]

    cs.LG

    Bayesian Risk-Averse Q-Learning with Streaming Observations

    Authors: Yuhao Wang, Enlu Zhou

    Abstract: We consider a robust reinforcement learning problem, where a learning agent learns from a simulated training environment. To account for the model mis-specification between this training environment and the real environment due to lack of data, we adopt a formulation of Bayesian risk MDP (BRMDP) with infinite horizon, which uses Bayesian posterior to estimate the transition model and impose a risk…

    Submitted 18 May, 2023; originally announced May 2023.

  28. arXiv:2304.09344  [pdf]

    cs.DB q-bio.QM

    BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs

    Authors: Jackson Callaghan, Colleen H. Xu, Jiwen Xin, Marco Alvarado Cano, Anders Riutta, Eric Zhou, Rohan Juneja, Yao Yao, Madhumita Narayan, Kristina Hanspers, Ayushi Agrawal, Alexander R. Pico, Chunlei Wu, Andrew I. Su

    Abstract: Knowledge graphs are an increasingly common data structure for representing biomedical information. These knowledge graphs can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of dr…

    Submitted 18 April, 2023; originally announced April 2023.

  29. arXiv:2304.08595  [pdf, other]

    cs.CR

    Prophet: Conflict-Free Sharding Blockchain via Byzantine-Tolerant Deterministic Ordering

    Authors: Zicong Hong, Song Guo, Enyuan Zhou, Jianting Zhang, Wuhui Chen, Jinwen Liang, Jie Zhang, Albert Zomaya

    Abstract: Sharding scales throughput by splitting blockchain nodes into parallel groups. However, different shards' independent and random scheduling for cross-shard transactions results in numerous conflicts and aborts, since cross-shard transactions from different shards may access the same account. A deterministic ordering can eliminate conflicts by determining a global order for transactions before proc…

    Submitted 17 April, 2023; originally announced April 2023.

  30. arXiv:2301.01434  [pdf, ps, other]

    cs.LG cs.DM stat.ML

    Online Learning of Smooth Functions

    Authors: Jesse Geneson, Ethan Zhou

    Abstract: In this paper, we study the online learning of real-valued functions where the hidden function is known to have certain smoothness properties. Specifically, for $q \ge 1$, let $\mathcal F_q$ be the class of absolutely continuous functions $f: [0,1] \to \mathbb R$ such that $\|f'\|_q \le 1$. For $q \ge 1$ and $d \in \mathbb Z^+$, let $\mathcal F_{q,d}$ be the class of functions…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: text overlap with arXiv:2105.14648

  31. arXiv:2209.14408  [pdf, other]

    cs.CV cs.LG cs.RO

    RALACs: Action Recognition in Autonomous Vehicles using Interaction Encoding and Optical Flow

    Authors: Eddy Zhou, Alex Zhuang, Alikasim Budhwani, Owen Leather, Rowan Dempster, Quanquan Li, Mohammad Al-Sharman, Derek Rayside, William Melek

    Abstract: When applied to autonomous vehicle (AV) settings, action recognition can enhance an environment model's situational awareness. This is especially prevalent in scenarios where traditional geometric descriptions and heuristics in AVs are insufficient. However, action recognition has traditionally been studied for humans, and its limited adaptability to noisy, un-clipped, un-pampered, raw RGB data ha…

    Submitted 14 January, 2024; v1 submitted 28 September, 2022; originally announced September 2022.

  32. arXiv:2207.12104  [pdf, other]

    cs.CV

    W2N: Switching From Weak Supervision to Noisy Supervision for Object Detection

    Authors: Zitong Huang, Yiping Bao, Bowen Dong, Erjin Zhou, Wangmeng Zuo

    Abstract: Weakly-supervised object detection (WSOD) aims to train an object detector using only image-level annotations. Recently, some works have managed to select the accurate boxes generated from a well-trained WSOD network to supervise a semi-supervised detection framework for better performance. However, these approaches simply divide the training set into labeled and unlabeled sets according t…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: ECCV2022

  33. arXiv:2206.12463  [pdf, other]

    cs.LG cs.IT stat.ML

    Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

    Authors: Yifan Lin, Yuhao Wang, Enlu Zhou

    Abstract: In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the T…

    Submitted 24 June, 2022; originally announced June 2022.

  34. arXiv:2206.10996  [pdf, other]

    cs.CV

    ProtoCLIP: Prototypical Contrastive Language Image Pretraining

    Authors: Delong Chen, Zhao Wu, Fan Liu, Zaiquan Yang, Huaxi Huang, Ying Tan, Erjin Zhou

    Abstract: Contrastive Language Image Pretraining (CLIP) has received widespread attention, since its learned representations can be transferred well to various downstream tasks. During the training process of the CLIP model, the InfoNCE objective aligns positive image-text pairs and separates negative ones. We show an underlying representation grouping effect during this process: the InfoNCE objective indir…

    Submitted 20 November, 2023; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

  35. arXiv:2203.11335  [pdf, other]

    cs.CV

    Global Matching with Overlapping Attention for Optical Flow Estimation

    Authors: Shiyu Zhao, Long Zhao, Zhixing Zhang, Enyu Zhou, Dimitris Metaxas

    Abstract: Optical flow estimation is a fundamental task in computer vision. Recent direct-regression methods using deep neural networks achieve remarkable performance improvement. However, they do not explicitly capture long-term motion correspondences and thus cannot handle large motions effectively. In this paper, inspired by the traditional matching-optimization methods where matching is introduced to ha…

    Submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022 (with additional figures)

  36. arXiv:2202.07549  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Robust Multi-Objective Bayesian Optimization Under Input Noise

    Authors: Samuel Daulton, Sait Cakmak, Maximilian Balandat, Michael A. Osborne, Enlu Zhou, Eytan Bakshy

    Abstract: Bayesian optimization (BO) is a sample-efficient approach for tuning design parameters to optimize expensive-to-evaluate, black-box performance metrics. In many manufacturing processes, the design parameters are subject to random input noise, resulting in a product that is often less performant than expected. Although BO methods have been proposed for optimizing a single objective under input nois…

    Submitted 3 June, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: To appear at ICML 2022. 36 pages. Code is available at https://github.com/facebookresearch/robust_mobo

  37. arXiv:2202.03535  [pdf, other]

    cs.LG

    Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

    Authors: Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao

    Abstract: We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^*\in R^{d\times d}$ from a noisy observation $Y$ using an over-parameterization model. We parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in R^{d\times d}$. We then show that under mild conditions, the estimator, obtained b…

    Submitted 7 February, 2022; originally announced February 2022.

  38. arXiv:2201.05538  [pdf, other]

    cs.SI

    A Fine-Grained Analysis of Public Opinion toward Chinese Technology Companies on Reddit

    Authors: Enting Zhou, Yurong Liu, Hanjia Lyu, Jiebo Luo

    Abstract: In the face of the growing global influence and prevalence of Chinese technology companies, governments worldwide have expressed concern and mistrust toward these companies. There is a scarcity of research that specifically examines the widespread public response to this phenomenon on a large scale. This study aims to fill in the gap in understanding public opinion toward Chinese technology compan…

    Submitted 19 March, 2023; v1 submitted 14 January, 2022; originally announced January 2022.

  39. arXiv:2111.12892  [pdf, other]

    cs.CV

    Attend to Who You Are: Supervising Self-Attention for Keypoint Detection and Instance-Aware Association

    Authors: Sen Yang, Zhicheng Wang, Ze Chen, Yanjie Li, Shoukui Zhang, Zhibin Quan, Shu-Tao Xia, Yiping Bao, Erjin Zhou, Wankou Yang

    Abstract: This paper presents a new method to solve keypoint detection and instance association by using Transformer. Bottom-up multi-person pose estimation models need to detect keypoints and learn associative information between keypoints. We argue that these problems can be entirely solved by Transformer. Specifically, the self-attention in Transformer measures dependencies between any pair of…

    Submitted 24 November, 2021; originally announced November 2021.

    Comments: 16 pages, 9 figures, 7 tables

  40. arXiv:2108.09691  [pdf, other]

    cs.CV

    Guiding Query Position and Performing Similar Attention for Transformer-Based Detection Heads

    Authors: Xiaohu Jiang, Ze Chen, Zhicheng Wang, Erjin Zhou, Chun Yuan

    Abstract: After DETR was proposed, this novel transformer-based detection paradigm which performs several cross-attentions between object queries and feature maps for predictions has subsequently derived a series of transformer-based detection heads. These models iterate object queries after each cross-attention. However, they don't renew the query position which indicates object queries' position informati…

    Submitted 22 August, 2021; originally announced August 2021.

  41. arXiv:2108.07931  [pdf, other]

    cs.LG cs.DC

    Learning Federated Representations and Recommendations with Limited Negatives

    Authors: Lin Ning, Karan Singhal, Ellie X. Zhou, Sushant Prakash

    Abstract: Deep retrieval models are widely used for learning entity representations and recommendations. Federated learning provides a privacy-preserving way to train these models without requiring centralization of user data. However, federated deep retrieval models usually perform much worse than their centralized counterparts due to non-IID (independent and identically distributed) training data on clien…

    Submitted 2 November, 2021; v1 submitted 17 August, 2021; originally announced August 2021.

  42. arXiv:2107.10477  [pdf, other]

    cs.CV

    Adaptive Dilated Convolution For Human Pose Estimation

    Authors: Zhengxiong Luo, Zhicheng Wang, Yan Huang, Liang Wang, Tieniu Tan, Erjin Zhou

    Abstract: Most existing human pose estimation (HPE) methods exploit multi-scale information by fusing feature maps of four different spatial sizes, i.e., $1/4$, $1/8$, $1/16$, and $1/32$ of the input image. There are two drawbacks of this strategy: 1) feature maps of different spatial sizes may be not well aligned spatially, which potentially hurts the accuracy of keypoint location; 2) these scales are fixed…

    Submitted 22 July, 2021; originally announced July 2021.

  43. arXiv:2106.04051  [pdf, other]

    cs.LG cs.AI cs.CV cs.SI

    Graph-MLP: Node Classification without Message Passing in Graph

    Authors: Yang Hu, Haoxuan You, Zhecan Wang, Zhicheng Wang, Erjin Zhou, Yue Gao

    Abstract: Graph Neural Network (GNN) has been demonstrated its effectiveness in dealing with non-Euclidean structural data. Both spatial-based and spectral-based GNNs are relying on adjacency matrix to guide message passing among neighbors during feature aggregation. Recent works have mainly focused on powerful message passing modules, however, in this paper, we show that none of the message passing modules…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: 11 pages, 6 figures

  44. arXiv:2104.03516  [pdf, other]

    cs.CV

    TokenPose: Learning Keypoint Tokens for Human Pose Estimation

    Authors: Yanjie Li, Shoukui Zhang, Zhicheng Wang, Sen Yang, Wankou Yang, Shu-Tao Xia, Erjin Zhou

    Abstract: Human pose estimation deeply relies on visual clues and anatomical constraints between parts to locate keypoints. Most existing CNN-based methods do well in visual representation, however, lacking in the ability to explicitly learn the constraint relationships between keypoints. In this paper, we propose a novel approach based on Token representation for human Pose estimation (TokenPose). In detai…

    Submitted 13 August, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Accepted by ICCV'21. Code is publicly available at https://github.com/leeyegy/TokenPose

  45. arXiv:2104.03106  [pdf, other]

    cs.CV

    V2F-Net: Explicit Decomposition of Occluded Pedestrian Detection

    Authors: Mingyang Shang, Dawei Xiang, Zhicheng Wang, Erjin Zhou

    Abstract: Occlusion is very challenging in pedestrian detection. In this paper, we propose a simple yet effective method named V2F-Net, which explicitly decomposes occluded pedestrian detection into visible region detection and full body estimation. V2F-Net consists of two sub-networks: Visible region Detection Network (VDN) and Full body Estimation Network (FEN). VDN tries to localize visible regions and F…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: 11 pages, 4 figures

  46. arXiv:2103.02340  [pdf, other]

    cs.CV

    General Instance Distillation for Object Detection

    Authors: Xing Dai, Zeren Jiang, Zhao Wu, Yiping Bao, Zhicheng Wang, Si Liu, Erjin Zhou

    Abstract: In recent years, knowledge distillation has been proved to be an effective solution for model compression. This approach can make lightweight student models acquire the knowledge extracted from cumbersome teacher models. However, previous distillation methods of detection have weak generalization for different detection frameworks and rely heavily on ground truth (GT), ignoring the valuable relati…

    Submitted 29 April, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

    Comments: 10 pages (including 2 pages of References), 5 figures, 7 tables. Accepted by CVPR 2021. Camera Ready

  47. arXiv:2102.12430  [pdf, other]

    cs.LG stat.ML

    Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

    Authors: Tianyi Liu, Yan Li, Song Wei, Enlu Zhou, Tuo Zhao

    Abstract: Numerous empirical evidences have corroborated the importance of noise in nonconvex optimization problems. The theory behind such empirical observations, however, is still largely unknown. This paper studies this fundamental problem through investigating the nonconvex rectangular matrix factorization problem, which has infinitely many global minima due to rotation and scaling invariance. Hence, gr…

    Submitted 24 February, 2021; originally announced February 2021.

  48. arXiv:2101.10811  [pdf, other]

    cs.CV

    Semi-synthesis: A fast way to produce effective datasets for stereo matching

    Authors: Ju He, Enyu Zhou, Liusheng Sun, Fei Lei, Chenyang Liu, Wenxiu Sun

    Abstract: Stereo matching is an important problem in computer vision which has drawn tremendous research attention for decades. Recent years, data-driven methods with convolutional neural networks (CNNs) are continuously pushing stereo matching to new heights. However, data-driven methods require large amount of training data, which is not an easy task for real stereo data due to the annotation difficulties…

    Submitted 26 January, 2021; originally announced January 2021.

  49. arXiv:2012.15175  [pdf, other]

    cs.CV

    Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation

    Authors: Zhengxiong Luo, Zhicheng Wang, Yan Huang, Tieniu Tan, Erjin Zhou

    Abstract: Heatmap regression has become the most prevalent choice for nowadays human pose estimation methods. The ground-truth heatmaps are usually constructed via covering all skeletal keypoints by 2D gaussian kernels. The standard deviations of these kernels are fixed. However, for bottom-up methods, which need to handle a large variance of human scales and labeling ambiguities, the current practice seems…

    Submitted 25 March, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: Accepted by CVPR2021

  50. arXiv:2012.07033  [pdf, other]

    cs.CV

    Efficient Human Pose Estimation by Learning Deeply Aggregated Representations

    Authors: Zhengxiong Luo, Zhicheng Wang, Yuanhao Cai, Guanan Wang, Yan Huang, Liang Wang, Erjin Zhou, Tieniu Tan, Jian Sun

    Abstract: In this paper, we propose an efficient human pose estimation network (DANet) by learning deeply aggregated representations. Most existing models explore multi-scale information mainly from features with different spatial sizes. Powerful multi-scale representations usually rely on the cascaded pyramid framework. This framework largely boosts the performance but in the meanwhile makes networks very…

    Submitted 14 December, 2020; v1 submitted 13 December, 2020; originally announced December 2020.