Showing 1–32 of 32 results for author: Gu, K

  1. arXiv:2407.11009

    cs.CL cs.LG

    CharED: Character-wise Ensemble Decoding for Large Language Models

    Authors: Kevin Gu, Eva Tuecke, Dmitriy Katz, Raya Horesh, David Alvarez-Melis, Mikhail Yurochkin

    Abstract: Large language models (LLMs) have shown remarkable potential for problem solving, with open source models achieving increasingly impressive performance on benchmarks measuring areas from logical reasoning to mathematical ability. Ensembling models can further improve capabilities across a variety of domains. However, conventional methods of combining models at inference time such as shallow fusion…

    Submitted 25 June, 2024; originally announced July 2024.

    Comments: 9 pages, 4 figures

  2. arXiv:2407.00574

    cs.CV

    OfCaM: Global Human Mesh Recovery via Optimization-free Camera Motion Scale Calibration

    Authors: Fengyuan Yang, Kerui Gu, Ha Linh Nguyen, Angela Yao

    Abstract: Accurate camera motion estimation is critical to estimate human motion in the global space. A standard and widely used method for estimating camera motion is Simultaneous Localization and Mapping (SLAM). However, SLAM only provides a trajectory up to an unknown scale factor. Different from previous attempts that optimize the scale factor, this paper presents Optimization-free Camera Motion Scale C…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 12 pages, 7 figures, 4 tables

  3. arXiv:2406.19888

    cs.AI

    Fine-tuning of Geospatial Foundation Models for Aboveground Biomass Estimation

    Authors: Michal Muszynski, Levente Klein, Ademir Ferreira da Silva, Anjani Prasad Atluri, Carlos Gomes, Daniela Szwarcman, Gurkanwar Singh, Kewen Gu, Maciel Zortea, Naomi Simumba, Paolo Fraccaro, Shraddha Singh, Steve Meliksetian, Campbell Watson, Daiki Kimura, Harini Srinivasan

    Abstract: Global vegetation structure mapping is critical for understanding the global carbon cycle and maximizing the efficacy of nature-based carbon sequestration initiatives. Moreover, vegetation structure mapping can help reduce the impacts of climate change by, for example, guiding actions to improve water security, increase biodiversity and reduce flood risk. Global satellite measurements provide an i…

    Submitted 28 June, 2024; originally announced June 2024.

  4. arXiv:2405.19833

    cs.CV

    KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation

    Authors: Fengyuan Yang, Kerui Gu, Angela Yao

    Abstract: 2D keypoints are commonly used as an additional cue to refine estimated 3D human meshes. Current methods optimize the pose and shape parameters with a reprojection loss on the provided 2D keypoints. Such an approach, while simple and intuitive, has limited effectiveness because the optimal solution is hard to find in ambiguous parameter space and may sacrifice depth. Additionally, divergent gradie…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR24

  5. arXiv:2403.14863

    physics.med-ph cs.CV cs.LG

    Distribution-informed and wavelength-flexible data-driven photoacoustic oximetry

    Authors: Janek Gröhl, Kylie Yeung, Kevin Gu, Thomas R. Else, Monika Golinska, Ellie V. Bunce, Lina Hacker, Sarah E. Bohndiek

    Abstract: Significance: Photoacoustic imaging (PAI) promises to measure spatially-resolved blood oxygen saturation, but suffers from a lack of accurate and robust spectral unmixing methods to deliver on this promise. Accurate blood oxygenation estimation could have important clinical applications, from cancer detection to quantifying inflammation. Aim: This study addresses the inflexibility of existing da…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 37 pages, 7 figures

    ACM Class: F.2.1

  6. arXiv:2403.10557

    cs.LG cs.AI cs.CL

    Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models

    Authors: Kang Gu, Md Rafi Ur Rashid, Najrin Sultana, Shagufta Mehnaz

    Abstract: With the rapid development of Large Language Models (LLMs), we have witnessed intense competition among the major LLM products like ChatGPT, LLaMa, and Gemini. However, various issues (e.g. privacy leakage and copyright violation) of the training corpus still remain underexplored. For example, the Times sued OpenAI and Microsoft for infringing on its copyrights by using millions of its articles fo…

    Submitted 13 March, 2024; originally announced March 2024.

  7. arXiv:2401.13956

    cs.CV

    A New Image Quality Database for Multiple Industrial Processes

    Authors: Xuanchao Ma, Yanlin Jiang, Hongyan Liu, Chengxu Zhou, Ke Gu

    Abstract: Recent years have witnessed a broader range of applications of image processing technologies in multiple industrial processes, such as smoke detection, security monitoring, and workpiece inspection. Different kinds of distortion types and levels must be introduced into an image during the processes of acquisition, compression, transmission, storage, and display, which might heavily degrade the ima…

    Submitted 15 February, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  8. arXiv:2401.02823

    cs.CL cs.IR

    DocGraphLM: Documental Graph Language Model for Information Extraction

    Authors: Dongsheng Wang, Zhiqiang Ma, Armineh Nourbakhsh, Kang Gu, Sameena Shah

    Abstract: Advances in Visually Rich Document Understanding (VrDU) have enabled information extraction and question answering over documents with complex layouts. Two tropes of architectures have emerged -- transformer-based models inspired by LLMs, and Graph Neural Networks. In this paper, we introduce DocGraphLM, a novel framework that combines pre-trained language models with graph semantics. To achieve t…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Published at SIGIR'23 (repost for easier access)

  9. arXiv:2312.00462

    cs.CV

    Learning Unorthogonalized Matrices for Rotation Estimation

    Authors: Kerui Gu, Zhihao Li, Shiyong Liu, Jianzhuang Liu, Songcen Xu, Youliang Yan, Michael Bi Mi, Kenji Kawaguchi, Angela Yao

    Abstract: Estimating 3D rotations is a common procedure for 3D computer vision. The accuracy depends heavily on the rotation representation. One form of representation -- rotation matrices -- is popular due to its continuity, especially for pose estimation tasks. The learning process usually incorporates orthogonalization to ensure orthonormal matrices. Our work reveals, through gradient analysis, that comm…

    Submitted 1 December, 2023; originally announced December 2023.

  10. arXiv:2311.17105

    cs.CV

    On the Calibration of Human Pose Estimation

    Authors: Kerui Gu, Rongyu Chen, Angela Yao

    Abstract: Most 2D human pose estimation frameworks estimate keypoint confidence in an ad-hoc manner, using heuristics such as the maximum value of heatmaps. The confidence is part of the evaluation scheme, e.g., AP for the MSCOCO dataset, yet has been largely overlooked in the development of state-of-the-art methods. This paper takes the first steps in addressing miscalibration in pose estimation. From a ca…

    Submitted 28 November, 2023; originally announced November 2023.

  11. arXiv:2310.16152

    cs.CR cs.LG

    FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

    Authors: Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Kang Gu, Najrin Sultana, Shagufta Mehnaz

    Abstract: Federated learning (FL) has become a key component in various language modeling applications such as machine translation, next-word prediction, and medical record analysis. These applications are trained on datasets from many FL participants that often include privacy-sensitive data, such as healthcare records, phone/credit card numbers, login credentials, etc. Although FL enables computation with…

    Submitted 25 May, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: 20 pages (including bibliography and Appendix), Submitted to ACM CCS '24

  12. arXiv:2309.10947

    cs.HC

    How Do Analysts Understand and Verify AI-Assisted Data Analyses?

    Authors: Ken Gu, Ruoxi Shang, Tim Althoff, Chenglong Wang, Steven M. Drucker

    Abstract: Data analysis is challenging as it requires synthesizing domain knowledge, statistical expertise, and programming skills. Assistants powered by large language models (LLMs), such as ChatGPT, can assist analysts by translating natural language instructions into code. However, AI-assistant responses and analysis code can be misaligned with the analyst's intent or be seemingly correct but lead to inc…

    Submitted 4 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted to CHI 2024

  13. arXiv:2309.10108

    cs.HC

    How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study

    Authors: Ken Gu, Madeleine Grunde-McLaughlin, Andrew M. McNutt, Jeffrey Heer, Tim Althoff

    Abstract: Data analysis is challenging as analysts must navigate nuanced decisions that may yield divergent conclusions. AI assistants have the potential to support analysts in planning their analyses, enabling more robust decision making. Though AI-based assistants that target code execution (e.g., Github Copilot) have received significant attention, limited research addresses assistance for both analysis…

    Submitted 4 March, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted to CHI 2024

  14. arXiv:2308.04160

    cs.RO

    S&Reg: End-to-End Learning-Based Model for Multi-Goal Path Planning Problem

    Authors: Yuan Huang, Kairui Gu, Hee-hyol Lee

    Abstract: In this paper, we propose a novel end-to-end approach for solving the multi-goal path planning problem in obstacle environments. Our proposed model, called S&Reg, integrates multi-task learning networks with a TSP solver and a path planner to quickly compute a closed and feasible path visiting all goals. Specifically, the model first predicts promising regions that potentially contain the optimal…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 7 pages, 12 figures. Accepted at IEEE International Conference on Robot and Human Interactive Communication (ROMAN), 2023

  15. arXiv:2301.10431

    cs.CV

    Bias-Compensated Integral Regression for Human Pose Estimation

    Authors: Kerui Gu, Linlin Yang, Michael Bi Mi, Angela Yao

    Abstract: In human and hand pose estimation, heatmaps are a crucial intermediate representation for a body or hand keypoint. Two popular methods to decode the heatmap into a final joint coordinate are via an argmax, as done in heatmap detection, or via softmax and expectation, as done in integral regression. Integral regression is learnable end-to-end, but has lower accuracy than detection. This paper uncov…

    Submitted 25 January, 2023; originally announced January 2023.

  16. arXiv:2210.03804

    cs.HC cs.SE

    Understanding and Supporting Debugging Workflows in Multiverse Analysis

    Authors: Ken Gu, Eunice Jun, Tim Althoff

    Abstract: Multiverse analysis, a paradigm for statistical analysis that considers all combinations of reasonable analysis choices in parallel, promises to improve transparency and reproducibility. Although recent tools help analysts specify multiverse analyses, they remain difficult to use in practice. In this work, we identify debugging as a key barrier due to the latency from running analyses to detecting…

    Submitted 4 June, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: CHI 2023

    Journal ref: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23-28, 2023, Hamburg, Germany. ACM, New York, NY, USA

  17. arXiv:2205.03574

    cs.CV eess.IV

    Utility-Oriented Underwater Image Quality Assessment Based on Transfer Learning

    Authors: Weiling Chen, Rongfu Lin, Honggang Liao, Tiesong Zhao, Ke Gu, Patrick Le Callet

    Abstract: The widespread image applications have greatly promoted the vision-based tasks, in which the Image Quality Assessment (IQA) technique has become an increasingly significant issue. For user enjoyment in multimedia systems, the IQA exploits image fidelity and aesthetics to characterize user experience; while for other tasks such as popular object recognition, there exists a low correlation between u…

    Submitted 7 May, 2022; originally announced May 2022.

  18. arXiv:2107.11413

    cs.LG cs.HC

    An Instance-Dependent Simulation Framework for Learning with Label Noise

    Authors: Keren Gu, Xander Masotto, Vandana Bachani, Balaji Lakshminarayanan, Jack Nikodem, Dong Yin

    Abstract: We propose a simulation framework for generating instance-dependent noisy labels via a pseudo-labeling paradigm. We show that the distribution of the synthetic noisy labels generated with our framework is closer to human labels compared to independent and class-conditional random flipping. Equipped with controllable label noise, we study the negative impact of noisy labels across a few practical s…

    Submitted 17 October, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

    Comments: Datasets released at https://github.com/deepmind/deepmind-research/tree/master/noisy_label

  19. Feature Selection for Multivariate Time Series via Network Pruning

    Authors: Kang Gu, Soroush Vosoughi, Temiloluwa Prioleau

    Abstract: In recent years, there has been an ever increasing amount of multivariate time series (MTS) data in various domains, typically generated by a large family of sensors such as wearable devices. This has led to the development of novel learning methods on MTS data, with deep learning models dominating the most recent advancements. Prior literature has primarily focused on designing new network archit…

    Submitted 21 October, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: In ICDM 2021 Workshop on Systematic Feature Engineering for Time-Series Data Mining (SFE-TSDM)

  20. arXiv:2006.14002

    cs.CE cs.LG stat.ML

    Bi-Level Graph Neural Networks for Drug-Drug Interaction Prediction

    Authors: Yunsheng Bai, Ken Gu, Yizhou Sun, Wei Wang

    Abstract: We introduce Bi-GNN for modeling biological link prediction tasks such as drug-drug interaction (DDI) and protein-protein interaction (PPI). Taking drug-drug interaction as an example, existing methods using machine learning either only utilize the link structure between drugs without using the graph representation of each drug molecule, or only leverage the individual drug compound structures wit…

    Submitted 11 June, 2020; originally announced June 2020.

  21. Shop The Look: Building a Large Scale Visual Shopping System at Pinterest

    Authors: Raymond Shiau, Hao-Yu Wu, Eric Kim, Yue Li Du, Anqi Guo, Zhiyuan Zhang, Eileen Li, Kunlong Gu, Charles Rosenberg, Andrew Zhai

    Abstract: As online content becomes ever more visual, the demand for searching by visual queries grows correspondingly stronger. Shop The Look is an online shopping discovery service at Pinterest, leveraging visual search to enable users to find and buy products within an image. In this work, we provide a holistic view of how we built Shop The Look, a shopping oriented visual search system, along with lesso…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: 10 pages, 7 figures, Accepted to KDD'20

    ACM Class: I.2.10; I.4.8; I.4.9; I.4.10; I.5.4; K.4.4

  22. Bootstrapping Complete The Look at Pinterest

    Authors: Eileen Li, Eric Kim, Andrew Zhai, Josh Beal, Kunlong Gu

    Abstract: Putting together an ideal outfit is a process that involves creativity and style intuition. This makes it a particularly difficult task to automate. Existing styling products generally involve human specialists and a highly curated set of fashion items. In this paper, we will describe how we bootstrapped the Complete The Look (CTL) system at Pinterest. This is a technology that aims to learn the s…

    Submitted 29 June, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: 9 pages, 12 figures, To be published in KDD '20

  23. arXiv:2003.09784

    eess.IV cs.CV

    AQPDCITY Dataset: Picture-Based PM Monitoring in the Urban Area of Big Cities

    Authors: Yonghui Zhang, Ke Gu

    Abstract: Since Particulate Matters (PMs) are closely related to people's living and health, it has become one of the most important indicator of air quality monitoring around the world. But the existing sensor-based methods for PM monitoring have remarkable disadvantages, such as low-density monitoring stations and high-requirement monitoring conditions. It is highly desired to devise a method that can obt…

    Submitted 5 April, 2020; v1 submitted 21 March, 2020; originally announced March 2020.

    ACM Class: J.6.0

  24. arXiv:2003.08609

    cs.CV

    AQPDBJUT Dataset: Picture-Based PM Monitoring in the Campus of BJUT

    Authors: Yonghui Zhang, Ke Gu

    Abstract: Ensuring the students in good physical levels is imperative for their future health. In recent years, the continually growing concentration of Particulate Matter (PM) has done increasingly serious harm to student health. Hence, it is highly required to prevent and control PM concentrations in the campus. As the source of PM prevention and control, developing a good model for PM monitoring is extre…

    Submitted 21 March, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

    ACM Class: J.6.0

  25. arXiv:1912.05971

    eess.IV cs.CV cs.HC

    Toward Better Understanding of Saliency Prediction in Augmented 360 Degree Videos

    Authors: Yucheng Zhu, Xiongkuo Min, DanDan Zhu, Ke Gu, Jiantao Zhou, Guangtao Zhai, Xiaokang Yang, Wenjun Zhang

    Abstract: Augmented reality (AR) overlays digital content onto the reality. In AR system, correct and precise estimations of user's visual fixations and head movements can enhance the quality of experience by allocating more computation resources on the areas of interest. However, there is inadequate research about understanding the visual exploration of users when using an AR system or modeling AR visual a…

    Submitted 20 July, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

  26. arXiv:1911.09224

    cs.CV

    FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis

    Authors: Kuangxiao Gu, Yuqian Zhou, Thomas Huang

    Abstract: Talking face synthesis has been widely studied in either appearance-based or warping-based methods. Previous works mostly utilize single face image as a source, and generate novel facial animations by merging other person's facial features. However, some facial regions like eyes or teeth, which may be hidden in the source image, can not be synthesized faithfully and stably. In this paper, We prese…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: Accepted by AAAI 2020

  27. arXiv:1904.10076

    cs.CV cs.LG

    Using Videos to Evaluate Image Model Robustness

    Authors: Keren Gu, Brandon Yang, Jiquan Ngiam, Quoc Le, Jonathon Shlens

    Abstract: Human visual systems are robust to a wide range of image transformations that are challenging for artificial networks. We present the first study of image model robustness to the minute transformations found across video frames, which we term "natural robustness". Compared to previous studies on adversarial examples and synthetic distortions, natural robustness captures a more diverse set of commo…

    Submitted 29 August, 2019; v1 submitted 22 April, 2019; originally announced April 2019.

    Comments: Video Robustness Dataset included in directory

  28. arXiv:1904.08632

    cs.CV

    Learning a No-Reference Quality Assessment Model of Enhanced Images With Big Data

    Authors: Ke Gu, Dacheng Tao, Junfei Qiao, Weisi Lin

    Abstract: In this paper we investigate into the problem of image quality assessment (IQA) and enhancement via machine learning. This issue has long attracted a wide range of attention in computational intelligence and image processing communities, since, for many practical applications, e.g. object detection and recognition, raw images are usually needed to be appropriately enhanced to raise the visual qual…

    Submitted 18 April, 2019; originally announced April 2019.

    Comments: 12 pages, 45 figures

  29. arXiv:1904.01098

    cs.LG stat.ML

    Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity

    Authors: Yunsheng Bai, Hao Ding, Yang Qiao, Agustin Marinovic, Ken Gu, Ting Chen, Yizhou Sun, Wei Wang

    Abstract: We introduce a novel approach to graph-level representation learning, which is to embed an entire graph into a vector space where the embeddings of two graphs preserve their graph-graph proximity. Our approach, UGRAPHEMB, is a general framework that provides a novel means to performing graph-level embedding in a completely unsupervised and inductive manner. The learned neural network can be consid…

    Submitted 2 June, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

    Comments: IJCAI 2019 camera ready version with supplementary material

  30. arXiv:1804.07353

    cs.CV cs.LG stat.ML

    Unsupervised Representation Adversarial Learning Network: from Reconstruction to Generation

    Authors: Yuqian Zhou, Kuangxiao Gu, Thomas Huang

    Abstract: A good representation for arbitrarily complicated data should have the capability of semantic generation, clustering and reconstruction. Previous research has already achieved impressive performance on either one. This paper aims at learning a disentangled representation effective for all of them in an unsupervised way. To achieve all the three tasks together, we learn the forward and inverse mapp…

    Submitted 6 April, 2019; v1 submitted 19 April, 2018; originally announced April 2018.

  31. arXiv:1708.02286

    cs.CV cs.LG stat.ML

    Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification

    Authors: Shuangjie Xu, Yu Cheng, Kang Gu, Yang Yang, Shiyu Chang, Pan Zhou

    Abstract: Person Re-Identification (person re-id) is a crucial task as its applications in visual surveillance and human-computer interaction. In this work, we present a novel joint Spatial and Temporal Attention Pooling Network (ASTPN) for video-based person re-identification, which enables the feature extractor to be aware of the current input video sequences, in a way that interdependency from the matchi…

    Submitted 29 September, 2017; v1 submitted 2 August, 2017; originally announced August 2017.

    Comments: To appear in ICCV 2017

  32. arXiv:1405.6341

    cs.RO cs.AI cs.LG eess.SY

    Efficient Model Learning for Human-Robot Collaborative Tasks

    Authors: Stefanos Nikolaidis, Keren Gu, Ramya Ramakrishnan, Julie Shah

    Abstract: We present a framework for learning human user models from joint-action demonstrations that enables the robot to compute a robust policy for a collaborative task with a human. The learning takes place completely automatically, without any human intervention. First, we describe the clustering of demonstrated action sequences into different human types using an unsupervised learning algorithm. These…

    Submitted 24 May, 2014; originally announced May 2014.

    ACM Class: I.2.6; I.2.8; I.2.9

    Journal ref: Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI 2015)