Showing 1–50 of 70 results for author: Weng, Y

  1. arXiv:2407.11459  [pdf, other]

    eess.SP cs.LG

    RIMformer: An End-to-End Transformer for FMCW Radar Interference Mitigation

    Authors: Ziang Zhang, Guangzhi Chen, Youlong Weng, Shunchuan Yang, Zhiyu Jia, Jingxuan Chen

    Abstract: Frequency-modulated continuous-wave (FMCW) radar plays a pivotal role in the field of remote sensing. The increasing degree of FMCW radar deployment has increased mutual interference, which weakens the detection capabilities of radars and threatens the reliability and safety of systems. In this paper, a novel FMCW radar interference mitigation (RIM) method, termed RIMformer, is proposed by usin…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2406.17739  [pdf, other]

    cs.CL cs.AI

    Find Parent then Label Children: A Two-stage Taxonomy Completion Method with Pre-trained Language Model

    Authors: Fei Xia, Yixuan Weng, Shizhu He, Kang Liu, Jun Zhao

    Abstract: Taxonomies, which organize domain concepts into hierarchical structures, are crucial for building knowledge systems and downstream applications. As domain knowledge evolves, taxonomies need to be continuously updated to include new concepts. Previous approaches have mainly focused on adding concepts to the leaf nodes of the existing hierarchical tree, which does not fully utilize the taxonomy's kn…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.10318  [pdf, other]

    cs.CV cs.AI

    Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding

    Authors: Tuo Zhang, Tiantian Feng, Yibin Ni, Mengqin Cao, Ruying Liu, Katharine Butler, Yanjun Weng, Mi Zhang, Shrikanth S. Narayanan, Salman Avestimehr

    Abstract: Large vision-language models (VLMs) have demonstrated remarkable abilities in understanding everyday content. However, their performance in the domain of art, particularly culturally rich art forms, remains less explored. As a pearl of human wisdom and creativity, art encapsulates complex cultural narratives and symbolism. In this paper, we offer the Pun Rebus Art Dataset, a multimodal dataset for…

    Submitted 14 June, 2024; originally announced June 2024.

  4. arXiv:2406.08474  [pdf, other]

    cs.CV cs.AI cs.LG

    Real2Code: Reconstruct Articulated Objects via Code Generation

    Authors: Zhao Mandi, Yijia Weng, Dominik Bauer, Shuran Song

    Abstract: We present Real2Code, a novel approach to reconstructing articulated objects via code generation. Given visual observations of an object, we first reconstruct its part geometry using an image segmentation model and a shape completion model. We then represent the object parts with oriented bounding boxes, which are input to a fine-tuned large language model (LLM) to predict joint articulation as co…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2405.17421  [pdf, other]

    cs.CV cs.GR

    MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds

    Authors: Jiahui Lei, Yijia Weng, Adam Harley, Leonidas Guibas, Kostas Daniilidis

    Abstract: We introduce 4D Motion Scaffolds (MoSca), a neural information processing system designed to reconstruct and synthesize novel views of dynamic scenes from monocular videos captured casually in the wild. To address such a challenging and ill-posed inverse problem, we leverage prior knowledge from foundational vision models, lift the video data to a novel Motion Scaffold (MoSca) representation, whic…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: project page: https://www.cis.upenn.edu/~leijh/projects/mosca

  6. arXiv:2405.15301  [pdf, other]

    cs.LG

    Rankability-enhanced Revenue Uplift Modeling Framework for Online Marketing

    Authors: Bowei He, Yunpeng Weng, Xing Tang, Ziqiang Cui, Zexu Sun, Liang Chen, Xiuqiang He, Chen Ma

    Abstract: Uplift modeling has been widely employed in online marketing by predicting the response difference between the treatment and control groups, so as to identify individuals who are sensitive to interventions like coupons or discounts. Compared with traditional conversion uplift modeling, revenue uplift modeling exhibits higher potential due to its direct connection with the corpora…

    Submitted 12 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  7. arXiv:2405.13380  [pdf, other]

    cs.CR

    The Illusion of Anonymity: Uncovering the Impact of User Actions on Privacy in Web3 Social Ecosystems

    Authors: Bin Wang, Tianjian Liu, Wenqi Wang, Yuan Weng, Chao Li, Guangquan Xu, Meng Shen, Sencun Zhu, Wei Wang

    Abstract: The rise of Web3 social ecosystems signifies the dawn of a new chapter in digital interaction, offering significant prospects for user engagement and financial advancement. Nonetheless, this progress is shadowed by potential privacy concessions, especially as these platforms frequently merge with existing Web2.0 social media accounts, amplifying data privacy risks for users. In this study, we in…

    Submitted 22 May, 2024; originally announced May 2024.

  8. 3D Gaussian Blendshapes for Head Avatar Animation

    Authors: Shengjie Ma, Yanlin Weng, Tianjia Shao, Kun Zhou

    Abstract: We introduce 3D Gaussian blendshapes for modeling photorealistic head avatars. Taking a monocular video as input, we learn a base head model of neutral expression, along with a group of expression blendshapes, each of which corresponds to a basis expression in classical parametric face models. Both the neutral model and expression blendshapes are represented as 3D Gaussians, which contain a few pr…

    Submitted 2 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: ACM SIGGRAPH Conference Proceedings 2024

  9. arXiv:2404.03384  [pdf, other]

    cs.CV

    LongVLM: Efficient Long Video Understanding via Large Language Models

    Authors: Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang

    Abstract: Empowered by Large Language Models (LLMs), recent advancements in VideoLLMs have driven progress in various video understanding tasks. These models encode video representations through pooling or query aggregation over a vast number of visual tokens, making computational and memory costs affordable. Despite successfully providing an overall comprehension of video content, existing VideoLLMs still…

    Submitted 10 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  10. arXiv:2404.01440  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

    Authors: Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas Guibas, Stan Birchfield

    Abstract: We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states. We decompose the problem into two stages, each addressing distinct aspects. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model including part segmentation and joint articulations that associa…

    Submitted 6 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  11. arXiv:2402.10151  [pdf, other]

    cs.CL

    ControlLM: Crafting Diverse Personalities for Language Models

    Authors: Yixuan Weng, Shizhu He, Kang Liu, Shengping Liu, Jun Zhao

    Abstract: As language models continue to scale in size and capability, they display an array of emerging behaviors, both beneficial and concerning. This heightens the need to control model behaviors. We hope to be able to control the personality traits of language models at inference time so as to have various character features, on top of which the requirements of different types of tasks can be met. P…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 17 pages

  12. arXiv:2401.16419  [pdf, other]

    cs.LG stat.ML

    Semi-parametric Expert Bayesian Network Learning with Gaussian Processes and Horseshoe Priors

    Authors: Yidou Weng, Finale Doshi-Velez

    Abstract: This paper proposes a model learning Semi-parametric relationships in an Expert Bayesian Network (SEBN) with linear parameter and structure constraints. We use Gaussian Processes and a Horseshoe prior to introduce minimal nonlinear components. To prioritize modifying the expert graph over adding new edges, we optimize differential Horseshoe scales. In real-world datasets with unknown truth, we gen…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 8 pages, 4 figures, AAAI-2024 workshops

  13. arXiv:2401.01525  [pdf, other]

    cs.IR

    Expected Transaction Value Optimization for Precise Marketing in FinTech Platforms

    Authors: Yunpeng Weng, Xing Tang, Liang Chen, Dugang Liu, Xiuqiang He

    Abstract: FinTech platforms facilitated by digital payments are growing rapidly, enabling the distribution of mutual funds personalized to individual investors via mobile apps. As important intermediaries for investment in financial products, these platforms distribute thousands of mutual funds obtaining impressions under guaranteed delivery (GD) strategy required by fund companies. Driven by th…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Accepted by Workshop on Deep Learning Practice for High-Dimensional Sparse Data in RecSys'23 (DLP@RecSys), Singapore, 2023

  14. arXiv:2312.15610  [pdf, other]

    cs.CV

    Towards Learning Geometric Eigen-Lengths Crucial for Fitting Tasks

    Authors: Yijia Weng, Kaichun Mo, Ruoxi Shi, Yanchao Yang, Leonidas J. Guibas

    Abstract: Some extremely low-dimensional yet crucial geometric eigen-lengths often determine the success of some geometric tasks. For example, the height of an object is important to measure to check if it can fit between the shelves of a cabinet, while the width of a couch is crucial when trying to move it through a doorway. Humans have materialized such crucial geometric eigen-lengths in common sense sinc…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: ICML 2023. Project page: https://yijiaweng.github.io/geo-eigen-length

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:36958-36977, 2023

  15. arXiv:2311.17607  [pdf, other]

    cs.CV cs.LG

    Topology-Preserving Adversarial Training

    Authors: Xiaoyue Mi, Fan Tang, Yepeng Weng, Danding Wang, Juan Cao, Sheng Tang, Peng Li, Yang Liu

    Abstract: Despite its effectiveness in improving the robustness of neural networks, adversarial training has suffered from the natural accuracy degradation problem, i.e., accuracy on natural samples is reduced significantly. In this study, we reveal that natural accuracy degradation is highly related to the disruption of the natural sample topology in the representation space by quantitative and qualitativ…

    Submitted 29 November, 2023; originally announced November 2023.

  16. arXiv:2311.16809  [pdf, other]

    cs.RO

    Design and trajectory tracking control of CuRobot: A Cubic Reversible Robot

    Authors: Kai Yang, Jiahui Wang, Yuchen Weng, Baolei Wu, Fuqiang Li, Jihong Zhu, Jun Wang

    Abstract: In field environments, numerous robots require manual intervention to restore functionality after a turnover, resulting in diminished operational efficiency. This study presents an innovative design solution for a reversible omnidirectional mobile robot denoted as CuRobot, featuring a cube structure, thereby facilitating uninterrupted omnidirectional movement even in the event of flippi…

    Submitted 28 November, 2023; originally announced November 2023.

  17. arXiv:2311.09053  [pdf, other]

    cs.CL cs.AI

    Assessing Knowledge Editing in Language Models via Relation Perspective

    Authors: Yifan Wei, Xiaoyan Yu, Huanhuan Ma, Fangyu Lei, Yixuan Weng, Ran Song, Kang Liu

    Abstract: Knowledge Editing (KE) for modifying factual knowledge in Large Language Models (LLMs) has been receiving increasing attention. However, existing knowledge editing methods are entity-centric, and it is unclear whether this approach is suitable for a relation-centric perspective. To address this gap, this paper constructs a new benchmark named RaKE, which focuses on Relation based Knowledge Editing…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Work in progress

  18. arXiv:2310.18954  [pdf, other]

    cs.CV cs.AI

    Mask Propagation for Efficient Video Semantic Segmentation

    Authors: Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang

    Abstract: Video Semantic Segmentation (VSS) involves assigning a semantic label to each pixel in a video sequence. Prior work in this field has demonstrated promising results by extending image semantic segmentation models to exploit temporal relationships across video frames; however, these approaches often incur significant computational costs. In this paper, we propose an efficient mask propagation frame…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  19. arXiv:2310.15928  [pdf, other]

    cs.RO

    AO-Grasp: Articulated Object Grasp Generation

    Authors: Carlota Parés Morlans, Claire Chen, Yijia Weng, Michelle Yi, Yuying Huang, Nick Heppert, Linqi Zhou, Leonidas Guibas, Jeannette Bohg

    Abstract: We introduce AO-Grasp, a grasp proposal method that generates 6 DoF grasps that enable robots to interact with articulated objects, such as opening and closing cabinets and appliances. AO-Grasp consists of two main contributions: the AO-Grasp Model and the AO-Grasp Dataset. Given a segmented partial point cloud of a single articulated object, the AO-Grasp Model predicts the best grasp points on th…

    Submitted 18 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Project website: https://stanford-iprl-lab.github.io/ao-grasp

  20. arXiv:2309.07925  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

    Authors: Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng

    Abstract: In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for e…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures

    Journal ref: The 31st ACM International Conference on Multimedia (MM'23), 2023

  21. arXiv:2309.07157  [pdf, other]

    cs.LG eess.SY math.OC stat.AP

    Distribution Grid Line Outage Identification with Unknown Pattern and Performance Guarantee

    Authors: Chenhan Xiao, Yizheng Liao, Yang Weng

    Abstract: Line outage identification in distribution grids is essential for sustainable grid operation. In this work, we propose a practical yet robust detection approach that utilizes only readily available voltage magnitudes, eliminating the need for costly phase angles or power flow data. Given the sensor data, many existing detection methods based on change-point detection require prior knowledge of out…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: 12 pages

    Journal ref: IEEE Transactions on Power Systems 2023

  22. arXiv:2308.10252  [pdf, other]

    cs.CL cs.AI

    LMTuner: An user-friendly and highly-integrable Training Framework for fine-tuning Large Language Models

    Authors: Yixuan Weng, Zhiqi Wang, Huanxuan Liao, Shizhu He, Shengping Liu, Kang Liu, Jun Zhao

    Abstract: With the burgeoning development in the realm of large language models (LLMs), the demand for efficient incremental training tailored to specific industries and domains continues to increase. Currently, the predominantly employed frameworks lack modular design, so it often takes a lot of coding work to kickstart the training of an LLM. To address this, we present "LMTuner", a highly usable, integrable, a…

    Submitted 20 August, 2023; originally announced August 2023.

  23. arXiv:2305.14211  [pdf, other]

    cs.CL

    Towards Graph-hop Retrieval and Reasoning in Complex Question Answering over Textual Database

    Authors: Minjun Zhu, Yixuan Weng, Shizhu He, Kang Liu, Jun Zhao

    Abstract: In textual question answering (TQA) systems, complex questions often require retrieving multiple textual fact chains with multiple reasoning steps. However, existing benchmarks are limited to single-chain or single-hop retrieval scenarios. In this paper, we propose to conduct Graph-Hop, a novel multi-chain and multi-hop retrieval and reasoning paradigm in complex question answering. We construct…

    Submitted 23 May, 2023; originally announced May 2023.

  24. arXiv:2305.10807  [pdf, other]

    eess.IV cs.CV

    Transformer-based Variable-rate Image Compression with Region-of-interest Control

    Authors: Chia-Hao Kao, Ying-Chieh Weng, Yi-Hsin Chen, Wei-Chen Chiu, Wen-Hsiao Peng

    Abstract: This paper proposes a transformer-based learned image compression system. It is capable of achieving variable-rate compression with a single model while supporting the region-of-interest (ROI) functionality. Inspired by prompt tuning, we introduce prompt generation networks to condition the transformer-based autoencoder of compression. Our prompt generation networks generate content-adaptive token…

    Submitted 1 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to IEEE ICIP 2023

  25. arXiv:2305.05410  [pdf, other]

    cs.CL

    Large Language Models Need Holistically Thought in Medical Conversational QA

    Authors: Yixuan Weng, Bin Li, Fei Xia, Minjun Zhu, Bin Sun, Shizhu He, Kang Liu, Jun Zhao

    Abstract: The medical conversational question answering (CQA) system aims at providing a series of professional medical services to improve the efficiency of medical care. Despite the success of large language models (LLMs) in complex reasoning tasks in various fields, such as mathematics, logic, and commonsense QA, they still need to improve with the increased complexity and specialization of the medical f…

    Submitted 10 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  26. arXiv:2305.01514  [pdf, other]

    cs.IR cs.AI cs.LG

    Curriculum Modeling the Dependence among Targets with Multi-task Learning for Financial Marketing

    Authors: Yunpeng Weng, Xing Tang, Liang Chen, Xiuqiang He

    Abstract: Multi-task learning for various real-world applications usually involves tasks with logical sequential dependence. For example, in online marketing, the cascade behavior pattern of $impression \rightarrow click \rightarrow conversion$ is usually modeled as multiple tasks in a multi-task manner, where the sequential dependence between tasks is simply connected with an explicitly defined function or…

    Submitted 25 April, 2023; originally announced May 2023.

  27. arXiv:2304.01665  [pdf, other]

    cs.CL

    Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks

    Authors: Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Kang Liu, Jun Zhao

    Abstract: Language models' (LMs) proficiency in handling deterministic symbolic reasoning and rule-based tasks remains limited due to their dependence on implicit learning from textual data. To endow LMs with genuine rule comprehension abilities, we propose "Neural Comprehension" - a framework that synergistically integrates compiled neural networks (CoNNs) into the standard transformer architecture. CoNNs are n…

    Submitted 9 March, 2024; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Accepted in ICLR 2024

  28. arXiv:2303.00938  [pdf, other]

    cs.RO cs.CV

    UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy

    Authors: Yinzhen Xu, Weikang Wan, Jialiang Zhang, Haoran Liu, Zikang Shan, Hao Shen, Ruicheng Wang, Haoran Geng, Yijia Weng, Jiayi Chen, Tengyu Liu, Li Yi, He Wang

    Abstract: In this work, we tackle the problem of learning universal robotic dexterous grasping from a point cloud observation under a table-top setting. The goal is to grasp and lift up objects in high-quality and diverse ways and generalize across hundreds of categories and even the unseen. Inspired by successful pipelines used in parallel gripper grasping, we split the task into two stages: 1) grasp propo…

    Submitted 25 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  29. arXiv:2302.01107  [pdf, other]

    cs.LG cs.AI cs.CV

    A Survey on Efficient Training of Transformers

    Authors: Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen

    Abstract: Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources. This survey provides the first systematic overview of the efficient training of Transformers, covering th…

    Submitted 3 May, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: IJCAI 2023 survey track

  30. arXiv:2212.09561  [pdf, other]

    cs.AI cs.CL

    Large Language Models are Better Reasoners with Self-Verification

    Authors: Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Shengping Liu, Bin Sun, Kang Liu, Jun Zhao

    Abstract: Recently, with the chain of thought (CoT) prompting, large language models (LLMs), e.g., GPT-3, have shown strong reasoning ability in several natural language processing tasks such as arithmetic, commonsense, and logical reasoning. However, LLMs with CoT require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes and vulnerable to error accumulation.…

    Submitted 19 October, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted to EMNLP 2023 Findings

  31. arXiv:2212.05194  [pdf, other]

    cs.CL cs.AI

    Artificial Text Detection with Multiple Training Strategies

    Authors: Bin Li, Yixuan Weng, Qiya Song, Hanjun Deng

    Abstract: As deep learning rapidly advances, artificial texts created by generative models are commonly used in news and social media. However, such models can be abused to generate product reviews, fake news, and even fake political content. The paper proposes a solution for the Russian Artificial Text Detection in the Dialogue shared task 2022 (RuATD 2022) to distinguish which model within the list…

    Submitted 9 December, 2022; originally announced December 2022.

    Comments: Accepted by Dialogue-2022 Conference. 7 pages, 2 figures, 2 tables

    Report number: 20

    Journal ref: Computational linguistics and intellectual technologies: Papers from the annual conference Dialogue. 2022

  32. arXiv:2210.14823  [pdf, other]

    cs.CV cs.AI

    Visual Answer Localization with Cross-modal Mutual Knowledge Transfer

    Authors: Yixuan Weng, Bin Li

    Abstract: The goal of visual answering localization (VAL) in the video is to obtain a relevant and concise time clip from a video as the answer to the given natural language question. Early methods are based on the interaction modelling between video and text to predict the visual answer by the visual predictor. Later, using the textual predictor with subtitles for the VAL proves to be more precise. However…

    Submitted 28 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: 4 pages, 3 figures, 2 tables

  33. arXiv:2210.08763  [pdf, other]

    cs.CL cs.AI

    ReasonChainQA: Text-based Complex Question Answering with Explainable Evidence Chains

    Authors: Minjun Zhu, Yixuan Weng, Shizhu He, Kang Liu, Jun Zhao

    Abstract: The ability to reason over evidence has received increasing attention in question answering (QA). Recently, natural language databases (NLDB) conduct complex QA over knowledge bases with textual evidence rather than structured representations; this task attracts a lot of attention because of the flexibility and richness of textual evidence. However, existing text-based complex question answering…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: 5 pages

    Journal ref: CAC 2022

  34. arXiv:2210.06144  [pdf, other]

    nlin.AO cs.LG

    Digital twins of nonlinear dynamical systems

    Authors: Ling-Wei Kong, Yang Weng, Bryan Glaz, Mulugeta Haile, Ying-Cheng Lai

    Abstract: We articulate the design imperatives for machine-learning based digital twins for nonlinear dynamical systems subject to external driving, which can be used to monitor the "health" of the target system and anticipate its future collapse. We demonstrate that, with single or parallel reservoir computing configurations, the digital twins are capable of challenging forecasting and monitoring tasks.…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: 21 pages, 14 figures

  35. Learning to Locate Visual Answer in Video Corpus Using Question

    Authors: Bin Li, Yixuan Weng, Bin Sun, Shutao Li

    Abstract: We introduce a new task, named video corpus visual answer localization (VCVAL), which aims to locate the visual answer in a large collection of untrimmed instructional videos using a natural language question. This task requires a range of skills - the interaction between vision and language, video retrieval, passage comprehension, and visual answer localization. In this paper, we propose a cross-…

    Submitted 1 March, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted by ICASSP 2023

  36. arXiv:2209.12009  [pdf, other]

    cs.CV

    Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild

    Authors: Jiayi Chen, Mi Yan, Jiazhao Zhang, Yinzhen Xu, Xiaolong Li, Yijia Weng, Li Yi, Shuran Song, He Wang

    Abstract: In this work, we tackle the challenging task of jointly tracking hand object pose and reconstructing their shapes from depth point cloud sequences in the wild, given the initial poses at frame 0. We propose, for the first time, a point cloud based hand joint tracking network, HandTrackNet, to estimate the inter-frame hand joint motion. Our HandTrackNet proposes a novel hand pose canonicalization mod…

    Submitted 24 September, 2022; originally announced September 2022.

  37. arXiv:2207.10448  [pdf, other]

    cs.CV

    An Efficient Spatio-Temporal Pyramid Transformer for Action Detection

    Authors: Yuetian Weng, Zizheng Pan, Mingfei Han, Xiaojun Chang, Bohan Zhuang

    Abstract: The task of action detection aims at deducing both the action category and localization of the start and end moment for each action instance in a long, untrimmed video. While vision Transformers have driven the recent advances in video understanding, it is non-trivial to design an efficient architecture for action detection due to the prohibitively expensive self-attentions over a long sequence of…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  38. arXiv:2207.01823  [pdf, other]

    cs.CL cs.CV

    Scene-Aware Prompt for Multi-modal Dialogue Understanding and Generation

    Authors: Bin Li, Yixuan Weng, Ziyu Ma, Bin Sun, Shutao Li

    Abstract: This paper introduces the schemes of Team LingJing's experiments in NLPCC-2022-Shared-Task-4 Multi-modal Dialogue Understanding and Generation (MDUG). The MDUG task can be divided into two phases: multi-modal context understanding and response generation. To fully leverage the visual information for both scene understanding and dialogue generation, we propose the scene-aware prompt for the MDUG ta…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: Accepted in NLPCC 2022

  39. arXiv:2206.00257  [pdf, other]

    cs.LG cs.AI

    CoNSoLe: Convex Neural Symbolic Learning

    Authors: Haoran Li, Yang Weng, Hanghang Tong

    Abstract: Learning the underlying equation from data is a fundamental problem in many disciplines. Recent advances rely on Neural Networks (NNs) but do not provide theoretical guarantees in obtaining the exact equations owing to the non-convexity of NNs. In this paper, we propose Convex Neural Symbolic Learning (CoNSoLe) to seek convexity under mild conditions. The main idea is to decompose the recovering p…

    Submitted 12 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: 18 pages, 5 figures, conference for NeurIPS 2022

  40. arXiv:2205.09874  [pdf, other]

    math.OC cs.AI eess.SY

    Explainable Graph Theory-Based Identification of Meter-Transformer Mapping

    Authors: Bilal Saleem, Yang Weng

    Abstract: Distributed energy resources are better for the environment but may cause transformer overload in distribution grids, calling for recovering meter-transformer mapping to provide situational awareness, i.e., the transformer loading. The challenge lies in recovering meter-transformer (M.T.) mapping for two common scenarios, e.g., large distances between a meter and its parent transformer or high sim…

    Submitted 19 May, 2022; originally announced May 2022.

  41. arXiv:2204.09220  [pdf, other]

    cs.CL cs.AI

    LingYi: Medical Conversational Question Answering System based on Multi-modal Knowledge Graphs

    Authors: Fei Xia, Bin Li, Yixuan Weng, Shizhu He, Kang Liu, Bin Sun, Shutao Li, Jun Zhao

    Abstract: The medical conversational system can relieve the burden of doctors and improve the efficiency of healthcare, especially during the pandemic. This paper presents a medical conversational question answering (CQA) system based on the multi-modal knowledge graph, namely "LingYi", which is designed as a pipeline framework to maintain high flexibility. Our system utilizes automated medical procedures i…

    Submitted 20 April, 2022; originally announced April 2022.

    Comments: 9 pages, 4 figures, 5 tables

  42. arXiv:2204.04344  [pdf, other]

    cs.CL

    Towards Better Chinese-centric Neural Machine Translation for Low-resource Languages

    Authors: Bin Li, Yixuan Weng, Fei Xia, Hanjun Deng

    Abstract: The last decade has witnessed enormous improvements in science and technology, stimulating the growing demand for economic and cultural exchanges in various countries. Building a neural machine translation (NMT) system has become an urgent trend, especially in the low-resource setting. However, recent work tends to study NMT systems for low-resource languages centered on English, while few works f…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: 7 pages, 4 figures, 4 tables

  43. Prompt-based System for Personality and Interpersonal Reactivity Prediction

    Authors: Bin Li, Yixuan Weng

    Abstract: This paper describes our proposed method for the Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) 2022 shared task on Personality Prediction (PER) and Reactivity Index Prediction (IRI). In this paper, we adopt the prompt-based learning method with the pre-trained language model to accomplish these tasks. Specifically, the prompt is designed to provide…

    Submitted 18 May, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Published in Software Impacts

    Journal ref: Software Impacts, (2022) 2665-9638

  44. arXiv:2203.06667  [pdf, other]

    cs.CV cs.AI cs.CL

    Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video

    Authors: Bin Li, Yixuan Weng, Bin Sun, Shutao Li

    Abstract: The temporal answering grounding in the video (TAGV) is a new task naturally derived from temporal sentence grounding in the video (TSGV). Given an untrimmed video and a text question, this task aims at locating the matching span from the video that can semantically answer the question. Existing methods tend to formulate the TAGV task with a visual span-based question answering (QA) approach by ma…

    Submitted 29 March, 2022; v1 submitted 13 March, 2022; originally announced March 2022.

    Comments: 8 pages, 6 figures, 3 tables

  45. arXiv:2112.09996  [pdf, ps, other]

    eess.SY cs.AI

    Curriculum Based Reinforcement Learning of Grid Topology Controllers to Prevent Thermal Cascading

    Authors: Amarsagar Reddy Ramapuram Matavalam, Kishan Prudhvi Guddanti, Yang Weng, Venkataramana Ajjarapu

    Abstract: This paper describes how domain knowledge of power system operators can be integrated into reinforcement learning (RL) frameworks to effectively learn agents that control the grid's topology to prevent thermal cascading. Typical RL-based topology controllers fail to perform well due to the large search/optimization space. Here, we propose an actor-critic-based agent to address the problem's combin…

    Submitted 18 December, 2021; originally announced December 2021.

  46. arXiv:2112.08991  [pdf, other]

    cs.CL cs.AI

    ADBCMM: Acronym Disambiguation by Building Counterfactuals and Multilingual Mixing

    Authors: Yixuan Weng, Fei Xia, Bin Li, Xiusheng Huang, Shizhu He

    Abstract: Scientific documents often contain a large number of acronyms. Disambiguation of these acronyms will help researchers better understand the meaning of vocabulary in the documents. In the past, thanks to large amounts of data from English literature, acronym task was mainly applied in English literature. However, for other low-resource languages, this task is difficult to obtain good performance an…

    Submitted 5 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: SDU@AAAI-2022

  47. arXiv:2111.14306  [pdf, other]

    cs.CL cs.AI

    SimCLAD: A Simple Framework for Contrastive Learning of Acronym Disambiguation

    Authors: Bin Li, Fei Xia, Yixuan Weng, Xiusheng Huang, Bin Sun

    Abstract: Acronym disambiguation means finding the correct meaning of an ambiguous acronym from the dictionary in a given sentence, which is one of the key points for scientific document understanding (SDU@AAAI-22). Recently, many attempts have tried to solve this problem via fine-tuning the pre-trained masked language models (MLMs) in order to obtain a better acronym representation. However, the acronym me…

    Submitted 9 December, 2021; v1 submitted 28 November, 2021; originally announced November 2021.

    Comments: Accepted for Artificial Intelligence on Scientific Document Understanding (SDU) workshop at AAAI 2022

  48. arXiv:2111.14301  [pdf, other]

    cs.CL cs.AI

    PSG: Prompt-based Sequence Generation for Acronym Extraction

    Authors: Bin Li, Fei Xia, Yixuan Weng, Xiusheng Huang, Bin Sun, Shutao Li

    Abstract: Acronym extraction aims to find acronyms (i.e., short-forms) and their meanings (i.e., long-forms) from the documents, which is important for scientific document understanding (SDU@AAAI-22) tasks. Previous works are devoted to modeling this task as a paragraph-level sequence labeling problem. However, it lacks the effective use of the external knowledge, especially when the datasets are in a low-r…

    Submitted 9 December, 2021; v1 submitted 28 November, 2021; originally announced November 2021.

    Comments: Accepted for Artificial Intelligence on Scientific Document Understanding (SDU) workshop at AAAI 2022

  49. arXiv:2111.02026  [pdf]

    cs.AI eess.SP

    The Powerful Use of AI in the Energy Sector: Intelligent Forecasting

    Authors: Erik Blasch, Haoran Li, Zhihao Ma, Yang Weng

    Abstract: Artificial Intelligence (AI) techniques continue to broaden across governmental and public sectors, such as power and energy - which serve as critical infrastructures for most societal operations. However, due to the requirements of reliability, accountability, and explainability, it is risky to directly apply AI-based methods to power systems because society cannot afford cascading failures and l…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: Presented at AAAI FSS-21: Artificial Intelligence in Government and Public Sector, Washington, DC, USA

  50. arXiv:2111.00190  [pdf, other]

    cs.CV

    Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation

    Authors: Xiaolong Li, Yijia Weng, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song, He Wang

    Abstract: Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models. To reduce the huge amount of pose annotations needed for category-level learning, we propose for the first time a self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds. During training,…

    Submitted 30 October, 2021; originally announced November 2021.

    Comments: 20 pages, 11 figures