Towards Continual Knowledge Graph Embedding via Incremental Distillation

Authors: Jiajun Liu, Wenjun Ke, Peng Wang, Ziyu Shang, Jinhua Gao, Guozheng Li, Ke Ji, Yanhe Liu

Abstract: Traditional knowledge graph embedding (KGE) methods typically require preserving the entire knowledge graph (KG) with significant training costs when new knowledge emerges. To address this issue, the continual knowledge graph embedding (CKGE) task has been proposed to train the KGE model by learning emerging knowledge efficiently while simultaneously preserving decent old knowledge. However, the explicit graph structure in KGs, which is critical for the above goal, has been heavily ignored by existing CKGE methods. On the one hand, existing methods usually learn new triples in a random order, destroying the inner structure of new KGs. On the other hand, old triples are preserved with equal priority, failing to alleviate catastrophic forgetting effectively. In this paper, we propose a competitive method for CKGE based on incremental distillation (IncDE), which considers the full use of the explicit graph structure in KGs. First, to optimize the learning order, we introduce a hierarchical strategy, ranking new triples for layer-by-layer learning. By employing the inter- and intra-hierarchical orders together, new triples are grouped into layers based on the graph structure features. Secondly, to preserve the old knowledge effectively, we devise a novel incremental distillation mechanism, which facilitates the seamless transfer of entity representations from the previous layer to the next one, promoting old knowledge preservation. Finally, we adopt a two-stage training paradigm to avoid the over-corruption of old knowledge influenced by under-trained new knowledge. Experimental results demonstrate the superiority of IncDE over state-of-the-art baselines. Notably, the incremental distillation mechanism contributes to improvements of 0.2%-6.5% in the mean reciprocal rank (MRR) score.
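Note: The abstract reports its gains in mean reciprocal rank (MRR). As a minimal sketch of how that metric is conventionally computed (this is the standard definition, not code from the paper; the function name and the sample ranks are hypothetical), MRR averages the reciprocal of the 1-based rank at which the correct entity appears for each test query:

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries.

    `ranks` holds the 1-based position of the correct entity in the
    model's sorted candidate list, one entry per test query.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for three link-prediction queries.
example_ranks = [1, 2, 4]
example_mrr = mean_reciprocal_rank(example_ranks)  # (1 + 1/2 + 1/4) / 3
```

A higher MRR means the correct entity tends to sit nearer the top of the ranking, so the percentage improvements quoted above are differences in this average.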

Submitted 7 May, 2024; originally announced May 2024.

Comments: Accepted by AAAI 2024

arXiv:2405.02288 [pdf, other]

Prospective Role of Foundation Models in Advancing Autonomous Vehicles

Authors: Jianhua Wu, Bingzhao Gao, Jincheng Gao, Jianhao Yu, Hongqing Chu, Qiankun Yu, Xun Gong, Yi Chang, H. Eric Tseng, Hong Chen, Jie Chen

Abstract: With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, Sora, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can contribute to enhancing scene understanding and reasoning. By pre-training on rich linguistic and visual data, FMs can understand and interpret various elements in a driving scene, and provide cognitive reasoning to give linguistic and action instructions for driving decisions and planning. Furthermore, FMs can augment data based on the understanding of driving scenarios to provide feasible scenes of those rare occurrences in the long tail distribution that are unlikely to be encountered during routine driving and data collection. The enhancement can subsequently lead to improvement in the accuracy and reliability of autonomous driving systems. Another testament to the potential of FMs' applications lies in World Models, exemplified by the DREAMER series, which showcases the ability to comprehend physical laws and dynamics. Learning from massive data under the paradigm of self-supervised learning, World Model can generate unseen yet plausible driving environments, facilitating the enhancement in the prediction of road users' behaviors and the off-line training of driving strategies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By utilizing the powerful capabilities of FMs, we strive to tackle the potential issues stemming from the long-tail distribution in autonomous driving, consequently advancing overall safety in this domain.

Submitted 17 May, 2024; v1 submitted 8 December, 2023; originally announced May 2024.

Comments: 45 pages, 8 figures

arXiv:2405.02218 [pdf, other]

Multispectral Fine-Grained Classification of Blackgrass in Wheat and Barley Crops

Authors: Madeleine Darbyshire, Shaun Coutts, Eleanor Hammond, Fazilet Gokbudak, Cengiz Oztireli, Petra Bosilj, Junfeng Gao, Elizabeth Sklar, Simon Parsons

Abstract: As the burden of herbicide resistance grows and the environmental repercussions of excessive herbicide use become clear, new ways of managing weed populations are needed. This is particularly true for cereal crops, like wheat and barley, that are staple food crops and occupy a globally significant portion of agricultural land. Even small improvements in weed management practices across these major food crops worldwide would yield considerable benefits for both the environment and global food security. Blackgrass is a major grass weed which causes particular problems in cereal crops in north-west Europe, a major cereal production area, because it has high levels of herbicide resistance and is well adapted to agronomic practice in this region. With the use of machine vision and multispectral imaging, we investigate the effectiveness of state-of-the-art methods to identify blackgrass in wheat and barley crops. As part of this work, we provide a large dataset with which we evaluate several key aspects of blackgrass weed recognition. Firstly, we determine the performance of different CNN and transformer-based architectures on images from unseen fields. Secondly, we demonstrate the role that different spectral bands have on the performance of weed classification. Lastly, we evaluate the role of dataset size in classification performance for each of the models trialled. We find that even with a fairly modest quantity of training data an accuracy of almost 90% can be achieved on images from unseen fields.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 19 pages, 6 figures

arXiv:2405.01561 [pdf]

doi 10.5281/zenodo.10899798

Rapid Mobile App Development for Generative AI Agents on MIT App Inventor

Authors: Jaida Gao, Calab Su, Etai Miller, Kevin Lu, Yu Meng

Abstract: The evolution of Artificial Intelligence (AI) stands as a pivotal force shaping our society, finding applications across diverse domains such as education, sustainability, and safety. Leveraging AI within mobile applications makes it easily accessible to the public, catalyzing its transformative potential. In this paper, we present a methodology for the rapid development of AI agent applications using the development platform provided by MIT App Inventor. To demonstrate its efficacy, we share the development journey of three distinct mobile applications: SynchroNet for fostering sustainable communities; ProductiviTeams for addressing procrastination; and iHELP for enhancing community safety. All three applications seamlessly integrate a spectrum of generative AI features, leveraging OpenAI APIs. Furthermore, we offer insights gleaned from overcoming challenges in integrating diverse tools and AI functionalities, aiming to inspire young developers to join our efforts in building practical AI agent applications.

Submitted 31 March, 2024; originally announced May 2024.

Journal ref: Journal of advances in information science and technology 2(3) 1-8, March 2024

arXiv:2405.01306 [pdf, other]

Graph is all you need? Lightweight data-agnostic neural architecture search without training

Authors: Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Chunhen Jiang, Jianxi Gao

Abstract: Neural architecture search (NAS) enables the automatic design of neural network models. However, training the candidates generated by the search algorithm for performance evaluation incurs considerable computational overhead. Our method, dubbed nasgraph, remarkably reduces the computational costs by converting neural architectures to graphs and using the average degree, a graph measure, as the proxy in lieu of the evaluation metric. Our training-free NAS method is data-agnostic and light-weight. It can find the best architecture among 200 randomly sampled architectures from NAS-Bench201 in 217 CPU seconds. Besides, our method is able to achieve competitive performance on various datasets including NASBench-101, NASBench-201, and NDS search spaces. We also demonstrate that nasgraph generalizes to more challenging tasks on Micro TransNAS-Bench-101.
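Note: The proxy named in the abstract, average degree, is a standard graph measure. The sketch below shows only that measure on a simple undirected graph given as an edge list; how nasgraph actually converts an architecture into a graph is not stated in the abstract, so the function name, the graph representation, and the toy edges are all assumptions made for illustration:

```python
from typing import Iterable, Tuple


def average_degree(num_nodes: int, edges: Iterable[Tuple[int, int]]) -> float:
    """Average degree of a simple undirected graph: 2 * |E| / |V|.

    Self-loops are ignored and repeated edges (in either direction)
    are counted once.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}
    return 2 * len(unique_edges) / num_nodes


# Hypothetical 4-node chain 0-1-2-3, with one edge listed twice.
toy_edges = [(0, 1), (1, 2), (2, 3), (1, 0)]
toy_score = average_degree(4, toy_edges)  # 2 * 3 / 4
```

Because the score needs only the graph and no training data, it can be computed for many candidate architectures cheaply, which is consistent with the CPU-seconds figure quoted above.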

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.00778 [pdf, ps, other]

Rigidity matroids and linear algebraic matroids with applications to matrix completion and tensor codes

Authors: Joshua Brakensiek, Manik Dhar, Jiyang Gao, Sivakanth Gopi, Matt Larson

Abstract: We establish a connection between problems studied in rigidity theory and matroids arising from linear algebraic constructions like tensor products and symmetric products. A special case of this correspondence identifies the problem of giving a description of the correctable erasure patterns in a maximally recoverable tensor code with the problem of describing bipartite rigid graphs or low-rank completable matrix patterns. Additionally, we relate dependencies among symmetric products of generic vectors to graph rigidity and symmetric matrix completion. With an eye toward applications to computer science, we study the dependency of these matroids on the characteristic by giving new combinatorial descriptions in several cases, including the first description of the correctable patterns in an (m, n, a=2, b=2) maximally recoverable tensor code.

Submitted 1 May, 2024; originally announced May 2024.

MSC Class: 94B05; 52C25; 05B35

arXiv:2405.00557 [pdf, other]

Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

Authors: Zhili Liu, Yunhao Gou, Kai Chen, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok

Abstract: As the capabilities of large language models (LLMs) have expanded dramatically, aligning these models with human values presents a significant challenge. Traditional alignment strategies rely heavily on human intervention, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), or on the self-alignment capacities of LLMs, which usually require a strong LLM's emergent ability to improve its original bad answer. To address these challenges, we propose a novel self-alignment method that utilizes a Chain of Thought (CoT) approach, termed AlignCoT. This method encompasses stages of Question Analysis, Answer Guidance, and Safe Answer production. It is designed to enable LLMs to generate high-quality, safe responses throughout various stages of their development. Furthermore, we introduce the Mixture of insighTful Experts (MoTE) architecture, which applies mixture of experts to enhance each component of the AlignCoT process, markedly increasing alignment efficiency. The MoTE approach not only outperforms existing methods in aligning LLMs with human values but also highlights the benefits of using self-generated data, revealing the dual benefits of improved alignment and training efficiency.

Submitted 8 July, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.18527 [pdf]

Bridging Data Barriers among Participants: Assessing the Potential of Geoenergy through Federated Learning

Authors: Weike Peng, Jiaxin Gao, Yuntian Chen, Shengwei Wang

Abstract: Machine learning algorithms emerge as a promising approach in energy fields, but their practical application is hindered by data barriers, stemming from high collection costs and privacy concerns. This study introduces a novel federated learning (FL) framework based on XGBoost models, enabling safe collaborative modeling with accessible yet concealed data from multiple parties. Hyperparameter tuning of the models is achieved through Bayesian Optimization. To ascertain the merits of the proposed FL-XGBoost method, a comparative analysis is conducted between separate and centralized models to address a classical binary classification problem in the geoenergy sector. The results reveal that the proposed FL framework strikes an optimal balance between privacy and accuracy. FL models demonstrate superior accuracy and generalization capabilities compared to separate models, particularly for participants with limited data or low correlation features, and offer significant privacy benefits compared to the centralized model. The aggregated optimization approach within the FL agreement proves effective in tuning hyperparameters. This study opens new avenues for assessing unconventional reservoirs through collaborative and privacy-preserving FL techniques.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.17287 [pdf, other]

When to Trust LLMs: Aligning Confidence with Response Quality

Authors: Shuchang Tao, Liuyi Yao, Hanxing Ding, Yuexiang Xie, Qi Cao, Fei Sun, Jinyang Gao, Huawei Shen, Bolin Ding

Abstract: Despite the success of large language models (LLMs) in natural language generation, much evidence shows that LLMs may produce incorrect or nonsensical text. This limitation highlights the importance of discerning when to trust LLMs, especially in safety-critical domains. Existing methods often express reliability by confidence level; however, their effectiveness is limited by the lack of objective guidance. To address this, we propose CONfidence-Quality-ORDer-preserving alignment approach (CONQORD), which leverages reinforcement learning guided by a tailored dual-component reward function. This function integrates quality reward and order-preserving alignment reward functions. Specifically, the order-preserving reward incentivizes the model to verbalize greater confidence for responses of higher quality to align the order of confidence and quality. Experiments demonstrate that CONQORD significantly improves the alignment performance between confidence and response accuracy, without causing over-caution. Furthermore, the aligned confidence provided by CONQORD informs when to trust LLMs, and acts as a determinant for initiating the retrieval process of external knowledge. Aligning confidence with response quality ensures more transparent and reliable responses, providing better trustworthiness.

Submitted 9 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: Accepted by ACL 2024

arXiv:2404.16375 [pdf, other]

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Authors: An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang

Abstract: Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image. These tags, marked with alphanumerics, can be indexed via text tokens for easy reference. Despite the extraordinary performance from GPT-4V, we observe that other Multimodal Large Language Models (MLLMs) struggle to understand these visual tags. To promote the learning of SoM prompting for open-source models, we propose a new learning paradigm: "list items one by one," which asks the model to enumerate and describe all visual tags placed on the image following the alphanumeric orders of tags. By integrating our curated dataset with other visual instruction tuning datasets, we are able to equip existing MLLMs with the SoM prompting ability. Furthermore, we evaluate our finetuned SoM models on five MLLM benchmarks. We find that this new dataset, even in a relatively small size (10k-30k images with tags), significantly enhances visual reasoning capabilities and reduces hallucinations for MLLMs. Perhaps surprisingly, these improvements persist even when the visual tags are omitted from input images during inference. This suggests the potential of "list items one by one" as a new paradigm for training MLLMs, which strengthens the object-text alignment through the use of visual tags in the training stage. Finally, we conduct analyses by probing trained models to understand the working mechanism of SoM. Our code and data are available at https://github.com/zzxslp/SoM-LLaVA.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2404.15899 [pdf, other]

ST-MambaSync: The Complement of Mamba and Transformers for Spatial-Temporal in Traffic Flow Prediction

Authors: Zhiqi Shao, Xusheng Yao, Ze Wang, Junbin Gao

Abstract: Accurate traffic flow prediction is crucial for optimizing traffic management, enhancing road safety, and reducing environmental impacts. Existing models face challenges with long sequence data, requiring substantial memory and computational resources, and often suffer from slow inference times due to the lack of a unified summary state. This paper introduces ST-MambaSync, an innovative traffic flow prediction model that combines transformer technology with the ST-Mamba block, representing a significant advancement in the field. We are the pioneers in employing the Mamba mechanism which is an attention mechanism integrated with ResNet within a transformer framework, which significantly enhances the model's explainability and performance. ST-MambaSync effectively addresses key challenges such as data length and computational efficiency, setting new benchmarks for accuracy and processing speed through comprehensive comparative analysis. This development has significant implications for urban planning and real-time traffic management, establishing a new standard in traffic flow prediction technology.

Submitted 9 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: 11 pages. arXiv admin note: substantial text overlap with arXiv:2404.13257

MSC Class: 53A45; ACM Class: I.2.0

arXiv:2404.15357 [pdf]

On-liquid-gallium surface synthesis of ultra-smooth conductive metal-organic framework thin films

Authors: Jinxin Liu, Yunxu Chen, Xing Huang, Yanhan Ren, Mike Hambsch, David Bodesheim, Darius Pohl, Xiaodong Li, Marielle Deconinck, Bowen Zhang, Markus Löffler, Zhongquan Liao, Fengxiang Zhao, Arezoo Dianat, Gianaurelio Cuniberti, Yana Vaynzof, Junfeng Gao, Jingcheng Hao, Stefan C. B. Mannsfeld, Xinliang Feng, Renhao Dong

Abstract: Conductive metal-organic frameworks (MOFs) are emerging electroactive materials for (opto-)electronics. However, it remains a great challenge to achieve reliable MOF-based devices via the existing synthesis methods that are compatible with the complementary metal-oxide-semiconductor technology, as the surface roughness of thus-far synthetic MOF films or pellets is rather high for efficient electrode contact. Here, we develop an on-liquid-gallium surface synthesis (OLGSS) strategy under chemical vapor deposition (CVD) conditions for the controlled growth of two-dimensional conjugated MOF (2D c-MOF) thin films with ten-fold improvement of surface flatness (surface roughness can reach as low as ~2 Å) compared with MOF films grown by the traditional methods. Supported by theoretical modeling, we unveil a layer-by-layer CVD growth mode for constructing flattening surfaces, that is triggered by the high adhesion energy between gallium (Ga) and planar aromatic ligands. We further demonstrate the generality of the as-proposed OLGSS strategy by reproducing such a flat surface over nine different 2D c-MOF films with variable thicknesses (~2 to 208 nm) and large lateral sizes (over 1 cm²). The resultant ultra-smooth 2D c-MOF films enable the formation of high-quality electrical contacts with gold (Au) electrodes, leading to a reduction of contact resistance by over ten orders of magnitude compared to the traditional uneven MOF films. Furthermore, due to the efficient interfacial interaction benefited from the high-quality contacts, the prepared van der Waals heterostructure (vdWH) of OLGSS c-MOF and MoS2 exhibits intriguing photoluminescence (PL) enhancement, PL peak shift and large work function modulation. The establishment of the reliable OLGSS method provides the chances to push the development of MOF electronics and the construction of multicomponent MOF-based heterostructure materials.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.15279 [pdf, other]

Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification

Authors: Jimmy Lin, Junkai Li, Jiasi Gao, Weizhi Ma, Yang Liu

Abstract: Tactile signals collected by wearable electronics are essential in modeling and understanding human behavior. One of the main applications of tactile signals is action classification, especially in healthcare and robotics. However, existing tactile classification methods fail to capture the spatial and temporal features of tactile signals simultaneously, which results in sub-optimal performances. In this paper, we design Spatio-Temporal Aware tactility Transformer (STAT) to utilize continuous tactile signals for action classification. We propose spatial and temporal embeddings along with a new temporal pretraining task in our model, which aims to enhance the transformer in modeling the spatio-temporal features of tactile signals. Specifically, the designed temporal pretraining task is to differentiate the time order of tubelet inputs to model the temporal properties explicitly. Experimental results on a public action classification dataset demonstrate that our model outperforms state-of-the-art methods in all metrics.

Submitted 20 January, 2024; originally announced April 2024.

Comments: Accepted by AAAI 2024

arXiv:2404.14219 [pdf, other]

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra , et al. (90 additional authors not shown)

Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we also introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts.

Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 19 pages

arXiv:2404.13992 [pdf, other]

Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation

Authors: Junyu Gao, Da Zhang, Xuelong Li

Abstract: Crowd localization aims to predict the precise location of each instance within an image. Current advanced methods propose the pixel-wise binary classification to tackle the congested prediction, in which the pixel-level thresholds binarize the prediction confidence of being the pedestrian head. Since the crowd scenes suffer from extremely varying contents, counts and scales, the confidence-threshold learner is fragile and under-generalized when encountering domain knowledge shift. Moreover, most of the time, the target domain is agnostic in training. Hence, it is imperative to explore how to enhance the generalization of the confidence-threshold locator to the latent target domain. In this paper, we propose a Dynamic Proxy Domain (DPD) method to generalize the learner under domain shift. Concretely, based on the theoretical analysis of the generalization error risk upper bound on the latent target domain for a binary classifier, we propose to introduce a generated proxy domain to facilitate generalization. Then, based on the theory, we design a DPD algorithm which is composed of a training paradigm and a proxy domain generator to enhance the domain generalization of the confidence-threshold learner. Besides, we evaluate our method on five kinds of domain shift scenarios, demonstrating its effectiveness in generalizing crowd localization. Our code will be available at https://github.com/zhangda1018/DPD.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13611 [pdf, other]

Video sentence grounding with temporally global textual knowledge

Authors: Cai Chen, Runzhong Zhang, Jianjun Gao, Kejun Wu, Kim-Hui Yap, Yi Wang

Abstract: Temporal sentence grounding involves the retrieval of a video moment with a natural language query. Many existing works directly incorporate the given video and temporally localized query for temporal grounding, overlooking the inherent domain gap between different modalities. In this paper, we utilize pseudo-query features containing extensive temporally global textual knowledge sourced from the same video-query pair, to enhance the bridging of domain gaps and attain a heightened level of similarity between multi-modal features. Specifically, we propose a Pseudo-query Intermediary Network (PIN) to achieve an improved alignment of visual and comprehensive pseudo-query features within the feature space through contrastive learning. Subsequently, we utilize learnable prompts to encapsulate the knowledge of pseudo-queries, propagating them into the textual encoder and multi-modal fusion module, further enhancing the feature alignment between visual and language for better temporal grounding. Extensive experiments conducted on the Charades-STA and ActivityNet-Captions datasets demonstrate the effectiveness of our method.

Submitted 1 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13257 [pdf, other]

ST-Mamba: Spatial-Temporal Selective State Space Model for Traffic Flow Prediction

Authors: Zhiqi Shao, Michael G. H. Bell, Ze Wang, D. Glenn Geers, Haoning Xi, Junbin Gao

Abstract: Traffic flow prediction, a critical aspect of intelligent transportation systems, has been increasingly popular in the field of artificial intelligence, driven by the availability of extensive traffic data. The current challenges of traffic flow prediction lie in integrating diverse factors while balancing the trade-off between computational complexity and the precision necessary for effective long-range and large-scale predictions. To address these challenges, we introduce a Spatial-Temporal Selective State Space (ST-Mamba) model, which is the first to leverage the power of spatial-temporal learning in traffic flow prediction without using graph modeling. The ST-Mamba model can effectively capture the long-range dependency for traffic flow data, thereby avoiding the issue of over-smoothing. The proposed ST-Mamba model incorporates an effective Spatial-Temporal Mixer (ST-Mixer) to seamlessly integrate spatial and temporal data processing into a unified framework and employs a Spatial-Temporal Selective State Space (ST-SSM) block to improve computational efficiency. The proposed ST-Mamba model, specifically designed for spatial-temporal data, simplifies processing procedure and enhances generalization capabilities, thereby significantly improving the accuracy of long-range traffic flow prediction. Compared to the previous state-of-the-art (SOTA) model, the proposed ST-Mamba model achieves a 61.11% improvement in computational speed and increases prediction accuracy by 0.67%.
Extensive experiments with real-world traffic datasets demonstrate that the ST-Mamba model sets a new benchmark in traffic flow prediction, achieving SOTA performance in computational efficiency for both long- and short-range predictions and significantly improving the overall efficiency and effectiveness of traffic management.

Submitted 18 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

Comments: 25 pages, 6 figures

MSC Class: 53A45 ACM Class: I.2.0

arXiv:2404.12210 [pdf, other]

An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training

Authors: Jin Gao, Shubo Lin, Shaoru Wang, Yutong Kou, Zeming Li, Liang Li, Congxuan Zhang, Xiaoqin Zhang, Yizheng Wang, Weiming Hu

Abstract: Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features. In this paper, we question if the extremely simple lightweight ViTs' fine-tuning performance can also benefit from this pre-training paradigm, which is considerably less studied yet in contrast to the well-established lightweight architecture design methodology. We use an observation-analysis-solution flow for our study. We first systematically observe different behaviors among the evaluated pre-training methods with respect to the downstream fine-tuning data scales. Furthermore, we analyze the layer representation similarities and attention maps across the obtained models, which clearly show the inferior learning of MIM pre-training on higher layers, leading to unsatisfactory transfer performance on data-insufficient downstream tasks. This finding is naturally a guide to designing our distillation strategies during pre-training to solve the above deterioration problem. Extensive experiments have demonstrated the effectiveness of our approach. Our pre-training with distillation on pure lightweight ViTs with vanilla/hierarchical design (5.7M/6.5M) can achieve 79.4%/78.9% top-1 accuracy on ImageNet-1K. It also enables SOTA performance on the ADE20K segmentation task (42.8% mIoU) and LaSOT tracking task (66.1% AUC) in the lightweight regime. The latter even surpasses all the current SOTA lightweight CPU-realtime trackers.

Submitted 25 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: A submission to IJCV

arXiv:2404.10866 [pdf, other]

Spectroscopic measurements and models of energy deposition in the substrate of quantum circuits by natural ionizing radiation

Authors: Joseph W. Fowler, Paul Szypryt, Raymond Bunker, Ellen R. Edwards, Ian Fogarty Florang, Jiansong Gao, Andrea Giachero, Shannon F. Hoogerheide, Ben Loer, H. Pieter Mumm, Nathan Nakamura, Galen C. O'Neil, John L. Orrell, Elizabeth M. Scott, Jason Stevens, Daniel S. Swetz, Brent A. VanDevender, Michael Vissers, Joel N. Ullom

Abstract: Naturally occurring background radiation is a source of correlated decoherence events in superconducting qubits that will challenge error-correction schemes. To characterize the radiation environment in an unshielded laboratory, we performed broadband, spectroscopic measurements of background events in silicon substrates located inside a millikelvin refrigerator, an environment representative of superconducting qubit systems. We measured the background spectra in silicon substrates of two thicknesses, 0.5 mm and 1.5 mm, and obtained the average event rate and the integrated power deposition. In a 25 mm^2 area and the thinner substrate, these values are 0.023 events per second and 4.9 keV/s, counting events that deposit at least 40 keV. We find the background spectrum to be nearly featureless. Its intensity decreases by a factor of 40,000 between 100 keV and 3 MeV for silicon substrates 0.5 mm thick. We find the cryogenic measurements to be in good agreement with predictions based on measurements of the terrestrial gamma-ray flux, published models of cosmic-ray fluxes, a crude model of the cryostat, and radiation-transport simulations. No free parameters are required to predict the background spectra in the silicon substrates. The good agreement between measurements and predictions allow assessment of the relative contributions of terrestrial and cosmic background sources and their dependence on substrate thickness.
Our spectroscopic measurements are performed with superconducting microresonators that transduce deposited energy to a readily detectable electrical signal. We find that gamma-ray emissions from radioisotopes are responsible for the majority of events depositing E<1.5 MeV, while nucleons among the cosmic-ray secondary particles cause most events that deposit more energy. These results suggest several paths to reducing the impact of background radiation on quantum circuits.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.10719 [pdf, other]

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Authors: Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

Abstract: Reinforcement Learning from Human Feedback (RLHF) is currently the most widely used method to align large language models (LLMs) with human preferences. Existing RLHF methods can be roughly categorized as either reward-based or reward-free. Novel applications such as ChatGPT and Claude leverage reward-based methods that first learn a reward model and apply actor-critic algorithms, such as Proximal Policy Optimization (PPO). However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO). Is DPO truly superior to PPO? Why does PPO perform poorly on these benchmarks? In this paper, we first conduct both theoretical and empirical studies on the algorithmic properties of DPO and show that DPO may have fundamental limitations. Moreover, we also comprehensively examine PPO and reveal the key factors for the best performances of PPO in fine-tuning LLMs. Finally, we benchmark DPO and PPO across a collection of RLHF testbeds, ranging from dialogue to code generation. Experiment results demonstrate that PPO is able to surpass other alignment methods in all cases and achieve state-of-the-art results in challenging code competitions.

Submitted 21 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 16 pages, 2 figures, 14 tables

arXiv:2404.10260 [pdf, other]

HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New Heights

Authors: Xiaomin Fang, Jie Gao, Jing Hu, Lihang Liu, Yang Xue, Xiaonan Zhang, Kunrui Zhu

Abstract: While monomer protein structure prediction tools boast impressive accuracy, the prediction of protein complex structures remains a daunting challenge in the field. This challenge is particularly pronounced in scenarios involving complexes with protein chains from different species, such as antigen-antibody interactions, where accuracy often falls short. Limited by the accuracy of complex prediction, tasks based on precise protein-protein interaction analysis also face obstacles. In this report, we highlight the ongoing advancements of our protein complex structure prediction model, HelixFold-Multimer, underscoring its enhanced performance. HelixFold-Multimer provides precise predictions for diverse protein complex structures, especially in therapeutic protein interactions. Notably, HelixFold-Multimer achieves remarkable success in antigen-antibody and peptide-protein structure prediction, greatly surpassing AlphaFold 3. HelixFold-Multimer is now available for public use on the PaddleHelix platform, offering both a general version and an antigen-antibody version. Researchers can conveniently access and utilize this service for their development needs.

Submitted 17 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09920 [pdf, other]

Combined Pre-Supernova Alert System with KamLAND and Super-Kamiokande

Authors: KamLAND and Super-Kamiokande Collaborations: Seisho Abe, Minori Eizuka, Sawako Futagi, Azusa Gando, Yoshihito Gando, Shun Goto, Takahiko Hachiya, Kazumi Hata, Koichi Ichimura, Sei Ieki, Haruo Ikeda, Kunio Inoue, Koji Ishidoshiro, Yuto Kamei, Nanami Kawada, Yasuhiro Kishimoto, Masayuki Koga, Maho Kurasawa, Tadao Mitsui, Haruhiko Miyake, Daisuke Morita, Takeshi Nakahata, et al. (290 additional authors not shown)

Abstract: Preceding a core-collapse supernova, various processes produce an increasing amount of neutrinos of all flavors characterized by mounting energies from the interior of massive stars. Among them, the electron antineutrinos are potentially detectable by terrestrial neutrino experiments such as KamLAND and Super-Kamiokande via inverse beta decay interactions. Once these pre-supernova neutrinos are observed, an early warning of the upcoming core-collapse supernova can be provided. In light of this, KamLAND and Super-Kamiokande, both located in the Kamioka mine in Japan, have been monitoring pre-supernova neutrinos since 2015 and 2021, respectively. Recently, we performed a joint study between KamLAND and Super-Kamiokande on pre-supernova neutrino detection. A pre-supernova alert system combining the KamLAND detector and the Super-Kamiokande detector was developed and put into operation, which can provide a supernova alert to the astrophysics community. Fully leveraging the complementary properties of these two detectors, the combined alert is expected to resolve a pre-supernova neutrino signal from a 15 M$_{\odot}$ star within 510 pc of the Earth, at a significance level corresponding to a false alarm rate of no more than 1 per century. For a Betelgeuse-like model with optimistic parameters, it can provide early warnings up to 12 hours in advance.

Submitted 1 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: Resubmitted to ApJ. 22 pages, 16 figures, for more information about the combined pre-supernova alert system, see https://www.lowbg.org/presnalarm/

arXiv:2404.09198 [pdf, ps, other]

Unsourced Random Access in MIMO Quasi-Static Rayleigh Fading Channels with Finite Blocklength

Authors: Junyuan Gao, Yongpeng Wu, Giuseppe Caire, Wei Yang, Wenjun Zhang

Abstract: This paper explores the fundamental limits of unsourced random access (URA) with a random and unknown number ${\rm{K}}_a$ of active users in MIMO quasi-static Rayleigh fading channels. First, we derive an upper bound on the probability of incorrectly estimating the number of active users. We prove that it exponentially decays with the number of receive antennas and eventually vanishes, whereas reaches a plateau as the power and blocklength increase. Then, we derive non-asymptotic achievability and converse bounds on the minimum energy-per-bit required by each active user to reliably transmit $J$ bits with blocklength $n$. Numerical results verify the tightness of our bounds, suggesting that they provide benchmarks to evaluate existing schemes. The extra required energy-per-bit due to the uncertainty of the number of active users decreases as $\mathbb{E}[{\rm{K}}_a]$ increases. Compared to random access with individual codebooks, the URA paradigm achieves higher spectral and energy efficiency. Moreover, using codewords distributed on a sphere is shown to outperform the Gaussian random coding scheme in the non-asymptotic regime.

Submitted 14 April, 2024; originally announced April 2024.

Comments: Accepted by ISIT 2024

arXiv:2404.08725 [pdf, other]

Development of a data overflow protection system for Super-Kamiokande to maximize data from nearby supernovae

Authors: M. Mori, K. Abe, Y. Hayato, K. Hiraide, K. Hosokawa, K. Ieki, M. Ikeda, J. Kameda, Y. Kanemura, R. Kaneshima, Y. Kashiwagi, Y. Kataoka, S. Miki, S. Mine, M. Miura, S. Moriyama, Y. Nakano, M. Nakahata, S. Nakayama, Y. Noguchi, K. Okamoto, K. Sato, H. Sekiya, H. Shiba, K. Shimizu, et al. (230 additional authors not shown)

Abstract: Neutrinos from very nearby supernovae, such as Betelgeuse, are expected to generate more than ten million events over 10 s in Super-Kamiokande (SK). At such large event rates, the buffers of the SK analog-to-digital conversion board (QBEE) will overflow, causing random loss of data that is critical for understanding the dynamics of the supernova explosion mechanism. In order to solve this problem, two new DAQ modules were developed to aid in the observation of very nearby supernovae. The first of these, the SN module, is designed to save only the number of hit PMTs during a supernova burst and the second, the Veto module, prescales the high rate neutrino events to prevent the QBEE from overflowing based on information from the SN module. In the event of a very nearby supernova, these modules allow SK to reconstruct the time evolution of the neutrino event rate from beginning to end using both QBEE and SN module data. This paper presents the development and testing of these modules together with an analysis of supernova-like data generated with a flashing laser diode. We demonstrate that the Veto module successfully prevents DAQ overflows for Betelgeuse-like supernovae as well as the long-term stability of the new modules. During normal running the Veto module is found to issue DAQ vetos a few times per month resulting in a total dead time less than 1 ms, and does not influence ordinary operations.
Additionally, using simulation data we find that supernovae closer than 800 pc will trigger the Veto module, resulting in a prescaling of the observed neutrino data.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 28 pages, 18 figures. Submitted to PTEP

arXiv:2404.08365 [pdf, other]

Estimation and Inference for Three-Dimensional Panel Data Models

Authors: Guohua Feng, Jiti Gao, Fei Liu, Bin Peng

Abstract: Hierarchical panel data models have recently garnered significant attention. This study contributes to the relevant literature by introducing a novel three-dimensional (3D) hierarchical panel data model, which integrates panel regression with three sets of latent factor structures: one set of global factors and two sets of local factors. Instead of aggregating latent factors from various nodes, as seen in the literature of distributed principal component analysis (PCA), we propose an estimation approach capable of recovering the parameters of interest and disentangling latent factors at different levels and across different dimensions. We establish an asymptotic theory and provide a bootstrap procedure to obtain inference for the parameters of interest while accommodating various types of cross-sectional dependence and time series autocorrelation. Finally, we demonstrate the applicability of our framework by examining productivity convergence in manufacturing industries worldwide.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.06047 [pdf, other]

A Systematic Literature Survey of Sparse Matrix-Vector Multiplication

Authors: Jianhua Gao, Bingjie Liu, Weixing Ji, Hua Huang

Abstract: Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various optimization contributions. However, the comprehensive and systematic literature survey that introduces, analyzes, discusses, and summarizes the advancements of SpMV in recent years is currently lacking. Aiming to fill this gap, this paper compares existing techniques and analyzes their strengths and weaknesses. We begin by highlighting two representative applications of SpMV, then conduct an in-depth overview of the important techniques that optimize SpMV on modern architectures, which we specifically classify as classic, auto-tuning, machine learning, and mixed-precision-based optimization. We also elaborate on the hardware-based architectures, including CPU, GPU, FPGA, processing in Memory, heterogeneous, and distributed platforms. We present a comprehensive experimental evaluation that compares the performance of state-of-the-art SpMV implementations. Based on our findings, we identify several challenges and point out future research directions. This survey is intended to provide researchers with a comprehensive understanding of SpMV optimization on modern architectures and provide guidance for future work.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 34 pages, 18 figures, 16 tables

MSC Class: 68-02; 68W10; 65F50 ACM Class: A.1; D.1.3; G.1.3

arXiv:2404.05861 [pdf, other]

The increasing fragmentation of global science limits the diffusion of ideas

Authors: Alexander J. Gates, Indraneel Mane, Jianjian Gao

Abstract: The global scientific landscape emerges from a complex interplay of collaboration and competition, where nations vie for dominance while simultaneously fostering the diffusion of knowledge on a global scale. This raises crucial questions: What underlying patterns govern international scientific recognition and influence? How does this structure impact knowledge dissemination? Traditional models view the global scientific ecosystem through a core-periphery lens, with Western nations dominating knowledge production. Here, we investigate the dynamics of international scientific recognition through the lens of national preferences, introducing a novel signed measure to characterize national citation preferences and enabling a network analysis of international scientific recognition. We find that scientific recognition is related to cultural and political factors in addition to economic strength and scientific quality. Our analysis challenges the conventional core-periphery narrative, uncovering instead several communities of international knowledge production that are rapidly fragmenting the scientific recognition ecosystem. Moreover, we provide compelling evidence that this network significantly constrains the diffusion of ideas across international borders.
The resulting network framework for global scientific recognition sheds light on the barriers and opportunities for collaboration, innovation, and the equitable recognition of scientific advancements, with significant consequences for policymakers seeking to foster inclusive and impactful international scientific endeavors.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 30 pages (main text), 3 figures (main text), 1 table (main text), 20 SI pages

arXiv:2404.01677 [pdf, other]

Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation

Authors: Zhouhao Sun, Xiao Ding, Li Du, Bibo Cai, Jinglong Gao, Ting Liu, Qin Bing

Abstract: Large language models (LLMs) have achieved significant performance in various natural language reasoning tasks. However, they still struggle with performing first-order logic reasoning over formal logical theories expressed in natural language. This is because the previous LLMs-based reasoning systems have the theoretical incompleteness issue. As a result, it can only address a limited set of simple reasoning problems, which significantly decreases their generalization ability. To address this issue, we propose a novel framework, named Generalizable and Faithful Reasoner (GFaiR), which introduces the paradigm of resolution refutation. Resolution refutation has the capability to solve all first-order logic reasoning problems by extending reasoning rules and employing the principle of proof by contradiction, so our system's completeness can be improved by introducing resolution refutation. Experimental results demonstrate that our system outperforms previous works by achieving state-of-the-art performances in complex scenarios while maintaining performances in simple scenarios. Besides, we observe that GFaiR is faithful to its reasoning process.

Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: LREC-Coling 2024

arXiv:2404.01213 [pdf, ps, other]

Bifurcation on Fully Nonlinear Elliptic Equations and Systems

Authors: Jing Gao, Weijun Zhang, Zhitao Zhang

Abstract: In this paper, we study the following fully nonlinear elliptic equations \begin{equation*} \left\{\begin{array}{rl} \left(S_{k}(D^{2}u)\right)^{\frac1k}=\lambda f(-u) & in\quad\Omega\\ u=0 & on\quad \partial\Omega\\ \end{array} \right. \end{equation*} and coupled systems \begin{equation*} \left\{\begin{array}{rl} (S_{k}(D^{2}u))^\frac1k=\lambda g(-u,-v) & in\quad\Omega\\ (S_{k}(D^{2}v))^\frac1k=\lambda h(-u,-v) & in\quad\Omega\\ u=v=0 & on\quad \partial\Omega\\ \end{array} \right. \end{equation*} dominated by $k$-Hessian operators, where $\Omega$ is a $(k$-$1)$-convex bounded domain in $\mathbb{R}^{N}$, $\lambda$ is a non-negative parameter, $f:\left[0,+\infty\right)\rightarrow\left[0,+\infty\right)$ is a continuous function with zeros only at $0$ and $g,h:\left[0,+\infty\right)\times \left[0,+\infty\right)\rightarrow \left[0,+\infty\right)$ are continuous functions with zeros only at $(\cdot,0)$ and $(0,\cdot)$. We determine the interval of $\lambda$ about the existence, non-existence, uniqueness and multiplicity of $k$-convex solutions to the above problems according to various cases of $f,g,h$, which is a complete supplement to the known results in previous literature. In particular, the above results are also new for Laplacian and Monge-Ampère operators. We mainly use bifurcation theory, a-priori estimates, various maximum principles and technical strategies in the proof.

Submitted 1 April, 2024; originally announced April 2024.

Comments: comments are welcome!

arXiv:2404.00942 [pdf, other]

Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs

Authors: Xiaoze Liu, Feijie Wu, Tianyang Xu, Zhuo Chen, Yichi Zhang, Xiaoqian Wang, Jing Gao

Abstract: The advent of Large Language Models (LLMs) has significantly transformed the AI landscape, enhancing machine learning and AI capabilities. Factuality issue is a critical concern for LLMs, as they may generate factually incorrect responses. In this paper, we propose GraphEval to evaluate an LLM's performance using a substantially large test dataset. Specifically, the test dataset is retrieved from a large knowledge graph with more than 10 million facts without expensive human efforts. Unlike conventional methods that evaluate LLMs based on generated responses, GraphEval streamlines the evaluation process by creating a judge model to estimate the correctness of the answers given by the LLM. Our experiments demonstrate that the judge model's factuality assessment aligns closely with the correctness of the LLM's generated outputs, while also substantially reducing evaluation costs. Besides, our findings offer valuable insights into LLM performance across different metrics and highlight the potential for future improvements in ensuring the factual integrity of LLM outputs. The code is publicly available at https://github.com/xz-liu/GraphEval.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00629 [pdf, other]

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

Authors: Lizhi Lin, Honglin Mu, Zenan Zhai, Minghan Wang, Yuxia Wang, Renxi Wang, Junjie Gao, Yixuan Zhang, Wanxiang Che, Timothy Baldwin, Xudong Han, Haonan Li

Abstract: Generative models are rapidly gaining popularity and being integrated into everyday applications, raising concerns over their safety issues as various vulnerabilities are exposed. Faced with the problem, the field of red teaming is experiencing fast-paced growth, which highlights the need for a comprehensive organization covering the entire pipeline and addressing emerging topics for the community. Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of language models. Additionally, we have developed the searcher framework that unifies various automatic red teaming approaches. Moreover, our survey covers novel areas including multimodal attacks and defenses, risks around multilingual models, overkill of harmless queries, and safety of downstream applications. We hope this survey can provide a systematic perspective on the field and unlock new areas of research.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.00405 [pdf, other]

doi 10.1145/3613905.3650786

A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration

Authors: Jie Gao, Simret Araya Gebreegziabher, Kenny Tsu Wei Choo, Toby Jia-Jun Li, Simon Tangi Perrault, Thomas W. Malone

Abstract: With ChatGPT's release, conversational prompting has become the most popular form of human-LLM interaction. However, its effectiveness is limited for more complex tasks involving reasoning, creativity, and iteration. Through a systematic analysis of HCI papers published since 2021, we identified four key phases in the human-LLM interaction flow - planning, facilitating, iterating, and testing - to precisely understand the dynamics of this process. Additionally, we have developed a taxonomy of four primary interaction modes: Mode 1: Standard Prompting, Mode 2: User Interface, Mode 3: Context-based, and Mode 4: Agent Facilitator. This taxonomy was further enriched using the "5W1H" guideline method, which involved a detailed examination of definitions, participant roles (Who), the phases that happened (When), human objectives and LLM abilities (What), and the mechanics of each interaction mode (How). We anticipate this taxonomy will contribute to the future design and evaluation of human-LLM interaction.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 11 pages, 4 figures, 3 tables. Accepted at CHI Late-Breaking Work 2024

arXiv:2403.19881 [pdf, other]

IME: Integrating Multi-curvature Shared and Specific Embedding for Temporal Knowledge Graph Completion

Authors: Jiapu Wang, Zheng Cui, Boyue Wang, Shirui Pan, Junbin Gao, Baocai Yin, Wen Gao

Abstract: Temporal Knowledge Graphs (TKGs) incorporate a temporal dimension, allowing for a precise capture of the evolution of knowledge and reflecting the dynamic nature of the real world. Typically, TKGs contain complex geometric structures, with various geometric structures interwoven. However, existing Temporal Knowledge Graph Completion (TKGC) methods either model TKGs in a single space or neglect the heterogeneity of different curvature spaces, thus constraining their capacity to capture these intricate geometric structures. In this paper, we propose a novel Integrating Multi-curvature shared and specific Embedding (IME) model for TKGC tasks. Concretely, IME models TKGs into multi-curvature spaces, including hyperspherical, hyperbolic, and Euclidean spaces. Subsequently, IME incorporates two key properties, namely space-shared property and space-specific property. The space-shared property facilitates the learning of commonalities across different curvature spaces and alleviates the spatial gap caused by the heterogeneous nature of multi-curvature spaces, while the space-specific property captures characteristic features. Meanwhile, IME proposes an Adjustable Multi-curvature Pooling (AMP) approach to effectively retain important information. Furthermore, IME innovatively designs similarity, difference, and structure loss functions to attain the stated objective. Experimental results clearly demonstrate the superior performance of IME over existing state-of-the-art TKGC models.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19739 [pdf, other]

Detecting Light Dark Matter with Kinetic Inductance Detectors

Authors: Jiansong Gao, Yonit Hochberg, Benjamin V. Lehmann, Sae Woo Nam, Paul Szypryt, Michael R. Vissers, Tao Xu

Abstract: Superconducting detectors are a promising technology for probing dark matter at extremely low masses, where dark matter interactions are currently unconstrained. Realizing the potential of such detectors requires new readout technologies to achieve the lowest possible thresholds for deposited energy. Here we perform a prototype search for dark matter--electron interactions with kinetic inductance detectors (KIDs), a class of superconducting detector originally designed for infrared astronomy applications. We demonstrate that existing KIDs can achieve effective thresholds as low as 0.2 eV, and we use existing data to set new dark matter constraints. The relative maturity of the technology underlying KIDs means that this platform can be scaled significantly with existing tools, enabling powerful new searches in the coming years.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 6+6 pages, 4+3 figures

Report number: MIT-CTP/5654

arXiv:2403.19094 [pdf, other]

Learning From Correctness Without Prompting Makes LLM Efficient Reasoner

Authors: Yuxuan Yao, Han Wu, Zhijiang Guo, Biyan Zhou, Jiahui Gao, Sichun Luo, Hanxu Hou, Xiaojin Fu, Linqi Song

Abstract: Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content. One potential approach to mitigate these issues is learning from human or external feedback (e.g. tools). In this paper, we introduce an intrinsic self-correct reasoning framework for LLMs that eliminates the need for human feedback, external tools, and handcraft prompts. The proposed framework, based on a multi-step reasoning paradigm Learning from Correctness (LeCo), improves reasoning performance without needing to learn from errors. This paradigm prioritizes learning from correct reasoning steps, and a unique method to measure confidence for each reasoning step based on generation logits. Experimental results across various multi-step reasoning tasks demonstrate the effectiveness of the framework in improving reasoning performance with reduced token consumption.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.18528 [pdf, other]

Limited Attention Allocation in a Stochastic Linear Quadratic System with Multiplicative Noise

Authors: Xiangyu Cui, Jianjun Gao, Lingjie Kong

Abstract: This study addresses limited attention allocation in a stochastic linear quadratic system with multiplicative noise. Our approach enables strategic resource allocation to enhance noise estimation and improve control decisions. We provide analytical optimal control and propose a numerical method for optimal attention allocation. Additionally, we apply our findings to dynamic mean-variance portfolio selection, showing effective resource allocation across time periods and factors, providing valuable insights for investors.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.17753 [pdf, other]

CCDSReFormer: Traffic Flow Prediction with a Criss-Crossed Dual-Stream Enhanced Rectified Transformer Model

Authors: Zhiqi Shao, Michael G. H. Bell, Ze Wang, D. Glenn Geers, Xusheng Yao, Junbin Gao

Abstract: Accurate and effective traffic forecasting is vital for smart traffic systems, crucial in urban traffic planning and management. Current Spatio-Temporal Transformer models, despite their prediction capabilities, struggle with balancing computational efficiency and accuracy, favoring global over local information, and handling spatial and temporal data separately, limiting insight into complex interactions. We introduce the Criss-Crossed Dual-Stream Enhanced Rectified Transformer model (CCDSReFormer), which includes three innovative modules: Enhanced Rectified Spatial Self-attention (ReSSA), Enhanced Rectified Delay Aware Self-attention (ReDASA), and Enhanced Rectified Temporal Self-attention (ReTSA). These modules aim to lower computational needs via sparse attention, focus on local information for better traffic dynamics understanding, and merge spatial and temporal insights through a unique learning method. Extensive tests on six real-world datasets highlight CCDSReFormer's superior performance. An ablation study also confirms the significant impact of each component on the model's predictive accuracy, showcasing our model's ability to forecast traffic flow effectively.

Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: 18 pages

ACM Class: I.2.0

arXiv:2403.15750 [pdf, other]

iDAT: inverse Distillation Adapter-Tuning

Authors: Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Daize Dong, Suncheng Xiang, Ting Liu, Yuzhuo Fu

Abstract: Adapter-Tuning (AT) method involves freezing a pre-trained model and introducing trainable adapter modules to acquire downstream knowledge, thereby calibrating the model for better adaptation to downstream tasks. This paper proposes a distillation framework for the AT method instead of crafting a carefully designed adapter module, which aims to improve fine-tuning performance. For the first time, we explore the possibility of combining the AT method with knowledge distillation. Via statistical analysis, we observe significant differences in the knowledge acquisition between adapter modules of different models. Leveraging these differences, we propose a simple yet effective framework called inverse Distillation Adapter-Tuning (iDAT). Specifically, we designate the smaller model as the teacher and the larger model as the student. The two are jointly trained, and online knowledge distillation is applied to inject knowledge of different perspective to student model, and significantly enhance the fine-tuning performance on downstream tasks. Extensive experiments on the VTAB-1K benchmark with 19 image classification tasks demonstrate the effectiveness of iDAT. The results show that using existing AT method within our iDAT framework can further yield a 2.66% performance gain, with only an additional 0.07M trainable parameters. Our approach compares favorably with state-of-the-arts without bells and whistles. Our code is available at https://github.com/JCruan519/iDAT.

Submitted 23 March, 2024; originally announced March 2024.

Comments: 10 pages, 9 figures, 13 tables. This paper has been accepted by ICME 2024

arXiv:2403.15712 [pdf, other]

doi 10.1109/LRA.2024.3379865

PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search

Authors: Chensheng Peng, Zhaoyu Zeng, Jinling Gao, Jundong Zhou, Masayoshi Tomizuka, Xinbing Wang, Chenghu Zhou, Nanyang Ye

Abstract: Multiple object tracking is a critical task in autonomous driving. Existing works primarily focus on the heuristic design of neural networks to obtain high accuracy. As tracking accuracy improves, however, neural networks become increasingly complex, posing challenges for their practical application in real driving scenarios due to the high level of latency. In this paper, we explore the use of the neural architecture search (NAS) methods to search for efficient architectures for tracking, aiming for low real-time latency while maintaining relatively high accuracy. Another challenge for object tracking is the unreliability of a single sensor, therefore, we propose a multi-modal framework to improve the robustness. Experiments demonstrate that our algorithm can run on edge devices within lower latency constraints, thus greatly reducing the computational requirements for multi-modal object tracking while keeping lower latency.

Submitted 23 March, 2024; originally announced March 2024.

Comments: IEEE Robotics and Automation Letters 2024. Code is available at https://github.com/PholyPeng/PNAS-MOT

Journal ref: IEEE Robotics and Automation Letters, 2024

arXiv:2403.15385 [pdf, other]

LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

Authors: Kevin Xie, Jonathan Lorraine, Tianshi Cao, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng

Abstract: Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt. Amortized methods like ATT3D optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, they cannot capture high-frequency geometry and texture details and struggle to scale to large prompt sets, so they generalize poorly. We introduce LATTE3D, addressing these limitations to achieve fast, high-quality generation on a significantly larger prompt set. Key to our method is 1) building a scalable architecture and 2) leveraging 3D data during optimization through 3D-aware diffusion priors, shape regularization, and model initialization to achieve robustness to diverse and complex training prompts. LATTE3D amortizes both neural field and textured surface generation to produce highly detailed textured meshes in a single forward pass. LATTE3D generates 3D objects in 400ms, and can be further enhanced with fast test-time optimization.

Submitted 22 March, 2024; originally announced March 2024.

Comments: See the project website at https://research.nvidia.com/labs/toronto-ai/LATTE3D/

MSC Class: 68T45 ACM Class: I.2.6; I.2.7; I.3.6; I.3.7

arXiv:2403.14000 [pdf, other]

Visual Imitation Learning of Task-Oriented Object Grasping and Rearrangement

Authors: Yichen Cai, Jianfeng Gao, Christoph Pohl, Tamim Asfour

Abstract: Task-oriented object grasping and rearrangement are critical skills for robots to accomplish different real-world manipulation tasks. However, they remain challenging due to partial observations of the objects and shape variations in categorical objects. In this paper, we propose the Multi-feature Implicit Model (MIMO), a novel object representation that encodes multiple spatial features between a point and an object in an implicit neural field. Training such a model on multiple features ensures that it embeds the object shapes consistently in different aspects, thus improving its performance in object shape reconstruction from partial observation, shape similarity measure, and modeling spatial relations between objects. Based on MIMO, we propose a framework to learn task-oriented object grasping and rearrangement from single or multiple human demonstration videos. The evaluations in simulation show that our approach outperforms the state-of-the-art methods for multi- and single-view observations. Real-world experiments demonstrate the efficacy of our approach in one- and few-shot imitation learning of manipulation tasks.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13696 [pdf, ps, other]

Electron wave spin in a cavity

Authors: Ju Gao, Fang Shen

Abstract: Our study reveals electron spin in a cavity as a stable circulating current density, characterized by a torus topology. This current density circulates concentrically beyond the cavity boundary, illustrating the concept of evanescent wave spin. While the interaction with a uniform magnetic field aligns with established spin-field observations, our analysis of regional contributions deviates from particle-based spin predictions. The integration of charge and spin properties into a single Lorentz covariant entity suggests that the electron wave constitutes the fundamental and deterministic reality of the electron.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 12 pages, 2 figures

arXiv:2403.13443 [pdf, other]

Fast-Poly: A Fast Polyhedral Framework For 3D Multi-Object Tracking

Authors: Xiaoyu Li, Dedong Liu, Lijun Zhao, Yitao Wu, Xian Wu, Jinghan Gao

Abstract: 3D Multi-Object Tracking (MOT) captures stable and comprehensive motion states of surrounding obstacles, essential for robotic perception. However, current 3D trackers face issues with accuracy and latency consistency. In this paper, we propose Fast-Poly, a fast and effective filter-based method for 3D MOT. Building upon our previous work Poly-MOT, Fast-Poly addresses object rotational anisotropy in 3D space, enhances local computation densification, and leverages parallelization technique, improving inference speed and precision. Fast-Poly is extensively tested on two large-scale tracking benchmarks with Python implementation. On the nuScenes dataset, Fast-Poly achieves new state-of-the-art performance with 75.8% AMOTA among all methods and can run at 34.2 FPS on a personal CPU. On the Waymo dataset, Fast-Poly exhibits competitive accuracy with 63.6% MOTA and impressive inference speed (35.5 FPS). The source code is publicly available at https://github.com/lixiaoyu2000/FastPoly.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 1st on the NuScenes Tracking benchmark with 75.8 AMOTA and 34.2 FPS

arXiv:2403.12982 [pdf]

Knowledge-Reuse Transfer Learning Methods in Molecular and Material Science

Authors: An Chen, Zhilong Wang, Karl Luigi Loza Vidaurre, Yanqiang Han, Simin Ye, Kehao Tao, Shiwei Wang, Jing Gao, Jinjin Li

Abstract: Molecules and materials are the foundation for the development of modern advanced industries such as energy storage systems and semiconductor devices. However, traditional trial-and-error methods or theoretical calculations are highly resource-intensive, and extremely long R&D (Research and Development) periods cannot meet the urgent need for molecules/materials in industrial development. Machine learning (ML) methods based on big data are expected to break this dilemma. However, the difficulty in constructing large-scale datasets of new molecules/materials due to the high cost of data acquisition and annotation limits the development of machine learning. The application of transfer learning lowers the data requirements for model training, which makes transfer learning stand out in researches addressing data quality issues. In this review, we summarize recent advances in transfer learning related to molecular and materials science. We focus on the application of transfer learning methods for the discovery of advanced molecules/materials, particularly, the construction of transfer learning frameworks for different systems, and how transfer learning can enhance the performance of models. In addition, the challenges of transfer learning are also discussed.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 42 pages, 10 figures

arXiv:2403.12945 [pdf, other]

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park , et al. (74 additional authors not shown)

Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Project website: https://droid-dataset.github.io/

arXiv:2403.12919 [pdf, ps, other]

Generalized Ramsey--Turán density for cliques

Authors: Jun Gao, Suyun Jiang, Hong Liu, Maya Sankar

Abstract: We study the generalized Ramsey--Turán function $\mathrm{RT}(n,K_s,K_t,o(n))$, which is the maximum possible number of copies of $K_s$ in an $n$-vertex $K_t$-free graph with independence number $o(n)$. The case when $s=2$ was settled by Erdős, Sós, Bollobás, Hajnal, and Szemerédi in the 1980s. We combinatorially resolve the general case for all $s\ge 3$, showing that the (asymptotic) extremal graphs for this problem have simple (bounded) structures. In particular, it implies that the extremal structures follow a periodic pattern when $t$ is much larger than $s$. Our results disprove a conjecture of Balogh, Liu, and Sharifzadeh and show that a relaxed version does hold.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 28 pages

arXiv:2403.12109 [pdf, other]

GCAM: Gaussian and causal-attention model of food fine-grained recognition

Authors: Guohang Zhuang, Yue Hu, Tianxing Yan, JiaZhan Gao

Abstract: Currently, most food recognition relies on deep learning for category classification. However, these approaches struggle to effectively distinguish between visually similar food samples, highlighting the pressing need to address fine-grained issues in food recognition. To mitigate these challenges, we propose the adoption of a Gaussian and causal-attention model for fine-grained object recognition. In particular, we train to obtain Gaussian features over target regions, followed by the extraction of fine-grained features from the objects, thereby enhancing the feature mapping capabilities of the target regions. To counteract data drift resulting from uneven data distributions, we employ a counterfactual reasoning approach. By using counterfactual interventions, we analyze the impact of the learned image attention mechanism on network predictions, enabling the network to acquire more useful attention weights for fine-grained image recognition. Finally, we design a learnable loss strategy to balance training stability across various modules, ultimately improving the accuracy of the final target recognition. We validate our approach on four relevant datasets, demonstrating its excellent performance across these four datasets. We experimentally show that GCAM surpasses state-of-the-art methods on the ETH-FOOD101, UECFOOD256, and Vireo-FOOD172 datasets. Furthermore, our approach also achieves state-of-the-art performance on the CUB-200 dataset.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 23 pages, 11 figures

arXiv:2403.11354 [pdf, other]

Kinetic inductance traveling wave amplifier designs for practical microwave readout applications

Authors: A. Giachero, M. Visser, J. Wheeler, L. Howe, J. Gao, J. Austermann, J. Hubmayr, A. Nucciotti, J. Ullom

Abstract: A Kinetic Inductance Traveling Wave amplifier (KIT) utilizes the nonlinear kinetic inductance of superconducting films, particularly Niobium Titanium Nitride (NbTiN), for parametric amplification. These amplifiers achieve remarkable performance in terms of gain, bandwidth, compression power, and frequently approach the quantum limit for noise. However, most KIT demonstrations have been isolated from practical device readout systems. Using a KIT as the first amplifier in the readout chain of an unoptimized microwave SQUID multiplexer coupled to a transition-edge sensor microcalorimeter we see an initial improvement in the flux noise. One challenge in KIT integration is the considerable microwave pump power required to drive the non-linearity. To address this, we have initiated efforts to reduce the pump power by using thinner NbTiN films and an inverted microstrip transmission line design. In this article, we present the new transmission line design, fabrication procedure, and initial device characterization -- including gain and added noise. These devices exhibit over 10 dB of gain with a 3 dB bandwidth of approximately 5.5-7.25 GHz, a maximum practical gain of 12 dB and typical gain ripple under 4 dB peak-to-peak. We observe an appreciable impedance mismatch in the NbTiN transmission line, which is likely the source of the majority of the gain ripple. Finally we perform an initial noise characterization and demonstrate system-added noise of three quanta or less over nearly the entire 3 dB bandwidth.

Submitted 20 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11136 [pdf, other]

Is Contrastive Learning Necessary? A Study of Data Augmentation vs Contrastive Learning in Sequential Recommendation

Authors: Peilin Zhou, You-Liang Huang, Yueqi Xie, Jingqi Gao, Shoujin Wang, Jae Boum Kim, Sunghun Kim

Abstract: Sequential recommender systems (SRS) are designed to predict users' future behaviors based on their historical interaction data. Recent research has increasingly utilized contrastive learning (CL) to leverage unsupervised signals to alleviate the data sparsity issue in SRS. In general, CL-based SRS first augments the raw sequential interaction data by using data augmentation strategies and employs a contrastive training scheme to enforce the representations of those sequences from the same raw interaction data to be similar. Despite the growing popularity of CL, data augmentation, as a basic component of CL, has not received sufficient attention. This raises the question: Is it possible to achieve superior recommendation results solely through data augmentation? To answer this question, we benchmark eight widely used data augmentation strategies, as well as state-of-the-art CL-based SRS methods, on four real-world datasets under both warm- and cold-start settings. Intriguingly, the conclusion drawn from our study is that, certain data augmentation strategies can achieve similar or even superior performance compared with some CL-based methods, demonstrating the potential to significantly alleviate the data sparsity issue with fewer computational overhead. We hope that our study can further inspire more fundamental studies on the key functional components of complex CL techniques. Our processed datasets and codes are available at https://github.com/AIM-SE/DA4Rec.

Submitted 17 March, 2024; originally announced March 2024.

Comments: Accepted by WWW 2024

arXiv:2403.10770 [pdf, ps, other]

Local existence and uniqueness of solution to the two-dimensional inhomogeneous Prandtl equations by energy method

Authors: Jincheng Gao, Lianyun Peng, Zheng-an Yao

Abstract: In this paper, we consider the local existence and uniqueness result for the inhomogeneous Prandtl equations in dimension two by energy method. First of all, for the homogeneous case, the local-in-time well-posedness theory of unsteady Prandtl equations was obtained by [Alexandre, Wang, Xu, Yang, J. Am. Math. Soc., 28 (3), 745-784 (2015)] and [Masmoudi, Wong, Comm. Pure Appl. Math., 68 (10), 1683-1741 (2015)] independently by energy method without any transformation. However, for the inhomogeneous case, the appearance of density will create some new difficulties for us to overcome the loss of tangential derivative of horizontal velocity. Thus, our first result is to overcome the loss of tangential derivative such that one can establish the local-in-time well-posedness result for the inhomogeneous Prandtl equations by energy method. Secondly, for the homogeneous case, the local-in-x well-posedness in higher regular space for the steady Prandtl equations was obtained by [Guo, Iyer, Comm. Math. Phys., 382 (3), 1403-447 (2021)] by energy method since they firstly found the good quantity(called `quotient'). With the help of this quotient, our second result is to establish the local-in-x well-posedness in higher regular Sobolev space for the steady inhomogeneous Prandtl equations.

Submitted 15 March, 2024; originally announced March 2024.

Showing 101–150 of 2,123 results for author: Gao, J