-
Towards Continual Knowledge Graph Embedding via Incremental Distillation
Authors:
Jiajun Liu,
Wenjun Ke,
Peng Wang,
Ziyu Shang,
Jinhua Gao,
Guozheng Li,
Ke Ji,
Yanhe Liu
Abstract:
Traditional knowledge graph embedding (KGE) methods typically require preserving the entire knowledge graph (KG) with significant training costs when new knowledge emerges. To address this issue, the continual knowledge graph embedding (CKGE) task has been proposed to train the KGE model by learning emerging knowledge efficiently while simultaneously preserving old knowledge well. However, the explicit graph structure in KGs, which is critical for the above goal, has been largely ignored by existing CKGE methods. On the one hand, existing methods usually learn new triples in a random order, destroying the inner structure of new KGs. On the other hand, old triples are preserved with equal priority, failing to alleviate catastrophic forgetting effectively. In this paper, we propose a competitive method for CKGE based on incremental distillation (IncDE), which makes full use of the explicit graph structure in KGs. First, to optimize the learning order, we introduce a hierarchical strategy, ranking new triples for layer-by-layer learning. By employing the inter- and intra-hierarchical orders together, new triples are grouped into layers based on the graph structure features. Second, to preserve the old knowledge effectively, we devise a novel incremental distillation mechanism, which facilitates the seamless transfer of entity representations from the previous layer to the next one, promoting old knowledge preservation. Finally, we adopt a two-stage training paradigm to avoid the over-corruption of old knowledge influenced by under-trained new knowledge. Experimental results demonstrate the superiority of IncDE over state-of-the-art baselines. Notably, the incremental distillation mechanism contributes to improvements of 0.2%-6.5% in the mean reciprocal rank (MRR) score.
Submitted 7 May, 2024;
originally announced May 2024.
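The MRR score quoted in the abstract above is the mean, over test queries, of the reciprocal of the rank at which the correct answer appears. A minimal sketch of that metric follows, assuming 1-based ranks; the function name is illustrative and is not taken from the IncDE code.

```python
def mean_reciprocal_rank(ranks):
    """Return the mean of 1/rank over a non-empty list of 1-based ranks."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    # Each query contributes the reciprocal of the rank of its correct answer.
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct entity for three test queries.
example_mrr = mean_reciprocal_rank([1, 2, 4])
```

A rank of 1 contributes 1.0, so MRR lies in (0, 1] and rewards placing the correct entity near the top of the ranking.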
-
Prospective Role of Foundation Models in Advancing Autonomous Vehicles
Authors:
Jianhua Wu,
Bingzhao Gao,
Jincheng Gao,
Jianhao Yu,
Hongqing Chu,
Qiankun Yu,
Xun Gong,
Yi Chang,
H. Eric Tseng,
Hong Chen,
Jie Chen
Abstract:
With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, Sora, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can contribute to enhancing scene understanding and reasoning. By pre-training on rich linguistic and visual data, FMs can understand and interpret various elements in a driving scene, and provide cognitive reasoning to give linguistic and action instructions for driving decisions and planning. Furthermore, FMs can augment data based on the understanding of driving scenarios to provide feasible scenes of those rare occurrences in the long tail distribution that are unlikely to be encountered during routine driving and data collection. The enhancement can subsequently lead to improvement in the accuracy and reliability of autonomous driving systems. Another testament to the potential of FMs' applications lies in World Models, exemplified by the DREAMER series, which showcases the ability to comprehend physical laws and dynamics. Learning from massive data under the paradigm of self-supervised learning, World Models can generate unseen yet plausible driving environments, facilitating the enhancement in the prediction of road users' behaviors and the off-line training of driving strategies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By utilizing the powerful capabilities of FMs, we strive to tackle the potential issues stemming from the long-tail distribution in autonomous driving, consequently advancing overall safety in this domain.
Submitted 17 May, 2024; v1 submitted 8 December, 2023;
originally announced May 2024.
-
Multispectral Fine-Grained Classification of Blackgrass in Wheat and Barley Crops
Authors:
Madeleine Darbyshire,
Shaun Coutts,
Eleanor Hammond,
Fazilet Gokbudak,
Cengiz Oztireli,
Petra Bosilj,
Junfeng Gao,
Elizabeth Sklar,
Simon Parsons
Abstract:
As the burden of herbicide resistance grows and the environmental repercussions of excessive herbicide use become clear, new ways of managing weed populations are needed. This is particularly true for cereal crops, like wheat and barley, that are staple food crops and occupy a globally significant portion of agricultural land. Even small improvements in weed management practices across these major food crops worldwide would yield considerable benefits for both the environment and global food security. Blackgrass is a major grass weed which causes particular problems in cereal crops in north-west Europe, a major cereal production area, because it has high levels of herbicide resistance and is well adapted to agronomic practice in this region. With the use of machine vision and multispectral imaging, we investigate the effectiveness of state-of-the-art methods to identify blackgrass in wheat and barley crops. As part of this work, we provide a large dataset with which we evaluate several key aspects of blackgrass weed recognition. Firstly, we determine the performance of different CNN and transformer-based architectures on images from unseen fields. Secondly, we demonstrate the role that different spectral bands have on the performance of weed classification. Lastly, we evaluate the role of dataset size in classification performance for each of the models trialled. We find that even with a fairly modest quantity of training data an accuracy of almost 90% can be achieved on images from unseen fields.
Submitted 3 May, 2024;
originally announced May 2024.
-
Rapid Mobile App Development for Generative AI Agents on MIT App Inventor
Authors:
Jaida Gao,
Calab Su,
Etai Miller,
Kevin Lu,
Yu Meng
Abstract:
The evolution of Artificial Intelligence (AI) stands as a pivotal force shaping our society, finding applications across diverse domains such as education, sustainability, and safety. Leveraging AI within mobile applications makes it easily accessible to the public, catalyzing its transformative potential. In this paper, we present a methodology for the rapid development of AI agent applications using the development platform provided by MIT App Inventor. To demonstrate its efficacy, we share the development journey of three distinct mobile applications: SynchroNet for fostering sustainable communities; ProductiviTeams for addressing procrastination; and iHELP for enhancing community safety. All three applications seamlessly integrate a spectrum of generative AI features, leveraging OpenAI APIs. Furthermore, we offer insights gleaned from overcoming challenges in integrating diverse tools and AI functionalities, aiming to inspire young developers to join our efforts in building practical AI agent applications.
Submitted 31 March, 2024;
originally announced May 2024.
-
Graph is all you need? Lightweight data-agnostic neural architecture search without training
Authors:
Zhenhan Huang,
Tejaswini Pedapati,
Pin-Yu Chen,
Chunhen Jiang,
Jianxi Gao
Abstract:
Neural architecture search (NAS) enables the automatic design of neural network models. However, training the candidates generated by the search algorithm for performance evaluation incurs considerable computational overhead. Our method, dubbed nasgraph, remarkably reduces the computational costs by converting neural architectures to graphs and using the average degree, a graph measure, as the proxy in lieu of the evaluation metric. Our training-free NAS method is data-agnostic and lightweight. It can find the best architecture among 200 randomly sampled architectures from NAS-Bench-201 in 217 CPU seconds. Besides, our method is able to achieve competitive performance on various datasets including NAS-Bench-101, NAS-Bench-201, and NDS search spaces. We also demonstrate that nasgraph generalizes to more challenging tasks on Micro TransNAS-Bench-101.
Submitted 2 May, 2024;
originally announced May 2024.
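The proxy named in the abstract above, the average degree of a graph, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how nasgraph converts an architecture into a graph, so the edge lists below are hypothetical inputs, the graph is treated as simple and undirected (average degree 2|E|/|V|), and ranking candidates from highest to lowest score is an assumption rather than the authors' stated rule.

```python
def average_degree(num_nodes, edges):
    """Average degree 2|E|/|V| of a simple undirected graph."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    # Drop self-loops and collapse duplicate or reversed edges.
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    return 2 * len(unique_edges) / num_nodes


def rank_by_proxy(candidates):
    """Sort candidate names by average degree, highest first (assumed order)."""
    scores = {name: average_degree(n, e) for name, (n, e) in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical graphs standing in for two converted architectures.
candidates = {
    "arch_a": (3, [(0, 1)]),
    "arch_b": (3, [(0, 1), (1, 2), (0, 2)]),
}
ranking = rank_by_proxy(candidates)
```

Because the score needs only the graph and no training data or gradient steps, evaluating a candidate costs time proportional to its number of edges.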
-
Rigidity matroids and linear algebraic matroids with applications to matrix completion and tensor codes
Authors:
Joshua Brakensiek,
Manik Dhar,
Jiyang Gao,
Sivakanth Gopi,
Matt Larson
Abstract:
We establish a connection between problems studied in rigidity theory and matroids arising from linear algebraic constructions like tensor products and symmetric products. A special case of this correspondence identifies the problem of giving a description of the correctable erasure patterns in a maximally recoverable tensor code with the problem of describing bipartite rigid graphs or low-rank completable matrix patterns. Additionally, we relate dependencies among symmetric products of generic vectors to graph rigidity and symmetric matrix completion. With an eye toward applications to computer science, we study the dependency of these matroids on the characteristic by giving new combinatorial descriptions in several cases, including the first description of the correctable patterns in an (m, n, a=2, b=2) maximally recoverable tensor code.
Submitted 1 May, 2024;
originally announced May 2024.
-
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Authors:
Zhili Liu,
Yunhao Gou,
Kai Chen,
Lanqing Hong,
Jiahui Gao,
Fei Mi,
Yu Zhang,
Zhenguo Li,
Xin Jiang,
Qun Liu,
James T. Kwok
Abstract:
As the capabilities of large language models (LLMs) have expanded dramatically, aligning these models with human values presents a significant challenge. Traditional alignment strategies rely heavily on human intervention, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), or on the self-alignment capacities of LLMs, which usually require a strong LLM's emergent ability to improve its original bad answer. To address these challenges, we propose a novel self-alignment method that utilizes a Chain of Thought (CoT) approach, termed AlignCoT. This method encompasses stages of Question Analysis, Answer Guidance, and Safe Answer production. It is designed to enable LLMs to generate high-quality, safe responses throughout various stages of their development. Furthermore, we introduce the Mixture of insighTful Experts (MoTE) architecture, which applies mixture of experts to enhance each component of the AlignCoT process, markedly increasing alignment efficiency. The MoTE approach not only outperforms existing methods in aligning LLMs with human values but also highlights the benefits of using self-generated data, revealing the dual benefits of improved alignment and training efficiency.
Submitted 8 July, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
Bridging Data Barriers among Participants: Assessing the Potential of Geoenergy through Federated Learning
Authors:
Weike Peng,
Jiaxin Gao,
Yuntian Chen,
Shengwei Wang
Abstract:
Machine learning algorithms are emerging as a promising approach in energy fields, but their practical application is hindered by data barriers stemming from high collection costs and privacy concerns. This study introduces a novel federated learning (FL) framework based on XGBoost models, enabling safe collaborative modeling with accessible yet concealed data from multiple parties. Hyperparameter tuning of the models is achieved through Bayesian Optimization. To ascertain the merits of the proposed FL-XGBoost method, a comparative analysis is conducted between separate and centralized models to address a classical binary classification problem in the geoenergy sector. The results reveal that the proposed FL framework strikes an optimal balance between privacy and accuracy. FL models demonstrate superior accuracy and generalization capabilities compared to separate models, particularly for participants with limited data or low correlation features, and offer significant privacy benefits compared to centralized models. The aggregated optimization approach within the FL agreement proves effective in tuning hyperparameters. This study opens new avenues for assessing unconventional reservoirs through collaborative and privacy-preserving FL techniques.
Submitted 29 April, 2024;
originally announced April 2024.
-
When to Trust LLMs: Aligning Confidence with Response Quality
Authors:
Shuchang Tao,
Liuyi Yao,
Hanxing Ding,
Yuexiang Xie,
Qi Cao,
Fei Sun,
Jinyang Gao,
Huawei Shen,
Bolin Ding
Abstract:
Despite the success of large language models (LLMs) in natural language generation, much evidence shows that LLMs may produce incorrect or nonsensical text. This limitation highlights the importance of discerning when to trust LLMs, especially in safety-critical domains. Existing methods often express reliability by confidence level; however, their effectiveness is limited by the lack of objective guidance. To address this, we propose the CONfidence-Quality-ORDer-preserving alignment approach (CONQORD), which leverages reinforcement learning guided by a tailored dual-component reward function. This function integrates quality reward and order-preserving alignment reward functions. Specifically, the order-preserving reward incentivizes the model to verbalize greater confidence for responses of higher quality to align the order of confidence and quality. Experiments demonstrate that CONQORD significantly improves the alignment performance between confidence and response accuracy, without causing over-cautiousness. Furthermore, the aligned confidence provided by CONQORD informs when to trust LLMs, and acts as a determinant for initiating the retrieval process of external knowledge. Aligning confidence with response quality ensures more transparent and reliable responses, providing better trustworthiness.
Submitted 9 June, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
Authors:
An Yan,
Zhengyuan Yang,
Junda Wu,
Wanrong Zhu,
Jianwei Yang,
Linjie Li,
Kevin Lin,
Jianfeng Wang,
Julian McAuley,
Jianfeng Gao,
Lijuan Wang
Abstract:
Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image. These tags, marked with alphanumerics, can be indexed via text tokens for easy reference. Despite the extraordinary performance from GPT-4V, we observe that other Multimodal Large Language Models (MLLMs) struggle to understand these visual tags. To promote the learning of SoM prompting for open-source models, we propose a new learning paradigm: "list items one by one," which asks the model to enumerate and describe all visual tags placed on the image following the alphanumeric orders of tags. By integrating our curated dataset with other visual instruction tuning datasets, we are able to equip existing MLLMs with the SoM prompting ability. Furthermore, we evaluate our finetuned SoM models on five MLLM benchmarks. We find that this new dataset, even in a relatively small size (10k-30k images with tags), significantly enhances visual reasoning capabilities and reduces hallucinations for MLLMs. Perhaps surprisingly, these improvements persist even when the visual tags are omitted from input images during inference. This suggests the potential of "list items one by one" as a new paradigm for training MLLMs, which strengthens the object-text alignment through the use of visual tags in the training stage. Finally, we conduct analyses by probing trained models to understand the working mechanism of SoM. Our code and data are available at https://github.com/zzxslp/SoM-LLaVA.
Submitted 25 April, 2024;
originally announced April 2024.
-
ST-MambaSync: The Complement of Mamba and Transformers for Spatial-Temporal in Traffic Flow Prediction
Authors:
Zhiqi Shao,
Xusheng Yao,
Ze Wang,
Junbin Gao
Abstract:
Accurate traffic flow prediction is crucial for optimizing traffic management, enhancing road safety, and reducing environmental impacts. Existing models face challenges with long sequence data, requiring substantial memory and computational resources, and often suffer from slow inference times due to the lack of a unified summary state. This paper introduces ST-MambaSync, an innovative traffic flow prediction model that combines transformer technology with the ST-Mamba block, representing a significant advancement in the field. We are the pioneers in employing the Mamba mechanism which is an attention mechanism integrated with ResNet within a transformer framework, which significantly enhances the model's explainability and performance. ST-MambaSync effectively addresses key challenges such as data length and computational efficiency, setting new benchmarks for accuracy and processing speed through comprehensive comparative analysis. This development has significant implications for urban planning and real-time traffic management, establishing a new standard in traffic flow prediction technology.
Submitted 9 May, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
On-liquid-gallium surface synthesis of ultra-smooth conductive metal-organic framework thin films
Authors:
Jinxin Liu,
Yunxu Chen,
Xing Huang,
Yanhan Ren,
Mike Hambsch,
David Bodesheim,
Darius Pohl,
Xiaodong Li,
Marielle Deconinck,
Bowen Zhang,
Markus Löffler,
Zhongquan Liao,
Fengxiang Zhao,
Arezoo Dianat,
Gianaurelio Cuniberti,
Yana Vaynzof,
Junfeng Gao,
Jingcheng Hao,
Stefan C. B. Mannsfeld,
Xinliang Feng,
Renhao Dong
Abstract:
Conductive metal-organic frameworks (MOFs) are emerging electroactive materials for (opto-)electronics. However, it remains a great challenge to achieve reliable MOF-based devices via the existing synthesis methods that are compatible with the complementary metal-oxide-semiconductor technology, as the surface roughness of thus-far synthetic MOF films or pellets is rather high for efficient electrode contact. Here, we develop an on-liquid-gallium surface synthesis (OLGSS) strategy under chemical vapor deposition (CVD) conditions for the controlled growth of two-dimensional conjugated MOF (2D c-MOF) thin films with ten-fold improvement of surface flatness (surface roughness can reach as low as ~2 Å) compared with MOF films grown by the traditional methods. Supported by theoretical modeling, we unveil a layer-by-layer CVD growth mode for constructing flattening surfaces, that is triggered by the high adhesion energy between gallium (Ga) and planar aromatic ligands. We further demonstrate the generality of the as-proposed OLGSS strategy by reproducing such a flat surface over nine different 2D c-MOF films with variable thicknesses (~2 to 208 nm) and large lateral sizes (over 1 cm²). The resultant ultra-smooth 2D c-MOF films enable the formation of high-quality electrical contacts with gold (Au) electrodes, leading to a reduction of contact resistance by over ten orders of magnitude compared to the traditional uneven MOF films. Furthermore, due to the efficient interfacial interaction benefiting from the high-quality contacts, the prepared van der Waals heterostructure (vdWH) of OLGSS c-MOF and MoS2 exhibits intriguing photoluminescence (PL) enhancement, PL peak shift and large work function modulation. The establishment of the reliable OLGSS method provides the chances to push the development of MOF electronics and the construction of multicomponent MOF-based heterostructure materials.
Submitted 17 April, 2024;
originally announced April 2024.
-
Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification
Authors:
Jimmy Lin,
Junkai Li,
Jiasi Gao,
Weizhi Ma,
Yang Liu
Abstract:
Tactile signals collected by wearable electronics are essential in modeling and understanding human behavior. One of the main applications of tactile signals is action classification, especially in healthcare and robotics. However, existing tactile classification methods fail to capture the spatial and temporal features of tactile signals simultaneously, which results in sub-optimal performance. In this paper, we design Spatio-Temporal Aware tactility Transformer (STAT) to utilize continuous tactile signals for action classification. We propose spatial and temporal embeddings along with a new temporal pretraining task in our model, which aims to enhance the transformer in modeling the spatio-temporal features of tactile signals. Specifically, the designed temporal pretraining task is to differentiate the time order of tubelet inputs to model the temporal properties explicitly. Experimental results on a public action classification dataset demonstrate that our model outperforms state-of-the-art methods in all metrics.
Submitted 20 January, 2024;
originally announced April 2024.
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Authors:
Marah Abdin,
Sam Ade Jacobs,
Ammar Ahmad Awan,
Jyoti Aneja,
Ahmed Awadallah,
Hany Awadalla,
Nguyen Bach,
Amit Bahree,
Arash Bakhtiari,
Jianmin Bao,
Harkirat Behl,
Alon Benhaim,
Misha Bilenko,
Johan Bjorck,
Sébastien Bubeck,
Qin Cai,
Martin Cai,
Caio César Teodoro Mendes,
Weizhu Chen,
Vishrav Chaudhary,
Dong Chen,
Dongdong Chen,
Yen-Chun Chen,
Yi-Ling Chen,
Parul Chopra,
et al. (90 additional authors not shown)
Abstract:
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we also introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts.
Submitted 23 May, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation
Authors:
Junyu Gao,
Da Zhang,
Xuelong Li
Abstract:
Crowd localization aims to predict the precise location of each instance within an image. Current advanced methods propose pixel-wise binary classification to tackle congested prediction, in which pixel-level thresholds binarize the prediction confidence of being a pedestrian head. Since crowd scenes suffer from extremely varying contents, counts and scales, the confidence-threshold learner is fragile and under-generalized when encountering domain knowledge shift. Moreover, most of the time, the target domain is agnostic in training. Hence, it is imperative to explore how to enhance the generalization of the confidence-threshold locator to the latent target domain. In this paper, we propose a Dynamic Proxy Domain (DPD) method to generalize the learner under domain shift. Concretely, based on a theoretical analysis of the upper bound of the generalization error risk of a binary classifier on the latent target domain, we propose to introduce a generated proxy domain to facilitate generalization. Then, based on the theory, we design a DPD algorithm which is composed of a training paradigm and a proxy domain generator to enhance the domain generalization of the confidence-threshold learner. Besides, we evaluate our method on five kinds of domain shift scenarios, demonstrating its effectiveness in generalizing crowd localization. Our code will be available at https://github.com/zhangda1018/DPD.
Submitted 22 April, 2024;
originally announced April 2024.
-
Video sentence grounding with temporally global textual knowledge
Authors:
Cai Chen,
Runzhong Zhang,
Jianjun Gao,
Kejun Wu,
Kim-Hui Yap,
Yi Wang
Abstract:
Temporal sentence grounding involves the retrieval of a video moment with a natural language query. Many existing works directly incorporate the given video and temporally localized query for temporal grounding, overlooking the inherent domain gap between different modalities. In this paper, we utilize pseudo-query features containing extensive temporally global textual knowledge sourced from the same video-query pair, to enhance the bridging of domain gaps and attain a heightened level of similarity between multi-modal features. Specifically, we propose a Pseudo-query Intermediary Network (PIN) to achieve an improved alignment of visual and comprehensive pseudo-query features within the feature space through contrastive learning. Subsequently, we utilize learnable prompts to encapsulate the knowledge of pseudo-queries, propagating them into the textual encoder and multi-modal fusion module, further enhancing the feature alignment between visual and language for better temporal grounding. Extensive experiments conducted on the Charades-STA and ActivityNet-Captions datasets demonstrate the effectiveness of our method.
Submitted 1 June, 2024; v1 submitted 21 April, 2024;
originally announced April 2024.
-
ST-Mamba: Spatial-Temporal Selective State Space Model for Traffic Flow Prediction
Authors:
Zhiqi Shao,
Michael G. H. Bell,
Ze Wang,
D. Glenn Geers,
Haoning Xi,
Junbin Gao
Abstract:
Traffic flow prediction, a critical aspect of intelligent transportation systems, has become increasingly popular in the field of artificial intelligence, driven by the availability of extensive traffic data. The current challenges of traffic flow prediction lie in integrating diverse factors while balancing the trade-off between computational complexity and the precision necessary for effective long-range and large-scale predictions. To address these challenges, we introduce a Spatial-Temporal Selective State Space (ST-Mamba) model, which is the first to leverage the power of spatial-temporal learning in traffic flow prediction without using graph modeling. The ST-Mamba model can effectively capture the long-range dependency for traffic flow data, thereby avoiding the issue of over-smoothing. The proposed ST-Mamba model incorporates an effective Spatial-Temporal Mixer (ST-Mixer) to seamlessly integrate spatial and temporal data processing into a unified framework and employs a Spatial-Temporal Selective State Space (ST-SSM) block to improve computational efficiency. The proposed ST-Mamba model, specifically designed for spatial-temporal data, simplifies the processing procedure and enhances generalization capabilities, thereby significantly improving the accuracy of long-range traffic flow prediction. Compared to the previous state-of-the-art (SOTA) model, the proposed ST-Mamba model achieves a 61.11% improvement in computational speed and increases prediction accuracy by 0.67%. Extensive experiments with real-world traffic datasets demonstrate that the ST-Mamba model sets a new benchmark in traffic flow prediction, achieving SOTA performance in computational efficiency for both long- and short-range predictions and significantly improving the overall efficiency and effectiveness of traffic management.
Submitted 18 May, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
-
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Authors:
Jin Gao,
Shubo Lin,
Shaoru Wang,
Yutong Kou,
Zeming Li,
Liang Li,
Congxuan Zhang,
Xiaoqin Zhang,
Yizheng Wang,
Weiming Hu
Abstract:
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features. In this paper, we ask whether the fine-tuning performance of extremely simple lightweight ViTs can also benefit from this pre-training paradigm, which remains considerably less studied than the well-established lightweight architecture design methodology. We use an observation-analysis-solution flow for our study. We first systematically observe different behaviors among the evaluated pre-training methods with respect to the downstream fine-tuning data scales. Furthermore, we analyze the layer representation similarities and attention maps across the obtained models, which clearly show the inferior learning of MIM pre-training on higher layers, leading to unsatisfactory transfer performance on data-insufficient downstream tasks. This finding naturally guides the design of our distillation strategies during pre-training to solve the above deterioration problem. Extensive experiments have demonstrated the effectiveness of our approach. Our pre-training with distillation on pure lightweight ViTs with vanilla/hierarchical design (5.7M/6.5M) can achieve 79.4%/78.9% top-1 accuracy on ImageNet-1K. It also enables SOTA performance on the ADE20K segmentation task (42.8% mIoU) and LaSOT tracking task (66.1% AUC) in the lightweight regime. The latter even surpasses all the current SOTA lightweight CPU-realtime trackers.
Submitted 25 May, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Spectroscopic measurements and models of energy deposition in the substrate of quantum circuits by natural ionizing radiation
Authors:
Joseph W. Fowler,
Paul Szypryt,
Raymond Bunker,
Ellen R. Edwards,
Ian Fogarty Florang,
Jiansong Gao,
Andrea Giachero,
Shannon F. Hoogerheide,
Ben Loer,
H. Pieter Mumm,
Nathan Nakamura,
Galen C. O'Neil,
John L. Orrell,
Elizabeth M. Scott,
Jason Stevens,
Daniel S. Swetz,
Brent A. VanDevender,
Michael Vissers,
Joel N. Ullom
Abstract:
Naturally occurring background radiation is a source of correlated decoherence events in superconducting qubits that will challenge error-correction schemes. To characterize the radiation environment in an unshielded laboratory, we performed broadband, spectroscopic measurements of background events in silicon substrates located inside a millikelvin refrigerator, an environment representative of superconducting qubit systems. We measured the background spectra in silicon substrates of two thicknesses, 0.5 mm and 1.5 mm, and obtained the average event rate and the integrated power deposition. In a 25 mm² area and the thinner substrate, these values are 0.023 events per second and 4.9 keV/s, counting events that deposit at least 40 keV. We find the background spectrum to be nearly featureless. Its intensity decreases by a factor of 40,000 between 100 keV and 3 MeV for silicon substrates 0.5 mm thick. We find the cryogenic measurements to be in good agreement with predictions based on measurements of the terrestrial gamma-ray flux, published models of cosmic-ray fluxes, a crude model of the cryostat, and radiation-transport simulations. No free parameters are required to predict the background spectra in the silicon substrates. The good agreement between measurements and predictions allows assessment of the relative contributions of terrestrial and cosmic background sources and their dependence on substrate thickness. Our spectroscopic measurements are performed with superconducting microresonators that transduce deposited energy to a readily detectable electrical signal. We find that gamma-ray emissions from radioisotopes are responsible for the majority of events depositing E<1.5 MeV, while nucleons among the cosmic-ray secondary particles cause most events that deposit more energy. These results suggest several paths to reducing the impact of background radiation on quantum circuits.
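As a rough illustration of what the quoted rate and power imply, the short calculation below derives the mean energy per counted event and the mean time between events for the thinner substrate. It uses only the two figures stated in the abstract (0.023 events per second and 4.9 keV/s); the variable and function names are hypothetical, and the derived quantities are simple ratios, not values reported by the authors.

```python
# Figures quoted in the abstract for the 0.5 mm substrate over a 25 mm^2 area,
# counting only events that deposit at least 40 keV.
EVENT_RATE_PER_S = 0.023   # average event rate, events per second
POWER_KEV_PER_S = 4.9      # integrated power deposition, keV per second
SECONDS_PER_DAY = 86_400


def mean_energy_per_event_kev(power_kev_per_s: float, rate_per_s: float) -> float:
    """Mean deposited energy per counted event: power divided by rate."""
    return power_kev_per_s / rate_per_s


# Derived quantities (illustrative ratios, not values from the paper).
mean_energy_kev = mean_energy_per_event_kev(POWER_KEV_PER_S, EVENT_RATE_PER_S)
mean_interval_s = 1.0 / EVENT_RATE_PER_S
events_per_day = EVENT_RATE_PER_S * SECONDS_PER_DAY

print(f"mean energy per event: {mean_energy_kev:.0f} keV")
print(f"mean time between events: {mean_interval_s:.1f} s")
print(f"events per day: {events_per_day:.0f}")
```

Under these figures, a counted event deposits roughly 213 keV on average and events arrive about once every 43 seconds.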
Submitted 16 April, 2024;
originally announced April 2024.
-
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Authors:
Shusheng Xu,
Wei Fu,
Jiaxuan Gao,
Wenjie Ye,
Weilin Liu,
Zhiyu Mei,
Guangju Wang,
Chao Yu,
Yi Wu
Abstract:
Reinforcement Learning from Human Feedback (RLHF) is currently the most widely used method to align large language models (LLMs) with human preferences. Existing RLHF methods can be roughly categorized as either reward-based or reward-free. Novel applications such as ChatGPT and Claude leverage reward-based methods that first learn a reward model and apply actor-critic algorithms, such as Proximal Policy Optimization (PPO). However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO). Is DPO truly superior to PPO? Why does PPO perform poorly on these benchmarks? In this paper, we first conduct both theoretical and empirical studies on the algorithmic properties of DPO and show that DPO may have fundamental limitations. Moreover, we also comprehensively examine PPO and reveal the key factors for the best performance of PPO in fine-tuning LLMs. Finally, we benchmark DPO and PPO across a collection of RLHF testbeds, ranging from dialogue to code generation. Experimental results demonstrate that PPO is able to surpass other alignment methods in all cases and achieve state-of-the-art results in challenging code competitions.
Submitted 21 April, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New Heights
Authors:
Xiaomin Fang,
Jie Gao,
Jing Hu,
Lihang Liu,
Yang Xue,
Xiaonan Zhang,
Kunrui Zhu
Abstract:
While monomer protein structure prediction tools boast impressive accuracy, the prediction of protein complex structures remains a daunting challenge in the field. This challenge is particularly pronounced in scenarios involving complexes with protein chains from different species, such as antigen-antibody interactions, where accuracy often falls short. Limited by the accuracy of complex prediction, tasks based on precise protein-protein interaction analysis also face obstacles. In this report, we highlight the ongoing advancements of our protein complex structure prediction model, HelixFold-Multimer, underscoring its enhanced performance. HelixFold-Multimer provides precise predictions for diverse protein complex structures, especially in therapeutic protein interactions. Notably, HelixFold-Multimer achieves remarkable success in antigen-antibody and peptide-protein structure prediction, greatly surpassing AlphaFold 3. HelixFold-Multimer is now available for public use on the PaddleHelix platform, offering both a general version and an antigen-antibody version. Researchers can conveniently access and utilize this service for their development needs.
Submitted 17 May, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Combined Pre-Supernova Alert System with Kamland and Super-Kamiokande
Authors:
KamLAND,
Super-Kamiokande Collaborations,
:,
Seisho Abe,
Minori Eizuka,
Sawako Futagi,
Azusa Gando,
Yoshihito Gando,
Shun Goto,
Takahiko Hachiya,
Kazumi Hata,
Koichi Ichimura,
Sei Ieki,
Haruo Ikeda,
Kunio Inoue,
Koji Ishidoshiro,
Yuto Kamei,
Nanami Kawada,
Yasuhiro Kishimoto,
Masayuki Koga,
Maho Kurasawa,
Tadao Mitsui,
Haruhiko Miyake,
Daisuke Morita,
Takeshi Nakahata
, et al. (290 additional authors not shown)
Abstract:
Preceding a core-collapse supernova, various processes produce an increasing amount of neutrinos of all flavors characterized by mounting energies from the interior of massive stars. Among them, the electron antineutrinos are potentially detectable by terrestrial neutrino experiments such as KamLAND and Super-Kamiokande via inverse beta decay interactions. Once these pre-supernova neutrinos are observed, an early warning of the upcoming core-collapse supernova can be provided. In light of this, KamLAND and Super-Kamiokande, both located in the Kamioka mine in Japan, have been monitoring pre-supernova neutrinos since 2015 and 2021, respectively. Recently, we performed a joint study between KamLAND and Super-Kamiokande on pre-supernova neutrino detection. A pre-supernova alert system combining the KamLAND detector and the Super-Kamiokande detector was developed and put into operation, which can provide a supernova alert to the astrophysics community. Fully leveraging the complementary properties of these two detectors, the combined alert is expected to resolve a pre-supernova neutrino signal from a 15 M$_{\odot}$ star within 510 pc of the Earth, at a significance level corresponding to a false alarm rate of no more than 1 per century. For a Betelgeuse-like model with optimistic parameters, it can provide early warnings up to 12 hours in advance.
Submitted 1 July, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Unsourced Random Access in MIMO Quasi-Static Rayleigh Fading Channels with Finite Blocklength
Authors:
Junyuan Gao,
Yongpeng Wu,
Giuseppe Caire,
Wei Yang,
Wenjun Zhang
Abstract:
This paper explores the fundamental limits of unsourced random access (URA) with a random and unknown number ${\rm{K}}_a$ of active users in MIMO quasi-static Rayleigh fading channels. First, we derive an upper bound on the probability of incorrectly estimating the number of active users. We prove that it decays exponentially with the number of receive antennas and eventually vanishes, whereas it reaches a plateau as the power and blocklength increase. Then, we derive non-asymptotic achievability and converse bounds on the minimum energy-per-bit required by each active user to reliably transmit $J$ bits with blocklength $n$. Numerical results verify the tightness of our bounds, suggesting that they provide benchmarks to evaluate existing schemes. The extra required energy-per-bit due to the uncertainty of the number of active users decreases as $\mathbb{E}[{\rm{K}}_a]$ increases. Compared to random access with individual codebooks, the URA paradigm achieves higher spectral and energy efficiency. Moreover, using codewords distributed on a sphere is shown to outperform the Gaussian random coding scheme in the non-asymptotic regime.
Submitted 14 April, 2024;
originally announced April 2024.
-
Development of a data overflow protection system for Super-Kamiokande to maximize data from nearby supernovae
Authors:
M. Mori,
K. Abe,
Y. Hayato,
K. Hiraide,
K. Hosokawa,
K. Ieki,
M. Ikeda,
J. Kameda,
Y. Kanemura,
R. Kaneshima,
Y. Kashiwagi,
Y. Kataoka,
S. Miki,
S. Mine,
M. Miura,
S. Moriyama,
Y. Nakano,
M. Nakahata,
S. Nakayama,
Y. Noguchi,
K. Okamoto,
K. Sato,
H. Sekiya,
H. Shiba,
K. Shimizu
, et al. (230 additional authors not shown)
Abstract:
Neutrinos from very nearby supernovae, such as Betelgeuse, are expected to generate more than ten million events over 10 s in Super-Kamiokande (SK). At such large event rates, the buffers of the SK analog-to-digital conversion board (QBEE) will overflow, causing random loss of data that is critical for understanding the dynamics of the supernova explosion mechanism. In order to solve this problem, two new DAQ modules were developed to aid in the observation of very nearby supernovae. The first of these, the SN module, is designed to save only the number of hit PMTs during a supernova burst, and the second, the Veto module, prescales the high-rate neutrino events to prevent the QBEE from overflowing based on information from the SN module. In the event of a very nearby supernova, these modules allow SK to reconstruct the time evolution of the neutrino event rate from beginning to end using both QBEE and SN module data. This paper presents the development and testing of these modules together with an analysis of supernova-like data generated with a flashing laser diode. We demonstrate that the Veto module successfully prevents DAQ overflows for Betelgeuse-like supernovae, and we demonstrate the long-term stability of the new modules. During normal running the Veto module is found to issue DAQ vetoes a few times per month, resulting in a total dead time of less than 1 ms, and does not influence ordinary operations. Additionally, using simulation data we find that supernovae closer than 800 pc will trigger the Veto module, resulting in a prescaling of the observed neutrino data.
Submitted 12 April, 2024;
originally announced April 2024.
-
Estimation and Inference for Three-Dimensional Panel Data Models
Authors:
Guohua Feng,
Jiti Gao,
Fei Liu,
Bin Peng
Abstract:
Hierarchical panel data models have recently garnered significant attention. This study contributes to the relevant literature by introducing a novel three-dimensional (3D) hierarchical panel data model, which integrates panel regression with three sets of latent factor structures: one set of global factors and two sets of local factors. Instead of aggregating latent factors from various nodes, as seen in the literature of distributed principal component analysis (PCA), we propose an estimation approach capable of recovering the parameters of interest and disentangling latent factors at different levels and across different dimensions. We establish an asymptotic theory and provide a bootstrap procedure to obtain inference for the parameters of interest while accommodating various types of cross-sectional dependence and time series autocorrelation. Finally, we demonstrate the applicability of our framework by examining productivity convergence in manufacturing industries worldwide.
Submitted 12 April, 2024;
originally announced April 2024.
-
A Systematic Literature Survey of Sparse Matrix-Vector Multiplication
Authors:
Jianhua Gao,
Bingjie Liu,
Weixing Ji,
Hua Huang
Abstract:
Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various optimization contributions. However, a comprehensive and systematic literature survey that introduces, analyzes, discusses, and summarizes the advancements of SpMV in recent years is currently lacking. Aiming to fill this gap, this paper compares existing techniques and analyzes their strengths and weaknesses. We begin by highlighting two representative applications of SpMV, then conduct an in-depth overview of the important techniques that optimize SpMV on modern architectures, which we specifically classify as classic, auto-tuning, machine learning, and mixed-precision-based optimization. We also elaborate on the hardware-based architectures, including CPU, GPU, FPGA, processing-in-memory, heterogeneous, and distributed platforms. We present a comprehensive experimental evaluation that compares the performance of state-of-the-art SpMV implementations. Based on our findings, we identify several challenges and point out future research directions. This survey is intended to provide researchers with a comprehensive understanding of SpMV optimization on modern architectures and provide guidance for future work.
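For readers unfamiliar with the kernel, the sketch below shows a baseline SpMV over the compressed sparse row (CSR) layout in plain Python. It is a minimal reference version under the assumption of a CSR input, not an implementation taken from the survey; the function name `csr_spmv` and the example matrix are hypothetical.

```python
from typing import List, Sequence


def csr_spmv(
    indptr: Sequence[int],
    indices: Sequence[int],
    data: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    indptr[r]:indptr[r + 1] delimits the nonzeros of row r inside
    `indices` (column ids) and `data` (values).
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # The indirect access x[indices[k]] is the irregular memory
            # pattern that SpMV optimizations typically target.
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Example matrix (hypothetical):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(indptr, indices, data, x)
print(y)
```

Only the stored nonzeros are visited, so the work is proportional to the number of nonzeros and not to the full matrix size.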
Submitted 9 April, 2024;
originally announced April 2024.
-
The increasing fragmentation of global science limits the diffusion of ideas
Authors:
Alexander J. Gates,
Indraneel Mane,
Jianjian Gao
Abstract:
The global scientific landscape emerges from a complex interplay of collaboration and competition, where nations vie for dominance while simultaneously fostering the diffusion of knowledge on a global scale. This raises crucial questions: What underlying patterns govern international scientific recognition and influence? How does this structure impact knowledge dissemination? Traditional models view the global scientific ecosystem through a core-periphery lens, with Western nations dominating knowledge production. Here, we investigate the dynamics of international scientific recognition through the lens of national preferences, introducing a novel signed measure to characterize national citation preferences and enabling a network analysis of international scientific recognition. We find that scientific recognition is related to cultural and political factors in addition to economic strength and scientific quality. Our analysis challenges the conventional core-periphery narrative, uncovering instead several communities of international knowledge production that are rapidly fragmenting the scientific recognition ecosystem. Moreover, we provide compelling evidence that this network significantly constrains the diffusion of ideas across international borders. The resulting network framework for global scientific recognition sheds light on the barriers and opportunities for collaboration, innovation, and the equitable recognition of scientific advancements, with significant consequences for policymakers seeking to foster inclusive and impactful international scientific endeavors.
Submitted 8 April, 2024;
originally announced April 2024.
-
Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation
Authors:
Zhouhao Sun,
Xiao Ding,
Li Du,
Bibo Cai,
Jinglong Gao,
Ting Liu,
Qin Bing
Abstract:
Large language models (LLMs) have achieved significant performance in various natural language reasoning tasks. However, they still struggle with performing first-order logic reasoning over formal logical theories expressed in natural language. This is because previous LLM-based reasoning systems are theoretically incomplete. As a result, they can only address a limited set of simple reasoning problems, which significantly decreases their generalization ability. To address this issue, we propose a novel framework, named Generalizable and Faithful Reasoner (GFaiR), which introduces the paradigm of resolution refutation. Resolution refutation can solve all first-order logic reasoning problems by extending reasoning rules and employing the principle of proof by contradiction, so our system's completeness can be improved by introducing resolution refutation. Experimental results demonstrate that our system outperforms previous works by achieving state-of-the-art performance in complex scenarios while maintaining performance in simple scenarios. Besides, we observe that GFaiR is faithful to its reasoning process.
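To make the proof-by-contradiction idea concrete, the sketch below implements resolution refutation for the propositional case only. This is a deliberate simplification: the abstract concerns first-order logic over natural language, and nothing here reflects how GFaiR itself is built. The clause encoding (strings with a `~` prefix for negation) and all function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

# A clause is a set of literals; "~p" denotes the negation of "p".
Clause = FrozenSet[str]


def negate(literal: str) -> str:
    """Return the complementary literal."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(a: Clause, b: Clause) -> Set[Clause]:
    """All clauses obtained by resolving a and b on one complementary pair."""
    out: Set[Clause] = set()
    for lit in a:
        if negate(lit) in b:
            out.add(frozenset((a - {lit}) | (b - {negate(lit)})))
    return out


def is_unsatisfiable(clauses: Iterable[Clause]) -> bool:
    """Saturate under resolution; deriving the empty clause means unsatisfiable."""
    known: Set[Clause] = set(clauses)
    while True:
        new: Set[Clause] = set()
        for a, b in combinations(known, 2):
            for r in resolvents(a, b):
                if not r:  # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= known:  # nothing new can be derived
            return False
        known |= new


def entails(premises: Iterable[Clause], goal: str) -> bool:
    """Proof by contradiction: premises plus the negated goal must be unsatisfiable."""
    return is_unsatisfiable(set(premises) | {frozenset({negate(goal)})})


# Hypothetical example: "rain implies wet" and "rain".
premises = [frozenset({"~rain", "wet"}), frozenset({"rain"})]
print(entails(premises, "wet"))
print(entails(premises, "cold"))
```

The first query succeeds because adding the negated goal lets resolution derive the empty clause; the second fails because saturation ends without a contradiction.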
Submitted 3 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Bifurcation on Fully Nonlinear Elliptic Equations and Systems
Authors:
Jing Gao,
Weijun Zhang,
Zhitao Zhang
Abstract:
In this paper, we study the following fully nonlinear elliptic equations \begin{equation*} \left\{\begin{array}{rl} \left(S_{k}(D^{2}u)\right)^{\frac{1}{k}}=\lambda f(-u) & \text{in } \Omega,\\ u=0 & \text{on } \partial\Omega, \end{array} \right. \end{equation*} and coupled systems \begin{equation*} \left\{\begin{array}{rl} \left(S_{k}(D^{2}u)\right)^{\frac{1}{k}}=\lambda g(-u,-v) & \text{in } \Omega,\\ \left(S_{k}(D^{2}v)\right)^{\frac{1}{k}}=\lambda h(-u,-v) & \text{in } \Omega,\\ u=v=0 & \text{on } \partial\Omega, \end{array} \right. \end{equation*} dominated by $k$-Hessian operators, where $\Omega$ is a $(k-1)$-convex bounded domain in $\mathbb{R}^{N}$, $\lambda$ is a non-negative parameter, $f:\left[0,+\infty\right)\rightarrow\left[0,+\infty\right)$ is a continuous function with zeros only at $0$ and $g,h:\left[0,+\infty\right)\times \left[0,+\infty\right)\rightarrow \left[0,+\infty\right)$ are continuous functions with zeros only at $(\cdot,0)$ and $(0,\cdot)$. We determine the intervals of $\lambda$ for the existence, non-existence, uniqueness and multiplicity of $k$-convex solutions to the above problems according to various cases of $f,g,h$, which completes the known results in the previous literature. In particular, the above results are also new for Laplacian and Monge-Ampère operators. We mainly use bifurcation theory, a priori estimates, various maximum principles and technical strategies in the proof.
Submitted 1 April, 2024;
originally announced April 2024.
-
Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs
Authors:
Xiaoze Liu,
Feijie Wu,
Tianyang Xu,
Zhuo Chen,
Yichi Zhang,
Xiaoqian Wang,
Jing Gao
Abstract:
The advent of Large Language Models (LLMs) has significantly transformed the AI landscape, enhancing machine learning and AI capabilities. Factuality is a critical concern for LLMs, as they may generate factually incorrect responses. In this paper, we propose GraphEval to evaluate an LLM's performance using a substantially large test dataset. Specifically, the test dataset is retrieved from a large knowledge graph with more than 10 million facts without expensive human effort. Unlike conventional methods that evaluate LLMs based on generated responses, GraphEval streamlines the evaluation process by creating a judge model to estimate the correctness of the answers given by the LLM. Our experiments demonstrate that the judge model's factuality assessment aligns closely with the correctness of the LLM's generated outputs, while also substantially reducing evaluation costs. Besides, our findings offer valuable insights into LLM performance across different metrics and highlight the potential for future improvements in ensuring the factual integrity of LLM outputs. The code is publicly available at https://github.com/xz-liu/GraphEval.
Submitted 1 April, 2024;
originally announced April 2024.
-
Against The Achilles' Heel: A Survey on Red Teaming for Generative Models
Authors:
Lizhi Lin,
Honglin Mu,
Zenan Zhai,
Minghan Wang,
Yuxia Wang,
Renxi Wang,
Junjie Gao,
Yixuan Zhang,
Wanxiang Che,
Timothy Baldwin,
Xudong Han,
Haonan Li
Abstract:
Generative models are rapidly gaining popularity and being integrated into everyday applications, raising concerns over their safety issues as various vulnerabilities are exposed. Faced with the problem, the field of red teaming is experiencing fast-paced growth, which highlights the need for a comprehensive organization covering the entire pipeline and addressing emerging topics for the community. Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of language models. Additionally, we have developed the searcher framework that unifies various automatic red teaming approaches. Moreover, our survey covers novel areas including multimodal attacks and defenses, risks around multilingual models, overkill of harmless queries, and safety of downstream applications. We hope this survey can provide a systematic perspective on the field and unlock new areas of research.
Submitted 31 March, 2024;
originally announced April 2024.
-
A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration
Authors:
Jie Gao,
Simret Araya Gebreegziabher,
Kenny Tsu Wei Choo,
Toby Jia-Jun Li,
Simon Tangi Perrault,
Thomas W. Malone
Abstract:
With ChatGPT's release, conversational prompting has become the most popular form of human-LLM interaction. However, its effectiveness is limited for more complex tasks involving reasoning, creativity, and iteration. Through a systematic analysis of HCI papers published since 2021, we identified four key phases in the human-LLM interaction flow - planning, facilitating, iterating, and testing - to precisely understand the dynamics of this process. Additionally, we have developed a taxonomy of four primary interaction modes: Mode 1: Standard Prompting, Mode 2: User Interface, Mode 3: Context-based, and Mode 4: Agent Facilitator. This taxonomy was further enriched using the "5W1H" guideline method, which involved a detailed examination of definitions, participant roles (Who), the phases that happened (When), human objectives and LLM abilities (What), and the mechanics of each interaction mode (How). We anticipate this taxonomy will contribute to the future design and evaluation of human-LLM interaction.
Submitted 30 March, 2024;
originally announced April 2024.
-
IME: Integrating Multi-curvature Shared and Specific Embedding for Temporal Knowledge Graph Completion
Authors:
Jiapu Wang,
Zheng Cui,
Boyue Wang,
Shirui Pan,
Junbin Gao,
Baocai Yin,
Wen Gao
Abstract:
Temporal Knowledge Graphs (TKGs) incorporate a temporal dimension, allowing for a precise capture of the evolution of knowledge and reflecting the dynamic nature of the real world. Typically, TKGs contain complex geometric structures, with various geometric structures interwoven. However, existing Temporal Knowledge Graph Completion (TKGC) methods either model TKGs in a single space or neglect the heterogeneity of different curvature spaces, thus constraining their capacity to capture these intricate geometric structures. In this paper, we propose a novel Integrating Multi-curvature shared and specific Embedding (IME) model for TKGC tasks. Concretely, IME models TKGs into multi-curvature spaces, including hyperspherical, hyperbolic, and Euclidean spaces. Subsequently, IME incorporates two key properties, namely space-shared property and space-specific property. The space-shared property facilitates the learning of commonalities across different curvature spaces and alleviates the spatial gap caused by the heterogeneous nature of multi-curvature spaces, while the space-specific property captures characteristic features. Meanwhile, IME proposes an Adjustable Multi-curvature Pooling (AMP) approach to effectively retain important information. Furthermore, IME innovatively designs similarity, difference, and structure loss functions to attain the stated objective. Experimental results clearly demonstrate the superior performance of IME over existing state-of-the-art TKGC models.
Submitted 28 March, 2024;
originally announced March 2024.
-
Detecting Light Dark Matter with Kinetic Inductance Detectors
Authors:
Jiansong Gao,
Yonit Hochberg,
Benjamin V. Lehmann,
Sae Woo Nam,
Paul Szypryt,
Michael R. Vissers,
Tao Xu
Abstract:
Superconducting detectors are a promising technology for probing dark matter at extremely low masses, where dark matter interactions are currently unconstrained. Realizing the potential of such detectors requires new readout technologies to achieve the lowest possible thresholds for deposited energy. Here we perform a prototype search for dark matter--electron interactions with kinetic inductance detectors (KIDs), a class of superconducting detector originally designed for infrared astronomy applications. We demonstrate that existing KIDs can achieve effective thresholds as low as 0.2 eV, and we use existing data to set new dark matter constraints. The relative maturity of the technology underlying KIDs means that this platform can be scaled significantly with existing tools, enabling powerful new searches in the coming years.
Submitted 28 March, 2024;
originally announced March 2024.
-
Learning From Correctness Without Prompting Makes LLM Efficient Reasoner
Authors:
Yuxuan Yao,
Han Wu,
Zhijiang Guo,
Biyan Zhou,
Jiahui Gao,
Sichun Luo,
Hanxu Hou,
Xiaojin Fu,
Linqi Song
Abstract:
Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content. One potential approach to mitigate these issues is learning from human or external feedback (e.g. tools). In this paper, we introduce an intrinsic self-correct reasoning framework for LLMs that eliminates the need for human feedback, external tools, and handcrafted prompts. The proposed framework, based on a multi-step reasoning paradigm called Learning from Correctness (LeCo), improves reasoning performance without needing to learn from errors. This paradigm prioritizes learning from correct reasoning steps and uses a unique method to measure the confidence of each reasoning step based on generation logits. Experimental results across various multi-step reasoning tasks demonstrate the effectiveness of the framework in improving reasoning performance with reduced token consumption.
Submitted 27 March, 2024;
originally announced March 2024.
-
Limited Attention Allocation in a Stochastic Linear Quadratic System with Multiplicative Noise
Authors:
Xiangyu Cui,
Jianjun Gao,
Lingjie Kong
Abstract:
This study addresses limited attention allocation in a stochastic linear quadratic system with multiplicative noise. Our approach enables strategic resource allocation to enhance noise estimation and improve control decisions. We provide analytical optimal control and propose a numerical method for optimal attention allocation. Additionally, we apply our findings to dynamic mean-variance portfolio selection, showing effective resource allocation across time periods and factors, providing valuable insights for investors.
Submitted 27 March, 2024;
originally announced March 2024.
-
CCDSReFormer: Traffic Flow Prediction with a Criss-Crossed Dual-Stream Enhanced Rectified Transformer Model
Authors:
Zhiqi Shao,
Michael G. H. Bell,
Ze Wang,
D. Glenn Geers,
Xusheng Yao,
Junbin Gao
Abstract:
Accurate and effective traffic forecasting is vital for smart traffic systems and crucial in urban traffic planning and management. Current Spatio-Temporal Transformer models, despite their prediction capabilities, struggle with balancing computational efficiency and accuracy, favoring global over local information, and handling spatial and temporal data separately, limiting insight into complex interactions. We introduce the Criss-Crossed Dual-Stream Enhanced Rectified Transformer model (CCDSReFormer), which includes three innovative modules: Enhanced Rectified Spatial Self-attention (ReSSA), Enhanced Rectified Delay Aware Self-attention (ReDASA), and Enhanced Rectified Temporal Self-attention (ReTSA). These modules aim to lower computational needs via sparse attention, focus on local information for better traffic dynamics understanding, and merge spatial and temporal insights through a unique learning method. Extensive tests on six real-world datasets highlight CCDSReFormer's superior performance. An ablation study also confirms the significant impact of each component on the model's predictive accuracy, showcasing our model's ability to forecast traffic flow effectively.
Submitted 29 March, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
iDAT: inverse Distillation Adapter-Tuning
Authors:
Jiacheng Ruan,
Jingsheng Gao,
Mingye Xie,
Daize Dong,
Suncheng Xiang,
Ting Liu,
Yuzhuo Fu
Abstract:
The Adapter-Tuning (AT) method involves freezing a pre-trained model and introducing trainable adapter modules to acquire downstream knowledge, thereby calibrating the model for better adaptation to downstream tasks. Instead of crafting a carefully designed adapter module, this paper proposes a distillation framework for the AT method that aims to improve fine-tuning performance. For the first time, we explore the possibility of combining the AT method with knowledge distillation. Via statistical analysis, we observe significant differences in knowledge acquisition between the adapter modules of different models. Leveraging these differences, we propose a simple yet effective framework called inverse Distillation Adapter-Tuning (iDAT). Specifically, we designate the smaller model as the teacher and the larger model as the student. The two are jointly trained, and online knowledge distillation is applied to inject knowledge from a different perspective into the student model, significantly enhancing the fine-tuning performance on downstream tasks. Extensive experiments on the VTAB-1K benchmark with 19 image classification tasks demonstrate the effectiveness of iDAT. The results show that using an existing AT method within our iDAT framework can further yield a 2.66% performance gain, with only an additional 0.07M trainable parameters. Our approach compares favorably with state-of-the-art methods without bells and whistles. Our code is available at https://github.com/JCruan519/iDAT.
Submitted 23 March, 2024;
originally announced March 2024.
-
PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search
Authors:
Chensheng Peng,
Zhaoyu Zeng,
Jinling Gao,
Jundong Zhou,
Masayoshi Tomizuka,
Xinbing Wang,
Chenghu Zhou,
Nanyang Ye
Abstract:
Multiple object tracking is a critical task in autonomous driving. Existing works primarily focus on the heuristic design of neural networks to obtain high accuracy. As tracking accuracy improves, however, neural networks become increasingly complex, posing challenges for their practical application in real driving scenarios due to the high level of latency. In this paper, we explore the use of neural architecture search (NAS) methods to search for efficient architectures for tracking, aiming for low real-time latency while maintaining relatively high accuracy. Another challenge for object tracking is the unreliability of a single sensor; therefore, we propose a multi-modal framework to improve robustness. Experiments demonstrate that our algorithm can run on edge devices within lower latency constraints, thus greatly reducing the computational requirements for multi-modal object tracking.
Submitted 23 March, 2024;
originally announced March 2024.
-
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
Authors:
Kevin Xie,
Jonathan Lorraine,
Tianshi Cao,
Jun Gao,
James Lucas,
Antonio Torralba,
Sanja Fidler,
Xiaohui Zeng
Abstract:
Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt. Amortized methods like ATT3D optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, they cannot capture high-frequency geometry and texture details and struggle to scale to large prompt sets, so they generalize poorly. We introduce LATTE3D, addressing these limitations to achieve fast, high-quality generation on a significantly larger prompt set. Key to our method is 1) building a scalable architecture and 2) leveraging 3D data during optimization through 3D-aware diffusion priors, shape regularization, and model initialization to achieve robustness to diverse and complex training prompts. LATTE3D amortizes both neural field and textured surface generation to produce highly detailed textured meshes in a single forward pass. LATTE3D generates 3D objects in 400ms, and can be further enhanced with fast test-time optimization.
Submitted 22 March, 2024;
originally announced March 2024.
-
Visual Imitation Learning of Task-Oriented Object Grasping and Rearrangement
Authors:
Yichen Cai,
Jianfeng Gao,
Christoph Pohl,
Tamim Asfour
Abstract:
Task-oriented object grasping and rearrangement are critical skills for robots to accomplish different real-world manipulation tasks. However, they remain challenging due to partial observations of the objects and shape variations in categorical objects. In this paper, we propose the Multi-feature Implicit Model (MIMO), a novel object representation that encodes multiple spatial features between a point and an object in an implicit neural field. Training such a model on multiple features ensures that it embeds the object shapes consistently in different aspects, thus improving its performance in object shape reconstruction from partial observation, shape similarity measure, and modeling spatial relations between objects. Based on MIMO, we propose a framework to learn task-oriented object grasping and rearrangement from single or multiple human demonstration videos. The evaluations in simulation show that our approach outperforms the state-of-the-art methods for multi- and single-view observations. Real-world experiments demonstrate the efficacy of our approach in one- and few-shot imitation learning of manipulation tasks.
Submitted 20 March, 2024;
originally announced March 2024.
-
Electron wave spin in a cavity
Authors:
Ju Gao,
Fang Shen
Abstract:
Our study reveals electron spin in a cavity as a stable circulating current density, characterized by a torus topology. This current density circulates concentrically beyond the cavity boundary, illustrating the concept of evanescent wave spin. While the interaction with a uniform magnetic field aligns with established spin-field observations, our analysis of regional contributions deviates from particle-based spin predictions. The integration of charge and spin properties into a single Lorentz covariant entity suggests that the electron wave constitutes the fundamental and deterministic reality of the electron.
Submitted 20 March, 2024;
originally announced March 2024.
-
Fast-Poly: A Fast Polyhedral Framework For 3D Multi-Object Tracking
Authors:
Xiaoyu Li,
Dedong Liu,
Lijun Zhao,
Yitao Wu,
Xian Wu,
Jinghan Gao
Abstract:
3D Multi-Object Tracking (MOT) captures stable and comprehensive motion states of surrounding obstacles, essential for robotic perception. However, current 3D trackers face issues with accuracy and latency consistency. In this paper, we propose Fast-Poly, a fast and effective filter-based method for 3D MOT. Building upon our previous work Poly-MOT, Fast-Poly addresses object rotational anisotropy in 3D space, enhances local computation densification, and leverages parallelization techniques, improving inference speed and precision. Fast-Poly is extensively tested on two large-scale tracking benchmarks with a Python implementation. On the nuScenes dataset, Fast-Poly achieves new state-of-the-art performance with 75.8% AMOTA among all methods and can run at 34.2 FPS on a personal CPU. On the Waymo dataset, Fast-Poly exhibits competitive accuracy with 63.6% MOTA and impressive inference speed (35.5 FPS). The source code is publicly available at https://github.com/lixiaoyu2000/FastPoly.
Submitted 20 March, 2024;
originally announced March 2024.
-
Knowledge-Reuse Transfer Learning Methods in Molecular and Material Science
Authors:
An Chen,
Zhilong Wang,
Karl Luigi Loza Vidaurre,
Yanqiang Han,
Simin Ye,
Kehao Tao,
Shiwei Wang,
Jing Gao,
Jinjin Li
Abstract:
Molecules and materials are the foundation for the development of modern advanced industries such as energy storage systems and semiconductor devices. However, traditional trial-and-error methods or theoretical calculations are highly resource-intensive, and extremely long R&D (Research and Development) periods cannot meet the urgent need for molecules/materials in industrial development. Machine learning (ML) methods based on big data are expected to break this dilemma. However, the difficulty in constructing large-scale datasets of new molecules/materials due to the high cost of data acquisition and annotation limits the development of machine learning. The application of transfer learning lowers the data requirements for model training, which makes transfer learning stand out in research addressing data quality issues. In this review, we summarize recent advances in transfer learning related to molecular and materials science. We focus on the application of transfer learning methods for the discovery of advanced molecules/materials, particularly the construction of transfer learning frameworks for different systems and how transfer learning can enhance the performance of models. In addition, the challenges of transfer learning are also discussed.
Submitted 2 March, 2024;
originally announced March 2024.
-
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Authors:
Alexander Khazatsky,
Karl Pertsch,
Suraj Nair,
Ashwin Balakrishna,
Sudeep Dasari,
Siddharth Karamcheti,
Soroush Nasiriany,
Mohan Kumar Srirama,
Lawrence Yunliang Chen,
Kirsty Ellis,
Peter David Fagan,
Joey Hejna,
Masha Itkina,
Marion Lepert,
Yecheng Jason Ma,
Patrick Tree Miller,
Jimmy Wu,
Suneel Belkhale,
Shivin Dass,
Huy Ha,
Arhan Jain,
Abraham Lee,
Youngwoon Lee,
Marius Memmel,
Sungjae Park
, et al. (74 additional authors not shown)
Abstract:
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.
Submitted 19 March, 2024;
originally announced March 2024.
-
Generalized Ramsey--Turán density for cliques
Authors:
Jun Gao,
Suyun Jiang,
Hong Liu,
Maya Sankar
Abstract:
We study the generalized Ramsey--Turán function $\mathrm{RT}(n,K_s,K_t,o(n))$, which is the maximum possible number of copies of $K_s$ in an $n$-vertex $K_t$-free graph with independence number $o(n)$. The case when $s=2$ was settled by Erdős, Sós, Bollobás, Hajnal, and Szemerédi in the 1980s. We combinatorially resolve the general case for all $s\ge 3$, showing that the (asymptotic) extremal graphs for this problem have simple (bounded) structures. In particular, it implies that the extremal structures follow a periodic pattern when $t$ is much larger than $s$. Our results disprove a conjecture of Balogh, Liu, and Sharifzadeh and show that a relaxed version does hold.
Submitted 19 March, 2024;
originally announced March 2024.
-
GCAM: Gaussian and causal-attention model of food fine-grained recognition
Authors:
Guohang Zhuang,
Yue Hu,
Tianxing Yan,
JiaZhan Gao
Abstract:
Currently, most food recognition relies on deep learning for category classification. However, these approaches struggle to effectively distinguish between visually similar food samples, highlighting the pressing need to address fine-grained issues in food recognition. To mitigate these challenges, we propose the adoption of a Gaussian and causal-attention model for fine-grained object recognition. In particular, we train to obtain Gaussian features over target regions, followed by the extraction of fine-grained features from the objects, thereby enhancing the feature mapping capabilities of the target regions. To counteract data drift resulting from uneven data distributions, we employ a counterfactual reasoning approach. By using counterfactual interventions, we analyze the impact of the learned image attention mechanism on network predictions, enabling the network to acquire more useful attention weights for fine-grained image recognition. Finally, we design a learnable loss strategy to balance training stability across various modules, ultimately improving the accuracy of the final target recognition. We validate our approach on four relevant datasets, demonstrating its excellent performance across these four datasets. We experimentally show that GCAM surpasses state-of-the-art methods on the ETH-FOOD101, UECFOOD256, and Vireo-FOOD172 datasets. Furthermore, our approach also achieves state-of-the-art performance on the CUB-200 dataset.
Submitted 17 March, 2024;
originally announced March 2024.
-
Kinetic inductance traveling wave amplifier designs for practical microwave readout applications
Authors:
A. Giachero,
M. Visser,
J. Wheeler,
L. Howe,
J. Gao,
J. Austermann,
J. Hubmayr,
A. Nucciotti,
J. Ullom
Abstract:
A Kinetic Inductance Traveling Wave amplifier (KIT) utilizes the nonlinear kinetic inductance of superconducting films, particularly Niobium Titanium Nitride (NbTiN), for parametric amplification. These amplifiers achieve remarkable performance in terms of gain, bandwidth, compression power, and frequently approach the quantum limit for noise. However, most KIT demonstrations have been isolated from practical device readout systems. Using a KIT as the first amplifier in the readout chain of an unoptimized microwave SQUID multiplexer coupled to a transition-edge sensor microcalorimeter we see an initial improvement in the flux noise. One challenge in KIT integration is the considerable microwave pump power required to drive the non-linearity. To address this, we have initiated efforts to reduce the pump power by using thinner NbTiN films and an inverted microstrip transmission line design. In this article, we present the new transmission line design, fabrication procedure, and initial device characterization -- including gain and added noise. These devices exhibit over 10 dB of gain with a 3 dB bandwidth of approximately 5.5-7.25 GHz, a maximum practical gain of 12 dB and typical gain ripple under 4 dB peak-to-peak. We observe an appreciable impedance mismatch in the NbTiN transmission line, which is likely the source of the majority of the gain ripple. Finally we perform an initial noise characterization and demonstrate system-added noise of three quanta or less over nearly the entire 3 dB bandwidth.
Submitted 20 March, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Is Contrastive Learning Necessary? A Study of Data Augmentation vs Contrastive Learning in Sequential Recommendation
Authors:
Peilin Zhou,
You-Liang Huang,
Yueqi Xie,
Jingqi Gao,
Shoujin Wang,
Jae Boum Kim,
Sunghun Kim
Abstract:
Sequential recommender systems (SRS) are designed to predict users' future behaviors based on their historical interaction data. Recent research has increasingly utilized contrastive learning (CL) to leverage unsupervised signals to alleviate the data sparsity issue in SRS. In general, CL-based SRS first augments the raw sequential interaction data by using data augmentation strategies and employs a contrastive training scheme to enforce the representations of those sequences from the same raw interaction data to be similar. Despite the growing popularity of CL, data augmentation, as a basic component of CL, has not received sufficient attention. This raises the question: Is it possible to achieve superior recommendation results solely through data augmentation? To answer this question, we benchmark eight widely used data augmentation strategies, as well as state-of-the-art CL-based SRS methods, on four real-world datasets under both warm- and cold-start settings. Intriguingly, the conclusion drawn from our study is that certain data augmentation strategies can achieve similar or even superior performance compared with some CL-based methods, demonstrating the potential to significantly alleviate the data sparsity issue with less computational overhead. We hope that our study can further inspire more fundamental studies on the key functional components of complex CL techniques. Our processed datasets and codes are available at https://github.com/AIM-SE/DA4Rec.
Submitted 17 March, 2024;
originally announced March 2024.
-
Local existence and uniqueness of solution to the two-dimensional inhomogeneous Prandtl equations by energy method
Authors:
Jincheng Gao,
Lianyun Peng,
Zheng-an Yao
Abstract:
In this paper, we consider the local existence and uniqueness result for the inhomogeneous Prandtl equations in dimension two by the energy method. First of all, for the homogeneous case, the local-in-time well-posedness theory of unsteady Prandtl equations was obtained by [Alexandre, Wang, Xu, Yang, J. Am. Math. Soc., 28 (3), 745-784 (2015)] and [Masmoudi, Wong, Comm. Pure Appl. Math., 68 (10), 1683-1741 (2015)] independently by the energy method without any transformation. However, for the inhomogeneous case, the appearance of density creates new difficulties in overcoming the loss of the tangential derivative of the horizontal velocity. Thus, our first result is to overcome the loss of the tangential derivative so that one can establish the local-in-time well-posedness result for the inhomogeneous Prandtl equations by the energy method. Secondly, for the homogeneous case, the local-in-x well-posedness in higher regular space for the steady Prandtl equations was obtained by [Guo, Iyer, Comm. Math. Phys., 382 (3), 1403-447 (2021)] by the energy method, since they first found the good quantity (called the `quotient'). With the help of this quotient, our second result is to establish the local-in-x well-posedness in higher regular Sobolev space for the steady inhomogeneous Prandtl equations.
Submitted 15 March, 2024;
originally announced March 2024.