-
MEEG and AT-DGNN: Advancing EEG Emotion Recognition with Music and Graph Learning
Authors:
Minghao Xiao,
Zhengxi Zhu,
Wenyu Wang,
Meixia Qu
Abstract:
Recent advances in neuroscience have elucidated the crucial role of coordinated brain region activities during cognitive tasks. To explore this complexity, we introduce the MEEG dataset, a comprehensive multi-modal music-induced electroencephalogram (EEG) dataset, and the Attention-based Temporal Learner with Dynamic Graph Neural Network (AT-DGNN), a novel framework for EEG-based emotion recognition. The MEEG dataset captures a wide range of emotional responses to music, enabling an in-depth analysis of brainwave patterns in musical contexts. The AT-DGNN combines an attention-based temporal learner with a dynamic graph neural network (DGNN) to accurately model the local and global graph dynamics of EEG data across varying brain network topologies. Our evaluations show that AT-DGNN achieves superior performance, with an accuracy (ACC) of 83.06% in arousal and 85.31% in valence, outperforming state-of-the-art (SOTA) methods on the MEEG dataset. Comparative analyses with traditional datasets like DEAP highlight the effectiveness of our approach and underscore the potential of music as a powerful medium for emotion induction. This study not only advances our understanding of the brain's emotional processing, but also enhances the accuracy of emotion recognition technologies in brain-computer interfaces (BCI), leveraging both graph-based learning and the emotional impact of music. The source code and dataset are available at https://github.com/xmh1011/AT-DGNN.
Submitted 7 July, 2024;
originally announced July 2024.
-
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
Authors:
Weitai Kang,
Mengxue Qu,
Yunchao Wei,
Yan Yan
Abstract:
Semi-Supervised Visual Grounding (SSVG) is a new challenge because of its sparse labeled data and the need for multimodal understanding. A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision. However, this approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline. These pipelines directly regress results without region proposals or foreground binary classification, rendering them unsuitable for fitting in RefTeacher due to the absence of confidence scores. Furthermore, the geometric difference in teacher and student inputs, stemming from different data augmentations, induces natural misalignment in attention-based constraints. To establish a compatible SSVG framework, our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS. Initially, the model is enhanced by incorporating an additional quantized detection head to expose its detection confidence. Building upon this, ACTRESS consists of an active sampling strategy and a selective retraining strategy. The active sampling strategy iteratively selects high-quality pseudo labels by evaluating three crucial aspects: Faithfulness, Robustness, and Confidence, optimizing the utilization of unlabeled data. The selective retraining strategy retrains the model with periodic re-initialization of specific parameters, facilitating the model's escape from local minima. Extensive experiments demonstrate our superior performance on widely used benchmark datasets.
Submitted 6 July, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Burn-in Test and Thermal Performance Evaluation of Silicon Photomultipliers for the JUNO-TAO Experiment
Authors:
X. Chen,
G. F. Cao,
M. H. Qu,
H. W. Wang,
N. Anfimov,
A. Rybnikov,
J. Y. Xu,
A. Q. Su,
Z. L. Chen,
J. Cao,
Y. C. Li,
M. Qi
Abstract:
This study evaluates more than 4,000 tiles made of Hamamatsu visual-sensitive silicon photomultipliers (SiPMs), each with dimensions of 5 $\times$ 5 cm$^2$, intended for the central detector of the Taishan Anti-neutrino Observatory (TAO), a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO) aimed at measuring the reactor anti-neutrino energy spectrum with unprecedented energy resolution. All SiPM tiles underwent a room temperature burn-in test in the dark for two weeks, while cryogenic testing analyzed the thermal dependence of parameters for some sampled SiPMs. Results from these comprehensive tests provide crucial insights into the long-term performance and stability of the 10 square meters of SiPMs operating at -50°C to detect scintillation photons in the TAO detector. Despite some anomalies awaiting further inspection, all SiPMs successfully passed the burn-in test over two weeks at room temperature, which is equivalent to 6.7 years at -50°C. Results are also used to guide optimal SiPM selection, configuration, and operation, ensuring reliability and sustainability in reactor neutrino measurements. This work also provides insights for a rapid and robust quality assessment in future experiments that employ large-scale SiPMs as detection systems.
Submitted 13 June, 2024;
originally announced June 2024.
-
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Authors:
Weitai Kang,
Mengxue Qu,
Jyoti Kini,
Yunchao Wei,
Mubarak Shah,
Yan Yan
Abstract:
In real-life scenarios, humans seek out objects in the 3D world to fulfill their daily needs or intentions. This inspires us to introduce 3D intention grounding, a new task in 3D object detection employing RGB-D, based on human intention, such as "I want something to support my back". Closely related, 3D visual grounding focuses on understanding human reference. To achieve detection based on human intention, it relies on humans to observe the scene, reason out the target that aligns with their intention ("pillow" in this case), and finally provide a reference to the AI system, such as "A pillow on the couch". Instead, 3D intention grounding challenges AI agents to automatically observe, reason and detect the desired target solely based on human intention. To tackle this challenge, we introduce the new Intent3D dataset, consisting of 44,990 intention texts associated with 209 fine-grained classes from 1,042 scenes of the ScanNet dataset. We also establish several baselines based on different language-based 3D object detection models on our benchmark. Finally, we propose IntentNet, our unique approach, designed to tackle this intention-based detection problem. It focuses on three key aspects: intention understanding, reasoning to identify object candidates, and cascaded adaptive learning that leverages the intrinsic priority logic of different losses for multiple objective optimization.
Submitted 6 July, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
JUNO Sensitivity to Invisible Decay Modes of Neutrons
Authors:
JUNO Collaboration,
Angel Abusleme,
Thomas Adam,
Kai Adamowicz,
Shakeel Ahmad,
Rizwan Ahmed,
Sebastiano Aiello,
Fengpeng An,
Qi An,
Giuseppe Andronico,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
João Pedro Athayde Marcondes de André,
Didier Auguste,
Weidong Bai,
Nikita Balashov,
Wander Baldini,
Andrea Barresi,
Davide Basilico,
Eric Baussan,
Marco Bellato,
Marco Beretta,
Antonio Bergnoli,
Daniel Bick
, et al. (635 additional authors not shown)
Abstract:
We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\bar{ν}_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.
Submitted 27 May, 2024;
originally announced May 2024.
-
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification
Authors:
Yu-Yang Li,
Yu Bai,
Cunshi Wang,
Mengwei Qu,
Ziteng Lu,
Roberto Soria,
Jifeng Liu
Abstract:
Light curves serve as a valuable source of information on stellar formation and evolution. With the rapid advancement of machine learning techniques, they can be effectively processed to extract astronomical patterns and information. In this study, we present a comprehensive evaluation of deep-learning and large language model (LLM) based models for the automatic classification of variable star light curves, based on large datasets from the Kepler and K2 missions. Special emphasis is placed on Cepheids, RR Lyrae, and eclipsing binaries, examining the influence of observational cadence and phase distribution on classification precision. Employing AutoDL optimization, we achieve striking performance with the 1D-Convolution+BiLSTM architecture and the Swin Transformer, reaching accuracies of 94% and 99%, respectively, with the latter demonstrating a notable 83% accuracy in discerning the elusive Type II Cepheids, which comprise merely 0.02% of the total dataset. We unveil StarWhisper LightCurve (LC), an innovative series comprising three LLM-based models: LLM, multimodal large language model (MLLM), and Large Audio Language Model (LALM). Each model is fine-tuned with strategic prompt engineering and customized training methods to explore the emergent abilities of these models for astronomical data. Remarkably, the StarWhisper LC series exhibits high accuracies around 90%, significantly reducing the need for explicit feature engineering, thereby paving the way for streamlined parallel data processing and the progression of multifaceted multimodal models in astronomical applications. The study furnishes two detailed catalogs illustrating the impacts of phase and sampling intervals on deep learning classification accuracy, showing that a substantial decrease of up to 14% in observation duration and 21% in sampling points can be realized without compromising accuracy by more than 10%.
Submitted 16 April, 2024;
originally announced April 2024.
-
Performance of the Mass Testing Setup for Arrays of Silicon Photomultipliers in the TAO Experiment
Authors:
A. Rybnikov,
N. Anfimov,
M. Qu,
A. Chetverikov,
G. Cao,
D. Fedoseev,
H. Wang,
V. Kozhukalov,
V. Sharov,
S. Sokolov,
A. Sotnikov
Abstract:
Modern neutrino physics detectors often employ thousands, and sometimes even hundreds of thousands, of Silicon Photomultipliers (SiPMs). The TAO experiment is a notable example that utilizes a spherical scintillator barrel with a diameter of 1.8 meters, housing approximately 130,000 SiPMs organized into 4,100 tiles. Each tile, with a size of 5 $\times$ 5 cm$^2$, consists of a 32-SiPM array functioning as a single detector unit. To achieve an unparalleled energy resolution of 2% at 1 MeV within this volume, the SiPMs must possess cutting-edge parameters, including a photon detection efficiency (PDE) exceeding 50%, cross-talk of approximately 10%, and an extremely low dark count rate (DCR) below 50 Hz/mm$^2$. Maintaining the setup at a negative temperature of -50°C is necessary to achieve the desired DCR. This article presents the setup and methods employed to individually characterize the mass of SiPMs across all 4,100 tiles at the specified negative temperature.
Submitted 8 February, 2024;
originally announced February 2024.
-
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science
Authors:
Xiangru Tang,
Qiao Jin,
Kunlun Zhu,
Tongxin Yuan,
Yichi Zhang,
Wangchunshu Zhou,
Meng Qu,
Yilun Zhao,
Jian Tang,
Zhuosheng Zhang,
Arman Cohan,
Zhiyong Lu,
Mark Gerstein
Abstract:
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents, called scientific LLM agents, also introduce novel vulnerabilities that demand careful consideration for safety. However, there exists a notable gap in the literature, as there has been no comprehensive exploration of these vulnerabilities. This perspective paper fills this gap by conducting a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures. We begin by providing a comprehensive overview of the potential risks inherent to scientific LLM agents, taking into account user intent, the specific scientific domain, and their potential impact on the external environment. Then, we delve into the origins of these vulnerabilities and provide a scoping review of the limited existing works. Based on our analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate these identified risks. Furthermore, we highlight the limitations and challenges associated with safeguarding scientific agents and advocate for the development of improved models, robust benchmarks, and comprehensive regulations to address these issues effectively.
Submitted 5 June, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
A parking function interpretation for $(-1)^{k}\nabla m_{2^{k}1^{l}}$
Authors:
Menghao Qu,
Guoce Xin
Abstract:
Haglund, Morse, and Zabrocki introduced a family of creation operators of Hall-Littlewood polynomials, $\{C_{a}\}$ for any $a\in \mathbb{Z}$, in their compositional refinement of the shuffle (ex-)conjecture. For any $α\vDash n$, the combinatorial formula for $\nabla C_α$ is a weighted sum of parking functions. These summations can be converted to a weighted sum of certain LLT polynomials. Thus $\nabla C_α$ is Schur positive since Grojnowski and Haiman proved that all LLT polynomials are Schur positive. In this paper, we obtain a recursion that implies the $C$-positivity of $(-1)^{k} m_{2^{k}1^{l}}$, and hence prove the Schur positivity of $(-1)^{k}\nabla m_{2^{k}1^{l}}$. As a corollary, a parking function interpretation for $(-1)^{k}\nabla m_{2^{k}1^{l}}$ is obtained by using the compositional shuffle theorem of Carlsson and Mellit.
Submitted 27 December, 2023;
originally announced December 2023.
-
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments
Authors:
Mengxue Qu,
Yu Wu,
Wu Liu,
Xiaodan Liang,
Jingkuan Song,
Yao Zhao,
Yunchao Wei
Abstract:
Intention-oriented object detection aims to detect desired objects based on specific intentions or requirements. For instance, when we desire to "lie down and rest", we instinctively seek out a suitable option such as a "bed" or a "sofa" that can fulfill our needs. Previous work in this area is limited either by the number of intention descriptions or by the affordance vocabulary available for intention objects. These limitations make it challenging to handle intentions in open environments effectively. To facilitate this research, we construct a comprehensive dataset called Reasoning Intention-Oriented Objects (RIO). In particular, RIO is specifically designed to incorporate diverse real-world scenarios and a wide range of object categories. It offers the following key features: 1) intention descriptions in RIO are represented as natural sentences rather than a mere word or verb phrase, making them more practical and meaningful; 2) the intention descriptions are contextually relevant to the scene, enabling a broader range of potential functionalities associated with the objects; 3) the dataset comprises a total of 40,214 images and 130,585 intention-object pairs. With the proposed RIO, we evaluate the ability of some existing models to reason intention-oriented objects in open environments.
Submitted 26 October, 2023;
originally announced October 2023.
-
Towards a compact soliton microcomb fully referenced on atomic reference
Authors:
Mingfei Qu,
Dou Li,
Chenhong Li,
Kangqi Liu,
Weihang Zhu,
Yuan Wei,
Pengfei Wang,
Songbai Kang
Abstract:
A fully stabilized soliton microcomb is critical for many applications of optical frequency combs based on microresonators. However, the current approaches for full frequency stabilization require either external acousto-optic or electro-optic devices or auxiliary lasers and multiple phase-locked loops, which compromises the convenience of the system. This study explores a compact, atomic-referenced, fully stabilized soliton microcomb that directly uses a rubidium atomic optical frequency reference as the pump source. In addition, the repetition rate (7.3 GHz) of the soliton microcomb is phase-locked to an atomic-clock-stabilized radio frequency (RF) reference by mechanically tuning the resonance of the optical resonator. The results demonstrate that the stability of the comb line (0.66 THz away from the pump line) is consistent with that of the Rb87 optical reference, attaining a level of approximately 4 Hz @100 s, corresponding to a frequency stability of 2E-14 @100 s. Furthermore, the frequency reproducibility of the comb line was evaluated over six days, and the standard deviation (SD) of the frequency of the comb line was found to be 10 kHz, resulting in a corresponding absolute deviation uncertainty of 1.3E-10, which is technically limited by the locking range of the soliton repetition rate. The proposed method gives a low-power and compact solution for fully stabilized soliton microcombs.
Submitted 13 October, 2023;
originally announced October 2023.
-
GraphText: Graph Reasoning in Text Space
Authors:
Jianan Zhao,
Le Zhuo,
Yikang Shen,
Meng Qu,
Kai Liu,
Michael Bronstein,
Zhaocheng Zhu,
Jian Tang
Abstract:
Large Language Models (LLMs) have gained the ability to assimilate human knowledge and facilitate natural language interactions with both humans and other LLMs. However, despite their impressive achievements, LLMs have not made significant advancements in the realm of graph machine learning. This limitation arises because graphs encapsulate distinct relational data, making it challenging to transform them into natural language that LLMs understand. In this paper, we bridge this gap with a novel framework, GraphText, that translates graphs into natural language. GraphText derives a graph-syntax tree for each graph that encapsulates both the node attributes and inter-node relationships. Traversal of the tree yields a graph text sequence, which is then processed by an LLM to treat graph tasks as text generation tasks. Notably, GraphText offers multiple advantages. It introduces training-free graph reasoning: even without training on graph data, GraphText with ChatGPT can achieve performance on par with, or even surpassing, that of supervised-trained graph neural networks through in-context learning (ICL). Furthermore, GraphText paves the way for interactive graph reasoning, allowing both humans and LLMs to communicate with the model seamlessly using natural language. These capabilities underscore the vast, yet-to-be-explored potential of LLMs in the domain of graph machine learning.
Submitted 2 October, 2023;
originally announced October 2023.
-
Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
Authors:
Angel Abusleme,
Thomas Adam,
Shakeel Ahmad,
Rizwan Ahmed,
Sebastiano Aiello,
Muhammad Akram,
Abid Aleem,
Fengpeng An,
Qi An,
Giuseppe Andronico,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
Burin Asavapibhop,
João Pedro Athayde Marcondes de André,
Didier Auguste,
Weidong Bai,
Nikita Balashov,
Wander Baldini,
Andrea Barresi,
Davide Basilico,
Eric Baussan,
Marco Bellato,
Marco Beretta,
Antonio Bergnoli
, et al. (606 additional authors not shown)
Abstract:
The core-collapse supernova (CCSN) is considered one of the most energetic astrophysical events in the universe. The early and prompt detection of neutrinos before (pre-SN) and during the supernova (SN) burst presents a unique opportunity for multi-messenger observations of CCSN events. In this study, we describe the monitoring concept and present the sensitivity of the system to pre-SN and SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), a 20 kton liquid scintillator detector currently under construction in South China. The real-time monitoring system is designed to ensure both prompt alert speed and comprehensive coverage of progenitor stars. It incorporates prompt monitors on the electronic board as well as online monitors at the data acquisition stage. Assuming a false alert rate of 1 per year, this monitoring system exhibits sensitivity to pre-SN neutrinos up to a distance of approximately 1.6 (0.9) kiloparsecs and SN neutrinos up to about 370 (360) kiloparsecs for a progenitor mass of 30 solar masses, considering both normal and inverted mass ordering scenarios. The pointing ability of the CCSN is evaluated by analyzing the accumulated event anisotropy of inverse beta decay interactions from pre-SN or SN neutrinos. This, along with the early alert, can play a crucial role in facilitating follow-up multi-messenger observations of the next galactic or nearby extragalactic CCSN.
Submitted 4 December, 2023; v1 submitted 13 September, 2023;
originally announced September 2023.
-
Soliton generation in CaF$_2$ crystalline whispering gallery mode resonators with negative thermal-optical effects
Authors:
Mingfei Qu,
Chenhong Li,
Kangqi Liu,
Weihang Zhu,
Yuan Wei,
Pengfei Wang,
Songbai Kang
Abstract:
Calcium fluoride (CaF$_2$) crystalline whispering gallery mode resonators (WGMRs) exhibit ultrahigh intrinsic quality factors and a low power anomalous dispersion in the communication and mid-infrared bands, making them attractive platforms for microresonator-based comb generation. However, their unique negative thermo-optic effects pose challenges when achieving thermal equilibrium. To our knowledge, our experiments serve as the first demonstration of soliton microcombs in $Q > 10^9$ CaF$_2$ WGMRs. We observed soliton mode-locking and bidirectional switching of soliton numbers caused by the negative thermo-optic effects. Additionally, various soliton formation dynamics are shown, including breathing and vibrational solitons, which can be attributed to thermo-photomechanical oscillations. Thus, our results enrich the soliton generation platform and provide a reference for generating solitons from WGMRs that comprise other materials with negative thermo-optic effects. In the future, the ultrahigh quality factor of CaF$_2$ crystal cavities may enable the generation of sub-milliwatt-level broad-spectrum soliton combs.
Submitted 3 June, 2023;
originally announced June 2023.
-
TGNN: A Joint Semi-supervised Framework for Graph-level Classification
Authors:
Wei Ju,
Xiao Luo,
Meng Qu,
Yifan Wang,
Chong Chen,
Minghua Deng,
Xian-Sheng Hua,
Ming Zhang
Abstract:
This paper studies semi-supervised graph classification, a crucial task with a wide range of applications in social network analysis and bioinformatics. Recent works typically adopt graph neural networks to learn graph-level representations for classification, failing to explicitly leverage features derived from graph topology (e.g., paths). Moreover, when labeled data is scarce, these methods are far from satisfactory due to their insufficient topology exploration of unlabeled data. We address the challenge by proposing a novel semi-supervised framework called Twin Graph Neural Network (TGNN). To explore graph structural information from complementary views, our TGNN has a message passing module and a graph kernel module. To fully utilize unlabeled data, for each module, we calculate the similarity of each unlabeled graph to other labeled graphs in the memory bank and our consistency loss encourages consistency between two similarity distributions in different embedding spaces. The two twin modules collaborate with each other by exchanging instance similarity knowledge to fully explore the structure information of both labeled and unlabeled data. We evaluate our TGNN on various public datasets and show that it achieves strong performance.
Submitted 23 April, 2023;
originally announced April 2023.
-
Interpretable machine learning-accelerated seed treatment by nanomaterials for environmental stress alleviation
Authors:
Hengjie Yu,
Dan Luo,
Sam F. Y. Li,
Maozhen Qu,
Da Liu,
Yingchao He,
Fang Cheng
Abstract:
Crops are constantly challenged by different environmental conditions. Seed treatment by nanomaterials is a cost-effective and environmentally-friendly solution for environmental stress mitigation in crop plants. Here, 56 seed nanopriming treatments are used to alleviate environmental stresses in maize. Seven selected nanopriming treatments significantly increase the stress resistance index (SRI) by 13.9% and 12.6% under salinity stress and combined heat-drought stress, respectively. Metabolomics data reveals that ZnO nanopriming treatment, with the highest SRI value, mainly regulates the pathways of amino acid metabolism, secondary metabolite synthesis, carbohydrate metabolism, and translation. Understanding the mechanism of seed nanopriming is still difficult due to the variety of nanomaterials and the complexity of interactions between nanomaterials and plants. Using the nanopriming data, we present an interpretable structure-activity relationship (ISAR) approach based on interpretable machine learning for predicting and understanding its stress mitigation effects. The post hoc and model-based interpretation approaches of machine learning are combined to provide complementary benefits and give researchers or policymakers more illuminating or trustworthy results. The concentration, size, and zeta potential of nanoparticles are identified as dominant factors for correlating root dry weight under salinity stress, and their effects and interactions are explained. Additionally, a web-based interactive tool is developed for offering prediction-level interpretation and gathering more details about specific nanopriming treatments. This work offers a promising framework for accelerating the agricultural applications of nanomaterials and may profoundly contribute to nanosafety assessment.
Submitted 8 April, 2023;
originally announced April 2023.
-
Thermoelectric properties of cement composite analogues from first principles calculations
Authors:
Esther Orisakwe,
Conrad Johnston,
Ruchita Jani,
Xiaoli Liu,
Lorenzo Stella,
Jorge Kohanoff,
Niall Holmes,
Brian Norton,
Ming Qu,
Hongxi Yin,
Kazuaki Yazawa
Abstract:
Buildings are responsible for a considerable fraction of the energy wasted globally every year, and as a result, excess carbon emissions. While heat is lost directly in colder months and climates, resulting in increased heating loads, in hot climates cooling and ventilation are required. One avenue towards improving the energy efficiency of buildings is to integrate thermoelectric devices and materials within the fabric of the building to exploit the temperature gradient between the inside and outside to do useful work. Cement-based materials are ubiquitous in modern buildings and present an interesting opportunity to be functionalised. We present a systematic investigation of the electronic transport coefficients relevant to the thermoelectric materials of the calcium silicate hydrate (C-S-H) gel analogue, tobermorite, using Density Functional Theory calculations with the Boltzmann transport method. The calculated values of the Seebeck coefficient are within the typical magnitude (200 - 600 $μV/K$) indicative of a good thermoelectric material. The tobermorite models are predicted to be intrinsically $p$-type thermoelectric materials because of the presence of a large concentration of the Si-O tetrahedra sites. The calculated electronic $ZT$ for the tobermorite models has optimal values of 0.983 at (400 $\mathrm{K}$ and $10^{17}$ $\mathrm{cm^{-3}}$) for tobermorite 9 Å, 0.985 at (400 $\mathrm{K}$ and $10^{17}$ $\mathrm{cm^{-3}}$) for tobermorite 11 Å and 1.20 at (225 $\mathrm{K}$ and $10^{19}$ $\mathrm{cm^{-3}}$) for tobermorite 14 Å, respectively.
Submitted 30 November, 2022;
originally announced November 2022.
-
Learning on Large-scale Text-attributed Graphs via Variational Inference
Authors:
Jianan Zhao,
Meng Qu,
Chaozhuo Li,
Hao Yan,
Qian Liu,
Rui Li,
Xing Xie,
Jian Tang
Abstract:
This paper studies learning on text-attributed graphs (TAGs), where each node is associated with a text description. An ideal solution for such a problem would be integrating both the text and graph structure information with large language models and graph neural networks (GNNs). However, the problem becomes very challenging when graphs are large due to the high computational complexity brought by training large language models and GNNs together. In this paper, we propose an efficient and effective solution to learning on large text-attributed graphs by fusing graph structure and language learning with a variational Expectation-Maximization (EM) framework, called GLEM. Instead of simultaneously training large language models and GNNs on big graphs, GLEM proposes to alternately update the two modules in the E-step and M-step. Such a procedure allows training the two modules separately while simultaneously allowing the two modules to interact and mutually enhance each other. Extensive experiments on multiple data sets demonstrate the efficiency and effectiveness of the proposed approach.
Submitted 1 March, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding
Authors:
Mengxue Qu,
Yu Wu,
Wu Liu,
Qiqi Gong,
Xiaodan Liang,
Olga Russakovsky,
Yao Zhao,
Yunchao Wei
Abstract:
In this paper, we investigate how to achieve better visual grounding with modern vision-language transformers, and propose a simple yet powerful Selective Retraining (SiRi) mechanism for this challenging task. Particularly, SiRi conveys a significant principle to the research of visual grounding, i.e., a better initialized vision-language encoder would help the model converge to a better local minimum, advancing the performance accordingly. Concretely, we continually update the parameters of the encoder as training goes on, while periodically re-initializing the rest of the parameters to compel the model to be better optimized based on an enhanced encoder. SiRi can significantly outperform previous approaches on three popular benchmarks. Specifically, our method achieves 83.04% Top1 accuracy on RefCOCO+ testA, outperforming the state-of-the-art approaches (training from scratch) by more than 10.21%. Additionally, we reveal that SiRi performs surprisingly well even with limited training data. We also extend it to transformer-based visual grounding models and other vision-language tasks to verify the validity.
Submitted 27 July, 2022;
originally announced July 2022.
-
KGNN: Harnessing Kernel-based Networks for Semi-supervised Graph Classification
Authors:
Wei Ju,
Junwei Yang,
Meng Qu,
Weiping Song,
Jianhao Shen,
Ming Zhang
Abstract:
This paper studies semi-supervised graph classification, which is an important problem with various applications in social network analysis and bioinformatics. This problem is typically solved by using graph neural networks (GNNs), which yet rely on a large number of labeled graphs for training and are unable to leverage unlabeled graphs. We address the limitations by proposing the Kernel-based Graph Neural Network (KGNN). A KGNN consists of a GNN-based network as well as a kernel-based network parameterized by a memory network. The GNN-based network performs classification through learning graph representations to implicitly capture the similarity between query graphs and labeled graphs, while the kernel-based network uses graph kernels to explicitly compare each query graph with all the labeled graphs stored in a memory for prediction. The two networks are motivated from complementary perspectives, and thus combining them allows KGNN to use labeled graphs more effectively. We jointly train the two networks by maximizing their agreement on unlabeled graphs via posterior regularization, so that the unlabeled graphs serve as a bridge to let both networks mutually enhance each other. Experiments on a range of well-known benchmark datasets demonstrate that KGNN achieves impressive performance over competitive baselines.
Submitted 21 May, 2022;
originally announced May 2022.
-
A Multi-Head Convolutional Neural Network With Multi-path Attention improves Image Denoising
Authors:
Jiahong Zhang,
Meijun Qu,
Ye Wang,
Lihong Cao
Abstract:
Recently, convolutional neural networks (CNNs) and attention mechanisms have been widely used in image denoising and achieved satisfactory performance. However, the previous works mostly use a single head to receive the noisy image, limiting the richness of extracted features. Therefore, a novel CNN with multiple heads (MH) named MHCNN is proposed in this paper, whose heads will receive the input images rotated by different rotation angles. MH makes MHCNN simultaneously utilize features of rotated images to remove noise. To integrate these features effectively, we present a novel multi-path attention mechanism (MPA). Unlike previous attention mechanisms that handle pixel-level, channel-level, or patch-level features, MPA focuses on features at the image level. Experiments show MHCNN surpasses other state-of-the-art CNN models on additive white Gaussian noise (AWGN) denoising and real-world image denoising. Its peak signal-to-noise ratio (PSNR) results are higher than other networks, such as BRDNet, RIDNet, PAN-Net, and CSANN. The code is accessible at https://github.com/JiaHongZ/MHCNN.
Submitted 3 November, 2022; v1 submitted 27 April, 2022;
originally announced April 2022.
-
Neural Structured Prediction for Inductive Node Classification
Authors:
Meng Qu,
Huiyu Cai,
Jian Tang
Abstract:
This paper studies node classification in the inductive setting, i.e., aiming to learn a model on labeled training graphs and generalize it to infer node labels on unlabeled test graphs. This problem has been extensively studied with graph neural networks (GNNs) by learning effective node representations, as well as traditional structured prediction methods for modeling the structured output of node labels, e.g., conditional random fields (CRFs). In this paper, we present a new approach called the Structured Proxy Network (SPN), which combines the advantages of both worlds. SPN defines flexible potential functions of CRFs with GNNs. However, learning such a model is nontrivial as it involves optimizing a maximin game with high-cost inference. Inspired by the underlying connection between joint and marginal distributions defined by Markov networks, we propose to solve an approximate version of the optimization problem as a proxy, which yields a near-optimal solution, making learning more efficient. Extensive experiments on two settings show that our approach outperforms many competitive baselines.
Submitted 15 April, 2022;
originally announced April 2022.
-
Structured Multi-task Learning for Molecular Property Prediction
Authors:
Shengchao Liu,
Meng Qu,
Zuobai Zhang,
Huiyu Cai,
Jian Tang
Abstract:
Multi-task learning for molecular property prediction is becoming increasingly important in drug discovery. However, in contrast to other domains, the performance of multi-task learning in drug discovery is still not satisfying as the number of labeled data for each task is too limited, which calls for additional data to complement the data scarcity. In this paper, we study multi-task learning for molecular property prediction in a novel setting, where a relation graph between tasks is available. We first construct a dataset (ChEMBL-STRING) including around 400 tasks as well as a task relation graph. Then to better utilize such relation graph, we propose a method called SGNN-EBM to systematically investigate the structured task modeling from two perspectives. (1) In the \emph{latent} space, we model the task representations by applying a state graph neural network (SGNN) on the relation graph. (2) In the \emph{output} space, we employ structured prediction with the energy-based model (EBM), which can be efficiently trained through noise-contrastive estimation (NCE) approach. Empirical results justify the effectiveness of SGNN-EBM. Code is available on https://github.com/chao1224/SGNN-EBM.
Submitted 5 October, 2022; v1 submitted 22 February, 2022;
originally announced March 2022.
-
TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery
Authors:
Zhaocheng Zhu,
Chence Shi,
Zuobai Zhang,
Shengchao Liu,
Minghao Xu,
Xinyu Yuan,
Yangtian Zhang,
Junkun Chen,
Huiyu Cai,
Jiarui Lu,
Chang Ma,
Runcheng Liu,
Louis-Pascal Xhonneux,
Meng Qu,
Jian Tang
Abstract:
Machine learning has huge potential to revolutionize the field of drug discovery and is attracting increasing attention in recent years. However, lacking domain knowledge (e.g., which tasks to work on), standard benchmarks and data preprocessing pipelines are the main obstacles for machine learning researchers to work in this domain. To facilitate the progress of machine learning for drug discovery, we develop TorchDrug, a powerful and flexible machine learning platform for drug discovery built on top of PyTorch. TorchDrug benchmarks a variety of important tasks in drug discovery, including molecular property prediction, pretrained molecular representations, de novo molecular design and optimization, retrosynthesis prediction, and biomedical knowledge graph reasoning. State-of-the-art techniques based on geometric deep learning (or graph machine learning), deep generative models, reinforcement learning and knowledge graph reasoning are implemented for these tasks. TorchDrug features a hierarchical interface that facilitates customization from both novices and experts in this domain. Tutorials, benchmark results and documentation are available at https://torchdrug.ai. Code is released under Apache License 2.0.
Submitted 16 February, 2022;
originally announced February 2022.
-
COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing
Authors:
Prateek Gupta,
Tegan Maharaj,
Martin Weiss,
Nasim Rahaman,
Hannah Alsdurf,
Abhinav Sharma,
Nanor Minoyan,
Soren Harnois-Leblanc,
Victor Schmidt,
Pierre-Luc St. Charles,
Tristan Deleu,
Andrew Williams,
Akshay Patel,
Meng Qu,
Olexa Bilaniuk,
Gaétan Marceau Caron,
Pierre Luc Carrier,
Satya Ortiz-Gagné,
Marc-Andre Rousseau,
David Buckeridge,
Joumana Ghosn,
Yang Zhang,
Bernhard Schölkopf,
Jian Tang,
Irina Rish
, et al. (4 additional authors not shown)
Abstract:
The rapid global spread of COVID-19 has led to an unprecedented demand for effective methods to mitigate the spread of the disease, and various digital contact tracing (DCT) methods have emerged as a component of the solution. In order to make informed public health choices, there is a need for tools which allow evaluation and comparison of DCT methods. We introduce an agent-based compartmental simulator we call COVI-AgentSim, integrating detailed consideration of virology, disease progression, social contact networks, and mobility patterns, based on parameters derived from empirical research. We verify by comparing to real data that COVI-AgentSim is able to reproduce realistic COVID-19 spread dynamics, and perform a sensitivity analysis to verify that the relative performance of contact tracing methods is consistent across a range of settings. We use COVI-AgentSim to perform cost-benefit analyses comparing no DCT to: 1) standard binary contact tracing (BCT) that assigns binary recommendations based on binary test results; and 2) a rule-based method for feature-based contact tracing (FCT) that assigns a graded level of recommendation based on diverse individual features. We find all DCT methods consistently reduce the spread of the disease, and that the advantage of FCT over BCT is maintained over a wide range of adoption rates. Feature-based methods of contact tracing avert more disability-adjusted life years (DALYs) per socioeconomic cost (measured by productive hours lost). Our results suggest any DCT method can help save lives, support re-opening of economies, and prevent second-wave outbreaks, and that FCT methods are a promising direction for enriching BCT using self-reported symptoms, yielding earlier warning signals and a significantly reduced spread of the virus per socioeconomic cost.
Submitted 29 October, 2020;
originally announced October 2020.
-
Predicting Infectiousness for Proactive Contact Tracing
Authors:
Yoshua Bengio,
Prateek Gupta,
Tegan Maharaj,
Nasim Rahaman,
Martin Weiss,
Tristan Deleu,
Eilif Muller,
Meng Qu,
Victor Schmidt,
Pierre-Luc St-Charles,
Hannah Alsdurf,
Olexa Bilanuik,
David Buckeridge,
Gáetan Marceau Caron,
Pierre-Luc Carrier,
Joumana Ghosn,
Satya Ortiz-Gagne,
Chris Pal,
Irina Rish,
Bernhard Schölkopf,
Abhinav Sharma,
Jian Tang,
Andrew Williams
Abstract:
The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently-proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find a deep-learning based PCT method which improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.
Submitted 23 October, 2020;
originally announced October 2020.
-
RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs
Authors:
Meng Qu,
Junkun Chen,
Louis-Pascal Xhonneux,
Yoshua Bengio,
Jian Tang
Abstract:
This paper studies learning logic rules for reasoning on knowledge graphs. Logic rules provide interpretable explanations when used for prediction as well as being able to generalize to other tasks, and hence are critical to learn. Existing methods either suffer from the problem of searching in a large search space (e.g., neural logic programming) or ineffective optimization due to sparse rewards (e.g., techniques based on reinforcement learning). To address these limitations, this paper proposes a probabilistic model called RNNLogic. RNNLogic treats logic rules as a latent variable, and simultaneously trains a rule generator as well as a reasoning predictor with logic rules. We develop an EM-based algorithm for optimization. In each iteration, the reasoning predictor is first updated to explore some generated logic rules for reasoning. Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step. Experiments on four datasets prove the effectiveness of RNNLogic.
Submitted 15 July, 2021; v1 submitted 8 October, 2020;
originally announced October 2020.
-
Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs
Authors:
Meng Qu,
Tianyu Gao,
Louis-Pascal A. C. Xhonneux,
Jian Tang
Abstract:
This paper studies few-shot relation extraction, which aims at predicting the relation for a pair of entities in a sentence by training with a few labeled examples in each relation. To more effectively generalize to new relations, in this paper we study the relationships between different relations and propose to leverage a global relation graph. We propose a novel Bayesian meta-learning approach to effectively learn the posterior distribution of the prototype vectors of relations, where the initial prior of the prototype vectors is parameterized with a graph neural network on the global relation graph. Moreover, to effectively optimize the posterior distribution of the prototype vectors, we propose to use the stochastic gradient Langevin dynamics, which is related to the MAML algorithm but is able to handle the uncertainty of the prototype vectors. The whole framework can be effectively and efficiently optimized in an end-to-end fashion. Experiments on two benchmark datasets prove the effectiveness of our proposed approach against competitive baselines in both the few-shot and zero-shot settings.
Submitted 5 July, 2020;
originally announced July 2020.
-
Graph Policy Network for Transferable Active Learning on Graphs
Authors:
Shengding Hu,
Zheng Xiong,
Meng Qu,
Xingdi Yuan,
Marc-Alexandre Côté,
Zhiyuan Liu,
Jian Tang
Abstract:
Graph neural networks (GNNs) have been attracting increasing popularity due to their simplicity and effectiveness in a variety of fields. However, a large number of labeled data is generally required to train these networks, which could be very expensive to obtain in some domains. In this paper, we study active learning for GNNs, i.e., how to efficiently label the nodes on a graph to reduce the annotation cost of training GNNs. We formulate the problem as a sequential decision process on graphs and train a GNN-based policy network with reinforcement learning to learn the optimal query strategy. By jointly training on several source graphs with full labels, we learn a transferable active learning policy which can directly generalize to unlabeled target graphs. Experimental results on multiple datasets from different domains prove the effectiveness of the learned policy in promoting active learning performance in both settings of transferring between graphs in the same domain and across different domains.
Submitted 23 October, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
COVI White Paper
Authors:
Hannah Alsdurf,
Edmond Belliveau,
Yoshua Bengio,
Tristan Deleu,
Prateek Gupta,
Daphne Ippolito,
Richard Janda,
Max Jarvie,
Tyler Kolody,
Sekoul Krastev,
Tegan Maharaj,
Robert Obryk,
Dan Pilat,
Valerie Pisano,
Benjamin Prud'homme,
Meng Qu,
Nasim Rahaman,
Irina Rish,
Jean-Francois Rousseau,
Abhinav Sharma,
Brooke Struck,
Jian Tang,
Martin Weiss,
Yun William Yu
Abstract:
The SARS-CoV-2 (Covid-19) pandemic has caused significant strain on public health institutions around the world. Contact tracing is an essential tool to change the course of the Covid-19 pandemic. Manual contact tracing of Covid-19 cases has significant challenges that limit the ability of public health authorities to minimize community infections. Personalized peer-to-peer contact tracing through the use of mobile apps has the potential to shift the paradigm. Some countries have deployed centralized tracking systems, but more privacy-protecting decentralized systems offer much of the same benefit without concentrating data in the hands of a state authority or for-profit corporations. Machine learning methods can circumvent some of the limitations of standard digital tracing by incorporating many clues and their uncertainty into a more graded and precise estimation of infection risk. The estimated risk can provide early risk awareness, personalized recommendations and relevant information to the user. Finally, non-identifying risk data can inform epidemiological models trained jointly with the machine learning predictor. These models can provide statistical evidence for the importance of factors involved in disease transmission. They can also be used to monitor, evaluate and optimize health policy and (de)confinement scenarios according to medical and economic productivity indicators. However, such a strategy based on mobile apps and machine learning should proactively mitigate potential ethical and privacy risks, which could have substantial impacts on society (not only impacts on health but also impacts such as stigmatization and abuse of personal data). Here, we present an overview of the rationale, design, ethical considerations and privacy strategy of `COVI,' a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada.
Submitted 27 July, 2020; v1 submitted 18 May, 2020;
originally announced May 2020.
-
Edge Temperature Ring Oscillation Modulated by Turbulence Transition for Sustaining Stationary Improved Energy Confinement Plasmas
Authors:
A. D. Liu,
X. L. Zou,
M. K. Han,
T. B. Wang,
C. Zhou,
M. Y. Wang,
Y. M. Duan,
G. Verdoolaege,
J. Q. Dong,
Z. X. Wang,
X. Feng,
J. L. Xie,
G. Zhuang,
W. X. Ding,
S. B. Zhang,
Y. Liu,
H. Q. Liu,
L. Wang,
Y. Y. Li,
Y. M. Wang,
B. Lv,
G. H. Hu,
Q. Zhang,
S. X. Wang,
H. L. Zhao
, et al. (11 additional authors not shown)
Abstract:
A reproducible stationary improved confinement mode (I-mode) has been achieved recently in the Experimental Advanced Superconducting Tokamak, featuring good confinement without a particle transport barrier, which could be beneficial to solving the heat flux problem caused by edge localized modes (ELM) and the helium ash problem for future fusion reactors. The microscopic mechanism of sustaining stationary I-mode, based on the coupling between turbulence transition and the edge temperature oscillation, has been discovered for the first time. A radially localized edge temperature ring oscillation (ETRO) with azimuthally symmetric structure ($n=0$,$m=0$) has been identified and it is caused by alternating turbulence transitions between ion temperature gradient modes (ITG) and trapped electron modes (TEM). The ITG-TEM transition is controlled by local electron temperature gradient and consistent with the gyrokinetic simulations. The self-organizing system consisting of ETRO, turbulence and transport transitions plays the key role in sustaining the I-mode confinement. These results provide a novel physics basis for accessing, maintaining and controlling stationary I-mode in the future.
Submitted 19 February, 2020;
originally announced February 2020.
-
Continuous Graph Neural Networks
Authors:
Louis-Pascal A. C. Xhonneux,
Meng Qu,
Jian Tang
Abstract:
This paper builds on the connection between graph neural networks and traditional dynamical systems. We propose continuous graph neural networks (CGNN), which generalise existing graph neural networks with discrete dynamics in that they can be viewed as a specific discretisation scheme. The key idea is how to characterise the continuous dynamics of node representations, i.e. the derivatives of node representations, w.r.t. time. Inspired by existing diffusion-based methods on graphs (e.g. PageRank and epidemic models on social networks), we define the derivatives as a combination of the current node representations, the representations of neighbors, and the initial values of the nodes. We propose and analyse two possible dynamics on graphs---including each dimension of node representations (a.k.a. the feature channel) change independently or interact with each other---both with theoretical justification. The proposed continuous graph neural networks are robust to over-smoothing and hence allow us to build deeper networks, which in turn are able to capture the long-range dependencies between nodes. Experimental results on the task of node classification demonstrate the effectiveness of our proposed approach over competitive baselines.
Submitted 16 July, 2020; v1 submitted 2 December, 2019;
originally announced December 2019.
-
GraphMix: Improved Training of GNNs for Semi-Supervised Learning
Authors:
Vikas Verma,
Meng Qu,
Kenji Kawaguchi,
Alex Lamb,
Yoshua Bengio,
Juho Kannala,
Jian Tang
Abstract:
We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization. Further, we provide a theoretical analysis of how GraphMix improves the generalization bounds of the underlying graph neural network, without making any assumptions about the "aggregation" layer or the depth of the graph neural networks. We experimentally validate this analysis by applying GraphMix to various architectures such as Graph Convolutional Networks, Graph Attention Networks and Graph-U-Net. Despite its simplicity, we demonstrate that GraphMix can consistently improve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics.
Submitted 8 October, 2020; v1 submitted 25 September, 2019;
originally announced September 2019.
-
Collaborative Policy Learning for Open Knowledge Graph Reasoning
Authors:
Cong Fu,
Tong Chen,
Meng Qu,
Woojeong Jin,
Xiang Ren
Abstract:
In recent years, there has been a surge of interest in interpretable graph reasoning methods. However, these models often suffer from limited performance when working on sparse and incomplete graphs, due to the lack of evidential paths that can reach target entities. Here we study open knowledge graph reasoning---a task that aims to reason for missing facts over a graph augmented by a background text corpus. A key challenge of the task is to filter out "irrelevant" facts extracted from the corpus, in order to maintain an effective search space during path inference. We propose a novel reinforcement learning framework to train two collaborative agents jointly, i.e., a multi-hop graph reasoner and a fact extractor. The fact extraction agent generates fact triples from corpora to enrich the graph on the fly, while the reasoning agent provides feedback to the fact extractor and guides it towards promoting facts that are helpful for the interpretable reasoning. Experiments on two public datasets demonstrate the effectiveness of the proposed approach. Source code and datasets used in this paper can be downloaded at https://github.com/shanzhenren/CPL
Submitted 31 August, 2019;
originally announced September 2019.
-
Weakly-supervised Knowledge Graph Alignment with Adversarial Learning
Authors:
Meng Qu,
Jian Tang,
Yoshua Bengio
Abstract:
This paper studies aligning knowledge graphs from different sources or languages. Most existing methods train supervised methods for the alignment, which usually require a large number of aligned knowledge triplets. However, such a large number of aligned knowledge triplets may not be available or are expensive to obtain in many domains. Therefore, in this paper we propose to study aligning knowledge graphs in a fully-unsupervised or weakly-supervised fashion, i.e., without or with only a few aligned triplets. We propose an unsupervised framework to align the entity and relation embeddings of different knowledge graphs with an adversarial learning framework. Moreover, a regularization term which maximizes the mutual information between the embeddings of different knowledge graphs is used to mitigate the problem of mode collapse when learning the alignment functions. Such a framework can be further seamlessly integrated with existing supervised methods by utilizing a limited number of aligned triples as guidance. Experimental results on multiple datasets prove the effectiveness of our proposed approach in both the unsupervised and the weakly-supervised settings.
Submitted 6 July, 2019;
originally announced July 2019.
-
Probabilistic Logic Neural Networks for Reasoning
Authors:
Meng Qu,
Jian Tang
Abstract:
Knowledge graph reasoning, which aims at predicting the missing facts through reasoning with the observed facts, is critical to many applications. Such a problem has been widely explored by traditional logic rule-based approaches and recent knowledge graph embedding methods. A principled logic rule-based approach is the Markov Logic Network (MLN), which is able to leverage domain knowledge with first-order logic and meanwhile handle their uncertainty. However, the inference of MLNs is usually very difficult due to the complicated graph structures. Different from MLNs, knowledge graph embedding methods (e.g. TransE, DistMult) learn effective entity and relation embeddings for reasoning, which are much more effective and efficient. However, they are unable to leverage domain knowledge. In this paper, we propose the probabilistic Logic Neural Network (pLogicNet), which combines the advantages of both methods. A pLogicNet defines the joint distribution of all possible triplets by using a Markov logic network with first-order logic, which can be efficiently optimized with the variational EM algorithm. In the E-step, a knowledge graph embedding model is used for inferring the missing triplets, while in the M-step, the weights of logic rules are updated based on both the observed and predicted triplets. Experiments on multiple knowledge graphs prove the effectiveness of pLogicNet over many competitive baselines.
Submitted 29 October, 2019; v1 submitted 20 June, 2019;
originally announced June 2019.
-
vGraph: A Generative Model for Joint Community Detection and Node Representation Learning
Authors:
Fan-Yun Sun,
Meng Qu,
Jordan Hoffmann,
Chin-Wei Huang,
Jian Tang
Abstract:
This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distribution are parameterized by the low-dimensional representations of the nodes and communities. We designed an effective variational inference algorithm which regularizes the community membership of neighboring nodes to be similar in the latent space. Experimental results on multiple real-world graphs show that vGraph is very effective in both community detection and node representation learning, outperforming many competitive baselines in both tasks. We show that the framework of vGraph is quite flexible and can be easily extended to detect hierarchical communities.
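The generative assumption described above (each node is a mixture of communities, each community is a distribution over nodes, and both are parameterised by embeddings) can be sketched as follows. This is a minimal illustration only: the softmax parameterisation, the use of a separate context embedding table, and every name in the code are assumptions rather than details confirmed by the abstract.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def neighbor_distribution(node_emb, community_emb, context_emb):
    """Mixture-of-communities model over neighbouring nodes (illustrative).

    node_emb:      (N, d) embeddings of source nodes
    community_emb: (K, d) embeddings of communities
    context_emb:   (N, d) embeddings of nodes as generated neighbours (assumed)
    """
    # p(c | w): mixing coefficients of each node over K communities, shape (N, K).
    p_c_given_w = softmax(node_emb @ community_emb.T, axis=1)
    # p(n | c): each community as a distribution over all N nodes, shape (K, N).
    p_n_given_c = softmax(community_emb @ context_emb.T, axis=1)
    # p(n | w) = sum_c p(c | w) * p(n | c), shape (N, N).
    p_n_given_w = p_c_given_w @ p_n_given_c
    return p_n_given_w, p_c_given_w, p_n_given_c
```

In this sketch, taking the arg-max of a row of `p_c_given_w` would give a hard community assignment for that node, while the rows of `node_emb` serve as the node representations, which is how the two tasks share parameters.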
Submitted 17 September, 2019; v1 submitted 18 June, 2019;
originally announced June 2019.
-
GMNN: Graph Markov Neural Networks
Authors:
Meng Qu,
Yoshua Bengio,
Jian Tang
Abstract:
This paper studies semi-supervised object classification in relational data, which is a fundamental problem in relational data modeling. The problem has been extensively studied in the literature of both statistical relational learning (e.g. relational Markov networks) and graph neural networks (e.g. graph convolutional networks). Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training. In this paper, we propose the Graph Markov Neural Network (GMNN) that combines the advantages of both worlds. A GMNN models the joint distribution of object labels with a conditional random field, which can be effectively trained with the variational EM algorithm. In the E-step, one graph neural network learns effective object representations for approximating the posterior distributions of object labels. In the M-step, another graph neural network is used to model the local label dependency. Experiments on object classification, link classification, and unsupervised node representation learning show that GMNN achieves state-of-the-art results.
Submitted 23 July, 2020; v1 submitted 15 May, 2019;
originally announced May 2019.
-
Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs
Authors:
Woojeong Jin,
Meng Qu,
Xisen Jin,
Xiang Ren
Abstract:
Knowledge graph reasoning is a critical task in natural language processing. The task becomes more challenging on temporal knowledge graphs, where each fact is associated with a timestamp. Most existing methods focus on reasoning at past timestamps and are not able to predict facts happening in the future. This paper proposes Recurrent Event Network (RE-NET), a novel autoregressive architecture for predicting future interactions. The occurrence of a fact (event) is modeled as a probability distribution conditioned on temporal sequences of past knowledge graphs. Specifically, our RE-NET employs a recurrent event encoder to encode past facts and uses a neighborhood aggregator to model the connection of facts at the same timestamp. Future facts can then be inferred in a sequential manner based on the two modules. We evaluate our proposed method via link prediction at future times on five public datasets. Through extensive experiments, we demonstrate the strength of RE-NET, especially on multi-step inference over future timestamps, and achieve state-of-the-art performance on all five datasets. Code and data can be found at https://github.com/INK-USC/RE-Net.
Submitted 6 October, 2020; v1 submitted 11 April, 2019;
originally announced April 2019.
-
GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding
Authors:
Zhaocheng Zhu,
Shizhen Xu,
Meng Qu,
Jian Tang
Abstract:
Learning continuous representations of nodes has recently attracted growing interest in both academia and industry, due to their simplicity and effectiveness in a variety of applications. Most existing node embedding algorithms and systems are capable of processing networks with hundreds of thousands or a few million nodes. However, how to scale them to networks that have tens of millions or even hundreds of millions of nodes remains a challenging problem. In this paper, we propose GraphVite, a high-performance CPU-GPU hybrid system for training node embeddings, by co-optimizing the algorithm and the system. On the CPU end, augmented edge samples are generated in parallel by random walks in an online fashion on the network, and serve as the training data. On the GPU end, a novel parallel negative sampling is proposed to leverage multiple GPUs to train node embeddings simultaneously, without much data transfer and synchronization. Moreover, an efficient collaboration strategy is proposed to further reduce the synchronization cost between CPUs and GPUs. Experiments on multiple real-world networks show that GraphVite is highly efficient. It takes only about one minute for a network with 1 million nodes and 5 million edges on a single machine with 4 GPUs, and takes around 20 hours for a network with 66 million nodes and 1.8 billion edges. Compared to the current fastest system, GraphVite is about 50 times faster without any sacrifice on performance.
Submitted 2 March, 2019;
originally announced March 2019.
-
Learning Dual Retrieval Module for Semi-supervised Relation Extraction
Authors:
Hongtao Lin,
Jun Yan,
Meng Qu,
Xiang Ren
Abstract:
Relation extraction is an important task in structuring content of text data, and becomes especially challenging when learning with weak supervision---where only a limited number of labeled sentences are given and a large number of unlabeled sentences are available. Most existing work exploits unlabeled data based on the ideas of self-training (i.e., bootstrapping a model) and multi-view learning (e.g., ensembling multiple model variants). However, these methods either suffer from the issue of semantic drift, or do not fully capture the problem characteristics of relation extraction. In this paper, we leverage a key insight that retrieving sentences expressing a relation is a dual task of predicting the relation label for a given sentence---the two tasks are complementary to each other and can be optimized jointly for mutual enhancement. To model this intuition, we propose DualRE, a principled framework that introduces a retrieval module which is jointly trained with the original relation prediction module. In this way, high-quality samples selected by the retrieval module from unlabeled data can be used to improve the prediction module, and vice versa. Experimental results on two public datasets as well as case studies demonstrate the effectiveness of the DualRE approach. Code and data can be found at https://github.com/INK-USC/DualRE.
Submitted 22 February, 2019; v1 submitted 20 February, 2019;
originally announced February 2019.
-
I-mode investigation on the Experimental Advanced Superconducting Tokamak
Authors:
X. Feng,
A. D. Liu,
C. Zhou,
Z. X. Liu,
M. Y. Wang,
G. Zhuang,
X. L. Zou,
T. B. Wang,
Y. Z. Zhang,
J. L. Xie,
H. Q. Liu,
T. Zhang,
Y. Liu,
Y. M. Duan,
L. Q. Hu,
G. H. Hu,
D. F. Kong,
S. X. Wang,
H. L. Zhao,
Y. Y. Li,
L. M. Shao,
T. Y. Xia,
W. X. Ding,
T. Lan,
H. Li
, et al. (13 additional authors not shown)
Abstract:
By analyzing a large number of discharges in the unfavorable ion $ \vec B\times \nabla B $ drift direction, I-mode operation has been confirmed in the EAST tokamak. During the L-mode to I-mode transition, the energy confinement improves markedly through the formation of a high-temperature edge pedestal, while the particle confinement remains almost identical to that in the L-mode. Similar to I-mode observations on other devices, the $ E_r $ profiles obtained by the eight-channel Doppler backscattering system (DBS8)\cite{J.Q.Hu} show a deeper edge $ E_r $ well in the I-mode than in the L-mode. A weak coherent mode (WCM) with a frequency range of 40-150 kHz is observed in the edge plasma with a radial extent of about 2-3 cm. The WCM can be observed in both density fluctuations and radial electric field fluctuations, and bicoherence analyses show significant coupling between the WCM and high-frequency turbulence, implying that the $ E_r $ fluctuation and the resulting flow shear from the WCM should play an important role during I-mode. In addition, a low-frequency oscillation with a frequency range of 5-10 kHz always accompanies the WCM, where the GAM intensity decreases or the GAM disappears. Much evidence shows that the low-frequency oscillation may be a novel kind of limit cycle oscillation, but further investigations are needed to explain its new properties, such as the harmonics and the obvious magnetic perturbations.
Submitted 31 May, 2019; v1 submitted 13 February, 2019;
originally announced February 2019.
-
The compactness of commutators of Calderón-Zygmund operators with Dini condition
Authors:
Meng Qu,
Ying Li
Abstract:
Let $T$ be the $θ$-type Calderón-Zygmund operator with Dini condition. In this paper, we prove that for $b\in {\rm CMO}(\mathbb R^n)$, the commutator generated by $T$ with $b$ and the corresponding maximal commutator are both compact operators on $L^{p}(ω)$ spaces, where $ω$ is a Muckenhoupt $A_p$ weight function and $1<p<\infty$.
Submitted 25 December, 2017;
originally announced December 2017.
-
Weakly-supervised Relation Extraction by Pattern-enhanced Embedding Learning
Authors:
Meng Qu,
Xiang Ren,
Yu Zhang,
Jiawei Han
Abstract:
Extracting relations from text corpora is an important task in text mining. It becomes particularly challenging when focusing on weakly-supervised relation extraction, that is, utilizing a few relation instances (i.e., a pair of entities and their relation) as seeds to extract more instances from corpora. Existing distributional approaches leverage the corpus-level co-occurrence statistics of entities to predict their relations, and require a large number of labeled instances to learn effective relation classifiers. Alternatively, pattern-based approaches perform bootstrapping or apply neural networks to model the local contexts, but still rely on a large number of labeled instances to build reliable models. In this paper, we study integrating the distributional and pattern-based methods in a weakly-supervised setting, such that the two types of methods can provide complementary supervision for each other to build an effective, unified model. We propose a novel co-training framework with a distributional module and a pattern module. During training, the distributional module helps the pattern module discriminate between the informative patterns and other patterns, and the pattern module generates some highly-confident instances to improve the distributional module. The whole framework can be effectively optimized by iterating between improving the pattern module and updating the distributional module. We conduct experiments on two tasks: knowledge base completion with text corpora and corpus-level relation extraction. Experimental results prove the effectiveness of our framework in the weakly-supervised setting.
Submitted 25 December, 2017; v1 submitted 8 November, 2017;
originally announced November 2017.
-
An Attention-based Collaboration Framework for Multi-View Network Representation Learning
Authors:
Meng Qu,
Jian Tang,
Jingbo Shang,
Xiang Ren,
Ming Zhang,
Jiawei Han
Abstract:
Learning distributed node representations in networks has been attracting increasing attention recently due to its effectiveness in a variety of applications. Existing approaches usually study networks with a single type of proximity between nodes, which defines a single view of a network. However, in reality there usually exist multiple types of proximity between nodes, yielding networks with multiple views. This paper studies learning node representations for networks with multiple views, which aims to infer robust node representations across different views. We propose a multi-view representation learning approach, which promotes the collaboration of different views and lets them vote for the robust representations. During the voting process, an attention mechanism is introduced, which enables each node to focus on the most informative views. Experimental results on real-world networks show that the proposed approach outperforms existing state-of-the-art approaches for network representation learning with a single view and other competitive approaches with multiple views.
Submitted 19 September, 2017;
originally announced September 2017.
-
Automatic Synonym Discovery with Knowledge Bases
Authors:
Meng Qu,
Xiang Ren,
Jiawei Han
Abstract:
Recognizing entity synonyms from text has become a crucial task in many entity-leveraging applications. However, discovering entity synonyms from domain-specific text corpora (e.g., news articles, scientific papers) is rather challenging. Current systems take an entity name string as input to find out other names that are synonymous, ignoring the fact that often times a name string can refer to multiple entities (e.g., "apple" could refer to both Apple Inc and the fruit apple). Moreover, most existing methods require training data manually created by domain experts to construct supervised-learning systems. In this paper, we study the problem of automatic synonym discovery with knowledge bases, that is, identifying synonyms for knowledge base entities in a given domain-specific corpus. The manually-curated synonyms for each entity stored in a knowledge base not only form a set of name strings to disambiguate the meaning for each other, but also can serve as "distant" supervision to help determine important features for the task. We propose a novel framework, called DPE, to integrate two kinds of mutually-complementing signals for synonym discovery, i.e., distributional features based on corpus-level statistics and textual patterns based on local contexts. In particular, DPE jointly optimizes the two kinds of signals in conjunction with distant supervision, so that they can mutually enhance each other in the training stage. At the inference stage, both signals will be utilized to discover synonyms for the given entities. Experimental results prove the effectiveness of the proposed framework.
Submitted 25 June, 2017;
originally announced June 2017.
-
Identity-sensitive Word Embedding through Heterogeneous Networks
Authors:
Jian Tang,
Meng Qu,
Qiaozhu Mei
Abstract:
Most existing word embedding approaches do not distinguish the same words in different contexts, therefore ignoring their contextual meanings. As a result, the learned embeddings of these words are usually a mixture of multiple meanings. In this paper, we acknowledge multiple identities of the same word in different contexts and learn identity-sensitive word embeddings. Based on an identity-labeled text corpus, a heterogeneous network of words and word identities is constructed to model different levels of word co-occurrences. The heterogeneous network is further embedded into a low-dimensional space through a principled network embedding approach, through which we are able to obtain the embeddings of words and the embeddings of word identities. We study three different types of word identities including topics, sentiments and categories. Experimental results on real-world data sets show that the identity-sensitive word embeddings learned by our approach indeed capture different meanings of words and outperform competitive methods on tasks including text classification and word similarity computation.
Submitted 29 November, 2016;
originally announced November 2016.
-
Meta-Path Guided Embedding for Similarity Search in Large-Scale Heterogeneous Information Networks
Authors:
Jingbo Shang,
Meng Qu,
Jialu Liu,
Lance M. Kaplan,
Jiawei Han,
Jian Peng
Abstract:
Most real-world data can be modeled as heterogeneous information networks (HINs) consisting of vertices of multiple types and their relationships. Searching for similar vertices of the same type in large HINs, such as bibliographic networks and business-review networks, is a fundamental problem with broad applications. Although similarity search in HINs has been studied previously, most existing approaches neither explore the rich semantic information embedded in the network structures nor take the user's preferences as guidance.
In this paper, we re-examine similarity search in HINs and propose a novel embedding-based framework. It models vertices as low-dimensional vectors to explore network structure-embedded similarity. To accommodate user preferences at defining similarity semantics, our proposed framework, ESim, accepts user-defined meta-paths as guidance to learn vertex vectors in a user-preferred embedding space. Moreover, an efficient and parallel sampling-based optimization algorithm has been developed to learn embeddings in large-scale HINs. Extensive experiments on real-world large-scale HINs demonstrate a significant improvement on the effectiveness of ESim over several state-of-the-art algorithms as well as its scalability.
Submitted 30 October, 2016;
originally announced October 2016.
-
CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases
Authors:
Xiang Ren,
Zeqiu Wu,
Wenqi He,
Meng Qu,
Clare R. Voss,
Heng Ji,
Tarek F. Abdelzaher,
Jiawei Han
Abstract:
Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, systems of entity relation extraction have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain, and are vulnerable to errors cascading down the pipeline. In this paper, we investigate joint extraction of typed entities and relations with labeled data heuristically obtained from knowledge bases (i.e., distant supervision). As our algorithm for type labeling via distant supervision is context-agnostic, noisy training data poses unique challenges for the task. We propose a novel domain-independent framework, called CoType, that runs a data-driven text segmentation algorithm to extract entity mentions, and jointly embeds entity mentions, relation mentions, text features and type labels into two low-dimensional spaces (for entity and relation mentions respectively), where, in each space, objects whose types are close will also have similar representations. CoType, then using these learned embeddings, estimates the types of test (unlinkable) mentions. We formulate a joint optimization problem to learn embeddings from text corpora and knowledge bases, adopting a novel partial-label loss function for noisy labeled data and introducing an object "translation" function to capture the cross-constraints of entities and relations on each other. Experiments on three public datasets demonstrate the effectiveness of CoType across different domains (e.g., news, biomedical), with an average of 25% improvement in F1 score compared to the next best method.
Submitted 2 June, 2017; v1 submitted 27 October, 2016;
originally announced October 2016.
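The CoType abstract above mentions a partial-label loss for noisy, distantly supervised labels but does not state its form. The following is a minimal sketch of one common margin-based variant, given here as an assumption for illustration and not necessarily CoType's exact objective: the best-scoring candidate type should beat the best-scoring non-candidate type by a margin. The function name, the `scores` dictionary, and the example type labels are all hypothetical.

```python
def partial_label_hinge_loss(scores, candidates, margin=1.0):
    """Margin-based partial-label loss (illustrative assumption, not CoType's code).

    scores:     dict mapping each type label to a similarity score for one mention
    candidates: set of noisy candidate labels obtained via distant supervision
    """
    candidate_scores = [s for label, s in scores.items() if label in candidates]
    other_scores = [s for label, s in scores.items() if label not in candidates]
    if not candidate_scores or not other_scores:
        raise ValueError("need at least one candidate and one non-candidate label")
    # Only the best candidate is pushed above the best non-candidate, so the
    # other (possibly wrong) candidate labels are not forced to score highly.
    gap = max(candidate_scores) - max(other_scores)
    return max(0.0, margin - gap)


# Hypothetical scores for a single entity mention.
scores = {"person": 2.0, "politician": 0.5, "location": 0.1}
loss_clean = partial_label_hinge_loss(scores, {"person", "politician"})
loss_noisy = partial_label_hinge_loss(scores, {"location"})
```

In this sketch the loss is zero when some candidate label already leads by the margin, and it grows when every candidate label scores below a non-candidate, which is the situation noisy labels tend to produce.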
-
A preliminary study on dispersions of fatigue properties of materials
Authors:
L Zhou,
H M Qu
Abstract:
Static mechanical properties (e.g., elastic modulus) and fatigue properties of a material all exhibit dispersion. Material inhomogeneity, which is well characterized by the dispersion of the elastic modulus, is the internal cause of the dispersion of fatigue properties, and the dispersion of the load is the external cause. In this paper, based on theoretical derivation and preliminary experimental verification, the following relationships between the dispersions of fatigue properties and the dispersions of static mechanical properties of a material are obtained. The dispersion of fatigue life is n (the fatigue index) times the sum of the dispersion of the elastic modulus and the dispersion of the load. The corresponding dispersion of fatigue life is a decreasing function of the given fatigue strength in the high cycle fatigue stage. The P-S curve (the probability statistical distribution of fatigue strength) under a given fatigue life cannot be obtained directly by testing, but it can be derived on the basis of the P-N curve (the probability statistical distribution of fatigue life), and the corresponding dispersion of fatigue strength is a decreasing function of the given fatigue life. On the basis of these conclusions, both the inhomogeneity of the material and the dispersion of the load need to be considered in fatigue design, especially in high cycle fatigue design.
Submitted 22 October, 2016;
originally announced October 2016.
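The relationship stated in the abstract above, that the dispersion of fatigue life is n times the sum of the dispersion of the elastic modulus and the dispersion of the load, can be sketched as a small calculation. The function name and the numeric inputs are hypothetical, and the abstract does not say how each dispersion is measured, so the inputs are assumed here to be dimensionless relative dispersions.

```python
def fatigue_life_dispersion(fatigue_index, modulus_dispersion, load_dispersion):
    """Dispersion of fatigue life per the stated relationship:
    n * (dispersion of elastic modulus + dispersion of load).

    All arguments are assumed to be dimensionless relative dispersions;
    that interpretation is an assumption, not taken from the abstract.
    """
    return fatigue_index * (modulus_dispersion + load_dispersion)


# Hypothetical inputs: n = 4, 2% modulus dispersion, 3% load dispersion.
life_dispersion = fatigue_life_dispersion(4.0, 0.02, 0.03)

# Ignoring the load term understates the life dispersion under this relation.
material_only = fatigue_life_dispersion(4.0, 0.02, 0.0)
```

With these illustrative numbers, the load term contributes more to the fatigue life dispersion than the material term, which matches the abstract's point that load dispersion should not be neglected in fatigue design.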