-
Pre-training with Fractional Denoising to Enhance Molecular Property Prediction
Authors:
Yuyan Ni,
Shikun Feng,
Xin Hong,
Yuancheng Sun,
Wei-Ying Ma,
Zhi-Ming Ma,
Qiwei Ye,
Yanyan Lan
Abstract:
Deep learning methods have been considered promising for accelerating molecular screening in drug discovery and material design. Due to the limited availability of labelled data, various self-supervised molecular pre-training methods have been presented. While many existing methods utilize common pre-training tasks in computer vision (CV) and natural language processing (NLP), they often overlook the fundamental physical principles governing molecules. In contrast, applying denoising in pre-training can be interpreted as an equivalent force learning, but the limited noise distribution introduces bias into the molecular distribution. To address this issue, we introduce a molecular pre-training framework called fractional denoising (Frad), which decouples noise design from the constraints imposed by force learning equivalence. In this way, the noise becomes customizable, allowing for incorporating chemical priors to significantly improve molecular distribution modeling. Experiments demonstrate that our framework consistently outperforms existing methods, establishing state-of-the-art results across force prediction, quantum chemical properties, and binding affinity tasks. The refined noise design enhances force accuracy and sampling coverage, which contribute to the creation of physically consistent molecular representations, ultimately leading to superior predictive performance.
Submitted 14 July, 2024;
originally announced July 2024.
-
Feedback-Driven Automated Whole Bug Report Reproduction for Android Apps
Authors:
Dingbang Wang,
Yu Zhao,
Sidong Feng,
Zhaoxu Zhang,
William G. J. Halfond,
Chunyang Chen,
Xiaoxia Sun,
Jiangfan Shi,
Tingting Yu
Abstract:
In software development, bug report reproduction is a challenging task. This paper introduces ReBL, a novel feedback-driven approach that leverages GPT-4, a large-scale language model, to automatically reproduce Android bug reports. Unlike traditional methods, ReBL bypasses the use of Step to Reproduce (S2R) entities. Instead, it leverages the entire textual bug report and employs innovative prompts to enhance GPT's contextual reasoning. This approach is more flexible and context-aware than the traditional step-by-step entity matching approach, resulting in improved accuracy and effectiveness. In addition to handling crash reports, ReBL has the capability of handling non-crash bug reports. Our evaluation of 96 Android bug reports (73 crash and 23 non-crash) demonstrates that ReBL successfully reproduced 90.63% of these reports, averaging only 74.98 seconds per bug report. Additionally, ReBL outperformed three existing tools in both success rate and speed.
Submitted 6 July, 2024;
originally announced July 2024.
-
Prolonged Phase Segregation of Mixed-Halide Perovskite Nanocrystals in the Dark
Authors:
Xueying Ma,
Yuhui Ye,
Yang Xiao,
Shengnan Feng,
Chunfeng Zhang,
Keyu Xia,
Fengrui Hu,
Min Xiao,
Xiaoyong Wang
Abstract:
A critical issue hindering the potential applications of semiconductor mixed-halide perovskites is the phase segregation effect, wherein localized regions enriched with one type of halide anion form upon continuous photogeneration of excited-state charge carriers. These unexpected phases can remix in the dark under the entropic driving force, a process that has so far been studied exclusively after mixed-halide perovskites have reached the final stage of complete phase segregation. Here we show that after the removal of laser excitation from a solid film of mixed-halide perovskite nanocrystals with partial phase segregation, the iodide- and bromide-rich regions can continue to grow in the dark for a prolonged period of several minutes. We propose that this dark phase segregation is sustained by the local electric fields associated with the surface-trapped charge carriers, whose slow dissipation out of the mixed-halide perovskite nanocrystals delays the onset of the reverse phase remixing process.
Submitted 6 July, 2024;
originally announced July 2024.
-
FAUST XVII: Super deuteration in the planet forming system IRS 63 where the streamer strikes the disk
Authors:
L. Podio,
C. Ceccarelli,
C. Codella,
G. Sabatini,
D. Segura-Cox,
N. Balucani,
A. Rimola,
P. Ugliengo,
C. J. Chandler,
N. Sakai,
B. Svoboda,
J. Pineda,
M. De Simone,
E. Bianchi,
P. Caselli,
A. Isella,
Y. Aikawa,
M. Bouvier,
E. Caux,
L. Chahine,
S. B. Charnley,
N. Cuello,
F. Dulieu,
L. Evans,
D. Fedele, et al. (33 additional authors not shown)
Abstract:
Recent observations suggest that planet formation starts early, in protostellar disks of $\le10^5$ yrs, which are characterized by strong interactions with the environment, e.g., through accretion streamers and molecular outflows. To investigate the impact of such phenomena on disk physical and chemical properties, it is key to understand what chemistry planets inherit from their natal environment. In the context of the ALMA Large Program Fifty AU STudy of the chemistry in the disk/envelope system of Solar-like protostars (FAUST), we present observations on scales from ~1500 au to ~60 au of H$_2$CO, HDCO, and D$_2$CO towards the young planet-forming disk IRS 63. H$_2$CO probes the gas in the disk as well as in a large-scale streamer (~1500 au) impacting the South-East (SE) disk side. We detect for the first time deuterated formaldehyde, HDCO and D$_2$CO, in a planet-forming disk, and HDCO in the streamer that is feeding it. This allows us to estimate the deuterium fractionation of H$_2$CO in the disk: [HDCO]/[H$_2$CO]$\sim0.1-0.3$ and [D$_2$CO]/[H$_2$CO]$\sim0.1$. Interestingly, while HDCO follows the H$_2$CO distribution in the disk and in the streamer, the distribution of D$_2$CO is highly asymmetric, with a peak of the emission (and [D]/[H] ratio) in the SE disk side, where the streamer crashes onto the disk. In addition, D$_2$CO is detected in two spots along the blue- and red-shifted outflow. This suggests that: (i) in the disk, HDCO formation is dominated by gas-phase reactions, similarly to H$_2$CO, while (ii) D$_2$CO was mainly formed on the grain mantles during the prestellar phase and/or in the disk itself, and is at present released into the gas phase in the shocks driven by the streamer and the outflow. These findings testify to the key role of streamers in the build-up of the disk, concerning both the final mass available for planet formation and its chemical composition.
Submitted 5 July, 2024;
originally announced July 2024.
-
Spontaneous Reward Hacking in Iterative Self-Refinement
Authors:
Jane Pan,
He He,
Samuel R. Bowman,
Shi Feng
Abstract:
Language models are capable of iteratively improving their outputs based on natural language feedback, thus enabling in-context optimization of user preference. In place of human users, a second language model can be used as an evaluator, providing feedback along with numerical ratings which the generator attempts to optimize. However, because the evaluator is an imperfect proxy of user preference, this optimization can lead to reward hacking, where the evaluator's ratings improve while the generation quality remains stagnant or even decreases as judged by actual user preference. The concern of reward hacking is heightened in iterative self-refinement where the generator and the evaluator use the same underlying language model, in which case the optimization pressure can drive them to exploit shared vulnerabilities. Using an essay editing task, we show that iterative self-refinement leads to deviation between the language model evaluator and human judgment, demonstrating that reward hacking can occur spontaneously in-context with the use of iterative self-refinement. In addition, we study conditions under which reward hacking occurs and observe two factors that affect reward hacking severity: model size and context sharing between the generator and the evaluator.
Submitted 5 July, 2024;
originally announced July 2024.
-
Visible, Near-, and Mid-infrared Computational Spectrometer Enabled by Single-Spinning Film Encoder
Authors:
Junren Wen,
Weiming Shi,
Cheng Gao,
Yujie Liu,
Shuaibo Feng,
Yu Shao,
Haiqi Gao,
Yuchuan Shao,
Yueguang Zhang,
Weidong Shen,
Chenying Yang
Abstract:
Computational spectrometers are pivotal in enabling low-cost, in-situ and rapid spectral analysis, with potential applications in chemistry, biology, and environmental science. However, filter-based spectral encoding approaches typically use filter arrays, complicating the manufacturing process and hindering device consistency. By capitalizing on the polarization separation effect under oblique incidence (PSEOI), we pioneer the use of a single filter for highly efficient spectral encoding, and propose a novel computational spectrometer spanning visible to mid-infrared wavelengths by combining the Single-Spinning Film Encoder (SSFE) with a deep learning-based reconstruction algorithm. The particle swarm optimization (PSO) method is employed to optimize the film configuration of the SSFE, achieving low-correlation and high-complexity spectral responses under different polarizations and spinning angles, thereby enhancing both spectral resolution and accuracy of reconstruction across diverse spectral ranges. Spectral resolutions up to 0.5 nm, 2 nm, and 10 nm can be realized for single-peak narrowband spectra, and 3 nm, 6 nm, and 20 nm for dual-peak narrowband spectra, over the visible, near-, and mid-infrared wavelength ranges, respectively. Moreover, the proposed spectrometer demonstrates an overall 81.38% precision for the classification of 220 chemical compounds, confirming its robustness and precision in practical scenarios, along with the capability for compact, cost-effective spectroscopic solutions.
Submitted 3 July, 2024;
originally announced July 2024.
-
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Authors:
Daking Rai,
Yilun Zhou,
Shi Feng,
Abulhair Saparov,
Ziyu Yao
Abstract:
Mechanistic interpretability (MI) is an emerging sub-field of interpretability that seeks to understand a neural network model by reverse-engineering its internal computations. Recently, MI has garnered significant attention for interpreting transformer-based language models (LMs), resulting in many novel insights yet introducing new challenges. However, there has not been work that comprehensively reviews these insights and challenges, particularly as a guide for newcomers to this field. To fill this gap, we present a comprehensive survey outlining fundamental objects of study in MI, techniques that have been used for its investigation, approaches for evaluating MI results, and significant findings and applications stemming from the use of MI to understand LMs. In particular, we present a roadmap for beginners to navigate the field and leverage MI for their benefit. Finally, we also identify current gaps in the field and discuss potential future directions.
Submitted 2 July, 2024;
originally announced July 2024.
-
Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation
Authors:
Xinglin Wang,
Yiwei Li,
Shaoxiong Feng,
Peiwen Yuan,
Boyuan Pan,
Heda Wang,
Yao Hu,
Kan Li
Abstract:
Self-consistency (SC), leveraging multiple samples from LLMs, shows significant gains on various reasoning tasks but struggles with free-form generation due to the difficulty of aggregating answers. Its variants, UCS and USC, rely on sample selection or voting mechanisms to improve output quality. These methods, however, face limitations due to their inability to fully utilize the nuanced consensus knowledge present within multiple candidate samples, often resulting in suboptimal outputs. We propose Fine-Grained Self-Consistency (FSC) to address these limitations by extracting and integrating segment-level commonalities from candidate samples, enhancing the performance of LLMs in both open-ended and reasoning tasks. Based on this, we present two additional strategies: candidate filtering, which enhances overall quality by identifying highly similar candidate sets, and merging, which reduces input token requirements by combining similar samples. The effectiveness of FSC is demonstrated through extensive experiments on various tasks, including summarization, code generation, and mathematical reasoning, using GPT-3.5-turbo and GPT-4. The results indicate significant improvements over baseline methods, showcasing the potential of FSC to optimize output quality by effectively synthesizing fine-grained consensus knowledge from multiple samples.
Submitted 2 July, 2024;
originally announced July 2024.
-
Rotation effect on the spectral function of heavy vector mesons in holographic QCD
Authors:
Xiao-Long Wang,
Sheng-Qin Feng
Abstract:
Exploring the heavy vector mesons $J/\psi$ and $\Upsilon(1S)$ is crucial for understanding the quark-gluon plasma (QGP) formed in heavy-ion collisions. The influence of rotation on the properties of the $J/\psi$ and the $\Upsilon(1S)$ is investigated by incorporating a rotating medium into holographic QCD. It is found that temperature, chemical potential, and rotational radius effects enhance the dissociation of the $J/\psi$ and $\Upsilon(1S)$ states within the medium. This rotation-induced effect is more significant for heavy vector mesons in the transverse direction than in the longitudinal direction. We present the first holographic study of the influence of the radius of a homogeneous rotating system on the vector meson spectrum. It is found that increasing the rotation radius promotes the dissociation of the $J/\psi$ and $\Upsilon(1S)$ vector mesons. We also find that the dissociation perpendicular to the direction of the rotational angular velocity is more significant than that parallel to it at large rotational radius.
Submitted 30 June, 2024;
originally announced July 2024.
-
MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis
Authors:
Shikun Feng,
Jiaxin Zheng,
Yinjun Jia,
Yanwen Huang,
Fengfeng Zhou,
Wei-Ying Ma,
Yanyan Lan
Abstract:
Molecular representation learning is pivotal for various molecular property prediction tasks related to drug discovery. Robust and accurate benchmarks are essential for refining and validating current methods. Existing molecular property benchmarks derived from wet experiments, however, face limitations such as data volume constraints, unbalanced label distribution, and noisy labels. To address these issues, we construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules, meticulously designed to capture an extensive array of chemical, physical, and biological properties, derived through a robust computational ligand-target binding analysis pipeline. We conduct extensive experiments on various deep learning models, demonstrating that our dataset offers significant physicochemical interpretability to guide model development and design. Notably, the dataset's properties are linked to binding affinity metrics, providing additional insights into model performance in drug-target interaction tasks. We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning, thereby expediting progress in the field of artificial intelligence-driven drug discovery.
Submitted 12 June, 2024;
originally announced June 2024.
-
Switching Controller Synthesis for Hybrid Systems Against STL Formulas
Authors:
Han Su,
Shenghua Feng,
Sinong Zhan,
Naijun Zhan
Abstract:
Switching controllers play a pivotal role in directing hybrid systems (HSs) towards the desired objective, embodying a "correct-by-construction" approach to HS design. Identifying these objectives is thus crucial for the synthesis of effective switching controllers. While most existing works focus on safety and liveness, few of them consider timing constraints. In this paper, we delve into the synthesis of switching controllers for HSs that meet system objectives given by a fragment of STL, which essentially corresponds to a reach-avoid problem with timing constraints. Our approach involves iteratively computing the state sets that can be driven to satisfy the reach-avoid specification with timing constraints. This technique supports the creation of switching controllers for both constant and non-constant HSs. We validate our method's soundness, and confirm its relative completeness for a certain subclass of HSs. Experimental results affirm the efficacy of our approach.
Submitted 24 June, 2024;
originally announced June 2024.
-
Can LLM Graph Reasoning Generalize beyond Pattern Memorization?
Authors:
Yizhuo Zhang,
Heng Wang,
Shangbin Feng,
Zhaoxuan Tan,
Xiaochuang Han,
Tianxing He,
Yulia Tsvetkov
Abstract:
Large language models (LLMs) demonstrate great potential for problems with implicit graphical structures, while recent works seek to enhance the graph reasoning capabilities of LLMs through specialized instruction tuning. The resulting 'graph LLMs' are evaluated with in-distribution settings only, so it remains underexplored whether LLMs are learning generalizable graph reasoning skills or merely memorizing patterns in the synthetic training data. To this end, we propose the NLGift benchmark, an evaluation suite of LLM graph reasoning generalization: whether LLMs could go beyond semantic, numeric, structural, and reasoning patterns in the synthetic training data and improve utility on real-world graph-based tasks. Extensive experiments with two LLMs across four graph reasoning tasks demonstrate that while generalization on simple patterns (semantic, numeric) is somewhat satisfactory, LLMs struggle to generalize across reasoning and real-world patterns, casting doubt on the benefit of synthetic graph tuning for real-world tasks with underlying network structures. We explore three strategies to improve LLM graph reasoning generalization, and we find that while post-training alignment is most promising for real-world tasks, empowering LLM graph reasoning to go beyond pattern memorization remains an open research question.
Submitted 22 June, 2024;
originally announced June 2024.
-
Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration
Authors:
Shangbin Feng,
Taylor Sorensen,
Yuhan Liu,
Jillian Fisher,
Chan Young Park,
Yejin Choi,
Yulia Tsvetkov
Abstract:
While existing alignment paradigms have been integral in developing large language models (LLMs), LLMs often learn an averaged human preference and struggle to model diverse preferences across cultures, demographics, and communities. We propose Modular Pluralism, a modular framework based on multi-LLM collaboration for pluralistic alignment: it "plugs into" a base LLM a pool of smaller but specialized community LMs, where models collaborate in distinct modes to flexibly support three modes of pluralism: Overton, steerable, and distributional. Modular Pluralism is uniquely compatible with black-box LLMs and offers the modular control of adding new community LMs for previously underrepresented communities. We evaluate Modular Pluralism with six tasks and four datasets featuring questions/instructions with value-laden and perspective-informed responses. Extensive experiments demonstrate that Modular Pluralism advances the three pluralism objectives across six black-box and open-source LLMs. Further analysis reveals that LLMs are generally faithful to the inputs from smaller community LLMs, allowing seamless patching by adding a new community LM to better cover previously underrepresented communities.
Submitted 22 June, 2024;
originally announced June 2024.
-
Teaching LLMs to Abstain across Languages via Multilingual Feedback
Authors:
Shangbin Feng,
Weijia Shi,
Yike Wang,
Wenxuan Ding,
Orevaoghene Ahia,
Shuyue Stella Li,
Vidhisha Balachandran,
Sunayana Sitaram,
Yulia Tsvetkov
Abstract:
Multilingual LLMs often have knowledge disparities across languages, with larger gaps in under-resourced languages. Teaching LLMs to abstain in the face of knowledge gaps is thus a promising strategy to mitigate hallucinations in multilingual settings. However, previous studies on LLM abstention primarily focus on English; we find that directly applying existing solutions beyond English results in up to 20.5% performance gaps between high- and low-resource languages, potentially due to LLMs' drop in calibration and reasoning beyond a few resource-rich languages. To this end, we propose strategies to enhance LLM abstention by learning from multilingual feedback, where LLMs self-reflect on proposed answers in one language by generating multiple feedback items in related languages: we show that this helps identify the knowledge gaps across diverse languages, cultures, and communities. Extensive experiments demonstrate that our multilingual feedback approach outperforms various strong baselines, achieving up to 9.2% improvement for low-resource languages across three black-box and open models on three datasets, featuring open-book, closed-book, and commonsense QA. Further analysis reveals that multilingual feedback is both an effective and a more equitable abstain strategy to serve diverse language speakers, and that cultural factors have great impact on language selection and LLM abstention behavior, highlighting future directions for multilingual and multi-cultural reliable language modeling.
Submitted 22 June, 2024;
originally announced June 2024.
-
A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick
Authors:
Nishant Balepur,
Matthew Shu,
Alexander Hoyle,
Alison Robey,
Shi Feng,
Seraphina Goldfarb-Tarrant,
Jordan Boyd-Graber
Abstract:
Keyword mnemonics are memorable explanations that link new terms to simpler keywords. Prior works generate mnemonics for students, but they do not guide models toward mnemonics that students prefer and that aid learning. We build SMART, a mnemonic generator trained on feedback from real students learning new terms. To train SMART, we first fine-tune LLaMA-2 on a curated set of user-written mnemonics. We then use LLM alignment to enhance SMART: we deploy mnemonics generated by SMART in a flashcard app to find which mnemonics students favor. We gather 2684 preferences from 45 students across two types: expressed (inferred from ratings) and observed (inferred from student learning), yielding three key findings. First, expressed and observed preferences disagree; what students think is helpful does not fully capture what is truly helpful. Second, Bayesian models can synthesize complementary data from multiple preference types into a single effectiveness signal. SMART is tuned via Direct Preference Optimization on this signal, which we show resolves ties and missing labels in the typical method of pairwise comparisons, augmenting data for LLM output quality gains. Third, mnemonic experts assess SMART as matching GPT-4, at much lower deployment costs, showing the utility of capturing diverse student feedback to align LLMs in education.
Submitted 21 June, 2024;
originally announced June 2024.
-
Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation
Authors:
Yanwei Zheng,
Shaopu Feng,
Bowen Huang,
Changrui Li,
Xiao Zhang,
Dongxiao Yu
Abstract:
The task that requires an agent to navigate to a given object through only visual observation is called visual object navigation (VON). The main bottlenecks of VON are strategy exploration and prior knowledge exploitation. Traditional strategy exploration ignores the differences between the searching and navigating stages, using the same reward in both stages, which reduces navigation performance and training efficiency. Our study enables the agent to explore a larger area in the searching stage and seek the optimal path in the navigating stage, improving the success rate of navigation. Traditional prior knowledge exploitation focuses on learning and utilizing object association, ignoring the depth and obstacle information in the environment. This paper uses the RGB and depth information of the training scene to pretrain the feature extractor, which improves navigation efficiency. The obstacle information is memorized by the agent during navigation, reducing the probability of collision and deadlock. Depth, obstacle and other prior knowledge are concatenated and input into the policy network, and navigation actions are output under the training of two-stage rewards. We evaluated our method on AI2-Thor and RoboTHOR and demonstrated that it significantly outperforms state-of-the-art (SOTA) methods on success rate and navigation efficiency.
Submitted 20 June, 2024;
originally announced June 2024.
-
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
Authors:
Renqiu Xia,
Song Mao,
Xiangchao Yan,
Hongbin Zhou,
Bo Zhang,
Haoyang Peng,
Jiahao Pi,
Daocheng Fu,
Wenjie Wu,
Hancheng Ye,
Shiyang Feng,
Bin Wang,
Chao Xu,
Conghui He,
Pinlong Cai,
Min Dou,
Botian Shi,
Sheng Zhou,
Yongwei Wang,
Bin Wang,
Junchi Yan,
Fei Wu,
Yu Qiao
Abstract:
Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extraction and understanding tasks, and their capacity to process within-document data formats such as charts and equations remains under-explored. To address these issues, we present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community, using our custom auto-labeling pipeline. DocGenome features four key characteristics: 1) Completeness: It is the first dataset to structure data from all modalities including 13 layout attributes along with their LaTeX source codes. 2) Logicality: It provides 6 logical relationships between different entities within each scientific document. 3) Diversity: It covers various document-oriented tasks, including document classification, visual grounding, document layout detection, document transformation, open-ended single-page QA and multi-page QA. 4) Correctness: It undergoes rigorous quality control checks conducted by a specialized team. We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark.
Submitted 17 June, 2024;
originally announced June 2024.
-
Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models
Authors:
Sheng Feng,
Heyang Liu,
Yu Wang,
Yanfeng Wang
Abstract:
In this paper, we introduce a groundbreaking end-to-end (E2E) framework for decoding invasive brain signals, marking a significant advancement in the field of speech neuroprosthesis. Our methodology leverages the comprehensive reasoning abilities of large language models (LLMs) to facilitate direct decoding. By fully integrating LLMs, we achieve results comparable to the state-of-the-art cascade models. Our findings underscore the immense potential of E2E frameworks in speech neuroprosthesis, particularly as the technology behind brain-computer interfaces (BCIs) and the availability of relevant datasets continue to evolve. This work not only showcases the efficacy of combining LLMs with E2E decoding for enhancing speech neuroprosthesis but also sets a new direction for future research in BCI applications, underscoring the impact of LLMs in decoding complex neural signals for communication restoration. Code will be made available at https://github.com/FsFrancis15/BrainLLM.
Submitted 17 June, 2024;
originally announced June 2024.
-
The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences
Authors:
Bria Long,
Violet Xiang,
Stefan Stojanov,
Robert Z. Sparks,
Zi Yin,
Grace E. Keene,
Alvin W. M. Tan,
Steven Y. Feng,
Chengxu Zhuang,
Virginia A. Marchman,
Daniel L. K. Yamins,
Michael C. Frank
Abstract:
Human children far exceed modern machine learning algorithms in their sample efficiency, achieving high performance in key domains with much less data than current models. This "data gap" is a key challenge both for building intelligent artificial systems and for understanding human development. Egocentric video capturing children's experience -- their "training data" -- is a key ingredient for comparison of humans and models and for the development of algorithmic innovations to bridge this gap. Yet there are few such datasets available, and extant data are low-resolution, have limited metadata, and importantly, represent only a small set of children's experiences. Here, we provide the first release of the largest developmental egocentric video dataset to date -- the BabyView dataset -- recorded using a high-resolution camera with a large vertical field-of-view and gyroscope/accelerometer data. This 493-hour dataset includes egocentric videos from children spanning 6 months to 5 years of age in both longitudinal, at-home contexts and in a preschool environment. We provide gold-standard annotations for the evaluation of speech transcription, speaker diarization, and human pose estimation, and evaluate models in each of these domains. We train self-supervised language and vision models and evaluate their transfer to out-of-distribution tasks including syntactic structure learning, object recognition, depth estimation, and image segmentation. Although performance in each scales with dataset size, overall performance is lower than when models are trained on curated datasets, especially in the visual domain. Our dataset stands as an open challenge for robust, humanlike AI systems: how can such systems achieve human-level success on the same scale and distribution of training data as humans?
Submitted 14 June, 2024;
originally announced June 2024.
-
A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation
Authors:
Yongkang Liu,
Ercong Nie,
Shi Feng,
Zheng Hua,
Zifeng Ding,
Daling Wang,
Yifei Zhang,
Hinrich Schütze
Abstract:
Current state-of-the-art dialogue systems heavily rely on extensive training datasets. However, challenges arise in domains where domain-specific training datasets are insufficient or entirely absent. To tackle this challenge, we propose a novel data Augmentation framework for Multi-Domain Dialogue Generation, referred to as AMD$^2$G. The AMD$^2$G framework consists of a data augmentation process and a two-stage training approach: domain-agnostic training and domain adaptation training. We posit that domain corpora are a blend of domain-agnostic and domain-specific features, with certain representation patterns shared among diverse domains. Domain-agnostic training aims to enable models to learn these common expressive patterns. To construct domain-agnostic dialogue corpora, we employ a de-domaining data processing technique used to remove domain-specific features. By mitigating the effects of domain-specific features, the model trained on the de-domained corpora can effectively learn common expression patterns in different domains. Subsequently, we adapt the learned domain-agnostic features to the target domain through domain adaptation training. We conduct experiments on Chinese dialogue datasets from five different domains and show that AMD$^2$G achieves superior performance compared to both direct training on the target domain corpus and collective training on all five domain corpora. Our work underscores AMD$^2$G as a viable alternative solution for low-resource multi-domain dialogue generation. Code and data associated with our work are available in a GitHub repository.
Submitted 28 June, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets
Authors:
Shenghua Wan,
Ziyuan Chen,
Le Gan,
Shuai Feng,
De-Chuan Zhan
Abstract:
Model-based offline reinforcement learning (RL) is a promising approach that leverages existing data effectively in many real-world applications, especially those involving high-dimensional inputs like images and videos. To alleviate the distribution shift issue in offline RL, existing model-based methods heavily rely on the uncertainty of learned dynamics. However, the model uncertainty estimation becomes significantly biased when observations contain complex distractors with non-trivial dynamics. To address this challenge, we propose a new approach, Separated Model-based Offline Policy Optimization (SeMOPO), which decomposes latent states into endogenous and exogenous parts via conservative sampling and estimates model uncertainty on the endogenous states only. We provide a theoretical guarantee on the model uncertainty and a performance bound for SeMOPO. To assess its efficacy, we construct the Low-Quality Vision Deep Data-Driven Datasets for RL (LQV-D4RL), where the data are collected by non-expert policies and the observations include moving distractors. Experimental results show that our method substantially outperforms all baseline methods, and further analytical experiments validate the critical designs in our method. The project website is https://sites.google.com/view/semopo.
Submitted 13 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays produced by dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter because they have low fluxes of astrophysical $γ$-ray background and a large amount of dark matter. In an analysis of more than 700 days of LHAASO observational data, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. Constraints on the lifetime of dark matter in decay mode are also derived.
Submitted 12 June, 2024;
originally announced June 2024.
-
Dynamic Stochastic Decoding Strategy for Open-Domain Dialogue Generation
Authors:
Yiwei Li,
Fei Mi,
Yitong Li,
Yasheng Wang,
Bin Sun,
Shaoxiong Feng,
Kan Li
Abstract:
Stochastic sampling strategies such as top-k and top-p have been widely used in dialogue generation tasks. However, an open-domain chatting system faces two different conversation scenarios, i.e. chit-chat and knowledge-based question answering. In the former, response diversity is essential due to the one-to-many nature of dialogue. The latter, on the other hand, requires less randomness, given that a stochastic decoding strategy entails the risk of generating incorrect information. As a result, an adaptive and flexible decoding strategy is needed to cope with these two scenarios simultaneously. To this end, we propose the dynamic decoding strategy (DDS), which can adjust the decoding space w.r.t. different contexts. In DDS, both sequence-level and token-level adaptive search can be achieved to adjust the decoding process in a unified framework. Besides, our adaptive algorithm can be used not only during model inference but also during the model training stage to further enhance performance. Comprehensive experiments indicate that the proposed decoding strategy can consistently improve the performance of pre-trained dialogue models when coupled with four widely used stochastic decoding algorithms.
Submitted 11 June, 2024;
originally announced June 2024.
-
Smart Navigation System for Parking Assignment at Large Events: Incorporating Heterogeneous Driver Characteristics
Authors:
Xi Cheng,
Gaofeng Su,
Siyuan Feng,
Ke Liu,
Chen Zhu,
Hui Lin,
Jilin Song,
Jianan Chen
Abstract:
Parking challenges escalate significantly during large events such as concerts or sports games, yet few studies address dynamic parking lot assignments for such occasions. This paper introduces a smart navigation system designed to optimize parking assignments swiftly during large events, utilizing a mixed search algorithm that accounts for the heterogeneous characteristics of drivers. We conducted simulations in the Berkeley city area during the "Big Game" to validate our system and demonstrate the benefits of our innovative parking assignment approach.
Submitted 14 May, 2024;
originally announced June 2024.
-
MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning
Authors:
Shuyue Stella Li,
Vidhisha Balachandran,
Shangbin Feng,
Jonathan Ilgen,
Emma Pierson,
Pang Wei Koh,
Yulia Tsvetkov
Abstract:
In high-stakes domains like clinical reasoning, AI assistants powered by large language models (LLMs) are yet to be reliable and safe. We identify a key obstacle towards reliability: existing LLMs are trained to answer any question, even with incomplete context in the prompt or insufficient parametric knowledge. We propose to change this paradigm to develop more careful LLMs that ask follow-up questions to gather necessary and sufficient information and respond reliably. We introduce MEDIQ, a framework to simulate realistic clinical interactions, which incorporates a Patient System and an adaptive Expert System. The Patient may provide incomplete information in the beginning; the Expert refrains from making diagnostic decisions when unconfident, and instead elicits missing details from the Patient via follow-up questions. To evaluate MEDIQ, we convert MEDQA and CRAFT-MD -- medical benchmarks for diagnostic question answering -- into an interactive setup. We develop a reliable Patient system and prototype several Expert systems, first showing that directly prompting state-of-the-art LLMs to ask questions degrades the quality of clinical reasoning, indicating that adapting LLMs to interactive information-seeking settings is nontrivial. We then augment the Expert with a novel abstention module to better estimate model confidence and decide whether to ask more questions, thereby improving diagnostic accuracy by 20.3%; however, performance still lags compared to an (unrealistic in practice) upper bound when full information is given upfront. Further analyses reveal that interactive performance can be improved by filtering irrelevant contexts and reformatting conversations. Overall, our paper introduces a novel problem towards LLM reliability, a novel MEDIQ framework, and highlights important future directions to extend the information-seeking abilities of LLM assistants in critical domains.
Submitted 4 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
SIG: Efficient Self-Interpretable Graph Neural Network for Continuous-time Dynamic Graphs
Authors:
Lanting Fang,
Yulian Yang,
Kai Wang,
Shanshan Feng,
Kaiyu Feng,
Jie Gui,
Shuliang Wang,
Yew-Soon Ong
Abstract:
While dynamic graph neural networks have shown promise in various applications, explaining their predictions on continuous-time dynamic graphs (CTDGs) is difficult. This paper investigates a new research task: self-interpretable GNNs for CTDGs. We aim to predict future links within the dynamic graph while simultaneously providing causal explanations for these predictions. There are two key challenges: (1) capturing the underlying structural and temporal information that remains consistent across both independent and identically distributed (IID) and out-of-distribution (OOD) data, and (2) efficiently generating high-quality link prediction results and explanations. To tackle these challenges, we propose a novel causal inference model, namely the Independent and Confounded Causal Model (ICCM). ICCM is then integrated into a deep learning architecture that considers both effectiveness and efficiency. Extensive experiments demonstrate that our proposed model significantly outperforms existing methods across link prediction accuracy, explanation quality, and robustness to shortcut features. Our code and datasets are anonymously released at https://github.com/2024SIG/SIG.
Submitted 29 May, 2024;
originally announced May 2024.
-
Learning from Uncertain Data: From Possible Worlds to Possible Models
Authors:
Jiongli Zhu,
Su Feng,
Boris Glavic,
Babak Salimi
Abstract:
We introduce an efficient method for learning linear models from uncertain data, where uncertainty is represented as a set of possible variations in the data, leading to predictive multiplicity. Our approach leverages abstract interpretation and zonotopes, a type of convex polytope, to compactly represent these dataset variations, enabling the symbolic execution of gradient descent on all possible worlds simultaneously. We develop techniques to ensure that this process converges to a fixed point and derive closed-form solutions for this fixed point. Our method provides sound over-approximations of all possible optimal models and viable prediction ranges. We demonstrate the effectiveness of our approach through theoretical and empirical analysis, highlighting its potential to reason about model and prediction uncertainty due to data quality issues in training data.
Submitted 28 May, 2024;
originally announced May 2024.
-
Superionic surface Li-ion transport in carbonaceous materials
Authors:
Jianbin Zhou,
Shen Wang,
Chaoshan Wu,
Ji Qi,
Hongli Wan,
Shen Lai,
Shijie Feng,
Tsz Wai Ko,
Zhaohui Liang,
Ke Zhou,
Nimrod Harpak,
Nick Solan,
Mengchen Liu,
Zeyu Hui,
Paulina J. Ai,
Kent Griffith,
Chunsheng Wang,
Shyue Ping Ong,
Yan Yao,
Ping Liu
Abstract:
Unlike Li-ion transport in the bulk of carbonaceous materials, little is known about Li-ion diffusion on their surface. In this study, we have discovered an ultra-fast Li-ion transport phenomenon on the surface of carbonaceous materials, particularly when they have limited Li insertion capacity along with a high surface area. This is exemplified by a carbon black, Ketjen Black (KB). An ionic conductivity of 18.1 mS cm$^{-1}$ at room temperature is observed, far exceeding most solid-state ion conductors. Theoretical calculations reveal a low diffusion barrier for the surface Li species. The species is also identified as Li*, which features a partial positive charge. As a result, lithiated KB functions effectively as an interlayer between Li and solid-state electrolytes (SSE) to mitigate dendrite growth and cell shorting. This function is found to be electrolyte agnostic, effective for both sulfide and halide SSEs. Further, lithiated KB can act as a high-performance mixed ion/electron conductor that is thermodynamically stable at potentials near Li metal. A graphite anode mixed with KB instead of a solid electrolyte demonstrates full utilization with a capacity retention of ~85% over 300 cycles. The discovery of this surface-mediated ultra-fast Li-ion transport mechanism provides new directions for the design of solid-state ion conductors and solid-state batteries.
Submitted 27 May, 2024;
originally announced May 2024.
-
Unusual switch from low-temperature T-quadratic resistivity in the underdoped pseudogap phase of cuprate superconductors to low-temperature T-linear resistivity in the overdoped strange-metal phase
Authors:
Xingyu Ma,
Minghuan Zeng,
Huaiming Guo,
Shiping Feng
Abstract:
Transport experiments demonstrate a dramatic switch from the low-temperature T-linear resistivity in the overdoped strange-metal phase to the T-quadratic resistivity in the underdoped pseudogap phase of cuprate superconductors; however, a consensus on the origin of this switch is still lacking. Here the low-temperature resistivity in the underdoped pseudogap phase of cuprate superconductors is investigated using the Boltzmann transport equation. The low-temperature resistivity originates from the electron umklapp scattering mediated by the spin excitation, with the dominant contribution coming from the antinodal umklapp scattering. In particular, a low temperature $T_{scale}$ scales with $Δ^{2}_{p}$ in the underdoped regime due to the opening of a momentum-dependent spin pseudogap, where $Δ_{p}$ is the minimal umklapp vector at the antinode. Notably, as a function of doping, $T_{scale}$ behaves similarly to the antinodal spin pseudogap crossover temperature, i.e., $T_{scale}$ decreases with increasing doping in the underdoped regime, and then is reduced to a very low temperature in the overdoped regime. In the underdoped regime, the resistivity is T-quadratic at low temperatures below $T_{scale}$, with a strength that weakens as the doping is raised. However, in the overdoped regime, the resistivity is T-linear at low temperatures above $T_{scale}$. The result in this paper, together with the recent study on the electrical transport in the overdoped regime, therefore shows that the electron umklapp scattering from a spin excitation responsible for the low-temperature T-linear resistivity in the overdoped strange-metal phase naturally produces the low-temperature T-quadratic resistivity in the underdoped pseudogap phase, as a result of the opening of a momentum-dependent spin pseudogap.
Submitted 26 May, 2024;
originally announced May 2024.
-
Spin-orbit coupling controlled two-dimensional magnetism in chromium trihalides
Authors:
Inhee Lee,
Jiefu Cen,
Oleksandr Molchanov,
Shi Feng,
Warren L. Huey,
Johan van Tol,
Joshua E. Goldberger,
Nandini Trivedi,
Hae-Young Kee,
P. Chris Hammel
Abstract:
CrX$_3$ (X = Cl, Br, I) have the same crystal structure and Hamiltonian but different ligand spin-orbit coupling (SOC) constants $λ_X$, providing an excellent material platform for exploring exotic two-dimensional (2D) spin orders. The microscopic mechanism underlying their 2D spin physics and their Hamiltonian remain unestablished, as does experimental corroboration of the Kitaev exchange interaction, which is central to realizing topological quantum spin liquids. We report a Kitaev interaction signature in magnetic anisotropy measured by ferromagnetic resonance (FMR) spectroscopy. We present measured values of the Heisenberg J, Kitaev K, and off-diagonal symmetric $Γ$ exchange interactions in CrX$_3$ determined using FMR and exact diagonalization. K and $Γ$ exhibit dominant quadratic dependencies on $λ_X$, indicating its central role in 2D magnetism. Our study provides a foundation for exploring exotic 2D magnetic topologies by tuning intrinsic material parameters such as SOC.
Submitted 26 May, 2024;
originally announced May 2024.
-
Multiple chemical tracers finally unveil the intricate NGC 1333 IRAS 4A outflow system. FAUST XVI
Authors:
Layal Chahine,
Cecilia Ceccarelli,
Marta De Simone,
Claire J. Chandler,
Claudio Codella,
Linda Podio,
Ana López-Sepulcre,
Nami Sakai,
Laurent Loinard,
Mathilde Bouvier,
Paola Caselli,
Charlotte Vastel,
Eleonora Bianchi,
Nicolás Cuello,
Francesco Fontani,
Doug Johnstone,
Giovanni Sabatini,
Tomoyuki Hanawa,
Ziwei E. Zhang,
Yuri Aikawa,
Gemma Busquet,
Emmanuel Caux,
Aurore Durán,
Eric Herbst,
François Ménard
, et al. (32 additional authors not shown)
Abstract:
The exploration of outflows in protobinary systems presents a challenging yet crucial endeavour, offering valuable insights into the dynamic interplay between protostars and their evolution. In this study, we examine the morphology and dynamics of jets and outflows within the IRAS 4A protobinary system. This analysis is based on ALMA observations of SiO(5--4), H$_2$CO(3$_{0,3}$--2$_{0,3}$), and HDCO(4$_{1,4}$--3$_{1,3}$) with a spatial resolution of $\sim$150 au. Leveraging an astrochemical approach involving the use of diverse tracers beyond traditional ones has enabled the identification of novel features and a comprehensive understanding of the broader outflow dynamics. Our analysis reveals the presence of two jets in the redshifted emission, emanating from IRAS 4A1 and IRAS 4A2, respectively. Furthermore, we identify four distinct outflows in the region for the first time, with each protostar, 4A1 and 4A2, contributing to two of them. We characterise the morphology and orientation of each outflow, challenging previous suggestions of bends in their trajectories. The outflow cavities of IRAS 4A1 exhibit extensions of 10$''$ and 13$''$ with position angles (PA) of 0$^{\circ}$ and -12$^{\circ}$, respectively, while those of IRAS 4A2 are more extended, spanning 18$''$ and 25$''$ with PAs of 29$^{\circ}$ and 26$^{\circ}$. We propose that the misalignment of the cavities is due to a jet precession in each protostar, a notion supported by the observation that the more extended cavities of the same source exhibit lower velocities, indicating they may stem from older ejection events.
Submitted 21 May, 2024;
originally announced May 2024.
-
Emergent Majorana metal from a chiral spin liquid
Authors:
Penghao Zhu,
Shi Feng,
Kang Wang,
Tao Xiang,
Nandini Trivedi
Abstract:
We propose a novel mechanism to explain the emergence of an intermediate gapless spin liquid phase (IGP) in the antiferromagnetic Kitaev model in an externally applied magnetic field, sandwiched between the well-known gapped chiral spin liquid (CSL) and the gapped partially polarized (PP) phase. We propose that in moderate fields, $π$-fluxes nucleate in the ground state and can trap Majorana zero modes. As these fluxes proliferate with increasing field, the Majorana zero modes overlap, creating an emergent Majorana metallic state with a 'Fermi surface' at zero energy. We further show that the Majorana spectral function captures the dynamical spin and dimer correlations obtained by the infinite Projected Entangled Pair States (iPEPS) ansatz. We discuss the implications of our results for candidate Kitaev materials.
Submitted 20 May, 2024;
originally announced May 2024.
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
CACL: Community-Aware Heterogeneous Graph Contrastive Learning for Social Media Bot Detection
Authors:
Sirry Chen,
Shuo Feng,
Songsong Liang,
Chen-Chen Zong,
Jing Li,
Piji Li
Abstract:
Social media bot detection is increasingly crucial with the rise of social media platforms. Existing methods predominantly model social networks as graphs and use graph neural networks (GNNs) for bot detection. However, most of these methods focus on improving the performance of GNNs while neglecting the community structure within social networks. Moreover, GNN-based methods still face problems such as poor model generalization due to the relatively small scale of the datasets and over-smoothing caused by the information propagation mechanism. To address these problems, we propose a Community-Aware Heterogeneous Graph Contrastive Learning framework (CACL), which models the social network as a heterogeneous graph with multiple node types and edge types, and then uses a community-aware module to dynamically mine both hard positive samples and hard negative samples for supervised graph contrastive learning with adaptive graph enhancement algorithms. Extensive experiments demonstrate that our framework addresses the previously mentioned challenges and outperforms competitive baselines on three social media bot benchmarks.
Submitted 3 June, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning
Authors:
Shikun Feng,
Yuyan Ni,
Minghao Li,
Yanwen Huang,
Zhi-Ming Ma,
Wei-Ying Ma,
Yanyan Lan
Abstract:
Recently, a noticeable trend has emerged in developing pre-trained foundation models in the domains of CV and NLP. However, for molecular pre-training, there is no universal model that can be applied effectively to various categories of molecular tasks, since existing prevalent pre-training methods are effective for specific types of downstream tasks. Furthermore, the lack of a profound understanding of existing pre-training methods, including 2D graph masking, 2D-3D contrastive learning, and 3D denoising, hampers the advancement of molecular foundation models. In this work, we provide a unified understanding of existing pre-training methods through the lens of contrastive learning. Their distinctions thus lie in clustering different views of molecules, which is shown to be beneficial to specific downstream tasks. To achieve a complete and general-purpose molecular representation, we propose a novel pre-training framework, named UniCorn, that inherits the merits of the three methods, depicting molecular views at three different levels. SOTA performance across quantum, physicochemical, and biological tasks and a comprehensive ablation study validate the universality and effectiveness of UniCorn.
Submitted 15 May, 2024;
originally announced May 2024.
-
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Authors:
Siwei Wang,
Yifei Shen,
Shi Feng,
Haoran Sun,
Shang-Hua Teng,
Wei Chen
Abstract:
In this paper, we present the findings of our Project ALPINE, which stands for ``Autoregressive Learning for Planning In NEtworks.'' Project ALPINE initiates a theoretical investigation into the development of planning capabilities in Transformer-based language models through their autoregressive learning mechanisms, aiming to identify any potential limitations in their planning abilities. We abstract planning as a network path-finding task where the objective is to generate a valid path from a specified source node to a designated target node. In terms of expressiveness, we show that the Transformer is capable of executing path-finding by embedding the adjacency and reachability matrices within its weights. Our theoretical analysis of the gradient-based learning dynamics of the Transformer reveals that the Transformer is capable of learning both the adjacency matrix and a limited form of the reachability matrix. These theoretical insights are then validated through experiments, which demonstrate that the Transformer indeed learns the adjacency matrix and an incomplete reachability matrix, in line with the predictions of our theoretical analysis. Additionally, when applying our methodology to a real-world planning benchmark, called Blocksworld, our observations remain consistent. Our theoretical and empirical analyses further unveil a potential limitation of the Transformer in path-finding: it cannot identify reachability relationships through transitivity, and thus would fail when path concatenation is needed to generate a path. In summary, our findings shed new light on how the internal mechanisms of autoregressive learning enable planning in networks. This study may contribute to our understanding of the general planning capabilities in other related domains.
Submitted 27 May, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
Flight Path Optimization with Optimal Control Method
Authors:
Gaofeng Su,
Xi Cheng,
Siyuan Feng,
Ke Liu,
Jilin Song,
Jianan Chen,
Chen Zhu,
Hui Lin
Abstract:
This paper addresses a crucial issue in aviation: how to optimize the trajectory and controls given to an aircraft in order to optimize flight time and fuel consumption. This study aims to provide elements of a response to this problem and to define, under certain simplifying assumptions, an optimal response using Constrained Finite Time Optimal Control (CFTOC). The first step is to define the dynamic model of the aircraft in accordance with the controllable inputs and wind disturbances. We then identify a precise optimization objective and implement an optimization program to solve it under simulated real-flight conditions. Finally, the optimization result is validated and discussed across different scenarios.
Submitted 14 May, 2024;
originally announced May 2024.
-
Deep Reinforcement Learning for Real-Time Ground Delay Program Revision and Corresponding Flight Delay Assignments
Authors:
Ke Liu,
Fan Hu,
Hui Lin,
Xi Cheng,
Jianan Chen,
Jilin Song,
Siyuan Feng,
Gaofeng Su,
Chen Zhu
Abstract:
This paper explores the optimization of Ground Delay Programs (GDP), a prevalent Traffic Management Initiative used in Air Traffic Management (ATM) to reconcile capacity and demand discrepancies at airports. Employing Reinforcement Learning (RL) to manage the inherent uncertainties in the national airspace system (such as weather variability, fluctuating flight demands, and airport arrival rates), we developed two RL models: Behavioral Cloning (BC) and Conservative Q-Learning (CQL). These models are designed to enhance GDP efficiency by using a reward function that integrates ground and airborne delays and terminal area congestion. We constructed a simulated single-airport environment, SAGDP_ENV, which incorporates real operational data along with predicted uncertainties to facilitate realistic decision-making scenarios. Using full-year 2019 data from Newark Liberty International Airport (EWR), our models aimed to preemptively set airport program rates. Despite thorough modeling and simulation, initial outcomes indicated that the models struggled to learn effectively, potentially owing to oversimplified environmental assumptions. This paper discusses the challenges encountered, evaluates the models' performance against actual operational data, and outlines future directions to refine RL applications in ATM.
Submitted 13 May, 2024;
originally announced May 2024.
-
Airport Delay Prediction with Temporal Fusion Transformers
Authors:
Ke Liu,
Kaijing Ding,
Xi Cheng,
Jianan Chen,
Siyuan Feng,
Hui Lin,
Jilin Song,
Chen Zhu
Abstract:
Since flight delay hurts passengers, airlines, and airports, its prediction is crucial for the decision-making of all stakeholders in the aviation industry and has thus been attempted in various previous studies. However, previous delay predictions are often categorical and made at a highly aggregated level. To improve on that, this study proposes to apply the novel Temporal Fusion Transformer model to predict numerical airport arrival delays at the quarter-hour level for the top 30 U.S. airports. Inputs to our model include airport demand and capacity forecasts, historic airport operation efficiency information, airport wind and visibility conditions, as well as en route weather and traffic conditions. The results show that our model achieves satisfactory performance, as measured by small prediction errors on the test set. In addition, the interpretability analysis of the model outputs identifies the important input factors for delay prediction.
Submitted 13 May, 2024;
originally announced May 2024.
-
Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at the level of a few months in the TeV band, which is consistent with low-frequency observations. Based on these observations, we report the detection of TeV $\gamma$-ray emission from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$\sigma$ with a best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.
Submitted 13 May, 2024;
originally announced May 2024.
-
MM-InstructEval: Zero-Shot Evaluation of (Multimodal) Large Language Models on Multimodal Reasoning Tasks
Authors:
Xiaocui Yang,
Wenfang Wu,
Shi Feng,
Ming Wang,
Daling Wang,
Yang Li,
Qi Sun,
Yifei Zhang,
Xiaoming Fu,
Soujanya Poria
Abstract:
The rising popularity of multimodal large language models (MLLMs) has sparked a significant increase in research dedicated to evaluating these models. However, current evaluation studies predominantly concentrate on the ability of models to comprehend and reason within a unimodal (vision-only) context, overlooking critical performance evaluations in complex multimodal reasoning tasks that integrate both visual and text contexts. Furthermore, tasks that demand reasoning across multiple modalities pose greater challenges and require a deep understanding of multimodal contexts. In this paper, we introduce a comprehensive assessment framework named MM-InstructEval, which integrates a diverse array of metrics to provide an extensive evaluation of the performance of various models and instructions across a broad range of multimodal reasoning tasks with vision-text contexts. MM-InstructEval advances research on the performance of MLLMs in complex multimodal reasoning tasks, facilitating a more thorough and holistic zero-shot evaluation of MLLMs. We first use the "Best Performance" metric to determine the upper performance limit of each model across various datasets. The "Mean Relative Gain" metric provides an analysis of the overall performance across different models and instructions, while the "Stability" metric evaluates their sensitivity to variations. Previous research has focused on evaluating models independently or on assessing instructions alone, overlooking the interplay between models and instructions. To address this gap, we introduce the "Adaptability" metric, designed to quantify the degree of adaptability between models and instructions. Evaluations are conducted on 31 models (23 MLLMs) across 16 multimodal datasets, covering 6 tasks, with 10 distinct instructions. The extensive analysis enables us to derive novel insights.
Submitted 12 May, 2024;
originally announced May 2024.
-
MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling
Authors:
Sidong Feng,
Suyu Ma,
Han Wang,
David Kong,
Chunyang Chen
Abstract:
The importance of computational modeling of mobile user interfaces (UIs) is undeniable. However, such modeling requires a high-quality UI dataset. Existing datasets are often outdated, having been collected years ago, and are frequently noisy, with mismatches in their visual representation. This presents challenges in modeling UI understanding in the wild. This paper introduces a novel approach to automatically mine UI data from Android apps, leveraging Large Language Models (LLMs) to mimic human-like exploration. To ensure dataset quality, we employ best practices in UI noise filtering and incorporate human annotation as a final validation step. Our results demonstrate the effectiveness of LLM-enhanced app exploration in mining more meaningful UIs, resulting in a large dataset, MUD, of 18k human-annotated UIs from 3.3k apps. We highlight the usefulness of MUD in two common UI modeling tasks, element detection and UI retrieval, showcasing its potential to establish a foundation for future research into high-quality, modern UIs.
Submitted 11 May, 2024;
originally announced May 2024.
-
LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought
Authors:
Zhuoxuan Jiang,
Haoyuan Peng,
Shanshan Feng,
Fan Li,
Dongsheng Li
Abstract:
Self-correction is emerging as a promising approach to mitigate the issue of hallucination in Large Language Models (LLMs). To facilitate effective self-correction, recent research has proposed mistake detection as its initial step. However, current literature suggests that LLMs often struggle to reliably identify reasoning mistakes when using simplistic prompting strategies. To address this challenge, we introduce a prompting strategy, termed the Pedagogical Chain-of-Thought (PedCoT), which is specifically designed to guide the identification of reasoning mistakes, particularly mathematical reasoning mistakes. PedCoT consists of pedagogical principles for prompt (PPP) design, a two-stage interaction process (TIP), and grounded PedCoT prompts, all inspired by the educational theory of the Bloom Cognitive Model (BCM). We evaluate our approach on two public datasets featuring math problems of varying difficulty levels. The experiments demonstrate that our zero-shot prompting strategy significantly outperforms strong baselines. The proposed method can achieve the goal of reliable mathematical mistake identification and provide a foundation for automatic math answer grading. The results underscore the significance of educational theory, serving as domain knowledge, in guiding prompting strategy design for addressing challenging tasks with LLMs effectively.
Submitted 9 May, 2024;
originally announced May 2024.
-
Fire in SRRN: Next-Gen 3D Temperature Field Reconstruction Technology
Authors:
Shenxiang Feng,
Xiaojian Hao,
Xiaodong Huang,
Pan Pei,
Tong Wei,
Chenyang Xu
Abstract:
In aerospace and energy engineering, accurate 3D combustion field temperature measurement is critical. The resolution of traditional methods based on algebraic iteration is limited by the initial voxel division. This study introduces a novel method for reconstructing three-dimensional temperature fields using the Spatial Radiation Representation Network (SRRN). This method utilizes the flame thermal radiation characteristics and differentiable rendering in graphics, and combines it with a multi-layer perceptron to achieve a functional representation of the flame temperature field. The effectiveness of SRRN is evaluated through simulated temperature field reconstruction experiments with different levels of complexity. The maximum root mean square error is 10.17, which proves the robustness of the algorithm to Gaussian noise and salt-and-pepper noise. We conducted a butane flame temperature field reconstruction experiment, and the maximum relative error between the reconstruction result and the thermocouple measurement value was 4.86%, confirming that the algorithm can achieve accurate reconstruction.
Submitted 9 May, 2024;
originally announced May 2024.
-
CO Observations of Early-mid Stage Major-mergers in MaNGA Survey
Authors:
Qingzheng Yu,
Taotao Fang,
Cong Kevin Xu,
Shuai Feng,
Siyi Feng,
Yu Gao,
Xue-Jian Jiang,
Ute Lisenfeld
Abstract:
We present a study of the molecular gas in early-mid stage major-mergers, with a sample of 43 major-merger galaxy pairs selected from the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey and a control sample of 195 isolated galaxies selected from the xCOLD GASS survey. Adopting kinematic asymmetry as a new effective indicator to describe the merger stage, we aim to study the role of molecular gas in the merger-induced star formation enhancement along the merger sequence of galaxy pairs. We obtain the molecular gas properties from CO observations with the James Clerk Maxwell Telescope (JCMT), the Institut de Radioastronomie Millimétrique (IRAM) 30-m telescope, and the MASCOT survey. Using these data, we investigate the differences in molecular gas fraction ($f_{\rm H_{2}}$), star formation rate (SFR), star formation efficiency (SFE), molecular-to-atomic gas ratio ($M_{\rm H_{2}}/M_{\rm HI}$), total gas fraction ($f_{\rm gas}$), and the star formation efficiency of total gas (${\rm SFE_{gas}}$) between the pair and control samples. In the full pair sample, our results suggest the $f_{\rm H_{2}}$ of paired galaxies is significantly enhanced, while the SFE is comparable to that of isolated galaxies. We detect significantly increased $f_{\rm H_{2}}$ and $M_{\rm H_{2}}/M_{\rm HI}$ in paired galaxies at the pericenter stage, indicating an accelerated transition from atomic gas to molecular gas due to interactions. Our results indicate that the elevation of $f_{\rm H_{2}}$ plays a major role in the enhancement of global SFR in paired galaxies at the pericenter stage, while the contribution of enhanced SFE in specific regions requires further exploration through spatially resolved observations of a larger sample spanning a wide range of merger stages.
Submitted 29 April, 2024;
originally announced April 2024.
-
Enhanced second harmonic generation in high-$Q$ all-dielectric metasurfaces with backward frequency conversion
Authors:
Xu Tu,
Siqi Feng,
Jiajun Li,
Yangguang Xing,
Feng Wu,
Tingting Liu,
Shuyuan Xiao
Abstract:
Here we employ the quasi-bound state in the continuum (quasi-BIC) resonance in all-dielectric metasurfaces for efficient nonlinear processes, taking the backward frequency conversion into account. We theoretically study the second-harmonic generation (SHG) from symmetry-broken AlGaAs metasurfaces and reveal the efficiency enhancement empowered by high-$Q$ quasi-BIC resonances. By introducing the correction term of nonlinear polarization at the fundamental wave field to the conventional undepleted approximation, we uncover the effect of backward frequency conversion on the nonlinear conversion efficiency. The SHG efficiency of $2.45\times10^{-2}$ obtained with the developed depleted model shows a $14.3\%$ decrease compared with $2.86\times10^{-2}$ in the conventional undepleted approximation, under an incident intensity of 10 MW/cm$^{2}$. Our results are of significant importance for designing efficient nonlinear metasurfaces supporting high-$Q$ resonances.
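As a quick consistency check on the two efficiencies quoted above, the relative decrease can be recomputed directly. The labels $\eta_{\rm dep}$ and $\eta_{\rm undep}$ are introduced here for the depleted and undepleted values and are not notation taken from the paper:

$$\frac{\eta_{\rm undep}-\eta_{\rm dep}}{\eta_{\rm undep}}=\frac{2.86\times10^{-2}-2.45\times10^{-2}}{2.86\times10^{-2}}=\frac{0.41}{2.86}\approx 0.143,$$

which agrees with the stated $14.3\%$ decrease.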
Submitted 11 June, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Central limit theorems associated with the hierarchical Dirichlet process
Authors:
Shui Feng,
J. E. Paguyo
Abstract:
The Dirichlet process is a discrete random measure specified by a concentration parameter and a base distribution, and is used as a prior distribution in Bayesian nonparametrics. The hierarchical Dirichlet process generalizes the Dirichlet process by randomizing the base distribution through a draw from another Dirichlet process. It is motivated by the study of groups of clustered data, where the group specific Dirichlet processes are linked through an intergroup Dirichlet process. Focusing on an individual group, the hierarchical Dirichlet process is a discrete random measure whose weights have stronger dependence than the weights of the Dirichlet process. In this paper, we study the asymptotic behavior of the power sum symmetric polynomials for the vector of weights of the hierarchical Dirichlet process when the corresponding concentration parameters tend to infinity. We establish central limit theorems and obtain explicit representations for the asymptotic variances, with the latter clearly showing the impact of the hierarchical structure. These objects are closely related to the homozygosity in population genetics, the Simpson diversity index in ecology, and the Herfindahl-Hirschman index in economics.
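To make the "power sum of the weights" concrete, here is a minimal sketch for the ordinary (non-hierarchical) Dirichlet process only, not the hierarchical construction or the limit theorems studied in the paper. It assumes the standard stick-breaking representation of the weights; the function names `dp_weights` and `power_sum`, the truncation tolerance, and the choice of concentration parameter are all illustrative. For the ordinary Dirichlet process with concentration $θ$, the expected second power sum (the homozygosity) is $1/(1+θ)$, which the Monte Carlo average should approximate.

```python
import random


def dp_weights(theta, rng, tol=1e-12):
    """Stick-breaking weights of a Dirichlet process with concentration theta.

    The infinite sequence is truncated once the unbroken stick is shorter
    than tol (an illustrative cutoff, not part of the model).
    """
    weights = []
    remaining = 1.0
    while remaining > tol:
        v = rng.betavariate(1.0, theta)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def power_sum(weights, m=2):
    """m-th power sum of the weights; m=2 gives the homozygosity."""
    return sum(w ** m for w in weights)


rng = random.Random(0)
theta = 5.0
samples = [power_sum(dp_weights(theta, rng)) for _ in range(2000)]
mean_h = sum(samples) / len(samples)

print("Monte Carlo mean of the second power sum:", mean_h)
print("Theoretical value 1 / (1 + theta):      ", 1.0 / (1.0 + theta))
```

The same `power_sum` with `m=2` is the quantity that appears as the Simpson index in ecology and the Herfindahl-Hirschman index in economics, applied there to species or market shares rather than random weights.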
Submitted 24 April, 2024;
originally announced April 2024.
-
Deep neural networks for choice analysis: Enhancing behavioral regularity with gradient regularization
Authors:
Siqi Feng,
Rui Yao,
Stephane Hess,
Ricardo A. Daziano,
Timothy Brathwaite,
Joan Walker,
Shenhao Wang
Abstract:
Deep neural networks (DNNs) frequently present behaviorally irregular patterns, significantly limiting their practical potential and theoretical validity in travel behavior modeling. This study proposes strong and weak behavioral regularities as novel metrics to evaluate the monotonicity of individual demand functions (a.k.a. the law of demand), and further designs a constrained optimization framework with six gradient regularizers to enhance DNNs' behavioral regularity. The proposed framework is applied to travel survey data from Chicago and London to examine the trade-off between predictive power and behavioral regularity for large vs. small sample scenarios and in-domain vs. out-of-domain generalizations. The results demonstrate that, unlike models with strong behavioral foundations such as the multinomial logit, the benchmark DNNs cannot guarantee behavioral regularity. However, gradient regularization (GR) increases DNNs' behavioral regularity by around 6 percentage points (pp) while retaining their relatively high predictive power. In the small sample scenario, GR is more effective than in the large sample scenario, simultaneously improving behavioral regularity by about 20 pp and log-likelihood by around 1.7%. Compared with the in-domain generalization of DNNs, GR works more effectively in out-of-domain generalization: it drastically improves the behavioral regularity of poorly performing benchmark DNNs by around 65 pp, indicating the criticality of behavioral regularization for enhancing model transferability and application in forecasting. Moreover, the proposed framework is applicable to other NN-based choice models such as TasteNets. Future studies could use behavioral regularity as a metric along with log-likelihood in evaluating travel demand models, and investigate other methods to further enhance behavioral regularity when adopting complex machine learning models.
Submitted 22 April, 2024;
originally announced April 2024.
-
Application of Kalman Filter in Stochastic Differential Equations
Authors:
Wencheng Bao,
Shi Feng,
Kaiwen Zhang
Abstract:
In areas such as finance, engineering, and science, we often face situations that change quickly and unpredictably. These situations are tough to handle and require special tools and methods capable of understanding and predicting what might happen next. Stochastic Differential Equations (SDEs) are renowned for modeling and analyzing real-world dynamical systems. However, obtaining the parameters, boundary conditions, and closed-form solutions of SDEs can often be challenging. In this paper, we will discuss the application of Kalman filtering theory to SDEs, including Extended Kalman filtering and Particle Extended Kalman filtering. We will explore how to fit existing SDE systems through filtering and track the original SDEs by fitting the obtained closed-form solutions. This approach aims to gather more information about these SDEs, which could be used in various ways, such as incorporating them into parameters of data-based SDE models.
Submitted 21 April, 2024;
originally announced April 2024.
-
LLM Evaluators Recognize and Favor Their Own Generations
Authors:
Arjun Panickssery,
Samuel R. Bowman,
Shi Feng
Abstract:
Self-evaluation using large language models (LLMs) has proven valuable not only in benchmarking but also in methods like reward modeling, constitutional AI, and self-refinement. But new biases are introduced due to the same LLM acting as both the evaluator and the evaluatee. One such bias is self-preference, where an LLM evaluator scores its own outputs higher than others' while human annotators consider them of equal quality. But do LLMs actually recognize their own outputs when they give those texts higher scores, or is it just a coincidence? In this paper, we investigate whether self-recognition capability contributes to self-preference. We discover that, out of the box, LLMs such as GPT-4 and Llama 2 have non-trivial accuracy at distinguishing themselves from other LLMs and humans. By fine-tuning LLMs, we discover a linear correlation between self-recognition capability and the strength of self-preference bias; using controlled experiments, we show that the causal explanation resists straightforward confounders. We discuss how self-recognition can interfere with unbiased evaluations and AI safety more generally.
Submitted 15 April, 2024;
originally announced April 2024.