
arXiv:2407.04126 [pdf, other]

Atomistic modeling of bulk and grain boundary diffusion in solid electrolyte Li₆PS₅Cl using machine-learning interatomic potentials

Authors: Yongliang Ou, Yuji Ikeda, Lena Scholz, Sergiy Divinski, Felix Fritzen, Blazej Grabowski

Abstract: Li₆PS₅Cl is a promising candidate for the solid electrolyte in all-solid-state Li-ion batteries. In applications, this material is polycrystalline, and its grain boundaries (GBs) can affect ionic conductivity. Atomistic modeling provides valuable information on the impact of GBs on Li diffusion, but such studies face either high computational cost (ab initio methods) or limited accuracy (classical potentials). Here, we develop a quality-level-based active learning scheme for the efficient and systematic development of ab initio-based machine-learning interatomic potentials, specifically moment tensor potentials (MTPs), enabling large-scale, long-time, and high-accuracy simulations of the complex atomic structures and diffusion mechanisms encountered in solid electrolytes. Based on this scheme, we obtain MTPs for Li₆PS₅Cl and investigate two tilt GBs, $\Sigma3(1\bar{1}2)[110]$ and $\Sigma3(\bar{1}11)[110]$, and one twist GB, $\Sigma5(001)[001]$. All three GBs exhibit low formation energies of less than 20 meV/Å², indicating their high stability in polycrystalline Li₆PS₅Cl. Using the MTPs, diffusion coefficients of the anion-ordered and anion-disordered bulk, as well as of the three GBs, are obtained from molecular dynamics simulations of atomistic models.
At 300 K, the GB diffusion coefficients lie between those of the anion-ordered bulk structure (0.012×10⁻⁷ cm²/s, corresponding to an ionic conductivity of about 0.2 mS/cm) and the anion-disordered bulk structure (50% Cl/S-anion disorder; 2.203×10⁻⁷ cm²/s, about 29.8 mS/cm) of Li₆PS₅Cl. Experimental data fall between the Arrhenius-extrapolated diffusion coefficients of the investigated atomic structures.
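Diffusion coefficients and conductivities like those quoted above are commonly related through the Nernst-Einstein equation, σ = n(ze)²D/(k_B T), and the comparison with experiment uses Arrhenius fits, D = D₀ exp(-E_a/(k_B T)). The sketch below illustrates both steps under stated assumptions: a Haven ratio of 1, a Li number density derived from an assumed cubic lattice parameter of 9.85 Å with 24 Li per conventional cell, and hypothetical function names. It is not the authors' analysis code; their number density and treatment of correlations may differ, so computed conductivities need not match the quoted values exactly.

```python
import math

# Physical constants (exact SI values)
E_CHARGE = 1.602176634e-19  # elementary charge, C
K_B = 1.380649e-23          # Boltzmann constant, J/K


def nernst_einstein_conductivity(d_cm2_s, n_per_m3, temperature_k, charge=1):
    """Conductivity in mS/cm from a diffusion coefficient via sigma = n (z e)^2 D / (k_B T).

    Assumes a Haven ratio of 1, i.e. tracer and charge diffusivities are equal.
    """
    d_m2_s = d_cm2_s * 1e-4  # cm^2/s -> m^2/s
    sigma_s_m = n_per_m3 * (charge * E_CHARGE) ** 2 * d_m2_s / (K_B * temperature_k)
    return sigma_s_m * 10.0  # 1 S/m = 10 mS/cm


def arrhenius(d0, ea_ev, temperature_k):
    """D(T) = D0 exp(-Ea / (k_B T)) with Ea in eV; D has the units of D0."""
    return d0 * math.exp(-ea_ev * E_CHARGE / (K_B * temperature_k))


def fit_arrhenius(temperatures_k, diffusivities):
    """Least-squares line through (1/T, ln D); returns (D0, Ea in eV)."""
    xs = [1.0 / t for t in temperatures_k]
    ys = [math.log(d) for d in diffusivities]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -Ea / k_B (Ea in J)
    intercept = y_mean - slope * x_mean  # equals ln D0
    return math.exp(intercept), -slope * K_B / E_CHARGE


if __name__ == "__main__":
    a = 9.85e-10        # assumed cubic lattice parameter of Li6PS5Cl, m
    n_li = 24 / a ** 3  # assumed 24 Li per conventional cell
    print(nernst_einstein_conductivity(0.012e-7, n_li, 300.0))  # ordered bulk at 300 K
```

With these assumptions, the ordered-bulk value of 0.012×10⁻⁷ cm²/s gives about 0.19 mS/cm, the same magnitude as the quoted 0.2 mS/cm. `fit_arrhenius` can then be applied to high-temperature MD diffusivities and `arrhenius` used to extrapolate to 300 K.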

Submitted 4 July, 2024; originally announced July 2024.

Comments: 24 pages, 13 figures, 5 tables

arXiv:2407.03753 [pdf]

Low-Complexity SVM Signal Recovery in Bandwidth-Limited 100Gb/s PAM4 PON Upstream

Authors: Liyan Wu, Yanlu Huang, Kai Jin, Shangya Han, Kun Xu, Yanni Ou

Abstract: We proposed a low-complexity SVM-based signal recovery algorithm and evaluated it in a 100G-PON with 25G-class devices. For the first time, it experimentally achieved a 24 dB power budget at the FEC threshold of 1E-3 over 40 km of SMF, improving receiver sensitivity by more than 2 dB compared with feed-forward and decision-feedback equalization (FFE&DFE).
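The abstract does not describe the algorithm's internals. As a hedged illustration of SVM-based PAM4 recovery in general, the sketch below trains three linear SVMs, one per PAM4 decision threshold, on sliding windows of received samples from a hypothetical band-limited channel, so that the classifiers learn to undo intersymbol interference. The channel taps, noise level, window length, the Pegasos solver, and all function names are assumptions for illustration, not the authors' low-complexity design.

```python
import random

LEVELS = (-3.0, -1.0, 1.0, 3.0)  # PAM4 amplitudes for symbol indices 0..3


def isi_channel(symbols, taps=(0.2, 1.0, 0.3), noise_std=0.1, seed=0):
    """Hypothetical band-limited channel: y[n] = 0.2 x[n+1] + x[n] + 0.3 x[n-1] + noise."""
    rng = random.Random(seed)
    n = len(symbols)
    rx = []
    for i in range(n):
        acc = 0.0
        for k, h in enumerate(taps):  # k = 0: next symbol, 1: current, 2: previous
            j = i + 1 - k
            if 0 <= j < n:
                acc += h * LEVELS[symbols[j]]
        rx.append(acc + rng.gauss(0.0, noise_std))
    return rx


def feature_windows(rx, half=2):
    """Window of 2*half+1 received samples around each symbol, plus a constant bias feature."""
    n = len(rx)
    return [[rx[j] if 0 <= j < n else 0.0 for j in range(i - half, i + half + 1)] + [1.0]
            for i in range(n)]


def train_linear_svm(X, y, lam=0.01, iters=30000, seed=1):
    """Pegasos stochastic sub-gradient solver for a linear soft-margin SVM (labels +1/-1)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]  # regularization (shrink) step
        if margin < 1.0:                          # hinge-loss sub-gradient step
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def train_pam4_svms(X, symbols, **kwargs):
    """Three binary SVMs, one per threshold: 'symbol index > k' for k = 0, 1, 2."""
    return [train_linear_svm(X, [1 if s > k else -1 for s in symbols], **kwargs)
            for k in range(3)]


def detect(svms, X):
    """Decided symbol index = number of thresholds the sample is classified as lying above."""
    return [sum(1 for w in svms if sum(a * b for a, b in zip(w, x)) > 0.0) for x in X]


def symbol_error_rate(decided, sent):
    return sum(d != s for d, s in zip(decided, sent)) / len(sent)


if __name__ == "__main__":
    rng = random.Random(42)
    train = [rng.randrange(4) for _ in range(3000)]
    test = [rng.randrange(4) for _ in range(3000)]
    svms = train_pam4_svms(feature_windows(isi_channel(train, seed=1)), train)
    decided = detect(svms, feature_windows(isi_channel(test, seed=2)))
    print("SER:", symbol_error_rate(decided, test))
```

Decomposing PAM4 into three threshold decisions keeps each classifier linear and binary; the actual paper may use a different decomposition, kernel, or feature set to reach its complexity target.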

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.19856 [pdf]

LUT-boosted CDR and Equalization for Burst-mode 50/100 Gbit/s Bandwidth-limited Flexible PON

Authors: Yanlu Huang, Liyan Wu, Shangya Han, Kai Jin, Kun Xu, Yanni Ou

Abstract: We proposed and experimentally demonstrated a look-up table (LUT) boosted fast clock and data recovery (CDR) and equalization scheme for burst-mode 50/100 Gbit/s bandwidth-limited flexible PON. The scheme requires no preamble for convergence and achieves the same bit error rate performance as with long preambles.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.18532 [pdf, other]

Symbolic Learning Enables Self-Evolving Agents

Authors: Wangchunshu Zhou, Yixin Ou, Shengwei Ding, Long Li, Jialong Wu, Tiannan Wang, Jiamin Chen, Shuai Wang, Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang

Abstract: The AI community has been exploring a pathway to artificial general intelligence (AGI) by developing "language agents", which are complex large language model (LLM) pipelines involving both prompting techniques and tool usage methods. While language agents have demonstrated impressive capabilities for many real-world tasks, a fundamental limitation of current language agent research is that it is model-centric, or engineering-centric. That is, progress on the prompts, tools, and pipelines of language agents requires substantial manual engineering effort from human experts rather than automatic learning from data. We believe the transition from model-centric, or engineering-centric, to data-centric, i.e., the ability of language agents to autonomously learn and evolve in environments, is the key for them to possibly achieve AGI. In this work, we introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves in a data-centric way using symbolic optimizers. Specifically, we consider agents as symbolic networks whose learnable weights are defined by prompts, tools, and the way they are stacked together. Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning: back-propagation and gradient descent. Instead of dealing with numeric weights, agent symbolic learning works with natural language simulacra of weights, loss, and gradients.
We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks and show that agent symbolic learning enables language agents to update themselves after being created and deployed in the wild, resulting in "self-evolving agents".
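The abstract describes prompts as the "weights" of a symbolic network and natural-language feedback as its loss and gradients. The toy sketch below shows only that data flow: a forward pass through prompt-parameterized nodes, then a backward pass from the last node to the first in which each node receives a textual gradient and its prompt is rewritten. The LLM, critic, and optimizer are replaced by plain Python callables, and `Node`, `forward`, and `backward_and_update` are hypothetical names, not the API of the linked repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Node:
    name: str
    prompt: str  # the node's "symbolic weight"


Trace = List[Tuple[Node, str, str]]  # (node, input text, output text)


def forward(nodes: List[Node], x: str, run: Callable[[str, str], str]) -> Tuple[str, Trace]:
    """Run the pipeline, recording each node's input and output for the backward pass."""
    trace: Trace = []
    for node in nodes:
        y = run(node.prompt, x)
        trace.append((node, x, y))
        x = y
    return x, trace


def backward_and_update(trace: Trace, loss_text: str,
                        critic: Callable[[str, str, str], str],
                        optimizer: Callable[[str, str], str]) -> List[str]:
    """Propagate a textual 'loss' from the last node to the first, rewriting each prompt.

    critic(prompt, output, downstream_gradient) -> textual gradient for this node
    optimizer(prompt, gradient) -> updated prompt
    """
    grad = loss_text
    for node, _x, y in reversed(trace):
        grad = critic(node.prompt, y, grad)         # "gradient" for this node's output
        node.prompt = optimizer(node.prompt, grad)  # "gradient descent" on the prompt
    return [node.prompt for node, _x, _y in trace]
```

In a real system `run`, `critic`, and `optimizer` would each be LLM calls; the reverse traversal order is what makes the scheme analogous to back-propagation.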

Submitted 26 June, 2024; originally announced June 2024.

Comments: Code available at https://github.com/aiwaves-cn/agents

arXiv:2406.15050 [pdf, other]

Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis

Authors: Lin Fan, Xun Gong, Cenyang Zheng, Yafei Ou

Abstract: Medical Visual Question Answering (Med-VQA) is a challenging research topic with advantages including patient engagement and clinical expert involvement for second opinions. However, existing Med-VQA methods based on joint embedding fail to explain whether their results stem from correct reasoning or are merely coincidental, which undermines the credibility of VQA answers. In this paper, we investigate the construction of a more cohesive and stable Med-VQA structure. Motivated by causal effect, we propose a novel Triangular Reasoning VQA (Tri-VQA) framework, which constructs reverse causal questions from the perspective of "Why this answer?" to elucidate the source of the answer and stimulate more reasonable forward reasoning processes. We evaluate our method on an Endoscopic Ultrasound (EUS) multi-attribute annotated dataset from five centers and test it on medical VQA datasets. Experimental results demonstrate the superiority of our approach over existing methods. Our code and pre-trained models are available at https://anonymous.4open.science/r/Tri_VQA.

Submitted 21 June, 2024; originally announced June 2024.

ACM Class: I.2.7; I.2.10; J.3

arXiv:2406.10569 [pdf, other]

MDA: An Interpretable Multi-Modal Fusion with Missing Modalities and Intrinsic Noise

Authors: Lin Fan, Yafei Ou, Cenyang Zheng, Pengyu Dai, Tamotsu Kamishima, Masayuki Ikebe, Kenji Suzuki, Xun Gong

Abstract: Multi-modal fusion is crucial in medical data research, enabling a comprehensive understanding of diseases and improving diagnostic performance by combining diverse modalities. However, multi-modal fusion faces challenges, including capturing interactions between modalities, addressing missing modalities, handling erroneous modal information, and ensuring interpretability. Many existing studies design separate solutions for these problems, often overlooking the commonalities among them. This paper proposes a novel multi-modal fusion framework that adaptively adjusts the weight of each modality by introducing Modal-Domain Attention (MDA). It aims to facilitate the fusion of multi-modal information while accommodating missing modalities or intrinsic noise, thereby enhancing the representation of multi-modal data. We provide visualizations of accuracy changes and MDA weights by observing the process of modal fusion, offering a comprehensive analysis of its interpretability. In extensive experiments on various gastrointestinal disease benchmarks, the proposed MDA maintains high accuracy even in the presence of missing modalities and intrinsic noise. Notably, the visualization of MDA is highly consistent with the conclusions of existing clinical studies on the dependence of different diseases on various modalities. Code and dataset will be made available.
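The abstract does not specify how MDA computes its weights. As a generic illustration of adaptive per-modality weighting that tolerates missing inputs, the sketch below applies a softmax over per-modality relevance scores, masks out absent modalities so they receive exactly zero weight, and returns the weights alongside the fused vector so they can be visualized. The scoring source and the function name are hypothetical assumptions, not the paper's architecture.

```python
import math


def modal_attention_fusion(features, scores, present):
    """Fuse per-modality feature vectors with softmax weights over available modalities.

    features: one equal-length feature vector per modality
    scores:   unnormalized relevance score per modality (hypothetically from a learned scorer)
    present:  availability flags; missing modalities receive exactly zero weight
    Returns (fused_vector, weights); the weights can be inspected for interpretability.
    """
    available = [i for i, ok in enumerate(present) if ok]
    if not available:
        raise ValueError("at least one modality must be present")
    top = max(scores[i] for i in available)  # subtract max for numerical stability
    exps = {i: math.exp(scores[i] - top) for i in available}
    total = sum(exps.values())
    weights = [exps.get(i, 0.0) / total for i in range(len(features))]
    dim = len(features[0])
    fused = [sum(weights[i] * features[i][d] for i in available) for d in range(dim)]
    return fused, weights


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(modal_attention_fusion(feats, [0.0, 0.0, 10.0], [True, True, False]))
```

Because masking happens before normalization, the remaining modalities are reweighted to sum to one instead of being diluted by a placeholder for the missing input.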

Submitted 15 June, 2024; originally announced June 2024.

ACM Class: I.5.2; I.2.7; I.2.10; J.3

arXiv:2405.08074 [pdf]

Optical Imaging of Flavor Order in Flat Band Graphene

Authors: Tian Xie, Tobias M. Wolf, Siyuan Xu, Zhiyuan Cui, Richen Xiong, Yunbo Ou, Patrick Hays, Ludwig F Holleis, Yi Guo, Owen I Sheekey, Caitlin Patterson, Trevor Arp, Kenji Watanabe, Takashi Taniguchi, Seth Ariel Tongay, Andrea F Young, Allan H. MacDonald, Chenhao Jin

Abstract: Spin and valley flavor polarization plays a central role in the many-body physics of flat band graphene, with Fermi surface reconstructions often accompanied by quantized anomalous Hall and superconducting states observed in a variety of experimental systems. Here we describe an optical technique that sensitively and selectively detects flavor textures via the exciton response of a proximal transition metal dichalcogenide layer. Through a systematic study of rhombohedral and rotationally faulted graphene bilayers and trilayers, we show that when the semiconducting dichalcogenide is in direct contact with the graphene, the exciton response is most sensitive to the large momentum rearrangement of the Fermi surface, providing information that is distinct from and complementary to electrical compressibility measurements. The wide-field imaging capability of optical probes allows us to obtain spatial maps of flavor orders with high throughput and with broad temperature and device compatibility. Our work paves the way for optical probing and imaging of flavor orders in flat band graphene systems.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 29 pages, 4 figures, with supplementary materials

arXiv:2405.05502 [pdf, other]

Towards Accurate and Robust Architectures via Neural Architecture Search

Authors: Yuwei Ou, Yuqi Feng, Yanan Sun

Abstract: To defend deep neural networks against adversarial attacks, adversarial training has drawn increasing attention for its effectiveness. However, the accuracy and robustness achievable through adversarial training are limited by the architecture, because adversarial training improves accuracy and robustness only by adjusting the weight connections of a given architecture. In this work, we propose ARNAS to search for accurate and robust architectures for adversarial training. First, we design an accurate and robust search space, in which the placement of the cells and the proportional relationship of the filter numbers are carefully determined. With this design, the architectures can obtain both accuracy and robustness by deploying accurate and robust structures at their respective sensitive positions. Then, we propose a differentiable multi-objective search strategy that performs gradient descent in directions beneficial to both the natural loss and the adversarial loss, so that accuracy and robustness can be ensured at the same time. We conduct comprehensive experiments in terms of white-box attacks, black-box attacks, and transferability. Experimental results show that the searched architecture has the strongest robustness with competitive accuracy, and they break the traditional idea that NAS-based architectures cannot transfer well to complex tasks in robustness scenarios.
By analyzing the outstanding searched architectures, we also conclude that accurate and robust neural architectures tend to deploy different structures near the input and the output, which has great practical significance for both hand-crafted and automatically designed accurate and robust architectures.
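The abstract does not give the update rule of the multi-objective search. One standard way to obtain a direction that decreases two losses at once is the two-objective min-norm combination from the multiple-gradient descent algorithm (MGDA): pick the point of smallest norm on the segment between the two gradients. The plain-Python sketch below implements that rule on flattened architecture-parameter gradients; we do not claim ARNAS uses this exact rule, and the function name is hypothetical.

```python
def common_descent_direction(g_nat, g_adv):
    """Min-norm convex combination d = a*g_nat + (1-a)*g_adv (two-objective MGDA).

    If d is nonzero, stepping along -d decreases both losses to first order,
    because d . g_nat >= |d|^2 > 0 and d . g_adv >= |d|^2 > 0.
    If d is zero, the point is Pareto-stationary for the two losses.
    """
    diff = [a - b for a, b in zip(g_nat, g_adv)]
    denom = sum(x * x for x in diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every combination gives the same vector
    else:
        # argmin over alpha of |g_adv + alpha * (g_nat - g_adv)|^2, clipped to [0, 1]
        alpha = sum((b - a) * b for a, b in zip(g_nat, g_adv)) / denom
        alpha = min(1.0, max(0.0, alpha))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g_nat, g_adv)]
    return d, alpha


if __name__ == "__main__":
    # Natural-loss and adversarial-loss gradients that partly conflict
    print(common_descent_direction([3.0, 1.0], [-1.0, 2.0]))
```

An update would then be `theta_i -= lr * d_i` for each architecture parameter; the clipping handles the case where one gradient already lies "inside" the other and is itself the min-norm point.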

Submitted 8 May, 2024; originally announced May 2024.

Comments: Accepted by CVPR2024. arXiv admin note: substantial text overlap with arXiv:2212.14049

arXiv:2404.18550 [pdf, other]

IncidentResponseGPT: Generating Traffic Incident Response Plans with Generative Artificial Intelligence

Authors: Artur Grigorev, Adriana-Simona Mihaita, Khaled Saleh, Yuming Ou

Abstract: Road incidents pose a significant challenge in urban environments, causing traffic congestion, increased pollution, and economic losses. Efficiently managing these incidents is imperative for mitigating their adverse effects; however, the complexity of urban traffic systems and the variety of potential incidents present a considerable obstacle. This paper introduces IncidentResponseGPT, an innovative solution designed to assist traffic management authorities by providing rapid, informed, and adaptable traffic incident response plans. By integrating a Generative AI platform with real-time traffic incident reports and operational guidelines, our system aims to streamline decision-making in responding to traffic incidents. The research addresses the critical challenges of deploying AI in traffic management, including the complexity of urban traffic networks, the need for real-time decision-making, alignment with local laws and regulations, and public acceptance of AI-driven systems. Through a combination of text analysis of accident reports, validation of AI recommendations through traffic simulation, and implementation of transparent and validated AI systems, IncidentResponseGPT offers a promising approach to optimizing traffic flow and reducing congestion in the face of traffic incidents.
This work is relevant to traffic management authorities, emergency response teams, and municipal bodies, all integral stakeholders in urban traffic control and incident management. By proposing a novel solution to the identified challenges, this research aims to develop a framework that not only facilitates faster resolution of traffic incidents but also minimizes their overall impact on urban traffic systems.

Submitted 29 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.16344 [pdf]

Imaging Tunable Luttinger Liquid Systems in van der Waals Heterostructures

Authors: Hongyuan Li, Ziyu Xiang, Tianle Wang, Mit H. Naik, Woochang Kim, Jiahui Nie, Shiyu Li, Zhehao Ge, Zehao He, Yunbo Ou, Rounak Banerjee, Takashi Taniguchi, Kenji Watanabe, Sefaattin Tongay, Alex Zettl, Steven G. Louie, Michael P. Zaletel, Michael F. Crommie, Feng Wang

Abstract: One-dimensional (1D) interacting electrons are often described as a Luttinger liquid [1-4], with properties that are intrinsically different from those of Fermi liquids in higher dimensions [5,6]. 1D electrons in materials systems exhibit exotic quantum phenomena that can be tuned by both intra- and inter-chain electronic interactions, but their experimental characterization can be challenging. Here we demonstrate that layer-stacking domain walls (DWs) in van der Waals heterostructures form a broadly tunable Luttinger liquid system, including both isolated DWs and coupled arrays. We have imaged the evolution of DW Luttinger liquids under different interaction regimes, tuned by electron density, using a novel scanning tunneling microscopy (STM) technique. Single DWs at low carrier density are highly susceptible to Wigner crystallization, consistent with a spin-incoherent Luttinger liquid, while at intermediate densities dimerized Wigner crystals form due to an enhanced magneto-elastic coupling. Periodic arrays of DWs exhibit an interplay between intra- and inter-chain interactions that gives rise to new quantum phases. At low electron densities, inter-chain interactions dominate and induce a 2D electron crystal composed of phase-locked 1D Wigner crystals in a staggered configuration. Increasing the electron density causes intra-chain fluctuation potentials to dominate, leading to an electronic smectic liquid crystal phase in which electrons are ordered with algebraic correlation decay along the chain direction but disordered between chains.
Our work shows that layer-stacking DWs in 2D heterostructures offer new opportunities to explore Luttinger liquid physics.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.10980 [pdf, other]

Hyper Evidential Deep Learning to Quantify Composite Classification Uncertainty

Authors: Changbin Li, Kangshuo Li, Yuzhe Ou, Lance M. Kaplan, Audun Jøsang, Jin-Hee Cho, Dong Hyun Jeong, Feng Chen

Abstract: Deep neural networks (DNNs) have been shown to perform well on exclusive, multi-class classification tasks. However, when different classes have similar visual features, it becomes challenging for human annotators to differentiate them, which necessitates the use of composite class labels. In this paper, we propose a novel framework called Hyper-Evidential Neural Network (HENN) that explicitly models the predictive uncertainty caused by composite class labels in training data, in the context of the belief theory called Subjective Logic (SL). By placing a grouped Dirichlet distribution on the class probabilities, we treat the predictions of a neural network as parameters of hyper-subjective opinions, and we learn from data a deterministic DNN that collects both single and composite evidence leading to these hyper-opinions. We introduce vagueness, an uncertainty type originally designed for hyper-opinions in SL, to quantify composite classification uncertainty for DNNs. Our results on four image datasets demonstrate that HENN outperforms its state-of-the-art counterparts. The code and datasets are available at: https://github.com/Hugo101/HyperEvidentialNN.
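In Subjective Logic, evidence r(x) over singleton and composite values maps to belief masses b(x) = r(x)/(W + Σr) and uncertainty u = W/(W + Σr), and the vagueness of a hyper-opinion is the total belief mass assigned to composite values. The sketch below computes these quantities from a dictionary of evidence keyed by sets of class indices. Setting the prior weight W equal to the number of singleton classes is an assumption borrowed from evidential deep learning, the function name is hypothetical, and the paper's grouped-Dirichlet training objective is not reproduced.

```python
def hyper_opinion(evidence, num_classes, prior_weight=None):
    """Belief masses, uncertainty and vagueness of a subjective-logic hyper-opinion.

    evidence: dict mapping a frozenset of class indices to non-negative evidence;
        singletons are ordinary classes, larger proper subsets are composite labels.
    prior_weight: non-informative prior weight W; defaults to num_classes
        (an assumption borrowed from evidential deep learning).
    """
    for subset, e in evidence.items():
        valid = 0 < len(subset) < num_classes and all(0 <= k < num_classes for k in subset)
        if e < 0 or not valid:
            raise ValueError(f"invalid evidence entry {set(subset)}: {e}")
    w = float(num_classes if prior_weight is None else prior_weight)
    strength = w + sum(evidence.values())
    belief = {s: e / strength for s, e in evidence.items()}
    uncertainty = w / strength
    # Vagueness: total belief mass assigned to composite (multi-class) values.
    vagueness = sum(b for s, b in belief.items() if len(s) > 1)
    return belief, uncertainty, vagueness


if __name__ == "__main__":
    # Three classes; evidence for class 0 and for the composite label {1, 2}
    print(hyper_opinion({frozenset({0}): 6.0, frozenset({1, 2}): 3.0}, num_classes=3))
```

Belief masses plus uncertainty always sum to one, so evidence placed on a composite label raises vagueness without lowering the uncertainty attributable to missing evidence.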

Submitted 16 April, 2024; originally announced April 2024.

Comments: In Proceedings of The Twelfth International Conference on Learning Representations, ICLR 2024

arXiv:2404.05888 [pdf, other]

A Realistic Surgical Simulator for Non-Rigid and Contact-Rich Manipulation in Surgeries with the da Vinci Research Kit

Authors: Yafei Ou, Sadra Zargarzadeh, Paniz Sedighi, Mahdi Tavakoli

Abstract: Realistic real-time surgical simulators play an increasingly important role in surgical robotics research, such as surgical robot learning and automation, and surgical skills assessment. Although a number of surgical simulators exist for research, they generally lack the ability to simulate the diverse types of objects and contact-rich manipulation tasks typically present in surgeries, such as tissue cutting and blood suction. In this work, we introduce CRESSim, a realistic surgical simulator based on PhysX 5 for the da Vinci Research Kit (dVRK) that enables simulating various contact-rich surgical tasks involving different surgical instruments, soft tissue, and body fluids. The real-world dVRK console and the master tool manipulator (MTM) robots are incorporated into the system to allow teleoperation through virtual reality (VR). To showcase the advantages and potential of the simulator, we present three example surgical tasks: tissue grasping and deformation, blood suction, and tissue cutting. These tasks are performed through VR-based teleoperation using the simulated surgical instruments, including the large needle driver, suction irrigator, and curved scissors.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 7 pages, 21st International Conference on Ubiquitous Robots (UR 2024), accepted

arXiv:2403.13547 [pdf, other]

Enhancing Traffic Incident Management with Large Language Models: A Hybrid Machine Learning Approach for Severity Classification

Authors: Artur Grigorev, Khaled Saleh, Yuming Ou, Adriana-Simona Mihaita

Abstract: This research integrates Large Language Models into machine learning workflows for traffic incident management, focusing on classifying incident severity from accident reports. By leveraging features generated by modern language models alongside conventional data extracted from incident reports, our research demonstrates improvements in severity classification accuracy across several machine learning algorithms. Our contributions are threefold. First, we present an extensive comparison of various machine learning models paired with multiple large language models for feature extraction, aiming to identify the optimal combinations for accurate incident severity classification. Second, we contrast traditional feature engineering pipelines with those enhanced by language models, showing the superiority of language-based feature engineering in processing unstructured text. Third, our study shows how merging baseline features from accident reports with language-based features can improve severity classification accuracy. This approach not only advances the field of incident management but also highlights the cross-domain potential of our methodology, particularly in contexts requiring the prediction of event outcomes from unstructured textual data or from features translated into textual representations. Specifically, our methodology was applied to three distinct datasets originating from the United States, the United Kingdom, and Queensland, Australia.
This cross-continental application underlines the robustness of our approach, suggesting its potential for widespread adoption in improving incident management processes globally.

Submitted 29 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.03101 [pdf, other]

KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents

Authors: Yuqi Zhu, Shuofei Qiao, Yixin Ou, Shumin Deng, Ningyu Zhang, Shiwei Lyu, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen

Abstract: Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short on more sophisticated challenges, especially when interacting with environments by generating executable actions. This inadequacy stems primarily from the lack of built-in action knowledge in language agents: without it, agents cannot effectively guide their planning trajectories during task solving, which results in planning hallucination. To address this issue, we introduce KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge. Specifically, KnowAgent employs an action knowledge base and a knowledgeable self-learning strategy to constrain the action path during planning, enabling more reasonable trajectory synthesis and thereby enhancing the planning performance of language agents. Experimental results on HotpotQA and ALFWorld with various backbone models demonstrate that KnowAgent achieves performance comparable or superior to existing baselines. Further analysis indicates that KnowAgent is effective at mitigating planning hallucinations. Code is available at https://github.com/zjunlp/KnowAgent.
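One way to picture "an action knowledge base that constrains the action path" is a table of which actions may follow which, used both to filter an LLM's ranked action proposals and to flag invalid plans. The sketch below does exactly that; the action names are loosely modeled on question-answering tool calls, and the transition rules, `ACTION_KB`, and all function names are hypothetical, not KnowAgent's actual knowledge base or its self-learning strategy.

```python
# Hypothetical action knowledge base: which actions may follow which.
ACTION_KB = {
    "Start": {"Search", "Retrieve"},
    "Search": {"Search", "Retrieve", "Lookup", "Finish"},
    "Retrieve": {"Search", "Retrieve", "Lookup", "Finish"},
    "Lookup": {"Search", "Retrieve", "Lookup", "Finish"},
    "Finish": set(),  # terminal action
}


def allowed_next(prev_action, kb=ACTION_KB):
    """Actions the knowledge base permits after prev_action (empty if unknown)."""
    return kb.get(prev_action, set())


def constrain(prev_action, ranked_candidates, kb=ACTION_KB):
    """Return the highest-ranked candidate the knowledge base permits, else None."""
    for action in ranked_candidates:
        if action in allowed_next(prev_action, kb):
            return action
    return None


def first_violation(plan, kb=ACTION_KB):
    """Index of the first step that breaks the action rules, or None if the plan is valid."""
    prev = "Start"
    for i, action in enumerate(plan):
        if action not in allowed_next(prev, kb):
            return i
        prev = action
    return None


if __name__ == "__main__":
    # An LLM might propose "Lookup" first, but nothing has been retrieved yet.
    print(constrain("Start", ["Lookup", "Search"]))          # falls back to "Search"
    print(first_violation(["Search", "Finish", "Search"]))   # acting after Finish
```

Rejecting an invalid proposal before execution is a simple form of the path constraint the abstract credits with reducing planning hallucinations.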

Submitted 5 March, 2024; originally announced March 2024.

Comments: Work in progress. Project page: https://zjunlp.github.io/project/KnowAgent/ Code: https://github.com/zjunlp/KnowAgent

arXiv:2402.06212 [pdf]

doi 10.36463/idw.2023.1488

Halo Reduction in Display Systems through Smoothed Local Histogram Equalization and Human Visual System Modeling

Authors: Prasoon Ambalathankandy, Yafei Ou, Masayuki Ikebe

Abstract: Halo artifacts significantly degrade display quality. We propose a method to reduce halos in Local Histogram Equalization (LHE) algorithms by addressing their dark and light variants separately. By exploring the relationship between lateral inhibition and halo artifacts in the human visual system, this approach produces visually natural images.
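For context on where halos come from, the sketch below implements the simplest form of local histogram equalization: each pixel is replaced by the cumulative-histogram value of its own neighbourhood. Near a strong edge, identical dark pixels receive different outputs depending on how many bright pixels fall in their window, which produces the bright or dark bands perceived as halos. This is a baseline illustration only; the paper's dark/light halo separation and visual-system modeling are not reproduced, and the function name and window radius are assumptions.

```python
def local_histogram_equalization(img, radius=1, levels=256):
    """Map each pixel to the CDF value of its own neighbourhood histogram.

    img: 2-D list of integer intensities in [0, levels - 1].
    The (2*radius+1)^2 window is clipped at image borders.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            # Fraction of neighbours at or below this pixel's intensity
            cdf = sum(1 for v in window if v <= img[y][x]) / len(window)
            out[y][x] = round(cdf * (levels - 1))
    return out


if __name__ == "__main__":
    step = [[0, 0, 200], [0, 0, 200], [0, 0, 200]]  # dark region next to a bright edge
    for row in local_histogram_equalization(step):
        print(row)
```

In the step example, dark pixels away from the edge map to 255 while equally dark pixels adjacent to it map to 170, the kind of edge-dependent inconsistency that halo-reduction methods aim to suppress.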

Submitted 9 February, 2024; originally announced February 2024.

ACM Class: I.4.3

arXiv:2402.04587 [pdf, other]

Sparse Anatomical Prompt Semi-Supervised Learning with Masked Image Modeling for CBCT Tooth Segmentation

Authors: Pengyu Dai, Yafei Ou, Yang Liu, Yue Zhao

Abstract: Accurate tooth identification and segmentation in Cone Beam Computed Tomography (CBCT) dental images can significantly enhance the efficiency and precision of manual diagnoses performed by dentists. However, existing segmentation methods are mainly trained on large datasets whose annotation is extremely time-consuming. Meanwhile, because the teeth of each class in CBCT dental images are closely positioned and differ only subtly between classes, boundaries become indistinct when a model is trained with limited data. To address these challenges, this study proposes a task-oriented Masked Auto-Encoder paradigm that effectively utilizes large amounts of unlabeled data to achieve accurate tooth segmentation with limited labeled data. Specifically, we first construct a self-supervised masked autoencoder pre-training framework that efficiently utilizes unlabeled data to enhance network performance. Subsequently, we introduce a sparse masked prompt mechanism based on graph attention to incorporate boundary information of the teeth, aiding the network in learning the anatomical structural features of teeth. To the best of our knowledge, this is the first integration of the masked pre-training paradigm into the CBCT tooth segmentation task. Extensive experiments demonstrate both the feasibility of our proposed method and the potential of the boundary prompt mechanism.
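Masked autoencoder pre-training starts by splitting each volume into non-overlapping patches and hiding a large random fraction of them, so the network must reconstruct the hidden patches from the visible ones. The sketch below shows only that masking step for a 3-D CBCT-like volume. The 96³ volume, 16³ patch size, and 0.75 mask ratio (a common MAE default) are illustrative assumptions, the function names are hypothetical, and the paper's graph-attention boundary prompts are not shown.

```python
import random
from math import prod


def num_patches_3d(shape, patch):
    """Number of non-overlapping cubic patches tiling a 3-D volume."""
    if any(s % patch for s in shape):
        raise ValueError("volume dimensions must be divisible by the patch size")
    return prod(s // patch for s in shape)


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """MAE-style random masking; returns (visible_ids, masked_ids), both sorted."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    ids = list(range(num_patches))
    rng.shuffle(ids)
    num_keep = max(1, round(num_patches * (1.0 - mask_ratio)))  # keep at least one patch
    return sorted(ids[:num_keep]), sorted(ids[num_keep:])


if __name__ == "__main__":
    n = num_patches_3d((96, 96, 96), 16)        # 6 * 6 * 6 = 216 patches
    visible, masked = random_patch_mask(n, 0.75, seed=0)
    print(n, len(visible), len(masked))
```

Only the visible patches would be fed to the encoder, which is what makes pre-training on large unlabeled CBCT collections comparatively cheap.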

Submitted 7 February, 2024; originally announced February 2024.

ACM Class: I.4.6

arXiv:2402.04583 [pdf, other]

doi 10.2352/CIC.2023.31.1.11

A Psychological Study: Importance of Contrast and Luminance in Color to Grayscale Mapping

Authors: Prasoon Ambalathankandy, Yafei Ou, Sae Kaneko, Masayuki Ikebe

Abstract: Grayscale images are essential in image processing and computer vision tasks. They effectively emphasize luminance and contrast, highlighting important visual features, while also being easily compatible with other algorithms. Moreover, their simplified representation makes them efficient for storage and transmission. While preserving contrast is important for maintaining visual quality, other factors, such as preserving information relevant to the specific application or task at hand, may be more critical for achieving optimal performance. To evaluate and compare different decolorization algorithms, we designed a psychological experiment. During the experiment, participants were instructed to imagine color images in a hypothetical "colorless world" and select the grayscale image that best resembled their mental visualization. We compared two types of algorithms: (i) perceptual-based simple color space conversion algorithms, and (ii) spatial contrast-based algorithms, including iteration-based methods. Our experimental findings indicate that CIELAB exhibited superior performance on average, providing further evidence for the effectiveness of perception-based decolorization algorithms. On the other hand, the spatial contrast-based algorithms showed relatively poorer performance, possibly due to factors such as DC offset and artificial contrast generation. However, these algorithms demonstrated shorter selection times. Notably, no single algorithm consistently outperformed the others across all test images.
In this paper, we discuss in detail the significance of contrast and luminance in color-to-grayscale mapping based on our experimental results and analysis.
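A perceptual "simple color space conversion" such as CIELAB maps each pixel to its lightness L*. The sketch below shows one standard way to do this for a single 8-bit sRGB pixel, assuming sRGB input and a D65 white point; the abstract does not state which exact conversion the authors used, and the helper names are hypothetical.

```python
def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def cielab_lightness(r: int, g: int, b: int) -> float:
    """CIELAB L* (0..100) of an 8-bit sRGB pixel, assuming a D65 white (Y_n = 1)."""
    rl, gl, bl = (srgb_to_linear(v / 255.0) for v in (r, g, b))
    # Relative luminance Y from linear sRGB (Rec. 709 / D65 coefficients).
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    delta = 6 / 29
    # Piecewise CIELAB companding: cube root above the threshold, linear below.
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


def to_gray(pixel) -> int:
    """Rescale L* to an 8-bit gray level."""
    return round(cielab_lightness(*pixel) / 100 * 255)


print(to_gray((0, 255, 0)), to_gray((0, 0, 255)))
```

Because L* weights green far more than blue, pure green maps to a much lighter gray than pure blue, which is the kind of perceptual ordering the experiment probes.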

Submitted 6 February, 2024; originally announced February 2024.

ACM Class: I.4.3

arXiv:2402.03049 [pdf, other]

EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models

Authors: Yixin Ou, Ningyu Zhang, Honghao Gui, Ziwen Xu, Shuofei Qiao, Yida Xue, Runnan Fang, Kangwei Liu, Lei Li, Zhen Bi, Guozhou Zheng, Huajun Chen

Abstract: In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique to enhance the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist among various instruction processing methods, there is no standard open-source instruction processing implementation framework available for the community, which hinders further development by practitioners. To facilitate instruction processing research and development, we present EasyInstruct, an easy-to-use instruction processing framework for LLMs, which modularizes instruction generation, selection, and prompting, while also considering their combination and interaction. EasyInstruct is publicly released and actively maintained at https://github.com/zjunlp/EasyInstruct, along with an online demo app and a demo video for quick start, calling for broader research centered on instruction data and synthetic data.

Submitted 23 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: ACL 2024 System Demonstrations; Project website: https://zjunlp.github.io/project/EasyInstruct Code: https://github.com/zjunlp/EasyInstruct Video: https://youtu.be/rfQOWYfziFo Demo: https://huggingface.co/spaces/zjunlp/EasyInstruct

arXiv:2402.02079 [pdf, other]

Prototypical Contrastive Learning through Alignment and Uniformity for Recommendation

Authors: Yangxun Ou, Lei Chen, Fenglin Pan, Yupeng Wu

Abstract: Graph Collaborative Filtering (GCF), one of the most widely adopted recommendation system methods, effectively captures intricate relationships in user-item interactions. Graph Contrastive Learning (GCL)-based GCF has gained significant attention as it leverages self-supervised techniques to extract valuable signals from real-world scenarios. However, many methods learn instance-discrimination tasks that construct contrastive pairs through random sampling. Such GCL approaches suffer from sampling bias, where the negatives might have a semantic structure similar to that of the positives, leading to a loss of effective feature representation. To address these problems, we present the \underline{Proto}typical contrastive learning through \underline{A}lignment and \underline{U}niformity for recommendation, which is called \textbf{ProtoAU}. Specifically, we first propose prototypes (cluster centroids) as a latent space to ensure consistency across different augmentations of the original graph, aiming to eliminate the need for random sampling of contrastive pairs. Furthermore, the absence of explicit negatives means that directly optimizing the consistency loss between instance and prototype could easily result in dimensional collapse. Therefore, we propose aligning and maintaining uniformity in the prototypes of users and items as optimization objectives to prevent falling into trivial solutions.
Finally, we conduct extensive experiments on four datasets and evaluate performance on the task of link prediction. Experimental results demonstrate that the proposed ProtoAU outperforms other representative methods. The source code of ProtoAU is available at \url{https://github.com/oceanlvr/ProtoAU}.
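Alignment and uniformity are commonly defined following Wang and Isola (2020): alignment is the mean distance between normalized positive pairs, and uniformity is the log of the mean Gaussian potential over all pairs. The sketch below computes both losses on plain Python vectors; it assumes this standard formulation with `alpha=2` and `t=2` and does not reproduce how ProtoAU applies the objectives to user and item prototypes. The helper names are hypothetical.

```python
import math
from itertools import combinations


def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment_loss(anchors, positives, alpha=2):
    """Mean ||f(x) - f(y)||^alpha over normalized positive pairs (lower = better aligned)."""
    pairs = [(l2_normalize(a), l2_normalize(p)) for a, p in zip(anchors, positives)]
    return sum(sq_dist(a, p) ** (alpha / 2) for a, p in pairs) / len(pairs)


def uniformity_loss(embeddings, t=2.0):
    """log mean exp(-t ||z_i - z_j||^2) over distinct pairs (lower = more uniform)."""
    z = [l2_normalize(e) for e in embeddings]
    potentials = [math.exp(-t * sq_dist(a, b)) for a, b in combinations(z, 2)]
    return math.log(sum(potentials) / len(potentials))


print(uniformity_loss([[1, 0], [-1, 0]]))
```

A collapsed set of embeddings gives the worst possible uniformity (0), which is why a uniformity term helps avoid the dimensional collapse described above.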

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2401.15496 [pdf, other]

Baichuan2-Sum: Instruction Finetune Baichuan2-7B Model for Dialogue Summarization

Authors: Jianfei Xiao, Yancan Chen, Yimin Ou, Hanyi Yu, Kai Shu, Yiyong Xiao

Abstract: Large language models (LLMs) such as Llama, Baichuan and Bloom show remarkable ability with instruction fine-tuning in many natural language tasks. Nevertheless, for the dialogue summarization task, which aims to generate summaries for different roles in a dialogue, most state-of-the-art methods are built on small models (e.g., BART and BERT). Existing methods try to add task-specific optimizations to small models, such as adding a global-local centrality score. In this paper, we propose an instruction fine-tuning model, Baichuan2-Sum, for role-oriented dialogue summarization. By setting different instructions for different roles, the model can learn from the dialogue interactions and output the expected summaries. Furthermore, we apply the NEFTune technique to add suitable noise during training to improve the results. The experiments demonstrate that the proposed model achieves new state-of-the-art results on two public dialogue summarization datasets: CSDS and SAMSUM. We release our model and related code to facilitate future studies on the dialogue summarization task.
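NEFTune adds uniform noise to the token embeddings during fine-tuning only, scaled by $\alpha/\sqrt{Ld}$ for a sequence of length $L$ and embedding dimension $d$. The sketch below applies that rule to a list-of-lists embedding matrix; `alpha=5.0` is an illustrative default rather than the value used for Baichuan2-Sum, and `neftune_noise` is a hypothetical helper, not part of any library.

```python
import math
import random


def neftune_noise(embeddings, alpha=5.0, rng=None):
    """Return a noisy copy of a (seq_len x dim) embedding matrix, NEFTune-style.

    Noise ~ Uniform(-1, 1) * alpha / sqrt(seq_len * dim); apply during training only.
    alpha=5.0 is an assumed illustrative value.
    """
    rng = rng or random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[x + rng.uniform(-1.0, 1.0) * scale for x in row] for row in embeddings]


noisy = neftune_noise([[0.0] * 4 for _ in range(4)], alpha=8.0, rng=random.Random(0))
print(max(abs(x) for row in noisy for x in row))
```

The $1/\sqrt{Ld}$ factor keeps the overall noise magnitude roughly constant as sequences and embeddings grow; at inference time the embeddings are used unchanged.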

Submitted 3 April, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

arXiv:2312.15028 [pdf, other]

Enhanced Ferromagnetism in Monolayer Cr2Te3 via Topological Insulator Coupling

Authors: Yunbo Ou, Murod Mirzhalilov, Norbert M. Nemes, Jose L. Martinez, Mirko Rocci, Austin Akey, Wenbo Ge, Dhavala Suri, Yiping Wang, Haile Ambaye, Jong Keum, Mohit Randeria, Nandini Trivedi, Kenneth S. Burch, David C. Bell, Weida Wu, Don Heiman, Valeria Lauter, Jagadeesh S. Moodera, Hang Chi

Abstract: Exchange-coupled interfaces are pivotal in exploiting two-dimensional (2D) ferromagnetism. Due to the extraordinary correlations among charge, spin, orbital and lattice degrees of freedom, layered magnetic transition metal chalcogenides (TMCs) bode well for exotic topological phenomena. Here we report the realization of wafer-scale Cr2Te3 down to monolayer (ML) on insulating SrTiO3(111) substrates using molecular beam epitaxy. Robust ferromagnetism emerges in 2D Cr2Te3 ML with a Curie temperature TC = 17 K. Moreover, when Cr2Te3 is proximitized with topological insulator (TI) (Bi,Sb)2Te3, the magnetism becomes stronger -- for 1 ML, TC increases to 30 K, while for 2 ML it boosts from 65 K to 82 K. Our experiments and theory strongly indicate that the Bloembergen-Rowland interaction is likely a universal aspect of TC enhancement in TI-coupled magnetic heterostructures. The topological-surface-enhanced magnetism in 2D TMC enables further exchange coupling physics and quantum hybrid studies, including paving the way to realize interface-modulated topological electronics.

Submitted 22 December, 2023; originally announced December 2023.

Comments: Main: 9 pages, 5 figures; SI: 3 pages, 5 figures

arXiv:2312.03251 [pdf]

Electrically controlled interlayer trion fluid in electron-hole bilayers

Authors: Ruishi Qi, Qize Li, Zuocheng Zhang, Sudi Chen, Jingxu Xie, Yunbo Ou, Zhiyuan Cui, David D. Dai, Andrew Y. Joe, Takashi Taniguchi, Kenji Watanabe, Sefaattin Tongay, Alex Zettl, Liang Fu, Feng Wang

Abstract: The combination of repulsive and attractive Coulomb interactions in a quantum electron(e)-hole(h) fluid can give rise to novel correlated phases of multiparticle charge complexes such as excitons, trions and biexcitons. Here we report the first experimental realization of an electrically controlled interlayer trion fluid in two-dimensional van der Waals heterostructures. We demonstrate that in the strong coupling regime of electron-hole bilayers, electrons and holes in separate layers can spontaneously form three-particle trion bound states that resemble positronium ions in high energy physics. The interlayer trions can assume 1e-2h and 2e-1h configurations, where electrons and holes are confined in different transition metal dichalcogenide layers. We show that the two correlated holes in 1e-2h trions form a spin-singlet state with a spin gap of ~1 meV. By electrostatic gating, the equilibrium state of our system can be continuously tuned into an exciton fluid, a trion fluid, an exciton-trion mixture, a trion-charge mixture or an electron-hole plasma. Upon optical excitation, the system can host novel high-order multiparticle charge complexes, including interlayer four-particle complexes (tetrons) and five-particle complexes (pentons). Our work demonstrates a unique platform to study novel correlated phases of tunable Bose-Fermi mixtures and opens up new opportunities to realize artificial ions/molecules in electronic devices.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.17288 [pdf, other]

Sobolev smoothing estimates for bilinear maximal operators with fractal dilation sets

Authors: Tainara Borges, Benjamin Foster, Yumeng Ou

Abstract: Given a hypersurface $S\subset \mathbb{R}^{2d}$, we study the bilinear averaging operator that averages a pair of functions over $S$, as well as more general bilinear multipliers of limited decay and various maximal analogs. Of particular interest are bilinear maximal operators associated to a fractal dilation set $E\subset [1,2]$; in this case, the boundedness region of the maximal operator is associated to the geometry of the hypersurface and various notions of the dimension of the dilation set. In particular, we determine Sobolev smoothing estimates at the exponent $L^2 \times L^2 \rightarrow L^2$ using Fourier-analytic methods, which allow us to deduce additional $L^p$ improving bounds for the operators and sparse bounds and their weighted corollaries for the associated multi-scale maximal functions. We also extend the method to study analogues of these questions for the triangle averaging operator and biparameter averaging operators. In addition, some necessary conditions for boundedness of these operators are obtained.

Submitted 28 November, 2023; originally announced November 2023.

Comments: 40 pages, 5 figures

MSC Class: 42B15; 42B25

arXiv:2311.11933 [pdf, other]

Spin Hall conductivity in Bi$_{1-x}$Sb$_x$ as an experimental test of bulk-boundary correspondence

Authors: Yongxi Ou, Wilson Yanez-Parreño, Yu-sheng Huang, Supriya Ghosh, Cüneyt Şahin, Max Stanley, Sandra Santhosh, Saurav Islam, Anthony Richardella, K. Andre Mkhoyan, Michael E. Flatté, Nitin Samarth

Abstract: Bulk-boundary correspondence is a foundational principle underlying the electronic band structure and physical behavior of topological quantum materials. Although it has been rigorously tested in topological systems where the physical properties involve charge currents, it remains unclear whether bulk-boundary correspondence should also hold for non-conserved spin currents. We study charge-to-spin conversion in a canonical topological insulator, Bi$_{1-x}$Sb$_x$, to address this fundamentally unresolved question. We use spin-torque ferromagnetic resonance measurements to accurately probe the charge-to-spin conversion efficiency in epitaxial Bi$_{1-x}$Sb$_x$~thin films of high structural quality spanning the entire range of composition, including both trivial and topological band structures, as verified using {\it in vacuo} angle-resolved photoemission spectroscopy. From these measurements, we deduce the effective spin Hall conductivity (SHC) and find excellent agreement with the values predicted by tight-binding calculations for the intrinsic SHC of the bulk bands. These results provide strong evidence that the strong spin-orbit entanglement of bulk states well below the Fermi energy connects directly to the SHC in epitaxial Bi$_{1-x}$Sb$_x$~films interfaced with a metallic ferromagnet. The excellent agreement between theory and experiment points to the generic value of analyses focused entirely on bulk properties, even for topological systems involving non-conserved spin currents.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.11899 [pdf, other]

doi 10.1063/5.0190217

Epitaxial growth and characterization of Bi$_{1-x}$Sb$_x$ thin films on (0001) sapphire substrates

Authors: Yu-Sheng Huang, Saurav Islam, Yongxi Ou, Supriya Ghosh, Anthony Richardella, K. Andre Mkhoyan, Nitin Samarth

Abstract: We report the molecular beam epitaxy of Bi$_{1-x}$Sb$_x$ thin films ($0 \leq x \leq 1$) on (0001) sapphire substrates using a thin (Bi,Sb)$_2$Te$_3$ buffer layer. Characterization of the films using reflection high-energy electron diffraction, x-ray diffraction, atomic force microscopy, and scanning transmission electron microscopy reveals epitaxial growth of films of reasonable structural quality. This is further confirmed via x-ray diffraction pole figures that determine the epitaxial registry between the thin film and substrate. We further investigate the microscopic structure of the thin films via Raman spectroscopy, demonstrating how the vibrational modes vary as the composition changes and discussing the implications for the crystal structure. We also characterize the samples using electrical transport measurements.

Submitted 20 November, 2023; originally announced November 2023.

Journal ref: APL Mater. 12, 021106 (2024)

arXiv:2311.05371 [pdf, other]

Training Robust Deep Physiological Measurement Models with Synthetic Video-based Data

Authors: Yuxuan Ou, Yuzhe Zhang, Yuntang Wang, Shwetak Patel, Daniel McDuff, Yuzhe Yang, Xin Liu

Abstract: Recent advances in supervised deep learning techniques have demonstrated the possibility of remotely measuring human physiological vital signs (e.g., photoplethysmograph, heart rate) from facial videos alone. However, the performance of these methods relies heavily on the availability and diversity of real labeled data. Yet, collecting large-scale real-world data with high-quality labels is typically challenging and resource intensive, and it also raises privacy concerns when storing personal biometric data. Synthetic video-based datasets (e.g., SCAMPS \cite{mcduff2022scamps}) with photo-realistic synthesized avatars have been introduced to alleviate these issues while providing high-quality synthetic data. However, there exists a significant gap between synthetic and real-world data, which hinders the generalization of neural models trained on these synthetic datasets. In this paper, we propose several measures to add real-world noise to synthetic physiological signals and the corresponding facial videos. We experimented with individual and combined augmentation methods and evaluated our framework on three public real-world datasets. Our results show that we were able to reduce the average MAE from 6.9 to 2.0.

Submitted 15 November, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.20155 [pdf]

doi 10.1021/acs.jctc.3c01203

MLatom 3: Platform for machine learning-enhanced computational chemistry simulations and workflows

Authors: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Peikun Zheng, Yuxinxin Chen, Mario Barbatti, Olexandr Isayev, Cheng Wang, Bao-Xin Xue, Max Pinheiro Jr, Yuming Su, Yiheng Dai, Yangtao Chen, Lina Zhang, Shuang Zhang, Arif Ullah, Quanhao Zhang, Yanchi Ou

Abstract: Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package provides plenty of choice to users, who can run simulations with command-line options, input files, or scripts using MLatom as a Python package, both on their own computers and on the XACS cloud computing platform at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. Users can choose from an extensive library of methods containing pre-trained ML models and quantum mechanical approximations such as AIQM1, which approaches coupled-cluster accuracy. Developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to the extensive use of interfaces to many state-of-the-art software packages and libraries.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.13574 [pdf, other]

doi 10.1109/JBHI.2024.3410274

Progressive Dual Priori Network for Generalized Breast Tumor Segmentation

Authors: Li Wang, Lihui Wang, Zixiang Kuai, Lei Tang, Yingfeng Ou, Chen Ye, Yuemin Zhu

Abstract: To improve the generalization ability of breast tumor segmentation models, as well as the segmentation performance for breast tumors that are small, low-contrast, or irregularly shaped, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic contrast-enhanced magnetic resonance images (DCE-MRI) acquired at different centers. PDPNet first crops tumor regions with a coarse-segmentation-based localization module; the breast tumor mask is then progressively refined using the weak semantic prior and cross-scale correlation prior knowledge. To validate the effectiveness of PDPNet, we compared it with several state-of-the-art methods on multi-center datasets. The results showed that, compared with the second-best method, the DSC and HD95 of PDPNet improved by at least 5.13% and 7.58%, respectively, on multi-center test sets. In addition, through ablations, we demonstrated that the proposed localization module can decrease the influence of normal tissues and therefore improve the generalization ability of the model. The weak semantic priors allow the model to focus on tumor regions and avoid missing small and low-contrast tumors. The cross-scale correlation priors help promote shape awareness for irregular tumors. Thus, integrating them in a unified framework improved multi-center breast tumor segmentation performance. The source code and open data can be accessed at https://github.com/wangli100209/PDPNet.

Submitted 16 June, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: 14 pages, 12 figures

Journal ref: IEEE Journal of Biomedical and Health Informatics, 2024

arXiv:2310.09444 [pdf, other]

Tackling Heterogeneity in Medical Federated learning via Vision Transformers

Authors: Erfan Darzi, Yiqing Shen, Yangming Ou, Nanna M. Sijtsema, P. M. A van Ooijen

Abstract: Optimization-based regularization methods have been effective in addressing the challenges posed by data heterogeneity in medical federated learning, particularly in improving the performance of underrepresented clients. However, these methods often lead to lower overall model accuracy and slower convergence rates. In this paper, we demonstrate that using Vision Transformers can substantially improve the performance of underrepresented clients without a significant trade-off in overall accuracy. This improvement is attributed to the Vision Transformer's ability to capture long-range dependencies within the input data.

Submitted 15 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.02031 [pdf, other]

OceanGPT: A Large Language Model for Ocean Science Tasks

Authors: Zhen Bi, Ningyu Zhang, Yida Xue, Yixin Ou, Daxiong Ji, Guozhou Zheng, Huajun Chen

Abstract: Ocean science, which delves into the oceans that are reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite the success in other domains, current LLMs often fall short in catering to the needs of domain experts like oceanographers, and the potential of LLMs for ocean science is under-explored. The intrinsic reasons are the immense and intricate nature of ocean data as well as the necessity for higher granularity and richness in knowledge. To alleviate these issues, we introduce OceanGPT, the first-ever large language model in the ocean domain, which is expert in various ocean science tasks. We also propose a novel framework to automatically obtain a large volume of ocean domain instruction data, which generates instructions based on multi-agent collaboration. Additionally, we construct the first oceanography benchmark, OceanBench, to evaluate the capabilities of LLMs in the ocean domain. Through comprehensive experiments, OceanGPT not only shows a higher level of knowledge expertise for ocean science tasks but also gains preliminary embodied intelligence capabilities in ocean technology.

Submitted 23 May, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: ACL2024. Project Website: https://oceangpt.zjukg.cn/

arXiv:2309.04501 [pdf, ps, other]

Weighted refined decoupling estimates and application to Falconer distance set problem

Authors: Xiumin Du, Yumeng Ou, Kevin Ren, Ruixiang Zhang

Abstract: We prove some weighted refined decoupling estimates. As an application, we give an alternative proof of the following result on Falconer's distance set problem by the authors in a companion work: if a compact set $E\subset \mathbb{R}^d$ has Hausdorff dimension larger than $\frac{d}{2}+\frac{1}{4}-\frac{1}{8d+4}$, where $d\geq 4$, then there is a point $x\in E$ such that the pinned distance set $\Delta_x(E)$ has positive Lebesgue measure. Aside from this application, the weighted refined decoupling estimates may be of independent interest.

Submitted 7 September, 2023; originally announced September 2023.

Comments: 28 pages. arXiv admin note: text overlap with arXiv:2309.04103

MSC Class: 28A80; 28A78

arXiv:2309.04103 [pdf, ps, other]

New improvement to Falconer distance set problem in higher dimensions

Authors: Xiumin Du, Yumeng Ou, Kevin Ren, Ruixiang Zhang

Abstract: We show that if a compact set $E\subset \mathbb{R}^d$ has Hausdorff dimension larger than $\frac{d}{2}+\frac{1}{4}-\frac{1}{8d+4}$, where $d\geq 3$, then there is a point $x\in E$ such that the pinned distance set $\Delta_x(E)$ has positive Lebesgue measure. This improves upon bounds of Du-Zhang and Du-Iosevich-Ou-Wang-Zhang in all dimensions $d \ge 3$. We also prove lower bounds for the Hausdorff dimension of pinned distance sets when $\dim_H (E) \in (\frac{d}{2} - \frac{1}{4} - \frac{3}{8d+4}, \frac{d}{2}+\frac{1}{4}-\frac{1}{8d+4})$, which improves upon bounds of Harris and Wang-Zheng in dimensions $d \ge 3$.
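To make the dimensional thresholds above concrete, the short sketch below evaluates the positive-measure threshold $\frac{d}{2}+\frac{1}{4}-\frac{1}{8d+4}$ and the lower end $\frac{d}{2}-\frac{1}{4}-\frac{3}{8d+4}$ of the interval for the pinned-dimension bounds, using exact rational arithmetic. It is only a worked-arithmetic aid; the function names are hypothetical and nothing here reflects the proofs.

```python
from fractions import Fraction


def pinned_measure_threshold(d: int) -> Fraction:
    """d/2 + 1/4 - 1/(8d+4): dim_H(E) above this gives |Delta_x(E)| > 0 for some x in E."""
    if d < 3:
        raise ValueError("the stated result covers d >= 3")
    return Fraction(d, 2) + Fraction(1, 4) - Fraction(1, 8 * d + 4)


def pinned_dimension_range_start(d: int) -> Fraction:
    """d/2 - 1/4 - 3/(8d+4): left end of the interval for the pinned-dimension bounds."""
    if d < 3:
        raise ValueError("the stated result covers d >= 3")
    return Fraction(d, 2) - Fraction(1, 4) - Fraction(3, 8 * d + 4)


for d in (3, 4, 5):
    print(d, pinned_measure_threshold(d), float(pinned_measure_threshold(d)))
```

For example, in $\mathbb{R}^3$ the threshold is $\frac{3}{2}+\frac{1}{4}-\frac{1}{28}=\frac{12}{7}\approx 1.714$, slightly below the value $\frac{7}{4}$ that the $\frac{d}{2}+\frac{1}{4}$ term alone would give.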

Submitted 7 September, 2023; originally announced September 2023.

Comments: 36 pages

MSC Class: 28A80; 28A78

arXiv:2309.01969 [pdf, other]

Multi-mode quantum correlation generated from an unbalanced SU(1,1) interferometer using ultra-short laser pulses as pump

Authors: Xueshi Guo, Wen Zhao, Xiaoying Li, Z. Y. Ou

Abstract: Multi-mode entanglement is one of the critical resources in quantum information technology. Generating large-scale multi-mode entangled states by coherently combining time-delayed continuous-variable Einstein-Podolsky-Rosen pairs with linear beam splitters has been widely studied recently. Here we theoretically investigate the multi-mode quantum correlation properties of the optical fields generated from an unbalanced SU(1,1) interferometer pumped by ultra-short pulses, which generates a multi-mode entangled state by using non-degenerate parametric processes to coherently combine delayed Einstein-Podolsky-Rosen pairs in different frequency bands. The covariance matrix of the generated multi-mode state is derived analytically for an arbitrary mode number $M$ within adjacent time slots, which shows that a given mode is maximally correlated to 5 other modes. Based on the derived covariance matrix, both the photon number correlation and the quadrature amplitude correlation of the generated state are analyzed. We also extend our analysis method to the scheme of generating entangled states by using a linear beam splitter as a coherent combiner of delayed EPR pairs, and compare the states generated by the two coherent combining schemes. Our result provides a comprehensive theoretical description of the quantum correlations generated from an unbalanced SU(1,1) interferometer within the range of Gaussian systems, and will offer more perspectives for quantum information technology.

Submitted 5 September, 2023; originally announced September 2023.

Comments: 13 pages, 4 figures

arXiv:2308.11459 [pdf, other]

Phase Dependent Hanbury-Brown and Twiss effect

Authors: Xuan Tang, Yunxiao Zhang, Xueshi Guo, Liang Cui, Xiaoying Li, Z. Y. Ou

Abstract: The Hanbury-Brown and Twiss (HBT) effect is the foundation of stellar intensity interferometry. However, it is a phase-insensitive two-photon interference effect. In this paper, we extend the HBT interferometer by mixing two phase-coherent input fields with coherent auxiliary fields before the intensity correlation measurement, achieving phase-sensitive two-photon interference that measures the complete complex second-order coherence function of the input fields. This practical scheme paves the way for synthetic aperture imaging for astronomical applications in the optical regime. Pulsed input fields are also tested for potential remote sensing and ranging applications. We discuss the conditions for implementing a recently proposed entanglement-based telescopy scheme with more realistic cw broadband anti-bunched light fields.
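For contrast with the phase-sensitive extension, the conventional HBT measurement reduces to the normalized intensity cross-correlation g2(τ) = <I1(t) I2(t+τ)> / (<I1><I2>), which carries no phase information. A minimal sketch computing it from two sampled intensity records; the discrete-delay convention and the function name are assumptions, and the paper's auxiliary-field scheme is not modeled.

```python
def g2(i1: list[float], i2: list[float], delay: int = 0) -> float:
    """Normalized intensity cross-correlation <I1(t) I2(t+delay)> / (<I1><I2>).

    Intensities are sampled at equal time steps; `delay` is a non-negative number of samples.
    Only overlapping samples are used, so all means are taken over the same window.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    pairs = list(zip(i1, i2[delay:]))
    if not pairs:
        raise ValueError("no overlapping samples")
    n = len(pairs)
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    mean12 = sum(a * b for a, b in pairs) / n
    return mean12 / (mean1 * mean2)


# Constant intensities give g2 = 1; correlated fluctuations give g2 > 1.
print(g2([2.0, 2.0, 2.0], [5.0, 5.0, 5.0]))  # 1.0
print(g2([1.0, 3.0], [1.0, 3.0]))            # 1.25
```

Shifting the relative phase of the two input fields leaves these intensity records, and hence g2, unchanged, which is the limitation the auxiliary fields are introduced to remove.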

Submitted 30 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: 5 pages, 6 figures

arXiv:2308.07745 [pdf, ps, other]

Low Mach number limit for non-isentropic magnetohydrodynamic equations with ill-prepared data and zero magnetic diffusivity in bounded domains

Authors: Yaobin Ou, Lu Yang

Abstract: In this article, we verify the low Mach number limit of strong solutions to the non-isentropic compressible magnetohydrodynamic equations with zero magnetic diffusivity and ill-prepared initial data in three-dimensional bounded domains, when the density and the temperature vary around constant states. Invoking a new weighted energy functional, we establish uniform estimates with respect to the Mach number, especially for the high-order spatial derivatives. Due to the vorticity-slip boundary condition on the velocity, we decompose the uniform estimates into one part for the fast variables and another for the slow variables. In particular, the weighted estimates of the highest-order spatial derivatives of the fast variables are crucial for the uniform bounds. Finally, the low Mach number limit is justified by the strong convergence of the density, the temperature, and the divergence-free component of the velocity, together with the weak convergence of the other variables. The methods in this paper can be applied to singular limits of general hydrodynamic equations of hyperbolic-parabolic type, including the full Navier-Stokes equations.

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2308.00338 [pdf, other]

A symplectic dynamics approach to the spatial isosceles three-body problem

Authors: Xijun Hu, Lei Liu, Yuwei Ou, Pedro A. S. Salomão, Guowei Yu

Abstract: We study the spatial isosceles three-body problem from the perspective of Symplectic Dynamics. For certain choices of mass ratio, angular momentum, and energy, the dynamics on the energy surface is equivalent to a Reeb flow on the tight three-sphere. We find a Hopf link formed by the Euler orbit and a symmetric brake orbit, which spans an open book decomposition whose pages are annulus-like global surfaces of section. In the case of large mass ratios, the Hopf link is non-resonant, forcing the existence of infinitely many periodic orbits. The rotation number of the Euler orbit plays a fundamental role in the existence of periodic orbits and their symmetries. We explore such symmetries in the Hill region and show that the Euler orbit is negative hyperbolic for an open set of parameters while it can never be positive hyperbolic. Finally, we address convexity and determine for each parameter whether the energy surface is strictly convex, convex, or non-convex. Dynamical consequences of this fact are then discussed.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 66 pages, 15 figures

arXiv:2307.15237 [pdf, other]

Weather Sensitive High Spatio-Temporal Resolution Transportation Electric Load Profiles For Multiple Decarbonization Pathways

Authors: Samrat Acharya, Malini Ghosal, Travis Thurber, Casey D. Burleyson, Yang Ou, Allison Campbell, Gokul Iyer, Nathalie Voisin, Jason Fuller

Abstract: Electrification of transport, compounded by climate change, will transform hourly load profiles and their response to weather. Power system operators and EV charging stakeholders require such high-resolution load profiles for their planning studies. However, such profiles covering the whole transportation sector are lacking. Thus, we present a novel approach to generating hourly electric load profiles that considers charging strategies and evolving sensitivity to temperature. The approach downscales annual state-scale sectoral load projections from the multi-sectoral Global Change Analysis Model (GCAM) into hourly electric load profiles, leveraging high-resolution climate and population datasets. Profiles are developed and evaluated at the Balancing Authority scale, in 5-year increments through 2050, over the Western U.S. Interconnect for multiple decarbonization pathways and climate scenarios. The datasets are readily available for production cost model analysis. Our open-source approach is transferable to other regions.
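The final allocation step of such a downscaling can be illustrated by distributing an annual energy total over hours in proportion to non-negative weights. This is a minimal sketch under stated assumptions: the weights come from a hypothetical baseline charging shape scaled by a made-up linear temperature sensitivity, whereas the paper derives its profiles from GCAM projections, climate data, and population data. Function names and parameters are illustrative.

```python
def temperature_weight(base: float, temp_c: float,
                       comfort_c: float = 18.0, sensitivity: float = 0.02) -> float:
    """Hypothetical hourly weight: a baseline charging shape, inflated linearly as the
    temperature moves away from a comfort point (e.g. cabin heating or cooling)."""
    return base * (1.0 + sensitivity * abs(temp_c - comfort_c))


def downscale_annual_load(annual_mwh: float, weights: list[float]) -> list[float]:
    """Split an annual energy total into hourly loads proportional to the weights,
    so that the hourly values sum back to the annual total."""
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    return [annual_mwh * w / total for w in weights]


# Toy 4-hour "year": an evening charging peak plus one hot and one cold hour.
base_shape = [1.0, 1.0, 2.0, 2.0]
temps_c = [18.0, 28.0, 18.0, 8.0]
w = [temperature_weight(b, t) for b, t in zip(base_shape, temps_c)]
hourly = downscale_annual_load(66.0, w)
print([round(h, 6) for h in hourly])
```

Normalizing the weights guarantees that the hourly profile is consistent with the annual projection, so the temperature dependence only reshapes the load over time and does not change the annual energy.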

Submitted 6 March, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

arXiv:2307.04356 [pdf, other]

InfLoR-SNN: Reducing Information Loss for Spiking Neural Networks

Authors: Yufei Guo, Yuanpei Chen, Liwen Zhang, Xiaode Liu, Xinyi Tong, Yuanyuan Ou, Xuhui Huang, Zhe Ma

Abstract: The Spiking Neural Network (SNN) has attracted increasing attention recently. It adopts binary spike signals to transmit information. Benefiting from this information-passing paradigm, SNNs can replace the multiplications of activations and weights with additions, which are more energy-efficient. However, its "Hard Reset" mechanism for the firing activity ignores the differences among membrane potentials that exceed the firing threshold, causing information loss. Meanwhile, quantizing the membrane potential to 0/1 spikes at the firing instants inevitably introduces quantization error, which also causes information loss. To address these problems, we propose to use the "Soft Reset" mechanism for supervised-training-based SNNs, which drives the membrane potential to a dynamic reset potential according to its magnitude, and a Membrane Potential Rectifier (MPR) to reduce the quantization error by redistributing the membrane potential to a range close to the spikes. Results show that SNNs with the "Soft Reset" mechanism and MPR outperform their vanilla counterparts on both static and dynamic datasets.
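The information loss caused by a hard reset can be seen in a single integrate-and-fire neuron. The sketch below assumes the common reset-by-subtraction form of a soft reset (the threshold is subtracted, so the excess potential carries over); the paper's dynamic reset potential and its MPR are not reproduced, and the function name is illustrative.

```python
def simulate_if(inputs: list[float], threshold: float = 1.0, soft_reset: bool = False):
    """Integrate-and-fire neuron over discrete time steps.

    Hard reset: the potential returns to 0 after a spike, discarding any excess above threshold.
    Soft reset (reset-by-subtraction): the threshold is subtracted, so the excess carries over.
    Returns (spike_train, membrane_potential_after_each_step).
    """
    u = 0.0
    spikes, trace = [], []
    for x in inputs:
        u += x
        if u >= threshold:
            spikes.append(1)
            u = u - threshold if soft_reset else 0.0
        else:
            spikes.append(0)
        trace.append(u)
    return spikes, trace


inputs = [1.5, 0.75, 0.0, 0.75]
print(simulate_if(inputs))                   # ([1, 0, 0, 1], [0.0, 0.75, 0.75, 0.0])
print(simulate_if(inputs, soft_reset=True))  # ([1, 1, 0, 1], [0.5, 0.25, 0.25, 0.0])
```

The total input is 3.0, i.e. three thresholds' worth of charge: the soft reset emits three spikes, while the hard reset emits only two because the 0.5 above threshold at the first step is thrown away.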

Submitted 17 August, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: Accepted by ECCV2022

arXiv:2307.00583 [pdf, other]

A region and category confidence-based multi-task network for carotid ultrasound image segmentation and classification

Authors: Haitao Gan, Ran Zhou, Yanghan Ou, Furong Wang, Xinyao Cheng, Aaron Fenster

Abstract: The segmentation and classification of carotid plaques in ultrasound images play important roles in the treatment of atherosclerosis and the assessment of stroke risk. Although deep learning methods have been used for carotid plaque segmentation and classification, two-stage methods increase the complexity of the overall analysis, and existing multi-task methods ignore the relationship between segmentation and classification. This can lead to suboptimal performance, as valuable information might not be fully leveraged across tasks. Therefore, we propose a multi-task learning framework (RCCM-Net) for ultrasound carotid plaque segmentation and classification, which utilizes a region confidence module (RCM) and a sample category confidence module (CCM) to exploit the correlation between these two tasks. The RCM provides knowledge from the probability of plaque regions to the classification task, while the CCM is designed to learn the categorical sample weight for the segmentation task. A total of 1270 2D ultrasound images of carotid plaques were collected from Zhongnan Hospital (Wuhan, China) for our experiments. The results showed that the proposed method can improve both segmentation and classification performance compared to existing single-task networks (i.e., SegNet, Deeplabv3+, UNet++, EfficientNet, Res2Net, RepVGG, DPN) and multi-task algorithms (i.e., HRNet, MTANet), with an accuracy of 85.82% for classification and a Dice similarity coefficient of 84.92% for segmentation.
In the ablation study, the results demonstrated that both the designed RCM and CCM were beneficial in improving the network's performance. Therefore, we believe that the proposed method could be useful for carotid plaque analysis in clinical trials and practice.

Submitted 18 November, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.14085 [pdf, other]

doi 10.1109/LRA.2023.3254860

Sim-to-Real Surgical Robot Learning and Autonomous Planning for Internal Tissue Points Manipulation using Reinforcement Learning

Authors: Yafei Ou, Mahdi Tavakoli

Abstract: Indirect simultaneous positioning (ISP), where internal tissue points are placed at desired locations indirectly through the manipulation of boundary points, is a type of subtask frequently performed in robotic surgeries. Although challenging due to complex tissue dynamics, automating the task can potentially reduce the workload of surgeons. This paper presents a sim-to-real framework for learning to automate the task without interacting with a real environment, and for planning preoperatively to find the grasping points that minimize local tissue deformation. A control policy is learned using deep reinforcement learning (DRL) in an FEM-based simulation environment and transferred to real-world situations. Grasping points are planned in the simulator by utilizing the trained policy with Bayesian optimization (BO). Inconsistent simulation performance is overcome by formulating the problem as a state-augmented Markov decision process (MDP). Experimental results show that the learned policy places the internal tissue points accurately, and that the planned grasping points yield small tissue deformation among the trials. The proposed learning and planning scheme is able to automate internal tissue point manipulation in surgeries and has the potential to be generalized to complex surgical scenarios.

Submitted 24 June, 2023; originally announced June 2023.

Comments: 8 pages, 8 figures

Journal ref: IEEE Robotics and Automation Letters, vol. 8, no. 5, pp. 2502-2509, May 2023

arXiv:2306.05247 [pdf, ps, other]

A singular variant of the Falconer distance problem

Authors: Tainara Borges, Alex Iosevich, Yumeng Ou

Abstract: In this paper, we study the following variant of the Falconer distance problem. Let $E$ be a compact subset of ${\mathbb{R}}^d$, $d \ge 1$, and define $$ \Box(E)=\left\{\sqrt{{|x-y|}^2+{|x-z|}^2}: x,y,z \in E,\, y\neq z \right\}.$$ Using a variety of methods, we prove that if the Hausdorff dimension of $E$ is greater than $\frac{d}{2}+\frac{1}{4}$, then the Lebesgue measure of $\Box(E)$ is positive. This problem can be viewed as a singular variant of the classical Falconer distance problem because considering the diagonal $(x,x)$ in the definition of $\Box(E)$ poses interesting complications stemming from the fact that the set $\{(x,x): x \in E\}\subseteq \mathbb{R}^{2d}$ is much smaller than the sets for which Falconer-type results are typically established. We also prove a finite field variant of the Euclidean results for $\Box(E)$ and indicate both the similarities and the differences between the two settings.
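To make the definition concrete, the sketch below enumerates $\Box(E)$ for a small finite set of integer points. A finite set has Hausdorff dimension 0, so this only illustrates the definition, in particular that $x$ may coincide with $y$ or $z$ (the diagonal case), and says nothing about the measure-theoretic result. The function name is illustrative and the points are assumed distinct.

```python
import itertools
import math

Point = tuple[int, ...]


def box_values(points: list[Point]) -> list[float]:
    """Sorted distinct values of sqrt(|x-y|^2 + |x-z|^2) for x, y, z in E with y != z.

    x may equal y or z: this is the diagonal case the abstract singles out.
    Points are assumed distinct and integer-valued, so squared sums are exact integers.
    """
    def sq_dist(a: Point, b: Point) -> int:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    squared = set()
    for x in points:
        for y, z in itertools.permutations(points, 2):  # ordered pairs with y != z
            squared.add(sq_dist(x, y) + sq_dist(x, z))
    return [math.sqrt(v) for v in sorted(squared)]


E = [(0,), (1,), (2,)]
print(box_values(E))  # [1.0, 1.4142135623730951, 2.0, 2.23606797749979]
```

For $E = \{0, 1, 2\} \subset \mathbb{R}$, the values 1 and 2, which are plain distances $|x-z|$, arise only from the diagonal choices $y = x$ or $z = x$; the off-diagonal triples contribute $\sqrt{2}$ and $\sqrt{5}$.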

Submitted 31 August, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: A new approach has been added. 25 pages

MSC Class: 42B20

arXiv:2306.01167 [pdf, other]

doi 10.1103/PhysRevMaterials.7.094201

Constraints on proximity-induced ferromagnetism in a Dirac semimetal (Cd$_3$As$_2$)/ferromagnetic semiconductor (Ga$_{1-x}$Mn$_x$Sb) heterostructure

Authors: Arpita Mitra, Run Xiao, Wilson Yanez, Yongxi Ou, Juan Chamorro, Tyrel McQueen, Alexander J. Grutter, Julie A. Borchers, Michael R. Fitzsimmons, Timothy R. Charlton, Nitin Samarth

Abstract: Breaking time-reversal symmetry in the Dirac semimetal Cd$_3$As$_2$ through doping with magnetic ions or by the magnetic proximity effect is expected to cause a transition to other topological phases (such as a Weyl semimetal). To this end, we investigate the possibility of proximity-induced ferromagnetic ordering in epitaxial Dirac semimetal (Cd$_3$As$_2$)/ferromagnetic semiconductor (Ga$_{1-x}$Mn$_x$Sb) heterostructures grown by molecular beam epitaxy. We report a comprehensive characterization of these heterostructures using structural probes (atomic force microscopy, x-ray diffraction, scanning transmission electron microscopy), angle-resolved photoemission spectroscopy, electrical magneto-transport, magnetometry, and polarized neutron reflectometry. Measurements of the magnetoresistance and Hall effect in the temperature range 2 K to 20 K show signatures that could be consistent with either a proximity effect or spin-dependent scattering of charge carriers in the Cd$_3$As$_2$ channel. Polarized neutron reflectometry sets constraints on the interpretation of the magnetotransport studies by showing that (at least for temperatures above 6 K) any induced magnetization in the Cd$_3$As$_2$ itself must be relatively small ($<$ 14 emu/cm$^3$).

Submitted 1 June, 2023; originally announced June 2023.

Journal ref: Phys. Rev. Materials 7, 094201 (2023)

arXiv:2306.00526 [pdf, other]

Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering

Authors: Wenjin Wang, Yunhao Li, Yixin Ou, Yin Zhang

Abstract: Layout-aware pre-trained models have achieved significant progress on document image question answering. They introduce extra learnable modules into existing language models to capture layout information within document images from text bounding box coordinates obtained by OCR tools. However, the extra modules necessitate pre-training on extensive document images. This prevents these methods from directly utilizing off-the-shelf instruction-tuned language foundation models, which have recently shown promising potential in zero-shot learning. Instead, in this paper, we find that instruction-tuned language models like Claude and ChatGPT can understand layout through spaces and line breaks. Based on this observation, we propose the LAyout and Task aware Instruction Prompt (LATIN-Prompt), which consists of layout-aware document content and task-aware instruction. Specifically, the former uses appropriate spaces and line breaks to recover the layout information among text segments obtained by OCR tools, and the latter ensures that generated answers adhere to formatting requirements. Moreover, we propose LAyout and Task aware Instruction Tuning (LATIN-Tuning) to improve the performance of small instruction-tuned models like Alpaca. Experimental results show that LATIN-Prompt enables the zero-shot performance of Claude and ChatGPT to be comparable to the fine-tuning performance of state-of-the-art models on document image question answering, and LATIN-Tuning significantly enhances the zero-shot performance of Alpaca.
For example, LATIN-Prompt improves the performance of Claude and ChatGPT on DocVQA by 263% and 20%, respectively. LATIN-Tuning improves the performance of Alpaca on DocVQA by 87.7%. Quantitative and qualitative analyses demonstrate the effectiveness of LATIN-Prompt and LATIN-Tuning. We provide the code in the supplementary material and will release it to facilitate future research.
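The layout-aware document content can be illustrated by re-inserting spaces and line breaks from OCR bounding boxes. This is a hypothetical sketch, not the authors' released code: it assumes the OCR tool returns (text, x_left, y_top) tuples in pixels, a fixed average character width, and a vertical tolerance for grouping segments into lines.

```python
Segment = tuple[str, float, float]  # (text, x_left, y_top) in pixels, hypothetical OCR output


def layout_to_text(segments: list[Segment], char_width: float = 10.0,
                   line_tol: float = 5.0) -> str:
    """Render OCR segments as plain text, using line breaks for rows and spaces for columns."""
    # 1. Group segments into lines: scan top to bottom and start a new line
    #    when the vertical position jumps by more than line_tol.
    lines: list[list[Segment]] = []
    for seg in sorted(segments, key=lambda s: (s[2], s[1])):
        if lines and abs(seg[2] - lines[-1][0][2]) <= line_tol:
            lines[-1].append(seg)
        else:
            lines.append([seg])
    # 2. Within a line, map each x coordinate to a character column and pad with spaces
    #    (at least one space between consecutive segments).
    rendered = []
    for line in lines:
        buf = ""
        for text, x, _ in sorted(line, key=lambda s: s[1]):
            gap = round(x / char_width) - len(buf)
            buf += " " * (gap if not buf else max(1, gap))
            buf += text
        rendered.append(buf)
    return "\n".join(rendered)


segments = [("Name:", 0, 0), ("Alice", 60, 2), ("Date:", 0, 20), ("2023", 60, 21)]
print(layout_to_text(segments))
# Name: Alice
# Date: 2023
```

Mapping x positions to character columns keeps vertically aligned fields, such as form keys and values, aligned in the plain-text prompt, which is the layout cue the abstract attributes to spaces and line breaks.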

Submitted 7 September, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Added LATIN-Tuning for Alpaca. Code is available at https://github.com/WenjinW/LATIN-Prompt

arXiv:2305.13168 [pdf, other]

LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities

Authors: Yuqi Zhu, Xiaohan Wang, Jing Chen, Shuofei Qiao, Yixin Ou, Yunzhi Yao, Shumin Deng, Huajun Chen, Ningyu Zhang

Abstract: This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We conduct experiments across eight diverse datasets, focusing on four representative tasks encompassing entity and relation extraction, event extraction, link prediction, and question answering, thereby thoroughly exploring LLMs' performance in the domain of construction and inference. Empirically, our findings suggest that LLMs, represented by GPT-4, are better suited as inference assistants than as few-shot information extractors. Specifically, while GPT-4 exhibits good performance in tasks related to KG construction, it excels further in reasoning tasks, surpassing fine-tuned models in certain cases. Moreover, our investigation extends to the potential generalization ability of LLMs for information extraction, leading to the proposal of a Virtual Knowledge Extraction task and the development of the corresponding VINE dataset. Based on these empirical findings, we further propose AutoKG, a multi-agent-based approach employing LLMs and external sources for KG construction and reasoning. We anticipate that this research can provide valuable insights for future work in the field of knowledge graphs. The code and datasets are available at https://github.com/zjunlp/AutoKG.

Submitted 22 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: Work in progress

arXiv:2305.11796 [pdf, other]

Laser cooling of traveling wave phonons in an optical fiber

Authors: Joel N. Johnson, Danielle R. Haverkamp, Yi-Hsin Ou, Khanh Kieu, Nils T. Otterstrom, Peter T. Rakich, Ryan O. Behunin

Abstract: In recent years, optical control of mechanical oscillators has emerged as a critical tool for everything from information processing to laser cooling. While traditional forms of optomechanical cooling utilize systems composed of discrete optical and mechanical modes, it has recently been shown that cooling can be achieved in a chip-based system that possesses a continuum of modes. Through Brillouin-mediated phonon-photon interactions, cooling of a band of traveling acoustic waves can occur when anti-Stokes scattered photons exit the system more rapidly than the relaxation rate of the mechanical waves, to a degree determined by the acousto-optic coupling. Here, we demonstrate that a continuum of traveling-wave phonons can be cooled within an optical fiber, extending this physics to macroscopic length scales. Leveraging the large acousto-optic coupling permitted within a liquid-core optical fiber, heterodyne spectroscopy reveals power-dependent changes in spontaneous Brillouin scattering spectra that indicate a reduction of the thermal phonon population by 21 K using 120 mW of injected laser power.

Submitted 19 May, 2023; originally announced May 2023.

Comments: 8 pages, 3 figures

arXiv:2304.13938 [pdf, other]

doi 10.1016/j.compmedimag.2023.102273

A Deep Registration Method for Accurate Quantification of Joint Space Narrowing Progression in Rheumatoid Arthritis

Authors: Haolin Wang, Yafei Ou, Wanxuan Fang, Prasoon Ambalathankandy, Naoto Goto, Gen Ota, Masayuki Ikebe, Tamotsu Kamishima

Abstract: Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disease that results in progressive articular destruction and severe disability. Joint space narrowing (JSN) progression has been regarded as an important indicator of RA progression and has received sustained attention. In the diagnosis and monitoring of RA, radiology plays a crucial role in monitoring joint space. A new framework for monitoring joint space by quantifying JSN progression through image registration in radiographic images has been developed. This framework offers high accuracy; however, challenges remain in reducing mismatches and improving reliability. In this work, a deep intra-subject rigid registration network is proposed to automatically quantify JSN progression in the early stage of RA. In our experiments, the mean-square error of the Euclidean distance between the moving and fixed images is 0.0031, the standard deviation is 0.0661 mm, and the mismatching rate is 0.48%. The proposed method achieves sub-pixel accuracy, far exceeding manual measurements, and is robust to noise, rotation, and scaling of joints. Moreover, this work provides loss visualization, which can aid radiologists and rheumatologists in assessing quantification reliability, with important implications for possible future clinical applications. As a result, we are optimistic that this work will make a significant contribution to the automatic quantification of JSN progression in RA.
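An intra-subject rigid registration estimates a rotation plus a translation between the moving and fixed radiographs. As a non-deep illustration only, the sketch below fits that transform in closed form (2D least-squares Procrustes) from corresponding landmark points, which are assumed to be given, and reports distance statistics; the proposed network instead learns the transform from the images, and the helper names and error definitions here are assumptions.

```python
import math

Point2 = tuple[float, float]


def fit_rigid_2d(moving: list[Point2], fixed: list[Point2]) -> tuple[float, float, float]:
    """Least-squares rotation angle and translation mapping moving -> fixed (no scaling).

    Closed-form 2D Procrustes: after centering both point sets, the optimal angle is
    atan2(sum of cross products, sum of dot products).
    """
    n = len(moving)
    mx, my = sum(p[0] for p in moving) / n, sum(p[1] for p in moving) / n
    fx, fy = sum(q[0] for q in fixed) / n, sum(q[1] for q in fixed) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(moving, fixed):
        px, py, qx, qy = px - mx, py - my, qx - fx, qy - fy
        s_dot += px * qx + py * qy
        s_cross += px * qy - py * qx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated moving centroid onto the fixed centroid.
    return theta, fx - (c * mx - s * my), fy - (s * mx + c * my)


def apply_rigid(points: list[Point2], theta: float, tx: float, ty: float) -> list[Point2]:
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def distance_stats(a: list[Point2], b: list[Point2]) -> tuple[float, float]:
    """Mean squared and (population) standard deviation of point-wise Euclidean distances."""
    d = [math.dist(p, q) for p, q in zip(a, b)]
    mean = sum(d) / len(d)
    mse = sum(v * v for v in d) / len(d)
    std = math.sqrt(sum((v - mean) ** 2 for v in d) / len(d))
    return mse, std


moving = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
fixed = apply_rigid(moving, 0.3, 2.0, -1.0)  # synthetic "follow-up" landmarks
theta, tx, ty = fit_rigid_2d(moving, fixed)
print(round(theta, 6), round(tx, 6), round(ty, 6))  # 0.3 2.0 -1.0
print(distance_stats(apply_rigid(moving, theta, tx, ty), fixed))
```

Restricting the model to rotation and translation (no scaling or shear) reflects the intra-subject setting, where the same joint is imaged twice and only its pose changes between the two radiographs.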

Submitted 26 April, 2023; originally announced April 2023.

Comments: 11 pages, 9 figures, 7 tables

MSC Class: 68T45 ACM Class: I.4

arXiv:2304.09324 [pdf, other]

Computer-Vision Benchmark Segment-Anything Model (SAM) in Medical Images: Accuracy in 12 Datasets

Authors: Sheng He, Rina Bao, Jingpeng Li, Jeffrey Stout, Atle Bjornerud, P. Ellen Grant, Yangming Ou

Abstract: Background: The segment-anything model (SAM), introduced in April 2023, shows promise as a benchmark model and a universal solution to segment various natural images. It requires none of the previously required re-training or fine-tuning specific to each new dataset. Purpose: To test SAM's accuracy in various medical image segmentation tasks and investigate potential factors that may affect its accuracy in medical images. Methods: SAM was tested on 12 public medical image segmentation datasets involving 7,451 subjects. Accuracy was measured by the Dice overlap between the algorithm-segmented and ground-truth masks. SAM was compared with five state-of-the-art algorithms specifically designed for medical image segmentation tasks. Associations of SAM's accuracy with six factors were computed, independently and jointly, including segmentation difficulty as measured by segmentation ability score and by Dice overlap in U-Net, image dimension, size of the target region, image modality, and contrast. Results: The Dice overlaps from SAM were significantly lower than those of the five medical-image-based algorithms in all 12 medical image segmentation datasets, by a margin of 0.1-0.5 and even 0.6-0.7 Dice. SAM-Semantic was significantly associated with medical image segmentation difficulty and the image modality, and SAM-Point and SAM-Box were significantly associated with image segmentation difficulty, image dimension, target region size, and target-vs-background contrast.
All three variants of SAM were more accurate in 2D medical images, with larger target regions, in easier cases with a higher Segmentation Ability score and higher U-Net Dice, and with higher foreground-background contrast.
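The Dice overlap used as the accuracy measure is $2|A \cap B| / (|A| + |B|)$ for a predicted mask $A$ and a ground-truth mask $B$. A minimal sketch on binary masks stored as nested lists; returning 1.0 when both masks are empty is a convention assumed here, not something the abstract specifies.

```python
Mask = list[list[int]]


def dice(pred: Mask, truth: Mask) -> float:
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks of the same shape."""
    if len(pred) != len(truth) or any(len(rp) != len(rt) for rp, rt in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    inter = sum(1 for rp, rt in zip(pred, truth) for p, t in zip(rp, rt) if p and t)
    size = sum(1 for row in pred for v in row if v) + sum(1 for row in truth for v in row if v)
    if size == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * inter / size


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(dice(pred, truth))  # 0.6666666666666666 (2 * 2 / (3 + 3))
```

A Dice margin of 0.1 to 0.5 therefore corresponds to a substantial change in how many foreground pixels the two masks share.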

Submitted 5 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: Technical Report

arXiv:2304.08915 [pdf, other]

Differentiable Genetic Programming for High-dimensional Symbolic Regression

Authors: Peng Zeng, Xiaotian Song, Andrew Lensen, Yuwei Ou, Yanan Sun, Mengjie Zhang, Jiancheng Lv

Abstract: Symbolic regression (SR) is the process of discovering hidden relationships in data as mathematical expressions, and is considered an effective way to achieve interpretable machine learning (ML). Genetic programming (GP) has been the dominant approach to solving SR problems. However, as the scale of SR problems increases, GP often performs poorly and cannot effectively address real-world high-dimensional problems. This limitation is mainly caused by the stochastic evolutionary nature of traditional GP in constructing the trees. In this paper, we propose a differentiable approach named DGP to construct GP trees for high-dimensional SR for the first time. Specifically, a new data structure called the differentiable symbolic tree is proposed to relax the discrete structure to be continuous, so that a gradient-based optimizer can be used for efficient optimization. In addition, a sampling method is proposed to eliminate the discrepancy caused by this relaxation and obtain valid symbolic expressions. Furthermore, a diversification mechanism is introduced to help the optimizer escape local optima and find globally better solutions. With these designs, the proposed DGP method can efficiently search for GP trees with higher performance and is thus capable of dealing with high-dimensional SR. To demonstrate the effectiveness of DGP, we conducted various experiments against state-of-the-art methods based on both GP and deep neural networks.
The experimental results reveal that DGP can outperform these peer competitors on high-dimensional regression benchmarks with dimensions varying from tens to thousands. In addition, on synthetic SR problems, the proposed DGP method also achieves the best recovery rate, even at different noise levels. We believe this work can make SR a powerful tool for interpretable ML across a broader range of real-world problems.
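A common way to make a discrete operator choice differentiable, used here as an assumed stand-in for DGP's differentiable symbolic tree, is to replace the choice at each node by a softmax-weighted mixture of candidate operators and to recover a discrete symbol afterwards by taking the arg max. The sketch shows a single node; the operator set, the names, and the arg-max discretization (rather than the paper's sampling method) are illustrative.

```python
import math

# Hypothetical candidate operators for one internal node of a symbolic tree.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def relaxed_node(a: float, b: float, logits: list[float]) -> float:
    """Continuous relaxation: a softmax-weighted mixture of every candidate operator.

    The output is a smooth function of the logits, so a gradient-based optimizer can tune them.
    """
    weights = softmax(logits)
    return sum(w * op(a, b) for w, op in zip(weights, OPS.values()))


def discretize(logits: list[float]) -> str:
    """Map the relaxed node back to a valid symbol by keeping the highest-weight operator."""
    names = list(OPS)
    return names[max(range(len(logits)), key=lambda i: logits[i])]


print(relaxed_node(2.0, 3.0, [0.0, 0.0, 0.0]))   # equal mixture: (5 + (-1) + 6) / 3
print(relaxed_node(2.0, 3.0, [0.0, 0.0, 10.0]))  # close to 2 * 3 = 6
print(discretize([0.0, 0.0, 10.0]))              # mul
```

The gap between the mixture output and the output of the discretized operator is the kind of relaxation discrepancy that the abstract's sampling method is designed to eliminate.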

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2304.01401 [pdf, other]

U-Netmer: U-Net meets Transformer for medical image segmentation

Authors: Sheng He, Rina Bao, P. Ellen Grant, Yangming Ou

Abstract: The combination of the U-Net based deep learning models and Transformer is a new trend for medical image segmentation. U-Net can extract the detailed local semantic and texture information and Transformer can learn the long-rang dependencies among pixels in the input image. However, directly adapting the Transformer for segmentation has ``token-flatten" problem (flattens the local patches into 1D… ▽ More The combination of the U-Net based deep learning models and Transformer is a new trend for medical image segmentation. U-Net can extract the detailed local semantic and texture information and Transformer can learn the long-rang dependencies among pixels in the input image. However, directly adapting the Transformer for segmentation has ``token-flatten" problem (flattens the local patches into 1D tokens which losses the interaction among pixels within local patches) and ``scale-sensitivity" problem (uses a fixed scale to split the input image into local patches). Compared to directly combining U-Net and Transformer, we propose a new global-local fashion combination of U-Net and Transformer, named U-Netmer, to solve the two problems. The proposed U-Netmer splits an input image into local patches. The global-context information among local patches is learnt by the self-attention mechanism in Transformer and U-Net segments each local patch instead of flattening into tokens to solve the `token-flatten" problem. The U-Netmer can segment the input image with different patch sizes with the identical structure and the same parameter. Thus, the U-Netmer can be trained with different patch sizes to solve the ``scale-sensitivity" problem. We conduct extensive experiments in 7 public datasets on 7 organs (brain, heart, breast, lung, polyp, pancreas and prostate) and 4 imaging modalities (MRI, CT, ultrasound, and endoscopy) to show that the proposed U-Netmer can be generally applied to improve accuracy of medical image segmentation. 
These experimental results show that U-Netmer provides state-of-the-art performance compared to baselines and other models. In addition, the discrepancy among the outputs of U-Netmer at different scales is linearly correlated with the segmentation accuracy, so it can serve as a confidence score to rank test images by difficulty without ground truth.
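The patch-splitting and multi-scale confidence ideas in the U-Netmer abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The per-patch segmenter `toy_patch_segmenter` is a hypothetical stand-in for U-Net, the Transformer's global self-attention across patches is omitted entirely, and the discrepancy measure (mean pairwise absolute difference between per-scale probability maps) is an assumed choice, since the abstract does not specify one.

```python
import numpy as np


def split_into_patches(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W) image into non-overlapping (patch, patch) tiles, row-major."""
    h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tiles = image.reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch, patch)


def merge_patches(tiles: np.ndarray, h: int, w: int) -> np.ndarray:
    """Inverse of split_into_patches: reassemble row-major tiles into an (H, W) map."""
    patch = tiles.shape[-1]
    grid = tiles.reshape(h // patch, w // patch, patch, patch)
    return grid.transpose(0, 2, 1, 3).reshape(h, w)


def toy_patch_segmenter(tile: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for U-Net: min-max normalise each tile to [0, 1].

    Its output depends on the tile's extent, so different patch sizes
    give different maps, which is what the discrepancy score measures.
    """
    lo, hi = tile.min(), tile.max()
    if hi == lo:
        return np.zeros(tile.shape, dtype=float)
    return (tile - lo) / (hi - lo)


def segment_at_scale(image: np.ndarray, patch: int, segmenter) -> np.ndarray:
    """Segment every tile with the same segmenter (shared parameters) and stitch results."""
    tiles = split_into_patches(image, patch)
    preds = np.stack([segmenter(t) for t in tiles])
    return merge_patches(preds, *image.shape)


def scale_discrepancy(prob_maps: list) -> float:
    """Assumed measure: mean pairwise absolute difference between per-scale maps."""
    n = len(prob_maps)
    diffs = [
        np.abs(prob_maps[i] - prob_maps[j]).mean()
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return float(np.mean(diffs))


# Usage: segment one random image at three patch sizes and score the disagreement.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
maps = [segment_at_scale(image, p, toy_patch_segmenter) for p in (8, 16, 32)]
score = scale_discrepancy(maps)
print(f"discrepancy across scales: {score:.4f}")
```

Because one segmenter is reused at every patch size, a single set of parameters serves all scales. This loosely mirrors the abstract's point that U-Netmer keeps an identical structure across patch sizes. In this sketch, a higher score simply means the scales disagree more. How that maps to accuracy would have to be calibrated on real data.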

Submitted 3 April, 2023; originally announced April 2023.

Comments: 10 pages, 5 figures, under review

arXiv:2303.16434 [pdf, other]

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

Authors: Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan

Abstract: Artificial Intelligence (AI) has made incredible progress recently. On the one hand, advanced foundation models like ChatGPT offer powerful conversation, in-context learning and code generation abilities across a broad range of open-domain tasks. They can also generate high-level solution outlines for domain-specific tasks based on the commonsense knowledge they have acquired. However, they still struggle with some specialized tasks, either because they lacked sufficient domain-specific data during pre-training or because their neural network computations are error-prone on tasks that require precise execution. On the other hand, many existing models and systems (symbolic or neural) perform certain domain-specific tasks very well. However, because of their different implementations or working mechanisms, they are not easily accessible from, or compatible with, foundation models. Therefore, there is a clear and pressing need for a mechanism that can leverage foundation models to propose task solution outlines and then automatically match some of the sub-tasks in those outlines to off-the-shelf models and systems with specialized functionalities to complete them. Inspired by this, we introduce TaskMatrix.AI, a new AI ecosystem that connects foundation models with millions of APIs for task completion.
Unlike most previous work, which aimed to improve a single AI model, TaskMatrix.AI focuses on using existing foundation models (as a brain-like central system) and the APIs of other AI models and systems (as sub-task solvers) to accomplish diverse tasks in both digital and physical domains. As a position paper, we present our vision of how to build such an ecosystem, explain each key component, and use case studies to illustrate both the feasibility of this vision and the main challenges we need to address next.
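The outline-then-match workflow in the TaskMatrix.AI abstract can be sketched in a few lines of Python. Everything here is hypothetical. `ApiEntry`, `match_api` and `execute_outline` are illustrative names, and the outline is hard-coded rather than produced by a foundation model. Keyword overlap is a deliberately crude stand-in for the model-based matching the paper envisions. None of this is TaskMatrix.AI's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ApiEntry:
    """Hypothetical registry record describing one callable API."""
    name: str
    keywords: frozenset
    run: Callable[[str], str]


def match_api(subtask: str, registry: List[ApiEntry]) -> Optional[ApiEntry]:
    """Pick the API whose keywords overlap most with the sub-task text.

    Returns None when no API shares a keyword, i.e. the sub-task stays unresolved.
    """
    words = set(subtask.lower().split())
    best_api, best_score = None, 0
    for api in registry:
        score = len(api.keywords & words)
        if score > best_score:  # ties keep the earlier registry entry
            best_api, best_score = api, score
    return best_api


def execute_outline(outline: List[str], registry: List[ApiEntry]) -> List[str]:
    """Route each outline step to its matched API, or flag it as unresolved."""
    results = []
    for step in outline:
        api = match_api(step, registry)
        results.append(api.run(step) if api else f"UNRESOLVED: {step}")
    return results


# Usage with two toy APIs and a hard-coded outline (a foundation model would generate it).
registry = [
    ApiEntry("image_captioner", frozenset({"caption", "describe", "image"}),
             lambda s: f"image_captioner <- {s}"),
    ApiEntry("calculator", frozenset({"compute", "sum", "multiply"}),
             lambda s: f"calculator <- {s}"),
]
outline = ["describe the image", "compute the total sum", "book a flight"]
for line in execute_outline(outline, registry):
    print(line)
```

Running it prints `image_captioner <- describe the image`, `calculator <- compute the total sum` and `UNRESOLVED: book a flight`. Returning `None` instead of forcing a match keeps sub-tasks that no registered API covers visible, rather than routing them to a poor fit.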

Submitted 28 March, 2023; originally announced March 2023.

Showing 1–50 of 329 results for author: Ou, Y