A hidden AGN powering bright [O III] nebulae in a protocluster core at $z=4.5$ revealed by JWST

Authors: M. Solimano, J. González-López, M. Aravena, B. Alcalde Pampliega, R. J. Assef, M. Béthermin, M. Boquien, S. Bovino, C. M. Casey, P. Cassata, E. da Cunha, R. L. Davies, I. De Looze, X. Ding, T. Díaz-Santos, A. L. Faisst, A. Ferrara, D. B. Fisher, N. M. Förster-Schreiber, S. Fujimoto, M. Ginolfi, C. Gruppioni, L. Guaita, N. Hathi, R. Herrera-Camus, et al. (26 additional authors not shown)

Abstract: We present new JWST/NIRSpec IFU observations of the J1000+0234 system at $z=4.54$, the dense core of a galaxy protocluster hosting a massive, dusty star forming galaxy (DSFG) with a low luminosity radio counterpart. The new data reveal two extended, high equivalent width (EW$_0 > 1000$ Å) nebulae at each side of the DSFG disk along its minor axis (namely O3-N and O3-S). On one hand, O3-N's spectrum shows a prominent FWHM $\sim1300$ km s$^{-1}$ broad and blueshifted component, suggesting an outflow origin. On the other hand, O3-S stretches over parsec and has a velocity gradient that spans $800$ km s$^{-1}$ but no evidence of a broad component. Both sources, however, seem to be powered at least partially by an active galactic nucleus (AGN), so we classify them as extended emission-line regions (EELRs). The strongest evidence comes from the detection of the high-ionization [Ne V] $\lambda3427$ line toward O3-N, which paired with the non-detection of hard X-rays implies an obscuring column density above the Compton-thick regime. In O3-S, the [Ne V] line is not detected, but we measure a He II well above the expectation for star formation. We interpret this as O3-S being externally irradiated by the AGN, akin to the famous Hanny's Voorwerp object in the local Universe. In addition, more classical line ratio diagnostics (e.g. [O III]/H$β$ vs [N II]/H$α$) put the DSFG itself in the AGN region of the diagrams, and hence the most probable host of the AGN. These results showcase the ability of JWST to unveil highly obscured AGN at high redshifts.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 5 pages, 4 figures plus 5 appendices (incl. 3 extra figures and one table). Submitted to A&A on July 17th 2024

arXiv:2407.12450 [pdf, other]

Interim report for the International Muon Collider Collaboration (IMCC)

Authors: C. Accettura, S. Adrian, R. Agarwal, C. Ahdida, C. Aimé, A. Aksoy, G. L. Alberghi, S. Alden, N. Amapane, D. Amorim, P. Andreetto, F. Anulli, R. Appleby, A. Apresyan, P. Asadi, M. Attia Mahmoud, B. Auchmann, J. Back, A. Badea, K. J. Bae, E. J. Bahng, L. Balconi, F. Balli, L. Bandiera, C. Barbagallo, et al. (362 additional authors not shown)

Abstract: The International Muon Collider Collaboration (IMCC) [1] was established in 2020 following the recommendations of the European Strategy for Particle Physics (ESPP) and the implementation of the European Strategy for Particle Physics-Accelerator R&D Roadmap by the Laboratory Directors Group [2], hereinafter referred to as the European LDG roadmap. The Muon Collider Study (MuC) covers the accelerator complex, detectors and physics for a future muon collider. In 2023, European Commission support was obtained for a design study of a muon collider (MuCol) [3]. This project started on 1st March 2023, with work-packages aligned with the overall muon collider studies. In preparation of and during the 2021-22 U.S. Snowmass process, the muon collider project parameters, technical studies and physics performance studies were performed and presented in great detail. Recently, the P5 panel [4] in the U.S. recommended a muon collider R&D, proposed to join the IMCC and envisages that the U.S. should prepare to host a muon collider, calling this their "muon shot". In the past, the U.S. Muon Accelerator Programme (MAP) [5] has been instrumental in studies of concepts and technologies for a muon collider.

Submitted 17 July, 2024; originally announced July 2024.

Comments: This document summarises the International Muon Collider Collaboration (IMCC) progress and status of the Muon Collider R&D programme

arXiv:2407.10082 [pdf, ps, other]

On the blow-up formula of the Chow weights for polarized toric manifolds

Authors: King Leung Lee, Naoto Yotsutani

Abstract: Let $X$ be a smooth projective toric variety and let $\widetilde{X}$ be the blow-up manifold of $X$ at finitely many distinct torus-invariant points of $X$. In this paper, we give an explicit combinatorial formula of the Chow weight of $\widetilde{X}$ in terms of the base toric manifold $X$ and the symplectic cuts of the Delzant polytope. We then apply this blow-up formula to the projective plane and see the difference of Chow stability between the toric blow-up manifolds and the manifolds of blow-ups at general points. Finally, we detect the blow-up formula of the Futaki-Ono invariant which is an obstruction for asymptotic Chow semistability of a polarized toric manifold.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 23 pages, 3 figures. Comments welcome

MSC Class: 51M20; 53C55; 14M25

arXiv:2407.00925 [pdf, other]

SIDQL: An Efficient Keyframe Extraction and Motion Reconstruction Framework in Motion Capture

Authors: Xuling Zhang, Ziru Zhang, Yuyang Wang, Lik-hang Lee, Pan Hui

Abstract: Metaverse, which integrates the virtual and physical worlds, has emerged as an innovative paradigm for changing people's lifestyles. Motion capture has become a reliable approach to achieve seamless synchronization of the movements between avatars and human beings, which plays an important role in diverse Metaverse applications. However, due to the continuous growth of data, current communication systems face a significant challenge of meeting the demand of ultra-low latency during application. In addition, current methods also have shortcomings when selecting keyframes, e.g., relying on recognizing motion types and artificially selected keyframes. Therefore, the utilization of keyframe extraction and motion reconstruction techniques could be considered a feasible and promising solution. In this work, a new motion reconstruction algorithm is designed in a spherical coordinate system involving location and velocity information. Then, we formalize the keyframe extraction problem into an optimization problem to reduce the reconstruction error. Using Deep Q-Learning (DQL), the Spherical Interpolation based Deep Q-Learning (SIDQL) framework is proposed to generate proper keyframes for reconstructing the motion sequences. We use the CMU database to train and evaluate the framework. Our scheme can significantly reduce the data volume and transmission latency compared to various baselines while maintaining a reconstruction error of less than 0.09 when extracting five keyframes.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.16003 [pdf]

Unidirectional Chiral Emission via Twisted Bi-layer Metasurfaces

Authors: Dmitrii Gromyko, Shu An, Sergey Gorelik, Jiahui Xu, Li Jun Lim, Henry Yit Loong Lee, Febiana Tjiptoharsono, Zhi-Kuang Tan, Cheng-Wei Qiu, Zhaogang Dong, Lin Wu

Abstract: Controlling and channelling light emissions from unpolarized quantum dots into specific directions with chiral polarization remains a key challenge in modern photonics. Stacked metasurface designs offer a potential compact solution for chirality and directionality engineering. However, experimental observations of directional chiral radiation from resonant metasurfaces with quantum emitters remain obscure. In this paper, we present experimental observations of unidirectional chiral emission from a twisted bi-layer metasurface via multi-dimensional control, including twist angle, interlayer distance, and lateral displacement between the top and bottom layers, as enabled by doublet alignment lithography (DAL). First, maintaining alignment, the metasurface demonstrates a resonant intrinsic optical chirality with near-unity circular dichroism of 0.94 and reflectance difference of 74%, where a high circular dichroism greater than 0.9 persists across a wide range of angles from -11 to 11 degrees. Second, engineered lateral displacement induces a unidirectional chiral resonance, resulting in unidirectional chiral emission from the quantum dots deposited onto the metasurface. Our bi-layer metasurfaces offer a universal compact platform for efficient radiation manipulation over a wide angular range, promising potential applications in miniaturized lasers, grating couplers, and chiral nanoantennas.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 16 pages, 4 figures

arXiv:2406.15391 [pdf]

doi 10.2139/ssrn.4807135

Examining the Legal Status of Digital Assets as Property: A Comparative Analysis of Jurisdictional Approaches

Authors: Luke Lee

Abstract: This paper examines the complex legal landscape surrounding digital assets, analysing how they are defined and regulated as property across various jurisdictions. As digital assets such as cryptocurrencies and non-fungible tokens (NFTs) increasingly integrate with global economies, their intangible nature presents unique challenges to traditional property law concepts, necessitating a re-evaluation of legal definitions and ownership frameworks. This research presents a comparative analysis, reviewing how different legal systems classify and manage digital assets within property law, highlighting the variations in regulatory approaches and their implications on ownership, transfer, and inheritance rights. By examining seminal cases and regulatory developments in major jurisdictions, including the United States, the European Union, and Singapore, this paper explores the emerging trends and potential legal evolutions that could influence the global handling of digital assets. The study aims to contribute to the scholarly discourse by proposing a harmonized approach to digital asset regulation, seeking to balance innovation with legal certainty and consumer protection.

Submitted 26 April, 2024; originally announced June 2024.

Comments: 16 pages

arXiv:2406.11886 [pdf, other]

Financial Assets Dependency Prediction Utilizing Spatiotemporal Patterns

Authors: Haoren Zhu, Pengfei Zhao, Wilfred Siu Hung NG, Dik Lun Lee

Abstract: Financial assets exhibit complex dependency structures, which are crucial for investors to create diversified portfolios to mitigate risk in volatile financial markets. To explore the financial asset dependencies dynamics, we propose a novel approach that models the dependencies of assets as an Asset Dependency Matrix (ADM) and treats the ADM sequences as image sequences. This allows us to leverage deep learning-based video prediction methods to capture the spatiotemporal dependencies among assets. However, unlike images where neighboring pixels exhibit explicit spatiotemporal dependencies due to the natural continuity of object movements, assets in ADM do not have a natural order. This poses challenges to organizing the relational assets to reveal better the spatiotemporal dependencies among neighboring assets for ADM forecasting. To tackle the challenges, we propose the Asset Dependency Neural Network (ADNN), which employs the Convolutional Long Short-Term Memory (ConvLSTM) network, a highly successful method for video prediction. ADNN can employ static and dynamic transformation functions to optimize the representations of the ADM. Through extensive experiments, we demonstrate that our proposed framework consistently outperforms the baselines in the ADM prediction and downstream application tasks. This research contributes to understanding and predicting asset dependencies, offering valuable insights for financial market participants.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.04339 [pdf, other]

RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation

Authors: Jiaming Liu, Mengzhen Liu, Zhenyu Wang, Lily Lee, Kaichen Zhou, Pengju An, Senqiao Yang, Renrui Zhang, Yandong Guo, Shanghang Zhang

Abstract: A fundamental objective in robot manipulation is to enable models to comprehend visual scenes and execute actions. Although existing robot Multimodal Large Language Models (MLLMs) can handle a range of basic tasks, they still face challenges in two areas: 1) inadequate reasoning ability to tackle complex tasks, and 2) high computational costs for MLLM fine-tuning and inference. The recently proposed state space model (SSM) known as Mamba demonstrates promising capabilities in non-trivial sequence modeling with linear inference complexity. Inspired by this, we introduce RoboMamba, an end-to-end robotic MLLM that leverages the Mamba model to deliver both robotic reasoning and action capabilities, while maintaining efficient fine-tuning and inference. Specifically, we first integrate the vision encoder with Mamba, aligning visual data with language embedding through co-training, empowering our model with visual common sense and robot-related reasoning. To further equip RoboMamba with action pose prediction abilities, we explore an efficient fine-tuning strategy with a simple policy head. We find that once RoboMamba possesses sufficient reasoning capability, it can acquire manipulation skills with minimal fine-tuning parameters (0.1\% of the model) and time (20 minutes). In experiments, RoboMamba demonstrates outstanding reasoning capabilities on general and robotic evaluation benchmarks. Meanwhile, our model showcases impressive pose prediction results in both simulation and real-world experiments, achieving inference speeds 7 times faster than existing robot MLLMs. Our project web page: https://sites.google.com/view/robomamba-web

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.02408 [pdf, other]

Anomalous 4$f$ fine structure in TmSe$_{1-x}$Te$_x$ across the metal-insulator transition

Authors: C. -H. Min, S. Müller, W. J. Choi, L. Dudy, V. Zabolotny, M. Heber, J. D. Denlinger, C. -J. Kang, M. Kalläne, N. Wind, M. Scholz, T. L. Lee, C. Schlueter, A. Gloskovskii, E. D. L. Rienks, V. Hinkov, H. Bentmann, Y. S. Kwon, F. Reinert, K. Rossnagel

Abstract: Hybridization between localized 4$f$ and itinerant 5$d$6$s$ states in heavy fermion compounds is a well-studied phenomenon and commonly captured by the paradigmatic Anderson model. However, the investigation of additional electronic interactions, beyond the standard Anderson model, has been limited, despite their predicted important role in the exotic quasiparticle formation in mixed-valence systems. We investigate the 4$f$ states in TmSe$_{1-x}$Te$_x$ throughout a semimetal-insulator phase transition, which drastically varies the interactions related to the 4$f$ states. Using synchrotron-based hard x-ray and extreme ultraviolet photoemission spectroscopy, we resolve subtle peak splitting in the 4$f$ peaks near the Fermi level in the mixed-valent semimetal phase. The separation is enhanced by several tens of meV by increasing the lattice parameter by a few percent. Our results elucidate the evolving nature of the 4$f$ state across the phase transition, and provide direct experimental evidence for electronic interactions beyond the standard Anderson model in mixed-valence systems.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 5 pages, 4 figures for the main text, 6 pages and 5 figures for the supplementary

arXiv:2405.18047 [pdf, other]

2BP: 2-Stage Backpropagation

Authors: Christopher Rae, Joseph K. L. Lee, James Richings

Abstract: As Deep Neural Networks (DNNs) grow in size and complexity, they often exceed the memory capacity of a single accelerator, necessitating the sharding of model parameters across multiple accelerators. Pipeline parallelism is a commonly used sharding strategy for training large DNNs. However, current implementations of pipeline parallelism are being unintentionally bottlenecked by the automatic differentiation tools provided by ML frameworks. This paper introduces 2-stage backpropagation (2BP). By splitting the backward propagation step into two separate stages, we can reduce idle compute time. We tested 2BP on various model architectures and pipelining schedules, achieving increases in throughput in all cases. Using 2BP, we were able to achieve a 1.70x increase in throughput compared to traditional methods when training a LLaMa-like transformer with 7 billion parameters across 4 GPUs.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17418 [pdf, other]

Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation

Authors: Jiaming Liu, Chenxuan Li, Guanqun Wang, Lily Lee, Kaichen Zhou, Sixiang Chen, Chuyan Xiong, Jiaxin Ge, Renrui Zhang, Shanghang Zhang

Abstract: Robot manipulation policies have shown unsatisfactory action performance when confronted with novel task or object instances. Hence, the capability to automatically detect and self-correct failure action is essential for a practical robotic system. Recently, Multimodal Large Language Models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning abilities in various tasks. To unleash general MLLMs as an end-to-end robotic agent, we introduce a Self-Corrected (SC)-MLLM, equipping our model not only to predict end-effector poses but also to autonomously recognize and correct failure actions. Specifically, we first conduct parameter-efficient fine-tuning to empower MLLM with pose prediction ability, which is reframed as a language modeling problem. When facing execution failures, our model learns to identify low-level action error causes (i.e., position and rotation errors) and adaptively seeks prompt feedback from experts. Based on the feedback, SC-MLLM rethinks the current failure scene and generates the corrected actions. Furthermore, we design a continuous policy learning method for successfully corrected samples, enhancing the model's adaptability to the current scene configuration and reducing the frequency of expert intervention. To evaluate our SC-MLLM, we conduct extensive experiments in both simulation and real-world settings. The SC-MLLM agent significantly improves manipulation accuracy compared to the previous state-of-the-art robotic MLLM (ManipLLM), increasing from 57\% to 79\% on seen object categories and from 47\% to 69\% on unseen novel categories.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.08586 [pdf, other]

Cross-Domain Feature Augmentation for Domain Generalization

Authors: Yingnan Liu, Yingtian Zou, Rui Qiao, Fusheng Liu, Mong Li Lee, Wynne Hsu

Abstract: Domain generalization aims to develop models that are robust to distribution shifts. Existing methods focus on learning invariance across domains to enhance model robustness, and data augmentation has been widely used to learn invariant predictors, with most methods performing augmentation in the input space. However, augmentation in the input space has limited diversity, whereas augmentation in the feature space is more versatile and has shown promising results. Nonetheless, feature semantics is seldom considered and existing feature augmentation methods suffer from a limited variety of augmented features. We decompose features into class-generic, class-specific, domain-generic, and domain-specific components. We propose a cross-domain feature augmentation method named XDomainMix that enables us to increase sample diversity while emphasizing the learning of invariant representations to achieve domain generalization. Experiments on widely used benchmark datasets demonstrate that our proposed method is able to achieve state-of-the-art performance. Quantitative analysis indicates that our feature augmentation approach facilitates the learning of effective models that are invariant across different domains.

Submitted 14 May, 2024; originally announced May 2024.

Comments: Accepted to the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024); Code is available at https://github.com/NancyQuris/XDomainMix

arXiv:2405.06883 [pdf, ps, other]

Chow stability of $λ$-stable toric varieties

Authors: King leung Lee, Naoto Yotsutani

Abstract: For a given polarized toric variety, we define the notion of $λ$-stability which is a natural generalization of uniform K-stability. At the neighbourhoods of the vertices of the corresponding moment polytope $Δ$, we consider appropriate triangulations and give a sufficient criterion for a $λ$-stable polarized toric variety $(X,L)$ to be asymptotically Chow polystable when the obstruction of asymptotic Chow semistability (the Futaki-Ono invariant) vanishes. As an application, we prove that any K-semistable polarized smooth toric variety $(X,L)$ with the vanishing Futaki-Ono invariant is asymptotically Chow polystable.

Submitted 10 May, 2024; originally announced May 2024.

Comments: 36 pages. Comments are welcome!

MSC Class: 51M20; 53C55; 14M25

arXiv:2405.00653 [pdf, other]

Particle scale anisotropy controls bulk properties in sheared granular materials

Authors: Carmen L. Lee, Ephraim Bililign, Emilien Azéma, Karen E. Daniels

Abstract: The bulk dynamics of dense granular materials arise through a combination of particle-scale and mesoscale effects. Theoretical and numerical studies have shown that collective effects are created by particle-scale anisotropic structures such as grain connectivity (fabric), force transmission, and frictional mobilization, all of which influence bulk properties like bulk friction and the stress tensor through the Stress-Force-Fabric (SFF) relationship. To date, establishing the relevance of these effects to laboratory systems has remained elusive due to the challenge of measuring both normal and frictional contact forces at the particle scale. In this study, we perform experiments on a sheared photoelastic granular system in a quasi-2D annular (Couette) cell. During these experiments, we measure particle locations, contacts, and normal and frictional force vectors during loading. We reconstruct the angular distributions of the contact and force vectors, and extract the corresponding emergent anisotropies for each of these metrics. Finally, we show that the SFF relation quantitatively predicts the relationship between particle scale anisotropies, the stress tensor components, and the bulk friction coefficient, capturing even transient behaviors. As such, this method shows promise for application to other dense particulate systems where fabric anisotropy can provide a useful measure of bulk friction.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 5 pages, 3 figures

arXiv:2404.14687 [pdf, other]

Pegasus-v1 Technical Report

Authors: Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, et al. (19 additional authors not shown)

Abstract: This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's architecture, training strategies, and its performance in benchmarks on video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to provide readers a balanced view of its current state and its future direction.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.11898 [pdf]

Enhancing Financial Inclusion and Regulatory Challenges: A Critical Analysis of Digital Banks and Alternative Lenders Through Digital Platforms, Machine Learning, and Large Language Models Integration

Authors: Luke Lee

Abstract: This paper explores the dual impact of digital banks and alternative lenders on financial inclusion and the regulatory challenges posed by their business models. It discusses the integration of digital platforms, machine learning (ML), and Large Language Models (LLMs) in enhancing financial services accessibility for underserved populations. Through a detailed analysis of operational frameworks and technological infrastructures, this research identifies key mechanisms that facilitate broader financial access and mitigate traditional barriers. Additionally, the paper addresses significant regulatory concerns involving data privacy, algorithmic bias, financial stability, and consumer protection. Employing a mixed-methods approach, which combines quantitative financial data analysis with qualitative insights from industry experts, this paper elucidates the complexities of leveraging digital technology to foster financial inclusivity. The findings underscore the necessity of evolving regulatory frameworks that harmonize innovation with comprehensive risk management. This paper concludes with policy recommendations for regulators, financial institutions, and technology providers, aiming to cultivate a more inclusive and stable financial ecosystem through prudent digital technology integration.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 17 pages

arXiv:2404.10536 [pdf, ps, other]

Benchmarking Machine Learning Applications on Heterogeneous Architecture using Reframe

Authors: Christopher Rae, Joseph K. L. Lee, James Richings, Michele Weiland

Abstract: With the rapid increase in machine learning workloads performed on HPC systems, it is beneficial to regularly perform machine learning specific benchmarks to monitor performance and identify issues. Furthermore, as part of the Edinburgh International Data Facility, EPCC currently hosts a wide range of machine learning accelerators including Nvidia GPUs, the Graphcore Bow Pod64 and Cerebras CS-2, which are managed via Kubernetes and Slurm. We extended the Reframe framework to support the Kubernetes scheduler backend, and utilise Reframe to perform machine learning benchmarks, and we discuss the preliminary results collected and challenges involved in integrating Reframe across multiple platforms and architectures.

Submitted 25 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: Author accepted version of paper in the PERMAVOST workshop at the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC 24)

arXiv:2404.06466 [pdf, other]

Hyperparameter Selection in Continual Learning

Authors: Thomas L. Lee, Sigrid Passano Hellan, Linus Ericsson, Elliot J. Crowley, Amos Storkey

Abstract: In continual learning (CL) -- where a learner trains on a stream of data -- standard hyperparameter optimisation (HPO) cannot be applied, as a learner does not have access to all of the data at the same time. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparameter settings. However, this end-of-training HPO is unrealistic as in practice a learner can only see the stream once. Hence, there is an open question: what HPO framework should a practitioner use for a CL problem in reality? This paper answers this question by evaluating several realistic HPO frameworks. We find that all the HPO frameworks considered, including end-of-training HPO, perform similarly. We therefore advocate using the realistic and most computationally efficient method: fitting the hyperparameters on the first task and then fixing them throughout training.

Submitted 9 April, 2024; originally announced April 2024.

Comments: Preprint, 9 pages

arXiv:2404.03575 [pdf, other]

DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

Authors: Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao, Lin Wang, Lik-hang Lee, Pengyuan Zhou

Abstract: Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework, to tackle the aforementioned three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to form fast, semantically rich, and high-quality representations. FPS uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, specifically designed for both indoor and outdoor settings, to effectively ensure object-environment integration and scene-wide 3D consistency. Last, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Code and demos will be released at https://dreamscene-project.github.io .

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.00874 [pdf, other]

DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF

Authors: Jie Long Lee, Chen Li, Gim Hee Lee

Abstract: We present DiSR-NeRF, a diffusion-guided framework for view-consistent super-resolution (SR) NeRF. Unlike prior works, we circumvent the requirement for high-resolution (HR) reference images by leveraging existing powerful 2D super-resolution models. Nonetheless, independent SR 2D images are often inconsistent across different views. We thus propose Iterative 3D Synchronization (I3DS) to mitigate the inconsistency problem via the inherent multi-view consistency property of NeRF. Specifically, our I3DS alternates between upscaling low-resolution (LR) rendered images with diffusion models, and updating the underlying 3D representation with standard NeRF training. We further introduce Renoised Score Distillation (RSD), a novel score-distillation objective for 2D image resolution. Our RSD combines features from ancestral sampling and Score Distillation Sampling (SDS) to generate sharp images that are also LR-consistent. Qualitative and quantitative results on both synthetic and real-world datasets demonstrate that our DiSR-NeRF can achieve better results on NeRF super-resolution compared with existing works. Code and video results available at the project website.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.14090 [pdf]

Dynamic motion trajectory control with nanoradian accuracy for multi-element X-ray optical systems via laser interferometry

Authors: Sina M Koehlenbeck, Lance Lee, Mario D Balcazar, Ying Chen, Vincent Esposito, Jerry Hastings, Matthias C Hoffmann, Zhirong Huang, May-Ling Ng, Saxon Price, Takahiro Sato, Matthew Seaberg, Yanwen Sun, Adam White, Lin Zhang, Brian Lantz, Diling Zhu

Abstract: The past decades have witnessed the development of new X-ray beam sources with brightness growing at a rate surpassing Moore's law. Current and upcoming diffraction limited and fully coherent X-ray beam sources, including multi-bend achromat based synchrotron sources and high repetition rate X-ray free electron lasers, puts increasingly stringent requirements on stability and accuracy of X-ray optics systems. Parasitic motion errors at sub-micro radian scale in beam transport and beam conditioning optics can lead to significant loss of coherence and brightness delivered from source to experiment. To address this challenge, we incorporated optical metrology based on interferometry and differential wavefront sensing as part of the X-ray optics motion control system. A prototype X-ray optics system was constructed following the optical layout of a tunable X-ray cavity. On-line interferometric metrology enabled dynamical feedback to a motion control system to track and compensate for motion errors. The system achieved sub-microradian scale performance, as multiple optical elements are synchronously and continuously adjusted. This first proof of principle measurement demonstrated both the potential and necessity of incorporating optical metrology as part of the motion control architecture for large scale X-ray optical systems such as monochromators, delay lines, and in particular, X-ray cavity systems to enable the next generation cavity-based X-ray free electron lasers.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.11384 [pdf, other]

Towards Massive Interaction with Generalist Robotics: A Systematic Review of XR-enabled Remote Human-Robot Interaction Systems

Authors: Xian Wang, Luyao Shen, Lik-Hang Lee

Abstract: The rising interest of generalist robots seek to create robots with versatility to handle multiple tasks in a variety of environments, and human will interact with such robots through immersive interfaces. In the context of human-robot interaction (HRI), this survey provides an exhaustive review of the applications of extended reality (XR) technologies in the field of remote HRI. We developed a systematic search strategy based on the PRISMA methodology. From the initial 2,561 articles selected, 100 research papers that met our inclusion criteria were included. We categorized and summarized the domain in detail, delving into XR technologies, including augmented reality (AR), virtual reality (VR), and mixed reality (MR), and their applications in facilitating intuitive and effective remote control and interaction with robotic systems. The survey highlights existing articles on the application of XR technologies, user experience enhancement, and various interaction designs for XR in remote HRI, providing insights into current trends and future directions. We also identified potential gaps and opportunities for future research to improve remote HRI systems through XR technology to guide and inform future XR and robotics research.

Submitted 26 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.08295 [pdf, other]

Gemma: Open Models Based on Gemini Research and Technology

Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari , et al. (83 additional authors not shown)

Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.

Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05131 [pdf, other]

Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation

Authors: Joseph Cho, Fachrina Dewi Puspitasari, Sheng Zheng, Jingyao Zheng, Lik-Hang Lee, Tae-Ho Kim, Choong Seon Hong, Chaoning Zhang

Abstract: The evolution of video generation from text, starting with animating MNIST numbers to simulating the physical world with Sora, has progressed at a breakneck speed over the past seven years. While often seen as a superficial expansion of the predecessor text-to-image generation model, text-to-video generation models are developed upon carefully engineered constituents. Here, we systematically discuss these elements consisting of but not limited to core building blocks (vision, language, and temporal) and supporting features from the perspective of their contributions to achieving a world model. We employ the PRISMA framework to curate 97 impactful research articles from renowned scientific databases primarily studying video synthesis using text conditions. Upon minute exploration of these manuscripts, we observe that text-to-video generation involves more intricate technologies beyond the plain extension of text-to-image generation. Our additional review into the shortcomings of Sora-generated videos pinpoints the call for more in-depth studies in various enabling aspects of video generation such as dataset, evaluation metric, efficient architecture, and human-controlled generation. Finally, we conclude that the study of the text-to-video generation may still be in its infancy, requiring contribution from the cross-discipline research community towards its advancement as the first step to realize artificial general intelligence (AGI).

Submitted 7 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: First complete survey on Text-to-Video Generation, 44 pages, 20 figures

arXiv:2403.03379 [pdf, other]

The ALMA-CRISTAL survey: Extended [CII] emission in an interacting galaxy system at z ~ 5.5

Authors: A. Posses, M. Aravena, J. González-López, N. M. Förster Schreiber, D. Liu, L. Lee, M. Solimano, T. Díaz-Santos, R. J. Assef, L. Barcos-Muñoz, S. Bovino, R. A. A. Bowler, G. Calistro Rivera, E. da Cunha, R. L. Davies, M. Killi, I. De Looze, A. Ferrara, D. B. Fisher, R. Herrera-Camus, R. Ikeda, T. Lambert, J. Li, D. Lutz, I. Mitsuhashi , et al. (9 additional authors not shown)

Abstract: The ALMA [CII] Resolved Ism in STar-forming gALaxies (CRISTAL) survey is a Cycle 8 ALMA Large Programme that studies the cold gas component of high-redshift galaxies. Its sub-arcsecond resolution observations are key to disentangling physical mechanisms that shape galaxies during cosmic dawn. In this paper, we explore the morphology and kinematics of the cold gas, star-forming, and stellar components in the star-forming main-sequence galaxy CRISTAL-05/HZ3, at z = 5.54. Our analysis includes 0.3" spatial resolution (~2 kpc) ALMA observations of the [CII] line. While CRISTAL-05 was previously classified as a single source, our observations reveal that the system is a close interacting pair surrounded by an extended component of carbon-enriched gas. This is imprinted in the disturbed elongated [CII] morphology and the separation of the two components in the position-velocity diagram (~100 km/s). The central region is composed of two components, named C05-NW and C05-SE, with the former being the dominant one. A significant fraction of the [CII] arises beyond the close pair up to 10 kpc, while the regions forming new massive stars and the stellar component seem compact (r_[CII] ~ 4 r_UV), as traced by rest-frame UV and optical imaging obtained with the Hubble Space Telescope and the James Webb Space Telescope. Our kinematic model, using the DYSMALpy software, yields a minor contribution of dark matter of C05-NW within a radius of ~2x Reff. Finally, we explore the resolved [CII]/FIR ratios as a proxy for shock-heating produced by this merger.
We argue that the extended [CII] emission is mainly caused by the merger, which could not be discerned with lower-resolution observations. Our work emphasizes the need for high-resolution observations to fully characterize the dynamic stages of infant galaxies and the physical mechanisms that drive the metal enrichment of the circumgalactic medium.

Submitted 5 March, 2024; originally announced March 2024.

Comments: Submitted to A&A - comments are welcome! - 19 pages, 13 figures

arXiv:2403.03170 [pdf, other]

SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

Authors: Peng Qi, Zehong Yan, Wynne Hsu, Mong Li Lee

Abstract: Misinformation is a prevalent societal issue due to its potential high risks. Out-of-context (OOC) misinformation, where authentic images are repurposed with false text, is one of the easiest and most effective ways to mislead audiences. Current methods focus on assessing image-text consistency but lack convincing explanations for their judgments, which is essential for debunking misinformation. While Multimodal Large Language Models (MLLMs) have rich knowledge and innate capability for visual reasoning and explanation generation, they still lack sophistication in understanding and discovering the subtle crossmodal differences. In this paper, we introduce SNIFFER, a novel multimodal large language model specifically engineered for OOC misinformation detection and explanation. SNIFFER employs two-stage instruction tuning on InstructBLIP. The first stage refines the model's concept alignment of generic objects with news-domain entities and the second stage leverages language-only GPT-4 generated OOC-specific instruction data to fine-tune the model's discriminatory powers. Enhanced by external tools and retrieval, SNIFFER not only detects inconsistencies between text and image but also utilizes external knowledge for contextual verification. Our experiments show that SNIFFER surpasses the original MLLM by over 40% and outperforms state-of-the-art methods in detection accuracy. SNIFFER also provides accurate and persuasive explanations as validated by quantitative and human evaluations.

Submitted 5 March, 2024; originally announced March 2024.

Comments: To appear in CVPR 2024

arXiv:2402.13913 [pdf, other]

An Automated Chemical Exploration of NGC 6334I at 340 au Resolution

Authors: Samer J. El-Abd, Crystal L. Brogan, Todd R. Hunter, Kin Long Kelvin Lee, Ryan A. Loomis, Brett A. McGuire

Abstract: Much of the information gleaned from observations of star-forming regions comes from the analysis of their molecular emission spectra, particularly in the radio regime. The time-consuming nature of fitting synthetic spectra to observations interactively for such line-rich sources, however, often results in such analysis being limited to data extracted from a single-dish observation or a handful of pixels from an interferometric observation. Yet, star-forming regions display a wide variety of physical conditions that are difficult, if not impossible, to accurately characterize with such a limited number of spectra. We have developed an automated fitting routine that visits every pixel in the field of view of an ALMA data cube and determines the best-fit physical parameters, including excitation temperature and column densities, for a given list of molecules. In this proof-of-concept work, we provide an overview of the fitting routine and apply it to 0".26, 1.1 km s$^{-1}$ resolution ALMA observations of two sites of massive star-formation in NGC 6334I. Parameters were found for 21 distinct molecules by generating synthetic spectra across 7.48 GHz of spectral bandwidth between 280 and 351 GHz. Spatial images of the derived parameters for each of the > 8000 pixels are presented with special attention paid to the C$_2$H$_4$O$_2$ isomers and their relative variations. We highlight the greater scientific utility of the column density and velocity images of individual molecules compared to traditional moment maps of single transitions.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 40 pages, 71 figures, accepted for publication in The Astrophysical Journal

arXiv:2402.06642 [pdf, other]

From GARCH to Neural Network for Volatility Forecast

Authors: Pengfei Zhao, Haoren Zhu, Wilfred Siu Hung NG, Dik Lun Lee

Abstract: Volatility, as a measure of uncertainty, plays a crucial role in numerous financial activities such as risk management. The Econometrics and Machine Learning communities have developed two distinct approaches for financial volatility forecasting: the stochastic approach and the neural network (NN) approach. Despite their individual strengths, these methodologies have conventionally evolved in separate research trajectories with little interaction between them. This study endeavors to bridge this gap by establishing an equivalence relationship between models of the GARCH family and their corresponding NN counterparts. With the equivalence relationship established, we introduce an innovative approach, named GARCH-NN, for constructing NN-based volatility models. It obtains the NN counterparts of GARCH models and integrates them as components into an established NN architecture, thereby seamlessly infusing volatility stylized facts (SFs) inherent in the GARCH models into the neural network. We develop the GARCH-LSTM model to showcase the power of the GARCH-NN approach. Experiment results validate that amalgamating the NN counterparts of the GARCH family models into established NN models leads to enhanced outcomes compared to employing the stochastic and NN models in isolation.

Submitted 29 January, 2024; originally announced February 2024.

Comments: Accepted by AAAI'24

arXiv:2402.03988 [pdf, other]

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

Authors: Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun

Abstract: Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text challenging, especially without paired data. In this paper, we propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. REBORN alternates between (1) training a segmentation model that predicts the boundaries of the segmental structures in speech signals and (2) training the phoneme prediction model, whose input is the speech feature segmented by the segmentation model, to predict a phoneme transcription. Since supervised data for training the segmentation model is not available, we use reinforcement learning to train the segmentation model to favor segmentations that yield phoneme sequence predictions with a lower perplexity. We conduct extensive experiments and find that under the same setting, REBORN outperforms all prior unsupervised ASR models on LibriSpeech, TIMIT, and five non-English languages in Multilingual LibriSpeech. We comprehensively analyze why the boundaries learned by REBORN improve the unsupervised ASR performance.

Submitted 28 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03869 [pdf]

doi 10.1103/PhysRevA.109.043514

Eigenmode Decomposition Method for Full-Wave Modeling of Microring Resonators

Authors: Yuriy Akimov, Aswin Alexander Eapen, Shiyang Zhu, Doris K. T. Ng, Nanxi Li, Woon Leng Loh, Lennon Y. T. Lee, Alagappan Gandhi, Aravind P. Anthur

Abstract: We develop a theoretical predictive model for an all-pass ring resonator that enables the most complete description of linear coupling regimes. The model is based on eigenmode decomposition of Maxwell's equations with full account of the confined and leaky modes, as opposed to the existing phenomenological methods restricted to the confined modes only. This model enables quantitative description of all-pass ring resonators and provides insights into the physics underlying microring-waveguide coupling. We experimentally validate the model using transmission measurements in the linear regime of aluminium nitride resonators. The developed model is then used to explore the field enhancement in microrings crucial for nonlinear photonic applications.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 8 pages, 11 figures

Journal ref: Physical Review A 109, 043514 (2024)

arXiv:2402.01697 [pdf, other]

doi 10.1145/3589334.3645642

APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT

Authors: Yiming Zhu, Zhizhuo Yin, Gareth Tyson, Ehsan-Ul Haq, Lik-Hang Lee, Pan Hui

Abstract: Recent research has highlighted the potential of LLM applications, like ChatGPT, for performing label annotation on social computing text. However, it is already well known that performance hinges on the quality of the input prompts. To address this, there has been a flurry of research into prompt tuning -- techniques and guidelines that attempt to improve the quality of prompts. Yet these largely rely on manual effort and prior knowledge of the dataset being annotated. To address this limitation, we propose APT-Pipe, an automated prompt-tuning pipeline. APT-Pipe aims to automatically tune prompts to enhance ChatGPT's text classification performance on any given dataset. We implement APT-Pipe and test it across twelve distinct text classification datasets. We find that prompts tuned by APT-Pipe help ChatGPT achieve higher weighted F1-score on nine out of twelve experimented datasets, with an improvement of 7.01% on average. We further highlight APT-Pipe's flexibility as a framework by showing how it can be extended to support additional tuning mechanisms.

Submitted 20 February, 2024; v1 submitted 24 January, 2024; originally announced February 2024.

Comments: Accepted by WWW 2024; Camera-ready version

arXiv:2401.13463 [pdf, other]

SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering

Authors: Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee

Abstract: Spoken Question Answering (SQA) is essential for machines to reply to user's question by finding the answer span within a given spoken passage. SQA has been previously achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) problems. However, the real-world problem of Open-domain SQA (openSQA), in which the machine needs to first retrieve passages that possibly contain the answer from a spoken archive in addition, was never considered. This paper proposes the first known end-to-end framework, Speech Dense Passage Retriever (SpeechDPR), for the retrieval component of the openSQA problem. SpeechDPR learns a sentence-level semantic representation by distilling knowledge from the cascading model of unsupervised ASR (UASR) and text dense retriever (TDR). No manually transcribed speech data is needed. Initial experiments showed performance comparable to the cascading model of UASR and TDR, and significantly better when UASR was poor, verifying this approach is more robust to speech recognition errors.

Submitted 18 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted at ICASSP 2024

arXiv:2401.07331 [pdf, other]

Rapid Estimation of Left Ventricular Contractility with a Physics-Informed Neural Network Inverse Modeling Approach

Authors: Ehsan Naghavi, Haifeng Wang, Lei Fan, Jenny S. Choy, Ghassan Kassab, Seungik Baek, Lik-Chuan Lee

Abstract: Physics-based computer models based on numerical solution of the governing equations generally cannot make rapid predictions, which in turn, limits their applications in the clinic. To address this issue, we developed a physics-informed neural network (PINN) model that encodes the physics of a closed-loop blood circulation system embedding a left ventricle (LV). The PINN model is trained to satisfy a system of ordinary differential equations (ODEs) associated with a lumped parameter description of the circulatory system. The model predictions have a maximum error of less than 5% when compared to those obtained by solving the ODEs numerically. An inverse modeling approach using the PINN model is also developed to rapidly estimate model parameters (in $\sim$ 3 mins) from single-beat LV pressure and volume waveforms. Using synthetic LV pressure and volume waveforms generated by the PINN model with different model parameter values, we show that the inverse modeling approach can recover the corresponding ground truth values, which suggests that the model parameters are unique. The PINN inverse modeling approach is then applied to estimate LV contractility indexed by the end-systolic elastance $E_{es}$ using waveforms acquired from 11 swine models, including waveforms acquired before and after administration of dobutamine (an inotropic agent) in 3 animals. The estimated $E_{es}$ is about 58% to 284% higher for the data associated with dobutamine compared to those without, which implies that this approach can be used to estimate LV contractility using single-beat measurements. The PINN inverse modeling can potentially be used in the clinic to simultaneously estimate LV contractility and other physiological parameters from single-beat measurements.

Submitted 14 January, 2024; originally announced January 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.09799 [pdf, other]

doi 10.1109/OJCAS.2023.3344094

IQNet: Image Quality Assessment Guided Just Noticeable Difference Prefiltering For Versatile Video Coding

Authors: Yu-Han Sun, Chiang Lo-Hsuan Lee, Tian-Sheuan Chang

Abstract: Image prefiltering with just noticeable distortion (JND) improves coding efficiency in a visual lossless way by filtering the perceptually redundant information prior to compression. However, real JND cannot be well modeled with inaccurate masking equations in traditional approaches or image-level subject tests in deep learning approaches. Thus, this paper proposes a fine-grained JND prefiltering dataset guided by image quality assessment for accurate block-level JND modeling. The dataset is constructed from decoded images to include coding effects and is also perceptually enhanced with block overlap and edge preservation. Furthermore, based on this dataset, we propose a lightweight JND prefiltering network, IQNet, which can be applied directly to different quantization cases with the same model and only needs 3K parameters. The experimental results show that the proposed approach to Versatile Video Coding could yield maximum/average bitrate savings of 41%/15% and 53%/19% for all-intra and low-delay P configurations, respectively, with negligible subjective quality loss. Our method demonstrates higher perceptual quality and a model size that is an order of magnitude smaller than previous deep learning methods.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.08153 [pdf, other]

$ρ$-Diffusion: A diffusion-based density estimation framework for computational physics

Authors: Maxwell X. Cai, Kin Long Kelvin Lee

Abstract: In physics, density $ρ(\cdot)$ is a fundamentally important scalar function to model, since it describes a scalar field or a probability density function that governs a physical process. Modeling $ρ(\cdot)$ typically scales poorly with parameter space, however, and quickly becomes prohibitively difficult and computationally expensive. One promising avenue to bypass this is to leverage the capabilities of denoising diffusion models often used in high-fidelity image generation to parameterize $ρ(\cdot)$ from existing scientific data, from which new samples can be trivially sampled from. In this paper, we propose $ρ$-Diffusion, an implementation of denoising diffusion probabilistic models for multidimensional density estimation in physics, which is currently in active development and, from our results, performs well on physically motivated 2D and 3D density functions. Moreover, we propose a novel hashing technique that allows $ρ$-Diffusion to be conditioned by arbitrary amounts of physical parameters of interest.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 6 pages, 2 figures, accepted for publication at the NeurIPS 2023 workshop "Machine Learning and the Physical Sciences"

arXiv:2311.17671 [pdf, other]

The ALMA-CRISTAL survey: Widespread dust-obscured star formation in typical star-forming galaxies at z=4-6

Authors: Ikki Mitsuhashi, Ken-ichi Tadaki, Ryota Ikeda, Rodrigo Herrera-Camus, Manuel Aravena, Ilse De Looze, Natascha M. Förster Schreiber, Jorge González-López, Justin Spilker, Roberto J. Assef, Rychard Bouwens, Loreto Barcos-Munoz, Jack Birkin, Rebecca A. A. Bowler, Gabriela Calistro Rivera, Rebecca Davies, Elisabete Da Cunha, Tanio Díaz-Santos, Andrea Ferrara, Deanne Fisher, Lilian L. Lee, Juno Li, Dieter Lutz, Monica Relaño, Thorsten Naab , et al. (7 additional authors not shown)

Abstract: We present the morphological parameters and global properties of dust-obscured star formation in typical star-forming galaxies at z=4-6. Among 26 galaxies composed of 20 galaxies observed by the Cycle-8 ALMA Large Program, CRISTAL, and six galaxies from archival data, we have individually detected rest-frame 158$μ$m dust continuum emission from 19 galaxies, nine of which are reported for the first time. The derived far-infrared luminosities are in the range $\log_{10} L_{\rm IR}\,[L_{\odot}]=$10.9-12.4, an order of magnitude lower than previously detected massive dusty star-forming galaxies (DSFGs). The average relationship between the fraction of dust-obscured star formation ($f_{\rm obs}$) and the stellar mass is consistent with previous results at z=4-6 in a mass range of $\log_{10} M_{\ast}\,[M_{\odot}]\sim$9.5-11.0 and show potential evolution from z=6-9. The individual $f_{\rm obs}$ exhibits a significant diversity, and it shows a correlation with the spatial offset between the dust and the UV continuum, suggesting the inhomogeneous dust reddening may cause the source-to-source scatter in $f_{\rm obs}$. The effective radii of the dust emission are on average $\sim$1.5 kpc and are $\sim2$ times more extended than the rest-frame UV. The infrared surface densities of these galaxies ($Σ_{\rm IR}\sim2.0\times10^{10}\,L_{\odot}\,{\rm kpc}^{-2}$) are one order of magnitude lower than those of DSFGs that host compact central starbursts. On the basis of the comparable contribution of dust-obscured and dust-unobscured star formation along with their similar spatial extent, we suggest that typical star-forming galaxies at z=4-6 form stars throughout the entirety of their disks.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.15452 [pdf, other]

Buckling instability in a chain of sticky bubbles

Authors: Carmen L. Lee, Kari Dalnoki-Veress

Abstract: A slender object undergoing an axial compression will buckle to alleviate the stress. Typically the morphology of the deformed object depends on the bending stiffness for solids, or the viscoelastic properties for liquid threads. We study a chain of uniform sticky air bubbles that rise due to buoyancy through an aqueous bath. A buckling instability of the bubble chain with a characteristic wavelength is observed. If a chain of bubbles is produced faster than it is able to rise, the dominance of viscous drag over buoyancy results in a compressive stress that is alleviated by buckling the bubble chain. Using low Reynolds number hydrodynamics, we predict the critical buckling speed, the terminal speed of a buckled chain, and the geometry of the buckles.

Submitted 30 May, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: 7 pages, 5 figures

arXiv:2311.15294 [pdf, other]

A Study of Partisan News Sharing in the Russian invasion of Ukraine

Authors: Yiming Zhu, Ehsan-Ul Haq, Gareth Tyson, Lik-Hang Lee, Yuyang Wang, Pan Hui

Abstract: Since the Russian invasion of Ukraine, a large volume of biased and partisan news has been spread via social media platforms. As this may lead to wider societal issues, we argue that understanding how partisan news sharing impacts users' communication is crucial for better governance of online communities. In this paper, we perform a measurement study of partisan news sharing. We aim to characterize the role of such sharing in influencing users' communications. Our analysis covers an eight-month dataset across six Reddit communities related to the Russian invasion. We first perform an analysis of the temporal evolution of partisan news sharing. We confirm that the invasion stimulates discussion in the observed communities, accompanied by an increased volume of partisan news sharing. Next, we characterize users' response to such sharing. We observe that partisan bias plays a role in narrowing its propagation. More biased media is less likely to be spread across multiple subreddits. However, we find that partisan news sharing attracts more users to engage in the discussion, by generating more comments. We then built a predictive model to identify users likely to spread partisan news. The prediction is challenging though, with 61.57% accuracy on average. Our centrality analysis on the commenting network further indicates that the users who disseminate partisan news possess lower network influence in comparison to those who propagate neutral news.

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.03210 [pdf, other]

Quantum Task Offloading with the OpenMP API

Authors: Joseph K. L. Lee, Oliver T. Brown, Mark Bull, Martin Ruefenacht, Johannes Doerfert, Michael Klemm, Martin Schulz

Abstract: Most of the widely used quantum programming languages and libraries are not designed for the tightly coupled nature of hybrid quantum-classical algorithms, which run on quantum resources that are integrated on-premise with classical HPC infrastructure. We propose a programming model using the API provided by OpenMP to target quantum devices, which provides an easy-to-use and efficient interface for HPC applications to utilize quantum compute resources. We have implemented a variational quantum eigensolver using the programming model, which has been tested using a classical simulator. We are in the process of testing on the quantum resources hosted at the Leibniz Supercomputing Centre (LRZ).

Submitted 6 November, 2023; originally announced November 2023.

Comments: Poster extended abstract for Supercomputing 2023 (SC23)

arXiv:2311.00612 [pdf]

doi 10.1109/DSAA.2017.18

A Collaborative Filtering-Based Two Stage Model with Item Dependency for Course Recommendation

Authors: Eric L. Lee, Tsung-Ting Kuo, Shou-De Lin

Abstract: Recommender systems have been studied for decades with numerous promising models been proposed. Among them, Collaborative Filtering (CF) models are arguably the most successful one due to its high accuracy in recommendation and elimination of privacy-concerned personal meta-data from training. This paper extends the usage of CF-based model to the task of course recommendation. We point out several challenges in applying the existing CF-models to build a course recommendation engine, including the lack of rating and meta-data, the imbalance of course registration distribution, and the demand of course dependency modeling. We then propose several ideas to address these challenges. Eventually, we combine a two-stage CF model regularized by course dependency with a graph-based recommender based on course-transition network, to achieve AUC as high as 0.97 with a real-world dataset.

Submitted 1 November, 2023; originally announced November 2023.

Comments: 8 pages, 2 figures, 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA)

Journal ref: In 2017 IEEE DSAA, pp. 496-503. IEEE, 2017

arXiv:2310.09129 [pdf, other]

Computing Marginal and Conditional Divergences between Decomposable Models with Applications

Authors: Loong Kuan Lee, Geoffrey I. Webb, Daniel F. Schmidt, Nico Piatkowski

Abstract: The ability to compute the exact divergence between two high-dimensional distributions is useful in many applications but doing so naively is intractable. Computing the alpha-beta divergence -- a family of divergences that includes the Kullback-Leibler divergence and Hellinger distance -- between the joint distribution of two decomposable models, i.e chordal Markov networks, can be done in time exponential in the treewidth of these models. However, reducing the dissimilarity between two high-dimensional objects to a single scalar value can be uninformative. Furthermore, in applications such as supervised learning, the divergence over a conditional distribution might be of more interest. Therefore, we propose an approach to compute the exact alpha-beta divergence between any marginal or conditional distribution of two decomposable models. Doing so tractably is non-trivial as we need to decompose the divergence between these distributions and therefore, require a decomposition over the marginal and conditional distributions of these models. Consequently, we provide such a decomposition and also extend existing work to compute the marginal and conditional alpha-beta divergence between these decompositions. We then show how our method can be used to analyze distributional changes by first applying it to a benchmark image dataset. Finally, based on our framework, we propose a novel way to quantify the error in contemporary superconducting quantum computers. Code for all experiments is available at: https://lklee.dev/pub/2023-icdm/code

Submitted 13 October, 2023; originally announced October 2023.

Comments: 10 pages, 8 figures, Accepted at the IEEE International Conference on Data Mining (ICDM) 2023

arXiv:2310.08864 [pdf, other]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie , et al. (267 additional authors not shown)

Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.

Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Project website: https://robotics-transformer-x.github.io

arXiv:2310.08738 [pdf, other]

Splicing Up Your Predictions with RNA Contrastive Learning

Authors: Philip Fradkin, Ruian Shi, Bo Wang, Brendan Frey, Leo J. Lee

Abstract: In the face of rapidly accumulating genomic data, our understanding of the RNA regulatory code remains incomplete. Recent self-supervised methods in other domains have demonstrated the ability to learn rules underlying the data-generating process such as sentence structure in language. Inspired by this, we extend contrastive learning techniques to genomic data by utilizing functional similarities between sequences generated through alternative splicing and gene duplication. Our novel dataset and contrastive objective enable the learning of generalized RNA isoform representations. We validate their utility on downstream tasks such as RNA half-life and mean ribosome load prediction. Our pre-training strategy yields competitive results using linear probing on both tasks, along with up to a two-fold increase in Pearson correlation in low-data conditions. Importantly, our exploration of the learned latent space reveals that our contrastive objective yields semantically meaningful representations, underscoring its potential as a valuable initialization technique for RNA property prediction.

Submitted 17 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.07864 [pdf, other]

doi 10.1145/3624062.3626081

Towards Foundation Models for Materials Science: The Open MatSci ML Toolkit

Authors: Kin Long Kelvin Lee, Carmelo Gonzales, Matthew Spellings, Mikhail Galkin, Santiago Miret, Nalini Kumar

Abstract: Artificial intelligence and machine learning have shown great promise in their ability to accelerate novel materials discovery. As researchers and domain scientists seek to unify and consolidate chemical knowledge, the case for models with potential to generalize across different tasks within materials science - so-called "foundation models" - grows with ambitions. This manuscript reviews our recent progress with development of Open MatSci ML Toolkit, and details experiments that lay the groundwork for foundation model research and development with our framework. First, we describe and characterize a new pretraining task that uses synthetic data generated from symmetry operations, and reveal complex training dynamics at large scales. Using the pretrained model, we discuss a number of use cases relevant to foundation model development: semantic architecture of datasets, and fine-tuning for property prediction and classification. Our key results show that for simple applications, pretraining appears to provide worse modeling performance than training models from random initialization. However, for more complex instances, such as when a model is required to learn across multiple datasets and types of targets simultaneously, the inductive bias from pretraining provides significantly better performance. This insight will hopefully inform subsequent efforts into creating foundation models for materials science applications.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 17 pages, 7 figures, 1 table. Accepted paper/presentation at the AI4Science workshop at Super Computing '23

arXiv:2310.02206 [pdf, other]

Chunking: Continual Learning is not just about Distribution Shift

Authors: Thomas L. Lee, Amos Storkey

Abstract: Work on continual learning (CL) has thus far largely focused on the problems arising from shifts in the data distribution. However, CL can be decomposed into two sub-problems: (a) shifts in the data distribution, and (b) dealing with the fact that the data is split into chunks and so only a part of the data is available to be trained on at any point in time. In this work, we look at the latter sub-problem, the chunking of data. We show that chunking is an important part of CL, accounting for around half of the performance drop from offline learning in our experiments. Furthermore, our results reveal that current CL algorithms do not address the chunking sub-problem, only performing as well as plain SGD training when there is no shift in the data distribution. Therefore, we show that chunking is both an important and currently unaddressed sub-problem and until it is addressed CL methods will be capped in performance. Additionally, we analyse why performance drops when learning occurs on identically distributed chunks of data, and find that forgetting, which is often seen to be a problem due to distribution shift, still arises and is a significant problem. We also show that performance on the chunking sub-problem can be increased and that this performance transfers to the full CL setting, where there is distribution shift. Hence, we argue that work on chunking can help advance CL in general.

Submitted 11 July, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: Published at the 3rd Conference on Lifelong Learning Agents (CoLLAs), 2024

arXiv:2309.14449 [pdf, other]

Explaining the Chemical Inventory of Orion KL through Machine Learning

Authors: Haley N. Scolati, Anthony J. Remijan, Eric Herbst, Brett A. McGuire, Kin Long Kelvin Lee

Abstract: The interplay of the chemistry and physics that exists within astrochemically relevant sources can only be fully appreciated if we can gain a holistic understanding of their chemical inventories. Previous work by Lee et al. (2021) demonstrated the capabilities of simple regression models to reproduce the abundances of the chemical inventory of the Taurus Molecular Cloud 1 (TMC-1), as well as provide abundance predictions for new candidate molecules. It remains to be seen, however, to what degree TMC-1 is a ``unicorn'' in astrochemistry, where the simplicity of its chemistry and physics readily facilitates characterization with simple machine learning models. Here we present an extension in chemical complexity to a heavily studied high-mass star forming region: the Orion Kleinmann-Low (Orion KL) nebula. Unlike TMC-1, Orion KL is composed of several structurally distinct environments that differ chemically and kinematically, wherein the column densities of molecules between these components can have non-linear correlations that cause the unexpected appearance or even lack of likely species in various environments. This proof-of-concept study used similar regression models sampled by Lee et al. (2021) to accurately reproduce the column densities from the XCLASS fitting program presented in Crockett et al. (2014).

Submitted 25 September, 2023; originally announced September 2023.

Comments: 14 pages; 6 figures, 1 table in the main text. 0 figures, 1 table in the appendix. Accepted for publication in The Astrophysical Journal. Molecular dataset for machine learning can be found in the Zenodo repository here: https://zenodo.org/record/7675609

arXiv:2309.12460 [pdf]

Multimodal Deep Learning for Scientific Imaging Interpretation

Authors: Abdulelah S. Alshehri, Franklin L. Lee, Shihu Wang

Abstract: In the domain of scientific imaging, interpreting visual data often demands an intricate combination of human expertise and deep comprehension of the subject materials. This study presents a novel methodology to linguistically emulate and subsequently evaluate human-like interactions with Scanning Electron Microscopy (SEM) images, specifically of glass materials. Leveraging a multimodal deep learning framework, our approach distills insights from both textual and visual data harvested from peer-reviewed articles, further augmented by the capabilities of GPT-4 for refined data synthesis and evaluation. Despite inherent challenges--such as nuanced interpretations and the limited availability of specialized datasets--our model (GlassLLaVA) excels in crafting accurate interpretations, identifying key features, and detecting defects in previously unseen SEM images. Moreover, we introduce versatile evaluation metrics, suitable for an array of scientific imaging applications, which allows for benchmarking against research-grounded answers. Benefiting from the robustness of contemporary Large Language Models, our model adeptly aligns with insights from research papers. This advancement not only underscores considerable progress in bridging the gap between human and machine interpretation in scientific imaging, but also hints at expansive avenues for future research and broader application.

Submitted 25 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Report number: NTR208745

arXiv:2309.11816 [pdf, other]

Designing Loving-Kindness Meditation in Virtual Reality for Long-Distance Romantic Relationships

Authors: Xian Wang, Xiaoyu Mo, Lik-Hang Lee, Xiaoying Wei, Xiaofu Jin, Mingming Fan, Pan Hui

Abstract: Loving-kindness meditation (LKM) is used in clinical psychology for couples' relationship therapy, but physical isolation can make the relationship more strained and inaccessible to LKM. Virtual reality (VR) can provide immersive LKM activities for long-distance couples. However, no suitable commercial VR applications for couples exist to engage in LKM activities of long-distance. This paper organized a series of workshops with couples to build a prototype of a couple-preferred LKM app. Through analysis of participants' design works and semi-structured interviews, we derived design considerations for such VR apps and created a prototype for couples to experience. We conducted a study with couples to understand their experiences of performing LKM using the VR prototype and a traditional video conferencing tool. Results show that LKM session utilizing both tools has a positive effect on the intimate relationship and the VR prototype is a more preferable tool for long-term use. We believe our experience can inform future researchers.

Submitted 21 September, 2023; originally announced September 2023.

Showing 1–50 of 546 results for author: Lee, L