subscribe to arXiv mailings

DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation

Authors: Jiwook Kim, Seonho Lee, Jaeyo Shin, Jiho Choi, Hyunjung Shim

Abstract: Score distillation sampling (SDS) has emerged as an effective framework in text-driven 3D editing tasks due to its inherent 3D consistency. However, existing SDS-based 3D editing methods suffer from extensive training time and lead to low-quality results, primarily because these methods deviate from the sampling dynamics of diffusion models. In this paper, we propose DreamCatalyst, a novel framewo… ▽ More Score distillation sampling (SDS) has emerged as an effective framework in text-driven 3D editing tasks due to its inherent 3D consistency. However, existing SDS-based 3D editing methods suffer from extensive training time and lead to low-quality results, primarily because these methods deviate from the sampling dynamics of diffusion models. In this paper, we propose DreamCatalyst, a novel framework that interprets SDS-based editing as a diffusion reverse process. Our objective function considers the sampling dynamics, thereby making the optimization process of DreamCatalyst an approximation of the diffusion reverse process in editing tasks. DreamCatalyst aims to reduce training time and improve editing quality. DreamCatalyst presents two modes: (1) a faster mode, which edits the NeRF scene in only about 25 minutes, and (2) a high-quality mode, which produces superior results in less than 70 minutes. Specifically, our high-quality mode outperforms current state-of-the-art NeRF editing methods both in terms of speed and quality. See more extensive results on our project page: https://dream-catalyst.github.io. △ Less

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11365 [pdf, other]

Team HYU ASML ROBOVOX SP Cup 2024 System Description

Authors: Jeong-Hwan Choi, Gaeun Kim, Hee-Jae Lee, Seyun Ahn, Hyun-Soo Kim, Joon-Hyuk Chang

Abstract: This report describes the submission of HYU ASML team to the IEEE Signal Processing Cup 2024 (SP Cup 2024). This challenge, titled "ROBOVOX: Far-Field Speaker Recognition by a Mobile Robot," focuses on speaker recognition using a mobile robot in noisy and reverberant conditions. Our solution combines the result of deep residual neural networks and time-delay neural network-based speaker embedding… ▽ More This report describes the submission of HYU ASML team to the IEEE Signal Processing Cup 2024 (SP Cup 2024). This challenge, titled "ROBOVOX: Far-Field Speaker Recognition by a Mobile Robot," focuses on speaker recognition using a mobile robot in noisy and reverberant conditions. Our solution combines the result of deep residual neural networks and time-delay neural network-based speaker embedding models. These models were trained on a diverse dataset that includes French speech. To account for the challenging evaluation environment characterized by high noise, reverberation, and short speech conditions, we focused on data augmentation and training speech duration for the speaker embedding model. Our submission achieved second place on the SP Cup 2024 public leaderboard, with a detection cost function of 0.5245 and an equal error rate of 6.46%. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: Technical report for IEEE Signal Processing Cup 2024, 9 pages

arXiv:2407.10461 [pdf, ps, other]

Multibeam Satellite Communications with Massive MIMO: Asymptotic Performance Analysis and Design Insights

Authors: Seyong Kim, Jinseok Choi, Wonjae Shin, Namyoon Lee, Jeonghun Park

Abstract: To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed-beams without relying on CSI, then select a suitable user set for each beam. Upon this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by… ▽ More To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed-beams without relying on CSI, then select a suitable user set for each beam. Upon this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by which inter-beam interference is efficiently mitigated by narrowing corresponding beam width. By modeling the ground users' locations via a Poisson point process, we rigorously analyze the achievable performance of the presented multibeam satellite system. In particular, we investigate the asymptotic scaling laws that reveal the interplay between the user density, the number of beams, and the number of antennas. Our analysis offers critical design insights for the multibeam satellite with massive MIMO: i) If the user density scales in power with the number of antennas, the considered precoding can achieve a linear fraction of the optimal rate in the asymptotic regime. ii) A certain additional scaling factor for the user density is needed as the number of beams increases to maintain the asymptotic optimality. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.08586 [pdf, other]

Centrality dependence of Lévy-stable two-pion Bose-Einstein correlations in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, H. Al-Ta'ani, J. Alexander, A. Angerami, K. Aoki, N. Apadula, Y. Aramaki, H. Asano, E. C. Aschenauer, E. T. Atomssa, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, B. Bannier, K. N. Barish, B. Bassalleck, S. Bathe , et al. (377 additional authors not shown)

Abstract: The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability… ▽ More The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability $α$, and the Lévy-scale parameter $R$ as a function of transverse mass $m_T$ and centrality. The $λ(m_T)$ parameter is constant at larger values of $m_T$, but decreases as $m_T$ decreases. The Lévy scale parameter $R(m_T)$ decreases with $m_T$ and exhibits proportionality to the length scale of the nuclear overlap region. The Lévy exponent $α(m_T)$ is independent of $m_T$ within uncertainties in each investigated centrality bin, but shows a clear centrality dependence. At all centralities, the Lévy exponent $α$ is significantly different from that of Gaussian ($α=2$) or Cauchy ($α=1$) source distributions. Comparisons to the predictions of Monte-Carlo simulations of resonance-decay chains show that in all but the most peripheral centrality class (50%-60%), the obtained results are inconsistent with the measurements, unless a significant reduction of the in-medium mass of the $η'$ meson is included. In each centrality class, the best value of the in-medium $η'$ mass is compared to the mass of the $η$ meson, as well as to several theoretical predictions that consider restoration of $U_A(1)$ symmetry in hot hadronic matter. △ Less

Submitted 11 July, 2024; originally announced July 2024.

Comments: 401 authors from 75 institutions, 20 pages, 15 figures, 2 tables. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2407.08417 [pdf, other]

Unveiling the Potential of BERTopic for Multilingual Fake News Analysis -- Use Case: Covid-19

Authors: Karla Schäfer, Jeong-Eun Choi, Inna Vogel, Martin Steinebach

Abstract: Topic modeling is frequently being used for analysing large text corpora such as news articles or social media data. BERTopic, consisting of sentence embedding, dimension reduction, clustering, and topic extraction, is the newest and currently the SOTA topic modeling method. However, current topic modeling methods have room for improvement because, as unsupervised methods, they require careful tun… ▽ More Topic modeling is frequently being used for analysing large text corpora such as news articles or social media data. BERTopic, consisting of sentence embedding, dimension reduction, clustering, and topic extraction, is the newest and currently the SOTA topic modeling method. However, current topic modeling methods have room for improvement because, as unsupervised methods, they require careful tuning and selection of hyperparameters, e.g., for dimension reduction and clustering. This paper aims to analyse the technical application of BERTopic in practice. For this purpose, it compares and selects different methods and hyperparameters for each stage of BERTopic through density based clustering validation and six different topic coherence measures. Moreover, it also aims to analyse the results of topic modeling on real world data as a use case. For this purpose, the German fake news dataset (GermanFakeNCovid) on Covid-19 was created by us and in order to experiment with topic modeling in a multilingual (English and German) setting combined with the FakeCovid dataset. With the final results, we were able to determine thematic similarities between the United States and Germany. Whereas, distinguishing the topics of fake news from India proved to be more challenging. △ Less

Submitted 11 July, 2024; originally announced July 2024.

Comments: Accepted at the Workshop on Representation Learning and Clustering (RLC) at the 17th ACM International WSDM Conference in 2024

arXiv:2407.07313 [pdf, other]

ESM+: Modern Insights into Perspective on Text-to-SQL Evaluation in the Age of Large Language Models

Authors: Benjamin Ascoli, Ram Kandikonda, Jinho D. Choi

Abstract: The task of Text-to-SQL enables anyone to retrieve information from SQL databases using natural language. Despite several challenges, recent models have made remarkable advancements in this task using large language models (LLMs). Interestingly, we find that LLM-based models without fine-tuning exhibit distinct natures compared to their fine-tuned counterparts, leading to inadequacies in current e… ▽ More The task of Text-to-SQL enables anyone to retrieve information from SQL databases using natural language. Despite several challenges, recent models have made remarkable advancements in this task using large language models (LLMs). Interestingly, we find that LLM-based models without fine-tuning exhibit distinct natures compared to their fine-tuned counterparts, leading to inadequacies in current evaluation metrics to accurately convey their performance. Thus, we analyze the two primary metrics, Test Suite Execution Accuracy (EXE) and Exact Set Matching Accuracy (ESM), to examine their robustness for this task and address shortcomings. We compare the performance of 9 LLM-based models using EXE, the original ESM, and our improved ESM (called ESM+). Our results show that EXE and ESM have high false positive and negative rates of 11.3% and 13.9%, while ESM+ gives those of 0.1% and 2.6% respectively, providing a significantly more stable evaluation. We release the ESM+ script as open-source for the community to contribute, while enjoying a more reliable assessment of Text-to-SQL. △ Less

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.05951 [pdf]

Purcell enhancement and spin spectroscopy of silicon vacancy centers in silicon carbide using an ultra-small mode-volume plasmonic cavity

Authors: Jae-Pil So, Jialun Luo, Jaehong Choi, Brendan McCullian, Gregory D. Fuchs

Abstract: Silicon vacancy (V$_{Si}$) centers in 4H-silicon carbide have emerged as a strong candidate for quantum networking applications due to their robust electronic and optical properties including a long spin coherence lifetime and bright, stable emission. Here, we report the integration of V$_{Si}$ centers with a plasmonic nanocavity to Purcell enhance the emission, which is critical for scalable quan… ▽ More Silicon vacancy (V$_{Si}$) centers in 4H-silicon carbide have emerged as a strong candidate for quantum networking applications due to their robust electronic and optical properties including a long spin coherence lifetime and bright, stable emission. Here, we report the integration of V$_{Si}$ centers with a plasmonic nanocavity to Purcell enhance the emission, which is critical for scalable quantum networking. Employing a simple fabrication process, we demonstrate plasmonic cavities that support a nanoscale mode volume and exhibit an increase in the spontaneous emission rate with a measured Purcell factor of up to 48. In addition to investigating the optical resonance modes, we demonstrate that an improvement in the optical stability of the spin-preserving resonant optical transitions relative to the radiation-limited value. The results highlight the potential of nanophotonic structures for advancing quantum networking technologies and emphasizes the importance of optimizing emitter-cavity interactions for efficient quantum photonic applications. △ Less

Submitted 8 July, 2024; originally announced July 2024.

Comments: 27 pages in manuscript format including supplement

arXiv:2407.05516 [pdf, other]

Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation

Authors: Jin Woo Lee, Jaehyun Park, Min Jun Choi, Kyogu Lee

Abstract: While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling wit… ▽ More While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling within a neural network framework. Our model leverages physical properties and fundamental frequencies as inputs, outputting string states across time and space that solve the partial differential equation characterizing the nonlinear string. Empirical evaluations demonstrate that the proposed architecture achieves superior accuracy in string motion simulation compared to existing baseline architectures. The code and demo are available online. △ Less

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.04494 [pdf, other]

Quantal phase of extreme nonstatic light waves: Step-phase evolution and its effects

Authors: Jeong Ryeol Choi

Abstract: The phases are the main factor that affects the outcome of various optical phenomena, such as quantum superposition, wave interference, and light-matter interaction. As a light wave becomes nonstatic, an additional phase, the so-called geometric phase, takes place in its evolution. Then, due to this phase, the overall phase of the quantum wave function varies in a nonlinear way with time. Interest… ▽ More The phases are the main factor that affects the outcome of various optical phenomena, such as quantum superposition, wave interference, and light-matter interaction. As a light wave becomes nonstatic, an additional phase, the so-called geometric phase, takes place in its evolution. Then, due to this phase, the overall phase of the quantum wave function varies in a nonlinear way with time. Interestingly, the phase exhibits a step-like evolution if the measure of nonstaticity is extremely high. Such an abnormal phase variation is analyzed in detail for better understanding of wave nonstaticity in this work. As the wave becomes highly nonstatic, the phase factor of the electromagnetic wave evolves in a rectangular manner. However, the shape of the electromagnetic field is still a sinusoidal form on account of the compensational variation of the wave amplitude. The electromagnetic field in this case very much resembles that of a standing wave. The effects accompanying the step-phase evolution, such as modification of the probability distribution and alteration of the wave-interference profile, are analyzed and their implications are illustrated. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: 24 pages, 9 figures

arXiv:2407.03051 [pdf, other]

Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

Authors: Janghwan Lee, Seongmin Park, Sukjin Hong, Minsoo Kim, Du-Seong Chang, Jungwook Choi

Abstract: The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). However, the computational efficiency required for LLMs, achieved throu… ▽ More The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). However, the computational efficiency required for LLMs, achieved through techniques like post-training quantization (PTQ), presents challenges such as token-flipping that can impair chatbot performance. In response, we propose a novel preference alignment approach, quantization-aware direct preference optimization (QDPO), that aligns quantized LLMs with their full-precision counterparts, improving conversational abilities. Evaluated on two instruction-tuned LLMs in various languages, QDPO demonstrated superior performance in improving conversational abilities compared to established PTQ and knowledge-distillation fine-tuning techniques, marking a significant step forward in the development of efficient and effective conversational LLMs. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.19634 [pdf, other]

CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services

Authors: DongKi Noh, Hyungtae Lim, Gyuho Eoh, Duckyu Choi, Jeongsik Choi, Hyunjun Lim, SeungMin Baek, Hyun Myung

Abstract: In commercial autonomous service robots with several form factors, simultaneous localization and mapping (SLAM) is an essential technology for providing proper services such as cleaning and guidance. Such robots require SLAM algorithms suitable for specific applications and environments. Hence, several SLAM frameworks have been proposed to address various requirements in the past decade. However,… ▽ More In commercial autonomous service robots with several form factors, simultaneous localization and mapping (SLAM) is an essential technology for providing proper services such as cleaning and guidance. Such robots require SLAM algorithms suitable for specific applications and environments. Hence, several SLAM frameworks have been proposed to address various requirements in the past decade. However, we have encountered challenges in implementing recent innovative frameworks when handling service robots with low-end processors and insufficient sensor data, such as low-resolution 2D LiDAR sensors. Specifically, regarding commercial robots, consistent performance in different hardware configurations and environments is more crucial than the performance dedicated to specific sensors or environments. Therefore, we propose a) a multi-stage %hierarchical approach for global pose estimation in embedded systems; b) a graph generation method with zero constraints for synchronized sensors; and c) a robust and memory-efficient method for long-term pose-graph optimization. As verified in in-home and large-scale indoor environments, the proposed method yields consistent global pose estimation for services in commercial fields. Furthermore, the proposed method exhibits potential commercial viability considering the consistent performance verified via mass production and long-term (> 5 years) operation. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Journal ref: IEEE Robotics and Automation Letters, 2024

arXiv:2406.17256 [pdf, other]

Disentangled Motion Modeling for Video Frame Interpolation

Authors: Jaihyun Lew, Jooyoung Choi, Chaehun Shin, Dahuin Jung, Sungroh Yoon

Abstract: Video frame interpolation (VFI) aims to synthesize intermediate frames in between existing frames to enhance visual smoothness and quality. Beyond the conventional methods based on the reconstruction loss, recent works employ the high quality generative models for perceptual quality. However, they require complex training and large computational cost for modeling on the pixel space. In this paper,… ▽ More Video frame interpolation (VFI) aims to synthesize intermediate frames in between existing frames to enhance visual smoothness and quality. Beyond the conventional methods based on the reconstruction loss, recent works employ the high quality generative models for perceptual quality. However, they require complex training and large computational cost for modeling on the pixel space. In this paper, we introduce disentangled Motion Modeling (MoMo), a diffusion-based approach for VFI that enhances visual quality by focusing on intermediate motion modeling. We propose disentangled two-stage training process, initially training a frame synthesis model to generate frames from input pairs and their optical flows. Subsequently, we propose a motion diffusion model, equipped with our novel diffusion U-Net architecture designed for optical flow, to produce bi-directional flows between frames. This method, by leveraging the simpler low-frequency representation of motions, achieves superior perceptual quality with reduced computational demands compared to generative modeling methods on the pixel space. Our method surpasses state-of-the-art methods in perceptual metrics across various benchmarks, demonstrating its efficacy and efficiency in VFI. Our code is available at: https://github.com/JHLew/MoMo △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.15996 [pdf, other]

Memorizing Documents with Guidance in Large Language Models

Authors: Bumjin Park, Jaesik Choi

Abstract: Training data plays a pivotal role in AI models. Large language models (LLMs) are trained with massive amounts of documents, and their parameters hold document-related contents. Recently, several studies identified content-specific locations in LLMs by examining the parameters. Instead of the post hoc interpretation, we propose another approach. We propose document-wise memory architecture to trac… ▽ More Training data plays a pivotal role in AI models. Large language models (LLMs) are trained with massive amounts of documents, and their parameters hold document-related contents. Recently, several studies identified content-specific locations in LLMs by examining the parameters. Instead of the post hoc interpretation, we propose another approach. We propose document-wise memory architecture to track document memories in training. The proposed architecture maps document representations to memory entries, which softly mask memories in the forward process of LLMs. Additionally, we propose document guidance loss, which increases the likelihood of text with document memories and reduces the likelihood of the text with the memories of other documents. Experimental results on Wikitext-103-v1 with Pythia-1B show that the proposed methods provide different memory entries for documents and high recall of document-related content in generation with trained document-wise memories. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: IJCAI 2024

arXiv:2406.15635 [pdf, other]

DataFreeShield: Defending Adversarial Attacks without Training Data

Authors: Hyeyoon Lee, Kanghyun Choi, Dain Kwon, Sunjong Park, Mayoore Selvarasa Jaiswal, Noseong Park, Jonghyun Choi, Jinho Lee

Abstract: Recent advances in adversarial robustness rely on an abundant set of training data, where using external or additional datasets has become a common setting. However, in real life, the training data is often kept private for security and privacy issues, while only the pretrained weight is available to the public. In such scenarios, existing methods that assume accessibility to the original data bec… ▽ More Recent advances in adversarial robustness rely on an abundant set of training data, where using external or additional datasets has become a common setting. However, in real life, the training data is often kept private for security and privacy issues, while only the pretrained weight is available to the public. In such scenarios, existing methods that assume accessibility to the original data become inapplicable. Thus we investigate the pivotal problem of data-free adversarial robustness, where we try to achieve adversarial robustness without accessing any real data. Through a preliminary study, we highlight the severity of the problem by showing that robustness without the original dataset is difficult to achieve, even with similar domain datasets. To address this issue, we propose DataFreeShield, which tackles the problem from two perspectives: surrogate dataset generation and adversarial training using the generated data. Through extensive validation, we show that DataFreeShield outperforms baselines, demonstrating that the proposed method sets the first entirely data-free solution for the adversarial robustness problem. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: ICML 2024

arXiv:2406.15062 [pdf, other]

Decoupled static and dynamical charge correlations in La$_{2-x}$Sr$_x$CuO$_4$

Authors: L. Martinelli, I. Biało, X. Hong, J. Oppliger, C. Lin, T. Schaller, J. Küspert, M. H. Fischer, T. Kurosawa, N. Momono, M. Oda, J. Choi, S. Agrestini, M. Garcia-Fernandez, Ke-Jin Zhou, Q. Wang, J. Chang

Abstract: The relation between charge order, its quantum fluctuations and optical phonon modes in cuprate superconductors remains an unsolved problem. The exploration of these excitations is however complicated by the presence of twinned domains. Here, we use uniaxial strain in combination with ultra-high-resolution Resonant Inelastic X-ray Scattering (RIXS) at the oxygen K- and copper L3-edges to study the… ▽ More The relation between charge order, its quantum fluctuations and optical phonon modes in cuprate superconductors remains an unsolved problem. The exploration of these excitations is however complicated by the presence of twinned domains. Here, we use uniaxial strain in combination with ultra-high-resolution Resonant Inelastic X-ray Scattering (RIXS) at the oxygen K- and copper L3-edges to study the excitations stemming from the charge ordering wave vector in La1.875Sr0.125CuO4. By detwinning stripe ordering, we demonstrate that the optical phonon anomalies do not show any stripe anisotropy. The low-energy charge excitations also retain an in-plane four-fold symmetry. As such, we find that both phonon and charge excitations are decoupled entirely from the strength of static charge ordering. The almost isotropic character of charge excitations remains a possible source for the strange metal properties found in the normal state of cuprate superconductors. △ Less

Submitted 15 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

Comments: 9 pages, 4 figures

arXiv:2406.12909 [pdf, other]

Scalable Training of Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN

Authors: Massimiliano Lupo Pasini, Jong Youl Choi, Kshitij Mehta, Pei Zhang, David Rogers, Jonghyun Bae, Khaled Z. Ibrahim, Ashwin M. Aji, Karl W. Schulz, Jorda Polo, Prasanna Balaprakash

Abstract: We present our work on developing and training scalable graph foundation models (GFM) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural network (GNN) in both training scale and data diversity. It abstracts over message passing algorithms, allowing both reproduction of and comparison across algorithmic innovations that de… ▽ More We present our work on developing and training scalable graph foundation models (GFM) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural network (GNN) in both training scale and data diversity. It abstracts over message passing algorithms, allowing both reproduction of and comparison across algorithmic innovations that define convolution in GNNs. This work discusses a series of optimizations that have allowed scaling up the GFM training to tens of thousands of GPUs on datasets that consist of hundreds of millions of graphs. Our GFMs use multi-task learning (MTL) to simultaneously learn graph-level and node-level properties of atomistic structures, such as the total energy and atomic forces. Using over 150 million atomistic structures for training, we illustrate the performance of our approach along with the lessons learned on two United States Department of Energy (US-DOE) supercomputers, namely the Perlmutter petascale system at the National Energy Research Scientific Computing Center and the Frontier exascale system at Oak Ridge National Laboratory. The HydraGNN architecture enables the GFM to achieve near-linear strong scaling performance using more than 2,000 GPUs on Perlmutter and 16,000 GPUs on Frontier. Hyperparameter optimization (HPO) was performed on over 64,000 GPUs on Frontier to select GFM architectures with high accuracy. Early stopping was applied on each GFM architecture for energy awareness in performing such an extreme-scale task. The training of an ensemble of highest-ranked GFM architectures continued until convergence to establish uncertainty quantification (UQ) capabilities with ensemble learning. Our contribution opens the door for rapidly developing, training, and deploying GFMs using large-scale computational resources to enable AI-accelerated materials discovery and design. △ Less

Submitted 28 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 16 pages, 13 figures

MSC Class: 68T07; 68T09 ACM Class: C.2.4; I.2.11

arXiv:2406.12402 [pdf, other]

Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling

Authors: Irfan Robbani, Paul Reisert, Naoya Inoue, Surawat Pothong, Camélia Guerraoui, Wenzhi Wang, Shoichi Naito, Jungmin Choi, Kentaro Inui

Abstract: Prior research in computational argumentation has mainly focused on scoring the quality of arguments, with less attention on explicating logical errors. In this work, we introduce four sets of explainable templates for common informal logical fallacies designed to explicate a fallacy's implicit logic. Using our templates, we conduct an annotation study on top of 400 fallacious arguments taken from… ▽ More Prior research in computational argumentation has mainly focused on scoring the quality of arguments, with less attention on explicating logical errors. In this work, we introduce four sets of explainable templates for common informal logical fallacies designed to explicate a fallacy's implicit logic. Using our templates, we conduct an annotation study on top of 400 fallacious arguments taken from LOGIC dataset and achieve a high agreement score (Krippendorf's alpha of 0.54) and reasonable coverage (0.83). Finally, we conduct an experiment for detecting the structure of fallacies and discover that state-of-the-art language models struggle with detecting fallacy templates (0.47 accuracy). To facilitate research on fallacies, we make our dataset and guidelines publicly available. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12233 [pdf, other]

SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

Authors: Young Jin Ahn, Jungwoo Park, Sangha Park, Jonghyun Choi, Kee-Eung Kim

Abstract: Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes-visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fel… ▽ More Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes-visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fell short of full synchronization. To address this, we present SyncVSR, an end-to-end learning framework that leverages quantized audio for frame-level crossmodal supervision. By integrating a projection layer that synchronizes visual representation with acoustic data, our encoder learns to generate discrete audio tokens from a video sequence in a non-autoregressive manner. SyncVSR shows versatility across tasks, languages, and modalities at the cost of a forward pass. Our empirical evaluations show that it not only achieves state-of-the-art results but also reduces data usage by up to ninefold. △ Less