Fast Ensembling with Diffusion Schrödinger Bridge

Authors: Hyunsu Kim, Jongmin Yoon, Juho Lee

Abstract: The Deep Ensemble (DE) approach is a straightforward technique used to enhance the performance of deep neural networks by training them from different initial points, converging towards various local optima. However, a limitation of this methodology lies in its high computational overhead for inference, arising from the necessity to store numerous learned parameters and execute individual forward passes for each parameter during the inference stage. We propose a novel approach called Diffusion Bridge Network (DBN) to address this challenge. Based on the theory of the Schrödinger bridge, this method directly learns to simulate a Stochastic Differential Equation (SDE) that connects the output distribution of a single ensemble member to the output distribution of the ensembled model, allowing us to obtain ensemble prediction without having to invoke a forward pass through all the ensemble models. By substituting the heavy ensembles with this lightweight neural network constructing DBN, we achieved inference with reduced computational cost while maintaining accuracy and uncertainty scores on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet. Our implementation is available at https://github.com/kim-hyunsu/dbn.

Submitted 24 April, 2024; originally announced April 2024.

Journal ref: ICLR 2024

arXiv:2404.14366 [pdf]

Lessons Learned in Performing a Trustworthy AI and Fundamental Rights Assessment

Authors: Marjolein Boonstra, Frédérick Bruneault, Subrata Chakraborty, Tjitske Faber, Alessio Gallucci, Eleanore Hickman, Gerard Kema, Heejin Kim, Jaap Kooiker, Elisabeth Hildt, Annegret Lamadé, Emilie Wiinblad Mathez, Florian Möslein, Genien Pathuis, Giovanni Sartor, Marijke Steege, Alice Stocco, Willy Tadema, Jarno Tuimala, Isabel van Vledder, Dennis Vetter, Jana Vetter, Magnus Westerlund, Roberto V. Zicari

Abstract: This report shares the experiences, results and lessons learned in conducting a pilot project "Responsible use of AI" in cooperation with the Province of Friesland, Rijks ICT Gilde, part of the Ministry of the Interior and Kingdom Relations (BZK) (both in The Netherlands) and a group of members of the Z-Inspection$^{\small{\circledR}}$ Initiative. The pilot project took place from May 2022 through January 2023. During the pilot, the practical application of a deep learning algorithm from the province of Fryslân was assessed. The AI maps heathland grassland by means of satellite images for monitoring nature reserves. Environmental monitoring is one of the crucial activities carried on by society for several purposes ranging from maintaining standards on drinkable water to quantifying the CO2 emissions of a particular state or region. Using satellite imagery and machine learning to support decisions is becoming an important part of environmental monitoring. The main focus of this report is to share the experiences, results and lessons learned from performing both a Trustworthy AI assessment using the Z-Inspection$^{\small{\circledR}}$ process and the EU framework for Trustworthy AI, and combining it with a Fundamental Rights assessment using the Fundamental Rights and Algorithms Impact Assessment (FRAIA) as recommended by the Dutch government for the use of AI algorithms by the Dutch public authorities.

Submitted 22 April, 2024; originally announced April 2024.

Comments: On behalf of the Z-Inspection$^{\small{\circledR}}$ Initiative

arXiv:2404.14220 [pdf, other]

Robust electrothermal switching of optical phase change materials through computer-aided adaptive pulse optimization

Authors: Parth Garud, Kiumars Aryana, Cosmin Constantin Popescu, Steven Vitale, Rashi Sharma, Kathleen Richardson, Tian Gu, Juejun Hu, Hyun Jung Kim

Abstract: Electrically tunable optical devices present diverse functionalities for manipulating electromagnetic waves by leveraging elements capable of reversibly switching between different optical states. This adaptability in adjusting their responses to electromagnetic waves after fabrication is crucial for developing more efficient and compact optical systems for a broad range of applications including sensing, imaging, telecommunications, and data storage. Chalcogenide-based phase change materials (PCMs) have shown great promise due to their stable, non-volatile phase transition between amorphous and crystalline states. Nonetheless, optimizing the switching parameters of PCM devices and maintaining their stable operation over thousands of cycles with minimal variation can be challenging. In this paper, we report on the critical role of PCM pattern as well as electrical pulse form in achieving reliable and stable switching, extending the operational lifetime of the device beyond 13,000 switching events. To achieve this, we have developed a computer-aided algorithm that monitors optical changes in the device and adjusts the applied voltage in accordance with the phase transformation process, thereby significantly enhancing the lifetime of these reconfigurable devices. Our findings reveal that patterned PCM structures show significantly higher endurance compared to blanket PCM thin films.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.12955 [pdf]

Anisotropic electron-phonon interactions in 2D lead-halide perovskites

Authors: Jaco J. Geuchies, Johan Klarbring, Lucia Di Virgillio, Shuai Fu, Sheng Qu, Guangyu Liu, Hai Wang, Jarvist M. Frost, Aron Walsh, Mischa Bonn, Heejae Kim

Abstract: Two-dimensional hybrid organic-inorganic metal halide perovskites offer enhanced stability for perovskite-based applications. Their crystal structure's soft and ionic nature gives rise to strong interactions between charge carriers and ionic rearrangements. Here, we investigate the interaction of photo-generated electrons and ionic polarizations in single-crystal 2D perovskite butylammonium lead iodide, varying the inorganic lamellae thickness in the 2D single crystals. We determined the directionality of the transition dipole moments of the relevant phonon modes (in the 0.3-3 THz range) by angle-and-polarization dependent THz transmission measurements. We find a clear anisotropy of the in-plane photoconductivity, with a 10% reduction along the axis parallel with the transition dipole moment of the most strongly coupled phonon. Detailed calculations, based on Feynman polaron theory, indicate that the anisotropy originates from directional electron-phonon interactions.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 50 pages, 5 figures main text, 13 figures in supplementary information

arXiv:2404.12542 [pdf, other]

Pinwheel Outflow induced by Stellar Mass Loss in a Coplanar Triple System

Authors: Hyosun Kim, Mark R. Morris, Jongsoo Kim, Jinhua He

Abstract: We develop a physical framework for interpreting complex circumstellar patterns whorled around asymptotic giant branch (AGB) stars by investigating stable, coplanar triple systems using hydrodynamic and particle simulations. The introduction of a close tertiary body causes an additional periodic variation in the orbital velocity and trajectory of the AGB star. As a result, the circumstellar outflow builds a fine non-Archimedean spiral pattern superimposed upon the Archimedean spiral produced by the outer binary alone. This fine spiral can be approximated by off-centered circular rings that become tangent to each other at the location of the Archimedean spiral. The superimposed fine pattern fades out relatively quickly as a function of distance from the center of the system, in contrast to the dominant Archimedean spiral pattern, which presents a much slower fractional density decrease with radius. The different rates of radial decrease of the density contrast in the two superimposed patterns, coupled with their different time and spatial scales, lead to an apparent, but illusory radial change in the observed pattern interval, as has been reported, for example, in CW Leo. The function describing the detailed radial dependence of the expansion velocity is different in the two patterns, which may be used to distinguish them. The shape of the circumstellar whorled pattern is further explored as a function of the orbital eccentricity and the inner companion's mass. Although this study is confined to stable, coplanar triple systems, the results are likely applicable to moderately noncoplanar systems and open interesting avenues for studying noncoplanar systems.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 16 pages, 7 figures, 1 table, ApJ in press

arXiv:2404.11972 [pdf, other]

Aligning Language Models to Explicitly Handle Ambiguity

Authors: Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

Abstract: In interactions between users and language model agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) to prioritize efficiency. This can lead to varying interpretations of the same input based on different assumptions or background knowledge. It is thus crucial for agents to adeptly handle the inherent ambiguity in queries to ensure reliability. However, even state-of-the-art large language models (LLMs) still face challenges in such scenarios, primarily due to the following hurdles: (1) LLMs are not explicitly trained to deal with ambiguous utterances; (2) the degree of ambiguity perceived by the LLMs may vary depending on the possessed knowledge. To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity). Experimental results on question-answering datasets demonstrate that APA empowers LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions. Furthermore, our finding proves that APA excels beyond training with gold-standard labels, especially in out-of-distribution scenarios.

Submitted 16 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11320 [pdf, other]

Saturated RISE control for considering rotor thrust saturation of fully actuated multirotor

Authors: Dongjae Lee, H. Jin Kim

Abstract: This work proposes a saturated robust controller for a fully actuated multirotor that takes disturbance rejection and rotor thrust saturation into account. A disturbance rejection controller is required to prevent performance degradation in the presence of parametric uncertainty and external disturbance. Furthermore, rotor saturation should be properly addressed in a controller to avoid performance degradation or even instability due to a gap between the commanded input and the actual input during saturation. To address these issues, we present a modified saturated RISE (Robust Integral of the Sign of the Error) control method. The proposed modified saturated RISE controller is developed for expansion to a system with a non-diagonal, state-dependent input matrix. Next, we present reformulation of the system dynamics of a fully actuated multirotor, and apply the control law to the system. The proposed method is validated in simulation where the proposed controller outperforms the existing one thanks to the capability of handling the input matrix.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures, 2024 International Conference on Unmanned Aircraft Systems (ICUAS) accepted

arXiv:2404.11310 [pdf, other]

Autonomous aerial perching and unperching using omnidirectional tiltrotor and switching controller

Authors: Dongjae Lee, Sunwoo Hwang, Jeonghyun Byun, Seung Jae Lee, H. Jin Kim

Abstract: Aerial unperching of multirotors has received little attention as opposed to perching that has been investigated to elongate operation time. This study presents a new aerial robot capable of both perching and unperching autonomously on/from a ferromagnetic surface during flight, and a switching controller to avoid rotor saturation and mitigate overshoot during transition between free-flight and perching. To enable stable perching and unperching maneuvers on/from a vertical surface, a lightweight ($\approx 1$ kg), fully actuated tiltrotor that can hover at $90^\circ$ pitch angle is first developed. We design a perching/unperching module composed of a single servomotor and a magnet, which is then mounted on the tiltrotor. A switching controller including exclusive control modes for transitions between free-flight and perching is proposed. Lastly, we propose a simple yet effective strategy to ensure robust perching in the presence of measurement and control errors and avoid collisions with the perching site immediately after unperching. We validate the proposed framework in experiments where the tiltrotor successfully performs perching and unperching on/from a vertical surface during flight. We further show effectiveness of the proposed transition mode in the switching controller by ablation studies where large overshoot and even collision with a perching site occur. To the best of the authors' knowledge, this work presents the first autonomous aerial unperching framework using a fully actuated tiltrotor.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 7 pages, 10 figures, 2024 IEEE International Conference on Robotics and Automation (ICRA) accepted

arXiv:2404.11104 [pdf, other]

Object Remover Performance Evaluation Methods using Class-wise Object Removal Images

Authors: Changsuk Oh, Dongseok Shim, Taekbeom Lee, H. Jin Kim

Abstract: Object removal refers to the process of erasing designated objects from an image while preserving the overall appearance, and it is one area where image inpainting is widely used in real-world applications. The performance of an object remover is quantitatively evaluated by measuring the quality of object removal results, similar to how the performance of an image inpainter is gauged. Current works reporting quantitative performance evaluations utilize original images as references. In this letter, to show that the current evaluation methods cannot properly evaluate the performance of an object remover, we create a dataset with object removal ground truth and compare the evaluations made by the current methods using original images to those utilizing object removal ground truth images. The disparities between the two evaluation sets validate that the current methods are not suitable for measuring the performance of an object remover. Additionally, we propose new evaluation methods tailored to gauge the performance of an object remover. The proposed methods evaluate the performance through class-wise object removal results and utilize images without the target class objects as a comparison set. We confirm that the proposed methods can make judgments consistent with human evaluators in the COCO dataset, and that they can produce measurements aligning with those using object removal ground truth in the self-acquired dataset.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10699 [pdf, other]

ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation

Authors: Iaroslav Melekhov, Anand Umashankar, Hyeong-Jin Kim, Vladislav Serkov, Dusty Argyle

Abstract: We introduce ECLAIR (Extended Classification of Lidar for AI Recognition), a new outdoor large-scale aerial LiDAR dataset designed specifically for advancing research in point cloud semantic segmentation. As the most extensive and diverse collection of its kind to date, the dataset covers a total area of 10 km$^2$ with close to 600 million points and features eleven distinct object categories. To guarantee the dataset's quality and utility, we have thoroughly curated the point labels through an internal team of experts, ensuring accuracy and consistency in semantic labeling. The dataset is engineered to move forward the fields of 3D urban modeling, scene understanding, and utility infrastructure management by presenting new challenges and potential applications. As a benchmark, we report qualitative and quantitative analysis of a voxel-based point cloud segmentation approach based on the Minkowski Engine.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 11 pages, 7 figures

arXiv:2404.10199 [pdf, other]

CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting

Authors: Huihan Li, Liwei Jiang, Jena D. Huang, Hyunwoo Kim, Sebastin Santy, Taylor Sorensen, Bill Yuchen Lin, Nouha Dziri, Xiang Ren, Yejin Choi

Abstract: As the utilization of large language models (LLMs) has proliferated worldwide, it is crucial for them to have adequate knowledge and fair representation for diverse global cultures. In this work, we uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations, and extract symbols from these generations that are associated with each culture by the LLM. We discover that culture-conditioned generations consist of linguistic "markers" that distinguish marginalized cultures from default cultures. We also discover that LLMs have an uneven degree of diversity in the culture symbols, and that cultures from different geographic regions have different presence in LLMs' culture-agnostic generation. Our findings promote further research in studying the knowledge and fairness of global culture perception in LLMs. Code and data can be found at: https://github.com/huihanlhh/Culture-Gen/

Submitted 26 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.10050 [pdf, other]

The Cost of Entanglement Renormalization on a Fault-Tolerant Quantum Computer

Authors: Joshua Job, Isaac H. Kim, Eric Johnston, Steve Adachi

Abstract: We perform a detailed resource estimate for the prospect of using deep entanglement renormalization ansatz (DMERA) on a fault-tolerant quantum computer, focusing on the regime in which the target system is large. For probing a relatively large system size ($64\times 64$), we observe up to an order of magnitude reduction in the number of qubits, compared to the approaches based on quantum phase estimation (QPE). We discuss two complementary strategies to measure the energy. The first approach is based on a random sampling of the local terms of the Hamiltonian, requiring $\mathcal{O}(1/ε^2)$ invocations of quantum circuits, each of which has a depth of at most $\mathcal{O}(\log N)$, where $ε$ is the relative precision in the energy and $N$ is the system size. The second approach is based on a coherent estimation of the expectation value of observables averaged over space, which achieves the Heisenberg scaling while incurring only a logarithmic cost in the system size. For estimating the energy per site of $ε$, $\mathcal{O}\left(\frac{\log N}{ε}\right)$ $T$ gates and $\mathcal{O}\left(\log N \right)$ qubits suffice. The constant factor of the leading contribution is shown to be determined by the depth of the DMERA circuit, the gates used in the ansatz, and the periodicity of the circuit. We also derive tight bounds on the variance of the energy gradient, assuming the gates are random Pauli rotations.

Submitted 16 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: 21 pages, 12 figures, 2 appendices

arXiv:2404.10046 [pdf, other]

Observation of Cooper-pair density modulation state

Authors: Lingyuan Kong, Michał Papaj, Hyunjin Kim, Yiran Zhang, Eli Baum, Hui Li, Kenji Watanabe, Takashi Taniguchi, Genda Gu, Patrick A. Lee, Stevan Nadj-Perge

Abstract: Superconducting states that break space-group symmetries of the underlying crystal can exhibit nontrivial spatial modulation of the order parameter. Previously, such remarkable states were intimately associated with the breaking of translational symmetry, giving rise to the density-wave orders, with wavelengths spanning several unit cells. However, a related basic concept has been long overlooked: when only intra-unit-cell symmetries of the space group are broken, the superconducting states can display a distinct type of nontrivial modulation preserving long-range lattice translation. Here, we refer to this new concept as the pair density modulation (PDM), and report the first observation of a PDM state in exfoliated thin flakes of iron-based superconductor FeTe$_{\text{0.55}}$Se$_{\text{0.45}}$. Using scanning tunneling microscopy, we discover robust superconducting gap modulation with the wavelength corresponding to the lattice periodicity and the amplitude exceeding 30% of the gap average. Importantly, we find that the observed modulation originates from the large difference in superconducting gaps on the two nominally equivalent iron sublattices. The experimental findings, backed up by model calculations, suggest that in contrast to the density-wave orders, the PDM state is driven by the interplay of sublattice symmetry breaking and a peculiar nematic distortion specific to the thin flakes. Our results establish new frontiers for exploring the intertwined orders in strongly correlated electronic systems and open a new chapter for iron-based superconductors.

Submitted 15 April, 2024; originally announced April 2024.

Comments: Full submission including supplementary information, 4 main figures

arXiv:2404.09409 [pdf, other]

Disorder Chaos in Short-Range, Diluted, and Lévy Spin Glasses

Authors: Wei-Kuo Chen, Heejune Kim, Arnab Sen

Abstract: In a recent breakthrough [arXiv:2301.04112], Chatterjee proved site disorder chaos in the Edwards-Anderson (EA) short-range spin glass model utilizing the Hermite spectral method. In this paper, we demonstrate the further usefulness of this Hermite spectral approach by extending the validity of site disorder chaos in three related spin glass models. The first, called the mixed even $p$-spin short-range model, is a generalization of the EA model where the underlying graph is a deterministic bounded degree hypergraph consisting of hyperedges with even number of vertices. The second model is the diluted mixed $p$-spin model, which is allowed to have hyperedges with both odd and even number of vertices. For both models, our results hold under general symmetric disorder distributions. The main novelty of our argument is played by an elementary algebraic equation for the Fourier-Hermite series coefficients for the two-spin correlation functions. It allows us to deduce necessary geometric conditions to determine the contributing coefficients in the overlap function, which in spirit is the same as the crucial Lemma 1 in [arXiv:2301.04112]. Finally, we also establish disorder chaos in the Lévy model with stable index $α\in (1, 2)$.

Submitted 13 June, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: A few paragraphs in the introduction revised for clarity. 24 pages, 3 Figures

MSC Class: 60K35; 82B44

arXiv:2404.08871 [pdf, other]

PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices

Authors: Si Ung Noh, Junguk Hong, Chaemin Lim, Seongyeon Park, Jeehyun Kim, Hanjun Kim, Youngsok Kim, Jinho Lee

Abstract: Recent dual in-line memory modules (DIMMs) are starting to support processing-in-memory (PIM) by associating their memory banks with processing elements (PEs), allowing applications to overcome the data movement bottleneck by offloading memory-intensive operations to the PEs. Many highly parallel applications have been shown to benefit from these PIM-enabled DIMMs, but further speedup is often limited by the huge overhead of inter-PE communication. This mainly comes from the slow CPU-mediated inter-PE communication methods, which incur significant performance overheads, making it difficult for PIM-enabled DIMMs to accelerate a wider range of applications. Prior studies have tried to alleviate the communication bottleneck, but they lack enough flexibility and performance to be used for a wide range of applications. In this paper, we present PID-Comm, a fast and flexible collective inter-PE communication framework for commodity PIM-enabled DIMMs. The key idea of PID-Comm is to abstract the PEs as a multi-dimensional hypercube and allow multiple instances of collective inter-PE communication between the PEs belonging to certain dimensions of the hypercube. Leveraging this abstraction, PID-Comm first defines eight collective inter-PE communication patterns that allow applications to easily express their complex communication patterns. Then, PID-Comm provides high-performance implementations of the collective inter-PE communication patterns optimized for the DIMMs. Our evaluation using 16 UPMEM DIMMs and representative parallel algorithms shows that PID-Comm greatly improves the performance by up to 4.20x compared to the existing inter-PE communication implementations. The implementation of PID-Comm is available at https://github.com/AIS-SNU/PID-Comm.

Submitted 12 April, 2024; originally announced April 2024.

Comments: Accepted to ISCA 2024

arXiv:2404.08594 [pdf, other]

Absolute dimensions of solar-type eclipsing binaries. NY Hya: A test for magnetic stellar evolution models

Authors: T. C. Hinse, O. Baştürk, J. Southworth, G. A. Feiden, J. Tregloan-Reed, V. B. Kostov, J. Livingston, E. M. Esmer, Mesut Yılmaz, Selçuk Yalçınkaya, Şeyma Torun, J. Vos, D. F. Evans, J. C. Morales, J. C. A. Wolf, E. H. Olsen, J. V. Clausen, B. E. Helt, C. T. K. Lý, O. Stahl, R. Wells, M. Herath, U. G. Jørgensen, M. Dominik, J. Skottfelt, et al. (7 additional authors not shown)

Abstract: The binary star NY Hya is a bright, detached, double-lined eclipsing system with an orbital period of just under five days with two components each nearly identical to the Sun and located in the solar neighbourhood. The objective of this study is to test and confront various stellar evolution models for solar-type stars based on accurate measurements of stellar mass and radius. We present new ground-based spectroscopic and photometric as well as high-precision space-based photometric and astrometric data from which we derive orbital as well as physical properties of the components via the method of least-squares minimisation based on a standard binary model valid for two detached components. Classic statistical techniques were invoked to test the significance of model parameters. Additional empirical evidence was compiled from the public domain; the derived system properties were compared with archival broad-band photometry data enabling a measurement of the system's spectral energy distribution that allowed an independent estimate of stellar properties. We also utilised semi-empirical calibration methods to derive atmospheric properties from Strömgren photometry and related colour indices. Data was used to confront the observed physical properties with classic and magnetic stellar evolution models.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 34 pages, 19 figures, 13 tables, (accepted for publication in A&A)

arXiv:2404.08175 [pdf, ps, other]

A Novel Vision Transformer based Load Profile Analysis using Load Images as Inputs

Authors: Hyeonjin Kim, Yi Hu, Kai Ye, Ning Lu

Abstract: This paper introduces ViT4LPA, an innovative Vision Transformer (ViT) based approach for Load Profile Analysis (LPA). We transform time-series load profiles into load images. This allows us to leverage the ViT architecture, originally designed for image processing, as a pre-trained image encoder to uncover latent patterns within load data. ViT is pre-trained using an extensive load image dataset, comprising 1M load images derived from smart meter data collected over a two-year period from 2,000 residential users. The training methodology is self-supervised, masked image modeling, wherein masked load images are restored to reveal hidden relationships among image patches. The pre-trained ViT encoder is then applied to various downstream tasks, including the identification of electric vehicle (EV) charging loads and behind-the-meter solar photovoltaic (PV) systems and load disaggregation. Simulation results illustrate ViT4LPA's superior performance compared to existing neural network models in downstream tasks. Additionally, we conduct an in-depth analysis of the attention weights within the ViT4LPA model to gain insights into its information flow mechanisms.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07610 [pdf, other]

Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval

Authors: Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon, Jinwoo Choi, Seong Tae Kim

Abstract: There has been significant attention to the research on dense video captioning, which aims to automatically localize and caption all events within untrimmed video. Several studies introduce methods by designing dense video captioning as a multitasking problem of event localization and event captioning to consider inter-task relations. However, addressing both tasks using only visual input is challenging due to the lack of semantic content. In this study, we address this by proposing a novel framework inspired by the cognitive information processing of humans. Our model utilizes external memory to incorporate prior knowledge. The memory retrieval method is proposed with cross-modal video-to-text matching. To effectively incorporate retrieved text features, the versatile encoder and the decoder with visual and textual cross-attention modules are designed. Comparative experiments have been conducted to show the effectiveness of the proposed method on ActivityNet Captions and YouCook2 datasets. Experimental results show promising performance of our model without extensive pretraining from a large video dataset.

Submitted 11 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.07405 [pdf, other]

Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing

Authors: Jaemin Kang, Hoeseok Yang, Hyungshin Kim

Abstract: Deep learning has been successfully applied to object detection from remotely sensed images. Images are typically processed on the ground rather than on-board due to the computation power of the ground system. Such offloaded processing causes delays in acquiring target mission information, which hinders its application to real-time use cases. For on-device object detection, research has been conducted on designing efficient detectors or model compression to reduce inference latency. However, highly accurate two-stage detectors still need further exploitation for acceleration. In this paper, we propose a model simplification method for two-stage object detectors. Instead of constructing a general feature pyramid, we utilize only one feature extraction in the two-stage detector. To compensate for the accuracy drop, we apply a high pass filter to the RPN's score map. Our approach is applicable to any two-stage detector using a feature pyramid network. In the experiments with state-of-the-art two-stage detectors such as ReDet, Oriented-RCNN, and LSKNet, our method reduced computation costs by up to 61.2% with the accuracy loss within 2.1% on the DOTAv1.5 dataset. Source code will be released.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.07021 [pdf, other]

A 4x32Gb/s 1.8pJ/bit Collaborative Baud-Rate CDR with Background Eye-Climbing Algorithm and Low-Power Global Clock Distribution

Authors: Jihee Kim, Jia Park, Jiwon Shin, Hanseok Kim, Kahyun Kim, Haengbeom Shin, Ha-Jung Park, Woo-Seok Choi

Abstract: This paper presents design techniques for an energy-efficient multi-lane receiver (RX) with baud-rate clock and data recovery (CDR), which is essential for high-throughput low-latency communication in high-performance computing systems. The proposed low-power global clock distribution not only significantly reduces power consumption across multi-lane RXs but is capable of compensating for the frequency offset without any phase interpolators. To this end, a fractional divider controlled by CDR is placed close to the global phase locked loop. Moreover, in order to address the sub-optimal lock point of conventional baud-rate phase detectors, the proposed CDR employs a background eye-climbing algorithm, which optimizes the sampling phase and maximizes the vertical eye margin (VEM). Fabricated in a 28nm CMOS process, the proposed 4x32Gb/s RX shows a low integrated fractional spur of -40.4dBc at a 2500ppm frequency offset. Furthermore, it improves bit-error-rate performance by increasing the VEM by 17%. The entire RX achieves the energy efficiency of 1.8pJ/bit with the aggregate data rate of 128Gb/s.

Submitted 22 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06452 [pdf, other]

PAAM: A Framework for Coordinated and Priority-Driven Accelerator Management in ROS 2

Authors: Daniel Enright, Yecheng Xiang, Hyunjong Choi, Hyoseung Kim

Abstract: This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the issue of predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor that acts as an accelerator resource server, arbitrating accelerator access requests from all other callbacks at the application layer. This approach enables coordinated and priority-driven accelerator access management in multi-process robotic systems. The framework design is directly applicable to all types of accelerators and enables granular control over how specific chains access accelerators, making it possible to achieve predictable real-time support for accelerators used by safety-critical callback chains without making changes to underlying accelerator device drivers. The paper shows that PAAM also offers a theoretical analysis that can upper bound the worst-case response time of safety-critical callback chains that necessitate accelerator access. This paper also demonstrates that complex robotic systems with extensive accelerator usage that are integrated with PAAM may achieve up to a 91% reduction in end-to-end response time of their critical callback chains.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 14 Pages, 14 Figures

arXiv:2404.06059 [pdf, other]

Efficient Quantum Circuits for Machine Learning Activation Functions including Constant T-depth ReLU

Authors: Wei Zi, Siyi Wang, Hyunji Kim, Xiaoming Sun, Anupam Chattopadhyay, Patrick Rebentrost

Abstract: In recent years, Quantum Machine Learning (QML) has increasingly captured the interest of researchers. Among the components in this domain, activation functions hold a fundamental and indispensable role. Our research focuses on the development of quantum circuits for activation functions for integration into fault-tolerant quantum computing architectures, with an emphasis on minimizing $T$-depth. Specifically, we present novel implementations of ReLU and leaky ReLU activation functions, achieving constant $T$-depths of 4 and 8, respectively. Leveraging quantum lookup tables, we extend our exploration to other activation functions such as the sigmoid. This approach enables us to customize precision and $T$-depth by adjusting the number of qubits, making our results more adaptable to various application scenarios. This study represents a significant advancement towards enhancing the practicality and application of quantum machine learning.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 13 pages

arXiv:2404.05912 [pdf, ps, other]

OGLE-2018-BLG-0971, MOA-2023-BLG-065, and OGLE-2023-BLG-0136: Microlensing events with prominent orbital effects

Authors: Cheongho Han, Andrzej Udalski, Ian A. Bond, Chung-Uk Lee, Andrew Gould, Michael D. Albrow, Sun-Ju Chung, Kyu-Ha Hwang, Youn Kil Jung, Hyoun-Woo Kim, Yoon-Hyun Ryu, Yossi Shvartzvald, In-Gu Shin, Jennifer C. Yee, Hongjing Yang, Weicheng Zang, Sang-Mok Cha, Doeon Kim, Dong-Jin Kim, Seung-Lee Kim, Dong-Joo Lee, Yongseok Lee, Byeong-Gon Park, Richard W. Pogge, Przemek Mróz, et al. (38 additional authors not shown)

Abstract: We undertake a project to reexamine microlensing data gathered from high-cadence surveys. The aim of the project is to reinvestigate lensing events with light curves exhibiting intricate anomaly features associated with caustics, yet lacking prior proposed models to explain these features. Through detailed reanalyses considering higher-order effects, we identify that accounting for orbital motions of lenses is vital in accurately explaining the anomaly features observed in the light curves of the lensing events OGLE-2018-BLG-0971, MOA-2023-BLG-065, and OGLE-2023-BLG-0136. We estimate the masses and distances to the lenses by conducting Bayesian analyses using the lensing parameters of the newly found lensing solutions. From these analyses, we identify that the lenses of the events OGLE-2018-BLG-0971 and MOA-2023-BLG-065 are binaries composed of M dwarfs, while the lens of OGLE-2023-BLG-0136 is likely to be a binary composed of an early K-dwarf primary and a late M-dwarf companion. For all lensing events, the probability of the lens residing in the bulge is considerably higher than that of it being located in the disk.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 11 pages, 13 figures, 6 tables

arXiv:2404.05867 [pdf, other]

Strict area law implies commuting parent Hamiltonian

Authors: Isaac H. Kim, Ting-Chun Lin, Daniel Ranard, Bowen Shi

Abstract: We show that in two spatial dimensions, when a quantum state has entanglement entropy obeying a strict area law, meaning $S(A)=α|\partial A| - γ$ for constants $α, γ$ independent of lattice region $A$, then it admits a commuting parent Hamiltonian. More generally, we prove that the entanglement bootstrap axioms in 2D imply the existence of a commuting, local parent Hamiltonian with a stable spectral gap. We also extend our proof to states that describe gapped domain walls. Physically, these results imply that the states studied in the entanglement bootstrap program correspond to ground states of some local Hamiltonian, describing a stable phase of matter. Our result also suggests that systems with chiral gapless edge modes cannot obey a strict area law provided they have finite local Hilbert space.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 19+2 pages, 10 figures

arXiv:2404.05687 [pdf, other]

Retrieval-Augmented Open-Vocabulary Object Detection

Authors: Jooyeon Kim, Eulrang Cho, Sehyung Kim, Hyunwoo J. Kim

Abstract: Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve the generalization ability to expand the knowledge of the detector, using 'positive' pseudo-labels with additional 'class' names, e.g., sock, iPod, and alligator. To extend the previous methods in two aspects, we propose Retrieval-Augmented Losses and visual Features (RALF). Our method retrieves related 'negative' classes and augments loss functions. Also, visual features are augmented with 'verbalized concepts' of classes, e.g., worn on the feet, handheld music player, and sharp teeth. Specifically, RALF consists of two modules: Retrieval-Augmented Losses (RAL) and Retrieval-Augmented visual Features (RAF). RAL constitutes two losses reflecting the semantic similarity with negative vocabularies. In addition, RAF augments visual features with the verbalized concepts from a large language model (LLM). Our experiments demonstrate the effectiveness of RALF on COCO and LVIS benchmark datasets. We achieve improvement up to 3.4 box AP$_{50}^{\text{N}}$ on novel categories of the COCO dataset and 3.6 mask AP$_{\text{r}}$ gains on the LVIS dataset. Code is available at https://github.com/mlvlab/RALF.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted paper at CVPR 2024

arXiv:2404.05151 [pdf, other]

STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs

Authors: Kush Hari, Hansoul Kim, Will Panitch, Kishore Srinivas, Vincent Schorp, Karthik Dharmarajan, Shreya Ganti, Tara Sadjadpour, Ken Goldberg

Abstract: We present STITCH: an augmented dexterity pipeline that performs Suture Throws Including Thread Coordination and Handoffs. STITCH iteratively performs needle insertion, thread sweeping, needle extraction, suture cinching, needle handover, and needle pose correction with failure recovery policies. We introduce a novel visual 6D needle pose estimation framework using a stereo camera pair and new suturing motion primitives. We compare STITCH to baselines, including a proprioception-only policy and a policy without visual servoing. In physical experiments across 15 trials, STITCH achieves an average of 2.93 sutures without human intervention and 4.47 sutures with human intervention. See https://sites.google.com/berkeley.edu/stitch for code and supplemental materials.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.05119 [pdf, other]

A 0.65-pJ/bit 3.6-TB/s/mm I/O Interface with XTalk Minimizing Affine Signaling for Next-Generation HBM with High Interconnect Density

Authors: Hyunjun Park, Jiwon Shin, Hanseok Kim, Jihee Kim, Haengbeom Shin, Taehoon Kim, Jung-Hun Park, Woo-Seok Choi

Abstract: This paper presents an I/O interface with Xtalk Minimizing Affine Signaling (XMAS), which is designed to support high-speed data transmission in die-to-die communication over silicon interposers or similar high-density interconnects susceptible to crosstalk. The operating principles of XMAS are elucidated through rigorous analyses, and its advantages over existing signaling are validated through numerical experiments. XMAS not only demonstrates exceptional crosstalk removing capabilities but also exhibits robustness against noise, especially simultaneous switching noise. Fabricated in a 28-nm CMOS process, the prototype XMAS transceiver achieves an edge density of 3.6TB/s/mm and an energy efficiency of 0.65pJ/b. Compared to the single-ended signaling, the crosstalk-induced peak-to-peak jitter of the received eye with XMAS is reduced by 75% at 10GS/s/pin data rate, and the horizontal eye opening extends to 0.2UI at a bit error rate < 10$^{-12}$.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04767 [pdf, ps, other]

The intersection cohomology Hodge module of toric varieties

Authors: Hyunsuk Kim, Sridhar Venkatesh

Abstract: We study the Hodge filtration of the intersection cohomology Hodge module for toric varieties. More precisely, we study the cohomology sheaves of the graded de Rham complex of the intersection cohomology Hodge module and give a precise formula relating it with the stalks of the intersection cohomology as a constructible complex. The main idea is to use the Ishida complex in order to compute the higher direct images of the sheaf of reflexive differentials.

Submitted 22 May, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: 24 pages, minor changes

MSC Class: 14B05; 14C30; 14F10; 14M25; 14Q99; 32S35; 52B22

arXiv:2404.04544 [pdf, other]

BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

Authors: Gwanghyun Kim, Hayeon Kim, Hoigi Seo, Dong Un Kang, Se Young Chun

Abstract: Generating higher-resolution human-centric scenes with details and controls remains a challenge for existing text-to-image diffusion models. This challenge stems from limited training image size, text encoder capacity (limited tokens), and the inherent difficulty of generating complex scenes involving multiple humans. While current methods attempted to address training size limit only, they often yielded human-centric scenes with severe artifacts. We propose BeyondScene, a novel framework that overcomes prior limitations, generating exquisite higher-resolution (over 8K) human-centric scenes with exceptional text-image correspondence and naturalness using existing pretrained diffusion models. BeyondScene employs a staged and hierarchical approach to initially generate a detailed base image focusing on crucial elements in instance creation for multiple humans and detailed descriptions beyond token limit of diffusion model, and then to seamlessly convert the base image to a higher-resolution output, exceeding training image size and incorporating details aware of text and instances via our novel instance-aware hierarchical enlargement process that consists of our proposed high-frequency injected forward diffusion and adaptive joint diffusion. BeyondScene surpasses existing methods in terms of correspondence with detailed text descriptions and naturalness, paving the way for advanced applications in higher-resolution human-centric scene creation beyond the capacity of pretrained diffusion models without costly retraining. Project page: https://janeyeon.github.io/beyond-scene.

Submitted 6 April, 2024; originally announced April 2024.

Comments: Project page: https://janeyeon.github.io/beyond-scene

arXiv:2404.04532 [pdf, ps, other]

$K_{1}^{\pm}$ mesons moving in nuclear matter

Authors: Seokwoo Yeo, HyungJoo Kim, Su Houng Lee

Abstract: Observing the mass shifts of mesons immersed in nuclear matter is interesting, as the changes are expected to shed light on the effects of chiral symmetry breaking on the origin of hadron masses. At the same time, it is important to understand the momentum dependence of the masses for spin-1 mesons, as the changes manifest differently across the two polarization modes. Here, the mass shifts of $K_{1}^{\pm}$ mesons with finite three-momentum in nuclear medium are studied in the QCD sum rule approach. We find that the mass of $K_{1}^{+}$($K_{1}^{-}$) meson is increased(decreased) by the non-trivial momentum effect in both the transverse and longitudinal modes. Specifically, compared to its rest mass in the nuclear medium, in the transverse mode, the mass of $K_{1}^{+}(K_{1}^{-})$ is observed to shift by +2(-55) MeV, while in the longitudinal mode, the mass shift is +13(-11) MeV, all at a momentum of 0.5 GeV. Exploring the medium modifications of $K_{1}$ meson through kaon beams at J-PARC will provide insights on the partial restoration of chiral symmetry in nuclear matter.

Submitted 10 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: 9 pages, 6 figures, acknowledgments added

arXiv:2404.04096 [pdf, other]

Machine Learning-Aided Cooperative Localization under Dense Urban Environment

Authors: Hoon Lee, Hong Ki Kim, Seung Hyun Oh, Sang Hyun Lee

Abstract: Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee the robustness in performing core functions including localization and controls. Location awareness, in particular, lends itself to the deployment of location-specific services and the improvement of the operation performance. The localization entails direct communication to the network infrastructure, and the resulting centralized positioning solutions readily become intractable as the network scales up. As an alternative to the centralized solutions, this article addresses decentralized principle of vehicular localization reinforced by machine learning techniques in dense urban environments with frequent inaccessibility to reliable measurement. As such, the collaboration of multiple vehicles enhances the positioning performance of machine learning approaches. A virtual testbed is developed to validate this machine learning model for real-map vehicular networks. Numerical results demonstrate universal feasibility of cooperative localization, in particular, for dense urban area configurations.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.03887 [pdf, other]

SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models

Authors: Hyeonwoo Kim, Gyoungjin Gim, Yungi Kim, Jihoo Kim, Byungju Kim, Wonseok Lee, Chanjun Park

Abstract: This study presents a novel learning approach designed to enhance both mathematical reasoning and problem-solving abilities of Large Language Models (LLMs). We focus on integrating the Chain-of-Thought (CoT) and the Program-of-Thought (PoT) learning, hypothesizing that prioritizing the learning of mathematical reasoning ability is helpful for the amplification of problem-solving ability. Thus, the initial learning with CoT is essential for solving challenging mathematical problems. To this end, we propose a sequential learning approach, named SAAS (Solving Ability Amplification Strategy), which strategically transitions from CoT learning to PoT learning. Our empirical study, involving an extensive performance comparison using several benchmarks, demonstrates that our SAAS achieves state-of-the-art (SOTA) performance. The results underscore the effectiveness of our sequential learning approach, marking a significant advancement in the field of mathematical reasoning in LLMs.

Submitted 24 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.03725 [pdf, other]

Conformal geometry from entanglement

Authors: Isaac H. Kim, Xiang Li, Ting-Chun Lin, John McGreevy, Bowen Shi

Abstract: In a physical system with conformal symmetry, observables depend on cross-ratios, measures of distance invariant under global conformal transformations (conformal geometry for short). We identify a quantum information-theoretic mechanism by which the conformal geometry emerges at the gapless edge of a 2+1D quantum many-body system with a bulk energy gap. We introduce a novel pair of information-theoretic quantities $(\mathfrak{c}_{\mathrm{tot}}, η)$ that can be defined locally on the edge from the wavefunction of the many-body system, without prior knowledge of any distance measure. We posit that, for a topological groundstate, the quantity $\mathfrak{c}_{\mathrm{tot}}$ is stationary under arbitrary variations of the quantum state, and study the logical consequences. We show that stationarity, modulo an entanglement-based assumption about the bulk, implies (i) $\mathfrak{c}_{\mathrm{tot}}$ is a non-negative constant that can be interpreted as the total central charge of the edge theory. (ii) $η$ is a cross-ratio, obeying the full set of mathematical consistency rules, which further indicates the existence of a distance measure of the edge with global conformal invariance. Thus, the conformal geometry emerges from a simple assumption on groundstate entanglement. We show that stationarity of $\mathfrak{c}_{\mathrm{tot}}$ is equivalent to a vector fixed-point equation involving $η$, making our assumption locally checkable. We also derive similar results for 1+1D systems under a suitable set of assumptions.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 48+31 pages, 25 figures

arXiv:2404.03691 [pdf, other]

Upgrade of NaI(Tl) crystal encapsulation for the NEON experiment

Authors: J. J. Choi, E. J. Jeon, J. Y. Kim, K. W. Kim, S. H. Kim, S. K. Kim, Y. D. Kim, Y. J. Ko, B. C. Koh, C. Ha, B. J. Park, S. H. Lee, I. S. Lee, H. Lee, H. S. Lee, J. Lee, Y. M. Oh

Abstract: The Neutrino Elastic-scattering Observation with NaI(Tl) experiment (NEON) aims to detect coherent elastic neutrino-nucleus scattering (CE$\nu$NS) in a NaI(Tl) crystal using reactor anti-electron neutrinos at the Hanbit nuclear power plant complex. A total of 13.3 kg of NaI(Tl) crystals were initially installed in December 2020 at the tendon gallery, 23.7$\pm$0.3 m away from the reactor core, which operates at a thermal power of 2.8 GW. Initial engineering operation was performed from May 2021 to March 2022 and observed unexpected photomultiplier-induced noise and a decreased light yield that were caused by leakage of liquid scintillator into the detector due to weakness of detector encapsulation. We upgraded the detector encapsulation design to prevent the leakage of the liquid scintillator. Meanwhile two small-sized detectors were replaced with larger ones resulting in a total mass of 16.7 kg. With this new design implementation, the detector system has been operating stably since April 2022 for over a year without detector gain drop. In this paper, we present an improved crystal encapsulation design and stability of the NEON experiment.

Submitted 28 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.03138 [pdf, other]

Discontinuity-preserving Normal Integration with Auxiliary Edges

Authors: Hyomin Kim, Yucheol Jung, Seungyong Lee

Abstract: Many surface reconstruction methods incorporate normal integration, which is a process to obtain a depth map from surface gradients. In this process, the input may represent a surface with discontinuities, e.g., due to self-occlusion. To reconstruct an accurate depth map from the input normal map, hidden surface gradients occurring from the jumps must be handled. To model these jumps correctly, we design a novel discretization scheme for the domain of normal integration. Our key idea is to introduce auxiliary edges, which bridge between piecewise-smooth patches in the domain so that the magnitude of hidden jumps can be explicitly expressed. Using the auxiliary edges, we design a novel algorithm to optimize the discontinuity and the depth map from the input normal map. Our method optimizes discontinuities by using a combination of iterative re-weighted least squares and iterative filtering of the jump magnitudes on auxiliary edges to provide strong sparsity regularization. Compared to previous discontinuity-preserving normal integration methods, which model the magnitudes of jumps only implicitly, our method reconstructs subtle discontinuities accurately thanks to our explicit representation of jumps allowing for strong sparsity regularization.

Submitted 3 April, 2024; originally announced April 2024.

Comments: To appear at CVPR 2024. For supplementary video, see https://youtu.be/MTTcW5kAOFE

ACM Class: I.4.5

arXiv:2404.02405 [pdf, other]

TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression

Authors: Ho-Joong Kim, Jung-Ho Hong, Heejo Kong, Seong-Whan Lee

Abstract: In this paper, we investigate that the normalized coordinate expression is a key factor as reliance on hand-crafted components in query-based detectors for temporal action detection (TAD). Despite significant advancements towards an end-to-end framework in object detection, query-based detectors have been limited in achieving full end-to-end modeling in TAD. To address this issue, we propose TE-TAD, a full end-to-end temporal action detection transformer that integrates time-aligned coordinate expression. We reformulate coordinate expression utilizing actual timeline values, ensuring length-invariant representations from the extremely diverse video duration environment. Furthermore, our proposed adaptive query selection dynamically adjusts the number of queries based on video length, providing a suitable solution for varying video durations compared to a fixed query set. Our approach not only simplifies the TAD process by eliminating the need for hand-crafted components but also significantly improves the performance of query-based detectors. Our TE-TAD outperforms the previous query-based detectors and achieves competitive performance compared to state-of-the-art methods on popular benchmark datasets. Code is available at: https://github.com/Dotori-HJ/TE-TAD

Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.02342 [pdf, other]

A Computational Analysis of Lyric Similarity Perception

Authors: Haven Kim, Taketo Akama

Abstract: In musical compositions that include vocals, lyrics significantly contribute to artistic expression. Consequently, previous studies have introduced the concept of a recommendation system that suggests lyrics similar to a user's favorites or personalized preferences, aiding in the discovery of lyrics among millions of tracks. However, many of these systems do not fully consider human perceptions of lyric similarity, primarily due to limited research in this area. To bridge this gap, we conducted a comparative analysis of computational methods for modeling lyric similarity with human perception. Results indicated that computational models based on similarities between embeddings from pre-trained BERT-based models, the audio from which the lyrics are derived, and phonetic components are indicative of perceptual lyric similarity. This finding underscores the importance of semantic, stylistic, and phonetic similarities in human perception about lyric similarity. We anticipate that our findings will enhance the development of similarity-based lyric recommendation systems by offering pseudo-labels for neural network development and introducing objective evaluation metrics.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01807 [pdf, other]

Stacking of charge-density waves in 2H-NbSe$_2$ bilayers

Authors: Fabrizio Cossu, Dhani Nafday, Krisztian Palotás, Mehdi Biderang, Heung-Sik Kim, Alireza Akbari, Igor Di Marco

Abstract: We employ ab-initio electronic structure calculations to investigate the charge-density waves and periodic lattice distortions in bilayer 2H-NbSe$_2$. We demonstrate that the vertical stacking can give rise to a variety of patterns that may lower the symmetry of the CDW exhibited separately by the two composing 1H-NbSe$_2$ monolayers. The general tendency to a spontaneous symmetry breaking observed in the ground state and the first excited states is shown to originate from a non-negligible inter-layer coupling. Simulated images for scanning tunnelling microscopy (STM) as well as diffraction/scattering patterns show signatures of the different stacking orders. This may not only be useful to reinterpret past experiments on surfaces and thin films, but may also be exploited to devise ad-hoc experiments for the investigation of the stacking order in 2H-NbSe$_2$. We anticipate that our analysis does not only apply to the 2H-NbSe$_2$ bilayer, but is also relevant for thin films and bulk, whose smallest centro-symmetric component is indeed the bilayer. Finally, our results illustrate clearly that the vertical stacking is not only important for 1T structures, as exemplified by the metal-to-insulator transition observed in 1T-TaS$_2$, but seems to be a general feature of metallic layered transition metal dichalcogenides as well.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 11 pages, 5 figures

arXiv:2404.01661 [pdf, other]

Interaction-Aware Vehicle Motion Planning with Collision Avoidance Constraints in Highway Traffic

Authors: Dongryul Kim, Hyeonjeong Kim, Kyoungseok Han

Abstract: This paper proposes collision-free optimal trajectory planning for autonomous vehicles in highway traffic, where vehicles need to deal with the interaction among each other. To address this issue, a novel optimal control framework is suggested, which couples the trajectory of surrounding vehicles with collision avoidance constraints. Additionally, we describe a trajectory optimization technique under state constraints, utilizing a planner based on Pontryagin's Minimum Principle, capable of numerically solving collision avoidance scenarios with surrounding vehicles. Simulation results demonstrate the effectiveness of the proposed approach regarding interaction-based motion planning for different scenarios.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01628 [pdf, other]

Learning Equi-angular Representations for Online Continual Learning

Authors: Minhyuk Seo, Hyunseo Koh, Wonje Jeung, Minjae Lee, San Kim, Hankook Lee, Sungjun Cho, Sungik Choi, Hyunwoo Kim, Jonghyun Choi

Abstract: Online continual learning suffers from an underfitted solution due to insufficient training for prompt model update (e.g., single-epoch training). To address the challenge, we propose an efficient online continual learning method using the neural collapse phenomenon. In particular, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space so that the continuously learned model with a single epoch can better fit to the streamed data by proposing preparatory data training and residual correction in the representation space. With an extensive set of empirical validations using CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K, we show that our proposed method outperforms state-of-the-art methods by a noticeable margin in various online continual learning scenarios such as disjoint and Gaussian scheduled continuous (i.e., boundary-free) data setups.

Submitted 2 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.01042 [pdf, ps, other]

Multiplicative Hecke operators and their applications

Authors: Gyucheol Shin, Chang Heon Kim

Abstract: In this paper, we define the multiplicative Hecke operators $\mathcal{T}(n)$ for any positive integer on the integral weight meromorphic modular forms for $Γ_{0}(N)$. We then show that they have properties similar to those of additive Hecke operators. Moreover, we prove that multiplicative Hecke eigenforms with integer Fourier coefficients are eta quotients, and vice versa. In addition, we prove that the Borcherds product and logarithmic derivative are Hecke equivariant with the multiplicative Hecke operators and the Hecke operators on the half-integral weight harmonic weak Maass forms and weight 2 meromorphic modular forms.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 22 pages

MSC Class: 11F03; 11F12; 11F20; 11F25; 11F37

arXiv:2404.00851 [pdf, other]

Prompt Learning via Meta-Regularization

Authors: Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim

Abstract: Pre-trained vision-language models have shown impressive success on various computer vision tasks with their zero-shot generalizability. Recently, prompt learning approaches have been explored to efficiently and effectively adapt the vision-language models to a variety of downstream tasks. However, most existing prompt learning methods suffer from task overfitting since the general knowledge of the pre-trained vision language models is forgotten while the prompts are finetuned on a small data set from a specific target task. To address this issue, we propose a Prompt Meta-Regularization (ProMetaR) to improve the generalizability of prompt learning for vision-language models. Specifically, ProMetaR meta-learns both the regularizer and the soft prompts to harness the task-specific knowledge from the downstream tasks and task-agnostic general knowledge from the vision-language models. Further, ProMetaR augments the task to generate multiple virtual tasks to alleviate the meta-overfitting. In addition, we provide the analysis to comprehend how ProMetaR improves the generalizability of prompt tuning in the perspective of the gradient alignment. Our extensive experiments demonstrate that our ProMetaR improves the generalizability of conventional prompt learning methods under base-to-base/base-to-new and domain generalization settings. The code of ProMetaR is available at https://github.com/mlvlab/ProMetaR.

Submitted 31 March, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.00830 [pdf, other]

2D Ego-Motion with Yaw Estimation using Only mmWave Radars via Two-Way weighted ICP

Authors: Hojune Kim, Hyesu Jang, Ayoung Kim

Abstract: The interest in single-chip mmWave Radar is driven by their compact form factor, cost-effectiveness, and robustness under harsh environmental conditions. Despite its promising attributes, the principal limitation of mmWave radar lies in its capacity for autonomous yaw rate estimation. Conventional solutions have often resorted to integrating inertial measurement unit (IMU) or deploying multiple radar units to circumvent this shortcoming. This paper introduces an innovative methodology for two-dimensional ego-motion estimation, focusing on yaw rate deduction, utilizing solely mmWave radar sensors. By applying a weighted Iterated Closest Point (ICP) algorithm to register processed points derived from heatmap data, our method facilitates 2D ego-motion estimation devoid of prior information. Through experimental validation, we verified the effectiveness and promise of our technique for ego-motion estimation using exclusively radar data.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.00678 [pdf, other]

OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees

Authors: Hakyeong Kim, Andreas Meuleman, Hyeonjoong Jang, James Tompkin, Min H. Kim

Abstract: We present a method to reconstruct indoor and outdoor static scene geometry and appearance from an omnidirectional video moving in a small circular sweep. This setting is challenging because of the small baseline and large depth ranges, making it difficult to find ray crossings. To better constrain the optimization, we estimate geometry as a signed distance field within a spherical binoctree data structure and use a complementary efficient tree traversal strategy based on a breadth-first search for sampling. Unlike regular grids or trees, the shape of this structure well-matches the camera setting, creating a better memory-quality trade-off. From an initial depth estimate, the binoctree is adaptively subdivided throughout the optimization; previous methods use a fixed depth that leaves the scene undersampled. In comparison with three neural optimization methods and two non-neural methods, ours shows decreased geometry error on average, especially in a detailed scene, while significantly reducing the required number of voxels to represent such details.

Submitted 31 March, 2024; originally announced April 2024.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2404.00676 [pdf, other]

OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos

Authors: Dongyoung Choi, Hyeonjoong Jang, Min H. Kim

Abstract: Omnidirectional cameras are extensively used in various applications to provide a wide field of vision. However, they face a challenge in synthesizing novel views due to the inevitable presence of dynamic objects, including the photographer, in their wide field of view. In this paper, we introduce a new approach called Omnidirectional Local Radiance Fields (OmniLocalRF) that can render static-only scene views, removing and inpainting dynamic objects simultaneously. Our approach combines the principles of local radiance fields with the bidirectional optimization of omnidirectional rays. Our input is an omnidirectional video, and we evaluate the mutual observations of the entire angle between the previous and current frames. To reduce ghosting artifacts of dynamic objects and inpaint occlusions, we devise a multi-resolution motion mask prediction module. Unlike existing methods that primarily separate dynamic components through the temporal domain, our method uses multi-resolution neural feature planes for precise segmentation, which is more suitable for long 360-degree videos. Our experiments validate that OmniLocalRF outperforms existing methods in both qualitative and quantitative metrics, especially in scenarios with complex real-world scenes. In particular, our approach eliminates the need for manual interaction, such as drawing motion masks by hand and additional pose estimation, making it a highly effective and efficient solution.

Submitted 31 March, 2024; originally announced April 2024.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2404.00376 [pdf, other]

Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks

Authors: Hyunjae Kim, Hyeon Hwang, Jiwoo Lee, Sihyeon Park, Dain Kim, Taewhoo Lee, Chanwoong Yoon, Jiwoong Sohn, Donghee Choi, Jaewoo Kang

Abstract: While recent advancements in commercial large language models (LM) have shown promising results in medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameters often result in insufficient multi-step reasoning capabilities required for solving complex medical problems. To address this, we introduce Meerkat, a new family of medical AI systems ranging from 7 to 70 billion parameters. The models were trained using our new synthetic dataset consisting of high-quality chain-of-thought reasoning paths sourced from 18 medical textbooks, along with diverse instruction-following datasets. Our systems achieved remarkable accuracy across six medical benchmarks, surpassing the previous best models such as MediTron and BioMistral, and GPT-3.5 by a large margin. Notably, Meerkat-7B surpassed the passing threshold of the United States Medical Licensing Examination (USMLE) for the first time for a 7B-parameter model, while Meerkat-70B outperformed GPT-4 by an average of 1.3%. Additionally, Meerkat-70B correctly diagnosed 21 out of 38 complex clinical cases, outperforming humans' 13.8 and closely matching GPT-4's 21.8. Our systems offered more detailed free-form responses to clinical queries compared to existing small models, approaching the performance level of large commercial models. This significantly narrows the performance gap with large LMs, showcasing its effectiveness in addressing complex medical challenges.

Submitted 30 June, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Added new LLaMA-3-based models and experiments on NEJM case challenges

arXiv:2404.00201 [pdf, other]

Angular analysis of $B \to K^* e^+ e^-$ in the low-$q^2$ region with new electron identification at Belle

Authors: Belle Collaboration, D. Ferlewicz, P. Urquijo, I. Adachi, K. Adamczyk, H. Aihara, D. M. Asner, H. Atmacan, R. Ayad, V. Babu, Sw. Banerjee, P. Behera, K. Belous, J. Bennett, M. Bessner, V. Bhardwaj, B. Bhuyan, T. Bilka, D. Biswas, D. Bodrov, M. Bračko, P. Branchini, T. E. Browder, A. Budano, M. Campajola , et al. (145 additional authors not shown)

Abstract: We perform an angular analysis of the $B\to K^* e^+ e^-$ decay for the dielectron mass squared, $q^2$, range of $0.0008$ to $1.1200 ~\text{GeV}^2 /c^4$ using the full Belle data set in the $K^{*0} \to K^+ π^-$ and $K^{*+} \to K_S^0 π^+$ channels, incorporating new methods of electron identification to improve the statistical power of the data set. This analysis is sensitive to contributions from right-handed currents from physics beyond the Standard Model by constraining the Wilson coefficients $\mathcal{C}_7^{(\prime)}$. We perform a fit to the $B\to K^* e^+ e^-$ differential decay rate and measure the imaginary component of the transversality amplitude to be $A_T^{\rm Im} = -1.27 \pm 0.52 \pm 0.12$, and the $K^*$ transverse asymmetry to be $A_T^{(2)} = 0.52 \pm 0.53 \pm 0.11$. The resulting constraints on the value of $\mathcal{C}_7^{\prime}$ are consistent with the Standard Model within a $2σ$ confidence interval.

Submitted 2 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

Comments: Submitted to PRD

Report number: Belle preprint 2023-20, KEK preprint 2023-38

arXiv:2403.19785 [pdf, other]

Integrated Communication, Localization, and Sensing in 6G D-MIMO Networks

Authors: Hao Guo, Henk Wymeersch, Behrooz Makki, Hui Chen, Yibo Wu, Giuseppe Durisi, Musa Furkan Keskin, Mohammad H. Moghaddam, Charitha Madapatha, Han Yu, Peter Hammarberg, Hyowon Kim, Tommy Svensson

Abstract: Future generations of mobile networks call for concurrent sensing and communication functionalities in the same hardware and/or spectrum. Compared to communication, sensing services often suffer from limited coverage, due to the high path loss of the reflected signal and the increased infrastructure requirements. To provide a more uniform quality of service, distributed multiple input multiple output (D-MIMO) systems deploy a large number of distributed nodes and efficiently control them, making distributed integrated sensing and communications (ISAC) possible. In this paper, we investigate ISAC in D-MIMO through the lens of different design architectures and deployments, revealing both conflicts and synergies. In addition, simulation and demonstration results reveal both opportunities and challenges towards the implementation of ISAC in D-MIMO.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19270 [pdf, other]

sDPO: Don't Use Your Data All at Once

Authors: Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim, Chanjun Park

Abstract: As development of large language models (LLM) progresses, aligning them with human preferences has become increasingly important. We propose stepwise DPO (sDPO), an extension of the recently popularized direct preference optimization (DPO) for alignment tuning. This approach involves dividing the available preference datasets and utilizing them in a stepwise manner, rather than employing it all at once. We demonstrate that this method facilitates the use of more precisely aligned reference models within the DPO training framework. Furthermore, sDPO trains the final model to be more performant, even outperforming other popular LLMs with more parameters.

Submitted 28 March, 2024; originally announced March 2024.

Showing 151–200 of 5,715 results for author: Kim, H