Exploring Camera Encoder Designs for Autonomous Driving Perception

Authors: Barath Lakshmanan, Joshua Chen, Shiyi Lan, Maying Shen, Zhiding Yu, Jose M. Alvarez

Abstract: The cornerstone of autonomous vehicles (AV) is a solid perception system, where camera encoders play a crucial role. Existing works usually leverage pre-trained Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) designed for general vision tasks, such as image classification, segmentation, and 2D detection. Although those well-known architectures have achieved state-of-the-art accuracy in AV-related tasks, e.g., 3D Object Detection, there remains significant potential for improvement in network design due to the nuanced complexities of industrial-level AV datasets. Moreover, existing public AV benchmarks usually contain insufficient data, which might lead to inaccurate evaluation of those architectures. To reveal the AV-specific model insights, we start from a standard general-purpose encoder, ConvNeXt, and progressively transform the design. We adjust different design parameters including width and depth of the model, stage compute ratio, attention mechanisms, and input resolution, supported by systematic analysis of each modification. This customization yields an architecture optimized for AV camera encoding, achieving an 8.79% mAP improvement over the baseline. We believe our effort could become a sweet cookbook of image encoders for AV and pave the way to the next-level drive system.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2406.12079 [pdf, other]

Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint

Authors: Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose Alvarez

Abstract: As we push the boundaries of performance in various vision tasks, the models grow in size correspondingly. To keep up with this growth, we need very aggressive pruning techniques for efficient inference and deployment on edge devices. Existing pruning approaches are limited to channel pruning and struggle with aggressive parameter reductions. In this paper, we propose a novel multi-dimensional pruning framework that jointly optimizes pruning across channels, layers, and blocks while adhering to latency constraints. We develop a latency modeling technique that accurately captures model-wide latency variations during pruning, which is crucial for achieving optimal latency-accuracy trade-offs at high pruning ratios. We reformulate pruning as a Mixed-Integer Nonlinear Program (MINLP) to efficiently determine the optimal pruned structure with only a single pass. Our extensive results demonstrate substantial improvements over previous methods, particularly at large pruning ratios. In classification, our method significantly outperforms prior art HALP with a Top-1 accuracy of 70.0 (vs. 68.6) and an FPS of 5262 im/s (vs. 4101 im/s). In 3D object detection, we establish a new state-of-the-art by pruning StreamPETR at a 45% pruning ratio, achieving higher FPS (37.3 vs. 31.7) and mAP (0.451 vs. 0.449) than the dense baseline.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Under Review

arXiv:2406.11556 [pdf]

PLATO's signal and noise budget

Authors: Anko Börner, Carsten Paproth, Juan Cabrera, Martin Pertenais, Heike Rauer, J. Miguel Mas-Hesse, Isabella Pagano, Jose Lorenzo Alvarez, Anders Erikson, Denis Grießbach, Yves Levillain, Demetrio Magrin, Valery Mogulsky, Sami-Matias Niemi, Thibaut Prod'homme, Sara Regibo, Joris De Ridder, Steve Rockstein, Reza Samadi, Dimitri Serrano-Velarde, Alan Smith, Peter Verhoeve, Dave Walton

Abstract: ESA's PLATO mission aims at the detection and characterization of terrestrial planets around solar-type stars as well as the study of host star properties. The noise-to-signal ratio (NSR) is the main performance parameter of the PLATO instrument, which consists of 24 Normal Cameras and 2 Fast Cameras. In order to justify, verify and break down NSR-relevant requirements, the software simulator PINE was developed. PINE models the signal pathway from a target star to the digital output of a camera based on physical models and considers the major noise contributors. In this paper, the simulator's coarse mode is introduced, which allows fast performance analyses at instrument level. The added value of PINE is illustrated by exemplary applications.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 17 pages, 8 figures, 3 tables

arXiv:2406.11218 [pdf]

Building another Spanish dictionary, this time with GPT-4

Authors: Miguel Ortega-Martín, Óscar García-Sierra, Alfonso Ardoiz, Juan Carlos Armenteros, Ignacio Garrido, Jorge Álvarez, Camilo Torrón, Iñigo Galdeano, Ignacio Arranz, Oleg Vorontsov, Adrián Alonso

Abstract: We present the "Spanish Built Factual Freectianary 2.0" (Spanish-BFF-2) as the second iteration of an AI-generated Spanish dictionary. Previously, we developed the inaugural version of this unique free dictionary employing GPT-3. In this study, we aim to improve the dictionary by using GPT-4-turbo instead. Furthermore, we explore improvements made to the initial version and compare the performance of both models.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.06978 [pdf, other]

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

Authors: Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez

Abstract: We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model. This approach uses knowledge distillation from both human and rule-based teachers to train the student model, which features a multi-head decoder to learn diverse trajectory candidates tailored to various evaluation metrics. With the knowledge of rule-based teachers, Hydra-MDP learns how the environment influences the planning in an end-to-end manner instead of resorting to non-differentiable post-processing. This method achieves 1st place in the Navsim challenge, demonstrating significant improvements in generalization across diverse driving environments and conditions. Code will be available at https://github.com/NVlabs/Hydra-MDP.

Submitted 19 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: The 1st place solution of End-to-end Driving at Scale at the CVPR 2024 Autonomous Grand Challenge

arXiv:2406.04484 [pdf, ps, other]

Step Out and Seek Around: On Warm-Start Training with Incremental Data

Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jose M. Alvarez

Abstract: Data often arrives in sequence over time in real-world deep learning applications such as autonomous driving. When new training data is available, training the model from scratch undermines the benefit of leveraging the learned knowledge, leading to significant training costs. Warm-starting from a previously trained checkpoint is the most intuitive way to retain knowledge and advance learning. However, existing literature suggests that this warm-starting degrades generalization. In this paper, we advocate for warm-starting but stepping out of the previous converging point, thus allowing a better adaptation to new data without compromising previous knowledge. We propose Knowledge Consolidation and Acquisition (CKCA), a continuous model improvement algorithm with two novel components. First, we propose a novel feature regularization (FeatReg) to retain and refine knowledge from existing checkpoints. Second, we propose adaptive knowledge distillation (AdaKD), a novel approach to forget mitigation and knowledge transfer. We tested our method on ImageNet using multiple splits of the training data. Our approach achieves up to 8.39% higher top-1 accuracy than vanilla warm-starting and consistently outperforms the prior art by a large margin.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.20153 [pdf, other]

Decoherence-assisted quantum key distribution

Authors: Daniel R. Sabogal, Daniel F. Urrego, Juan Rafael Álvarez, Andrés F. Herrera, Juan P. Torres, Alejandra Valencia

Abstract: We present a theoretical and experimental study of a controllable decoherence-assisted quantum key distribution scheme. Our method is based on the possibility of introducing controllable decoherence to polarization qubits using the spatial degree of freedom of light. We show that our method reduces the amount of information that an eavesdropper can obtain in the BB84 protocol under the entangling probe attack. We demonstrate experimentally that Alice and Bob can agree on a scheme that gives low values of the quantum bit error rate, despite the presence of a large amount of decoherence in the transmission channel of the BB84 protocol.
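
As a rough illustration of the quantity reported in this abstract, the quantum bit error rate (QBER) in BB84 is the fraction of sifted bits (positions where Alice and Bob chose the same basis) on which their values disagree. The sketch below is not the authors' code; the function name `sift_and_qber` and the basis labels "Z" and "X" are assumptions made for illustration.

```python
def sift_and_qber(alice_bits, alice_bases, bob_bits, bob_bases):
    """Return the QBER over the sifted key (hypothetical helper).

    Only positions where Alice and Bob used the same basis are kept;
    the QBER is the fraction of kept positions whose bits differ.
    """
    kept = [
        (a, b)
        for a, a_basis, b, b_basis in zip(alice_bits, alice_bases, bob_bits, bob_bases)
        if a_basis == b_basis
    ]
    if not kept:
        raise ValueError("no positions with matching bases; QBER is undefined")
    errors = sum(a != b for a, b in kept)
    return errors / len(kept)


if __name__ == "__main__":
    # Three of four positions share a basis; one of those three disagrees.
    qber = sift_and_qber([0, 1, 1, 0], "ZXZX", [0, 1, 0, 1], "ZXZZ")
    print(f"QBER = {qber:.3f}")
```

In a real experiment the QBER is estimated from a publicly compared subset of the sifted key; this sketch simply computes it over all sifted positions.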

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.18902 [pdf, other]

A Causal Framework for Evaluating Deferring Systems

Authors: Filippo Palomba, Andrea Pugnana, José Manuel Alvarez, Salvatore Ruggieri

Abstract: Deferring systems extend supervised Machine Learning (ML) models with the possibility to defer predictions to human experts. However, evaluating the impact of a deferring strategy on system accuracy is still an overlooked area. This paper fills this gap by evaluating deferring systems through a causal lens. We link the potential outcomes framework for causal inference with deferring systems. This allows us to identify the causal impact of the deferring strategy on predictive accuracy. We distinguish two scenarios. In the first one, we can access both the human and the ML model predictions for the deferred instances. In such a case, we can identify the individual causal effects for deferred instances and aggregates of them. In the second scenario, only human predictions are available for the deferred instances. In this case, we can resort to regression discontinuity design to estimate a local causal effect. We empirically evaluate our approach on synthetic and real datasets for seven deferring systems from the literature.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.17187 [pdf, other]

Memorize What Matters: Emergent Scene Decomposition from Multitraverse

Authors: Yiming Li, Zehong Wang, Yue Wang, Zhiding Yu, Zan Gojcic, Marco Pavone, Chen Feng, Jose M. Alvarez

Abstract: Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust differentiable rendering problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residuals mining, and robust optimization, 3DGM jointly performs 2D segmentation and 3D mapping without human intervention. We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify the effectiveness and potential of our method for self-driving and robotics.

Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Comments: Project page: https://3d-gaussian-mapping.github.io; Code and data: https://github.com/NVlabs/3DGM

arXiv:2405.13693 [pdf, ps, other]

Uncovering Algorithmic Discrimination: An Opportunity to Revisit the Comparator

Authors: Jose M. Alvarez, Salvatore Ruggieri

Abstract: Causal reasoning, in particular counterfactual reasoning, plays a central role in testing for discrimination. Counterfactual reasoning materializes when testing for discrimination, in what is known as the counterfactual model of discrimination, when we compare the discrimination comparator with the discrimination complainant, where the comparator is a similar (or similarly situated) profile to that of the complainant used for testing the discrimination claim of the complainant. In this paper, we revisit the comparator by presenting two kinds of comparators based on the sort of causal intervention we want to represent. We present the ceteris paribus and the mutatis mutandis comparator, where the former is the standard and the latter is a new kind of comparator. We argue for the use of the mutatis mutandis comparator, which is built on the fairness given the difference notion, for testing future algorithmic discrimination cases.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13493 [pdf, other]

Euclid. III. The NISP Instrument

Authors: Euclid Collaboration, K. Jahnke, W. Gillard, M. Schirmer, A. Ealet, T. Maciaszek, E. Prieto, R. Barbier, C. Bonoli, L. Corcione, S. Dusini, F. Grupp, F. Hormuth, S. Ligori, L. Martin, G. Morgante, C. Padilla, R. Toledo-Moreo, M. Trifoglio, L. Valenziano, R. Bender, F. J. Castander, B. Garilli, P. B. Lilje, H. -W. Rix , et al. (412 additional authors not shown)

Abstract: The Near-Infrared Spectrometer and Photometer (NISP) on board the Euclid satellite provides multiband photometry and R>=450 slitless grism spectroscopy in the 950-2020 nm wavelength range. In this reference article we illuminate the background of NISP's functional and calibration requirements, describe the instrument's integral components, and provide all its key properties. We also sketch the processes needed to understand how NISP operates and is calibrated, and its technical potentials and limitations. Links to articles providing more details and technical background are included. NISP's 16 HAWAII-2RG (H2RG) detectors with a plate scale of 0.3" pix^-1 deliver a field-of-view of 0.57 deg^2. In photo mode, NISP reaches a limiting magnitude of ~24.5 AB mag in three photometric exposures of about 100 s exposure time, for point sources and with a signal-to-noise ratio (SNR) of 5. For spectroscopy, NISP's point-source sensitivity is a SNR = 3.5 detection of an emission line with flux ~2x10^-16 erg/s/cm^2 integrated over two resolution elements of 13.4 Å, in 3x560 s grism exposures at 1.6 μm (redshifted Hα). Our calibration includes on-ground and in-flight characterisation and monitoring of detector baseline, dark current, non-linearity, and sensitivity, to guarantee a relative photometric accuracy of better than 1.5%, and relative spectrophotometry to better than 0.7%. The wavelength calibration must be better than 5 Å.
NISP is the state-of-the-art instrument in the NIR for all science beyond small areas available from HST and JWST - and an enormous advance due to its combination of field size and high throughput of telescope and instrument. During Euclid's 6-year survey covering 14000 deg^2 of extragalactic sky, NISP will be the backbone for determining distances of more than a billion galaxies. Its NIR data will become a rich reference imaging and spectroscopy data set for the coming decades.

Submitted 22 May, 2024; originally announced May 2024.

Comments: Paper submitted as part of the A&A special issue 'Euclid on Sky', which contains Euclid key reference papers and first results from the Euclid Early Release Observations

arXiv:2405.13492 [pdf, other]

Euclid. II. The VIS Instrument

Authors: Euclid Collaboration, M. Cropper, A. Al-Bahlawan, J. Amiaux, S. Awan, R. Azzollini, K. Benson, M. Berthe, J. Boucher, E. Bozzo, C. Brockley-Blatt, G. P. Candini, C. Cara, R. A. Chaudery, R. E. Cole, P. Danto, J. Denniston, A. M. Di Giorgio, B. Dryer, J. Endicott, J. -P. Dubois, M. Farina, E. Galli, L. Genolet, J. P. D. Gow , et al. (403 additional authors not shown)

Abstract: This paper presents the specification, design, and development of the Visible Camera (VIS) on the ESA Euclid mission. VIS is a large optical-band imager with a field of view of 0.54 deg^2 sampled at 0.1" with an array of 609 Megapixels and spatial resolution of 0.18". It will be used to survey approximately 14,000 deg^2 of extragalactic sky to measure the distortion of galaxies in the redshift range z=0.1-1.5 resulting from weak gravitational lensing, one of the two principal cosmology probes of Euclid. With photometric redshifts, the distribution of dark matter can be mapped in three dimensions, and, from how this has changed with look-back time, the nature of dark energy and theories of gravity can be constrained. The entire VIS focal plane will be transmitted to provide the largest images of the Universe from space to date, reaching m_AB>24.5 with S/N >10 in a single broad I_E~(r+i+z) band over a six year survey. The particularly challenging aspects of the instrument are the control and calibration of observational biases, which lead to stringent performance requirements and calibration regimes. With its combination of spatial resolution, calibration knowledge, depth, and area covering most of the extra-Galactic sky, VIS will also provide a legacy data set for many other fields. This paper discusses the rationale behind the VIS concept and describes the instrument design and development before reporting the pre-launch performance derived from ground calibrations and brief results from the in-orbit commissioning.
VIS should reach fainter than m_AB=25 with S/N>10 for galaxies of full-width half-maximum of 0.3" in a 1.3" diameter aperture over the Wide Survey, and m_AB>26.4 for a Deep Survey that will cover more than 50 deg^2. The paper also describes how VIS works with the other Euclid components of survey, telescope, and science data processing to extract the cosmological information.

Submitted 22 May, 2024; originally announced May 2024.

Comments: Paper submitted as part of the A&A special issue 'Euclid on Sky', which contains Euclid key reference papers and first results from the Euclid Early Release Observations

arXiv:2405.13491 [pdf, other]

Euclid. I. Overview of the Euclid mission

Authors: Euclid Collaboration, Y. Mellier, Abdurro'uf, J. A. Acevedo Barroso, A. Achúcarro, J. Adamek, R. Adam, G. E. Addison, N. Aghanim, M. Aguena, V. Ajani, Y. Akrami, A. Al-Bahlawan, A. Alavi, I. S. Albuquerque, G. Alestas, G. Alguero, A. Allaoui, S. W. Allen, V. Allevato, A. V. Alonso-Tetilla, B. Altieri, A. Alvarez-Candal, A. Amara, L. Amendola , et al. (1086 additional authors not shown)

Abstract: The current standard model of cosmology successfully describes a variety of measurements, but the nature of its main ingredients, dark matter and dark energy, remains unknown. Euclid is a medium-class mission in the Cosmic Vision 2015-2025 programme of the European Space Agency (ESA) that will provide high-resolution optical imaging, as well as near-infrared imaging and spectroscopy, over about 14,000 deg^2 of extragalactic sky. In addition to accurate weak lensing and clustering measurements that probe structure formation over half of the age of the Universe, its primary probes for cosmology, these exquisite data will enable a wide range of science. This paper provides a high-level overview of the mission, summarising the survey characteristics, the various data-processing steps, and data products. We also highlight the main science objectives and expected performance.

Submitted 22 May, 2024; originally announced May 2024.

Comments: Paper submitted as part of the A&A special issue 'Euclid on Sky'

arXiv:2405.08975 [pdf, other]

A distribution-free valid p-value for finite samples of bounded random variables

Authors: Joaquin Alvarez

Abstract: We build a valid p-value based on a concentration inequality for bounded random variables introduced by Pelekis, Ramon and Wang. The motivation behind this work is the calibration of predictive algorithms in a distribution-free setting. The super-uniform p-value is tighter than Hoeffding and Bentkus alternatives in certain regions. Even though we are motivated by a calibration setting in a machine learning context, the ideas presented in this work are also relevant in classical statistical inference. Furthermore, we compare the power of a collection of valid p-values for bounded losses, which are presented in previous literature.
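
For context, the Hoeffding alternative that this abstract compares against can be sketched as follows. This is the standard Hoeffding-based p-value for i.i.d. variables bounded in [0, 1], not the Pelekis-Ramon-Wang construction the paper proposes. The function name `hoeffding_p_value` and the choice of null hypothesis H0: E[X] >= mu0 are assumptions made for illustration.

```python
import math


def hoeffding_p_value(sample_mean, n, mu0):
    """Hoeffding p-value for H0: E[X] >= mu0, with X i.i.d. in [0, 1].

    Hoeffding's inequality gives P(mean <= mu - t) <= exp(-2 n t^2),
    so p = exp(-2 n max(mu0 - mean, 0)^2) is super-uniform under H0.
    (Hypothetical helper; the name and null are illustrative.)
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if not 0.0 <= sample_mean <= 1.0 or not 0.0 <= mu0 <= 1.0:
        raise ValueError("sample_mean and mu0 must lie in [0, 1]")
    gap = max(mu0 - sample_mean, 0.0)
    return math.exp(-2.0 * n * gap ** 2)


if __name__ == "__main__":
    # n = 100 observations with mean 0.4, tested against mu0 = 0.5.
    print(hoeffding_p_value(0.4, 100, 0.5))
```

With n = 100, a sample mean of 0.4 and mu0 = 0.5, the exponent is -2 * 100 * 0.01 = -2, so the p-value is exp(-2), roughly 0.135. Small values are evidence against the null that the mean is at least mu0.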

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.01542 [pdf]

doi 10.13182/FST11-A12443

Materials research for HiPER laser fusion facilities: chamber wall, structural material and final optics

Authors: J. Alvarez, A. Rivera, R. Gonzalez-Arrabal, D. Garoz, E. Del Rio, J. M. Perlado

Abstract: The European HiPER project aims to demonstrate commercial viability of inertial fusion energy within the following two decades. This goal requires an extensive Research & Development program on materials for different applications (e.g., first wall, structural components and final optics). In this paper we will discuss our activities in the framework of HiPER to develop materials studies for the different areas of interest. The chamber first wall will have to withstand explosions of at least 100 MJ at a repetition rate of 5-10 Hz. If direct drive targets are used, a dry wall chamber operated in vacuum is preferable. In this situation the major threat for the wall stems from ions. For reasonably low chamber radius (5-10 m) new materials based on W and C are being investigated, e.g., engineered surfaces and nanostructured materials. Structural materials will be subject to high fluxes of neutrons leading to deleterious effects, such as swelling. Low activation advanced steels as well as new nanostructured materials are being investigated. The final optics lenses will not survive the extreme ion irradiation pulses originating in the explosions. Therefore, mitigation strategies are being investigated. In addition, efforts are being made to understand the optimized conditions that minimize the loss of optical properties under neutron and gamma irradiation.

Submitted 11 February, 2024; originally announced May 2024.

Journal ref: Fusion Science and Technology, vol. 60, n. 2, pp. 565-569, 2011

arXiv:2405.01533 [pdf, other]

OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

Authors: Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez

Abstract: The advances in multimodal large language models (MLLMs) have led to growing interest in LLM-based autonomous driving agents to leverage their strong reasoning capabilities. However, capitalizing on MLLMs' strong reasoning capabilities for improved planning behavior is challenging since planning requires full 3D situational awareness beyond 2D reasoning. To address this challenge, our work proposes a holistic framework for strong alignment between agent models and 3D driving tasks. Our framework starts with a novel 3D MLLM architecture that uses sparse queries to lift and compress visual representations into 3D before feeding them into an LLM. This query-based representation allows us to jointly encode dynamic objects and static map elements (e.g., traffic lanes), providing a condensed world model for perception-action alignment in 3D. We further propose OmniDrive-nuScenes, a new visual question-answering dataset challenging the true 3D situational awareness of a model with comprehensive visual question-answering (VQA) tasks, including scene description, traffic regulation, 3D grounding, counterfactual reasoning, decision making and planning. Extensive studies show the effectiveness of the proposed architecture as well as the importance of the VQA tasks for reasoning and planning in complex 3D scenes.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.14908 [pdf, other]

Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation

Authors: Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez, Miaomiao Liu

Abstract: This paper focuses on self-supervised monocular depth estimation in dynamic scenes trained on monocular videos. Existing methods jointly estimate pixel-wise depth and motion, relying mainly on an image reconstruction loss. Dynamic regions remain a critical challenge for these methods due to the inherent ambiguity in depth and motion estimation, resulting in inaccurate depth estimation. This paper proposes a self-supervised training framework exploiting pseudo depth labels for dynamic regions from training data. The key contribution of our framework is to decouple depth estimation for static and dynamic regions of images in the training data. We start with an unsupervised depth estimation approach, which provides reliable depth estimates for static regions and motion cues for dynamic regions and allows us to extract moving object information at the instance level. In the next stage, we use an object network to estimate the depth of those moving objects assuming rigid motions. Then, we propose a new scale alignment module to address the scale ambiguity between estimated depths for static and dynamic regions. We can then use the depth labels generated to train an end-to-end depth estimation network and improve its performance. Extensive experiments on the Cityscapes and KITTI datasets show that our self-training strategy consistently outperforms existing self/unsupervised depth estimation methods.

Submitted 23 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR2024

arXiv:2404.01990 [pdf, other]

What is Point Supervision Worth in Video Instance Segmentation?

Authors: Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar

Abstract: Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos. Conventional VIS methods rely on densely-annotated object masks which are expensive. We reduce the human annotations to only one point for each object in a video frame during training, and obtain high-quality mask predictions close to fully supervised models. Our proposed training method consists of a class-agnostic proposal generation module to provide rich negative samples and a spatio-temporal point-based matcher to match the object queries with the provided point annotations. Comprehensive experiments on three VIS benchmarks demonstrate competitive performance of the proposed framework, nearly matching fully supervised methods.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.09230 [pdf, other]

Improving Distant 3D Object Detection Using 2D Box Supervision

Authors: Zetong Yang, Zhiding Yu, Chris Choy, Renhao Wang, Anima Anandkumar, Jose M. Alvarez

Abstract: Improving the detection of distant 3D objects is an important yet challenging task. For camera-based 3D perception, the annotation of 3D bounding boxes relies heavily on LiDAR for accurate depth information. As such, the distance of annotation is often limited due to the sparsity of LiDAR points on distant objects, which hampers the capability of existing detectors for long-range scenarios. We address this challenge by considering only 2D box supervision for distant objects since they are easy to annotate. We propose LR3D, a framework that learns to recover the missing depth of distant objects. LR3D adopts an implicit projection head to learn the generation of mapping between 2D boxes and depth using the 3D supervision on close objects. This mapping allows the depth estimation of distant objects conditioned on their 2D boxes, making long-range 3D detection with 2D supervision feasible. Experiments show that without distant 3D annotations, LR3D allows camera-based methods to detect distant objects (over 200m) with comparable accuracy to full 3D supervision. Our framework is general, and could widely benefit 3D detection methods to a large extent.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.08708 [pdf, other]

The importance of stretching rate in achieving true stress relaxation in the elasto-capillary thinning of dilute solutions

Authors: Ann Aisling, Renee Saraka, Nicolas J. Alvarez

Abstract: This work focuses on inferring the molecular state of the polymer chain required to induce elasto-capillary stress relaxation and the accurate measure of the polymer relaxation time in uniaxial stretching of dilute polymer solutions. This work is facilitated by the discovery that constant velocity applied at early times leads to initial constant extension rate before reaching the Rayleigh-Plateau instability. Such constant rate experiments are used to correlate initial stretching kinematics with the thinning dynamics in the elasto-capillary regime. We show that there is a minimum initial strain-rate required to induce rate independent elastic effects. Below the minimum extension rate, insufficient stretching of the chain is observed before capillary instability, such that the polymer stress is comparable to the capillary stress at long times and true stress relaxation is not achieved. Above the minimum strain-rate, the chain reaches a critical stretch before instability, such that during the unstable filament thinning the polymer stress is significantly larger than the capillary stress and true stress relaxation is observed. Using a single relaxation mode Oldroyd-B model, we show that the minimum strain rate leads to a required initial stretch of the chain before reaching the Rayleigh-Plateau limit.
Along with the accurate measure of relaxation time, this work introduces a characteristic dimensionless group, called the stretchability factor, that can be used to quantitatively compare different materials based on the overall material deformation/kinematic behavior, not just the relaxation time. Overall, these results demonstrate a useful methodology to study the stretching of dilute solutions using a constant velocity stretching scheme.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 27 pages, 9 figures

arXiv:2403.03538 [pdf, other]

RADIA -- Radio Advertisement Detection with Intelligent Analytics

Authors: Jorge Álvarez, Juan Carlos Armenteros, Camilo Torrón, Miguel Ortega-Martín, Alfonso Ardoiz, Óscar García, Ignacio Arranz, Íñigo Galdeano, Ignacio Garrido, Adrián Alonso, Fernando Bayón, Oleg Vorontsov

Abstract: Radio advertising remains an integral part of modern marketing strategies, with its appeal and potential for targeted reach undeniably effective. However, the dynamic nature of radio airtime and the rising trend of multiple radio spots necessitates an efficient system for monitoring advertisement broadcasts. This study investigates a novel automated radio advertisement detection technique incorporating advanced speech recognition and text classification algorithms. RadIA's approach surpasses traditional methods by eliminating the need for prior knowledge of the broadcast content. This contribution allows for detecting impromptu and newly introduced advertisements, providing a comprehensive solution for advertisement detection in radio broadcasting. Experimental results show that the resulting model, trained on carefully segmented and tagged text data, achieves an F1-macro score of 87.76 against a theoretical maximum of 89.33. This paper provides insights into the choice of hyperparameters and their impact on the model's performance. This study demonstrates its potential to ensure compliance with advertising broadcast contracts and offer competitive surveillance. This groundbreaking research could fundamentally change how radio advertising is monitored and open new doors for marketing optimization.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.09468 [pdf]

doi 10.1088/0029-5515/53/1/013010

Silica final lens performance in laser fusion facilities: HiPER and LIFE

Authors: David Garoz, R. González-Arrabal, R. Juárez, J. Álvarez, J. Sanz, J. M. Perlado, A. Rivera

Abstract: Nowadays, the projects LIFE (Laser Inertial Fusion Energy) in USA and HiPER (High Power Laser Energy Research) in Europe are the most advanced ones to demonstrate laser fusion energy viability. One of the main points of concern to properly achieve ignition is the performance of the final optics (lenses) under the severe irradiation conditions that take place in fusion facilities. In this paper, we calculate the radiation fluxes and doses as well as the radiation-induced temperature enhancement and colour centre formation in final lenses assuming realistic geometrical configurations for HiPER and LIFE. On these bases, the mechanical stresses generated by the established temperature gradients are evaluated showing that from a mechanical point of view lenses only fulfill specifications if ions resulting from the imploding target are mitigated. The absorption coefficient of the lenses is calculated during reactor startup and steady-state operation. The obtained results evidence the necessity of new solutions to tackle ignition problems during the startup process for HiPER. Finally, we evaluated the effect of temperature gradients on focal length changes and lens surface deformations. In summary, we discuss the capabilities and weak points of silica lenses and propose alternatives to overcome predictable problems.

Submitted 11 February, 2024; originally announced February 2024.

Journal ref: Nuclear Fusion, vol. 53, no. 1, p. 013010, Jan. 2013

arXiv:2402.02400 [pdf]

doi 10.1109/tim.2019.2959290

Evaluation of Zadoff-Chu, Kasami and Chirp based encoding schemes for Acoustic Local Positioning Systems

Authors: Santiago Murano, Carmen Perez-Rubio, David Gualda, Fernando J. Alvarez, Teodoro Aguilera, Carlos de Marziani

Abstract: The task of determining the physical coordinates of a target in indoor environments is still a key factor for many applications including people and robot navigation, user tracking, location-based advertising, augmented reality, gaming, emergency response or ambient assisted living environments. Among the different possibilities for indoor positioning, Acoustic Local Positioning Systems (ALPS) have the potential for centimeter level positioning accuracy with coverage distances up to tens of meters. In addition, acoustic transducers are small, low cost and reliable thanks to the room constrained propagation of these mechanical waves. Waveform design (coding and modulation) is usually incorporated into these systems to facilitate the detection of the transmitted signals at the receiver. The aperiodic correlation properties of the emitted signals have a large impact on how the ALPS cope with common impairment factors such as multipath propagation, multiple access interference, Doppler shifting, near-far effect or ambient noise. This work analyzes three of the most promising families of codes found in the literature for ALPS: Kasami codes, Zadoff-Chu and Orthogonal Chirp signals. The performance of these codes is evaluated in terms of time of arrival accuracy and characterized by means of model simulation under realistic conditions and by means of experimental tests in controlled environments.
The results derived from this study can be of interest for other applications based on spreading sequences, such as underwater acoustic systems, ultrasonic imaging or even Code Division Multiple Access (CDMA) communications systems.

Submitted 4 February, 2024; originally announced February 2024.

Journal ref: IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 8, pp. 5356-5368, Aug. 2020

arXiv:2402.02391 [pdf]

doi 10.1109/tim.2018.2794939

Multipath Compensation Algorithm for TDMA-Based Ultrasonic Local Positioning Systems

Authors: Teodoro Aguilera, Fernando J. Alvarez, David Gualda, Jose M. Villadangos, Alvaro Hernandez, Jesus Urena

Abstract: This paper proposes a multipath compensation algorithm (MCA) to enhance the performance of an ultrasonic local positioning system under adverse multipath conditions. The proposed algorithm is based on the accurate estimation of the environment impulse response from which the corresponding line of sight for each channel is obtained. Experiments in two different environments and under different conditions have been conducted in order to evaluate the performance of this proposal. In both environments, results confirm the expected improvements, even under severe multipath conditions where positioning errors have been reduced from 44 to 9 cm for 95% of the measurements.

Submitted 4 February, 2024; originally announced February 2024.

Journal ref: IEEE Transactions on Instrumentation and Measurement, vol. 67, no. 5, pp. 984-991, 2018

arXiv:2402.02384 [pdf]

doi 10.1109/jproc.2018.2819938

Acoustic Local Positioning With Encoded Emission Beacons

Authors: Jesus Urena, Alvaro Hernandez, Juan Jesus Garcia, Jose Manuel Villadangos, Maria del Carmen Perez, David Gualda, Fernando J. Alvarez, Teodoro Aguilera

Abstract: Acoustic local positioning systems (ALPSs) are an interesting alternative for indoor positioning due to certain advantages over other approaches, including their relatively high accuracy, low cost, and room-level signal propagation. Centimeter-level or fine-grained indoor positioning can be an asset for robot navigation, guiding a person to, for instance, a particular piece in a museum or to a specific product in a shop, targeted advertising, or augmented reality. In airborne system applications, acoustic positioning can be based on using opportunistic signals or sounds produced by the person or object to be located (e.g., noise from appliances or the speech from a speaker) or from encoded emission beacons (or anchors) specifically designed for this purpose. This work presents a review of the different challenges that designers of systems based on encoded emission beacons must address in order to achieve suitable performance. At low-level processing, the waveform design (coding and modulation) and the processing of the received signal are key factors to address such drawbacks as multipath propagation, multiple-access interference, near-far effect, or Doppler shifting. With regards to high-level system design, the issues to be addressed are related to the distribution of beacons, ease of deployment, and calibration and positioning algorithms, including the possible fusion of information.
Apart from theoretical discussions, this work also includes the description of an ALPS that was implemented, installed in a large area and tested for mobile robot navigation. In addition to practical interest for real applications, airborne ALPSs can also be used as an excellent platform to test complex algorithms, which can be subsequently adapted for other positioning systems, such as underwater acoustic systems or ultrawideband radiofrequency (UWB RF) systems.

Submitted 4 February, 2024; originally announced February 2024.

Journal ref: Proceedings of the IEEE, vol. 106, no. 6, pp. 1042-1062, Jun. 2018

arXiv:2401.13408 [pdf, other]

Causal Perception

Authors: Jose M. Alvarez, Salvatore Ruggieri

Abstract: Perception occurs when two individuals interpret the same information differently. Despite being a known phenomenon with implications for bias in decision-making, as individual experience determines interpretation, perception remains largely overlooked in machine learning (ML) research. Modern decision flows, whether partially or fully automated, involve human experts interacting with ML applications. How might we then, e.g., account for two experts that interpret differently a deferred instance or an explanation from an ML model? To account for perception, we first need to formulate it. In this work, we define perception under causal reasoning using structural causal models (SCM). Our framework formalizes individual experience as additional causal knowledge that comes with and is used by a human expert (read, decision maker). We present two kinds of causal perception, unfaithful and inconsistent, based on the SCM properties of faithfulness and consistency. Further, we motivate the importance of perception within fairness problems. We illustrate our framework through a series of decision flow examples involving ML applications and human experts.

Submitted 22 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2305.09535 by other authors

arXiv:2401.13378 [pdf]

doi 10.1038/s41598-021-03908-2

Tunable circular dichroism through absorption in coupled optical modes of twisted triskelia nanostructures

Authors: Javier Rodriguez Alvarez, Antonio Garcia Martin, Arantxa Fraile Rodriguez, Xavier Batlle, Amilcar Labarta

Abstract: We present a system consisting of two stacked chiral plasmonic nanoelements, so-called triskelia, that exhibits a high degree of circular dichroism. The optical modes arising from the interactions between the two elements are mainly responsible for the dichroic signal. Their excitation in the absorption cross section is favored when the circular polarization of the light is opposite to the helicity of the system, so that an intense near-field distribution with 3D character is excited between the two triskelia, which in turn causes the dichroic response. Therefore, the stacking, in itself, provides a simple way to tune both the value of the circular dichroism, up to 60%, and its spectral distribution in the visible and near infrared range. We show how these interaction-driven modes can be controlled by finely tuning the distance and the relative twist angle between the triskelia, yielding maximum values of the dichroism at 20° and 100° for left- and right-handed circularly polarized light, respectively. Despite the three-fold symmetry of the elements, these two situations are not completely equivalent since the interplay between the handedness of the stack and the chirality of each single element breaks the symmetry between clockwise and anticlockwise rotation angles around 0°. This reveals the occurrence of clear helicity-dependent resonances.
The proposed structure can be thus finely tuned to tailor the dichroic signal for applications at will, such as highly efficient helicity-sensitive surface spectroscopies or single-photon polarization detectors, among others.

Submitted 24 January, 2024; originally announced January 2024.

Journal ref: Scientific Reports 12 (2022) 26

arXiv:2401.12702 [pdf]

doi 10.1021/acsnano.2c11016

Imaging of Antiferroelectric Dark Modes in an Inverted Plasmonic Lattice

Authors: Javier Rodriguez Alvarez, Amilcar Labarta, Juan Carlos Idrobo, Rossana Dell Anna, Alessandro Cian, Damiano Giubertoni, Xavier Borrise, Albert Guerrero, Francesc Perez Murano, Arantxa Fraile Rodriguez, Xavier Batlle

Abstract: Plasmonic lattice nanostructures are of technological interest because of their capacity to manipulate light below the diffraction limit. Here, we present a detailed study of dark and bright modes in the visible and near-infrared energy regime of an inverted plasmonic honeycomb lattice by a combination of Au+ focused ion beam lithography with nanometric resolution, optical and electron spectroscopy, and finite-difference time-domain simulations. The lattice consists of slits carved in a gold thin film, exhibiting hotspots and a set of bright and dark modes. We propose that some of the dark modes detected by electron energy-loss spectroscopy are caused by antiferroelectric arrangements of the slit polarizations with two times the size of the hexagonal unit cell. The plasmonic resonances take place within the 0.5-2 eV energy range, indicating that they could be suitable for a synergistic coupling with excitons in two-dimensional transition metal dichalcogenide materials or for designing nanoscale sensing platforms based on near-field enhancement over a metallic surface.

Submitted 23 January, 2024; originally announced January 2024.

Journal ref: ACS Nano 17 (2023) 8123

arXiv:2401.03844 [pdf, other]

Fully Attentional Networks with Self-emerging Token Labeling

Authors: Bingyin Zhao, Zhiding Yu, Shiyi Lan, Yutao Cheng, Anima Anandkumar, Yingjie Lao, Jose M. Alvarez

Abstract: Recent studies indicate that Vision Transformers (ViTs) are robust against out-of-distribution scenarios. In particular, the Fully Attentional Network (FAN), a family of ViT backbones, has achieved state-of-the-art robustness. In this paper, we revisit the FAN models and improve their pre-training with a self-emerging token labeling (STL) framework. Our method contains a two-stage training framework. Specifically, we first train a FAN token labeler (FAN-TL) to generate semantically meaningful patch token labels, followed by a FAN student model training stage that uses both the token labels and the original class label. With the proposed STL framework, our best model based on FAN-L-Hybrid (77.3M parameters) achieves 84.8% Top-1 accuracy and 42.1% mCE on ImageNet-1K and ImageNet-C, and sets a new state-of-the-art for ImageNet-A (46.1%) and ImageNet-R (56.6%) without using extra data, outperforming the original FAN counterpart by significant margins. The proposed framework also demonstrates significantly enhanced performance on downstream tasks such as semantic segmentation, with up to 1.7% improvement in robustness over the counterpart model. Code is available at https://github.com/NVlabs/STL.

Submitted 8 January, 2024; originally announced January 2024.

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 5585-5595

arXiv:2401.03032 [pdf]

doi 10.1016/j.dib.2023.109978

Dataset of turbulent flow over interacting barchan dunes

Authors: Jimmy Gabriel Alvarez, Danilo da Silva Borges, Erick de Moraes Franklin

Abstract: Barchans are dunes commonly found in dune fields on Earth, Mars and other celestial bodies, where they can interact with each other. This article concerns experimental data for the flow over subaqueous barchans that are either isolated or interacting with each other. The experiments were carried out in a transparent channel of rectangular cross section in which turbulent water flows were imposed over either one single or a pair of barchans. The instantaneous flow fields were measured by using a low-frequency PIV (particle image velocimetry) and high-frequency PTV (particle tracking velocimetry). From the PIV and PTV data, the mean flow, trajectories, and second-order moments were computed, which are included in the datasets described in this paper, together with raw data (images), instantaneous fields, and scripts to process them. The datasets can be reused for benchmarking or for processing new images generated by other research groups.

Submitted 5 January, 2024; originally announced January 2024.

Journal ref: Data in Brief, 52, 109978 (2024) - invited article

arXiv:2312.03031 [pdf, other]

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

Authors: Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

Abstract: End-to-end autonomous driving recently emerged as a promising research direction to target autonomy from a full-stack perspective. Along this line, many of the latest works follow an open-loop evaluation setting on nuScenes to study the planning behavior. In this paper, we delve deeper into the problem by conducting thorough analyses and demystifying more devils in the details. We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity. These models tend to rely predominantly on the ego vehicle's status for future path planning. Beyond the limitations of the dataset, we also note that current metrics do not comprehensively assess the planning quality, leading to potentially biased conclusions drawn from existing benchmarks. To address this issue, we introduce a new metric to evaluate whether the predicted trajectories adhere to the road. We further propose a simple baseline able to achieve competitive results without relying on perception annotations. Given the current limitations on the benchmark and metrics, we suggest the community reassess relevant prevailing research and be cautious whether the continued pursuit of state-of-the-art would yield convincing and universal conclusions. Code and models are available at https://github.com/NVlabs/BEV-Planner

Submitted 2 June, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: Accepted to CVPR 2024

arXiv:2312.01696 [pdf, other]

BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection

Authors: Zhenxin Li, Shiyi Lan, Jose M. Alvarez, Zuxuan Wu

Abstract: Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection. These query-based decoders are surpassing the traditional dense BEV (Bird's Eye View)-based methods. However, we argue that dense BEV frameworks remain important due to their outstanding abilities in depth estimation and object localization, depicting 3D scenes accurately and comprehensively. This paper aims to address the drawbacks of the existing dense BEV-based 3D object detectors by introducing our proposed enhanced components, including a CRF-modulated depth estimation module enforcing object-level consistencies, a long-term temporal aggregation module with extended receptive fields, and a two-stage object decoder combining perspective techniques with CRF-modulated depth embedding. These enhancements lead to a "modernized" dense BEV framework dubbed BEVNeXt. On the nuScenes benchmark, BEVNeXt outperforms both BEV-based and query-based frameworks under various settings, achieving a state-of-the-art result of 64.2 NDS on the nuScenes test set. Code will be available at https://github.com/woxihuanjiangguo/BEVNeXt.

Submitted 24 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.14671 [pdf, other]

SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

Authors: Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang

Abstract: In-context segmentation aims at segmenting novel images using a few labeled example images, termed as "in-context examples", exploring content similarities between examples and the target. The resulting models can be generalized seamlessly to novel segmentation tasks, significantly reducing the labeling and training costs compared with conventional pipelines. However, in-context segmentation is more challenging than classic ones requiring the model to learn segmentation rules conditioned on a few samples. Unlike previous work with ad-hoc or non-end-to-end designs, we propose SEGIC, an end-to-end segment-in-context framework built upon a single vision foundation model (VFM). In particular, SEGIC leverages the emergent correspondence within VFM to capture dense relationships between target images and in-context samples. As such, information from in-context samples is then extracted into three types of instructions, i.e. geometric, visual, and meta instructions, serving as explicit conditions for the final mask prediction. SEGIC is a straightforward yet effective approach that yields state-of-the-art performance on one-shot segmentation benchmarks. Notably, SEGIC can be easily generalized to diverse tasks, including video object segmentation and open-vocabulary segmentation. Code will be available at https://github.com/MengLcool/SEGIC.

Submitted 29 March, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

arXiv:2310.20437 [pdf, other]

doi 10.3390/sym16020163

A forecast of the sensitivity of the DALI Experiment to Galactic axion dark matter

Authors: Juan F. Hernández Cabrera, Javier De Miguel, Enrique Joven Álvarez, E. Hernández-Suárez, J. Alberto Rubiño-Martín, Chiko Otani

Abstract: The axion is a long-postulated boson that can simultaneously solve two fundamental problems of modern physics: the charge-parity symmetry problem in the strong interaction and the enigma of dark matter. In this work we estimate, by means of Monte Carlo simulations, the sensitivity of the Dark-photons & Axion-Like particles Interferometer (DALI), a new-generation Fabry-Pérot haloscope proposed to probe axion dark matter in the 25-250 $\mu$eV band.

Submitted 15 January, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: As accepted by Symmetry

arXiv:2310.19731 [pdf, other]

ViR: Towards Efficient Vision Retention Backbones

Authors: Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz

Abstract: Vision Transformers (ViTs) have gained considerable popularity in recent years, due to their exceptional capabilities in modeling long-range spatial dependencies and their scalability for large-scale training. Although the training parallelism of the self-attention mechanism plays an important role in retaining great performance, its quadratic complexity hinders the application of ViTs in many scenarios which demand fast inference. This effect is even more pronounced in applications in which autoregressive modeling of input features is required. In Natural Language Processing (NLP), a new stream of efforts has proposed parallelizable models with a recurrent formulation that allows for efficient inference in generative applications. Inspired by this trend, we propose a new class of computer vision models, dubbed Vision Retention Networks (ViR), with dual parallel and recurrent formulations, which strike an optimal balance between fast inference and parallel training with competitive performance. In particular, ViR scales favorably for image throughput and memory consumption in tasks that require higher-resolution images due to its flexible formulation in processing large sequence lengths. ViR is the first attempt to realize dual parallel and recurrent equivalency in a general vision backbone for recognition tasks. We have validated the effectiveness of ViR through extensive experiments with different dataset sizes and various image resolutions and achieved competitive performance. Code: https://github.com/NVlabs/ViR

Submitted 26 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: Introduction of Vision Retention Networks (ViR) for Efficient Visual Modeling

arXiv:2310.17525 [pdf, other]

Measuring Wigner functions of quantum states of light in the undergraduate laboratory

Authors: Juan-Rafael Álvarez, Andrés Martínez Silva, Alejandra Valencia

Abstract: In this work, we present an educational activity aimed at measuring the Wigner distribution functions of quantum states of light in the undergraduate laboratory. This project was conceived by students from various courses within the physics undergraduate curriculum, and its outcomes were used in an introductory Quantum Optics course at the Universidad de los Andes in Bogotá, Colombia. The activity entails a two-hour laboratory practice in which students engage with a pre-aligned experimental setup. They subsequently employ an open-access, custom-made computational graphical user interface to reconstruct the Wigner distribution function for various quantum states of light. Given that the testing phase coincided with the COVID-19 pandemic, we incorporated the capacity to analyze simulated data into the computational user interface. The activity is now part of the course syllabus and its virtual component has proven to be highly valuable for the implementation of distance learning in quantum optics.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 10 pages, 5 figures

arXiv:2310.05509 [pdf, other]

Quartic rigid systems in the plane and in the Poincaré sphere

Authors: M. J. Álvarez, J. L. Bravo, L. A. Calderón

Abstract: We consider the planar family of rigid systems of the form $x'=-y+xP(x,y), y'=x+yP(x,y)$, where $P$ is any polynomial with monomials of degree one and three. This is the simplest non-trivial family of rigid systems with no rotatory parameters. The family can be compactified to the Poincaré sphere such that the vector field along the equator is not identically null. We study the centers, singular points and limit cycles of that family on the plane and on the sphere.

Submitted 9 October, 2023; originally announced October 2023.

Comments: 19 pages, 10 figures

arXiv:2309.05192 [pdf, other]

Towards Viewpoint Robustness in Bird's Eye View Segmentation

Authors: Tzofi Klinghoffer, Jonah Philion, Wenzheng Chen, Or Litany, Zan Gojcic, Jungseock Joo, Ramesh Raskar, Sanja Fidler, Jose M. Alvarez

Abstract: Autonomous vehicles (AV) require that neural networks used for perception be robust to different viewpoints if they are to be deployed across many types of vehicles without the repeated cost of data collection and labeling for each. AV companies typically focus on collecting data from diverse scenarios and locations, but not camera rig configurations, due to cost. As a result, only a small number of rig variations exist across most fleets. In this paper, we study how AV perception models are affected by changes in camera viewpoint and propose a way to scale them across vehicle types without repeated data collection and labeling. Using bird's eye view (BEV) segmentation as a motivating task, we find through extensive experiments that existing perception models are surprisingly sensitive to changes in camera viewpoint. When trained with data from one camera rig, small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance. We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs, allowing us to train BEV segmentation models for diverse target rigs without any additional data collection or labeling cost. To analyze the impact of viewpoint changes, we leverage synthetic data to mitigate other gaps (content, ISP, etc). Our approach is then trained on real data and evaluated on synthetic data, enabling evaluation on diverse target rigs. We release all data for use in future work. Our method is able to recover an average of 14.7% of the IoU that is otherwise lost when deploying to new rigs.

Submitted 10 September, 2023; originally announced September 2023.

Comments: ICCV 2023. Project Page: https://nvlabs.github.io/viewpoint-robustness

arXiv:2308.04556 [pdf, other]

FocalFormer3D : Focusing on Hard Instance for 3D Object Detection

Authors: Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

Abstract: False negatives (FN) in 3D object detection, e.g., missing predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. Although potentially fatal, this issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner and guides the models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features a multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from massive object candidates. Experimental results on the nuScenes and Waymo datasets validate the superior performance of FocalFormer3D. The advantage leads to strong performance on both detection and tracking, in both LiDAR and multi-modal settings. Notably, FocalFormer3D achieves 70.5 mAP and 73.9 NDS on the nuScenes detection benchmark, while the nuScenes tracking benchmark shows 72.1 AMOTA, both ranking 1st place on the nuScenes LiDAR leaderboard. Our code is available at https://github.com/NVlabs/FocalFormer3D.

Submitted 8 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV 2023

arXiv:2308.02236 [pdf, other]

FB-BEV: BEV Representation from Forward-Backward View Transformations

Authors: Zhiqi Li, Zhiding Yu, Wenhai Wang, Anima Anandkumar, Tong Lu, Jose M. Alvarez

Abstract: The View Transformation Module (VTM), where transformations happen between multi-view image features and the Bird's-Eye-View (BEV) representation, is a crucial step in camera-based BEV perception systems. Currently, the two most prominent VTM paradigms are forward projection and backward projection. Forward projection, represented by Lift-Splat-Shoot, leads to sparsely projected BEV features without post-processing. Backward projection, with BEVFormer being an example, tends to generate false-positive BEV features from incorrect projections because it does not make use of depth. To address the above limitations, we propose a novel forward-backward view transformation module. Our approach compensates for the deficiencies in both existing methods, allowing them to enhance each other and obtain higher-quality BEV representations. We instantiate the proposed module with FB-BEV, which achieves a new state-of-the-art result of 62.4% NDS on the nuScenes test set. Code and models are available at https://github.com/NVlabs/FB-BEV.

Submitted 17 August, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023, camera-ready version

arXiv:2307.15398 [pdf, other]

The Initial Screening Order Problem

Authors: Jose M. Alvarez, Antonio Mastropietro, Salvatore Ruggieri

Abstract: We investigate the role of the initial screening order (ISO) in candidate screening processes, such as employee hiring and academic admissions. The ISO refers to the order in which the screener evaluates the candidate pool. It has been largely overlooked in the literature, despite its potential impact on the optimality and fairness of the chosen set, especially under a human screener. We define two problem formulations: the best-$k$, where the screener selects the $k$ best candidates, and the good-$k$, where the screener selects the $k$ first good-enough candidates. To study the impact of the ISO, we introduce a human-like screener and compare it to its algorithmic counterpart. The human-like screener is conceived to be inconsistent over time due to fatigue. Our analysis shows that the ISO, in particular, under a human-like screener hinders individual fairness despite meeting group level fairness. This is due to the position bias, where a candidate's evaluation is affected by its position within the ISO. We report extensive simulated experiments exploring the parameters of the best-$k$ and good-$k$ problem formulations both for the algorithmic and human-like screeners. This work is motivated by a real world candidate screening problem studied in collaboration with a large European company.

Submitted 24 April, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.04106 [pdf, other]

Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View

Authors: Jiayu Yang, Enze Xie, Miaomiao Liu, Jose M. Alvarez

Abstract: Recent vision-only perception models for autonomous driving achieved promising results by encoding multi-view image features into Bird's-Eye-View (BEV) space. A critical step and the main bottleneck of these methods is transforming image features into the BEV coordinate frame. This paper focuses on leveraging geometry information, such as depth, to model such feature transformation. Existing works either rely on non-parametric depth distribution modeling, leading to significant memory consumption, or ignore the geometry information to address this problem. In contrast, we propose to use parametric depth distribution modeling for feature transformation. We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view. Then, we aggregate the 3D feature volume based on the 3D space occupancy derived from depth to the BEV frame. Finally, we use the transformed features for downstream tasks such as object detection and semantic segmentation. Existing semantic segmentation methods also suffer from a hallucination problem as they do not take visibility information into account. This hallucination can be particularly problematic for subsequent modules such as control and planning. To mitigate the issue, our method provides depth uncertainty and reliable visibility-aware estimations. We further leverage our parametric depth modeling to present a novel visibility-aware evaluation metric that, when taken into account, can mitigate the hallucination problem. Extensive experiments on object detection and semantic segmentation on the nuScenes dataset demonstrate that our method outperforms existing methods on both tasks.

Submitted 11 July, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

arXiv:2307.03702 [pdf, other]

doi 10.1016/j.jmr.2022.107143

Monitoring Electron Spin Fluctuations with Paramagnetic Relaxation Enhancement

Authors: Daniel Jardon Alvarez, Tahel Malka, Johan van Tol, Yishay Feldman, Raanan Carmieli, Michal Leskes

Abstract: The magnetic interactions between the spin of an unpaired electron and the surrounding nuclear spins can be exploited to gain structural information, to reduce nuclear relaxation times as well as to create nuclear hyperpolarization via dynamic nuclear polarization (DNP). A central aspect that determines how these interactions manifest from the point of view of NMR is the timescale of the fluctuations of the magnetic moment of the electron spins. These fluctuations, however, are elusive, particularly when electron relaxation times are short or interactions among electronic spins are strong. Here we map the fluctuations by analyzing the ratio between longitudinal and transverse nuclear relaxation times T1 and T2, a quantity which depends uniquely on the rate of the electron fluctuations and the Larmor frequency of the involved nuclei. This analysis enables rationalizing the evolution of NMR lineshapes, signal quenching as well as DNP enhancements as a function of the concentration of the paramagnetic species and the temperature, demonstrated here for LiMgMnPO4 and Fe(3+) doped Li4Ti5O12, respectively. For the latter, we observe a linear dependence between the DNP enhancement and the electron relaxation time within a temperature range between 100 and 300 K.

Submitted 7 July, 2023; originally announced July 2023.

Journal ref: Journal of Magnetic Resonance, Volume 336, March 2022, 107143

arXiv:2307.01492 [pdf, other]

FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation

Authors: Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez

Abstract: This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving. Our proposed solution FB-OCC builds upon FB-BEV, a cutting-edge camera-based bird's-eye view perception design using forward-backward projection. On top of FB-BEV, we further study novel designs and optimization tailored to the 3D occupancy prediction task, including joint depth-semantic pre-training, joint voxel-BEV representation, model scaling up, and effective post-processing strategies. These designs and optimization result in a state-of-the-art mIoU score of 54.19% on the nuScenes dataset, ranking 1st place in the challenge track. Code and models will be released at: https://github.com/NVlabs/FB-BEV.

Submitted 4 July, 2023; originally announced July 2023.

Comments: Outstanding Champion and Innovation Award in the 3D Occupancy Prediction Challenge (CVPR23)

arXiv:2306.14306 [pdf, other]

Adaptive Sharpness-Aware Pruning for Robust Sparse Networks

Authors: Anna Bair, Hongxu Yin, Maying Shen, Pavlo Molchanov, Jose Alvarez

Abstract: Robustness and compactness are two essential attributes of deep learning models that are deployed in the real world. The goals of robustness and compactness may seem to be at odds, since robustness requires generalization across domains, while the process of compression exploits specificity in one domain. We introduce Adaptive Sharpness-Aware Pruning (AdaSAP), which unifies these goals through the lens of network sharpness. The AdaSAP method produces sparse networks that are robust to input variations which are unseen at training time. We achieve this by strategically incorporating weight perturbations in order to optimize the loss landscape. This allows the model to be both primed for pruning and regularized for improved robustness. AdaSAP improves the robust accuracy of pruned models on image classification by up to +6% on ImageNet C and +4% on ImageNet V2, and on object detection by +4% on a corrupted Pascal VOC dataset, over a wide range of compression ratios, pruning criteria, and network architectures, outperforming recent pruning art by large margins.

Submitted 13 March, 2024; v1 submitted 25 June, 2023; originally announced June 2023.

arXiv:2306.06189 [pdf, other]

FasterViT: Fast Vision Transformers with Hierarchical Attention

Authors: Ali Hatamizadeh, Greg Heinrich, Hongxu Yin, Andrew Tao, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov

Abstract: We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus on high image throughput for computer vision (CV) applications. FasterViT combines the benefits of fast local representation learning in CNNs and global modeling properties in ViT. Our newly introduced Hierarchical Attention (HAT) approach decomposes global self-attention with quadratic complexity into a multi-level attention with reduced computational costs. We benefit from efficient window-based self-attention. Each window has access to dedicated carrier tokens that participate in local and global representation learning. At a high level, global self-attentions enable the efficient cross-window communication at lower costs. FasterViT achieves a SOTA Pareto-front in terms of accuracy and image throughput. We have extensively validated its effectiveness on various CV tasks including classification, object detection and segmentation. We also show that HAT can be used as a plug-and-play module for existing networks and enhance them. We further demonstrate significantly faster and more accurate performance than competitive counterparts for images with high resolution. Code is available at https://github.com/NVlabs/FasterViT.

Submitted 1 April, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: ICLR'24 Accepted Paper

arXiv:2305.06866 [pdf]

doi 10.1016/j.ultramic.2023.113757

GHz sample excitation at the ALBA-PEEM

Authors: Muhammad Waqas Khaliq, José M. Álvarez, Antonio Camps, Nahikari González, José Ferrer, Ana Martinez-Carboneres, Jordi Prat, Sandra Ruiz-Gómez, Miguel Angel Niño, Ferran Macià, Lucia Aballe, Michael Foerster

Abstract: We describe a setup that is used for high-frequency electrical sample excitation in a cathode lens electron microscope with the sample stage at high voltage as used in many synchrotron light sources. Electrical signals are transmitted by dedicated high-frequency components to the printed circuit board supporting the sample. Sub-miniature push-on connectors (SMP) are used to realize the connection in the ultra-high vacuum chamber, bypassing the standard feedthrough. A bandwidth up to 4 GHz with -6 dB attenuation was measured at the sample position, which allows sub-nanosecond pulses to be applied. We describe different electronic sample excitation schemes and demonstrate a spatial resolution of 56 nm employing the new setup.

Submitted 11 May, 2023; originally announced May 2023.

Journal ref: Ultramicroscopy 2023

arXiv:2305.04899 [pdf, other]

doi 10.1088/1361-6455/acf9d2

Bursts of polarised single photons from atom-cavity sources

Authors: Jan Ole Ernst, Juan-Rafael Alvarez, Thomas D. Barrett, Axel Kuhn

Abstract: Photonic qubits play an instrumental role in the development of advanced quantum technologies, including quantum networking, boson sampling and measurement based quantum computing. A promising framework for the deterministic production of indistinguishable single photons is an atomic emitter coupled to a single mode of a high finesse optical cavity. Polarisation control is an important cornerstone, particularly when the polarisation defines the state of a quantum bit. Here, we propose a scheme for producing bursts of polarised single photons by coupling a generalised atomic emitter to an optical cavity, exploiting a particular choice of quantisation axis. In connection with two re-preparation methods, simulations predict 10-photon burst coincidence count rates on the order of 1 kHz with single 87Rb atoms trapped in a state of the art optical cavity. This paves the way for novel n-photon experiments with atom-cavity sources.

Submitted 25 August, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Journal ref: Journal of Physics B: Atomic, Molecular and Optical Physics, Volume 56, Number 20, 2023

arXiv:2304.06271 [pdf]

A Contribution of the HAWC Observatory to the TeV era in the High Energy Gamma-Ray Astrophysics: The case of the TeV-Halos

Authors: Ramiro Torres-Escobedo, Hao Zhou, Eduardo de la Fuente, A. U. Abeysekara, A. Albert, R. Alfaro, C. Alvarez, J. D. Álvarez, J. R. Angeles Camacho, J. C. Arteaga-Velázquez, K. P. Arunbabu, D. Avila Rojas, H. A. Ayala Solares, R. Babu, V. Baghmanyan, A. S. Barber, J. Becerra Gonzalez, E. Belmont-Moreno, S. Y. BenZvi, D. Berley, C. Brisbois, K. S. Caballero-Mora, T. Capistrán, A. Carramiñana, S. Casanova, et al. (108 additional authors not shown)

Abstract: We present a short overview of the TeV-Halos objects as a discovery and a relevant contribution of the High Altitude Water Čerenkov (HAWC) observatory to TeV astrophysics. We discuss history, discovery, knowledge, and the next step through a new and more detailed analysis than the original study in 2017. TeV-Halos will contribute to resolving the problem of the local positron excess observed on the Earth. To clarify the latter, understanding the diffusion process is mandatory.

Submitted 13 April, 2023; originally announced April 2023.

Comments: Work presented at the 21st International Symposium on Very High Energy Cosmic Ray Interactions (ISVHECRI 2022) as part of the Ph.D. thesis of Ramiro Torres-Escobedo (SJTU, Shanghai, China). Accepted for publication in SciPost Physics Proceedings (ISSN 2666-4003). 11 pages, 3 figures. Short overview of HAWC and TeV-Halos objects until 2022

arXiv:2304.04869 [pdf, other]

doi 10.1088/1538-3873/acd1b5

The James Webb Space Telescope Mission

Authors: Jonathan P. Gardner, John C. Mather, Randy Abbott, James S. Abell, Mark Abernathy, Faith E. Abney, John G. Abraham, Roberto Abraham, Yasin M. Abul-Huda, Scott Acton, Cynthia K. Adams, Evan Adams, David S. Adler, Maarten Adriaensen, Jonathan Albert Aguilar, Mansoor Ahmed, Nasif S. Ahmed, Tanjira Ahmed, Rüdeger Albat, Loïc Albert, Stacey Alberts, David Aldridge, Mary Marsha Allen, Shaune S. Allen, Martin Altenburg, et al. (983 additional authors not shown)

Abstract: Twenty-six years ago a small committee report, building on earlier studies, expounded a compelling and poetic vision for the future of astronomy, calling for an infrared-optimized space telescope with an aperture of at least 4 m. With the support of their governments in the US, Europe, and Canada, 20,000 people realized that vision as the 6.5 m James Webb Space Telescope. A generation of astronomers will celebrate their accomplishments for the life of the mission, potentially as long as 20 years, and beyond. This report and the scientific discoveries that follow are extended thank-you notes to the 20,000 team members. The telescope is working perfectly, with much better image quality than expected. In this and accompanying papers, we give a brief history, describe the observatory, outline its objectives and current observing program, and discuss the inventions and people who made it possible. We cite detailed reports on the design and the measured performance on orbit.

Submitted 10 April, 2023; originally announced April 2023.

Comments: Accepted by PASP for the special issue on The James Webb Space Telescope Overview, 29 pages, 4 figures

Showing 1–50 of 265 results for author: Alvarez, J