
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Authors: Chaofan Tao, Qian Liu, Longxu Dou, Niklas Muennighoff, Zhongwei Wan, Ping Luo, Min Lin, Ngai Wong

Abstract: Research on scaling large language models (LLMs) has primarily focused on model parameters and training data size, overlooking the role of vocabulary size. Intuitively, larger vocabularies enable more efficient tokenization by representing sentences with fewer tokens, but they also increase the risk of under-fitting representations for rare tokens. We investigate how vocabulary size impacts LLM scaling laws by training models ranging from 33M to 3B parameters on up to 500B characters with various vocabulary configurations. We propose three complementary approaches for predicting the compute-optimal vocabulary size: IsoFLOPs analysis, derivative estimation, and parametric fit of the loss function. Our approaches converge on the same result that the optimal vocabulary size depends on the available compute budget and that larger models deserve larger vocabularies. However, most LLMs use vocabulary sizes that are too small. For example, we predict that the optimal vocabulary size of Llama2-70B should have been at least 216K, 7 times larger than its vocabulary of 32K. We validate our predictions empirically by training models with 3B parameters across different FLOPs budgets. Adopting our predicted optimal vocabulary size consistently improves downstream performance over commonly used vocabulary sizes. By increasing the vocabulary size from the conventional 32K to 43K, we improve performance on ARC-Challenge from 29.1 to 32.0 with the same 2.3e21 FLOPs. Our work emphasizes the necessity of jointly considering model parameters and vocabulary size for efficient scaling.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 11 pages

arXiv:2407.10154 [pdf, ps, other]

Ternary rings of operators and their linking von Neumann algebras

Authors: Liguang Wang, Ngai-Ching Wong

Abstract: In this short note, we show that a von Neumann algebra can be written as the linking von Neumann algebra of a $W^\ast$-ternary ring of operators ($W^\ast$-TRO, in short), if and only if, it contains no abelian direct summand. We also provide some new characterizations for nuclear TROs and $W^\ast$-exact TROs.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 5 pages

MSC Class: 46L10; 46L50

arXiv:2407.10150 [pdf, ps, other]

Operational 2-local automorphisms/derivations

Authors: Liguang Wang, Ngai-Ching Wong

Abstract: Let $φ: A\to A$ be a (not necessarily linear, additive or continuous) map of a standard operator algebra. Suppose for any $a,b\in A$ there is an algebra automorphism $θ_{a,b}$ of $A$ such that \begin{align*} φ(a)φ(b) = θ_{a,b}(ab). \end{align*} We show that either $φ$ or $-φ$ is a linear Jordan homomorphism. Similar results are obtained when any of the following conditions is satisfied: \begin{align*} φ(a) + φ(b) &= θ_{a,b}(a+b), \\ φ(a)φ(b)+φ(b)φ(a) &= θ_{a,b}(ab+ba), \quad\text{or} \\ φ(a)φ(b)φ(a) &= θ_{a,b}(aba). \end{align*} We also show that a map $φ: M\to M$ of a semi-finite von Neumann algebra $M$ is a linear derivation if for every $a,b\in M$ there is a linear derivation $D_{a,b}$ of $M$ such that $$ φ(a)b + aφ(b) = D_{a,b}(ab). $$

Submitted 14 July, 2024; originally announced July 2024.

Comments: 10 pages; to appear in J. Nonlinear and Convex Analysis

MSC Class: 46L10; 46L50

arXiv:2406.13572 [pdf, other]

Entanglement source and quantum memory analysis for zero added-loss multiplexing

Authors: Jeffrey H. Shapiro, Michael G. Raymer, Clark Embleton, Franco N. C. Wong, Brian J. Smith

Abstract: High-rate, high-fidelity entanglement distribution is essential to the creation of a quantum internet, but recent achievements in fiber and satellite-based entanglement distribution fall far short of what is needed. Chen et al. [Phys. Rev. Appl. 19, 054209 (2023)] proposed a means for dramatically increasing entanglement-distribution rates via zero added-loss multiplexing (ZALM). ZALM's quantum transmitter employs a pair of Sagnac-configured spontaneous parametric downconverters (SPDCs), channelization via dense wavelength-division multiplexing (DWDM) filtering, and partial Bell-state measurements (BSMs) to realize a near-deterministic, heralded source of frequency-multiplexed polarization-entangled biphotons. Each biphoton is transmitted to Alice and Bob with a classical message identifying its frequency channel and the heralded entangled state. Their quantum receivers use DWDM filtering and mode conversion to interface their received biphotons to intra-cavity color-center quantum memories. This paper delves deeply into ZALM's SPDCs, partial-BSMs, and loading of Alice and Bob's quantum memories. It derives the density operators for the SPDC sources and the quantum memories, allowing heralding probability, heralding efficiency, and fidelity to be evaluated for both the polarization-entangled biphotons and the loaded quantum memories, thus enabling exploration of the parameter space for optimizing ZALM performance. Even without optimization analysis, the paper already demonstrates two critical features of the ZALM architecture: the necessity of achieving a near-separable channelized biphoton wave function to ensure the biphoton sent to Alice and Bob is of high purity; and the premium placed on Alice and Bob's temporal-mode converters' enabling narrowband push-pull memory loading to ensure the arriving biphoton's state is faithfully transferred to the intra-cavity color centers.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 26 pages, 15 figures, 1 table

arXiv:2406.11909 [pdf, other]

Mixture-of-Subspaces in Low-Rank Adaptation

Authors: Taiqiang Wu, Jiahao Wang, Zhe Zhao, Ngai Wong

Abstract: In this paper, we introduce a subspace-inspired Low-Rank Adaptation (LoRA) method, which is computationally efficient, easy to implement, and readily applicable to large language, multimodal, and diffusion models. Initially, we equivalently decompose the weights of LoRA into two subspaces, and find that simply mixing them can enhance performance. To study such a phenomenon, we revisit it through a fine-grained subspace lens, showing that such modification is equivalent to employing a fixed mixer to fuse the subspaces. To be more flexible, we jointly learn the mixer with the original LoRA weights, and term the method Mixture-of-Subspaces LoRA (MoSLoRA). MoSLoRA consistently outperforms LoRA on tasks in different modalities, including commonsense reasoning, visual instruction tuning, and subject-driven text-to-image generation, demonstrating its effectiveness and robustness. Codes are available at https://github.com/wutaiqiang/MoSLoRA.

Submitted 5 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: work in progress

arXiv:2406.03063 [pdf, other]

In-operando microwave scattering-parameter calibrated measurement of a Josephson travelling wave parametric amplifier

Authors: S. H. Shin, M. Stanley, W. N. Wong, T. Sweetnam, A. Elarabi, T. Lindström, N. M. Ridler, S. E. de Graaf

Abstract: Superconducting travelling wave parametric amplifiers (TWPAs) are broadband near-quantum limited microwave amplifiers commonly used for qubit readout and a wide range of other applications in quantum technologies. The performance of these amplifiers depends on achieving impedance matching to minimise reflected signals. Here we apply a microwave calibration technique to extract the S-parameters of a Josephson junction based TWPA in-operando. This enables reflections occurring at the TWPA and its extended network of components to be quantified, and we find that the in-operation performance can be well described by the off-state measured S-parameters.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.12398 [pdf, other]

ASMR: Activation-sharing Multi-resolution Coordinate Networks For Efficient Inference

Authors: Jason Chun Lok Li, Steven Tin Sui Luo, Le Xu, Ngai Wong

Abstract: Coordinate network or implicit neural representation (INR) is a fast-emerging method for encoding natural signals (such as images and videos) with the benefits of a compact neural representation. While numerous methods have been proposed to increase the encoding capabilities of an INR, an often overlooked aspect is the inference efficiency, usually measured in multiply-accumulate (MAC) count. This is particularly critical in use cases where inference throughput is greatly limited by hardware constraints. To this end, we propose the Activation-Sharing Multi-Resolution (ASMR) coordinate network that combines multi-resolution coordinate decomposition with hierarchical modulations. Specifically, an ASMR model enables the sharing of activations across grids of the data. This largely decouples its inference cost from its depth which is directly correlated to its reconstruction capability, and renders a near O(1) inference complexity irrespective of the number of layers. Experiments show that ASMR can reduce the MAC of a vanilla SIREN model by up to 500x while achieving an even higher reconstruction quality than its SIREN baseline.

Submitted 20 May, 2024; originally announced May 2024.

Comments: ICLR 2024 (v3: 21 pages, 11 figures, Project Page: https://github.com/stevolopolis/asmr.git)

arXiv:2405.10531 [pdf, other]

Nonparametric Teaching of Implicit Neural Representations

Authors: Chen Zhang, Steven Tin Sui Luo, Jason Chun Lok Li, Yik-Chung Wu, Ngai Wong

Abstract: We investigate the learning of implicit neural representation (INR) using an overparameterized multilayer perceptron (MLP) via a novel nonparametric teaching perspective. The latter offers an efficient example selection framework for teaching nonparametrically defined (viz. non-closed-form) target functions, such as image functions defined by 2D grids of pixels. To address the costly training of INRs, we propose a paradigm called Implicit Neural Teaching (INT) that treats INR learning as a nonparametric teaching problem, where the given signal being fitted serves as the target function. The teacher then selects signal fragments for iterative training of the MLP to achieve fast convergence. By establishing a connection between MLP evolution through parameter-based gradient descent and that of function evolution through functional gradient descent in nonparametric teaching, we show for the first time that teaching an overparameterized MLP is consistent with teaching a nonparametric learner. This new discovery readily permits a convenient drop-in of nonparametric teaching algorithms to broadly enhance INR training efficiency, demonstrating 30%+ training time savings across various input modalities.

Submitted 17 May, 2024; originally announced May 2024.

Comments: ICML 2024 (24 pages, 13 figures)

arXiv:2405.08804 [pdf, other]

Photon Ring Interferometric Signatures Beyond The Universal Regime

Authors: He Jia, Eliot Quataert, Alexandru Lupsasca, George N. Wong

Abstract: We calculate the interferometric signatures of black hole photon rings beyond the universal regime by perturbatively including the effects of finite ring width. Our approach first slices a thick ring into a series of thin rings, each of which falls within the universal regime. We thus calculate the visibility of the thick ring by aggregating the contributions from each thin ring, and then perturbatively expand the result into polynomials of the baseline length $u$. We show that the visibility amplitude of a thick ring depends on its "center-of-light" diameter; it also includes additional higher-order corrections due to the width of the ring, with the leading correction terms proportional to $u^2$ for the envelope and $u^3$ for the phase. We apply our method to images ray traced from general-relativistic magnetohydrodynamic (GRMHD) simulations and demonstrate that incorporating the higher-order corrections is crucial for accurately modeling the visibility of the first photon ring around M87*.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 10+6 pages, 7+3 figures, to be submitted

arXiv:2405.05573 [pdf, other]

Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers

Authors: Binxiao Huang, Jason Chun Lok, Chang Liu, Ngai Wong

Abstract: Poisoning-based backdoor attacks expose vulnerabilities in the data preparation stage of deep neural network (DNN) training. The DNNs trained on the poisoned dataset will be embedded with a backdoor, making them behave well on clean data while outputting malicious predictions whenever a trigger is applied. To exploit the abundant information contained in the input data to output label mapping, our scheme utilizes the network trained from the clean dataset as a trigger generator to produce poisons that significantly raise the success rate of backdoor attacks versus conventional approaches. Specifically, we provide a new categorization of triggers inspired by the adversarial technique and develop a multi-label and multi-payload Poisoning-based backdoor attack with Positive Triggers (PPT), which effectively moves the input closer to the target label on benign classifiers. After the classifier is trained on the poisoned dataset, we can generate an input-label-aware trigger to make the infected classifier predict any given input to any target label with a high possibility. Under both dirty- and clean-label settings, we show empirically that the proposed attack achieves a high attack success rate without sacrificing accuracy across various datasets, including SVHN, CIFAR10, GTSRB, and Tiny ImageNet. Furthermore, the PPT attack can elude a variety of classical backdoor defenses, proving its effectiveness.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.02356 [pdf, other]

Stochastic Multivariate Universal-Radix Finite-State Machine: a Theoretically and Practically Elegant Nonlinear Function Approximator

Authors: Xincheng Feng, Guodong Shen, Jianhao Hu, Meng Li, Ngai Wong

Abstract: Nonlinearities are crucial for capturing complex input-output relationships especially in deep neural networks. However, nonlinear functions often incur various hardware and compute overheads. Meanwhile, stochastic computing (SC) has emerged as a promising approach to tackle this challenge by trading output precision for hardware simplicity. To this end, this paper proposes a first-of-its-kind stochastic multivariate universal-radix finite-state machine (SMURF) that harnesses SC for hardware-simplistic multivariate nonlinear function generation at high accuracy. We present the finite-state machine (FSM) architecture for SMURF, as well as analytical derivations of sampling gate coefficients for accurately approximating generic nonlinear functions. Experiments demonstrate the superiority of SMURF, requiring only 16.07% area and 14.45% power consumption of Taylor-series approximation, and merely 2.22% area of look-up table (LUT) schemes.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.10179 [pdf, other]

Scaling Instructable Agents Across Many Simulated Worlds

Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi, et al. (68 additional authors not shown)

Abstract: Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real-time using a generic, human-like interface: the inputs are image observations and language instructions and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.

Submitted 17 April, 2024; v1 submitted 13 March, 2024; originally announced April 2024.

arXiv:2404.02657 [pdf, other]

Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

Authors: Taiqiang Wu, Chaofan Tao, Jiahao Wang, Zhe Zhao, Ngai Wong

Abstract: Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler (RKL) divergence is mode-seeking and thus preferable over the mean-seeking forward Kullback-Leibler (FKL) divergence, this study empirically and theoretically demonstrates that neither mode-seeking nor mean-seeking properties manifest in KD for LLMs. Instead, RKL and FKL are found to share the same optimization objective and both converge after a sufficient number of epochs. However, due to practical constraints, LLMs are seldom trained for such an extensive number of epochs. Meanwhile, we further find that RKL focuses on the tail part of the distributions, while FKL focuses on the head part at the beginning epochs. Consequently, we propose a simple yet effective Adaptive Kullback-Leibler (AKL) divergence method, which adaptively allocates weights to combine FKL and RKL. Metric-based and GPT-4-based evaluations demonstrate that the proposed AKL outperforms the baselines across various tasks and improves the diversity and quality of generated responses.

Submitted 16 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: work in progress

arXiv:2403.19238 [pdf, other]

Taming Lookup Tables for Efficient Image Retouching

Authors: Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang

Abstract: The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To this end, we propose Image Color Enhancement Lookup Table (ICELUT) that adopts LUTs for extremely efficient edge inference, without any convolutional neural network (CNN). During training, we leverage pointwise (1x1) convolution to extract color information, alongside a split fully connected layer to incorporate global information. Both components are then seamlessly converted into LUTs for hardware-agnostic deployment. ICELUT achieves near-state-of-the-art performance and remarkably low power consumption. We observe that the pointwise network structure exhibits robust scalability, maintaining the performance even with a heavily downsampled 32x32 input image. These enable ICELUT, the first-ever purely LUT-based image enhancer, to reach an unprecedented speed of 0.4ms on GPU and 7ms on CPU, at least one order of magnitude faster than any CNN solution. Codes are available at https://github.com/Stephen0808/ICELUT.

Submitted 13 July, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: Accepted by ECCV2024

arXiv:2402.14866 [pdf, other]

doi 10.1145/3649329.3658498

APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models

Authors: Ziyi Guan, Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu

Abstract: Large Language Models (LLMs) have greatly advanced the natural language processing paradigm. However, the high computational load and huge model sizes pose a grand challenge for deployment on edge devices. To this end, we propose APTQ (Attention-aware Post-Training Mixed-Precision Quantization) for LLMs, which considers not only the second-order information of each layer's weights, but also, for the first time, the nonlinear effect of attention outputs on the entire model. We leverage the Hessian trace as a sensitivity metric for mixed-precision quantization, ensuring an informed precision reduction that retains model performance. Experiments show APTQ surpasses previous quantization methods, achieving, at an average bit width of 4, a perplexity of 5.22 on the C4 dataset, nearly equivalent to full precision. In addition, APTQ attains state-of-the-art zero-shot accuracy of 68.24% and 70.48% at an average bit width of 3.8 in LLaMa-7B and LLaMa-13B, respectively, demonstrating its effectiveness in producing high-quality quantized LLMs.

Submitted 15 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 6 pages, 2 figures, published at DAC 2024: 61st IEEE/ACM Design Automation Conference (DAC'24)

arXiv:2402.11417 [pdf, other]

LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models

Authors: Yifan Yang, Jiajun Zhou, Ngai Wong, Zheng Zhang

Abstract: Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance. However, existing PEFT methods are still limited by the growing number of trainable parameters with the rapid deployment of Large Language Models (LLMs). To address this challenge, we present LoRETTA, an ultra-parameter-efficient framework that significantly reduces trainable parameters through tensor-train decomposition. Specifically, we propose two methods, named {LoRETTA}$_{adp}$ and {LoRETTA}$_{rep}$. The former employs tensorized adapters, offering a high-performance yet lightweight approach for the fine-tuning of LLMs. The latter emphasizes fine-tuning via weight parameterization with a set of small tensor factors. LoRETTA achieves comparable or better performance than most widely used PEFT methods with up to $100\times$ fewer parameters on the LLaMA-2-7B models. Furthermore, empirical results demonstrate that the proposed method effectively improves training efficiency, enjoys better multi-task learning performance, and enhances the anti-overfitting capability. Plug-and-play codes built upon the Huggingface framework and PEFT library will be released.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.00927 [pdf, other]

doi 10.1051/0004-6361/202348308

Ordered magnetic fields around the 3C 84 central black hole

Authors: G. F. Paraschos, J. -Y. Kim, M. Wielgus, J. Röder, T. P. Krichbaum, E. Ros, I. Agudo, I. Myserlis, M. Moscibrodzka, E. Traianou, J. A. Zensus, L. Blackburn, C. -K. Chan, S. Issaoun, M. Janssen, M. D. Johnson, V. L. Fish, K. Akiyama, A. Alberdi, W. Alef, J. C. Algaba, R. Anantua, K. Asada, R. Azulay, U. Bach, et al. (258 additional authors not shown)

Abstract: 3C84 is a nearby radio source with a complex total intensity structure, showing linear polarisation and spectral patterns. A detailed investigation of the central engine region necessitates the use of VLBI above the hitherto available maximum frequency of 86GHz. Using ultrahigh resolution VLBI observations at the highest available frequency of 228GHz, we aim to directly detect compact structures and understand the physical conditions in the compact region of 3C84. We used EHT 228GHz observations and, given the limited (u,v)-coverage, applied geometric model fitting to the data. We also employed quasi-simultaneously observed, multi-frequency VLBI data for the source in order to carry out a comprehensive analysis of the core structure. We report the detection of a highly ordered, strong magnetic field around the central SMBH of 3C84. The brightness temperature analysis suggests that the system is in equipartition. We determined a turnover frequency of $ν_m=(113\pm4)$GHz, a corresponding synchrotron self-absorbed magnetic field of $B_{SSA}=(2.9\pm1.6)$G, and an equipartition magnetic field of $B_{eq}=(5.2\pm0.6)$G. Three components are resolved with the highest fractional polarisation detected for this object ($m_\textrm{net}=(17.0\pm3.9)$%). The positions of the components are compatible with those seen in low-frequency VLBI observations since 2017-2018. We report a steeply negative slope of the spectrum at 228GHz. We used these findings to test models of jet formation, propagation, and Faraday rotation in 3C84. The findings of our investigation into different flow geometries and black hole spins support an advection-dominated accretion flow in a magnetically arrested state around a rapidly rotating supermassive black hole as a model of the jet-launching system in the core of 3C84. However, systematic uncertainties due to the limited (u,v)-coverage cannot be ignored.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 15 pages, 6 figures, published in A&A

Journal ref: Issue: A&A Volume 682, February 2024; Article number: L3; Number of pages: 15

arXiv:2312.17018 [pdf, other]

Learning Spatially Collaged Fourier Bases for Implicit Neural Representation

Authors: Jason Chun Lok Li, Chang Liu, Binxiao Huang, Ngai Wong

Abstract: Existing approaches to Implicit Neural Representation (INR) can be interpreted as a global scene representation via a linear combination of Fourier bases of different frequencies. However, such universal basis functions can limit the representation capability in local regions where a specific component is unnecessary, resulting in unpleasant artifacts. To this end, we introduce a learnable spatial mask that effectively dispatches distinct Fourier bases into respective regions. This translates into collaging Fourier patches, thus enabling an accurate representation of complex signals. Comprehensive experiments demonstrate the superior reconstruction quality of the proposed approach over existing baselines across various INR tasks, including image fitting, video representation, and 3D shape representation. Our method outperforms all other baselines, improving the image fitting PSNR by over 3dB and 3D reconstruction to 98.81 IoU and 0.0011 Chamfer Distance.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 11 pages, 13 figures, Accepted at the 38th AAAI Conference on Artificial Intelligence (AAAI-24)

arXiv:2312.16172 [pdf, other]

Balanced Turbulence and the Helicity Barrier in Black Hole Accretion

Authors: George N. Wong, Lev Arzamasskiy

Abstract: Horizon-scale observations from the Event Horizon Telescope (EHT) have enabled precision study of supermassive black hole accretion. Contemporary accretion modeling often treats the inflowing plasma as a single, thermal fluid, but microphysical kinetic effects can lead to significant deviations from this idealized picture. We investigate how the helicity barrier influences EHT-accessible electromagnetic observables by employing a simple model for electron heating based on kinetic physics and the cascade of energy and helicity in unbalanced turbulence. Although the helicity barrier plays only a minor role in regions with high plasma-beta, like in SANE disks, it may have a substantial impact in regions with more ordered magnetic fields, such as the jet and its surrounding wind in SANE flows as well as throughout the entire domain in MAD flows. In SANE flows, emission shifts from the funnel wall towards the lower-magnetization disk region; in MAD flows the emission morphology remains largely unchanged. Including the helicity barrier leads to characteristically lower electron temperatures, and neglecting it can lead to underestimated accretion rates and inferred jet powers. The corresponding higher plasma densities result in increased depolarization and Faraday depths, thereby decreasing the amplitude of the beta_2 coefficient while leaving its angle unchanged. Both the increased jet power and lower |beta_2| may help alleviate outstanding tensions between modeling and EHT observations.
We also find that the estimated ring diameter may be underestimated when the helicity barrier is neglected. Our results underscore the significance of the helicity barrier in shaping black hole observables and inferred accretion system parameters.

Submitted 26 December, 2023; originally announced December 2023.

Comments: 19 pages, 10 figures, accepted for publication in ApJ

arXiv:2312.09922 [pdf, other]

A Unifying Tensor View for Lightweight CNNs

Authors: Jason Chun Lok Li, Rui Lin, Jiajun Zhou, Edmund Yin Mun Lam, Ngai Wong

Abstract: Despite the decomposition of convolutional kernels for lightweight CNNs being well studied, existing works that rely on tensor network diagrams or hyperdimensional abstraction lack geometric intuition. This work devises a new perspective by linking a 3D-reshaped kernel tensor to its various slice-wise and rank-1 decompositions, permitting a straightforward connection between various tensor approximations and efficient CNN modules. Specifically, it is discovered that a pointwise-depthwise-pointwise (PDP) configuration constitutes a viable construct for lightweight CNNs. Moreover, a novel link to the latest ShiftNet is established, inspiring a first-ever shift layer pruning that achieves nearly 50% compression with < 1% drop in accuracy for ShiftResNet.

Submitted 15 December, 2023; originally announced December 2023.

Comments: 4 pages, 3 figures, accepted in 2023 IEEE 15th International Conference on ASIC (ASICON 2023)

arXiv:2312.06101 [pdf, other]

Hundred-Kilobyte Lookup Tables for Efficient Single-Image Super-Resolution

Authors: Binxiao Huang, Jason Chun Lok Li, Jie Ran, Boyu Li, Jiajun Zhou, Dahai Yu, Ngai Wong

Abstract: Conventional super-resolution (SR) schemes make heavy use of convolutional neural networks (CNNs), which involve intensive multiply-accumulate (MAC) operations, and require specialized hardware such as graphics processing units. This contradicts the regime of edge AI that often runs on devices strained by power, computing, and storage resources. Such a challenge has motivated a series of lookup table (LUT)-based SR schemes that employ simple LUT readout and largely elude CNN computation. Nonetheless, the multi-megabyte LUTs in existing methods still prohibit on-chip storage and necessitate off-chip memory transport. This work tackles this storage hurdle and innovates hundred-kilobyte LUT (HKLUT) models amenable to on-chip cache. Utilizing an asymmetric two-branch multistage network coupled with a suite of specialized kernel patterns, HKLUT demonstrates an uncompromising performance and superior hardware efficiency over existing LUT schemes. Our implementation is publicly available at: https://github.com/jasonli0707/hklut.

Submitted 8 May, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

arXiv:2311.08125 [pdf, other]

Lite it fly: An All-Deformable-Butterfly Network

Authors: Rui Lin, Jason Chun Lok Li, Jiajun Zhou, Binxiao Huang, Jie Ran, Ngai Wong

Abstract: Most deep neural networks (DNNs) consist fundamentally of convolutional and/or fully connected layers, wherein the linear transform can be cast as the product between a filter matrix and a data matrix obtained by arranging feature tensors into columns. The lately proposed deformable butterfly (DeBut) decomposes the filter matrix into generalized, butterfly-like factors, thus achieving network compression orthogonal to the traditional ways of pruning or low-rank decomposition. This work reveals an intimate link between DeBut and a systematic hierarchy of depthwise and pointwise convolutions, which explains the empirically good performance of DeBut layers. By developing an automated DeBut chain generator, we show for the first time the viability of homogenizing a DNN into all DeBut layers, thus achieving an extreme sparsity and compression. Various examples and hardware benchmarks verify the advantages of All-DeBut networks. In particular, we show it is possible to compress a PointNet to < 5% parameters with < 5% accuracy drop, a record not achievable by other compression schemes.

Submitted 14 November, 2023; originally announced November 2023.

Comments: 7 pages, 3 figures, accepted as a brief paper in IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

arXiv:2311.04388 [pdf, other]

The $230$ GHz Variability of Numerical Models of Sagittarius~A* I. Parameter Surveys on Varying Ion-electron Temperature Ratios Under Strongly Magnetized Conditions

Authors: Ho-Sang Chan, Chi-kwan Chan, Ben S. Prather, George N. Wong, Charles Gammie

Abstract: The $230$ GHz lightcurves of Sagittarius~A* (Sgr~A*) predicted by general relativistic magnetohydrodynamics (GRMHD) and ray-tracing (GRRT) models in Event Horizon Telescope Collaboration et al. (2022) have higher variability $M_{ΔT}$ compared to observations. In this series of papers, we explore the origin of such large brightness variability. In this first paper, we performed large GRRT parameter surveys that span from the optically thin to the optically thick regimes, covering the ion-electron temperature ratio under strongly magnetized conditions, $R_{\rm Low}$, from $1$ to $60$. We find that increasing $R_{\rm Low}$ can lead to either an increase or a reduction in $M_{ΔT}$ depending on other model parameters, making it consistent with the observed variability of Sgr~A* in some cases. Our analysis of GRRT image snapshots finds that the major contribution to the large $M_{ΔT}$ for the $R_{\rm Low} = 1$ models comes from the photon rings. However, secondary contributions from the accretion flow are also visible depending on the spin parameter. Our work demonstrates the importance of the electron temperature used for modelling radiatively inefficient accretion flows and places new constraints on the ion-electron temperature ratio. A more in-depth analysis for understanding the dependencies of $M_{ΔT}$ on $R_{\rm Low}$ will be performed in subsequent papers.

Submitted 4 February, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: 15 Pages, 9 Figures

arXiv:2309.05234 [pdf]

High-dimensional time-frequency entanglement in a singly-filtered biphoton frequency comb

Authors: Xiang Cheng, Kai-Chi Chang, Murat Can Sarihan, Andrew Mueller, Maria Spiropulu, Matthew D. Shaw, Boris Korzh, Andrei Faraon, Franco N. C. Wong, Jeffrey H. Shapiro, Chee Wei Wong

Abstract: High-dimensional quantum entanglement is a cornerstone for advanced technology enabling large-scale noise-tolerant quantum systems, fault-tolerant quantum computing, and distributed quantum networks. The recently developed biphoton frequency comb (BFC) provides a powerful platform for high-dimensional quantum information processing in its spectral and temporal quantum modes. Here we propose and generate a singly-filtered high-dimensional BFC via spontaneous parametric down-conversion by spectrally shaping only the signal photons with a Fabry-Perot cavity. High-dimensional energy-time entanglement is verified through Franson-interference recurrences and temporal correlation with low-jitter detectors. Frequency and temporal entanglement of our singly-filtered BFC is then quantified by Schmidt mode decomposition. Subsequently, we distribute the high-dimensional singly-filtered BFC state over a 10 km fiber link with a post-distribution time-bin dimension lower bounded to be at least 168. Our demonstrations of high-dimensional entanglement and entanglement distribution show the capability of the singly-filtered quantum frequency comb for high-efficiency quantum information processing and high-capacity quantum networks.

Submitted 11 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: 30 pages, 4 figures

arXiv:2308.15381 [pdf, other]

A search for pulsars around Sgr A* in the first Event Horizon Telescope dataset

Authors: Pablo Torne, Kuo Liu, Ralph P. Eatough, Jompoj Wongphechauxsorn, James M. Cordes, Gregory Desvignes, Mariafelicia De Laurentis, Michael Kramer, Scott M. Ransom, Shami Chatterjee, Robert Wharton, Ramesh Karuppusamy, Lindy Blackburn, Michael Janssen, Chi-kwan Chan, Geoffrey B. Crew, Lynn D. Matthews, Ciriaco Goddi, Helge Rottmann, Jan Wagner, Salvador Sanchez, Ignacio Ruiz, Federico Abbate, Geoffrey C. Bower, Juan J. Salamanca, et al. (261 additional authors not shown)

Abstract: The Event Horizon Telescope (EHT) observed in 2017 the supermassive black hole at the center of the Milky Way, Sagittarius A* (Sgr A*), at a frequency of 228.1 GHz ($λ$=1.3 mm). The fundamental physics tests that even a single pulsar orbiting Sgr A* would enable motivate searching for pulsars in EHT datasets. The high observing frequency means that pulsars - which typically exhibit steep emission spectra - are expected to be very faint. However, it also negates pulse scattering, an effect that could hinder pulsar detections in the Galactic Center. Additionally, magnetars or a secondary inverse Compton emission could be stronger at millimeter wavelengths than at lower frequencies. We present a search for pulsars close to Sgr A* using the data from the three most-sensitive stations in the EHT 2017 campaign: the Atacama Large Millimeter/submillimeter Array, the Large Millimeter Telescope and the IRAM 30 m Telescope. We apply three detection methods based on Fourier-domain analysis, the Fast-Folding-Algorithm and single pulse search targeting both pulsars and burst-like transient emission, using the simultaneity of the observations to confirm potential candidates. No new pulsars or significant bursts were found. Being the first pulsar search ever carried out at such high radio frequencies, we detail our analysis methods and give a detailed estimation of the sensitivity of the search.
We conclude that the EHT 2017 observations are only sensitive to a small fraction ($\lesssim$2.2%) of the pulsars that may exist close to Sgr A*, motivating further searches for fainter pulsars in the region.

Submitted 29 August, 2023; originally announced August 2023.

Comments: 33 pages, 7 figures, 6 Tables. Accepted for publication in ApJ

arXiv:2307.06372 [pdf, other]

doi 10.3847/1538-4357/acf92d

Black Hole Polarimetry I: A Signature of Electromagnetic Energy Extraction

Authors: Andrew Chael, Alexandru Lupsasca, George N. Wong, Eliot Quataert

Abstract: In 1977, Blandford and Znajek showed that the electromagnetic field surrounding a rotating black hole can harvest its spin energy and use it to power a collimated astrophysical jet, such as the one launched from the center of the elliptical galaxy M87. Today, interferometric observations with the Event Horizon Telescope (EHT) are delivering high-resolution, event-horizon-scale, polarimetric images of the supermassive black hole M87* at the jet launching point. These polarimetric images offer an unprecedented window into the electromagnetic field structure around a black hole. In this paper, we show that a simple polarimetric observable -- the phase $\angleβ_2$ of the second azimuthal Fourier mode of the linear polarization in a near-horizon image -- depends on the sign of the electromagnetic energy flux and therefore provides a direct probe of black hole energy extraction. In Boyer-Lindquist coordinates, the Poynting flux for axisymmetric electromagnetic fields is proportional to the product $B^φB^r$. The phase $\angleβ_2$ likewise depends on the ratio $B^φ/B^r$, thereby enabling an observer to experimentally determine the direction of electromagnetic energy flow in the near-horizon environment. Data from the 2017 EHT observations of M87* are consistent with electromagnetic energy outflow. Currently envisioned multi-frequency observations of M87* will achieve higher dynamic range and angular resolution, and hence deliver measurements of $\angleβ_2$ closer to the event horizon as well as better constraints on Faraday rotation.
Such observations will enable a definitive test for energy extraction from the black hole M87*.

Submitted 14 November, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: 35 pages, 5 figures. Published in ApJ

arXiv:2307.05293 [pdf, other]

doi 10.3847/2041-8213/ace630

Demonstrating Photon Ring Existence with Single-Baseline Polarimetry

Authors: Daniel C. M. Palumbo, George N. Wong, Andrew A. Chael, Michael D. Johnson

Abstract: Images of supermassive black hole accretion flows contain features of both curved spacetime and plasma structure. Inferring properties of the spacetime from images requires modeling the plasma properties, and vice versa. The Event Horizon Telescope Collaboration has imaged near-horizon millimeter emission from both Messier 87* (M87*) and Sagittarius A* (Sgr A*) with very-long-baseline interferometry (VLBI) and has found a preference for magnetically arrested disk (MAD) accretion in each case. MAD accretion enables spacetime measurements through future observations of the photon ring, the image feature composed of near-orbiting photons. The ordered fields and relatively weak Faraday rotation of MADs yield rotationally symmetric polarization when viewed at modest inclination. In this letter, we utilize this symmetry along with parallel transport symmetries to construct a gain-robust interferometric quantity that detects the transition between the weakly lensed accretion flow image and the strongly lensed photon ring. We predict a shift in polarimetric phases on long baselines and demonstrate that the photon rings in M87* and Sgr A* can be unambiguously detected with sensitive, long-baseline measurements. For M87* we find that photon ring detection in snapshot observations requires $\sim1$ mJy sensitivity on $>15$ G$λ$ baselines at 230 GHz and above, which could be achieved with space-VLBI or higher-frequency ground-based VLBI.
For Sgr A*, we find that interstellar scattering inhibits photon ring detectability at 230 GHz, but $\sim10$ mJy sensitivity on $>12$ G$λ$ baselines at 345 GHz is sufficient, which is accessible from the ground. For both sources, these sensitivity requirements may be relaxed by repeated observations and averaging.

Submitted 11 July, 2023; originally announced July 2023.

Comments: 14 pages, 7 figures, Accepted to ApJL

arXiv:2306.14262 [pdf, other]

A Spectral Perspective towards Understanding and Improving Adversarial Robustness

Authors: Binxiao Huang, Rui Lin, Chaofan Tao, Ngai Wong

Abstract: Deep neural networks (DNNs) are incredibly vulnerable to crafted, imperceptible adversarial perturbations. While adversarial training (AT) has proven to be an effective defense approach, the AT mechanism for robustness improvement is not fully understood. This work investigates AT from a spectral perspective, adding new insights to the design of effective defenses. In particular, we show that AT induces the deep model to focus more on the low-frequency region, which retains the shape-biased representations, to gain robustness. Further, we find that the spectrum of a white-box attack is primarily distributed in regions the model focuses on, and the perturbation attacks the spectral bands where the model is vulnerable. Based on this observation, to train a model tolerant to frequency-varying perturbation, we propose a spectral alignment regularization (SAR) such that the spectral output inferred by an attacked adversarial input stays as close as possible to its natural input counterpart. Experiments demonstrate that SAR and its weight averaging (WA) extension could significantly improve the robust accuracy by 1.14% ~ 3.87% relative to the standard AT, across multiple datasets (CIFAR-10, CIFAR-100 and Tiny ImageNet), and various attacks (PGD, C&W and Autoattack), without any extra data.

Submitted 25 June, 2023; originally announced June 2023.

arXiv:2306.14099 [pdf]

High-precision and low-latency widefield diamond quantum sensing with neuromorphic vision sensors

Authors: Zhiyuan Du, Madhav Gupta, Feng Xu, Kai Zhang, Jiahua Zhang, Yan Zhou, Yiyao Liu, Zhenyu Wang, Jorg Wrachtrup, Ngai Wong, Can Li, Zhiqin Chu

Abstract: During the past decade, interest has grown significantly in developing ultrasensitive widefield diamond magnetometry for various applications. Despite attempts to improve the adoption of conventional frame-based sensors, achieving high temporal resolution and sensitivity simultaneously remains a key challenge. This is largely due to the transfer and processing of massive amounts of sensor data to capture the widefield fluorescence intensity changes of spin defects in diamonds. In this study, we adopt a neuromorphic vision sensor to address this issue. This sensor pre-processes the detected signals in optically detected magnetic resonance (ODMR) measurements for quantum sensing, employing a working principle that closely resembles the operation of the human vision system. By encoding the changes of light intensity into spikes, this approach results in a vast dynamic range, high temporal resolution, and exceptional signal-to-background ratio. After a thorough evaluation of theoretical feasibility, our experiment with an off-the-shelf event camera demonstrated a 13x improvement in temporal resolution with comparable precision of detecting ODMR resonance frequencies compared with the state-of-the-art highly specialized frame-based approach. A specialized camera system with the same mechanism has the potential to enhance these benefits further. This performance improvement is primarily attributable to orders of magnitude smaller data volumes and, thus, reduced latency.
We further showcase the deployment of this technology in monitoring dynamically modulated laser heating of gold nanoparticles coated on a diamond surface, a recognizably difficult task using existing approaches. The current development provides new insights for high-precision and low-latency widefield quantum sensing, with possibilities for integration with emerging memory devices for more efficient event-based data processing.

Submitted 24 June, 2023; originally announced June 2023.

Comments: 21 pages, 4 figures

arXiv:2306.12824 [pdf, ps, other]

Weighted composition operators preserving various Lipschitz constants

Authors: Ching-Jou Liao, Chih-Neng Liu, Jung-Hui Liu, Ngai-Ching Wong

Abstract: Let $\mathrm{Lip}(X)$, $\mathrm{Lip}^b(X)$, $\mathrm{Lip}^{\mathrm{loc}}(X)$ and $\mathrm{Lip}^\mathrm{pt}(X)$ be the vector spaces of Lipschitz, bounded Lipschitz, locally Lipschitz and pointwise Lipschitz (real-valued) functions defined on a metric space $(X, d_X)$, respectively. We show that if a weighted composition operator $Tf=h\cdot f\circ \varphi$ defines a bijection between such vector spaces preserving Lipschitz constants, local Lipschitz constants or pointwise Lipschitz constants, then $h= \pm1/α$ is a constant function for some scalar $α>0$ and $\varphi$ is an $α$-dilation. Let $U$ be open connected and $V$ be open, or both $U,V$ are convex bodies, in normed linear spaces $E, F$, respectively. Let $Tf=h\cdot f\circ\varphi$ be a bijective weighted composition operator between the vector spaces $\mathrm{Lip}(U)$ and $\mathrm{Lip}(V)$, $\mathrm{Lip}^b(U)$ and $\mathrm{Lip}^b(V)$, $\mathrm{Lip}^\mathrm{loc}(U)$ and $\mathrm{Lip}^\mathrm{loc}(V)$, or $\mathrm{Lip}^\mathrm{pt}(U)$ and $\mathrm{Lip}^\mathrm{pt}(V)$, preserving the Lipschitz, locally Lipschitz, or pointwise Lipschitz constants, respectively. We show that there is a linear isometry $A: F\to E$, an $α>0$ and a vector $b\in E$ such that $\varphi(x)=αAx + b$, and $h$ is a constant function assuming value $\pm 1/α$. More concrete results are obtained for the special cases when $E=F=\mathbb{R}^n$, or when $U,V$ are $n$-dimensional flat manifolds.

Submitted 22 June, 2023; originally announced June 2023.

Comments: to appear in "Annals of Mathematical Sciences and Applications"

MSC Class: 46B04; 51F30; 26A16

arXiv:2306.11149 [pdf, other]

Overcoming Beam Squint in Dual-Wideband mmWave MIMO Channel Estimation: A Bayesian Multi-Band Sparsity Approach

Authors: Le Xu, Lei Cheng, Ngai Wong, Yik-Chung Wu, H. Vincent Poor

Abstract: The beam squint effect, which manifests in different steering matrices in different sub-bands, has been widely considered a challenge in millimeter wave (mmWave) multi-input multi-output (MIMO) channel estimation. Existing methods either require specific forms of the precoding/combining matrix, which restrict their general practicality, or simply ignore the beam squint effect by only making use of a single sub-band for channel estimation. Recognizing that different steering matrices are coupled by the same set of unknown channel parameters, this paper proposes to exploit the common sparsity structure of the virtual channel model so that signals from different sub-bands can be jointly utilized to enhance the performance of channel estimation. A probabilistic model is built to induce the common sparsity in the spatial domain, and the first-order Taylor expansion is adopted to get rid of the grid mismatch in the dictionaries. To learn the model parameters, a variational expectation-maximization (EM) algorithm is derived, which automatically obtains the balance between the likelihood function and the common sparsity prior information, and is applicable to arbitrary forms of precoding/combining matrices. Simulation results show the superior estimation accuracy of the proposed algorithm over existing methods under different noise powers and system configurations.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.11123 [pdf, other]

To Fold or Not to Fold: Graph Regularized Tensor Train for Visual Data Completion

Authors: Le Xu, Lei Cheng, Ngai Wong, Yik-Chung Wu

Abstract: Tensor train (TT) representation has achieved tremendous success in visual data completion tasks, especially when it is combined with tensor folding. However, folding an image or video tensor breaks the original data structure, leading to local information loss as nearby pixels may be assigned into different dimensions and become far away from each other. In this paper, to fully preserve the local information of the original visual data, we explore not folding the data tensor, and at the same time adopt graph information to regularize local similarity between nearby entries. To overcome the high computational complexity introduced by the graph-based regularization in the TT completion problem, we propose to break the original problem into multiple sub-problems with respect to each TT core fiber, instead of each TT core as in traditional methods. Furthermore, to avoid heavy parameter tuning, a sparsity promoting probabilistic model is built based on the generalized inverse Gaussian (GIG) prior, and an inference algorithm is derived under the mean-field approximation. Experiments on both synthetic data and real-world visual data show the superiority of the proposed methods.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2305.15365 [pdf, other]

Boundary Attention Mapping (BAM): Fine-grained saliency maps for segmentation of Burn Injuries

Authors: Mahla Abdolahnejad, Justin Lee, Hannah Chan, Alex Morzycki, Olivier Ethier, Anthea Mo, Peter X. Liu, Joshua N. Wong, Colin Hong, Rakesh Joshi

Abstract: Burn injuries can result from mechanisms such as thermal, chemical, and electrical insults. A prompt and accurate assessment of burns is essential for deciding definitive clinical treatments. Currently, the primary approach for burn assessments, via visual and tactile observations, is approximately 60%-80% accurate. The gold standard is biopsy and a close second would be non-invasive methods like Laser Doppler Imaging (LDI) assessments, which have up to 97% accuracy in predicting burn severity and the required healing time. In this paper, we introduce a machine learning pipeline for assessing burn severities and segmenting the regions of skin that are affected by burn. Segmenting 2D colour images of burns allows for the injured versus non-injured skin to be delineated, clearly marking the extent and boundaries of the localized burn/region-of-interest, even during remote monitoring of a burn patient. We trained a convolutional neural network (CNN) to classify four severities of burns. We built a saliency mapping method, Boundary Attention Mapping (BAM), that utilises this trained CNN for the purpose of accurately localizing and segmenting the burn regions from skin burn images. We demonstrated the effectiveness of our proposed pipeline through extensive experiments and evaluations using two datasets: 1) a larger skin burn image dataset consisting of 1684 skin burn images of four burn severities, and 2) an LDI dataset that consists of a total of 184 skin burn images with their associated LDI scans.
The CNN trained using the first dataset achieved an average F1-Score of 78% and micro/macro-average ROC of 85% in classifying the four burn severities. Moreover, a comparison between the BAM results and LDI results for measuring injury boundary showed that the segmentations generated by our method achieved 91.60% accuracy, 78.17% sensitivity, and 93.37% specificity.

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.09812 [pdf]

doi 10.1038/s41566-023-01224-x

A chip-scale polarization-spatial-momentum quantum SWAP gate in silicon nanophotonics

Authors: Xiang Cheng, Kai-Chi Chang, Zhenda Xie, Murat Can Sarihan, Yoo Seung Lee, Yongnan Li, XinAn Xu, Abhinav Kumar Vinod, Serdar Kocaman, Mingbin Yu, Patrick Guo-Qiang Lo, Dim-Lee Kwong, Jeffrey H. Shapiro, Franco N. C. Wong, Chee Wei Wong

Abstract: Recent progress in quantum computing and networking enables high-performance large-scale quantum processors by connecting different quantum modules. Optical quantum systems show advantages in both computing and communications, and integrated quantum photonics further increases the level of scaling and complexity. Here we demonstrate an efficient SWAP gate that deterministically swaps a photon's polarization qubit with its spatial-momentum qubit on a nanofabricated two-level silicon-photonics chip containing three cascaded gates. The on-chip SWAP gate is comprehensively characterized by tomographic measurements with high fidelity for both single-qubit and two-qubit operation. The coherence preservation of the SWAP gate process is verified by single-photon and two-photon quantum interference. The coherent reversible conversion of our SWAP gate facilitates a quantum interconnect between different photonic subsystems with different degrees of freedom, demonstrated by distributing four Bell states between two chips. We also elucidate the source of decoherence in the SWAP operation in pursuit of near-unity fidelity. Our deterministic SWAP gate in the silicon platform provides a pathway towards integrated quantum information processing for interconnected modular systems.

Submitted 16 May, 2023; originally announced May 2023.

Comments: 25 pages, 4 figures

Journal ref: Nat. Photon. 17, 656-665 (2023)

arXiv:2305.09098 [pdf, other]

Weight-Inherited Distillation for Task-Agnostic BERT Compression

Authors: Taiqiang Wu, Cheng Hou, Shanshan Lao, Jiayi Li, Ngai Wong, Zhe Zhao, Yujiu Yang

Abstract: Knowledge Distillation (KD) is a predominant approach for BERT compression. Previous KD-based methods focus on designing extra alignment losses for the student model to mimic the behavior of the teacher model. These methods transfer the knowledge in an indirect way. In this paper, we propose a novel Weight-Inherited Distillation (WID), which directly transfers knowledge from the teacher. WID does not require any additional alignment loss and trains a compact student by inheriting the weights, showing a new perspective of knowledge distillation. Specifically, we design the row compactors and column compactors as mappings and then compress the weights via structural re-parameterization. Experimental results on the GLUE and SQuAD benchmarks show that WID outperforms previous state-of-the-art KD-based baselines. Further analysis indicates that WID can also learn the attention patterns from the teacher model without any alignment loss on attention distributions. The code is available at https://github.com/wutaiqiang/WID-NAACL2024.

Submitted 20 March, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 9 pages, 4 figures, NAACL2024 findings

arXiv:2304.03804 [pdf, other]

Mahakala: a Python-based Modular Ray-tracing and Radiative Transfer Algorithm for Curved Space-times

Authors: Aniket Sharma, Lia Medeiros, Chi-kwan Chan, Goni Halevi, Patrick D. Mullen, James M. Stone, George N. Wong

Abstract: We introduce Mahakala, a Python-based, modular, radiative ray-tracing code for curved space-times. We employ Google's JAX framework for accelerated automatic differentiation, which can efficiently compute Christoffel symbols directly from the metric, allowing the user to easily and quickly simulate photon trajectories through non-Kerr metrics. JAX also enables Mahakala to run in parallel on both CPUs and GPUs and achieve speeds comparable to C-based codes. Mahakala natively uses the Cartesian Kerr-Schild coordinate system, which avoids numerical issues caused by the "pole" of spherical coordinates. We demonstrate Mahakala's capabilities by simulating the 1.3 mm wavelength images (the wavelength of Event Horizon Telescope observations) of general relativistic magnetohydrodynamic simulations of low-accretion rate supermassive black holes. The modular nature of Mahakala allows us to easily quantify the relative contribution of different regions of the flow to image features. We show that most of the emission seen in 1.3 mm images originates close to the black hole. We also quantify the relative contribution of the disk, forward jet, and counter jet to 1.3 mm images.

Submitted 7 April, 2023; originally announced April 2023.

Comments: 15 pages, 11 figures

arXiv:2303.15522 [pdf, other]

$κ$monty: a Monte Carlo Compton Scattering code including non-thermal electrons

Authors: Jordy Davelaar, Benjamin R. Ryan, George N. Wong, Thomas Bronzwaer, Hector Olivares, Monika Mościbrodzka, Charles F. Gammie, Heino Falcke

Abstract: Low-luminosity active galactic nuclei are strong sources of X-ray emission produced by Compton scattering originating from the accretion flows surrounding their supermassive black holes. The shape and energy of the resulting spectrum depend on the shape of the underlying electron distribution function (DF). In this work, we present an extended version of the grmonty code, called $κ$monty. The grmonty code previously only included a thermal Maxwell-Jüttner electron distribution function. We extend the grmonty code with non-thermal electron DFs, namely the $κ$ and power-law DFs, implement Cartesian Kerr-Schild coordinates, accelerate the code with MPI, and couple the code to the non-uniform AMR grid data from the GRMHD code BHAC. For the Compton scattering process, we derive two sampling kernels for both distribution functions. Finally, we present a series of code tests to verify the accuracy of our schemes. The implementation of non-thermal DFs opens the possibility of studying the effect of non-thermal emission on previously developed black hole accretion models.

Submitted 2 October, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: 12 pages, 9 figures, accepted for publication in MNRAS

arXiv:2303.14893 [pdf, other]

Context-Aware Transformer for 3D Point Cloud Automatic Annotation

Authors: Xiaoyan Qian, Chang Liu, Xiaojuan Qi, Siew-Chong Tan, Edmund Lam, Ngai Wong

Abstract: 3D automatic annotation has received increased attention since manually annotating 3D point clouds is laborious. However, existing methods are usually complicated, e.g., pipelined training for 3D foreground/background segmentation, cylindrical object proposals, and point completion. Furthermore, they often overlook the inter-object feature relation that is particularly informative to hard samples for 3D annotation. To this end, we propose a simple yet effective end-to-end Context-Aware Transformer (CAT) as an automated 3D-box labeler to generate precise 3D box annotations from 2D boxes, trained with a small number of human annotations. We adopt the general encoder-decoder architecture, where the CAT encoder consists of an intra-object encoder (local) and an inter-object encoder (global), performing self-attention along the sequence and batch dimensions, respectively. The former models intra-object interactions among points, and the latter extracts feature relations among different objects, thus boosting scene-level understanding. Via local and global encoders, CAT can generate high-quality 3D box annotations with a streamlined workflow, allowing it to outperform existing state-of-the-art by up to 1.79% 3D AP on the hard task of the KITTI test set.

Submitted 26 March, 2023; originally announced March 2023.

arXiv:2303.13763 [pdf, other]

Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs

Authors: Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang

Abstract: Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer perceptrons (MLPs) on graph tasks has become a hot research topic. However, MLPs rely exclusively on the node features and fail to capture the graph structural information. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable for various scenarios. To this end, we propose a Prototype-Guided Knowledge Distillation (PGKD) method, which does not require graph edges (edge-free) yet learns structure-aware MLPs. Specifically, we analyze the graph structural information in GNN teachers, and distill such information from GNNs to MLPs via prototypes in an edge-free setting. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD.

Submitted 27 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: 8 pages, 4 figures, 9 tables

arXiv:2303.12004 [pdf, other]

doi 10.3847/1538-4357/acc586

Comparison of Polarized Radiative Transfer Codes used by the EHT Collaboration

Authors: Ben S. Prather, Jason Dexter, Monika Moscibrodzka, Hung-Yi Pu, Thomas Bronzwaer, Jordy Davelaar, Ziri Younsi, Charles F. Gammie, Roman Gold, George N. Wong, Kazunori Akiyama, Antxon Alberdi, Walter Alef, Juan Carlos Algaba, Richard Anantua, Keiichi Asada, Rebecca Azulay, Uwe Bach, Anne-Kathrin Baczko, David Ball, Mislav Baloković, John Barrett, Michi Bauböck, Bradford A. Benson, Dan Bintley, et al. (248 additional authors not shown)

Abstract: Interpretation of resolved polarized images of black holes by the Event Horizon Telescope (EHT) requires predictions of the polarized emission observable by an Earth-based instrument for a particular model of the black hole accretion system. Such predictions are generated by general relativistic radiative transfer (GRRT) codes, which integrate the equations of polarized radiative transfer in curved spacetime. A selection of ray-tracing GRRT codes used within the EHT collaboration is evaluated for accuracy and consistency in producing a selection of test images, demonstrating that the various methods and implementations of radiative transfer calculations are highly consistent. When imaging an analytic accretion model, we find that all codes produce images similar within a pixel-wise normalized mean squared error (NMSE) of 0.012 in the worst case. When imaging a snapshot from a cell-based magnetohydrodynamic simulation, we find all test images to be similar within NMSEs of 0.02, 0.04, 0.04, and 0.12 in Stokes I, Q, U, and V, respectively. We additionally find the values of several image metrics relevant to published EHT results to be in agreement to much better precision than measurement uncertainties.

Submitted 21 March, 2023; originally announced March 2023.

Comments: Accepted for publication in ApJ

arXiv:2302.12510 [pdf, other]

doi 10.1109/TCAD.2023.3342730

DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference

Authors: Jiajun Zhou, Jiajun Wu, Yizhao Gao, Yuhao Ding, Chaofan Tao, Boyu Li, Fengbin Tu, Kwang-Ting Cheng, Hayden Kwok-Hay So, Ngai Wong

Abstract: To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize the DNN models into low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths (< 8 bits). This work targets an adaptive data representation with variable-length encoding called DyBit. DyBit can dynamically adjust the precision and range of separate bit-fields to adapt to the distribution of DNN weights/activations. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade off inference accuracy and speedup. Experimental results demonstrate that the inference accuracy via DyBit is 1.997% higher than the state-of-the-art at 4-bit quantization, and the proposed framework can achieve up to 8.1x speedup compared with the original model.

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2302.11170 [pdf, ps, other]

Linear maps preserving matrices annihilated by a fixed polynomial

Authors: Chi-Kwong Li, Ming-Cheng Tsai, Ya-Shu Wang, Ngai-Ching Wong

Abstract: Let ${\bf M}_n(\mathbb{F})$ be the algebra of $n\times n$ matrices over an arbitrary field $\mathbb{F}$. We consider linear maps $Φ: {\bf M}_n(\mathbb{F}) \rightarrow {\bf M}_r(\mathbb{F})$ preserving matrices annihilated by a fixed polynomial $f(x) = (x-a_1)\cdots (x-a_m)$ with $m\ge 2$ distinct zeroes $a_1, a_2, \ldots, a_m \in \mathbb{F}$; namely, $$ f(Φ(A)) = 0\quad\text{whenever} \quad f(A) = 0. $$ Suppose that $f(0)=0$, and the zero set $Z(f) =\{a_1, \dots, a_m\}$ is not an additive group. Then $Φ$ assumes the form \begin{align}\label{eq:standard} A \mapsto S\begin{pmatrix} A \otimes D_1 &&\cr & A^{T} \otimes D_2& \cr && 0_s\cr\end{pmatrix}S^{-1}, \tag{$\dagger$} \end{align} for some invertible matrix $S\in {\bf M}_r(\mathbb{F})$, invertible diagonal matrices $D_1\in {\bf M}_p(\mathbb{F})$ and $D_2\in {\bf M}_q(\mathbb{F})$, where $s=r-np-nq\geq 0$. The diagonal entries $λ$ in $D_1$ and $D_2$, as well as $0$ in the zero matrix $0_s$, are zero multipliers of $f(x)$ in the sense that $λZ(f) \subseteq Z(f)$. In general, assume that $Z(f) - a_1$ is not an additive group. If $Φ(I_n)$ commutes with $Φ(A)$ for all $A\in {\bf M}_n(\mathbb{F})$, or if $f(x)$ has a unique zero multiplier $λ=1$, then $Φ$ assumes the form \eqref{eq:standard}. The above assertions follow from the special case when $f(x) = x(x-1)=x^2-x$, for which the problem reduces to the study of linear idempotent preservers.
It is shown that a linear map $Φ: {\bf M}_n(\mathbb{F}) \rightarrow {\bf M}_r(\mathbb{F})$ sending disjoint rank one idempotents to disjoint idempotents always assumes the above form \eqref{eq:standard} with $D_1=I_p$ and $D_2=I_q$, unless ${\bf M}_n(\mathbb{F}) = {\bf M}_2(\mathbb{Z}_2)$.

Submitted 22 February, 2023; originally announced February 2023.

arXiv:2212.12732 [pdf, other]

Frequency Regularization for Improving Adversarial Robustness

Authors: Binxiao Huang, Chaofan Tao, Rui Lin, Ngai Wong

Abstract: Deep neural networks are incredibly vulnerable to crafted, human-imperceptible adversarial perturbations. Although adversarial training (AT) has proven to be an effective defense approach, we find that the AT-trained models heavily rely on the input low-frequency content for judgment, accounting for the low standard accuracy. To close the large gap between the standard and robust accuracies during AT, we investigate the frequency difference between clean and adversarial inputs, and propose a frequency regularization (FR) to align the output difference in the spectral domain. Besides, we find Stochastic Weight Averaging (SWA), by smoothing the kernels over epochs, further improves the robustness. Among various defense schemes, our method achieves the strongest robustness against attacks by PGD-20, C&W and Autoattack, on a WideResNet trained on CIFAR-10 without any extra data.

Submitted 24 December, 2022; originally announced December 2022.

Comments: accepted by AAAI 2023 workshop

arXiv:2212.04852 [pdf, other]

doi 10.1093/mnras/stad466

Using Machine Learning to Link Black Hole Accretion Flows with Spatially Resolved Polarimetric Observables

Authors: Richard Qiu, Angelo Ricarte, Ramesh Narayan, George N. Wong, Andrew Chael, Daniel Palumbo

Abstract: We introduce a new library of 535,194 model images of the supermassive black holes and Event Horizon Telescope (EHT) targets Sgr A* and M87*, computed by performing general relativistic radiative transfer calculations on general relativistic magnetohydrodynamics simulations. Then, to infer underlying black hole and accretion flow parameters (spin, inclination, ion-to-electron temperature ratio, and magnetic field polarity), we train a random forest machine learning model on various hand-picked polarimetric observables computed from each image. Our random forest is capable of making meaningful predictions of spin, inclination, and the ion-to-electron temperature ratio, but has more difficulty inferring magnetic field polarity. To disentangle how physical parameters are encoded in different observables, we apply two different metrics to rank the importance of each observable at inferring each physical parameter. Details of the spatially resolved linear polarization morphology stand out as important discriminators between models. Bearing in mind the theoretical limitations and incompleteness of our image library, for the real M87* data, our machinery favours high-spin retrograde models with large ion-to-electron temperature ratios. Due to the time-variable nature of these targets, repeated polarimetric imaging will further improve model inference as the EHT and next-generation EHT (ngEHT) continue to develop and monitor their targets.

Submitted 9 February, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: 24 pages, 27 figures

arXiv:2211.11602 [pdf, other]

Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback

Authors: Josh Abramson, Arun Ahuja, Federico Carnevale, Petko Georgiev, Alex Goldin, Alden Hung, Jessica Landon, Jirka Lhotka, Timothy Lillicrap, Alistair Muldal, George Powell, Adam Santoro, Guy Scully, Sanjana Srivastava, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu

Abstract: An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback. Here we demonstrate how to use reinforcement learning from human feedback (RLHF) to improve upon simulated, embodied agents trained to a base level of competency with imitation learning. First, we collected data of humans interacting with agents in a simulated 3D world. We then asked annotators to record moments where they believed that agents either progressed toward or regressed from their human-instructed goal. Using this annotation data we leveraged a novel method - which we call "Inter-temporal Bradley-Terry" (IBT) modelling - to build a reward model that captures human judgments. Agents trained to optimise rewards delivered from IBT reward models improved with respect to all of our metrics, including subsequent human judgment during live interactions with agents. Altogether our results demonstrate how one can successfully leverage human judgments to improve agent behaviour, allowing us to use reinforcement learning in complex, embodied domains without programmatic reward functions. Videos of agent behaviour may be found at https://youtu.be/v_Z9F2_eKk4.

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.06541 [pdf, other]

Emission Modeling in the EHT-ngEHT Age

Authors: Richard Anantua, Joaquín Dúran, Nathan Ngata, Lani Oramas, Razieh Emami, Angelo Ricarte, Brandon Curd, Jan Röder, Avery Broderick, Jeremy Wayland, George N. Wong, Sean Ressler

Abstract: This work proposes a methodology to test phenomenologically-motivated emission processes that account for the flux and polarization distribution and global structure of the 230 GHz sources imaged by the Event Horizon Telescope (EHT): Messier (M)87* and Sagittarius (Sgr) A*. We introduce to general relativistic magnetohydrodynamic (GRMHD) simulations some novel models to bridge the largely uncertain mechanisms by which high-energy particles in jet/accretion flow/black hole (JAB) system plasmas attain billion degree temperatures and emit synchrotron radiation. The "Observing" JAB Systems methodology then partitions the simulation to apply different parametric models to regions governed by different plasma physics -- an advance over methods where one parametrization is used over simulation regions spanning thousands of gravitational radii from the central supermassive black hole. We present several classes of viewing-angle dependent morphologies, and highlight signatures of piecewise modeling and positron effects -- including a MAD/SANE dichotomy in which polarized maps appear dominated by intrinsic polarization in the MAD case and by Faraday effects in the SANE case. The library of images thus produced spans a wide range of morphologies awaiting discovery by the groundbreaking EHT instrument and its yet more sensitive, higher resolution next-generation counterpart ngEHT.

Submitted 11 November, 2022; originally announced November 2022.

Comments: 21 pages, 9 figures

arXiv:2210.08701 [pdf, other]

ODG-Q: Robust Quantization via Online Domain Generalization

Authors: Chaofan Tao, Ngai Wong

Abstract: Quantizing neural networks to low bitwidths is important for model deployment on resource-limited edge hardware. Although a quantized network has a smaller model size and memory footprint, it is fragile to adversarial attacks. However, few methods study the robustness and training efficiency of quantized networks. To this end, we propose a new method by recasting robust quantization as an online domain generalization problem, termed ODG-Q, which generates diverse adversarial data at a low cost during training. ODG-Q consistently outperforms existing works against various adversarial attacks. For example, on the CIFAR-10 dataset, ODG-Q achieves 49.2% average improvements under five common white-box attacks and 21.7% average improvements under five common black-box attacks, with a training cost similar to that of natural training (viz. without adversaries). To the best of our knowledge, this is the first work that trains both quantized and binary neural networks on ImageNet while consistently improving robustness under different attacks. We also provide a theoretical insight into ODG-Q that accounts for the bound of model risk on attacked data.

Submitted 16 October, 2022; originally announced October 2022.

arXiv:2210.01218 [pdf, other]

doi 10.3847/1538-4357/acc8cd

Unraveling Twisty Linear Polarization Morphologies in Black Hole Images

Authors: Razieh Emami, Angelo Ricarte, George N. Wong, Daniel Palumbo, Dominic Chang, Sheperd S. Doeleman, Avery Broaderick, Ramesh Narayan, Maciek Wielgus, Lindy Blackburn, Ben S. Prather, Andrew A. Chael, Richard Anantua, Koushik Chatterjee, Ivan Marti-Vidal, Jose L. Gomez, Kazunori Akiyama, Matthew Liska, Lars Hernquist, Grant Tremblay, Mark Vogelsberger, Charles Alcock, Randall Smith, James Steiner, Paul Tiede, et al. (1 additional authors not shown)

Abstract: We investigate general relativistic magnetohydrodynamic (GRMHD) simulations to determine the physical origin of the twisty patterns of linear polarization seen in spatially resolved black hole images and explain their morphological dependence on black hole spin. By characterising the observed emission with a simple analytic ring model, we find that the twisty morphology is determined by the magnetic field structure in the emitting region. Moreover, the dependence of this twisty pattern on spin can be attributed to changes in the magnetic field geometry that occur due to frame dragging. By studying an analytic ring model, we find that the roles of Doppler boosting and lensing are subdominant. Faraday rotation may cause a systematic shift in the linear polarization pattern, but we find that its impact is subdominant for models with strong magnetic fields and modest ion-to-electron temperature ratios. Models with weaker magnetic fields are much more strongly affected by Faraday rotation and have more complicated emission geometries than can be captured by a ring model. However, these models are currently disfavoured by the recent EHT observations of M87*. Our results suggest that linear polarization maps can provide a probe of the underlying magnetic field structure around a black hole, which may then be usable to indirectly infer black hole spins. The generality of these results should be tested with alternative codes, initial conditions, and plasma physics prescriptions.

Submitted 28 March, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: 26 pages, 20 figures, accepted for publication in ApJ

arXiv:2209.14412 [pdf]

Persistent Enhancement of Exciton Diffusivity in CsPbBr3 Nanocrystal Solids

Authors: Wenbi Shcherbakov-Wu, Seryio Saris, Thomas Sheehan, Narumi Nagaya Wong, Eric R. Powers, Franziska Krieg, Maksym V. Kovalenko, Adam P. Willard, William A. Tisdale

Abstract: In semiconductors, exciton or charge carrier diffusivity is typically described as an inherent material property. Here, we show that the transport of excitons (i.e., bound electron-hole pairs) in CsPbBr3 perovskite nanocrystals (NCs) depends markedly on how recently those NCs were occupied by a previous exciton. Using fluence- and repetition-rate-dependent transient photoluminescence microscopy, we visualize the effect of excitation frequency on exciton transport in CsPbBr3 NC solids. Surprisingly, we observe a striking dependence of the apparent exciton diffusivity on excitation laser power that does not arise from nonlinear exciton-exciton interactions nor from thermal heating of the sample. We interpret our observations with a model in which excitons cause NCs to undergo a transition to a metastable configuration that admits faster exciton transport by roughly an order of magnitude. This metastable configuration persists for ~microseconds at room temperature, and does not depend on the identity of surface ligands or presence of an oxide shell, suggesting that it is an intrinsic response of the perovskite lattice to electronic excitation. The exciton diffusivity observed here (>0.15 cm2/s) is considerably higher than that observed in other NC systems on similar timescales, revealing unusually strong excitonic coupling in a NC material. The finding of a persistent enhancement in excitonic coupling between NCs may help explain other extraordinary photophysical behaviors observed in CsPbBr3 NC arrays, such as superfluorescence. Additionally, faster exciton diffusivity under higher photoexcitation intensity is likely to provide practical insights for optoelectronic device engineering.

Submitted 28 September, 2022; originally announced September 2022.

Comments: 45 pages, 16 figures

arXiv:2208.13571 [pdf, other]

PECAN: A Product-Quantized Content Addressable Memory Network

Authors: Jie Ran, Rui Lin, Jason Chun Lok Li, Jiajun Zhou, Ngai Wong

Abstract: A novel deep neural network (DNN) architecture is proposed wherein the filtering and linear transform are realized solely with product quantization (PQ). This results in a natural implementation via content addressable memory (CAM), which transcends regular DNN layer operations and requires only simple table lookup. Two schemes are developed for the end-to-end PQ prototype training, namely, through angle- and distance-based similarities, which differ in their multiplicative and additive natures with different complexity-accuracy tradeoffs. Even more, the distance-based scheme constitutes a truly multiplier-free DNN solution. Experiments confirm the feasibility of such Product-Quantized Content Addressable Memory Network (PECAN), which has strong implication on hardware-efficient deployments especially for in-memory computing.

Submitted 13 August, 2022; originally announced August 2022.

Showing 1–50 of 211 results for author: Wong, N