Latent Conditional Diffusion-based Data Augmentation for Continuous-Time Dynamic Graph Model

Authors: Yuxing Tian, Yiyan Qi, Aiwen Jiang, Qi Huang, Jian Guo

Abstract: Continuous-Time Dynamic Graph (CTDG) precisely models evolving real-world relationships, drawing heightened interest in dynamic graph learning across academia and industry. However, existing CTDG models encounter challenges stemming from noise and limited historical data. Graph Data Augmentation (GDA) emerges as a critical solution, yet current approaches primarily focus on static graphs and struggle to effectively address the dynamics inherent in CTDGs. Moreover, these methods often demand substantial domain expertise for parameter tuning and lack theoretical guarantees for augmentation efficacy. To address these issues, we propose Conda, a novel latent diffusion-based GDA method tailored for CTDGs. Conda features a sandwich-like architecture, incorporating a Variational Auto-Encoder (VAE) and a conditional diffusion model, aimed at generating enhanced historical neighbor embeddings for target nodes. Unlike conventional diffusion models trained on entire graphs via pre-training, Conda requires historical neighbor sequence embeddings of target nodes for training, thus facilitating more targeted augmentation. We integrate Conda into the CTDG model and adopt an alternating training strategy to optimize performance. Extensive experimentation across six widely used real-world datasets showcases the consistent performance improvement of our approach, particularly in scenarios with limited historical data.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Accepted by KDD 2024

arXiv:2406.16477

DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution

Authors: Aiwen Jiang, Zhi Wei, Long Peng, Feiqiang Liu, Wenbo Li, Mingwen Wang

Abstract: Image super-resolution pursuits reconstructing high-fidelity high-resolution counterpart for low-resolution image. In recent years, diffusion-based models have garnered significant attention due to their capabilities with rich prior knowledge. The success of diffusion models based on general text prompts has validated the effectiveness of textual control in the field of text2image. However, given the severe degradation commonly presented in low-resolution images, coupled with the randomness characteristics of diffusion models, current models struggle to adequately discern semantic and degradation information within severely degraded images. This often leads to obstacles such as semantic loss, visual artifacts, and visual hallucinations, which pose substantial challenges for practical use. To address these challenges, this paper proposes to leverage degradation-aligned language prompt for accurate, fine-grained, and high-fidelity image restoration. Complementary priors including semantic content descriptions and degradation prompts are explored. Specifically, on one hand, image-restoration prompt alignment decoder is proposed to automatically discern the degradation degree of LR images, thereby generating beneficial degradation priors for image restoration. On the other hand, much richly tailored descriptions from pretrained multimodal large language model elicit high-level semantic priors closely aligned with human perception, ensuring fidelity control for image restoration. Comprehensive comparisons with state-of-the-art methods have been done on several popular synthetic and real-world benchmark datasets. The quantitative and qualitative analysis have demonstrated that the proposed method achieves a new state-of-the-art perceptual quality level, especially in real-world cases based on reference-free metrics.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.13141

Implant-to-Wearable Communication through the Human Body: Exploring the Effects of Encapsulated Capacitive and Galvanic Transmitters

Authors: Anyu Jiang, Cassandra Acebal, Brook Heyd, Trustin White, Gurleen Kainth, Arunashish Datta, Shreyas Sen, Adam Khalifa, Baibhab Chatterjee

Abstract: Data transfer using human-body communication (HBC) represents an actively explored alternative solution to address the challenges related to energy-efficiency, tissue absorption, and security of conventional wireless. Although the use of HBC for wearable-to-wearable communication has been well-explored, different configurations for the transmitter (Tx) and receiver (Rx) for implant-to-wearable HBC needs further studies. This paper substantiates the hypothesis that a fully implanted galvanic Tx is more efficient than a capacitive Tx for interaction with a wearable Rx. Given the practical limitations of implanting an ideal capacitive device, we choose a galvanic device with one electrode encapsulated to model the capacitive scenario. We analyze the lumped circuit model for in-body to out-of-body communication, and perform Circuit-based as well as Finite Element Method (FEM) simulations to explore how the encapsulation thickness affects the received signal levels. We demonstrate in-vivo experimental results on live Sprague Dawley rats to validate the hypothesis, and show that compared to the galvanic Tx, the channel loss will be $\approx$ 20 dB higher with each additional mm thickness of capacitive encapsulation, eventually going below the noise floor for ideal capacitive Tx.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11364

AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

Authors: Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan

Abstract: Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, resulting in inconsistency in data and architecture. Thus, we propose AnoPatch which utilizes a ViT backbone pre-trained on AudioSet and fine-tunes it on machine audio. It is believed that machine audio is more related to audio datasets than speech datasets, and modeling it from patch level suits the sparsity of machine audio. As a result, AnoPatch showcases state-of-the-art (SOTA) performances on the DCASE 2020 ASD dataset and the DCASE 2023 ASD dataset. We also compare multiple pre-trained models and empirically demonstrate that better consistency yields considerable improvement.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024

arXiv:2406.08810

Few-Shot Anomaly Detection via Category-Agnostic Registration Learning

Authors: Chaoqin Huang, Haoyan Guan, Aofan Jiang, Yanfeng Wang, Michael Spratling, Xinchao Wang, Ya Zhang

Abstract: Most existing anomaly detection methods require a dedicated model for each category. Such a paradigm, despite its promising results, is computationally expensive and inefficient, thereby failing to meet the requirements for real-world applications. Inspired by how humans detect anomalies, by comparing a query image to known normal ones, this paper proposes a novel few-shot anomaly detection (FSAD) framework. Using a training set of normal images from various categories, registration, aiming to align normal images of the same categories, is leveraged as the proxy task for self-supervised category-agnostic representation learning. At test time, an image and its corresponding support set, consisting of a few normal images from the same category, are supplied, and anomalies are identified by comparing the registered features of the test image to its corresponding support image features. Such a setup enables the model to generalize to novel test categories. It is, to our best knowledge, the first FSAD method that requires no model fine-tuning for novel categories: enabling a single model to be applied to all categories. Extensive experiments demonstrate the effectiveness of the proposed method. Particularly, it improves the current state-of-the-art for FSAD by 11.3% and 8.3% on the MVTec and MPDD benchmarks, respectively. The source code is available at https://github.com/Haoyan-Guan/CAReg.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.06474

Towards a Personal Health Large Language Model

Authors: Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra, et al. (9 additional authors not shown)

Abstract: In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We created and curated three datasets that test 1) production of personalized insights and recommendations from sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep outcomes. For the first task we designed 857 case studies in collaboration with domain experts to assess real-world scenarios in sleep and fitness. Through comprehensive evaluation of domain-specific rubrics, we observed that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. We evaluated PH-LLM domain knowledge using multiple choice sleep medicine and fitness examinations. PH-LLM achieved 79% on sleep and 88% on fitness, exceeding average scores from a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data, and demonstrate that multimodal encoding is required to match performance of specialized discriminative models. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 72 pages

arXiv:2406.04165

Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe

Authors: Alicja Ziarko, Albert Q. Jiang, Bartosz Piotrowski, Wenda Li, Mateja Jamnik, Piotr Miłoś

Abstract: Text embeddings are essential for many tasks, such as document retrieval, clustering, and semantic similarity assessment. In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion, given a suite of pre-trained decoder-only language models. Our innovation is an algorithm that produces optimal configurations of model sizes, data quantities, and fine-tuning methods for text-embedding models at different computational budget levels. The resulting recipe, which we obtain through extensive experiments, can be used by practitioners to make informed design choices for their embedding models. Specifically, our findings suggest that full fine-tuning and low-rank adaptation fine-tuning produce optimal models at lower and higher computational budgets respectively.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2404.04935

Anomaly Detection in Electrocardiograms: Advancing Clinical Diagnosis Through Self-Supervised Learning

Authors: Aofan Jiang, Chaoqin Huang, Qing Cao, Yuchen Xu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

Abstract: The electrocardiogram (ECG) is an essential tool for diagnosing heart disease, with computer-aided systems improving diagnostic accuracy and reducing healthcare costs. Despite advancements, existing systems often miss rare cardiac anomalies that could be precursors to serious, life-threatening issues or alterations in the cardiac macro/microstructure. We address this gap by focusing on self-supervised anomaly detection (AD), training exclusively on normal ECGs to recognize deviations indicating anomalies. We introduce a novel self-supervised learning framework for ECG AD, utilizing a vast dataset of normal ECGs to autonomously detect and localize cardiac anomalies. It proposes a novel masking and restoration technique alongside a multi-scale cross-attention module, enhancing the model's ability to integrate global and local signal features. The framework emphasizes accurate localization of anomalies within ECG signals, ensuring the method's clinical relevance and reliability. To reduce the impact of individual variability, the approach further incorporates crucial patient-specific information from ECG reports, such as age and gender, thus enabling accurate identification of a broad spectrum of cardiac anomalies, including rare ones. Utilizing an extensive dataset of 478,803 ECG graphic reports from real-world clinical practice, our method has demonstrated exceptional effectiveness in AD across all tested conditions, regardless of their frequency of occurrence, significantly outperforming existing models. It achieved superior performance metrics, including an AUROC of 91.2%, an F1 score of 83.7%, a sensitivity rate of 84.2%, a specificity of 83.0%, and a precision of 75.6% with a fixed recall rate of 90%. It has also demonstrated robust localization capabilities, with an AUROC of 76.5% and a Dice coefficient of 65.3% for anomaly localization.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2403.12570

Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images

Authors: Chaoqin Huang, Aofan Jiang, Jinghao Feng, Ya Zhang, Xinchao Wang, Yanfeng Wang

Abstract: Recent advancements in large-scale visual-language pre-trained models have led to significant progress in zero-/few-shot anomaly detection within natural image domains. However, the substantial domain divergence between natural and medical images limits the effectiveness of these methodologies in medical anomaly detection. This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. This multi-level adaptation is guided by multi-level, pixel-wise visual-language feature alignment loss functions, which recalibrate the model's focus from object semantics in natural imagery to anomaly identification in medical images. The adapted features exhibit improved generalization across various medical data types, even in zero-shot scenarios where the model encounters unseen medical modalities and anatomical regions during training. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models, with an average AUC improvement of 6.24% and 7.33% for anomaly classification, 2.03% and 2.37% for anomaly segmentation, under the zero-shot and few-shot settings, respectively. Source code is available at: https://github.com/MediaBrain-SJTU/MVFA-AD

Submitted 19 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2402.04855

Dual-Path Coupled Image Deraining Network via Spatial-Frequency Interaction

Authors: Yuhong He, Aiwen Jiang, Lingfang Jiang, Zhifeng Wang, Lu Wang

Abstract: Transformers have recently emerged as a significant force in the field of image deraining. Existing image deraining methods utilize extensive research on self-attention. Though showcasing impressive results, they tend to neglect critical frequency information, as self-attention is generally less adept at capturing high-frequency details. To overcome this shortcoming, we have developed an innovative Dual-Path Coupled Deraining Network (DPCNet) that integrates information from both spatial and frequency domains through Spatial Feature Extraction Block (SFEBlock) and Frequency Feature Extraction Block (FFEBlock). We have further introduced an effective Adaptive Fusion Module (AFM) for the dual-path feature aggregation. Extensive experiments on six public deraining benchmarks and downstream vision tasks have demonstrated that our proposed method not only outperforms the existing state-of-the-art deraining method but also achieves visually pleasuring results with excellent robustness on downstream vision tasks.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2401.09244

Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges

Authors: Aiqi Jiang, Arkaitz Zubiaga

Abstract: The growing prevalence and rapid evolution of offensive language in social media amplify the complexities of detection, particularly highlighting the challenges in identifying such content across diverse languages. This survey presents a systematic and comprehensive exploration of Cross-Lingual Transfer Learning (CLTL) techniques in offensive language detection in social media. Our study stands as the first holistic overview to focus exclusively on the cross-lingual scenario in this domain. We analyse 67 relevant papers and categorise these studies across various dimensions, including the characteristics of multilingual datasets used, the cross-lingual resources employed, and the specific CLTL strategies implemented. According to "what to transfer", we also summarise three main CLTL transfer approaches: instance, feature, and parameter transfer. Additionally, we shed light on the current challenges and future research opportunities in this field. Furthermore, we have made our survey resources available online, including two comprehensive tables that provide accessible references to the multilingual datasets and CLTL methods used in the reviewed literature.

Submitted 17 January, 2024; originally announced January 2024.

Comments: 35 pages, 7 figures

arXiv:2401.04088

Mixtral of Experts

Authors: Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, et al. (1 additional author not shown)

Abstract: We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B - chat model on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.

Submitted 8 January, 2024; originally announced January 2024.

Comments: See more details at https://mistral.ai/news/mixtral-of-experts/
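
Illustrative sketch (not from the listing): the abstract above says a router picks two of eight experts per token and combines their outputs. The toy Python below shows one way that selection could look. The names `top2_moe` and `make_expert`, the scalar "experts", and the choice to weight the two outputs by a softmax over their router logits are assumptions made for illustration, not details stated in the abstract.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for a feedforward block: scales its input."""
    return lambda vec: [scale * x for x in vec]


def top2_moe(token, router_logits, experts):
    """Route one token through the two highest-scoring experts.

    Assumption: the two outputs are mixed with softmax weights computed
    over the two selected logits only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:2]
    weights = softmax([router_logits[i] for i in chosen])
    output = [0.0] * len(token)
    for weight, idx in zip(weights, chosen):
        expert_out = experts[idx](token)
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output, chosen


# Eight toy experts; expert i multiplies its input by (i + 1).
experts = [make_expert(i + 1) for i in range(8)]
logits = [0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0]
mixed, picked = top2_moe([1.0, 2.0], logits, experts)
print(picked, mixed)
```

Only the two selected experts are evaluated for the token, which is the sense in which the abstract distinguishes total parameters from active parameters.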

arXiv:2312.06162

Textual Prompt Guided Image Restoration

Authors: Qiuhai Yan, Aiwen Jiang, Kang Chen, Long Peng, Qiaosi Yi, Chunjie Zhang

Abstract: Image restoration has always been a cutting-edge topic in the academic and industrial fields of computer vision. Since degradation signals are often random and diverse, "all-in-one" models that can do blind image restoration have been concerned in recent years. Early works require training specialized headers and tails to handle each degradation of concern, which are manually cumbersome. Recent works focus on learning visual prompts from data distribution to identify degradation type. However, the prompts employed in most of models are non-text, lacking sufficient emphasis on the importance of human-in-the-loop. In this paper, an effective textual prompt guided image restoration model has been proposed. In this model, task-specific BERT is fine-tuned to accurately understand user's instructions and generating textual prompt guidance. Depth-wise multi-head transposed attentions and gated convolution modules are designed to bridge the gap between textual prompts and visual features. The proposed model has innovatively introduced semantic prompts into low-level visual domain. It highlights the potential to provide a natural, precise, and controllable way to perform image restoration tasks. Extensive experiments have been done on public denoising, dehazing and deraining datasets. The experiment results demonstrate that, compared with popular state-of-the-art methods, the proposed model can obtain much more superior performance, achieving accurate recognition and removal of degradation without increasing model's complexity. Related source codes and data will be publicly available on github site https://github.com/MoTong-AI-studio/TextPromptIR.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 12 pages, 10 figures

arXiv:2311.03755

Multilingual Mathematical Autoformalization

Authors: Albert Q. Jiang, Wenda Li, Mateja Jamnik

Abstract: Autoformalization is the task of translating natural language materials into machine-verifiable formalisations. Progress in autoformalization research is hindered by the lack of a sizeable dataset consisting of informal-formal pairs expressing the same essence. Existing methods tend to circumvent this challenge by manually curating small corpora or using few-shot learning with large language models. But these methods suffer from data scarcity and formal language acquisition difficulty. In this work, we create $\texttt{MMA}$, a large, flexible, multilingual, and multi-domain dataset of informal-formal pairs, by using a language model to translate in the reverse direction, that is, from formal mathematical statements into corresponding informal ones. Experiments show that language models fine-tuned on $\texttt{MMA}$ produce $16-18\%$ of statements acceptable with minimal corrections on the $\texttt{miniF2F}$ and $\texttt{ProofNet}$ benchmarks, up from $0\%$ with the base model. We demonstrate that fine-tuning on multilingual formal data results in more capable autoformalization models even when deployed on monolingual tasks.

Submitted 9 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.10631

Llemma: An Open Language Model For Mathematics

Authors: Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck

Abstract: We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.

Submitted 15 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: Updated references; corrected description of COPRA search budget

arXiv:2310.06825

Mistral 7B

Authors: Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

Abstract: We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.

Submitted 10 October, 2023; originally announced October 2023.

Comments: Models and code are available at https://mistral.ai/news/announcing-mistral-7b/

arXiv:2310.01508 [pdf, other]

CODA: Temporal Domain Generalization via Concept Drift Simulator

Authors: Chia-Yuan Chang, Yu-Neng Chuang, Zhimeng Jiang, Kwei-Herng Lai, Anxiao Jiang, Na Zou

Abstract: In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as the "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized prediction model architectures. To this end, there is an urgent demand for a model-agnostic temporal domain generalization approach that maintains generality across diverse data modalities and architectures. In this work, we aim to address the concept drift problem from a data-centric perspective to bypass considering the interaction between data and model. Developing such a framework presents non-trivial challenges: (i) existing generative models struggle to generate out-of-distribution future data, and (ii) precisely capturing the temporal trends of joint distribution along chronological source domains is computationally infeasible. To tackle the challenges, we propose the COncept Drift simulAtor (CODA) framework incorporating a predicted feature correlation matrix to simulate future data for model training. Specifically, CODA leverages feature correlations to represent data characteristics at specific time points, thereby circumventing the daunting computational costs. Experimental results demonstrate that using CODA-generated data as training input effectively achieves temporal domain generalization across different model architectures.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.14658 [pdf, other]

Improvements on Scalable Stochastic Bayesian Inference Methods for Multivariate Hawkes Process

Authors: Alex Ziyu Jiang, Abel Rodríguez

Abstract: Multivariate Hawkes Processes (MHPs) are a class of point processes that can account for complex temporal dynamics among event sequences. In this work, we study the accuracy and computational efficiency of three classes of algorithms which, while widely used in the context of Bayesian inference, have rarely been applied in the context of MHPs: stochastic gradient expectation-maximization, stochastic gradient variational inference and stochastic gradient Langevin Monte Carlo. An important contribution of this paper is a novel approximation to the likelihood function that allows us to retain the computational advantages associated with conjugate settings while reducing approximation errors associated with the boundary effects. The comparisons are based on various simulated scenarios as well as an application to the study of the risk dynamics in the Standard & Poor's 500 intraday index prices among its 11 sectors.

Submitted 15 January, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.13270 [pdf, other]

BART-SIMP: a novel framework for flexible spatial covariate modeling and prediction using Bayesian additive regression trees

Authors: Alex Ziyu Jiang, Jon Wakefield

Abstract: Prediction is a classic challenge in spatial statistics and the inclusion of spatial covariates can greatly improve predictive performance when incorporated into a model with latent spatial effects. It is desirable to develop flexible regression models that allow for nonlinearities and interactions in the covariate structure. Machine learning models have been suggested in the spatial context, allowing for spatial dependence in the residuals, but fail to provide reliable uncertainty estimates. In this paper, we investigate a novel combination of a Gaussian process spatial model and a Bayesian Additive Regression Tree (BART) model. The computational burden of the approach is reduced by combining Markov chain Monte Carlo (MCMC) with the Integrated Nested Laplace Approximation (INLA) technique. We study the performance of the method via simulations and use the model to predict anthropometric responses, collected via household cluster samples in Kenya.

Submitted 23 September, 2023; originally announced September 2023.

arXiv:2308.04789 [pdf, other]

Multi-Scale Memory Comparison for Zero-/Few-Shot Anomaly Detection

Authors: Chaoqin Huang, Aofan Jiang, Ya Zhang, Yanfeng Wang

Abstract: Anomaly detection has gained considerable attention due to its broad range of applications, particularly in industrial defect detection. To address the challenges of data collection, researchers have introduced zero-/few-shot anomaly detection techniques that require minimal normal images for each category. However, complex industrial scenarios often involve multiple objects, presenting a significant challenge. In light of this, we propose a straightforward yet powerful multi-scale memory comparison framework for zero-/few-shot anomaly detection. Our approach employs a global memory bank to capture features across the entire image, while an individual memory bank focuses on simplified scenes containing a single object. The efficacy of our method is validated by its remarkable achievement of 4th place in the zero-shot track and 2nd place in the few-shot track of the Visual Anomaly and Novelty Detection (VAND) competition.

Submitted 1 January, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: VAND Runner-up Winner in CVPR 2023

arXiv:2308.01639 [pdf, other]

Multi-scale Cross-restoration Framework for Electrocardiogram Anomaly Detection

Authors: Aofan Jiang, Chaoqin Huang, Qing Cao, Shuang Wu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

Abstract: Electrocardiogram (ECG) is a widely used diagnostic tool for detecting heart conditions. Rare cardiac diseases may be underdiagnosed using traditional ECG analysis, considering that no training dataset can exhaust all possible cardiac disorders. This paper proposes using anomaly detection to identify any unhealthy status, with normal ECGs solely for training. However, detecting anomalies in ECG can be challenging due to significant inter-individual differences and anomalies present in both global rhythm and local morphology. To address this challenge, this paper introduces a novel multi-scale cross-restoration framework for ECG anomaly detection and localization that considers both local and global ECG characteristics. The proposed framework employs a two-branch autoencoder to facilitate multi-scale feature learning through a masking and restoration process, with one branch focusing on global features from the entire ECG and the other on local features from heartbeat-level details, mimicking the diagnostic process of cardiologists. Anomalies are identified by their high restoration errors. To evaluate the performance on a large number of individuals, this paper introduces a new challenging benchmark with signal point-level ground truths annotated by experienced cardiologists. The proposed method demonstrates state-of-the-art performance on this benchmark and two other well-known ECG datasets. The benchmark dataset and source code are available at: https://github.com/MediaBrain-SJTU/ECGAD

Submitted 3 August, 2023; originally announced August 2023.

Comments: MICCAI 2023 Early Accept

arXiv:2307.05795 [pdf]

Research Protocol for the Google Health Digital Well-being Study

Authors: Daniel McDuff, Andrew Barakat, Ari Winbush, Allen Jiang, Felicia Cordeiro, Ryann Crowley, Lauren E. Kahn, John Hernandez, Nicholas B. Allen

Abstract: The impact of digital device use on health and well-being is a pressing question to which individuals, families, schools, policy makers, legislators, and digital designers are all demanding answers. However, the scientific literature on this topic to date is marred by small and/or unrepresentative samples, poor measurement of core constructs (e.g., device use, smartphone addiction), and a limited ability to address the psychological and behavioral mechanisms that may underlie the relationships between device use and well-being. A number of recent authoritative reviews have made urgent calls for future research projects to address these limitations. The critical role of research is to identify which patterns of use are associated with benefits versus risks, and who is more vulnerable to harmful versus beneficial outcomes, so that we can pursue evidence-based product design, education, and regulation aimed at maximizing benefits and minimizing risks of smartphones and other digital devices. We describe a protocol for a Digital Well-Being (DWB) study to help answer these questions.

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2306.01694 [pdf, other]

Evaluating Language Models for Mathematics through Interactions

Authors: Katherine M. Collins, Albert Q. Jiang, Simon Frieder, Lionel Wong, Miri Zilka, Umang Bhatt, Thomas Lukasiewicz, Yuhuai Wu, Joshua B. Tenenbaum, William Hart, Timothy Gowers, Wenda Li, Adrian Weller, Mateja Jamnik

Abstract: There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs, and is insufficient for making an informed decision about which LLMs can be sensibly used and under which assistive settings. Static assessment fails to account for the essential interactive element in LLM deployment, and therefore limits how we understand language model capabilities. We introduce CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analysing MathConverse, we derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness in LLM generations, amongst other findings. Further, we garner a more granular understanding of GPT-4 mathematical problem-solving through a series of case studies, contributed by expert mathematicians.
We conclude with actionable takeaways for ML practitioners and mathematicians: models that communicate uncertainty, respond well to user corrections, and are more interpretable and concise may constitute better assistants. Interactive evaluation is a promising way to navigate the capability of these models; humans should be aware of language models' algebraic fallibility and discern where they are appropriate to use.

Submitted 5 November, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.04520 [pdf, other]

Minkowski Functionals of Large-Scale Structure as a Probe of Modified Gravity

Authors: Aoxiang Jiang, Wei Liu, Baojiu Li, Cristian Barrera-Hinojosa, Yufei Zhang, Wenjuan Fang

Abstract: In this study, we explore the potential of utilizing the four Minkowski functionals, which can fully describe the morphological properties of the large-scale structures, as a robust tool for investigating modified gravity, particularly on non-linear and quasi-linear scales. With the assistance of N-body simulations, we employ the Minkowski functionals to probe the Hu-Sawicki f(R) gravity model. The focus is on understanding the morphological properties extracted by the Minkowski functionals and their sensitivity to modified gravity. Our analysis involves a comprehensive examination of the cosmic variance arising from finite simulation volumes. By systematically varying smoothing scales and redshifts, we quantify the information encoded in the Minkowski functionals measured from the dark-matter density field. The goal is to assess the capacity of the Minkowski functionals to constrain the model and explore potential improvements through their combination. Additionally, we investigate the impact of using biased tracers such as dark matter halos and the halo occupation distribution galaxies on the modified gravity signatures within the Minkowski functionals of the LSS. Furthermore, we evaluate the influence of the redshift space distortion on the observed results. In summary, our study suggests that the Minkowski functionals of the large-scale structures hold promise as a stringent tool for constraining modified gravity and offer valuable insights into the morphological features of the cosmic web.

Submitted 19 March, 2024; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: 21 pages, 12 figures, accepted by PRD

arXiv:2305.02910 [pdf, other]

Dynamical hotness, star formation quenching and growth of supermassive black holes

Authors: Hui Hong, Huiyuan Wang, H. J. Mo, Ziwen Zhang, Guangwen Chen, Wentao Luo, Tinggui Wang, Pengfei Li, Renjie Li, Yao yao, Aoxiang Jiang

Abstract: A stellar system is dynamically hot when its kinetic energy is dominated by random motion represented by the velocity dispersion $σ_{\rm hot} (M_*)$. We use MaNGA data to obtain the inner and outer dispersions of a galaxy, $σ_{\rm in}$ and $σ_{\rm out}$, to characterize its dynamical status and study its connection with star formation quenching and the growth of supermassive black holes (SMBHs). We divide galaxies into fully quenched (FQGs), partially quenched (PQGs) and fully star-forming (FSGs) populations, and identify quenched central cores (QCCs) in PQGs. The galaxy distribution in the $σ_{\rm in}/σ_{\rm hot}$-$σ_{\rm out}/σ_{\rm hot}$ diagram is L-shaped, consisting of a horizontal sequence ($σ_{\rm out}/σ_{\rm hot}\sim0$) and a vertical sequence ($σ_{\rm in}/σ_{\rm hot}\sim1$). FQGs and QCCs are located at the top of the vertical sequence, $σ_{\rm out}/σ_{\rm hot}\sim1$, therefore they are dynamically hot over their entire bodies. PQGs reside along the vertical sequence, so they have hot centers but cold outskirts. FSGs are diverse and can be found in both sequences. Galaxy structural properties, star formation and AGN activities make a transition along the horizontal sequence at $\log(σ_{\rm in}/σ_{\rm hot})\sim-0.3$, and along the vertical sequence at $\log(σ_{\rm out}/σ_{\rm hot})\sim-0.3$. The fractions of optical AGNs and barred galaxies increase rapidly in the first transition and decline rapidly in the second; radio galaxies are located at the top of the vertical sequence.
Our results demonstrate that star formation quenching and SMBH growth are effective only in dynamically hot systems. A simple model along this line can reproduce the observed SMBH scaling relations. We discuss how secular processes and strong interactions can make a system dynamically hot, and lead to SMBH growth and star formation quenching.

Submitted 19 July, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Comments: 24 pages, 19 figures, 1 table, accepted by ApJ

arXiv:2304.07439 [pdf, ps, other]

doi 10.1007/s00605-023-01843-0

Q-Kostka polynomials and spin Green polynomials

Authors: Anguo Jiang, Naihuan Jing, Ning Liu

Abstract: We study the $Q$-Kostka polynomials $L_{λμ}(t)$ by the vertex operator realization of the $Q$-Hall-Littlewood functions $G_λ(x;t)$ and derive new formulae for $L_{λμ}(t)$. In particular, we establish a stability property for the $Q$-Kostka polynomials. We also introduce spin Green polynomials $Y^λ_μ(t)$ as both an analogue of the Green polynomials and a deformation of the spin irreducible characters of $\mathfrak S_n$. Iterative formulas for the spin Green polynomials are given and some favorable properties parallel to the Green polynomials are obtained. Tables of $Y^λ_μ(t)$ are included for $n\leq7$.

Submitted 14 April, 2023; originally announced April 2023.

Comments: 5 tables

MSC Class: Primary: 05E05; Secondary: 17B69; 05E10

Journal ref: Monatsh. Math. 201 (2023), no. 1, 109-125

arXiv:2303.17949 [pdf, other]

Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach

Authors: Anbai Jiang, Wei-Qiang Zhang, Yufeng Deng, Pingyi Fan, Jia Liu

Abstract: Automatic detection of machine anomaly remains challenging for machine learning. We believe the capability of generative adversarial network (GAN) suits the need of machine audio anomaly detection, yet rarely has this been investigated by previous work. In this paper, we propose AEGAN-AD, a totally unsupervised approach in which the generator (also an autoencoder) is trained to reconstruct input spectrograms. It is pointed out that the denoising nature of reconstruction deprecates its capacity. Thus, the discriminator is redesigned to aid the generator during both training stage and detection stage. The performance of AEGAN-AD on the dataset of DCASE 2022 Challenge TASK 2 demonstrates the state-of-the-art result on five machine types. A novel anomaly localization method is also investigated. Source code available at: www.github.com/jianganbai/AEGAN-AD

Submitted 31 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2303.16086 [pdf, other]

Grothendieck Duality via Diagonally Supported Sheaves

Authors: Andy Jiang

Abstract: Following a formula found in the paper of Avramov, Iyengar, Lipman, and Nayak (2010) and ideas of Neeman and Khusyairi, we indicate that Grothendieck duality for finite tor-amplitude maps can be developed from scratch via the formula $f^! := δ^*π_1^{\times}f^*$. Our strategy centers on the subcategory $Γ_Δ(\mathrm{QCoh}(X \times X))$ of quasicoherent sheaves on $X \times X$ supported on the diagonal. By exclusively using this subcategory instead of the full category $\mathrm{QCoh}(X \times X)$ we give systematic categorical proofs of results in Grothendieck duality and reprove many formulas found in Neeman (2018). We also relate some results in Grothendieck duality with properties of the sheaf of (derived) Grothendieck differential operators.

Submitted 28 March, 2023; originally announced March 2023.

Comments: 27 pages

arXiv:2303.16083 [pdf, ps, other]

The Derived Ring of Differential Operators

Authors: Andy Jiang

Abstract: By reading a standard formula for the ring of Grothendieck differential operators in a derived way, we construct a derived (sheaf of) ring of Grothendieck differential operators for Noetherian schemes $X$ separated and finite-type over a base $S$, when the map $X \to S$ is finite tor-amplitude. Using this ring of differential operators, we (re-)develop the theory of $D$-modules from scratch and show an equivalence of categories between $D$-modules using our definition and crystals over the infinitesimal site.

Submitted 28 March, 2023; originally announced March 2023.

Comments: 46 pages

arXiv:2303.13031 [pdf, other]

doi 10.1109/CVPR52729.2023.02129

Learning a Practical SDR-to-HDRTV Up-conversion using New Dataset and Degradation Models

Authors: Cheng Guo, Leidong Fan, Ziyu Xue, Xiuhua Jiang

Abstract: In the media industry, the demand for SDR-to-HDRTV up-conversion arises when users possess HDR-WCG (high dynamic range-wide color gamut) TVs while most off-the-shelf footage is still in SDR (standard dynamic range). The research community has started tackling this low-level vision task with learning-based approaches. Yet when applied to real SDR, current methods tend to produce dim and desaturated results, yielding almost no improvement in viewing experience. Unlike other network-oriented methods, we attribute this deficiency to the training set (HDR-SDR pairs). Consequently, we propose a new HDRTV dataset (dubbed HDRTV4K) and new HDR-to-SDR degradation models. These are then used to train a luminance-segmented network (LSN) consisting of a global mapping trunk and two Transformer branches on the bright and dark luminance ranges. We also update the assessment criteria with tailored metrics and a subjective experiment. Finally, ablation studies are conducted to demonstrate the effectiveness. Our work is available at: https://github.com/AndreGuo/HDRTVDM.

Submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR2023

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2303.04488 [pdf, other]

Magnushammer: A Transformer-Based Approach to Premise Selection

Authors: Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak, Bartosz Piotrowski, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, Yuhuai Wu

Abstract: This paper presents a novel approach to premise selection, a crucial reasoning task in automated theorem proving. Traditionally, symbolic methods that rely on extensive domain knowledge and engineering effort are applied to this task. In contrast, this work demonstrates that contrastive training with the transformer architecture can achieve higher-quality retrieval of relevant premises, without the engineering overhead. Our method, Magnushammer, outperforms the most advanced and widely used automation tool in interactive theorem proving called Sledgehammer. On the PISA and miniF2F benchmarks Magnushammer achieves $59.5\%$ (against $38.3\%$) and $34.0\%$ (against $20.9\%$) success rates, respectively. By combining Magnushammer with a language-model-based automated theorem prover, we further improve the state-of-the-art proof success rate from $57.0\%$ to $71.0\%$ on the PISA benchmark using $4$x fewer parameters. Moreover, we develop and open source a novel dataset for premise selection, containing textual representations of (proof state, relevant premise) pairs. To the best of our knowledge, this is the largest available premise selection dataset, and the first one for the Isabelle proof assistant.

Submitted 18 March, 2024; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: ICLR 2024

arXiv:2302.08162 [pdf, other]

doi 10.1088/1475-7516/2023/09/037

Probing massive neutrinos with the Minkowski functionals of the galaxy distribution

Authors: Wei Liu, Aoxiang Jiang, Wenjuan Fang

Abstract: The characteristic signatures of massive neutrinos on large-scale structure (LSS), if fully captured, can be used to put a stringent constraint on their mass sum, $M_ν$. Previous work utilizing N-body simulations has shown the Minkowski functionals (MFs) of LSS can reveal the imprints of massive neutrinos on LSS, provide important complementary information to two-point statistics and significantly improve constraints on $M_ν$. In this work, we take a step forward and apply the statistics to the biased tracers of LSS, i.e. the galaxies, and in redshift space. We perform a Fisher matrix analysis and quantify the constraining power of the MFs by using the Molino mock galaxy catalogs, which are constructed based on the halo occupation distribution (HOD) framework with parameters for the SDSS $M_r < -21.5$ and -22 galaxy samples. We find the MFs give tighter constraints on all of the cosmological parameters that we consider than the power spectrum. The constraints on $Ω_{\mathrm{m}}, Ω_{\mathrm{b}}, h, n_s, σ_8$, and $M_ν$ from the MFs are better by a factor of 1.9, 2.9, 3.7, 4.2, 2.5, and 5.7, respectively, after marginalizing over the HOD parameters. Specifically, for $M_ν$, we obtain a 1$σ$ constraint of 0.059 eV with the MFs alone for a volume of only $\left(1 h^{-1} \mathrm{Gpc}\right)^3$.

Submitted 18 September, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

Comments: 38 pages, 10 figures, 5 tables. Accepted for publication in JCAP. This is the second in our series of work, the first is arXiv:2204.02945

arXiv:2212.10405 [pdf, other]

AnnoBERT: Effectively Representing Multiple Annotators' Label Choices to Improve Hate Speech Detection

Authors: Wenjie Yin, Vibhor Agarwal, Aiqi Jiang, Arkaitz Zubiaga, Nishanth Sastry

Abstract: Supervised approaches generally rely on majority-based labels. However, it is hard to achieve high agreement among annotators in subjective tasks such as hate speech detection. Existing neural network models principally regard labels as categorical variables, while ignoring the semantic information in diverse label texts. In this paper, we propose AnnoBERT, a first-of-its-kind architecture integrating annotator characteristics and label text with a transformer-based model to detect hate speech, with unique representations based on each annotator's characteristics via Collaborative Topic Regression (CTR) and integrate label text to enrich textual representations. During training, the model associates annotators with their label choices given a piece of text; during evaluation, when label information is not available, the model predicts the aggregated label given by the participating annotators by utilising the learnt association. The proposed approach displayed an advantage in detecting hate speech, especially in the minority class and edge cases with annotator disagreement. Improvement in the overall performance is the largest when the dataset is more label-imbalanced, suggesting its practical value in identifying real-world hate speech, as the volume of hate speech in-the-wild is extremely small on social media, when compared with normal (non-hate) speech. Through ablation studies, we show the relative contributions of annotator embeddings and label text to the model performance, and tested a range of alternative annotator embeddings and label text combinations.

Submitted 10 January, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: accepted at ICWSM 2023

Journal ref: 17th International AAAI Conference on Web and Social Media (ICWSM 2023). Please cite accordingly

arXiv:2211.11581 [pdf, other]

Modeling 100% Electrified Transportation in NYC

Authors: Jingrong Zhang, Amber Jiang, Brian Newborn, Sara Kou, Robert Mieth

Abstract: Envisioning a future 100% electrified transportation sector, this paper uses socio-economic, demographic, and geographic data to assess electric energy demand from commuter traffic. We explore the individual mode choices, which allows to create mode-mix scenarios for the entire population, and quantify the electric energy demand for each scenario using technical specifications of battery and electric drives technology in combination with different charging scenarios. Using data sets for New York City, our results highlight the need for infrastructure investments, the usefulness of flexible charging policies, and the positive impact of incentivizing micromobility and mass-transit options. Our model and results are publicly available as interactive dashboard.

Submitted 17 February, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Accepted for publication at the 2023 IEEE PES General Meeting

arXiv:2211.08447 [pdf, other]

SexWEs: Domain-Aware Word Embeddings via Cross-lingual Semantic Specialisation for Chinese Sexism Detection in Social Media

Authors: Aiqi Jiang, Arkaitz Zubiaga

Abstract: The goal of sexism detection is to mitigate negative online content targeting certain gender groups of people. However, the limited availability of labeled sexism-related datasets makes it problematic to identify online sexism for low-resource languages. In this paper, we address the task of automatic sexism detection in social media for one low-resource language -- Chinese. Rather than collecting new sexism data or building cross-lingual transfer learning models, we develop a cross-lingual domain-aware semantic specialisation system in order to make the most of existing data. Semantic specialisation is a technique for retrofitting pre-trained distributional word vectors by integrating external linguistic knowledge (such as lexico-semantic relations) into the specialised feature space. To do this, we leverage semantic resources for sexism from a high-resource language (English) to specialise pre-trained word vectors in the target language (Chinese) to inject domain knowledge. We demonstrate the benefit of our sexist word embeddings (SexWEs) specialised by our framework via intrinsic evaluation of word similarity and extrinsic evaluation of sexism detection. Compared with other specialisation approaches and Chinese baseline word vectors, our SexWEs shows an average score improvement of 0.033 and 0.064 in both intrinsic and extrinsic evaluations, respectively. The ablative results and visualisation of SexWEs also prove the effectiveness of our framework on retrofitting word vectors in low-resource languages.

Submitted 30 March, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: accepted at ICWSM 2023

arXiv:2210.12283 [pdf, other]

Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

Authors: Albert Q. Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, Guillaume Lample

Abstract: The formalization of existing mathematical proofs is a notoriously difficult process. Despite decades of research on automation and proof assistants, writing formal proofs remains arduous and only accessible to a few experts. While previous studies to automate formalization focused on powerful search algorithms, no attempts were made to take advantage of available informal proofs. In this work, we introduce Draft, Sketch, and Prove (DSP), a method that maps informal proofs to formal proof sketches, and uses the sketches to guide an automated prover by directing its search to easier sub-problems. We investigate two relevant setups where informal proofs are either written by humans or generated by a language model. Our experiments and ablation studies show that large language models are able to produce well-structured formal sketches that follow the same reasoning steps as the informal proofs. Guiding an automated prover with these sketches enhances its performance from 20.9% to 39.3% on a collection of mathematical competition problems.

Submitted 20 February, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

arXiv:2210.07035 [pdf, other]

doi 10.1021/acs.jctc.2c00996

Tensor Hypercontraction Form of the Perturbative Triples Energy in Coupled-Cluster Theory

Authors: Andy Jiang, Justin M. Turney, Henry F. Schaefer III

Abstract: We present the working equations for a reduced-scaling method of evaluating the perturbative triples (T) energy in coupled-cluster theory, through the tensor hypercontraction (THC) of the triples amplitudes ($t_{ijk}^{abc}$). Through our method we can reduce the scaling of the (T) energy from the traditional O($N^{7}$) to a more modest O($N^{5}$). We also discuss implementation details to aid future research, development, and software realization of this method. Additionally, we show that this method yields sub-millihartree (mEh) differences from CCSD(T) when evaluating absolute energies, and sub-0.1 kcal/mol energy differences when evaluating relative energies. Finally, we demonstrate that this method converges to the true CCSD(T) energy through the systematic increasing of the rank or eigenvalue tolerance of the orthogonal projector, as well as exhibiting sub-linear to linear error growth with respect to system size.

Submitted 13 October, 2022; originally announced October 2022.

Journal ref: Journal of Chemical Theory and Computation 2023

arXiv:2208.03274 [pdf, other]

A Holistic Approach to Undesired Content Detection in the Real World

Authors: Todor Markov, Chong Zhang, Sandhini Agarwal, Tyna Eloundou, Teddy Lee, Steven Adler, Angela Jiang, Lilian Weng

Abstract: We present a holistic approach to building a robust and useful natural language classification system for real-world content moderation. The success of such a system relies on a chain of carefully designed and executed steps, including the design of content taxonomies and labeling instructions, data quality control, an active learning pipeline to capture rare events, and a variety of methods to make the model robust and to avoid overfitting. Our moderation system is trained to detect a broad set of categories of undesired content, including sexual content, hateful content, violence, self-harm, and harassment. This approach generalizes to a wide range of different content taxonomies and can be used to create high-quality content classifiers that outperform off-the-shelf models.

Submitted 14 February, 2023; v1 submitted 5 August, 2022; originally announced August 2022.

Comments: Oral presentation at AAAI-23

arXiv:2208.02153 [pdf, ps, other]

Finding a Lower Bound for k-Unbounded Hamiltonian Cycles

Authors: Albert R. Jiang

Abstract: Methods to determine the existence of Hamiltonian Cycles in graphs have been extensively studied. However, little research has been done following cases when no Hamiltonian Cycle exists. Let a vertex be "unbounded" if it is visited more than once in a path. Furthermore, let a k-Unbounded Hamiltonian Cycle be a path with finite length that visits every vertex, has adjacent start and end vertices, and contains k unbounded vertices. We consider a novel variant of the Hamiltonian Cycle Problem in which the objective is to find an m-Unbounded Hamiltonian Cycle where m is the minimum value of k such that a k-Unbounded Hamiltonian Cycle exists. We first consider the task on well-known non-Hamiltonian graphs. We then provide an exponential-time brute-force algorithm for the determination of an m-Unbounded Hamiltonian Cycle and discuss approaches to solve the variant through transformations to the Hamiltonian Cycle Problem and the Asymmetric Traveling Salesman Problem. Finally, we present a polynomial-time heuristic for the determination of an m-Unbounded Hamiltonian Cycle that is also shown to be an effective heuristic for the original Hamiltonian Cycle Problem.

Submitted 8 August, 2022; v1 submitted 3 August, 2022; originally announced August 2022.

Comments: 26 pages, 14 figures
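
Illustrative note on the definition in the abstract above: the following is a minimal Python sketch of a checker for a k-Unbounded Hamiltonian Cycle, not the paper's algorithm. The function name `count_unbounded` and the input encoding (a vertex sequence plus an undirected edge list) are hypothetical. It assumes the path is written without repeating the start vertex at the end, and that "visited more than once" means appearing more than once in that sequence.

```python
from collections import Counter

def count_unbounded(walk, vertices, edges):
    """Return k if `walk` is a k-Unbounded Hamiltonian Cycle, else None.

    Assumptions (not from the paper): the graph is undirected, and `walk`
    lists vertices in visiting order without repeating the start at the end.
    """
    adjacent = {frozenset(e) for e in edges}
    # The walk must visit every vertex of the graph and nothing else.
    if not walk or set(walk) != set(vertices):
        return None
    # Each consecutive pair must be joined by an edge.
    for u, v in zip(walk, walk[1:]):
        if frozenset((u, v)) not in adjacent:
            return None
    # The start and end vertices must be adjacent so the walk can close.
    if frozenset((walk[0], walk[-1])) not in adjacent:
        return None
    # A vertex is "unbounded" if it appears more than once in the walk.
    visits = Counter(walk)
    return sum(1 for count in visits.values() if count > 1)

# A 4-cycle has an ordinary Hamiltonian cycle (k = 0).
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
k_square = count_unbounded([0, 1, 2, 3], range(4), square)

# A star with three leaves has none; the centre must be revisited (k = 1).
star = [(0, 1), (0, 2), (0, 3)]
k_star = count_unbounded([1, 0, 2, 0, 3, 0], range(4), star)
```

Under these assumptions, minimising the returned value over all valid walks would give the m described in the abstract; the sketch only verifies a single candidate walk.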

arXiv:2207.07361 [pdf, other]

Registration based Few-Shot Anomaly Detection

Authors: Chaoqin Huang, Haoyan Guan, Aofan Jiang, Ya Zhang, Michael Spratling, Yan-Feng Wang

Abstract: This paper considers few-shot anomaly detection (FSAD), a practical yet under-studied setting for anomaly detection (AD), where only a limited number of normal images are provided for each category at training. So far, existing FSAD studies follow the one-model-per-category learning paradigm used for standard AD, and the inter-category commonality has not been explored. Inspired by how humans detect anomalies, i.e., comparing an image in question to normal images, we here leverage registration, an image alignment task that is inherently generalizable across categories, as the proxy task, to train a category-agnostic anomaly detection model. During testing, the anomalies are identified by comparing the registered features of the test image and its corresponding support (normal) images. As far as we know, this is the first FSAD method that trains a single generalizable model and requires no re-training or parameter fine-tuning for new categories. Experimental results have shown that the proposed method outperforms the state-of-the-art FSAD methods by 3%-8% in AUC on the MVTec and MPDD benchmarks.

Submitted 15 July, 2022; originally announced July 2022.

Comments: ECCV 2022 Oral; Code is available at https://github.com/MediaBrain-SJTU/RegAD

arXiv:2206.09576 [pdf, other]

FedSSO: A Federated Server-Side Second-Order Optimization Algorithm

Authors: Xin Ma, Renyi Bao, Jinpeng Jiang, Yang Liu, Arthur Jiang, Jun Yan, Xin Liu, Zhisong Pan

Abstract: In this work, we propose FedSSO, a server-side second-order optimization method for federated learning (FL). In contrast to previous works in this direction, we employ a server-side approximation for the Quasi-Newton method without requiring any training data from the clients. In this way, we not only shift the computation burden from clients to server, but also eliminate the additional communication for second-order updates between clients and server entirely. We provide theoretical guarantee for convergence of our novel method, and empirically demonstrate our fast convergence and communication savings in both convex and non-convex settings.

Submitted 22 August, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2206.03450 [pdf, other]

doi 10.1145/3534929

A Trade-off-centered Framework of Content Moderation

Authors: Jialun Aaron Jiang, Peipei Nie, Jed R. Brubaker, Casey Fiesler

Abstract: Content moderation research typically prioritizes representing and addressing challenges for one group of stakeholders or communities in one type of context. While taking a focused approach is reasonable or even favorable for empirical case studies, it does not address how content moderation works in multiple contexts. Through a systematic literature review of 86 content moderation papers that document empirical studies, we seek to uncover patterns and tensions within past content moderation research. We find that content moderation can be characterized as a series of trade-offs around moderation actions, styles, philosophies, and values. We discuss how facilitating cooperation and preventing abuse, two key elements in Grimmelmann's definition of moderation, are inherently dialectical in practice. We close by showing how researchers, designers, and moderators can use our framework of trade-offs in their own work, and arguing that trade-offs should be of central importance in investigating and designing content moderation.

Submitted 7 June, 2022; originally announced June 2022.

Comments: To appear in ACM TOCHI

ACM Class: J.4; K.4.2

arXiv:2205.12615 [pdf, ps, other]

Autoformalization with Large Language Models

Authors: Yuhuai Wu, Albert Q. Jiang, Wenda Li, Markus N. Rabe, Charles Staats, Mateja Jamnik, Christian Szegedy

Abstract: Autoformalization is the process of automatically translating from natural language mathematics to formal specifications and proofs. A successful autoformalization system could advance the fields of formal verification, program synthesis, and artificial intelligence. While the long-term goal of autoformalization seemed elusive for a long time, we show large language models provide new prospects towards this goal. We make the surprising observation that LLMs can correctly translate a significant portion ($25.3\%$) of mathematical competition problems perfectly to formal specifications in Isabelle/HOL. We demonstrate the usefulness of this process by improving a previously introduced neural theorem prover via training on these autoformalized theorems. Our methodology results in a new state-of-the-art result on the MiniF2F theorem proving benchmark, improving the proof rate from $29.6\%$ to $35.2\%$.

Submitted 25 May, 2022; originally announced May 2022.

Comments: 44 pages

arXiv:2205.10893 [pdf, other]

Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers

Authors: Albert Q. Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, Mateja Jamnik

Abstract: In theorem proving, the task of selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This presents a challenge for all theorem provers, especially the ones based on language models, due to their relative inability to reason over huge volumes of premises in text form. This paper introduces Thor, a framework integrating language models and automated theorem provers to overcome this difficulty. In Thor, a class of methods called hammers that leverage the power of automated theorem provers are used for premise selection, while all other tasks are designated to language models. Thor increases a language model's success rate on the PISA dataset from $39\%$ to $57\%$, while solving $8.2\%$ of problems neither language models nor automated theorem provers are able to solve on their own. Furthermore, with a significantly smaller computational budget, Thor can achieve a success rate on the MiniF2F dataset that is on par with the best existing methods. Thor can be instantiated for the majority of popular interactive theorem provers via a straightforward protocol we provide.

Submitted 22 May, 2022; originally announced May 2022.

arXiv:2204.05552 [pdf]

The Effects of Dynamic Learning and the Forgetting Process on an Optimizing Modelling for Full-Service Repair Pricing Contracts for Medical Devices

Authors: Aiping Jiang, Lin Li, Xuemin Xu, David Y. C. Huang

Abstract: In order to improve the profitability and customer service management of original equipment manufacturers (OEMs) in a market where full-service (FS) and on-call service (OS) co-exist, this article extends the optimizing modelling for pricing FS repair contracts with the effects of dynamic learning and forgetting. Along with considering autonomous learning in maintenance practice, this study also analyses how induced learning and forgetting process in a workplace put impact on the pricing optimizing model of FS contracts in the portfolio of FS and OS. A numerical analysis based on real data from a medical industry proves that the enhanced FS pricing model discussed here has two main advantages: (1) It could prominently improve repair efficiency, and (2) It help OEMs gain better profits compared to the original FS model and the sole OS maintenance. Sensitivity analysis shows that if internal failure rate increases, the optimized FS price rises gradually until reaching the maximum value, and profitability to the OEM increases overall; if frequency of induced learning goes up, the optimal FS price rises after a short-term downward trend, with a stable profitability to the OEM.

Submitted 12 April, 2022; originally announced April 2022.

arXiv:2204.02945 [pdf, other]

doi 10.1088/1475-7516/2022/07/045

Probing massive neutrinos with the Minkowski functionals of large-scale structure

Authors: Wei Liu, Aoxiang Jiang, Wenjuan Fang

Abstract: Massive neutrinos suppress the growth of structure under their free-streaming scales. The effect is most prominent on small scales where the widely-used two-point statistics can no longer capture the full information. In this work, we study the signatures massive neutrinos leave on large-scale structure (LSS) as revealed by its morphological properties, which are fully described by $4$ Minkowski functionals (MFs), and quantify the constraints on the summed neutrino mass $M_ν$ from the MFs, by using publicly available N-body simulations. We find the MFs provide important complementary information, and give tighter constraints on $M_ν$ than the power spectrum. Specifically, depending on whether massive neutrinos are included in the density field (the `m' field) or not (the `cb' field), we find the constraint on $M_ν$ from the MFs with a smoothing scale of $R_G=5 h^{-1}$Mpc is $48$ or $4$ times better than that from the power spectrum. When the MFs are combined with the power spectrum, they can improve the constraint on $M_ν$ from the latter by a factor of 63 for the `m' field and 5 for the `cb' field. Notably, when the `m' field is used, the constraint on $M_ν$ from the MFs can reach $0.0177$eV with a volume of $1(h^{-1}\rm Gpc)^3$, while the combination of the MFs and power spectrum can tighten this constraint to be $0.0133$eV, a $4.5σ$ significance on detecting the minimum sum of the neutrino masses. For the `m' field, we also find the $σ_8$ and $M_ν$ degeneracy is broken with the MFs, leading to stronger constraints on all 6 cosmological parameters considered in this work than the power spectrum.

Submitted 15 June, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: Accepted for publication in JCAP. Changes from the first version: add figure 10, and minor text revisions. Matches accepted version. 33 pages, 10 figures, 2 tables

arXiv:2201.09857 [pdf, other]

STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence

Authors: Liangliang Xu, Daoming Lyu, Yangchen Pan, Aiwen Jiang, Bo Liu

Abstract: It remains challenging to deploy existing risk-averse approaches to real-world applications. The reasons are multi-fold, including the lack of global optimality guarantee and the necessity of learning from long-term consecutive trajectories. Long-term consecutive trajectories are prone to involving visiting hazardous states, which is a major concern in the risk-averse setting. This paper proposes Short-Term VOlatility-controlled Policy Search (STOPS), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term trajectories. Short-term trajectories are more flexible to generate, and can avoid the danger of hazardous state visitations. By using an actor-critic scheme with an overparameterized two-layer neural network, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient, with effectiveness comparable to the state-of-the-art convergence rate of risk-neutral policy-search methods. The algorithm is evaluated on challenging Mujoco robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results demonstrate a state-of-the-art level of STOPS' performance among existing risk-averse policy search methods.

Submitted 22 July, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

arXiv:2108.04401 [pdf, other]

doi 10.1145/3479512

A Framework of Severity for Harmful Content Online

Authors: Morgan Klaus Scheuerman, Jialun Aaron Jiang, Casey Fiesler, Jed R. Brubaker

Abstract: The proliferation of harmful content on online social media platforms has necessitated empirical understandings of experiences of harm online and the development of practices for harm mitigation. Both understandings of harm and approaches to mitigating that harm, often through content moderation, have implicitly embedded frameworks of prioritization - what forms of harm should be researched, how policy on harmful content should be implemented, and how harmful content should be moderated. To aid efforts of better understanding the variety of online harms, how they relate to one another, and how to prioritize harms relevant to research, policy, and practice, we present a theoretical framework of severity for harmful online content. By employing a grounded theory approach, we developed a framework of severity based on interviews and card-sorting activities conducted with 52 participants over the course of ten months. Through our analysis, we identified four Types of Harm (physical, emotional, relational, and financial) and eight Dimensions along which the severity of harm can be understood (perspectives, intent, agency, experience, scale, urgency, vulnerability, sphere). We describe how our framework can be applied to both research and policy settings towards deeper understandings of specific forms of harm (e.g., harassment) and prioritization frameworks when implementing policies encompassing many forms of harm.

Submitted 17 September, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: CSCW 2021; 33 pages

Journal ref: Proc. ACM Hum.-Comput. Interact.5, CSCW2, Article 368 (October 2021), 33 pages

Showing 1–50 of 94 results for author: Jiang, A