-
Smart Navigation System for Parking Assignment at Large Events: Incorporating Heterogeneous Driver Characteristics
Authors:
Xi Cheng,
Gaofeng Su,
Siyuan Feng,
Ke Liu,
Chen Zhu,
Hui Lin,
Jilin Song,
Jianan Chen
Abstract:
Parking challenges escalate significantly during large events such as concerts or sports games, yet few studies address dynamic parking lot assignments for such occasions. This paper introduces a smart navigation system designed to optimize parking assignments swiftly during large events, utilizing a mixed search algorithm that accounts for the heterogeneous characteristics of drivers. We conducted simulations in the Berkeley city area during the "Big Game" to validate our system and demonstrate the benefits of our innovative parking assignment approach.
Submitted 14 May, 2024;
originally announced June 2024.
-
Deep Reinforcement Learning for Real-Time Ground Delay Program Revision and Corresponding Flight Delay Assignments
Authors:
Ke Liu,
Fan Hu,
Hui Lin,
Xi Cheng,
Jianan Chen,
Jilin Song,
Siyuan Feng,
Gaofeng Su,
Chen Zhu
Abstract:
This paper explores the optimization of Ground Delay Programs (GDP), a prevalent Traffic Management Initiative used in Air Traffic Management (ATM) to reconcile capacity and demand discrepancies at airports. Employing Reinforcement Learning (RL) to manage the inherent uncertainties in the national airspace system (such as weather variability, fluctuating flight demands, and airport arrival rates), we developed two RL models: Behavioral Cloning (BC) and Conservative Q-Learning (CQL). These models are designed to enhance GDP efficiency by utilizing a sophisticated reward function that integrates ground and airborne delays and terminal area congestion. We constructed a simulated single-airport environment, SAGDP_ENV, which incorporates real operational data along with predicted uncertainties to facilitate realistic decision-making scenarios. Using data from the whole of 2019 at Newark Liberty International Airport (EWR), our models aimed to preemptively set airport program rates. Despite thorough modeling and simulation, initial outcomes indicated that the models struggled to learn effectively, potentially because of oversimplified environmental assumptions. This paper discusses the challenges encountered, evaluates the models' performance against actual operational data, and outlines future directions to refine RL applications in ATM.
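The abstract says the reward integrates ground delay, airborne delay, and terminal-area congestion but does not give its form. A minimal sketch of one plausible shape, a negative weighted sum of the three terms; the weights, units, field names, and congestion proxy are all hypothetical and are not taken from SAGDP_ENV.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    ground_delay_min: float    # hypothetical: total ground delay assigned this step (minutes)
    airborne_delay_min: float  # hypothetical: total airborne holding this step (minutes)
    excess_arrivals: int       # hypothetical congestion proxy: arrivals above terminal capacity


# Hypothetical weights: airborne delay is weighted above ground delay because
# holding in the air is generally costlier than waiting at the origin gate.
W_GROUND = 1.0
W_AIR = 2.0
W_CONGESTION = 5.0


def gdp_reward(outcome: StepOutcome) -> float:
    """Return a non-positive reward; values closer to zero are better."""
    cost = (W_GROUND * outcome.ground_delay_min
            + W_AIR * outcome.airborne_delay_min
            + W_CONGESTION * max(0, outcome.excess_arrivals))  # ignore spare capacity
    return -cost
```

For example, 30 minutes of ground delay, 10 minutes of airborne delay, and 2 excess arrivals give a reward of -(30 + 20 + 10) = -60. The relative weights are the main design lever: raising `W_AIR` pushes a policy toward holding flights on the ground instead.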
Submitted 13 May, 2024;
originally announced May 2024.
-
Customizing Text-to-Image Diffusion with Camera Viewpoint Control
Authors:
Nupur Kumari,
Grace Su,
Richard Zhang,
Taesung Park,
Eli Shechtman,
Jun-Yan Zhu
Abstract:
Model customization introduces new concepts to existing text-to-image models, enabling the generation of the new concept in novel contexts. However, such methods lack accurate camera view control with respect to the object, and users must resort to prompt engineering (e.g., adding "top-view") to achieve coarse view control. In this work, we introduce a new task -- enabling explicit control of camera viewpoint for model customization. This allows us to modify object properties amongst various background scenes via text prompts, all while incorporating the target camera pose as additional control. This new task presents significant challenges in merging a 3D representation from the multi-view images of the new concept with a general, 2D text-to-image model. To bridge this gap, we propose to condition the 2D diffusion process on rendered, view-dependent features of the new object. During training, we jointly adapt the 2D diffusion modules and 3D feature predictions to reconstruct the object's appearance and geometry while reducing overfitting to the input multi-view images. Our method outperforms existing image editing and model personalization baselines in preserving the custom object's identity while following the input text prompt and the object's camera pose.
Submitted 18 April, 2024;
originally announced April 2024.
-
Noise-Aware Training of Layout-Aware Language Models
Authors:
Ritesh Sarkhel,
Xiaoqi Ren,
Lauro Beltrao Costa,
Guolong Su,
Vincent Perot,
Yanan Xie,
Emmanouil Koukoumidis,
Arnab Nandi
Abstract:
A visually rich document (VRD) uses visual features along with linguistic cues to disseminate information. Training a custom extractor that identifies named entities in a document requires a large number of instances of the target document type annotated in both textual and visual modalities. This is an expensive bottleneck in enterprise scenarios, where we want to train custom extractors for thousands of different document types in a scalable way. Pre-training an extractor model on unlabeled instances of the target document type, followed by a fine-tuning step on human-labeled instances, does not work in these scenarios, as it exceeds the maximum training time allocated for the extractor. We address this scenario by proposing a Noise-Aware Training method, or NAT, in this paper. Instead of acquiring expensive human-labeled documents, NAT uses weakly labeled documents to train an extractor in a scalable way. To avoid degradation in the model's quality due to noisy, weakly labeled samples, NAT estimates the confidence of each training sample and incorporates it as an uncertainty measure during training. We train multiple state-of-the-art extractor models using NAT. Experiments on a number of publicly available and in-house datasets show that NAT-trained models are not only robust in performance, outperforming a transfer-learning baseline by up to 6% in terms of macro-F1 score, but also more label-efficient, reducing the amount of human effort required to obtain comparable performance by up to 73%.
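The abstract says NAT estimates a confidence for each weakly labeled sample and uses it as an uncertainty measure during training, without giving the formula. A minimal sketch of one common way to do this, assuming confidences in [0, 1] are already estimated: weight each sample's negative log-likelihood by its confidence so that noisy labels contribute less. This is a generic confidence-weighted loss, not necessarily NAT's actual objective, and the function name is hypothetical.

```python
import math
from typing import Sequence


def confidence_weighted_nll(probs_of_label: Sequence[float],
                            confidences: Sequence[float],
                            eps: float = 1e-12) -> float:
    """Weighted mean negative log-likelihood.

    probs_of_label[i] is the model's probability for sample i's (weak) label;
    confidences[i] in [0, 1] is the estimated reliability of that label.
    """
    if len(probs_of_label) != len(confidences):
        raise ValueError("inputs must have the same length")
    total_weight = sum(confidences)
    if total_weight == 0:
        return 0.0  # no trusted samples in this batch
    weighted = sum(c * -math.log(max(p, eps))  # eps guards against log(0)
                   for p, c in zip(probs_of_label, confidences))
    return weighted / total_weight
```

With all confidences equal, this reduces to the ordinary mean cross-entropy; a sample with confidence 0 is ignored entirely, however badly the model fits its (possibly wrong) weak label.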
Submitted 30 March, 2024;
originally announced April 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals to complete their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier: when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
BrainMass: Advancing Brain Network Analysis for Diagnosis with Large-scale Self-Supervised Learning
Authors:
Yanwu Yang,
Chenfei Ye,
Guinan Su,
Ziyao Zhang,
Zhikai Chang,
Hairui Chen,
Piu Chan,
Yue Yu,
Ting Ma
Abstract:
Foundation models pretrained on large-scale datasets via self-supervised learning demonstrate exceptional versatility across various tasks. Because medical data are heterogeneous and hard to collect, this approach is especially beneficial for medical image analysis and neuroscience research, as it streamlines broad downstream tasks without the need for numerous costly annotations. However, there has been limited investigation into brain network foundation models, limiting their adaptability and generalizability for broad neuroscience studies. In this study, we aim to bridge this gap. In particular, (1) we curated a comprehensive dataset by collating images from 30 datasets, comprising 70,781 samples from 46,686 participants. Moreover, we introduce pseudo-functional connectivity (pFC) to further generate millions of augmented brain networks by randomly dropping certain timepoints of the BOLD signal. (2) We propose the BrainMass framework for brain network self-supervised learning via mask modeling and feature alignment. BrainMass employs Mask-ROI Modeling (MRM) to bolster intra-network dependencies and regional specificity. Furthermore, a Latent Representation Alignment (LRA) module regularizes augmented brain networks of the same participant, which share similar topological properties, to yield similar latent representations by aligning their latent embeddings. Extensive experiments on eight internal tasks and seven external brain disorder diagnosis tasks show BrainMass's superior performance, highlighting its significant generalizability and adaptability. In addition, BrainMass demonstrates powerful few/zero-shot learning abilities and exhibits meaningful interpretation for various diseases, showcasing its potential use for clinical applications.
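The pFC augmentation drops random timepoints of the BOLD signal before computing a brain network, so each draw yields a slightly different network for the same participant. A minimal sketch, assuming functional connectivity is the Pearson correlation between ROI time series and that the same timepoints are dropped for every ROI; the keep ratio and the function names are hypothetical, not BrainMass's implementation.

```python
import random
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series (assumes non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def pseudo_fc(bold: List[List[float]], keep_ratio: float,
              rng: random.Random) -> List[List[float]]:
    """bold[r][t] is the BOLD signal of ROI r at timepoint t.

    Keep a random subset of timepoints (shared by all ROIs) and return the
    ROI-by-ROI Pearson correlation matrix computed on that subset.
    """
    n_t = len(bold[0])
    n_keep = min(n_t, max(2, round(keep_ratio * n_t)))  # need >= 2 points for a correlation
    kept = sorted(rng.sample(range(n_t), n_keep))       # preserve temporal order
    series = [[roi[t] for t in kept] for roi in bold]
    n_roi = len(series)
    return [[pearson(series[i], series[j]) for j in range(n_roi)]
            for i in range(n_roi)]
```

Calling `pseudo_fc` repeatedly with a different random state produces many augmented networks from one scan. Sharing the dropped timepoints across ROIs keeps each correlation computed on time-aligned samples.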
Submitted 3 March, 2024;
originally announced March 2024.
-
"May I Speak?": Multi-modal Attention Guidance in Social VR Group Conversations
Authors:
Geonsun Lee,
Dae Yeol Lee,
Guan-Ming Su,
Dinesh Manocha
Abstract:
In this paper, we present a novel multi-modal attention guidance method designed to address the challenges of turn-taking dynamics in meetings and enhance group conversations within virtual reality (VR) environments. Recognizing the difficulties posed by a confined field of view and the absence of detailed gesture tracking in VR, our proposed method aims to mitigate the challenges of noticing new speakers attempting to join the conversation. This approach tailors attention guidance, providing a nuanced experience for highly engaged participants while offering subtler cues for those less engaged, thereby enriching the overall meeting dynamics. Through group interview studies, we gathered insights to guide our design, resulting in a prototype that employs "light" as a diegetic guidance mechanism, complemented by spatial audio. The combination creates an intuitive and immersive meeting environment, effectively directing users' attention to new speakers. An evaluation study, comparing our method to state-of-the-art attention guidance approaches, demonstrated significantly faster response times (p < 0.001), heightened perceived conversation satisfaction (p < 0.001), and preference (p < 0.001) for our method. Our findings contribute to the understanding of design implications for VR social attention guidance, opening avenues for future research and development.
Submitted 27 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Analysis of Coding Gain Due to In-Loop Reshaping
Authors:
Chau-Wai Wong,
Chang-Hong Fu,
Mengting Xu,
Guan-Ming Su
Abstract:
Reshaping, a point operation that alters the characteristics of signals, has been shown to improve the compression ratio in video coding practice. Out-of-loop reshaping, which directly modifies the input video signal, was first adopted as supplemental enhancement information (SEI) for HEVC/H.265 without the need to alter the core design of the video codec. VVC/H.266 further improves coding efficiency by adopting in-loop reshaping, which modifies the residual signal being processed in the hybrid coding loop. In this paper, we theoretically analyze the rate-distortion performance of in-loop reshaping and use experiments to verify the theoretical result. We prove that in-loop reshaping can improve coding efficiency when the entropy coder adopted in the coding pipeline is suboptimal, which is in line with the practical scenarios in which video codecs operate. We derive the PSNR gain in closed form and show that the theoretically predicted gain is consistent with that measured from experiments using standard test video sequences.
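Because reshaping is a point operation, each sample value is mapped independently through a fixed monotonic curve, and the decoder applies the inverse curve. A minimal sketch of a forward/inverse piecewise-linear reshaper for 10-bit values; the pivot points are hypothetical, and this is neither the normative VVC/H.266 luma-mapping process nor a model of the in-loop placement analyzed in the paper.

```python
from bisect import bisect_right
from typing import Sequence


def piecewise_linear(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Evaluate an increasing piecewise-linear map with pivots (xs[i], ys[i]).

    Inputs outside [xs[0], xs[-1]] are clamped to the end values.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1          # segment index with xs[i] <= x < xs[i + 1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Hypothetical 10-bit pivots: the mid-range gets more codewords (slope > 1),
# the dark and bright extremes get fewer (slope < 1).
IN_PIVOTS = [0, 256, 768, 1023]
OUT_PIVOTS = [0, 160, 864, 1023]


def forward_reshape(x: float) -> float:
    return piecewise_linear(x, IN_PIVOTS, OUT_PIVOTS)


def inverse_reshape(y: float) -> float:
    # Swapping the pivot lists inverts a strictly increasing piecewise-linear map.
    return piecewise_linear(y, OUT_PIVOTS, IN_PIVOTS)
```

For example, `forward_reshape(300)` returns 220.5 and `inverse_reshape(220.5)` returns 300 (up to floating-point rounding). In a real codec the reshaped values are rounded to integers, so the round trip is only approximate.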
Submitted 19 June, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
Tensor networks for interpretable and efficient quantum-inspired machine learning
Authors:
Shi-Ju Ran,
Gang Su
Abstract:
It is a critical challenge to simultaneously achieve high interpretability and efficiency with current deep machine learning (ML) schemes. Tensor networks (TN), a well-established mathematical tool originating from quantum mechanics, have shown unique advantages in developing efficient "white-box" ML schemes. Here, we give a brief review of the inspiring progress made in TN-based ML. On the one hand, the interpretability of TN ML is supported by a solid theoretical foundation based on quantum information and many-body physics. On the other hand, high efficiency can be obtained from powerful TN representations and the advanced computational techniques developed in quantum many-body physics. With the fast development of quantum computers, TN is expected to give rise to novel schemes that run on quantum hardware, heading towards "quantum artificial intelligence" in the near future.
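One source of TN efficiency is that a matrix product state (MPS) stores an exponentially long vector of amplitudes as a chain of small matrices, and any single amplitude is recovered by multiplying them in order. A minimal sketch with a hand-built bond-dimension-2 MPS for the n-qubit GHZ state (|00...0> + |11...1>)/sqrt(2); it illustrates MPS contraction in general, not any specific TN ML model from the review, and all names are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Every site uses the same pair of 2x2 matrices, selected by the physical bit.
SITE: Dict[str, Matrix] = {
    "0": [[1.0, 0.0], [0.0, 0.0]],
    "1": [[0.0, 0.0], [0.0, 1.0]],
}
LEFT: Matrix = [[1.0, 1.0]]     # 1x2 left boundary vector
RIGHT: Matrix = [[1.0], [1.0]]  # 2x1 right boundary vector
NORM = 2 ** -0.5                # normalization of the GHZ state


def amplitude(bits: str) -> float:
    """Contract LEFT * A[b1] * ... * A[bn] * RIGHT for one basis state."""
    m = LEFT
    for b in bits:
        m = matmul(m, SITE[b])
    return NORM * matmul(m, RIGHT)[0][0]
```

Here `amplitude("000")` and `amplitude("111")` are both 1/sqrt(2) and every mixed string gives 0. Storage grows linearly with the number of sites (two 2x2 matrices per site) rather than as 2^n.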
Submitted 19 November, 2023;
originally announced November 2023.
-
Prompt Your Mind: Refine Personalized Text Prompts within Your Mind
Authors:
Guinan Su,
Yanwu Yang,
Jie Guo
Abstract:
Large language models (LLMs) have demonstrated remarkable potential in natural language understanding and generation, making them valuable tools for enhancing conversational interactions. However, LLMs face challenges such as a lack of multi-step reasoning capability and heavy reliance on prompts. To address this, we introduce a prompt-refinement system named PromptMind, also known as "Prompt Your Mind", which provides an automated solution for generating contextually relevant prompts during conversations. PromptMind enhances the overall interaction between humans and chatbots through automatic prompt suggestion and automatic prompt refinement. To assess the effectiveness of PromptMind, we designed three interaction tasks to evaluate emotional support, advice acquisition, and task-oriented interactions during human-chatbot interactions. The results demonstrated that PromptMind reduced mental demands during interactions and fostered enhanced performance and social connections between users and chatbots. In summary, our findings indicate that PromptMind acts as a bridge, facilitating smoother information exchange and enhancing the usability of chatbot interactions.
Submitted 8 November, 2023;
originally announced November 2023.
-
DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation
Authors:
Guinan Su,
Yanwu Yang,
Zhifeng Li
Abstract:
In recent years, audio-driven 3D facial animation has gained significant attention, particularly in applications such as virtual reality, gaming, and video conferencing. However, accurately modeling the intricate and subtle dynamics of facial expressions remains a challenge. Most existing studies approach facial animation as a single regression problem, which often fails to capture the intrinsic inter-modal relationship between speech signals and 3D facial animation and overlooks their inherent consistency. Moreover, due to the limited availability of 3D audio-visual datasets, approaches trained on small samples generalize poorly, which decreases performance. To address these issues, in this study, we propose a cross-modal dual-learning framework, termed DualTalker, aimed at improving data usage efficiency as well as capturing cross-modal dependencies. The framework is trained jointly on the primary task (audio-driven facial animation) and its dual task (lip reading), and the two tasks share common audio/motion encoder components. Our joint training framework facilitates more efficient data usage by leveraging information from both tasks and explicitly capitalizing on the complementary relationship between facial motion and audio to improve performance. Furthermore, we introduce an auxiliary cross-modal consistency loss to mitigate the potential over-smoothing underlying the cross-modal complementary representations, enhancing the mapping of subtle facial expression dynamics. Through extensive experiments and a perceptual user study conducted on the VOCA and BIWI datasets, we demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively. We have made our code and video demonstrations available at https://github.com/sabrina-su/iadf.git.
Submitted 12 November, 2023; v1 submitted 8 November, 2023;
originally announced November 2023.
-
LMDX: Language Model-based Document Information Extraction and Localization
Authors:
Vincent Perot,
Kai Kang,
Florian Luisier,
Guolong Su,
Xiaoyu Sun,
Ramya Sree Boppana,
Zilong Wang,
Zifeng Wang,
Jiaqi Mu,
Hao Zhang,
Chen-Yu Lee,
Nan Hua
Abstract:
Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), improving the state of the art and exhibiting emergent capabilities across various tasks. However, their application to extracting information from visually rich documents, which is at the core of many document processing workflows and involves extracting key entities from semi-structured documents, has not yet been successful. The main obstacles to adopting LLMs for this task include the absence of layout encoding within LLMs, which is critical for high-quality extraction, and the lack of a grounding mechanism to localize the predicted entities within the document. In this paper, we introduce Language Model-based Document Information Extraction and Localization (LMDX), a methodology to reframe the document information extraction task for an LLM. LMDX enables extraction of singular, repeated, and hierarchical entities, both with and without training data, while providing grounding guarantees and localizing the entities within the document. Finally, we apply LMDX to the PaLM 2-S and Gemini Pro LLMs and evaluate it on the VRDU and CORD benchmarks, setting a new state of the art and showing how LMDX enables the creation of high-quality, data-efficient parsers.
Submitted 21 June, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
-
Rolling control and dynamics model of two section articulated-wing ornithopter
Authors:
G. Su,
Y. Cai,
J. Zhao
Abstract:
This paper introduces a new rolling control mechanism for a two-section articulated-wing ornithopter. The mechanism is analogous to aileron control in an airplane; however, the similar control input leads to the opposite result: an ornithopter that is supposed to turn left turns right instead. This research gives a qualitative dynamics model that explains this new phenomenon. Because of wing folding, the differential rotation of the outer-section wing around the pitch axis (analogous to ailerons in an airplane, where left aileron up and right aileron down make a left turn) becomes a common-mode rotation around the yaw axis. Its torque therefore changes from a left-handed rotation around the roll axis (taking the left-handed case as an example; the right-handed case is analogous) to a common-mode force pointing in the front-right (northeast, NE) direction from the first-person view of the ornithopter. Because most of the flapping movement lies in the upper hemisphere from the ornithopter's view, the NE force acts above the center of mass of the ornithopter, generating a right-handed moment around the roll axis. Therefore, an ornithopter that is supposed to turn left turns right. So far, this phenomenon is unique to, and has only been observed in, two-section articulated-wing ornithopters. Many field tests conducted by the authors confirm that it is highly repeatable.
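The key geometric step is that a force with a rightward component applied above the center of mass produces a rightward roll moment, M = r × F. A minimal worked check in the conventional aircraft body frame (x forward, y right, z down, so a positive moment about x lowers the right wing). The frame is an assumption, since the abstract uses "left-handed/right-handed" wording instead, and the height and force values are hypothetical.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def cross(a: Vec, b: Vec) -> Vec:
    """Vector cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


# Body axes: x forward, y right, z down. Positive roll moment = right wing down.
h = 0.05                  # hypothetical height of the force above the center of mass (m)
r: Vec = (0.0, 0.0, -h)   # "above" the center of mass is negative z in this frame
F: Vec = (1.0, 1.0, 0.0)  # hypothetical front-right (NE) force (N)

moment = cross(r, F)
roll_moment = moment[0]   # r_y*F_z - r_z*F_y = h * F_y > 0, i.e. a right roll
```

The roll component reduces to h·F_y, so any rightward force component acting above the center of mass rolls the body to the right, regardless of the forward component. Flipping the sign of F_y (a front-left force) would roll it left.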
Submitted 15 June, 2023;
originally announced June 2023.
-
On Robot Grasp Learning Using Equivariant Models
Authors:
Xupeng Zhu,
Dian Wang,
Guanang Su,
Ondrej Biza,
Robin Walters,
Robert Platt
Abstract:
Real-world grasp detection is challenging due to the stochasticity in grasp dynamics and the noise in hardware. Ideally, the system would adapt to the real world by training directly on physical systems. However, this is generally difficult due to the large amount of training data required by most grasp learning models. In this paper, we note that the planar grasp function is $\mathrm{SE}(2)$-equivariant and demonstrate that this structure can be used to constrain the neural network used during learning. This creates an inductive bias that can significantly improve the sample efficiency of grasp learning and enable end-to-end training from scratch on a physical robot with as few as $600$ grasp attempts. We call this method Symmetric Grasp learning (SymGrasp) and show that it can learn to grasp "from scratch" in less than 1.5 hours of physical robot time.
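Equivariance means that rotating or translating the input image rotates or translates the network's output map in the same way. A minimal sketch of that property for the 90-degree rotations (the cyclic group C4, a discrete subgroup of SE(2)), using the simplest possible constraint: a convolution whose kernel is itself unchanged by 90-degree rotation. This is a toy illustration of the idea, not the SymGrasp architecture, which the abstract does not detail; practical equivariant networks use steerable kernels and group-indexed channels instead.

```python
from typing import List

Grid = List[List[float]]


def rot90(g: Grid) -> Grid:
    """Rotate a square grid 90 degrees counter-clockwise."""
    n = len(g)
    return [[g[j][n - 1 - i] for j in range(n)] for i in range(n)]


def conv_same(g: Grid, k: Grid) -> Grid:
    """3x3 cross-correlation with zero padding; output has the input's shape."""
    n = len(g)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n and 0 <= jj < n:  # zero padding outside the grid
                        s += k[di + 1][dj + 1] * g[ii][jj]
            out[i][j] = s
    return out


# A kernel invariant under 90-degree rotation: this constraint is what makes
# the layer commute with C4 rotations of a square input.
KERNEL: Grid = [[0.5, 1.0, 0.5],
                [1.0, 2.0, 1.0],
                [0.5, 1.0, 0.5]]

image: Grid = [[float(3 * i + j * j) for j in range(4)] for i in range(4)]
lhs = conv_same(rot90(image), KERNEL)  # rotate, then convolve
rhs = rot90(conv_same(image, KERNEL))  # convolve, then rotate
```

`lhs` and `rhs` are identical even though `image` has no rotational symmetry. Replacing `KERNEL` with an asymmetric one (for example, a single 1 right of center) breaks the equality, which is why the weight constraint, not the data, supplies the inductive bias.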
Submitted 10 June, 2023;
originally announced June 2023.
-
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning
Authors:
Edward C. Williams,
Grace Su,
Sandra R. Schloen,
Miller C. Prosser,
Susanne Paulus,
Sanjay Krishnan
Abstract:
Twenty-five hundred years ago, the paperwork of the Achaemenid Empire was recorded on clay tablets. In 1933, archaeologists from the University of Chicago's Oriental Institute (OI) found tens of thousands of these tablets and fragments during the excavation of Persepolis. Many of these tablets have been painstakingly photographed and annotated by expert cuneiformists, and now provide a rich dataset consisting of over 5,000 annotated tablet images and 100,000 cuneiform sign bounding boxes. We leverage this dataset to develop DeepScribe, a modular computer vision pipeline capable of localizing cuneiform signs and providing suggestions for the identity of each sign. We investigate the difficulty of learning subtasks relevant to cuneiform tablet transcription on ground-truth data, finding that a RetinaNet object detector can achieve a localization mAP of 0.78 and a ResNet classifier can achieve a top-5 sign classification accuracy of 0.89. The end-to-end pipeline achieves a top-5 classification accuracy of 0.80. As part of the classification module, DeepScribe groups cuneiform signs into morphological clusters. We consider how this automatic clustering approach differs from the organization of standard, printed sign lists and what we may learn from it. These components, trained individually, are sufficient to produce a system that can analyze photos of cuneiform tablets from the Achaemenid period and provide useful transliteration suggestions to researchers. We evaluate the model's end-to-end performance on locating and classifying signs, providing a roadmap to a linguistically-aware transliteration system, then consider the model's potential utility when applied to other periods of cuneiform writing.
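The classification figures reported above (0.89 for the classifier, 0.80 end to end) are top-5 accuracies: a prediction counts as correct if the true sign is among the five highest-scoring candidates. A minimal sketch of the metric; the function and argument names are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int = 5) -> float:
    """Fraction of samples whose true class index is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)
```

Top-k accuracy suits a suggestion tool like this one, where a cuneiformist picks the right sign from a short ranked list rather than relying on a single guess.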
Submitted 2 June, 2023;
originally announced June 2023.
-
Multi-Objective Task Assignment and Multiagent Planning with Hybrid GPU-CPU Acceleration
Authors:
Thomas Robinson,
Guoxin Su
Abstract:
Allocation and planning with a collection of tasks and a group of agents is an important problem in multiagent systems. One commonly faced bottleneck is scalability, as in general the multiagent model increases exponentially in size with the number of agents. We consider the combination of random task assignment and multiagent planning under multiple-objective constraints, and show that this problem can be decentralised to individual agent-task models. We present an algorithm of point-oriented Pareto computation, which checks whether a point corresponding to given cost and probability thresholds for our formal problem is feasible or not. If the given point is infeasible, our algorithm finds a Pareto-optimal point which is closest to the given point. We provide the first multi-objective model checking framework that simultaneously uses GPU and multi-core acceleration. Our framework manages CPU and GPU devices as a load balancing problem for parallel computation. Our experiments demonstrate that parallelisation achieves significant run time speed-up over sequential computation.
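A minimal sketch of the feasibility check described above, assuming the achievable (expected cost, success probability) pairs have already been enumerated as a finite list. The paper works on formal multiagent models with model-checking machinery; the brute-force scan, the helper names (`dominates`, `pareto_front`, `check_point`), and the Euclidean distance are illustrative assumptions, not the authors' algorithm.

```python
from typing import List, Optional, Tuple

# (expected cost, success probability) achieved by one candidate joint policy.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no higher cost, no lower probability, and not identical."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other achievable point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def check_point(achievable: List[Point], cost_bound: float,
                prob_bound: float) -> Tuple[bool, Optional[Point]]:
    """Return (True, witness) if some point meets both thresholds; otherwise
    (False, the Pareto-optimal point closest to the query in Euclidean distance)."""
    for cost, prob in achievable:
        if cost <= cost_bound and prob >= prob_bound:
            return True, (cost, prob)
    front = pareto_front(achievable)
    if not front:
        return False, None
    closest = min(front, key=lambda q: (q[0] - cost_bound) ** 2 + (q[1] - prob_bound) ** 2)
    return False, closest


if __name__ == "__main__":
    achievable = [(10.0, 0.9), (5.0, 0.7), (8.0, 0.6), (3.0, 0.4)]
    print(check_point(achievable, 6.0, 0.65))  # (True, (5.0, 0.7))
    print(check_point(achievable, 4.0, 0.8))   # (False, (5.0, 0.7))
```

In practice the two axes live on different scales, so cost would likely need normalizing before distances are compared.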
Submitted 7 May, 2023;
originally announced May 2023.
-
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Authors:
Chen-Yu Lee,
Chun-Liang Li,
Hao Zhang,
Timothy Dozat,
Vincent Perot,
Guolong Su,
Xiang Zhang,
Kihyuk Sohn,
Nikolai Glushnev,
Renshen Wang,
Joshua Ainslie,
Shangbang Long,
Siyang Qin,
Yasuhisa Fujii,
Nan Hua,
Tomas Pfister
Abstract:
The recent advent of self-supervised pre-training techniques has led to a surge in the use of multimodal learning in form document understanding. However, existing approaches that extend masked language modeling to other modalities require careful multi-task tuning, complex reconstruction target designs, or additional pre-training data. In FormNetV2, we introduce a centralized multimodal graph contrastive learning strategy to unify self-supervised pre-training for all modalities in one loss. The graph contrastive objective maximizes the agreement of multimodal representations, providing a natural interplay for all modalities without special customization. In addition, we extract image features within the bounding box that joins a pair of tokens connected by a graph edge, capturing more targeted visual cues without loading a sophisticated and separately pre-trained image embedder. FormNetV2 establishes new state-of-the-art performance on FUNSD, CORD, SROIE and Payment benchmarks with a more compact model size.
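The graph contrastive objective can be illustrated with an InfoNCE-style loss over node embeddings taken from two views of the same graph. This is a sketch under assumptions: the loss form, cosine similarity, temperature value, and function names are illustrative choices, and the multimodal fusion that produces each node embedding is not shown.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def info_nce(view1: List[Sequence[float]], view2: List[Sequence[float]],
             temperature: float = 0.5) -> float:
    """InfoNCE loss between two views of the same graph's node embeddings.

    Node i in view1 and node i in view2 form the positive pair; every other
    node j in view2 serves as a negative for node i. Lower means better agreement.
    """
    n = len(view1)
    total = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / temperature for j in range(n)]
        m = max(logits)  # shift by the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denominator - logits[i]
    return total / n
```

Aligned views (each node closest to its own counterpart) give a small loss; swapping the correspondences makes the loss grow, which is the signal that pulls matching representations together.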
Submitted 13 June, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Intelligent diagnostic scheme for lung cancer screening with Raman spectra data by tensor network machine learning
Authors:
Yu-Jia An,
Sheng-Chen Bai,
Lin Cheng,
Xiao-Guang Li,
Cheng-en Wang,
Xiao-Dong Han,
Gang Su,
Shi-Ju Ran,
Cong Wang
Abstract:
Artificial intelligence (AI) has had a tremendous impact on the biomedical sciences, from academic research to clinical applications such as biomarker detection and diagnosis, treatment optimization, and the identification of new therapeutic targets in drug discovery. However, contemporary AI technologies, particularly deep machine learning (ML), suffer severely from non-interpretability, which may lead to incorrect predictions in an uncontrolled way. Interpretability is particularly crucial for ML in clinical diagnosis, since users must gain a sense of security and trust from firm grounds or convincing interpretations. In this work, we propose a tensor-network (TN)-ML method to reliably predict lung cancer patients and their stages by screening Raman spectra data of volatile organic compounds (VOCs) in exhaled breath, which are generally suitable as biomarkers and are considered an ideal basis for non-invasive lung cancer screening. The prediction of TN-ML is based on the mutual distances of the breath samples mapped to the quantum Hilbert space. Thanks to the quantum probabilistic interpretation, the certainty of the predictions can be quantitatively characterized. The accuracy on samples with high certainty is almost 100%. The incorrectly classified samples exhibit noticeably lower certainty and can thus be identified as anomalies, to be handled by human experts to guarantee high reliability. Our work sheds light on shifting "AI for biomedical sciences" from conventional non-interpretable ML schemes to interpretable human-ML interactive approaches, for the purpose of high accuracy and reliability.
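A sketch of distance-based prediction in a product-state Hilbert space, assuming the common tensor-network ML feature map x_j → [cos(πx_j/2), sin(πx_j/2)] with features scaled to [0, 1]. The class score (mean fidelity to training samples) and the certainty proxy (the winning class's share of the total score) are assumptions for illustration, not the paper's exact quantum probabilistic formulation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fidelity(x: Sequence[float], y: Sequence[float]) -> float:
    """|<phi(x)|phi(y)>| for the product-state feature map
    x_j -> [cos(pi*x_j/2), sin(pi*x_j/2)]; features are assumed scaled to [0, 1].
    Each factor equals cos(pi*(x_j - y_j)/2) by the angle-difference identity."""
    f = 1.0
    for a, b in zip(x, y):
        f *= abs(math.cos(math.pi * (a - b) / 2))
    return f


def classify(sample: Sequence[float],
             train: Dict[str, List[Sequence[float]]]) -> Tuple[str, float]:
    """Score each class by mean fidelity to its training samples and report the
    winning class's share of the total score as a certainty proxy."""
    scores = {label: sum(fidelity(sample, t) for t in samples) / len(samples)
              for label, samples in train.items()}
    best = max(scores, key=scores.get)
    total = sum(scores.values())
    certainty = scores[best] / total if total > 0 else 0.0
    return best, certainty
```

A sample equally close to both classes gets a certainty near 0.5, which is the kind of low-certainty case the abstract proposes handing to human experts.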
Submitted 11 March, 2023;
originally announced March 2023.
-
A General Theory of Correct, Incorrect, and Extrinsic Equivariance
Authors:
Dian Wang,
Xupeng Zhu,
Jung Yeon Park,
Mingxi Jia,
Guanang Su,
Robert Platt,
Robin Walters
Abstract:
Although equivariant machine learning has proven effective at many tasks, success depends heavily on the assumption that the ground truth function is symmetric over the entire domain matching the symmetry in an equivariant neural network. A missing piece in the equivariant learning literature is the analysis of equivariant networks when symmetry exists only partially in the domain. In this work, we present a general theory for such a situation. We propose pointwise definitions of correct, incorrect, and extrinsic equivariance, which allow us to quantify continuously the degree of each type of equivariance a function displays. We then study the impact of various degrees of incorrect or extrinsic symmetry on model error. We prove error lower bounds for invariant or equivariant networks in classification or regression settings with partially incorrect symmetry. We also analyze the potentially harmful effects of extrinsic equivariance. Experiments validate these results in three different environments.
Submitted 28 October, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
QueryForm: A Simple Zero-shot Form Entity Query Framework
Authors:
Zifeng Wang,
Zizhao Zhang,
Jacob Devlin,
Chen-Yu Lee,
Guolong Su,
Hao Zhang,
Jennifer Dy,
Vincent Perot,
Tomas Pfister
Abstract:
Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities. We present a novel query-based framework, QueryForm, that extracts entity values from form-like documents in a zero-shot fashion. QueryForm contains a dual prompting mechanism that composes both the document schema and a specific entity type into a query, which is used to prompt a Transformer model to perform a single entity extraction task. Furthermore, we propose to leverage large-scale query-entity pairs generated from form-like webpages with weak HTML annotations to pre-train QueryForm. By unifying pre-training and fine-tuning into the same query-based framework, QueryForm enables models to learn from structured documents containing various entities and layouts, leading to better generalization to target document types without the need for target-specific training data. QueryForm sets a new state-of-the-art average F1 score on both the XFUND (+4.6% to 10.1%) and Payment (+3.2% to 9.5%) zero-shot benchmarks, with a smaller model size and no additional image input.
Submitted 27 June, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
SEIL: Simulation-augmented Equivariant Imitation Learning
Authors:
Mingxi Jia,
Dian Wang,
Guanang Su,
David Klee,
Xupeng Zhu,
Robin Walters,
Robert Platt
Abstract:
In robotic manipulation, acquiring samples is extremely expensive because it often requires interacting with the real world. Traditional image-level data augmentation has shown the potential to improve sample efficiency in various machine learning tasks. However, image-level data augmentation is insufficient for an imitation learning agent to learn good manipulation policies from a reasonable number of demonstrations. We propose Simulation-augmented Equivariant Imitation Learning (SEIL), a method that combines a novel data augmentation strategy of supplementing expert trajectories with simulated transitions and an equivariant model that exploits the $\mathrm{O}(2)$ symmetry in robotic manipulation. Experimental evaluations demonstrate that our method can learn non-trivial manipulation tasks within ten demonstrations and outperforms the baselines by a significant margin.
Submitted 31 October, 2022;
originally announced November 2022.
-
Intelligent MIMO Detection Using Meta Learning
Authors:
Haomiao Huo,
Jindan Xu,
Gege Su,
Wei Xu,
Ning Wang
Abstract:
In a K-best detector for multiple-input multiple-output (MIMO) systems, the value of K needs to be sufficiently large to achieve near-maximum-likelihood (ML) performance. By treating K as a variable that can be adjusted according to a fitting function of some learnable coefficients, an intelligent MIMO detection network based on deep neural networks (DNN) is proposed to reduce the complexity of the detection algorithm with little performance degradation. In particular, the proposed intelligent detection algorithm uses meta learning to learn the coefficients of the fitting function for K, circumventing the problem of learning K directly. The idea of network fusion is used to combine the learning results of the meta learning component networks. Simulation results show that the proposed scheme achieves near-ML detection performance while its complexity is close to that of linear detectors. It also exhibits a strong capability for fast training.
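For reference, a plain K-best breadth-first tree search is sketched below, assuming the real-valued model has already been QR-reduced to an upper-triangular R and a rotated observation y_tilde = Q^T y. K is a fixed argument here; the learned fitting function for K and the meta-learning networks are not shown, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def k_best_detect(R: Sequence[Sequence[float]], y_tilde: Sequence[float],
                  alphabet: Sequence[float], K: int) -> Tuple[List[float], float]:
    """Breadth-first K-best search for argmin_x ||y_tilde - R x||^2 over the alphabet.

    R is upper triangular, so layer i depends only on symbols x_i..x_{n-1};
    the search fixes symbols from the last layer upward, keeping K survivors.
    """
    n = len(y_tilde)
    # Each survivor: (accumulated squared error, symbols for layers i..n-1).
    survivors: List[Tuple[float, List[float]]] = [(0.0, [])]
    for i in range(n - 1, -1, -1):
        candidates = []
        for metric, tail in survivors:
            for s in alphabet:
                x_part = [s] + tail              # x_part[j] is the symbol at layer i + j
                estimate = sum(R[i][i + j] * x_part[j] for j in range(len(x_part)))
                candidates.append((metric + (y_tilde[i] - estimate) ** 2, x_part))
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:K]               # prune to the K best partial paths
    best_metric, best_x = survivors[0]
    return best_x, best_metric
```

A larger K keeps more partial candidates at each layer, moving the result toward maximum-likelihood detection at higher cost; this is the trade-off the learned K is meant to tune.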
Submitted 8 August, 2022;
originally announced August 2022.
-
Collaborative Navigation and Manipulation of a Cable-towed Load by Multiple Quadrupedal Robots
Authors:
Chenyu Yang,
Guo Ning Sue,
Zhongyu Li,
Lizhi Yang,
Haotian Shen,
Yufeng Chi,
Akshara Rai,
Jun Zeng,
Koushil Sreenath
Abstract:
This paper tackles the problem of robots collaboratively towing a load with cables to a specified goal location while avoiding collisions in real time. The introduction of cables (as opposed to rigid links) enables the robotic team to travel through narrow spaces by changing its intrinsic dimensions through slack/taut switches of the cable. However, this is a challenging problem because of the hybrid mode switches and the dynamical coupling among multiple robots and the load. Previous attempts at addressing such a problem were performed offline and did not consider avoiding obstacles online. In this paper, we introduce a cascaded planning scheme with a parallelized centralized trajectory optimization that deals with hybrid mode switches. We additionally develop a set of decentralized planners, one per robot, which enables our approach to solve the problem of collaborative load manipulation online. We develop and demonstrate one of the first collaborative autonomy frameworks able to move a cable-towed load, which is too heavy for a single robot to move, through narrow spaces with real-time feedback and reactive planning in experiments.
Submitted 29 June, 2022;
originally announced June 2022.
-
DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning
Authors:
Zifeng Wang,
Zizhao Zhang,
Sayna Ebrahimi,
Ruoxi Sun,
Han Zhang,
Chen-Yu Lee,
Xiaoqi Ren,
Guolong Su,
Vincent Perot,
Jennifer Dy,
Tomas Pfister
Abstract:
Continual learning aims to enable a single model to learn a sequence of tasks without catastrophic forgetting. Top-performing methods usually require a rehearsal buffer to store past pristine examples for experience replay, which, however, limits their practical value due to privacy and memory constraints. In this work, we present a simple yet effective framework, DualPrompt, which learns a tiny set of parameters, called prompts, to properly instruct a pre-trained model to learn tasks arriving sequentially without buffering past examples. DualPrompt presents a novel approach to attach complementary prompts to the pre-trained backbone, and then formulates the objective as learning task-invariant and task-specific "instructions". With extensive experimental validation, DualPrompt consistently sets state-of-the-art performance under the challenging class-incremental setting. In particular, DualPrompt outperforms recent advanced continual learning methods with relatively large buffer sizes. We also introduce a more challenging benchmark, Split ImageNet-R, to help generalize rehearsal-free continual learning research. Source code is available at https://github.com/google-research/l2p.
Submitted 5 August, 2022; v1 submitted 10 April, 2022;
originally announced April 2022.
-
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Authors:
Chen-Yu Lee,
Chun-Liang Li,
Timothy Dozat,
Vincent Perot,
Guolong Su,
Nan Hua,
Joshua Ainslie,
Renshen Wang,
Yasuhisa Fujii,
Tomas Pfister
Abstract:
Sequence modeling has demonstrated state-of-the-art performance on natural language and document understanding tasks. However, it is challenging to correctly serialize tokens in form-like documents in practice due to their variety of layout patterns. We propose FormNet, a structure-aware sequence model to mitigate the suboptimal serialization of forms. First, we design Rich Attention that leverages the spatial relationship between tokens in a form for more precise attention score calculation. Second, we construct Super-Tokens for each word by embedding representations from their neighboring tokens through graph convolutions. FormNet therefore explicitly recovers local syntactic information that may have been lost during serialization. In experiments, FormNet outperforms existing methods with a more compact model size and less pre-training data, establishing new state-of-the-art performance on CORD, FUNSD and Payment benchmarks.
Submitted 23 March, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
Hybrid Tracker with Pixel and Instance for Video Panoptic Segmentation
Authors:
Weicai Ye,
Xinyue Lan,
Ge Su,
Hujun Bao,
Zhaopeng Cui,
Guofeng Zhang
Abstract:
Video Panoptic Segmentation (VPS) aims to generate coherent panoptic segmentation and track the identities of all pixels across video frames. Existing methods predominantly utilize the trained instance embedding to keep the consistency of panoptic segmentation. However, they inevitably struggle to cope with the challenges of small objects, similar appearance but inconsistent identities, occlusion, and strong instance contour deformations. To address these problems, we present HybridTracker, a lightweight joint tracking model that attempts to eliminate the limitations of a single tracker. HybridTracker runs a pixel tracker and an instance tracker in parallel to obtain association matrices, which are fused into a matching matrix. In the instance tracker, we design a differentiable matching layer, ensuring the stability of inter-frame matching. In the pixel tracker, we compute the dice coefficient of the same instance of different frames given the estimated optical flow, forming the Intersection Over Union (IoU) matrix. We additionally propose mutual check and temporal consistency constraints during inference to address the occlusion and contour deformation challenges. Comprehensive experiments show that HybridTracker outperforms state-of-the-art methods on the Cityscapes-VPS and VIPER datasets.
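A sketch of the pixel-tracker association and fusion step, assuming masks are sets of pixel coordinates and optical flow is a per-pixel integer displacement. The blending weight `alpha`, the `threshold`, and the greedy assignment are illustrative assumptions; the abstract's instance tracker uses a differentiable matching layer instead.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Flow = Dict[Pixel, Tuple[int, int]]  # per-pixel (dr, dc); missing pixels do not move


def warp(mask: Set[Pixel], flow: Flow) -> Set[Pixel]:
    """Move each pixel of a frame-t mask along the estimated optical flow."""
    warped = set()
    for r, c in mask:
        dr, dc = flow.get((r, c), (0, 0))
        warped.add((r + dr, c + dc))
    return warped


def dice(a: Set[Pixel], b: Set[Pixel]) -> float:
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def pixel_association(prev_masks: List[Set[Pixel]], next_masks: List[Set[Pixel]],
                      flow: Flow) -> List[List[float]]:
    """Rows: instances at frame t (after warping); columns: instances at frame t+1."""
    warped = [warp(m, flow) for m in prev_masks]
    return [[dice(w, n) for n in next_masks] for w in warped]


def fuse_and_match(pixel_assoc: List[List[float]], instance_assoc: List[List[float]],
                   alpha: float = 0.5, threshold: float = 0.3) -> Dict[int, int]:
    """Blend the two association matrices, then greedily take the best unused pairs."""
    fused = [[alpha * p + (1 - alpha) * q for p, q in zip(p_row, q_row)]
             for p_row, q_row in zip(pixel_assoc, instance_assoc)]
    candidates = sorted(((fused[i][j], i, j)
                         for i in range(len(fused)) for j in range(len(fused[i]))),
                        reverse=True)
    matches: Dict[int, int] = {}
    used_next = set()
    for score, i, j in candidates:
        if score < threshold:
            break
        if i in matches or j in used_next:
            continue
        matches[i] = j
        used_next.add(j)
    return matches
```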
Submitted 11 December, 2023; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Sample Efficient Grasp Learning Using Equivariant Models
Authors:
Xupeng Zhu,
Dian Wang,
Ondrej Biza,
Guanang Su,
Robin Walters,
Robert Platt
Abstract:
In planar grasp detection, the goal is to learn a function from an image of a scene onto a set of feasible grasp poses in $\mathrm{SE}(2)$. In this paper, we recognize that the optimal grasp function is $\mathrm{SE}(2)$-equivariant and can be modeled using an equivariant convolutional neural network. As a result, we are able to significantly improve the sample efficiency of grasp learning, obtaining a good approximation of the grasp function after only 600 grasp attempts. This is few enough that we can learn to grasp completely on a physical robot in about 1.5 hours.
Submitted 18 February, 2022;
originally announced February 2022.
-
Learning to Prompt for Continual Learning
Authors:
Zifeng Wang,
Zizhao Zhang,
Chen-Yu Lee,
Han Zhang,
Ruoxi Sun,
Xiaoqi Ren,
Guolong Su,
Vincent Perot,
Jennifer Dy,
Tomas Pfister
Abstract:
The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge. Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting, while this work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time. Our method learns to dynamically prompt (L2P) a pre-trained model to learn tasks sequentially under different task transitions. In our proposed framework, prompts are small learnable parameters, which are maintained in a memory space. The objective is to optimize prompts to instruct the model prediction and explicitly manage task-invariant and task-specific knowledge while maintaining model plasticity. We conduct comprehensive experiments under popular image classification benchmarks with different challenging continual learning settings, where L2P consistently outperforms prior state-of-the-art methods. Surprisingly, L2P achieves competitive results against rehearsal-based methods even without a rehearsal buffer and is directly applicable to challenging task-agnostic continual learning. Source code is available at https://github.com/google-research/l2p.
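One way to picture dynamic prompt selection from the memory space is a key-query lookup, assuming each prompt is paired with a key vector and the query is a feature of the input computed by the frozen pre-trained model. The cosine-similarity ranking, `top_n`, and the function names are assumptions for illustration; training of the keys and prompts is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def select_prompts(query: Vector, keys: List[Vector], top_n: int) -> List[int]:
    """Indices of the top_n prompt keys most similar to the query feature."""
    ranked = sorted(range(len(keys)), key=lambda i: cosine(query, keys[i]), reverse=True)
    return ranked[:top_n]


def prepend_prompts(query: Vector, keys: List[Vector], prompts: List[List[Vector]],
                    tokens: List[Vector], top_n: int) -> List[Vector]:
    """Prepend the selected prompts' token embeddings to the input token embeddings."""
    chosen = select_prompts(query, keys, top_n)
    prefix = [tok for i in chosen for tok in prompts[i]]
    return prefix + tokens
```

Because selection depends only on the input's own feature, no task identity is needed at test time, which matches the setting the abstract describes.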
Submitted 21 March, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
-
SketchLattice: Latticed Representation for Sketch Manipulation
Authors:
Yonggang Qi,
Guoyao Su,
Pinaki Nath Chowdhury,
Mingkang Li,
Yi-Zhe Song
Abstract:
The key challenge in designing a sketch representation lies in handling the abstract and iconic nature of sketches. Existing work predominantly utilizes either (i) a pixelative format that treats sketches as natural images and employs off-the-shelf CNN-based networks, or (ii) an elaborately designed vector format that leverages the structural information of drawing orders using sequential RNN-based methods. While the pixelative format lacks intuitive exploitation of structural cues, sketches in vector format are unavailable in most cases, limiting their practical usage. Hence, in this paper, we propose a lattice-structured sketch representation that not only removes the bottleneck of requiring vector data but also preserves the structural cues that vector data provides. Essentially, a sketch lattice is a set of points sampled from the pixelative format of the sketch using a lattice graph. We show that our lattice structure is particularly amenable to structural changes, which greatly benefits sketch abstraction modeling for generation tasks. Our lattice representation can be effectively encoded using a graph model that uses significantly fewer model parameters (13.5 times fewer) than the existing state of the art. Extensive experiments demonstrate the effectiveness of sketch lattice for sketch manipulation, including sketch healing and image-to-sketch synthesis.
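A minimal sketch of sampling lattice points from a binary sketch image, assuming a square lattice whose horizontal and vertical lines are `step` pixels apart and keeping the stroke pixels that lie on a lattice line. The spacing and the function name are hypothetical; the graph encoder that consumes these points is not shown.

```python
from typing import List, Sequence, Tuple


def lattice_points(image: Sequence[Sequence[int]], step: int) -> List[Tuple[int, int]]:
    """Return stroke pixels (value 1) lying on lattice rows or columns spaced `step` apart."""
    height, width = len(image), len(image[0])
    points = set()
    for r in range(0, height, step):        # horizontal lattice lines
        for c in range(width):
            if image[r][c]:
                points.add((r, c))
    for c in range(0, width, step):         # vertical lattice lines
        for r in range(height):
            if image[r][c]:
                points.add((r, c))
    return sorted(points)


if __name__ == "__main__":
    img = [[0] * 5 for _ in range(5)]
    for i in range(5):
        img[i][i] = 1          # diagonal stroke
    img[1] = [1] * 5           # horizontal stroke on row 1
    print(lattice_points(img, 2))  # [(0, 0), (1, 0), (1, 2), (1, 4), (2, 2), (4, 4)]
```

Only pixel data is required, which is the point the abstract makes about removing the dependence on vector-format sketches.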
Submitted 26 August, 2021;
originally announced August 2021.
-
Gradient-Leakage Resilient Federated Learning
Authors:
Wenqi Wei,
Ling Liu,
Yanzhao Wu,
Gong Su,
Arun Iyengar
Abstract:
Federated learning (FL) is an emerging distributed learning paradigm with default client privacy because clients can keep sensitive data on their devices and only share local training parameter updates with the federated server. However, recent studies reveal that gradient leakages in FL may compromise the privacy of client training data. This paper presents a gradient leakage resilient approach to privacy-preserving federated learning with per training example-based client differential privacy, coined as Fed-CDP. It makes three original contributions. First, we identify three types of client gradient leakage threats in federated learning even with encrypted client-server communications. We articulate when and why the conventional server-coordinated differential privacy approach, coined as Fed-SDP, is insufficient to protect the privacy of the training data. Second, we introduce Fed-CDP, the per example-based client differential privacy algorithm, and provide a formal analysis of Fed-CDP with the $(ε, δ)$ differential privacy guarantee, and a formal comparison between Fed-CDP and Fed-SDP in terms of privacy accounting. Third, we formally analyze the privacy-utility trade-off for providing differential privacy guarantee by Fed-CDP and present a dynamic decay noise-injection policy to further improve the accuracy and resiliency of Fed-CDP. We evaluate and compare Fed-CDP and Fed-CDP(decay) with Fed-SDP in terms of differential privacy guarantee and gradient leakage resilience over five benchmark datasets. The results show that the Fed-CDP approach outperforms conventional Fed-SDP in terms of resilience to client gradient leakages while offering competitive accuracy performance in federated learning.
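The per-example client-side mechanism can be sketched in the style of DP-SGD: clip each example's gradient, add Gaussian noise, and average. Exactly where Fed-CDP injects noise, its privacy accounting, and its decay policy are not specified in the abstract, so the exponential `decayed_sigma` schedule and all names here are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def clip(grad: Vector, clip_norm: float) -> Vector:
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_local_update(per_example_grads: List[Vector], clip_norm: float,
                         sigma: float, rng: random.Random) -> Vector:
    """Clip every example's gradient, add Gaussian noise to the sum, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for k, value in enumerate(clip(grad, clip_norm)):
            total[k] += value
    noisy = [t + rng.gauss(0.0, sigma * clip_norm) for t in total]
    return [v / len(per_example_grads) for v in noisy]


def decayed_sigma(sigma0: float, decay_rate: float, round_idx: int) -> float:
    """Hypothetical exponential decay of the noise scale across training rounds."""
    return sigma0 * math.exp(-decay_rate * round_idx)
```

Clipping bounds each example's contribution, which is what lets Gaussian noise scaled by `clip_norm` mask any single example; shrinking the noise in later rounds is one plausible reading of a decay policy that trades some privacy margin for accuracy.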
Submitted 2 July, 2021;
originally announced July 2021.
-
RDMAbox : Optimizing RDMA for Memory Intensive Workloads
Authors:
Juhyun Bae,
Ling Liu,
Yanzhao Wu,
Gong Su,
Arun Iyengar
Abstract:
We present RDMAbox, a set of low-level RDMA optimizations that provide better performance than previous approaches. The optimizations are packaged in easy-to-use kernel and user space libraries for applications and systems in data centers. We demonstrate the flexibility and effectiveness of RDMAbox by implementing a kernel remote paging system and a user space file system using RDMAbox. RDMAbox employs two optimization techniques. First, we suggest RDMA request merging and chaining to further reduce the total number of I/O operations to the RDMA NIC. The I/O merge queue also functions as a traffic regulator that enforces admission control and avoids overloading the NIC. Second, we propose Adaptive Polling to achieve higher efficiency in polling Work Completions than existing busy polling while maintaining the low CPU overhead of event triggering. Our implementation of a remote paging system with RDMAbox outperforms existing representative solutions with up to 4× throughput improvement and up to 83% decrease in average tail latency in big data workloads, and up to 83% reduction in completion time in machine learning workloads. Our implementation of a user space file system based on RDMAbox achieves up to 5.9× higher throughput than existing representative solutions.
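A sketch of request merging with a simple in-flight budget standing in for admission control, assuming requests are (byte offset, length) pairs. RDMAbox itself works in kernel and user-space libraries on RDMA work requests; request chaining and Adaptive Polling are not shown, and the names and size cap are hypothetical.

```python
from typing import List, Tuple

Request = Tuple[int, int]  # (byte offset, length)


def merge_requests(requests: List[Request], max_merged_bytes: int) -> List[Request]:
    """Coalesce requests that are contiguous in the address space, capping the size
    of any merged request so one large I/O cannot monopolize the NIC."""
    merged: List[Request] = []
    for offset, length in sorted(requests):
        if merged:
            last_offset, last_length = merged[-1]
            contiguous = last_offset + last_length == offset
            if contiguous and last_length + length <= max_merged_bytes:
                merged[-1] = (last_offset, last_length + length)
                continue
        merged.append((offset, length))
    return merged


def admit(pending: List[Request], in_flight: int,
          max_in_flight: int) -> Tuple[List[Request], List[Request]]:
    """Post only as many requests as the in-flight budget allows; keep the rest queued."""
    budget = max(0, max_in_flight - in_flight)
    return pending[:budget], pending[budget:]
```

Merging turns several small operations into fewer, larger ones, and the budget keeps the queue from pushing more work to the NIC than it can absorb.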
Submitted 13 August, 2021; v1 submitted 25 April, 2021;
originally announced April 2021.
-
Compiling ONNX Neural Network Models Using MLIR
Authors:
Tian Jin,
Gheorghe-Teodor Bercea,
Tung D. Le,
Tong Chen,
Gong Su,
Haruki Imai,
Yasushi Negishi,
Anh Leu,
Kevin O'Brien,
Kiyokuni Kawachiya,
Alexandre E. Eichenberger
Abstract:
Deep neural network models are becoming increasingly popular and have been used in various tasks such as computer vision, speech recognition, and natural language processing. Machine learning models are commonly trained in a resource-rich environment and then deployed in a distinct environment such as high availability machines or edge devices. To improve the portability of models, the open-source community has proposed the Open Neural Network Exchange (ONNX) standard. In this paper, we present a high-level, preliminary report on our onnx-mlir compiler, which generates code for the inference of deep neural network models described in the ONNX format. Onnx-mlir is an open-source compiler implemented using the Multi-Level Intermediate Representation (MLIR) infrastructure recently integrated into the LLVM project. Onnx-mlir relies on the MLIR concept of dialects to implement its functionality. We propose here two new dialects: (1) an ONNX specific dialect that encodes the ONNX standard semantics, and (2) a loop-based dialect to provide a common lowering point for all ONNX dialect operations. Each intermediate representation facilitates its own characteristic set of graph-level and loop-based optimizations respectively. We illustrate our approach by following several models through the proposed representations and we include some early optimization work and performance results.
Submitted 30 September, 2020; v1 submitted 19 August, 2020;
originally announced August 2020.
-
Efficient Orchestration of Host and Remote Shared Memory for Memory Intensive Workloads
Authors:
Juhyun Bae,
Gong Su,
Arun Iyengar,
Yanzhao Wu,
Ling Liu
Abstract:
Since very few efforts have been made to develop a unified memory orchestration framework for efficiently managing both host and remote idle memory, we present Valet, an efficient approach to orchestrating host and remote shared memory to improve the performance of memory-intensive workloads. The paper makes three original contributions. First, we redesign the data flow in the critical path by introducing a host-coordinated memory pool that works as a local cache, reducing latency in the critical path of host and remote memory orchestration. Second, Valet utilizes unused local memory across containers by managing it through the host-coordinated memory pool, which allows containers to dynamically expand and shrink their memory allocations according to workload demands. Third, Valet provides an efficient remote memory reclaiming technique on remote peers, based on two optimizations: (1) an activity-based victim selection scheme that selects the least-active chunk of data to serve eviction requests, and (2) a migration protocol that moves the least-active chunk of data to a less memory-pressured remote node. As a result, Valet can effectively reduce the performance impact and migration overhead on local nodes. Our extensive experiments on both NoSQL systems and machine learning (ML) workloads show that Valet outperforms existing representative remote paging systems, with up to 226X throughput improvement and up to 98% latency decrease over the conventional OS swap facility for big data and ML workloads, and up to 5.5X throughput improvement and up to 78.4% latency decrease over state-of-the-art remote paging systems. Valet is open sourced at https://github.com/git-disl/Valet.
Submitted 28 August, 2020; v1 submitted 3 August, 2020;
originally announced August 2020.
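The Valet abstract describes reclaiming remote memory by choosing the least-active chunk as the eviction victim and migrating it to a less memory-pressured node. Below is a minimal sketch of that selection logic, offered as an illustration only. The `Chunk` and `Node` classes, the access-count activity metric, and the free-bytes pressure metric are all hypothetical stand-ins and are not Valet's actual data structures. The real implementation lives at the repository URL above.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: int
    size: int               # illustrative size units
    access_count: int = 0   # hypothetical activity metric: recent accesses


@dataclass
class Node:
    name: str
    capacity: int
    chunks: list = field(default_factory=list)

    def free_bytes(self) -> int:
        # Hypothetical memory-pressure metric: less free memory means more pressure.
        return self.capacity - sum(c.size for c in self.chunks)


def select_victim(node: Node) -> Chunk:
    """Activity-based victim selection: evict the least-active chunk first."""
    return min(node.chunks, key=lambda c: c.access_count)


def reclaim(node: Node, peers: list) -> tuple:
    """Evict the least-active chunk from `node` and migrate it to the
    least memory-pressured peer that can still hold it."""
    victim = select_victim(node)
    candidates = [p for p in peers if p is not node and p.free_bytes() >= victim.size]
    if not candidates:
        raise RuntimeError("no peer can absorb the victim chunk")
    target = max(candidates, key=lambda p: p.free_bytes())
    node.chunks.remove(victim)
    target.chunks.append(victim)
    return victim, target


# Usage: node "a" must give up memory, and chunk 2 is its least active chunk.
a = Node("a", 100, [Chunk(1, 40, access_count=9), Chunk(2, 40, access_count=2)])
b = Node("b", 100, [Chunk(3, 70, access_count=5)])   # only 30 units free
c = Node("c", 100, [Chunk(4, 10, access_count=1)])   # 90 units free
victim, target = reclaim(a, [a, b, c])
print(victim.chunk_id, target.name)  # 2 c
```

Node "b" is skipped because it cannot fit the 40-unit chunk, so the migration goes to "c". A real system would also need to weigh migration cost against pressure. This sketch ignores that.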
-
BDTF: A Blockchain-Based Data Trading Framework with Trusted Execution Environment
Authors:
Guoxiong Su,
Wenyuan Yang,
Zhengding Luo,
Yinghong Zhang,
Zhiqiang Bai,
Yuesheng Zhu
Abstract:
The need for data trading has driven the emergence of data markets. However, in conventional data markets, both data buyers and data sellers must use a centralized trading platform, which might be dishonest. A dishonest centralized platform may steal and resell the seller's data, or may refuse to send the data after receiving payment from the buyer. This seriously undermines fair data transactions and harms the interests of both parties. To address this issue, we propose a novel blockchain-based data trading framework with a Trusted Execution Environment (TEE) that provides a trusted, decentralized platform for fair data trading. In our design, a blockchain network handles payments from data buyers to data sellers, and a trusted exchange is built, for the first time, on a TEE to achieve fair data transmission. With these components, data buyers and data sellers can transact directly. We implement the proposed framework on Ethereum and Intel SGX; security analysis and experimental results demonstrate that it can effectively guarantee the fair completion of data trades.
Submitted 14 July, 2020;
originally announced July 2020.
-
AlphaGAN: Fully Differentiable Architecture Search for Generative Adversarial Networks
Authors:
Yuesong Tian,
Li Shen,
Li Shen,
Guinan Su,
Zhifeng Li,
Wei Liu
Abstract:
Generative Adversarial Networks (GANs) are formulated as minimax game problems, whereby generators attempt to approach real data distributions through adversarial learning against discriminators. The intrinsic complexity of the problem makes improving the performance of generative networks challenging. In this work, we aim to boost model learning from the perspective of network architectures by incorporating recent progress on automated architecture search into GANs. To this end, we propose a fully differentiable search framework for generative adversarial networks, dubbed alphaGAN. The search process is formalized as a bi-level minimax optimization problem, in which the outer-level objective seeks a network architecture that approaches a pure Nash equilibrium, conditioned on the generator and discriminator network parameters optimized with a traditional GAN loss at the inner level. The entire optimization uses a first-order method that alternately minimizes the two-level objective in a fully differentiable manner, enabling architecture search to be completed in an enormous search space. Extensive experiments on the CIFAR-10 and STL-10 datasets show that our algorithm can obtain high-performing architectures with only 3 GPU-hours on a single GPU, in a search space comprising approximately $2 \times 10^{11}$ possible configurations. We also provide a comprehensive analysis of the behavior of the search process and the properties of the searched architectures, which should benefit further research on architectures for generative models. Pretrained models and code are available at https://github.com/yuesongtian/AlphaGAN.
Submitted 7 August, 2021; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Adaptive Dithering Using Curved Markov-Gaussian Noise in the Quantized Domain for Mapping SDR to HDR Image
Authors:
Subhayan Mukherjee,
Guan-Ming Su,
Irene Cheng
Abstract:
High Dynamic Range (HDR) imaging is gaining increased attention due to its realistic content, not only on regular displays but also on smartphones. Until sufficient HDR content is distributed, HDR visualization still relies mostly on converting Standard Dynamic Range (SDR) content. SDR images are often quantized, or bit-depth reduced, before SDR-to-HDR conversion, e.g. for video transmission. Quantization can easily lead to banding artefacts. In computing- and/or memory-I/O-limited environments, the traditional solution, which uses spatial neighborhood information, is not feasible. Our method consists of noise generation (offline) and noise injection (online), and operates on pixels of the quantized image. We vary the magnitude and structure of the noise pattern adaptively based on the luma of the quantized pixel and the slope of the inverse tone-mapping function. Subjective user evaluations confirm the superior performance of our technique.
Submitted 20 January, 2020;
originally announced January 2020.
-
Tangent-Space Gradient Optimization of Tensor Network for Machine Learning
Authors:
Zheng-zhi Sun,
Shi-ju Ran,
Gang Su
Abstract:
Gradient-based optimization of deep machine learning models suffers from vanishing and exploding gradients, particularly when the computational graph becomes deep. In this work, we propose tangent-space gradient optimization (TSGO) for probabilistic models to keep the gradients from vanishing or exploding. The central idea is to guarantee orthogonality between the variational parameters and the gradients. The optimization is then implemented by rotating the parameter vector toward the direction of the gradient. We explain and test TSGO in tensor network (TN) machine learning, where the TN describes the joint probability distribution as a normalized state $|\psi\rangle$ in Hilbert space. We show that the gradient can be restricted to the tangent space of the $\langle \psi | \psi \rangle = 1$ hypersphere. Instead of relying on additional adaptive methods to control the learning rate, as in deep learning, TSGO naturally determines its learning rate from the angle $\theta$ as $\eta = \tan\theta$. Our numerical results show better convergence of TSGO compared with the off-the-shelf Adam optimizer.
Submitted 10 January, 2020;
originally announced January 2020.
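The TSGO abstract makes two claims. The gradient is restricted to the tangent space of the unit hypersphere, and the learning rate is tied to a rotation angle through $\eta = \tan\theta$. The sketch below shows why these two ideas fit together. Take a step of length $\tan\theta$ along a unit tangent direction, then renormalize, and the result rotates the state by exactly $\theta$. This is a minimal illustration under stated assumptions. The state is a plain real-valued vector with no tensor-network structure, the step moves against the gradient of a loss being minimized, and the helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]


def tsgo_step(psi, grad, theta):
    """One tangent-space step on the unit hypersphere <psi|psi> = 1.

    psi   : real unit vector (stand-in for the normalized TN state)
    grad  : gradient of the loss with respect to psi
    theta : rotation angle; the effective learning rate is eta = tan(theta)
    """
    # Remove the radial component so the update is orthogonal to psi.
    radial = dot(psi, grad)
    tangent = [g - radial * p for g, p in zip(grad, psi)]
    if math.sqrt(dot(tangent, tangent)) < 1e-15:
        return psi  # already stationary on the sphere
    direction = normalize(tangent)
    eta = math.tan(theta)
    # Step against the gradient by eta, then renormalize. Because `direction`
    # is a unit vector orthogonal to psi, the result is psi rotated by theta.
    stepped = [p - eta * d for p, d in zip(psi, direction)]
    return normalize(stepped)


# Usage: maximize the fidelity (target . psi)^2 by minimizing its negative.
target = normalize([0.0, 1.0, 1.0])
psi = normalize([1.0, 0.2, -0.3])
for _ in range(50):
    overlap = dot(target, psi)
    grad = [-2.0 * overlap * t for t in target]
    psi = tsgo_step(psi, grad, theta=0.05)
print(round(dot(target, psi) ** 2, 4))
```

The step length along the sphere is fixed by $\theta$, whatever the raw gradient magnitude. This is the sense in which the tangent-space view guards against vanishing or exploding step sizes. The cost is that a fixed $\theta$ leaves the iterate oscillating within about $\theta$ of the optimum.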
-
TextNAS: A Neural Architecture Search Space tailored for Text Representation
Authors:
Yujing Wang,
Yaming Yang,
Yiren Chen,
Jing Bai,
Ce Zhang,
Guinan Su,
Xiaoyu Kou,
Yunhai Tong,
Mao Yang,
Lidong Zhou
Abstract:
Learning text representations is crucial for text classification and other language-related tasks. The literature offers a diverse set of text representation networks, and finding the optimal one is a non-trivial problem. Recently, emerging Neural Architecture Search (NAS) techniques have shown good potential to solve this problem. Nevertheless, most existing NAS work focuses on search algorithms and pays little attention to the search space. In this paper, we argue that the search space is also an important human prior for the success of NAS in different applications. We therefore propose a novel search space tailored for text representation. Through automatic search, the discovered network architecture outperforms state-of-the-art models on various public datasets for text classification and natural language inference. Furthermore, some of the design principles found in the automatically discovered network agree well with human intuition.
Submitted 23 December, 2019;
originally announced December 2019.
-
Quantum Compressed Sensing with Unsupervised Tensor-Network Machine Learning
Authors:
Shi-Ju Ran,
Zheng-Zhi Sun,
Shao-Ming Fei,
Gang Su,
Maciej Lewenstein
Abstract:
We propose tensor-network compressed sensing (TNCS), which combines the ideas of compressed sensing, tensor networks (TN), and machine learning, and permits novel and efficient quantum communication of realistic data. The strategy is to use an unsupervised TN machine learning algorithm to obtain the entangled state $|\Psi\rangle$ that describes the probability distribution of a large amount of classical information to be communicated. To transfer a specific piece of information with $|\Psi\rangle$, we propose encoding it in the separable state with the minimal distance to the measured state $|\Phi\rangle$, which is obtained by partially measuring $|\Psi\rangle$ in a designed way. To this end, we suggest a measuring protocol analogous to compressed sensing with neural-network machine learning, in which the measurements are designed to minimize the uncertainty of information from the probability distribution given by $|\Phi\rangle$. In this way, anyone who holds $|\Phi\rangle$ can reliably access the information by simply measuring $|\Phi\rangle$. We propose q-sparsity to characterize the sparsity of quantum states and the efficiency of quantum communication by TNCS. The high q-sparsity arises essentially because TN states that describe the probability distribution well obey the area law of entanglement entropy. In tests on realistic datasets (handwritten digits and fashion images), TNCS is shown to possess high efficiency and accuracy, with the security of communication guaranteed by fundamental quantum principles.
Submitted 13 October, 2019; v1 submitted 24 July, 2019;
originally announced July 2019.
-
StackVault: Protection from Untrusted Functions
Authors:
Qi Zhang,
Zehra Sura,
Ashish Kundu,
Gong Su,
Arun Iyengar,
Ling Liu
Abstract:
Data exfiltration attacks have led to huge data breaches. Recently, the Equifax attack affected 147M users, and a third-party library, Apache Struts, was alleged to be responsible for it. These attacks often exploit the fact that sensitive data are stored unencrypted in process memory and can be accessed by any function executing within the same process, including untrusted third-party library functions. This paper presents StackVault, a kernel-based system that prevents sensitive stack-based data from being accessed in an unauthorized manner by intra-process functions. Stack-based data include data on the stack as well as data pointed to by pointer variables on the stack. StackVault consists of three components: (1) a set of programming APIs that allow users to specify which data need to be protected, (2) a kernel module that uses unforgeable function identities to reliably carry out sensitive data protection, and (3) an LLVM compiler extension that enables transparent placement of stack protection operations. StackVault automatically enforces stack protection through spatial and temporal access monitoring and control over both sensitive stack data and untrusted functions. We implemented StackVault and evaluated it on a number of popular real-world applications, including gRPC. The results show that StackVault is effective and efficient, incurring at most 2.4% runtime overhead.
Submitted 8 July, 2019;
originally announced July 2019.
-
Generative Tensor Network Classification Model for Supervised Machine Learning
Authors:
Zheng-Zhi Sun,
Cheng Peng,
Ding Liu,
Shi-Ju Ran,
Gang Su
Abstract:
Tensor networks (TN) have recently triggered extensive interest in developing machine learning models in quantum many-body Hilbert space. Here we propose a generative TN classification (GTNC) approach for supervised learning. The strategy is to train a generative TN for each class of samples to construct the classifiers. Classification is performed by comparing distances in the many-body Hilbert space. Numerical experiments show that GTNC achieves impressive performance on the MNIST and Fashion-MNIST datasets. Its testing accuracy is competitive with state-of-the-art convolutional neural networks and higher than that of the naive Bayes classifier (a generative classifier) and the support vector machine. Moreover, GTNC is more efficient than existing TN models, which are in general discriminative. By investigating the distances in the many-body Hilbert space, we find that (a) the samples naturally cluster in such a space; and (b) bounding the bond dimensions of the TNs to finite values corresponds to removing redundant information in image recognition. These two characteristics make GTNC an adaptive and universal model with excellent performance.
Submitted 26 March, 2019;
originally announced March 2019.
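The GTNC abstract builds one generative state per class and classifies a sample by its distance to each class state in Hilbert space. The sketch below illustrates only that decision rule, and it makes a strong simplification. Each class state is taken to be the normalized superposition of its training samples' product states. This is a crude hypothetical stand-in for a trained generative tensor network, and no bond-dimension truncation is performed. The feature map sends each pixel $v \in [0, 1]$ to $[\cos(\pi v/2), \sin(\pi v/2)]$, and this choice is also an assumption. With this map, two product states overlap by a product of cosines, so the exponentially large vectors never need to be built.

```python
import math


def overlap(x, y):
    """<Phi(x)|Phi(y)> for the product-state feature map that sends each pixel
    v in [0, 1] to the qubit state [cos(pi*v/2), sin(pi*v/2)]."""
    # Each single-pixel inner product is cos(pi*(a - b)/2); a product state's
    # overlap is the product of its factors, so no 2**n vector is ever built.
    return math.prod(math.cos(math.pi * (a - b) / 2) for a, b in zip(x, y))


class ClassState:
    """Crude stand-in for one class's trained generative TN: the normalized
    superposition of the training samples' product states."""

    def __init__(self, samples):
        self.samples = samples
        self.norm = math.sqrt(sum(overlap(a, b) for a in samples for b in samples))

    def fidelity(self, x):
        """|<Psi_class|Phi(x)>|, between 0 and 1 since both states are normalized."""
        return abs(sum(overlap(s, x) for s in self.samples)) / self.norm


def classify(states, x):
    """Assign x to the class whose state is closest in Hilbert space
    (equivalently, has the largest fidelity with Phi(x))."""
    return max(states, key=lambda label: states[label].fidelity(x))


states = {
    "dark": ClassState([[0.1, 0.0, 0.2, 0.1], [0.0, 0.1, 0.1, 0.0]]),
    "bright": ClassState([[0.9, 1.0, 0.8, 0.9], [1.0, 0.9, 0.9, 1.0]]),
}
print(classify(states, [0.05, 0.1, 0.15, 0.0]))  # dark
```

A generative state is trained for each class on its own samples, which is the property the abstract highlights. A discriminative TN would instead learn a single model over all classes.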
-
Interpretable Two-level Boolean Rule Learning for Classification
Authors:
Guolong Su,
Dennis Wei,
Kush R. Varshney,
Dmitry M. Malioutov
Abstract:
As a contribution to interpretable machine learning research, we develop a novel optimization framework for learning accurate and sparse two-level Boolean rules. We consider rules in both conjunctive normal form (AND-of-ORs) and disjunctive normal form (OR-of-ANDs). We propose a principled objective function that trades off classification accuracy against interpretability, using Hamming loss to characterize accuracy and sparsity to characterize interpretability. We propose efficient procedures to optimize these objectives based on linear programming (LP) relaxation, block coordinate descent, and alternating minimization. Experiments show that our new algorithms provide very good trade-offs between accuracy and interpretability.
Submitted 18 June, 2016;
originally announced June 2016.
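The abstract contrasts CNF (AND-of-ORs) rules with DNF (OR-of-ANDs) rules. It also describes scoring a rule on both accuracy and sparsity. Below is a minimal sketch showing how the same clause structure is evaluated under each form and how such a score can be computed. Two simplifications apply. The score counts plain 0-1 errors instead of the Hamming loss the abstract mentions, and `lam` is a hypothetical trade-off weight. Features are assumed to be already binarized.

```python
def eval_dnf(rule, x):
    """OR-of-ANDs: true if, for some clause, every feature in it is 1."""
    return any(all(x[j] for j in clause) for clause in rule)


def eval_cnf(rule, x):
    """AND-of-ORs: true if every clause has at least one feature equal to 1."""
    return all(any(x[j] for j in clause) for clause in rule)


def objective(rule, form, X, y, lam):
    """Number of 0-1 errors plus lam times the number of features used."""
    predict = eval_dnf if form == "dnf" else eval_cnf
    errors = sum(predict(rule, x) != bool(t) for x, t in zip(X, y))
    size = sum(len(clause) for clause in rule)
    return errors + lam * size


X = [[1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]]   # binarized features
y = [1, 1, 0, 0]
rule = [[0, 1], [2]]
print(objective(rule, "dnf", X, y, lam=0.1))   # (x0 AND x1) OR x2
print(objective(rule, "cnf", X, y, lam=0.1))   # (x0 OR x1) AND x2
```

The same clause list gives very different classifiers depending on the form. On this toy data, the DNF reading makes one error and the CNF reading makes three. The sparsity term is identical in both cases.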
-
Impact Analysis of Baseband Quantizer on Coding Efficiency for HDR Video
Authors:
Chau-Wai Wong,
Guan-Ming Su,
Min Wu
Abstract:
A digitally acquired high dynamic range (HDR) video baseband signal can take 10 to 12 bits per color channel. Being able to reuse legacy 8- or 10-bit video codecs to efficiently compress HDR video is economically important. A linear or nonlinear mapping of the intensity can be applied to the baseband signal to reduce its dynamic range before the signal is sent to the codec; we refer to this range reduction step as baseband quantization. We show analytically, and verify using test sequences, that using the baseband quantizer lowers coding efficiency. Experiments show that when the baseband quantizer is strengthened by 1.6 bits, the PSNR drop at a high bitrate is up to 1.60 dB. Our results suggest that, to achieve high coding efficiency, information reduction in terms of quantization error should be introduced in the video codec rather than on the baseband signal.
Submitted 1 August, 2016; v1 submitted 9 March, 2016;
originally announced March 2016.
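To make "baseband quantization" concrete, here is a minimal sketch of a linear baseband quantizer. It requantizes 12-bit code values to fewer bits, maps them back, and measures the resulting PSNR on the 12-bit scale. It measures only the error introduced by the baseband step itself. It does not reproduce the paper's codec experiments or its 1.60 dB figure. The uniform round-to-nearest quantizer and the helper names are illustrative assumptions.

```python
import math


def baseband_quantize(samples, in_bits, out_bits):
    """Uniform (linear) baseband quantizer: requantize in_bits-deep code values
    to out_bits by rounding to the nearest level, then map back to the
    in_bits scale so the error can be measured against the source."""
    step = 2 ** (in_bits - out_bits)
    max_code = 2 ** out_bits - 1
    return [min(math.floor(s / step + 0.5), max_code) * step for s in samples]


def psnr(ref, test, bits):
    """Peak signal-to-noise ratio in dB for bits-deep signals."""
    peak = 2 ** bits - 1
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


ref = list(range(4096))  # every 12-bit code value once
for out_bits in (12, 10, 8):
    rec = baseband_quantize(ref, 12, out_bits)
    print(out_bits, round(psnr(ref, rec, 12), 2))
```

Each bit removed by the baseband quantizer costs roughly 6 dB of signal PSNR before the codec sees the signal. The paper's argument concerns how this loss interacts with the codec's own quantization, and the sketch does not model that interaction.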
-
Interpretable Two-level Boolean Rule Learning for Classification
Authors:
Guolong Su,
Dennis Wei,
Kush R. Varshney,
Dmitry M. Malioutov
Abstract:
This paper proposes algorithms for learning two-level Boolean rules in Conjunctive Normal Form (CNF, i.e. AND-of-ORs) or Disjunctive Normal Form (DNF, i.e. OR-of-ANDs) as a type of human-interpretable classification model, aiming for a favorable trade-off between the classification accuracy and the simplicity of the rule. Two formulations are proposed. The first is an integer program whose objective function is a combination of the total number of errors and the total number of features used in the rule. We generalize a previously proposed linear programming (LP) relaxation from one-level to two-level rules. The second formulation replaces the 0-1 classification error with the Hamming distance from the current two-level rule to the closest rule that correctly classifies a sample. Based on this second formulation, block coordinate descent and alternating minimization algorithms are developed. Experiments show that the two-level rules can yield noticeably better performance than one-level rules due to their dramatically larger modeling capacity, and the two algorithms based on the Hamming distance formulation are generally superior to the other two-level rule learning methods in our comparison. A proposed approach to binarize any fractional values in the optimal solutions of LP relaxations is also shown to be effective.
Submitted 23 November, 2015;
originally announced November 2015.
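The second formulation above replaces the 0-1 error with the Hamming distance from the current rule to the closest rule that classifies a sample correctly. Below is a minimal sketch of that distance for a DNF rule, based on our reading of the abstract and offered as an assumption. A misclassified positive sample needs one clause made true, which costs the number of that clause's features the sample violates. A misclassified negative sample needs every satisfied clause broken by adding one feature the sample violates. The paper's exact definition, including how it handles the infeasible case, may differ. The function names and `lam` are hypothetical.

```python
import math


def dnf_hamming_loss(rule, x, label):
    """Number of rule entries that must change for the DNF `rule` to classify
    the binary sample `x` correctly (0 if it already does).

    rule : non-empty list of clauses, each a set of feature indices
           (AND within a clause, OR across clauses)
    """
    satisfied = [all(x[j] for j in clause) for clause in rule]
    if label == 1:
        if any(satisfied):
            return 0
        # Cheapest fix: make one clause true by deleting the features x violates.
        return min(sum(1 for j in clause if not x[j]) for clause in rule)
    n_true = sum(satisfied)
    if n_true == 0:
        return 0
    if all(x):
        return math.inf  # no zero-valued feature exists to falsify a clause
    # Each satisfied clause is falsified by adding one feature that x violates.
    return n_true


def total_loss(rule, X, y, lam):
    """Hamming-distance data loss plus lam times the number of features used."""
    data = sum(dnf_hamming_loss(rule, x, t) for x, t in zip(X, y))
    return data + lam * sum(len(c) for c in rule)


rule = [{0, 1}, {2}]      # (x0 AND x1) OR x2
print(dnf_hamming_loss(rule, [1, 0, 0], 1))   # 1: drop x1 from the first clause
print(dnf_hamming_loss(rule, [0, 1, 1], 0))   # 1: the clause {2} must be broken
```

Unlike the 0-1 error, this loss grades how wrong a rule is on each sample. That graded signal is what makes coordinate-wise updates such as block coordinate descent informative.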
-
Asymptotic Bounds for Quantitative Verification of Perturbed Probabilistic Systems
Authors:
Guoxin Su,
David S. Rosenblum
Abstract:
The majority of existing probabilistic model checking case studies are based on well-understood theoretical models and distributions. However, real-life probabilistic systems usually involve distribution parameters whose values are obtained by empirical measurements and are thus subject to small perturbations. In this paper, we consider perturbation analysis of reachability in the parametric models of these systems (i.e., parametric Markov chains), equipped with the norm of absolute distance. Our main contribution is a method to compute asymptotic bounds, in the form of condition numbers, for constrained reachability probabilities against perturbations of the system's distribution parameters. The adequacy of the method is demonstrated through experiments with the Zeroconf protocol and the hopping frog problem.
Submitted 27 August, 2013; v1 submitted 29 April, 2013;
originally announced April 2013.
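The abstract's condition numbers bound how much a reachability probability can move when a chain's parameters are perturbed slightly. The sketch below is a hedged illustration only. It computes an unconstrained reachability probability in a toy parametric Markov chain, then estimates the probability's first-order sensitivity to one parameter using central finite differences. This numerical stand-in is not the paper's method. The retry chain is a hypothetical example, not the Zeroconf model.

```python
def reach_prob(P, goal, start, tol=1e-13, max_iter=100_000):
    """Probability of eventually reaching `goal` from `start` in a discrete-time
    Markov chain with row-stochastic matrix P, by fixed-point iteration
    x = P x with x[goal] = 1 (converges upward to the least fixed point)."""
    n = len(P)
    x = [1.0 if s == goal else 0.0 for s in range(n)]
    for _ in range(max_iter):
        new = [1.0 if s == goal else sum(P[s][t] * x[t] for t in range(n))
               for s in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new[start]
        x = new
    return x[start]


def retry_chain(p, q):
    """Hypothetical parametric chain: state 0 succeeds (state 1) with prob p,
    fails (state 2) with prob q, and retries otherwise; 1 and 2 are absorbing.
    Perturbing p is absorbed by the retry probability, keeping rows stochastic."""
    return [[1.0 - p - q, p, q],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]


def sensitivity(f, p, h=1e-6):
    """Central-difference estimate of |df/dp|: the first-order factor by which
    a small absolute perturbation of p is amplified in f(p)."""
    return abs(f(p + h) - f(p - h)) / (2 * h)


def reach(p):
    return reach_prob(retry_chain(p, 0.1), goal=1, start=0)


print(reach(0.3))               # analytically p / (p + q) = 0.75
print(sensitivity(reach, 0.3))  # analytically q / (p + q)**2 = 0.625
```

Finite differences only probe one parameter at one point. An analytic condition number, like the one the paper computes, bounds the worst case over all small perturbations without sampling them.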
-
Session Communication and Integration
Authors:
Guoxin Su,
Mingsheng Ying,
Chengqi Zhang
Abstract:
The scenario-based specification of a large distributed system is usually naturally decomposed into various modules. The integration of specification modules contrasts with the parallel composition of program components and can take various forms, such as scenario concatenation, choice, and nesting. The recent development of multiparty session types for process calculi provides useful techniques to accommodate protocol modularisation, by encoding fragments of communication protocols in the usage of private channels for a class of agents. In this paper, we extend foregoing session type theories by enhancing the session integration mechanism. More specifically, we propose a novel synchronous multiparty session type theory, in which sessions are separated into communicating and integrating levels. Communicating sessions record the message-based communications between multiple agents, whilst integrating sessions describe the integration of communicating ones. A two-level session type system is developed for the pi-calculus with syntactic primitives for session establishment, and several key properties of the type system are studied. Applying the theory to system description, we show that a channel safety property and a session conformance property can be analysed. Also, to improve the utility of the theory, a process slicing method is used to help identify the violated sessions during type checking.
Submitted 7 October, 2012;
originally announced October 2012.
-
Performance Analysis of l_0 Norm Constraint Least Mean Square Algorithm
Authors:
Guolong Su,
Jian Jin,
Yuantao Gu,
Jian Wang
Abstract:
As one of the recently proposed algorithms for sparse system identification, the $l_0$ norm constraint Least Mean Square ($l_0$-LMS) algorithm modifies the cost function of the traditional method with a penalty on tap-weight sparsity. The performance of $l_0$-LMS is quite attractive compared with its various precursors. However, there has been no detailed study of its performance. This paper presents a comprehensive and thorough theoretical performance analysis of $l_0$-LMS for white Gaussian input data, based on some reasonable assumptions. Expressions for the steady-state mean square deviation (MSD) are derived and discussed with respect to the algorithm parameters and system sparsity. A parameter selection rule is established for achieving the best performance. Using a Taylor series approximation, the instantaneous behavior is also derived. In addition, the relationship between $l_0$-LMS and some prior work is established, together with sufficient conditions for $l_0$-LMS to accelerate convergence. Finally, all of the theoretical results are compared with simulations and are shown to agree well over a wide range of parameter settings.
Submitted 9 March, 2013; v1 submitted 7 March, 2012;
originally announced March 2012.
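This sketch shows how the sparsity penalty in $l_0$-LMS enters the weight update. It assumes the form in which the recursion is commonly written: the standard LMS step plus a "zero attractor" $\kappa f_\beta(w)$. Here $f_\beta(w) = \beta^2 w - \beta\,\mathrm{sgn}(w)$ for $|w| \le 1/\beta$ and $0$ otherwise. This attractor is the first-order (Taylor) approximation of the negative gradient of the surrogate $1 - e^{-\beta|w|}$. The parameter values and the toy sparse system are illustrative assumptions, and the sketch does not reproduce the paper's analysis.

```python
import random


def zero_attractor(w, beta):
    """First-order approximation of -d/dw of the l0 surrogate 1 - exp(-beta*|w|):
    pulls taps with |w| <= 1/beta toward zero and leaves larger taps alone."""
    if w == 0.0 or abs(w) > 1.0 / beta:
        return 0.0
    return beta * beta * w - beta * (1.0 if w > 0 else -1.0)


def l0_lms(x, d, n_taps, mu, kappa, beta):
    """Adapt an FIR estimate w with the l0-LMS recursion
    w <- w + mu * e * u + kappa * zero_attractor(w), where u is the tap vector."""
    w = [0.0] * n_taps
    u = [0.0] * n_taps  # tap-delay line, newest input first
    for xn, dn in zip(x, d):
        u = [xn] + u[:-1]
        e = dn - sum(wi * ui for wi, ui in zip(w, u))   # a priori error
        w = [wi + mu * e * ui + kappa * zero_attractor(wi, beta)
             for wi, ui in zip(w, u)]
    return w


# Usage: identify a sparse 16-tap system from white Gaussian input.
random.seed(1)
h = [0.0] * 16
h[2], h[9] = 0.8, -0.5            # hypothetical sparse "unknown" system
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d, buf = [], [0.0] * 16
for xn in x:
    buf = [xn] + buf[:-1]
    d.append(sum(hi * bi for hi, bi in zip(h, buf)) + random.gauss(0.0, 0.01))
w = l0_lms(x, d, n_taps=16, mu=0.01, kappa=1e-5, beta=10.0)
print([round(wi, 2) for wi in w])
```

Setting `kappa=0` recovers plain LMS. Larger `kappa` pulls the zero taps harder toward zero, which is where the MSD and parameter-selection analysis in the abstract comes in.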
-
Condensation phase transition in nonlinear fitness networks
Authors:
Guifeng Su,
Xiaobing Zhang,
Yi Zhang
Abstract:
We analyze condensation phase transitions in out-of-equilibrium complex networks in a unifying framework that includes the nonlinear model and the fitness model as appropriate limits. We show a novel phase structure that depends on both the fitness parameter and the nonlinear exponent. The occurrence of condensation phase transitions in the dynamical evolution of the network is demonstrated using the Bianconi-Barabási method. We find that the nonlinear and fitness preferential attachment mechanisms play important roles in the formation of an interesting phase structure.
Submitted 3 December, 2012; v1 submitted 16 March, 2011;
originally announced March 2011.
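Here is a small simulation of a growth model that combines the two mechanisms named in the abstract. It assumes an attachment kernel of the product form $\eta_i k_i^\alpha$. At $\alpha = 1$ this reduces to a fitness model, and with constant fitness it reduces to a nonlinear model. That kernel form is an assumption about the unified framework and not necessarily the paper's exact model. The one-edge-per-node growth rule and the uniform fitness distribution are further assumptions. `top_share` is only a rough indicator of condensation, not the Bianconi-Barabási mapping the paper uses.

```python
import random


def grow_network(n_nodes, alpha, fitness, seed=0):
    """Grow a network one node at a time; each new node attaches a single edge
    to an existing node i chosen with probability proportional to
    eta_i * k_i**alpha (fitness times nonlinear degree)."""
    rng = random.Random(seed)
    degree = [1, 1]                    # start from one edge between nodes 0 and 1
    eta = [fitness(rng), fitness(rng)]
    for new in range(2, n_nodes):
        weights = [e * k ** alpha for e, k in zip(eta, degree)]
        target = rng.choices(range(new), weights=weights)[0]
        degree[target] += 1
        degree.append(1)
        eta.append(fitness(rng))
    return degree


def top_share(degree):
    """Fraction of all edge endpoints held by the best-connected node."""
    return max(degree) / sum(degree)


def uniform_fitness(rng):
    return 1.0 - rng.random()   # fitness drawn uniformly from (0, 1]


for alpha in (0.5, 1.0, 2.0):
    print(alpha, round(top_share(grow_network(1000, alpha, uniform_fitness)), 3))
```

For superlinear exponents a single node tends to capture a finite fraction of all links, which is a condensate-like state. For sublinear exponents the largest share stays small, so sweeping `alpha` and the fitness distribution gives a rough empirical view of the phase structure.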