-
Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization
Authors:
Ming-Yang Ho,
Che-Ming Wu,
Min-Sheng Wu,
Yufeng Jane Tseng
Abstract:
Recent advancements in ultra-high-resolution unpaired image-to-image translation have aimed to mitigate the constraints imposed by limited GPU memory through patch-wise inference. Nonetheless, existing methods often compromise between the reduction of noticeable tiling artifacts and the preservation of color and hue contrast, attributed to the reliance on global image- or patch-level statistics in the instance normalization layers. In this study, we introduce a Dense Normalization (DN) layer designed to estimate pixel-level statistical moments. This approach effectively diminishes tiling artifacts while concurrently preserving local color and hue contrasts. To address the computational demands of pixel-level estimation, we further propose an efficient interpolation algorithm. Moreover, we invent a parallelism strategy that enables the DN layer to operate in a single pass. Through extensive experiments, we demonstrate that our method surpasses all existing approaches in performance. Notably, our DN layer is hyperparameter-free and can be seamlessly integrated into most unpaired image-to-image translation frameworks without necessitating retraining. Overall, our work paves the way for future exploration in handling images of arbitrary resolutions within the realm of unpaired image-to-image translation. Code is available at: https://github.com/Kaminyou/Dense-Normalization.
Submitted 5 July, 2024;
originally announced July 2024.
-
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
Authors:
Haibin Wu,
Yuan Tseng,
Hung-yi Lee
Abstract:
Current state-of-the-art (SOTA) codec-based audio synthesis systems can mimic anyone's voice with just a 3-second sample from that specific unseen speaker. Unfortunately, malicious attackers may exploit these technologies, causing misuse and security issues. Anti-spoofing models have been developed to detect fake speech. However, the open question of whether current SOTA anti-spoofing models can effectively counter deepfake audios from codec-based speech synthesis systems remains unanswered. In this paper, we curate an extensive collection of contemporary SOTA codec models, employing them to re-create synthesized speech. This endeavor leads to the creation of CodecFake, the first codec-based deepfake audio dataset. Additionally, we verify that anti-spoofing models trained on commonly used datasets cannot detect synthesized speech from current codec-based speech generation systems. The proposed CodecFake dataset empowers these models to counter this challenge effectively.
Submitted 11 June, 2024;
originally announced June 2024.
-
A DeNoising FPN With Transformer R-CNN for Tiny Object Detection
Authors:
Hou-I Liu,
Yu-Wen Tseng,
Kai-Cheng Chang,
Pin-Jyun Wang,
Hong-Han Shuai,
Wen-Huang Cheng
Abstract:
Despite notable advancements in the field of computer vision, the precise detection of tiny objects continues to pose a significant challenge, largely owing to the minuscule pixel representation allocated to these objects in imagery data. This challenge resonates profoundly in the domain of geoscience and remote sensing, where high-fidelity detection of tiny objects can facilitate a myriad of applications ranging from urban planning to environmental monitoring. In this paper, we propose a new framework, namely, DeNoising FPN with Trans R-CNN (DNTR), to improve the performance of tiny object detection. DNTR consists of an easy plug-in design, DeNoising FPN (DN-FPN), and an effective Transformer-based detector, Trans R-CNN. Specifically, feature fusion in the feature pyramid network is important for detecting multiscale objects. However, noisy features may be produced during the fusion process since there is no regularization between the features of different scales. Therefore, we introduce a DN-FPN module that utilizes contrastive learning to suppress noise in each level's features in the top-down path of FPN. Second, based on the two-stage framework, we replace the obsolete R-CNN detector with a novel Trans R-CNN detector to focus on the representation of tiny objects with self-attention. Experimental results manifest that our DNTR outperforms the baselines by at least 17.4% in terms of APvt on the AI-TOD dataset and 9.6% in terms of AP on the VisDrone dataset, respectively. Our code will be available at https://github.com/hoiliu-0801/DNTR.
Submitted 15 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
MP-PolarMask: A Faster and Finer Instance Segmentation for Concave Images
Authors:
Ke-Lei Wang,
Pin-Hsuan Chou,
Young-Ching Chou,
Chia-Jen Liu,
Cheng-Kuan Lin,
Yu-Chee Tseng
Abstract:
While there are a lot of models for instance segmentation, PolarMask stands out as a unique one that represents an object by a Polar coordinate system. With an anchor-box-free design and a single-stage framework that conducts detection and segmentation at one time, PolarMask is proved to be able to balance efficiency and accuracy. Hence, it can be easily connected with other downstream real-time applications. In this work, we observe that there are two deficiencies associated with PolarMask: (i) inability of representing concave objects and (ii) inefficiency in using ray regression. We propose MP-PolarMask (Multi-Point PolarMask) by taking advantage of multiple Polar systems. The main idea is to extend from one main Polar system to four auxiliary Polar systems, thus capable of representing more complicated convex-and-concave-mixed shapes. We validate MP-PolarMask on both general objects and food objects of the COCO dataset, and the results demonstrate significant improvement of 13.69% in AP_L and 7.23% in AP over PolarMask with 36 rays.
Submitted 3 June, 2024;
originally announced June 2024.
-
Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
Authors:
Yu-Min Tseng,
Yu-Chao Huang,
Teng-Yun Hsiao,
Wei-Lin Chen,
Chao-Wei Huang,
Yu Meng,
Yun-Nung Chen
Abstract:
The concept of persona, originally adopted in dialogue literature, has re-surged as a promising framework for tailoring large language models (LLMs) to specific context (e.g., personalized search, LLM-as-a-judge). However, the growing research on leveraging persona in LLMs is relatively disorganized and lacks a systematic taxonomy. To close the gap, we present a comprehensive survey to categorize the current state of the field. We identify two lines of research, namely (1) LLM Role-Playing, where personas are assigned to LLMs, and (2) LLM Personalization, where LLMs take care of user personas. Additionally, we introduce existing methods for LLM personality evaluation. To the best of our knowledge, we present the first survey for role-playing and personalization in LLMs under the unified view of persona. We continuously maintain a paper collection to foster future endeavors: https://github.com/MiuLab/PersonaLLM-Survey
Submitted 26 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Word-specific tonal realizations in Mandarin
Authors:
Yu-Ying Chuang,
Melanie J. Bell,
Yu-Hsiang Tseng,
R. Harald Baayen
Abstract:
The pitch contours of Mandarin two-character words are generally understood as being shaped by the underlying tones of the constituent single-character words, in interaction with articulatory constraints imposed by factors such as speech rate, co-articulation with adjacent tones, segmental make-up, and predictability. This study shows that tonal realization is also partially determined by words' meanings. We first show, on the basis of a Taiwan corpus of spontaneous conversations, using the generalized additive regression model, and focusing on the rise-fall tone pattern, that after controlling for effects of speaker and context, word type is a stronger predictor of pitch realization than all the previously established word-form related predictors combined. Importantly, the addition of information about meaning in context improves prediction accuracy even further. We then proceed to show, using computational modeling with context-specific word embeddings, that token-specific pitch contours predict word type with 50% accuracy on held-out data, and that context-sensitive, token-specific embeddings can predict the shape of pitch contours with 30% accuracy. These accuracies, which are an order of magnitude above chance level, suggest that the relation between words' pitch contours and their meanings is sufficiently strong to be functional for language users. The theoretical implications of these empirical findings are discussed.
Submitted 11 May, 2024;
originally announced May 2024.
-
EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Authors:
Hongxia Xie,
Chu-Jun Peng,
Yu-Wen Tseng,
Hung-Jen Chen,
Chan-Feng Hsu,
Hong-Han Shuai,
Wen-Huang Cheng
Abstract:
Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions. This paradigm shows promising zero-shot results in various natural language processing tasks but is still unexplored in vision emotion understanding. In this work, we focus on enhancing the model's proficiency in understanding and adhering to instructions related to emotional contexts. Initially, we identify key visual clues critical to visual emotion recognition. Subsequently, we introduce a novel GPT-assisted pipeline for generating emotion visual instruction data, effectively addressing the scarcity of annotated instruction data in this domain. Expanding on the groundwork established by InstructBLIP, our proposed EmoVIT architecture incorporates emotion-specific instruction data, leveraging the powerful capabilities of Large Language Models to enhance performance. Through extensive experiments, our model showcases its proficiency in emotion classification, adeptness in affective reasoning, and competence in comprehending humor. The comparative analysis provides a robust benchmark for Emotion Visual Instruction Tuning in the era of LLMs, providing valuable insights and opening avenues for future exploration in this domain. Our code is available at https://github.com/aimmemotion/EmoVIT.
Submitted 25 April, 2024;
originally announced April 2024.
-
Help Supporters: Exploring the Design Space of Assistive Technologies to Support Face-to-Face Help Between Blind and Sighted Strangers
Authors:
Yuanyang Teng,
Connor Courtien,
David Angel Rios,
Yves M. Tseng,
Jacqueline Gibson,
Maryam Aziz,
Avery Reyna,
Rajan Vaish,
Brian A. Smith
Abstract:
Blind and low-vision (BLV) people face many challenges when venturing into public environments, often wishing it were easier to get help from people nearby. Ironically, while many sighted individuals are willing to help, such interactions are infrequent. Asking for help is socially awkward for BLV people, and sighted people lack experience in helping BLV people. Through a mixed-ability research-through-design process, we explore four diverse approaches toward how assistive technology can serve as help supporters that collaborate with both BLV and sighted parties throughout the help process. These approaches span two phases: the connection phase (finding someone to help) and the collaboration phase (facilitating help after finding someone). Our findings from a 20-participant mixed-ability study reveal how help supporters can best facilitate connection, which types of information they should present during both phases, and more. We discuss design implications for future approaches to support face-to-face help.
Submitted 12 March, 2024;
originally announced March 2024.
-
Large Language Multimodal Models for 5-Year Chronic Disease Cohort Prediction Using EHR Data
Authors:
Jun-En Ding,
Phan Nguyen Minh Thao,
Wen-Chih Peng,
Jian-Zhe Wang,
Chun-Cheng Chug,
Min-Chen Hsieh,
Yun-Chien Tseng,
Ling Chen,
Dongsheng Luo,
Chi-Te Wang,
Pei-fu Chen,
Feng Liu,
Fang-Ming Hung
Abstract:
Chronic diseases such as diabetes are the leading causes of morbidity and mortality worldwide. Numerous research studies have been attempted with various deep learning models in diagnosis. However, most previous studies had certain limitations, including using publicly available datasets (e.g. MIMIC), and imbalanced data. In this study, we collected five-year electronic health records (EHRs) from the Taiwan hospital database, including 1,420,596 clinical notes, 387,392 laboratory test results, and more than 1,505 laboratory test items, focusing on research pre-training large language models. We proposed a novel Large Language Multimodal Models (LLMMs) framework incorporating multimodal data from clinical notes and laboratory test results for the prediction of chronic disease risk. Our method combined a text embedding encoder and multi-head attention layer to learn laboratory test values, utilizing a deep neural network (DNN) module to merge blood features with chronic disease semantics into a latent space. In our experiments, we observe that clinicalBERT and PubMed-BERT, when combined with attention fusion, can achieve an accuracy of 73% in multiclass chronic diseases and diabetes prediction. By transforming laboratory test values into textual descriptions and employing the Flan T-5 model, we achieved a 76% Area Under the ROC Curve (AUROC), demonstrating the effectiveness of leveraging numerical text data for training and inference in language models. This approach significantly improves the accuracy of early-stage diabetes prediction.
Submitted 2 March, 2024;
originally announced March 2024.
-
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
Authors:
Liang-Hsuan Tseng,
En-Pei Hu,
Cheng-Han Chiang,
Yuan Tseng,
Hung-yi Lee,
Lin-shan Lee,
Shao-Hua Sun
Abstract:
Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text challenging, especially without paired data. In this paper, we propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. REBORN alternates between (1) training a segmentation model that predicts the boundaries of the segmental structures in speech signals and (2) training the phoneme prediction model, whose input is the speech feature segmented by the segmentation model, to predict a phoneme transcription. Since supervised data for training the segmentation model is not available, we use reinforcement learning to train the segmentation model to favor segmentations that yield phoneme sequence predictions with a lower perplexity. We conduct extensive experiments and find that under the same setting, REBORN outperforms all prior unsupervised ASR models on LibriSpeech, TIMIT, and five non-English languages in Multilingual LibriSpeech. We comprehensively analyze why the boundaries learned by REBORN improve the unsupervised ASR performance.
Submitted 28 May, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Resolving Regular Polysemy in Named Entities
Authors:
Shu-Kai Hsieh,
Yu-Hsiang Tseng,
Hsin-Yu Chou,
Ching-Wen Yang,
Yu-Yun Chang
Abstract:
Word sense disambiguation primarily addresses the lexical ambiguity of common words based on a predefined sense inventory. Conversely, proper names are usually considered to denote an ad-hoc real-world referent. Once the reference is decided, the ambiguity is purportedly resolved. However, proper names also exhibit ambiguities through appellativization, i.e., they act like common words and may denote different aspects of their referents. We proposed to address the ambiguities of proper names through the light of regular polysemy, which we formalized as dot objects. This paper introduces a combined word sense disambiguation (WSD) model for disambiguating common words against Chinese Wordnet (CWN) and proper names as dot objects. The model leverages the flexibility of a gloss-based model architecture, which takes advantage of the glosses and example sentences of CWN. We show that the model achieves competitive results on both common and proper nouns, even on a relatively sparse sense dataset. Aside from being a performant WSD tool, the model further facilitates the future development of the lexical resource.
Submitted 18 January, 2024;
originally announced January 2024.
-
Scale-Aware Crowd Count Network with Annotation Error Correction
Authors:
Yi-Kuan Hsieh,
Jun-Wei Hsieh,
Yu-Chee Tseng,
Ming-Ching Chang,
Li Xin
Abstract:
Traditional crowd counting networks suffer from information loss when feature maps are downsized through pooling layers, leading to inaccuracies in counting crowds at a distance. Existing methods often assume correct annotations during training, disregarding the impact of noisy annotations, especially in crowded scenes. Furthermore, the use of a fixed Gaussian kernel fails to account for the varying pixel distribution with respect to the camera distance. To overcome these challenges, we propose a Scale-Aware Crowd Counting Network (SACC-Net) that introduces a "scale-aware" architecture with error-correcting capabilities of noisy annotations. For the first time, we simultaneously model labeling errors (mean) and scale variations (variance) by spatially-varying Gaussian distributions to produce fine-grained heat maps for crowd counting. Furthermore, the proposed adaptive Gaussian kernel variance enables the model to learn dynamically with a low-rank approximation, leading to improved convergence efficiency with comparable accuracy. The performance of SACC-Net is extensively evaluated on four public datasets: UCF-QNRF, UCF CC 50, NWPU, and ShanghaiTech A-B. Experimental results demonstrate that SACC-Net outperforms all state-of-the-art methods, validating its effectiveness in achieving superior crowd counting accuracy.
Submitted 27 December, 2023;
originally announced December 2023.
-
PointNeRF++: A multi-scale, point-based Neural Radiance Field
Authors:
Weiwei Sun,
Eduard Trulls,
Yang-Che Tseng,
Sneha Sambandam,
Gopal Sharma,
Andrea Tagliasacchi,
Kwang Moo Yi
Abstract:
Point clouds offer an attractive source of information to complement images in neural scene representations, especially when few images are available. Neural rendering methods based on point clouds do exist, but they do not perform well when the point cloud quality is low -- e.g., sparse or incomplete, which is often the case with real-world data. We overcome these problems with a simple representation that aggregates point clouds at multiple scale levels with sparse voxel grids at different resolutions. To deal with point cloud sparsity, we average across multiple scale levels -- but only among those that are valid, i.e., that have enough neighboring points in proximity to the ray of a pixel. To help model areas without points, we add a global voxel at the coarsest scale, thus unifying "classical" and point-based NeRF formulations. We validate our method on the NeRF Synthetic, ScanNet, and KITTI-360 datasets, outperforming the state of the art, with a significant gap compared to other NeRF-based methods, especially on more challenging scenes.
Submitted 21 March, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Federated Learning for Sparse Principal Component Analysis
Authors:
Sin Cheng Ciou,
Pin Jui Chen,
Elvin Y. Tseng,
Yuh-Jye Lee
Abstract:
In the rapidly evolving realm of machine learning, algorithm effectiveness often faces limitations due to data quality and availability. Traditional approaches grapple with data sharing due to legal and privacy concerns. The federated learning framework addresses this challenge. Federated learning is a decentralized approach where model training occurs on client sides, preserving privacy by keeping data localized. Instead of sending raw data to a central server, only model updates are exchanged, enhancing data security. We apply this framework to Sparse Principal Component Analysis (SPCA) in this work. SPCA aims to attain sparse component loadings while maximizing data variance for improved interpretability. Beside the L1 norm regularization term in conventional SPCA, we add a smoothing function to facilitate gradient-based optimization methods. Moreover, in order to improve computational efficiency, we introduce a least squares approximation to original SPCA. This enables analytic solutions on the optimization processes, leading to substantial computational improvements. Within the federated framework, we formulate SPCA as a consensus optimization problem, which can be solved using the Alternating Direction Method of Multipliers (ADMM). Our extensive experiments involve both IID and non-IID random features across various data owners. Results on synthetic and public datasets affirm the efficacy of our federated SPCA approach.
Submitted 14 November, 2023;
originally announced November 2023.
-
ActiveAI: Introducing AI Literacy for Middle School Learners with Goal-based Scenario Learning
Authors:
Ying Jui Tseng,
Gautam Yadav
Abstract:
The ActiveAI project addresses key challenges in AI education for grades 7-9 students by providing an engaging AI literacy learning experience based on the AI4K12 knowledge framework. Utilizing learning science mechanisms such as goal-based scenarios, immediate feedback, project-based learning, and intelligent agents, the app incorporates a variety of learner inputs like sliders, steppers, and collectors to enhance understanding. In these courses, students work on real-world scenarios like analyzing sentiment in social media comments. This helps them learn to effectively engage with AI systems and develop their ability to evaluate AI-generated output. The Learning Engineering Process (LEP) guided the project's creation and data instrumentation, focusing on design and impact. The project is currently in the implementation stage, leveraging the intelligent tutor design principles for app development. The extended abstract presents the foundational design and development, with further evaluation and research to be conducted in the future.
Submitted 21 August, 2023;
originally announced September 2023.
-
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Authors:
Yuan Tseng,
Layne Berry,
Yi-Ting Chen,
I-Hsiang Chiu,
Hsuan-Hao Lin,
Max Liu,
Puyuan Peng,
Yi-Jen Shih,
Hung-Yu Wang,
Haibin Wu,
Po-Yao Huang,
Chun-Mao Lai,
Shang-Wen Li,
David Harwath,
Yu Tsao,
Shinji Watanabe,
Abdelrahman Mohamed,
Chi-Luen Feng,
Hung-yi Lee
Abstract:
Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information. However, current models often focus on a limited set of tasks, and generalization abilities of learned representations are unclear. To this end, we propose the AV-SUPERB benchmark that enables general-purpose evaluation of unimodal audio/visual and bimodal fusion representations on 7 datasets covering 5 audio-visual tasks in speech and audio processing. We evaluate 5 recent self-supervised models and show that none of these models generalize to all tasks, emphasizing the need for future study on improving universal model performance. In addition, we show that representations may be improved with intermediate-task fine-tuning and audio event classification with AudioSet serves as a strong intermediate task. We release our benchmark with evaluation code and a model submission platform to encourage further research in audio-visual learning.
Submitted 19 March, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
-
Tracking Players in a Badminton Court by Two Cameras
Authors:
Young-Ching Chou,
Shen-Ru Zhang,
Bo-Wei Chen,
Hong-Qi Chen,
Cheng-Kuan Lin,
Yu-Chee Tseng
Abstract:
This study proposes a simple method for multi-object tracking (MOT) of players in a badminton court. We leverage two off-the-shelf cameras, one on the top of the court and the other on the side of the court. The one on the top is to track players' trajectories, while the one on the side is to analyze the pixel features of players. By computing the correlations between adjacent frames and engaging the information of the two cameras, MOT of badminton players is obtained. This two-camera approach addresses the challenge of player occlusion and overlapping in a badminton court, providing player trajectory tracking and multi-angle analysis. The presented system offers insights into the positions and movements of badminton players, thus serving as a coaching or self-training tool for badminton players to improve their gaming strategies.
Submitted 9 August, 2023;
originally announced August 2023.
-
LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
Authors:
Tongshuang Wu,
Haiyi Zhu,
Maya Albayrak,
Alexis Axon,
Amanda Bertsch,
Wenxing Deng,
Ziqi Ding,
Bill Guo,
Sireesh Gururaja,
Tzu-Sheng Kuo,
Jenny T. Liang,
Ryan Liu,
Ihita Mandal,
Jeremiah Milbauer,
Xiaolin Ni,
Namrata Padmanabhan,
Subhashini Ramkumar,
Alexis Sudjianto,
Jordan Taylor,
Ying-Jui Tseng,
Patricia Vaidos,
Zhijin Wu,
Wei Wu,
Chenyang Yang
Abstract:
LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these "human computation algorithms," but the level of success is variable and influenced by requesters' understanding of LLM capabilities, the specific skills required for sub-tasks, and the optimal interaction modality for performing these sub-tasks. We reflect on human and LLMs' different sensitivities to instructions, stress the importance of enabling human-facing safeguards for LLMs, and discuss the potential of training humans and LLMs with complementary skill sets. Crucially, we show that replicating crowdsourcing pipelines offers a valuable platform to investigate (1) the relative strengths of LLMs on different tasks (by cross-comparing their performances on sub-tasks) and (2) LLMs' potential in complex tasks, where they can complete part of the tasks while leaving others to humans.
Submitted 19 July, 2023; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Contextualizing Problems to Student Interests at Scale in Intelligent Tutoring System Using Large Language Models
Authors:
Gautam Yadav,
Ying-Jui Tseng,
Xiaolin Ni
Abstract:
Contextualizing problems to align with student interests can significantly improve learning outcomes. However, this task often presents scalability challenges due to resource and time constraints. Recent advancements in Large Language Models (LLMs) like GPT-4 offer potential solutions to these issues. This study explores the ability of GPT-4 in the contextualization of problems within CTAT, an intelligent tutoring system, aiming to increase student engagement and enhance learning outcomes. Through iterative prompt engineering, we achieved meaningful contextualization that preserved the difficulty and original intent of the problem, thereby not altering values or overcomplicating the questions. While our research highlights the potential of LLMs in educational settings, we acknowledge current limitations, particularly with geometry problems, and emphasize the need for ongoing evaluation and research. Future work includes systematic studies to measure the impact of this tool on students' learning outcomes and enhancements to handle a broader range of problems.
Submitted 31 May, 2023;
originally announced June 2023.
-
Vec2Gloss: definition modeling leveraging contextualized vectors with Wordnet gloss
Authors:
Yu-Hsiang Tseng,
Mao-Chang Ku,
Wei-Ling Chen,
Yu-Lin Chang,
Shu-Kai Hsieh
Abstract:
Contextualized embeddings have proven to be powerful tools in multiple NLP tasks. Nonetheless, challenges regarding their interpretability and capability to represent lexical semantics still remain. In this paper, we propose that the task of definition modeling, which aims to generate a human-readable definition of a word, provides a route to evaluate or understand the high-dimensional semantic vectors. We propose a 'Vec2Gloss' model, which produces the gloss from the target word's contextualized embeddings. The generated glosses of this study are made possible by the systematic gloss patterns provided by Chinese Wordnet. We devise two dependency indices to measure the semantic and contextual dependency, which are used to analyze the generated texts at the gloss and token levels. Our results indicate that the proposed 'Vec2Gloss' model opens a new perspective on the lexical-semantic applications of contextualized embeddings.
Submitted 28 May, 2023;
originally announced May 2023.
-
Lexical Retrieval Hypothesis in Multimodal Context
Authors:
Po-Ya Angela Wang,
Pin-Er Chen,
Hsin-Yu Chou,
Yu-Hsiang Tseng,
Shu-Kai Hsieh
Abstract:
Multimodal corpora have become an essential language resource for language science and grounded natural language processing (NLP) systems due to the growing need to understand and interpret human communication across various channels. In this paper, we first present our efforts in building the first Multimodal Corpus for Languages in Taiwan (MultiMoco). Based on the corpus, we conduct a case study investigating the Lexical Retrieval Hypothesis (LRH), specifically examining whether the hand gestures co-occurring with speech constants facilitate lexical retrieval or serve other discourse functions. With detailed annotations on eight parliamentary interpellations in Taiwan Mandarin, we explore the co-occurrence between speech constants and non-verbal features (i.e., head movement, face movement, hand gesture, and function of hand gesture). Our findings suggest that while hand gestures do serve as facilitators for lexical retrieval in some cases, they also serve the purpose of information emphasis. This study highlights the potential of the MultiMoco Corpus to provide an important resource for in-depth analysis and further research in multimodal communication studies.
Submitted 28 May, 2023;
originally announced May 2023.
-
Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis
Authors:
Pin-Er Chen,
Po-Ya Angela Wang,
Hsin-Yu Chou,
Yu-Hsiang Tseng,
Shu-Kai Hsieh
Abstract:
This paper explores the grounding issue regarding multimodal semantic representation from a computational cognitive-linguistic view. We annotate images from the Flickr30k dataset with five perceptual properties: Affordance, Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche Association (ENA), and examine their association with textual elements in the image captions. Our findings reveal that images with Gibsonian affordance show a higher frequency of captions containing 'holding-verbs' and 'container-nouns' compared to images displaying telic affordance. Perceptual Salience, Object Number, and ENA are also associated with the choice of linguistic expressions. Our study demonstrates that comprehensive understanding of objects or events requires cognitive attention, semantic nuances in language, and integration across multiple modalities. We highlight the vital importance of situated meaning and affordance grounding in natural language understanding, with the potential to advance human-like interpretation in various scenarios.
Submitted 24 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
GPTutor: a ChatGPT-powered programming tool for code explanation
Authors:
Eason Chen,
Ray Huang,
Han-Shin Chen,
Yuen-Hsien Tseng,
Liang-Yi Li
Abstract:
Learning new programming skills requires tailored guidance. With the emergence of advanced Natural Language Generation models like the ChatGPT API, there is now a possibility of creating a convenient and personalized tutoring system with AI for computer science education. This paper presents GPTutor, a ChatGPT-powered programming tool, which is a Visual Studio Code extension using the ChatGPT API to provide programming code explanations. By integrating the Visual Studio Code API, GPTutor can comprehensively analyze the provided code by referencing the relevant source code. As a result, GPTutor can use designed prompts to explain the selected code with a pop-up message. GPTutor is now published on the Visual Studio Code Extension Marketplace, and its source code is openly accessible on GitHub. Preliminary evaluation indicates that GPTutor delivers the most concise and accurate explanations compared to vanilla ChatGPT and GitHub Copilot. Moreover, the feedback from students and teachers indicated that GPTutor is user-friendly and can explain given code satisfactorily. Finally, we discuss possible future research directions for GPTutor. These include enhancing its performance and personalization via further prompt programming, as well as evaluating the effectiveness of GPTutor with real users.
Submitted 15 June, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Privacy-Preserving Video Conferencing via Thermal-Generative Images
Authors:
Sheng-Yang Chiu,
Yu-Ting Huang,
Chieh-Ting Lin,
Yu-Chee Tseng,
Jen-Jee Chen,
Meng-Hsuan Tu,
Bo-Chen Tung,
YuJou Nieh
Abstract:
Due to the COVID-19 epidemic, video conferencing has emerged as a new paradigm of communication and teamwork. However, private and personal information can be easily leaked through cameras during video conferencing. This includes leakage of a person's appearance as well as the contents of the background. This paper proposes a novel way of using online low-resolution thermal images as conditions to guide the synthesis of RGB images, bringing a promising solution for real-time video conferencing when privacy leakage is a concern. SPADE-SR (Spatially-Adaptive De-normalization with Self Resampling), a variant of SPADE, is adopted to incorporate the spatial property of a thermal heatmap and the non-thermal property of a normal, privacy-free pre-recorded RGB image provided in the form of a latent code. We create a PAIR-LRT-Human (LRT = Low-Resolution Thermal) dataset to validate our claims. The result enables a convenient way of video conferencing where users no longer need to groom themselves and tidy up backgrounds for a short meeting. Additionally, it allows a user to switch to a different appearance and background during a conference.
Submitted 28 March, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences
Authors:
Yuan Tseng,
Cheng-I Lai,
Hung-yi Lee
Abstract:
Past work on unsupervised parsing is constrained to written form. In this paper, we present the first study on unsupervised spoken constituency parsing given unlabeled spoken sentences and unpaired textual data. The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees, such that each node is a span of audio that corresponds to a constituent. We compare two approaches: (1) cascading an unsupervised automatic speech recognition (ASR) model and an unsupervised parser to obtain parse trees on ASR transcripts, and (2) directly training an unsupervised parser on continuous word-level speech representations. The latter is done by first splitting utterances into sequences of word-level segments, and aggregating self-supervised speech representations within segments to obtain segment embeddings. We find that separately training a parser on the unpaired text and directly applying it to ASR transcripts for inference produces better results for unsupervised parsing. Additionally, our results suggest that accurate segmentation alone may be sufficient to parse spoken sentences accurately. Finally, we show the direct approach may learn head-directionality correctly for both head-initial and head-final languages without any explicit inductive bias.
Submitted 9 May, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Self-supervised learning-based general laboratory progress pretrained model for cardiovascular event detection
Authors:
Li-Chin Chen,
Kuo-Hsuan Hung,
Yi-Ju Tseng,
Hsin-Yao Wang,
Tse-Min Lu,
Wei-Chieh Huang,
Yu Tsao
Abstract:
The inherent nature of patient data poses several challenges. Prevalent cases amass substantial longitudinal data owing to their patient volume and consistent follow-ups; however, longitudinal laboratory data are renowned for their irregularity, temporality, absenteeism, and sparsity. In contrast, recruitment for rare or specific cases is often constrained due to their limited patient size and episodic observations. This study employed self-supervised learning (SSL) to pretrain a generalized laboratory progress (GLP) model that captures the overall progression of six common laboratory markers in prevalent cardiovascular cases, with the intention of transferring this knowledge to aid in the detection of specific cardiovascular events. GLP implemented a two-stage training approach, leveraging the information embedded within interpolated data and amplifying the performance of SSL. After pretraining, GLP is transferred for TVR detection. The proposed two-stage training improved the performance of pure SSL, and the transferability of GLP exhibited distinctiveness. After GLP processing, the classification exhibited a notable enhancement, with averaged accuracy rising from 0.63 to 0.90. All evaluated metrics demonstrated substantial superiority (p < 0.01) compared to those before GLP processing. Our study effectively engages in translational engineering by transferring patient progression of cardiovascular laboratory parameters from one patient group to another, transcending the limitations of data availability. The transferability of disease progression optimizes the strategies of examinations and treatments, and improves patient prognosis while using commonly available laboratory parameters. The potential for expanding this approach to encompass other diseases holds great promise.
Submitted 7 September, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Scale-Aware Crowd Counting Using a Joint Likelihood Density Map and Synthetic Fusion Pyramid Network
Authors:
Yi-Kuan Hsieh,
Jun-Wei Hsieh,
Yu-Chee Tseng,
Ming-Ching Chang,
Bor-Shiun Wang
Abstract:
We develop a Synthetic Fusion Pyramid Network (SPF-Net) with a scale-aware loss function design for accurate crowd counting. Existing crowd-counting methods assume that the training annotation points were accurate and thus ignore the fact that noisy annotations can lead to large model-learning bias and counting error, especially for counting highly dense crowds that appear far away. To the best of our knowledge, this work is the first to properly handle such noise at multiple scales in end-to-end loss design and thus push the crowd counting state-of-the-art. We model the noise of crowd annotation points as a Gaussian and derive the crowd probability density map from the input image. We then approximate the joint distribution of crowd density maps with the full covariance of multiple scales and derive a low-rank approximation for tractability and efficient implementation. The derived scale-aware loss function is used to train the SPF-Net. We show that it outperforms various loss functions on four public datasets: UCF-QNRF, UCF CC 50, NWPU and ShanghaiTech A-B datasets. The proposed SPF-Net can accurately predict the locations of people in the crowd, despite training on noisy training annotations.
Submitted 2 January, 2023; v1 submitted 13 November, 2022;
originally announced November 2022.
-
MicroISP: Processing 32MP Photos on Mobile Devices with Deep Learning
Authors:
Andrey Ignatov,
Anastasia Sycheva,
Radu Timofte,
Yu Tseng,
Yu-Syuan Xu,
Po-Hsiang Yu,
Cheng-Ming Chiang,
Hsien-Kai Kuo,
Min-Hung Chen,
Chia-Ming Cheng,
Luc Van Gool
Abstract:
While neural network-based photo processing solutions can provide better image quality than traditional ISP systems, their application to mobile devices is still very limited due to their very high computational complexity. In this paper, we present a novel MicroISP model designed specifically for edge devices, taking into account their computational and memory limitations. The proposed solution is capable of processing up to 32MP photos on recent smartphones using the standard mobile ML libraries and requiring less than 1 second to perform the inference, while for FullHD images it achieves real-time performance. The architecture of the model is flexible, allowing its complexity to be adjusted to devices of different computational power. To evaluate the performance of the model, we collected a novel Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format Fujifilm GFX100 camera. The experiments demonstrated that, despite its compact size, the MicroISP model is able to provide comparable or better visual results than the traditional mobile ISP systems, while outperforming the previously proposed efficient deep learning-based solutions. Finally, this model is also compatible with the latest mobile AI accelerators, achieving good runtime and low power consumption on smartphone NPUs and APUs. The code, dataset and pre-trained models are available on the project website: https://people.ee.ethz.ch/~ihnatova/microisp.html
Submitted 8 November, 2022;
originally announced November 2022.
-
PyNet-V2 Mobile: Efficient On-Device Photo Processing With Neural Networks
Authors:
Andrey Ignatov,
Grigory Malivenko,
Radu Timofte,
Yu Tseng,
Yu-Syuan Xu,
Po-Hsiang Yu,
Cheng-Ming Chiang,
Hsien-Kai Kuo,
Min-Hung Chen,
Chia-Ming Cheng,
Luc Van Gool
Abstract:
The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address this limitation, we propose a novel PyNET-V2 Mobile CNN architecture designed specifically for edge devices, able to process RAW 12MP photos directly on mobile phones in under 1.5 seconds while producing high perceptual photo quality. To train and evaluate the proposed solution, we use the real-world Fujifilm UltraISP dataset consisting of thousands of RAW-RGB image pairs captured with a professional medium-format 102MP Fujifilm camera and a popular Sony mobile camera sensor. The results demonstrate that the PyNET-V2 Mobile model can substantially surpass the quality of traditional ISP pipelines, while outperforming the previously introduced neural network-based solutions designed for fast image processing. Furthermore, we show that the proposed architecture is also compatible with the latest mobile AI accelerators such as NPUs or APUs that can be used to further reduce the latency of the model to as little as 0.5 seconds. The dataset, code and pre-trained models used in this paper are available on the project website: https://github.com/gmalivenko/PyNET-v2
Submitted 8 November, 2022;
originally announced November 2022.
-
Conversion of Legal Agreements into Smart Legal Contracts using NLP
Authors:
Eason Chen,
Niall Roche,
Yuen-Hsien Tseng,
Walter Hernandez,
Jiangbo Shangguan,
Alastair Moore
Abstract:
A Smart Legal Contract (SLC) is a specialized digital agreement comprising natural language and computable components. The Accord Project provides an open-source SLC framework containing three main modules: Cicero, Concerto, and Ergo. Currently, we need lawyers, programmers, and clients to work together with great effort to create a usable SLC using the Accord Project. This paper proposes a pipeline to automate the SLC creation process with several Natural Language Processing (NLP) models to convert law contracts to the Accord Project's Concerto model. After evaluating the proposed pipeline, we discovered that our NER pipeline accurately detects CiceroMark from Accord Project template text with an accuracy of 0.8. Additionally, our Question Answering method can extract one-third of the Concerto variables from the template text. We also delve into some limitations and possible future research for the proposed pipeline. Finally, we describe a web interface enabling users to build SLCs. This interface leverages the proposed pipeline to convert text documents to Smart Legal Contracts by using NLP models.
Submitted 5 April, 2023; v1 submitted 27 August, 2022;
originally announced October 2022.
-
A cusp-capturing PINN for elliptic interface problems
Authors:
Yu-Hau Tseng,
Te-Sheng Lin,
Wei-Fan Hu,
Ming-Chih Lai
Abstract:
In this paper, we propose a cusp-capturing physics-informed neural network (PINN) to solve discontinuous-coefficient elliptic interface problems whose solution is continuous but has discontinuous first derivatives on the interface. To find such a solution using neural network representation, we introduce a cusp-enforced level set function as an additional feature input to the network to retain the inherent solution properties; that is, capturing the solution cusps (where the derivatives are discontinuous) sharply. In addition, the proposed neural network has the advantage of being mesh-free, so it can easily handle problems in irregular domains. We train the network using the physics-informed framework in which the loss function comprises the residual of the differential equation together with certain interface and boundary conditions. We conduct a series of numerical experiments to demonstrate the effectiveness of the cusp-capturing technique and the accuracy of the present network model. Numerical results show that even using a one-hidden-layer (shallow) network with a moderate number of neurons and sufficient training data points, the present network model can achieve prediction accuracy comparable with traditional methods. Besides, if the solution is discontinuous across the interface, we can simply incorporate an additional supervised learning task for solution jump approximation into the present network without much difficulty.
Submitted 16 April, 2023; v1 submitted 15 October, 2022;
originally announced October 2022.
-
On the Utility of Self-supervised Models for Prosody-related Tasks
Authors:
Guan-Ting Lin,
Chi-Luen Feng,
Wei-Ping Huang,
Yuan Tseng,
Tzu-Han Lin,
Chen-An Li,
Hung-yi Lee,
Nigel G. Ward
Abstract:
Self-Supervised Learning (SSL) from speech data has produced models that have achieved remarkable performance in many tasks, and that are known to implicitly represent many aspects of information latently present in speech signals. However, relatively little is known about the suitability of such models for prosody-related tasks or the extent to which they encode prosodic information. We present a new evaluation framework, SUPERB-prosody, consisting of three prosody-related downstream tasks and two pseudo tasks. We find that 13 of the 15 SSL models outperformed the baseline on all the prosody-related tasks. We also show good performance on two pseudo tasks: prosody reconstruction and future prosody prediction. We further analyze the layerwise contributions of the SSL models. Overall we conclude that SSL speech models are highly effective for prosody-related tasks.
Submitted 26 October, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
-
An efficient neural-network and finite-difference hybrid method for elliptic interface problems with applications
Authors:
Wei-Fan Hu,
Te-Sheng Lin,
Yu-Hau Tseng,
Ming-Chih Lai
Abstract:
A new and efficient neural-network and finite-difference hybrid method is developed for solving the Poisson equation in a regular domain with jump discontinuities on embedded irregular interfaces. Since the solution has low regularity across the interface, when applying finite difference discretization to this problem, an additional treatment accounting for the jump discontinuities must be employed. Here, we aim to alleviate this extra effort and ease our implementation by using machine learning methodology. The key idea is to decompose the solution into singular and regular parts. The neural network learning machinery incorporating the given jump conditions finds the singular solution, while the standard five-point Laplacian discretization is used to obtain the regular solution with associated boundary conditions. Regardless of the interface geometry, these two tasks only require supervised learning for function approximation and a fast direct solver for the Poisson equation, making the hybrid method easy to implement and efficient. The two- and three-dimensional numerical results show that the present hybrid method preserves second-order accuracy for the solution and its derivatives, and it is comparable with the traditional immersed interface method in the literature. As an application, we solve the Stokes equations with singular forces to demonstrate the robustness of the present method.
Submitted 2 March, 2023; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Focus Plus: Detect Learner's Distraction by Web Camera in Distance Teaching
Authors:
Eason Chen,
Yuen Hsien Tseng,
Kuo-Ping Lo
Abstract:
Distance teaching has become popular in recent years because of the COVID-19 epidemic. However, both students and teachers face several challenges in distance teaching, such as students being easily distracted. To address such challenges, we propose Focus+, a system designed to detect learners' status from their web cameras using the latest AI technology. With it, teachers can know students' status, and students can regulate their learning experience. In this research, we discuss the expected design for training and evaluating the AI detection model of Focus+.
Submitted 9 October, 2022;
originally announced October 2022.
-
Orbeez-SLAM: A Real-time Monocular Visual SLAM with ORB Features and NeRF-realized Mapping
Authors:
Chi-Ming Chung,
Yang-Che Tseng,
Ya-Ching Hsu,
Xiang-Qian Shi,
Yun-Hung Hua,
Jia-Fong Yeh,
Wen-Chin Chen,
Yi-Ting Chen,
Winston H. Hsu
Abstract:
A spatial AI that can perform complex tasks through visual signals and cooperate with humans is highly anticipated. To achieve this, we need a visual SLAM that easily adapts to new scenes without pre-training and generates dense maps for downstream tasks in real-time. None of the previous learning-based and non-learning-based visual SLAMs satisfy all needs due to the intrinsic limitations of their components. In this work, we develop a visual SLAM named Orbeez-SLAM, which successfully collaborates with implicit neural representation and visual odometry to achieve our goals. Moreover, Orbeez-SLAM can work with the monocular camera since it only needs RGB inputs, making it widely applicable to the real world. Results show that our SLAM is up to 800x faster than the strong baseline with superior rendering outcomes. Code link: https://github.com/MarvinChung/Orbeez-SLAM.
Submitted 31 January, 2023; v1 submitted 27 September, 2022;
originally announced September 2022.
-
A Survey on Open-Source-Defined Wireless Networks: Framework, Key Technology, and Implementation
Authors:
Liqiang Zhao,
Muhammad Muhammad Bala,
Wu Gang,
Pan Chengkang,
Yuan Yannan,
Tian Zhigang,
Yu-Chee Tseng,
Chen Xiang,
Bin Shen,
Chih-Lin I
Abstract:
The realization of open-source-defined wireless networks in the telecommunication domain is accomplished through the fifth-generation network (5G). In contrast to its predecessors (3G and 4G), the 5G network can support a wide variety of heterogeneous use cases with challenging requirements from both the Internet and the Internet of Things (IoT). The future sixth-generation (6G) network will not only extend 5G capabilities but also innovate new functionalities to address emerging academic and engineering challenges. The research community has identified that these challenges could be overcome by open-source-defined wireless networks, which are based on open-source software and hardware. In this survey, we present an overview of different aspects of open-source-defined wireless networks, comprising motivation, frameworks, key technologies, and implementation. We start by introducing the motivation and explore several frameworks classified into three categories: black-box, grey-box, and white-box. We review research efforts related to open-source-defined Core Network (CN), Radio Access Network (RAN), Multi-access Edge Computing (MEC), the capabilities of security threats, open-source hardware, and various implementations, including testbeds. Last but most important, we include lessons learned, future research directions, open research issues, pitfalls, and limitations of existing surveys on open-source wireless networks to motivate and encourage future research.
Submitted 5 September, 2022;
originally announced September 2022.
-
VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments
Authors:
Yu-Yun Tseng,
Alexander Bell,
Danna Gurari
Abstract:
We introduce a few-shot localization dataset originating from photographers who authentically were trying to learn about the visual content in the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images that were taken by people with visual impairments. Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the first to locate holes in objects (e.g., found in 12.3% of our segmentations), it shows objects that occupy a much larger range of sizes relative to the images, and text is over five times more common in our objects (e.g., found in 22.4% of our segmentations). Analysis of three modern few-shot localization algorithms demonstrates that they generalize poorly to our new dataset. The algorithms commonly struggle to locate objects with holes, very small and very large objects, and objects lacking text. To encourage a larger community to work on these unsolved challenges, we publicly share our annotated few-shot dataset at https://vizwiz.org.
Submitted 24 July, 2022;
originally announced July 2022.
-
Flexible Multiple-Objective Reinforcement Learning for Chip Placement
Authors:
Fu-Chieh Chang,
Yu-Wei Tseng,
Ya-Wen Yu,
Ssu-Rui Lee,
Alexandru Cioba,
I-Lun Tseng,
Da-shan Shiu,
Jhih-Wei Hsu,
Cheng-Yuan Wang,
Chien-Yi Yang,
Ren-Chu Wang,
Yao-Wen Chang,
Tai-Chen Chen,
Tung-Chieh Chen
Abstract:
Recently, successful applications of reinforcement learning to chip placement have emerged. Pretrained models are necessary to improve efficiency and effectiveness. Currently, the weights of objective metrics (e.g., wirelength, congestion, and timing) are fixed during pretraining. However, fixed-weight models cannot generate the diversity of placements required for engineers to accommodate changing requirements as they arise. This paper proposes flexible multiple-objective reinforcement learning (MORL) to support objective functions with inference-time variable weights using just a single pretrained model. Our macro placement results show that MORL can generate the Pareto frontier of multiple objectives effectively.
Submitted 13 April, 2022;
originally announced April 2022.
-
A Query-based Routing Table Update Mechanism for Content-Centric Network
Authors:
Pei-Hsuan Tsai,
Yu-Lin Tseng,
Jun-Bin Zhang,
Meng-Hsun Tsai
Abstract:
Due to the popularity of network applications, such as multimedia, online shopping, Internet of Things (IoT), and 5G, the contents cached in the routers are frequently replaced in Content-Centric Networking (CCN). Generally, a cache miss causes numerous packets to be propagated to fetch the required content, which worsens network congestion and delays the response time of consumers. Many caching strategies and routing policies have been proposed to solve the problem. This paper presents an alternative solution by designing a query-based routing table update mechanism to increase the accuracy of routing tables. By adding query content to interest packets, our approach explores the cached content in routers in real time and updates the routing table accordingly. This paper uses a general network simulator, ndnSIM, to compare basic CCN and our approach. The results show that our approach improves the response time of consumers, reduces network congestion, and is compatible with general forwarding strategies.
Submitted 21 June, 2021;
originally announced June 2021.
-
Learned Smartphone ISP on Mobile NPUs with Deep Learning, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Cheng-Ming Chiang,
Hsien-Kai Kuo,
Anastasia Sycheva,
Radu Timofte,
Min-Hung Chen,
Man-Yu Lee,
Yu-Syuan Xu,
Yu Tseng,
Shusong Xu,
Jin Guo,
Chao-Hung Chen,
Ming-Chun Hsyu,
Wen-Chia Tsai,
Chao-Wei Chen,
Grigory Malivenko,
Minsu Kwon,
Myungje Lee,
Jaeyoon Yoo,
Changbeom Kang,
Shinjo Wang,
Zheng Shaolong,
Hao Dejun,
Xie Fen,
Feng Zhuang
, et al. (16 additional authors not shown)
Abstract:
As the quality of mobile cameras starts to play a crucial role in modern smartphones, more and more attention is now being paid to ISP algorithms used to improve various perceptual aspects of mobile photos. In this Mobile AI challenge, the target was to develop an end-to-end deep learning-based image signal processing (ISP) pipeline that can replace classical hand-crafted ISPs and achieve nearly real-time performance on smartphone NPUs. For this, the participants were provided with a novel learned ISP dataset consisting of RAW-RGB image pairs captured with the Sony IMX586 Quad Bayer mobile sensor and a professional 102-megapixel medium format camera. The runtime of all models was evaluated on the MediaTek Dimensity 1000+ platform with a dedicated AI processing unit capable of accelerating both floating-point and quantized neural networks. The proposed solutions are fully compatible with the above NPU and are capable of processing Full HD photos under 60-100 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
Submitted 17 May, 2021;
originally announced May 2021.
-
A Human-Computer Duet System for Music Performance
Authors:
Yuen-Jen Lin,
Hsuan-Kai Kao,
Yih-Chih Tseng,
Ming Tsai,
Li Su
Abstract:
Virtual musicians have become a remarkable phenomenon in the contemporary multimedia arts. However, most virtual musicians today have not been endowed with the ability to create their own behaviors or to perform music with human musicians. In this paper, we first create a virtual violinist who can collaborate with a human pianist to perform chamber music automatically without any intervention. The system incorporates techniques from various fields, including real-time music tracking, pose estimation, and body movement generation. In our system, the virtual musician's behavior is generated based on the given music audio alone, and such a system results in a low-cost, efficient and scalable way to produce human and virtual musicians' co-performance. The proposed system has been validated in public concerts. Objective quality assessment approaches and possible ways to systematically improve the system are also discussed.
Submitted 16 September, 2020;
originally announced September 2020.
-
Efficient Network Function Backup by Update Piggybacking
Authors:
Kate Ching-Ju Lin,
Ruei-Yong Hong,
Yu-Chee Tseng
Abstract:
Network Function Virtualization (NFV) and Service Function Chaining (SFC) have been widely used to enable flexible and agile network management. To enhance reliability, some research has proposed to deploy backup function instances for prompt recovery when a primary instance fails. While most of the recent studies focus on speeding up recovery, less attention has been paid to the problem of minimizing the state update cost. In this work, we present PiggyBackup (Piggyback-based Backup), an efficient backup instance deployment and update protocol. Our key idea is to reuse the existing service chains traversing through servers in a network to help piggyback the update information. By doing this, we eliminate the header overhead and reduce the amount of update traffic significantly. To realize such a piggyback-based update more efficiently, we investigate the backup instance deployment and chain selection problems to enhance piggybacking opportunities and reduce the forwarding hop counts with explicit consideration of the distribution of service chains. Our simulation results show that PiggyBackup reduces the average overall update overhead by 47.65% and 39.56%, respectively, in a fat-tree topology as compared to random deployment and shortest path based deployment.
Submitted 15 May, 2020;
originally announced May 2020.
-
Deploying Image Deblurring across Mobile Devices: A Perspective of Quality and Latency
Authors:
Cheng-Ming Chiang,
Yu Tseng,
Yu-Syuan Xu,
Hsien-Kai Kuo,
Yi-Min Tsai,
Guan-Yu Chen,
Koan-Sin Tan,
Wei-Ting Wang,
Yu-Chieh Lin,
Shou-Yao Roy Tseng,
Wei-Shiang Lin,
Chia-Lin Yu,
BY Shen,
Kloze Kao,
Chia-Ming Cheng,
Hung-Jen Chen
Abstract:
Recently, image enhancement and restoration tasks, such as super-resolution and image deblurring, have become important applications on mobile devices. However, most state-of-the-art networks present extremely high computational complexity. This makes them difficult to deploy on mobile devices with acceptable latency. Moreover, when deploying to different mobile devices, there is a large latency variation due to the differences and limitations of deep learning accelerators on mobile devices. In this paper, we conduct a search of portable network architectures for a better quality-latency trade-off across mobile devices. We further present the effectiveness of widely used network optimizations for the image deblurring task. This paper provides comprehensive experiments and comparisons that offer an in-depth analysis of both latency and image quality. Through all the above work, we demonstrate the successful deployment of an image deblurring application on mobile devices with the acceleration of deep learning accelerators. To the best of our knowledge, this is the first paper that addresses all the deployment issues of the image deblurring task across mobile devices. This paper provides practical deployment guidelines, which were adopted by the championship-winning team in the NTIRE 2020 Image Deblurring Challenge on Smartphone Track.
Submitted 27 April, 2020;
originally announced April 2020.
-
Unified Dynamic Convolutional Network for Super-Resolution with Variational Degradations
Authors:
Yu-Syuan Xu,
Shou-Yao Roy Tseng,
Yu Tseng,
Hsien-Kai Kuo,
Yi-Min Tsai
Abstract:
Deep Convolutional Neural Networks (CNNs) have achieved remarkable results on Single Image Super-Resolution (SISR). Rather than considering only a single degradation, recent studies also include multiple degrading effects to better reflect real-world cases. However, most of these works assume a fixed combination of degrading effects, or even train an individual network for each combination. Instead, a more practical approach is to train a single network for wide-ranging and variational degradations. To fulfill this requirement, this paper proposes a unified network to accommodate the variations from inter-image (cross-image variations) and intra-image (spatial variations). Different from existing works, we incorporate dynamic convolution, which is a far more flexible alternative for handling different variations. In SISR with the non-blind setting, our Unified Dynamic Convolutional Network for Variational Degradations (UDVD) is evaluated on both synthetic and real images with an extensive set of variations. The qualitative results demonstrate the effectiveness of UDVD over various existing works. Extensive experiments show that our UDVD achieves favorable or comparable performance on both synthetic and real images.
Submitted 15 April, 2020;
originally announced April 2020.
-
Peanut Maturity Classification using Hyperspectral Imagery
Authors:
Sheng Zou,
Yu-Chien Tseng,
Alina Zare,
Diane Rowland,
Barry Tillman,
Seung-Chul Yoon
Abstract:
Seed maturity in peanut (Arachis hypogaea L.) determines economic return to a producer because of its impact on seed weight (yield), and critically influences seed vigor and other quality characteristics. During seed development, the inner mesocarp layer of the pericarp (hull) transitions in color from white to black as the seed matures. The maturity assessment process involves the removal of the exocarp of the hull and visually categorizing the mesocarp color into varying color classes from immature (white, yellow, orange) to mature (brown, and black). This visual color classification is time consuming because the exocarp must be manually removed. In addition, the visual classification process involves human assessment of colors, which leads to large variability of color classification from observer to observer. A more objective, digital imaging approach to peanut maturity is needed, optimally without the requirement of removal of the hull's exocarp. This study examined the use of a hyperspectral imaging (HSI) process to determine pod maturity with intact pericarps. The HSI method leveraged spectral differences between mature and immature pods within a classification algorithm to identify the mature and immature pods. The results showed a high classification accuracy with consistency using samples from different years and cultivars. In addition, the proposed method was capable of estimating a continuous-valued, pixel-level maturity value for individual peanut pods, allowing for a valuable tool that can be utilized in seed quality research. This new method solves issues of labor intensity and subjective error that all current methods of peanut maturity determination have.
Submitted 24 October, 2019; v1 submitted 20 October, 2019;
originally announced October 2019.
-
CoachAI: A Project for Microscopic Badminton Match Data Collection and Tactical Analysis
Authors:
Tzu-Han Hsu,
Ching-Hsuan Chen,
Nyan Ping Ju,
Tsì-Uí İk,
Wen-Chih Peng,
Chih-Chuan Wang,
Yu-Shuen Wang,
Yuan-Hsiang Lin,
Yu-Chee Tseng,
Jiun-Long Huang,
Yu-Tai Ching
Abstract:
Computer vision based object tracking has been used to annotate and augment sports video. For sports learning and training, video replay is often used in post-match review and training review for tactical analysis and movement analysis. For automatic and systematic competition data collection and tactical analysis, a project called CoachAI has been supported by the Ministry of Science and Technology, Taiwan. The proposed project also includes research on data visualization, connected training auxiliary devices, and data warehousing. Deep learning techniques will be used to develop video-based real-time microscopic competition data collection based on broadcast competition video. Machine learning techniques will be used to develop a tactical analysis. To reveal data in more understandable forms and to help in pre-match training, AR/VR techniques will be used to visualize data, tactics, and so on. In addition, training auxiliary devices including smart badminton rackets and connected serving machines will be developed based on IoT technology to further utilize competition data and tactical data and boost training efficiency. In particular, the connected serving machines will be developed to perform specified tactics and to interact with players in their training.
Submitted 12 July, 2019;
originally announced July 2019.
-
Online Energy-Efficient Scheduling for Timely Information Downloads in Mobile Networks
Authors:
Yi-Hsuan Tseng,
Yu-Pin Hsu
Abstract:
We consider a mobile network where a mobile device is running an application that requires timely information. The information at the device can be updated by downloading the latest information through neighboring access points. The freshness of the information at the device is characterized by the recently proposed age of information. However, minimizing the age of information by frequent downloading increases power consumption of the device. In this context, an energy-efficient scheduling algorithm for timely information downloads is critical, especially for power-limited mobile devices. Moreover, unpredictable movement of the mobile device causes uncertainty of the channel dynamics, which is even non-stationary within a finite amount of time for running the application. Thus, in this paper we devise a randomized online scheduling algorithm for mobile devices, which can move arbitrarily and run the application for any amount of time. We show that the expected total cost incurred by the proposed algorithm, including an age cost and a downloading cost, is (asymptotically) at most e/(e-1) ~ 1.58 times the minimum total cost achieved by an optimal offline scheduling algorithm.
Submitted 30 April, 2019; v1 submitted 10 January, 2019;
originally announced January 2019.
-
DRCD: a Chinese Machine Reading Comprehension Dataset
Authors:
Chih Chieh Shao,
Trois Liu,
Yuting Lai,
Yiying Tseng,
Sam Tsai
Abstract:
In this paper, we introduce DRCD (Delta Reading Comprehension Dataset), an open domain traditional Chinese machine reading comprehension (MRC) dataset. This dataset aims to be a standard Chinese machine reading comprehension dataset, which can be a source dataset in transfer learning. The dataset contains 10,014 paragraphs from 2,108 Wikipedia articles and 30,000+ questions generated by annotators. We build a baseline model that achieves an F1 score of 89.59%. The F1 score of human performance is 93.30%.
Submitted 28 May, 2019; v1 submitted 3 June, 2018;
originally announced June 2018.
-
Fusing Video and Inertial Sensor Data for Walking Person Identification
Authors:
Yuehong Huang,
Yu-Chee Tseng
Abstract:
An autonomous computer system (such as a robot) typically needs to identify, locate, and track persons appearing in its sight. However, most solutions have their limitations regarding efficiency, practicability, or environmental constraints. In this paper, we propose an effective and practical system which combines video and inertial sensors for person identification (PID). Persons who do different activities are easy to identify. To show the robustness and potential of our system, we propose a walking person identification (WPID) method to identify persons walking at the same time. By comparing features derived from both video and inertial sensor data, we can associate sensors in smartphones with human objects in videos. Results show that the correct identification rate of our WPID method can reach up to 76% within 2 seconds.
Submitted 20 February, 2018;
originally announced February 2018.
-
Distributed Intrusion Detection of Byzantine Attacks in Wireless Networks with Random Linear Network Coding
Authors:
Jen-Yeu Chen,
Yi-ying Tseng
Abstract:
Network coding is an elegant technique where, instead of simply relaying the packets of information they receive, the nodes of a network are allowed to combine several packets together for transmission. This technique can be used to achieve the maximum possible information flow in a network and reduce the number of packet transmissions needed. Moreover, in an energy-constrained wireless network such as a Wireless Sensor Network (a typical type of wireless ad hoc network), applying network coding to reduce the number of wireless transmissions can also prolong the lifetime of sensor nodes. Although applying network coding in a wireless sensor network is clearly beneficial, each transmitted packet is actually a combination of multiple other packets, so error propagation may occur in the network. This special characteristic also exposes network coding systems to a wide range of error attacks, especially Byzantine attacks. When some adversary nodes generate erroneous data in a network with network coding, that erroneous information will be mixed at intermediate nodes and thus corrupt all the information reaching a destination. Recent research efforts have shown that network coding can be combined with classical error control codes and cryptography for secure communication or misbehavior detection. Nevertheless, when it comes to Byzantine attacks, these results have limited effect. In fact, unless we find those adversary nodes and isolate them, network coding may perform much worse than pure routing in the presence of malicious nodes. In this paper, a distributed hierarchical algorithm based on random linear network coding is developed to detect, locate and isolate malicious nodes.
Submitted 11 March, 2013;
originally announced March 2013.