Showing 1–50 of 50 results for author: Tseng, Y

  1. arXiv:2407.04245  [pdf, other]

    cs.CV

    Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization

    Authors: Ming-Yang Ho, Che-Ming Wu, Min-Sheng Wu, Yufeng Jane Tseng

    Abstract: Recent advancements in ultra-high-resolution unpaired image-to-image translation have aimed to mitigate the constraints imposed by limited GPU memory through patch-wise inference. Nonetheless, existing methods often compromise between the reduction of noticeable tiling artifacts and the preservation of color and hue contrast, attributed to the reliance on global image- or patch-level statistics in…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2406.07237  [pdf, other]

    eess.AS cs.SD

    CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems

    Authors: Haibin Wu, Yuan Tseng, Hung-yi Lee

    Abstract: Current state-of-the-art (SOTA) codec-based audio synthesis systems can mimic anyone's voice with just a 3-second sample from that specific unseen speaker. Unfortunately, malicious attackers may exploit these technologies, causing misuse and security issues. Anti-spoofing models have been developed to detect fake speech. However, the open question of whether current SOTA anti-spoofing models can e…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024, project page: https://codecfake.github.io/

  3. arXiv:2406.05755  [pdf, other]

    cs.CV

    A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

    Authors: Hou-I Liu, Yu-Wen Tseng, Kai-Cheng Chang, Pin-Jyun Wang, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: Despite notable advancements in the field of computer vision, the precise detection of tiny objects continues to pose a significant challenge, largely owing to the minuscule pixel representation allocated to these objects in imagery data. This challenge resonates profoundly in the domain of geoscience and remote sensing, where high-fidelity detection of tiny objects can facilitate a myriad of appl…

    Submitted 15 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: The article is accepted by IEEE Transactions on Geoscience and Remote Sensing. Our code will be available at https://github.com/hoiliu-0801/DNTR

  4. arXiv:2406.01356  [pdf, other]

    cs.CV

    MP-PolarMask: A Faster and Finer Instance Segmentation for Concave Images

    Authors: Ke-Lei Wang, Pin-Hsuan Chou, Young-Ching Chou, Chia-Jen Liu, Cheng-Kuan Lin, Yu-Chee Tseng

    Abstract: While there are a lot of models for instance segmentation, PolarMask stands out as a unique one that represents an object by a Polar coordinate system. With an anchor-box-free design and a single-stage framework that conducts detection and segmentation at one time, PolarMask is proved to be able to balance efficiency and accuracy. Hence, it can be easily connected with other downstream real-time a…

    Submitted 3 June, 2024; originally announced June 2024.

  5. arXiv:2406.01171  [pdf, other]

    cs.CL

    Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization

    Authors: Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Wei-Lin Chen, Chao-Wei Huang, Yu Meng, Yun-Nung Chen

    Abstract: The concept of persona, originally adopted in dialogue literature, has re-surged as a promising framework for tailoring large language models (LLMs) to specific context (e.g., personalized search, LLM-as-a-judge). However, the growing research on leveraging persona in LLMs is relatively disorganized and lacks a systematic taxonomy. To close the gap, we present a comprehensive survey to categorize…

    Submitted 26 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 8-page version

  6. arXiv:2405.07006  [pdf, other]

    cs.CL

    Word-specific tonal realizations in Mandarin

    Authors: Yu-Ying Chuang, Melanie J. Bell, Yu-Hsiang Tseng, R. Harald Baayen

    Abstract: The pitch contours of Mandarin two-character words are generally understood as being shaped by the underlying tones of the constituent single-character words, in interaction with articulatory constraints imposed by factors such as speech rate, co-articulation with adjacent tones, segmental make-up, and predictability. This study shows that tonal realization is also partially determined by words' m…

    Submitted 11 May, 2024; originally announced May 2024.

  7. arXiv:2404.16670  [pdf, other]

    cs.CV cs.AI

    EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

    Authors: Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng, Hung-Jen Chen, Chan-Feng Hsu, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions. This paradigm shows promising zero-shot results in various natural language processing tasks but is still unexplored in vision emotion understanding. In this work, we focus on enhancing the model's proficiency in understanding and adhering to ins…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  8. Help Supporters: Exploring the Design Space of Assistive Technologies to Support Face-to-Face Help Between Blind and Sighted Strangers

    Authors: Yuanyang Teng, Connor Courtien, David Angel Rios, Yves M. Tseng, Jacqueline Gibson, Maryam Aziz, Avery Reyna, Rajan Vaish, Brian A. Smith

    Abstract: Blind and low-vision (BLV) people face many challenges when venturing into public environments, often wishing it were easier to get help from people nearby. Ironically, while many sighted individuals are willing to help, such interactions are infrequent. Asking for help is socially awkward for BLV people, and sighted people lack experience in helping BLV people. Through a mixed-ability research-th…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: To Appear In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) Association for Computing Machinery, New York, NY, USA. 24 pages

  9. arXiv:2403.04785  [pdf, other]

    cs.CL cs.AI

    Large Language Multimodal Models for 5-Year Chronic Disease Cohort Prediction Using EHR Data

    Authors: Jun-En Ding, Phan Nguyen Minh Thao, Wen-Chih Peng, Jian-Zhe Wang, Chun-Cheng Chug, Min-Chen Hsieh, Yun-Chien Tseng, Ling Chen, Dongsheng Luo, Chi-Te Wang, Pei-fu Chen, Feng Liu, Fang-Ming Hung

    Abstract: Chronic diseases such as diabetes are the leading causes of morbidity and mortality worldwide. Numerous research studies have been attempted with various deep learning models in diagnosis. However, most previous studies had certain limitations, including using publicly available datasets (e.g. MIMIC), and imbalanced data. In this study, we collected five-year electronic health records (EHRs) from…

    Submitted 2 March, 2024; originally announced March 2024.

  10. arXiv:2402.03988  [pdf, other]

    eess.AS cs.CL cs.SD

    REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

    Authors: Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun

    Abstract: Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text…

    Submitted 28 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  11. arXiv:2401.09758  [pdf, other]

    cs.CL

    Resolving Regular Polysemy in Named Entities

    Authors: Shu-Kai Hsieh, Yu-Hsiang Tseng, Hsin-Yu Chou, Ching-Wen Yang, Yu-Yun Chang

    Abstract: Word sense disambiguation primarily addresses the lexical ambiguity of common words based on a predefined sense inventory. Conversely, proper names are usually considered to denote an ad-hoc real-world referent. Once the reference is decided, the ambiguity is purportedly resolved. However, proper names also exhibit ambiguities through appellativization, i.e., they act like common words and may den…

    Submitted 18 January, 2024; originally announced January 2024.

  12. arXiv:2312.16771  [pdf, other]

    cs.CV

    Scale-Aware Crowd Count Network with Annotation Error Correction

    Authors: Yi-Kuan Hsieh, Jun-Wei Hsieh, Yu-Chee Tseng, Ming-Ching Chang, Li Xin

    Abstract: Traditional crowd counting networks suffer from information loss when feature maps are downsized through pooling layers, leading to inaccuracies in counting crowds at a distance. Existing methods often assume correct annotations during training, disregarding the impact of noisy annotations, especially in crowded scenes. Furthermore, the use of a fixed Gaussian kernel fails to account for the varyi…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: 7 pages, 6 figures. arXiv admin note: text overlap with arXiv:2211.06835

  13. arXiv:2312.02362  [pdf, other]

    cs.CV cs.GR

    PointNeRF++: A multi-scale, point-based Neural Radiance Field

    Authors: Weiwei Sun, Eduard Trulls, Yang-Che Tseng, Sneha Sambandam, Gopal Sharma, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: Point clouds offer an attractive source of information to complement images in neural scene representations, especially when few images are available. Neural rendering methods based on point clouds do exist, but they do not perform well when the point cloud quality is low -- e.g., sparse or incomplete, which is often the case with real-world data. We overcome these problems with a simple represent…

    Submitted 21 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Project website: https://pointnerfpp.github.io/

  14. arXiv:2311.08677  [pdf, other]

    cs.LG cs.DC cs.IT stat.ML

    Federated Learning for Sparse Principal Component Analysis

    Authors: Sin Cheng Ciou, Pin Jui Chen, Elvin Y. Tseng, Yuh-Jye Lee

    Abstract: In the rapidly evolving realm of machine learning, algorithm effectiveness often faces limitations due to data quality and availability. Traditional approaches grapple with data sharing due to legal and privacy concerns. The federated learning framework addresses this challenge. Federated learning is a decentralized approach where model training occurs on client sides, preserving privacy by keepin…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 11 pages, 7 figures, 1 table. Accepted by IEEE BigData 2023, Sorrento, Italy

  15. arXiv:2309.12337  [pdf]

    cs.CY cs.AI

    ActiveAI: Introducing AI Literacy for Middle School Learners with Goal-based Scenario Learning

    Authors: Ying Jui Tseng, Gautam Yadav

    Abstract: The ActiveAI project addresses key challenges in AI education for grades 7-9 students by providing an engaging AI literacy learning experience based on the AI4K12 knowledge framework. Utilizing learning science mechanisms such as goal-based scenarios, immediate feedback, project-based learning, and intelligent agents, the app incorporates a variety of learner inputs like sliders, steppers, and col…

    Submitted 21 August, 2023; originally announced September 2023.

  16. arXiv:2309.10787  [pdf, other]

    eess.AS cs.CV cs.MM cs.SD

    AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

    Authors: Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee

    Abstract: Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information. However, current models often focus on a limited set of tasks, and generalization abilities of learned representations are unclear. To this end, we propose the AV-SUPERB benchmark that enables general-purpose evaluation of unimodal audio/visual a…

    Submitted 19 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024; Evaluation Code: https://github.com/roger-tseng/av-superb Submission Platform: https://av.superbbenchmark.org

  17. arXiv:2308.04872  [pdf, other]

    cs.CV

    Tracking Players in a Badminton Court by Two Cameras

    Authors: Young-Ching Chou, Shen-Ru Zhang, Bo-Wei Chen, Hong-Qi Chen, Cheng-Kuan Lin, Yu-Chee Tseng

    Abstract: This study proposes a simple method for multi-object tracking (MOT) of players in a badminton court. We leverage two off-the-shelf cameras, one on the top of the court and the other on the side of the court. The one on the top is to track players' trajectories, while the one on the side is to analyze the pixel features of players. By computing the correlations between adjacent frames and engaging…

    Submitted 9 August, 2023; originally announced August 2023.

  18. arXiv:2307.10168  [pdf, other]

    cs.CL cs.HC

    LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs

    Authors: Tongshuang Wu, Haiyi Zhu, Maya Albayrak, Alexis Axon, Amanda Bertsch, Wenxing Deng, Ziqi Ding, Bill Guo, Sireesh Gururaja, Tzu-Sheng Kuo, Jenny T. Liang, Ryan Liu, Ihita Mandal, Jeremiah Milbauer, Xiaolin Ni, Namrata Padmanabhan, Subhashini Ramkumar, Alexis Sudjianto, Jordan Taylor, Ying-Jui Tseng, Patricia Vaidos, Zhijin Wu, Wei Wu, Chenyang Yang

    Abstract: LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these "human computation algorithms," but…

    Submitted 19 July, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

  19. arXiv:2306.00190  [pdf, other]

    cs.HC

    Contextualizing Problems to Student Interests at Scale in Intelligent Tutoring System Using Large Language Models

    Authors: Gautam Yadav, Ying-Jui Tseng, Xiaolin Ni

    Abstract: Contextualizing problems to align with student interests can significantly improve learning outcomes. However, this task often presents scalability challenges due to resource and time constraints. Recent advancements in Large Language Models (LLMs) like GPT-4 offer potential solutions to these issues. This study explores the ability of GPT-4 in the contextualization of problems within CTAT, an int…

    Submitted 31 May, 2023; originally announced June 2023.

  20. arXiv:2305.17855  [pdf, other]

    cs.CL

    Vec2Gloss: definition modeling leveraging contextualized vectors with Wordnet gloss

    Authors: Yu-Hsiang Tseng, Mao-Chang Ku, Wei-Ling Chen, Yu-Lin Chang, Shu-Kai Hsieh

    Abstract: Contextualized embeddings are proven to be powerful tools in multiple NLP tasks. Nonetheless, challenges regarding their interpretability and capability to represent lexical semantics still remain. In this paper, we propose that the task of definition modeling, which aims to generate the human-readable definition of the word, provides a route to evaluate or understand the high dimensional semantic…

    Submitted 28 May, 2023; originally announced May 2023.

  21. arXiv:2305.17663  [pdf, other]

    cs.CL

    Lexical Retrieval Hypothesis in Multimodal Context

    Authors: Po-Ya Angela Wang, Pin-Er Chen, Hsin-Yu Chou, Yu-Hsiang Tseng, Shu-Kai Hsieh

    Abstract: Multimodal corpora have become an essential language resource for language science and grounded natural language processing (NLP) systems due to the growing need to understand and interpret human communication across various channels. In this paper, we first present our efforts in building the first Multimodal Corpus for Languages in Taiwan (MultiMoco). Based on the corpus, we conduct a case study…

    Submitted 28 May, 2023; originally announced May 2023.

  22. arXiv:2305.14616  [pdf, other]

    cs.CL cs.CV

    Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis

    Authors: Pin-Er Chen, Po-Ya Angela Wang, Hsin-Yu Chou, Yu-Hsiang Tseng, Shu-Kai Hsieh

    Abstract: This paper explores the grounding issue regarding multimodal semantic representation from a computational cognitive-linguistic view. We annotate images from the Flickr30k dataset with five perceptual properties: Affordance, Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche Association (ENA), and examine their association with textual elements in the image captions. Our findings…

    Submitted 24 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 10 pages, 9 figures

  23. arXiv:2305.01863  [pdf]

    cs.HC cs.AI cs.CL cs.SE

    GPTutor: a ChatGPT-powered programming tool for code explanation

    Authors: Eason Chen, Ray Huang, Han-Shin Chen, Yuen-Hsien Tseng, Liang-Yi Li

    Abstract: Learning new programming skills requires tailored guidance. With the emergence of advanced Natural Language Generation models like the ChatGPT API, there is now a possibility of creating a convenient and personalized tutoring system with AI for computer science education. This paper presents GPTutor, a ChatGPT-powered programming tool, which is a Visual Studio Code extension using the ChatGPT API…

    Submitted 15 June, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 6 pages. International Conference on Artificial Intelligence in Education 2023

  24. arXiv:2303.09279  [pdf, other]

    cs.CR cs.MM

    Privacy-Preserving Video Conferencing via Thermal-Generative Images

    Authors: Sheng-Yang Chiu, Yu-Ting Huang, Chieh-Ting Lin, Yu-Chee Tseng, Jen-Jee Chen, Meng-Hsuan Tu, Bo-Chen Tung, YuJou Nieh

    Abstract: Due to the COVID-19 epidemic, video conferencing has evolved as a new paradigm of communication and teamwork. However, private and personal information can be easily leaked through cameras during video conferencing. This includes leakage of a person's appearance as well as the contents in the background. This paper proposes a novel way of using online low-resolution thermal images as conditions to…

    Submitted 28 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at IEEE International Conference on Robotics and Automation (ICRA) 2023

  25. arXiv:2303.08809  [pdf, other]

    cs.CL eess.AS

    Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences

    Authors: Yuan Tseng, Cheng-I Lai, Hung-yi Lee

    Abstract: Past work on unsupervised parsing is constrained to written form. In this paper, we present the first study on unsupervised spoken constituency parsing given unlabeled spoken sentences and unpaired textual data. The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees, such that each node is a span of audio that corresponds to a consti…

    Submitted 9 May, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023; updated compute resource acknowledgements

  26. Self-supervised learning-based general laboratory progress pretrained model for cardiovascular event detection

    Authors: Li-Chin Chen, Kuo-Hsuan Hung, Yi-Ju Tseng, Hsin-Yao Wang, Tse-Min Lu, Wei-Chieh Huang, Yu Tsao

    Abstract: The inherent nature of patient data poses several challenges. Prevalent cases amass substantial longitudinal data owing to their patient volume and consistent follow-ups, however, longitudinal laboratory data are renowned for their irregularity, temporality, absenteeism, and sparsity; In contrast, recruitment for rare or specific cases is often constrained due to their limited patient size and epi…

    Submitted 7 September, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: published in IEEE Journal of Translational Engineering in Health & Medicine

    Journal ref: IEEE Journal of Translational Engineering in Health and Medicine, vol.12, p.43-56, 2023

  27. arXiv:2211.06835  [pdf, other]

    cs.CV cs.AI

    Scale-Aware Crowd Counting Using a Joint Likelihood Density Map and Synthetic Fusion Pyramid Network

    Authors: Yi-Kuan Hsieh, Jun-Wei Hsieh, Yu-Chee Tseng, Ming-Ching Chang, Bor-Shiun Wang

    Abstract: We develop a Synthetic Fusion Pyramid Network (SPF-Net) with a scale-aware loss function design for accurate crowd counting. Existing crowd-counting methods assume that the training annotation points were accurate and thus ignore the fact that noisy annotations can lead to large model-learning bias and counting error, especially for counting highly dense crowds that appear far away. To the best of…

    Submitted 2 January, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

    Comments: 8 pages, 8 figures, 4 tables

  28. arXiv:2211.06770  [pdf, other]

    cs.CV cs.LG eess.IV

    MicroISP: Processing 32MP Photos on Mobile Devices with Deep Learning

    Authors: Andrey Ignatov, Anastasia Sycheva, Radu Timofte, Yu Tseng, Yu-Syuan Xu, Po-Hsiang Yu, Cheng-Ming Chiang, Hsien-Kai Kuo, Min-Hung Chen, Chia-Ming Cheng, Luc Van Gool

    Abstract: While neural networks-based photo processing solutions can provide a better image quality compared to the traditional ISP systems, their application to mobile devices is still very limited due to their very high computational complexity. In this paper, we present a novel MicroISP model designed specifically for edge devices, taking into account their computational and memory limitations. The propo…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2211.06263

  29. arXiv:2211.06263  [pdf, other]

    cs.CV cs.LG eess.IV

    PyNet-V2 Mobile: Efficient On-Device Photo Processing With Neural Networks

    Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Yu Tseng, Yu-Syuan Xu, Po-Hsiang Yu, Cheng-Ming Chiang, Hsien-Kai Kuo, Min-Hung Chen, Chia-Ming Cheng, Luc Van Gool

    Abstract: The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address th…

    Submitted 8 November, 2022; originally announced November 2022.

  30. Conversion of Legal Agreements into Smart Legal Contracts using NLP

    Authors: Eason Chen, Niall Roche, Yuen-Hsien Tseng, Walter Hernandez, Jiangbo Shangguan, Alastair Moore

    Abstract: A Smart Legal Contract (SLC) is a specialized digital agreement comprising natural language and computable components. The Accord Project provides an open-source SLC framework containing three main modules: Cicero, Concerto, and Ergo. Currently, we need lawyers, programmers, and clients to work together with great effort to create a usable SLC using the Accord Project. This paper proposes a pipeli…

    Submitted 5 April, 2023; v1 submitted 27 August, 2022; originally announced October 2022.

    Comments: 7 pages, Companion Proceedings of the ACM Web Conference 2023 (WWW '23 Companion), April 30-May 4, 2023, Austin, TX, USA

    MSC Class: 68T50 ACM Class: I.7

  31. A cusp-capturing PINN for elliptic interface problems

    Authors: Yu-Hau Tseng, Te-Sheng Lin, Wei-Fan Hu, Ming-Chih Lai

    Abstract: In this paper, we propose a cusp-capturing physics-informed neural network (PINN) to solve discontinuous-coefficient elliptic interface problems whose solution is continuous but has discontinuous first derivatives on the interface. To find such a solution using neural network representation, we introduce a cusp-enforced level set function as an additional feature input to the network to retain the…

    Submitted 16 April, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

  32. arXiv:2210.07185  [pdf, other]

    cs.CL eess.AS

    On the Utility of Self-supervised Models for Prosody-related Tasks

    Authors: Guan-Ting Lin, Chi-Luen Feng, Wei-Ping Huang, Yuan Tseng, Tzu-Han Lin, Chen-An Li, Hung-yi Lee, Nigel G. Ward

    Abstract: Self-Supervised Learning (SSL) from speech data has produced models that have achieved remarkable performance in many tasks, and that are known to implicitly represent many aspects of information latently present in speech signals. However, relatively little is known about the suitability of such models for prosody-related tasks or the extent to which they encode prosodic information. We present a…

    Submitted 26 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE SLT 2022

  33. An efficient neural-network and finite-difference hybrid method for elliptic interface problems with applications

    Authors: Wei-Fan Hu, Te-Sheng Lin, Yu-Hau Tseng, Ming-Chih Lai

    Abstract: A new and efficient neural-network and finite-difference hybrid method is developed for solving Poisson equation in a regular domain with jump discontinuities on embedded irregular interfaces. Since the solution has low regularity across the interface, when applying finite difference discretization to this problem, an additional treatment accounting for the jump discontinuities must be employed. H…

    Submitted 2 March, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Journal ref: Commun. Comput. Phys., Vol. 33, pp.1090-1105 (2023)

  34. arXiv:2210.04400  [pdf]

    cs.HC cs.AI

    Focus Plus: Detect Learner's Distraction by Web Camera in Distance Teaching

    Authors: Eason Chen, Yuen Hsien Tseng, Kuo-Ping Lo

    Abstract: Distance teaching has become popular these years because of the COVID-19 epidemic. However, both students and teachers face several challenges in distance teaching, like being easy to distract. We proposed Focus+, a system designed to detect learners' status with the latest AI technology from their web camera to solve such challenges. By doing so, teachers can know students' status, and students c…

    Submitted 9 October, 2022; originally announced October 2022.

    Comments: 5 Pages, 4 Figures, 2021 National Chair Professorship Academic Series: Teaching and Learning in Pandemic Era

  35. arXiv:2209.13274  [pdf, other]

    cs.RO cs.CV

    Orbeez-SLAM: A Real-time Monocular Visual SLAM with ORB Features and NeRF-realized Mapping

    Authors: Chi-Ming Chung, Yang-Che Tseng, Ya-Ching Hsu, Xiang-Qian Shi, Yun-Hung Hua, Jia-Fong Yeh, Wen-Chin Chen, Yi-Ting Chen, Winston H. Hsu

    Abstract: A spatial AI that can perform complex tasks through visual signals and cooperate with humans is highly anticipated. To achieve this, we need a visual SLAM that easily adapts to new scenes without pre-training and generates dense maps for downstream tasks in real-time. None of the previous learning-based and non-learning-based visual SLAMs satisfy all needs due to the intrinsic limitations of their…

    Submitted 31 January, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

  36. arXiv:2209.01891  [pdf, other]

    cs.NI

    A Survey on Open-Source-Defined Wireless Networks: Framework, Key Technology, and Implementation

    Authors: Liqiang Zhao, Muhammad Muhammad Bala, Wu Gang, Pan Chengkang, Yuan Yannan, Tian Zhigang, Yu-Chee Tseng, Chen Xiang, Bin Shen, Chih-Lin I

    Abstract: The realization of open-source-defined wireless networks in the telecommunication domain is accomplished through the fifth-generation network (5G). In contrast to its predecessors (3G and 4G), the 5G network can support a wide variety of heterogeneous use cases with challenging requirements from both the Internet and the Internet of Things (IoT). The future sixth-generation (6G) network will not o…

    Submitted 5 September, 2022; originally announced September 2022.

  37. arXiv:2207.11810  [pdf, other]

    cs.CV

    VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments

    Authors: Yu-Yun Tseng, Alexander Bell, Danna Gurari

    Abstract: We introduce a few-shot localization dataset originating from photographers who authentically were trying to learn about the visual content in the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images that were taken by people with visual impairments. Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the fir…

    Submitted 24 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022. The first two authors contributed equally

  38. arXiv:2204.06407  [pdf, other]

    cs.LG cs.AI

    Flexible Multiple-Objective Reinforcement Learning for Chip Placement

    Authors: Fu-Chieh Chang, Yu-Wei Tseng, Ya-Wen Yu, Ssu-Rui Lee, Alexandru Cioba, I-Lun Tseng, Da-shan Shiu, Jhih-Wei Hsu, Cheng-Yuan Wang, Chien-Yi Yang, Ren-Chu Wang, Yao-Wen Chang, Tai-Chen Chen, Tung-Chieh Chen

    Abstract: Recently, successful applications of reinforcement learning to chip placement have emerged. Pretrained models are necessary to improve efficiency and effectiveness. Currently, the weights of objective metrics (e.g., wirelength, congestion, and timing) are fixed during pretraining. However, fixed-weighed models cannot generate the diversity of placements required for engineers to accommodate changi…

    Submitted 13 April, 2022; originally announced April 2022.

    Comments: A short version of this article is published in DAC'22:LBR (see ACM DOI 10.1145/3489517.3530617)

  39. A Query-based Routing Table Update Mechanism for Content-Centric Network

    Authors: Pei-Hsuan Tsai, Yu-Lin Tseng, Jun-Bin Zhang, Meng-Hsun Tsai

    Abstract: Due to the popularity of network applications, such as multimedia, online shopping, Internet of Things (IoT), and 5G, the contents cached in the routers are frequently replaced in Content-Centric Networking (CCN). Generally, cache miss causes numerous propagated packets to get the required content that deteriorates network congestion and delay the response time of consumers. Many caching strategie…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: 6 pages, 14 figures, conference. ISBN:978-1-7281-9256-7

    ACM Class: C.2.2

    Journal ref: 2020 International Computer Symposium (ICS), 2020, pp. 266-271

  40. arXiv:2105.07809  [pdf, other]

    eess.IV cs.CV cs.LG

    Learned Smartphone ISP on Mobile NPUs with Deep Learning, Mobile AI 2021 Challenge: Report

    Authors: Andrey Ignatov, Cheng-Ming Chiang, Hsien-Kai Kuo, Anastasia Sycheva, Radu Timofte, Min-Hung Chen, Man-Yu Lee, Yu-Syuan Xu, Yu Tseng, Shusong Xu, Jin Guo, Chao-Hung Chen, Ming-Chun Hsyu, Wen-Chia Tsai, Chao-Wei Chen, Grigory Malivenko, Minsu Kwon, Myungje Lee, Jaeyoon Yoo, Changbeom Kang, Shinjo Wang, Zheng Shaolong, Hao Dejun, Xie Fen, Feng Zhuang, et al. (16 additional authors not shown)

    Abstract: As the quality of mobile cameras starts to play a crucial role in modern smartphones, more and more attention is now being paid to ISP algorithms used to improve various perceptual aspects of mobile photos. In this Mobile AI challenge, the target was to develop an end-to-end deep learning-based image signal processing (ISP) pipeline that can replace classical hand-crafted ISPs and achieve nearly r…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/

  41. A Human-Computer Duet System for Music Performance

    Authors: Yuen-Jen Lin, Hsuan-Kai Kao, Yih-Chih Tseng, Ming Tsai, Li Su

    Abstract: Virtual musicians have become a remarkable phenomenon in the contemporary multimedia arts. However, most virtual musicians today have not been endowed with the ability to create their own behaviors or to perform music with human musicians. In this paper, we first create a virtual violinist, who can collaborate with a human pianist to perform chamber music automatically without any inter…

    Submitted 16 September, 2020; originally announced September 2020.

  42. arXiv:2005.07580  [pdf, ps, other]

    cs.NI

    Efficient Network Function Backup by Update Piggybacking

    Authors: Kate Ching-Ju Lin, Ruei-Yong Hong, Yu-Chee Tseng

    Abstract: Network Function Virtualization (NFV) and Service Function Chaining (SFC) have been widely used to enable flexible and agile network management. To enhance reliability, some research has proposed to deploy backup function instances for prompt recovery when a primary instance fails. While most of the recent studies focus on speeding up recovery, less attention has been paid to the problem of minimi…

    Submitted 15 May, 2020; originally announced May 2020.

  43. arXiv:2004.12599  [pdf, other]

    cs.CV eess.IV

    Deploying Image Deblurring across Mobile Devices: A Perspective of Quality and Latency

    Authors: Cheng-Ming Chiang, Yu Tseng, Yu-Syuan Xu, Hsien-Kai Kuo, Yi-Min Tsai, Guan-Yu Chen, Koan-Sin Tan, Wei-Ting Wang, Yu-Chieh Lin, Shou-Yao Roy Tseng, Wei-Shiang Lin, Chia-Lin Yu, BY Shen, Kloze Kao, Chia-Ming Cheng, Hung-Jen Chen

    Abstract: Recently, image enhancement and restoration tasks, such as super-resolution and image deblurring, have become important applications on mobile devices. However, most state-of-the-art networks present extremely high computational complexity. This makes them difficult to deploy on mobile devices with acceptable latency. Moreover, when deploying to different mobile devices, there is a large latency var…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: CVPR 2020 Workshop on New Trends in Image Restoration and Enhancement (NTIRE)

  44. arXiv:2004.06965  [pdf, other]

    eess.IV cs.CV

    Unified Dynamic Convolutional Network for Super-Resolution with Variational Degradations

    Authors: Yu-Syuan Xu, Shou-Yao Roy Tseng, Yu Tseng, Hsien-Kai Kuo, Yi-Min Tsai

    Abstract: Deep Convolutional Neural Networks (CNNs) have achieved remarkable results on Single Image Super-Resolution (SISR). Despite considering only a single degradation, recent studies also include multiple degrading effects to better reflect real-world cases. However, most of the works assume a fixed combination of degrading effects, or even train an individual network for different combinations. Instea…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: CVPR 2020

  45. arXiv:1910.11122  [pdf]

    cs.CV cs.LG eess.IV

    Peanut Maturity Classification using Hyperspectral Imagery

    Authors: Sheng Zou, Yu-Chien Tseng, Alina Zare, Diane Rowland, Barry Tillman, Seung-Chul Yoon

    Abstract: Seed maturity in peanut (Arachis hypogaea L.) determines economic return to a producer because of its impact on seed weight (yield), and critically influences seed vigor and other quality characteristics. During seed development, the inner mesocarp layer of the pericarp (hull) transitions in color from white to black as the seed matures. The maturity assessment process involves the removal of the…

    Submitted 24 October, 2019; v1 submitted 20 October, 2019; originally announced October 2019.

  46. arXiv:1907.12888  [pdf, other]

    cs.CV cs.LG cs.MM

    CoachAI: A Project for Microscopic Badminton Match Data Collection and Tactical Analysis

    Authors: Tzu-Han Hsu, Ching-Hsuan Chen, Nyan Ping Ju, Tsì-Uí İk, Wen-Chih Peng, Chih-Chuan Wang, Yu-Shuen Wang, Yuan-Hsiang Lin, Yu-Chee Tseng, Jiun-Long Huang, Yu-Tai Ching

    Abstract: Computer-vision-based object tracking has been used to annotate and augment sports video. For sports learning and training, video replay is often used in post-match review and training review for tactical analysis and movement analysis. For automatic and systematic competition data collection and tactical analysis, a project called CoachAI has been supported by the Ministry of Science and…

    Submitted 12 July, 2019; originally announced July 2019.

  47. arXiv:1901.03137  [pdf, ps, other]

    cs.NI cs.IT

    Online Energy-Efficient Scheduling for Timely Information Downloads in Mobile Networks

    Authors: Yi-Hsuan Tseng, Yu-Pin Hsu

    Abstract: We consider a mobile network where a mobile device is running an application that requires timely information. The information at the device can be updated by downloading the latest information through neighboring access points. The freshness of the information at the device is characterized by the recently proposed age of information. However, minimizing the age of information by frequent downloa…

    Submitted 30 April, 2019; v1 submitted 10 January, 2019; originally announced January 2019.

    Comments: 10 pages, technical report for the ISIT 2019 paper

  48. arXiv:1806.00920  [pdf]

    cs.CL

    DRCD: a Chinese Machine Reading Comprehension Dataset

    Authors: Chih Chieh Shao, Trois Liu, Yuting Lai, Yiying Tseng, Sam Tsai

    Abstract: In this paper, we introduce DRCD (Delta Reading Comprehension Dataset), an open-domain traditional Chinese machine reading comprehension (MRC) dataset. This dataset aims to be a standard Chinese machine reading comprehension dataset, which can be a source dataset in transfer learning. The dataset contains 10,014 paragraphs from 2,108 Wikipedia articles and 30,000+ questions generated by annotator…

    Submitted 28 May, 2019; v1 submitted 3 June, 2018; originally announced June 2018.

    Comments: 5 pages

  49. arXiv:1802.07021  [pdf, other]

    cs.CV

    Fusing Video and Inertial Sensor Data for Walking Person Identification

    Authors: Yuehong Huang, Yu-Chee Tseng

    Abstract: An autonomous computer system (such as a robot) typically needs to identify, locate, and track persons appearing in its sight. However, most solutions have their limitations regarding efficiency, practicability, or environmental constraints. In this paper, we propose an effective and practical system which combines video and inertial sensors for person identification (PID). Persons who do differen…

    Submitted 20 February, 2018; originally announced February 2018.

  50. arXiv:1303.2553  [pdf, other]

    cs.NI

    Distributed Intrusion Detection of Byzantine Attacks in Wireless Networks with Random Linear Network Coding

    Authors: Jen-Yeu Chen, Yi-ying Tseng

    Abstract: Network coding is an elegant technique where, instead of simply relaying the packets of information they receive, the nodes of a network are allowed to combine several packets together for transmission. This technique can be used to achieve the maximum possible information flow in a network and to reduce the number of packet transmissions needed. Moreover, in an energy-constrained wireless net…

    Submitted 11 March, 2013; originally announced March 2013.

    Journal ref: International Journal of Distributed Sensor Networks, Volume 2012 (2012), Article ID 758340, 10 pages