subscribe to arXiv mailings

MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification

Authors: Zhuoxiao Li, Shanliang Yao, Yijie Chu, Angel F. Garcia-Fernandez, Yong Yue, Eng Gee Lim, Xiaohui Zhu

Abstract: In the rapidly evolving field of 3D reconstruction, 3D Gaussian Splatting (3DGS) and 2D Gaussian Splatting (2DGS) represent significant advancements. Although 2DGS compresses 3D Gaussian primitives into 2D Gaussian surfels to effectively enhance mesh extraction quality, this compression can potentially lead to a decrease in rendering quality. Additionally, unreliable densification processes and th… ▽ More In the rapidly evolving field of 3D reconstruction, 3D Gaussian Splatting (3DGS) and 2D Gaussian Splatting (2DGS) represent significant advancements. Although 2DGS compresses 3D Gaussian primitives into 2D Gaussian surfels to effectively enhance mesh extraction quality, this compression can potentially lead to a decrease in rendering quality. Additionally, unreliable densification processes and the calculation of depth through the accumulation of opacity can compromise the detail of mesh extraction. To address this issue, we introduce MVG-Splatting, a solution guided by Multi-View considerations. Specifically, we integrate an optimized method for calculating normals, which, combined with image gradients, helps rectify inconsistencies in the original depth computations. Additionally, utilizing projection strategies akin to those in Multi-View Stereo (MVS), we propose an adaptive quantile-based method that dynamically determines the level of additional densification guided by depth maps, from coarse to fine detail. Experimental evidence demonstrates that our method not only resolves the issues of rendering quality degradation caused by depth discrepancies but also facilitates direct mesh extraction from dense Gaussian point clouds using the Marching Cubes algorithm. This approach significantly enhances the overall fidelity and accuracy of the 3D reconstruction process, ensuring that both the geometric details and visual quality. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: https://mvgsplatting.github.io

arXiv:2407.08906 [pdf, other]

AirSketch: Generative Motion to Sketch

Authors: Hui Xian Grace Lim, Xuanming Cui, Yogesh S Rawat, Ser-Nam Lim

Abstract: Illustration is a fundamental mode of human expression and communication. Certain types of motion that accompany speech can provide this illustrative mode of communication. While Augmented and Virtual Reality technologies (AR/VR) have introduced tools for producing drawings with hand motions (air drawing), they typically require costly hardware and additional digital markers, thereby limiting thei… ▽ More Illustration is a fundamental mode of human expression and communication. Certain types of motion that accompany speech can provide this illustrative mode of communication. While Augmented and Virtual Reality technologies (AR/VR) have introduced tools for producing drawings with hand motions (air drawing), they typically require costly hardware and additional digital markers, thereby limiting their accessibility and portability. Furthermore, air drawing demands considerable skill to achieve aesthetic results. To address these challenges, we introduce the concept of AirSketch, aimed at generating faithful and visually coherent sketches directly from hand motions, eliminating the need for complicated headsets or markers. We devise a simple augmentation-based self-supervised training procedure, enabling a controllable image diffusion model to learn to translate from highly noisy hand tracking images to clean, aesthetically pleasing sketches, while preserving the essential visual cues from the original tracking data. We present two air drawing datasets to study this problem. Our findings demonstrate that beyond producing photo-realistic images from precise spatial inputs, controllable image diffusion can effectively produce a refined, clear sketch from a noisy input. Our work serves as an initial step towards marker-less air drawing and reveals distinct applications of controllable diffusion models to AirSketch and AR/VR in general. △ Less

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2405.12821 [pdf, other]

Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension

Authors: Runwei Guan, Ruixiao Zhang, Ningwei Ouyang, Jianan Liu, Ka Lok Man, Xiaohao Cai, Ming Xu, Jeremy Smith, Eng Gee Lim, Yutao Yue, Hui Xiong

Abstract: Embodied perception is essential for intelligent vehicles and robots, enabling more natural interaction and task execution. However, these advancements currently embrace vision level, rarely focusing on using 3D modeling sensors, which limits the full understanding of surrounding objects with multi-granular characteristics. Recently, as a promising automotive sensor with affordable cost, 4D Millim… ▽ More Embodied perception is essential for intelligent vehicles and robots, enabling more natural interaction and task execution. However, these advancements currently embrace vision level, rarely focusing on using 3D modeling sensors, which limits the full understanding of surrounding objects with multi-granular characteristics. Recently, as a promising automotive sensor with affordable cost, 4D Millimeter-Wave radar provides denser point clouds than conventional radar and perceives both semantic and physical characteristics of objects, thus enhancing the reliability of perception system. To foster the development of natural language-driven context understanding in radar scenes for 3D grounding, we construct the first dataset, Talk2Radar, which bridges these two modalities for 3D Referring Expression Comprehension. Talk2Radar contains 8,682 referring prompt samples with 20,558 referred objects. Moreover, we propose a novel model, T-RadarNet for 3D REC upon point clouds, achieving state-of-the-art performances on Talk2Radar dataset compared with counterparts, where Deformable-FPN and Gated Graph Fusion are meticulously designed for efficient point cloud feature modeling and cross-modal fusion between radar and text features, respectively. Further, comprehensive experiments are conducted to give a deep insight into radar-based 3D REC. We release our project at https://github.com/GuanRunwei/Talk2Radar. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 8 pages, 5 figures

arXiv:2404.11922 [pdf, other]

Redefining the Shortest Path Problem Formulation of the Linear Non-Gaussian Acyclic Model: Pairwise Likelihood Ratios, Prior Knowledge, and Path Enumeration

Authors: Hans Jarett J. Ong, Brian Godwin S. Lim

Abstract: Effective causal discovery is essential for learning the causal graph from observational data. The linear non-Gaussian acyclic model (LiNGAM) operates under the assumption of a linear data generating process with non-Gaussian noise in determining the causal graph. Its assumption of unmeasured confounders being absent, however, poses practical limitations. In response, empirical research has shown… ▽ More Effective causal discovery is essential for learning the causal graph from observational data. The linear non-Gaussian acyclic model (LiNGAM) operates under the assumption of a linear data generating process with non-Gaussian noise in determining the causal graph. Its assumption of unmeasured confounders being absent, however, poses practical limitations. In response, empirical research has shown that the reformulation of LiNGAM as a shortest path problem (LiNGAM-SPP) addresses this limitation. Within LiNGAM-SPP, mutual information is chosen to serve as the measure of independence. A challenge is introduced - parameter tuning is now needed due to its reliance on kNN mutual information estimators. The paper proposes a threefold enhancement to the LiNGAM-SPP framework. First, the need for parameter tuning is eliminated by using the pairwise likelihood ratio in lieu of kNN-based mutual information. This substitution is validated on a general data generating process and benchmark real-world data sets, outperforming existing methods especially when given a larger set of features. The incorporation of prior knowledge is then enabled by a node-skipping strategy implemented on the graph representation of all causal orderings to eliminate violations based on the provided input of relative orderings. Flexibility relative to existing approaches is achieved. Last among the three enhancements is the utilization of the distribution of paths in the graph representation of all causal orderings. From this, crucial properties of the true causal graph such as the presence of unmeasured confounders and sparsity may be inferred. To some extent, the expected performance of the causal discovery algorithm may be predicted. The refinements above advance the practicality and performance of LiNGAM-SPP, showcasing the potential of graph-search-based methodologies in advancing causal discovery. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.10342 [pdf, other]

Referring Flexible Image Restoration

Authors: Runwei Guan, Rongsheng Hu, Zhuhao Zhou, Tianlang Xue, Ka Lok Man, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue

Abstract: In reality, images often exhibit multiple degradations, such as rain and fog at night (triple degradations). However, in many cases, individuals may not want to remove all degradations, for instance, a blurry lens revealing a beautiful snowy landscape (double degradations). In such scenarios, people may only desire to deblur. These situations and requirements shed light on a new challenge in image… ▽ More In reality, images often exhibit multiple degradations, such as rain and fog at night (triple degradations). However, in many cases, individuals may not want to remove all degradations, for instance, a blurry lens revealing a beautiful snowy landscape (double degradations). In such scenarios, people may only desire to deblur. These situations and requirements shed light on a new challenge in image restoration, where a model must perceive and remove specific degradation types specified by human commands in images with multiple degradations. We term this task Referring Flexible Image Restoration (RFIR). To address this, we first construct a large-scale synthetic dataset called RFIR, comprising 153,423 samples with the degraded image, text prompt for specific degradation removal and restored image. RFIR consists of five basic degradation types: blur, rain, haze, low light and snow while six main sub-categories are included for varying degrees of degradation removal. To tackle the challenge, we propose a novel transformer-based multi-task model named TransRFIR, which simultaneously perceives degradation types in the degraded image and removes specific degradation upon text prompt. TransRFIR is based on two devised attention modules, Multi-Head Agent Self-Attention (MHASA) and Multi-Head Agent Cross Attention (MHACA), where MHASA and MHACA introduce the agent token and reach the linear complexity, achieving lower computation cost than vanilla self-attention and cross-attention and obtaining competitive performances. Our TransRFIR achieves state-of-the-art performances compared with other counterparts and is proven as an effective architecture for image restoration. We release our project at https://github.com/GuanRunwei/FIR-CP. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: 15 pages, 19 figures

arXiv:2404.05213 [pdf, other]

Evaluation of an LLM in Identifying Logical Fallacies: A Call for Rigor When Adopting LLMs in HCI Research

Authors: Gionnieve Lim, Simon T. Perrault

Abstract: There is increasing interest in the adoption of LLMs in HCI research. However, LLMs may often be regarded as a panacea because of their powerful capabilities with an accompanying oversight on whether they are suitable for their intended tasks. We contend that LLMs should be adopted in a critical manner following rigorous evaluation. Accordingly, we present the evaluation of an LLM in identifying l… ▽ More There is increasing interest in the adoption of LLMs in HCI research. However, LLMs may often be regarded as a panacea because of their powerful capabilities with an accompanying oversight on whether they are suitable for their intended tasks. We contend that LLMs should be adopted in a critical manner following rigorous evaluation. Accordingly, we present the evaluation of an LLM in identifying logical fallacies that will form part of a digital misinformation intervention. By comparing to a labeled dataset, we found that GPT-4 achieves an accuracy of 0.79, and for our intended use case that excludes invalid or unidentified instances, an accuracy of 0.90. This gives us the confidence to proceed with the application of the LLM while keeping in mind the areas where it still falls short. The paper describes our evaluation approach, results and reflections on the use of the LLM for our intended task. △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.12928 [pdf, other]

Rapid AIdeation: Generating Ideas With the Self and in Collaboration With Large Language Models

Authors: Gionnieve Lim, Simon T. Perrault

Abstract: Generative artificial intelligence (GenAI) can rapidly produce large and diverse volumes of content. This lends to it a quality of creativity which can be empowering in the early stages of design. In seeking to understand how creative ways to address practical issues can be conceived between humans and GenAI, we conducted a rapid ideation workshop with 21 participants where they used a large langu… ▽ More Generative artificial intelligence (GenAI) can rapidly produce large and diverse volumes of content. This lends to it a quality of creativity which can be empowering in the early stages of design. In seeking to understand how creative ways to address practical issues can be conceived between humans and GenAI, we conducted a rapid ideation workshop with 21 participants where they used a large language model (LLM) to brainstorm potential solutions and evaluate them. We found that the LLM produced a greater variety of ideas that were of high quality, though not necessarily of higher quality than human-generated ideas. Participants typically prompted in a straightforward manner with concise instructions. We also observed two collaborative dynamics with the LLM fulfilling a consulting role or an assisting role depending on the goals of the users. Notably, we observed an atypical anti-collaboration dynamic where participants used an antagonistic approach to prompt the LLM. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12916 [pdf, other]

doi 10.1145/3623809.3623856

Effects of Automated Misinformation Warning Labels on the Intents to Like, Comment and Share Posts

Authors: Gionnieve Lim, Simon T. Perrault

Abstract: With fact-checking by professionals being difficult to scale on social media, algorithmic techniques have been considered. However, it is uncertain how the public may react to labels by automated fact-checkers. In this study, we investigate the use of automated warning labels derived from misinformation detection literature and investigate their effects on three forms of post engagement. Focusing… ▽ More With fact-checking by professionals being difficult to scale on social media, algorithmic techniques have been considered. However, it is uncertain how the public may react to labels by automated fact-checkers. In this study, we investigate the use of automated warning labels derived from misinformation detection literature and investigate their effects on three forms of post engagement. Focusing on political posts, we also consider how partisanship affects engagement. In a two-phases within-subjects experiment with 200 participants, we found that the generic warnings suppressed intents to comment on and share posts, but not on the intent to like them. Furthermore, when different reasons for the labels were provided, their effects on post engagement were inconsistent, suggesting that the reasons could have undesirably motivated engagement instead. Partisanship effects were observed across the labels with higher engagement for politically congruent posts. We discuss the implications on the design and use of automated warning labels. △ Less

Submitted 19 March, 2024; originally announced March 2024.

Journal ref: HAI 2023

arXiv:2403.12913 [pdf, other]

doi 10.1007/978-94-024-2225-2_11

Fact Checking Chatbot: A Misinformation Intervention for Instant Messaging Apps and an Analysis of Trust in the Fact Checkers

Authors: Gionnieve Lim, Simon T. Perrault

Abstract: In Singapore, there has been a rise in misinformation on mobile instant messaging services (MIMS). MIMS support both small peer-to-peer networks and large groups. Misinformation in the former may spread due to recipients' trust in the sender while in the latter, misinformation can directly reach a wide audience. The encryption of MIMS makes it difficult to address misinformation directly. As such,… ▽ More In Singapore, there has been a rise in misinformation on mobile instant messaging services (MIMS). MIMS support both small peer-to-peer networks and large groups. Misinformation in the former may spread due to recipients' trust in the sender while in the latter, misinformation can directly reach a wide audience. The encryption of MIMS makes it difficult to address misinformation directly. As such, chatbots have become an alternative solution where users can disclose their chat content directly to fact checking services. To understand how effective fact checking chatbots are as an intervention and how trust in three different fact checkers (i.e., Government, News Outlets, and Artificial Intelligence) may affect this trust, we conducted a within-subjects experiment with 527 Singapore residents. We found mixed results for the fact checkers but support for the chatbot intervention overall. We also found a striking contradiction between participants' trust in the fact checkers and their behaviour towards them. Specifically, those who reported a high level of trust in the government performed worse and tended to follow the fact checking tool less when it was endorsed by the government. △ Less

Submitted 19 March, 2024; originally announced March 2024.

Journal ref: Mobile Communication and Online Falsehoods in Asia, 2023

arXiv:2403.12686 [pdf, other]

WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar

Authors: Runwei Guan, Liye Jia, Fengyufan Yang, Shanliang Yao, Erick Purwanto, Xiaohui Zhu, Eng Gee Lim, Jeremy Smith, Ka Lok Man, Xuming Hu, Yutao Yue

Abstract: The perception of waterways based on human intent is significant for autonomous navigation and operations of Unmanned Surface Vehicles (USVs) in water environments. Inspired by visual grounding, we introduce WaterVG, the first visual grounding dataset designed for USV-based waterway perception based on human prompts. WaterVG encompasses prompts describing multiple targets, with annotations at the… ▽ More The perception of waterways based on human intent is significant for autonomous navigation and operations of Unmanned Surface Vehicles (USVs) in water environments. Inspired by visual grounding, we introduce WaterVG, the first visual grounding dataset designed for USV-based waterway perception based on human prompts. WaterVG encompasses prompts describing multiple targets, with annotations at the instance level including bounding boxes and masks. Notably, WaterVG includes 11,568 samples with 34,987 referred targets, whose prompts integrates both visual and radar characteristics. The pattern of text-guided two sensors equips a finer granularity of text prompts with visual and radar features of referred targets. Moreover, we propose a low-power visual grounding model, Potamoi, which is a multi-task model with a well-designed Phased Heterogeneous Modality Fusion (PHMF) mode, including Adaptive Radar Weighting (ARW) and Multi-Head Slim Cross Attention (MHSCA). Exactly, ARW extracts required radar features to fuse with vision for prompt alignment. MHSCA is an efficient fusion module with a remarkably small parameter count and FLOPs, elegantly fusing scenario context captured by two sensors with linguistic features, which performs expressively on visual grounding tasks. Comprehensive experiments and evaluations have been conducted on WaterVG, where our Potamoi archives state-of-the-art performances compared with counterparts. △ Less

Submitted 4 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 10 pages, 10 figures

arXiv:2403.12529 [pdf, other]

Contextualized Messages Boost Graph Representations

Authors: Brian Godwin Lim, Galvin Brice Lim, Renzo Roel Tan, Kazushi Ikeda

Abstract: Graph neural networks (GNNs) have gained significant attention in recent years for their ability to process data that may be represented as graphs. This success has prompted several studies to explore the representational capability of GNNs based on the graph isomorphism task. These works inherently assume a countable node feature representation, potentially limiting their applicability. Interesti… ▽ More Graph neural networks (GNNs) have gained significant attention in recent years for their ability to process data that may be represented as graphs. This success has prompted several studies to explore the representational capability of GNNs based on the graph isomorphism task. These works inherently assume a countable node feature representation, potentially limiting their applicability. Interestingly, only a few theoretical works study GNNs with uncountable node feature representation. This paper presents a novel perspective on the representational capability of GNNs across all levels - node-level, neighborhood-level, and graph-level - when the space of node feature representation is uncountable. Specifically, it relaxes the injective requirement in previous works by employing an implicit pseudometric distance on the space of input to create a soft-injective function. This allows distinct inputs to produce similar outputs only if the pseudometric deems the inputs to be sufficiently similar on some representation, which is often useful in practice. As a consequence, a novel soft-isomorphic relational graph convolution network (SIR-GCN) that emphasizes non-linear and contextualized transformation of neighborhood feature representations is proposed. A mathematical discussion on the relationship between SIR-GCN and widely used GNNs is then laid out to put the contribution in context, establishing SIR-GCN as a generalization of classical GNN methodologies. Experiments on synthetic and benchmark datasets demonstrate the relative superiority of SIR-GCN, outperforming comparable models in node and graph property prediction tasks. △ Less

Submitted 22 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.06394 [pdf, other]

FSViewFusion: Few-Shots View Generation of Novel Objects

Authors: Rukhshanda Hussain, Hui Xian Grace Lim, Borchun Chen, Mubarak Shah, Ser Nam Lim

Abstract: Novel view synthesis has observed tremendous developments since the arrival of NeRFs. However, Nerf models overfit on a single scene, lacking generalization to out of distribution objects. Recently, diffusion models have exhibited remarkable performance on introducing generalization in view synthesis. Inspired by these advancements, we explore the capabilities of a pretrained stable diffusion mode… ▽ More Novel view synthesis has observed tremendous developments since the arrival of NeRFs. However, Nerf models overfit on a single scene, lacking generalization to out of distribution objects. Recently, diffusion models have exhibited remarkable performance on introducing generalization in view synthesis. Inspired by these advancements, we explore the capabilities of a pretrained stable diffusion model for view synthesis without explicit 3D priors. Specifically, we base our method on a personalized text to image model, Dreambooth, given its strong ability to adapt to specific novel objects with a few shots. Our research reveals two interesting findings. First, we observe that Dreambooth can learn the high level concept of a view, compared to arguably more complex strategies which involve finetuning diffusions on large amounts of multi-view data. Second, we establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the original object's identify from which the views are learnt. Motivated by this, we introduce a learning strategy, FSViewFusion, which inherits a specific view through only one image sample of a single scene, and transfers the knowledge to a novel object, learnt from few shots, using low rank adapters. Through extensive experiments we demonstrate that our method, albeit simple, is efficient in generating reliable view samples for in the wild images. Code and models will be released. △ Less

Submitted 12 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

arXiv:2402.01741 [pdf]

Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties

Authors: Jasmine Chiat Ling Ong, Liyuan Jin, Kabilan Elangovan, Gilbert Yong San Lim, Daniel Yan Zheng Lim, Gerald Gui Ren Sng, Yuhe Ke, Joshua Yi Min Tung, Ryan Jian Zhong, Christopher Ming Yao Koh, Keane Zhi Hao Lee, Xiang Chen, Jack Kian Chng, Aung Than, Ken Junyang Goh, Daniel Shu Wei Ting

Abstract: Importance: We introduce a novel Retrieval Augmented Generation (RAG)-Large Language Model (LLM) framework as a Clinical Decision Support Systems (CDSS) to support safe medication prescription. Objective: To evaluate the efficacy of LLM-based CDSS in correctly identifying medication errors in different patient case vignettes from diverse medical and surgical sub-disciplines, against a human expe… ▽ More Importance: We introduce a novel Retrieval Augmented Generation (RAG)-Large Language Model (LLM) framework as a Clinical Decision Support Systems (CDSS) to support safe medication prescription. Objective: To evaluate the efficacy of LLM-based CDSS in correctly identifying medication errors in different patient case vignettes from diverse medical and surgical sub-disciplines, against a human expert panel derived ground truth. We compared performance for under 2 different CDSS practical healthcare integration modalities: LLM-based CDSS alone (fully autonomous mode) vs junior pharmacist + LLM-based CDSS (co-pilot, assistive mode). Design, Setting, and Participants: Utilizing a RAG model with state-of-the-art medically-related LLMs (GPT-4, Gemini Pro 1.0 and Med-PaLM 2), this study used 61 prescribing error scenarios embedded into 23 complex clinical vignettes across 12 different medical and surgical specialties. A multidisciplinary expert panel assessed these cases for Drug-Related Problems (DRPs) using the PCNE classification and graded severity / potential for harm using revised NCC MERP medication error index. We compared. Results RAG-LLM performed better compared to LLM alone. When employed in a co-pilot mode, accuracy, recall, and F1 scores were optimized, indicating effectiveness in identifying moderate to severe DRPs. The accuracy of DRP detection with RAG-LLM improved in several categories but at the expense of lower precision. Conclusions This study established that a RAG-LLM based CDSS significantly boosts the accuracy of medication error identification when used alongside junior pharmacists (co-pilot), with notable improvements in detecting severe DRPs. This study also illuminates the comparative performance of current state-of-the-art LLMs in RAG-based CDSS systems. △ Less

Submitted 17 February, 2024; v1 submitted 29 January, 2024; originally announced February 2024.

arXiv:2401.10820 [pdf, other]

Help Me Reflect: Leveraging Self-Reflection Interface Nudges to Enhance Deliberativeness on Online Deliberation Platforms

Authors: Shun Yi Yeo, Gionnieve Lim, Jie Gao, Weiyu Zhang, Simon Tangi Perrault

Abstract: The deliberative potential of online platforms has been widely examined. However, little is known about how various interface-based reflection nudges impact the quality of deliberation. This paper presents two user studies with 12 and 120 participants, respectively, to investigate the impacts of different reflective nudges on the quality of deliberation. In the first study, we examined five distin… ▽ More The deliberative potential of online platforms has been widely examined. However, little is known about how various interface-based reflection nudges impact the quality of deliberation. This paper presents two user studies with 12 and 120 participants, respectively, to investigate the impacts of different reflective nudges on the quality of deliberation. In the first study, we examined five distinct reflective nudges: persona, temporal prompts, analogies and metaphors, cultural prompts and storytelling. Persona, temporal prompts, and storytelling emerged as the preferred nudges for implementation on online deliberation platforms. In the second study, we assess the impacts of these preferred reflectors more thoroughly. Results revealed a significant positive impact of these reflectors on deliberative quality. Specifically, persona promotes a deliberative environment for balanced and opinionated viewpoints while temporal prompts promote more individualised viewpoints. Our findings suggest that the choice of reflectors can significantly influence the dynamics and shape the nature of online discussions. △ Less

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2312.08851 [pdf, other]

Achelous++: Power-Oriented Water-Surface Panoptic Perception Framework on Edge Devices based on Vision-Radar Fusion and Pruning of Heterogeneous Modalities

Authors: Runwei Guan, Haocheng Zhao, Shanliang Yao, Ka Lok Man, Xiaohui Zhu, Limin Yu, Yong Yue, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue

Abstract: Urban water-surface robust perception serves as the foundation for intelligent monitoring of aquatic environments and the autonomous navigation and operation of unmanned vessels, especially in the context of waterway safety. It is worth noting that current multi-sensor fusion and multi-task learning models consume substantial power and heavily rely on high-power GPUs for inference. This contribute… ▽ More Urban water-surface robust perception serves as the foundation for intelligent monitoring of aquatic environments and the autonomous navigation and operation of unmanned vessels, especially in the context of waterway safety. It is worth noting that current multi-sensor fusion and multi-task learning models consume substantial power and heavily rely on high-power GPUs for inference. This contributes to increased carbon emissions, a concern that runs counter to the prevailing emphasis on environmental preservation and the pursuit of sustainable, low-carbon urban environments. In light of these concerns, this paper concentrates on low-power, lightweight, multi-task panoptic perception through the fusion of visual and 4D radar data, which is seen as a promising low-cost perception method. We propose a framework named Achelous++ that facilitates the development and comprehensive evaluation of multi-task water-surface panoptic perception models. Achelous++ can simultaneously execute five perception tasks with high speed and low power consumption, including object detection, object semantic segmentation, drivable-area segmentation, waterline segmentation, and radar point cloud semantic segmentation. Furthermore, to meet the demand for developers to customize models for real-time inference on low-performance devices, a novel multi-modal pruning strategy known as Heterogeneous-Aware SynFlow (HA-SynFlow) is proposed. Besides, Achelous++ also supports random pruning at initialization with different layer-wise sparsity, such as Uniform and Erdos-Renyi-Kernel (ERK). Overall, our Achelous++ framework achieves state-of-the-art performance on the WaterScenes benchmark, excelling in both accuracy and power efficiency compared to other single-task and multi-task models. We release and maintain the code at https://github.com/GuanRunwei/Achelous. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: 18 pages, 9 figures

arXiv:2312.04861 [pdf, other]

Exploring Radar Data Representations in Autonomous Driving: A Comprehensive Review

Authors: Shanliang Yao, Runwei Guan, Zitian Peng, Chenhang Xu, Yilu Shi, Weiping Ding, Eng Gee Lim, Yong Yue, Hyungjoon Seo, Ka Lok Man, Jieming Ma, Xiaohui Zhu, Yutao Yue

Abstract: With the rapid advancements of sensor technology and deep learning, autonomous driving systems are providing safe and efficient access to intelligent vehicles as well as intelligent transportation. Among these equipped sensors, the radar sensor plays a crucial role in providing robust perception information in diverse environmental conditions. This review focuses on exploring different radar data… ▽ More With the rapid advancements of sensor technology and deep learning, autonomous driving systems are providing safe and efficient access to intelligent vehicles as well as intelligent transportation. Among these equipped sensors, the radar sensor plays a crucial role in providing robust perception information in diverse environmental conditions. This review focuses on exploring different radar data representations utilized in autonomous driving systems. Firstly, we introduce the capabilities and limitations of the radar sensor by examining the working principles of radar perception and signal processing of radar measurements. Then, we delve into the generation process of five radar representations, including the ADC signal, radar tensor, point cloud, grid map, and micro-Doppler signature. For each radar representation, we examine the related datasets, methods, advantages and limitations. Furthermore, we discuss the challenges faced in these data representations and propose potential research directions. Above all, this comprehensive review offers an in-depth insight into how these representations enhance autonomous system capabilities, providing guidance for radar perception researchers. To facilitate retrieval and comparison of different data representations, datasets and methods, we provide an interactive website at https://radar-camera-fusion.github.io/radar. △ Less

Submitted 19 April, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

Comments: 24 pages, 10 figures, 5 tables. arXiv admin note: text overlap with arXiv:2304.10410

arXiv:2311.02800 [pdf, other]

Kivi: Verification for Cluster Management

Authors: Bingzhe Liu, Gangmuk Lim, Ryan Beckett, P. Brighten Godfrey

Abstract: Modern cloud infrastructure is powered by cluster management systems such as Kubernetes and Docker Swarm. While these systems seek to minimize users' operational burden, the complex, dynamic, and non-deterministic nature of these systems makes them hard to reason about, potentially leading to failures ranging from performance degradation to outages. We present Kivi, the first system for verifying… ▽ More Modern cloud infrastructure is powered by cluster management systems such as Kubernetes and Docker Swarm. While these systems seek to minimize users' operational burden, the complex, dynamic, and non-deterministic nature of these systems makes them hard to reason about, potentially leading to failures ranging from performance degradation to outages. We present Kivi, the first system for verifying controllers and their configurations in cluster management systems. Kivi focuses on the popular system Kubernetes, and models its controllers and events into processes whereby their interleavings are exhaustively checked via model checking. Central to handling autoscaling and large-scale deployments is our design that seeks to find violations in a smaller and reduced topology. We also develop several model optimizations in Kivi to scale to large clusters. We show that Kivi is effective and accurate in finding issues in realistic and complex scenarios and showcase two new issues in Kubernetes controller source code. △ Less

Submitted 5 November, 2023; originally announced November 2023.

Comments: 18 pages

arXiv:2308.10287 [pdf, other]

ASY-VRNet: Waterway Panoptic Driving Perception Model based on Asymmetric Fair Fusion of Vision and 4D mmWave Radar

Authors: Runwei Guan, Shanliang Yao, Xiaohui Zhu, Ka Lok Man, Yong Yue, Jeremy Smith, Eng Gee Lim, Yutao Yue

Abstract: Panoptic Driving Perception (PDP) is critical for the autonomous navigation of Unmanned Surface Vehicles (USVs). A PDP model typically integrates multiple tasks, necessitating the simultaneous and robust execution of various perception tasks to facilitate downstream path planning. The fusion of visual and radar sensors is currently acknowledged as a robust and cost-effective approach. However, mos… ▽ More Panoptic Driving Perception (PDP) is critical for the autonomous navigation of Unmanned Surface Vehicles (USVs). A PDP model typically integrates multiple tasks, necessitating the simultaneous and robust execution of various perception tasks to facilitate downstream path planning. The fusion of visual and radar sensors is currently acknowledged as a robust and cost-effective approach. However, most existing research has primarily focused on fusing visual and radar features dedicated to object detection or utilizing a shared feature space for multiple tasks, neglecting the individual representation differences between various tasks. To address this gap, we propose a pair of Asymmetric Fair Fusion (AFF) modules with favorable explainability designed to efficiently interact with independent features from both visual and radar modalities, tailored to the specific requirements of object detection and semantic segmentation tasks. The AFF modules treat image and radar maps as irregular point sets and transform these features into a crossed-shared feature space for multitasking, ensuring equitable treatment of vision and radar point cloud features. Leveraging AFF modules, we propose a novel and efficient PDP model, ASY-VRNet, which processes image and radar features based on irregular super-pixel point sets. Additionally, we propose an effective multitask learning method specifically designed for PDP models. Compared to other lightweight models, ASY-VRNet achieves state-of-the-art performance in object detection, semantic segmentation, and drivable-area segmentation on the WaterScenes benchmark. Our project is publicly available at https://github.com/GuanRunwei/ASY-VRNet. △ Less

Submitted 4 July, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

Comments: Accepted by IROS 2024

arXiv:2308.03372 [pdf, other]

doi 10.1145/3638380.3638388

XAI in Automated Fact-Checking? The Benefits Are Modest and There's No One-Explanation-Fits-All

Authors: Gionnieve Lim, Simon T. Perrault

Abstract: The massive volume of online information along with the issue of misinformation has spurred active research in the automation of fact-checking. Like fact-checking by human experts, it is not enough for an automated fact-checker to just be accurate, but also be able to inform and convince the user of the validity of its predictions. This becomes viable with explainable artificial intelligence (XAI)… ▽ More The massive volume of online information along with the issue of misinformation has spurred active research in the automation of fact-checking. Like fact-checking by human experts, it is not enough for an automated fact-checker to just be accurate, but also be able to inform and convince the user of the validity of its predictions. This becomes viable with explainable artificial intelligence (XAI). In this work, we conduct a study of XAI fact-checkers involving 180 participants to determine how users' actions towards news and their attitudes towards explanations are affected by the XAI. Our results suggest that XAI has limited effects on users' agreement with the veracity prediction of the automated fact-checker and on their intent to share news. However, XAI nudges users towards forming uniform judgments of news veracity, thereby signaling their reliance on the explanations. We also found polarizing preferences towards XAI and raise several design considerations on them. △ Less

Submitted 19 June, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

Journal ref: OzCHI 2023

arXiv:2307.07102 [pdf, other]

Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar

Authors: Runwei Guan, Shanliang Yao, Xiaohui Zhu, Ka Lok Man, Eng Gee Lim, Jeremy Smith, Yong Yue, Yutao Yue

Abstract: Current perception models for different tasks usually exist in modular forms on Unmanned Surface Vehicles (USVs), which infer extremely slowly in parallel on edge devices, causing the asynchrony between perception results and USV position, and leading to error decisions of autonomous navigation. Compared with Unmanned Ground Vehicles (UGVs), the robust perception of USVs develops relatively slowly… ▽ More Current perception models for different tasks usually exist in modular forms on Unmanned Surface Vehicles (USVs), which infer extremely slowly in parallel on edge devices, causing the asynchrony between perception results and USV position, and leading to error decisions of autonomous navigation. Compared with Unmanned Ground Vehicles (UGVs), the robust perception of USVs develops relatively slowly. Moreover, most current multi-task perception models are huge in parameters, slow in inference and not scalable. Oriented on this, we propose Achelous, a low-cost and fast unified panoptic perception framework for water-surface perception based on the fusion of a monocular camera and 4D mmWave radar. Achelous can simultaneously perform five tasks, detection and segmentation of visual targets, drivable-area segmentation, waterline segmentation and radar point cloud segmentation. Besides, models in Achelous family, with less than around 5 million parameters, achieve about 18 FPS on an NVIDIA Jetson AGX Xavier, 11 FPS faster than HybridNets, and exceed YOLOX-Tiny and Segformer-B0 on our collected dataset about 5 mAP$_{\text{50-95}}$ and 0.7 mIoU, especially under situations of adverse weather, dark environments and camera failure. To our knowledge, Achelous is the first comprehensive panoptic perception framework combining vision-level and point-cloud-level tasks for water-surface perception. To promote the development of the intelligent transportation community, we release our codes in \url{https://github.com/GuanRunwei/Achelous}. △ Less

Submitted 13 July, 2023; originally announced July 2023.

Comments: Accepted by ITSC 2023

arXiv:2307.06505 [pdf, other]

WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset and Benchmarks for Autonomous Driving on Water Surfaces

Authors: Shanliang Yao, Runwei Guan, Zhaodong Wu, Yi Ni, Zile Huang, Ryan Wen Liu, Yong Yue, Weiping Ding, Eng Gee Lim, Hyungjoon Seo, Ka Lok Man, Jieming Ma, Xiaohui Zhu, Yutao Yue

Abstract: Autonomous driving on water surfaces plays an essential role in executing hazardous and time-consuming missions, such as maritime surveillance, survivors rescue, environmental monitoring, hydrography mapping and waste cleaning. This work presents WaterScenes, the first multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces. Equipped with a 4D radar and a monocular camer… ▽ More Autonomous driving on water surfaces plays an essential role in executing hazardous and time-consuming missions, such as maritime surveillance, survivors rescue, environmental monitoring, hydrography mapping and waste cleaning. This work presents WaterScenes, the first multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces. Equipped with a 4D radar and a monocular camera, our Unmanned Surface Vehicle (USV) proffers all-weather solutions for discerning object-related information, including color, shape, texture, range, velocity, azimuth, and elevation. Focusing on typical static and dynamic objects on water surfaces, we label the camera images and radar point clouds at pixel-level and point-level, respectively. In addition to basic perception tasks, such as object detection, instance segmentation and semantic segmentation, we also provide annotations for free-space segmentation and waterline segmentation. Leveraging the multi-task and multi-modal data, we conduct benchmark experiments on the uni-modality of radar and camera, as well as the fused modalities. Experimental results demonstrate that 4D radar-camera fusion can considerably improve the accuracy and robustness of perception on water surfaces, especially in adverse lighting and weather conditions. WaterScenes dataset is public on https://waterscenes.github.io. △ Less

Submitted 15 June, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems

arXiv:2306.12550 [pdf, other]

On Evaluation of Document Classification using RVL-CDIP

Authors: Stefan Larson, Gordon Lim, Kevin Leach

Abstract: The RVL-CDIP benchmark is widely used for measuring performance on the task of document classification. Despite its widespread use, we reveal several undesirable characteristics of the RVL-CDIP benchmark. These include (1) substantial amounts of label noise, which we estimate to be 8.1% (ranging between 1.6% to 16.9% per document category); (2) presence of many ambiguous or multi-label documents;… ▽ More The RVL-CDIP benchmark is widely used for measuring performance on the task of document classification. Despite its widespread use, we reveal several undesirable characteristics of the RVL-CDIP benchmark. These include (1) substantial amounts of label noise, which we estimate to be 8.1% (ranging between 1.6% to 16.9% per document category); (2) presence of many ambiguous or multi-label documents; (3) a large overlap between test and train splits, which can inflate model performance metrics; and (4) presence of sensitive personally-identifiable information like US Social Security numbers (SSNs). We argue that there is a risk in using RVL-CDIP for benchmarking document classifiers, as its limited scope, presence of errors (state-of-the-art models now achieve accuracy error rates that are within our estimated label error rate), and lack of diversity make it less than ideal for benchmarking. We further advocate for the creation of a new document classification benchmark, and provide recommendations for what characteristics such a resource should include. △ Less

Submitted 21 June, 2023; originally announced June 2023.

Comments: EACL 2023

arXiv:2306.08345 [pdf, other]

doi 10.1145/3570361.3592518

SWAM: Revisiting Swap and OOMK for Improving Application Responsiveness on Mobile Devices

Authors: Geunsik Lim, Donghyun Kang, MyungJoo Ham, Young Ik Eom

Abstract: Existing memory reclamation policies on mobile devices may be no longer valid because they have negative effects on the response time of running applications. In this paper, we propose SWAM, a new integrated memory management technique that complements the shortcomings of both the swapping and killing mechanism in mobile devices and improves the application responsiveness. SWAM consists of (1) Ada… ▽ More Existing memory reclamation policies on mobile devices may be no longer valid because they have negative effects on the response time of running applications. In this paper, we propose SWAM, a new integrated memory management technique that complements the shortcomings of both the swapping and killing mechanism in mobile devices and improves the application responsiveness. SWAM consists of (1) Adaptive Swap that performs swapping adaptively into memory or storage device while managing the swap space dynamically, (2) OOM Cleaner that reclaims shared object pages in the swap space to secure available memory and storage space, and (3) EOOM Killer that terminates processes in the worst case while prioritizing the lowest initialization cost applications as victim processes first. Experimental results demonstrate that SWAM significantly reduces the number of applications killed by OOMK (6.5x lower), and improves application launch time (36% faster) and response time (41% faster), compared to the conventional schemes. △ Less

Submitted 14 June, 2023; originally announced June 2023.

Comments: 15 pages, 13 figures, MobiCom 2023 (29th Annual International Conference On Mobile Computing And Networking)

arXiv:2304.10893 [pdf, other]

FindVehicle and VehicleFinder: A NER dataset for natural language-based vehicle retrieval and a keyword-based cross-modal vehicle retrieval system

Authors: Runwei Guan, Ka Lok Man, Feifan Chen, Shanliang Yao, Rongsheng Hu, Xiaohui Zhu, Jeremy Smith, Eng Gee Lim, Yutao Yue

Abstract: Natural language (NL) based vehicle retrieval is a task aiming to retrieve a vehicle that is most consistent with a given NL query from among all candidate vehicles. Because NL query can be easily obtained, such a task has a promising prospect in building an interactive intelligent traffic system (ITS). Current solutions mainly focus on extracting both text and image features and mapping them to t… ▽ More Natural language (NL) based vehicle retrieval is a task aiming to retrieve a vehicle that is most consistent with a given NL query from among all candidate vehicles. Because NL query can be easily obtained, such a task has a promising prospect in building an interactive intelligent traffic system (ITS). Current solutions mainly focus on extracting both text and image features and mapping them to the same latent space to compare the similarity. However, existing methods usually use dependency analysis or semantic role-labelling techniques to find keywords related to vehicle attributes. These techniques may require a lot of pre-processing and post-processing work, and also suffer from extracting the wrong keyword when the NL query is complex. To tackle these problems and simplify, we borrow the idea from named entity recognition (NER) and construct FindVehicle, a NER dataset in the traffic domain. It has 42.3k labelled NL descriptions of vehicle tracks, containing information such as the location, orientation, type and colour of the vehicle. FindVehicle also adopts both overlapping entities and fine-grained entities to meet further requirements. To verify its effectiveness, we propose a baseline NL-based vehicle retrieval model called VehicleFinder. Our experiment shows that by using text encoders pre-trained by FindVehicle, VehicleFinder achieves 87.7\% precision and 89.4\% recall when retrieving a target vehicle by text command on our homemade dataset based on UA-DETRAC. The time cost of VehicleFinder is 279.35 ms on one ARM v8.2 CPU and 93.72 ms on one RTX A4000 GPU, which is much faster than the Transformer-based system. The dataset is open-source via the link https://github.com/GuanRunwei/FindVehicle, and the implementation can be found via the link https://github.com/GuanRunwei/VehicleFinder-CTIM. △ Less

Submitted 21 April, 2023; originally announced April 2023.

arXiv:2304.10410 [pdf, other]

doi 10.1109/TIV.2023.3307157

Radar-Camera Fusion for Object Detection and Semantic Segmentation in Autonomous Driving: A Comprehensive Review

Authors: Shanliang Yao, Runwei Guan, Xiaoyu Huang, Zhuoxiao Li, Xiangyu Sha, Yong Yue, Eng Gee Lim, Hyungjoon Seo, Ka Lok Man, Xiaohui Zhu, Yutao Yue

Abstract: Driven by deep learning techniques, perception technology in autonomous driving has developed rapidly in recent years, enabling vehicles to accurately detect and interpret surrounding environment for safe and efficient navigation. To achieve accurate and robust perception capabilities, autonomous vehicles are often equipped with multiple sensors, making sensor fusion a crucial part of the percepti… ▽ More Driven by deep learning techniques, perception technology in autonomous driving has developed rapidly in recent years, enabling vehicles to accurately detect and interpret surrounding environment for safe and efficient navigation. To achieve accurate and robust perception capabilities, autonomous vehicles are often equipped with multiple sensors, making sensor fusion a crucial part of the perception system. Among these fused sensors, radars and cameras enable a complementary and cost-effective perception of the surrounding environment regardless of lighting and weather conditions. This review aims to provide a comprehensive guideline for radar-camera fusion, particularly concentrating on perception tasks related to object detection and semantic segmentation.Based on the principles of the radar and camera sensors, we delve into the data processing process and representations, followed by an in-depth analysis and summary of radar-camera fusion datasets. In the review of methodologies in radar-camera fusion, we address interrogative questions, including "why to fuse", "what to fuse", "where to fuse", "when to fuse", and "how to fuse", subsequently discussing various challenges and potential research directions within this domain. To ease the retrieval and comparison of datasets and fusion methods, we also provide an interactive website: https://radar-camera-fusion.github.io. △ Less

Submitted 23 August, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: Accepted by IEEE Transactions on Intelligent Vehicles (T-IV)

Journal ref: IEEE Transactions on Intelligent Vehicles 2023

arXiv:2304.07366 [pdf, other]

CollabCoder: A Lower-barrier, Rigorous Workflow for Inductive Collaborative Qualitative Analysis with Large Language Models

Authors: Jie Gao, Yuchen Guo, Gionnieve Lim, Tianqin Zhang, Zheng Zhang, Toby Jia-Jun Li, Simon Tangi Perrault

Abstract: Collaborative Qualitative Analysis (CQA) can enhance qualitative analysis rigor and depth by incorporating varied viewpoints. Nevertheless, ensuring a rigorous CQA procedure itself can be both demanding and costly. To lower this bar, we take a theoretical perspective to design the CollabCoder workflow, that integrates Large Language Models (LLMs) into key inductive CQA stages: independent open cod… ▽ More Collaborative Qualitative Analysis (CQA) can enhance qualitative analysis rigor and depth by incorporating varied viewpoints. Nevertheless, ensuring a rigorous CQA procedure itself can be both demanding and costly. To lower this bar, we take a theoretical perspective to design the CollabCoder workflow, that integrates Large Language Models (LLMs) into key inductive CQA stages: independent open coding, iterative discussions, and final codebook creation. In the open coding phase, CollabCoder offers AI-generated code suggestions and records decision-making data. During discussions, it promotes mutual understanding by sharing this data within the coding team and using quantitative metrics to identify coding (dis)agreements, aiding in consensus-building. In the code grouping stage, CollabCoder provides primary code group suggestions, lightening the cognitive load of finalizing the codebook. A 16-user evaluation confirmed the effectiveness of CollabCoder, demonstrating its advantages over existing software and providing empirical insights into the role of LLMs in the CQA practice. △ Less

Submitted 22 January, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: Will be published at the ACM CHI Conference on Human Factors in Computing Systems (CHI'24)

arXiv:2303.11899 [pdf, other]

Large-Scale Traffic Signal Control Using Constrained Network Partition and Adaptive Deep Reinforcement Learning

Authors: Hankang Gu, Shangbo Wang, Xiaoguang Ma, Dongyao Jia, Guoqiang Mao, Eng Gee Lim, Cheuk Pong Ryan Wong

Abstract: Multi-agent Deep Reinforcement Learning (MADRL) based traffic signal control becomes a popular research topic in recent years. To alleviate the scalability issue of completely centralized RL techniques and the non-stationarity issue of completely decentralized RL techniques on large-scale traffic networks, some literature utilizes a regional control approach where the whole network is firstly part… ▽ More Multi-agent Deep Reinforcement Learning (MADRL) based traffic signal control becomes a popular research topic in recent years. To alleviate the scalability issue of completely centralized RL techniques and the non-stationarity issue of completely decentralized RL techniques on large-scale traffic networks, some literature utilizes a regional control approach where the whole network is firstly partitioned into multiple disjoint regions, followed by applying the centralized RL approach to each region. However, the existing partitioning rules either have no constraints on the topology of regions or require the same topology for all regions. Meanwhile, no existing regional control approach explores the performance of optimal joint action in an exponentially growing regional action space when intersections are controlled by 4-phase traffic signals (EW, EWL, NS, NSL). In this paper, we propose a novel RL training framework named RegionLight to tackle the above limitations. Specifically, the topology of regions is firstly constrained to a star network which comprises one center and an arbitrary number of leaves. Next, the network partitioning problem is modeled as an optimization problem to minimize the number of regions. Then, an Adaptive Branching Dueling Q-Network (ABDQ) model is proposed to decompose the regional control task into several joint signal control sub-tasks corresponding to particular intersections. Subsequently, these sub-tasks maximize the regional benefits cooperatively. Finally, the global control strategy for the whole network is obtained by concatenating the optimal joint actions of all regions. Experimental results demonstrate the superiority of our proposed framework over all baselines under both real and synthetic datasets in all evaluation metrics. △ Less

Submitted 7 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.08906 [pdf, other]

VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression

Authors: Won Jo, Geuntaek Lim, Gwangjin Lee, Hyunwoo Kim, Byungsoo Ko, Yukyung Choi

Abstract: In content-based video retrieval (CBVR), dealing with large-scale collections, efficiency is as important as accuracy; thus, several video-level feature-based studies have actively been conducted. Nevertheless, owing to the severe difficulty of embedding a lengthy and untrimmed video into a single feature, these studies have been insufficient for accurate retrieval compared to frame-level feature-… ▽ More In content-based video retrieval (CBVR), dealing with large-scale collections, efficiency is as important as accuracy; thus, several video-level feature-based studies have actively been conducted. Nevertheless, owing to the severe difficulty of embedding a lengthy and untrimmed video into a single feature, these studies have been insufficient for accurate retrieval compared to frame-level feature-based studies. In this paper, we show that appropriate suppression of irrelevant frames can provide insight into the current obstacles of the video-level approaches. Furthermore, we propose a Video-to-Video Suppression network (VVS) as a solution. VVS is an end-to-end framework that consists of an easy distractor elimination stage to identify which frames to remove and a suppression weight generation stage to determine the extent to suppress the remaining frames. This structure is intended to effectively describe an untrimmed video with varying content and meaningless information. Its efficacy is proved via extensive experiments, and we show that our approach is not only state-of-the-art in video-level approaches but also has a fast inference time despite possessing retrieval capabilities close to those of frame-level approaches. Code is available at https://github.com/sejong-rcv/VVS △ Less

Submitted 19 December, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: AAAI-24

arXiv:2212.13032 [pdf]

Diagnosis of COVID-19 based on Chest Radiography

Authors: Mei Gah Lim, Hoi Leong Lee

Abstract: The Coronavirus disease 2019 (COVID-19) was first identified in Wuhan, China, in early December 2019 and now becoming a pandemic. When COVID-19 patients undergo radiography examination, radiologists can observe the present of radiographic abnormalities from their chest X-ray (CXR) images. In this study, a deep convolutional neural network (CNN) model was proposed to aid radiologists in diagnosing… ▽ More The Coronavirus disease 2019 (COVID-19) was first identified in Wuhan, China, in early December 2019 and now becoming a pandemic. When COVID-19 patients undergo radiography examination, radiologists can observe the present of radiographic abnormalities from their chest X-ray (CXR) images. In this study, a deep convolutional neural network (CNN) model was proposed to aid radiologists in diagnosing COVID-19 patients. First, this work conducted a comparative study on the performance of modified VGG-16, ResNet-50 and DenseNet-121 to classify CXR images into normal, COVID-19 and viral pneumonia. Then, the impact of image augmentation on the classification results was evaluated. The publicly available COVID-19 Radiography Database was used throughout this study. After comparison, ResNet-50 achieved the highest accuracy with 95.88%. Next, after training ResNet-50 with rotation, translation, horizontal flip, intensity shift and zoom augmented dataset, the accuracy dropped to 80.95%. Furthermore, an ablation study on the effect of image augmentation on the classification results found that the combinations of rotation and intensity shift augmentation methods obtained an accuracy higher than baseline, which is 96.14%. Finally, ResNet-50 with rotation and intensity shift augmentations performed the best and was proposed as the final classification model in this work. These findings demonstrated that the proposed classification model can provide a promising result for COVID-19 diagnosis. △ Less

Submitted 26 December, 2022; originally announced December 2022.

arXiv:2212.10278 [pdf, other]

Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning

Authors: Hui Li, Mingjie Sun, Jimin Xiao, Eng Gee Lim, Yao Zhao

Abstract: Referring Expression Segmentation (RES), which is aimed at localizing and segmenting the target according to the given language expression, has drawn increasing attention. Existing methods jointly consider the localization and segmentation steps, which rely on the fused visual and linguistic features for both steps. We argue that the conflict between the purpose of identifying an object and genera… ▽ More Referring Expression Segmentation (RES), which is aimed at localizing and segmenting the target according to the given language expression, has drawn increasing attention. Existing methods jointly consider the localization and segmentation steps, which rely on the fused visual and linguistic features for both steps. We argue that the conflict between the purpose of identifying an object and generating a mask limits the RES performance. To solve this problem, we propose a parallel position-kernel-segmentation pipeline to better isolate and then interact the localization and segmentation steps. In our pipeline, linguistic information will not directly contaminate the visual feature for segmentation. Specifically, the localization step localizes the target object in the image based on the referring expression, and then the visual kernel obtained from the localization step guides the segmentation step. This pipeline also enables us to train RES in a weakly-supervised way, where the pixel-level segmentation labels are replaced by click annotations on center and corner points. The position head is fully-supervised and trained with the click annotations as supervision, and the segmentation head is trained with weakly-supervised segmentation losses. To validate our framework on a weakly-supervised setting, we annotated three RES benchmark datasets (RefCOCO, RefCOCO+ and RefCOCOg) with click annotations.Our method is simple but surprisingly effective, outperforming all previous state-of-the-art RES methods on fully- and weakly-supervised settings by a large margin. The benchmark code and datasets will be released. △ Less

Submitted 17 December, 2022; originally announced December 2022.

arXiv:2210.07448 [pdf, other]

Evaluating Out-of-Distribution Performance on Document Image Classifiers

Authors: Stefan Larson, Gordon Lim, Yutong Ai, David Kuang, Kevin Leach

Abstract: The ability of a document classifier to handle inputs that are drawn from a distribution different from the training distribution is crucial for robust deployment and generalizability. The RVL-CDIP corpus is the de facto standard benchmark for document classification, yet to our knowledge all studies that use this corpus do not include evaluation on out-of-distribution documents. In this paper, we… ▽ More The ability of a document classifier to handle inputs that are drawn from a distribution different from the training distribution is crucial for robust deployment and generalizability. The RVL-CDIP corpus is the de facto standard benchmark for document classification, yet to our knowledge all studies that use this corpus do not include evaluation on out-of-distribution documents. In this paper, we curate and release a new out-of-distribution benchmark for evaluating out-of-distribution performance for document classifiers. Our new out-of-distribution benchmark consists of two types of documents: those that are not part of any of the 16 in-domain RVL-CDIP categories (RVL-CDIP-O), and those that are one of the 16 in-domain categories yet are drawn from a distribution different from that of the original RVL-CDIP dataset (RVL-CDIP-N). While prior work on document classification for in-domain RVL-CDIP documents reports high accuracy scores, we find that these models exhibit accuracy drops of between roughly 15-30% on our new out-of-domain RVL-CDIP-N benchmark, and further struggle to distinguish between in-domain RVL-CDIP-N and out-of-domain RVL-CDIP-O inputs. Our new benchmark provides researchers with a valuable new resource for analyzing out-of-distribution performance on document classifiers. Our new out-of-distribution data can be found at https://github.com/gxlarson/rvl-cdip-ood. △ Less

Submitted 18 January, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: NeurIPS D&B 2022

arXiv:2207.04843 [pdf]

Statistical Detection of Adversarial examples in Blockchain-based Federated Forest In-vehicle Network Intrusion Detection Systems

Authors: Ibrahim Aliyu, Selinde van Engelenburg, Muhammed Bashir Muazu, Jinsul Kim, Chang Gyoon Lim

Abstract: The internet-of-Vehicle (IoV) can facilitate seamless connectivity between connected vehicles (CV), autonomous vehicles (AV), and other IoV entities. Intrusion Detection Systems (IDSs) for IoV networks can rely on machine learning (ML) to protect the in-vehicle network from cyber-attacks. Blockchain-based Federated Forests (BFFs) could be used to train ML models based on data from IoV entities whi… ▽ More The internet-of-Vehicle (IoV) can facilitate seamless connectivity between connected vehicles (CV), autonomous vehicles (AV), and other IoV entities. Intrusion Detection Systems (IDSs) for IoV networks can rely on machine learning (ML) to protect the in-vehicle network from cyber-attacks. Blockchain-based Federated Forests (BFFs) could be used to train ML models based on data from IoV entities while protecting the confidentiality of the data and reducing the risks of tampering with the data. However, ML models created this way are still vulnerable to evasion, poisoning, and exploratory attacks using adversarial examples. This paper investigates the impact of various possible adversarial examples on the BFF-IDS. We proposed integrating a statistical detector to detect and extract unknown adversarial samples. By including the unknown detected samples into the dataset of the detector, we augment the BFF-IDS with an additional model to detect original known attacks and the new adversarial inputs. The statistical adversarial detector confidently detected adversarial examples at the sample size of 50 and 100 input samples. Furthermore, the augmented BFF-IDS (BFF-IDS(AUG)) successfully mitigates the adversarial examples with more than 96% accuracy. With this approach, the model will continue to be augmented in a sandbox whenever an adversarial sample is detected and subsequently adopt the BFF-IDS(AUG) as the active security model. Consequently, the proposed integration of the statistical adversarial detector and the subsequent augmentation of the BFF-IDS with detected adversarial samples provides a sustainable security framework against adversarial examples and other unknown attacks. △ Less

Submitted 11 July, 2022; originally announced July 2022.

arXiv:2204.11461 [pdf]

A Review of Research on Civic Technology: Definitions, Theories, History and Insights

Authors: Weiyu Zhang, Gionnieve Lim, Simon Perrault, Chuyao Wang

Abstract: There have been initiatives that take advantage of information and communication technologies to serve civic purposes, referred to as civic technologies (Civic Tech). In this paper, we present a review of 224 papers from the ACM Digital Library focusing on Computer Supported Cooperative Work and Human-Computer Interaction, the key fields supporting the building of Civic Tech. Through this review,… ▽ More There have been initiatives that take advantage of information and communication technologies to serve civic purposes, referred to as civic technologies (Civic Tech). In this paper, we present a review of 224 papers from the ACM Digital Library focusing on Computer Supported Cooperative Work and Human-Computer Interaction, the key fields supporting the building of Civic Tech. Through this review, we discuss the concepts, theories and history of civic tech research and provide insights on the technological tools, social processes and participation mechanisms involved. Our work seeks to direct future civic tech efforts to the phase of by the citizens. △ Less

Submitted 25 April, 2022; originally announced April 2022.

Comments: 33 pages, 12 figures

arXiv:2203.16056 [pdf, other]

Automatic Facial Skin Feature Detection for Everyone

Authors: Qian Zheng, Ankur Purwar, Heng Zhao, Guang Liang Lim, Ling Li, Debasish Behera, Qian Wang, Min Tan, Rizhao Cai, Jennifer Werner, Dennis Sng, Maurice van Steensel, Weisi Lin, Alex C Kot

Abstract: Automatic assessment and understanding of facial skin condition have several applications, including the early detection of underlying health problems, lifestyle and dietary treatment, skin-care product recommendation, etc. Selfies in the wild serve as an excellent data resource to democratize skin quality assessment, but suffer from several data collection challenges.The key to guaranteeing an ac… ▽ More Automatic assessment and understanding of facial skin condition have several applications, including the early detection of underlying health problems, lifestyle and dietary treatment, skin-care product recommendation, etc. Selfies in the wild serve as an excellent data resource to democratize skin quality assessment, but suffer from several data collection challenges.The key to guaranteeing an accurate assessment is accurate detection of different skin features. We present an automatic facial skin feature detection method that works across a variety of skin tones and age groups for selfies in the wild. To be specific, we annotate the locations of acne, pigmentation, and wrinkle for selfie images with different skin tone colors, severity levels, and lighting conditions. The annotation is conducted in a two-phase scheme with the help of a dermatologist to train volunteers for annotation. We employ Unet++ as the network architecture for feature detection. This work shows that the two-phase annotation scheme can robustly detect the accurate locations of acne, pigmentation, and wrinkle for selfie images with different ethnicities, skin tone colors, severity levels, age groups, and lighting conditions. △ Less

Submitted 30 March, 2022; originally announced March 2022.

Comments: Accepted by the conference of Electronic Imaging (EI) 2022

arXiv:2203.05787 [pdf, other]

Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection

Authors: Siyue Yu, Jimin Xiao, Bingfeng Zhang, Eng Gee Lim

Abstract: Co-salient object detection, with the target of detecting co-existed salient objects among a group of images, is gaining popularity. Recent works use the attention mechanism or extra information to aggregate common co-salient features, leading to incomplete even incorrect responses for target objects. In this paper, we aim to mine comprehensive co-salient features with democracy and reduce backgro… ▽ More Co-salient object detection, with the target of detecting co-existed salient objects among a group of images, is gaining popularity. Recent works use the attention mechanism or extra information to aggregate common co-salient features, leading to incomplete even incorrect responses for target objects. In this paper, we aim to mine comprehensive co-salient features with democracy and reduce background interference without introducing any extra information. To achieve this, we design a democratic prototype generation module to generate democratic response maps, covering sufficient co-salient regions and thereby involving more shared attributes of co-salient objects. Then a comprehensive prototype based on the response maps can be generated as a guide for final prediction. To suppress the noisy background information in the prototype, we propose a self-contrastive learning module, where both positive and negative pairs are formed without relying on additional classification information. Besides, we also design a democratic feature enhancement module to further strengthen the co-salient features by readjusting attention values. Extensive experiments show that our model obtains better performance than previous state-of-the-art methods, especially on challenging real-world cases (e.g., for CoCA, we obtain a gain of 2.0% for MAE, 5.4% for maximum F-measure, 2.3% for maximum E-measure, and 3.7% for S-measure) under the same settings. Code will be released soon. △ Less

Submitted 11 March, 2022; originally announced March 2022.

Comments: accepted by cvpr2022

arXiv:2202.05609 [pdf, other]

Non-Stop & Non-Breakable Code Review Services in a Distributed System: Detecting Issues in Real Time

Authors: Geunsik Lim, Yonghwi Kwon, Joonbae Park, Chul-Joo Kim

Abstract: The two most significant bottlenecks in code merging are the build process and the unit tests. However, as the number of items to be checked in a code review increases, that code review becomes a bottleneck for code merging as well. Because of the dependency structure between code review services, an error in one service affects the entire service. As a result, whenever a service error occurs, it… ▽ More The two most significant bottlenecks in code merging are the build process and the unit tests. However, as the number of items to be checked in a code review increases, that code review becomes a bottleneck for code merging as well. Because of the dependency structure between code review services, an error in one service affects the entire service. As a result, whenever a service error occurs, it is crucial to have methods for determining which code review service has ultimately caused the error. With the goal of achieving a non-stop & non-breakable code review service, this paper describes an early error detection method along with a case study of the service. △ Less

Submitted 11 February, 2022; originally announced February 2022.

Comments: 3 pages, 4 figures

arXiv:2107.10480 [pdf, other]

Unsupervised Detection of Adversarial Examples with Model Explanations

Authors: Gihyuk Ko, Gyumin Lim

Abstract: Deep Neural Networks (DNNs) have shown remarkable performance in a diverse range of machine learning applications. However, it is widely known that DNNs are vulnerable to simple adversarial perturbations, which causes the model to incorrectly classify inputs. In this paper, we propose a simple yet effective method to detect adversarial examples, using methods developed to explain the model's behav… ▽ More Deep Neural Networks (DNNs) have shown remarkable performance in a diverse range of machine learning applications. However, it is widely known that DNNs are vulnerable to simple adversarial perturbations, which causes the model to incorrectly classify inputs. In this paper, we propose a simple yet effective method to detect adversarial examples, using methods developed to explain the model's behavior. Our key observation is that adding small, humanly imperceptible perturbations can lead to drastic changes in the model explanations, resulting in unusual or irregular forms of explanations. From this insight, we propose an unsupervised detection of adversarial examples using reconstructor networks trained only on model explanations of benign examples. Our evaluations with MNIST handwritten dataset show that our method is capable of detecting adversarial examples generated by the state-of-the-art algorithms with high confidence. To the best of our knowledge, this work is the first in suggesting unsupervised defense method using model explanations. △ Less

Submitted 22 July, 2021; originally announced July 2021.

Comments: AdvML@KDD'21

arXiv:2107.05194 [pdf, other]

Monoscopic vs. Stereoscopic Views and Display Types in the Teleoperation of Unmanned Ground Vehicles for Object Avoidance

Authors: Yiming Luo, Jialin Wang, Hai-Ning Liang, Shan Luo, Eng Gee Lim

Abstract: Virtual reality (VR) head-mounted displays (HMD) have recently been used to provide an immersive, first-person vision/view in real-time for manipulating remotely-controlled unmanned ground vehicles (UGV). The teleoperation of UGV can be challenging for operators when it is done in real time. One big challenge is for operators to perceive quickly and rapidly the distance of objects that are around… ▽ More Virtual reality (VR) head-mounted displays (HMD) have recently been used to provide an immersive, first-person vision/view in real-time for manipulating remotely-controlled unmanned ground vehicles (UGV). The teleoperation of UGV can be challenging for operators when it is done in real time. One big challenge is for operators to perceive quickly and rapidly the distance of objects that are around the UGV while it is moving. In this research, we explore the use of monoscopic and stereoscopic views and display types (immersive and non-immersive VR) for operating vehicles remotely. We conducted two user studies to explore their feasibility and advantages. Results show a significantly better performance when using an immersive display with stereoscopic view for dynamic, real-time navigation tasks that require avoiding both moving and static obstacles. The use of stereoscopic view in an immersive display in particular improved user performance and led to better usability. △ Less

Submitted 12 July, 2021; originally announced July 2021.

arXiv:2107.04279 [pdf, other]

Fast Pixel-Matching for Video Object Segmentation

Authors: Siyue Yu, Jimin Xiao, BingFeng Zhang, Eng Gee Lim

Abstract: Video object segmentation, aiming to segment the foreground objects given the annotation of the first frame, has been attracting increasing attentions. Many state-of-the-art approaches have achieved great performance by relying on online model updating or mask-propagation techniques. However, most online models require high computational cost due to model fine-tuning during inference. Most mask-pr… ▽ More Video object segmentation, aiming to segment the foreground objects given the annotation of the first frame, has been attracting increasing attentions. Many state-of-the-art approaches have achieved great performance by relying on online model updating or mask-propagation techniques. However, most online models require high computational cost due to model fine-tuning during inference. Most mask-propagation based models are faster but with relatively low performance due to failure to adapt to object appearance variation. In this paper, we are aiming to design a new model to make a good balance between speed and performance. We propose a model, called NPMCA-net, which directly localizes foreground objects based on mask-propagation and non-local technique by matching pixels in reference and target frames. Since we bring in information of both first and previous frames, our network is robust to large object appearance variation, and can better adapt to occlusions. Extensive experiments show that our approach can achieve a new state-of-the-art performance with a fast speed at the same time (86.5% IoU on DAVIS-2016 and 72.2% IoU on DAVIS-2017, with speed of 0.11s per frame) under the same level comparison. Source code is available at https://github.com/siyueyu/NPMCA-net. △ Less

Submitted 9 July, 2021; originally announced July 2021.

Comments: Accepted by Signal Processing: Image Communication

arXiv:2106.04053 [pdf, other]

Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding

Authors: Mingjie Sun, Jimin Xiao, Eng Gee Lim, Si Liu, John Y. Goulermas

Abstract: In this paper, we are tackling the weakly-supervised referring expression grounding task, for the localization of a referent object in an image according to a query sentence, where the mapping between image regions and queries are not available during the training stage. In traditional methods, an object region that best matches the referring expression is picked out, and then the query sentence i… ▽ More In this paper, we are tackling the weakly-supervised referring expression grounding task, for the localization of a referent object in an image according to a query sentence, where the mapping between image regions and queries are not available during the training stage. In traditional methods, an object region that best matches the referring expression is picked out, and then the query sentence is reconstructed from the selected region, where the reconstruction difference serves as the loss for back-propagation. The existing methods, however, conduct both the matching and the reconstruction approximately as they ignore the fact that the matching correctness is unknown. To overcome this limitation, a discriminative triad is designed here as the basis to the solution, through which a query can be converted into one or multiple discriminative triads in a very scalable way. Based on the discriminative triad, we further propose the triad-level matching and reconstruction modules which are lightweight yet effective for the weakly-supervised training, making it three times lighter and faster than the previous state-of-the-art methods. One important merit of our work is its superior performance despite the simple and neat design. Specifically, the proposed method achieves a new state-of-the-art accuracy when evaluated on RefCOCO (39.21%), RefCOCO+ (39.18%) and RefCOCOg (43.24%) datasets, that is 4.17%, 4.08% and 7.8% higher than the previous one, respectively. △ Less

Submitted 7 June, 2021; originally announced June 2021.

Comments: TPAMI

arXiv:2103.05187 [pdf, other]

Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning

Authors: Mingjie Sun, Jimin Xiao, Eng Gee Lim

Abstract: In this paper, we are tackling the proposal-free referring expression grounding task, aiming at localizing the target object according to a query sentence, without relying on off-the-shelf object proposals. Existing proposal-free methods employ a query-image matching branch to select the highest-score point in the image feature map as the target box center, with its width and height predicted by a… ▽ More In this paper, we are tackling the proposal-free referring expression grounding task, aiming at localizing the target object according to a query sentence, without relying on off-the-shelf object proposals. Existing proposal-free methods employ a query-image matching branch to select the highest-score point in the image feature map as the target box center, with its width and height predicted by another branch. Such methods, however, fail to utilize the contextual relation between the target and reference objects, and lack interpretability on its reasoning procedure. To solve these problems, we propose an iterative shrinking mechanism to localize the target, where the shrinking direction is decided by a reinforcement learning agent, with all contents within the current image patch comprehensively considered. Beside, the sequential shrinking process enables to demonstrate the reasoning about how to iteratively find the target. Experiments show that the proposed method boosts the accuracy by 4.32% against the previous state-of-the-art (SOTA) method on the RefCOCOg dataset, where query sentences are long and complex, with many targets referred by other reference objects. △ Less

Submitted 8 March, 2021; originally announced March 2021.

arXiv:2101.10707 [pdf, other]

doi 10.1109/ICCE.2013.6487055

Enhancing Application Performance by Memory Partitioning in Android Platforms

Authors: Geunsik Lim, Changwoo Min, Young Ik Eom

Abstract: This paper suggests a new memory partitioning scheme that can enhance process lifecycle, while avoiding Low Memory Killer and Out-of-Memory Killer operations on mobile devices. Our proposed scheme offers the complete concept of virtual memory nodes in operating systems of Android devices. This paper suggests a new memory partitioning scheme that can enhance process lifecycle, while avoiding Low Memory Killer and Out-of-Memory Killer operations on mobile devices. Our proposed scheme offers the complete concept of virtual memory nodes in operating systems of Android devices. △ Less

Submitted 26 January, 2021; originally announced January 2021.

arXiv:2101.09360 [pdf, other]

doi 10.1145/2901318.2901320

BB: Booting Booster for Consumer Electronics with Modern OS

Authors: Geunsik Lim, MyungJoo Ham

Abstract: Unconventional computing platforms have spread widely and rapidly following smart phones and tablets: consumer electronics such as smart TVs and digital cameras. For such devices, fast booting is a critical requirement; waiting tens of seconds for a TV or a camera to boot up is not acceptable, unlike a PC or smart phone. Moreover, the software platforms of these devices have become as rich as conv… ▽ More Unconventional computing platforms have spread widely and rapidly following smart phones and tablets: consumer electronics such as smart TVs and digital cameras. For such devices, fast booting is a critical requirement; waiting tens of seconds for a TV or a camera to boot up is not acceptable, unlike a PC or smart phone. Moreover, the software platforms of these devices have become as rich as conventional computing devices to provide comparable services. As a result, the booting procedure to start every required OS service, hardware component, and application, the quantity of which is ever increasing, may take unbearable time for most consumers. To accelerate booting, this paper introduces \textit{Booting Booster} (BB), which is used in all 2015 Samsung Smart TV models, and which runs the Linux-based Tizen OS. BB addresses the init scheme of Linux, which launches initial user-space OS services and applications and manages the life cycles of all user processes, by identifying and isolating booting-critical tasks, deferring non-critical tasks, and enabling execution of more tasks in parallel. BB has been successfully deployed in Samsung Smart TV 2015 models achieving a cold boot in 3.5 s (compared to 8.1 s with full commercial-grade optimizations without BB) without the need for suspend-to-RAM or hibernation. After this successful deployment, we have released the source code via http://opensource.samsung.com, and BB will be included in the open-source OS, Tizen (http://tizen.org). △ Less

Submitted 21 January, 2021; originally announced January 2021.

arXiv:2101.09359 [pdf, other]

Load-Balancing for Improving User Responsiveness on Multicore Embedded Systems

Authors: Geunsik Lim, Changwoo Min, YoungIk Eom

Abstract: Most commercial embedded devices have been deployed with a single processor architecture. The code size and complexity of applications running on embedded devices are rapidly increasing due to the emergence of application business models such as Google Play Store and Apple App Store. As a result, a high-performance multicore CPUs have become a major trend in the embedded market as well as in the p… ▽ More Most commercial embedded devices have been deployed with a single processor architecture. The code size and complexity of applications running on embedded devices are rapidly increasing due to the emergence of application business models such as Google Play Store and Apple App Store. As a result, a high-performance multicore CPUs have become a major trend in the embedded market as well as in the personal computer market. Due to this trend, many device manufacturers have been able to adopt more attractive user interfaces and high-performance applications for better user experiences on the multicore systems. In this paper, we describe how to improve the real-time performance by reducing the user waiting time on multicore systems that use a partitioned per-CPU run queue scheduling technique. Rather than focusing on naive load-balancing scheme for equally balanced CPU usage, our approach tries to minimize the cost of task migration by considering the importance level of running tasks and to optimize per-CPU utilization on multicore embedded systems. Consequently, our approach improves the real-time characteristics such as cache efficiency, user responsiveness, and latency. Experimental results under heavy background stress show that our approach reduces the average scheduling latency of an urgent task by 2.3 times. △ Less

Submitted 20 January, 2021; originally announced January 2021.

arXiv:2101.09284 [pdf]

doi 10.1109/ICSESS.2014.6933553

User-Level Memory Scheduler for Optimizing Application Performance in NUMA-Based Multicore Systems

Authors: Geunsik Lim, Sang-Bum Suh

Abstract: Multicore CPU architectures have been established as a structure for general-purpose systems for high-performance processing of applications. Recent multicore CPU has evolved as a system architecture based on non-uniform memory architecture. For the technique of using the kernel space that shifts the tasks to the ideal memory node, the characteristics of the applications of the user-space cannot b… ▽ More Multicore CPU architectures have been established as a structure for general-purpose systems for high-performance processing of applications. Recent multicore CPU has evolved as a system architecture based on non-uniform memory architecture. For the technique of using the kernel space that shifts the tasks to the ideal memory node, the characteristics of the applications of the user-space cannot be considered. Therefore, kernel level approaches cannot execute memory scheduling to recognize the importance of user applications. Moreover, users need to run applications after sufficiently understanding the multicore CPU based on non-uniform memory architecture to ensure the high performance of the user's applications. This paper presents a user-space memory scheduler that allocates the ideal memory node for tasks by monitoring the characteristics of non-uniform memory architecture. From our experiment, the proposed system improved the performance of the application by up to 25% compared to the existing system. △ Less

Submitted 21 January, 2021; originally announced January 2021.

arXiv:2101.08891 [pdf]

doi 10.1109/TALE.2014.7062593

Cloud-Based Content Cooperation System to Assist Collaborative Learning Environment

Authors: Geunsik Lim, Donghwa Lee, Sang-Bum Suh

Abstract: Online educational systems running on smart devices have the advantage of allowing users to learn online regardless of the location of the users. In particular, data synchronization enables users to cooperate on contents in real time anywhere by sharing their files via cloud storage. However, users cannot collaborate by simultaneously modifying files that are shared with each other. In this paper,… ▽ More Online educational systems running on smart devices have the advantage of allowing users to learn online regardless of the location of the users. In particular, data synchronization enables users to cooperate on contents in real time anywhere by sharing their files via cloud storage. However, users cannot collaborate by simultaneously modifying files that are shared with each other. In this paper, we propose a content collaboration method and a history management technique that are based on distributed system structure and can synchronize data shared in the cloud for multiple users and multiple devices. △ Less

Submitted 21 January, 2021; originally announced January 2021.

arXiv:2101.08889 [pdf, other]

doi 10.1109/ICCE.2019.8662017

TAOS-CI: Lightweight & Modular Continuous Integration System for Edge Computing

Authors: Geunsik Lim, MyungJoo Ham, Jijoong Moon, Wook Song, Sangjung Woo, Sewon Oh

Abstract: With the proliferation of IoT and edge devices, we are observing a lot of consumer electronics becoming yet another IoT and edge devices. Unlike traditional smart devices, such as smart phones, consumer electronics, in general, have significant diversities with fewer number of devices per product model. With such high diversities, the proliferation of edge devices requires frequent and seamless up… ▽ More With the proliferation of IoT and edge devices, we are observing a lot of consumer electronics becoming yet another IoT and edge devices. Unlike traditional smart devices, such as smart phones, consumer electronics, in general, have significant diversities with fewer number of devices per product model. With such high diversities, the proliferation of edge devices requires frequent and seamless updates of consumer electronics, which makes the manufacturers prone to regressions because the manufacturers have less resource per an instance of software release; i.e., they need to repeat releases by the number of product models times the number of updates. Continuous Integration (CI) systems can help prevent regression bugs from actively developing software packages including the frequently updated device software platforms. The proposed CI system provides a portable and modular software platform automatically inspecting potential issues of incoming changes with the enabled modules: code format and style, performance regressions, static checks on the source code, build and packaging tests, and dynamic checks with the built binary before deploying a platform image on the IoT and edge devices. Besides, our proposed approach is lightweight enough to be hosted in normal desktop computers even for dozens of developers. As a result, it can be easily applied to a lot of various source code repositories. Evaluation results demonstrate that the proposed method drastically improves plug-ins execution time and memory consumption, compared with methods in previous studies. △ Less

Submitted 21 January, 2021; originally announced January 2021.

Comments: arXiv admin note: text overlap with arXiv:2101.07961

arXiv:2101.08887 [pdf]

doi 10.1109/BIGCOMP.2014.6741419

Distributed Compilation System for High-Speed Software Build Processes

Authors: Geunsik Lim, Minho Lee, R. J. W. E. Lahaye, Young Ik Eom

Abstract: The idle time of personal computers has increased steadily due to the generalization of computer usage and cloud computing. Clustering research aims at utilizing idle computer resources for processing a variable workload on a large number of computers. The workload is processed continually despite the volatile status of the individual computer resources. This paper proposes a distributed compilati… ▽ More The idle time of personal computers has increased steadily due to the generalization of computer usage and cloud computing. Clustering research aims at utilizing idle computer resources for processing a variable workload on a large number of computers. The workload is processed continually despite the volatile status of the individual computer resources. This paper proposes a distributed compilation system for improving the processing speed of CPU-intensive software compilations. This significantly reduces the compilation time of mass sources by using the idle resources. We expect gains of up to 65% compared to non-distributed compilation systems. △ Less

Submitted 21 January, 2021; originally announced January 2021.

arXiv:2101.08885 [pdf]

doi 10.1109/GCCE.2013.6664780

User-Aware Power Management for Mobile Devices

Authors: Geunsik Lim, Changwoo Min, Dong Hyun Kang, Young Ik Eom

Abstract: The power management techniques to extend battery lifespan is becoming increasingly important due to longer user applications' running time in mobile devices. Even when users do not use any applications, battery lifespan decreases continually. It occurs because of service daemons of mobile platform and network-based data synchronization operations. In this paper, we propose a new power management… ▽ More The power management techniques to extend battery lifespan is becoming increasingly important due to longer user applications' running time in mobile devices. Even when users do not use any applications, battery lifespan decreases continually. It occurs because of service daemons of mobile platform and network-based data synchronization operations. In this paper, we propose a new power management system that recognizes the idle time of the device to reduce the battery consumption of mobile devices. △ Less

Submitted 21 January, 2021; originally announced January 2021.

arXiv:2101.08877 [pdf]

doi 10.1109/TCE.2013.6689690

Virtual Memory Partitioning for Enhancing Application Performance in Mobile Platforms

Authors: Geunsik Lim, Changwoo Min, Young Ik Eom

Abstract: Recently, the amount of running software on smart mobile devices is gradually increasing due to the introduction of application stores. The application store is a type of digital distribution platform for application software, which is provided as a component of an operating system on a smartphone or tablet. Mobile devices have limited memory capacity and, unlike server and desktop systems, due to… ▽ More Recently, the amount of running software on smart mobile devices is gradually increasing due to the introduction of application stores. The application store is a type of digital distribution platform for application software, which is provided as a component of an operating system on a smartphone or tablet. Mobile devices have limited memory capacity and, unlike server and desktop systems, due to their mobility they do not have a memory slot that can expand the memory capacity. Low memory killer (LMK) and out-of-memory killer (OOMK) are widely used memory management solutions in mobile systems. They forcibly terminate applications when the available physical memory becomes insufficient. In addition, before the forced termination, the memory shortage incurs thrashing and fragmentation, thus slowing down application performance. Although the existing page reclamation mechanism is designed to secure available memory, it could seriously degrade user responsiveness due to the thrashing. Memory management is therefore still important especially in mobile devices with small memory capacity. This paper presents a new memory partitioning technique that resolves the deterioration of the existing application life cycle induced by LMK and OOMK. It provides a completely isolated virtual memory node at the operating system level. Evaluation results demonstrate that the proposed method improves application execution time under memory shortage, compared with methods in previous studies. △ Less

Submitted 21 January, 2021; originally announced January 2021.

Showing 1–50 of 66 results for author: Lim, G