subscribe to arXiv mailings

Darkverse -- A New DarkWeb?

Authors: Raymond Chan, Benjamin W. J. Kwok, Adriel Yeo, Kan Chen, Jeannie S. Lee

Abstract: The "Darkverse" could be the negative harmful area of the Metaverse; a new virtual immersive environment for the facilitation of illicit activity such as misinformation, fraud, harassment, and illegal marketplaces. This paper explores the potential for inappropriate activities within the Metaverse, and the similarities between the Darkverse and the Dark Web. Challenges and future directions for in… ▽ More The "Darkverse" could be the negative harmful area of the Metaverse; a new virtual immersive environment for the facilitation of illicit activity such as misinformation, fraud, harassment, and illegal marketplaces. This paper explores the potential for inappropriate activities within the Metaverse, and the similarities between the Darkverse and the Dark Web. Challenges and future directions for investigation are also discussed, including user identification, creation of privacy-preserving frameworks and other data monitoring methods. △ Less

Submitted 23 April, 2024; originally announced May 2024.

Comments: This is an accepted position statement of CHI 2024 Workshop (Novel Approaches for Understanding and Mitigating Emerging New Harms in Immersive and Embodied Virtual Spaces: A Workshop at CHI 2024)

arXiv:2404.12563 [pdf, other]

Teaching Linguistic Justice through Augmented Reality

Authors: Ashvini Varatharaj, Abigail Welch, Mary Bucholtz, Jin Sook Lee

Abstract: This position paper presents the AR Language Map, a speculative artifact designed to enhance understanding of linguistic justice among middle and high school students through augmented reality (AR) that allows students to map their linguistic experiences. Through a social justice-oriented academic outreach program aimed at linguistically, economically, and racially minoritized students, academic c… ▽ More This position paper presents the AR Language Map, a speculative artifact designed to enhance understanding of linguistic justice among middle and high school students through augmented reality (AR) that allows students to map their linguistic experiences. Through a social justice-oriented academic outreach program aimed at linguistically, economically, and racially minoritized students, academic concepts on language, culture, race, and power are introduced to California middle school and high school students. The curriculum has activities for each lesson plan drawn from students' culturally relevant experiences. By enabling interactive exploration of linguistic justice, this tool aims to foster empathy, challenge linguistic racism, and valorize linguistic diversity. We discuss its conceptualization within the broader context of AR in social justice education. The AR Language Map not only deepens students' understanding of these critical issues but also enables them to become co-creators of their learning experiences. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: Presented at CHI 2024 (arXiv:2404.05889)

Report number: ARSJ/2024/05

arXiv:2404.03105 [pdf, other]

Methodology for Interpretable Reinforcement Learning for Optimizing Mechanical Ventilation

Authors: Joo Seung Lee, Malini Mahendra, Anil Aswani

Abstract: Mechanical ventilation is a critical life-support intervention that uses a machine to deliver controlled air and oxygen to a patient's lungs, assisting or replacing spontaneous breathing. While several data-driven approaches have been proposed to optimize ventilator control strategies, they often lack interpretability and agreement with general domain knowledge. This paper proposes a methodology f… ▽ More Mechanical ventilation is a critical life-support intervention that uses a machine to deliver controlled air and oxygen to a patient's lungs, assisting or replacing spontaneous breathing. While several data-driven approaches have been proposed to optimize ventilator control strategies, they often lack interpretability and agreement with general domain knowledge. This paper proposes a methodology for interpretable reinforcement learning (RL) using decision trees for mechanical ventilation control. Using a causal, nonparametric model-based off-policy evaluation, we evaluate the policies in their ability to gain increases in SpO2 while avoiding aggressive ventilator settings which are known to cause ventilator induced lung injuries and other complications. Numerical experiments using MIMIC-III data on the stays of real patients' intensive care unit stays demonstrate that the decision tree policy outperforms the behavior cloning policy and is comparable to state-of-the-art RL policy. Future work concerns better aligning the cost function with medical objectives to generate deeper clinical insights. △ Less

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.07592 [pdf, other]

Accurate Spatial Gene Expression Prediction by integrating Multi-resolution features

Authors: Youngmin Chung, Ji Hun Ha, Kyeong Chan Im, Joo Sang Lee

Abstract: Recent advancements in Spatial Transcriptomics (ST) technology have facilitated detailed gene expression analysis within tissue contexts. However, the high costs and methodological limitations of ST necessitate a more robust predictive model. In response, this paper introduces TRIPLEX, a novel deep learning framework designed to predict spatial gene expression from Whole Slide Images (WSIs). TRIPL… ▽ More Recent advancements in Spatial Transcriptomics (ST) technology have facilitated detailed gene expression analysis within tissue contexts. However, the high costs and methodological limitations of ST necessitate a more robust predictive model. In response, this paper introduces TRIPLEX, a novel deep learning framework designed to predict spatial gene expression from Whole Slide Images (WSIs). TRIPLEX uniquely harnesses multi-resolution features, capturing cellular morphology at individual spots, the local context around these spots, and the global tissue organization. By integrating these features through an effective fusion strategy, TRIPLEX achieves accurate gene expression prediction. Our comprehensive benchmark study, conducted on three public ST datasets and supplemented with Visium data from 10X Genomics, demonstrates that TRIPLEX outperforms current state-of-the-art models in Mean Squared Error (MSE), Mean Absolute Error (MAE), and Pearson Correlation Coefficient (PCC). The model's predictions align closely with ground truth gene expression profiles and tumor annotations, underscoring TRIPLEX's potential in advancing cancer diagnosis and treatment. △ Less

Submitted 25 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2311.09354 [pdf]

doi 10.1063/5.0189222

Nondestructive, quantitative viability analysis of 3D tissue cultures using machine learning image segmentation

Authors: Kylie J. Trettner, Jeremy Hsieh, Weikun Xiao, Jerry S. H. Lee, Andrea M. Armani

Abstract: Ascertaining the collective viability of cells in different cell culture conditions has typically relied on averaging colorimetric indicators and is often reported out in simple binary readouts. Recent research has combined viability assessment techniques with image-based deep-learning models to automate the characterization of cellular properties. However, further development of viability measure… ▽ More Ascertaining the collective viability of cells in different cell culture conditions has typically relied on averaging colorimetric indicators and is often reported out in simple binary readouts. Recent research has combined viability assessment techniques with image-based deep-learning models to automate the characterization of cellular properties. However, further development of viability measurements to assess the continuity of possible cellular states and responses to perturbation across cell culture conditions is needed. In this work, we demonstrate an image processing algorithm for quantifying cellular viability in 3D cultures without the need for assay-based indicators. We show that our algorithm performs similarly to a pair of human experts in whole-well images over a range of days and culture matrix compositions. To demonstrate potential utility, we perform a longitudinal study investigating the impact of a known therapeutic on pancreatic cancer spheroids. Using images taken with a high content imaging system, the algorithm successfully tracks viability at the individual spheroid and whole-well level. The method we propose reduces analysis time by 97% in comparison to the experts. Because the method is independent of the microscope or imaging system used, this approach lays the foundation for accelerating progress in and for improving the robustness and reproducibility of 3D culture analysis across biological and clinical research. △ Less

Submitted 11 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 52 total pages, Main text and SI included, 35 figures (5 main text, 30 supplemental), 9 tables, 6 datasets (provided on linked GitHub), linked image files on Zenodo

arXiv:2310.15747 [pdf, other]

Large Language Models are Temporal and Causal Reasoners for Video Question Answering

Authors: Dohwan Ko, Ji Soo Lee, Wooyoung Kang, Byungseok Roh, Hyunwoo J. Kim

Abstract: Large Language Models (LLMs) have shown remarkable performances on a wide range of natural language understanding and generation tasks. We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and causal reasoning in Video Question Answering (VideoQA). However, such priors often cause suboptimal results on VideoQA by leading the model to over-rel… ▽ More Large Language Models (LLMs) have shown remarkable performances on a wide range of natural language understanding and generation tasks. We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and causal reasoning in Video Question Answering (VideoQA). However, such priors often cause suboptimal results on VideoQA by leading the model to over-rely on questions, $\textit{i.e.}$, $\textit{linguistic bias}$, while ignoring visual content. This is also known as `ungrounded guesses' or `hallucinations'. To address this problem while leveraging LLMs' prior on VideoQA, we propose a novel framework, Flipped-VQA, encouraging the model to predict all the combinations of $\langle$V, Q, A$\rangle$ triplet by flipping the source pair and the target label to understand their complex relationships, $\textit{i.e.}$, predict A, Q, and V given a VQ, VA, and QA pairs, respectively. In this paper, we develop LLaMA-VQA by applying Flipped-VQA to LLaMA, and it outperforms both LLMs-based and non-LLMs-based models on five challenging VideoQA benchmarks. Furthermore, our Flipped-VQA is a general framework that is applicable to various LLMs (OPT and GPT-J) and consistently improves their performances. We empirically demonstrate that Flipped-VQA not only enhances the exploitation of linguistic shortcuts but also mitigates the linguistic bias, which causes incorrect answers over-relying on the question. Code is available at https://github.com/mlvlab/Flipped-VQA. △ Less

Submitted 6 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted paper at EMNLP 2023 Main

arXiv:2308.09363 [pdf, other]

Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models

Authors: Dohwan Ko, Ji Soo Lee, Miso Choi, Jaewon Chu, Jihwan Park, Hyunwoo J. Kim

Abstract: Video Question Answering (VideoQA) is a challenging task that entails complex multi-modal reasoning. In contrast to multiple-choice VideoQA which aims to predict the answer given several options, the goal of open-ended VideoQA is to answer questions without restricting candidate answers. However, the majority of previous VideoQA models formulate open-ended VideoQA as a classification task to class… ▽ More Video Question Answering (VideoQA) is a challenging task that entails complex multi-modal reasoning. In contrast to multiple-choice VideoQA which aims to predict the answer given several options, the goal of open-ended VideoQA is to answer questions without restricting candidate answers. However, the majority of previous VideoQA models formulate open-ended VideoQA as a classification task to classify the video-question pairs into a fixed answer set, i.e., closed-vocabulary, which contains only frequent answers (e.g., top-1000 answers). This leads the model to be biased toward only frequent answers and fail to generalize on out-of-vocabulary answers. We hence propose a new benchmark, Open-vocabulary Video Question Answering (OVQA), to measure the generalizability of VideoQA models by considering rare and unseen answers. In addition, in order to improve the model's generalization power, we introduce a novel GNN-based soft verbalizer that enhances the prediction on rare and unseen answers by aggregating the information from their similar words. For evaluation, we introduce new baselines by modifying the existing (closed-vocabulary) open-ended VideoQA models and improve their performances by further taking into account rare and unseen answers. Our ablation studies and qualitative analyses demonstrate that our GNN-based soft verbalizer further improves the model performance, especially on rare and unseen answers. We hope that our benchmark OVQA can serve as a guide for evaluating the generalizability of VideoQA models and inspire future research. Code is available at https://github.com/mlvlab/OVQA. △ Less

Submitted 18 August, 2023; originally announced August 2023.

Comments: Accepted paper at ICCV 2023

arXiv:2306.03324 [pdf, other]

Impact of Large Language Models on Generating Software Specifications

Authors: Danning Xie, Byungwoo Yoo, Nan Jiang, Mijung Kim, Lin Tan, Xiangyu Zhang, Judy S. Lee

Abstract: Software specifications are essential for ensuring the reliability of software systems. Existing specification extraction approaches, however, suffer from limited generalizability and require manual efforts. The recent emergence of Large Language Models (LLMs), which have been successfully applied to numerous software engineering tasks, offers a promising avenue for automating this process. In thi… ▽ More Software specifications are essential for ensuring the reliability of software systems. Existing specification extraction approaches, however, suffer from limited generalizability and require manual efforts. The recent emergence of Large Language Models (LLMs), which have been successfully applied to numerous software engineering tasks, offers a promising avenue for automating this process. In this paper, we conduct the first empirical study to evaluate the capabilities of LLMs for generating software specifications from software comments or documentation. We evaluate LLMs' performance with Few Shot Learning (FSL), enabling LLMs to generalize from a small number of examples, as well as different prompt construction strategies, and compare the performance of LLMs with traditional approaches. Additionally, we conduct a comparative diagnosis of the failure cases from both LLMs and traditional methods, identifying their unique strengths and weaknesses. Lastly, we conduct extensive experiments on 15 state of the art LLMs, evaluating their performance and cost effectiveness for generating software specifications. Our results show that with FSL, LLMs outperform traditional methods (by 5.6%), and more sophisticated prompt construction strategies can further enlarge this performance gap (up to 5.1 to 10.0%). Yet, LLMs suffer from their unique challenges, such as ineffective prompts and the lack of domain knowledge, which together account for 53 to 60% of LLM unique failures. The strong performance of open source models (e.g., StarCoder) makes closed source models (e.g., GPT 3 Davinci) less desirable due to size and cost. Our study offers valuable insights for future research to improve specification generation. △ Less

Submitted 2 October, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

arXiv:2301.07740 [pdf, other]

doi 10.1109/MMUL.2022.3217627

The Metaverse from a Multimedia Communications Perspective

Authors: Haiwei Dong, Jeannie S. A. Lee

Abstract: eXtended reality (XR) technologies such as virtual reality and 360° stereoscopic streaming enable the concept of the Metaverse, an immersive virtual space for collaboration and interaction. To ensure high fidelity display of immersive media, the bandwidth, latency and network traffic patterns will need to be considered to ensure a user's Quality of Experience (QoE). In this article, examples and c… ▽ More eXtended reality (XR) technologies such as virtual reality and 360° stereoscopic streaming enable the concept of the Metaverse, an immersive virtual space for collaboration and interaction. To ensure high fidelity display of immersive media, the bandwidth, latency and network traffic patterns will need to be considered to ensure a user's Quality of Experience (QoE). In this article, examples and calculations are explored to demonstrate the requirements of the abovementioned parameters. Additionally, future methods such as network-awareness using reinforcement learning (RL) and XR content awareness using spatial or temporal difference in the frames could be explored from a multimedia communications perspective. △ Less

Submitted 18 January, 2023; originally announced January 2023.

Journal ref: IEEE Multimedia Magazine, vol. 29, no. 4, pp. 123-127, 2022

arXiv:2209.00783 [pdf, other]

doi 10.1109/NTMS49979.2021.9432673

TypoSwype: An Imaging Approach to Detect Typo-Squatting

Authors: Joon Sern Lee, Yam Gui Peng David

Abstract: Typo-squatting domains are a common cyber-attack technique. It involves utilising domain names, that exploit possible typographical errors of commonly visited domains, to carry out malicious activities such as phishing, malware installation, etc. Current approaches typically revolve around string comparison algorithms like the Demaru-Levenschtein Distance (DLD) algorithm. Such techniques do not ta… ▽ More Typo-squatting domains are a common cyber-attack technique. It involves utilising domain names, that exploit possible typographical errors of commonly visited domains, to carry out malicious activities such as phishing, malware installation, etc. Current approaches typically revolve around string comparison algorithms like the Demaru-Levenschtein Distance (DLD) algorithm. Such techniques do not take into account keyboard distance, which researchers find to have a strong correlation with typical typographical errors and are trying to take account of. In this paper, we present the TypoSwype framework which converts strings to images that take into account keyboard location innately. We also show how modern state of the art image recognition techniques involving Convolutional Neural Networks, trained via either Triplet Loss or NT-Xent Loss, can be applied to learn a mapping to a lower dimensional space where distances correspond to image, and equivalently, textual similarity. Finally, we also demonstrate our method's ability to improve typo-squatting detection over the widely used DLD algorithm, while maintaining the classification accuracy as to which domain the input domain was attempting to typo-squat. △ Less

Submitted 1 September, 2022; originally announced September 2022.

Journal ref: 2021 11th IFIP International Conference on New Technologies, Mobility and Security (NTMS)

arXiv:2209.00782 [pdf, other]

BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec

Authors: Joon Sern Lee, Kai Keng Tay, Zong Fu Chua

Abstract: Rapid digitalisation spurred by the Covid-19 pandemic has resulted in more cyber crime. Malware-as-a-service is now a booming business for cyber criminals. With the surge in malware activities, it is vital for cyber defenders to understand more about the malware samples they have at hand as such information can greatly influence their next course of actions during a breach. Recently, researchers h… ▽ More Rapid digitalisation spurred by the Covid-19 pandemic has resulted in more cyber crime. Malware-as-a-service is now a booming business for cyber criminals. With the surge in malware activities, it is vital for cyber defenders to understand more about the malware samples they have at hand as such information can greatly influence their next course of actions during a breach. Recently, researchers have shown how malware family classification can be done by first converting malware binaries into grayscale images and then passing them through neural networks for classification. However, most work focus on studying the impact of different neural network architectures on classification performance. In the last year, researchers have shown that augmenting supervised learning with self-supervised learning can improve performance. Even more recently, Data2Vec was proposed as a modality agnostic self-supervised framework to train neural networks. In this paper, we present BinImg2Vec, a framework of training malware binary image classifiers that incorporates both self-supervised learning and supervised learning to produce a model that consistently outperforms one trained only via supervised learning. We were able to achieve a 4% improvement in classification performance and a 0.5% reduction in performance variance over multiple runs. We also show how our framework produces embeddings that can be well clustered, facilitating model explanability. △ Less

Submitted 1 September, 2022; originally announced September 2022.

Journal ref: 1st International Conference on AI in Cybersecurity (ICAIC), 2022

arXiv:2205.09185 [pdf, other]

doi 10.1016/j.nima.2022.167748

AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider

Authors: C. Fanelli, Z. Papandreou, K. Suresh, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann , et al. (258 additional authors not shown)

Abstract: The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to… ▽ More The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to leverage Artificial Intelligence (AI) already starting from the design and R&D phases. The EIC Comprehensive Chromodynamics Experiment (ECCE) is a consortium that proposed a detector design based on a 1.5T solenoid. The EIC detector proposal review concluded that the ECCE design will serve as the reference design for an EIC detector. Herein we describe a comprehensive optimization of the ECCE tracker using AI. The work required a complex parametrization of the simulated detector system. Our approach dealt with an optimization problem in a multidimensional design space driven by multiple objectives that encode the detector performance, while satisfying several mechanical constraints. We describe our strategy and show results obtained for the ECCE tracking system. The AI-assisted design is agnostic to the simulation framework and can be extended to other sub-detectors or to a system of sub-detectors to further optimize the performance of the EIC detector. △ Less

Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: 16 pages, 18 figures, 2 appendices, 3 tables

arXiv:2106.03797 [pdf, other]

Drone-based AI and 3D Reconstruction for Digital Twin Augmentation

Authors: Alex To, Maican Liu, Muhammad Hazeeq Bin Muhammad Hairul, Joseph G. Davis, Jeannie S. A. Lee, Henrik Hesse, Hoang D. Nguyen

Abstract: Digital Twin is an emerging technology at the forefront of Industry 4.0, with the ultimate goal of combining the physical space and the virtual space. To date, the Digital Twin concept has been applied in many engineering fields, providing useful insights in the areas of engineering design, manufacturing, automation, and construction industry. While the nexus of various technologies opens up new o… ▽ More Digital Twin is an emerging technology at the forefront of Industry 4.0, with the ultimate goal of combining the physical space and the virtual space. To date, the Digital Twin concept has been applied in many engineering fields, providing useful insights in the areas of engineering design, manufacturing, automation, and construction industry. While the nexus of various technologies opens up new opportunities with Digital Twin, the technology requires a framework to integrate the different technologies, such as the Building Information Model used in the Building and Construction industry. In this work, an Information Fusion framework is proposed to seamlessly fuse heterogeneous components in a Digital Twin framework from the variety of technologies involved. This study aims to augment Digital Twin in buildings with the use of AI and 3D reconstruction empowered by unmanned aviation vehicles. We proposed a drone-based Digital Twin augmentation framework with reusable and customisable components. A proof of concept is also developed, and extensive evaluation is conducted for 3D reconstruction and applications of AI for defect detection. △ Less

Submitted 19 May, 2021; originally announced June 2021.

arXiv:2104.11364 [pdf]

A field guide to cultivating computational biology

Authors: Anne E Carpenter, Casey S Greene, Piero Carnici, Benilton S Carvalho, Michiel de Hoon, Stacey Finley, Kim-Anh Le Cao, Jerry SH Lee, Luigi Marchionni, Suzanne Sindi, Fabian J Theis, Gregory P Way, Jean YH Yang, Elana J Fertig

Abstract: Biomedical research centers can empower basic discovery and novel therapeutic strategies by leveraging their large-scale datasets from experiments and patients. This data, together with new technologies to create and analyze it, has ushered in an era of data-driven discovery which requires moving beyond the traditional individual, single-discipline investigator research model. This interdisciplina… ▽ More Biomedical research centers can empower basic discovery and novel therapeutic strategies by leveraging their large-scale datasets from experiments and patients. This data, together with new technologies to create and analyze it, has ushered in an era of data-driven discovery which requires moving beyond the traditional individual, single-discipline investigator research model. This interdisciplinary niche is where computational biology thrives. It has matured over the past three decades and made major contributions to scientific knowledge and human health, yet researchers in the field often languish in career advancement, publication, and grant review. We propose solutions for individual scientists, institutions, journal publishers, funding agencies, and educators. △ Less

Submitted 22 April, 2021; originally announced April 2021.

arXiv:2006.13742 [pdf, other]

doi 10.1109/CCCI49893.2020.9256804

PhishGAN: Data Augmentation and Identification of Homoglpyh Attacks

Authors: Joon Sern Lee, Gui Peng David Yam, Jin Hao Chan

Abstract: Homoglyph attacks are a common technique used by hackers to conduct phishing. Domain names or links that are visually similar to actual ones are created via punycode to obfuscate the attack, making the victim more susceptible to phishing. For example, victims may mistake "|inkedin.com" for "linkedin.com" and in the process, divulge personal details to the fake website. Current State of The Art (SO… ▽ More Homoglyph attacks are a common technique used by hackers to conduct phishing. Domain names or links that are visually similar to actual ones are created via punycode to obfuscate the attack, making the victim more susceptible to phishing. For example, victims may mistake "|inkedin.com" for "linkedin.com" and in the process, divulge personal details to the fake website. Current State of The Art (SOTA) typically make use of string comparison algorithms (e.g. Levenshtein Distance), which are computationally heavy. One reason for this is the lack of publicly available datasets thus hindering the training of more advanced Machine Learning (ML) models. Furthermore, no one font is able to render all types of punycode correctly, posing a significant challenge to the creation of a dataset that is unbiased toward any particular font. This coupled with the vast number of internet domains pose a challenge in creating a dataset that can capture all possible variations. Here, we show how a conditional Generative Adversarial Network (GAN), PhishGAN, can be used to generate images of hieroglyphs, conditioned on non-homoglpyh input text images. Practical changes to current SOTA were required to facilitate the generation of more varied homoglyph text-based images. We also demonstrate a workflow of how PhishGAN together with a Homoglyph Identifier (HI) model can be used to identify the domain the homoglyph was trying to imitate. Furthermore, we demonstrate how PhishGAN's ability to generate datasets on the fly facilitate the quick adaptation of cybersecurity systems to detect new threats as they emerge. △ Less

Submitted 28 September, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

Comments: 8 pages, 8 figures

arXiv:2006.12246 [pdf, other]

Pain Intensity Estimation from Mobile Video Using 2D and 3D Facial Keypoints

Authors: Matthew Lee, Lyndon Kennedy, Andreas Girgensohn, Lynn Wilcox, John Song En Lee, Chin Wen Tan, Ban Leong Sng

Abstract: Managing post-surgical pain is critical for successful surgical outcomes. One of the challenges of pain management is accurately assessing the pain level of patients. Self-reported numeric pain ratings are limited because they are subjective, can be affected by mood, and can influence the patient's perception of pain when making comparisons. In this paper, we introduce an approach that analyzes 2D… ▽ More Managing post-surgical pain is critical for successful surgical outcomes. One of the challenges of pain management is accurately assessing the pain level of patients. Self-reported numeric pain ratings are limited because they are subjective, can be affected by mood, and can influence the patient's perception of pain when making comparisons. In this paper, we introduce an approach that analyzes 2D and 3D facial keypoints of post-surgical patients to estimate their pain intensity level. Our approach leverages the previously unexplored capabilities of a smartphone to capture a dense 3D representation of a person's face as input for pain intensity level estimation. Our contributions are adata collection study with post-surgical patients to collect ground-truth labeled sequences of 2D and 3D facial keypoints for developing a pain estimation algorithm, a pain estimation model that uses multiple instance learning to overcome inherent limitations in facial keypoint sequences, and the preliminary results of the pain estimation model using 2D and 3D features with comparisons of alternate approaches. △ Less

Submitted 16 June, 2020; originally announced June 2020.

arXiv:2003.02955 [pdf, other]

Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System

Authors: Seid Muhie Yimam, Gopalakrishnan Venkatesh, John Sie Yuen Lee, Chris Biemann

Abstract: We present the first approach to automatically building resources for academic writing. The aim is to build a writing aid system that automatically edits a text so that it better adheres to the academic style of writing. On top of existing academic resources, such as the Corpus of Contemporary American English (COCA) academic Word List, the New Academic Word List, and the Academic Collocation List… ▽ More We present the first approach to automatically building resources for academic writing. The aim is to build a writing aid system that automatically edits a text so that it better adheres to the academic style of writing. On top of existing academic resources, such as the Corpus of Contemporary American English (COCA) academic Word List, the New Academic Word List, and the Academic Collocation List, we also explore how to dynamically build such resources that would be used to automatically identify informal or non-academic words or phrases. The resources are compiled using different generic approaches that can be extended for different domains and languages. We describe the evaluation of resources with a system implementation. The system consists of an informal word identification (IWI), academic candidate paraphrase generation, and paraphrase ranking components. To generate candidates and rank them in context, we have used the PPDB and WordNet paraphrase resources. We use the Concepts in Context (CoInCO) "All-Words" lexical substitution dataset both for the informal word identification and paraphrase generation experiments. Our informal word identification component achieves an F-1 score of 82%, significantly outperforming a stratified classifier baseline. The main contribution of this work is a domain-independent methodology to build targeted resources for writing aids. △ Less

Submitted 5 March, 2020; originally announced March 2020.

arXiv:1904.04354 [pdf, other]

Relational Reasoning Network (RRN) for Anatomical Landmarking

Authors: Neslisah Torosdagli, Syed Anwar, Payal Verma, Denise K Liberton, Janice S. Lee, Wade W. Han, Ulas Bagci

Abstract: Purpose: We perform anatomical landmarking for craniomaxillofacial (CMF) bones without explicitly segmenting them. Towards this, we propose a new simple yet efficient deep network architecture, called \textit{relational reasoning network (RRN)}, to accurately learn the local and the global relations among the landmarks in CMF bones; specifically, mandible, maxilla, and nasal bones. Approach: The… ▽ More Purpose: We perform anatomical landmarking for craniomaxillofacial (CMF) bones without explicitly segmenting them. Towards this, we propose a new simple yet efficient deep network architecture, called \textit{relational reasoning network (RRN)}, to accurately learn the local and the global relations among the landmarks in CMF bones; specifically, mandible, maxilla, and nasal bones. Approach: The proposed RRN works in an end-to-end manner, utilizing learned relations of the landmarks based on dense-block units. For a given few landmarks as input, RRN treats the landmarking process similar to a data imputation problem where predicted landmarks are considered missing. Results: We applied RRN to cone beam computed tomography scans obtained from 250 patients. With a 4-fold cross validation technique, we obtained an average root mean squared error of less than 2 mm per landmark. Our proposed RRN has revealed unique relationships among the landmarks that help us in inferring several \textit{reasoning} about informativeness of the landmark points. The proposed system identifies the missing landmark locations accurately even when severe pathology or deformation are present in the bones. Conclusions: Accurately identifying anatomical landmarks is a crucial step in deformation analysis and surgical planning for CMF surgeries. Achieving this goal without the need for explicit bone segmentation addresses a major limitation of segmentation based approaches, where segmentation failure (as often the case in bones with severe pathology or deformation) could easily lead to incorrect landmarking. To the best of our knowledge, this is the first of its kind algorithm finding anatomical relations of the objects using deep learning. △ Less

Submitted 18 September, 2022; v1 submitted 8 April, 2019; originally announced April 2019.

arXiv:1901.08740 [pdf, other]

Model-based Deep Reinforcement Learning for Dynamic Portfolio Optimization

Authors: Pengqian Yu, Joon Sern Lee, Ilya Kulyatin, Zekun Shi, Sakyasingha Dasgupta

Abstract: Dynamic portfolio optimization is the process of sequentially allocating wealth to a collection of assets in some consecutive trading periods, based on investors' return-risk profile. Automating this process with machine learning remains a challenging problem. Here, we design a deep reinforcement learning (RL) architecture with an autonomous trading agent such that, investment decisions and action… ▽ More Dynamic portfolio optimization is the process of sequentially allocating wealth to a collection of assets in some consecutive trading periods, based on investors' return-risk profile. Automating this process with machine learning remains a challenging problem. Here, we design a deep reinforcement learning (RL) architecture with an autonomous trading agent such that, investment decisions and actions are made periodically, based on a global objective, with autonomy. In particular, without relying on a purely model-free RL agent, we train our trading agent using a novel RL architecture consisting of an infused prediction module (IPM), a generative adversarial data augmentation module (DAM) and a behavior cloning module (BCM). Our model-based approach works with both on-policy or off-policy RL algorithms. We further design the back-testing and execution engine which interact with the RL agent in real time. Using historical {\em real} financial market data, we simulate trading with practical constraints, and demonstrate that our proposed model is robust, profitable and risk-sensitive, as compared to baseline trading strategies and model-free RL agents from prior work. △ Less

Submitted 24 January, 2019; originally announced January 2019.

arXiv:1810.04021 [pdf, other]

Deep Geodesic Learning for Segmentation and Anatomical Landmarking

Authors: Neslisah Torosdagli, Denise K. Liberton, Payal Verma, Murat Sincan, Janice S. Lee, Ulas Bagci

Abstract: In this paper, we propose a novel deep learning framework for anatomy segmentation and automatic landmark- ing. Specifically, we focus on the challenging problem of mandible segmentation from cone-beam computed tomography (CBCT) scans and identification of 9 anatomical landmarks of the mandible on the geodesic space. The overall approach employs three inter-related steps. In step 1, we propose a d… ▽ More In this paper, we propose a novel deep learning framework for anatomy segmentation and automatic landmark- ing. Specifically, we focus on the challenging problem of mandible segmentation from cone-beam computed tomography (CBCT) scans and identification of 9 anatomical landmarks of the mandible on the geodesic space. The overall approach employs three inter-related steps. In step 1, we propose a deep neu- ral network architecture with carefully designed regularization, and network hyper-parameters to perform image segmentation without the need for data augmentation and complex post- processing refinement. In step 2, we formulate the landmark localization problem directly on the geodesic space for sparsely- spaced anatomical landmarks. In step 3, we propose to use a long short-term memory (LSTM) network to identify closely- spaced landmarks, which is rather difficult to obtain using other standard detection networks. The proposed fully automated method showed superior efficacy compared to the state-of-the- art mandible segmentation and landmarking approaches in craniofacial anomalies and diseased states. We used a very challenging CBCT dataset of 50 patients with a high-degree of craniomaxillofacial (CMF) variability that is realistic in clinical practice. Complementary to the quantitative analysis, the qualitative visual inspection was conducted for distinct CBCT scans from 250 patients with high anatomical variability. We have also shown feasibility of the proposed work in an independent dataset from MICCAI Head-Neck Challenge (2015) achieving the state-of-the-art performance. Lastly, we present an in-depth analysis of the proposed deep networks with respect to the choice of hyper-parameters such as pooling and activation functions. △ Less

Submitted 6 October, 2018; originally announced October 2018.

Comments: 14 pages, 12 Figures, IEEE Transactions on Medical Imaging 2018, TMI-2018-0898.R1

arXiv:1511.02511 [pdf]

doi 10.1016/j.heliyon.2017.e00422

The Cosmic Microwave Background Radiation Power Spectrum as a Random Bit Generator for Symmetric and Asymmetric-Key Cryptography

Authors: Jeffrey S. Lee, Gerald B. Cleaver

Abstract: In this note, the Cosmic Microwave Background (CMB) Radiation is shown to be capable of functioning as a Random Bit Generator, and constitutes an effectively infinite supply of truly random one-time pad values of arbitrary length. It is further argued that the CMB power spectrum potentially conforms to the FIPS 140-2 standard. Additionally, its applicability to the generation of a (n x n) random k… ▽ More In this note, the Cosmic Microwave Background (CMB) Radiation is shown to be capable of functioning as a Random Bit Generator, and constitutes an effectively infinite supply of truly random one-time pad values of arbitrary length. It is further argued that the CMB power spectrum potentially conforms to the FIPS 140-2 standard. Additionally, its applicability to the generation of a (n x n) random key matrix for a Vernam cipher is established. △ Less

Submitted 10 November, 2015; v1 submitted 8 November, 2015; originally announced November 2015.

Comments: 9 pages, 1 figure

Journal ref: Heliyon Volume 3, Issue 10, October 2017, Article e00422

arXiv:0902.0744 [pdf]

Embedding Data within Knowledge Spaces

Authors: James D. Myers, Joe Futrelle, Jeff Gaynor, Joel Plutchak, Peter Bajcsy, Jason Kastner, Kailash Kotwani, Jong Sung Lee, Luigi Marini, Rob Kooper, Robert E. McGrath, Terry McLaren, Alejandro Rodriguez, Yong Liu

Abstract: The promise of e-Science will only be realized when data is discoverable, accessible, and comprehensible within distributed teams, across disciplines, and over the long-term--without reliance on out-of-band (non-digital) means. We have developed the open-source Tupelo semantic content management framework and are employing it to manage a wide range of e-Science entities (including data, document… ▽ More The promise of e-Science will only be realized when data is discoverable, accessible, and comprehensible within distributed teams, across disciplines, and over the long-term--without reliance on out-of-band (non-digital) means. We have developed the open-source Tupelo semantic content management framework and are employing it to manage a wide range of e-Science entities (including data, documents, workflows, people, and projects) and a broad range of metadata (including provenance, social networks, geospatial relationships, temporal relations, and domain descriptions). Tupelo couples the use of global identifiers and resource description framework (RDF) statements with an aggregatable content repository model to provide a unified space for securely managing distributed heterogeneous content and relationships. △ Less

Submitted 4 February, 2009; originally announced February 2009.

Comments: 10 pages with 1 figure. Corrected incorrect transliteration in abstract

ACM Class: H.3.5; H.5.3; I.2.4

Showing 1–22 of 22 results for author: Lee, J S