
Universal Gloss-level Representation for Gloss-free Sign Language Translation and Production

Authors: Eui Jun Hwang, Sukmin Cho, Huije Lee, Youngwoo Yoon, Jong C. Park

Abstract: Sign language, essential for the deaf and hard-of-hearing, presents unique challenges in translation and production due to its multimodal nature and the inherent ambiguity in mapping sign language motion to spoken language words. Previous methods often rely on gloss annotations, requiring time-intensive labor and specialized expertise in sign language. Gloss-free methods have emerged to address these limitations, but they often depend on external sign language data or dictionaries, failing to completely eliminate the need for gloss annotations. There is a clear demand for a comprehensive approach that can supplant gloss annotations and be utilized for both Sign Language Translation (SLT) and Sign Language Production (SLP). We introduce Universal Gloss-level Representation (UniGloR), a unified and self-supervised solution for both SLT and SLP, trained on multiple datasets including PHOENIX14T, How2Sign, and NIASL2021. Our results demonstrate UniGloR's effectiveness in the translation and production tasks. We further report an encouraging result for Sign Language Recognition (SLR) on previously unseen data. Our study suggests that self-supervised learning can be performed in a unified manner, paving the way for innovative and practical applications in future research.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 14 pages, 5 figures

arXiv:2407.00263 [pdf, other]

From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models

Authors: Mehar Bhatia, Sahithya Ravi, Aditya Chinchure, Eunjeong Hwang, Vered Shwartz

Abstract: Despite recent advancements in vision-language models, their performance remains suboptimal on images from non-western cultures due to underrepresentation in training datasets. Various benchmarks have been proposed to test models' cultural inclusivity, but they have limited coverage of cultures and do not adequately assess cultural diversity across universal as well as culture-specific local concepts. To address these limitations, we introduce the GlobalRG benchmark, comprising two challenging tasks: retrieval across universals and cultural visual grounding. The former task entails retrieving culturally diverse images for universal concepts from 50 countries, while the latter aims at grounding culture-specific concepts within images from 15 countries. Our evaluation across a wide range of models reveals that the performance varies significantly across cultures -- underscoring the necessity for enhancing multicultural understanding in vision-language models.

Submitted 28 June, 2024; originally announced July 2024.

Comments: Under peer review

arXiv:2406.05079 [pdf, other]

SUMIE: A Synthetic Benchmark for Incremental Entity Summarization

Authors: Eunjeong Hwang, Yichao Zhou, Beliz Gunel, James Bradley Wendt, Sandeep Tata

Abstract: No existing dataset adequately tests how well language models can incrementally update entity summaries, a crucial ability as these models rapidly advance. The Incremental Entity Summarization (IES) task is vital for maintaining accurate, up-to-date knowledge. To address this, we introduce SUMIE, a fully synthetic dataset designed to expose real-world IES challenges. This dataset effectively highlights problems like incorrect entity association and incomplete information presentation. Unlike common synthetic datasets, ours captures the complexity and nuances found in real-world data. We generate informative and diverse attributes, summaries, and unstructured paragraphs in sequence, ensuring high quality. The alignment between generated summaries and paragraphs exceeds 96%, confirming the dataset's quality. Extensive experiments demonstrate the dataset's difficulty: state-of-the-art LLMs struggle to update summaries with an F1 higher than 80.4%. We will open source the benchmark and the evaluation metrics to help the community make progress on IES tasks.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 24 figures, 4 tables

arXiv:2401.04109 [pdf]

Recent developments of selective laser processes for wearable devices

Authors: Youngchan Kim, Eunseung Hwang, Chang Kai, Kaichen Xu, Heng Pan, Sukjoon Hong

Abstract: Recently, growing interest in wearable technology for personal healthcare and smart VR/AR applications has created a need for facile fabrication methods. Laser processing has long offered original answers to such challenging technological demands with its remote, sterile, rapid, and site-selective processing characteristics for arbitrary materials. In this review, recent developments in relevant laser processes are summarized in two separate categories. First, transformative approaches represented by laser-induced graphene (LIG) are introduced. Apart from design optimization and alteration of the native substrate, the latest advancements in the transformative approach now enable not only more complex material compositions but also multilayer device configurations, through simultaneous transformation of heterogeneous precursors or sequential addition of functional layers coupled with other electronic elements. In addition, more conventional laser techniques such as ablation, sintering, and synthesis remain available for enhancing the functionality of the entire system through expansion of applicable materials and adoption of new mechanisms. Various wearable device components developed through the corresponding laser processes are then organized with emphasis on chemical/physical sensors and energy devices. Special attention is also given to applications utilizing multiple laser sources or multiple laser processes, which pave the way towards all-laser fabrication of wearable devices.

Submitted 28 November, 2023; originally announced January 2024.

arXiv:2311.02122 [pdf, other]

Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval

Authors: Junkyu Jang, Eugene Hwang, Sung-Hyuk Park

Abstract: Fashion stylists have historically bridged the gap between consumers' desires and perfect outfits, which involve intricate combinations of colors, patterns, and materials. Although recent advancements in fashion recommendation systems have made strides in outfit compatibility prediction and complementary item retrieval, these systems rely heavily on pre-selected customer choices. Therefore, we introduce a groundbreaking approach to fashion recommendations: a text-to-outfit retrieval task that generates a complete outfit set based solely on textual descriptions given by users. Our model is devised at three semantic levels (item, style, and outfit), where each level progressively aggregates data to form a coherent outfit recommendation based on textual input. Here, we leverage strategies similar to those in the contrastive language-image pretraining model to address the intricate-style matrix within the outfit sets. Using the Maryland Polyvore and Polyvore Outfit datasets, our approach significantly outperformed state-of-the-art models in text-video retrieval tasks, solidifying its effectiveness in the fashion recommendation domain. This research not only pioneers a new facet of fashion recommendation systems, but also introduces a method that captures the essence of individual style preferences through textual descriptions.

Submitted 3 November, 2023; originally announced November 2023.

Comments: 10 pages, accepted at WACV 2024
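Strategies modeled on contrastive language-image pretraining (CLIP) are commonly implemented as a symmetric contrastive (InfoNCE) loss over a batch of matched pairs. The sketch below shows that generic loss only, assuming each text query and each outfit have already been embedded as non-zero vectors and that the i-th text matches the i-th outfit; the function names and the 0.07 temperature are illustrative assumptions, not the paper's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_style_loss(text_embs, outfit_embs, temperature=0.07):
    """Symmetric InfoNCE loss: text i should score highest against outfit i, and vice versa."""
    logits = [[cosine(t, o) / temperature for o in outfit_embs] for t in text_embs]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[i] over rows, using log-sum-exp for stability.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]  # outfit-to-text direction
    return 0.5 * (cross_entropy(logits) + cross_entropy(transposed))
```

Averaging both directions makes retrieval work from text to outfit and from outfit to text; correctly aligned pairs yield a loss near zero, while mismatched pairs are penalized.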

arXiv:2309.12179 [pdf, other]

Autoregressive Sign Language Production: A Gloss-Free Approach with Discrete Representations

Authors: Eui Jun Hwang, Huije Lee, Jong C. Park

Abstract: Gloss-free Sign Language Production (SLP) offers a direct translation of spoken language sentences into sign language, bypassing the need for gloss intermediaries. This paper presents the Sign language Vector Quantization Network, a novel approach to SLP that leverages Vector Quantization to derive discrete representations from sign pose sequences. Our method, rooted in both manual and non-manual elements of signing, supports advanced decoding methods and integrates latent-level alignment for enhanced linguistic coherence. Through comprehensive evaluations, we demonstrate superior performance of our method over prior SLP methods and highlight the reliability of Back-Translation and Fréchet Gesture Distance as evaluation metrics.

Submitted 8 June, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 5 pages, 3 figures, 6 tables
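At its core, vector quantization replaces each continuous feature vector with the index of its nearest entry in a codebook, yielding discrete tokens that a decoder can predict. The sketch below covers only that generic lookup step, assuming pose frames are already encoded as fixed-length vectors and the codebook is given; codebook learning (e.g. commitment losses and straight-through gradients) is omitted, and `quantize`/`dequantize` are hypothetical names, not the authors' network.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames, codebook):
    """Map each frame's feature vector to the index of its nearest codebook entry."""
    codes = []
    for frame in frames:
        nearest = min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        codes.append(nearest)
    return codes


def dequantize(codes, codebook):
    """Recover the (approximate) feature vectors from discrete codes."""
    return [codebook[k] for k in codes]
```

For example, with the codebook `[[0, 0], [1, 1], [5, 5]]`, the frames `[[0.1, -0.2], [4.8, 5.1], [0.9, 1.2]]` quantize to the codes `[0, 2, 1]`.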

arXiv:2305.14929 [pdf, other]

Aligning Language Models to User Opinions

Authors: EunJeong Hwang, Bodhisattwa Prasad Majumder, Niket Tandon

Abstract: An important aspect of developing LLMs that interact with humans is to align models' behavior to their users. It is possible to prompt an LLM into behaving as a certain persona, especially a user group or ideological persona the model captured during its pretraining stage. But how best to align an LLM with a specific user, rather than a demographic or ideological group, remains an open question. Mining public opinion surveys (by Pew Research), we find that the opinions of a user and their demographics and ideologies are not mutual predictors. We use this insight to align LLMs by modeling both user opinions as well as user demographics and ideology, achieving up to 7 points accuracy gains in predicting public opinions from survey questions across a broad set of topics. In addition to the typical approach of prompting LLMs with demographics and ideology, we discover that utilizing the most relevant past opinions from individual users enables the model to predict user opinions more accurately.

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.13703 [pdf, other]

MemeCap: A Dataset for Captioning and Interpreting Memes

Authors: EunJeong Hwang, Vered Shwartz

Abstract: Memes are a widely popular tool for web users to express their thoughts using visual metaphors. Understanding memes requires recognizing and interpreting visual metaphors with respect to the text inside or around the meme, often while employing background knowledge and reasoning abilities. We present the task of meme captioning and release a new dataset, MemeCap. Our dataset contains 6.3K memes along with the title of the post containing the meme, the meme captions, the literal image caption, and the visual metaphors. Despite the recent success of vision and language (VL) models on tasks such as image captioning and visual question answering, our extensive experiments using state-of-the-art VL models show that they still struggle with visual metaphors, and perform substantially worse than humans.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2210.08768 [pdf, other]

N-pad: Neighboring Pixel-based Industrial Anomaly Detection

Authors: JunKyu Jang, Eugene Hwang, Sung-Hyuk Park

Abstract: Identifying defects in the images of industrial products has been an important task to enhance quality control and reduce maintenance costs. In recent studies, industrial anomaly detection models were developed using pre-trained networks to learn nominal representations. To employ the relative positional information of each pixel, we present N-pad, a novel method for anomaly detection and segmentation in a one-class learning setting that includes the neighborhood of the target pixel for model training and evaluation. Within the model architecture, pixel-wise nominal distributions are estimated by using the features of neighboring pixels with the target pixel to allow possible marginal misalignment. Moreover, the centroids from clusters of nominal features are identified as a representative nominal set. Accordingly, anomaly scores are inferred based on the Mahalanobis distances and Euclidean distances between the target pixel and the estimated distributions or the centroid set, respectively. Thus, we have achieved state-of-the-art performance in MVTec-AD with AUROC of 99.37 for anomaly detection and 98.75 for anomaly segmentation, reducing the error by 34% compared to the next best performing model. Experiments in various settings further validate our model.

Submitted 17 October, 2022; originally announced October 2022.
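The scoring idea in the N-pad abstract (pool nominal features from a pixel and its neighbors, then measure a test feature's distance to that distribution and to a centroid set) can be sketched generically as below. This is not the authors' code: the neighborhood radius, the diagonal-covariance simplification of the Mahalanobis distance (a full covariance would need a matrix inverse), and all function names are assumptions, and since the abstract does not say how the two distances are combined, they are returned separately.

```python
import math


def neighborhood_features(feature_maps, r, c, radius=1):
    """Pool the features at (r, c) and its neighbors from every nominal feature map.

    Each feature map is a 2-D grid (list of rows) of feature vectors.
    """
    samples = []
    for fmap in feature_maps:
        rows, cols = len(fmap), len(fmap[0])
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    samples.append(fmap[rr][cc])
    return samples


def fit_diagonal_gaussian(features, eps=1e-6):
    """Per-dimension mean and variance (eps keeps the variance strictly positive)."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n + eps for d in range(dim)]
    return mean, var


def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under a diagonal-covariance assumption."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))


def min_euclidean(x, centroids):
    """Euclidean distance from x to the closest centroid in the nominal set."""
    return min(math.dist(x, c) for c in centroids)
```

Pooling neighbors into each pixel's distribution is what tolerates small misalignments between images: a nominal feature that lands one pixel off still falls inside the estimated distribution.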

arXiv:2208.06183 [pdf, other]

Non-Autoregressive Sign Language Production via Knowledge Distillation

Authors: Eui Jun Hwang, Jung Ho Kim, Suk Min Cho, Jong C. Park

Abstract: Sign Language Production (SLP) aims to translate expressions in spoken language into corresponding ones in sign language, such as skeleton-based sign poses or videos. Existing SLP models are either AutoRegressive (AR) or Non-Autoregressive (NAR). However, AR-SLP models suffer from regression to the mean and error propagation during decoding. NSLP-G, a NAR-based model, resolves these issues to some extent but engenders other problems. For example, it does not consider target sign lengths and suffers from false decoding initiation. We propose a novel NAR-SLP model via Knowledge Distillation (KD) to address these problems. First, we devise a length regulator to predict the end of the generated sign pose sequence. We then adopt KD, which distills spatial-linguistic features from a pre-trained pose encoder to alleviate false decoding initiation. Extensive experiments show that the proposed approach significantly outperforms existing SLP models in both Fréchet Gesture Distance and Back-Translation evaluation.

Submitted 12 August, 2022; originally announced August 2022.

Comments: 10 pages, 4 figures, 3 tables, submitted to ECCV2023

arXiv:2208.00323 [pdf]

A Multi-View Learning Approach to Enhance Automatic 12-Lead ECG Diagnosis Performance

Authors: Jae-Won Choi, Dae-Yong Hong, Chan Jung, Eugene Hwang, Sung-Hyuk Park, Seung-Young Roh

Abstract: The performance of commonly used electrocardiogram (ECG) diagnosis models has recently improved with the introduction of deep learning (DL). However, the impact of various combinations of multiple DL components and/or the role of data augmentation techniques on the diagnosis have not been sufficiently investigated. This study proposes an ensemble-based multi-view learning approach with an ECG augmentation technique to achieve a higher performance than traditional automatic 12-lead ECG diagnosis methods. The data analysis results show that the proposed model reports an F1 score of 0.840, which outperforms existing state-of-the-art methods in the literature.

Submitted 30 July, 2022; originally announced August 2022.

Comments: 9 pages, 3 figures, and 5 tables

arXiv:1909.00344 [pdf]

Interdependency between the Stock Market and Financial News

Authors: EunJeong Hwang, Yong-Hyuk Kim

Abstract: Stock prices are driven by various factors. In particular, many individual investors who have relatively little financial knowledge rely heavily on the information from news stories when making investment decisions in the stock market. However, these stories may not reflect future stock prices because of the subjectivity in the news; stock prices may instead affect the news contents. This study aims to discover whether it is news or stock prices that have a greater impact on the other. To achieve this, we analyze the relationship between news sentiment and stock prices based on time series analysis using five different classification models. Our experimental results show that stock prices have a bigger impact on the news contents than news does on stock prices.

Submitted 1 September, 2019; originally announced September 2019.

Comments: 4 pages

MSC Class: 68P20

arXiv:1903.08297 [pdf, other]

Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening

Authors: Nan Wu, Jason Phang, Jungkyu Park, Yiqiu Shen, Zhe Huang, Masha Zorin, Stanisław Jastrzębski, Thibault Févry, Joe Katsnelson, Eric Kim, Stacey Wolfson, Ujas Parikh, Sushma Gaddam, Leng Leng Young Lin, Kara Ho, Joshua D. Weinstein, Beatriu Reig, Yiming Gao, Hildegard Toth, Kristine Pysarenko, Alana Lewin, Jiyon Lee, Krystal Airola, Eralda Mema, Stephanie Chung , et al. (7 additional authors not shown)

Abstract: We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use a very high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and find our model to be as accurate as experienced radiologists when presented with the same data. Finally, we show that a hybrid model, averaging probability of malignancy predicted by a radiologist with a prediction of our neural network, is more accurate than either of the two separately. To better understand our results, we conduct a thorough analysis of our network's performance on different subpopulations of the screening population, model design, training procedure, errors, and properties of its internal representations.

Submitted 19 March, 2019; originally announced March 2019.

Comments: MIDL 2019 [arXiv:1907.08612]

Report number: MIDL/2019/ExtendedAbstract/SkxYez76FE

arXiv:1807.05333 [pdf]

Real-Time Shape Tracking of Facial Landmarks

Authors: Hyungjoon Kim, Hyeonwoo Kim, Eenjun Hwang

Abstract: Detection of facial landmarks and accurate tracking of their shape are essential in real-time virtual makeup applications, where users can see the effect of the makeup as they move their face in different directions. Typical face tracking techniques detect diverse facial landmarks and track them using a point tracker such as the Kanade-Lucas-Tomasi (KLT) point tracker. Typically, 5 or 64 points are used for tracking a face. Even though these points are sufficient to track the approximate locations of facial landmarks, they are not sufficient to track the exact shape of facial landmarks. In this paper, we propose a method that can track the exact shape of facial landmarks in real time by combining a deep learning technique and a point tracker. We detect facial landmarks accurately using SegNet, which performs semantic segmentation based on deep learning. Edge points of detected landmarks are tracked using the KLT point tracker. Despite its popularity, the KLT point tracker suffers from the point loss problem. We solve this problem by executing SegNet periodically to calculate the shape of facial landmarks. That is, by combining the two techniques, we can avoid both the computational overhead of SegNet for real-time shape tracking and the point loss problem of the KLT point tracker. We performed several experiments to evaluate the performance of our method and report some of the results herein.

Submitted 14 July, 2018; originally announced July 2018.

Comments: 8 pages

MSC Class: eess.IV - Image and Video Processing
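The hybrid scheme described above (an expensive detector run periodically, a cheap point tracker in between) amounts to a simple scheduling loop. The sketch below assumes hypothetical `detect` and `track` callables standing in for SegNet segmentation and KLT tracking; the fixed `redetect_every` interval and the extra rule of re-detecting as soon as every point is lost are illustrative assumptions, not details taken from the paper.

```python
def track_shape(frames, detect, track, redetect_every=10):
    """Track landmark edge points, periodically refreshing them with a detector.

    detect(frame) -> list of edge points               (hypothetical stand-in for SegNet)
    track(prev_frame, frame, points) -> surviving points  (hypothetical stand-in for KLT)
    Returns the list of tracked points for every frame.
    """
    results = []
    points = []
    prev_frame = None
    for i, frame in enumerate(frames):
        if i % redetect_every == 0 or not points:
            # Expensive step: recompute the full landmark shape, recovering lost points.
            points = detect(frame)
        else:
            # Cheap step: propagate the surviving points from the previous frame.
            points = track(prev_frame, frame, points)
        results.append(list(points))
        prev_frame = frame
    return results
```

The interval trades accuracy for speed: a shorter interval bounds how many points the tracker can lose between refreshes, at the cost of running the detector more often.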

arXiv:1711.10577 [pdf, other]

Deep learning analysis of breast MRIs for prediction of occult invasive disease in ductal carcinoma in situ

Authors: Zhe Zhu, Michael Harowicz, Jun Zhang, Ashirbani Saha, Lars J. Grimm, E. Shelley Hwang, Maciej A. Mazurowski

Abstract: Purpose: To determine whether deep learning-based algorithms applied to breast MR images can aid in the prediction of occult invasive disease following the diagnosis of ductal carcinoma in situ (DCIS) by core needle biopsy. Material and Methods: In this institutional review board-approved study, we analyzed dynamic contrast-enhanced fat-saturated T1-weighted MRI sequences of 131 patients at our institution with a core needle biopsy-confirmed diagnosis of DCIS. The patients had no preoperative therapy before breast MRI and no prior history of breast cancer. We explored two different deep learning approaches to predict whether there was a hidden (occult) invasive component in the analyzed tumors that was ultimately detected at surgical excision. In the first approach, we adopted the transfer learning strategy, in which a network pre-trained on a large dataset of natural images is fine-tuned with our DCIS images. Specifically, we used the GoogleNet model pre-trained on the ImageNet dataset. In the second approach, we used a pre-trained network to extract deep features, and a support vector machine (SVM) that utilizes these features to predict the upstaging of the DCIS. We used 10-fold cross-validation and the area under the ROC curve (AUC) to estimate the performance of the predictive models. Results: The best classification performance was obtained using the deep features approach with the GoogleNet model pre-trained on ImageNet as the feature extractor and a polynomial kernel SVM used as the classifier (AUC = 0.70, 95% CI: 0.58-0.79). For the transfer learning based approach, the highest AUC obtained was 0.53 (95% CI: 0.41-0.62). Conclusion: Convolutional neural networks could potentially be used to identify occult invasive disease in patients diagnosed with DCIS at the initial core needle biopsy.

Submitted 28 November, 2017; originally announced November 2017.
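The evaluation protocol above (10-fold cross-validation scored by ROC AUC) rests on two generic pieces that can be sketched in plain Python. The AUC below uses the standard rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The fold splitter is a simplified, unshuffled and unstratified assumption; practical studies usually shuffle and stratify by label. Both function names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise ranking (equivalent to the Mann-Whitney U statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds of near-equal size (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` returns 0.75, because three of the four positive-negative pairs are ranked correctly. With 131 cases and 10 folds, the splitter yields one fold of 14 and nine folds of 13.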

arXiv:1104.1822 [pdf]

Dimensionality Decrease Heuristics for NP Complete Problems

Authors: Eduardo Hwang

Abstract: The vast majority of the scientific community believes that P != NP, with countless supporting arguments. The number of people who believe otherwise is probably as small as the number of those who dispute the second law of thermodynamics. But isn't nature elegant enough not to resort to brute-force search? In this article, a novel concept of dimensionality is presented, which may lead to a more efficient class of heuristic implementations for solving NP-complete problems, thus broadening the universe of man-machine tractable problems. Dimensionality, as defined here, will be a closer analog of strain energy in nature.

Submitted 10 April, 2011; originally announced April 2011.

arXiv:0905.2213

Outlining an elegant solver for 3-SAT

Authors: Eduardo Hwang

Abstract: The purpose of this article is to incite clever ways to attack problems. It advocates in favor of more elegant algorithms in place of brute-force usages (albeit very well crafted ones).

Submitted 10 April, 2011; v1 submitted 13 May, 2009; originally announced May 2009.

Comments: This paper has been withdrawn by the author due to its inadequacy, given more structured approaches to the subject

Showing 1–17 of 17 results for author: Hwang, E