doi 10.1109/TASLP.2024.3407529

MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing

Authors: Yu-Fen Huang, Nikki Moran, Simon Coleman, Jon Kelly, Shun-Hwa Wei, Po-Yin Chen, Yun-Hsin Huang, Tsung-Ping Chen, Yu-Chia Kuo, Yu-Chi Wei, Chih-Hsuan Li, Da-Yu Huang, Hsuan-Kai Kao, Ting-Wei Lin, Li Su

Abstract: In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamic, articulation, and harmony for 742 professional music performances by 23 professional musicians, comprising more than 30 hours and 570 K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the usage of the MOSA dataset, we present several innovative cross-modal music information retrieval (MIR) and musical content generation tasks, including the detection of beats, downbeats, phrase, and expressive contents from audio, video and motion data, and the generation of musicians' body motion from given music audio. The dataset and codes are available alongside this publication (https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset).

Submitted 10 June, 2024; originally announced June 2024.

Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024. 14 pages, 7 figures. Dataset is available on: https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset/tree/main and https://zenodo.org/records/11393449

arXiv:2402.13025 [pdf, other]

CFEVER: A Chinese Fact Extraction and VERification Dataset

Authors: Ying-Jia Lin, Chun-Yi Lin, Chia-Jen Yeh, Yi-Ting Li, Yun-Yu Hu, Chih-Hao Hsu, Mei-Feng Lee, Hung-Yu Kao

Abstract: We present CFEVER, a Chinese dataset designed for Fact Extraction and VERification. CFEVER comprises 30,012 manually created claims based on content in Chinese Wikipedia. Each claim in CFEVER is labeled as "Supports", "Refutes", or "Not Enough Info" to depict its degree of factualness. Similar to the FEVER dataset, claims in the "Supports" and "Refutes" categories are also annotated with corresponding evidence sentences sourced from single or multiple pages in Chinese Wikipedia. Our labeled dataset holds a Fleiss' kappa value of 0.7934 for five-way inter-annotator agreement. In addition, through the experiments with the state-of-the-art approaches developed on the FEVER dataset and a simple baseline for CFEVER, we demonstrate that our dataset is a new rigorous benchmark for factual extraction and verification, which can be further used for developing automated systems to alleviate human fact-checking efforts. CFEVER is available at https://ikmlab.github.io/CFEVER.

Submitted 20 February, 2024; originally announced February 2024.

Comments: AAAI-24
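
The Fleiss' kappa statistic quoted in the CFEVER abstract can be illustrated with a minimal sketch. It uses the standard textbook formula and assumes every item is rated by the same number of annotators; the function name `fleiss_kappa` and the toy count tables are hypothetical and are not taken from the CFEVER release.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Assumes every item was rated by the same number of raters (>= 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Share of all assignments that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example (hypothetical): 3 claims, 5 raters, 3 labels, full agreement.
perfect = [[5, 0, 0], [0, 5, 0], [0, 0, 5]]
print(fleiss_kappa(perfect))
```

A value of 1 means complete agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.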

arXiv:2310.19185 [pdf, other]

Robotic Barrier Construction through Weaved, Inflatable Tubes

Authors: H. J. Kim, H. Abdel-Raziq, X. Liu, A. Y. Siskovic, S. Patil, K. H. Petersen, H. L. Kao

Abstract: In this article, we present a mechanism and related path planning algorithm to construct light-duty barriers out of extruded, inflated tubes weaved around existing environmental features. Our extruded tubes are based on everted vine-robots and in this context, we present a new method to steer their growth. We characterize the mechanism in terms of accuracy, resilience, and, towards their use as barriers, the ability of the tubes to withstand distributed loads. We further explore an algorithm which, given a feature map and the size and direction of the external load, can determine where and how to extrude the barrier. Finally, we showcase the potential of this method in an autonomously extruded two-layer wall weaved around three pipes. While preliminary, our work indicates that this method has the potential for barrier construction in cluttered environments, e.g. shelters against wind or snow. Future work may show how to achieve tighter weaves, how to leverage weave friction for improved strength, how to assess barrier performance for feedback control, and how to operate the extrusion mechanism off of a mobile robot.

Submitted 29 October, 2023; originally announced October 2023.

arXiv:2207.08141 [pdf, other]

ELECTRA is a Zero-Shot Learner, Too

Authors: Shiwen Ni, Hung-Yu Kao

Abstract: Recently, for few-shot or even zero-shot learning, the new paradigm "pre-train, prompt, and predict" has achieved remarkable results compared with the "pre-train, fine-tune" paradigm. After the success of prompt-based GPT-3, a series of masked language model (MLM)-based (e.g., BERT, RoBERTa) prompt learning methods became popular and widely used. However, another efficient pre-trained discriminative model, ELECTRA, has probably been neglected. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using our proposed novel replaced token detection (RTD)-based prompt learning method. Experimental results show that the ELECTRA model based on RTD-prompt learning achieves surprisingly state-of-the-art zero-shot performance. Numerically, compared to MLM-RoBERTa-large and MLM-BERT-large, our RTD-ELECTRA-large has an average of about 8.4% and 13.7% improvement on all 15 tasks. Especially on the SST-2 task, our RTD-ELECTRA-large achieves an astonishing 90.1% accuracy without any training data. Overall, compared to the pre-trained masked language models, the pre-trained replaced token detection model performs better in zero-shot learning. The source code is available at: https://github.com/nishiwen1214/RTD-ELECTRA.

Submitted 20 July, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

Comments: The source code is available at: https://github.com/nishiwen1214/RTD-ELECTRA

arXiv:2112.00245 [pdf, other]

doi 10.1109/TAAI54685.2021.00030

True or False: Does the Deep Learning Model Learn to Detect Rumors?

Authors: Shiwen Ni, Jiawen Li, Hung-Yu Kao

Abstract: It is difficult for humans to tell whether rumors are true or false, but current deep learning models can surpass humans and achieve excellent accuracy on many rumor datasets. In this paper, we investigate whether deep learning models that seem to perform well actually learn to detect rumors. We evaluate models on their generalization ability to out-of-domain examples by fine-tuning BERT-based models on five real-world datasets and evaluating against all test sets. The experimental results indicate that the generalization ability of the models on other unseen datasets is unsatisfactory; even common-sense rumors cannot be detected. Moreover, we found through experiments that models take shortcuts and learn absurd knowledge when the rumor datasets have serious data pitfalls. This means that simple modifications to the rumor text based on specific rules will lead to inconsistent model predictions. To more realistically evaluate rumor detection models, we proposed a new evaluation method called paired test (PairT), which requires models to correctly predict a pair of test samples at the same time. Furthermore, we make recommendations on how to better create rumor datasets and evaluate rumor detection models at the end of this paper.

Submitted 30 November, 2021; originally announced December 2021.

Comments: 5 pages, 3 figures, 8 tables

Journal ref: 2021 International Conference on Technologies and Applications of Artificial Intelligence (TAAI)
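
The paired test (PairT) idea described in the abstract, where a pair of test samples counts only if both are predicted correctly, can be sketched as follows. This is an illustration based on the abstract's one-sentence description alone, not the authors' implementation; the function names and the toy predictions are hypothetical.

```python
def sample_accuracy(pairs):
    """Ordinary accuracy: each sample in each pair is scored on its own."""
    samples = [s for pair in pairs for s in pair]
    return sum(pred == gold for pred, gold in samples) / len(samples)


def pair_accuracy(pairs):
    """Paired accuracy: a pair counts only if BOTH samples are correct."""
    hits = sum(
        1
        for (pred_a, gold_a), (pred_b, gold_b) in pairs
        if pred_a == gold_a and pred_b == gold_b
    )
    return hits / len(pairs)


# Hypothetical data: each pair is ((prediction, gold), (prediction, gold)),
# e.g. a rumor and a minimally edited counterpart with the opposite label.
pairs = [
    ((1, 1), (0, 0)),  # both correct
    ((1, 1), (1, 0)),  # second sample wrong
    ((0, 1), (0, 0)),  # first sample wrong
]
print(sample_accuracy(pairs), pair_accuracy(pairs))
```

On this toy input the per-sample accuracy is 4/6 while the paired accuracy is 1/3, which shows how the paired criterion penalizes a model that answers one member of each pair by shortcut.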

arXiv:2111.00781 [pdf, ps, other]

Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure

Authors: Hsu Kao, Chen-Yu Wei, Vijay Subramanian

Abstract: Multi-agent reinforcement learning (MARL) problems are challenging due to information asymmetry. To overcome this challenge, existing methods often require a high level of coordination or communication between the agents. We consider two-agent multi-armed bandits (MABs) and Markov decision processes (MDPs) with a hierarchical information structure arising in applications, which we exploit to propose simpler and more efficient algorithms that require no coordination or communication. In the structure, in each step the "leader" chooses her action first, and then the "follower" decides his action after observing the leader's action. The two agents observe the same reward (and the same state transition in the MDP setting) that depends on their joint action. For the bandit setting, we propose a hierarchical bandit algorithm that achieves a near-optimal gap-independent regret of $\widetilde{\mathcal{O}}(\sqrt{ABT})$ and a near-optimal gap-dependent regret of $\mathcal{O}(\log(T))$, where $A$ and $B$ are the numbers of actions of the leader and the follower, respectively, and $T$ is the number of steps. We further extend to the case of multiple followers and the case with a deep hierarchy, where we both obtain near-optimal regret bounds. For the MDP setting, we obtain $\widetilde{\mathcal{O}}(\sqrt{H^7S^2ABT})$ regret, where $H$ is the number of steps per episode, $S$ is the number of states, $T$ is the number of episodes. This matches the existing lower bound in terms of $A, B$, and $T$.

Submitted 1 November, 2021; originally announced November 2021.

arXiv:2110.12603 [pdf, ps, other]

Common Information based Approximate State Representations in Multi-Agent Reinforcement Learning

Authors: Hsu Kao, Vijay Subramanian

Abstract: Due to information asymmetry, finding optimal policies for Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) is hard with the complexity growing doubly exponentially in the horizon length. The challenge increases greatly in the multi-agent reinforcement learning (MARL) setting where the transition probabilities, observation kernel, and reward function are unknown. Here, we develop a general compression framework with approximate common and private state representations, based on which decentralized policies can be constructed. We derive the optimality gap of executing dynamic programming (DP) with the approximate states in terms of the approximation error parameters and the remaining time steps. When the compression is exact (no error), the resulting DP is equivalent to the one in existing work. Our general framework generalizes a number of methods proposed in the literature. The results shed light on designing practically useful deep-MARL network structures under the "centralized learning distributed execution" scheme.

Submitted 24 October, 2021; originally announced October 2021.

arXiv:2110.00425 [pdf, other]

HAT4RD: Hierarchical Adversarial Training for Rumor Detection on Social Media

Authors: Shiwen Ni, Jiawen Li, Hung-Yu Kao

Abstract: With the development of social media, social communication has changed. While this facilitates people's communication and access to information, it also provides an ideal platform for spreading rumors. In normal or critical situations, rumors will affect people's judgment and even endanger social security. However, natural language is high-dimensional and sparse, and the same rumor may be expressed in hundreds of ways on social media. As such, the robustness and generalization of the current rumor detection model are put into question. We propose a novel hierarchical adversarial training method for rumor detection (HAT4RD) on social media. Specifically, HAT4RD is based on gradient ascent by adding adversarial perturbations to the embedding layers of post-level and event-level modules to deceive the detector. At the same time, the detector uses stochastic gradient descent to minimize the adversarial risk to learn a more robust model. In this way, the post-level and event-level sample spaces are enhanced, and we have verified the robustness of our model under a variety of adversarial attacks. Moreover, visual experiments indicate that the proposed model drifts into an area with a flat loss landscape, leading to better generalization. We evaluate our proposed method on three public rumor datasets from two commonly used social platforms (Twitter and Weibo). Experimental results demonstrate that our model achieves better results than state-of-the-art methods.

Submitted 29 August, 2022; v1 submitted 29 August, 2021; originally announced October 2021.

arXiv:2108.12805 [pdf, other]

DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks

Authors: Shiwen Ni, Jiawen Li, Hung-Yu Kao

Abstract: Adversarial training has been proven to be a powerful regularization method to improve the generalization of models. However, current adversarial training methods only attack the original input sample or the embedding vectors, and their attacks lack coverage and diversity. To further enhance the breadth and depth of attack, we propose a novel masked weight adversarial training method called DropAttack, which enhances the generalization of the model by adding intentionally worst-case adversarial perturbations to both the input and hidden layers in different dimensions and minimizing the adversarial risks generated by each layer. DropAttack is a general technique and can be adopted for a wide variety of neural networks with different architectures. To validate the effectiveness of the proposed method, we used five public datasets in the fields of natural language processing (NLP) and computer vision (CV) for experimental evaluation. We compare the proposed method with other adversarial training methods and regularization methods, and our method achieves state-of-the-art results on all datasets. In addition, DropAttack can achieve the same performance when it uses only half of the training data compared to other standard training methods. Theoretical analysis reveals that DropAttack can perform gradient regularization at random on some of the input and weight parameters of the model. Further visualization experiments show that DropAttack can push the minimum risk of the model to a lower and flatter loss landscape. Our source code is publicly available on https://github.com/nishiwen1214/DropAttack.

Submitted 29 August, 2021; originally announced August 2021.

arXiv:2107.10747 [pdf, other]

Meet The Truth: Leverage Objective Facts and Subjective Views for Interpretable Rumor Detection

Authors: Jiawen Li, Shiwen Ni, Hung-Yu Kao

Abstract: Existing rumor detection strategies typically provide detection labels while ignoring their explanation. Nonetheless, providing pieces of evidence to explain why a suspicious tweet is a rumor is essential. As such, a novel model, LOSIRD, was proposed in this paper. First, LOSIRD mines appropriate evidence sentences and classifies them by automatically checking the veracity of the relationship of the given claim and its evidence from about 5 million Wikipedia documents. LOSIRD then automatically constructs two heterogeneous graph objects to simulate the propagation layout of the tweets and code the relationship of evidence. Finally, a graphSAGE processing component is used in LOSIRD to provide the label and evidence. To the best of our knowledge, we are the first to combine objective facts and subjective views to verify rumors. The experimental results on two real-world Twitter datasets showed that our model exhibited the best performance in the early rumor detection task and its rumor detection performance outperformed other baseline and state-of-the-art models. Moreover, we confirmed that both objective information and subjective information are fundamental clues for rumor detection.

Submitted 21 July, 2021; originally announced July 2021.

arXiv:2011.00259 [pdf]

Rumor Detection on Twitter Using Multiloss Hierarchical BiLSTM with an Attenuation Factor

Authors: Yudianto Sujana, Jiawen Li, Hung-Yu Kao

Abstract: Social media platforms such as Twitter have become a breeding ground for unverified information or rumors. These rumors can threaten people's health, endanger the economy, and affect the stability of a country. Many researchers have developed models to classify rumors using traditional machine learning or vanilla deep learning models. However, previous studies on rumor detection have achieved low precision and are time consuming. Inspired by the hierarchical model and multitask learning, a multiloss hierarchical BiLSTM model with an attenuation factor is proposed in this paper. The model is divided into two BiLSTM modules: post level and event level. By means of this hierarchical structure, the model can extract deep information from limited quantities of text. Each module has a loss function that helps to learn bilateral features and reduce the training time. An attenuation factor is added at the post level to increase the accuracy. The results on two rumor datasets demonstrate that our model achieves better performance than that of state-of-the-art machine learning and vanilla deep learning models.

Submitted 14 December, 2020; v1 submitted 31 October, 2020; originally announced November 2020.

arXiv:2009.08015 [pdf, other]

doi 10.1145/3394171.3413848

Temporally Guided Music-to-Body-Movement Generation

Authors: Hsuan-Kai Kao, Li Su

Abstract: This paper presents a neural network model to generate a virtual violinist's 3-D skeleton movements from music audio. Improved from the conventional recurrent neural network models for generating 2-D skeleton data in previous works, the proposed model incorporates an encoder-decoder architecture, as well as the self-attention mechanism to model the complicated dynamics in body movement sequences. To facilitate the optimization of the self-attention model, beat tracking is applied to determine effective sizes and boundaries of the training examples. The decoder is accompanied with a refining network and a bowing attack inference mechanism to emphasize the right-hand behavior and bowing attack timing. Both objective and subjective evaluations reveal that the proposed model outperforms the state-of-the-art methods. To the best of our knowledge, this work represents the first attempt to generate 3-D violinists' body movements considering key features in musical body movement.

Submitted 16 September, 2020; originally announced September 2020.

arXiv:2009.07816 [pdf, other]

doi 10.1145/3394171.3413921

A Human-Computer Duet System for Music Performance

Authors: Yuen-Jen Lin, Hsuan-Kai Kao, Yih-Chih Tseng, Ming Tsai, Li Su

Abstract: Virtual musicians have become a remarkable phenomenon in the contemporary multimedia arts. However, most of the virtual musicians nowadays have not been endowed with abilities to create their own behaviors, or to perform music with human musicians. In this paper, we firstly create a virtual violinist, who can collaborate with a human pianist to perform chamber music automatically without any intervention. The system incorporates the techniques from various fields, including real-time music tracking, pose estimation, and body movement generation. In our system, the virtual musician's behavior is generated based on the given music audio alone, and such a system results in a low-cost, efficient and scalable way to produce human and virtual musicians' co-performance. The proposed system has been validated in public concerts. Objective quality assessment approaches and possible ways to systematically improve the system are also discussed.

Submitted 16 September, 2020; originally announced September 2020.

arXiv:2002.04500 [pdf]

doi 10.1038/s41379-020-0640-y

Artificial Intelligence Assistance Significantly Improves Gleason Grading of Prostate Biopsies by Pathologists

Authors: Wouter Bulten, Maschenka Balkenhol, Jean-Joël Awoumou Belinga, Américo Brilhante, Aslı Çakır, Xavier Farré, Katerina Geronatsiou, Vincent Molinié, Guilherme Pereira, Paromita Roy, Günter Saile, Paulo Salles, Ewout Schaafsma, Joëlle Tschui, Anne-Marie Vos, Hester van Boven, Robert Vink, Jeroen van der Laak, Christina Hulsbergen-van de Kaa, Geert Litjens

Abstract: While the Gleason score is the most important prognostic marker for prostate cancer patients, it suffers from significant observer variability. Artificial Intelligence (AI) systems, based on deep learning, have proven to achieve pathologist-level performance at Gleason grading. However, the performance of such systems can degrade in the presence of artifacts, foreign tissue, or other anomalies. Pathologists integrating their expertise with feedback from an AI system could result in a synergy that outperforms both the individual pathologist and the system. Despite the hype around AI assistance, existing literature on this topic within the pathology domain is limited. We investigated the value of AI assistance for grading prostate biopsies. A panel of fourteen observers graded 160 biopsies with and without AI assistance. Using AI, the agreement of the panel with an expert reference standard significantly increased (quadratically weighted Cohen's kappa, 0.799 vs 0.872; p=0.018). Our results show the added value of AI systems for Gleason grading, but more importantly, show the benefits of pathologist-AI synergy.

Submitted 11 February, 2020; originally announced February 2020.

Comments: 21 pages, 5 figures

Journal ref: Modern Pathology, Available online 5 August 2020
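
The agreement measure named in the abstract, quadratically weighted Cohen's kappa, can be sketched as below. The code follows the standard definition for two raters on an ordinal scale; the function name, the integer encoding of grades as 0..k-1, and the toy ratings are assumptions made for illustration and do not reproduce the study's data.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """Cohen's kappa with quadratic weights for two raters.

    Ratings are integers in 0..n_classes-1 (an assumed encoding of
    ordinal grades); both lists must have the same length.
    """
    n = len(rater_a)
    # Observed confusion matrix between the two raters.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes))
              for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreement penalty grows with the squared grade distance.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independent raters.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings on a 3-level scale.
print(quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], 3))
```

Quadratic weighting penalizes a two-grade disagreement four times as heavily as a one-grade disagreement, which suits ordinal scales such as grade groups.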

arXiv:1911.03380 [pdf, other]

An analysis of Uniswap markets

Authors: Guillermo Angeris, Hsien-Tang Kao, Rei Chiang, Charlie Noyes, Tarun Chitra

Abstract: Uniswap -- and other constant product markets -- appear to work well in practice despite their simplicity. In this paper, we give a simple formal analysis of constant product markets and their generalizations, showing that, under some common conditions, these markets must closely track the reference market price. We also show that Uniswap satisfies many other desirable properties and numerically demonstrate, via a large-scale agent-based simulation, that Uniswap is stable under a wide range of market conditions.

Submitted 9 February, 2021; v1 submitted 8 November, 2019; originally announced November 2019.
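
A constant product market, as mentioned in the abstract, keeps the product of its two reserves from decreasing across trades. The sketch below shows the usual swap rule under that invariant; the function name `swap_output`, the parameter `gamma` (the fraction of the input kept after fees, so `gamma = 1` means no fee), and the reserve figures are illustrative assumptions rather than details taken from the paper.

```python
def swap_output(reserve_in, reserve_out, amount_in, gamma=1.0):
    """Amount of the output asset paid out by a constant product market.

    Solves (reserve_in + gamma * amount_in) * (reserve_out - out)
           = reserve_in * reserve_out  for `out`.
    `gamma` is an assumed fee parameter: 1.0 means no fee, 0.997 means
    a 0.3% fee is taken from the input before pricing.
    """
    effective_in = gamma * amount_in
    return reserve_out * effective_in / (reserve_in + effective_in)


# Hypothetical pool holding 100 units of each asset, no fee.
x, y = 100.0, 100.0
dx = 100.0
dy = swap_output(x, y, dx)
print(dy)                    # amount of the second asset received
print((x + dx) * (y - dy))   # product of reserves after the trade
```

Doubling the first reserve here halves the second, so the trader receives 50 units and the reserve product stays at 10000. With a fee (`gamma < 1`) the full input still enters the pool, so the product grows slightly with each trade.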

arXiv:1907.07980 [pdf, other]

doi 10.1016/S1470-2045(19)30739-9

Automated Gleason Grading of Prostate Biopsies using Deep Learning

Authors: Wouter Bulten, Hans Pinckaers, Hester van Boven, Robert Vink, Thomas de Bel, Bram van Ginneken, Jeroen van der Laak, Christina Hulsbergen-van de Kaa, Geert Litjens

Abstract: The Gleason score is the most important prognostic marker for prostate cancer patients but suffers from significant inter-observer variability. We developed a fully automated deep learning system to grade prostate biopsies. The system was developed using 5834 biopsies from 1243 patients. A semi-automatic labeling technique was used to circumvent the need for full manual annotation by pathologists. The developed system achieved a high agreement with the reference standard. In a separate observer experiment, the deep learning system outperformed 10 out of 15 pathologists. The system has the potential to improve prostate cancer prognostics by acting as a first or second reader.

Submitted 18 July, 2019; originally announced July 2019.

Comments: 13 pages, 6 figures

Journal ref: The Lancet Oncology, Available online 8 January 2020

arXiv:1907.07355 [pdf, other]

Probing Neural Network Comprehension of Natural Language Arguments

Authors: Timothy Niven, Hung-Yu Kao

Abstract: We are surprised to find that BERT's peak performance of 77% on the Argument Reasoning Comprehension Task reaches just three points below the average untrained human baseline. However, we show that this result is entirely accounted for by exploitation of spurious statistical cues in the dataset. We analyze the nature of these cues and demonstrate that a range of models all exploit them. This analysis informs the construction of an adversarial dataset on which all models achieve random accuracy. Our adversarial dataset provides a more robust assessment of argument comprehension and should be adopted as the standard in future work.

Submitted 16 September, 2019; v1 submitted 17 July, 2019; originally announced July 2019.

Comments: ACL 2019 (Updated Version)

arXiv:1907.07347 [pdf, other]

Fake News Detection as Natural Language Inference

Authors: Kai-Chou Yang, Timothy Niven, Hung-Yu Kao

Abstract: This report describes the entry by the Intelligent Knowledge Management (IKM) Lab in the WSDM 2019 Fake News Classification challenge. We treat the task as natural language inference (NLI). We individually train a number of the strongest NLI models as well as BERT. We ensemble these results and retrain with noisy labels in two stages. We analyze transitivity relations in the train and test sets and determine a set of test cases that can be reliably classified on this basis. The remainder of test cases are classified by our ensemble. Our entry achieves test set accuracy of 88.063% for 3rd place in the competition.

Submitted 17 July, 2019; originally announced July 2019.

arXiv:1905.08846 [pdf, other]

Discovering Hidden Structure in High Dimensional Human Behavioral Data via Tensor Factorization

Authors: Homa Hosseinmardi, Hsien-Te Kao, Kristina Lerman, Emilio Ferrara

Abstract: In recent years, the rapid growth in technology has increased the opportunity for longitudinal human behavioral studies. Rich multimodal data from wearables like Fitbit, online social networks, mobile phones, etc. can be collected in natural environments. Uncovering the underlying low-dimensional structure of noisy multi-way data in an unsupervised setting is a challenging problem. Tensor factorization has been successful in extracting the interconnected low-dimensional descriptions of multi-way data. In this paper, we apply non-negative tensor factorization to a real-world wearable sensor dataset, StudentLife, to find latent temporal factors and groups of similar individuals. Metadata is available for the semester schedule, as well as the individuals' performance and personality. We demonstrate that non-negative tensor factorization can successfully discover clusters of individuals who exhibit higher academic performance, as well as those who frequently engage in leisure activities. The recovered latent temporal patterns associated with these groups are validated against ground truth data to demonstrate the accuracy of our framework.

Submitted 21 May, 2019; originally announced May 2019.

Comments: 2018 WSDM Heteronam Workshop

Journal ref: 2018 ACM International WSDM Conference, Heteronam Workshop

arXiv:1811.11136 [pdf, other]

SOC: hunting the underground inside story of the ethereum Social-network Opinion and Comment

Authors: TonTon Hsien-De Huang, Po-Wei Hong, Ying-Tse Lee, Yi-Lun Wang, Chi-Leong Lok, Hung-Yu Kao

Abstract: Cryptocurrency is attracting more and more attention because of blockchain technology. Ethereum is gaining significant popularity in the blockchain community, mainly because it is designed in a way that enables developers to write smart contracts and decentralized applications (Dapps). There are many kinds of cryptocurrency information on social networks. The risks and fraud problems behind it have pushed many countries, including the United States, South Korea, and China, to issue warnings and set up corresponding regulations. However, the security of Ethereum smart contracts has not gained much attention. Using a deep learning approach, we propose a method of sentiment analysis for Ethereum's community comments. In this research, we first collected users' cryptocurrency comments from the social network and then fed them to our LSTM + CNN model for training. Then we made predictions through sentiment analysis. With our research results, we have demonstrated that both the precision and the recall of sentiment analysis can achieve 0.80+. More importantly, we deploy our sentiment analysis on RatingToken and Coin Master (mobile application of Cheetah Mobile Blockchain Security Center). We can effectively provide detailed information to resolve the risks of fake and fraudulent content.

Submitted 27 November, 2018; originally announced November 2018.

Comments: Draft

arXiv:1808.05883 [pdf, other]

doi 10.1038/s41598-018-37257-4

Epithelium segmentation using deep learning in H&E-stained prostate specimens with immunohistochemistry as reference standard

Authors: Wouter Bulten, Péter Bándi, Jeffrey Hoven, Rob van de Loo, Johannes Lotz, Nick Weiss, Jeroen van der Laak, Bram van Ginneken, Christina Hulsbergen-van de Kaa, Geert Litjens

Abstract: Prostate cancer (PCa) is graded by pathologists by examining the architectural pattern of cancerous epithelial tissue on hematoxylin and eosin (H&E) stained slides. Given the importance of gland morphology, automatically differentiating between glandular epithelial tissue and other tissues is an important prerequisite for the development of automated methods for detecting PCa. We propose a new method, using deep learning, for automatically segmenting epithelial tissue in digitized prostatectomy slides. We employed immunohistochemistry (IHC) to render the ground truth less subjective and more precise compared to manual outlining on H&E slides, especially in areas with high-grade and poorly differentiated PCa. Our dataset consisted of 102 tissue blocks, including both low and high grade PCa. From each block a single new section was cut, stained with H&E, scanned, restained using P63 and CK8/18 to highlight the epithelial structure, and scanned again. The H&E slides were co-registered to the IHC slides. On a subset of the IHC slides we applied color deconvolution, corrected stain errors manually, and trained a U-Net to perform segmentation of epithelial structures. Whole-slide segmentation masks generated by the IHC U-Net were used to train a second U-Net on H&E. Our system makes precise cell-level segmentations and segments both intact glands as well as individual (tumor) epithelial cells. We achieved an F1-score of 0.895 on a hold-out test set and 0.827 on an external reference set from a different center. We envision this segmentation as being the first part of a fully automated prostate cancer detection and grading pipeline.

Submitted 8 February, 2019; v1 submitted 17 August, 2018; originally announced August 2018.

Journal ref: Nature Scientific Reports 9, 864 (2019)

arXiv:1804.08266 [pdf, other]

NLITrans at SemEval-2018 Task 12: Transfer of Semantic Knowledge for Argument Comprehension

Authors: Tim Niven, Hung-Yu Kao

Abstract: The Argument Reasoning Comprehension Task requires significant language understanding and complex reasoning over world knowledge. We focus on transfer of a sentence encoder to bootstrap more complicated models given the small size of the dataset. Our best model uses a pre-trained BiLSTM to encode input sentences, learns task-specific features for the argument and warrants, then performs independent argument-warrant matching. This model achieves mean test set accuracy of 64.43%. Encoder transfer yields a significant gain to our best model over random initialization. Independent warrant matching effectively doubles the size of the dataset and provides additional regularization. We demonstrate that regularization comes from ignoring statistical correlations between warrant features and position. We also report an experiment with our best model that only matches warrants to reasons, ignoring claims. Relatively low performance degradation suggests that our model is not necessarily learning the intended task.

Submitted 23 April, 2018; originally announced April 2018.

arXiv:1803.00458 [pdf, ps, other]

C-3PO: Click-sequence-aware DeeP Neural Network (DNN)-based Pop-uPs RecOmmendation

Authors: TonTon Hsien-De Huang, Hung-Yu Kao

Abstract: With the emergence of mobile and wearable devices, push notifications have become a powerful tool to connect and maintain the relationship with App users, but sending inappropriate or too many messages at the wrong time may result in the App being removed by the users. In order to maintain the retention rate and the delivery rate of advertisements, we adopt a Deep Neural Network (DNN) to develop a pop-up recommendation system, "Click sequence-aware deeP neural network (DNN)-based Pop-uPs recOmmendation (C-3PO)", enabled by collaborative filtering-based hybrid user behavioral analysis. We further verified the system with real data collected from the products Security Master, Clean Master and CM Browser, supported by Leopard Mobile Inc. (Cheetah Mobile Taiwan Agency). In this way, we can learn precisely about users' preferences and how frequently they click on push notifications/pop-ups, efficiently reduce the disturbance to users, and meanwhile increase the click-through rate of push notifications/pop-ups.

Submitted 20 December, 2018; v1 submitted 28 February, 2018; originally announced March 2018.

Comments: 2018/12/20

arXiv:1710.05305 [pdf, other]

Data-Driven and Deep Learning Methodology for Deceptive Advertising and Phone Scams Detection

Authors: TonTon Hsien-De Huang, Chia-Mu Yu, Hung-Yu Kao

Abstract: The advance of smartphones and cellular networks boosts the need for mobile advertising and targeted marketing. However, it also triggers unseen security threats. We found that phone scams with fake calling numbers of very short lifetime are increasingly popular and have been used to trick users. The harm is worldwide. On the other hand, deceptive advertising (deceptive ads), the fake ads that trick users into installing unnecessary apps via either alluring or daunting texts and pictures, is an emerging threat that seriously harms the reputation of the advertiser. To counter these two new threats, the conventional blacklist (or whitelist) approach and the machine learning approach with predefined features have been proven useless. Nevertheless, due to the success of deep learning in developing highly intelligent programs, our system can efficiently and effectively detect phone scams and deceptive ads by taking advantage of our unified framework on deep neural network (DNN) and convolutional neural network (CNN). The proposed system has been deployed for operational use and the experimental results proved the effectiveness of our proposed system. Furthermore, we keep our research results and release experiment material on http://DeceptiveAds.TWMAN.ORG and http://PhoneScams.TWMAN.ORG if there is any update.

Submitted 15 October, 2017; originally announced October 2017.

Comments: 6 pages, TAAI 2017 version

arXiv:1705.04448 [pdf, other]

R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections

Authors: TonTon Hsien-De Huang, Hung-Yu Kao

Abstract: The influence of Deep Learning on image identification and natural language processing has attracted enormous attention globally. The convolutional neural network, which can learn without prior extraction of features, fits well in response to the rapid iteration of Android malware. The traditional solution for detecting Android malware requires continuous learning through pre-extracted features to maintain high performance in identifying the malware. In order to reduce the manpower spent on feature engineering, without extracting pre-selected features, we have developed a coloR-inspired convolutional neuRal networks (CNN)-based AndroiD malware Detection (R2-D2) system. The system can convert the bytecode of classes.dex from an Android archive file to RGB color code and store it as a color image with fixed size. The color image is input to the convolutional neural network for automatic feature extraction and training. The data was collected from Jan. 2017 to Aug. 2017. During this period, we collected approximately 2 million benign and malicious Android apps for our experiments with the help of our research partner Leopard Mobile Inc. Our experiment results demonstrate that the proposed system has accurate security analysis on contracts. Furthermore, we keep our research results and experiment materials on http://R2D2.TWMAN.ORG.

Submitted 15 November, 2018; v1 submitted 12 May, 2017; originally announced May 2017.

Comments: Version 2018/11/15, IEEE BigData 2018, Seattle, WA, USA, Dec 10-13, 2018. (Accepted)

arXiv:1504.06018 [pdf, other]

Blind Index Coding

Authors: David T. H. Kao, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: We introduce the blind index coding (BIC) problem, in which a single sender communicates distinct messages to multiple users over a shared channel. Each user has partial knowledge of each message as side information. However, unlike classic index coding, in BIC, the sender is uncertain of what side information is available to each user. In particular, the sender only knows the number of bits in each user's side information but not its content. This problem can arise naturally in caching and wireless networks. In order to blindly exploit side information in the BIC problem, we develop a hybrid coding scheme that XORs uncoded bits of a subset of messages with random combinations of bits from other messages. This scheme allows us to strike the right balance between maximizing the transmission rate to each user and minimizing the interference leakage to others. We also develop a general outer bound, which relies on a strong data processing inequality to effectively capture the sender's uncertainty about the users' side information. Additionally, we consider the case where communication takes place over a shared wireless medium, modeled by an erasure broadcast channel, and show that, surprisingly, combining repetition coding with hybrid coding improves the achievable rate region and outperforms alternative strategies for coping with channel erasure while blindly exploiting side information.

Submitted 1 September, 2015; v1 submitted 22 April, 2015; originally announced April 2015.

Comments: Parts of this paper were presented at ISIT 2015 and ICC 2015

arXiv:1504.04797 [pdf, other]

Rover-to-Orbiter Communication in Mars: Taking Advantage of the Varying Topology

Authors: Songze Li, David T. H. Kao, A. Salman Avestimehr

Abstract: In this paper, we study the communication problem from rovers on Mars' surface to Mars-orbiting satellites. We first justify that, to a good extent, the rover-to-orbiter communication problem can be modelled as communication over a $2 \times 2$ X-channel with the network topology varying over time. For such a fading X-channel where transmitters are only aware of the time-varying topology but not the time-varying channel state (i.e., no CSIT), we propose coding strategies that code across topologies, and develop upper bounds on the sum degrees-of-freedom (DoF) that are shown to be tight under certain patterns of the topology variation. Furthermore, we demonstrate that the proposed scheme approximately achieves the ergodic sum-capacity of the network. Using the proposed coding scheme, we numerically evaluate the ergodic rate gain over a time-division-multiple-access (TDMA) scheme for Rayleigh and Rice fading channels. We also numerically demonstrate that with practical orbital parameters, a 9.6% DoF gain, as well as more than 11.6% throughput gain, can be achieved for a rover-to-orbiter communication network.

Submitted 10 December, 2015; v1 submitted 19 April, 2015; originally announced April 2015.

Comments: 13 pages, 6 figures. Accepted by IEEE Transactions on Communications

arXiv:1405.1091 [pdf, other]

Linear Degrees of Freedom of the MIMO X-Channel with Delayed CSIT

Authors: David T. H. Kao, A. Salman Avestimehr

Abstract: We study the degrees of freedom (DoF) of the multiple-input multiple-output X-channel (MIMO XC) with delayed channel state information at the transmitters (delayed CSIT), assuming linear coding strategies at the transmitters. We present two results: 1) the linear sum DoF for MIMO XC with general antenna configurations, and 2) the linear DoF region for MIMO XC with symmetric antennas. The converse for each result is based on developing a novel rank-ratio inequality that characterizes the maximum ratio between the dimensions of received linear subspaces at the two multiple-antenna receivers. The achievability of the linear sum DoF is based on a three-phase strategy, in which during the first two phases only the transmitter with fewer antennas exploits delayed CSIT in order to minimize the dimension of its signal at the unintended receiver. During Phase 3, both transmitters use delayed CSIT to send linear combinations of past transmissions such that each receiver receives a superposition of desired message data and known interference, thus simultaneously serving both receivers. We also derive other linear DoF outer bounds for the MIMO XC that, in addition to the outer bounds from the sum DoF converse and the proposed transmission strategy, allow us to characterize the linear DoF region for symmetric antenna configurations.

Submitted 5 May, 2014; originally announced May 2014.

Comments: to be presented in part at ISIT 2014

arXiv:1305.3934 [pdf, other]

An Upper Bound on the Capacity of Vector Dirty Paper with Unknown Spin and Stretch

Authors: David T. H. Kao, Ashutosh Sabharwal

Abstract: Dirty paper codes are a powerful tool for combating known interference. However, there is a significant difference between knowing the transmitted interference sequence and knowing the received interference sequence, especially when the channel modifying the interference is uncertain. We present an upper bound on the capacity of a compound vector dirty paper channel where although an additive Gaussian sequence is known to the transmitter, the channel matrix between the interferer and receiver is uncertain but known to lie within a bounded set. Our bound is tighter than previous bounds in the low-SIR regime for the scalar version of the compound dirty paper channel and employs a construction that focuses on the relationship between the dimension of the message-bearing signal and the dimension of the additive state sequence. Additionally, a bound on the high-SNR behavior of the system is established.

Submitted 16 May, 2013; originally announced May 2013.

Comments: to be presented at ISIT 2013

Showing 1–29 of 29 results for author: Kao, H