subscribe to arXiv mailings

Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching

Authors: Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Jie Wang, Joemon M. Jose

Abstract: Image-text matching (ITM) is a fundamental problem in computer vision. The key issue lies in jointly learning the visual and textual representation to estimate their similarity accurately. Most existing methods focus on feature enhancement within modality or feature interaction across modalities, which, however, neglects the contextual information of the object representation based on the inter-ob… ▽ More Image-text matching (ITM) is a fundamental problem in computer vision. The key issue lies in jointly learning the visual and textual representation to estimate their similarity accurately. Most existing methods focus on feature enhancement within modality or feature interaction across modalities, which, however, neglects the contextual information of the object representation based on the inter-object relationships that match the corresponding sentences with rich contextual semantics. In this paper, we propose a Hybrid-modal Interaction with multiple Relational Enhancements (termed \textit{Hire}) for image-text matching, which correlates the intra- and inter-modal semantics between objects and words with implicit and explicit relationship modelling. In particular, the explicit intra-modal spatial-semantic graph-based reasoning network is designed to improve the contextual representation of visual objects with salient spatial and semantic relational connectivities, guided by the explicit relationships of the objects' spatial positions and their scene graph. We use implicit relationship modelling for potential relationship interactions before explicit modelling to improve the fault tolerance of explicit relationship detection. Then the visual and textual semantic representations are refined jointly via inter-modal interactive attention and cross-modal alignment. To correlate the context of objects with the textual context, we further refine the visual semantic representation via cross-level object-sentence and word-image-based interactive attention. Extensive experiments validate that the proposed hybrid-modal interaction with implicit and explicit modelling is more beneficial for image-text matching. And the proposed \textit{Hire} obtains new state-of-the-art results on MS-COCO and Flickr30K benchmarks. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 22pages, 5 Figures, 6 tables, the extension of CMSEI in WACV23, and submitted to ACM TIST. arXiv admin note: text overlap with arXiv:2210.08908

arXiv:2405.16701 [pdf, other]

Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

Authors: Tong Shi, Xuri Ge, Joemon M. Jose, Nicolas Pugeault, Paul Henderson

Abstract: Capturing complex temporal relationships between video and audio modalities is vital for Audio-Visual Emotion Recognition (AVER). However, existing methods lack attention to local details, such as facial state changes between video frames, which can reduce the discriminability of features and thus lower recognition accuracy. In this paper, we propose a Detail-Enhanced Intra- and Inter-modal Intera… ▽ More Capturing complex temporal relationships between video and audio modalities is vital for Audio-Visual Emotion Recognition (AVER). However, existing methods lack attention to local details, such as facial state changes between video frames, which can reduce the discriminability of features and thus lower recognition accuracy. In this paper, we propose a Detail-Enhanced Intra- and Inter-modal Interaction network(DE-III) for AVER, incorporating several novel aspects. We introduce optical flow information to enrich video representations with texture details that better capture facial state changes. A fusion module integrates the optical flow estimation with the corresponding video frames to enhance the representation of facial texture variations. We also design attentive intra- and inter-modal feature enhancement modules to further improve the richness and discriminability of video and audio representations. A detailed quantitative evaluation shows that our proposed model outperforms all existing methods on three benchmark datasets for both concrete and continuous emotion recognition. To encourage further research and ensure replicability, we will release our full code upon acceptance. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Comments: Submitted to 27th International Conference of Pattern Recognition (ICPR 2024)

arXiv:2404.17273 [pdf, other]

doi 10.1016/j.ipm.2024.103716

3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting

Authors: Xuri Ge, Songpei Xu, Fuhai Chen, Jie Wang, Guoxin Wang, Shan An, Joemon M. Jose

Abstract: In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval. 3SHNet highlights the salient identification of prominent objects and their spatial locations within the visual modality, thus allowing the integration of visual semantics-spatial interactions and maintaining indep… ▽ More In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval. 3SHNet highlights the salient identification of prominent objects and their spatial locations within the visual modality, thus allowing the integration of visual semantics-spatial interactions and maintaining independence between two modalities. This integration effectively combines object regions with the corresponding semantic and position layouts derived from segmentation to enhance the visual representation. And the modality-independence guarantees efficiency and generalization. Additionally, 3SHNet utilizes the structured contextual visual scene information from segmentation to conduct the local (region-based) or global (grid-based) guidance and achieve accurate hybrid-level retrieval. Extensive experiments conducted on MS-COCO and Flickr30K benchmarks substantiate the superior performances, inference efficiency and generalization of the proposed 3SHNet when juxtaposed with contemporary state-of-the-art methodologies. Specifically, on the larger MS-COCO 5K test set, we achieve 16.3%, 24.8%, and 18.3% improvements in terms of rSum score, respectively, compared with the state-of-the-art methods using different image representations, while maintaining optimal retrieval efficiency. Moreover, our performance on cross-dataset generalization improves by 18.6%. Data and code are available at https://github.com/XuriGe1995/3SHNet. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Comments: Accepted Information Processing and Management (IP&M), 10 pages, 9 figures and 8 tables

Journal ref: Information Processing & Management, Volume 61, Issue 4, July 2024, 103716

arXiv:2404.02059 [pdf, other]

doi 10.1145/3626772.3657725

IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT

Authors: Junchen Fu, Xuri Ge, Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Jie Wang, Joemon M. Jose

Abstract: Multimodal foundation models are transformative in sequential recommender systems, leveraging powerful representation learning capabilities. While Parameter-efficient Fine-tuning (PEFT) is commonly used to adapt foundation models for recommendation tasks, most research prioritizes parameter efficiency, often overlooking critical factors like GPU memory efficiency and training speed. Addressing thi… ▽ More Multimodal foundation models are transformative in sequential recommender systems, leveraging powerful representation learning capabilities. While Parameter-efficient Fine-tuning (PEFT) is commonly used to adapt foundation models for recommendation tasks, most research prioritizes parameter efficiency, often overlooking critical factors like GPU memory efficiency and training speed. Addressing this gap, our paper introduces IISAN (Intra- and Inter-modal Side Adapted Network for Multimodal Representation), a simple plug-and-play architecture using a Decoupled PEFT structure and exploiting both intra- and inter-modal adaptation. IISAN matches the performance of full fine-tuning (FFT) and state-of-the-art PEFT. More importantly, it significantly reduces GPU memory usage - from 47GB to just 3GB for multimodal sequential recommendation tasks. Additionally, it accelerates training time per epoch from 443s to 22s compared to FFT. This is also a notable improvement over the Adapter and LoRA, which require 37-39 GB GPU memory and 350-380 seconds per epoch for training. Furthermore, we propose a new composite efficiency metric, TPME (Training-time, Parameter, and GPU Memory Efficiency) to alleviate the prevalent misconception that "parameter efficiency represents overall efficiency". TPME provides more comprehensive insights into practical efficiency comparisons between different methods. Besides, we give an accessible efficiency analysis of all PEFT and FFT approaches, which demonstrate the superiority of IISAN. We release our codes and other materials at https://github.com/GAIR-Lab/IISAN. △ Less

Submitted 11 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted by SIGIR2024

arXiv:2404.01618 [pdf, other]

Multi-Robot Collaborative Navigation with Formation Adaptation

Authors: Zihao Deng, Peng Gao, Williard Joshua Jose, Hao Zhang

Abstract: Multi-robot collaborative navigation is an essential ability where teamwork and synchronization are keys. In complex and uncertain environments, adaptive formation is vital, as rigid formations prove to be inadequate. The ability of robots to dynamically adjust their formation enables navigation through unpredictable spaces, maintaining cohesion, and effectively responding to environmental challen… ▽ More Multi-robot collaborative navigation is an essential ability where teamwork and synchronization are keys. In complex and uncertain environments, adaptive formation is vital, as rigid formations prove to be inadequate. The ability of robots to dynamically adjust their formation enables navigation through unpredictable spaces, maintaining cohesion, and effectively responding to environmental challenges. In this paper, we introduce a novel approach that uses bi-level learning framework. Specifically, we use graph learning at a high level for group coordination and reinforcement learning for individual navigation. We innovate by integrating a spring-damper model within the reinforcement learning reward mechanism, addressing the rigidity of traditional formation control methods. During execution, our approach enables a team of robots to successfully navigate challenging environments, maintain a desired formation shape, and dynamically adjust their formation scale based on environmental information. We conduct extensive experiments to evaluate our approach across three distinct formation scenarios in multi-robot navigation: circle, line, and wedge. Experimental results show that our approach achieves promising results and scalability on multi-robot navigation with formation adaptation. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.16948 [pdf, other]

Reinforcement Learning-based Recommender Systems with Large Language Models for State Reward and Action Modeling

Authors: Jie Wang, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose

Abstract: Reinforcement Learning (RL)-based recommender systems have demonstrated promising performance in meeting user expectations by learning to make accurate next-item recommendations from historical user-item interactions. However, existing offline RL-based sequential recommendation methods face the challenge of obtaining effective user feedback from the environment. Effectively modeling the user state… ▽ More Reinforcement Learning (RL)-based recommender systems have demonstrated promising performance in meeting user expectations by learning to make accurate next-item recommendations from historical user-item interactions. However, existing offline RL-based sequential recommendation methods face the challenge of obtaining effective user feedback from the environment. Effectively modeling the user state and shaping an appropriate reward for recommendation remains a challenge. In this paper, we leverage language understanding capabilities and adapt large language models (LLMs) as an environment (LE) to enhance RL-based recommenders. The LE is learned from a subset of user-item interaction data, thus reducing the need for large training data, and can synthesise user feedback for offline data by: (i) acting as a state model that produces high quality states that enrich the user representation, and (ii) functioning as a reward model to accurately capture nuanced user preferences on actions. Moreover, the LE allows to generate positive actions that augment the limited offline training data. We propose a LE Augmentation (LEA) method to further improve recommendation performance by optimising jointly the supervised component and the RL policy, using the augmented actions and historical user signals. We use LEA, the state and reward models in conjunction with state-of-the-art RL recommenders and report experimental results on two publicly available datasets. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.13690 [pdf, other]

doi 10.1145/3597503.3639167

MotorEase: Automated Detection of Motor Impairment Accessibility Issues in Mobile App UIs

Authors: Arun Krishnavajjala, SM Hasan Mansur, Justin Jose, Kevin Moran

Abstract: Recent research has begun to examine the potential of automatically finding and fixing accessibility issues that manifest in software. However, while recent work makes important progress, it has generally been skewed toward identifying issues that affect users with certain disabilities, such as those with visual or hearing impairments. However, there are other groups of users with different types… ▽ More Recent research has begun to examine the potential of automatically finding and fixing accessibility issues that manifest in software. However, while recent work makes important progress, it has generally been skewed toward identifying issues that affect users with certain disabilities, such as those with visual or hearing impairments. However, there are other groups of users with different types of disabilities that also need software tooling support to improve their experience. As such, this paper aims to automatically identify accessibility issues that affect users with motor-impairments. To move toward this goal, this paper introduces a novel approach, called MotorEase, capable of identifying accessibility issues in mobile app UIs that impact motor-impaired users. Motor-impaired users often have limited ability to interact with touch-based devices, and instead may make use of a switch or other assistive mechanism -- hence UIs must be designed to support both limited touch gestures and the use of assistive devices. MotorEase adapts computer vision and text processing techniques to enable a semantic understanding of app UI screens, enabling the detection of violations related to four popular, previously unexplored UI design guidelines that support motor-impaired users, including: (i) visual touch target size, (ii) expanding sections, (iii) persisting elements, and (iv) adjacent icon visual distance. We evaluate MotorEase on a newly derived benchmark, called MotorCheck, that contains 555 manually annotated examples of violations to the above accessibility guidelines, across 1599 screens collected from 70 applications via a mobile app testing tool. Our experiments illustrate that MotorEase is able to identify violations with an average accuracy of ~90%, and a false positive rate of less than 9%, outperforming baseline techniques. △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted to ICSE 2024 Research Track, 13 pages

arXiv:2402.15276 [pdf, other]

CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora

Authors: Zijun Long, Xuri Ge, Richard Mccreadie, Joemon Jose

Abstract: Text-to-image retrieval aims to find the relevant images based on a text query, which is important in various use-cases, such as digital libraries, e-commerce, and multimedia databases. Although Multimodal Large Language Models (MLLMs) demonstrate state-of-the-art performance, they exhibit limitations in handling large-scale, diverse, and ambiguous real-world needs of retrieval, due to the computa… ▽ More Text-to-image retrieval aims to find the relevant images based on a text query, which is important in various use-cases, such as digital libraries, e-commerce, and multimedia databases. Although Multimodal Large Language Models (MLLMs) demonstrate state-of-the-art performance, they exhibit limitations in handling large-scale, diverse, and ambiguous real-world needs of retrieval, due to the computation cost and the injective embeddings they produce. This paper presents a two-stage Coarse-to-Fine Index-shared Retrieval (CFIR) framework, designed for fast and effective large-scale long-text to image retrieval. The first stage, Entity-based Ranking (ER), adapts to long-text query ambiguity by employing a multiple-queries-to-multiple-targets paradigm, facilitating candidate filtering for the next stage. The second stage, Summary-based Re-ranking (SR), refines these rankings using summarized queries. We also propose a specialized Decoupling-BEiT-3 encoder, optimized for handling ambiguous user needs and both stages, which also enhances computational efficiency through vector-based similarity inference. Evaluation on the AToMiC dataset reveals that CFIR surpasses existing MLLMs by up to 11.06% in Recall@1000, while reducing training and retrieval times by 68.75% and 99.79%, respectively. We will release our code to facilitate future research at https://github.com/longkukuhi/CFIR. △ Less

Submitted 2 April, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.06194 [pdf, other]

SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation

Authors: Yifan Xiong, Yuting Jiang, Ziyue Yang, Lei Qu, Guoshuai Zhao, Shuguang Liu, Dong Zhong, Boris Pinzur, Jie Zhang, Yang Wang, Jithin Jose, Hossein Pourreza, Jeff Baxter, Kushal Datta, Prabhat Ram, Luke Melton, Joe Chau, Peng Cheng, Yongqiang Xiong, Lidong Zhou

Abstract: Reliability in cloud AI infrastructure is crucial for cloud service providers, prompting the widespread use of hardware redundancies. However, these redundancies can inadvertently lead to hidden degradation, so called "gray failure", for AI workloads, significantly affecting end-to-end performance and concealing performance issues, which complicates root cause analysis for failures and regressions… ▽ More Reliability in cloud AI infrastructure is crucial for cloud service providers, prompting the widespread use of hardware redundancies. However, these redundancies can inadvertently lead to hidden degradation, so called "gray failure", for AI workloads, significantly affecting end-to-end performance and concealing performance issues, which complicates root cause analysis for failures and regressions. We introduce SuperBench, a proactive validation system for AI infrastructure that mitigates hidden degradation caused by hardware redundancies and enhances overall reliability. SuperBench features a comprehensive benchmark suite, capable of evaluating individual hardware components and representing most real AI workloads. It comprises a Validator which learns benchmark criteria to clearly pinpoint defective components. Additionally, SuperBench incorporates a Selector to balance validation time and issue-related penalties, enabling optimal timing for validation execution with a tailored subset of benchmarks. Through testbed evaluation and simulation, we demonstrate that SuperBench can increase the mean time between incidents by up to 22.61x. SuperBench has been successfully deployed in Azure production, validating hundreds of thousands of GPUs over the last two years. △ Less

Submitted 7 June, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: USENIX ATC '24

arXiv:2308.13976 [pdf, other]

Label Denoising through Cross-Model Agreement

Authors: Yu Wang, Xin Xin, Zaiqiao Meng, Joemon Jose, Fuli Feng

Abstract: Learning from corrupted labels is very common in real-world machine-learning applications. Memorizing such noisy labels could affect the learning of the model, leading to sub-optimal performances. In this work, we propose a novel framework to learn robust machine-learning models from noisy labels. Through an empirical study, we find that different models make relatively similar predictions on clea… ▽ More Learning from corrupted labels is very common in real-world machine-learning applications. Memorizing such noisy labels could affect the learning of the model, leading to sub-optimal performances. In this work, we propose a novel framework to learn robust machine-learning models from noisy labels. Through an empirical study, we find that different models make relatively similar predictions on clean examples, while the predictions on noisy examples vary much more across different models. Motivated by this observation, we propose \em denoising with cross-model agreement \em (DeCA) which aims to minimize the KL-divergence between the true label distributions parameterized by two machine learning models while maximizing the likelihood of data observation. We employ the proposed DeCA on both the binary label scenario and the multiple label scenario. For the binary label scenario, we select implicit feedback recommendation as the downstream task and conduct experiments with four state-of-the-art recommendation models on four datasets. For the multiple-label scenario, the downstream application is image classification on two benchmark datasets. Experimental results demonstrate that the proposed methods significantly improve the model performance compared with normal training and other denoising methods on both binary and multiple-label scenarios. △ Less

Submitted 18 December, 2023; v1 submitted 26 August, 2023; originally announced August 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2105.09605

arXiv:2305.11081 [pdf, other]

Contrastive State Augmentations for Reinforcement Learning-Based Recommender Systems

Authors: Zhaochun Ren, Na Huang, Yidan Wang, Pengjie Ren, Jun Ma, Jiahuan Lei, Xinlei Shi, Hengliang Luo, Joemon M Jose, Xin Xin

Abstract: Learning reinforcement learning (RL)-based recommenders from historical user-item interaction sequences is vital to generate high-reward recommendations and improve long-term cumulative benefits. However, existing RL recommendation methods encounter difficulties (i) to estimate the value functions for states which are not contained in the offline training data, and (ii) to learn effective state re… ▽ More Learning reinforcement learning (RL)-based recommenders from historical user-item interaction sequences is vital to generate high-reward recommendations and improve long-term cumulative benefits. However, existing RL recommendation methods encounter difficulties (i) to estimate the value functions for states which are not contained in the offline training data, and (ii) to learn effective state representations from user implicit feedback due to the lack of contrastive signals. In this work, we propose contrastive state augmentations (CSA) for the training of RL-based recommender systems. To tackle the first issue, we propose four state augmentation strategies to enlarge the state space of the offline data. The proposed method improves the generalization capability of the recommender by making the RL agent visit the local state regions and ensuring the learned value functions are similar between the original and augmented states. For the second issue, we propose introducing contrastive signals between augmented states and the state randomly sampled from other sessions to improve the state representation learning further. To verify the effectiveness of the proposed CSA, we conduct extensive experiments on two publicly accessible datasets and one dataset collected from a real-life e-commerce platform. We also conduct experiments on a simulated environment as the online evaluation setting. Experimental results demonstrate that CSA can effectively improve recommendation performance. △ Less

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.09977 [pdf, other]

doi 10.1145/3615979.3656059

DRackSim: Simulator for Rack-scale Memory Disaggregation

Authors: Amit Puri, John Jose, Tamarapalli Venkatesh, Vijaykrishnan Narayanan

Abstract: Memory disaggregation has emerged as an alternative to traditional server architecture in data centers. This paper introduces DRackSim, a simulation infrastructure to model rack-scale hardware disaggregated memory. DRackSim models multiple compute nodes, memory pools, and a rack-scale interconnect similar to GenZ. An application-level simulation approach simulates an x86 out-of-order multi-core pr… ▽ More Memory disaggregation has emerged as an alternative to traditional server architecture in data centers. This paper introduces DRackSim, a simulation infrastructure to model rack-scale hardware disaggregated memory. DRackSim models multiple compute nodes, memory pools, and a rack-scale interconnect similar to GenZ. An application-level simulation approach simulates an x86 out-of-order multi-core processor with a multi-level cache hierarchy at compute nodes. A queue-based simulation is used to model a remote memory controller and rack-level interconnect, which allows both cache-based and page-based access to remote memory. DRackSim models a central memory manager to manage address space at the memory pools. We integrate community-accepted DRAMSim2 to perform memory simulation at local and remote memory using multiple DRAMSim2 instances. An incremental approach is followed to validate the core and cache subsystem of DRackSim with that of Gem5. We measure the performance of various HPC workloads and show the performance impact for different nodes/pools configuration. △ Less

Submitted 19 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.05585 [pdf, other]

doi 10.1145/3539618.3591697

Improving Implicit Feedback-Based Recommendation through Multi-Behavior Alignment

Authors: Xin Xin, Xiangyuan Liu, Hanbing Wang, Pengjie Ren, Zhumin Chen, Jiahuan Lei, Xinlei Shi, Hengliang Luo, Joemon Jose, Maarten de Rijke, Zhaochun Ren

Abstract: Recommender systems that learn from implicit feedback often use large volumes of a single type of implicit user feedback, such as clicks, to enhance the prediction of sparse target behavior such as purchases. Using multiple types of implicit user feedback for such target behavior prediction purposes is still an open question. Existing studies that attempted to learn from multiple types of user beh… ▽ More Recommender systems that learn from implicit feedback often use large volumes of a single type of implicit user feedback, such as clicks, to enhance the prediction of sparse target behavior such as purchases. Using multiple types of implicit user feedback for such target behavior prediction purposes is still an open question. Existing studies that attempted to learn from multiple types of user behavior often fail to: (i) learn universal and accurate user preferences from different behavioral data distributions, and (ii) overcome the noise and bias in observed implicit user feedback. To address the above problems, we propose multi-behavior alignment (MBA), a novel recommendation framework that learns from implicit feedback by using multiple types of behavioral data. We conjecture that multiple types of behavior from the same user (e.g., clicks and purchases) should reflect similar preferences of that user. To this end, we regard the underlying universal user preferences as a latent variable. The variable is inferred by maximizing the likelihood of multiple observed behavioral data distributions and, at the same time, minimizing the Kullback-Leibler divergence (KL-divergence) between user models learned from auxiliary behavior (such as clicks or views) and the target behavior separately. MBA infers universal user preferences from multi-behavior data and performs data denoising to enable effective knowledge transfer. We conduct experiments on three datasets, including a dataset collected from an operational e-commerce platform. Empirical results demonstrate the effectiveness of our proposed method in utilizing multiple types of behavioral data to enhance the prediction of the target behavior. △ Less

Submitted 9 May, 2023; originally announced May 2023.

arXiv:2303.06420 [pdf, other]

doi 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00060

Design and Evaluation of a Rack-Scale Disaggregated Memory Architecture For Data Centers

Authors: Amit Puri, John Jose, Tamarapalli Venkatesh

Abstract: Memory disaggregation is being considered as a strong alternative to traditional architecture to deal with the memory under-utilization in data centers. Disaggregated memory can adapt to dynamically changing memory requirements for the data center applications like data analytics, big data, etc., that require in-memory processing. However, such systems can face high remote memory access latency du… ▽ More Memory disaggregation is being considered as a strong alternative to traditional architecture to deal with the memory under-utilization in data centers. Disaggregated memory can adapt to dynamically changing memory requirements for the data center applications like data analytics, big data, etc., that require in-memory processing. However, such systems can face high remote memory access latency due to the interconnect speeds. In this paper, we explore a rack-scale disaggregated memory architecture and discuss the various design aspects. We design a trace-driven simulator that combines an event-based interconnect and a cycle-accurate memory simulator to evaluate the performance of disaggregated memory system at the rack scale. Our study shows that not only the interconnect but the contention in the remote memory queues also adds significantly to remote memory access latency. We introduces a memory allocation policy to reduce the latency compared to the conventional policies. We conduct experiments using various benchmarks with diverse memory access patterns. Our study shows encouraging results towards the rack-scale memory disaggregation and acceptable average memory access latency. △ Less

Submitted 8 April, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

arXiv:2211.10155 [pdf, other]

Structured Pruning Adapters

Authors: Lukas Hedegaard, Aman Alok, Juby Jose, Alexandros Iosifidis

Abstract: Adapters are a parameter-efficient alternative to fine-tuning, which augment a frozen base network to learn new tasks. Yet, the inference of the adapted model is often slower than the corresponding fine-tuned model. To improve on this, we propose Structured Pruning Adapters (SPAs), a family of compressing, task-switching network adapters, that accelerate and specialize networks using tiny paramete… ▽ More Adapters are a parameter-efficient alternative to fine-tuning, which augment a frozen base network to learn new tasks. Yet, the inference of the adapted model is often slower than the corresponding fine-tuned model. To improve on this, we propose Structured Pruning Adapters (SPAs), a family of compressing, task-switching network adapters, that accelerate and specialize networks using tiny parameter sets and structured pruning. Specifically, we propose a channel-based SPA and evaluate it with a suite of pruning methods on multiple computer vision benchmarks. Compared to regular structured pruning with fine-tuning, our channel-SPAs improve accuracy by 6.9% on average while using half the parameters at 90% pruned weights. Alternatively, they can learn adaptations with 17x fewer parameters at 70% pruning with 1.6% lower accuracy. Similarly, our block-SPA requires far fewer parameters than pruning with fine-tuning. Our experimental code and Python library of adapters are available at github.com/lukashedegaard/structured-pruning-adapters. △ Less

Submitted 2 February, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: 11 pages, 6 figures, 2 tables

arXiv:2211.00718 [pdf]

SleepyWheels: An Ensemble Model for Drowsiness Detection leading to Accident Prevention

Authors: Jomin Jose, Andrew J, Kumudha Raimond, Shweta Vincent

Abstract: Around 40 percent of accidents related to driving on highways in India occur due to the driver falling asleep behind the steering wheel. Several types of research are ongoing to detect driver drowsiness but they suffer from the complexity and cost of the models. In this paper, SleepyWheels a revolutionary method that uses a lightweight neural network in conjunction with facial landmark identificat… ▽ More Around 40 percent of accidents related to driving on highways in India occur due to the driver falling asleep behind the steering wheel. Several types of research are ongoing to detect driver drowsiness but they suffer from the complexity and cost of the models. In this paper, SleepyWheels a revolutionary method that uses a lightweight neural network in conjunction with facial landmark identification is proposed to identify driver fatigue in real time. SleepyWheels is successful in a wide range of test scenarios, including the lack of facial characteristics while covering the eye or mouth, the drivers varying skin tones, camera placements, and observational angles. It can work well when emulated to real time systems. SleepyWheels utilized EfficientNetV2 and a facial landmark detector for identifying drowsiness detection. The model is trained on a specially created dataset on driver sleepiness and it achieves an accuracy of 97 percent. The model is lightweight hence it can be further deployed as a mobile application for various platforms. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Comments: 20 pages

arXiv:2210.08908 [pdf, other]

Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval

Authors: Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Joemon M. Jose

Abstract: Image-sentence retrieval has attracted extensive research attention in multimedia and computer vision due to its promising application. The key issue lies in jointly learning the visual and textual representation to accurately estimate their similarity. To this end, the mainstream schema adopts an object-word based attention to calculate their relevance scores and refine their interactive represen… ▽ More Image-sentence retrieval has attracted extensive research attention in multimedia and computer vision due to its promising application. The key issue lies in jointly learning the visual and textual representation to accurately estimate their similarity. To this end, the mainstream schema adopts an object-word based attention to calculate their relevance scores and refine their interactive representations with the attention features, which, however, neglects the context of the object representation on the inter-object relationship that matches the predicates in sentences. In this paper, we propose a Cross-modal Semantic Enhanced Interaction method, termed CMSEI for image-sentence retrieval, which correlates the intra- and inter-modal semantics between objects and words. In particular, we first design the intra-modal spatial and semantic graphs based reasoning to enhance the semantic representations of objects guided by the explicit relationships of the objects' spatial positions and their scene graph. Then the visual and textual semantic representations are refined jointly via the inter-modal interactive attention and the cross-modal alignment. To correlate the context of objects with the textual context, we further refine the visual semantic representation via the cross-level object-sentence and word-image based interactive attention. Experimental results on seven standard evaluation metrics show that the proposed CMSEI outperforms the state-of-the-art and the alternative approaches on MS-COCO and Flickr30K benchmarks. △ Less

Submitted 17 October, 2022; originally announced October 2022.

Comments: accepted to WACV 2023

arXiv:2208.09070 [pdf]

Electronic, Wireless, and Photonic Network-on-Chip Security: Challenges and Countermeasures

Authors: Sudeep Pasricha, John Jose, Sujay Deb

Abstract: Networks-on-chips (NoCs) are an integral part of emerging manycore computing chips. They play a key role in facilitating communication among processing cores and between cores and memory. To meet the aggressive performance and energy-efficiency targets of machine learning and big data applications, NoCs have been evolving to leverage emerging paradigms such as silicon photonics and wireless commun… ▽ More Networks-on-chips (NoCs) are an integral part of emerging manycore computing chips. They play a key role in facilitating communication among processing cores and between cores and memory. To meet the aggressive performance and energy-efficiency targets of machine learning and big data applications, NoCs have been evolving to leverage emerging paradigms such as silicon photonics and wireless communication. Increasingly, these NoC fabrics are becoming susceptible to security vulnerabilities, such as from hardware trojans that can snoop, corrupt, or disrupt information transfers on NoCs. This article surveys the landscape of security challenges and countermeasures across electronic, wireless, and photonic NoCs. △ Less

Submitted 18 August, 2022; originally announced August 2022.

arXiv:2206.06190 [pdf, other]

TransRec: Learning Transferable Recommendation from Mixture-of-Modality Feedback

Authors: Jie Wang, Fajie Yuan, Mingyue Cheng, Joemon M. Jose, Chenyun Yu, Beibei Kong, Xiangnan He, Zhijin Wang, Bo Hu, Zang Li

Abstract: Learning large-scale pre-trained models on broad-ranging data and then transfer to a wide range of target tasks has become the de facto paradigm in many machine learning (ML) communities. Such big models are not only strong performers in practice but also offer a promising way to break out of the task-specific modeling restrictions, thereby enabling task-agnostic and unified ML systems. However, s… ▽ More Learning large-scale pre-trained models on broad-ranging data and then transfer to a wide range of target tasks has become the de facto paradigm in many machine learning (ML) communities. Such big models are not only strong performers in practice but also offer a promising way to break out of the task-specific modeling restrictions, thereby enabling task-agnostic and unified ML systems. However, such a popular paradigm is mainly unexplored by the recommender systems (RS) community. A critical issue is that standard recommendation models are primarily built on categorical identity features. That is, the users and the interacted items are represented by their unique IDs, which are generally not shareable across different systems or platforms. To pursue the transferable recommendations, we propose studying pre-trained RS models in a novel scenario where a user's interaction feedback involves a mixture-of-modality (MoM) items, e.g., text and images. We then present TransRec, a very simple modification made on the popular ID-based RS framework. TransRec learns directly from the raw features of the MoM items in an end-to-end training manner and thus enables effective transfer learning under various scenarios without relying on overlapped users or items. We empirically study the transferring ability of TransRec across four different real-world recommendation settings. Besides, we look at its effects by scaling source and target data size. Our results suggest that learning neural recommendation models from MoM feedback provides a promising way to realize universal RS. △ Less

Submitted 3 November, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

arXiv:2206.03382 [pdf, other]

Tutel: Adaptive Mixture-of-Experts at Scale

Authors: Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prabhat Ram, Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Yongqiang Xiong

Abstract: Sparsely-gated mixture-of-experts (MoE) has been widely adopted to scale deep learning models to trillion-plus parameters with fixed computational cost. The algorithmic performance of MoE relies on its token routing mechanism that forwards each input token to the right sub-models or experts. While token routing dynamically determines the amount of expert workload at runtime, existing systems suffe… ▽ More Sparsely-gated mixture-of-experts (MoE) has been widely adopted to scale deep learning models to trillion-plus parameters with fixed computational cost. The algorithmic performance of MoE relies on its token routing mechanism that forwards each input token to the right sub-models or experts. While token routing dynamically determines the amount of expert workload at runtime, existing systems suffer inefficient computation due to their static execution, namely static parallelism and pipelining, which does not adapt to the dynamic workload. We present Flex, a highly scalable stack design and implementation for MoE with dynamically adaptive parallelism and pipelining. Flex designs an identical layout for distributing MoE model parameters and input data, which can be leveraged by all possible parallelism or pipelining methods without any mathematical inequivalence or tensor migration overhead. This enables adaptive parallelism/pipelining optimization at zero cost during runtime. Based on this key design, Flex also implements various MoE acceleration techniques. Aggregating all techniques, Flex finally delivers huge speedup at any scale -- 4.96x and 5.75x speedup of a single MoE layer over 16 and 2,048 A100 GPUs, respectively, over the previous state-of-the-art. Our evaluation shows that Flex efficiently and effectively runs a real-world MoE-based model named SwinV2-MoE, built upon Swin Transformer V2, a state-of-the-art computer vision architecture. On efficiency, Flex accelerates SwinV2-MoE, achieving up to 1.55x and 2.11x speedup in training and inference over Fairseq, respectively. On effectiveness, the SwinV2-MoE model achieves superior accuracy in both pre-training and down-stream computer vision tasks such as COCO object detection than the counterpart dense model, indicating the readiness of Flex for end-to-end real-world model training and inference. △ Less

Submitted 5 June, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

arXiv:2204.04993 [pdf, other]

Ischemic Stroke Lesion Segmentation Using Adversarial Learning

Authors: Mobarakol Islam, N Rajiv Vaidyanathan, V Jeya Maria Jose, Hongliang Ren

Abstract: Ischemic stroke occurs through a blockage of clogged blood vessels supplying blood to the brain. Segmentation of the stroke lesion is vital to improve diagnosis, outcome assessment and treatment planning. In this work, we propose a segmentation model with adversarial learning for ischemic lesion segmentation. We adopt U-Net with skip connection and dropout as segmentation baseline network and a fu… ▽ More Ischemic stroke occurs through a blockage of clogged blood vessels supplying blood to the brain. Segmentation of the stroke lesion is vital to improve diagnosis, outcome assessment and treatment planning. In this work, we propose a segmentation model with adversarial learning for ischemic lesion segmentation. We adopt U-Net with skip connection and dropout as segmentation baseline network and a fully connected network (FCN) as discriminator network. Discriminator network consists of 5 convolution layers followed by leaky-ReLU and an upsampling layer to rescale the output to the size of the input map. Training a segmentation network along with an adversarial network can detect and correct higher order inconsistencies between the segmentation maps produced by ground-truth and the Segmentor. We exploit three modalities (CT, DPWI, CBF) of acute computed tomography (CT) perfusion data provided in ISLES 2018 (Ischemic Stroke Lesion Segmentation) for ischemic lesion segmentation. Our model has achieved dice accuracy of 42.10% with the cross-validation of training and 39% with the testing data. △ Less

Submitted 11 April, 2022; originally announced April 2022.

Comments: Published in MICCAI ISLES Challenge 2018

arXiv:2204.01349 [pdf, other]

doi 10.1145/3643863

MGRR-Net: Multi-level Graph Relational Reasoning Network for Facial Action Units Detection

Authors: Xuri Ge, Joemon M. Jose, Songpei Xu, Xiao Liu, Hu Han

Abstract: The Facial Action Coding System (FACS) encodes the action units (AUs) in facial images, which has attracted extensive research attention due to its wide use in facial expression analysis. Many methods that perform well on automatic facial action unit (AU) detection primarily focus on modeling various types of AU relations between corresponding local muscle areas, or simply mining global attention-… ▽ More The Facial Action Coding System (FACS) encodes the action units (AUs) in facial images, which has attracted extensive research attention due to its wide use in facial expression analysis. Many methods that perform well on automatic facial action unit (AU) detection primarily focus on modeling various types of AU relations between corresponding local muscle areas, or simply mining global attention-aware facial features, however, neglect the dynamic interactions among local-global features. We argue that encoding AU features just from one perspective may not capture the rich contextual information between regional and global face features, as well as the detailed variability across AUs, because of the diversity in expression and individual characteristics. In this paper, we propose a novel Multi-level Graph Relational Reasoning Network (termed MGRR-Net) for facial AU detection. Each layer of MGRR-Net performs a multi-level (i.e., region-level, pixel-wise and channel-wise level) feature learning. While the region-level feature learning from local face patches features via graph neural network can encode the correlation across different AUs, the pixel-wise and channel-wise feature learning via graph attention network can enhance the discrimination ability of AU features from global face features. The fused features from the three levels lead to improved AU discriminative ability. Extensive experiments on DISFA and BP4D AU datasets show that the proposed approach achieves superior performance than the state-of-the-art methods. △ Less

Submitted 22 May, 2024; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: 20 pages, 5 figures, 8 tables;

Journal ref: ACM Transactions on Intelligent Systems and Technology,2024

arXiv:2203.01800 [pdf, other]

Automatic Facial Paralysis Estimation with Facial Action Units

Authors: Xuri Ge, Joemon M. Jose, Pengcheng Wang, Arunachalam Iyer, Xiao Liu, Hu Han

Abstract: Facial palsy is unilateral facial nerve weakness or paralysis of rapid onset with unknown causes. Automatically estimating facial palsy severeness can be helpful for the diagnosis and treatment of people suffering from it across the world. In this work, we develop and experiment with a novel model for estimating facial palsy severity. For this, an effective Facial Action Units (AU) detection techn… ▽ More Facial palsy is unilateral facial nerve weakness or paralysis of rapid onset with unknown causes. Automatically estimating facial palsy severeness can be helpful for the diagnosis and treatment of people suffering from it across the world. In this work, we develop and experiment with a novel model for estimating facial palsy severity. For this, an effective Facial Action Units (AU) detection technique is incorporated into our model, where AUs refer to a unique set of facial muscle movements used to describe almost every anatomically possible facial expression. In this paper, we propose a novel Adaptive Local-Global Relational Network (ALGRNet) for facial AU detection and use it to classify facial paralysis severity. ALGRNet mainly consists of three main novel structures: (i) an adaptive region learning module that learns the adaptive muscle regions based on the detected landmarks; (ii) a skip-BiLSTM that models the latent relationships among local AUs; and (iii) a feature fusion&refining module that investigates the complementary between the local and global face. Quantitative results on two AU benchmarks, i.e., BP4D and DISFA, demonstrate our ALGRNet can achieve promising AU detection accuracy. We further demonstrate the effectiveness of its application to facial paralysis estimation by migrating ALGRNet to a facial paralysis dataset collected and annotated by medical professionals. △ Less

Submitted 30 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: 12 pages, 5 figures, resubmitted to IEEE Transactions on Affective Computing

arXiv:2111.03474 [pdf, other]

Supervised Advantage Actor-Critic for Recommender Systems

Authors: Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose

Abstract: Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges like off-policy training, huge action spaces and lack of sufficient reward signals. Recent RL approach… ▽ More Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges like off-policy training, huge action spaces and lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence. To address the above problems, we propose negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods . Code will be open-sourced. △ Less

Submitted 5 November, 2021; originally announced November 2021.

Comments: 9 pages, 4 figures, In Proceedings of the 15th ACM International Conference on Web Search and Data Mining (WSDM '22), February 21-25, 2022, Phoenix, Arizona. arXiv admin note: text overlap with arXiv:2006.05779

arXiv:2110.05688 [pdf]

Inclusive Design: Accessibility Settings for People with Cognitive Disabilities

Authors: Trae Waggoner, Julia Ann Jose, Ashwin Nair, Sudarsan Manikandan

Abstract: The advancement of technology has progressed faster than any other field in the world and with the development of these new technologies, it is important to make sure that these tools can be used by everyone, including people with disabilities. Accessibility options in computing devices help ensure that everyone has the same access to advanced technologies. Unfortunately, for those who require mor… ▽ More The advancement of technology has progressed faster than any other field in the world and with the development of these new technologies, it is important to make sure that these tools can be used by everyone, including people with disabilities. Accessibility options in computing devices help ensure that everyone has the same access to advanced technologies. Unfortunately, for those who require more unique and sometimes challenging accommodations, such as people with Amyotrophic lateral sclerosis ( ALS), the most commonly used accessibility features are simply not enough. While assistive technology for those with ALS does exist, it requires multiple peripheral devices that can become quite expensive collectively. The purpose of this paper is to suggest a more affordable and readily available option for ALS assistive technology that can be implemented on a smartphone or tablet. △ Less

Submitted 11 October, 2021; originally announced October 2021.

arXiv:2110.05661 [pdf]

BotNet Detection on Social Media

Authors: Aniket Chandrakant Devle, Julia Ann Jose, Abhay Shrinivas Saraswathula, Shubham Mehta, Siddhant Srivastava, Sirisha Kona, Sudheera Daggumalli

Abstract: As our reliance on social media platforms and web services increase day by day, exploiters view these platforms as an opportunity to manipulate our thoughts ad actions. These platforms have become an open playground for social bot accounts. Social bots not only learn human conversations, manners, and presence but also manipulate public opinion, act as scammers, manipulate stock markets, and so on.… ▽ More As our reliance on social media platforms and web services increase day by day, exploiters view these platforms as an opportunity to manipulate our thoughts ad actions. These platforms have become an open playground for social bot accounts. Social bots not only learn human conversations, manners, and presence but also manipulate public opinion, act as scammers, manipulate stock markets, and so on. There has been evidence of bots manipulating people's opinions and thoughts which can be a great threat to democracy. Identification and prevention of such campaigns that release or create these bots have become critical. Our goal in this paper is to leverage web mining techniques to help detect fake bots on social media platforms such as Twitter, thereby mitigating the spread of disinformation. △ Less

Submitted 27 November, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

arXiv:2108.02417 [pdf, other]

doi 10.1145/3474085.3475634

Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval

Authors: Xuri Ge, Fuhai Chen, Joemon M. Jose, Zhilong Ji, Zhongqin Wu, Xiao Liu

Abstract: The current state-of-the-art image-sentence retrieval methods implicitly align the visual-textual fragments, like regions in images and words in sentences, and adopt attention modules to highlight the relevance of cross-modal semantic correspondences. However, the retrieval performance remains unsatisfactory due to a lack of consistent representation in both semantics and structural spaces. In thi… ▽ More The current state-of-the-art image-sentence retrieval methods implicitly align the visual-textual fragments, like regions in images and words in sentences, and adopt attention modules to highlight the relevance of cross-modal semantic correspondences. However, the retrieval performance remains unsatisfactory due to a lack of consistent representation in both semantics and structural spaces. In this work, we propose to address the above issue from two aspects: (i) constructing intrinsic structure (along with relations) among the fragments of respective modalities, e.g., "dog $\to$ play $\to$ ball" in semantic structure for an image, and (ii) seeking explicit inter-modal structural and semantic correspondence between the visual and textual modalities. In this paper, we propose a novel Structured Multi-modal Feature Embedding and Alignment (SMFEA) model for image-sentence retrieval. In order to jointly and explicitly learn the visual-textual embedding and the cross-modal alignment, SMFEA creates a novel multi-modal structured module with a shared context-aware referral tree. In particular, the relations of the visual and textual fragments are modeled by constructing Visual Context-aware Structured Tree encoder (VCS-Tree) and Textual Context-aware Structured Tree encoder (TCS-Tree) with shared labels, from which visual and textual features can be jointly learned and optimized. We utilize the multi-modal tree structure to explicitly align the heterogeneous image-sentence data by maximizing the semantic and structural similarity between corresponding inter-modal tree nodes. Extensive experiments on Microsoft COCO and Flickr30K benchmarks demonstrate the superiority of the proposed model in comparison to the state-of-the-art methods. △ Less

Submitted 5 August, 2021; originally announced August 2021.

Comments: 9 pages, 7 figures, Accepted by ACM MM 2021

arXiv:2106.13271 [pdf, ps, other]

On Fairness and Interpretability

Authors: Deepak P, Sanil V, Joemon M. Jose

Abstract: Ethical AI spans a gamut of considerations. Among these, the most popular ones, fairness and interpretability, have remained largely distinct in technical pursuits. We discuss and elucidate the differences between fairness and interpretability across a variety of dimensions. Further, we develop two principles-based frameworks towards developing ethical AI for the future that embrace aspects of bot… ▽ More Ethical AI spans a gamut of considerations. Among these, the most popular ones, fairness and interpretability, have remained largely distinct in technical pursuits. We discuss and elucidate the differences between fairness and interpretability across a variety of dimensions. Further, we develop two principles-based frameworks towards developing ethical AI for the future that embrace aspects of both fairness and interpretability. First, interpretability for fairness proposes instantiating interpretability within the realm of fairness to develop a new breed of ethical AI. Second, fairness and interpretability initiates deliberations on bringing the best aspects of both together. We hope that these two frameworks will contribute to intensifying scholarly discussions on new frontiers of ethical AI that brings together fairness and interpretability. △ Less

Submitted 24 June, 2021; originally announced June 2021.

Comments: in IJCAI 2021 Workshop on AI for Social Good, January 2021. [ Ref: https://crcs.seas.harvard.edu/publications/fairness-and-interpretability ]

arXiv:2105.09605 [pdf, other]

doi 10.1145/3485447.3512202

Learning Robust Recommenders through Cross-Model Agreement

Authors: Yu Wang, Xin Xin, Zaiqiao Meng, Xiangnan He, Joemon Jose, Fuli Feng

Abstract: Learning from implicit feedback is one of the most common cases in the application of recommender systems. Generally speaking, interacted examples are considered as positive while negative examples are sampled from uninteracted ones. However, noisy examples are prevalent in real-world implicit feedback. A noisy positive example could be interacted but it actually leads to negative user preference.… ▽ More Learning from implicit feedback is one of the most common cases in the application of recommender systems. Generally speaking, interacted examples are considered as positive while negative examples are sampled from uninteracted ones. However, noisy examples are prevalent in real-world implicit feedback. A noisy positive example could be interacted but it actually leads to negative user preference. A noisy negative example which is uninteracted because of unawareness of the user could also denote potential positive user preference. Conventional training methods overlook these noisy examples, leading to sub-optimal recommendations. In this work, we propose a novel framework to learn robust recommenders from implicit feedback. Through an empirical study, we find that different models make relatively similar predictions on clean examples which denote the real user preference, while the predictions on noisy examples vary much more across different models. Motivated by this observation, we propose denoising with cross-model agreement(DeCA) which aims to minimize the KL-divergence between the real user preference distributions parameterized by two recommendation models while maximizing the likelihood of data observation. We employ the proposed DeCA on four state-of-the-art recommendation models and conduct experiments on four datasets. Experimental results demonstrate that DeCA significantly improves recommendation performance compared with normal training and other denoising methods. Codes will be open-sourced. △ Less

Submitted 13 March, 2022; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: 12 pages, 23 figures

Journal ref: World Wide Web Conference 2022

arXiv:2104.00985 [pdf, other]

Brain Tumor Segmentation and Survival Prediction using 3D Attention UNet

Authors: Mobarakol Islam, Vibashan VS, V Jeya Maria Jose, Navodini Wijethilake, Uppal Utkarsh, Hongliang Ren

Abstract: In this work, we develop an attention convolutional neural network (CNN) to segment brain tumors from Magnetic Resonance Images (MRI). Further, we predict the survival rate using various machine learning methods. We adopt a 3D UNet architecture and integrate channel and spatial attention with the decoder network to perform segmentation. For survival prediction, we extract some novel radiomic featu… ▽ More In this work, we develop an attention convolutional neural network (CNN) to segment brain tumors from Magnetic Resonance Images (MRI). Further, we predict the survival rate using various machine learning methods. We adopt a 3D UNet architecture and integrate channel and spatial attention with the decoder network to perform segmentation. For survival prediction, we extract some novel radiomic features based on geometry, location, the shape of the segmented tumor and combine them with clinical information to estimate the survival duration for each patient. We also perform extensive experiments to show the effect of each feature for overall survival (OS) prediction. The experimental results infer that radiomic features such as histogram, location, and shape of the necrosis region and clinical features like age are the most critical parameters to estimate the OS. △ Less

Submitted 2 April, 2021; originally announced April 2021.

Comments: MICCAI-BrainLes Workshop

arXiv:2104.00980 [pdf, other]

Glioma Prognosis: Segmentation of the Tumor and Survival Prediction using Shape, Geometric and Clinical Information

Authors: Mobarakol Islam, V Jeya Maria Jose, Hongliang Ren

Abstract: Segmentation of brain tumor from magnetic resonance imaging (MRI) is a vital process to improve diagnosis, treatment planning and to study the difference between subjects with tumor and healthy subjects. In this paper, we exploit a convolutional neural network (CNN) with hypercolumn technique to segment tumor from healthy brain tissue. Hypercolumn is the concatenation of a set of vectors which for… ▽ More Segmentation of brain tumor from magnetic resonance imaging (MRI) is a vital process to improve diagnosis, treatment planning and to study the difference between subjects with tumor and healthy subjects. In this paper, we exploit a convolutional neural network (CNN) with hypercolumn technique to segment tumor from healthy brain tissue. Hypercolumn is the concatenation of a set of vectors which form by extracting convolutional features from multiple layers. Proposed model integrates batch normalization (BN) approach with hypercolumn. BN layers help to alleviate the internal covariate shift during stochastic gradient descent (SGD) training by zero-mean and unit variance of each mini-batch. Survival Prediction is done by first extracting features(Geometric, Fractal, and Histogram) from the segmented brain tumor data. Then, the number of days of overall survival is predicted by implementing regression on the extracted features using an artificial neural network (ANN). Our model achieves a mean dice score of 89.78%, 82.53% and 76.54% for the whole tumor, tumor core and enhancing tumor respectively in segmentation task and 67.90% in overall survival prediction task with the validation set of BraTS 2018 challenge. It obtains a mean dice accuracy of 87.315%, 77.04% and 70.22% for the whole tumor, tumor core and enhancing tumor respectively in the segmentation task and a 46.80% in overall survival prediction task in the BraTS 2018 test data set. △ Less

Submitted 2 April, 2021; originally announced April 2021.

Comments: MICCAI-BrainLes Workshop

arXiv:2101.02557 [pdf]

Continuous Glucose Monitoring Prediction

Authors: Julia Ann Jose, Trae Waggoner, Sudarsan Manikandan

Abstract: Diabetes is one of the deadliest diseases in the world and affects nearly 10 percent of the global adult population. Fortunately, powerful new technologies allow for a consistent and reliable treatment plan for people with diabetes. One major development is a system called continuous blood glucose monitoring (CGM). In this review, we look at three different continuous meal detection algorithms tha… ▽ More Diabetes is one of the deadliest diseases in the world and affects nearly 10 percent of the global adult population. Fortunately, powerful new technologies allow for a consistent and reliable treatment plan for people with diabetes. One major development is a system called continuous blood glucose monitoring (CGM). In this review, we look at three different continuous meal detection algorithms that were developed using given CGM data from patients with diabetes. From this analysis, an initial meal prediction algorithm was also developed utilizing these methods. △ Less

Submitted 4 January, 2021; originally announced January 2021.

arXiv:2101.00055 [pdf, other]

Data Criticality in Multi-Threaded Applications: An Insight for Many-Core Systems

Authors: Abhijit Das, John Jose, Prabhat Mishra

Abstract: Multi-threaded applications are capable of exploiting the full potential of many-core systems. However, Network-on-Chip (NoC) based inter-core communication in many-core systems is responsible for 60-75% of the miss latency experienced by multi-threaded applications. Delay in the arrival of critical data at the requesting core severely hampers performance. This brief presents some interesting insi… ▽ More Multi-threaded applications are capable of exploiting the full potential of many-core systems. However, Network-on-Chip (NoC) based inter-core communication in many-core systems is responsible for 60-75% of the miss latency experienced by multi-threaded applications. Delay in the arrival of critical data at the requesting core severely hampers performance. This brief presents some interesting insights about how critical data is requested from the memory by multi-threaded applications. Then it investigates the cause of delay in NoC and how it affects the performance. Finally, this brief shows how NoC-aware memory access optimisations can significantly improve performance. Our experimental evaluation considers early restart memory access optimisation and demonstrates that by exploiting NoC resources, critical data can be prioritised to reduce miss penalty by 10-12% and improve system performance by 7-11%. △ Less

Submitted 31 December, 2020; originally announced January 2021.

arXiv:2009.13724 [pdf, other]

One Person, One Model, One World: Learning Continual User Representation without Forgetting

Authors: Fajie Yuan, Guoxiao Zhang, Alexandros Karatzoglou, Joemon Jose, Beibei Kong, Yudong Li

Abstract: Learning user representations is a vital technique toward effective user modeling and personalized recommender systems. Existing approaches often derive an individual set of model parameters for each task by training on separate data. However, the representation of the same user potentially has some commonalities, such as preference and personality, even in different tasks. As such, these separate… ▽ More Learning user representations is a vital technique toward effective user modeling and personalized recommender systems. Existing approaches often derive an individual set of model parameters for each task by training on separate data. However, the representation of the same user potentially has some commonalities, such as preference and personality, even in different tasks. As such, these separately trained representations could be suboptimal in performance as well as inefficient in terms of parameter sharing. In this paper, we delve on research to continually learn user representations task by task, whereby new tasks are learned while using partial parameters from old ones. A new problem arises since when new tasks are trained, previously learned parameters are very likely to be modified, and as a result, an artificial neural network (ANN)-based model may lose its capacity to serve for well-trained previous tasks forever, this issue is termed catastrophic forgetting. To address this issue, we present \emph{Conure} the first \underline{con}tinual, or lifelong, \underline{u}ser \underline{re}presentation learner -- i.e., learning new tasks over time without forgetting old ones. Specifically, we propose iteratively removing less important weights of old tasks in a deep user representation model, motivated by the fact that neural network models are usually over-parameterized. In this way, we could learn many tasks with a single model by reusing the important weights, and modifying the less important weights to adapt to new tasks. We conduct extensive experiments on two real-world datasets with nine tasks and show that \emph{Conure} largely exceeds the standard model that does not purposely preserve such old "knowledge", and performs competitively or sometimes better than models which are trained either individually for each task or simultaneously by merging all task data. △ Less

Submitted 9 May, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

arXiv:2008.10233 [pdf]

AMRConvNet: AMR-Coded Speech Enhancement Using Convolutional Neural Networks

Authors: Williard Joshua Jose

Abstract: Speech is converted to digital signals using speech coding for efficient transmission. However, this often lowers the quality and bandwidth of speech. This paper explores the application of convolutional neural networks for Artificial Bandwidth Expansion (ABE) and speech enhancement on coded speech, particularly Adaptive Multi-Rate (AMR) used in 2G cellular phone calls. In this paper, we introduce… ▽ More Speech is converted to digital signals using speech coding for efficient transmission. However, this often lowers the quality and bandwidth of speech. This paper explores the application of convolutional neural networks for Artificial Bandwidth Expansion (ABE) and speech enhancement on coded speech, particularly Adaptive Multi-Rate (AMR) used in 2G cellular phone calls. In this paper, we introduce AMRConvNet: a convolutional neural network that performs ABE and speech enhancement on speech encoded with AMR. The model operates directly on the time-domain for both input and output speech but optimizes using combined time-domain reconstruction loss and frequency-domain perceptual loss. AMRConvNet resulted in an average improvement of 0.425 Mean Opinion Score - Listening Quality Objective (MOS-LQO) points for AMR bitrate of 4.75k, and 0.073 MOS-LQO points for AMR bitrate of 12.2k. AMRConvNet also showed robustness in AMR bitrate inputs. Finally, an ablation test showed that our combined time-domain and frequency-domain loss leads to slightly higher MOS-LQO and faster training convergence than using either loss alone. △ Less

Submitted 24 August, 2020; originally announced August 2020.

Comments: IEEE SMC 2020

arXiv:2006.05779 [pdf, other]

Self-Supervised Reinforcement Learning for Recommender Systems

Authors: Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose

Abstract: In session-based or sequential recommendation, it is important to consider a number of factors like long-term user engagement, multiple types of user-item interactions such as clicks, purchases etc. The current state-of-the-art supervised approaches fail to model them appropriately. Casting sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major co… ▽ More In session-based or sequential recommendation, it is important to consider a number of factors like long-term user engagement, multiple types of user-item interactions such as clicks, purchases etc. The current state-of-the-art supervised approaches fail to model them appropriately. Casting sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major component of RL approaches is to train the agent through interactions with the environment. However, it is often problematic to train a recommender in an on-line fashion due to the requirement to expose users to irrelevant recommendations. As a result, learning the policy from logged implicit feedback is of vital importance, which is challenging due to the pure off-policy setting and lack of negative rewards (feedback). In this paper, we propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. The RL part acts as a regularizer to drive the supervised layer focusing on specific rewards(e.g., recommending items which may lead to purchases rather than clicks) while the self-supervised layer with cross-entropy loss provides strong gradient signals for parameter updates. Based on such an approach, we propose two frameworks namely Self-Supervised Q-learning(SQN) and Self-Supervised Actor-Critic(SAC). We integrate the proposed frameworks with four state-of-the-art recommendation models. Experimental results on two real-world datasets demonstrate the effectiveness of our approach. △ Less

Submitted 11 June, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: SIGIR2020

arXiv:2006.04878 [pdf, other]

KiU-Net: Towards Accurate Segmentation of Biomedical Images using Over-complete Representations

Authors: Jeya Maria Jose, Vishwanath Sindagi, Ilker Hacihaliloglu, Vishal M. Patel

Abstract: Due to its excellent performance, U-Net is the most widely used backbone architecture for biomedical image segmentation in the recent years. However, in our studies, we observe that there is a considerable performance drop in the case of detecting smaller anatomical landmarks with blurred noisy boundaries. We analyze this issue in detail, and address it by proposing an over-complete architecture (… ▽ More Due to its excellent performance, U-Net is the most widely used backbone architecture for biomedical image segmentation in the recent years. However, in our studies, we observe that there is a considerable performance drop in the case of detecting smaller anatomical landmarks with blurred noisy boundaries. We analyze this issue in detail, and address it by proposing an over-complete architecture (Ki-Net) which involves projecting the data onto higher dimensions (in the spatial sense). This network, when augmented with U-Net, results in significant improvements in the case of segmenting small anatomical landmarks and blurred noisy boundaries while obtaining better overall performance. Furthermore, the proposed network has additional benefits like faster convergence and fewer number of parameters. We evaluate the proposed method on the task of brain anatomy segmentation from 2D Ultrasound (US) of preterm neonates, and achieve an improvement of around 4% in terms of the DICE accuracy and Jaccard index as compared to the standard-U-Net, while outperforming the recent best methods by 2%. Code: https://github.com/jeya-maria-jose/KiU-Net-pytorch . △ Less

Submitted 8 July, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: Accepted at MICCAI 2020

arXiv:2004.11460 [pdf, other]

Development of a Machine Learning Model and Mobile Application to Aid in Predicting Dosage of Vitamin K Antagonists Among Indian Patients

Authors: Amruthlal M, Devika S, Ameer Suhail P A, Aravind K Menon, Vignesh Krishnan, Alan Thomas, Manu Thomas, Sanjay G, Lakshmi Kanth L R, Jimmy Jose, Harikrishnan S

Abstract: Patients who undergo mechanical heart valve replacements or have conditions like Atrial Fibrillation have to take Vitamin K Antagonists (VKA) drugs to prevent coagulation of blood. These drugs have narrow therapeutic range and need to be very closely monitored due to life threatening side effects. The dosage of VKA drug is determined and revised by a physician based on Prothrombin Time - Internati… ▽ More Patients who undergo mechanical heart valve replacements or have conditions like Atrial Fibrillation have to take Vitamin K Antagonists (VKA) drugs to prevent coagulation of blood. These drugs have narrow therapeutic range and need to be very closely monitored due to life threatening side effects. The dosage of VKA drug is determined and revised by a physician based on Prothrombin Time - International Normalised Ratio (PT-INR) value obtained through a blood test. Our work aimed at predicting the maintenance dosage of warfarin, the present most widely recommended anticoagulant drug, using the de-identified medical data collected from 109 patients from Kerala. A Support Vector Machine (SVM) Regression model was built to predict the maintenance dosage of warfarin, for patients who have been undergoing treatment from a physician and have reached stable INR values between 2.0 and 4.0. △ Less

Submitted 19 April, 2020; originally announced April 2020.

arXiv:2004.04635 [pdf, other]

Graph Highway Networks

Authors: Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose

Abstract: Graph Convolution Networks (GCN) are widely used in learning graph representations due to their effectiveness and efficiency. However, they suffer from the notorious over-smoothing problem, in which the learned representations of densely connected nodes converge to alike vectors when many (>3) graph convolutional layers are stacked. In this paper, we argue that there-normalization trick used in GC… ▽ More Graph Convolution Networks (GCN) are widely used in learning graph representations due to their effectiveness and efficiency. However, they suffer from the notorious over-smoothing problem, in which the learned representations of densely connected nodes converge to alike vectors when many (>3) graph convolutional layers are stacked. In this paper, we argue that there-normalization trick used in GCN leads to overly homogeneous information propagation, which is the source of over-smoothing. To address this problem, we propose Graph Highway Networks(GHNet) which utilize gating units to automatically balance the trade-off between homogeneity and heterogeneity in the GCN learning process. The gating units serve as direct highways to maintain heterogeneous information from the node itself after feature propagation. This design enables GHNet to achieve much larger receptive fields per node without over-smoothing and thus access to more of the graph connectivity information. Experimental results on benchmark datasets demonstrate the superior performance of GHNet over GCN and related models. △ Less

Submitted 9 April, 2020; originally announced April 2020.

arXiv:1908.08752 [pdf]

Ten years of research on ResearchGate, a scoping review using Google Scholar 2008_2017

Authors: Prieto-Gutierrez, Juan Jose

Abstract: Objective. To analyse quantitatively the articles published during 2008_2017 about the academic social networking site ResearchGate. Methods. A scoping bibliometric review of documents retrieved using Google Scholar was conducted, limited to publications that contained the word "ResearchGate" in their title and were published from 2008 to 2017. Results. The search yielded 159 documents, once a pre… ▽ More Objective. To analyse quantitatively the articles published during 2008_2017 about the academic social networking site ResearchGate. Methods. A scoping bibliometric review of documents retrieved using Google Scholar was conducted, limited to publications that contained the word "ResearchGate" in their title and were published from 2008 to 2017. Results. The search yielded 159 documents, once a preliminary list of 386 documents retrieved from Google Scholar was filtered, which eliminated about 60% of the results that were bibliographic citations and not documents. Papers in journals were the most numerous type of documents (n73; 46%), followed by conference papers (n_31; 19.5 %). Contributing eight publications, two Spanish scholars (Delgado Lopez-Cozar and Orduna Malea, who were coauthors in each case) were the most prolific authors writing on this topic during the ten-year period. The keywords most used in the documents were "ResearchGate" and "Altmetrics". The publications were cited frequently since 2014 (more than 90% of the total cites fell in that period), and those with more than one author were the most cited ones. The authors of the documents were mainly librarians and information science professionals, who wrote primarily as co-authors with colleagues from their own institutions, mostly published in English. Conclusions. Interest in ResearchGate has grown since 2015, as evident from the number of articles published and the citations they received. Keywords. Academic social networks, bibliometrics, scholarly communication △ Less

Submitted 23 August, 2019; originally announced August 2019.

arXiv:1905.09063 [pdf, other]

NTP : A Neural Network Topology Profiler

Authors: Raghavendra Bhat, Pravin Chandran, Juby Jose, Viswanath Dibbur, Prakash Sirra Ajith

Abstract: Performance of end-to-end neural networks on a given hardware platform is a function of its compute and memory signature, which in-turn, is governed by a wide range of parameters such as topology size, primitives used, framework used, batching strategy, latency requirements, precision etc. Current benchmarking tools suffer from limitations such as a) being either too granular like DeepBench [1] (o… ▽ More Performance of end-to-end neural networks on a given hardware platform is a function of its compute and memory signature, which in-turn, is governed by a wide range of parameters such as topology size, primitives used, framework used, batching strategy, latency requirements, precision etc. Current benchmarking tools suffer from limitations such as a) being either too granular like DeepBench [1] (or) b) mandate a working implementation that is either framework specific or hardware-architecture specific or both (or) c) provide only high level benchmark metrics. In this paper, we present NTP (Neural Net Topology Profiler), a sophisticated benchmarking framework, to effectively identify memory and compute signature of an end-to-end topology on multiple hardware architectures, without the need for an actual implementation. NTP is tightly integrated with hardware specific benchmarking tools to enable exhaustive data collection and analysis. Using NTP, a deep learning researcher can quickly establish baselines needed to understand performance of an end-to-end neural network topology and make high level architectural decisions. Further, integration of NTP with frameworks like Tensorflow, Pytorch, Intel OpenVINO etc. allows for performance comparison along several vectors like a) Comparison of different frameworks on a given hardware b) Comparison of different hardware using a given framework c) Comparison across different heterogeneous hardware configurations for given framework etc. These capabilities empower a researcher to effortlessly make architectural decisions needed for achieving optimized performance on any hardware platform. The paper documents the architectural approach of NTP and demonstrates the capabilities of the tool by benchmarking Mozilla DeepSpeech, a popular Speech Recognition topology. △ Less

Submitted 24 May, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

arXiv:1905.05420 [pdf, other]

doi 10.13140/RG.2.2.23016.52485

Towards a Skeleton-Based Action Recognition For Realistic Scenarios

Authors: Cagatay Odabasi, Jewel Jose

Abstract: Understanding human actions is a crucial problem for service robots. However, the general trend in Action Recognition is developing and testing these systems on structured datasets. That's why this work presents a practical Skeleton-based Action Recognition framework which can be used in realistic scenarios. Our results show that although non-augmented and non-normalized data may yield comparable… ▽ More Understanding human actions is a crucial problem for service robots. However, the general trend in Action Recognition is developing and testing these systems on structured datasets. That's why this work presents a practical Skeleton-based Action Recognition framework which can be used in realistic scenarios. Our results show that although non-augmented and non-normalized data may yield comparable results on the test split of the dataset, it is far from being useful on another dataset which is a manually collected data. △ Less

Submitted 14 May, 2019; originally announced May 2019.

arXiv:1904.12796 [pdf, other]

Relational Collaborative Filtering:Modeling Multiple Item Relations for Recommendation

Authors: Xin Xin, Xiangnan He, Yongfeng Zhang, Yongdong Zhang, Joemon Jose

Abstract: Existing item-based collaborative filtering (ICF) methods leverage only the relation of collaborative similarity. Nevertheless, there exist multiple relations between items in real-world scenarios. Distinct from the collaborative similarity that implies co-interact patterns from the user perspective, these relations reveal fine-grained knowledge on items from different perspectives of meta-data, f… ▽ More Existing item-based collaborative filtering (ICF) methods leverage only the relation of collaborative similarity. Nevertheless, there exist multiple relations between items in real-world scenarios. Distinct from the collaborative similarity that implies co-interact patterns from the user perspective, these relations reveal fine-grained knowledge on items from different perspectives of meta-data, functionality, etc. However, how to incorporate multiple item relations is less explored in recommendation research. In this work, we propose Relational Collaborative Filtering (RCF), a general framework to exploit multiple relations between items in recommender system. We find that both the relation type and the relation value are crucial in inferring user preference. To this end, we develop a two-level hierarchical attention mechanism to model user preference. The first-level attention discriminates which types of relations are more important, and the second-level attention considers the specific relation values to estimate the contribution of a historical item in recommending the target item. To make the item embeddings be reflective of the relational structure between items, we further formulate a task to preserve the item relations, and jointly train it with the recommendation task of preference modeling. Empirical results on two real datasets demonstrate the strong performance of RCF. Furthermore, we also conduct qualitative analyses to show the benefits of explanations brought by the modeling of multiple item relations. △ Less

Submitted 11 May, 2019; v1 submitted 29 April, 2019; originally announced April 2019.

arXiv:1811.02629 [pdf, other]

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

Authors: Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, Marcel Prastawa, Esther Alberts, Jana Lipkova, John Freymann, Justin Kirby, Michel Bilello, Hassan Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Benedikt Wiestler, Rivka Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko , et al. (402 additional authors not shown)

Abstract: Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles dissem… ▽ More Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset. △ Less

Submitted 23 April, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

Comments: The International Multimodal Brain Tumor Segmentation (BraTS) Challenge

arXiv:1808.05163 [pdf, other]

A Simple Convolutional Generative Network for Next Item Recommendation

Authors: Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M Jose, Xiangnan He

Abstract: Convolutional Neural Networks (CNNs) have been recently introduced in the domain of session-based next item recommendation. An ordered collection of past items the user has interacted with in a session (or sequence) are embedded into a 2-dimensional latent matrix, and treated as an image. The convolution and pooling operations are then applied to the mapped item embeddings. In this paper, we first… ▽ More Convolutional Neural Networks (CNNs) have been recently introduced in the domain of session-based next item recommendation. An ordered collection of past items the user has interacted with in a session (or sequence) are embedded into a 2-dimensional latent matrix, and treated as an image. The convolution and pooling operations are then applied to the mapped item embeddings. In this paper, we first examine the typical session-based CNN recommender and show that both the generative model and network architecture are suboptimal when modeling long-range dependencies in the item sequence. To address the issues, we introduce a simple, but very effective generative model that is capable of learning high-level representation from both short- and long-range item dependencies. The network architecture of the proposed model is formed of a stack of \emph{holed} convolutional layers, which can efficiently increase the receptive fields without relying on the pooling operation. Another contribution is the effective use of residual block structure in recommender systems, which can ease the optimization for much deeper networks. The proposed generative model attains state-of-the-art accuracy with less training time in the next item recommendation task. It accordingly can be used as a powerful recommendation baseline to beat in future, especially when there are long sequences of user feedback. △ Less

Submitted 28 November, 2018; v1 submitted 15 August, 2018; originally announced August 2018.

arXiv:1710.09805 [pdf, other]

Improving Negative Sampling for Word Representation using Self-embedded Features

Authors: Long Chen, Fajie Yuan, Joemon M. Jose, Weinan Zhang

Abstract: Although the word-popularity based negative sampler has shown superb performance in the skip-gram model, the theoretical motivation behind oversampling popular (non-observed) words as negative samples is still not well understood. In this paper, we start from an investigation of the gradient vanishing issue in the skipgram model without a proper negative sampler. By performing an insightful analys… ▽ More Although the word-popularity based negative sampler has shown superb performance in the skip-gram model, the theoretical motivation behind oversampling popular (non-observed) words as negative samples is still not well understood. In this paper, we start from an investigation of the gradient vanishing issue in the skipgram model without a proper negative sampler. By performing an insightful analysis from the stochastic gradient descent (SGD) learning perspective, we demonstrate that, both theoretically and intuitively, negative samples with larger inner product scores are more informative than those with lower scores for the SGD learner in terms of both convergence rate and accuracy. Understanding this, we propose an alternative sampling algorithm that dynamically selects informative negative samples during each SGD update. More importantly, the proposed sampler accounts for multi-dimensional self-embedded features during the sampling process, which essentially makes it more effective than the original popularity-based (one-dimensional) sampler. Empirical experiments further verify our observations, and show that our fine-grained samplers gain significant improvement over the existing ones without increasing computational complexity. △ Less

Submitted 26 June, 2018; v1 submitted 26 October, 2017; originally announced October 2017.

Comments: Accepted in WSDM 2018

arXiv:1601.06181 [pdf, ps, other]

Secure Content Distribution in Vehicular Networks

Authors: Viet T. Nguyen, Jubin Jose, Xinzhou Wu, Tom Richardson

Abstract: Dedicated short range communication (DSRC) relies on secure distribution to vehicles of a certificate revocation list (CRL) for enabling security protocols. CRL distribution utilizing vehicle-to-vehicle (V2V) communications is preferred to an infrastructure-only approach. One approach to V2V CRL distribution, using rateless coding at the source and forwarding at vehicle relays is vulnerable to a p… ▽ More Dedicated short range communication (DSRC) relies on secure distribution to vehicles of a certificate revocation list (CRL) for enabling security protocols. CRL distribution utilizing vehicle-to-vehicle (V2V) communications is preferred to an infrastructure-only approach. One approach to V2V CRL distribution, using rateless coding at the source and forwarding at vehicle relays is vulnerable to a pollution attack in which a few malicious vehicles forward incorrect packets which then spread through the network leading to denial-of-service. This paper develops a new scheme called Precode-and-Hash that enables efficient packet verification before forwarding thereby preventing the pollution attack. In contrast to rateless codes, it utilizes a fixed low-rate precode and random selection of packets from the set of precoded packets. The fixed precode admits efficient hash verification of all encoded packets. Specifically, hashes are computed for all precoded packets and sent securely using signatures. We analyze the performance of the Precode-and-Hash scheme for a multi-hop line network and provide simulation results for several schemes in a more realistic vehicular model. △ Less

Submitted 22 January, 2016; originally announced January 2016.

arXiv:1512.02059 [pdf, other]

Inter-Vehicle Range Estimation from Periodic Broadcasts

Authors: Urs Niesen, Venkatesan N. Ekambaram, Jubin Jose, Xinzhou Wu

Abstract: Dedicated short-range communication (DSRC) enables vehicular communication using periodic broadcast messages. We propose to use these periodic broadcasts to perform inter-vehicle ranging. Motivated by this scenario, we study the general problem of precise range estimation between pairs of moving vehicles using periodic broadcasts. Each vehicle has its own independent and unsynchronized clock, whic… ▽ More Dedicated short-range communication (DSRC) enables vehicular communication using periodic broadcast messages. We propose to use these periodic broadcasts to perform inter-vehicle ranging. Motivated by this scenario, we study the general problem of precise range estimation between pairs of moving vehicles using periodic broadcasts. Each vehicle has its own independent and unsynchronized clock, which can exhibit significant drift between consecutive periodic broadcast transmissions. As a consequence, both the clock offsets and drifts need to be taken into account in addition to the vehicle motion to accurately estimate the vehicle ranges. We develop a range estimation algorithm using local polynomial smoothing of the vehicle motion. The proposed algorithm can be applied to networks with arbitrary number of vehicles and requires no additional message exchanges apart from the periodic broadcasts. We validate our algorithm on experimental data and show that the performance of the proposed approach is close to that obtained using unicast round-trip time ranging. In particular, we are able to achieve sub-meter ranging accuracies in vehicular scenarios. Our scheme requires additional timestamp information to be transmitted as part of the broadcast messages, and we develop a novel timestamp compression algorithm to minimize the resulting overhead. △ Less

Submitted 7 July, 2016; v1 submitted 7 December, 2015; originally announced December 2015.

Comments: 16 pages

Journal ref: IEEE Transactions on Vehicular Technology, vol. 66, pp. 10637-10646, December 2017

arXiv:1511.01535 [pdf, ps, other]

Distributed Rate and Power Control in Vehicular Networks

Authors: Jubin Jose, Chong Li, Xinzhou Wu, Lei Ying, Kai Zhu

Abstract: The focus of this paper is on the rate and power control algorithms in Dedicated Short Range Communication (DSRC) for vehicular networks. We first propose a utility maximization framework by leveraging the well-developed network congestion control, and formulate two subproblems, one on rate control with fixed transmit powers and the other on power control with fixed rates. Distributed rate control… ▽ More The focus of this paper is on the rate and power control algorithms in Dedicated Short Range Communication (DSRC) for vehicular networks. We first propose a utility maximization framework by leveraging the well-developed network congestion control, and formulate two subproblems, one on rate control with fixed transmit powers and the other on power control with fixed rates. Distributed rate control and power control algorithms are developed to solve these two subproblems, respectively, and are proved to be asymptotically optimal. Joint rate and power control can be done by using the two algorithms in an alternating fashion. The performance enhancement of our algorithms compared with a recent rate control algorithm, called EMBARC, is evaluated by using the network simulator ns2. △ Less

Submitted 4 November, 2015; originally announced November 2015.

Comments: Submitted to IEEE/ACM Transactions on Networking

arXiv:1111.1022 [pdf]

Towards the integration of formal specification in the Áncora methodology

Authors: Carlos Alberto Fernandez-y-Fernandez, Martín José José

Abstract: There are some non-formal methodologies such as RUP, OpenUP, agile methodologies such as SCRUP, XP and techniques like those proposed by UML, which allow the development of software. The software industry has struggled to generate quality software, as importance has not been given to the engineering requirements, resulting in a poor specification of requirements and software of poor quality. In or… ▽ More There are some non-formal methodologies such as RUP, OpenUP, agile methodologies such as SCRUP, XP and techniques like those proposed by UML, which allow the development of software. The software industry has struggled to generate quality software, as importance has not been given to the engineering requirements, resulting in a poor specification of requirements and software of poor quality. In order to generate a contribution to the specification of requirements, this article describes a methodological proposal, implementing formal methods to the results of the process of requirements analysis of the methodology Áncora. △ Less

Submitted 3 November, 2011; originally announced November 2011.

Showing 1–50 of 61 results for author: Jose, J