-
Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers
Authors:
Zhengbo Zhang,
Li Xu,
Duo Peng,
Hossein Rahmani,
Jun Liu
Abstract:
We introduce Diff-Tracker, a novel approach for the challenging unsupervised visual tracking task leveraging the pre-trained text-to-image diffusion model. Our main idea is to leverage the rich knowledge encapsulated within the pre-trained diffusion model, such as the understanding of image semantics and structural information, to address unsupervised visual tracking. To this end, we design an initial prompt learner to enable the diffusion model to recognize the tracking target by learning a prompt representing the target. Furthermore, to facilitate dynamic adaptation of the prompt to the target's movements, we propose an online prompt updater. Extensive experiments on five benchmark datasets demonstrate the effectiveness of our proposed method, which also achieves state-of-the-art performance.
Submitted 16 July, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization
Authors:
Feixiang Zhou,
Bryan Williams,
Hossein Rahmani
Abstract:
Alleviating noisy pseudo labels remains a key challenge in Semi-Supervised Temporal Action Localization (SS-TAL). Existing methods often filter pseudo labels based on strict conditions, but they typically assess classification and localization quality separately, leading to suboptimal pseudo-label ranking and selection. In particular, there might be inaccurate pseudo labels within selected positives, alongside reliable counterparts erroneously assigned to negatives. To tackle these problems, we propose a novel Adaptive Pseudo-label Learning (APL) framework to facilitate better pseudo-label selection. Specifically, to improve the ranking quality, Adaptive Label Quality Assessment (ALQA) is proposed to jointly learn classification confidence and localization reliability, followed by dynamically selecting pseudo labels based on the joint score. Additionally, we propose an Instance-level Consistency Discriminator (ICD) for eliminating ambiguous positives and mining potential positives simultaneously based on inter-instance intrinsic consistency, thereby leading to a more precise selection. We further introduce a general unsupervised Action-aware Contrastive Pre-training (ACP) to enhance the discrimination both within actions and between actions and backgrounds, which benefits SS-TAL. Extensive experiments on THUMOS14 and ActivityNet v1.3 demonstrate that our method achieves state-of-the-art performance under various semi-supervised settings.
Submitted 12 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Understanding the Role of User Profile in the Personalization of Large Language Models
Authors:
Bin Wu,
Zhengyan Shi,
Hossein A. Rahmani,
Varsha Ramineni,
Emine Yilmaz
Abstract:
Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to enhance performance on a wide range of tasks. However, the precise role of user profiles and the mechanism by which they affect LLMs remain unclear. This study first confirms that the effectiveness of user profiles is primarily due to personalization information rather than semantic information. Furthermore, we investigate how user profiles affect the personalization of LLMs. Within the user profile, we reveal that it is the historical personalized response produced or approved by users that plays a pivotal role in personalizing LLMs. This discovery unlocks the potential of LLMs to incorporate a greater number of user profiles within the constraints of limited input length. As for the position of user profiles, we observe that user profiles integrated into different positions of the input context do not contribute equally to personalization. Instead, a user profile placed closer to the beginning of the input has a greater effect on the personalization of LLMs. Our findings reveal the role of user profiles in the personalization of LLMs and showcase how incorporating user profiles impacts performance, providing insight into how to leverage user profiles effectively.
Submitted 22 June, 2024;
originally announced June 2024.
-
DisC-GS: Discontinuity-aware Gaussian Splatting
Authors:
Haoxuan Qu,
Zhuoling Li,
Hossein Rahmani,
Yujun Cai,
Jun Liu
Abstract:
Recently, Gaussian Splatting, a method that represents a 3D scene as a collection of Gaussian distributions, has gained significant attention in addressing the task of novel view synthesis. In this paper, we highlight a fundamental limitation of Gaussian Splatting: its inability to accurately render discontinuities and boundaries in images due to the continuous nature of Gaussian distributions. To address this issue, we propose a novel framework enabling Gaussian Splatting to perform discontinuity-aware image rendering. Additionally, we introduce a Bézier-boundary gradient approximation strategy within our framework to keep the "differentiability" of the proposed discontinuity-aware rendering process. Extensive experiments demonstrate the efficacy of our framework.
Submitted 23 May, 2024;
originally announced May 2024.
-
LAGA: Layered 3D Avatar Generation and Customization via Gaussian Splatting
Authors:
Jia Gong,
Shenyu Ji,
Lin Geng Foo,
Kang Chen,
Hossein Rahmani,
Jun Liu
Abstract:
Creating and customizing a 3D clothed avatar from textual descriptions is a critical and challenging task. Traditional methods often treat the human body and clothing as inseparable, limiting users' ability to freely mix and match garments. In response to this limitation, we present LAyered Gaussian Avatar (LAGA), a carefully designed framework enabling the creation of high-fidelity decomposable avatars with diverse garments. By decoupling garments from the avatar, our framework empowers users to conveniently edit avatars at the garment level. Our approach begins by modeling the avatar using a set of Gaussian points organized in a layered structure, where each layer corresponds to a specific garment or the human body itself. To generate high-quality garments for each layer, we introduce a coarse-to-fine strategy for diverse garment generation and a novel dual-SDS loss function to maintain coherence between the generated garments and avatar components, including the human body and other garments. Moreover, we introduce three regularization losses to guide the movement of Gaussians for garment transfer, allowing garments to be freely transferred to various avatars. Extensive experimentation demonstrates that our approach surpasses existing methods in the generation of 3D clothed humans.
Submitted 21 May, 2024;
originally announced May 2024.
-
Deep Learning-Based Object Pose Estimation: A Comprehensive Survey
Authors:
Jian Liu,
Wei Sun,
Hui Yang,
Zhiwen Zeng,
Chongpei Liu,
Jin Zheng,
Xingyu Liu,
Hossein Rahmani,
Nicu Sebe,
Ajmal Mian
Abstract:
Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior accuracy and robustness, have increasingly supplanted conventional algorithms reliant on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependency on labeled training data, model compactness, robustness under challenging conditions, and their ability to generalize to novel unseen objects. A recent survey discussing the progress made on different aspects of this area, outstanding challenges, and promising future directions, is missing. To fill this gap, we discuss the recent advances in deep learning-based object pose estimation, covering all three formulations of the problem, i.e., instance-level, category-level, and unseen object pose estimation. Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks, providing the readers with a holistic understanding of this field. Additionally, it discusses training paradigms of different domains, inference modes, application areas, evaluation metrics, and benchmark datasets, as well as reports the performance of current state-of-the-art methods on these benchmarks, thereby facilitating the readers in selecting the most suitable method for their application. Finally, the survey identifies key challenges, reviews the prevailing trends along with their pros and cons, and identifies promising directions for future research. We also keep tracing the latest works at https://github.com/CNJianLiu/Awesome-Object-Pose-Estimation.
Submitted 31 May, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Synthetic Test Collections for Retrieval Evaluation
Authors:
Hossein A. Rahmani,
Nick Craswell,
Emine Yilmaz,
Bhaskar Mitra,
Daniel Campos
Abstract:
Test collections play a vital role in evaluation of information retrieval (IR) systems. Obtaining a diverse set of user queries for test collection construction can be challenging, and acquiring relevance judgments, which indicate the appropriateness of retrieved documents to a query, is often costly and resource-intensive. Generating synthetic datasets using Large Language Models (LLMs) has recently gained significant attention in various applications. In IR, while previous work exploited the capabilities of LLMs to generate synthetic queries or documents to augment training data and improve the performance of ranking models, using LLMs for constructing synthetic test collections is relatively unexplored. Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems. In this paper, we comprehensively investigate whether it is possible to use LLMs to construct fully synthetic test collections by generating not only synthetic judgments but also synthetic queries. In particular, we analyse whether it is possible to construct reliable synthetic test collections and the potential risks of bias such test collections may exhibit towards LLM-based models. Our experiments indicate that using LLMs it is possible to construct synthetic test collections that can reliably be used for retrieval evaluation.
Submitted 13 May, 2024;
originally announced May 2024.
-
Action Detection via an Image Diffusion Process
Authors:
Lin Geng Foo,
Tianjiao Li,
Hossein Rahmani,
Jun Liu
Abstract:
Action detection aims to localize the starting and ending points of action instances in untrimmed videos, and predict the classes of those instances. In this paper, we make the observation that the outputs of the action detection task can be formulated as images. Thus, from a novel perspective, we tackle action detection via a three-image generation process to generate starting point, ending point and action-class predictions as images via our proposed Action Detection Image Diffusion (ADI-Diff) framework. Furthermore, since our images differ from natural images and exhibit special properties, we further explore a Discrete Action-Detection Diffusion Process and a Row-Column Transformer design to better handle their processing. Our ADI-Diff framework achieves state-of-the-art results on two widely-used datasets.
Submitted 1 April, 2024;
originally announced April 2024.
-
LLMs are Good Sign Language Translators
Authors:
Jia Gong,
Lin Geng Foo,
Yixuan He,
Hossein Rahmani,
Jun Liu
Abstract:
Sign Language Translation (SLT) is a challenging task that aims to translate sign videos into spoken language. Inspired by the strong translation capabilities of large language models (LLMs) that are trained on extensive multilingual text corpora, we aim to harness off-the-shelf LLMs to handle SLT. In this paper, we regularize the sign videos to embody linguistic characteristics of spoken language, and propose a novel SignLLM framework to transform sign videos into a language-like representation for improved readability by off-the-shelf LLMs. SignLLM comprises two key modules: (1) The Vector-Quantized Visual Sign module converts sign videos into a sequence of discrete character-level sign tokens, and (2) the Codebook Reconstruction and Alignment module converts these character-level tokens into word-level sign representations using an optimal transport formulation. A sign-text alignment loss further bridges the gap between sign and text tokens, enhancing semantic compatibility. We achieve state-of-the-art gloss-free results on two widely-used SLT benchmarks.
Submitted 1 April, 2024;
originally announced April 2024.
-
Preference-Based Planning in Stochastic Environments: From Partially-Ordered Temporal Goals to Most Preferred Policies
Authors:
Hazhar Rahmani,
Abhishek N. Kulkarni,
Jie Fu
Abstract:
Human preferences are not always represented via complete linear orders: It is natural to employ partially-ordered preferences for expressing incomparable outcomes. In this work, we consider decision-making and probabilistic planning in stochastic systems modeled as Markov decision processes (MDPs), given a partially ordered preference over a set of temporally extended goals. Specifically, each temporally extended goal is expressed using a formula in Linear Temporal Logic on Finite Traces (LTL$_f$). To plan with the partially ordered preference, we introduce order theory to map a preference over temporal goals to a preference over policies for the MDP. Accordingly, a most preferred policy under a stochastic ordering induces a stochastic nondominated probability distribution over the finite paths in the MDP. To synthesize a most preferred policy, our technical approach includes two key steps. In the first step, we develop a procedure to transform a partially ordered preference over temporal goals into a computational model, called preference automaton, which is a semi-automaton with a partial order over acceptance conditions. In the second step, we prove that finding a most preferred policy is equivalent to computing a Pareto-optimal policy in a multi-objective MDP that is constructed from the original MDP, the preference automaton, and the chosen stochastic ordering relation. Throughout the paper, we employ running examples to illustrate the proposed preference specification and solution approaches. We demonstrate the efficacy of our algorithm using these examples, providing detailed analysis, and then discuss several potential future directions.
Submitted 26 March, 2024;
originally announced March 2024.
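The abstract above reduces policy synthesis to finding a Pareto-optimal policy in a multi-objective MDP. The sketch below illustrates only the underlying notion of Pareto dominance, applied to hypothetical per-policy value vectors. The function names and the numbers are assumptions made for illustration; the paper's preference automaton and MDP construction are not reproduced here.

```python
def dominates(a, b):
    """True if vector a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(vectors):
    """Keep the vectors that no other vector dominates."""
    return [v for v in vectors if not any(dominates(u, v) for u in vectors)]


# Hypothetical value vectors, one per candidate policy (higher is better).
candidates = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]
front = pareto_front(candidates)
```

Here `(0.4, 0.4)` is dropped because `(0.5, 0.5)` is better on both objectives, while the remaining three vectors are mutually incomparable.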
-
Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation
Authors:
Xinyu Yang,
Hossein Rahmani,
Sue Black,
Bryan M. Williams
Abstract:
Class activation maps (CAMs) are commonly employed in weakly supervised semantic segmentation (WSSS) to produce pseudo-labels. Due to incomplete or excessive class activation, existing studies often resort to offline CAM refinement, introducing additional stages or proposing offline modules. This can cause optimization difficulties for single-stage methods and limit generalizability. In this study, we aim to reduce the observed CAM inconsistency and error to mitigate reliance on refinement processes. We propose an end-to-end WSSS model incorporating guided CAMs, wherein our segmentation model is trained while concurrently optimizing CAMs online. Our method, Co-training with Swapping Assignments (CoSA), leverages a dual-stream framework, where one sub-network learns from the swapped assignments generated by the other. We introduce three techniques: i) soft perplexity-based regularization to penalize uncertain regions; ii) a threshold-searching approach to dynamically revise the confidence threshold; and iii) contrastive separation to address the coexistence problem. CoSA demonstrates exceptional performance, achieving mIoU of 76.2% and 51.0% on VOC and COCO validation datasets, respectively, surpassing existing baselines by a substantial margin. Notably, CoSA is the first single-stage approach to outperform all existing multi-stage methods including those with additional supervision. Code is available at https://github.com/youshyee/CoSA.
Submitted 9 July, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
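The abstract above reports its results as mIoU (mean intersection-over-union). As a rough illustration of that metric, the sketch below computes it over flat lists of per-pixel class labels. The function name and the toy labels are assumptions; benchmark tooling typically accumulates a confusion matrix over the whole dataset instead, and this is not the CoSA evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in either pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5.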
-
DestripeCycleGAN: Stripe Simulation CycleGAN for Unsupervised Infrared Image Destriping
Authors:
Shiqi Yang,
Hanlin Qin,
Shuai Yuan,
Xiang Yan,
Hossein Rahmani
Abstract:
CycleGAN has been proven to be an advanced approach for unsupervised image restoration. This framework consists of two generators: a denoising one for inference and an auxiliary one for modeling noise to fulfill cycle-consistency constraints. However, when applied to the infrared destriping task, it becomes challenging for the vanilla auxiliary generator to consistently produce vertical noise under unsupervised constraints. This poses a threat to the effectiveness of the cycle-consistency loss, leading to residual stripe noise in the denoised image. To address the above issue, we present a novel framework for single-frame infrared image destriping, named DestripeCycleGAN. In this model, the conventional auxiliary generator is replaced with a priori stripe generation model (SGM) to introduce vertical stripe noise in the clean data, and the gradient map is employed to re-establish cycle-consistency. Meanwhile, a Haar wavelet background guidance module (HBGM) has been designed to minimize the divergence of background details between the different domains. To preserve vertical edges, a multi-level wavelet U-Net (MWUNet) is proposed as the denoising generator, which utilizes the Haar wavelet transform as the sampler to reduce directional information loss. Moreover, it incorporates the group fusion block (GFB) into skip connections to fuse the multi-scale features and build the context of long-distance dependencies. Extensive experiments on real and synthetic data demonstrate that our DestripeCycleGAN surpasses the state-of-the-art methods in terms of visual quality and quantitative evaluation. Our code will be made public at https://github.com/0wuji/DestripeCycleGAN.
Submitted 14 February, 2024;
originally announced February 2024.
-
Natural Language User Profiles for Transparent and Scrutable Recommendations
Authors:
Jerome Ramos,
Hossen A. Rahmani,
Xi Wang,
Xiao Fu,
Aldo Lipani
Abstract:
Current state-of-the-art recommender systems predominantly rely on either implicit or explicit feedback from users to suggest new items. While effective in recommending novel options, these conventional systems often use uninterpretable embeddings. This lack of transparency not only limits user understanding of why certain items are suggested but also reduces the user's ability to easily scrutinize and edit their preferences. For example, if a user has a change in interests, they would need to make significant changes to their interaction history to adjust the model's recommendations. To address these limitations, we introduce a novel method that utilizes user reviews to craft personalized, natural language profiles describing users' preferences. Through these descriptive profiles, our system provides transparent recommendations in natural language. Our evaluations show that this novel approach maintains a performance level on par with established recommender systems, but with the added benefits of transparency and user control. By enabling users to scrutinize why certain items are recommended, they can more easily verify, adjust, and have greater autonomy over their recommendations.
Submitted 8 February, 2024;
originally announced February 2024.
-
Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness
Authors:
Hossein A. Rahmani,
Xi Wang,
Mohammad Aliannejadi,
Mohammadmehdi Naghiaei,
Emine Yilmaz
Abstract:
Clarifying questions are an integral component of modern information retrieval systems, directly impacting user satisfaction and overall system performance. Poorly formulated questions can lead to user frustration and confusion, negatively affecting the system's performance. This research addresses the urgent need to identify and leverage key features that contribute to the classification of clarifying questions, enhancing user satisfaction. To gain deeper insights into how different features influence user satisfaction, we conduct a comprehensive analysis, considering a broad spectrum of lexical, semantic, and statistical features, such as question length and sentiment polarity. Our empirical results provide three main insights into the qualities of effective query clarification: (1) specific questions are more effective than generic ones; (2) the subjectivity and emotional tone of a question play a role; and (3) shorter and more ambiguous queries benefit significantly from clarification. Based on these insights, we implement feature-integrated user satisfaction prediction using various classifiers, both traditional and neural-based, including random forest, BERT, and large language models. Our experiments show a consistent and significant improvement, particularly in traditional classifiers, with a minimum performance boost of 45%. This study presents invaluable guidelines for refining the formulation of clarifying questions and enhancing both user satisfaction and system performance.
Submitted 2 February, 2024;
originally announced February 2024.
-
A Personalized Framework for Consumer and Producer Group Fairness Optimization in Recommender Systems
Authors:
Hossein A. Rahmani,
Mohammadmehdi Naghiaei,
Yashar Deldjoo
Abstract:
In recent years, there has been an increasing recognition that when machine learning (ML) algorithms are used to automate decisions, they may mistreat individuals or groups, with legal, ethical, or economic implications. Recommender systems (RS) are prominent examples of these ML systems that aid users in making decisions. Most past research on RS fairness treats user and item fairness concerns independently, ignoring the fact that recommender systems function in a two-sided marketplace. In this paper, we propose CP-FairRank, an optimization-based re-ranking algorithm that seamlessly integrates fairness constraints from both the consumer and producer side in a joint objective framework. The framework is generalizable and may take into account varied fairness settings based on group segmentation, recommendation model selection, and domain, which is one of its key characteristics. For instance, we demonstrate that the system may jointly increase consumer and producer fairness when (un)protected consumer groups are defined on the basis of their activity level and main-streamness, while producer groups are defined according to their popularity level. For empirical validation, through large-scale experiments on eight datasets and four mainstream collaborative filtering (CF) recommendation models, we demonstrate that our proposed strategy is able to improve both consumer and producer fairness with little or no loss in overall recommendation quality, demonstrating the role algorithms may play in avoiding data biases.
Submitted 1 February, 2024;
originally announced February 2024.
-
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
Authors:
Haopeng Li,
Andong Deng,
Qiuhong Ke,
Jun Liu,
Hossein Rahmani,
Yulan Guo,
Bernt Schiele,
Chen Chen
Abstract:
Reasoning over sports videos for question answering is an important task with numerous applications, such as player training and information retrieval. However, this task has not been explored due to the lack of relevant datasets and the challenging nature it presents. Most datasets for video question answering (VideoQA) focus mainly on general and coarse-grained understanding of daily-life videos, which is not applicable to sports scenarios requiring professional action understanding and fine-grained motion analysis. In this paper, we introduce the first dataset, named Sports-QA, specifically designed for the sports VideoQA task. The Sports-QA dataset includes various types of questions, such as descriptions, chronologies, causalities, and counterfactual conditions, covering multiple sports. Furthermore, to address the characteristics of the sports VideoQA task, we propose a new Auto-Focus Transformer (AFT) capable of automatically focusing on particular scales of temporal information for question answering. We conduct extensive experiments on Sports-QA, including baseline studies and the evaluation of different methods. The results demonstrate that our AFT achieves state-of-the-art performance.
Submitted 14 February, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
3D Points Splatting for Real-Time Dynamic Hand Reconstruction
Authors:
Zheheng Jiang,
Hossein Rahmani,
Sue Black,
Bryan M. Williams
Abstract:
We present 3D Points Splatting Hand Reconstruction (3D-PSHR), a real-time and photo-realistic hand reconstruction approach. We propose a self-adaptive canonical points upsampling strategy to achieve high-resolution hand geometry representation. This is followed by a self-adaptive deformation that deforms the hand from the canonical space to the target pose, adapting to the dynamic changing of canonical points which, in contrast to the common practice of subdividing the MANO model, offers greater flexibility and results in improved geometry fitting. To model texture, we disentangle the appearance color into the intrinsic albedo and pose-aware shading, which are learned through a Context-Attention module. Moreover, our approach allows the geometric and the appearance models to be trained simultaneously in an end-to-end manner. We demonstrate that our method is capable of producing animatable, photorealistic and relightable hand reconstructions using multiple datasets, including monocular videos captured with handheld smartphones and large-scale multi-view videos featuring various hand poses. We also demonstrate that our approach achieves real-time rendering speeds while simultaneously maintaining superior performance compared to existing state-of-the-art methods.
Submitted 21 December, 2023;
originally announced December 2023.
-
Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation
Authors:
Xi Wang,
Hossein A. Rahmani,
Jiqun Liu,
Emine Yilmaz
Abstract:
Conversational Recommendation System (CRS) is a rapidly growing research area that has gained significant attention alongside advancements in language modelling techniques. However, the current state of conversational recommendation faces numerous challenges due to its relative novelty and limited existing contributions. In this study, we delve into benchmark datasets for developing CRS models and address potential biases arising from the feedback loop inherent in multi-turn interactions, including selection bias and multiple popularity bias variants. Drawing inspiration from the success of generative data via using language models and data augmentation techniques, we present two novel strategies, 'Once-Aug' and 'PopNudge', to enhance model performance while mitigating biases. Through extensive experiments on ReDial and TG-ReDial benchmark datasets, we show a consistent improvement of CRS techniques with our data augmentation approaches and offer additional insights on addressing multiple newly formulated biases.
Submitted 25 October, 2023;
originally announced October 2023.
-
Provider Fairness and Beyond-Accuracy Trade-offs in Recommender Systems
Authors:
Saeedeh Karimi,
Hossein A. Rahmani,
Mohammadmehdi Naghiaei,
Leila Safari
Abstract:
Recommender systems, while transformative in online user experiences, have raised concerns over potential provider-side fairness issues. These systems may inadvertently favor popular items, thereby marginalizing less popular ones and compromising provider fairness. While previous research has recognized provider-side fairness issues, the investigation into how these biases affect beyond-accuracy aspects of recommendation systems - such as diversity, novelty, coverage, and serendipity - has been less emphasized. In this paper, we address this gap by introducing a simple yet effective post-processing re-ranking model that prioritizes provider fairness, while simultaneously maintaining user relevance and recommendation quality. We then conduct an in-depth evaluation of the model's impact on various aspects of recommendation quality across multiple datasets. Specifically, we apply the post-processing algorithm to four distinct recommendation models across four varied domain datasets, assessing the improvement in each metric, encompassing both accuracy and beyond-accuracy aspects. This comprehensive analysis allows us to gauge the effectiveness of our approach in mitigating provider biases. Our findings underscore the effectiveness of the adopted method in improving provider fairness and recommendation quality. They also provide valuable insights into the trade-offs involved in achieving fairness in recommender systems, contributing to a more nuanced understanding of this complex issue.
Submitted 8 September, 2023;
originally announced September 2023.
-
Cellular Wireless Networks in the Upper Mid-Band
Authors:
Seongjoon Kang,
Marco Mezzavilla,
Sundeep Rangan,
Arjuna Madanayake,
Satheesh Bojja Venkatakrishnan,
Gregory Hellbourg,
Monisha Ghosh,
Hamed Rahmani,
Aditya Dhananjay
Abstract:
The upper mid-band - roughly from 7 to 24 GHz - has attracted considerable recent interest for new cellular services. This frequency range has vastly more spectrum than the highly congested bands below 7 GHz while offering more favorable propagation and coverage than the millimeter wave (mmWave) frequencies. The upper mid-band can thus provide a powerful and complementary frequency range to balance coverage and capacity. Realizing the full potential of these bands, however, will require fundamental changes to the design of cellular systems. Most importantly, spectrum will likely need to be shared with incumbents including communication satellites, military RADAR, and radio astronomy. Also, the upper mid-band is simply a vast frequency range. Due to this wide bandwidth, combined with the directional nature of transmission and intermittent occupancy of incumbents, cellular systems will need to be agile to sense and intelligently use large spatial and frequency degrees of freedom. This paper attempts to provide an initial assessment of the feasibility and potential gains of wideband cellular systems operating in the upper mid-band. The study includes: (1) a system study to assess potential gains of multi-band systems in a representative dense urban environment and illustrate the value of wide band system with dynamic frequency selectivity; (2) an evaluation of potential cross interference between satellites and terrestrial cellular services and interference nulling to reduce that interference; and (3) design and evaluation of a compact multi-band antenna array structure. Leveraging these preliminary results, we identify potential future research directions to realize next-generation systems in these frequencies.
Submitted 6 March, 2024; v1 submitted 6 September, 2023;
originally announced September 2023.
-
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Authors:
Lin Geng Foo,
Hossein Rahmani,
Jun Liu
Abstract:
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Due to its wide range of applications and the demonstrated potential of recent works, AIGC developments have been attracting lots of attention recently, and AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape (as voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human avatar (body and head), 3D motion, and audio -- each presenting different characteristics and challenges. Furthermore, there have also been many significant developments in cross-modality AIGC methods, where generative methods can receive conditioning input in one modality and produce outputs in another. Examples include going from various modalities to image, video, 3D shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar), and audio modalities. In this paper, we provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey the representative datasets throughout the modalities, and present comparative results for various modalities. Moreover, we also discuss the challenges and potential future research directions.
Submitted 21 October, 2023; v1 submitted 27 August, 2023;
originally announced August 2023.
-
Distribution-Aligned Diffusion for Human Mesh Recovery
Authors:
Lin Geng Foo,
Jia Gong,
Hossein Rahmani,
Jun Liu
Abstract:
Recovering a 3D human mesh from a single RGB image is a challenging task due to depth ambiguity and self-occlusion, resulting in a high degree of uncertainty. Meanwhile, diffusion models have recently seen much success in generating high-quality outputs by progressively denoising noisy inputs. Inspired by their capability, we explore a diffusion-based approach for human mesh recovery, and propose a Human Mesh Diffusion (HMDiff) framework which frames mesh recovery as a reverse diffusion process. We also propose a Distribution Alignment Technique (DAT) that infuses prior distribution information into the mesh distribution diffusion process, and provides useful prior knowledge to facilitate the mesh recovery task. Our method achieves state-of-the-art performance on three widely used datasets. Project page: https://gongjia0208.github.io/HMDiff/.
Submitted 24 October, 2023; v1 submitted 25 August, 2023;
originally announced August 2023.
-
Optimal Sensor Deception to Deviate from an Allowed Itinerary
Authors:
Hazhar Rahmani,
Arash Ahadi,
Jie Fu
Abstract:
In this work, we study a class of deception planning problems in which an agent aims to alter a security monitoring system's sensor readings so as to disguise its adversarial itinerary as an allowed itinerary in the environment. The adversarial itinerary set and allowed itinerary set are captured by regular languages. To deviate without being detected, we investigate whether there exists a strategy for the agent to alter the sensor readings, with a minimal cost, such that for any of those paths it takes, the system thinks the agent took a path within the allowed itinerary. Our formulation assumes an offline sensor alteration where the agent determines the sensor alteration strategy and implements it, and then carries out any path in its deviation itinerary. We prove that the problem of solving the optimal sensor alteration is NP-hard, by a reduction from the directed multi-cut problem. Further, we present an exact algorithm based on integer linear programming and demonstrate the correctness and the efficacy of the algorithm in case studies.
Submitted 27 June, 2024; v1 submitted 1 August, 2023;
originally announced August 2023.
-
CAPRI: Context-Aware Interpretable Point-of-Interest Recommendation Framework
Authors:
Ali Tourani,
Hossein A. Rahmani,
Mohammadmehdi Naghiaei,
Yashar Deldjoo
Abstract:
Point-of-Interest (POI) recommendation systems have gained popularity for their unique ability to suggest geographical destinations with the incorporation of contextual information such as time, location, and user-item interaction. Existing recommendation frameworks lack the contextual fusion required for POI systems. This paper presents CAPRI, a novel POI recommendation framework that effectively integrates context-aware models, such as GeoSoCa, LORE, and USG, and introduces a novel strategy for the efficient merging of contextual information. CAPRI integrates an evaluation module that expands the evaluation scope beyond accuracy to include novelty, personalization, diversity, and fairness. With an aim to establish a new industry standard for reproducible results in the realm of POI recommendation systems, we have made CAPRI openly accessible on GitHub, facilitating easy access and contribution to the continued development and refinement of this innovative framework.
Submitted 20 June, 2023;
originally announced June 2023.
-
A Survey on Asking Clarification Questions Datasets in Conversational Systems
Authors:
Hossein A. Rahmani,
Xi Wang,
Yue Feng,
Qiang Zhang,
Emine Yilmaz,
Aldo Lipani
Abstract:
The ability to understand a user's underlying needs is critical for conversational systems, especially with limited input from users in a conversation. Thus, in such a domain, Asking Clarification Questions (ACQs) to reveal users' true intent from their queries or utterances arises as an essential task. However, it is noticeable that a key limitation of the existing ACQs studies is their incomparability, stemming from inconsistent use of data, distinct experimental setups and evaluation strategies. Therefore, in this paper, to assist the development of ACQs techniques, we comprehensively analyse the current ACQs research status, which offers a detailed comparison of publicly available datasets, and discusses the applied evaluation metrics, joined with benchmarks for multiple ACQs-related tasks. In particular, given a thorough analysis of the ACQs task, we discuss a number of corresponding research directions for the investigation of ACQs as well as the development of conversational systems.
Submitted 25 May, 2023;
originally announced May 2023.
-
Towards Asking Clarification Questions for Information Seeking on Task-Oriented Dialogues
Authors:
Yue Feng,
Hossein A. Rahmani,
Aldo Lipani,
Emine Yilmaz
Abstract:
Task-oriented dialogue systems aim at providing users with task-specific services. Users of such systems often do not know all the information about the task they are trying to accomplish, requiring them to seek information about the task. To provide accurate and personalized task-oriented information seeking results, task-oriented dialogue systems need to address two potential issues: 1) users' inability to describe their complex information needs in their requests; and 2) ambiguous/missing information the system has about the users. In this paper, we propose a new Multi-Attention Seq2Seq Network, named MAS2S, which can ask questions to clarify the user's information needs and the user's profile in task-oriented information seeking. We also extend an existing dataset for task-oriented information seeking, leading to a dataset which contains about 100k task-oriented information seeking dialogues that are made publicly available (dataset and code are available at https://github.com/sweetalyssum/clarit). Experimental results on this dataset show that MAS2S outperforms baselines on both clarification question generation and answer prediction.
Submitted 23 May, 2023;
originally announced May 2023.
-
When and What to Ask Through World States and Text Instructions: IGLU NLP Challenge Solution
Authors:
Zhengxiang Shi,
Jerome Ramos,
To Eun Kim,
Xi Wang,
Hossein A. Rahmani,
Aldo Lipani
Abstract:
In collaborative tasks, effective communication is crucial for achieving joint goals. One such task is collaborative building where builders must communicate with each other to construct desired structures in a simulated environment such as Minecraft. We aim to develop an intelligent builder agent to build structures based on user input through dialogue. However, in collaborative building, builders may encounter situations that are difficult to interpret based on the available information and instructions, leading to ambiguity. In the NeurIPS 2022 Competition NLP Task, we address two key research questions, with the goal of filling this gap: when should the agent ask for clarification, and what clarification questions should it ask? We move towards this target with two sub-tasks, a classification task and a ranking task. For the classification task, the goal is to determine whether the agent should ask for clarification based on the current world state and dialogue history. For the ranking task, the goal is to rank the relevant clarification questions from a pool of candidates. In this report, we briefly introduce our methods for the classification and ranking tasks. For the classification task, our model achieves an F1 score of 0.757, which placed 3rd on the leaderboard. For the ranking task, our model achieves about 0.38 for Mean Reciprocal Rank by extending the traditional ranking model. Lastly, we discuss various neural approaches for the ranking task and future directions.
Submitted 9 May, 2023;
originally announced May 2023.
-
A Probabilistic Attention Model with Occlusion-aware Texture Regression for 3D Hand Reconstruction from a Single RGB Image
Authors:
Zheheng Jiang,
Hossein Rahmani,
Sue Black,
Bryan M. Williams
Abstract:
Recently, deep learning based approaches have shown promising results in 3D hand reconstruction from a single RGB image. These approaches can be roughly divided into model-based approaches, which are heavily dependent on the model's parameter space, and model-free approaches, which require large numbers of 3D ground truths to reduce depth ambiguity and struggle in weakly-supervised scenarios. To overcome these issues, we propose a novel probabilistic model to achieve the robustness of model-based approaches and reduced dependence on the model's parameter space of model-free approaches. The proposed probabilistic model incorporates a model-based network as a prior-net to estimate the prior probability distribution of joints and vertices. An Attention-based Mesh Vertices Uncertainty Regression (AMVUR) model is proposed to capture dependencies among vertices and the correlation between joints and mesh vertices to improve their feature representation. We further propose a learning based occlusion-aware Hand Texture Regression model to achieve high-fidelity texture reconstruction. We demonstrate the flexibility of the proposed probabilistic model to be trained in both supervised and weakly-supervised scenarios. The experimental results demonstrate our probabilistic model's state-of-the-art accuracy in 3D hand and texture reconstruction from a single image in both training schemes, including in the presence of severe occlusions.
Submitted 27 April, 2023;
originally announced April 2023.
-
Probabilistic Planning with Prioritized Preferences over Temporal Logic Objectives
Authors:
Lening Li,
Hazhar Rahmani,
Jie Fu
Abstract:
This paper studies temporal planning in probabilistic environments, modeled as labeled Markov decision processes (MDPs), with user preferences over multiple temporal goals. Existing works reflect such preferences as a prioritized list of goals. This paper introduces a new specification language, termed prioritized qualitative choice linear temporal logic on finite traces, which augments linear temporal logic on finite traces with prioritized conjunction and ordered disjunction from prioritized qualitative choice logic. This language allows for succinctly specifying temporal objectives with corresponding preferences accomplishing each temporal task. The finite traces that describe the system's behaviors are ranked based on their dissatisfaction scores with respect to the formula. We propose a systematic translation from the new language to a weighted deterministic finite automaton. Utilizing this computational model, we formulate and solve a problem of computing an optimal policy that minimizes the expected score of dissatisfaction given user preferences. We demonstrate the efficacy and applicability of the logic and the algorithm on several case studies with detailed analyses for each.
Submitted 23 April, 2023;
originally announced April 2023.
-
GradMDM: Adversarial Attack on Dynamic Networks
Authors:
Jianhong Pan,
Lin Geng Foo,
Qichen Zheng,
Zhipeng Fan,
Hossein Rahmani,
Qiuhong Ke,
Jun Liu
Abstract:
Dynamic neural networks can greatly reduce computation redundancy without compromising accuracy by adapting their structures based on the input. In this paper, we explore the robustness of dynamic neural networks against energy-oriented attacks targeted at reducing their efficiency. Specifically, we attack dynamic models with our novel algorithm GradMDM. GradMDM is a technique that adjusts the direction and the magnitude of the gradients to effectively find a small perturbation for each input, that will activate more computational units of dynamic models during inference. We evaluate GradMDM on multiple datasets and dynamic models, where it outperforms previous energy-oriented attack techniques, significantly increasing computation complexity while reducing the perceptibility of the perturbations.
Submitted 1 April, 2023;
originally announced April 2023.
-
Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
Authors:
Tianjiao Li,
Lin Geng Foo,
Ping Hu,
Xindi Shang,
Hossein Rahmani,
Zehuan Yuan,
Jun Liu
Abstract:
Learning with large-scale unlabeled data has become a powerful tool for pre-training Visual Transformers (VTs). However, prior works tend to overlook that, in real-world scenarios, the input data may be corrupted and unreliable. Pre-training VTs on such corrupted data can be challenging, especially when we pre-train via the masked autoencoding approach, where both the inputs and masked "ground truth" targets can potentially be unreliable in this case. To address this limitation, we introduce the Token Boosting Module (TBM) as a plug-and-play component for VTs that effectively allows the VT to learn to extract clean and robust features during masked autoencoding pre-training. We provide theoretical analysis to show how TBM improves model pre-training with more robust and generalizable representations, thus benefiting downstream tasks. We conduct extensive experiments to analyze TBM's effectiveness, and results on four corrupted datasets demonstrate that TBM consistently improves performance on downstream tasks.
Submitted 12 April, 2023; v1 submitted 9 April, 2023;
originally announced April 2023.
-
Progressive Channel-Shrinking Network
Authors:
Jianhong Pan,
Siyuan Yang,
Lin Geng Foo,
Qiuhong Ke,
Hossein Rahmani,
Zhipeng Fan,
Jun Liu
Abstract:
Currently, salience-based channel pruning makes continuous breakthroughs in network compression. In the realization, the salience mechanism is used as a metric of channel salience to guide pruning. Therefore, salience-based channel pruning can dynamically adjust the channel width at run-time, which provides a flexible pruning scheme. However, there are two problems emerging: a gating function is often needed to truncate the specific salience entries to zero, which destabilizes the forward propagation; dynamic architecture brings more cost for indexing in inference which bottlenecks the inference speed. In this paper, we propose a Progressive Channel-Shrinking (PCS) method to compress the selected salience entries at run-time instead of roughly approximating them to zero. We also propose a Running Shrinking Policy to provide a testing-static pruning scheme that can reduce the memory access cost for filter indexing. We evaluate our method on ImageNet and CIFAR10 datasets over two prevalent networks: ResNet and VGG, and demonstrate that our PCS outperforms all baselines and achieves state-of-the-art in terms of compression-performance tradeoff. Moreover, we observe a significant and practical acceleration of inference.
Submitted 1 April, 2023;
originally announced April 2023.
-
GNN-based physics solver for time-independent PDEs
Authors:
Rini Jasmine Gladstone,
Helia Rahmani,
Vishvas Suryakumar,
Hadi Meidani,
Marta D'Elia,
Ahmad Zareei
Abstract:
Physics-based deep learning frameworks have been shown to be effective in accurately modeling the dynamics of complex physical systems with generalization capability across problem inputs. However, time-independent problems pose the challenge of requiring long-range exchange of information across the computational domain for obtaining accurate predictions. In the context of graph neural networks (GNNs), this calls for deeper networks, which, in turn, may compromise or slow down the training process. In this work, we present two GNN architectures to overcome this challenge - the Edge Augmented GNN and the Multi-GNN. We show that both these networks perform significantly better (by a factor of 1.5 to 2) than baseline methods when applied to time-independent solid mechanics problems. Furthermore, the proposed architectures generalize well to unseen domains, boundary conditions, and materials. Here, the treatment of variable domains is facilitated by a novel coordinate transformation that enables rotation and translation invariance. By broadening the range of problems that neural operators based on graph neural networks can tackle, this paper provides the groundwork for their application to complex scientific and industrial settings.
Submitted 27 March, 2023;
originally announced March 2023.
-
DiffPose: Toward More Reliable 3D Pose Estimation
Authors:
Jia Gong,
Lin Geng Foo,
Zhipeng Fan,
Qiuhong Ke,
Hossein Rahmani,
Jun Liu
Abstract:
Monocular 3D human pose estimation is quite challenging due to the inherent ambiguity and occlusion, which often lead to high uncertainty and indeterminacy. On the other hand, diffusion models have recently emerged as an effective tool for generating high-quality images from noise. Inspired by their capability, we explore a novel pose estimation framework (DiffPose) that formulates 3D pose estimation as a reverse diffusion process. We incorporate novel designs into our DiffPose to facilitate the diffusion process for 3D pose estimation: a pose-specific initialization of pose uncertainty distributions, a Gaussian Mixture Model-based forward diffusion process, and a context-conditioned reverse diffusion process. Our proposed DiffPose significantly outperforms existing methods on the widely used pose estimation benchmarks Human3.6M and MPI-INF-3DHP. Project page: https://gongjia0208.github.io/Diffpose/.
Submitted 9 April, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Probabilistic Planning with Partially Ordered Preferences over Temporal Goals
Authors:
Hazhar Rahmani,
Abhishek N. Kulkarni,
Jie Fu
Abstract:
In this paper, we study planning in stochastic systems, modeled as Markov decision processes (MDPs), with preferences over temporally extended goals. Prior work on temporal planning with preferences assumes that the user preferences form a total order, meaning that every pair of outcomes is comparable. In this work, we consider the case where the preferences over possible outcomes are a partial order rather than a total order. We first introduce a variant of deterministic finite automaton, referred to as a preference DFA, for specifying the user's preferences over temporally extended goals. Based on order theory, we translate the preference DFA to a preference relation over policies for probabilistic planning in a labeled MDP. In this treatment, a most preferred policy induces a weak-stochastic nondominated probability distribution over the finite paths in the MDP. The proposed planning algorithm hinges on the construction of a multi-objective MDP. We prove that a weak-stochastic nondominated policy given the preference specification is Pareto-optimal in the constructed multi-objective MDP, and vice versa. Throughout the paper, we employ a running example to demonstrate the proposed preference specification and solution approaches. We show the efficacy of our algorithm using the example with detailed analysis, and then discuss possible future directions.
Submitted 7 March, 2023; v1 submitted 25 September, 2022;
originally announced September 2022.
-
Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
Authors:
Tianjiao Li,
Lin Geng Foo,
Qiuhong Ke,
Hossein Rahmani,
Anran Wang,
Jinghua Wang,
Jun Liu
Abstract:
The goal of fine-grained action recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we derive inspiration from the human visual system which contains specialized regions in the brain that are dedicated towards handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are only activated for a subset of samples that are highly similar. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences to distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or temporal fine-grained information, to better tackle the large range of spatio-temporal variations in the videos. Lastly, we design an Upstream-Downstream Learning algorithm to optimize our model's dynamic decisions during training, improving the performance of our DSTS module. We obtain state-of-the-art performance on two widely-used fine-grained action recognition datasets.
Submitted 3 September, 2022;
originally announced September 2022.
-
Towards Confidence-aware Calibrated Recommendation
Authors:
Mohammadmehdi Naghiaei,
Hossein A. Rahmani,
Mohammad Aliannejadi,
Nasim Sonboli
Abstract:
Recommender systems utilize users' historical data to learn and predict their future interests, providing them with suggestions tailored to their tastes. Calibration ensures that the distribution of recommended item categories is consistent with the user's historical data. Mitigating miscalibration brings various benefits to a recommender system. For example, it becomes less likely that a system overlooks categories with less interaction on a user's profile by only recommending popular categories. Despite their notable success, calibration methods have several drawbacks, such as limiting the diversity of the recommended items and not considering the calibration confidence. This work presents a set of properties that address various aspects of a desired calibrated recommender system. Considering these properties, we propose a confidence-aware optimization-based re-ranking algorithm to find the balance between calibration, relevance, and item diversity, while simultaneously accounting for calibration confidence based on user profile size. Our model outperforms state-of-the-art methods in terms of various accuracy and beyond-accuracy metrics for different user groups.
Submitted 22 August, 2022;
originally announced August 2022.
-
IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition
Authors:
Yunsheng Pang,
Qiuhong Ke,
Hossein Rahmani,
James Bailey,
Jun Liu
Abstract:
Human interaction recognition is very important in many applications. One crucial cue in recognizing an interaction is the interactive body parts. In this work, we propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition via modeling the interactive body parts as graphs. More specifically, the proposed IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts, and enhances the representation of each person by aggregating the information of the interactive body parts based on the learned graphs. Furthermore, we propose a Semantic Partition Module to transform each human skeleton sequence into a Body-Part-Time sequence to better capture the spatial and temporal information of the skeleton sequence for learning the graphs. Extensive experiments on three benchmark datasets demonstrate that our model outperforms the state-of-the-art with a significant margin.
Submitted 25 July, 2022;
originally announced July 2022.
-
Exploring the Impact of Temporal Bias in Point-of-Interest Recommendation
Authors:
Hossein A. Rahmani,
Mohammadmehdi Naghiaei,
Ali Tourani,
Yashar Deldjoo
Abstract:
Recommending appropriate travel destinations to consumers based on contextual information such as their check-in time and location is a primary objective of Point-of-Interest (POI) recommender systems. However, the issue of contextual bias (i.e., how much consumers prefer one situation over another) has received little attention from the research community. This paper examines the effect of temporal bias, defined as the difference between users' check-in hours (leisure vs. work hours), on the consumer-side fairness of context-aware recommendation algorithms. We believe that eliminating this type of temporal (and geographical) bias might contribute to a drop in traffic-related air pollution, noting that rush-hour traffic may be more congested. To surface effective POI recommendations, we evaluated the sensitivity of state-of-the-art context-aware models to the temporal bias contained in users' check-in activities on two POI datasets, namely Gowalla and Yelp. The findings show that the examined context-aware recommendation models prefer one group of users over another based on the time of check-in and that this preference persists even when users have the same number of interactions.
Submitted 23 July, 2022;
originally announced July 2022.
-
ERA: Expert Retrieval and Assembly for Early Action Prediction
Authors:
Lin Geng Foo,
Tianjiao Li,
Hossein Rahmani,
Qiuhong Ke,
Jun Liu
Abstract:
Early action prediction aims to successfully predict the class label of an action before it is completely performed. This is a challenging task because the beginning stages of different actions can be very similar, with only minor subtle differences for discrimination. In this paper, we propose a novel Expert Retrieval and Assembly (ERA) module that retrieves and assembles a set of experts most specialized at using discriminative subtle differences, to distinguish an input sample from other highly similar samples. To encourage our model to effectively use subtle differences for early action prediction, we push experts to discriminate exclusively between samples that are highly similar, forcing these experts to learn to use subtle differences that exist between those samples. Additionally, we design an effective Expert Learning Rate Optimization method that balances the experts' optimization and leads to better performance. We evaluate our ERA module on four public action datasets and achieve state-of-the-art performance.
Submitted 22 July, 2022; v1 submitted 20 July, 2022;
originally announced July 2022.
-
ViralBERT: A User Focused BERT-Based Approach to Virality Prediction
Authors:
Rikaz Rameez,
Hossein A. Rahmani,
Emine Yilmaz
Abstract:
Recently, Twitter has become the social network of choice for sharing and spreading information to a multitude of users through posts called 'tweets'. Users can easily re-share these posts to other users through 'retweets', which allow information to cascade to many more users, increasing its outreach. Clearly, being able to know the extent to which a post can be retweeted has great value in advertising, influencing and other such campaigns. In this paper, we propose ViralBERT, which can be used to predict the virality of tweets using content- and user-based features. We employ a method of concatenating numerical features such as hashtags and follower numbers to tweet text, and utilise two BERT modules: one for semantic representation of the combined text and numerical features, and another module purely for sentiment analysis of text, as both the information within text and its ability to elicit an emotional response play a part in retweet proneness. We collect a dataset of 330k tweets to train ViralBERT and validate the efficacy of our model using baselines from current studies in this field. Our experiments show that our approach outperforms these baselines, with a 13% increase in both F1 Score and Accuracy compared to the best performing baseline method. We then conduct an ablation study to investigate the importance of chosen features, finding that text sentiment and follower counts, and to a lesser extent mentions and following counts, are the strongest features for the model, and that hashtag counts are detrimental to the model.
Submitted 17 May, 2022;
originally announced June 2022.
-
Experiments on Generalizability of User-Oriented Fairness in Recommender Systems
Authors:
Hossein A. Rahmani,
Mohammadmehdi Naghiaei,
Mahdi Dehghan,
Mohammad Aliannejadi
Abstract:
Recent work in recommender systems mainly focuses on fairness in recommendations as an important aspect of measuring recommendation quality. A fairness-aware recommender system aims to treat different user groups similarly. Relevant work on user-oriented fairness highlights the discriminative behavior of fairness-unaware recommendation algorithms towards a certain user group, defined based on users' activity level. Typical solutions include proposing a user-centered fairness re-ranking framework applied on top of a base ranking model to mitigate its unfair behavior towards a certain user group, i.e., the disadvantaged group. In this paper, we reproduce a user-oriented fairness study and provide extensive experiments to analyze the dependency of their proposed method on various fairness and recommendation aspects, including the recommendation domain, nature of the base ranking model, and user grouping method. Moreover, we evaluate the final recommendations provided by the re-ranking framework from both user- (e.g., NDCG, user-fairness) and item-side (e.g., novelty, item-fairness) metrics. We discover interesting trends and trade-offs between the model's performance in terms of different evaluation metrics. For instance, we see that the definition of the advantaged/disadvantaged user groups plays a crucial role in the effectiveness of the fairness algorithm and how it improves the performance of specific base ranking models. Finally, we highlight some important open challenges and future directions in this field. We release the data, evaluation pipeline, and the trained models publicly on https://github.com/rahmanidashti/FairRecSys.
Submitted 17 May, 2022;
originally announced May 2022.
-
CPFair: Personalized Consumer and Producer Fairness Re-ranking for Recommender Systems
Authors:
Mohammadmehdi Naghiaei,
Hossein A. Rahmani,
Yashar Deldjoo
Abstract:
Recently, there has been a rising awareness that when machine learning (ML) algorithms are used to automate choices, they may treat/affect individuals unfairly, with legal, ethical, or economic consequences. Recommender systems are prominent examples of such ML systems that assist users in making high-stakes judgments. A common trend in the previous literature on fairness in recommender systems is that the majority of works treat user and item fairness concerns separately, ignoring the fact that recommender systems operate in a two-sided marketplace. In this work, we present an optimization-based re-ranking approach that seamlessly integrates fairness constraints from both the consumer and producer sides in a joint objective framework. We demonstrate through large-scale experiments on 8 datasets that our proposed method is capable of improving both consumer and producer fairness without reducing overall recommendation quality, demonstrating the role algorithms may play in minimizing data biases.
Submitted 17 April, 2022;
originally announced April 2022.
-
The Unfairness of Popularity Bias in Book Recommendation
Authors:
Mohammadmehdi Naghiaei,
Hossein A. Rahmani,
Mahdi Dehghan
Abstract:
Recent studies have shown that recommendation systems commonly suffer from popularity bias. Popularity bias refers to the problem that popular items (i.e., frequently rated items) are recommended frequently while less popular items are recommended rarely or not at all. Researchers adopted two approaches to examining popularity bias: (i) from the users' perspective, by analyzing how far a recommendation system deviates from users' expectations in receiving popular items, and (ii) by analyzing the amount of exposure that long-tail items receive, measured by overall catalog coverage and novelty. In this paper, we examine the first point of view in the book domain, although the findings may be applied to other domains as well. To this end, we analyze the well-known Book-Crossing dataset and define three user groups based on their tendency towards popular items (i.e., Niche, Diverse, Bestseller-focused). Further, we evaluate the performance of nine state-of-the-art recommendation algorithms and two baselines (i.e., Random, MostPop) from both the accuracy (e.g., NDCG, Precision, Recall) and popularity bias perspectives. Our results indicate that most state-of-the-art recommendation algorithms suffer from popularity bias in the book domain, and fail to meet the expectations of users with Niche and Diverse tastes despite these users having a larger profile size. Conversely, Bestseller-focused users are more likely to receive high-quality recommendations, both in terms of fairness and personalization. Furthermore, our study shows a tradeoff between personalization and unfairness of popularity bias in recommendation algorithms for users belonging to the Diverse and Bestseller groups, that is, algorithms with high capability of personalization suffer from the unfairness of popularity bias.
Submitted 27 February, 2022;
originally announced February 2022.
-
The Unfairness of Active Users and Popularity Bias in Point-of-Interest Recommendation
Authors:
Hossein A. Rahmani,
Yashar Deldjoo,
Ali Tourani,
Mohammadmehdi Naghiaei
Abstract:
Point-of-Interest (POI) recommender systems provide personalized recommendations to users and help businesses attract potential customers. Despite their success, recent studies suggest that highly data-driven recommendations could be impacted by data biases, resulting in unfair outcomes for different stakeholders, mainly consumers (users) and providers (items). Most existing fairness-related research works in recommender systems treat user fairness and item fairness issues individually, disregarding that recommender systems work in a two-sided marketplace. This paper studies the interplay between (i) the unfairness of active users, (ii) the unfairness of popular items, and (iii) the accuracy (personalization) of recommendation as three angles of our study triangle. We group users into advantaged and disadvantaged levels to measure user fairness based on their activity level. For item fairness, we divide items into short-head, mid-tail, and long-tail groups and study the exposure of these item groups in the top-k recommendation list of users. Experimental validation of eight different recommendation models commonly used for POI recommendation (e.g., contextual, CF) on two publicly available POI recommendation datasets, Gowalla and Yelp, indicates that most well-performing models suffer seriously from the unfairness of popularity bias (provider unfairness). Furthermore, our study shows that most recommendation models cannot satisfy both consumer and producer fairness, indicating a trade-off between these variables possibly due to natural biases in data. We choose the POI recommendation as our test scenario; however, the insights should be trivially extendable to other domains.
Submitted 8 April, 2022; v1 submitted 27 February, 2022;
originally announced February 2022.
-
A Systematic Analysis on the Impact of Contextual Information on Point-of-Interest Recommendation
Authors:
Hossein A. Rahmani,
Mohammad Aliannejadi,
Mitra Baratchi,
Fabio Crestani
Abstract:
As the popularity of Location-based Social Networks (LBSNs) increases, designing accurate models for Point-of-Interest (POI) recommendation receives more attention. POI recommendation is often performed by incorporating contextual information into previously designed recommendation algorithms. The major contextual information that has been considered in POI recommendation includes the location attributes (i.e., exact coordinates of a location, category, and check-in time), the user attributes (i.e., comments, reviews, tips, and check-ins made to the locations), and other information, such as the distance of the POI from the user's main activity location, and the social tie between users. The right selection of such factors can significantly impact the performance of the POI recommendation. However, previous research does not consider the impact of the combination of these different factors. In this paper, we propose different contextual models and analyze the fusion of different major contextual information in POI recommendation. The major contributions of this paper are: (i) providing an extensive survey of context-aware location recommendation, (ii) quantifying and analyzing the impact of different contextual information (e.g., social, temporal, spatial, and categorical) in the POI recommendation on available baselines and two new linear and non-linear models that can incorporate all the major contextual information into a single recommendation model, and (iii) evaluating the considered models using two well-known real-world datasets. Our results indicate that while modeling geographical and temporal influences can improve recommendation quality, fusing all other contextual information into a recommendation model is not always the best strategy.
Submitted 20 January, 2022;
originally announced January 2022.
-
Leveraging Social Influence based on Users Activity Centers for Point-of-Interest Recommendation
Authors:
Kosar Seyedhoseinzadeh,
Hossein A. Rahmani,
Mohsen Afsharchi,
Mohammad Aliannejadi
Abstract:
Recommender Systems (RSs) aim to model and predict the user preference while interacting with items, such as Points of Interest (POIs). These systems face several challenges, such as data sparsity, limiting their effectiveness. In this paper, we address this problem by incorporating social, geographical, and temporal information into the Matrix Factorization (MF) technique. To this end, we model social influence based on two factors: similarities between users in terms of common check-ins and the friendships between them. We introduce two levels of friendship based on explicit friendship networks and high check-in overlap between users. We base our friendship algorithm on users' geographical activity centers. The results show that our proposed model outperforms the state-of-the-art on two real-world datasets. More specifically, our ablation study shows that the social model improves the performance of our proposed POI recommendation system by 31% and 14% on the Gowalla and Yelp datasets in terms of Precision@10, respectively.
Submitted 10 January, 2022;
originally announced January 2022.
-
Demographic Biases of Crowd Workers in Key Opinion Leaders Finding
Authors:
Hossein A. Rahmani,
Jie Yang
Abstract:
Key Opinion Leaders (KOLs) are people who have a strong influence and whose opinions are listened to by others when making important decisions. Crowdsourcing provides an efficient and cost-effective means to gather data for the KOL finding task. However, data collected through crowdsourcing is affected by the inherent demographic biases of crowd workers. To avoid such demographic biases, we need to measure how biased each crowd worker is. In this paper, we propose a simple yet effective approach based on demographic information of candidate KOLs and their counterfactual value. We argue that it is effective because of the extra information that we can consider together with labeled data to curate a less biased dataset.
Submitted 19 October, 2021; v1 submitted 18 October, 2021;
originally announced October 2021.
-
Recent Advances of Continual Learning in Computer Vision: An Overview
Authors:
Haoxuan Qu,
Hossein Rahmani,
Li Xu,
Bryan Williams,
Jun Liu
Abstract:
In contrast to batch learning where all training data is available at once, continual learning represents a family of methods that accumulate knowledge and learn continuously with data available in sequential order. Similar to the human learning process with the ability of learning, fusing, and accumulating new knowledge coming at different time steps, continual learning is considered to have high practical significance. Hence, continual learning has been studied in various artificial intelligence tasks. In this paper, we present a comprehensive review of the recent progress of continual learning in computer vision. In particular, the works are grouped by their representative techniques, including regularization, knowledge distillation, memory, generative replay, parameter isolation, and a combination of the above techniques. For each category of these techniques, both its characteristics and applications in computer vision are presented. At the end of this overview, several subareas, where continuous knowledge accumulation is potentially helpful while continual learning has not been well studied, are discussed.
Submitted 30 November, 2023; v1 submitted 23 September, 2021;
originally announced September 2021.
-
The Multi-Modal Video Reasoning and Analyzing Competition
Authors:
Haoran Peng,
He Huang,
Li Xu,
Tianjiao Li,
Jun Liu,
Hossein Rahmani,
Qiuhong Ke,
Zhicheng Guo,
Cong Wu,
Rongchang Li,
Mang Ye,
Jiahao Wang,
Jiaxu Zhang,
Yuanzhong Liu,
Tao He,
Fuwei Zhang,
Xianbin Liu,
Tao Lin
Abstract:
In this paper, we introduce the Multi-Modal Video Reasoning and Analyzing Competition (MMVRAC) workshop in conjunction with ICCV 2021. This competition is composed of four different tracks, namely, video question answering, skeleton-based action recognition, fisheye video-based action recognition, and person re-identification, which are based on two datasets: SUTD-TrafficQA and UAV-Human. We summarize the top-performing methods submitted by the participants in this competition and show their results achieved in the competition.
Submitted 18 August, 2021;
originally announced August 2021.