subscribe to arXiv mailings

Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models

Authors: Aylin Gunal, Baihan Lin, Djallel Bouneffouf

Abstract: Given the increasing demand for mental health assistance, artificial intelligence (AI), particularly large language models (LLMs), may be valuable for integration into automated clinical support systems. In this work, we leverage a decision transformer architecture for topic recommendation in counseling conversations between patients and mental health professionals. The architecture is utilized fo… ▽ More Given the increasing demand for mental health assistance, artificial intelligence (AI), particularly large language models (LLMs), may be valuable for integration into automated clinical support systems. In this work, we leverage a decision transformer architecture for topic recommendation in counseling conversations between patients and mental health professionals. The architecture is utilized for offline reinforcement learning, and we extract states (dialogue turn embeddings), actions (conversation topics), and rewards (scores measuring the alignment between patient and therapist) from previous turns within a conversation to train a decision transformer model. We demonstrate an improvement over baseline reinforcement learning methods, and propose a novel system of utilizing our model's output as synthetic labels for fine-tuning a large language model for the same task. Although our implementation based on LLaMA-2 7B has mixed results, future work can undoubtedly build on the design. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: 5 pages excluding references, 3 figures; accepted at Clinical NLP Workshop @ NAACL 2024

arXiv:2403.12805 [pdf, other]

Contextual Moral Value Alignment Through Context-Based Aggregation

Authors: Pierre Dognin, Jesus Rios, Ronny Luss, Inkit Padhi, Matthew D Riemer, Miao Liu, Prasanna Sattigeri, Manish Nagireddy, Kush R. Varshney, Djallel Bouneffouf

Abstract: Developing value-aligned AI agents is a complex undertaking and an ongoing challenge in the field of AI. Specifically within the domain of Large Language Models (LLMs), the capability to consolidate multiple independently trained dialogue agents, each aligned with a distinct moral value, into a unified system that can adapt to and be aligned with multiple moral values is of paramount importance. I… ▽ More Developing value-aligned AI agents is a complex undertaking and an ongoing challenge in the field of AI. Specifically within the domain of Large Language Models (LLMs), the capability to consolidate multiple independently trained dialogue agents, each aligned with a distinct moral value, into a unified system that can adapt to and be aligned with multiple moral values is of paramount importance. In this paper, we propose a system that does contextual moral value alignment based on contextual aggregation. Here, aggregation is defined as the process of integrating a subset of LLM responses that are best suited to respond to a user input, taking into account features extracted from the user's input. The proposed system shows better results in term of alignment to human value compared to the state of the art. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.09704 [pdf, other]

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Authors: Swapnaja Achintalwar, Ioana Baldini, Djallel Bouneffouf, Joan Byamugisha, Maria Chang, Pierre Dognin, Eitan Farchi, Ndivhuwo Makondo, Aleksandra Mojsilovic, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Inkit Padhi, Orna Raz, Jesus Rios, Prasanna Sattigeri, Moninder Singh, Siphiwe Thwala, Rosario A. Uceda-Sosa, Kush R. Varshney

Abstract: The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and orchestrate between potentia… ▽ More The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and orchestrate between potentially conflicting requirements in context. We lay out three main components of such an Alignment Studio architecture: Framers, Instructors, and Auditors that work in concert to control the behavior of a language model. We illustrate this approach with a running example of aligning a company's internal-facing enterprise chatbot to its business conduct guidelines. △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: 7 pages, 5 figures

arXiv:2403.06009 [pdf, other]

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Authors: Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, Bishwaranjan Bhattacharjee, Djallel Bouneffouf, Subhajit Chaudhury, Pin-Yu Chen, Lamogha Chiazor, Elizabeth M. Daly, Kirushikesh DB, Rogério Abreu de Paula, Pierre Dognin, Eitan Farchi, Soumya Ghosh, Michael Hind, Raya Horesh, George Kour, Ja Young Lee, Nishtha Madaan, Sameep Mehta, Erik Miehling, Keerthiram Murugesan, Manish Nagireddy , et al. (13 additional authors not shown)

Abstract: Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we presen… ▽ More Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we present our ongoing efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms. In addition to the detectors themselves, we discuss a wide range of uses for these detector models - from acting as guardrails to enabling effective AI governance. We also deep dive into inherent challenges in their development and discuss future work aimed at making the detectors more reliable and broadening their scope. △ Less

Submitted 13 June, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2402.14701 [pdf, other]

COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling

Authors: Baihan Lin, Djallel Bouneffouf, Yulia Landa, Rachel Jespersen, Cheryl Corcoran, Guillermo Cecchi

Abstract: The therapeutic working alliance is a critical factor in predicting the success of psychotherapy treatment. Traditionally, working alliance assessment relies on questionnaires completed by both therapists and patients. In this paper, we present COMPASS, a novel framework to directly infer the therapeutic working alliance from the natural language used in psychotherapy sessions. Our approach utiliz… ▽ More The therapeutic working alliance is a critical factor in predicting the success of psychotherapy treatment. Traditionally, working alliance assessment relies on questionnaires completed by both therapists and patients. In this paper, we present COMPASS, a novel framework to directly infer the therapeutic working alliance from the natural language used in psychotherapy sessions. Our approach utilizes advanced large language models to analyze transcripts of psychotherapy sessions and compare them with distributed representations of statements in the working alliance inventory. Analyzing a dataset of over 950 sessions covering diverse psychiatric conditions, we demonstrate the effectiveness of our method in microscopically mapping patient-therapist alignment trajectories and providing interpretability for clinical psychiatry and in identifying emerging patterns related to the condition being treated. By employing various neural topic modeling techniques in combination with generative language prompting, we analyze the topical characteristics of different psychiatric conditions and incorporate temporal modeling to capture the evolution of topics at a turn-level resolution. This combined framework enhances the understanding of therapeutic interactions, enabling timely feedback for therapists regarding conversation quality and providing interpretable insights to improve the effectiveness of psychotherapy. △ Less

Submitted 22 February, 2024; originally announced February 2024.

Comments: This work extends our research series in computational psychiatry (e.g auto annotation in arXiv:2204.05522, topic extraction in arXiv:2204.10189, and diagnosis in arXiv:2210.15603) with the introduction of LLMs to complete the full cycle of interpreting and understanding psychotherapy strategies as a comprehensive analytical framework

arXiv:2306.10050 [pdf, other]

Interpolating Item and User Fairness in Multi-Sided Recommendations

Authors: Qinyi Chen, Jason Cheuk Nam Liang, Negin Golrezaei, Djallel Bouneffouf

Abstract: Today's online platforms heavily lean on algorithmic recommendations for bolstering user engagement and driving revenue. However, these recommendations can impact multiple stakeholders simultaneously -- the platform, items (sellers), and users (customers) -- each with their unique objectives, making it difficult to find the right middle ground that accommodates all stakeholders. To address this, w… ▽ More Today's online platforms heavily lean on algorithmic recommendations for bolstering user engagement and driving revenue. However, these recommendations can impact multiple stakeholders simultaneously -- the platform, items (sellers), and users (customers) -- each with their unique objectives, making it difficult to find the right middle ground that accommodates all stakeholders. To address this, we introduce a novel fair recommendation framework, Problem (FAIR), that flexibly balances multi-stakeholder interests via a constrained optimization formulation. We next explore Problem (FAIR) in a dynamic online setting where data uncertainty further adds complexity, and propose a low-regret algorithm FORM that concurrently performs real-time learning and fair recommendations, two tasks that are often at odds. Via both theoretical analysis and a numerical case study on real-world data, we demonstrate the efficacy of our framework and method in maintaining platform revenue while ensuring desired levels of fairness for both items and users. △ Less

Submitted 25 May, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2306.03902 [pdf, other]

Utterance Classification with Logical Neural Network: Explainable AI for Mental Disorder Diagnosis

Authors: Yeldar Toleubay, Don Joven Agravante, Daiki Kimura, Baihan Lin, Djallel Bouneffouf, Michiaki Tatsubori

Abstract: In response to the global challenge of mental health problems, we proposes a Logical Neural Network (LNN) based Neuro-Symbolic AI method for the diagnosis of mental disorders. Due to the lack of effective therapy coverage for mental disorders, there is a need for an AI solution that can assist therapists with the diagnosis. However, current Neural Network models lack explainability and may not be… ▽ More In response to the global challenge of mental health problems, we proposes a Logical Neural Network (LNN) based Neuro-Symbolic AI method for the diagnosis of mental disorders. Due to the lack of effective therapy coverage for mental disorders, there is a need for an AI solution that can assist therapists with the diagnosis. However, current Neural Network models lack explainability and may not be trusted by therapists. The LNN is a Recurrent Neural Network architecture that combines the learning capabilities of neural networks with the reasoning capabilities of classical logic-based AI. The proposed system uses input predicates from clinical interviews to output a mental disorder class, and different predicate pruning techniques are used to achieve scalability and higher scores. In addition, we provide an insight extraction method to aid therapists with their diagnosis. The proposed system addresses the lack of explainability of current Neural Network models and provides a more trustworthy solution for mental disorder diagnosis. △ Less

Submitted 6 June, 2023; originally announced June 2023.

Comments: ACL 2023

arXiv:2304.00416 [pdf, other]

Towards Healthy AI: Large Language Models Need Therapists Too

Authors: Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, Kush R. Varshney

Abstract: Recent advances in large language models (LLMs) have led to the development of powerful AI chatbots capable of engaging in natural and human-like conversations. However, these chatbots can be potentially harmful, exhibiting manipulative, gaslighting, and narcissistic behaviors. We define Healthy AI to be safe, trustworthy and ethical. To create healthy AI systems, we present the SafeguardGPT frame… ▽ More Recent advances in large language models (LLMs) have led to the development of powerful AI chatbots capable of engaging in natural and human-like conversations. However, these chatbots can be potentially harmful, exhibiting manipulative, gaslighting, and narcissistic behaviors. We define Healthy AI to be safe, trustworthy and ethical. To create healthy AI systems, we present the SafeguardGPT framework that uses psychotherapy to correct for these harmful behaviors in AI chatbots. The framework involves four types of AI agents: a Chatbot, a "User," a "Therapist," and a "Critic." We demonstrate the effectiveness of SafeguardGPT through a working example of simulating a social conversation. Our results show that the framework can improve the quality of conversations between AI chatbots and humans. Although there are still several challenges and directions to be addressed in the future, SafeguardGPT provides a promising approach to improving the alignment between AI chatbots and human values. By incorporating psychotherapy and reinforcement learning techniques, the framework enables AI chatbots to learn and adapt to human preferences and values in a safe and ethical way, contributing to the development of a more human-centric and responsible AI. △ Less

Submitted 1 April, 2023; originally announced April 2023.

arXiv:2303.09601 [pdf, other]

Psychotherapy AI Companion with Reinforcement Learning Recommendations and Interpretable Policy Dynamics

Authors: Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf

Abstract: We introduce a Reinforcement Learning Psychotherapy AI Companion that generates topic recommendations for therapists based on patient responses. The system uses Deep Reinforcement Learning (DRL) to generate multi-objective policies for four different psychiatric conditions: anxiety, depression, schizophrenia, and suicidal cases. We present our experimental results on the accuracy of recommended to… ▽ More We introduce a Reinforcement Learning Psychotherapy AI Companion that generates topic recommendations for therapists based on patient responses. The system uses Deep Reinforcement Learning (DRL) to generate multi-objective policies for four different psychiatric conditions: anxiety, depression, schizophrenia, and suicidal cases. We present our experimental results on the accuracy of recommended topics using three different scales of working alliance ratings: task, bond, and goal. We show that the system is able to capture the real data (historical topics discussed by the therapists) relatively well, and that the best performing models vary by disorder and rating scale. To gain interpretable insights into the learned policies, we visualize policy trajectories in a 2D principal component analysis space and transition matrices. These visualizations reveal distinct patterns in the policies trained with different reward signals and trained on different clinical diagnoses. Our system's success in generating DIsorder-Specific Multi-Objective Policies (DISMOP) and interpretable policy dynamics demonstrates the potential of DRL in providing personalized and efficient therapeutic recommendations. △ Less

Submitted 16 March, 2023; originally announced March 2023.

Comments: WWW 2023. This work supersede our prior work arxiv:2208.13077 by studying the interpretability of RL-based therapy agents with policy visualizations

arXiv:2302.10845 [pdf, other]

TherapyView: Visualizing Therapy Sessions with Temporal Topic Modeling and AI-Generated Arts

Authors: Baihan Lin, Stefan Zecevic, Djallel Bouneffouf, Guillermo Cecchi

Abstract: We present the TherapyView, a demonstration system to help therapists visualize the dynamic contents of past treatment sessions, enabled by the state-of-the-art neural topic modeling techniques to analyze the topical tendencies of various psychiatric conditions and deep learning-based image generation engine to provide a visual summary. The system incorporates temporal modeling to provide a time-s… ▽ More We present the TherapyView, a demonstration system to help therapists visualize the dynamic contents of past treatment sessions, enabled by the state-of-the-art neural topic modeling techniques to analyze the topical tendencies of various psychiatric conditions and deep learning-based image generation engine to provide a visual summary. The system incorporates temporal modeling to provide a time-series representation of topic similarities at a turn-level resolution and AI-generated artworks given the dialogue segments to provide a concise representations of the contents covered in the session, offering interpretable insights for therapists to optimize their strategies and enhance the effectiveness of psychotherapy. This system provides a proof of concept of AI-augmented therapy tools with e in-depth understanding of the patient's mental state and enabling more effective treatment. △ Less

Submitted 21 February, 2023; originally announced February 2023.

Comments: This work extends our prior empirical work on topic modeling (arxiv:2204.10189) to now provide an interpretable and interactive data visualization platform with AI-generated artworks as a concrete user scenario for therapists

arXiv:2302.01067 [pdf, other]

A Survey on Compositional Generalization in Applications

Authors: Baihan Lin, Djallel Bouneffouf, Irina Rish

Abstract: The field of compositional generalization is currently experiencing a renaissance in AI, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical compositional generalization problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the compositiona… ▽ More The field of compositional generalization is currently experiencing a renaissance in AI, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical compositional generalization problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the compositional generalization. Specifically, we introduce a taxonomy of common applications and summarize the state-of-the-art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this burgeoning field. △ Less

Submitted 2 February, 2023; originally announced February 2023.

arXiv:2210.16386 [pdf, other]

Non-Stationary Bandits with Auto-Regressive Temporal Dependency

Authors: Qinyi Chen, Negin Golrezaei, Djallel Bouneffouf

Abstract: Traditional multi-armed bandit (MAB) frameworks, predominantly examined under stochastic or adversarial settings, often overlook the temporal dynamics inherent in many real-world applications such as recommendation systems and online advertising. This paper introduces a novel non-stationary MAB framework that captures the temporal structure of these real-world dynamics through an auto-regressive (… ▽ More Traditional multi-armed bandit (MAB) frameworks, predominantly examined under stochastic or adversarial settings, often overlook the temporal dynamics inherent in many real-world applications such as recommendation systems and online advertising. This paper introduces a novel non-stationary MAB framework that captures the temporal structure of these real-world dynamics through an auto-regressive (AR) reward structure. We propose an algorithm that integrates two key mechanisms: (i) an alternation mechanism adept at leveraging temporal dependencies to dynamically balance exploration and exploitation, and (ii) a restarting mechanism designed to discard out-of-date information. Our algorithm achieves a regret upper bound that nearly matches the lower bound, with regret measured against a robust dynamic benchmark. Finally, via a real-world case study on tourism demand prediction, we demonstrate both the efficacy of our algorithm and the broader applicability of our techniques to more complex, rapidly evolving time series. △ Less

Submitted 12 December, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: 45 pages, 8 figures

arXiv:2210.15603 [pdf, other]

Working Alliance Transformer for Psychotherapy Dialogue Classification

Authors: Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf

Abstract: As a predictive measure of the treatment outcome in psychotherapy, the working alliance measures the agreement of the patient and the therapist in terms of their bond, task and goal. Long been a clinical quantity estimated by the patients' and therapists' self-evaluative reports, we believe that the working alliance can be better characterized using natural language processing technique directly i… ▽ More As a predictive measure of the treatment outcome in psychotherapy, the working alliance measures the agreement of the patient and the therapist in terms of their bond, task and goal. Long been a clinical quantity estimated by the patients' and therapists' self-evaluative reports, we believe that the working alliance can be better characterized using natural language processing technique directly in the dialogue transcribed in each therapy session. In this work, we propose the Working Alliance Transformer (WAT), a Transformer-based classification model that has a psychological state encoder which infers the working alliance scores by projecting the embedding of the dialogues turns onto the embedding space of the clinical inventory for working alliance. We evaluate our method in a real-world dataset with over 950 therapy sessions with anxiety, depression, schizophrenia and suicidal patients and demonstrate an empirical advantage of using information about the therapeutic states in this sequence classification task of psychotherapy dialogues. △ Less

Submitted 27 October, 2022; originally announced October 2022.

arXiv:2209.12618 [pdf, ps, other]

Survey on Applications of Neurosymbolic Artificial Intelligence

Authors: Djallel Bouneffouf, Charu C. Aggarwal

Abstract: In recent years, the Neurosymbolic framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance. This success is due to its stellar performance combined with attractive properties, such as learning and reasoning. The new emerging Neurosymbolic field is currently experiencing a renaissance, as novel frameworks and a… ▽ More In recent years, the Neurosymbolic framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance. This success is due to its stellar performance combined with attractive properties, such as learning and reasoning. The new emerging Neurosymbolic field is currently experiencing a renaissance, as novel frameworks and algorithms motivated by various practical applications are being introduced, building on top of the classical neural and reasoning problem setting. This article aims to provide a comprehensive review of significant recent developments in real-world applications of Neurosymbolic Artificial Intelligence. Specifically, we introduce a taxonomy of common Neurosymbolic applications and summarize the state-of-the-art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this burgeoning field. △ Less

Submitted 8 September, 2022; originally announced September 2022.

arXiv:2208.13077 [pdf, other]

SupervisorBot: NLP-Annotated Real-Time Recommendations of Psychotherapy Treatment Strategies with Deep Reinforcement Learning

Authors: Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf

Abstract: We propose a recommendation system that suggests treatment strategies to a therapist during the psychotherapy session in real-time. Our system uses a turn-level rating mechanism that predicts the therapeutic outcome by computing a similarity score between the deep embedding of a scoring inventory, and the current sentence that the patient is speaking. The system automatically transcribes a continu… ▽ More We propose a recommendation system that suggests treatment strategies to a therapist during the psychotherapy session in real-time. Our system uses a turn-level rating mechanism that predicts the therapeutic outcome by computing a similarity score between the deep embedding of a scoring inventory, and the current sentence that the patient is speaking. The system automatically transcribes a continuous audio stream and separates it into turns of the patient and of the therapist and perform real-time inference of their therapeutic working alliance. The dialogue pairs along with their computed working alliance as ratings are then fed into a deep reinforcement learning recommendation system where the sessions are treated as users and the topics are treated as items. Other than evaluating the empirical advantages of the core components on an existing dataset of psychotherapy sessions, we demonstrate the effectiveness of this system in a web app. △ Less

Submitted 29 October, 2022; v1 submitted 27 August, 2022; originally announced August 2022.

Comments: This work extends our work series in interactive speech or text systems for psychotherapy (e.g. arXiv:2006.04376, arXiv:2204.05522 and arXiv:2204.10189) and proposes a novel recommendation setting

arXiv:2208.10627 [pdf, other]

Targeted Advertising on Social Networks Using Online Variational Tensor Regression

Authors: Tsuyoshi Idé, Keerthiram Murugesan, Djallel Bouneffouf, Naoki Abe

Abstract: This paper is concerned with online targeted advertising on social networks. The main technical task we address is to estimate the activation probability for user pairs, which quantifies the influence one user may have on another towards purchasing decisions. This is a challenging task because one marketing episode typically involves a multitude of marketing campaigns/strategies of different produ… ▽ More This paper is concerned with online targeted advertising on social networks. The main technical task we address is to estimate the activation probability for user pairs, which quantifies the influence one user may have on another towards purchasing decisions. This is a challenging task because one marketing episode typically involves a multitude of marketing campaigns/strategies of different products for highly diverse customers. In this paper, we propose what we believe is the first tensor-based contextual bandit framework for online targeted advertising. The proposed framework is designed to accommodate any number of feature vectors in the form of multi-mode tensor, thereby enabling to capture the heterogeneity that may exist over user preferences, products, and campaign strategies in a unified manner. To handle inter-dependency of tensor modes, we introduce an online variational algorithm with a mean-field approximation. We empirically confirm that the proposed TensorUCB algorithm achieves a significant improvement in influence maximization tasks over the benchmarks, which is attributable to its capability of capturing the user-product heterogeneity. △ Less

Submitted 9 October, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: 18 pages, 7 figures

MSC Class: 68T05

arXiv:2204.10189 [pdf, other]

Neural Topic Modeling of Psychotherapy Sessions

Authors: Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, Ravi Tejwani

Abstract: In this work, we compare different neural topic modeling methods in learning the topical propensities of different psychiatric conditions from the psychotherapy session transcripts parsed from speech recordings. We also incorporate temporal modeling to put this additional interpretability to action by parsing out topic similarities as a time series in a turn-level resolution. We believe this topic… ▽ More In this work, we compare different neural topic modeling methods in learning the topical propensities of different psychiatric conditions from the psychotherapy session transcripts parsed from speech recordings. We also incorporate temporal modeling to put this additional interpretability to action by parsing out topic similarities as a time series in a turn-level resolution. We believe this topic modeling framework can offer interpretable insights for the therapist to optimally decide his or her strategy and improve psychotherapy effectiveness. △ Less

Submitted 3 November, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

Comments: This work extends our research series in computational linguistics for psychiatry (e.g. working alliance analysis in arXiv:2204.05522) with a systematic investigation of neural topic modeling approaches to provide interpretable insights in psychotherapy

arXiv:2204.05522 [pdf, other]

Deep Annotation of Therapeutic Working Alliance in Psychotherapy

Authors: Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf

Abstract: The therapeutic working alliance is an important predictor of the outcome of the psychotherapy treatment. In practice, the working alliance is estimated from a set of scoring questionnaires in an inventory that both the patient and the therapists fill out. In this work, we propose an analytical framework of directly inferring the therapeutic working alliance from the natural language within the ps… ▽ More The therapeutic working alliance is an important predictor of the outcome of the psychotherapy treatment. In practice, the working alliance is estimated from a set of scoring questionnaires in an inventory that both the patient and the therapists fill out. In this work, we propose an analytical framework of directly inferring the therapeutic working alliance from the natural language within the psychotherapy sessions in a turn-level resolution with deep embeddings such as the Doc2Vec and SentenceBERT models. The transcript of each psychotherapy session can be transcribed and generated in real-time from the session speech recordings, and these embedded dialogues are compared with the distributed representations of the statements in the working alliance inventory. We demonstrate, in a real-world dataset with over 950 sessions of psychotherapy treatments in anxiety, depression, schizophrenia and suicidal patients, the effectiveness of this method in mapping out trajectories of patient-therapist alignment and the interpretability that can offer insights in clinical psychiatry. We believe such a framework can be provide timely feedback to the therapist regarding the quality of the conversation in interview sessions. △ Less

Submitted 12 April, 2022; originally announced April 2022.

arXiv:2106.15808 [pdf, other]

Optimal Epidemic Control as a Contextual Combinatorial Bandit with Budget

Authors: Baihan Lin, Djallel Bouneffouf

Abstract: In light of the COVID-19 pandemic, it is an open challenge and critical practical problem to find a optimal way to dynamically prescribe the best policies that balance both the governmental resources and epidemic control in different countries and regions. To solve this multi-dimensional tradeoff of exploitation and exploration, we formulate this technical challenge as a contextual combinatorial b… ▽ More In light of the COVID-19 pandemic, it is an open challenge and critical practical problem to find a optimal way to dynamically prescribe the best policies that balance both the governmental resources and epidemic control in different countries and regions. To solve this multi-dimensional tradeoff of exploitation and exploration, we formulate this technical challenge as a contextual combinatorial bandit problem that jointly optimizes a multi-criteria reward function. Given the historical daily cases in a region and the past intervention plans in place, the agent should generate useful intervention plans that policy makers can implement in real time to minimizing both the number of daily COVID-19 cases and the stringency of the recommended interventions. We prove this concept with simulations of multiple realistic policy making scenarios and demonstrate a clear advantage in providing a pareto optimal solution in the epidemic intervention problem. △ Less

Submitted 26 April, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

Comments: Proceeding of FUZZ-IEEE 2022. This work extends our prior work on real-world applications of budgeted bandits (e.g. arXiv:1906.09384), and aims to solve the critical problem of epidemic control. Codes at: https://github.com/doerlbh/BanditZoo

arXiv:2103.08241 [pdf, other]

Reinforcement Learning with Algorithms from Probabilistic Structure Estimation

Authors: Jonathan P. Epperlein, Roman Overko, Sergiy Zhuk, Christopher King, Djallel Bouneffouf, Andrew Cullen, Robert Shorten

Abstract: Reinforcement learning (RL) algorithms aim to learn optimal decisions in unknown environments through experience of taking actions and observing the rewards gained. In some cases, the environment is not influenced by the actions of the RL agent, in which case the problem can be modeled as a contextual multi-armed bandit and lightweight myopic algorithms can be employed. On the other hand, when the… ▽ More Reinforcement learning (RL) algorithms aim to learn optimal decisions in unknown environments through experience of taking actions and observing the rewards gained. In some cases, the environment is not influenced by the actions of the RL agent, in which case the problem can be modeled as a contextual multi-armed bandit and lightweight myopic algorithms can be employed. On the other hand, when the RL agent's actions affect the environment, the problem must be modeled as a Markov decision process and more complex RL algorithms are required which take the future effects of actions into account. Moreover, in practice, it is often unknown from the outset whether or not the agent's actions will impact the environment and it is therefore not possible to determine which RL algorithm is most fitting. In this work, we propose to avoid this difficult decision entirely and incorporate a choice mechanism into our RL framework. Rather than assuming a specific problem structure, we use a probabilistic structure estimation procedure based on a likelihood-ratio (LR) test to make a more informed selection of learning algorithm. We derive a sufficient condition under which myopic policies are optimal, present an LR test for this condition, and derive a bound on the regret of our framework. We provide examples of real-world scenarios where our framework is needed and provide extensive simulations to validate our approach. △ Less

Submitted 1 June, 2022; v1 submitted 15 March, 2021; originally announced March 2021.

arXiv:2101.00001 [pdf, ps, other]

Etat de l'art sur l'application des bandits multi-bras

Authors: Djallel Bouneffouf

Abstract: The Multi-armed bandit offer the advantage to learn and exploit the already learnt knowledge at the same time. This capability allows this approach to be applied in different domains, going from clinical trials where the goal is investigating the effects of different experimental treatments while minimizing patient losses, to adaptive routing where the goal is to minimize the delays in a network.… ▽ More The Multi-armed bandit offer the advantage to learn and exploit the already learnt knowledge at the same time. This capability allows this approach to be applied in different domains, going from clinical trials where the goal is investigating the effects of different experimental treatments while minimizing patient losses, to adaptive routing where the goal is to minimize the delays in a network. This article provides a review of the recent results on applying bandit to real-life scenario and summarize the state of the art for each of these fields. Different techniques has been proposed to solve this problem setting, like epsilon-greedy, Upper confident bound (UCB) and Thompson Sampling (TS). We are showing here how this algorithms were adapted to solve the different problems of exploration exploitation. △ Less

Submitted 4 January, 2021; originally announced January 2021.

Comments: in French

arXiv:2010.11413 [pdf, other]

doi 10.1371/journal.pone.0267907

Predicting human decision making in psychological tasks with recurrent neural networks

Authors: Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi

Abstract: Unlike traditional time series, the action sequences of human decision making usually involve many cognitive processes such as beliefs, desires, intentions, and theory of mind, i.e., what others are thinking. This makes predicting human decision-making challenging to be treated agnostically to the underlying psychological mechanisms. We propose here to use a recurrent neural network architecture b… ▽ More Unlike traditional time series, the action sequences of human decision making usually involve many cognitive processes such as beliefs, desires, intentions, and theory of mind, i.e., what others are thinking. This makes predicting human decision-making challenging to be treated agnostically to the underlying psychological mechanisms. We propose here to use a recurrent neural network architecture based on long short-term memory networks (LSTM) to predict the time series of the actions taken by human subjects engaged in gaming activity, the first application of such methods in this research domain. In this study, we collate the human data from 8 published literature of the Iterated Prisoner's Dilemma comprising 168,386 individual decisions and post-process them into 8,257 behavioral trajectories of 9 actions each for both players. Similarly, we collate 617 trajectories of 95 actions from 10 different published studies of Iowa Gambling Task experiments with healthy human subjects. We train our prediction networks on the behavioral data and demonstrate a clear advantage over the state-of-the-art methods in predicting human decision-making trajectories in both the single-agent scenario of the Iowa Gambling Task and the multi-agent scenario of the Iterated Prisoner's Dilemma. Moreover, we observe that the weights of the LSTM networks modeling the top performers tend to have a wider distribution compared to poor performers, as well as a larger bias, which suggest possible interpretations for the distribution of strategies adopted by each group. △ Less

Submitted 20 April, 2022; v1 submitted 21 October, 2020; originally announced October 2020.

Comments: To appear in PLOS ONE. Codes at https://github.com/doerlbh/HumanLSTM

Journal ref: PLOS ONE 17(5): e0267907 (2022)

arXiv:2010.09473 [pdf, other]

Double-Linear Thompson Sampling for Context-Attentive Bandits

Authors: Djallel Bouneffouf, Raphaël Féraud, Sohini Upadhyay, Yasaman Khazaeni, Irina Rish

Abstract: In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration;however, the agent has a freedom to choose which variables to observe. We de… ▽ More In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration;however, the agent has a freedom to choose which variables to observe. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating advantages of the proposed approach over several baseline methods on a variety of real-life datasets △ Less

Submitted 15 October, 2020; originally announced October 2020.

Comments: arXiv admin note: text overlap with arXiv:1906.09384

arXiv:2009.13714 [pdf, other]

Learning to Generate Image Source-Agnostic Universal Adversarial Perturbations

Authors: Pu Zhao, Parikshit Ram, Songtao Lu, Yuguang Yao, Djallel Bouneffouf, Xue Lin, Sijia Liu

Abstract: Adversarial perturbations are critical for certifying the robustness of deep learning models. A universal adversarial perturbation (UAP) can simultaneously attack multiple images, and thus offers a more unified threat model, obviating an image-wise attack algorithm. However, the existing UAP generator is underdeveloped when images are drawn from different image sources (e.g., with different image… ▽ More Adversarial perturbations are critical for certifying the robustness of deep learning models. A universal adversarial perturbation (UAP) can simultaneously attack multiple images, and thus offers a more unified threat model, obviating an image-wise attack algorithm. However, the existing UAP generator is underdeveloped when images are drawn from different image sources (e.g., with different image resolutions). Towards an authentic universality across image sources, we take a novel view of UAP generation as a customized instance of few-shot learning, which leverages bilevel optimization and learning-to-optimize (L2O) techniques for UAP generation with improved attack success rate (ASR). We begin by considering the popular model agnostic meta-learning (MAML) framework to meta-learn a UAP generator. However, we see that the MAML framework does not directly offer the universal attack across image sources, requiring us to integrate it with another meta-learning framework of L2O. The resulting scheme for meta-learning a UAP generator (i) has better performance (50% higher ASR) than baselines such as Projected Gradient Descent, (ii) has better performance (37% faster) than the vanilla L2O and MAML frameworks (when applicable), and (iii) is able to simultaneously handle UAP generation for different victim models and image data sources. △ Less

Submitted 17 August, 2022; v1 submitted 28 September, 2020; originally announced September 2020.

arXiv:2007.11967 [pdf, other]

Computing the Dirichlet-Multinomial Log-Likelihood Function

Authors: Djallel Bouneffouf

Abstract: Dirichlet-multinomial (DMN) distribution is commonly used to model over-dispersion in count data. Precise and fast numerical computation of the DMN log-likelihood function is important for performing statistical inference using this distribution, and remains a challenge. To address this, we use mathematical properties of the gamma function to derive a closed form expression for the DMN log-likelih… ▽ More Dirichlet-multinomial (DMN) distribution is commonly used to model over-dispersion in count data. Precise and fast numerical computation of the DMN log-likelihood function is important for performing statistical inference using this distribution, and remains a challenge. To address this, we use mathematical properties of the gamma function to derive a closed form expression for the DMN log-likelihood function. Compared to existing methods, calculation of the closed form has a lower computational complexity, hence is much faster without comprimising computational accuracy. △ Less

Submitted 17 July, 2020; originally announced July 2020.

arXiv:2007.11416 [pdf, ps, other]

Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling

Authors: Djallel Bouneffouf

Abstract: Spectral clustering has shown a superior performance in analyzing the cluster structure. However, its computational complexity limits its application in analyzing large-scale data. To address this problem, many low-rank matrix approximating algorithms are proposed, including the Nystrom method - an approach with proven approximate error bounds. There are several algorithms that provide recipes to… ▽ More Spectral clustering has shown a superior performance in analyzing the cluster structure. However, its computational complexity limits its application in analyzing large-scale data. To address this problem, many low-rank matrix approximating algorithms are proposed, including the Nystrom method - an approach with proven approximate error bounds. There are several algorithms that provide recipes to construct Nystrom approximations with variable accuracies and computing times. This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a heuristic on when to use it. Our heuristic depends on the eigen spectrum shape of the dataset, and yields competitive low-rank approximations in test datasets compared to the other state-of-the-art methods △ Less

Submitted 21 July, 2020; originally announced July 2020.

arXiv:2007.06368 [pdf, other]

Contextual Bandit with Missing Rewards

Authors: Djallel Bouneffouf, Sohini Upadhyay, Yasaman Khazaeni

Abstract: We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the reward associated with each context-based decision may not always be observed("missing rewards"). This new problem is motivated by certain online settings including clinical trial and ad recommendation applications. In order to addre… ▽ More We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the reward associated with each context-based decision may not always be observed("missing rewards"). This new problem is motivated by certain online settings including clinical trial and ad recommendation applications. In order to address the missing rewards setting, we propose to combine the standard contextual bandit approach with an unsupervised learning mechanism such as clustering. Unlike standard contextual bandit methods, by leveraging clustering to estimate missing reward, we are able to learn from each incoming event, even those with missing rewards. Promising empirical results are obtained on several real-life datasets. △ Less

Submitted 18 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

arXiv:2006.15194 [pdf, other]

Online learning with Corrupted context: Corrupted Contextual Bandits

Authors: Djallel Bouneffouf

Abstract: We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the context used at each decision may be corrupted ("useless context"). This new problem is motivated by certain on-line settings including clinical trial and ad recommendation applications. In order to address the corrupted-context sett… ▽ More We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the context used at each decision may be corrupted ("useless context"). This new problem is motivated by certain on-line settings including clinical trial and ad recommendation applications. In order to address the corrupted-context setting,we propose to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism. Unlike standard contextual bandit methods, we are able to learn from all iteration, even those with corrupted context, by improving the computing of the expectation for each arm. Promising empirical results are obtained on several real-life datasets. △ Less

Submitted 26 June, 2020; originally announced June 2020.

arXiv:2006.09635 [pdf, other]

Solving Constrained CASH Problems with ADMM

Authors: Parikshit Ram, Sijia Liu, Deepak Vijaykeerthi, Dakuo Wang, Djallel Bouneffouf, Greg Bramble, Horst Samulowitz, Alexander G. Gray

Abstract: The CASH problem has been widely studied in the context of automated configurations of machine learning (ML) pipelines and various solvers and toolkits are available. However, CASH solvers do not directly handle black-box constraints such as fairness, robustness or other domain-specific custom constraints. We present our recent approach [Liu, et al., 2020] that leverages the ADMM optimization fram… ▽ More The CASH problem has been widely studied in the context of automated configurations of machine learning (ML) pipelines and various solvers and toolkits are available. However, CASH solvers do not directly handle black-box constraints such as fairness, robustness or other domain-specific custom constraints. We present our recent approach [Liu, et al., 2020] that leverages the ADMM optimization framework to decompose CASH into multiple small problems and demonstrate how ADMM facilitates incorporation of black-box constraints. △ Less

Submitted 10 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: 7th ICML Workshop on Automated Machine Learning (2020)

arXiv:2006.06580 [pdf, other]

Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

Authors: Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi

Abstract: As an important psychological and social experiment, the Iterated Prisoner's Dilemma (IPD) treats the choice to cooperate or defect as an atomic action. We propose to study the behaviors of online learning algorithms in the Iterated Prisoner's Dilemma (IPD) game, where we investigate the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits and reinforcement learn… ▽ More As an important psychological and social experiment, the Iterated Prisoner's Dilemma (IPD) treats the choice to cooperate or defect as an atomic action. We propose to study the behaviors of online learning algorithms in the Iterated Prisoner's Dilemma (IPD) game, where we investigate the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits and reinforcement learning. We evaluate them based on a tournament of iterated prisoner's dilemma where multiple agents can compete in a sequential fashion. This allows us to analyze the dynamics of policies learned by multiple self-interested independent reward-driven agents, and also allows us study the capacity of these algorithms to fit the human behaviors. Results suggest that considering the current situation to make decision is the worst in this kind of social dilemma game. Multiples discoveries on online learning behaviors and clinical validations are stated, as an effort to connect artificial intelligence algorithms with human behaviors and their abnormal states in neuropsychiatric conditions. △ Less

Submitted 26 August, 2022; v1 submitted 9 June, 2020; originally announced June 2020.

Comments: Proceeding of PRICAI 2022. To the best of our knowledge, this is the first attempt to explore the full spectrum of reinforcement learning agents (multi-armed bandits, contextual bandits and reinforcement learning) in the sequential social dilemma. Codes at https://github.com/doerlbh/dilemmaRL

arXiv:2005.04544 [pdf, other]

Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL

Authors: Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf, Jenna Reinen, Irina Rish

Abstract: Artificial behavioral agents are often evaluated based on their consistent behaviors and performance to take sequential actions in an environment to maximize some notion of cumulative reward. However, human decision making in real life usually involves different strategies and behavioral trajectories that lead to the same empirical outcome. Motivated by clinical literature of a wide range of neuro… ▽ More Artificial behavioral agents are often evaluated based on their consistent behaviors and performance to take sequential actions in an environment to maximize some notion of cumulative reward. However, human decision making in real life usually involves different strategies and behavioral trajectories that lead to the same empirical outcome. Motivated by clinical literature of a wide range of neurological and psychiatric disorders, we propose here a more general and flexible parametric framework for sequential decision making that involves a two-stream reward processing mechanism. We demonstrated that this framework is flexible and unified enough to incorporate a family of problems spanning multi-armed bandits (MAB), contextual bandits (CB) and reinforcement learning (RL), which decompose the sequential decision making process in different levels. Inspired by the known reward processing abnormalities of many mental disorders, our clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting. △ Less

Submitted 27 December, 2021; v1 submitted 9 May, 2020; originally announced May 2020.

Comments: Proceeding of HBAI 2020. This article supersedes and extends our work arXiv:1706.02897 (MAB) and arXiv:1906.11286 (RL) into the Contextual Bandit (CB) framework. It generalized extensively into multi-armed bandits, contextual bandits and RL settings to create a unified framework of human behavioral agents

Journal ref: In Human Brain and Artificial Intelligence (pp. 14-33). Springer 2021

arXiv:2005.02209 [pdf, ps, other]

Hyper-parameter Tuning for the Contextual Bandit

Authors: Djallel Bouneffouf, Emmanuelle Claeys

Abstract: We study here the problem of learning the exploration exploitation trade-off in the contextual bandit problem with linear reward function setting. In the traditional algorithms that solve the contextual bandit problem, the exploration is a parameter that is tuned by the user. However, our proposed algorithm learn to choose the right exploration parameters in an online manner based on the observed… ▽ More We study here the problem of learning the exploration exploitation trade-off in the contextual bandit problem with linear reward function setting. In the traditional algorithms that solve the contextual bandit problem, the exploration is a parameter that is tuned by the user. However, our proposed algorithm learn to choose the right exploration parameters in an online manner based on the observed context, and the immediate reward received for the chosen action. We have presented here two algorithms that uses a bandit to find the optimal exploration of the contextual bandit algorithm, which we hope is the first step toward the automation of the multi-armed bandit algorithm. △ Less

Submitted 4 May, 2020; originally announced May 2020.

Comments: arXiv admin note: text overlap with arXiv:1705.03821

arXiv:1910.14436 [pdf, other]

How can AI Automate End-to-End Data Science?

Authors: Charu Aggarwal, Djallel Bouneffouf, Horst Samulowitz, Beat Buesser, Thanh Hoang, Udayan Khurana, Sijia Liu, Tejaswini Pedapati, Parikshit Ram, Ambrish Rawat, Martin Wistuba, Alexander Gray

Abstract: Data science is labor-intensive and human experts are scarce but heavily involved in every aspect of it. This makes data science time consuming and restricted to experts with the resulting quality heavily dependent on their experience and skills. To make data science more accessible and scalable, we need its democratization. Automated Data Science (AutoDS) is aimed towards that goal and is emergin… ▽ More Data science is labor-intensive and human experts are scarce but heavily involved in every aspect of it. This makes data science time consuming and restricted to experts with the resulting quality heavily dependent on their experience and skills. To make data science more accessible and scalable, we need its democratization. Automated Data Science (AutoDS) is aimed towards that goal and is emerging as an important research and business topic. We introduce and define the AutoDS challenge, followed by a proposal of a general AutoDS framework that covers existing approaches but also provides guidance for the development of new methods. We categorize and review the existing literature from multiple aspects of the problem setup and employed techniques. Then we provide several views on how AI could succeed in automating end-to-end AutoDS. We hope this survey can serve as insightful guideline for the AutoDS field and provide inspiration for future research. △ Less

Submitted 22 October, 2019; originally announced October 2019.

arXiv:1906.12350 [pdf, other]

Split Q Learning: Reinforcement Learning with Two-Stream Rewards

Authors: Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi

Abstract: Drawing an inspiration from behavioral studies of human decision making, we propose here a general parametric framework for a reinforcement learning problem, which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases… ▽ More Drawing an inspiration from behavioral studies of human decision making, we propose here a general parametric framework for a reinforcement learning problem, which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For AI community, the development of agents that react differently to different types of rewards can enable us to understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions and user preferences in long-term recommendation systems. △ Less

Submitted 12 November, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

Comments: IJCAI 2019. This article supersedes our work arXiv:1706.02897 into RL setting, with a different focus by applying Inverse Reinforcement Learning to model human clinical behavioral bias. It also precedes our work arXiv:1906.11286 which introduces extensive emphases in RL games

arXiv:1906.11286 [pdf, other]

A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry

Authors: Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf, Jenna Reinen, Irina Rish

Abstract: Drawing an inspiration from behavioral studies of human decision making, we propose here a more general and flexible parametric framework for reinforcement learning that extends standard Q-learning to a two-stream model for processing positive and negative rewards, and allows to incorporate a wide range of reward-processing biases -- an important component of human decision making which can help u… ▽ More Drawing an inspiration from behavioral studies of human decision making, we propose here a more general and flexible parametric framework for reinforcement learning that extends standard Q-learning to a two-stream model for processing positive and negative rewards, and allows to incorporate a wide range of reward-processing biases -- an important component of human decision making which can help us better understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems, as well as various neuropsychiatric conditions associated with disruptions in normal reward processing. From the computational perspective, we observe that the proposed Split-QL model and its clinically inspired variants consistently outperform standard Q-Learning and SARSA methods, as well as recently proposed Double Q-Learning approaches, on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the Pac-Man game in a lifelong learning setting across different reward stationarities. △ Less

Submitted 14 April, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

Comments: Published in AAMAS 2020 as a full paper. This article supersedes our work arXiv:1706.02897 into RL setting and extends extensively into RL games, cognitive modeling, and gambling tasks in lifelong learning setting

arXiv:1906.03979 [pdf, other]

Optimal Exploitation of Clustering and History Information in Multi-Armed Bandit

Authors: Djallel Bouneffouf, Srinivasan Parthasarathy, Horst Samulowitz, Martin Wistub

Abstract: We consider the stochastic multi-armed bandit problem and the contextual bandit problem with historical observations and pre-clustered arms. The historical observations can contain any number of instances for each arm, and the pre-clustering information is a fixed clustering of arms provided as part of the input. We develop a variety of algorithms which incorporate this offline information effecti… ▽ More We consider the stochastic multi-armed bandit problem and the contextual bandit problem with historical observations and pre-clustered arms. The historical observations can contain any number of instances for each arm, and the pre-clustering information is a fixed clustering of arms provided as part of the input. We develop a variety of algorithms which incorporate this offline information effectively during the online exploration phase and derive their regret bounds. In particular, we develop the META algorithm which effectively hedges between two other algorithms: one which uses both historical observations and clustering, and another which uses only the historical observations. The former outperforms the latter when the clustering quality is good, and vice-versa. Extensive experiments on synthetic and real world datasets on Warafin drug dosage and web server selection for latency minimization validate our theoretical insights and demonstrate that META is a robust strategy for optimally exploiting the pre-clustering information. △ Less

Submitted 31 May, 2019; originally announced June 2019.

Comments: IJCAI 2019, International Joint Conferences on Artificial Intelligence

arXiv:1905.00424 [pdf, other]

An ADMM Based Framework for AutoML Pipeline Configuration

Authors: Sijia Liu, Parikshit Ram, Deepak Vijaykeerthy, Djallel Bouneffouf, Gregory Bramble, Horst Samulowitz, Dakuo Wang, Andrew Conn, Alexander Gray

Abstract: We study the AutoML problem of automatically configuring machine learning pipelines by jointly selecting algorithms and their appropriate hyper-parameters for all steps in supervised learning pipelines. This black-box (gradient-free) optimization with mixed integer & continuous variables is a challenging problem. We propose a novel AutoML scheme by leveraging the alternating direction method of mu… ▽ More We study the AutoML problem of automatically configuring machine learning pipelines by jointly selecting algorithms and their appropriate hyper-parameters for all steps in supervised learning pipelines. This black-box (gradient-free) optimization with mixed integer & continuous variables is a challenging problem. We propose a novel AutoML scheme by leveraging the alternating direction method of multipliers (ADMM). The proposed framework is able to (i) decompose the optimization problem into easier sub-problems that have a reduced number of variables and circumvent the challenge of mixed variable categories, and (ii) incorporate black-box constraints along-side the black-box optimization objective. We empirically evaluate the flexibility (in utilizing existing AutoML techniques), effectiveness (against open source AutoML toolkits),and unique capability (of executing AutoML with practically motivated black-box constraints) of our proposed scheme on a collection of binary classification data sets from UCI ML& OpenML repositories. We observe that on an average our framework provides significant gains in comparison to other AutoML frameworks (Auto-sklearn & TPOT), highlighting the practical advantages of this framework. △ Less

Submitted 6 December, 2019; v1 submitted 1 May, 2019; originally announced May 2019.

Journal ref: published at AAAI 2020

arXiv:1904.10040 [pdf, ps, other]

A Survey on Practical Applications of Multi-Armed and Contextual Bandits

Authors: Djallel Bouneffouf, Irina Rish

Abstract: In recent years, multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar performance combined with certain attractive properties, such as learning from less feedback. The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms mot… ▽ More In recent years, multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar performance combined with certain attractive properties, such as learning from less feedback. The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize state-of-art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this exciting and fast-growing field. △ Less

Submitted 2 April, 2019; originally announced April 2019.

Comments: under review by IJCAI 2019 Survey

arXiv:1809.08343 [pdf, other]

Interpretable Multi-Objective Reinforcement Learning through Policy Orchestration

Authors: Ritesh Noothigattu, Djallel Bouneffouf, Nicholas Mattei, Rachita Chandra, Piyush Madan, Kush Varshney, Murray Campbell, Moninder Singh, Francesca Rossi

Abstract: Autonomous cyber-physical agents and systems play an increasingly large role in our lives. To ensure that agents behave in ways aligned with the values of the societies in which they operate, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. These constraints and norms can come f… ▽ More Autonomous cyber-physical agents and systems play an increasingly large role in our lives. To ensure that agents behave in ways aligned with the values of the societies in which they operate, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. These constraints and norms can come from any number of sources including regulations, business process guidelines, laws, ethical principles, social norms, and moral values. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations of the task, and reinforcement learning to learn to maximize the environment rewards. More precisely, we assume that an agent can observe traces of behavior of members of the society but has no access to the explicit set of constraints that give rise to the observed behavior. Inverse reinforcement learning is used to learn such constraints, that are then combined with a possibly orthogonal value function through the use of a contextual bandit-based orchestrator that picks a contextually-appropriate choice between the two policies (constraint-based and environment reward-based) when taking actions. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward maximizing or constrained policy. In addition, the orchestrator is transparent on which policy is being employed at each time step. We test our algorithms using a Pac-Man domain and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways. △ Less

Submitted 21 September, 2018; originally announced September 2018.

Comments: 8 pages, 3 figures

arXiv:1809.05720 [pdf, other]

Incorporating Behavioral Constraints in Online AI Systems

Authors: Avinash Balakrishnan, Djallel Bouneffouf, Nicholas Mattei, Francesca Rossi

Abstract: AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have significant impact on our daily life. However, in many cases the online rewards should not be the only guiding criteria, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online agent that… ▽ More AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have significant impact on our daily life. However, in many cases the online rewards should not be the only guiding criteria, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online agent that learns a set of behavioral constraints by observation and uses these learned constraints as a guide when making decisions in an online setting while still being reactive to reward feedback. To define this agent, we propose to adopt a novel extension to the classical contextual multi-armed bandit setting and we provide a new algorithm called Behavior Constrained Thompson Sampling (BCTS) that allows for online learning while obeying exogenous constraints. Our agent learns a constrained policy that implements the observed behavioral constraints demonstrated by a teacher agent, and then uses this constrained policy to guide the reward-based online exploration and exploitation. We characterize the upper bound on the expected regret of the contextual bandit algorithm that underlies our agent and provide a case study with real world data in two application domains. Our experiments show that the designed agent is able to act within the set of behavior constraints without significantly degrading its overall reward performance. △ Less

Submitted 15 September, 2018; originally announced September 2018.

Comments: 9 pages, 6 figures

arXiv:1806.09077 [pdf, other]

Beyond Backprop: Online Alternating Minimization with Auxiliary Variables

Authors: Anna Choromanska, Benjamin Cowen, Sadhana Kumaravel, Ronny Luss, Mattia Rigotti, Irina Rish, Brian Kingsbury, Paolo DiAchille, Viatcheslav Gurev, Ravi Tejwani, Djallel Bouneffouf

Abstract: Despite significant recent advances in deep neural networks, training them remains a challenge due to the highly non-convex nature of the objective function. State-of-the-art methods rely on error backpropagation, which suffers from several well-known issues, such as vanishing and exploding gradients, inability to handle non-differentiable nonlinearities and to parallelize weight-updates across la… ▽ More Despite significant recent advances in deep neural networks, training them remains a challenge due to the highly non-convex nature of the objective function. State-of-the-art methods rely on error backpropagation, which suffers from several well-known issues, such as vanishing and exploding gradients, inability to handle non-differentiable nonlinearities and to parallelize weight-updates across layers, and biological implausibility. These limitations continue to motivate exploration of alternative training algorithms, including several recently proposed auxiliary-variable methods which break the complex nested objective function into local subproblems. However, those techniques are mainly offline (batch), which limits their applicability to extremely large datasets, as well as to online, continual or reinforcement learning. The main contribution of our work is a novel online (stochastic/mini-batch) alternating minimization (AM) approach for training deep neural networks, together with the first theoretical convergence guarantees for AM in stochastic settings and promising empirical results on a variety of architectures and datasets. △ Less

Submitted 5 June, 2019; v1 submitted 23 June, 2018; originally announced June 2018.

Comments: First six authors contributed equally to this work: A.C. - theory, manuscript, B.C. - code, experiments, S.K. - code, experiments, R.L. - algorithm, experiments, M.R. - code, experiments, I.R. - algorithm, manuscript

arXiv:1802.00981 [pdf, other]

Contextual Bandit with Adaptive Feature Extraction

Authors: Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, Irina Rish

Abstract: We consider an online decision making setting known as contextual bandit problem, and propose an approach for improving contextual bandit performance by using an adaptive feature extraction (representation learning) based on online clustering. Our approach starts with an off-line pre-training on unlabeled history of contexts (which can be exploited by our approach, but not by the standard contextu… ▽ More We consider an online decision making setting known as contextual bandit problem, and propose an approach for improving contextual bandit performance by using an adaptive feature extraction (representation learning) based on online clustering. Our approach starts with an off-line pre-training on unlabeled history of contexts (which can be exploited by our approach, but not by the standard contextual bandit), followed by an online selection and adaptation of encoders. Specifically, given an input sample (context), the proposed approach selects the most appropriate encoding function to extract a feature vector which becomes an input for a contextual bandit, and updates both the bandit and the encoding function based on the context and on the feedback (reward). Our experiments on a variety of datasets, and both in stationary and non-stationary environments of several kinds demonstrate clear advantages of the proposed adaptive representation learning over the standard contextual bandit based on "raw" input contexts. △ Less

Submitted 14 September, 2020; v1 submitted 3 February, 2018; originally announced February 2018.

Comments: IEEE ICDMW 2018

arXiv:1711.06761 [pdf, other]

Scalable Recollections for Continual Lifelong Learning

Authors: Matthew Riemer, Tim Klinger, Djallel Bouneffouf, Michele Franceschini

Abstract: Given the recent success of Deep Learning applied to a variety of single tasks, it is natural to consider more human-realistic settings. Perhaps the most difficult of these settings is that of continual lifelong learning, where the model must learn online over a continuous stream of non-stationary data. A successful continual lifelong learning system must have three key capabilities: it must learn… ▽ More Given the recent success of Deep Learning applied to a variety of single tasks, it is natural to consider more human-realistic settings. Perhaps the most difficult of these settings is that of continual lifelong learning, where the model must learn online over a continuous stream of non-stationary data. A successful continual lifelong learning system must have three key capabilities: it must learn and adapt over time, it must not forget what it has learned, and it must be efficient in both training time and memory. Recent techniques have focused their efforts primarily on the first two capabilities while questions of efficiency remain largely unexplored. In this paper, we consider the problem of efficient and effective storage of experiences over very large time-frames. In particular we consider the case where typical experiences are O(n) bits and memories are limited to O(k) bits for k << n. We present a novel scalable architecture and training algorithm in this challenging domain and provide an extensive evaluation of its performance. Our results show that we can achieve considerable gains on top of state-of-the-art methods such as GEM. △ Less

Submitted 19 December, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

Comments: AAAI 2019

arXiv:1706.02897 [pdf, ps, other]

Bandit Models of Human Behavior: Reward Processing in Mental Disorders

Authors: Djallel Bouneffouf, Irina Rish, Guillermo A. Cecchi

Abstract: Drawing an inspiration from behavioral studies of human decision making, we propose here a general parametric framework for multi-armed bandit problem, which extends the standard Thompson Sampling approach to incorporate reward processing biases associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder… ▽ More Drawing an inspiration from behavioral studies of human decision making, we propose here a general parametric framework for multi-armed bandit problem, which extends the standard Thompson Sampling approach to incorporate reward processing biases associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. We demonstrate empirically that the proposed parametric approach can often outperform the baseline Thompson Sampling on a variety of datasets. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions. △ Less

Submitted 7 June, 2017; originally announced June 2017.

Comments: Conference on Artificial General Intelligence, AGI-17

arXiv:1705.03821 [pdf, other]

Context Attentive Bandits: Contextual Bandit with Restricted Context

Authors: Djallel Bouneffouf, Irina Rish, Guillermo A. Cecchi, Raphael Feraud

Abstract: We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every iteration. This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeling. Herein, we adapt the standard multi-armed band… ▽ More We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every iteration. This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeling. Herein, we adapt the standard multi-armed bandit algorithm known as Thompson Sampling to take advantage of our restricted context setting, and propose two novel algorithms, called the Thompson Sampling with Restricted Context(TSRC) and the Windows Thompson Sampling with Restricted Context(WTSRC), for handling stationary and nonstationary environments, respectively. Our empirical results demonstrate advantages of the proposed approaches on several real-life datasets △ Less

Submitted 7 June, 2017; v1 submitted 10 May, 2017; originally announced May 2017.

Comments: IJCAI 2017

arXiv:1508.07091 [pdf, other]

Multi-armed Bandit Problem with Known Trend

Authors: Djallel Bouneffouf, Raphaël Feraud

Abstract: We consider a variant of the multi-armed bandit model, which we call multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution. This new problem is motivated by different online problems like active learning, music and interface recommendation applications, where when an arm is sampled by the model the received reward… ▽ More We consider a variant of the multi-armed bandit model, which we call multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution. This new problem is motivated by different online problems like active learning, music and interface recommendation applications, where when an arm is sampled by the model the received reward change according to a known trend. By adapting the standard multi-armed bandit algorithm UCB1 to take advantage of this setting, we propose the new algorithm named A-UCB that assumes a stochastic model. We provide upper bounds of the regret which compare favourably with the ones of UCB1. We also confirm that experimentally with different simulations △ Less

Submitted 10 May, 2017; v1 submitted 28 August, 2015; originally announced August 2015.

Comments: Neurocomputing 2016. arXiv admin note: text overlap with arXiv:0805.3415 by other authors

ACM Class: I.2

arXiv:1409.8572 [pdf, other]

Freshness-Aware Thompson Sampling

Authors: Djallel Bouneffouf

Abstract: To follow the dynamicity of the user's content, researchers have recently started to model interactions between users and the Context-Aware Recommender Systems (CARS) as a bandit problem where the system needs to deal with exploration and exploitation dilemma. In this sense, we propose to study the freshness of the user's content in CARS through the bandit problem. We introduce in this paper an al… ▽ More To follow the dynamicity of the user's content, researchers have recently started to model interactions between users and the Context-Aware Recommender Systems (CARS) as a bandit problem where the system needs to deal with exploration and exploitation dilemma. In this sense, we propose to study the freshness of the user's content in CARS through the bandit problem. We introduce in this paper an algorithm named Freshness-Aware Thompson Sampling (FA-TS) that manages the recommendation of fresh document according to the user's risk of the situation. The intensive evaluation and the detailed analysis of the experimental results reveals several important discoveries in the exploration/exploitation (exr/exp) behaviour. △ Less

Submitted 29 September, 2014; originally announced September 2014.

Comments: 21st International Conference on Neural Information Processing. arXiv admin note: text overlap with arXiv:1409.7729

ACM Class: I.2

arXiv:1409.8191 [pdf, other]

A Neural Networks Committee for the Contextual Bandit Problem

Authors: Robin Allesiardo, Raphael Feraud, Djallel Bouneffouf

Abstract: This paper presents a new contextual bandit algorithm, NeuralBandit, which does not need hypothesis on stationarity of contexts and rewards. Several neural networks are trained to modelize the value of rewards knowing the context. Two variants, based on multi-experts approach, are proposed to choose online the parameters of multi-layer perceptrons. The proposed algorithms are successfully tested o… ▽ More This paper presents a new contextual bandit algorithm, NeuralBandit, which does not need hypothesis on stationarity of contexts and rewards. Several neural networks are trained to modelize the value of rewards knowing the context. Two variants, based on multi-experts approach, are proposed to choose online the parameters of multi-layer perceptrons. The proposed algorithms are successfully tested on a large dataset with and without stationarity of rewards. △ Less

Submitted 29 September, 2014; originally announced September 2014.

Comments: 21st International Conference on Neural Information Processing

ACM Class: I.2

arXiv:1409.7729 [pdf, other]

Context-Based Information Retrieval in Risky Environment

Authors: Djallel Bouneffouf

Abstract: Context-Based Information Retrieval is recently modelled as an exploration/ exploitation trade-off (exr/exp) problem, where the system has to choose between maximizing its expected rewards dealing with its current knowledge (exploitation) and learning more about the unknown user's preferences to improve its knowledge (exploration). This problem has been addressed by the reinforcement learning comm… ▽ More Context-Based Information Retrieval is recently modelled as an exploration/ exploitation trade-off (exr/exp) problem, where the system has to choose between maximizing its expected rewards dealing with its current knowledge (exploitation) and learning more about the unknown user's preferences to improve its knowledge (exploration). This problem has been addressed by the reinforcement learning community but they do not consider the risk level of the current user's situation, where it may be dangerous to explore the non-top-ranked documents the user may not desire in his/her current situation if the risk level is high. We introduce in this paper an algorithm named CBIR-R-greedy that considers the risk level of the user's situation to adaptively balance between exr and exp. △ Less

Submitted 30 July, 2014; originally announced September 2014.

Comments: arXiv admin note: substantial text overlap with arXiv:1408.2195

ACM Class: I.2

arXiv:1408.2196 [pdf, other]

Exponentiated Gradient Exploration for Active Learning

Authors: Djallel Bouneffouf

Abstract: Active learning strategies respond to the costly labelling task in a supervised classification by selecting the most useful unlabelled examples in training a predictive model. Many conventional active learning algorithms focus on refining the decision boundary, rather than exploring new regions that can be more informative. In this setting, we propose a sequential algorithm named EG-Active that ca… ▽ More Active learning strategies respond to the costly labelling task in a supervised classification by selecting the most useful unlabelled examples in training a predictive model. Many conventional active learning algorithms focus on refining the decision boundary, rather than exploring new regions that can be more informative. In this setting, we propose a sequential algorithm named EG-Active that can improve any Active learning algorithm by an optimal random exploration. Experimental results show a statistically significant and appreciable improvement in the performance of our new approach over the existing active feedback methods. △ Less

Submitted 10 August, 2014; originally announced August 2014.

ACM Class: I.2

Showing 1–50 of 65 results for author: Bouneffouf, D