Showing 1–25 of 25 results for author: Kalogeiton, V

  1. arXiv:2407.01516  [pdf, other]

    cs.CV

    E.T. the Exceptional Trajectories: Text-to-camera-trajectory generation with character awareness

    Authors: Robin Courant, Nicolas Dufour, Xi Wang, Marc Christie, Vicky Kalogeiton

    Abstract: Stories and emotions in movies emerge through the effect of well-thought-out directing decisions, in particular camera placement and movement over time. Crafting compelling camera trajectories remains a complex iterative process, even for skilful artists. To tackle this, in this paper, we propose a dataset called the Exceptional Trajectories (E.T.) with camera trajectories along with character inf…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project page: https://www.lix.polytechnique.fr/vista/projects/2024_et_courant/

  2. arXiv:2406.10221  [pdf, other]

    cs.CV cs.AI cs.CL

    Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding

    Authors: Ridouane Ghermi, Xi Wang, Vicky Kalogeiton, Ivan Laptev

    Abstract: Recent advances in vision-language models have significantly propelled video understanding. Existing datasets and tasks, however, have notable limitations. Most datasets are confined to short videos with limited events and narrow narratives. For example, datasets with instructional and egocentric videos often document the activities of one person in a single scene. Although some movie datasets off…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2405.20324  [pdf, other]

    cs.CV cs.LG

    Don't drop your samples! Coherence-aware training benefits Conditional diffusion

    Authors: Nicolas Dufour, Victor Besnier, Vicky Kalogeiton, David Picard

    Abstract: Conditional diffusion models are powerful generative models that can leverage various types of conditional information, such as class labels, segmentation masks, or text captions. However, in many real-world scenarios, conditional information may be noisy or unreliable due to human annotation errors or weak alignment. In this paper, we propose the Coherence-Aware Diffusion (CAD), a novel method th…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPR 2024 as a Highlight. Project page: https://nicolas-dufour.github.io/cad.html

  4. arXiv:2404.13040  [pdf, other]

    cs.CV cs.LG

    Analysis of Classifier-Free Guidance Weight Schedulers

    Authors: Xi Wang, Nicolas Dufour, Nefeli Andreou, Marie-Paule Cani, Victoria Fernandez Abrevaya, David Picard, Vicky Kalogeiton

    Abstract: Classifier-Free Guidance (CFG) enhances the quality and condition adherence of text-to-image diffusion models. It operates by combining the conditional and unconditional predictions using a fixed weight. However, recent works vary the weights throughout the diffusion process, reporting superior results but without providing any rationale or analysis. By conducting comprehensive experiments, this p…

    Submitted 19 April, 2024; originally announced April 2024.

  5. arXiv:2401.04210  [pdf, other]

    cs.CV cs.AI cs.CL cs.MM cs.SD eess.AS

    FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild

    Authors: Zhi-Song Liu, Robin Courant, Vicky Kalogeiton

    Abstract: Automatically understanding funny moments (i.e., the moments that make people laugh) when watching comedy is challenging, as they relate to various features, such as body language, dialogues and culture. In this paper, we propose FunnyNet-W, a model that relies on cross- and self-attention for visual, audio and text data to predict funny moments in videos. Unlike most methods that rely on ground t…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: 22 pages, 14 figures

  6. arXiv:2312.09788  [pdf, other]

    cs.CV cs.AI cs.LG

    Collaborating Foundation Models for Domain Generalized Semantic Segmentation

    Authors: Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière

    Abstract: Domain Generalized Semantic Segmentation (DGSS) deals with training a model on a labeled source domain with the aim of generalizing to unseen domains during inference. Existing DGSS methods typically effectuate robust features by means of Domain Randomization (DR). Such an approach is often limited as it can only account for style diversification and not content. In this work, we take an orthogona…

    Submitted 29 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: https://github.com/yasserben/CLOUDS ; Accepted to CVPR 2024

  7. arXiv:2311.04414  [pdf, other]

    cs.CV

    Learning the What and How of Annotation in Video Object Segmentation

    Authors: Thanos Delatolas, Vicky Kalogeiton, Dim P. Papadopoulos

    Abstract: Video Object Segmentation (VOS) is crucial for several applications, from video editing to video data generation. Training a VOS model requires an abundance of manually labeled training videos. The de-facto traditional way of annotating objects requires humans to draw detailed segmentation masks on the target objects at each video frame. This annotation process, however, is tedious and time-consum…

    Submitted 11 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted to WACV 2024

  8. Reward Function Design for Crowd Simulation via Reinforcement Learning

    Authors: Ariel Kwiatkowski, Vicky Kalogeiton, Julien Pettré, Marie-Paule Cani

    Abstract: Crowd simulation is important for video game design, since it makes it possible to populate virtual worlds with autonomous avatars that navigate in a human-like manner. Reinforcement learning has shown great potential in simulating virtual crowds, but the design of the reward function is critical to achieving effective and efficient results. In this work, we explore the design of reward functions for reinf…

    Submitted 22 September, 2023; originally announced September 2023.

  9. arXiv:2309.03933  [pdf, other]

    cs.CV

    BluNF: Blueprint Neural Field

    Authors: Robin Courant, Xi Wang, Marc Christie, Vicky Kalogeiton

    Abstract: Neural Radiance Fields (NeRFs) have revolutionized scene novel view synthesis, offering visually realistic, precise, and robust implicit reconstructions. While recent approaches enable NeRF editing, such as object removal, 3D shape modification, or material property manipulation, the manual annotation prior to such edits makes the process tedious. Additionally, traditional 2D interaction tools lac…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: ICCV-W (AI3DCC) 2023. Project page with videos and code: https://www.lix.polytechnique.fr/vista/projects/2023_iccvw_courant/

  10. arXiv:2303.18080  [pdf, other]

    cs.CV

    One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models

    Authors: Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière

    Abstract: Adapting a segmentation model from a labeled source domain to a target domain, where a single unlabeled datum is available, is one of the most challenging problems in domain adaptation and is otherwise known as one-shot unsupervised domain adaptation (OSUDA). Most of the prior works have addressed the problem by relying on style transfer techniques, where the source images are stylized to have the ap…

    Submitted 16 June, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Workshop on Generative Models for Computer Vision (CVPR-W 2023)

  11. arXiv:2303.12445  [pdf, other]

    cs.CV cs.AI

    MEDIMP: 3D Medical Images with clinical Prompts from limited tabular data for renal transplantation

    Authors: Leo Milecki, Vicky Kalogeiton, Sylvain Bodard, Dany Anglicheau, Jean-Michel Correas, Marc-Olivier Timsit, Maria Vakalopoulou

    Abstract: Renal transplantation emerges as the most effective solution for end-stage renal disease. Occurring from complex causes, a substantial risk of transplant chronic dysfunction persists and may lead to graft loss. Medical imaging plays a substantial role in renal transplant monitoring in clinical practice. However, graft supervision is multi-disciplinary, notably joining nephrology, urology, and radi…

    Submitted 29 April, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

  12. arXiv:2303.12068  [pdf, other]

    cs.CV

    Machine Learning for Brain Disorders: Transformers and Visual Transformers

    Authors: Robin Courant, Maika Edberg, Nicolas Dufour, Vicky Kalogeiton

    Abstract: Transformers were initially introduced for natural language processing (NLP) tasks, but they were quickly adopted by most deep learning fields, including computer vision. They measure the relationships between pairs of input tokens (words in the case of text strings, parts of images for visual Transformers), termed attention. The cost is quadratic in the number of tokens. For image classification…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: To appear in O. Colliot (Ed.), Machine Learning for Brain Disorders, Springer

  13. arXiv:2302.05740  [pdf, other]

    cs.LG cs.AI

    UGAE: A Novel Approach to Non-exponential Discounting

    Authors: Ariel Kwiatkowski, Vicky Kalogeiton, Julien Pettré, Marie-Paule Cani

    Abstract: The discounting mechanism in Reinforcement Learning determines the relative importance of future and present rewards. While exponential discounting is widely used in practice, non-exponential discounting methods that align with human behavior are often desirable for creating human-like agents. However, non-exponential discounting methods cannot be directly applied in modern on-policy actor-critic…

    Submitted 11 February, 2023; originally announced February 2023.

  14. arXiv:2210.04883  [pdf, other]

    cs.CV cs.AI cs.LG

    SCAM! Transferring humans between images with Semantic Cross Attention Modulation

    Authors: Nicolas Dufour, David Picard, Vicky Kalogeiton

    Abstract: A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer that consists in not only transferring the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse info…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted at ECCV 2022

  15. arXiv:2209.09344  [pdf, other]

    cs.LG cs.AI cs.GR

    Understanding reinforcement learned crowds

    Authors: Ariel Kwiatkowski, Vicky Kalogeiton, Julien Pettré, Marie-Paule Cani

    Abstract: Simulating trajectories of virtual crowds is a commonly encountered task in Computer Graphics. Several recent works have applied Reinforcement Learning methods to animate virtual agents; however, they often make different design choices when it comes to the fundamental simulation setup. Each of these choices comes with a reasonable justification for its use, so it is not obvious what is their real…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: Accepted for publication at MIG 2022

    MSC Class: 68Q32 ACM Class: I.2.6; I.3.8

  16. A Survey on Reinforcement Learning Methods in Character Animation

    Authors: Ariel Kwiatkowski, Eduardo Alvarado, Vicky Kalogeiton, C. Karen Liu, Julien Pettré, Michiel van de Panne, Marie-Paule Cani

    Abstract: Reinforcement Learning is an area of Machine Learning focused on how agents can be trained to make sequential decisions, and achieve a particular goal within an arbitrary environment. While learning, they repeatedly take actions based on their observation of the environment, and receive appropriate rewards which define the objective. This experience is then used to progressively improve the policy…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 27 pages, 6 figures, Eurographics STAR, Computer Graphics Forum

  17. arXiv:2202.13562  [pdf, other]

    cs.CV eess.IV

    Name Your Style: An Arbitrary Artist-aware Image Style Transfer

    Authors: Zhi-Song Liu, Li-Wen Wang, Wan-Chi Siu, Vicky Kalogeiton

    Abstract: Image style transfer has attracted widespread attention in the past few years. Despite its remarkable results, it requires additional style images available as references, making it less flexible and less convenient. Using text is the most natural way to describe the style. More importantly, text can describe implicit abstract styles, like styles of specific artists or art movements. In this paper, w…

    Submitted 4 March, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: 15 pages, 15 figures

  18. arXiv:2110.07375  [pdf, other]

    cs.CV eess.IV

    Multiple Style Transfer via Variational AutoEncoder

    Authors: Zhi-Song Liu, Vicky Kalogeiton, Marie-Paule Cani

    Abstract: Modern works on style transfer focus on transferring style from a single image. Recently, some approaches study multiple style transfer; these, however, are either too slow or fail to mix multiple styles. We propose ST-VAE, a Variational AutoEncoder for latent space-based style transfer. It performs multiple style transfer by projecting nonlinear styles to a linear latent space, enabling to merge…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: 5 pages, 4 figures

  19. arXiv:2105.09939  [pdf, other]

    cs.CV

    Face, Body, Voice: Video Person-Clustering with Multiple Modalities

    Authors: Andrew Brown, Vicky Kalogeiton, Andrew Zisserman

    Abstract: The objective of this work is person-clustering in videos -- grouping characters according to their identity. Previous methods focus on the narrower task of face-clustering, and for the most part ignore other cues such as the person's voice, their overall appearance (hair, clothes, posture), and the editing structure of the videos. Similarly, most current datasets evaluate only the task of face-cl…

    Submitted 20 May, 2021; originally announced May 2021.

  20. LAEO-Net++: revisiting people Looking At Each Other in videos

    Authors: Manuel J. Marin-Jimenez, Vicky Kalogeiton, Pablo Medina-Suarez, Andrew Zisserman

    Abstract: Capturing the 'mutual gaze' of people is essential for understanding and interpreting the social interactions between them. To this end, this paper addresses the problem of detecting people Looking At Each Other (LAEO) in video sequences. For this purpose, we propose LAEO-Net++, a new deep CNN for determining LAEO in videos. In contrast to previous works, LAEO-Net++ takes spatio-temporal tracks as…

    Submitted 6 January, 2021; originally announced January 2021.

    Comments: 16 pages, 16 Figures. arXiv admin note: substantial text overlap with arXiv:1906.05261

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

  21. arXiv:2007.12163  [pdf, other]

    cs.CV

    Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval

    Authors: Andrew Brown, Weidi Xie, Vicky Kalogeiton, Andrew Zisserman

    Abstract: Optimising a ranking-based metric, such as Average Precision (AP), is notoriously challenging due to the fact that it is non-differentiable, and hence cannot be optimised directly using gradient-descent methods. To this end, we introduce an objective that optimises instead a smoothed approximation of AP, coined Smooth-AP. Smooth-AP is a plug-and-play objective function that allows for end-to-end t…

    Submitted 8 September, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: Accepted at ECCV 2020

  22. arXiv:1906.05261  [pdf, other]

    cs.CV

    LAEO-Net: revisiting people Looking At Each Other in videos

    Authors: Manuel J. Marin-Jimenez, Vicky Kalogeiton, Pablo Medina-Suarez, Andrew Zisserman

    Abstract: Capturing the 'mutual gaze' of people is essential for understanding and interpreting the social interactions between them. To this end, this paper addresses the problem of detecting people Looking At Each Other (LAEO) in video sequences. For this purpose, we propose LAEO-Net, a new deep CNN for determining LAEO in videos. In contrast to previous works, LAEO-Net takes spatio-temporal tracks as inp…

    Submitted 12 June, 2019; originally announced June 2019.

    Comments: CVPR 2019

  23. arXiv:1705.01861  [pdf, other]

    cs.CV

    Action Tubelet Detector for Spatio-Temporal Action Localization

    Authors: Vicky Kalogeiton, Philippe Weinzaepfel, Vittorio Ferrari, Cordelia Schmid

    Abstract: Current state-of-the-art approaches for spatio-temporal action localization rely on detections at the frame level that are then linked or tracked across time. In this paper, we leverage the temporal continuity of videos instead of operating at the frame level. We propose the ACtion Tubelet detector (ACT-detector) that takes as input a sequence of frames and outputs tubelets, i.e., sequences of bou…

    Submitted 21 August, 2017; v1 submitted 4 May, 2017; originally announced May 2017.

    Comments: 9 pages

  24. arXiv:1604.07803  [pdf, other]

    cs.ET

    Programmable Crossbar Quantum-dot Cellular Automata Circuits

    Authors: Vicky S. Kalogeiton, Dim P. Papadopoulos, Orestis Liolis, Vassilios A. Mardiris, Georgios Ch. Sirakoulis, Ioannis G. Karafyllidis

    Abstract: Quantum-dot fabrication and characterization is a well-established technology, which is used in photonics, quantum optics and nanoelectronics. Four quantum-dots placed at the corners of a square form a unit cell, which can hold a bit of information and serve as a basis for Quantum-dot Cellular Automata (QCA) nanoelectronic circuits. Although several basic QCA circuits have been designed, fabricate…

    Submitted 26 April, 2016; originally announced April 2016.

  25. arXiv:1501.01186  [pdf, other]

    cs.CV

    Analysing domain shift factors between videos and images for object detection

    Authors: Vicky Kalogeiton, Vittorio Ferrari, Cordelia Schmid

    Abstract: Object detection is one of the most important challenges in computer vision. Object detectors are usually trained on bounding-boxes from still images. Recently, video has been used as an alternative source of data. Yet, for a given test domain (image or video), the performance of the detector depends on the domain it was trained on. In this paper, we examine the reasons behind this performance gap…

    Submitted 27 January, 2016; v1 submitted 6 January, 2015; originally announced January 2015.

    Comments: 8 pages