Showing 1–14 of 14 results for author: Stan, G

  1. arXiv:2406.01843  [pdf, other]

    cs.CV

    L-MAGIC: Language Model Assisted Generation of Images with Coherence

    Authors: Zhipeng Cai, Matthias Mueller, Reiner Birkl, Diana Wofk, Shao-Yen Tseng, JunDa Cheng, Gabriela Ben-Melech Stan, Vasudev Lal, Michael Paulitsch

    Abstract: In the current era of generative AI breakthroughs, generating panoramic scenes from a single input image remains a key challenge. Most existing methods use diffusion-based iterative or simultaneous multi-view inpainting. However, the lack of global scene layout priors leads to subpar outputs with duplicated objects (e.g., multiple beds in a bedroom) or requires time-consuming human text inputs for…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024

  2. arXiv:2404.03118  [pdf, other]

    cs.CV

    LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models

    Authors: Gabriela Ben Melech Stan, Estelle Aflalo, Raanan Yehezkel Rohekar, Anahita Bhiwandiwalla, Shao-Yen Tseng, Matthew Lyle Olson, Yaniv Gurwicz, Chenfei Wu, Nan Duan, Vasudev Lal

    Abstract: In the rapidly evolving landscape of artificial intelligence, multi-modal large language models are emerging as a significant area of interest. These models, which combine various forms of data input, are becoming increasingly popular. However, understanding their internal mechanisms remains a complex task. Numerous advancements have been made in the field of explainability tools and mechanisms, y…

    Submitted 24 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  3. arXiv:2404.01197  [pdf, other]

    cs.CV

    Getting it Right: Improving Spatial Consistency in Text-to-Image Models

    Authors: Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang

    Abstract: One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt. In this paper, we offer a comprehensive investigation of this limitation, while also developing datasets and methods that achieve state-of-the-art performance. First, we find that current vision-language…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Project webpage: https://spright-t2i.github.io/

  4. arXiv:2402.13033  [pdf, other]

    cs.LG cs.IR cs.SI

    Enhancing Real-World Complex Network Representations with Hyperedge Augmentation

    Authors: Xiangyu Zhao, Zehui Li, Mingzhu Shen, Guy-Bart Stan, Pietro Liò, Yiren Zhao

    Abstract: Graph augmentation methods play a crucial role in improving the performance and enhancing generalisation capabilities in Graph Neural Networks (GNNs). Existing graph augmentation methods mainly perturb the graph structures and are usually limited to pairwise node relations. These methods cannot fully address the complexities of real-world large-scale networks that often involve higher-order node r…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Preprint. Under review. 17 pages, 4 figures, 14 tables. arXiv admin note: text overlap with arXiv:2306.05108

  5. arXiv:2402.06079  [pdf, other]

    q-bio.GN cs.AI cs.LG

    DiscDiff: Latent Diffusion Model for DNA Sequence Generation

    Authors: Zehui Li, Yuhao Ni, William A V Beardall, Guoxuan Xia, Akashaditya Das, Guy-Bart Stan, Yiren Zhao

    Abstract: This paper introduces a novel framework for DNA sequence generation, comprising two key components: DiscDiff, a Latent Diffusion Model (LDM) tailored for generating discrete DNA sequences, and Absorb-Escape, a post-training algorithm designed to refine these sequences. Absorb-Escape enhances the realism of the generated sequences by correcting `round errors' inherent in the conversion process betw…

    Submitted 17 April, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Different from the prior work "Latent Diffusion Model for DNA Sequence Generation" (arXiv:2310.06150), we updated the evaluation framework and compared the DiscDiff with other methods comprehensively. In addition, a post-training framework is proposed to increase the quality of generated sequences

  6. arXiv:2311.03226  [pdf, other]

    cs.CV cs.AI

    LDM3D-VR: Latent Diffusion Model for 3D VR

    Authors: Gabriela Ben Melech Stan, Diana Wofk, Estelle Aflalo, Shao-Yen Tseng, Zhipeng Cai, Michael Paulitsch, Vasudev Lal

    Abstract: Latent diffusion models have proven to be state-of-the-art in the creation and manipulation of visual outputs. However, as far as we know, the generation of depth maps jointly with RGB is still limited. We introduce LDM3D-VR, a suite of diffusion models targeting virtual reality development that includes LDM3D-pano and LDM3D-SR. These models enable the generation of panoramic RGBD based on textual…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted to Workshop on Diffusion Models, NeurIPS 2023

  7. arXiv:2310.06150  [pdf, other]

    cs.LG

    Latent Diffusion Model for DNA Sequence Generation

    Authors: Zehui Li, Yuhao Ni, Tim August B. Huygelen, Akashaditya Das, Guoxuan Xia, Guy-Bart Stan, Yiren Zhao

    Abstract: The harnessing of machine learning, especially deep generative models, has opened up promising avenues in the field of synthetic DNA sequence generation. Whilst Generative Adversarial Networks (GANs) have gained traction for this application, they often face issues such as limited sample diversity and mode collapse. On the other hand, Diffusion Models are a promising new class of generative models…

    Submitted 24 December, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: 2023 Conference on Neural Information Processing Systems (NeurIPS 2023) AI for Science Workshop

  8. arXiv:2306.05143  [pdf, other]

    cs.LG q-bio.GN

    Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer

    Authors: Zehui Li, Akashaditya Das, William A V Beardall, Yiren Zhao, Guy-Bart Stan

    Abstract: Given the increasing volume and quality of genomics data, extracting new insights requires interpretable machine-learning models. This work presents Genomic Interpreter: a novel architecture for genomic assay prediction. This model outperforms the state-of-the-art models for genomic assay prediction tasks. Our model can identify hierarchical dependencies in genomic sites. This is achieved through…

    Submitted 28 June, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: 40th International Conference on Machine Learning (ICML 2023) Workshop on Computational Biology (WCB)

  9. arXiv:2306.05108  [pdf, other]

    cs.LG cs.SI

    Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs

    Authors: Zehui Li, Xiangyu Zhao, Mingzhu Shen, Guy-Bart Stan, Pietro Liò, Yiren Zhao

    Abstract: Graphs are widely used to encapsulate a variety of data formats, but real-world networks often involve complex node relations beyond only being pairwise. While hypergraphs and hierarchical graphs have been developed and employed to account for the complex node relations, they cannot fully represent these complexities in practice. Additionally, though many Graph Neural Networks (GNNs) have been pro…

    Submitted 20 February, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: 16 pages, 5 figures, 11 tables

  10. arXiv:2305.10853  [pdf, other]

    cs.CV

    LDM3D: Latent Diffusion Model for 3D

    Authors: Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, Vasudev Lal

    Abstract: This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts. The LDM3D model is fine-tuned on a dataset of tuples containing an RGB image, depth map and caption, and validated through extensive experiments. We also develop an application called DepthFusion, which…

    Submitted 21 May, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

  11. arXiv:2208.11553  [pdf, other]

    cs.CV

    MuMUR : Multilingual Multimodal Universal Retrieval

    Authors: Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shachar Rosenman, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal

    Abstract: Multi-modal retrieval has seen tremendous progress with the development of vision-language models. However, further improving these models require additional labelled data which is a huge manual effort. In this paper, we propose a framework MuMUR, that utilizes knowledge transfer from a multilingual model to boost the performance of multi-modal (image and video) retrieval. We first use state-of-th…

    Submitted 19 September, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

    Comments: This is an extension of the previous MKTVR paper (for which you can find a reference here: https://dl.acm.org/doi/abs/10.1007/978-3-031-28244-7_42 or in a previous version on arXiv). This version was published in the Information Retrieval Journal

  12. arXiv:1809.00409  [pdf, other]

    physics.soc-ph cs.SI math.DS

    Global Network Prediction from Local Node Dynamics

    Authors: Neave O'Clery, Ye Yuan, Guy-Bart Stan, Mauricio Barahona

    Abstract: The study of dynamical systems on networks, describing complex interactive processes, provides insight into how network structure affects global behaviour. Yet many methods for network dynamics fail to cope with large or partially-known networks, a ubiquitous situation in real-world applications. Here we propose a localised method, applicable to a broad class of dynamical models on networks, where…

    Submitted 2 September, 2018; originally announced September 2018.

  13. arXiv:1403.7429  [pdf, other]

    math.OC cs.DC cs.LG eess.SY

    Distributed Reconstruction of Nonlinear Networks: An ADMM Approach

    Authors: Wei Pan, Aivar Sootla, Guy-Bart Stan

    Abstract: In this paper, we present a distributed algorithm for the reconstruction of large-scale nonlinear networks. In particular, we focus on the identification from time-series data of the nonlinear functional forms and associated parameters of large-scale nonlinear networks. Recently, a nonlinear network reconstruction problem was formulated as a nonconvex optimisation problem based on the combination…

    Submitted 28 March, 2014; originally announced March 2014.

    Comments: To appear in the Preprints of 19th IFAC World Congress 2014

  14. arXiv:1303.3183  [pdf, ps, other]

    eess.SY cs.CE cs.LG q-bio.MN

    Toggling a Genetic Switch Using Reinforcement Learning

    Authors: Aivar Sootla, Natalja Strelkowa, Damien Ernst, Mauricio Barahona, Guy-Bart Stan

    Abstract: In this paper, we consider the problem of optimal exogenous control of gene regulatory networks. Our approach consists in adapting an established reinforcement learning algorithm called the fitted Q iteration. This algorithm infers the control law directly from the measurements of the system's response to external control inputs without the use of a mathematical model of the system. The measuremen…

    Submitted 25 February, 2015; v1 submitted 12 March, 2013; originally announced March 2013.

    Comments: 12 pages, presented at the 9th French Meeting on Planning, Decision Making and Learning, Liège (Belgium), May 12-13, 2014