Showing 1–10 of 10 results for author: Thewlis, J

  1. arXiv:2309.10783  [pdf, other]

    cs.CV cs.AI cs.CL

    Language as the Medium: Multimodal Video Classification through text only

    Authors: Laura Hanu, Anita L. Verő, James Thewlis

    Abstract: Despite an exciting new wave of multimodal machine learning models, current approaches still struggle to interpret the complex contextual relationships between the different modalities present in videos. Going beyond existing methods that emphasize simple activities or objects, we propose a new model-agnostic approach for generating detailed textual descriptions that captures multimodal video info…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted at "What is Next in Multimodal Foundation Models?" (MMFM) workshop at ICCV 2023

  2. arXiv:2210.10820  [pdf, other]

    cs.CV cs.CL cs.IR cs.LG

    VTC: Improving Video-Text Retrieval with User Comments

    Authors: Laura Hanu, James Thewlis, Yuki M. Asano, Christian Rupprecht

    Abstract: Multi-modal retrieval is an important problem for many applications, such as recommendation and search. Current benchmarks and even datasets are often manually constructed and consist of mostly clean samples where all modalities are well-correlated with the content. Thus, current video-text retrieval literature largely focuses on video titles or audio transcripts, while ignoring user comments, sin…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted paper at the European Conference on Computer Vision (ECCV) 2022

  3. arXiv:1908.06427  [pdf, other]

    cs.CV

    Unsupervised Learning of Landmarks by Descriptor Vector Exchange

    Authors: James Thewlis, Samuel Albanie, Hakan Bilen, Andrea Vedaldi

    Abstract: Equivariance to random image transformations is an effective method to learn landmarks of object categories, such as the eyes and the nose in faces, without manual supervision. However, this method does not explicitly guarantee that the learned landmarks are consistent with changes between different instances of the same object, such as different facial identities. In this paper, we develop a new…

    Submitted 18 August, 2019; originally announced August 2019.

    Comments: ICCV 2019

  4. arXiv:1906.05706  [pdf, other]

    cs.CV

    Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues

    Authors: Natalia Neverova, James Thewlis, Rıza Alp Güler, Iasonas Kokkinos, Andrea Vedaldi

    Abstract: DensePose supersedes traditional landmark detectors by densely mapping image pixels to body surface coordinates. This power, however, comes at a greatly increased annotation time, as supervising the model requires manually labelling hundreds of points per pose instance. In this work, we thus seek methods to significantly slim down the DensePose annotations, proposing more efficient data collection…

    Submitted 13 June, 2019; originally announced June 2019.

    Comments: CVPR 2019

  5. arXiv:1904.01114  [pdf, other]

    cs.CV

    Deep Industrial Espionage

    Authors: Samuel Albanie, James Thewlis, Sebastien Ehrhardt, Joao Henriques

    Abstract: The theory of deep learning is now considered largely solved, and is well understood by researchers and influencers alike. To maintain our relevance, we therefore seek to apply our skills to under-explored, lucrative applications of this technology. To this end, we propose Deep Industrial Espionage, an efficient end-to-end framework for industrial information propagation and productisation. Sp…

    Submitted 1 April, 2019; originally announced April 2019.

  6. arXiv:1807.05636  [pdf, other]

    cs.CV cs.LG cs.NE

    Cross Pixel Optical Flow Similarity for Self-Supervised Learning

    Authors: Aravindh Mahendran, James Thewlis, Andrea Vedaldi

    Abstract: We propose a novel method for learning convolutional neural image representations without manual supervision. We use motion cues in the form of optical flow, to supervise representations of static images. The obvious approach of training a network to predict flow from a single image can be needlessly difficult due to intrinsic ambiguities in this prediction task. We instead propose a much simpler…

    Submitted 15 July, 2018; originally announced July 2018.

    MSC Class: 68T45

  7. arXiv:1803.11560  [pdf, other]

    cs.LG

    Substitute Teacher Networks: Learning with Almost No Supervision

    Authors: Samuel Albanie, James Thewlis, Joao F. Henriques

    Abstract: Learning through experience is time-consuming, inefficient and often bad for your cortisol levels. To address this problem, a number of recently proposed teacher-student methods have demonstrated the benefits of private tuition, in which a single model learns from an ensemble of more experienced tutors. Unfortunately, the cost of such supervision restricts good representations to a privileged mino…

    Submitted 1 April, 2018; originally announced March 2018.

    Comments: Published as a conference paper at SIGBOVIK 2018

  8. arXiv:1706.02932  [pdf, other]

    cs.CV stat.ML

    Unsupervised learning of object frames by dense equivariant image labelling

    Authors: James Thewlis, Hakan Bilen, Andrea Vedaldi

    Abstract: One of the key challenges of visual perception is to extract abstract models of 3D objects and object categories from visual measurements, which are affected by complex nuisance factors such as viewpoint, occlusion, motion, and deformations. Starting from the recent idea of viewpoint factorization, we propose a new approach that, given a large number of images of an object and no other supervision…

    Submitted 17 November, 2017; v1 submitted 9 June, 2017; originally announced June 2017.

    Comments: NIPS 2017

  9. arXiv:1705.02193  [pdf, other]

    cs.CV stat.ML

    Unsupervised learning of object landmarks by factorized spatial embeddings

    Authors: James Thewlis, Hakan Bilen, Andrea Vedaldi

    Abstract: Learning automatically the structure of object categories remains an important open problem in computer vision. In this paper, we propose a novel unsupervised approach that can discover and learn landmarks in object categories, thus characterizing their structure. Our approach is based on factorizing image deformations, as induced by a viewpoint change or an object deformation, by learning a deep…

    Submitted 6 August, 2017; v1 submitted 5 May, 2017; originally announced May 2017.

    Comments: To be published in ICCV 2017

  10. arXiv:1609.03532  [pdf, other]

    cs.CV

    Fully-Trainable Deep Matching

    Authors: James Thewlis, Shuai Zheng, Philip H. S. Torr, Andrea Vedaldi

    Abstract: Deep Matching (DM) is a popular high-quality method for quasi-dense image matching. Despite its name, however, the original DM formulation does not yield a deep neural network that can be trained end-to-end via backpropagation. In this paper, we remove this limitation by rewriting the complete DM algorithm as a convolutional neural network. This results in a novel deep architecture for image match…

    Submitted 12 September, 2016; originally announced September 2016.

    Comments: British Machine Vision Conference (BMVC) 2016