Computer Science > Computer Vision and Pattern Recognition

arXiv:2402.10093 (cs)

[Submitted on 15 Feb 2024 (v1), last revised 3 Jun 2024 (this version, v2)]

Title: MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations

Authors: Benedikt Alkin, Lukas Miklautz, Sepp Hochreiter, Johannes Brandstetter

Abstract: We introduce MIM-Refiner (MIM: Masked Image Modeling), a contrastive learning boost for pre-trained MIM models. MIM-Refiner is motivated by the insight that the strongest representations within MIM models generally reside in intermediate layers rather than in the final layer. Accordingly, MIM-Refiner attaches multiple contrastive heads to different intermediate layers. In each head, a modified nearest neighbor objective constructs semantic clusters that capture semantic information, which improves performance on downstream tasks in both off-the-shelf and fine-tuning settings.
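The abstract does not say how the nearest neighbor objective is modified, so the sketch below substitutes a plain NNCLR-style nearest-neighbor InfoNCE loss to illustrate the general idea of one contrastive head per intermediate layer. It is not the authors' implementation: the linear projection heads, the layer indices, the support queue, the temperature, and averaging the per-head losses are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Project each row onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def nn_contrastive_loss(z_a, z_b, queue, temperature=0.1):
    """NNCLR-style loss (stand-in, one direction only): replace each view-A
    embedding by its nearest neighbor in a support queue, then score that
    neighbor against all view-B embeddings with InfoNCE."""
    z_a, z_b, queue = l2_normalize(z_a), l2_normalize(z_b), l2_normalize(queue)
    # Index of the most similar queue entry for every sample in view A.
    nn_idx = np.argmax(z_a @ queue.T, axis=1)
    nn = queue[nn_idx]
    # logits[i, j]: similarity between the neighbor of sample i and view-B sample j.
    logits = nn @ z_b.T / temperature
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal (same image, other augmented view).
    return float(-np.mean(np.diag(log_probs)))


def multi_head_loss(feats_a, feats_b, heads, queues, temperature=0.1):
    """Average the nearest-neighbor loss over one head per intermediate layer
    (hypothetical: each head is a single linear projection matrix)."""
    losses = []
    for layer, W in heads.items():
        z_a = feats_a[layer] @ W
        z_b = feats_b[layer] @ W
        losses.append(nn_contrastive_loss(z_a, z_b, queues[layer], temperature))
    return float(np.mean(losses))


# Toy usage with random features standing in for intermediate ViT block outputs.
rng = np.random.default_rng(0)
batch, width, proj_dim, queue_size = 8, 32, 16, 64
layers = [16, 20, 24]  # hypothetical intermediate block indices
feats_a = {l: rng.normal(size=(batch, width)) for l in layers}
feats_b = {l: feats_a[l] + 0.05 * rng.normal(size=(batch, width)) for l in layers}
heads = {l: rng.normal(size=(width, proj_dim)) for l in layers}
queues = {l: rng.normal(size=(queue_size, proj_dim)) for l in layers}
loss = multi_head_loss(feats_a, feats_b, heads, queues)
print(f"mean loss over {len(layers)} heads: {loss:.4f}")
```

Using the nearest neighbor from the queue as the anchor, instead of the sample's own other view, is what pulls semantically similar images together; in a real training loop the queue would be filled with embeddings from previous batches rather than random vectors.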
The refinement process is short and simple, yet highly effective. Within a few epochs, we refine the off-the-shelf features of MIM models from subpar to state-of-the-art. Refining a ViT-H pre-trained with data2vec 2.0 on ImageNet-1K sets a new state-of-the-art in linear probing (84.7%) and low-shot classification among models pre-trained on ImageNet-1K. In ImageNet-1K 1-shot classification, MIM-Refiner advances the state-of-the-art to 64.2%, outperforming larger models trained on up to 2000 times more data, such as DINOv2-g, OpenCLIP-G and MAWS-6.5B.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2402.10093 [cs.CV]
	(or arXiv:2402.10093v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.10093

Submission history

From: Benedikt Alkin
[v1] Thu, 15 Feb 2024 16:46:16 UTC (844 KB)
[v2] Mon, 3 Jun 2024 17:51:58 UTC (872 KB)
