Computer Science > Computer Vision and Pattern Recognition
[Submitted on 29 Jul 2021 (v1), last revised 3 Aug 2022 (this version, v3)]
Title:PPT Fusion: Pyramid Patch Transformerfor a Case Study in Image Fusion
View PDFAbstract:The Transformer architecture has witnessed a rapid development in recent years, outperforming the CNN architectures in many computer vision tasks, as exemplified by the Vision Transformers (ViT) for image classification. However, existing visual transformer models aim to extract semantic information for high-level tasks, such as classification and detection.These methods ignore the importance of the spatial resolution of the input image, thus sacrificing the local correlation information of neighboring pixels. In this paper, we propose a Patch Pyramid Transformer(PPT) to effectively address the above issues.Specifically, we first design a Patch Transformer to transform the image into a sequence of patches, where transformer encoding is performed for each patch to extract local representations. In addition, we construct a Pyramid Transformer to effectively extract the non-local information from the entire image. After obtaining a set of multi-scale, multi-dimensional, and multi-angle features of the original image, we design the image reconstruction network to ensure that the features can be reconstructed into the original input. To validate the effectiveness, we apply the proposed Patch Pyramid Transformer to image fusion tasks. The experimental results demonstrate its superior performance, compared to the state-of-the-art fusion approaches, achieving the best results on several evaluation indicators. Thanks to the underlying representational capacity of the PPT network, it can directly be applied to different image fusion tasks without redesigning or retraining the network.
Submission history
From: Yu Fu [view email][v1] Thu, 29 Jul 2021 13:57:45 UTC (5,206 KB)
[v2] Wed, 15 Dec 2021 01:48:51 UTC (6,842 KB)
[v3] Wed, 3 Aug 2022 02:06:32 UTC (6,763 KB)
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.