DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Jia, Yuru; Hoyer, Lukas; Huang, Shengyu; Wang, Tianfu; Van Gool, Luc; Schindler, Konrad; Obukhov, Anton

arXiv:2312.03048 (cs)

[Submitted on 5 Dec 2023 (v1), last revised 31 Jul 2024 (this version, v3)]

Abstract: Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content, specialize to user data through few-shot fine-tuning, and condition their output on other modalities, such as semantic maps. However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation? We investigate this question in the context of autonomous driving, and answer it with a resounding "yes". We propose an efficient data generation pipeline termed DGInStyle. First, we examine the problem of specializing a pretrained LDM to semantically-controlled generation within a narrow domain. Second, we propose a Style Swap technique to endow the rich generative prior with the learned semantic control. Third, we design a Multi-resolution Latent Fusion technique to overcome the bias of LDMs towards dominant objects. Using DGInStyle, we generate a diverse dataset of street scenes, train a domain-agnostic semantic segmentation model on it, and evaluate the model on multiple popular autonomous driving datasets. Our approach consistently increases the performance of several domain generalization methods compared to the previous state-of-the-art methods. The source code and the generated dataset are available at this https URL.

Comments: ECCV 2024, camera ready
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2312.03048 [cs.CV] (or arXiv:2312.03048v3 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2312.03048

Submission history

From: Yuru Jia
[v1] Tue, 5 Dec 2023 18:34:12 UTC (9,869 KB)
[v2] Mon, 8 Apr 2024 08:59:24 UTC (12,271 KB)
[v3] Wed, 31 Jul 2024 13:02:51 UTC (41,022 KB)
