Opportunities and challenges of diffusion models for generative AI

doi:10.1093/nsr/nwae348

Review

Natl Sci Rev. 2024 Oct 3;11(12):nwae348.

doi: 10.1093/nsr/nwae348. eCollection 2024 Dec.

Minshuo Chen¹, Song Mei², Jianqing Fan³, Mengdi Wang¹

Affiliations

¹ Department of Electrical and Computer Engineering, Princeton University, Princeton 08544, USA.
² Department of Statistics, University of California, Berkeley, Berkeley 94720, USA.
³ Department of Operations Research and Financial Engineering, Princeton University, Princeton 08544, USA.

PMID: 39554240
PMCID: PMC11562846
DOI: 10.1093/nsr/nwae348

Minshuo Chen et al. Natl Sci Rev. 2024.

Abstract

Diffusion models, a powerful and universal generative artificial intelligence technology, have achieved tremendous success and opened up new possibilities in diverse applications. In these applications, diffusion models provide flexible high-dimensional data modeling, and act as a sampler for generating new samples under active control towards task-desired properties. Despite the significant empirical success, theoretical underpinnings of diffusion models are very limited, potentially slowing down principled methodological innovations for further harnessing and improving diffusion models. In this paper, we review emerging applications of diffusion models to highlight their sample generation capabilities under various control goals. At the same time, we dive into the unique working flow of diffusion models through the lens of stochastic processes. We identify theoretical challenges in analyzing diffusion models, owing to their complicated training procedure and interaction with the underlying data distribution. To address these challenges, we overview several promising advances, demonstrating diffusion models as an efficient distribution learner and a sampler. Furthermore, we introduce a new avenue in high-dimensional structured optimization through diffusion models, where searching for solutions is reformulated as a conditional sampling problem and solved by diffusion models. Lastly, we discuss future directions about diffusion models. The purpose of this paper is to provide a well-rounded exposure for stimulating forward-looking theories and methods of diffusion models.

Keywords: diffusion model; generative AI; optimization; sample generation under controls.

Figures

**Figure 1.**
Demonstration of forward and backward processes in diffusion models. The forward process is a noise corruption process, where Gaussian noise of increasing variance is progressively added to the clean data. The backward process is used for new sample generation starting from a standard Gaussian distribution, where the score function steers the generation process.

**Figure 2.**
Conditional diffusion models generate images under various guidance. The upper row demonstrates an alignment with text description consisting of multiple objects. The lower row demonstrates an abstract description of aesthetic quality. Reproduced with permission from Black *et al.* [32].

**Figure 3.**
Decision Diffuser and AdaptDiffuser from [59] and [61], respectively. Decision Diffuser is trained on offline-labeled trajectories and can generate new trajectories conditioned on desired reward values or skills. AdaptDiffuser introduces a self-evolution loop that uses high-quality trajectories selected by a trainable discriminator.

**Figure 4.**
U-Net architecture for resolution RGB images. When generating new samples using a discretized backward process, diffusion models utilize U-Net at each discretization step for transforming samples. The image sample together with a time embedding is first compressed into a low-dimensional representation and then lifted back to the original dimension. Reproduced with permission from Ronneberger *et al.* [70]. Copyright 2015 Springer.

**Figure 5.**
Simplified U-Net architecture for approximating score functions in the low-dimensional subspace data setting. Matrix V represents the linear encoder and decoder, which is learned jointly with the network parameters during the optimization of loss (12). The inner network has input and output dimensions equal to the subspace dimension. The score network is parameterized by V and the inner network's parameters. Reproduced with permission from Chen *et al.* [40].

**Figure 6.**
The effect of guidance strength on a three-component GMM in [47,107]. Each component has weight and identity covariance, and the component centers are , and . The leftmost panel displays the unguided density. We increase the guidance strength from left to right. When generating samples, we use the ground truth score. Reproduced with permission from Wu *et al.* [107].

**Figure 7.**
Illustration of a negative effect of large guidance strength. In this plot, the component means of the Gaussian mixture model lie on the same line. We increase the guidance strength from left to right. The upper row uses a relatively large discretization step size in the backward process; there, the center component splits into two clusters at an earlier stage. The bottom row uses a much smaller discretization step size; the center component then splits only at a much larger guidance strength. Reproduced with permission from Wu *et al.* [107].

**Figure 8.**
Reformulation of data-driven black-box optimization as conditional sampling in [67]. The conditional distribution takes the targeted function value as the conditioning and is learned from a pre-collected data set. Reproduced with permission from Li *et al.* [67].

**Figure 9.**
The learning algorithm proposed in [67] consists of four steps. In the first step, a reward model is learned from the labeled data. In the second step, the learned reward model is deployed as a pseudo-labeler to label the unlabeled data. In the third step, a conditional diffusion model is trained using the pseudo-labeled data. Lastly, in the fourth step, new samples are generated from the conditional distribution by specifying a target reward value a. Reproduced with permission from Li *et al.* [67].
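The forward and backward processes described in the Figure 1 caption can be illustrated with a minimal Python sketch. Everything specific in it is an assumption that does not come from this record: the Ornstein-Uhlenbeck forward SDE dX_t = -X_t/2 dt + dW_t, one-dimensional Gaussian data (chosen so that the score function has a closed form and does not have to be learned by a network), the Euler-Maruyama discretization, and all function names and parameter values.

```python
import numpy as np


def marginal_mean_var(t, mu, sigma2):
    """Mean and variance of X_t when X_0 ~ N(mu, sigma2) and dX = -X/2 dt + dW (assumed SDE)."""
    decay = np.exp(-t)
    return np.sqrt(decay) * mu, decay * sigma2 + (1.0 - decay)


def forward_sample(x0, t, rng):
    """Forward (noise corruption) process: draw X_t given clean samples x0."""
    decay = np.exp(-t)
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(decay) * x0 + np.sqrt(1.0 - decay) * noise


def score(x, t, mu, sigma2):
    """Exact score of the Gaussian marginal p_t; a trained network would replace this."""
    mean, var = marginal_mean_var(t, mu, sigma2)
    return -(x - mean) / var


def backward_sample(n, mu, sigma2, T=8.0, steps=800, seed=0):
    """Backward process: start from N(0, 1) and integrate the reverse-time SDE."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n)  # approximation of the terminal law p_T
    h = T / steps
    for k in range(steps):
        t = T - k * h  # corresponding forward time
        drift = 0.5 * y + score(y, t, mu, sigma2)  # reverse drift: -f + g^2 * score
        y = y + h * drift + np.sqrt(h) * rng.standard_normal(n)
    return y


if __name__ == "__main__":
    samples = backward_sample(20000, mu=2.0, sigma2=0.25)
    print("generated mean:", samples.mean(), "generated variance:", samples.var())
```

With the illustrative target N(2, 0.25), the generated samples should have mean and variance close to those of the target, up to discretization and Monte Carlo error.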

References

1. Bommasani R, Hudson DA, Adeli E et al. On the opportunities and risks of foundation models. arXiv: 2108.07258.
2. Giuffrè M, Shung DL. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. NPJ Digit Med 2023; 6: 186. doi: 10.1038/s41746-023-00927-3
3. Yang L, Zhang Z, Song Y et al. Diffusion models: a comprehensive survey of methods and applications. ACM Comput Surv 2024; 56: 105. doi: 10.1145/3626235
4. Stokel-Walker C, Van Noorden R. What ChatGPT and generative AI mean for science. Nature 2023; 614: 214–6.
5. Sohl-Dickstein J, Weiss E, Maheswaranathan N et al. Deep unsupervised learning using nonequilibrium thermodynamics. In: Proceedings of the 32nd International Conference on Machine Learning, Vol. 37. JMLR, 2015, 2256–65.
