Generative prediction of flow field based on the diffusion model

Jiajun Hu¹, Zhen Lu¹ (corresponding author, zhen.lu@pku.edu.cn), Yue Yang¹,² (corresponding author, yyg@pku.edu.cn)

¹ State Key Laboratory for Turbulence and Complex Systems, College of Engineering, Peking University, Beijing 100871, China
² HEDPS-CAPT, Peking University, Beijing 100871, China

Abstract

We propose a geometry-to-flow diffusion model that takes the obstacle shape as input and predicts the flow field past the obstacle. The model is based on a learnable Markov transition kernel that recovers the data distribution from the Gaussian distribution. The Markov process is conditioned on the obstacle geometry, and a U-Net estimates the noise to be removed at each step. A cross-attention mechanism incorporates the geometry as a prompt. We train the geometry-to-flow diffusion model using a dataset of flows past simple obstacles, including the circle, ellipse, rectangle, and triangle. For comparison, a CNN model is trained using the same dataset. Tests are carried out on flows past obstacles with simple and complex geometries, representing interpolation and extrapolation on the geometry condition, respectively. In the test set, challenging scenarios include a cross and the characters ‘PKU’. Generated flow fields show that the geometry-to-flow diffusion model is superior to the CNN model in predicting instantaneous flow fields and handling complex geometries. Quantitative analysis of the model accuracy and of the divergence in the fields demonstrates the high robustness of the diffusion model, indicating that the diffusion model learns physical laws implicitly.

keywords:
machine learning, computational methods, vortex shedding

1 Introduction

Understanding the interaction between fluid and solid objects is crucial for optimizing various engineering systems (Shelley & Zhang, 2011). Computational fluid dynamics (CFD) plays a vital role in revealing the complex fluid-solid interactions (Tong et al., 2021) and enhancing design efficiency. However, CFD faces several challenges, including the expertise required to generate high-quality computational meshes, especially for complex geometries (Perot, 2011), and the substantial computational resources needed for high-fidelity simulations (Kern et al., 2024). These challenges have motivated the exploration of CFD-accelerating approaches.

Machine learning (ML) has emerged as a promising solution to address the limitations of traditional CFD methods (Kochkov et al., 2021). By leveraging data-driven models, ML has demonstrated the potential to significantly accelerate simulations (Vinuesa & Brunton, 2022). Various ML approaches have been successfully applied to CFD, including super-resolution of flow fields (Fukami et al., 2019), estimation of forces on a plate in flows (Tong et al., 2022), physics-informed neural networks (PINNs) for flow simulation (Chu et al., 2022), and velocity interpolation in multiphase flows (Hu et al., 2024). Despite these advancements, many existing ML methods struggle with extrapolation (Wu et al., 2020). The poor generalization of ML methods to out-of-distribution scenarios poses a challenge for their application in CFD (Fukami et al., 2024), where extrapolation on flow parameters and solid geometry is often expected.

Generative artificial intelligence has recently revolutionized many fields, particularly large language models (Biever, 2023) and image generation (Pan et al., 2023). Generative models aim to learn the underlying probability distribution of a given dataset. Once trained, they generate outputs by sampling from the distribution. They have been applied to scientific research, such as the material design (Fu et al., 2023) and active control (Kim et al., 2024). Examples in CFD include the super-resolution of turbulence field via a generative adversarial network (Kim et al., 2021) and flow field prediction using a variational autoencoder (Ranade et al., 2021). However, most existing works still focus on interpolation within the conditions listed in the training set, lacking further examination of out-of-distribution generalization.

The diffusion model (Yang et al., 2023) has emerged as a powerful probabilistic method, with predominant formulations including the denoising diffusion probabilistic model (DDPM) (Sohl-Dickstein et al., 2015) and score-based generative models (Song & Ermon, 2020). The diffusion model progressively perturbs the data distribution towards a Gaussian by injecting noise, then learns to reverse this process for sample generation. Shan et al. (2024) proposed a physics-informed diffusion model for super-resolution. Qiu et al. (2024) trained a physics-informed diffusion model to output the time evolution of flow past a cylinder and inside veins. The ability of the diffusion model to learn distributions and its potential for extrapolation make it a promising approach for accelerating CFD and exploring complex flow phenomena around bodies with diverse geometries.

In this work, we mainly focus on the two-dimensional (2D) flow past an obstacle. We develop a geometry-to-flow (G2F) diffusion model in which the obstacle geometry serves as a prompt to guide the gradual denoising process, generating the flow field around obstacles. The G2F diffusion model is trained with elementary geometries. Evaluations using simple and complex geometries validate the model in interpolation and extrapolation tasks. Our results demonstrate the model’s capability to generate flow fields for a wide range of obstacle shapes, including those outside the training distribution.

2 Geometry prompt to flow field output

2.1 DDPM

We use the obstacle geometry as a prompt to generate a flow past the obstacle as output through the DDPM. Figure 1 provides a brief overview of this G2F diffusion model. There are two diffusion processes in the DDPM – the forward diffusion perturbs data to noise, and the reverse diffusion converts noise back to data (Sohl-Dickstein et al., 2015). The DDPM learns a probabilistic representation of the underlying data, thereby facilitating the generation of high-quality samples that resemble the target distribution.

For data $\boldsymbol{z}_0$ sampled from a distribution $q(\boldsymbol{z}_0)$, the forward diffusion utilizes a Markov transition kernel

$$q\left(\boldsymbol{z}_t|\boldsymbol{z}_{t-1}\right)=\mathcal{N}\left(\boldsymbol{z}_t;\sqrt{1-\beta_t}\,\boldsymbol{z}_{t-1},\beta_t\boldsymbol{I}\right),\quad t=1,\dots,T \tag{1}$$

to inject noise, generating a sequence of noisy samples $\boldsymbol{z}_1,\dots,\boldsymbol{z}_T$ over a total of $T$ steps, where $\beta_t\in\left(0,1\right)$ is the diffusion rate, $\mathcal{N}$ denotes the Gaussian distribution, and $\boldsymbol{I}$ is the identity matrix. For $T\rightarrow\infty$, $q\left(\boldsymbol{z}_T\right)\approx\mathcal{N}\left(\boldsymbol{0},\boldsymbol{I}\right)$, so that $\boldsymbol{z}_T$ can be approximated as a Gaussian vector.
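As an illustration of the forward process in (1), the following minimal sketch applies the kernel step by step to a toy one-dimensional dataset. The constant schedule `betas`, the sample size, the starting value, and the function name `forward_diffusion` are assumptions chosen for this example only; they are not the settings used in this work.

```python
import numpy as np

def forward_diffusion(z0, betas, rng):
    """Apply the Markov kernel of Eq. (1) once per entry of betas; return z_T."""
    z = z0
    for beta in betas:
        noise = rng.standard_normal(z.shape)
        # z_t ~ N(sqrt(1 - beta_t) z_{t-1}, beta_t I)
        z = np.sqrt(1.0 - beta) * z + np.sqrt(beta) * noise
    return z

rng = np.random.default_rng(0)
betas = np.full(500, 0.02)      # hypothetical constant schedule (illustration)
z0 = np.full(20000, 3.0)        # toy "data": every sample starts at 3
z_T = forward_diffusion(z0, betas, rng)
print(z_T.mean(), z_T.std())
```

With these toy settings the final samples have a mean close to 0 and a standard deviation close to 1, irrespective of the starting value, which is the behaviour the limit $q(\boldsymbol{z}_T)\approx\mathcal{N}(\boldsymbol{0},\boldsymbol{I})$ describes.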

Figure 1: Schematic of the reverse diffusion process in the G2F diffusion model. It generates a sample by removing noise in a series of steps through a U-Net. The geometry prompt is injected into the U-Net via a cross-attention mechanism to control the generation process.

To generate samples $\boldsymbol{z}_0$, DDPM conducts a reverse diffusion process on the Gaussian vector $\boldsymbol{z}_T$ using a learnable Markov transition kernel

$$p_{\boldsymbol{\theta}}\left(\boldsymbol{z}_{t-1}|\boldsymbol{z}_t;\boldsymbol{c}\right)=\mathcal{N}\left[\boldsymbol{z}_{t-1};\boldsymbol{\mu}_{\boldsymbol{\theta}}\left(\boldsymbol{z}_t,t\right),\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\left(\boldsymbol{z}_t,t\right),\boldsymbol{c}\right], \tag{2}$$

where $\boldsymbol{\theta}$ denotes model parameters, $\boldsymbol{c}$ denotes the condition, and the mean $\boldsymbol{\mu}_{\boldsymbol{\theta}}$ and variance $\boldsymbol{\Sigma}_{\boldsymbol{\theta}}$ are parameterized by the neural network (NN). The parameters $\boldsymbol{\theta}$ are trained to make the reverse diffusion process a good approximation of the forward one, minimizing the Kullback-Leibler divergence (Kullback & Leibler, 1951) between $q\left(\boldsymbol{z}_0,\boldsymbol{z}_1,\dots,\boldsymbol{z}_T\right)$ and $p_{\boldsymbol{\theta}}\left(\boldsymbol{z}_0,\boldsymbol{z}_1,\dots,\boldsymbol{z}_T;\boldsymbol{c}\right)$. To optimize training, Ho et al. (2020) proposed a loss function based on the mean-square-error (MSE) of the predicted noise,

$$\mathcal{L}=\mathbb{E}_{t,\boldsymbol{z}_0,\boldsymbol{\epsilon}}\left[\left\|\boldsymbol{\epsilon}-\boldsymbol{\epsilon}_{\boldsymbol{\theta}}\left(\boldsymbol{z}_t,t;\boldsymbol{c}\right)\right\|^2\right], \tag{3}$$

where $\mathbb{E}$ denotes the expectation, and $\boldsymbol{\epsilon}_{\boldsymbol{\theta}}\left(\boldsymbol{z}_t,t;\boldsymbol{c}\right)$ is an NN predicting the noise vector $\boldsymbol{\epsilon}\sim\mathcal{N}\left(\boldsymbol{0},\boldsymbol{I}\right)$.
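A Monte Carlo estimate of the loss (3) can be sketched as follows. The sketch relies on the closed-form marginal of the forward process given by Ho et al. (2020), $\boldsymbol{z}_t=\sqrt{\bar{\alpha}_t}\,\boldsymbol{z}_0+\sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$ with $\bar{\alpha}_t=\prod_{s\le t}(1-\beta_s)$. The function name `ddpm_loss`, the batch shape, the schedule, and the zero-output `eps_model` are hypothetical placeholders; the latter merely stands in for the conditioned U-Net.

```python
import numpy as np

def ddpm_loss(z0, c, eps_model, betas, rng):
    """Monte Carlo estimate of Eq. (3) for a batch z0 with condition c."""
    alpha_bar = np.cumprod(1.0 - betas)
    batch = z0.shape[0]
    t = rng.integers(1, len(betas) + 1, size=batch)        # t uniform in {1, ..., T}
    eps = rng.standard_normal(z0.shape)                    # target noise
    ab = alpha_bar[t - 1].reshape(batch, *([1] * (z0.ndim - 1)))
    z_t = np.sqrt(ab) * z0 + np.sqrt(1.0 - ab) * eps       # closed-form noisy sample
    residual = eps - eps_model(z_t, t, c)
    # squared norm per sample, averaged over the batch
    return np.mean(np.sum(residual.reshape(batch, -1) ** 2, axis=1))

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 2e-2, 400)                       # illustrative schedule
z0 = rng.standard_normal((64, 4, 8, 8))                    # toy batch with 4 channels
zero_model = lambda z_t, t, c: np.zeros_like(z_t)          # placeholder predictor
loss = ddpm_loss(z0, None, zero_model, betas, rng)
```

For the placeholder predictor that always returns zero, the loss is close to the number of entries per sample (here 256), since $\mathbb{E}\|\boldsymbol{\epsilon}\|^2$ equals the dimension of $\boldsymbol{\epsilon}$; a trained network should fall well below this value.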

2.2 Geometry prompt

In the G2F diffusion model sketched in figure 1, we introduce the geometry prompt as the condition $\boldsymbol{c}$ in (2), and use a U-Net (Ronneberger et al., 2015) with a cross-attention mechanism (Rombach et al., 2022) to realize $\boldsymbol{\epsilon}_{\boldsymbol{\theta}}(\boldsymbol{z}_t,t;\boldsymbol{c})$.
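For readers unfamiliar with cross-attention, a generic scaled dot-product form (Vaswani et al., 2017) is sketched below: queries are computed from flow-feature tokens, while keys and values are computed from the encoded prompt. The token counts, channel widths, and random projection matrices are hypothetical and serve only to illustrate the operation; they do not reproduce the layers of the G2F U-Net.

```python
import numpy as np

def cross_attention(features, prompt, w_q, w_k, w_v):
    """Scaled dot-product attention with queries from features, keys/values from prompt."""
    q = features @ w_q                              # (n_feat, d)
    k = prompt @ w_k                                # (n_prompt, d)
    v = prompt @ w_v                                # (n_prompt, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True) # softmax over prompt tokens
    return weights @ v, weights

rng = np.random.default_rng(0)
features = rng.standard_normal((16, 32))            # 16 feature tokens, 32 channels
prompt = rng.standard_normal((10, 24))              # 10 prompt tokens, 24 channels
w_q = rng.standard_normal((32, 8))
w_k = rng.standard_normal((24, 8))
w_v = rng.standard_normal((24, 8))
out, weights = cross_attention(features, prompt, w_q, w_k, w_v)
```

Each feature token receives a weighted combination of the prompt values, so the prompt can influence every location of the feature map.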

The U-Net consists of a shrinking path, an expanding path, and some skip connections between the two. The shrinking path gradually compresses a low-dimensional, high-resolution field to a high-dimensional, low-resolution field. It contains five layers, with sizes of (64, 64, 256), (32, 32, 512), (16, 16, 1024), (8, 8, 1024), and (4, 4, 2048), followed by an average pool layer to obtain encoded data with size of (1, 1, 2048). The expanding path follows the reverse process.

To incorporate the geometry prompt, we first encode the geometry information into a vector of 0 and 1, representing the fluid and solid, respectively. The prompt is encoded by two NNs, having the same sizes as the last and the second-to-last layers in the expanding path. Outputs from these two NNs are then combined into the last and the second-to-last layers in the expanding path. The cross-attention (Rombach et al., 2022) in the last two layers balances the interpolation and extrapolation in our practice.
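A minimal sketch of such a binary encoding is given below for a circle of radius 1 centred at the origin. It assumes that the mask is sampled on the same $128\times128$ view-box grid as the flow fields in § 2.3 and that grid nodes coincide with the view-box edges; both are assumptions of this example, as are the helper names `geometry_mask` and `circle`.

```python
import numpy as np

def geometry_mask(inside, nx=128, ny=128, xlim=(-2.0, 10.0), ylim=(-4.0, 4.0)):
    """Return a (ny, nx) array with 1 at solid nodes and 0 at fluid nodes."""
    x = np.linspace(xlim[0], xlim[1], nx)
    y = np.linspace(ylim[0], ylim[1], ny)
    X, Y = np.meshgrid(x, y)                 # arrays of shape (ny, nx)
    return inside(X, Y).astype(np.float32)

def circle(X, Y, radius=1.0):
    """Indicator of a circular obstacle centred at the origin."""
    return X**2 + Y**2 <= radius**2

mask = geometry_mask(circle)
prompt = mask.reshape(-1)                    # flattened vector of 0 (fluid) and 1 (solid)
```

Other obstacles only require a different indicator function; the flattened vector `prompt` plays the role of the geometry vector described above.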

Note that DDPM generates results from random noise, so its results differ even with the same prompt. Here we focus on the geometry prompt, so each generated flow field is associated only with a geometry, without specifying a particular time.
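The dependence on the initial noise can be illustrated with the standard ancestral sampler of Ho et al. (2020), taking $\boldsymbol{\Sigma}_{\boldsymbol{\theta}}=\beta_t\boldsymbol{I}$; this variance choice is an assumption of the sketch. It uses $T=400$ and the linear $\beta_t$ schedule reported in § 2.3. The noise predictor `toy_eps_model` is a hypothetical stand-in for the trained U-Net: it is the analytic optimum when the data are themselves standard Gaussian, and it ignores the condition.

```python
import numpy as np

T = 400
betas = 1e-4 + 5e-5 * np.arange(1, T + 1)    # schedule quoted in the text
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def toy_eps_model(z, t, c):
    """Hypothetical stand-in for the U-Net; the condition c is ignored."""
    return np.sqrt(1.0 - alpha_bar[t - 1]) * z

def ddpm_sample(eps_model, c, shape, rng):
    """Ancestral sampling from z_T ~ N(0, I) down to z_0."""
    z = rng.standard_normal(shape)
    for t in range(T, 0, -1):
        eps_hat = eps_model(z, t, c)
        mean = (z - betas[t - 1] / np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat) / np.sqrt(alphas[t - 1])
        # no noise is added in the final step
        noise = rng.standard_normal(shape) if t > 1 else np.zeros(shape)
        z = mean + np.sqrt(betas[t - 1]) * noise
    return z

shape = (4, 32, 32)
sample_a = ddpm_sample(toy_eps_model, None, shape, np.random.default_rng(1))
sample_b = ddpm_sample(toy_eps_model, None, shape, np.random.default_rng(2))
sample_a_again = ddpm_sample(toy_eps_model, None, shape, np.random.default_rng(1))
```

Two different seeds give two different samples for the same condition, whereas repeating a seed reproduces the sample exactly.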

2.3 Datasets

We evaluate the G2F diffusion model based on 2D flows past various obstacles. The dataset includes a range of obstacle geometries, as illustrated in figure 2. The G2F diffusion model is trained using the flow field around simple geometries, including the circle, ellipse, rectangle, and triangle. They are often used as building blocks to construct complex geometries. To assess the model’s ability to extrapolate to different geometries, the test set includes flows around obstacles not present in the training set. This includes several cases with parallelogram obstacles. Furthermore, the model is tested on a cross and characters ‘PKU’ with convex shapes.

Figure 2: Datasets for evaluating the G2F diffusion model. Elementary obstacle geometries including the circle, ellipse, rectangle, and triangle are used for training; complex obstacle geometries including the parallelogram, cross, and characters ‘PKU’ are employed for testing. Each sample $\boldsymbol{z}_0=\left[u,v,p,\omega\right]$ includes velocity, pressure, and vorticity.

The geometry complexity is characterized by the roundness $r=4\pi S/C^2$, where $S$ and $C$ are the area and circumference, respectively. The roundness is bounded by $r=1$ for the circle and $r\rightarrow0$ for extremely complex geometry. The cross in the test set has $r=0.321$, smaller than all shapes in the training set, and the characters ‘PKU’ have $r\approx0$.
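For polygonal obstacles, the roundness can be evaluated directly from the vertices, as in the sketch below (shoelace formula for $S$, summed edge lengths for $C$). The unit square and the plus-shaped polygon are hypothetical examples; the plus shape does not have the dimensions of the cross in the test set, so its roundness differs from the value 0.321 quoted above.

```python
import numpy as np

def roundness(vertices):
    """Roundness r = 4*pi*S/C^2 of a simple polygon given its ordered vertices."""
    v = np.asarray(vertices, dtype=float)
    nxt = np.roll(v, -1, axis=0)                                   # next vertex of each edge
    area = 0.5 * abs(np.sum(v[:, 0] * nxt[:, 1] - nxt[:, 0] * v[:, 1]))
    perimeter = np.sum(np.linalg.norm(nxt - v, axis=1))
    return 4.0 * np.pi * area / perimeter**2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# plus shape built from five unit squares (hypothetical dimensions)
plus = [(1, 0), (2, 0), (2, 1), (3, 1), (3, 2), (2, 2),
        (2, 3), (1, 3), (1, 2), (0, 2), (0, 1), (1, 1)]
angles = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
near_circle = np.column_stack([np.cos(angles), np.sin(angles)])

r_square = roundness(square)
r_plus = roundness(plus)
r_circle = roundness(near_circle)
```

The square gives $r=\pi/4$ and the plus shape gives $r=5\pi/36$, while the 720-sided polygon approaches the upper bound $r=1$, consistent with roundness decreasing as the outline becomes more intricate.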

Focusing on the geometry condition, we evaluate the model using flow field data with the same free-stream velocity $U_\infty=1$, viscosity $\mu=1\times10^{-2}$, and density $\rho=1$. The circle radius is 1, and the characteristic length of the obstacles ranges from 1 (a rectangle or ellipse with its longitudinal axis aligned in the streamwise direction) to 4 (a rectangle or ellipse with its longitudinal axis aligned in the spanwise direction). Correspondingly, the Reynolds number $\mathit{Re}$ varies from 100 to 400. We solve the incompressible Navier-Stokes equations to obtain the flow field around various obstacles as the ground truth (GT) using OpenFOAM (Greenshields, 2022). Body-fitted meshes with 55000 cells are employed for a computational domain of $\left(-20,60\right)\times\left(-20,20\right)$, where the geometric centroid of the obstacle is positioned at $(x,y)=(0,0)$. Each flow-field snapshot has the velocity $(u,v)$, pressure $p$, and vorticity $\omega$. Note that we interpolate the flow field to a $128\times128$ uniform grid over a view box of $\left(-2,10\right)\times\left(-4,4\right)$ to obtain $\boldsymbol{z}_0=[u,v,p,\omega]$ for the model. For each case, 50 snapshots are sampled with a time step of $1$.
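The quoted Reynolds-number range follows from $\mathit{Re}=\rho U_\infty L/\mu$ with the characteristic length $L$ between 1 and 4, as the short check below shows. The function name `reynolds_number` is introduced here for illustration, and placing grid nodes on the view-box edges is an assumption of this sketch.

```python
import numpy as np

def reynolds_number(length, u_inf=1.0, rho=1.0, mu=1e-2):
    """Re = rho * U_inf * L / mu with the parameter values quoted in the text."""
    return rho * u_inf * length / mu

re_min = reynolds_number(1.0)       # shortest characteristic length
re_cyl = reynolds_number(2.0)       # circle of radius 1, i.e. diameter 2
re_max = reynolds_number(4.0)       # longest characteristic length

# uniform 128 x 128 view-box grid (nodes on the edges are an assumption)
x = np.linspace(-2.0, 10.0, 128)
y = np.linspace(-4.0, 4.0, 128)
dx, dy = x[1] - x[0], y[1] - y[0]
print(re_min, re_cyl, re_max, dx, dy)
```

The three values are 100, 200, and 400, and the grid spacing under this assumption is about 0.094 in $x$ and 0.063 in $y$.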

In the implementation, we empirically set $T=400$ and $\beta_t=1\times10^{-4}+5\times10^{-5}t$. The parameters $\boldsymbol{\theta}$ in the NN are optimized to minimize the loss function in (3). The parameters are initialized by the Kaiming method (He et al., 2015) and updated using the Adam optimizer (Kingma & Ba, 2015). The initial learning rate is 0.001, subsequently reduced to one-tenth of its preceding value after every 2000 epochs.
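These settings can be written out explicitly, as in the sketch below. The cumulative product $\bar{\alpha}_t=\prod_{s\le t}(1-\beta_s)$ is the standard DDPM quantity of Ho et al. (2020) and measures how much of the original signal survives after $t$ steps; the function name `learning_rate` and the interpretation of the decay as a step function of the epoch index are assumptions of this example.

```python
import numpy as np

T = 400
t = np.arange(1, T + 1)
betas = 1e-4 + 5e-5 * t                 # diffusion rate beta_t, t = 1, ..., T
alpha_bar = np.cumprod(1.0 - betas)     # fraction of signal variance kept after t steps

def learning_rate(epoch, base_lr=1e-3, drop_every=2000, factor=0.1):
    """Step decay: multiply the rate by `factor` after every `drop_every` epochs."""
    return base_lr * factor ** (epoch // drop_every)

print(betas[0], betas[-1], alpha_bar[-1])
print(learning_rate(0), learning_rate(2000), learning_rate(4500))
```

With this schedule $\beta_t$ grows from $1.5\times10^{-4}$ to about $2\times10^{-2}$, and $\bar{\alpha}_T$ is roughly 0.017, so $\boldsymbol{z}_T$ retains only a small remnant of $\boldsymbol{z}_0$.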

In comparison, we trained a CNN model using the same U-Net and dataset. The CNN model loads geometry prompts at the entrance of the shrinking path and outputs flow fields at the exit of the expanding path, similar to Shirzadi et al. (2022). The CNN parameters are optimized to minimize the MSE of outputs against the GT. We specifically chose to compare with CNN as it utilizes the same U-Net architecture as the G2F diffusion model, allowing for a direct comparison of the two approaches. The physics-informed model, such as PINN (Karniadakis et al., 2021), is not compared here because the presented diffusion model does not incorporate explicit physical constraints. The physics-informed diffusion model is discussed in supplementary material and will be further studied within the same network structure in future work.

3 Results

We evaluate the G2F diffusion model against the obstacles in the training and test sets. Given the random nature of the model, where a specific time is not designated, a snapshot in the GT solution is selected for comparison with the model output. We measure the similarity between the model output and GT using the $L_1$ error

$$L_1\left(\boldsymbol{z}_0^M\right)=\left\|\boldsymbol{z}_0^M-\boldsymbol{z}_0^{\mathrm{GT}}\right\|, \tag{4}$$

where the superscript $M$ denotes the model output. Then the GT snapshot with the minimal $L_1\left(\boldsymbol{z}_0^M\right)$ is used for comparison below.
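The snapshot selection can be sketched as a nearest-neighbour search over the GT snapshots. The sketch assumes that the norm in (4) is the sum of absolute differences over all grid points; the travelling-wave snapshots and the perturbed "model output" are synthetic stand-ins, and the names `l1_error` and `closest_snapshot` are introduced here for illustration.

```python
import numpy as np

def l1_error(z_model, z_gt):
    """L1 distance between a model output and one GT snapshot, as in Eq. (4)."""
    return np.sum(np.abs(z_model - z_gt))

def closest_snapshot(z_model, gt_snapshots):
    """Index and error of the GT snapshot with minimal L1 error."""
    errors = [l1_error(z_model, z) for z in gt_snapshots]
    idx = int(np.argmin(errors))
    return idx, errors[idx]

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 128)
phases = 2.0 * np.pi * np.arange(50) / 50                 # 50 synthetic snapshots
gt_snapshots = np.array([np.sin(x - ph) for ph in phases])
z_model = gt_snapshots[17] + 0.01 * rng.standard_normal(x.shape)
idx, err = closest_snapshot(z_model, gt_snapshots)
```

In this synthetic case the search recovers snapshot 17, the one from which the perturbed output was built.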

3.1 Training set

We first present the flow past a cylinder as a standard ML-enhanced CFD case. Figure 3 compares the G2F diffusion and CNN models against the GT. At $\mathit{Re}=200$, the flow features periodic vortex shedding.

Figure 3: Flow past a cylinder generated by the G2F diffusion model and CNN, compared with the GT: (a) contour of $u$, (b) profiles of $u$ in the wake at $x=0$, 2, and 6, (c) contour of pressure $p$, and (d) pressure on the obstacle surface. The vertical dashed lines in the upper left panel mark the stations at $x=0$, 2, and 6.

Figures 3(a) and (c) compare the instantaneous contours of the streamwise velocity $u$ and pressure $p$, respectively. Both models perform well in the stagnation region. The G2F diffusion model captures the von Kármán vortex street in the wake. The model output approximates the alternating shear layers and low-pressure zones well, although it exhibits minor discrepancies in values. In contrast, the CNN model fails to generate the asymmetric flow structures, displaying a ‘time-averaged’ flow field. This is because the diffusion model learns the distribution $q\left(\boldsymbol{z}_0\right)$ of the data $\boldsymbol{z}_0$, where a sample from the distribution approximates one snapshot of the flow field. Meanwhile, the CNN is trained to minimize the distance between its output and all snapshots, resulting in an ‘averaged’ result.

We plot the profiles of $u$ at $x=0$, 2, and 6 in figure 3(b). The G2F diffusion model agrees with the GT on the shift of velocity peaks and valleys along the development of the wake. However, quantitative errors remain in the model output. Note that the G2F diffusion model returns negative values of $u$ inside the obstacle, where $u$ should be 0. Due to averaging over all snapshots, the CNN model gives symmetric $u$ profiles at all three stations.

We further compare the pressures on the obstacle from different models in figure 3(d). The pressure on the obstacle surface is plotted against the angle in the polar coordinate, as illustrated in figure 3(c). The G2F diffusion model follows the GT, capturing the peak at the stagnation point and the subsequent asymmetric pressure drop on both sides. It also accurately generates the pressure rises at $\varphi=\pi/2$ and $3\pi/2$, indicating a good prediction of the boundary layer separation. Although the CNN gives reasonable results on the stagnation point and the local minimum position, its solution includes strong unphysical oscillations.

Overall, the G2F and CNN models exhibit distinct behaviours in predicting the flow field. The G2F diffusion model produces results that qualitatively match the instantaneous flow field, while the CNN model provides a time-averaged flow field. Similar results are observed in other geometries. This difference arises from the goals of the two deep-learning NNs. The G2F diffusion model is trained to minimize the KL divergence, equivalent to maximizing the likelihood. Sampling from the distribution $p_{\boldsymbol{\theta}}(\boldsymbol{z}_0;\boldsymbol{c})$ then returns data close to one snapshot in training data. The CNN model struggles with asymmetry because it minimizes the sum of errors between its output and all snapshots, leading to an averaged result.
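The averaging effect of an MSE objective can be reproduced with a toy example. For a single geometry, the output that minimizes the summed squared error against all snapshots is their mean, so any antisymmetric oscillating component cancels over a shedding period. The wake profile below (a symmetric deficit plus an antisymmetric part modulated by the phase) is a hypothetical stand-in and is not taken from the dataset.

```python
import numpy as np

y = np.linspace(-4.0, 4.0, 129)
phases = 2.0 * np.pi * np.arange(50) / 50            # one period, 50 snapshots

# toy wake profiles: symmetric deficit + antisymmetric oscillating part
snapshots = np.array([
    -np.exp(-y**2) + 0.5 * np.sin(ph) * y * np.exp(-y**2) for ph in phases
])

def total_sq_error(candidate):
    """Summed squared error of one candidate output against all snapshots."""
    return np.sum((snapshots - candidate) ** 2)

mse_optimal = snapshots.mean(axis=0)                 # minimizer of total_sq_error
err_mean = total_sq_error(mse_optimal)
err_single = total_sq_error(snapshots[12])           # any one instantaneous snapshot
```

The minimizer `mse_optimal` is symmetric about $y=0$ although almost every individual snapshot is not, and it attains a lower summed error than any single instantaneous snapshot, which mirrors the ‘time-averaged’ appearance of the CNN output.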

3.2 Test set

We employ obstacle geometries absent from the training set to test the extrapolation capability of the G2F diffusion model, including parallelograms, a cross, and the characters ‘PKU’. The parallelograms exhibit similarities to the simple geometries in the training set. Furthermore, the cross and characters ‘PKU’ are out-of-distribution cases, characterized by combinations of elementary geometries and convex shapes.

Figure 4 compares the velocity and pressure fields generated by the G2F diffusion and CNN models around the cross with the GT. The G2F diffusion model generates the instantaneous flow field with shedding vortices and clear boundary layers. Stagnation zones on the upstream side of the cross are predicted correctly, implied by the high-pressure values on the obstacle surface. Meanwhile, the CNN model returns obvious oscillations around the obstacle, and the fields are averaged in the wake. Moreover, the CNN model incorrectly predicts two negative pressure regions near the front arm of the cross, implying that it treats the cross arms as standalone objects in the flow field. The G2F diffusion model addresses this challenge effectively, demonstrating the extrapolation capability of generative models.

Figure 4: Flow past a cross generated by the G2F diffusion model and CNN, compared with the GT: (a) contour of $u$, (b) profiles of $u$ in the wake at $x=0$, 2, and 6, (c) contour of pressure $p$, and (d) pressure on a unit circle crossing the four arms. The vertical dashed lines in the upper left panel mark the stations at $x=0$, 2, and 6.

Due to the two cavities downstream of the cross, there exists a local minimum following the upper edge of the cross, as shown around $y=1.5$ in the $u$ profile at $x=2$. This is not found for simple geometries in the training set. The G2F diffusion model indicates a weak shear layer at this position, although it cannot accurately reproduce this structure. In addition, the G2F diffusion model gives a narrower wake flow compared with the GT, due to two possible reasons. First, the cross-attention mechanism makes the geometry more influential on the flow field around the obstacle (Vaswani et al., 2017), while the far field is less affected by the geometry. Second, a lack of obstacle geometries with large widths in the $y$-direction in the training set makes it difficult for the G2F diffusion model to predict a wide wake flow.

Generating flow fields around the characters ‘PKU’ is challenging due to the sharp corners, convex curves, and separated components, none of which are included in the training set. Given the difficulty in simulating this flow with a simple mesh, we present the streamwise velocity $u$ and pressure $p$ generated by the two models in figure 5, without the GT. The G2F diffusion model demonstrates a remarkable extrapolation capability, capturing the major flow features, despite some artefacts such as the spurious flows through the characters ‘P’ and ‘U’ in the velocity field. Again, the CNN model introduces noise at character boundaries and provides time-averaged-like fields. Model performance on other obstacle geometries is provided in supplementary material.

Figure 5: Flow past characters ‘PKU’ generated by the G2F diffusion model and CNN, shown as contours of $u$ and $p$.

Results for all geometries are summarized in figure 6. The black and red symbols represent the geometries in the training and test sets, respectively. The horizontal axis measures the roundness $r$ of geometries. Note that all geometries in the training set have $r\geq0.5$, while several geometries in the test set have $r<0.5$, indicating out-of-distribution cases.

Figure 6: Comparison of results for different obstacle geometries: (a) normalized $L_1$ error between ML methods and GT and (b) divergence. It shows that the G2F diffusion model outperforms the CNN in understanding physical laws, and achieves good performance in extrapolation tasks.

We compare the accuracy of the two ML models in figure 6(a) using the $L_1$ error of $\left[u,v\right]$ normalized by the difference of the maximum and minimum values in the GT, $L_1([u,v])/([u,v]_{\max}^{\mathrm{GT}}-[u,v]_{\min}^{\mathrm{GT}})$. The pressure and vorticity are not included, considering the differences in the order of magnitude. Most of the G2F diffusion model’s results are better than CNN’s, especially in the test set, indicating that the diffusion model is more accurate and robust. We also notice that CNN performs well for some slender obstacles, as these cases have weaker vortex shedding. Consequently, the wake flows in these cases are close to time-averaged results by the CNN model.
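One possible reading of this normalized error is sketched below. It assumes that $L_1$ is taken as the mean absolute error over all grid points and both velocity components, and that the range is computed jointly over $u$ and $v$ of the GT; both choices are assumptions, as is the function name `normalized_l1`. The fields used here are synthetic.

```python
import numpy as np

def normalized_l1(uv_model, uv_gt):
    """Mean absolute error of [u, v], divided by the value range of the GT."""
    value_range = uv_gt.max() - uv_gt.min()
    return np.mean(np.abs(uv_model - uv_gt)) / value_range

# synthetic GT with values spanning [-1, 1] and a model output offset by 0.1
uv_gt = np.linspace(-1.0, 1.0, 2 * 64 * 64).reshape(2, 64, 64)
uv_model = uv_gt + 0.1
error = normalized_l1(uv_model, uv_gt)
```

A uniform offset of 0.1 on a field of range 2 yields a normalized error of 0.05, so the metric reads as a fraction of the GT velocity range.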

We further evaluate the model realizability via the divergence $\mathcal{D}_{\mathrm{div}}=\left\|\partial_x u+\partial_y v\right\|$ averaged over the domain, which quantifies the discrepancy from the incompressible flow. Results in figure 6(b) demonstrate that $\mathcal{D}_{\mathrm{div}}$ increases overall with the decrease of $r$. For all cases, the G2F diffusion model yields smaller $\mathcal{D}_{\mathrm{div}}$ than the CNN model, and the increase rate of $\mathcal{D}_{\mathrm{div}}$ with the decrease in $r$ is smaller. This confirms the robustness of the G2F diffusion model across various geometries, also implying that incorporating a divergence-free constraint may not significantly contribute towards better performance. The physics-informed diffusion model is discussed in supplementary material. Additionally, flows with $\mathit{Re}$ closer to the training-set average (around 200) yield better results, suggesting that data over a broader range of $\mathit{Re}$ should be incorporated in the training set to enhance the model performance.
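A finite-difference evaluation of $\mathcal{D}_{\mathrm{div}}$ on the uniform view-box grid can be sketched as follows. The use of `numpy.gradient` (central differences in the interior, one-sided at the edges) and of the mean absolute value as the domain-averaged norm are assumptions of this example; the two velocity fields are analytic test cases rather than model outputs.

```python
import numpy as np

def mean_divergence(u, v, dx, dy):
    """Domain average of |du/dx + dv/dy| for fields stored as (ny, nx) arrays."""
    du_dx = np.gradient(u, dx, axis=1)
    dv_dy = np.gradient(v, dy, axis=0)
    return np.mean(np.abs(du_dx + dv_dy))

x = np.linspace(-2.0, 10.0, 128)
y = np.linspace(-4.0, 4.0, 128)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y)                       # arrays of shape (ny, nx)

# divergence-free test field (Taylor-Green-like) and a field with divergence 2
u_sol, v_sol = np.sin(X) * np.cos(Y), -np.cos(X) * np.sin(Y)
u_src, v_src = X.copy(), Y.copy()

d_sol = mean_divergence(u_sol, v_sol, dx, dy)
d_src = mean_divergence(u_src, v_src, dx, dy)
```

The solenoidal field gives a value close to zero, limited by the discretization error, whereas the field $(u,v)=(x,y)$ gives 2, so the measure separates incompressible from non-incompressible fields on this grid.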

4 Conclusion

We propose a G2F diffusion model for predicting flows past an obstacle with various geometries. The model employs a U-Net architecture with a cross-attention mechanism to incorporate geometry information as a prompt, guiding the reverse diffusion process to generate flow fields. A dataset including 2D flows with different geometries is established for training and testing. The G2F diffusion model is trained using flow fields for simple obstacle geometries and evaluated on both interpolation and extrapolation tasks.

We assess the G2F diffusion model using simple obstacle geometries that are present in the training set and complex geometries that are absent from it. The model results are compared with those from a CNN model and the GT. The results demonstrate that the diffusion model outperforms the CNN model, particularly in generating instantaneous flow fields and handling out-of-distribution geometries. In particular, the diffusion model successfully captures essential flow features while maintaining physical consistency, even in challenging scenarios. It reproduces instantaneous flow fields by learning the data distribution, while the CNN model yields time-averaged data. Furthermore, the diffusion model’s ability to learn physical laws is evident from its consistent performance across various geometries, as quantified by the divergence.

Beyond the current preliminary study on the application of the diffusion model in CFD, several issues deserve further investigation. In addition to the obstacle geometry, $Re$ is another key parameter, which emphasizes the importance of diverse training data for improving the model’s generalization capabilities. Future research includes expanding the dataset to cover a wider range of Reynolds numbers and geometries, exploiting the scaling law, and enhancing the model’s ability to handle diverse flow conditions.

While the results indicate that the diffusion model has the potential to learn the underlying physics implicitly, incorporating physics information could accelerate the sampling process, as discussed in the supplementary material. Since the conditioned diffusion model suffers from the diversity problem (Gu et al., 2024), further exploration of condition implementation approaches is necessary to address this challenge. Additionally, investigating the application of the diffusion model in generating time-dependent flow field data can provide insights for understanding and predicting complex fluid dynamics.

[Funding] Numerical simulations were performed on the TH-2A supercomputer in Guangzhou, China. This work has been supported in part by the National Natural Science Foundation of China (Nos. 11925201, 52306126, 11988102, and 92270203), the National Key R&D Program of China (Grant No. 2020YFE0204200), and the Xplore Prize.

[Declaration of interests] The authors report no conflict of interest.

[Author ORCIDs] Jiajun Hu, https://orcid.org/0009-0005-5192-876X; Zhen Lu, https://orcid.org/0000-0002-6729-3771; Yue Yang, https://orcid.org/0000-0001-9969-7431

[Author contributions] Y.Y., Z.L., and J.H. designed research. J.H. and Z.L. performed research. All the authors discussed the results and wrote the manuscript. All the authors have given approval for the manuscript.

References

  • Biever (2023) Biever, C. 2023 ChatGPT broke the Turing test — the race is on for new ways to assess AI. Nature 619, 686–689.
  • Chu et al. (2022) Chu, M., Liu, J., Zheng, Q., Franz, E., Seidel, H.-P., Theobalt, C. & Zayer, R. 2022 Physics informed neural fields for smoke reconstruction with sparse data. ACM Trans. Graph. 41, 119.
  • Fu et al. (2023) Fu, N., Wei, L., Song, Y., Li, Q., Xin, R., Omee, S. S., Dong, R., Siriwardane, E. M. D. & Hu, J. 2023 Material transformers: deep learning language models for generative materials design. Mach. Learn.: Sci. Technol. 4, 015001.
  • Fukami et al. (2019) Fukami, K., Fukagata, K. & Taira, K. 2019 Super-resolution reconstruction of turbulent flows with machine learning. J. Fluid Mech. 870, 106–120.
  • Fukami et al. (2024) Fukami, K., Goto, S. & Taira, K. 2024 Data-driven nonlinear turbulent flow scaling with Buckingham Pi variables. J. Fluid Mech. 984, R4.
  • Greenshields (2022) Greenshields, C. J. 2022 OpenFOAM User Guide, 10th edn. OpenFOAM Foundation Ltd.
  • Gu et al. (2024) Gu, J., Shen, Y., Zhai, S., Zhang, Y., Jaitly, N. & Susskind, J. M. 2024 Kaleido diffusion: Improving conditional diffusion models with autoregressive latent modeling, arXiv: 2405.21048.
  • He et al. (2015) He, K., Zhang, X., Ren, S. & Sun, J. 2015 Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In 2015 IEEE International Conference on Computer Vision, pp. 1026–1034.
  • Ho et al. (2020) Ho, J., Jain, A. & Abbeel, P. 2020 Denoising diffusion probabilistic models. In 34th Conference on Neural Information Processing Systems, vol. 33, pp. 6840–6851.
  • Hu et al. (2024) Hu, J., Lu, Z. & Yang, Y. 2024 Improving prediction of preferential concentration in particle-laden turbulence using the neural-network interpolation. Phys. Rev. Fluids 9, 034606.
  • Karniadakis et al. (2021) Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S. & Yang, L. 2021 Physics-informed machine learning. Nat. Rev. Phys. 3, 422–440.
  • Kern et al. (2024) Kern, J. S., Negi, P. S., Hanifi, A. & Henningson, D. S. 2024 Onset of absolute instability on a pitching aerofoil. J. Fluid Mech. 988, A8.
  • Kim et al. (2021) Kim, H., Kim, J., Won, S. & Lee, C. 2021 Unsupervised deep learning for super-resolution reconstruction of turbulence. J. Fluid Mech. 910, A29.
  • Kim et al. (2024) Kim, J., Kim, J. & Lee, C. 2024 Prediction and control of two-dimensional decaying turbulence using generative adversarial networks. J. Fluid Mech. 981, A19.
  • Kingma & Ba (2015) Kingma, D. P. & Ba, J. L. 2015 Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, p. 4.
  • Kochkov et al. (2021) Kochkov, D., Smith, J. A., Alieva, A., Wang, Q., Brenner, M. P. & Hoyer, S. 2021 Machine learning-accelerated computational fluid dynamics. Proc. Natl. Acad. Sci. USA 118, e2101784118.
  • Kullback & Leibler (1951) Kullback, S. & Leibler, R. A. 1951 On information and sufficiency. Ann. Math. Statist. 22, 79–86.
  • Pan et al. (2023) Pan, Z., Zhou, X. & Tian, H. 2023 Arbitrary style guidance for enhanced diffusion-based text-to-image generation. In IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 4450–4460.
  • Perot (2011) Perot, J. B. 2011 Discrete conservation properties of unstructured mesh schemes. Annu. Rev. Fluid Mech. 43, 299–318.
  • Qiu et al. (2024) Qiu, J., Huang, J., Zhang, X., Lin, Z., Pan, M., Liu, Z. & Miao, F. 2024 Pi-fusion: Physics-informed diffusion model for learning fluid dynamics, arXiv: 2406.03711.
  • Ranade et al. (2021) Ranade, R., Hill, C., He, H., Maleki, A., Chang, N. & Pathak, J. 2021 A composable autoencoder-based iterative algorithm for accelerating numerical simulations, arXiv: 2110.03780.
  • Rombach et al. (2022) Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. 2022 High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10674–10685.
  • Ronneberger et al. (2015) Ronneberger, O., Fischer, P. & Brox, T. 2015 U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention, 18th International Conference, vol. 9351, pp. 234–241.
  • Shan et al. (2024) Shan, S., Wang, P., Chen, S., Liu, J., Xu, C. & Cai, S. 2024 PiRD: Physics-informed residual diffusion for flow field reconstruction, arXiv: 2404.08412.
  • Shelley & Zhang (2011) Shelley, M. J. & Zhang, J. 2011 Flapping and bending bodies interacting with fluid flows. Annu. Rev. Fluid Mech. 43, 449–465.
  • Shirzadi et al. (2022) Shirzadi, M., Fukasawa, T., Fukui, K. & Ishigami, T. 2022 Prediction of submicron particle dynamics in fibrous filter using deep convolutional neural networks. Phys. Fluids 34, 123303.
  • Sohl-Dickstein et al. (2015) Sohl-Dickstein, J., Weiss, E. A., Maheswaranathan, N. & Ganguli, S. 2015 Deep unsupervised learning using nonequilibrium thermodynamics. In 32nd International Conference on Machine Learning, vol. 3, pp. 2246–2255.
  • Song & Ermon (2020) Song, J. & Ermon, S. 2020 Multi-label contrastive predictive coding. In 34th Conference on Neural Information Processing Systems, vol. 33, pp. 8161–8173.
  • Tong et al. (2022) Tong, W., Wang, S. & Yang, Y. 2022 Estimating forces from cross-sectional data in the wake of flows past a plate using theoretical and data-driven models. Phys. Fluids 34 (11), 111905.
  • Tong et al. (2021) Tong, W., Yang, Y. & Wang, S. 2021 Estimating thrust from shedding vortex surfaces in the wake of a flapping plate. J. Fluid Mech. 920, A10.
  • Vaswani et al. (2017) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł. & Polosukhin, I. 2017 Attention is all you need. In 31st Conference on Neural Information Processing Systems, vol. 2017-December, pp. 5999–6009.
  • Vinuesa & Brunton (2022) Vinuesa, R. & Brunton, S. L. 2022 Enhancing computational fluid dynamics with machine learning. Nat. Comput. Sci. 2, 358–366.
  • Wu et al. (2020) Wu, X., Li, R.-L., Zhang, F.-L., Liu, J.-C., Wang, J., Shamir, A. & Hu, S.-M. 2020 Deep portrait image completion and extrapolation. IEEE Trans. Image Process. 29, 2344–2355.
  • Yang et al. (2023) Yang, L., Zhang, Z., Song, Y., Hong, S., Xu, R., Zhao, Y., Zhang, W., Cui, B. & Yang, M.-H. 2023 Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv. 56, 1–39.