Physics-informed solution of the Heat equation
The PINN adopts a mesh-free approach to solving PDEs by formulating a mathematical model through a neural network (NN), where the loss function includes a residual term derived from the governing equation. This residual term narrows down the space of viable solutions in line with the imposed physics (laws modelled from observed reality). Thus, PINNs can generally solve PDEs of the form: $$\mathcal{F}(U(z);\gamma)=f(z),\quad z \in \Omega$$ $$\mathcal{B}(U(z))=g(z),\quad z \in \partial\Omega$$
Let's list the components needed for the full definition as follows:
- $\partial\Omega$ denotes the boundary of the domain $\Omega \subset \mathbb{R}^d$
- $z := \left[x_1,\dots,x_{d-1}; t\right]$ is the space-time coordinate vector
- the network maps the input coordinates $z$ to the output field $u$
- $\gamma$ denotes the parameters related to the physics, i.e. any constants or coefficients that appear in the PDE or boundary conditions and are inherent to the physical system being modelled. For example, here $\gamma$ comprises the initial and boundary conditions and the diffusivity $\alpha$. In general, $\gamma$ may also be a function of $z$, $\gamma=\gamma(z)$.
- $f(z)$ identifies the data of the problem (the right-hand side); for the homogeneous Heat equation it is zero, and later in this post the symbol $f(z)$ is reused for the residual of the Heat equation at the point $z$
- $\mathcal{F}$ is the non-linear differential operator
- $g$ is a boundary function
- $\mathcal{B}$ is the non-linear boundary operator
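With the components above, the Heat problem considered in this post fits the general template as follows (assuming a Dirichlet boundary operator, which matches the boundary and initial conditions used later): $$\mathcal{F}(U(z);\alpha) = U_t - \alpha U_{xx} = f(z) = 0,\quad z=(x,t) \in \Omega$$ $$\mathcal{B}(U(z)) = U(z) = g(z),\quad z \in \partial\Omega$$ where $g$ collects the prescribed boundary and initial values.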
In the context of solving the Heat equation using a Physics-Informed Neural Network (PINN), the given Heat equation is $U_t = \alpha U_{xx}$. Here's how the neural network is incorporated in this scenario:
Approximation of Solution:
- The neural network (NN) is trained to approximate the solution $U(z)$ to the Heat equation over the domain $\Omega$.
- The function $U(z)$ is represented by the neural network, i.e., $U(z) \approx u_\theta(z)$ where $u_\theta(z)$ is the output of the neural network parametrized by $\theta$ at input $z$ (which is the space-time coordinate vector).
Formulation of PDE:
- The Heat equation $U_t = \alpha U_{xx}$ is formulated in terms of a differential operator $\mathcal{F}$ acting on the NN-approximated solution $u_\theta(z)$, to express the discrepancy between the actual behavior dictated by the Heat equation and the current approximation provided by the neural network. This formulation is expressed as: $$ \mathcal{F}(u_\theta(z); \alpha) = u_\theta(z)_t - \alpha u_\theta(z)_{xx} $$.
- Here, $u_\theta(z)_t$ and $u_\theta(z)_{xx}$ denote the first time derivative and the second spatial derivative of the neural network's output, respectively. The parameter $\alpha$ is a known constant related to the physics of the problem, and this formulation enables a direct comparison between the neural network's approximation and the actual Heat equation. This discrepancy, or residual, is what will be minimised during training to ensure the neural network's output adheres to the physics of the problem.
Loss Function:
- A loss function is defined which includes a term representing the residual of the Heat equation, $$ f(z) = u_\theta(z)_t - \alpha u_\theta(z)_{xx}, $$ which should ideally be zero. This residual quantifies, at each point $z$ in the domain $\Omega$, the discrepancy between the Heat equation and the neural network's current approximation of the solution. The loss function also includes terms for the boundary conditions and, if observational data are available, an additional data-misfit term. Note that $f(z)$ does not identify the data of the problem here; it measures how well the neural network's approximation adheres to the Heat equation across the domain.
Training:
- The neural network is trained to minimize this loss function, thereby learning to approximate the solution $u(z)$ while adhering to the physics imposed by the Heat equation and the boundary conditions.
Parameter Optimization:
- If $\alpha$ is known, it is used as-is in the loss function. If $\alpha$ is not known and is a part of the inverse problem, it could be treated as an additional parameter to be learned during the training process along with the neural network parameters $\theta$. In such a case, both $\theta$ and $\alpha$ are optimized to minimize the loss function, thus finding an approximate solution to the Heat equation that adheres to the imposed physical constraints and any available data.
Through this process, the neural network is integrated into the formulation and solution of the Heat equation, providing a mechanism to solve the Heat equation in a way that is informed by the physics of the problem.
The connection between the operator $\mathcal{F}(u_\theta(z); \alpha)$ and the function $f(z)$ lies in the formulation of the residual term for the partial differential equation (PDE) you're working with. Here's a breakdown of the relationship:
Operator Formulation:
- The operator $\mathcal{F}$ is a representation of the PDE. When applied to the neural network's approximation $u_\theta(z)$ of the solution, it produces a residual term that represents the discrepancy between the actual PDE and the neural network's current approximation of the solution.
- In the case of the Heat Equation $U_t = \alpha U_{xx}$, the operator $\mathcal{F}$ is defined such that $$ \mathcal{F}(u_\theta(z); \alpha) = u_\theta(z)_t - \alpha u_\theta(z)_{xx} $$
Function $f(z)$:
- The function $f(z)$ is essentially the residual term produced when $\mathcal{F}$ is applied to $u_\theta(z)$. It quantifies the discrepancy at each point $z$ in the domain $\Omega$.
- Therefore, we have $$ f(z) = \mathcal{F}(u_\theta(z); \alpha) = u_\theta(z)_t - \alpha u_\theta(z)_{xx} $$
Usage in Loss Function:
- This function $f(z)$ is then used in the loss function to guide the training of the neural network. By minimizing the loss function, the neural network learns to approximate the solution $U(z)$ such that the residual $f(z)$ (and therefore $\mathcal{F}(u_\theta(z); \alpha)$) approaches zero, thereby satisfying the PDE.
The operator $\mathcal{F}$ in the context of Physics-Informed Neural Networks (PINNs) and partial differential equations (PDEs) acts on functions: applied to a function and evaluated at a particular point of the domain, it returns a scalar. In the context of PINNs, the operator $\mathcal{F}$ acts on the function $u_\theta(z)$ (the neural network's approximation of the solution to the PDE) and computes a residual, which is the difference between the left-hand side and the right-hand side of the PDE when $u_\theta(z)$ is substituted into it. This residual is a scalar quantity for each point $z$ in the domain, and it quantifies how well the neural network's approximation satisfies the PDE. The aim is to minimise this residual over the domain, which corresponds to finding a function $u_\theta(z)$ that satisfies the PDE as closely as possible.
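As a minimal sketch of how such a pointwise residual can be evaluated with automatic differentiation, the helper below (a hypothetical function, not the model defined later in this post) computes $f(z) = u_t - \alpha u_{xx}$ for a batch of collocation points:
import torch

def heat_residual(u_theta, z, alpha=0.1):
    """Evaluate f(z) = u_t - alpha*u_xx at collocation points z = [x, t] of shape (N, 2)."""
    z = z.clone().requires_grad_(True)
    u = u_theta(z)  # network output, shape (N, 1)
    du = torch.autograd.grad(u, z, torch.ones_like(u), create_graph=True)[0]  # [u_x, u_t], shape (N, 2)
    u_x, u_t = du[:, [0]], du[:, [1]]
    u_xx = torch.autograd.grad(u_x, z, torch.ones_like(u_x), create_graph=True)[0][:, [0]]  # second derivative in x
    return u_t - alpha * u_xx  # residual per point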
Solving forward and inverse problems with Physics-Informed Neural Networks (PINNs) involves building a neural network that learns to approximate the solution of the given partial differential equations (PDEs) while respecting the known physics (i.e., the differential equations and boundary/initial conditions).
Forward Problem Algorithm:
For the forward problem, the goal is to find the function $u$ for given data $f$ and specified physical parameters $\gamma$. A typical approach for solving the forward problem with PINNs is described in the literature [source]. Here is a simplified version of the algorithm, followed by a minimal code sketch:
- Initialize Neural Network: Set up a neural network with an appropriate architecture.
- Define Loss Function: The loss function incorporates a residual term from the PDE and possibly boundary/initial conditions.
- Train Neural Network: Train the neural network to minimize the loss function using optimization algorithms like Adam or L-BFGS.
- Evaluate: Evaluate the trained network to obtain the solution $u$ over the domain of interest.
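A compact sketch of this loop, reusing the hypothetical heat_residual helper from above and placeholder tensors for the boundary/initial points (the full, concrete implementation follows later in this post):
def train_forward(u_theta, z_colloc, z_bnd, u_bnd, alpha=0.1, steps=1000, lr=1e-3):
    """Minimal forward-problem loop: PDE residual term + boundary/initial mismatch term."""
    mse = torch.nn.MSELoss()
    opt = torch.optim.Adam(u_theta.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        res = heat_residual(u_theta, z_colloc, alpha)  # physics term
        loss = mse(res, torch.zeros_like(res)) + mse(u_theta(z_bnd), u_bnd)  # residual + BC/IC
        loss.backward()
        opt.step()
    return u_theta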
Inverse Problem Algorithm:
In the inverse problem, the aim is to deduce the parameters $\gamma$ from the given data. One way to approach this is to extend the traditional PINN framework by incorporating the unknown parameters into the optimisation and learning them alongside the neural network's weights. A Bayesian approach has also been proposed, where a Bayesian Neural Network (BNN) combined with a PINN serves as the prior, and techniques like Hamiltonian Monte Carlo (HMC) or Variational Inference (VI) serve as estimators [source]. Here is a simplified version of the algorithm, followed by a minimal code sketch:
- Initialize Neural Network and Parameters: Set up a neural network with an appropriate architecture and initialize the unknown parameters $\gamma$.
- Define Loss Function: The loss function incorporates a residual term from the PDE, boundary/initial conditions, and a term related to the discrepancy between the predicted and observed data.
- Optimize: Train the neural network and optimize the unknown parameters $\gamma$ to minimize the loss function.
- Evaluate: Evaluate the trained network and optimized parameters to obtain the solution $u$ and parameters $\gamma$ over the domain of interest.
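For the Heat equation this could look as in the sketch below, where the unknown diffusivity alpha is treated as a learnable parameter optimised jointly with the network weights; z_obs and u_obs are placeholders for measurement locations and values, and heat_residual is the hypothetical helper introduced earlier (an illustration, not code used later in this post):
def train_inverse(u_theta, z_colloc, z_obs, u_obs, steps=1000, lr=1e-3):
    """Minimal inverse-problem loop: learn alpha together with the network weights."""
    log_alpha = torch.nn.Parameter(torch.tensor(0.0))  # alpha = exp(log_alpha) stays positive
    mse = torch.nn.MSELoss()
    opt = torch.optim.Adam(list(u_theta.parameters()) + [log_alpha], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        res = heat_residual(u_theta, z_colloc, alpha=torch.exp(log_alpha))  # physics term
        loss = mse(res, torch.zeros_like(res)) + mse(u_theta(z_obs), u_obs)  # physics + data misfit
        loss.backward()
        opt.step()
    return u_theta, torch.exp(log_alpha).item()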
Both problems can also be tackled with variants of PINNs such as Variable Coefficient PINNs (VC-PINNs), a deep-learning framework designed specifically for the forward and inverse problems of variable-coefficient PDEs [source]. Moreover, Fourier Features have been proposed to improve the performance of PINNs on such problems, especially in cases with non-homogeneous boundary conditions.
Therefore, the NN approximation parametrised by $\theta$, $$\hat{u}_{\theta}(z)\simeq u(z),$$ is obtained by finding the parameters $\theta$ that minimise a suitably weighted loss function composed of a differential-equation term $\mathcal{L}_{\mathcal{F}}$, a boundary-condition term $\mathcal{L}_{\mathcal{B}}$ and, if available, a data term $\mathcal{L}_{data}$. Namely, $$\theta^* = \arg\min_{\theta}\left(\omega_{\mathcal{F}}\mathcal{L}_{\mathcal{F}}(\theta)+\omega_{\mathcal{B}}\mathcal{L}_{\mathcal{B}}(\theta)+\omega_{data}\mathcal{L}_{data}(\theta)\right)$$
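A minimal sketch of assembling this weighted objective in PyTorch (the weights w_F, w_B, w_data and the per-term losses are placeholders chosen for illustration, not part of the code later in this post):
def weighted_pinn_loss(loss_F, loss_B, loss_data=None, w_F=1.0, w_B=1.0, w_data=1.0):
    """Combine the PDE residual, boundary and (optional) data terms into one objective."""
    loss = w_F * loss_F + w_B * loss_B
    if loss_data is not None:
        loss = loss + w_data * loss_data
    return loss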
Depending on the available information, PINNs can be viewed as unsupervised learning when training relies only on the physical equation, as semi-supervised learning when boundary conditions are added as constraints, and as supervised learning when known labelled data are incorporated; throughout, the PDE acts as a physics-based constraint on the learning process. The diagram below depicts the conceptual architecture of PINNs. Image taken from Scientific Machine Learning through Physics-Informed Neural Networks: Where we are and What's next by Cuomo, Di Cola, Giampaolo, Rozza, Raissi, and Piccialli, 2022.
Here, we will solve the Heat equation $U_t=0.1U_{xx}$ with Dirichlet boundary conditions: a non-homogeneous one, $U(0,t)=\sin(2t)$, and a homogeneous one, $U(1,t)=0$, together with the initial condition $U(x,0)=\sqrt{x}$. Non-homogeneous boundary conditions can cause difficulties: it is often not straightforward to obtain stable convergence, and the model overfits easily. This can be overcome by applying a bespoke NN architecture and/or Fourier Features. The analytical solution was obtained with Mathematica as the first two terms of a series expansion and is plotted below. For the PINN residual term, we have a grid of 200x100 = 20000 points from which we will sample collocation points inside the domain $\Omega$. Fourier Features are explained in another blog post.
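In full, the initial-boundary value problem solved below reads: $$U_t = 0.1\,U_{xx},\quad (x,t)\in(0,1)\times(0,1],$$ $$U(0,t)=\sin(2t),\quad U(1,t)=0,\quad U(x,0)=\sqrt{x}.$$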
import matplotlib.pyplot as plt
import torch
import scipy
from scipy.integrate import solve_ivp
from torch import nn
from torch import Tensor
import torch.autograd as autograd
from torch.optim import Adam, LBFGS
import numpy as np
#!pip install torchviz
from torchviz import make_dot
#!pip install jupyterthemes
from jupyterthemes import jtplot
jtplot.style(theme="grade3", context="notebook", ticks=True, grid=True)
print("CUDA available: ", torch.cuda.is_available())
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("PyTorch version: ", torch.__version__ )
def analytical_sol(x, t, y):
    x_plot = x.squeeze(1)
    t_plot = t.squeeze(1)
    X, T = torch.meshgrid(x_plot, t_plot)
    F_xt = y
    fig = plt.figure(figsize=(25,12))
    fig.suptitle('Analytical Solution of the Heat Equation', fontsize=30)
    ax1 = fig.add_subplot(121, projection='3d')
    ax2 = fig.add_subplot(122)
    ax1.plot_surface(T.numpy(), X.numpy(), F_xt.numpy(), cmap="twilight")
    ax1.set_title('U(x,t)')
    ax1.set_xlabel('t')
    ax1.set_ylabel('x')
    cp = ax2.contourf(T, X, F_xt, 20, cmap="twilight")
    fig.colorbar(cp)  # Add a colorbar to the plot
    ax2.set_title('U(x,t)')
    ax2.set_xlabel('t')
    ax2.set_ylabel('x')
    plt.show()
def approx_sol(x, t, y):
    X, T = x, t
    F_xt = y
    fig = plt.figure(figsize=(25,15))
    fig.suptitle('NN Approximated Solution of the Heat Equation', fontsize=30)
    ax1 = fig.add_subplot(121, projection='3d')
    ax2 = fig.add_subplot(122)
    ax1.plot_surface(T.numpy(), X.numpy(), F_xt.numpy(), cmap="twilight")
    ax1.set_title('U(x,t)')
    ax1.set_xlabel('t')
    ax1.set_ylabel('x')
    cp = ax2.contourf(T, X, F_xt, 20, cmap="twilight")
    fig.colorbar(cp)  # Add a colorbar to the plot
    ax2.set_title('U(x,t)')
    ax2.set_xlabel('t')
    ax2.set_ylabel('x')
    plt.show()
# We will approximate the heat equation on the box [0, 1]x[0, 1]:
x_min=0.
x_max=1.
t_min=0.
t_max=1.
#Collocation discretisation of the box, i.e. for the residual term
total_points_x=200
total_points_t=100
#Create mesh
x=torch.linspace(x_min,x_max,total_points_x).view(-1,1)
t=torch.linspace(t_min,t_max,total_points_t).view(-1,1)
# Create the mesh
X,T=torch.meshgrid(x.squeeze(1),t.squeeze(1))#[200,100]
print(x.shape)#[200,1]
print(t.shape)#[100,1]
print(X.shape)#[200,100]
print(T.shape)#[200,100]
x_min_tens = torch.Tensor([x_min])
x_max_tens = torch.Tensor([x_max])
#To get analytical solution obtained via MATHEMATICA
f_real = lambda x, t: (1. - 1.*x)*torch.sin(2*t) +( 1.12734*torch.exp(-0.98696*t) - 0.252637*torch.cos(2.*t) - 0.511949*torch.sin(2.*t) )* torch.sin(torch.pi*x)+\
( -0.112278*torch.exp(-3.94784*t) - 0.128324*torch.cos(2.*t) - 0.0650094*torch.sin(2.*t) )* torch.sin(2*torch.pi*x)
# Evaluate real solution on the box domain
U_real=f_real(X,T)
analytical_sol(x,t,U_real) #f_real was defined previously(function)
Since we approximate the NN solution on a $200 \times 100$ grid (space $\times$ time) over $[0,1]\times[0,1]$, it is necessary to flatten our domain $\Omega$. All input vectors are two-dimensional (space and time), and the output is one-dimensional; after reshaping, it yields the approximated solution. For later error estimation, let us prepare the test inputs and the analytical solution $U$. To do so, we flatten the input mesh into a two-column array $\{x_{i}, t_{i}\}_{i=1}^{20000}$ and the analytical solution into a one-column vector $\{U_{test}^{i}\}_{i=1}^{20000}$. A similar setup will be applied for the initial (IC) and boundary conditions (BC).
#Prepare testing data
# Transform the mesh into a 2-column vector
x_test=torch.hstack((X.transpose(1,0).flatten()[:,None],T.transpose(1,0).flatten()[:,None]))#[200x100 -> 20000,2]
U_test=U_real.transpose(1,0).flatten()[:,None] #[200x100 -> 20000,1] # Colum major Flatten (so we transpose it)
Now it is time to set up the vectors on which the NN will be trained. The initial condition at $t=0$ is sampled at $N_{IC}=20$ points drawn from the 200-point spatial vector, giving the pairs $\{x_0^i, U_0^i\}_{i=1}^{20}$.
The boundaries at $x=0$ and $x=1$ each contribute 100 points in time; stacked together they give 200 boundary points, from which $N_{BC}=200$ points are sampled, i.e. $\{t_b^i, U_b^i\}_{i=1}^{200}$.
Next, we stack the IC and BC points vertically into two arrays of 220 rows: the inputs X_train_boundaries and the corresponding values U_train_boundaries.
# Domain bounds
lb=x_test[0] #first value of the mesh
ub=x_test[-1] #last value of the mesh
# Transform the mesh into a 2-column vector
#Left Edge: U(x,0)=sqrt(x)->xmin=<x=<xmax; t=0
left_X=torch.hstack((X[:,0][:,None],T[:,0][:,None])) #[200,2]
left_U=torch.sqrt(left_X[:,0]).unsqueeze(1) #[200,1]
#Boundary Conditions
#Bottom Edge: U(x_min=0,t) = sin(2t); tmin<=t<=tmax
bottom_X=torch.hstack([X[0,:][:,None],T[0,:][:,None]]) #[100,2]
bottom_U= torch.sin(2.*(bottom_X[:,1][:,None]))#[100,1]
#Top Edge: U(x_max=1,t) = 0 ; 0=<t=<1
top_X=torch.hstack((X[-1,:][:,None],T[-1,:][:,None])) #[100,2]
top_U=torch.zeros(top_X.shape[0],1) #[100,1]
N_IC=20
idx=np.random.choice(left_X.shape[0],N_IC,replace=False)
X_IC=left_X[idx,:] #[20,2]
U_IC=left_U[idx,:] #[20,1]
#Stack both boundaries vertically
X_BC=torch.vstack([bottom_X,top_X]) #[200,2]
U_BC=torch.vstack([bottom_U,top_U]) #[200,1]
N_BC=200
idx=np.random.choice(X_BC.shape[0],N_BC,replace=False)
X_BC=X_BC[idx,:] #[200,2]
U_BC=U_BC[idx,:] #[200,1]
#Stack IC and BC points vertically
X_train_boundaries=torch.vstack([X_IC, X_BC]) #[220,2]
U_train_boundaries=torch.vstack([U_IC, U_BC]) #[220,1]
For the residual term, we draw $N_{residual}=10000$ collocation points $\{ x_c^i, t_c^i \}_{i=1}^{10000}$ inside the domain $\Omega$ using Latin hypercube sampling over the $[0,1]\times[0,1]$ box. Next, we stack the collocation points with the IC/BC points. This way, we have all the necessary arrays and vectors prepared for the training.
#Collocation Points to evaluate PDE residual
#Choose(Nf) points(Latin hypercube)
N_residual=10000
from pyDOE import lhs
X_train_residual=lb+(ub-lb)*lhs(2,N_residual)#[10000,2] # 2 because the inputs are x and t
X_train_total=torch.vstack((X_train_residual,X_train_boundaries))#[10000,2] and [220,2] -> [10220,2] Add the boundary/IC training points to the collocation points
torch.manual_seed(123)
#Store tensors to GPU
X_train_boundaries=X_train_boundaries.float().to(device)#Training Points (BC)
U_train_boundaries=U_train_boundaries.float().to(device)#Training Points (BC)
X_train_total=X_train_total.float().to(device)#Collocation Points
U_hat = torch.zeros(X_train_total.shape[0],1).to(device)#[10220,1]; zero target for the PDE residual in the loss function
X_train_residual.shape
We will apply a wide, shallow NN architecture with additional skip connections to mitigate the vanishing-gradient problem. In addition, deep NNs tend to learn low-frequency features first, so we apply so-called Fourier Features, which prevent high-frequency components of the solution from being missed. The mapping used here is $\gamma(x)=[\cos(B\,x), \sin(B\,x)]^T$ with $B$ drawn from a normal distribution, as sketched below. The tune_beta option makes the per-layer scaling factors $\beta$ of the pre-activations learnable, which smooths irregularities, facilitates the flow of gradients and stabilises training. The NN's architecture is visualised below with torchviz.
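The Fourier-feature mapping itself is only a couple of lines; a standalone sketch (with dim_in, m_freqs and sigma playing the same roles as the constructor arguments of the class below) might look like this:
def fourier_features(z, B):
    """gamma(z) = [cos(zB), sin(zB)]: lifts (N, dim_in) inputs to (N, 2*m_freqs)."""
    return torch.cat((torch.cos(z @ B), torch.sin(z @ B)), dim=1)

# B is drawn once from a normal distribution; in the class below it is kept as a trainable parameter
dim_in, m_freqs, sigma = 2, 30, 8
B = sigma * torch.randn(dim_in, m_freqs)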
class DenseResNet(nn.Module):
    def __init__(self, dim_in, dim_out, num_resnet_blocks,
                 num_layers_per_block, num_neurons, activation,
                 fourier_features, m_freqs, sigma, tune_beta):
        super(DenseResNet, self).__init__()
        self.num_resnet_blocks = num_resnet_blocks
        self.num_layers_per_block = num_layers_per_block
        self.fourier_features = fourier_features
        self.activation = activation
        self.tune_beta = tune_beta
        if tune_beta:
            # Register the scaling factors as parameters; the module is moved to `device` later,
            # and calling .to(device) on an nn.Parameter would return a plain, unregistered tensor
            self.beta0 = nn.Parameter(torch.ones(1, 1))
            self.beta = nn.Parameter(torch.ones(self.num_resnet_blocks, self.num_layers_per_block))
        else:
            self.beta0 = torch.ones(1, 1).to(device)
            self.beta = torch.ones(self.num_resnet_blocks, self.num_layers_per_block).to(device)
        self.first1 = nn.Linear(dim_in, num_neurons)
        # nn.init.xavier_uniform_(self.first1.weight.data)
        # nn.init.zeros_(self.first1.bias.data)
        self.first2 = nn.Linear(dim_in, num_neurons)
        # nn.init.xavier_uniform_(self.first2.weight.data)
        # nn.init.zeros_(self.first2.bias.data)
        self.first3 = nn.Linear(dim_in, num_neurons)
        # nn.init.xavier_uniform_(self.first3.weight.data)
        # nn.init.zeros_(self.first3.bias.data)
        self.first4 = nn.Linear(dim_in, num_neurons)
        # nn.init.xavier_uniform_(self.first4.weight.data)
        # nn.init.zeros_(self.first4.bias.data)
        self.first5 = nn.Linear(dim_in, num_neurons)
        # nn.init.xavier_uniform_(self.first5.weight.data)
        # nn.init.zeros_(self.first5.bias.data)
        self.resblocks = nn.ModuleList([
            nn.ModuleList([nn.Linear(num_neurons, num_neurons)
                           for _ in range(num_layers_per_block)])
            for _ in range(num_resnet_blocks)])
        self.last = nn.Linear(num_neurons, dim_out)
        if fourier_features:
            # With Fourier features the inputs are first lifted to 2*m_freqs dimensions
            self.first1 = nn.Linear(2*m_freqs, num_neurons)
            self.first2 = nn.Linear(2*m_freqs, num_neurons)
            self.first3 = nn.Linear(2*m_freqs, num_neurons)
            self.first4 = nn.Linear(2*m_freqs, num_neurons)
            self.first5 = nn.Linear(2*m_freqs, num_neurons)
            self.B = nn.Parameter(sigma*torch.randn(dim_in, m_freqs).to(device))  # maps inputs to m_freqs frequencies

    def forward(self, x):
        if self.fourier_features:
            cosx = torch.cos(torch.matmul(x, self.B)).to(device)
            sinx = torch.sin(torch.matmul(x, self.B)).to(device)
            x = torch.cat((cosx, sinx), dim=1)
            x1 = self.activation(self.beta0*self.first1(x))
            x2 = self.activation(self.beta0*self.first2(x))
            x3 = self.activation(self.beta0*self.first3(x))
            x4 = self.activation(self.beta0*self.first4(x))
            x5 = self.activation(self.beta0*self.first5(x))
            id1 = x1
            id2 = x2
            id3 = x3
            id4 = x4
            id5 = x5
        else:
            x1 = self.activation(self.beta0*self.first1(x))
            x2 = self.activation(self.beta0*self.first2(x))
            x3 = self.activation(self.beta0*self.first3(x))
            x4 = self.activation(self.beta0*self.first4(x))
            x5 = self.activation(self.beta0*self.first5(x))
            id1 = x1
            id2 = x2
            id3 = x3
            id4 = x4
            id5 = x5
        for i in range(self.num_resnet_blocks):
            z1 = self.activation(self.beta[i][0]*self.resblocks[i][0](x1))
            z2 = self.activation(self.beta[i][0]*self.resblocks[i][0](x2))
            z3 = self.activation(self.beta[i][0]*self.resblocks[i][0](x3))
            z4 = self.activation(self.beta[i][0]*self.resblocks[i][0](x4))
            z5 = self.activation(self.beta[i][0]*self.resblocks[i][0](x5))
            for j in range(1, self.num_layers_per_block):
                z11 = self.activation(self.beta[i][j]*self.resblocks[i][j](z1))
                z22 = self.activation(self.beta[i][j]*self.resblocks[i][j](z2))
                z33 = self.activation(self.beta[i][j]*self.resblocks[i][j](z3))
                z44 = self.activation(self.beta[i][j]*self.resblocks[i][j](z4))
                z55 = self.activation(self.beta[i][j]*self.resblocks[i][j](z5))
            # Skip connections from the block inputs
            z11 = z11 + id1
            z22 = z22 + id2
            z33 = z33 + id3
            z44 = z44 + id4
            z55 = z55 + id5
            for j in range(1, self.num_layers_per_block):
                z111 = self.activation(self.beta[i][j]*self.resblocks[i][j](z11))
                z222 = self.activation(self.beta[i][j]*self.resblocks[i][j](z22))
                z333 = self.activation(self.beta[i][j]*self.resblocks[i][j](z33))
                z444 = self.activation(self.beta[i][j]*self.resblocks[i][j](z44))
                z555 = self.activation(self.beta[i][j]*self.resblocks[i][j](z55))
        x = z111 + z222 + z333 + z444 + z555
        out = self.last(x)
        return out
Fourier_dense_net1 = DenseResNet(dim_in=2, dim_out=1, num_resnet_blocks=3,
num_layers_per_block=3, num_neurons=25, activation=nn.LeakyReLU(0.65),
fourier_features=True, m_freqs=30, sigma=8, tune_beta=True)
PINN = Fourier_dense_net1.to(device)
x = torch.randn(2,2).to(device).requires_grad_(True)
y = PINN(x)
#make_dot(y, params=dict(list(PINN.named_parameters()))).render("Residual_net", format="png")
make_dot(y, params=dict(list(PINN.named_parameters()))).render("Residual_net_svg", format="svg")
The resulting architecture of the NN can be inspected in the rendered Residual_net_svg graph.
For the training loop, Adam is used first; in the final part, L-BFGS is used to refine the minimum of the objective function. L-BFGS is a quasi-Newton optimizer that relies on approximations of the Hessian.
X_test=x_test.float().to(device) # the input dataset (complete)
U_test=U_test.float().to(device) # the real solution
lr = 5e-4
optimizer = torch.optim.Adam(PINN.parameters(),lr=lr)#,amsgrad=True, weight_decay=1e-4)#, eps=1e-8)
optimizer_LBFGS = LBFGS(PINN.parameters(), history_size=250, max_iter=5000, line_search_fn="strong_wolfe")
loss_function = nn.MSELoss(reduction ='mean')
loss_evolution = []
iter = 0
steps = 3000
test_loss = torch.tensor([np.Inf]).to(device)
def train():
    # Boundary/initial-condition term
    x_BC = PINN(X_train_boundaries)
    loss_BC = loss_function(x_BC, U_train_boundaries)
    # PDE residual term on the collocation points
    g = X_train_total.clone()
    g.requires_grad = True  # Enable differentiation with respect to the inputs
    U = PINN(g)
    U_x_t = autograd.grad(U, g, torch.ones([g.shape[0], 1]).to(device), retain_graph=True, create_graph=True)[0]  # first derivatives [U_x, U_t]
    U_x = U_x_t[:,[0]]  # 1st column: derivative w.r.t. x (remember the input is X=[x,t])
    U_t = U_x_t[:,[1]]  # 2nd column: derivative w.r.t. t
    # Differentiate U_x alone so that the mixed derivative U_xt does not leak into U_xx
    U_xx = autograd.grad(U_x, g, torch.ones_like(U_x), create_graph=True)[0][:,[0]]
    residual = U_t - 0.1*U_xx  # PDE residual of U_t = 0.1*U_xx
    loss_PDE = loss_function(residual, U_hat)  # U_hat=0: the equation is put on the LHS and 0 is left on the RHS
    loss = loss_BC + loss_PDE
    return loss
LOSSES = []
TEST_LOSSES = []
ITERACJE = []
U_anim = np.zeros((total_points_x,total_points_t, steps))
for i in range(steps):
    optimizer.zero_grad()
    loss = train()
    loss.backward()
    optimizer.step()
    if test_loss < 0.00085:
        def closure():
            optimizer_LBFGS.zero_grad()
            loss = train()
            loss.backward()
            return loss
        optimizer_LBFGS.step(closure)
        loss = closure()
    U_anim[:,:,i] = PINN(X_test).reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
    iter += 1
    if i%(steps/100) == 0:
        with torch.no_grad():
            test_loss = loss_function(PINN(X_test), U_test)
        LOSSES.append(loss.detach().cpu().numpy())
        TEST_LOSSES.append(test_loss.detach().cpu().numpy())
        ITERACJE.append(iter+1)
        print(iter+1, ': ', loss.detach().cpu().numpy(), '---', test_loss.detach().cpu().numpy())
fig, (ax1, ax2) = plt.subplots(2,1, figsize=(20,15))
fig.suptitle('Historical Losses', fontsize=30)
ax1.semilogy(ITERACJE[1:], LOSSES[1:], c='deeppink',linewidth=2.8, label='Train loss')
ax1.set_ylabel('Loss')
ax1.set_xlabel('Iterations')
ax1.legend()
ax2.semilogy(ITERACJE[1:], TEST_LOSSES[1:],c='dodgerblue',linewidth=2.8, label='Test loss')
ax2.set_ylabel('Test Loss')
ax2.set_xlabel('Iterations')
ax2.legend()
plt.show()
To obtain better results, it is advisable to extend training with an L-BFGS phase (a sketch follows below); however, this optimisation algorithm is very time-consuming.
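One possible way to do this, as a sketch built from the objects defined above (this phase was not part of the training run shown here), is to run a separate L-BFGS fine-tuning pass after the Adam loop:
# Optional L-BFGS fine-tuning phase after the Adam loop (sketch)
def closure():
    optimizer_LBFGS.zero_grad()
    loss = train()
    loss.backward()
    return loss

optimizer_LBFGS.step(closure)  # a single call runs up to max_iter internal iterations
with torch.no_grad():
    print("Test loss after L-BFGS:", loss_function(PINN(X_test), U_test).item())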
U1=PINN(X_test)
x1=X_test[:,0]
t1=X_test[:,1]
arr_x1=x1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_T1=t1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_U1=U1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_U_test=U_test.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
approx_sol(arr_x1,arr_T1,arr_U1)
It is instructive to visualise the learning process. Below are two animations: a 3-D surface plot and a contour plot.
The results of the PINN are not as accurate as those obtained with classical numerical methods (finite differences or the Method of Lines), but they show that this novel approach works and is very promising for further research.
ANIMATIONS
If you are interested in creating the animations, read the remaining part of the post.
Contour plot
#!conda install -c conda-forge ffmpeg
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.animation as animation
#import networkx as nx
from matplotlib.animation import FuncAnimation, PillowWriter, MovieWriter
from IPython.display import HTML
#Contour plt
fps = 20  # frames per second
frn = 1  # keep every frn-th training step; the lower the value, the more frames
xarray = []
for i in range(steps):
    if i%frn == 0:
        xarray.append(U_anim[:,:,i])
carray = np.einsum('kil->ilk', np.array(xarray))
fig, ax = plt.subplots(figsize=(20,18))
fig.suptitle('Solution of Approximated Heat Equation', fontsize=30)
def animate(i):
    ax.clear()
    ax.contourf(T.numpy(), X.numpy(), carray[:,:,i], 30, cmap="twilight")
ani_2 = animation.FuncAnimation(fig, animate, carray.shape[2], interval=500, repeat=False)
fn = 'plot_contour_animation_funcanimation'
#ani_2.save(fn+'.gif',writer='imagemagick',fps=fps)
ani_2.save(fn+'.mp4',writer='ffmpeg',fps=fps)
To display the animation in the notebook:
import matplotlib
matplotlib.rcParams['animation.embed_limit'] = 2**128
plt.rcParams['animation.html'] = 'html5'
ani_2
3-D surface plot
This is how to create the surface animation shown at the top of the page.
fps = 20  # frames per second
frn = 10  # keep every frn-th training step; the lower the value, the more frames
xarray = []
for i in range(steps):
    if i%frn == 0:
        xarray.append(U_anim[:,:,i])
carray = np.einsum('kil->ilk', np.array(xarray))
def update_plot(frame_number, zarray, plot):
    plot[0].remove()
    plot[0] = ax.plot_surface(T.numpy(), X.numpy(), carray[:,:,frame_number], cmap="twilight")
fig = plt.figure(figsize=(20,18))
fig.suptitle('Solution of Approximated Heat Equation', fontsize=30)
ax = fig.add_subplot(111, projection='3d')
plot = [ax.plot_surface(T.numpy(), X.numpy(), carray[:,:,0], color='0.75', rstride=1, cstride=1)]
ax.set_zlim(0,1.1)
ani = animation.FuncAnimation(fig, update_plot, steps//frn, fargs=(carray, plot), interval=5000/fps)
ax.set_title('$U(x,t)_{approximated}$')
ax.set_xlabel('t')
ax.set_ylabel('x')
fn = 'plot_surface_animation_funcanimation'
ani.save(fn+'.mp4',writer='ffmpeg',fps=fps)
ani.save(fn+'.gif',writer='imagemagick',fps=fps)
Physics-Informed Neural Networks (PINNs) have shown promise in solving Partial Differential Equations (PDEs) by incorporating physical laws into the learning process. However, a trained PINN solves a single problem instance, and its accuracy and generalisation across different discretisations are limited. The Neural Operator approach, on the other hand, learns mappings between function spaces and is inherently discretisation-invariant, so a single trained model can generalise across different meshes and resolutions, potentially overcoming a key limitation of PINNs in solving PDEs. For a deeper understanding, refer to the seminal paper on Neural Operators.
In the next post, I will look at the inverse problem.