PINNs in Forward & Inverse Problems


In this post, we demonstrate the forward and inverse PINN problems for the heat equation

$$U_t=0.1\,U_{xx}$$

with Dirichlet boundary conditions, a homogeneous one $U(0,t)=0$ and a non-homogeneous one $U(1,t)=\cos(2t)$, together with the initial condition $U(x,0)=\sqrt{x}$. Non-homogeneous boundary conditions can cause difficulties: it is often not straightforward to obtain stable convergence, and the model overfits easily. This can be overcome by applying a bespoke NN architecture and/or Fourier features. The analytical solution was obtained with Mathematica as a 30-term series expansion and is plotted below. For the residual term, we have a 200x100 grid, totaling 20000 points, from which we will sample collocation points inside the domain $\Omega$.

The first step is to obtain an analytical solution by means of Mathematica. You can use the Mathematica kernel for free; how to install it and use it with Jupyter is described here. I will now switch to the Mathematica kernel and obtain the approximate analytical solution of the equation in question.

heqn = D[u[x, t], t] == 0.1*D[u[x, t], {x, 2}]

ic = u[x, 0] == Sqrt[x];

bc = {u[0, t] == 0, u[1, t] == Cos[2t]};

sol = DSolve[{heqn, ic, bc}, u[x, t], {x, t}];

asol = u[x, t] /. sol[[1]] /. {\[Infinity] -> 30} //Activate//FullSimplify;

Load/install needed libraries

In [2]:
import matplotlib.pyplot as plt
import torch
from torch import nn
from torch import Tensor
import torch.autograd as autograd
import torch.optim as optim
from torch.optim import Adam
import numpy as np
#!pip install pyDOE
from pyDOE import lhs #for Latin hypercube sampling
#!pip install torchviz
from torchviz import make_dot

#!pip install jupyterthemes
from jupyterthemes import jtplot
jtplot.style(theme="monokai", context="notebook", ticks=True, grid=True)

print("CUDA available: ", torch.cuda.is_available())
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("PyTorch version: ", torch.__version__ )
CUDA available:  True
PyTorch version:  1.13.1

In this post, we also delve into Residual Networks (ResNets) and the role of the tune_beta parameter, denoted $ \beta $ in the mathematics, especially as a means of enhancing these architectures.

Residual Learning

In classical feed-forward neural networks, each layer is trained to directly map its input to the corresponding output. However, for exceedingly deep networks, this can be problematic due to challenges such as vanishing or exploding gradients. Enter Residual Networks (ResNets), which introduced a paradigm shift by leveraging "shortcut" or "skip" connections to enable identity mapping, thereby allowing layers to learn the residual between the input and output.

Mathematically expressed, instead of aiming to learn $ H(x) $, a ResNet endeavors to learn the residual function:

$$ F(x) = H(x) - x $$

During the forward pass, the input $ x $ is added to this residual function, rendering the actual output:

$$ H(x) = F(x) + x $$

Stochastic Depth and the Emergence of $ \beta $

Stochastic depth is a regularization technique designed to improve convergence and generalization in deep networks. The essence of this method is the random "dropping" or "bypassing" of layers during the training phase.

Originally, stochastic depth utilized a binary random value to scale a layer's output. However, the innovation of introducing $ \beta $ generalized this binary random value into a continuous, trainable parameter. This allows the network to dynamically adjust the significance of each layer's output during the training process.

Layer Scaling with $ \beta $

Layer scaling is based on the proposition that permitting the network to modify the significance or contribution of specific layers or blocks can optimize the training process. For a given layer's output $ F(x) $, scaling introduces the factor $ \beta $:

$$ \text{Scaled Output} = \beta \cdot F(x) $$

When tune_beta is set to True, the network considers $ \beta $ as a learnable parameter for each layer in the residual block. This parameter scales the layer's output prior to activation. Conversely, when set to False, $ \beta $ is fixed at a value of 1, ensuring the layer's output remains unchanged.
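As a concrete illustration, here is a minimal sketch of such a $\beta$-scaled residual block (the names are illustrative; the actual ResBlock used later in this post follows the same pattern):

import torch
from torch import nn

class BetaScaledBlock(nn.Module):
    """Sketch: a residual block whose per-layer outputs are scaled by beta."""
    def __init__(self, num_layers=3, width=16, tune_beta=True):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(width, width) for _ in range(num_layers)])
        self.act = nn.Tanh()
        if tune_beta:
            # One learnable scale per layer, initialised at 1
            self.beta = nn.Parameter(torch.ones(num_layers, 1))
        else:
            # Fixed at 1: the block reduces to a plain residual block
            self.register_buffer("beta", torch.ones(num_layers, 1))

    def forward(self, x):
        identity = x
        for i, layer in enumerate(self.layers):
            # Scaled pre-activation plus the skip connection: H(x) = F(x) + x
            x = self.act(self.beta[i] * layer(x)) + identity
        return x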

Fourier Features in Machine Learning

Fourier analysis provides a way to express functions in terms of a combination of sine and cosine functions. This powerful technique asserts that, irrespective of a function's complexity, it can be represented as a sum of simpler sinusoids, each with a distinct frequency and amplitude.

Fourier Transform: A Bridge to the Frequency Domain

The Fourier transform is an essential operation that takes a function from its original domain (typically time or space) and maps it to the frequency domain. Formally, for a function $ f(t) $, its Fourier transform $ F(\omega) $ is given by:

$$ F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-j\omega t} dt $$

The inverse process, which recovers $ f(t) $ from $ F(\omega) $, is achieved using the inverse Fourier transform:

$$ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega) e^{j\omega t} d\omega $$

Fourier Features in Neural Networks

When deploying neural networks for tasks like regression on functions with periodic patterns, directly using raw inputs might not capture the inherent oscillatory behavior. Fourier features come to the rescue by projecting these inputs into a higher-dimensional space, enhancing the network's expressive power.

Given an input $ x $, the Fourier feature mapping can be defined as:

$$ \phi(x) = \left[ \sin(2\pi B x), \cos(2\pi B x) \right] $$

where $B$ is a matrix of learned or fixed frequencies. This mapping produces a sinusoidal embedding for each input dimension, amplifying the model's capacity to learn periodic patterns.
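As a quick, self-contained sketch of this mapping (with a fixed random $B$ rather than a learned one):

import torch

def fourier_features(x, B):
    # x: [N, 1], B: [1, m]  ->  output: [N, 2*m]
    proj = 2 * torch.pi * x @ B
    return torch.cat([torch.sin(proj), torch.cos(proj)], dim=1)

B = torch.randn(1, 10)                      # 10 fixed random frequencies
x = torch.linspace(0., 1., 5).unsqueeze(1)  # 5 sample inputs
print(fourier_features(x, B).shape)         # torch.Size([5, 20])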

Fourier Features and the Utility of $ \beta $

Fourier features, often used to transform input data, can amplify the expressive power of neural networks, particularly for functions that exhibit periodic patterns. These features essentially project the input data into a higher-dimensional space, making it easier for the network to learn complex, oscillatory functions.

When used in conjunction with $ \beta $, the neural network can dynamically decide the importance of Fourier-transformed features versus the original features. By allowing the network to learn the optimal scaling factor $ \beta $, it can effectively balance between the raw and transformed features, leading to improved convergence and better generalization to unseen data.

Benefits in Machine Learning

  1. Expressiveness: With Fourier features, neural networks can learn intricate, oscillatory functions with fewer parameters.
  2. Generalization: By emphasizing periodic patterns, models might generalize better to unseen data exhibiting similar behaviors.
  3. Interpolation: For tasks like image synthesis, Fourier features can lead to smoother interpolations between data points.

In essence, Fourier features offer a bridge, allowing neural networks to tap into the rich world of frequency-domain information, thus enhancing their learning capabilities in certain tasks.

Let's demonstrate the working of Fourier features using a simple 1D regression example.

This snippet is a simple example that should help you understand Fourier features.

We'll first generate some synthetic data that follows a sine wave pattern. We'll then try to fit a neural network to this data using both the raw input features and the Fourier-transformed input features. By comparing the two fits, we can visualize the impact of the Fourier features.

In [14]:
# Generate synthetic data
x = np.linspace(0, 4 * np.pi, 500)[:, None]
y = np.sin(x)

# Convert to PyTorch tensors and move them to the device
x_tensor = torch.tensor(x, dtype=torch.float32).to(device)
y_tensor = torch.tensor(y, dtype=torch.float32).to(device)

# Define a simple feedforward neural network
class SimpleNN(nn.Module):
    def __init__(self, input_dim):
        super(SimpleNN, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )
        
    def forward(self, x):
        return self.fc(x)

# Fourier transformation
B = torch.randn(1, 10).to(device)
x_fourier = torch.cat([torch.sin(2 * np.pi * x_tensor @ B), torch.cos(2 * np.pi * x_tensor @ B)], dim=1)

# Training function
def train_model(steps, model, x, y):
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = nn.MSELoss()(model(x), y)
        loss.backward()
        optimizer.step()
    return model(x)

# Training steps
steps_list = [10, 30, 4000]

# Initialize and move the models to the device
model_fourier = SimpleNN(20).to(device)
model_plain = SimpleNN(1).to(device)

# Collect predictions
predictions_fourier = []
predictions_plain = []

for steps in steps_list:
    predictions_fourier.append(train_model(steps, model_fourier, x_fourier, y_tensor))
    predictions_plain.append(train_model(steps, model_plain, x_tensor, y_tensor))

# Plotting
fig, axs = plt.subplots(3, 1, figsize=(10, 12))

for i, (steps, y_pred_f, y_pred_p) in enumerate(zip(steps_list, predictions_fourier, predictions_plain)):
    axs[i].plot(x, y, label="True Function", linestyle='dashed')
    axs[i].plot(x, y_pred_f.cpu().detach().numpy(), label=f"Fourier Features after {steps} steps")
    axs[i].plot(x, y_pred_p.cpu().detach().numpy(), label=f"Without Fourier Features after {steps} steps")
    axs[i].legend()
    axs[i].set_title(f'After {steps} Training Steps')

plt.tight_layout()
plt.show()
[Figure: fits with and without Fourier features after 10, 30, and 4000 training steps.]

Analyzing the Fourier-feature code line by line

Step 1: Initialisation of the vector of random frequencies

B = torch.randn(1, 10).to(device)

Here, we're creating a tensor $B$ of shape (1, 10) filled with random values sampled from a normal distribution. These random values serve as the frequencies for the Fourier features. Represented mathematically, this tensor is:

$$ B = \begin{bmatrix} b_1 & b_2 & \dots & b_{10} \end{bmatrix} $$

where each $b_i$ is a random frequency.

Step 2: Fourier Transformation - projection

Given an input tensor $x_{\text{tensor}}$, we use the Fourier transformation to project it into a higher-dimensional space using sinusoids.

x_fourier = torch.cat([torch.sin(2 * np.pi * x_tensor @ B), torch.cos(2 * np.pi * x_tensor @ B)], dim=1)

Here, we're computing the sine of the product of the input tensor and the frequency tensor $B$. The $@$ symbol represents matrix multiplication. The result is a tensor of sine values corresponding to each frequency in $B$. Mathematically:

$$ x_{\text{sin}} = \left[\sin(2\pi x_1 b_1), \sin(2\pi x_1 b_2), \ldots, \sin(2\pi x_1 b_{10})\right] $$

Similarly, we compute the cosine of the product of the input tensor and the frequency tensor. This results in a tensor of cosine values. Represented mathematically:

$$ x_{\text{cos}} = \left[\cos(2\pi x_1 b_1), \cos(2\pi x_1 b_2), \ldots, \cos(2\pi x_1 b_{10})\right] $$

Step 3: Concatenation

Finally, the sine and cosine transformed values are concatenated along dimension 1 to form the Fourier-transformed input:

$$ x_{\text{fourier}} = [x_{\text{sin}}, x_{\text{cos}}] $$

This transformed input, $x_{\text{fourier}}$, can then be used in subsequent layers of the neural network, enabling it to capture oscillatory patterns more effectively.

In conclusion, Fourier features, as implemented in the provided code, leverage the power of sinusoidal transformations to enrich the input space of neural networks. This often results in enhanced performance, especially in tasks where the data or the solution has inherent oscillatory patterns or periodicities. For example, one of the challenges in using PINNs for PDEs is enforcing boundary and initial conditions. Fourier features, due to their inherent oscillatory nature, can provide a better fit near boundaries and initial conditions, ensuring that the neural network respects these constraints.

Dealing with High-Frequency Signals

One of the inherent strengths of Fourier features is their ability to handle high-frequency signals. The reason lies in the fundamental principles of Fourier analysis. High-frequency signals, by definition, oscillate rapidly. The Fourier transform, which decomposes a function into its constituent sinusoids, is adept at capturing these rapid oscillations by representing them in terms of their sine and cosine components. Thus, when the input data has high-frequency variations, the Fourier features can effectively represent these oscillations in the transformed space, ensuring that the neural network doesn't miss out on these critical patterns.

In tasks involving PDEs solved by PINNs, this becomes especially relevant. Many solutions to PDEs can exhibit rapid changes or oscillatory behaviors, especially in complex domains or under specific boundary conditions. Fourier features ensure that these behaviors are captured and represented adequately, enhancing the ability of the PINN to approximate the solution more accurately.

In short, Fourier features enrich the input space with sinusoidal transformations, which pays off whenever the data or the solution contains oscillatory, high-frequency, or periodic components.

The NN architecture implemented here combines $\beta$-tuning and Fourier features, both of which facilitate the training process significantly.

In [3]:
seed = 123
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)  # if you have more than one GPU
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False  # disable autotuning so runs stay reproducible

def analytical_sol(x, t, y):
    x_plot = x.squeeze(1)
    t_plot = t.squeeze(1)
    X, T = torch.meshgrid(x_plot, t_plot)
    F_xt = y
    fig = plt.figure(figsize=(25, 12))
    fig.suptitle('Analytical Solution of the Heat Equation', fontsize=30)

    ax1 = fig.add_subplot(121, projection='3d')
    ax2 = fig.add_subplot(122)

    ax1.plot_surface(T.numpy(), X.numpy(), F_xt.numpy(), cmap="twilight")
    ax1.set_title('U(x,t)')
    ax1.set_xlabel('t')
    ax1.set_ylabel('x')

    # Set the 3D viewing angle
    ax1.view_init(elev=45, azim=45)  # adjust 'elev' for the desired elevation

    cp = ax2.contourf(T, X, F_xt, 20, cmap="twilight")
    fig.colorbar(cp)  # Add a colorbar to a plot
    ax2.set_title('U(x,t)')
    ax2.set_xlabel('t')
    ax2.set_ylabel('x')
    plt.show()

def approx_sol(x,t,y):
    X,T= x,t
    F_xt = y
    fig = plt.figure(figsize=(25,15))
    fig.suptitle('NN Approximated Solution of the Heat Equation', fontsize=30)
    ax1 = fig.add_subplot(121, projection='3d')
    ax2 = fig.add_subplot(122)

    ax1.plot_surface(T.numpy(), X.numpy(), F_xt.numpy(),cmap="twilight")
    ax1.set_title('U(x,t)')
    ax1.set_xlabel('t')
    ax1.set_ylabel('x')
    # Set the 3D viewing angle
    ax1.view_init(elev=45, azim=45)  # adjust 'elev' for the desired elevation

    cp = ax2.contourf(T,X, F_xt,20,cmap="twilight")
    fig.colorbar(cp) # Add a colorbar to a plot
    ax2.set_title('U(x,t)')
    ax2.set_xlabel('t')
    ax2.set_ylabel('x')
    plt.show()


def contour_comparison_with_errors(x, t, y_analytical, y_approximated):
    x_plot = x.squeeze(1)
    t_plot = t.squeeze(1)
    X, T = torch.meshgrid(x_plot, t_plot)
    
    error_matrix = y_analytical - y_approximated
    pointwise_error = torch.abs(error_matrix)  # pointwise absolute error
    frobenius_norm = torch.norm(error_matrix, p='fro').item()
    
    fig = plt.figure(figsize=(35, 12))
    fig.suptitle('Comparison of Analytical and NN Approximated Solutions', fontsize=30)

    # Analytical Solution
    ax1 = fig.add_subplot(131)
    cp1 = ax1.contourf(T, X, y_analytical, 20, cmap="twilight")
    fig.colorbar(cp1, ax=ax1)
    ax1.set_title('Analytical U(x,t)')
    ax1.set_xlabel('t')
    ax1.set_ylabel('x')

    # Approximated Solution
    ax2 = fig.add_subplot(132)
    cp2 = ax2.contourf(T, X, y_approximated, 20, cmap="twilight")
    fig.colorbar(cp2, ax=ax2)
    ax2.set_title('Approximated U(x,t)')
    ax2.set_xlabel('t')
    ax2.set_ylabel('x')

    # Error Visualization using the Euclidean norm for each point
    ax3 = fig.add_subplot(133)
    cp3 = ax3.contourf(T, X, pointwise_error, 20, cmap="inferno")
    fig.colorbar(cp3, ax=ax3)
    ax3.set_title(f'Pointwise Euclidean Error\nFrobenius Norm: {frobenius_norm:.4f}')
    ax3.set_xlabel('t')
    ax3.set_ylabel('x')
    
    plt.show()

We will approximate the heat equation on the box [0, 1] x [0, 2] (space x time); as a first step we render the analytical solution of the equation in question from Mathematica. The evaluation grid has [x, t] = [200, 100] points; for training we use 10000 collocation points for the residual loss, and 40 points each for the initial and boundary conditions.

In [4]:
# We will approximate the heat equation on the box [0, 1] x [0, 2]:
x_min=0.
x_max=1.
t_min=0.
t_max=2.
#Collocation discretisation of the box, i.e. for the residual term

total_points_x=200
total_points_t=100

#Create mesh
x=torch.linspace(x_min,x_max,total_points_x).view(-1,1) #add dimension
print(f'x shape is {x.shape}')
t=torch.linspace(t_min,t_max,total_points_t).view(-1,1)

# Let's define the analytical solution of our heat equation. It is an approximation made up
# of the first 30 terms, obtained with Mathematica as shown above.

def u(x, t):
    pi = torch.pi
    exp = torch.exp
    sin = torch.sin
    cos = torch.cos

    expression = (
        x * cos(2 * t) +
        0.750034 * exp(-0.98696 * t) * sin(pi * x) +
        0.0126985 * exp(-3.94784 * t) * sin(2 * pi * x) +
        0.0541309 * exp(-8.88264 * t) * sin(3 * pi * x) +
        0.0253758 * exp(-15.7914 * t) * sin(4 * pi * x) +
        0.0210899 * exp(-24.674 * t) * sin(5 * pi * x) +
        0.0149057 * exp(-35.5306 * t) * sin(6 * pi * x) +
        0.0123551 * exp(-48.3611 * t) * sin(7 * pi * x) +
        0.00983617 * exp(-63.1655 * t) * sin(8 * pi * x) +
        0.00840252 * exp(-79.9438 * t) * sin(9 * pi * x) +
        0.00707543 * exp(-98.696 * t) * sin(10 * pi * x) +
        0.00619775 * exp(-119.422 * t) * sin(11 * pi * x) +
        0.00539475 * exp(-142.122 * t) * sin(12 * pi * x) +
        0.00481634 * exp(-166.796 * t) * sin(13 * pi * x) +
        0.00428605 * exp(-193.444 * t) * sin(14 * pi * x) +
        0.00388256 * exp(-222.066 * t) * sin(15 * pi * x) +
        0.00351044 * exp(-252.662 * t) * sin(16 * pi * x) +
        0.00321628 * exp(-285.232 * t) * sin(17 * pi * x) +
        0.00294317 * exp(-319.775 * t) * sin(18 * pi * x) +
        0.00272112 * exp(-356.293 * t) * sin(19 * pi * x) +
        0.00251363 * exp(-394.784 * t) * sin(20 * pi * x) +
        0.00234125 * exp(-435.25 * t) * sin(21 * pi * x) +
        0.00217921 * exp(-477.689 * t) * sin(22 * pi * x) +
        0.00204226 * exp(-522.102 * t) * sin(23 * pi * x) +
        0.00191284 * exp(-568.489 * t) * sin(24 * pi * x) +
        0.00180193 * exp(-616.85 * t) * sin(25 * pi * x) +
        0.00169662 * exp(-667.185 * t) * sin(26 * pi * x) +
        0.00160532 * exp(-719.494 * t) * sin(27 * pi * x) +
        0.00151825 * exp(-773.777 * t) * sin(28 * pi * x) +
        0.00144203 * exp(-830.034 * t) * sin(29 * pi * x) +
        1. * (
            sin(2. * t) * (
                0.252637 * sin(pi * x) -
                0.128324 * sin(2 * pi * x) +
                0.0454747 * sin(3 * pi * x) -
                0.019839 * sin(4 * pi * x) +
                0.0102531 * sin(5 * pi * x) -
                0.00595364 * sin(6 * pi * x) +
                0.00375469 * sin(7 * pi * x) -
                0.00251713 * sin(8 * pi * x) +
                0.00176852 * sin(9 * pi * x) -
                0.00128953 * sin(10 * pi * x) +
                0.00096897 * sin(11 * pi * x) -
                0.000746415 * sin(12 * pi * x) +
                0.000587108 * sin(13 * pi * x) -
                0.000470089 * sin(14 * pi * x) +
                0.000382209 * sin(15 * pi * x) -
                0.000314937 * sin(16 * pi * x) +
                0.000262568 * sin(17 * pi * x) -
                0.000221195 * sin(18 * pi * x) +
                0.000188077 * sin(19 * pi * x) -
                0.000161254 * sin(20 * pi * x) +
                0.000139297 * sin(21 * pi * x) -
                0.000121153 * sin(22 * pi * x) +
                0.000106028 * sin(23 * pi * x) -
                0.0000933193 * sin(24 * pi * x) +
                0.0000825631 * sin(25 * pi * x) -
                0.0000733984 * sin(26 * pi * x) +
                0.0000655414 * sin(27 * pi * x) -
                0.000058767 * sin(28 * pi * x) +
                0.0000528949 * sin(29 * pi * x) -
                0.0000477798 * sin(30 * pi * x)
            ) +
            cos(2. * t) * (
                -0.511949 * sin(pi * x) +
                0.0650094 * sin(2 * pi * x) -
                0.010239 * sin(3 * pi * x) +
                0.00251264 * sin(4 * pi * x) -
                0.000831087 * sin(5 * pi * x) +
                0.000335128 * sin(6 * pi * x) -
                0.000155277 * sin(7 * pi * x) +
                0.0000796995 * sin(8 * pi * x) -
                0.0000442442 * sin(9 * pi * x) +
                0.0000261314 * sin(10 * pi * x) -
                0.0000162276 * sin(11 * pi * x) +
                0.0000105038 * sin(12 * pi * x) -
                7.03982e-6 * sin(13 * pi * x) +
                4.8602e-6 * sin(14 * pi * x) -
                3.4423e-6 * sin(15 * pi * x) +
                2.49295e-6 * sin(16 * pi * x) -
                1.84109e-6 * sin(17 * pi * x) +
                1.38344e-6 * sin(18 * pi * x) -
                1.05574e-6 * sin(19 * pi * x) +
                8.1692e-7 * sin(20 * pi * x) -
                6.40081e-7 * sin(21 * pi * x) +
                5.07247e-7 * sin(22 * pi * x) -
                4.06158e-7 * sin(23 * pi * x) +
                3.28306e-7 * sin(24 * pi * x) -
                2.67692e-7 * sin(25 * pi * x) +
                2.20024e-7 * sin(26 * pi * x) -
                1.82187e-7 * sin(27 * pi * x) +
                1.51896e-7 * sin(28 * pi * x) -
                1.27452e-7 * sin(29 * pi * x) +
                1.0758e-7 * sin(30 * pi * x)
            )
        ) +
        0.00136908 * exp(-888.264 * t) * sin(30 * pi * x)
    )

    return expression

# Create the mesh 
X,T=torch.meshgrid(x.squeeze(1),t.squeeze(1))#[200,100]
print(x.shape)#[200,1]
print(t.shape)#[100,1]
print(X.shape)#[200,100]
print(T.shape)#[200,100]

x_min_tens = torch.Tensor([x_min]) 
x_max_tens = torch.Tensor([x_max]) 
#To get analytical solution obtained via MATHEMATICA
f_real = lambda x, t:  u(x, t)
# Evaluate real solution on the box domain
U_real=f_real(X,T)
print(f'U_real shape is: {U_real.shape}')
analytical_sol(x,t,U_real) #f_real was defined previously(function)
x shape is torch.Size([200, 1])
torch.Size([200, 1])
torch.Size([100, 1])
torch.Size([200, 100])
torch.Size([200, 100])
U_real shape is: torch.Size([200, 100])
/home/lucy/anaconda3/envs/lucy/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525541990/work/aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[Figure: analytical solution of the heat equation, surface and contour plots.]

The class Feeder() manages the mesh together with the boundary and initial conditions. We have a grid (x, t) for $u(x,t)$, so we flatten it in each dimension and concatenate the resulting long vectors into a two-column matrix. Finally, we sample from it, either for the collocation points or for the boundary conditions. Everything is handled by the Feeder class; a minimal sketch of the flatten-and-concatenate step follows.
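Here is that sketch, with small illustrative sizes (Feeder additionally transposes the grids before flattening, but the idea is the same):

import torch

x = torch.linspace(0., 1., 5)
t = torch.linspace(0., 2., 4)
X, T = torch.meshgrid(x, t, indexing="ij")       # [5, 4] each
pairs = torch.hstack((X.flatten()[:, None],      # [20, 1]
                      T.flatten()[:, None]))     # -> [20, 2] rows of (x_i, t_j)
print(pairs.shape)                               # torch.Size([20, 2])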

In [5]:
class Feeder():
    def __init__(self, N_IC, N_BC, x_min, x_max, t_min, t_max, tot_pts_x, tot_pts_t, real_func, 
                 left_boundary_func, bottom_boundary_func, top_boundary_func):
        # Create mesh
        self.x = torch.linspace(x_min, x_max, tot_pts_x).view(-1,1)
        self.t = torch.linspace(t_min, t_max, tot_pts_t).view(-1,1)
        # Create the mesh
        self.X, self.T = torch.meshgrid(self.x.squeeze(1), self.t.squeeze(1))  # [200,100]
        # Set the functions as attributes of the class
        self.real_func = real_func
        self.left_boundary_func = left_boundary_func
        self.bottom_boundary_func = bottom_boundary_func
        self.top_boundary_func = top_boundary_func
        self.N_IC = N_IC
        self.N_BC = N_BC

    def func_real(self):
        self.U_real = self.real_func(self.X, self.T)
        return self.x, self.t, self.U_real

    def get_tensors(self, N_residual):
        # Call func_real() with the provided real_func
        self.func_real()

        # Prepare testing data
        x_test = torch.hstack((self.X.T.flatten()[:, None], self.T.T.flatten()[:, None]))  # [200x100 -> 20000,2]
        U_test = self.U_real.T.flatten()[:, None]  # [200x100 -> 20000,1]
        # Domain bounds
        lb = x_test[0]  # first value of the mesh
        ub = x_test[-1]  # last value of the mesh

        # Left Edge: U(x,0) -> xmin=<x=<xmax; t=0
        left_X = torch.hstack((self.X[:, 0][:, None], self.T[:, 0][:, None]))  # [200,2]
        left_U = self.left_boundary_func(left_X)  # [200,1]

        # Bottom Edge: U(x=0,t) = 0; t_min =< t =< t_max
        bottom_X = torch.hstack([self.X[0, :][:, None], self.T[0, :][:, None]])  # [100,2]
        bottom_U = self.bottom_boundary_func(bottom_X)  # [100,1]

        # Top Edge: U(x=1,t) = cos(2t); t_min =< t =< t_max
        top_X = torch.hstack((self.X[-1, :][:, None], self.T[-1, :][:, None]))  # [100,2]
        top_U = self.top_boundary_func(top_X)  # [100,1]

        # Sample N_IC points from the initial condition
        idx = np.random.choice(left_X.shape[0], self.N_IC, replace=False)
        X_IC = left_X[idx, :]  # [N_IC,2]
        U_IC = left_U[idx, :]  # [N_IC,1]

        # Stack all boundaries vertically
        X_BC = torch.vstack([bottom_X, top_X])  # [2*tot_pts_t,2]
        U_BC = torch.vstack([bottom_U, top_U])  # [2*tot_pts_t,1]

        # Sample N_BC points from the boundaries
        idx = np.random.choice(X_BC.shape[0], self.N_BC, replace=False)
        X_BC = X_BC[idx, :]  # [N_BC,2]
        U_BC = U_BC[idx, :]  # [N_BC,1]

        # Stack initial and boundary points vertically
        X_train_boundaries = torch.vstack([X_IC, X_BC])  # [N_IC+N_BC,2]
        U_train_boundaries = torch.vstack([U_IC, U_BC])  # [N_IC+N_BC,1]

        # Collocation points (Latin hypercube) to evaluate the PDE residual
        X_train_residual = lb + (ub - lb) * torch.tensor(lhs(2, N_residual)).float()  # [N_residual,2]
        X_train_total = torch.vstack((X_train_residual, X_train_boundaries))  # [N_residual+N_IC+N_BC,2]

        # Move tensors to the GPU
        X_train_boundaries = X_train_boundaries.float().to(device)  # Training points (IC/BC)
        U_train_boundaries = U_train_boundaries.float().to(device)  # Training targets (IC/BC)
        X_train_total = X_train_total.float().to(device)  # Collocation points
        U_hat = torch.zeros(X_train_total.shape[0], 1).to(device)  # [N_residual+N_IC+N_BC,1]

        return lb, ub, x_test, U_test, X_train_boundaries, U_train_boundaries, X_train_total, U_hat
In [6]:
def initial_condition_function(left_X):
    return torch.sqrt(left_X[:, 0]).unsqueeze(1)

def bottom_boundary_function(bottom_X):
    return torch.zeros(bottom_X.shape[0], 1)

def top_boundary_function(top_X):
    return torch.cos(2. * top_X[:, 1]).unsqueeze(1)

# Assuming you have a real_func defined somewhere
dane = Feeder(N_IC=40, N_BC=40, x_min=x_min, x_max=x_max, t_min=t_min, t_max=t_max, tot_pts_x=total_points_x, tot_pts_t=total_points_t, real_func=f_real,
                left_boundary_func=initial_condition_function, 
                bottom_boundary_func=bottom_boundary_function, 
                top_boundary_func=top_boundary_function)
# We will use 10000 collocation points
lb, ub, x_test, U_test, X_train_boundaries, U_train_boundaries, X_train_total, U_hat  = dane.get_tensors(10000)
print('x_test_shape: ', x_test.shape)
print('U_test_shape: ', U_test.shape)
print('X_train_boundaries_shape: ', X_train_boundaries.shape)
print('U_train_boundaries_shape: ', U_train_boundaries.shape)
print('X_train_total_shape: ', X_train_total.shape)
print('U_hat_shape: ',  U_hat.shape)
x_test_shape:  torch.Size([20000, 2])
U_test_shape:  torch.Size([20000, 1])
X_train_boundaries_shape:  torch.Size([80, 2])
U_train_boundaries_shape:  torch.Size([80, 1])
X_train_total_shape:  torch.Size([10080, 2])
U_hat_shape:  torch.Size([10080, 1])
In [5]:
class ResBlock(nn.Module):
    def __init__(self, num_layers, num_neurons, activation, tune_beta):
        super(ResBlock, self).__init__()
        
        # Create a list of linear layers with num_layers elements
        self.layers = nn.ModuleList([nn.Linear(num_neurons, num_neurons) for _ in range(num_layers)])
        self.activation = activation

        # If tune_beta is True, initialize beta for each layer in the ResBlock as learnable parameters
        if tune_beta:
            self.beta = nn.Parameter(torch.ones(num_layers, 1))
        else:
            # If tune_beta is False, set beta to a tensor of ones
            self.beta = torch.ones(num_layers, 1).to(device)

    def forward(self, x):
        identity = x
        for idx, layer in enumerate(self.layers):
            # Apply the activation function to the output of each layer scaled by beta
            x = self.activation(self.beta[idx] * layer(x)) + identity
        return x

class DenseResNet(nn.Module):
    def __init__(self, dim_in, dim_out, num_resnet_blocks,
                 num_layers_per_block, num_neurons, activation,
                 fourier_features, m_freqs, sigma, tune_beta, lb, ub):
        super(DenseResNet, self).__init__()
        self.fourier_features = fourier_features
        self.activation = activation
        self.lb = lb
        self.ub = ub

        if fourier_features:
            # Initialize learnable parameter B for Fourier features
            self.B = nn.Parameter(sigma * torch.randn(dim_in, m_freqs).to(device))
            dim_in = 2 * m_freqs

        # Define the first layer as a linear layer followed by activation
        self.first = nn.Sequential(
            nn.Linear(dim_in, num_neurons),
            activation,
        )

        # Create a list of ResBlock modules, each with num_layers_per_block layers
        # Pass the tune_beta flag to each ResBlock
        self.resblocks = nn.ModuleList([
            ResBlock(num_layers_per_block, num_neurons, activation, tune_beta)
            for _ in range(num_resnet_blocks)
        ])

        # Define the last layer as a linear layer
        self.last = nn.Linear(num_neurons, dim_out)

    def forward(self, x):
        ub = self.ub.float().to(device)
        lb = self.lb.float().to(device)
        
        # Perform feature scaling to the input
        x = (x - lb) / (ub - lb)

        if self.fourier_features:
            # Compute the cosine and sine components of the Fourier features
            cosx = torch.cos(torch.matmul(x, self.B)).to(device)
            sinx = torch.sin(torch.matmul(x, self.B)).to(device)
            
            # Concatenate the cosine and sine components along dimension 1
            x = torch.cat((cosx, sinx), dim=1)

        # Forward pass through the first layer
        x = self.first(x)
        
        # Forward pass through each ResBlock in the list
        for resblock in self.resblocks:
            x = resblock(x)

        # Forward pass through the last layer
        out = self.last(x)
        return out

To visualise the NN architecture

In [18]:
# lb = torch.tensor([0.0, 0.0]).to(device)  # Example lb
# ub = torch.tensor([1.0, 1.0]).to(device)  # Example ub

# Fourier_dense_net1 = DenseResNet(dim_in=2, dim_out=1, num_resnet_blocks=3, 
#                         num_layers_per_block=3, num_neurons=25, activation=nn.LeakyReLU(0.65),
#                         fourier_features=True, m_freqs=30, sigma=2*torch.pi, tune_beta=True, lb=lb, ub=ub).to(device)

# # PINN = Fourier_dense_net1.to(device) 
# x = torch.randn(2,2).to(device).requires_grad_(True)
# y = Fourier_dense_net1(x)
# #make_dot(y, params=dict(list(PINN.named_parameters()))).render("Residual_net", format="png")
# make_dot(y, params=dict(list(Fourier_dense_net1.named_parameters()))).render("Residual_PINN", format="svg")

The architecture of the NN is as follows: image info, or in details.

Below are helper classes used for training.

In [23]:
class EarlyStopping:
    def __init__(self, patience=30, verbose=True, delta=0):
        # Initialize EarlyStopping with patience, verbosity, and delta
        self.patience = patience  # Number of steps with no improvement to wait before stopping
        self.verbose = verbose    # Whether to print messages when early stopping occurs
        self.test_loss_history = []  # Use a list to store the history of test losses
        self.delta = delta        # Minimum change in test loss to be considered an improvement

    def __call__(self, test_loss):
        # Calculate a score based on the negative of the test loss (lower loss is better)
        score = -test_loss
        self.test_loss_history.append(score)  # Append the score to the history of test losses

        # If we have more than `patience` results, then look back `patience` results
        if len(self.test_loss_history) > self.patience:
            # Check if the current score is better than the score from `patience` steps ago
            if score < self.test_loss_history[-self.patience] + self.delta:
                if self.verbose:
                    print(f"Early stopping due to no improvement over the last {self.patience} steps.")
                return True  # Early stop condition is met
        return False  # Continue training
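A quick usage sketch with made-up loss values (the class is called once per evaluation and returns True when training should stop):

stopper = EarlyStopping(patience=3, verbose=True)
for loss in [1.0, 0.9, 0.95, 0.97, 0.96, 0.98]:
    if stopper(loss):
        break  # stops once the loss is no better than it was `patience` checks ago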

The solving of forward and inverse problems using Physics-Informed Neural Networks (PINNs) involves creating a neural network that can learn to approximate the solutions to the given partial differential equations (PDEs) while respecting any known physics (i.e., the differential equations and boundary/initial conditions).

Forward Problem Algorithm:

For the forward problem, the goal is to find the function $u$ for given data $f$ with specified parameters $\omega$. A typical approach for solving both the forward and inverse problems with PINNs, as seen in the literature, is outlined in [source]. Here is a simplified version of the forward algorithm (a minimal residual-loss sketch follows the list):

  1. Initialize Neural Network: Set up a neural network with an appropriate architecture.
  2. Define Loss Function: The loss function incorporates a residual term from the PDE and possibly boundary/initial conditions.
  3. Train Neural Network: Train the neural network to minimize the loss function using optimization algorithms like Adam or L-BFGS.
  4. Evaluate: Evaluate the trained network to obtain the solution $u$ over the domain of interest.
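To make step 2 concrete, here is a minimal, self-contained sketch of the PDE residual loss for $u_t = 0.1\,u_{xx}$ on a batch of collocation points (the stand-in `net` and the point sampling are illustrative; the full version lives in the Fabryka class below):

import torch
from torch import nn
import torch.autograd as autograd

net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))  # stand-in for the PINN

g = torch.rand(256, 2, requires_grad=True)                # collocation points (x, t)
U = net(g)
grads = autograd.grad(U, g, torch.ones_like(U), create_graph=True)[0]   # [dU/dx, dU/dt]
U_t = grads[:, [1]]
U_xx = autograd.grad(grads[:, [0]], g, torch.ones_like(U), create_graph=True)[0][:, [0]]
residual = U_t - 0.1 * U_xx                               # residual of U_t = 0.1 U_xx
loss_PDE = nn.MSELoss()(residual, torch.zeros_like(residual))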
Inverse Problem Algorithm:

For the implementation, go here.

In the inverse problem, the aim is to deduce parameters $\gamma$ from the given data. One way to approach this is by extending the traditional PINN framework: the unknown parameters are incorporated into the neural network and optimized over as well. A Bayesian approach has also been proposed, where a Bayesian Neural Network (BNN) combined with a PINN serves as the prior, and techniques like Hamiltonian Monte Carlo (HMC) or Variational Inference (VI) serve as estimators [source]. Here is a simplified version of the algorithm (a sketch of the learnable-parameter setup follows the list):

  1. Initialize Neural Network and Parameters: Set up a neural network with an appropriate architecture and initialize the unknown parameters $\gamma$.
  2. Define Loss Function: The loss function incorporates a residual term from the PDE, boundary/initial conditions, and a term related to the discrepancy between the predicted and observed data.
  3. Optimize: Train the neural network and optimize the unknown parameters $\gamma$ to minimize the loss function.
  4. Evaluate: Evaluate the trained network and optimized parameters to obtain the solution $u$ and parameters $\gamma$ over the domain of interest.
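The key addition in the inverse setting is that the diffusion coefficient becomes a learnable parameter optimised jointly with the network weights. A minimal sketch (the names are illustrative; the actual implementation is the Fabryka_Inverse class later in the post):

import torch
from torch import nn

class InverseHeatModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
        self.alpha = nn.Parameter(torch.tensor([0.5]))  # unknown PDE coefficient, initial guess

model = InverseHeatModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # alpha is updated along with the weights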
In [24]:
X_test=x_test.float().to(device) # the input dataset (complete)
U_test=U_test.float().to(device) # the real solution


class Fabryka(nn.Module):
    def __init__(self, X_train_boundaries, U_train_boundaries, U_hat, X_train_total, total_pts_x, total_pts_t, dim_in, dim_out, num_resnet_blocks,
                 num_layers_per_block, num_neurons, activation,
                 fourier_features, m_freqs, sigma, tune_beta, lb, ub):
        super(Fabryka, self).__init__()

        self.X_train_boundaries = X_train_boundaries
        self.U_train_boundaries = U_train_boundaries
        self.U_hat = U_hat
        self.X_train_total = X_train_total
        self.total_pts_x = total_pts_x
        self.total_pts_t = total_pts_t

        self.loss_function = nn.MSELoss(reduction ='mean')
        # Initialize the DenseResNet inside FCN
        self.dense_res_net = DenseResNet(dim_in, dim_out, num_resnet_blocks,
                                         num_layers_per_block, num_neurons, activation,
                                         fourier_features, m_freqs, sigma, tune_beta, lb=lb, ub=ub)

    def forward_loss(self):
        x_BC= self.dense_res_net(self.X_train_boundaries)
        loss_BC = self.loss_function(x_BC, self.U_train_boundaries) #Loss
        g = self.X_train_total.clone()  # collocation points used for the residual
        g.requires_grad=True # enable differentiation
        U = self.dense_res_net(g)
        # DERVIVATIVES COMPUTATIONS
        U_x_t = autograd.grad(U,g,torch.ones([g.shape[0], 1]).to(device), retain_graph=True, create_graph=True)[0] #first derivative
        U_xx_tt = autograd.grad(U_x_t,g,torch.ones(g.shape).to(device), create_graph=True)[0] #second derivative
        U_t=U_x_t[:,[1]]    # we select the 2nd element for t (the first one is x) (Remember the input X=[x,t])
        U_xx=U_xx_tt[:,[0]] # we select the 1st element for x (the second one is t) (Remember the input X=[x,t])
        U=U_t - 0.1*U_xx    # PDE residual U_t - 0.1*U_xx
        loss_PDE = self.loss_function(U,self.U_hat) # U_hat = 0: the PDE is moved to the LHS, leaving 0 on the RHS
        loss = loss_BC + loss_PDE
        return loss

    def test_PINN(self, X_test):
      return self.dense_res_net(X_test)


    def __call__(self, steps, lr):

        optimizer = torch.optim.Adam(self.parameters(), lr=lr)

        LOSSES = []
        TEST_LOSSES = []
        ITERACJE = []
        #LEARNING_RATES = []  # Initialization
        iter = 0
        test_loss = torch.tensor([np.Inf]).to(device)
        U_anim = np.zeros((self.total_pts_x, self.total_pts_t, steps))

        early_stopping_test = EarlyStopping(patience=30, verbose=True)
        print('Iter|Training Losses|Test Losses')
        for i in range(steps):
            optimizer.zero_grad()
            loss = self.forward_loss()
            loss.backward()
            optimizer.step()

            # Store the current prediction for the animation
            U_anim[:, :, i] = self.dense_res_net(X_test).reshape(shape=[self.total_pts_t, self.total_pts_x]).transpose(1, 0).detach().cpu()
            iter += 1

            if i % 30 == 0:
                with torch.no_grad():
                    # Compute test loss
                    test_loss = self.loss_function(self.dense_res_net(X_test), U_test)
                    LOSSES.append(loss.detach().cpu().numpy())
                    TEST_LOSSES.append(test_loss.detach().cpu().numpy())
                    #current_lr = optimizer.param_groups[0]['lr']
                    #LEARNING_RATES.append(current_lr)
                    ITERACJE.append(iter)

                    # Early stopping check
                    if early_stopping_test(test_loss.detach().cpu()):
                        break
                
                print(iter-1, ': ', loss.detach().cpu().numpy(), '---', test_loss.detach().cpu().numpy())#, '--- LR:', current_lr)
        
        return U_anim, ITERACJE, LOSSES, TEST_LOSSES    
In [28]:
seed = 123
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)  # if you have more than one GPU
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False  # disable autotuning so runs stay reproducible

fcn_model = Fabryka(X_train_boundaries, U_train_boundaries, U_hat, X_train_total, total_points_x, total_points_t, dim_in=2, dim_out=1, num_resnet_blocks=3,
                        num_layers_per_block=3, num_neurons=25, activation=nn.ELU(0.7),#nn.LeakyReLU(0.65),
                        fourier_features=True, m_freqs=29, sigma=2, tune_beta=True, lb=lb, ub=ub)  # Fill in the other required arguments
PINN = fcn_model.to(device)
steps = 4000  # or whatever number of steps you want

learning_rate = 8e-5
U_anim, ITERACJE, LOSSES, TEST_LOSSES = PINN(steps, learning_rate)
Iter|Training Losses|Test Losses
0 :  0.468925 --- 0.28691
30 :  0.2621652 --- 0.113855176
60 :  0.20304663 --- 0.065380335
90 :  0.16770844 --- 0.050485965
120 :  0.13713512 --- 0.038412724
150 :  0.1098391 --- 0.02873213
180 :  0.086597115 --- 0.021036236
210 :  0.06781949 --- 0.015839787
240 :  0.053009707 --- 0.012410511
270 :  0.041080147 --- 0.010183584
300 :  0.031839874 --- 0.008684642
330 :  0.024757434 --- 0.0076582395
360 :  0.01974354 --- 0.006889947
390 :  0.016145233 --- 0.006218079
420 :  0.013526186 --- 0.0057333764
450 :  0.011637928 --- 0.005431896
480 :  0.009868612 --- 0.005092977
510 :  0.008498156 --- 0.004778018
540 :  0.0074970415 --- 0.004501841
570 :  0.0067845127 --- 0.0042877793
600 :  0.006226059 --- 0.004070964
630 :  0.005765113 --- 0.0038868117
660 :  0.0054019545 --- 0.0037612708
690 :  0.0050892504 --- 0.0035976954
720 :  0.004776342 --- 0.003446025
750 :  0.0045442753 --- 0.0033423544
780 :  0.0043516154 --- 0.00321678
810 :  0.004199932 --- 0.0031565323
840 :  0.004042793 --- 0.0030986436
870 :  0.003882287 --- 0.0030353274
900 :  0.003710757 --- 0.0029827754
930 :  0.0035694437 --- 0.0029258812
960 :  0.0034092527 --- 0.002865794
990 :  0.0032419157 --- 0.0028174834
1020 :  0.003149318 --- 0.0027772353
1050 :  0.0030456046 --- 0.0027627475
1080 :  0.0028909561 --- 0.0027378497
1110 :  0.0028413073 --- 0.0027280853
1140 :  0.0027200207 --- 0.0027100479
1170 :  0.0026494698 --- 0.002683589
1200 :  0.0025702505 --- 0.002667365
1230 :  0.0025124962 --- 0.0026474625
1260 :  0.002448263 --- 0.0026071856
1290 :  0.002377959 --- 0.0025808895
1320 :  0.0023367987 --- 0.0025443875
1350 :  0.002270943 --- 0.0025166022
1380 :  0.0022320156 --- 0.0024998365
1410 :  0.002181276 --- 0.0024441583
1440 :  0.0021435595 --- 0.0024084044
1470 :  0.0020856662 --- 0.0023469864
1500 :  0.0020219139 --- 0.0022983577
1530 :  0.0019888368 --- 0.0022292216
1560 :  0.0019703903 --- 0.0021512657
1590 :  0.001951131 --- 0.0020587128
1620 :  0.0019260589 --- 0.0019524704
1650 :  0.0019119659 --- 0.0018759979
1680 :  0.0018765854 --- 0.0017755636
1710 :  0.0018453137 --- 0.0016931132
1740 :  0.0017868083 --- 0.001613018
1770 :  0.0017254128 --- 0.0015507904
1800 :  0.0016898145 --- 0.0014790513
1830 :  0.001650763 --- 0.0014157251
1860 :  0.0016297187 --- 0.0013635923
1890 :  0.0015918473 --- 0.0013193875
1920 :  0.001563162 --- 0.0012664263
1950 :  0.0015677896 --- 0.0012403495
1980 :  0.0016253946 --- 0.0012094359
2010 :  0.0016668786 --- 0.0011979443
2040 :  0.0016486959 --- 0.0011799183
2070 :  0.001655466 --- 0.0011638915
2100 :  0.0016211409 --- 0.0011522169
2130 :  0.0016085415 --- 0.0011521807
2160 :  0.0015830229 --- 0.0011525482
2190 :  0.0015499955 --- 0.0011422847
2220 :  0.0014853115 --- 0.0011327434
2250 :  0.0013992008 --- 0.0011219531
2280 :  0.0013436937 --- 0.0011108606
2310 :  0.0012901663 --- 0.0011086622
2340 :  0.001253202 --- 0.0011035432
2370 :  0.0012286587 --- 0.0010904134
2400 :  0.0011829687 --- 0.0010685654
2430 :  0.0011353065 --- 0.001038318
2460 :  0.0011126392 --- 0.0010093594
2490 :  0.0010938832 --- 0.0009819814
2520 :  0.001068648 --- 0.0009471277
2550 :  0.0010525509 --- 0.0009117645
2580 :  0.0010656575 --- 0.00087259937
2610 :  0.0010670838 --- 0.0008309442
2640 :  0.0010907757 --- 0.00080858875
2670 :  0.0010618016 --- 0.00078817835
2700 :  0.001003453 --- 0.0007754855
2730 :  0.000997267 --- 0.00077035255
2760 :  0.0009832024 --- 0.0007678535
2790 :  0.0009898666 --- 0.00076933403
2820 :  0.0009720258 --- 0.0007785238
2850 :  0.0009329722 --- 0.0007839675
2880 :  0.00088877714 --- 0.0007940581
2910 :  0.0008495425 --- 0.00080384384
2940 :  0.0008251701 --- 0.0008045991
2970 :  0.00079786486 --- 0.000805407
3000 :  0.00077164016 --- 0.0008069982
3030 :  0.0007444651 --- 0.00080586143
3060 :  0.00072155613 --- 0.000802927
3090 :  0.0006996202 --- 0.000803055
3120 :  0.000687537 --- 0.0008039736
3150 :  0.0006753947 --- 0.000799291
3180 :  0.00066558574 --- 0.0007981512
3210 :  0.0006639491 --- 0.0007970973
3240 :  0.00065873796 --- 0.00079331343
3270 :  0.00065332593 --- 0.00078922353
3300 :  0.0006397837 --- 0.00078767736
3330 :  0.00064285 --- 0.0007852361
3360 :  0.00062098046 --- 0.0007821692
3390 :  0.00062533113 --- 0.0007789852
3420 :  0.0006127054 --- 0.00077238935
3450 :  0.0006183595 --- 0.0007647131
3480 :  0.00062309264 --- 0.00075954146
3510 :  0.00061472144 --- 0.0007544256
3540 :  0.0006060315 --- 0.00074986694
3570 :  0.00061153655 --- 0.00074819825
3600 :  0.0005951432 --- 0.000745387
3630 :  0.00059012405 --- 0.0007451246
3660 :  0.00057858543 --- 0.00074309786
3690 :  0.00057183317 --- 0.0007396285
3720 :  0.0005619305 --- 0.00073659705
3750 :  0.00054851035 --- 0.00073202036
3780 :  0.0005448676 --- 0.0007298937
3810 :  0.0005307372 --- 0.0007244989
3840 :  0.00052222935 --- 0.00072126207
3870 :  0.0005062257 --- 0.00071828277
3900 :  0.00050085003 --- 0.0007160879
3930 :  0.00049519213 --- 0.00071459956
3960 :  0.0004798803 --- 0.00071136537
3990 :  0.00048315214 --- 0.0007084526
In [29]:
fig, (ax1, ax2) = plt.subplots(2,1, figsize=(20,15))
fig.suptitle('Historical Losses', fontsize=30)
ax1.semilogy(ITERACJE[1:], LOSSES[1:], c='deeppink',linewidth=2.8, label='Train loss')
ax1.set_ylabel('Loss')
ax1.set_xlabel('Iterations')
ax1.legend()

ax2.semilogy(ITERACJE[1:], TEST_LOSSES[1:],c='dodgerblue',linewidth=2.8, label='Test loss')
ax2.set_ylabel('Test Loss')
ax2.set_xlabel('Iterations')
ax2.legend()
plt.show()
[Figure: training and test loss histories (log scale).]
In [ ]:
# Save the model parameters
model_path = 'fcn_model.pth'
torch.save(PINN.state_dict(), model_path)
In [27]:
# Create a new instance of the model class
new_fcn_model = Fabryka(
    X_train_boundaries, U_train_boundaries, U_hat, X_train_total,
    total_points_x, total_points_t, dim_in=2, dim_out=1,
    num_resnet_blocks=3, num_layers_per_block=3, num_neurons=25,
    activation=nn.ELU(0.7), fourier_features=True, m_freqs=29,
    sigma=2, tune_beta=True, lb=lb, ub=ub
)
new_PINN = new_fcn_model.to(device)

# Load the saved parameters
model_path = 'fcn_model.pth'
new_PINN.load_state_dict(torch.load(model_path))

# Set the model to evaluation mode
new_PINN.eval()
Out[27]:
Fabryka(
  (loss_function): MSELoss()
  (dense_res_net): DenseResNet(
    (activation): ELU(alpha=0.7)
    (first): Sequential(
      (0): Linear(in_features=58, out_features=25, bias=True)
      (1): ELU(alpha=0.7)
    )
    (resblocks): ModuleList(
      (0): ResBlock(
        (layers): ModuleList(
          (0): Linear(in_features=25, out_features=25, bias=True)
          (1): Linear(in_features=25, out_features=25, bias=True)
          (2): Linear(in_features=25, out_features=25, bias=True)
        )
        (activation): ELU(alpha=0.7)
      )
      (1): ResBlock(
        (layers): ModuleList(
          (0): Linear(in_features=25, out_features=25, bias=True)
          (1): Linear(in_features=25, out_features=25, bias=True)
          (2): Linear(in_features=25, out_features=25, bias=True)
        )
        (activation): ELU(alpha=0.7)
      )
      (2): ResBlock(
        (layers): ModuleList(
          (0): Linear(in_features=25, out_features=25, bias=True)
          (1): Linear(in_features=25, out_features=25, bias=True)
          (2): Linear(in_features=25, out_features=25, bias=True)
        )
        (activation): ELU(alpha=0.7)
      )
    )
    (last): Linear(in_features=25, out_features=1, bias=True)
  )
)
In [28]:
U1=new_PINN.test_PINN(X_test)
x1=X_test[:,0]
t1=X_test[:,1]

arr_x1=x1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_T1=t1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_U1=U1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_U_test=U_test.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()

approx_sol(arr_x1,arr_T1,arr_U1)
[Figure: NN-approximated solution of the heat equation, surface and contour plots.]
In [31]:
contour_comparison_with_errors(x, t, U_real, arr_U1)
[Figure: analytical vs. approximated contours with the pointwise error and Frobenius norm.]

Inverse Problem

Denote the neural network (NN) function by $N$ and recall that the PDE in question is a model that can be parametrised. We write it as $u_t + model[\theta; \alpha]=0$, where $u(x,t)$ is the hidden solution and $model[\theta; \alpha]$ is a non-linear operator; note that there are also parameters $\theta$ inside $N$. Thus, for discovery purposes, we have to optimise the parameters $\theta$ and $\alpha$ together. By solving the inverse problem, for example from some measured temperature data, we can identify the PDE's parameters characteristic of a given material, and hence identify the material.

Rearranging our heat equation $U_t-0.1\,U_{xx}=0$ into the form $u_t+model[\theta; \alpha]=0$, we use a PINN to obtain $\alpha$ by embedding the $\theta$-parametrised NN inside the PDE's operator structure.

By the universal approximation theorem, we can use a neural network $N$ to approximate any function, $u(x,t) \approx N(x,t)$. Since $N$ is a function, we can obtain its derivatives via automatic differentiation. We can then arrange these derivatives to mimic the PDE's operator structure:

$$u_t-\alpha u_{xx}=N_t-\alpha N_{xx}=0$$

Define function $f$

$$f(t, x)=u_t\underbrace{-\alpha u_{xx}}_{{+ model[\theta; \alpha]}}=N_t-\alpha N_{xx}$$

and try to

$$f(x,t)\approx 0$$

$f$ is the PDE residual: it takes the network and its derivatives and returns a value at each point. Its mean squared value is the argument of the loss term $\mathcal{L}_{f}$. If $f {\longrightarrow} 0$, then we have discovered our $\alpha$ parameter.

In order to express the dependence on the parameters $\alpha$ and $\theta$ explicitly, we write the total loss $\mathcal{L}$ as a sum of terms. Incorporating the additional loss due to boundary and initial conditions, the total loss function can be expressed as:

$$ \mathcal{L}(\theta, \alpha) = \mathcal{L}_{f}(\theta, \alpha) + \mathcal{L}_{u=data}(\theta) + \mathcal{L}_{\text{b}}(\theta) $$

Here are the individual components of the total loss function:

  1. Loss relating to the PDE: $$ \mathcal{L}_{f}(\theta, \alpha) = \frac{1}{N_f} \sum_{i=1}^{N_f} \left| f\left(x_n^i,t_n^i; \theta, \alpha\right) \right|^2 $$

  2. Loss relating to the data, i.e. the overall performance of the algorithm: $$ \mathcal{L}_{u=data}(\theta) = \frac{1}{N_f} \sum_{i=1}^{N_f} \left| u\left(x_n^i, t_n^i\right) - N\left(x_n^i,t_n^i; \theta\right) \right|^2 $$

  3. Loss relating to boundary and initial conditions: $$ \mathcal{L}_{\text{b}}(\theta) = \frac{1}{N_b} \sum_{i=1}^{N_b} \left| u\left(x_b^i, t_b^i\right) - N\left(x_b^i,t_b^i; \theta\right) \right|^2 $$

In the above expressions:

  • $f$ and $N$ are functions parametrized by $\theta$ and $\alpha$.
  • $N_f$ is the total number of collocation points.
  • $N_b$ is the total number of boundary and initial condition points.
  • $(x_b^i, t_b^i)$ are the coordinates of the boundary and initial condition points.

The goal is to minimize the total loss $\mathcal{L}$ to obtain the neural network's parameters $\theta=[W_i, b_i]$ and the PDE's parameter $\alpha$.
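A compact sketch of this objective in code (assuming a generic `net` mapping $(x,t)$ to $u$ and a learnable `alpha`; the actual implementation is the Fabryka_Inverse class below):

import torch
from torch import nn
import torch.autograd as autograd

mse = nn.MSELoss()

def total_loss(net, alpha, X_col, U_col, X_b, U_b):
    loss_data = mse(net(X_col), U_col)          # L_{u=data}: fit the sampled solution data
    loss_b = mse(net(X_b), U_b)                 # L_b: fit the known boundary/initial values
    g = X_col.clone().requires_grad_(True)      # L_f: PDE residual u_t - alpha*u_xx
    U = net(g)
    grads = autograd.grad(U, g, torch.ones_like(U), create_graph=True)[0]
    U_t = grads[:, [1]]
    U_xx = autograd.grad(grads[:, [0]], g, torch.ones_like(U), create_graph=True)[0][:, [0]]
    loss_f = mse(U_t - alpha * U_xx, torch.zeros_like(U_t))
    return loss_f + loss_data + loss_b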

Let's recall relevant shapes of tensors.

In [6]:
seed = 123
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)  # if you have more than one GPU
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False  # disable autotuning so runs stay reproducible

# We will approximate the heat equation on the box [0, 1] x [0, 2]:
x_min=0.
x_max=1.
t_min=0.
t_max=2.
#Collocation discretisation of the box, i.e. for the residual term

total_points_x=200
total_points_t=100

#Create mesh
x=torch.linspace(x_min,x_max,total_points_x).view(-1,1) #add dimension
print(f'x shape is {x.shape}')
t=torch.linspace(t_min,t_max,total_points_t).view(-1,1)

# Create the mesh 
X,T=torch.meshgrid(x.squeeze(1),t.squeeze(1))#[200,100]
print(x.shape)#[200,1]
print(t.shape)#[100,1]
print(X.shape)#[200,100]
print(T.shape)#[200,100]

x_min_tens = torch.Tensor([x_min]) 
x_max_tens = torch.Tensor([x_max]) 
#To get analytical solution obtained via MATHEMATICA
f_real = lambda x, t:  u(x, t)
# Evaluate real solution on the box domain
U_real=f_real(X,T)
print(f'U_real shape is: {U_real.shape}')
analytical_sol(x,t,U_real) #f_real was defined previously(function)

print(x.shape,t.shape, U_real.shape)
print(x.shape[0])
print(X.shape,T.shape)
x shape is torch.Size([200, 1])
torch.Size([200, 1])
torch.Size([100, 1])
torch.Size([200, 100])
torch.Size([200, 100])
U_real shape is: torch.Size([200, 100])
[Figure: analytical solution of the heat equation, surface and contour plots.]
torch.Size([200, 1]) torch.Size([100, 1]) torch.Size([200, 100])
200
torch.Size([200, 100]) torch.Size([200, 100])
Training Data

For the inverse problem, we sample points from the PDE's solution, which is given. Again we have to flatten the (x,t)-grid of $u(x,t)$ in each dimension, x and t, and then concatenate the resulting long vectors (keeping the ordering consistent). We also process the initial and boundary conditions, since we know them and can use them; the latter are not strictly necessary to recover the $\alpha$ parameter, although they are helpful.

In [7]:
def initial_condition_function(left_X):
    return torch.sqrt(left_X[:, 0]).unsqueeze(1)

def bottom_boundary_function(bottom_X):
    return torch.zeros(bottom_X.shape[0], 1)

def top_boundary_function(top_X):
    return torch.cos(2. * top_X[:, 1]).unsqueeze(1)
In [8]:
# Flatten and concatenate to obtain X_true
X_true = torch.cat((X.flatten(start_dim=0).unsqueeze(1), T.flatten(start_dim=0).unsqueeze(1)), dim=1)
print(X.flatten(start_dim=0).shape)

# Domain bounds
lb = X_true[0]  # [-1. 0.]
ub = X_true[-1] # [1.  1.]

usol_numpy = U_real.cpu().numpy()
U_true_numpy = usol_numpy.reshape(-1)  # flatten in the same (row-major) order as X_true
U_true = torch.tensor(U_true_numpy, dtype=torch.float32).unsqueeze(1) # Convert back to tensor and add a new dimension

# Left Edge: U(x,0) -> xmin=<x=<xmax; t=0
initial_X = torch.hstack((X[:, 0][:, None], T[:, 0][:, None])) # [200,2]
initial_U = initial_condition_function(initial_X)  # [200,1]

idx = np.random.choice(initial_X.shape[0], 20, replace=False)
X_IC = initial_X[idx, :]  # [20,2]
U_IC = initial_U[idx, :]  # [20,1]

bottom_X = torch.hstack([X[0, :][:, None], T[0, :][:, None]])    # [100,2]
bottom_U = bottom_boundary_function(bottom_X)  # [100,1]

top_X = torch.hstack((X[-1, :][:, None], T[-1, :][:, None]))   # [100,2]
top_U = top_boundary_function(top_X)  # [100,1]

# Stack both boundaries vertically
X_BC = torch.vstack([bottom_X, top_X])  # [200,2]
U_BC = torch.vstack([bottom_U, top_U])  # [200,1]

# Sample 20 boundary points
idx = np.random.choice(X_BC.shape[0], 20, replace=False)
X_BC = X_BC[idx, :]  # [20,2]
U_BC = U_BC[idx, :]  # [20,1]

# Stack initial and boundary points vertically
X_train_boundaries = torch.vstack([X_IC, X_BC])  # [40,2]
U_train_boundaries = torch.vstack([U_IC, U_BC])  # [40,1]

# Store tensors to GPU
X_train_boundaries = X_train_boundaries.float().to(device)  # Training Points (BC)
U_train_boundaries = U_train_boundaries.float().to(device)  # Training Points (BC)

print('U_true', U_true.shape)

total_points=len(x)*len(t)
print('total_points: ', total_points)
N_f = 1000 #Total number of collocation points

# Obtain random points for interior
id_f = np.random.choice(total_points, N_f, replace=False)# Randomly chosen points for Interior
print('id_f: ', id_f.shape)
X_train_Nu = X_true[id_f]
U_train_Nu= U_true[id_f]

# Send them to our GPU
X_train_Nu = X_train_Nu.float().to(device)
print('X_train_Nu: ', X_train_Nu.shape)
U_train_Nu = U_train_Nu.float().to(device)
print('U_train_Nu: ', U_train_Nu.shape)
X_true = X_true.float().to(device)
print('X_true: ', X_true.shape)
U_true = U_true.float().to(device)
print('U_true: ', U_true.shape)
f_hat = torch.zeros(X_train_Nu.shape[0],1).to(device)
print('f_hat: ', f_hat.shape)
torch.Size([20000])
U_true torch.Size([20000, 1])
total_points:  20000
id_f:  (1000,)
X_train_Nu:  torch.Size([1000, 2])
U_train_Nu:  torch.Size([1000, 1])
X_true:  torch.Size([20000, 2])
U_true:  torch.Size([20000, 1])
f_hat:  torch.Size([1000, 1])
In [11]:
class Fabryka_Inverse(nn.Module):
    def __init__(self, dim_in, dim_out, num_resnet_blocks,
                 num_layers_per_block, num_neurons, activation,
                 fourier_features, m_freqs, sigma, tune_beta, alpha, lb, ub):
        super(Fabryka_Inverse, self).__init__()

        self.loss_function = nn.MSELoss(reduction ='mean')  # Mean squared error loss
        self.iter = 0  # Iteration counter

        # alpha is a parameter that will be learned (related to the PDE)
        self.alpha = torch.tensor([alpha], requires_grad=True).float().to(device)
        self.alpha = nn.Parameter(self.alpha)

        # DenseResNet is a custom network architecture used in this model
        self.dnn = DenseResNet(
            dim_in=dim_in, dim_out=dim_out,
            num_resnet_blocks=num_resnet_blocks,
            num_layers_per_block=num_layers_per_block,
            num_neurons=num_neurons, activation=activation,
            fourier_features=fourier_features, m_freqs=m_freqs,
            sigma=sigma, tune_beta=tune_beta, lb=lb, ub=ub
        ).to(device)

        # Registering alpha as a parameter of dnn
        self.dnn.register_parameter('alpha', self.alpha)

    def loss_data(self,x,y):
        # Loss against real data
        loss_u = self.loss_function(self.dnn(x), y)

        return loss_u
    
    def loss_boundaries(self, X_boundaries, U_boundaries):
        # Compute the loss for both initial and boundary conditions
        loss_b = self.loss_function(self.dnn(X_boundaries), U_boundaries)
        return loss_b

    def loss_PDE(self, X_train_Nu):
        alpha=self.alpha
        #lambda2=self.lambda2  # Uncomment if lambda2 is needed
        g = X_train_Nu.clone()
        g.requires_grad = True  # Setting requires_grad to True for automatic differentiation
        u = self.dnn(g)

        # Calculating gradients and second-order gradients
        u_x_t = autograd.grad(u, g, torch.ones([X_train_Nu.shape[0], 1]).to(device), retain_graph=True, create_graph=True)[0]
        u_xx_tt = autograd.grad(u_x_t, g, torch.ones(X_train_Nu.shape).to(device), create_graph=True)[0]
        u_x = u_x_t[:,[0]]
        u_t = u_x_t[:,[1]]
        u_xx = u_xx_tt[:,[0]]
        f = u_t - (alpha)*u_xx  # Residual of the PDE
        loss_f = self.loss_function(f, f_hat)  # f_hat (all zeros) is defined at module scope

        return loss_f

    def loss_total(self, X_col, U_col, X_boundaries, U_boundaries):
        # Combine data loss, PDE loss, initial loss, and boundary loss
        loss_u = self.loss_data(X_col, U_col)
        loss_f = self.loss_PDE(X_col)
        loss_b = self.loss_boundaries(X_boundaries, U_boundaries)
        loss_val = loss_u + loss_f + loss_b# + loss_i
        return loss_val
    
    def __call__(self, steps, lr, X_boundaries, U_boundaries):
        # Collect parameters for optimization (alpha is already registered in self.dnn)
        params = list(self.dnn.parameters())
        optimizer = torch.optim.Adam(params, lr=lr)  # Adam optimizer

        for i in range(steps):
            optimizer.zero_grad()
            # Data/PDE points (X_train_Nu, U_train_Nu) come from module scope; boundary points are passed in
            loss = self.loss_total(X_train_Nu, U_train_Nu, X_boundaries, U_boundaries)
            loss.backward()  # Backpropagation
            optimizer.step()  # Update step

            self.iter += 1
            if self.iter % 100 == 0:
                error_vec, _ = self.test()
                print(f'Iter: {self.iter}, Relative Error(Test): {error_vec.cpu().detach().numpy():.5f}, α_real = [0.1], α_PINN = [{self.alpha.item():.5f}]')

        return loss

    def test(self):
        u_pred = self.dnn(X_true)  # X_true is defined at module scope
        # Relative L2 norm of the error
        error_vec = torch.linalg.norm((U_true - u_pred), 2) / torch.linalg.norm(U_true, 2)
        u_pred = u_pred.cpu().detach().numpy()
        u_pred = np.reshape(u_pred, (x.shape[0], t.shape[0]), order='C')  # reshape back onto the (x, t) grid

        return error_vec, u_pred
In [12]:
#Set default dtype to float32
torch.set_default_dtype(torch.float)
torch.manual_seed(123)
np.random.seed(123)

fcn_instance_inv = Fabryka_Inverse(
    dim_in=2, dim_out=1, num_resnet_blocks=3,
    num_layers_per_block=3, num_neurons=25, activation=nn.ELU(0.853), # nn.Tanh(),#nn.SiLU(),
    fourier_features=True, m_freqs=29, sigma=2, tune_beta=True, alpha=0.5, lb=lb, ub=ub)

fcn_instance_inv.to(device)

# Now call the __call__ method of FCN with steps and lr
steps = 20000
lr = 7e-5
fcn_instance_inv(steps, lr, X_train_boundaries, U_train_boundaries)
Iter: 100, Relative Error(Test): 0.49314, α_real = [0.1], α_PINN = [0.49701]
Iter: 200, Relative Error(Test): 0.44367, α_real = [0.1], α_PINN = [0.49353]
Iter: 300, Relative Error(Test): 0.41400, α_real = [0.1], α_PINN = [0.48472]
Iter: 400, Relative Error(Test): 0.39632, α_real = [0.1], α_PINN = [0.47552]
Iter: 500, Relative Error(Test): 0.38704, α_real = [0.1], α_PINN = [0.46726]
Iter: 600, Relative Error(Test): 0.38473, α_real = [0.1], α_PINN = [0.45966]
Iter: 700, Relative Error(Test): 0.38208, α_real = [0.1], α_PINN = [0.45218]
Iter: 800, Relative Error(Test): 0.38396, α_real = [0.1], α_PINN = [0.44455]
Iter: 900, Relative Error(Test): 0.38294, α_real = [0.1], α_PINN = [0.43705]
Iter: 1000, Relative Error(Test): 0.38051, α_real = [0.1], α_PINN = [0.42975]
Iter: 1100, Relative Error(Test): 0.37638, α_real = [0.1], α_PINN = [0.42251]
Iter: 1200, Relative Error(Test): 0.37121, α_real = [0.1], α_PINN = [0.41525]
Iter: 1300, Relative Error(Test): 0.36857, α_real = [0.1], α_PINN = [0.40799]
Iter: 1400, Relative Error(Test): 0.36418, α_real = [0.1], α_PINN = [0.40074]
Iter: 1500, Relative Error(Test): 0.35960, α_real = [0.1], α_PINN = [0.39336]
Iter: 1600, Relative Error(Test): 0.35303, α_real = [0.1], α_PINN = [0.38587]
Iter: 1700, Relative Error(Test): 0.34590, α_real = [0.1], α_PINN = [0.37833]
Iter: 1800, Relative Error(Test): 0.33516, α_real = [0.1], α_PINN = [0.37080]
Iter: 1900, Relative Error(Test): 0.32468, α_real = [0.1], α_PINN = [0.36333]
Iter: 2000, Relative Error(Test): 0.31428, α_real = [0.1], α_PINN = [0.35578]
Iter: 2100, Relative Error(Test): 0.30544, α_real = [0.1], α_PINN = [0.34796]
Iter: 2200, Relative Error(Test): 0.29518, α_real = [0.1], α_PINN = [0.34007]
Iter: 2300, Relative Error(Test): 0.28093, α_real = [0.1], α_PINN = [0.33218]
Iter: 2400, Relative Error(Test): 0.26707, α_real = [0.1], α_PINN = [0.32443]
Iter: 2500, Relative Error(Test): 0.25326, α_real = [0.1], α_PINN = [0.31657]
Iter: 2600, Relative Error(Test): 0.22366, α_real = [0.1], α_PINN = [0.30899]
Iter: 2700, Relative Error(Test): 0.25305, α_real = [0.1], α_PINN = [0.30166]
Iter: 2800, Relative Error(Test): 0.19551, α_real = [0.1], α_PINN = [0.29508]
Iter: 2900, Relative Error(Test): 0.19321, α_real = [0.1], α_PINN = [0.28886]
Iter: 3000, Relative Error(Test): 0.18534, α_real = [0.1], α_PINN = [0.28289]
Iter: 3100, Relative Error(Test): 0.17522, α_real = [0.1], α_PINN = [0.27699]
Iter: 3200, Relative Error(Test): 0.16447, α_real = [0.1], α_PINN = [0.27120]
Iter: 3300, Relative Error(Test): 0.15459, α_real = [0.1], α_PINN = [0.26554]
Iter: 3400, Relative Error(Test): 0.14742, α_real = [0.1], α_PINN = [0.26021]
Iter: 3500, Relative Error(Test): 0.14590, α_real = [0.1], α_PINN = [0.25494]
Iter: 3600, Relative Error(Test): 0.13252, α_real = [0.1], α_PINN = [0.25012]
Iter: 3700, Relative Error(Test): 0.12318, α_real = [0.1], α_PINN = [0.24562]
Iter: 3800, Relative Error(Test): 0.12032, α_real = [0.1], α_PINN = [0.24120]
Iter: 3900, Relative Error(Test): 0.11548, α_real = [0.1], α_PINN = [0.23689]
Iter: 4000, Relative Error(Test): 0.11327, α_real = [0.1], α_PINN = [0.23256]
Iter: 4100, Relative Error(Test): 0.10514, α_real = [0.1], α_PINN = [0.22823]
Iter: 4200, Relative Error(Test): 0.10277, α_real = [0.1], α_PINN = [0.22403]
Iter: 4300, Relative Error(Test): 0.09879, α_real = [0.1], α_PINN = [0.21981]
Iter: 4400, Relative Error(Test): 0.09483, α_real = [0.1], α_PINN = [0.21571]
Iter: 4500, Relative Error(Test): 0.09154, α_real = [0.1], α_PINN = [0.21180]
Iter: 4600, Relative Error(Test): 0.08835, α_real = [0.1], α_PINN = [0.20801]
Iter: 4700, Relative Error(Test): 0.08535, α_real = [0.1], α_PINN = [0.20438]
Iter: 4800, Relative Error(Test): 0.08167, α_real = [0.1], α_PINN = [0.20072]
Iter: 4900, Relative Error(Test): 0.07881, α_real = [0.1], α_PINN = [0.19696]
Iter: 5000, Relative Error(Test): 0.07706, α_real = [0.1], α_PINN = [0.19321]
Iter: 5100, Relative Error(Test): 0.07444, α_real = [0.1], α_PINN = [0.18957]
Iter: 5200, Relative Error(Test): 0.07229, α_real = [0.1], α_PINN = [0.18604]
Iter: 5300, Relative Error(Test): 0.06919, α_real = [0.1], α_PINN = [0.18267]
Iter: 5400, Relative Error(Test): 0.06683, α_real = [0.1], α_PINN = [0.17951]
Iter: 5500, Relative Error(Test): 0.06408, α_real = [0.1], α_PINN = [0.17641]
Iter: 5600, Relative Error(Test): 0.06177, α_real = [0.1], α_PINN = [0.17339]
Iter: 5700, Relative Error(Test): 0.05926, α_real = [0.1], α_PINN = [0.17037]
Iter: 5800, Relative Error(Test): 0.05713, α_real = [0.1], α_PINN = [0.16732]
Iter: 5900, Relative Error(Test): 0.05508, α_real = [0.1], α_PINN = [0.16428]
Iter: 6000, Relative Error(Test): 0.05348, α_real = [0.1], α_PINN = [0.16122]
Iter: 6100, Relative Error(Test): 0.05168, α_real = [0.1], α_PINN = [0.15852]
Iter: 6200, Relative Error(Test): 0.04923, α_real = [0.1], α_PINN = [0.15595]
Iter: 6300, Relative Error(Test): 0.04762, α_real = [0.1], α_PINN = [0.15361]
Iter: 6400, Relative Error(Test): 0.04583, α_real = [0.1], α_PINN = [0.15141]
Iter: 6500, Relative Error(Test): 0.04447, α_real = [0.1], α_PINN = [0.14929]
Iter: 6600, Relative Error(Test): 0.04289, α_real = [0.1], α_PINN = [0.14717]
Iter: 6700, Relative Error(Test): 0.04125, α_real = [0.1], α_PINN = [0.14514]
Iter: 6800, Relative Error(Test): 0.04028, α_real = [0.1], α_PINN = [0.14314]
Iter: 6900, Relative Error(Test): 0.03912, α_real = [0.1], α_PINN = [0.14127]
Iter: 7000, Relative Error(Test): 0.03783, α_real = [0.1], α_PINN = [0.13933]
Iter: 7100, Relative Error(Test): 0.03700, α_real = [0.1], α_PINN = [0.13740]
Iter: 7200, Relative Error(Test): 0.03614, α_real = [0.1], α_PINN = [0.13568]
Iter: 7300, Relative Error(Test): 0.03513, α_real = [0.1], α_PINN = [0.13400]
Iter: 7400, Relative Error(Test): 0.03455, α_real = [0.1], α_PINN = [0.13242]
Iter: 7500, Relative Error(Test): 0.03375, α_real = [0.1], α_PINN = [0.13075]
Iter: 7600, Relative Error(Test): 0.03313, α_real = [0.1], α_PINN = [0.12915]
Iter: 7700, Relative Error(Test): 0.03271, α_real = [0.1], α_PINN = [0.12759]
Iter: 7800, Relative Error(Test): 0.03197, α_real = [0.1], α_PINN = [0.12636]
Iter: 7900, Relative Error(Test): 0.03117, α_real = [0.1], α_PINN = [0.12534]
Iter: 8000, Relative Error(Test): 0.03068, α_real = [0.1], α_PINN = [0.12454]
Iter: 8100, Relative Error(Test): 0.03031, α_real = [0.1], α_PINN = [0.12373]
Iter: 8200, Relative Error(Test): 0.03002, α_real = [0.1], α_PINN = [0.12296]
Iter: 8300, Relative Error(Test): 0.02957, α_real = [0.1], α_PINN = [0.12192]
Iter: 8400, Relative Error(Test): 0.02923, α_real = [0.1], α_PINN = [0.12087]
Iter: 8500, Relative Error(Test): 0.02905, α_real = [0.1], α_PINN = [0.11988]
Iter: 8600, Relative Error(Test): 0.02880, α_real = [0.1], α_PINN = [0.11908]
Iter: 8700, Relative Error(Test): 0.02807, α_real = [0.1], α_PINN = [0.11845]
Iter: 8800, Relative Error(Test): 0.02780, α_real = [0.1], α_PINN = [0.11773]
Iter: 8900, Relative Error(Test): 0.02686, α_real = [0.1], α_PINN = [0.11713]
Iter: 9000, Relative Error(Test): 0.02622, α_real = [0.1], α_PINN = [0.11662]
Iter: 9100, Relative Error(Test): 0.02566, α_real = [0.1], α_PINN = [0.11597]
Iter: 9200, Relative Error(Test): 0.02514, α_real = [0.1], α_PINN = [0.11523]
Iter: 9300, Relative Error(Test): 0.02491, α_real = [0.1], α_PINN = [0.11455]
Iter: 9400, Relative Error(Test): 0.02482, α_real = [0.1], α_PINN = [0.11410]
Iter: 9500, Relative Error(Test): 0.02478, α_real = [0.1], α_PINN = [0.11366]
Iter: 9600, Relative Error(Test): 0.02462, α_real = [0.1], α_PINN = [0.11316]
Iter: 9700, Relative Error(Test): 0.02482, α_real = [0.1], α_PINN = [0.11282]
Iter: 9800, Relative Error(Test): 0.02445, α_real = [0.1], α_PINN = [0.11284]
Iter: 9900, Relative Error(Test): 0.02437, α_real = [0.1], α_PINN = [0.11213]
Iter: 10000, Relative Error(Test): 0.02421, α_real = [0.1], α_PINN = [0.11130]
Iter: 10100, Relative Error(Test): 0.02334, α_real = [0.1], α_PINN = [0.11064]
Iter: 10200, Relative Error(Test): 0.02303, α_real = [0.1], α_PINN = [0.11023]
Iter: 10300, Relative Error(Test): 0.02333, α_real = [0.1], α_PINN = [0.10994]
Iter: 10400, Relative Error(Test): 0.02322, α_real = [0.1], α_PINN = [0.10957]
Iter: 10500, Relative Error(Test): 0.02301, α_real = [0.1], α_PINN = [0.10925]
Iter: 10600, Relative Error(Test): 0.02239, α_real = [0.1], α_PINN = [0.10867]
Iter: 10700, Relative Error(Test): 0.02201, α_real = [0.1], α_PINN = [0.10830]
Iter: 10800, Relative Error(Test): 0.02205, α_real = [0.1], α_PINN = [0.10818]
Iter: 10900, Relative Error(Test): 0.02194, α_real = [0.1], α_PINN = [0.10814]
Iter: 11000, Relative Error(Test): 0.02141, α_real = [0.1], α_PINN = [0.10783]
Iter: 11100, Relative Error(Test): 0.02149, α_real = [0.1], α_PINN = [0.10753]
Iter: 11200, Relative Error(Test): 0.02119, α_real = [0.1], α_PINN = [0.10705]
Iter: 11300, Relative Error(Test): 0.02094, α_real = [0.1], α_PINN = [0.10675]
Iter: 11400, Relative Error(Test): 0.02028, α_real = [0.1], α_PINN = [0.10655]
Iter: 11500, Relative Error(Test): 0.01995, α_real = [0.1], α_PINN = [0.10631]
Iter: 11600, Relative Error(Test): 0.01956, α_real = [0.1], α_PINN = [0.10620]
Iter: 11700, Relative Error(Test): 0.01960, α_real = [0.1], α_PINN = [0.10615]
Iter: 11800, Relative Error(Test): 0.01922, α_real = [0.1], α_PINN = [0.10623]
Iter: 11900, Relative Error(Test): 0.01937, α_real = [0.1], α_PINN = [0.10619]
Iter: 12000, Relative Error(Test): 0.01883, α_real = [0.1], α_PINN = [0.10623]
Iter: 12100, Relative Error(Test): 0.01955, α_real = [0.1], α_PINN = [0.10618]
Iter: 12200, Relative Error(Test): 0.01898, α_real = [0.1], α_PINN = [0.10596]
Iter: 12300, Relative Error(Test): 0.01891, α_real = [0.1], α_PINN = [0.10574]
Iter: 12400, Relative Error(Test): 0.01894, α_real = [0.1], α_PINN = [0.10554]
Iter: 12500, Relative Error(Test): 0.01886, α_real = [0.1], α_PINN = [0.10535]
Iter: 12600, Relative Error(Test): 0.01911, α_real = [0.1], α_PINN = [0.10527]
Iter: 12700, Relative Error(Test): 0.01904, α_real = [0.1], α_PINN = [0.10498]
Iter: 12800, Relative Error(Test): 0.01845, α_real = [0.1], α_PINN = [0.10460]
Iter: 12900, Relative Error(Test): 0.01861, α_real = [0.1], α_PINN = [0.10393]
Iter: 13000, Relative Error(Test): 0.01838, α_real = [0.1], α_PINN = [0.10306]
Iter: 13100, Relative Error(Test): 0.01846, α_real = [0.1], α_PINN = [0.10265]
Iter: 13200, Relative Error(Test): 0.01822, α_real = [0.1], α_PINN = [0.10255]
Iter: 13300, Relative Error(Test): 0.01778, α_real = [0.1], α_PINN = [0.10244]
Iter: 13400, Relative Error(Test): 0.01758, α_real = [0.1], α_PINN = [0.10241]
Iter: 13500, Relative Error(Test): 0.01714, α_real = [0.1], α_PINN = [0.10236]
Iter: 13600, Relative Error(Test): 0.01715, α_real = [0.1], α_PINN = [0.10248]
Iter: 13700, Relative Error(Test): 0.01674, α_real = [0.1], α_PINN = [0.10240]
Iter: 13800, Relative Error(Test): 0.01659, α_real = [0.1], α_PINN = [0.10229]
Iter: 13900, Relative Error(Test): 0.01646, α_real = [0.1], α_PINN = [0.10212]
Iter: 14000, Relative Error(Test): 0.01670, α_real = [0.1], α_PINN = [0.10207]
Iter: 14100, Relative Error(Test): 0.01622, α_real = [0.1], α_PINN = [0.10198]
Iter: 14200, Relative Error(Test): 0.01576, α_real = [0.1], α_PINN = [0.10189]
Iter: 14300, Relative Error(Test): 0.01607, α_real = [0.1], α_PINN = [0.10174]
Iter: 14400, Relative Error(Test): 0.01596, α_real = [0.1], α_PINN = [0.10191]
Iter: 14500, Relative Error(Test): 0.01608, α_real = [0.1], α_PINN = [0.10199]
Iter: 14600, Relative Error(Test): 0.01634, α_real = [0.1], α_PINN = [0.10201]
Iter: 14700, Relative Error(Test): 0.01626, α_real = [0.1], α_PINN = [0.10190]
Iter: 14800, Relative Error(Test): 0.01584, α_real = [0.1], α_PINN = [0.10180]
Iter: 14900, Relative Error(Test): 0.01570, α_real = [0.1], α_PINN = [0.10150]
Iter: 15000, Relative Error(Test): 0.01517, α_real = [0.1], α_PINN = [0.10134]
Iter: 15100, Relative Error(Test): 0.01494, α_real = [0.1], α_PINN = [0.10128]
Iter: 15200, Relative Error(Test): 0.01463, α_real = [0.1], α_PINN = [0.10112]
Iter: 15300, Relative Error(Test): 0.01447, α_real = [0.1], α_PINN = [0.10093]
Iter: 15400, Relative Error(Test): 0.01445, α_real = [0.1], α_PINN = [0.10085]
Iter: 15500, Relative Error(Test): 0.01540, α_real = [0.1], α_PINN = [0.10077]
Iter: 15600, Relative Error(Test): 0.01430, α_real = [0.1], α_PINN = [0.10065]
Iter: 15700, Relative Error(Test): 0.01416, α_real = [0.1], α_PINN = [0.10054]
Iter: 15800, Relative Error(Test): 0.01386, α_real = [0.1], α_PINN = [0.10048]
Iter: 15900, Relative Error(Test): 0.01378, α_real = [0.1], α_PINN = [0.10034]
Iter: 16000, Relative Error(Test): 0.01371, α_real = [0.1], α_PINN = [0.10027]
Iter: 16100, Relative Error(Test): 0.01362, α_real = [0.1], α_PINN = [0.10034]
Iter: 16200, Relative Error(Test): 0.01379, α_real = [0.1], α_PINN = [0.10017]
Iter: 16300, Relative Error(Test): 0.01329, α_real = [0.1], α_PINN = [0.10008]
Iter: 16400, Relative Error(Test): 0.01325, α_real = [0.1], α_PINN = [0.10000]
Iter: 16500, Relative Error(Test): 0.01324, α_real = [0.1], α_PINN = [0.09998]
Iter: 16600, Relative Error(Test): 0.01308, α_real = [0.1], α_PINN = [0.09998]
Iter: 16700, Relative Error(Test): 0.01292, α_real = [0.1], α_PINN = [0.09998]
Iter: 16800, Relative Error(Test): 0.01322, α_real = [0.1], α_PINN = [0.09997]
Iter: 16900, Relative Error(Test): 0.01369, α_real = [0.1], α_PINN = [0.09996]
Iter: 17000, Relative Error(Test): 0.01318, α_real = [0.1], α_PINN = [0.10005]
Iter: 17100, Relative Error(Test): 0.01337, α_real = [0.1], α_PINN = [0.09998]
Iter: 17200, Relative Error(Test): 0.01289, α_real = [0.1], α_PINN = [0.10006]
Iter: 17300, Relative Error(Test): 0.01313, α_real = [0.1], α_PINN = [0.10008]
Iter: 17400, Relative Error(Test): 0.01303, α_real = [0.1], α_PINN = [0.10017]
Iter: 17500, Relative Error(Test): 0.01282, α_real = [0.1], α_PINN = [0.10019]
Iter: 17600, Relative Error(Test): 0.01277, α_real = [0.1], α_PINN = [0.10015]
Iter: 17700, Relative Error(Test): 0.01338, α_real = [0.1], α_PINN = [0.10009]
Iter: 17800, Relative Error(Test): 0.01383, α_real = [0.1], α_PINN = [0.10007]
Iter: 17900, Relative Error(Test): 0.01266, α_real = [0.1], α_PINN = [0.10004]
Iter: 18000, Relative Error(Test): 0.01274, α_real = [0.1], α_PINN = [0.09995]
Iter: 18100, Relative Error(Test): 0.01272, α_real = [0.1], α_PINN = [0.09993]
Iter: 18200, Relative Error(Test): 0.01282, α_real = [0.1], α_PINN = [0.09977]
Iter: 18300, Relative Error(Test): 0.01321, α_real = [0.1], α_PINN = [0.09969]
Iter: 18400, Relative Error(Test): 0.01295, α_real = [0.1], α_PINN = [0.09957]
Iter: 18500, Relative Error(Test): 0.01306, α_real = [0.1], α_PINN = [0.09958]
Iter: 18600, Relative Error(Test): 0.01302, α_real = [0.1], α_PINN = [0.09958]
Iter: 18700, Relative Error(Test): 0.01321, α_real = [0.1], α_PINN = [0.09961]
Iter: 18800, Relative Error(Test): 0.01304, α_real = [0.1], α_PINN = [0.09964]
Iter: 18900, Relative Error(Test): 0.01267, α_real = [0.1], α_PINN = [0.09969]
Iter: 19000, Relative Error(Test): 0.01283, α_real = [0.1], α_PINN = [0.09979]
Iter: 19100, Relative Error(Test): 0.01263, α_real = [0.1], α_PINN = [0.09977]
Iter: 19200, Relative Error(Test): 0.01282, α_real = [0.1], α_PINN = [0.09980]
Iter: 19300, Relative Error(Test): 0.01257, α_real = [0.1], α_PINN = [0.09984]
Iter: 19400, Relative Error(Test): 0.01226, α_real = [0.1], α_PINN = [0.09985]
Iter: 19500, Relative Error(Test): 0.01266, α_real = [0.1], α_PINN = [0.09986]
Iter: 19600, Relative Error(Test): 0.01250, α_real = [0.1], α_PINN = [0.09969]
Iter: 19700, Relative Error(Test): 0.01235, α_real = [0.1], α_PINN = [0.09955]
Iter: 19800, Relative Error(Test): 0.01243, α_real = [0.1], α_PINN = [0.09968]
Iter: 19900, Relative Error(Test): 0.01239, α_real = [0.1], α_PINN = [0.09968]
Iter: 20000, Relative Error(Test): 0.01231, α_real = [0.1], α_PINN = [0.09961]
Out[12]:
tensor(9.3544e-05, device='cuda:0', grad_fn=<AddBackward0>)
In [13]:
# Save the model parameters
model_path = 'fcn_model_inv.pth'
torch.save(fcn_instance_inv.state_dict(), model_path)

# # Create a new instance of FCN
# loaded_fcn_instance = Fabryka_Inverse(
#     dim_in=2, dim_out=1, num_resnet_blocks=3,
#     num_layers_per_block=3, num_neurons=25, activation=nn.ELU(0.853),  # must match the architecture used for training
#     fourier_features=True, m_freqs=29, sigma=2, tune_beta=True, alpha=0.5, lb=lb, ub=ub
# )

# # Load the model parameters from the file
# loaded_fcn_instance.load_state_dict(torch.load(model_path))

# # Move the model to the desired device
# loaded_fcn_instance.to(device)
In [14]:
error_vec, u_pred = fcn_instance_inv.test()
In [15]:
contour_comparison_with_errors(x, t, U_real, u_pred)
[Figure: contour comparison of the analytical and PINN-predicted solutions with pointwise errors]
In [16]:
import matplotlib.gridspec as gridspec  # Import GridSpec from matplotlib for specifying grid layout of subplots
from mpl_toolkits.axes_grid1 import make_axes_locatable  # Import make_axes_locatable for managing layout

def solutionplot(u_pred,X_u_train,u_train):  # Define function solutionplot with three parameters
    
    fig, ax = plt.subplots(figsize=(25, 22))  # Create figure and axis objects with specified figure size
    ax.axis('off')  # Turn off the axis (hide axis labels and ticks)

    gs0 = gridspec.GridSpec(1, 1)  # Create a GridSpec object with 1 row and 1 column
    gs0.update(top=1-0.06, bottom=1-1/3, left=0.15, right=0.85, wspace=0)  # Update GridSpec layout parameters
    
    ax = plt.subplot(gs0[:, :])  # Create a subplot based on the GridSpec layout

    h = ax.imshow(u_pred, interpolation='nearest', cmap='twilight', 
                extent=[T.min(), T.max(), X.min(), X.max()], 
                origin='lower', aspect='auto')  # Display u_pred as an image on the axis
    
    divider = make_axes_locatable(ax)  # Create a divider for the existing axis instance
    cax = divider.append_axes("right", size="5%", pad=0.05)  # Append axes to the right of ax with specified size and padding
    
    fig.colorbar(h, cax=cax)  # Add a colorbar to the right of the image
    
    # Scatter plot for the data points with adjusted marker size
    ax.scatter(X_u_train[:, 1], X_u_train[:, 0], color='k', marker='.', label='Data (%d points)' % (u_train.shape[0]), s=40)  
    # Scatter plot for the initial condition points with adjusted marker size
    ax.scatter(X_IC[:, 1], X_IC[:, 0], color='g', marker='x', label='IC Points (%d points)' % (X_IC.shape[0]), s=200)  
    # Scatter plot for the boundary condition points with adjusted marker size
    ax.scatter(X_BC[:, 1], X_BC[:, 0], color='y', marker='o', label='BC Points (%d points)' % (X_BC.shape[0]), s=200)

    
    ax.legend(loc='upper right')  # Create a legend in the upper right corner of the subplot
    plt.show()  # Display the figure

solutionplot(U_real, X_train_Nu.cpu().detach().numpy(), U_train_Nu)
/tmp/ipykernel_22660/3906780272.py:12: MatplotlibDeprecationWarning: Auto-removal of overlapping axes is deprecated since 3.6 and will be removed two minor releases later; explicitly call ax.remove() as needed.
  ax = plt.subplot(gs0[:, :])  # Create a subplot based on the GridSpec layout
/tmp/ipykernel_22660/3906780272.py:21: MatplotlibDeprecationWarning: Auto-removal of grids by pcolor() and pcolormesh() is deprecated since 3.5 and will be removed two minor releases later; please call grid(False) first.
  fig.colorbar(h, cax=cax)  # Add a colorbar to the right of the image
[Figure: predicted solution field with the sampled data points (black), IC points (green crosses) and BC points (yellow circles) overlaid]

In this summary, we delve into the enhancement of Physics-Informed Neural Networks (PINNs) training via the integration of Fourier features and Residual Neural Networks. Furthermore, we explore the mesh invariance issue witnessed in PINNs and propose the adoption of Neural Operators as a viable solution.

(a) Utilizing Fourier Features and Residual Neural Networks in PINNs:

The combination of Fourier features and Residual Neural Networks (ResNets) has proven to be a promising approach to improving the training of PINNs for both forward and inverse problems. Fourier features are crucial for capturing the periodic nature and multi-scale behavior inherent in many physical systems; they enrich the feature space, allowing the network to learn and generalize better across varying scales and domains. ResNets, on the other hand, facilitate the training of deeper networks by mitigating the vanishing gradient problem, which is pivotal for learning hierarchical features and ensuring convergence of the training process. Combined, these techniques yield a more robust and efficient training regime and improve the performance of PINNs in solving the Partial Differential Equations (PDEs) that describe the physical problems at hand.
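To make this concrete, below is a minimal sketch of a Gaussian Fourier feature embedding of the kind referred to above; the class name FourierEmbedding and the exact way B is sampled are illustrative assumptions, not the internals of the DenseResNet used earlier.

import torch
import torch.nn as nn

class FourierEmbedding(nn.Module):
    # Maps inputs X to [cos(2*pi*X B), sin(2*pi*X B)] with a fixed Gaussian matrix B
    def __init__(self, dim_in=2, m_freqs=29, sigma=2.0):
        super().__init__()
        # B is sampled once and kept fixed (not trained); sigma scales the frequencies
        self.register_buffer('B', sigma * torch.randn(dim_in, m_freqs))

    def forward(self, X):                       # X: [N, dim_in]
        proj = 2.0 * torch.pi * X @ self.B      # [N, m_freqs]
        return torch.cat([torch.cos(proj), torch.sin(proj)], dim=1)  # [N, 2*m_freqs]

# The 2*m_freqs features would feed the first dense layer of the network
emb = FourierEmbedding(dim_in=2, m_freqs=29, sigma=2.0)
print(emb(torch.rand(5, 2)).shape)              # torch.Size([5, 58])

Larger sigma emphasizes higher frequencies, which helps with sharp or oscillatory solutions but can also encourage overfitting, so it is a hyperparameter worth tuning alongside m_freqs.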

(b) Mesh Invariance Issue in PINNs:

Despite the advancements, a notable drawback of PINNs is their sensitivity to the mesh grid, number of points on the grid, and other hyperparameters. Particularly, the quality of results may deteriorate with alterations in the grid configuration or boundary conditions. This lack of mesh invariance poses a challenge as it impedes the robustness and generalizability of PINNs across varying discretizations and geometries. Mesh invariance is essentially the ability of the model to provide consistent results irrespective of the mesh grid configurations. The absence of this invariance in PINNs could potentially lead to inconsistent or erroneous results across different problem setups.

(c) Alleviating Invariance Issues through Neural Operators:

Neural Operators emerge as a promising avenue to overcome the mesh invariance issue witnessed in PINNs. Unlike traditional PINNs that operate on a fixed grid, Neural Operators are designed to learn mappings between function spaces, thus offering a level of abstraction that transcends fixed discretizations. By operating in this higher-level function space, Neural Operators are capable of achieving mesh invariance, ensuring consistent and accurate results across varying grid configurations and boundary conditions. This distinction underscores a fundamental shift from solving PDEs in a discretized domain (as done in PINNs) to a more generalized approach that operates in continuous function spaces. Although the exact implementation may vary, the core idea remains to establish a learning framework that is less tethered to the discretization scheme and more focused on the inherent structures and properties of the underlying PDEs.
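As a purely illustrative sketch of what learning in function space can look like in code (in the spirit of Fourier Neural Operators; the class name, channel count, and number of retained modes below are assumptions for illustration, not something used elsewhere in this post):

import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    # One Fourier layer: FFT -> mix the lowest `modes` frequencies -> inverse FFT
    def __init__(self, channels=16, modes=8):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        self.weights = nn.Parameter(scale * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, u):                         # u: [batch, channels, n_grid]
        u_ft = torch.fft.rfft(u)                  # [batch, channels, n_grid//2 + 1]
        out_ft = torch.zeros_like(u_ft)
        # Channel mixing is applied only to the retained low-frequency modes
        out_ft[..., :self.modes] = torch.einsum('bim,iom->bom', u_ft[..., :self.modes], self.weights)
        return torch.fft.irfft(out_ft, n=u.size(-1))

# Because the learned weights act on Fourier modes rather than grid points,
# the same layer can be evaluated on grids of different resolution:
layer = SpectralConv1d(channels=16, modes=8)
print(layer(torch.randn(4, 16, 100)).shape)   # torch.Size([4, 16, 100])
print(layer(torch.randn(4, 16, 200)).shape)   # torch.Size([4, 16, 200])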

In conclusion, the fusion of Fourier features and Residual Neural Networks significantly boosts the efficacy of PINNs in tackling forward and inverse problems. However, the mesh invariance issue remains a hurdle, necessitating further exploration of alternative strategies like Neural Operators to attain a more robust and generalized solution framework for solving PDEs.
