PINNs in Forward & Inverse Problems
In this post, we demonstrate the forward and inverse PINN problems for the heat equation
$$U_t=0.1\,U_{xx}$$
with Dirichlet boundary conditions, a homogeneous one $U(0,t)=0$ and a non-homogeneous one $U(1,t)=\cos(2t)$, together with the initial condition $U(x,0)=\sqrt{x}$. The non-homogeneous boundary condition can cause difficulties: it is often not straightforward to obtain stable convergence, and the model overfits easily. This can be overcome by applying a bespoke NN architecture and/or Fourier features. The analytical solution was obtained with Mathematica as a 30-term series expansion and is plotted below. For testing we use a 200x100 grid, totaling 20,000 points, while the collocation points for the residual term are sampled inside the domain $\Omega$.
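The collocation points for the residual term will be drawn with Latin hypercube sampling via pyDOE, as done inside the Feeder class further down. Here is a rough standalone sketch of that sampling step; the variable names are illustrative only.
import numpy as np
from pyDOE import lhs  # Latin hypercube sampling
# Draw 1,000 collocation points in the unit square and rescale them to the
# space-time domain [x_min, x_max] x [t_min, t_max] = [0, 1] x [0, 2].
lb, ub = np.array([0.0, 0.0]), np.array([1.0, 2.0])
collocation = lb + (ub - lb) * lhs(2, samples=1000)  # shape (1000, 2); columns are (x, t)
print(collocation.shape)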
The first step is to obtain an analytical solution by means of Mathematica. You can use the Mathematica kernel for free; how to install it and use it with Jupyter you can find here. I will now switch to the Mathematica kernel and obtain the approximate analytical solution of the equation in question.
heqn = D[u[x, t], t] == 0.1*D[u[x, t], {x, 2}]
ic = u[x, 0] == Sqrt[x];
bc = {u[0, t] == 0, u[1, t] == Cos[2t]};
sol = DSolve[{heqn, ic, bc}, u[x, t], {x, t}];
asol = u[x, t] /. sol[[1]] /. {\[Infinity] -> 30} // Activate // FullSimplify;
Load/install needed libraries
import matplotlib.pyplot as plt
import torch
from torch import nn
from torch import Tensor
import torch.autograd as autograd
import torch.optim as optim
from torch.optim import Adam
import numpy as np
#!pip install pyDOE
from pyDOE import lhs #for Latin hypercube sampling
#!pip install torchviz
from torchviz import make_dot
#!pip install jupyterthemes
from jupyterthemes import jtplot
jtplot.style(theme="monokai", context="notebook", ticks=True, grid=True)
print("CUDA available: ", torch.cuda.is_available())
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("PyTorch version: ", torch.__version__ )
In this section, we delve into the intricacies of Residual Networks (ResNets) and the role of the tune_beta parameter, represented as $ \beta $ in the mathematics, in deep learning architectures, especially as it pertains to enhancing these networks.
Residual Learning¶
In classical feed-forward neural networks, each layer is trained to directly map its input to the corresponding output. However, for exceedingly deep networks, this can be problematic due to challenges such as vanishing or exploding gradients. Enter Residual Networks (ResNets), which introduced a paradigm shift by leveraging "shortcut" or "skip" connections to enable identity mapping, thereby allowing layers to learn the residual between the input and output.
Mathematically expressed, instead of aiming to learn $ H(x) $, a ResNet endeavors to learn the residual function:
$$ F(x) = H(x) - x $$
During the forward pass, the input $ x $ is added to this residual function, rendering the actual output:
$$ H(x) = F(x) + x $$
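For concreteness, here is a minimal sketch of a single residual step in PyTorch; the names are illustrative and not part of the model built later in this post.
import torch
from torch import nn
# One residual step: the layer learns F(x); the skip connection adds x back,
# so the block returns H(x) = F(x) + x.
layer = nn.Linear(32, 32)
activation = nn.ReLU()
x = torch.randn(8, 32)           # a batch of 8 feature vectors
residual = activation(layer(x))  # F(x)
out = residual + x               # H(x) = F(x) + x
print(out.shape)                 # torch.Size([8, 32])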
Stochastic Depth and the Emergence of $ \beta $¶
Stochastic depth is a regularization technique designed to improve convergence and generalization in deep networks. The essence of this method is the random "dropping" or "bypassing" of layers during the training phase.
Originally, stochastic depth utilized a binary random value to scale a layer's output. However, the innovation of introducing $ \beta $ generalized this binary random value into a continuous, trainable parameter. This allows the network to dynamically adjust the significance of each layer's output during the training process.
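As a rough illustration of the original binary formulation (a hypothetical snippet, not the network used below), a gate drawn per forward pass either keeps or bypasses the residual branch during training:
import torch
from torch import nn
layer = nn.Linear(32, 32)
activation = nn.ReLU()
p_keep = 0.8                                  # survival probability of the residual branch
x = torch.randn(8, 32)
gate = torch.bernoulli(torch.tensor(p_keep))  # 1.0 keeps the branch, 0.0 bypasses it
out = gate * activation(layer(x)) + x         # with gate = 0 the block reduces to the identity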
Layer Scaling with $ \beta $¶
Layer scaling is based on the proposition that permitting the network to modify the significance or contribution of specific layers or blocks can optimize the training process. For a given layer's output $ F(x) $, scaling introduces the factor $ \beta $:
$$ \text{Scaled Output} = \beta \cdot F(x) $$
When tune_beta
is set to True, the network considers $ \beta $ as a learnable parameter for each layer in the residual block. This parameter scales the layer's output prior to activation. Conversely, when set to False, $ \beta $ is fixed at a value of 1, ensuring the layer's output remains unchanged.
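A minimal sketch of the same residual step with a trainable $ \beta $, mirroring what tune_beta=True does in the ResBlock defined later (the class name here is just for illustration):
import torch
from torch import nn
class BetaResidualLayer(nn.Module):
    """One residual layer whose pre-activation output is scaled by a learnable beta."""
    def __init__(self, num_neurons):
        super().__init__()
        self.layer = nn.Linear(num_neurons, num_neurons)
        self.beta = nn.Parameter(torch.ones(1))  # starts at 1, adjusted during training
        self.activation = nn.ReLU()
    def forward(self, x):
        return self.activation(self.beta * self.layer(x)) + x
block = BetaResidualLayer(32)
print(block(torch.randn(8, 32)).shape)  # torch.Size([8, 32])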
Fourier Features in Machine Learning¶
Fourier analysis provides a way to express functions in terms of a combination of sine and cosine functions. This powerful technique asserts that, irrespective of a function's complexity, it can be represented as a sum of simpler sinusoids, each with a distinct frequency and amplitude.
Fourier Transform: A Bridge to the Frequency Domain¶
The Fourier transform is an essential operation that takes a function from its original domain (typically time or space) and maps it to the frequency domain. Formally, for a function $ f(t) $, its Fourier transform $ F(\omega) $ is given by:
$$ F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-j\omega t} dt $$
The inverse process, which recovers $ f(t) $ from $ F(\omega) $, is achieved using the inverse Fourier transform:
$$ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega) e^{j\omega t} d\omega $$
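The discrete analogue of this transform pair is easy to check numerically; a quick, purely illustrative sanity check with torch.fft:
import torch
# The FFT followed by the inverse FFT recovers the original signal
# up to floating-point error.
t = torch.linspace(0., 1., 256)
f = torch.sin(2 * torch.pi * 5 * t) + 0.5 * torch.cos(2 * torch.pi * 12 * t)
F = torch.fft.fft(f)            # to the frequency domain
f_rec = torch.fft.ifft(F).real  # back to the time domain
print(torch.allclose(f, f_rec, atol=1e-5))  # True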
Fourier Features in Neural Networks¶
When deploying neural networks for tasks like regression on functions with periodic patterns, directly using raw inputs might not capture the inherent oscillatory behavior. Fourier features come to the rescue by projecting these inputs into a higher-dimensional space, enhancing the network's expressive power.
Given an input $ x $, the Fourier feature mapping can be defined as:
$$ \phi(x) = \left[ \sin(2\pi B x), \cos(2\pi B x) \right] $$
Where ( B ) is a matrix of learned or fixed frequencies. This mapping produces a sinusoidal embedding for each input dimension, amplifying the model's capacity to learn periodic patterns.
Fourier Features and the Utility of $ \beta $¶
Fourier features, often used to transform input data, can amplify the expressive power of neural networks, particularly for functions that exhibit periodic patterns. These features essentially project the input data into a higher-dimensional space, making it easier for the network to learn complex, oscillatory functions.
When used in conjunction with $ \beta $, the neural network can dynamically decide the importance of Fourier-transformed features versus the original features. By allowing the network to learn the optimal scaling factor $ \beta $, it can effectively balance between the raw and transformed features, leading to improved convergence and better generalization to unseen data.
Benefits in Machine Learning¶
- Expressiveness: With Fourier features, neural networks can learn intricate, oscillatory functions with fewer parameters.
- Generalization: By emphasizing periodic patterns, models might generalize better to unseen data exhibiting similar behaviors.
- Interpolation: For tasks like image synthesis, Fourier features can lead to smoother interpolations between data points.
In essence, Fourier features offer a bridge, allowing neural networks to tap into the rich world of frequency-domain information, thus enhancing their learning capabilities in certain tasks.
Let's demonstrate how Fourier features work using a simple 1D regression example.¶
This snippet is a good example that will help you understand Fourier features.
We'll first generate some synthetic data that follows a sine wave pattern. We'll then try to fit a neural network to this data using both the raw input features and the Fourier-transformed input features. By comparing the two fits, we can visualize the impact of the Fourier features.
# Generate synthetic data
x = np.linspace(0, 4 * np.pi, 500)[:, None]
y = np.sin(x)
# Convert to PyTorch tensors and move them to the device
x_tensor = torch.tensor(x, dtype=torch.float32).to(device)
y_tensor = torch.tensor(y, dtype=torch.float32).to(device)
# Define a simple feedforward neural network
class SimpleNN(nn.Module):
def __init__(self, input_dim):
super(SimpleNN, self).__init__()
self.fc = nn.Sequential(
nn.Linear(input_dim, 64),
nn.ReLU(),
nn.Linear(64, 1)
)
def forward(self, x):
return self.fc(x)
# Fourier transformation
B = torch.randn(1, 10).to(device)
x_fourier = torch.cat([torch.sin(2 * np.pi * x_tensor @ B), torch.cos(2 * np.pi * x_tensor @ B)], dim=1)
# Training function
def train_model(steps, model, x, y):
optimizer = optim.Adam(model.parameters(), lr=0.01)
for _ in range(steps):
optimizer.zero_grad()
loss = nn.MSELoss()(model(x), y)
loss.backward()
optimizer.step()
return model(x)
# Training steps
steps_list = [10, 30, 4000]
# Initialize and move the models to the device
model_fourier = SimpleNN(20).to(device)
model_plain = SimpleNN(1).to(device)
# Collect predictions
predictions_fourier = []
predictions_plain = []
for steps in steps_list:
predictions_fourier.append(train_model(steps, model_fourier, x_fourier, y_tensor))
predictions_plain.append(train_model(steps, model_plain, x_tensor, y_tensor))
# Plotting
fig, axs = plt.subplots(3, 1, figsize=(10, 12))
for i, (steps, y_pred_f, y_pred_p) in enumerate(zip(steps_list, predictions_fourier, predictions_plain)):
axs[i].plot(x, y, label="True Function", linestyle='dashed')
axs[i].plot(x, y_pred_f.cpu().detach().numpy(), label=f"Fourier Features after {steps} steps")
axs[i].plot(x, y_pred_p.cpu().detach().numpy(), label=f"Without Fourier Features after {steps} steps")
axs[i].legend()
axs[i].set_title(f'After {steps} Training Steps')
plt.tight_layout()
plt.show()
Let's analyze the Fourier-transformation lines step by step.
Step 1: Initialisation of vector of random frequencies¶
B = torch.randn(1, 10).to(device)
Here, we're creating a tensor $B$ of shape (1, 10) filled with random values sampled from a normal distribution. These random values serve as the frequencies for the Fourier features. Represented mathematically, this tensor is:
$$ B = \begin{bmatrix} b_1 & b_2 & \dots & b_{10} \end{bmatrix} $$
where each $b_i$ is a random frequency.
Step 2: Fourier Transformation - projection¶
Given an input tensor $x_{\text{tensor}}$, we use the Fourier transformation to project it into a higher-dimensional space using sinusoids.
x_fourier = torch.cat([torch.sin(2 * np.pi * x_tensor @ B), torch.cos(2 * np.pi * x_tensor @ B)], dim=1)
Here, we're computing the sine of the product of the input tensor and the frequency tensor $B$. The $@$ symbol represents matrix multiplication. The result is a tensor of sine values corresponding to each frequency in $B$. Mathematically:
$$ x_{\text{sin}} = \left[\sin(2\pi x_1 b_1), \sin(2\pi x_1 b_2), \ldots, \sin(2\pi x_1 b_{10})\right] $$
Similarly, we compute the cosine of the product of the input tensor and the frequency tensor. This results in a tensor of cosine values. Represented mathematically:
$$ x_{\text{cos}} = \left[\cos(2\pi x_1 b_1), \cos(2\pi x_1 b_2), \ldots, \cos(2\pi x_1 b_{10})\right] $$
Step 3: Concatenation¶
Finally, the sine and cosine transformed values are concatenated along dimension 1 to form the Fourier-transformed input:
$$ x_{\text{fourier}} = [x_{\text{sin}}, x_{\text{cos}}] $$
This transformed input, $x_{\text{fourier}}$, can then be used in subsequent layers of the neural network, enabling it to capture oscillatory patterns more effectively.
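As a quick check of the shapes involved, here is a standalone snippet mirroring the code above (the _demo names are just for illustration): a 1D input of 500 samples is expanded into 20 features, 10 sine and 10 cosine components.
import torch
x_demo = torch.randn(500, 1)   # same shape as x_tensor above
B_demo = torch.randn(1, 10)    # same shape as B above
x_fourier_demo = torch.cat([torch.sin(2 * torch.pi * x_demo @ B_demo),
                            torch.cos(2 * torch.pi * x_demo @ B_demo)], dim=1)
print(x_demo.shape, B_demo.shape, x_fourier_demo.shape)
# torch.Size([500, 1]) torch.Size([1, 10]) torch.Size([500, 20])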
In conclusion, Fourier features, as implemented in the provided code, leverage the power of sinusoidal transformations to enrich the input space of neural networks. This often results in enhanced performance, especially in tasks where the data or the solution has inherent oscillatory patterns or periodicities. For example, one of the challenges in using PINNs for PDEs is enforcing boundary and initial conditions. Fourier features, due to their inherent oscillatory nature, can provide a better fit near boundaries and initial conditions, ensuring that the neural network respects these constraints.
Dealing with High-Frequency Signals¶
One of the inherent strengths of Fourier features is their ability to handle high-frequency signals. The reason lies in the fundamental principles of Fourier analysis. High-frequency signals, by definition, oscillate rapidly. The Fourier transform, which decomposes a function into its constituent sinusoids, is adept at capturing these rapid oscillations by representing them in terms of their sine and cosine components. Thus, when the input data has high-frequency variations, the Fourier features can effectively represent these oscillations in the transformed space, ensuring that the neural network doesn't miss out on these critical patterns.
In tasks involving PDEs solved by PINNs, this becomes especially relevant. Many solutions to PDEs can exhibit rapid changes or oscillatory behaviors, especially in complex domains or under specific boundary conditions. Fourier features ensure that these behaviors are captured and represented adequately, enhancing the ability of the PINN to approximate the solution more accurately.
To summarize, Fourier features leverage sinusoidal transformations to enrich the input space of neural networks, which pays off especially in tasks where the data or the solution has inherent oscillatory patterns, high-frequency components, or periodicities.
The NN architecture implemented here uses both $\beta$-tuning and Fourier features, which facilitate the training process significantly.
seed = 123
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed) # if you have more than one GPU
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False  # disable cuDNN autotuning for reproducibility
def analytical_sol(x, t, y):
x_plot = x.squeeze(1)
t_plot = t.squeeze(1)
X, T = torch.meshgrid(x_plot, t_plot)
F_xt = y
fig = plt.figure(figsize=(25, 12))
fig.suptitle('Analytical Solution of the Heat Equation', fontsize=30)
ax1 = fig.add_subplot(121, projection='3d')
ax2 = fig.add_subplot(122)
ax1.plot_surface(T.numpy(), X.numpy(), F_xt.numpy(), cmap="twilight")
ax1.set_title('U(x,t)')
ax1.set_xlabel('t')
ax1.set_ylabel('x')
    # Set the 3D viewing angle
    ax1.view_init(elev=45, azim=45)  # adjust 'elev' and 'azim' for the desired view
cp = ax2.contourf(T, X, F_xt, 20, cmap="twilight")
fig.colorbar(cp) # Add a colorbar to a plot
ax2.set_title('U(x,t)')
ax2.set_xlabel('t')
ax2.set_ylabel('x')
plt.show()
def approx_sol(x,t,y):
X,T= x,t
F_xt = y
fig = plt.figure(figsize=(25,15))
fig.suptitle('NN Approximated Solution of the Heat Equation', fontsize=30)
ax1 = fig.add_subplot(121, projection='3d')
ax2 = fig.add_subplot(122)
ax1.plot_surface(T.numpy(), X.numpy(), F_xt.numpy(),cmap="twilight")
ax1.set_title('U(x,t)')
ax1.set_xlabel('t')
ax1.set_ylabel('x')
    # Set the 3D viewing angle
    ax1.view_init(elev=45, azim=45)  # adjust 'elev' and 'azim' for the desired view
cp = ax2.contourf(T,X, F_xt,20,cmap="twilight")
fig.colorbar(cp) # Add a colorbar to a plot
ax2.set_title('U(x,t)')
ax2.set_xlabel('t')
ax2.set_ylabel('x')
plt.show()
def contour_comparison_with_errors(x, t, y_analytical, y_approximated):
x_plot = x.squeeze(1)
t_plot = t.squeeze(1)
X, T = torch.meshgrid(x_plot, t_plot)
error_matrix = y_analytical - y_approximated
    pointwise_error = torch.abs(error_matrix)  # pointwise absolute error (equivalent to sqrt(error**2))
frobenius_norm = torch.norm(error_matrix, p='fro').item()
fig = plt.figure(figsize=(35, 12))
fig.suptitle('Comparison of Analytical and NN Approximated Solutions', fontsize=30)
# Analytical Solution
ax1 = fig.add_subplot(131)
cp1 = ax1.contourf(T, X, y_analytical, 20, cmap="twilight")
fig.colorbar(cp1, ax=ax1)
ax1.set_title('Analytical U(x,t)')
ax1.set_xlabel('t')
ax1.set_ylabel('x')
# Approximated Solution
ax2 = fig.add_subplot(132)
cp2 = ax2.contourf(T, X, y_approximated, 20, cmap="twilight")
fig.colorbar(cp2, ax=ax2)
ax2.set_title('Approximated U(x,t)')
ax2.set_xlabel('t')
ax2.set_ylabel('x')
# Error Visualization using the Euclidean norm for each point
ax3 = fig.add_subplot(133)
cp3 = ax3.contourf(T, X, pointwise_error, 20, cmap="inferno")
fig.colorbar(cp3, ax=ax3)
ax3.set_title(f'Pointwise Euclidean Error\nFrobenius Norm: {frobenius_norm:.4f}')
ax3.set_xlabel('t')
ax3.set_ylabel('x')
plt.show()
We will approximate the heat equation on the box $[0, 1] \times [0, 2]$, but as a first step we render the analytical solution of the equation in question from Mathematica. The test grid has $[x, t] = [200, 100]$ points; the residual loss uses 10,000 collocation points sampled inside the domain, plus 40 points each for the initial and boundary conditions.
# We will approximate the heat equation on the box [0, 1] x [0, 2]:
x_min=0.
x_max=1.
t_min=0.
t_max=2.#2*torch.pi#(3.*torch.pi)/4.
#Collocation discretisation of the box, i.e. for the residual term
total_points_x=200
total_points_t=100
#Create mesh
x=torch.linspace(x_min,x_max,total_points_x).view(-1,1) #add dimension
print(f'x shape is {x.shape}')
t=torch.linspace(t_min,t_max,total_points_t).view(-1,1)
# Let's define the analytical solution of our heat equation. It is an approximation made up
# of the first 30 terms, obtained with Mathematica as shown above.
def u(x, t):
pi = torch.pi
exp = torch.exp
sin = torch.sin
cos = torch.cos
expression = (
x * cos(2 * t) +
0.750034 * exp(-0.98696 * t) * sin(pi * x) +
0.0126985 * exp(-3.94784 * t) * sin(2 * pi * x) +
0.0541309 * exp(-8.88264 * t) * sin(3 * pi * x) +
0.0253758 * exp(-15.7914 * t) * sin(4 * pi * x) +
0.0210899 * exp(-24.674 * t) * sin(5 * pi * x) +
0.0149057 * exp(-35.5306 * t) * sin(6 * pi * x) +
0.0123551 * exp(-48.3611 * t) * sin(7 * pi * x) +
0.00983617 * exp(-63.1655 * t) * sin(8 * pi * x) +
0.00840252 * exp(-79.9438 * t) * sin(9 * pi * x) +
0.00707543 * exp(-98.696 * t) * sin(10 * pi * x) +
0.00619775 * exp(-119.422 * t) * sin(11 * pi * x) +
0.00539475 * exp(-142.122 * t) * sin(12 * pi * x) +
0.00481634 * exp(-166.796 * t) * sin(13 * pi * x) +
0.00428605 * exp(-193.444 * t) * sin(14 * pi * x) +
0.00388256 * exp(-222.066 * t) * sin(15 * pi * x) +
0.00351044 * exp(-252.662 * t) * sin(16 * pi * x) +
0.00321628 * exp(-285.232 * t) * sin(17 * pi * x) +
0.00294317 * exp(-319.775 * t) * sin(18 * pi * x) +
0.00272112 * exp(-356.293 * t) * sin(19 * pi * x) +
0.00251363 * exp(-394.784 * t) * sin(20 * pi * x) +
0.00234125 * exp(-435.25 * t) * sin(21 * pi * x) +
0.00217921 * exp(-477.689 * t) * sin(22 * pi * x) +
0.00204226 * exp(-522.102 * t) * sin(23 * pi * x) +
0.00191284 * exp(-568.489 * t) * sin(24 * pi * x) +
0.00180193 * exp(-616.85 * t) * sin(25 * pi * x) +
0.00169662 * exp(-667.185 * t) * sin(26 * pi * x) +
0.00160532 * exp(-719.494 * t) * sin(27 * pi * x) +
0.00151825 * exp(-773.777 * t) * sin(28 * pi * x) +
0.00144203 * exp(-830.034 * t) * sin(29 * pi * x) +
1. * (
sin(2. * t) * (
0.252637 * sin(pi * x) -
0.128324 * sin(2 * pi * x) +
0.0454747 * sin(3 * pi * x) -
0.019839 * sin(4 * pi * x) +
0.0102531 * sin(5 * pi * x) -
0.00595364 * sin(6 * pi * x) +
0.00375469 * sin(7 * pi * x) -
0.00251713 * sin(8 * pi * x) +
0.00176852 * sin(9 * pi * x) -
0.00128953 * sin(10 * pi * x) +
0.00096897 * sin(11 * pi * x) -
0.000746415 * sin(12 * pi * x) +
0.000587108 * sin(13 * pi * x) -
0.000470089 * sin(14 * pi * x) +
0.000382209 * sin(15 * pi * x) -
0.000314937 * sin(16 * pi * x) +
0.000262568 * sin(17 * pi * x) -
0.000221195 * sin(18 * pi * x) +
0.000188077 * sin(19 * pi * x) -
0.000161254 * sin(20 * pi * x) +
0.000139297 * sin(21 * pi * x) -
0.000121153 * sin(22 * pi * x) +
0.000106028 * sin(23 * pi * x) -
0.0000933193 * sin(24 * pi * x) +
0.0000825631 * sin(25 * pi * x) -
0.0000733984 * sin(26 * pi * x) +
0.0000655414 * sin(27 * pi * x) -
0.000058767 * sin(28 * pi * x) +
0.0000528949 * sin(29 * pi * x) -
0.0000477798 * sin(30 * pi * x)
) +
cos(2. * t) * (
-0.511949 * sin(pi * x) +
0.0650094 * sin(2 * pi * x) -
0.010239 * sin(3 * pi * x) +
0.00251264 * sin(4 * pi * x) -
0.000831087 * sin(5 * pi * x) +
0.000335128 * sin(6 * pi * x) -
0.000155277 * sin(7 * pi * x) +
0.0000796995 * sin(8 * pi * x) -
0.0000442442 * sin(9 * pi * x) +
0.0000261314 * sin(10 * pi * x) -
0.0000162276 * sin(11 * pi * x) +
0.0000105038 * sin(12 * pi * x) -
7.03982e-6 * sin(13 * pi * x) +
4.8602e-6 * sin(14 * pi * x) -
3.4423e-6 * sin(15 * pi * x) +
2.49295e-6 * sin(16 * pi * x) -
1.84109e-6 * sin(17 * pi * x) +
1.38344e-6 * sin(18 * pi * x) -
1.05574e-6 * sin(19 * pi * x) +
8.1692e-7 * sin(20 * pi * x) -
6.40081e-7 * sin(21 * pi * x) +
5.07247e-7 * sin(22 * pi * x) -
4.06158e-7 * sin(23 * pi * x) +
3.28306e-7 * sin(24 * pi * x) -
2.67692e-7 * sin(25 * pi * x) +
2.20024e-7 * sin(26 * pi * x) -
1.82187e-7 * sin(27 * pi * x) +
1.51896e-7 * sin(28 * pi * x) -
1.27452e-7 * sin(29 * pi * x) +
1.0758e-7 * sin(30 * pi * x)
)
) +
0.00136908 * exp(-888.264 * t) * sin(30 * pi * x)
)
return expression
# Create the mesh
X,T=torch.meshgrid(x.squeeze(1),t.squeeze(1))#[200,100]
print(x.shape)#[200,1]
print(t.shape)#[100,1]
print(X.shape)#[200,100]
print(T.shape)#[200,100]
x_min_tens = torch.Tensor([x_min])
x_max_tens = torch.Tensor([x_max])
#To get analytical solution obtained via MATHEMATICA
f_real = lambda x, t: u(x, t)
# Evaluate real solution on the box domain
U_real=f_real(X,T)
print(f'U_real shape is: {U_real.shape}')
analytical_sol(x,t,U_real) #f_real was defined previously(function)
The class Feeder() manages the mesh together with the boundary and initial conditions. Here we have a tensor $(x, t)$ for $u(x,t)$, so we have to flatten it in each dimension and then concatenate the resulting long vectors into a two-column tensor. After that, we sample points either for the collocation set or for the boundary conditions. Everything is processed by the Feeder class; the core reshaping pattern is sketched below.
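A stripped-down, standalone sketch of that pattern, using a tiny 5x3 mesh so the shapes are easy to follow:
import torch
# Flatten the meshgrid in each dimension and stack the two long vectors
# side by side, so every row is one (x, t) pair.
x_demo = torch.linspace(0., 1., 5).view(-1, 1)
t_demo = torch.linspace(0., 2., 3).view(-1, 1)
X_demo, T_demo = torch.meshgrid(x_demo.squeeze(1), t_demo.squeeze(1))
xt_pairs = torch.hstack((X_demo.T.flatten()[:, None], T_demo.T.flatten()[:, None]))
print(xt_pairs.shape)  # torch.Size([15, 2]) -- every (x, t) combination as a row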
class Feeder():
def __init__(self, N_IC, N_BC, x_min, x_max, t_min, t_max, tot_pts_x, tot_pts_t, real_func,
left_boundary_func, bottom_boundary_func, top_boundary_func):
# Create mesh
self.x = torch.linspace(x_min, x_max, tot_pts_x).view(-1,1)
self.t = torch.linspace(t_min, t_max, tot_pts_t).view(-1,1)
# Create the mesh
self.X, self.T = torch.meshgrid(self.x.squeeze(1), self.t.squeeze(1)) # [200,100]
# Set the functions as attributes of the class
self.real_func = real_func
self.left_boundary_func = left_boundary_func
self.bottom_boundary_func = bottom_boundary_func
self.top_boundary_func = top_boundary_func
self.N_IC = N_IC
self.N_BC = N_BC
def func_real(self):
self.U_real = self.real_func(self.X, self.T)
return self.x, self.t, self.U_real
def get_tensors(self, N_residual):
# Call func_real() with the provided real_func
self.func_real()
# Prepare testing data
x_test = torch.hstack((self.X.T.flatten()[:, None], self.T.T.flatten()[:, None])) # [200x100 -> 20000,2]
U_test = self.U_real.T.flatten()[:, None] # [200x100 -> 20000,1]
# Domain bounds
lb = x_test[0] # first value of the mesh
ub = x_test[-1] # last value of the mesh
# Left Edge: U(x,0) -> xmin=<x=<xmax; t=0
left_X = torch.hstack((self.X[:, 0][:, None], self.T[:, 0][:, None])) # [200,2]
left_U = self.left_boundary_func(left_X) # [200,1]
        # Bottom Edge: U(x_min=0, t) = 0; t_min <= t <= t_max
bottom_X = torch.hstack([self.X[0, :][:, None], self.T[0, :][:, None]]) # [100,2]
bottom_U = self.bottom_boundary_func(bottom_X) # [100,1]
        # Top Edge: U(x_max=1, t) = cos(2t); t_min <= t <= t_max
top_X = torch.hstack((self.X[-1, :][:, None], self.T[-1, :][:, None])) # [100,2]
top_U = self.top_boundary_func(top_X) # [100,1]
        # Sample N_IC points from the initial-condition edge
        idx = np.random.choice(left_X.shape[0], self.N_IC, replace=False)
        X_IC = left_X[idx, :]  # [N_IC, 2]
        U_IC = left_U[idx, :]  # [N_IC, 1]
        # Stack both boundary edges vertically
        X_BC = torch.vstack([bottom_X, top_X])  # [200, 2]
        U_BC = torch.vstack([bottom_U, top_U])  # [200, 1]
        # Sample N_BC points from the boundary edges
        idx = np.random.choice(X_BC.shape[0], self.N_BC, replace=False)
        X_BC = X_BC[idx, :]  # [N_BC, 2]
        U_BC = U_BC[idx, :]  # [N_BC, 1]
        # Stack initial-condition and boundary points vertically
        X_train_boundaries = torch.vstack([X_IC, X_BC])  # [N_IC + N_BC, 2]
        U_train_boundaries = torch.vstack([U_IC, U_BC])  # [N_IC + N_BC, 1]
        # Collocation points (Latin hypercube sampling) to evaluate the PDE residual
        X_train_residual = lb + (ub - lb) * torch.tensor(lhs(2, N_residual)).float()  # [N_residual, 2]
        X_train_total = torch.vstack((X_train_residual, X_train_boundaries))  # [N_residual + N_IC + N_BC, 2]
# Store tensors to GPU
X_train_boundaries = X_train_boundaries.float().to(device) # Training Points (BC)
U_train_boundaries = U_train_boundaries.float().to(device) # Training Points (BC)
X_train_total = X_train_total.float().to(device) # Collocation Points
        U_hat = torch.zeros(X_train_total.shape[0], 1).to(device)  # zero residual target, [N_residual + N_IC + N_BC, 1]
return lb, ub, x_test, U_test, X_train_boundaries, U_train_boundaries, X_train_total, U_hat
def initial_condition_function(left_X):
return torch.sqrt(left_X[:, 0]).unsqueeze(1)
def bottom_boundary_function(bottom_X):
return torch.zeros(bottom_X.shape[0], 1)
def top_boundary_function(top_X):
return torch.cos(2. * top_X[:, 1]).unsqueeze(1)
# Assuming you have a real_func defined somewhere
dane = Feeder(N_IC=40, N_BC=40, x_min=x_min, x_max=x_max, t_min=t_min, t_max=t_max, tot_pts_x=total_points_x, tot_pts_t=total_points_t, real_func=f_real,
left_boundary_func=initial_condition_function,
bottom_boundary_func=bottom_boundary_function,
top_boundary_func=top_boundary_function)
# We will use 10000 collocation points
lb, ub, x_test, U_test, X_train_boundaries, U_train_boundaries, X_train_total, U_hat = dane.get_tensors(10000)
print('x_test_shape: ', x_test.shape)
print('U_test_shape: ', U_test.shape)
print('X_train_boundaries_shape: ', X_train_boundaries.shape)
print('U_train_boundaries_shape: ', U_train_boundaries.shape)
print('X_train_total_shape: ', X_train_total.shape)
print('U_hat_shape: ', U_hat.shape)
class ResBlock(nn.Module):
def __init__(self, num_layers, num_neurons, activation, tune_beta):
super(ResBlock, self).__init__()
# Create a list of linear layers with num_layers elements
self.layers = nn.ModuleList([nn.Linear(num_neurons, num_neurons) for _ in range(num_layers)])
self.activation = activation
# If tune_beta is True, initialize beta for each layer in the ResBlock as learnable parameters
if tune_beta:
self.beta = nn.Parameter(torch.ones(num_layers, 1))
else:
# If tune_beta is False, set beta to a tensor of ones
self.beta = torch.ones(num_layers, 1).to(device)
def forward(self, x):
identity = x
for idx, layer in enumerate(self.layers):
# Apply the activation function to the output of each layer scaled by beta
x = self.activation(self.beta[idx] * layer(x)) + identity
return x
class DenseResNet(nn.Module):
def __init__(self, dim_in, dim_out, num_resnet_blocks,
num_layers_per_block, num_neurons, activation,
fourier_features, m_freqs, sigma, tune_beta, lb, ub):
super(DenseResNet, self).__init__()
self.fourier_features = fourier_features
self.activation = activation
self.lb = lb
self.ub = ub
if fourier_features:
# Initialize learnable parameter B for Fourier features
self.B = nn.Parameter(sigma * torch.randn(dim_in, m_freqs).to(device))
dim_in = 2 * m_freqs
# Define the first layer as a linear layer followed by activation
self.first = nn.Sequential(
nn.Linear(dim_in, num_neurons),
activation,
)
# Create a list of ResBlock modules, each with num_layers_per_block layers
# Pass the tune_beta flag to each ResBlock
self.resblocks = nn.ModuleList([
ResBlock(num_layers_per_block, num_neurons, activation, tune_beta)
for _ in range(num_resnet_blocks)
])
# Define the last layer as a linear layer
self.last = nn.Linear(num_neurons, dim_out)
def forward(self, x):
ub = self.ub.float().to(device)
lb = self.lb.float().to(device)
# Perform feature scaling to the input
x = (x - lb) / (ub - lb)
if self.fourier_features:
# Compute the cosine and sine components of the Fourier features
cosx = torch.cos(torch.matmul(x, self.B)).to(device)
sinx = torch.sin(torch.matmul(x, self.B)).to(device)
# Concatenate the cosine and sine components along dimension 1
x = torch.cat((cosx, sinx), dim=1)
# Forward pass through the first layer
x = self.first(x)
# Forward pass through each ResBlock in the list
for resblock in self.resblocks:
x = resblock(x)
# Forward pass through the last layer
out = self.last(x)
return out
To visualise the NN architecture (optional; uncomment to render the computation graph with torchviz):
# lb = torch.tensor([0.0, 0.0]).to(device) # Example lb
# ub = torch.tensor([1.0, 1.0]).to(device) # Example ub
# Fourier_dense_net1 = DenseResNet(dim_in=2, dim_out=1, num_resnet_blocks=3,
# num_layers_per_block=3, num_neurons=25, activation=nn.LeakyReLU(0.65),
# fourier_features=True, m_freqs=30, sigma=2*torch.pi, tune_beta=True, lb=lb, ub=ub).to(device)
# # PINN = Fourier_dense_net1.to(device)
# x = torch.randn(2,2).to(device).requires_grad_(True)
# y = Fourier_dense_net1(x)
# #make_dot(y, params=dict(list(PINN.named_parameters()))).render("Residual_net", format="png")
# make_dot(y, params=dict(list(Fourier_dense_net1.named_parameters()))).render("Residual_PINN", format="svg")
The architecture of the NN is as follows (or in more detail).
Below is a helper class used for training purposes.
class EarlyStopping:
def __init__(self, patience=30, verbose=True, delta=0):
# Initialize EarlyStopping with patience, verbosity, and delta
self.patience = patience # Number of steps with no improvement to wait before stopping
self.verbose = verbose # Whether to print messages when early stopping occurs
self.test_loss_history = [] # Use a list to store the history of test losses
self.delta = delta # Minimum change in test loss to be considered an improvement
def __call__(self, test_loss):
# Calculate a score based on the negative of the test loss (lower loss is better)
score = -test_loss
self.test_loss_history.append(score) # Append the score to the history of test losses
# If we have more than `patience` results, then look back `patience` results
if len(self.test_loss_history) > self.patience:
# Check if the current score is better than the score from `patience` steps ago
if score < self.test_loss_history[-self.patience] + self.delta:
if self.verbose:
print(f"Early stopping due to no improvement over the last {self.patience} steps.")
return True # Early stop condition is met
return False # Continue training
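Hypothetical usage of the class above: feed the current test loss at every monitoring step and stop as soon as no improvement has been seen for patience consecutive calls (the loop and the loss value here are placeholders).
early_stopping = EarlyStopping(patience=30, verbose=True)
for step in range(1000):
    test_loss = 1.0 / (step + 1)  # placeholder for the real test loss
    if early_stopping(test_loss):
        break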
The solving of forward and inverse problems using Physics-Informed Neural Networks (PINNs) involves creating a neural network that can learn to approximate the solutions to the given partial differential equations (PDEs) while respecting any known physics (i.e., the differential equations and boundary/initial conditions).
Forward Problem Algorithm:¶
For the forward problem, the goal is to find the solution $u$ for given data $f$ with specified parameters $\omega$. A typical approach for solving both the forward and inverse problems with PINNs, as seen in the literature, is based on [source]. Here is a simplified version of the forward algorithm; a minimal sketch of the resulting training loop follows the list:
- Initialize Neural Network: Set up a neural network with an appropriate architecture.
- Define Loss Function: The loss function incorporates a residual term from the PDE and possibly boundary/initial conditions.
- Train Neural Network: Train the neural network to minimize the loss function using optimization algorithms like Adam or L-BFGS.
- Evaluate: Evaluate the trained network to obtain the solution $u$ over the domain of interest.
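Here is a minimal sketch of that loop for our heat equation, assuming a generic network net, boundary/initial training data X_bc, U_bc, collocation points X_col, and the known diffusivity 0.1; all names are placeholders, and the full implementation (with Fourier features and $\beta$-tuning) follows below.
import torch
from torch import nn, autograd
def pde_residual(net, X_col):
    """Residual U_t - 0.1*U_xx of the heat equation at collocation points X_col = [x, t]."""
    g = X_col.clone().requires_grad_(True)
    U = net(g)
    grads = autograd.grad(U, g, torch.ones_like(U), create_graph=True)[0]
    U_x, U_t = grads[:, [0]], grads[:, [1]]
    U_xx = autograd.grad(U_x, g, torch.ones_like(U_x), create_graph=True)[0][:, [0]]
    return U_t - 0.1 * U_xx
def train_forward(net, X_bc, U_bc, X_col, steps=1000, lr=1e-3):
    mse = nn.MSELoss()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        res = pde_residual(net, X_col)
        loss = mse(net(X_bc), U_bc) + mse(res, torch.zeros_like(res))  # BC/IC loss + PDE loss
        loss.backward()
        opt.step()
    return net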
Inverse Problem Algorithm:¶
For the implementation, see the code further down in this post.
In the inverse problem, the aim is to deduce parameters $\gamma$ from the given data. One way to approach this is by extending the traditional PINN framework: the unknown parameters are incorporated into the neural network and optimized alongside its weights. A Bayesian approach has also been proposed, where a Bayesian Neural Network (BNN) combined with a PINN serves as the prior, and techniques like Hamiltonian Monte Carlo (HMC) or Variational Inference (VI) serve as estimators [source]. Here is a simplified version of the algorithm; a minimal sketch with a learnable $\alpha$ follows the list:
- Initialize Neural Network and Parameters: Set up a neural network with an appropriate architecture and initialize the unknown parameters $\gamma$.
- Define Loss Function: The loss function incorporates a residual term from the PDE, boundary/initial conditions, and a term related to the discrepancy between the predicted and observed data.
- Optimize: Train the neural network and optimize the unknown parameters $\gamma$ to minimize the loss function.
- Evaluate: Evaluate the trained network and optimized parameters to obtain the solution $u$ and parameters $\gamma$ over the domain of interest.
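A minimal sketch of steps 1-3 for our heat equation, where the unknown diffusivity $\alpha$ is registered as an extra learnable parameter next to the network weights; the names are placeholders, and the full Fabryka_Inverse class appears further down.
import torch
from torch import nn, autograd
def train_inverse(net, X_data, U_data, steps=1000, lr=1e-3):
    """Recover the diffusivity alpha from solution samples (X_data, U_data)."""
    mse = nn.MSELoss()
    alpha = nn.Parameter(torch.tensor([0.5], device=X_data.device))  # initial guess
    opt = torch.optim.Adam(list(net.parameters()) + [alpha], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        g = X_data.clone().requires_grad_(True)
        U = net(g)
        grads = autograd.grad(U, g, torch.ones_like(U), create_graph=True)[0]
        U_x, U_t = grads[:, [0]], grads[:, [1]]
        U_xx = autograd.grad(U_x, g, torch.ones_like(U_x), create_graph=True)[0][:, [0]]
        f = U_t - alpha * U_xx                  # PDE residual with the unknown alpha
        loss = mse(U, U_data) + mse(f, torch.zeros_like(f))
        loss.backward()
        opt.step()
    return alpha.item()                         # should approach the true value 0.1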
X_test=x_test.float().to(device) # the input dataset (complete)
U_test=U_test.float().to(device) # the real solution
class Fabryka(nn.Module):
def __init__(self, X_train_boundaries, U_train_boundaries, U_hat, X_train_total, total_pts_x, total_pts_t, dim_in, dim_out, num_resnet_blocks,
num_layers_per_block, num_neurons, activation,
fourier_features, m_freqs, sigma, tune_beta, lb, ub):
super(Fabryka, self).__init__()
self.X_train_boundaries = X_train_boundaries
self.U_train_boundaries = U_train_boundaries
self.U_hat = U_hat
self.X_train_total = X_train_total
self.total_pts_x = total_pts_x
self.total_pts_t = total_pts_t
self.loss_function = nn.MSELoss(reduction ='mean')
# Initialize the DenseResNet inside FCN
self.dense_res_net = DenseResNet(dim_in, dim_out, num_resnet_blocks,
num_layers_per_block, num_neurons, activation,
fourier_features, m_freqs, sigma, tune_beta, lb=lb, ub=ub)
def forward_loss(self):
x_BC= self.dense_res_net(self.X_train_boundaries)
loss_BC = self.loss_function(x_BC, self.U_train_boundaries) #Loss
        g = self.X_train_total.clone()  # collocation points used for the residual term
g.requires_grad=True # enable differentiation
U = self.dense_res_net(g)
        # DERIVATIVE COMPUTATIONS
        U_x_t = autograd.grad(U, g, torch.ones([g.shape[0], 1]).to(device), retain_graph=True, create_graph=True)[0]  # first derivatives
        U_xx_tt = autograd.grad(U_x_t, g, torch.ones(g.shape).to(device), create_graph=True)[0]  # second derivatives
        U_t = U_x_t[:, [1]]     # select the t-derivative (remember the input is X = [x, t])
        U_xx = U_xx_tt[:, [0]]  # select the second x-derivative (remember the input is X = [x, t])
        U = U_t - 0.1 * U_xx    # PDE residual
        loss_PDE = self.loss_function(U, self.U_hat)  # U_hat = 0: the equation is moved to the LHS, leaving 0 on the RHS
loss = loss_BC + loss_PDE
return loss
def test_PINN(self, X_test):
return self.dense_res_net(X_test)
def __call__(self, steps, lr):
optimizer = torch.optim.Adam(self.parameters(), lr=lr)
LOSSES = []
TEST_LOSSES = []
ITERACJE = []
#LEARNING_RATES = [] # Initialization
iter = 0
test_loss = torch.tensor([np.Inf]).to(device)
U_anim = np.zeros((self.total_pts_x, self.total_pts_t, steps))
early_stopping_test = EarlyStopping(patience=30, verbose=True)
print('Iter|Training Losses|Test Losses')
for i in range(steps):
optimizer.zero_grad()
loss = self.forward_loss()
loss.backward()
optimizer.step()
# Updated to handle the larger number of steps
if i < steps:
U_anim[:, :, i] = self.dense_res_net(X_test).reshape(shape=[self.total_pts_t, self.total_pts_x]).transpose(1, 0).detach().cpu()
iter += 1
if i % 30 == 0:
with torch.no_grad():
self.loss_function.eval()
# Compute test loss
test_loss = self.loss_function(self.dense_res_net(X_test), U_test)
LOSSES.append(loss.detach().cpu().numpy())
TEST_LOSSES.append(test_loss.detach().cpu().numpy())
#current_lr = optimizer.param_groups[0]['lr']
#LEARNING_RATES.append(current_lr)
ITERACJE.append(iter)
# Early stopping check
if early_stopping_test(test_loss.detach().cpu()):
break
print(iter-1, ': ', loss.detach().cpu().numpy(), '---', test_loss.detach().cpu().numpy())#, '--- LR:', current_lr)
return U_anim, ITERACJE, LOSSES, TEST_LOSSES
seed = 123
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed) # if you have more than one GPU
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False  # disable cuDNN autotuning for reproducibility
fcn_model = Fabryka(X_train_boundaries, U_train_boundaries, U_hat, X_train_total, total_points_x, total_points_t, dim_in=2, dim_out=1, num_resnet_blocks=3,
num_layers_per_block=3, num_neurons=25, activation=nn.ELU(0.7),#nn.LeakyReLU(0.65),
fourier_features=True, m_freqs=29, sigma=2, tune_beta=True, lb=lb, ub=ub) # Fill in the other required arguments
PINN = fcn_model.to(device)
steps = 4000 # or whatever number of steps you want
learning_rate = 8e-5
U_anim, ITERACJE, LOSSES, TEST_LOSSES = PINN(steps, learning_rate)
fig, (ax1, ax2) = plt.subplots(2,1, figsize=(20,15))
fig.suptitle('Historical Losses', fontsize=30)
ax1.semilogy(ITERACJE[1:], LOSSES[1:], c='deeppink',linewidth=2.8, label='Train loss')
ax1.set_ylabel('Loss')
ax1.set_xlabel('Iterations')
ax1.legend()
ax2.semilogy(ITERACJE[1:], TEST_LOSSES[1:],c='dodgerblue',linewidth=2.8, label='Test loss')
ax2.set_ylabel('Test Loss')
ax2.set_xlabel('Iterations')
ax2.legend()
plt.show()
# Save the model parameters
model_path = 'fcn_model.pth'
torch.save(PINN.state_dict(), model_path)
# Create a new instance of the model class
new_fcn_model = Fabryka(
X_train_boundaries, U_train_boundaries, U_hat, X_train_total,
total_points_x, total_points_t, dim_in=2, dim_out=1,
num_resnet_blocks=3, num_layers_per_block=3, num_neurons=25,
activation=nn.ELU(0.7), fourier_features=True, m_freqs=29,
sigma=2, tune_beta=True, lb=lb, ub=ub
)
new_PINN = new_fcn_model.to(device)
# Load the saved parameters
model_path = 'fcn_model.pth'
new_PINN.load_state_dict(torch.load(model_path))
# Set the model to evaluation mode
new_PINN.eval()
U1=new_PINN.test_PINN(X_test)
x1=X_test[:,0]
t1=X_test[:,1]
arr_x1=x1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_T1=t1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_U1=U1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_U_test=U_test.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
approx_sol(arr_x1,arr_T1,arr_U1)
contour_comparison_with_errors(x, t, U_real, arr_U1)
Inverse Problem¶
Denote a neural network (NN) function by $N$ and recall that the PDE in question is a model that can be parametrised. We write it as $u_t + model[\theta; \alpha]=0$, where $u(x,t)$ is the hidden solution and $model[\theta; \alpha]$ is a non-linear operator. Note that there are also parameters $\theta$ inside $N$; thus, for discovery purposes, we have to optimise the parameters $\theta$ and $\alpha$ together. By solving the inverse problem, for example on some measured temperature data, we can identify the PDE's parameters characteristic of a given material, and hence identify that material.
Rearranging our heat equation $U_t-0.1\,U_{xx}=0$ into the form $u_t+model[\theta; \alpha]=0$, we use a PINN to obtain $\alpha$ while fitting the parameters $\theta$ inside the NN under the PDE's operator structure.
By the universal approximation theorem, we can use a neural network $N$ to approximate any function $u(x,t) \approx N(x,t)$. Since $N$ is a function, we can obtain its derivatives via automatic differentiation. Then we can arrange these derivatives to mimic the PDE's operator structure
$$u_t-\alpha u_{xx}=N_t-\alpha N_{xx}=0$$
Define function $f$
$$f(t, x)=u_t\underbrace{-\alpha u_{xx}}_{{+ model[\theta; \alpha]}}=N_t-\alpha N_{xx}$$
and try to
$$f(x,t)\approx 0$$
$f$ is the residual that enters the loss term $\mathcal{L}_{f}$, which takes the function and returns a scalar. If $f \longrightarrow 0$, then we have discovered our $\alpha$ parameter.
To express the dependence on the parameters $\alpha$ and $\theta$ explicitly, and incorporating the additional loss due to boundary and initial conditions, the total loss function can be written as:
$$ \mathcal{L}(\theta, \alpha) = \mathcal{L}_{f}(\theta, \alpha) + \mathcal{L}_{u=data}(\theta) + \mathcal{L}_{\text{b}}(\theta) $$
Here are the individual components of the total loss function:
- Loss Relating to the PDE: $$ \mathcal{L}_{f}(\theta, \alpha) = \frac{1}{N_f} \sum_{i=1}^{N_f} \left| f\left(x_n^i, t_n^i; \theta, \alpha\right) \right|^2 $$
- Loss Relating to the Overall Performance of the Algorithm: $$ \mathcal{L}_{u=data}(\theta) = \frac{1}{N_f} \sum_{i=1}^{N_f} \left| u\left(x_n^i, t_n^i\right) - N\left(x_n^i, t_n^i; \theta\right) \right|^2 $$
- Loss Relating to Boundary and Initial Conditions: $$ \mathcal{L}_{\text{b}}(\theta) = \frac{1}{N_b} \sum_{i=1}^{N_b} \left| u\left(x_b^i, t_b^i\right) - N\left(x_b^i, t_b^i; \theta\right) \right|^2 $$
In the above expressions:
- $f$ and $N$ are functions parametrized by $\theta$ and $\alpha$.
- $N_f$ is the total number of collocation points.
- $N_b$ is the total number of boundary and initial condition points.
- $(x_b^i, t_b^i)$ are the coordinates of the boundary and initial condition points.
The goal is to minimize the total loss $\mathcal{L}$ to obtain the neural network's parameters $\theta=[W_i, b_i]$ and the PDE's parameter $\alpha$.
Let's recall relevant shapes of tensors.
seed = 123
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed) # if you have more than one GPU
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False  # disable cuDNN autotuning for reproducibility
# We will approximate the heat equation on the box [0, 1] x [0, 2]:
x_min=0.
x_max=1.
t_min=0.
t_max=2.
#Collocation discretisation of the box, i.e. for the residual term
total_points_x=200
total_points_t=100
#Create mesh
x=torch.linspace(x_min,x_max,total_points_x).view(-1,1) #add dimension
print(f'x shape is {x.shape}')
t=torch.linspace(t_min,t_max,total_points_t).view(-1,1)
# Create the mesh
X,T=torch.meshgrid(x.squeeze(1),t.squeeze(1))#[200,100]
print(x.shape)#[200,1]
print(t.shape)#[100,1]
print(X.shape)#[200,100]
print(T.shape)#[200,100]
x_min_tens = torch.Tensor([x_min])
x_max_tens = torch.Tensor([x_max])
#To get analytical solution obtained via MATHEMATICA
f_real = lambda x, t: u(x, t)
# Evaluate real solution on the box domain
U_real=f_real(X,T)
print(f'U_real shape is: {U_real.shape}')
analytical_sol(x,t,U_real) #f_real was defined previously(function)
print(x.shape,t.shape, U_real.shape)
print(x.shape[0])
print(X.shape,T.shape)
Training Data¶
For the inverse problem, we sample points from the PDE's solution, which is given. Again we have to flatten the (x,t)-grid of $u(x,t)$ in each dimension, x and t, and then concatenate the resulting long vectors (keeping the right order). We also process the initial and boundary conditions, since we know them and can use them; however, the latter is not necessary to recover the $\alpha$ parameter, albeit helpful.
def initial_condition_function(left_X):
return torch.sqrt(left_X[:, 0]).unsqueeze(1)
def bottom_boundary_function(bottom_X):
return torch.zeros(bottom_X.shape[0], 1)
def top_boundary_function(top_X):
return torch.cos(2. * top_X[:, 1]).unsqueeze(1)
# Flatten and concatenate to obtain X_true
X_true = torch.cat((X.flatten(start_dim=0).unsqueeze(1), T.flatten(start_dim=0).unsqueeze(1)), dim=1)
print(X.flatten(start_dim=0).shape)
# Domain bounds
lb = X_true[0] # [-1. 0.]
ub = X_true[-1] # [1. 1.]
usol_numpy = U_real.cpu().numpy()
U_true_numpy = usol_numpy.reshape(-1)  # flatten in C (row-major) order to match X_true
U_true = torch.tensor(U_true_numpy, dtype=torch.float32).unsqueeze(1) # Convert back to tensor and add a new dimension
# Left Edge: U(x,0) -> xmin=<x=<xmax; t=0
initial_X = torch.hstack((X[:, 0][:, None], T[:, 0][:, None])) # [200,2]
initial_U = initial_condition_function(initial_X) # [200,1]
idx = np.random.choice(initial_X.shape[0], 20, replace=False)
X_IC = initial_X[idx, :]  # [20, 2]
U_IC = initial_U[idx, :]  # [20, 1]
bottom_X = torch.hstack([X[0, :][:, None], T[0, :][:, None]]) # [100,2]
bottom_U = bottom_boundary_function(bottom_X) # [100,1]
top_X = torch.hstack((X[-1, :][:, None], T[-1, :][:, None])) # [100,2]
top_U = top_boundary_function(top_X) # [100,1]
# Stack both boundary edges vertically
X_BC = torch.vstack([bottom_X, top_X])  # [200, 2]
U_BC = torch.vstack([bottom_U, top_U])  # [200, 1]
# Sample 20 boundary points
idx = np.random.choice(X_BC.shape[0], 20, replace=False)
X_BC = X_BC[idx, :]  # [20, 2]
U_BC = U_BC[idx, :]  # [20, 1]
# Stack initial-condition and boundary points vertically
X_train_boundaries = torch.vstack([X_IC, X_BC])  # [40, 2]
U_train_boundaries = torch.vstack([U_IC, U_BC])  # [40, 1]
# Store tensors to GPU
X_train_boundaries = X_train_boundaries.float().to(device) # Training Points (BC)
U_train_boundaries = U_train_boundaries.float().to(device) # Training Points (BC)
print('U_true', U_true.shape)
total_points=len(x)*len(t)
print('total_points: ', total_points)
N_f = 1000 #Total number of collocation points
# Obtain random points for interior
id_f = np.random.choice(total_points, N_f, replace=False)# Randomly chosen points for Interior
print('id_f: ', id_f.shape)
X_train_Nu = X_true[id_f]
U_train_Nu= U_true[id_f]
# send them to our GPU
X_train_Nu = X_train_Nu.float().to(device)
print('X_train_Nu: ', X_train_Nu.shape)
U_train_Nu = U_train_Nu.float().to(device)
print('U_train_Nu: ', U_train_Nu.shape)
X_true = X_true.float().to(device)
print('X_true: ', X_true.shape)
U_true = U_true.float().to(device)
print('U_true: ', U_true.shape)
f_hat = torch.zeros(X_train_Nu.shape[0],1).to(device)
print('f_hat: ', f_hat.shape)
class Fabryka_Inverse(nn.Module):
def __init__(self, dim_in, dim_out, num_resnet_blocks,
num_layers_per_block, num_neurons, activation,
fourier_features, m_freqs, sigma, tune_beta, alpha, lb, ub):
super(Fabryka_Inverse, self).__init__()
self.loss_function = nn.MSELoss(reduction ='mean') # Mean squared error loss
self.iter = 0 # Iteration counter
# alpha is a parameter that will be learned (related to the PDE)
self.alpha = torch.tensor([alpha], requires_grad=True).float().to(device)
self.alpha = nn.Parameter(self.alpha)
# DenseResNet is a custom network architecture used in this model
self.dnn = DenseResNet(
dim_in=dim_in, dim_out=dim_out,
num_resnet_blocks=num_resnet_blocks,
num_layers_per_block=num_layers_per_block,
num_neurons=num_neurons, activation=activation,
fourier_features=fourier_features, m_freqs=m_freqs,
sigma=sigma, tune_beta=tune_beta, lb=lb, ub=ub
).to(device)
# Registering alpha as a parameter of dnn
self.dnn.register_parameter('alpha', self.alpha)
def loss_data(self,x,y):
# Loss against real data
loss_u = self.loss_function(self.dnn(x), y)
return loss_u
def loss_boundaries(self, X_boundaries, U_boundaries):
# Compute the loss for both initial and boundary conditions
loss_b = self.loss_function(self.dnn(X_boundaries), U_boundaries)
return loss_b
def loss_PDE(self, X_train_Nu):
alpha=self.alpha
#lambda2=self.lambda2 # Uncomment if lambda2 is needed
g = X_train_Nu.clone()
g.requires_grad = True # Setting requires_grad to True for automatic differentiation
u = self.dnn(g)
# Calculating gradients and second-order gradients
u_x_t = autograd.grad(u, g, torch.ones([X_train_Nu.shape[0], 1]).to(device), retain_graph=True, create_graph=True)[0]
u_xx_tt = autograd.grad(u_x_t, g, torch.ones(X_train_Nu.shape).to(device), create_graph=True)[0]
u_x = u_x_t[:,[0]]
u_t = u_x_t[:,[1]]
u_xx = u_xx_tt[:,[0]]
f = u_t - (alpha)*u_xx # Residual of the PDE
loss_f = self.loss_function(f,f_hat) # f_hat should be defined or passed as an argument
return loss_f
def loss_total(self, X_col, U_col, X_boundaries, U_boundaries):
# Combine data loss, PDE loss, initial loss, and boundary loss
loss_u = self.loss_data(X_col, U_col)
loss_f = self.loss_PDE(X_col)
loss_b = self.loss_boundaries(X_boundaries, U_boundaries)
loss_val = loss_u + loss_f + loss_b# + loss_i
return loss_val
def __call__(self, steps, lr, X_boundaries, U_boundaries):
# Collecting parameters for optimization
params = list(self.dnn.parameters())#+ [self.alpha]
optimizer = torch.optim.Adam(params, lr=lr) # Adam optimizer
iter = 0
for i in range(steps):
optimizer.zero_grad()
# Ensure the correct arguments are passed to loss_total
loss = self.loss_total(X_train_Nu, U_train_Nu, X_boundaries, U_boundaries)# X_IC, U_IC, X_BC, U_BC) # If loss_total requires arguments, provide them here
loss.backward() # Backpropagation
optimizer.step() # Update step
self.iter += 1
if self.iter % 100 == 0:
error_vec, _ = self.test()
print(f'Iter: {self.iter}, Relative Error(Test): {error_vec.cpu().detach().numpy():.5f}, α_real = [0.1], α_PINN = [{self.alpha.item():.5f}]')
# print(
# 'Iter: %d, Relative Error(Test): %.5f , α_real = [0.1], α_PINN = [%.5f]' %
# (
# self.iter,
# error_vec.cpu().detach().numpy(),
# self.alpha.item(),
# )
# )
return loss
def test(self):
u_pred = self.dnn(X_true) # X_true should be defined or passed as an argument
# Relative L2 Norm of the error (Vector)
error_vec = torch.linalg.norm((U_true-u_pred),2)/torch.linalg.norm(U_true,2) # U_true should be defined or passed as an argument
u_pred = u_pred.cpu().detach().numpy()
u_pred = np.reshape(u_pred,(x.shape[0],t.shape[0]),order='C') # x and t should be defined or passed as arguments
return error_vec, u_pred
#Set default dtype to float32
torch.set_default_dtype(torch.float)
torch.manual_seed(123)
np.random.seed(123)
fcn_instance_inv = Fabryka_Inverse(
dim_in=2, dim_out=1, num_resnet_blocks=3,
num_layers_per_block=3, num_neurons=25, activation=nn.ELU(0.853), # nn.Tanh(),#nn.SiLU(),
fourier_features=True, m_freqs=29, sigma=2, tune_beta=True, alpha=0.5, lb=lb, ub=ub)
fcn_instance_inv.to(device)
# Now call the __call__ method of FCN with steps and lr
steps = 20000
lr = 7e-5
fcn_instance_inv(steps, lr, X_train_boundaries, U_train_boundaries)
# Save the model parameters
model_path = 'fcn_model_inv.pth'
torch.save(fcn_instance_inv.state_dict(), model_path)
# # Create a new instance of FCN
# loaded_fcn_instance = Fabryka_Inverse(
# dim_in=2, dim_out=1, num_resnet_blocks=3,
# num_layers_per_block=3, num_neurons=25, activation=nn.ELU(0.77), # nn.Tanh(),#nn.SiLU(),
# fourier_features=True, m_freqs=29, sigma=2, tune_beta=True, alpha=0.5, lb=lb, ub=ub
# )
# # Load the model parameters from the file
# loaded_fcn_instance.load_state_dict(torch.load(model_path))
# # Move the model to the desired device
# loaded_fcn_instance.to(device)
error_vec, u_pred = fcn_instance_inv.test()
contour_comparison_with_errors(x, t, U_real, torch.tensor(u_pred))  # convert the NumPy prediction back to a tensor
import matplotlib.gridspec as gridspec # Import GridSpec from matplotlib for specifying grid layout of subplots
from mpl_toolkits.axes_grid1 import make_axes_locatable # Import make_axes_locatable for managing layout
def solutionplot(u_pred,X_u_train,u_train): # Define function solutionplot with three parameters
fig, ax = plt.subplots(figsize=(25, 22)) # Create figure and axis objects with specified figure size
ax.axis('off') # Turn off the axis (hide axis labels and ticks)
gs0 = gridspec.GridSpec(1, 1) # Create a GridSpec object with 1 row and 1 column
gs0.update(top=1-0.06, bottom=1-1/3, left=0.15, right=0.85, wspace=0) # Update GridSpec layout parameters
ax = plt.subplot(gs0[:, :]) # Create a subplot based on the GridSpec layout
h = ax.imshow(u_pred, interpolation='nearest', cmap='twilight',
extent=[T.min(), T.max(), X.min(), X.max()],
origin='lower', aspect='auto') # Display u_pred as an image on the axis
divider = make_axes_locatable(ax) # Create a divider for the existing axis instance
cax = divider.append_axes("right", size="5%", pad=0.05) # Append axes to the right of ax with specified size and padding
fig.colorbar(h, cax=cax) # Add a colorbar to the right of the image
# Scatter plot for the data points with adjusted marker size
ax.scatter(X_u_train[:, 1], X_u_train[:, 0], color='k', marker='.', label='Data (%d points)' % (u_train.shape[0]), s=40)
# Scatter plot for the initial condition points with adjusted marker size
ax.scatter(X_IC[:, 1], X_IC[:, 0], color='g', marker='x', label='IC Points (%d points)' % (X_IC.shape[0]), s=200)
# Scatter plot for the boundary condition points with adjusted marker size
ax.scatter(X_BC[:, 1], X_BC[:, 0], color='y', marker='o', label='BC Points (%d points)' % (X_BC.shape[0]), s=200)
ax.legend(loc='upper right') # Create a legend in the upper right corner of the subplot
plt.show() # Display the figure
solutionplot(U_real, X_train_Nu.cpu().detach().numpy(), U_train_Nu)
In this summary, we delve into the enhancement of Physics-Informed Neural Networks (PINNs) training via the integration of Fourier features and Residual Neural Networks. Furthermore, we explore the mesh invariance issue witnessed in PINNs and propose the adoption of Neural Operators as a viable solution.
(a) Utilizing Fourier Features and Residual Neural Networks in PINNs:
The amalgamation of Fourier features and Residual Neural Networks (ResNets) has been recognized as a promising approach to augment the training of PINNs for both forward and inverse problems. Fourier features are crucial in capturing the periodic nature and the multi-scale behaviors inherent in many physical systems. They aid in enriching the feature space, thus allowing the neural networks to learn and generalize better across varying scales and domains. ResNets, on the other hand, facilitate the training of deeper networks by mitigating the vanishing gradient problem, which is pivotal in learning hierarchical features and ensuring the convergence of the training process. When harmonized, these techniques contribute to a more robust and efficient training regime, improving the performance of PINNs in solving Partial Differential Equations (PDEs) inherent in the physical problems at hand.
(b) Mesh Invariance Issue in PINNs:
Despite the advancements, a notable drawback of PINNs is their sensitivity to the mesh grid, number of points on the grid, and other hyperparameters. Particularly, the quality of results may deteriorate with alterations in the grid configuration or boundary conditions. This lack of mesh invariance poses a challenge as it impedes the robustness and generalizability of PINNs across varying discretizations and geometries. Mesh invariance is essentially the ability of the model to provide consistent results irrespective of the mesh grid configurations. The absence of this invariance in PINNs could potentially lead to inconsistent or erroneous results across different problem setups.
(c) Alleviating Invariance Issues through Neural Operators:
Neural Operators emerge as a promising avenue to overcome the mesh invariance issue witnessed in PINNs. Unlike traditional PINNs that operate on a fixed grid, Neural Operators are designed to learn mappings between function spaces, thus offering a level of abstraction that transcends fixed discretizations. By operating in this higher-level function space, Neural Operators are capable of achieving mesh invariance, ensuring consistent and accurate results across varying grid configurations and boundary conditions. This distinction underscores a fundamental shift from solving PDEs in a discretized domain (as done in PINNs) to a more generalized approach that operates in continuous function spaces. Although the exact implementation may vary, the core idea remains to establish a learning framework that is less tethered to the discretization scheme and more focused on the inherent structures and properties of the underlying PDEs.
In conclusion, the fusion of Fourier features and Residual Neural Networks significantly boosts the efficacy of PINNs in tackling forward and inverse problems. However, the mesh invariance issue remains a hurdle, necessitating further exploration of alternative strategies like Neural Operators to attain a more robust and generalized solution framework for solving PDEs.