Kreyszig 2.6 Linear Operators

Problem 1. Show that the operators in sections 2.6-2, 2.6-3, and 2.6-4 are linear.

Solution:

To show that the operators in sections 2.6-2, 2.6-3, and 2.6-4 are linear, we need to verify that each operator satisfies the two linearity conditions for all vectors \(x, y\) in the domain and all scalars \(a, b\) in the field over which the vector space is defined:

  1. \(T(x + y) = T(x) + T(y)\) (additivity)

  2. \(T(ax) = aT(x)\) (homogeneity)

Let's consider each operator in turn:

2.6-2 Identity Operator \(I_X\): The identity operator \(I_X\) on a vector space \(X\) is defined by \(I_X(x) = x\) for all \(x \in X\).

  • For additivity, consider two vectors \(x, y \in X\). We need to show that \(I_X(x + y) = I_X(x) + I_X(y)\). Indeed, \(I_X(x + y) = x + y = I_X(x) + I_X(y)\).

  • For homogeneity, consider a scalar \(a\) and a vector \(x \in X\). We need to show that \(I_X(ax) = aI_X(x)\). Indeed, \(I_X(ax) = ax = aI_X(x)\).

2.6-3 Zero Operator \(0_X\): The zero operator \(0_X\) from a vector space \(X\) into another vector space \(Y\) is defined by \(0_X(x) = 0\) for all \(x \in X\), where \(0\) is the zero vector in \(Y\).

  • For additivity, consider two vectors \(x, y \in X\). We have \(0_X(x + y) = 0 = 0 + 0 = 0_X(x) + 0_X(y)\).

  • For homogeneity, for any scalar \(a\) and vector \(x \in X\), \(0_X(ax) = 0 = a \cdot 0 = a0_X(x)\).

2.6-4 Differentiation Operator \(D\): Let \(X\) be the vector space of all polynomials on \([a, b]\). The differentiation operator \(D\) is defined by \(D(x(t)) = x'(t)\), where the prime denotes differentiation with respect to \(t\).

  • For additivity, let \(x(t)\) and \(y(t)\) be polynomials in \(X\). Then \(D(x(t) + y(t)) = (x + y)'(t) = x'(t) + y'(t) = D(x(t)) + D(y(t))\).

  • For homogeneity, let \(a\) be a scalar and \(x(t)\) be a polynomial in \(X\). Then \(D(a \cdot x(t)) = (a \cdot x)'(t) = a \cdot x'(t) = a \cdot D(x(t))\).

In all cases, the operators satisfy the linearity conditions, hence they are indeed linear operators.

\(\blacksquare\)
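
The linearity identities above are purely algebraic, so they can also be spot-checked numerically. The following is a minimal sketch (using NumPy's polynomial class; the sample polynomials and the scalar are arbitrary choices, not taken from the text) that verifies additivity and homogeneity of the differentiation operator on concrete inputs:

```python
# Minimal sketch: spot-check the linearity of the differentiation operator D
# on two sample polynomials and one sample scalar.
import numpy as np
from numpy.polynomial import Polynomial as P

x = P([1.0, -2.0, 3.0])        # x(t) = 1 - 2t + 3t^2   (sample polynomial)
y = P([0.0, 4.0, 0.0, 5.0])    # y(t) = 4t + 5t^3       (sample polynomial)
a = 2.5                        # sample scalar

D = lambda p: p.deriv()        # the differentiation operator

assert np.allclose(D(x + y).coef, (D(x) + D(y)).coef)   # additivity
assert np.allclose(D(a * x).coef, (a * D(x)).coef)      # homogeneity
print("Additivity and homogeneity hold for the sample polynomials.")
```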


Problem 2. Show that the operators \(T_1, T_2, T_3\), and \(T_4\) from \(\mathbb{R}^2\) into \(\mathbb{R}^2\) defined by

  • \(T_1(\xi_1, \xi_2) = (\xi_1, 0)\)

  • \(T_2(\xi_1, \xi_2) = (0, \xi_2)\)

  • \(T_3(\xi_1, \xi_2) = (\xi_2, \xi_1)\)

  • \(T_4(\xi_1, \xi_2) = (\gamma\xi_1, \gamma\xi_2)\)

respectively, are linear, and interpret these operators geometrically.

Solution: To demonstrate the linearity of operators \(T_1, T_2, T_3\), and \(T_4\), we must verify that each operator satisfies the following properties for all vectors \(\xi, \eta \in \mathbb{R}^2\) and all scalars \(a \in \mathbb{R}\):

  1. Additivity: \(T(\xi + \eta) = T(\xi) + T(\eta)\)

  2. Homogeneity: \(T(a\xi) = aT(\xi)\)

For \(T_1\):

  • Additivity: \(T_1((\xi_1 + \eta_1, \xi_2 + \eta_2)) = (\xi_1 + \eta_1, 0) = (\xi_1, 0) + (\eta_1, 0) = T_1(\xi_1, \xi_2) + T_1(\eta_1, \eta_2)\)

  • Homogeneity: \(T_1(a(\xi_1, \xi_2)) = (a\xi_1, 0) = a(\xi_1, 0) = aT_1(\xi_1, \xi_2)\)

For \(T_2\), additivity and homogeneity can be shown similarly, with \(T_2\) projecting any vector onto the y-axis.

For \(T_3\):

  • Additivity: \(T_3((\xi_1 + \eta_1, \xi_2 + \eta_2)) = (\xi_2 + \eta_2, \xi_1 + \eta_1) = (\xi_2, \xi_1) + (\eta_2, \eta_1) = T_3(\xi_1, \xi_2) + T_3(\eta_1, \eta_2)\)

  • Homogeneity: \(T_3(a(\xi_1, \xi_2)) = (a\xi_2, a\xi_1) = a(\xi_2, \xi_1) = aT_3(\xi_1, \xi_2)\)

For \(T_4\):

  • Additivity: \(T_4((\xi_1 + \eta_1, \xi_2 + \eta_2)) = (\gamma(\xi_1 + \eta_1), \gamma(\xi_2 + \eta_2)) = (\gamma\xi_1, \gamma\xi_2) + (\gamma\eta_1, \gamma\eta_2) = T_4(\xi_1, \xi_2) + T_4(\eta_1, \eta_2)\)

  • Homogeneity: \(T_4(a(\xi_1, \xi_2)) = (a\gamma\xi_1, a\gamma\xi_2) = a(\gamma\xi_1, \gamma\xi_2) = aT_4(\xi_1, \xi_2)\)

Geometric Interpretation:

  • \(T_1\) and \(T_2\) are projection operators onto the x-axis and y-axis respectively.

  • \(T_3\) is a reflection operator across the line \(\xi_1 = \xi_2\).

  • \(T_4\) is a uniform scaling (dilation) of the plane by the factor \(\gamma\); for \(\gamma = 0\) it sends every vector to the origin.

\(\blacksquare\)
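
Since the four operators are given by explicit formulas, their linearity can also be spot-checked numerically. A minimal sketch (the value of \(\gamma\), the random test vectors, and the scalar are arbitrary choices):

```python
# Minimal sketch: check additivity and homogeneity of T1, T2, T3, T4
# on randomly chosen vectors and a randomly chosen scalar.
import numpy as np

gamma = 3.0
T1 = lambda v: np.array([v[0], 0.0])
T2 = lambda v: np.array([0.0, v[1]])
T3 = lambda v: np.array([v[1], v[0]])
T4 = lambda v: gamma * v

rng = np.random.default_rng(0)
xi, eta = rng.normal(size=2), rng.normal(size=2)
a = rng.normal()

for T in (T1, T2, T3, T4):
    assert np.allclose(T(xi + eta), T(xi) + T(eta))   # additivity
    assert np.allclose(T(a * xi), a * T(xi))          # homogeneity
print("All four operators pass the linearity checks on the sample data.")
```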


Problem 3. What are the domain, range, and null space of \(T_1, T_2, T_3\) in Problem 2?

Solution:

To determine the domain, range, and null space of the linear operators \(T_1, T_2,\) and \(T_3\), we consider their definitions from Problem 2.

For Operator \(T_1\): \(T_1(\xi_1, \xi_2) = (\xi_1, 0)\)

  • Domain: The domain of \(T_1\) is the entire \(\mathbb{R}^2\).

  • Range: The range of \(T_1\) is the x-axis, given by \(\{(\xi_1, 0) \mid \xi_1 \in \mathbb{R}\}\).

  • Null Space: The null space of \(T_1\) is the set of all vectors that map to the zero vector under \(T_1\), which is \(\{(0, \xi_2) \mid \xi_2 \in \mathbb{R}\}\).

For Operator \(T_2\): \(T_2(\xi_1, \xi_2) = (0, \xi_2)\)

  • Domain: The domain of \(T_2\) is the entire \(\mathbb{R}^2\).

  • Range: The range of \(T_2\) is the y-axis, described by \(\{(0, \xi_2) \mid \xi_2 \in \mathbb{R}\}\).

  • Null Space: The null space of \(T_2\) includes all vectors that \(T_2\) maps to the zero vector, which is \(\{(\xi_1, 0) \mid \xi_1 \in \mathbb{R}\}\).

For Operator \(T_3\): \(T_3(\xi_1, \xi_2) = (\xi_2, \xi_1)\)

  • Domain: The domain of \(T_3\) is the entire \(\mathbb{R}^2\).

  • Range: The range of \(T_3\) is also \(\mathbb{R}^2\) since any vector in \(\mathbb{R}^2\) can be obtained by applying \(T_3\) to some vector in \(\mathbb{R}^2\).

  • Null Space: The null space of \(T_3\) is the set of vectors that are mapped to the zero vector, which is only the zero vector itself \(\{(0, 0)\}\).

These operators' geometric interpretations relate to their ranges and null spaces, with \(T_1\) and \(T_2\) acting as projection operators onto the x-axis and y-axis, respectively, and \(T_3\) reflecting vectors across the line \(\xi_1 = \xi_2\).

\(\blacksquare\)
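
The null spaces can be cross-checked from matrix representations of the operators (the same \(2 \times 2\) matrices derived in Problem 8 below). A minimal sketch using SymPy, which returns a basis of each null space exactly:

```python
# Minimal sketch: compute the null spaces of T1, T2, T3 from their matrices.
from sympy import Matrix

T1 = Matrix([[1, 0], [0, 0]])
T2 = Matrix([[0, 0], [0, 1]])
T3 = Matrix([[0, 1], [1, 0]])

print(T1.nullspace())   # [Matrix([[0], [1]])]  -> the xi_2-axis
print(T2.nullspace())   # [Matrix([[1], [0]])]  -> the xi_1-axis
print(T3.nullspace())   # []                    -> only the zero vector
```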


Problem 4. What is the null space of \(T_4\) in Problem 2? Of \(T_1\) and \(T_2\) in 2.6-7? Of \(T\) in 2.6-4?

Solution:

Given the definitions of the operators \(T_4\) from Problem 2, \(T_1\) and \(T_2\) from 2.6-7, and \(T\) from 2.6-4, we can find their null spaces.

For Operator \(T_4\) from Problem 2 (Scaling by \(\gamma\)):

  • Definition: \(T_4\) is defined on \(\mathbb{R}^2\) by \(T_4(\xi_1, \xi_2) = (\gamma\xi_1, \gamma\xi_2)\).

  • Null Space: If \(\gamma \neq 0\), then \((\gamma\xi_1, \gamma\xi_2) = (0, 0)\) forces \(\xi_1 = \xi_2 = 0\), so the null space is \(\{(0, 0)\}\). If \(\gamma = 0\), every vector is mapped to \((0, 0)\), and the null space is all of \(\mathbb{R}^2\).

For Operator \(T\) from 2.6-4 (Differentiation):

  • Definition: \(T\) is defined on the vector space \(X\) of all polynomials on \([a, b]\) by \(T(x(t)) = x'(t)\), where the prime denotes differentiation with respect to \(t\).

  • Null Space: The null space of the differentiation operator consists of all polynomials \(x(t)\) such that \(x'(t) = 0\). Thus, the null space of \(T\) is the set of all constant polynomials on \([a, b]\).

For Operator \(T_1\) from 2.6-7 (Cross product with a fixed vector):

  • Definition: \(T_1\) is defined on \(\mathbb{R}^3\) by \(T_1(\vec{x}) = \vec{x} \times \vec{a}\), where \(\vec{a}\) is a fixed vector in \(\mathbb{R}^3\).

  • Null Space: The null space of \(T_1\) includes all vectors \(\vec{x}\) such that \(\vec{x} \times \vec{a} = \vec{0}\). For \(\vec{a} \neq \vec{0}\) these are exactly the scalar multiples of \(\vec{a}\), including the zero vector; if \(\vec{a} = \vec{0}\), the null space is all of \(\mathbb{R}^3\).

For Operator \(T_2\) from 2.6-7 (Dot product with a fixed vector):

  • Definition: \(T_2\) is defined on \(\mathbb{R}^3\) by \(T_2(\vec{x}) = \vec{x} \cdot \vec{a}\), where \(\vec{a} = (a_i)\) is a fixed vector in \(\mathbb{R}^3\).

  • Null Space: The null space of \(T_2\) consists of all vectors \(\vec{x}\) that are orthogonal to \(\vec{a}\), which is the orthogonal complement of the vector \(\vec{a}\) in \(\mathbb{R}^3\).

The null spaces reflect the specific transformations these operators perform on their respective vector spaces.

\(\blacksquare\)
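
The cross-product and dot-product cases can be illustrated numerically. The sketch below (the fixed vector \(\vec{a}\) and the test vectors are arbitrary choices) checks that multiples of \(\vec{a}\) are annihilated by \(T_1\) and that vectors orthogonal to \(\vec{a}\) are annihilated by \(T_2\):

```python
# Minimal sketch: null-space membership checks for T1(x) = x x a (cross
# product) and T2(x) = x . a (dot product) with an arbitrary fixed a in R^3.
import numpy as np

a = np.array([1.0, 2.0, 3.0])          # fixed (arbitrary) vector
T1 = lambda x: np.cross(x, a)
T2 = lambda x: np.dot(x, a)

print(T1(5.0 * a))                     # [0. 0. 0.]: multiples of a are in N(T1)
x_perp = np.array([2.0, -1.0, 0.0])    # satisfies x_perp . a = 0
print(T2(x_perp))                      # 0.0: orthogonal vectors are in N(T2)
x = np.array([1.0, 0.0, 0.0])          # a generic vector is in neither
print(T1(x), T2(x))                    # nonzero cross product, nonzero dot
```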


Problem 7. Determine if the operators \(T_1\) and \(T_3\) from Problem 2 commute.

Given:

  • \(T_1(\xi_1, \xi_2) = (\xi_1, 0)\)

  • \(T_3(\xi_1, \xi_2) = (\xi_2, \xi_1)\)

Solution:

To check for commutativity, we compare \((T_3T_1)(\xi_1, \xi_2) = T_3(T_1(\xi_1, \xi_2))\) with \((T_1T_3)(\xi_1, \xi_2) = T_1(T_3(\xi_1, \xi_2))\).

Applying \(T_1\) followed by \(T_3\):

  1. Apply \(T_1\) to \((\xi_1, \xi_2)\):

    \(T_1(\xi_1, \xi_2) = (\xi_1, 0)\)

  2. Then apply \(T_3\) to the result:

    \(T_3(\xi_1, 0) = (0, \xi_1)\)

Applying \(T_3\) followed by \(T_1\):

  1. Apply \(T_3\) to \((\xi_1, \xi_2)\):

    \(T_3(\xi_1, \xi_2) = (\xi_2, \xi_1)\)

  2. Then apply \(T_1\) to the result:

    \(T_1(\xi_2, \xi_1) = (\xi_2, 0)\)

Comparing Results:

  • \(T_3T_1\) (apply \(T_1\) first, then \(T_3\)) yields \((0, \xi_1)\).

  • \(T_1T_3\) (apply \(T_3\) first, then \(T_1\)) yields \((\xi_2, 0)\).

Since \((0, \xi_1) \neq (\xi_2, 0)\) for arbitrary \(\xi_1, \xi_2\) (for example, \((\xi_1, \xi_2) = (1, 2)\) gives \((0, 1)\) versus \((2, 0)\)), we conclude that \(T_1\) and \(T_3\) do not commute.

Conclusion:

The operators \(T_1\) and \(T_3\) do not satisfy the commutativity property \(T_1T_3 = T_3T_1\) for all vectors in \(\mathbb{R}^2\). Therefore, they are non-commutative.

\(\blacksquare\)
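
A direct computational check of the two compositions (a minimal sketch; the operators are written as plain Python functions and the test vector is arbitrary):

```python
# Minimal sketch: compare the two compositions of T1 and T3 on a sample vector.
T1 = lambda v: (v[0], 0)
T3 = lambda v: (v[1], v[0])

v = (1, 2)             # arbitrary test vector (xi_1, xi_2)
print(T3(T1(v)))       # T3T1: apply T1 first, then T3 -> (0, 1)
print(T1(T3(v)))       # T1T3: apply T3 first, then T1 -> (2, 0)
# The two outputs differ, so T1 and T3 do not commute.
```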


Problem 8. Represent the operators \(T_1, T_2, T_3\), and \(T_4\) from Problem 2 using \(2 \times 2\) matrices.

Given Operators:

  • \(T_1(\xi_1, \xi_2) = (\xi_1, 0)\)

  • \(T_2(\xi_1, \xi_2) = (0, \xi_2)\)

  • \(T_3(\xi_1, \xi_2) = (\xi_2, \xi_1)\)

  • \(T_4(\xi_1, \xi_2) = (\gamma\xi_1, \gamma\xi_2)\)

Matrix Representations:

  • For \(T_1\):

    The matrix representation is: \(T_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\)

  • For \(T_2\):

    The matrix representation is: \(T_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\)

  • For \(T_3\):

    The matrix representation is: \(T_3 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\)

  • For \(T_4\):

    The matrix representation is: \(T_4 = \begin{bmatrix} \gamma & 0 \\ 0 & \gamma \end{bmatrix}\)

Conclusion:

Each operator from Problem 2 can be expressed as a \(2 \times 2\) matrix. These matrices transform vectors in \(\mathbb{R}^2\) by linearly scaling and/or permuting their components as specified by the operators.

\(\blacksquare\)
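
A quick numerical sanity check (a minimal sketch using NumPy; the value of \(\gamma\) and the test vector are arbitrary choices) that the matrices above reproduce the four operators:

```python
# Minimal sketch: verify that the 2x2 matrices reproduce T1, ..., T4.
import numpy as np

gamma = 3.0
M1 = np.array([[1.0, 0.0], [0.0, 0.0]])
M2 = np.array([[0.0, 0.0], [0.0, 1.0]])
M3 = np.array([[0.0, 1.0], [1.0, 0.0]])
M4 = gamma * np.eye(2)

v = np.array([1.0, 2.0])                  # arbitrary test vector (xi_1, xi_2)
assert np.allclose(M1 @ v, [v[0], 0.0])   # T1(xi) = (xi_1, 0)
assert np.allclose(M2 @ v, [0.0, v[1]])   # T2(xi) = (0, xi_2)
assert np.allclose(M3 @ v, [v[1], v[0]])  # T3(xi) = (xi_2, xi_1)
assert np.allclose(M4 @ v, gamma * v)     # T4(xi) = (gamma xi_1, gamma xi_2)
print("All matrix representations agree with the operator definitions.")
```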


Problem 9. Elaborate the condition in 2.6-10(a) regarding the existence of an inverse operator, \(T^{-1}\), in the context of the null space of \(T\).

Theorem Interpretation: The theorem from section 2.6-10(a) can be restated in the context of the null space of \(T\) as follows:

  • The inverse operator \(T^{-1}\) from \(\mathcal{R}(T)\) to \(\mathcal{D}(T)\) exists if and only if the only solution to \(Tx = 0\) is the trivial solution \(x = 0\). This is equivalent to saying that the null space of \(T\), denoted \(N(T)\) or \(\text{ker}(T)\), consists solely of the zero vector.

Definitions:

  • Linear Operator: A mapping \(T: \mathcal{D}(T) \rightarrow Y\), where the domain \(\mathcal{D}(T)\) is a subspace of a vector space \(X\) and \(Y\) is a vector space over the same field, adhering to additivity (\(T(x + z) = T(x) + T(z)\)) and homogeneity (\(T(\alpha x) = \alpha T(x)\)) for all \(x, z \in \mathcal{D}(T)\) and scalars \(\alpha\).

  • Inverse Operator: \(T^{-1}: \mathcal{R}(T) \rightarrow \mathcal{D}(T)\) is the reverse mapping such that \(T^{-1}(Tx) = x\) for all \(x \in \mathcal{D}(T)\) and \(T(T^{-1}y) = y\) for all \(y \in \mathcal{R}(T)\).

  • Null Space: Denoted by \(N(T)\) or \(\text{ker}(T)\), it is the set of vectors \(x \in \mathcal{D}(T)\) where \(T(x) = 0\).

In-Depth Analysis of Theorem 2.6-10(a):

This theorem posits that \(T^{-1}\) can only exist if \(Tx = 0\) strictly leads to \(x = 0\). Essentially, \(N(T)\) must be trivial, comprised solely of the zero vector. If \(N(T)\) included any non-zero vector, \(T\) could not be injective, as it would map distinct vectors to the same point (the zero vector in \(Y\)), contravening the injectivity required for an inverse \(T^{-1}: \mathcal{R}(T) \rightarrow \mathcal{D}(T)\).

Formulating the Condition for Inverse Existence:

The existence condition for \(T^{-1}\) relative to the null space of \(T\) is that \(N(T) = \{0\}\). This reflects the injectivity of \(T\).

Examples:

  • For an Injective Operator: If \(T\) is represented by a square matrix \(A\) with linearly independent columns (equivalently, \(\det(A) \neq 0\)), then \(N(T) = \{0\}\), affirming the existence of \(T^{-1}\).

  • For a Non-Injective Operator: Should \(T\) be depicted by a matrix \(A\) containing a zero row, \(N(T)\) would be non-trivial, housing non-zero vectors, thus negating the presence of \(T^{-1}\).

Conclusion:

The theorem outlined in 2.6-10(a) underscores a pivotal tenet in linear algebra: the invertibility of a linear operator is inherently dependent on the exclusivity of the zero vector in its null space. An operator \(T\) is invertible if and only if \(N(T)\) is trivial, serving as a vital criterion for \(T\)'s injectivity.

\(\blacksquare\)


Problem 10. Determine the existence of the inverse operator \(T^{-1}\) for the differentiation operator \(T\) as defined in section 2.6-4.

Operator Definition:

The operator \(T\) defined in section 2.6-4 is the differentiation operator acting on the vector space \(X\) of all polynomials on the interval \([a, b]\). The action of \(T\) is defined by \(T(x(t)) = x'(t)\), where \(x'(t)\) denotes the derivative of \(x(t)\) with respect to \(t\).

Inverse Operator Existence Criteria:

By Theorem 2.6-10(a), the inverse \(T^{-1}: \mathcal{R}(T) \rightarrow X\) exists if and only if \(T\) is injective (one-to-one), that is, if and only if \(Tx = 0\) implies \(x = 0\). (An inverse defined on all of \(X\) would additionally require \(T\) to be surjective onto \(X\).)

Injectivity Analysis:

\(T\) is injective if \(T(x) = T(y)\) implies \(x = y\). For the differentiation operator, if \(x'(t) = y'(t)\) for two polynomials \(x(t)\) and \(y(t)\), then \(x(t)\) and \(y(t)\) differ by a constant, and that constant need not be zero. In particular, every constant polynomial is mapped to the zero polynomial, so \(T\) is not injective on \(X\). (Injectivity could only be restored by restricting attention to a subspace of \(X\) on which the constant term is fixed, for example by requiring \(x(a) = 0\).)

Surjectivity Analysis:

\(T\) maps \(X\) onto \(X\): for every polynomial \(y(t)\) there is a polynomial \(x(t)\) with \(T(x) = y\), since any antiderivative of a polynomial is again a polynomial. Thus \(\mathcal{R}(T) = X\), and surjectivity is not the obstacle to the existence of an inverse.

Existence of \(T^{-1}\):

For the differentiation operator \(T\), a candidate inverse would correspond to integration. However, because of the arbitrary constant of integration, a given polynomial in \(\mathcal{R}(T)\) has many preimages: \(T\) is not injective, and by Theorem 2.6-10(a) the inverse \(T^{-1}\) does not exist.

Conclusion:

The inverse \(T^{-1}\) of the differentiation operator \(T\) as defined in 2.6-4 does not exist, because \(T\) is not injective: every constant polynomial is mapped to the zero polynomial, so \(Tx = 0\) does not imply \(x = 0\). The constant of integration involved in the antiderivative is precisely the ambiguity that prevents a well-defined inverse from \(\mathcal{R}(T)\) back into \(X\).

\(\blacksquare\)

Counterexample Illustration:

Consider the differentiation operator \(T\) on the space \(X\) of polynomials on \([a, b]\), and the two polynomials

\begin{equation*} x_1(t) = t^2, \qquad x_2(t) = t^2 + 1. \end{equation*}

Applying \(T\) to Both Polynomials:

\begin{equation*} T(x_1(t)) = 2t = T(x_2(t)). \end{equation*}

Analysis:

Two distinct elements of \(X\) are mapped to the same image \(2t\). An inverse \(T^{-1}: \mathcal{R}(T) \rightarrow X\) would have to assign to \(2t\) a single preimage, which is impossible, since both \(x_1\) and \(x_2\) qualify; more generally, \(x(t)\) and \(x(t) + C\) always have the same derivative, for any constant \(C\).

Conclusion:

Since \(T\) maps distinct elements of \(X\) to the same element of \(\mathcal{R}(T)\), it is not injective, and by 2.6-10(a) no inverse \(T^{-1}\) exists. The pair \(x_1, x_2\) is a concrete witness to this failure of injectivity, confirming the non-existence of an inverse operator \(T^{-1}\) mapping \(\mathcal{R}(T)\) back to \(X\).


Problem 11. Verify the linearity of the operator \(T: X \rightarrow X\) defined by \(T(x) = bx\) for a fixed \(2 \times 2\) complex matrix \(b\), and determine the condition for the existence of the inverse operator \(T^{-1}\).

Proof of Linearity: To demonstrate that \(T\) is linear, it must satisfy additivity and homogeneity.

  • Additivity:

For any \(2 \times 2\) matrices \(x\) and \(y\) in \(X\):

\begin{equation*} T(x + y) = b(x + y) = bx + by = T(x) + T(y) \end{equation*}
  • Homogeneity:

For any complex scalar \(\alpha\) and matrix \(x\) in \(X\):

\begin{equation*} T(\alpha x) = b(\alpha x) = \alpha bx = \alpha T(x) \end{equation*}

Since \(T\) satisfies both properties, we conclude that \(T\) is indeed a linear operator.

Condition for the Existence of \(T^{-1}\): The inverse operator \(T^{-1}\) exists if and only if \(T\) is bijective, which entails being both injective and surjective.

  • Injectivity:

\(T\) is injective if \(T(x) = T(y)\) implies \(x = y\). Since \(T(x) = T(y)\) means \(b(x - y) = 0\), injectivity holds precisely when the only \(2 \times 2\) matrix \(w\) with \(bw = 0\) is \(w = 0\), i.e., when the matrix \(b\) is invertible, \(\text{det}(b) \neq 0\). (If \(\text{det}(b) = 0\), there is a nonzero column vector \(w\) with \(bw = 0\), and the nonzero matrix \(x\) whose columns are \(w\) and \(0\) satisfies \(bx = 0\).)

  • Surjectivity:

\(T\) is surjective if for every \(z\) in \(X\), there exists an \(x\) such that \(T(x) = z\). This is true if \(b\) is invertible, allowing us to solve \(x = b^{-1}z\) for any \(z\).

Therefore, the inverse operator \(T^{-1}\) exists if and only if the matrix \(b\) is invertible, characterized by a non-zero determinant, \(\text{det}(b) \neq 0\).

\(\blacksquare\)
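
The invertibility condition can be made concrete by writing \(T\) as a \(4 \times 4\) matrix acting on the entries of \(x\): with row-wise flattening, left multiplication by \(b\) corresponds to the Kronecker product \(b \otimes I_2\), whose determinant is \(\det(b)^2\). A minimal sketch (the matrices \(b\) and \(x\) are arbitrary choices):

```python
# Minimal sketch: represent T(x) = b @ x as a 4x4 matrix acting on the
# row-wise flattening of x, and relate its determinant to det(b).
import numpy as np

b = np.array([[1.0, 2.0], [3.0, 4.0]])   # arbitrary 2x2 matrix, det(b) = -2
x = np.array([[5.0, 6.0], [7.0, 8.0]])   # arbitrary 2x2 "vector" in X

M = np.kron(b, np.eye(2))                # vec(b @ x) = M @ vec(x)
assert np.allclose(M @ x.flatten(), (b @ x).flatten())

print(np.linalg.det(M), np.linalg.det(b) ** 2)   # both approximately 4.0
# M (and hence T) is invertible exactly when det(b) != 0; here det(b) = -2,
# so T^{-1}(z) = b^{-1} @ z exists.
```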


Problem 12. Assess the surjectivity of the operator \(T: X \rightarrow X\), defined by \(T(x) = bx\) for a fixed matrix \(b\) in \(X\), where \(X\) is the vector space of all \(2 \times 2\) complex matrices, and \(bx\) denotes the standard product of matrices.

Surjectivity Definition:

An operator \(T\) is said to be surjective if for every matrix \(z\) in \(X\), there is a matrix \(x\) in \(X\) such that \(T(x) = z\). Formally, this means that the equation \(bx = z\) has a solution for every matrix \(z\) in \(X\).

Condition for Surjectivity:

The operator \(T\) defined by matrix multiplication is surjective if and only if the matrix \(b\) is invertible. This is equivalent to the requirement that \(\text{det}(b) \neq 0\). If \(b\) is invertible, then for every matrix \(z\) in \(X\), there exists a unique matrix \(x = b^{-1}z\) that solves the equation \(bx = z\), indicating that \(T\) maps onto the entire space \(X\).

Conclusion:

Surjectivity of the operator \(T\) hinges on the invertibility of the matrix \(b\). If \(b\) is not invertible (i.e., \(\text{det}(b) = 0\)), not all matrices \(z\) in \(X\) will have a pre-image under \(T\), and thus \(T\) will not be surjective. Conversely, if \(b\) is invertible, \(T\) is surjective, ensuring that the inverse operator \(T^{-1}\) exists and operates as \(T^{-1}(z) = b^{-1}z\) for all \(z\) in \(X\).

\(\blacksquare\)
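
The surjectivity argument amounts to solving \(bx = z\) for \(x\). A minimal sketch (with an arbitrary invertible \(b\) and an arbitrary target \(z\)); NumPy's solver accepts a matrix right-hand side, so it produces the preimage directly:

```python
# Minimal sketch: for invertible b, every z has the unique preimage x = b^{-1} z.
import numpy as np

b = np.array([[1.0, 2.0], [3.0, 4.0]])   # det(b) = -2 != 0, so b is invertible
z = np.array([[1.0, 0.0], [0.0, 1.0]])   # arbitrary target matrix in X

x = np.linalg.solve(b, z)                # solves b @ x = z
assert np.allclose(b @ x, z)
print(x)                                 # the preimage of z under T
# For a singular b, e.g. [[1, 2], [2, 4]], np.linalg.solve raises LinAlgError,
# and some targets z have no preimage at all: T is then not surjective.
```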


Problem 13 Prove that if \(\{x_1, \ldots, x_n\}\) is a linearly independent set in \(\mathcal{D}(T)\), and \(T: \mathcal{D}(T) \rightarrow Y\) is a linear operator with an inverse, then the set \(\{Tx_1, \ldots, Tx_n\}\) is also linearly independent.

Proof:

Assume for contradiction that \(\{Tx_1, \ldots, Tx_n\}\) is not linearly independent. Then there exist scalars \(c_1, \ldots, c_n\), not all zero, such that:

\begin{equation*} c_1 Tx_1 + \ldots + c_n Tx_n = 0. \end{equation*}

Applying the inverse operator \(T^{-1}\) to both sides, and using the linearity of \(T^{-1}\), we obtain:

\begin{equation*} c_1 T^{-1}(Tx_1) + \ldots + c_n T^{-1}(Tx_n) = T^{-1}(0). \end{equation*}

Since \(T^{-1}T\) is the identity operator on \(\mathcal{D}(T)\), we have \(T^{-1}(Tx_i) = x_i\) for all \(i\). Knowing that the identity operator maps \(0\) to \(0\), the equation simplifies to:

\begin{equation*} c_1 x_1 + \ldots + c_n x_n = 0. \end{equation*}

This implies that \(c_1, \ldots, c_n\) must all be zero because \(\{x_1, \ldots, x_n\}\) is linearly independent, contradicting our assumption.

Conclusion:

Therefore, the set \(\{Tx_1, \ldots, Tx_n\}\) must be linearly independent, under the condition that \(T\) is invertible. This holds true due to the fundamental properties of linear transformations and their inverses in vector space theory.

\(\blacksquare\)
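
A numerical illustration of the statement (a minimal sketch; the independent vectors and the invertible operator are random choices, and independence is checked via matrix rank):

```python
# Minimal sketch: an invertible linear map preserves linear independence.
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.normal(size=(n, n))              # columns play the role of x_1,...,x_n
assert np.linalg.matrix_rank(X) == n     # they are linearly independent

T = rng.normal(size=(n, n))              # a random operator on R^n
assert np.linalg.matrix_rank(T) == n     # invertible, so T^{-1} exists

TX = T @ X                               # columns are T x_1, ..., T x_n
print(np.linalg.matrix_rank(TX))         # -> n: the images remain independent
```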


Problem 14. Prove that for a linear operator \(T: X \rightarrow Y\) with \(\text{dim} X = \text{dim} Y = n\), the range of \(T\), \(\mathcal{R}(T)\), is equal to \(Y\) if and only if the inverse operator \(T^{-1}\) exists.

Proof:

Forward Direction (\(\mathcal{R}(T) = Y\) implies \(T^{-1}\) exists):

If \(\mathcal{R}(T) = Y\), then \(T\) is surjective, meaning for every \(y \in Y\), there exists at least one \(x \in X\) such that \(T(x) = y\). Since \(\text{dim} X = \text{dim} Y\), \(T\) is a surjective linear map between two finite-dimensional vector spaces of equal dimension, which implies \(T\) is also injective. This is a consequence of the Rank-Nullity Theorem, which in this case implies that \(\text{nullity}(T) = 0\) because \(\text{rank}(T) = \text{dim} Y = n\) and \(\text{rank}(T) + \text{nullity}(T) = \text{dim} X\).

Being both injective and surjective, \(T\) is bijective, and therefore an inverse \(T^{-1}\) exists by definition.

Reverse Direction (\(T^{-1}\) exists implies \(\mathcal{R}(T) = Y\)):

If \(T^{-1}\) exists, then by definition, \(T\) is bijective, meaning it is both injective and surjective. The surjectivity of \(T\) immediately gives us \(\mathcal{R}(T) = Y\), because for every \(y \in Y\), the existence of \(T^{-1}\) guarantees an \(x \in X\) such that \(T(x) = y\).

Conclusion:

The range of \(T\), \(\mathcal{R}(T)\), is equal to \(Y\) if and only if \(T\) is bijective, and since \(T\) is linear, this bijectivity is equivalent to the existence of an inverse \(T^{-1}\). This holds true for finite-dimensional vector spaces \(X\) and \(Y\) of equal dimension \(n\).

\(\blacksquare\)

Detailed Explanation of the Rank-Nullity Theorem in Context:

The Rank-Nullity Theorem is pivotal in understanding the relationship between the dimensions of a linear operator's range, null space, and domain. For a linear operator \(T: X \rightarrow Y\) with \(\text{dim} X = \text{dim} Y = n\), the theorem is expressed as:

\begin{equation*} \text{rank}(T) + \text{nullity}(T) = \text{dim} X \end{equation*}

Here, \(\text{rank}(T)\) represents the dimension of the range of \(T\) (\(\mathcal{R}(T)\)), and \(\text{nullity}(T)\) signifies the dimension of the null space of \(T\) (\(N(T)\)).

Application to the Given Problem:

  1. If \(\mathcal{R}(T) = Y\):

  • The rank of \(T\) is the dimension of \(Y\), hence \(\text{rank}(T) = \text{dim} Y = n\).

  • Applying the Rank-Nullity Theorem, and knowing \(\text{dim} X = n\), we deduce that \(\text{nullity}(T) = 0\), which implies that \(T\) is injective.

  • A linear operator that is injective and surjective is bijective, indicating the existence of an inverse \(T^{-1}\).

  2. If \(T^{-1}\) Exists:

    • The existence of \(T^{-1}\) implies \(T\) is bijective. Consequently, \(T\) is injective, leading to \(\text{nullity}(T) = 0\).

    • Since \(T\) is also surjective, \(\text{rank}(T) = \text{dim} Y = n\).

    • The Rank-Nullity Theorem then confirms that \(\text{rank}(T) + \text{nullity}(T) = n\), which equals \(\text{dim} X\), thus confirming that \(\mathcal{R}(T) = Y\).

Conclusion:

The Rank-Nullity Theorem in this scenario confirms that the linear operator \(T\) is invertible if and only if it is surjective. When the domain and codomain are finite-dimensional vector spaces of equal dimension, surjectivity implies injectivity, which is integral to establishing the existence of an inverse operator \(T^{-1}\).
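
The rank-nullity bookkeeping in this argument can be illustrated numerically (a minimal sketch with a random square matrix standing in for \(T\); nullity is computed as \(n\) minus the rank):

```python
# Minimal sketch: for an n x n matrix T, rank + nullity = n, and rank = n
# (surjectivity) holds exactly when the null space is trivial (injectivity).
import numpy as np

rng = np.random.default_rng(2)
n = 5
T = rng.normal(size=(n, n))              # a random, (almost surely) invertible T

rank = np.linalg.matrix_rank(T)
print(rank, n - rank)                    # -> 5 0: surjective and injective

S = T.copy()
S[:, -1] = S[:, 0]                       # force a linearly dependent column
rank_S = np.linalg.matrix_rank(S)
print(rank_S, n - rank_S)                # -> 4 1: range is a proper subspace,
# and correspondingly the null space is nontrivial, so no inverse exists.
```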


Problem 15. We are tasked with proving that the range \(\mathcal{R}(T)\) of a linear operator \(T\) defined on the vector space \(X\) of all real-valued functions with derivatives of all orders is the entirety of \(X\). However, we must also demonstrate that the inverse \(T^{-1}\) does not exist. This is to be contrasted with Problem 14.

Showing that \(\mathcal{R}(T)\) is all of \(X\):

Any function \(y(t)\) in \(X\) can be expressed as the derivative of another function in \(X\), as the space includes functions with derivatives of all orders. We can take an antiderivative of \(y(t)\) to find a function \(x(t)\) in \(X\) whose derivative is \(y(t)\), that is, \(x'(t) = y(t)\). Since the space of functions is closed under integration, this antiderivative \(x(t)\) is also in \(X\). This demonstrates that for every \(y(t)\) in \(X\), there exists an \(x(t)\) in \(X\) such that \(T(x(t)) = y(t)\), confirming that \(\mathcal{R}(T)\) is all of \(X\).

Showing that \(T^{-1}\) does not exist:

An inverse operator \(T^{-1}\) would map a function \(y(t)\) to a function \(x(t)\) such that \(T(x(t)) = y(t)\). However, the process of taking an antiderivative is not unique due to the constant of integration. Hence, \(T\) is not injective, as multiple functions in \(X\) can map to the same function under \(T\). Since injectivity is a necessary condition for the existence of an inverse, \(T^{-1}\) does not exist.

Comparison with Problem 14 and Comments:

Problem 14 involves a finite-dimensional vector space, where surjectivity implies invertibility. In contrast, Problem 15 deals with an infinite-dimensional vector space of smooth functions, where surjectivity is not sufficient for invertibility. The non-uniqueness of the antiderivatives prevents \(T\) from being injective, unlike in finite dimensions, where surjectivity implies injectivity due to the Rank-Nullity Theorem.

Conclusion:

Despite \(\mathcal{R}(T)\) covering all of \(X\), the non-uniqueness of the antiderivative, due to the constant of integration, prevents \(T\) from being injective, thus precluding the existence of \(T^{-1}\). This example underscores a significant distinction between linear operators in finite-dimensional spaces and those in infinite-dimensional spaces.
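
Both halves of the argument can be illustrated symbolically (a minimal sketch using SymPy; the sample function is an arbitrary smooth function): every smooth function has a smooth antiderivative, while adding a constant leaves the derivative unchanged, so \(T\) is surjective but not injective.

```python
# Minimal sketch: differentiation on smooth functions is surjective
# (antiderivatives exist) but not injective (constants differentiate to 0).
from sympy import symbols, sin, cos, diff, integrate, simplify

t, C = symbols('t C')

y = t * sin(t)                          # arbitrary smooth target function
x = integrate(y, t)                     # a smooth antiderivative of y
print(simplify(diff(x, t) - y))         # -> 0, so y lies in the range of T

x1 = cos(t)
x2 = cos(t) + C                         # differs from x1 by a constant
print(diff(x1, t) - diff(x2, t))        # -> 0: two preimages of the same image
```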

Kreyszig 2.5 Compactness and Finite Dimension

Problem 1. Show that \(\mathbb{R}^n\) and \(\mathbb{C}^n\) are not compact.

Solution

To show that \(\mathbb{R}^n\) and \(\mathbb{C}^n\) are not compact, we can utilize the Heine-Borel theorem, which characterizes compact subsets of \(\mathbb{R}^n\). The theorem states that a subset of \(\mathbb{R}^n\) is compact if and only if it is closed and bounded.

For \(\mathbb{R}^n\):

  1. Closedness: By definition, \(\mathbb{R}^n\) is closed because it contains all its limit points; every convergent sequence in \(\mathbb{R}^n\) has a limit that is also in \(\mathbb{R}^n\).

  2. Boundedness: However, \(\mathbb{R}^n\) is not bounded. To see this, consider the sequence \(\{x_k\}_{k=1}^{\infty}\) where \(x_k = (k, 0, 0, \ldots, 0)\) in \(\mathbb{R}^n\). This sequence has no bound within \(\mathbb{R}^n\) because for any given \(M \in \mathbb{R}\) there exists an \(N\) such that for all \(k > N\), \(\|x_k\| = k > M\).

Thus, since \(\mathbb{R}^n\) is not bounded, it cannot be compact according to the Heine-Borel theorem.

For \(\mathbb{C}^n\):

  1. Closedness: \(\mathbb{C}^n\) is also closed because it includes all its limit points; it is the entire space of \(n\)-tuples of complex numbers.

  2. Boundedness: We can show that \(\mathbb{C}^n\) is not bounded in a similar manner to \(\mathbb{R}^n\). Consider the sequence \(\{z_k\}_{k=1}^{\infty}\) where \(z_k = (k, 0, \ldots, 0)\) in \(\mathbb{C}^n\) and \(k\) represents the complex number \(k + 0i\). This sequence, too, is unbounded in \(\mathbb{C}^n\), for the same reason as in \(\mathbb{R}^n\).

Thus, \(\mathbb{C}^n\) is not bounded and, hence, not compact.

Note that \(\mathbb{C}^n\) is isomorphic to \(\mathbb{R}^{2n}\) since each complex number corresponds to a pair of real numbers (the real and imaginary parts). Therefore, the non-compactness of \(\mathbb{C}^n\) follows from the non-compactness of \(\mathbb{R}^{2n}\).

In conclusion, neither \(\mathbb{R}^n\) nor \(\mathbb{C}^n\) is compact because, although both are closed, neither is bounded.

\(\blacksquare\)

The detailed explanation of why the sequence \(\{x_k\}_{k=1}^{\infty}\) with \(x_k = (k, 0, 0, \ldots, 0)\) in \(\mathbb{R}^n\) is unbounded is as follows:

For any chosen real number \(M\), no matter how large, there exists a natural number \(N\) such that for all \(k > N\), the value of \(k\) is greater than \(M\). Since the first component of \(x_k\) equals \(k\), this gives \(\|x_k\| \geq k > M\). The formal expression of this statement is:

\begin{equation*} \forall M \in \mathbb{R}, \exists N \in \mathbb{N} : \forall k > N, \ \|x_k\| > M \end{equation*}

This expression states that for any real number \(M\) one might consider as a potential bound, there is a point in the sequence, beyond the \(N\)th term, where the norms of the terms exceed \(M\). Thus, for any \(M\) in \(\mathbb{R}\), we can find elements of the sequence \(\{x_k\}\) whose norm is larger than \(M\), demonstrating that the sequence is unbounded in \(\mathbb{R}^n\).

According to the Heine-Borel theorem, a necessary condition for a subset of \(\mathbb{R}^n\) to be compact is that it be bounded. Since \(\mathbb{R}^n\) contains this unbounded sequence, it fails the boundedness condition required by the theorem and therefore is not compact.


Problem 2. Show that a discrete metric space \(X\) consisting of infinitely many points is not compact.

Solution

To prove that an infinite discrete metric space \(X\) is not compact, we use the definition of compactness in metric spaces. A metric space is compact if every open cover has a finite subcover.

In a discrete metric space, the metric is defined such that \(d(x, x) = 0\) for all \(x \in X\), and \(d(x, y) = 1\) for all \(x \neq y\). This implies that each point in \(X\) is isolated from every other point. We can then consider an open cover of \(X\) consisting of the open balls \(\{B(x, \frac{1}{2})\}\) for each \(x \in X\). Since every point other than \(x\) lies at distance 1 from \(x\), each of these balls reduces to the singleton \(B(x, \frac{1}{2}) = \{x\}\); in particular it is an open set containing no point of \(X\) other than its center.

Since \(X\) contains infinitely many points, the collection \(\{B(x, \frac{1}{2})\}\) is an infinite cover for \(X\). If \(X\) were compact, there would exist a finite subcover that still covers \(X\). However, this is not possible because each open ball in our cover contains exactly one point of \(X\) and no two balls contain the same point. Thus, no finite collection of these balls can cover the entirety of \(X\).

Consequently, there is no finite subcover possible for the cover \(\{B(x, \frac{1}{2})\}\), which means that the discrete metric space \(X\) cannot be compact.

\(\blacksquare\)


Problem 3. Give examples of compact and noncompact curves in the plane \(\mathbb{R}^2\).

Solution

Compact Curves:

  1. Unit Circle: The set of all points \((x, y)\) such that \(x^2 + y^2 = 1\). This curve is closed and bounded.

  2. Square: The boundary of the square with vertices at \((\pm1, \pm1)\), consisting of its four edges. It is a closed and bounded curve.

  3. Triangle: The boundary formed by connecting points \((0, 0)\), \((1, 0)\), and \((0, 1)\). Each edge is a closed line segment, making the whole triangle compact.

  4. Ellipse: The set of all points \((x, y)\) satisfying \(\frac{x^2}{4} + y^2 = 1\). This curve is closed and bounded.

Noncompact Curves:

  1. Ray: The set of points \((x, y)\) forming a ray extending from the origin indefinitely, like \(\{(t, t) | t \geq 0\}\). This curve is unbounded.

  2. Hyperbola: The set of points \((x, y)\) satisfying \(xy = 1\). This curve is unbounded; its two branches run off to infinity along the coordinate axes.

  3. Infinite Line: A line like \(y = x\) that extends without bound in both directions.

  4. Logarithmic Spiral: Defined by the polar equation \(r = e^{\theta}\), this curve winds away from the origin infinitely.

These examples illustrate the distinction in \(\mathbb{R}^2\) between compact sets, which are both closed and bounded, and noncompact sets, which are not closed, not bounded, or both.


Problem 4. Show that for an infinite subset \(M\) in the space \(s\) to be compact, it is necessary that there are numbers \(\gamma_1, \gamma_2, \ldots\) such that for all \(x = (\xi_k(x)) \in M\) we have \(|\xi_k(x)| \leq \gamma_k\). (It can be shown that the condition is also sufficient for the compactness of \(M\).)

Solution

To show the necessity of the condition for compactness in the space \(s\), as defined in the problem statement, we need to demonstrate that for any sequence in a compact subset \(M\) of \(s\), the elements of the sequence must be uniformly bounded by some sequence \(\{\gamma_k\}\).

Consider the space \(s\) of all sequences of complex numbers, where the metric \(d\) is given by:

\begin{equation*} d(x, y) = \sum_{j=1}^{\infty} \frac{1}{2^j} \frac{| \xi_j - \eta_j |}{1 + | \xi_j - \eta_j |} \end{equation*}

with \(x = (\xi_k)\) and \(y = (\eta_k)\) being elements of \(s\).

The metric \(d\) is designed such that the "distance" it measures is the sum of a series of terms, each of which is a fraction of the absolute difference between the components of two elements \(x\) and \(y\), scaled by \(1/2^j\). This series converges because each term is less than or equal to \(1/2^j\), and \(\sum 1/2^j\) is a convergent geometric series.

Now let us consider the subset \(M \subset s\). If \(M\) is compact, then by the definition of (sequential) compactness in metric spaces, every sequence in \(M\) has a subsequence that converges in \(s\) to a point of \(M\).

To prove the necessity of the stated condition, suppose it fails: for some index \(j\), the set of \(j\)-th coordinates \(\{\xi_j(x) : x \in M\}\) is unbounded. Then we can choose a sequence \(\{x^{(n)}\}\) in \(M\), with \(x^{(n)} = (\xi_k^{(n)})\), such that \(|\xi_j^{(n)}| \to \infty\). Along any subsequence, the \(j\)-th coordinates still tend to infinity, so they cannot form a Cauchy sequence of numbers; there are infinitely many pairs \(m, k\) with \(|\xi_j^{(m)} - \xi_j^{(k)}| > 1\), and for such pairs

\begin{equation*} d(x^{(m)}, x^{(k)}) \geq \frac{1}{2^j} \frac{| \xi_j^{(m)} - \xi_j^{(k)} |}{1 + | \xi_j^{(m)} - \xi_j^{(k)} |} > \frac{1}{2^{j+1}}. \end{equation*}

Consequently no subsequence of \(\{x^{(n)}\}\) is Cauchy with respect to \(d\), so no subsequence converges in \(s\), contradicting the compactness of \(M\).

Therefore, for \(M\) to be compact, it is necessary that there exist numbers \(\gamma_1, \gamma_2, \ldots\) such that for all \(x = (\xi_k(x)) \in M\) we have \(|\xi_k(x)| \leq \gamma_k\). The condition is known to be sufficient as well: under such uniform bounds, a Cantor diagonal argument, applying the Bolzano-Weierstrass theorem coordinate by coordinate, extracts from any sequence in \(M\) a subsequence that converges in the metric of \(s\).

\(\blacksquare\)
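
The metric of \(s\) is straightforward to evaluate numerically by truncating the series; the neglected tail beyond index \(N\) is at most \(2^{-N}\). A minimal sketch (the two sample sequences are arbitrary choices):

```python
# Minimal sketch: evaluate the metric of the space s by truncating the series.
def d(x, y, N=50):
    """Approximate d(x, y) = sum_j 2^-j |xi_j - eta_j| / (1 + |xi_j - eta_j|)."""
    total = 0.0
    for j in range(1, N + 1):
        diff = abs(x(j) - y(j))
        total += (0.5 ** j) * diff / (1.0 + diff)
    return total            # the neglected tail is at most 2**(-N)

x = lambda j: 1.0 / j       # sample sequence (1, 1/2, 1/3, ...)
y = lambda j: 0.0           # the zero sequence
print(d(x, y))              # a number strictly less than 1
```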


Problem 5. A metric space \(X\) is said to be locally compact if every point of \(X\) has a compact neighborhood. Show that \(\mathbb{R}\) and \(\mathbb{C}\), and more generally, \(\mathbb{R}^n\) and \(\mathbb{C}^n\) are locally compact.

Solution

To prove that \(\mathbb{R}\), \(\mathbb{C}\), and by extension \(\mathbb{R}^n\) and \(\mathbb{C}^n\), are locally compact, we utilize the following concepts:

  1. A space is locally compact if each point has a compact neighborhood.

  2. A set is compact if every open cover has a finite subcover; in \(\mathbb{R}^n\) (and hence in \(\mathbb{C}^n\), identified with \(\mathbb{R}^{2n}\)) this is equivalent, by the Heine-Borel theorem, to the set being closed and bounded.

  3. A neighborhood of a point includes an open set containing that point.

Detailed Proofs of Local Compactness

Detailed Proof for \(\mathbb{R}\):

For any point \(x \in \mathbb{R}\), we can identify a neighborhood around \(x\), such as the open interval \((x - \epsilon, x + \epsilon)\) for some \(\epsilon > 0\). The closure of this interval is the closed interval \([x - \epsilon, x + \epsilon]\), which encompasses its limit points and is delimited by the points \(x - \epsilon\) and \(x + \epsilon\). By the Heine-Borel theorem, as \([x - \epsilon, x + \epsilon]\) is both closed and bounded within \(\mathbb{R}\), it is compact. Therefore, every point \(x\) possesses a compact neighborhood in \(\mathbb{R}\), affirming its local compactness.

Detailed Proof for \(\mathbb{C}\):

Upon recognizing \(\mathbb{C}\) as topologically equivalent to \(\mathbb{R}^2\), for any \(z \in \mathbb{C}\), we consider the open disk centered at \(z\), denoted \(D(z, \epsilon)\) for some \(\epsilon > 0\). This disk serves as a neighborhood of \(z\). The closure of \(D(z, \epsilon)\), which consists of all points inside and on the boundary of the disk, constitutes a closed set. It is also bounded by the circumference of the disk. Thus, by the Heine-Borel theorem, the closure of \(D(z, \epsilon)\) is compact in \(\mathbb{C}\), corroborating its local compactness.

Detailed Proof for \(\mathbb{R}^n\):

For an arbitrary point \(x \in \mathbb{R}^n\), we select the open ball \(B(x, \epsilon)\) centered at \(x\) with a radius \(\epsilon > 0\). The closure of this ball, \(\overline{B(x, \epsilon)}\), which includes all points within and on the periphery of the sphere, is closed. Moreover, it is bounded as all points lie within a maximum distance \(\epsilon\) from \(x\). Consequently, \(\overline{B(x, \epsilon)}\) is compact as per the Heine-Borel theorem, demonstrating that \(\mathbb{R}^n\) is locally compact since \(x\) has a compact neighborhood.

Detailed Proof for \(\mathbb{C}^n\):

Given that \(\mathbb{C}^n\) aligns with \(\mathbb{R}^{2n}\) topologically, each complex coordinate having a real and imaginary part, for any point \(z \in \mathbb{C}^n\), an open ball in \(\mathbb{R}^{2n}\) can be centered at the point corresponding to \(z\) with a radius \(\epsilon > 0\). The closure of this ball is also a closed and bounded set in \(\mathbb{R}^{2n}\), and hence compact. This provides every point in \(\mathbb{C}^n\) with a compact neighborhood, certifying local compactness.

Each proof underlines the principle that local compactness is evidenced by the ability to encase any point within a closed and bounded (thus compact) subset, meeting the local compactness criterion.


Problem 6. Show that a compact metric space \(X\) is locally compact.

Proof

Let \(X\) be a compact metric space. We aim to prove that for every point \(x\) in \(X\), there exists a compact neighborhood around \(x\). In metric spaces, we have the luxury of using open balls as basic neighborhoods. For an arbitrary \(x \in X\) and for any positive real number \(\epsilon\), the open ball \(B(x, \epsilon)\) is an open set containing \(x\).

Because \(X\) is compact, we can use the fact that a closed subset of a compact space is itself compact. Fix, say, \(\epsilon = 1\) and consider the open ball \(B(x, 1)\), which is an open set containing \(x\) and hence a neighborhood of \(x\).

The closure of \(B(x, 1)\), denoted by \(\overline{B(x, 1)}\), is a closed subset of the compact space \(X\) and is therefore compact. Thus \(\overline{B(x, 1)}\) is a compact set that contains the open ball \(B(x, 1)\), a neighborhood of \(x\). This establishes that \(x\) has a compact neighborhood. (Even more directly, \(X\) itself is open, compact, and contains \(x\), so \(X\) is a compact neighborhood of every one of its points.)

Since the choice of \(x\) in \(X\) was arbitrary, and we have demonstrated that each point has a compact neighborhood, it follows that the metric space \(X\) is locally compact.

This proof leverages the fact that closed subsets of compact spaces are compact, together with the properties of open and closed sets in metric spaces, to demonstrate the local compactness of a compact metric space.

\(\blacksquare\)


Problem 7. If \(\dim Y < \infty\) in Riesz's lemma 2.5-4, show that one can even choose \(\theta = 1\).

Proof Using Riesz's Lemma

Let us consider Riesz's lemma in the context where \(Y\) is a finite-dimensional subspace of \(Z\), a subspace of a normed space \(X\). Riesz's lemma asserts that given a closed subspace \(Y\) which is a proper subset of \(Z\), for every real number \(\theta\) in the interval (0,1), there exists a \(z \in Z\) such that \(\|z\| = 1\) and \(\|z - y\| \geq \theta\) for all \(y \in Y\).

Suppose \(v \in Z \setminus Y\) and denote the distance from \(v\) to \(Y\) by \(a\), where \(a = \inf\{\|v - y\| : y \in Y\}\). Since \(Y\) is closed and \(v \notin Y\), we have \(a > 0\). Because \(Y\) is finite-dimensional, closed bounded subsets of \(Y\) are compact; the infimum may be taken over the nonempty compact set \(\{y \in Y : \|v - y\| \leq a + 1\}\), on which the continuous function \(y \mapsto \|v - y\|\) attains its minimum. Hence the infimum is actually achieved by some \(y_0 \in Y\), with \(\|v - y_0\| = a\).

We proceed to define \(z\) as the normalization of \(v - y_0\), so \(z = c(v - y_0)\) where \(c = \frac{1}{\|v - y_0\|} = \frac{1}{a}\). This normalization ensures that \(\|z\| = 1\).

For any \(y \in Y\), set \(y_1 = y_0 + c^{-1}y\); then \(y_1 \in Y\) by the vector space properties of \(Y\). The norm \(\|z - y\|\) is then \(\|c(v - y_0) - y\| = c\|v - y_0 - c^{-1}y\| = c\|v - y_1\|\). Since \(a\) is the infimum of \(\|v - y'\|\) over all \(y' \in Y\) and \(y_1 \in Y\), it follows that \(c\|v - y_1\| \geq c \cdot a = 1\). Consequently, \(\|z - y\| \geq 1\) for all \(y \in Y\).

Since the choice of \(y\) was arbitrary, this implies that \(\|z - y\| \geq \theta\) for any \(\theta \leq 1\). Thus, when \(Y\) is finite-dimensional, it is permissible to select \(\theta = 1\) in Riesz's lemma. The lemma is thereby applicable for \(\theta = 1\), which is due to the structure of the normed space and the finite-dimensionality of \(Y\), guaranteeing the existence of such a \(z\) with the specified characteristics.

\(\blacksquare\)

A little different approach

To show that \(\theta=1\) can be chosen in Riesz's lemma under the condition that the dimension of \(Y\) is finite, we will analyze the proof of Riesz's lemma and demonstrate that if \(Y\) has a finite dimension, then the distance from any \(v \in Z \setminus Y\) to \(Y\) can be made equal to 1, which implies that \(\theta\) can be taken as 1.

Riesz's Lemma states that for any two subspaces \(Y\) and \(Z\) of a normed space \(X\), with \(Y\) being closed and a proper subset of \(Z\), for every \(\theta\) in the interval (0,1), there exists a \(z \in Z\) such that \(\|z\| = 1\) and \(\|z - y\| \geq \theta\) for all \(y \in Y\).

Proof Using Riesz's Lemma

Suppose \(Y\) is a finite-dimensional subspace of \(Z\). By the properties of finite-dimensional normed spaces, we know that closed balls in \(Y\) are compact. Let \(v \in Z \setminus Y\) and denote its distance from \(Y\) by \(a\), that is, \(a = \inf\{\|v - y\| : y \in Y\}\). Since \(Y\) is closed and \(v\) is not in \(Y\), it follows that \(a > 0\).

In the finite-dimensional subspace \(Y\), due to compactness, the infimum \(a\) is actually attained for some \(y_0 \in Y\). That is, there exists a \(y_0 \in Y\) such that \(\|v - y_0\| = a\). Now, define \(z\) as a scaled vector of \(v - y_0\), specifically \(z = c(v - y_0)\), where \(c = \frac{1}{\|v - y_0\|} = \frac{1}{a}\). This scaling ensures that \(\|z\| = 1\).

Now, consider any \(y \in Y\). We examine the distance from \(z\) to \(y\). Note that any \(y\) gives rise to the element \(y_1 = y_0 + c^{-1}y\), which lies in \(Y\) because \(Y\) is a vector space and thus closed under addition and scalar multiplication. We calculate:

\begin{equation*} \|z - y\| = \|c(v - y_0) - y\| = c\|v - y_0 - c^{-1}y\| = c\|v - y_1\|. \end{equation*}

Because \(v\) is closer to \(y_0\) than any other point in \(Y\) by the definition of \(y_0\), it follows that \(c\|v - y_1\| \geq c\|v - y_0\| = c \cdot a = 1\). Therefore, for all \(y \in Y\), \(\|z - y\| \geq 1\), which by the choice of our \(z\) implies \(\|z - y\| \geq \theta\) for any \(\theta \leq 1\). Hence, in the case where \(Y\) has finite dimension, we can choose \(\theta = 1\) in Riesz's lemma.

This shows that the lemma is not only true for any \(\theta\) in the open interval (0,1) but can be strengthened to include \(\theta = 1\) when the subspace \(Y\) is of finite dimension. The case \(\theta = 1\) goes through precisely because finite dimensionality guarantees that the distance \(a\) from \(v\) to \(Y\) is actually attained by some \(y_0 \in Y\).

\(\blacksquare\)


Problem 8. In Problem 7, Section 2.4, show directly (without using 2.4-5) that there is an \(a > 0\) such that \(a\|x\|_2 \leq \|x\|\). (Use 2.5-7.)

Show directly that there is a constant \(a > 0\) such that \(a\|x\|_2 \leq \|x\|\) for a normed finite-dimensional vector space \(X\) without using the theorem on equivalent norms.

Proof

Let \(X\) be a finite-dimensional vector space equipped with two norms \(\|\cdot\|\) and \(\|\cdot\|_2\), where \(\|\cdot\|_2\) is the standard Euclidean norm. Consider the unit sphere \(S\) in \(X\) with respect to \(\|\cdot\|_2\), that is, \(S = \{x \in X : \|x\|_2 = 1\}\).

Since \(X\) is finite-dimensional, \(S\) is compact with respect to \(\|\cdot\|_2\). Now, define a mapping \(T: S \to \mathbb{R}\) by \(T(x) = \|x\|\) for all \(x \in S\). This mapping is continuous (every norm is a continuous function with respect to \(\|\cdot\|_2\)), and by Corollary 2.5-7, since \(S\) is compact, \(T\) attains its maximum and minimum values on \(S\).

Let \(m = \min \{T(x) : x \in S\}\). Since all norms on a finite-dimensional space are positive definite, we have \(m > 0\) because if \(m = 0\), there would exist an \(x \in S\) such that \(\|x\| = 0\), which implies \(x = 0\), contradicting the fact that \(x\) is on the unit sphere \(S\).

Now, for any \(x \in X\) with \(x \neq 0\), we can write \(x\) as \(x = \|x\|_2 \cdot \left(\frac{x}{\|x\|_2}\right)\). Notice that \(\frac{x}{\|x\|_2} \in S\), hence \(\left\|\frac{x}{\|x\|_2}\right\| \geq m\). Multiplying both sides by \(\|x\|_2\), we get \(\|x\| = \|x\|_2 \cdot \left\|\frac{x}{\|x\|_2}\right\| \geq m \|x\|_2\).

Set \(a = m\), which is the positive minimum value of \(T\) on the compact set \(S\). We have established that \(a\|x\|_2 \leq \|x\|\) for all \(x \in X\), where \(a > 0\).

This completes the proof, establishing the existence of a positive constant \(a\) that provides a lower bound for the ratio of the norms \(\|\cdot\|\) and \(\|\cdot\|_2\) on a finite-dimensional vector space \(X\).

\(\blacksquare\)
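
The constant \(a\) in this proof is the minimum of \(\|\cdot\|\) over the Euclidean unit sphere. The sketch below illustrates this numerically in \(\mathbb{R}^2\), taking \(\|\cdot\|\) to be the \(\ell^1\) norm purely as an example norm (an arbitrary choice for illustration); the minimum over the sphere comes out as \(a = 1\), giving \(\|x\|_2 \leq \|x\|_1\):

```python
# Minimal sketch: estimate a = min of an example norm ||.|| (here the l^1 norm)
# over the Euclidean unit sphere in R^2 by sampling the sphere densely.
import numpy as np

thetas = np.linspace(0.0, 2.0 * np.pi, 100_000)
sphere = np.column_stack((np.cos(thetas), np.sin(thetas)))   # ||x||_2 = 1

a = np.abs(sphere).sum(axis=1).min()     # minimum of ||x||_1 on the sphere
print(a)                                 # -> approximately 1.0

x = np.array([3.0, -4.0])                # a * ||x||_2 <= ||x||_1 by scaling
print(a * np.linalg.norm(x), np.abs(x).sum())   # 5.0 <= 7.0
```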


Problem.9 If \(X\) is a compact metric space and \(M \subseteq X\) is closed, show that \(M\) is compact.

Proof

Consider \(X\), a metric space endowed with a metric \(d\), and let \(M \subseteq X\) be a closed subset. Our objective is to substantiate the compactness of \(M\) predicated on the compactness of the ambient space \(X\).

Compactness in a metric space is defined such that a subset \(M\) of \(X\) is compact if every open cover of \(M\) admits a finite subcover. Let us take an arbitrary open cover \(\mathcal{O}\) of \(M\), constituted by a family of open sets in \(X\) such that every point in \(M\) resides within some member of \(\mathcal{O}\).

Since \(M\) is closed in \(X\), its complement \(X \setminus M\) is open in \(X\). Adjoin the open set \(X \setminus M\) to the open cover \(\mathcal{O}\) of \(M\); the resulting family \(\mathcal{O}'\) is an open cover of all of \(X\), since it covers both \(M\) and its complement.

Because \(X\) is compact, the open cover \(\mathcal{O}'\) of \(X\) has a finite subcover \(\mathcal{O}''\). This finite subcover covers all of \(X\), and in particular all of \(M\).

From the finite subcover \(\mathcal{O}''\), remove the set \(X \setminus M\) if it appears. The remaining sets of \(\mathcal{O}''\) form a finite subcollection of the original cover \(\mathcal{O}\), and they still cover \(M\), since no point of \(M\) lies in \(X \setminus M\). Hence \(M\) has a finite subcover drawn from \(\mathcal{O}\).

Therefore \(M\) satisfies the definition of compactness: every open cover of \(M\) has a finite subcover. This shows that a closed subset \(M\) of a compact metric space \(X\) is itself compact.

\(\blacksquare\)

Kreyszig 2.4 Finite Dimensional Normed Spaces and Subspaces

Problem 1. Give examples of subspaces of \(\ell^\infty\) and \(\ell^2\) which are not closed.

Solution:

In topology and functional analysis, a subspace of a topological space is considered closed if it contains all its limit points. For a subspace to be not closed, there must exist sequences (or nets) within the subspace that converge to a point outside the subspace.

For \(\ell^\infty\) (Bounded Sequences):

Example of a Non-Closed Subspace:

  • Consider the subspace of \(\ell^\infty\) consisting of all sequences with only finitely many nonzero terms.

  • Example Sequence: Define \(x_n = (1, \frac{1}{2}, \ldots, \frac{1}{n}, 0, 0, \ldots)\), whose first \(n\) terms are those of the harmonic sequence and whose remaining terms are 0; each \(x_n\) lies in the subspace.

  • Proof of Non-Closure: - The limit of \((x_n)\) in the sup norm is the harmonic sequence \(y = (1, \frac{1}{2}, \frac{1}{3}, \ldots)\), since \(\|x_n - y\|_\infty = \frac{1}{n+1} \to 0\). - The sequence \(y\) is bounded, so \(y \in \ell^\infty\). - However, \(y\) has infinitely many nonzero terms, so \(y\) is not in the subspace. - Since \(y\) is a limit of elements of the subspace but does not belong to it, the subspace is not closed.

For \(\ell^2\) (Square-Summable Sequences):

Example of a Non-Closed Subspace:

  • Consider the subspace of \(\ell^2\) consisting of all sequences with only finitely many nonzero terms.

  • Example Sequence: Let \(x_n = (1, \frac{1}{2}, \ldots, \frac{1}{n}, 0, 0, \ldots)\), where the first \(n\) terms are the first \(n\) terms of the harmonic sequence and the rest are 0; each \(x_n\) lies in the subspace.

  • Proof of Non-Closure: - In the \(\ell^2\) norm, \(\|x_n - y\|_2^2 = \sum_{k > n} \frac{1}{k^2} \to 0\), where \(y = (1, \frac{1}{2}, \frac{1}{3}, \ldots)\); so \(x_n \to y\) in \(\ell^2\). - The limit \(y\) does belong to \(\ell^2\), because \(\sum \frac{1}{k^2}\) converges. - However, \(y\) has infinitely many nonzero terms, so it is not in the subspace. - Since \(y\) is the limit of elements of the subspace but lies outside it, the subspace is not closed.

These examples illustrate that not all subspaces of \(\ell^\infty\) and \(\ell^2\) are closed. A subspace fails to be closed precisely when it does not contain the limits of all convergent sequences within it. In both \(\ell^\infty\) and \(\ell^2\), the subspace of finitely supported sequences fails to contain the harmonic sequence \(y = (1, \frac{1}{2}, \frac{1}{3}, \ldots)\), even though \(y\) is a limit of finitely supported sequences and belongs to the ambient space (it is bounded, and \(\sum \frac{1}{n^2}\) converges). This absence of a limit point in the subspace proves it is not closed.

These examples illustrate the principle of closed subspaces and demonstrate how sequences within these subspaces and their corresponding limit points can indicate whether a subspace is closed.

Problem:

Consider the space \(\ell^\infty\) which consists of all bounded sequences of real numbers, and the space \(\ell^2\) which consists of all square-summable sequences. We are tasked with demonstrating that certain subspaces of these spaces are not closed.

Solution:

The distinction between the sequences \(\left( \frac{1}{n} \right)\) and \(\left( \frac{1}{n^2} \right)\) in this context deserves some care.

Both sequences converge to 0 as \(n \to \infty\) (ε-proofs are given below). Consequently, the subspace \(S \subset \ell^\infty\) of all sequences that converge to 0 contains both of them; in fact \(S\) is a closed subspace of \(\ell^\infty\) (it is the space usually denoted \(c_0\)), so it cannot serve as an example of a non-closed subspace. The relevant distinction is a different one: the harmonic sequence \(\left( \frac{1}{n} \right)\) has infinitely many nonzero terms, so it lies outside the subspace of finitely supported sequences, while still being an element of \(\ell^\infty\) and of \(\ell^2\) (the latter because \(\sum \frac{1}{n^2}\) converges).

This is exactly what makes the harmonic sequence the right witness for non-closedness. Consider the sequence of sequences \((x_n)\), where each \(x_n\) consists of the first \(n\) terms of \(\left( \frac{1}{k} \right)\) followed by zeros. Each \(x_n\) is finitely supported, and the limit of \((x_n)\) in the sup norm (and in the \(\ell^2\) norm) is the harmonic sequence itself, which is bounded and square-summable but not finitely supported. Hence the subspace of finitely supported sequences is not closed, in \(\ell^\infty\) or in \(\ell^2\).

By contrast, the faster decay of \(\left( \frac{1}{n^2} \right)\) plays no special role here: what matters for the counterexample is not the rate of convergence to 0, but whether the limit sequence stays inside the chosen subspace.

Proofs of convergence

\begin{equation*} \varepsilon \text{-Proof that } \frac{1}{n} \text{ Converges to 0:} \end{equation*}

To prove that \(\frac{1}{n}\) converges to 0, we must show that for any \(\varepsilon > 0\), there exists an \(N \in \mathbb{N}\) such that for all \(n > N\), \(\left| \frac{1}{n} - 0 \right| < \varepsilon\).

Let \(\varepsilon > 0\) be given. Choose \(N\) to be any integer greater than \(\frac{1}{\varepsilon}\). Then for all \(n > N\) we have \(n > \frac{1}{\varepsilon}\), and hence \(\left| \frac{1}{n} \right| = \frac{1}{n} < \varepsilon\). This proves that \(\frac{1}{n}\) converges to 0.

\begin{equation*} \varepsilon \text{-Proof that } \frac{1}{n^2} \text{ Converges to 0:} \end{equation*}

To prove that \(\frac{1}{n^2}\) converges to 0, we must show that for any \(\varepsilon > 0\), there exists an \(N \in \mathbb{N}\) such that for all \(n > N\), \(\left| \frac{1}{n^2} - 0 \right| < \varepsilon\).

Let \(\varepsilon > 0\) be given. We want to find an \(N\) such that if \(n > N\), then \(\frac{1}{n^2} < \varepsilon\).

Since \(\frac{1}{n^2}\) is decreasing as \(n\) increases, we can solve the inequality \(\frac{1}{n^2} < \varepsilon\) for \(n\). We get \(n^2 > \frac{1}{\varepsilon}\), and thus \(n > \sqrt{\frac{1}{\varepsilon}}\).

Therefore, if we choose \(N\) to be any integer greater than \(\sqrt{\frac{1}{\varepsilon}}\), then for all \(n > N\), \(\left| \frac{1}{n^2} \right| < \varepsilon\). This proves that \(\frac{1}{n^2}\) converges to 0.

Problem Statement:

Discuss examples and counterexamples for non-closed and closed subspaces in the spaces \(\ell^\infty\) and \(\ell^2\).

Solution:

Example for \(\ell^\infty\) (Non-closed Subspace):

  • Subspace Definition: Consider the subspace \(S \subset \ell^\infty\) consisting of all sequences with only finitely many nonzero terms.

  • Example Sequence: \(x_n = (1, \frac{1}{2}, \ldots, \underbrace{\frac{1}{n}}_{n\text{-th position}}, 0, 0, \ldots)\). Each \(x_n\) is finitely supported and therefore lies in \(S\).

  • Limit Sequence: In the sup norm, \(\|x_n - y\|_\infty = \frac{1}{n+1} \to 0\), where \(y = (1, \frac{1}{2}, \frac{1}{3}, \ldots)\) is the harmonic sequence; so \(x_n \to y\) in \(\ell^\infty\).

  • Proof: The limit \(y\) is bounded, hence \(y \in \ell^\infty\), but it has infinitely many nonzero terms, so \(y \notin S\). Thus \(S\) does not contain all its limit points, showing it is not closed.

Counterexample for \(\ell^\infty\) (Closed Subspace):

  • Subspace Definition: Consider the subspace \(T \subset \ell^\infty\) consisting of all sequences that converge to 0 (the space usually denoted \(c_0\)).

  • Example Sequence: Any sequence \((x_n)\) of elements of \(T\) with \(x_n \to y\) in the sup norm.

  • Limit Sequence: Given \(\varepsilon > 0\), pick \(n\) with \(\|x_n - y\|_\infty < \frac{\varepsilon}{2}\), and then pick \(K\) so that the terms of \(x_n\) beyond index \(K\) are smaller than \(\frac{\varepsilon}{2}\) in modulus. For \(k > K\), the \(k\)-th term of \(y\) is smaller than \(\varepsilon\) in modulus, so \(y\) converges to 0.

  • Proof: Every sup-norm limit of sequences in \(T\) again converges to 0 and hence belongs to \(T\). Thus \(T\) contains all its limit points and is closed.

Example for \(\ell^2\) (Non-closed Subspace):

  • Subspace Definition: Consider the subspace \(U \subset \ell^2\) consisting of all sequences with only finitely many nonzero terms.

  • Example Sequence: \(x_n = (1, \frac{1}{2}, \ldots, \frac{1}{n}, 0, 0, \ldots)\). Each \(x_n\) is in \(U\), since it has only finitely many nonzero terms (and is therefore certainly square-summable).

  • Limit Sequence: In the \(\ell^2\) norm, \((x_n)\) converges to \(y = (1, \frac{1}{2}, \frac{1}{3}, \ldots)\) (as verified below); note that \(y \in \ell^2\), because \(\sum_{k=1}^\infty \frac{1}{k^2}\) converges.

  • Proof: The sequence \(y\) has infinitely many nonzero terms, so \(y \notin U\). However, it is the limit in \(\ell^2\) of the sequence \((x_n)\) of elements of \(U\). Hence, \(U\) does not contain all its limit points, showing it is not closed.
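The \(\ell^2\) convergence used in the proof can be verified by a short estimate:

\begin{equation*} \|x_n - y\|_2^2 = \sum_{k=n+1}^{\infty} \frac{1}{k^2} \leq \sum_{k=n+1}^{\infty} \left( \frac{1}{k-1} - \frac{1}{k} \right) = \frac{1}{n} \longrightarrow 0 \quad (n \to \infty), \end{equation*}

using \(\frac{1}{k^2} \leq \frac{1}{k(k-1)} = \frac{1}{k-1} - \frac{1}{k}\) for \(k \geq 2\) and the telescoping of the right-hand sum.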

Counterexample for \(\ell^2\) (Closed Subspace):

  • Subspace Definition: Consider the subspace \(V \subset \ell^2\) consisting of all sequences of the form \((a, 0, 0, \ldots)\), i.e. sequences that vanish from the second term onward.

  • Example Sequence: Let \(x_n = (a_n, 0, 0, \ldots)\) be any sequence of elements of \(V\) that converges in \(\ell^2\) to some \(x = (\xi_1, \xi_2, \ldots) \in \ell^2\).

  • Limit Sequence: For every fixed \(k \geq 2\) we have \(|\xi_k| = |\xi_k - (x_n)_k| \leq \|x - x_n\|_2 \to 0\), so \(\xi_k = 0\); hence the limit has the form \(x = (\xi_1, 0, 0, \ldots)\).

  • Proof: Since the limit sequence vanishes from the second term onward, it is in \(V\). Thus, \(V\) contains all its limit points and is closed.

The essence of the original problem is to understand and provide examples of non-closed subspaces within the mathematical spaces \(\ell^\infty\) and \(\ell^2\).

  • In the Context of \(\ell^\infty\): This space consists of all bounded sequences of real numbers. The goal was to identify a subspace of \(\ell^\infty\) that is not closed. A subspace is closed if it contains all its limit points, meaning every sequence within the subspace that converges has its limit also within the subspace. A non-closed subspace would be one where you can find a sequence (or sequences) within the subspace that converges to a limit not included in the subspace.

  • In the Context of \(\ell^2\): This space is made up of all square-summable sequences of real numbers. Similar to \(\ell^\infty\), the task was to find a subspace of \(\ell^2\) that does not contain all its limit points, thus making it a non-closed subspace.

In both cases, the challenge lies in identifying specific sequences and showing through examples (and counterexamples) how their behavior within these spaces illustrates the concept of closed versus non-closed subspaces. This understanding is fundamental in functional analysis and topology, as it provides insight into the behavior of sequences in different mathematical spaces and the properties of these spaces.

Applying the concepts of closed and non-closed subspaces to real-life scenarios can help in understanding these abstract mathematical ideas in a more tangible way. Let's use some analogies:

Closed Subspaces - A "Complete" Library

  • Analogy: Think of a closed subspace as a library that contains every possible book (limit points) on a specific topic. For instance, a library dedicated to "World History" contains every book ever written on the subject, including those that are the culmination of earlier works (analogous to limit points of sequences).

  • Real-Life Example: When a researcher looks for information on a particular historical event, they will find all relevant books in this library, including those that have evolved from earlier research. The library is "closed" in the sense that it leaves no gaps in this field of knowledge.

Non-Closed Subspaces - An Incomplete Music Playlist

  • Analogy: A non-closed subspace is like a music playlist meant to include every song from a specific genre but misses some key tracks. Imagine a playlist intended to contain every jazz song ever composed, but it lacks some essential pieces that are considered evolutions or variations of earlier jazz songs.

  • Real-Life Example: A jazz enthusiast looking for a comprehensive collection of jazz music in this playlist will find it lacking. Some songs that should be there, being logical continuations or variations of existing songs (like limits of sequences), are missing. This playlist is "non-closed" as it doesn't encapsulate the complete range of jazz music.

For \(\ell^\infty\) - Temperature Readings

  • Analogy: Consider a weather monitoring system that tracks temperature but is set to record only up to a certain threshold. This system is analogous to a non-closed subspace in \(\ell^\infty\) if it fails to record extreme temperature spikes that surpass its set limit, even though such spikes are the logical continuations (limits) of the recorded data.

  • Real-Life Example: A meteorological station records temperatures but stops logging data beyond a certain point. During an unusual thermal event, where temperatures exceed this threshold, the system fails to record these critical data points, thus not "closing" the full spectrum of temperature variations.

For \(\ell^2\) - Ecological Studies

  • Analogy: Imagine an ecological study tracking the population of a specific species over time. If the study is discontinued prematurely, it's like a non-closed subspace in \(\ell^2\), failing to include the "limit" of the population trend.

  • Real-Life Example: Biologists observe a species' population but stop their study after a certain period. The final phase of the study, which could have shown a critical trend (like a limit of a sequence), is missing. This incomplete study doesn't encapsulate the full picture of the population dynamics.

These analogies help to illustrate the concepts of closed and non-closed subspaces in a more concrete and relatable manner.


Problem 2. Determine the largest possible value of \(c\) in the inequality

\begin{equation*} \|\alpha_1 x_1 + \ldots + \alpha_n x_n\| \geq c(|\alpha_1| + \ldots + |\alpha_n|) \end{equation*}

from Lemma 2.4-1, for the cases where the space \(X\) is \(\mathbb{R}^2\) with vectors \(x_1 = (1,0), x_2 = (0,1)\) and when \(X\) is \(\mathbb{R}^3\) with vectors \(x_1 = (1,0,0), x_2 = (0,1,0), x_3 = (0,0,1)\).

Solution:

To determine the largest possible value of \(c\) in \(\mathbb{R}^2\) and \(\mathbb{R}^3\), we utilize the lemma on linear combinations which asserts that for any set of linearly independent vectors in a normed space \(X\), there exists a \(c > 0\) such that for any choice of scalars \(\alpha_1, \ldots, \alpha_n\), the inequality

\begin{equation*} \|\alpha_1 x_1 + \ldots + \alpha_n x_n\| \geq c(|\alpha_1| + \ldots + |\alpha_n|) \end{equation*}

holds true.

For \(\mathbb{R}^2\):

Given vectors \(x_1 = (1,0)\) and \(x_2 = (0,1)\), we seek the largest \(c\) such that for any scalars \(\alpha_1\) and \(\alpha_2\):

\begin{equation*} \|(1,0)\alpha_1 + (0,1)\alpha_2\| \geq c(|\alpha_1| + |\alpha_2|) \end{equation*}

The left-hand side simplifies to \(\|(\alpha_1, \alpha_2)\|\), which is the Euclidean norm in \(\mathbb{R}^2\) and is equal to \(\sqrt{\alpha_1^2 + \alpha_2^2}\). The inequality thus becomes:

\begin{equation*} \sqrt{\alpha_1^2 + \alpha_2^2} \geq c(|\alpha_1| + |\alpha_2|) \end{equation*}

To find the largest admissible \(c\), we need the infimum of the quotient

\begin{equation*} \frac{\sqrt{\alpha_1^2 + \alpha_2^2}}{|\alpha_1| + |\alpha_2|} \end{equation*}

over all \((\alpha_1, \alpha_2) \neq (0, 0)\); the largest \(c\) for which the inequality holds for every choice of scalars is exactly this infimum. The quotient attains its minimum when \(|\alpha_1| = |\alpha_2|\), yielding \(c = \frac{1}{\sqrt{2}}\).
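The value \(\frac{1}{\sqrt{2}}\) can be confirmed by a short Cauchy-Schwarz computation (a worked check of the minimization above):

\begin{equation*} (|\alpha_1| + |\alpha_2|)^2 \leq 2(\alpha_1^2 + \alpha_2^2), \qquad \text{so} \qquad \frac{\sqrt{\alpha_1^2 + \alpha_2^2}}{|\alpha_1| + |\alpha_2|} \geq \frac{1}{\sqrt{2}}, \end{equation*}

with equality exactly when \(|\alpha_1| = |\alpha_2|\); for instance, \(\alpha_1 = \alpha_2 = 1\) gives \(\frac{\sqrt{2}}{2} = \frac{1}{\sqrt{2}}\).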

For \(\mathbb{R}^3\):

Given vectors \(x_1 = (1,0,0)\), \(x_2 = (0,1,0)\), and \(x_3 = (0,0,1)\), we aim to find the largest \(c\) such that for any scalars \(\alpha_1, \alpha_2, \alpha_3\):

\begin{equation*} \|(1,0,0)\alpha_1 + (0,1,0)\alpha_2 + (0,0,1)\alpha_3\| \geq c(|\alpha_1| + |\alpha_2| + |\alpha_3|) \end{equation*}

The left-hand side simplifies to \(\|(\alpha_1, \alpha_2, \alpha_3)\|\), which is \(\sqrt{\alpha_1^2 + \alpha_2^2 + \alpha_3^2}\). The inequality becomes:

\begin{equation*} \sqrt{\alpha_1^2 + \alpha_2^2 + \alpha_3^2} \geq c(|\alpha_1| + |\alpha_2| + |\alpha_3|) \end{equation*}

Similarly, the largest admissible \(c\) is the infimum of the quotient

\begin{equation*} \frac{\sqrt{\alpha_1^2 + \alpha_2^2 + \alpha_3^2}}{|\alpha_1| + |\alpha_2| + |\alpha_3|} \end{equation*}

over all nonzero choices of scalars; the minimum is attained when \(|\alpha_1| = |\alpha_2| = |\alpha_3|\), resulting in \(c = \frac{1}{\sqrt{3}}\).

Hence, in \(\mathbb{R}^2\), the largest possible \(c\) is \(\frac{1}{\sqrt{2}}\), and in \(\mathbb{R}^3\), the largest possible \(c\) is \(\frac{1}{\sqrt{3}}\).
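The same Cauchy-Schwarz argument generalizes to the standard basis of \(\mathbb{R}^n\) with the Euclidean norm (a sketch of the general case, not required by the problem):

\begin{equation*} (|\alpha_1| + \ldots + |\alpha_n|)^2 \leq n(\alpha_1^2 + \ldots + \alpha_n^2), \qquad \text{hence} \qquad c = \frac{1}{\sqrt{n}}, \end{equation*}

with equality when all the \(|\alpha_i|\) are equal.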


Problem 3. Show that in Definition 2.4-4 the axioms of an equivalence relation hold. The definition states that a norm \(\|\cdot\|\) on a vector space \(X\) is said to be equivalent to a norm \(\|\cdot\|_0\) on \(X\) if there are positive numbers \(a\) and \(b\) such that for all \(x \in X\) we have \(a\|x\|_0 \leq \|x\| \leq b\|x\|_0\).

Solution:

To show that the definition of equivalent norms satisfies the axioms of an equivalence relation, we must verify reflexivity, symmetry, and transitivity.

Reflexivity:

For reflexivity, we consider any norm \(\|\cdot\|\) on a vector space \(X\) and show that it is equivalent to itself. Given the definition of equivalent norms, for all \(x \in X\), the condition

\begin{equation*} a\|x\|_0 \leq \|x\| \leq b\|x\|_0 \end{equation*}

must hold for some positive constants \(a\) and \(b\). When \(\|\cdot\|_0\) is the same as \(\|\cdot\|\), we can choose \(a = b = 1\), thus for all \(x \in X\), we have

\begin{equation*} \|x\| \leq \|x\| \leq \|x\|, \end{equation*}

which is trivially true, thus establishing reflexivity.

Symmetry:

For symmetry, assume that \(\|\cdot\|\) is equivalent to \(\|\cdot\|_0\). This implies the existence of positive constants \(a\) and \(b\) such that

\begin{equation*} a\|x\|_0 \leq \|x\| \leq b\|x\|_0, \quad \forall x \in X. \end{equation*}

To demonstrate symmetry, we must show that \(\|\cdot\|_0\) is also equivalent to \(\|\cdot\|\). From the given inequality, we can derive that

\begin{equation*} \frac{1}{b}\|x\| \leq \|x\|_0 \leq \frac{1}{a}\|x\|, \end{equation*}

establishing symmetry by showing that the constants \(1/b\) and \(1/a\) serve to demonstrate the equivalence in the reverse order.

Transitivity:

For transitivity, assume \(\|\cdot\|\) is equivalent to \(\|\cdot\|_0\) and \(\|\cdot\|_0\) is equivalent to \(\|\cdot\|_1\) with respective positive constants \(a, b\) for the first pair and \(c, d\) for the second pair, satisfying

\begin{equation*} a\|x\|_0 \leq \|x\| \leq b\|x\|_0 \end{equation*}

and

\begin{equation*} c\|x\|_1 \leq \|x\|_0 \leq d\|x\|_1, \end{equation*}

for all \(x \in X\). To establish transitivity, we demonstrate that \(\|\cdot\|\) is equivalent to \(\|\cdot\|_1\). Substituting the second pair of inequalities into the first, and using that all the constants are positive, we obtain

\begin{equation*} ac\|x\|_1 \leq \|x\| \leq bd\|x\|_1, \end{equation*}

confirming that \(\|\cdot\|\) is equivalent to \(\|\cdot\|_1\) and thus showing transitivity.

These demonstrations confirm that the axioms of an equivalence relation are satisfied by the definition of equivalent norms.
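As a concrete illustration, on \(\mathbb{R}^2\) the norms \(\|x\|_1 = |\xi_1| + |\xi_2|\) and \(\|x\|_\infty = \max\{|\xi_1|, |\xi_2|\}\) are equivalent with the constants \(a = 1\) and \(b = 2\) in the sense of Definition 2.4-4:

\begin{equation*} \|x\|_\infty \leq \|x\|_1 \leq 2\|x\|_\infty \quad \text{for all } x \in \mathbb{R}^2. \end{equation*}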


Problem 4 Show that equivalent norms on a vector space \(X\) induce the same topology for \(X\).

Solution:

Suppose \(\|\cdot\|\) and \(\|\cdot\|_0\) are equivalent norms on a vector space \(X\). By definition, there exist positive constants \(a, b > 0\) such that for all \(x \in X\), the following inequalities hold:

\begin{equation*} a\|x\|_0 \leq \|x\| \leq b\|x\|_0. \end{equation*}

We aim to show that the two norms induce the same topology, for which it suffices to prove that the identity map \(I: (X, \|\cdot\|) \rightarrow (X, \|\cdot\|_0)\) and its inverse are both continuous. To this end, consider a point \(x_0 \in X\) and an arbitrary \(\epsilon > 0\). We choose \(\delta = a\epsilon\), and for all \(x\) satisfying \(\|x - x_0\| < \delta\), we have:

\begin{equation*} \|x - x_0\|_0 \leq \frac{1}{a}\|x - x_0\| < \frac{\delta}{a} = \epsilon. \end{equation*}

This inequality shows that every ball in \((X, \|\cdot\|_0)\) centered at \(x_0\) with radius \(\epsilon\) contains the image under \(I\) of a ball in \((X, \|\cdot\|)\) centered at the same point with radius \(\delta\). Hence, \(I\) is continuous.

Similarly, to prove that the inverse identity map \(\bar{I}: (X, \|\cdot\|_0) \rightarrow (X, \|\cdot\|)\) is continuous, we take a point \(x_0 \in X\) and an arbitrary \(\epsilon > 0\). Setting \(\delta = \frac{\epsilon}{b}\) ensures that for all \(x\) satisfying \(\|x - x_0\|_0 < \delta\), the following holds:

\begin{equation*} \|x - x_0\| \leq b\|x - x_0\|_0 < b\delta = \epsilon. \end{equation*}

Thus, every ball in \((X, \|\cdot\|)\) centered at \(x_0\) with radius \(\epsilon\) contains the image under \(\bar{I}\) of a ball in \((X, \|\cdot\|_0)\) centered at the same point with radius \(\delta\), confirming the continuity of the inverse map.

Having established the continuity of both \(I\) and \(\bar{I}\), we conclude that they are homeomorphisms, showing that the topologies induced by \(\|\cdot\|\) and \(\|\cdot\|_0\) are indeed the same.
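The estimates above can also be recorded as explicit ball inclusions (a compact restatement; here \(B_{\|\cdot\|}(x_0; r)\) denotes the open ball of radius \(r\) about \(x_0\) with respect to the indicated norm):

\begin{equation*} B_{\|\cdot\|}(x_0; a\epsilon) \subseteq B_{\|\cdot\|_0}(x_0; \epsilon), \qquad B_{\|\cdot\|_0}\left(x_0; \frac{\epsilon}{b}\right) \subseteq B_{\|\cdot\|}(x_0; \epsilon). \end{equation*}

Thus every ball in either norm contains a ball in the other norm about the same center, which is exactly the statement that the two topologies coincide.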

Remark:

The converse is also true, that is, if two norms \(\|\cdot\|\) and \(\|\cdot\|_0\) on \(X\) induce the same topology, they are equivalent norms on \(X\).


Problem 5. If \(\|\cdot\|\) and \(\|\cdot\|_0\) are equivalent norms on \(X\), show that the Cauchy sequences in \((X, \|\cdot\|)\) and \((X, \|\cdot\|_0)\) are the same.

Solution:

To prove this, we use the definition of equivalent norms that provides constants \(a, b > 0\) such that for all \(x \in X\):

\begin{equation*} a\|x\|_0 \leq \|x\| \leq b\|x\|_0. \end{equation*}

A sequence \((x_n)\) in \(X\) is Cauchy with respect to a norm if for every \(\epsilon > 0\), there exists an \(N \in \mathbb{N}\) such that for all \(m, n > N\), it holds that \(\|x_m - x_n\| < \epsilon\).

Cauchy in \((X, \|\cdot\|)\) implies Cauchy in \((X, \|\cdot\|_0)\):

Assume \((x_n)\) is Cauchy in \((X, \|\cdot\|)\). For every \(\epsilon > 0\), choose \(\delta = a\epsilon\). There exists \(N\) such that for all \(m, n > N\), we have:

\begin{equation*} \|x_m - x_n\| < \delta. \end{equation*}

Applying the inequality given by the equivalence of norms, we obtain:

\begin{equation*} \|x_m - x_n\|_0 \leq \frac{1}{a}\|x_m - x_n\| < \frac{\delta}{a} = \epsilon. \end{equation*}

Hence, \((x_n)\) is also Cauchy in \((X, \|\cdot\|_0)\).

Cauchy in \((X, \|\cdot\|_0)\) implies Cauchy in \((X, \|\cdot\|)\):

Conversely, if \((x_n)\) is Cauchy in \((X, \|\cdot\|_0)\), for every \(\epsilon > 0\), choose \(\delta = \frac{\epsilon}{b}\). There exists \(N\) such that for all \(m, n > N\), it holds that:

\begin{equation*} \|x_m - x_n\|_0 < \delta. \end{equation*}

From the equivalent norm inequality, we get:

\begin{equation*} \|x_m - x_n\| \leq b\|x_m - x_n\|_0 < b\delta = \epsilon. \end{equation*}

Therefore, \((x_n)\) is a Cauchy sequence in \((X, \|\cdot\|)\) as well.

Since we have shown that Cauchy sequences in one normed space are Cauchy in the other and vice versa, we conclude that the Cauchy sequences in both \((X, \|\cdot\|)\) and \((X, \|\cdot\|_0)\) are the same. This conclusion follows from the ability to find a \(\delta\) for every \(\epsilon\) (and vice versa) satisfying the conditions for a Cauchy sequence in both normed spaces.


Problem 6. Theorem 2.4-5 implies that \(\|\cdot\|_2\) and \(\|\cdot\|_\infty\) in Problem 8, Section 2.2, are equivalent. Give a direct proof of this fact.

Solution:

By Theorem 2.4-5, on a finite-dimensional vector space \(X\), any norm \(\|\cdot\|\) is equivalent to any other norm \(\|\cdot\|_0\); the theorem therefore already guarantees the result. To give a direct proof that the \(\ell_2\) norm (Euclidean norm) and the \(\ell_\infty\) norm (maximum norm) are equivalent on \(\mathbb{R}^n\), we establish two explicit inequalities that hold for all vectors in \(\mathbb{R}^n\).

The \(\ell_2\) norm of a vector \(x \in \mathbb{R}^n\) is defined as:

\begin{equation*} \|x\|_2 = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}, \end{equation*}

and the \(\ell_\infty\) norm is defined as:

\begin{equation*} \|x\|_\infty = \max_{1 \leq i \leq n} |x_i|. \end{equation*}

Proof:

Showing \(\|x\|_2 \leq c \|x\|_\infty\) for some \(c > 0\):

By definition, for any \(x \in \mathbb{R}^n\), each component \(x_i\) satisfies \(|x_i| \leq \|x\|_\infty\). Thus,

\begin{equation*} \|x\|_2^2 = x_1^2 + x_2^2 + \ldots + x_n^2 \leq n\|x\|_\infty^2, \end{equation*}

since there are \(n\) terms and each \(x_i^2 \leq \|x\|_\infty^2\). Taking square roots, we have:

\begin{equation*} \|x\|_2 \leq \sqrt{n}\|x\|_\infty. \end{equation*}

Hence, we can choose \(c = \sqrt{n}\), showing the first inequality.

Showing \(\|x\|_\infty \leq d \|x\|_2\) for some \(d > 0\):

For any \(x \in \mathbb{R}^n\), let \(x_j\) be a component of largest absolute value, so that \(|x_j| = \|x\|_\infty\). Then, by the definition of the Euclidean norm:

\begin{equation*} \|x\|_\infty^2 \leq x_1^2 + x_2^2 + \ldots + x_n^2 = \|x\|_2^2. \end{equation*}

Since the square root is a monotonic function, taking square roots gives:

\begin{equation*} \|x\|_\infty \leq \|x\|_2. \end{equation*}

Here, we can choose \(d = 1\), showing the second inequality.

Since we have established both inequalities \(\|x\|_2 \leq \sqrt{n}\|x\|_\infty\) and \(\|x\|_\infty \leq \|x\|_2\), by the definition of equivalent norms, the \(\ell_2\) and \(\ell_\infty\) norms are equivalent on \(\mathbb{R}^n\).
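As an illustrative numerical check, take \(x = (3, 4) \in \mathbb{R}^2\) (so \(n = 2\)):

\begin{equation*} \|x\|_2 = 5 \leq \sqrt{2} \cdot 4 = \sqrt{2}\,\|x\|_\infty \approx 5.66, \qquad \|x\|_\infty = 4 \leq 5 = \|x\|_2. \end{equation*}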

This direct proof aligns with the statement of the theorem that in a finite-dimensional vector space, all norms are equivalent.

Theorem 2.4-5 (Equivalent norms):

On a finite-dimensional vector space \(X\), any norm \(\|\cdot\|\) is equivalent to any other norm \(\|\cdot\|_0\). This theorem is fundamental in the study of finite-dimensional vector spaces because it ensures that all norms define the same topology and consequently, the same notions of convergence, continuity, and compactness.


Problem 7. Let \(\|\cdot\|_2\) be as in Prob. 8, Sec. 2.2, and let \(\|\cdot\|\) be any norm on that vector space, call it \(X\). Show directly (without using 2.4-5) that there is a \(b > 0\) such that \(\|x\| \leq b \|x\|_2\) for all \(x\).

Solution:

We are given that \(\|\cdot\|_2\) is the Euclidean norm defined on \(\mathbb{R}^n\). For any vector \(x\) in \(\mathbb{R}^n\), the \(\ell_2\) norm is calculated as:

\begin{equation*} \|x\|_2 = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}. \end{equation*}

Let \(\{e_1, e_2, \ldots, e_n\}\) be the standard basis for \(\mathbb{R}^n\). The norm \(\|\cdot\|\) being a norm on \(X\) implies that it satisfies the property of absolute scalability, which states that for any scalar \(\alpha\) and any vector \(x\) in \(X\), the following holds:

\begin{equation*} \|\alpha x\| = |\alpha| \|x\|. \end{equation*}

For any vector \(x = (x_1, x_2, \ldots, x_n)\) in \(X\), we express \(x\) as a linear combination of the standard basis vectors:

\begin{equation*} x = x_1 e_1 + x_2 e_2 + \ldots + x_n e_n. \end{equation*}

Applying the properties of a norm, specifically the triangle inequality and absolute scalability, we have:

\begin{equation*} \|x\| = \|x_1 e_1 + x_2 e_2 + \ldots + x_n e_n\| \leq |x_1| \|e_1\| + |x_2| \|e_2\| + \ldots + |x_n| \|e_n\|. \end{equation*}

Define \(b_i = \|e_i\|\) for each \(i\). Set \(b = \max\{b_1, b_2, \ldots, b_n\}\), which allows us to rewrite the inequality as:

\begin{equation*} \|x\| \leq b (|x_1| + |x_2| + \ldots + |x_n|). \end{equation*}

Using the Cauchy-Schwarz inequality, we observe that the sum of the absolute values of the components of \(x\) is at most \(\sqrt{n}\) times the square root of the sum of the squares of these components. This gives us:

\begin{equation*} |x_1| + |x_2| + \ldots + |x_n| \leq \sqrt{n(x_1^2 + x_2^2 + \ldots + x_n^2)} = \sqrt{n}\|x\|_2. \end{equation*}

Combining the two inequalities, we obtain:

\begin{equation*} \|x\| \leq b\sqrt{n}\|x\|_2. \end{equation*}

Thus, we have established that there exists a constant \(b' = b\sqrt{n}\) which satisfies the condition \(\|x\| \leq b' \|x\|_2\) for all \(x \in X\). The constant \(b'\) depends on the norm \(\|\cdot\|\) and the dimension of the space \(X\), which concludes the direct proof.
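As a sanity check of this bound, if \(\|\cdot\|\) is taken to be the maximum norm \(\|\cdot\|_\infty\), then \(b_i = \|e_i\|_\infty = 1\) for every \(i\), so \(b' = \sqrt{n}\), and indeed

\begin{equation*} \|x\|_\infty \leq \|x\|_2 \leq \sqrt{n}\,\|x\|_2 \end{equation*}

holds for all \(x\); the analogous statement for \(\|\cdot\|_1\) is exactly the inequality \(\|x\|_1 \leq \sqrt{n}\,\|x\|_2\) proved in Problem 8 below.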


Problem 8. Show that the norms \(\|\cdot\|_1\) and \(\|\cdot\|_2\) in Prob. 8, Sec. 2.2, satisfy

\begin{equation*} \frac{1}{\sqrt{n}} \|x\|_1 \leq \|x\|_2 \leq \|x\|_1. \end{equation*}

Solution:

To establish this relationship between the \(\ell_1\) and \(\ell_2\) norms, we will utilize the definitions and properties of each norm. For a vector \(x \in \mathbb{R}^n\), the norms are defined by:

\begin{equation*} \|x\|_1 = |x_1| + |x_2| + \ldots + |x_n|, \end{equation*}

and

\begin{equation*} \|x\|_2 = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}. \end{equation*}

Proof:

The inequality \(\|x\|_2 \leq \|x\|_1\):

The square root of a sum of squares never exceeds the sum of the absolute values. Indeed, squaring the sum of the absolute values and expanding gives

\begin{equation*} \left( |x_1| + |x_2| + \ldots + |x_n| \right)^2 = \sum_{i=1}^{n} x_i^2 + 2\sum_{i<j} |x_i||x_j| \geq \sum_{i=1}^{n} x_i^2, \end{equation*}

since the cross terms \(|x_i||x_j|\) are non-negative. Taking square roots (the square root is increasing) yields

\begin{equation*} \|x\|_2 = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2} \leq |x_1| + |x_2| + \ldots + |x_n| = \|x\|_1. \end{equation*}

The inequality \(\frac{1}{\sqrt{n}} \|x\|_1 \leq \|x\|_2\):

We employ the Cauchy-Schwarz inequality, which asserts that for any real numbers \(a_i\) and \(b_i\):

\begin{equation*} \left(\sum_{i=1}^n a_i b_i\right)^2 \leq \left(\sum_{i=1}^n a_i^2\right)\left(\sum_{i=1}^n b_i^2\right). \end{equation*}

Letting \(a_i = 1\) and \(b_i = |x_i|\), it yields:

\begin{equation*} \left(\sum_{i=1}^n |x_i|\right)^2 \leq n \left(\sum_{i=1}^n |x_i|^2\right), \end{equation*}

which simplifies to:

\begin{equation*} \|x\|_1^2 \leq n \|x\|_2^2. \end{equation*}

Taking square roots on both sides, we get:

\begin{equation*} \|x\|_1 \leq \sqrt{n} \|x\|_2, \end{equation*}

and rearranging gives us:

\begin{equation*} \frac{1}{\sqrt{n}} \|x\|_1 \leq \|x\|_2. \end{equation*}

By combining the established inequalities, we have proven the required relationship:

\begin{equation*} \frac{1}{\sqrt{n}} \|x\|_1 \leq \|x\|_2 \leq \|x\|_1. \end{equation*}

This completes the direct proof of the norm inequalities as presented in the problem statement.
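Both inequalities are sharp, as the following two test vectors show:

\begin{equation*} x = (1, 1, \ldots, 1): \quad \|x\|_1 = n = \sqrt{n}\,\|x\|_2; \qquad x = (1, 0, \ldots, 0): \quad \|x\|_2 = 1 = \|x\|_1. \end{equation*}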


Problem 9. If two norms \(\|\cdot\|\) and \(\|\cdot\|_0\) on a vector space \(X\) are equivalent, show that (i) \(\|x_n - x\| \rightarrow 0\) implies (ii) \(\|x_n - x\|_0 \rightarrow 0\) (and vice versa, of course).

Solution:

By the definition of equivalent norms, there exist positive constants \(c, C\) such that for all \(x \in X\), we have:

\begin{equation*} c\|x\|_0 \leq \|x\| \leq C\|x\|_0. \end{equation*}

Proof:

  1. Assume \(\|x_n - x\| \rightarrow 0\), meaning that for every \(\epsilon > 0\), there exists an \(N\) such that for all \(n > N\), \(\|x_n - x\| < \epsilon\). Utilizing the lower bound of the equivalent norms, we have:

\begin{equation*} \|x_n - x\|_0 \leq \frac{1}{c}\|x_n - x\|. \end{equation*}

Since \(\|x_n - x\| < \epsilon\), it follows that:

\begin{equation*} \|x_n - x\|_0 < \frac{\epsilon}{c}. \end{equation*}

Therefore, \(\|x_n - x\|_0 \rightarrow 0\) as \(n \rightarrow \infty\).

  2. Conversely, assume \(\|x_n - x\|_0 \rightarrow 0\). For every \(\epsilon' > 0\), there exists an \(N'\) such that for all \(n > N'\), \(\|x_n - x\|_0 < \epsilon'\). Applying the upper bound of the equivalent norms, we obtain:

\begin{equation*} \|x_n - x\| \leq C\|x_n - x\|_0. \end{equation*}

By setting \(\epsilon' = \frac{\epsilon}{C}\), we find:

\begin{equation*} \|x_n - x\| < C \frac{\epsilon}{C} = \epsilon, \end{equation*}

which indicates that \(\|x_n - x\| \rightarrow 0\) as \(n \rightarrow \infty\).

Hence, we have shown that convergence in one norm implies convergence in the other, validating that (i) \(\|x_n - x\| \rightarrow 0\) if and only if (ii) \(\|x_n - x\|_0 \rightarrow 0\), under the assumption that the norms \(\|\cdot\|\) and \(\|\cdot\|_0\) are equivalent.

Kreyszig 2.3, Further Properties of Normed Spaces

Problem 1. Show that \(c \subset l^{\infty}\) is a vector subspace of \(l^{\infty}\) and so is \(C_0\), the space of all sequences of scalars converging to zero.

Solution:

The space \(l^{\infty}\) is defined as the set of all bounded sequences of real (or complex) numbers. A sequence \((a_n)\) is in \(l^{\infty}\) if there exists a real number \(M\) such that for every term \(a_n\) in the sequence, \(|a_n| \leq M\).

The space \(c\) denotes the set of all convergent sequences. A sequence \((a_n)\) is in \(c\) if it converges to some limit \(L\) in the real (or complex) numbers.

The space \(C_0\), or \(c_0\) as it is often denoted, is the set of all sequences that converge to zero.

To show that \(c\) and \(C_0\) are subspaces of \(l^{\infty}\), we must verify the following properties for each:

  1. Non-emptiness: The subspace must contain the zero vector.

  2. Closed under vector addition: If two vectors \(x\) and \(y\) are in the subspace, then their sum \(x + y\) must also be in the subspace.

  3. Closed under scalar multiplication: If a vector \(x\) is in the subspace and \(\alpha\) is any scalar, then the product \(\alpha x\) must also be in the subspace.

For \(c\) (all convergent sequences):

  1. Non-emptiness: The zero sequence is in \(c\) since it converges to zero, and it is clearly bounded.

  2. Closed under vector addition: If \(x, y \in c\), both converge to some limits \(L_x\) and \(L_y\), and their sum \(x + y\) converges to \(L_x + L_y\). Also, the sum of two bounded sequences is bounded.

  3. Closed under scalar multiplication: For any \(x \in c\) and scalar \(\alpha\), the sequence \(\alpha x\) converges to \(\alpha L_x\) and is bounded if \(x\) is bounded.

For \(C_0\) (sequences converging to zero):

  1. Non-emptiness: \(C_0\) contains the zero sequence.

  2. Closed under vector addition: The sum of two sequences in \(C_0\) also converges to zero.

  3. Closed under scalar multiplication: A scalar multiple of a sequence in \(C_0\) also converges to zero and is bounded.

Since both \(c\) and \(C_0\) satisfy these properties, they are both subspaces of \(l^{\infty}\).
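For completeness, here is the standard estimate showing that every convergent sequence is bounded, so that indeed \(c \subset l^{\infty}\): if \(a_n \to L\), choose \(N\) with \(|a_n - L| < 1\) for all \(n > N\); then

\begin{equation*} |a_n| \leq \max\{|a_1|, \ldots, |a_N|, |L| + 1\} \quad \text{for all } n. \end{equation*}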


Problem 2. Show that \(c_0\) in Problem 1 is a closed subspace of \(l^{\infty}\), so that \(c_0\) is complete by: (a) Theorem (Complete subspace): A subspace \(M\) of a complete metric space \(X\) is itself complete if and only if the set \(M\) is closed in \(X\). (b) Completeness of \(l^{\infty}\): The space \(l^{\infty}\), is complete.

Solution:

To show that \(c_0\) from Problem 1 is a closed subspace of \(l^{\infty}\), we will use the theorem provided and the fact that \(l^{\infty}\) is complete.

Step 1: Use the Theorem (Complete Subspace)

The theorem states that a subspace \(M\) of a complete metric space \(X\) is complete if and only if \(M\) is closed in \(X\). Therefore, we must demonstrate that \(c_0\) is closed in \(l^{\infty}\).

Step 2: Show that \(c_0\) is closed in \(l^{\infty}\)

A subset of a metric space is closed if it contains all of its limit points. To prove that \(c_0\) is closed, we need to show that if a sequence of elements in \(c_0\) converges to some limit within \(l^{\infty}\), then this limit is also in \(c_0\).

Suppose \((x_n)\) is a sequence of sequences in \(c_0\) that converges to some sequence \(x\) in \(l^{\infty}\). We need to show that \(x\) is also in \(c_0\). This means that \(x\) must converge to zero.

Since \((x_n)\) converges to \(x\) in \(l^{\infty}\), for every \(\epsilon > 0\), there exists an \(N\) such that for all \(n \geq N\), the sequences \(x_n\) are within \(\epsilon\) of \(x\) in the supremum norm, i.e.,

\begin{equation*} \sup_{k \in \mathbb{N}} |(x_n)_k - x_k| < \epsilon. \end{equation*}

Each \(x_n\) is in \(c_0\), meaning that for each \(x_n\) and for every \(\epsilon > 0\), there exists an \(M\) (which can depend on \(n\)) such that for all \(k \geq M\), \(|(x_n)_k| < \epsilon\).

Now fix \(\epsilon > 0\). First choose \(n \geq N\) so that \(\sup_k |(x_n)_k - x_k| < \frac{\epsilon}{2}\), and then, since \(x_n \in c_0\), choose \(M\) so that \(|(x_n)_k| < \frac{\epsilon}{2}\) for all \(k \geq M\). Combining the two estimates (see the displayed inequality below) shows that \(|x_k| < \epsilon\) for all \(k \geq M\). This means that \(x\) converges to zero and thus \(x \in c_0\).
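Spelling out the combination of the two estimates: for every \(k \geq M\),

\begin{equation*} |x_k| \leq |x_k - (x_n)_k| + |(x_n)_k| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \end{equation*}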

Step 3: Apply the Completeness of \(l^{\infty}\)

Since \(l^{\infty}\) is a complete metric space and \(c_0\) is closed in \(l^{\infty}\), by the theorem, \(c_0\) is also complete.

By showing that \(c_0\) is a closed subset of the complete space \(l^{\infty}\), we have shown that \(c_0\) is a complete subspace of \(l^{\infty}\).


Problem 3. In \(l^{\infty}\), let \(Y\) be the subset of all sequences with only finitely many nonzero terms. Show that \(Y\) is a subspace of \(l^{\infty}\) but not a closed subspace.

Solution:

To demonstrate that \(Y\) is a subspace of \(l^{\infty}\), we must verify that \(Y\) satisfies the three properties of a vector subspace:

  1. Non-emptiness: \(Y\) contains the zero vector.

  2. Closed under vector addition: If two vectors \(x\) and \(y\) are in \(Y\), then their sum \(x + y\) must also be in \(Y\).

  3. Closed under scalar multiplication: If a vector \(x\) is in \(Y\) and \(\alpha\) is any scalar, then the product \(\alpha x\) must also be in \(Y\).

Let's examine each property:

  1. Non-emptiness: The zero sequence, where every term is zero, is a sequence with finitely many nonzero terms (specifically, none), so \(Y\) contains the zero vector.

  2. Closed under vector addition: If \(x\) and \(y\) are in \(Y\), they each have only finitely many nonzero terms. The sum \(x + y\) will also have only finitely many nonzero terms because the nonzero terms can only occur at the indices where \(x\) or \(y\) (or both) have nonzero terms. Therefore, \(x + y\) is also in \(Y\).

  3. Closed under scalar multiplication: If \(x\) is in \(Y\) and \(\alpha\) is any scalar, multiplying \(x\) by \(\alpha\) will not introduce any new nonzero terms beyond those already present in \(x\). Therefore, \(\alpha x\) will also have only finitely many nonzero terms and is in \(Y\).

Since \(Y\) satisfies all three properties, it is a subspace of \(l^{\infty}\).

To show that \(Y\) is not a closed subspace, we need to find a sequence of elements in \(Y\) that converges to a limit not in \(Y\). This limit will be a sequence with infinitely many nonzero terms, demonstrating that \(Y\) does not contain all its limit points, and hence it is not closed.

Consider the sequence of sequences \((y^{(n)})\) defined by:

\begin{equation*} y^{(n)} = (1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n}, 0, 0, 0, \ldots) \end{equation*}

Each \(y^{(n)}\) is in \(Y\) because it has only \(n\) nonzero terms. Now, consider the sequence \(y\) defined by:

\begin{equation*} y = (1, \frac{1}{2}, \frac{1}{3}, \ldots) \end{equation*}

The sequence \(y\) is not in \(Y\) because it has infinitely many nonzero terms. However, \((y^{(n)})\) converges to \(y\) in the \(l^{\infty}\) norm, since \(\|y^{(n)} - y\|_{\infty} = \sup_{k > n} \frac{1}{k} = \frac{1}{n+1}\), which is smaller than any prescribed \(\epsilon > 0\) for all sufficiently large \(n\).

Therefore, the limit of the convergent sequence \((y^{(n)})\) is not in \(Y\), showing that \(Y\) is not closed in \(l^{\infty}\).


Problem 4. In a normed space \(X\), show that vector addition and multiplication by scalars are continuous operations with respect to the norm; that is, the mappings defined by \((x, y) \mapsto x+y\) and \((\alpha, x) \mapsto \alpha x\) are continuous.

Solution:

Continuity of Vector Addition

Let \((x_n)\) and \((y_n)\) be sequences in \(X\) such that \(x_n \to x\) and \(y_n \to y\) as \(n \to \infty\). We need to show that \(x_n + y_n \to x + y\). By the definition of convergence in a normed space, for every \(\epsilon > 0\), there exist \(N_1, N_2 \in \mathbb{N}\) such that for all \(n \geq N_1\), \(\|x_n - x\| < \frac{\epsilon}{2}\) and for all \(n \geq N_2\), \(\|y_n - y\| < \frac{\epsilon}{2}\).

Let \(N = \max\{N_1, N_2\}\). Then for all \(n \geq N\), we have:

\begin{equation*} \| (x_n + y_n) - (x + y) \| = \| (x_n - x) + (y_n - y) \| \leq \|x_n - x\| + \|y_n - y\| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon \end{equation*}

The inequality follows from the triangle inequality of the norm. Since \(\epsilon\) was arbitrary, this shows that \(x_n + y_n \to x + y\), and thus vector addition is continuous.

Continuity of Scalar Multiplication

Let \((\alpha_n)\) be a sequence of scalars converging to \(\alpha\), and let \((x_n)\) be a sequence in \(X\) such that \(x_n \to x\). We need to show that \(\alpha_n x_n \to \alpha x\). For every \(\epsilon > 0\), there exist \(N_1, N_2, N_3 \in \mathbb{N}\) such that for all \(n \geq N_1\), \(|\alpha_n - \alpha| < \frac{\epsilon}{2(\|x\|+1)}\); for all \(n \geq N_2\), \(\|x_n - x\| < \frac{\epsilon}{2(|\alpha|+1)}\); and for all \(n \geq N_3\), \(|\alpha_n - \alpha| < 1\), so that \(|\alpha_n| < |\alpha| + 1\).

Let \(N = \max\{N_1, N_2, N_3\}\). Then for all \(n \geq N\), we have:

\begin{equation*} \| \alpha_n x_n - \alpha x \| = \| \alpha_n x_n - \alpha_n x + \alpha_n x - \alpha x \| \leq \| \alpha_n (x_n - x) \| + \| (\alpha_n - \alpha) x \| \end{equation*}

Using the properties of the norm and the convergence of \(\alpha_n\) and \(x_n\), we further obtain:

\begin{equation*} \| \alpha_n x_n - \alpha x \| \leq |\alpha_n| \| x_n - x \| + |\alpha_n - \alpha| \| x \| < (|\alpha|+1) \frac{\epsilon}{2(|\alpha|+1)} + \frac{\epsilon}{2(\|x\|+1)} \|x\| \leq \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon \end{equation*}

Since \(\epsilon\) was arbitrary, this shows that \(\alpha_n x_n \to \alpha x\), and thus scalar multiplication is continuous.

Hence, in a normed space \(X\), both vector addition and scalar multiplication are continuous with respect to the norm.


Problem 5. Show that \(x_n \to x\) and \(y_n \to y\) implies \(x_n + y_n \to x + y\). Show that \(\alpha_n \to \alpha\) and \(x_n \to x\) implies \(\alpha_n x_n \to \alpha x\).

Solution:

Continuity of Vector Addition

Given \(x_n \to x\) and \(y_n \to y\), we need to demonstrate that \(x_n + y_n \to x + y\).

By the definition of convergence, for every \(\epsilon > 0\), there exists an \(N_1\) such that for all \(n \geq N_1\), \(\|x_n - x\| < \frac{\epsilon}{2}\). Similarly, there exists an \(N_2\) such that for all \(n \geq N_2\), \(\|y_n - y\| < \frac{\epsilon}{2}\).

Let \(N = \max(N_1, N_2)\). Then for all \(n \geq N\):

\begin{equation*} \| (x_n + y_n) - (x + y) \| = \| (x_n - x) + (y_n - y) \| \leq \|x_n - x\| + \|y_n - y\| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \end{equation*}

This proves that \(x_n + y_n \to x + y\), confirming the continuity of vector addition.

Continuity of Scalar Multiplication

Given \(\alpha_n \to \alpha\) and \(x_n \to x\), we need to show that \(\alpha_n x_n \to \alpha x\).

For every \(\epsilon > 0\), there exists an \(N_1\) such that for all \(n \geq N_1\), \(|\alpha_n - \alpha| < \frac{\epsilon}{2(\|x\| + 1)}\) (the \(+1\) in the denominator makes a separate treatment of \(x = 0\) unnecessary). Also, there exists an \(N_2\) such that for all \(n \geq N_2\), \(\|x_n - x\| < \frac{\epsilon}{2(|\alpha| + 1)}\), and an \(N_3\) such that for all \(n \geq N_3\), \(|\alpha_n - \alpha| < 1\), hence \(|\alpha_n| < |\alpha| + 1\).

Let \(N = \max(N_1, N_2, N_3)\). Then for all \(n \geq N\):

\begin{equation*} \| \alpha_n x_n - \alpha x \| = \| \alpha_n x_n - \alpha_n x + \alpha_n x - \alpha x \| \leq |\alpha_n| \| x_n - x \| + |\alpha_n - \alpha| \| x \|. \end{equation*}

Using the convergence criteria and the norm properties, we get:

\begin{equation*} |\alpha_n| \| x_n - x \| < (|\alpha| + 1) \frac{\epsilon}{2(|\alpha| + 1)} = \frac{\epsilon}{2}, \end{equation*}

and

\begin{equation*} |\alpha_n - \alpha| \| x \| < \frac{\epsilon}{2(\|x\| + 1)} \|x\| \leq \frac{\epsilon}{2}. \end{equation*}

Summing these inequalities gives:

\begin{equation*} \| \alpha_n x_n - \alpha x \| < \epsilon. \end{equation*}

This confirms that \(\alpha_n x_n \to \alpha x\), establishing the continuity of scalar multiplication.


Problem 6. Show that the closure \(\bar{Y}\) of a subspace \(Y\) of a normed space \(X\) is again a vector subspace.

Solution:

To show that the closure \(\overline{Y}\) of a subspace \(Y\) is a vector subspace, we need to verify that it satisfies the properties of a vector subspace:

Non-emptiness: The closure \(\overline{Y}\) must contain the zero vector. Since \(Y\) is a subspace, it contains the zero vector \(0\). The closure of a set contains all the limit points of that set, and since \(0\) is in \(Y\) and is its own limit, \(0\) is also in \(\overline{Y}\).

Closed under vector addition: If \(x\) and \(y\) are in \(\overline{Y}\), then \(x + y\) must also be in \(\overline{Y}\). Let \(x\) and \(y\) be in \(\overline{Y}\). By the definition of closure, for every \(\epsilon > 0\), there exist points \(x' \in Y\) and \(y' \in Y\) such that \(\|x - x'\| < \frac{\epsilon}{2}\) and \(\|y - y'\| < \frac{\epsilon}{2}\). Since \(Y\) is a subspace and therefore closed under addition, \(x' + y'\) is in \(Y\).

Consider \(x + y\) and \(x' + y'\). We have:

\begin{equation*} \| (x + y) - (x' + y') \| = \| (x - x') + (y - y') \| \leq \|x - x'\| + \|y - y'\| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \end{equation*}

This inequality shows that \(x + y\) can be approximated as closely as we wish by points \(x' + y'\) of \(Y\); hence \(x + y\) belongs to the closure \(\overline{Y}\).

Closed under scalar multiplication: If \(x\) is in \(\overline{Y}\) and \(\alpha\) is a scalar, then \(\alpha x\) must also be in \(\overline{Y}\). Let \(x\) be in \(\overline{Y}\) and let \(\alpha\) be any scalar. By the definition of closure, for every \(\epsilon > 0\), there exists a point \(x' \in Y\) such that \(\|x - x'\| < \frac{\epsilon}{|\alpha|}\) if \(\alpha \neq 0\) (if \(\alpha = 0\), the result is trivial since \(0 \cdot x = 0\) is in \(Y\) and hence in \(\overline{Y}\)).

Since \(Y\) is a subspace, it is closed under scalar multiplication, so \(\alpha x'\) is in \(Y\). Consider \(\alpha x\) and \(\alpha x'\). We have:

\begin{equation*} \| \alpha x - \alpha x' \| = |\alpha| \| x - x' \| < |\alpha| \cdot \frac{\epsilon}{|\alpha|} = \epsilon. \end{equation*}

This inequality shows that \(\alpha x\) can be approximated as closely as we wish by the points \(\alpha x'\) of \(Y\); hence \(\alpha x\) belongs to the closure \(\overline{Y}\).

Therefore, the closure \(\overline{Y}\) of a subspace \(Y\) of a normed space \(X\) satisfies all the properties of a vector subspace and is thus itself a vector subspace of \(X\).


Problem 7. Show that convergence of \(\|\mathbf{y}_1\| + \|\mathbf{y}_2\| + \|\mathbf{y}_3\| + \ldots\) may not imply convergence of \(\mathbf{y}_1 + \mathbf{y}_2 + \mathbf{y}_3 + \ldots\). Hint: Consider \(\mathbf{y}\) in Prob. 3 and \((\mathbf{y}_n)\), where \(\mathbf{y}_n = (\eta_j^{(n)})\), \(\eta_n^{(n)} = 1/n^2\), \(\eta_j^{(n)} = 0\) for all \(j \neq n\).

Solution:

To demonstrate the statement, we work in the normed space \(Y\) from Problem 3, that is, the subspace of \(l^\infty\) consisting of all sequences with only finitely many nonzero terms, equipped with the sup norm.

We'll construct a specific example using the hint provided, which involves sequences with only one non-zero term, whose magnitude is \(\frac{1}{n^2}\). This example will show that the series of norms converges (absolute convergence), but the series of vectors does not converge in the space \(Y\).

Construction:

Let \(y_n\) be the element of \(Y\) defined by \(y_n = (\eta_j^{(n)})\), where:

\begin{equation*} \eta_j^{(n)} = \begin{cases} \frac{1}{n^2} & \text{if } j = n \\ 0 & \text{if } j \neq n \end{cases} \end{equation*}

This sequence \(y_n\) has only the \(n\)-th term non-zero and equal to \(\frac{1}{n^2}\), and all other terms are zero.

Absolute convergence of norms:

Consider the series of norms \(\sum_{n=1}^\infty \|y_n\|\). Since \(\|y_n\| = \frac{1}{n^2}\) for each \(n\), the series is:

\begin{equation*} \sum_{n=1}^\infty \|y_n\| = \sum_{n=1}^\infty \frac{1}{n^2} \end{equation*}

The series \(\sum_{n=1}^\infty \frac{1}{n^2}\) is known to converge (it's a p-series with \(p = 2\), which converges for \(p > 1\)).
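For the record, the exact value of this series (not needed for the argument) is

\begin{equation*} \sum_{n=1}^{\infty} \|y_n\| = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} \approx 1.645. \end{equation*}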

Lack of convergence of the vector series:

Now consider the series of vectors \(\sum_{n=1}^\infty y_n\). The \(n\)-th partial sum of this series is:

\begin{equation*} S_n = \sum_{k=1}^n y_k = (1, \frac{1}{4}, \frac{1}{9}, \ldots, \frac{1}{n^2}, 0, 0, \ldots) \end{equation*}

Each partial sum \(S_n\) is an element of \(Y\) (it has only \(n\) nonzero terms); its first \(n\) terms are the reciprocals of the squares of the natural numbers, and the rest are zeros.

The limit of the partial sums \(S_n\) as \(n \to \infty\), if it exists, would be the sequence:

\begin{equation*} S = (1, \frac{1}{4}, \frac{1}{9}, \ldots, \frac{1}{n^2}, \ldots) \end{equation*}

The sequence \(S\) is bounded (every term is at most 1), so it certainly belongs to \(l^\infty\); in fact, \(\|S_n - S\|_\infty = \sup_{k > n} \frac{1}{k^2} = \frac{1}{(n+1)^2} \to 0\), so in the larger space \(l^\infty\) the partial sums do converge to \(S\). However, \(S\) has infinitely many nonzero terms, so \(S \notin Y\). If the partial sums converged to some element \(z\) of \(Y\), they would also converge to \(z\) in \(l^\infty\), forcing \(z = S\) by uniqueness of limits; since \(S \notin Y\), the series \(\sum_{n=1}^\infty y_n\) does not converge in \(Y\).

Conclusion:

We have shown that while the series of norms \(\sum_{n=1}^\infty \|y_n\|\) converges, the series of vectors \(\sum_{n=1}^\infty y_n\) does not converge in the (incomplete) normed space \(Y\). This example illustrates that absolute convergence of the norms need not imply convergence of the series of vectors when the underlying normed space is not complete (compare Problems 8 and 9 below).


Problem 8. In a normed space \(X\), if absolute convergence of any series always implies convergence of that series, show that \(X\) is complete.

Proof:

  1. Absolute Convergence Implies Convergence: By hypothesis, if a series \(\sum_{n=1}^\infty x_n\) in \(X\) is absolutely convergent, meaning that \(\sum_{n=1}^\infty \|x_n\|\) converges, then the series \(\sum_{n=1}^\infty x_n\) itself converges in \(X\).

  2. Cauchy Criterion for Series: A series \(\sum_{n=1}^\infty x_n\) converges precisely when its sequence of partial sums \(S_m = \sum_{n=1}^m x_n\) converges; if the series converges, the partial sums form a Cauchy sequence, and in a complete space the converse also holds.

  3. Absolute Convergence and Cauchy Sequences: Suppose \(\sum_{n=1}^\infty x_n\) is absolutely convergent. Then for every \(\varepsilon > 0\), there exists \(N \in \mathbb{N}\) such that for all \(m > n \geq N\), we have \(\sum_{k=n}^m \|x_k\| < \varepsilon\) because the series of norms is convergent and hence satisfies the Cauchy criterion.

  4. Implication for Partial Sums: The property above implies that the sequence of partial sums \((S_m)\) is Cauchy. To see this, note that for \(m > n \geq N\),

    \begin{equation*} \|S_m - S_n\| = \left\|\sum_{k=n+1}^m x_k\right\| \leq \sum_{k=n+1}^m \|x_k\| < \varepsilon. \end{equation*}

    This inequality holds because the norm is subadditive (it satisfies the triangle inequality).

  5. Convergence of These Partial Sums: Since the series \(\sum_{n=1}^\infty x_n\) is absolutely convergent, the hypothesis on \(X\) guarantees that it converges, i.e. the Cauchy sequence of partial sums \((S_m)\) has a limit in \(X\). This, however, only covers Cauchy sequences that arise as partial sums of absolutely convergent series.

  6. Conclusion: To conclude that \(X\) is complete, and hence a Banach space, we must show that an arbitrary Cauchy sequence in \(X\) converges. This is carried out below by associating with a given Cauchy sequence a suitable absolutely convergent series.

The key point here is the connection between the Cauchy criterion for series convergence and the completeness of the space. The hypothesis that absolute convergence implies convergence ensures that the Cauchy sequences arising as partial sums of absolutely convergent series always converge; the argument below extends this to arbitrary Cauchy sequences, which is precisely what completeness requires.

Completeness of \(X\):

To say that \(X\) is complete means that every Cauchy sequence in \(X\) converges to a limit within \(X\). Now, let's consider any Cauchy sequence \((x_n)\) in \(X\).

Since \((x_n)\) is a Cauchy sequence, for every \(\varepsilon > 0\) there exists an \(N\) such that for all \(m > n \geq N\) we have \(\|x_m - x_n\| < \varepsilon\). Applying this with \(\varepsilon = 2^{-k}\) for \(k = 1, 2, \ldots\), we can choose indices \(n_1 < n_2 < n_3 < \ldots\) such that

\begin{equation*} \|x_{n_{k+1}} - x_{n_k}\| < \frac{1}{2^k} \quad \text{for all } k. \end{equation*}

Form the series \(\sum_{k=1}^\infty (x_{n_{k+1}} - x_{n_k})\). Its series of norms is dominated by the convergent geometric series \(\sum_{k=1}^\infty 2^{-k} = 1\), so the series is absolutely convergent. (Mere smallness of the individual terms would not be enough; the harmonic series shows that terms tending to zero do not guarantee convergence. It is the geometric bound obtained from the subsequence that makes the series of norms convergent.)

Given that absolute convergence of a series in \(X\) implies its convergence, the series \(\sum_{k=1}^\infty (x_{n_{k+1}} - x_{n_k})\) converges in \(X\). Its \(K\)-th partial sum telescopes to \(x_{n_{K+1}} - x_{n_1}\), so the subsequence \((x_{n_k})\) converges to some limit \(x \in X\).

Therefore, the original Cauchy sequence \((x_n)\) must also converge to \(x\): given \(\varepsilon > 0\), choose \(N\) so that \(\|x_m - x_n\| < \frac{\varepsilon}{2}\) for all \(m, n \geq N\), and choose \(k\) with \(n_k \geq N\) and \(\|x_{n_k} - x\| < \frac{\varepsilon}{2}\); then for all \(n \geq N\), \(\|x_n - x\| \leq \|x_n - x_{n_k}\| + \|x_{n_k} - x\| < \varepsilon\). Since every Cauchy sequence in \(X\) has a limit in \(X\), we conclude that \(X\) is complete.

In summary, the condition that absolute convergence implies convergence in \(X\) allows us to transform the Cauchy criterion for sequences into a condition on series. Since this condition guarantees convergence for every absolutely convergent series, and hence (via the subsequence construction above) for every Cauchy sequence, it follows that \(X\) is a complete normed space, that is, a Banach space.

Remark:

This proof leverages the fundamental property of normed spaces: a space is complete if every Cauchy sequence converges within the space. The given condition, that absolute convergence implies convergence, is used to show that Cauchy sequences, constructed from series of vectors in the space, converge. This implies that the space must be complete, as all such Cauchy sequences have a limit in the space, satisfying the definition of a Banach space.


Problem 9. Show that in a Banach space, an absolutely convergent series is convergent.

Detailed Proof:

Let \((X, \|\cdot\|)\) be a Banach space. Suppose we have a series \(\sum_{n=1}^\infty x_n\) in \(X\) that is absolutely convergent. By definition, this means that the series \(\sum_{n=1}^\infty \|x_n\|\) converges in the real numbers.

  1. Definition of Absolute Convergence: The series \(\sum_{n=1}^\infty x_n\) is said to be absolutely convergent if the series of norms \(\sum_{n=1}^\infty \|x_n\|\), a series of real numbers, converges in \(\mathbb{R}\), i.e., there exists a real number \(L\) such that for every \(\epsilon > 0\), there is an integer \(N\) such that for all \(n \geq N\), it holds that

    \begin{equation*} \left|\sum_{k=1}^n \|x_k\| - L\right| < \epsilon. \end{equation*}
  2. Partial Sums as a Sequence: Define the \(n\)-th partial sum \(S_n\) of the series \(\sum_{n=1}^\infty x_n\) by \(S_n = \sum_{k=1}^n x_k\). The sequence \((S_n)\) is a sequence of elements in \(X\).

  3. Partial Sums Are Cauchy: To show that \((S_n)\) is a Cauchy sequence, consider any \(\epsilon > 0\). Since the series of norms converges, there exists an integer \(N\) such that for all \(m, n \geq N\) with \(m < n\), we have

    \begin{equation*} \sum_{k=m+1}^n \|x_k\| < \epsilon. \end{equation*}

    Now, consider the difference between the \(n\)-th and \(m\)-th partial sums:

    \begin{equation*} \|S_n - S_m\| = \left\|\sum_{k=m+1}^n x_k\right\| \leq \sum_{k=m+1}^n \|x_k\|, \end{equation*}

    where we used the triangle inequality for norms. Given our choice of \(N\), for \(m, n \geq N\), this implies

    \begin{equation*} \|S_n - S_m\| < \epsilon. \end{equation*}

    This is the Cauchy criterion for sequences in a normed space: for any \(\epsilon > 0\), there exists an \(N\) such that for all \(m, n \geq N\), the norm of the difference between the \(n\)-th and \(m\)-th terms of the sequence is less than \(\epsilon\).

  4. Convergence of Cauchy Sequences in Banach Spaces: A Banach space is, by definition, a complete normed vector space. Completeness means that every Cauchy sequence in the space converges to a limit within the space. Since we have established that \((S_n)\) is a Cauchy sequence, it must converge to some limit \(S\) in \(X\).

  5. Conclusion: The limit \(S\) to which the sequence \((S_n)\) converges is the sum of the series \(\sum_{n=1}^\infty x_n\). Therefore, the series converges in \(X\), and we have demonstrated that an absolutely convergent series in a Banach space is indeed convergent.

Remark:

This detailed proof walks through the concepts of absolute convergence, the properties of Cauchy sequences, and the completeness of Banach spaces to conclusively show that an absolutely convergent series in a Banach space must converge. This result is a cornerstone of functional analysis and underscores the robustness of Banach spaces for analytical purposes.

Kreyszig-2.2-Normed Space, Banach Space

Problem 1. Show that the norm \(\|x\|\) of x is the distance from x to O.

Definition:

A norm on a (real or complex) vector space \(X\) is a real-valued function on \(X\) whose value at an \(x \in X\) is denoted by \(\|x\|\) (read "norm of x") and which has the properties

  • \(\|x\| \geq 0\) (Non-negativity)

  • \(\|x\| = 0 \iff x = 0\) (Definiteness)

  • \(\|a x\| = |a| \|x\|\) (Homogeneity)

  • \(\|x + y\| \leq \|x\| + \|y\|\) (Triangle Inequality);

here \(x\) and \(y\) are arbitrary vectors in \(X\) and \(a\) is any scalar.

Solution:

The norm \(\|x\|\) of a vector \(x\) in a vector space is a generalization of the notion of "length" of a vector. It measures the size of vectors and is consistent with our geometric intuition.

In a normed vector space, the distance \(d\) between two vectors \(x\) and \(y\) is defined as:

\begin{equation*} d(x, y) = \|x - y\| \end{equation*}

The distance from any vector \(x\) to the origin \(O\) (the zero vector \(0\)) is then:

\begin{equation*} d(x, O) = \|x - 0\| \end{equation*}

Since subtracting the zero vector does not change the vector \(x\), we have:

\begin{equation*} d(x, O) = \|x\| \end{equation*}

Thus, the norm \(\|x\|\) is the distance from the vector \(x\) to the origin \(O\) in the vector space \(X\). This relationship holds in any normed vector space, whether it be a space of real numbers, complex numbers, or more abstract objects.


Problem 2. Verify that the usual length of a vector in the plane or in three-dimensional space has the properties (N1) to (N4) of a norm.

Solution:

In the Plane (\(\mathbb{R}^2\))

For a vector \(x = (x_1, x_2)\) in \(\mathbb{R}^2\), the usual length (Euclidean norm) is defined as:

\begin{equation*} \|x\| = \sqrt{x_1^2 + x_2^2} \end{equation*}

Property (N1): Non-negativity

\begin{equation*} \|x\| = \sqrt{x_1^2 + x_2^2} \geq 0 \end{equation*}

The square of any real number is non-negative, and the square root of a non-negative number is also non-negative.

Property (N2): Definiteness

\begin{equation*} \|x\| = 0 \iff x_1^2 + x_2^2 = 0 \iff x_1 = 0 \text{ and } x_2 = 0 \iff x = (0, 0) \end{equation*}

The norm is zero if and only if both components of the vector are zero.

Property (N3): Homogeneity

For any scalar \(a\) and vector \(x = (x_1, x_2)\),

\begin{equation*} \|a \cdot x\| = \| (a x_1, a x_2) \| = \sqrt{(a x_1)^2 + (a x_2)^2} = |a| \cdot \sqrt{x_1^2 + x_2^2} = |a| \cdot \|x\| \end{equation*}

The norm of a scaled vector is the absolute value of the scalar times the norm of the vector.

Property (N4): Triangle Inequality

For any vectors \(x = (x_1, x_2)\) and \(y = (y_1, y_2)\), let's consider the norm of their sum:

\begin{equation*} \|x + y\| = \| (x_1 + y_1, x_2 + y_2) \| = \sqrt{(x_1 + y_1)^2 + (x_2 + y_2)^2} \end{equation*}

To prove the triangle inequality, we expand the square of the norm of \(x + y\):

\begin{equation*} \|x + y\|^2 = x_1^2 + 2x_1y_1 + y_1^2 + x_2^2 + 2x_2y_2 + y_2^2 \end{equation*}

We can rewrite this as:

\begin{equation*} \|x + y\|^2 = \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 \end{equation*}

By the Cauchy-Schwarz inequality, we know:

\begin{equation*} |\langle x, y \rangle| \leq \|x\| \cdot \|y\| \end{equation*}

So we have:

\begin{equation*} \|x + y\|^2 \leq \|x\|^2 + 2\|x\| \cdot \|y\| + \|y\|^2 = (\|x\| + \|y\|)^2 \end{equation*}

Taking the square root of both sides (and remembering that the square root function is increasing), we get:

\begin{equation*} \|x + y\| \leq \|x\| + \|y\| \end{equation*}

This completes the proof of the triangle inequality for vectors in \(\mathbb{R}^2\).
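As a quick numerical illustration, with the specific vectors \(x = (3, 4)\) and \(y = (4, 3)\) chosen only for concreteness:

\begin{equation*} \|x + y\| = \|(7, 7)\| = 7\sqrt{2} \approx 9.90 \leq 10 = \|x\| + \|y\|. \end{equation*}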

In Three-Dimensional Space (\(\mathbb{R}^3\))

The verification of properties (N1) to (N4) in three-dimensional space follows similarly to that in the plane, with the addition of the third component for each vector. The proof of the triangle inequality in \(\mathbb{R}^3\) follows the same steps as above.

Concavity of the Square Root Function

The function \(f(t) = \sqrt{t}\) is concave on \([0, \infty)\) because its second derivative is negative:

\begin{equation*} f''(t) = -\frac{1}{4t^{3/2}} < 0 \text{ for all } t > 0 \end{equation*}

Because the square root function is concave and vanishes at \(0\), it is subadditive: \(\sqrt{a + b} \leq \sqrt{a} + \sqrt{b}\) for all \(a, b \geq 0\) (verified directly below). This subadditivity is consistent with the triangle inequality we have just proven.
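The subadditivity claim can be checked directly by squaring:

\begin{equation*} \left( \sqrt{a} + \sqrt{b} \right)^2 = a + 2\sqrt{ab} + b \geq a + b, \end{equation*}

so taking square roots gives \(\sqrt{a + b} \leq \sqrt{a} + \sqrt{b}\) for all \(a, b \geq 0\).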

The above reasoning solidifies that the Euclidean norm satisfies the triangle inequality, completing the verification that it indeed constitutes a norm in both \(\mathbb{R}^2\) and \(\mathbb{R}^3\).

Detailed Transition

Consider two vectors \(x\) and \(y\) in \(\mathbb{R}^n\). The squared norm of their sum is:

\begin{equation*} \|x + y\|^2 = \langle x + y, x + y \rangle \end{equation*}

Expanding the inner product:

\begin{equation*} \|x + y\|^2 = \langle x, x \rangle + 2\langle x, y \rangle + \langle y, y \rangle \end{equation*}

The inner product of a vector with itself is the square of its norm:

\begin{equation*} \|x + y\|^2 = \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 \end{equation*}

By the Cauchy-Schwarz inequality:

\begin{equation*} |\langle x, y \rangle| \leq \|x\| \cdot \|y\| \end{equation*}

This implies:

\begin{equation*} 2|\langle x, y \rangle| \leq 2\|x\| \cdot \|y\| \end{equation*}

Since \(\langle x, y \rangle \leq |\langle x, y \rangle|\), it follows that:

\begin{equation*} 2\langle x, y \rangle \leq 2\|x\| \cdot \|y\| \end{equation*}

Substituting back into our expanded norm equation:

\begin{equation*} \|x + y\|^2 \leq \|x\|^2 + 2\|x\| \cdot \|y\| + \|y\|^2 \end{equation*}

The right-hand side is the square of \(\|x\| + \|y\|\):

\begin{equation*} \|x + y\|^2 \leq (\|x\| + \|y\|)^2 \end{equation*}

Taking the square root of both sides, since the square root function is monotonically increasing:

\begin{equation*} \|x + y\| \leq \|x\| + \|y\| \end{equation*}

This is the triangle inequality for norms, demonstrating that the Euclidean norm satisfies property (N4).


Problem 4. Show that we may replace (N2) by \(\|x\| = 0 \Rightarrow x = 0\) without altering the concept of a norm. Show that non-negativity of a norm also follows from (N3) and (N4).

Solution:

Part 1: Replacing (N2)

Property (N2) is the two-sided condition \(\|x\| = 0 \iff x = 0\); it ensures that the only vector with norm zero is the zero vector itself. Suppose we require only the implication \(\|x\| = 0 \Rightarrow x = 0\). The converse implication is automatic from (N3): taking the scalar \(a = 0\) gives \(\|0\| = \|0 \cdot x\| = |0|\,\|x\| = 0\), so the zero vector always has norm zero. Hence the weaker, one-sided condition defines exactly the same class of norms, and (N2) may be replaced by it without altering the concept of a norm.

Part 2: Non-negativity from (N3) and (N4)

Property (N3) states that \(\|a x\| = |a| \|x\|\) for any scalar \(a\) and any vector \(x\). This property is known as absolute homogeneity or scalability.

Property (N4) is the triangle inequality, which states that \(\|x + y\| \leq \|x\| + \|y\|\) for any vectors \(x\) and \(y\).

To show that non-negativity follows from (N3) and (N4), consider the following:

For any vector \(x\) in the vector space, by property (N3), we have:

\begin{equation*} \|0 \cdot x\| = |0| \|x\| = 0 \end{equation*}

Here we used the fact that multiplying any vector by zero yields the zero vector, and the absolute value of zero is zero. This gives us the result that \(\|0\| = 0\).

Now, using the triangle inequality (N4) together with (N3), for any vector \(x\):

\begin{equation*} 0 = \|0\| = \|x + (-x)\| \leq \|x\| + \|-x\| = \|x\| + |-1|\,\|x\| = 2\|x\|. \end{equation*}

Since we've established that \(\|0\| = 0\), the left-hand side is 0, and dividing by 2 gives:

\begin{equation*} \|x\| \geq 0. \end{equation*}

Thus non-negativity of the norm is a consequence of (N3) and (N4) alone.

Together, these parts demonstrate that (N2) can be replaced by the one-sided condition \(\|x\| = 0 \Rightarrow x = 0\) without changing the concept of a norm, and that non-negativity can be derived from (N3) and (N4), confirming that these properties are sufficient to define a norm.


Problem 5. Show that the Euclidean norm with components \(x_i\) replaced by \(\xi_i\) and scalar \(a\) replaced by \(\alpha\) defines a norm on the vector space \(\mathbb{R}^n\).

Solution:

To demonstrate that the Euclidean norm defines a norm on \(\mathbb{R}^n\) with the components \(x_i\) replaced by \(\xi_i\) and the scalar \(a\) replaced by \(\alpha\), we must verify that it satisfies the following properties:

  1. Non-negativity: For any vector \(x\), since each component \(\xi_i\) is squared, the sum is non-negative. Therefore, \(\|x\| \geq 0\).

  2. Definiteness: The norm \(\|x\|\) equals zero if and only if every \(\xi_i\) is zero, which implies that \(x\) is the zero vector.

  3. Homogeneity (or scalability): For any scalar \(\alpha\) and vector \(x\), the norm of the scaled vector is given by:

    \begin{equation*} \|\alpha x\| = \sqrt{\sum_{i=1}^{n} (\alpha \xi_i)^2} = |\alpha| \sqrt{\sum_{i=1}^{n} \xi_i^2} = |\alpha| \|x\| \end{equation*}
  4. Triangle Inequality: For vectors \(x = (\xi_1, \xi_2, \ldots, \xi_n)\) and \(y = (\eta_1, \eta_2, \ldots, \eta_n)\), we need to show that \(\|x + y\| \leq \|x\| + \|y\|\).

    Starting with the left side of the inequality:

    \begin{equation*} \|x + y\|^2 = \sum_{i=1}^{n} (\xi_i + \eta_i)^2 = \sum_{i=1}^{n} (\xi_i^2 + 2\xi_i\eta_i + \eta_i^2) \end{equation*}

    Applying the Cauchy-Schwarz inequality:

    \begin{equation*} \left| \sum_{i=1}^{n} \xi_i\eta_i \right| \leq \sqrt{\sum_{i=1}^{n} \xi_i^2} \cdot \sqrt{\sum_{i=1}^{n} \eta_i^2} \end{equation*}

    Since \(\sum_{i=1}^{n} \xi_i\eta_i \leq \left|\sum_{i=1}^{n} \xi_i\eta_i\right|\), we then have:

    \begin{equation*} 2\sum_{i=1}^{n} \xi_i\eta_i \leq 2\sqrt{\sum_{i=1}^{n} \xi_i^2} \cdot \sqrt{\sum_{i=1}^{n} \eta_i^2} \end{equation*}

    Substituting this back into the squared norm of \(x + y\), we get:

    \begin{equation*} \|x + y\|^2 \leq \sum_{i=1}^{n} \xi_i^2 + 2\sqrt{\sum_{i=1}^{n} \xi_i^2} \cdot \sqrt{\sum_{i=1}^{n} \eta_i^2} + \sum_{i=1}^{n} \eta_i^2 \end{equation*}

    Which simplifies to:

    \begin{equation*} \|x + y\|^2 \leq \left( \sqrt{\sum_{i=1}^{n} \xi_i^2} + \sqrt{\sum_{i=1}^{n} \eta_i^2} \right)^2 \end{equation*}

    Taking the square root of both sides:

    \begin{equation*} \|x + y\| \leq \sqrt{\sum_{i=1}^{n} \xi_i^2} + \sqrt{\sum_{i=1}^{n} \eta_i^2} \end{equation*}

    Thus, we have proven the triangle inequality:

    \begin{equation*} \|x + y\| \leq \|x\| + \|y\| \end{equation*}

By confirming these properties, we have shown that the Euclidean norm with substitutions \(\xi_i\) for \(x_i\) and \(\alpha\) for \(a\) indeed defines a norm on the vector space \(\mathbb{R}^n\).


Problem 6. Let \(X\) be the vector space of all ordered pairs \(x = (\xi_1, \xi_2)\), \(y = (\eta_1, \eta_2)\), ... of real numbers. We are to show that norms on \(X\) are defined by:

\begin{equation*} \|x\|_1 = |\xi_1| + |\xi_2| \end{equation*}
\begin{equation*} \|x\|_2 = (\xi_1^2 + \xi_2^2)^{1/2} \end{equation*}
\begin{equation*} \|x\|_{\infty} = \max\{|\xi_1|, |\xi_2|\} \end{equation*}

Solution:

  1. For the \(L^1\) norm:

    • Non-negativity: Since absolute values are always non-negative, we have \(|\xi_1| + |\xi_2| \geq 0\).

    • Definiteness: \(\|x\|_1 = 0\) if and only if \(|\xi_1| = 0\) and \(|\xi_2| = 0\), which occurs if and only if \(\xi_1 = 0\) and \(\xi_2 = 0\), hence \(x = 0\).

    • Scalar multiplication: For any scalar \(\alpha\), \(\|\alpha x\|_1 = |\alpha \xi_1| + |\alpha \xi_2| = |\alpha|(|\xi_1| + |\xi_2|) = |\alpha| \|x\|_1\).

    • Triangle inequality: For any vectors \(x = (\xi_1, \xi_2)\) and \(y = (\eta_1, \eta_2)\), \(\|x + y\|_1 = |(\xi_1 + \eta_1)| + |(\xi_2 + \eta_2)| \leq (|\xi_1| + |\eta_1|) + (|\xi_2| + |\eta_2|) = \|x\|_1 + \|y\|_1\).

  2. For the \(L^2\) norm:

    • Non-negativity: The sum of squares is non-negative, and so is their square root, hence \(\|x\|_2 \geq 0\).

    • Definiteness: \(\|x\|_2 = 0\) if and only if \(\xi_1^2 + \xi_2^2 = 0\), which occurs only when \(\xi_1 = 0\) and \(\xi_2 = 0\), thus \(x = 0\).

    • Scalar multiplication: \(\|\alpha x\|_2 = ((\alpha \xi_1)^2 + (\alpha \xi_2)^2)^{1/2} = |\alpha| (\xi_1^2 + \xi_2^2)^{1/2} = |\alpha| \|x\|_2\).

    • Triangle inequality: This follows from the Minkowski inequality, which is a general result; a direct proof for the \(L^2\) norm is given in detail below.

  3. For the \(L^\infty\) norm:

    • Non-negativity: The maximum of absolute values is non-negative, so \(\|x\|_{\infty} \geq 0\).

    • Definiteness: \(\|x\|_{\infty} = 0\) if and only if both \(|\xi_1| = 0\) and \(|\xi_2| = 0\), which means \(x = 0\).

    • Scalar multiplication: For any scalar \(\alpha\), \(\|\alpha x\|_{\infty} = \max\{|\alpha \xi_1|, |\alpha \xi_2|\} = |\alpha| \max\{|\xi_1|, |\xi_2|\} = |\alpha| \|x\|_{\infty}\).

    • Triangle inequality: For any vectors \(x\) and \(y\), \(\|x + y\|_{\infty} \leq \|x\|_{\infty} + \|y\|_{\infty}\) because the maximum absolute value of the sum of components is less than or equal to the sum of the maximum absolute values.

Triangle inequality for the \(L^2\) norm

For the \(L^2\) norm, we want to prove the triangle inequality:

\begin{equation*} \|x + y\|_2 \leq \|x\|_2 + \|y\|_2 \end{equation*}

where \(x = (\xi_1, \xi_2)\) and \(y = (\eta_1, \eta_2)\). We start by squaring both sides of the inequality:

\begin{equation*} (\|x + y\|_2)^2 \leq (\|x\|_2 + \|y\|_2)^2 \end{equation*}

Expanding the left-hand side, we have:

\begin{equation*} (\xi_1 + \eta_1)^2 + (\xi_2 + \eta_2)^2 \end{equation*}

And the right-hand side becomes:

\begin{equation*} (\|x\|_2)^2 + 2\|x\|_2\|y\|_2 + (\|y\|_2)^2 \end{equation*}

Simplifying the norms, we obtain:

\begin{equation*} \xi_1^2 + 2\xi_1\eta_1 + \eta_1^2 + \xi_2^2 + 2\xi_2\eta_2 + \eta_2^2 \leq \xi_1^2 + \xi_2^2 + 2\sqrt{(\xi_1^2 + \xi_2^2)(\eta_1^2 + \eta_2^2)} + \eta_1^2 + \eta_2^2 \end{equation*}

The inequality holds due to the Cauchy-Schwarz inequality, which asserts:

\begin{equation*} (\sum a_i b_i)^2 \leq (\sum a_i^2)(\sum b_i^2) \end{equation*}

In our case, with \(a_i = \xi_i\) and \(b_i = \eta_i\), it gives

\begin{equation*} \xi_1\eta_1 + \xi_2\eta_2 \leq |\xi_1\eta_1 + \xi_2\eta_2| \leq \sqrt{(\xi_1^2 + \xi_2^2)(\eta_1^2 + \eta_2^2)} \end{equation*}

which is exactly the bound on the cross terms needed above. Hence the squared inequality holds, and since both sides are non-negative, taking square roots yields the triangle inequality for the \(L^2\) norm:

\begin{equation*} \|x + y\|_2 \leq \|x\|_2 + \|y\|_2 \end{equation*}
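For a concrete feel (a minimal sketch only; the sample vectors are arbitrary), the three norms and their triangle inequalities can be evaluated numerically::

    import numpy as np

    def norm1(v):   return abs(v[0]) + abs(v[1])        # ||v||_1
    def norm2(v):   return (v[0]**2 + v[1]**2) ** 0.5   # ||v||_2
    def norminf(v): return max(abs(v[0]), abs(v[1]))    # ||v||_inf

    x = np.array([3.0, -4.0])
    y = np.array([-1.0, 2.0])
    for name, n in [("L1", norm1), ("L2", norm2), ("Linf", norminf)]:
        assert n(x + y) <= n(x) + n(y) + 1e-12
        print(f"{name}: {n(x + y):.4f} <= {n(x) + n(y):.4f}")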

Problem 7. Prove that the vector space of all continuous real-valued functions on \([a, b]\) forms a normed space \(X\), where the norm defined by

\begin{equation*} \|x\| = \left( \int_a^b x(t)^2 \, dt \right)^{1/2} \end{equation*}

satisfies the properties (N1) to (N4).

Solution:

To prove that the given norm satisfies the properties (N1) to (N4), we consider two functions \(x(t)\) and \(y(t)\) from the vector space, and a scalar \(\alpha\).

(N1) Non-negativity: Since \(x(t)^2 \geq 0\) for all \(t\), it follows that

\begin{equation*} \|x\| = \left( \int_a^b x(t)^2 \, dt \right)^{1/2} \geq 0. \end{equation*}

(N2) Definiteness: If \(\|x\| = 0\), then

\begin{equation*} \int_a^b x(t)^2 \, dt = 0. \end{equation*}

Since \(x\) is continuous and \(x(t)^2 \geq 0\), the integral can vanish only if \(x(t)^2 = 0\) for every \(t\): if \(x(t_0) \neq 0\) for some \(t_0\), continuity would give a small subinterval on which \(x(t)^2\) stays positive, making the integral positive. Hence \(x(t) = 0\) for all \(t \in [a, b]\), i.e. \(x = 0\).

(N3) Scalar multiplication: We have

\begin{equation*} \|\alpha x\| = \left( \int_a^b (\alpha x(t))^2 \, dt \right)^{1/2} = |\alpha| \left( \int_a^b x(t)^2 \, dt \right)^{1/2} = |\alpha| \|x\|. \end{equation*}

(N4) Triangle inequality: The proof of the triangle inequality for this norm involves the Cauchy-Schwarz inequality for integrals. We start by expanding the square of the norm of the sum:

\begin{equation*} \|x + y\|^2 = \int_a^b (x(t) + y(t))^2 \, dt. \end{equation*}

Expanding the integrand and applying the Cauchy-Schwarz inequality, we get:

\begin{equation*} \int_a^b (x(t) + y(t))^2 \, dt = \int_a^b x(t)^2 \, dt + 2\int_a^b x(t)y(t) \, dt + \int_a^b y(t)^2 \, dt \leq \int_a^b x(t)^2 \, dt + 2\left(\int_a^b x(t)^2 \, dt\right)^{1/2} \left(\int_a^b y(t)^2 \, dt\right)^{1/2} + \int_a^b y(t)^2 \, dt. \end{equation*}

This implies:

\begin{equation*} \|x + y\|^2 \leq \left( \left( \int_a^b x(t)^2 \, dt \right)^{1/2} + \left( \int_a^b y(t)^2 \, dt \right)^{1/2} \right)^2. \end{equation*}

Taking the square root of both sides, we obtain the triangle inequality:

\begin{equation*} \|x + y\| \leq \|x\| + \|y\|. \end{equation*}

This completes the proof that the vector space of all continuous real-valued functions on \([a, b]\) with the given norm is a normed space.

In more detail, the triangle inequality is obtained as follows:

\begin{equation*} \|x+y\|^2 = \int_a^b (x(t) + y(t))^2 \, dt \end{equation*}

We expand the integrand:

\begin{equation*} \int_a^b (x(t) + y(t))^2 \, dt = \int_a^b (x(t)^2 + 2x(t)y(t) + y(t)^2) \, dt \end{equation*}

We then split the integral:

\begin{equation*} \int_a^b (x(t)^2 + 2x(t)y(t) + y(t)^2) \, dt = \int_a^b x(t)^2 \, dt + 2\int_a^b x(t)y(t) \, dt + \int_a^b y(t)^2 \, dt \end{equation*}

Using the Cauchy-Schwarz inequality for integrals to handle the cross-term:

\begin{equation*} \left(\int_a^b x(t)y(t) \, dt\right)^2 \leq \left(\int_a^b x(t)^2 \, dt\right) \left(\int_a^b y(t)^2 \, dt\right) \end{equation*}

This implies that:

\begin{equation*} 2\int_a^b x(t)y(t) \, dt \leq 2\left(\int_a^b x(t)^2 \, dt\right)^{1/2} \left(\int_a^b y(t)^2 \, dt\right)^{1/2} \end{equation*}

Combine the results to get an upper bound for the integral of the sum:

\begin{equation*} \|x+y\|^2 \leq \int_a^b x(t)^2 \, dt + 2\left(\int_a^b x(t)^2 \, dt\right)^{1/2} \left(\int_a^b y(t)^2 \, dt\right)^{1/2} + \int_a^b y(t)^2 \, dt \end{equation*}

Recognizing that the right-hand side is a perfect square:

\begin{equation*} \|x+y\|^2 \leq \left( \left( \int_a^b x(t)^2 \, dt \right)^{1/2} + \left( \int_a^b y(t)^2 \, dt \right)^{1/2} \right)^2 \end{equation*}

Since both sides are positive, we can take the square root:

\begin{equation*} \|x+y\| \leq \left( \int_a^b x(t)^2 \, dt \right)^{1/2} + \left( \int_a^b y(t)^2 \, dt \right)^{1/2} \end{equation*}

Recognizing the two integrals on the right as \(\|x\|\) and \(\|y\|\), this is the triangle inequality for the \(L^2\) norm:

\begin{equation*} \|x+y\| \leq \|x\| + \|y\| \end{equation*}

This completes the proof for the triangle inequality of the \(L^2\) norm in the vector space of continuous real-valued functions on \([a, b]\).
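As a numerical sanity check (a sketch only; the functions, interval, and grid are arbitrary choices), the integral Cauchy-Schwarz inequality and the resulting triangle inequality can be approximated with a Riemann sum on \([a, b] = [0, 1]\). The discrete sums themselves form an inner product, so the asserted inequalities hold exactly for the approximation::

    import numpy as np

    a, b = 0.0, 1.0
    t, dt = np.linspace(a, b, 4001, retstep=True)
    x = np.sin(2 * np.pi * t)        # a continuous function on [a, b]
    y = t**2 - 0.5                   # another continuous function

    def l2norm(f):
        # Riemann-sum approximation of (integral of f(t)^2 dt)^(1/2)
        return np.sqrt(np.sum(f**2) * dt)

    cross = np.sum(x * y) * dt                               # approximates the integral of x*y
    assert abs(cross) <= l2norm(x) * l2norm(y) + 1e-12       # Cauchy-Schwarz for integrals
    assert l2norm(x + y) <= l2norm(x) + l2norm(y) + 1e-12    # triangle inequality
    print(f"{l2norm(x + y):.4f} <= {l2norm(x) + l2norm(y):.4f}")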


Problem 8. There are several norms of practical importance on the vector space of ordered \(n\)-tuples of numbers:

\begin{equation*} \|x\|_1 = |\xi _1| + |\xi _2| + \ldots + |\xi _n| \end{equation*}
\begin{equation*} \|x\|_p = (|\xi _1|^p + |\xi _2|^p + \ldots + |\xi _n|^p)^{1/p} \quad (p \geq 1) \end{equation*}
\begin{equation*} \|x\|_\infty = \max\{|\xi _1|, |\xi _2|, \ldots, |\xi _n|\} \end{equation*}

Verify that each of these defines a norm.

Solution:

To verify that each of these functions is a norm, we need to show they satisfy the four properties of norms:

  1. Non-negativity: \(||x|| \geq 0\) for all \(x \in X\).

  2. Definiteness: \(||x|| = 0\) if and only if \(x\) is the zero vector.

  3. Homogeneity (or scalability): \(||\alpha x|| = |\alpha| ||x||\) for any scalar \(\alpha\) and any \(x \in X\).

  4. Triangle inequality: \(||x + y|| \leq ||x|| + ||y||\) for all \(x, y \in X\).

For the \(p\)-norm, the first three properties are straightforward to verify. The triangle inequality for the \(p\)-norm is established by Minkowski's inequality.

\begin{equation*} \left(\sum_{i=1}^{n} |\xi _i + \eta _i|^p\right)^{1/p} \leq \left(\sum_{i=1}^{n} |\xi _i|^p\right)^{1/p} + \left(\sum_{i=1}^{n} |\eta _i|^p\right)^{1/p} \end{equation*}

This is the triangle inequality for the \(p\)-norms. To prove Minkowski's inequality, we consider:

  • For \(p=1\), the inequality reduces to the triangle inequality for absolute values, which is trivially true.

  • For \(p>1\), we use Hölder's inequality, which for \(\frac{1}{p} + \frac{1}{q} = 1\) (where \(p,q>1\)), states:

\begin{equation*} \sum_{i=1}^{n} |\xi _i \eta _i| \leq \left(\sum_{i=1}^{n} |\xi _i|^p\right)^{1/p} \left(\sum_{i=1}^{n} |\eta _i|^q\right)^{1/q} \end{equation*}

To apply Hölder's inequality, first write the left side of Minkowski's inequality as

\begin{equation*} \sum_{i=1}^{n} |\xi _i + \eta _i|^p = \sum_{i=1}^{n} |\xi _i + \eta _i|^{p-1} |\xi _i + \eta _i| \leq \sum_{i=1}^{n} |\xi _i + \eta _i|^{p-1} |\xi _i| + \sum_{i=1}^{n} |\xi _i + \eta _i|^{p-1} |\eta _i| \end{equation*}

using the triangle inequality for absolute values. Now apply Hölder's inequality to each of the two sums on the right, pairing the sequence \(|\xi _i + \eta _i|^{p-1}\) (with exponent \(q = p/(p-1)\)) with \(|\xi _i|\) and with \(|\eta _i|\) (each with exponent \(p\)). Since \((p-1)q = p\), this gives

\begin{equation*} \sum_{i=1}^{n} |\xi _i + \eta _i|^p \leq \left(\sum_{i=1}^{n} |\xi _i + \eta _i|^p\right)^{\frac{p-1}{p}} \left(\left(\sum_{i=1}^{n} |\xi _i|^p\right)^{1/p} + \left(\sum_{i=1}^{n} |\eta _i|^p\right)^{1/p}\right) \end{equation*}

If \(\sum_{i=1}^{n} |\xi _i + \eta _i|^p = 0\) the inequality is trivial; otherwise, dividing both sides by \(\left(\sum_{i=1}^{n} |\xi _i + \eta _i|^p\right)^{(p-1)/p}\) completes the proof of Minkowski's inequality and establishes the triangle inequality for the \(p\)-norm.

By verifying that each function satisfies all four norm properties, we show that \(\|x\|_1\), \(\|x\|_p\), and \(\|x\|_\infty\) each define a norm on the vector space \(X\).
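The following sketch (illustrative only; the vectors and exponents are arbitrary) checks Minkowski's inequality numerically for several values of \(p \geq 1\)::

    import numpy as np

    def pnorm(v, p):
        # ||v||_p = (sum |v_i|^p)^(1/p)
        return np.sum(np.abs(v) ** p) ** (1.0 / p)

    rng = np.random.default_rng(1)
    x = rng.normal(size=8)
    y = rng.normal(size=8)
    for p in [1, 1.5, 2, 3, 10]:
        lhs = pnorm(x + y, p)
        rhs = pnorm(x, p) + pnorm(y, p)
        assert lhs <= rhs + 1e-12
        print(f"p = {p}: {lhs:.4f} <= {rhs:.4f}")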


Problem 9. Verify that the space \(C[a, b]\) with the norm given by

\begin{equation*} \|x\| = \max_{t \in [a, b]} |x(t)| \end{equation*}

where \([a, b]\) is the interval, defines a norm.

Solution:

To verify that the given formula defines a norm on the space \(C[a, b]\), we need to check that it satisfies the following properties for all functions \(x, y \in C[a, b]\) and all scalars \(\lambda\):

  1. Non-negativity: \(\|x\| \geq 0\), and \(\|x\| = 0\) if and only if \(x(t) = 0\) for all \(t \in [a, b]\).

  2. Absolute scalability: \(\|\lambda x\| = |\lambda| \|x\|\).

  3. Triangle inequality: \(\|x + y\| \leq \|x\| + \|y\|\).

Non-negativity

For any \(x \in C[a, b]\), since \(x(t)\) is a continuous function on a closed interval, it will attain a maximum absolute value which is non-negative. Thus, \(\|x\| = \max_{t \in [a, b]} |x(t)| \geq 0\). Also, \(\|x\| = 0\) if and only if \(|x(t)| = 0\) for all \(t\), which means \(x(t) = 0\) for all \(t \in [a, b]\).

Absolute scalability

For any scalar \(\lambda\) and any \(x \in C[a, b]\), we have:

\begin{equation*} \|\lambda x\| = \max_{t \in [a, b]} |\lambda x(t)| = |\lambda| \max_{t \in [a, b]} |x(t)| = |\lambda| \|x\| \end{equation*}

This follows because the absolute value is multiplicative, \(|ab| = |a||b|\) for all \(a, b\), so the constant factor \(|\lambda|\) can be pulled out of the maximum.

Triangle inequality

The triangle inequality states that for any \(x, y \in C[a, b]\), the norm of their sum is less than or equal to the sum of their norms:

\begin{equation*} \|x + y\| = \max_{t \in [a, b]} |x(t) + y(t)| \leq \max_{t \in [a, b]} (|x(t)| + |y(t)|) \leq \max_{t \in [a, b]} |x(t)| + \max_{t \in [a, b]} |y(t)| = \|x\| + \|y\| \end{equation*}

The inequality \(|x(t) + y(t)| \leq |x(t)| + |y(t)|\) follows from the triangle inequality for absolute values, and we use the fact that the maximum value of a sum is less than or equal to the sum of the maximum values.

Since the given norm satisfies all three properties, it is indeed a norm on the space \(C[a, b]\).

Clarification of Non-negativity Property:

For any function \(x\) in \(C[a, b]\), the norm is defined as

\begin{equation*} \|x\| = \max_{t \in [a, b]} |x(t)| \end{equation*}

Since \(x(t)\) is a continuous function on the closed interval \([a, b]\), it has the following properties:

  1. Boundedness: A continuous function on a closed interval is bounded. That is, there exists a real number \(M\) such that \(|x(t)| \leq M\) for all \(t \in [a, b]\).

  2. Attainment of Bounds: By the extreme value theorem, a continuous function on a closed interval attains its maximum and minimum values at least once within that interval. Therefore, there exists some \(t_{\text{max}} \in [a, b]\) where \(|x(t_{\text{max}})| = \max_{t \in [a, b]} |x(t)|\).

With these properties, the non-negativity of the norm can be discussed in detail:

  • Non-negativity: The norm \(\|x\|\) is always non-negative because absolute values are non-negative, and because \(x\) is continuous, it achieves a maximum absolute value on \([a, b]\). This maximum is the value of the norm and cannot be negative.

  • Zero Norm: The norm \(\|x\|\) is zero if and only if the maximum absolute value that \(x(t)\) achieves over the interval \([a, b]\) is zero. If \(\|x\| = 0\), then \(\max_{t \in [a, b]} |x(t)| = 0\), implying that \(|x(t)| = 0\) for all \(t \in [a, b]\). Since a real number's absolute value is zero if and only if the number itself is zero, it follows that \(x(t) = 0\) for all \(t \in [a, b]\). Conversely, if \(x(t) = 0\) for all \(t \in [a, b]\), then clearly \(\|x\| = 0\).

These points confirm the non-negativity of the norm and the condition under which the norm of a function is zero.

Why a Continuous Function on a Closed Interval is Bounded:

A continuous function on a closed interval \([a, b]\) is guaranteed to be bounded. This assertion is supported by the Boundedness Theorem, which is a direct consequence of the Extreme Value Theorem. The reasoning is as follows:

  • Closed Interval: A closed interval \([a, b]\) includes its endpoints, making it a compact set in the real numbers. Compactness in real numbers implies that the set is both closed and bounded.

  • Continuity: A function \(f\) is continuous on \([a, b]\) if, for every point \(c\) in the interval and every \(\epsilon > 0\), there exists a \(\delta > 0\) such that for all \(x\) within \(\delta\) of \(c\), the value of \(f(x)\) is within \(\epsilon\) of \(f(c)\). This means the function does not exhibit jumps, breaks, or infinite behavior within the interval.

  • Extreme Value Theorem: Due to continuity and the closed nature of the interval, the Extreme Value Theorem ensures that a continuous function on a closed interval will attain both its maximum and minimum values within that interval. This theorem does not hold for open intervals or functions that are not continuous.

Intuitive Explanation:

If a continuous function were not bounded on a closed interval, it would suggest that the function could assume arbitrarily large or small values. However, continuity ensures a gradual change without sudden leaps. As the interval is closed, the function cannot 'escape' to infinity at the endpoints, because these points are part of the interval and the function must be defined and finite at them. If the function were unbounded, there would exist points where the function's values would become arbitrarily large, contradicting the very definition of continuity.

Thus, the interplay between the function's continuity (precluding abrupt changes or infinite values) and the interval's closed nature (disallowing endpoints from being unbounded) ensures that the function must be bounded.
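As an illustration (a sketch; a finite grid only approximates the true maximum of a continuous function), the sup norm on \(C[a, b]\) and its triangle inequality can be sampled numerically::

    import numpy as np

    t = np.linspace(0.0, 2.0, 5001)    # grid on [a, b] = [0, 2]
    x = np.cos(3 * t)                  # a continuous function
    y = t - 1.0                        # another continuous function

    def supnorm(f):
        return np.max(np.abs(f))       # max_t |f(t)| over the grid

    assert supnorm(x + y) <= supnorm(x) + supnorm(y) + 1e-12
    print(f"{supnorm(x + y):.4f} <= {supnorm(x) + supnorm(y):.4f}")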


Problem Statement. Show that the closed unit ball \(\tilde{B}_1(0)\) in a normed space \(X\) is convex.

Solution: To prove that the closed unit ball \(\tilde{B}_1(0)\) is convex, we need to demonstrate that for any two points \(x, y \in \tilde{B}_1(0)\), the line segment joining them is entirely contained within \(\tilde{B}_1(0)\). A line segment in a vector space can be represented as the set of all convex combinations of \(x\) and \(y\), which is given by

\begin{equation*} z = \alpha x + (1 - \alpha) y \end{equation*}

where \(0 \leq \alpha \leq 1\). The point \(z\) is a point on the line segment between \(x\) and \(y\), varying smoothly from one to the other as \(\alpha\) goes from 0 to 1.

Now, we must show that \(z\) also belongs to \(\tilde{B}_1(0)\), which means that \(\|z\| \leq 1\). Given that \(x, y \in \tilde{B}_1(0)\), we have \(\|x\| \leq 1\) and \(\|y\| \leq 1\). The norm of \(z\) is computed as follows:

\begin{equation*} \|z\| = \|\alpha x + (1 - \alpha) y\| \leq \alpha \|x\| + (1 - \alpha) \|y\| \end{equation*}

Here we have used the triangle inequality and the property of absolute scalability of norms. Because \(\|x\| \leq 1\) and \(\|y\| \leq 1\), it follows that:

\begin{equation*} \alpha \|x\| + (1 - \alpha) \|y\| \leq \alpha \cdot 1 + (1 - \alpha) \cdot 1 = \alpha + 1 - \alpha = 1 \end{equation*}

Hence, \(\|z\| \leq 1\), which implies that \(z\) is in \(\tilde{B}_1(0)\). This confirms that \(\tilde{B}_1(0)\) is convex, as every point \(z\) formed as a convex combination of any two points \(x\) and \(y\) in \(\tilde{B}_1(0)\) also lies within \(\tilde{B}_1(0)\).

Explanation:

The point \(z = \alpha x + (1 - \alpha) y\) is crucial in the definition of a convex set because it represents any point on the line segment between two points \(x\) and \(y\) within a vector space \(X\). The scalar \(\alpha\) ranges from 0 to 1 and determines the position of \(z\) on the line segment:

  • When \(\alpha = 0\), the expression becomes \(z = 0 \cdot x + (1 - 0) \cdot y = y\), placing \(z\) at the point \(y\).

  • When \(\alpha = 1\), it simplifies to \(z = 1 \cdot x + (1 - 1) \cdot y = x\), positioning \(z\) at the point \(x\).

  • For values of \(\alpha\) between 0 and 1, \(z\) lies within the line segment connecting \(x\) and \(y\).

A set is convex if, for every pair of points within the set, the entire line segment that connects them also lies within the set. The point \(z\) symbolizes a general point on the line segment between \(x\) and \(y\). Demonstrating that for all values of \(\alpha\) in the closed interval [0, 1], \(z\) remains within the set proves the set's convexity. This is the essence of why the expression \(z = \alpha x + (1 - \alpha) y\) is used: it is a generic representation of any point on the line segment, and verifying that all such points are contained within the set for all \(\alpha\) in [0, 1] affirms the convexity of the set.
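The convexity argument can also be illustrated numerically (a minimal sketch, using the Euclidean norm in \(\mathbb{R}^3\) as an example of a norm)::

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(size=3)
    x /= max(np.linalg.norm(x), 1.0)   # rescale so that ||x|| <= 1
    y = rng.normal(size=3)
    y /= max(np.linalg.norm(y), 1.0)   # rescale so that ||y|| <= 1

    for alpha in np.linspace(0.0, 1.0, 11):
        z = alpha * x + (1 - alpha) * y
        assert np.linalg.norm(z) <= 1.0 + 1e-12   # z stays in the closed unit ball
    print("every convex combination lies in the closed unit ball")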


Problem Statement 15. Show that a subset \(M\) in a normed space \(X\) is bounded if and only if there is a positive number \(c\) such that \(\|x\| \leq c\) for every \(x \in M\). The diameter \(\delta(A)\) of a nonempty set \(A\) in a metric space \((X, d)\) is defined to be \(\delta(A) = \sup \{d(x, y) : x, y \in A\}\). A set \(A\) is said to be bounded if \(\delta(A) < \infty\).

Solution:

If \(M\) is bounded, then there exists a \(c\) such that \(\|x\| \leq c\) for every \(x \in M\):

Step 1: Assume \(M\) is bounded.

By definition of boundedness in the context of a normed space, this means that the diameter \(\delta(M)\), which is the supremum of the distances between all pairs of points in \(M\), is less than infinity. In mathematical terms, \(\delta(M) = \sup \{\|x - y\| : x, y \in M\} = b < \infty\).

Step 2: Choose any \(x \in M\) and also take a fixed element \(x_0 \in M\).

We need a reference point in \(M\) to compare all other points to. The choice of \(x_0\) is arbitrary but will be used to help establish a universal bound for the norm of any element in \(M\).

Step 3: Define \(c = b + \|x_0\|\), which is a positive number.

Here \(b\) is the diameter \(\delta(M)\) we defined earlier, which captures the maximum distance between any two points in \(M\). We are defining a new constant \(c\) that not only accounts for this maximum distance but also adds the norm of our reference point \(x_0\) to ensure that \(c\) will be an upper bound for the norm of any point in \(M\).

Step 4: For any \(x \in M\), estimate \(\|x\|\) using the triangle inequality:

\begin{equation*} \|x\| = \|x - x_0 + x_0\| \leq \|x - x_0\| + \|x_0\| \leq b + \|x_0\| = c. \end{equation*}

The last step uses the fact that \(\delta(M) = b\) is the supremum of all distances \(\|x - y\|\) with \(x, y \in M\); in particular, \(\|x - x_0\|\) cannot exceed \(b\).

Step 5: This shows that for every \(x \in M\), \(\|x\| \leq c\).

The definition of \(c\) was constructed to be a bound for the norms of all points in \(M\) relative to the fixed point \(x_0\) and the diameter of \(M\).

Conversely , if there exists a \(c\) such that \(\|x\| \leq c\) for every \(x \in M\), then \(M\) is bounded:

Step 1: Assume that for every \(x \in M\), \(\|x\| \leq c\) for some positive number \(c\).

This is the hypothesis that there is a uniform bound on the norms of all elements in the set \(M\).

Step 2: For any \(x, y \in M\), using the triangle inequality we have:

\begin{equation*} \|x - y\| \leq \|x\| + \|y\|. \end{equation*}

This is the triangle inequality applied to the points \(x\) and \(y\) in \(M\).

Step 3: Since \(\|x\| \leq c\) and \(\|y\| \leq c\), it follows that:

\begin{equation*} \|x - y\| \leq c + c = 2c. \end{equation*}

This step uses the bound for the norms of \(x\) and \(y\) to establish a bound for the distance between them.

Step 4: This inequality holds for all \(x, y \in M\), so \(\delta(M)\), the supremum of all such distances, is at most \(2c\).

Here we use the definition of \(\delta(M)\) again, which is the supremum of all distances between points in \(M\). Since we've shown that every such distance is bounded by \(2c\), it follows that \(\delta(M) \leq 2c\).

Step 5: Therefore, \(\delta(M) \leq 2c < \infty\), which means \(M\) is bounded.

Since \(2c\) is a finite number, the supremum of the set of distances (the diameter) is also finite, confirming that \(M\) is bounded by definition.

In both directions of the proof, the definition of \(\delta(M)\) as the supremum of distances \(\|x - y\|\) for \(x, y \in M\) is crucial for establishing the boundedness of \(M\).
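A small numerical illustration of the second direction (a sketch with an arbitrary sample set): if every point of a finite set satisfies \(\|x\| \leq c\), then every pairwise distance is at most \(2c\)::

    import numpy as np

    rng = np.random.default_rng(3)
    c = 2.0
    M = []
    for _ in range(20):
        v = rng.normal(size=3)
        v = v / np.linalg.norm(v) * c * rng.uniform()   # rescale into the ball of radius c
        M.append(v)

    diameter = max(np.linalg.norm(u - w) for u in M for w in M)
    print(f"diameter = {diameter:.3f} <= 2c = {2 * c}")
    assert diameter <= 2 * c + 1e-12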

Kreyszig 2.1, Normed Spaces - Vector Space

Problem 1. Show that the set of real numbers, with the usual addition and multiplication, constitutes a one-dimensional real vector space, and the set of all complex numbers constitutes a one-dimensional complex vector space.

Proof for Real Numbers as a One-Dimensional Real Vector Space

  1. Additive Identity: There exists a number \(0\) such that for every real number \(x\), \(x + 0 = x\).

    Example:

    \begin{equation*} 5 + 0 = 5 \end{equation*}
  2. Additive Inverse: For every real number \(x\), there exists a number \(-x\) such that \(x + (-x) = 0\).

    Example:

    \begin{equation*} 5 + (-5) = 0 \end{equation*}
  3. Closure under Addition: For every pair of real numbers \(x\) and \(y\), their sum \(x + y\) is also a real number.

    Example:

    \begin{equation*} 5 + 3 = 8 \end{equation*}
  4. Closure under Scalar Multiplication: For every real number \(x\) and every scalar \(a\), the product \(ax\) is also a real number.

    Example:

    \begin{equation*} 3 \times 5 = 15 \end{equation*}
  5. Distributivity of Scalar Multiplication with respect to Vector Addition: For every real number \(x\) and \(y\) and every scalar \(a\), \(a(x + y) = ax + ay\).

    Example:

    \begin{equation*} 3(5 + 2) = 3 \times 5 + 3 \times 2 \end{equation*}
  6. Distributivity of Scalar Multiplication with respect to Scalar Addition: For every real number \(x\) and scalars \(a\) and \(b\), \((a + b)x = ax + bx\).

    Example:

    \begin{equation*} (3 + 2) \times 5 = 3 \times 5 + 2 \times 5 \end{equation*}
  7. Associativity of Scalar Multiplication: For every real number \(x\) and scalars \(a\) and \(b\), \(a(bx) = (ab)x\).

    Example:

    \begin{equation*} 3(2 \times 5) = (3 \times 2) \times 5 \end{equation*}
  8. Multiplicative Identity of Scalar Multiplication: There exists a scalar \(1\) such that for every real number \(x\), \(1 \times x = x\).

    Example:

    \begin{equation*} 1 \times 5 = 5 \end{equation*}

Finally, the single vector \(1\) is a basis: every real number \(x\) can be written as \(x = x \cdot 1\), and \(\{1\}\) is linearly independent, so the space is one-dimensional over \(\mathbb{R}\).

\(\blacksquare\)

Proof for Complex Numbers as a One-Dimensional Complex Vector Space

  1. Additive Identity: There exists a complex number \(0\) such that for every complex number \(z\), \(z + 0 = z\).

    Example:

    \begin{equation*} (3 + 2i) + 0 = 3 + 2i \end{equation*}
  2. Additive Inverse: For every complex number \(z\), there exists a complex number \(-z\) such that \(z + (-z) = 0\).

    Example:

    \begin{equation*} (3 + 2i) + (-3 - 2i) = 0 \end{equation*}
  3. Closure under Addition: For every pair of complex numbers \(z_1\) and \(z_2\), their sum \(z_1 + z_2\) is also a complex number.

    Example:

    \begin{equation*} (3 + 2i) + (1 + 4i) = 4 + 6i \end{equation*}
  4. Closure under Scalar Multiplication: For every complex number \(z\) and every scalar \(c\) in the complex numbers, the product \(cz\) is also a complex number.

    Example:

    \begin{equation*} (2 + i) \times (3 + 2i) = 4 + 7i \end{equation*}
  5. Distributivity of Scalar Multiplication with respect to Vector Addition: For every complex number \(z_1\) and \(z_2\) and every scalar \(c\) in the complex numbers, \(c(z_1 + z_2) = cz_1 + cz_2\).

    Example:

    \begin{equation*} (2 + i)((3 + 2i) + (1 + 4i)) = (2 + i)(3 + 2i) + (2 + i)(1 + 4i) \end{equation*}
  6. Distributivity of Scalar Multiplication with respect to Scalar Addition: For every complex number \(z\) and scalars \(c_1\) and \(c_2\) in the complex numbers, \((c_1 + c_2)z = c_1z + c_2z\).

    Example:

    \begin{equation*} ((2 + i) + 1)(3 + 2i) = (2 + i)(3 + 2i) + 1 \cdot (3 + 2i) \end{equation*}
  7. Associativity of Scalar Multiplication: For every complex number \(z\) and scalars \(c_1\) and \(c_2\) in the complex numbers, \(c_1(c_2z) = (c_1c_2)z\).

    Example:

    \begin{equation*} (2 + i)\big(i(3 + 2i)\big) = \big((2 + i)\,i\big)(3 + 2i) \end{equation*}
  8. Multiplicative Identity of Scalar Multiplication: There exists a scalar \(1\) in the complex numbers such that for every complex number \(z\), \(1 \times z = z\).

    Example:

    \begin{equation*} 1 \times (3 + 2i) = 3 + 2i \end{equation*}

Finally, \(\{1\}\) is a basis: every complex number \(z\) can be written as \(z = z \cdot 1\) with the complex scalar \(z\), so the space is one-dimensional over \(\mathbb{C}\).

\(\blacksquare\)
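The complex-number examples above can be checked directly with Python's built-in complex arithmetic (a quick sketch covering only the specific numbers used)::

    z1, z2 = 3 + 2j, 1 + 4j      # sample vectors in C
    c1, c2 = 2 + 1j, 1j          # sample complex scalars

    assert z1 + 0 == z1                          # additive identity
    assert z1 + (-z1) == 0                       # additive inverse
    assert z1 + z2 == 4 + 6j                     # closure under addition
    assert c1 * z1 == 4 + 7j                     # closure under scalar multiplication
    assert c1 * (z1 + z2) == c1 * z1 + c1 * z2   # distributivity over vector addition
    assert (c1 + 1) * z1 == c1 * z1 + 1 * z1     # distributivity over scalar addition
    assert c1 * (c2 * z1) == (c1 * c2) * z1      # associativity of scalar multiplication
    assert 1 * z1 == z1                          # multiplicative identity
    print("all example identities hold")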


Problem 2. Proof for Properties of the Zero Vector

Given that \(\theta\) is the zero vector in a vector space.

  1. Proof for \(0 \cdot \mathbf{x} = \theta\):

    Using the distributive property of scalar multiplication over vector addition, we have:

    \begin{equation*} 0 \cdot \mathbf{x} = (0 + 0) \cdot \mathbf{x} = 0 \cdot \mathbf{x} + 0 \cdot \mathbf{x} \end{equation*}

    Adding the additive inverse \(-(0 \cdot \mathbf{x})\) to both sides:

    \begin{equation*} \theta = \big(0 \cdot \mathbf{x} + 0 \cdot \mathbf{x}\big) + \big(-(0 \cdot \mathbf{x})\big) = 0 \cdot \mathbf{x} + \theta = 0 \cdot \mathbf{x} \end{equation*}

    Thus, \(0 \cdot \mathbf{x} = \theta\).

    \(\blacksquare\)

  2. Proof for \(\alpha \cdot \theta = \theta\):

    For any scalar \(\alpha\):

    \begin{equation*} \alpha \cdot \theta = \alpha \cdot (0 \cdot \mathbf{x}) = (\alpha \cdot 0) \cdot \mathbf{x} = 0 \cdot \mathbf{x} = \theta \end{equation*}

    Therefore, \(\alpha \cdot \theta = \theta\).

    \(\blacksquare\)

Proof for the Property \((-1) \cdot \mathbf{x} = -\mathbf{x}\)

Given a vector \(\mathbf{x}\) in a vector space.

To prove: \((-1) \cdot \mathbf{x} = -\mathbf{x}\)

Proof:

Using the distributive property of scalar multiplication over scalar addition, we have:

\begin{equation*} \mathbf{x} + (-1) \cdot \mathbf{x} = (1 + (-1)) \cdot \mathbf{x} = 0 \cdot \mathbf{x} \end{equation*}

From a previous proof, we know that:

\begin{equation*} 0 \cdot \mathbf{x} = \theta \end{equation*}

Where \(\theta\) is the zero vector.

Therefore:

\begin{equation*} \mathbf{x} + (-1) \cdot \mathbf{x} = \theta \end{equation*}

This implies that \((-1) \cdot \mathbf{x}\) is the additive inverse of \(\mathbf{x}\), which is denoted as \(-\mathbf{x}\).

Hence, \((-1) \cdot \mathbf{x} = -\mathbf{x}\).

\(\blacksquare\)


Problem 3. Span of the Set M in \(\mathbb{R}^3\)

The span of a set of vectors is the set of all linear combinations of those vectors. In other words, it's the set of all vectors that can be obtained by taking weighted sums of the vectors in the set.

Given the set \(M = \{ (1,1,1), (0,0,2) \}\) in \(\mathbb{R}^3\), any vector in the span of \(M\) can be written as:

\begin{equation*} \alpha (1,1,1) + \beta (0,0,2) = (\alpha, \alpha, \alpha + 2\beta) \end{equation*}

From the above expression, we can see that:

  1. The first and second components of any vector in the span are always equal.

  2. The third component can be any real number since \(\alpha\) and \(\beta\) can be any real numbers.

Thus, the span of \(M\) in \(\mathbb{R}^3\) is the set of all vectors of the form \((a, a, b)\) where \(a\) and \(b\) are real numbers. This is a plane in \(\mathbb{R}^3\) that passes through the origin and is defined by the equation \(x = y\).

Visualization

[Figure: 3D visualization of the plane \(x = y\) in \(\mathbb{R}^3\)]

This plane passes through the origin and spans infinitely in all directions within the plane. The vectors \((1,1,1)\) and \((0,0,2)\) from the set \(M\) lie on this plane, and their linear combinations fill out the entire plane.

\(\blacksquare\)
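A short numerical sketch of this description (illustrative only): a vector of the form \((a, a, b)\) always lies in the span, whereas a vector whose first two components differ does not::

    import numpy as np

    v1 = np.array([1.0, 1.0, 1.0])
    v2 = np.array([0.0, 0.0, 2.0])

    def in_span(target):
        # least-squares solve of alpha*v1 + beta*v2 = target, then check the residual
        A = np.column_stack([v1, v2])
        coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
        return np.allclose(A @ coeffs, target), coeffs

    print(in_span(np.array([2.0, 2.0, 5.0])))   # in the span: alpha = 2, beta = 1.5
    print(in_span(np.array([1.0, 2.0, 3.0])))   # not in the span: first two components differ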


Problem 4. Determination of Subspaces in \(\mathbb{R}^3\)

To determine whether a subset of \(\mathbb{R}^3\) constitutes a subspace, it must satisfy the following three properties:

  1. The zero vector of \(\mathbb{R}^3\) is in the subset.

  2. The subset is closed under vector addition.

  3. The subset is closed under scalar multiplication.

Given the subsets:

  (a) All \(x\) with \(\xi_1 = \xi_2\) and \(\xi_2 = 0\)

Evaluation:

This means \(x\) is of the form \((0, 0, \xi_3)\).

  1. The zero vector \((0, 0, 0)\) is in this subset.

  2. Sum of any two vectors in this subset will also be in this subset.

  3. Scalar multiplication of any vector in this subset will also be in this subset.

Thus, (a) is a subspace of \(\mathbb{R}^3\).

  (b) All \(x\) with \(\xi_1 = \xi_2 + 1\)

Evaluation:

  1. This subset doesn't contain the zero vector.

  2. It's not closed under scalar multiplication since multiplying by a negative scalar will result in a vector outside this subset.

Thus, (b) is not a subspace of \(\mathbb{R}^3\).

  (c) All \(x\) with positive \(\xi_1, \xi_2, \xi_3\)

Evaluation:

  1. This subset doesn't contain the zero vector.

  2. It's not closed under scalar multiplication since multiplying by a negative scalar will result in a vector outside this subset.

Thus, (c) is not a subspace of \(\mathbb{R}^3\).

  (d) All \(x\) with \(\xi_1 - \xi_2 + \xi_3 = \text{const}\)

Evaluation:

If the constant is zero, this subset is the plane through the origin defined by \(\xi_1 - \xi_2 + \xi_3 = 0\), which contains the zero vector and is closed under addition and scalar multiplication, hence a subspace. If the constant is any other value, the subset does not contain the zero vector, so it is not a subspace.

Conclusion:

    (a) is a subspace of \(\mathbb{R}^3\).

    (b) is not a subspace of \(\mathbb{R}^3\).

    (c) is not a subspace of \(\mathbb{R}^3\).

    (d) is a subspace precisely when the constant is zero; otherwise, it is not.

\(\blacksquare\)
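The subspace criteria can also be probed numerically (a rough sketch only; finite spot checks can refute closure but never prove it). Subset (b) already fails the zero-vector test, while subset (a) passes the checks below::

    import numpy as np

    def in_a(x):   # xi1 = xi2 and xi2 = 0, i.e. x = (0, 0, xi3)
        return x[0] == 0 and x[1] == 0

    def in_b(x):   # xi1 = xi2 + 1
        return x[0] == x[1] + 1

    zero = np.zeros(3)
    print("zero vector in (a):", in_a(zero))    # True
    print("zero vector in (b):", in_b(zero))    # False, so (b) is not a subspace

    u, v = np.array([0.0, 0.0, 2.0]), np.array([0.0, 0.0, -5.0])
    print("(a) closed under this addition:", in_a(u + v))    # True
    print("(a) closed under this scaling:", in_a(3.5 * u))   # True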


Problem 5. The space \(C[a,b]\) consists of all continuous functions defined on the closed interval \([a, b]\). That is, \(C[a,b]\) is the set of functions \(f: [a, b] \to \mathbb{R}\) such that \(f\) is continuous on \([a, b]\).

Proof of Linear Independence

To show that the set \(\{ x_1, ..., x_n \}\), where \(x_j(t) = t^j\) for \(j = 1, ..., n\), is linearly independent in \(C[a,b]\), we need to show that the only scalars \(c_1, ..., c_n\) that satisfy the equation

\begin{equation*} c_1 x_1(t) + c_2 x_2(t) + ... + c_n x_n(t) = 0 \end{equation*}

for all \(t\) in \([a, b]\) are \(c_1 = c_2 = ... = c_n = 0\).

Given the functions \(x_j(t) = t^j\), the above equation becomes:

\begin{equation*} c_1 t + c_2 t^2 + ... + c_n t^n = 0 \end{equation*}

This is a polynomial of degree at most \(n\). If it is identically zero on the interval \([a, b]\), then all its coefficients must be zero: a non-zero polynomial of degree at most \(n\) has at most \(n\) roots, whereas a polynomial that vanishes at every point of an interval has infinitely many roots, so it must be the zero polynomial.

Therefore, \(c_1 = c_2 = ... = c_n = 0\), which proves that the set \(\{ x_1, ..., x_n \}\) is linearly independent in \(C[a,b]\).

\(\blacksquare\)

Clarification on Polynomials and Their Roots

A polynomial of degree \(n\) is an expression of the form:

\begin{equation*} p(t) = c_0 + c_1 t + c_2 t^2 + \dots + c_n t^n \end{equation*}

where \(c_0, c_1, \dots, c_n\) are coefficients and \(n\) is a non-negative integer.

A non-zero polynomial of degree \(n\) has at most \(n\) roots; indeed, by the Fundamental Theorem of Algebra it has exactly \(n\) complex roots, counting multiplicities. In particular, such a polynomial can vanish at no more than \(n\) points.

However, if we have a polynomial that is zero for every value of \(t\) in a continuous interval (like \([a, b]\)), then it's not just zero at isolated points—it's zero everywhere in that interval. This behavior is inconsistent with a polynomial that has non-zero coefficients because such a polynomial would not be zero at more than \(n\) points.

Therefore, the only way a polynomial can be zero for all \(t\) in a continuous interval is if it's the zero polynomial, which means all its coefficients \(c_0, c_1, \dots, c_n\) are zero.

In simpler terms: If you have a polynomial that's zero everywhere in an interval, then it's actually the zero polynomial, and all its coefficients are zero.

The crux of the proof is:

If a polynomial is zero for every value of t within a continuous interval (like [a,b]), then it cannot merely be the result of the polynomial having roots within that interval. Instead, the polynomial must be the zero polynomial, meaning all its coefficients are zero.

In essence, a non-zero polynomial can only be zero at a finite number of points determined by its degree. If it's zero everywhere in a continuous interval, it contradicts this property, so it must be the zero polynomial.
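Numerically, the linear independence of \(t, t^2, \dots, t^n\) shows up as a full-rank matrix when the monomials are sampled at sufficiently many distinct points of the interval (a sketch with arbitrary choices of \(n\), interval, and sample points)::

    import numpy as np

    n = 5
    t = np.linspace(0.2, 1.0, 12)                          # distinct points in [a, b]
    A = np.column_stack([t**j for j in range(1, n + 1)])   # columns are t, t^2, ..., t^n

    # rank n means the only coefficients with c1*t + ... + cn*t^n = 0 at every
    # sample point are c1 = ... = cn = 0
    print(np.linalg.matrix_rank(A), "==", n)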


Problem 6. Show that in an \(n\)-dimensional vector space \(X\), the representation of any vector \(x\) as a linear combination of a given basis vectors \(e_1, \dots, e_n\) is unique.

Proof:

Assume, for the sake of contradiction, that there are two different representations of the vector \(x\) in terms of the basis vectors.

Let these representations be:

\begin{equation*} x = a_1 e_1 + a_2 e_2 + \dots + a_n e_n \end{equation*}
\begin{equation*} x = b_1 e_1 + b_2 e_2 + \dots + b_n e_n \end{equation*}

where \(a_i\) and \(b_i\) are scalars, and at least one \(a_i\) is not equal to \(b_i\).

Subtracting the second equation from the first, we get:

\begin{equation*} 0 = (a_1 - b_1) e_1 + (a_2 - b_2) e_2 + \dots + (a_n - b_n) e_n \end{equation*}

Now, since \(\{ e_1, e_2, \dots, e_n \}\) is a basis for \(X\), these vectors are linearly independent. This means that the only way the above equation can hold is if each coefficient \((a_i - b_i)\) is zero.

Thus, \(a_i - b_i = 0\) for all \(i\), which implies \(a_i = b_i\) for all \(i\).

This contradicts our assumption that the two representations were different. Therefore, our original assumption was false, and the representation of any vector \(x\) as a linear combination of the basis vectors is unique.

\(\blacksquare\)

Truth Table for the Proof's Logic

https://www.wolframcloud.com/obj/845f9add-0989-4031-9cc4-aaec65b61ba3

The logical structure of the proof can be summarized as:

  1. Assumption: There are two different representations of \(x\).

  2. Implication: Subtracting the two representations yields a linear combination of the basis vectors, with coefficients \(a_i - b_i\), that equals the zero vector.

  3. Contradiction: The linear independence of the basis forces every coefficient \(a_i - b_i\) to be zero, so the two representations coincide.

  4. Conclusion: The assumption is false; the representation is unique.


Problem 7: Basis and Dimension of Complex Vector Space \(X\)

Problem Statement:

Let \(\{e_1,...,e_n\}\) be a basis for a complex vector space \(X\). Find the basis for \(X\) regarded as a real vector space. What is the dimension of \(X\) in either case?

Solution:

  1. Basis for \(X\) as a Complex Vector Space:

Given that \(\{e_1, \dots, e_n\}\) is a basis for the complex vector space \(X\), any vector \(v\) in \(X\) can be expressed as:

\begin{equation*} v = a_1 e_1 + a_2 e_2 + \dots + a_n e_n \end{equation*}

where \(a_i\) are complex numbers.

  2. Basis for \(X\) as a Real Vector Space:

When we regard \(X\) as a real vector space, only real scalars are allowed, and a basis for \(X\) is

\begin{equation*} \{e_1, i e_1, e_2, i e_2, \dots, e_n, i e_n\} \end{equation*}

Indeed, writing each complex coefficient as \(a_j = \alpha_j + i\beta_j\) with \(\alpha_j, \beta_j \in \mathbb{R}\) gives \(v = \sum_j \alpha_j e_j + \sum_j \beta_j (i e_j)\), so this set spans \(X\) over \(\mathbb{R}\); and a vanishing real combination \(\sum_j \big(\alpha_j e_j + \beta_j (i e_j)\big) = \sum_j (\alpha_j + i\beta_j) e_j = 0\) forces \(\alpha_j + i\beta_j = 0\), hence \(\alpha_j = \beta_j = 0\), so it is linearly independent.
  3. Dimension of \(X\) in Either Case:

  • As a complex vector space, the dimension of \(X\) is \(n\).

  • As a real vector space, the dimension of \(X\) is \(2n\).

\(\blacksquare\)


Problem 8. If \(M\) is a linearly dependent set in a complex vector space \(X\), is \(M\) linearly dependent in \(X\), regarded as a real vector space?

Solution

If \(M\) is linearly dependent in the complex vector space \(X\), then there exist vectors \(v_1, v_2, \dots, v_n \in M\) and complex scalars \(c_1, c_2, \dots, c_n\), not all zero, such that:

\begin{equation*} c_1 v_1 + c_2 v_2 + \dots + c_n v_n = 0 \end{equation*}

When \(X\) is regarded as a real vector space, however, only real scalars are admissible in a linear combination, so this relation cannot simply be reused. Writing each \(c_i = a_i + b_i i\) with real \(a_i, b_i\) turns the relation into

\begin{equation*} a_1 v_1 + a_2 v_2 + \dots + a_n v_n + i(b_1 v_1 + b_2 v_2 + \dots + b_n v_n) = 0 \end{equation*}

but there is no reason why the two groups of terms should vanish separately: \(i(b_1 v_1 + \dots + b_n v_n)\) is just another vector of \(X\), not an "imaginary part" that can be split off.

In fact, the answer is no in general. Take \(X = \mathbb{C}\), regarded as a one-dimensional complex vector space, and \(M = \{1, i\}\). Over the complex scalars, \(M\) is linearly dependent, since

\begin{equation*} i \cdot 1 + (-1) \cdot i = 0 \end{equation*}

is a nontrivial relation. Over the real scalars, however, \(a \cdot 1 + b \cdot i = a + bi = 0\) with \(a, b \in \mathbb{R}\) forces \(a = b = 0\), so \(M\) is linearly independent in \(X\) regarded as a real vector space.

Conclusion

No. Linear dependence over the complex scalars does not imply linear dependence over the real scalars; the set \(\{1, i\}\) in \(\mathbb{C}\) is a counterexample. (The converse does hold: a set that is linearly dependent over \(\mathbb{R}\) is also linearly dependent over \(\mathbb{C}\), because every real scalar is a complex scalar.)

\(\blacksquare\)
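The counterexample can be made concrete (a minimal sketch): over \(\mathbb{C}\) the relation \(i \cdot 1 + (-1) \cdot i = 0\) is a nontrivial dependence, while over \(\mathbb{R}\) the equation \(a \cdot 1 + b \cdot i = 0\) separates into the real system \(a = 0\), \(b = 0\)::

    import numpy as np

    # Complex scalars: a nontrivial relation between the vectors 1 and i
    c1, c2 = 1j, -1.0
    assert c1 * 1 + c2 * 1j == 0                 # M = {1, i} is dependent over C

    # Real scalars: a*1 + b*i = a + b*i = 0 is the 2x2 real system below,
    # whose only solution is a = b = 0
    A = np.array([[1.0, 0.0],                    # real part of a*1 + b*i
                  [0.0, 1.0]])                   # imaginary part of a*1 + b*i
    print("rank", np.linalg.matrix_rank(A), "=> only the trivial real relation")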


Problem 9 Statement

On a fixed interval \([a, b] \subset \mathbb{R}\), consider the set \(X\) consisting of all polynomials with real coefficients of degree not exceeding a given \(n\), together with the zero polynomial \(x = 0\) (whose degree is conventionally left undefined). Show that \(X\), with the usual addition and the usual multiplication by real numbers, is a real vector space of dimension \(n + 1\).

Solution

Vector Space Axioms Verification:

  1. Closure under Addition: For any two polynomials \(p(t), q(t) \in X\), their sum \(p(t) + q(t)\) is also a polynomial with real coefficients, and its degree is not exceeding \(n\). Therefore, \(p(t) + q(t) \in X\).

  2. Closure under Scalar Multiplication: For any polynomial \(p(t) \in X\) and any real number \(c\), the product \(c \cdot p(t)\) is also a polynomial with real coefficients, and its degree is not exceeding \(n\). Therefore, \(c \cdot p(t) \in X\).

  3. Associativity of Addition: For any \(p(t), q(t), r(t) \in X\), \((p(t) + q(t)) + r(t) = p(t) + (q(t) + r(t))\).

  4. Commutativity of Addition: For any \(p(t), q(t) \in X\), \(p(t) + q(t) = q(t) + p(t)\).

  5. Identity Element of Addition: The zero polynomial \(0\) acts as the additive identity in \(X\), since for any \(p(t) \in X\), \(p(t) + 0 = p(t)\).

  6. Inverse Elements of Addition: For every \(p(t) \in X\), its additive inverse is \(-p(t)\), which is also in \(X\). Thus, \(p(t) + (-p(t)) = 0\).

  7. Compatibility of Scalar Multiplication with Field Multiplication: For any real numbers \(a, b\) and any \(p(t) \in X\), \(a \cdot (b \cdot p(t)) = (a \cdot b) \cdot p(t)\).

  8. Identity Element of Scalar Multiplication: For any \(p(t) \in X\), \(1 \cdot p(t) = p(t)\), where 1 is the multiplicative identity in \(\mathbb{R}\).

  9. Distributivity of Scalar Multiplication with respect to Vector Addition: For any real number \(a\) and any \(p(t), q(t) \in X\), \(a \cdot (p(t) + q(t)) = a \cdot p(t) + a \cdot q(t)\).

  10. Distributivity of Scalar Multiplication with respect to Scalar Addition: For any real numbers \(a, b\) and any \(p(t) \in X\), \((a + b) \cdot p(t) = a \cdot p(t) + b \cdot p(t)\).

Basis and Dimension:

A basis for \(X\) can be the set of monomials \(\{e_0, e_1, \ldots, e_n\}\), where \(e_j(t) = t^j\) for \(t \in [a, b]\) and \(0 \leq j \leq n\). This set is linearly independent and spans \(X\), as any polynomial of degree not exceeding \(n\) can be written as a linear combination of these monomials.

The dimension of \(X\) is the number of vectors in its basis, which is \(n+1\).

Conclusion:

\(X\) is a real vector space of dimension \(n+1\) on the interval \([a, b]\), with a basis \(\{e_0, e_1, \ldots, e_n\}\).

Problem 9 (Real Polynomial Example) Statement

On a fixed interval \([a, b] \subset \mathbb{R}\), consider two specific real polynomials \(p(t) = 2t^2 + 3t + 1\) and \(q(t) = t^2 - 2t + 4\). Show that the set \(X\) of all real polynomials of degree not exceeding a given \(n\) with the usual addition and scalar multiplication is a real vector space.

Solution

Vector Space Axioms Verification:

  1. Closure under Addition:

    \begin{equation*} p(t) + q(t) = (2t^2 + 3t + 1) + (t^2 - 2t + 4) = 3t^2 + t + 5 \end{equation*}

    The result is a polynomial of degree 2, which does not exceed \(n\) provided \(n \geq 2\).

  2. Closure under Scalar Multiplication: Let's take a real number \(c = 3\).

    \begin{equation*} c \cdot p(t) = 3 \cdot (2t^2 + 3t + 1) = 6t^2 + 9t + 3 \end{equation*}

    The result is a polynomial of degree 2, which does not exceed \(n\) provided \(n \geq 2\).

  3. Associativity of Addition: Let's take another polynomial \(r(t) = 4t - 1\).

    \begin{equation*} (p(t) + q(t)) + r(t) = (3t^2 + t + 5) + (4t - 1) = 3t^2 + 5t + 4 \end{equation*}
    \begin{equation*} p(t) + (q(t) + r(t)) = (2t^2 + 3t + 1) + (t^2 + 2t + 3) = 3t^2 + 5t + 4 \end{equation*}

    Both results are equal.

  4. Commutativity of Addition:

    \begin{equation*} p(t) + q(t) = 3t^2 + t + 5 \end{equation*}
    \begin{equation*} q(t) + p(t) = 3t^2 + t + 5 \end{equation*}

    Both results are equal.

  5. Identity Element of Addition: The zero polynomial is \(0\).

    \begin{equation*} p(t) + 0 = 2t^2 + 3t + 1 \end{equation*}
    \begin{equation*} 0 + p(t) = 2t^2 + 3t + 1 \end{equation*}

    Both results are equal to \(p(t)\).

  6. Inverse Elements of Addition: The additive inverse of \(p(t)\) is \(-p(t) = -2t^2 - 3t - 1\).

    \begin{equation*} p(t) + (-p(t)) = 2t^2 + 3t + 1 + (-2t^2 - 3t - 1) = 0 \end{equation*}
  7. Compatibility of Scalar Multiplication with Field Multiplication: For real numbers \(a = 2\) and \(b = 3\),

    \begin{equation*} a \cdot (b \cdot p(t)) = 2 \cdot (3 \cdot (2t^2 + 3t + 1)) = 12t^2 + 18t + 6 \end{equation*}
    \begin{equation*} (a \cdot b) \cdot p(t) = (2 \cdot 3) \cdot (2t^2 + 3t + 1) = 12t^2 + 18t + 6 \end{equation*}

    Both results are equal.

  8. Identity Element of Scalar Multiplication:

    \begin{equation*} 1 \cdot p(t) = 1 \cdot (2t^2 + 3t + 1) = 2t^2 + 3t + 1 \end{equation*}

    The result is equal to \(p(t)\).

  9. Distributivity of Scalar Multiplication with respect to Vector Addition: For \(a = 3\),

    \begin{equation*} a \cdot (p(t) + q(t)) = 3 \cdot (3t^2 + t + 5) = 9t^2 + 3t + 15 \end{equation*}
    \begin{equation*} a \cdot p(t) + a \cdot q(t) = 3 \cdot (2t^2 + 3t + 1) + 3 \cdot (t^2 - 2t + 4) = 9t^2 + 3t + 15 \end{equation*}

    Both results are equal.

  10. Distributivity of Scalar Multiplication with respect to Scalar Addition: For \(a = 3\) and \(b = 2\),

    \begin{equation*} (a + b) \cdot p(t) = (3 + 2) \cdot (2t^2 + 3t + 1) = 10t^2 + 15t + 5 \end{equation*}
    \begin{equation*} a \cdot p(t) + b \cdot p(t) = 3 \cdot (2t^2 + 3t + 1) + 2 \cdot (2t^2 + 3t + 1) = 10t^2 + 15t + 5 \end{equation*}

    Both results are equal.

Basis and Dimension:

A basis for \(X\) can be the set of monomials \(\{1, t, t^2, \ldots, t^n\}\). In our specific example, a basis for polynomials of degree not exceeding 2 is \(\{1, t, t^2\}\).

The dimension of \(X\) is the number of vectors in its basis, which is \(n+1\). In our specific example, the dimension is 3.

Conclusion:

The set \(X\) of all real polynomials of degree not exceeding \(n\) on the interval \([a, b]\) is a real vector space of dimension \(n+1\), with the usual addition and scalar multiplication.
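The polynomial arithmetic in this worked example can be verified with NumPy's polynomial class (a sketch; coefficients are listed from the constant term upward)::

    from numpy.polynomial import Polynomial as P

    p = P([1, 3, 2])     # 2t^2 + 3t + 1
    q = P([4, -2, 1])    # t^2 - 2t + 4
    r = P([-1, 4])       # 4t - 1

    assert p + q == P([5, 1, 3])            # 3t^2 + t + 5
    assert (p + q) + r == p + (q + r)       # associativity of addition
    assert 3 * (p + q) == 3 * p + 3 * q     # distributivity over vector addition
    assert (3 + 2) * p == 3 * p + 2 * p     # distributivity over scalar addition
    assert 2 * (3 * p) == (2 * 3) * p       # compatibility of scalar multiplication
    print("all sums and scalar multiples stay of degree <=", (p + q).degree())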

Problem Statement

Show that we can obtain a complex vector space \(\tilde{X}\) in a similar fashion if we let the coefficients be complex. Also, determine whether \(X\) is a subspace of \(\tilde{X}\).

Solution

Part 1: Constructing \(\tilde{X}\)

  1. Set Definition: Let \(\tilde{X}\) be the set of all polynomials with complex coefficients of degree not exceeding \(n\), along with the zero polynomial. A general element of \(\tilde{X}\) can be represented as:

    \begin{equation*} p(t) = c_0 + c_1t + c_2t^2 + \ldots + c_nt^n \end{equation*}

    where \(c_0, c_1, \ldots, c_n\) are complex numbers.

  2. Vector Space Operations:

    • Addition: For any two polynomials \(p(t), q(t) \in \tilde{X}\), their sum \(p(t) + q(t)\) is also a polynomial in \(\tilde{X}\) with complex coefficients.

    • Scalar Multiplication: For any complex number \(\alpha\) and any polynomial \(p(t) \in \tilde{X}\), the product \(\alpha p(t)\) is also a polynomial in \(\tilde{X}\).

  3. Verification of Vector Space Axioms: Similar to the real case, one can verify that \(\tilde{X}\) satisfies all the vector space axioms under these operations.

Part 2: Is \(X\) a Subspace of \(\tilde{X}\)?

  1. Subspace Criteria: A subset \(Y\) of a vector space \(Z\) is a subspace of \(Z\) if:

    • The zero vector of \(Z\) is in \(Y\).

    • For every \(u, v \in Y\), the sum \(u + v\) is in \(Y\).

    • For every \(u \in Y\) and scalar \(c\), the product \(cu\) is in \(Y\).

  2. Application to \(X\) and \(\tilde{X}\):

    • The zero polynomial is in both \(X\) and \(\tilde{X}\).

    • The sum of any two polynomials in \(X\) (with real coefficients) is a polynomial with real coefficients, which is in \(X\).

    • The product of any polynomial in \(X\) by any real number is a polynomial with real coefficients, which is in \(X\).

  3. Failure under Complex Scalar Multiplication:

    However, if we consider scalar multiplication by complex numbers (as is allowed in \(\tilde{X}\)), \(X\) is not closed under this operation. For example, if \(p(t) \in X\) is a nonzero polynomial and \(i\) is the imaginary unit, then \(i \cdot p(t)\) has purely imaginary, hence non-real, coefficients, so it is not in \(X\) although it is in \(\tilde{X}\).

  4. Conclusion: While \(X\) satisfies the criteria for being a subspace under real scalar multiplication, it does not satisfy the criteria under complex scalar multiplication. Therefore, \(X\) is not a subspace of \(\tilde{X}\) when \(\tilde{X}\) is considered as a complex vector space.


Problem 10.

If \(Y\) and \(Z\) are subspaces of a vector space \(X\), show that \(Y \cap Z\) is a subspace of \(X\), but \(Y \cup Z\) need not be one. Provide three examples to illustrate the concepts.

Solution

Part 1: \(Y \cap Z\) is a Subspace

To show that \(Y \cap Z\) is a subspace of \(X\), we need to verify the subspace criteria:

  1. Non-emptiness: Since \(Y\) and \(Z\) are subspaces, they both contain the zero vector. Therefore, \(Y \cap Z\) is non-empty as it at least contains the zero vector.

  2. Closed under addition: Let \(u\) and \(v\) be any vectors in \(Y \cap Z\). Since \(u\) and \(v\) are in both \(Y\) and \(Z\), and since \(Y\) and \(Z\) are subspaces (and thus closed under addition), \(u + v\) must be in both \(Y\) and \(Z\). Therefore, \(u + v\) is in \(Y \cap Z\).

  3. Closed under scalar multiplication: Let \(u\) be any vector in \(Y \cap Z\) and let \(c\) be any scalar. Since \(u\) is in both \(Y\) and \(Z\), and since \(Y\) and \(Z\) are subspaces (and thus closed under scalar multiplication), \(c \cdot u\) must be in both \(Y\) and \(Z\). Therefore, \(c \cdot u\) is in \(Y \cap Z\).

Part 2: \(Y \cup Z\) Need Not Be a Subspace

To show that \(Y \cup Z\) need not be a subspace, consider the following examples:

  1. Example 1: Let \(X = \mathbb{R}^2\), \(Y = \{(x, 0) \mid x \in \mathbb{R}\}\) (the x-axis), and \(Z = \{(0, y) \mid y \in \mathbb{R}\}\) (the y-axis). \(Y\) and \(Z\) are both subspaces of \(X\), but \(Y \cup Z\) is not because it is not closed under addition. For example, \((1, 0) \in Y\) and \((0, 1) \in Z\), but \((1, 0) + (0, 1) = (1, 1) \notin Y \cup Z\).

  2. Example 2: Let \(X = \mathbb{R}^3\), \(Y = \{(x, 0, 0) \mid x \in \mathbb{R}\}\), and \(Z = \{(0, y, 0) \mid y \in \mathbb{R}\}\). \(Y\) and \(Z\) are both subspaces of \(X\), but \(Y \cup Z\) is not a subspace because it is not closed under addition. For example, \((1, 0, 0) \in Y\) and \((0, 1, 0) \in Z\), but \((1, 0, 0) + (0, 1, 0) = (1, 1, 0) \notin Y \cup Z\). (Note that \(Y \cup Z\) is closed under scalar multiplication, since each of \(Y\) and \(Z\) is; it is closure under addition that fails.)

  3. Example 3: Let \(X = \mathbb{R}^2\), \(Y = \{(x, 0) \mid x \in \mathbb{R}\}\) (the x-axis), and \(Z = \{(x, x) \mid x \in \mathbb{R}\}\) (the line \(y = x\)). Both are subspaces of \(X\), but \(Y \cup Z\) is not a subspace: \((1, 0) \in Y\) and \((1, 1) \in Z\), yet \((1, 0) + (1, 1) = (2, 1)\) lies on neither line, so \(Y \cup Z\) is not closed under addition.

Conclusion

While the intersection of two subspaces is always a subspace, the union of two subspaces is generally not a subspace unless one of the subspaces is contained within the other. The provided examples illustrate scenarios where the union of two subspaces fails to satisfy the subspace criteria.


Problem 11

If \(M \neq \emptyset\) is any subset of a vector space \(X\), show that \(\text{span}(M)\) is a subspace of \(X\).

Solution

To prove that \(\text{span}(M)\) is a subspace of \(X\), we need to verify the following properties:

  1. Non-emptiness: Since \(M \neq \emptyset\), there is at least one vector in \(M\). The zero vector of \(X\) can be represented as a linear combination of vectors in \(M\) with all coefficients being zero. Hence, the zero vector is in \(\text{span}(M)\), ensuring that \(\text{span}(M)\) is non-empty.

    \begin{equation*} \text{Let } \mathbf{v} \in M, \text{ then } 0 \cdot \mathbf{v} = \mathbf{0} \in \text{span}(M) \end{equation*}
  2. Closure under Addition: Let \(\mathbf{u}\) and \(\mathbf{v}\) be any two vectors in \(\text{span}(M)\). This means that \(\mathbf{u}\) and \(\mathbf{v}\) can be expressed as linear combinations of vectors from \(M\). The sum \(\mathbf{u} + \mathbf{v}\) is also a linear combination of vectors from \(M\) and is therefore in \(\text{span}(M)\).

    \begin{equation*} \text{Let } \mathbf{u} = \sum_{i=1}^{n} a_i \mathbf{m}_i \text{ and } \mathbf{v} = \sum_{i=1}^{n} b_i \mathbf{m}_i, \text{ where } \mathbf{m}_i \in M. \end{equation*}
    \begin{equation*} \text{Then } \mathbf{u} + \mathbf{v} = \sum_{i=1}^{n} (a_i + b_i) \mathbf{m}_i \in \text{span}(M). \end{equation*}
  3. Closure under Scalar Multiplication: Let \(\mathbf{u}\) be a vector in \(\text{span}(M)\) and \(c\) be any scalar. The product \(c\mathbf{u}\) is also a linear combination of vectors from \(M\) and is therefore in \(\text{span}(M)\).

    \begin{equation*} \text{Let } \mathbf{u} = \sum_{i=1}^{n} a_i \mathbf{m}_i, \text{ where } \mathbf{m}_i \in M. \end{equation*}
    \begin{equation*} \text{Then } c\mathbf{u} = c\sum_{i=1}^{n} a_i \mathbf{m}_i = \sum_{i=1}^{n} (ca_i) \mathbf{m}_i \in \text{span}(M). \end{equation*}

By verifying these properties, we have shown that \(\text{span}(M)\) is a subspace of \(X\).

Kreyszig 1.4, Metric Spaces - Convergence, Cauchy Sequence, Completeness

Problem 1. Convergence of Subsequences in a Metric Space

Given: A sequence \((x_n)\) in a metric space \(X\) is convergent and has limit \(x\).

To Prove: Every subsequence \((x_{n_k})\) of \((x_n)\) is convergent and has the same limit \(x\).

Proof:

Let \((x_{n_k})\) be an arbitrary subsequence of \((x_n)\).

  1. Given that \((x_n)\) is convergent with limit \(x\): This means that for every \(\epsilon > 0\), there exists an \(N\) such that for all \(n \geq N\), the distance between \(x_n\) and \(x\) is less than \(\epsilon\). Mathematically, this is: \(d(x_n, x) < \epsilon \quad \text{for all} \quad n \geq N\)

  2. Consider the subsequence \((x_{n_k})\): Since the indices \(n_k\) are strictly increasing natural numbers, \(n_k \geq k\) for every \(k\). Hence, taking \(K = N\), we have \(n_k \geq N\) for every \(k \geq K\).

  3. Using the convergence of \((x_n)\): For the same \(\epsilon > 0\) as before, for all \(k \geq K\), we have: \(d(x_{n_k}, x) < \epsilon\) This is because \(n_k \geq N\) for all \(k \geq K\), and we know from the convergence of \((x_n)\) that the distance between any term beyond \(N\) and the limit \(x\) is less than \(\epsilon\).

  4. Conclusion: The above expression shows that the subsequence \((x_{n_k})\) also converges to the same limit \(x\).

Hence, every subsequence \((x_{n_k})\) of \((x_n)\) is convergent and has the same limit \(x\).

This completes the proof.

Crux of the Proof for Convergence of Subsequences

  1. Definition of Convergence: A sequence \((x_n)\) in a metric space converges to a limit \(x\) if, for every \(\epsilon > 0\), there exists an \(N\) such that for all \(n \geq N\), the distance between \(x_n\) and \(x\) is less than \(\epsilon\).

  2. Nature of Subsequences: A subsequence \((x_{n_k})\) retains the order of the original sequence \((x_n)\). This means that if \(n_k\) is the index of a term in the subsequence, then for every \(k' > k\), \(n_{k'} > n_k\).

Given these two points:

  1. If \((x_n)\) converges to \(x\), then beyond a certain index \(N\), all terms of \((x_n)\) are close to \(x\).

  2. Since a subsequence \((x_{n_k})\) retains the order of \((x_n)\), beyond some index \(K\), all terms of \((x_{n_k})\) will also be terms of \((x_n)\) that are close to \(x\).

Thus, the subsequence \((x_{n_k})\) will also be close to \(x\) beyond this index \(K\), meaning it converges to the same limit \(x\).

In essence, the convergence of the original sequence ensures that its terms get arbitrarily close to the limit, and the nature of subsequences ensures that their terms, being a subset of the original sequence's terms, will also get arbitrarily close to the same limit.

A Practical Example

Consider the sequence \((x_n)\) defined by \(x_n = \frac{1}{n}\) for all natural numbers \(n\). This sequence represents the reciprocals of natural numbers and is defined in the metric space of real numbers with the usual metric (absolute value of the difference).

Observation: As \(n\) grows larger, \(x_n\) gets closer and closer to 0. Hence, the sequence \((x_n)\) converges to 0 in the real numbers.

Subsequence: Let's consider a subsequence \((x_{n_k})\) where \(n_k = 2^k\). This subsequence consists of the terms of \((x_n)\) at the positions which are powers of 2. So, the subsequence is: \(x_{n_1} = \frac{1}{2}\), \(x_{n_2} = \frac{1}{4}\), \(x_{n_3} = \frac{1}{8}\), and so on.

Observation for the Subsequence: Just like the original sequence, as \(k\) grows larger, \(x_{n_k}\) gets closer and closer to 0. Hence, the subsequence \((x_{n_k})\) also converges to 0 in the real numbers.

Conclusion: Both the sequence \((x_n)\) and its subsequence \((x_{n_k})\) converge to the same limit, 0, in the metric space of real numbers. This is consistent with our earlier proof that if a sequence in a metric space converges to a limit, then every subsequence of it also converges to the same limit.
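
The following short Python snippet (a minimal numeric illustration, not part of Kreyszig's text) prints a few terms of \((x_n)\) and of the subsequence \((x_{2^k})\); both visibly approach 0.

import numpy as np

n = np.arange(1, 11)              # indices 1..10 of the original sequence
x_n = 1.0 / n                     # x_n = 1/n
k = np.arange(1, 6)               # subsequence indices k = 1..5
x_sub = 1.0 / (2.0 ** k)          # x_{n_k} = 1/2^k, i.e. the terms at positions n_k = 2^k

print("x_n     :", np.round(x_n, 4))
print("x_{2^k} :", np.round(x_sub, 4))
# Both rows decrease toward 0, illustrating that the subsequence inherits the limit.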


Problem 2. Convergence of Cauchy Sequences with Convergent Subsequences

Given: - A sequence \((x_n)\) is Cauchy. - There exists a convergent subsequence \((x_{n_k})\) such that \(x_{n_k} \rightarrow x\).

To Prove: - \((x_n)\) is convergent and its limit is \(x\).

Proof:

  1. Cauchy Sequence: By definition, for every \(\epsilon > 0\), there exists an \(N_1\) such that for all \(m, n \geq N_1\), we have \(d(x_m, x_n) < \frac{\epsilon}{2}\).

  2. Convergent Subsequence: Since \((x_{n_k}) \rightarrow x\), for the same \(\epsilon > 0\), there exists a \(K\) such that for all \(k \geq K\), we have \(d(x_{n_k}, x) < \frac{\epsilon}{2}\).

  3. Combining the Two: Since the indices \(n_k\) increase without bound, we can pick a \(k \geq K\) with \(n_k \geq N_1\). Then, for all \(n \geq N_1\), both \(n\) and \(n_k\) are at least \(N_1\), so \(d(x_n, x) \leq d(x_n, x_{n_k}) + d(x_{n_k}, x)\) \(< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon\).

Therefore, \((x_n)\) converges to \(x\).

If a sequence is "tightening up" (Cauchy) and a part of it (subsequence) is getting close to a specific value (converging to x), then the entire sequence must also be getting close to that value (converging to x).

Practical Example:

Consider the sequence \((x_n)\) defined as follows: \(x_n = (-1)^n + \frac{1}{n}\).

  1. Cauchy Sequence: \((x_n)\) is not a Cauchy sequence as it oscillates between positive and negative values without settling down as \(n\) increases.

  2. Convergent Subsequence: However, we can find a convergent subsequence. Consider \(x_{n_k}\) where \(n_k = 2k\). This subsequence is: \(x_{n_1} = \frac{3}{2}\), \(x_{n_2} = \frac{5}{4}\), \(x_{n_3} = \frac{7}{6}\), and so on, which converges to 1.

According to the proof, if \((x_n)\) was Cauchy, it would converge to the same limit as its subsequence, which is 1. However, since \((x_n)\) is not Cauchy, we cannot conclude that \((x_n)\) converges to 1, which aligns with our observation of the sequence.

Note: This example demonstrates that the condition of being Cauchy is crucial for the sequence to converge to the same limit as its convergent subsequence.


Problem 3. Convergence and Neighborhoods in Metric Spaces

Given:

  • A sequence \((x_n)\) in a metric space.

  • A point \(x\) in the same metric space.

To Prove: \(x_n \rightarrow x\) if and only if for every neighborhood \(B\) of \(x\), there exists an integer \(n_0\) such that \(x_n \in B\) for all \(n > n_0\).

Proof:

(⇒) Forward Direction: Assume \(x_n \rightarrow x\).

By the definition of convergence, for every \(\epsilon > 0\), there exists an \(N\) such that for all \(n \geq N\), the distance between \(x_n\) and \(x\) is less than \(\epsilon\). This means that \(x_n\) lies in the \(\epsilon\)-neighborhood of \(x\) for all \(n \geq N\).

Given any neighborhood \(B\) of \(x\), there exists some \(\epsilon > 0\) such that the \(\epsilon\)-neighborhood of \(x\) is contained in \(B\). By the convergence of \(x_n\), there exists an \(n_0\) such that \(x_n\) lies in this \(\epsilon\)-neighborhood (and hence in \(B\)) for all \(n > n_0\).

(⇐) Reverse Direction: Assume that for every neighborhood \(B\) of \(x\), there exists an integer \(n_0\) such that \(x_n \in B\) for all \(n > n_0\).

Given any \(\epsilon > 0\), consider the \(\epsilon\)-neighborhood of \(x\). By assumption, there exists an \(n_0\) such that \(x_n\) lies in this \(\epsilon\)-neighborhood for all \(n > n_0\). This means that the distance between \(x_n\) and \(x\) is less than \(\epsilon\) for all \(n > n_0\).

Therefore, \(x_n \rightarrow x\).

Conclusion: The sequence \(x_n\) converges to \(x\) if and only if for every neighborhood \(B\) of \(x\), there exists an integer \(n_0\) such that \(x_n \in B\) for all \(n > n_0\).

This completes the proof.


Problem 4. Boundedness of Cauchy Sequences

Given: - A sequence \((x_n)\) is Cauchy.

To Prove: - The sequence \((x_n)\) is bounded.

Proof:

  1. Definition of Cauchy Sequence: By definition, a sequence is Cauchy if, for any given \(\epsilon > 0\) (let's choose \(\epsilon = 1\) for simplicity), there exists an \(N\) such that for all \(m, n \geq N\), we have \(|x_m - x_n| < 1\).

  2. Boundedness of Terms Beyond \(N\): For any \(n \geq N\), using the triangle inequality, we get:

    \(|x_n| = |x_n - x_N + x_N|\) \(\leq |x_n - x_N| + |x_N|\) \(< 1 + |x_N|\)

    Let's denote \(M = 1 + |x_N|\). So, for all \(n \geq N\), \(|x_n| < M\).

  3. Boundedness of Terms Before \(N\): For terms \(x_1, x_2, ... x_{N-1}\), they are finitely many, so they have a maximum absolute value, say \(M'\).

  4. Combining the Two: The sequence \((x_n)\) is bounded by \(\max(M, M')\) for all \(n\).

Conclusion: Every Cauchy sequence is bounded.

This completes the proof.

Crux of the Proof for Boundedness of Cauchy Sequences

The core idea behind proving that a Cauchy sequence is bounded revolves around leveraging the defining property of Cauchy sequences: as the sequence progresses, its terms get arbitrarily close to each other.

  1. Cauchy's Property: A Cauchy sequence ensures that, after a certain point (denoted by \(N\)), the distance between any two terms is less than any given positive value. For the sake of the proof, we chose this value as \(\epsilon = 1\).

  2. Boundedness Beyond a Point (Using \(M\)): Given the Cauchy property, we deduced that all terms of the sequence beyond the point \(N\) are not just close to each other but are also close to a specific term, \(x_N\). This means that the sequence's terms, after \(N\), are bounded by a value \(M\), which is a little more than the absolute value of \(x_N\).

  3. Boundedness Before the Point (Using \(M'\)): The terms before \(N\) are finitely many. Any finite set of numbers is always bounded because there will be a maximum and minimum value among them. We denote the maximum absolute value of these terms as \(M'\).

  4. Why Two Different \(M\) and \(M'\)?: The reason for using two different bounds, \(M\) and \(M'\), is to separately handle the boundedness of two segments of the sequence:

    • \(M\) handles the terms after the point \(N\), ensuring they don't stray too far from \(x_N\).

    • \(M'\) handles the initial terms, up to \(N\), by simply using the maximum absolute value among them.

By combining these two bounds, we ensure that the entire sequence is bounded by the larger of \(M\) and \(M'\).

In essence, the proof uses the "tightening" behavior of Cauchy sequences to ensure that the sequence remains within a certain "boundary" and doesn't diverge to infinity, thus proving it's bounded.


Problem 6. Convergence of Distance Sequence of Cauchy Sequences

Given: Two sequences \((x_n)\) and \((y_n)\) in a metric space \((X,d)\) are Cauchy.

To Prove: The sequence \((a_n)\), where \(a_n = d(x_n, y_n)\), converges.

Proof:

  1. Cauchy Property of \((x_n)\) and \((y_n)\): Since both \((x_n)\) and \((y_n)\) are Cauchy, for any given \(\epsilon > 0\), there exist integers \(N_1\) and \(N_2\) such that for all \(m,n \geq N_1\) and \(p,q \geq N_2\), we have:

    \(d(x_m, x_n) < \frac{\epsilon}{2}\) \(d(y_p, y_q) < \frac{\epsilon}{2}\)

  2. Using the Triangle Inequality: Consider the difference \(|a_m - a_n|\), where \(a_m = d(x_m, y_m)\) and \(a_n = d(x_n, y_n)\). Using the triangle inequality, we get:

    \(|a_m - a_n| = |d(x_m, y_m) - d(x_n, y_n)|\) \(\leq d(x_m, x_n) + d(y_m, y_n)\)

    Now, using the properties of the metric and the Cauchy nature of the sequences, we can further bound this as:

    \(\leq d(x_m, x_n) + d(y_m, y_n) < \epsilon\) for all \(m,n\) greater than \(N = \max(N_1, N_2)\).

  3. Convergence of \((a_n)\): The above inequality shows that \((a_n)\) is a Cauchy sequence of real numbers. Since \(\mathbb{R}\) is complete, \((a_n)\) converges.

Illustrative Example:

Consider the metric space \((X,d)\) where \(X\) is the set of real numbers and \(d\) is the usual metric (absolute difference). Let:

\(x_n = \frac{1}{n}\) \(y_n = \frac{1}{n+1}\)

Both \((x_n)\) and \((y_n)\) are Cauchy sequences in this metric space. Now, consider:

\(a_n = d(x_n, y_n) = \left| \frac{1}{n} - \frac{1}{n+1} \right| = \frac{1}{n(n+1)}\)

The sequence \((a_n)\) represents the distances between the terms of \((x_n)\) and \((y_n)\). As \(n\) goes to infinity, \(a_n\) goes to 0, showing that \((a_n)\) converges to 0.

This completes the proof and illustrative example.
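
As a quick numeric sanity check (a sketch, not part of the original proof), the snippet below evaluates \(a_n = \left| \frac{1}{n} - \frac{1}{n+1} \right|\), confirms it matches \(\frac{1}{n(n+1)}\), and shows it shrinking toward 0.

import numpy as np

n = np.arange(1, 8)
a_n = np.abs(1.0 / n - 1.0 / (n + 1))   # a_n = d(x_n, y_n) in the usual metric on R
closed_form = 1.0 / (n * (n + 1))       # the simplified expression 1/(n(n+1))

print(np.allclose(a_n, closed_form))    # True: the two expressions agree
print(np.round(a_n, 5))                 # the terms decrease toward 0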


Problem 8. Equivalence of Cauchy Sequences in Two Metrics

Given: Two metrics \(d_1\) and \(d_2\) on the same set \(X\). There exist positive numbers \(a\) and \(b\) such that for all \(x, y \in X\):

\begin{equation*} a d_1(x,y) \leq d_2(x,y) \leq b d_1(x,y) \end{equation*}

To Prove: The Cauchy sequences in \((X,d_1)\) and \((X,d_2)\) are the same.

Proof:

  1. Assume a Cauchy Sequence in \((X,d_1)\):

    Let \((x_n)\) be a Cauchy sequence in \((X,d_1)\).

    This means that for any given \(\epsilon > 0\), there exists an integer \(N\) such that for all \(m, n \geq N\):

    \begin{equation*} d_1(x_m, x_n) < \epsilon \end{equation*}
  2. Using the Given Inequality: Using the given inequality, we can deduce:

    \begin{equation*} d_2(x_m, x_n) \leq b d_1(x_m, x_n) < b\epsilon \end{equation*}

    This shows that \((x_n)\) is also a Cauchy sequence in \((X,d_2)\).

  3. Conversely, Assume a Cauchy Sequence in \((X,d_2)\):

    Similarly, if \((x_n)\) is a Cauchy sequence in \((X,d_2)\), then for any given \(\epsilon > 0\), there exists an integer \(N\) such that for all \(m, n \geq N\):

    \begin{equation*} d_2(x_m, x_n) < \epsilon \end{equation*}

    Using the given inequality again, we get:

    \begin{equation*} d_1(x_m, x_n) \leq \frac{1}{a} d_2(x_m, x_n) < \frac{\epsilon}{a} \end{equation*}

    This shows that \((x_n)\) is also a Cauchy sequence in \((X,d_1)\).

Conclusion: The Cauchy sequences in \((X,d_1)\) and \((X,d_2)\) are the same.

This completes the proof.

Crux of the Proof for Equivalence of Cauchy Sequences in Two Metrics

The essence of the proof lies in the given relationship between the two metrics \(d_1\) and \(d_2\). The inequalities provided ensure that the "distance" between any two points in \(X\) as measured by \(d_1\) and \(d_2\) are directly proportional. This proportionality ensures that if the terms of a sequence get arbitrarily close to each other in one metric, they must also get arbitrarily close in the other metric.

Crux of the Proof:

  1. Proportional Distances: The given inequalities \(a d_1(x,y) \leq d_2(x,y) \leq b d_1(x,y)\) ensure that distances in \(d_2\) are always bounded by proportional distances in \(d_1\).

  2. Cauchy in \(d_1\) Implies Cauchy in \(d_2\): If a sequence is Cauchy in \((X, d_1)\), then the terms of the sequence are getting closer in the \(d_1\) metric. Due to the proportional relationship, they must also be getting closer in the \(d_2\) metric.

  3. Cauchy in \(d_2\) Implies Cauchy in \(d_1\): Similarly, if a sequence is Cauchy in \((X, d_2)\), the proportional relationship ensures that the sequence is also Cauchy in \((X, d_1)\).

By establishing these two implications, we conclude that the set of Cauchy sequences in both metrics is the same.


Problem 10. Completeness of Complex Numbers Using Completeness of Real Numbers

Given:

  • The real numbers \(\mathbb{R}\) are complete, which means every Cauchy sequence of real numbers converges to a limit in \(\mathbb{R}\).

To Prove:

  • The complex numbers \(\mathbb{C}\) are complete.

Proof:

  1. Representation of Complex Numbers: Every complex number can be represented as:

    \begin{equation*} z = x + yi \end{equation*}

    where \(x\) and \(y\) are real numbers and \(i\) is the imaginary unit.

  2. Assume a Cauchy Sequence in \(\mathbb{C}\): Let \((z_n)\) be a Cauchy sequence in \(\mathbb{C}\). This means that for any given \(\epsilon > 0\), there exists an integer \(N\) such that for all \(m, n \geq N\):

    \begin{equation*} |z_m - z_n| < \epsilon \end{equation*}
  3. Real and Imaginary Parts are Cauchy: The sequences of real parts \((x_n)\) and imaginary parts \((y_n)\) of \((z_n)\) are also Cauchy in \(\mathbb{R}\). This is because:

    \begin{equation*} |x_m - x_n| \leq |z_m - z_n| \quad \text{and} \quad |y_m - y_n| \leq |z_m - z_n| \end{equation*}
  4. Convergence of Real and Imaginary Parts: Since \(\mathbb{R}\) is complete, the Cauchy sequences \((x_n)\) and \((y_n)\) converge to limits in \(\mathbb{R}\).

  5. Convergence of the Complex Sequence: Let \(x\) and \(y\) denote the limits of \((x_n)\) and \((y_n)\). The sequence \((z_n)\) then converges to:

    \begin{equation*} z = x + yi \end{equation*}

    in \(\mathbb{C}\), since \(|z_n - z| \leq |x_n - x| + |y_n - y| \rightarrow 0\).

Conclusion: Every Cauchy sequence in \(\mathbb{C}\) converges to a limit in \(\mathbb{C}\). Hence, \(\mathbb{C}\) is complete.

This completes the proof.

Kreyszig 1.2, Metric Spaces

Problem 1. Show that in the metric \(d(x,y) = \sum_{j=1}^{\infty} \frac{1}{2^j} \frac{|x_j - y_j|}{1 + |x_j - y_j|}\) we can obtain another metric by replacing \(\frac{1}{2^j}\) by \(\mu_j > 0\) such that \(\sum \mu_j\) converges.

Proof:

Given the metric: \(d(x,y) = \sum_{j=1}^{\infty} \frac{1}{2^j} \frac{|x_j - y_j|}{1 + |x_j - y_j|}\)

We want to replace \(\frac{1}{2^j}\) with \(\mu_j > 0\) such that \(\sum \mu_j\) converges. Let's denote the new metric as \(d'(x,y)\).

\(d'(x,y) = \sum_{j=1}^{\infty} \mu_j \frac{|x_j - y_j|}{1 + |x_j - y_j|}\)

To show that \(d'\) is a metric, it must satisfy the metric axioms:

  1. Non-negativity: \(d'(x,y) \geq 0\) for all \(x, y\) and \(d'(x,y) = 0\) if and only if \(x = y\).

    Proof: Each term in the series is non-negative due to the absolute value and the fact that \(\mu_j > 0\). The sum is zero if and only if each term is zero, which implies \(x_j = y_j\) for all \(j\), or \(x = y\).

  2. Symmetry: \(d'(x,y) = d'(y,x)\) for all \(x, y\).

    Proof: This is evident from the absolute value in the definition of the metric.

  3. Triangle Inequality: \(d'(x,z) \leq d'(x,y) + d'(y,z)\) for all \(x, y, z\).

    Proof: The function \(f(t) = \frac{t}{1+t}\) is increasing for \(t \geq 0\), and the triangle inequality for the absolute value gives \(|x_j - z_j| \leq |x_j - y_j| + |y_j - z_j|\). Hence, for each term in the series: \(\frac{|x_j - z_j|}{1 + |x_j - z_j|} \leq \frac{|x_j - y_j| + |y_j - z_j|}{1 + |x_j - y_j| + |y_j - z_j|} \leq \frac{|x_j - y_j|}{1 + |x_j - y_j|} + \frac{|y_j - z_j|}{1 + |y_j - z_j|}\), where the last step drops one of the nonnegative summands from each denominator. Multiplying by \(\mu_j\) and summing over all \(j\) gives the desired result.

Given that the series \(\sum \mu_j\) converges, the series defining \(d'\) will also converge for any \(x\) and \(y\) (by the comparison test, since each term of the metric series is bounded by \(\mu_j\)).

Thus, \(d'\) defined with \(\mu_j\) is a valid metric on the space as long as \(\sum \mu_j\) converges.
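
Below is a minimal numeric sketch (sequences are truncated to finitely many coordinates, and \(\mu_j = 1/j^2\) is chosen purely for illustration) that spot-checks the triangle inequality for \(d'\) on random triples.

import numpy as np

rng = np.random.default_rng(0)
J = 50                                   # truncate the series at J coordinates
mu = 1.0 / np.arange(1, J + 1) ** 2      # mu_j = 1/j^2, a convergent choice of weights

def d_prime(x, y):
    diff = np.abs(x - y)
    return np.sum(mu * diff / (1.0 + diff))

# Spot-check the triangle inequality d'(x,z) <= d'(x,y) + d'(y,z) on random triples.
for _ in range(1000):
    x, y, z = rng.normal(size=(3, J))
    assert d_prime(x, z) <= d_prime(x, y) + d_prime(y, z) + 1e-12
print("Triangle inequality held on all sampled triples.")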


Problem 2: Show that the geometric mean of two positive numbers does not exceed the arithmetic mean using the given inequality.

Given:

\(\alpha \beta \leq \int_0^{\alpha} t^{p-1} dt + \int_0^{\beta} u^{q-1} du = \frac{\alpha^p}{p} + \frac{\beta^q}{q}\)

where \(\alpha\) and \(\beta\) are positive numbers, and \(p\) and \(q\) are conjugate exponents such that \(\frac{1}{p} + \frac{1}{q} = 1\).

Proof:

To show that the geometric mean of two positive numbers does not exceed the arithmetic mean, consider two positive numbers \(a\) and \(b\). The geometric mean is \(\sqrt{ab}\) and the arithmetic mean is \(\frac{a+b}{2}\).

Let's set \(\alpha = a\) and \(\beta = b\), and choose \(p = 2\) and \(q = 2\) (since they are conjugate exponents).

Using the given inequality: \(ab \leq \frac{a^2}{2} + \frac{b^2}{2}\)

Rearranging: \(2ab \leq a^2 + b^2\)

Adding \(2ab\) to both sides: \(4ab \leq a^2 + 2ab + b^2 = (a+b)^2\)

Taking the square root of both sides (both are nonnegative) and dividing by 2: \(\sqrt{ab} \leq \frac{a+b}{2}\)

This shows that the geometric mean of \(a\) and \(b\) is less than or equal to their arithmetic mean.

Thus, the geometric mean of two positive numbers does not exceed the arithmetic mean.
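
As a quick numeric illustration (not part of the proof), the inequality \(\sqrt{ab} \leq \frac{a+b}{2}\) can be spot-checked on randomly drawn positive numbers:

import numpy as np

rng = np.random.default_rng(1)
a, b = rng.uniform(0.01, 100.0, size=(2, 10000))       # random positive pairs
print(np.all(np.sqrt(a * b) <= (a + b) / 2 + 1e-12))   # expected: True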


Problem 3: Show that the Cauchy-Schwarz inequality implies the given inequality for sequences.

Given:

\(\left| \sum_{j=1}^{\infty} \xi_j \eta_j \right| \leq \sqrt{ \sum_{k=1}^{\infty} |\xi_k|^2 } \sqrt{ \sum_{m=1}^{\infty} |\eta_m|^2 }\)

To Prove:

\((\left| \xi_1 \right| + \dots + \left| \xi_n \right|)^2 \leq n (\left| \xi_1 \right|^2 + \dots + \left| \xi_n \right|^2)\)

Proof:

Let's consider two sequences:

  1. \(a_j = |\xi_j|\) for \(j = 1, 2, ..., n\)

  2. \(b_j = 1\) for all \(j\)

Using the Cauchy-Schwarz inequality, we have:

\(\left( \sum_{j=1}^{n} a_j b_j \right)^2 \leq \left( \sum_{j=1}^{n} a_j^2 \right) \left( \sum_{j=1}^{n} b_j^2 \right)\)

Substituting in our choices for \(a_j\) and \(b_j\), we get:

\(\left( \sum_{j=1}^{n} |\xi_j| \right)^2 \leq \left( \sum_{j=1}^{n} |\xi_j|^2 \right) \left( \sum_{j=1}^{n} 1^2 \right)\)

Since \(\sum_{j=1}^{n} 1^2 = n\), our inequality becomes:

\(\left( \sum_{j=1}^{n} |\xi_j| \right)^2 \leq n \left( \sum_{j=1}^{n} |\xi_j|^2 \right)\)

This completes the proof.
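
The finite-dimensional inequality can also be spot-checked numerically; the sketch below (illustrative only) draws random complex vectors and verifies \(\left( \sum |\xi_j| \right)^2 \leq n \sum |\xi_j|^2\).

import numpy as np

rng = np.random.default_rng(2)
for n in (1, 2, 5, 50):
    xi = rng.normal(size=n) + 1j * rng.normal(size=n)   # a random complex vector of length n
    lhs = np.sum(np.abs(xi)) ** 2
    rhs = n * np.sum(np.abs(xi) ** 2)
    assert lhs <= rhs + 1e-9
print("Inequality held for all sampled vectors.")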


Problem 4: Find a sequence that converges to 0 but is not in any \(l^p\) space, where \(1 \leq p < +\infty\).

Given:

Consider the sequence \((x_n)\) defined as:

\begin{equation*} x_n = \frac{1}{\ln(n+1)} \quad \text{for all } n \in \mathbb{N} \end{equation*}

This sequence has positive terms that shrink to 0, but it shrinks so slowly that no power \(|x_n|^p\) is summable.

Proof:

1. Convergence to 0: Since \(\ln(n+1) \rightarrow \infty\) as \(n \rightarrow \infty\), we have \(x_n = \frac{1}{\ln(n+1)} \rightarrow 0\).

2. Not in any \(l^p\) space: Fix any \(p\) with \(1 \leq p < \infty\). Because \(\ln(n+1)\) grows more slowly than any positive power of \(n\), we have \((\ln(n+1))^p \leq n\) for all sufficiently large \(n\), and therefore

\begin{equation*} |x_n|^p = \frac{1}{(\ln(n+1))^p} \geq \frac{1}{n} \end{equation*}

for all sufficiently large \(n\). Since the harmonic series \(\sum_{n=1}^{\infty} \frac{1}{n}\) diverges, the comparison test shows that \(\sum_{n=1}^{\infty} |x_n|^p\) diverges as well. Hence \((x_n)\) converges to 0 but does not belong to \(l^p\) for any \(1 \leq p < \infty\).
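
Partial sums cannot prove divergence, but the following sketch (illustration only) shows the partial sums of \(\sum_{n \leq N} (1/\ln(n+1))^p\) continuing to grow with \(N\) for several values of \(p\), while the terms themselves tend to 0.

import numpy as np

for p in (1, 2, 4):
    for N in (10**3, 10**5, 10**7):
        n = np.arange(1, N + 1)
        partial = np.sum(1.0 / np.log(n + 1) ** p)
        print(f"p={p}, N={N:>8}: partial sum ~ {partial:.1f}")
# The partial sums keep growing with N for every p, consistent with divergence,
# while the individual terms 1/ln(n+1) tend to 0.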


Problem 5: Find a sequence \(x\) which is in \(l^p\) for some \(p > 1\) but \(x\) is not in \(l^1\) .

Solution:

Consider the sequence \(x_n\) defined by:

\begin{equation*} x_n = \frac{1}{n^{\alpha}} \end{equation*}

where \(0 < \alpha < 1\).

1. \(x\) is in \(l^p\) for some \(p > 1\):

For the sequence to be in \(l^p\), the series \(\sum_{n=1}^{\infty} |x_n|^p\) must converge. In this case:

\begin{equation*} \sum_{n=1}^{\infty} \left( \frac{1}{n^{\alpha}} \right)^p = \sum_{n=1}^{\infty} \frac{1}{n^{p\alpha}} \end{equation*}

For this series to converge we need \(p\alpha > 1\), i.e. \(p > \frac{1}{\alpha}\), which is a value strictly greater than 1. Since the series \(\sum_{n=1}^{\infty} \frac{1}{n^s}\) converges for \(s > 1\), our series converges for every \(p > \frac{1}{\alpha}\); in particular, \(x\) belongs to \(l^p\) for some \(p > 1\).

Proof: Convergence of the series \(\sum_{n=1}^{\infty} \frac{1}{n^s}\) for \(s > 1\)

Integral Test:

To determine the convergence of the series \(\sum_{n=1}^{\infty} \frac{1}{n^s}\), we can compare it to the improper integral:

\begin{equation*} \int_{1}^{\infty} \frac{1}{x^s} \, dx \end{equation*}
  1. Evaluate the integral:

\begin{equation*} \int_{1}^{\infty} \frac{1}{x^s} \, dx = \lim_{{b \to \infty}} \int_{1}^{b} x^{-s} \, dx \end{equation*}

Using the power rule for integration:

\begin{equation*} \lim_{{b \to \infty}} \left[ \frac{x^{-s+1}}{-s+1} \right]_1^b = \lim_{{b \to \infty}} \left[ \frac{1}{(1-s)b^{s-1}} - \frac{1}{1-s} \right] \end{equation*}

For \(s > 1\), the term \(\frac{1}{(1-s)b^{s-1}}\) approaches 0 as \(b\) approaches infinity. Thus, the integral converges to:

\begin{equation*} \frac{1}{s-1} \end{equation*}
  2. Comparison with the series:

Since the improper integral converges, the series \(\sum_{n=1}^{\infty} \frac{1}{n^s}\) also converges by the integral test.

In conclusion, the series \(\sum_{n=1}^{\infty} \frac{1}{n^s}\) converges for all \(s > 1\).

2. \(x\) is not in \(l^1\):

For the sequence to be in \(l^1\), the series \(\sum_{n=1}^{\infty} |x_n|\) must converge. In this case:

\begin{equation*} \sum_{n=1}^{\infty} \frac{1}{n^{\alpha}} \end{equation*}

Given that \(0 < \alpha < 1\), this is a p-series with \(p = \alpha\), and it is known that such a series diverges when \(p \leq 1\). Thus, the sequence \(x_n\) is not in \(l^1\).

Proof: Divergence of the series \(\sum_{n=1}^{\infty} \frac{1}{n^p}\) for \(p \leq 1\)

Integral Test:

To determine the convergence of the series \(\sum_{n=1}^{\infty} \frac{1}{n^p}\), we can compare it to the improper integral:

\begin{equation*} \int_{1}^{\infty} \frac{1}{x^p} \, dx \end{equation*}
  1. Evaluate the integral:

\begin{equation*} \int_{1}^{\infty} \frac{1}{x^p} \, dx = \lim_{{b \to \infty}} \int_{1}^{b} x^{-p} \, dx \end{equation*}

Using the power rule for integration (valid for \(p \neq 1\)):

\begin{equation*} \lim_{{b \to \infty}} \left[ \frac{x^{-p+1}}{-p+1} \right]_1^b = \lim_{{b \to \infty}} \left[ \frac{b^{1-p}}{1-p} - \frac{1}{1-p} \right] \end{equation*}

For \(p < 1\), the term \(\frac{b^{1-p}}{1-p}\) grows without bound as \(b\) approaches infinity, so the integral diverges. For \(p = 1\), the power rule does not apply; instead \(\int_{1}^{b} \frac{1}{x} \, dx = \ln b\), which also grows without bound as \(b \to \infty\). Thus, the integral diverges for all \(p \leq 1\).

  2. Comparison with the series:

Since the improper integral diverges, the series \(\sum_{n=1}^{\infty} \frac{1}{n^p}\) also diverges by the integral test.

In conclusion, the series \(\sum_{n=1}^{\infty} \frac{1}{n^p}\) diverges for all \(p \leq 1\).

Therefore, the sequence \(x_n = \frac{1}{n^{\alpha}}\) where \(0 < \alpha < 1\) is in \(l^p\) for every \(p > \frac{1}{\alpha}\) (hence for some \(p > 1\)) but is not in \(l^1\).
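
A numeric illustration (partial sums only, so suggestive rather than conclusive) with \(\alpha = \frac{1}{2}\): the \(l^1\) partial sums keep growing, while the \(l^p\) partial sums with \(p = 3\) (so \(p\alpha = 1.5 > 1\)) level off near \(\zeta(1.5) \approx 2.612\).

import numpy as np

alpha = 0.5
for N in (10**3, 10**5, 10**7):
    n = np.arange(1, N + 1)
    x = 1.0 / n ** alpha
    print(f"N={N:>8}  sum |x_n| ~ {np.sum(x):10.1f}   sum |x_n|^3 ~ {np.sum(x**3):8.4f}")
# The first column grows without bound (x is not in l^1);
# the second column stabilises (x is in l^3, since 3 * 0.5 > 1).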


Problem 6: Show that if \(A \subset B\) in a metric space \((X,d)\), then \(\delta(A) \leq \delta(B)\).

Given: The diameter \(\delta(A)\) of a nonempty set \(A\) in a metric space \((X,d)\) is defined as:

\begin{equation*} \delta(A) = \sup_{x, y \in A} d(x, y) \end{equation*}

A set \(A\) is said to be bounded if \(\delta(A) < \infty\).

Intuition: Consider two nested sets \(A\) and \(B\) in a metric space. The inner set represents \(A\) and the outer set represents \(B\).

https://www.wolframcloud.com/obj/8284851e-93d0-4a74-90e1-459e0333670c

The maximum distance between any two points in \(A\) will always be less than or equal to the maximum distance between any two points in \(B\). This is because every point in \(A\) is also in \(B\), and the supremum of distances in \(B\) must account for all distances in \(A\) as well as additional distances between points exclusive to \(B\) or between points in \(A\) and points exclusive to \(B\).

Proof:

  1. Consider any two points \(x, y \in A\). Since \(A \subset B\), both \(x\) and \(y\) are also in \(B\), so the distance \(d(x, y)\) is also a distance between two points of \(B\).

  2. By the definition of \(\delta\) as the supremum of distances between points of a set, the set of distances realised within \(A\) is a subset of the set of distances realised within \(B\), so its supremum cannot exceed the supremum taken over \(B\).

  3. Therefore, \(\delta(A) \leq \delta(B)\).


Problem 7: Given the definition of the diameter \(\delta(A)\) of a nonempty set \(A\) in a metric space \((X,d)\), show that:

\begin{equation*} \delta(A) = \sup_{x, y \in A} d(x, y) \end{equation*}

Show that \(\delta(A) = 0\) if and only if \(A\) consists of a single point.

Proof:

(⇒) Direction: Assume \(\delta(A) = 0\). This means that the supremum of the distances between all pairs of points in \(A\) is 0. For any two distinct points \(x\) and \(y\) in \(A\), the distance \(d(x, y)\) must be 0. However, in a metric space, the distance between two distinct points is always greater than 0. Therefore, the only way for the supremum of the distances to be 0 is if there are no pairs of distinct points in \(A\). This implies that \(A\) consists of a single point.

(⇐) Direction: Assume \(A\) consists of a single point, say \(a\). Then, for any \(x, y \in A\), \(x = y = a\). The distance \(d(x, y) = d(a, a) = 0\). Since this is the only possible distance between points in \(A\), the supremum of these distances is also 0. Therefore, \(\delta(A) = 0\).

Combining both directions, we conclude that \(\delta(A) = 0\) if and only if \(A\) consists of a single point.


Problem 8: Given two nonempty subsets \(A\) and \(B\) of a metric space \((X,d)\), the distance \(D(A,B)\) between \(A\) and \(B\) is defined as:

\begin{equation*} D(A,B) = \inf_{a \in A, b \in B} d(a, b) \end{equation*}

Show that \(D\) does not define a metric on the power set of \(X\).

Proof: To show that \(D\) does not define a metric on the power set of \(X\), we need to show that at least one of the metric properties is violated by \(D\). The metric properties are:

  1. Non-negativity: For all sets \(A, B\) in the power set of \(X\), \(D(A,B) \geq 0\).

  2. Identity of indiscernibles: \(D(A,B) = 0\) if and only if \(A = B\).

  3. Symmetry: For all sets \(A, B\) in the power set of \(X\), \(D(A,B) = D(B,A)\).

  4. Triangle inequality: For all sets \(A, B, C\) in the power set of \(X\), \(D(A,C) \leq D(A,B) + D(B,C)\).

We will focus on the second property, the identity of indiscernibles.

Consider two distinct nonempty sets \(A\) and \(B\) such that \(A\) is a proper subset of \(B\), say \(B = A \cup \{b\}\) with \(b \notin A\). Every point \(a \in A\) also belongs to \(B\), so the pair \((a, a)\) is among the pairs over which the infimum is taken, and \(d(a, a) = 0\). Hence the infimum of the distances between points of \(A\) and points of \(B\) is 0, even though \(A\) and \(B\) are distinct sets. This violates the identity of indiscernibles property, as \(D(A,B) = 0\) even when \(A \neq B\).

Therefore, \(D\) does not define a metric on the power set of \(X\).


Visualization:

https://www.wolframcloud.com/obj/b078151a-22ad-4290-b859-657a3777ff26

In the above visualization:

  • The inner circle represents the set \(A\).

  • The outer circle represents the set \(B\).

  • The point labeled "a" is a point in \(A\).

  • The point labeled "b" is a point in \(B\) but not in \(A\).

Proof with Visualization:

Consider two distinct sets \(A\) and \(B\) in a metric space such that \(A\) is a proper subset of \(B\). As visualized, \(A\) is represented by the inner circle, and \(B\) is represented by the outer circle. The point \(a\) is in both \(A\) and \(B\), while the point \(b\) is only in \(B\).

Now, the distance \(d(a, b)\) between a point \(a \in A\) and the point \(b \in B \setminus A\) is some positive value. However, since \(A\) is a subset of \(B\), every point of \(A\) can also be paired with itself as a point of \(B\), and such a pair has distance 0. This means that the infimum of the distances between points in \(A\) and \(B\) is 0, even though \(A\) and \(B\) are distinct sets. This violates the identity of indiscernibles property, as \(D(A,B) = 0\) even when \(A \neq B\).

Therefore, \(D\) does not define a metric on the power set of \(X\).
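
A concrete finite check (with hypothetical sets chosen only for illustration): take \(A = \{0\}\) and \(B = \{0, 1\}\) in \(\mathbb{R}\) with the usual metric; then \(A \neq B\) but \(D(A,B) = 0\).

# A tiny concrete counterexample on the real line with the usual metric.
A = [0.0]
B = [0.0, 1.0]

D = min(abs(a - b) for a in A for b in B)   # the infimum over a finite set is a minimum
print(D)   # 0.0, even though A != B, so the identity of indiscernibles fails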


Problem 9: Given the definition of the distance \(D(A,B)\) between two nonempty subsets \(A\) and \(B\) of a metric space \((X,d)\), show that:

\begin{equation*} D(A,B) = \inf_{a \in A, b \in B} d(a, b) \end{equation*}

Show that if \(A \cap B \neq \emptyset\), then \(D(A,B) = 0\). What about the converse?

Proof with Visualization:

https://www.wolframcloud.com/obj/15f2f93c-50a2-4687-b6cb-0b8b26a2a6f2

In the above visualization:

  • The circle on the left represents the set \(A\).

  • The circle on the right represents the set \(B\).

  • The point labeled "x" is a point that belongs to both \(A\) and \(B\), i.e., \(x \in A \cap B\).

  1. If \(A \cap B \neq \emptyset\) then \(D(A,B) = 0\) :

    If \(A \cap B \neq \emptyset\), then there exists at least one point \(x\) such that \(x \in A\) and \(x \in B\). For this point, \(d(x, x) = 0\). Since \(D(A,B)\) is the infimum of the distances between all pairs of points where one is from \(A\) and the other is from \(B\), and since 0 is a possible distance (because of the point \(x\)), the infimum is 0. Therefore, \(D(A,B) = 0\).

  2. Converse: Does \(D(A,B) = 0\) imply \(A \cap B \neq \emptyset\) ?

    No; the converse is false in general. \(D(A,B) = 0\) only says that the infimum of the distances \(d(a, b)\) is 0, and an infimum need not be attained, so there need not exist a pair with \(d(a, b) = 0\). For example, in \(\mathbb{R}\) with the usual metric take \(A = (0, 1)\) and \(B = (1, 2)\): points of \(A\) and points of \(B\) can be chosen arbitrarily close to 1, so \(D(A,B) = 0\), yet \(A \cap B = \emptyset\) (see the numeric sketch after this list).
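
The numeric sketch below (illustration only) samples points \(a_n = 1 - \frac{1}{n}\) from \(A = (0,1)\) and \(b_n = 1 + \frac{1}{n}\) from \(B = (1,2)\); the smallest sampled distance shrinks toward 0 as more points are included, even though the sets stay disjoint.

import numpy as np

for N in (10, 100, 1000):
    n = np.arange(2, N + 1)
    a = 1.0 - 1.0 / n                 # sample points of A = (0, 1)
    b = 1.0 + 1.0 / n                 # sample points of B = (1, 2)
    min_dist = np.min(np.abs(a[:, None] - b[None, :]))
    print(f"N={N:>5}: smallest sampled distance ~ {min_dist:.5f}")
# The smallest distance tends to 0 as N grows, although A and B remain disjoint.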


Problem 10: Given the definition of the distance \(D(x,B)\) from a point \(x\) to a non-empty subset \(B\) of a metric space \((X,d)\), show that:

\begin{equation*} D(x,B) = \inf_{b \in B} d(x, b) \end{equation*}

Show that for any \(x, y \in X\):

\begin{equation*} |D(x,B) - D(y,B)| \leq d(x,y) \end{equation*}

Proof:

For any \(b \in B\):

  1. \(d(x, b) \leq d(x, y) + d(y, b)\) (by the triangle inequality)

Rearranging, we get:

  2. \(d(x, b) - d(y, b) \leq d(x, y)\)

Since (1) holds for every \(b \in B\) and \(D(x,B) \leq d(x, b)\), we have \(D(x,B) \leq d(x, y) + d(y, b)\) for all \(b \in B\). Taking the infimum over \(b \in B\) on the right-hand side yields \(D(x,B) \leq d(x,y) + D(y,B)\), that is:

  3. \(D(x,B) - D(y,B) \leq d(x,y)\)

Similarly, by interchanging \(x\) and \(y\):

  4. \(d(y, b) \leq d(y, x) + d(x, b)\) (by the triangle inequality)

Rearranging, we get:

  5. \(d(y, b) - d(x, b) \leq d(y, x)\)

By the same argument as before, with the roles of \(x\) and \(y\) interchanged:

  6. \(D(y,B) - D(x,B) \leq d(y,x)\)

From (3) and (6), we get:

\begin{equation*} |D(x,B) - D(y,B)| \leq d(x,y) \end{equation*}

Visualization Explanation:

https://www.wolframcloud.com/obj/2f65e7f6-4afe-448d-93ed-5cf2d939da51

In the above visualization:

  • The circle represents the set \(B\).

  • The points labeled "x" and "y" are two arbitrary points in \(X\).

  • The point labeled "b" is an arbitrary point in \(B\).

  • The solid line between "x" and "y" represents the distance \(d(x,y)\).

  • The dashed lines from "x" and "y" to "b" represent the distances \(d(x,b)\) and \(d(y,b)\) respectively.

From the triangle inequality, the direct distance between "x" and "y" (i.e., \(d(x,y)\)) is always less than or equal to the sum of their distances to any point "b" in \(B\). This is visually evident as the direct path (solid line) between "x" and "y" is shorter than the path that goes through "b" (dashed lines).

This visualization supports the proof that for any two points \(x\) and \(y\) in \(X\), the difference in their distances to set \(B\) is bounded by their direct distance, i.e., \(|D(x,B) - D(y,B)| \leq d(x,y)\).
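
A quick numeric spot-check of this Lipschitz-type bound (a sketch, using a hypothetical finite set \(B\) in the plane chosen at random):

import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(20, 2))                           # a finite subset of R^2

def D(point, B):
    return np.min(np.linalg.norm(B - point, axis=1))   # inf over b in B of d(point, b)

for _ in range(1000):
    x, y = rng.normal(size=(2, 2))
    assert abs(D(x, B) - D(y, B)) <= np.linalg.norm(x - y) + 1e-12
print("|D(x,B) - D(y,B)| <= d(x,y) held on all sampled pairs.")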


Problem 12:

Given the definition of a bounded set from Problem 6, where the diameter \(\delta(A)\) of a nonempty set \(A\) in a metric space \((X, d)\) is defined by:

\begin{equation*} \delta(A) = \sup_{x,y \in A} d(x, y) \end{equation*}

A set is said to be bounded if \(\delta(A) < \infty\).

Show that the union of two bounded sets \(A\) and \(B\) in a metric space is a bounded set.

Visualization of the union of two bounded sets

Proof:

Let's assume that both \(A\) and \(B\) are bounded sets. This means:

\begin{equation*} \delta(A) = \sup_{x,y \in A} d(x, y) < \infty, \qquad \delta(B) = \sup_{x,y \in B} d(x, y) < \infty \end{equation*}

Now, consider any two points \(p\) and \(q\) in \(A \cup B\). There are three possible scenarios:

  1. Both \(p\) and \(q\) are in \(A\).

  2. Both \(p\) and \(q\) are in \(B\).

  3. \(p\) is in \(A\) and \(q\) is in \(B\) or vice versa.

For the first scenario, \(d(p, q) \leq \delta(A)\) since \(A\) is bounded.

For the second scenario, \(d(p, q) \leq \delta(B)\) since \(B\) is bounded.

For the third scenario, fix a point \(a \in A\) and a point \(b \in B\) (both sets are nonempty). If \(p \in A\) and \(q \in B\), applying the triangle inequality twice gives:

\begin{equation*} d(p, q) \leq d(p, a) + d(a, b) + d(b, q) \leq \delta(A) + d(a, b) + \delta(B) \end{equation*}

since \(p, a \in A\) and \(b, q \in B\).

Combining all three scenarios, the supremum of the distances between any two points in \(A \cup B\) satisfies:

\begin{equation*} \delta(A \cup B) \leq \delta(A) + \delta(B) + d(a, b) \end{equation*}

Since \(\delta(A)\), \(\delta(B)\), and \(d(a, b)\) are all finite, so is their sum. Thus, \(\delta(A \cup B) < \infty\), which means \(A \cup B\) is bounded.

This completes the proof.
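
A numeric check of the bound \(\delta(A \cup B) \leq \delta(A) + \delta(B) + d(a,b)\) derived above, using hypothetical random finite point sets in the plane (illustration only):

import numpy as np

rng = np.random.default_rng(4)

def diam(S):
    # delta(S) = sup of pairwise distances; for a finite set this is a maximum
    return np.max(np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1))

for _ in range(200):
    A = rng.normal(size=(15, 2)) + rng.uniform(-10, 10, size=2)   # two bounded finite sets,
    B = rng.normal(size=(15, 2)) + rng.uniform(-10, 10, size=2)   # possibly far apart
    a, b = A[0], B[0]                                             # fixed reference points
    bound = diam(A) + diam(B) + np.linalg.norm(a - b)
    assert diam(np.vstack([A, B])) <= bound + 1e-9
print("delta(A u B) <= delta(A) + delta(B) + d(a,b) held on all samples.")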


Problem 11:

Given a metric space \((X,d)\), another metric on \(X\) is defined by:

\begin{equation*} \tilde{d}(x,y) = \frac{d(x,y)}{1+d(x,y)} \end{equation*}

Show that \(\tilde{d}\) is a metric and that \(X\) is bounded in this metric.

Proof:

Part 1: Show that \(\tilde{d}(x,y)\) is a metric:

  1. Non-negativity: For any \(x, y \in X\),

    \begin{equation*} \tilde{d}(x,y) = \frac{d(x,y)}{1+d(x,y)} \geq 0 \end{equation*}

    since \(d(x,y) \geq 0\) by the definition of a metric.

  2. Identity of indiscernibles: For any \(x \in X\),

    \begin{equation*} \tilde{d}(x,x) = \frac{d(x,x)}{1+d(x,x)} = 0 \end{equation*}

    since \(d(x,x) = 0\). Conversely, if \(\tilde{d}(x,y) = 0\), then \(d(x,y) = 0\), and hence \(x = y\) because \(d\) is a metric.

  3. Symmetry: For any \(x, y \in X\),

    \begin{equation*} \tilde{d}(x,y) = \frac{d(x,y)}{1+d(x,y)} = \frac{d(y,x)}{1+d(y,x)} = \tilde{d}(y,x) \end{equation*}

    because \(d(x,y) = d(y,x)\).

  4. Triangle Inequality: For any \(x, y, z \in X\),

    \begin{equation*} \tilde{d}(x,z) + \tilde{d}(z,y) = \frac{d(x,z)}{1+d(x,z)} + \frac{d(z,y)}{1+d(z,y)} \end{equation*}

    The function \(f(t) = \frac{t}{1+t}\) is increasing for \(t \geq 0\), so the triangle inequality \(d(x,y) \leq d(x,z) + d(z,y)\) for \(d\) gives \(\tilde{d}(x,y) \leq \frac{d(x,z) + d(z,y)}{1 + d(x,z) + d(z,y)} \leq \frac{d(x,z)}{1 + d(x,z)} + \frac{d(z,y)}{1 + d(z,y)}\), where the last step drops one of the nonnegative terms from each denominator. Hence:

    \begin{equation*} \tilde{d}(x,z) + \tilde{d}(z,y) \geq \tilde{d}(x,y) \end{equation*}

Part 2: Show that \(X\) is bounded in the metric \(\tilde{d}(x,y)\):

Given the nature of the fraction, as \(d(x,y)\) increases, the value of \(\tilde{d}(x,y)\) also increases, but at a diminishing rate due to the increasing denominator. As \(d(x,y)\) approaches infinity, \(\tilde{d}(x,y)\) approaches but never reaches 1. This means that for all pairs \(x, y \in X\), the value of \(\tilde{d}(x,y)\) lies in \([0, 1)\). Therefore, the supremum of \(\tilde{d}(x,y)\) over all \(x, y \in X\) is at most 1, which means that \(X\) is bounded in the metric \(\tilde{d}(x,y)\) with diameter at most 1.


This completes the proof.

Let's delve deeper into the statement "the supremum of \(\tilde{d}(x,y)\) over all \(x,y \in X\) is at most 1."

Explanation:

The function \(\tilde{d}(x,y) = \frac{d(x,y)}{1+d(x,y)}\) is a fraction where the numerator is the original distance between \(x\) and \(y\), and the denominator is 1 plus that distance.

  • The smallest value of \(d(x,y)\) is 0 (when \(x=y\)), and in this case, \(\tilde{d}(x,y) = 0\).

  • As \(d(x,y)\) increases, the value of \(\tilde{d}(x,y)\) also increases. However, because of the denominator \(1+d(x,y)\), the rate of increase of \(\tilde{d}(x,y)\) is slower than the rate of increase of \(d(x,y)\).

  • As \(d(x,y)\) approaches infinity, the fraction \(\frac{d(x,y)}{1+d(x,y)}\) approaches 1. This means that no matter how large \(d(x,y)\) becomes, \(\tilde{d}(x,y)\) will never exceed 1.

Thus, \(\tilde{d}(x,y) < 1\) for every pair \(x, y \in X\), so the supremum of \(\tilde{d}(x,y)\) over all \(x,y \in X\) is at most 1, and \(X\) is bounded in the metric \(\tilde{d}\).
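
A short numeric illustration (on \(\mathbb{R}\) with the usual metric, with values chosen only for illustration) of both claims: \(\tilde{d}\) never reaches 1, and the triangle inequality survives the transformation.

import numpy as np

rng = np.random.default_rng(5)

def d_tilde(x, y):
    d = np.abs(x - y)           # the original metric on R
    return d / (1.0 + d)        # the bounded metric d~ = d / (1 + d)

# d~ stays below 1 even for huge original distances.
for d in (0.0, 1.0, 100.0, 1e6):
    print(f"d = {d:>9.1f}  ->  d~ = {d / (1.0 + d):.6f}")

# Spot-check the triangle inequality for d~ on random triples.
for _ in range(1000):
    x, y, z = rng.normal(scale=100.0, size=3)
    assert d_tilde(x, z) <= d_tilde(x, y) + d_tilde(y, z) + 1e-12
print("Triangle inequality for d~ held on all sampled triples.")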

PINNs in Forward & Inverse Problems


In this post, we demonstrate the forward and inverse PINNs problem for the heat equation

$$U_t=0.1\,U_{xx}$$

with Dirichlet boundary conditions, homogeneous $U(0,t)=0$ and non-homogeneous $U(1,t)=\cos(2t)$, together with the initial condition $U(x,0)=\sqrt{x}$. Non-homogeneous boundary conditions can cause difficulties: it is often not straightforward to obtain stable convergence, and the model overfits easily. The problem can be overcome by applying a bespoke NN architecture and/or Fourier features. The analytical solution was obtained through Mathematica in the form of a 30-term expansion and is plotted below. For the residual term, we have a grid of 200x100, totaling 20000 points, from which we sample collocation points inside the domain $\Omega$.

The first step is to obtain an analytical solution by means of Mathematica. You can use the Mathematica kernel for free; how to install it and use it with Jupyter you can find here. Now I will switch to the Mathematica kernel and obtain the approximate analytical solution of the equation in question.

heqn = D[u[x, t], t] == 0.1*D[u[x, t], {x, 2}]

ic = u[x, 0] == Sqrt[x];

bc = {u[0, t] == 0, u[1, t] == Cos[2t]};

sol = DSolve[{heqn, ic, bc}, u[x, t], {x, t}];

asol = u[x, t] /. sol[[1]] /. {\[Infinity] -> 30} // Activate // FullSimplify;

Load/install needed libraries

In [2]:
import matplotlib.pyplot as plt
import torch
from torch import nn
from torch import Tensor
import torch.autograd as autograd
import torch.optim as optim
from torch.optim import Adam
import numpy as np
#!pip install pyDOE
from pyDOE import lhs #for Latin hypercube sampling
#!pip install torchviz
from torchviz import make_dot

#!pip install jupyterthemes
from jupyterthemes import jtplot
jtplot.style(theme="monokai", context="notebook", ticks=True, grid=True)

print("CUDA available: ", torch.cuda.is_available())
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("PyTorch version: ", torch.__version__ )
CUDA available:  True
PyTorch version:  1.13.1

In this post, we delve into the intricacies of Residual Networks (ResNets) and the role of the tune_beta parameter, represented as $ \beta $ in the mathematics, in enhancing deep learning architectures.

Residual Learning

In classical feed-forward neural networks, each layer is trained to directly map its input to the corresponding output. However, for exceedingly deep networks, this can be problematic due to challenges such as vanishing or exploding gradients. Enter Residual Networks (ResNets), which introduced a paradigm shift by leveraging "shortcut" or "skip" connections to enable identity mapping, thereby allowing layers to learn the residual between the input and output.

Mathematically expressed, instead of aiming to learn $ H(x) $, a ResNet endeavors to learn the residual function:

$$ F(x) = H(x) - x $$

During the forward pass, the input $ x $ is added to this residual function, rendering the actual output:

$$ H(x) = F(x) + x $$

Stochastic Depth and the Emergence of $ \beta $

Stochastic depth is a regularization technique designed to improve convergence and generalization in deep networks. The essence of this method is the random "dropping" or "bypassing" of layers during the training phase.

Originally, stochastic depth utilized a binary random value to scale a layer's output. However, the innovation of introducing $ \beta $ generalized this binary random value into a continuous, trainable parameter. This allows the network to dynamically adjust the significance of each layer's output during the training process.

Layer Scaling with $ \beta $

Layer scaling is based on the proposition that permitting the network to modify the significance or contribution of specific layers or blocks can optimize the training process. For a given layer's output $ F(x) $, scaling introduces the factor $ \beta $:

$$ \text{Scaled Output} = \beta \cdot F(x) $$

When tune_beta is set to True, the network considers $ \beta $ as a learnable parameter for each layer in the residual block. This parameter scales the layer's output prior to activation. Conversely, when set to False, $ \beta $ is fixed at a value of 1, ensuring the layer's output remains unchanged.
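
A minimal PyTorch sketch (an illustrative reading of this idea, not the exact architecture used later in this post) of a residual block whose layer output is scaled by a trainable $ \beta $ when tune_beta=True and left unscaled otherwise:

import torch
from torch import nn

class ScaledResidualBlock(nn.Module):
    """Residual block whose layer output is scaled by beta before the activation."""
    def __init__(self, width, tune_beta=True):
        super().__init__()
        self.linear = nn.Linear(width, width)
        self.activation = nn.Tanh()
        if tune_beta:
            # beta is a trainable scaling factor, initialised at 1
            self.beta = nn.Parameter(torch.ones(1))
        else:
            # fixed beta = 1 reproduces a plain residual block
            self.register_buffer("beta", torch.ones(1))

    def forward(self, x):
        # H(x) = x + activation(beta * F(x)), with F the linear layer
        return x + self.activation(self.beta * self.linear(x))

block = ScaledResidualBlock(width=32, tune_beta=True)
out = block(torch.randn(8, 32))      # output shape: (8, 32)
print(out.shape, block.beta.requires_grad)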

Fourier Features in Machine Learning

Fourier analysis provides a way to express functions in terms of a combination of sine and cosine functions. This powerful technique asserts that, irrespective of a function's complexity, it can be represented as a sum of simpler sinusoids, each with a distinct frequency and amplitude.

Fourier Transform: A Bridge to the Frequency Domain

The Fourier transform is an essential operation that takes a function from its original domain (typically time or space) and maps it to the frequency domain. Formally, for a function $ f(t) $, its Fourier transform $ F(\omega) $ is given by:

$$ F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-j\omega t} dt $$

The inverse process, which recovers $ f(t) $ from $ F(\omega) $, is achieved using the inverse Fourier transform:

$$ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega) e^{j\omega t} d\omega $$

Fourier Features in Neural Networks

When deploying neural networks for tasks like regression on functions with periodic patterns, directly using raw inputs might not capture the inherent oscillatory behavior. Fourier features come to the rescue by projecting these inputs into a higher-dimensional space, enhancing the network's expressive power.

Given an input $ x $, the Fourier feature mapping can be defined as:

$$ \phi(x) = \left[ \sin(2\pi B x), \cos(2\pi B x) \right] $$

where $ B $ is a matrix of learned or fixed frequencies. This mapping produces a sinusoidal embedding for each input dimension, amplifying the model's capacity to learn periodic patterns.

Fourier Features and the Utility of $ \beta $

Fourier features, often used to transform input data, can amplify the expressive power of neural networks, particularly for functions that exhibit periodic patterns. These features essentially project the input data into a higher-dimensional space, making it easier for the network to learn complex, oscillatory functions.

When used in conjunction with $ \beta $, the neural network can dynamically decide the importance of Fourier-transformed features versus the original features. By allowing the network to learn the optimal scaling factor $ \beta $, it can effectively balance between the raw and transformed features, leading to improved convergence and better generalization to unseen data.

Benefits in Machine Learning

  1. Expressiveness: With Fourier features, neural networks can learn intricate, oscillatory functions with fewer parameters.
  2. Generalization: By emphasizing periodic patterns, models might generalize better to unseen data exhibiting similar behaviors.
  3. Interpolation: For tasks like image synthesis, Fourier features can lead to smoother interpolations between data points.

In essence, Fourier features offer a bridge, allowing neural networks to tap into the rich world of frequency-domain information, thus enhancing their learning capabilities in certain tasks.

Let's demonstrate the working of Fourier features using a simple 1D regression example.

This snippet is a good example that will help you understand Fourier features.

We'll first generate some synthetic data that follows a sine wave pattern. We'll then try to fit a neural network to this data using both the raw input features and the Fourier-transformed input features. By comparing the two fits, we can visualize the impact of the Fourier features.

In [14]:
# Generate synthetic data
x = np.linspace(0, 4 * np.pi, 500)[:, None]
y = np.sin(x)

# Convert to PyTorch tensors and move them to the device
x_tensor = torch.tensor(x, dtype=torch.float32).to(device)
y_tensor = torch.tensor(y, dtype=torch.float32).to(device)

# Define a simple feedforward neural network
class SimpleNN(nn.Module):
    def __init__(self, input_dim):
        super(SimpleNN, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )
        
    def forward(self, x):
        return self.fc(x)

# Fourier transformation
B = torch.randn(1, 10).to(device)
x_fourier = torch.cat([torch.sin(2 * np.pi * x_tensor @ B), torch.cos(2 * np.pi * x_tensor @ B)], dim=1)

# Training function
def train_model(steps, model, x, y):
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = nn.MSELoss()(model(x), y)
        loss.backward()
        optimizer.step()
    return model(x)

# Training steps
steps_list = [10, 30, 4000]

# Initialize and move the models to the device
model_fourier = SimpleNN(20).to(device)
model_plain = SimpleNN(1).to(device)

# Collect predictions
predictions_fourier = []
predictions_plain = []

for steps in steps_list:
    # Note: the same model instances are reused, so training is cumulative across steps_list.
    predictions_fourier.append(train_model(steps, model_fourier, x_fourier, y_tensor))
    predictions_plain.append(train_model(steps, model_plain, x_tensor, y_tensor))

# Plotting
fig, axs = plt.subplots(3, 1, figsize=(10, 12))

for i, (steps, y_pred_f, y_pred_p) in enumerate(zip(steps_list, predictions_fourier, predictions_plain)):
    axs[i].plot(x, y, label="True Function", linestyle='dashed')
    axs[i].plot(x, y_pred_f.cpu().detach().numpy(), label=f"Fourier Features after {steps} steps")
    axs[i].plot(x, y_pred_p.cpu().detach().numpy(), label=f"Without Fourier Features after {steps} steps")
    axs[i].legend()
    axs[i].set_title(f'After {steps} Training Steps')

plt.tight_layout()
plt.show()
(Output: plots comparing the true sine function with the fits with and without Fourier features after 10, 30, and 4000 training steps.)

Analyzing the Fourier transformation code line by line

Step 1: Initialisation of the vector of random frequencies

B = torch.randn(1, 10).to(device)

Here, we're creating a tensor $B$ of shape (1, 10) filled with random values sampled from a normal distribution. These random values serve as the frequencies for the Fourier features. Represented mathematically, this tensor is:

$$ B = \begin{bmatrix} b_1 & b_2 & \dots & b_{10} \end{bmatrix} $$

where each $b_i$ is a random frequency.

Step 2: Fourier Transformation - projection

Given an input tensor $x_{\text{tensor}}$, we use the Fourier transformation to project it into a higher-dimensional space using sinusoids.

x_fourier = torch.cat([torch.sin(2 * np.pi * x_tensor @ B), torch.cos(2 * np.pi * x_tensor @ B)], dim=1)

Here, we're computing the sine of the product of the input tensor and the frequency tensor $B$. The $@$ symbol represents matrix multiplication. The result is a tensor of sine values corresponding to each frequency in $B$. Mathematically:

$$ x_{\text{sin}} = \left[\sin(2\pi x_1 b_1), \sin(2\pi x_1 b_2), \ldots, \sin(2\pi x_1 b_{10})\right] $$

Similarly, we compute the cosine of the product of the input tensor and the frequency tensor. This results in a tensor of cosine values. Represented mathematically:

$$ x_{\text{cos}} = \left[\cos(2\pi x_1 b_1), \cos(2\pi x_1 b_2), \ldots, \cos(2\pi x_1 b_{10})\right] $$

Step 3: Concatenation

Finally, the sine and cosine transformed values are concatenated along dimension 1 to form the Fourier-transformed input:

$$ x_{\text{fourier}} = [x_{\text{sin}}, x_{\text{cos}}] $$

This transformed input, $x_{\text{fourier}}$, can then be used in subsequent layers of the neural network, enabling it to capture oscillatory patterns more effectively.

In conclusion, Fourier features, as implemented in the provided code, leverage the power of sinusoidal transformations to enrich the input space of neural networks. This often results in enhanced performance, especially in tasks where the data or the solution has inherent oscillatory patterns or periodicities. For example, one of the challenges in using PINNs for PDEs is enforcing boundary and initial conditions. Fourier features, due to their inherent oscillatory nature, can provide a better fit near boundaries and initial conditions, ensuring that the neural network respects these constraints.

Dealing with High-Frequency Signals

One of the inherent strengths of Fourier features is their ability to handle high-frequency signals. The reason lies in the fundamental principles of Fourier analysis. High-frequency signals, by definition, oscillate rapidly. The Fourier transform, which decomposes a function into its constituent sinusoids, is adept at capturing these rapid oscillations by representing them in terms of their sine and cosine components. Thus, when the input data has high-frequency variations, the Fourier features can effectively represent these oscillations in the transformed space, ensuring that the neural network doesn't miss out on these critical patterns.

In tasks involving PDEs solved by PINNs, this becomes especially relevant. Many solutions to PDEs can exhibit rapid changes or oscillatory behaviors, especially in complex domains or under specific boundary conditions. Fourier features ensure that these behaviors are captured and represented adequately, enhancing the ability of the PINN to approximate the solution more accurately.

In conclusion, Fourier features, as implemented in the provided code, leverage the power of sinusoidal transformations to enrich the input space of neural networks. This often results in enhanced performance, especially in tasks where the data or the solution has inherent oscillatory patterns, high-frequency components, or periodicities.

The NN architecture implemented here has both $\beta$-tuning and Fourier features, which facilitate the training process significantly.

In [3]:
seed = 123
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)  # if you have more than one GPU
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = True  # note: full run-to-run determinism usually also requires benchmark = False

def analytical_sol(x, t, y):
    x_plot = x.squeeze(1)
    t_plot = t.squeeze(1)
    X, T = torch.meshgrid(x_plot, t_plot)
    F_xt = y
    fig = plt.figure(figsize=(25, 12))
    fig.suptitle('Analytical Solution of the Heat Equation', fontsize=30)

    ax1 = fig.add_subplot(121, projection='3d')
    ax2 = fig.add_subplot(122)

    ax1.plot_surface(T.numpy(), X.numpy(), F_xt.numpy(), cmap="twilight")
    ax1.set_title('U(x,t)')
    ax1.set_xlabel('t')
    ax1.set_ylabel('x')

    # Rotate the plot by 180 degrees in the x-dimension
    ax1.view_init(elev=45, azim=45)  # You can adjust the 'elev' parameter for desired elevation

    cp = ax2.contourf(T, X, F_xt, 20, cmap="twilight")
    fig.colorbar(cp)  # Add a colorbar to a plot
    ax2.set_title('U(x,t)')
    ax2.set_xlabel('t')
    ax2.set_ylabel('x')
    plt.show()

def approx_sol(x,t,y):
    X,T= x,t
    F_xt = y
    fig = plt.figure(figsize=(25,15))
    fig.suptitle('NN Approximated Solution of the Heat Equation', fontsize=30)
    ax1 = fig.add_subplot(121, projection='3d')
    ax2 = fig.add_subplot(122)

    ax1.plot_surface(T.numpy(), X.numpy(), F_xt.numpy(),cmap="twilight")
    ax1.set_title('U(x,t)')
    ax1.set_xlabel('t')
    ax1.set_ylabel('x')
    # Rotate the plot by 180 degrees in the x-dimension
    ax1.view_init(elev=45, azim=45)  # You can adjust the 'elev' parameter for desired elevation

    cp = ax2.contourf(T,X, F_xt,20,cmap="twilight")
    fig.colorbar(cp) # Add a colorbar to a plot
    ax2.set_title('U(x,t)')
    ax2.set_xlabel('t')
    ax2.set_ylabel('x')
    plt.show()


def contour_comparison_with_errors(x, t, y_analytical, y_approximated):
    x_plot = x.squeeze(1)
    t_plot = t.squeeze(1)
    X, T = torch.meshgrid(x_plot, t_plot)
    
    error_matrix = y_analytical - y_approximated
    pointwise_error = torch.sqrt(error_matrix**2)
    frobenius_norm = torch.norm(error_matrix, p='fro').item()
    
    fig = plt.figure(figsize=(35, 12))
    fig.suptitle('Comparison of Analytical and NN Approximated Solutions', fontsize=30)

    # Analytical Solution
    ax1 = fig.add_subplot(131)
    cp1 = ax1.contourf(T, X, y_analytical, 20, cmap="twilight")
    fig.colorbar(cp1, ax=ax1)
    ax1.set_title('Analytical U(x,t)')
    ax1.set_xlabel('t')
    ax1.set_ylabel('x')

    # Approximated Solution
    ax2 = fig.add_subplot(132)
    cp2 = ax2.contourf(T, X, y_approximated, 20, cmap="twilight")
    fig.colorbar(cp2, ax=ax2)
    ax2.set_title('Approximated U(x,t)')
    ax2.set_xlabel('t')
    ax2.set_ylabel('x')

    # Error Visualization using the Euclidean norm for each point
    ax3 = fig.add_subplot(133)
    cp3 = ax3.contourf(T, X, pointwise_error, 20, cmap="inferno")
    fig.colorbar(cp3, ax=ax3)
    ax3.set_title(f'Pointwise Euclidean Error\nFrobenius Norm: {frobenius_norm:.4f}')
    ax3.set_xlabel('t')
    ax3.set_ylabel('x')
    
    plt.show()

We will approximate the heat equation on the box $[0, 1] \times [0, 2]$ in $(x, t)$, but as a first step we render the analytical solution of the equation in question from Mathematica. There are [x, t] = [200, 100] collocation points for the residual loss, and 40 and 40 points for the initial and boundary conditions, respectively.

In [4]:
# We will approximate the heat equation on the box [0, 1] x [0, 2] in (x, t):
x_min=0.
x_max=1.
t_min=0.
t_max=2.#2*torch.pi#(3.*torch.pi)/4.
#Collocation discretisation of the box, i.e. for the residual term
# total_points_x=200
# total_points_t=100

total_points_x=200
total_points_t=100

#Create mesh
x=torch.linspace(x_min,x_max,total_points_x).view(-1,1) #add dimension
print(f'x shape is {x.shape}')
t=torch.linspace(t_min,t_max,total_points_t).view(-1,1)

# Let's define the analytical solution of our heat equation. It is approximation made up 
# of the first 30 terms obtained by Mathematica as shown above. 

def u(x, t):
    pi = torch.pi
    exp = torch.exp
    sin = torch.sin
    cos = torch.cos

    expression = (
        x * cos(2 * t) +
        0.750034 * exp(-0.98696 * t) * sin(pi * x) +
        0.0126985 * exp(-3.94784 * t) * sin(2 * pi * x) +
        0.0541309 * exp(-8.88264 * t) * sin(3 * pi * x) +
        0.0253758 * exp(-15.7914 * t) * sin(4 * pi * x) +
        0.0210899 * exp(-24.674 * t) * sin(5 * pi * x) +
        0.0149057 * exp(-35.5306 * t) * sin(6 * pi * x) +
        0.0123551 * exp(-48.3611 * t) * sin(7 * pi * x) +
        0.00983617 * exp(-63.1655 * t) * sin(8 * pi * x) +
        0.00840252 * exp(-79.9438 * t) * sin(9 * pi * x) +
        0.00707543 * exp(-98.696 * t) * sin(10 * pi * x) +
        0.00619775 * exp(-119.422 * t) * sin(11 * pi * x) +
        0.00539475 * exp(-142.122 * t) * sin(12 * pi * x) +
        0.00481634 * exp(-166.796 * t) * sin(13 * pi * x) +
        0.00428605 * exp(-193.444 * t) * sin(14 * pi * x) +
        0.00388256 * exp(-222.066 * t) * sin(15 * pi * x) +
        0.00351044 * exp(-252.662 * t) * sin(16 * pi * x) +
        0.00321628 * exp(-285.232 * t) * sin(17 * pi * x) +
        0.00294317 * exp(-319.775 * t) * sin(18 * pi * x) +
        0.00272112 * exp(-356.293 * t) * sin(19 * pi * x) +
        0.00251363 * exp(-394.784 * t) * sin(20 * pi * x) +
        0.00234125 * exp(-435.25 * t) * sin(21 * pi * x) +
        0.00217921 * exp(-477.689 * t) * sin(22 * pi * x) +
        0.00204226 * exp(-522.102 * t) * sin(23 * pi * x) +
        0.00191284 * exp(-568.489 * t) * sin(24 * pi * x) +
        0.00180193 * exp(-616.85 * t) * sin(25 * pi * x) +
        0.00169662 * exp(-667.185 * t) * sin(26 * pi * x) +
        0.00160532 * exp(-719.494 * t) * sin(27 * pi * x) +
        0.00151825 * exp(-773.777 * t) * sin(28 * pi * x) +
        0.00144203 * exp(-830.034 * t) * sin(29 * pi * x) +
        1. * (
            sin(2. * t) * (
                0.252637 * sin(pi * x) -
                0.128324 * sin(2 * pi * x) +
                0.0454747 * sin(3 * pi * x) -
                0.019839 * sin(4 * pi * x) +
                0.0102531 * sin(5 * pi * x) -
                0.00595364 * sin(6 * pi * x) +
                0.00375469 * sin(7 * pi * x) -
                0.00251713 * sin(8 * pi * x) +
                0.00176852 * sin(9 * pi * x) -
                0.00128953 * sin(10 * pi * x) +
                0.00096897 * sin(11 * pi * x) -
                0.000746415 * sin(12 * pi * x) +
                0.000587108 * sin(13 * pi * x) -
                0.000470089 * sin(14 * pi * x) +
                0.000382209 * sin(15 * pi * x) -
                0.000314937 * sin(16 * pi * x) +
                0.000262568 * sin(17 * pi * x) -
                0.000221195 * sin(18 * pi * x) +
                0.000188077 * sin(19 * pi * x) -
                0.000161254 * sin(20 * pi * x) +
                0.000139297 * sin(21 * pi * x) -
                0.000121153 * sin(22 * pi * x) +
                0.000106028 * sin(23 * pi * x) -
                0.0000933193 * sin(24 * pi * x) +
                0.0000825631 * sin(25 * pi * x) -
                0.0000733984 * sin(26 * pi * x) +
                0.0000655414 * sin(27 * pi * x) -
                0.000058767 * sin(28 * pi * x) +
                0.0000528949 * sin(29 * pi * x) -
                0.0000477798 * sin(30 * pi * x)
            ) +
            cos(2. * t) * (
                -0.511949 * sin(pi * x) +
                0.0650094 * sin(2 * pi * x) -
                0.010239 * sin(3 * pi * x) +
                0.00251264 * sin(4 * pi * x) -
                0.000831087 * sin(5 * pi * x) +
                0.000335128 * sin(6 * pi * x) -
                0.000155277 * sin(7 * pi * x) +
                0.0000796995 * sin(8 * pi * x) -
                0.0000442442 * sin(9 * pi * x) +
                0.0000261314 * sin(10 * pi * x) -
                0.0000162276 * sin(11 * pi * x) +
                0.0000105038 * sin(12 * pi * x) -
                7.03982e-6 * sin(13 * pi * x) +
                4.8602e-6 * sin(14 * pi * x) -
                3.4423e-6 * sin(15 * pi * x) +
                2.49295e-6 * sin(16 * pi * x) -
                1.84109e-6 * sin(17 * pi * x) +
                1.38344e-6 * sin(18 * pi * x) -
                1.05574e-6 * sin(19 * pi * x) +
                8.1692e-7 * sin(20 * pi * x) -
                6.40081e-7 * sin(21 * pi * x) +
                5.07247e-7 * sin(22 * pi * x) -
                4.06158e-7 * sin(23 * pi * x) +
                3.28306e-7 * sin(24 * pi * x) -
                2.67692e-7 * sin(25 * pi * x) +
                2.20024e-7 * sin(26 * pi * x) -
                1.82187e-7 * sin(27 * pi * x) +
                1.51896e-7 * sin(28 * pi * x) -
                1.27452e-7 * sin(29 * pi * x) +
                1.0758e-7 * sin(30 * pi * x)
            )
        ) +
        0.00136908 * exp(-888.264 * t) * sin(30 * pi * x)
    )

    return expression

# Create the mesh 
X,T=torch.meshgrid(x.squeeze(1),t.squeeze(1))#[200,100]
print(x.shape)#[200,1]
print(t.shape)#[100,1]
print(X.shape)#[200,100]
print(T.shape)#[200,100]

x_min_tens = torch.Tensor([x_min]) 
x_max_tens = torch.Tensor([x_max]) 
#To get analytical solution obtained via MATHEMATICA
f_real = lambda x, t:  u(x, t)
# Evaluate real solution on the box domain
U_real=f_real(X,T)
print(f'U_real shape is: {U_real.shape}')
analytical_sol(x,t,U_real) # analytical_sol is a plotting helper defined earlier in the notebook
x shape is torch.Size([200, 1])
torch.Size([200, 1])
torch.Size([100, 1])
torch.Size([200, 100])
torch.Size([200, 100])
U_real shape is: torch.Size([200, 100])
/home/lucy/anaconda3/envs/lucy/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525541990/work/aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[Figure: analytical solution U(x,t) of the heat equation on the (t, x) grid]

The class Feeder() manages the mesh together with the boundary and initial conditions. We have an $(x, t)$-grid for $u(x,t)$, so we flatten it along each dimension and concatenate the resulting long vectors into a two-column tensor. After that we sample either collocation points or boundary/initial points. Everything is processed by the class Feeder.

In [5]:
class Feeder():
    def __init__(self, N_IC, N_BC, x_min, x_max, t_min, t_max, tot_pts_x, tot_pts_t, real_func, 
                 left_boundary_func, bottom_boundary_func, top_boundary_func):
        # Create mesh
        self.x = torch.linspace(x_min, x_max, tot_pts_x).view(-1,1)
        self.t = torch.linspace(t_min, t_max, tot_pts_t).view(-1,1)
        # Create the mesh
        self.X, self.T = torch.meshgrid(self.x.squeeze(1), self.t.squeeze(1))  # [200,100]
        # Set the functions as attributes of the class
        self.real_func = real_func
        self.left_boundary_func = left_boundary_func
        self.bottom_boundary_func = bottom_boundary_func
        self.top_boundary_func = top_boundary_func
        self.N_IC = N_IC
        self.N_BC = N_BC

    def func_real(self):
        self.U_real = self.real_func(self.X, self.T)
        return self.x, self.t, self.U_real

    def get_tensors(self, N_residual):
        # Call func_real() with the provided real_func
        self.func_real()

        # Prepare testing data
        x_test = torch.hstack((self.X.T.flatten()[:, None], self.T.T.flatten()[:, None]))  # [200x100 -> 20000,2]
        U_test = self.U_real.T.flatten()[:, None]  # [200x100 -> 20000,1]
        # Domain bounds
        lb = x_test[0]  # first value of the mesh
        ub = x_test[-1]  # last value of the mesh

        # Initial condition (left edge of the grid): t = 0, x_min =< x =< x_max
        left_X = torch.hstack((self.X[:, 0][:, None], self.T[:, 0][:, None]))  # [tot_pts_x, 2]
        left_U = self.left_boundary_func(left_X)  # [tot_pts_x, 1]

        # Bottom Edge: x = x_min, t_min =< t =< t_max (value given by bottom_boundary_func)
        bottom_X = torch.hstack([self.X[0, :][:, None], self.T[0, :][:, None]])  # [tot_pts_t, 2]
        bottom_U = self.bottom_boundary_func(bottom_X)  # [tot_pts_t, 1]

        # Top Edge: x = x_max, t_min =< t =< t_max (value given by top_boundary_func)
        top_X = torch.hstack((self.X[-1, :][:, None], self.T[-1, :][:, None]))  # [tot_pts_t, 2]
        top_U = self.top_boundary_func(top_X)  # [tot_pts_t, 1]

        # Sample N_IC points from the initial condition
        idx = np.random.choice(left_X.shape[0], self.N_IC, replace=False)
        X_IC = left_X[idx, :]  # [N_IC, 2]
        U_IC = left_U[idx, :]  # [N_IC, 1]

        # Stack both spatial boundaries vertically
        X_BC = torch.vstack([bottom_X, top_X])  # [2*tot_pts_t, 2]
        U_BC = torch.vstack([bottom_U, top_U])  # [2*tot_pts_t, 1]

        # Sample N_BC points from the boundaries
        idx = np.random.choice(X_BC.shape[0], self.N_BC, replace=False)
        X_BC = X_BC[idx, :]  # [N_BC, 2]
        U_BC = U_BC[idx, :]  # [N_BC, 1]

        # Stack initial-condition and boundary samples vertically
        X_train_boundaries = torch.vstack([X_IC, X_BC])  # [N_IC + N_BC, 2]
        U_train_boundaries = torch.vstack([U_IC, U_BC])  # [N_IC + N_BC, 1]

        # Collocation points to evaluate the PDE residual (Latin Hypercube Sampling)
        X_train_residual = lb + (ub - lb) * torch.tensor(lhs(2, N_residual)).float()  # [N_residual, 2]
        X_train_total = torch.vstack((X_train_residual, X_train_boundaries))  # [N_residual + N_IC + N_BC, 2]

        # Store tensors to GPU
        X_train_boundaries = X_train_boundaries.float().to(device)  # Training Points (BC)
        U_train_boundaries = U_train_boundaries.float().to(device)  # Training Points (BC)
        X_train_total = X_train_total.float().to(device)  # Collocation Points
        U_hat = torch.zeros(X_train_total.shape[0], 1).to(device)  # zero residual targets, [N_residual + N_IC + N_BC, 1]

        return lb, ub, x_test, U_test, X_train_boundaries, U_train_boundaries, X_train_total, U_hat
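
The collocation points above come from Latin Hypercube Sampling via `lhs` (presumably `from pyDOE import lhs`, imported earlier in the notebook): `lhs(2, N)` returns N stratified samples in the unit square, which are then rescaled to the $(x, t)$ box. A minimal standalone sketch with toy bounds:

from pyDOE import lhs              # assumed import used for the collocation sampling
import torch

lb_demo = torch.tensor([0., 0.])   # toy lower bounds (x_min, t_min)
ub_demo = torch.tensor([1., 2.])   # toy upper bounds (x_max, t_max)
samples = lhs(2, 5)                # 5 stratified points in [0, 1]^2, shape (5, 2)
collocation = lb_demo + (ub_demo - lb_demo) * torch.tensor(samples).float()
print(collocation)                 # 5 collocation points inside the (x, t) box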
In [6]:
def initial_condition_function(left_X):
    return torch.sqrt(left_X[:, 0]).unsqueeze(1)

def bottom_boundary_function(bottom_X):
    return torch.zeros(bottom_X.shape[0], 1)

def top_boundary_function(top_X):
    return torch.cos(2. * top_X[:, 1]).unsqueeze(1)

# Assuming you have a real_func defined somewhere
dane = Feeder(N_IC=40, N_BC=40, x_min=x_min, x_max=x_max, t_min=t_min, t_max=t_max, tot_pts_x=total_points_x, tot_pts_t=total_points_t, real_func=f_real,
                left_boundary_func=initial_condition_function, 
                bottom_boundary_func=bottom_boundary_function, 
                top_boundary_func=top_boundary_function)
# We will use 10000 collocation points
lb, ub, x_test, U_test, X_train_boundaries, U_train_boundaries, X_train_total, U_hat  = dane.get_tensors(10000)
print('x_test_shape: ', x_test.shape)
print('U_test_shape: ', U_test.shape)
print('X_train_boundaries_shape: ', X_train_boundaries.shape)
print('U_train_boundaries_shape: ', U_train_boundaries.shape)
print('X_train_total_shape: ', X_train_total.shape)
print('U_hat_shape: ',  U_hat.shape)
x_test_shape:  torch.Size([20000, 2])
U_test_shape:  torch.Size([20000, 1])
X_train_boundaries_shape:  torch.Size([80, 2])
U_train_boundaries_shape:  torch.Size([80, 1])
X_train_total_shape:  torch.Size([10080, 2])
U_hat_shape:  torch.Size([10080, 1])
In [5]:
class ResBlock(nn.Module):
    def __init__(self, num_layers, num_neurons, activation, tune_beta):
        super(ResBlock, self).__init__()
        
        # Create a list of linear layers with num_layers elements
        self.layers = nn.ModuleList([nn.Linear(num_neurons, num_neurons) for _ in range(num_layers)])
        self.activation = activation

        # If tune_beta is True, initialize beta for each layer in the ResBlock as learnable parameters
        if tune_beta:
            self.beta = nn.Parameter(torch.ones(num_layers, 1))
        else:
            # If tune_beta is False, set beta to a tensor of ones
            self.beta = torch.ones(num_layers, 1).to(device)

    def forward(self, x):
        identity = x
        for idx, layer in enumerate(self.layers):
            # Apply the activation function to the output of each layer scaled by beta
            x = self.activation(self.beta[idx] * layer(x)) + identity
        return x

class DenseResNet(nn.Module):
    def __init__(self, dim_in, dim_out, num_resnet_blocks,
                 num_layers_per_block, num_neurons, activation,
                 fourier_features, m_freqs, sigma, tune_beta, lb, ub):
        super(DenseResNet, self).__init__()
        self.fourier_features = fourier_features
        self.activation = activation
        self.lb = lb
        self.ub = ub

        if fourier_features:
            # Initialize learnable parameter B for Fourier features
            self.B = nn.Parameter(sigma * torch.randn(dim_in, m_freqs).to(device))
            dim_in = 2 * m_freqs

        # Define the first layer as a linear layer followed by activation
        self.first = nn.Sequential(
            nn.Linear(dim_in, num_neurons),
            activation,
        )

        # Create a list of ResBlock modules, each with num_layers_per_block layers
        # Pass the tune_beta flag to each ResBlock
        self.resblocks = nn.ModuleList([
            ResBlock(num_layers_per_block, num_neurons, activation, tune_beta)
            for _ in range(num_resnet_blocks)
        ])

        # Define the last layer as a linear layer
        self.last = nn.Linear(num_neurons, dim_out)

    def forward(self, x):
        ub = self.ub.float().to(device)
        lb = self.lb.float().to(device)
        
        # Perform feature scaling to the input
        x = (x - lb) / (ub - lb)

        if self.fourier_features:
            # Compute the cosine and sine components of the Fourier features
            cosx = torch.cos(torch.matmul(x, self.B)).to(device)
            sinx = torch.sin(torch.matmul(x, self.B)).to(device)
            
            # Concatenate the cosine and sine components along dimension 1
            x = torch.cat((cosx, sinx), dim=1)

        # Forward pass through the first layer
        x = self.first(x)
        
        # Forward pass through each ResBlock in the list
        for resblock in self.resblocks:
            x = resblock(x)

        # Forward pass through the last layer
        out = self.last(x)
        return out
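
For reference, the Fourier-feature branch above maps the rescaled input through a trainable frequency matrix before the first dense layer; with $m$ = `m_freqs` and $B$ initialised from $\sigma\,\mathcal{N}(0,1)$ (and kept trainable, unlike classical random Fourier features):

$$\tilde{x} = \frac{x - x_{lb}}{x_{ub} - x_{lb}}, \qquad \gamma(\tilde{x}) = \big[\cos(\tilde{x} B),\ \sin(\tilde{x} B)\big] \in \mathbb{R}^{2m}, \qquad B \in \mathbb{R}^{2 \times m}.$$

With `m_freqs=29` this gives $2m = 58$ input features, which is why the first Linear layer in the printed model summary further below shows `in_features=58`.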

To visualise the NN architecture (optional; the commented-out snippet below renders the computation graph with make_dot)

In [18]:
# lb = torch.tensor([0.0, 0.0]).to(device)  # Example lb
# ub = torch.tensor([1.0, 1.0]).to(device)  # Example ub

# Fourier_dense_net1 = DenseResNet(dim_in=2, dim_out=1, num_resnet_blocks=3, 
#                         num_layers_per_block=3, num_neurons=25, activation=nn.LeakyReLU(0.65),
#                         fourier_features=True, m_freqs=30, sigma=2*torch.pi, tune_beta=True, lb=lb, ub=ub).to(device)

# # PINN = Fourier_dense_net1.to(device) 
# x = torch.randn(2,2).to(device).requires_grad_(True)
# y = Fourier_dense_net1(x)
# #make_dot(y, params=dict(list(PINN.named_parameters()))).render("Residual_net", format="png")
# make_dot(y, params=dict(list(Fourier_dense_net1.named_parameters()))).render("Residual_PINN", format="svg")

The architecture of the NN is as follows (see the rendered graph image; the printed module summary further below shows the same structure in detail).
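
A lighter-weight alternative, as a minimal sketch (it reuses the DenseResNet class and `device` from above, with toy bounds, and avoids the graphviz dependency), is to print the module tree and count the trainable parameters:

probe_net = DenseResNet(dim_in=2, dim_out=1, num_resnet_blocks=3,
                        num_layers_per_block=3, num_neurons=25, activation=nn.ELU(0.7),
                        fourier_features=True, m_freqs=29, sigma=2, tune_beta=True,
                        lb=torch.tensor([0., 0.]), ub=torch.tensor([1., 2.])).to(device)
print(probe_net)  # same module layout as the Fabryka printout further below
n_trainable = sum(p.numel() for p in probe_net.parameters() if p.requires_grad)
print(f'trainable parameters: {n_trainable}')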

Below is a helper class used during training.

In [23]:
class EarlyStopping:
    def __init__(self, patience=30, verbose=True, delta=0):
        # Initialize EarlyStopping with patience, verbosity, and delta
        self.patience = patience  # Number of steps with no improvement to wait before stopping
        self.verbose = verbose    # Whether to print messages when early stopping occurs
        self.test_loss_history = []  # Use a list to store the history of test losses
        self.delta = delta        # Minimum change in test loss to be considered an improvement

    def __call__(self, test_loss):
        # Calculate a score based on the negative of the test loss (lower loss is better)
        score = -test_loss
        self.test_loss_history.append(score)  # Append the score to the history of test losses

        # If we have more than `patience` results, then look back `patience` results
        if len(self.test_loss_history) > self.patience:
            # Check if the current score is better than the score from `patience` steps ago
            if score < self.test_loss_history[-self.patience] + self.delta:
                if self.verbose:
                    print(f"Early stopping due to no improvement over the last {self.patience} steps.")
                return True  # Early stop condition is met
        return False  # Continue training
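
A minimal usage sketch (the names `max_steps`, `train_one_step` and `compute_test_loss` are hypothetical placeholders; in the training loop below the stopper is called only every 30 iterations, so `patience` effectively counts evaluations rather than raw steps):

stopper = EarlyStopping(patience=30, verbose=True)
for step in range(max_steps):
    train_one_step()                      # hypothetical: one optimizer update
    if step % 30 == 0:
        test_loss = compute_test_loss()   # hypothetical: scalar test loss
        if stopper(test_loss):            # True once there is no improvement over the last `patience` evaluations
            break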

Solving forward and inverse problems with Physics-Informed Neural Networks (PINNs) involves creating a neural network that learns to approximate the solution of the given partial differential equation (PDE) while respecting the known physics (i.e., the differential equation and the boundary/initial conditions).

Forward Problem Algorithm:

For the forward problem, the goal is to find the function $u$ for given data $f$ and specified parameters $\omega$. A typical approach for solving both the forward and inverse problems with PINNs, as seen in the literature, is based on [source]. Here is a simplified version of the forward algorithm:

  1. Initialize Neural Network: Set up a neural network with an appropriate architecture.
  2. Define Loss Function: The loss function incorporates a residual term from the PDE and possibly boundary/initial conditions (the concrete form used below is written out just after this list).
  3. Train Neural Network: Train the neural network to minimize the loss function using optimization algorithms like Adam or L-BFGS.
  4. Evaluate: Evaluate the trained network to obtain the solution $u$ over the domain of interest.
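
For the forward problem implemented below, the loss of step 2 takes the concrete form (this is the `loss_BC + loss_PDE` split computed by `Fabryka.forward_loss`, with mean squared errors):

$$\mathcal{L}(\theta) = \underbrace{\frac{1}{N_b}\sum_{i=1}^{N_b}\Big|N\big(x_b^i, t_b^i;\theta\big) - u_b^i\Big|^2}_{\text{initial/boundary data}} \;+\; \underbrace{\frac{1}{N_f}\sum_{i=1}^{N_f}\Big|N_t\big(x_f^i, t_f^i;\theta\big) - 0.1\, N_{xx}\big(x_f^i, t_f^i;\theta\big)\Big|^2}_{\text{PDE residual}}$$

where $N$ is the network, $(x_b^i, t_b^i, u_b^i)$ are the sampled initial/boundary points and values, and the residual is evaluated at the $N_f$ collocation points (in the implementation the sampled boundary points are appended to this set as well).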
Inverse Problem Algorithm:

For the implementation, see the Inverse Problem section later in this notebook.

In the inverse problem, the aim is to deduce the parameters $\gamma$ from the given data. One approach is to extend the traditional PINN framework by incorporating the unknown parameters into the optimization alongside the network's weights. A Bayesian approach has also been proposed, where a Bayesian Neural Network (BNN) combined with a PINN serves as the prior, and techniques like Hamiltonian Monte Carlo (HMC) or Variational Inference (VI) serve as estimators [source]. Here is a simplified version of the algorithm:

  1. Initialize Neural Network and Parameters: Set up a neural network with an appropriate architecture and initialize the unknown parameters $\gamma$.
  2. Define Loss Function: The loss function incorporates a residual term from the PDE, boundary/initial conditions, and a term related to the discrepancy between the predicted and observed data.
  3. Optimize: Train the neural network and optimize the unknown parameters $\gamma$ to minimize the loss function.
  4. Evaluate: Evaluate the trained network and optimized parameters to obtain the solution $u$ and parameters $\gamma$ over the domain of interest.
In [24]:
X_test=x_test.float().to(device) # the input dataset (complete)
U_test=U_test.float().to(device) # the real solution


class Fabryka(nn.Module):
    def __init__(self, X_train_boundaries, U_train_boundaries, U_hat, X_train_total, total_pts_x, total_pts_t, dim_in, dim_out, num_resnet_blocks,
                 num_layers_per_block, num_neurons, activation,
                 fourier_features, m_freqs, sigma, tune_beta, lb, ub):
        super(Fabryka, self).__init__()

        self.X_train_boundaries = X_train_boundaries
        self.U_train_boundaries = U_train_boundaries
        self.U_hat = U_hat
        self.X_train_total = X_train_total
        self.total_pts_x = total_pts_x
        self.total_pts_t = total_pts_t

        self.loss_function = nn.MSELoss(reduction ='mean')
        # Initialize the DenseResNet inside FCN
        self.dense_res_net = DenseResNet(dim_in, dim_out, num_resnet_blocks,
                                         num_layers_per_block, num_neurons, activation,
                                         fourier_features, m_freqs, sigma, tune_beta, lb=lb, ub=ub)

    def forward_loss(self):
        x_BC= self.dense_res_net(self.X_train_boundaries)
        loss_BC = self.loss_function(x_BC, self.U_train_boundaries) #Loss
        g=self.X_train_total.clone()  # collocation + boundary points for the residual term
        g.requires_grad=True # enable differentiation
        U = self.dense_res_net(g)
        # DERIVATIVE COMPUTATIONS
        U_x_t = autograd.grad(U,g,torch.ones([g.shape[0], 1]).to(device), retain_graph=True, create_graph=True)[0] #first derivatives [U_x, U_t]
        U_xx_tt = autograd.grad(U_x_t,g,torch.ones(g.shape).to(device), create_graph=True)[0] #second derivatives
        U_t=U_x_t[:,[1]]    # select the 2nd column for t (the 1st one is x; remember the input X=[x,t])
        U_xx=U_xx_tt[:,[0]] # select the 1st column for x (the 2nd one is t; remember the input X=[x,t])
        U=U_t - 0.1*U_xx    # PDE residual U_t - 0.1*U_xx
        loss_PDE = self.loss_function(U,self.U_hat) # U_hat=0: the equation sits on the LHS with 0 on the RHS
        loss = loss_BC + loss_PDE
        return loss

    def test_PINN(self, X_test):
      return self.dense_res_net(X_test)


    def __call__(self, steps, lr):

        optimizer = torch.optim.Adam(self.parameters(), lr=lr)

        LOSSES = []
        TEST_LOSSES = []
        ITERACJE = []
        #LEARNING_RATES = []  # Initialization
        iter = 0
        test_loss = torch.tensor([np.Inf]).to(device)
        U_anim = np.zeros((self.total_pts_x, self.total_pts_t, steps))

        early_stopping_test = EarlyStopping(patience=30, verbose=True)
        print('Iter|Training Losses|Test Losses')
        for i in range(steps):
            optimizer.zero_grad()
            loss = self.forward_loss()
            loss.backward()
            optimizer.step()

            # Store the current prediction for the animation
            U_anim[:, :, i] = self.dense_res_net(X_test).reshape(shape=[self.total_pts_t, self.total_pts_x]).transpose(1, 0).detach().cpu()
            iter += 1

            if i % 30 == 0:
                with torch.no_grad():
                    self.loss_function.eval()
                    # Compute test loss
                    test_loss = self.loss_function(self.dense_res_net(X_test), U_test)
                    LOSSES.append(loss.detach().cpu().numpy())
                    TEST_LOSSES.append(test_loss.detach().cpu().numpy())
                    #current_lr = optimizer.param_groups[0]['lr']
                    #LEARNING_RATES.append(current_lr)
                    ITERACJE.append(iter)

                    # Early stopping check
                    if early_stopping_test(test_loss.detach().cpu()):
                        break
                
                print(iter-1, ': ', loss.detach().cpu().numpy(), '---', test_loss.detach().cpu().numpy())#, '--- LR:', current_lr)
        
        return U_anim, ITERACJE, LOSSES, TEST_LOSSES    
In [28]:
seed = 123
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)  # if you have more than one GPU
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = True

fcn_model = Fabryka(X_train_boundaries, U_train_boundaries, U_hat, X_train_total, total_points_x, total_points_t, dim_in=2, dim_out=1, num_resnet_blocks=3,
                        num_layers_per_block=3, num_neurons=25, activation=nn.ELU(0.7),#nn.LeakyReLU(0.65),
                        fourier_features=True, m_freqs=29, sigma=2, tune_beta=True, lb=lb, ub=ub)  # Fill in the other required arguments
PINN = fcn_model.to(device)
steps = 4000  # or whatever number of steps you want

learning_rate = 8e-5
U_anim, ITERACJE, LOSSES, TEST_LOSSES = PINN(steps, learning_rate)
Iter|Training Losses|Test Losses
0 :  0.468925 --- 0.28691
30 :  0.2621652 --- 0.113855176
60 :  0.20304663 --- 0.065380335
90 :  0.16770844 --- 0.050485965
120 :  0.13713512 --- 0.038412724
150 :  0.1098391 --- 0.02873213
180 :  0.086597115 --- 0.021036236
210 :  0.06781949 --- 0.015839787
240 :  0.053009707 --- 0.012410511
270 :  0.041080147 --- 0.010183584
300 :  0.031839874 --- 0.008684642
330 :  0.024757434 --- 0.0076582395
360 :  0.01974354 --- 0.006889947
390 :  0.016145233 --- 0.006218079
420 :  0.013526186 --- 0.0057333764
450 :  0.011637928 --- 0.005431896
480 :  0.009868612 --- 0.005092977
510 :  0.008498156 --- 0.004778018
540 :  0.0074970415 --- 0.004501841
570 :  0.0067845127 --- 0.0042877793
600 :  0.006226059 --- 0.004070964
630 :  0.005765113 --- 0.0038868117
660 :  0.0054019545 --- 0.0037612708
690 :  0.0050892504 --- 0.0035976954
720 :  0.004776342 --- 0.003446025
750 :  0.0045442753 --- 0.0033423544
780 :  0.0043516154 --- 0.00321678
810 :  0.004199932 --- 0.0031565323
840 :  0.004042793 --- 0.0030986436
870 :  0.003882287 --- 0.0030353274
900 :  0.003710757 --- 0.0029827754
930 :  0.0035694437 --- 0.0029258812
960 :  0.0034092527 --- 0.002865794
990 :  0.0032419157 --- 0.0028174834
1020 :  0.003149318 --- 0.0027772353
1050 :  0.0030456046 --- 0.0027627475
1080 :  0.0028909561 --- 0.0027378497
1110 :  0.0028413073 --- 0.0027280853
1140 :  0.0027200207 --- 0.0027100479
1170 :  0.0026494698 --- 0.002683589
1200 :  0.0025702505 --- 0.002667365
1230 :  0.0025124962 --- 0.0026474625
1260 :  0.002448263 --- 0.0026071856
1290 :  0.002377959 --- 0.0025808895
1320 :  0.0023367987 --- 0.0025443875
1350 :  0.002270943 --- 0.0025166022
1380 :  0.0022320156 --- 0.0024998365
1410 :  0.002181276 --- 0.0024441583
1440 :  0.0021435595 --- 0.0024084044
1470 :  0.0020856662 --- 0.0023469864
1500 :  0.0020219139 --- 0.0022983577
1530 :  0.0019888368 --- 0.0022292216
1560 :  0.0019703903 --- 0.0021512657
1590 :  0.001951131 --- 0.0020587128
1620 :  0.0019260589 --- 0.0019524704
1650 :  0.0019119659 --- 0.0018759979
1680 :  0.0018765854 --- 0.0017755636
1710 :  0.0018453137 --- 0.0016931132
1740 :  0.0017868083 --- 0.001613018
1770 :  0.0017254128 --- 0.0015507904
1800 :  0.0016898145 --- 0.0014790513
1830 :  0.001650763 --- 0.0014157251
1860 :  0.0016297187 --- 0.0013635923
1890 :  0.0015918473 --- 0.0013193875
1920 :  0.001563162 --- 0.0012664263
1950 :  0.0015677896 --- 0.0012403495
1980 :  0.0016253946 --- 0.0012094359
2010 :  0.0016668786 --- 0.0011979443
2040 :  0.0016486959 --- 0.0011799183
2070 :  0.001655466 --- 0.0011638915
2100 :  0.0016211409 --- 0.0011522169
2130 :  0.0016085415 --- 0.0011521807
2160 :  0.0015830229 --- 0.0011525482
2190 :  0.0015499955 --- 0.0011422847
2220 :  0.0014853115 --- 0.0011327434
2250 :  0.0013992008 --- 0.0011219531
2280 :  0.0013436937 --- 0.0011108606
2310 :  0.0012901663 --- 0.0011086622
2340 :  0.001253202 --- 0.0011035432
2370 :  0.0012286587 --- 0.0010904134
2400 :  0.0011829687 --- 0.0010685654
2430 :  0.0011353065 --- 0.001038318
2460 :  0.0011126392 --- 0.0010093594
2490 :  0.0010938832 --- 0.0009819814
2520 :  0.001068648 --- 0.0009471277
2550 :  0.0010525509 --- 0.0009117645
2580 :  0.0010656575 --- 0.00087259937
2610 :  0.0010670838 --- 0.0008309442
2640 :  0.0010907757 --- 0.00080858875
2670 :  0.0010618016 --- 0.00078817835
2700 :  0.001003453 --- 0.0007754855
2730 :  0.000997267 --- 0.00077035255
2760 :  0.0009832024 --- 0.0007678535
2790 :  0.0009898666 --- 0.00076933403
2820 :  0.0009720258 --- 0.0007785238
2850 :  0.0009329722 --- 0.0007839675
2880 :  0.00088877714 --- 0.0007940581
2910 :  0.0008495425 --- 0.00080384384
2940 :  0.0008251701 --- 0.0008045991
2970 :  0.00079786486 --- 0.000805407
3000 :  0.00077164016 --- 0.0008069982
3030 :  0.0007444651 --- 0.00080586143
3060 :  0.00072155613 --- 0.000802927
3090 :  0.0006996202 --- 0.000803055
3120 :  0.000687537 --- 0.0008039736
3150 :  0.0006753947 --- 0.000799291
3180 :  0.00066558574 --- 0.0007981512
3210 :  0.0006639491 --- 0.0007970973
3240 :  0.00065873796 --- 0.00079331343
3270 :  0.00065332593 --- 0.00078922353
3300 :  0.0006397837 --- 0.00078767736
3330 :  0.00064285 --- 0.0007852361
3360 :  0.00062098046 --- 0.0007821692
3390 :  0.00062533113 --- 0.0007789852
3420 :  0.0006127054 --- 0.00077238935
3450 :  0.0006183595 --- 0.0007647131
3480 :  0.00062309264 --- 0.00075954146
3510 :  0.00061472144 --- 0.0007544256
3540 :  0.0006060315 --- 0.00074986694
3570 :  0.00061153655 --- 0.00074819825
3600 :  0.0005951432 --- 0.000745387
3630 :  0.00059012405 --- 0.0007451246
3660 :  0.00057858543 --- 0.00074309786
3690 :  0.00057183317 --- 0.0007396285
3720 :  0.0005619305 --- 0.00073659705
3750 :  0.00054851035 --- 0.00073202036
3780 :  0.0005448676 --- 0.0007298937
3810 :  0.0005307372 --- 0.0007244989
3840 :  0.00052222935 --- 0.00072126207
3870 :  0.0005062257 --- 0.00071828277
3900 :  0.00050085003 --- 0.0007160879
3930 :  0.00049519213 --- 0.00071459956
3960 :  0.0004798803 --- 0.00071136537
3990 :  0.00048315214 --- 0.0007084526
In [29]:
fig, (ax1, ax2) = plt.subplots(2,1, figsize=(20,15))
fig.suptitle('Historical Losses', fontsize=30)
ax1.semilogy(ITERACJE[1:], LOSSES[1:], c='deeppink',linewidth=2.8, label='Train loss')
ax1.set_ylabel('Loss')
ax1.set_xlabel('Iterations')
ax1.legend()

ax2.semilogy(ITERACJE[1:], TEST_LOSSES[1:],c='dodgerblue',linewidth=2.8, label='Test loss')
ax2.set_ylabel('Test Loss')
ax2.set_xlabel('Iterations')
ax2.legend()
plt.show()
[Figure: training and test loss histories (log scale)]
In [ ]:
# Save the model parameters
model_path = 'fcn_model.pth'
torch.save(PINN.state_dict(), model_path)
In [27]:
# Create a new instance of the model class
new_fcn_model = Fabryka(
    X_train_boundaries, U_train_boundaries, U_hat, X_train_total,
    total_points_x, total_points_t, dim_in=2, dim_out=1,
    num_resnet_blocks=3, num_layers_per_block=3, num_neurons=25,
    activation=nn.ELU(0.7), fourier_features=True, m_freqs=29,
    sigma=2, tune_beta=True, lb=lb, ub=ub
)
new_PINN = new_fcn_model.to(device)

# Load the saved parameters
model_path = 'fcn_model.pth'
new_PINN.load_state_dict(torch.load(model_path))

# Set the model to evaluation mode
new_PINN.eval()
Out[27]:
Fabryka(
  (loss_function): MSELoss()
  (dense_res_net): DenseResNet(
    (activation): ELU(alpha=0.7)
    (first): Sequential(
      (0): Linear(in_features=58, out_features=25, bias=True)
      (1): ELU(alpha=0.7)
    )
    (resblocks): ModuleList(
      (0): ResBlock(
        (layers): ModuleList(
          (0): Linear(in_features=25, out_features=25, bias=True)
          (1): Linear(in_features=25, out_features=25, bias=True)
          (2): Linear(in_features=25, out_features=25, bias=True)
        )
        (activation): ELU(alpha=0.7)
      )
      (1): ResBlock(
        (layers): ModuleList(
          (0): Linear(in_features=25, out_features=25, bias=True)
          (1): Linear(in_features=25, out_features=25, bias=True)
          (2): Linear(in_features=25, out_features=25, bias=True)
        )
        (activation): ELU(alpha=0.7)
      )
      (2): ResBlock(
        (layers): ModuleList(
          (0): Linear(in_features=25, out_features=25, bias=True)
          (1): Linear(in_features=25, out_features=25, bias=True)
          (2): Linear(in_features=25, out_features=25, bias=True)
        )
        (activation): ELU(alpha=0.7)
      )
    )
    (last): Linear(in_features=25, out_features=1, bias=True)
  )
)
In [28]:
U1=new_PINN.test_PINN(X_test)
x1=X_test[:,0]
t1=X_test[:,1]

arr_x1=x1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_T1=t1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_U1=U1.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()
arr_U_test=U_test.reshape(shape=[total_points_t,total_points_x]).transpose(1,0).detach().cpu()

approx_sol(arr_x1,arr_T1,arr_U1)
[Figure: PINN approximation of U(x,t)]
In [31]:
contour_comparison_with_errors(x, t, U_real, arr_U1)
[Figure: analytical vs. approximated U(x,t) with pointwise Euclidean error]

Inverse Problem

Denote the neural network (NN) function by $N$ and recall that the PDE in question is a model that can be parametrised. We parametrise the assumed equation as $u_t + model[\theta; \alpha] = 0$, with $u(x,t)$ as the hidden solution and $model[\theta; \alpha]$ as a non-linear operator. Note that there are also parameters $\theta$ inside $N$; for the discovery task we therefore have to optimize $\theta$ and $\alpha$ together. By solving the inverse problem, for example from some measured temperature data, we can identify the PDE's parameters characteristic of a given material, and hence identify that material.

Rearranging our heat equation $u_t - 0.1\,u_{xx} = 0$ into the form $u_t + model[\theta; \alpha] = 0$, we use the PINN to obtain $\alpha$ by arranging the parameters $\theta$ inside the NN under the PDE's operator structure.

By the universal approximation theorem, we can use a neural network (NN), $N$, to approximate any function, $u(x,t) \approx N(x,t)$. Since $N$ is a function, we can obtain its derivatives via automatic differentiation. We can then arrange these derivatives to mimic the PDE's operator structure

$$u_t-\alpha u_{xx}=N_t-\alpha N_{xx}=0$$

Define function $f$

$$f(x, t)=u_t\underbrace{-\,\alpha u_{xx}}_{+\, model[\theta;\, \alpha]}=N_t-\alpha N_{xx}$$

and try to

$$f(x,t)\approx 0$$

Here $f$ is the PDE residual; evaluated at the collocation points it becomes the argument of the loss term $\mathcal{L}_{f}$, which is a functional (an operator that takes a function and returns a scalar). If $f \longrightarrow 0$, then we have discovered our $\alpha$ parameter.
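
As a minimal sketch of how $f$ is assembled with automatic differentiation (the same pattern appears in `loss_PDE` of the `Fabryka_Inverse` class below; `net` and `collocation_xt` are hypothetical placeholders, and here $u_{xx}$ is obtained by differentiating $u_x$ on its own):

# `net` is any (x, t) -> u network; `collocation_xt` is an [N_f, 2] tensor of (x, t) points
alpha = nn.Parameter(torch.tensor([0.5], device=device))   # trainable guess for the diffusivity
g = collocation_xt.clone()
g.requires_grad = True
u = net(g)                                                  # N(x, t; theta)
grads = autograd.grad(u, g, torch.ones_like(u), create_graph=True)[0]
u_x, u_t = grads[:, [0]], grads[:, [1]]                     # dN/dx, dN/dt
u_xx = autograd.grad(u_x, g, torch.ones_like(u_x), create_graph=True)[0][:, [0]]  # pure d2N/dx2
f = u_t - alpha * u_xx                                      # residual N_t - alpha*N_xx
loss_f = (f ** 2).mean()                                    # drive f -> 0; alpha is updated through this loss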

To express the dependence on the parameters $\alpha$ and $\theta$ explicitly, and to incorporate the additional loss due to boundary and initial conditions, the total loss function can be written as:

$$ \mathcal{L}(\theta, \alpha) = \mathcal{L}_{f}(\theta, \alpha) + \mathcal{L}_{u=data}(\theta) + \mathcal{L}_{\text{b}}(\theta) $$

Here are the individual components of the total loss function:

  1. Loss Relating to PDE: $$ \mathcal{L}_{f}(\theta, \alpha) = \frac{1}{N_f} \sum _{i=1}^{N_f} \left| f\left(x_n^i,t_n^i; \theta, \alpha\right) \right|^2 $$

  2. Loss Relating to the Data (overall performance of the algorithm): $$ \mathcal{L}_{u=data}(\theta) = \frac{1}{N_f} \sum _{i=1}^{N_f} \left| u\left(x_n^i, t_n^i\right) - N\left(x_n^i,t_n^i; \theta\right) \right|^2 $$

  3. Loss Relating to Boundary and Initial Conditions: $$ \mathcal{L}_{\text{b}}(\theta) = \frac{1}{N_b} \sum _{i=1}^{N_b} \left| u\left(x_b^i, t_b^i\right) - N\left(x_b^i,t_b^i; \theta\right) \right|^2 $$

In the above expressions:

  • $f$ and $N$ are functions parametrized by $\theta$ and $\alpha$.
  • $N_f$ is the total number of collocation/data points, with coordinates $(x_n^i, t_n^i)$.
  • $N_b$ is the total number of boundary and initial condition points.
  • $(x_b^i, t_b^i)$ are the coordinates of the boundary and initial condition points.

The goal is to minimize the total loss $\mathcal{L}$ to obtain the neural network's parameters $\theta=[W_i, b_i]$ and the PDE's parameter $\alpha$.
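
In code, the extra ingredient compared with the forward setup is that $\alpha$ must be exposed to the optimizer as a trainable scalar. A minimal sketch of that mechanism (`net` is a hypothetical placeholder; the `Fabryka_Inverse` class below achieves the same thing by registering `alpha` on its DenseResNet):

alpha = nn.Parameter(torch.tensor([0.5]))        # initial guess for the unknown diffusivity
params = list(net.parameters()) + [alpha]        # optimize theta (network weights) and alpha jointly
optimizer = torch.optim.Adam(params, lr=7e-5)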

Let's recall relevant shapes of tensors.

In [6]:
seed = 123
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)  # if you have more than one GPU
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = True

# We will approximate the heat equation on the box [0, 1] x [0, 2] (x by t):
x_min=0.
x_max=1.
t_min=0.
t_max=2.
#Collocation discretisation of the box, i.e. for the residual term

total_points_x=200
total_points_t=100

#Create mesh
x=torch.linspace(x_min,x_max,total_points_x).view(-1,1) #add dimension
print(f'x shape is {x.shape}')
t=torch.linspace(t_min,t_max,total_points_t).view(-1,1)

# Create the mesh 
X,T=torch.meshgrid(x.squeeze(1),t.squeeze(1))#[200,100]
print(x.shape)#[200,1]
print(t.shape)#[100,1]
print(X.shape)#[200,100]
print(T.shape)#[200,100]

x_min_tens = torch.Tensor([x_min]) 
x_max_tens = torch.Tensor([x_max]) 
#To get analytical solution obtained via MATHEMATICA
f_real = lambda x, t:  u(x, t)
# Evaluate real solution on the box domain
U_real=f_real(X,T)
print(f'U_real shape is: {U_real.shape}')
analytical_sol(x,t,U_real) # analytical_sol is a plotting helper defined earlier in the notebook

print(x.shape,t.shape, U_real.shape)
print(x.shape[0])
print(X.shape,T.shape)
x shape is torch.Size([200, 1])
torch.Size([200, 1])
torch.Size([100, 1])
torch.Size([200, 100])
torch.Size([200, 100])
U_real shape is: torch.Size([200, 100])
[Figure: analytical solution U(x,t) of the heat equation on the (t, x) grid]
torch.Size([200, 1]) torch.Size([100, 1]) torch.Size([200, 100])
200
torch.Size([200, 100]) torch.Size([200, 100])
Training Data

For the inverse problem, we sample points from the PDE's solution, which is given. Again we flatten the $(x,t)$-grid of $u(x,t)$ along each dimension, x and t, and then concatenate the resulting long vectors (keeping the ordering consistent). We also process the initial and boundary conditions, since we know them and can use them; they are not strictly necessary to recover the $\alpha$ parameter, but they are helpful.

In [7]:
def initial_condition_function(left_X):
    return torch.sqrt(left_X[:, 0]).unsqueeze(1)

def bottom_boundary_function(bottom_X):
    return torch.zeros(bottom_X.shape[0], 1)

def top_boundary_function(top_X):
    return torch.cos(2. * top_X[:, 1]).unsqueeze(1)
In [8]:
# Flatten and concatenate to obtain X_true
X_true = torch.cat((X.flatten(start_dim=0).unsqueeze(1), T.flatten(start_dim=0).unsqueeze(1)), dim=1)
print(X.flatten(start_dim=0).shape)

# Domain bounds
lb = X_true[0]  # lower domain bound [0., 0.]
ub = X_true[-1] # upper domain bound [1., 2.]

usol_numpy = U_real.cpu().numpy()
U_true_numpy = usol_numpy.reshape(-1)  # flatten in row-major (C) order, matching the ordering of X_true
U_true = torch.tensor(U_true_numpy, dtype=torch.float32).unsqueeze(1) # Convert back to tensor and add a new dimension

# Left Edge: U(x,0) -> xmin=<x=<xmax; t=0
initial_X = torch.hstack((X[:, 0][:, None], T[:, 0][:, None])) # [200,2]
initial_U = initial_condition_function(initial_X)  # [200,1]

idx = np.random.choice(initial_X.shape[0], 20, replace=False)
X_IC = initial_X[idx, :]  # [20,2]
U_IC = initial_U[idx, :]  # [20,1]

bottom_X = torch.hstack([X[0, :][:, None], T[0, :][:, None]])    # [100,2]
bottom_U = bottom_boundary_function(bottom_X)  # [100,1]

top_X = torch.hstack((X[-1, :][:, None], T[-1, :][:, None]))   # [100,2]
top_U = top_boundary_function(top_X)  # [100,1]

# Stack both spatial boundaries vertically
X_BC = torch.vstack([bottom_X, top_X])  # [200,2]
U_BC = torch.vstack([bottom_U, top_U])  # [200,1]

# Sample 20 boundary points
idx = np.random.choice(X_BC.shape[0], 20, replace=False)
X_BC = X_BC[idx, :]  # [20,2]
U_BC = U_BC[idx, :]  # [20,1]

# Stack initial-condition and boundary samples vertically
X_train_boundaries = torch.vstack([X_IC, X_BC])  # [40,2]
U_train_boundaries = torch.vstack([U_IC, U_BC])  # [40,1]

# Store tensors to GPU
X_train_boundaries = X_train_boundaries.float().to(device)  # Training Points (BC)
U_train_boundaries = U_train_boundaries.float().to(device)  # Training Points (BC)

print('U_true', U_true.shape)

total_points=len(x)*len(t)
print('total_points: ', total_points)
N_f = 1000 #Total number of collocation points

# Obtain random points for interior
id_f = np.random.choice(total_points, N_f, replace=False)# Randomly chosen points for Interior
print('id_f: ', id_f.shape)
X_train_Nu = X_true[id_f]
U_train_Nu= U_true[id_f]

'send them to our GPU'
X_train_Nu = X_train_Nu.float().to(device)
print('X_train_Nu: ', X_train_Nu.shape)
U_train_Nu = U_train_Nu.float().to(device)
print('U_train_Nu: ', U_train_Nu.shape)
X_true = X_true.float().to(device)
print('X_true: ', X_true.shape)
U_true = U_true.float().to(device)
print('U_true: ', U_true.shape)
f_hat = torch.zeros(X_train_Nu.shape[0],1).to(device)
print('f_hat: ', f_hat.shape)
torch.Size([20000])
U_true torch.Size([20000, 1])
total_points:  20000
id_f:  (1000,)
X_train_Nu:  torch.Size([1000, 2])
U_train_Nu:  torch.Size([1000, 1])
X_true:  torch.Size([20000, 2])
U_true:  torch.Size([20000, 1])
f_hat:  torch.Size([1000, 1])
In [11]:
class Fabryka_Inverse(nn.Module):
    def __init__(self, dim_in, dim_out, num_resnet_blocks,
                 num_layers_per_block, num_neurons, activation,
                 fourier_features, m_freqs, sigma, tune_beta, alpha, lb, ub):
        super(Fabryka_Inverse, self).__init__()

        self.loss_function = nn.MSELoss(reduction ='mean')  # Mean squared error loss
        self.iter = 0  # Iteration counter

        # alpha is a parameter that will be learned (related to the PDE)
        self.alpha = torch.tensor([alpha], requires_grad=True).float().to(device)
        self.alpha = nn.Parameter(self.alpha)

        # DenseResNet is a custom network architecture used in this model
        self.dnn = DenseResNet(
            dim_in=dim_in, dim_out=dim_out,
            num_resnet_blocks=num_resnet_blocks,
            num_layers_per_block=num_layers_per_block,
            num_neurons=num_neurons, activation=activation,
            fourier_features=fourier_features, m_freqs=m_freqs,
            sigma=sigma, tune_beta=tune_beta, lb=lb, ub=ub
        ).to(device)

        # Registering alpha as a parameter of dnn
        self.dnn.register_parameter('alpha', self.alpha)

    def loss_data(self,x,y):
        # Loss against real data
        loss_u = self.loss_function(self.dnn(x), y)

        return loss_u
    
    def loss_boundaries(self, X_boundaries, U_boundaries):
        # Compute the loss for both initial and boundary conditions
        loss_b = self.loss_function(self.dnn(X_boundaries), U_boundaries)
        return loss_b

    def loss_PDE(self, X_train_Nu):
        alpha=self.alpha
        #lambda2=self.lambda2  # Uncomment if lambda2 is needed
        g = X_train_Nu.clone()
        g.requires_grad = True  # Setting requires_grad to True for automatic differentiation
        u = self.dnn(g)

        # Calculating gradients and second-order gradients
        u_x_t = autograd.grad(u, g, torch.ones([X_train_Nu.shape[0], 1]).to(device), retain_graph=True, create_graph=True)[0]
        u_xx_tt = autograd.grad(u_x_t, g, torch.ones(X_train_Nu.shape).to(device), create_graph=True)[0]
        u_x = u_x_t[:,[0]]
        u_t = u_x_t[:,[1]]
        u_xx = u_xx_tt[:,[0]]
        f = u_t - (alpha)*u_xx  # Residual of the PDE
        loss_f = self.loss_function(f,f_hat)  # f_hat should be defined or passed as an argument

        return loss_f

    def loss_total(self, X_col, U_col, X_boundaries, U_boundaries):
        # Combine data loss, PDE loss, initial loss, and boundary loss
        loss_u = self.loss_data(X_col, U_col)
        loss_f = self.loss_PDE(X_col)
        loss_b = self.loss_boundaries(X_boundaries, U_boundaries)
        loss_val = loss_u + loss_f + loss_b# + loss_i
        return loss_val
    
    def __call__(self, steps, lr, X_boundaries, U_boundaries):
        # Collecting parameters for optimization
        params = list(self.dnn.parameters())#+ [self.alpha] 
        optimizer = torch.optim.Adam(params, lr=lr)  # Adam optimizer
        iter = 0

        for i in range(steps):
            optimizer.zero_grad()
            # Ensure the correct arguments are passed to loss_total
            loss = self.loss_total(X_train_Nu, U_train_Nu, X_boundaries, U_boundaries)# X_IC, U_IC, X_BC, U_BC)   # If loss_total requires arguments, provide them here
            loss.backward()  # Backpropagation
            optimizer.step()  # Update step

            self.iter += 1
            if self.iter % 100 == 0:
                error_vec, _ = self.test()
                print(f'Iter: {self.iter}, Relative Error(Test): {error_vec.cpu().detach().numpy():.5f}, α_real = [0.1], α_PINN = [{self.alpha.item():.5f}]')

                # print(
                #     'Iter: %d, Relative Error(Test): %.5f , α_real = [0.1], α_PINN = [%.5f]' %
                #     (
                #         self.iter,
                #         error_vec.cpu().detach().numpy(),
                #         self.alpha.item(),
                #     )
                # )
        return loss

    def test(self):
        u_pred = self.dnn(X_true)  # X_true should be defined or passed as an argument
        # Relative L2 Norm of the error (Vector)
        error_vec = torch.linalg.norm((U_true-u_pred),2)/torch.linalg.norm(U_true,2)  # U_true should be defined or passed as an argument
        u_pred = u_pred.cpu().detach().numpy()
        u_pred = np.reshape(u_pred,(x.shape[0],t.shape[0]),order='C')  # x and t should be defined or passed as arguments

        return error_vec, u_pred
In [12]:
#Set default dtype to float32
torch.set_default_dtype(torch.float)
torch.manual_seed(123)
np.random.seed(123)

fcn_instance_inv = Fabryka_Inverse(
    dim_in=2, dim_out=1, num_resnet_blocks=3,
    num_layers_per_block=3, num_neurons=25, activation=nn.ELU(0.853), # nn.Tanh(),#nn.SiLU(),
    fourier_features=True, m_freqs=29, sigma=2, tune_beta=True, alpha=0.5, lb=lb, ub=ub)

fcn_instance_inv.to(device)

# Now call the __call__ method of FCN with steps and lr
steps = 20000
lr = 7e-5
fcn_instance_inv(steps, lr, X_train_boundaries, U_train_boundaries)
Iter: 100, Relative Error(Test): 0.49314, α_real = [0.1], α_PINN = [0.49701]
Iter: 200, Relative Error(Test): 0.44367, α_real = [0.1], α_PINN = [0.49353]
Iter: 300, Relative Error(Test): 0.41400, α_real = [0.1], α_PINN = [0.48472]
Iter: 400, Relative Error(Test): 0.39632, α_real = [0.1], α_PINN = [0.47552]
Iter: 500, Relative Error(Test): 0.38704, α_real = [0.1], α_PINN = [0.46726]
Iter: 600, Relative Error(Test): 0.38473, α_real = [0.1], α_PINN = [0.45966]
Iter: 700, Relative Error(Test): 0.38208, α_real = [0.1], α_PINN = [0.45218]
Iter: 800, Relative Error(Test): 0.38396, α_real = [0.1], α_PINN = [0.44455]
Iter: 900, Relative Error(Test): 0.38294, α_real = [0.1], α_PINN = [0.43705]
Iter: 1000, Relative Error(Test): 0.38051, α_real = [0.1], α_PINN = [0.42975]
Iter: 1100, Relative Error(Test): 0.37638, α_real = [0.1], α_PINN = [0.42251]
Iter: 1200, Relative Error(Test): 0.37121, α_real = [0.1], α_PINN = [0.41525]
Iter: 1300, Relative Error(Test): 0.36857, α_real = [0.1], α_PINN = [0.40799]
Iter: 1400, Relative Error(Test): 0.36418, α_real = [0.1], α_PINN = [0.40074]
Iter: 1500, Relative Error(Test): 0.35960, α_real = [0.1], α_PINN = [0.39336]
Iter: 1600, Relative Error(Test): 0.35303, α_real = [0.1], α_PINN = [0.38587]
Iter: 1700, Relative Error(Test): 0.34590, α_real = [0.1], α_PINN = [0.37833]
Iter: 1800, Relative Error(Test): 0.33516, α_real = [0.1], α_PINN = [0.37080]
Iter: 1900, Relative Error(Test): 0.32468, α_real = [0.1], α_PINN = [0.36333]
Iter: 2000, Relative Error(Test): 0.31428, α_real = [0.1], α_PINN = [0.35578]
Iter: 2100, Relative Error(Test): 0.30544, α_real = [0.1], α_PINN = [0.34796]
Iter: 2200, Relative Error(Test): 0.29518, α_real = [0.1], α_PINN = [0.34007]
Iter: 2300, Relative Error(Test): 0.28093, α_real = [0.1], α_PINN = [0.33218]
Iter: 2400, Relative Error(Test): 0.26707, α_real = [0.1], α_PINN = [0.32443]
Iter: 2500, Relative Error(Test): 0.25326, α_real = [0.1], α_PINN = [0.31657]
Iter: 2600, Relative Error(Test): 0.22366, α_real = [0.1], α_PINN = [0.30899]
Iter: 2700, Relative Error(Test): 0.25305, α_real = [0.1], α_PINN = [0.30166]
Iter: 2800, Relative Error(Test): 0.19551, α_real = [0.1], α_PINN = [0.29508]
Iter: 2900, Relative Error(Test): 0.19321, α_real = [0.1], α_PINN = [0.28886]
Iter: 3000, Relative Error(Test): 0.18534, α_real = [0.1], α_PINN = [0.28289]
Iter: 3100, Relative Error(Test): 0.17522, α_real = [0.1], α_PINN = [0.27699]
Iter: 3200, Relative Error(Test): 0.16447, α_real = [0.1], α_PINN = [0.27120]
Iter: 3300, Relative Error(Test): 0.15459, α_real = [0.1], α_PINN = [0.26554]
Iter: 3400, Relative Error(Test): 0.14742, α_real = [0.1], α_PINN = [0.26021]
Iter: 3500, Relative Error(Test): 0.14590, α_real = [0.1], α_PINN = [0.25494]
Iter: 3600, Relative Error(Test): 0.13252, α_real = [0.1], α_PINN = [0.25012]
Iter: 3700, Relative Error(Test): 0.12318, α_real = [0.1], α_PINN = [0.24562]
Iter: 3800, Relative Error(Test): 0.12032, α_real = [0.1], α_PINN = [0.24120]
Iter: 3900, Relative Error(Test): 0.11548, α_real = [0.1], α_PINN = [0.23689]
Iter: 4000, Relative Error(Test): 0.11327, α_real = [0.1], α_PINN = [0.23256]
Iter: 4100, Relative Error(Test): 0.10514, α_real = [0.1], α_PINN = [0.22823]
Iter: 4200, Relative Error(Test): 0.10277, α_real = [0.1], α_PINN = [0.22403]
Iter: 4300, Relative Error(Test): 0.09879, α_real = [0.1], α_PINN = [0.21981]
Iter: 4400, Relative Error(Test): 0.09483, α_real = [0.1], α_PINN = [0.21571]
Iter: 4500, Relative Error(Test): 0.09154, α_real = [0.1], α_PINN = [0.21180]
Iter: 4600, Relative Error(Test): 0.08835, α_real = [0.1], α_PINN = [0.20801]
Iter: 4700, Relative Error(Test): 0.08535, α_real = [0.1], α_PINN = [0.20438]
Iter: 4800, Relative Error(Test): 0.08167, α_real = [0.1], α_PINN = [0.20072]
Iter: 4900, Relative Error(Test): 0.07881, α_real = [0.1], α_PINN = [0.19696]
Iter: 5000, Relative Error(Test): 0.07706, α_real = [0.1], α_PINN = [0.19321]
Iter: 5100, Relative Error(Test): 0.07444, α_real = [0.1], α_PINN = [0.18957]
Iter: 5200, Relative Error(Test): 0.07229, α_real = [0.1], α_PINN = [0.18604]
Iter: 5300, Relative Error(Test): 0.06919, α_real = [0.1], α_PINN = [0.18267]
Iter: 5400, Relative Error(Test): 0.06683, α_real = [0.1], α_PINN = [0.17951]
Iter: 5500, Relative Error(Test): 0.06408, α_real = [0.1], α_PINN = [0.17641]
Iter: 5600, Relative Error(Test): 0.06177, α_real = [0.1], α_PINN = [0.17339]
Iter: 5700, Relative Error(Test): 0.05926, α_real = [0.1], α_PINN = [0.17037]
Iter: 5800, Relative Error(Test): 0.05713, α_real = [0.1], α_PINN = [0.16732]
Iter: 5900, Relative Error(Test): 0.05508, α_real = [0.1], α_PINN = [0.16428]
Iter: 6000, Relative Error(Test): 0.05348, α_real = [0.1], α_PINN = [0.16122]
Iter: 6100, Relative Error(Test): 0.05168, α_real = [0.1], α_PINN = [0.15852]
Iter: 6200, Relative Error(Test): 0.04923, α_real = [0.1], α_PINN = [0.15595]
Iter: 6300, Relative Error(Test): 0.04762, α_real = [0.1], α_PINN = [0.15361]
Iter: 6400, Relative Error(Test): 0.04583, α_real = [0.1], α_PINN = [0.15141]
Iter: 6500, Relative Error(Test): 0.04447, α_real = [0.1], α_PINN = [0.14929]
Iter: 6600, Relative Error(Test): 0.04289, α_real = [0.1], α_PINN = [0.14717]
Iter: 6700, Relative Error(Test): 0.04125, α_real = [0.1], α_PINN = [0.14514]
Iter: 6800, Relative Error(Test): 0.04028, α_real = [0.1], α_PINN = [0.14314]
Iter: 6900, Relative Error(Test): 0.03912, α_real = [0.1], α_PINN = [0.14127]
Iter: 7000, Relative Error(Test): 0.03783, α_real = [0.1], α_PINN = [0.13933]
Iter: 7100, Relative Error(Test): 0.03700, α_real = [0.1], α_PINN = [0.13740]
Iter: 7200, Relative Error(Test): 0.03614, α_real = [0.1], α_PINN = [0.13568]
Iter: 7300, Relative Error(Test): 0.03513, α_real = [0.1], α_PINN = [0.13400]
Iter: 7400, Relative Error(Test): 0.03455, α_real = [0.1], α_PINN = [0.13242]
Iter: 7500, Relative Error(Test): 0.03375, α_real = [0.1], α_PINN = [0.13075]
Iter: 7600, Relative Error(Test): 0.03313, α_real = [0.1], α_PINN = [0.12915]
Iter: 7700, Relative Error(Test): 0.03271, α_real = [0.1], α_PINN = [0.12759]
Iter: 7800, Relative Error(Test): 0.03197, α_real = [0.1], α_PINN = [0.12636]
Iter: 7900, Relative Error(Test): 0.03117, α_real = [0.1], α_PINN = [0.12534]
Iter: 8000, Relative Error(Test): 0.03068, α_real = [0.1], α_PINN = [0.12454]
Iter: 8100, Relative Error(Test): 0.03031, α_real = [0.1], α_PINN = [0.12373]
Iter: 8200, Relative Error(Test): 0.03002, α_real = [0.1], α_PINN = [0.12296]
Iter: 8300, Relative Error(Test): 0.02957, α_real = [0.1], α_PINN = [0.12192]
Iter: 8400, Relative Error(Test): 0.02923, α_real = [0.1], α_PINN = [0.12087]
Iter: 8500, Relative Error(Test): 0.02905, α_real = [0.1], α_PINN = [0.11988]
Iter: 8600, Relative Error(Test): 0.02880, α_real = [0.1], α_PINN = [0.11908]
Iter: 8700, Relative Error(Test): 0.02807, α_real = [0.1], α_PINN = [0.11845]
Iter: 8800, Relative Error(Test): 0.02780, α_real = [0.1], α_PINN = [0.11773]
Iter: 8900, Relative Error(Test): 0.02686, α_real = [0.1], α_PINN = [0.11713]
Iter: 9000, Relative Error(Test): 0.02622, α_real = [0.1], α_PINN = [0.11662]
Iter: 9100, Relative Error(Test): 0.02566, α_real = [0.1], α_PINN = [0.11597]
Iter: 9200, Relative Error(Test): 0.02514, α_real = [0.1], α_PINN = [0.11523]
Iter: 9300, Relative Error(Test): 0.02491, α_real = [0.1], α_PINN = [0.11455]
Iter: 9400, Relative Error(Test): 0.02482, α_real = [0.1], α_PINN = [0.11410]
Iter: 9500, Relative Error(Test): 0.02478, α_real = [0.1], α_PINN = [0.11366]
Iter: 9600, Relative Error(Test): 0.02462, α_real = [0.1], α_PINN = [0.11316]
Iter: 9700, Relative Error(Test): 0.02482, α_real = [0.1], α_PINN = [0.11282]
Iter: 9800, Relative Error(Test): 0.02445, α_real = [0.1], α_PINN = [0.11284]
Iter: 9900, Relative Error(Test): 0.02437, α_real = [0.1], α_PINN = [0.11213]
Iter: 10000, Relative Error(Test): 0.02421, α_real = [0.1], α_PINN = [0.11130]
Iter: 10100, Relative Error(Test): 0.02334, α_real = [0.1], α_PINN = [0.11064]
Iter: 10200, Relative Error(Test): 0.02303, α_real = [0.1], α_PINN = [0.11023]
Iter: 10300, Relative Error(Test): 0.02333, α_real = [0.1], α_PINN = [0.10994]
Iter: 10400, Relative Error(Test): 0.02322, α_real = [0.1], α_PINN = [0.10957]
Iter: 10500, Relative Error(Test): 0.02301, α_real = [0.1], α_PINN = [0.10925]
Iter: 10600, Relative Error(Test): 0.02239, α_real = [0.1], α_PINN = [0.10867]
Iter: 10700, Relative Error(Test): 0.02201, α_real = [0.1], α_PINN = [0.10830]
Iter: 10800, Relative Error(Test): 0.02205, α_real = [0.1], α_PINN = [0.10818]
Iter: 10900, Relative Error(Test): 0.02194, α_real = [0.1], α_PINN = [0.10814]
Iter: 11000, Relative Error(Test): 0.02141, α_real = [0.1], α_PINN = [0.10783]
Iter: 11100, Relative Error(Test): 0.02149, α_real = [0.1], α_PINN = [0.10753]
Iter: 11200, Relative Error(Test): 0.02119, α_real = [0.1], α_PINN = [0.10705]
Iter: 11300, Relative Error(Test): 0.02094, α_real = [0.1], α_PINN = [0.10675]
Iter: 11400, Relative Error(Test): 0.02028, α_real = [0.1], α_PINN = [0.10655]
Iter: 11500, Relative Error(Test): 0.01995, α_real = [0.1], α_PINN = [0.10631]
Iter: 11600, Relative Error(Test): 0.01956, α_real = [0.1], α_PINN = [0.10620]
Iter: 11700, Relative Error(Test): 0.01960, α_real = [0.1], α_PINN = [0.10615]
Iter: 11800, Relative Error(Test): 0.01922, α_real = [0.1], α_PINN = [0.10623]
Iter: 11900, Relative Error(Test): 0.01937, α_real = [0.1], α_PINN = [0.10619]
Iter: 12000, Relative Error(Test): 0.01883, α_real = [0.1], α_PINN = [0.10623]
Iter: 12100, Relative Error(Test): 0.01955, α_real = [0.1], α_PINN = [0.10618]
Iter: 12200, Relative Error(Test): 0.01898, α_real = [0.1], α_PINN = [0.10596]
Iter: 12300, Relative Error(Test): 0.01891, α_real = [0.1], α_PINN = [0.10574]
Iter: 12400, Relative Error(Test): 0.01894, α_real = [0.1], α_PINN = [0.10554]
Iter: 12500, Relative Error(Test): 0.01886, α_real = [0.1], α_PINN = [0.10535]
Iter: 12600, Relative Error(Test): 0.01911, α_real = [0.1], α_PINN = [0.10527]
Iter: 12700, Relative Error(Test): 0.01904, α_real = [0.1], α_PINN = [0.10498]
Iter: 12800, Relative Error(Test): 0.01845, α_real = [0.1], α_PINN = [0.10460]
Iter: 12900, Relative Error(Test): 0.01861, α_real = [0.1], α_PINN = [0.10393]
Iter: 13000, Relative Error(Test): 0.01838, α_real = [0.1], α_PINN = [0.10306]
Iter: 13100, Relative Error(Test): 0.01846, α_real = [0.1], α_PINN = [0.10265]
Iter: 13200, Relative Error(Test): 0.01822, α_real = [0.1], α_PINN = [0.10255]
Iter: 13300, Relative Error(Test): 0.01778, α_real = [0.1], α_PINN = [0.10244]
Iter: 13400, Relative Error(Test): 0.01758, α_real = [0.1], α_PINN = [0.10241]
Iter: 13500, Relative Error(Test): 0.01714, α_real = [0.1], α_PINN = [0.10236]
Iter: 13600, Relative Error(Test): 0.01715, α_real = [0.1], α_PINN = [0.10248]
Iter: 13700, Relative Error(Test): 0.01674, α_real = [0.1], α_PINN = [0.10240]
Iter: 13800, Relative Error(Test): 0.01659, α_real = [0.1], α_PINN = [0.10229]
Iter: 13900, Relative Error(Test): 0.01646, α_real = [0.1], α_PINN = [0.10212]
Iter: 14000, Relative Error(Test): 0.01670, α_real = [0.1], α_PINN = [0.10207]
Iter: 14100, Relative Error(Test): 0.01622, α_real = [0.1], α_PINN = [0.10198]
Iter: 14200, Relative Error(Test): 0.01576, α_real = [0.1], α_PINN = [0.10189]
Iter: 14300, Relative Error(Test): 0.01607, α_real = [0.1], α_PINN = [0.10174]
Iter: 14400, Relative Error(Test): 0.01596, α_real = [0.1], α_PINN = [0.10191]
Iter: 14500, Relative Error(Test): 0.01608, α_real = [0.1], α_PINN = [0.10199]
Iter: 14600, Relative Error(Test): 0.01634, α_real = [0.1], α_PINN = [0.10201]
Iter: 14700, Relative Error(Test): 0.01626, α_real = [0.1], α_PINN = [0.10190]
Iter: 14800, Relative Error(Test): 0.01584, α_real = [0.1], α_PINN = [0.10180]
Iter: 14900, Relative Error(Test): 0.01570, α_real = [0.1], α_PINN = [0.10150]
Iter: 15000, Relative Error(Test): 0.01517, α_real = [0.1], α_PINN = [0.10134]
Iter: 15100, Relative Error(Test): 0.01494, α_real = [0.1], α_PINN = [0.10128]
Iter: 15200, Relative Error(Test): 0.01463, α_real = [0.1], α_PINN = [0.10112]
Iter: 15300, Relative Error(Test): 0.01447, α_real = [0.1], α_PINN = [0.10093]
Iter: 15400, Relative Error(Test): 0.01445, α_real = [0.1], α_PINN = [0.10085]
Iter: 15500, Relative Error(Test): 0.01540, α_real = [0.1], α_PINN = [0.10077]
Iter: 15600, Relative Error(Test): 0.01430, α_real = [0.1], α_PINN = [0.10065]
Iter: 15700, Relative Error(Test): 0.01416, α_real = [0.1], α_PINN = [0.10054]
Iter: 15800, Relative Error(Test): 0.01386, α_real = [0.1], α_PINN = [0.10048]
Iter: 15900, Relative Error(Test): 0.01378, α_real = [0.1], α_PINN = [0.10034]
Iter: 16000, Relative Error(Test): 0.01371, α_real = [0.1], α_PINN = [0.10027]
Iter: 16100, Relative Error(Test): 0.01362, α_real = [0.1], α_PINN = [0.10034]
Iter: 16200, Relative Error(Test): 0.01379, α_real = [0.1], α_PINN = [0.10017]
Iter: 16300, Relative Error(Test): 0.01329, α_real = [0.1], α_PINN = [0.10008]
Iter: 16400, Relative Error(Test): 0.01325, α_real = [0.1], α_PINN = [0.10000]
Iter: 16500, Relative Error(Test): 0.01324, α_real = [0.1], α_PINN = [0.09998]
Iter: 16600, Relative Error(Test): 0.01308, α_real = [0.1], α_PINN = [0.09998]
Iter: 16700, Relative Error(Test): 0.01292, α_real = [0.1], α_PINN = [0.09998]
Iter: 16800, Relative Error(Test): 0.01322, α_real = [0.1], α_PINN = [0.09997]
Iter: 16900, Relative Error(Test): 0.01369, α_real = [0.1], α_PINN = [0.09996]
Iter: 17000, Relative Error(Test): 0.01318, α_real = [0.1], α_PINN = [0.10005]
Iter: 17100, Relative Error(Test): 0.01337, α_real = [0.1], α_PINN = [0.09998]
Iter: 17200, Relative Error(Test): 0.01289, α_real = [0.1], α_PINN = [0.10006]
Iter: 17300, Relative Error(Test): 0.01313, α_real = [0.1], α_PINN = [0.10008]
Iter: 17400, Relative Error(Test): 0.01303, α_real = [0.1], α_PINN = [0.10017]
Iter: 17500, Relative Error(Test): 0.01282, α_real = [0.1], α_PINN = [0.10019]
Iter: 17600, Relative Error(Test): 0.01277, α_real = [0.1], α_PINN = [0.10015]
Iter: 17700, Relative Error(Test): 0.01338, α_real = [0.1], α_PINN = [0.10009]
Iter: 17800, Relative Error(Test): 0.01383, α_real = [0.1], α_PINN = [0.10007]
Iter: 17900, Relative Error(Test): 0.01266, α_real = [0.1], α_PINN = [0.10004]
Iter: 18000, Relative Error(Test): 0.01274, α_real = [0.1], α_PINN = [0.09995]
Iter: 18100, Relative Error(Test): 0.01272, α_real = [0.1], α_PINN = [0.09993]
Iter: 18200, Relative Error(Test): 0.01282, α_real = [0.1], α_PINN = [0.09977]
Iter: 18300, Relative Error(Test): 0.01321, α_real = [0.1], α_PINN = [0.09969]
Iter: 18400, Relative Error(Test): 0.01295, α_real = [0.1], α_PINN = [0.09957]
Iter: 18500, Relative Error(Test): 0.01306, α_real = [0.1], α_PINN = [0.09958]
Iter: 18600, Relative Error(Test): 0.01302, α_real = [0.1], α_PINN = [0.09958]
Iter: 18700, Relative Error(Test): 0.01321, α_real = [0.1], α_PINN = [0.09961]
Iter: 18800, Relative Error(Test): 0.01304, α_real = [0.1], α_PINN = [0.09964]
Iter: 18900, Relative Error(Test): 0.01267, α_real = [0.1], α_PINN = [0.09969]
Iter: 19000, Relative Error(Test): 0.01283, α_real = [0.1], α_PINN = [0.09979]
Iter: 19100, Relative Error(Test): 0.01263, α_real = [0.1], α_PINN = [0.09977]
Iter: 19200, Relative Error(Test): 0.01282, α_real = [0.1], α_PINN = [0.09980]
Iter: 19300, Relative Error(Test): 0.01257, α_real = [0.1], α_PINN = [0.09984]
Iter: 19400, Relative Error(Test): 0.01226, α_real = [0.1], α_PINN = [0.09985]
Iter: 19500, Relative Error(Test): 0.01266, α_real = [0.1], α_PINN = [0.09986]
Iter: 19600, Relative Error(Test): 0.01250, α_real = [0.1], α_PINN = [0.09969]
Iter: 19700, Relative Error(Test): 0.01235, α_real = [0.1], α_PINN = [0.09955]
Iter: 19800, Relative Error(Test): 0.01243, α_real = [0.1], α_PINN = [0.09968]
Iter: 19900, Relative Error(Test): 0.01239, α_real = [0.1], α_PINN = [0.09968]
Iter: 20000, Relative Error(Test): 0.01231, α_real = [0.1], α_PINN = [0.09961]
Out[12]:
tensor(9.3544e-05, device='cuda:0', grad_fn=<AddBackward0>)
In [13]:
# Save the model parameters
model_path = 'fcn_model_inv.pth'
torch.save(fcn_instance_inv.state_dict(), model_path)

# # Create a new instance of FCN
# loaded_fcn_instance = Fabryka_Inverse(
#     dim_in=2, dim_out=1, num_resnet_blocks=3,
#     num_layers_per_block=3, num_neurons=25, activation=nn.ELU(0.77), # nn.Tanh(),#nn.SiLU(),
#     fourier_features=True, m_freqs=29, sigma=2, tune_beta=True, alpha=0.5, lb=lb, ub=ub
# )

# # Load the model parameters from the file
# loaded_fcn_instance.load_state_dict(torch.load(model_path))

# # Move the model to the desired device
# loaded_fcn_instance.to(device)
In [14]:
error_vec, u_pred = fcn_instance_inv.test()
In [15]:
contour_comparison_with_errors(x, t, U_real, u_pred)
[Figure: contour comparison of the exact solution U_real and the PINN prediction u_pred, with pointwise error maps]
In [16]:
import matplotlib.gridspec as gridspec  # Import GridSpec from matplotlib for specifying grid layout of subplots
from mpl_toolkits.axes_grid1 import make_axes_locatable  # Import make_axes_locatable for managing layout

def solutionplot(u_pred, X_u_train, u_train):  # Plot a solution field together with the training, IC, and BC points

    fig = plt.figure(figsize=(25, 22))  # Create the figure (no throwaway axes, avoiding overlapping-axes warnings)

    gs0 = gridspec.GridSpec(1, 1)  # Create a GridSpec object with 1 row and 1 column
    gs0.update(top=1-0.06, bottom=1-1/3, left=0.15, right=0.85, wspace=0)  # Update GridSpec layout parameters

    ax = fig.add_subplot(gs0[0, 0])  # Create the subplot from the GridSpec layout

    h = ax.imshow(u_pred, interpolation='nearest', cmap='twilight', 
                extent=[T.min(), T.max(), X.min(), X.max()], 
                origin='lower', aspect='auto')  # Display u_pred as an image on the axis
    
    divider = make_axes_locatable(ax)  # Create a divider for the existing axis instance
    cax = divider.append_axes("right", size="5%", pad=0.05)  # Append axes to the right of ax with specified size and padding
    
    fig.colorbar(h, cax=cax)  # Add a colorbar to the right of the image
    
    # Scatter plot for the data points with adjusted marker size
    ax.scatter(X_u_train[:, 1], X_u_train[:, 0], color='k', marker='.', label='Data (%d points)' % (u_train.shape[0]), s=40)  
    # Scatter plot for the initial condition points with adjusted marker size
    ax.scatter(X_IC[:, 1], X_IC[:, 0], color='g', marker='x', label='IC Points (%d points)' % (X_IC.shape[0]), s=200)  
    # Scatter plot for the boundary condition points with adjusted marker size
    ax.scatter(X_BC[:, 1], X_BC[:, 0], color='y', marker='o', label='BC Points (%d points)' % (X_BC.shape[0]), s=200)

    
    ax.legend(loc='upper right')  # Create a legend in the upper right corner of the subplot
    plt.show()  # Display the figure

solutionplot(U_real, X_train_Nu.cpu().detach().numpy(), U_train_Nu)
[Figure: solution field with the training data, initial-condition, and boundary-condition points overlaid]

In this summary, we discuss how Fourier features and Residual Neural Networks improve the training of Physics-Informed Neural Networks (PINNs), examine the lack of mesh invariance observed in PINNs, and point to Neural Operators as a possible remedy.

(a) Utilizing Fourier Features and Residual Neural Networks in PINNs:

Combining Fourier features with Residual Neural Networks (ResNets) is a promising way to improve the training of PINNs for both forward and inverse problems. Fourier features help capture the periodic and multi-scale behavior inherent in many physical systems: they enrich the feature space, allowing the network to learn and generalize across different scales and domains. ResNets, in turn, make deeper networks easier to train by mitigating the vanishing-gradient problem, which helps the network learn hierarchical features and improves convergence. Used together, these techniques yield a more robust and efficient training regime and improve the performance of PINNs on the partial differential equations (PDEs) underlying the physical problems at hand. A minimal sketch of such an architecture is given below.
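To make this concrete, the following is a minimal sketch in PyTorch of how a random Fourier feature embedding can be composed with residual blocks to form a PINN backbone. The class names (FourierFeatures, ResidualBlock, PINNBackbone) and their details are hypothetical and do not reproduce this notebook's actual Fabryka_Inverse implementation.

import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    """Random Fourier feature embedding: x -> [cos(2*pi*x @ B), sin(2*pi*x @ B)]."""
    def __init__(self, dim_in, m_freqs=29, sigma=2.0):
        super().__init__()
        # Fixed (non-trainable) Gaussian frequency matrix; sigma sets the frequency scale
        self.register_buffer("B", sigma * torch.randn(dim_in, m_freqs))

    def forward(self, x):
        proj = 2 * torch.pi * (x @ self.B)
        return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)

class ResidualBlock(nn.Module):
    """Two-layer fully connected block with a skip connection."""
    def __init__(self, width):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(width, width), nn.Tanh(),
                                 nn.Linear(width, width))
        self.act = nn.Tanh()

    def forward(self, h):
        return self.act(h + self.net(h))  # skip connection eases training of deeper stacks

class PINNBackbone(nn.Module):
    """Fourier feature embedding followed by residual blocks and a linear output layer."""
    def __init__(self, dim_in=2, dim_out=1, width=25, num_blocks=3, m_freqs=29, sigma=2.0):
        super().__init__()
        self.embed = FourierFeatures(dim_in, m_freqs, sigma)
        self.inp = nn.Linear(2 * m_freqs, width)
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(num_blocks)])
        self.out = nn.Linear(width, dim_out)

    def forward(self, xt):  # xt: (batch, 2) collocation points (x, t)
        return self.out(self.blocks(torch.tanh(self.inp(self.embed(xt)))))

u_hat = PINNBackbone()(torch.rand(16, 2))  # predicted u at 16 random (x, t) points

The frequency matrix B is fixed at initialization; its scale sigma controls which length scales the embedding emphasizes, while the skip connections keep gradients well-behaved as the number of blocks grows.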

(b) Mesh Invariance Issue in PINNs:

Despite the advancements, a notable drawback of PINNs is their sensitivity to the mesh grid, number of points on the grid, and other hyperparameters. Particularly, the quality of results may deteriorate with alterations in the grid configuration or boundary conditions. This lack of mesh invariance poses a challenge as it impedes the robustness and generalizability of PINNs across varying discretizations and geometries. Mesh invariance is essentially the ability of the model to provide consistent results irrespective of the mesh grid configurations. The absence of this invariance in PINNs could potentially lead to inconsistent or erroneous results across different problem setups.

(c) Alleviating Invariance Issues through Neural Operators:

Neural Operators emerge as a promising avenue to overcome the mesh invariance issue witnessed in PINNs. Unlike traditional PINNs that operate on a fixed grid, Neural Operators are designed to learn mappings between function spaces, thus offering a level of abstraction that transcends fixed discretizations. By operating in this higher-level function space, Neural Operators are capable of achieving mesh invariance, ensuring consistent and accurate results across varying grid configurations and boundary conditions. This distinction underscores a fundamental shift from solving PDEs in a discretized domain (as done in PINNs) to a more generalized approach that operates in continuous function spaces. Although the exact implementation may vary, the core idea remains to establish a learning framework that is less tethered to the discretization scheme and more focused on the inherent structures and properties of the underlying PDEs.
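As an illustration of what "learning in function space" can look like in code, here is a heavily simplified sketch of a single 1-D spectral-convolution layer in the spirit of the Fourier Neural Operator. The class SpectralConv1d and its parameters are hypothetical and not part of this notebook: the layer applies learned weights to the lowest Fourier modes of its input, so the same parameters can be evaluated on grids of different resolution.

import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Minimal 1-D spectral convolution: learn weights on the first `modes` Fourier modes."""
    def __init__(self, in_channels, out_channels, modes=16):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (in_channels * out_channels)
        self.weight = nn.Parameter(
            scale * torch.randn(in_channels, out_channels, modes, dtype=torch.cfloat))

    def forward(self, u):                       # u: (batch, in_channels, n_grid)
        u_hat = torch.fft.rfft(u, dim=-1)       # Fourier transform along the grid
        out_hat = torch.zeros(u.shape[0], self.weight.shape[1], u_hat.shape[-1],
                              dtype=torch.cfloat, device=u.device)
        k = min(self.modes, u_hat.shape[-1])    # keep only the lowest modes
        out_hat[..., :k] = torch.einsum("bix,iox->box", u_hat[..., :k], self.weight[..., :k])
        return torch.fft.irfft(out_hat, n=u.shape[-1], dim=-1)

layer = SpectralConv1d(1, 1, modes=16)
coarse = layer(torch.rand(4, 1, 64))            # same weights on a coarse grid ...
fine = layer(torch.rand(4, 1, 256))             # ... and on a finer grid

Because the learned weights live on Fourier modes rather than on grid points, the same layer can be evaluated on a 64-point and a 256-point grid with identical parameters, which is the discretization invariance that plain PINNs lack.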

In conclusion, the fusion of Fourier features and Residual Neural Networks significantly boosts the efficacy of PINNs in tackling forward and inverse problems. However, the mesh invariance issue remains a hurdle, necessitating further exploration of alternative strategies like Neural Operators to attain a more robust and generalized solution framework for solving PDEs.

Kreyszig 1.1, Metric Spaces

Problem 1. Show that the real line is a metric space.

To demonstrate that the real line is a metric space, we must define a distance function (or metric) on the real line and confirm that it adheres to the properties of a metric.

Let's represent the real line as \(\mathbb{R}\) and consider two points \(x, y \in \mathbb{R}\).

Definition of the Metric (Distance Function):

Define the distance between \(x\) and \(y\) as: \(d(x, y) = |x - y|\) where \(| \cdot |\) symbolizes the absolute value.

We now need to confirm that \(d\) adheres to the three properties of a metric:

  1. Non-negativity: For all \(x, y \in \mathbb{R}\), \(d(x, y) \geq 0\) and \(d(x, y) = 0\) if and only if \(x = y\).

    Proof: \(d(x, y) = |x - y| \geq 0\) because the absolute value is always non-negative. \(d(x, y) = 0\) if and only if \(x - y = 0\) or \(x = y\).

  2. Symmetry: For all \(x, y \in \mathbb{R}\), \(d(x, y) = d(y, x)\).

    Proof: \(d(x, y) = |x - y| = |- (y - x)| = |y - x| = d(y, x)\).

  3. Triangle Inequality: For all \(x, y, z \in \mathbb{R}\), \(d(x, z) \leq d(x, y) + d(y, z)\).

    Proof: Using the properties of absolute values, we have: \(|x - z| = |(x - y) + (y - z)| \leq |x - y| + |y - z|\). Thus, \(d(x, z) \leq d(x, y) + d(y, z)\).

Given that the distance function \(d\) adheres to all three properties of a metric on \(\mathbb{R}\), we can deduce that the real line \(\mathbb{R}\) is a metric space with metric \(d\).


Problem 2. Does \(d(x,y) = (x-y)^2\) define a metric on the set of all real numbers?

To determine if \(d(x,y) = (x-y)^2\) defines a metric on the set of all real numbers, we need to check if it satisfies the three properties of a metric:

  1. Non-negativity: For all \(x, y \in \mathbb{R}\), \(d(x, y) \geq 0\) and \(d(x, y) = 0\) if and only if \(x = y\).

    Proof: \(d(x, y) = (x-y)^2\) is always non-negative since the square of any real number is non-negative. \(d(x, y) = 0\) if and only if \((x-y)^2 = 0\), which implies \(x-y = 0\) or \(x = y\).

  2. Symmetry: For all \(x, y \in \mathbb{R}\), \(d(x, y) = d(y, x)\).

    Proof: \(d(x, y) = (x-y)^2\) and \(d(y, x) = (y-x)^2\). Since \((x-y)^2 = (y-x)^2\), we have \(d(x, y) = d(y, x)\).

  3. Triangle Inequality: For all \(x, y, z \in \mathbb{R}\), \(d(x, z) \leq d(x, y) + d(y, z)\).

    Proof: We need to check whether \((x-z)^2 \leq (x-y)^2 + (y-z)^2\) for all \(x, y, z \in \mathbb{R}\). This inequality does not hold in general. For a counterexample, consider \(x = 0\), \(y = 1\), and \(z = 2\). Then \((x-z)^2 = 4\), while \((x-y)^2 + (y-z)^2 = 1 + 1 = 2\). Clearly, 4 is not less than or equal to 2, so the triangle inequality is not satisfied.

Given that the triangle inequality is not satisfied for the function \(d(x,y) = (x-y)^2\), we can conclude that it does not define a metric on the set of all real numbers.


Problem 3. Show that \(d(x,y)=\sqrt{|x-y|}\) defines a metric on the set of all real numbers.

Define the distance between \(x\) and \(y\) as: \(d(x, y) = \sqrt{|x - y|}\) where \(| \cdot |\) symbolizes the absolute value.

We now need to confirm that \(d\) adheres to the three properties of a metric:

  1. Non-negativity: For all \(x, y \in \mathbb{R}\), \(d(x, y) \geq 0\) and \(d(x, y) = 0\) if and only if \(x = y\).

    Proof: \(d(x, y) = \sqrt{|x - y|}\) is always non-negative since the square root of any non-negative number is non-negative. \(d(x, y) = 0\) if and only if \(\sqrt{|x-y|} = 0\), which implies \(|x-y| = 0\) or \(x = y\).

  2. Symmetry: For all \(x, y \in \mathbb{R}\), \(d(x, y) = d(y, x)\).

    Proof: \(d(x, y) = \sqrt{|x - y|}\) and \(d(y, x) = \sqrt{|y - x|}\). Since \(|x-y| = |y-x|\), we have \(d(x, y) = d(y, x)\).

  3. Triangle Inequality: For all \(x, y, z \in \mathbb{R}\), \(d(x, z) \leq d(x, y) + d(y, z)\).

    Proof: We need to check that \(\sqrt{|x-z|} \leq \sqrt{|x-y|} + \sqrt{|y-z|}\) for all \(x, y, z \in \mathbb{R}\). Since both sides are non-negative, squaring preserves the inequality, so it is equivalent to \(|x-z| \leq |x-y| + 2\sqrt{|x-y|}\sqrt{|y-z|} + |y-z|\). Using the triangle inequality for absolute values, \(|x-z| \leq |x-y| + |y-z|\), and since \(2\sqrt{|x-y|}\sqrt{|y-z|} \geq 0\), we indeed have \(|x-z| \leq |x-y| + 2\sqrt{|x-y|}\sqrt{|y-z|} + |y-z|\), as required.

Given that the distance function \(d\) adheres to all three properties of a metric on \(\mathbb{R}\), we conclude that \(d(x,y) = \sqrt{|x-y|}\) defines a metric on the set of all real numbers.
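As a quick numerical illustration of Problems 2 and 3 (a sanity check, not a proof), the following Python snippet reproduces the counterexample for \(d(x,y) = (x-y)^2\) and randomly samples triples of reals for \(d(x,y) = \sqrt{|x-y|}\):

import numpy as np

# Two candidate distance functions on the real line
d_square = lambda x, y: (x - y) ** 2          # fails the triangle inequality (Problem 2)
d_sqrt = lambda x, y: np.sqrt(np.abs(x - y))  # satisfies it (Problem 3)

# The counterexample from Problem 2: x = 0, y = 1, z = 2
print(d_square(0, 2) <= d_square(0, 1) + d_square(1, 2))  # False: 4 > 2

# Random spot check of the triangle inequality for d_sqrt
rng = np.random.default_rng(0)
a, b, c = rng.uniform(-10, 10, size=(3, 100_000))
violations = np.sum(d_sqrt(a, c) > d_sqrt(a, b) + d_sqrt(b, c) + 1e-12)
print(violations)  # 0 violations, consistent with the proof above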


Problem 4. Find all metrics on a set \(X\) consisting of two points; consisting of one point.

Consider a set \(X\) with two distinct points, \(a\) and \(b\).

For \(d: X \times X \rightarrow \mathbb{R}\) to be a metric, it must satisfy the following properties:

  1. \(d(x, y) \geq 0\) (non-negativity)

  2. \(d(x, y) = 0\) if and only if \(x = y\) (identity of indiscernibles)

  3. \(d(x, y) = d(y, x)\) (symmetry)

  4. \(d(x, z) \leq d(x, y) + d(y, z)\) (triangle inequality)

Given that \(X\) has only two points, the possible distances are:

  1. \(d(a, a) = 0\)

  2. \(d(b, b) = 0\)

  3. \(d(a, b)\) which must be positive.

  4. \(d(b, a) = d(a, b)\)

Thus, for any positive real number \(r\), setting \(d(a, b) = d(b, a) = r\) gives a metric on \(X\): the triangle inequality holds automatically, since every instance of it reduces to \(r \leq r + 0\), \(r \leq 0 + r\), or \(0 \leq 2r\). There are infinitely many such metrics, one for each positive real number \(r\).

Now, for a set \(X\) with a single point, \(a\), the only possible distance is:

  1. \(d(a, a) = 0\)

Given that this satisfies all the properties of a metric, there is only one metric on \(X\) when \(X\) consists of a single point.


Problem 5. Let d be a metric on X. Determine all constants k such that (i) k*d, (ii) d+k is metric on X.

Given a metric \(d\) on \(X\), let's determine the constants \(k\) for which:

  1. \(k \cdot d\) is a metric on \(X\)

  2. \(d + k\) is a metric on \(X\)

For (i) \(k \cdot d\) to be a metric:

  1. \(k \cdot d(x, y) \geq 0\) for all \(x, y \in X\). This requires \(k \geq 0\).

  2. \(k \cdot d(x, y) = 0\) if and only if \(x = y\). If \(k = 0\) (and \(X\) has more than one point), then \(k \cdot d(x, y) = 0\) even for \(x \neq y\), so this requires \(k \neq 0\).

  3. Symmetry is automatically satisfied.

  4. The triangle inequality \(k \cdot d(x,z) \leq k \cdot d(x,y) + k \cdot d(y,z)\) follows from the triangle inequality for \(d\) whenever \(k \geq 0\).

Thus, for (i), \(k\) must be a strictly positive real number, \(k > 0\) (assuming \(X\) has more than one point).

For (ii) \(d + k\) to be a metric:

  1. \((d + k)(x, y) = d(x, y) + k \geq 0\) for all \(x, y \in X\). Taking \(x = y\) shows that \(k \geq 0\) is necessary.

  2. \((d + k)(x, x) = d(x, x) + k = k\), so the identity of indiscernibles forces \(k = 0\). (With \(k = 0\) and \(x \neq y\), \(d(x, y) + k = d(x, y) > 0\), as required.)

  3. Symmetry is automatically satisfied.

  4. The triangle inequality \(d(x,z) + k \leq (d(x,y) + k) + (d(y,z) + k)\) reduces to \(d(x,z) \leq d(x,y) + d(y,z) + k\), which holds for all \(k \geq 0\).

Thus, for (ii), the only admissible constant is \(k = 0\); that is, \(d + k\) is a metric only when \(k = 0\).


Problem 6. We have the sequence space \(l^{\infty}\), where every element of \(X\) is a bounded complex sequence, i.e., \(x = (\xi_1, \xi_2, \ldots)\), and the metric is defined as \(d(x,y) = \sup_{j \in \mathbb{N}} |\xi_j - \eta_j|\). Show that \(d(x,y)\) so defined is indeed a metric.

Consider the sequence space \(l^{\infty}\), where each element of \(X\) is a bounded complex sequence, i.e., \(x = (\xi_1, \xi_2, \ldots)\). The metric is defined as \(d(x,y) = \sup_{j \in \mathbb{N}} |\xi_j - \eta_j|\). Note first that \(d\) is well defined: since \(x\) and \(y\) are bounded, \(|\xi_j - \eta_j| \leq |\xi_j| + |\eta_j|\) is bounded in \(j\), so the supremum is a finite real number.

To show that \(d\) is a metric, we need to verify the following properties:

  1. Non-negativity: - \(d(x, y) \geq 0\) for all \(x, y \in X\) and \(d(x, y) = 0\) if and only if \(x = y\).

    • Proof: The absolute value is always non-negative, and if \(d(x, y) = 0\), then \(\xi_j = \eta_j\) for all \(j\). Conversely, if \(x = y\), then \(d(x, y) = 0\).

  2. Symmetry: - \(d(x, y) = d(y, x)\) for all \(x, y \in X\).

    • Proof: \(d(x, y) = \sup_{j \in \mathbb{N}} |\xi_j - \eta_j| = d(y, x)\).

  3. Triangle Inequality: - For all \(x, y, z \in X\), \(d(x, z) \leq d(x, y) + d(y, z)\).

    • Proof: Using the triangle inequality for absolute values, we have \(|\xi_j - \zeta_j| \leq |\xi_j - \eta_j| + |\eta_j - \zeta_j| \leq d(x,y) + d(y,z)\) for every \(j\). Taking the supremum over all \(j\) on the left-hand side gives \(d(x,z) \leq d(x,y) + d(y,z)\).

Given that the distance function \(d\) satisfies all the properties of a metric, we conclude that it is indeed a metric on \(l^{\infty}\).
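For a concrete illustration of this supremum metric (the sequences below are arbitrary examples chosen here, not part of the problem), one can truncate two bounded sequences to finitely many terms and compute the largest componentwise difference:

import numpy as np

# Truncate two bounded sequences to their first N terms:
#   xi_j  = 1/j          (1, 1/2, 1/3, ...)
#   eta_j = (-1)^j / j   (-1, 1/2, -1/3, ...)
N = 1000
j = np.arange(1, N + 1)
xi = 1.0 / j
eta = (-1.0) ** j / j

d_xy = np.max(np.abs(xi - eta))  # finite-truncation analogue of sup_j |xi_j - eta_j|
print(d_xy)                      # 2.0, attained at j = 1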


Problem 7. Determine the Induced Metric on \(A\)

Consider the space \(l^{\infty}\), which consists of all bounded sequences. If \(A\) is a subspace of \(l^{\infty}\) consisting of all sequences of zeros and ones, then the metric of \(l^{\infty}\) induces a metric on \(A\).

For \(d: A \times A \rightarrow \mathbb{R}\) to be a metric, it must satisfy the following properties:

  1. \(d(x, y) \geq 0\) (non-negativity)

  2. \(d(x, y) = 0\) if and only if \(x = y\) (identity of indiscernibles)

  3. \(d(x, y) = d(y, x)\) (symmetry)

  4. \(d(x, z) \leq d(x, y) + d(y, z)\) (triangle inequality)

Definition of the Induced Metric:

For any two sequences \(x, y\) in \(A\), the distance between them in the induced metric is given by: \(d(x,y) = \sup_{j \in \mathbb{N}} |x_j - y_j|\)

However, since each \(x_j\) and \(y_j\) can only be 0 or 1 in \(A\):

  1. Non-negativity: The absolute difference \(|x_j - y_j|\) can only be 0 (if \(x_j = y_j\)) or 1 (if \(x_j \neq y_j\)). Therefore, the supremum will be 0 if the sequences are the same and 1 if they are different at any position.

    Proof: Each term \(|x_j - y_j|\) is either 0 or 1. If \(x = y\), every term is 0 and the supremum is 0; if \(x \neq y\), the sequences differ at some position, the corresponding term equals 1, and the supremum is 1.

  2. Symmetry: For all sequences \(x, y\) in \(A\), \(d(x, y) = d(y, x)\).

    Proof: The order of the terms in the absolute difference does not affect its value, so \(|x_j - y_j| = |y_j - x_j|\).

  3. Triangle Inequality: For all sequences \(x, y, z\) in \(A\), \(d(x, z) \leq d(x, y) + d(y, z)\).

    Proof: If \(x = z\), then \(d(x, z) = 0\) and the inequality is immediate. If \(x \neq z\), then \(d(x, z) = 1\) and the sequences differ at some position \(j\); at that position, \(y\) must differ from \(x\) or from \(z\) (or both), so \(d(x, y) + d(y, z) \geq 1 = d(x, z)\). (Alternatively, the inequality is inherited directly from \(l^{\infty}\), since \(A\) carries the induced metric.)

In other words, the induced metric on \(A\) is: \(d(x,y) = \begin{cases} 0 & \text{if } x=y \\ 1 & \text{if } x \neq y \end{cases}\)

This is known as the discrete metric. Every pair of distinct sequences in \(A\) is at a distance of 1 from each other, while the distance between a sequence and itself is 0.

Given the properties and proofs outlined above, the induced metric on \(A\) is indeed a metric.


Problem 8. Show that \(\tilde{d}\) defines a metric on the function space \(C[a,b]\)

Consider the function space \(C[a,b]\), which consists of all real-valued continuous functions defined on the closed interval \([a,b]\). Let \(X\) be the set of all such functions, denoted by \(x, y, \dots\), where each function is a mapping from the independent variable \(t\) to the real numbers.

For \(\tilde{d}: X \times X \rightarrow \mathbb{R}\) to be a metric, it must satisfy the following properties:

  1. \(\tilde{d}(x, y) \geq 0\) (non-negativity)

  2. \(\tilde{d}(x, y) = 0\) if and only if \(x = y\) (identity of indiscernibles)

  3. \(\tilde{d}(x, y) = \tilde{d}(y, x)\) (symmetry)

  4. \(\tilde{d}(x, z) \leq \tilde{d}(x, y) + \tilde{d}(y, z)\) (triangle inequality)

Definition of the Metric \(\tilde{d}\):

For any two functions \(x, y\) in \(X\), the distance between them in the metric \(\tilde{d}\) is given by: \(\tilde{d}(x,y) = \int_{a}^{b} |x(t) - y(t)| dt\)

Verification of Metric Properties:

  1. Non-negativity: The absolute value ensures that the integrand is non-negative. Therefore, the integral of a non-negative function is also non-negative.

    Proof: \(\tilde{d}(x,y) = \int_{a}^{b} |x(t) - y(t)| dt \geq 0\)

  2. Identity of indiscernibles: \(\tilde{d}(x, y) = 0\) if and only if \(x = y\).

    Proof: If \(x = y\), then \(x(t) = y(t)\) for all \(t \in [a,b]\), so \(\tilde{d}(x,y) = \int_{a}^{b} 0 \, dt = 0\). Conversely, if \(\tilde{d}(x,y) = 0\), then the continuous non-negative function \(|x(t) - y(t)|\) has zero integral over \([a,b]\); a continuous non-negative function with zero integral must vanish identically, so \(x(t) = y(t)\) for all \(t\), i.e., \(x = y\).

  3. Symmetry: The absolute value function is symmetric, so swapping \(x\) and \(y\) does not change the value of the integrand.

    Proof: \(\tilde{d}(x,y) = \int_{a}^{b} |x(t) - y(t)| dt = \int_{a}^{b} |y(t) - x(t)| dt = \tilde{d}(y,x)\)

  4. Triangle Inequality: Using the properties of integrals and absolute values, we can show that the triangle inequality holds.

    Proof: For any function \(z\) in \(X\), we have: \(|x(t) - z(t)| \leq |x(t) - y(t)| + |y(t) - z(t)|\)

    Integrating both sides over \([a,b]\), we get: \(\int_{a}^{b} |x(t) - z(t)| dt \leq \int_{a}^{b} |x(t) - y(t)| dt + \int_{a}^{b} |y(t) - z(t)| dt\)

    Thus, \(\tilde{d}(x,z) \leq \tilde{d}(x,y) + \tilde{d}(y,z)\)

Given the properties and proofs outlined above, the function \(\tilde{d}\) does indeed define a metric on the function space \(C[a,b]\).
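As a small numerical sanity check of the triangle inequality for this integral metric (an illustration only; the three functions below are arbitrary continuous choices on \([0,1]\)), one can approximate \(\tilde{d}\) with a simple quadrature rule:

import numpy as np

t = np.linspace(0.0, 1.0, 2001)        # discretization of [a, b] = [0, 1]
x = np.sin(2 * np.pi * t)              # three sample continuous functions
y = t ** 2
z = np.cos(3 * t)

def d_tilde(f, g):
    # Riemann-sum approximation of the integral of |f(t) - g(t)| over [0, 1]
    return np.sum(np.abs(f - g)) * (t[1] - t[0])

lhs = d_tilde(x, z)
rhs = d_tilde(x, y) + d_tilde(y, z)
print(lhs <= rhs)                      # True: consistent with the triangle inequality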


Problem 10. Hamming Distance. Let \(X\) be the set of all ordered triples of zeros and ones. Show that \(X\) consists of 8 elements and that a metric \(d\) on \(X\) is defined by \(d(x,y) =\) the number of places where \(x\) and \(y\) have different entries.

Definition of Ordered Triples:

An ordered triple refers to a set of three elements arranged in a specific order. The order of elements in the triple is significant, meaning that the triple (0,1,0) is different from the triple (1,0,0).

Elements of Set X:

Given that each element of the ordered triple can be either a 0 or a 1, we can list all possible ordered triples in set \(X\) as follows:

  1. (0,0,0)

  2. (0,0,1)

  3. (0,1,0)

  4. (0,1,1)

  5. (1,0,0)

  6. (1,0,1)

  7. (1,1,0)

  8. (1,1,1)

Thus, set \(X\) consists of 8 distinct ordered triples.

Definition of the Metric d on X:

For any two ordered triples \(x\) and \(y\) in \(X\), the metric \(d(x,y)\) is defined as the number of positions at which the entries of \(x\) and \(y\) differ.

Example:

Let's consider two ordered triples: \(x = (0,1,0)\) \(y = (1,1,1)\)

Comparing the entries of \(x\) and \(y\) position by position:

  • At the first position: \(x\) has 0 and \(y\) has 1. They are different.

  • At the second position: Both \(x\) and \(y\) have 1. They are the same.

  • At the third position: \(x\) has 0 and \(y\) has 1. They are different.

So, there are 2 positions at which \(x\) and \(y\) have different entries. Therefore, \(d(x,y) = 2\).

In essence, the metric \(d(x,y)\) for the set \(X\) counts the number of "mismatches" between the entries of two ordered triples. The more mismatches there are, the "farther apart" the two ordered triples are considered to be in terms of the metric.


Properties of the Metric d:

To show that \(d\) is a metric, we need to verify the following properties:

  1. Non-negativity: For all \(x, y \in X\), \(d(x, y) \geq 0\).

    Proof: The number of differing positions between two ordered triples is always non-negative.

  2. Identity of Indiscernibles: For all \(x, y \in X\), \(d(x, y) = 0\) if and only if \(x = y\).

    Proof: If \(d(x, y) = 0\), it means there are no differing positions between \(x\) and \(y\), implying \(x = y\).

  3. Symmetry: For all \(x, y \in X\), \(d(x, y) = d(y, x)\).

    Proof: The number of differing positions between \(x\) and \(y\) is the same as the number of differing positions between \(y\) and \(x\).

  4. Triangle Inequality: For all \(x, y, z \in X\), \(d(x, z) \leq d(x, y) + d(y, z)\).

    Proof: Let's consider three ordered triples \(x, y,\) and \(z\) from set \(X\). For each position in the ordered triples:

    • If \(x\) and \(z\) have the same entry, then the contribution to \(d(x, z)\) is 0 for that position.

    • If \(x\) and \(z\) have different entries, then either \(x\) and \(y\) have different entries, or \(y\) and \(z\) have different entries, or both. This means that the sum of the contributions to \(d(x, y)\) and \(d(y, z)\) for that position is at least 1.

    Summing over all positions, we get: \(d(x, z) \leq d(x, y) + d(y, z)\)

    Thus, the triangle inequality is satisfied for the Hamming distance on set \(X\).

Given that the metric \(d\) satisfies all the properties of a metric on \(X\), we can conclude that \(d\) defines a metric on the set \(X\) of all ordered triples of zeros and ones.
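Since \(X\) has only 8 elements, all four axioms can also be verified exhaustively with a short brute-force check in Python (an illustration of the argument above, not a replacement for it):

from itertools import product

X = list(product([0, 1], repeat=3))                     # the 8 ordered triples
d = lambda x, y: sum(a != b for a, b in zip(x, y))      # number of differing places

assert len(X) == 8
assert all(d(x, y) >= 0 for x in X for y in X)                    # non-negativity
assert all((d(x, y) == 0) == (x == y) for x in X for y in X)      # identity of indiscernibles
assert all(d(x, y) == d(y, x) for x in X for y in X)              # symmetry
assert all(d(x, z) <= d(x, y) + d(y, z)
           for x in X for y in X for z in X)                      # triangle inequality
print("All metric axioms hold for the Hamming distance on X.")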


Problem 11. Prove the generalized triangle inequality

\begin{equation*} d(x_1, x_n) \leq d(x_1,x_2) + d(x_2,x_3) + \ldots + d(x_{n-1}, x_n) \end{equation*}

Given a metric space with metric \(d\), we want to prove that for any sequence of points \(x_1, x_2, \ldots, x_n\) in the space:

\begin{equation*} d(x_1, x_n) \leq d(x_1,x_2) + d(x_2,x_3) + \ldots + d(x_{n-1}, x_n) \end{equation*}

To prove this, we'll use induction on \(n\).

Base Case \(n=2\) For \(n = 2\) , the statement is trivially true as:

\begin{equation*} d(x_1, x_2) \leq d(x_1, x_2) \end{equation*}

Inductive Step Assuming our inductive hypothesis, we have:

\begin{equation*} d(x_1, x_k) \leq d(x_1,x_2) + d(x_2,x_3) + \ldots + d(x_{k-1}, x_k) \end{equation*}

Now, consider the distance between \(x_1\) and \(x_{k+1}\). Using the triangle inequality, we can say:

\begin{equation*} d(x_1, x_{k+1}) \leq d(x_1, x_k) + d(x_k, x_{k+1}) \end{equation*}

Here's the breakdown:

  • \(d(x_1, x_{k+1})\) is the total distance from \(x_1\) to \(x_{k+1}\).

  • \(d(x_1, x_k)\) represents the distance from \(x_1\) to \(x_k\).

  • \(d(x_k, x_{k+1})\) is the distance between the points \(x_k\) and \(x_{k+1}\).

The triangle inequality tells us that the direct path from \(x_1\) to \(x_{k+1}\) is shorter than or equal to the sum of the path from \(x_1\) to \(x_k\) and then from \(x_k\) to \(x_{k+1}\).

Now, using our inductive assumption, we can replace \(d(x_1, x_k)\) with the sum of distances between consecutive points up to \(x_k\):

\begin{equation*} d(x_1, x_{k+1}) \leq (d(x_1,x_2) + d(x_2,x_3) + \ldots + d(x_{k-1}, x_k)) + d(x_k, x_{k+1}) \end{equation*}

This equation essentially states that the direct distance from \(x_1\) to \(x_{k+1}\) is less than or equal to the sum of distances when traveling through all intermediate points up to \(x_{k+1}\).

Thus, our inductive step is complete, and the statement holds for \(n = k+1\).

By the principle of mathematical induction, the statement is true for all positive integers \(n\).

Conclusion

The given inequality is a direct consequence of the triangle inequality, and it holds for any sequence of points in a metric space.
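As a simple numerical illustration (not a proof), the following snippet checks the generalized triangle inequality for a random chain of points on the real line with the usual metric \(d(x,y) = |x-y|\):

import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(-5, 5, size=10)                  # a random chain x_1, ..., x_10
direct = abs(pts[0] - pts[-1])                     # d(x_1, x_n)
chain = np.sum(np.abs(np.diff(pts)))               # sum of consecutive distances
print(direct <= chain + 1e-12)                     # True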

Problem 12.

Given the inequality:

\begin{equation*} d(x_1, x_{k+1}) \leq (d(x_1,x_2) + d(x_2,x_3) + \ldots + d(x_{k-1}, x_k)) + d(x_k, x_{k+1}) \end{equation*}

We aim to prove:

\begin{equation*} |d(x,y) - d(z,w)| \leq d(x,z) + d(y,w) \end{equation*}

Proof

Applying the triangle inequality twice (this is the given inequality for the four points \(x, z, w, y\)), we have:

\begin{equation*} d(x,y) \leq d(x,z) + d(z,w) + d(w,y) \end{equation*}

Using symmetry, \(d(w,y) = d(y,w)\), and rearranging:

\begin{equation*} d(x,y) - d(z,w) \leq d(x,z) + d(y,w) \end{equation*}

Interchanging the roles of \((x,y)\) and \((z,w)\) and repeating the argument:

\begin{equation*} d(z,w) \leq d(z,x) + d(x,y) + d(y,w) \end{equation*}

so that

\begin{equation*} d(z,w) - d(x,y) \leq d(x,z) + d(y,w) \end{equation*}

Combining the two results:

\begin{equation*} |d(x,y) - d(z,w)| \leq d(x,z) + d(y,w) \end{equation*}

In other words, if the endpoints \(x\) and \(y\) are moved to \(z\) and \(w\), the distance changes by at most \(d(x,z) + d(y,w)\); in particular, the metric is a continuous function of its arguments.

This completes the proof.


Problem 13. Using triangle inequality show \(|d(x,z)-d(y,z)|\leq d(x,y)\)

  1. Using the Triangle Inequality:

    We know from the triangle inequality that: \(d(x,z) \leq d(x,y) + d(y,z)\) (1)

    Similarly, we have: \(d(y,z) \leq d(y,x) + d(x,z)\) (2)

  2. Rearranging (1):

    From (1), we can rearrange to get: \(d(x,z) - d(y,z) \leq d(x,y)\) (3)

  3. Rearranging (2):

    From (2), using symmetry \(d(y,x) = d(x,y)\), we can rearrange to get: \(d(y,z) - d(x,z) \leq d(x,y)\) (4)

  4. Combining (3) and (4):

    From (3) and (4), we can conclude that: \(-d(x,y) \leq d(x,z) - d(y,z) \leq d(x,y)\)

    This is the definition of the absolute value inequality. Thus, we can rewrite it as: \(|d(x,z) - d(y,z)| \leq d(x,y)\)

And that completes the proof.


Problem 16.

We aim to prove the following inequality using the triangle inequality:

\begin{equation*} d(x,y) \leq d(z,x) + d(z,y) \end{equation*}

Proof

Using the triangle inequality for any three points \(x\), \(y\), and \(z\) in a metric space, we have:

\begin{equation*} d(x,y) \leq d(x,z) + d(z,y) \end{equation*}

This states that the distance between \(x\) and \(y\) is always less than or equal to the sum of the distances from \(x\) to \(z\) and from \(z\) to \(y\). By symmetry (M3), \(d(x,z) = d(z,x)\), so the right-hand side can equally be written as \(d(z,x) + d(z,y)\).

Thus, the desired inequality is proven:

\begin{equation*} d(x,y) \leq d(z,x) + d(z,y) \end{equation*}

A more detailed look

The triangle inequality states that for any three points in a metric space, the direct distance between two of them is always less than or equal to the sum of their distances to the third point.

Given three points \(x\), \(y\), and \(z\):

  • The direct distance between x and y is denoted as:

    \begin{equation*} d(x,y) \end{equation*}
  • The sum of their distances to z is:

    \begin{equation*} d(x,z) + d(z,y) \end{equation*}

Using the triangle inequality, we can conclude:

\begin{equation*} d(x,y) \leq d(x,z) + d(z,y) \end{equation*}

This completes the proof.


Problem 14. Show that (M3) and (M4) can be derived from (M2) together with the inequality \(d(x,y) \leq d(z,x) + d(z,y)\) (the inequality of Problem 16).

Proof:

  1. Deriving (M3):

Given (M2): \(d(x,y) = 0\) if and only if \(x = y\)

Given the modified triangle inequality: \(d(x,y) \leq d(z,x) + d(z,y)\)

To prove (M3): \(d(x,y) = d(y,x)\)

Consider any two points \(x\) and \(y\).

Using the modified inequality with \(z = y\), we get: \(d(x,y) \leq d(y,x) + d(y,y)\)

But from (M2), \(d(y,y) = 0\). So, \(d(x,y) \leq d(y,x)\)

Similarly, applying the modified inequality to the pair \((y,x)\) with \(z = x\), we get: \(d(y,x) \leq d(x,y) + d(x,x) = d(x,y)\)

Combining the two inequalities, we get: \(d(x,y) = d(y,x)\)

This proves (M3).

  2. Deriving (M4):

By (M3), which we have just established, \(d(z,x) = d(x,z)\). Substituting this into the modified inequality gives \(d(x,y) \leq d(x,z) + d(z,y)\), which is exactly the triangle inequality (M4).

Thus, both (M3) and (M4) can be derived from (M2) and the inequality \(d(x,y) \leq d(z,x) + d(z,y)\).


Problem 15. Show that non-negativity of a metric follows from (M2) to (M4).

Proof:

Given the axioms:

(M2): \(d(x,y) = 0\) if and only if \(x = y\)

(M3): \(d(x,y) = d(y,x)\)

(M4): \(d(x,y) \leq d(x,z) + d(z,y)\)

We aim to prove non-negativity, i.e., \(d(x,y) \geq 0\) for all \(x, y\).

Consider any two points \(x\) and \(y\).

Using (M4) for the pair \((x, x)\) with intermediate point \(z = y\), we get: \(d(x,x) \leq d(x,y) + d(y,x)\)

From (M2), we know that \(d(x,x) = 0\). Substituting this in, we get: \(0 \leq d(x,y) + d(y,x)\)

Now, using (M3), we have \(d(y,x) = d(x,y)\), so the right-hand side equals \(2\,d(x,y)\).

Therefore \(0 \leq 2\,d(x,y)\), and dividing by 2 gives \(d(x,y) \geq 0\).

This proves the non-negativity of the metric \(d\).

Thus, non-negativity of a metric follows from (M2) to (M4).