*It’s been a while since my last post, but I have finally come back to getting some things written! I’ve been really involved with work and other important things, but I recently got most of them out of the way, so it is time to get back to the blogging grind! WOOT!*

# Introduction

In the world of calculus, a fundamental operation is the computation of derivatives. The classic definition of a derivative takes the following form:

\begin{align}

\left.\frac{df}{dx}\right|_x = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h} \label{eq1}

\end{align}

This is great and all, but what happens when you try to use equation $\refp{eq1}$ to compute a derivative on a computer? As one might guess, we cannot take a true limit numerically, and setting $h = 0$ outright results in a division by zero, causing some undesirable effects on a computer, one being a derivative estimate that is not at all correct. So how can we begin to estimate the value of a derivative at some location? One of the common approaches is something referred to as **Finite Differences**.

# Finite Differences

## Intro

The reality is that estimating a derivative is much easier than it first appears. Instead of taking the limit $h \rightarrow 0$ in equation $\refp{eq1}$, let us choose $h = \epsilon$, where $\epsilon \ll 1$. Computing the derivative in this fashion produces a basic **Finite Difference** scheme, where the name comes from the fact that we use small but finite changes in a function, based on a small $\epsilon$, to estimate a value for the derivative of that function. Using this approach, we can then just use the classic derivative formulation and approximate the derivative like so:

$$ \left.\frac{df}{dx}\right|_x \approx \frac{f(x+\epsilon) - f(x)}{\epsilon} $$
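To make this concrete, here’s a minimal Python sketch of the forward-difference formula above (the test function $\sin$ and the evaluation point are just assumptions for illustration):

```python
import math

def forward_diff(f, x, eps):
    """Forward-difference approximation of df/dx at x."""
    return (f(x + eps) - f(x)) / eps

# Hypothetical check: derivative of sin at x = 1.0 (true value is cos(1.0))
approx = forward_diff(math.sin, 1.0, 1e-6)
exact = math.cos(1.0)
print(abs(approx - exact))  # small error, roughly proportional to eps
```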

## Error Bounds

As we can imagine, for small values of $\epsilon$, the result is close to the true derivative. However, this is really just an approximation. What’s the theoretical error of this approximation? Well, we can estimate this by using a Taylor Series! This can be done using the following steps:

\begin{align*}

\left.\frac{df}{dx}\right|_x &= \frac{f(x+\epsilon) - f(x)}{\epsilon} + \text{Error} \\

\left.\frac{df}{dx}\right|_x &= \frac{1}{\epsilon} f(x+\epsilon) - \frac{1}{\epsilon} f(x) + \text{Error} \\

\left.\frac{df}{dx}\right|_x &\approx \frac{1}{\epsilon} \left(f(x) + \epsilon \left.\frac{df}{dx}\right|_x + \frac{\epsilon^2}{2!}\left.\frac{d^2f}{dx^2}\right|_x \right) - \frac{1}{\epsilon} f(x) + \text{Error} \\

\left.\frac{df}{dx}\right|_x &\approx \left.\frac{df}{dx}\right|_x + \frac{\epsilon}{2!}\left.\frac{d^2f}{dx^2}\right|_x + \text{Error} \\

\text{Error} &\approx -\frac{\epsilon}{2!}\left.\frac{d^2f}{dx^2}\right|_x = O(\epsilon)

\end{align*}

What we get from this result is that the true error is approximately proportional to the value of $\epsilon$. So basically, if you cut the value of $\epsilon$ in half, you should expect the error of the derivative approximation to be cut in half. That’s pretty neat!
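We can check this halving behavior numerically. The following sketch (with an assumed test function and point) computes the forward-difference error at some $\epsilon$ and at $\epsilon/2$ and takes the ratio:

```python
import math

def forward_diff(f, x, eps):
    # Basic forward-difference estimate of df/dx
    return (f(x + eps) - f(x)) / eps

x, exact = 1.0, math.cos(1.0)
err1 = abs(forward_diff(math.sin, x, 1e-3) - exact)  # error at eps
err2 = abs(forward_diff(math.sin, x, 5e-4) - exact)  # error at eps / 2
print(err1 / err2)  # close to 2, consistent with O(eps) error
```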

## Deriving New Schemes – Taylor Series Approach

So now that we have proven we can approximate a derivative with this scheme, we’re done and it’s time to wrap things up… Oh wait, you want to know if we can do better than this scheme? Well that’s an interesting thought. Let’s try something. First, let’s assume we can approximate a derivative using the following weighted average:

$$ \left.\frac{df}{dx}\right|_{x_i} \approx \sum_{j=n}^{m} a_j f(x_i + jh)$$

where we are finding the derivative at some location $x_i$ and where $n$ and $m$ are integers one can choose such that $n \lt m$. Given this, let’s also state that the Taylor Series of some $f(x_i + jh)$ can be written out like so:

$$ f(x_i + jh) = f_{i+j} = f_i + \sum_{k=1}^{\infty} \frac{(jh)^k}{k!}f_i^{(k)}$$

What’s interesting is that each $f_{i+j}$ in the weighted average shares a similar Taylor Series, differing only in the $(jh)^{k}$ coefficient of each term. We can use this pattern to set up a system of equations, as follows:

\begin{align}

\left.\frac{df}{dx}\right|_{x_i} &\approx \sum_{j=n}^{m} a_j f(x_i + jh) \nonumber \\

\left.\frac{df}{dx}\right|_{x_i} &\approx \sum_{j=n}^{m} a_j \left( f_i + \sum_{k=1}^{\infty} \frac{(jh)^k}{k!}f_i^{(k)} \right) \nonumber \\

\left.\frac{df}{dx}\right|_{x_i} &\approx \left(\sum_{j=n}^{m} a_j\right)f_i + \sum_{j=n}^{m} a_j\sum_{k=1}^{\infty} \frac{(jh)^k}{k!}f_i^{(k)} \nonumber \\

\left.\frac{df}{dx}\right|_{x_i} &\approx \left(\sum_{j=n}^{m} a_j\right)f_i + \sum_{k=1}^{\infty} \left(\sum_{j=n}^{m} a_j j^{k}\right)\frac{h^{k}}{k!}f_i^{(k)} \label{eq_s}

\end{align}

If we match the coefficients of each term on both sides of equation $\refp{eq_s}$, we end up with the following set of equations to solve for the $(m-n+1)$ weights:

\begin{align*}

0 &= \left(\sum_{j=n}^{m} a_j\right) \\

1 &= \left(\sum_{j=n}^{m} a_j j\right)h \\

0 &= \left(\sum_{j=n}^{m} a_j j^{2}\right) \\

0 &= \left(\sum_{j=n}^{m} a_j j^{3}\right) \\

&\vdots \\

0 &= \left(\sum_{j=n}^{m} a_j j^{m-n}\right)

\end{align*}

If we solve this system of equations, we obtain the weights for the weighted average form of the Finite Difference scheme such that we zero out all but one of the $(m-n+1)$ terms in the truncated Taylor Series. This zeroing out of terms typically results in a numerical scheme of order $O(h^{m-n})$ for first order derivative approximations, though the exact order can be found by obtaining the first nonzero Taylor Series term found in the $\text{Error}$ after you plug in the values for $\left\lbrace a_j\right\rbrace$.
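As a sketch of how one might solve this system numerically, the snippet below (the function name `fd_weights` is my own invention) builds the coefficient matrix whose $k$-th row holds $j^k$ and solves for the weights. Note that it returns the dimensionless products $a_j h$, so divide by $h$ to get the actual stencil coefficients:

```python
import numpy as np

def fd_weights(n, m):
    """Solve the system above for first-derivative weights {a_j}, j = n..m.
    Returned values are the dimensionless products a_j * h."""
    js = np.arange(n, m + 1)
    p = m - n + 1
    A = np.vander(js, p, increasing=True).T.astype(float)  # row k holds j^k
    b = np.zeros(p)
    b[1] = 1.0  # pick off the first-derivative term; zero out the rest
    return np.linalg.solve(A, b)

print(fd_weights(-1, 1))  # → [-0.5  0.   0.5], the central difference
```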

### Example 1

As an example, let’s choose the case where $n = -1$ and $m = 1$. Using these values, we end up with the following system of equations:

\begin{align*}

0 &= a_{-1} + a_{0} + a_{1}\\

\frac{1}{h} &= -a_{-1} + a_{1}\\

0 &= a_{-1} + a_{1}

\end{align*}

We will solve this set of equations analytically as an example, but typically you’d want to compute these schemes numerically using some Linear Algebra routines. Based on these equations, we can first see from the third equation that $a_{-1} = -a_{1}$. Thus, by the second equation, $a_{1} = \frac{1}{2h}$ and in turn $a_{-1} = -\frac{1}{2h}$. Plugging the values for $a_{-1}$ and $a_{1}$ into the first equation results in $a_{0} = 0$. Thus, our resulting Finite Difference scheme, known as a Second Order Central Difference, is:

$$ \left.\frac{df}{dx}\right|_{x_i} = \frac{f_{i+1} - f_{i-1}}{2h} $$

That’s pretty convenient! What’s interesting with this setup is that we can just as easily compute Finite Differences for more than just a single first order derivative.
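We can sanity-check the derived scheme against a function with a known derivative. A minimal sketch (the test function and step size are just illustrative choices):

```python
import math

def central_diff(f, x, h):
    # The scheme derived above: (f_{i+1} - f_{i-1}) / (2h)
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Check against a known derivative: d/dx exp(x) = exp(x)
x, h = 0.5, 1e-4
err = abs(central_diff(math.exp, x, h) - math.exp(x))
print(err)  # tiny, consistent with the scheme's second-order accuracy
```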

### Example 2

To show how Finite Difference schemes can be derived for more complicated expressions, let’s try a second example. So how about we try to compute a Finite Difference scheme to estimate the following quantity:

\begin{align}

\alpha\left.\frac{df}{dx}\right|_{x_i} + \beta\left.\frac{d^2f}{dx^2}\right|_{x_i} \approx \sum_{j=-2}^{2} a_j f(x_i + jh) \label{ex2}

\end{align}

If we use the right-hand side of equation $\refp{eq_s}$ from earlier and equate it to the left-hand side of equation $\refp{ex2}$, we end up with the following:

\begin{align*}

0 &= \left(\sum_{j=-2}^{2} a_j\right) \\

\alpha &= \left(\sum_{j=-2}^{2} a_j j\right)h \\

\beta &= \left(\sum_{j=-2}^{2} a_j j^{2}\right)\frac{h^2}{2!} \\

0 &= \left(\sum_{j=-2}^{2} a_j j^{3}\right) \\

0 &= \left(\sum_{j=-2}^{2} a_j j^{4}\right)

\end{align*}

If we expand the various series in the equations above and then write all these equations in matrix form, we end up with the following matrix equations to solve for the unknown coefficients $\left\lbrace a_j \right\rbrace$:

\begin{align}

\begin{pmatrix}

1 & 1 & 1 & 1 & 1 \\

-2 & -1 & 0 & 1 & 2 \\

4 & 1 & 0 & 1 & 4 \\

-8 & -1 & 0 & 1 & 8 \\

16 & 1 & 0 & 1 & 16

\end{pmatrix}

\begin{pmatrix}

a_{-2} \\

a_{-1} \\

a_{0} \\

a_{1} \\

a_{2}

\end{pmatrix}

=

\begin{pmatrix}

0 \\

\frac{\alpha}{h} \\

\frac{2\beta}{h^2} \\

0 \\

0

\end{pmatrix}

\label{ex2_mat}

\end{align}

After solving this set of equations using whatever method you prefer (numerically, symbolically, by hand, etc.), one is able to employ the scheme in whatever problems are needed. As one can see from this example, the main thing that changes in this formulation is the vector on the right-hand side of the matrix equation, where you express which derivative quantities you want the Finite Difference scheme to approximate. So as you can see, it’s really not too difficult to develop a Finite Difference scheme!
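As a sketch, here’s how equation $\refp{ex2_mat}$ might be solved with NumPy. The choice $\alpha = 0$, $\beta = 1$, $h = 1$ is just an illustrative assumption, and it recovers the classic fourth-order five-point stencil for the second derivative:

```python
import numpy as np

# Coefficient matrix from the equation above: row k holds j^k for j = -2..2
js = np.arange(-2, 3)
A = np.vstack([js**k for k in range(5)]).astype(float)

# Assumed example: pure second derivative (alpha = 0, beta = 1) with h = 1
h, alpha, beta = 1.0, 0.0, 1.0
b = np.array([0.0, alpha / h, 2.0 * beta / h**2, 0.0, 0.0])

a = np.linalg.solve(A, b)
print(a)  # the classic five-point second-derivative stencil
```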

## Deriving New Schemes – Lagrange Interpolation Approach

Now, obtaining Finite Differences in this way is actually not the only approach. One potentially more straightforward way is to build Finite Difference schemes using Lagrange Interpolation. Essentially, the idea is to use Lagrange Interpolation to build an interpolant based on the number of points you wish to use to approximate the derivative. Then, you just take whatever derivatives you need of this interpolant to obtain the derivative you want! So to jump into it, we can write out our Lagrange Interpolant in 1-D, $\hat{f}(x)$, as the following, given we are evaluating our function $f(\cdot)$ at $x_j = x_i + jh \;\; \forall j \in \left\lbrace n, n+1, \cdots, m-1, m \right\rbrace$:

\begin{align}

\hat{f}(x) = \sum_{j=n}^{m} f(x_j) \prod_{k=n,k\neq j}^m \frac{(x-x_k)}{(x_j - x_k)}

\end{align}

Given this expression, we can then compute the derivatives we wish to approximate, evaluate the results at $x_i$, and obtain our Finite Difference scheme. For example, let’s revisit Example 1. First, we can expand the interpolant into the following, using $n=-1$ and $m=1$:

\begin{align}

\hat{f}(x) = f(x_{-1})\frac{(x-x_{0})}{(x_{-1} - x_{0})}\frac{(x-x_{1})}{(x_{-1} - x_{1})} +

f(x_{0})\frac{(x-x_{-1})}{(x_{0} - x_{-1})}\frac{(x-x_{1})}{(x_{0} - x_{1})} +

f(x_{1})\frac{(x-x_{0})}{(x_{1} - x_{0})}\frac{(x-x_{-1})}{(x_{1} - x_{-1})}

\end{align}

We then take the derivative once, resulting in the expression below:

\begin{align}

\frac{d\hat{f}}{dx}(x) = f(x_{-1})\frac{(x-x_{0}) + (x-x_{1})}{(x_{-1} - x_{1})(x_{-1} - x_{0})} +

f(x_{0})\frac{(x-x_{-1}) + (x-x_{1})}{(x_{0} - x_{-1})(x_{0} - x_{1})} +

f(x_{1})\frac{(x-x_{0}) + (x-x_{-1})}{(x_{1} - x_{0})(x_{1} - x_{-1})}

\end{align}

We then evaluate this derivative expression at $x_i$ and simplify the numerators and denominators, resulting in the following:

\begin{align}

\frac{d\hat{f}}{dx}(x_i) &= f(x_{-1})\frac{-h}{(-2h)(-h)} +

f(x_{0})\frac{0}{(h)(-h)} +

f(x_{1})\frac{h}{(h)(2h)} \nonumber \\

\frac{d\hat{f}}{dx}(x_i) &= f(x_{1})\frac{1}{2h} - f(x_{-1})\frac{1}{2h} \nonumber \\

\frac{d\hat{f}}{dx}(x_i) &= \frac{f(x_{1}) - f(x_{-1})}{2h} \nonumber \\

\frac{d\hat{f}}{dx}(x_i) &= \frac{f_{i+1} - f_{i-1}}{2h} \label{ex1_v2}

\end{align}

As we can see looking at equation $\refp{ex1_v2}$, the Lagrange Interpolant reproduced a Second Order Central Difference scheme at some location $x_i$, showing there’s more than one approach to generating a Finite Difference scheme. But now knowing a Lagrange Interpolant can help derive these schemes, there’s something quite interesting we can now understand with respect to Finite Differences.
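If you have SymPy handy, this whole derivation can be automated symbolically. The sketch below builds the three-point interpolant, differentiates it once, and evaluates at the center node (taken as $x = 0$ for convenience):

```python
import sympy as sp

x, h = sp.symbols('x h', positive=True)
f = sp.Function('f')
nodes = [-h, 0, h]  # x_{-1}, x_0, x_1, with the evaluation point at x_0 = 0

# Build the Lagrange interpolant and differentiate it once
interp = sum(
    f(xj) * sp.prod([(x - xk) / (xj - xk) for xk in nodes if xk != xj])
    for xj in nodes
)
d = sp.simplify(sp.diff(interp, x).subs(x, 0))
print(d)  # equals the central difference (f(h) - f(-h)) / (2h)
```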

### Runge’s Phenomenon and Finite Differences

In model building, there exists a phenomenon named Runge’s Phenomenon, which essentially shows that fitting a polynomial whose degree is on the order of the number of data points results in oscillations between the points due to overfitting. An example of this phenomenon can be seen below.

As one can see from the graphic, the polynomial based on Lagrange Interpolation goes through each data point, but exhibits large deviations between points as it moves away from the center. These oscillations actually result in large errors in derivatives, which makes sense if you look at the picture. If we compare the slopes at the rightmost point on the plot, for example, we can see the slope of the true function is much smaller in magnitude than the slope based on the Lagrange Interpolation.

This phenomenon often occurs when data points are roughly equally spaced, though it can be shown that placing the data differently (like using Chebyshev points) can greatly mitigate the occurrence of Runge’s Phenomenon. Additionally, the distance between points makes a large difference: the larger the distances between points, the worse the problem becomes. The example plot below shows how the fit using Lagrange Interpolation, even for a high order polynomial, does fine when the distances between points are small.

The improved fit using Lagrange interpolation makes sense because the true function approaches linear behavior between data points as the distance between points shrinks, which results in approximate fits being quite accurate.

Now, with respect to Finite Differences: since they can be modeled using Lagrange Interpolation, we can see that derivatives based on high order fits (i.e., Finite Differences using many points in the weighted average) can pick up substantial error, especially if the distances between the points aren’t very small. This property makes Finite Differences trickier to use successfully if you want to implement, for example, a $9^{th}$ order accurate Finite Difference scheme to estimate a first order derivative.

However, if you do manage to make the step size, $h$, particularly small, you will likely obtain a fine approximation using a high order Finite Difference scheme. Even so, it is important to validate that the scheme is providing the sort of error one expects, especially since finite-precision floating-point arithmetic can generate its own set of problems (which won’t be covered here).
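A quick validation sweep like the following sketch (the test function and step sizes are assumptions) shows both effects at once: the error falls as $h$ shrinks, then grows again once roundoff dominates:

```python
import math

def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Error of the central difference for an assumed test case: sin(x) at x = 1
x, exact = 1.0, math.cos(1.0)
errors = {}
for h in (1e-1, 1e-3, 1e-12):
    errors[h] = abs(central_diff(math.sin, x, h) - exact)
    print(f"h = {h:.0e}   error = {errors[h]:.2e}")
# Error shrinks with h at first, but blows back up at h = 1e-12 from roundoff
```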

# Conclusion

In this post, we covered some fundamental theory and examples revolving around the derivation of Finite Differences. In future posts, we will investigate the use of Finite Differences in various problems to get a feel for their value in computational mathematics.

For those interested, I recommend working out some of the math, deriving some error terms for some different Finite Difference formulas, and taking the mathematical equations and trying to build some code around them… aka be like Bender.

For further investigation into the fundamentals of Finite Differences and the related topics covered here, I recommend the book *Fundamentals of Engineering Numerical Analysis* by **Parviz Moin**. It covers a lot of topics fairly well for anyone first investigating the subject of Numerical Analysis who may not be a mathematician (aka this book isn’t super mathematically rigorous). It is a good read nonetheless!


So, how would you turn equation 5 into a 3-D formulation? Or, just in general, any multivariate formulation?

So I will note that I plan to do an independent blog post that focuses on Lagrange Interpolation where this would be covered, but the simple approach would be the following equation:

\begin{align*}

\hat{f}(x,y,z) &= \sum_{j=n}^{m} f(x_j,y_j,z_j) \prod_{k=n,k\neq j}^m \frac{(x-x_k)}{(x_j - x_k)} \frac{(y-y_k)}{(y_j - y_k)} \frac{(z-z_k)}{(z_j - z_k)} \\

\hat{f}(x,y,z) &= \sum_{j=n}^{m} f(x_j,y_j,z_j) \mathcal{L}_{j}(x,y,z)

\end{align*}

One can see this is valid because if you evaluate the set of Lagrange Polynomials, aka $\lbrace \mathcal{L}_{j}(x,y,z) \rbrace_{j=n}^{m}$, at say $(x_k,y_k,z_k)$, you will find $\mathcal{L}_{j}(x_k,y_k,z_k) = \delta_{jk}$, where $\delta_{jk}$ is the Kronecker Delta and is defined as:

\begin{align*}

\delta_{jk} &= \begin{cases}

1 & j = k \\

0 & j \neq k

\end{cases}

\end{align*}

This Kronecker Delta property is one that defines the Lagrange Polynomials, thus making the formula above correct.
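As a quick 1-D illustration of this property (the nodes below are an arbitrary assumption), one can evaluate each Lagrange basis polynomial at every node and watch the identity matrix appear:

```python
# Nodes are an arbitrary assumption for illustration
nodes = [-1.0, 0.0, 1.0, 2.0]

def lagrange_basis(j, x):
    """The j-th 1-D Lagrange polynomial built on the nodes above."""
    out = 1.0
    for k, xk in enumerate(nodes):
        if k != j:
            out *= (x - xk) / (nodes[j] - xk)
    return out

# Evaluating basis j at node k yields the Kronecker delta, delta_jk
table = [[round(lagrange_basis(j, xk)) for xk in nodes]
         for j in range(len(nodes))]
print(table)  # rows of the identity matrix
```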