Partial differential equations are solved numerically with the finite element method. The idea of this method is to approximate the PDE with a system of linear equations. Solving these equations requires a lot of memory, sometimes too much for a single workstation to handle; this is a situation that might call for domain decomposition. Domain decomposition methods split the domain into $n$ subdomains and solve problems on the subdomains instead of globally. The subproblems can be solved sequentially, and each subproblem only requires about $1/n$ times as much memory as the original problem. In principle, this allows very memory-intensive problems to be solved on even small workstations, although in practice the number of subdomains is limited because computation time grows with it.

The problem with increasing computation time is alleviated in a cluster setup, where each subdomain can be sent to a separate server and all the subproblems can be solved in parallel. Doing this, however, requires some extra forethought in the implementation of the domain decomposition method, and there are different ways in which it can be done. This post explores how to implement Schwarz type domain decomposition methods using the domain decomposition subpackage in FEMAddOns, a subpackage that I wrote while doing an internship at Wolfram Research. We will first look at how to implement the sequential versions of multiplicative Schwarz and additive Schwarz with functions from the package's private context. We will then implement a new domain decomposition method that is not included in the package: a version of multiplicative Schwarz that has been parallelized with the help of the graph coloring functions in the IGraph/M package.

Feel free to skip ahead to the next part if you already know how domain decomposition methods work. If not, read on for a non-mathematical description of Schwarz type algorithms. There is a more mathematical treatment in the documentation that comes with the package. We start by importing the packages that we need:

We now define a mesh that we are going to use as an example throughout this text.

In a Schwarz type domain decomposition algorithm, we start by splitting this mesh into a number of overlapping regions. In the following picture, there are four different regions, with darker patches in between them that signify overlap. Each region can be thought of as a graph of connected elements, and an overlap of 0.05 indicates that we want an overlap of five percent of the graph diameter of that graph, beyond the edge where the two regions would have met had there been no overlap.

It is easier to see that a region can be represented as a graph when zooming in:

We see that this subregion, like the other subregions, consists of triangles. Each triangle has two or three adjacent triangles. The graph that we are interested in is the one where each triangle is a node and there is an edge between each pair of adjacent triangles.
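To make this concrete, here is a small illustrative sketch of the triangle graph and its diameter, the quantity against which the overlap fraction is measured. The post's actual code is in the Wolfram Language; this Python sketch with a made-up four-triangle strip is only meant to show the construction.

```python
from collections import deque

# Toy triangulation: each triangle is a tuple of three node indices.
triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]

# Two triangles are adjacent when they share an edge (two nodes).
adjacency = {i: set() for i in range(len(triangles))}
for i, t1 in enumerate(triangles):
    for j, t2 in enumerate(triangles):
        if i < j and len(set(t1) & set(t2)) == 2:
            adjacency[i].add(j)
            adjacency[j].add(i)

def eccentricity(start):
    """Longest shortest-path distance from `start`, found by BFS."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

# The graph diameter is the largest eccentricity over all triangles;
# an overlap of 0.05 asks for 5% of this distance.
diameter = max(eccentricity(i) for i in adjacency)
print(diameter)  # the strip of four triangles forms a chain: diameter 3
```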

The boundaries of the original problem are called "physical boundaries" and the boundaries of a subregion that do not belong to the set of physical boundaries are called "artificial boundaries." The overlap means that the artificial boundaries of a subregion lie inside other subregions. The multiplicative Schwarz algorithm proceeds in the following way:

- Make a guess for the solution of the PDE. That is, give a numerical value to each incident in the mesh (each corner of the triangles).
- Solve the PDE over subregion 1. The boundary conditions on the physical boundaries remain the same as for the original problem. We supply Dirichlet conditions for the artificial boundaries, setting the value at those boundaries to the value of the corresponding incidents in the overlapping subregions.
- After we have solved the PDE over subregion 1, we need to update the Dirichlet conditions on the artificial boundaries that overlap subregion 1.
- We do the same thing for subregion 2, then 3, then 4. After this, we go back and start again with subregion 1. Continuing like this, the Dirichlet conditions on the artificial boundaries will approach the solution of the original problem. When the Dirichlet conditions don't change much from one iteration to the next, we say that the algorithm has converged.
- When the Dirichlet conditions have converged, we use the solutions on the subregions to get the solution of the original problem.
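The steps above can be sketched on a toy problem. The following is not the package's implementation (which is in the Wolfram Language); it is a minimal Python/NumPy sketch of multiplicative Schwarz applied to the 1D Poisson problem $-u'' = 1$ on $(0, 1)$ with $u(0) = u(1) = 0$, using two overlapping subdomains and a direct solve on each.

```python
import numpy as np

# -u'' = 1 on (0, 1), u(0) = u(1) = 0; exact solution u(x) = x(1 - x)/2.
n = 99                     # number of interior grid points
h = 1.0 / (n + 1)
u = np.zeros(n + 2)        # current global guess, including boundary values

# Two overlapping ranges of interior points; points 40..60 are the overlap.
subdomains = [(1, 60), (40, n)]

def solve_local(lo, hi, left_bc, right_bc):
    """Direct solve of -u'' = 1 on grid points lo..hi with Dirichlet data."""
    m = hi - lo + 1
    A = (np.diag(np.full(m, 2.0)) - np.diag(np.ones(m - 1), 1)
         - np.diag(np.ones(m - 1), -1)) / h**2
    b = np.ones(m)
    b[0] += left_bc / h**2     # known boundary neighbors move to the RHS
    b[-1] += right_bc / h**2
    return np.linalg.solve(A, b)

for iteration in range(50):
    for lo, hi in subdomains:
        # Artificial boundary values come from the current global guess;
        # multiplicative: the guess is updated right after each local solve.
        u[lo:hi + 1] = solve_local(lo, hi, u[lo - 1], u[hi + 1])

x = np.linspace(0.0, 1.0, n + 2)
exact = x * (1 - x) / 2
print(np.max(np.abs(u - exact)))  # tiny: the iteration has converged
```

In a real run one would of course stop as soon as the boundary data stops changing, rather than after a fixed number of sweeps.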

Multiplicative Schwarz is not readily parallelizable because we have to solve the problem over each subregion sequentially, but we shall see later how to parallelize multiplicative Schwarz using graph coloring. There is another very similar algorithm, called additive Schwarz, which is inherently parallelizable. The only difference between the two algorithms is that additive Schwarz solves the problem in all of the subregions before it updates the Dirichlet conditions on the artificial boundaries. This change makes additive Schwarz parallelizable because each subregion can be solved for independently. The number of subregions can be adapted to the number of servers available. Should there be eight servers available, for example, we could create eight subregions and send one subregion to each server. The problem with additive Schwarz is that it converges more slowly: because the boundary conditions are updated less often, information does not travel between the subregions as fast as in multiplicative Schwarz. This is our motivation for parallelizing multiplicative Schwarz with graph coloring: in this way we get both the ability to distribute the workload over several servers and the superior convergence rate of multiplicative Schwarz.

## Test problem

The following problem will be used throughout this text to test our algorithms. It is the Laplace equation over a Menger mesh, with Dirichlet conditions on all edges. The mesh is separated into six different regions, also called subdomains. Before we describe how to implement domain decomposition methods, we start by using the implementation that comes with the package to see how well domain decomposition performs on this problem (this is also done in the documentation).

We see that solving the original problem directly takes approximately six times as much memory as solving it using domain decomposition with six subdomains. The solutions are almost exactly the same:

This is what the solution looks like:

## Template (Multiplicative Schwarz)

The package's private context includes a high-level API that makes it easy to implement Schwarz type algorithms. We will start by exposing this API and then present an implementation of the multiplicative Schwarz algorithm, which will serve as a template for how to implement other Schwarz algorithms. The following code puts the private functions that we need in the global context:

Next, we initialize some variables that will hold the state of the solutions as we iterate. Our starting guess is that the solution is zero everywhere. This is a bad guess, which means that we will have to do many iterations before the algorithm converges, but it works for our purpose in this article. In a real case, one would typically solve the problem over a coarse mesh, as coarse as the memory of a single workstation allows, and use that as the starting guess. Since we are only experimenting, however, a bad guess is alright.

`restrictedPDESpec` is required to ensure that boundary conditions given by the user only apply to physical boundaries. For example, a Dirichlet condition such as `DirichletCondition[u[x,y] == 0, x > 3]` could apply to artificial boundaries as well as physical boundaries. `restrictedPDESpec` changes the conditions so that the affected nodes must both have `x` larger than 3 and be part of the physical boundary.
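The idea of the restriction can be illustrated with a small sketch. The node sets and coordinates below are hypothetical, and the actual `restrictedPDESpec` operates on Wolfram Language PDE specifications rather than Python predicates; the sketch only shows the conjunction of the user's condition with a physical-boundary test.

```python
# Hypothetical node sets for one subdomain.
physical_boundary = {0, 1, 2, 3}     # nodes on the original domain's boundary
artificial_boundary = {10, 11, 12}   # nodes created by the overlap cut
node_x = {0: 4.0, 1: 2.5, 2: 3.5, 3: 1.0, 10: 5.0, 11: 3.2, 12: 0.5}

user_predicate = lambda node: node_x[node] > 3   # the user's "x > 3"

def restricted_predicate(node):
    # Apply the user's condition only on the physical boundary, so that
    # artificial-boundary nodes (even those with x > 3) are excluded.
    return user_predicate(node) and node in physical_boundary

candidates = physical_boundary | artificial_boundary
affected = sorted(node for node in candidates if restricted_predicate(node))
print(affected)  # [0, 2]: nodes 10 and 11 satisfy x > 3 but are artificial
```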

Now we are ready to solve the PDE. The main part of the implementation is a loop within a loop, just as our outline of the algorithm suggests. The inner loop iterates over the subdomains and does three things for each of them:

- It updates the boundary condition, using values from the other subdomains.
- It solves the problem using these boundary conditions.
- It updates the global solution so that the new values are available to the other subdomains when they update their boundary conditions.

The outer loop repeats the inner loop until convergence or until it hits the maximum number of iterations.

This shows that the solution has converged within the given tolerance. It took 14 iterations.

The solution matches the solution given by `NDSolve`, with only a small difference:

We can also visually confirm that the solution looks the way that we expect:

## Additive Schwarz method

Turning the implementation of multiplicative Schwarz into additive Schwarz requires only one small change. We have to replace

with

so that the boundary conditions are only updated once per outer iteration. As we can see below, it now takes 28 iterations instead of 14 for the solution to converge to the given tolerance. Additive Schwarz is easy to parallelize because the inner loop can be parallelized, but this comes at the cost of being twice as slow in this case.
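The same one-change relationship holds in a toy setting. The following self-contained Python/NumPy sketch (again, not the package's Wolfram Language code) applies additive Schwarz to the 1D Poisson problem $-u'' = 1$ on $(0, 1)$: every subdomain is solved from the same global guess, and the results are only written back afterwards.

```python
import numpy as np

# -u'' = 1 on (0, 1), u(0) = u(1) = 0; exact solution u(x) = x(1 - x)/2.
n = 99
h = 1.0 / (n + 1)
u = np.zeros(n + 2)               # global guess, including boundary values
subdomains = [(1, 60), (40, n)]   # overlapping ranges of interior points

def solve_local(lo, hi, left_bc, right_bc):
    """Direct solve of -u'' = 1 on grid points lo..hi with Dirichlet data."""
    m = hi - lo + 1
    A = (np.diag(np.full(m, 2.0)) - np.diag(np.ones(m - 1), 1)
         - np.diag(np.ones(m - 1), -1)) / h**2
    b = np.ones(m)
    b[0] += left_bc / h**2
    b[-1] += right_bc / h**2
    return np.linalg.solve(A, b)

for iteration in range(100):
    # Additive Schwarz: all subdomains read the SAME global guess...
    local_solutions = [(lo, hi, solve_local(lo, hi, u[lo - 1], u[hi + 1]))
                       for lo, hi in subdomains]
    # ...and the global solution is only updated once per outer iteration,
    # so the local solves above are independent and could run in parallel.
    for lo, hi, sol in local_solutions:
        u[lo:hi + 1] = sol

x = np.linspace(0.0, 1.0, n + 2)
print(np.max(np.abs(u - x * (1 - x) / 2)))  # converged, but it took longer
```

On this model problem, too, the deferred update roughly doubles the number of iterations needed compared with updating after every local solve.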

## Coloring

We have seen that additive Schwarz is easy to parallelize. Each iteration of the inner loop is independent of the other iterations, which means that the iterations can be evaluated on different servers if necessary, such as when memory is a problem. The inner loop in our implementation of multiplicative Schwarz, on the other hand, cannot be parallelized in this way because each iteration affects the boundary conditions used in the other iterations. This does not mean, however, that there aren't groups of iterations (or equivalently, of subdomains) that are independent of each other. The solution on a subdomain only affects the boundary conditions of the subdomains that overlap it; no other subdomains are affected. The goal, then, is to find groups of subdomains that are not neighbors of each other. Once we have such groups, we can handle the groups sequentially but parallelize the work within each group.

Graph vertex coloring refers to the problem of assigning colors to the vertices of a graph such that no two vertices of the same color are joined by an edge. We can use graph coloring to find groups of subdomains such that no two subdomains in the same group overlap: create a graph in which each subdomain is a vertex and edges connect subdomains that overlap, and let each group consist of the vertices that share a color.
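The construction can be sketched as follows. In the post itself, IGraph/M supplies the coloring; here a simple greedy coloring in Python stands in for it, on a made-up overlap graph of six subdomains arranged in a ring.

```python
# Overlap graph: vertices are subdomains, edges join overlapping subdomains.
# Made-up example: six subdomains whose overlaps form a ring.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
neighbors = {v: set() for v in range(6)}
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

# Greedy coloring: give each vertex the smallest color its neighbors lack.
color = {}
for v in sorted(neighbors):
    taken = {color[w] for w in neighbors[v] if w in color}
    color[v] = next(c for c in range(len(neighbors)) if c not in taken)

# Group the subdomains by color; each group can later be solved in parallel.
groups = {}
for v, c in color.items():
    groups.setdefault(c, []).append(v)
print(groups)  # {0: [0, 2, 4], 1: [1, 3, 5]}: an even ring needs 2 colors
```

Greedy coloring does not always use the minimum number of colors, but any valid coloring gives correct groups; fewer colors simply means fewer sequential stages.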

Wolfram Language does not currently have graph coloring built-in. The two options are Combinatorica, which is a deprecated package, and IGraph/M, which is an open source project. Let's see how to use the graph coloring functions of IGraph/M. We start by loading the package and defining our mesh, which is the same as the one that we have used before:

Create the graph and find a coloring:

In the following figure, each subdomain is drawn with the color that it was assigned by the graph coloring algorithm. No subdomains with the same color share incidents:

Let's also try one case with more subdomains, for illustration:

## Parallelization with coloring

We are now ready to do what we set out to do: implement a new version of the Schwarz algorithm that is not currently included in the domain decomposition package. We do this by adding a new loop that iterates over the groups. Inside it is the inner loop that we have seen before, which can now be parallelized because its iterations are independent of each other.
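The resulting loop structure can be sketched like this. The sketch is Python, with a trivial stand-in for the local PDE solve; `ThreadPoolExecutor` plays the role that parallel kernels play in the Wolfram Language implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Color groups of subdomain indices: no two subdomains in a group overlap,
# so the members of a group can be solved simultaneously.
groups = [[0, 2, 4], [1, 3, 5]]
boundary_data = {s: 0.0 for group in groups for s in group}

def solve_subdomain(s):
    # Stand-in for the real local PDE solve; it reads the current boundary
    # data for subdomain s and returns an updated local solution.
    return boundary_data[s] + 1.0

for iteration in range(3):
    for group in groups:                 # groups are handled sequentially...
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(solve_subdomain, group))
        # ...but within a group the solves run in parallel, and the boundary
        # data is updated right away, as in multiplicative Schwarz.
        for s, result in zip(group, results):
            boundary_data[s] = result

print(boundary_data)  # every subdomain has been advanced three times
```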

We see that it works and that it converged within the given tolerance in 16 iterations. This is two more iterations than our previous implementation of multiplicative Schwarz, but still much better than additive Schwarz, which needs 28 iterations. The small difference between the two implementations of multiplicative Schwarz arises because the order in which the subdomains are considered affects the convergence rate, and grouping the subdomains changes that order.

## 3D equivalents

We have worked with 2D meshes so far, but the API provided by the domain decomposition package is built to work with 1D and 3D meshes as well. All we have to do is create the mesh and run it through any of our implementations; whether it is 1D, 2D, or 3D, it will still work. The graph coloring similarly works without modification. Below is an example:

## Conclusion

We now know how to use private functions from the domain decomposition subpackage in FEMAddOns to implement Schwarz type domain decomposition algorithms. Furthermore, we were able to show that graph coloring can be used to parallelize multiplicative Schwarz. We saw that the parallelized multiplicative Schwarz algorithm performs approximately as well as the sequential algorithm, and much better than the additive Schwarz algorithm, which is currently the package's only option for distributed computing. A major takeaway from this article is the template that we developed at the beginning and subsequently reused to implement the different versions. This template can be used to create other Schwarz type algorithms as well.
