## DifferentialEquations.jl v6.7.0: GPU-based Ensembles and Automatic Sparsity

Let’s just jump right in! This time we have a bunch of new GPU tools and sparsity handling. (Breaking with Deprecations) DiffEqGPU: GPU-based Ensemble Simulations The MonteCarloProblem interface received an overhaul. First of all, the interface has been renamed to Ensemble. The changes are: MonteCarloProblem -> EnsembleProblem MonteCarloSolution -> EnsembleSolution MonteCarloSummary -> EnsembleSummary num_monte -> trajectories Specifying parallel_type has been deprecated and a deprecation warning is thrown mentioning this. So don’t worry: your code will work but will give warnings as to what to change. Additionally, the DiffEqMonteCarlo.jl package is no longer necessary for any of this functionality. Now, solve of a EnsembleProblem works on the same dispatch mechanism as the rest of DiffEq, which looks like solve(ensembleprob,Tsit5(),EnsembleThreads(),trajectories=n) where the third argument is an ensembling algorithm to specify the threading-based form. Code with the deprecation warning will work until the release of DiffEq 7.0, at which time the alternative path will be removed. See the updated ensembles page for more details The change to dispatch was done for a reason: it allows us to build new libraries specifically for sophisticated handling of many trajectory ODE solves without introducing massive new dependencies to the standard DifferentialEquations.jl user. However, many people might be interested in the first project to make use of this: DiffEqGPU.jl. DiffEqGPU.jl lets you define a problem, like an ODEProblem, and then solve thousands of trajectories in parallel using your GPU. The syntax looks like: monteprob = EnsembleProblem(my_ode_prob) solve(monteprob,Tsit5(),EnsembleGPUArray(),num_monte=100_000) and it will return 100,000 ODE solves. We have seen between a 12x and 90x speedup depending on the GPU of the test systems, meaning that this can be a massive improvement for parameter space exploration on smaller systems of ODEs. Currently there are a few limitations of this method, including that events cannot be used, but those will be solved shortly. Additional methods for GPU-based parameter parallelism are coming soon to the same interface. Also planned are GPU-accelerated multi-level Monte Carlo methods for faster weak convergence of SDEs. Again, this is utilizing compilation tricks to take the user-defined f and recompile it on the fly to a .ptx kernel, and generating kernel-optimized array-based formulations of the existing ODE solvers Automated Sparsity Detection Shashi Gowda (@shashigowda) implemented a sparsity detection algorithm which digs through user-defined Julia functions with Cassette.jl to find out what inputs influence the output. The basic version checks at a given trace, but a more sophisticated version, which we are calling Concolic Combinatoric Analysis, looks at all possible branch choices and utilizes this to conclusively build a Jacobian whose sparsity pattern captures the possible variable interactions. The nice part is that this functionality is very straightforward to use. For example, let’s say we had the following function: function f(dx,x,p,t) for i in 2:length(x)-1 dx[i] = x[i-1] - 2x[i] + x[i+1] end dx[1] = -2x[1] + x[2] dx[end] = x[end-1] - 2x[end] nothing end If we want to find out the sparsity pattern of f, we would simply call: sparsity_pattern = sparsity!(f,output,input,p,t) where output is an array like dx, input is an array like x, p are possible parameters, and t is a possible t. The function will then be analyzed and sparsity_pattern will return a Sparsity type of I and J which denotes the terms in the Jacobian with non-zero elements. By doing sparse(sparsity_pattern) we can turn this into a SparseMatrixCSC with the correct sparsity pattern. This functionality highlights the power of Julia since there is no way to conclusively determine the Jacobian of an arbitrary program f using numerical techniques, since all sorts of scenarios lead to “fake zeros” (cancelation, not checking a place in parameter space where a branch is false, etc.). However, by directly utilizing Julia’s compiler and the SSA provided by a Julia function definition we can perform a non-standard interpretation that tells all of the possible numerical ways the program can act, thus conclusively determining all of the possible variable interactions. Of course, you can still specify analytical Jacobians and sparsity patterns if you want, but if you’re lazy… :) See SparseDiffTools’s README for more details. GPU Offloading in Implicit DE Solving We are pleased to announce the LinSolveGPUFactorize option which allows for automatic offloading of linear solves to the GPU. For a problem with a large enough dense Jacobian, using linsolve=LinSolveGPUFactorize() will now automatically perform the factorization and back-substitution on the GPU, allowing for better scaling. For example: using CuArrays Rodas5(linsolve = LinSolveGPUFactorize()) This simply requires a working installation of CuArrays.jl. See the linear solver documentation for more details. Experimental: Automated Accelerator (GPU) Offloading We have been dabbling in allowing automated accelerator (GPU, multithreading, distributed, TPU, etc.) offloading when the right hardware is detected and the problem size is sufficient to success a possible speedup. A working implementation exists as a PR for DiffEqBase which would allow automated acceleration of linear solves in implicit DE solving. However, this somewhat invasive of a default, and very architecture dependent, so it is unlikely we will be releasing this soon. However, we are investigating this concept in more detail in the AutoOffload.jl. If you’re interested in Julia-wide automatic acceleration, please take a look at the repo and help us get something going! A Complete Set of Iterative Solver Routines for Implicit DEs Previous releases had only a pre-built GMRES implementation. However, as detailed on the linear solver page, we now have an array of iterative solvers readily available, including: LinSolveGMRES – GMRES LinSolveCG – CG (Conjugate Gradient) LinSolveBiCGStabl – BiCGStabl Stabilized Bi-Conjugate Gradient LinSolveChebyshev – Chebyshev LinSolveMINRES – MINRES These are all compatible with matrix-free implementations of a AbstractDiffEqOperator. Exponential integrator improvements Thanks to Yingbo Ma (@YingboMa), the exprb methods have been greatly improved. Next Directions Our current development is very much driven by the ongoing GSoC/JSoC projects, which is a good thing because they are outputting some really amazing results! Here’s some things to look forward to: Automated matrix-free finite difference PDE operators Surrogate optimization Jacobian reuse efficiency in Rosenbrock-W methods Native Julia fully implicit ODE (DAE) solving in OrdinaryDiffEq.jl High Strong Order Methods for Non-Commutative Noise SDEs GPU-Optimized Sparse (Colored) Automatic Differentiation Parallelized Implicit Extrapolation of ODEs