Let’s just jump right in! This time we have a bunch of new GPU tools and sparsity handling.

DiffEqGPU: GPU-based Ensemble Simulations (Breaking with Deprecations)

The MonteCarloProblem interface received an overhaul. First of all, the interface has been renamed to Ensemble. The changes are:

MonteCarloProblem -> EnsembleProblem
MonteCarloSolution -> EnsembleSolution
MonteCarloSummary -> EnsembleSummary
num_monte -> trajectories

Specifying parallel_type has been deprecated, and a deprecation warning is thrown mentioning this. So don’t worry: your code will work, but it will give warnings telling you what to change. Additionally, the DiffEqMonteCarlo.jl package is no longer necessary for any of this functionality. Now, solve of an EnsembleProblem works on the same dispatch mechanism as the rest of DiffEq, which looks like:

solve(ensembleprob,Tsit5(),EnsembleThreads(),trajectories=n)

where the third argument is an ensembling algorithm that specifies the threading-based form. Code with the deprecation warning will work until the release of DiffEq 7.0, at which time the deprecated path will be removed. See the updated ensembles page for more details.

The change to dispatch was done for a reason: it allows us to build new libraries specifically for sophisticated handling of many-trajectory ODE solves without introducing massive new dependencies for the standard DifferentialEquations.jl user. However, many people might be interested in the first project to make use of this: DiffEqGPU.jl. DiffEqGPU.jl lets you define a problem, like an ODEProblem, and then solve thousands of trajectories in parallel using your GPU. The syntax looks like:

monteprob = EnsembleProblem(my_ode_prob)
solve(monteprob,Tsit5(),EnsembleGPUArray(),trajectories=100_000)

and it will return 100,000 ODE solves. We have seen between a 12x and 90x speedup depending on the GPU of the test systems, meaning that this can be a massive improvement for parameter space exploration on smaller systems of ODEs.
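To make the ensembling idea concrete, here is a minimal sketch in plain Base Julia (a hand-rolled RK4 and a threaded loop, not the DiffEq API) of what trajectory-level parallelism does: each parameter set gets a fully independent solve, so the trajectories parallelize trivially. EnsembleGPUArray applies the same idea with the trajectories batched into GPU kernels.

```julia
# Illustration only: the "one trajectory per parameter, solve independently
# in parallel" pattern that EnsembleThreads/EnsembleGPUArray automate.
function rk4_solve(f, u0, p, tspan, dt)
    t, u = tspan[1], u0
    while t < tspan[2] - 1e-12
        h = min(dt, tspan[2] - t)
        k1 = f(u, p, t)
        k2 = f(u + h/2*k1, p, t + h/2)
        k3 = f(u + h/2*k2, p, t + h/2)
        k4 = f(u + h*k3, p, t + h)
        u += h/6*(k1 + 2k2 + 2k3 + k4)
        t += h
    end
    return u
end

decay(u, p, t) = -p * u                  # toy ODE: du/dt = -p*u
ps = range(0.1, 2.0, length = 100)       # one trajectory per parameter
results = Vector{Float64}(undef, length(ps))
Threads.@threads for i in eachindex(ps)  # the "EnsembleThreads" idea
    results[i] = rk4_solve(decay, 1.0, ps[i], (0.0, 1.0), 0.01)
end
```

Each iteration touches only its own slot of `results`, which is what makes this embarrassingly parallel across threads, processes, or GPU threads alike.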
Currently there are a few limitations of this method, including that events cannot be used, but those will be addressed shortly. Additional methods for GPU-based parameter parallelism are coming soon to the same interface. Also planned are GPU-accelerated multi-level Monte Carlo methods for faster weak convergence of SDEs. Again, this is utilizing compilation tricks to take the user-defined f, recompile it on the fly to a .ptx kernel, and generate kernel-optimized array-based formulations of the existing ODE solvers.

Automated Sparsity Detection

Shashi Gowda (@shashigowda) implemented a sparsity detection algorithm which digs through user-defined Julia functions with Cassette.jl to find out which inputs influence the output. The basic version checks sparsity along a given trace, but a more sophisticated version, which we are calling Concolic Combinatoric Analysis, looks at all possible branch choices and utilizes this to conclusively build a Jacobian whose sparsity pattern captures the possible variable interactions.

The nice part is that this functionality is very straightforward to use. For example, let’s say we had the following function:

function f(dx,x,p,t)
  for i in 2:length(x)-1
    dx[i] = x[i-1] - 2x[i] + x[i+1]
  end
  dx[1] = -2x[1] + x[2]
  dx[end] = x[end-1] - 2x[end]
  nothing
end

If we want to find out the sparsity pattern of f, we would simply call:

sparsity_pattern = sparsity!(f,output,input,p,t)

where output is an array like dx, input is an array like x, p are possible parameters, and t is a possible t. The function will then be analyzed, and sparsity_pattern will return a Sparsity type of I and J which denotes the terms in the Jacobian with non-zero elements. By doing sparse(sparsity_pattern) we can turn this into a SparseMatrixCSC with the correct sparsity pattern.
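For intuition, here is the pattern that detection should produce for the f above. This brute-force stand-in (a hypothetical probe_sparsity helper, not the real sparsity! API) just perturbs each input and records which outputs respond, which is exactly the kind of numerical probing whose pitfalls are discussed below; it happens to work here because f has no branches that could hide interactions.

```julia
using SparseArrays

# Hypothetical numerical stand-in for sparsity!: perturb one input at a
# time and record which outputs change. (Can be fooled by branches and
# cancellation; the Cassette-based analysis avoids those pitfalls.)
function probe_sparsity(f!, output, input)
    n, m = length(output), length(input)
    base = similar(output); f!(base, input, nothing, 0.0)
    I, J = Int[], Int[]
    for j in 1:m
        x = copy(input); x[j] += 1e-6
        du = similar(output); f!(du, x, nothing, 0.0)
        for i in 1:n
            du[i] != base[i] && (push!(I, i); push!(J, j))
        end
    end
    sparse(I, J, trues(length(I)), n, m)
end

function f(dx, x, p, t)
    for i in 2:length(x)-1
        dx[i] = x[i-1] - 2x[i] + x[i+1]
    end
    dx[1] = -2x[1] + x[2]
    dx[end] = x[end-1] - 2x[end]
    nothing
end

pattern = probe_sparsity(f, zeros(10), rand(10))  # tridiagonal pattern
```

For a length-10 state this yields the expected tridiagonal pattern with 28 stored entries, the same structure the sparse(sparsity_pattern) call materializes.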
This functionality highlights the power of Julia, since there is no way to conclusively determine the Jacobian of an arbitrary program f using numerical techniques: all sorts of scenarios lead to “fake zeros” (cancellation, never probing the region of parameter space where a branch takes the other path, etc.). However, by directly utilizing Julia’s compiler and the SSA form provided by a Julia function definition, we can perform a non-standard interpretation that reveals all of the possible numerical ways the program can act, thus conclusively determining all of the possible variable interactions. Of course, you can still specify analytical Jacobians and sparsity patterns if you want, but if you’re lazy… :) See SparseDiffTools’s README for more details.

GPU Offloading in Implicit DE Solving

We are pleased to announce the LinSolveGPUFactorize option, which allows for automatic offloading of linear solves to the GPU. For a problem with a large enough dense Jacobian, using linsolve=LinSolveGPUFactorize() will now automatically perform the factorization and back-substitution on the GPU, allowing for better scaling. For example:

using CuArrays
Rodas5(linsolve = LinSolveGPUFactorize())

This simply requires a working installation of CuArrays.jl. See the linear solver documentation for more details.

Experimental: Automated Accelerator (GPU) Offloading

We have been dabbling in allowing automated accelerator (GPU, multithreading, distributed, TPU, etc.) offloading when the right hardware is detected and the problem size is sufficient to suggest a possible speedup. A working implementation exists as a PR to DiffEqBase which would allow automated acceleration of linear solves in implicit DE solving. However, this is a somewhat invasive default, and very architecture-dependent, so it is unlikely we will be releasing it soon. Instead, we are investigating this concept in more detail in AutoOffload.jl.
If you’re interested in Julia-wide automatic acceleration, please take a look at the repo and help us get something going!

A Complete Set of Iterative Solver Routines for Implicit DEs

Previous releases had only a pre-built GMRES implementation. However, as detailed on the linear solver page, we now have an array of iterative solvers readily available, including:

LinSolveGMRES – GMRES
LinSolveCG – CG (Conjugate Gradient)
LinSolveBiCGStabl – BiCGStab(l) (Stabilized Bi-Conjugate Gradient)
LinSolveChebyshev – Chebyshev
LinSolveMINRES – MINRES

These are all compatible with matrix-free implementations of an AbstractDiffEqOperator.

Exponential integrator improvements

Thanks to Yingbo Ma (@YingboMa), the exprb methods have been greatly improved.

Next Directions

Our current development is very much driven by the ongoing GSoC/JSoC projects, which is a good thing because they are outputting some really amazing results! Here are some things to look forward to:

Automated matrix-free finite difference PDE operators
Surrogate optimization
Jacobian reuse efficiency in Rosenbrock-W methods
Native Julia fully implicit ODE (DAE) solving in OrdinaryDiffEq.jl
High strong order methods for non-commutative noise SDEs
GPU-optimized sparse (colored) automatic differentiation
Parallelized implicit extrapolation of ODEs
DifferentialEquations.jl v6.6.0: Sparse Jacobian Coloring, Quantum Computer ODE Solvers, and Stiff SDEs
Sparsity Performance: Jacobian coloring with numerical and forward differentiation

If you have a function f!(du,u) which has a tridiagonal Jacobian, you can calculate that Jacobian by mixing perturbations. For example, instead of perturbing with u .+ [epsilon,0,0,0,0,0,0,0,...], you’d perturb with u .+ [epsilon,0,0,epsilon,0,0,...]. Because the perturbed columns never overlap within a row, you can then decode this “compressed” Jacobian into the sparse form. Do that 3 times (once per color) and boom, full Jacobian in 4 calls to f! no matter the size of u! Without a color vector, this matrix would take 1+length(u) f! calls, so I’d say that’s a pretty good speedup. This is called Jacobian coloring: [1,2,3,1,2,3,1,2,3,...] are the colors in this example, and columns with the same color can be differentiated simultaneously.

Now the DiffEqDiffTools.jl internals allow for passing a color vector into the numerical differentiation routines and automatically decompressing into a sparse Jacobian. This means that DifferentialEquations.jl will soon be compatible with this dramatic speedup technique. In addition, other libraries in Julia which rely on our utility libraries, like Optim.jl, could soon make good use of this.

What if you don’t know a good color vector for your Jacobian? No sweat! The soon-to-be-released SparseDiffTools.jl repository has methods for automatically generating color vectors using heuristic graph techniques. DifferentialEquations.jl will soon make use of this automatically if you specify a sparse matrix for your Jacobian!

Note that the SparseDiffTools.jl repository also includes functions for calculating sparse Jacobians using color vectors and forward-mode automatic differentiation (using the Dual numbers provided by ForwardDiff.jl). In this case, the number of Dual partials equals the number of colors, which can be dramatically lower than length(u) (the dense default!), thereby dramatically reducing compile and run time.
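The compress-and-decompress step described above can be sketched in a few lines of plain Julia (hypothetical helper names, not the SparseDiffTools.jl API). For a tridiagonal f!, three perturbed calls plus one base call recover every Jacobian entry:

```julia
using SparseArrays

# Sketch of the coloring trick: perturb all columns of one color at once,
# then decompress using the known sparsity structure.
function colored_jacobian(f!, u, colors, ncolors; h = 1e-8)
    n = length(u)
    base = similar(u); f!(base, u)
    J = spzeros(n, n)
    for c in 1:ncolors
        up = copy(u)
        for j in 1:n
            colors[j] == c && (up[j] += h)  # simultaneous same-color perturbation
        end
        du = similar(u); f!(du, up)
        # decompress: within each row's tridiagonal band, at most one
        # column carries color c, so the difference quotient is unambiguous
        for i in 1:n, j in max(i-1,1):min(i+1,n)
            colors[j] == c && (J[i, j] = (du[i] - base[i]) / h)
        end
    end
    return J
end

# Tridiagonal test function: du_i = u_{i-1} - 2u_i + u_{i+1}
function f!(du, u)
    n = length(u)
    for i in 1:n
        du[i] = (i > 1 ? u[i-1] : 0.0) - 2u[i] + (i < n ? u[i+1] : 0.0)
    end
end

u = rand(9)
colors = [mod1(j, 3) for j in 1:9]       # [1,2,3,1,2,3,1,2,3]
J = colored_jacobian(f!, u, colors, 3)   # 4 f! calls total, any length(u)
```

The production implementations generalize the decompression to arbitrary sparsity patterns and, in the forward-mode case, replace the finite differences with Dual-number partials, one partial per color.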
Stay tuned for the next releases, which begin to auto-specialize everything along the way based on sparsity structure. Thanks to JSoC student Pankaj (@pkj-m) for this work.

Higher weak order SROCK methods for stiff SDEs

Deepesh Thakur (@deeepeshthakur) continues his roll with stiff stochastic differential equation solvers, implementing not 1 but 7 new high weak order stiff SDE solvers: SROCK1 with generalized noise, SKSROCK, and a bunch of variants of SROCK2. Benchmark updates will come soon, but I have a feeling that these new methods may be by far the most stable methods in the library, and the ones which achieve the lowest error in the mean solution most efficiently.

DiffEqBot

GSoC student Kanav Gupta (@kanav99) implemented a bot for the JuliaDiffEq team that allows us to run performance regression benchmarks on demand with preset GitLab runners. Right now this has a dedicated machine for CPU and parallelism performance testing, and soon we’ll have a second machine up and running for performance testing on GPUs. If you haven’t seen the Julialang blog post on this topic, please check it out!

Quantum ODE Solver QuLDE

If you happen to have a quantum computer handy, hold your horses. QuLDE from QuDiffEq.jl is an ODE solver designed for quantum computers. It utilizes the Yao.jl quantum circuit simulator to run, but once Yao.jl supports QASM this will compile to something compatible with (future) quantum computing hardware. This means that, in order to enter the new age of computing, all you have to do is change solve(prob,Tsit5()) to solve(prob,QuLDE()) and you’re there. Is it practical? Who knows (please let us know). Is it cool? Oh yeah! See the quantum ODE solver blog post for more details.

Commutative Noise GPU compatibility

The commutative noise SDE solvers are now GPU-compatible thanks to GSoC student Deepesh Thakur (@deeepeshthakur).
The next step will be to implement high order non-commutative noise SDE solvers and the associated iterated integral approximations in a manner that is GPU-compatible.

New benchmark and tutorial repository setups

DiffEqBenchmarks.jl and DiffEqTutorials.jl are now fully updated to a Weave.jl form. We still need to fix up a few benchmarks, but it’s in a state that is ready for new contributions.

Optimized multithreaded extrapolation

The GBS extrapolation methods have been optimized, and they are now among the most efficient methods at the lower tolerances of the Float64 range for non-stiff ODEs. Thank you to Konstantin Althaus (@AlthausKonstantin) for contributing the first version of this algorithm and to GSoC student Saurabh Agarwal (@saurabhkgp21) for adding automatic parallelization of the method. This method will see further improvements as multithreading improves in Julia v1.2: the new PARTR features will allow our internal @threads loop to perform dynamic work-stealing, which should definitely improve on the current parallelism structure. So stay tuned: this will likely benchmark even better in a few months.

Fully non-allocating exp! in exponential integrators

Thanks to Yingbo Ma (@YingboMa) for making the internal exp calls of the exponential integrators non-allocating. Continued improvement of this category of methods is starting to show promise in the area of semilinear PDEs.

Rosenbrock-W methods

JSoC student Langwen Huang (@huanglangwen) has added the Rosenbrock-W class of methods to OrdinaryDiffEq.jl. These methods are like the Rosenbrock methods but are able to reuse their W matrix across multiple steps, allowing them to scale to larger ODEs more efficiently. Since the Rosenbrock methods currently benchmark as the fastest methods for small ODEs, this is an exciting new set of methods which will get optimized over the course of the summer.
Efficient Jacobian reuse techniques and the ability to utilize the sparse differentiation tooling are next on this project.

Next Directions

Our current development is very much driven by the ongoing GSoC/JSoC projects, which is a good thing because they are outputting some really amazing results! Here are some things to look forward to:

Higher order SDE methods for non-commutative noise
Parallelized methods for stiff ODEs
Integration of sparse colored differentiation into the differential equation solvers
Jacobian reuse efficiency in Rosenbrock-W methods
Exponential integrator improvements
Native Julia fully implicit ODE (DAE) solving in OrdinaryDiffEq.jl
Automated matrix-free finite difference PDE operators
Surrogate optimization
GPU-based Monte Carlo parallelism
Well, we zoomed towards this one. This release contains a lot of very compelling new features for performance in specific domains: large ODEs, stiff SDEs, high accuracy ODE solving, heavy callback usage, etc. are all specialized on and greatly improved in this release.
This is a huge release. We should take the time to thank every contributor to the JuliaDiffEq package ecosystem. A lot of this release focuses on performance features: the ability to use stiff ODE solvers on the GPU, with automated tooling for matrix-free Newton-Krylov, faster broadcast, better Jacobian reuse algorithms, memory use reduction, etc. All of these combined give some pretty massive performance boosts in the area of medium to large sized highly stiff ODE systems. In addition, numerous robustness fixes have enhanced the usability of these tools, along with a few new features like an implementation of extrapolation for ODEs and the release of ModelingToolkit.jl. Let’s start by summing up this release with an example.

Comprehensive Example

Here’s a nice showcase of DifferentialEquations.jl: a neural ODE with batching on the GPU (without internal data transfers), with high order adaptive implicit ODE solvers for stiff equations, using matrix-free Newton-Krylov via preconditioned GMRES, and trained using checkpointed adjoint equations. Few programs work directly with neural networks and allow for batching, few utilize GPUs, few have methods applicable to highly stiff equations, few allow for large stiff equations via matrix-free Newton-Krylov, and finally few have checkpointed adjoints. This is all done in a high-level programming language. What does the code for this look like?
using OrdinaryDiffEq, Flux, DiffEqFlux, DiffEqOperators, CuArrays

x = Float32[2.; 0.]|>gpu
tspan = Float32.((0.0f0,25.0f0))
dudt = Chain(Dense(2,50,tanh),Dense(50,2))|>gpu
p = DiffEqFlux.destructure(dudt)
dudt_(du,u::TrackedArray,p,t) = du .= DiffEqFlux.restructure(dudt,p)(u)
dudt_(du,u::AbstractArray,p,t) = du .= Flux.data(DiffEqFlux.restructure(dudt,p)(u))
ff = ODEFunction(dudt_,jac_prototype = JacVecOperator(dudt_,x))
prob = ODEProblem(ff,x,tspan,p)
diffeq_adjoint(p,prob,KenCarp4(linsolve=LinSolveGMRES());u0=x,
               saveat=0.0:0.1:25.0,backsolve=false)

That is about 10 lines of code, and we can continue to make it even more succinct. Now, onto the release highlights.

Full GPU Support in ODE Solvers

Not just the non-stiff ODE solvers but now also the stiff ODE solvers allow the initial condition to be a GPUArray, with the internal methods performing no indexing so that all computations take place on the GPU without data transfers. This allows expensive right-hand side calculations, like those in neural ODEs or PDE discretizations, to utilize GPU acceleration without worrying about whether the cost of data transfers will overtake the solver speed enhancements. While the presence of broadcast throughout the solvers might make one worry about performance…

Fast DiffEq-Specific Broadcast

Yingbo Ma (@YingboMa) implemented a fancy broadcast wrapper that allows all sorts of information to be passed to the compiler in the differential equation solver’s internals, making a bunch of no-aliasing and sizing assumptions that are normally not possible. The internals now all use a special @.. which turns out to be faster than standard loops, and this is the magic that really enabled the GPU support to happen without performance regressions (and in fact, we got some speedups from this, close to 2x in some cases!).

Smart linsolve defaults and LinSolveGMRES

One of the biggest performance-based features to be released is smarter linsolve defaults.
If you are using dense arrays with a standard Julia build, OpenBLAS does not perform recursive LU factorizations, which we found to be suboptimal by about 5x in some cases. Thus our default linear solver now automatically detects the BLAS installation and utilizes RecursiveFactorization.jl to give this speedup for many standard stiff ODE cases. In addition, if you passed a sparse Jacobian as the jac_prototype, the linear solver now automatically switches to a form that works for sparse Jacobians. If you use an AbstractDiffEqOperator, the default linear solver automatically switches to a Krylov subspace method (GMRES) and utilizes the matrix-free operator directly. Banded matrices and Jacobians on the GPU are now automatically handled as well. Of course, that’s just the defaults; most of this was possible before but has now been made more accessible.

In addition to these, the ability to easily switch to GMRES was added via LinSolveGMRES. Just add linsolve = LinSolveGMRES() to any native Julia algorithm with a swappable linear solver and it’ll switch to using GMRES. Through this you can pass options for preconditioners and tolerances as well. We will continue to integrate this better into our integrators, as doing so will enhance the efficiency of solving large sparse systems.

Automated J*v Products via Autodifferentiation

When using GMRES, one does not need to construct the full Jacobian matrix. Instead, one can use the directional derivative in the direction of v in order to compute J*v. This has now been put into operator form via JacVecOperator(dudt_,x), so users can directly ask for this to occur using one line. It allows either autodifferentiation or numerical differentiation to be used to calculate the J*v.

DEStats

One of the nichest but nicest new features is DEStats. If you do sol.destats then you will see a load of information on how many steps were taken, how many f calls were done, etc.,
giving a broad overview of the performance of the algorithm. Thanks to Kanav Gupta (@kanav99) and Yingbo Ma (@YingboMa) for really driving this feature, since it has allowed a lot of these optimizations to be more thoroughly investigated. You can expect DiffEq development to accelerate with this information!

Improved Jacobian Reuse

One of the things we noticed using DEStats was that the number of Jacobian calculations and inversions could be severely reduced. Yingbo Ma (@YingboMa) did just that, greatly increasing the performance of all implicit methods like KenCarp4, showing cases in the 1000+ ODE range where OrdinaryDiffEq’s native methods outperform Sundials’ CVODE_BDF. This still has plenty of room for improvement.

DiffEqBiological performance improvements for large networks (speed and sparsity)

Samuel Isaacson (@isaacsas) has been instrumental in improving DiffEqBiological.jl and its ability to handle large reaction networks. It can now parse the networks much faster and can build Jacobians which utilize sparse matrices. It pairs with his ParseRxns(???) library and has been a major source of large stiff test problems!

Partial Neural ODEs, Batching and GPU Fixes

We now have working examples of partial neural differential equations, which are equations that have pre-specified portions that are known while others are learnable neural networks. These also allow for batched data and GPU acceleration. Not much else to say except: let your neural diffeqs go wild!

Low Memory RK Optimality and Alias_u0

Kanav Gupta (@kanav99) and Hendrik Ranocha (@ranocha) did an amazing job on memory optimization of low-memory Runge-Kutta methods for hyperbolic or advection-dominated PDEs. Essentially, these methods use the minimal number of registers which are theoretically required by the method.
Kanav added some tricks to the implementation (using a fun = -> += overload idea) and Hendrik added the alias_u0 argument to allow using the passed-in initial condition as one of the registers. Unit tests confirm that our implementations achieve the minimum possible number of registers, allowing large PDE discretizations to make use of DifferentialEquations.jl without loss of memory efficiency. We hope to see this in use in some large-scale simulation software!

More Robust Callbacks

Our ContinuousCallback implementation now has increased robustness in double event detection, using a new strategy. Try to break it.

GBS Extrapolation

New contributor Konstantin Althaus (@AlthausKonstantin) implemented midpoint extrapolation methods for ODEs using Barycentric formulas and different adaptivity behaviors. We will be investigating these methods for their parallelizability via multithreading in the context of stiff and non-stiff ODEs.

ModelingToolkit.jl Release

ModelingToolkit.jl has now gotten some form of a stable release. A lot of credit goes to Harrison Grodin (@HarrisonGrodin). While it has already been out there and found quite a bit of use, it has really picked up steam over the last year as a modeling framework suitable for the flexibility of DifferentialEquations.jl. We hope to continue its development and add features like event handling to its IR.

SUNDIALS J*v interface, stats, and preconditioners

While we are phasing out Sundials from our standard DifferentialEquations.jl practice, Sundials.jl continues to improve as we add more features to benchmark against. Sundials’ J*v interface has now been exposed, so adding a DiffEqOperator to the jac_prototype will work with Sundials. DEStats is hooked up to Sundials, and you can now pass preconditioners to its internal Newton-Krylov methods.
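The J*v interfaces above, both the native JacVecOperator and the Sundials hookup, rest on the same identity: the action of the Jacobian on a vector is a directional derivative, so J*v can be computed without ever forming J. A minimal numerical-differentiation sketch (jvp is a hypothetical helper for illustration, not a library function):

```julia
# Directional finite difference: J(u)*v ≈ (f(u + h*v) - f(u)) / h,
# so a Krylov method like GMRES never needs the full Jacobian matrix.
jvp(f, u, v; h = sqrt(eps())) = (f(u .+ h .* v) .- f(u)) ./ h

# Example: this f has Jacobian [2u[1] 1; 0 3]
f(u) = [u[1]^2 + u[2], 3u[2]]
Jv = jvp(f, [1.0, 2.0], [0.5, -1.0])   # ≈ [2*1.0*0.5 + 1*(-1.0), 3*(-1.0)]
```

The autodifferentiation variant replaces the finite difference with a single dual-number evaluation, trading the O(sqrt(eps)) truncation error for an exact derivative at roughly the same cost.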
Next Directions

Improved nonlinear solvers for stiff SDE handling
More adaptive methods for SDEs
Better boundary condition handling in DiffEqOperators.jl
More native implicit ODE (DAE) solvers
Adaptivity in the MIRK BVP solvers
LSODA integrator interface
Improved BDF
This marks the release of DifferentialEquations.jl v6.0.0. Here’s a rundown of what has happened in this timeframe.