Mathematics faster than ever
With Evergine 2022.9.28 we released a new version of our Evergine.Mathematics library that uses the hardware-accelerated instruction sets provided by CPU manufacturers.
This is possible thanks to the hardware intrinsics API available in .NET 6, which lets us use Single Instruction, Multiple Data (SIMD) instructions on x86, x64 and ARM64 target architectures when the hardware supports them. With this API we can check whether those extensions are present in the hardware and, if they are, use 128-bit and 256-bit vector registers to vectorize the code, so that one instruction operates on several primitive values at once instead of one value at a time.
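The check-then-vectorize pattern described above can be sketched as follows. This is a minimal illustration built on the standard System.Runtime.Intrinsics types, not code taken from Evergine.Mathematics; the SimdSketch class and the Add4 method are hypothetical names chosen for the example.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper, for illustration only.
public static class SimdSketch
{
    // Adds two 4-element float arrays, picking an instruction set at run time.
    public static float[] Add4(float[] a, float[] b)
    {
        if (!Sse.IsSupported && !AdvSimd.IsSupported)
        {
            // Scalar fallback: four separate additions.
            return new[] { a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3] };
        }

        // Pack four floats into each 128-bit vector.
        Vector128<float> va = Vector128.Create(a[0], a[1], a[2], a[3]);
        Vector128<float> vb = Vector128.Create(b[0], b[1], b[2], b[3]);

        // One SIMD instruction adds all four lanes:
        // SSE on x86/x64, AdvSimd (NEON) on ARM64.
        Vector128<float> sum = Sse.IsSupported ? Sse.Add(va, vb) : AdvSimd.Add(va, vb);

        return new[] { sum.GetElement(0), sum.GetElement(1), sum.GetElement(2), sum.GetElement(3) };
    }
}
```

The IsSupported properties are treated as constants by the JIT, so the branches that do not apply to the current machine are normally removed from the generated code.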
We have used these instructions to optimize operations inside the most important structs of our Evergine.Mathematics library such as Matrix4x4, Vector4, Matrix3x3 and Quaternion, and the results are impressive.
To create some microbenchmarks we used BenchmarkDotNet. This is the setup used to launch the Matrix4x4 comparison.
The Program.cs file content:
The Matrix4x4 multiply test content:
The CPU used for all x86 and x64 benchmarks presented in this article is:
Intel Core i9-10980HK CPU 2.40GHz, 1 CPU, 16 logical and 8 physical cores
And the detailed results generated by BenchmarkDotNet are:
As you can see, on the fastest path, MultiplyMethodRef, where the matrix parameters are passed by reference instead of by copy, the new .NET 6 path (which uses SSE SIMD instructions in this case) needs only 7.393 ns versus 23.965 ns. The code size is also reduced from 1,044 bytes to 414 bytes. Those are really great results for one of the most used math operations in a 3D graphics engine.
To show the performance improvement on x86 and x64 architectures visually, we have created charts for each math struct.
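To give an idea of what an SSE path for a by-reference 4x4 multiply can look like, here is a minimal sketch. It is an assumption about the general technique, not the actual Evergine.Mathematics implementation: the Mat4 struct, its row fields and the method names are hypothetical, and the parameters are passed with `in` (read-only reference) as one possible way of avoiding copies.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical row-major 4x4 matrix, one 128-bit vector per row.
public struct Mat4
{
    public Vector128<float> R0, R1, R2, R3;
}

public static class Mat4Sketch
{
    // Computes result = a * b; both inputs are passed by reference, not copied.
    public static void Multiply(in Mat4 a, in Mat4 b, out Mat4 result)
    {
        result.R0 = MultiplyRow(a.R0, b);
        result.R1 = MultiplyRow(a.R1, b);
        result.R2 = MultiplyRow(a.R2, b);
        result.R3 = MultiplyRow(a.R3, b);
    }

    // One result row = row.x * b.R0 + row.y * b.R1 + row.z * b.R2 + row.w * b.R3.
    private static Vector128<float> MultiplyRow(Vector128<float> row, in Mat4 b)
    {
        if (Sse.IsSupported)
        {
            // SHUFPS broadcasts one element of the row to all four lanes.
            Vector128<float> x = Sse.Shuffle(row, row, 0x00);
            Vector128<float> y = Sse.Shuffle(row, row, 0x55);
            Vector128<float> z = Sse.Shuffle(row, row, 0xAA);
            Vector128<float> w = Sse.Shuffle(row, row, 0xFF);
            return Sse.Add(
                Sse.Add(Sse.Multiply(x, b.R0), Sse.Multiply(y, b.R1)),
                Sse.Add(Sse.Multiply(z, b.R2), Sse.Multiply(w, b.R3)));
        }

        // Scalar fallback: one dot product per output column.
        return Vector128.Create(Dot(row, b, 0), Dot(row, b, 1), Dot(row, b, 2), Dot(row, b, 3));
    }

    // Dot product of the row with column c of b.
    private static float Dot(Vector128<float> row, in Mat4 b, int c) =>
        row.GetElement(0) * b.R0.GetElement(c) +
        row.GetElement(1) * b.R1.GetElement(c) +
        row.GetElement(2) * b.R2.GetElement(c) +
        row.GetElement(3) * b.R3.GetElement(c);
}
```

In the SSE branch each output row costs four shuffles, four multiplies and three adds on whole vectors, compared with sixteen multiplies and twelve adds in the scalar fallback, which is the kind of reduction that shows up in both the timings and the generated code size.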
We also developed a new path for the ARM64 architecture. To benchmark this new path, we used a Raspberry Pi 4. On this architecture we use the ARM AdvSimd instruction set when it is available. These are the results generated on Ubuntu 22.04.1 LTS on the Raspberry Pi 4.
The new Evergine.Mathematics library is very fast and allows you to take advantage of platform-specific functionality for the machine you're running on. It is between 3 and 4 times faster than before thanks to the use of SSE and AVX instructions on x86 and x64 architectures and AdvSimd on ARM64 devices.
The next step is up to you: download the new release and start using these performance improvements in all your projects.