How to Use Arm SIMD to Achieve Huge Performance Gains
These educational materials are for native app developers who are familiar with C/C++ programming and have a basic knowledge of SIMD.
Optimize Your Programs
Learn tips and techniques for writing better-performing programs that today’s compilers can auto-vectorize with Arm SIMD extensions.
- Learn about the importance of using the “restrict” keyword in C correctly. When a compiler auto-vectorizes code, it first needs to be sure that this is a safe action.
- Gain a better understanding of caches, prefetching, and data alignment on Arm platforms. Learn what a programmer can do to improve memory access time.
- This how-to guide explains how to avoid a common pitfall, where the developer inadvertently ends up with floating-point operations, and how to take advantage of integer performance.
- Learn how to structure the flow of your program to make it easier for the compiler to perform auto-vectorization.
- An efficient data layout can be the difference between a slow program and a very fast one. Learn how you can help the compiler, as well as how you can convert your program to hand-written SIMD code.
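As a minimal sketch of the `restrict` point in the list above: the qualifier promises the compiler that the pointers never alias, which is the safety guarantee it needs before it can vectorize the loop without runtime overlap checks. The function name `scale_add` and its parameters are hypothetical and are not taken from the linked guide.

```c
#include <stddef.h>

/* Hypothetical example: dst[i] = a[i] * k + b[i].
 * `restrict` promises that dst, a, and b never overlap, so the compiler
 * may reorder the loads and stores and process several elements at once.
 * Calling this with overlapping arrays breaks the promise and is
 * undefined behavior. */
void scale_add(float *restrict dst, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] * k + b[i];
    }
}
```

Without the qualifiers, the compiler has to assume that a store to `dst` might change a later load from `a` or `b`, and it will typically either emit overlap checks or keep the loop scalar.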
Optimize with Arm SIMD
Learn how to optimize in assembly and in C/C++ using Neon, SVE, and SVE2 intrinsics. Arm intrinsics are a set of C/C++ functions whose precise implementation is known to the Arm compiler, GCC, and LLVM. LLVM (open-source Clang) version 5 and onwards includes support for SVE, and version 9 and onwards includes support for SVE2.
- The Arm intrinsics search engine can be filtered by SIMD ISA (Neon, SVE, SVE2, Helium), base type (floating point, integer, etc.), bit size, and architecture.
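To give a feel for what intrinsics look like in practice, here is a small sketch that adds two float arrays with Neon intrinsics from `<arm_neon.h>`. The helper name `add_f32` is hypothetical, and the scalar fallback is an assumption added so that the same file also builds on targets without Neon.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: element-wise dst[i] = a[i] + b[i]. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process four 32-bit floats per iteration in 128-bit Neon registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail; this is the whole loop when Neon is unavailable. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

Each intrinsic here corresponds closely to a single Neon operation (a vector load, a vector add, a vector store), which is why the compiler knows exactly how to implement it.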
Optimizing C/C++ and Assembly Code with Arm SIMD
- A set of guides explains how to use intrinsics in your C/C++ code to take advantage of SIMD in Armv8 and Armv9. For the IoT Cortex-M ecosystem, there is a separate guide.
C/C++ Case Studies with Open-Source Libraries
- Case studies on optimizing open-source libraries with Neon intrinsics for Arm Neoverse CPUs.
- C compilers have limited ability to vectorize loops with conditional statements. Learn how to use Arm Neon intrinsics to get well-optimized code for such loops.
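One common way to handle a conditional inside a loop by hand is to compute a per-lane mask and then select between two vectors, instead of branching. The sketch below assumes a hypothetical function `select_above` that keeps values greater than a threshold and replaces the rest with a fill value; it is an illustration of the compare-and-select pattern, not code from the linked case study.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical example: dst[i] = (x[i] > t) ? x[i] : fill. */
void select_above(float *dst, const float *x, float t, float fill, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    const float32x4_t vt = vdupq_n_f32(t);       /* threshold in all lanes */
    const float32x4_t vfill = vdupq_n_f32(fill); /* fill value in all lanes */
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);
        /* Lanes where x > t become all ones, other lanes all zeros. */
        uint32x4_t keep = vcgtq_f32(vx, vt);
        /* Bitwise select: take vx where the mask is set, vfill elsewhere. */
        vst1q_f32(dst + i, vbslq_f32(keep, vx, vfill));
    }
#endif
    /* Scalar tail; this is the whole loop when Neon is unavailable. */
    for (; i < n; ++i) {
        dst[i] = (x[i] > t) ? x[i] : fill;
    }
}
```

Both outcomes are computed for every lane and the mask picks between them, so the vector loop contains no data-dependent branch.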
Migrate from x86 and x64 to Arm Intrinsics
Learn about the different methods of porting existing x86 and x64 code to Arm SIMD, and get inspired by several case studies from cloud to edge.
- Learn about different libraries for migrating x86 and x64 intrinsics code to Arm intrinsics, and how to find intrinsics in large code bases.
- Vectorscan is a portable fork of Intel’s Hyperscan. Learn about the porting challenges and the success of the porting project.
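To illustrate the kind of mapping involved in such a port, the sketch below writes the same element-wise multiply once with x86 SSE intrinsics and once with Neon intrinsics, selected at compile time. The function name `mul_f32` is hypothetical. Porting libraries aim to provide this kind of translation for you; the hand-written version here is only meant to show how closely simple operations correspond.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#elif defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical example: dst[i] = a[i] * b[i]. */
void mul_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Arm path: 128-bit Neon registers, four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        vst1q_f32(dst + i, vmulq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
    }
#elif defined(__SSE__)
    /* x86 path: 128-bit SSE registers, unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(dst + i,
                      _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    }
#endif
    /* Scalar tail, and the fallback when neither extension is available. */
    for (; i < n; ++i) {
        dst[i] = a[i] * b[i];
    }
}
```

Simple arithmetic like this maps almost one-to-one. Operations such as shuffles and movemask-style reductions tend to need more thought, which is where a porting library or a manual rewrite comes in.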
Optimize with Arm Intrinsics for Android
- A wealth of resources on how to get started using Arm intrinsics (Neon and SVE2) with the Android NDK.
- A case study on how H.266 (VVenC and VVdeC) was converted from x86 and x64 to Arm Neon with SIMDe, achieving performance gains of over 200%.
- Read the list of considerations to take when deciding which library would be best suited to your SIMD porting needs.
- A blog post that walks through the different options for migrating x86 or x64 code to Arm intrinsics, with the pros and cons of each.
Join the Arm Developer Program
Join the Arm Developer Program to build your future on Arm. Get fresh insights directly from Arm experts, connect with like-minded peers for advice, or build on your expertise and become an Arm Ambassador.
Community Support
Learn from the Community
Talk directly to an Arm expert, George Steed, and the broader Arm community involved in server and cloud computing today.
George Steed
An Arm Expert in SIMD intrinsics and performance optimisation, George has spent the last eight years working on improving the performance of maths libraries and codec implementations running on Arm.
Tell Us What We Are Missing
Think we are missing some resources? Have some examples to share from your experience? Let us know directly via the link below.