AMD Releases ROCm 6.3
AMD has released ROCm 6.3, the latest version of its open software stack for GPU computing, which brings a range of new features and optimizations. Key updates in this release include:
- SGLang integration for accelerated AI inferencing
- Re-engineered FlashAttention-2 for optimized AI training and inference
- Introduction of multi-node Fast Fourier Transform (FFT)
- New Fortran compiler
- Enhanced computer vision libraries like rocDecode, rocJPEG, and rocAL
According to AMD, the SGLang runtime, now supported by ROCm 6.3, is designed to optimize inference for workloads such as large language models (LLMs) and vision-language models (VLMs) on AMD Instinct GPUs. AMD claims up to 6x higher inference throughput, along with easier adoption through Python integration and pre-configured ROCm Docker containers.
Additionally, ROCm 6.3 brings further transformer optimizations with FlashAttention-2, which offers significant forward- and backward-pass performance improvements over FlashAttention-1. The release also includes a new AMD Fortran compiler with direct GPU offloading, backward compatibility, and integration with HIP kernels and ROCm libraries. Multi-node FFT support in rocFFT simplifies scaling FFT workloads across multiple nodes. Finally, the enhanced computer vision libraries rocDecode, rocJPEG, and rocAL add AV1 codec support, GPU-accelerated JPEG decoding, and improved audio augmentation.
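For context on what FlashAttention-2 actually computes: it is an exact attention implementation, so its output matches standard scaled dot-product attention; the speedup comes from tiling the computation to avoid materializing the full attention matrix in GPU memory. The sketch below is a minimal NumPy reference of that mathematical definition (softmax(QK^T/√d)V), not AMD's or the library's implementation:

```python
import numpy as np

def attention(q, k, v):
    """Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    FlashAttention-2 produces the same result but processes Q/K/V in
    tiles so the (seq_q, seq_k) score matrix never hits slow memory.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (seq_q, seq_k) logits
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v                             # (seq_q, d) output

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(q, k, v)
print(out.shape)  # (4, 8)
```

Because the result is exact, optimized kernels like FlashAttention-2 can be validated directly against a naive implementation like this one.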
AMD emphasizes that ROCm 6.3 aims to deliver cutting-edge tools that simplify development and improve performance and scalability for AI and HPC workloads. The company says it remains committed to open-source principles and to evolving the stack to meet developers' needs. For more information, visit the ROCm Documentation Hub or the AMD ROCm Blogs.