Course Outline

Introduction

  • What is ROCm?
  • What is HIP?
  • ROCm vs CUDA vs OpenCL
  • Overview of ROCm and HIP features and architecture
  • ROCm for Windows vs ROCm for Linux

Installation

  • Installing ROCm on Windows
  • Verifying the installation and checking device compatibility
  • Updating or uninstalling ROCm on Windows
  • Troubleshooting common installation issues

Getting Started

  • Creating a new ROCm project using Visual Studio Code on Windows
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying the output using printf and fprintf

ROCm API

  • Using ROCm API in the host program
  • Querying device information and capabilities
  • Allocating and deallocating device memory
  • Copying data between host and device
  • Launching kernels and synchronizing threads
  • Handling errors and exceptions

HIP Language

  • Using HIP language in the device program
  • Writing kernels that execute on the GPU and manipulate data
  • Using data types, qualifiers, operators, and expressions
  • Using built-in functions, variables, and libraries

ROCm and HIP Memory Model

  • Using different memory spaces, such as global, shared, constant, and local
  • Using different memory objects, such as pointers, arrays, textures, and surfaces
  • Using different memory access modes, such as read-only, write-only, read-write, etc.
  • Using the memory consistency model and synchronization mechanisms

ROCm and HIP Execution Model

  • Using different execution models, such as threads, blocks, and grids
  • Using thread-indexing built-ins, such as hipThreadIdx_x, hipBlockIdx_x, hipBlockDim_x, etc.
  • Using block functions, such as __syncthreads, __threadfence_block, etc.
  • Using grid functions, such as hipGridDim_x, hipGridSync, cooperative groups, etc.

Debugging

  • Debugging ROCm and HIP programs on Windows
  • Using the Visual Studio Code debugger to set breakpoints and inspect variables, the call stack, etc.
  • Using the ROCm Debugger to debug ROCm and HIP programs on AMD devices
  • Using the ROCm Profiler to analyze ROCm and HIP programs on AMD devices

Optimization

  • Optimizing ROCm and HIP programs on Windows
  • Using coalescing techniques to improve memory throughput
  • Using caching and prefetching techniques to reduce memory latency
  • Using shared memory and local memory techniques to optimize memory accesses and bandwidth
  • Using profiling tools to measure and improve execution time and resource utilization

Summary and Next Steps

Requirements

  • An understanding of the C/C++ language and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors
  • Familiarity with the Windows operating system and PowerShell

Audience

  • Developers who wish to learn how to install and use ROCm on Windows to program AMD GPUs and exploit their parallelism
  • Developers who wish to write high-performance and scalable code that can run on different AMD devices
  • Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance

Duration: 21 Hours

Delivery Options

Private Group Training

Our identity is rooted in delivering exactly what our clients need.

  • Pre-course call with your trainer
  • Customisation of the learning experience to achieve your goals:
    • Bespoke outlines
    • Practical hands-on exercises containing data / scenarios recognisable to the learners
  • Training scheduled on a date of your choice
  • Delivered online, onsite/classroom or hybrid by experts sharing real world experience

Private group prices: RRP from €6840 for online delivery, based on a group of 2 delegates, plus €2160 per additional delegate (excludes any certification / exam costs). We recommend a maximum group size of 12 for most learning events.

Contact us for an exact quote and to hear about our latest promotions.


Public Training

Please see our public courses
