Understanding why your GPU-accelerated code is slow using NVIDIA Nsight Systems

This training is given in the context of the Scynergy 2026 event. It provides participants with a hands-on introduction to profiling with NVIDIA Nsight Systems on the MeluXina supercomputer.

🎯 Objectives
By the end of this workshop, you should be able to:
- Profile your GPU jobs on MeluXina
- Interpret key Nsight Systems trace metrics and timelines
- Identify common bottlenecks in GPU-accelerated codes and applications (I/O, compute, memory, synchronization, communication)
- Apply simple optimizations and validate improvements
🪧 Agenda
Today's training is composed of:
- Connection to MeluXina via OpenOnDemand (~10 minutes)
- Introduction to NVIDIA Nsight Systems (~30 minutes)
    - How to generate a trace with `nsys profile`
    - Looking at (already collected) traces of a MONAI training on MeluXina
    - How to navigate the Nsight Systems GUI
    - How to use the `nsys` CLI to get some stats
- Hands-on: making a MONAI (PyTorch-based) training faster (~60 minutes)
    - Generate your own traces
    - Modify the code to accelerate it
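
As a preview of the trace-generation and stats steps in the agenda, the following minimal sketch assembles the two command lines involved. It is an illustration under stated assumptions, not the workshop's actual scripts: the training script `train.py`, the report name `monai_training`, and the helper functions are hypothetical. The `cuda_gpu_kern_sum` report name applies to recent Nsight Systems releases and may differ on older versions.

```python
import shlex


def nsys_profile_cmd(app_cmd, output="monai_training", traces=("cuda", "nvtx", "osrt")):
    """Build the argument list for `nsys profile` wrapping an application command.

    `app_cmd` is the command to profile, e.g. ["python", "train.py"]
    (a hypothetical training script).
    """
    return [
        "nsys", "profile",
        "--output", output,            # report is written to <output>.nsys-rep
        "--trace", ",".join(traces),   # which APIs to trace
        "--force-overwrite", "true",   # replace an existing report of the same name
        *app_cmd,
    ]


def nsys_stats_cmd(report, report_name="cuda_gpu_kern_sum"):
    """Build the argument list for `nsys stats` on a collected report.

    The default report name is an assumption that holds for recent
    Nsight Systems versions only.
    """
    return ["nsys", "stats", "--report", report_name, report]


if __name__ == "__main__":
    # Print the commands so they can be pasted into a job script or shell.
    print(shlex.join(nsys_profile_cmd(["python", "train.py"])))
    print(shlex.join(nsys_stats_cmd("monai_training.nsys-rep")))
```

The returned lists can be passed to `subprocess.run(cmd, check=True)` on a node where `nsys` is available, or printed and copied into a batch script.
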
💻 Demo/Hands-on Mix
- Hands-on Part: Setting things up
- Demo and Discussion: Getting to know the tool and profiling a slow [MONAI](https://project-monai.github.io/) training code
- Hands-on Part: Profile and optimize a distributed GPU-accelerated code
ℹ️ About this training
Authors: Marco Magliulo, Emmanuel Kieffer, and Tom Walter
This training has been developed by the Supercomputing Application Services group at LuxProvide in the context of the EPICURE project.

