Understanding why your GPU-accelerated code is slow using NVIDIA Nsight Systems

SCynergy 2026

This training is part of the SCynergy 2026 event. It gives participants a hands-on introduction to profiling with NVIDIA Nsight Systems on the MeluXina supercomputer.

MeluXina

🎯 Objectives

By the end of this workshop, you should be able to:

  • Profile your GPU jobs on MeluXina
  • Interpret key Nsight Systems trace metrics and timelines
  • Identify common bottlenecks in GPU-accelerated codes and applications (I/O, compute, memory, synchronization, communication)
  • Apply simple optimizations and validate improvements

🪧 Agenda

Today's training is composed of:

  • Connection to MeluXina via Open OnDemand (~10 minutes)
  • Introduction to NVIDIA Nsight Systems (~30 minutes)
    • How to generate a trace with nsys profile
    • Looking at (already collected) traces of a MONAI training on MeluXina
      • How to navigate the Nsight Systems GUI
      • How to use the nsys CLI to get some stats
  • Hands-on: making a MONAI (PyTorch-based) training faster (~60 minutes)
    • Generate your own traces
    • Modify the code to accelerate it
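
As a preview of the trace-generation and CLI-stats items above, here is a minimal Python sketch that only assembles the two command lines involved: one wrapping a training script with nsys profile, and one summarizing the resulting report with nsys stats. The script name train.py and the report name monai_baseline are hypothetical placeholders, not files from this training, and the selected trace domains (cuda, nvtx, osrt) are one reasonable starting choice rather than the settings used in the sessions. MeluXina-specific steps such as loading modules or submitting through Slurm are not shown.

```python
import shlex


def nsys_profile_cmd(app_cmd, output="monai_baseline", traces=("cuda", "nvtx", "osrt")):
    """Build an `nsys profile` command line that wraps `app_cmd`.

    `output` is the report base name (hypothetical default); nsys appends
    the report extension itself. `traces` lists the API domains to record.
    """
    return [
        "nsys", "profile",
        f"--output={output}",
        f"--trace={','.join(traces)}",
        "--force-overwrite=true",  # replace an existing report with the same name
        *app_cmd,
    ]


def nsys_stats_cmd(report):
    """Build an `nsys stats` command line that summarizes a collected report."""
    return ["nsys", "stats", report]


if __name__ == "__main__":
    # `train.py` is a placeholder for your own training script.
    print(shlex.join(nsys_profile_cmd(["python", "train.py"])))
    print(shlex.join(nsys_stats_cmd("monai_baseline.nsys-rep")))
```

Running the script prints the two commands so they can be pasted into a job script; nothing is executed, so it is safe to try on a machine without a GPU or without nsys installed.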

💻 Demo/Hands-on Mix

ℹ️ About this training

Authors: Marco Magliulo, Emmanuel Kieffer, and Tom Walter

This training has been developed by the Supercomputing Application Services group at LuxProvide in the context of the EPICURE project.

EPICURE LuxProvide