PixInsight GPU acceleration for AMD GPUs

Starless North America Nebula

I am thrilled to share that GPU accelerated TensorFlow plugins on PixInsight now work with AMD 7000 series (7600, 7800, 7900 XTX) GPUs. It took a little while to figure out the build process and successfully produce a library. I hope this simple guide helps you get up and running and enjoying GPU acceleration with StarNet++ and other apps.

This guide has been updated to reflect that WSL2 GPU passthrough now works on Windows with Ubuntu and the latest ROCm release. This allows Windows users to run PixInsight from WSL2 and enjoy GPU passthrough acceleration. I hope we see continued development of native Windows TensorFlow, but for now this lets you run the Linux version almost as if it were native.

Requirements

  • Ubuntu 24.04
  • PixInsight (Latest version)
  • AMD GPU (AMD 7000 series)
  • GPU-enabled scripts to verify (StarNet++)

Optional: WSL2 ROCm Requirements

These steps are only needed if you run WSL2 Ubuntu inside Windows. Only tested on 7000 series GPUs.

  • Install the latest Radeon driver available for Windows (as of 11/1/2024)
  • Reboot
  • Jump to the Install ROCm – Windows WSL2 section.

Install ROCm – Ubuntu (no WSL2)

These steps are for bare metal / VM running Ubuntu 24.04. Please be sure that you have completed all apt updates prior to installing and make sure you have plenty of free disk space available.

sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo usermod -a -G render,video $LOGNAME # Adding current user to Video, Render groups. See prerequisites.
sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.2.3/ubuntu/noble/amdgpu-install_6.2.60203-1_all.deb 
sudo apt install ./amdgpu-install_6.2.60203-1_all.deb
amdgpu-install --usecase=rocm,dkms
echo "Please reboot system for all settings to take effect."

Be sure to reboot as the script above says. The install builds the kernel drivers and, together with the usermod step, sets up the render/video group permissions so your user can access the GPU. GPU acceleration in PixInsight with TensorFlow-based apps will NOT work until you restart.
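After the reboot, one quick sanity check is to confirm that your user really ended up in the render and video groups. The sketch below is only one way to do that; the function name missing_gpu_groups is made up for this example and is not part of ROCm or amdgpu-install.

```shell
# missing_gpu_groups: print each required group (render, video) that is
# absent from the group names passed as arguments.
# Illustrative helper only; not part of ROCm or amdgpu-install.
missing_gpu_groups() {
    for required in render video; do
        case " $* " in
            *" $required "*) ;;        # group is present, print nothing
            *) echo "$required" ;;     # group is absent
        esac
    done
}

# id -nG prints the current user's group names separated by spaces.
missing=$(missing_gpu_groups $(id -nG))
if [ -z "$missing" ]; then
    echo "OK: user is in render and video"
else
    echo "Missing groups: $missing (re-run the usermod line, then log out and back in)"
fi
```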

Note: If PixInsight crashes with Qt errors on Ubuntu, please be sure to install the following packages and their dependencies.

sudo apt-get install libqt5core5a libqt5svg5 libqt5webenginecore5 libqt5webengine5 libqt5webenginewidgets5 libqt5x11extras5

Install ROCm – Windows WSL2

These steps are for WSL2 on Windows 10/11 running Ubuntu 24.04.

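# The WSL2 kernel is supplied by Microsoft, so Ubuntu has no matching linux-headers or linux-modules-extra packages. They are not needed here because the install below uses --no-dkms.
sudo apt update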
wget https://repo.radeon.com/amdgpu-install/6.2.3/ubuntu/noble/amdgpu-install_6.2.60203-1_all.deb 
sudo apt install ./amdgpu-install_6.2.60203-1_all.deb
wget https://repo.radeon.com/amdgpu/6.2.3/ubuntu/pool/main/h/hsa-runtime-rocr4wsl-amdgpu/hsa-runtime-rocr4wsl-amdgpu_1.14.0-2057403.24.04_amd64.deb
sudo apt install ./hsa-runtime-rocr4wsl-amdgpu_1.14.0-2057403.24.04_amd64.deb
amdgpu-install -y --usecase=wsl,rocm --no-dkms
echo "Please restart WSL2 session for these to take effect."
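If you are not sure which of the two install sections applies to the shell you are sitting in, the kernel release string is a reasonable hint. This is a minimal sketch that assumes WSL2 kernels carry "microsoft" in their release name (for example 5.15.153.1-microsoft-standard-WSL2); the function name is_wsl_kernel is hypothetical.

```shell
# is_wsl_kernel: succeed when the kernel release string in $1 looks like WSL.
# Assumption: WSL kernels contain "microsoft" (any case of the first letter).
is_wsl_kernel() {
    case "$1" in
        *[Mm]icrosoft*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_wsl_kernel "$(uname -r)"; then
    echo "WSL2 detected: follow the Windows WSL2 steps"
else
    echo "No WSL2 detected: follow the bare metal / VM steps"
fi
```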

Verify with rocminfo

You should be able to run rocminfo and see that your GPU is detected and operating within ROCm on Ubuntu/WSL.

byron@3900X:~$ rocminfo
WSL environment detected.
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          NO

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    CPU
  Uuid:                    CPU-XX
  Marketing Name:          CPU
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
  Chip ID:                 0(0x0)
  Cacheline Size:          64(0x40)
  Internal Node ID:        0
  Compute Unit:            24
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  Memory Properties:
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    49331708(0x2f0bdfc) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    49331708(0x2f0bdfc) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1100
  Marketing Name:          AMD Radeon RX 7900 XTX
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        16(0x10)
  Queue Min Size:          4096(0x1000)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      32(0x20) KB
    L2:                      6144(0x1800) KB
    L3:                      98304(0x18000) KB
  Chip ID:                 29772(0x744c)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2371
  Internal Node ID:        1
  Compute Unit:            96
  SIMDs per CU:            2
  Shader Engines:          6
  Shader Arrs. per Eng.:   2
  Coherent Host Access:    FALSE
  Memory Properties:
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 2280
  SDMA engine uCode::      21
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    25080996(0x17eb4a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Recommended Granule:0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1100
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
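The full output is long, so it can help to boil it down to just the GPU agents. The sketch below assumes the field layout shown in the output above (Name, Marketing Name, then Device Type for each agent); list_gpu_agents is a name invented for this example.

```shell
# list_gpu_agents: read rocminfo output on stdin and print
# "<name><TAB><marketing name>" for every agent whose Device Type is GPU.
# Relies on Name and Marketing Name appearing before Device Type, as above.
list_gpu_agents() {
    awk '
        /^ *Name:/           { name = $2 }
        /^ *Marketing Name:/ { sub(/^ *Marketing Name: */, ""); sub(/ *$/, ""); marketing = $0 }
        /^ *Device Type:/    { if ($3 == "GPU") print name "\t" marketing }
    '
}

# Only call rocminfo when it is actually installed.
if command -v rocminfo >/dev/null 2>&1; then
    rocminfo | list_gpu_agents
fi
```

If nothing is printed even though rocminfo is installed, ROCm is not seeing a GPU agent and the TensorFlow library further down will most likely fall back to the CPU.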

Download libtensorflow.so for AMD

I compiled this on Ubuntu 22.04 against ROCm 6.x using kernel 6.6. Please make sure you are on kernel 6.6 or, if 6.8 is available when you read this, kernel 6.8 or higher.
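On a bare metal or VM install you can compare your running kernel against that 6.6 minimum with a few lines of shell. This is a sketch that leans on sort -V (version sort from GNU coreutils, which Ubuntu ships); kernel_at_least is a hypothetical helper name.

```shell
# kernel_at_least: succeed when kernel release $1 is at least version $2.
# Example: kernel_at_least "6.8.0-45-generic" 6.6
kernel_at_least() {
    have=${1%%-*}     # drop the "-45-generic" style suffix
    want=$2
    # After a version sort, the wanted minimum must come out first (or be equal).
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

if kernel_at_least "$(uname -r)" 6.6; then
    echo "Kernel $(uname -r) is 6.6 or newer"
else
    echo "Kernel $(uname -r) is older than 6.6"
fi
```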

Please download the latest release.

Updated on 07/02/2024 – Libtensorflow217 – compiled against ROCm 6.x

Install libtensorflow into PixInsight

Backup your original libtensorflow in PixInsight:

cd /opt/PixInsight/bin/lib
sudo mkdir bak   # /opt/PixInsight is normally owned by root, hence sudo
sudo mv libtensorflow* bak

Extract the downloaded file to /usr/local, replacing /path/to/Downloads with the directory you downloaded it to.

sudo tar -C /usr/local -xzf /path/to/Downloads/libtensorflow217.tar.gz

Run ldconfig to refresh the system's shared library cache

sudo ldconfig /usr/local/lib

Exit/Restart PixInsight and try StarNet or run StarNet CLI. PixInsight won’t load this library if you don’t restart after running ldconfig.
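Before restarting PixInsight you may want to confirm that the linker cache actually picked up the new library. A small sketch, assuming the tarball placed its libtensorflow files under /usr/local/lib; tensorflow_paths is a made-up helper that simply filters the ldconfig -p listing.

```shell
# tensorflow_paths: read `ldconfig -p` output on stdin and print the resolved
# path (the last field) of every libtensorflow entry in the linker cache.
tensorflow_paths() {
    awk '$1 ~ /^libtensorflow/ { print $NF }'
}

# ldconfig may live in /sbin, so only call it when it is on the PATH.
if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p | tensorflow_paths
fi
```

If the tarball landed where this guide expects, the printed paths should point into /usr/local/lib. No output means the cache has no libtensorflow entry at all, so re-check the tar and ldconfig steps.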

Update Mesa & Drivers

Optional: PixInsight may need updated Mesa drivers on both WSL2 and standard Ubuntu installations. The issue was more pronounced with Ubuntu 22.04, which is why this guide now concentrates solely on 24.04. If you run into graphics or performance issues, consider installing these updated components; otherwise you can skip this section on Ubuntu 24.04.

sudo add-apt-repository ppa:kisak/kisak-mesa
sudo apt update
sudo apt upgrade

Once this is complete, reboot, since the upgrade may also have updated the kernel blobs for your graphics driver.

GPU accelerated StarNet++

Now let’s verify that StarNet will use the AMD GPU. The StarNet CLI ships its own libtensorflow.so in its directory, which you will need to replace with the AMD build from this guide if you run the CLI version. If you run StarNet inside PixInsight, it should use the one we installed in your ld path.
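For the CLI version, the swap can follow the same backup-then-replace pattern used for PixInsight earlier. This is a sketch only: swap_bundled_tensorflow is a hypothetical helper, and it assumes the AMD libraries from the tarball sit in /usr/local/lib.

```shell
# swap_bundled_tensorflow: back up the libtensorflow* files in a StarNet CLI
# directory ($1) into $1/bak, then copy in the ones found in directory $2.
swap_bundled_tensorflow() {
    starnet_dir=$1
    lib_dir=$2
    mkdir -p "$starnet_dir/bak" || return 1
    # Move the bundled libraries out of the way first.
    for f in "$starnet_dir"/libtensorflow*; do
        [ -e "$f" ] || [ -L "$f" ] || continue   # glob matched nothing
        mv "$f" "$starnet_dir/bak/" || return 1
    done
    # Copy the replacements, keeping symlinks as symlinks.
    cp -a "$lib_dir"/libtensorflow* "$starnet_dir"/
}

# Usage (paths are examples; adjust them to your own download location):
# swap_bundled_tensorflow ~/Downloads/StarNetv2CLI_linux /usr/local/lib
```

Run it once only. A second run would move the AMD libraries into bak and overwrite the originals saved there.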

bymiller@byron-X570:~/Downloads/StarNetv2CLI_linux$ sh run_starnet.sh 
2024-05-12 21:01:54.012227: E external/local_xla/xla/stream_executor/plugin_registry.cc:90] Invalid plugin kind specified: DNN
2024-05-12 21:01:54.033232: E external/local_xla/xla/stream_executor/plugin_registry.cc:90] Invalid plugin kind specified: BLAS
Reading input image... Done!
Bits per sample: 16
Samples per pixel: 3
Height: 712
Width: 1048
Restoring neural network checkpoint... Done!
2024-05-12 21:01:54.384899: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-12 21:01:54.436196: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-12 21:01:54.458272: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-12 21:01:54.458309: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-12 21:01:54.458427: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-12 21:01:54.458460: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-12 21:01:54.458493: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-12 21:01:54.458511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 23512 MB memory:  -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:0a:00.0
Total number of tiles: 15
2024-05-12 21:01:54.798149: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
100% finished

Done!
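The line to look for in that output is the "Created device ... GPU:0" message, which names the card TensorFlow picked. If you save the log, a small filter can pull the name out for you. This sketch assumes the log line has the same shape as the one above; starnet_gpu_name is an invented helper name.

```shell
# starnet_gpu_name: read StarNet/TensorFlow log text on stdin and print the
# device name from any "Created device ... device:GPU:<n> ... name: <x>," line.
starnet_gpu_name() {
    sed -n 's/.*Created device .*device:GPU:[0-9]* .*name: \([^,]*\),.*/\1/p'
}

# Usage: capture the log (including stderr), then filter it.
# sh run_starnet.sh 2>&1 | tee starnet.log
# starnet_gpu_name < starnet.log
```

An empty result means no GPU device was created, which usually points back at the ROCm install or at StarNet still loading its bundled libtensorflow.so.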

Hooray! AMD users rejoice at beautiful GPU accelerated StarNet++ (and other PixInsight tools that are GPU enabled).

Radeontop (Bare metal/VM only)

You can install radeontop and run it in a shell while you run StarNet or other TensorFlow jobs and see the GPU load spike.

Note: This only works on VM and bare metal. GPU passthrough on WSL2 does not show up as a Linux device the way radeontop and nvtop expect. You can use Windows Task Manager to view your GPU resources.

sudo apt install radeontop
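If you would rather log the load than watch the live display, radeontop can also dump samples as text with its -d (dump file, - for stdout) and -l (number of samples) options. The sketch below pulls the GPU percentage out of each sample. It assumes dump lines look like "<timestamp>: bus 0a, gpu 45.83%, ee 0.00%, ...", which you should check against your own radeontop version; gpu_load_percent is a hypothetical helper.

```shell
# gpu_load_percent: read radeontop dump lines on stdin and print the GPU
# utilisation percentage from each line.
# Assumed line shape: "<timestamp>: bus 0a, gpu 45.83%, ee 0.00%, ..."
gpu_load_percent() {
    sed -n 's/.*, gpu \([0-9.]*\)%.*/\1/p'
}

# Usage (bare metal / VM only): take five samples while StarNet is running.
# radeontop -d - -l 5 | gpu_load_percent
```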

View GPU load on Windows

On the Windows machine, right-click the Start icon, select Task Manager, then click Performance and select GPU. When you run a PixInsight process that uses the GPU, you can see it spike here in Windows.

GPU Pass through load for ROCm on Windows WSL2 Ubuntu 22.04

Here you can see the memory and compute spike when running NoiseXTerminator. This process would usually take 20 minutes or longer on full frame masters but now finishes in seconds.

GPU load reflecting WSL2 GPU pass through on ROCm and libtensorflow for PixInsight

AMD GPU Accelerated RC-Astro

I’m happy to report that the amazing RC-Astro tools all work with this setup as well.

StarXTerminator – Remove stars or create star mask.

NoiseXTerminator – AI Noise Removal.

BlurXTerminator – Deconvolution.

You can download the trial version of these from: https://www.rc-astro.com/pixinsight-installation-instructions/

Comments

Please leave a comment below if this works or doesn’t work for you. I have another TensorFlow library built on 2.13 that may work with older AMD GPUs if this one doesn’t work for you.

May your skies be clear.

13 thoughts on “PixInsight GPU acceleration for AMD GPUs”

  1. Hi Byron – thanks for sharing – this is a breakthrough! Managed to get it working on Arch with ubuntu 22.04 going in distrobox using your guide (starnet++ cli). Fingers are crossed for PixInsight.
    All the best and thanks once again.
    Cheers

    1. Awesome news! It should work on Debian based systems. I’ll be trying to get it to work on Fedora as well next. Which video card do you have? I’d like to track ones it works on.

      1. yep very awesome. good luck with Fedora – maybe Arch after that 😉 – the way pixinsight installs appears to not play ball w/ the distrobox sandboxing. i have a 7900xtx in the main rig – I do also have a 6900xt but that is in the home server for a windoze vm (isn’t getting much use – if i get some time i could spin up a ubuntu or fedora vm and see if it works on the older libraries).

  2. Hi Byron –
    I’m not a linux expert by any means but I stopped at the end of the first step (Install ROCM) because after the instal amdgpu-dkms and install rocm steps it said it couldn’t find either package.

    Also it looks like linux mint is behind the times a bit, I can’t upgrade to any kernel past 6.5 at the moment.

    1. I don’t have linux mint to test. Ubuntu has vendor supported ROCm and Fedora has upstreamed ROCm. I’ll be trying out the latest ROCm build to confirm on Ubuntu 22.04. Can’t wait for official support of 24.04

  3. The AMD Adrenalin drivers for WSL2 do not support 6600 series as of yet and installing the linux headers in WSL2 only works by compiling them yourself (maybe update the tutorial on this or am I missing something?).

  4. Works perfectly on Ubuntu 24.04LTS with a 7900XT – StarXterminator went from 3m16s to 13s (roughly 25x speed!), thank you very much Byron!

  5. Hi! How did you get kernel 6.6 on Ubuntu 22.04? The mainline kernel doesn’t work due to wrong libc version. I managed to get it manually compiled and installed, but when I install rocm, dkms complains about unsupported kernel. Should I try Ubuntu 24.04 instead? Thanks!

  6. Thanks for sharing this guide! I am on WSL2 Windows 11 and have 24.04 LTS installed. PixInsight is already installed. When I try to start following your guide with the line below:

    sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"

    I get

    Reading package lists… Done
    Building dependency tree… Done
    Reading state information… Done
    E: Unable to locate package linux-headers-5.15.153.1-microsoft-standard-WSL2
    E: Couldn’t find any package by glob ‘linux-headers-5.15.153.1-microsoft-standard-WSL2’
    E: Unable to locate package linux-modules-extra-5.15.153.1-microsoft-standard-WSL2

    Any ideas on what to do in this case?
