Access to Arnold and Majda is currently restricted to certain users. If you need them for research during academic year 2023-2024, email zhan1966@purdue.edu.

Step 0: how to access Chongzhi/Arnold/Majda

1. Chongzhi is a GPU server with 3 Nvidia RTX A5000 GPUs and 2 AMD CPUs.
2. Arnold is a GPU server with two Intel Xeon Gold 5220R processors (each with 24 cores, 48 threads, 2.20 GHz, 35.75 MB cache), 2 TB total RAM, around 8 TB disk, and 10 Nvidia Quadro RTX 8000 GPUs with 48 GB GDDR6 memory each.
3. Majda is a GPU server with 4 Nvidia A100 80 GB GPUs, 512 GB total RAM, and 2 Intel Xeon Silver 4310 CPUs (each with 12 cores, 24 threads).

The following uses Arnold as an example; the steps are the same for Chongzhi and Majda.

Connect to Arnold from a math Linux computer

If you have a Linux desktop in the math department, simply use ssh (assume your math account ID is dave72 and your Linux desktop is named euler):

euler ~ % ssh arnold
dave72@arnold's password:

Connect to Arnold remotely

Suppose you want to connect to Arnold from an off-campus computer. On a Linux or Apple computer, open the terminal and connect to banach first (assume you have a MacBook and your username is dave72):

MacBook-Pro:~ dave% ssh dave72@banach.math.purdue.edu
dave72@banach.math.purdue.edu's password:

Then connect to Arnold (you cannot ssh to arnold.math.purdue.edu directly from an off-campus computer):

banach ~ % ssh arnold
dave72@arnold's password:
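
The two hops above can usually be collapsed into one command with OpenSSH's -J (jump host) option. The sketch below only builds and prints that command. It assumes that your local OpenSSH is version 7.3 or newer and that the department permits jumping through banach, neither of which is stated on this page. The helper name build_jump_command is made up for illustration.

```python
import shlex


def build_jump_command(user, target="arnold",
                       jump_host="banach.math.purdue.edu"):
    """Return the argv list for a one-step SSH login through a jump host.

    Assumption: the same account name is used on both machines, as in
    the dave72 example above.
    """
    # -J tells ssh to log in to the jump host first and tunnel from there.
    return ["ssh", "-J", f"{user}@{jump_host}", f"{user}@{target}"]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing connects by accident.
    print(shlex.join(build_jump_command("dave72")))
```

You will still be asked for a password on each hop unless SSH keys are set up.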

If you have a Windows computer, you need to install an SSH client such as PuTTY.

Step 1: check usage of GPUs

There is no scheduler installed, so please avoid occupying all GPUs. Also avoid any intensive CPU jobs on Arnold.

Use top to check the current CPU usage:

arnold ~ % top

To check the current usage of the GPUs, use nvidia-smi:

arnold ~ % nvidia-smi
 +-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57 Driver Version: 515.57 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro RTX 8000 Off | 00000000:1A:00.0 Off | Off |
| 33% 24C P8 24W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 8000 Off | 00000000:1B:00.0 Off | Off |
| 33% 25C P8 22W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Quadro RTX 8000 Off | 00000000:1C:00.0 Off | Off |
| 33% 27C P8 31W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Quadro RTX 8000 Off | 00000000:1D:00.0 Off | Off |
| 33% 27C P8 24W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 Quadro RTX 8000 Off | 00000000:1E:00.0 Off | Off |
| 33% 27C P8 33W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 Quadro RTX 8000 Off | 00000000:3D:00.0 Off | Off |
| 33% 24C P8 28W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 Quadro RTX 8000 Off | 00000000:3E:00.0 Off | Off |
| 33% 27C P8 25W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 Quadro RTX 8000 Off | 00000000:3F:00.0 Off | Off |
| 33% 24C P8 22W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 8 Quadro RTX 8000 Off | 00000000:40:00.0 Off | Off |
| 33% 27C P8 20W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 9 Quadro RTX 8000 Off | 00000000:41:00.0 Off | Off |
| 33% 26C P8 23W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
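
Reading the full table by eye gets tedious with ten GPUs. As a rough sketch, the script below asks nvidia-smi for a compact CSV report and lists the GPUs that look idle. The --query-gpu and --format options are standard nvidia-smi flags; the function names and the two thresholds are arbitrary choices made here for illustration.

```python
import subprocess

# Query fields are listed by `nvidia-smi --help-query-gpu`.
QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV lines into a list of dicts (memory in MiB)."""
    gpus = []
    for line in text.strip().splitlines():
        index, used, total, util = (field.strip() for field in line.split(","))
        gpus.append({"index": int(index), "mem_used": int(used),
                     "mem_total": int(total), "util": int(util)})
    return gpus


def idle_gpus(gpus, max_util=5, max_mem_fraction=0.05):
    """Indices of GPUs whose utilization and memory use are both low.

    The two thresholds are illustrative, not a site policy.
    """
    return [g["index"] for g in gpus
            if g["util"] <= max_util
            and g["mem_used"] <= max_mem_fraction * g["mem_total"]]


if __name__ == "__main__":
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True,
                             check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is not available on this machine")
    else:
        print("Idle GPUs:", idle_gpus(parse_gpu_csv(out)))
```

Since there is no scheduler, a GPU that looks idle now may be taken a moment later, so check again right before starting a job.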


Step 2: load software

Software on Arnold is managed with the module command. To list what is installed:

arnold ~ % module avail

You will see a list of installed software. Use module load to load what you need. For example, if magma is needed:

arnold ~ % module load magma/2.20-10

Step 3: a demonstration of using GPU in MATLAB

Download the testing code. MATLAB 2023 is needed; it is available on Arnold and Majda. This example accelerates a simple 3D Poisson solver on Majda. See Section 2.8 of the MA 615 notes for details of the simple eigenvector method for inverting the Laplacian, which has N^{4/3} complexity for a 3D problem with N unknowns. See also this paper for more details. Note that GPU acceleration can be observed only for large enough problems; for example, 100^3 might be too small to show any speedup.
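
To make the N^{4/3} count concrete, here is a sketch of the eigenvector method in its usual tensor-product form. It assumes a grid with n points per direction (so N = n^3 unknowns) and a symmetric 1D second-difference matrix K, as for Dirichlet boundary conditions. The notation is chosen here and may differ from the MA 615 notes.

```latex
% 1D eigendecomposition (K symmetric, so S is orthogonal)
K = S \Lambda S^{T}, \qquad \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)

% 3D discrete Poisson equation on an n x n x n grid
(K \otimes I \otimes I + I \otimes K \otimes I + I \otimes I \otimes K)\, u = f

% Solve in three steps: transform, divide, transform back
\hat{f} = (S^{T} \otimes S^{T} \otimes S^{T})\, f, \qquad
\hat{u}_{ijk} = \frac{\hat{f}_{ijk}}{\lambda_i + \lambda_j + \lambda_k}, \qquad
u = (S \otimes S \otimes S)\, \hat{u}

% Cost: each Kronecker factor is an n x n matrix applied along one axis of an
% n x n x n array, i.e. O(n^4) operations, and there are six such products
O(n^{4}) = O(N^{4/3}), \qquad N = n^{3}
```

The six products are dense matrix multiplications, which is the kind of work a GPU handles well.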

First, always check which GPU devices are available, since there is no job queue and everything runs interactively.

majda ~ % nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100 80GB PCIe Off | 00000000:17:00.0 Off | 0 |
| N/A 47C P0 92W / 300W | 12858MiB / 81251MiB | 26% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 A100 80GB PCIe Off | 00000000:65:00.0 Off | 0 |
| N/A 55C P0 106W / 300W | 3224MiB / 81251MiB | 34% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 2 A100 80GB PCIe Off | 00000000:CA:00.0 Off | 0 |
| N/A 48C P0 92W / 300W | 3478MiB / 81251MiB | 30% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 3 A100 80GB PCIe Off | 00000000:E3:00.0 Off | 0 |
| N/A 38C P0 67W / 300W | 50936MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1631494 C python 2641MiB |
| 0 N/A N/A 1631585 C python 2641MiB |
| 0 N/A N/A 1631629 C python 2641MiB |
| 0 N/A N/A 1631718 C python 2639MiB |
| 0 N/A N/A 1634638 C python 2291MiB |
| 1 N/A N/A 1634639 C python 3221MiB |
| 2 N/A N/A 1634637 C python 3475MiB |
| 3 N/A N/A 1555380 C ...r2023a/bin/glnxa64/MATLAB 50933MiB |
+-----------------------------------------------------------------------------+

In this case, GPU number 3 looks available (0% utilization, although a MATLAB process is holding about 50 GB of its memory), while the other three are being used. In MATLAB, the device number would be 4, since MATLAB counts from 1 (GPU 0 is labeled 1 in MATLAB). The demo code sets the default device number to 1.
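
The off-by-one between the two numbering schemes is easy to get wrong, so here is a tiny sketch of the conversion. It assumes, as the text above does, that MATLAB lists the GPUs in the same order as nvidia-smi; the function name is made up for illustration.

```python
def matlab_gpu_index(nvidia_smi_index):
    """Convert a 0-based nvidia-smi GPU number to MATLAB's 1-based index."""
    if nvidia_smi_index < 0:
        raise ValueError("GPU numbers reported by nvidia-smi start at 0")
    return nvidia_smi_index + 1


if __name__ == "__main__":
    # GPU 3 in the nvidia-smi table above corresponds to device 4 in MATLAB.
    for gpu in range(4):
        print(f"nvidia-smi GPU {gpu} -> MATLAB device {matlab_gpu_index(gpu)}")
```

In MATLAB itself, the device found this way would be selected with gpuDevice(4).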

Open MATLAB in command-line mode:

majda ~ % matlab -nodisplay 

< M A T L A B (R) >
Copyright 1984-2023 The MathWorks, Inc.
R2023a Update 2 (9.14.0.2254940) 64-bit (glnxa64)
April 17, 2023

Warning: X does not support locale C.UTF-8

To get started, type doc.
For product information, visit www.mathworks.com.

>> run ('Poisson3Ddemo.m')
This is a code solving 3D Poison on a grid of size 200 by 200 by 200
scheme is 2nd order centered difference
GPU computation: starting to load matrices/data
GPU computation: loading finished and GPU computing started
The ell-2 norm residue is 7.009260e-11
The GPU online computation time is 1.805100e-02

On Majda, for a 1000^3 grid, the online computation takes about 0.8 seconds of GPU time:

 ~ % matlab -nodisplay

< M A T L A B (R) >
Copyright 1984-2023 The MathWorks, Inc.
R2023a Update 2 (9.14.0.2254940) 64-bit (glnxa64)
April 17, 2023
>> run ('Poisson3Ddemo.m')
This is a code solving 3D Poison on a grid of size 1000 by 1000 by 1000
scheme is 2nd order centered difference
GPU computation: starting to load matrices/data
GPU computation: loading finished and GPU computing started
The ell-2 norm residue is 4.851762e-09
The GPU online computation time is 7.683490e-01

The same method also applies to very high-order finite element methods on Cartesian meshes. See this page.

Keep in mind that you should NOT run large CPU jobs on GPU servers. Test large CPU jobs on your own desktop or on CPU servers. If the demo code is run on a computer without any GPU device, it will do the computation on the CPU (you can also simply set Param.device = 'cpu' in the demo code):

 ~ % matlab -nodisplay

< M A T L A B (R) >
Copyright 1984-2023 The MathWorks, Inc.
R2023a Update 2 (9.14.0.2254940) 64-bit (glnxa64)
April 17, 2023
>> run ('Poisson3Ddemo.m')
This is a code solving 3D Poison on a grid of size 200 by 200 by 200
scheme is 2nd order centered difference
The ell-2 norm residue is 6.990211e-11
The CPU online computation time is 1.212430e-01
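
The timings printed above allow two quick sanity checks. The sketch below uses only numbers copied from the transcripts. Note that the CPU run was on a different machine, so the speedup is indicative at best, and the n^4 cost model is simply the N^{4/3} estimate from this step rewritten in terms of the grid size n.

```python
# Timings copied from the MATLAB transcripts above (seconds).
gpu_200 = 1.805100e-02    # GPU, 200^3 grid
gpu_1000 = 7.683490e-01   # GPU, 1000^3 grid
cpu_200 = 1.212430e-01    # CPU, 200^3 grid (a different machine)

# Speedup of the GPU over the CPU on the 200^3 grid.
speedup = cpu_200 / gpu_200

# Measured growth in GPU time from n = 200 to n = 1000, against the
# growth predicted by an O(n^4) operation count.
measured_growth = gpu_1000 / gpu_200
predicted_growth = (1000 / 200) ** 4

print(f"GPU vs CPU speedup at 200^3: {speedup:.1f}x")
print(f"GPU time growth, 200^3 -> 1000^3: {measured_growth:.1f}x")
print(f"O(n^4) prediction for the same step: {predicted_growth:.0f}x")
```

The measured growth (about 43x) is far below the 625x that the operation count predicts, which fits the remark that small problems do not show the full GPU acceleration.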

Step 4: a demonstration of using GPU in Python JAX

On each GPU machine (e.g., Majda), install JAX in your own account via conda, a tool for managing software packages and environments.

First, create an environment named "myenv" (any other name works too). Then activate the environment and install JAX inside it.

 ~ % conda create -n myenv
....
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
~ % conda activate myenv
(myenv) % pip install --upgrade "jax[cuda12]"
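
After the install, it is worth confirming that JAX can actually see the GPUs before running anything large. The check below is a minimal sketch: it relies only on jax.devices(), the function name is made up for illustration, and it falls back to a message when JAX is missing.

```python
import importlib.util


def list_jax_devices():
    """Return the devices JAX can see as strings, or [] if JAX is missing."""
    if importlib.util.find_spec("jax") is None:
        return []
    import jax
    # jax.devices() lists the devices of the default backend
    # (GPUs when a CUDA-enabled install finds them, otherwise the CPU).
    return [str(device) for device in jax.devices()]


if __name__ == "__main__":
    devices = list_jax_devices()
    if devices:
        print("JAX sees:", ", ".join(devices))
    else:
        print("JAX is not installed in this environment")
```

If only CPU devices are listed on a GPU server, the CUDA-enabled install did not take effect in the active environment.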

Next, download the two Python JAX demo codes for solving a 3D Poisson equation with second-order finite differences: Jax_double.py uses double precision and Jax_single.py uses single precision.


For double precision, a problem size as large as 1000^3 should be fine.

(myenv) % python Jax_double.py
Available GPUs:
W external/xla/xla/service/platform_util.cc:206] unable to create StreamExecutor for CUDA:0: failed..
[CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3)]
Choosing to use GPU id= 2
Solving Poisson of size n^3 with n= 1000
precision: float64
Computational Time is 1.3882017135620117
ell 2 error: 2.2818226843766946e-05

Remark: a GPU index can be out of range in Python for various reasons. For example, Majda should expose four GPUs in Python: id=0, id=1, id=2, id=3. In the example above, GPU id=0 was under heavy use and could not be initialized, so the list of available devices shrank to "[CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3)]". In this case "jax.default_device=jax.devices("gpu")[3]" raises a device index out of range error, because the list has only three entries. The remedy is to use "jax.default_device=jax.devices("gpu")[2]" instead: the device with id=3 now sits at list position 2.
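
One way to sidestep the shifting positions is to look a device up by its id attribute rather than by its place in the list. The sketch below is not part of the demo codes. The function name is made up, and simple stand-in objects replace real JAX devices so that the script runs without a GPU.

```python
from types import SimpleNamespace


def device_by_id(devices, wanted_id):
    """Return the device whose .id equals wanted_id, or None if it is absent.

    Matching on .id avoids relying on the position in the list, which
    shifts when a device fails to initialize.
    """
    for device in devices:
        if device.id == wanted_id:
            return device
    return None


if __name__ == "__main__":
    # Stand-ins for [CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3)];
    # with JAX installed, pass jax.devices("gpu") instead.
    visible = [SimpleNamespace(id=i) for i in (1, 2, 3)]
    print(device_by_id(visible, 3))   # found, although visible[3] would fail
    print(device_by_id(visible, 0))   # None: GPU 0 could not be initialized
```

With real devices, the returned object can be used in a "with jax.default_device(...)" block or passed to jax.device_put.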


For single precision, we can push to a problem size as large as 1300^3.

(myenv) % python Jax_single.py
Available GPUs:
[CudaDevice(id=0), CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3)]
Choosing to use GPU id= 2
Solving Poisson of size n^3 with n= 1300
The preparation computation precision
precision: float64
The Poisson solver computation precision
precision: float32
Computational Time is 1.3602116107940674
ell 2 error: 0.00050684914
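
A rough memory count suggests why single precision allows a larger grid. The sketch below counts only a single n x n x n array; how many such arrays the demo codes allocate is not stated on this page, so the figures are lower bounds rather than actual footprints.

```python
def array_gib(n, bytes_per_value):
    """Memory in GiB for one n x n x n array of the given element size."""
    return n ** 3 * bytes_per_value / 2 ** 30


if __name__ == "__main__":
    # float64 uses 8 bytes per value, float32 uses 4.
    print(f"1000^3 in float64: {array_gib(1000, 8):.2f} GiB per array")
    print(f"1300^3 in float32: {array_gib(1300, 4):.2f} GiB per array")
    print(f"1300^3 in float64: {array_gib(1300, 8):.2f} GiB per array")
```

A 1300^3 array in single precision is close in size to a 1000^3 array in double precision, while the same grid in double precision would need twice as much memory.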

To exit the environment:

(myenv) % conda deactivate

Author: Xiangxiong Zhang.