An example showing how to use JAX to train ResNet-50 with multiple GPUs across multiple nodes

This repo shows how to use JAX for multi-node, multi-GPU training. The example is adapted from the ResNet-50 example in dm-haiku (https://github.com/deepmind/dm-haiku/tree/main/examples/imagenet). It only requires that each node know the IP address of the rank-0 node, much like PyTorch's DDP. Once two containers are running on the same cluster, one can run the following command in each container to launch a multi-node, multi-GPU training job: python train.py --server_ip=$ROOT_IP --server_port=$PORT --num_hosts=$NUM_HOSTS --host_idx=$HOST_IDX
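Flags like these typically feed JAX's distributed initialization, where every host names the same coordinator and its own rank. Below is a hypothetical sketch (the repo's actual entry point may differ) of how the four flags could map onto `jax.distributed.initialize`:

```python
# Hypothetical sketch: how launch flags like the ones above could map onto
# jax.distributed.initialize. The repo's actual wiring may differ.
import argparse


def make_init_kwargs(argv):
    """Translate the launch flags into keyword arguments for
    jax.distributed.initialize (the rank-0 node hosts the coordinator)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--server_ip", required=True)   # IP of the rank-0 node
    parser.add_argument("--server_port", required=True)
    parser.add_argument("--num_hosts", type=int, required=True)
    parser.add_argument("--host_idx", type=int, required=True)
    args = parser.parse_args(argv)
    return dict(
        coordinator_address=f"{args.server_ip}:{args.server_port}",
        num_processes=args.num_hosts,
        process_id=args.host_idx,
    )

# Each host would then call, with identical coordinator settings:
#   jax.distributed.initialize(**make_init_kwargs(sys.argv[1:]))
```

Because every host receives the same `--server_ip`/`--server_port` but a distinct `--host_idx`, the same command line works unchanged on all nodes.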

Read more

High performance ptychography reconstruction python package running on GPU

Quickstart: Python 3 is needed to run the GUI and ptychopy; the other required libraries are in requirement.txt (tested on RHEL 6.0 and 7.0). This library can also be compiled as a CUDA-C library. Inside the src folder, edit build.sh with your HDF library path. A conda virtual environment is recommended, for example conda create -n py36 python=3.6 hdf5-external-filter-plugins-lz4 Activate the virtual environment source activate py36 To install and build the Python package, set the environment variables HDF5_BASE and CUDAHOME, which point to the installed paths of the HDF5 and CUDA libraries. […]
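Put together, the environment setup described above might look like the following; the HDF5 and CUDA paths are placeholders to adjust for your system:

```shell
# Illustrative setup sketch; the two export paths below are placeholders.
conda create -n py36 python=3.6 hdf5-external-filter-plugins-lz4
source activate py36

# Point the build at your HDF5 and CUDA installations.
export HDF5_BASE=/usr/local/hdf5
export CUDAHOME=/usr/local/cuda
```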

Read more

A simple command-line utility for querying and monitoring GPU status

Just less than nvidia-smi? NOTE: This works with NVIDIA Graphics Devices only; no AMD support as of now. Contributions are welcome! Self-Promotion: A web interface of gpustat is available (in alpha)! Check out gpustat-web. Usage $ gpustat Options: --color : Force colored output (even when stdout is not a tty) --no-color : Suppress colored output -u, --show-user : Display username of the process owner -c, --show-cmd : Display the process name -f, --show-full-cmd : Display full command and cpu stats […]
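The same per-GPU fields that gpustat prints can also be read programmatically. gpustat itself queries GPUs through NVML, but as an illustration of the underlying data, here is a small sketch that parses the CSV output of `nvidia-smi --query-gpu=...` (the sample string stands in for real output):

```python
# Illustrative sketch only: gpustat uses NVML internally, but the same
# status fields can be obtained by parsing CSV output from, e.g.:
#   nvidia-smi --query-gpu=index,name,temperature.gpu,utilization.gpu,\
#     memory.used,memory.total --format=csv,noheader,nounits
import csv
from io import StringIO

FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu",
          "memory.used", "memory.total"]


def parse_gpu_status(csv_text):
    """Parse one nvidia-smi CSV line per GPU into a list of dicts."""
    rows = []
    for row in csv.reader(StringIO(csv_text.strip())):
        rows.append({k: v.strip() for k, v in zip(FIELDS, row)})
    return rows


# Sample captured output for two GPUs (hypothetical values):
sample = """0, GeForce GTX 1080, 61, 32, 4113, 8114
1, GeForce GTX 1080, 52, 0, 10, 8114"""
status = parse_gpu_status(sample)
```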

Read more

Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters

Somoclu is a massively parallel implementation of self-organizing maps. It exploits multicore CPUs, can rely on MPI to distribute the workload across a cluster, and can be accelerated by CUDA. A sparse kernel is also included, which is useful for training maps on vector spaces generated in text-mining processes. Key features: Fast execution through parallelization: OpenMP, MPI, and CUDA are supported. Multi-platform: Linux, macOS, and Windows are supported. Planar and toroid maps. Rectangular and hexagonal […]
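To make concrete what Somoclu parallelizes, here is a minimal pure-Python sketch of the online SOM training rule on a planar rectangular grid: find the best-matching unit for each sample, then pull nearby grid nodes toward it with a Gaussian neighborhood. Grid size, learning rate, and radius schedule below are illustrative choices, not Somoclu's defaults or its API:

```python
# Minimal pure-Python sketch of online SOM training -- the workload that
# Somoclu distributes over cores, GPUs, or MPI ranks. Hyperparameters here
# are illustrative, not Somoclu defaults.
import math
import random


def train_som(data, rows=4, cols=4, dim=2, epochs=20,
              lr=0.5, radius=2.0, seed=0):
    rng = random.Random(seed)
    # Codebook: one weight vector per grid node (planar rectangular map).
    w = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    for t in range(epochs):
        frac = t / epochs
        cur_lr = lr * (1 - frac)                    # decaying learning rate
        cur_rad = max(radius * (1 - frac), 0.5)     # shrinking neighborhood
        for x in data:
            # Best-matching unit: grid node with the closest weight vector.
            bmu = min(range(rows * cols),
                      key=lambda i: sum((w[i][d] - x[d]) ** 2
                                        for d in range(dim)))
            br, bc = divmod(bmu, cols)
            for i in range(rows * cols):
                r, c = divmod(i, cols)
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                # Gaussian neighborhood: nearby nodes move most.
                h = math.exp(-grid_dist2 / (2 * cur_rad ** 2))
                for d in range(dim):
                    w[i][d] += cur_lr * h * (x[d] - w[i][d])
    return w
```

The inner BMU search and weight update are independent across grid nodes, which is why the algorithm maps well onto OpenMP threads and CUDA kernels.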

Read more