A simple command-line utility for querying and monitoring GPU status

Just less than nvidia-smi? NOTE: This works with NVIDIA graphics devices only; there is no AMD support as of now. Contributions are welcome! Self-promotion: a web interface of gpustat is available (in alpha)! Check out gpustat-web.

Usage: $ gpustat

Options:
--color : Force colored output (even when stdout is not a tty)
--no-color : Suppress colored output
-u, --show-user : Display the username of the process owner
-c, --show-cmd : Display the process name
-f, --show-full-cmd : Display the full command and CPU stats […]
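For scripting, gpustat can also be used as a Python library. A minimal sketch, assuming the package-level new_query() helper and the per-GPU attribute names shown here (check the project README for the exact API):

    import gpustat

    # Query the current status of all NVIDIA GPUs via NVML.
    stats = gpustat.new_query()
    for gpu in stats:
        print(f"[{gpu.index}] {gpu.name}: {gpu.memory_used}/{gpu.memory_total} MB")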

Read more

A Python library intended to liberate data scientists and machine learning engineers

lazycluster is a Python library intended to liberate data scientists and machine learning engineers by abstracting away cluster management and configuration so that they can focus on their actual tasks. In particular, it emphasizes easy and convenient cluster setup in Python for various distributed machine learning frameworks.

Highlights:
- High-level API for starting clusters: DASK, Hyperopt; more lazyclusters (e.g. Ray, PyTorch, TensorFlow, Horovod, Spark) to come
- Lower-level API for managing Runtimes or RuntimeGroups to a-/synchronously execute RuntimeTasks by […]
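A minimal sketch of the lower-level API named above, assuming the Runtime and RuntimeTask classes from the project README; 'host-1' is a placeholder for a reachable host:

    from lazycluster import Runtime, RuntimeTask

    # Define a task and execute it synchronously on a remote Runtime.
    task = RuntimeTask('hello-task').run_command('echo Hello World!')
    runtime = Runtime('host-1')  # placeholder hostname
    runtime.execute_task(task, execute_async=False)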

Read more

TensorFrames lets you manipulate Apache Spark’s DataFrames with TensorFlow programs

Note: TensorFrames is deprecated. You can use pandas UDFs instead. Experimental TensorFlow binding for Scala and Apache Spark. TensorFrames (TensorFlow on Spark DataFrames) lets you manipulate Apache Spark's DataFrames with TensorFlow programs. This package is experimental and is provided as a technical preview only. While the interfaces are all implemented and working, there are still some areas of low performance. Supported platforms: this package only officially supports 64-bit Linux platforms as a target. Contributions are welcome for other platforms. See the file project/Dependencies.scala for […]
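Since TensorFrames is deprecated in favor of pandas UDFs, here is a minimal PySpark sketch of the suggested replacement (standard Spark 3.x API; the DataFrame is made up for illustration):

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1.0,), (2.0,), (3.0,)], ["x"])

    # A scalar pandas UDF transforms whole pd.Series batches at once.
    @pandas_udf("double")
    def plus_one(v: pd.Series) -> pd.Series:
        return v + 1

    df.withColumn("y", plus_one("x")).show()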

Read more

A novel evolutionary computation framework for rapid prototyping and testing of ideas

DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data structures transparent. It works in perfect harmony with parallelisation mechanisms such as multiprocessing and SCOOP.

DEAP includes the following features:
- Genetic algorithms using any imaginable representation: list, array, set, dictionary, tree, NumPy array, etc.
- Genetic programming using prefix trees: loosely typed, strongly typed, automatically defined functions
- Evolution strategies (including CMA-ES)
- Multi-objective optimisation (NSGA-II, NSGA-III, SPEA2, MO-CMA-ES)
- Co-evolution (cooperative […]
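To make the toolbox-driven style concrete, here is a minimal OneMax sketch along the lines of the DEAP documentation (population size and operator settings are arbitrary):

    import random
    from deap import algorithms, base, creator, tools

    # Maximize the number of ones in a 50-bit string.
    creator.create("FitnessMax", base.Fitness, weights=(1.0,))
    creator.create("Individual", list, fitness=creator.FitnessMax)

    toolbox = base.Toolbox()
    toolbox.register("attr_bool", random.randint, 0, 1)
    toolbox.register("individual", tools.initRepeat, creator.Individual,
                     toolbox.attr_bool, 50)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)
    toolbox.register("evaluate", lambda ind: (sum(ind),))  # fitness is a tuple
    toolbox.register("mate", tools.cxTwoPoint)
    toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
    toolbox.register("select", tools.selTournament, tournsize=3)

    pop = toolbox.population(n=100)
    algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=20, verbose=False)
    print(max(sum(ind) for ind in pop))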

Read more

Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters

Somoclu is a massively parallel implementation of self-organizing maps. It exploits multicore CPUs, can rely on MPI to distribute the workload across a cluster, and can be accelerated by CUDA. A sparse kernel is also included, which is useful for training maps on vector spaces generated in text-mining processes.

Key features:
- Fast execution through parallelization: OpenMP, MPI, and CUDA are supported.
- Multi-platform: Linux, macOS, and Windows are supported.
- Planar and toroid maps.
- Rectangular and hexagonal […]
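A minimal sketch of training a small map from Python, assuming Somoclu's documented Somoclu(n_columns, n_rows, ...) constructor and train() method; the data here is random and purely illustrative:

    import numpy as np
    import somoclu

    # 200 random samples with 10 features (the library works on float32 data).
    data = np.random.rand(200, 10).astype(np.float32)

    # A 30x20 toroid map with a rectangular grid, two of the options listed above.
    som = somoclu.Somoclu(30, 20, maptype="toroid", gridtype="rectangular")
    som.train(data)
    print(som.bmus[:5])  # best-matching unit per sample (assumed attribute name)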

Read more

A PyTorch library for decentralized deep learning across the Internet

Hivemind: decentralized deep learning in PyTorch. Hivemind is a PyTorch library for decentralized deep learning across the Internet. Its intended usage is training one large model on hundreds of computers from different universities, companies, and volunteers.

Key features:
- Distributed training without a master node: a distributed hash table allows connecting computers in a decentralized network.
- Fault-tolerant backpropagation: forward and backward passes succeed even if some nodes are unresponsive or take too long to respond.
- Decentralized parameter averaging: iteratively aggregate updates from multiple workers […]
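A minimal sketch of starting the DHT node that underpins master-free training; a real swarm would pass initial_peers pointing at an existing node:

    import hivemind

    # Start a standalone DHT node; other peers can bootstrap from its addresses.
    dht = hivemind.DHT(start=True)
    print("Visible multiaddresses:", dht.get_visible_maddrs())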

Read more

A Python distributed computing library for modern computer clusters

Distributed Computing for AI Made Simple. This project is experimental and the APIs are not considered stable. Fiber is a Python distributed computing library for modern computer clusters. It is easy to use: Fiber allows you to write programs that run at the level of a computer cluster without the need to dive into the cluster's details. It is easy to learn: Fiber provides the same API as Python's standard multiprocessing library, which you are already familiar with. If you know […]
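Because Fiber mirrors the multiprocessing API, the familiar pool pattern carries over unchanged; a minimal sketch:

    from fiber import Pool  # drop-in analogue of multiprocessing.Pool

    def square(x):
        return x * x

    if __name__ == "__main__":
        # The same pattern runs locally or, through Fiber, across a cluster.
        with Pool(processes=4) as pool:
            print(pool.map(square, range(10)))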

Read more

A high-performance and generic framework for distributed DNN training

BytePS is a high-performance and general distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run over either TCP or RDMA networks. BytePS outperforms existing open-source distributed training frameworks by a large margin. For example, on BERT-large training, BytePS can achieve ~90% scaling efficiency with 256 GPUs (see below), which is much higher than Horovod+NCCL. In certain scenarios, BytePS can double the training speed compared with Horovod+NCCL.

Performance: We show our experiment on BERT-large training, which […]
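BytePS exposes a Horovod-style API per framework; a minimal PyTorch sketch, assuming the byteps.torch module mirrors its Horovod counterparts as described in the project documentation:

    import torch
    import byteps.torch as bps

    bps.init()
    torch.cuda.set_device(bps.local_rank())

    model = torch.nn.Linear(10, 1).cuda()
    # Scale the learning rate by the number of workers, as in Horovod.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * bps.size())

    # Wrap the optimizer so gradients flow through the BytePS servers,
    # and make sure every worker starts from the same weights.
    optimizer = bps.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())
    bps.broadcast_parameters(model.state_dict(), root_rank=0)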

Read more

A lightweight tool for submitting Python functions for computation within a Slurm cluster

What is submitit? Submitit is a lightweight tool for submitting Python functions for computation within a Slurm cluster. It basically wraps submission and provides access to results, logs, and more. Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Submitit allows you to switch seamlessly between executing on Slurm and executing locally.

An example is worth a thousand words: performing an addition. From inside an environment with submitit installed: import submitit def add(a, […]
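The excerpt's example is cut off above; a complete sketch of the same addition pattern, following the AutoExecutor API from the submitit README ("dev" is a placeholder partition name):

    import submitit

    def add(a, b):
        return a + b

    # AutoExecutor submits to Slurm when available, otherwise runs locally.
    executor = submitit.AutoExecutor(folder="log_submitit")
    executor.update_parameters(timeout_min=1, slurm_partition="dev")
    job = executor.submit(add, 5, 7)
    print(job.result())  # waits for completion and returns 12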

Read more