Python tutorials

GIN: How to Design the Most Powerful Graph Neural Network

Graph Neural Networks are not limited to classifying nodes. One of the most popular applications is graph classification. This is a common task when dealing with molecules: they are represented as graphs, and features of each atom (node) can be used to predict the behavior of the entire molecule. However, GNNs only learn node embeddings. How do we combine them to produce an embedding for the entire graph? In this article, we will: See a new type of layer, called “global […]
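
As a quick preview of where the article lands, here is a minimal sketch of a GIN-style graph classifier with global pooling, written with PyTorch Geometric; the layer sizes and class count are illustrative, not taken from the article.

```python
from torch import nn
from torch_geometric.nn import GINConv, global_add_pool

class GIN(nn.Module):
    """Minimal GIN for graph classification (dimensions are illustrative)."""
    def __init__(self, in_dim=16, hidden=32, num_classes=2):
        super().__init__()
        # Each GINConv wraps an MLP that updates a node from its neighbors
        self.conv1 = GINConv(nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)))
        self.conv2 = GINConv(nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden)))
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, batch):
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index).relu()
        # Global pooling: sum node embeddings into one vector per graph
        hg = global_add_pool(h, batch)
        return self.classifier(hg)
```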

Read more

Introduction to Constraint Programming in Python

Constraint Programming is a technique to find every solution that respects a set of predefined constraints. It is an invaluable tool for data scientists to solve a huge variety of problems, such as scheduling, timetabling, sequencing, etc. In this article, we’ll see how to use CP in two different ways: Satisfiability: the goal is to find one or multiple feasible solutions (i.e., solutions that respect our constraints) by narrowing down a large set of potential solutions; Optimization: the goal is […]
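
To make the two modes concrete, here is a tiny satisfiability example using the CP-SAT solver from Google OR-Tools, one popular CP library in Python; the variables and constraints are invented for illustration.

```python
from ortools.sat.python import cp_model

# Toy problem: enumerate every (x, y, z) in 0..4 with x + y == z and x < y
model = cp_model.CpModel()
x = model.NewIntVar(0, 4, "x")
y = model.NewIntVar(0, 4, "y")
z = model.NewIntVar(0, 4, "z")
model.Add(x + y == z)
model.Add(x < y)

class PrintSolutions(cp_model.CpSolverSolutionCallback):
    def on_solution_callback(self):
        print(self.Value(x), self.Value(y), self.Value(z))

solver = cp_model.CpSolver()
solver.parameters.enumerate_all_solutions = True  # satisfiability: list them all
solver.Solve(model, PrintSolutions())
```

Replacing the exhaustive enumeration with a call like `model.Maximize(z)` turns the same model into an optimization problem.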

Read more

Optimize Your Marketing Budget with Nonlinear Programming

In the age of digital marketing, businesses face the challenge of allocating their marketing budget across multiple channels to maximize sales. However, as they broaden their reach, these firms inevitably face the issue of diminishing returns – the phenomenon where additional investment in a marketing channel yields progressively smaller increases in conversions. This is where the concept of marketing budget allocation steps in, adding another layer of complexity to the whole process. In this article, we’re going to explore the […]
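
One way to sketch this problem is as a nonlinear program with logarithmic response curves, which saturate and therefore capture diminishing returns. The sketch below uses SciPy; the alpha coefficients and the budget are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical response curves: return_i = alpha_i * log(1 + spend_i).
# The alpha values and total budget below are made up for illustration.
alphas = np.array([30.0, 20.0, 10.0])  # per-channel effectiveness
budget = 100_000.0

def neg_total_return(spend):
    # Negated because scipy minimizes; we want to maximize total return
    return -np.sum(alphas * np.log1p(spend))

constraints = [{"type": "ineq", "fun": lambda s: budget - s.sum()}]
bounds = [(0.0, budget)] * len(alphas)
result = minimize(neg_total_return, x0=np.full(len(alphas), budget / 3),
                  bounds=bounds, constraints=constraints, method="SLSQP")
print(result.x)  # optimal spend per channel
```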

Read more

Decoding Strategies in Large Language Models

In the fascinating world of large language models (LLMs), much attention is given to model architectures, data processing, and optimization. However, decoding strategies like beam search, which play a crucial role in text generation, are often overlooked. In this article, we will explore how LLMs generate text by delving into the mechanics of greedy search and beam search, as well as sampling techniques with top-k and nucleus sampling. By the conclusion of this article, you’ll not only understand these decoding […]
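
Before diving in, the core contrast fits in a few lines: given the model's scores for the next token, greedy search takes the argmax, top-k samples among the k best, and nucleus sampling samples from the smallest set whose cumulative probability reaches p (beam search, which tracks several partial sequences at once, needs more machinery than this toy sketch). The logits below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])  # toy next-token scores
probs = np.exp(logits) / np.exp(logits).sum()  # softmax

# Greedy search: always pick the single most likely token
greedy = int(np.argmax(probs))

# Top-k sampling: sample among the k highest-probability tokens
k = 3
top_k = np.argsort(probs)[-k:]
sampled_k = int(rng.choice(top_k, p=probs[top_k] / probs[top_k].sum()))

# Nucleus (top-p) sampling: smallest set whose cumulative probability >= p
p = 0.9
order = np.argsort(probs)[::-1]
cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
nucleus = order[:cutoff]
sampled_p = int(rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum()))

print(greedy, sampled_k, sampled_p)
```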

Read more

Improve ChatGPT with Knowledge Graphs

ChatGPT has shown impressive capabilities in processing and generating human-like text. However, it is not without its imperfections. A primary concern is the model’s propensity to produce inaccurate or obsolete answers, often called “hallucinations.” The New York Times recently highlighted this issue in its article, “Here’s What Happens When Your Lawyer Uses ChatGPT.” The piece describes a lawsuit in which a lawyer leaned heavily on ChatGPT to help prepare a court filing for a client suing an airline. The model […]

Read more

4-bit LLM Quantization with GPTQ

Recent advancements in weight quantization allow us to run massive large language models on consumer hardware, like a LLaMA-30B model on an RTX 3090 GPU. This is possible thanks to novel 4-bit quantization techniques with minimal performance degradation, like GPTQ, GGML, and NF4. In the previous article, we introduced naïve 8-bit quantization techniques and the excellent LLM.int8(). In this article, we will explore the popular GPTQ algorithm to understand how it works and implement it using the AutoGPTQ library. You […]
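
For a sense of what the AutoGPTQ workflow looks like, here is a condensed quantization sketch; the model name is a small placeholder, and the single calibration sentence stands in for a real calibration set.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # placeholder: a small model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit quantization, with groups of 128 weights sharing scaling parameters
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ needs calibration samples to measure and minimize quantization error
examples = [tokenizer("The quick brown fox jumps over the lazy dog.",
                      return_tensors="pt")]
model.quantize(examples)
model.save_quantized("opt-125m-4bit")
```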

Read more

A Beginner’s Guide to LLM Fine-Tuning

The growing interest in Large Language Models (LLMs) has led to a surge in tools and wrappers designed to streamline their training process. Popular options include FastChat from LMSYS (used to train Vicuna) and Hugging Face’s transformers/trl libraries (used in my previous article). In addition, each big LLM project, like WizardLM, tends to have its own training script, inspired by the original Alpaca implementation. In this article, we will use Axolotl, a tool created by the OpenAccess AI Collective. We […]
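
Axolotl itself is driven by a YAML configuration file rather than Python code, so as a rough picture of what these wrappers automate, here is a bare-bones causal-LM fine-tuning loop built directly on transformers; the model name and the data file are placeholders, not the article's setup.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "facebook/opt-125m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# "train.txt" is a placeholder: one training example per line
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                           max_length=512), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    # mlm=False => standard next-token (causal) language modeling labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```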

Read more

ExLlamaV2: The Fastest Library to Run LLMs

Quantizing Large Language Models (LLMs) is the most popular approach to reduce the size of these models and speed up inference. Among these techniques, GPTQ delivers amazing performance on GPUs: compared to unquantized models, it uses roughly a third of the VRAM while providing a similar level of accuracy and faster generation. It became so popular that it has recently been integrated directly into the transformers library. ExLlamaV2 is a library designed to squeeze even more performance out of GPTQ. […]
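
As a taste of the library, here is a generation sketch modeled on ExLlamaV2's example scripts; exact class names and signatures can shift between versions, and the model directory is a placeholder for an already-quantized model.

```python
from exllamav2 import (ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config,
                       ExLlamaV2Tokenizer)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./quantized-model"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate as layers are loaded
model.load_autosplit(cache)               # split weights across devices
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

# Third argument is the number of new tokens to generate
print(generator.generate_simple("Once upon a time,", settings, 100))
```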

Read more