Building Blocks for Foundation Model Training and Inference on AWS

For a long time, “scaling” in foundation models mostly meant one thing: spend more compute on pre-training and capabilities rise. That intuition was supported by empirical work such as Kaplan et al. (2020), which reported predictable power-law trends in loss as you scale model parameters, dataset size, and training compute. In practice, these trends justified sustained investment in large-scale accelerator capacity and the surrounding distributed infrastructure needed to keep it efficiently utilized. But the frontier has evolved—and scaling is no […]

Unlocking asynchronicity in continuous batching

TL;DR: we explain how to separate CPU and GPU workloads to get a massive performance boost for inference. This is the second post in a series on efficient LLM inference. The first post covered continuous batching from first principles. It introduces some concepts we build upon: KV cache, FlashAttention, attention masks, etc. An H200 costs around $5 an hour on Inference Endpoints. That’s cheap for an hour, but use it for a day and you are already paying $120. If […]

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

TL;DR: Two new Apache 2.0 multilingual embedding models built on ModernBERT — a 97M-parameter compact model that beats every open sub-100M multilingual embedder on MTEB Multilingual Retrieval (60.3), and a 311M full-size model that scores 65.2 on MTEB Multilingual Retrieval (#2 among open models under 500M parameters) with Matryoshka support. Both cover 200+ languages, are tuned on 52 languages, handle 32K-token context (64x R1), and add code retrieval across 9 programming languages. In this post: Enterprise-Ready by Design · A […]

Further Notes on Our Recent Research on AI Delegation and Long-Horizon Reliability

Our recent paper, “LLMs Corrupt Your Documents When You Delegate”, has generated discussion about the reliability of AI systems in delegated workflows. We appreciate the interest in this work and want to clarify several important points about what the paper does—and does not—claim. The research aims to develop robust evaluation methods for long-horizon delegated and collaborative tasks. More broadly, this work reflects an ongoing effort to better understand the gap between strong benchmark performance and certain real-world tasks. Using a […]

Quiz: Python’s Array: Working With Numeric Data Efficiently

Interactive Quiz ⋅ 12 Questions ⋅ By Joseph Peart. In this quiz, you’ll test your understanding of Python’s Array: Working With Numeric Data Efficiently. By working through this quiz, you’ll revisit the differences between Python’s array module and the built-in list, the meaning of type codes, how to create and manipulate arrays as mutable sequences, and the performance trade-offs of using a low-level numeric container. The quiz contains 12 questions and there is no time limit. You’ll get 1 point for […]

Quiz: Python Metaclasses

Interactive Quiz ⋅ 8 Questions ⋅ By Joseph Peart. In this quiz, you’ll test your understanding of Python Metaclasses. Metaclasses sit behind every class you write in Python, and they’re one of the language’s deeper object-oriented concepts. By working through this quiz, you’ll revisit how classes are themselves objects, how type creates them, and how a custom metaclass lets you customize class creation. You’ll also reflect on when a custom metaclass is actually the right tool and when a simpler technique […]

Quiz: Cursor vs Windsurf: Which AI Code Editor Is Best for Python?

Interactive Quiz ⋅ 10 Questions ⋅ By Joseph Peart. In this quiz, you’ll test your understanding of Cursor vs Windsurf: Which AI Code Editor Is Best for Python? By working through these questions, you’ll revisit how the two editors differ across code completion, agentic multi-file editing, and debugging. You’ll also reconnect with the audit points worth applying whenever an AI agent writes Python on your behalf. The quiz contains 10 questions and there is no time limit. You’ll get 1 point […]

How to Use OpenCode for AI-Assisted Python Coding

OpenCode is an open-source AI coding agent that runs in your terminal and lets you analyze and refactor a Python project through conversational commands. In this guide, you’ll install it on your system, set it up with a free Google Gemini API key, and learn the basics of how to use it in your daily programming work. Here’s what OpenCode’s main interface looks like: [Figure: OpenCode’s Initial Screen] OpenCode works as a conversational assistant you explicitly direct. Ask it to […]
