Research Focus: Week of January 23, 2023

Welcome to Research Focus, a new series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft. In this article: Revolutionizing Document AI with multimodal document foundation models. Organizations must digitize various documents, many with charts and images, to manage and streamline essential functions. Yet manually digitized documents are often of uneven […]

Biomedical Research Platform Terra Now Available on Microsoft Azure

We stand at the threshold of a new era of precision medicine, where health and life sciences data hold the potential to dramatically propel and expand our understanding and treatment of human disease. One of the tools that we believe will help to enable precision medicine is Terra, the secure biomedical research platform co-developed by Broad Institute of MIT and Harvard, Microsoft, and Verily. Today, we are excited to share that Terra is available for preview on Microsoft Azure. Starting […]

Python Basics: Object-Oriented Programming

OOP, or object-oriented programming, is a method of structuring a program by bundling related properties and behaviors into individual objects. Conceptually, objects are like the components of a system. Think of a program as a factory assembly line of sorts. At each step of the assembly line, a system component processes some material, ultimately transforming raw material into a finished product. An object contains both data, like the raw or preprocessed materials at each step on an assembly line, and […]
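
The assembly-line picture above can be sketched in a few lines of Python. This is only an illustration, not code from the original tutorial: the `Station` class, the `run_line` helper, and the sample material are hypothetical names chosen for the example. Each object bundles data (a name and a counter) with behavior (a `process` method).

```python
class Station:
    """One step on a hypothetical assembly line: bundles data and behavior."""

    def __init__(self, name, transform):
        self.name = name            # data: a label for this station
        self.transform = transform  # behavior supplied as a callable
        self.processed = 0          # data: how many items have passed through

    def process(self, material):
        """Apply this station's transformation and count the item."""
        self.processed += 1
        return self.transform(material)


def run_line(stations, material):
    """Pass the material through every station in order."""
    for station in stations:
        material = station.process(material)
    return material


# Two stations turn "raw material" into a "finished product".
line = [Station("cut", str.strip), Station("paint", str.upper)]
product = run_line(line, "  raw steel  ")
print(product)
```

Keeping the counter inside each `Station` means the line itself never has to track per-step state, which is the kind of bundling the excerpt describes.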

Linear Algebra in Python: Matrix Inverses and Least Squares

Linear algebra is an important topic across a variety of subjects. It allows you to solve problems related to vectors, matrices, and linear equations. In Python, most of the routines related to this subject are implemented in scipy.linalg, which offers very fast linear algebra capabilities. In particular, linear models play an important role in a variety of real-world problems, and scipy.linalg provides tools to compute them in an efficient way. In this tutorial, you’ll learn how to: Study linear systems […]

Some reasons to avoid Cython

If you need to speed up Python, Cython is a very useful tool. It lets you seamlessly merge Python syntax with calls into C or C++ code, making it easy to write high-performance extensions with rich Python interfaces. That being said, Cython is not the best tool in all circumstances. So in this article I’ll go over some of the limitations and problems with Cython, and suggest some alternatives. A quick overview of Cython: In case you’re not familiar with […]

Why don’t people use character-level MT? – One year later

In this post, I comment on our (i.e., myself, Helmut Schmid and Alex Fraser) year-old paper “Why don’t people use character-level machine translation,” published in Findings of ACL 2022. Here, I will (besides briefly summarizing the paper’s main message) mostly comment on what I learned while working on the one-year-later perspective, focusing more on what I would do differently now. If you are interested in the exact research content, read the paper or watch a 5-minute presentation. Paper TL;DR: Doing […]

EasyOCR Python Tutorial with Examples

Introduction: EasyOCR is a Python library for Optical Character Recognition (OCR) that allows you to easily extract text from images and scanned documents. In this tutorial, we will understand the basics of using the Python EasyOCR package with examples to show how to extract text from images along with various parameter settings. EasyOCR Python Package Overview: Reader Class. The EasyOCR Python package consists of the base […]

Python Basics Exercises: File System Operations

In Python Basics: File System Operations, you learned how to use Python to work with files and folders. As a programmer, you’ll use the pathlib and shutil modules to complete file system operations without relying on your graphical user interface (GUI). While you already got lots of hands-on practice with file system operations, programmers never stop training! The more you use your new skills, the more comfortable you’ll be when it’s time to put them to work in your own […]
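
As a small warm-up in the same spirit, here is a minimal sketch of the kind of task pathlib and shutil handle without a GUI. It is not one of the original exercises: the `organize` function and the folder and file names are made up for illustration, and everything runs inside a temporary directory so nothing on your machine is touched.

```python
import shutil
import tempfile
from pathlib import Path


def organize(root):
    """Create a folder and a file, move the file, then remove the old folder."""
    root = Path(root)

    # Create a folder and write a small text file into it.
    notes = root / "notes"
    notes.mkdir(exist_ok=True)
    (notes / "todo.txt").write_text("practice pathlib\n", encoding="utf-8")

    # Move the file into a second folder with shutil.move().
    archive = root / "archive"
    archive.mkdir(exist_ok=True)
    moved = shutil.move(str(notes / "todo.txt"), str(archive / "todo.txt"))

    # Remove the now-empty source folder and everything under it.
    shutil.rmtree(notes)
    return Path(moved)


with tempfile.TemporaryDirectory() as tmp:
    target = organize(tmp)
    moved_text = target.read_text(encoding="utf-8")
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(remaining)
```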

Python’s Assignment Operator: Write Robust Assignments

Python’s assignment operators allow you to define assignment statements. This type of statement lets you create, initialize, and update variables throughout your code. Variables are a fundamental cornerstone in every piece of code, and assignment statements give you complete control over variable creation and mutation. Learning about the Python assignment operator and its use for writing assignment statements will arm you with powerful tools for writing better and more robust Python code. Assignment Statements and the Assignment Operator: One of […]
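
To make "create, initialize, and update" concrete, here is a short sketch of a few common assignment forms. The variable names are arbitrary and the selection is illustrative, not a summary of the full tutorial.

```python
# Plain assignment creates a variable and gives it an initial value.
count = 0

# Augmented assignment updates a variable based on its current value.
count += 5

# Parallel assignment unpacks a tuple, which also makes swapping easy.
a, b = 1, 2
a, b = b, a

# Starred unpacking collects the leftover items into a list.
first, *rest = [10, 20, 30]

# Assignment binds a name to an object; it does not copy the object.
original = [1, 2]
alias = original
alias.append(3)  # the change is visible through both names

print(count, a, b, first, rest, original)
```

The last case is a frequent source of bugs: if an independent list is needed, make a copy explicitly, for example with `original.copy()`.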

Why Polars uses less memory than Pandas

Processing large amounts of data with Pandas can be difficult; it’s quite easy to run out of memory and either slow down or crash. The Polars dataframe library is a potential solution. While Polars is mostly known for running faster than Pandas, if you use it right it can sometimes also significantly reduce memory usage compared to Pandas. In particular, certain techniques that you need to do manually in Pandas can be done automatically in Polars, allowing you to process […]
