Scaling early detection of esophageal cancer with AI

Microsoft Research and Cyted have collaborated to build novel AI models (opens in new tab) to scale the early detection of esophageal cancer. The AI-supported methods demonstrated the same diagnostic performance as the existing manual workflow, potentially reducing the pathologist’s workload by up to 63%. Esophageal cancer is the sixth most common cause of cancer deaths worldwide, in part because this disease  

Read more

Harmonizing Data: A Symphony of Segmenting, Concatenating, Pivoting, and Merging

In the world of data science, where raw information swirls in a cacophony of numbers and variables, lies the art of harmonizing data. Like a maestro conducting a symphony, the skilled data scientist orchestrates the disparate elements of datasets, weaving them together into a harmonious composition of insights. Welcome to a journey where data transcends mere numbers and, instead, transforms into a vibrant melody of patterns and revelations. Let’s explore the intricacies of segmenting, concatenating, pivoting, and merging data using […]

Read more

Improving LLM understanding of structured data and exploring advanced prompting methods

This research paper was presented at the 17th ACM International Conference on Web Search and Data Mining (opens in new tab) (WSDM 2024), the premier conference on web-inspired research on search and data mining. In today’s data-driven landscape, tables are indispensable for organizing and presenting information, particularly text. They streamline repetitive content, enhance  

Read more

Beyond SQL: Transforming Real Estate Data into Actionable Insights with Pandas

In the realm of data analysis, SQL stands as a mighty tool, renowned for its robust capabilities in managing and querying databases. However, Python’s pandas library brings SQL-like functionalities to the fingertips of analysts and data scientists, enabling sophisticated data manipulation and analysis without the need for a traditional SQL database. This exploration delves into applying SQL-like functions within Python to dissect and understand data, using the Ames Housing dataset as your canvas. The Ames Housing dataset, a comprehensive compilation […]

Read more

Skewness Be Gone: Transformative Tricks for Data Scientists

Data transformations enable data scientists to refine, normalize, and standardize raw data into a format ripe for analysis. These transformations are not merely procedural steps; they are essential in mitigating biases, handling skewed distributions, and enhancing the robustness of statistical models. This post will primarily focus on how to address skewed data. By focusing on the ‘SalePrice’ and ‘YearBuilt’ attributes from the Ames housing dataset, we will provide examples of positive and negative skewed data and illustrate ways to normalize […]

Read more

Highlights from Machine Translation and Multilinguality in February 2024

With a new month, here are a few papers that I noticed on arXiv in February. Linear-time Minimum Bayes Risk Decoding with Reference Aggregation A preprint from the University of Zurich proposes a linear time version of Minimum Bayes Risk (MBR) decoding in machine translation. This decoding algorithm does not aim to generate the most probable sequence given the model but the most typical one. This is typically done by sampling dozens of candidate output sentences, from which we select […]

Read more

Research Forum Episode 2: Transforming health care and the natural sciences, AI and society, and the evolution of foundational AI technologies

Research advances are driving real-world impact faster than ever. Recent developments in AI are reshaping the way people live, work, and think. In the latest episode of Microsoft Research Forum (opens in new tab), we explore how AI is transforming health care and the natural sciences, the intersection of AI and society, and the continuing evolution of foundational AI technologies.  Below is a brief recap of the event, including select quotes from the presentations. Full replays of each session and […]

Read more

Build an LLM RAG Chatbot With LangChain

You’ve likely interacted with large language models (LLMs), like the ones behind OpenAI’s ChatGPT, and experienced their remarkable ability to answer questions, summarize documents, write code, and much more. While LLMs are remarkable by themselves, with a little programming knowledge, you can leverage libraries like LangChain to create your own LLM-powered chatbots that can do just about anything. In an enterprise setting, one of the most popular ways to create an LLM-powered chatbot is through retrieval-augmented generation (RAG). When you […]

Read more

Research Focus: Week of March 4, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft. NEW RESEARCH Generative Kaleidoscopic Networks Neural networks are deep learning models that can be trained to learn complex patterns and relationships within data. In a recent paper: Generative Kaleidoscopic Networks, researchers from Microsoft detail how they discovered an “over-generalization” phenomenon, which indicates that the neural networks tend to  

Read more

Creating Asynchronous Tasks With Celery and Django

You’ve built a shiny Django app and want to release it to the public, but you’re worried about time-intensive tasks that are part of your app’s workflow. You don’t want your users to have a negative experience navigating your app. You can integrate Celery to help with that. Celery is a distributed task queue for UNIX systems. It allows you to offload work from your Python app. Once you integrate Celery into your app, you can send time-intensive tasks to […]

Read more
1 3 4 5 6 7 858