# Generating Synthetic Data with Numpy and Scikit-Learn

## Introduction

In this tutorial, we’ll discuss how to generate different synthetic datasets using the NumPy and Scikit-Learn libraries. We’ll see how samples can be drawn from various distributions with known parameters. We’ll also discuss generating datasets for different purposes, such as regression, classification, and clustering. Finally, we’ll see how to generate a dataset that mimics the distribution of an existing dataset.

## The Need for Synthetic Data

In data science, synthetic data plays a very important role. […]
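As a rough illustration of the kinds of generators this tutorial covers, here is a minimal sketch. The sample sizes, distribution parameters, and variable names are assumptions chosen for the example, not values taken from the article.

```python
import numpy as np
from sklearn.datasets import make_blobs, make_classification, make_regression

# Draw samples from a normal distribution with known parameters
# (mean 5.0 and standard deviation 2.0 are illustrative choices).
rng = np.random.default_rng(seed=0)
normal_samples = rng.normal(loc=5.0, scale=2.0, size=1000)

# Regression dataset: 3 features, Gaussian noise added to the target.
X_reg, y_reg = make_regression(
    n_samples=200, n_features=3, noise=10.0, random_state=0
)

# Binary classification dataset: 5 features, 3 of them informative.
X_clf, y_clf = make_classification(
    n_samples=200,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_classes=2,
    random_state=0,
)

# Clustering dataset: 3 isotropic Gaussian blobs in 2 dimensions.
X_blob, y_blob = make_blobs(
    n_samples=200, centers=3, n_features=2, random_state=0
)

print(normal_samples.shape, X_reg.shape, X_clf.shape, X_blob.shape)
```

Fixing `seed` and `random_state` makes the generated data reproducible, which is usually what you want when synthetic data is used to test a pipeline.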

# Kernel Density Estimation in Python Using Scikit-Learn

## Introduction

This article is an introduction to kernel density estimation using Python’s machine learning library scikit-learn. Kernel density estimation (KDE) is a non-parametric method for estimating the probability density function of a given random variable. It is also referred to by its traditional name, the Parzen-Rosenblatt window method, after its discoverers. Given a sample of independent, identically distributed (i.i.d.) observations $(x_1, x_2, \ldots, x_n)$ of a random variable from an unknown source distribution, the kernel density estimate is given by: $$p(x) = \frac{1}{nh} […]
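To make the idea concrete, the following is a minimal sketch of fitting a KDE with scikit-learn's `KernelDensity`. The Gaussian kernel, the bandwidth of 0.4, and the standard normal sample are assumptions picked for illustration; the article itself may use different settings.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Draw i.i.d. observations from a known distribution (standard normal here)
# so the estimate can be compared against something familiar.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=500).reshape(-1, 1)

# Fit a Gaussian-kernel KDE; the bandwidth controls the smoothing.
kde = KernelDensity(kernel="gaussian", bandwidth=0.4).fit(x)

# score_samples returns the log-density, so exponentiate to get p(x).
grid = np.linspace(-4.0, 4.0, 200).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))

# A proper density integrates to roughly 1 over a range covering the data.
dx = grid[1, 0] - grid[0, 0]
area = density.sum() * dx
print(f"approximate area under the estimate: {area:.3f}")
```

A smaller bandwidth gives a spikier estimate that follows individual observations, while a larger one smooths over more of the sample.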

# Python for NLP: Sentiment Analysis with Scikit-Learn

This is the fifth article in the series of articles on NLP for Python. In my previous article, I explained how Python’s spaCy library can be used to perform part-of-speech tagging and named entity recognition. In this article, I will demonstrate how to perform sentiment analysis on Twitter data using the Scikit-Learn library. Sentiment analysis refers to analyzing opinions or feelings about almost any subject using data such as text or images. Sentiment analysis helps companies in […]

# Python for NLP: Topic Modeling

This is the sixth article in my series of articles on Python for NLP. In my previous article, I talked about how to perform sentiment analysis of Twitter data using Python’s Scikit-Learn library. In this article, we will study topic modeling, another very important application of NLP, and see how to do it with Python.

## What is Topic Modeling

Topic modeling is an unsupervised technique that analyzes large volumes of text data by clustering […]
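As a small sketch of what topic modeling can look like in practice, the example below fits Latent Dirichlet Allocation (LDA) from Scikit-Learn to a toy corpus. The choice of LDA, the four made-up documents, and the number of topics are all assumptions for illustration; the excerpt does not say which algorithm or data the article uses.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus: two documents about sports, two about cooking.
docs = [
    "the team won the football match with a late goal",
    "the striker scored a goal and the team celebrated",
    "bake the bread in a hot oven with flour and yeast",
    "mix flour sugar and butter then bake the cake",
]

# Convert raw text to a document-term matrix of word counts.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Fit LDA with two topics; no labels are needed (unsupervised).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Each row is one document's distribution over the two topics.
for doc, weights in zip(docs, doc_topics):
    print([round(w, 2) for w in weights], doc)
```

On a corpus this small the topics are not meaningful on their own; the point is only the shape of the workflow: vectorize, fit, then inspect the per-document topic weights.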

# Predicting Customer Ad Clicks via Machine Learning

## Introduction

Internet marketing has overtaken traditional marketing strategies in recent years. Companies prefer to advertise their products on websites and social media platforms. However, targeting the right audience is still a challenge in online marketing. Spending millions to display an advertisement to an audience that is not likely to buy your products can be costly. In this article, we will work with the advertising data of a marketing agency to develop a machine learning algorithm that predicts if […]

# Multiple Linear Regression with Python

## Introduction

Linear regression is one of the most commonly used algorithms in machine learning. You’ll want to get familiar with linear regression because you’ll need to use it if you’re trying to measure the relationship between two or more continuous values. A deep dive into the theory and implementation of linear regression will help you understand this valuable machine learning algorithm.

## Defining Terms

Before we delve into linear regression, let’s take a moment to make sure we are clear on […]
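For readers who want a preview of the mechanics, here is a minimal sketch of multiple linear regression with Scikit-Learn's `LinearRegression`. The data is synthetic, and the true coefficients (3.0, -2.0) and intercept (5.0) are assumptions chosen so the fit can be checked against known values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: two continuous predictors and a target built from a
# known linear relationship plus a little Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0.0, 0.1, size=200)

# Fit ordinary least squares with one coefficient per feature.
model = LinearRegression().fit(X, y)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("R^2 on the training data:", model.score(X, y))
```

Because the noise is small relative to the signal, the recovered coefficients should land close to the values used to build `y`.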

# Gradient Boosting Classifiers in Python with Scikit-Learn

## Introduction

Gradient boosting classifiers are a group of machine learning algorithms that combine many weak learning models to create a strong predictive model. Decision trees are usually used as the weak learners. Gradient boosting models are becoming popular because of their effectiveness at classifying complex datasets, and have recently been used to win many Kaggle data science competitions. The Python machine learning library Scikit-Learn provides its own gradient boosting classifiers, and the separate XGBoost library offers a Scikit-Learn-compatible interface. In this article we’ll go over […]
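The sketch below shows one way to train Scikit-Learn's `GradientBoostingClassifier`. The synthetic dataset and the hyperparameter values are assumptions for illustration, not settings taken from the article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem standing in for real data.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each of the 100 boosting stages fits a shallow decision tree (the weak
# learner) to the errors of the ensemble built so far.
clf = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

In practice `learning_rate` and `n_estimators` trade off against each other: a lower learning rate usually needs more trees to reach the same fit.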

# Dimensionality Reduction in Python with Scikit-Learn

## Introduction

In machine learning, the performance of a model only benefits from more features up to a certain point. The more features are fed into a model, the more the dimensionality of the data increases. As the dimensionality increases, overfitting becomes more likely. There are multiple techniques that can be used to fight overfitting, but dimensionality reduction is one of the most effective. Dimensionality reduction selects the most important components of the feature space, preserving them and dropping the […]
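As one concrete instance of dimensionality reduction, here is a minimal sketch using principal component analysis (PCA) on the Iris dataset bundled with Scikit-Learn. PCA, the dataset, and the choice of two components are assumptions made for this example; the article may cover other techniques as well.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Iris has 4 features per sample; standardize them so no single
# feature dominates the principal components because of its scale.
X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)

# Keep the 2 directions of highest variance and drop the rest.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print("original shape:", X.shape)
print("reduced shape:", X_reduced.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

The explained variance ratio indicates how much of the original spread each retained component captures, which helps when deciding how many components to keep.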