One-Hot Encoding in Python with Pandas and Scikit-Learn

Introduction: In computer science, data can be represented in many different ways, and each has its own advantages and disadvantages in certain fields. Because categories have no inherent meaning to a computer, categorical data has to be prepared before a computer can process it. This step is called preprocessing. A big part of preprocessing is encoding – representing every single […]
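As a rough illustration of the encoding step this excerpt introduces, here is a minimal one-hot sketch in plain Python. The helper `one_hot_encode` and the sample `colors` list are hypothetical and are not taken from the article, which (going by its title) works with Pandas and Scikit-Learn instead.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row is a 0/1 list with one slot per category."""
    # Sorting gives a fixed, reproducible column order.
    categories = sorted(set(values))
    index = {category: position for position, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the single column that matches this value
        rows.append(row)
    return categories, rows


# Hypothetical sample data, for illustration only.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
for color, row in zip(colors, encoded):
    print(color, row)
```

Each category becomes its own column, so no artificial ordering is implied between, say, "red" and "blue".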

Python Scikit-learn to simplify Machine learning : { Bag of words } To [ TF-IDF ]

Text (word) analysis and tokenized text modeling always give a chill air around the ears, especially when you are new to machine learning. Thanks to Python and its extended libraries for their warm support around text analytics and machine learning. Scikit-learn is a savior and an excellent support in text processing when you also understand some of the concepts like “Bag of words”, “Clustering” and “vectorization”. Vectorization is a must-know technique for all machine learning learners, text miners and algorithm implementors. I personally consider […]

What’s That Beer Style? Ask a Neighbor, or Two

Beer is delicious but it is not one thing. If you disagree with the former part of the previous sentence please keep the latter in mind[1]. Think of sports, for instance. Many would agree with the blanket statement “sports are fun” but, depending on what you have in mind, two people can easily have opposite reactions to being presented with the opportunity to play ping-pong. Sports are not one thing, music is not one thing, and neither is beer. Presented with […]

How to automatically create Base Line Estimators using scikit learn.

For any machine learning problem, say a classifier in this case, it’s always handy to quickly create a baseline classifier against which we can compare our new models. You don’t want to spend a lot of time creating these baseline classifiers; you would rather spend that time building and validating new features for your final model. In this post we will see how we can rapidly create a baseline classifier using the scikit-learn package for any dataset. […]
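To make the idea of a baseline concrete, here is a minimal sketch of a majority-class baseline in plain Python. It is not the post's method: the class `MajorityClassBaseline`, the `accuracy` helper, and the toy labels are hypothetical. scikit-learn ships a comparable ready-made estimator, `sklearn.dummy.DummyClassifier`.

```python
from collections import Counter


class MajorityClassBaseline:
    """Always predicts the most frequent label seen during fit (hypothetical helper)."""

    def fit(self, features, labels):
        # The features are ignored on purpose: the baseline only looks at label counts.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.majority_ for _ in features]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return matches / len(y_true)


# Toy data, for illustration only.
features = [[0.1], [0.4], [0.35], [0.8], [0.05], [0.6]]
labels = ["spam", "ham", "ham", "ham", "spam", "ham"]

baseline = MajorityClassBaseline().fit(features, labels)
predictions = baseline.predict(features)
score = accuracy(labels, predictions)
print(predictions[0], round(score, 3))
```

Any real model should beat this score; if it does not, the added features or complexity are not yet paying off.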

scikit-learn: Save and Restore Models

On many occasions, while working with the scikit-learn library, you’ll need to save your prediction models to file, and then restore them in order to reuse your previous work to: test your model on new data, compare multiple models, or anything else. This saving procedure is also known as object serialization – representing an object with a stream of bytes, in order to store it on disk, send it over a network or save to a database, while the restoring […]
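The save-and-restore round trip described above can be sketched with Python's standard `pickle` module. This is only an assumption-laden illustration: the `model` dictionary is a hypothetical stand-in for a fitted estimator, and the file name `model.pkl` is arbitrary; the article itself may use different tools.

```python
import os
import pickle
import tempfile

# Hypothetical "trained model": plain parameters standing in for a fitted estimator.
model = {"weights": [0.5, -1.25], "bias": 0.1}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.pkl")  # arbitrary file name

    # Save: serialize the object to a stream of bytes and write it to disk.
    with open(path, "wb") as handle:
        pickle.dump(model, handle)

    # Restore: read the bytes back and rebuild an equivalent object.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == model)  # same contents
print(restored is model)  # but a separate object
```

Note that unpickling can execute arbitrary code, so only load files from sources you trust.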

Using Machine Learning to Predict the Weather: Part 1

Part 1: Collecting Data From Weather Underground. This is the first article of a multi-part series on using Python and Machine Learning to build models to predict weather temperatures based off data collected from Weather Underground. The series will be comprised of three different articles describing the major aspects of a Machine Learning project. The topics to be covered are: data collection and processing (this article), linear regression models (article 2), and neural network models (article 3). The data used in […]

Using Machine Learning to Predict the Weather: Part 2

This article is a continuation of the prior article in a three-part series on using Machine Learning in Python to predict weather temperatures for the city of Lincoln, Nebraska in the United States based off data collected from Weather Underground’s API services. In the first article of the series, Using Machine Learning to Predict the Weather: Part 1, I described how to extract the data from Weather Underground, parse it, and clean it. For a summary of the topics […]

Using Machine Learning to Predict the Weather: Part 3

This is the final article on using machine learning in Python to make predictions of the mean temperature based off of meteorological weather data retrieved from Weather Underground as described in part one of this series. The topic of this final article will be to build a neural network regressor using Google’s open source TensorFlow library. For a general introduction to TensorFlow, as well as a discussion of installation methods, please see Mihajlo Pavloski’s excellent post TensorFlow Neural Network Tutorial. Topics […]

K-Means Clustering with Scikit-Learn

Introduction: K-means clustering is one of the most widely used unsupervised machine learning algorithms; it forms clusters of data based on the similarity between data instances. For this particular algorithm to work, the number of clusters has to be defined beforehand. The K in K-means refers to the number of clusters. The K-means algorithm starts by randomly choosing a centroid value for each cluster. After that the algorithm iteratively performs three steps: (i) Find the Euclidean distance between each […]
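Since the excerpt is cut off before the steps are spelled out, the following is a minimal plain-Python sketch of the standard K-means iteration (assign each point to its nearest centroid by Euclidean distance, then move each centroid to the mean of its points), not necessarily the article's exact wording or code. The function `k_means`, its parameters, and the toy `points` are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Standard K-means sketch: returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Start from k randomly chosen data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = []

    for _ in range(iterations):
        # Assign every point to the centroid with the smallest Euclidean distance.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the clusters have converged
        assignments = new_assignments

        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))

    return centroids, assignments


# Toy data: two well-separated groups, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(assignments)
print(centroids)
```

The cluster indices themselves are arbitrary; what matters is which points end up sharing an index.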

Introduction to Neural Networks with Scikit-Learn

What is a Neural Network? Humans have an ability to identify patterns within the accessible information with an astonishingly high degree of accuracy. Whenever you see a car or a bicycle you can immediately recognize what they are. This is because we have learned over a period of time what a car and a bicycle look like and what their distinguishing features are. Artificial neural networks are computation systems that intend to imitate human learning capabilities via a complex architecture that […]
