The Top Skills for a Career in Data Science in 2021

Data science is exploding in popularity because it is tied to the future of technology, to strong demand for high-paying jobs, and to the bleeding edge of corporate culture, startups, and innovation! Students from South and East Asia in particular can fast-track lucrative technology careers in data science, as tech startups in those regions grow with increased foreign funding. Think carefully. Would you consider becoming a Data Scientist? According to Coursera: A data scientist might do the following […]

Matplotlib Stack Plot – Tutorial and Examples

Introduction
There are many data visualization libraries in Python, yet Matplotlib is the most popular of them. Matplotlib’s popularity is due to its reliability and utility: it can create both simple and complex plots with little code. You can also customize the plots in a variety of ways. In this tutorial, we’ll cover how to create Stack Plots in Matplotlib. Stack Plots are used to plot linear data, in a vertical order, stacking each […]
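A minimal sketch of a stack plot, assuming Matplotlib's `Axes.stackplot` method; the years and series values are made-up data for illustration, and the non-interactive Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without opening a window
import matplotlib.pyplot as plt

# Made-up data: three series that share the same x values
years = [2016, 2017, 2018, 2019, 2020]
series_a = [1, 2, 3, 4, 5]
series_b = [2, 2, 3, 3, 4]
series_c = [1, 3, 2, 4, 3]

fig, ax = plt.subplots()
# Each series is drawn as a band on top of the running total of the series before it
polys = ax.stackplot(years, series_a, series_b, series_c, labels=["A", "B", "C"])
ax.set_xlabel("Year")
ax.set_ylabel("Value")
ax.legend(loc="upper left")

# The top edge of the last band equals the per-year sum of all series
totals = [a + b + c for a, b, c in zip(series_a, series_b, series_c)]
```

`stackplot` returns one filled polygon collection per series. Call `fig.savefig("stackplot.png")` (or `plt.show()` with an interactive backend) to view the result.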

When Is Programming Needed in Most Leading Self-Service Configurations?

To all data analysts, big and small: many corporations have self-service BI and DWH solutions (I am asking only about those that did NOT build an in-house solution):
- When is programming needed in most leading self-service configurations?
- When do analysts and business executives need coding and programming because the self-service application's slice-and-dice, filtering, and fields are not enough?!
IN SOME PLACES, we junior analysts are getting a feeling (that may be […]

Python: How to Flatten a List of Lists

Introduction
A list is one of the most flexible data structures in Python. A 2D list, commonly known as a list of lists, is a list object where every item is itself a list, for example: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]. Flattening a list of lists means converting a 2D list into a 1D list by un-nesting each list item stored in the list of lists, i.e., converting [[1, 2, 3], [4, 5, 6], [7, 8, 9]] into [1, […]
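Two common ways to flatten the example list of lists are sketched below, using only the standard library. Both remove exactly one level of nesting; deeper structures would need a recursive approach.

```python
from itertools import chain

nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Option 1: nested list comprehension (reads like two for-loops, outer then inner)
flat_comprehension = [item for sublist in nested for item in sublist]

# Option 2: chain.from_iterable lazily concatenates the sublists; list() materializes it
flat_chain = list(chain.from_iterable(nested))

print(flat_comprehension)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```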

Generating Synthetic Data with Numpy and Scikit-Learn

Introduction
In this tutorial, we’ll discuss the details of generating different synthetic datasets using the Numpy and Scikit-learn libraries. We’ll see how different samples can be generated from various distributions with known parameters. We’ll also discuss generating datasets for different purposes, such as regression, classification, and clustering. At the end, we’ll see how to generate a dataset that mimics the distribution of an existing dataset.
The Need for Synthetic Data
In data science, synthetic data plays a very important role. […]
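A minimal NumPy-only sketch of the idea: draw samples from distributions with known parameters, then build a toy regression dataset whose true slope and intercept (3 and 2 here, chosen purely for illustration) can be checked afterwards. The tutorial also covers Scikit-learn, which offers generators such as `make_regression`, `make_classification`, and `make_blobs`; they are not used here.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator for reproducible samples

# Samples from distributions with known parameters
normal_samples = rng.normal(loc=5.0, scale=2.0, size=10_000)
uniform_samples = rng.uniform(low=0.0, high=1.0, size=10_000)

# Toy regression dataset: y = 3x + 2 plus small Gaussian noise
X = rng.uniform(-1.0, 1.0, size=(200, 1))
noise = rng.normal(0.0, 0.1, size=200)
y = 3.0 * X[:, 0] + 2.0 + noise

# A least-squares line fit should roughly recover the known parameters
slope, intercept = np.polyfit(X[:, 0], y, deg=1)
```

Because the generating parameters are known, synthetic data like this is handy for checking that a model or estimator recovers what it should.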

Remove Element from an Array in Python

Introduction
This tutorial will go through some common ways of removing elements from Python arrays. Here’s a list of all the techniques and methods we’ll cover in this article:
Arrays in Python
Arrays and lists are not the same thing in Python. Although lists are more commonly used than arrays, the latter still have their use cases. The main difference between the two is that lists can be used to store arbitrary values. They are also heterogeneous, which means they […]
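A short sketch using the standard-library `array` module, showing three ways to remove elements: by value, by index with a return value, and by index with `del`. The values are arbitrary examples; this is not necessarily the exact set of techniques the full article covers.

```python
from array import array

# An array of signed ints ('i' type code): unlike a list, every element must be an int
numbers = array("i", [10, 20, 30, 20, 40])

numbers.remove(20)       # removes the FIRST occurrence of the value 20
popped = numbers.pop(0)  # removes and returns the element at index 0 (10)
del numbers[-1]          # deletes by index without returning it (the last element, 40)

print(numbers)  # array('i', [30, 20])
```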

Solving Systems of Linear Equations with Python’s Numpy

The Numpy library can be used to perform a variety of mathematical and scientific operations, such as cross and dot products, sine and cosine values, Fourier transforms, and shape manipulation. The name Numpy is short for “Numerical Python”. In this article, you will see how to solve a system of linear equations using Python’s Numpy library.
What is a System of Linear Equations?
Wikipedia defines a system of linear equations as: In mathematics, a system of linear equations […]
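A minimal sketch with `numpy.linalg.solve`, using a small two-equation system chosen for illustration (2x + y = 5 and x + 3y = 10, whose solution is x = 1, y = 3). The system is written in matrix form A·x = b, with A holding the coefficients and b the right-hand side.

```python
import numpy as np

# Example system:
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # coefficient matrix
b = np.array([5.0, 10.0])   # right-hand side

solution = np.linalg.solve(A, b)  # finds the vector that satisfies A @ solution = b
print(solution)  # [1. 3.]
```

Computing `np.linalg.inv(A) @ b` gives the same answer, but `solve` avoids forming the inverse explicitly, which is generally faster and numerically more stable.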

Dimensionality Reduction in Python with Scikit-Learn

Introduction
In machine learning, a model’s performance benefits from more features only up to a certain point. The more features are fed into a model, the higher the dimensionality of the data. As the dimensionality increases, overfitting becomes more likely. There are multiple techniques for fighting overfitting, but dimensionality reduction is one of the most effective. Dimensionality reduction selects the most important components of the feature space, preserving them and dropping the […]
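As an illustration of the idea, here is a from-scratch sketch of Principal Component Analysis (PCA), one common dimensionality reduction technique, written with NumPy's SVD rather than Scikit-Learn. The data is synthetic and hypothetical: five features, three of which are near-linear combinations of the other two, so two components should retain almost all of the variance. The helper name `pca` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 2 independent features plus 3 noisy linear combinations of them
base = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 3))
X = np.hstack([base, base @ mixing + 0.01 * rng.normal(size=(300, 3))])

def pca(data, n_components):
    """Project data onto its top principal components via SVD."""
    centered = data - data.mean(axis=0)  # PCA operates on mean-centered data
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s**2 / (len(data) - 1)    # variance captured by each component
    ratio = variance / variance.sum()    # fraction of total variance per component
    components = vt[:n_components]      # rows are the principal directions
    return centered @ components.T, ratio[:n_components]

X_reduced, kept_ratio = pca(X, n_components=2)
print(X_reduced.shape)  # (300, 2)
print(kept_ratio.sum())  # close to 1.0 for this data
```

In Scikit-Learn, `sklearn.decomposition.PCA(n_components=2).fit_transform(X)` performs an equivalent projection (component signs may differ).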

An Indispensable Python: From Data Sourcing to Data Science

The data analysis ecosystem has grown all the way from SQL to NoSQL and from Excel analysis to visualization. Today, we are short of resources to process ALL (you know what I mean by ALL) kinds of data coming into the enterprise. Data goes through profiling, formatting, munging or cleansing, pruning, and transformation steps on its way to analytics and predictive modeling. Interestingly, no single tool has proved to be an effective solution for running all these operations { Don’t forget the […]