Basic Statistical Analysis with NumPy

Statistical analysis is important in data science because it helps us understand data better. NumPy is a key Python library for numerical computing: it supports working with arrays and mathematical functions, and it makes calculations faster and easier. In this article, we will explore several functions for basic statistical analysis offered by NumPy. NumPy is essential for data analysis and […]

A Gentle Introduction to Bayesian Statistics

Image by Pexels (photo by Balázs Utasi). Bayesian statistics is one of the less conventional subareas of statistics, based on a particular view of the concept of probability. This post introduces Bayesian statistics and its differences from frequentist statistics through a gentle, predominantly non-technical narrative that will awaken your curiosity about this fascinating topic. Statistics constitutes an invaluable set of methods and tools for analyzing and making decisions based on data. Their application in various […]

The Top Skills for a Career in Data Science in 2021

Data science is exploding in popularity because of its ties to the future of technology, the supply of and demand for high-paying jobs, and its place on the bleeding edge of corporate culture, startups, and innovation. Students from South and East Asia especially can fast-track lucrative technology careers with data science, even as tech startups are exploding in those areas with increased foreign funding. Think carefully. Would you consider becoming a data scientist? According to Coursera: A data scientist might do the following […]

Stocks, Significance Testing & p-Hacking: How volatile is volatile?

October is historically the most volatile month for stocks, but is this a persistent signal or just noise in the data? Follow me on Twitter (twitter.com/pdquant) for more. Over the past 32 years, October has been the most volatile month on average for the S&P 500 and December the least. In this article we will use simulation to assess the statistical significance of this observation and the extent to which it could occur by chance. All code […]

10 Statistical Functions in Excel every Analytics Professional Should Know

Overview: Microsoft Excel is an excellent tool for learning and executing statistical functions. Here are 12 statistical functions in Excel that you should master for a successful analytics career. Let’s Excel in Statistics! “Statistics is the grammar of Science.” – Karl Pearson. Let’s make that a bit more relevant for us: Statistics is the grammar of “Data” Science. You’ll notice that almost every successful data science professional or analytics professional has a solid understanding of statistics – but […]

Crash Course in Statistics for Machine Learning

Last updated on August 15, 2020. You do not need to know statistics before you can start learning and applying machine learning. You can start today. Nevertheless, knowing some statistics can be very helpful for understanding the language used in machine learning. Knowing some statistics will eventually be required when you want to start making strong claims about your results. In this post you will discover a few key concepts from statistics that will give you the confidence you need […]

Machine Learning Terminology from Statistics and Computer Science

Last updated on August 8, 2019. Data plays a big part in machine learning. It is important to understand and use the right terminology when talking about data. In this post you will discover exactly how to describe and talk about data in machine learning. After reading this post you will know the terminology and nomenclature used in machine learning to describe data. Kick-start your project with my new book Statistics for Machine Learning, including step-by-step tutorials and the Python source […]

Estimate the Number of Experiment Repeats for Stochastic Machine Learning Algorithms

Last updated on August 14, 2020. A problem with many stochastic machine learning algorithms is that different runs of the same algorithm on the same data return different results. This means that when performing experiments to configure a stochastic algorithm or compare algorithms, you must collect multiple results and use the average performance to summarize the skill of the model. This raises the question as to how many repeats of an experiment are enough to sufficiently characterize the skill of […]

How to Use Statistical Significance Tests to Interpret Machine Learning Results

Last updated on August 8, 2019. It is good practice to gather a population of results when comparing two different machine learning algorithms or when comparing the same algorithm with different configurations. Repeating each experimental run 30 or more times gives you a population of results from which you can calculate the mean expected performance, given the stochastic nature of most machine learning algorithms. If the mean expected performance of two algorithms or configurations differs, how do you know […]

How to Report Classifier Performance with Confidence Intervals

Last updated on August 14, 2020. Once you choose a machine learning algorithm for your classification problem, you need to report the performance of the model to stakeholders. This is important so that you can set the expectations for the model on new data. A common mistake is to report the classification accuracy of the model alone. In this post, you will discover how to calculate confidence intervals on the performance of your model to provide a calibrated and robust […]
