The Top Skills for a Career in Data Science in 2021

Data science is exploding in popularity because it is tied to the future of technology, offers well-paid jobs that are in high demand, and sits at the leading edge of corporate culture, startups and innovation! Students from South and East Asia in particular can fast-track lucrative technology careers with data science, as tech startups in those regions grow on the back of increased foreign funding. Think carefully: would you consider becoming a data scientist? According to Coursera, a data scientist might do the following […]

Stocks, Significance Testing & p-Hacking: How volatile is volatile?

October is historically the most volatile month for stocks, but is this a persistent signal or just noise in the data? Follow me on Twitter (twitter.com/pdquant) for more. Over the past 32 years, October has been the most volatile month on average for the S&P 500 and December the least. In this article we will use simulation to assess the statistical significance of this observation and the extent to which it could have occurred by chance. All code […]
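
The excerpt stops before the post's code, so the following is only a sketch of one way such a simulation can work, not the author's method. It assumes 32 years of synthetic daily returns (a hypothetical 21 trading days per month and 1% daily volatility) drawn with the same true volatility in every month, so any gap between the most and least volatile months is pure chance. Comparing an observed max/min ratio against this null distribution, instead of singling out October after looking at the data, is one way to avoid the p-hacking trap in the title. The function names and the 1.15 ratio are illustrative assumptions.

```python
import math
import random

def sample_std(xs):
    """Sample standard deviation (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def simulate_max_min_ratio(rng, n_years=32, days_per_month=21, daily_sigma=0.01):
    """One synthetic history in which all 12 months share the same true volatility.

    Returns max/min of the 12 per-month average realized volatilities, i.e. how
    uneven the months look purely by chance. (Assumes i.i.d. normal returns.)
    """
    month_vols = []
    for _month in range(12):
        yearly_vols = [
            sample_std([rng.gauss(0.0, daily_sigma) for _ in range(days_per_month)])
            for _year in range(n_years)
        ]
        month_vols.append(sum(yearly_vols) / n_years)
    return max(month_vols) / min(month_vols)

def p_value_for_ratio(observed_ratio, n_sims=200, seed=0):
    """Fraction of equal-volatility simulations at least as uneven as observed."""
    rng = random.Random(seed)
    null_ratios = [simulate_max_min_ratio(rng) for _ in range(n_sims)]
    return sum(r >= observed_ratio for r in null_ratios) / n_sims

# 1.15 is a made-up observed ratio for illustration, not a figure from the post.
print(p_value_for_ratio(1.15))
```

Real returns show volatility clustering and fat tails, so an i.i.d. normal null like this one is a simplification; resampling actual historical returns is a common refinement.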

10 Statistical Functions in Excel every Analytics Professional Should Know

Overview: Microsoft Excel is an excellent tool for learning and executing statistical functions. Here are 12 statistical functions in Excel that you should master for a successful analytics career. Let’s Excel in Statistics! “Statistics is the grammar of Science.” – Karl Pearson. Let’s make that a bit more relevant for us: Statistics is the grammar of “Data” Science. You’ll notice that almost every successful data science or analytics professional has a solid understanding of statistics – but […]
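
The excerpt does not list which functions the article covers, so as a hedged illustration here are a few standard Excel statistical functions (AVERAGE, MEDIAN, MODE.SNGL, STDEV.S, STDEV.P, VAR.S, VAR.P, MIN, MAX) next to their counterparts in Python's standard-library statistics module. The sample values are made up. Note the .S/.P split: STDEV.S and VAR.S divide by n - 1 (sample), while STDEV.P and VAR.P divide by n (population), mirroring statistics.stdev versus statistics.pstdev.

```python
import statistics

# Hypothetical sample, e.g. the contents of an Excel range such as A1:A8.
values = [12, 15, 15, 18, 20, 22, 25, 30]

# Excel function -> Python standard-library equivalent
summary = {
    "AVERAGE": statistics.mean(values),
    "MEDIAN": statistics.median(values),
    "MODE.SNGL": statistics.mode(values),
    "STDEV.S": statistics.stdev(values),      # sample standard deviation (n - 1)
    "STDEV.P": statistics.pstdev(values),     # population standard deviation (n)
    "VAR.S": statistics.variance(values),     # sample variance (n - 1)
    "VAR.P": statistics.pvariance(values),    # population variance (n)
    "MIN": min(values),
    "MAX": max(values),
}

for name, value in summary.items():
    print(f"{name:10} {value:.3f}")
```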

Crash Course in Statistics for Machine Learning

Last Updated on August 15, 2020
You do not need to know statistics before you can start learning and applying machine learning. You can start today. Nevertheless, knowing some statistics can be very helpful to understand the language used in machine learning. Knowing some statistics will eventually be required when you want to start making strong claims about your results. In this post you will discover a few key concepts from statistics that will give you the confidence you need […]

Machine Learning Terminology from Statistics and Computer Science

Last Updated on August 8, 2019
Data plays a big part in machine learning. It is important to understand and use the right terminology when talking about data. In this post you will discover exactly how to describe and talk about data in machine learning. After reading this post you will know the terminology and nomenclature used in machine learning to describe data. Kick-start your project with my new book Statistics for Machine Learning, including step-by-step tutorials and the Python source […]

Estimate the Number of Experiment Repeats for Stochastic Machine Learning Algorithms

Last Updated on August 14, 2020
A problem with many stochastic machine learning algorithms is that different runs of the same algorithm on the same data return different results. This means that when performing experiments to configure a stochastic algorithm or compare algorithms, you must collect multiple results and use the average performance to summarize the skill of the model. This raises the question of how many repeats of an experiment are enough to sufficiently characterize the skill of […]
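
The excerpt stops before the post's own procedure, so here is one standard way to answer "how many repeats?", offered as a sketch rather than the post's method: run a small pilot, estimate the run-to-run standard deviation s, then solve s / sqrt(n) <= target for n so that the standard error of the mean skill is small enough. noisy_model_score is a hypothetical stand-in for training and evaluating a real stochastic model, and the 0.80 mean, 0.02 spread and 0.002 target are assumed values.

```python
import math
import random
import statistics

def noisy_model_score(seed):
    """Hypothetical stand-in for one run of a stochastic algorithm (not a real model)."""
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0.0, 0.02)  # assumed: mean skill 0.80, run-to-run spread 0.02

def repeats_for_standard_error(pilot_scores, target_se):
    """Repeats needed so that the standard error of the mean, s / sqrt(n), is <= target_se."""
    s = statistics.stdev(pilot_scores)       # run-to-run spread estimated from a pilot
    return math.ceil((s / target_se) ** 2)   # solve s / sqrt(n) <= target_se for n

pilot = [noisy_model_score(seed) for seed in range(30)]
n = repeats_for_standard_error(pilot, target_se=0.002)
scores = [noisy_model_score(seed) for seed in range(n)]
print(f"pilot stdev={statistics.stdev(pilot):.4f}, repeats needed={n}")
print(f"mean skill over {n} repeats: {statistics.mean(scores):.4f}")
```

The pilot estimate of s is itself noisy, so it is reasonable to re-estimate it once the full set of repeats is in and add more runs if the target is missed.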

How to Use Statistical Significance Tests to Interpret Machine Learning Results

Last Updated on August 8, 2019
It is good practice to gather a population of results when comparing two different machine learning algorithms or when comparing the same algorithm with different configurations. Repeating each experimental run 30 or more times gives you a population of results from which you can calculate the mean expected performance, given the stochastic nature of most machine learning algorithms. If the mean expected performance of two algorithms or configurations is different, how do you know […]
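
The excerpt does not show which test the post uses. As a dependency-free stand-in, here is a sketch of a two-sided permutation test on the difference in mean skill between two sets of 30 repeated results; it is a swapped-in technique, not necessarily the post's. The score lists are synthetic, and permutation_test is an illustrative helper name.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean skill.

    If both algorithms truly have the same expected skill, the labels "A" and "B"
    are interchangeable, so shuffling them shows how large a mean difference
    arises by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical results: 30 repeated runs of two configurations (synthetic data).
rng = random.Random(1)
algo_a = [0.80 + rng.gauss(0.0, 0.02) for _ in range(30)]
algo_b = [0.81 + rng.gauss(0.0, 0.02) for _ in range(30)]
p = permutation_test(algo_a, algo_b)
print(f"p = {p:.4f}; reject 'same mean' at alpha = 0.05? {p < 0.05}")
```

The significance level (0.05 here) should be fixed before looking at the results; choosing it afterwards undermines the test.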

How to Report Classifier Performance with Confidence Intervals

Last Updated on August 14, 2020
Once you choose a machine learning algorithm for your classification problem, you need to report the performance of the model to stakeholders. This is important so that you can set the expectations for the model on new data. A common mistake is to report the classification accuracy of the model alone. In this post, you will discover how to calculate confidence intervals on the performance of your model to provide a calibrated and robust […]
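
The excerpt ends before the post's calculation, so this is a sketch of one common option rather than the post's code: treat accuracy as a binomial proportion and use the normal (Gaussian) approximation, accuracy ± z * sqrt(accuracy * (1 - accuracy) / n), with z = 1.96 for roughly 95% coverage. The 80% accuracy on 100 held-out examples is a made-up example. The approximation is rough for small n or accuracies near 0 or 1, which is why the bounds are clipped to [0, 1].

```python
import math

def accuracy_interval(accuracy, n_samples, z=1.96):
    """Normal-approximation interval for a classification accuracy.

    accuracy:  proportion correct on n_samples held-out examples
    z:         1.96 gives roughly 95% coverage
    """
    radius = z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)
    return max(0.0, accuracy - radius), min(1.0, accuracy + radius)

# Hypothetical result: 80% accuracy on 100 test examples.
low, high = accuracy_interval(0.80, 100)
print(f"accuracy 80.0%, 95% CI [{low:.1%}, {high:.1%}]")
```

Reporting the interval alongside the point accuracy tells stakeholders how much the figure could move on new data of the same size.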

How to Calculate Bootstrap Confidence Intervals For Machine Learning Results in Python

Last Updated on August 14, 2020
It is important to present both the expected skill of a machine learning model as well as confidence intervals for that model skill. Confidence intervals provide a range of plausible values for the model's skill and the likelihood that the skill will fall within that range when making predictions on new data. For example, a 95% likelihood of classification accuracy between 70% and 75%. A robust way to calculate confidence intervals for machine learning algorithms is to […]
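
The title names the bootstrap, but the excerpt stops before the details, so here is a generic percentile-bootstrap sketch rather than the post's code: resample the per-example outcomes with replacement many times, record the mean of each resample, and read the interval off the sorted means. bootstrap_ci is an illustrative helper, and the 80-correct-out-of-100 outcomes are hypothetical.

```python
import random

def bootstrap_ci(scores, n_resamples=1000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement (same size as the original) and record each mean.
    resampled_means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = resampled_means[int(tail * n_resamples)]
    upper = resampled_means[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return lower, upper

# Hypothetical per-example outcomes on a test set: 1 = correct, 0 = wrong (80% accuracy).
outcomes = [1] * 80 + [0] * 20
low, high = bootstrap_ci(outcomes)
print(f"accuracy 80.0%, 95% bootstrap CI [{low:.1%}, {high:.1%}]")
```

Unlike a normal-approximation interval, the percentile bootstrap makes no assumption about the shape of the sampling distribution, at the cost of extra computation.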

Introduction to Random Number Generators for Machine Learning in Python

Last Updated on July 31, 2020
Randomness is a big part of machine learning. Randomness is used as a tool or a feature in preparing data and in learning algorithms that map input data to output data in order to make predictions. In order to understand the need for statistical methods in machine learning, you must understand the source of randomness in machine learning. The source of randomness in machine learning is a mathematical trick called a pseudorandom number generator. […]
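
To make the "mathematical trick" concrete, here is an illustrative sketch (not the post's code) of a toy linear congruential generator, x_{k+1} = (a * x_k + c) mod m, using the widely quoted Numerical Recipes constants. It shows the key property: the sequence is fully determined by the seed. Python's own random module uses a different algorithm (the Mersenne Twister), but seeding it gives the same reproducibility. The class name is a hypothetical label for this example.

```python
import random

class LinearCongruentialGenerator:
    """Toy pseudorandom number generator: x_{k+1} = (a * x_k + c) mod m.

    For illustration only; not suitable for serious statistical or security use.
    """

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.state, self.a, self.c, self.m = seed % m, a, c, m

    def next_float(self):
        """Advance the state and scale it into [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m

# Same seed -> identical "random" sequence: the numbers are fully deterministic.
gen1 = LinearCongruentialGenerator(seed=42)
gen2 = LinearCongruentialGenerator(seed=42)
seq1 = [gen1.next_float() for _ in range(5)]
seq2 = [gen2.next_float() for _ in range(5)]
print(seq1 == seq2)  # True

# Seeding Python's built-in generator gives the same reproducibility.
random.seed(7)
first = [random.random() for _ in range(3)]
random.seed(7)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

This determinism is why fixing the seed makes a stochastic experiment repeatable, and why changing it is the usual way to sample different runs.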
