Optimal Scraping Technique: CSS Selector, XPath, & RegEx

Web scraping deals with HTML almost exclusively. In nearly all cases, what is required is a small sample from a very large file (e.g. pricing information from an ecommerce page). Therefore, an essential part of scraping is searching through an HTML document and finding the correct information. How that should be done is a matter of some debate and depends on preference, experience, and the type of data. While all scraping and parsing methods are “correct”, some of them have benefits that may be […]


Understanding the Complexity of Metaclasses and their Practical Applications

Metaprogramming is a collection of programming techniques that focus on the ability of programs to introspect themselves, understand their own code, and modify themselves. Such an approach to programming gives programmers a lot of power and flexibility. Without metaprogramming techniques, we probably wouldn’t have modern programming frameworks, or those frameworks would be far less expressive. This article is an excerpt from the book Expert Python Programming, Fourth Edition by Michal Jaworski and Tarek Ziade, a book that expresses many years of professional experience in building all kinds of applications […]


The Top Skills for a Career in Data Science in 2021

Data science is exploding in popularity because of its ties to the future of technology, the supply and demand for high-paying jobs, and its place on the bleeding edge of corporate culture, startups, and innovation! Students from South and East Asia especially can fast-track lucrative technology careers with data science, even as tech startups are exploding in those areas with increased foreign funding. Think carefully. Would you consider becoming a Data Scientist? According to Coursera: A data scientist might do the following […]


Data Science Trends of the Future 2022

Photo credit: Unsplash. Data Science is an exciting field for knowledge workers because it increasingly intersects with the future of how industries, society, governance and policy will function. While it’s one of those vague terms thrown around a lot for students, it’s actually fairly simple to define. Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across […]


Build an End-to-End Currency Convertor Chatbot with Python and Dialogflow

This article was published as a part of the Data Science Blogathon. Introduction: Hello all, hope you are fine. In this tutorial we will learn how to create chatbots using Dialogflow and Python, and how to deploy them to Telegram. In our previous articles, we learned to create a simple rule-based chatbot using plain Python and the NLTK library. I would request that you have a look at the article creating a simple chatbot […]


Malawi News Classification - An NLP Project

Classifying Malawi News articles into 19 different classes using SMOTE and SGDClassifier. Introduction: Text classification is common among the applications that we use on a daily basis. For example, email providers use text classification to filter out spam emails from your inbox. Another common use of text classification is in customer care, where sentiment analysis is used to differentiate bad reviews from good reviews. The list of modern uses of text classification goes on as we have excelled to […]


Identifying The Language of A Document Using NLP!

This article was published as a part of the Data Science Blogathon. Introduction: The goal of this article is to identify the language of written text. The text in documents is available in many languages, and when we don’t know the language it is sometimes difficult even to tell Google Translate what it is. For most translators, we have to specify both the input language and the desired language. If you had a text written in Spanish and you […]


Performing Sentiment Analysis Using Twitter Data!

Photo by Daddy Mohlala on Unsplash. “Data is water, purifying to make it edible is a role of Data Analyst” – Kashish Rastogi. We are going to clean the Twitter text data and visualize it in this blog. Table of Contents: Problem Statement; Data Description; Cleaning text with NLP; Finding if the text has: with spaCy; Cleaning text with preprocessor library; Analysis of the sentiment of data; Data visualizing. I am taking the Twitter data which is available here on […]


S2, A next generation data science toolbox

We have created a language that is faster than Python in every way, works with the entire Java ecosystem (such as the Spring framework, Eclipse and many more) and can be deployed to embedded devices seamlessly, allowing you to collect and process data from pretty much any device you want, even without an internet connection. Our language comes with the built-in mathematical libraries necessary for any data scientist, from basic math like Linear Algebra and Statistics to Digital Signal Processing and Time […]


Bag-of-words vs TFIDF vectorization – A Hands-on Tutorial

This article was published as a part of the Data Science Blogathon. Whenever we apply any algorithm to textual data, we need to convert the text to a numeric form. Hence, there arises a need for some pre-processing techniques that can convert our text to numbers. Both bag-of-words (BOW) and TFIDF are pre-processing techniques that can generate a numeric form from an input text. Bag-of-Words: The bag-of-words model converts text into fixed-length vectors by counting how many times each word appears. […]
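
The counting step described in this excerpt can be illustrated with a minimal sketch. It is not the article's own code: the function name `bag_of_words`, the lowercase whitespace tokenization, and the two sample sentences are assumptions made here for illustration only.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, vectors): one fixed-length count vector per document.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, which real pipelines usually refine.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary fixes the vector length and the position of each word.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["the cat sat", "the cat saw the dog"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)     # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Every document maps to a vector of the same length (the vocabulary size), regardless of how many words the document itself contains.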
