10+ Examples for Using CountVectorizer

Scikit-learn’s CountVectorizer transforms a corpus of text into a vector of term/token counts. It can also preprocess your text data before generating the vector representation, which makes it a highly flexible feature representation module for text. In this article, we are going to go in-depth into the different ways you can use CountVectorizer such that you are not just computing counts of words, but also preprocessing your text data appropriately as well […]

Read more
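As a rough illustration of the excerpt above, here is a minimal sketch of CountVectorizer producing token counts while also applying simple preprocessing. The two sample sentences and the hand-picked stop-word list are assumptions made for this example, not taken from the article.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (illustrative only)
docs = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]

# lowercase=True is the default; stop_words takes an explicit list here
# so that the tokens being dropped are easy to see.
vectorizer = CountVectorizer(lowercase=True, stop_words=["the", "on"])

# fit_transform learns the vocabulary and returns a sparse
# document-term matrix with one row per document.
counts = vectorizer.fit_transform(docs)

vocabulary = vectorizer.vocabulary_   # dict: token -> column index
dense = counts.toarray()              # small enough to densify for inspection

print(sorted(vocabulary))
print(dense)
```

Each row of the matrix is one document and each column is one vocabulary term, so the row for the first sentence holds a count of 1 in the column for "cat".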

5 Ways to Improve Productivity in Customer Support with AI

Companies receive support inquiries from various channels. These may include emails, support tickets, tweets, chat conversations with customer support representatives (CSRs), chatbot conversations, and more. That is a lot of data to deal with, and it is mostly unstructured and scattered, which makes it that much harder to manage. All this text data can actually be leveraged to improve speed in responding to customer service inquiries and reduce the volume of incoming tickets. According to a research […]

Read more

Text Classification: Best Practices for Real World Applications

Most text classification examples that you see on the Web or in books focus on demonstrating techniques. This will help you build a pseudo-usable prototype. If you want to take your classifier to the next level and use it within a product or service workflow, then there are things you need to do from day one to make this a reality. I’ve seen classifiers fail miserably and get replaced with off-the-shelf solutions because they don’t work in […]

Read more

HashingVectorizer vs. CountVectorizer

Previously, we learned how to use CountVectorizer for text processing. In place of CountVectorizer, you also have the option of using HashingVectorizer. In this tutorial, we will learn how HashingVectorizer differs from CountVectorizer and when to use which. HashingVectorizer and CountVectorizer are meant to do the same thing: convert a collection of text documents to a matrix of token occurrences. The difference is that HashingVectorizer does not store the resulting vocabulary (i.e. the unique […]

Read more
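The difference described in the excerpt above can be sketched as follows. The sample documents and the choice of `n_features=2**10` are arbitrary assumptions for illustration; `alternate_sign=False` and `norm=None` are set only so that the hashed values stay raw, non-negative counts that are easy to compare.

```python
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

# Toy corpus (illustrative only)
docs = ["the cat sat on the mat", "the dog chased the cat"]

# CountVectorizer learns and stores a vocabulary during fitting.
count_vec = CountVectorizer()
count_matrix = count_vec.fit_transform(docs)

# HashingVectorizer maps tokens to column indices with a hash function.
# The number of columns is fixed up front and no vocabulary is kept;
# fitting learns nothing from the data.
hash_vec = HashingVectorizer(n_features=2**10, alternate_sign=False, norm=None)
hash_matrix = hash_vec.fit_transform(docs)

count_has_vocab = hasattr(count_vec, "vocabulary_")
hash_has_vocab = hasattr(hash_vec, "vocabulary_")

print(count_matrix.shape, count_has_vocab)
print(hash_matrix.shape, hash_has_vocab)
```

Because no vocabulary is stored, the hashed columns cannot be mapped back to the tokens that produced them, which is the trade-off for the fixed memory footprint.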

Word2Vec: A Comparison Between CBOW, SkipGram & SkipGramSI

Word2Vec is a widely used word representation technique that uses neural networks under the hood. The resulting word representations, or embeddings, can be used to infer semantic similarity between words and phrases, expand queries, surface related concepts and more. The sky is the limit when it comes to how you can use these embeddings for different NLP tasks. In this article, we will look at how the different neural network architectures for training a Word2Vec model behave in practice. The […]

Read more

Before AI, Invest in A Big Data Strategy

Big data describes the volumes of data, both structured and unstructured, that your company generates every single day. Analysts at Gartner estimate that more than 80 percent of enterprise data is unstructured. This can mean text files from IT logs, emails from customer support, direct Twitter messages from customers, and employee complaints to your HR department. Diverse and scattered data sources like these are found in almost every enterprise. A big data strategy, on the other hand, is a glorified term for how […]

Read more

5 Examples of Text Classification in Practice

AI is transforming nearly every industry, and text analysis is a key area of interest. That’s because there’s been an explosion in unstructured text data (nearly 80% of data at most organizations), which is quickly becoming impractical for humans to analyze alone. We’ve already talked about some best practices for building a text classifier, but how can a tool like this help your business? Let’s take a closer look at document classification and some real-world examples. What Is Document Classification? Organizations need […]

Read more

How to Rename Pandas DataFrame Column in Python

Pandas is a Python library for data analysis and manipulation. Almost all operations in pandas revolve around DataFrames. A DataFrame is an abstract representation of a two-dimensional table which can contain all sorts of data. DataFrames also enable us to give all the columns names, which is why columns are often referred to as attributes or fields when using DataFrames. In this article we’ll see how we can rename an already existing DataFrame's columns. There are two options for […]

Read more
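The excerpt above is cut off before it names its two options, so the sketch below simply shows two common ways to rename columns in pandas; they are not necessarily the ones the article goes on to describe. The column names and values are made up for illustration.

```python
import pandas as pd

# Small example frame (illustrative only)
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Approach A: rename() with a mapping of old name -> new name.
# It returns a new DataFrame and leaves the original untouched.
renamed = df.rename(columns={"a": "alpha"})

# Approach B: assign a full list of names to the columns attribute.
# This changes the frame in place and needs one name per column.
relabelled = df.copy()
relabelled.columns = ["x", "y"]

print(list(df.columns))
print(list(renamed.columns))
print(list(relabelled.columns))
```

The mapping form is convenient when only some columns change, while direct assignment suits cases where every column gets a new name.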

Python: Get Size of Dictionary

In this article, we’ll take a look at how to find the size of a dictionary in Python. Dictionary size can mean its length, or the space it occupies in memory. To find the number of elements stored in a dictionary we can use the len() function. To find the size of a dictionary in bytes we can use the getsizeof() function of the sys module. To count the elements of a nested dictionary, we can use a recursive function. […]

Read more
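The three measurements mentioned in the excerpt above can be sketched as follows. The helper name `count_items` and the sample dictionary are hypothetical, and the counting convention (every key counts once, including keys whose value is itself a dictionary) is an assumption; the article may count differently.

```python
import sys


def count_items(d):
    """Count key-value pairs in d, descending into nested dictionaries."""
    total = 0
    for value in d.values():
        total += 1                      # count this key
        if isinstance(value, dict):
            total += count_items(value)  # plus everything nested under it
    return total


# Sample data (illustrative only)
person = {"name": "Ada", "address": {"city": "London", "zip": "N1"}}

top_level = len(person)                # number of top-level keys
shallow_bytes = sys.getsizeof(person)  # size of the dict object itself
nested_total = count_items(person)     # keys at every nesting level

print(top_level, nested_total, shallow_bytes)
```

Note that sys.getsizeof() reports only the dictionary object's own footprint, not the memory used by the keys and values it refers to, and the exact number varies between Python versions and platforms.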

Issue #116 – Fully Non-autoregressive Neural Machine Translation

04 Feb 21. Author: Dr. Patrik Lambert, Senior Machine Translation Scientist @ Iconic. The standard Transformer model is autoregressive (AT), which means that the prediction of each target word is based on the predictions for the previous words. The output is generated from left to right, a process which cannot be parallelised because the prediction probability of a token depends on previous tokens. In the last few years, new approaches have been […]

Read more