Machine Translation Weekly 67: Where does the language neutrality of mBERT reside?

If someone had told me ten years ago, when I was a freshly graduated bachelor of computer science, that there would be models producing multilingual sentence representations that allow zero-shot model transfer, I would hardly have believed such a prediction. If they had added that the models would be total black boxes and we would not know why they worked, I would have thought they were insane. After all, one of the goals of the mathematization of stuff in science is to make […]

Read more

Gradient Descent With Momentum from Scratch

Gradient descent is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of the function. A problem with gradient descent is that it can bounce around the search space on optimization problems that have large amounts of curvature or noisy gradients, and it can get stuck in flat spots in the search space that have no gradient. Momentum is an extension to the gradient descent optimization algorithm that allows the search […]
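For readers who want the gist before the full walkthrough, here is a minimal sketch of the momentum update on a toy one-dimensional objective; the objective f(x) = x², the step size, and the momentum value are illustrative assumptions, not the article's exact setup.

```python
# Gradient descent with momentum on f(x) = x^2 (gradient: 2x).
# Hyperparameters below are illustrative assumptions.

def gradient_descent_momentum(gradient, start, lr=0.1, momentum=0.9, n_steps=200):
    x = start
    velocity = 0.0  # exponentially decayed sum of past gradients
    for _ in range(n_steps):
        velocity = momentum * velocity - lr * gradient(x)
        x += velocity  # move along the accumulated direction
    return x

# Minimize f(x) = x^2 starting from x = 5.0; the result lands near 0.0.
print(gradient_descent_momentum(lambda x: 2.0 * x, start=5.0))
```

The velocity term is what damps the bouncing described above: steps in a consistent direction reinforce each other, while oscillating gradients partially cancel.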

Read more

Research Collection – Shall we play a game?

From a research point of view, games offer an amazing environment in which to develop new machine learning algorithms and techniques. And we hope, in due course, that those new algorithms will feed back not just into gaming, but into many other domains. Beyond the very technical machine learning techniques themselves, gaming is an environment in which we can explore the relationship between AI and people, and see how they can work in partnership. It’s a very rich environment in […]

Read more

Industrial Strength Natural Language Processing

Having spent a big part of my career as a graduate student researcher and now as a Data Scientist in industry, I have come to realize that a vast majority of solutions proposed both in academic research papers and in the workplace are just not meant to ship; they just don’t scale! And when I say scale, I mean handling real-world use cases, the ability to handle large amounts of data, and ease of deployment in a production […]

Read more

What are N-Grams?

N-grams of texts are extensively used in text mining and natural language processing tasks. They are basically a set of co-occurring words within a given window, and when computing the n-grams you typically move one word forward (although you can move X words forward in more advanced scenarios). For example, take the sentence “The cow jumps over the moon”. If N=2 (known as bigrams), then the n-grams would be: the cow, cow jumps, jumps over, over the, the moon. So […]
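As a quick illustration of the sliding window just described, here is a minimal sketch of word n-gram extraction; the whitespace tokenization is a simplifying assumption.

```python
# Slide a window of n tokens over the text, moving one word forward
# at a time. Whitespace tokenization is a simplifying assumption.

def ngrams(text, n=2):
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams("The cow jumps over the moon", n=2))
# [('the', 'cow'), ('cow', 'jumps'), ('jumps', 'over'),
#  ('over', 'the'), ('the', 'moon')]
```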

Read more

What is Term-Frequency?

Term frequency (TF), often used in text mining, NLP, and information retrieval, tells you how frequently a term occurs in a document. In the context of natural language, terms correspond to words or phrases. Since every document is different in length, it is possible that a term would appear more often in longer documents than in shorter ones. Thus, term frequency is often divided by the total number of terms in the document as a way of normalization. […]
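Here is a minimal sketch of the normalized term frequency just described: the raw count of a term divided by the total number of terms in the document. The whitespace tokenization is a simplifying assumption.

```python
# Normalized term frequency: count of the term divided by the
# total number of terms in the document.

def term_frequency(term, document):
    tokens = document.lower().split()
    return tokens.count(term.lower()) / len(tokens)

print(term_frequency("the", "The cow jumps over the moon"))  # 2/6 ≈ 0.333
```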

Read more

What is Inverse Document Frequency (IDF)?

Inverse Document Frequency (IDF) is a weight indicating how commonly a word is used. The more frequently it is used across documents, the lower its score, and the lower the score, the less important the word becomes. For example, the word “the” appears in almost all English texts and would thus have a very low IDF score, as it carries very little “topic” information. In contrast, take the word “coffee”: while it is common, it’s not used as widely as […]
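To make the weighting concrete, here is a minimal sketch of IDF over a toy corpus; the natural log with +1 smoothing is one common convention, an assumption here, since exact formulas vary across libraries.

```python
import math

# IDF: terms that are rarer across the corpus get higher weights.
# The +1 smoothing and natural log are one common convention.

def inverse_document_frequency(term, documents):
    n_docs = len(documents)
    n_containing = sum(1 for doc in documents if term in doc.lower().split())
    return math.log(n_docs / (1 + n_containing)) + 1

docs = ["the cat sat", "the dog barked", "coffee is brewing"]
print(inverse_document_frequency("the", docs))     # ~1.0: in most documents
print(inverse_document_frequency("coffee", docs))  # ~1.41: in one document
```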

Read more

Text Preprocessing for Machine Learning & NLP

Based on some recent conversations, I realized that text preprocessing is a severely overlooked topic. A few people I spoke to mentioned inconsistent results from their NLP applications only to realize that they were not  preprocessing their text or were using the wrong kind of text preprocessing for their project. With that in mind, I thought of shedding some light around what text preprocessing really is, the different techniques of text preprocessing and a way to estimate how much preprocessing […]
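As a taste of the techniques the article surveys, here is a minimal sketch of a common preprocessing pipeline (lowercasing, punctuation stripping, stopword removal); the tiny stopword list is an illustrative assumption, not a recommendation.

```python
import re

# A deliberately tiny stopword list, purely for illustration.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to"}

def preprocess(text):
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # strip punctuation and symbols
    return [tok for tok in text.split() if tok not in STOPWORDS]

print(preprocess("Text preprocessing IS a severely overlooked topic!"))
# ['text', 'preprocessing', 'severely', 'overlooked', 'topic']
```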

Read more

How to Use TfidfTransformer & TfidfVectorizer?

Scikit-learn’s TfidfTransformer and TfidfVectorizer aim to do the same thing, which is to convert a collection of raw documents to a matrix of TF-IDF features. The differences between the two classes can be quite confusing, and it’s hard to know when to use which. This article shows you how to correctly use each one, the differences between the two, and some guidelines on what to use when. TfidfTransformer Usage: 1. Dataset and Imports. Below we have 5 toy documents, all […]
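The core contrast can be sketched in a few lines: TfidfVectorizer goes from raw text to TF-IDF features in one step, while TfidfTransformer is applied on top of precomputed counts (e.g., from CountVectorizer). The toy documents below are placeholders, not the article's dataset.

```python
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer, TfidfVectorizer,
)

docs = ["the cat sat", "the dog barked", "coffee is brewing"]

# Two-step route: word counts first, then TF-IDF weighting on top.
counts = CountVectorizer().fit_transform(docs)
tfidf_two_step = TfidfTransformer().fit_transform(counts)

# One-step route: raw text straight to TF-IDF features.
tfidf_one_step = TfidfVectorizer().fit_transform(docs)

# With default settings the two routes yield the same matrix.
print((tfidf_two_step != tfidf_one_step).nnz)  # 0
```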

Read more

Easily Access Pre-trained Word Embeddings with Gensim

What are pre-trained embeddings, and why use them? Pre-trained word embeddings are vector representations of words trained on a large dataset. With pre-trained embeddings, you will essentially be using the weights and vocabulary from the end result of a training process done by… someone else! (It could also be you.) One benefit of using pre-trained embeddings is that you can hit the ground running without the need for finding a large text corpus which you will have to preprocess and train with the […]
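Loading such embeddings can be as short as the sketch below, using gensim's downloader API; "glove-wiki-gigaword-100" is just one of the models available through gensim-data and may not be the one the article uses.

```python
import gensim.downloader as api

# Downloads the model on first use, then loads it from a local cache.
vectors = api.load("glove-wiki-gigaword-100")

print(vectors["coffee"][:5])                   # first 5 dimensions of the vector
print(vectors.most_similar("coffee", topn=3))  # nearest neighbors by cosine similarity
```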

Read more