Controlling Text Generation with Plug and Play Language Models

This article is based on the paper “Plug and Play Language Models: A Simple Approach To Controlled Text Generation” by Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. The transformer neural network architecture, developed by Vaswani et al. (2017), has enabled larger models and momentous progress in natural language processing (NLP) over the last […]

Top 15 Open-Source Datasets of 2020 That Every Data Scientist Should Add to Their Portfolio!

Overview: Here is a list of the top 15 datasets for 2020 that we feel every data scientist should practice on. The article contains 5 datasets each for machine learning, computer vision, and NLP. By no means is this list exhaustive; feel free to add other datasets in the comments below. Introduction: “For the things we have to learn before we can do them, we learn by doing them.” (Aristotle) I am sure everyone can attest to this saying. No […]

Document-aligned Japanese-English Conversation Parallel Corpus

Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resourced languages, but document-level (DL) MT has not, because DL MT is difficult 1) to train, given the small amount of DL data available, and 2) to evaluate, as the main methods and datasets focus on SL evaluation. To address the first issue, we present a document-aligned Japanese-English conversation corpus, including balanced, high-quality business conversation data for tuning and testing… As for the second issue, we manually identify the main areas where SL MT […]

When Is Programming Needed in Most Leading Self-Service Configurations?

To all data analysts, big and small: many corporations have self-service BI and DWH solutions (I am asking only about those that did NOT build an in-house solution): -When is programming needed in most leading self-service configurations? -When do analysts and business executives need coding and programming, because the self-service application's slice-and-dice, filtering, and field options are not enough?! In some places, we junior analysts are getting a feeling (that may be […]

Dynamic Classifier Selection Ensembles in Python

Dynamic classifier selection is a type of ensemble learning algorithm for classification predictive modeling. The technique involves fitting multiple machine learning models on the training dataset, then selecting the model that is expected to perform best when making a prediction, based on the specific details of the example to be predicted. This can be achieved using a k-nearest neighbor model to locate examples in the training dataset that are closest to the new example to be predicted, evaluating all models […]
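
The selection step described above can be sketched in plain Python. This is a minimal illustration, not the article's implementation: it assumes the simplest competence measure (accuracy of each model on the k nearest training examples, often called overall local accuracy), and the two threshold "classifiers", the toy data, and the helper names `k_nearest`, `select_model`, and `dcs_predict` are all hypothetical. In practice the pool would be fitted models, and the neighbors are often drawn from a held-out set rather than the training data.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Classifier = Callable[[Point], int]


def k_nearest(query: Point, X: Sequence[Point], k: int) -> List[int]:
    """Indices of the k examples in X closest to query (Euclidean distance)."""
    return sorted(range(len(X)), key=lambda i: math.dist(query, X[i]))[:k]


def select_model(query: Point, models: Sequence[Classifier],
                 X: Sequence[Point], y: Sequence[int], k: int = 3) -> int:
    """Index of the model with the highest accuracy on the query's k nearest neighbors."""
    neighbors = k_nearest(query, X, k)

    def local_accuracy(model: Classifier) -> float:
        return sum(model(X[i]) == y[i] for i in neighbors) / len(neighbors)

    return max(range(len(models)), key=lambda j: local_accuracy(models[j]))


def dcs_predict(query: Point, models: Sequence[Classifier],
                X: Sequence[Point], y: Sequence[int], k: int = 3) -> int:
    """Predict with whichever model is judged most competent around this query."""
    return models[select_model(query, models, X, y, k)](query)


# Hypothetical pool: two decision stumps, each thresholding a single feature.
models: List[Classifier] = [
    lambda p: int(p[0] > 0.5),  # stump on feature 0
    lambda p: int(p[1] > 0.5),  # stump on feature 1
]

# Toy data: on the left (feature 0 < 0.5) the label follows feature 1; on the right it is always 1.
X_train = [(0.1, 0.2), (0.2, 0.9), (0.3, 0.8), (0.8, 0.1), (0.9, 0.2), (0.7, 0.9)]
y_train = [0, 1, 1, 1, 1, 1]

print(select_model((0.2, 0.7), models, X_train, y_train))   # 1 (feature-1 stump wins on the left)
print(select_model((0.85, 0.3), models, X_train, y_train))  # 0 (feature-0 stump wins on the right)
print(dcs_predict((0.85, 0.3), models, X_train, y_train))   # 1
```

Neither stump is accurate everywhere, but each is perfect in its own region, which is exactly the situation where choosing a model per example can beat any single member of the pool.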

Machine Translation Weekly 62: The EDITOR

Papers about new models for sequence-to-sequence modeling have always been my favorite genre. This week I will talk about a model called EDITOR, introduced in a preprint by authors from the University of Maryland, in a paper that will appear in the TACL journal. The model is based on the Levenshtein Transformer, a partially non-autoregressive model for sequence-to-sequence learning. Autoregressive models generate the output left-to-right (or right-to-left), conditioning each step on the previously generated tokens. On the other […]
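
To make the contrast concrete, here is a toy sketch of left-to-right autoregressive decoding with greedy search. The "model" is a hypothetical lookup table standing in for a trained decoder, and `greedy_decode`, `toy_model`, and `TOY_TABLE` are illustrative names only; nothing here reflects EDITOR's or the Levenshtein Transformer's actual code.

```python
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]
# A "model" here is any function mapping the full prefix to next-token scores.
NextTokenScorer = Callable[[Prefix], Dict[str, float]]

BOS, EOS = "<s>", "</s>"

# Hypothetical toy table standing in for a trained decoder's next-token distribution.
TOY_TABLE: Dict[Prefix, Dict[str, float]] = {
    (BOS,): {"the": 0.9, "a": 0.1},
    (BOS, "the"): {"cat": 0.6, "dog": 0.4},
    (BOS, "the", "cat"): {"sat": 0.7, EOS: 0.3},
    (BOS, "the", "cat", "sat"): {EOS: 1.0},
}


def toy_model(prefix: Prefix) -> Dict[str, float]:
    # Unknown prefixes simply end the sequence.
    return TOY_TABLE.get(prefix, {EOS: 1.0})


def greedy_decode(model: NextTokenScorer, max_len: int = 10) -> List[str]:
    """Generate left-to-right, one token per step, conditioning on everything generated so far."""
    prefix: List[str] = [BOS]
    for _ in range(max_len):
        scores = model(tuple(prefix))                      # condition on the whole prefix
        next_token = max(scores, key=scores.__getitem__)   # greedy choice
        if next_token == EOS:
            break
        prefix.append(next_token)
    return prefix[1:]  # drop the BOS marker


print(greedy_decode(toy_model))  # ['the', 'cat', 'sat']
```

Each step has to wait for the one before it. A partially non-autoregressive model such as the Levenshtein Transformer instead refines a whole output sequence over a few passes of parallel deletion and insertion operations.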

Python: Check if Key Exists in Dictionary

Introduction: A dictionary (also known as a ‘map’, ‘hash’, or ‘associative array’) is a built-in Python container that stores elements as key-value pairs. Where sequence containers are indexed by integer position, dictionaries are indexed by key. Keys can be numbers, strings, or any other hashable value, such as a tuple of hashable items; unhashable objects, such as lists, cannot be used as keys. In this article, we’ll take a look at how to check if a key exists in a dictionary in Python. In the examples, […]
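
A short sketch of the usual ways to test for a key, using only built-in behavior; the `inventory` and `coords` dictionaries are made-up examples, not taken from the article.

```python
inventory = {"apples": 3, "pears": 0}

# 1. The `in` operator: the idiomatic membership test (average O(1) for dicts).
print("apples" in inventory)   # True
print("bananas" in inventory)  # False

# 2. dict.get(): returns a default (None unless one is given) when the key is missing.
#    Caveat: a key whose stored value is None looks the same as a missing key.
print(inventory.get("bananas"))     # None
print(inventory.get("bananas", 0))  # 0

# 3. try/except KeyError: handy when you need the value and the key is usually present.
try:
    count = inventory["bananas"]
except KeyError:
    count = 0
print(count)  # 0

# "pears" exists even though its value is falsy, so test membership, not truthiness.
print("pears" in inventory, bool(inventory.get("pears")))  # True False

# Keys must be hashable: a tuple works, a list raises TypeError.
coords = {(0, 0): "origin"}
try:
    coords[[0, 0]] = "list key"
except TypeError as exc:
    print("rejected:", exc)
```

`in` answers the question directly, while `get()` and `try/except` are better suited to fetching the value at the same time.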

Calculating Pearson Correlation Coefficient in Python with NumPy

Introduction: This article is an introduction to the Pearson correlation coefficient, its manual calculation, and its computation via Python’s NumPy module. The Pearson correlation coefficient measures the linear association between two variables. Its value can be interpreted like so:
+1 – Complete positive correlation
+0.8 – Strong positive correlation
+0.6 – Moderate positive correlation
0 – No linear correlation
-0.6 – Moderate negative correlation
-0.8 – Strong negative correlation
-1 – Complete negative correlation
We’ll illustrate how the correlation coefficient varies […]
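
As a sketch of the manual calculation, the coefficient is the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations: r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sample data and the `pearson_r` helper below are illustrative assumptions; `np.corrcoef` is NumPy's built-in routine and should agree with the hand computation.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson r computed directly from the definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_manual = pearson_r(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])  # off-diagonal entry of the 2x2 correlation matrix
print(round(r_manual, 4), round(r_numpy, 4))  # 0.7746 0.7746
```

Here the deviation products sum to 6 and the squared deviations sum to 10 and 6, so r = 6 / √60 ≈ 0.77, a strong positive correlation on the scale above.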

Automatic Standardization of Colloquial Persian

The Iranian Persian language has two varieties: standard and colloquial. Most natural language processing tools for Persian assume that the text is in standard form; this assumption does not hold in many real applications, especially for web content… This paper describes a simple and effective standardization approach based on sequence-to-sequence translation. We design an algorithm for generating artificial parallel colloquial-to-standard data for learning a sequence-to-sequence model. Moreover, we annotate a publicly available evaluation dataset consisting of 1912 sentences from a diverse set […]