December 31, 2020 Advanced, Deep Learning, NLP, Project, Python, Semi-supervised, Structured Data, Technique, Text Leave a comment

Fake news classifier on US Election News📰 | LSTM 🈚

Introduction News media has become a channel to pass on the information of what’s happening in the world to the people living. Often people perceive whatever conveyed in the news to be true. There were circumstances where even the news channels acknowledged that their news is not true as they wrote. But some news has a significant impact not only on the people or

December 30, 2020 Python

Ultimate Guide to Heatmaps in Seaborn with Python

Introduction A heatmap is a data visualization technique that uses color to show how a value of interest changes depending on the values of two other variables. For example, you could use a heatmap to understand how air pollution varies according to the time of day across a set of cities. Another, perhaps more rare case of using heatmaps is to observe human behavior – you can create visualizations of how people use social media, how their answers on surveys […]

December 30, 2020 machine translation, Neural MT, NMT

Towards Fully Automated Manga Translation

We tackle the problem of machine translation of manga, Japanese comics. Manga translation involves two important problems in machine translation: context-aware and multimodal translation… Since text and images are mixed up in an unstructured fashion in Manga, obtaining context from the image is essential for manga translation. However, it is still an open problem how to extract context from image and integrate into MT models. In addition, corpus and benchmarks to train and evaluate such model is currently unavailable. In […]

December 29, 2020 machine translation, Neural MT, NMT

Learning Light-Weight Translation Models from Deep Transformer

Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive… In this paper, we take a natural step towards learning strong but light-weight NMT systems. We proposed a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model. The experimental results on several benchmarks validate the effectiveness of our method. Our compressed model is 8X shallower than the deep model, with […]

December 29, 2020 Machine Learning

Histogram-Based Gradient Boosting Ensembles in Python

Gradient boosting is an ensemble of decision trees algorithms. It may be one of the most popular techniques for structured (tabular) classification and regression predictive modeling problems given that it performs so well across a wide range of datasets in practice. A major problem of gradient boosting is that it is slow to train the model. This is particularly a problem when using the model on large datasets with tens of thousands of examples (rows). Training the trees that are […]

December 27, 2020 NLP

Efficient One-Pass End-to-End Entity Linking for Questions

November 16, 2020 By: Belinda Z. Li, Sewon Min, Srinivasan Iyer, Yashar Mehdad, Wen-tau Yih Abstract We present ELQ, a fast end-to-end entity linking model for questions, which uses a biencoder to jointly perform mention detection and linking in one pass. Evaluated on WebQSP and GraphQuestions with extended annotations that cover multiple entities per question, ELQ outperforms the previous state of the art by a large margin of +12.7% and +19.6% F1, respectively. With a very fast inference time (1.57 […]

December 26, 2020 Machine Learning

Feature Selection with Stochastic Optimization Algorithms

Typically, a simpler and better-performing machine learning model can be developed by removing input features (columns) from the training dataset. This is called feature selection and there are many different types of algorithms that can be used. It is possible to frame the problem of feature selection as an optimization problem. In the case that there are few input features, all possible combinations of input features can be evaluated and the best subset found definitively. In the case of a […]

December 24, 2020 Python

Reading and Writing HTML Tables with Pandas

Introduction Hypertext Markup Language (HTML) is the standard markup language for building web pages. We can render tabular data using HTML’s element. The Pandas data analysis library provides functions like read_html() and to_html() so we can import and export data to DataFrames. In this article, we will learn how to read tabular data from an HTML file and load it into a Pandas DataFrame. We’ll also learn how to write data from a Pandas DataFrame and to an HTML file. […]

December 23, 2020 Machine Learning

Ensemble Learning Algorithm Complexity and Occam’s Razor

Occam’s razor suggests that in machine learning, we should prefer simpler models with fewer coefficients over complex models like ensembles. Taken at face value, the razor is a heuristic that suggests more complex hypotheses make more assumptions that, in turn, will make them too narrow and not generalize well. In machine learning, it suggests complex models like ensembles will overfit the training dataset and perform poorly on new data. In practice, ensembles are almost universally the type of model chosen […]

December 23, 2020 Machine Learning

How to Choose an Optimization Algorithm

Optimization is the problem of finding a set of inputs to an objective function that results in a maximum or minimum function evaluation. It is the challenging problem that underlies many machine learning algorithms, from fitting logistic regression models to training artificial neural networks. There are perhaps hundreds of popular optimization algorithms, and perhaps tens of algorithms to choose from in popular scientific code libraries. This can make it challenging to know which algorithms to consider for a given optimization […]

1 2 3 … 19 »