Issue #113 – Optimising Transformer for Low-Resource Neural Machine Translation

14 Jan 2021 · Author: Dr. Jingyi Han, Machine Translation Scientist @ Iconic

The lack of parallel training data has always been a big challenge when building neural machine translation (NMT) systems. Most approaches address the low-resource issue in NMT by exploiting more parallel or comparable corpora. Recently, several studies have shown that, instead of adding more data, optimising the NMT system itself can also help improve translation quality for low-resource language […]
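
Concretely, the kind of system-level tuning such studies investigate often amounts to shrinking and regularising the standard Transformer for small datasets. The sketch below is only an illustration of that idea; the config structure and all hyperparameter values are assumptions, not the article's recommendations.

```python
# Hypothetical config sketch: a base Transformer versus a smaller, more
# regularized variant of the kind often tried for low-resource NMT.
# All values are illustrative assumptions, not the article's settings.
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    encoder_layers: int = 6
    decoder_layers: int = 6
    embed_dim: int = 512
    attention_heads: int = 8
    dropout: float = 0.1
    label_smoothing: float = 0.1
    bpe_merge_ops: int = 32_000

# Typical low-resource adjustments: reduce capacity, increase regularization,
# and use a smaller subword vocabulary.
low_resource = TransformerConfig(
    encoder_layers=5, decoder_layers=5, embed_dim=256,
    attention_heads=4, dropout=0.3, bpe_merge_ops=10_000,
)
```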

Read more

Emotion classification on Twitter Data Using Transformers

The world of Natural Language Processing has recently been transformed by the invention of Transformers, which differ fundamentally from conventional sequence-based networks. RNNs were the original workhorse for sequence-based tasks such as text generation and text classification, and the introduction of LSTM and GRU cells resolved the problem of capturing long-term dependencies in text. Training models built on LSTM cells is slow, however, because their computation cannot be parallelised across time steps.
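
As a minimal sketch of what such a pipeline can look like, the snippet below scores a couple of tweets with a pre-trained Transformer classifier via the Hugging Face `transformers` library. The particular checkpoint name is an assumption for illustration, not necessarily the one used in the article.

```python
# Minimal sketch: emotion scoring with a pre-trained Transformer classifier.
# The checkpoint is an assumed publicly available emotion model, not
# necessarily the one used in the article.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",  # assumed checkpoint
)

tweets = [
    "Absolutely loving the new update!",
    "This delay is so frustrating.",
]
for tweet, pred in zip(tweets, classifier(tweets)):
    print(f"{tweet!r} -> {pred['label']} ({pred['score']:.2f})")
```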

Read more

3 Books on Optimization for Machine Learning

Optimization is a field of mathematics concerned with finding a good or best solution among many candidates. It is an important foundational topic in machine learning, as most machine learning algorithms are fit on historical data using an optimization algorithm. Additionally, broader problems, such as model selection and hyperparameter tuning, can also be framed as optimization problems. Although having some background in optimization is critical for machine learning practitioners, it can be a daunting topic given that it […]
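
To make the hyperparameter-tuning point concrete, here is a small, self-contained example of framing tuning as an optimization problem: a grid search selects the candidate setting with the best cross-validated score. The dataset and model are chosen purely for illustration.

```python
# Hyperparameter tuning framed as optimization: search the candidate grid for
# the setting that maximizes cross-validated accuracy (illustrative example).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```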

Read more

Code Adam Gradient Descent Optimization From Scratch

Gradient descent is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of the function. A limitation of gradient descent is that a single step size (learning rate) is used for all input variables. Extensions to gradient descent like AdaGrad and RMSProp update the algorithm to use a separate step size for each input variable but may result in a step size that rapidly decreases to very small values. The Adaptive […]
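
For a flavour of what coding Adam from scratch involves, here is a compact sketch on a toy objective f(x) = sum(x^2). It follows the standard Adam update with bias correction; the hyperparameter values are illustrative and this is not the tutorial's exact code.

```python
# From-scratch Adam on a toy objective (illustrative sketch, not the post's code).
import numpy as np

def objective(x):
    return np.sum(x ** 2)

def gradient(x):
    return 2.0 * x

def adam(grad_fn, x0, alpha=0.02, beta1=0.9, beta2=0.999, eps=1e-8, n_iter=200):
    x = x0.astype(float)
    m = np.zeros_like(x)  # first-moment (mean) estimate
    v = np.zeros_like(x)  # second-moment (uncentred variance) estimate
    for t in range(1, n_iter + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        x -= alpha * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return x

x_opt = adam(gradient, np.array([1.5, -2.0]))
print(x_opt, objective(x_opt))  # close to the minimum at the origin
```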

Read more

Learning to Segment Rigid Motions from Two Frames

Appearance-based detectors achieve remarkable performance on common scenes, but tend to fail in scenarios lacking training data. Geometric motion segmentation algorithms, however, generalize to novel scenes, but have yet to achieve performance comparable to appearance-based ones, due to noisy motion estimations and degenerate motion configurations… To combine the best of both worlds, we propose a modular network whose architecture is motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field. It takes […]
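
As a rough illustration of the purely geometric side the abstract contrasts with, the sketch below flags correspondences whose motion is inconsistent with the estimated camera egomotion, using the Sampson distance to the epipolar constraint. It is a classical baseline idea under stated assumptions, not the paper's modular network; the function and threshold are hypothetical.

```python
# Hypothetical sketch of a classical geometric cue: points that violate the
# egomotion epipolar constraint are candidates for independently moving objects.
# This is not the paper's method. pts1/pts2 are Nx2 matched pixel coordinates
# from two frames; K is the 3x3 camera intrinsics matrix.
import numpy as np
import cv2

def independent_motion_mask(pts1, pts2, K, thresh=4.0):
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)  # fundamental matrix from egomotion
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])
    x2 = np.hstack([pts2, ones])
    Fx1 = (F @ x1.T).T
    Ftx2 = (F.T @ x2.T).T
    # Sampson distance of each correspondence w.r.t. the epipolar geometry
    num = np.sum(x2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    sampson = num / den
    return sampson > thresh  # True where the match is inconsistent with egomotion
```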

Read more

RepVGG: Making VGG-style ConvNets Great Again

We present a simple but powerful convolutional neural network architecture, which has a VGG-like inference-time body composed of nothing but a stack of 3×3 convolutions and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architectures is realized by a structural re-parameterization technique, hence the model is named RepVGG… On ImageNet, RepVGG reaches over 80% top-1 accuracy, which is the first time for a plain model, to the best of our […]
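
The core trick can be illustrated in a few lines: a training-time block with parallel 3×3, 1×1, and identity branches is algebraically collapsed into a single equivalent 3×3 convolution for inference. The sketch below omits BatchNorm for brevity and is only a toy version of the idea, not the RepVGG code.

```python
# Toy structural re-parameterization sketch (BatchNorm omitted for brevity).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRepBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
        self.conv1 = nn.Conv2d(channels, channels, 1, bias=True)

    def forward(self, x):  # multi-branch training-time form
        return F.relu(self.conv3(x) + self.conv1(x) + x)

    def fuse(self):  # single-branch inference-time form
        c = self.conv3.out_channels
        k = self.conv3.weight.detach().clone()
        k += F.pad(self.conv1.weight.detach(), [1, 1, 1, 1])  # 1x1 -> centred 3x3
        ident = torch.zeros_like(k)
        for i in range(c):
            ident[i, i, 1, 1] = 1.0  # identity branch as a 3x3 kernel
        k += ident
        fused = nn.Conv2d(c, c, 3, padding=1, bias=True)
        fused.weight.data = k
        fused.bias.data = (self.conv3.bias + self.conv1.bias).detach()
        return fused

x = torch.randn(1, 8, 16, 16)
block = TinyRepBlock(8)
print(torch.allclose(block(x), F.relu(block.fuse()(x)), atol=1e-5))  # True
```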

Read more

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead selects different parameters for each incoming example… The result is a sparsely-activated model — with outrageous numbers of parameters — but a constant computational cost. However, despite several notable successes of MoE, widespread adoption has been hindered by complexity, communication costs and training instability — we address these with the Switch Transformer. We simplify the MoE routing algorithm and design […]
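
A toy version of the routing idea helps make the "constant computational cost" point concrete: each token is sent to exactly one expert feed-forward network chosen by a learned router, so adding experts adds parameters without adding per-token compute. This is a simplified sketch, not the paper's implementation (no capacity factor, no load-balancing loss).

```python
# Toy top-1 ("switch") routing over expert feed-forward networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: [tokens, d_model]
        probs = F.softmax(self.router(x), dim=-1)
        gate, expert_idx = probs.max(dim=-1)  # top-1 routing per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # scale by the gate value so the router receives gradients
                out[mask] = gate[mask].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(SwitchFFN()(tokens).shape)  # torch.Size([10, 64])
```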

Read more

Analysis of skin lesion images with deep learning

Skin cancer is the most common cancer worldwide, with melanoma being the deadliest form. Dermoscopy is a skin imaging modality that has shown an improvement in the diagnosis of skin cancer compared to visual examination without support… We evaluate the current state of the art in the classification of dermoscopic images based on the ISIC-2019 Challenge for the classification of skin lesions and current literature. Various deep neural network architectures pre-trained on the ImageNet data set are adapted to a […]
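
The usual recipe for adapting ImageNet-pre-trained networks to a task like this is to swap the classification head and fine-tune. A minimal sketch follows; the choice of ResNet-50 and the 8-class head (matching the ISIC-2019 diagnostic categories) are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal transfer-learning sketch: adapt an ImageNet-pre-trained backbone to
# skin-lesion classes. Backbone and class count are illustrative assumptions.
import torch.nn as nn
from torchvision import models

num_classes = 8  # ISIC-2019 diagnostic categories (assumed head size)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # replace the classifier head
# ...then fine-tune on dermoscopic images with standard cross-entropy training.
```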

Read more

Distributed Double Machine Learning with a Serverless Architecture

Serverless cloud computing is predicted to become the dominant, default architecture of cloud computing in the coming decade (Berkeley View on Serverless Computing, 2019). In this paper we explore serverless cloud computing for double machine learning… Being based on repeated cross-fitting, double machine learning is particularly well suited to exploit the enormous elasticity of serverless computing. It enables fast, on-demand estimation without additional cloud maintenance effort. We provide a prototype implementation, DoubleML-Serverless, written in Python that implements […]
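
To show why cross-fitting parallelises so naturally, here is a compact from-scratch sketch of double machine learning for a partially linear model: nuisance models are fit on one fold and residualised on the other, and each fold's fit is an independent job that could run in its own serverless function. This is only an illustration of the estimator, not the DoubleML-Serverless implementation.

```python
# From-scratch double ML with 2-fold cross-fitting for y = theta*d + g(x) + noise.
# Each fold's nuisance fits are independent jobs, which is what a serverless
# backend could execute in parallel (illustrative sketch only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, theta = 2000, 0.5
X = rng.normal(size=(n, 5))
d = X[:, 0] + rng.normal(size=n)              # treatment depends on confounders
y = theta * d + np.sin(X[:, 0]) + rng.normal(size=n)

res_y, res_d = np.empty(n), np.empty(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    ml_l = RandomForestRegressor(random_state=0).fit(X[train], y[train])
    ml_m = RandomForestRegressor(random_state=0).fit(X[train], d[train])
    res_y[test] = y[test] - ml_l.predict(X[test])  # partial x out of y
    res_d[test] = d[test] - ml_m.predict(X[test])  # partial x out of d

theta_hat = np.sum(res_d * res_y) / np.sum(res_d ** 2)
print(theta_hat)  # close to the true effect 0.5
```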

Read more

Preconditioned training of normalizing flows for variational inference in inverse problems

Obtaining samples from the posterior distribution of inverse problems with expensive forward operators is challenging, especially when the unknowns involve the strongly heterogeneous Earth. To meet these challenges, we propose a preconditioning scheme involving a conditional normalizing flow (NF) capable of sampling from a low-fidelity posterior distribution directly… This conditional NF is used to speed up the training of the high-fidelity objective, involving minimization of the Kullback-Leibler divergence between the predicted and the desired high-fidelity posterior density for indirect measurements […]
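
The objective being minimised is the usual reverse KL divergence between the flow's output distribution and the (unnormalised) posterior. The toy sketch below trains the simplest possible flow, a single affine transform, against a Gaussian target to illustrate that objective; it is not the paper's conditional NF or its preconditioning scheme.

```python
# Toy variational inference with an affine flow x = mu + exp(s) * z, trained by
# minimizing the reverse KL to an unnormalized target (illustration only).
import math
import torch

def log_target(x):  # unnormalized log-posterior: Gaussian with mean 2.0, std 0.5
    return -0.5 * ((x - 2.0) ** 2 / 0.25).sum(dim=-1)

mu = torch.zeros(2, requires_grad=True)
s = torch.zeros(2, requires_grad=True)  # log standard deviations
opt = torch.optim.Adam([mu, s], lr=0.05)

for step in range(500):
    z = torch.randn(256, 2)                # base samples
    x = mu + torch.exp(s) * z              # push forward through the flow
    log_q = (-0.5 * z ** 2 - 0.5 * math.log(2 * math.pi)).sum(-1) - s.sum()
    loss = (log_q - log_target(x)).mean()  # reverse KL up to a constant
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mu.detach(), torch.exp(s).detach())  # approx. mean 2.0 and std 0.5
```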

Read more