Sequence modeling benchmarks and temporal convolutional networks

This repository contains the experiments from the paper An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling by Shaojie Bai, J. Zico Kolter and Vladlen Koltun. We specifically target a comprehensive set of tasks that have repeatedly been used to compare the effectiveness of different recurrent networks, and we evaluate a simple, generic, but powerful (purely) convolutional network on the recurrent nets’ home turf. Experiments are done in PyTorch. If you find this repository helpful, please cite […]
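The convolutional network evaluated here is a temporal convolutional network (TCN): a stack of dilated, causal 1-D convolutions with residual connections. The following is only a rough illustrative sketch of such a block, assuming generic module and argument names rather than the repository's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    """Illustrative TCN-style block: dilated causal Conv1d + residual connection."""
    def __init__(self, channels, kernel_size=3, dilation=1, dropout=0.2):
        super().__init__()
        # Left-only padding keeps the convolution causal (no peeking at future steps).
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                          # x: (batch, channels, time)
        y = F.pad(x, (self.pad, 0))                # pad only on the left
        y = self.drop(torch.relu(self.conv(y)))
        return x + y                               # residual connection

# Stacking blocks with exponentially growing dilation widens the receptive field.
tcn = nn.Sequential(*[CausalConvBlock(64, dilation=2 ** i) for i in range(4)])
out = tcn(torch.randn(8, 64, 100))                 # same shape in and out
```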

Read more

A Structured Self-attentive Sentence Embedding

Implementation of the paper A Structured Self-Attentive Sentence Embedding, published at ICLR 2017: https://arxiv.org/abs/1703.03130. Usage: for binary sentiment classification on the IMDB dataset, run python classification.py "binary"; for multiclass classification on the Reuters dataset, run python classification.py "multiclass". You can change the model parameters in the model_params.json file. Other training parameters, such as the number of attention hops, can be configured in the config.json file. If you want to use pretrained GloVe embeddings, set the use_embeddings parameter […]
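The paper's core idea is a multi-hop attention matrix A = softmax(W_s2 tanh(W_s1 Hᵀ)) computed over the encoder hidden states H, giving a matrix sentence embedding M = A H. A minimal sketch of that step follows; dimensions and class names are illustrative assumptions, not this repository's code:

```python
import torch
import torch.nn as nn

class StructuredSelfAttention(nn.Module):
    """Multi-hop self-attention over a sequence of hidden states (Lin et al., 2017)."""
    def __init__(self, hidden_dim=300, attn_dim=350, hops=4):
        super().__init__()
        self.ws1 = nn.Linear(hidden_dim, attn_dim, bias=False)   # W_s1
        self.ws2 = nn.Linear(attn_dim, hops, bias=False)         # W_s2

    def forward(self, h):                       # h: (batch, seq_len, hidden_dim)
        a = torch.softmax(self.ws2(torch.tanh(self.ws1(h))), dim=1)  # normalize over seq_len
        m = a.transpose(1, 2) @ h               # (batch, hops, hidden_dim)
        return m, a                             # matrix embedding + attention weights

attn = StructuredSelfAttention()
m, a = attn(torch.randn(2, 40, 300))            # m: (2, 4, 300)
```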

Read more

Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention – 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction, an NMT model that uses two-dimensional convolutions to jointly encode the source and the target sequences. Pervasive Attention also provides an extensive decoding grid that we leverage to efficiently train wait-k models; see the README. Efficient Wait-k Models for Simultaneous Machine Translation: Transformer wait-k models (Ma et al., 2019) with unidirectional encoders and with joint training of multiple wait-k paths. […]
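The central idea of Pervasive Attention is to build a 2D grid whose cell (t, s) holds the concatenated embeddings of target token t and source token s, then run 2D convolutions over that grid and pool over the source axis to predict each target token. A rough sketch of the grid construction under those assumptions (the fork's actual modules differ, and the causal masking along the target axis is omitted here):

```python
import torch
import torch.nn as nn

def build_grid(src_emb, tgt_emb):
    """src_emb: (batch, S, E), tgt_emb: (batch, T, E) -> grid: (batch, 2E, T, S)."""
    B, S, E = src_emb.shape
    T = tgt_emb.size(1)
    src = src_emb.unsqueeze(1).expand(B, T, S, E)   # broadcast source along target axis
    tgt = tgt_emb.unsqueeze(2).expand(B, T, S, E)   # broadcast target along source axis
    return torch.cat([src, tgt], dim=-1).permute(0, 3, 1, 2)  # channels-first for Conv2d

conv = nn.Conv2d(2 * 256, 256, kernel_size=3, padding=1)      # one illustrative 2D conv layer
grid = build_grid(torch.randn(4, 20, 256), torch.randn(4, 15, 256))
features = conv(grid)                                          # (4, 256, 15, 20)
pooled = features.max(dim=-1).values                           # pool over source axis -> (4, 256, 15)
```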

Read more

Trellis Networks for Sequence Modeling

This repository contains the experiments done in the paper Trellis Networks for Sequence Modeling by Shaojie Bai, J. Zico Kolter and Vladlen Koltun. On the one hand, a trellis network is a temporal convolutional network with a special structure, characterized by weight tying across depth and direct injection of the input into deep layers. On the other hand, we show that truncated recurrent networks are equivalent to trellis networks with special sparsity structure in their weight matrices. Thus trellis networks with general […]
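The two properties named above, weight tying across depth and injection of the input at every level, can be pictured as one causal convolution whose weights are reused at every layer and whose input is the raw sequence concatenated with the previous layer's output. This is only an illustrative approximation; the paper's gated activations and other details are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTrellisNet(nn.Module):
    """Illustrative trellis-style stack: one shared conv applied at every depth,
    with the raw input re-injected into each layer."""
    def __init__(self, in_dim, hid_dim, depth=6, kernel_size=2):
        super().__init__()
        self.depth, self.hid_dim = depth, hid_dim
        self.pad = kernel_size - 1
        # A single weight matrix shared (tied) across all `depth` layers.
        self.conv = nn.Conv1d(in_dim + hid_dim, hid_dim, kernel_size)

    def forward(self, x):                       # x: (batch, in_dim, time)
        z = x.new_zeros(x.size(0), self.hid_dim, x.size(2))
        for _ in range(self.depth):             # same weights at every level
            inp = torch.cat([x, z], dim=1)      # input injection at each layer
            inp = F.pad(inp, (self.pad, 0))     # causal (left-only) padding
            z = torch.tanh(self.conv(inp))
        return z

net = TinyTrellisNet(in_dim=32, hid_dim=64)
out = net(torch.randn(8, 32, 100))              # (8, 64, 100)
```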

Read more

GPU-accelerated PyTorch implementation of Zero-shot User Intent Detection via Capsule Neural Networks

This repository implements the capsule model IntentCapsNet-ZSL on the SNIPS-NLU dataset in Python 3 with PyTorch, first introduced in the paper Zero-shot User Intent Detection via Capsule Neural Networks. The code aims to follow PyTorch best practices, using torch instead of NumPy where possible and using .cuda() for GPU computation. Feel free to contribute via pull requests. Congying Xia, Chenwei Zhang, Xiaohui Yan, Yi Chang, Philip S. Yu. Zero-shot User Intent Detection via Capsule Neural Networks. In Proceedings of the […]
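As a generic aside on the practice mentioned above (staying in torch tensors rather than round-tripping through NumPy, and moving work to the GPU when one is available), a small illustration unrelated to this repository's actual code:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Keep the computation in torch tensors, on the chosen device, end to end.
x = torch.randn(128, 64, device=device)
w = torch.randn(64, 10, device=device)
probs = torch.softmax(x @ w, dim=-1)   # no NumPy conversions, GPU-friendly
```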

Read more

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains the code in both PyTorch and TensorFlow for our paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution), preprint 2018. TensorFlow: the source code is in the tf/ folder, supporting (1) single-node multi-GPU training and (2) multi-host TPU training. Besides the source code, we also provide pretrained TensorFlow models with state-of-the-art (SoTA) performance reported in the paper. Please refer to tf/README.md […]
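Transformer-XL extends the attention context beyond a fixed segment length by caching hidden states from the previous segment and letting the current segment attend over the cached plus fresh states, combined with relative positional encodings. The following is a stripped-down sketch of the memory-caching step only, using a stock attention module rather than the repository's implementation:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)

def step_with_memory(segment, memory):
    """segment: (batch, seg_len, d); memory: cached states from the previous segment."""
    # Keys/values cover the cached context plus the current segment.
    context = segment if memory is None else torch.cat([memory, segment], dim=1)
    out, _ = attn(query=segment, key=context, value=context)
    # New memory is the current segment's states, detached (no backprop into the past).
    return out, segment.detach()

memory = None
for seg in torch.randn(3, 2, 64, 256):   # three consecutive segments of length 64
    out, memory = step_with_memory(seg, memory)
```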

Read more