Compact Bilinear Pooling for PyTorch

This repository contains a pure Python implementation of Compact Bilinear Pooling and Count Sketch for PyTorch. This version relies on the FFT implementation provided with PyTorch 0.4.0 onward; for older versions of PyTorch, use the tag v0.3.0. Installation: run the setup.py, for instance. Usage: class compact_bilinear_pooling.CompactBilinearPooling(input1_size, input2_size, output_size, h1=None, s1=None, h2=None, s2=None). Basic usage: from compact_bilinear_pooling import CountSketch, CompactBilinearPooling; input_size = 2048; output_size = 16000; mcb = CompactBilinearPooling(input_size, input_size, output_size).cuda(); x = […]
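The excerpt cuts off before the example inputs; a minimal sketch of how the layer might be called, assuming it combines two feature tensors whose last dimension matches input1_size and input2_size (the tensor shapes below are illustrative, not taken from the README):

```python
import torch
from compact_bilinear_pooling import CompactBilinearPooling

input_size = 2048
output_size = 16000

# Combine two feature vectors of the same channel size into one compact descriptor.
mcb = CompactBilinearPooling(input_size, input_size, output_size).cuda()

# Hypothetical inputs: a batch of feature vectors with `input_size` channels
# in the last dimension (shapes chosen only for illustration).
x = torch.rand(4, input_size).cuda()
y = torch.rand(4, input_size).cuda()

z = mcb(x, y)   # -> (4, output_size) compact bilinear features
```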

Read more

A Pytorch Implementation of the Transformer: Attention Is All You Need

Our implementation is largely based on the Tensorflow implementation. Requirements. Why This Project? I'm new to PyTorch, so I have been implementing some projects with it. Recently I read the paper Attention Is All You Need and was impressed by the idea. So that's it. I got results similar to the original TensorFlow implementation. Differences with the original paper: I don't intend to replicate the paper exactly; rather, I aim to implement the main ideas in the paper and verify […]
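The core operation the paper introduces is scaled dot-product attention; a self-contained sketch of that computation (illustrative only, not this repository's code, with made-up tensor shapes):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as in the paper."""
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from the softmax.
        scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, v), weights

# Example: batch of 2 sequences, length 5, model dimension 64.
q = k = v = torch.rand(2, 5, 64)
out, attn = scaled_dot_product_attention(q, k, v)   # out: (2, 5, 64)
```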

Read more

Sequence modeling benchmarks and temporal convolutional networks

This repository contains the experiments done in the work An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling by Shaojie Bai, J. Zico Kolter and Vladlen Koltun. We specifically target a comprehensive set of tasks that have been repeatedly used to compare the effectiveness of different recurrent networks, and evaluate a simple, generic but powerful (purely) convolutional network on the recurrent nets’ home turf. Experiments are done in PyTorch. If you find this repository helpful, please cite […]
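The building block of such a purely convolutional sequence model is a dilated causal convolution, in which the output at time t depends only on inputs at or before t. A minimal sketch of that idea (not the repository's own TemporalBlock, just an illustration with made-up dimensions):

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution that is causal: output at time t sees only inputs <= t."""
    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        # Pad on both sides, then trim the right so the layer never looks ahead.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, time)
        out = self.conv(x)
        return out[:, :, :-self.pad] if self.pad > 0 else out

# Stacking such layers with exponentially growing dilation yields a large
# receptive field, which is the basic recipe behind a TCN.
layer = CausalConv1d(in_channels=8, out_channels=16, kernel_size=3, dilation=2)
y = layer(torch.rand(4, 8, 100))               # y: (4, 16, 100)
```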

Read more

A Structured Self-attentive Sentence Embedding

Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR 2017: https://arxiv.org/abs/1703.03130 . USAGE: For binary sentiment classification on the IMDB dataset, run: python classification.py "binary". For multiclass classification on the Reuters dataset, run: python classification.py "multiclass". You can change the model parameters in the model_params.json file. Other training parameters, such as the number of attention hops, can be configured in the config.json file. If you want to use pretrained GloVe embeddings, set the use_embeddings parameter […]
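The paper's central operation is the structured self-attention A = softmax(W_s2 tanh(W_s1 H^T)) over the LSTM hidden states H, giving r attention "hops" and a sentence embedding M = A H. A rough sketch of that computation under those definitions (class and parameter names are mine, not the repository's; d_a and the hop count are illustrative defaults):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructuredSelfAttention(nn.Module):
    """A = softmax(W_s2 tanh(W_s1 H^T)); M = A H, with r attention hops."""
    def __init__(self, hidden_dim, d_a=350, hops=4):
        super().__init__()
        self.w_s1 = nn.Linear(hidden_dim, d_a, bias=False)
        self.w_s2 = nn.Linear(d_a, hops, bias=False)

    def forward(self, H):                       # H: (batch, seq_len, hidden_dim)
        # Softmax over the sequence dimension: one distribution per hop.
        A = F.softmax(self.w_s2(torch.tanh(self.w_s1(H))), dim=1)  # (batch, seq_len, hops)
        M = torch.bmm(A.transpose(1, 2), H)     # (batch, hops, hidden_dim)
        return M, A

attn = StructuredSelfAttention(hidden_dim=600, d_a=350, hops=4)
M, A = attn(torch.rand(8, 40, 600))             # M: (8, 4, 600)
```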

Read more

Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention – 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction: an NMT model with two-dimensional convolutions that jointly encode the source and the target sequences. Pervasive Attention also provides an extensive decoding grid that we leverage to efficiently train wait-k models. See README. Efficient Wait-k Models for Simultaneous Machine Translation: Transformer wait-k models (Ma et al., 2019) with unidirectional encoders and joint training of multiple wait-k paths. […]
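A rough sketch of the joint encoding idea behind Pervasive Attention: source and target embeddings are expanded into a 2-D grid and processed with 2-D convolutions, so every layer can relate each target position to every source position (this is an illustration of the concept, not the fork's actual code; shapes and dimensions are made up):

```python
import torch
import torch.nn as nn

def build_joint_grid(src_emb, tgt_emb):
    """src_emb: (B, S, D), tgt_emb: (B, T, D) -> grid: (B, 2D, T, S)."""
    B, S, D = src_emb.shape
    T = tgt_emb.size(1)
    src = src_emb.unsqueeze(1).expand(B, T, S, D)   # broadcast over target steps
    tgt = tgt_emb.unsqueeze(2).expand(B, T, S, D)   # broadcast over source steps
    return torch.cat([src, tgt], dim=-1).permute(0, 3, 1, 2)

# A single 2-D convolution over the (target x source) grid.
conv = nn.Conv2d(in_channels=2 * 256, out_channels=256, kernel_size=3, padding=1)
grid = build_joint_grid(torch.rand(4, 20, 256), torch.rand(4, 15, 256))
features = conv(grid)                                # (4, 256, 15, 20)
```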

Read more

Trellis Networks for Sequence Modeling

This repository contains the experiments done in the paper Trellis Networks for Sequence Modeling by Shaojie Bai, J. Zico Kolter and Vladlen Koltun. On the one hand, a trellis network is a temporal convolutional network with special structure, characterized by weight tying across depth and direct injection of the input into deep layers. On the other hand, we show that truncated recurrent networks are equivalent to trellis networks with special sparsity structure in their weight matrices. Thus trellis networks with general […]
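A rough sketch of the two structural properties the paragraph mentions, weight tying across depth and input injection at every layer (this only illustrates the idea; the paper's actual cell also uses an LSTM-style gated activation, and all names and sizes here are made up):

```python
import torch
import torch.nn as nn

class TrellisLayerSketch(nn.Module):
    """One shared layer applied repeatedly: every level re-reads the raw input x
    (input injection) and reuses the same convolution weights (weight tying)."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # Causal convolution over [hidden state; injected input] at each level.
        self.conv = nn.Conv1d(hidden_dim + input_dim, hidden_dim,
                              kernel_size=2, padding=1)

    def forward(self, x, depth):                 # x: (batch, input_dim, time)
        z = torch.zeros(x.size(0), self.conv.out_channels, x.size(2),
                        device=x.device)
        for _ in range(depth):                   # same weights at every depth
            h = torch.cat([z, x], dim=1)         # inject the input at this level
            z = torch.tanh(self.conv(h)[:, :, :-1])  # trim padding to stay causal
        return z

net = TrellisLayerSketch(input_dim=16, hidden_dim=32)
out = net(torch.rand(4, 16, 50), depth=6)        # out: (4, 32, 50)
```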

Read more