August 5, 2021 Audio

Neural speaker diarization with pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, and speaker embedding. pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pretrained models covering a wide range of domains for voice activity detection, speaker change detection, […]

August 5, 2021 Task

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

Sandeep Subramanian, Adam Trischler, Yoshua Bengio & Christopher Pal. ICLR 2018. About: GenSen is a technique to learn general-purpose, fixed-length representations of sentences via multi-task training. These representations are useful for transfer and low-resource learning. For details, please refer to our ICLR paper. Code: We provide a PyTorch implementation of our paper along with pre-trained models as well as code to evaluate these models on a variety of […]

August 5, 2021 Speech Recognition

ESPnet: end-to-end speech processing toolkit

ESPnet is an end-to-end speech processing toolkit that mainly focuses on end-to-end speech recognition and end-to-end text-to-speech. ESPnet uses Chainer and PyTorch as its main deep learning engines, and also follows Kaldi-style data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. Key features: Kaldi-style complete recipes; support for a number of ASR recipes (WSJ, Switchboard, CHiME-4/5, Librispeech, TED, CSJ, AMI, HKUST, Voxforge, REVERB, etc.); support for a number of TTS recipes with […]

August 5, 2021 Tool

A toolkit for validating, forging, scanning and tampering JWTs

jwt_tool.py is a toolkit for validating, forging, scanning and tampering JWTs (JSON Web Tokens). Its functionality includes: checking the validity of a token; testing for known exploits, namely the alg=none signature-bypass vulnerability (CVE-2015-2951), the RS/HS256 public key mismatch vulnerability (CVE-2016-10555), the key injection vulnerability (CVE-2018-0114), the blank password vulnerability (CVE-2019-20933/CVE-2020-28637), and the null signature vulnerability (CVE-2020-28042); scanning for misconfigurations or known weaknesses; fuzzing claim values to provoke unexpected behaviours; testing the validity of a secret/key file/Public Key/JWKS key; identifying weak keys via a High-speed Dictionary […]
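To make the alg=none item concrete, here is a minimal sketch of how a token's header can be inspected for an unsigned algorithm. It uses only the Python standard library and is not jwt_tool's own code; the helper names (`b64url_decode`, `b64url_encode`, `jwt_header`, `is_unsigned`) and the demo tokens are hypothetical, chosen for illustration.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_header(token: str) -> dict:
    """Return the decoded header (first dot-separated segment) of a JWT."""
    header_segment = token.split(".")[0]
    return json.loads(b64url_decode(header_segment))


def is_unsigned(token: str) -> bool:
    """True if the header declares alg=none (case-insensitive)."""
    alg = str(jwt_header(token).get("alg", "")).lower()
    return alg == "none"


def make_demo_token(header: dict, payload: dict, signature: str = "") -> str:
    """Assemble a demo token; the signature is a placeholder, not a real MAC."""
    head = b64url_encode(json.dumps(header).encode("utf-8"))
    body = b64url_encode(json.dumps(payload).encode("utf-8"))
    return f"{head}.{body}.{signature}"


# Hypothetical demo tokens (no real keys or signatures involved).
unsigned_token = make_demo_token({"alg": "none", "typ": "JWT"}, {"sub": "demo"})
signed_like_token = make_demo_token(
    {"alg": "HS256", "typ": "JWT"}, {"sub": "demo"}, signature="placeholder"
)

print(is_unsigned(unsigned_token))
print(is_unsigned(signed_like_token))
```

A verifier that rejects any token for which `is_unsigned` returns true, before doing anything else, closes off this particular bypass; the other checks listed above need real signature verification and are outside the scope of this sketch.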

August 5, 2021 Beginner, Machine Learning, NLP, Project, Python, Text

Identifying The Language of A Document Using NLP!

This article was published as a part of the Data Science Blogathon. Introduction: The goal of this article is to identify the language of a written text. Documents are available in many languages, and when we don’t know which language a text is written in, it can be difficult even to tell Google Translate what to translate from. For most translators, we have to specify both the input language and the desired language. If you had a text written in Spanish and you […]

August 5, 2021 Python

NumPy views: saving memory, leaking memory, and subtle bugs

If you’re using Python’s NumPy library, it’s usually because you’re processing large arrays that use plenty of memory. To reduce your memory usage, chances are you want to minimize unnecessary copying. NumPy has a built-in feature that does this transparently in many common cases: memory views. However, this feature can also cause higher memory usage by preventing arrays from being garbage collected. And in some cases it can cause bugs, with data being mutated in unexpected ways. To avoid these […]
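As a rough illustration of the behaviour described in this excerpt, the following sketch shows a slice acting as a view, a write through that view mutating the original array, and a small view keeping a large array reachable. The variable names are illustrative and do not come from the article.

```python
import numpy as np

# Basic slicing returns a view: no data is copied.
arr = np.arange(10)
view = arr[2:5]
print(np.shares_memory(arr, view))  # True: both use the same buffer

# Subtle bug: writing through the view also changes the original array.
view[0] = 99
print(arr[2])  # 99

# An explicit copy is independent of the original.
independent = arr[2:5].copy()
independent[0] = -1
print(arr[2])  # still 99

# Memory retention: a tiny view holds a reference to the whole parent array,
# so the large buffer cannot be freed while the view is alive.
big = np.zeros(1_000_000)
small = big[:10]
print(small.base is big)  # True: `small` keeps `big` reachable

# Copying the slice drops the reference to the large buffer.
small_copy = big[:10].copy()
print(small_copy.base is None)  # True: the copy owns its own data
```

Whether to copy is a trade-off: a copy costs memory up front but removes both the aliasing and the hidden reference to the parent array.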

August 4, 2021 Graph

Continuous Query Decomposition for Complex Query Answering in Incomplete Knowledge Graphs

Continuous Query Decomposition for Complex Query Answering in Incomplete Knowledge Graphs. Update: We implemented CQD in the KGReasoning framework, a library from SNAP implementing several Complex Query Answering models, which also supports experimenting with the Query2Box and BetaE datasets (in this repo, we only consider the former). Our implementation is available at this link. This repository contains the official implementation for our ICLR 2021 (Oral, Outstanding Paper Award) paper, Complex Query Answering with Neural Link Predictors: @inproceedings{ […]

August 4, 2021 Attack

A Python framework for adversarial attacks and model training in NLP

TextAttack is a Python framework for adversarial attacks, data augmentation, and model training in NLP. If you’re looking for information about TextAttack’s menagerie of pre-trained models, you might want the TextAttack Model Zoo page. Slack Channel: For help and realtime updates related to TextAttack, please join the TextAttack Slack! Why TextAttack? There are lots of reasons to use TextAttack: Understand NLP models better by running different adversarial attacks on them and examining the output; Research and develop different NLP […]

August 4, 2021 Network

Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing

R²SQL: the PyTorch implementation of the paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing (AAAI 2021). Requirements: The model is tested in Python 3.6 with the following requirements: torch==1.0.0, transformers==2.10.0, sqlparse, pymysql, progressbar, nltk, numpy, six, spacy. All experiments on the SParC and CoSQL datasets were run on an NVIDIA V100 GPU with 32GB of GPU memory. Tips: A GPU with 16GB of memory may run into out-of-memory errors. Setup: The SParC and CoSQL experiments are in two different folders; you need to download different datasets from […]

August 4, 2021 Generator

Procedural 3D data generation pipeline for architecture in python

Synthetic Dataset Generator: This is a tool that generates a dataset of synthetic buildings of different typologies. The generated data includes: mesh files of generated buildings (.obj format); rendered images of the mesh (.png format); rendered segmentation masks (.png format); depth annotation (.png and .exr format); surface normals annotation (.png format); point cloud files (.ply format; the number of points by default is 2048 and can be changed in dataset_config.py). How To Use: Install Blender>=2.90. After installation make sure to add […]
