A2T: Towards Improving Adversarial Training of NLP Models

This is the source code for the EMNLP 2021 (Findings) paper “Towards Improving Adversarial Training of NLP Models”. If you use the code, please cite the paper:

@misc{yoo2021improving,
  title={Towards Improving Adversarial Training of NLP Models},
  author={Jin Yong Yoo and Yanjun Qi},
  year={2021},
  eprint={2109.00544},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Prerequisites: The work relies heavily on the TextAttack package; in fact, the main training code is implemented in the TextAttack package itself. Required packages are listed in the requirements.txt file:

pip install -r requirements.txt   […]

GreynirCorrect: Spelling and grammar correction for Icelandic

Overview: GreynirCorrect is a Python 3 (>= 3.6) package and command-line tool for checking and correcting spelling and grammar in Icelandic text. It relies on the Greynir package, by the same authors, to tokenize and parse text, and is documented in detail in its online documentation. The software has three main modes of operation, described below.

Token-level correction: GreynirCorrect can tokenize text and return an automatically corrected token stream. This catches token-level errors, such as […]
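
The idea behind token-level correction (tokenize, then replace unknown tokens with a close dictionary entry) can be pictured with the toy sketch below. It is not GreynirCorrect's algorithm or API: it assumes a hypothetical eight-word lexicon, a crude regex tokenizer in place of the Greynir tokenizer, and Python's difflib similarity ratio as the "closeness" measure.

```python
import difflib
import re
from typing import List

# Hypothetical toy lexicon; a real corrector needs a full Icelandic vocabulary.
LEXICON = ["hestur", "hestar", "kona", "maður", "fór", "heim", "í", "gær"]


def tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def correct_tokens(tokens: List[str], lexicon: List[str], cutoff: float = 0.8) -> List[str]:
    """Replace each unknown word token with its closest lexicon entry, if one is close enough."""
    corrected = []
    for tok in tokens:
        if not tok.isalpha() or tok.lower() in lexicon:
            corrected.append(tok)  # punctuation or known word: keep unchanged
            continue
        matches = difflib.get_close_matches(tok.lower(), lexicon, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else tok)  # no close match: keep original
    return corrected


print(correct_tokens(tokenize("Maður fór heimm í gær."), LEXICON))
# → ['Maður', 'fór', 'heim', 'í', 'gær', '.']
```

The cutoff trades recall for safety: a lower value corrects more typos but also "corrects" rare words that are actually valid.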

Flexible Generation of Natural Language Deductions

a.k.a. ParaPattern (https://arxiv.org/abs/2104.08825), by Kaj Bostrom, Lucy Zhao, Swarat Chaudhuri, and Greg Durrett.

This repository contains all the code needed to replicate the experiments from the paper, and additionally provides a set of tools for building new natural language deduction operations from scratch. The data/ folder contains all the data used to train and evaluate our models, already preprocessed and ready to go, with the exception of the MNLI dataset, which is left out because of its size; if you want […]

Examples of Askdata usage in serving different types of data

This repository contains examples of Askdata usage in serving different types of data.

Installation:
pip install askdata
or
pip install -r requirements.txt

Authentication: let's handle authentication first.
from askdata import Askdata
askdata = Askdata()
Once you enter your account and password, you're all set.

Query your data:
# Load the list of the agents connected to your account as a pandas dataframe
get_agents_df = askdata.agents_dataframe()
# Get one agent
agent = askdata.agent("sales_demo")
# Simple query
df = agent.ask('give me sales by countries')
[…]

Topic modeling on unstructured data in Space news articles retrieved from the Guardian

NLP Space News Topic Modeling: topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using its API.

Project Idea / Project Overview: This project aims to learn the topics covered in Space news published by the Guardian (UK).

Motivation: The model/tool would give an idea of which Space news topics matter to each publication over time. For example, a space mission led by the European Space Agency (ESA) might be more relevant/important to the Guardian than […]

A recurrent unit that can run over 10 times faster than cuDNN LSTM

SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, without loss of accuracy on the many tasks it was tested on.

[Figure: Average processing time of LSTM, conv2d and SRU, tested on GTX 1070]

For example, the figure above presents the processing time of a single mini-batch of 32 samples. SRU achieves a 10 to 16 times speed-up compared to LSTM, and operates as fast as (or faster than) word-level convolution using conv2d.

Reference: Simple Recurrent Units for Highly […]
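
Much of that speed-up comes from the shape of the recurrence: the matrix multiplications depend only on the input x_t, so they can be computed for all time steps at once, leaving only cheap element-wise operations in the sequential loop. The plain-Python sketch below illustrates that split, roughly following the recurrence described in the SRU paper (forget gate f_t and reset gate r_t with element-wise c_{t-1} terms, plus a highway connection to x_t). It is not the sru package's API or its CUDA kernel, and the weights in the usage line are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sru_layer(xs: List[Vector], w: Matrix, w_f: Matrix, w_r: Matrix,
              v_f: Vector, v_r: Vector, b_f: Vector, b_r: Vector) -> List[Vector]:
    """Run one SRU-style layer over a sequence; input and hidden size are both d."""
    d = len(xs[0])
    # Phase 1 (parallelizable): every matrix product depends only on x_t,
    # so all time steps can be projected up front.
    proj = [(matvec(w, x), matvec(w_f, x), matvec(w_r, x)) for x in xs]

    # Phase 2 (sequential): only element-wise operations involve c_{t-1}.
    c = [0.0] * d
    hs = []
    for x, (wx, fx, rx) in zip(xs, proj):
        f = [sigmoid(fx[i] + v_f[i] * c[i] + b_f[i]) for i in range(d)]
        r = [sigmoid(rx[i] + v_r[i] * c[i] + b_r[i]) for i in range(d)]
        c = [f[i] * c[i] + (1.0 - f[i]) * wx[i] for i in range(d)]
        # Highway connection: mix the new cell state with the raw input.
        hs.append([r[i] * c[i] + (1.0 - r[i]) * x[i] for i in range(d)])
    return hs


zeros = [[0.0, 0.0], [0.0, 0.0]]
zv = [0.0, 0.0]
# With all-zero weights both gates are 0.5 and c stays 0, so h_t = 0.5 * x_t.
print(sru_layer([[1.0, 2.0], [3.0, -1.0]], zeros, zeros, zeros, zv, zv, zv, zv))
# → [[0.5, 1.0], [1.5, -0.5]]
```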

Korean Simple Contrastive Learning of Sentence Embeddings implementation using PyTorch

KoSimCSE: Korean Simple Contrastive Learning of Sentence Embeddings, implemented using PyTorch.

Installation:
git clone https://github.com/BM-K/KoSimCSE.git
cd KoSimCSE
git clone https://github.com/SKTBrain/KoBERT.git
cd KoBERT
pip install -r requirements.txt
pip install .
cd ..
pip install -r requirements.txt

Training (supervised only):
bash run_example.sh

Pre-trained models: using the BERT [CLS] token representation. Pre-trained model checkpoint.

Performance:
Model | Cosine Pearson | Cosine Spearman | Euclidean Pearson | Euclidean Spearman | Manhattan Pearson | Manhattan Spearman | Dot Pearson | Dot Spearman
KoSBERT_SKT* | 78.81 | 78.47 | 77.68 | 77.78 | 77.71 | 77.83 | 75.75 | 75.22
KoSimCSE_SKT […]
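
Contrastive training of this kind pulls each sentence embedding toward its positive pair and away from the other candidates in the batch. A minimal sketch of that in-batch contrastive (InfoNCE) objective with cosine similarity, assuming the embeddings (for example [CLS] vectors) are already computed; the optional hard negatives mirror the supervised setting, and the temperature of 0.05 is an illustrative assumption. This is not the KoSimCSE training code.

```python
import math
from typing import List, Optional

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrastive_loss(anchors: List[Vector], positives: List[Vector],
                     hard_negatives: Optional[List[Vector]] = None,
                     temperature: float = 0.05) -> float:
    """Mean InfoNCE loss: anchor i should pick positive i out of all batch candidates."""
    candidates = positives + (hard_negatives or [])
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


e1, e2 = [1.0, 0.0], [0.0, 1.0]
print(contrastive_loss([e1, e2], [e1, e2]))  # near zero: each anchor matches its own positive
print(contrastive_loss([e1, e2], [e2, e1]))  # large: positives are swapped
```

A small temperature sharpens the softmax over similarities, so the loss punishes even slightly mis-ranked candidates heavily.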

Dice Loss for NLP Tasks with Python

This repository contains code for "Dice Loss for Data-imbalanced NLP Tasks" (ACL 2020).

Setup

Install Package Dependencies: The code was tested with Python 3.6.9+ and PyTorch 1.7.1. If you are working on an Ubuntu GPU machine with CUDA 10.1, run the following commands to set up the environment:
$ virtualenv -p /usr/bin/python3.6 venv
$ source venv/bin/activate
$ pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install -r requirements.txt

Download BERT Model Checkpoints: Before running the repo […]
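
Why a Dice-style loss helps with data imbalance can be seen from a generic soft Dice loss for binary labels: easy negatives (prediction near 0, label 0) add almost nothing to either the overlap term or the denominator, so they cannot swamp the few positives. The sketch below is that generic form with an assumed smoothing constant of 1.0; the paper and repository define their own variants, which this simplified version does not reproduce.

```python
from typing import Sequence


def soft_dice_loss(probs: Sequence[float], labels: Sequence[int], smooth: float = 1.0) -> float:
    """1 - soft Dice coefficient between positive-class probabilities and 0/1 labels."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    intersection = sum(p * y for p, y in zip(probs, labels))
    denominator = sum(probs) + sum(labels)
    # The smoothing term keeps the ratio defined when a batch has no positives.
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - dice


print(soft_dice_loss([1.0, 0.0, 0.0, 0.0], [1, 0, 0, 0]))  # → 0.0 (perfect prediction)
print(soft_dice_loss([0.0, 0.0, 0.0, 0.0], [1, 0, 0, 0]))  # → 0.5 (missed the only positive)
```

Appending any number of easy negatives with probability 0.0 leaves the loss unchanged, whereas a per-example cross-entropy average would be diluted by them.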

One Stop Anomaly Shop with Python

One Stop Anomaly Shop (OSAS): This repository implements the models, methods and techniques presented in our paper: A Principled Approach to Enriching Security-related Data for Running Processes through Statistics and Natural Language Processing.

One Stop Anomaly Shop performs anomaly detection using a two-phase approach: (a) pre-labeling using statistics, Natural Language Processing and static rules; (b) anomaly scoring using supervised and unsupervised machine learning.

Introduction video (follows the quick start guide): This video is a recording of our Hack In The Box (HITB) Security […]
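
The two-phase design can be illustrated with a toy pipeline: phase (a) turns raw event fields into discrete labels using a statistic and a static rule, and phase (b) assigns an unsupervised score based on how rare each event's label set is. Everything here (the field names bytes_sent, parent and process, the z-score threshold of 2, the web-shell rule, and the negative log-frequency score) is a hypothetical stand-in, not OSAS's configuration, label generators or models.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List

Event = Dict[str, object]


def pre_label(events: List[Event], z_threshold: float = 2.0) -> List[List[str]]:
    """Phase (a): attach discrete labels using a z-score statistic and a static rule."""
    counts = [float(e["bytes_sent"]) for e in events]
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts) or 1.0  # avoid division by zero on constant data
    labelled = []
    for e, c in zip(events, counts):
        z = (c - mean) / stdev
        labels = ["BYTES_HIGH" if z > z_threshold else "BYTES_NORMAL"]
        # Hypothetical static rule: a shell spawned by a web server is suspicious.
        if e["parent"] == "httpd" and e["process"] in {"sh", "bash"}:
            labels.append("WEB_SHELL")
        labelled.append(sorted(labels))
    return labelled


def rarity_scores(labelled: List[List[str]]) -> List[float]:
    """Phase (b): unsupervised score = negative log-frequency of each event's label set."""
    freq = Counter(tuple(labels) for labels in labelled)
    n = len(labelled)
    return [-math.log(freq[tuple(labels)] / n) for labels in labelled]


events = [{"bytes_sent": 100, "parent": "init", "process": "cron"} for _ in range(9)]
events.append({"bytes_sent": 10000, "parent": "httpd", "process": "sh"})
scores = rarity_scores(pre_label(events))
print(max(range(len(scores)), key=scores.__getitem__))  # → 9 (the web-shell event)
```

Separating labeling from scoring means the scorer only ever sees interpretable labels, so a high score can be explained by listing the labels that made the event rare.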

A sentence embeddings method that provides semantic representations

InferSent is a sentence embeddings method that provides semantic representations for English sentences. It is trained on natural language inference data and generalizes well to many different tasks. We provide our pre-trained English sentence encoder from our paper and our SentEval evaluation toolkit.

Recent changes: Removed train_nli.py and kept only the pretrained models for simplicity, because I no longer have time to maintain the repo beyond simple scripts for getting sentence embeddings.

Dependencies: This code is written in […]
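
Training on natural language inference works by encoding the premise and hypothesis separately into vectors u and v, then feeding a classifier the combination (u, v, |u - v|, u * v). The sketch below shows only the max-pooling and feature-combination steps in plain Python, assuming hypothetical toy per-token states in place of the BiLSTM hidden states the paper uses; it is not the InferSent repository's API, and the classifier itself is omitted.

```python
from typing import List

Vector = List[float]


def max_pool(states: List[Vector]) -> Vector:
    """Sentence vector = element-wise max over per-token hidden states."""
    return [max(dim) for dim in zip(*states)]


def nli_features(u: Vector, v: Vector) -> Vector:
    """Combine two sentence vectors as [u, v, |u - v|, u * v] (length 4d)."""
    return (u + v
            + [abs(a - b) for a, b in zip(u, v)]
            + [a * b for a, b in zip(u, v)])


premise_states = [[0.1, 0.7, -0.2], [0.5, 0.3, 0.0]]  # hypothetical token states
hypothesis_states = [[0.4, -0.1, 0.2]]
u = max_pool(premise_states)
v = max_pool(hypothesis_states)
print(nli_features(u, v))
```

Because the classifier only sees this fixed combination, the encoder is pushed to put the semantic information into u and v themselves, which is what makes the vectors reusable for other tasks.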
