A software toolkit for weak supervision applied to NLP tasks

skweak Labelled data remains a scarce resource in many practical NLP scenarios. This is especially the case when working with resource-poor languages (or text domains), or when using task-specific labels without pre-existing datasets. The only available option is often to collect and annotate texts by hand, which is expensive and time-consuming. skweak (pronounced /skwi:k/) is a Python-based software toolkit that provides a concrete solution to this problem using weak supervision. skweak is built around a very simple idea: Instead of […]
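At the time of writing, skweak's README demonstrates this idea with labelling functions over spaCy docs whose outputs are aggregated by a generative model. A condensed sketch of that pattern (module and class names follow the README of the version current at publication and may have moved in later releases):

```python
import spacy
from skweak import heuristics, aggregation

# Labelling function: tag "<currency symbol><number>" token pairs as MONEY
def money_detector(doc):
    for tok in doc[1:]:
        if tok.text[0].isdigit() and tok.nbor(-1).is_currency:
            yield tok.i - 1, tok.i + 1, "MONEY"

lf = heuristics.FunctionAnnotator("money", money_detector)

nlp = spacy.load("en_core_web_sm")
doc = lf(nlp("Donald Trump paid $750 in federal income taxes in 2016"))

# Aggregate the (typically many, possibly conflicting) weak labels
hmm = aggregation.HMM("hmm", ["MONEY"])
hmm.fit_and_aggregate([doc])
```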

Read more

OpenAI CLIP text encoders for any language

Multilingual-CLIP OpenAI CLIP text encoders for any language. OpenAI recently released the paper Learning Transferable Visual Models From Natural Language Supervision, in which they present the CLIP (Contrastive Language–Image Pre-training) model. This model is trained to connect text and images by matching their corresponding vector representations using a contrastive learning objective. CLIP consists of two separate models, a visual encoder and a text encoder. These were trained on a whopping 400 million images and corresponding captions. OpenAI has since released […]
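The matching that CLIP is trained for reduces, at inference time, to cosine similarity between L2-normalised embeddings. A minimal sketch of that retrieval step (the function name and array shapes are illustrative assumptions, not part of either CLIP release):

```python
import numpy as np

def clip_style_match(image_embs: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """Cosine-similarity matrix between image and text embeddings:
    entry (i, j) scores how well caption j describes image i."""
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return image_embs @ text_embs.T

# Hypothetical usage: 3 images and 3 captions with 512-dim embeddings
rng = np.random.default_rng(0)
image_embs, text_embs = rng.normal(size=(3, 512)), rng.normal(size=(3, 512))
best_caption_per_image = clip_style_match(image_embs, text_embs).argmax(axis=1)
```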

Read more

A research-oriented benchmarking framework for advancing federated learning

FedNLP FedNLP is a research-oriented benchmarking framework for advancing federated learning (FL) in natural language processing (NLP). It uses the FedML repository as a git submodule. In other words, FedNLP focuses on advanced models and datasets, while FedML supports various federated optimizers (e.g., FedAvg) and platforms (Distributed Computing, IoT/Mobile, Standalone). The figure below shows the overall structure of FedNLP. Installation After git clone-ing this repository, please run the following command to install the dependencies: conda create -n fednlp python=3.7 […]
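FedAvg, the optimizer named above, builds the global model by averaging each client's parameters weighted by its local dataset size. A minimal sketch of that aggregation rule (the function and its dict-of-arrays representation are illustrative, not FedNLP's or FedML's actual code):

```python
from typing import Dict, List
import numpy as np

def fedavg(client_weights: List[Dict[str, np.ndarray]],
           client_sizes: List[int]) -> Dict[str, np.ndarray]:
    """Average each parameter tensor across clients, weighted by the
    number of local training examples each client holds."""
    total = sum(client_sizes)
    return {
        name: sum((n / total) * w[name]
                  for n, w in zip(client_sizes, client_weights))
        for name in client_weights[0]
    }
```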

Read more

Transfer learning for NLP models by annotating your textual data

Label Studio for Transformers Transfer learning for NLP models by annotating your textual data without any additional coding. This package provides a ready-to-use container that links together: Quick Usage
- Install Label Studio and the other dependencies: pip install -r requirements.txt
- Create an ML backend with the BERT classifier: label-studio-ml init my-ml-backend --script models/bert_classifier.py, then cp models/utils.py my-ml-backend/utils.py
- Create an ML backend with the BERT named entity recognizer: label-studio-ml init my-ml-backend --script models/ner.py, then cp models/utils.py my-ml-backend/utils.py
- Start the ML backend at http://localhost:9090: label-studio-ml start my-ml-backend
- Start Label Studio […]
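The models/bert_classifier.py and models/ner.py scripts referenced above plug into Label Studio by subclassing the package's model base class. A skeletal sketch of that interface, assuming label_studio_ml's LabelStudioMLBase; the class name, labeling-config names, and fixed prediction are illustrative placeholders:

```python
from label_studio_ml.model import LabelStudioMLBase

class SketchClassifier(LabelStudioMLBase):
    """Hypothetical minimal backend; a real one (like bert_classifier.py)
    would run a Transformer model in predict() and retrain it in fit()."""

    def predict(self, tasks, **kwargs):
        # One prediction per Label Studio task, in Label Studio's result
        # format; "label"/"text" and the fixed choice are placeholders.
        return [{
            "result": [{
                "from_name": "label",
                "to_name": "text",
                "type": "choices",
                "value": {"choices": ["Positive"]},
            }],
            "score": 0.5,
        } for _ in tasks]

    def fit(self, completions, **kwargs):
        # Train on newly annotated data; whatever is returned here is
        # handed back to the next predict() call as extra state.
        return {"checkpoint": "latest"}
```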

Read more

Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing

UT (Universal Transformation)
- AppendIrr: extends sentences by appending irrelevant sentences.
- BackTrans: (Trans is short for translation) replaces test data with paraphrases obtained by back translation, which helps figure out whether the target models merely capture literal features instead of semantic meaning.
- Contraction: replaces phrases like `will not` and `he has` with contracted forms, namely `won't` and `he's` (see the sketch after this list).
- InsertAdv: transforms an input by adding an adverb before the verb.
- Keyboard: turns to […]
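Transformations of this family are essentially meaning-preserving string rewrites. A toy re-implementation of the Contraction idea (the phrase table and function name are illustrative, not the toolkit's own code):

```python
import re

# Small sample phrase table; the real transformation covers many more pairs.
CONTRACTIONS = {"will not": "won't", "he has": "he's", "do not": "don't"}

def contract(sentence: str) -> str:
    """Rewrite full phrases into contracted forms; a robust model should
    make the same prediction on the original and transformed sentence."""
    for full, short in CONTRACTIONS.items():
        sentence = re.sub(rf"\b{full}\b", short, sentence)
    return sentence

assert contract("He will not attend") == "He won't attend"
```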

Read more

Simple pure function representations of popular time series packages

timemachines State machines for time series. (Use popular packages with one line of code.) What's different: simple canonical use of some functionality from packages like fbprophet, pmdarima, tsa and their ilk; simple k-step-ahead forecasts in a functional style involving one line of code. Time-series "models" are synonymous with functions that have a "skater" signature, facilitating "skating". One might say that skater functions suggest state machines for the sequential assimilation of observations (as a data point arrives, forecasts for 1, 2, …, k steps ahead, with corresponding standard […]
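Concretely, a skater is a pure function that takes the latest observation plus its own prior state and returns forecasts, their standard deviations, and the updated state. A toy skater with that signature (the last-value logic and all names here are illustrative, not one of the package's wrapped models):

```python
from typing import List, Optional, Tuple

def last_value_skater(y: float, s: Optional[dict] = None,
                      k: int = 1) -> Tuple[List[float], List[float], dict]:
    """Assimilate one observation and emit k-step-ahead forecasts with
    (placeholder) standard deviations, returning the updated state."""
    s = s or {}
    s["last"] = y
    return [s["last"]] * k, [1.0] * k, s

# Sequential assimilation, one data point at a time:
s = None
for y in [1.0, 2.0, 1.5]:
    x, x_std, s = last_value_skater(y, s=s, k=3)
```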

Read more