The NLP Cypher | 10.17.21

David is killing it! Welcome back, NLP peeps! Do you miss the old days? The old internet days of dial-up modems and static websites, you know… a time of innocence when developers were building the backbone of the internet at hyper speed? Well, we are going through that again right now with the Web 3.0 revolution. Cryptocurrencies usually get all of the attention, but there is something else at play, and it involves the entire web. You see, the current […]

sense2vec: Contextually-keyed word vectors

sense2vec (Trask et al., 2015) is a nice twist on word2vec that lets you learn more interesting and detailed word vectors. This library is a simple Python implementation for loading, querying and training sense2vec models. For more details, check out our blog post. To explore the semantic similarities across all Reddit comments of 2015 and 2019, see the interactive demo. Version 2.0 (for spaCy v3) is out now! Read the release notes here.
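The "twist" in sense2vec is that each vector is keyed by a phrase plus a sense tag (such as a part-of-speech label), so "duck" as a noun and "duck" as a verb get separate vectors instead of sharing one. The snippet below is a minimal sketch of that keyed lookup in plain Python, assuming tiny hand-made vectors; `VECTORS`, `make_key` and `most_similar` are hypothetical illustrations, not the sense2vec library's API, and no training is shown.

```python
import math

# Hypothetical, hand-made 3-d vectors for illustration only (not from a trained model).
# Keys join a phrase and its sense tag with "|", so each sense gets its own vector.
VECTORS = {
    "duck|NOUN": [0.9, 0.1, 0.0],
    "duck|VERB": [0.0, 0.2, 0.9],
    "goose|NOUN": [0.8, 0.2, 0.1],
    "dodge|VERB": [0.1, 0.1, 0.95],
}


def make_key(text: str, sense: str) -> str:
    """Build a sense-keyed lookup key; spaces in multi-word phrases become underscores."""
    return f"{text.replace(' ', '_')}|{sense}"


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(key, n=1):
    """Rank every other key by cosine similarity to `key`, highest first."""
    query = VECTORS[key]
    scored = [(other, cosine(query, vec)) for other, vec in VECTORS.items() if other != key]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


print(most_similar(make_key("duck", "NOUN")))  # nearest neighbour: goose|NOUN
print(most_similar(make_key("duck", "VERB")))  # nearest neighbour: dodge|VERB
```

Because the sense tag is part of the key, the same surface word lands in different neighbourhoods depending on how it was used.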

A full spaCy pipeline and models for scientific/biomedical documents

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds tokenization rules on top of spaCy’s rule-based tokenizer, a POS tagger and syntactic parser trained on biomedical data, and an entity span detection model. Separately, there are also NER models for more specific tasks. Just looking to test out the models on your data? Check out our demo. Installation. Installing scispacy requires two steps: installing the library […]

The NLP Cypher | 10.03.21

RAFT is a few-shot classification benchmark that tests language models:
– across multiple domains (lit reviews, medical data, tweets, customer interaction, etc.)
– on economically valuable classification tasks (someone inherently cares about the task)
– with evaluation that mirrors deployment (50 labeled examples per task, info retrieval allowed, hidden test set)
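With only 50 labeled examples per task and a hidden test set, one simple way to run a language model on a RAFT-style task is in-context prompting: show a handful of labeled examples, then ask for the label of a new text. The sketch below assumes that approach; the prompt layout, `build_prompt` and `parse_label` are hypothetical and are not RAFT's official baseline or data format.

```python
from typing import List, Tuple

# RAFT provides 50 labeled examples per task; this sketch rejects anything larger.
MAX_TRAIN_EXAMPLES = 50


def build_prompt(instructions: str, train: List[Tuple[str, str]], query: str, k: int = 5) -> str:
    """Assemble a few-shot prompt from (text, label) pairs (hypothetical layout)."""
    if len(train) > MAX_TRAIN_EXAMPLES:
        raise ValueError(f"expected at most {MAX_TRAIN_EXAMPLES} labeled examples")
    parts = [instructions.strip(), ""]
    for text, label in train[:k]:  # only the first k examples go into the prompt
        parts.append(f"Text: {text}\nLabel: {label}\n")
    parts.append(f"Text: {query}\nLabel:")  # the model is asked to complete this line
    return "\n".join(parts)


def parse_label(completion: str, labels: List[str]) -> str:
    """Map a free-text completion back onto one allowed label.

    Longer labels are checked first so "complaint escalated" is not
    mistaken for "complaint"; falls back to the first label if nothing matches.
    """
    cleaned = completion.strip().lower()
    for label in sorted(labels, key=len, reverse=True):
        if cleaned.startswith(label.lower()):
            return label
    return labels[0]


train = [("great product", "positive"), ("broke in a day", "negative")]
print(build_prompt("Classify the sentiment of each text.", train, "works fine"))
```

Since the test labels are hidden, the parsed predictions would be submitted for scoring rather than checked locally.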

DaCy: The State-of-the-Art Danish NLP pipeline using spaCy

DaCy is a Danish preprocessing pipeline trained in spaCy. At the time of writing, it has achieved state-of-the-art performance on all benchmark tasks for Danish. This repository contains code for reproducing DaCy. To download the models, use the DaNLP package (request pending) or spaCy (request pending), or download the project directly here. Reproduction: the folder DaCy contains a spaCy project that allows you to reproduce the results. This folder also includes the evaluation metrics on DaNE. Usage: To […]

The NLP Cypher | 06.06.21

Welcome back to the simulation ✌. So, the ACL 2021 data dump happened, and now we have a huge list of repos to get through in the Repo Cypher this week. 😁 Also, we are updating the NLP Index very soon with 100+ new repos (many of which are mentioned here) alongside 30+ new NLP notebooks like this one 👇. If you would like to get an email alert for future newsletters and asset updates, you can sign up here. […]

The NLP Cypher | 06.13.21

TextStyleBrush can recognize the style of text in pictures and edit the words while maintaining that style. It’s “… the first self-supervised AI model that replaces text in images of both handwriting and scenes — in one shot — using a single example word.” Install TensorFlow v2.5 and the tensorflow-metal PluggableDevice to accelerate training with Metal on Mac GPUs. Chris Farber highlights how to use Postgres for common Redis use cases. In all, he describes three use cases: job queuing, application locks, […]
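For the first two Postgres-instead-of-Redis use cases, a common pattern is a jobs table drained with `FOR UPDATE SKIP LOCKED` (so concurrent workers skip rows another worker has locked) and application locks built on Postgres advisory locks. The sketch below is an assumption about how that could look, not Chris Farber's code: the `jobs` table, the helper names and the psycopg2-style DB-API connection (with `%s` placeholders) are all hypothetical.

```python
# Assumes a hypothetical table:
#   CREATE TABLE jobs (id BIGSERIAL PRIMARY KEY, payload TEXT NOT NULL);
# The subquery locks the oldest job; SKIP LOCKED makes other workers skip it
# instead of blocking, and DELETE ... RETURNING pops it in one statement.
CLAIM_JOB_SQL = """
DELETE FROM jobs
WHERE id = (
    SELECT id FROM jobs
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
"""

# Session-level advisory lock keyed by an application-chosen integer.
TRY_LOCK_SQL = "SELECT pg_try_advisory_lock(%s);"
UNLOCK_SQL = "SELECT pg_advisory_unlock(%s);"


def claim_next_job(conn):
    """Pop the oldest unclaimed job as (id, payload), or return None if none is free."""
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_JOB_SQL)
        row = cur.fetchone()
        conn.commit()  # releases the row lock and makes the delete visible
    finally:
        cur.close()
    return row


def try_app_lock(conn, key):
    """Return True if this session acquired advisory lock `key` (non-blocking)."""
    cur = conn.cursor()
    try:
        cur.execute(TRY_LOCK_SQL, (key,))
        return bool(cur.fetchone()[0])
    finally:
        cur.close()


def release_app_lock(conn, key):
    """Release advisory lock `key`; returns True if this session held it."""
    cur = conn.cursor()
    try:
        cur.execute(UNLOCK_SQL, (key,))
        return bool(cur.fetchone()[0])
    finally:
        cur.close()
```

A worker would loop on `claim_next_job(conn)` and sleep briefly whenever it returns None.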

The NLP Cypher | 07.04.21

Hey, welcome back! I want to wish everyone in the US a happy 4th of July 🎆🎇! Also, I want to quickly mention that the NLP Index has doubled in size since its inception and now houses over 6,000 repos, pretty cool!!! 😎 And as always, it gets updated weekly. But first, this week we asked 100 NLP developers: Name one thing Microsoft got for paying $7.5 billi for GitHub and $1 billi to OpenAI? SURVEY SAYS: 7.5B + 1B = GitHub Copilot […]

The NLP Cypher | 07.11.21

Welcome back! Hope you had a great week. We have a new leader on the SuperGLUE benchmark: a new ERNIE model from Baidu comprising 10 billion parameters trained on a 4TB corpus. FYI, the human baseline was already beaten by Microsoft’s DeBERTa model at the beginning of the year… time for a new SuperSuperGLUE benchmark??? (Paper) BTW, if you are still interested in GitHub’s Copilot, I stumbled upon the Codex paper this week (Paper). DeepMind’s Perceiver transformer allows […]

The NLP Cypher | 07.18.21

Sometimes… cool things happen. A new chatbot from Facebook AI was released this Friday with remarkable features. This chatbot, BlenderBot 2.0, is an improvement on their previous bot from last year. The bot has better long-term memory and can search the internet for information during a conversation! This is a convenient improvement over traditional bots, since information is not statically “memorized” but can instead be dynamic and stay up to date via the internet. 🤯 I’ve recently tested […]
