A full spaCy pipeline and models for scientific/biomedical documents

This repository contains custom pipes and models related to using spaCy for scientific documents.

In particular, there is a custom tokenizer that adds tokenization rules on top of spaCy’s rule-based tokenizer, a POS tagger and syntactic parser trained on biomedical data and an entity span detection model. Separately, there are also NER models for more specific tasks.

Just looking to test out the models on your data? Check out our demo.

Installation

Installing scispacy requires two steps: installing the library and intalling the models. To install the library, run:

to install a model (see our full selection of available models below), run a command like the following:

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_sm-0.4.0.tar.gz

Note: We strongly recommend that you use

To finish reading, please visit source site