An extension for ASReview that adds two feature extractors: a tf-idf extractor that saves the matrix and the vocabulary to pickle and JSON files, respectively, and a doc2vec extractor that grabs the entire doc2vec model. Requested in discussion post #650.
Getting started
Install the new feature extractors with:
or
python -m pip install git+https://github.com/asreview/asreview-extension-vocab-extractor.git
Usage
Run the simulation as usual, but use tfidf_grab or doc2vec_grab as the feature extractor. The matrix and the vocabulary are extracted during simulation preparation. The new feature extractor tfidf_grab is defined in asreviewcontrib.models.tfidf_grab.py, and doc2vec_grab is defined in asreviewcontrib.models.doc2vec_grab.py.
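To make the idea of saving the matrix to pickle and the vocabulary to JSON concrete, here is a minimal standard-library sketch. It is not the extension's actual code: the toy tf-idf weighting, the function names (build_tfidf, save_outputs, load_outputs) and the file names are all assumptions made for illustration, and the files the extension really writes may be named and structured differently.

```python
import json
import math
import pickle
import tempfile
from collections import Counter
from pathlib import Path


def build_tfidf(docs):
    """Return (matrix, vocabulary) for whitespace-tokenized documents.

    Toy weighting: raw term count * (ln(N / df) + 1). Illustrative only.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary maps each term to its column index (sorted order).
    terms = sorted({term for tokens in tokenized for term in tokens})
    vocabulary = {term: index for index, term in enumerate(terms)}
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    matrix = []
    for tokens in tokenized:
        row = [0.0] * len(vocabulary)
        for term, count in Counter(tokens).items():
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            row[vocabulary[term]] = count * idf
        matrix.append(row)
    return matrix, vocabulary


def save_outputs(matrix, vocabulary, out_dir):
    """Write the matrix to pickle and the vocabulary to JSON."""
    out_dir = Path(out_dir)
    matrix_path = out_dir / "tfidf_matrix.pickle"   # hypothetical file name
    vocab_path = out_dir / "tfidf_vocabulary.json"  # hypothetical file name
    with open(matrix_path, "wb") as f:
        pickle.dump(matrix, f)
    with open(vocab_path, "w", encoding="utf-8") as f:
        json.dump(vocabulary, f)
    return matrix_path, vocab_path


def load_outputs(matrix_path, vocab_path):
    """Read the matrix and the vocabulary back from disk."""
    with open(matrix_path, "rb") as f:
        matrix = pickle.load(f)
    with open(vocab_path, encoding="utf-8") as f:
        vocabulary = json.load(f)
    return matrix, vocabulary


if __name__ == "__main__":
    docs = [
        "active learning for screening",
        "screening abstracts with active learning",
        "doc2vec embeddings",
    ]
    matrix, vocabulary = build_tfidf(docs)
    with tempfile.TemporaryDirectory() as tmp:
        paths = save_outputs(matrix, vocabulary, tmp)
        loaded_matrix, loaded_vocabulary = load_outputs(*paths)
    print(len(loaded_matrix), "documents,", len(loaded_vocabulary), "terms")
```

The vocabulary goes to JSON because it is a plain term-to-column mapping that is easy to inspect by hand, while the matrix is pickled so it can be reloaded as a Python object without conversion. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.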
The new tf-idf extractor can be used like this: