# Frustratingly Simple Pretraining Alternatives to Masked Language Modeling
This is the official implementation for “Frustratingly Simple Pretraining Alternatives to Masked Language Modeling” (EMNLP 2021).

## Requirements
* torch
* transformers
* datasets
* scikit-learn
* tensorflow
* spacy

## How to pre-train
### 1. Clone this repository
```
git clone https://github.com/gucci-j/light-transformer-emnlp2021.git
```

### 2. Install required packages
```
cd ./light-transformer-emnlp2021
pip install -r requirements.txt
```
`requirements.txt` is located just under `light-transformer-emnlp2021` (a sketch of what such a file might contain appears at the end of this section).

We also need spaCy’s `en_core_web_sm` model for preprocessing. If you have not installed it yet, please run:
```
python -m spacy download en_core_web_sm
```
(A sketch of this kind of spaCy-based preprocessing also appears at the end of this section.)

### 3. Preprocess datasets
```
cd ./src/utils
python preprocess_roberta.py --path=/path/to/save/data/
```
You need […]
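For reference, a minimal `requirements.txt` covering the packages listed under Requirements might look like the sketch below. The repository ships its own file, which is authoritative; leaving the entries unpinned here is an assumption for illustration only.

```
# Assumed minimal requirements.txt — the repository's own file is authoritative.
torch
transformers
datasets
scikit-learn
tensorflow
spacy
```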
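The README does not show what `preprocess_roberta.py` does internally, but since it depends on spaCy’s `en_core_web_sm`, one plausible role for that model is sentence segmentation of the raw corpus before tokenization. Below is a minimal sketch of that step, assuming (this is an assumption, not the repository’s actual code) that sentence splitting is what the model is used for:

```python
# Minimal sketch: sentence segmentation with spaCy's en_core_web_sm.
# This is an illustrative assumption, not the code in preprocess_roberta.py.
import spacy

nlp = spacy.load("en_core_web_sm")

def split_into_sentences(document: str) -> list[str]:
    """Segment one raw document into a list of sentence strings."""
    doc = nlp(document)
    return [sent.text.strip() for sent in doc.sents]

if __name__ == "__main__":
    text = "Pretraining corpora arrive as raw text. Each document is split into sentences first."
    for sentence in split_into_sentences(text):
        print(sentence)
```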