Datasets from Instructions
This repository contains the code for the paper Generating Datasets with Pretrained Language Models. The paper introduces a method called Datasets from Instructions (DINO 🦕) that enables pretrained language models to generate entire datasets from scratch.
🔧 Setup
All requirements for DINO can be found in requirements.txt. You can install all required packages in a new environment with pip install -r requirements.txt.
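If you prefer to script the setup step from Python, the following is a minimal sketch using only the standard library. It creates a fresh virtual environment and runs pip install -r requirements.txt with that environment's interpreter. The helper name install_requirements, its dry_run flag, and the environment directory name are hypothetical illustrations, not part of the DINO repository.

```python
from __future__ import annotations

import subprocess
import sys
import venv
from pathlib import Path


def install_requirements(env_dir: str,
                         requirements: str = "requirements.txt",
                         dry_run: bool = False) -> list[str]:
    """Hypothetical helper: create a venv and install DINO's requirements into it.

    Returns the pip command that is (or, with dry_run=True, would be) executed.
    """
    env_path = Path(env_dir)
    # The interpreter lives in Scripts\python.exe on Windows and bin/python elsewhere.
    if sys.platform == "win32":
        python = env_path / "Scripts" / "python.exe"
    else:
        python = env_path / "bin" / "python"
    # Run pip through the new environment's interpreter so packages land there.
    cmd = [str(python), "-m", "pip", "install", "-r", requirements]
    if not dry_run:
        venv.create(env_path, with_pip=True)  # fresh environment with pip bootstrapped
        subprocess.run(cmd, check=True)       # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    print(install_requirements("dino-env", dry_run=True))
```

Running pip via the new interpreter (python -m pip) rather than a bare pip executable avoids accidentally installing into whichever environment happens to be active.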
💬 CLI Usage
Single Texts
To generate datasets for (single) text classification, you can use DINO as follows:
python3 dino.py \
 --output_dir <OUTPUT_DIR> \
 --task_file <TASK_FILE> \
 --num_entries_per_label <N> \
 --batch_size 1
where <OUTPUT_DIR> is a directory to which the generated dataset is written, <TASK_FILE> is a JSON file containing a task specification (see Task Specs), and <N> is the number