Datasets from Instructions In Python

Datasets from Instructions

This repository contains the code for the paper "Generating Datasets with Pretrained Language Models". The paper introduces a method called Datasets from Instructions (DINO 🦕) that enables pretrained language models to generate entire datasets from scratch.

🔧 Setup

All requirements for DINO can be found in requirements.txt. You can install all required packages in a new environment with pip install -r requirements.txt.

💬 CLI Usage

Single Texts

To generate datasets for (single) text classification, you can use DINO as follows:

python3 dino.py \
 --output_dir <OUTPUT_DIR> \
 --task_file <TASK_FILE> \
 --num_entries_per_label <N> \
 --batch_size 1

where <OUTPUT_DIR> is a directory to which the generated dataset is written, <TASK_FILE> is a JSON file containing a task specification (see Task Specs), and <N> is the number of entries to generate per label.
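If you prefer to launch the command from a script, the invocation above can be assembled in Python. The following is a minimal sketch, not part of the repository: the helper name build_dino_command and the example paths are hypothetical, and only the four flags shown above are used.

```python
import shlex
from typing import List


def build_dino_command(output_dir: str, task_file: str,
                       num_entries_per_label: int,
                       batch_size: int = 1) -> List[str]:
    """Build the argument list for the single-text dino.py call.

    Hypothetical helper for illustration; it only mirrors the flags
    documented in this section and does not run anything itself.
    """
    if num_entries_per_label < 1:
        raise ValueError("num_entries_per_label must be at least 1")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        "python3", "dino.py",
        "--output_dir", output_dir,
        "--task_file", task_file,
        "--num_entries_per_label", str(num_entries_per_label),
        "--batch_size", str(batch_size),
    ]


# Example values are placeholders (assumptions), not files shipped with the repository.
command = build_dino_command("output", "my_task.json", 100)

# shlex.join (Python 3.8+) renders the list as a shell-safe string for logging.
print(shlex.join(command))
```

To actually run the generation, the list can be passed to subprocess.run(command, check=True) from the repository root, assuming the requirements from the Setup section are installed.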

 

 

 
