A work-in-progress vector version of the MNIST dataset

bezier-mnist: This is a work-in-progress vector version of the MNIST dataset. Here are some samples from the training set. Note that, while these are rasterized, the underlying images can be rendered at any resolution because they are smooth vector graphics. ![A grid of sixteen digit images](https://github.com/unixpickle/bezier-mnist/raw/main/samples.png =300×300) I have already converted all of MNIST to Bezier curves. This dataset can be downloaded at this page. There are two files: train.zip and test.zip, each containing a separate JSON file for each […]
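The resolution independence mentioned above comes from the fact that a Bezier curve is defined by a handful of control points and can be sampled as densely as needed. The following is only a minimal sketch of cubic Bezier evaluation; the control points are hypothetical and are not taken from the dataset, and the function names `cubic_bezier` and `sample_curve` are illustrative rather than part of the bezier-mnist code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein basis weights for a cubic curve
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_curve(control_points: Sequence[Point], n: int) -> List[Point]:
    """Return n + 1 points along the curve, evenly spaced in t."""
    p0, p1, p2, p3 = control_points
    return [cubic_bezier(p0, p1, p2, p3, i / n) for i in range(n + 1)]


# Hypothetical control points (an arch shape), not real dataset values
curve = ((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0))

coarse = sample_curve(curve, 4)   # low-resolution polyline
fine = sample_curve(curve, 64)    # same curve, sampled more densely
```

Both `coarse` and `fine` describe the same shape; only the sampling density changes, which is why a rasterizer can target any output size.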

A Dataset of Python Challenges for AI Research

Python Programming Puzzles (P3): This repo contains a dataset of Python programming puzzles that can be used to teach and evaluate an AI's programming proficiency. We hope this dataset will grow rapidly, and it is already diverse in terms of problem difficulty, domain, and the algorithmic tools needed to solve the problems. Please propose a new puzzle, browse newly proposed puzzles, or contribute through pull requests. To learn more about how well AI systems such as GPT-3 can solve these […]
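To give a rough idea of what such a puzzle can look like, here is a minimal sketch. It assumes each puzzle is expressed as a Python function that returns True exactly when it is given a valid answer; the excerpt above does not confirm this format, and the names `sat`, `sol`, and `check` are illustrative only.

```python
from typing import Any, Callable


def sat(s: str) -> bool:
    """Puzzle: find a string that completes the greeting."""
    return "Hello " + s == "Hello world"


def sol() -> str:
    """One candidate answer, as a human or a model might propose it."""
    return "world"


def check(puzzle: Callable[[Any], bool], answer: Any) -> bool:
    """An answer counts as correct only if the puzzle returns exactly True."""
    return puzzle(answer) is True


solved = check(sat, sol())        # the proposed answer satisfies the puzzle
rejected = check(sat, "there")    # a wrong answer does not
```

Under this assumption, evaluating a solver needs no reference solution: running the puzzle function on the proposed answer is the whole test.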

Data pipeline architecture for onboarding public datasets to Datasets for Google Cloud

Public Datasets Pipelines: Cloud-native data pipeline architecture for onboarding public datasets to Datasets for Google Cloud. We use Pipenv to make environment setup more deterministic and uniform across different machines. If you haven't done so, install Pipenv using the instructions found here. With Pipenv installed, run the following command: `pipenv install --ignore-pipfile --dev` This uses the Pipfile.lock found in the project root and installs all the development dependencies. Finally, initialize the Airflow database: `pipenv run airflow initdb` Configuring, generating, […]

A Python library that generates random facts

Randfacts: Randfacts is a Python library that generates random facts. You can use `randfacts.get_fact()` to return a random fun fact. Disclaimer: Facts are not guaranteed to be true. randfacts can be installed via pip or via the AUR, whichever you prefer. Installation via pip: `$ pip3 install randfacts` Installation via AUR: `$ git clone https://aur.archlinux.org/python-randfacts.git && cd python-randfacts` followed by `$ makepkg -si`. `import randfacts; x = randfacts.get_fact(); print(x)` will print a random fact like: Penguins can't taste sweet or savory […]

Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets

This is the official PyTorch implementation for the paper Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets (ICLR 2021): https://openreview.net/forum?id=rkQuFUmUOg3. Abstract: Despite the success of recent Neural Architecture Search (NAS) methods on various tasks, which have been shown to output networks that largely outperform human-designed networks, conventional NAS methods have mostly tackled the optimization of searching for the network architecture for a single task (dataset), which does not generalize well across multiple tasks (datasets). Moreover, since such […]

A dataset for online Arabic calligraphy

Calliar is a dataset for Arabic calligraphy. The dataset consists of 2500 JSON files that contain strokes manually annotated for Arabic calligraphy. This repository contains the dataset for the following paper: Calliar: An Online Handwritten Dataset for Arabic Calligraphy, by Zaid Alyafeai, Maged S. Al-shaibani, Mustafa Ghaleb, and Yousif Ahmed Al-Wajih (https://arxiv.org/abs/2106.10745). Abstract: Calligraphy is an essential part of the Arabic heritage and culture. It has been used in the past for the decoration of houses and mosques. Usually, such calligraphy […]

A Naturally-Occurring Dataset Based on Stack Exchange Data

SEDE: SEDE (Stack Exchange Data Explorer) is a new dataset for Text-to-SQL tasks with more than 12,000 SQL queries and their natural language descriptions. It is based on real usage by users of the Stack Exchange Data Explorer platform, which brings complexities and challenges not seen in other semantic parsing datasets, including complex nesting, date manipulation, numeric and text manipulation, parameters, and most importantly, under-specification and hidden assumptions. Paper (NLP4Prog workshop at ACL2021): Text-to-SQL in the Wild: A Naturally-Occurring […]

A Python library to access TensorBay and manage your datasets

TensorBay Python SDK: TensorBay Python SDK is a Python library to access TensorBay and manage your datasets. It provides: a Pythonic way to access your TensorBay resources through the TensorBay OpenAPI; an easy-to-use CLI tool, gas (Graviti AI service), to communicate with TensorBay; and a consistent dataset format to read and write your datasets. Installation: `pip3 install tensorbay` Documentation: More information can be found on the documentation site. Usage: An AccessKey is needed to communicate with TensorBay. Please visit this page to get […]
