A prototype COG-based tile server for sparse Mars datasets

Mars Tiler is a prototype web application that serves tiles from cloud-optimized GeoTIFFs, with an emphasis on supporting planetary datasets. Many features are hard-coded for Mars data and global projections, but the core of this work should be applicable to other planetary projections (e.g., Mars polar data, the Moon, Ceres, etc.). Dynamic tiling: This application is part of a new generation of “dynamic tilers”, which generate and slice mosaics on the fly from input datasets. All it needs to function […]


JupyterLite as a Datasette plugin

JupyterLite as a Datasette plugin.
Installation: Install this plugin in the same environment as Datasette.
    $ datasette install datasette-jupyterlite
Usage: Once installed, visit /jupyterlite/ to access JupyterLite served from your Datasette instance.
Development: To set up this plugin locally, first check out the code. Then create a new virtual environment:
    cd datasette-jupyterlite
    python3 -mvenv venv
    source venv/bin/activate
Or if you are using pipenv: Now install the dependencies and test dependencies: To run


An elegant datasets factory for python

An elegant datasets factory. Features: Schema oriented datasets builder. How to use it:
    # Import the package into any python app
    import rawbuilder
    # Init the dataset object as ds
    ds = rawbuilder.DataSet(
        size=1000,
        schema=['user'],
        file_name='my_users_dataset_1'
    )
    # Build the dataset
    ds.build()
Credits: This package was created with


Release of the ConditionalQA dataset

Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers. Disclaimer: This dataset should ONLY be used for NLP research purposes. Answers are NOT verified by legal professionals and should NOT be used for any legal purposes. Evaluate: Please generate your predictions using the format of sample_output.json. Run the following command to evaluate your predictions with evaluate.py: python evaluate.py --pred_file=sample_output.json --ref_file=v1_0/dev.json Leaderboard: Submit your predictions to the Leaderboard. Please email your Codalab username to if you would like […]


TorchXRayVision: A library of chest X-ray datasets and models

A library for chest X-ray datasets and models, including pre-trained models. (🎬 promo video about the project) Motivation: While there are many publications focusing on the prediction of radiological and clinical findings from chest X-ray images, much of this work is inaccessible to other researchers. For researchers addressing clinical questions, training models from scratch is a waste of time. To address this, TorchXRayVision provides pre-trained models which are trained on large cohorts of […]


Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme

Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme. Xi Yang, Wangmeng Xiang, Hui Zeng and Lei Zhang. International Conference on Computer Vision, 2021.
Dataset: The dataset is hosted on Google Drive and Baidu Drive (code: 43ph). Some example scenes are shown below. The structure of the dataset is illustrated below.
File | Description
GT.zip | All ground truth sequences in RGB format
LQ.zip | All low quality sequences in RGB format
GT_YCbCr.zip | All ground truth sequences in YCbCr format
LQ_YCbCr.zip | All […]


A Statutory Article Retrieval Dataset in French

This repository contains the Belgian Statutory Article Retrieval Dataset (BSARD), as well as the code to reproduce the experimental results from the associated paper by A. Louis, G. Spanakis, and G. Van Dijck. Abstract. Statutory article retrieval is the task of automatically retrieving law articles relevant to a legal question. While recent advances in natural language processing have sparked considerable interest in many legal tasks, statutory article retrieval remains primarily untouched due to the scarcity of large-scale and high-quality annotated […]


User friendly Rasterio plugin to read raster datasets

rio-tiler: User friendly Rasterio plugin to read raster datasets. rio-tiler was initially designed to create slippy map tiles from large raster data sources and render these tiles dynamically on a web map. With rio-tiler v2.0 we added many more helper methods to read data and metadata from any raster source supported by Rasterio/GDAL. This includes local files and files accessed via HTTP, AWS S3, Google Cloud Storage, etc. At the low level, rio-tiler is just a wrapper around the rasterio.vrt.WarpedVRT class, which can be useful for doing […]
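To make the "slippy map tiles" idea concrete, here is a minimal sketch of how a tile address maps to a geographic window. It assumes the common XYZ / Web Mercator tiling scheme and uses only the standard library; the function name `tile_bounds` is hypothetical and is not part of rio-tiler's API.

```python
import math
from typing import Tuple


def tile_bounds(x: int, y: int, z: int) -> Tuple[float, float, float, float]:
    """Return (west, south, east, north) in degrees for an XYZ tile.

    Assumes the usual Web Mercator scheme: at zoom z the world is a
    2**z by 2**z grid, with tile (0, 0) in the north-west corner.
    """
    n = 2 ** z  # number of tiles along each axis at this zoom level

    def lon(col: float) -> float:
        # Columns are spaced evenly in longitude.
        return col / n * 360.0 - 180.0

    def lat(row: float) -> float:
        # Rows are spaced evenly in Mercator y, so invert the projection.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


if __name__ == "__main__":
    # The single zoom-0 tile covers the whole Web Mercator extent.
    print(tile_bounds(0, 0, 0))
```

A dynamic tiler would use a window like this to decide which part of the source raster to read and resample for each requested tile.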


Datasets from Instructions In Python

Datasets from Instructions: This repository contains the code for Generating Datasets with Pretrained Language Models. The paper introduces a method called Datasets from Instructions (DINO 🦕) that enables pretrained language models to generate entire datasets from scratch. 🔧 Setup: All requirements for DINO can be found in requirements.txt. You can install all required packages in a new environment with pip install -r requirements.txt. 💬 CLI Usage, Single Texts: To generate datasets for (single) text classification, you can use DINO as […]


Task-based datasets, preprocessing, and evaluation for sequence models

SeqIO: SeqIO is a library for processing sequential data to be fed into downstream sequence models. It uses tf.data.Dataset to create scalable data pipelines but requires minimal use of TensorFlow. In particular, with one line of code, the returned dataset can be transformed to a numpy iterator, and hence it is fully compatible with other frameworks such as JAX or PyTorch. Currently, SeqIO assumes that the dataset is a sequence, i.e., each feature is a one-dimensional array. Modalities such as text […]
