An AWS Professional Service open source initiative

Pandas on AWS. Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL). Quick Start Installation command: pip install awswrangler. For platforms without PyArrow 3 support (e.g. EMR, Glue PySpark Job, MWAA): pip install pyarrow==2 awswrangler import awswrangler as wr import pandas as pd from datetime import datetime df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]}) # Storing data on Data Lake wr.s3.to_parquet( df=df, path="s3://bucket/dataset/", dataset=True, database="my_db", table="my_table" […]
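For reference, a cleaned-up, runnable version of the snippet in the excerpt, extended with an Athena read-back that is not shown above (bucket, database, and table names are placeholders, and valid AWS credentials are assumed):

```python
import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Store the DataFrame on the data lake as a Parquet dataset and register it
# in the Glue catalog (bucket, database, and table names are placeholders).
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/dataset/",
    dataset=True,
    database="my_db",
    table="my_table",
)

# Read it back through Athena (assumes the table registered above exists).
df2 = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")
```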

Read more

An rTorrent Disk Checker Python script

rTorrent Disk Checker. This program runs whenever a torrent is added (by any program such as autodl-irssi or RSS Downloader, remotely, or directly) and checks your available disk space. If your free disk space is not large enough to accommodate a pending torrent, the program will delete torrents based on criteria defined in config.py. If your disk space is still too low, the torrent will be sent to rTorrent in a […]
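A minimal sketch of the disk-space check described above, with a hypothetical path and threshold; the script's real rules live in config.py and are not reproduced here:

```python
import shutil

# Placeholder values standing in for what config.py would define.
DOWNLOAD_PATH = "."              # the rTorrent download directory
MIN_FREE_BYTES = 10 * 1024**3    # keep at least 10 GiB free

def has_room_for(torrent_size_bytes: int) -> bool:
    """Return True if a pending torrent fits without dropping below the threshold."""
    free = shutil.disk_usage(DOWNLOAD_PATH).free
    return free - torrent_size_bytes >= MIN_FREE_BYTES

if not has_room_for(5 * 1024**3):
    # At this point the real script would delete torrents according to config.py.
    print("Not enough space: existing torrents would be removed first.")
```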

Read more

A method for cleaning and classifying text using transformers

NLP Translation and Classification. The repository contains a method for classifying and cleaning text using NLP transformers. Overview: The input data are web-scraped product names gathered from various e-shops. The products are either monitors or printers. Each product in the dataset has a scraped name containing information about the product brand and product model name, but also unwanted noise: irrelevant information about the item. Additionally, only some records are relevant, meaning that they belong to the correct category: monitor […]
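To illustrate the classification step (this is not the repository's own code), a zero-shot sketch with Hugging Face transformers; the model choice and candidate labels are assumptions:

```python
from transformers import pipeline

# Zero-shot classification as a stand-in for the repository's fine-tuned classifier.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

scraped_name = "HP LaserJet Pro M404dn mono printer bundle + free cable"
result = classifier(scraped_name, candidate_labels=["monitor", "printer", "other"])
print(result["labels"][0], result["scores"][0])  # most likely category and its score
```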

Read more

A Python package to create and manage your seismic training data, processes, and visualization in a single place

QuakeLabeler Quake Labeler was born from the need for seismologists and developers who are not AI specialists to easily, quickly, and independently build and visualize their training data set. Introduction QuakeLabeler is a Python package to customize, build and manage your seismic training data, processes, and visualization in a single place — so you can focus on building the next big thing. Current functionalities include retrieving waveforms from data centers, customizing seismic samples, auto-building datasets, preprocessing and augmenting for labels, […]
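QuakeLabeler's own API is not shown in the excerpt; as an illustration of the waveform-retrieval step it mentions, a sketch using ObsPy's FDSN client (network, station, and time window are placeholders):

```python
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

# Fetch a ten-minute waveform window from the IRIS data center (placeholders throughout).
client = Client("IRIS")
start = UTCDateTime("2021-01-01T00:00:00")
stream = client.get_waveforms(network="IU", station="ANMO", location="00",
                              channel="BHZ", starttime=start, endtime=start + 600)
stream.plot()  # quick visual check of the retrieved trace
```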

Read more

Community-based extensions for the python-telegram-bot library

ptbcontrib. Community-based extensions for the python-telegram-bot library. This library provides extensions for the python-telegram-bot library, written and maintained by the community of PTB users. Installing: Because this library is subject to more frequent changes than PTB, it is not available on PyPI. You can still install it via pip: $ pip install git+https://github.com/python-telegram-bot/ptbcontrib.git If you want to use an extension that has some special requirements, you can install those on the fly, e.g.: $ pip install "ptbcontrib[extension1,extension2] @ git+https://github.com/python-telegram-bot/ptbcontrib.git" […]
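The extension modules themselves are not listed in the excerpt; a purely hypothetical sketch of how an installed extension might be wired into a python-telegram-bot (v13-style) application, with the ptbcontrib module and helper names made up:

```python
from telegram.ext import Updater

# from ptbcontrib.extension1 import setup_extension  # hypothetical module and function

updater = Updater("BOT_TOKEN")         # placeholder token
# setup_extension(updater.dispatcher)  # attach the hypothetical extension here
updater.start_polling()
updater.idle()
```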

Read more

The training code for the 4th place model at MDX 2021 leaderboard A

This repository contains the training code of our winning model at the Music Demixing Challenge 2021, which took 4th place on leaderboard A (6th overall) and helped us (Kazane Ryo no Danna) win the bronze prize. Model Summary: Our final winning approach blends the outputs from three models, which are: model 1: an X-UMX model [1] which is initialized with the weights of the official baseline and is fine-tuned with a modified Combinational Multi-Domain Loss from [1]. In particular, […]
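The submission's actual blend weights are not given in the excerpt; a sketch of the output-blending idea, with array shapes and weights chosen purely for illustration:

```python
import numpy as np

# Source estimates from the three models: arrays of shape (sources, channels, samples).
est_model1 = np.random.randn(4, 2, 44100)
est_model2 = np.random.randn(4, 2, 44100)
est_model3 = np.random.randn(4, 2, 44100)

# Hypothetical per-model weights; the real submission's values are not shown here.
weights = [0.4, 0.3, 0.3]
blended = weights[0] * est_model1 + weights[1] * est_model2 + weights[2] * est_model3
```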

Read more

A parsing tool that implements a flexible lexer and a straightforward approach to analyzing documents

Python Eacc is a parsing tool that implements a flexible lexer and a straightforward approach to analyzing documents. It uses Python code to specify both the lexer and the grammar for a given document. Eacc can succinctly handle most parsing cases that existing Python parsing tools propose to address. Documents are split into tokens, and each token has a type; when a sequence of tokens is matched, it evaluates to a specific type and is then rematched against the existing rules. The types […]
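Eacc's own API is not reproduced here; a generic sketch of the match-and-rematch idea the excerpt describes, using made-up token types and a single hand-written rule:

```python
# Reduce a token sequence by repeatedly matching a rule until nothing more matches.
tokens = [("NUM", "1"), ("PLUS", "+"), ("NUM", "2")]

def match_sum(seq):
    """If the sequence starts with NUM PLUS NUM, evaluate it to a single NUM token."""
    if len(seq) >= 3 and seq[0][0] == seq[2][0] == "NUM" and seq[1][0] == "PLUS":
        total = int(seq[0][1]) + int(seq[2][1])
        return [("NUM", str(total))] + seq[3:]
    return None

while True:
    reduced = match_sum(tokens)
    if reduced is None:
        break
    tokens = reduced  # the new token is rematched against the rule

print(tokens)  # [('NUM', '3')]
```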

Read more

A storage engine for vector machine learning embeddings

Embeddinghub is a database built for machine learning embeddings. It is built with four goals in mind: store embeddings durably and with high availability; allow for approximate nearest neighbor operations; enable other operations like partitioning, sub-indices, and averaging; and manage versioning, access control, and rollbacks painlessly. Features Supported Operations: Run approximate nearest neighbor lookups, average multiple embeddings, partition tables (spaces), cache locally while training, and more. Storage: Store and index billions of vector embeddings from our storage layer. Versioning: Create, manage, and […]
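Embeddinghub's client API is not shown in the excerpt; to illustrate the nearest-neighbor lookup and averaging operations it lists, a small exact-search NumPy sketch (keys and vectors are made up, and the real store uses approximate indexes):

```python
import numpy as np

# Toy embedding table standing in for a space in the store.
embeddings = {
    "cat": np.array([1.0, 0.0, 0.2]),
    "dog": np.array([0.9, 0.1, 0.3]),
    "car": np.array([0.0, 1.0, 0.8]),
}

def nearest_neighbors(query, k=2):
    """Exact cosine-similarity lookup; approximate indexes replace this at scale."""
    q = query / np.linalg.norm(query)
    scored = [(key, float(vec @ q) / float(np.linalg.norm(vec)))
              for key, vec in embeddings.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]

avg = np.mean([embeddings["cat"], embeddings["dog"]], axis=0)  # averaging embeddings
print(nearest_neighbors(avg))
```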

Read more

Flexible Generation of Natural Language Deductions

a.k.a. ParaPattern https://arxiv.org/abs/2104.08825 Kaj Bostrom, Lucy Zhao, Swarat Chaudhuri, and Greg Durrett This repository contains all the code needed to replicate the experiments from the paper, and additionally provides a set of tools to put together new natural language deduction operations from scratch. In the data/ folder, you’ll find all the data used to train and evaluate our models, already preprocessed and ready to go, with the exception of the MNLI dataset due to its size – if you want […]

Read more