File-based TF-IDF: Calculates keywords in a document, using a word corpus

Calculates keywords in a document, using a word corpus. Why? Because I found myself with hundreds of plain text files and no way to know what each one contains. I then recalled this thing called TF-IDF from university, but found no utility that operates on files. Hence, here we are. How? Basically, each word in the current document gets a score. The score increases each time the word appears in this document, and decreases each time it appears in […]
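The scoring idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the utility's own code: it uses the textbook formulation tf × (ln(N / df) + 1), where df is the number of documents containing the word, and the names tokenize and tf_idf_keywords are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf_keywords(document, corpus, top_n=5):
    """Rank the words of `document` by a smoothed TF-IDF score.

    `corpus` is a list of other documents given as plain strings.
    Assumption: the textbook formula tf * (ln(N / df) + 1) is used.
    """
    doc_counts = Counter(tokenize(document))
    corpus_sets = [set(tokenize(text)) for text in corpus]
    n_docs = len(corpus_sets) + 1  # the corpus plus the document itself
    total = sum(doc_counts.values())

    scores = {}
    for word, count in doc_counts.items():
        tf = count / total  # grows with each occurrence in this document
        # Document frequency; the document itself counts once, so df >= 1.
        df = 1 + sum(word in words for words in corpus_sets)
        idf = math.log(n_docs / df) + 1.0  # shrinks as more documents share the word
        scores[word] = tf * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    corpus = ["the dog sat on the log", "the bird is on the roof"]
    print(tf_idf_keywords("the cat sat on the mat with the cat", corpus, top_n=3))
```

Words that are frequent here but rare elsewhere rise to the top, while words shared by every document are pushed down by the low idf factor.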


Parse URLs for DOIs, PubMed identifiers, PMC identifiers, arXiv identifiers, etc

Parse URLs for DOIs, PubMed identifiers, PMC identifiers, arXiv identifiers, etc. This module has a single parse() function that takes in a URL and gives back a CURIE pair (with None as the first entry if it could not parse):

    >>> import citation_url
    >>> citation_url.parse("https://joss.theoj.org/papers/10.21105/joss.01708")
    ('doi', '10.21105/joss.01708')
    >>> citation_url.parse("http://www.ncbi.nlm.nih.gov/pubmed/34739845")
    ('pubmed', '34739845')
    >>> citation_url.parse("https://example.com/true-garbage")
    (None, 'https://example.com/true-garbage')

🚀 Installation

The most recent release can be installed from PyPI with:

    $ pip install citation_url

The most recent code and data can be installed directly from […]


A Python tool to find good RCE

A tool to find good RCE. From my series: A powerful Burp extension to make bounties rain. Well, I've got exactly what you need: this Burp Suite extension, powered by a powerful AI engine, finds good RCE and reports it. All you need to do is install it and browse sites like a normal, unsuspecting user (how cool is that?). Download and install


Official repository of our paper Differentiable Wavetable Synthesis

Official repository of our paper “Differentiable Wavetable Synthesis”, accepted by ICASSP 2022.

@article{shan2021differentiable,
  title={Differentiable Wavetable Synthesis},
  author={Shan, Siyuan and Hantrakul, Lamtharn and Chen, Jitong and Avent, Matt and Trevelyan, David},
  journal={arXiv preprint arXiv:2111.10003},
  year={2021}
}

Our code is based on DDSP; please first set up the environment as described here. To train our model on the NSynth dataset, run


A small script to help me solve Wordle

A small script to help me solve Wordle because I’m that lazy. Warning: I didn’t write this to be efficient or elegant at all, so you’ll probably have a hard time with the UI and the untidy code. I just thought I might as well save it to GitHub.

Usage

Clone/download the repo:

    git clone https://github.com/k4yt3x/wordle-solver.git
    cd wordle-solver/src

Install dependencies:

    pip3 install -U -r requirements.txt

Launch the script:

The script will give you an initial guess. If this word exists […]


The starter repository for submissions to the GeneDisco challenge for optimized experimental design in genetic perturbation experiments

The starter repository for submissions to the GeneDisco challenge for optimized experimental design in genetic perturbation experiments. GeneDisco (to be published at ICLR-22) is a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery. GeneDisco contains a curated set of multiple publicly available experimental data sets as well as open-source implementations of state-of-the-art active learning policies for experimental design and exploration.

Install

    pip install -r requirements.txt

Use

Setup

Create a cache directory. This will hold any preprocessed and downloaded […]


PyTorch implementation of MaskGIT: Masked Generative Image Transformer

PyTorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf). Note: this is a work in progress. MaskGIT is an extension to the VQGAN paper which improves the second-stage transformer part (and leaves the first stage untouched). It switches the unidirectional transformer for a bidirectional transformer. The (second-stage) training is pretty similar to BERT: tokens are randomly masked out and the bidirectional transformer tries to predict them (the original work used a GPT architecture randomly replaced tokens by other […]
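The BERT-style masking step can be illustrated with plain Python lists. This is a rough sketch, not the repository's code: mask_tokens, mask_id and mask_ratio are hypothetical names, the token ids stand in for the codebook indices produced by the first stage, and a real implementation would operate on PyTorch tensors.

```python
import random


def mask_tokens(tokens, mask_id, mask_ratio, rng=None):
    """Replace a random subset of token ids with `mask_id`.

    Returns (masked_tokens, masked_positions). During training, a
    bidirectional model would see `masked_tokens` and be asked to
    predict the original token at each masked position.
    """
    rng = rng or random.Random()
    # Mask at least one token, but never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)  # copy so the caller's sequence is untouched
    for pos in positions:
        masked[pos] = mask_id
    return masked, positions


if __name__ == "__main__":
    tokens = [5, 9, 2, 7, 3, 8, 1, 4]
    masked, positions = mask_tokens(tokens, mask_id=1024, mask_ratio=0.5)
    targets = [tokens[p] for p in positions]  # what the model should predict
    print(masked, positions, targets)
```

Because every unmasked token stays visible on both sides of a masked position, the prediction can use context from the whole sequence, which is what distinguishes this setup from left-to-right generation.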


PLStream: A Framework for Fast Polarity Labelling of Massive Data Streams

Motivation

When dataset freshness is critical, annotating high-speed unlabelled data streams becomes critical but remains an open problem. We propose PLStream, a novel Apache Flink-based framework for fast polarity labelling of massive data streams, like Twitter tweets or online product reviews.

Environment Requirements

The relevant Python packages are summarized in requirements.txt. Flink v1.13, Python 3.7, Java 8.

DataSource

Tweets, Yelp Reviews, Amazon Reviews.

Quick Start

Quickly try PLStream on the Yelp review dataset.

Data preparation

    cd PLStream
    wget https://s3.amazonaws.com/fast-ai-nlp/yelp_review_polarity_csv.tgz

[…]


Module 2’s katas from Launch X’s Python introduction course

Module 2’s katas from Launch X’s Python introduction course. Virtual environment creation process (on Windows): Create a folder in any desired location; I created mine in the documents/ folder and named it test-project. Next, use the Windows command prompt to navigate to the folder’s location and execute the command py -m venv env. A folder named env should appear in your root directory. Overall directory structure:
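The environment-creation step described above can also be driven from Python through the standard library's venv module. The sketch below is an illustration rather than part of the course material: the helper name create_env is hypothetical, and with_pip=False is chosen only to keep the example fast, whereas py -m venv env installs pip by default.

```python
import tempfile
import venv
from pathlib import Path


def create_env(project_dir, env_name="env"):
    """Create a virtual environment inside `project_dir` and return its path."""
    env_path = Path(project_dir) / env_name
    # Roughly what `py -m venv env` does, minus the pip bootstrap.
    venv.create(env_path, with_pip=False)
    return env_path


if __name__ == "__main__":
    # A temporary folder stands in for the test-project folder.
    with tempfile.TemporaryDirectory() as project_dir:
        env_path = create_env(project_dir)
        # Every environment created by venv contains a pyvenv.cfg file.
        print(env_path.name, (env_path / "pyvenv.cfg").is_file())
```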
