AST: Audio Spectrogram Transformer

AST This repository contains the official implementation (in PyTorch) of the Audio Spectrogram Transformer (AST) proposed in the Interspeech 2021 paper AST: Audio Spectrogram Transformer (Yuan Gong, Yu-An Chung, James Glass). AST is the first convolution-free, purely attention-based model for audio classification; it supports variable-length input and can be applied to various tasks. We evaluate AST on several audio classification benchmarks, where it achieves new state-of-the-art results of 0.485 mAP on AudioSet, 95.6% accuracy on ESC-50, and 98.1% accuracy […]
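For orientation, here is a minimal smoke-test sketch of the forward pass; ASTModel and its keyword arguments follow the project README, so treat the exact names and module path as version-dependent assumptions:

```python
import torch
from src.models import ASTModel  # module path as laid out in the AST repo

input_tdim = 256  # number of time frames; AST accepts variable-length input
model = ASTModel(label_dim=527,            # e.g. the 527 AudioSet classes
                 input_tdim=input_tdim,
                 imagenet_pretrain=False)  # skip weight downloads for a smoke test
spec = torch.rand(1, input_tdim, 128)      # dummy (batch, time, 128 mel bins) spectrogram
logits = model(spec)                       # -> shape (1, 527), clip-level scores
```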

Read more

Variable Transformer Calculator with python

VASCO – VAriable tranSformer CalculatOr. Software that computes transformer quantities, written for the course “Conversão Eletromecânica de Energia I” (Electromechanical Energy Conversion I) of the Electrical Engineering program at the Universidade Federal de Santa Maria, Cachoeira do Sul campus (UFSM-CS). Authors: Arthur Cordeiro Andrade, João Gabriel Silva de Avellar. Dependencies: pip install pygame and pip install pyinstaller. Build command: pyinstaller --noconfirm --onefile --windowed --icon "Images/Cruz-De-Malta.ico" --add-data "Images;Images/" --add-data "Sounds;Sounds/" "./VASCO.py" GitHub https://github.com/ArthurCoAnd/VASCO

Read more

xpdt: eXPeditious Data Transfer

xpdt xpdt is (yet another) language for defining data types and generating code to serialize and deserialize them. It aims to produce code with little or no overhead and is based on fixed-length representations, which allow for zero-copy deserialization and (at-most-)one-copy writes (source to buffer). The generated C code, in particular, is highly optimized: it often eliminates data copying on writes and enables optimizations such as loop unrolling for fixed-length objects. This can lead to read speeds in excess of […]
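xpdt emits C, but the core zero-copy idea is easy to illustrate in Python: with a fixed-length record layout, fields can be decoded at known offsets in the source buffer without slicing or copying it. The 16-byte record below is a hypothetical layout, not an xpdt-generated type:

```python
import struct

# Hypothetical fixed-length record: u32 id, u32 flags, f64 value = 16 bytes.
REC = struct.Struct("<IId")

def iter_records(buf):
    """Decode records in place; memoryview avoids copying the source bytes."""
    mv = memoryview(buf)
    for off in range(0, len(mv), REC.size):
        yield REC.unpack_from(mv, off)

payload = REC.pack(1, 0, 3.14) + REC.pack(2, 1, 2.71)
print(list(iter_records(payload)))  # [(1, 0, 3.14), (2, 1, 2.71)]
```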

Read more

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

SViTE [Preprint] “Chasing Sparsity in Vision Transformers: An End-to-End Exploration” by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang. Extensive results on ImageNet with diverse ViT backbones validate the effectiveness of our proposals, which obtain significantly reduced computational cost and almost unimpaired generalization. Perhaps most surprisingly, we find that the proposed sparse (co-)training can even improve ViT accuracy rather than compromise it, making sparsity a tantalizing “free lunch”. For example, our sparsified DeiT-Small at (5%, […]
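SViTE learns its sparsity patterns end to end during training; as a much simpler static illustration of what an unstructured 5%-density weight mask means, one can threshold each linear layer by magnitude (a hypothetical helper, not the paper's method):

```python
import torch
import torch.nn as nn

def magnitude_mask(model: nn.Module, density: float = 0.05) -> None:
    """Zero all but the largest-magnitude `density` fraction of each Linear weight."""
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, nn.Linear):
                w = m.weight
                k = max(1, int(density * w.numel()))
                thresh = w.abs().flatten().topk(k).values.min()
                w.mul_((w.abs() >= thresh).float())

mlp = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 10))
magnitude_mask(mlp, density=0.05)  # ~95% of each weight matrix is now exactly zero
```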

Read more

A Unified Vision and Dialog Transformer with BERT

VD-BERT PyTorch code for the following paper at EMNLP 2020. Title: VD-BERT: A Unified Vision and Dialog Transformer with BERT [pdf]. Authors: Yue Wang, Shafiq Joty, Michael R. Lyu, Irwin King, Caiming Xiong, Steven C.H. Hoi. Institute: Salesforce Research and CUHK. Abstract: Visual dialog is a challenging vision-language task, where a dialog agent needs to answer a series of questions through reasoning on the image content and dialog history. Prior work has mostly focused on various attention mechanisms to model such intricate interactions. By contrast, in […]
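Concretely, VD-BERT packs the image and the entire multi-turn dialog into one input sequence for a single transformer. Below is a hedged sketch of the text side of that packing using the Hugging Face tokenizer; the real model prepends image region features as embeddings, which plain text cannot show, and the dialog strings are toy examples:

```python
from transformers import BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")

# Toy dialog: caption, (question, answer) history turns, and the current question.
caption = "a man riding a horse on a beach"
history = ["is it sunny ? yes it is", "are there other people ? no"]
question = "what color is the horse ?"

# One flat sequence: [CLS] caption [SEP] turn1 [SEP] turn2 [SEP] question [SEP]
text = " [SEP] ".join([caption] + history + [question])
enc = tok(text, return_tensors="pt")
print(enc["input_ids"].shape)  # single sequence the transformer attends over jointly
```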

Read more

Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations

TransFill-Reference-Inpainting This is the official repo for TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations (Yuqian Zhou, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi) at CVPR’21. For confidentiality reasons, we are not planning to release the training/testing code and models. An online demo will be made public once we set up the server. However, we release the testing dataset for comparison, along with the scripts to prepare the training dataset. Introduction Image inpainting is the task of plausibly restoring missing […]

Read more

A novel attention-based architecture for vision-and-language navigation

Episodic Transformers (E.T.) Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions. This code reproduces the results obtained with E.T. on the ALFRED benchmark. To learn more about the benchmark and the original code, please refer to the ALFRED repository. Quickstart Clone the repo: $ git clone https://github.com/alexpashevich/E.T..git ET $ export ET_ROOT=$(pwd)/ET $ export ET_LOGS=$ET_ROOT/logs $ export ET_DATA=$ET_ROOT/data $ export […]
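The core idea, a single encoder attending jointly over the instruction and the whole episode history, can be sketched in a few lines of plain PyTorch; the dimensions and layer counts below are placeholders, not E.T.'s actual configuration:

```python
import torch
import torch.nn as nn

d = 256
lang   = torch.randn(1, 12, d)  # embedded instruction tokens
frames = torch.randn(1, 30, d)  # embedded visual observations, one per time step
acts   = torch.randn(1, 30, d)  # embedded past actions, one per time step

# One transformer attends over all modalities at once.
layer = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
out = encoder(torch.cat([lang, frames, acts], dim=1))  # (1, 72, d) fused features
```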

Read more

CATs: Semantic Correspondence with Transformers

CATs Semantic Correspondence with Transformers. Our model CATs is illustrated below: git clone https://github.com/SunghwanHong/CATs cd CATs conda create -n CATs python=3.6 conda activate CATs pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html pip install -U scikit-image pip install git+https://github.com/albumentations-team/albumentations pip install tensorboardX termcolor timm tqdm requests pandas Download the pre-trained weights on Link. All datasets are automatically downloaded into the directory specified by the argument datapath. Result on SPair-71k (PCK 49.9%): python test.py --pretrained "/path_to_pretrained_model/spair" --benchmark spair Result on SPair-71k, feature backbone frozen: (PCK […]

Read more