Machine Translation Weekly 87: Notes from ACL 2021

The story of the science fiction novel Roadside Picnic by Arkady and Boris Strugatsky (mostly known via Tarkovsky’s 1979 film Stalker) takes place after an extraterrestrial event called the Visitation. Some aliens stopped by, made a roadside picnic, and left behind plenty of weird and dangerous objects having features that contemporary science cannot explain. Although the UN tries to prevent people from entering the visitation zones before everything gets fully explored and explained, objects from the zone are traded on […]

Read more

Build an end-end Currency Convertor chatbot with Python and Dialogflow

This article was published as a part of the Data Science Blogathon Introduction Hello all, Hope you are fine. In this tutorial we will learn how to create chatbots using Dialogflow and python, as well we will learn the deployment of chatbots to telegram. In our previous articles, we have learned to create a simple rule-based chatbot using simple python and NLTK libraries. I would like to request you to have a look at the article creating a simple chatbot […]

Read more

Sign Language Transformers (CVPR’20)

Sign Language Transformers (CVPR’20) This repo contains the training and evaluation code for the paper Sign Language Transformers: Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation. This code is based on Joey NMT but modified to realize joint continuous sign language recognition and translation. For text-to-text translation experiments, you can use the original Joey NMT framework. Requirements Download the feature files using the data/download.sh script. [Optional] Create a conda or python virtual environment. Install required packages using the […]

Read more

Self-Supervised Learning with Vision Transformers

Self-Supervised Learning with Vision Transformers By Zhenda Xie*, Yutong Lin*, Zhuliang Yao, Zheng Zhang, Qi Dai, Yue Cao and Han Hu This repo is the official implementation of “Self-Supervised Learning with Swin Transformers”. A important feature of this codebase is to include Swin Transformer as one of the backbones, such that we can evaluate the transferring performance of the learnt representations on down-stream tasks of object detection and semantic segmentation. This evaluation is usually not included in previous works due […]

Read more

Using VideoBERT to tackle video prediction

VideoBERT This repo reproduces the results of VideoBERT (https://arxiv.org/pdf/1904.01766.pdf). Inspiration was taken from https://github.com/MDSKUL/MasterProject, but this repo tackles video prediction rather than captioning and masked language modeling. On a side note, since this model is extremely small, the results that are displayed here are extremely basic. Feel free to increase the model size per your computational resources and change the inference file to include temperature if necessary (As of now I have not implemented temperature). Here are all the steps […]

Read more

Multi-Task Vision and Language Representation Learning

12-in-1: Multi-Task Vision and Language Representation Learning Please cite the following if you use this code. Code and pre-trained models for 12-in-1: Multi-Task Vision and Language Representation Learning: @InProceedings{Lu_2020_CVPR, author = {Lu, Jiasen and Goswami, Vedanuj and Rohrbach, Marcus and Parikh, Devi and Lee, Stefan}, title = {12-in-1: Multi-Task Vision and Language Representation Learning}, booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2020} } and ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for […]

Read more

Vision-Language Pre-training on Fashion Domain

Kaleido-BERT: Vision-Language Pre-training on Fashion Domain Mingchen Zhuge*, Dehong Gao*, Deng-Ping Fan#, Linbo Jin, Ben Chen, Haoming Zhou, Minghui Qiu, Ling Shao. Introduction We present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which introduces a novel kaleido strategy for fashion cross-modality representations from transformers. In contrast to random masking strategy of recent VL models, we design alignment guided masking to jointly focus more on image-text semantic relations.To this end, we carry out five novel tasks, ie, rotation, jigsaw, camouflage, […]

Read more

A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

ALFRED ALFRED (Action Learning From Realistic Environments and Directives), is a new benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. Long composition rollouts with non-reversible state changes are among the phenomena we include to shrink the gap between research benchmarks and real-world applications. What more? Checkout ALFWorld – interactive TextWorld environments for ALFRED scenes! Quickstart Clone repo: $ git clone https://github.com/askforalfred/alfred.git alfred $ export ALFRED_ROOT=$(pwd)/alfred Install requirements: $ virtualenv […]

Read more

DeLighT: Very Deep and Light-weight Transformers

DeLighT: Very Deep and Light-weight Transformers This repository contains the source code of our work on building efficient sequence models: DeFINE (ICLR’20) and DeLighT (preprint). Overview In this repository, we share the source code of our paper DeLight, that delivers similar or better performance thantransformer-based models with significantly fewer parameters. DeLighT more efficiently allocates parameters both (1)within each Transformer block using DExTra, a deep and light-weight transformation and (2) across blocks usingblock-wise scaling, that allows for shallower and narrower DeLighT […]

Read more
1 517 518 519 520 521 927