TOOD: Task-aligned One-stage Object Detection

TOOD: Task-aligned One-stage Object Detection (ICCV 2021 Oral) One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks. In this work, we propose a Task-aligned One-stage Object Detection (TOOD) that explicitly aligns the two tasks in a learning-based manner. First, we design a novel Task-aligned Head (T-Head) which offers a better balance between […]

Read more

Multi-Task Vision and Language Representation Learning

12-in-1: Multi-Task Vision and Language Representation Learning Code and pre-trained models for 12-in-1: Multi-Task Vision and Language Representation Learning. Please cite the following if you use this code:

@InProceedings{Lu_2020_CVPR,
  author    = {Lu, Jiasen and Goswami, Vedanuj and Rohrbach, Marcus and Parikh, Devi and Lee, Stefan},
  title     = {12-in-1: Multi-Task Vision and Language Representation Learning},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}
}

and ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for […]

Read more

A technology that adds computer-generated virtual content to real-world views through devices

Augmented Reality 101 The development of areas such as computer vision, image processing, and computer graphics allows the introduction of technologies such as Augmented Reality. Azuma defines Augmented Reality as "a technology that adds computer-generated virtual content to real-world views through devices". Introduction The purpose of this map is to give you an idea of Augmented Reality and to guide you through the main features that surround this technology. Read the complete post in AR 101: Augmented Reality. Definition and […]

Read more

A Python-based SDK for multi-human pose estimation through an RGB webcam

PoseCamera PoseCamera is a Python-based SDK for multi-human pose estimation through an RGB webcam. Install Install the posecamera package through pip: pip install posecamera If you are having issues with the installation on Windows, check this page. Usage See the Google Colab notebook https://colab.research.google.com/drive/18uoYeKmliOFV8dTdOrXocClCA7nTwRcX?usp=sharing Draw pose keypoints on an image:

import posecamera
import cv2

det = posecamera.pose_tracker.PoseTracker()
image = cv2.imread("example.jpg")
pose = det(image)
for name, (y, x, score) in pose.keypoints.items():
    cv2.circle(image, (int(x), int(y)), 4, (255, 0, 0), -1)
cv2.imshow("PoseCamera", image)
cv2.waitKey(0) […]

Read more

Global Filter Networks for Image Classification

GFNet Created by Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie Zhou This repository contains PyTorch implementation for GFNet. Global Filter Networks is a transformer-style architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity. Our architecture replaces the self-attention layer in vision transformers with three key operations: a 2D discrete Fourier transform, an element-wise multiplication between frequency-domain features and learnable global filters, and a 2D inverse Fourier transform. Global Filter Layer GFNet is a conceptually […]
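The three operations described above (2D FFT, element-wise multiplication with learnable global filters, 2D inverse FFT) can be sketched in PyTorch. This is a minimal illustration of the idea, not the authors' exact implementation: the input layout (batch, height, width, channels), the use of a real-valued FFT, and the filter initialization scale are all assumptions here.

```python
import torch
import torch.nn as nn


class GlobalFilterLayer(nn.Module):
    """Minimal sketch of a global filter layer for input of shape (B, H, W, C)."""

    def __init__(self, height, width, channels):
        super().__init__()
        # Learnable complex-valued global filter in the frequency domain,
        # stored as (real, imag) pairs. rfft2 keeps only width // 2 + 1
        # frequencies along the last spatial dimension.
        self.filter = nn.Parameter(
            torch.randn(height, width // 2 + 1, channels, 2) * 0.02
        )

    def forward(self, x):
        # 1) 2D discrete Fourier transform over the spatial dimensions
        freq = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")
        # 2) element-wise multiplication with the learnable global filter
        freq = freq * torch.view_as_complex(self.filter)
        # 3) 2D inverse Fourier transform back to the spatial domain
        return torch.fft.irfft2(freq, s=x.shape[1:3], dim=(1, 2), norm="ortho")
```

Because the filter multiplies every frequency component independently, each output position depends on all input positions, which is how the layer captures long-range spatial interactions at FFT (log-linear) cost rather than the quadratic cost of self-attention.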

Read more

Text detection from images using EasyOCR: Hands-on guide

# Changing the image path
IMAGE_PATH = 'Turkish_text.png'
# Same code as before, just changing the language list from ['en'] to ['tr']
reader = easyocr.Reader(['tr'])
result = reader.readtext(IMAGE_PATH, paragraph=False)
result

Output: [[[[89, 7], [717, 7], [717, 108], [89, 108]], 'Most Common Texting Slang in Turkish'], [[[392, 234], [446, 234], [446, 260], [392, 260]], 'test'], [[[353, 263], [488, 263], [488, 308], [353, 308]], 'yazmak'], [[[394, 380], [446, 380], [446, 410], [394, 410]], 'link'], [[[351, 409], [489, 409], [489, 453], [351, 453]], 'bağlantı'], [[[373, 525], […]

Read more

A service for quick deploying and using dockerized Computer Vision models

Inferoxy Inferoxy is a service for quickly deploying and using dockerized Computer Vision models. It's the core of EORA's Computer Vision platform Vision Hub, which runs on top of AWS EKS. Why use it? You should use it if you want to simplify deploying Computer Vision models with an appropriate Data Science stack to production: all you need to do is build a Docker image with your model, including any pre- and post-processing steps, and push it into an accessible registry […]

Read more

SRA’s seminar on Introduction to Computer Vision Fundamentals

Pixels_Seminar SRA's seminar on Introduction to Computer Vision Fundamentals Introduction to Computer Vision This repository covers the basics of: Python; NumPy (a Python library); Git; and Computer Vision. The aim of this repository is to provide: a brief idea of the algorithms involved in Computer Vision; an introduction to version control with Git and GitHub; Computer Vision and Image Processing basics, with implementations of various algorithms using NumPy (instead of a dedicated image-processing library like OpenCV); an introduction to a […]

Read more

A voice assistant which can be used to interact with your computer

J.A.R.V.I.S It is a voice assistant that can be used to interact with your computer. You may have seen it in the Iron Man movies, but this JARVIS is not as advanced as the one shown on screen. API keys To run this project you need an API key for reading news. Register for your API key by clicking the following Installation First fork this repository, then clone it to your local system: git clone […]

Read more