Exploring how context, culture, and character matter in avatar research

This research paper was presented at the IEEE VR Workshop Series on Animation in Virtual and Augmented Environments (ANIVAE 2024), the premier series on 3D content creation for simulated training in extended reality. Face-to-face communication is changing, moving beyond physical interaction to include video conferencing and AR/VR platforms where participants are represented by avatars. Sophisticated avatars, animated through motion tracking, can realistically […]

Read more

An all-in-one toolkit for computer vision

Introduction EasyCV is an all-in-one computer vision toolbox based on PyTorch, mainly focusing on self-supervised learning, image classification, metric learning, object detection, and so on. Major features SOTA SSL Algorithms EasyCV provides state-of-the-art self-supervised learning algorithms based on contrastive learning, such as SimCLR, MoCo v2, SwAV, and DINO, as well as MAE, which is based on masked image modeling. We also provide standard benchmark tools for SSL model evaluation. Vision Transformers EasyCV aims to provide plenty of vision transformer models trained either using supervised learning […]
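For orientation, here is a minimal PyTorch sketch of the NT-Xent contrastive loss that SimCLR-style self-supervised methods optimize. It is a generic illustration of the idea, not EasyCV's own API or implementation.

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    # z1, z2: [N, D] embeddings of two augmented views of the same N images.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2N, D], unit norm
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # a sample is never its own positive
    n = z1.size(0)
    # The positive for view i is the other view of the same image, at index (i + n) mod 2n.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

loss = nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128))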

Read more

The first released system for complex meter detection and recognition, implemented with computer vision techniques

This is the first released system for detection and recognition of complex meters in the wild. The system can be divided into three modules. First, a YOLO-based detector is applied to obtain the pure meter region. Second, a spatial transformer module is established to rectify the position of the meter. Last, an end-to-end network reads the meter values, implemented via pointer/dial prediction and key number learning. Visualization results: the left row is the original image, the middle row is the process of […]
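The three modules described above compose into a simple pipeline. The sketch below shows the wiring under assumed interfaces (the detector returns boxes; the rectifier and reader operate on crops); the names are hypothetical stand-ins, not the released system's code.

import torch.nn as nn

def crop_regions(image, boxes):
    # image: [C, H, W]; boxes: iterable of (x1, y1, x2, y2) pixel coordinates.
    return [image[:, y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

class MeterReader(nn.Module):
    # Schematic detect -> rectify -> read pipeline; detector, rectifier, and reader
    # are placeholder modules standing in for the YOLO-based detector, the spatial
    # transformer, and the pointer/dial reading network described above.
    def __init__(self, detector, rectifier, reader):
        super().__init__()
        self.detector, self.rectifier, self.reader = detector, rectifier, reader

    def forward(self, image):
        boxes = self.detector(image)                  # pure meter regions
        readings = []
        for region in crop_regions(image, boxes):
            rectified = self.rectifier(region)        # rectify the meter's position
            readings.append(self.reader(rectified))   # predict the meter value
        return readings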

Read more

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

This is the official code repository for A Simple Long-Tailed Recognition Baseline via Vision-Language Model. Requirements: Python 3, PyTorch (1.7.1 recommended), yaml, and other necessary packages. Datasets: ImageNet_LT and Places_LT. Download ImageNet_2014 and Places_365, then modify data_root in main.py to point to your own dataset path. Training Phase A: python main.py --cfg ./config/ImageNet_LT/clip_A_rn50.yaml Phase B: python main.py --cfg ./config/ImageNet_LT/clip_B_rn50.yaml Testing […]
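As a point of reference, the snippet below sketches plain zero-shot classification with a CLIP RN50 backbone, the kind of vision-language model the baseline builds on. It is not the repository's Phase A/B training pipeline, and the class names are illustrative.

import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

class_names = ["golden retriever", "tabby cat", "sports car"]  # illustrative labels
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    text_features = model.encode_text(text)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    # In practice, image = preprocess(pil_image).unsqueeze(0); random data keeps this self-contained.
    image = torch.randn(1, 3, 224, 224, device=device)
    image_features = model.encode_image(image)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    logits = 100.0 * image_features @ text_features.T
    prediction = class_names[logits.argmax(dim=-1).item()]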

Read more

LeafSnap replicated using deep neural networks to test accuracy compared to traditional computer vision methods

Convolutional neural networks have recently become very popular for image tasks such as image classification, largely due to Krizhevsky et al. and their famous paper ImageNet Classification with Deep Convolutional Neural Networks. Famous models such as AlexNet, VGG-16, and ResNet-50 have scored state-of-the-art results on image classification datasets such as ImageNet and CIFAR-10. We present an application of CNNs to the task of classifying trees by images of their leaves; specifically, all 185 types of trees […]
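A minimal sketch of the transfer-learning setup this kind of replication typically uses: an ImageNet-pretrained ResNet-50 from torchvision with its final layer replaced for the 185 species. Data loading and the training loop are omitted, and the details are assumptions rather than the authors' exact configuration.

import torch.nn as nn
from torchvision import models

NUM_SPECIES = 185  # number of tree species to classify

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)  # new classification head

# Optionally freeze the pretrained backbone and train only the new head first.
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False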

Read more

CvT: Introducing Convolutions to Vision Transformers

convolution-vision-transformers PyTorch implementation of CvT: Introducing Convolutions to Vision Transformers; for the official repo please visit here. Usage: img = torch.ones([1, 3, 224, 224]) model = CvT(224, 3, 1000) parameters = filter(lambda p: p.requires_grad, model.parameters()) parameters = sum([np.prod(p.size()) for p in parameters]) / 1_000_000 print('Trainable Parameters: %.3fM' % parameters) out = model(img) print("Shape of out :", out.shape) # [B, num_classes] Citation: @misc{wu2021cvt, title={CvT: Introducing Convolutions to Vision Transformers}, author={Haiping Wu and Bin Xiao and Noel Codella and Mengchen Liu and […]
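For convenience, here is the same usage snippet in self-contained form, with the imports it needs spelled out. The import path for the CvT class is an assumption and may differ in this repository.

import torch
import numpy as np
from cvt import CvT  # assumed module path; adjust to match this repo's layout

img = torch.ones([1, 3, 224, 224])
model = CvT(224, 3, 1000)  # image size, input channels, number of classes

# Count trainable parameters, reported in millions.
trainable = filter(lambda p: p.requires_grad, model.parameters())
n_params = sum(np.prod(p.size()) for p in trainable) / 1_000_000
print('Trainable Parameters: %.3fM' % n_params)

out = model(img)
print('Shape of out :', out.shape)  # [B, num_classes]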

Read more

TOOD: Task-aligned One-stage Object Detection

TOOD TOOD: Task-aligned One-stage Object Detection (ICCV 2021 Oral) One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks. In this work, we propose a Task-aligned One-stage Object Detection (TOOD) that explicitly aligns the two tasks in a learning-based manner. First, we design a novel Task-aligned Head (T-Head) which offers a better balance between […]
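The alignment idea can be made concrete with the task alignment metric from the paper, t = s^α · u^β, which scores a prediction by combining its classification score s for the ground-truth class with its IoU u against the ground-truth box. The sketch below is illustrative, and the exponent defaults are assumptions rather than the repository's settings.

import torch

def task_alignment_metric(cls_scores, ious, alpha=1.0, beta=6.0):
    # t = s^alpha * u^beta: high only when a prediction is good at *both*
    # classification (s) and localization (u). alpha/beta here are assumed defaults.
    return cls_scores.pow(alpha) * ious.pow(beta)

# An anchor with a high score but poor IoU (or the reverse) gets a low metric,
# so training emphasis shifts toward anchors that are well aligned on both tasks.
t = task_alignment_metric(torch.tensor([0.9, 0.4]), torch.tensor([0.3, 0.8]))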

Read more

Multi-Task Vision and Language Representation Learning

12-in-1: Multi-Task Vision and Language Representation Learning Please cite the following if you use this code. Code and pre-trained models for 12-in-1: Multi-Task Vision and Language Representation Learning: @InProceedings{Lu_2020_CVPR, author = {Lu, Jiasen and Goswami, Vedanuj and Rohrbach, Marcus and Parikh, Devi and Lee, Stefan}, title = {12-in-1: Multi-Task Vision and Language Representation Learning}, booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2020} } and ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for […]

Read more