July 29, 2021 Machine Learning

End-to-End Pre-training for Vision-Language Representation Learning

Seeing Out of tHe bOx

End-to-End Pre-training for Vision-Language Representation Learning [CVPR’21, Oral]
By Zhicheng Huang*, Zhaoyang Zeng*, Yupan Huang*, Bei Liu, Dongmei Fu and Jianlong Fu

arxiv: https://arxiv.org/pdf/2104.03135.pdf

This is the official implementation of the paper. In it, we propose SOHO ("See Out of tHe bOx"), which takes a whole image as input and learns vision-language representations in an end-to-end manner. SOHO does not require bounding box annotations, which makes inference 10 times faster than region-based approaches.

Architecture

[Figure: Seeing Out of tHe bOx (SOHO) architecture]

Release Progress

Installation

conda create -n soho python=3.7

conda activate soho

conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge

git clone https://github.com/NVIDIA/apex.git

cd apex

python setup.py install --cuda_ext --cpp_ext

cd ../ && rm -rf apex

git clone https://github.com/researchmm/soho.git

cd
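Once the environment steps above have run, a quick sanity check can confirm that the pinned packages are importable before going further. The following is a minimal sketch, not part of the SOHO repository: the helper name check_environment and the EXPECTED table are hypothetical, and the version strings are simply the ones pinned in the conda command above. It assumes each package exposes a __version__ attribute, which may not hold for every build (apex in particular is only checked for importability).

```python
import importlib
import importlib.util

# Versions copied from the conda command above. apex is built from source
# there without a pinned version, so only its presence is checked.
EXPECTED = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "torchaudio": "0.8.0",
    "apex": None,
}


def check_environment(expected=EXPECTED):
    """Return {package: (status, version)} for each top-level package name.

    Hypothetical helper for illustration; not part of SOHO.
    """
    report = {}
    for name, wanted in expected.items():
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            report[name] = ("missing", None)
            continue
        module = importlib.import_module(name)
        found = getattr(module, "__version__", None)
        # Builds may carry a local suffix such as "+cu111"; ignore it.
        base = found.split("+")[0] if isinstance(found, str) else None
        if wanted is None or base is None or base == wanted:
            status = "ok"
        else:
            status = "version mismatch"
        report[name] = (status, found)
    return report


if __name__ == "__main__":
    for pkg, (status, version) in check_environment().items():
        print(f"{pkg:12s} {status:18s} {version or '-'}")
```

Run inside the activated soho environment, this prints one line per package. A "missing" entry for apex would suggest the setup.py build step did not complete.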
 
 

 