September 25, 2020 Python

Python for NLP: Tokenization, Stemming, and Lemmatization with SpaCy Library

python_tutorials

In the previous article, we started our discussion about how to do natural language processing with Python. We saw how to read and write text and PDF files. In this article, we will start working with the spaCy library to perform a few more basic NLP tasks such as tokenization, stemming and lemmatization.

Introduction to spaCy

Along with NLTK, spaCy is one of the most popular NLP libraries. The basic difference between the two is that NLTK provides a wide variety of algorithms for each problem, whereas spaCy provides only one algorithm per problem: the one its developers consider the best.

NLTK was released back in 2001, while spaCy is relatively new: it was first released in 2015. In this series of articles on NLP, we will mostly be dealing with spaCy, owing to its state-of-the-art nature. However, we will also touch on NLTK when a task is easier to perform with NLTK than with spaCy.
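Stemming is one such task. spaCy does not include a stemmer, while NLTK ships several stemming algorithms, which also illustrates the "many algorithms versus one" difference between the two libraries. The following is a minimal sketch, assuming NLTK is installed (for example with pip install nltk); these stemmers need no extra NLTK data downloads. The helper name compare_stemmers is hypothetical and used only for illustration.

```python
# Sketch: compare three of NLTK's stemming algorithms on the same words.
# Assumes NLTK is installed; spaCy itself has no stemmer component.
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer


def compare_stemmers(words):
    """Return {word: {stemmer_name: stem}} (compare_stemmers is a hypothetical helper)."""
    stemmers = {
        "porter": PorterStemmer(),
        "snowball": SnowballStemmer("english"),
        "lancaster": LancasterStemmer(),
    }
    # Apply every stemmer to every word so their outputs can be compared side by side.
    return {
        word: {name: stemmer.stem(word) for name, stemmer in stemmers.items()}
        for word in words
    }


if __name__ == "__main__":
    for word, stems in compare_stemmers(["running", "generously", "flies"]).items():
        print(word, stems)
```

Porter and Snowball both reduce "running" to "run", but they can disagree: Porter cuts "generously" down to "gener", while Snowball keeps "generous". Stems like "gener" are not dictionary words, which is the gap lemmatization is meant to fill.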

Installing spaCy

If you use the pip installer to install your Python libraries, go to the command line and execute the following statement:

pip install -U spacy

To finish reading, please visit source site
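Once spaCy is installed, a quick way to confirm that it works is to tokenize a sentence. The following is a minimal sketch, assuming spaCy is installed (for example with pip install -U spacy). It uses spacy.blank("en"), which creates an English pipeline containing only the rule-based tokenizer, so no trained language model has to be downloaded. The helper name tokenize is illustrative and not part of spaCy's API.

```python
# Sketch: tokenization with spaCy's English tokenizer (assumes spaCy is installed).
# spacy.blank("en") has a tokenizer but no trained components, so no model download is needed.
import spacy

# Build the blank pipeline once and reuse it for every call.
nlp = spacy.blank("en")


def tokenize(text):
    """Return the token strings spaCy produces for text (tokenize is a hypothetical helper)."""
    doc = nlp(text)
    return [token.text for token in doc]


if __name__ == "__main__":
    print(spacy.__version__)
    print(tokenize("Let's go to N.Y.!"))
```

spaCy splits "Let's" into "Let" and "'s" but keeps "N.Y." together as a single token, because its English tokenizer has special-case rules for contractions and abbreviations. The blank pipeline does not produce lemmas; in practice those usually come from a trained pipeline such as en_core_web_sm.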