Hugging Face Releases New NLP ‘Tokenizers’ Library Version (v0.8.0)

Hugging Face is at the forefront of a lot of updates in the NLP space. They have released one groundbreaking NLP library after another in the last few years. Honestly, I have learned and improved my own NLP skills a lot thanks to the work open-sourced by Hugging Face.

And today, they’ve released another big update: a brand new version of their popular Tokenizers library.

A Quick Introduction to Tokenization

So, what is tokenization? Tokenization is a crucial cog in Natural Language Processing (NLP). It’s a fundamental step in both traditional NLP methods like count vectorizers and advanced deep learning-based architectures like Transformers.

Tokens are the building blocks of Natural Language.

Tokenization is a way of separating a piece of text into smaller units called tokens. Here, tokens can be words, characters, or subwords. Hence, tokenization can be broadly classified into word, character, and subword tokenization.
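
To make the three kinds of tokens concrete, here is a minimal sketch in plain Python. It does not use the Hugging Face Tokenizers library or its API. The function names, the toy vocabulary, and the "##" continuation prefix are all assumptions made for illustration, and the subword splitter is a simple greedy longest-match routine rather than a trained model.

```python
import re


def word_tokenize(text):
    # Words are runs of word characters; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokenize(text):
    # Every character becomes a token.
    return list(text)


def subword_tokenize(word, vocab):
    # Greedy longest-match-first split against a fixed vocabulary.
    # Pieces that continue a word carry a "##" prefix (an illustrative convention).
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                break
            end -= 1
        if end == start:
            # No vocabulary entry matches at this position.
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen only for this example.
TOY_VOCAB = {"token", "##ization", "##s", "smart", "##er"}

words = word_tokenize("Tokenization is fun!")
chars = char_tokenize("fun")
subwords = subword_tokenize("tokenization", TOY_VOCAB)

print(words)     # ['Tokenization', 'is', 'fun', '!']
print(chars)     # ['f', 'u', 'n']
print(subwords)  # ['token', '##ization']
```

Notice the trade-off: word tokens are few but need a large vocabulary, character tokens need a tiny vocabulary but produce long sequences, and subwords sit in between.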

 

 

 
