Part 6: Step by Step Guide to Master NLP – Word2Vec

This article was published as a part of the Data Science Blogathon. Introduction: This article is part of an ongoing blog series on Natural Language Processing (NLP). In the previous article of this series, we completed the statistical or frequency-based word embedding techniques, which predate the modern word embedding era. In this article, we will discuss the more recent word embedding techniques. NOTE: There are many such recent techniques, but in this article we will discuss only Word2Vec […]

Regex Cheatsheet For Natural Language Processing tasks

This article was published as a part of the Data Science Blogathon. Introduction: Regex is shorthand for Regular Expression. It is a representation of a set of strings. Say we have a list of emails and want to check whether they are in the correct format. One way is to check every email manually, but that is not feasible when the number of emails is large. This is where regex comes to the rescue. […]
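The email check described in this excerpt can be sketched in a few lines of Python. This is only an illustration: the pattern and the names `EMAIL_PATTERN` and `is_valid_email` are assumptions made for this example and are not taken from the article, and the pattern is deliberately simple rather than a complete validator.

```python
import re

# Deliberately simple pattern, assumed for illustration only.
# It is not a full RFC 5322 validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def is_valid_email(address: str) -> bool:
    """Return True if the whole string looks like an email address."""
    return EMAIL_PATTERN.fullmatch(address) is not None


# Hypothetical sample data: check every address in one pass
emails = [
    "alice@example.com",
    "bob@example",
    "carol.smith@mail.example.org",
    "not an email",
]
results = {email: is_valid_email(email) for email in emails}

for email, ok in results.items():
    print(f"{email!r}: {'valid' if ok else 'invalid'}")
```

Using `fullmatch` rather than `search` means the entire string has to fit the pattern, so stray text before or after an address is rejected.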

Part 13: Step by Step Guide to Master NLP – Regular Expressions

This article was published as a part of the Data Science Blogathon. Introduction: This article is part of an ongoing blog series on Natural Language Processing (NLP). From this article, we will start our discussion on Regular Expressions. When a data scientist comes across a text processing problem, whether it is searching for titles in names or dates of birth in a dataset, regular expressions rear their ugly head very frequently. They form part of the basic techniques in NLP and […]

Part 2: Topic Modeling and Latent Dirichlet Allocation (LDA) using Gensim and Sklearn

This article was published as a part of the Data Science Blogathon. Introduction: In the previous article, we started by understanding the basic terminology of text in Natural Language Processing (NLP), what topic modeling is, its applications, the types of models, and the different topic modeling techniques available. Let’s continue from there and explore Latent Dirichlet Allocation (LDA), how LDA works, and its similarity to another very popular dimensionality reduction technique called Principal Component Analysis (PCA). Table of Contents A Little […]

Topic modeling With Naive Bayes Classifier

This article was published as a part of the Data Science Blogathon. Introduction: Naive Bayes is a powerful tool that leverages Bayes’ Theorem to understand and mimic complex data structures. In recent years, it has commonly been used for Natural Language Processing (NLP) tasks, such as text categorization. Today, we will be constructing a Naive Bayes text classifier for topic categorization. Before we move forward with the explanation, I want to emphasize that Naive Bayes is not the traditional method of […]

A Python library for processing and analysis of electron backscatter diffraction patterns

kikuchipy kikuchipy is an open-source Python library for processing and analysis of electron backscatter diffraction (EBSD) patterns. The library builds on the tools for multi-dimensional data analysis provided by the HyperSpy library. User guide and API reference: https://kikuchipy.org. The guide consists of Jupyter Notebooks with many links to detailed explanations of the input parameters and output of functions and class methods (the API reference). The notebooks can be inspected statically on the web page or via nbviewer, downloaded and run […]

A python library with tools for the Molecular Simulation

This package is a Python library with tools for the Molecular Simulation – Software Gromos. It allows you to easily set up, manage and analyze simulations in Python. General information about functions can be found in our wiki, and usage examples for many general functions and their relations are shown in Jupyter notebooks in the examples folder. Content GROMOS wrappers GromosXX wrapper: for simulation execution GromosPP wrapper: for GROMOS++ program usage File handling of all GROMOS file […]

ETL pipeline on movie data using Python and PostgreSQL

Movies-ETL: ETL pipeline on movie data using Python and PostgreSQL. Overview: This project consisted of an automated Extraction, Transformation and Load pipeline. The ETL extracted movie data from Wikipedia, Kaggle, and MovieLens, then cleaned, transformed, and merged it using Pandas. The product was a merged table of movies and ratings loaded into PostgreSQL. Resources: Data sources: movies_metadata.csv, ratings.csv, wikipedia_movies.json. Software: Python, PostgreSQL, Pandas, SQLAlchemy, Regular Expressions. Results Summary: The pipeline was created under the following assumptions: I was […]

A Brain Tumor Detection and Classification model built using RESNET50 architecture

TumorInsight: TumorInsight is a Brain Tumor Detection and Classification model built using the RESNET50 architecture. It aims to detect and classify brain tumours from MRI scans. Detection is done using image processing algorithms, and classification using deep learning techniques. The model is also deployed as a web application using the Flask framework. […]

A simple universal code generation tool in python

Żmija: Żmija is a simple universal code generation tool. It is intended to be used as a means to generate code that is both efficient and easily maintainable. It is intended for embedded systems with limited resources, but it can be used anywhere else as well. Żmija lets you define sections in your code where code is generated automatically in accordance with a Python script that you provide. Such a section typically looks like this: /* ~ZMIJA.GENERATOR: […]
