Apache Kafka + KSQL + TensorFlow for Data Scientists via Python + Jupyter Notebook

Why would a data scientist use Kafka Jupyter Python KSQL TensorFlow all together in a single notebook? There is an impedance mismatch between model development using Python and its Machine Learning tool stack and a scalable, reliable data platform. The former is what you need for quick and easy prototyping to build analytic models. The latter is what you need to use for data ingestion, preprocessing, model deployment and monitoring at scale. It requires low latency, high throughput, zero data […]

Read more

Visually Explained: Three Excel Core-Features Even Excel-Pros Don’t Know

Over the last few years, Excel has been redesigned from the ground up. Currently, Microsoft is making the new Excel core-features available to every user, regardless of your Office 365 license. Thanks to the Microsoft naming conventions, it is easy to confuse the new features with existing ones. That being said, Power Query and Power Pivot are not the same things as Pivot Tables, which you have likely been using for years. Power Query (M-Language)Data preparation is very time-consuming. Power […]

Read more

Starting to develop in PySpark with Jupyter installed in a Big Data Cluster

Is not a secret that Data Science tools like Jupyter, Apache Zeppelin or the more recently launched Cloud Data Lab and Jupyter Lab are a must be known for the day by day work so How could be combined the power of easily developing models and the capacity of computation of a Big Data Cluster? Well in this article I will share very simple step to start using Jupyter notebooks for PySpark in a Data Proc Cluster in GCP. Final goal Prerequisites 1. Have a Google Cloud […]

Read more