Data pipeline architecture for onboarding public datasets to Datasets for Google Cloud
Public Datasets Pipelines Cloud-native, data pipeline architecture for onboarding public datasets to Datasets for Google Cloud. We use Pipenv to make environment setup more deterministic and uniform across different machines. If you haven’t done so, install Pipenv using the instructions found here. Now with Pipenv installed, run the following command: pipenv install –ignore-pipfile –dev This uses the Pipfile.lock found in the project root and installs all the development dependencies. Finally, initialize the Airflow database: pipenv run airflow initdb Configuring, generating, […]
Read more