September 17, 2021 AWS

An AWS Professional Services open source initiative

Pandas on AWS

Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON and Excel).

Quick Start

Installation command: pip install awswrangler

For platforms without PyArrow 3 support (e.g. EMR, Glue PySpark Job, MWAA):
pip install pyarrow==2 awswrangler
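
After installing, it can help to confirm which versions actually ended up in the environment, especially when PyArrow was pinned. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of awswrangler.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the package itself.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the two packages named in the install commands above.
    for name in ("awswrangler", "pyarrow"):
        print(name, installed_version(name))
```

On a platform without PyArrow 3 support, the reported `pyarrow` version would be expected to start with `2.` after the pinned install.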

import awswrangler as wr
import pandas as pd
from datetime import datetime

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Storing data on Data Lake
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/dataset/",
    dataset=True,
    database="my_db",
    table="my_table"
)

# Retrieving the data directly from Amazon S3
df = wr.s3.read_parquet("s3://bucket/dataset/", dataset=True)

# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

# Get a Redshift connection from Glue Catalog and retrieve data from Redshift Spectrum
con = wr.redshift.connect("my-glue-connection")
df = wr.redshift.read_sql_query("SELECT * FROM external_schema.my_table", con=con)
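
The Redshift example opens a connection, and the excerpt ends before showing it being closed. One way to make sure the connection is always released, even if the query raises, is sketched below. It assumes the connection object exposes a `close()` method, as DB-API style connections do; the helper `query_with_cleanup` and its parameters are hypothetical names for illustration, not part of awswrangler.

```python
from contextlib import closing


def query_with_cleanup(connect, run_query, sql):
    """Open a connection, run one query, and always close the connection.

    connect   -- zero-argument callable returning an object with close()
    run_query -- callable taking (sql, con) and returning the result
    sql       -- the query text
    """
    # closing() calls con.close() on exit, including when run_query raises.
    with closing(connect()) as con:
        return run_query(sql, con)


# Hypothetical usage with the calls from the Quick Start
# (requires AWS credentials and a Glue connection, so it is left commented out):
#
# import awswrangler as wr
# df = query_with_cleanup(
#     lambda: wr.redshift.connect("my-glue-connection"),
#     lambda sql, con: wr.redshift.read_sql_query(sql, con=con),
#     "SELECT * FROM external_schema.my_table",
# )
```

Passing the connect and query steps as callables keeps the helper independent of any one database module, so the same pattern would apply to the other connectors listed above, assuming they return connections with `close()`.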
 
 

 
To finish reading, please visit source site


		
		
	
