Pandas on AWS
Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON and Excel).
An AWS Professional Service open source initiative | [email protected]
| Source | Installation Command |
|---|---|
| PyPi | pip install awswrangler |
| Conda | conda install -c conda-forge awswrangler |
⚠️ Starting version 3.0, optional modules must be installed explicitly:

➡️ pip install 'awswrangler[redshift]'
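Multiple extras can be combined in a single command. For example (extra names beyond redshift, such as mysql and opensearch, are assumed here to match the package's published extras based on the module list above):

➡️ pip install 'awswrangler[redshift,mysql,opensearch]'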
Quick Start | At Scale | Read The Docs | Getting Help | Logging
Quick Start

Installation command: pip install awswrangler

⚠️ Starting version 3.0, optional modules must be installed explicitly:

➡️ pip install 'awswrangler[redshift]'
```python
import awswrangler as wr
import pandas as pd
from datetime import datetime

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Storing data on the Data Lake
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/dataset/",
    dataset=True,
    database="my_db",
    table="my_table",
)

# Retrieving the data directly from Amazon S3
df = wr.s3.read_parquet("s3://bucket/dataset/", dataset=True)

# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

# Get a Redshift connection from the Glue Catalog and retrieve data from Redshift Spectrum
con = wr.redshift.connect("my-glue-connection")
df = wr.redshift.read_sql_query("SELECT * FROM external_schema.my_table", con=con)
con.close()

# Amazon Timestream Write
df = pd.DataFrame({
    "time": [datetime.now(), datetime.now()],
    "my_dimension": ["foo", "boo"],
    "measure": [1.0, 1.1],
})
rejected_records = wr.timestream.write(
    df,
    database="sampleDB",
    table="sampleTable",
    time_col="time",
    measure_col="measure",
    dimensions_cols=["my_dimension"],
)

# Amazon Timestream Query
wr.timestream.query("""
SELECT time, measure_value::double, my_dimension
FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
""")
```
At Scale

AWS SDK for pandas can also run your workflows at scale by leveraging Modin and Ray. Both projects aim to speed up data workloads by distributing processing over a cluster of workers.

Read our docs or head to our latest tutorials to learn more.

⚠️ Ray is currently not available for Python 3.12. While AWS SDK for pandas supports Python 3.12, it cannot be used at scale.
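As an illustration, here is a minimal sketch of the distributed mode, assuming the modin and ray extras are installed and following the pattern from tutorials 034-035; the bucket path is a placeholder:

```python
# Minimal sketch: distributed awswrangler, assuming the "modin" and "ray"
# extras are installed (pip install 'awswrangler[modin,ray]').
import awswrangler as wr

# With Ray and Modin installed, awswrangler selects a distributed engine
# and memory format automatically; these accessors report what is active.
print(f"Execution engine: {wr.engine.get()}")
print(f"Memory format: {wr.memory_format.get()}")

# S3 reads are then distributed over the Ray workers and return a Modin
# DataFrame (s3://bucket/large-dataset/ is a placeholder path).
df = wr.s3.read_parquet(path="s3://bucket/large-dataset/")
```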
Read The Docs

What is AWS SDK for pandas?
Install
PyPi (pip)
Conda
AWS Lambda Layer
AWS Glue Python Shell Jobs
AWS Glue PySpark Jobs
Amazon SageMaker Notebook
Amazon SageMaker Notebook Lifecycle
EMR
From source
At scale
Getting Started
Supported APIs
Resources
Tutorials
001 - Introduction
002 - Sessions
003 - Amazon S3
004 - Parquet Datasets
005 - Glue Catalog
006 - Amazon Athena
007 - Databases (Redshift, MySQL, PostgreSQL, SQL Server and Oracle)
008 - Redshift - Copy & Unload
009 - Redshift - Append, Overwrite and Upsert
010 - Parquet Crawler
011 - CSV Datasets
012 - CSV Crawler
013 - Merging Datasets on S3
014 - Schema Evolution
015 - EMR
016 - EMR & Docker
017 - Partition Projection
018 - QuickSight
019 - Athena Cache
020 - Spark Table Interoperability
021 - Global Configurations
022 - Writing Partitions Concurrently
023 - Flexible Partitions Filter
024 - Athena Query Metadata
025 - Redshift - Loading Parquet files with Spectrum
026 - Amazon Timestream
027 - Amazon Timestream 2
028 - Amazon DynamoDB
029 - S3 Select
030 - Data Api
031 - OpenSearch
033 - Amazon Neptune
034 - Distributing Calls Using Ray
035 - Distributing Calls on Ray Remote Cluster
037 - Glue Data Quality
038 - OpenSearch Serverless
039 - Athena Iceberg
040 - EMR Serverless
041 - Apache Spark on Amazon Athena
API Reference
Amazon S3
AWS Glue Catalog
Amazon Athena
Amazon Redshift
PostgreSQL
MySQL
SQL Server
Oracle
Data API Redshift
Data API RDS
OpenSearch
AWS Glue Data Quality
Amazon Neptune
DynamoDB
Amazon Timestream
Amazon EMR
Amazon CloudWatch Logs
Amazon Chime
Amazon QuickSight
AWS STS
AWS Secrets Manager
Global Configurations
Distributed - Ray
License
Contributing
Getting Help

The best way to interact with our team is through GitHub. You can open an issue and choose from one of our templates for bug reports, feature requests, and more. You may also find help on these community resources:
The #aws-sdk-pandas Slack channel
Ask a question on Stack Overflow and tag it with awswrangler
Runbook for AWS SDK for pandas with Ray
Logging

Enabling internal logging examples:
```python
import logging

logging.basicConfig(level=logging.INFO, format="[%(name)s][%(funcName)s] %(message)s")
logging.getLogger("awswrangler").setLevel(logging.DEBUG)
logging.getLogger("botocore.credentials").setLevel(logging.CRITICAL)
```
Inside an AWS Lambda function:
```python
import logging

logging.getLogger("awswrangler").setLevel(logging.DEBUG)
```
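(The Lambda Python runtime already attaches a handler to the root logger, so setting the awswrangler logger level is enough; a basicConfig call is not needed there.)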