yato

v0.0.9

yato: yet another transformation orchestrator

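yato is the smallest orchestrator on Earth for orchestrating SQL data transformations on top of DuckDB. You give it a folder of SQL queries, and it guesses the DAG and runs the queries in the right order.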
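To make the idea of "guessing the DAG" concrete, here is a minimal sketch of how an execution order could be derived from a set of queries. It is not yato's actual implementation: the `queries` dictionary, the `TABLE_REF` regex, and the helper names `infer_dependencies` and `execution_order` are all assumptions made for illustration, and a naive regex will miss many SQL constructs that a real parser would handle.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stand-in for a folder of SQL files: file stem -> query text.
queries = {
    "orders": "SELECT * FROM source_orders WHERE amount > 0",
    "source_orders": "SELECT 1 AS id, 10 AS amount",
    "revenue": (
        "SELECT SUM(o.amount) AS total "
        "FROM orders AS o JOIN source_orders s ON o.id = s.id"
    ),
}

# Naive pattern (an assumption): the identifier that follows FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def infer_dependencies(queries):
    """Map each table to the set of other known tables its query reads from."""
    deps = {}
    for name, sql in queries.items():
        referenced = set(TABLE_REF.findall(sql))
        # Keep only references to tables defined by another query in the set.
        deps[name] = {ref for ref in referenced if ref in queries and ref != name}
    return deps


def execution_order(queries):
    """Return table names so that every table comes after its dependencies."""
    return list(TopologicalSorter(infer_dependencies(queries)).static_order())


print(execution_order(queries))
# → ['source_orders', 'orders', 'revenue']
```

The topological sort is what guarantees that `source_orders` is built before `orders`, and `orders` before `revenue`, regardless of the order in which the files are listed.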

Installation

yato works with Python 3.9+.

pip install yato-lib

Get started

Create a folder named sql and put your SQL files in it. For example, you can use the 2 queries given in the example folder.
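If you do not have the example folder at hand, the following sketch creates a sql folder with two small queries. The file names and query texts are hypothetical and are not the ones shipped in the example folder; the second query reads from the table that the first one would create, so there is a dependency to resolve.

```python
from pathlib import Path

# Hypothetical example queries: file name (without .sql) -> query text.
EXAMPLE_QUERIES = {
    "source_orders": "SELECT 1 AS id, 10 AS amount UNION ALL SELECT 2, 25",
    "big_orders": "SELECT id, amount FROM source_orders WHERE amount > 20",
}


def write_example_queries(folder="sql"):
    """Create the folder if needed and write one .sql file per query."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    for name, sql in EXAMPLE_QUERIES.items():
        (target / f"{name}.sql").write_text(sql + "\n", encoding="utf-8")
    # Return the file names that now exist, for a quick sanity check.
    return sorted(p.name for p in target.glob("*.sql"))


if __name__ == "__main__":
    print(write_example_queries())
```

Because the names of the files determine the names of the tables created, these two files would be expected to produce tables named source_orders and big_orders.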

from yato import Yato

yato = Yato(
    # The path of the file in which yato will run the SQL queries.
    # If you want to run it in memory, just set it to :memory:
    database_path="tmp.duckdb",
    # This is the folder where the SQL files are located.
    # The names of the files will determine the name of the table created.
    sql_folder="sql/",
    # The name of the DuckDB schema where the tables will be created.
    schema="transform",
)

# Runs yato against the DuckDB database with the queries in order.
yato.run()

You can also run yato from the CLI:

yato run --db tmp.duckdb sql/

Using yato with dlt

yato is designed to work with dlt. dlt handles data loading, and yato handles data transformation.

import dlt
from yato import Yato

yato = Yato(
    database_path="db.duckdb",
    sql_folder="sql/",
    schema="transform",
)

# Restore the database from S3 before running dlt.
yato.restore()

pipeline = dlt.pipeline(
    pipeline_name="get_my_data",
    destination="duckdb",
    dataset_name="production",
    credentials="db.duckdb",
)

# my_source is a placeholder for your own dlt source; it is not defined here.
data = my_source()

load_info = pipeline.run(data)

# Back up the database after a successful dlt run, then run the transformations.
yato.backup()
yato.run()

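Advanced usage

Mixing SQL and Python transformations

Even though we would love to do everything in SQL, writing a transformation in Python with pandas (or another library) can sometimes be faster.

That is why you can mix SQL and Python transformations in yato.

To do so, add a Python file to the transformations folder. In this Python file, you must implement a Transformation class with a run method. If the transformation depends on other SQL transformations, you must define the source SQL query in a static method named source_sql.

Below is an example of a transformation (for example, orders.py). The framework will understand that orders needs to run after source_orders.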

from yato import Transformation


class Orders(Transformation):
    @staticmethod
    def source_sql():
        return "SELECT * FROM source_orders"

    def run(self, context, *args, **kwargs):
        df = self.get_source(context)

        df["new_column"] = 1

        return df