llm4data

v0.0.9

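LLM4Data

LLM4Data is a Python library designed to make it easier to apply large language models (LLMs) and artificial intelligence to development data and knowledge discovery. It aims to let users and organizations discover and interact with development data in innovative ways through natural language.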

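DevData Chat

LLM4Data powers the DevData Chat application, which is expected to become available soon as an open-source project. The application demonstrates various ways in which LLMs and AI can enhance discovery of and interaction with data and knowledge, delivering innovative solutions that help close gaps in discoverability and accessibility.

The LLM4Data library contains a collection of discovery and data augmentation solutions for a range of data types, including documents, indicators, microdata, and geospatial data. The current version of the library includes solutions for WDI (World Development Indicators) indicators. Additional solutions are planned for future releases.

Because LLM4Data is built around existing metadata standards and schemas, users and organizations can use LLMs to enhance their data-driven applications, enabling natural language processing, text generation, and more. The library relies on open-source libraries to act as a bridge between LLMs and development data, providing a seamless interface to the capabilities of these powerful language models.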

Installation

Use the package manager pip to install LLM4Data.

pip install llm4data

Usage

The following examples show how to use LLM4Data to generate WDI API URLs and SQL queries from prompts.

Additional examples can be found here.

Generating a WDI API URL from a prompt

 This example uses the OpenAI API. Before you proceed, make sure to set your API keys in the `.env` file. See the [setup instructions](https://worldbank.github.io/llm4data/notebooks/examples/getting-started/openai-api.html) for more details.
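Before running the example, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch that assumes the key is exposed as `OPENAI_API_KEY`, the name the OpenAI client conventionally reads; the exact variable names LLM4Data expects are described in the setup instructions linked above. The helper `missing_env_vars` is hypothetical and not part of LLM4Data.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment.

    `environ` defaults to the real process environment; passing a plain dict
    makes the check easy to try out without touching real settings.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# OPENAI_API_KEY is an assumption here; adjust the list to match your `.env` file.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Note that this only inspects the current process environment. If the values live only in a `.env` file, they must be loaded into the environment first.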

from llm4data.prompts.indicators import wdi

# Create a WDI API prompt object
wdi_api = wdi.WDIAPIPrompt()

# Send a prompt to the LLM to get a WDI API URL relevant to the prompt
response = wdi_api.send_prompt(
    "What is the gdp and the co2 emissions of the philippines and its neighbors in the last decade?"
)

# Parse the response to get the WDI API URL
wdi_api_url = wdi_api.parse_response(response)
print(wdi_api_url)

The output will look like this:

 https://api.worldbank.org/v2/country/PHL;IDN;MYS;SGP;THA;VNM/indicator/NY.GDP.MKTP.CD;EN.ATM.CO2E.KT?date=2013:2022&format=json&source=2

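Note that the generated URL already contains the country codes and indicators relevant to the prompt. The LLM works out which countries are neighbors of the Philippines, and which indicator codes are likely to provide the relevant data for GDP and CO2 emissions.

The URL also includes the date range, format, and source of the data. Users can then adjust the URL as needed and use it to query the WDI API.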
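Adjusting the generated URL can be done with the standard library alone. The sketch below rewrites query parameters on the URL shown above; `adjust_wdi_url` is a hypothetical helper, not part of LLM4Data, and `per_page` is used on the assumption that a larger page size is wanted from the World Bank API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def adjust_wdi_url(url, **overrides):
    """Return a copy of `url` with query parameters replaced or added."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update(overrides)
    # safe=":" keeps year ranges readable (2015:2020 instead of 2015%3A2020).
    new_query = urlencode(query, safe=":")
    return urlunsplit(parts._replace(query=new_query))


url = (
    "https://api.worldbank.org/v2/country/PHL;IDN;MYS;SGP;THA;VNM/"
    "indicator/NY.GDP.MKTP.CD;EN.ATM.CO2E.KT?date=2013:2022&format=json&source=2"
)

# Narrow the date range and request more rows per page.
adjusted = adjust_wdi_url(url, date="2015:2020", per_page="1000")
print(adjusted)
```

The adjusted URL can then be fetched with any HTTP client; the request itself is left out here so that the sketch runs without network access.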

Generating a SQL query over WDI data from a prompt

 Make sure you have set up your environment first. The example below requires a working database engine, e.g., postgresql. If you want to use SQLite, make sure to update the `.env` file and set the environment variables.

WDI data can be loaded into a Pandas dataframe, but that is not always practical, for example when developing an application that must answer arbitrary questions about the data.

The LLM4Data library includes a SQL interface to the WDI data, so users can query the data with SQL.

Through this interface, query results can be returned either as a Pandas dataframe or as a JSON object.

import json
from llm4data.prompts.indicators import templates, wdi
from llm4data.llm.indicators import wdi_sql

prompt = "What is the GDP and army spending of the Philippines in 2020?"

sql_data = wdi_sql.WDISQL().llm2sql_answer(prompt, as_dict=True)

print(sql_data)
# {'sql': "SELECT country, value AS gdp, (SELECT value FROM wdi WHERE country_iso3 = 'PHL' AND indicator = 'MS.MIL.XPND.GD.ZS' AND year = 2020) AS army_spending FROM wdi WHERE country_iso3 = 'PHL' AND indicator = 'NY.GDP.MKTP.CD' AND year = 2020 AND value IS NOT NULL",
#  'params': {},
#  'data': {'data': [{'country': 'Philippines',
#     'gdp': 361751116292.541,
#     'army_spending': 1.01242392260698}],
#   'sample': [{'country': 'Philippines',
#     'gdp': 361751116292.541,
#     'army_spending': 1.01242392260698}]},
#  'num_samples': 20}
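To see what the generated query does without an LLM or a configured database engine, the sketch below runs the same SQL against a tiny in-memory SQLite table. The table layout (`country`, `country_iso3`, `indicator`, `year`, `value`) is an assumption inferred from the column names in the query above, not the actual LLM4Data schema, and the two rows are the values from the sample output.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict

# Assumed, simplified schema with only the columns the query refers to.
conn.execute(
    "CREATE TABLE wdi (country TEXT, country_iso3 TEXT, indicator TEXT, "
    "year INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO wdi VALUES (?, ?, ?, ?, ?)",
    [
        ("Philippines", "PHL", "NY.GDP.MKTP.CD", 2020, 361751116292.541),
        ("Philippines", "PHL", "MS.MIL.XPND.GD.ZS", 2020, 1.01242392260698),
    ],
)

# The query text is copied from the 'sql' field of the output above.
sql = (
    "SELECT country, value AS gdp, "
    "(SELECT value FROM wdi WHERE country_iso3 = 'PHL' "
    "AND indicator = 'MS.MIL.XPND.GD.ZS' AND year = 2020) AS army_spending "
    "FROM wdi WHERE country_iso3 = 'PHL' AND indicator = 'NY.GDP.MKTP.CD' "
    "AND year = 2020 AND value IS NOT NULL"
)

records = [dict(row) for row in conn.execute(sql)]
print(json.dumps(records, indent=2))
```

Because each value is selected under an explicit alias (`gdp`, `army_spending`), every number in the result stays attached to the indicator it came from.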

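Generating a description of the SQL query response

The LLM4Data library also supports generating narrative explanations of SQL query responses. This is useful for producing natural-language descriptions of the data, which can give users context or explain the results of a data query.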

from llm4data.prompts.indicators import templates

# Send the prompt to the LLM for a narrative explanation of the response.
# This is optional and can be skipped.
# Note that we pass only a sample in the `context_data`.
# This could limit the quality of the response.
# This is a limitation of the current version of the LLM, which is constrained by the context length and cost.

explainer = templates.IndicatorPrompt()
description = explainer.send_prompt(prompt=prompt, context_data=json.dumps(sql_data["data"]["sample"]))

print(description["content"])
# Based on the data provided, the GDP of the Philippines in 2020 was approximately 362 billion USD. Meanwhile, the country's army spending in the same year was around 1.01 billion USD. It is worth noting that while army spending is an important aspect of a country's budget, it is not the only factor that contributes to its economic growth and development. Other factors such as infrastructure, education, and healthcare also play a crucial role in shaping a country's economy.
# Note: MS.MIL.XPND.GD.ZS measures military expenditure as a percentage of GDP, so the 1.01 figure in the generated text is a share of GDP, not billions of USD.
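The comments above mention that only a sample of the rows is passed as context because of context length and cost. One way to build such a sample is sketched below. `sample_context` and its limits are hypothetical, not LLM4Data's implementation; the default of 20 rows simply mirrors the `num_samples` value in the earlier output.

```python
import json


def sample_context(rows, max_rows=20, max_chars=4000):
    """Serialize at most `max_rows` rows, dropping rows until the JSON fits `max_chars`.

    At least one row is always kept, so a single very large row can still
    exceed `max_chars`.
    """
    sample = list(rows[:max_rows])
    payload = json.dumps(sample)
    while len(payload) > max_chars and len(sample) > 1:
        sample.pop()
        payload = json.dumps(sample)
    return payload


# Made-up rows, used only to exercise the helper.
rows = [{"year": year, "value": year * 1.5} for year in range(100)]
payload = sample_context(rows, max_rows=20, max_chars=200)
print(len(json.loads(payload)), "rows,", len(payload), "characters")
```

Trimming whole rows keeps the payload valid JSON, at the price of the model seeing only the first few records.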

Why not use Langchain for SQL queries?

Langchain is an excellent library, and it provides a wrapper for SQL databases that lets you query them in natural language. The wrapper is called SQLDatabaseChain and can be used as follows:

from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

db = SQLDatabase.from_uri("sqlite:///../llm4dev/data/sqldb/wdi.db")
llm = OpenAI(temperature=0, verbose=True)

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, return_intermediate_steps=True)
out = db_chain("What is the GDP and army spending of the Philippines in 2020?")

> Entering new SQLDatabaseChain chain...
What is the GDP and army spending of the Philippines in 2020?
SQLQuery:SELECT "value" FROM wdi WHERE "name" = 'GDP (current US$)' AND "country_iso3" = 'PHL' AND "year" = 2020
UNION
SELECT "value" FROM wdi WHERE "name" = 'Military expenditure (% of GDP)' AND "country_iso3" = 'PHL' AND "year" = 2020
SQLResult: [(1.01242392260698,), (361751116292.541,)]
Answer:The GDP of the Philippines in 2020 was 1.01242392260698 and the military expenditure (% of GDP) was 361751116292.541.
> Finished chain.

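Unfortunately, the answer, "The GDP of the Philippines in 2020 was 1.01242392260698 and the military expenditure (% of GDP) was 361751116292.541.", is incorrect: the two values are swapped. A likely cause is that the UNION query returns both values without labels, so nothing in the result ties each number to its indicator.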
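The swap can be reproduced without Langchain or an LLM. The sketch below runs the UNION query from the trace against an in-memory SQLite table, then runs a variant that also selects the indicator name. The table layout (`name`, `country_iso3`, `year`, `value`) is an assumption based on the columns used in the trace, and the labeled query is an illustrative alternative, not the query LLM4Data generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema with only the columns used in the trace.
conn.execute(
    "CREATE TABLE wdi (name TEXT, country_iso3 TEXT, year INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO wdi VALUES (?, ?, ?, ?)",
    [
        ("GDP (current US$)", "PHL", 2020, 361751116292.541),
        ("Military expenditure (% of GDP)", "PHL", 2020, 1.01242392260698),
    ],
)

# Query from the trace: two bare values, with no column saying which is which.
union_sql = """
SELECT "value" FROM wdi WHERE "name" = 'GDP (current US$)' AND "country_iso3" = 'PHL' AND "year" = 2020
UNION
SELECT "value" FROM wdi WHERE "name" = 'Military expenditure (% of GDP)' AND "country_iso3" = 'PHL' AND "year" = 2020
"""
unlabeled = conn.execute(union_sql).fetchall()

# Variant that keeps the indicator name next to each value.
labeled_sql = """
SELECT "name", "value" FROM wdi
WHERE "country_iso3" = 'PHL' AND "year" = 2020
  AND "name" IN ('GDP (current US$)', 'Military expenditure (% of GDP)')
"""
labeled = dict(conn.execute(labeled_sql).fetchall())

print(unlabeled)
print(labeled)
```

SQL gives no ordering guarantee for a UNION without ORDER BY, so the position of a value in the unlabeled result says nothing about which indicator it belongs to. The labeled result is unambiguous regardless of row order.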

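One of the goals of LLM4Data is to develop solutions that work around the limitations of existing open-source solutions when they are applied to development data and knowledge discovery.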

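Key features and roadmap

Text generation: Use LLMs to generate coherent, contextually relevant text about development data, enabling the creation of chatbots and content generation systems.
Interactive experiences: Create engaging user experiences by integrating LLMs into applications, so users can work with development data in a more intuitive, conversational way.
Metadata augmentation: Use LLMs to enrich existing metadata, enabling both the creation of new metadata and the improvement of existing metadata.
AI-powered insights: Use LLMs to extract valuable insights from datasets, enhancing data exploration, trend analysis, and knowledge discovery.
Dynamic integration: Using metadata standards and schemas, load and use your own Python scripts containing LLM functionality on the fly, and incorporate them seamlessly into projects through plugins.
Natural language processing: Use LLMs to analyze and process text data, enabling tasks such as sentiment analysis, text classification, and language translation.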
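The "Dynamic integration" item describes loading Python scripts on the fly. The list does not say how LLM4Data implements this, so the following is only a generic sketch of the idea using the standard library's `importlib`. The `load_plugin` helper, the `run` entry point, and the `shout_plugin.py` file are all hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_plugin(path):
    """Import a Python file by path and return it as a module object."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load plugin from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Write a throwaway plugin file, load it, and call its entry point.
with tempfile.TemporaryDirectory() as tmp:
    plugin_file = Path(tmp) / "shout_plugin.py"
    plugin_file.write_text("def run(text):\n    return text.upper()\n")
    plugin = load_plugin(plugin_file)
    result = plugin.run("gdp of the philippines")

print(result)
```

Loading a script this way executes arbitrary code, so a real plugin mechanism would need to restrict where plugins may come from.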