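LLM4Data is a Python library designed to facilitate the application of large language models (LLMs) and artificial intelligence to development data and knowledge discovery. It is intended to empower users and organizations to discover and interact with development data in innovative ways through natural language.
LLM4Data powers the DevData Chat application, which will soon be available as an open-source project. The application demonstrates the various ways in which LLMs and AI can augment the discovery of, and interaction with, data and knowledge, bringing innovative solutions that help close gaps in discoverability and accessibility.
The LLM4Data library contains a collection of discovery and data augmentation solutions for various data types, including documents, indicators, microdata, geospatial data, and more. The current version of the library contains solutions for WDI indicators. More solutions will be added in future releases.
Because the library is built around existing metadata standards and schemas, users and organizations can use LLMs to enhance data-driven applications, enabling natural language processing, text generation, and more with LLM4Data. The library serves as a bridge between LLMs and development data using open-source libraries, offering a seamless interface to the capabilities of these powerful language models.
Use the package manager pip to install LLM4Data.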
pip install llm4data
The examples below demonstrate how to use LLM4Data to generate WDI API URLs and SQL queries from prompts.
Additional examples can be found here.
This example uses the OpenAI API. Before you proceed, make sure to set your API keys in the `.env` file. See the [setup instructions](https://worldbank.github.io/llm4data/notebooks/examples/getting-started/openai-api.html) for more details.
from llm4data.prompts.indicators import wdi
# Create a WDI API prompt object
wdi_api = wdi.WDIAPIPrompt()
# Send a prompt to the LLM to get a WDI API URL relevant to the prompt
response = wdi_api.send_prompt(
    "What is the gdp and the co2 emissions of the philippines and its neighbors in the last decade?"
)
# Parse the response to get the WDI API URL
wdi_api_url = wdi_api.parse_response(response)
print(wdi_api_url)
The output will look like this:
https://api.worldbank.org/v2/country/PHL;IDN;MYS;SGP;THA;VNM/indicator/NY.GDP.MKTP.CD;EN.ATM.CO2E.KT?date=2013:2022&format=json&source=2
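Notice that the generated URL already contains the country codes and indicators relevant to the prompt. The model infers which countries are neighbors of the Philippines, and which indicator codes are likely to provide the relevant data for GDP and CO2 emissions.
The URL also includes the date range, format, and source of the data. The user can then adjust the URL as needed and use it to query the WDI API.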
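Adjusting the generated URL can be sketched with the standard library alone. The helper names `set_date_range` and `countries_and_indicators` below are hypothetical and not part of LLM4Data; the sketch only assumes the URL layout shown in the output above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def set_date_range(url: str, start: int, end: int) -> str:
    """Return a copy of a WDI API URL with its `date` parameter replaced."""
    parts = urlsplit(url)
    # parse_qs returns a list of values per key; keep the first value of each.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    query["date"] = f"{start}:{end}"
    # safe=":" keeps the colon in the date range instead of encoding it as %3A.
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))


def countries_and_indicators(url: str) -> tuple[list[str], list[str]]:
    """Split the semicolon-separated country and indicator codes out of the path."""
    segments = urlsplit(url).path.strip("/").split("/")
    countries = segments[segments.index("country") + 1].split(";")
    indicators = segments[segments.index("indicator") + 1].split(";")
    return countries, indicators


# URL copied from the example output above.
wdi_api_url = (
    "https://api.worldbank.org/v2/country/PHL;IDN;MYS;SGP;THA;VNM/indicator/"
    "NY.GDP.MKTP.CD;EN.ATM.CO2E.KT?date=2013:2022&format=json&source=2"
)

# Narrow the query to the last five years of the original range.
adjusted_url = set_date_range(wdi_api_url, 2018, 2022)
countries, indicators = countries_and_indicators(adjusted_url)
print(adjusted_url)
print(countries, indicators)
```

The adjusted URL can then be fetched with any HTTP client to retrieve the data from the WDI API.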
Make sure you have set up your environment first. The example below requires a working database engine, e.g., postgresql. If you want to use SQLite, make sure to update the `.env` file and set the environment variables.
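Although WDI data can be loaded into a Pandas DataFrame, that is not always practical, for example when developing an application that must answer arbitrary data questions.
The LLM4Data library includes a SQL interface to the WDI data, allowing users to query the data with SQL.
The interface can return query results either as a Pandas DataFrame or as a JSON object.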
import json
from llm4data.prompts.indicators import templates, wdi
from llm4data.llm.indicators import wdi_sql
prompt = "What is the GDP and army spending of the Philippines in 2020?"
sql_data = wdi_sql.WDISQL().llm2sql_answer(prompt, as_dict=True)
print(sql_data)
# # {'sql': "SELECT country, value AS gdp, (SELECT value FROM wdi WHERE country_iso3 = 'PHL' AND indicator = 'MS.MIL.XPND.GD.ZS' AND year = 2020) AS army_spending FROM wdi WHERE country_iso3 = 'PHL' AND indicator = 'NY.GDP.MKTP.CD' AND year = 2020 AND value IS NOT NULL",
# # 'params': {},
# # 'data': {'data': [{'country': 'Philippines',
# # 'gdp': 361751116292.541,
# # 'army_spending': 1.01242392260698}],
# # 'sample': [{'country': 'Philippines',
# # 'gdp': 361751116292.541,
# # 'army_spending': 1.01242392260698}]},
# # 'num_samples': 20}
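The LLM4Data library also supports generating a narrative explanation of the SQL query response. This is useful for producing a natural language description of the data, which can give users context and explain the results of the query.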
from llm4data.prompts.indicators import templates
# Send the prompt to the LLM for a narrative explanation of the response.
# This is optional and can be skipped.
# Note that we pass only a sample in the `context_data`.
# This could limit the quality of the response.
# This is a limitation of the current version of the LLM which is constrained by the context length and cost.
explainer = templates.IndicatorPrompt()
description = explainer.send_prompt(prompt=prompt, context_data=json.dumps(sql_data["data"]["sample"]))
print(description["content"])
# # Based on the data provided, the GDP of the Philippines in 2020 was approximately 362 billion USD. Meanwhile, the country's army spending in the same year was around 1.01 billion USD. It is worth noting that while army spending is an important aspect of a country's budget, it is not the only factor that contributes to its economic growth and development. Other factors such as infrastructure, education, and healthcare also play a crucial role in shaping a country's economy.
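Generated narratives like the one above are worth checking against the indicator definitions. The query uses `MS.MIL.XPND.GD.ZS`, which is military expenditure as a percentage of GDP, so the value 1.01 is a share rather than a dollar amount. The short sketch below converts that share into US dollars; the helper name `share_to_usd` is hypothetical, and the figures are copied from the sample record printed earlier.

```python
# Values copied from the `sample` record in the SQL response above.
record = {
    "country": "Philippines",
    "gdp": 361751116292.541,            # NY.GDP.MKTP.CD: GDP (current US$)
    "army_spending": 1.01242392260698,  # MS.MIL.XPND.GD.ZS: % of GDP
}


def share_to_usd(gdp_usd: float, percent_of_gdp: float) -> float:
    """Convert an indicator expressed as a percentage of GDP into US dollars."""
    return gdp_usd * percent_of_gdp / 100.0


military_usd = share_to_usd(record["gdp"], record["army_spending"])
print(f"{military_usd / 1e9:.2f} billion USD")
```

Under these figures the implied military spending is several billion US dollars, not the "1.01 billion USD" stated in the generated text.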
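LangChain is a great library that includes a wrapper for SQL databases, allowing you to query them using natural language. The wrapper is called `SQLDatabaseChain` and can be used as follows: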
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain
db = SQLDatabase.from_uri("sqlite:///../llm4dev/data/sqldb/wdi.db")
llm = OpenAI(temperature=0, verbose=True)
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, return_intermediate_steps=True)
out = db_chain("What is the GDP and army spending of the Philippines in 2020?")
> Entering new SQLDatabaseChain chain...
What is the GDP and army spending of the Philippines in 2020?
SQLQuery:SELECT "value" FROM wdi WHERE "name" = 'GDP (current US$)' AND "country_iso3" = 'PHL' AND "year" = 2020
UNION
SELECT "value" FROM wdi WHERE "name" = 'Military expenditure (% of GDP)' AND "country_iso3" = 'PHL' AND "year" = 2020
SQLResult: [(1.01242392260698,), (361751116292.541,)]
Answer:The GDP of the Philippines in 2020 was 1.01242392260698 and the military expenditure (% of GDP) was 361751116292.541.
> Finished chain.
Unfortunately, the answer `The GDP of the Philippines in 2020 was 1.01242392260698 and the military expenditure (% of GDP) was 361751116292.541.` is incorrect because the two values are swapped.
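One plausible reason for the swap is that the generated `UNION` query returns bare values with no indicator label, and SQL does not guarantee row order without an `ORDER BY`. The sketch below reproduces the situation with an in-memory SQLite table. The table is a toy stand-in that assumes only the column names visible in the log above; it is not the actual WDI database schema.

```python
import sqlite3

# Toy stand-in for the `wdi` table, using the columns seen in the chain's query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wdi (name TEXT, country_iso3 TEXT, year INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO wdi VALUES (?, ?, ?, ?)",
    [
        ("GDP (current US$)", "PHL", 2020, 361751116292.541),
        ("Military expenditure (% of GDP)", "PHL", 2020, 1.01242392260698),
    ],
)

# Unlabeled UNION, as in the chain's query: each row is a bare value, so
# nothing in the result says which indicator it belongs to.
unlabeled = conn.execute(
    """
    SELECT "value" FROM wdi WHERE "name" = 'GDP (current US$)'
        AND "country_iso3" = 'PHL' AND "year" = 2020
    UNION
    SELECT "value" FROM wdi WHERE "name" = 'Military expenditure (% of GDP)'
        AND "country_iso3" = 'PHL' AND "year" = 2020
    """
).fetchall()

# Labeled alternative: return the indicator name next to each value, so the
# mapping no longer depends on row order.
labeled = dict(
    conn.execute(
        """
        SELECT "name", "value" FROM wdi
        WHERE "name" IN ('GDP (current US$)', 'Military expenditure (% of GDP)')
            AND "country_iso3" = 'PHL' AND "year" = 2020
        """
    ).fetchall()
)
conn.close()

print(unlabeled)
print(labeled)
```

Returning the label with each value removes the ambiguity for whatever consumes the result, whether that is application code or an LLM writing the final answer.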
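One of the goals of LLM4Data is to develop solutions that work around the limitations of existing open-source solutions when applied to development data and knowledge discovery.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
See CONTRIBUTING.md for more information.
LLM4Data is licensed under the World Bank Master Community License Agreement.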