LangChain is a framework that helps you use LLMs (Large Language Models) conveniently. LangChain Basic explains example code for each LangChain component.

This section describes how to apply LangChain to a SageMaker endpoint created with Falcon FM. It uses the SageMaker endpoint obtained by deploying Falcon FM through SageMaker JumpStart (e.g., jumpstart-dft-hf-llm-falcon-7b-instruct-bf16).

Referring to Falcon's input and output formats, register transform_input and transform_output in a ContentHandler as shown below.
```python
import json

from langchain import PromptTemplate, SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        # Serialize the prompt and parameters into Falcon's request format.
        input_str = json.dumps({'inputs': prompt, 'parameters': model_kwargs})
        return input_str.encode('utf-8')

    def transform_output(self, output: bytes) -> str:
        # Falcon returns a JSON list; take the generated text of the first item.
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generated_text"]
```
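The handler's round trip can be sanity-checked without calling the endpoint. Below is a minimal standalone sketch of the same serialization logic, assuming (as the handler above does) that Falcon's response body is a JSON list whose first item carries a `generated_text` field; `build_input` and `parse_output` are illustrative names, not LangChain APIs.

```python
import io
import json

# Build the request body the same way transform_input does.
def build_input(prompt: str, model_kwargs: dict) -> bytes:
    return json.dumps({'inputs': prompt, 'parameters': model_kwargs}).encode('utf-8')

# Parse a response stream the same way transform_output does.
def parse_output(output) -> str:
    response_json = json.loads(output.read().decode("utf-8"))
    return response_json[0]["generated_text"]

body = build_input("Tell me a joke", {"max_new_tokens": 300})
print(json.loads(body))  # {'inputs': 'Tell me a joke', 'parameters': {'max_new_tokens': 300}}

# Simulate the bytes stream the endpoint would return.
fake_response = io.BytesIO(json.dumps([{"generated_text": "Hello!"}]).encode("utf-8"))
print(parse_output(fake_response))  # Hello!
```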
Register the llm for the SageMaker endpoint using endpoint_name, aws_region, parameters, and content_handler, as shown below.
```python
import boto3

endpoint_name = 'jumpstart-dft-hf-llm-falcon-7b-instruct-bf16'
aws_region = boto3.Session().region_name

parameters = {
    "max_new_tokens": 300
}

content_handler = ContentHandler()

llm = SagemakerEndpoint(
    endpoint_name=endpoint_name,
    region_name=aws_region,
    model_kwargs=parameters,
    content_handler=content_handler
)
```
You can check that the llm works as follows.
```python
llm("Tell me a joke")
```
The result is as follows.
I once told a joke to a friend, but it didn't work. He just looked
Web loader - You can load web pages using LangChain.
```python
from langchain.document_loaders import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
index = VectorstoreIndexCreator().from_loaders([loader])
```
After defining a template as shown below, you can define an LLMChain and run it. For details, see langchain-sagemaker-endpoint-Q&A.ipynb.
```python
from langchain import PromptTemplate, LLMChain

template = "Tell me a {adjective} joke about {content}."
prompt = PromptTemplate.from_template(template)

llm_chain = LLMChain(prompt=prompt, llm=llm)
outputText = llm_chain.run(adjective="funny", content="chickens")
print(outputText)
```
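What PromptTemplate.from_template does here is infer the input variables from the `{placeholder}` names and fill them in before the prompt reaches the LLM. A minimal sketch of that behavior with only the standard library (the LangChain class does more, e.g. validation):

```python
import string

template = "Tell me a {adjective} joke about {content}."

# The variables from_template would infer from the template's placeholders.
variables = [field for _, field, _, _ in string.Formatter().parse(template) if field]
print(variables)  # ['adjective', 'content']

# The final prompt the chain sends to the LLM.
final_prompt = template.format(adjective="funny", content="chickens")
print(final_prompt)  # Tell me a funny joke about chickens.
```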
The result is as follows.
Why did the chicken cross the playground? To get to the other slide!
Question answering over documents is done with langchain.chains.question_answering. For details, see langchain-sagemaker-endpoint-Q&A.ipynb.

Define the prompt template.
```python
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Answer:"""

prompt = PromptTemplate(
    template=template, input_variables=["context", "question"]
)
```
Create a Document using langchain.docstore.document.
```python
from langchain.docstore.document import Document

example_doc_1 = """
Peter and Elizabeth took a taxi to attend the night party in the city. While in the party, Elizabeth collapsed and was rushed to the hospital.
Since she was diagnosed with a brain injury, the doctor told Peter to stay besides her until she gets well.
Therefore, Peter stayed with her at the hospital for 3 days without leaving.
"""

docs = [
    Document(
        page_content=example_doc_1,
    )
]
```
Now ask a question and get the answer.
```python
from langchain.chains.question_answering import load_qa_chain

question = "How long was Elizabeth hospitalized?"

chain = load_qa_chain(prompt=prompt, llm=llm)
output = chain({"input_documents": docs, "question": question}, return_only_outputs=True)
print(output)
```
The result is as follows.
{'output_text': ' 3 days'}
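Conceptually, this QA chain "stuffs" the document text into the `{context}` slot and the question into `{question}` of the prompt template before calling the LLM. A sketch of the final prompt it builds, using a shortened stand-in for the example document above:

```python
# The prompt template defined earlier in this section.
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Answer:"""

# Shortened stand-in for example_doc_1 above.
example_doc_1 = "Peter stayed with her at the hospital for 3 days without leaving."
question = "How long was Elizabeth hospitalized?"

# Roughly what the chain sends to the LLM.
final_prompt = template.format(context=example_doc_1, question=question)
print(final_prompt)
```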
langchain-sagemaker-endpoint-pdf-summary.ipynb explains how to summarize a PDF using the Falcon FM based SageMaker endpoint.

First, read the PDF file stored in S3 with PyPDF2 and extract its text.
```python
import boto3
import sagemaker
import PyPDF2
from io import BytesIO

sess = sagemaker.Session()
s3_bucket = sess.default_bucket()
s3_prefix = 'docs'
s3_file_name = '2016-3series.pdf'  # file name in S3

s3r = boto3.resource("s3")
doc = s3r.Object(s3_bucket, s3_prefix + '/' + s3_file_name)

contents = doc.get()['Body'].read()
reader = PyPDF2.PdfReader(BytesIO(contents))

# Extract the text of each page, then flatten into one string.
raw_text = []
for page in reader.pages:
    raw_text.append(page.extract_text())
contents = '\n'.join(raw_text)

new_contents = str(contents).replace("\n", " ")
```
Since the document is large, split it into chunks with RecursiveCharacterTextSplitter and store them as Documents. Then summarize with load_summarize_chain.
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(new_contents)

from langchain.docstore.document import Document

docs = [
    Document(
        page_content=t
    ) for t in texts[:3]
]

from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY """
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

chain = load_summarize_chain(llm, chain_type="stuff", prompt=PROMPT)
summary = chain.run(docs)
```
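The effect of `chunk_size` and `chunk_overlap` can be illustrated with a naive fixed-window splitter. This is only a sketch: RecursiveCharacterTextSplitter is smarter, preferring to split on separators such as paragraphs and sentences before falling back to character counts, and `split_text` below is a hypothetical stand-in, not the LangChain API.

```python
# Naive fixed-window chunking: each chunk is chunk_size characters,
# and consecutive chunks share chunk_overlap characters.
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list:
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghij" * 10  # 100 characters
chunks = split_text(text, chunk_size=40, chunk_overlap=10)
print([len(c) for c in chunks])  # [40, 40, 40, 10]
```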
An LLM can likewise be defined with Amazon Bedrock.

```python
from langchain.llms import Bedrock
from langchain.embeddings import BedrockEmbeddings

llm = Bedrock()
print(llm("explain GenAI"))
```
LangChain Documentation
LangChain GitHub
SageMaker Endpoint
2-Lab02-RAG-LLM
AWS Kendra LangChain Extensions
QA and Chat over Documents
LangChain - Modules - Language models - LLMs - Integrations - SageMakerEndpoint
LangChain - Ecosystem - Integrations - SageMaker Endpoint
Ingest knowledge base data to Vector DB