Outlines

Make LLMs speak the language of every application.

Made with ❤ by the team at .txt.

YouTube channel | .txt blog | Twitter

pip install outlines

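First time here? Go to our setup guide.

Features

Outlines has new releases and features coming every week. Make sure to star and watch this repository, and follow @dottxtai to stay up to date!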

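Why should I use structured generation?

It doesn't add any overhead during inference (cost-free)
It allows open source models to beat closed source models (Mistral, GPT-4)
It speeds up inference
It improves the performance of base models (GSM8K)
It improves the performance of finetuned models (CoNLL)
It improves model efficiency (fewer examples needed)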

.txt company

We started a company to keep pushing the boundaries of structured generation. Learn more about .txt, and give our .json API a try if you need a hosted solution.

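Structured generation

The first step towards reliability of systems that include large language models is to ensure that there is a well-defined interface between their output and user-defined code. Outlines provides ways to control the generation of language models to make their output more predictable.

Multiple choices

You can reduce the completion to a choice between multiple possibilities: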

import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?

Review: This restaurant is just awesome!
"""

# The generator can only return one of the listed strings
generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(prompt)
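
To make the idea of constraining a completion more concrete, here is a minimal, self-contained sketch of choice-restricted greedy decoding. It is an illustration only and not how Outlines is implemented: the function `constrained_choice`, the toy vocabulary, and the `TOY_SCORES` table standing in for model logits are all hypothetical.

```python
from typing import Callable, Sequence


def constrained_choice(
    score: Callable[[str, str], float],
    vocabulary: Sequence[str],
    choices: Sequence[str],
) -> str:
    """Greedily pick tokens, keeping only those that extend a prefix of some choice."""
    text = ""
    while text not in choices:
        allowed = [
            token
            for token in vocabulary
            if token and any(choice.startswith(text + token) for choice in choices)
        ]
        if not allowed:
            raise ValueError(f"no token can extend {text!r} towards a valid choice")
        text += max(allowed, key=lambda token: score(text, token))
    return text


# Toy scores standing in for model logits: the "model" would rather say "Maybe".
TOY_SCORES = {"Maybe": 3.0, "Pos": 2.0, "Neg": 1.0, "itive": 0.5, "ative": 0.4}


def toy_score(prefix: str, token: str) -> float:
    # A real model would condition on the prefix; this toy version ignores it.
    return TOY_SCORES[token]


vocabulary = list(TOY_SCORES)
unconstrained = max(vocabulary, key=lambda token: toy_score("", token))
constrained = constrained_choice(toy_score, vocabulary, ["Positive", "Negative"])
print(unconstrained)  # Maybe
print(constrained)  # Positive
```

Without the constraint the highest-scoring token wins even though it is not a valid label; with it, tokens that cannot lead to an allowed answer are never considered.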

Type constraint

You can instruct the model to only return integers or floats:

import outlines

model = outlines.models.transformers("WizardLM/WizardMath-7B-V1.1")

prompt = "<s>result of 9 + 9 = 18</s><s>result of 1 + 2 = "
answer = outlines.generate.format(model, int)(prompt)
print(answer)
# 3

prompt = "sqrt(2)="
generator = outlines.generate.format(model, float)
answer = generator(prompt, max_tokens=10)
print(answer)
# 1.41421356
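
One way to picture a type constraint is as a pattern that the generated text must match before it is converted to the Python type. The sketch below uses only the standard library; the `TYPE_PATTERNS` table and the `parse_as` helper are hypothetical, and the patterns Outlines actually uses may differ.

```python
import re

# Hypothetical patterns for illustration; they are an assumption, not Outlines' own.
TYPE_PATTERNS = {
    int: r"[+-]?(0|[1-9][0-9]*)",
    float: r"[+-]?(0|[1-9][0-9]*)\.[0-9]+",
}


def parse_as(python_type, text):
    """Convert text to python_type, but only if it fully matches that type's pattern."""
    if re.fullmatch(TYPE_PATTERNS[python_type], text) is None:
        raise ValueError(f"{text!r} is not a valid {python_type.__name__}")
    return python_type(text)


print(parse_as(int, "3"))
print(parse_as(float, "1.41421356"))
```

Because a structured generator only emits text that matches the pattern, the final conversion cannot fail on malformed output such as "3 apples".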

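Efficient regex-structured generation

Outlines also comes with fast regex-structured generation. In fact, the choice and format functions above both use regex-structured generation under the hood: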

import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

prompt = "What is the IP address of the Google DNS servers? "

generator = outlines.generate.text(model)
unstructured = generator(prompt, max_tokens=30)

# Four dot-separated numbers between 0 and 255
generator = outlines.generate.regex(
    model,
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
)
structured = generator(prompt, max_tokens=30)

print(unstructured)
# What is the IP address of the Google DNS servers?
#
# Passive DNS servers are at DNS servers that are private.
# In other words, both IP servers are private. The database
# does not contain Chelsea Manning

print(structured)
# What is the IP address of the Google DNS servers?
# 2.2.6.1

Unlike other libraries, regex-structured generation in Outlines is almost as fast as non-structured generation.
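
Before handing a pattern to a generator, it can help to sanity-check it with the standard `re` module. The sketch below tests an IPv4 pattern in which `\d` matches a digit and `\.` matches a literal dot; the helper name `is_ipv4` and the candidate strings are made up for illustration.

```python
import re

# Each octet is 0-255; the separator is an escaped, literal dot.
IPV4 = re.compile(r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)")


def is_ipv4(text: str) -> bool:
    """Return True when the whole string is a dotted IPv4 address."""
    return IPV4.fullmatch(text) is not None


for candidate in ["8.8.8.8", "2.2.6.1", "256.1.1.1", "8a8b8c8"]:
    print(candidate, is_ipv4(candidate))
```

The last candidate shows why the escapes matter: an unescaped `.` would accept any character between the numbers.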

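Efficient JSON generation following a Pydantic model

Outlines can guide the generation process so that the output is guaranteed to follow a JSON schema or a Pydantic model: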

from enum import Enum
from pydantic import BaseModel, constr

import outlines


class Weapon(str, Enum):
    sword = "sword"
    axe = "axe"
    mace = "mace"
    spear = "spear"
    bow = "bow"
    crossbow = "crossbow"


class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"


class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    armor: Armor
    weapon: Weapon
    strength: int


model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Construct structured sequence generator
generator = outlines.generate.json(model, Character)

# Draw a sample
seed = 789001

character = generator("Give me a character description", seed=seed)

print(repr(character))
# Character(name='Anderson', age=28, armor=<Armor.chainmail: 'chainmail'>, weapon=<Weapon.sword: 'sword'>, strength=8)

character = generator("Give me an interesting character description")

print(repr(character))
# Character(name='Vivian Thr', age=44, armor=<Armor.plate: 'plate'>, weapon=<Weapon.crossbow: 'crossbow'>, strength=125)

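The method works with union types, optional types, arrays, nested schemas, etc. Some field constraints are not supported yet, but everything else should work.

Efficient JSON generation following a JSON Schema

Sometimes you just want to be able to pass a JSON Schema instead of a Pydantic model. We've got you covered: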

import outlines

schema = '''{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}'''

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, schema)
character = generator("Give me a character description")
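
To see what "following a JSON Schema" means for the result, the sketch below checks a dictionary against a trimmed-down version of the schema using only the standard library. It assumes the generated character is available as a plain dictionary, and it covers only the few keywords used here (`required`, `type`, `enum`, `maxLength`, local `$ref`). The helpers `resolve` and `problems` are hypothetical and are not part of Outlines; a full validator would handle far more of the specification.

```python
import json

# A trimmed-down character schema, kept short for illustration.
schema = json.loads("""{
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 10},
        "age": {"type": "integer"},
        "armor": {"$ref": "#/definitions/Armor"}
    },
    "required": ["name", "age", "armor"],
    "definitions": {
        "Armor": {"enum": ["leather", "chainmail", "plate"], "type": "string"}
    }
}""")


def resolve(node, root):
    """Follow local references such as "#/definitions/Armor"."""
    while "$ref" in node:
        target = root
        for part in node["$ref"][2:].split("/"):  # drop the leading "#/"
            target = target[part]
        node = target
    return node


def problems(instance, schema):
    """Return a list of violations (empty when the instance conforms)."""
    found = [f"missing field: {key}" for key in schema.get("required", []) if key not in instance]
    for key, value in instance.items():
        if key not in schema["properties"]:
            continue  # extra fields are ignored in this sketch
        rule = resolve(schema["properties"][key], schema)
        if rule.get("type") == "integer" and (not isinstance(value, int) or isinstance(value, bool)):
            found.append(f"{key}: expected an integer")
        if rule.get("type") == "string" and not isinstance(value, str):
            found.append(f"{key}: expected a string")
        if "enum" in rule and value not in rule["enum"]:
            found.append(f"{key}: {value!r} is not one of {rule['enum']}")
        if "maxLength" in rule and isinstance(value, str) and len(value) > rule["maxLength"]:
            found.append(f"{key}: longer than {rule['maxLength']} characters")
    return found


valid = problems({"name": "Anderson", "age": 28, "armor": "chainmail"}, schema)
invalid = problems({"name": "Vivian Thorne", "age": "44", "armor": "silk"}, schema)
print(valid)
print(invalid)
```

Structured generation aims to make the second kind of result impossible by construction, so a check like this becomes a safety net and not a retry loop.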

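Using context-free grammars to guide generation

Formal grammars rule the world, and Outlines makes them rule LLMs too. You can pass any context-free grammar in the EBNF format and Outlines will generate an output that is valid to this grammar: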

import outlines

arithmetic_grammar = """
    ?start: expression

    ?expression: term (("+" | "-") term)*

    ?term: factor (("*" | "/") factor)*

    ?factor: NUMBER
           | "-" factor
           | "(" expression ")"

    %import common.NUMBER
"""

model = outlines.models.transformers("WizardLM/WizardMath-7B-V1.1")
generator = outlines.generate.cfg(model, arithmetic_grammar)
sequence = generator("Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:")

print(sequence)
# (8-2)

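This was a very simple grammar, and you can use outlines.generate.cfg to generate syntactically valid Python, SQL, and much more than this. Any kind of structured text, really. All you have to do is search for "X EBNF grammar" on the web, and take a look at the Outlines grammars module.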
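
Since every output is valid for the grammar, downstream code can parse it without defensive handling. As a rough illustration, here is a small hand-written recursive-descent evaluator for the arithmetic grammar above, using only the standard library. It is a sketch under assumptions: the `tokenize` and `evaluate` helpers are hypothetical, numbers are limited to plain integers and decimals (a simplification of the imported NUMBER terminal), and whitespace is rejected because the grammar does not declare it.

```python
import re

# NUMBER (simplified to integers and decimals) or a single operator / parenthesis.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[-+*/()]")


def tokenize(text):
    """Split text into tokens, rejecting anything the grammar does not allow."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def evaluate(text):
    """Parse text as an expression of the grammar and return its value."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expression():  # term (("+" | "-") term)*
        value = term()
        while peek() in ("+", "-"):
            value = value + term() if take() == "+" else value - term()
        return value

    def term():  # factor (("*" | "/") factor)*
        value = factor()
        while peek() in ("*", "/"):
            value = value * factor() if take() == "*" else value / factor()
        return value

    def factor():  # NUMBER | "-" factor | "(" expression ")"
        token = take()
        if token == "-":
            return -factor()
        if token == "(":
            value = expression()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token[0].isdigit():
            return float(token)
        raise ValueError(f"unexpected token {token!r}")

    value = expression()
    if peek() is not None:
        raise ValueError("unexpected trailing input")
    return value


print(evaluate("(8-2)"))
```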

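Open functions

Outlines can infer the structure of the output from the signature of a function. The result is a dictionary, and can be passed directly to the function using the usual dictionary expansion syntax **: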

import outlines


def add(a: int, b: int):
    return a + b

model = outlines.models.transformers("WizardLM/WizardMath-7B-V1.1")
generator = outlines.generate.json(model, add)
result = generator("Return json with two integers named a and b respectively. a is odd and b even.")

print(add(**result))
# 3

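A great advantage of passing functions directly to specify the structure is that the structure of the LLM output changes with the function's definition. There is no need to change the code in several places!

You can also embed various functions into an enum to generate params: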
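
To illustrate how a structure can be read off a signature, the sketch below builds a JSON-Schema-like dictionary from a function's annotations with the standard `inspect` module. It is an assumption-laden illustration, not the code Outlines runs: `signature_to_schema` and the `JSON_TYPES` table are hypothetical and only cover a handful of scalar types.

```python
import inspect

# Hypothetical mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def signature_to_schema(func):
    """Describe func's parameters as a JSON-Schema-like dictionary."""
    parameters = inspect.signature(func).parameters
    properties = {
        name: {"type": JSON_TYPES[parameter.annotation]}
        for name, parameter in parameters.items()
    }
    return {
        "title": func.__name__,
        "type": "object",
        "properties": properties,
        "required": list(properties),
    }


def add(a: int, b: int):
    return a + b


schema = signature_to_schema(add)
print(schema)

# A dictionary with exactly these keys can be expanded into the call.
print(add(**{"a": 1, "b": 2}))
```

Renaming a parameter or changing its annotation changes the derived schema automatically, which is the single-source-of-truth benefit described above.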

from enum import Enum
from functools import partial

import outlines


def add(a: int, b: int) -> int:
    return a + b

def mul(c: float, d: float) -> float:
    return c * d

# partial wraps each function so the Enum stores it as a member value, not a method
class Operation(Enum):
    add = partial(add)
    mul = partial(mul)

model = outlines.models.transformers("WizardLM/WizardMath-7B-V1.1")
# Pass the enum (not a single function) so the model can pick either signature
generator = outlines.generate.json(model, Operation)
result = generator("Return json with two float named c and d respectively. c is negative and d greater than 1.0.")

print(result)
# {'c': -3.14, 'd': 1.5}

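Prompting

Building prompts can get messy. Outlines makes it easier to write and manage prompts by encapsulating templates inside "template functions".

These functions make it possible to neatly separate the prompt logic from the general program logic; they can be imported from other modules and libraries.

Template functions require no superfluous abstraction; they use the Jinja2 templating engine to help build complex prompts in a concise manner: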

import outlines

examples = [
    ("The food was disgusting", "Negative"),
    ("We had a fantastic night", "Positive"),
    ("Recommended", "Positive"),
    ("The waiter was rude", "Negative")
]

@outlines.prompt
def labelling(to_label, examples):
    """You are a sentiment-labelling assistant.

    {% for example in examples %}
    {{ example[0] }} // {{ example[1] }}
    {% endfor %}
    {{ to_label }} //
    """

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
prompt = labelling("Just awesome", examples)
answer = outlines.generate.text(model)(prompt, max_tokens=100)
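
For readers unfamiliar with Jinja2, the sketch below builds a comparable few-shot prompt with plain Python string formatting. It is only an approximation of what the template produces: the function name `render_labelling_prompt` is hypothetical, and the exact whitespace of the rendered template may differ.

```python
def render_labelling_prompt(to_label, examples):
    """Build a few-shot prompt with one "text // label" line per example."""
    lines = ["You are a sentiment-labelling assistant.", ""]
    lines += [f"{text} // {label}" for text, label in examples]
    lines.append(f"{to_label} //")  # left open for the model to complete
    return "\n".join(lines)


examples = [
    ("The food was disgusting", "Negative"),
    ("We had a fantastic night", "Positive"),
    ("Recommended", "Positive"),
    ("The waiter was rude", "Negative"),
]

prompt = render_labelling_prompt("Just awesome", examples)
print(prompt)
```

The template-function version keeps the same logic inside the docstring, so the prompt text stays readable as the loop and conditions grow.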

Join us

Have an idea? Come chat with us on Discord.
Want to contribute? Consult our contribution guide.
Found a bug? Open an issue.

Cite Outlines

@article{willard2023efficient,
  title={Efficient Guided Generation for LLMs},
  author={Willard, Brandon T and Louf, R{\'e}mi},
  journal={arXiv preprint arXiv:2307.09702},
  year={2023}
}