llano
1.0.0
Let Large Language Models Serve As Data Annotators.
Zero-shot/few-shot information extractor.
stable
python -m pip install -U llano
For Chinese users, the index-url can be specified for a faster installation.
python -m pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -U llano
latest
python -m pip install git+https://github.com/SeanLee97/llano.git
Currently, supports Python3.8+
. Due to Python 3.7
's end-of-life on June 27, 2023, we no longer support it.
Supporting Tasks:
Task Name | Supporting Languages | Status |
---|---|---|
NER | English (EN), Simplifed Chinese (ZH_CN) | ? |
Text Classification (Binary, MultiClass) | English (EN), Simplifed Chinese (ZH_CN) | ? |
MultiLabel Classification | English (EN), Simplifed Chinese (ZH_CN) | ? |
Data Augmentation | English (EN), Simplifed Chinese (ZH_CN) | ? |
Relation Extraction | English (EN), Simplifed Chinese (ZH_CN) | ? |
Summarization | ||
Text to SQL |
from llano.config import Tasks, Languages, OpenAIModels, NERFormatter
from llano import GPTModel, GPTAnnotator
print('All Supported Tasks:', Tasks.list_attributes())
print('All Supported Languages:', Languages.list_attributes())
print('All Supported NERFormatter:', NERFormatter.list_attributes())
print('All Supported OpenAIModels:', OpenAIModels.list_attributes())
api_key = 'Your API Key'
model = GPTModel(api_key, model=OpenAIModels.ChatGPT)
annotator = GPTAnnotator(model,
task=Tasks.NER,
language=Languages.EN,
label_mapping={
"people": 'PEO',
'location': 'LOC',
'company': 'COM',
'organization': 'ORG',
'job': 'JOB'})
doc = '''Elon Reeve Musk FRS (/ˈiːlɒn/ EE-lon; born June 28, 1971) is a business magnate and investor. He is the founder, CEO and chief engineer of SpaceX; angel investor, CEO and product architect of Tesla, Inc.; owner and CEO of Twitter, Inc.; founder of The Boring Company; co-founder of Neuralink and OpenAI; and president of the philanthropic Musk Foundation. '''
# w/o hint, w/o formatted result
ret = annotator(doc)
# w/o hint, w/ formatted result
ret = annotator(doc, formatter=NERFormatter.BIO)
# w/ hint, w/ formatted result
ret = annotator(doc, hint='the entity type `job` is job title such as CEO, founder, boss.', formatter=NERFormatter.BIO)
result
is the annotation result. formatted_result
is the formatted result.
Tip: if you want to train your domain model, you can use the formatted result.
{
"request": {
"prompt": "You are a NER (Named-entity recognition) system, please help me with the NER task.nTask: extract the entities and corresponding entity types from a given sentence.nOnly support 5 entity types, including: people, location, company, organization, job.nnExplanation and examples: the entity type `job` is job title such as CEO, founder, boss.nnOutput format: (entity, entity_type).nnFollowing is the given sentence: Elon Reeve Musk FRS (/ˈiːlɒn/ EE-lon; born June 28, 1971) is a business magnate and investor. He is the founder, CEO and chief engineer of SpaceX; angel investor, CEO and product architect of Tesla, Inc.; owner and CEO of Twitter, Inc.; founder of The Boring Company; co-founder of Neuralink and OpenAI; and president of the philanthropic Musk Foundation. nOutput:"
},
"meta": {
"role": "assistant",
"prompt_tokens": 195,
"completion_tokens": 74,
"total_tokens": 269,
"taken_time": 4.87583
},
"response": "nn("Elon Reeve Musk", "people"), ("FRS", "job"), ("SpaceX", "company"), ("Tesla, Inc.", "company"), ("Twitter, Inc.", "company"), ("The Boring Company", "organization"), ("Neuralink", "organization"), ("OpenAI", "organization"), ("Musk Foundation", "organization")",
"result": {
"text": "Elon Reeve Musk FRS (/ˈiːlɒn/ EE-lon; born June 28, 1971) is a business magnate and investor. He is the founder, CEO and chief engineer of SpaceX; angel investor, CEO and product architect of Tesla, Inc.; owner and CEO of Twitter, Inc.; founder of The Boring Company; co-founder of Neuralink and OpenAI; and president of the philanthropic Musk Foundation. ",
"entities": [
[
0,
15,
"Elon Reeve Musk",
"PEO"
],
[
16,
19,
"FRS",
"JOB"
],
[
139,
145,
"SpaceX",
"COM"
],
[
192,
203,
"Tesla, Inc.",
"COM"
],
[
222,
235,
"Twitter, Inc.",
"COM"
],
[
248,
266,
"The Boring Company",
"ORG"
],
[
282,
291,
"Neuralink",
"ORG"
],
[
296,
302,
"OpenAI",
"ORG"
],
[
339,
354,
"Musk Foundation",
"ORG"
]
],
"formatted_result": "EtB-PEOnltI-PEOnotI-PEOnntI-PEOn tI-PEOnRtI-PEOnetI-PEOnetI-PEOnvtI-PEOnetI-PEOn tI-PEOnMtI-PEOnutI-PEOnstI-PEOnktI-PEOn tOnFtB-JOBnRtI-JOBnStI-JOBn tOn(tOn/tOnˈtOnitOnːtOnltOnɒtOnntOn/tOn tOnEtOnEtOn-tOnltOnotOnntOn;tOn tOnbtOnotOnrtOnntOn tOnJtOnutOnntOnetOn tOn2tOn8tOn,tOn tOn1tOn9tOn7tOn1tOn)tOn tOnitOnstOn tOnatOn tOnbtOnutOnstOnitOnntOnetOnstOnstOn tOnmtOnatOngtOnntOnatOnttOnetOn tOnatOnntOndtOn tOnitOnntOnvtOnetOnstOnttOnotOnrtOn.tOn tOnHtOnetOn tOnitOnstOn tOnttOnhtOnetOn tOnftOnotOnutOnntOndtOnetOnrtOn,tOn tOnCtOnEtOnOtOn tOnatOnntOndtOn tOnctOnhtOnitOnetOnftOn tOnetOnntOngtOnitOnntOnetOnetOnrtOn tOnotOnftOn tOnStB-COMnptI-COMnatI-COMnctI-COMnetI-COMnXtI-COMn;tOn tOnatOnntOngtOnetOnltOn tOnitOnntOnvtOnetOnstOnttOnotOnrtOn,tOn tOnCtOnEtOnOtOn tOnatOnntOndtOn tOnptOnrtOnotOndtOnutOnctOnttOn tOnatOnrtOnctOnhtOnitOnttOnetOnctOnttOn tOnotOnftOn tOnTtB-COMnetI-COMnstI-COMnltI-COMnatI-COMn,tI-COMn tI-COMnItI-COMnntI-COMnctI-COMn.tI-COMn;tOn tOnotOnwtOnntOnetOnrtOn tOnatOnntOndtOn tOnCtOnEtOnOtOn tOnotOnftOn tOnTtB-COMnwtI-COMnitI-COMnttI-COMnttI-COMnetI-COMnrtI-COMn,tI-COMn tI-COMnItI-COMnntI-COMnctI-COMn.tI-COMn;tOn tOnftOnotOnutOnntOndtOnetOnrtOn tOnotOnftOn tOnTtB-ORGnhtI-ORGnetI-ORGn tI-ORGnBtI-ORGnotI-ORGnrtI-ORGnitI-ORGnntI-ORGngtI-ORGn tI-ORGnCtI-ORGnotI-ORGnmtI-ORGnptI-ORGnatI-ORGnntI-ORGnytI-ORGn;tOn tOnctOnotOn-tOnftOnotOnutOnntOndtOnetOnrtOn tOnotOnftOn tOnNtB-ORGnetI-ORGnutI-ORGnrtI-ORGnatI-ORGnltI-ORGnitI-ORGnntI-ORGnktI-ORGn tOnatOnntOndtOn tOnOtB-ORGnptI-ORGnetI-ORGnntI-ORGnAtI-ORGnItI-ORGn;tOn tOnatOnntOndtOn tOnptOnrtOnetOnstOnitOndtOnetOnntOnttOn tOnotOnftOn tOnttOnhtOnetOn tOnptOnhtOnitOnltOnatOnntOnttOnhtOnrtOnotOnptOnitOnctOn tOnMtB-ORGnutI-ORGnstI-ORGnktI-ORGn tI-ORGnFtI-ORGnotI-ORGnutI-ORGnntI-ORGndtI-ORGnatI-ORGnttI-ORGnitI-ORGnotI-ORGnntI-ORGn.tOn tO"
},
}
from llano.config import Tasks, Languages, OpenAIModels, NERFormatter
from llano import GPTModel, GPTAnnotator
print('All Supported Tasks:', Tasks.list_attributes())
print('All Supported Languages:', Languages.list_attributes())
print('All Supported NERFormatter:', NERFormatter.list_attributes())
print('All Supported OpenAIModels:', OpenAIModels.list_attributes())
api_keys = ['Your API Keys']
model = GPTModel(api_keys, model=OpenAIModels.ChatGPT)
annotator = GPTAnnotator(model,
task=Tasks.NER,
language=Languages.ZH_CN,
label_mapping={
'人名': 'PEO',
'地名': 'LOC',
'公司名': 'COM',
'机构名': 'ORG',
'身份': 'ID'})
doc = '''埃隆·里夫·马斯克(Elon Reeve Musk) [107] ,1971年6月28日出生于南非的行政首都比勒陀利亚,企业家、工程师、慈善家、美国国家工程院院士。他同时兼具南非、加拿大和美国三重国籍。埃隆·马斯克本科毕业于宾夕法尼亚大学,获经济学和物理学双学位。1995年至2002年,马斯克与合伙人先后办了三家公司,分别是在线内容出版软件“Zip2”、电子支付“X.com”和“PayPal”。'''
ret = annotator(doc) # w/o hint, w/o formatter
ret = annotator(doc, formatter=NERFormatter.BIO) # w/o hint, w/ formatter
ret = annotator(doc, hint='身份表示从事职位的头衔或社会地位等,如:老板,董事长,作家,理事长等', formatter=NERFormatter.BIO) # w/o hint, w/ formatter
{
"request": {
"prompt": "你是一个 NER 系统,请帮我完成中文 NER 任务。n任务要求如下:找到句子中的实体,并返回实体及实体类型。n支持的实体类型仅限5类:人名、地名、公司名、机构名、身份。nn解释及示例:身份表示从事职位的头衔或社会地位等,如:老板,董事长,作家,理事长等nn输出格式要求:(实体, 实体类型)。nn以下是输入句子:埃隆·里夫·马斯克(Elon Reeve Musk) [107] ,1971年6月28日出生于南非的行政首都比勒陀利亚,企业家、工程师、慈善家、美国国家工程院院士。他同时兼具南非、加拿大和美国三重国籍。埃隆·马斯克本科毕业于宾夕法尼亚大学,获经济学和物理学双学位。1995年至2002年,马斯克与合伙人先后办了三家公司,分别是在线内容出版软件“Zip2”、电子支付“X.com”和“PayPal”。n输出:"
},
"meta": {
"role": "assistant",
"prompt_tokens": 346,
"completion_tokens": 103,
"total_tokens": 449,
"taken_time": 4.54531
},
"response": "('埃隆·里夫·马斯克', '人名'), ('南非', '地名'), ('比勒托利亚', '地名'), ('美国国家工程院院士', '身份'), ('宾夕法尼亚大学', '机构名'), ('Zip2', '公司名'), ('X.com', '公司名'), ('PayPal', '公司名')",
"result": {
"text": "埃隆·里夫·马斯克(Elon Reeve Musk) [107] ,1971年6月28日出生于南非的行政首都比勒陀利亚,企业家、工程师、慈善家、美国国家工程院院士。他同时兼具南非、加拿大和美国三重国籍。埃隆·马斯克本科毕业于宾夕法尼亚大学,获经济学和物理学双学位。1995年至2002年,马斯克与合伙人先后办了三家公司,分别是在线内容出版软件“Zip2”、电子支付“X.com”和“PayPal”。",
"entities": [
[
0,
9,
"埃隆·里夫·马斯克",
"PEO"
],
[
48,
50,
"南非",
"LOC"
],
[
73,
82,
"美国国家工程院院士",
"ID"
],
[
88,
90,
"南非",
"LOC"
],
[
113,
120,
"宾夕法尼亚大学",
"ORG"
],
[
173,
177,
"Zip2",
"COM"
],
[
184,
189,
"X.com",
"COM"
],
[
192,
198,
"PayPal",
"COM"
]
],
"formatted_result": "埃tB-PEOn隆tI-PEOn·tI-PEOn里tI-PEOn夫tI-PEOn·tI-PEOn马tI-PEOn斯tI-PEOn克tI-PEOn(tOnEtOnltOnotOnntOn tOnRtOnetOnetOnvtOnetOn tOnMtOnutOnstOnktOn)tOn tOn[tOn1tOn0tOn7tOn]tOn tOn tOn,tOn1tOn9tOn7tOn1tOn年tOn6tOn月tOn2tOn8tOn日tOn出tOn生tOn于tOn南tB-LOCn非tI-LOCn的tOn行tOn政tOn首tOn都tOn比tOn勒tOn陀tOn利tOn亚tOn,tOn企tOn业tOn家tOn、tOn工tOn程tOn师tOn、tOn慈tOn善tOn家tOn、tOn美tB-IDn国tI-IDn国tI-IDn家tI-IDn工tI-IDn程tI-IDn院tI-IDn院tI-IDn士tI-IDn。tOn他tOn同tOn时tOn兼tOn具tOn南tB-LOCn非tI-LOCn、tOn加tOn拿tOn大tOn和tOn美tOn国tOn三tOn重tOn国tOn籍tOn。tOn埃tOn隆tOn·tOn马tOn斯tOn克tOn本tOn科tOn毕tOn业tOn于tOn宾tB-ORGn夕tI-ORGn法tI-ORGn尼tI-ORGn亚tI-ORGn大tI-ORGn学tI-ORGn,tOn获tOn经tOn济tOn学tOn和tOn物tOn理tOn学tOn双tOn学tOn位tOn。tOn1tOn9tOn9tOn5tOn年tOn至tOn2tOn0tOn0tOn2tOn年tOn,tOn马tOn斯tOn克tOn与tOn合tOn伙tOn人tOn先tOn后tOn办tOn了tOn三tOn家tOn公tOn司tOn,tOn分tOn别tOn是tOn在tOn线tOn内tOn容tOn出tOn版tOn软tOn件tOn“tOnZtB-COMnitI-COMnptI-COMn2tI-COMn”tOn、tOn电tOn子tOn支tOn付tOn“tOnXtB-COMn.tI-COMnctI-COMnotI-COMnmtI-COMn”tOn和tOn“tOnPtB-COMnatI-COMnytI-COMnPtI-COMnatI-COMnltI-COMn”tOn。tO"
}
}
WIP
Contributions are always welcome!
Welcome to join our community!