v1.0

PropertyExtractor: An Open-Source Conversational LLM-Based Tool

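Introduction

The advent of natural language processing and large language models (LLMs) has transformed how data is extracted from unstructured scholarly papers. Ensuring that the extracted data is trustworthy, however, remains a significant challenge. PropertyExtractor is an open-source tool that leverages advanced conversational LLMs such as Google Gemini Pro and OpenAI GPT-4, blends zero-shot with few-shot in-context learning, and uses engineered prompts to dynamically refine structured information hierarchies. Together these enable autonomous, efficient, scalable, and accurate identification, extraction, and verification of material property data for building material property databases.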

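Features

- Advanced LLM Integration: Supports both Google Gemini Pro and OpenAI GPT-4.
- Zero-shot and Few-shot Learning: Blends zero-shot with few-shot in-context learning to improve extraction accuracy.
- Engineered Prompts: Dynamically refine structured information hierarchies.
- Autonomous Extraction: Identifies and extracts material properties efficiently and at scale.
- High Precision and Recall: Achieves over 90% precision and recall, with an error rate of approximately 10%.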

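Installation

PropertyExtractor offers straightforward installation options to suit different user preferences, as described below. In every option, all libraries and dependencies are determined and installed automatically alongside the PropertyExtractor executable "propextract".

Using pip: The recommended way to install the PropertyExtractor package is with pip.
- Install the latest version of the PropertyExtractor package with pip by running: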
```
 pip install -U propertyextract
```
From source code:
- Alternatively, download the source code with:
```
 git clone git@github.com:gmp007/PropertyExtractor.git
```
- Then change into the master directory and install PropertyExtractor by running:
```
 pip install .
```
Installation via setup.py:
- PropertyExtractor can also be installed using the setup.py script:
```
 python setup.py install [--prefix=/path/to/install/]
```
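- The optional --prefix argument is useful when installing in environments such as shared high-performance computing (HPC) systems, where administrative privileges may be restricted.
- This method is still supported but is gradually being phased out in favor of more modern installation practices. It is recommended only where standard installation methods such as pip are not applicable.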
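
To confirm that an installation succeeded, a small Python check along the following lines may help. It is only a sketch: it assumes the distribution name `propertyextract` from the pip command above and the executable name `propextract` used in the usage section, and the helper name `installation_status` is made up for illustration.

```python
import shutil
from importlib import metadata


def installation_status(dist_name="propertyextract", exe_name="propextract"):
    """Return (installed version or None, executable path or None).

    Both default names are assumptions taken from this README.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # shutil.which returns None when the executable is not on PATH.
    return version, shutil.which(exe_name)


if __name__ == "__main__":
    version, exe = installation_status()
    print("package version:", version or "not installed")
    print("executable:", exe or "not found on PATH")
```

If the package is reported as installed but the executable is not found, the install location (for example one chosen with --prefix) is probably not on PATH.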

Usage

Configuration

Do not expose your API keys. Before running PropertyExtractor, set the API keys for Google Gemini Pro and OpenAI GPT-4 as environment variables.

On Linux/macOS

 export GPT4_API_KEY='your_gpt4_api_key_here'
 export GEMINI_PRO_API_KEY='your_gemini_pro_api_key_here'

On Windows

 set GPT4_API_KEY=your_gpt4_api_key_here
 set GEMINI_PRO_API_KEY=your_gemini_pro_api_key_here
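
Before starting a long extraction run, it can be worth checking that the keys are actually visible to the process. The sketch below reads the two environment variables named above; the mapping from the `model_type` values (`gemini`, `chatgpt`) to a variable name is an assumption made for illustration, not something taken from the PropertyExtractor source, and `missing_api_keys` is a hypothetical helper.

```python
import os

# Assumed mapping from model_type (as used in extract.in) to the variable it needs.
REQUIRED_KEYS = {"chatgpt": "GPT4_API_KEY", "gemini": "GEMINI_PRO_API_KEY"}


def missing_api_keys(env=None, model_types=("gemini", "chatgpt")):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [
        REQUIRED_KEYS[m]
        for m in model_types
        if not env.get(REQUIRED_KEYS[m], "").strip()
    ]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        # Only variable names are printed, never the key values themselves.
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All API keys are set.")
```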

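PropertyExtractor Usage and Running

PropertyExtractor is easy to run. The main steps to initialize PropertyExtractor are as follows:

Generate Unstructured Data: Use an API to obtain, from your chosen publisher, the texts covering the material property for which you want to build a database. We have written API functions for Elsevier's ScienceDirect API, the CrossRef REST API, and the PubMed API, and can share some of these on request.

Create a Calculation Directory:

- Start by creating a directory for your calculations.
- Run propextract -0 to generate extract.in, the main input template of PropertyExtractor. Modify it following the detailed instructions it contains.

Optional files are also generated: 'additionalprompt.txt', for supplying additional custom prompts, and 'keywords.json', for custom additional keywords that support the primary keywords. Modify them to suit the material property being extracted. The default input template extract.in looks like this: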
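
As an illustration of this step, the sketch below queries the public CrossRef REST API for works that have an abstract and writes them to a CSV file with a `Text` column, matching the `column_name` used in the default input template. It is not the authors' API code: the function names, the `DOI` and `Title` columns, and the tag-stripping approach are all assumptions.

```python
import csv
import json
import re
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works"


def fetch_crossref_items(query, rows=20):
    """Query CrossRef for works matching `query` that include an abstract."""
    params = urllib.parse.urlencode(
        {"query": query, "rows": rows, "filter": "has-abstract:true"}
    )
    with urllib.request.urlopen(f"{CROSSREF_WORKS}?{params}", timeout=30) as resp:
        return json.load(resp)["message"]["items"]


def items_to_rows(items):
    """Keep DOI, title, and abstract text; CrossRef abstracts carry JATS XML tags."""
    rows = []
    for item in items:
        abstract = re.sub(r"<[^>]+>", " ", item.get("abstract", ""))
        abstract = " ".join(abstract.split())  # collapse whitespace
        if abstract:
            title = (item.get("title") or [""])[0]  # CrossRef titles are lists
            rows.append({"DOI": item.get("DOI", ""), "Title": title, "Text": abstract})
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file with DOI, Title, and Text columns."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["DOI", "Title", "Text"])
        writer.writeheader()
        writer.writerows(rows)
```

A typical call would be `write_csv(items_to_rows(fetch_crossref_items("2D material thickness")), "abstracts.csv")`, where the query string and output file name are placeholders. Publisher APIs such as ScienceDirect additionally require their own API keys and have their own terms of use.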

 ###############################################################################
 ### The input file to control the calculation details of PropertyExtract    ###
 ###############################################################################
 # Type of LLM model: gemini/chatgpt 
 model_type = gemini
 # LLM model name: gemini-pro/gpt-4
 model_name = gemini-pro
 # Property to extract from texts
 property = thickness
 # Harmonized unit for the property to be extracted
 property_unit = Angstrom
 # temperature to max_output_tokens are LLM model parameters
 temperature = 0.0
 top_p = 0.95
 max_output_tokens = 80
 # You can supply additional keywords to be used in conjunction with the property: modify the file keywords.json
 use_keywords = True
 # You can add additional custom prompts: modify the file additionalprompt.txt
 additional_prompts = additionalprompt.txt
 # Name of input file to be processed: csv/excel format
 inputfile_name = 2Dthickness_Elsevier.csv
 # Column name in the input file to be processed
 column_name = Text
 # Name of output file
 outputfile_name = ppt_test
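
The template is a plain list of `key = value` lines with `#` comments, so it is easy to inspect from a script. The following is a minimal sketch of reading it into a dictionary; it is not PropertyExtractor's own parser, and the function names and type-coercion rules are assumptions made for illustration.

```python
def _coerce(value):
    """Turn 'True'/'False' into bool and numeric strings into int or float."""
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_extract_in(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comment lines."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without '='
        options[key.strip()] = _coerce(value.strip())
    return options


if __name__ == "__main__":
    sample = "model_type = gemini\ntemperature = 0.0\nuse_keywords = True\n"
    print(parse_extract_in(sample))
```

Reading the file this way makes it straightforward to sanity-check a configuration, for example that `model_name` matches `model_type`, before launching a run.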

Initialize the Job:
- Run propextract to begin the calculation process.
Understanding PropertyExtractor Options:
- The main input file extract.in includes descriptive text for each flag, making it user-friendly.
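
Since a run depends on `inputfile_name` and `column_name` agreeing with the actual data file, a quick pre-flight check can catch mismatches early. The sketch below covers CSV input only (Excel files would need a third-party reader); `check_input_csv` is a hypothetical helper and not part of PropertyExtractor.

```python
import csv
from pathlib import Path


def check_input_csv(path, column_name):
    """Return the number of non-empty entries in `column_name`.

    Raises FileNotFoundError if the file is missing and KeyError if the
    column is absent from the header row.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"input file not found: {path}")
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        if column_name not in (reader.fieldnames or []):
            raise KeyError(f"column {column_name!r} not in {reader.fieldnames}")
        return sum(1 for row in reader if (row[column_name] or "").strip())
```

With the default template, this would be called as `check_input_csv("2Dthickness_Elsevier.csv", "Text")` from inside the calculation directory.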

Citing PropertyExtractor

If you have used the PropertyExtractor package in your research, please cite:

Dynamic In-context Learning with Conversational Models for Data Extraction and Materials Property Prediction

@article{Ekuma2024,
  title = {Dynamic In-context Learning with Conversational Models for Data Extraction and Materials Property Prediction},
  journal = {XXX},
  volume = {xx},
  pages = {xx},
  year = {xx},
  doi = {xx},
  url = {xx},
  author = {Chinedu Ekuma}
}

@misc{PropertyExtractor,
  author = {Chinedu Ekuma},
  title = {PropertyExtractor -- LLM-based model to extract material property from unstructured dataset},
  year = {2024},
  howpublished = {\url{https://github.com/gmp007/PropertyExtractor}},
  note = {Open-source tool leveraging LLMs like Google Gemini Pro and OpenAI GPT-4 for material property extraction},
}