Download tpot - download the tpot source code

To try TPOT2 (alpha), please go here!

TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

TPOT Demo

TPOT automates the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.
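To make the idea of searching over candidate pipelines concrete, here is a toy sketch of a mutate-and-select loop. It is not TPOT's tree-based genetic programming: the search space, the `toy_score` function, and every name below are hypothetical stand-ins, and a real search would fit and cross-validate a model where `toy_score` is called.

```python
import random

# Hypothetical search space: each "pipeline" is just a pair of settings.
SEARCH_SPACE = {
    "degree": [1, 2, 3],
    "n_estimators": [10, 50, 100, 200],
}


def random_pipeline(rng):
    """Draw one candidate configuration at random."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def mutate(pipeline, rng):
    """Copy the candidate and re-draw one randomly chosen setting."""
    child = dict(pipeline)
    name = rng.choice(sorted(SEARCH_SPACE))
    child[name] = rng.choice(SEARCH_SPACE[name])
    return child


def toy_score(pipeline):
    """Made-up stand-in for a cross-validation score (higher is better).

    It simply peaks at degree=2, n_estimators=100.
    """
    return -abs(pipeline["degree"] - 2) - abs(pipeline["n_estimators"] - 100) / 100


def evolve(generations=5, population_size=10, seed=42):
    """Keep the better half each generation and refill it with mutants."""
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=toy_score, reverse=True)
        survivors = population[: population_size // 2]
        n_children = population_size - len(survivors)
        children = [mutate(rng.choice(survivors), rng) for _ in range(n_children)]
        population = survivors + children
        best_per_generation.append(max(toy_score(p) for p in population))
    return max(population, key=toy_score), best_per_generation


best, history = evolve()
print(best, history)
```

Because the best candidates always survive into the next generation, the best score in `history` never decreases from one generation to the next.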

An example machine learning pipeline

Once TPOT is finished searching (or you get tired of waiting), it provides you with the Python code for the best pipeline it found so you can tinker with the pipeline from there.

An example TPOT pipeline

TPOT is built on top of scikit-learn, so all of the code it generates should look familiar... if you're familiar with scikit-learn, anyway.

TPOT is still under active development, and we encourage you to check back on this repository regularly for updates.

For further information about TPOT, please see the project documentation.

License

Please see the repository license for the licensing and usage information for TPOT.

Generally, we have licensed TPOT to make it as widely usable as possible.

Installation

We maintain the TPOT installation instructions in the documentation. TPOT requires a working installation of Python.

Usage

TPOT can be used on the command line or with Python code.

Click on the corresponding links to find more information on TPOT usage in the documentation.

Examples

Classification

Below is a minimal working example with the optical recognition of handwritten digits dataset.

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_digits_pipeline.py')

Running this code should discover a pipeline that achieves about 98% testing accuracy, and the corresponding Python code should be exported to the tpot_digits_pipeline.py file and look similar to the following:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import PolynomialFeatures
from tpot.builtins import StackingEstimator
from tpot.export_utils import set_param_recursive

# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=42)

# Average CV score on the training set was: 0.9799428471757372
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    StackingEstimator(estimator=LogisticRegression(C=0.1, dual=False, penalty="l1")),
    RandomForestClassifier(bootstrap=True, criterion="entropy", max_features=0.35000000000000003, min_samples_leaf=20, min_samples_split=19, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
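The exported script above expects a data file whose outcome column is named 'target', and it reads every column as `np.float64`, so all values must parse as numbers. As a minimal sketch of preparing such a file with only the standard library, the helper below renames an existing outcome column in the header. The function name `rename_outcome_column` and the sample column name `label` are hypothetical and not part of TPOT.

```python
import csv
import io


def rename_outcome_column(csv_text, outcome_column, sep=","):
    """Return the CSV text with `outcome_column` renamed to 'target'."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=sep))
    header = rows[0]
    if outcome_column not in header:
        raise ValueError(f"column {outcome_column!r} not found in header {header}")
    # Only the header row changes; data rows are written back untouched.
    rows[0] = ["target" if name == outcome_column else name for name in header]
    out = io.StringIO()
    csv.writer(out, delimiter=sep, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical two-row data set whose outcome column is called "label".
raw = "f1,f2,label\n0.1,0.2,1\n0.3,0.4,0\n"
print(rename_outcome_column(raw, "label"))
```

The same `sep` value would then be passed in place of 'COLUMN_SEPARATOR' in the exported script.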

Regression

Similarly, TPOT can optimize pipelines for regression problems. Below is a minimal working example with the Boston housing prices data set.

from tpot import TPOTRegressor
# Note: load_boston was removed in scikit-learn 1.2, so this example
# needs an older scikit-learn release.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
                                                    train_size=0.75, test_size=0.25, random_state=42)

tpot = TPOTRegressor(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_boston_pipeline.py')

This should yield a pipeline that achieves a mean squared error (MSE) of about 12.77, and the Python code in tpot_boston_pipeline.py should look similar to:

import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from tpot.export_utils import set_param_recursive

# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=42)

# Average CV score on the training set was: -10.812040755234403
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    ExtraTreesRegressor(bootstrap=False, max_features=0.5, min_samples_leaf=2, min_samples_split=3, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
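The regression example quotes an MSE of about 12.77, while the comment in the exported file reports a negative CV score of about -10.81. The negative sign is consistent with a negated-MSE convention (as in scikit-learn's 'neg_mean_squared_error' scorer), where the error is negated so that a higher score is always better. The sketch below shows both quantities in plain Python; the function names and the sample numbers are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def neg_mean_squared_error(y_true, y_pred):
    """Negated MSE, so that larger values mean better predictions."""
    return -mean_squared_error(y_true, y_pred)


# Made-up targets and predictions, not taken from the Boston data set.
y_true = [24.0, 21.5, 30.0, 18.0]
y_pred = [22.0, 21.5, 33.0, 19.0]

mse = mean_squared_error(y_true, y_pred)
print(mse)                                     # (4 + 0 + 9 + 1) / 4 = 3.5
print(neg_mean_squared_error(y_true, y_pred))  # -3.5
```

Under this convention, a score of -10.81 corresponds to an MSE of 10.81 on the cross-validation folds.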

Check the documentation for more examples and tutorials.

Contributing to TPOT

We welcome you to check the existing issues for bugs or enhancements to work on. If you have an idea for an extension to TPOT, please file a new issue so we can discuss it.

Before submitting any contributions, please review our contribution guidelines.

Having problems or have questions about TPOT?

Please check the existing open and closed issues to see if your issue has already been attended to. If it hasn't, file a new issue on this repository so we can review your issue.

Citing TPOT

If you use TPOT in a scientific publication, please consider citing at least one of the following papers:

Trang T. Le, Weixuan Fu and Jason H. Moore (2020). Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics. 36(1): 250-256.

BibTeX entry:

@article{le2020scaling,
  title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
  author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
  journal={Bioinformatics},
  volume={36},
  number={1},
  pages={250--256},
  year={2020},
  publisher={Oxford University Press}
}

Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore (2016). Automating biomedical data science through tree-based pipeline optimization. Applications of Evolutionary Computation, pages 123-137.

BibTeX entry:

@inbook{Olson2016EvoBio,
    author={Olson, Randal S. and Urbanowicz, Ryan J. and Andrews, Peter C. and Lavender, Nicole A. and Kidd, La Creis and Moore, Jason H.},
    editor={Squillero, Giovanni and Burelli, Paolo},
    chapter={Automating Biomedical Data Science Through Tree-Based Pipeline Optimization},
    title={Applications of Evolutionary Computation: 19th European Conference, EvoApplications 2016, Porto, Portugal, March 30 -- April 1, 2016, Proceedings, Part I},
    year={2016},
    publisher={Springer International Publishing},
    pages={123--137},
    isbn={978-3-319-31204-0},
    doi={10.1007/978-3-319-31204-0_9},
    url={http://dx.doi.org/10.1007/978-3-319-31204-0_9}
}

Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore (2016). Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. Proceedings of GECCO 2016, pages 485-492.

BibTeX entry:

@inproceedings{OlsonGECCO2016,
    author={Olson, Randal S. and Bartley, Nathan and Urbanowicz, Ryan J. and Moore, Jason H.},
    title={Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science},
    booktitle={Proceedings of the Genetic and Evolutionary Computation Conference 2016},
    series={GECCO '16},
    year={2016},
    isbn={978-1-4503-4206-3},
    location={Denver, Colorado, USA},
    pages={485--492},
    numpages={8},
    url={http://doi.acm.org/10.1145/2908812.2908918},
    doi={10.1145/2908812.2908918},
    acmid={2908918},
    publisher={ACM},
    address={New York, NY, USA},
}

Alternatively, you can cite the repository directly with the following DOI:

Support for TPOT

TPOT was developed in the Computational Genetics Lab at the University of Pennsylvania with funding from the NIH under grant R01 AI117694. We are incredibly grateful for the support of the NIH and the University of Pennsylvania during the development of this project.

The TPOT logo was designed by Todd Newmuis, who generously donated his time to the project.
