evostra
1.0.0
Evolution Strategy (ES) is an optimization technique based on ideas of adaptation and evolution. You can learn more about it at https://blog.openai.com/evolution-strategies/
It is compatible with both python2 and python3.
Install from source:
$ python setup.py install
Install the latest version from the git repository using pip:
$ pip install git+https://github.com/alirezamika/evostra.git
Install from PyPI:
$ pip install evostra
(In case of python3, you may need to use python3 and pip3 instead.)
Sample usages:
An AI agent learning to play flappy bird using evostra
An AI agent learning to walk using evostra
The input weights of the EvolutionStrategy module are a list of arrays (one array of any shape for each layer of the neural network), so we can use any framework to build the model and just pass the weights to ES.
For example, we could build the model with Keras and pass its weights to ES, but here we use Evostra's built-in model FeedForwardNetwork, which is much faster for our use case:
import numpy as np
from evostra import EvolutionStrategy
from evostra.models import FeedForwardNetwork
# A feed forward neural network with input size of 5, two hidden layers of size 4 and output of size 3
model = FeedForwardNetwork(layer_sizes=[5, 4, 4, 3])
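As mentioned above, any framework whose models expose their weights as a list of arrays will work. Here is a minimal sketch of the Keras route (an illustration, not part of the example below; it assumes TensorFlow/Keras is installed, the layer sizes mirror the example, and the activation choice is arbitrary):

from tensorflow import keras

# hypothetical Keras equivalent: 5 inputs, two hidden layers of size 4, 3 outputs
keras_model = keras.Sequential([
    keras.Input(shape=(5,)),
    keras.layers.Dense(4, activation="tanh"),
    keras.layers.Dense(4, activation="tanh"),
    keras.layers.Dense(3),
])

# Keras already speaks ES's format: get_weights() returns a list of numpy arrays,
# so these weights could be passed to EvolutionStrategy directly, with
# keras_model.set_weights(...) used inside get_reward instead of model.set_weights(...)
initial_weights = keras_model.get_weights()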
Now we define our get_reward function:
solution = np.array([0.1, -0.4, 0.5])
inp = np.asarray([1, 2, 3, 4, 5])

def get_reward(weights):
    global solution, model, inp
    model.set_weights(weights)
    prediction = model.predict(inp)
    # here our best reward is zero
    reward = -np.sum(np.square(solution - prediction))
    return reward
Now we can build the EvolutionStrategy object and run it for some iterations:
# if your task is computationally expensive, you can use num_threads > 1 to use multiple processes;
# if you set num_threads=-1, it will use the number of cores available on the machine; here we use 1 process as the
# task is not computationally expensive and using more processes would decrease performance due to the IPC overhead.
es = EvolutionStrategy(model.get_weights(), get_reward, population_size=20, sigma=0.1, learning_rate=0.03, decay=0.995, num_threads=1)
es.run(1000, print_step=100)
Here's the output:
iter 100. reward: -68.819312
iter 200. reward: -0.218466
iter 300. reward: -0.110204
iter 400. reward: -0.001901
iter 500. reward: -0.000459
iter 600. reward: -0.000287
iter 700. reward: -0.000939
iter 800. reward: -0.000504
iter 900. reward: -0.000522
iter 1000. reward: -0.000178
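For intuition, each of those iterations in this flavor of ES (as described in the OpenAI post linked above) samples Gaussian perturbations of the weights, scores every perturbed candidate with the reward function, and steps along the reward-weighted average of the noise. Below is a simplified sketch of one such update for a single weight array; it is illustrative, not evostra's actual implementation:

import numpy as np

def es_step(weights, reward_fn, population_size=20, sigma=0.1, learning_rate=0.03):
    # sample one Gaussian perturbation per population member
    noise = [np.random.randn(*weights.shape) for _ in range(population_size)]
    # score each perturbed candidate
    rewards = np.array([reward_fn(weights + sigma * n) for n in noise])
    # normalize rewards, then move along the reward-weighted average noise direction
    normalized = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    step = sum(r * n for r, n in zip(normalized, noise)) / (population_size * sigma)
    return weights + learning_rate * step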
Now we have the optimized weights and we can update our model:
optimized_weights = es.get_weights()
model.set_weights(optimized_weights)
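The model should now map inp close to solution; as a quick check using the objects defined above:

print(model.predict(inp))  # expected to be close to [0.1, -0.4, 0.5]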