Overview

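ELF is an extensive, lightweight and flexible platform for game research, particularly suited to real-time strategy (RTS) games. On the C++ side, ELF runs many games in parallel with C++ threads. On the Python side, ELF returns a batch of game states at a time, which makes it very friendly to modern RL. By contrast, other platforms (e.g., OpenAI Gym) wrap a single game instance with one Python interface. This makes concurrent game execution, which many modern reinforcement learning algorithms require, somewhat complicated.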

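In addition, ELF now provides a Python version that runs concurrent game environments using Python multiprocessing and ZeroMQ inter-process communication. See ./ex_elfpy.py for a simple example.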

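For RTS game research, ELF comes with a fast RTS engine and three concrete environments: MiniRTS, Capture the Flag and Tower Defense. MiniRTS has all the key dynamics of a real-time strategy game, including gathering resources, building facilities and troops, scouting unknown territory outside the perceivable region, and defending against or attacking the enemy. Users can access its internal representation and freely change the game settings.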

ELF has the following features:

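End-to-end: ELF offers an end-to-end solution for game research. It provides a miniature real-time strategy game environment, concurrent simulation, intuitive APIs and web-based visualization, and it also comes with a reinforcement learning backend powered by PyTorch, all with minimal resource requirements.
Extensive: Any game with a C/C++ interface can be plugged into this framework by writing a simple wrapper. For example, we have incorporated Atari games into our framework and shown that the per-core simulation speed is comparable to the single-core version, which makes it much faster than implementations based on multiprocessing or Python multithreading. In the future, we plan to incorporate more environments, e.g., the DarkForest Go engine.
Lightweight: ELF runs very fast with minimal overhead. ELF with a simple game built on the RTS engine (MiniRTS) runs at 40K frames per second per core on a MacBook Pro. Training a model from scratch to play MiniRTS takes one day on 6 CPUs + 1 GPU.
Flexible: The pairing between environments and actors is very flexible, e.g., one environment with one agent (e.g., vanilla A3C), one environment with multiple agents (e.g., self-play/MCTS), or multiple environments with one actor (e.g., BatchA3C, GA3C). Moreover, any game built on top of the RTS engine has full access to its internal representation and dynamics. Besides the efficient simulator, we also provide a lightweight yet powerful reinforcement learning framework that can host most existing RL algorithms. In this open-source release, we provide state-of-the-art actor-critic algorithms written in PyTorch.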

Tutorial


http://yuandong-tian.com/elf-tutorial/tutorial.html

Installation script

You need cmake >= 3.8, gcc >= 4.9 and tbb (libtbb-dev on Linux) to run this script successfully.

# Download miniconda and install. 
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O $HOME/miniconda.sh
/bin/bash $HOME/miniconda.sh -b
$HOME/miniconda3/bin/conda update -y --all python=3

# Add the following to ~/.bash_profile (if you haven't already) and source it:
export PATH=$HOME/miniconda3/bin:$PATH

# Create a new conda environment and install the necessary packages:
conda create -n elf python=3
source activate elf
# If you use cuda 8.0
# conda install pytorch cuda80 -c soumith
conda install pytorch -c soumith 

pip install --upgrade pip
pip install msgpack_numpy
conda install tqdm
conda install libgcc

# Install cmake >= 3.8, gcc >= 4.9 and libtbb-dev
# This is platform-dependent.

# Clone and build the repository:
cd ~
git clone https://github.com/facebookresearch/ELF
cd ELF/rts/
mkdir build && cd build
cmake .. -DPYTHON_EXECUTABLE=$HOME/miniconda3/bin/python
make

# Train the model
cd ../..
sh ./train_minirts.sh --gpu 0
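Before running cmake, it can help to confirm the toolchain meets the minimums above (cmake >= 3.8, gcc >= 4.9). A plain string comparison gets this wrong, since "3.10" sorts before "3.8", so the check below compares numeric components. This is a hypothetical helper, not part of the ELF repository; it assumes `cmake --version` and `gcc -dumpversion` print a dotted version number on their first output line.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract the first dotted version number (e.g. '3.10.2') as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def version_at_least(found, required):
    """Compare versions numerically, padding the shorter tuple with zeros."""
    a, b = parse_version(found), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def tool_version(cmd):
    """Return the first output line of `cmd`, or None if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    out = result.stdout or result.stderr
    return out.splitlines()[0] if out else None


if __name__ == "__main__":
    # Minimum versions stated in this README (assumed invocation for each tool).
    requirements = {
        "cmake": (["cmake", "--version"], "3.8"),
        "gcc": (["gcc", "-dumpversion"], "4.9"),
    }
    for name, (cmd, minimum) in requirements.items():
        line = tool_version(cmd)
        if line is None:
            print(f"{name}: not found (need >= {minimum})")
        elif version_at_least(line, minimum):
            print(f"{name}: {line.strip()} OK")
        else:
            print(f"{name}: {line.strip()} is too old, need >= {minimum}")
```

Run it with the Python from the elf environment; libtbb-dev is not checked here because how it is installed is platform-dependent.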

Supported environments

Any game with a C/C++ interface can be plugged into this framework by writing a simple wrapper. Currently we have the following environments:

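MiniRTS and its extensions (./rts): A miniature real-time strategy game that captures the key dynamics of its genre, including building workers, gathering resources, exploring unseen territory, defending against enemies and counterattacking them. The game runs extremely fast (40K FPS per core on a laptop) to facilitate the use of many existing on-policy reinforcement learning methods.
Atari games (./atari): We incorporated the Arcade Learning Environment (ALE) into ELF so that you can load any ROM and easily run 1000 concurrent game instances.
Go engine (./go): We reimplemented our DarkForest Go engine on the ELF platform. You can now easily load a set of .sgf files and train your own Go AI with minimal resources (i.e., a single GPU plus one week).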

Reference

If you use ELF, please cite the paper with the following BibTeX entry:

ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
Yuandong Tian, Qucheng Gong, Wenling Shang, Yuxin Wu, C. Lawrence Zitnick
NIPS 2017

@article{tian2017elf, 
  title={ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games},
  author={Yuandong Tian and Qucheng Gong and Wenling Shang and Yuxin Wu and C. Lawrence Zitnick},
  journal={Advances in Neural Information Processing Systems (NIPS)},
  year={2017}
}

Documentation

Detailed documentation is available. You can also build your own copy from ./doc using sphinx.

Basic usage

ELF is very easy to use. Initialization looks like this:

# We run 1024 games concurrently.
num_games = 1024

# Wait for a batch of 256 games.
batchsize = 256  

# The return states contain key 's', 'r' and 'terminal'
# The reply contains key 'a' to be filled from the Python side.
# The definitions of the keys are in the wrapper of the game.  
input_spec = dict(s='', r='', terminal='')
reply_spec = dict(a='')

context = Init(num_games, batchsize, input_spec, reply_spec)

The main loop is also simple:

# Start all game threads and enter main loop.
context.Start()
try:
    while True:
        # Wait for a batch of game states to be ready.
        # These games will be blocked, waiting for replies.
        batch = context.Wait()

        # Apply a model to the game state. The output has key 'pi'.
        # You can do whatever you want here, e.g., apply your favorite RL algorithm.
        output = model(batch)

        # Sample from the output to get the actions of this batch.
        # `reply` is the reply buffer (keys from reply_spec) for this batch.
        reply['a'][:] = SampleFromDistribution(output)

        # Resume games.
        context.Steps()
finally:
    # Stop all game threads (also reached on Ctrl-C or an exception).
    context.Stop()

Please check train.py and eval.py for actual runnable code.
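To make the Wait/Steps cycle concrete, here is a self-contained toy that mimics the semantics described in the comments above: every game runs in its own thread, Wait() returns as soon as `batchsize` games have produced a state, and those games stay blocked until Steps() hands back their replies. It is a hypothetical stand-in built on Python's standard threading and queue modules; the class name ToyContext, the counter "game" and the `(batch, reply)` return value are illustrative assumptions, not ELF's implementation (which does this in C++).

```python
import queue
import threading


class ToyContext:
    """Hypothetical toy of a batched game context (not ELF's implementation).

    Each game runs in its own thread. Wait() returns once `batchsize` games have
    posted a state; those games stay blocked until Steps() delivers their replies.
    """

    def __init__(self, num_games, batchsize):
        assert 0 < batchsize <= num_games
        self.batchsize = batchsize
        self.ready = queue.Queue()    # (state, reply_slot) pairs posted by game threads
        self.slots = []               # reply slots of the games in the current batch
        self.reply = {"a": []}        # reply buffer filled by the Python side
        self.stopped = threading.Event()
        self.threads = [threading.Thread(target=self._run_game, args=(gid,), daemon=True)
                        for gid in range(num_games)]

    def _run_game(self, gid):
        # Toy game: state "s" counts steps; reward "r" echoes the last action received.
        s, r = 0, 0.0
        while not self.stopped.is_set():
            slot = {"done": threading.Event(), "a": None}
            self.ready.put(({"id": gid, "s": s, "r": r}, slot))
            # Block until Steps() answers; poll so that Stop() can end the thread.
            while not slot["done"].wait(timeout=0.05):
                if self.stopped.is_set():
                    return
            s, r = s + 1, float(slot["a"])

    def Start(self):
        for t in self.threads:
            t.start()

    def Wait(self):
        # Take the first `batchsize` games that are ready; the rest keep running.
        pairs = [self.ready.get() for _ in range(self.batchsize)]
        self.slots = [slot for _, slot in pairs]
        self.reply = {"a": [None] * self.batchsize}
        batch = {key: [state[key] for state, _ in pairs] for key in ("id", "s", "r")}
        return batch, self.reply

    def Steps(self):
        # Deliver each action to its blocked game and let it continue.
        for slot, action in zip(self.slots, self.reply["a"]):
            slot["a"] = action
            slot["done"].set()
        self.slots = []

    def Stop(self):
        self.stopped.set()
        for t in self.threads:
            t.join()


if __name__ == "__main__":
    context = ToyContext(num_games=8, batchsize=4)
    context.Start()
    try:
        for _ in range(3):
            batch, reply = context.Wait()
            # Stand-in for model(batch) plus sampling: always pick action 1.
            reply["a"][:] = [1] * len(batch["id"])
            context.Steps()
            print("games", batch["id"], "steps", batch["s"])
    finally:
        context.Stop()
```

Because Wait() only waits for the first `batchsize` ready games, fast games are never held back by slow ones; the remaining games simply show up in a later batch.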

Dependencies

A C++ compiler with C++11 support is required (e.g., gcc >= 4.9). The tbb library is required, as is CMake >= 3.8.

Python 3.x is required. In addition, you need to install the following packages: PyTorch version 0.2.0+, tqdm, zmq, msgpack, msgpack_numpy.
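A quick way to confirm that the Python packages listed above are importable is to look them up with importlib, without actually importing them. This is a hypothetical helper; the import names are an assumption about how each package is imported (PyTorch imports as `torch`, and zmq is normally provided by the pyzmq package, which imports as `zmq`).

```python
import importlib.util

# Assumed import names for the packages listed above.
REQUIRED = ["torch", "tqdm", "zmq", "msgpack", "msgpack_numpy"]


def missing_packages(names):
    """Return the top-level module names that cannot be found on the Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("all found" if not missing else "missing: " + ", ".join(missing))
```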

How to train

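To train a model for MiniRTS, first compile ./rts/game_MC with cmake (see the instructions in ./rts/). Note that ./rts/backend does not need to be compiled for training unless you want to see the visualization.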

Then run the following command in the current directory (you can also refer to train_minirts.sh):

# --num_games 1024 --batchsize 128: set the number of games to 1024 and the batchsize to 128.
# --freq_update 50: update the behavior policy after 50 updates of the model.
# --players: specify the AI and its opponent, separated by a semicolon. `fs` is the frameskip, i.e., how often that player makes a decision (e.g., fs=20 means it acts every 20 ticks).
#   If `backup` is specified in `args`, the rule-based AI is used for the first `start` ticks, then the trained AI takes over. `start` decays with the rate given in `args` (0.99 here).
# --tqdm: show a progress bar.
# --gpu 0: use the first GPU. If you don't specify a GPU, training runs on CPUs.
# --T 20: 20-step actor-critic.
# --trainer_stats winrate: show the win rate over iterations. Note that this win rate is computed with actions sampled
#   from the multinomial distribution (not the greedy policy). To evaluate your model more accurately, use eval.py.
game=./rts/game_MC/game model=actor_critic model_file=./rts/game_MC/model \
python3 train.py \
    --num_games 1024 --batchsize 128 \
    --freq_update 50 \
    --players "fs=50,type=AI_NN,args=backup/AI_SIMPLE|delay/0.99|start/500;fs=20,type=AI_SIMPLE" \
    --tqdm \
    --gpu 0 \
    --T 20 \
    --additional_labels id,last_terminal \
    --trainer_stats winrate
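The `--players` value packs one spec per player, separated by `;`. Each spec is a comma-separated list of `key=value` pairs, and `args` holds `|`-separated `name/value` entries. The sketch below parses the string into dictionaries to make that structure explicit; it is a hypothetical illustration of the format as written in the command above, not the parser ELF itself uses.

```python
def parse_players(spec):
    """Split a --players string into one dict per player (illustrative only)."""
    players = []
    for player in spec.split(";"):
        entry = {}
        for field in player.split(","):
            key, _, value = field.partition("=")
            if key == "args":
                # args look like "backup/AI_SIMPLE|delay/0.99|start/500"
                entry[key] = dict(item.split("/", 1) for item in value.split("|"))
            else:
                entry[key] = value
        players.append(entry)
    return players


if __name__ == "__main__":
    spec = "fs=50,type=AI_NN,args=backup/AI_SIMPLE|delay/0.99|start/500;fs=20,type=AI_SIMPLE"
    for p in parse_players(spec):
        print(p)
```

With this reading, the first entry is the trained AI_NN acting every 50 ticks with an AI_SIMPLE backup, and the second is the AI_SIMPLE opponent acting every 20 ticks.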

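Note that a long horizon (e.g., --T 20) can make training both faster and more stable. With a long horizon, you should be able to train the model to a 70% win rate within 12 hours using 16 CPUs and 1 GPU. You can control the number of CPUs used in training with taskset -c.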
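The `--T` flag sets the rollout length of the actor-critic update. As a generic illustration rather than ELF's exact code, a T-step actor-critic bootstraps each return from the critic's value at the end of the rollout, R_t = r_t + γ R_{t+1} with R_T = V(s_T), and cuts the bootstrap at episode boundaries. The discount factor γ and the terminal-flag convention below are assumptions.

```python
def n_step_returns(rewards, terminals, bootstrap_value, gamma=0.99):
    """Discounted T-step returns, computed backwards from the bootstrap value.

    rewards[t], terminals[t]: reward and episode-end flag after action t.
    bootstrap_value: critic estimate V(s_T) for the state after the last step.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        if terminals[t]:
            running = 0.0  # no value flows back across an episode boundary
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns, values):
    """Advantage A_t = R_t - V(s_t), which weights the policy-gradient term."""
    return [ret - v for ret, v in zip(returns, values)]


if __name__ == "__main__":
    R = n_step_returns([0, 0, 1], [False, False, False], bootstrap_value=0.5, gamma=0.5)
    print(R, advantages(R, [0.25, 0.5, 1.0]))
```

With --T 20, each list would hold 20 steps, so a reward can influence the targets of up to 20 earlier steps within a single update.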

A trained model that reaches an 80% win rate against AI_SIMPLE at frameskip=50 is available, together with a replay of one game.

Here is a sample output during training:

Version:  bf1304010f9609b2114a1adff4aa2eb338695b9d_staged
Num Actions:  9
Num unittype:  6
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [01:35<00:00, 52.37it/s]
[2017-07-12 09:04:13.212017][128] Iter[0]:
Train count: 820/5000, actor count: 4180/5000
Save to ./
Filename = ./save-820.bin
Command arguments run.py --batchsize 128 --freq_update 50 --fs_opponent 20 --latest_start 500 --latest_start_decay 0.99 --num_games 1024 --opponent_type AI_SIMPLE --tqdm
0:acc_reward[4100]: avg: -0.34079, min: -0.58232[1580], max: 0.25949[185]
0:cost[4100]: avg: 2.15912, min: 1.97886[2140], max: 2.31487[1173]
0:entropy_err[4100]: avg: -2.13493, min: -2.17945[438], max: -2.04809[1467]
0:init_reward[820]: avg: -0.34093, min: -0.56980[315], max: 0.26211[37]
0:policy_err[4100]: avg: 2.16714, min: 1.98384[1520], max: 2.31068[1176]
0:predict_reward[4100]: avg: -0.33676, min: -1.36083[1588], max: 0.39551[195]
0:reward[4100]: avg: -0.01153, min: -0.13281[1109], max: 0.04688[124]
0:rms_advantage[4100]: avg: 0.15646, min: 0.02189[800], max: 0.79827[564]
0:value_err[4100]: avg: 0.01333, min: 0.00024[800], max: 0.06569[1549]

 86%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                    | 4287/5000 [01:23<00:15, 46.97it/s]

To evaluate a model for MiniRTS, try the following command (you can also refer to eval_minirts.sh):

# --tqdm: nice progress bar.
# --gpu 0: use GPU 0 as the evaluation GPU.
# --additional_labels id: tell the game environment to output additional dict entries.
# --greedy: use the greedy policy to evaluate your model. If not specified, actions are sampled from the action distributions.
game=./rts/game_MC/game model=actor_critic model_file=./rts/game_MC/model \
python3 eval.py \
    --load [your model] \
    --batchsize 128 \
    --players "fs=50,type=AI_NN;fs=20,type=AI_SIMPLE" \
    --num_games 1024 \
    --num_eval 10000 \
    --tqdm \
    --gpu 0 \
    --additional_labels id \
    --greedy

Here is a sample output (evaluating 10k games with 12 CPUs takes 1 minute 40 seconds):

Version:  dc895b8ea7df8ef7f98a1a031c3224ce878d52f0_
Num Actions:  9
Num unittype:  6
Load from ./save-212808.bin
Version:  dc895b8ea7df8ef7f98a1a031c3224ce878d52f0_
Num Actions:  9
Num unittype:  6
100%|████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [01:40<00:00, 99.94it/s]
str_acc_win_rate: Accumulated win rate: 0.735 [7295/2628/9923]
best_win_rate: 0.7351607376801297
new_record: True
count: 0
str_win_rate: [0] Win rate: 0.735 [7295/2628/9923], Best win rate: 0.735 [0]
Stop all game threads ...
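In the output above, the bracketed triple appears to be wins/losses/games counted: 7295 + 2628 = 9923, and 7295 / 9923 ≈ 0.73516, which matches best_win_rate. This reading is inferred from the numbers rather than documented. A small hypothetical helper that extracts the counts from such a summary line:

```python
import re

# Matches e.g. "Accumulated win rate: 0.735 [7295/2628/9923]" (case-insensitive).
PATTERN = re.compile(r"win rate: ([\d.]+) \[(\d+)/(\d+)/(\d+)\]", re.IGNORECASE)


def parse_win_rate(line):
    """Return (wins, losses, total, win_rate) from an eval.py summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"no win-rate summary in {line!r}")
    wins, losses, total = (int(g) for g in match.groups()[1:])
    return wins, losses, total, wins / total


if __name__ == "__main__":
    print(parse_win_rate("str_acc_win_rate: Accumulated win rate: 0.735 [7295/2628/9923]"))
```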

Self-play

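If you want to do self-play in MiniRTS, try the following script. It starts with two bots, both initialized from a pre-trained model. One bot is trained over time while the other stays fixed. If you just want to check their win rate without training, try --actor_only.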

sh ./selfplay_minirts.sh [your pre-trained model]
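Keeping one bot fixed while the other learns amounts to giving the frozen opponent an independent snapshot of the pre-trained parameters before training starts. The sketch below shows the idea with plain Python lists standing in for weight tensors (with PyTorch one would typically deep-copy a state_dict instead). Everything in it, including train_step, is hypothetical and is not the content of selfplay_minirts.sh.

```python
import copy

# Hypothetical pre-trained parameters (lists stand in for weight tensors).
pretrained = {"layer1": [0.5, -0.2], "layer2": [1.0]}

learner = copy.deepcopy(pretrained)   # this bot is updated during training
opponent = copy.deepcopy(pretrained)  # this bot stays fixed


def train_step(params, lr=0.1):
    """Toy in-place update standing in for an optimizer step."""
    for weights in params.values():
        for i in range(len(weights)):
            weights[i] -= lr


train_step(learner)
print(learner["layer2"], opponent["layer2"])
```

The deep copy matters: a shallow `dict(pretrained)` would share the inner lists, so an in-place update to one bot would silently change the other.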

Visualization

To visualize a trained bot, specify --save_replay_prefix [replay_file_prefix] when running eval.py to save (lots of) replays. Note that the same flag can also be used for training and self-play.

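All replay files (.rep) contain action sequences and should reproduce exactly the same game when loaded. To load a replay from the command line, use the following command: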

./minirts-backend replay --load_replay [your replay] --vis_after 0

Then open the web page ./rts/frontend/minirts.html to view the game. To load and run a replay only in the command line (e.g., if you just want to quickly see who won), try:

./minirts-backend replay_cmd --load_replay [your replay]
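Storing only action sequences works because the engine is deterministic: given the same initial conditions, replaying the same actions yields the same game. The toy below illustrates that property with a seeded random generator. The toy game, its seed handling and its update rule are hypothetical and say nothing about the actual .rep file format.

```python
import random


def run_game(seed, actions):
    """Toy deterministic game: all randomness comes from one seeded generator."""
    rng = random.Random(seed)
    state = 0
    trace = []
    for a in actions:
        state = state * 31 + a + rng.randint(0, 9)  # engine randomness is seeded
        trace.append(state)
    return trace


if __name__ == "__main__":
    actions = [3, 1, 4, 1, 5]
    original = run_game(seed=42, actions=actions)
    replayed = run_game(seed=42, actions=actions)
    print(original == replayed)
```

If any source of randomness were left unseeded (or depended on thread timing), the replayed trace could diverge, which is why a replay needs to capture all such inputs alongside the actions.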

