Structured Multi-Agent RL

MAGNet

Deep Multi-Agent Reinforcement Learning with Relevance Graphs (NIPS2018)

  1. Build the graph with agents and environment objects as nodes.

  2. The edge structure of the graph is fixed; the edge weights are learned by a graph generation network (GGN) (sketched below).
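A minimal sketch of that idea, assuming the weight of each (fixed) edge is predicted from the two endpoint node features by a small MLP; the class name, feature shapes, and sigmoid output are illustrative assumptions, not the exact GGN from the paper:

```python
import torch
import torch.nn as nn

class EdgeWeightGenerator(nn.Module):
    """Predicts a weight for each edge of a fixed topology from its endpoint node features."""
    def __init__(self, node_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * node_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, node_feats, edge_index):
        # node_feats: [N, node_dim]; edge_index: [E, 2] (sender, receiver) with fixed topology
        senders, receivers = edge_index[:, 0], edge_index[:, 1]
        pair = torch.cat([node_feats[senders], node_feats[receivers]], dim=-1)
        return torch.sigmoid(self.mlp(pair)).squeeze(-1)   # [E] edge weights in (0, 1)
```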

Mean-field MARL

Mean Field Multi-Agent Reinforcement Learning (ICML2018)

Mean Field Theory (Stanley, 1971): the interactions within the population of agents are approximated by the interaction between a single agent and the average effect of the overall (local) population.

Virtual Central Agent

Optimizing the virtual central agent approximates optimizing the neighboring agents.


However, the mean action eliminates the differences among agents and thus loses important information that could help cooperation.


Iterate between the two; the mean action and the policy eventually converge (sketched below).
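A hedged sketch of the core quantities (notation simplified; the exact equations in the paper differ in details such as how the next-step mean action is estimated): agent $j$'s Q-function takes the mean action of its neighbors as an extra input, the policy is a Boltzmann policy over that Q-function (with temperature parameter $\beta$), and the Q-function is updated with a mean-field value of the next state (learning rate $\eta$):

$$\bar a_j = \frac{1}{|\mathcal N(j)|} \sum_{k \in \mathcal N(j)} a_k, \qquad \pi_j(a_j \mid s, \bar a_j) \propto \exp\big(\beta\, Q_j(s, a_j, \bar a_j)\big)$$

$$Q_j(s, a_j, \bar a_j) \leftarrow (1-\eta)\, Q_j(s, a_j, \bar a_j) + \eta \Big[ r_j + \gamma \sum_{a_j'} \pi_j(a_j' \mid s', \bar a_j')\, Q_j(s', a_j', \bar a_j') \Big]$$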

RFM

RELATIONAL FORWARD MODELS FOR MULTI-AGENT LEARNING (ICLR2019)


The output of a GN is also a graph, with the same connectivity structure as the input graph (i.e., same number of vertices and edges, as well as same sender and receiver for each edge), but updated global, vertex, and edge attributes.
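A minimal sketch of one such GN pass under the generic "edge update → node update → global update" formulation, with MLP update functions and sum aggregation; this illustrates the quoted property (same topology, updated attributes), not the exact RFM architecture:

```python
import torch
import torch.nn as nn

class GNBlock(nn.Module):
    """One graph-network pass: returns a graph with the same topology but updated attributes."""
    def __init__(self, node_dim, edge_dim, global_dim, hidden=64):
        super().__init__()
        self.edge_fn = nn.Sequential(nn.Linear(edge_dim + 2 * node_dim + global_dim, hidden),
                                     nn.ReLU(), nn.Linear(hidden, edge_dim))
        self.node_fn = nn.Sequential(nn.Linear(node_dim + edge_dim + global_dim, hidden),
                                     nn.ReLU(), nn.Linear(hidden, node_dim))
        self.global_fn = nn.Sequential(nn.Linear(global_dim + node_dim + edge_dim, hidden),
                                       nn.ReLU(), nn.Linear(hidden, global_dim))

    def forward(self, nodes, edges, edge_index, u):
        # nodes: [N, node_dim]; edges: [E, edge_dim]; edge_index: [E, 2] (sender, receiver); u: [global_dim]
        s, r = edge_index[:, 0], edge_index[:, 1]
        # 1) Edge update: each edge sees its attribute, both endpoints, and the global attribute.
        edges = self.edge_fn(torch.cat([edges, nodes[s], nodes[r], u.expand(edges.size(0), -1)], dim=-1))
        # 2) Node update: sum-aggregate incoming edge messages per receiver node.
        agg = torch.zeros(nodes.size(0), edges.size(1), device=nodes.device).index_add(0, r, edges)
        nodes = self.node_fn(torch.cat([nodes, agg, u.expand(nodes.size(0), -1)], dim=-1))
        # 3) Global update from pooled node and edge attributes.
        u = self.global_fn(torch.cat([u, nodes.mean(0), edges.mean(0)], dim=-1))
        return nodes, edges, edge_index, u
```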

NeurComm

MULTI-AGENT REINFORCEMENT LEARNING FOR NETWORKED SYSTEM CONTROL (ICLR2020)

Communication mechanisms in MARL:

  1. non-communicative: MADDPG, COMA
  2. heuristic communication protocols or direct information sharing
  3. learnable communication protocols: DIAL, BiCNet
  4. attention-based communication to selectively send messages: ATOC, IC3Net

Spatio-Temporal RL


The major assumption in Definition 3.2 is that the Markovian property holds both temporally and spatially, so that the next local state depends on the neighborhood states and policies only.

This assumption is valid in most networked control systems such as traffic and wireless networks, as well as the power grid, where the impact of each agent is spread over the entire system via controlled flows, or chained local transitions.

The scaling factor in Eq. 2 scales down reward signals from agents farther away, which are harder to fit using only local information (and communicating with farther agents is also more expensive).

When $\alpha \to 0$, each agent performs local greedy control; when $\alpha \to 1$, each agent performs global coordination.

The immediate local reward of each agent is only affected by controls within its closed neighborhood.
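Paraphrasing the spatially discounted reward of Eq. 2 (the paper groups agents by neighborhood depth; here $d_{ij}$ denotes the graph distance between agents $i$ and $j$):

$$\tilde r_{i,t} = \sum_{j \in \mathcal V} \alpha^{d_{ij}}\, r_{j,t}$$

so $\alpha \to 0$ recovers the purely local reward $r_{i,t}$, and $\alpha \to 1$ recovers the unweighted global reward.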

Neural Communication

$h_t = f(h_{t-1}, \text{policy}, \text{state}, h_{\text{neighbor}})$

The inputs are the hidden state (belief), the policy, and the local state. The update can be run in multiple passes over the graph.
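A minimal sketch of this update, assuming a GRU cell whose input concatenates the local state, a policy fingerprint, and the summed hidden states of the neighbors; the names and the sum aggregation are assumptions, and running the loop several times gives the multi-pass variant:

```python
import torch
import torch.nn as nn

class NeurCommCell(nn.Module):
    """h_t = f(h_{t-1}, policy, state, h_neighbor) for every agent, one communication pass per loop."""
    def __init__(self, state_dim, policy_dim, hidden_dim):
        super().__init__()
        self.gru = nn.GRUCell(state_dim + policy_dim + hidden_dim, hidden_dim)

    def forward(self, h, states, policies, adj, n_passes: int = 1):
        # h: [N, hidden]; states: [N, state_dim]; policies: [N, policy_dim]; adj: [N, N] 0/1 adjacency
        for _ in range(n_passes):
            neighbor_msg = adj @ h                              # sum of neighbors' hidden states
            x = torch.cat([states, policies, neighbor_msg], dim=-1)
            h = self.gru(x, h)
        return h
```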

DGN

GRAPH CONVOLUTIONAL REINFORCEMENT LEARNING (ICLR2020)

  1. The graph is dynamic: agents move, enter, and leave the environment.

  2. For training stability, the adjacency matrix is kept unchanged across two consecutive steps.

  3. Relation kernel: multi-head attention.

  4. Temporal relation regularization: the KL divergence between the attention distributions at two consecutive timesteps (see the sketch below).
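A minimal sketch of that regularizer: the KL divergence between the attention distributions over neighbors at two consecutive timesteps (in the paper the next-step distribution comes from the target network; the KL direction shown here is one possible choice):

```python
import torch

def temporal_relation_loss(attn_t, attn_tp1, eps: float = 1e-8):
    """KL( attn_{t+1} || attn_t ), averaged over agents and attention heads.

    attn_t, attn_tp1: [n_agents, n_heads, n_neighbors] attention distributions
    (each row sums to 1 over the neighbor dimension).
    """
    p = attn_tp1.clamp_min(eps)
    q = attn_t.clamp_min(eps)
    kl = (p * (p.log() - q.log())).sum(dim=-1)   # KL per agent and per head
    return kl.mean()
```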

DCG

Deep Coordination Graphs (ICML2020)

  1. A higher-order value factorization can be expressed as an undirected coordination graph (written out below).

  2. Specialized behavior between agents can be represented by conditioning on the agent's role, or more generally on the agent's ID (a learned embedding of the participating agents' histories).

  3. Low-rank approximation of the payoff functions: roughly analogous to splitting the agents into K groups.
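Paraphrasing the coordination-graph factorization (item 1) and the rank-$K$ payoff approximation (item 3); the normalization by $|\mathcal V|$ and $|\mathcal E|$ and the exact conditioning follow my reading of the paper and may differ in detail:

$$Q(s, \mathbf a) \;=\; \frac{1}{|\mathcal V|}\sum_{i \in \mathcal V} f_i(a_i \mid s) \;+\; \frac{1}{|\mathcal E|}\sum_{(i,j) \in \mathcal E} f_{ij}(a_i, a_j \mid s)$$

$$f_{ij}(a_i, a_j \mid s) \;\approx\; \sum_{k=1}^{K} \hat f^{\,k}_{ij}(a_i \mid s)\, \bar f^{\,k}_{ij}(a_j \mid s)$$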

G2ANet

Multi-Agent Game Abstraction via Graph Attention Neural Network (AAAI2020)

  1. First build a fully connected graph; hard attention cuts edges and soft attention learns the weights of the remaining edges (the graph is rebuilt at every step; sketched below).

  2. Build a subgraph for each agent.
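A minimal sketch of the two-stage attention: a Gumbel-softmax "hard" gate decides whether each directed edge is kept, and a scaled dot-product "soft" attention then weights only the surviving edges. The per-pair linear encoder is a simplification (the paper encodes agent pairs with a BiLSTM), and all names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GameAbstractionAttention(nn.Module):
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.hard = nn.Linear(2 * feat_dim, 2)   # logits for keep / cut per directed edge
        self.q = nn.Linear(feat_dim, hidden)
        self.k = nn.Linear(feat_dim, hidden)

    def forward(self, h):
        # h: [N, feat_dim] agent embeddings; returns [N, N] edge weights of each agent's subgraph
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        # Hard attention: differentiable binary gate per directed edge (index 0 = keep).
        gate = F.gumbel_softmax(self.hard(pairs), tau=1.0, hard=True)[..., 0]    # [N, N]
        gate = gate * (1 - torch.eye(n, device=h.device))                        # no self-edges
        # Soft attention over the surviving edges only.
        scores = self.q(h) @ self.k(h).t() / self.k(h).size(-1) ** 0.5           # [N, N]
        scores = scores.masked_fill(gate == 0, float('-inf'))
        weights = torch.softmax(scores, dim=-1)
        return torch.nan_to_num(weights)   # rows with no kept edge become all zeros
```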

GC

MULTI-AGENT REINFORCEMENT LEARNING WITH GRAPH CLUSTERING

  1. Compute the adjacency matrix with KNN (sketched below).

  2. Group feature: a GRU outputs a mean and a variance, a group feature is sampled from the Gaussian and then fed into a GAT.

  3. Enforce separation between different clusters.

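A minimal sketch of the KNN adjacency construction, assuming Euclidean distance over some per-agent feature vector; the choice of metric and feature is an assumption:

```python
import torch

def knn_adjacency(feats: torch.Tensor, k: int) -> torch.Tensor:
    """Symmetric 0/1 adjacency connecting each agent to its k nearest neighbors."""
    dist = torch.cdist(feats, feats)                 # [N, N] pairwise distances
    dist.fill_diagonal_(float('inf'))                # exclude self-loops
    idx = dist.topk(k, largest=False).indices        # k nearest neighbors per agent
    adj = torch.zeros_like(dist)
    adj.scatter_(1, idx, 1.0)
    return ((adj + adj.t()) > 0).float()             # symmetrize
```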

ROMA

ROMA: Multi-Agent Reinforcement Learning with Emergent Roles (ICML2020)

  1. Agents with similar roles have both similar policies and similar responsibilities.

  2. Dynamic
  • local observation → (mean, std) → sampled role → hypernetwork → utility function (sketched below)

  3. Identifiable
  • Keeps the roles temporally stable.

  • Maximize the mutual information between the observation and the (trajectory, role).

  4. Specialized
  • Such a formulation makes sure that the dissimilarity d is high only when the mutual information I is low, so that the set of learned roles is compact but diverse.
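A minimal sketch of the pipeline in item 2 (local observation → Gaussian role → hypernetwork → utility head); layer sizes and names are arbitrary assumptions, and the mutual-information and diversity losses from items 3 and 4 are omitted:

```python
import torch
import torch.nn as nn

class RoleUtility(nn.Module):
    def __init__(self, obs_dim, role_dim, hidden, n_actions):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, 2 * role_dim)        # obs -> (mean, log_std)
        self.hyper = nn.Linear(role_dim, hidden * n_actions)   # role -> weights of the utility head
        self.obs_embed = nn.Linear(obs_dim, hidden)
        self.hidden, self.n_actions = hidden, n_actions

    def forward(self, obs):
        # obs: [B, obs_dim]
        mean, log_std = self.encoder(obs).chunk(2, dim=-1)
        role = mean + torch.randn_like(mean) * log_std.exp()           # reparameterized role sample
        w = self.hyper(role).view(-1, self.hidden, self.n_actions)     # per-sample utility weights
        q = torch.bmm(self.obs_embed(obs).unsqueeze(1), w).squeeze(1)  # [B, n_actions] utilities
        return q, role
```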

RODE

RODE: LEARNING ROLES TO DECOMPOSE MULTI-AGENT TASKS (ICLR2021)

  1. First cluster the action representation vectors with k-means (similar in spirit to parameter sharing; sketched below).

  2. Each role vector is the mean pooling of the action representations in its cluster.

  3. Assign a role by the cosine similarity between the role vector and the agent's history.

  4. Once a role is selected, it is kept for c timesteps.

  5. Role selector: TD loss.

  6. Role policy: TD loss.

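A minimal sketch of steps 1–3 as described above: k-means over action representation vectors, mean pooling per cluster to get role vectors, and role assignment by cosine similarity with the agent's history embedding; array names and shapes are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_roles(action_reprs: np.ndarray, n_roles: int) -> np.ndarray:
    """action_reprs: [n_actions, d] learned action representations -> [n_roles, d] role vectors."""
    labels = KMeans(n_clusters=n_roles, n_init=10).fit_predict(action_reprs)
    return np.stack([action_reprs[labels == r].mean(axis=0) for r in range(n_roles)])

def assign_role(history_embed: np.ndarray, role_vectors: np.ndarray) -> int:
    """Pick the role whose vector has the highest cosine similarity with the history embedding."""
    sims = (role_vectors @ history_embed) / (
        np.linalg.norm(role_vectors, axis=1) * np.linalg.norm(history_embed) + 1e-8)
    return int(sims.argmax())
```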

ROCHICO

Structured Diversification Emergence via Reinforced Organization Control and Hierarchical Consensus Learning (AAMAS2021)

  1. Organization control module
  • Each agent connects to at most m nearest-neighbor nodes.

  • The graph edit distance between the subgraphs of two consecutive steps must not be too large.

  2. Hierarchical consensus module
  • DeepSets fuses the representations of the agents within a subgraph (sketched below).

  • Find weakly connected components; contrastive learning ensures diversity across them.

  • A team is a fully connected subgraph.

  3. Decision module
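A minimal sketch of the DeepSets fusion used in the hierarchical consensus module: a shared per-agent encoder, a permutation-invariant sum, and a post-aggregation network; layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

class DeepSetFusion(nn.Module):
    """Permutation-invariant fusion of the agent representations inside one subgraph."""
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, agent_feats):
        # agent_feats: [n_agents_in_subgraph, in_dim] -> one consensus vector for the subgraph
        return self.rho(self.phi(agent_feats).sum(dim=0))
```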

Summary (common themes):

  1. Keep the structure temporally stable
  • Keep the graph/role fixed for a period of time.

  • The KL divergence or graph edit distance between consecutive steps must not be too large.

  • Maximize the mutual information between the observation and the role/trajectory.

  2. Agents within a group should be as similar as possible
  • Low-rank approximation

  • Mean-field, DeepSets

  • Parameter sharing

  3. Different groups should be diverse
  • The KL divergence between groups should not be too small.

  • The mutual information between groups should be small.

  • Adversarial training