MAGNet
Deep Multi-Agent Reinforcement Learning with Relevance Graphs (NIPS2018)
Agents and environment objects are the nodes of the graph
The edge structure is fixed; edge weights are learned by a graph generation network (GGN)
Mean-field MARL
Mean Field Multi-Agent Reinforcement Learning (ICML2018)
Mean Field Theory (Stanley, 1971) – the interactions within the population of agents are approximated by that of a single agent played with the average effect from the overall (local) population.
Virtual Central Agent
Optimizing the central agent approximates optimizing the neighboring agents
but the mean action eliminates the difference among agents and thus incurs the loss of important information that could help cooperation
Iterate until the mean action and the policies converge
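The mean-field approximation above can be sketched in numpy: each agent sees only the average of its neighbors' one-hot actions rather than every individual action (a minimal illustration, not the paper's full Q-update):

```python
import numpy as np

def neighbor_mean_action(actions_onehot, adjacency, agent):
    """Mean of the one-hot actions of `agent`'s neighbors: the mean-field
    approximation replaces many pairwise interactions with one interaction
    against this averaged "virtual" neighbor."""
    neighbors = np.flatnonzero(adjacency[agent])
    return actions_onehot[neighbors].mean(axis=0)

# toy example: 3 agents on a line graph, 2 discrete actions
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
acts = np.eye(2)[[0, 1, 1]]              # agents 0, 1, 2 chose actions 0, 1, 1
mean_a = neighbor_mean_action(acts, adj, agent=1)
print(mean_a)                            # average over neighbors 0 and 2
```

The mean action is what the note criticizes: averaging erases which neighbor did what, losing information that could help cooperation.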
RFM
RELATIONAL FORWARD MODELS FOR MULTI-AGENT LEARNING (ICLR2019)
The output of a GN is also a graph, with the same connectivity structure as the input graph (i.e., same number of vertices and edges, as well as same sender and receiver for each edge), but updated global, vertex, and edge attributes.
NeurComm (ICLR2020)
MULTI-AGENT REINFORCEMENT LEARNING FOR NETWORKED SYSTEM CONTROL
MARL communication mechanisms:
- non-communicative: MADDPG, COMA
- heuristic communication protocols or direct information sharing
- learnable communication protocols: DIAL, BiCNet
- communication attention to selectively send messages: ATOC, IC3Net
Spatio-Temporal RL
The major assumption in Definition 3.2 is that the Markovian property holds both temporally and spatially, so that the next local state depends on the neighborhood states and policies only.
This assumption is valid in most networked control systems such as traffic and wireless networks, as well as the power grid, where the impact of each agent is spread over the entire system via controlled flows, or chained local transitions.
The scaling factor in Eq. 2 scales down reward signals from further away, which are harder to fit using local information (and communicating with more distant agents is also more costly).
When $\alpha \to 0$, each agent performs local greedy control; when $\alpha \to 1$, each agent performs global coordination
the immediate local reward of each agent is only affected by controls within its closed neighborhood.
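The spatial discount can be sketched directly: each agent's training signal mixes all agents' rewards, scaled by $\alpha$ raised to the graph distance (a minimal sketch of the Eq. 2 idea; the function name and toy distances are mine):

```python
import numpy as np

def spatially_discounted_reward(rewards, dist, alpha, agent):
    """Mix all agents' rewards, scaled down by alpha**distance, where
    dist[i, j] is the graph distance between agents i and j."""
    return float(np.sum(alpha ** dist[agent] * rewards))

# 3 agents on a line: distances from agent 0 are [0, 1, 2]
dist = np.array([[0, 1, 2],
                 [1, 0, 1],
                 [2, 1, 0]])
r = np.array([1.0, 1.0, 1.0])
local = spatially_discounted_reward(r, dist, alpha=0.0, agent=0)
glob = spatially_discounted_reward(r, dist, alpha=1.0, agent=0)
print(local, glob)   # 1.0 (local greedy) vs 3.0 (global coordination)
```

The two extremes reproduce the claim above: $\alpha = 0$ keeps only the agent's own reward, $\alpha = 1$ sums everyone's.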
Neural Communication
$h_t = f(h_{t-1}, \text{policy}, \text{state}, h_{\text{neighbor}})$
Inputs are the hidden state (belief), the policy, and the state; the update can be run for multiple passes
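The update rule above can be sketched as one shared-weight message-passing step; running it several times propagates information beyond one hop. This is a toy stand-in for the learned update $f$ (the single weight matrix `W` and all dimensions are assumptions for illustration):

```python
import numpy as np

def neurcomm_step(h, s, pi, adj, W):
    """One communication pass: each agent updates its hidden state from its
    previous belief, its local state and policy, and the mean of its
    neighbors' hidden states. W stands in for the learned update f."""
    n, d = h.shape
    h_new = np.empty_like(h)
    for i in range(n):
        nbrs = np.flatnonzero(adj[i])
        msg = h[nbrs].mean(axis=0) if len(nbrs) else np.zeros(d)
        x = np.concatenate([h[i], s[i], pi[i], msg])
        h_new[i] = np.tanh(W @ x)
    return h_new

rng = np.random.default_rng(0)
n, d_h, d_s, d_pi = 3, 4, 2, 2
W = rng.normal(size=(d_h, d_h + d_s + d_pi + d_h)) * 0.1
h = np.zeros((n, d_h))
s = rng.normal(size=(n, d_s))
pi = np.full((n, d_pi), 0.5)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
for _ in range(2):     # multi-pass: repeat to spread beliefs further
    h = neurcomm_step(h, s, pi, adj, W)
print(h.shape)
```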
DGN
GRAPH CONVOLUTIONAL REINFORCEMENT LEARNING (ICLR2020)
The graph is dynamic: agents move, and enter or leave the environment
For training stability, the adjacency matrix is kept fixed across every two consecutive steps
Relation kernel: multi-head attention
TEMPORAL RELATION REGULARIZATION: the KL divergence between the attention distributions at two consecutive timesteps
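The regularizer can be sketched as a KL term over one agent's attention weights at consecutive steps, penalizing abrupt changes in whom the agent attends to (a minimal numpy sketch; the exact KL direction in the paper may differ):

```python
import numpy as np

def attention_kl(p_prev, p_curr, eps=1e-8):
    """KL(p_curr || p_prev) between an agent's attention distributions
    over its neighbors at two consecutive timesteps."""
    p_prev = p_prev + eps
    p_curr = p_curr + eps
    return float(np.sum(p_curr * np.log(p_curr / p_prev)))

same = attention_kl(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
shift = attention_kl(np.array([0.9, 0.1]), np.array([0.1, 0.9]))
print(same, shift)   # ~0 when attention is stable, large when it flips
```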
DCG
Deep Coordination Graphs (ICML2020)
- A higher-order value factorization can be expressed as an undirected coordination graph
Specialized behavior between agents can be represented by conditioning on the agent's role, or more generally on the agent's ID (a learned embedding of the participating agents' histories)
Low-rank approximation: similar to partitioning the agents into K groups
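The low-rank idea can be shown on one pairwise payoff table: instead of storing all $A \times A$ entries, each edge stores $2KA$ numbers whose outer-product sum reconstructs a rank-$K$ payoff (a toy sketch, not DCG's trained factors):

```python
import numpy as np

rng = np.random.default_rng(1)
A, K = 5, 2                      # 5 actions per agent, rank-2 approximation
f = rng.normal(size=(K, A))      # factors f_k(a_i) for the sending agent
g = rng.normal(size=(K, A))      # factors g_k(a_j) for the receiving agent

# full A x A payoff table from 2*K*A numbers instead of A*A
payoff = f.T @ g                 # payoff[a_i, a_j] = sum_k f[k, a_i] * g[k, a_j]
print(payoff.shape, np.linalg.matrix_rank(payoff))
```

The rank bound K plays the role of the "K groups" intuition in the note: it caps how much independent pairwise structure the payoff can express.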
G2ANet
Multi-Agent Game Abstraction via Graph Attention Neural Network (AAAI2020)
First build a fully connected graph; hard attention cuts edges and soft attention learns the weights of the remaining edges (the graph is rebuilt at every step)
A subgraph is built for each agent
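The two-stage attention can be sketched for one agent: a boolean gate (hard attention; in the paper the output of a Gumbel-softmax) first prunes edges, then a softmax over the surviving scores gives the soft edge weights. All names and toy values here are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def g2a_edges(scores, keep):
    """Hard attention (`keep`) cuts edges outright; soft attention then
    re-weights the surviving edges so they sum to 1."""
    kept = np.flatnonzero(keep)
    weights = np.zeros_like(scores)
    weights[kept] = softmax(scores[kept])
    return weights

scores = np.array([2.0, -1.0, 0.5, 3.0])    # agent i's scores for 4 others
keep = np.array([True, False, True, True])  # hard attention cut edge 1
w = g2a_edges(scores, keep)
print(w)   # zero weight on the cut edge, rest sum to 1
```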
GC
MULTI-AGENT REINFORCEMENT LEARNING WITH GRAPH CLUSTERING
The adjacency matrix is computed with KNN
Group feature: a GRU outputs a mean and variance, a Gaussian sample is drawn and then fed into a GAT
A loss keeps different clusters well separated
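The KNN adjacency step can be sketched directly (a minimal sketch on Euclidean distance in feature space; the feature vectors and k are toy choices):

```python
import numpy as np

def knn_adjacency(x, k):
    """Directed adjacency: each agent connects to its k nearest agents
    by Euclidean distance in feature space, with no self loops."""
    d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self
    adj = np.zeros_like(d, dtype=int)
    nn = np.argsort(d, axis=1)[:, :k]        # indices of the k closest
    np.put_along_axis(adj, nn, 1, axis=1)
    return adj

x = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
adj = knn_adjacency(x, k=1)
print(adj)   # two well-separated pairs: 0<->1 and 2<->3
```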
ROMA
ROMA: Multi-Agent Reinforcement Learning with Emergent Roles (ICML2020)
agents with similar roles have both similar policies and responsibilities.
- Dynamic
local observation → (mean, std) → role → hypernetwork → utility function
- Identifiable
To keep roles temporally stable,
maximize the mutual information between the role and the trajectory, conditioned on the local observation
- Specialized
- Such formulation makes sure that dissimilarity d is high only when mutual information I is low, so that the set of learned roles is compact but diverse
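The "dynamic" part of the pipeline above can be sketched as a Gaussian role encoder with the reparameterization trick; the MI losses that make roles identifiable and specialized are omitted, and all weights and sizes are assumptions for illustration:

```python
import numpy as np

def sample_role(obs, W_mu, W_std, rng):
    """Sample a role from a Gaussian whose mean and std are produced
    from the local observation; the sample is later fed to a
    hypernetwork that generates the agent's utility function."""
    mu = W_mu @ obs
    std = np.log1p(np.exp(W_std @ obs))   # softplus keeps std positive
    eps = rng.normal(size=mu.shape)
    return mu + std * eps                 # reparameterization trick

rng = np.random.default_rng(0)
obs = rng.normal(size=6)
W_mu, W_std = rng.normal(size=(3, 6)), rng.normal(size=(3, 6))
role = sample_role(obs, W_mu, W_std, rng)
print(role.shape)   # a 3-dim stochastic role vector
```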
RODE
RODE: LEARNING ROLES TO DECOMPOSE MULTI-AGENT TASKS (ICLR2021)
First run k-means clustering on the action representation vectors (similar in spirit to parameter sharing)
The role vector is the mean pooling of the actions in its cluster
Roles are assigned by the cosine similarity between the role vector and the agent's history
Once a role is selected, it is kept for c timesteps
Role selector TD loss
Role policy TD loss
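The cosine-similarity assignment can be sketched as follows (a minimal sketch; the 2-dim role vectors and history embeddings are toy values, not learned representations):

```python
import numpy as np

def assign_role(history, role_vectors):
    """Pick the role whose vector has the highest cosine similarity with
    the agent's history embedding; each role vector is the mean-pooled
    representation of one k-means action cluster."""
    h = history / np.linalg.norm(history)
    r = role_vectors / np.linalg.norm(role_vectors, axis=1, keepdims=True)
    return int(np.argmax(r @ h))

roles = np.array([[1.0, 0.0], [0.0, 1.0]])     # 2 roles from clustered actions
r0 = assign_role(np.array([0.9, 0.1]), roles)  # history close to role 0
r1 = assign_role(np.array([0.2, 0.8]), roles)  # history close to role 1
print(r0, r1)
```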
ROCHICO
Structured Diversification Emergence via Reinforced Organization Control and Hierarchical Consensus Learning (AAMAS2021)
- Organization control module
Each node keeps at most its m nearest-neighbor nodes
The graph edit distance between the subgraphs at two consecutive steps must not be too large
- hierarchical consensus module
A DeepSet fuses the representations of the agents in a subgraph
Find the weakly connected subgraphs; contrastive learning ensures diversity
A team is a fully connected subgraph
- decision module
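The graph-change constraint above can be checked with a simple proxy when the agent set is fixed: count the edge insertions and deletions between the two adjacency matrices (this edge-level count is my simplification, not full graph edit distance):

```python
import numpy as np

def edge_edit_distance(adj_prev, adj_curr):
    """Number of edge insertions/deletions between two undirected graphs
    on the same agent set -- a proxy for graph edit distance, used to cap
    how much the interaction graph may change between two steps."""
    diff = np.abs(adj_prev - adj_curr)
    return int(diff.sum() // 2)              # each undirected edge counted twice

prev = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
curr = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])  # drop edge 1-2, add 0-2
d = edge_edit_distance(prev, curr)
print(d)   # 2
```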
Summary (common patterns):
- Ensure temporal stability
Keep the graph/role fixed for a period of time
Bound the KL divergence or graph edit distance between consecutive steps
Maximize the mutual information between the observation and the role/trajectory
- Keep agents within a group similar
Low-rank approximation
Mean-field, DeepSet
Parameter sharing
- Make different groups diverse
The KL divergence must not be too small
Small mutual information between groups
Adversarial training