RL for Resource Management

Resource Management with Deep Reinforcement Learning (HotNets 2016)

intro

A look at recent research in the field shows a typical design flow:

  • come up with clever heuristics for a simplified model of the problem;

  • painstakingly test and tune the heuristics for good performance in practice.

This process often has to be repeated if some aspect of the problem such as the workload or the metric of interest changes.

DeepRM does not require any prior knowledge of the system’s behavior to learn such management strategies.

model


Objective: minimize average job slowdown.
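The reward structure behind this objective, as described in the paper: at each timestep the agent receives −1/T_j for every job j currently in the system (running or waiting), where T_j is the job's ideal duration, so with no discounting the cumulative reward equals the negative sum of job slowdowns. A minimal sketch, with illustrative names:

```python
# Per-step reward from the DeepRM objective: -1/T_j summed over all jobs
# currently in the system. `jobs_in_system` and `ideal_duration` are
# illustrative names for the scheduler's bookkeeping.
def slowdown_reward(jobs_in_system):
    """jobs_in_system: iterable of objects with an `ideal_duration` field."""
    return sum(-1.0 / job.ideal_duration for job in jobs_in_system)
```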

discuss

  1. Assumes the machines' resources form one large pool and does not consider data locality.
  2. Does not consider jobs that consist of multiple stages; a job's resource profile may be unobservable, making this a partially observed MDP. (Further, the resource profile of a job may not be known in advance (e.g., for non-recurring jobs), and the scheduler might only get an accurate view as the job runs. The RL paradigm can in principle deal with such situations of partial observability by casting the decision problem as a POMDP.)
  3. The time horizon is finite; a time-independent value network could be designed to handle this.

A New Approach for Resource Scheduling with Deep Reinforcement Learning (arXiv 2018)

intro

Due to the many factors that need to be considered, the scheduling problem is in most cases NP-hard or NP-complete.

contributions:

  1. imitation learning is used to obtain an initial policy (see the sketch after this list)
  2. a CNN-based network architecture
  3. a redefined notion of cluster capacity
  4. online and offline versions of DeepRM
    • Online: jobs arrive following a Poisson distribution
    • Offline: all jobs arrive at one time
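A minimal sketch of using imitation learning to seed the policy: pretrain the policy network with a cross-entropy loss against actions chosen by a heuristic scheduler, then continue with RL. The network interface, the heuristic labels, and all names here are assumptions for illustration, not the paper's exact setup:

```python
import torch
import torch.nn as nn

def pretrain_with_heuristic(policy_net, states, heuristic_actions, epochs=10):
    """Behavior cloning: fit policy_net's action logits to the actions a
    heuristic scheduler (e.g., shortest-job-first) took in recorded states.
    states: (N, state_dim) float tensor; heuristic_actions: (N,) long tensor.
    """
    opt = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        logits = policy_net(states)            # (N, num_actions)
        loss = loss_fn(logits, heuristic_actions)
        loss.backward()
        opt.step()
    return policy_net
```

The pretrained network then serves as the starting point for policy-gradient training instead of a random initialization.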

discuss

  1. Multi-cluster scheduling: because jobs do not perfectly fill all the cluster space, each node will have some resource fragmentation that is difficult to use.

Intelligent Cloud Resource Management with Deep Reinforcement Learning (IEEE Cloud Computing 2017)

intro

issues in traditional resource management:

  1. Coordinating the cloud environment, physical resources, and virtual resources is required, which is a vast undertaking.
  2. Cloud systems are configured manually and rely heavily on relevant knowledge and experience.

model

State: a configuration of multiple resources, such as the amounts of CPU and memory.

Action: a one-unit adjustment of a single resource, such as a one-unit increase in CPU.

Reward: a two-dimensional vector covering performance and cost.
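A minimal tabular sketch of this MDP under stated assumptions: states are discrete resource configurations, actions are one-unit adjustments, and the two-dimensional (performance, cost) reward is scalarized with an assumed weight w. The environment interface `env_step` and the weighting are illustrative, not the paper's:

```python
import random
from collections import defaultdict

# Actions: one-unit increase/decrease of a single resource (CPU or memory).
ACTIONS = [(+1, 0), (-1, 0), (0, +1), (0, -1)]

def q_learning_step(Q, state, env_step, w=0.5, alpha=0.1, gamma=0.95, eps=0.1):
    """One epsilon-greedy Q-learning step. `env_step(state, action)` is an
    assumed interface returning (next_state, (performance, cost))."""
    if random.random() < eps:
        a = random.randrange(len(ACTIONS))
    else:
        a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
    next_state, (perf, cost) = env_step(state, ACTIONS[a])
    r = w * perf - (1 - w) * cost            # scalarization (assumption)
    Q[state][a] += alpha * (r + gamma * max(Q[next_state]) - Q[state][a])
    return next_state

# State: a tuple such as (num_cpus, mem_units); Q-table defaults to zeros.
Q = defaultdict(lambda: [0.0] * len(ACTIONS))
```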


discuss

  1. The process does not necessarily follow an MDP.
  2. Hierarchical RL could be applied to the resulting semi-MDP.

Data Centers Job Scheduling with Deep Reinforcement Learning (PAKDD 2020)

Uses A2C (advantage actor-critic) for scheduling; a sketch of the update follows.
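For reference, a compact sketch of the standard A2C update (not the paper's exact network or hyperparameters; all names are placeholders):

```python
import torch
import torch.nn.functional as F

def a2c_update(policy_net, value_net, optimizer,
               states, actions, returns, entropy_coef=0.01):
    """One A2C update on a batch of transitions. policy_net(states) returns
    action logits; value_net(states) returns state-value estimates."""
    logits = policy_net(states)                  # (N, num_actions)
    values = value_net(states).squeeze(-1)       # (N,)
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    advantages = returns - values.detach()       # advantage estimate
    policy_loss = -(chosen * advantages).mean()
    value_loss = F.mse_loss(values, returns)
    entropy = -(log_probs.exp() * log_probs).sum(dim=1).mean()
    loss = policy_loss + 0.5 * value_loss - entropy_coef * entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```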

A Reinforcement Learning Based Resource Management Approach for Time-critical Workloads in Distributed Computing Environment (BigData 2018)

intro

Considers time-critical and non-time-critical applications together as hybrid workloads.

Handling multiple jobs: feed each job into the network as input, obtain one vector per job, and concatenate these vectors together? (see the sketch below)
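One way to realize that reading, sketched under assumptions (a shared per-job encoder and zero-padding to a fixed job count; all names are illustrative):

```python
import torch
import torch.nn as nn

class JobSetEncoder(nn.Module):
    """Encode each job with a shared MLP and concatenate the embeddings
    (zero-padded up to max_jobs) into one fixed-size state vector."""
    def __init__(self, job_dim, emb_dim, max_jobs):
        super().__init__()
        self.max_jobs, self.emb_dim = max_jobs, emb_dim
        self.encoder = nn.Sequential(nn.Linear(job_dim, emb_dim), nn.ReLU())

    def forward(self, jobs):                  # jobs: (num_jobs, job_dim)
        jobs = jobs[: self.max_jobs]          # truncate overflow
        flat = self.encoder(jobs).flatten()   # (num_jobs * emb_dim,)
        pad = self.max_jobs * self.emb_dim - flat.numel()
        return torch.cat([flat, flat.new_zeros(pad)])
```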

The cluster state is randomized to improve generalization.

Deep Reinforcement Learning based Elasticity-compatible Heterogeneous Resource Management for Time-critical Computing (ICPP 2020)

intro

Targets both time-critical and general applications.

All computing-environment features, such as multi-cluster operation, elasticity, and heterogeneity (akin to job-machine affinity), together with job features such as job type and timeliness, add up to the complexity of producing a satisfactory resource management approach that can make good use of the multi-cluster environment.

Clusters are allowed to have elasticity; that is, their computing capability in terms of executor numbers can be temporarily expanded (up to an upper bound) when necessary to absorb workload pressure.
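A toy model of that elasticity notion (field names are illustrative, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class ElasticCluster:
    base_executors: int      # normal capacity
    max_executors: int       # hard upper bound on temporary expansion
    in_use: int = 0

    def request(self, n: int) -> bool:
        """Grant n executors, elastically exceeding base capacity
        but never the upper bound."""
        if self.in_use + n <= self.max_executors:
            self.in_use += n
            return True
        return False

    def release(self, n: int) -> None:
        self.in_use = max(0, self.in_use - n)
```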

Jobs:

  1. streaming jobs: repetitive executions over data batches, time-critical;
  2. non-streaming time-critical jobs;
  3. other general non-time-critical jobs.

An LSTM is used to encode the state (sketch below).
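A sketch of an LSTM state encoder under assumed dimensions: the final hidden state of the observation sequence serves as the RL state vector:

```python
import torch
import torch.nn as nn

class LSTMStateEncoder(nn.Module):
    """Encode a variable-length sequence of cluster/job observations;
    the last hidden state becomes the policy's state input."""
    def __init__(self, obs_dim=16, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)

    def forward(self, obs_seq):               # (batch, seq_len, obs_dim)
        _, (h_n, _) = self.lstm(obs_seq)
        return h_n[-1]                         # (batch, hidden_dim)

state = LSTMStateEncoder()(torch.randn(1, 10, 16))  # example usage
```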

Multi-objective RL is used.

A novel multi agent reinforcement learning approach for job scheduling in Grid computing (2011)

Decentralized

Resource Allocation in the Grid Using Reinforcement Learning

multi-agent

Online Resource Allocation Using Decompositional Reinforcement Learning (AAAI 2005)

Online Planner Selection with Graph Neural Networks and Adaptive Scheduling (AAAI 2020)