Problem:
Using a satellite constellation, find an optimal control policy
(satellite trajectory, speed, and antenna direction angles) that maximizes the population covered by an internet service.
Solution:
All satellites are combined in one reward function with a penalty term of "uncovered population" * K-factor.
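The combined reward above can be sketched as a single scalar shared by all satellites: coverage minus a penalty proportional to the uncovered population. This is a minimal illustration; the constant K_FACTOR and the function/argument names are assumptions, not the project's actual code.

```python
# Hedged sketch of the constellation-wide reward with an
# "uncovered population" * K penalty. All names are illustrative.

K_FACTOR = 0.5  # assumed penalty weight; would be tuned in practice

def constellation_reward(covered_population: float, total_population: float) -> float:
    """Single reward shared by all satellites in the constellation."""
    uncovered = total_population - covered_population
    return covered_population - K_FACTOR * uncovered
```

With this shape, leaving people uncovered costs the agent directly, so the optimizer trades off antenna pointing and trajectory against the penalty rather than just maximizing raw coverage.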
Switched to "ray.rllib.algorithms" with a single externally registered environment. This makes it easy to
swap algorithms (A3C, IMPALA, PPO, and BC look best) and to optimize hyperparameters.
The model is trained and deployed using Amazon CodeCommit, Lambda, and SageMaker (RLEstimator).
Tested multi-agent RL: in this approach, each agent controls a single entity (one satellite) in the network.
It is much harder to train and tune than a single-agent model, since the success of the entire system depends
on every agent being well trained and tuned (shared vs. per-agent policies).
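The "shared vs. multiple policies" choice above can be sketched as a policy-mapping function: either every satellite agent maps to one shared policy, or each maps to its own. The functions below mirror the idea of RLlib's policy_mapping_fn but are illustrative assumptions, not the project's configuration.

```python
# Sketch of the shared-vs-per-agent policy choice in multi-agent RL.
# Agent IDs like "sat_0" are hypothetical.

def shared_policy_mapping(agent_id: str) -> str:
    # All satellite agents train one shared policy:
    # fewer parameters, easier to tune, less specialization.
    return "shared_policy"

def per_agent_policy_mapping(agent_id: str) -> str:
    # Each satellite trains its own policy:
    # more flexible, but overall success requires tuning every one.
    return f"policy_{agent_id}"
```

A shared policy keeps the tuning burden close to the single-agent case, while per-agent policies multiply it by the number of satellites, which is why the multi-agent variant was harder to get right.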