Problem:
Using a satellite constellation, find an optimal control policy
(satellite trajectory, speed, and antenna direction angles) that maximizes the population covered by an internet service.
Solution:
All satellites are combined in one reward function with a penalty term of "uncovered population" * K-factor.
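The combined reward above can be sketched as a single scalar shared by all satellites: coverage minus a penalty proportional to the uncovered population. This is a minimal illustration; the constant K_FACTOR and the function/argument names are assumptions, not the project's actual code.

```python
# Hedged sketch of the constellation-wide reward with an
# "uncovered population" * K penalty. All names are illustrative.

K_FACTOR = 0.5  # assumed penalty weight; would be tuned in practice

def constellation_reward(covered_population: float, total_population: float) -> float:
    """Single reward shared by all satellites in the constellation."""
    uncovered = total_population - covered_population
    return covered_population - K_FACTOR * uncovered
```

With this shape, leaving people uncovered costs the agent directly, so the optimizer trades off antenna pointing and trajectory against the penalty rather than just maximizing raw coverage.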
Switched to "ray.rllib.algorithms" with a single externally registered environment. This makes it easy to
swap algorithms (A3C, IMPALA, PPO, and BC look best) and to optimize hyperparameters.
The model is trained and deployed using Amazon CodeCommit, Lambda, and SageMaker (RLEstimator).
Tested multi-agent RL: in this approach, each agent controls a single entity (one satellite) in the network.
It is much harder to train and tune than a single-agent model, since the success of the entire system depends
on every agent being well trained and tuned (shared vs. per-agent policies).
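The "shared vs. multiple policies" choice above can be sketched as a policy-mapping function: either every satellite agent maps to one shared policy, or each maps to its own. The functions below mirror the idea of RLlib's policy_mapping_fn but are illustrative assumptions, not the project's configuration.

```python
# Sketch of the shared-vs-per-agent policy choice in multi-agent RL.
# Agent IDs like "sat_0" are hypothetical.

def shared_policy_mapping(agent_id: str) -> str:
    # All satellite agents train one shared policy:
    # fewer parameters, easier to tune, less specialization.
    return "shared_policy"

def per_agent_policy_mapping(agent_id: str) -> str:
    # Each satellite trains its own policy:
    # more flexible, but overall success requires tuning every one.
    return f"policy_{agent_id}"
```

A shared policy keeps the tuning burden close to the single-agent case, while per-agent policies multiply it by the number of satellites, which is why the multi-agent variant was harder to get right.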