I am familiar with the basics (and possibly a fair number of the fundamentals) of imitation learning and reinforcement learning. In imitation learning (IL), we take demonstrations from a presumed expert, whose behavior we assume comes from the best available policy.
What does the following statement mean: "Train a policy network based on the expert record and save the results."
Question: Why do I have to train a policy network? What is it for?
Are there any algorithms that train a policy in this way?
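
To make the question concrete, here is a minimal sketch of what I think the statement might describe: supervised training on the expert's (state, action) pairs, usually called behavioral cloning, followed by saving the learned parameters. Everything in it is an assumption for illustration: the synthetic "expert record", the stand-in expert rule, the linear softmax policy used in place of a real policy network, and the file name `bc_policy.npz`.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "expert record": states and the actions the expert took in them.
# The stand-in expert picks action 1 when the first state feature is positive.
states = rng.normal(size=(500, 4))
expert_actions = (states[:, 0] > 0).astype(int)

n_actions = 2
W = np.zeros((4, n_actions))  # policy parameters (linear stand-in for a network)
b = np.zeros(n_actions)

def policy_probs(s, W, b):
    """Return the policy's action probabilities for each state (softmax)."""
    logits = s @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

# "Train a policy ... based on the expert record": minimize the cross-entropy
# between the policy's action distribution and the expert's chosen actions.
one_hot = np.eye(n_actions)[expert_actions]
learning_rate = 1.0
for _ in range(300):
    probs = policy_probs(states, W, b)
    grad_logits = (probs - one_hot) / len(states)
    W -= learning_rate * states.T @ grad_logits
    b -= learning_rate * grad_logits.sum(axis=0)

# How often the trained policy picks the same action as the expert.
accuracy = (policy_probs(states, W, b).argmax(axis=1) == expert_actions).mean()

# "... and save the results": store the learned parameters for later use.
path = os.path.join(tempfile.gettempdir(), "bc_policy.npz")
np.savez(path, W=W, b=b)
loaded = np.load(path)
print(f"agreement with expert: {accuracy:.2f}")
```

If this reading is right, the saved policy is what acts in the environment when the expert is no longer available, and it could also serve as a starting point for further reinforcement learning.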