xpag.agents.flax_agents.tqc.tqc.TQCLearner
- class TQCLearner(seed, observations, actions, actor_lr, critic_lr, temp_lr, hidden_dims_actor, discount, tau, target_update_period, target_entropy, backup_entropy, init_temperature, init_mean, policy_final_fc_init_scale, hidden_dims_critic, num_critics, num_quantiles, num_quantiles_to_drop)
Bases: SACLearner
An implementation of the version of Soft Actor-Critic (SAC) described in https://arxiv.org/abs/1812.05905.
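The constructor arguments num_critics, num_quantiles and num_quantiles_to_drop suggest the truncation step of Truncated Quantile Critics (TQC): quantile estimates from all critics are pooled, sorted, and the largest ones are discarded before the target is formed, which counteracts overestimation. The following is only a minimal plain-Python sketch of that idea, not xpag's implementation. The function name truncate_pooled_quantiles and the toy values are hypothetical, and the sketch assumes that num_quantiles_to_drop counts quantiles removed from the pooled set; this page does not say whether the real parameter is per critic or in total.

```python
def truncate_pooled_quantiles(critic_quantiles, num_quantiles_to_drop):
    """Pool the quantile estimates of all critics and drop the largest ones.

    critic_quantiles: one list of quantile estimates per critic
        (num_critics lists of num_quantiles floats each).
    num_quantiles_to_drop: ASSUMPTION, the total number of pooled
        quantiles to discard from the top of the sorted pool.
    """
    # Flatten the per-critic estimates into one pool, sorted ascending.
    pooled = sorted(q for critic in critic_quantiles for q in critic)
    if not 0 <= num_quantiles_to_drop < len(pooled):
        raise ValueError("num_quantiles_to_drop must leave at least one quantile")
    # Keep only the smallest estimates; the largest ones are truncated.
    return pooled[: len(pooled) - num_quantiles_to_drop]


# Hypothetical toy input: num_critics=2, num_quantiles=3.
critic_quantiles = [[1.0, 2.0, 6.0], [0.5, 3.0, 4.0]]
kept = truncate_pooled_quantiles(critic_quantiles, num_quantiles_to_drop=2)
mean_kept = sum(kept) / len(kept)
print(kept)       # [0.5, 1.0, 2.0, 3.0]
print(mean_kept)  # 1.625
```

In this toy case the two largest pooled estimates (4.0 and 6.0) are removed, so the mean of the remaining quantiles is lower than the mean of the full pool. That downward shift is the effect truncation is meant to have on the value target.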
Methods
sample_actions
- rtype: Array
update
- rtype: Dict[str, float]