Devlog #8 - Evaluating Trained Models


Model Comparison

| Parameter | Default Model | SearchAgent (Custom) | Description |
|---|---|---|---|
| trainer_type | ppo | ppo | Same algorithm used |
| max_steps | 500,000 | 3,000,000 | Extended training for more learning and convergence |
| summary_freq | 50,000 | 10,000 | More frequent summaries for closer monitoring |
| **Hyperparameters** | | | |
| learning_rate | 3e-4 | 3e-4 | No change |
| batch_size | 1024 | 1024 | No change |
| buffer_size | 10,240 | 10,240 | No change |
| beta | not set | 2.5e-4 | Entropy regularisation strength; set low to reduce policy entropy |
| epsilon | not set | 0.2 | PPO clipping parameter to control policy updates |
| lambd | not set | 0.95 | GAE lambda for the bias-variance trade-off in advantage estimation |
| num_epoch | not set | 3 | Number of passes over the data per policy update |
| learning_rate_schedule | linear | linear | Gradual learning rate decay over training |
| **Network settings** | | | |
| hidden_units | 128 | 256 | Larger network capacity to learn more complex features |
| num_layers | 2 | 2 | Same depth for a balance between expressiveness and speed |
| normalize | false | true | Normalise inputs to stabilise and speed up training |
| **Reward signals: extrinsic** | | | |
| gamma | 0.99 | 0.99 | No change |
| strength | 1.0 | 1.0 | No change |
| **Reward signals: curiosity** | | | |
| strength | not set | 0.1 | Added curiosity for intrinsic motivation/exploration |
| gamma | not set | 0.99 | Discount factor for curiosity rewards |
| learning_rate | not set | 0.0003 | Learning rate specific to the curiosity module |
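For reference, here is a minimal sketch of how the SearchAgent column could be written as an ML-Agents trainer configuration file, using the standard `behaviors` YAML layout. The behaviour name `SearchAgent` and all values come from the table above; any setting not listed there is simply omitted and falls back to its ML-Agents default.

```yaml
# Sketch of the custom SearchAgent trainer config (values taken from the table above;
# settings not listed keep their ML-Agents defaults).
behaviors:
  SearchAgent:
    trainer_type: ppo
    max_steps: 3000000          # extended from the 500,000 default
    summary_freq: 10000         # more frequent summaries for closer monitoring
    hyperparameters:
      learning_rate: 3.0e-4
      batch_size: 1024
      buffer_size: 10240
      beta: 2.5e-4              # entropy regularisation strength
      epsilon: 0.2              # PPO clipping parameter
      lambd: 0.95               # GAE lambda
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true           # normalise observations
      hidden_units: 256
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:                # intrinsic reward for exploration
        strength: 0.1
        gamma: 0.99
        learning_rate: 3.0e-4
```

Assuming the file were saved as, say, `config/search_agent.yaml`, training would be launched with `mlagents-learn config/search_agent.yaml --run-id=SearchAgent`.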
