Training (src/training/)
Trainer-agnostic training pipeline. Concrete trainers (e.g. neural-network or tensor-network agents) implement the train_episode! interface; the TrainingSession machinery handles background scheduling, metrics collection, and persistence.
Types and core interface (types.jl)
Reversi.AbstractTrainer — Type
AbstractTrainer
Abstract type for all trainer implementations.
Required interface:
train_episode!(trainer, episode::Int) -> TrainingMetrics
Optional interface (with sensible defaults — override as needed):
predict_value(trainer, game) -> Float32  # value of current state
hyperparameters(trainer) -> Dict{String,Any}  # for UI display
opponent(trainer) -> Union{Player,Nothing}  # baseline opponent (nothing = self-play)
batch_size(trainer) -> Int  # episodes per batch (default 1)
train_batch!(trainer, ep_start, n) -> Vector{TrainingMetrics}
save_trainer(trainer, path) / load_trainer(path)  # persistence
Reversi.TrainingMetrics — Type
TrainingMetrics
Metrics collected from a single training episode (one game).
Fields:
episode: episode number (1-based)
winner: BLACK, WHITE, or EMPTY (draw)
black_score: final black piece count
white_score: final white piece count
win_rate: per-episode indicator (1.0 if BLACK won, 0.0 otherwise)
policy: 8×8 move frequency / probability heatmap
value: predicted value of the initial state from BLACK's perspective
loss: training loss (if a weight update occurred this episode), or nothing
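Because win_rate is a per-episode indicator rather than an average, a consumer that wants a smoothed win rate has to aggregate it. The following is a minimal sketch of one way to do that; rolling_win_rate and its window parameter are hypothetical names, not part of the Reversi API.

```julia
# Hypothetical helper: mean of the last `window` per-episode indicators
# (1.0 if BLACK won, 0.0 otherwise). Returns 0.0 for an empty history.
function rolling_win_rate(indicators::AbstractVector{<:Real}, window::Int)
    isempty(indicators) && return 0.0
    # Clamp the start index so short histories use every available episode.
    recent = indicators[max(1, end - window + 1):end]
    return sum(recent) / length(recent)
end

recent_rate = rolling_win_rate([1.0, 0.0, 1.0, 1.0], 2)  # last two episodes
full_rate = rolling_win_rate([1.0, 0.0, 1.0, 1.0], 4)    # all four episodes
```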
Reversi.TrainingSession — Type
TrainingSession
Manages the lifecycle of a training run: background task, metrics history, locking.
Reversi.batch_size — Method
batch_size(trainer::AbstractTrainer) -> Int
Return the number of episodes the trainer prefers to run before reporting metrics. Trainers that accumulate gradients across episodes should override this. Default: 1.
Reversi.hyperparameters — Method
hyperparameters(trainer::AbstractTrainer) -> Dict{String,Any}
Return a dictionary of hyperparameters for display in the training UI. Default: empty dict.
Reversi.load_trainer — Method
load_trainer(path::AbstractString) -> AbstractTrainer
Deserialize a trainer previously saved with save_trainer.
Reversi.opponent — Method
opponent(trainer::AbstractTrainer) -> Union{Player,Nothing}
Return the baseline opponent for the trainer, or nothing for pure self-play. Default: nothing.
Reversi.predict_value — Method
predict_value(trainer::AbstractTrainer, game::ReversiGame) -> Float32
Predict the value of game's current state from the current player's perspective. Convention: values lie roughly in [-1, 1], and a positive value means the current player is winning. Default: 0.0f0 (no information).
Reversi.save_trainer — Method
save_trainer(trainer::AbstractTrainer, path::AbstractString)
Serialize trainer to path using Julia's Serialization stdlib. Trainers with framework-specific state (e.g. Flux models with GPU buffers) should override this with a format-appropriate writer.
Reversi.train_batch! — Method
train_batch!(trainer, episode_start::Int, n::Int) -> Vector{TrainingMetrics}
Run n consecutive episodes. Default implementation calls train_episode! sequentially. Trainers that need to accumulate gradients across the batch should override this method.
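The sequential default described above can be sketched roughly as follows. This is not the package source: the types are local stand-ins so the snippet runs without Reversi, CountingTrainer is a hypothetical trainer, and numbering the episodes episode_start, episode_start + 1, ... is an assumption.

```julia
# Stand-in definitions so the sketch runs without the Reversi package.
abstract type AbstractTrainer end

struct TrainingMetrics          # reduced to two fields for illustration
    episode::Int
    loss::Union{Float32,Nothing}
end

# Hypothetical trainer that only counts how many episodes it has run.
mutable struct CountingTrainer <: AbstractTrainer
    episodes_run::Int
end

function train_episode!(trainer::CountingTrainer, episode::Int)
    trainer.episodes_run += 1
    return TrainingMetrics(episode, nothing)  # no weight update, so no loss
end

# Sequential fallback: one train_episode! call per episode in the batch.
# Assumption: episodes are numbered consecutively from episode_start.
function train_batch!(trainer::AbstractTrainer, episode_start::Int, n::Int)
    return [train_episode!(trainer, episode_start + i - 1) for i in 1:n]
end

trainer = CountingTrainer(0)
batch = train_batch!(trainer, 5, 3)
```

A trainer that accumulates gradients would replace the comprehension with its own loop and apply a single update after the last episode of the batch.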
Reversi.train_episode! — Function
train_episode!(trainer::AbstractTrainer, episode::Int) -> TrainingMetrics
Run one training episode and return the collected metrics. Must be implemented by every concrete AbstractTrainer subtype.
Session lifecycle (session.jl)
Reversi.start_training! — Method
start_training!(session::TrainingSession)
Launch the training loop as a background Task. Each episode calls train_episode!(trainer, episode_number) and appends the result to session.metrics_history.
Reversi.stop_training! — Method
stop_training!(session::TrainingSession)
Signal the training loop to stop after the current episode finishes.
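One common way to get "stop after the current episode" behaviour is a flag that the loop checks between episodes. The sketch below illustrates that pattern only; SketchSession, start_sketch!, and stop_sketch! are hypothetical stand-ins, and the real TrainingSession fields and functions may differ.

```julia
# Hypothetical stand-in for TrainingSession.
mutable struct SketchSession
    running::Threads.Atomic{Bool}   # stop flag checked between episodes
    lock::ReentrantLock             # guards access to history
    history::Vector{Int}            # episode numbers stand in for metrics
    task::Union{Task,Nothing}
end

SketchSession() =
    SketchSession(Threads.Atomic{Bool}(false), ReentrantLock(), Int[], nothing)

function start_sketch!(session::SketchSession)
    session.running[] = true
    session.task = @async begin
        episode = 0
        while session.running[]
            episode += 1
            sleep(0.001)  # placeholder for train_episode!(trainer, episode)
            lock(session.lock) do
                push!(session.history, episode)
            end
        end
    end
    return session
end

# Only flips the flag; the episode in progress still completes and is recorded.
function stop_sketch!(session::SketchSession)
    session.running[] = false
    return session
end

session = start_sketch!(SketchSession())
sleep(0.05)
stop_sketch!(session)
wait(session.task)  # returns once the loop has seen the flag and exited
```

Because the flag is only read at the top of the loop, a long episode delays shutdown by at most one episode.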
Reversi.training_history — Method
training_history(session::TrainingSession) -> Vector{Dict}
Return the full metrics history as a vector of dicts (JSON-serializable).
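To illustrate what a vector of JSON-friendly dicts might look like, here is a minimal sketch that flattens metric structs by field name. SketchMetrics and to_dict are hypothetical; the key names and the encoding the package actually uses (for example for winner or the policy matrix) are not specified here.

```julia
# Hypothetical subset of the TrainingMetrics fields.
struct SketchMetrics
    episode::Int
    black_score::Int
    white_score::Int
    loss::Union{Float32,Nothing}
end

# Convert a struct into a Dict keyed by field name.
to_dict(m) = Dict{String,Any}(
    String(f) => getfield(m, f) for f in fieldnames(typeof(m))
)

history = [SketchMetrics(1, 34, 30, nothing), SketchMetrics(2, 20, 44, 0.73f0)]
rows = [to_dict(m) for m in history]
```

A nothing value such as a missing loss typically maps to JSON null when the dicts are passed to a JSON writer.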
Reversi.training_policy — Method
training_policy(session::TrainingSession) -> Matrix{Float32}
Return the latest policy heatmap (8×8). Returns zeros if no episodes have completed.
Reversi.training_status — Method
training_status(session::TrainingSession) -> Dict
Return the current training status: running flag, episode count, latest metrics.
Built-in trainers (random_trainer.jl)
Reversi.RandomTrainer — Type
RandomTrainer <: AbstractTrainer
Dummy trainer that plays RandomPlayer vs RandomPlayer. Useful for validating the training pipeline without any ML dependencies.
Notes
Implementing a custom trainer
struct MyTrainer <: AbstractTrainer
    model  # your Flux / MPS / ... model
    learning_rate::Float64
end

function Reversi.train_episode!(trainer::MyTrainer, episode::Int)
    # ... run a self-play game, update model, return TrainingMetrics
end

# Optional overrides
Reversi.predict_value(trainer::MyTrainer, game) = ...
Reversi.hyperparameters(trainer::MyTrainer) = Dict("lr" => trainer.learning_rate)
Reversi.batch_size(trainer::MyTrainer) = 16
Persistence
save_trainer / load_trainer use Julia's Serialization stdlib by default. Trainers with framework-specific state (GPU buffers, etc.) should override these with format-appropriate writers.
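As a rough illustration of what a Serialization-based round trip looks like, the sketch below writes and reads a plain-Julia struct. SketchTrainer, save_sketch, and load_sketch are hypothetical names used only here; the package's own defaults may do more than this.

```julia
using Serialization

# Hypothetical trainer holding only plain Julia data (no GPU buffers).
struct SketchTrainer
    weights::Vector{Float32}
    learning_rate::Float64
end

# serialize(path, value) and deserialize(path) come from the Serialization stdlib.
save_sketch(trainer, path::AbstractString) = serialize(path, trainer)
load_sketch(path::AbstractString) = deserialize(path)

path = tempname()
save_sketch(SketchTrainer(Float32[0.1, 0.2], 1e-3), path)
restored = load_sketch(path)
```

Note that the Serialization format is not guaranteed to be readable across Julia versions, and the struct definition must be loaded before deserializing, which is one more reason to override these functions for long-lived checkpoints.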