Training (src/training/)

Trainer-agnostic training pipeline. Concrete trainers (e.g. neural-network or tensor-network agents) implement the train_episode! interface; the TrainingSession machinery handles background scheduling, metrics collection, and persistence.

Types and core interface (types.jl)

Reversi.AbstractTrainer (Type)
AbstractTrainer

Abstract type for all trainer implementations.

Required interface:

  • train_episode!(trainer, episode::Int) -> TrainingMetrics

Optional interface (with sensible defaults — override as needed):

  • predict_value(trainer, game) -> Float32 # value of current state
  • hyperparameters(trainer) -> Dict{String,Any} # for UI display
  • opponent(trainer) -> Union{Player,Nothing} # baseline opponent (nothing = self-play)
  • batch_size(trainer) -> Int # episodes per batch (default 1)
  • train_batch!(trainer, ep_start, n) -> Vector{TrainingMetrics}
  • save_trainer(trainer, path) / load_trainer(path) # persistence
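The required/optional split above maps onto a common Julia pattern. Fallback methods defined on the abstract type supply the defaults. A required function gets no fallback, so a trainer that forgets it fails loudly with a MethodError. The sketch below shows that pattern with hypothetical stand-in names (AbstractToyTrainer, toy_batch_size, CountingTrainer and so on). It is not Reversi's source, and it returns a NamedTuple where the real interface returns TrainingMetrics.

```julia
# Hypothetical stand-ins mirroring the documented interface (not Reversi's source).
abstract type AbstractToyTrainer end

# Required: declared with no methods, so a trainer that forgets it gets a MethodError.
function toy_train_episode! end

# Optional: fallbacks on the abstract type provide the documented defaults.
toy_batch_size(::AbstractToyTrainer) = 1
toy_hyperparameters(::AbstractToyTrainer) = Dict{String,Any}()
toy_opponent(::AbstractToyTrainer) = nothing            # nothing = self-play
toy_predict_value(::AbstractToyTrainer, game) = 0.0f0   # no information

# A concrete trainer that implements the required method.
struct CountingTrainer <: AbstractToyTrainer
    episodes_seen::Vector{Int}
end

function toy_train_episode!(t::CountingTrainer, episode::Int)
    push!(t.episodes_seen, episode)
    return (episode = episode, loss = nothing)   # NamedTuple stands in for TrainingMetrics
end

# A trainer that inherits the defaults but never implements the required method.
struct IncompleteTrainer <: AbstractToyTrainer end
```

Calling toy_train_episode!(IncompleteTrainer(), 1) raises a MethodError. Calling toy_batch_size on either trainer falls back to 1.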

Reversi.TrainingMetrics (Type)
TrainingMetrics

Metrics collected from a single training episode (one game).

Fields:

  • episode — episode number (1-based)
  • winner — BLACK, WHITE, or EMPTY (draw)
  • black_score — final black piece count
  • white_score — final white piece count
  • win_rate — per-episode indicator (1.0 if BLACK won, 0.0 otherwise, including draws); averaging it over episodes gives BLACK's win rate
  • policy — 8×8 move frequency / probability heatmap
  • value — predicted value of the initial state from BLACK's perspective
  • loss — training loss (if a weight update occurred this episode), or nothing
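Because win_rate is a 0/1 indicator per episode, the win rate over any window of episodes is the mean of those indicators. Here is a minimal sketch. It assumes a hypothetical ToyMetrics struct that mirrors the documented fields, with Symbols standing in for Reversi's BLACK/WHITE/EMPTY constants. toy_metrics and rolling_win_rate are illustrative names, not Reversi APIs.

```julia
# Hypothetical stand-in mirroring the documented TrainingMetrics fields.
struct ToyMetrics
    episode::Int
    winner::Symbol                 # :BLACK, :WHITE or :EMPTY (draw)
    black_score::Int
    white_score::Int
    win_rate::Float64              # 1.0 if BLACK won, 0.0 otherwise
    policy::Matrix{Float32}        # 8×8 heatmap
    value::Float32
    loss::Union{Float64,Nothing}   # nothing when no weight update happened
end

# Build metrics from final piece counts (policy and value left at neutral placeholders).
function toy_metrics(episode, black, white; loss = nothing)
    winner = black > white ? :BLACK : white > black ? :WHITE : :EMPTY
    return ToyMetrics(episode, winner, black, white, winner == :BLACK ? 1.0 : 0.0,
                      zeros(Float32, 8, 8), 0.0f0, loss)
end

# BLACK's win rate over the last `window` episodes: the mean of the 0/1 indicators.
function rolling_win_rate(history::Vector{ToyMetrics}, window::Int)
    isempty(history) && return 0.0
    recent = history[max(1, end - window + 1):end]
    return sum(m.win_rate for m in recent) / length(recent)
end
```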

Reversi.batch_size (Method)
batch_size(trainer::AbstractTrainer) -> Int

Return the number of episodes the trainer prefers to run before reporting metrics. Trainers that accumulate gradients across episodes should override this. Default: 1.
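One plausible way for a driver to honour batch_size is to cut the episode sequence into consecutive ranges and hand each range r to train_batch!(trainer, first(r), length(r)). The helper below shows only the chunking arithmetic. batch_ranges is a hypothetical name, not part of Reversi, and nothing here confirms that the session drives episodes exactly this way.

```julia
# Hypothetical helper: split episodes 1:total into consecutive batches of size bs.
function batch_ranges(total::Int, bs::Int)
    bs >= 1 || throw(ArgumentError("batch size must be at least 1"))
    return [start:min(start + bs - 1, total) for start in 1:bs:total]
end

batch_ranges(10, 4)   # [1:4, 5:8, 9:10]; the last batch is shorter
```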

Reversi.hyperparameters (Method)
hyperparameters(trainer::AbstractTrainer) -> Dict{String,Any}

Return a dictionary of hyperparameters for display in the training UI. Default: empty dict.

Reversi.load_trainer (Method)
load_trainer(path::AbstractString) -> AbstractTrainer

Deserialize a trainer previously saved with save_trainer.

Reversi.opponent (Method)
opponent(trainer::AbstractTrainer) -> Union{Player,Nothing}

Return the baseline opponent for the trainer, or nothing for pure self-play. Default: nothing.

Reversi.predict_value (Method)
predict_value(trainer::AbstractTrainer, game::ReversiGame) -> Float32

Predict the value of game's current state from the current player's perspective. Convention: positive values mean the current player is winning, and values lie roughly in [-1, 1]. Default: 0.0f0 (no information).
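To make the sign convention concrete, here is a hypothetical baseline evaluator. It scores a position by piece-count margin divided by 64, which always lands in [-1, 1]. It also includes the sign flip needed to re-express a current-player value from BLACK's perspective, which is the convention TrainingMetrics.value uses. Side, score_value and to_black_perspective are illustrative names, not Reversi APIs.

```julia
# Hypothetical side labels; Reversi's own player constants are not used here.
@enum Side BLACK_SIDE WHITE_SIDE

# A naive value in [-1, 1]: piece-count margin from `side`'s point of view.
score_value(black::Int, white::Int, side::Side) =
    Float32((side == BLACK_SIDE ? black - white : white - black) / 64)

# Re-express a value computed for the side to move as a value from BLACK's perspective.
to_black_perspective(v::Float32, to_move::Side) = to_move == BLACK_SIDE ? v : -v
```

In standard Reversi rules BLACK moves first, so at the initial state the two perspectives coincide and no flip is needed.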

Reversi.save_trainer (Method)
save_trainer(trainer::AbstractTrainer, path::AbstractString)

Serialize trainer to path using Julia's Serialization stdlib. Trainers with framework-specific state (e.g. Flux models with GPU buffers) should override this with a format-appropriate writer.
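Here is a sketch of the default round trip. It assumes a hypothetical TabularTrainer whose state is plain Julia data with no GPU buffers. serialize and deserialize are the real Serialization stdlib functions. save_sketch and load_sketch are illustrative wrappers, not Reversi's save_trainer/load_trainer.

```julia
using Serialization

# Hypothetical trainer whose state is plain Julia data, so the default format suffices.
struct TabularTrainer
    values::Dict{UInt64,Float32}   # e.g. position hash => value estimate
    learning_rate::Float64
end

# Write the whole object graph to `path`, then read it back.
save_sketch(t, path::AbstractString) = serialize(path, t)
load_sketch(path::AbstractString) = deserialize(path)
```

The Serialization documentation cautions that data written by one Julia version or system image may not be readable by another. Files written this way are therefore best treated as short-lived checkpoints rather than long-term archives.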

Reversi.train_batch! (Method)
train_batch!(trainer, episode_start::Int, n::Int) -> Vector{TrainingMetrics}

Run n consecutive episodes. Default implementation calls train_episode! sequentially. Trainers that need to accumulate gradients across the batch should override this method.
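The default behaviour can be contrasted with the kind of override the docstring describes using a toy model. The sequential version performs one update per episode. The accumulating version averages a per-episode gradient and applies a single update per batch. ToyTrainer, toy_gradient and the train_* names are hypothetical, and NamedTuples stand in for TrainingMetrics.

```julia
# Hypothetical stand-in trainer with a single scalar "weight".
mutable struct ToyTrainer
    weight::Float64
    updates::Int
end

toy_gradient(episode::Int) = 1.0   # placeholder per-episode gradient

# One training episode: one weight update.
function train_one!(t::ToyTrainer, episode::Int; lr = 0.1)
    t.weight -= lr * toy_gradient(episode)
    t.updates += 1
    return (episode = episode,)
end

# Default-style batch: n sequential episodes numbered from episode_start.
train_many!(t::ToyTrainer, episode_start::Int, n::Int) =
    [train_one!(t, episode_start + i) for i in 0:n-1]

# Accumulating override: average the batch's gradients, then apply a single update.
function train_many_accumulated!(t::ToyTrainer, episode_start::Int, n::Int; lr = 0.1)
    episodes = episode_start:episode_start + n - 1
    g = sum(toy_gradient, episodes) / n
    t.weight -= lr * g
    t.updates += 1
    return [(episode = e,) for e in episodes]
end
```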

Reversi.train_episode! (Function)
train_episode!(trainer::AbstractTrainer, episode::Int) -> TrainingMetrics

Run one training episode and return the collected metrics. Must be implemented by every concrete AbstractTrainer subtype.

Session lifecycle (session.jl)

Reversi.start_training! (Method)
start_training!(session::TrainingSession)

Launch the training loop as a background Task. Each episode calls train_episode!(trainer, episode_number) and appends the result to session.metrics_history.
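Here is a minimal sketch of the start/stop mechanics. It assumes a hypothetical ToySession that holds an atomic running flag, a history vector and the background Task. start_sketch! and stop_sketch! are illustrative, not Reversi's implementation. The loop checks the flag only between episodes, which is what gives a stop request its "after the current episode finishes" semantics.

```julia
# Hypothetical stand-in for TrainingSession (not Reversi's implementation).
mutable struct ToySession
    running::Threads.Atomic{Bool}    # checked between episodes
    history::Vector{Any}             # one entry per finished episode
    task::Union{Task,Nothing}        # the background training loop
end

ToySession() = ToySession(Threads.Atomic{Bool}(false), Any[], nothing)

# Launch the loop as a background Task; episode_fn plays the role of train_episode!.
function start_sketch!(s::ToySession; episode_fn = e -> (episode = e,))
    s.running[] = true
    s.task = @async begin
        episode = length(s.history)
        while s.running[]
            episode += 1
            push!(s.history, episode_fn(episode))   # append this episode's metrics
            sleep(0.001)                            # yield so other tasks (e.g. a UI) can run
        end
    end
    return s
end

# Only signals; the loop exits once the episode in progress has been recorded.
stop_sketch!(s::ToySession) = (s.running[] = false; s)
```

@async keeps the loop on the caller's thread, so code on that thread can read history without a lock. Moving the loop to Threads.@spawn would require guarding history, for example with a ReentrantLock.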

Reversi.stop_training! (Method)
stop_training!(session::TrainingSession)

Signal the training loop to stop after the current episode finishes.

Reversi.training_history (Method)
training_history(session::TrainingSession) -> Vector{Dict}

Return the full metrics history as a vector of dicts (JSON-serializable).
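Here is a sketch of the conversion to JSON-friendly dicts. It assumes a hypothetical MiniMetrics struct with a subset of the documented fields. to_dict keys each field by its name, and a nothing loss is passed through unchanged for the encoder to handle. Depending on the JSON library, an 8×8 policy matrix may need converting to nested vectors first.

```julia
# Hypothetical stand-in with a subset of the documented TrainingMetrics fields.
struct MiniMetrics
    episode::Int
    black_score::Int
    white_score::Int
    loss::Union{Float64,Nothing}
end

# One Dict per episode, keyed by field name.
to_dict(m) = Dict{String,Any}(String(f) => getfield(m, f) for f in fieldnames(typeof(m)))

history_dicts(history) = [to_dict(m) for m in history]
```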

Reversi.training_policy (Method)
training_policy(session::TrainingSession) -> Matrix{Float32}

Return the latest policy heatmap (8×8). Returns zeros if no episodes completed.
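Here is a sketch of both halves of the policy heatmap. It assumes history entries are NamedTuples with a policy field, a hypothetical stand-in for TrainingMetrics. latest_policy applies the zeros fallback. move_heatmap shows one way a frequency heatmap could be built from (row, col) moves. Neither function is part of Reversi.

```julia
# Latest 8×8 heatmap, or all zeros when no episode has completed yet.
latest_policy(history) = isempty(history) ? zeros(Float32, 8, 8) : last(history).policy

# Count how often each square was played and normalize the counts into frequencies.
function move_heatmap(moves::Vector{Tuple{Int,Int}})
    h = zeros(Float32, 8, 8)
    for (r, c) in moves
        h[r, c] += 1
    end
    total = sum(h)
    return total == 0 ? h : h ./ total
end
```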

Reversi.training_status (Method)
training_status(session::TrainingSession) -> Dict

Return the current training status: running flag, episode count, latest metrics.
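The status query can be reduced to a small Dict. The key names below ("running", "episodes", "latest") are hypothetical, because the docstring lists the contents but not the exact keys. status_dict is an illustrative name, not a Reversi API.

```julia
# Hypothetical: build a status Dict from a running flag and a metrics history.
function status_dict(running::Bool, history)
    return Dict{String,Any}(
        "running"  => running,
        "episodes" => length(history),
        "latest"   => isempty(history) ? nothing : last(history),
    )
end
```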

Built-in trainers (random_trainer.jl)

Reversi.RandomTrainer (Type)
RandomTrainer <: AbstractTrainer

Dummy trainer that plays RandomPlayer vs RandomPlayer. Useful for validating the training pipeline without any ML dependencies.
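RandomTrainer's source is not reproduced here. As a rough illustration of what one random-vs-random episode involves, the sketch below implements a tiny self-contained Reversi rules engine and plays uniformly random legal moves for both sides. The board is an 8×8 Int matrix with the standard starting position, BLACK moves first, and the game ends after two consecutive passes. None of these names (random_game, flips, legal_moves, BLACK_SQ, ...) are Reversi APIs. A trainer would turn the returned counts into TrainingMetrics fields such as black_score, white_score and winner.

```julia
using Random

const EMPTY_SQ = 0
const BLACK_SQ = 1
const WHITE_SQ = 2
const DIRS = [(dr, dc) for dr in -1:1 for dc in -1:1 if (dr, dc) != (0, 0)]

opp(p) = 3 - p   # BLACK_SQ <-> WHITE_SQ

# Standard start: d4/e5 white, d5/e4 black (board indexed [row, col]).
function initial_board()
    b = zeros(Int, 8, 8)
    b[4, 4] = b[5, 5] = WHITE_SQ
    b[4, 5] = b[5, 4] = BLACK_SQ
    return b
end

# Squares that flip if `p` plays at (r, c); an empty result means the move is illegal.
function flips(b, p, r, c)
    out = Tuple{Int,Int}[]
    b[r, c] == EMPTY_SQ || return out
    for (dr, dc) in DIRS
        line = Tuple{Int,Int}[]
        rr, cc = r + dr, c + dc
        while 1 <= rr <= 8 && 1 <= cc <= 8 && b[rr, cc] == opp(p)
            push!(line, (rr, cc))
            rr += dr
            cc += dc
        end
        # The run of opponent discs must be capped by one of p's own discs.
        if !isempty(line) && 1 <= rr <= 8 && 1 <= cc <= 8 && b[rr, cc] == p
            append!(out, line)
        end
    end
    return out
end

legal_moves(b, p) = [(r, c) for r in 1:8 for c in 1:8 if !isempty(flips(b, p, r, c))]

# Play one random-vs-random game; returns (black_count, white_count, moves played).
function random_game(rng::AbstractRNG = Random.default_rng())
    b, p, passes, moves = initial_board(), BLACK_SQ, 0, Tuple{Int,Int}[]
    while passes < 2
        ms = legal_moves(b, p)
        if isempty(ms)
            passes += 1                  # no legal move: pass the turn
        else
            passes = 0
            r, c = rand(rng, ms)
            for (fr, fc) in flips(b, p, r, c)
                b[fr, fc] = p
            end
            b[r, c] = p
            push!(moves, (r, c))
        end
        p = opp(p)
    end
    return count(==(BLACK_SQ), b), count(==(WHITE_SQ), b), moves
end
```

Usage: black, white, moves = random_game(MersenneTwister(42)). Each move adds exactly one disc, so black + white always equals 4 + length(moves).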

Notes

Implementing a custom trainer

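using Reversi
using Reversi: AbstractTrainer, TrainingMetrics, ReversiGame

struct MyTrainer <: AbstractTrainer
    model           # your Flux / MPS / ... model
    learning_rate::Float64
end

# Required: run one self-play game, update trainer.model, and return a TrainingMetrics.
function Reversi.train_episode!(trainer::MyTrainer, episode::Int)
    # ... run a self-play game, update the model, build and return TrainingMetrics
    error("train_episode! is not implemented yet for MyTrainer")
end

# Optional overrides
Reversi.predict_value(trainer::MyTrainer, game::ReversiGame) = 0.0f0  # replace with a model forward pass
Reversi.hyperparameters(trainer::MyTrainer) = Dict{String,Any}("lr" => trainer.learning_rate)
Reversi.batch_size(trainer::MyTrainer) = 16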

Persistence

save_trainer / load_trainer use Julia's Serialization stdlib by default. Trainers with framework-specific state (GPU buffers, etc.) should override these with format-appropriate writers.