Training (`src/training/`)

Trainer-agnostic training pipeline. Concrete trainers (e.g. neural-network or tensor-network agents) implement the train_episode! interface; the TrainingSession machinery handles background scheduling, metrics collection, and persistence.

Types and core interface (`types.jl`)

Reversi.AbstractTrainer — Type

AbstractTrainer

Abstract type for all trainer implementations.

Required interface:

train_episode!(trainer, episode::Int) -> TrainingMetrics

Optional interface (with sensible defaults — override as needed):

predict_value(trainer, game) -> Float32 # value of current state
hyperparameters(trainer) -> Dict{String,Any} # for UI display
opponent(trainer) -> Union{Player,Nothing} # baseline opponent (nothing = self-play)
batch_size(trainer) -> Int # episodes per batch (default 1)
train_batch!(trainer, ep_start, n) -> Vector{TrainingMetrics}
save_trainer(trainer, path) / load_trainer(path) # persistence
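
The split between required and optional methods can be sketched with plain multiple dispatch. This is a minimal sketch, not the package's source: the types below are local stand-ins for the ones in `types.jl`, `BareTrainer` and `TunedTrainer` are hypothetical, and it is an assumption that an unimplemented `train_episode!` surfaces as a `MethodError`.

```julia
# Local stand-ins; the real definitions live in the Reversi package.
abstract type AbstractTrainer end

# Required: declared without a fallback, so every concrete trainer must add a method.
function train_episode! end

# Optional: fallbacks on the abstract type, mirroring the documented defaults.
batch_size(::AbstractTrainer) = 1
hyperparameters(::AbstractTrainer) = Dict{String,Any}()
opponent(::AbstractTrainer) = nothing

# Hypothetical trainer that overrides nothing.
struct BareTrainer <: AbstractTrainer end

# Hypothetical trainer that overrides two of the optional methods.
struct TunedTrainer <: AbstractTrainer
    learning_rate::Float64
end
batch_size(::TunedTrainer) = 16
hyperparameters(t::TunedTrainer) = Dict{String,Any}("lr" => t.learning_rate)
```

Because the fallbacks are attached to the abstract type, a new trainer only has to define the methods whose defaults it wants to change.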

Reversi.TrainingMetrics — Type

TrainingMetrics

Metrics collected from a single training episode (one game).

Fields:

episode — episode number (1-based)
winner — BLACK, WHITE, or EMPTY (draw)
black_score — final black piece count
white_score — final white piece count
win_rate — per-episode indicator (1.0 if BLACK won, 0.0 otherwise)
policy — 8×8 move frequency / probability heatmap
value — predicted value of the initial state from BLACK's perspective
loss — training loss (if a weight update occurred this episode), or nothing
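
As an illustration of how these fields relate to each other, the sketch below builds a metrics record from a final score. `MetricsSketch` and `metrics_from_scores` are hypothetical names, the field types are assumptions, and symbols stand in for the package's `BLACK`, `WHITE`, and `EMPTY` constants.

```julia
# Hypothetical stand-in for TrainingMetrics; field types are assumptions.
struct MetricsSketch
    episode::Int                    # 1-based episode number
    winner::Symbol                  # :BLACK, :WHITE, or :EMPTY (draw)
    black_score::Int
    white_score::Int
    win_rate::Float64               # 1.0 if BLACK won, 0.0 otherwise
    policy::Matrix{Float32}         # 8×8 heatmap
    value::Float32
    loss::Union{Float32,Nothing}    # nothing when no weight update occurred
end

# Derive winner and win_rate from the final piece counts.
function metrics_from_scores(episode::Int, black::Int, white::Int;
                             policy = zeros(Float32, 8, 8),
                             value = 0.0f0,
                             loss = nothing)
    winner = if black > white
        :BLACK
    elseif white > black
        :WHITE
    else
        :EMPTY
    end
    win_rate = winner === :BLACK ? 1.0 : 0.0
    return MetricsSketch(episode, winner, black, white, win_rate, policy, value, loss)
end

win = metrics_from_scores(1, 34, 30)
draw = metrics_from_scores(2, 32, 32)
```

Since `win_rate` is a per-episode indicator, a running win rate would be the mean of this field over a window of episodes.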

Reversi.TrainingSession — Type

TrainingSession

Manages the lifecycle of a training run: background task, metrics history, locking.

Reversi.batch_size — Method

batch_size(trainer::AbstractTrainer) -> Int

Return the number of episodes the trainer prefers to run before reporting metrics. Trainers that accumulate gradients across episodes should override this. Default: 1.

Reversi.hyperparameters — Method

hyperparameters(trainer::AbstractTrainer) -> Dict{String,Any}

Return a dictionary of hyperparameters for display in the training UI. Default: empty dict.

Reversi.load_trainer — Method

load_trainer(path::AbstractString) -> AbstractTrainer

Deserialize a trainer previously saved with save_trainer.

Reversi.opponent — Method

opponent(trainer::AbstractTrainer) -> Union{Player,Nothing}

Return the baseline opponent for the trainer, or nothing for pure self-play. Default: nothing.

Reversi.predict_value — Method

predict_value(trainer::AbstractTrainer, game::ReversiGame) -> Float32

Predict the value of game's current state from the current player's perspective. Convention: positive = current player winning, in roughly [-1, 1]. Default: 0.0f0 (no information).
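
The sign convention matters when a value is reported from the mover's perspective but displayed from BLACK's perspective, as the `value` field of TrainingMetrics is. A minimal sketch of the conversion, assuming a zero-sum game; `black_perspective` is a hypothetical helper and symbols stand in for the package's player constants:

```julia
# Hypothetical helper: flip the sign when WHITE is the player to move,
# so the result is always expressed from BLACK's perspective.
black_perspective(v::Real, to_move::Symbol) =
    to_move === :BLACK ? Float32(v) : -Float32(v)

from_black_turn = black_perspective(0.4f0, :BLACK)
from_white_turn = black_perspective(0.4f0, :WHITE)
```

For the initial state BLACK is the player to move, so both perspectives coincide there.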

Reversi.save_trainer — Method

save_trainer(trainer::AbstractTrainer, path::AbstractString)

Serialize trainer to path using Julia's Serialization stdlib. Trainers with framework-specific state (e.g. Flux models with GPU buffers) should override this with a format-appropriate writer.
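
A round trip through the Serialization stdlib looks roughly like the sketch below. `ToyTrainer`, `save_sketch`, and `load_sketch` are hypothetical names used for illustration; the package's own default may differ in detail.

```julia
using Serialization

# Hypothetical trainer with plain, CPU-resident state.
struct ToyTrainer
    weights::Vector{Float32}
    learning_rate::Float64
end

# Sketch of a Serialization-based save/load pair.
save_sketch(trainer, path::AbstractString) = serialize(path, trainer)
load_sketch(path::AbstractString) = deserialize(path)

path = joinpath(mktempdir(), "trainer.jls")
save_sketch(ToyTrainer(Float32[0.1, 0.2], 0.01), path)
restored = load_sketch(path)
```

Files written this way are generally only safe to read back with the same Julia version and with the trainer's type definition already loaded, which is one more reason to override the pair for long-lived checkpoints.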

Reversi.train_batch! — Method

train_batch!(trainer, episode_start::Int, n::Int) -> Vector{TrainingMetrics}

Run n consecutive episodes. Default implementation calls train_episode! sequentially. Trainers that need to accumulate gradients across the batch should override this method.
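
A sequential fallback of this kind can be sketched in a few lines. The types are local stand-ins, `CountingTrainer` is hypothetical, a NamedTuple stands in for TrainingMetrics, and numbering the episodes `episode_start` through `episode_start + n - 1` is an assumption.

```julia
# Local stand-ins for the package's types.
abstract type AbstractTrainer end
function train_episode! end

# Sequential fallback: one train_episode! call per episode, in order.
train_batch!(trainer::AbstractTrainer, episode_start::Int, n::Int) =
    [train_episode!(trainer, ep) for ep in episode_start:(episode_start + n - 1)]

# Hypothetical trainer that only records which episodes it was asked to run.
struct CountingTrainer <: AbstractTrainer
    seen::Vector{Int}
end

function train_episode!(t::CountingTrainer, episode::Int)
    push!(t.seen, episode)
    return (episode = episode,)   # NamedTuple stand-in for TrainingMetrics
end

trainer = CountingTrainer(Int[])
batch = train_batch!(trainer, 5, 3)
```

A trainer that accumulates gradients would replace the comprehension with its own loop and apply a single weight update after the last episode of the batch.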

Reversi.train_episode! — Function

train_episode!(trainer::AbstractTrainer, episode::Int) -> TrainingMetrics

Run one training episode and return the collected metrics. Must be implemented by every concrete AbstractTrainer subtype.

Session lifecycle (`session.jl`)

Reversi.start_training! — Method

start_training!(session::TrainingSession)

Launch the training loop as a background Task. Each episode calls train_episode!(trainer, episode_number) and appends the result to session.metrics_history.
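
The start/stop pattern can be sketched as a background Task guarded by a running flag and a lock. This is an illustration under stated assumptions, not the package's implementation: `SessionSketch`, `start_sketch!`, and `stop_sketch!` are hypothetical, and a plain function stands in for `train_episode!`.

```julia
# Hypothetical session: a step function, a locked history, and a stop flag.
mutable struct SessionSketch
    step::Function                  # episode::Int -> metrics
    history::Vector{Any}
    lock::ReentrantLock
    running::Threads.Atomic{Bool}
    task::Union{Task,Nothing}
end

SessionSketch(step) =
    SessionSketch(step, Any[], ReentrantLock(), Threads.Atomic{Bool}(false), nothing)

function start_sketch!(s::SessionSketch)
    s.running[] && return s         # already running; do not spawn a second loop
    s.running[] = true
    s.task = @async begin
        episode = 0
        while s.running[]
            episode += 1
            metrics = s.step(episode)
            lock(s.lock) do
                push!(s.history, metrics)
            end
            yield()                 # give other tasks (e.g. a UI handler) a turn
        end
    end
    return s
end

# Only flips the flag; the loop exits once the in-flight episode has finished.
function stop_sketch!(s::SessionSketch)
    s.running[] = false
    return s
end
```

Readers of the history would take the same lock before copying it, so that a status request never observes a half-updated vector.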

Reversi.stop_training! — Method

stop_training!(session::TrainingSession)

Signal the training loop to stop after the current episode finishes.

Reversi.training_history — Method

training_history(session::TrainingSession) -> Vector{Dict}

Return the full metrics history as a vector of dicts (JSON-serializable).
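
One way to turn metrics structs into JSON-friendly dictionaries is to walk the struct's fields. The sketch below assumes string keys and converts symbols to strings; `MetricsSketch`, `plain`, `to_dict`, and `history_sketch` are hypothetical names, and the real key names and value types may differ.

```julia
# Hypothetical, trimmed-down metrics record.
struct MetricsSketch
    episode::Int
    winner::Symbol
    loss::Union{Float32,Nothing}
end

# Map values that JSON has no direct type for onto plain ones.
plain(x::Symbol) = String(x)
plain(x) = x

# One Dict per record, keyed by field name.
to_dict(m) = Dict{String,Any}(
    String(f) => plain(getfield(m, f)) for f in fieldnames(typeof(m))
)

history_sketch(records) = [to_dict(m) for m in records]

rows = history_sketch([
    MetricsSketch(1, :BLACK, nothing),
    MetricsSketch(2, :WHITE, 0.25f0),
])
```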

Reversi.training_policy — Method

training_policy(session::TrainingSession) -> Matrix{Float32}

Return the latest policy heatmap (8×8). Returns an all-zero matrix if no episodes have completed.
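
The empty-history fallback can be sketched as follows; `latest_policy` is a hypothetical helper, and NamedTuples with a `policy` field stand in for TrainingMetrics.

```julia
# Return the most recent heatmap, or an 8×8 zero matrix when nothing has run yet.
latest_policy(history) =
    isempty(history) ? zeros(Float32, 8, 8) : last(history).policy

empty_case = latest_policy([])

filled = fill(0.5f0, 8, 8)
latest = latest_policy([(policy = zeros(Float32, 8, 8),), (policy = filled,)])
```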

Reversi.training_status — Method

training_status(session::TrainingSession) -> Dict

Return the current training status: running flag, episode count, latest metrics.

Built-in trainers (`random_trainer.jl`)

Reversi.RandomTrainer — Type

RandomTrainer <: AbstractTrainer

Dummy trainer that plays RandomPlayer vs RandomPlayer. Useful for validating the training pipeline without any ML dependencies.

Notes

Implementing a custom trainer

struct MyTrainer <: AbstractTrainer
    model           # your Flux / MPS / ... model
    learning_rate::Float64
end

function Reversi.train_episode!(trainer::MyTrainer, episode::Int)
    # ... run a self-play game, update model, return TrainingMetrics
end

# Optional overrides
Reversi.predict_value(trainer::MyTrainer, game) = 0.0f0  # placeholder: return your model's estimate, roughly in [-1, 1]
Reversi.hyperparameters(trainer::MyTrainer) = Dict("lr" => trainer.learning_rate)
Reversi.batch_size(trainer::MyTrainer) = 16

Persistence

save_trainer / load_trainer use Julia's Serialization stdlib by default. Trainers with framework-specific state (GPU buffers, etc.) should override these with format-appropriate writers.
