hyperoptim

Functions:

  • main

    Command-line interface for Optuna hyperparameter optimization.

  • objective

    Optuna objective function for multi-objective neural-network architecture search.

main

main()

Command-line interface for Optuna hyperparameter optimization.

This function parses command-line arguments, creates (or resumes) an Optuna study, builds training/validation datasets, and runs multi-objective optimization over neural-network depth/width.

The optimization minimizes both validation loss and model size (parameter count) using optuna.samplers.NSGAIISampler. Study results and per-trial artifacts (TensorBoard logs, CSV histories, and model checkpoints) are persisted to the configured Optuna storage and project output directories.

Command Line Parameters
  • study_name (str): Name of the Optuna study. This is used to namespace results and artifacts in the configured Optuna storage and project output directories.
  • --resume_study (bool, optional): If set, the script attempts to load an existing study with the same name and resume optimization from there. If not set and a study with the same name already exists, the script exits with an error to prevent accidental overwriting.
  • --storage (str, optional): Optuna storage URL (for example, sqlite:///example.db). Default is sqlite:///optuna.db.
  • --n_trials (int, optional): Number of trials to run. Default is 1000; the count is capped at 1000 to prevent excessively long runs.
  • --rfp_only (bool, optional): If set, the training and validation datasets are filtered to include only RFP equilibria. Default is False (i.e., use the full dataset).
Notes
  • GPU memory growth is enabled (if GPUs are available) via tf.config.experimental.set_memory_growth to reduce OOM issues.
  • When --resume_study is used, the remaining number of trials is reduced by the number of already completed trials.
  • A first trial with default parameters is enqueued for new studies to provide a known baseline configuration.
Source code in src/fpga_profile_reco/core/hyperoptim.py
def main():
    """
    Command-line interface for Optuna hyperparameter optimization.

    This function parses command-line arguments, creates (or resumes) an Optuna
    study, builds training/validation datasets, and runs multi-objective
    optimization over neural-network depth/width.

    The optimization minimizes both validation loss and model size (parameter
    count) using :class:`optuna.samplers.NSGAIISampler`. Study results and
    per-trial artifacts (TensorBoard logs, CSV histories, and model checkpoints)
    are persisted to the configured Optuna storage and project output
    directories.

    Command Line Parameters
    -----------------------
    - `study_name` : str
        Name of the Optuna study. This is used to namespace results and artifacts
        in the configured Optuna storage and project output directories.
    - `--resume_study` : bool, optional
        If set, the script will attempt to load an existing study with the same name and resume optimization from there. If not set and a study with the same
        name already exists, the script will exit with an error to prevent accidental overwriting.
    - `--storage` : str, optional
        Optuna storage URL (for example, ``sqlite:///example.db``). Default is ``sqlite:///optuna.db``.
    - `--n_trials` : int, optional
        Number of trials to run. Default is 1000. The maximum number of trials is also capped at 1000 to prevent excessively long runs.
    - `--rfp_only` : bool, optional
        If set, the training and validation datasets will be filtered to include only RFP equilibria. Default is False (i.e., use the full dataset).

    Notes
    -----
    - GPU memory growth is enabled (if GPUs are available) via
      ``tf.config.experimental.set_memory_growth`` to reduce OOM issues.
    - When ``--resume_study`` is used, the remaining number of trials is reduced
      by the number of already completed trials.
    - A first trial with default parameters is enqueued for new studies to
      provide a known baseline configuration.
    """
    import argparse

    parser = argparse.ArgumentParser(description="Run Optuna hyperparameter optimization")

    parser.add_argument("study_name", type=str, help="Name of the study")
    parser.add_argument("--resume_study", action="store_true", help="Resume an existing study with the same name")
    parser.add_argument("--storage", type=str, default="sqlite:///optuna.db", help="Path to the Optuna storage file")
    parser.add_argument("--n_trials", type=int, default=1000, help="Number of trials to run (default and max: 1000)")
    parser.add_argument("--rfp_only", action="store_true", help="Use dataset with only RFP equilibria (default: False)")

    args = parser.parse_args()

    storage = args.storage
    name = args.study_name    
    n_trials = min(args.n_trials, 1000)

    sampler = optuna.samplers.NSGAIISampler(
        population_size=32,
        mutation_prob=0.2,
        crossover_prob=0.8,
        swapping_prob=0.5,
        seed=cfg.SEED
    )
    pruner = optuna.pruners.NopPruner()
    directions = ["minimize", "minimize"]  # Minimize both validation loss and model size

    try:
        study = optuna.create_study(directions=directions, study_name=name, storage=storage, load_if_exists=False, sampler=sampler, pruner=pruner)
        completed_trials = 0
    except optuna.exceptions.DuplicatedStudyError:
        print(f"Study {name} already exists.")
        if args.resume_study:
            print("Resuming existing study...")
            study = optuna.create_study(directions=directions, study_name=name, storage=storage, load_if_exists=True, sampler=sampler, pruner=pruner)
            # adjust number of trials to run
            completed_trials = len(study.get_trials(deepcopy=False, states=(optuna.trial.TrialState.COMPLETE,)))
            n_trials = max(0, n_trials - completed_trials)
            print(f"Running {n_trials} more trials...")
        else:
            print("Use --resume_study to resume the existing study or change the study name.")
            print("Exiting...")
            exit(1)

    # set memory growth for GPUs
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # get datasets
    train_ds, val_ds = get_datasets(train_radial_res=100, val_radial_res=250, batch_size=100_000, rfp_only=args.rfp_only)

    # enqueue first trial to test default parameters
    if completed_trials == 0:
        print("Enqueuing first trial with default parameters...")
        study.enqueue_trial({"n_layers": 5, "units": 50})
    # run optimization
    study.optimize(lambda trial: objective(trial, train_ds, val_ds, name), n_trials=n_trials, gc_after_trial=True)

objective

objective(trial: Trial, train_ds: Dataset, val_ds: Dataset, study_name: str) -> tuple[float, int]

Optuna objective function for multi-objective neural-network architecture search.

This objective builds a fpga_profile_reco.core.models.HardNN model from hyperparameters suggested by Optuna, trains it on the provided datasets, and returns a pair of objectives:

1) best validation loss (minimize)
2) model parameter count (minimize)

The function also attaches additional metrics to the trial via optuna.trial.Trial.set_user_attr for later analysis and writes TensorBoard logs, CSV history, and a best-checkpoint model for each trial.

Parameters:

  • trial

    (Trial) –

    Current Optuna trial used to sample hyperparameters and record results.

  • train_ds

    (Dataset) –

    Training dataset.

  • val_ds

    (Dataset) –

    Validation dataset.

  • study_name

    (str) –

    Name of the Optuna study, used to namespace output directories for logs and checkpoints.

Returns:

  • objectives ( tuple[float, int] ) –

    A 2-tuple (best_val_loss, n_params) suitable for a study created with directions=["minimize", "minimize"].

Notes
  • Duplicate hyperparameter configurations are detected by comparing trial.params to the params of completed trials; duplicates are pruned by raising optuna.exceptions.TrialPruned.
  • This code compiles the model with an optimizer only; the project uses a custom training loop inside the model to handle losses/metrics.
  • Per-trial artifacts are written under (subpaths may vary by config): cfg.TENSORBOARD_LOGS_DIR / "optuna" / study_name / <trial_id>, cfg.HISTORY_DIR / "optuna" / study_name, cfg.MODELS_DIR / "optuna" / study_name.
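For intuition about the second objective, the parameter count of the searched architecture has a simple closed form: n_layers fully connected hidden layers of `units` neurons each, followed by a 6-unit linear output. The input dimension d_in below is an ASSUMPTION for illustration; the real value depends on the HardNN model's inputs:

```python
def dense_param_count(n_layers: int, units: int, d_in: int, d_out: int = 6) -> int:
    """Weights + biases for a plain dense stack (d_in is a hypothetical input size)."""
    total = (d_in + 1) * units                     # input -> first hidden
    total += (n_layers - 1) * (units + 1) * units  # hidden -> hidden
    total += (units + 1) * d_out                   # last hidden -> output
    return total

# Baseline trial (n_layers=5, units=50) with an assumed d_in of 2:
print(dense_param_count(5, 50, d_in=2))  # → 10656
```

This makes the trade-off explicit: parameter count grows roughly quadratically in `units` but only linearly in `n_layers`, which is why the Pareto front tends to favor narrower networks at a given loss level.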
Source code in src/fpga_profile_reco/core/hyperoptim.py
def objective(trial: optuna.trial.Trial, train_ds: tf.data.Dataset, val_ds: tf.data.Dataset, study_name: str) -> tuple[float, int]:
    """
    Optuna objective function for multi-objective neural-network architecture search.

    This objective builds a :class:`fpga_profile_reco.core.models.HardNN` model
    from hyperparameters suggested by Optuna, trains it on the provided datasets,
    and returns a pair of objectives:

    1) best validation loss (minimize)
    2) model parameter count (minimize)

    The function also attaches additional metrics to the trial via
    :meth:`optuna.trial.Trial.set_user_attr` for later analysis and writes
    TensorBoard logs, CSV history, and a best-checkpoint model for each trial.

    Parameters
    ----------
    trial : optuna.trial.Trial
        Current Optuna trial used to sample hyperparameters and record results.
    train_ds : tf.data.Dataset
        Training dataset.
    val_ds : tf.data.Dataset
        Validation dataset.
    study_name : str
        Name of the Optuna study, used to namespace output directories for logs
        and checkpoints.

    Returns
    -------
    objectives : tuple[float, int]
        A 2-tuple ``(best_val_loss, n_params)`` suitable for a study created with
        ``directions=["minimize", "minimize"]``.

    Notes
    -----
    - Duplicate hyperparameter configurations are detected by comparing
      ``trial.params`` to the params of completed trials; duplicates are pruned
      by raising :class:`optuna.exceptions.TrialPruned`.
    - This code compiles the model with an optimizer only; the project uses a
      custom training loop inside the model to handle losses/metrics.
    - Per-trial artifacts are written under (subpaths may vary by config):
      ``cfg.TENSORBOARD_LOGS_DIR / "optuna" / study_name / <trial_id>``,
      ``cfg.HISTORY_DIR / "optuna" / study_name``,
      ``cfg.MODELS_DIR / "optuna" / study_name``.
    """
    n_layers = trial.suggest_int("n_layers", 1, 10, step=1)
    units = trial.suggest_int("units", 10, 500, step=10)

    # check whether the suggested parameters have been tried before and prune the trial if so
    past_trials = trial.study.get_trials(deepcopy=False, states=[optuna.trial.TrialState.COMPLETE])
    for past_trial in reversed(past_trials):
        if trial.params == past_trial.params:
            print(f"Skipping duplicated trial: {trial.number} is identical to {past_trial.number} with params {trial.params}")
            raise optuna.exceptions.TrialPruned()

    architecture = {
        "units": [units for _ in range(n_layers)],
        "activation": "relu",
        "output_size": 6,
        "output_activation": "linear"
    }

    # instantiate model
    model = HardNN(architecture=architecture)

    # only compile with optimizer, loss and metrics are handled in the custom training loop
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001))

    # setup various callbacks
    callbacks = []

    callbacks.append(keras.callbacks.TerminateOnNaN())
    callbacks.append(keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                                       factor=0.5,
                                                       patience=40,
                                                       cooldown=20,
                                                       min_lr=1e-6,
                                                       min_delta=1e-7,
                                                       verbose=0))

    callbacks.append(keras.callbacks.EarlyStopping(monitor='val_loss',
                                                   patience=50,
                                                   min_delta=1e-9,
                                                   restore_best_weights=False,
                                                   start_from_epoch=350,
                                                   verbose=0))

    trial_number = f"{trial.number:04d}"    
    tb_path = cfg.TENSORBOARD_LOGS_DIR / "optuna" / study_name / trial_number
    tb_path.mkdir(parents=True, exist_ok=True)
    callbacks.append(keras.callbacks.TensorBoard(log_dir=tb_path, histogram_freq=10, update_freq='epoch'))
    csv_path = cfg.HISTORY_DIR / "optuna" / study_name
    csv_path.mkdir(parents=True, exist_ok=True)
    callbacks.append(keras.callbacks.CSVLogger(filename=csv_path / (trial_number + '.csv'), append=False))
    model_save_path = cfg.MODELS_DIR / "optuna" / study_name
    model_save_path.mkdir(parents=True, exist_ok=True)
    callbacks.append(keras.callbacks.ModelCheckpoint(filepath=model_save_path / (trial_number + '.keras'), monitor='val_loss', save_best_only=True))

    # run training
    history = model.fit(train_ds,
                        validation_data=val_ds,
                        callbacks=callbacks,
                        epochs=1000,
                        verbose=0)

    # get objective values and save other metrics as user attributes for later analysis
    best_val_loss = min(history.history['val_loss'])
    min_index = history.history['val_loss'].index(best_val_loss)

    trial.set_user_attr("best_obs_loss", history.history['val_obs_loss'][min_index])
    trial.set_user_attr("min_obs_loss", min(history.history['val_obs_loss']))

    params = model.count_params()

    return best_val_loss, params