flip.nvflare.controllers

FLIP Controllers module containing NVFLARE workflow controllers.

Controllers orchestrate federated learning workflows.

Exports:

InitTraining: Initialization controller for training setup
ScatterAndGather: Main training loop controller with FedAvg aggregation
ScatterAndGatherLDM: Dual-phase training controller for LDM (autoencoder + diffusion model)
CrossSiteModelEval: Cross-site model evaluation controller
InitEvaluation: Initialization controller for evaluation setup
ModelEval: Main evaluation loop controller
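
As a rough illustration of how these controllers might be wired into a job, the sketch below builds a server-side workflow list as a Python dict and serialises it to JSON. The `id`/`path`/`args` layout follows the usual NVFLARE server config convention, but the workflow IDs, the ordering and the argument values are assumptions made for this example, not taken from the FLIP source.

```python
import json
import uuid

# Hypothetical model identifier; InitTraining documents that model_id must be a valid UUID.
model_id = str(uuid.uuid4())

# Assumed "workflows" section of a server config.
# The workflow IDs and the init-then-train ordering are illustrative only.
server_config = {
    "workflows": [
        {
            "id": "init_training",
            "path": "flip.nvflare.controllers.InitTraining",
            "args": {"model_id": model_id, "min_clients": 1, "cleanup_timeout": 600},
        },
        {
            "id": "scatter_and_gather",
            "path": "flip.nvflare.controllers.ScatterAndGather",
            "args": {
                "model_id": model_id,
                "min_clients": 1,
                "num_rounds": 5,
                "aggregator_id": "aggregator",
                "persistor_id": "persistor",
                "shareable_generator_id": "shareable_generator",
            },
        },
    ]
}

config_text = json.dumps(server_config, indent=2)
print(config_text)
```

The argument names are the constructor parameters documented in this module; the component IDs are the documented defaults.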

Classes

`CrossSiteModelEval`	Cross Site Model Validation workflow.
`ModelEval`	Model Evaluation workflow.
`InitEvaluation`	The controller that is executed before evaluation and is part of the FLIP evaluation workflow
`InitTraining`	The controller that is executed pre-training and is a part of the FLIP training model
`ScatterAndGather`	The controller for FederatedAveraging Workflow.
`ScatterAndGatherLDM`	The controller for the dual-phase (autoencoder + diffusion model) FederatedAveraging Workflow.

Package Contents

class flip.nvflare.controllers.CrossSiteModelEval(task_check_period=0.5, cross_val_dir=AppConstants.CROSS_VAL_DIR, submit_model_timeout=600, validation_timeout: int = 6000, model_locator_id='', formatter_id='', submit_model_task_name=AppConstants.TASK_SUBMIT_MODEL, validation_task_name=AppConstants.TASK_VALIDATION, cleanup_models=False, participating_clients=None, wait_for_clients_timeout=300, cleanup_timeout=600, fatal_error_delay=5, model_id='')

Bases: nvflare.apis.impl.controller.Controller

Cross Site Model Validation workflow.

Parameters:

task_check_period (float, optional) – How often to check for new tasks or tasks being finished. Defaults to 0.5.
cross_val_dir (str, optional) – Path to cross site validation directory relative to run directory. Defaults to “cross_site_val”.
submit_model_timeout (int, optional) – Timeout of submit_model_task. Defaults to 600 secs.
validation_timeout (int, optional) – Timeout for validate_model task. Defaults to 6000 secs.
model_locator_id (str, optional) – ID for model_locator component. Defaults to “”.
formatter_id (str, optional) – ID for formatter component. Defaults to “”.
submit_model_task_name (str, optional) – Name of submit_model task. Defaults to AppConstants.TASK_SUBMIT_MODEL.
validation_task_name (str, optional) – Name of validate_model task. Defaults to “validate”.
cleanup_models (bool, optional) – Whether models should be deleted after run. Defaults to False.
participating_clients (list, optional) – List of participating client names. If not provided, defaults to all clients connected at start of controller.
wait_for_clients_timeout (int, optional) – Timeout for clients to appear. Defaults to 300 secs.
cleanup_timeout (int, optional) – Timeout for cleanup. Defaults to 600 secs.
fatal_error_delay (int, optional) – Time in seconds to delay before calling ‘system_panic’ if a task returns an error result and ignore_result_error is set to False. Defaults to 5.
model_id (str, required) – ID of the model that the training is being performed under.
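
To make the cross-site idea concrete, the sketch below enumerates which model would be validated on which client. It assumes that every submitted model is validated on every participating client, including the client that produced it; the helper name `cross_site_pairs` and the client names are hypothetical, and server-side models (located through `model_locator_id`) are left out.

```python
from itertools import product


def cross_site_pairs(participating_clients):
    """Return (model_owner, validator) pairs.

    Assumption: each client's submitted model is validated on every
    participating client, including its owner.
    """
    return list(product(participating_clients, repeat=2))


clients = ["site-1", "site-2", "site-3"]  # hypothetical client names
pairs = cross_site_pairs(clients)

for owner, validator in pairs:
    print(f"model from {owner} validated on {validator}")
```

With three clients this yields nine validation tasks, which is why `validation_timeout` defaults to a much larger value than `submit_model_timeout`.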

flip

start_controller(fl_ctx: nvflare.apis.fl_context.FLContext)

control_flow(abort_signal: nvflare.apis.signal.Signal, fl_ctx: nvflare.apis.fl_context.FLContext)

stop_controller(fl_ctx: nvflare.apis.fl_context.FLContext)

handle_event(event_type: str, fl_ctx: nvflare.apis.fl_context.FLContext)

process_result_of_unknown_task(client: nvflare.apis.client.Client, task_name: str, client_task_id: str, result: nvflare.apis.shareable.Shareable, fl_ctx: nvflare.apis.fl_context.FLContext)

class flip.nvflare.controllers.ModelEval(task_check_period=0.5, submit_model_timeout=600, validation_timeout: int = 6000, model_locator_id='', formatter_id='', submit_model_task_name=AppConstants.TASK_SUBMIT_MODEL, evaluation_task_name=PTConstants.EvalTaskName, cleanup_models=False, participating_clients=None, wait_for_clients_timeout=300, cleanup_timeout=600, fatal_error_delay=5, model_id='')

Bases: nvflare.apis.impl.controller.Controller

Model Evaluation workflow.

Parameters:

task_check_period (float, optional) – How often to check for new tasks or tasks being finished. Defaults to 0.5.
submit_model_timeout (int, optional) – Timeout of submit_model_task. Defaults to 600 secs.
validation_timeout (int, optional) – Timeout for validate_model task. Defaults to 6000 secs.
model_locator_id (str, optional) – ID for model_locator component. Defaults to “”.
formatter_id (str, optional) – ID for formatter component. Defaults to “”.
submit_model_task_name (str, optional) – Name of submit_model task. Defaults to AppConstants.TASK_SUBMIT_MODEL.
evaluation_task_name (str, optional) – Name of the evaluation task. Defaults to PTConstants.EvalTaskName.
cleanup_models (bool, optional) – Whether models should be deleted after run. Defaults to False.
participating_clients (list, optional) – List of participating client names. If not provided, defaults to all clients connected at start of controller.
wait_for_clients_timeout (int, optional) – Timeout for clients to appear. Defaults to 300 secs.
cleanup_timeout (int, optional) – Timeout for cleanup. Defaults to 600 secs.
fatal_error_delay (int, optional) – Time in seconds to delay before calling ‘system_panic’ if a task returns an error result and ignore_result_error is set to False. Defaults to 5.
model_id (str, required) – ID of the model that the training is being performed under.

flip

start_controller(fl_ctx: nvflare.apis.fl_context.FLContext)

control_flow(abort_signal: nvflare.apis.signal.Signal, fl_ctx: nvflare.apis.fl_context.FLContext)

stop_controller(fl_ctx: nvflare.apis.fl_context.FLContext)

handle_event(event_type: str, fl_ctx: nvflare.apis.fl_context.FLContext)

process_result_of_unknown_task(client: nvflare.apis.client.Client, task_name: str, client_task_id: str, result: nvflare.apis.shareable.Shareable, fl_ctx: nvflare.apis.fl_context.FLContext)

class flip.nvflare.controllers.InitEvaluation(model_id: str, min_clients: int = FlipConstants.MIN_CLIENTS, flip: flip.FLIP = FLIP(), cleanup_timeout: int = 600)

Bases: nvflare.apis.impl.controller.Controller

The controller that is executed before evaluation and is part of the FLIP evaluation workflow.

The InitEvaluation workflow sends a request to the Central Hub, stating that training has initiated, and then executes the client cleanup task.

Parameters:

model_id (str) – ID of the model that the training is being performed under.
min_clients (int, optional) – Minimum number of clients required for the aggregation to take place with successful results. Defaults to FlipConstants.MIN_CLIENTS (1).
cleanup_timeout (int, optional) – Timeout for image cleanup. Defaults to 600 seconds (10 minutes).

Raises:

ValueError –

when the model ID is not a valid UUID.
when the minimum number of clients specified is less than 1.
when cleanup_timeout is less than 0.

flip

start_controller(fl_ctx: nvflare.apis.fl_context.FLContext)

control_flow(abort_signal: nvflare.apis.signal.Signal, fl_ctx: nvflare.apis.fl_context.FLContext)

stop_controller(fl_ctx: nvflare.apis.fl_context.FLContext) → None

process_result_of_unknown_task(client: nvflare.apis.client.Client, task_name, client_task_id, result: nvflare.apis.shareable.Shareable, fl_ctx: nvflare.apis.fl_context.FLContext) → None

class flip.nvflare.controllers.InitTraining(model_id: str, min_clients: int = FlipConstants.MIN_CLIENTS, flip: flip.FLIP = FLIP(), cleanup_timeout: int = 600)

Bases: nvflare.apis.impl.controller.Controller

The controller that is executed before training and is part of the FLIP training model.

The InitTraining workflow sends a request to the Central Hub, stating that training has initiated, and then executes the client cleanup task.

Parameters:

model_id (str) – ID of the model that the training is being performed under.
min_clients (int, optional) – Minimum number of clients required for the aggregation to take place with successful results. Defaults to FlipConstants.MIN_CLIENTS (1).
cleanup_timeout (int, optional) – Timeout for image cleanup. Defaults to 600 seconds (10 minutes).

Raises:

ValueError –

when the model ID is not a valid UUID.
when the minimum number of clients specified is less than 1.
when cleanup_timeout is less than 0.
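
The three error conditions listed above can be sketched as a standalone check. This is a minimal illustration, not the controller's actual constructor code: the function name `validate_init_args` is hypothetical, and the sketch assumes that "valid UUID" means anything the standard library's `uuid.UUID` accepts.

```python
import uuid


def validate_init_args(model_id: str, min_clients: int = 1, cleanup_timeout: int = 600) -> None:
    """Raise ValueError for the conditions documented for InitTraining.

    Hypothetical helper; mirrors the documented checks only.
    """
    try:
        uuid.UUID(str(model_id))
    except ValueError as exc:
        raise ValueError(f"model_id {model_id!r} is not a valid UUID") from exc

    if min_clients < 1:
        raise ValueError(f"min_clients must be at least 1, got {min_clients}")

    if cleanup_timeout < 0:
        raise ValueError(f"cleanup_timeout must not be negative, got {cleanup_timeout}")


# A well-formed call passes silently.
validate_init_args(str(uuid.uuid4()), min_clients=2, cleanup_timeout=600)

# An empty model ID is rejected.
try:
    validate_init_args("")
except ValueError as err:
    print(f"rejected: {err}")
```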

flip

start_controller(fl_ctx: nvflare.apis.fl_context.FLContext)

control_flow(abort_signal: nvflare.apis.signal.Signal, fl_ctx: nvflare.apis.fl_context.FLContext)

stop_controller(fl_ctx: nvflare.apis.fl_context.FLContext) → None

process_result_of_unknown_task(client: nvflare.apis.client.Client, task_name, client_task_id, result: nvflare.apis.shareable.Shareable, fl_ctx: nvflare.apis.fl_context.FLContext) → None

class flip.nvflare.controllers.ScatterAndGather(model_id: str = '', min_clients: int = 1, num_rounds: int = 5, start_round: int = 0, wait_time_after_min_received: int = 10, aggregator_id=AppConstants.DEFAULT_AGGREGATOR_ID, persistor_id=AppConstants.DEFAULT_PERSISTOR_ID, shareable_generator_id=AppConstants.DEFAULT_SHAREABLE_GENERATOR_ID, train_task_name=AppConstants.TASK_TRAIN, train_timeout: int = 0, ignore_result_error: bool = False, fatal_error_delay: int = 5, task_check_period: float = 0.5, persist_every_n_rounds: int = 1)

Bases: nvflare.apis.impl.controller.Controller

The controller for FederatedAveraging Workflow.

The ScatterAndGather workflow defines federated training on all clients. The model persistor (persistor_id) is used to load the initial global model, which is sent to all clients. Each client sends its updated weights after local training, and these are aggregated by the aggregator (aggregator_id). The shareable generator (shareable_generator_id) is used to convert the aggregated weights to a shareable and a shareable back to weights. The model persistor also saves the model after training.
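
The aggregation step can be illustrated with a plain-Python federated average. This is only a sketch of the FedAvg idea: the real aggregation is delegated to whichever component `aggregator_id` refers to, and the function name, the dict-of-lists weight format and the weighting by sample count are assumptions made for the example.

```python
def federated_average(contributions):
    """Average client weights, weighting each client by its sample count.

    contributions: list of (weights, num_samples) tuples, where weights
    maps a parameter name to a list of floats. Hypothetical format.
    """
    total_samples = sum(num_samples for _, num_samples in contributions)
    if total_samples <= 0:
        raise ValueError("at least one contribution with samples is required")

    first_weights = contributions[0][0]
    averaged = {}
    for name, values in first_weights.items():
        averaged[name] = [
            sum(weights[name][i] * num_samples for weights, num_samples in contributions)
            / total_samples
            for i in range(len(values))
        ]
    return averaged


# Two hypothetical clients; the second holds three times as much data.
client_updates = [
    ({"layer": [1.0, 2.0]}, 1),
    ({"layer": [3.0, 4.0]}, 3),
]
global_weights = federated_average(client_updates)
print(global_weights)  # {'layer': [2.5, 3.5]}
```

In the workflow described above, the result of this step would then be turned back into a shareable and sent to the clients for the next round.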

Parameters:

model_id (str, required) – ID of the model that the training is being performed under.
min_clients (int, optional) – Min number of clients in training. Defaults to 1.
num_rounds (int, optional) – The total number of training rounds. Defaults to 5.
start_round (int, optional) – Start round for training. Defaults to 0.
wait_time_after_min_received (int, optional) – Time to wait before beginning aggregation after the minimum number of contributions has been received. Defaults to 10.
aggregator_id (str, optional) – ID of the aggregator component. Defaults to “aggregator”.
persistor_id (str, optional) – ID of the persistor component. Defaults to “persistor”.
shareable_generator_id (str, optional) – ID of the shareable generator. Defaults to “shareable_generator”.
train_task_name (str, optional) – Name of the train task. Defaults to “train”.
train_timeout (int, optional) – Time to wait for clients to do local training. Defaults to 0.
ignore_result_error (bool, optional) – Whether this controller can proceed if a result has errors. Defaults to False.
fatal_error_delay (int, optional) – Time in seconds to delay before calling ‘system_panic’ if a task returns an error result and ignore_result_error is set to False. Defaults to 5.
task_check_period (float, optional) – Interval for checking the status of tasks. Defaults to 0.5.
persist_every_n_rounds (int, optional) – Persist the global model every n rounds. Defaults to 1. If n is 0, the model is not persisted.

Raises:

TypeError – when any of the input arguments does not have the correct type
ValueError – when any of the input arguments is out of range or in an incorrect format
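
The effect of `persist_every_n_rounds` can be sketched as follows. The sketch assumes rounds are numbered from `start_round` to `start_round + num_rounds - 1` and that the model is persisted after round `r` whenever `(r + 1)` is a multiple of `n`; both are assumptions about the controller's internals, and `rounds_to_persist` is a hypothetical helper.

```python
def rounds_to_persist(num_rounds: int, persist_every_n_rounds: int, start_round: int = 0):
    """Return the round indices after which the global model would be saved.

    Assumption: persistence is triggered when (round + 1) % n == 0,
    and n == 0 disables periodic persistence.
    """
    if persist_every_n_rounds == 0:
        return []
    return [
        current_round
        for current_round in range(start_round, start_round + num_rounds)
        if (current_round + 1) % persist_every_n_rounds == 0
    ]


print(rounds_to_persist(num_rounds=5, persist_every_n_rounds=2))  # [1, 3]
print(rounds_to_persist(num_rounds=5, persist_every_n_rounds=0))  # []
```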

flip

model_id = ''

aggregator_id

persistor_id

shareable_generator_id

train_task_name

aggregator = None

persistor = None

shareable_gen = None

start_controller(fl_ctx: nvflare.apis.fl_context.FLContext) → None

control_flow(abort_signal: nvflare.apis.signal.Signal, fl_ctx: nvflare.apis.fl_context.FLContext) → None

stop_controller(fl_ctx: nvflare.apis.fl_context.FLContext) → None

handle_event(event_type: str, fl_ctx: nvflare.apis.fl_context.FLContext)

process_result_of_unknown_task(client: nvflare.apis.client.Client, task_name, client_task_id, result: nvflare.apis.shareable.Shareable, fl_ctx: nvflare.apis.fl_context.FLContext) → None

get_persist_state(fl_ctx: nvflare.apis.fl_context.FLContext) → dict

restore(state_data: dict, fl_ctx: nvflare.apis.fl_context.FLContext)

class flip.nvflare.controllers.ScatterAndGatherLDM(model_id: str = '', min_clients: int = 1, num_rounds_ae: int = 5, num_rounds_dm: int = 5, start_round: int = 0, model_locator_id='', wait_time_after_min_received: int = 10, aggregator_id=AppConstants.DEFAULT_AGGREGATOR_ID, persistor_id=AppConstants.DEFAULT_PERSISTOR_ID, shareable_generator_id=AppConstants.DEFAULT_SHAREABLE_GENERATOR_ID, train_task_name=AppConstants.TASK_TRAIN, train_timeout: int = 0, ignore_result_error: bool = True, fatal_error_delay: int = 5, task_check_period: float = 0.5, persist_every_n_rounds: int = 1)

Bases: nvflare.apis.impl.controller.Controller

The controller for the dual-phase (autoencoder + diffusion model) FederatedAveraging Workflow.

The ScatterAndGatherLDM workflow defines federated training on all clients. The model persistor (persistor_id) is used to load the initial global model, which is sent to all clients. Each client sends its updated weights after local training, and these are aggregated by the aggregator (aggregator_id). The shareable generator (shareable_generator_id) is used to convert the aggregated weights to a shareable and a shareable back to weights. The model persistor also saves the model after training.

Parameters:

model_id (str, required) – ID of the model that the training is being performed under.
min_clients (int, optional) – Min number of clients in training. Defaults to 1.
num_rounds_ae (int, optional) – The total number of training rounds for autoencoder. Defaults to 5.
num_rounds_dm (int, optional) – The total number of training rounds for diffusion model. Defaults to 5.
start_round (int, optional) – Start round for training. Defaults to 0.
model_locator_id (str, optional) – ID of the model locator component. Defaults to “”.
wait_time_after_min_received (int, optional) – Time to wait before beginning aggregation after the minimum number of contributions has been received. Defaults to 10.
aggregator_id (str, optional) – ID of the aggregator component. Defaults to “aggregator”.
persistor_id (str, optional) – ID of the persistor component. Defaults to “persistor”.
shareable_generator_id (str, optional) – ID of the shareable generator. Defaults to “shareable_generator”.
train_task_name (str, optional) – Name of the train task. Defaults to “train”.
train_timeout (int, optional) – Time to wait for clients to do local training. Defaults to 0.
ignore_result_error (bool, optional) – Whether this controller can proceed if a result has errors. Defaults to True.
fatal_error_delay (int, optional) – Time in seconds to delay before calling ‘system_panic’ if a task returns an error result and ignore_result_error is set to False. Defaults to 5.
task_check_period (float, optional) – Interval for checking the status of tasks. Defaults to 0.5.
persist_every_n_rounds (int, optional) – Persist the global model every n rounds. Defaults to 1. If n is 0, the model is not persisted.

Raises:

TypeError – when any of the input arguments does not have the correct type
ValueError – when any of the input arguments is out of range or in an incorrect format
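
One way to picture how `num_rounds_ae` and `num_rounds_dm` combine is as a single round schedule. The sketch below assumes that all autoencoder rounds run before any diffusion model round and that rounds are numbered continuously from `start_round`; neither is confirmed by this page, and the helper name and phase labels are hypothetical.

```python
def ldm_round_schedule(num_rounds_ae: int = 5, num_rounds_dm: int = 5, start_round: int = 0):
    """Return a list of (round_index, phase) tuples for a two-phase run.

    Assumption: the autoencoder phase completes before the diffusion
    model phase starts.
    """
    schedule = []
    for offset in range(num_rounds_ae):
        schedule.append((start_round + offset, "autoencoder"))
    for offset in range(num_rounds_dm):
        schedule.append((start_round + num_rounds_ae + offset, "diffusion_model"))
    return schedule


schedule = ldm_round_schedule(num_rounds_ae=2, num_rounds_dm=3)
for round_index, phase in schedule:
    print(round_index, phase)
```

Under these assumptions the total number of training rounds is `num_rounds_ae + num_rounds_dm`, so the defaults give ten rounds.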

flip

model_id = ''

aggregator_id

persistor_id

model_locator_id = ''

shareable_generator_id

train_task_name

aggregator = None

persistor = None

shareable_gen = None

ignore_result_error = True

start_controller(fl_ctx: nvflare.apis.fl_context.FLContext) → None

locate_server_models(fl_ctx: nvflare.apis.fl_context.FLContext) → bool

Locate server models for the current task.

Parameters:

fl_ctx (FLContext)

Returns:

bool

control_flow(abort_signal: nvflare.apis.signal.Signal, fl_ctx: nvflare.apis.fl_context.FLContext) → None

stop_controller(fl_ctx: nvflare.apis.fl_context.FLContext) → None

handle_event(event_type: str, fl_ctx: nvflare.apis.fl_context.FLContext)

process_result_of_unknown_task(client: nvflare.apis.client.Client, task_name, client_task_id, result: nvflare.apis.shareable.Shareable, fl_ctx: nvflare.apis.fl_context.FLContext) → None

get_persist_state(fl_ctx: nvflare.apis.fl_context.FLContext) → dict

restore(state_data: dict, fl_ctx: nvflare.apis.fl_context.FLContext)