flagx.gating

This module provides classification methods for automated flow cytometry gating.

SomClassifier

MLPClassifier

class flagx.gating.MLPClassifier(*args: Any, **kwargs: Any)

Bases: BaseEstimator, ClassifierMixin

A multilayer perceptron (MLP) classifier.

This classifier wraps a fully connected neural network implemented in PyTorch while exposing a scikit-learn–style API. The model supports multi-class classification and automatic device selection (CPU or GPU), and provides the methods fit(), predict(), predict_proba(), score(), save(), and load(). Training uses CrossEntropyLoss and the Adam optimizer.

classes_

Class labels after re-indexing to integers starting from 0.

Type:

np.ndarray or None

class_counts_

Class counts from the training data.

Type:

np.ndarray or None

og_classes_

Original class labels before re-indexing.

Type:

np.ndarray or None

class_priors_

Empirical class priors.

Type:

np.ndarray or None

new_to_og_classes_dict_

Mapping from new integer labels back to original labels.

Type:

dict[int, Any] or None

data_set_

PyTorch tensor dataset constructed during fitting.

Type:

TensorDataset or None

data_loader_

PyTorch DataLoader used for minibatch training.

Type:

DataLoader or None

model_

Neural network model.

Type:

nn.Module or None

criterion_

Loss function, PyTorch CrossEntropyLoss.

Type:

nn.Module or None

optimizer_

PyTorch Adam optimizer. The learning rate is 0.001 unless overridden via optimizer_params.

Type:

Optimizer or None

training_log_

Logged losses of the training run.

Type:

dict[str, int | list[int | float]] or None

is_fitted_

Whether the classifier has been fitted.

Type:

bool
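The label-related attributes above (classes_, og_classes_, class_counts_, class_priors_, new_to_og_classes_dict_) can be illustrated with a short NumPy sketch. This is not the library's code: it assumes that re-indexing follows the sorted order of the original labels, as np.unique does, and all variable names and example labels are hypothetical.

```python
import numpy as np

# Hypothetical training labels (assumption: any sortable label type is accepted).
y = np.array(["T cell", "B cell", "T cell", "NK cell", "T cell", "B cell"])

# np.unique returns the sorted original labels, each sample's position in that
# sorted array (the re-indexed label), and the number of samples per class.
og_classes, y_reindexed, class_counts = np.unique(
    y, return_inverse=True, return_counts=True
)

classes = np.arange(len(og_classes))              # 0, 1, ..., n_classes - 1
class_priors = class_counts / class_counts.sum()  # empirical class frequencies
new_to_og = {int(i): label for i, label in zip(classes, og_classes)}
```

Under these assumptions, new_to_og plays the role of new_to_og_classes_dict_ and lets integer predictions be translated back to the original label encoding.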

Parameters:
  • layer_sizes (Tuple[int, ...]) – Sizes of the hidden layers in the fully connected neural network.

  • n_epochs (int) – Number of training epochs.

  • loss_params (dict[str, Any] or None) – Parameters passed to PyTorch's CrossEntropyLoss.

  • optimizer_params (dict[str, Any] or None) – Parameters passed to PyTorch's Adam optimizer. If None, defaults to {'lr': 0.001}.

  • data_loader_params (dict[str, Any] or None) – Parameters passed to the PyTorch DataLoader. If None, defaults to {'batch_size': 128, 'shuffle': True, 'num_workers': 1}.

  • validation_fraction (float) – Fraction of the training data used as validation set. Defaults to 0.1.

  • early_stopping (bool) – Whether to use early stopping. If early_stopping is True and validation_fraction is 0.0, the training loss is used as the early stopping criterion. Defaults to False.

  • tol (float) – Tolerance for early stopping. Training stops early when the validation (or training) loss fails to improve by at least tol for n_iter_no_change consecutive epochs.

  • n_iter_no_change (int) – Number of consecutive epochs without an improvement of at least tol after which training stops.

  • device (str or None) – Device to use for training (e.g., 'cpu', 'cuda', 'cuda:0'). If None, CUDA is used when available, otherwise falls back to CPU.

  • verbosity (int) – Verbosity level for training logs.
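The interaction between tol and n_iter_no_change can be sketched in plain Python. The helper below is hypothetical and only illustrates the stopping rule as described in the parameter list. The exact comparison the library uses (for example, strict versus non-strict inequality) is an assumption.

```python
def early_stopping_epoch(losses, tol=1e-4, n_iter_no_change=10):
    """Return the 0-based epoch at which training would stop, or None.

    Hypothetical helper: `losses` holds the per-epoch validation loss, or the
    training loss when no validation set is held out.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best_loss - tol:
            # Improved by more than `tol`: record the new best, reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= n_iter_no_change:
            return epoch
    return None


# Illustrative loss curve that plateaus around 0.70 before dropping again.
losses = [1.00, 0.80, 0.70, 0.699, 0.698, 0.697, 0.50]
stop_epoch = early_stopping_epoch(losses, tol=0.01, n_iter_no_change=3)
```

With these illustrative values, the three plateau epochs exhaust the patience and training stops before the later drop to 0.50 is seen. A larger n_iter_no_change would let training continue through the plateau.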

fit(X: numpy.ndarray, y: numpy.ndarray) → typing_extensions.Self

Fit the MLP classifier to the provided training data.

Parameters:
  • X (np.ndarray) – Feature matrix of shape (n_samples, n_features).

  • y (np.ndarray) – Target labels of shape (n_samples,).

Returns:

The fitted classifier instance.

Return type:

Self

Raises:

ValueError – If X and y have incompatible shapes.

predict(X: numpy.ndarray) → numpy.ndarray

Predict class labels for the given input samples.

Parameters:

X (np.ndarray) – Feature matrix of shape (n_samples, n_features).

Returns:

Predicted class labels using the original label encoding.

Return type:

np.ndarray

Raises:

NotFittedError – If predict() is used before calling fit().

predict_proba(X: numpy.ndarray) → numpy.ndarray

Predict class probabilities for the given samples.

Parameters:

X (np.ndarray) – Feature matrix of shape (n_samples, n_features).

Returns:

Array of shape (n_samples, n_classes) containing class probabilities.

Return type:

np.ndarray

Raises:

NotFittedError – If predict_proba() is used before calling fit().
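Since the underlying network outputs raw logits, class probabilities and labels have to be derived from them. A minimal NumPy sketch of that post-processing follows. It assumes a row-wise softmax for the probabilities and an argmax followed by a lookup in the label mapping for the predictions. The logits and the mapping are made-up illustration values, not library output.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical raw network outputs for 3 samples and 3 classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0],
                   [-3.0, 1.0, 4.0]])
# Hypothetical mapping from re-indexed integer labels to original labels.
new_to_og = {0: "B cell", 1: "NK cell", 2: "T cell"}

proba = softmax(logits)          # shape (n_samples, n_classes), rows sum to 1
pred_idx = proba.argmax(axis=1)  # re-indexed integer labels
pred = np.array([new_to_og[int(i)] for i in pred_idx])  # original label encoding
```

Note that argmax resolves ties in favor of the lowest class index, so the all-zero row is assigned to class 0 in this sketch.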

score(X: numpy.ndarray, y: numpy.ndarray, sample_weight: numpy.ndarray | None = None)

Compute the macro F1 score of the classifier on the given dataset.

Parameters:
  • X (np.ndarray) – Feature matrix of shape (n_samples, n_features).

  • y (np.ndarray) – True labels.

  • sample_weight (np.ndarray or None) – Optional sample weights.

Returns:

Macro-averaged F1 score.

Return type:

float

Raises:

NotFittedError – If score() is used before calling fit().
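For reference, macro-averaged F1 is the unweighted mean of the per-class F1 scores, so rare populations count as much as abundant ones. The sketch below computes it by hand with NumPy. It ignores sample_weight and is not necessarily how score() is implemented; for unweighted inputs, scikit-learn's f1_score with average='macro' computes the same quantity.

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (a class with no true positives scores 0)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1_scores = []
    for label in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == label) & (y_true == label))
        fp = np.sum((y_pred == label) & (y_true != label))
        fn = np.sum((y_pred != label) & (y_true == label))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN)
        f1_scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1_scores))


# Illustrative labels: class 2 is rare and is never predicted.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0]
score = macro_f1(y_true, y_pred)
```

In this example five of seven samples are classified correctly, yet the missed rare class contributes an F1 of 0 and pulls the macro average well below the accuracy.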

save(filename: str = 'mlp_classifier.pkl', filepath: str | None = None) → None

Save the fitted classifier to disk using torch.save.

Parameters:
  • filename (str) – Name of the file to save the model to.

  • filepath (str or None) – Directory where the file will be saved. Defaults to the current working directory.

Returns:

None

classmethod load(filename: str = 'mlp_classifier.pkl', filepath: str | None = None, map_location: str | torch.device = 'cpu') → typing_extensions.Self

Load a previously saved classifier from disk.

Parameters:
  • filename (str) – Name of the saved file.

  • filepath (str or None) – Directory containing the saved file. Defaults to the current working directory.

  • map_location (str or torch.device) – Device mapping for loading the model (e.g., 'cpu' or 'cuda').

Returns:

The loaded classifier instance.

Return type:

Self

Neural Network Models

class flagx.gating.FCNNModel(*args: Any, **kwargs: Any)

Bases: Module

Fully connected neural network with an arbitrary number of hidden linear layers of arbitrary size.

All layers except the output layer use ReLU activations. Softmax is intentionally omitted from the final layer because torch.nn.CrossEntropyLoss expects raw logits.
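To see why the final softmax can be left out, recall that torch.nn.CrossEntropyLoss applies a log-softmax to its input internally. The NumPy sketch below reproduces that computation directly from logits, using the default mean reduction and integer class targets. It is an illustration of the loss, not code from this module, and the example values are arbitrary.

```python
import numpy as np


def cross_entropy_from_logits(logits, targets):
    """Mean cross-entropy computed directly from raw logits via log-softmax."""
    # Subtracting the row maximum leaves the result unchanged but avoids overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for every sample.
    return float(-log_probs[np.arange(len(targets)), targets].mean())


logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
targets = np.array([0, 2])
loss = cross_entropy_from_logits(logits, targets)
```

Applying softmax inside the model as well would normalize the outputs twice and distort the loss, which is why the network returns logits.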

The default parameters follow the configuration described in:

DGCyTOF: Deep learning with graphic cluster visualization to predict cell types of single cell mass cytometry data (Cheng et al., 2022).

Their implementation can be found at:

https://github.com/lijcheng12/DGCyTOF/blob/main/Code_Study/DGCyTOF/CyTOF2/CyTOF2.ipynb (22/27/2025).

Parameters:
  • in_size (int) – Number of input features.

  • out_size (int) – Number of output classes.

  • layer_sizes (Tuple[int, ...], optional) – Sizes of the hidden layers. Defaults to (128, 64, 32).

layers

List of fully connected linear layers.

Type:

nn.ModuleList

forward(x: torch.Tensor) → torch.Tensor

Forward pass of the FCNN model. ReLU activation is applied after each layer except after the output layer.

Parameters:

x (torch.Tensor) – Input data tensor.

Returns:

Raw output logits with shape (batch_size, out_size).

Return type:

torch.Tensor
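The layer structure and forward pass described above can be mimicked without PyTorch. The following NumPy-only sketch uses randomly initialized weights and hypothetical helper names (init_layers, forward). The actual model stores torch.nn.Linear layers in an nn.ModuleList, and its initialization scheme is not reproduced here.

```python
import numpy as np


def init_layers(in_size, out_size, layer_sizes=(128, 64, 32), seed=0):
    """Random (weight, bias) pairs for in_size -> hidden layers -> out_size."""
    rng = np.random.default_rng(seed)
    sizes = [in_size, *layer_sizes, out_size]
    return [
        (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x, layers):
    """Apply ReLU after every layer except the last, which returns raw logits."""
    for weight, bias in layers[:-1]:
        x = np.maximum(x @ weight + bias, 0.0)  # linear layer followed by ReLU
    weight, bias = layers[-1]
    return x @ weight + bias                    # output layer: no activation


layers = init_layers(in_size=10, out_size=4)
x = np.random.default_rng(1).normal(size=(5, 10))  # batch of 5 samples, 10 features
logits = forward(x, layers)                        # shape (batch_size, out_size)
```

With the default layer_sizes of (128, 64, 32), this yields four linear maps: three hidden layers plus the output layer.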