flagx.pipeline

The FLAG-X GatingPipeline offers higher-level access to the functionality implemented in the io, gating, and dimred modules. It orchestrates:

training data loading and processing, model training, and pipeline saving;
pipeline loading, automated gating, dimensionality reduction, and export of results to FCS for inference on new data.

class flagx.pipeline.GatingPipeline(train_data_file_path: str | None = None, train_data_file_names: List[str] | None = None, train_data_file_type: Literal['fcs', 'csv', 'lmd'] | None = None, save_path: str | None = None, channels: List[int] | List[str] | None = None, label_key: int | str | None = None, compensate: bool = False, channel_names_alignment_kwargs: Dict[str, Any] | None = None, relabel_data_kwargs: Dict[str, Any] | None = None, preprocessing_kwargs: Dict[str, Any] | None = None, downsampling_kwargs: Dict[str, Any] | None = None, gating_method: Literal['som', 'mlp'] = 'som', gating_method_kwargs: Dict[str, Any] | None = None, prediction_threshold: float | None = None, verbosity: int = 1)

Bases: object

End-to-end flow cytometry gating pipeline supporting preprocessing, downsampling, dimensionality reduction, and supervised or unsupervised gating.

This class orchestrates the full workflow:

Load raw FCS/CSV/LMD files
Align channel names
(Optional) Relabel training data
(Optional) Preprocess data sample-wise
(Optional) Downsample data
Train the gating module
    - supervised: MLP classifier
    - supervised or unsupervised: SOM classifier
Run inference on new samples
(Optional) Reduce dimensionality (UMAP, SOM, PCA, t-SNE, etc.)
Export annotated FCS files

train_data_file_path

Path to directory containing training data. Defaults to CWD.

Type: str or None

train_data_file_names

Specific training filenames to load. If None, uses all files in directory.

Type: list[str] or None

train_data_file_type

Input file type. If None, inferred from first filename.

Type: Literal['fcs', 'csv', 'lmd'] or None

save_path

Output directory for pipeline metadata and results. If None, defaults to CWD.

Type: str or None

channels

Indices or names of channels to train on.

Type: list[int] or list[str] or None

label_key

Key identifying the labels. Can be a column index in .X, a channel name (a key in var_names), or a key in .obs. If None, only unsupervised SOM training is available.

Type: int, str, or None

compensate

Whether to apply compensation or not. Defaults to False. See FlowDataManager.sample_wise_compensation().

Type: bool

channel_names_alignment_kwargs

Arguments forwarded to channel alignment. See FlowDataManager.align_channel_names().

Type: dict or None

relabel_data_kwargs

Mapping for relabeling training data. See FlowDataManager.relabel_data().

Type: dict or None

preprocessing_kwargs

Sample-wise preprocessing configuration. See FlowDataManager.sample_wise_preprocessing().

Type: dict or None

downsampling_kwargs

Sample-wise downsampling configuration. See FlowDataManager.sample_wise_downsampling().

Type: dict or None

gating_method

Which model to train.

Type: Literal['som', 'mlp']

gating_method_kwargs

Additional arguments for SOM or MLP.

Type: dict or None

prediction_threshold

Binary case: the prediction threshold. Defaults to 0.5.
Multiclass case: if the prediction certainty is below the threshold, the classifier abstains from making a prediction and the event is marked with -1. Defaults to 0.0, i.e., no abstention.

Type: float or None
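
The two threshold modes can be illustrated with a minimal sketch in plain Python. This is not FLAG-X's implementation: the helper names predict_binary and predict_multiclass are hypothetical, and the behavior at exactly the threshold (>= rather than >) is an assumption.

```python
from typing import List, Sequence


def predict_binary(p_positive: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label an event 1 when its positive-class probability reaches the threshold."""
    # Assumption: a probability exactly equal to the threshold counts as positive.
    return [1 if p >= threshold else 0 for p in p_positive]


def predict_multiclass(probs: Sequence[Sequence[float]], threshold: float = 0.0) -> List[int]:
    """Pick the most probable class, or -1 (abstain) when its probability is below the threshold."""
    labels = []
    for row in probs:
        best = max(range(len(row)), key=lambda i: row[i])  # index of the top class
        labels.append(best if row[best] >= threshold else -1)
    return labels


binary_labels = predict_binary([0.2, 0.5, 0.9])
multiclass_labels = predict_multiclass(
    [[0.4, 0.35, 0.25], [0.1, 0.8, 0.1]], threshold=0.6
)
```

With the multiclass default of 0.0, no top-class probability can fall below the threshold, which is why that setting means "no abstention".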

verbosity

Logging level.

Type: int

is_trained_

Whether the pipeline has been successfully trained.

Type: bool

gating_module_

The fitted gating model.

Type: SOMClassifier or MLPClassifier or None

binary_classes_

Whether the task is binary classification.

Type: bool or None

Parameters:

train_data_file_path (str or None) – Path to directory containing training data. Defaults to CWD.
train_data_file_names (list[str] or None) – Specific training filenames to load. If None, uses all files in directory.
train_data_file_type (Literal['fcs', 'csv', 'lmd'] or None) – Input file type. If None, inferred from first filename.
save_path (str or None) – Output directory for pipeline metadata and results. If None, defaults to CWD.
channels (list[int] or list[str] or None) – Indices or names of channels to train on.
label_key (int, str, or None) – Key to labels in .X, .obs, or .layers. If None, only unsupervised SOM training is available.
compensate (bool) – Whether to apply compensation or not. Defaults to False.
channel_names_alignment_kwargs (dict or None) – Arguments forwarded to channel alignment.
relabel_data_kwargs (dict or None) – Mapping for relabeling training data.
preprocessing_kwargs (dict or None) – Sample-wise preprocessing configuration.
downsampling_kwargs (dict or None) – Sample-wise downsampling configuration.
gating_method (Literal['som', 'mlp']) – Which model to train.
gating_method_kwargs (dict or None) – Additional arguments for SOM or MLP.
prediction_threshold (float or None) – Prediction threshold in the binary case (defaults to 0.5); abstention threshold in the multiclass case (defaults to 0.0, i.e., no abstention).
verbosity (int) – Logging level.

Returns:

None

train()

Train the full gating pipeline.

This executes the full training workflow:

Load raw data
(Optional) Align channel names
(Optional) Relabel and preprocess
(Optional) Downsample
Construct training matrix from all samples
Train SOM or MLP gating module

The gating module is stored in self.gating_module_.

Raises:

ValueError – If MLP is selected but label_key is None.
ValueError – If binary labels are not exactly {0, 1}.

inference(data_file_path: str | None = None, data_file_names: List[str] | None = None, sample_wise: bool = False, gate: bool = True, dim_red_methods: Tuple[Literal['som', 'pca', 'umap', 'tsne', 'isomap', 'locallylinearembedding', 'mds', 'spectralembedding'], ...] | None = ('umap',), dim_red_method_kwargs: Tuple[Dict[str, Any] | None, ...] | None = None, save_path: str | None = None, save_filename: str | None = None, scale_channels: List[str] | None = None, val_range: Tuple[float, float] = (0.0, 1048576), keep_unscaled: bool = False)

Apply the trained pipeline to new data for gating and/or dimensionality reduction.

This performs:

Data loading and preprocessing (same as during training)
(Optional) Prediction using the trained model
(Optional) Dimensionality reduction using one or more methods
Export to FCS file(s) with annotations added in new channels

Parameters:

data_file_path (str or None) – Directory containing inference data.
data_file_names (list[str] or None) – Specific inference filenames.
sample_wise (bool) – If True, run dimensionality reduction and export separately per sample.
gate (bool) – Whether to apply the trained gating model.
dim_red_methods (tuple[str] or None) – Dimensionality reduction methods to apply.
dim_red_method_kwargs (tuple[dict] or None) – One kwargs dict per method.
save_path (str or None) – Output directory for FCS export.
save_filename (str or None) – Base filename for exported FCS.
scale_channels (list[str] or None) – Additional channels to scale for FCS export (e.g., previously added integer labels).
val_range (tuple[float, float]) – Value range that the scaled channels are mapped to when writing FCS, so that the added annotations display properly in standard analysis software.
keep_unscaled (bool) – Whether to also retain unscaled values in separate channels.
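
To see why scaling helps: integer labels such as 0, 1, 2 occupy a vanishing fraction of an axis that spans 0 to 1048576 (2^20), so they would be hard to tell apart on a plot. The sketch below stretches values over val_range, assuming a linear min-max mapping. The transform FLAG-X actually applies is not specified on this page, and scale_to_range is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def scale_to_range(
    values: Sequence[float],
    val_range: Tuple[float, float] = (0.0, 1048576.0),
) -> List[float]:
    """Linearly map values so min(values) lands on val_range[0] and max(values) on val_range[1]."""
    lo, hi = min(values), max(values)
    out_lo, out_hi = val_range
    if hi == lo:
        # Constant channel: there is no spread to preserve, so use the lower bound.
        return [out_lo for _ in values]
    factor = (out_hi - out_lo) / (hi - lo)
    return [out_lo + (v - lo) * factor for v in values]


scaled = scale_to_range([0, 1, 2, 1])
```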

Returns:

None

Raises:

NotFittedError – If gating was requested but the model is not trained.
ValueError – If the number of dimensionality reduction kwargs dicts does not match the number of methods.
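
The length check behind the ValueError above can be sketched as follows. This illustrates the pairing rule and is not FLAG-X's code: pair_methods_with_kwargs is a hypothetical helper, treating a None entry as "use the method's defaults" is an assumption, and the n_neighbors argument is only an illustrative UMAP setting.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def pair_methods_with_kwargs(
    methods: Sequence[str],
    kwargs_per_method: Optional[Sequence[Optional[Dict[str, Any]]]] = None,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Pair each method name with its kwargs dict, one dict per method."""
    if kwargs_per_method is None:
        # No kwargs supplied at all: every method runs with an empty dict.
        kwargs_per_method = [None] * len(methods)
    if len(kwargs_per_method) != len(methods):
        raise ValueError(
            f"Got {len(kwargs_per_method)} kwargs entries for {len(methods)} methods."
        )
    # Assumption: a None entry means "no extra arguments" for that method.
    return [(m, dict(kw or {})) for m, kw in zip(methods, kwargs_per_method)]


pairs = pair_methods_with_kwargs(("umap", "pca"), ({"n_neighbors": 15}, None))
```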

save(filename: str = 'gating_pipeline.pkl', filepath: str | None = None)

Save the pipeline to a pickle file, including the gating model.

Parameters:

filename (str) – Output filename.
filepath (str or None) – Directory to save to. Defaults to pipeline save_path.

Returns:

None

classmethod load(filename: str = 'gating_pipeline.pkl', filepath: str | None = None)

Load a previously saved GatingPipeline.

Parameters:

filename (str) – Pipeline pickle filename.
filepath (str or None) – Directory path for the file. Defaults to CWD.

Returns:

Fully restored pipeline instance.

Return type:

GatingPipeline
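
Restoring a saved pipeline and applying it to new samples might look like the sketch below. It relies only on the load() and inference() signatures documented above. The directory names and the helpers inference_config and load_and_annotate are hypothetical, and the flagx import is deferred so the configuration can be inspected without the package installed.

```python
from typing import Any, Dict


def inference_config(data_dir: str, out_dir: str) -> Dict[str, Any]:
    """Collect keyword arguments for GatingPipeline.inference()."""
    return {
        "data_file_path": data_dir,  # directory containing the new samples
        "sample_wise": True,  # reduce and export each sample separately
        "gate": True,  # apply the trained gating model
        "dim_red_methods": ("umap", "pca"),
        "save_path": out_dir,
        "keep_unscaled": False,
    }


def load_and_annotate(pipeline_dir: str, data_dir: str, out_dir: str):
    """Load a pickled pipeline and export annotated FCS files (requires flagx)."""
    from flagx.pipeline import GatingPipeline

    pipeline = GatingPipeline.load(filename="gating_pipeline.pkl", filepath=pipeline_dir)
    pipeline.inference(**inference_config(data_dir, out_dir))
    return pipeline


cfg = inference_config("data/new", "results/annotated")
```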