flagx.pipeline

The FLAG-X GatingPipeline offers higher-level access to the functionality implemented in the io, gating, and dimred modules. It orchestrates:

  • training: data loading and processing, model training, and pipeline saving;

  • inference on new data: pipeline loading, automated gating, dimensionality reduction, and export of results to FCS.

class flagx.pipeline.GatingPipeline(train_data_file_path: str | None = None, train_data_file_names: List[str] | None = None, train_data_file_type: Literal['fcs', 'csv', 'lmd'] | None = None, save_path: str | None = None, channels: List[int] | List[str] | None = None, label_key: int | str | None = None, compensate: bool = False, channel_names_alignment_kwargs: Dict[str, Any] | None = None, relabel_data_kwargs: Dict[str, Any] | None = None, preprocessing_kwargs: Dict[str, Any] | None = None, downsampling_kwargs: Dict[str, Any] | None = None, gating_method: Literal['som', 'mlp'] = 'som', gating_method_kwargs: Dict[str, Any] | None = None, prediction_threshold: float | None = None, verbosity: int = 1)

Bases: object

End-to-end flow cytometry gating pipeline supporting preprocessing, downsampling, dimensionality reduction, and supervised or unsupervised gating.

This class orchestrates the full workflow:

  1. Load raw FCS/CSV/LMD files

  2. Align channel names

  3. (Optional) Relabel training data

  4. (Optional) Preprocess data sample-wise

  5. (Optional) Downsample data

  6. Train the gating module

    • supervised: MLP classifier

    • supervised or unsupervised: SOM classifier

  7. Run inference on new samples

  8. (Optional) Reduce dimensionality (UMAP, SOM, PCA, t-SNE, etc.)

  9. Export annotated FCS files
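
The workflow above maps onto a small number of calls. The following is a minimal sketch, assuming the flagx package is installed; it uses only the parameter names documented on this page. The directory arguments, the channel names, and the label key are placeholders chosen for illustration, and the helper names build_pipeline_kwargs and run_workflow are hypothetical. The import is deferred into the function so the block can be loaded without flagx.

```python
from typing import Any, Dict


def build_pipeline_kwargs(train_dir: str, out_dir: str) -> Dict[str, Any]:
    """Collect constructor arguments by their documented names.

    All values are placeholders, not defaults of the library.
    """
    return {
        "train_data_file_path": train_dir,
        "train_data_file_type": "fcs",
        "save_path": out_dir,
        "channels": ["FSC-A", "SSC-A"],  # hypothetical channel names
        "label_key": "label",            # hypothetical label key
        "gating_method": "som",
        "verbosity": 1,
    }


def run_workflow(train_dir: str, new_data_dir: str, out_dir: str) -> None:
    """Train, save, reload, and apply a pipeline (steps 1-9 above)."""
    # Deferred import: only needed when the workflow is actually run.
    from flagx.pipeline import GatingPipeline

    pipeline = GatingPipeline(**build_pipeline_kwargs(train_dir, out_dir))
    pipeline.train()  # steps 1-6
    pipeline.save(filename="gating_pipeline.pkl", filepath=out_dir)

    restored = GatingPipeline.load(filename="gating_pipeline.pkl", filepath=out_dir)
    restored.inference(  # steps 7-9
        data_file_path=new_data_dir,
        gate=True,
        dim_red_methods=("umap",),
        save_path=out_dir,
    )
```

Calling run_workflow("train_data", "new_data", "results") would run the whole sequence; the three directory names are again placeholders.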

train_data_file_path

Path to directory containing training data. Defaults to CWD.

Type:

str or None

train_data_file_names

Specific training filenames to load. If None, uses all files in directory.

Type:

list[str] or None

train_data_file_type

Input file type. If None, inferred from first filename.

Type:

Literal['fcs', 'csv', 'lmd'] or None

save_path

Output directory for pipeline metadata and results. If None, defaults to CWD.

Type:

str or None

channels

Indices or names of channels to train on.

Type:

list[int] or list[str] or None

label_key

Key to the labels. Can be a column index in .X, a channel name (key in var_names), or a key in .obs. If None, only unsupervised SOM training is available.

Type:

int, str, or None

compensate

Whether to apply compensation or not. Defaults to False. See FlowDataManager.sample_wise_compensation().

Type:

bool

channel_names_alignment_kwargs

Arguments forwarded to channel alignment. See FlowDataManager.align_channel_names().

Type:

dict or None

relabel_data_kwargs

Mapping for relabeling training data. See FlowDataManager.relabel_data().

Type:

dict or None

preprocessing_kwargs

Sample-wise preprocessing configuration. See FlowDataManager.sample_wise_preprocessing().

Type:

dict or None

downsampling_kwargs

Sample-wise downsampling configuration. See FlowDataManager.sample_wise_downsampling().

Type:

dict or None

gating_method

Which model to train.

Type:

Literal['som', 'mlp']

gating_method_kwargs

Additional arguments for SOM or MLP.

Type:

dict or None

prediction_threshold

  • Binary case: Prediction threshold. Defaults to 0.5.

  • Multiclass case: If the prediction certainty is below the threshold, the classifier abstains from making a prediction and the event is marked with -1. Defaults to 0.0, i.e. no abstention.

Type:

float or None
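
The two threshold modes described above can be illustrated with a short sketch. It is not the library's implementation: it assumes that "prediction certainty" means the highest class probability and that a value equal to the threshold is accepted. The function names predict_binary and predict_multiclass are hypothetical.

```python
from typing import Sequence


def predict_binary(prob_positive: float, threshold: float = 0.5) -> int:
    """Return 1 if the positive-class probability reaches the threshold, else 0."""
    # Assumption: a probability equal to the threshold counts as positive.
    return 1 if prob_positive >= threshold else 0


def predict_multiclass(probs: Sequence[float], threshold: float = 0.0) -> int:
    """Return the most probable class index, or -1 when abstaining."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Assumption: certainty is the highest class probability.
    return best if probs[best] >= threshold else -1


binary = [predict_binary(p) for p in (0.2, 0.5, 0.9)]
multi_default = predict_multiclass([0.4, 0.35, 0.25])               # no abstention
multi_strict = predict_multiclass([0.4, 0.35, 0.25], threshold=0.6)  # abstains
```

With the default multiclass threshold of 0.0 no probability can fall below it, which is why that setting never abstains.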

verbosity

Logging level.

Type:

int

is_trained_

Whether the pipeline has been successfully trained.

Type:

bool

gating_module_

The fitted gating model.

Type:

SOMClassifier or MLPClassifier or None

binary_classes_

Whether the task is binary classification.

Type:

bool or None

Parameters:
  • train_data_file_path (str or None) – Path to directory containing training data. Defaults to CWD.

  • train_data_file_names (list[str] or None) – Specific training filenames to load. If None, uses all files in directory.

  • train_data_file_type (Literal['fcs','csv', 'lmd'] or None) – Input file type. If None, inferred from first filename.

  • save_path (str or None) – Output directory for pipeline metadata and results. If None, defaults to CWD.

  • channels (list[int] or list[str] or None) – Indices or names of channels to train on.

  • label_key (int, str, or None) – Key to labels in .X, .obs, or .layers. If None, only unsupervised SOM training is available.

  • compensate (bool) – Whether to apply compensation or not. Defaults to False.

  • channel_names_alignment_kwargs (dict or None) – Arguments forwarded to channel alignment.

  • relabel_data_kwargs (dict or None) – Mapping for relabeling training data.

  • preprocessing_kwargs (dict or None) – Sample-wise preprocessing configuration.

  • downsampling_kwargs (dict or None) – Sample-wise downsampling configuration.

  • gating_method (Literal['som','mlp']) – Which model to train.

  • gating_method_kwargs (dict or None) – Additional arguments for SOM or MLP.

  • prediction_threshold (float or None) – Decision threshold in the binary case, abstention threshold in the multiclass case. If None, defaults to 0.5 (binary) or 0.0 (multiclass, i.e. no abstention).

  • verbosity (int) – Logging level.

Returns:

None

train()

Train the full gating pipeline.

This executes the full training workflow:

  • Load raw data

  • (Optional) Align channel names

  • (Optional) Relabel and preprocess

  • (Optional) Downsample

  • Construct training matrix from all samples

  • Train SOM or MLP gating module

The gating module is stored in self.gating_module_.

Raises:
  • ValueError – If MLP is selected but label_key is None.

  • ValueError – If binary labels are not exactly {0, 1}.
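
The two error conditions can be mirrored by a small pre-flight check before calling train(). This is a sketch under assumptions, not the library's code: the function name check_training_setup is hypothetical, and it assumes a task counts as binary when exactly two distinct label values are present.

```python
from typing import Iterable, Optional, Union


def check_training_setup(
    gating_method: str,
    label_key: Optional[Union[int, str]],
    labels: Optional[Iterable[int]] = None,
) -> None:
    """Raise ValueError for the two conditions listed under Raises."""
    # The MLP is supervised, so it cannot be trained without labels.
    if gating_method == "mlp" and label_key is None:
        raise ValueError("gating_method='mlp' requires a label_key.")

    if labels is not None:
        unique = set(labels)
        # Assumption: two distinct values means a binary task.
        if len(unique) == 2 and unique != {0, 1}:
            raise ValueError(f"Binary labels must be exactly {{0, 1}}, got {sorted(unique)}.")


check_training_setup("som", None)                # unsupervised SOM: fine
check_training_setup("mlp", "label", [0, 1, 1])  # valid binary labels: fine
```

If the training labels are coded as, for example, {1, 2}, the relabel_data_kwargs parameter is the documented place to map them before training.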

inference(data_file_path: str | None = None, data_file_names: List[str] | None = None, sample_wise: bool = False, gate: bool = True, dim_red_methods: Tuple[Literal['som', 'pca', 'umap', 'tsne', 'isomap', 'locallylinearembedding', 'mds', 'spectralembedding'], ...] | None = ('umap',), dim_red_method_kwargs: Tuple[Dict[str, Any] | None, ...] | None = None, save_path: str | None = None, save_filename: str | None = None, scale_channels: List[str] | None = None, val_range: Tuple[float, float] = (0.0, 1048576), keep_unscaled: bool = False)

Apply the trained pipeline to new data for gating and/or dimensionality reduction.

This performs:

  • Data loading and preprocessing (same as during training)

  • (Optional) Prediction using the trained model

  • (Optional) Dimensionality reduction using one or more methods

  • Export to FCS file(s) with annotations added in new channels

Parameters:
  • data_file_path (str or None) – Directory containing inference data.

  • data_file_names (list[str] or None) – Specific inference filenames.

  • sample_wise (bool) – If True, run dimensionality reduction and export separately per sample.

  • gate (bool) – Whether to apply the trained gating model.

  • dim_red_methods (tuple[str] or None) – Dimensionality reduction methods to apply.

  • dim_red_method_kwargs (tuple[dict] or None) – One kwargs dict per method.

  • save_path (str or None) – Output directory for FCS export.

  • save_filename (str or None) – Base filename for exported FCS.

  • scale_channels (list[str] or None) – Additional channels to scale for FCS export (e.g., previously added integer labels).

  • val_range (tuple[float,float]) – Value range for scaling when writing FCS. (This is done for proper display of the added annotations in standard analysis software.)

  • keep_unscaled (bool) – Whether to also retain unscaled values in separate channels.
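
How the scaling into val_range is computed is not stated here. The sketch below assumes a linear min-max rescaling, which is one common way to spread small integer labels over the display range of analysis software; the function name scale_to_range is hypothetical.

```python
from typing import List, Sequence, Tuple


def scale_to_range(
    values: Sequence[float],
    val_range: Tuple[float, float] = (0.0, 1048576.0),
) -> List[float]:
    """Linearly map values so that min -> val_range[0] and max -> val_range[1]."""
    lo, hi = val_range
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Degenerate case: all values identical, map everything to the lower bound.
        return [lo for _ in values]
    span = v_max - v_min
    return [lo + (v - v_min) / span * (hi - lo) for v in values]


labels = [0, 1, 2, 1]            # e.g. integer gate labels added as a channel
scaled = scale_to_range(labels)  # spread over the default val_range
```

Under this assumption the labels 0, 1, 2 land at the bottom, middle, and top of the range, and keep_unscaled=True would correspond to keeping the original labels list alongside scaled.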

Returns:

None

Raises:
  • NotFittedError – If gating was requested but the model is not trained.

  • ValueError – If the number of dimensionality reduction kwargs dicts does not match the number of methods.
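
The pairing of dim_red_methods with dim_red_method_kwargs can be checked before calling inference(). The following sketch assumes that the two tuples are matched by position and that a None entry means "use the method's defaults"; the function name pair_methods_with_kwargs is hypothetical, and the n_neighbors key is only an example of an argument that might be forwarded to UMAP.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def pair_methods_with_kwargs(
    methods: Sequence[str],
    kwargs: Optional[Sequence[Optional[Dict[str, Any]]]] = None,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Match each method with its kwargs dict, or raise if the lengths differ."""
    if kwargs is None:
        kwargs = [None] * len(methods)
    if len(kwargs) != len(methods):
        raise ValueError(
            f"Got {len(kwargs)} kwargs entries for {len(methods)} methods."
        )
    # Assumption: None stands for an empty kwargs dict.
    return [(m, dict(k) if k is not None else {}) for m, k in zip(methods, kwargs)]


pairs = pair_methods_with_kwargs(
    ("umap", "pca"),
    ({"n_neighbors": 15}, None),  # illustrative values only
)
```

The same two tuples can then be passed as dim_red_methods and dim_red_method_kwargs.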

save(filename: str = 'gating_pipeline.pkl', filepath: str | None = None)

Save the pipeline to a pickle file, including the gating model.

Parameters:
  • filename (str) – Output filename.

  • filepath (str or None) – Directory to save to. Defaults to pipeline save_path.

Returns:

None

classmethod load(filename: str = 'gating_pipeline.pkl', filepath: str | None = None)

Load a previously saved GatingPipeline.

Parameters:
  • filename (str) – Pipeline pickle filename.

  • filepath (str or None) – Directory path for the file. Defaults to CWD.

Returns:

Fully restored pipeline instance.

Return type:

GatingPipeline
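
The save/load pair follows the usual pickle round-trip pattern. The sketch below uses a hypothetical stand-in class, TinyPipeline, rather than GatingPipeline itself, and reproduces only the documented path defaults: save() falls back to the pipeline's save_path, load() falls back to the current working directory. Pickling the state dictionary instead of the instance is a simplification made for this sketch.

```python
import os
import pickle
import tempfile
from typing import Optional


class TinyPipeline:
    """Hypothetical stand-in that mimics only the save/load behaviour."""

    def __init__(self, save_path: Optional[str] = None, gating_method: str = "som"):
        self.save_path = save_path if save_path is not None else os.getcwd()
        self.gating_method = gating_method
        self.is_trained_ = False

    def save(self, filename: str = "gating_pipeline.pkl", filepath: Optional[str] = None) -> None:
        # Falls back to the pipeline's own save_path, as documented for save().
        directory = filepath if filepath is not None else self.save_path
        with open(os.path.join(directory, filename), "wb") as fh:
            pickle.dump(self.__dict__, fh)

    @classmethod
    def load(cls, filename: str = "gating_pipeline.pkl", filepath: Optional[str] = None) -> "TinyPipeline":
        # Falls back to the current working directory, as documented for load().
        directory = filepath if filepath is not None else os.getcwd()
        with open(os.path.join(directory, filename), "rb") as fh:
            state = pickle.load(fh)
        obj = cls.__new__(cls)
        obj.__dict__.update(state)
        return obj


with tempfile.TemporaryDirectory() as tmp:
    original = TinyPipeline(save_path=tmp, gating_method="mlp")
    original.is_trained_ = True
    original.save()                             # written to save_path
    restored = TinyPipeline.load(filepath=tmp)  # read back explicitly
```

Because the defaults differ between the two methods, passing filepath explicitly to load() avoids looking in the wrong directory. As with any pickle file, only load pipelines from sources you trust.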