protdata.io.read_maxquant

protdata.io.read_maxquant(file, intensity_column_prefixes=['LFQ intensity ', 'Intensity ', 'MS/MS count '], index_column='Protein IDs', filter_columns=['Only identified by site', 'Reverse', 'Potential contaminant'], sep='\t')

Load MaxQuant proteinGroups.txt into an AnnData object.

Parameters:
file Union[str, DataFrame]

Path to the MaxQuant proteinGroups.txt file or a pandas DataFrame containing the data.

intensity_column_prefixes Union[List[str], str] (default: ['LFQ intensity ', 'Intensity ', 'MS/MS count '])

Prefix or prefixes of the intensity columns to extract. The first prefix is used for the main matrix (X); any others are stored as layers if matching columns are present.

index_column str (default: 'Protein IDs')

Name of the column to use as the protein index (the index of var).

filter_columns list[str] (default: ['Only identified by site', 'Reverse', 'Potential contaminant'])

Columns used to filter out potential contaminants and other unwanted entries, such as reverse (decoy) hits and proteins identified only by a modification site.

sep str (default: '\t')

Field separator used when file is a path to be read. MaxQuant writes proteinGroups.txt as tab-separated text.
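How file, sep, index_column and filter_columns interact can be sketched with the standard library alone. This is a minimal illustration, not protdata's implementation: the helper name load_filtered_rows is hypothetical, and it assumes that an entry is dropped when any filter column contains "+", which is how MaxQuant flags such rows in proteinGroups.txt.

```python
import csv
import io


# Hypothetical helper (not part of protdata) illustrating how file, sep,
# index_column and filter_columns could interact.
# Assumption: an entry is dropped when any filter column contains "+",
# which is how MaxQuant flags such rows in proteinGroups.txt.
def load_filtered_rows(
    text: str,
    index_column: str = "Protein IDs",
    filter_columns: tuple[str, ...] = (
        "Only identified by site",
        "Reverse",
        "Potential contaminant",
    ),
    sep: str = "\t",
) -> dict[str, dict[str, str]]:
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    rows: dict[str, dict[str, str]] = {}
    for row in reader:
        # Columns missing from the header yield None, i.e. "not flagged".
        if any(row.get(col) == "+" for col in filter_columns):
            continue
        # Key the surviving rows by the protein identifier column.
        rows[row[index_column]] = row
    return rows


example = (
    "Protein IDs\tReverse\tPotential contaminant\tLFQ intensity A\n"
    "P12345\t\t\t1000\n"
    "REV__P99999\t+\t\t50\n"
    "CON__P02768\t\t+\t800\n"
)
print(list(load_filtered_rows(example)))  # ['P12345']
```

Treating a missing filter column as unflagged keeps the sketch usable on exports where, for example, 'Only identified by site' is absent.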

Return type:

AnnData

Returns:

anndata.AnnData object with:

  • X: intensity matrix (samples x proteins)

  • var: protein metadata (indexed by protein IDs)

  • obs: sample metadata (indexed by sample names)

  • layers: additional intensity matrices if multiple intensity column prefixes are provided
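The split of prefixed columns into X and layers, and the samples x proteins orientation, can be illustrated with a small stdlib sketch. It is not protdata's code; the function name split_by_prefix is hypothetical, and it assumes that sample names come from stripping the prefix off each matching column, that layer keys are the prefixes without their trailing space, and that every prefix covers the same samples.

```python
# Hypothetical sketch (not protdata's implementation) of splitting prefixed
# columns into a samples x proteins X matrix plus layers.
# Assumptions: sample name = column name minus the prefix; layer key =
# prefix without its trailing space; every prefix covers the same samples.
def split_by_prefix(
    rows: dict[str, dict[str, float]], prefixes: list[str]
) -> tuple[list[str], list[str], list[list[float]], dict[str, list[list[float]]]]:
    proteins = list(rows)  # protein IDs index var
    header = list(next(iter(rows.values())))

    def extract(prefix: str) -> tuple[list[str], list[list[float]]]:
        cols = [c for c in header if c.startswith(prefix)]
        samples = [c[len(prefix):] for c in cols]
        # One row per sample, one column per protein (samples x proteins).
        matrix = [[rows[p][c] for p in proteins] for c in cols]
        return samples, matrix

    obs_names, X = extract(prefixes[0])  # first prefix -> X
    layers: dict[str, list[list[float]]] = {}
    for prefix in prefixes[1:]:
        samples, matrix = extract(prefix)
        if not matrix:
            continue  # no matching columns, so no layer
        by_sample = dict(zip(samples, matrix))
        # Align layer rows with the sample order of X.
        layers[prefix.strip()] = [by_sample[s] for s in obs_names]
    return obs_names, proteins, X, layers


rows = {
    "P1": {"LFQ intensity A": 10.0, "LFQ intensity B": 20.0,
           "Intensity A": 11.0, "Intensity B": 21.0},
    "P2": {"LFQ intensity A": 30.0, "LFQ intensity B": 40.0,
           "Intensity A": 31.0, "Intensity B": 41.0},
}
obs, var, X, layers = split_by_prefix(
    rows, ["LFQ intensity ", "Intensity ", "MS/MS count "]
)
print(obs)     # ['A', 'B']
print(X)       # [[10.0, 30.0], [20.0, 40.0]]
print(layers)  # {'Intensity': [[11.0, 31.0], [21.0, 41.0]]}
```

Because this sketch matches with a case-sensitive str.startswith, 'Intensity ' does not pick up the 'LFQ intensity ' columns, and the absent 'MS/MS count ' prefix simply produces no layer. Whether protdata matches prefixes in exactly this way is not stated above.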

Notes

  • The first intensity column prefix is used for the main matrix (X); the others are stored as layers if matching columns are present.

  • Forward slashes (/) are not allowed in HDF5 keys, which AnnData uses when writing .h5ad files, so they are replaced with underscores (_).
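The slash replacement amounts to a plain string substitution. A minimal sketch, assuming it is applied to names such as the 'MS/MS count' layer name; the helper name hdf5_safe_key is hypothetical:

```python
def hdf5_safe_key(name: str) -> str:
    # HDF5 uses "/" as its group separator, so a literal slash cannot
    # appear inside a single key; replace it with an underscore.
    return name.replace("/", "_")


print(hdf5_safe_key("MS/MS count"))  # MS_MS count
```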