protdata.io.read_maxquant

protdata.io.read_maxquant(file, intensity_column_prefixes=['LFQ intensity ', 'Intensity ', 'MS/MS count '], index_column='Protein IDs', filter_columns=['Only identified by site', 'Reverse', 'Potential contaminant'], sep='\\t')

Load MaxQuant proteinGroups.txt into an AnnData object.

Parameters:

file Union[str, DataFrame]: Path to the MaxQuant proteinGroups.txt file or a pandas DataFrame containing the data.
intensity_column_prefixes Union[List[str], str] (default: ['LFQ intensity ', 'Intensity ', 'MS/MS count ']): Prefix or list of prefixes identifying the intensity columns to extract. The first prefix populates the main matrix (X); any others are stored as layers if matching columns are present.
index_column str (default: 'Protein IDs'): Column name to use as protein index.
filter_columns list[str] (default: ['Only identified by site', 'Reverse', 'Potential contaminant']): Columns used to filter out unwanted entries, such as reverse (decoy) hits and potential contaminants.
sep str (default: '\\t'): Field separator, used when file is a path rather than a DataFrame.
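
The prefix and filter parameters can be pictured with a small standard-library sketch. It is not protdata's implementation: the tiny table is made up for illustration, the helper names `select_columns` and `keep_row` are hypothetical, and the rule that a `+` marks a row to drop is an assumption based on MaxQuant's usual flagging convention.

```python
import csv
import io

# Made-up, tab-separated stand-in for a proteinGroups.txt file (illustrative only).
RAW = (
    "Protein IDs\tLFQ intensity A\tLFQ intensity B\t"
    "Intensity A\tIntensity B\tReverse\tPotential contaminant\n"
    "P1\t10\t20\t11\t21\t\t\n"
    "P2\t30\t40\t31\t41\t+\t\n"
    "P3\t50\t60\t51\t61\t\t+\n"
    "P4\t70\t80\t71\t81\t\t\n"
)


def select_columns(header, prefix):
    """Return the column names that start with the given prefix (hypothetical helper)."""
    return [name for name in header if name.startswith(prefix)]


def keep_row(row, filter_columns):
    """Assumption: a '+' in any filter column marks a row to drop.

    Filter columns missing from the table are treated as unflagged.
    """
    return all(row.get(col, "") != "+" for col in filter_columns)


rows = list(csv.DictReader(io.StringIO(RAW), delimiter="\t"))
header = list(rows[0].keys())
filter_columns = ["Only identified by site", "Reverse", "Potential contaminant"]

# Columns that would feed the main matrix under the first default prefix.
lfq_columns = select_columns(header, "LFQ intensity ")

# Protein groups that survive the filter.
kept = [row["Protein IDs"] for row in rows if keep_row(row, filter_columns)]
```

Note that the trailing space in each prefix matters in this sketch: `'Intensity '` matches `Intensity A` but not a bare `Intensity` summary column.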

Return type:

AnnData

Returns:

anndata.AnnData object with:

X: intensity matrix (samples x proteins)
var: protein metadata (indexed by protein IDs)
obs: sample metadata (indexed by sample names)
layers: additional intensity matrices if multiple intensity column prefixes are provided
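
The samples x proteins orientation listed above is the transpose of the file layout, in which each row is a protein group and each intensity column is a sample. A plain-Python sketch of that reshaping follows; the values are invented, and deriving sample names by stripping the prefix from the column name is an assumption, not something the documentation states.

```python
prefix = "LFQ intensity "

# File layout: one row per protein group, one intensity column per sample.
protein_ids = ["P1", "P2", "P3"]
columns = ["LFQ intensity A", "LFQ intensity B"]
table = [
    [10.0, 20.0],  # P1
    [30.0, 40.0],  # P2
    [50.0, 60.0],  # P3
]

# Assumption: the sample name is whatever follows the prefix.
sample_names = [name[len(prefix):] for name in columns]

# Transpose so that rows are samples (obs) and columns are proteins (var).
X = [list(sample_values) for sample_values in zip(*table)]
```

Here `X` has one row per entry in `sample_names` and one column per entry in `protein_ids`, mirroring the `obs` and `var` axes of the returned object.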

Notes

The first intensity column prefix populates the main matrix (X); any remaining prefixes are stored as layers if matching columns are present.
Forward slashes (/) are not allowed in HDF5 keys, so they are replaced with underscores (_).
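
A short sketch of the two notes above, using the default prefixes. The helper `hdf5_safe_key` is hypothetical, and naming each layer after its whitespace-stripped prefix is an assumption made for illustration; the exact layer names protdata produces may differ.

```python
def hdf5_safe_key(name):
    """Replace forward slashes, which HDF5 treats as group separators."""
    return name.replace("/", "_")


prefixes = ["LFQ intensity ", "Intensity ", "MS/MS count "]

# The first prefix feeds X; the rest become candidate layers.
main_prefix, *layer_prefixes = prefixes

# Assumption: layer keys are the stripped prefixes, sanitised for HDF5.
layer_keys = [hdf5_safe_key(p.strip()) for p in layer_prefixes]
```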
