df_operations

Utility functions for working with pd.DataFrame objects, including filtering functions.

MDMC.common.df_operations.filter_dataframe(values: Sequence, dataframe: pandas.core.frame.DataFrame, column_names: list[str] = None, column_regex: str = None) → pandas.core.frame.DataFrame

Filters a pd.DataFrame with an iterable of values.

This filter ignores duplicated rows (i.e. it returns only the first occurrence of any duplicated row).

Parameters
  • values (Sequence) – The values for which to filter. If any of these values occur in any of the columns defined by column_names or column_regex, the row will be included in the filtered return.

  • dataframe (pandas.DataFrame) – The pd.DataFrame object to be filtered

  • column_names (list, optional) – A list of str specifying the names of the columns which will be used to filter the DataFrame. This cannot be passed if column_regex is also passed.

  • column_regex (str, optional) – A regular expression matching one or more column names. This specifies which columns will be used to filter the DataFrame. This cannot be passed if column_names is also passed.

Returns

A DataFrame which has been filtered so that each value in values occurs in one of the columns of dataframe that are specified by column_names or matched by column_regex.

Return type

pandas.DataFrame
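
The behaviour described above can be approximated in plain pandas. The following is a minimal sketch, not MDMC's implementation: sketch_filter and the atoms table are hypothetical names introduced for illustration. It assumes the reading given under Returns (a row is kept only if every value occurs in at least one selected column) and assumes that duplicates are judged on whole rows.

```python
import pandas as pd


def sketch_filter(values, dataframe, column_names=None, column_regex=None):
    """Hypothetical stand-in for filter_dataframe; not MDMC's implementation."""
    # Assumption: exactly one of column_names / column_regex must be given.
    if (column_names is None) == (column_regex is None):
        raise ValueError("pass exactly one of column_names or column_regex")
    if column_names is not None:
        selected = dataframe[list(column_names)]
    else:
        # DataFrame.filter(regex=...) keeps the columns whose label matches.
        selected = dataframe.filter(regex=column_regex)
    # Assumption: every value must occur in at least one selected column.
    mask = pd.Series(True, index=dataframe.index)
    for value in values:
        mask &= selected.eq(value).any(axis=1)
    # Keep only the first occurrence of any duplicated row.
    return dataframe[mask].drop_duplicates(keep="first")


# Illustrative data only; row 2 duplicates row 0.
atoms = pd.DataFrame(
    {
        "atom_1": ["H", "O", "H", "C"],
        "atom_2": ["O", "H", "O", "H"],
        "k": [1.0, 1.0, 1.0, 2.0],
    }
)
by_name = sketch_filter(["O", "H"], atoms, column_names=["atom_1", "atom_2"])
by_regex = sketch_filter(["C"], atoms, column_regex=r"^atom_")
```

Here by_name keeps rows 0 and 1 (row 2 is dropped as a duplicate of row 0, and row 3 contains no "O"), while by_regex keeps only row 3. Column order is irrelevant in this sketch: rows 0 and 1 both match even though "H" and "O" are swapped between them.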

MDMC.common.df_operations.filter_ordered_dataframe(values: Sequence, dataframe: pandas.core.frame.DataFrame, column_names: list[str] = None, column_regex: str = None, wildcard: str = None) → pandas.core.frame.DataFrame

Filters a pd.DataFrame with an iterable of ordered values. The values must occur in the columns in the correct order, where the order is given either by column_names or by the order of the columns matched by column_regex.

This filter ignores duplicated rows (i.e. it returns only the first occurrence of any duplicated row).

Parameters
  • values (Sequence) – The ordered values for which to filter. A row will be included in the filtered return only if these values occur, in order, in the columns defined by column_names or column_regex.

  • dataframe (pandas.DataFrame) – The pd.DataFrame object to be filtered

  • column_names (list, optional) – A list of str specifying the names of the columns which will be used to filter the DataFrame. This cannot be passed if column_regex is also passed.

  • column_regex (str, optional) – A regular expression matching one or more column names. This specifies which columns will be used to filter the DataFrame. This cannot be passed if column_names is also passed.

  • wildcard (str, optional) – A str which is treated as a match in any column.

Returns

A DataFrame which has been filtered so that each value in values occurs, in the correct order, in the columns of dataframe that are specified by column_names or matched by column_regex.

Return type

pandas.DataFrame
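
The ordered variant can be sketched in the same way. Again this is an illustration rather than MDMC's implementation: sketch_ordered_filter and the dihedrals table are hypothetical. It assumes that values[i] is compared with the i-th column positionally, and that a table cell equal to wildcard matches any queried value; the documentation above does not state on which side the wildcard applies.

```python
import pandas as pd


def sketch_ordered_filter(values, dataframe, column_names, wildcard=None):
    """Hypothetical stand-in for filter_ordered_dataframe; not MDMC's code."""
    # Assumption: values[i] is compared with column_names[i], position by position.
    if len(values) != len(column_names):
        raise ValueError("values and column_names must have the same length")
    mask = pd.Series(True, index=dataframe.index)
    for value, name in zip(values, column_names):
        matches = dataframe[name].eq(value)
        if wildcard is not None:
            # Assumption: a cell holding the wildcard matches any queried value.
            matches |= dataframe[name].eq(wildcard)
        mask &= matches
    # Keep only the first occurrence of any duplicated row.
    return dataframe[mask].drop_duplicates(keep="first")


# Illustrative data only; row 3 duplicates row 0.
dihedrals = pd.DataFrame(
    {
        "type_1": ["C", "*", "H", "C"],
        "type_2": ["H", "H", "C", "H"],
        "k": [1.0, 2.0, 3.0, 1.0],
    }
)
columns = ["type_1", "type_2"]
with_wildcard = sketch_ordered_filter(["C", "H"], dihedrals, columns, wildcard="*")
without_wildcard = sketch_ordered_filter(["C", "H"], dihedrals, columns)
```

With the wildcard, rows 0 and 1 are returned; without it, only row 0. Row 2 holds the same values in the opposite order and is rejected in both cases, which is the difference from the unordered filter.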