df_operations
Contains some utility functions related to pd.DataFrames, including filtering functions
- MDMC.common.df_operations.filter_dataframe(values: Sequence, dataframe: DataFrame, column_names: list[str] = None, column_regex: str = None) -> DataFrame
This filter ignores rows which are duplicated (i.e. it only returns the first occurrence of any duplicated rows).
- Parameters:
  - values (Sequence) – The values for which to filter. If any of these values occur in any of the columns defined by column_names or column_regex, the row will be included in the filtered return.
  - dataframe (pandas.DataFrame) – The pd.DataFrame object to be filtered.
  - column_names (list, optional) – A list of str specifying the names of the columns which will be used to filter the DataFrame. This cannot be passed if column_regex is also passed.
  - column_regex (str) – A regular expression matching one or more column names. This specifies which columns will be used to filter the DataFrame. This cannot be passed if column_names is also passed.
- Returns:
  A DataFrame which has been filtered so that each value in values must occur in one of the columns of the DataFrame that are specified by column_names or matched by column_regex.
- Return type:
  pandas.DataFrame
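The behaviour described above can be approximated in plain pandas. The following is a minimal sketch, not MDMC's implementation: `sketch_filter` and the `atoms` table are hypothetical names, the sketch follows the "any of these values" reading from the parameter description, and falling back to all columns when neither selector is passed is an assumption.

```python
import pandas as pd


def sketch_filter(values, dataframe, column_names=None, column_regex=None):
    """Illustrative stand-in for the documented filter (not MDMC code)."""
    # The two column selectors are documented as mutually exclusive.
    if column_names is not None and column_regex is not None:
        raise ValueError("Pass column_names or column_regex, not both")
    if column_regex is not None:
        # DataFrame.filter(regex=...) keeps columns whose name matches.
        columns = list(dataframe.filter(regex=column_regex).columns)
    elif column_names is not None:
        columns = list(column_names)
    else:
        # Assumption: with no selector, search every column.
        columns = list(dataframe.columns)
    # Keep rows where any selected column holds one of the values.
    mask = dataframe[columns].isin(list(values)).any(axis=1)
    # Duplicated rows are reduced to their first occurrence.
    return dataframe[mask].drop_duplicates()


# Hypothetical example data.
atoms = pd.DataFrame({
    "atom_1": ["H", "O", "H", "C"],
    "atom_2": ["O", "H", "O", "C"],
    "charge": [0.4, -0.8, 0.4, 0.0],
})

by_name = sketch_filter(["O"], atoms, column_names=["atom_1", "atom_2"])
by_regex = sketch_filter(["O"], atoms, column_regex=r"^atom_")
print(by_name)
```

In this data the third row repeats the first, so only the first occurrence is kept, and the row containing no "O" is dropped. Selecting the columns by name or by regular expression gives the same result here.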
- MDMC.common.df_operations.filter_ordered_dataframe(values: Sequence, dataframe: DataFrame, column_names: list[str] = None, column_regex: str = None, wildcard: str = None) -> DataFrame
Filters a pd.DataFrame with an iterable of ordered values. The values must occur in columns in the correct order, with the order specified by column_names, or by the column order that results from using column_regex.
This filter ignores rows which are duplicated (i.e. it only returns the first occurrence of any duplicated rows).
- Parameters:
  - values (Sequence) – The values for which to filter. If any of these values occur in any of the columns defined by column_names or column_regex, the row will be included in the filtered return.
  - dataframe (pandas.DataFrame) – The pd.DataFrame object to be filtered.
  - column_names (list, optional) – A list of str specifying the names of the columns which will be used to filter the DataFrame. This cannot be passed if column_regex is also passed.
  - column_regex (str) – A regular expression matching one or more column names. This specifies which columns will be used to filter the DataFrame. This cannot be passed if column_names is also passed.
  - wildcard (str) – A str which will be a match in any column.
- Returns:
  A DataFrame which has been filtered so that each value in values must occur in one of the columns of the DataFrame that are specified by column_names or matched by column_regex.
- Return type:
  pandas.DataFrame
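Ordered filtering with a wildcard can be illustrated with a similar plain-pandas sketch. This is not MDMC's implementation: `sketch_filter_ordered` and the `dihedrals` table are hypothetical, the sketch assumes one value per column, and it assumes that a cell equal to `wildcard` matches whatever value is requested at that position, which is one possible reading of the parameter description.

```python
import pandas as pd


def sketch_filter_ordered(values, dataframe, column_names, wildcard=None):
    """Illustrative stand-in for the ordered filter (not MDMC code)."""
    values = list(values)
    # Assumption: exactly one value per column, matched position by position.
    if len(values) != len(column_names):
        raise ValueError("values and column_names must have equal length")
    mask = pd.Series(True, index=dataframe.index)
    for value, column in zip(values, column_names):
        matches = dataframe[column] == value
        if wildcard is not None:
            # Assumption: a wildcard cell matches any requested value.
            matches |= dataframe[column] == wildcard
        mask &= matches
    # Duplicated rows are reduced to their first occurrence.
    return dataframe[mask].drop_duplicates()


# Hypothetical example data.
dihedrals = pd.DataFrame({
    "atom_1": ["C", "X", "H", "C"],
    "atom_2": ["H", "H", "C", "H"],
    "k": [1.0, 2.0, 3.0, 1.0],
})

strict = sketch_filter_ordered(["C", "H"], dihedrals, ["atom_1", "atom_2"])
relaxed = sketch_filter_ordered(
    ["C", "H"], dihedrals, ["atom_1", "atom_2"], wildcard="X"
)
print(relaxed)
```

Without a wildcard only the first row survives: the row holding the same values in reverse order is rejected, and the final row is a duplicate of the first. With `wildcard="X"` the second row is also kept, because its "X" entry stands in for "C".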