lisa.datautils.df_deduplicate

lisa.datautils.df_deduplicate(df, keep, consecutives, cols=None, all_col=True)

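Same as series_deduplicate(), but removes duplicated rows of a pandas.DataFrame instead of duplicated values of a pandas.Series. The keep and consecutives parameters have the same meaning as in series_deduplicate().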

Parameters:

cols (list(str) or None) – Only consider these columns when looking for duplicates. If None (the default), all columns are considered.
all_col (bool) – If True (the default), remove a row only when all the considered columns hold duplicated values. Otherwise, remove the row as soon as any considered column holds a duplicated value.
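As a rough illustration only, the sketch below re-creates the documented behaviour with plain pandas. The function name df_deduplicate_sketch is hypothetical and this is not LISA's implementation. Several details are assumptions: keep is taken to accept 'first' or 'last' (as in pandas), consecutive duplicates are detected by comparing each row with its neighbour, and the non-consecutive all_col=True case is treated as whole-row duplication in the style of pandas.DataFrame.duplicated().

```python
import pandas as pd


def df_deduplicate_sketch(df, keep, consecutives, cols=None, all_col=True):
    """Hypothetical plain-pandas approximation of df_deduplicate().

    Assumption: keep is 'first' or 'last'. Not LISA's actual code.
    """
    if keep not in ('first', 'last'):
        raise ValueError(f'keep must be "first" or "last", not {keep!r}')

    # Restrict the duplicate check to the requested columns (all by default).
    data = df if cols is None else df[list(cols)]

    if consecutives:
        # Compare each row with the previous one when keeping the first
        # occurrence, or with the next one when keeping the last occurrence.
        # The shifted-in NaN never compares equal, so the edge row is kept.
        shift = 1 if keep == 'first' else -1
        dup = data == data.shift(shift)
        # all_col=True: every column must match; False: one match is enough.
        is_dup = dup.all(axis=1) if all_col else dup.any(axis=1)
    elif all_col:
        # Assumed interpretation: whole-row duplicates over the selected
        # columns, anywhere in the frame.
        is_dup = data.duplicated(keep=keep)
    else:
        # Drop a row as soon as any of its columns holds a value that is
        # duplicated elsewhere in that column.
        is_dup = data.apply(lambda col: col.duplicated(keep=keep)).any(axis=1)

    return df[~is_dup]


df = pd.DataFrame({'a': [1, 1, 2, 1], 'b': [5, 5, 5, 6]})
print(df_deduplicate_sketch(df, keep='first', consecutives=True))  # rows 0, 2, 3
```

With consecutives=True and all_col=False, row 2 would also be dropped in this sketch, because its b value repeats the value of the row just before it.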