Input/Output
Convert dataframe to one-hot matrix.
- param df
Input dataframe for which the rows are the features, and columns are the samples.
- type df
pd.DataFrame()
- param dtypes
Representation of the columns in the form of ['cat', 'num']. By default the dtype is determined from the pandas dataframe.
- type dtypes
list of str or 'pandas', optional
- param y_min
Minimum number of samples that must be present in a group. All groups with fewer than y_min samples are labeled as _other_ and are not used in the enriching model. The default is None.
- type y_min
int [0..len(y)], optional
- param perc_min_num
This parameter forces a variable to be treated as numeric if the fraction of unique non-zero values is above this threshold. The default is None. An alternative value is 0.8.
- type perc_min_num
float [None, 0..1], optional
- param hot_only
When True, the one-hot output contains only the categorical values that were transformed to one-hot columns. The default is True.
- type hot_only
bool [True, False], optional
- param deep_extract
True: extract information from a vector that contains a list/array/dict. False: the values are converted to strings and treated as categorical ['cat'].
- type deep_extract
bool [False, True] (default : False)
- param excl_background
Remove the values/strings that are listed here. For example, the column ['yes', 'no', 'yes', 'yes', 'no', 'unknown', …] is split into 'column_yes', 'column_no' and 'column_unknown'. If 'unknown' is listed, then 'column_unknown' is not created as a one-hot column. The default is None (every possible name is converted into a one-hot column).
- type excl_background
list or None, [0], [0, '0.0', 'unknown', 'nan', 'None' …], optional
- param verbose
Print messages to screen. The default is 3. 0: silent, 1: ERROR, 2: WARN, 3: INFO (default), 4: DEBUG, 5: TRACE
- type verbose
int, optional
- returns
dict
numeric (DataFrame) – Input-dataframe with converted numerical values
onehot (DataFrame) – Input-dataframe with converted one-hot values. Note that continuous values are only removed if hot_only=True.
labx (list of str) – Input feature-labels or names
df (DataFrame) – Input-dataframe but with set dtypes. Note that df is extended if deep_extract=True
labels (list of str) – Column names of df
dtypes (list of str) – dtypes for the feature-labels for df in the form of 'num' (numerical) and/or 'cat' (categorical).
Examples
>>> import df2onehot
>>> df = df2onehot.import_example()
>>> out = df2onehot.df2onehot(df)
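The behaviour described for excl_background and y_min can be illustrated with plain pandas. This is a minimal sketch of the documented semantics, not the library's actual implementation: the helper names onehot_column and relabel_small_groups are hypothetical, and the use of pd.get_dummies is an assumption made for illustration only.

```python
import pandas as pd


def onehot_column(values, name, excl_background=None):
    """One-hot encode one column, skipping labels listed in excl_background.

    Hypothetical helper; it mimics the documented effect of excl_background.
    """
    s = pd.Series(values, name=name).astype(str)
    # One boolean column per unique label, named '<name>_<label>'.
    hot = pd.get_dummies(s, prefix=name, dtype=bool)
    if excl_background is not None:
        skip = [f"{name}_{label}" for label in map(str, excl_background)]
        hot = hot.drop(columns=[c for c in skip if c in hot.columns])
    return hot


def relabel_small_groups(y, y_min, other="_other_"):
    """Relabel groups with fewer than y_min samples as `other`.

    Hypothetical helper; it mimics the documented effect of y_min.
    """
    y = pd.Series(y)
    # Size of the group each sample belongs to.
    counts = y.map(y.value_counts())
    return y.where(counts >= y_min, other)


values = ["yes", "no", "yes", "yes", "no", "unknown"]

# Default (excl_background=None): every label becomes a one-hot column.
full = onehot_column(values, "column")

# With 'unknown' listed, 'column_unknown' is not created.
trimmed = onehot_column(values, "column", excl_background=["unknown"])

# Groups smaller than y_min are merged into '_other_'.
relabeled = relabel_small_groups(values, y_min=2)

print(list(full.columns))
print(list(trimmed.columns))
print(relabeled.tolist())
```

In this sketch the background label is dropped after encoding, which keeps the remaining columns identical to the unfiltered result; whether the library filters before or after encoding is not stated in the description above.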