API Reference: qpx.Tabular
This document serves as the comprehensive API reference for the qpx.Tabular wrapper class. Below you will find the exact method signatures, parameters, and return types for every function available.
1. Class Initialization
__init__(self, df)
Initializes the Tabular wrapper around a pandas DataFrame.
- df (pd.DataFrame): The raw dataset to process.
2. Analytical Profiling (summary)
shape()
- Returns: (tuple) A tuple containing
(num_rows, num_columns).
num_rows() / num_columns()
- Returns: (int) The integer count of rows or columns.
column_names()
- Returns: (list) A list of all column headers in the DataFrame.
column_types()
- Returns: (dict) A mapping of
{column_name: string_dtype}.
numerical_columns() / bool_columns() / categorical_columns() / datetime_columns()
- Returns: (list) A list of column names that match the specified data type.
memory_usage()
- Returns: (str) A human-readable string representing the memory footprint (e.g.,
"15.2 MB").
dataset_info()
- Returns: (dict) A lightweight dictionary containing core metrics (rows, columns, missing count, memory).
summary()
- Returns: (dict) A higher-level summary aggregating missing percentages, categorical counts, and numerical counts.
feature_overview()
- Returns: (pd.DataFrame) A detailed DataFrame where the index represents your columns and the columns contain metadata (Missing %, Unique values, Memory usage).
dataset_health()
- Returns: (dict) A health scorecard analyzing critical flaws like zero-variance columns and high missing rates.
ml_readiness()
- Returns: (dict) A strict pass/fail assessment determining if the data can be fed into an ML algorithm (checks for nulls, non-numeric types, etc.).
column_profile(column_names=None)
column_names(list, optional): Specific columns to profile. If None, profiles all.- Returns: (dict) A deep statistical and structural dictionary profile for the specified features.
3. Statistical Diagnostics (statistics)
numeric_summary()
- Returns: (pd.DataFrame) Statistical summary of numerical data (mean, median, std, skewness, kurtosis, IQR).
categorical_summary()
- Returns: (pd.DataFrame) Statistical summary of categorical data (mode, unique count, top frequency).
outlier_summary(detailed=False)
detailed(bool, default=False): If True, returns a dictionary containing the actual outlier index locations. If False, returns a DataFrame with outlier counts.- Returns: (pd.DataFrame or dict) Outlier assessment based on IQR bounds.
multicollinearity_report()
- Returns: (pd.DataFrame) Variance Inflation Factor (VIF) for all numerical features to detect multicollinearity.
- Raises:
ValueErrorif the dataset contains less than two numerical features.
statistical_snapshot()
- Returns: (dict) A master dictionary combining
numeric_summary,categorical_summary, andoutlier_summary.
4. Data Preprocessing (preprocessing)
sanitize_col_names(case="snake", remove_special_chars=True, replace_spaces="_", ensure_unique=True, return_mapping=False)
- Parameters: Modifies column headers to be machine-readable.
- Returns: (dict or None) Mapping of old to new names if requested. Modifies DataFrame in place.
boolify(columns=None, true_value=True, false_value=False, return_info=False, inplace=False)
- Parameters: Casts pseudo-boolean object columns into strict boolean types.
- Returns: (dict or DataFrame) Depending on
inplaceandreturn_info. - Raises:
KeyErrorif an explicitly provided column is not found in the DataFrame.
drop_constant(columns=None, ignore_na=True, return_info=False, inplace=False)
- Parameters: Drops columns with zero variance (only 1 unique value).
- Returns: (dict or DataFrame).
label_encode(columns=None, inplace=False, return_mapping=False)
- Parameters: Numerically encodes categorical text variables using numerical labels.
- Returns: (dict or DataFrame).
- Raises:
KeyErrorif an explicitly provided column is not found in the DataFrame.
one_hot_encode(columns=None, inplace=False, return_info=False, drop_original=False, max_unique=20)
- Parameters: Creates binary dummy variables for categories up to
max_uniquethreshold. - Returns: (dict or DataFrame).
- Raises:
KeyErrorif an explicitly provided column is not found in the DataFrame.
ordinal_encode(order, inplace=False, return_mapping=True, strict=True)
- Parameters:
order(dict) Mapping of{column: [hierarchical_list]}. - Returns: (dict or DataFrame).
- Raises:
KeyErrorif a column inorderis not found.ValueErrorifstrict=Trueand unknown categories are encountered.
auto_encode(columns=None, order=None, max_onehot=10, inplace=False, return_report=False)
- Parameters: Intelligently decides between one-hot and label encoding based on the cardinality of the column.
- Returns: (dict or DataFrame).
- Raises:
KeyErrorif an explicitly provided column is not found in the DataFrame.
compress_dtype(columns=None, downcast_int=True, downcast_float=True, convert_category=True, category_threshold=0.5, inplace=False, return_report=False)
- Parameters: Slashes memory footprint by safely downcasting bit-precision (e.g., float64 to float32).
- Returns: (dict or DataFrame).
- Raises:
KeyErrorif an explicitly provided column is not found in the DataFrame.
smart_fill(columns=None, strategy="auto", value=None, missing_threshold=0.6, inplace=False, return_report=False)
- Parameters: Imputes NaNs using median for numerics, mode for categoricals, or drops columns exceeding
missing_threshold. - Returns: (dict or DataFrame).
- Raises:
KeyErrorif an explicitly provided column is not found in the DataFrame.
drop_high_cardinality(columns=None, threshold=0.8, return_info=False, inplace=False)
- Parameters: Drops nominal columns like IDs/Names where unique count / total rows >
threshold. - Returns: (dict or DataFrame).
- Raises:
KeyErrorif an explicitly provided column is not found in the DataFrame.
drop_highly_correlated(threshold=0.9, method="pearson", target_col=None, return_info=False, inplace=False)
- Parameters: Prevents multicollinearity by dropping one feature from pairs that correlate above
threshold. Iftarget_colis provided, keeps the feature most correlated with the target. - Returns: (dict or DataFrame).
auto_preprocess(...)
auto_preprocess(
target_col=None, sanitize=True, cast_numeric=True, drop_const=True,
drop_high_cardinality_cols=False, drop_correlated_cols=False,
boolify_cols=True, fill_nan=True, encode=True, compress=True,
missing_threshold=0.6, high_cardinality_threshold=0.8,
correlation_threshold=0.9, max_onehot=10, return_report=False
)
5. Data Visualization (visuals)
Note: All visualization methods render directly using matplotlib/seaborn and do not return data objects.
corr_map(target=None, ignore_cols=None, method="spearman")
target(str, optional): Specific target column to isolate correlations against.ignore_cols(list, optional): Columns to skip.method(str, default="spearman"): Correlation method (pearson,kendall,spearman).- Raises:
ValueErroriftargetis provided but is not a numeric column.
pca_plot(input_cols=None, target=None, n_components=2, sample_space=None, figsize=(8, 6))
target(str, optional): Target column used to color-code clusters.n_components(int, default=2): Accepts2or3for 2D or 3D PCA plotting.sample_space(int, optional): Randomly subsamples large datasets to speed up rendering.- Raises:
ValueErrorfor invalidn_components, missingtargetcolumns, insufficient numerical features, or invalidsample_space.
relationship_map(target, input_cols=None, ignore_cols=None, sample_space=1000, figsize=(9, 7))
target(str, required): The outcome variable to compare features against.input_cols(list, optional): The subset of features to compare.- Raises:
ValueErrorif the specifiedtargetcolumn is not found in the DataFrame.
distribution_map(cols=None, ignore_cols=None, sample_space=1000, figsize=(8, 6))
cols(list, optional): Features to plot distributions for. Built-in logic ignores high-cardinality IDs automatically.
feature_cluster_map(ignore_cols=None, method="spearman", figsize=(10, 10))
- Parameters: Generates a hierarchically-clustered heatmap. Excellent for visually spotting multicollinearity.
- Raises:
ValueErrorif the DataFrame contains fewer than 2 numeric features.