This Tool Will 10x Your Analysis
Feature engineering is typically considered a Machine Learning tool. But it is way more than that.
Feature engineering is when we modify or create columns or variables in our data.
You can use it in simple analysis as well, and you should. Here is why:
(At the end, I will show how I used it in my latest analysis)
Data is usually limited
We always say that more (quality) data is better, but data availability is usually limited. Sometimes that is intentional, because we don't store "unnecessary" info; sometimes the source simply has gaps.
In either case, feature engineering can help you a lot. If the data has gaps, engineered features are a must to fill them. In other cases, feature engineering adds some spice to the analysis.
The power of feature engineering
If you see a dataset with only a few columns, don't immediately dismiss it as useless for analysis. From a few useful columns, you can make (engineer) a lot of helpful features.
Here are some ideas for a tiny table that has little more than a `timestamp` and an `amount` column:
From `timestamp`:
- `hour`, `day`, `weekday`, `month`, `is_weekend`
- `time_of_day` (e.g. morning/afternoon/evening/night)
- `session_duration` (time diff between user activities)
- `days_since_last_activity`
- `is_peak_hour` (based on business-defined hours)
From `amount`:
- `amount_bucket` (e.g. low/medium/high spender)
- `relative_amount` (compared to the user's average)
- `rolling_avg_amount` (per user over a period)
- `spending_growth` (compared to the previous spend)
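To make this concrete, here is a minimal pandas sketch of a few of these ideas. The table and column names (`user_id`, `timestamp`, `amount`) are just an illustration, not a real dataset:

```python
import pandas as pd

# A toy transactions table (illustrative columns: user_id, timestamp, amount)
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2024-05-03 08:10", "2024-05-10 21:45", "2024-05-18 13:20",
        "2024-05-04 09:00", "2024-05-25 19:30",
    ]),
    "amount": [12.0, 55.0, 30.0, 80.0, 95.0],
})
df = df.sort_values(["user_id", "timestamp"])

# From timestamp
df["hour"] = df["timestamp"].dt.hour
df["weekday"] = df["timestamp"].dt.day_name()
df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5
df["time_of_day"] = pd.cut(
    df["hour"], bins=[0, 6, 12, 18, 24],
    labels=["night", "morning", "afternoon", "evening"], right=False,
)
df["days_since_last_activity"] = df.groupby("user_id")["timestamp"].diff().dt.days

# From amount
df["amount_bucket"] = pd.qcut(df["amount"], q=3, labels=["low", "medium", "high"])
df["relative_amount"] = df["amount"] / df.groupby("user_id")["amount"].transform("mean")
df["rolling_avg_amount"] = (
    df.groupby("user_id")["amount"]
    .transform(lambda s: s.rolling(3, min_periods=1).mean())
)
df["spending_growth"] = df.groupby("user_id")["amount"].pct_change()
```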
Then, from these new columns, you can go even further: combine two or three of them and build features up like a tree.
You have a lot of options to play with your data:
- Create new features: from `date_of_birth`, calculate `age`
- Encode: convert `education_level` into ordinal values (e.g. HS=1, BSc=2, MSc=3)
- Scale, normalize
- Bin: categorize `transaction_amount` into "low", "medium", "high"; group `age` into buckets: "18–25", "26–35", etc.
- Extract: from `full_name`, extract `first_name`; from `timestamp`, extract `weekday`
- Aggregate: count `number_of_purchases` per user; calculate `average_order_value` per customer
- Text manipulation
- Handle missing data: use a binary flag column `income_missing` to indicate NAs
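A few of these in code, again as a minimal sketch with made-up column names:

```python
import pandas as pd

# A toy customer table (made-up columns)
customers = pd.DataFrame({
    "education_level": ["HS", "MSc", "BSc", "HS"],
    "age": [22, 34, 41, 29],
    "income": [28000, None, 52000, 39000],
})

# Encode: education as ordinal values
customers["education_ord"] = customers["education_level"].map({"HS": 1, "BSc": 2, "MSc": 3})

# Bin: group age into buckets
customers["age_group"] = pd.cut(
    customers["age"], bins=[17, 25, 35, 45], labels=["18–25", "26–35", "36–45"]
)

# Handle missing data: flag the NAs, then impute
customers["income_missing"] = customers["income"].isna()
customers["income"] = customers["income"].fillna(customers["income"].median())
```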
The more features you have in the raw data, the easier feature engineering is. You can exponentially increase the number of features, and with that, the value of your analysis.
To be good at feature engineering, you must have great domain knowledge. Everyone can subtract two dates from each other, but the real heroes create features a rookie would never think about.
Never underestimate a basic table, since with feature engineering, you can bring it to life!
A real example
I am working with the subscribers’ data from Substack.
One feature I added is an engagement score: a single number that summarizes how active each subscriber is.
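A simplified sketch of how such a score can be built (the columns and weights below are placeholders, not the actual ones from the export):

```python
import pandas as pd

# Toy subscriber table (illustrative columns, not the real Substack export schema)
subs = pd.DataFrame({
    "email": ["a@x.com", "b@y.com", "c@z.com"],
    "opens_last_30d": [12, 3, 0],
    "clicks_last_30d": [4, 1, 0],
    "days_since_last_open": [2, 15, 60],
})

# One way to do it: a weighted mix of normalized activity signals
subs["engagement_score"] = (
    0.5 * subs["opens_last_30d"].rank(pct=True)
    + 0.3 * subs["clicks_last_30d"].rank(pct=True)
    + 0.2 * (subs["days_since_last_open"] < 7).astype(int)
)
```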
I used only existing columns from the original DataFrame, but with this new feature, I can do much more interesting analysis.
Substack describes countries with ISO alpha-2 codes, but Plotly's map charts expect ISO alpha-3 codes.
Here is a function to create the data I need:
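A minimal version of such a function could use `pycountry` for the alpha-2 to alpha-3 lookup (the library choice and column names here are assumptions, not necessarily what the original uses):

```python
import pandas as pd
import pycountry

def country_counts(subs: pd.DataFrame) -> pd.DataFrame:
    """Count subscribers per country and add ISO alpha-3 codes for Plotly."""
    counts = subs.groupby("country_code").size().reset_index(name="subscribers")

    def to_alpha3(alpha2):
        match = pycountry.countries.get(alpha_2=alpha2)
        return match.alpha_3 if match else None

    counts["iso3"] = counts["country_code"].apply(to_alpha3)
    return counts.dropna(subset=["iso3"])
```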
Now I can use it with plotly to create charts like this:
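For example, a choropleth of subscribers per country. A small Plotly Express sketch (the data and styling of the real chart will differ):

```python
import pandas as pd
import plotly.express as px

# e.g. the output of country_counts() above
country_df = pd.DataFrame({"iso3": ["USA", "DEU", "HUN"], "subscribers": [120, 45, 30]})

fig = px.choropleth(
    country_df,
    locations="iso3",              # ISO alpha-3 codes (Plotly's default locationmode)
    color="subscribers",
    color_continuous_scale="Blues",
)
fig.show()
```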
And here are a few more:
- Get the day of the week from the `subscription_date`
- Get the mail provider
- Get the number of days since the user opened the last mail
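These are quick one-liners in pandas (again, the column names are assumptions about the export):

```python
import pandas as pd

# Toy rows with assumed column names
subs = pd.DataFrame({
    "email": ["a@gmail.com", "b@yahoo.com"],
    "subscription_date": ["2024-01-15", "2024-03-02"],
    "last_open_date": ["2024-06-01", "2024-04-20"],
})

subs["subscription_weekday"] = pd.to_datetime(subs["subscription_date"]).dt.day_name()
subs["mail_provider"] = subs["email"].str.split("@").str[-1]
subs["days_since_last_open"] = (
    pd.Timestamp.today().normalize() - pd.to_datetime(subs["last_open_date"])
).dt.days
```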
The data Substack provides is meh, but with a bit of feature engineering, we can level it up.