Real-world analytics challenges I've solved. All details are NDA-safe and anonymized to protect client confidentiality.
End-to-end ML solution combining property and neighborhood risk data
A regional real estate investment firm needed to systematically evaluate residential property values across the Chicago metropolitan area. Their existing approach relied on manual comparable analysis, which couldn't scale to their deal flow and failed to account for neighborhood-level risk factors that significantly influence property values. The business question was clear: how can we predict property value with enough precision to make faster, more confident acquisition decisions?
The project required integrating multiple large-scale datasets that didn't naturally align:
~50,000 residential sale records with property attributes (beds, baths, square footage, property type, year built) and sale prices across multiple years.
900,000+ crime incident logs spanning four years with latitude/longitude coordinates, incident type, and timestamp—requiring geographic aggregation by neighborhood.
The key engineering challenge was linking crime data (reported at latitude/longitude) to property records (reported by address and neighborhood name). This required building a neighborhood-to-community-area mapping layer and aggregating crime statistics at that level.
Using Python (pandas, NumPy) and SQL for transformation, I cleaned and standardized property records, handled missing values through MICE imputation, and created derived features like price-per-square-foot and property age. For crime data, I aggregated incidents by community area and crime type to create neighborhood risk indices.
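As a minimal sketch of that step, the snippet below uses scikit-learn's IterativeImputer as the MICE-style imputer and pandas for the derived features. The file and column names (parcel_id, sqft, sale_date, and so on) are illustrative stand-ins, not the client's actual schema.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

props = pd.read_csv("sales.csv")  # stand-in for the ~50,000 residential sale records

# Basic standardization before imputing.
props = props.drop_duplicates(subset="parcel_id")
props["property_type"] = props["property_type"].str.strip().str.lower()

# MICE-style imputation of numeric attributes (IterativeImputer is
# scikit-learn's round-robin multivariate imputer).
num_cols = ["beds", "baths", "sqft", "year_built", "sale_price"]
props[num_cols] = IterativeImputer(random_state=42).fit_transform(props[num_cols])

# Derived features used downstream.
props["price_per_sqft"] = props["sale_price"] / props["sqft"]
props["property_age"] = pd.to_datetime(props["sale_date"]).dt.year - props["year_built"]
```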
Built a mapping dictionary to translate 77 Chicago community area codes to neighborhood names, enabling the join between crime statistics and property records. This step alone unlocked the ability to incorporate external risk factors into the valuation model.
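A condensed illustration of that join is below. It assumes each incident already carries a community-area code (or has been spatially assigned to one) and that the property table has a matching neighborhood column; only three of the 77 map entries are shown.

```python
import pandas as pd

# Abbreviated code-to-name map; the real dictionary covers all 77 community areas.
community_areas = {8: "Near North Side", 32: "Loop", 33: "Near South Side"}

crime = pd.read_csv("crimes.csv")  # ~900K incidents, assumed tagged with a community-area code
crime["neighborhood"] = crime["community_area"].map(community_areas)

# Neighborhood risk index: incident counts per crime type, per area.
risk = (crime.groupby(["neighborhood", "primary_type"])
             .size()
             .unstack(fill_value=0)
             .add_prefix("crime_")
             .reset_index())

# Join the risk features onto the cleaned property table from the previous step.
props = props.merge(risk, on="neighborhood", how="left")
```

Aggregating at the community-area level keeps the join deterministic and avoids fuzzy matching on free-text neighborhood names.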
Tested multiple regression approaches including Ridge Regression, Random Forest, and Gradient Boosting. Used GridSearchCV for hyperparameter tuning and cross-validation to prevent overfitting. The final ensemble model combined property-level features with neighborhood crime density scores.
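The model comparison looked roughly like the sketch below; the parameter grids shown are illustrative, not the full search space used in the project.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

X = props.drop(columns=["sale_price"]).select_dtypes("number")
y = props["sale_price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "random_forest": (RandomForestRegressor(random_state=42),
                      {"n_estimators": [200, 500], "max_depth": [None, 20]}),
    "gbm": (GradientBoostingRegressor(random_state=42),
            {"n_estimators": [200, 500], "learning_rate": [0.05, 0.1]}),
}

# Cross-validated grid search per candidate, scored on held-out R^2.
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    print(name, search.best_params_, round(search.score(X_test, y_test), 3))
```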
Top predictive features from Random Forest model (importance scores normalized)
85%+ variance explained by the final model
Top 5 features drove 60% of predictive power
+12% accuracy lift from crime features
Crime density at the neighborhood level proved to be a significant predictor—properties in higher-crime areas showed predictable valuation discounts even after controlling for property characteristics. This validated the business intuition and provided a quantifiable risk adjustment factor.
The model enabled the firm to screen acquisition targets at scale, prioritize site visits based on predicted value-to-price ratios, and quantify neighborhood risk in investment memos. The approach reduced due diligence time per property and provided a defensible, data-driven basis for pricing negotiations.
Languages: Python (pandas, NumPy, scikit-learn), SQL | Methods: Random Forest, Ridge Regression, GridSearchCV, MICE Imputation, TF-IDF | Tools: AWS, Jupyter, Folium (geospatial visualization)
Selected projects demonstrating range across analytics disciplines.
Leadership lacked real-time visibility into revenue cycle performance across service lines.
Unified KPI framework with automated Power BI dashboards and stakeholder alignment on definitions.
Reporting cycle reduced from 2 weeks to real-time. Identified $1.2M in recoverable revenue within 90 days.
High claim denial rates were eroding margins, with no systematic root-cause analysis.
Built denial classification model (Python/SQL) with Pareto-style prioritization by payer and denial code.
Top 5 drivers identified (60% of lost revenue). Process changes reduced denials by 18%.
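Below is a simplified sketch of the Pareto prioritization behind that result; the table and column names (payer, denial_code, billed_amount) are stand-ins, not the client's claims schema.

```python
import pandas as pd

denials = pd.read_csv("denials.csv")  # one row per denied claim line (illustrative extract)

# Rank payer / denial-code combinations by dollars at risk, then keep the head
# of the list that accounts for the bulk of lost revenue.
pareto = (denials.groupby(["payer", "denial_code"])["billed_amount"]
                 .sum()
                 .sort_values(ascending=False)
                 .to_frame("lost_revenue"))
pareto["cum_share"] = pareto["lost_revenue"].cumsum() / pareto["lost_revenue"].sum()

priority = pareto[pareto["cum_share"] <= 0.60]  # the handful of drivers worth fixing first
print(priority)
```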
Marketing couldn't determine which ad content features drove email engagement across 1,400+ unique campaigns.
Engineered CTR as target variable; built regression models with NLP-derived content features from email metadata.
Identified that amenity offers increased CTR while unbranded content hurt performance. 50%+ variance explained.
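A stripped-down version of that approach is sketched below, using TF-IDF features from the subject line feeding a ridge regression; the file and column names are illustrative assumptions.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

campaigns = pd.read_csv("campaigns.csv")  # ~1,400 rows: subject_line, clicks, sends
campaigns["ctr"] = campaigns["clicks"] / campaigns["sends"]  # engineered target

# Text features feed a regularized linear model; cross-validated R^2 estimates
# how much CTR variance the content features explain.
model = make_pipeline(TfidfVectorizer(min_df=5, ngram_range=(1, 2)), Ridge(alpha=1.0))
scores = cross_val_score(model, campaigns["subject_line"], campaigns["ctr"],
                         cv=5, scoring="r2")
print(round(scores.mean(), 2))
```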
Reactive staffing led to overtime costs and service delays during peak periods.
Time-series forecasting model (Python) projecting demand 4-6 weeks ahead, integrated with ops dashboards.
Staffing accuracy improved 25%, overtime costs fell, and recurring bottlenecks were eliminated.
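One way to sketch that forecast is below, using Holt-Winters exponential smoothing from statsmodels as a stand-in for the production model; the data source and daily granularity are assumptions.

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Daily case volume, reindexed to a continuous daily frequency.
demand = (pd.read_csv("volume.csv", parse_dates=["date"])
            .set_index("date")["cases"]
            .asfreq("D")
            .fillna(0))

# Additive trend + weekly seasonality; project 6 weeks ahead for the staffing plan.
fit = ExponentialSmoothing(demand, trend="add", seasonal="add",
                           seasonal_periods=7).fit()
forecast = fit.forecast(42)
print(forecast.tail())
```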
Ad hoc data requests overwhelmed the analytics team, creating backlogs for both analysts and business stakeholders.
Self-service semantic layer with governed models, user training, and request triage protocols.
Reduced ad hoc requests by 40%. Freed analyst time for strategic work.
Conflicting metric definitions eroded trust in data across departments.
Cross-functional workshops, canonical definitions, governed metric catalog integrated with BI tools.
Single source of truth across 5 departments. Fewer metric disputes, faster decisions.
I partner with organizations on analytics leadership, consulting, and advisory engagements — from full-time roles to project-based work.
Start a Conversation