
Portfolio

Real-world analytics challenges I've solved. All details are NDA-safe and anonymized to protect client confidentiality.

Featured Deep Dive

Predictive Valuation Engine

End-to-end ML solution combining property and neighborhood risk data

Python | SQL | AWS | Random Forest | Feature Engineering

1 Business Context

A regional real estate investment firm needed to systematically evaluate residential property values across the Chicago metropolitan area. Their existing approach relied on manual comparable analysis, which couldn't scale to their deal flow, and failed to account for neighborhood-level risk factors that significantly influence property values. The business question was clear: how can we predict property value with enough precision to make faster, more confident acquisition decisions?

2 The Data Challenge

The project required integrating multiple large-scale datasets that didn't naturally align:

Property Transaction Data

~50,000 residential sale records with property attributes (beds, baths, square footage, property type, year built) and sale prices across multiple years.

Crime Incident Records

900,000+ crime incident logs spanning four years with latitude/longitude coordinates, incident type, and timestamp—requiring geographic aggregation by neighborhood.

The key engineering challenge was linking crime data (located by latitude/longitude) to property records (identified by address and neighborhood name). This required building a mapping layer between neighborhoods and community areas, then aggregating crime statistics at the community-area level.

3 Technical Approach

Data Preparation & Feature Engineering

Using Python (pandas, NumPy) and SQL for transformation, I cleaned and standardized property records, handled missing values with MICE (Multiple Imputation by Chained Equations), and created derived features such as price per square foot and property age. For the crime data, I aggregated incidents by community area and crime type to create neighborhood risk indices.
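A minimal sketch of these preparation steps on toy data. The column names (`sale_price`, `sqft`, `year_built`, `sale_year`, `community_area`, `crime_type`) are hypothetical, and scikit-learn's `IterativeImputer` stands in for MICE: it runs the same round-robin chained-equations regressions but, by default, returns one imputed dataset rather than several.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental in scikit-learn and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical property records with gaps in some attributes
props = pd.DataFrame({
    "sale_price": [350_000, 420_000, 275_000, 610_000, 390_000],
    "sqft": [1400, np.nan, 1100, 2300, 1600],
    "beds": [3, 3, 2, np.nan, 3],
    "baths": [2, 2, 1, 3, np.nan],
    "year_built": [1925, 1998, 1955, 2010, 1972],
    "sale_year": [2019, 2020, 2019, 2021, 2020],
})

# Chained-equations imputation: each column with gaps is regressed on the others in turn.
# sale_price is deliberately excluded so the valuation target does not leak into features.
numeric_cols = ["sqft", "beds", "baths", "year_built"]
imputer = IterativeImputer(max_iter=10, random_state=0)
props[numeric_cols] = imputer.fit_transform(props[numeric_cols])

# Derived features
props["price_per_sqft"] = props["sale_price"] / props["sqft"]
props["property_age"] = props["sale_year"] - props["year_built"]

# Hypothetical incident log already tagged with a community area code
crimes = pd.DataFrame({
    "community_area": [6, 6, 7, 32, 32, 32],
    "crime_type": ["THEFT", "BATTERY", "THEFT", "THEFT", "THEFT", "ROBBERY"],
})

# Neighborhood risk index: incident counts per community area and crime type
risk = (
    crimes.groupby(["community_area", "crime_type"])
    .size()
    .unstack(fill_value=0)
)
risk["total_incidents"] = risk.sum(axis=1)
```

In a real pipeline the imputer would be fit on training rows only and then applied to validation and test rows, so that held-out data does not shape the imputations.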

Geospatial Integration

Built a mapping dictionary to translate 77 Chicago community area codes to neighborhood names, enabling the join between crime statistics and property records. This step alone unlocked the ability to incorporate external risk factors into the valuation model.
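The join can be sketched as a lookup dictionary plus a left merge. Only four of the 77 community areas are shown, the crime rates are made-up illustrative values, and the name-normalization rule (`normalize`, a hypothetical helper) is an assumption about how free-text neighborhood names might be cleaned.

```python
import pandas as pd

# Excerpt of the community-area lookup (the full Chicago table has 77 entries)
COMMUNITY_AREAS = {
    6: "Lake View",
    7: "Lincoln Park",
    8: "Near North Side",
    32: "Loop",
}

def normalize(name: str) -> str:
    """Collapse whitespace and title-case so 'LAKE VIEW ' and 'Lake View' match."""
    return " ".join(name.split()).title()

# Crime statistics aggregated by community area code (illustrative values only)
crime_by_area = pd.DataFrame({
    "community_area": [6, 7, 32],
    "incidents_per_1k": [38.5, 29.1, 112.4],
})
crime_by_area["neighborhood"] = crime_by_area["community_area"].map(COMMUNITY_AREAS)

# Property records keyed by a free-text neighborhood name (hypothetical)
properties = pd.DataFrame({
    "address": ["100 W Example St", "200 N Sample Ave", "300 S Demo Blvd"],
    "neighborhood": ["LAKE VIEW", "lincoln park", "Near North Side"],
})
properties["neighborhood"] = properties["neighborhood"].map(normalize)

# Left join keeps every property; indicator=True flags rows with no crime match
merged = properties.merge(
    crime_by_area[["neighborhood", "incidents_per_1k"]],
    on="neighborhood",
    how="left",
    indicator=True,
)
unmatched = merged.loc[merged["_merge"] == "left_only", "neighborhood"].tolist()
```

The `_merge` column makes unmatched neighborhoods easy to audit before modeling; here "Near North Side" has no crime row in the toy table and surfaces in `unmatched`.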

Model Development

Tested several regression approaches, including Ridge Regression, Random Forest, and Gradient Boosting. Used GridSearchCV for hyperparameter tuning, with cross-validation to guard against overfitting. The final ensemble model combined property-level features with neighborhood crime density scores.
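A sketch of the model comparison, assuming synthetic data from `make_regression` in place of the real feature matrix and small illustrative grids (the grids actually searched are not stated). Each candidate is tuned with `GridSearchCV` and ranked by its cross-validated R².

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the engineered feature matrix
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=0)

# Candidate models with deliberately small, illustrative grids
candidates = {
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "random_forest": (
        RandomForestRegressor(random_state=0),
        {"n_estimators": [100], "max_depth": [None, 8]},
    ),
    "gradient_boosting": (
        GradientBoostingRegressor(random_state=0),
        {"learning_rate": [0.05, 0.1], "n_estimators": [100]},
    ),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, (estimator, grid) in candidates.items():
    # Every grid point is scored by 5-fold cross-validated R^2
    search = GridSearchCV(estimator, grid, cv=cv, scoring="r2")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

# Pick the model family with the highest cross-validated score
best_name = max(results, key=lambda k: results[k][0])
```

Because the same folds both choose hyperparameters and report `best_score_`, that score is slightly optimistic; a held-out test set that the search never sees gives a cleaner final estimate.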

Figure: Model Feature Importance. Top predictive features from the Random Forest model (normalized importance scores).
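In scikit-learn, a Random Forest's `feature_importances_` are impurity-based and already sum to 1. The sketch below renormalizes defensively and computes the share held by the top five features, the kind of calculation behind a "top 5 features" figure. The feature names and data are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature names standing in for the engineered features
feature_names = [
    "sqft", "baths", "beds", "property_age",
    "crime_density", "lot_size", "garage", "stories",
]
X, y = make_regression(n_samples=400, n_features=8, n_informative=5,
                       noise=10.0, random_state=1)

model = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, y)

# Impurity-based importances, renormalized so they sum to 1, largest first
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = (importances / importances.sum()).sort_values(ascending=False)

# Share of total importance captured by the top five features
top5_share = importances.head(5).sum()
```

Impurity-based importances can favor high-cardinality features; `sklearn.inspection.permutation_importance` computed on held-out data is a common cross-check.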

4 Key Insights

85%+ of variance explained by the final model

Top 5 features drove 60% of predictive power

+12% accuracy lift from crime features

Crime density at the neighborhood level proved to be a significant predictor—properties in higher-crime areas showed predictable valuation discounts even after controlling for property characteristics. This validated the business intuition and provided a quantifiable risk adjustment factor.
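The source does not state how the crime discount was estimated. One standard way to quantify a discount "after controlling for property characteristics" is a log-linear (hedonic-style) regression. This sketch uses synthetic data in which crime and home size are deliberately confounded, so the crime-only slope lands near zero while the controlled slope recovers the planted effect of -0.02 log points per point of a hypothetical crime index.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Synthetic data: higher-crime areas also have larger homes, so crime and size are confounded
crime = rng.uniform(0, 10, n)                     # hypothetical crime index
sqft = 1500 + 50 * crime + rng.normal(0, 300, n)
log_price = 11.0 + 0.4 * (sqft / 1000) - 0.02 * crime + rng.normal(0, 0.1, n)

def ols(columns, target):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(target)), *columns])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

naive = ols([crime], log_price)[1]                    # crime alone: effect masked by size
controlled = ols([sqft / 1000, crime], log_price)[2]  # crime after controlling for size

# Convert the log-price coefficient into a percentage change per index point
discount_pct = (np.exp(controlled) - 1) * 100
```

On this toy data `discount_pct` comes out at roughly -2% per index point, while the naive slope suggests no effect at all, which is why the controls matter.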

5 Business Impact

The model enabled the firm to screen acquisition targets at scale, prioritize site visits based on predicted value-to-price ratios, and quantify neighborhood risk in investment memos. The approach reduced due diligence time per property and provided a defensible, data-driven basis for pricing negotiations.
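Value-to-price screening can be sketched as a ratio and a cut-off, assuming model predictions are already available. The listing IDs, prices, and the 5% threshold are hypothetical.

```python
import pandas as pd

# Hypothetical screening inputs: model predictions vs. asking prices
targets = pd.DataFrame({
    "listing_id": ["A", "B", "C", "D"],
    "predicted_value": [480_000, 300_000, 655_000, 410_000],
    "asking_price": [450_000, 330_000, 600_000, 410_000],
})

# A ratio above 1 means the model values the property above its asking price
targets["value_to_price"] = targets["predicted_value"] / targets["asking_price"]

# Shortlist properties at least 5% undervalued, best opportunities first
THRESHOLD = 1.05  # hypothetical cut-off
shortlist = (
    targets[targets["value_to_price"] >= THRESHOLD]
    .sort_values("value_to_price", ascending=False)
    .reset_index(drop=True)
)
```

Here listings C (about 1.09) and A (about 1.07) clear the bar, so they would be prioritized for site visits.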

Technical Stack

Languages: Python (pandas, NumPy, scikit-learn), SQL  |  Methods: Random Forest, Ridge Regression, GridSearchCV, MICE Imputation, TF-IDF  |  Tools: AWS, Jupyter, Folium (geospatial visualization)

Based on graduate research — University of Chicago, MS in Analytics

Additional Case Studies

Selected projects demonstrating range across analytics disciplines.

Executive BI

Executive Revenue Dashboard

Problem

Leadership lacked real-time visibility into revenue cycle performance across service lines.

Approach

Unified KPI framework with automated Power BI dashboards and stakeholder alignment on definitions.

Outcome

Reporting lag cut from 2 weeks to real time. Identified $1.2M in recoverable revenue within 90 days.

Diagnostic Analytics

Denials Analytics & Root Cause Engine

Problem

High claim denial rates eroding margins with no systematic root cause analysis.

Approach

Built denial classification model (Python/SQL) with Pareto-style prioritization by payer and denial code.

Outcome

Identified the top 5 denial drivers (60% of lost revenue). Process changes reduced denials by 18%.
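The Pareto-style prioritization step can be sketched as: sum lost revenue by payer and denial code, rank the groups, and flag those needed to reach a cumulative-share cut-off. The data, the codes, and the 80% cut-off are hypothetical, and the classification model that assigns denial categories is not shown.

```python
import pandas as pd

# Hypothetical denied claims with the revenue at stake
denials = pd.DataFrame({
    "payer": ["P1", "P1", "P2", "P2", "P3", "P3", "P1", "P2"],
    "denial_code": ["D1", "D2", "D1", "D3", "D1", "D2", "D1", "D3"],
    "amount": [12_000, 3_000, 8_000, 15_000, 2_000, 1_000, 9_000, 5_000],
})

# Lost revenue per (payer, denial code), largest first
pareto = (
    denials.groupby(["payer", "denial_code"], as_index=False)["amount"].sum()
    .sort_values("amount", ascending=False)
    .reset_index(drop=True)
)
pareto["cum_share"] = pareto["amount"].cumsum() / pareto["amount"].sum()

# A group is a priority if the groups ranked above it cover less than 80% of the total
pareto["priority"] = pareto["cum_share"].shift(fill_value=0.0) < 0.80
```

In this toy example three of six groups are flagged, together covering about 89% of lost revenue.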

Marketing Analytics

Email Campaign Click-Through Rate Analysis

Problem

Marketing couldn't determine which ad content features drove email engagement across 1,400+ unique campaigns.

Approach

Engineered CTR as the target variable; built regression models on NLP-derived content features from email metadata.

Outcome

Found that amenity offers increased CTR while unbranded content hurt performance. The model explained 50%+ of variance.

Graduate capstone, University of Chicago

Predictive Analytics

Operational Capacity Forecasting Model

Problem

Reactive staffing led to overtime costs and service delays during peak periods.

Approach

Time-series forecasting model (Python) projecting demand 4-6 weeks ahead, integrated with ops dashboards.

Outcome

Staffing accuracy improved 25%; overtime decreased and recurring bottlenecks were eliminated.
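The source does not name the forecasting method. As one simple option, Holt's linear exponential smoothing tracks a level and a trend and projects them h weeks ahead; the smoothing constants and the demand history below are hypothetical.

```python
from typing import List, Sequence

def holt_forecast(series: Sequence[float], horizons: Sequence[int],
                  alpha: float = 0.5, beta: float = 0.3) -> List[float]:
    """Holt's linear (double) exponential smoothing; returns h-step-ahead forecasts."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Initialize level at the first point and trend at the first difference
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest level change with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in horizons]

# Hypothetical weekly demand history (e.g., service requests per week)
weekly_demand = [412, 398, 430, 445, 438, 460, 472, 455, 480, 495, 488, 510]

# Forecasts 4, 5, and 6 weeks past the last observation
forecast = holt_forecast(weekly_demand, horizons=[4, 5, 6])
```

Holt's method has no seasonal term; if demand follows a strong weekly or annual cycle, a seasonal variant such as Holt-Winters would be the natural next step.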

Data Products

Self-Service Analytics Platform

Problem

Ad hoc data requests overwhelmed the analytics team, creating backlogs for analysts and requesting teams alike.

Approach

Self-service semantic layer with governed models, user training, and request triage protocols.

Outcome

Reduced ad hoc requests by 40%. Freed analyst time for strategic work.

Data Governance

KPI Alignment & Metric Governance Initiative

Problem

Conflicting metric definitions eroded trust in data across departments.

Approach

Cross-functional workshops, canonical definitions, governed metric catalog integrated with BI tools.

Outcome

Established a single source of truth across 5 departments, reducing metric disputes and speeding up decisions.
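A governed metric catalog can be as simple as a registry of canonical definitions that every report must go through. This sketch is a hypothetical illustration: the metric names, owners, and formulas are placeholders, not the definitions agreed in the initiative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping

@dataclass(frozen=True)
class MetricDefinition:
    """One canonical, governed metric definition."""
    name: str
    owner: str
    description: str
    compute: Callable[[Mapping[str, float]], float]

# Hypothetical catalog entries; real definitions would come out of the workshops
CATALOG: Dict[str, MetricDefinition] = {
    m.name: m
    for m in [
        MetricDefinition(
            name="denial_rate",
            owner="Revenue Cycle",
            description="Denied claims divided by submitted claims in the period.",
            compute=lambda d: d["denied_claims"] / d["submitted_claims"],
        ),
        MetricDefinition(
            name="collection_rate",
            owner="Finance",
            description="Payments received divided by net charges in the period.",
            compute=lambda d: d["payments"] / d["net_charges"],
        ),
    ]
}

def get_metric(name: str, inputs: Mapping[str, float]) -> float:
    """Compute a metric only through its governed definition."""
    if name not in CATALOG:
        raise KeyError(f"'{name}' is not a governed metric; add it to the catalog first")
    return CATALOG[name].compute(inputs)

rate = get_metric("denial_rate", {"denied_claims": 90, "submitted_claims": 1200})
```

Routing every dashboard through `get_metric` (a hypothetical helper) means an undefined metric fails loudly instead of being recomputed differently by each department.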

Have a Similar Challenge?

I partner with organizations on analytics leadership, consulting, and advisory engagements — from full-time roles to project-based work.
