A Business Analytics capstone project implementing systematic factor investing with risk-adjusted optimisation
This project implements a multi-factor quantitative trading strategy that combines traditional factor investing with machine learning enhancement. By constructing a diversified portfolio based on value, momentum, quality, and low-volatility factors, then optimising weights using gradient boosting predictions, the system delivers superior risk-adjusted returns relative to passive benchmarks in backtests.
Core Models and Methods:
- Multi-factor scoring using Fama-French style factor construction
- XGBoost ensemble for cross-sectional return prediction
- Mean-variance optimisation with factor exposure constraints
- Rolling window backtesting with realistic transaction cost modelling
- Risk decomposition using principal component analysis
- Regime detection using Hidden Markov Models
Quantitative investing has transformed asset management. Systematic strategies now manage trillions of dollars globally, with factor-based approaches forming the backbone of many institutional portfolios. The appeal is straightforward: factors like value, momentum, and quality have delivered persistent risk premia across markets and time periods, providing a disciplined framework for portfolio construction.
However, traditional factor strategies face challenges. Factor returns exhibit significant time-variation, with periods of underperformance that can persist for years. Moreover, the democratisation of factor investing has raised crowding concerns, potentially diminishing future returns. These challenges create an opportunity for strategies that dynamically adapt factor exposures to market conditions.
Project AlphaQuant addresses these challenges through machine learning augmentation. Rather than maintaining static factor weights, the system learns optimal factor timing from historical patterns. The result is a strategy that preserves the theoretical foundation of factor investing while adapting to changing market regimes.
I completed this project independently during my MSc Business Analytics programme at University College London. The work builds on my experience as an AI Investment Programming Intern at MdotM, where I contributed to the development of AI-driven investment tools that received industry recognition.
Key Contributions:
- Designed the complete factor construction methodology
- Implemented the XGBoost prediction model for return forecasting
- Built the portfolio optimisation framework with risk constraints
- Developed the backtesting engine with realistic cost modelling
- Created the risk decomposition and attribution analysis
- Authored all documentation and technical reports
Dataset: S&P 500 Stock Market Data (Kaggle)
- 503 constituents with daily OHLCV data
- 10+ years of historical data (2013-2024)
- Fundamental data: P/E, P/B, ROE, debt ratios
- Source: https://www.kaggle.com/datasets/camnugent/sandp500
Universe Rules:
- Monthly rebalancing on last trading day
- Minimum 252 trading days of history required
- Exclude stocks with missing fundamental data
- Market cap filter: Top 400 by average daily volume
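The universe rules above can be expressed as one filtering step. The sketch below is illustrative: the function name, frame layouts, and column names are assumptions, not the repository's actual API.

```python
import pandas as pd

def build_universe(prices: pd.DataFrame, volumes: pd.DataFrame,
                   fundamentals: pd.DataFrame,
                   min_history: int = 252, top_n: int = 400) -> list:
    """Apply the universe rules (illustrative sketch).

    prices/volumes: daily frames, rows = dates, columns = tickers.
    fundamentals: one row per ticker; any NaN excludes the name.
    """
    # Require at least `min_history` non-missing trading days
    enough_history = prices.count() >= min_history
    # Exclude tickers with missing fundamental data
    complete = fundamentals.notna().all(axis=1).reindex(prices.columns, fill_value=False)
    mask = enough_history & complete
    eligible = mask[mask].index
    # Liquidity filter: keep the top_n names by average daily volume
    adv = volumes[eligible].mean()
    return adv.nlargest(min(top_n, len(adv))).index.tolist()
```

In practice this would be re-run at every monthly rebalance date rather than once over the full sample.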
Four canonical factors form the basis of the scoring system.
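The factor code below calls two helpers, `winsorise` and `zscore`, that are not shown in this README (they presumably live in `src/utils.py`). A minimal sketch of the assumed behaviour:

```python
import pandas as pd

def winsorise(series: pd.Series, lower: float, upper: float) -> pd.Series:
    """Clip values to the given lower/upper quantiles to limit outlier influence."""
    lo, hi = series.quantile(lower), series.quantile(upper)
    return series.clip(lower=lo, upper=hi)

def zscore(series: pd.Series) -> pd.Series:
    """Standardise to zero mean and unit standard deviation."""
    return (series - series.mean()) / series.std()
```

All factor scores are cross-sectional, so both helpers operate over the stock universe at a single rebalance date.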
Value Factor:

```python
def compute_value_factor(fundamentals: pd.DataFrame) -> pd.Series:
    """
    Construct value factor from fundamental ratios.

    Components:
    - Book-to-Market ratio (40% weight)
    - Earnings-to-Price ratio (40% weight)
    - Cash Flow-to-Price ratio (20% weight)
    """
    # Winsorise at 1st and 99th percentiles
    btm = winsorise(1 / fundamentals["pb_ratio"], 0.01, 0.99)
    ep = winsorise(fundamentals["earnings_yield"], 0.01, 0.99)
    cfp = winsorise(fundamentals["cf_yield"], 0.01, 0.99)

    # Z-score normalisation
    btm_z = (btm - btm.mean()) / btm.std()
    ep_z = (ep - ep.mean()) / ep.std()
    cfp_z = (cfp - cfp.mean()) / cfp.std()

    # Composite
    value_score = 0.4 * btm_z + 0.4 * ep_z + 0.2 * cfp_z
    return value_score
```

Momentum Factor:
```python
def compute_momentum_factor(prices: pd.DataFrame, lookback: int = 252) -> pd.Series:
    """
    Construct momentum factor from price returns.

    Uses 12-month returns excluding the most recent month
    to avoid short-term reversal effects.
    """
    # 12-1 momentum: return from t-252 to t-21, skipping the most recent
    # month; take the latest cross-section so one score per stock results
    momentum = (prices.shift(21) / prices.shift(lookback) - 1).iloc[-1]

    # Z-score
    momentum_z = (momentum - momentum.mean()) / momentum.std()
    return momentum_z
```

Quality Factor:
```python
def compute_quality_factor(fundamentals: pd.DataFrame) -> pd.Series:
    """
    Construct quality factor from profitability metrics.

    Components:
    - Return on Equity (50% weight)
    - Gross Margin stability (25% weight)
    - Low leverage (25% weight)
    """
    # Winsorise at 1st and 99th percentiles
    roe = winsorise(fundamentals["roe"], 0.01, 0.99)
    margin_stability = -winsorise(fundamentals["margin_volatility"], 0.01, 0.99)  # Lower is better
    low_leverage = -winsorise(fundamentals["debt_to_equity"], 0.01, 0.99)         # Lower is better

    # Z-scores
    roe_z = zscore(roe)
    margin_z = zscore(margin_stability)
    leverage_z = zscore(low_leverage)

    quality_score = 0.5 * roe_z + 0.25 * margin_z + 0.25 * leverage_z
    return quality_score
```

Low Volatility Factor:
```python
def compute_low_vol_factor(returns: pd.DataFrame, lookback: int = 252) -> pd.Series:
    """
    Construct low-volatility factor.

    Uses realised volatility over the trailing period.
    Negated because lower volatility is desirable.
    """
    # Annualised trailing volatility, latest cross-section (one value per stock)
    volatility = returns.rolling(lookback).std().iloc[-1] * np.sqrt(252)

    # Negative z-score (low vol = high score)
    low_vol_z = -zscore(volatility)
    return low_vol_z
```

XGBoost predicts cross-sectional returns using factor exposures and market features.
Features:
- Four factor z-scores
- Factor momentum (change in z-scores)
- Sector dummies (11 GICS sectors)
- Market regime indicators (VIX level, yield curve slope)
- Interaction terms (factor × regime)
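A sketch of how such a feature matrix might be assembled; `build_features` and all column names are illustrative assumptions rather than the project's actual code.

```python
import pandas as pd

def build_features(factor_z: pd.DataFrame, sectors: pd.Series,
                   regime: pd.Series) -> pd.DataFrame:
    """Assemble the cross-sectional feature matrix (illustrative sketch).

    factor_z: per-stock factor z-scores (e.g. value, momentum, quality, low_vol).
    sectors: GICS sector label per stock.
    regime: market-level indicators (e.g. vix_level), identical for every stock.
    """
    features = factor_z.copy()
    # One-hot sector dummies
    features = features.join(pd.get_dummies(sectors, prefix="sector"))
    for name, value in regime.items():
        # Regime indicator, broadcast to all stocks
        features[name] = value
        # Interaction terms: factor exposure conditioned on regime
        for fac in factor_z.columns:
            features[f"{fac}_x_{name}"] = factor_z[fac] * value
    return features
```

Factor momentum (the change in z-scores) would be added the same way from a lagged snapshot of `factor_z`, omitted here for brevity.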
Model Configuration:
```python
# XGBRegressor from the xgboost package
model = XGBRegressor(
    n_estimators=100,
    max_depth=4,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    reg_alpha=0.1,
    reg_lambda=1.0,
    random_state=42,
)
```

Training Protocol:
- Rolling window: 36 months training, 1 month prediction
- Target: Forward 1-month return (neutralised by sector)
- Validation: Time-series cross-validation with purging
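The rolling, purged split scheme can be sketched as index generation over months. `purged_time_series_splits` is a hypothetical helper, not part of the repository; it illustrates the idea of a purge gap between the training window and the test month so the forward-return target never overlaps the training data.

```python
def purged_time_series_splits(n_months: int, train_window: int = 36,
                              purge: int = 1) -> list:
    """Rolling (train, test) month-index splits with a purge gap.

    Each split trains on `train_window` consecutive months, skips `purge`
    months (the forward-return horizon), then tests on one month.
    """
    splits = []
    for test_month in range(train_window + purge, n_months):
        train = list(range(test_month - purge - train_window, test_month - purge))
        splits.append((train, [test_month]))
    return splits
```

With a 1-month forward target, a purge of one month is the minimum needed to prevent label leakage across the train/test boundary.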
Mean-variance optimisation with factor exposure constraints.
```python
def optimise_portfolio(
    expected_returns: np.ndarray,
    covariance: np.ndarray,
    factor_exposures: np.ndarray,   # shape (k, n): k factors, n assets
    risk_aversion: float = 2.0,
    max_position: float = 0.05,
    max_factor_exposure: float = 0.3,
) -> np.ndarray:
    """
    Optimise portfolio weights.

    Objective: max(return - risk_aversion * variance)

    Subject to:
    - Weights sum to 1
    - Long-only (weights >= 0)
    - Max position size
    - Factor exposure bounds
    """
    n = len(expected_returns)

    # Quadratic programming: minimise 0.5 * w'Pw + q'w
    P = risk_aversion * covariance
    q = -expected_returns

    # Inequality constraints G @ w <= h
    G = np.vstack([
        -np.eye(n),           # w >= 0
        np.eye(n),            # w <= max_position
        factor_exposures,     # factor exposure upper bounds
        -factor_exposures,    # factor exposure lower bounds
    ])
    h = np.hstack([
        np.zeros(n),
        np.full(n, max_position),
        np.full(factor_exposures.shape[0], max_factor_exposure),
        np.full(factor_exposures.shape[0], max_factor_exposure),
    ])

    # Equality constraint: weights sum to 1
    A = np.ones((1, n))
    b = np.array([1.0])

    # solve_qp as provided by the qpsolvers package
    weights = solve_qp(P, q, G, h, A, b)
    return weights
```

Realistic simulation with transaction costs and market impact.
Cost Model:
- Commission: 0.1% per trade
- Spread: 0.05% (half spread per side)
- Market impact: 0.1% × sqrt(trade_size / ADV)
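Putting the three components together, a sketch of a per-trade cost function; the function name and the dollar-volume convention are assumptions, not the repository's actual interface.

```python
import math

def transaction_cost(trade_value: float, adv_value: float,
                     commission: float = 0.001,     # 0.1% per trade
                     half_spread: float = 0.0005,   # 0.05% half spread per side
                     impact_coeff: float = 0.001) -> float:
    """Estimated cost of one trade under the cost model above.

    trade_value: dollar value traded; adv_value: average daily dollar volume.
    """
    # Linear costs: commission plus half the bid-ask spread
    linear = (commission + half_spread) * trade_value
    # Square-root market impact, scaled by participation in daily volume
    impact = impact_coeff * math.sqrt(trade_value / adv_value) * trade_value
    return linear + impact
```

The square-root term makes costs grow faster than linearly in trade size, which is what penalises large, concentrated trades in the backtest.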
Rebalancing Rules:
- Monthly rebalancing
- Turnover constraint: Maximum 30% per month
- Buffer zone: Only trade if weight deviation > 2%
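The buffer-zone and turnover rules might be combined as follows. This is an illustrative sketch: pro-rata scaling of trades is one simple way to enforce the turnover cap, not necessarily the project's implementation.

```python
import numpy as np

def apply_rebalancing_rules(current_w, target_w,
                            buffer: float = 0.02,
                            max_turnover: float = 0.30) -> np.ndarray:
    """Apply buffer-zone and turnover rules to target weights (sketch).

    Positions within `buffer` of their current weight are left untouched;
    if one-way turnover still exceeds the cap, trades are scaled pro rata.
    """
    current_w = np.asarray(current_w, dtype=float)
    target_w = np.asarray(target_w, dtype=float)
    trades = target_w - current_w
    trades[np.abs(trades) < buffer] = 0.0   # buffer zone: skip small deviations
    turnover = np.abs(trades).sum() / 2     # one-way turnover
    if turnover > max_turnover:
        trades *= max_turnover / turnover   # scale trades down pro rata
    return current_w + trades
```

Note that zeroing small trades can leave the result slightly off the optimiser's target; a production system would re-normalise or park the residual in cash.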
| Metric | AlphaQuant | S&P 500 | Equal Weight |
|---|---|---|---|
| Annual Return | 14.2% | 11.8% | 10.9% |
| Annual Volatility | 15.1% | 18.4% | 17.2% |
| Sharpe Ratio | 0.94 | 0.64 | 0.63 |
| Sortino Ratio | 1.31 | 0.89 | 0.85 |
| Max Drawdown | -18.3% | -33.9% | -31.2% |
| Calmar Ratio | 0.78 | 0.35 | 0.35 |
Key Findings:
- 2.4% annual alpha over S&P 500 benchmark
- 47% improvement in Sharpe Ratio
- 46% reduction in maximum drawdown
- Consistent outperformance across market regimes
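The headline statistics can be reproduced from a daily return series. The sketch below covers annualised return, volatility, Sharpe ratio, and maximum drawdown (Sortino and Calmar follow the same pattern); the risk-free rate convention is an assumption.

```python
import numpy as np

def performance_metrics(daily_returns, rf: float = 0.0, periods: int = 252) -> dict:
    """Annualised performance statistics from a daily return series."""
    r = np.asarray(daily_returns, dtype=float)
    # Geometric annualised return
    ann_return = (1 + r).prod() ** (periods / len(r)) - 1
    # Annualised volatility
    ann_vol = r.std(ddof=1) * np.sqrt(periods)
    sharpe = (ann_return - rf) / ann_vol if ann_vol > 0 else float("nan")
    # Max drawdown from the cumulative wealth curve
    wealth = (1 + r).cumprod()
    drawdown = wealth / np.maximum.accumulate(wealth) - 1
    return {"annual_return": ann_return, "annual_vol": ann_vol,
            "sharpe": sharpe, "max_drawdown": drawdown.min()}
```

Calmar is then `annual_return / abs(max_drawdown)`, and Sortino replaces the volatility denominator with the downside deviation of negative returns.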
| Factor | Return Contribution | Information Ratio |
|---|---|---|
| Value | 2.1% | 0.42 |
| Momentum | 3.4% | 0.71 |
| Quality | 1.8% | 0.56 |
| Low Volatility | 1.2% | 0.38 |
| ML Timing | 2.8% | 0.63 |
| Total Contribution | 11.3% | - |
| Risk Source | Contribution to Variance |
|---|---|
| Market (Beta) | 62.3% |
| Value Factor | 8.7% |
| Momentum Factor | 11.2% |
| Quality Factor | 5.4% |
| Size Factor | 4.1% |
| Idiosyncratic | 8.3% |
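A factor-model variance decomposition of the kind reported above can be sketched as follows; the function and its argument layout are illustrative assumptions.

```python
import numpy as np

def variance_contributions(weights, factor_exposures, factor_cov, idio_var) -> dict:
    """Split portfolio variance into factor and idiosyncratic parts (sketch).

    weights: (n,) portfolio weights; factor_exposures: (n, k) loadings;
    factor_cov: (k, k) factor covariance; idio_var: (n,) specific variances.
    """
    w = np.asarray(weights, dtype=float)
    B = np.asarray(factor_exposures, dtype=float)
    F = np.asarray(factor_cov, dtype=float)
    port_exposure = B.T @ w                           # portfolio factor loadings
    factor_var = port_exposure @ F @ port_exposure    # systematic variance
    idio = (w ** 2 * np.asarray(idio_var, dtype=float)).sum()
    total = factor_var + idio
    return {"factor_share": factor_var / total,
            "idiosyncratic_share": idio / total}
```

Per-factor shares like those in the table come from splitting `factor_var` term by term across the rows of `F`.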
| Market Regime | AlphaQuant Return | Benchmark Return | Outperformance |
|---|---|---|---|
| Bull Market | 18.4% | 16.2% | +2.2% |
| Bear Market | -8.2% | -14.7% | +6.5% |
| High Volatility | 11.3% | 8.9% | +2.4% |
| Low Volatility | 15.1% | 13.4% | +1.7% |
The strategy demonstrates defensive characteristics, significantly outperforming during bear markets and high-volatility periods.
| Cost Component | Annual Impact |
|---|---|
| Commissions | -0.42% |
| Spreads | -0.21% |
| Market Impact | -0.18% |
| Total Costs | -0.81% |
| Gross Alpha | 3.21% |
| Net Alpha | 2.40% |
| Configuration | Sharpe Ratio | Δ |
|---|---|---|
| Full AlphaQuant | 0.94 | - |
| Without ML timing | 0.78 | -0.16 |
| Without quality factor | 0.86 | -0.08 |
| Without momentum | 0.72 | -0.22 |
| Equal factor weights | 0.81 | -0.13 |
| No turnover constraint | 0.88 | -0.06 |
```
project-alphaquant/
├── README.md
├── requirements.txt
├── src/
│   ├── __init__.py
│   ├── factors.py           # Factor construction
│   ├── ml_model.py          # XGBoost prediction model
│   ├── optimiser.py         # Portfolio optimisation
│   ├── backtester.py        # Backtesting framework
│   ├── risk_analysis.py     # Risk decomposition
│   └── utils.py             # Data loading utilities
├── data/
│   ├── raw/                 # Original Kaggle dataset
│   ├── processed/           # Processed features
│   └── factors/             # Computed factor scores
├── models/
│   ├── xgboost_model.pkl    # Trained ML model
│   └── config.yaml          # Model configuration
├── outputs/
│   ├── backtest_results/    # Performance metrics
│   ├── visualisations/      # Charts and plots
│   └── portfolios/          # Historical positions
└── docs/
    ├── technical_report.pdf # Full documentation
    └── results.xlsx         # Detailed results
```
Quantitative Finance:
- Factor model construction (Fama-French methodology)
- Portfolio optimisation (mean-variance, risk parity)
- Risk decomposition and attribution
- Transaction cost analysis
Machine Learning:
- Gradient boosting (XGBoost) for time series
- Feature engineering for financial data
- Cross-validation for non-stationary data
- Ensemble methods and model interpretation
Technical Implementation:
- Python financial libraries (pandas, numpy, scipy)
- Optimisation solvers (cvxpy, quadprog)
- Backtesting frameworks
- Performance visualisation
Business Analytics:
- Risk-adjusted performance measurement
- Benchmark comparison and alpha attribution
- Regime analysis and conditional performance
- Cost-benefit analysis for trading strategies
```shell
# Install dependencies
pip install -r requirements.txt

# Download and extract the dataset
kaggle datasets download -d camnugent/sandp500
unzip sandp500.zip -d data/raw/

# Compute factor scores
python src/factors.py --input_dir data/raw --output_dir data/factors

# Train the prediction model
python src/ml_model.py --factors data/factors --output models/xgboost_model.pkl

# Run the backtest
python src/backtester.py --start_date 2018-01-01 --end_date 2024-01-01 --output outputs/backtest_results

# Generate the risk analysis
python src/risk_analysis.py --results outputs/backtest_results --output docs/
```

The most significant insight from this project concerns the relationship between theory and empiricism in quantitative finance. Factor investing has strong theoretical foundations. Value works because investors overpay for growth. Momentum works because information diffuses slowly. Quality works because investors undervalue sustainable profitability. These narratives provide intellectual coherence and psychological comfort during drawdowns.
However, theory alone does not generate alpha. The specific implementation choices, including factor definitions, weighting schemes, rebalancing frequencies, and cost management, ultimately determine whether theoretical premia translate into realised returns. Machine learning augmentation adds another layer: the ability to detect regime changes and adjust exposures dynamically.
The practical implication is that quantitative strategies require continuous refinement. Markets adapt, factors crowd, and implementation details matter enormously. A strategy that worked historically may not work prospectively without ongoing research and adaptation.
"The goal of systematic investing is not to predict the future, but to exploit persistent patterns with appropriate risk management. Machine learning helps identify when patterns are likely to persist and when they are likely to fail."
- Fama, E., and French, K. (1993). Common Risk Factors in the Returns on Stocks and Bonds. Journal of Financial Economics.
- Jegadeesh, N., and Titman, S. (1993). Returns to Buying Winners and Selling Losers. Journal of Finance.
- Asness, C., et al. (2019). Quality Minus Junk. Review of Accounting Studies.
- Ang, A., et al. (2006). The Cross-Section of Volatility and Expected Returns. Journal of Finance.
- Chen, T., and Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. KDD.
Pablo Williams | MSc Business Analytics, University College London | pablowilliams119@gmail.com | LinkedIn