Overview
Purpose
This lab is designed to be a long-running series of quantitative finance projects, where each project is:
- Readable (clear narrative, equations, and interpretation)
- Reproducible (same inputs → same results, consistent metrics)
- Reusable (core logic extracted into a Python package)
The long-term output is a portfolio of work that demonstrates both research depth and software discipline.
How projects are structured
Each project is written as a Quarto notebook and typically follows:
- Motivation & problem statement
- Mathematical formulation
- Implementation details
- Experiments and comparisons
- Diagnostics + failure modes
- Summary of results & takeaways
As patterns emerge, reusable components are moved into the quantfinlab package so future projects can build on stable foundations.
Repository layout (conceptual)
notebooks/
Project notebooks (Quarto). These are the public-facing research artifacts.
quantfinlab/
Reusable Python library modules shared across projects.
docs/
Rendered website output (GitHub Pages).
.github/workflows/, pyproject.toml, .pre-commit-config.yaml
Code quality + reproducibility tooling (lint/format/tests/CI).
The quantfinlab philosophy
The design rule is:
Notebooks are clients. The library is the product.
If code is used in more than one place—or is critical enough to deserve tests—it should live in quantfinlab.
Examples of what belongs in the library:
- curve construction helpers, bond pricing primitives, risk measures
- portfolio constraints, optimization wrappers, objective/penalty building blocks
- standardized performance metrics and plotting utilities
- data validation, common transformations, and safe defaults
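As a sketch of the kind of primitive that belongs in the library, a minimal fixed-coupon bond pricer might look like the following. The function name and signature are illustrative, not the package's actual API.

```python
import numpy as np

def bond_price(face: float, coupon_rate: float, ytm: float,
               periods: int, freq: int = 2) -> float:
    """Price a plain-vanilla fixed-coupon bond by discounting cash flows.

    periods is the number of coupon payments remaining; freq is payments
    per year. ytm is the annualized yield, compounded freq times per year.
    """
    c = face * coupon_rate / freq   # coupon paid each period
    y = ytm / freq                  # per-period discount rate
    t = np.arange(1, periods + 1)
    discount = (1.0 + y) ** -t      # discount factor for each payment date
    # Present value of coupons plus the face value at maturity.
    return float(c * discount.sum() + face * discount[-1])
```

A quick sanity check of the kind the lab's checklist calls for: a par bond (coupon rate equal to yield) should price at face value, e.g. `bond_price(100, 0.05, 0.05, 10)` returns 100.0.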
Data & reproducibility
Many finance datasets are large or licensed, so the repository generally avoids committing raw data.
Recommended approach
- Keep datasets in a local data/ directory (gitignored).
- Use clear filenames and a short data note inside each notebook:
  - required file(s)
  - expected columns
  - frequency and timezone assumptions
  - missing data handling expectations
Strong reproducibility standard
- A notebook should either:
  - run end-to-end from a clean clone once data is present, or
  - fail with a clear, friendly message explaining what’s missing and where to put it.
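A minimal sketch of the "fail with a friendly message" pattern, assuming the gitignored data/ convention above (the helper name and signature are hypothetical):

```python
from pathlib import Path

DATA_DIR = Path("data")  # local, gitignored data directory (assumed convention)

def require_dataset(filename, expected_columns=None):
    """Load a CSV from data/, failing early with an actionable message."""
    path = DATA_DIR / filename
    if not path.exists():
        raise FileNotFoundError(
            f"Missing dataset: {path}\n"
            f"Place '{filename}' in the '{DATA_DIR}/' directory "
            "(see the data note at the top of this notebook)."
        )
    import pandas as pd  # imported lazily so the error above needs no pandas
    df = pd.read_csv(path)
    if expected_columns:
        missing = set(expected_columns) - set(df.columns)
        if missing:
            raise ValueError(
                f"{path} is missing expected columns: {sorted(missing)}"
            )
    return df
```

Calling this at the top of a notebook makes a clean-clone run either succeed or stop immediately with instructions, rather than failing deep inside an analysis cell.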
Evaluation conventions (recommended)
To keep comparisons consistent across projects, the lab aims to standardize:
- Return definition (close-to-close vs open-to-open, etc.)
- Rebalance convention (timing, lookahead prevention)
- Transaction costs model (turnover-based, spread-based, fees)
- Metrics
  - At minimum: CAGR, volatility, Sharpe, max drawdown, turnover, hit rate
  - When relevant: tail metrics (VaR/CVaR), exposure diagnostics, constraint activity
If a metric can be misleading, the notebook should highlight it and include at least one complementary diagnostic plot/table.
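A sketch of how the minimum metric set might be standardized in one helper. Assumptions not stated in the text: simple periodic returns, a zero risk-free rate in the Sharpe ratio, and annualization via periods_per_year; turnover is omitted because it requires portfolio weights, and all names are illustrative.

```python
import numpy as np

def summary_metrics(returns, periods_per_year: int = 252) -> dict:
    """Standard performance metrics from simple periodic returns."""
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    growth = np.prod(1.0 + returns)              # total growth factor
    cagr = growth ** (periods_per_year / n) - 1  # annualized growth rate
    vol = returns.std(ddof=1) * np.sqrt(periods_per_year)
    sharpe = returns.mean() / returns.std(ddof=1) * np.sqrt(periods_per_year)
    equity = np.cumprod(1.0 + returns)           # equity curve from returns
    peak = np.maximum.accumulate(equity)         # running high-water mark
    max_drawdown = ((equity - peak) / peak).min()
    hit_rate = (returns > 0).mean()              # fraction of positive periods
    return {"cagr": cagr, "vol": vol, "sharpe": sharpe,
            "max_drawdown": max_drawdown, "hit_rate": hit_rate}
```

Computing every project's headline numbers through one such function is what makes cross-project comparisons meaningful.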
Quality checklist used in this lab
A project is considered “complete” when it includes:
- clear assumptions and constraints
- sanity checks and edge-case discussion
- comparison baselines (simple but strong references)
- reproducible plots/tables (consistent formatting and labels)
- reusable implementation extracted to quantfinlab where appropriate
- minimal tests for core functions when feasible
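For the last point, a minimal test module might look like this. The max_drawdown function is inlined as a stand-in for a quantfinlab import, since the package's real API isn't specified here; the test names follow pytest conventions.

```python
# test_metrics.py — minimal sanity tests for a core library function (sketch)
import numpy as np

def max_drawdown(equity):
    """Stand-in for a hypothetical quantfinlab risk primitive."""
    equity = np.asarray(equity, dtype=float)
    peak = np.maximum.accumulate(equity)  # running high-water mark
    return float(((equity - peak) / peak).min())

def test_monotone_equity_has_zero_drawdown():
    # An equity curve that only rises never draws down.
    assert max_drawdown([1.0, 1.1, 1.2, 1.3]) == 0.0

def test_known_drawdown():
    # Falling from a peak of 120 to 80 is a 1/3 drawdown.
    assert np.isclose(max_drawdown([100, 120, 80, 130]), -1.0 / 3.0)
```

Even two or three such tests catch sign errors and off-by-one bugs in the primitives every project depends on.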
Roadmap (high level)
- Expand the library into clearer submodules (fixed income, portfolio, risk, datasets)
- Add test coverage for high-value finance primitives
- Add standardized experiment configs + results tables at the end of each project
- Add small sample datasets for “out-of-the-box” runs when possible
- Add additional projects in areas like:
- risk attribution and factor models
- transaction cost modeling and turnover control
- robust optimization and regularization
- stress testing and scenario analysis
- yield curve strategies and bond portfolio PnL attribution
Disclaimer
This content is for education and research. It is not investment advice. Any results depend heavily on data quality, assumptions, costs, constraints, and implementation details.