Module 10: Real-World Capstone Project
Learning Objectives
By the end of this module, you will:
- Build a complete end-to-end financial analysis system
- Integrate all skills learned throughout the course
- Create a professional investment research report
- Develop an automated trading strategy with backtesting
- Build an interactive portfolio dashboard
- Present findings in a clear, compelling manner
- Document your work professionally
- Create a portfolio piece for job applications
10.1 Project Overview
The Challenge
You will build a comprehensive quantitative investment system that:
- Analyzes a universe of stocks
- Identifies investment opportunities
- Constructs an optimized portfolio
- Implements risk management
- Backtests performance
- Generates professional reports
This project demonstrates mastery of:
- Data acquisition and cleaning
- Exploratory data analysis
- Statistical testing
- Financial modeling
- Portfolio optimization
- Visualization and reporting
Project Structure
quantitative-investment-system/
│
├── data/
│   ├── raw/                  # Downloaded data
│   └── processed/            # Cleaned data
│
├── notebooks/
│   ├── 01_data_collection.ipynb
│   ├── 02_exploratory_analysis.ipynb
│   ├── 03_stock_screening.ipynb
│   ├── 04_portfolio_construction.ipynb
│   └── 05_backtesting.ipynb
│
├── src/
│   ├── data_pipeline.py
│   ├── analysis.py
│   ├── portfolio.py
│   └── visualization.py
│
├── reports/
│   └── investment_report.pdf
│
└── README.md
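The folder layout above can be created by hand, or with a few lines of Python. The following is a minimal sketch using only the standard library; the helper name `scaffold_project` and the README text are assumptions made for illustration, not part of the course code.

```python
from pathlib import Path

# Folder names taken from the project tree; the helper itself is hypothetical
PROJECT_DIRS = ["data/raw", "data/processed", "notebooks", "src", "reports"]

def scaffold_project(root):
    """Create the project folders and a starter README if they are missing."""
    root = Path(root)
    for sub in PROJECT_DIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text("# Quantitative Investment System\n")
    return root
```

Calling `scaffold_project("quantitative-investment-system")` is safe to repeat, since existing folders and an existing README are left untouched.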
10.2 Phase 1: Data Collection and Preparation
Setting Up the Data Pipeline
import yfinance as yf
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import os

class DataPipeline:
    """
    Comprehensive data pipeline for financial analysis
    """
    def __init__(self, data_dir='data'):
        self.data_dir = data_dir
        self.raw_dir = os.path.join(data_dir, 'raw')
        self.processed_dir = os.path.join(data_dir, 'processed')
        # Create directories
        os.makedirs(self.raw_dir, exist_ok=True)
        os.makedirs(self.processed_dir, exist_ok=True)

    def download_universe(self, tickers, start_date, end_date):
        """
        Download data for entire universe of stocks
        """
        print(f"Downloading data for {len(tickers)} tickers...")
        all_data = {}
        failed_tickers = []
        for i, ticker in enumerate(tickers, 1):
            try:
                print(f"  [{i}/{len(tickers)}] {ticker}...", end='')
                # auto_adjust=False keeps the 'Adj Close' column that the
                # rest of this module relies on
                data = yf.download(ticker, start=start_date, end=end_date,
                                   auto_adjust=False, progress=False)
                # Some yfinance versions return (field, ticker) MultiIndex
                # columns even for a single ticker; flatten to field names
                if isinstance(data.columns, pd.MultiIndex):
                    data.columns = data.columns.get_level_values(0)
                if not data.empty:
                    all_data[ticker] = data
                    print(" ✓")
                else:
                    failed_tickers.append(ticker)
                    print(" ✗ (no data)")
            except Exception as e:
                failed_tickers.append(ticker)
                print(f" ✗ ({str(e)})")
        print(f"\nSuccessfully downloaded: {len(all_data)}/{len(tickers)}")
        if failed_tickers:
            print(f"Failed: {failed_tickers}")
        return all_data
    def save_data(self, data_dict, filename):
        """
        Save data to disk
        """
        filepath = os.path.join(self.raw_dir, filename)
        if not data_dict:
            print("No data to save")
            return None
        # Combine all data in a single concat. assign() returns a copy, so
        # the caller's DataFrames do not gain a 'Ticker' column
        combined = pd.concat(
            [data.assign(Ticker=ticker) for ticker, data in data_dict.items()]
        )
        combined.to_csv(filepath)
        print(f"Data saved to {filepath}")
        return filepath

    def load_data(self, filename):
        """
        Load data from disk
        """
        filepath = os.path.join(self.raw_dir, filename)
        data = pd.read_csv(filepath, index_col=0, parse_dates=True)
        return data

    def clean_data(self, data):
        """
        Clean and prepare data for a single ticker
        """
        print("Cleaning data...")
        # Remove duplicate dates. Run this per ticker: in the combined file
        # the same date legitimately appears once for every ticker
        data = data[~data.index.duplicated(keep='last')]
        # Sort by date
        data = data.sort_index()
        # Forward fill missing values (up to 5 days)
        data = data.ffill(limit=5)
        # Remove remaining NaN
        data = data.dropna()
        print(f"Cleaned data: {len(data)} rows")
        return data
    def calculate_features(self, data):
        """
        Calculate technical and fundamental features
        """
        print("Calculating features...")
        features = pd.DataFrame(index=data.index)
        # Price-based features
        features['Close'] = data['Adj Close']
        features['Returns'] = data['Adj Close'].pct_change()
        features['Log_Returns'] = np.log(data['Adj Close'] / data['Adj Close'].shift(1))
        # Moving averages
        for period in [20, 50, 200]:
            features[f'SMA_{period}'] = data['Adj Close'].rolling(window=period).mean()
            features[f'Price_to_SMA_{period}'] = data['Adj Close'] / features[f'SMA_{period}']
        # Volatility
        features['Volatility_20'] = features['Returns'].rolling(window=20).std()
        features['Volatility_60'] = features['Returns'].rolling(window=60).std()
        # Momentum
        for period in [5, 10, 20, 60]:
            features[f'Momentum_{period}'] = data['Adj Close'].pct_change(periods=period)
        # Volume features
        features['Volume'] = data['Volume']
        features['Volume_MA_20'] = data['Volume'].rolling(window=20).mean()
        features['Volume_Ratio'] = data['Volume'] / features['Volume_MA_20']
        # RSI
        delta = data['Adj Close'].diff()
        gain = delta.where(delta > 0, 0).rolling(window=14).mean()
        loss = -delta.where(delta < 0, 0).rolling(window=14).mean()
        rs = gain / loss
        features['RSI'] = 100 - (100 / (1 + rs))
        # Bollinger Bands
        bb_middle = data['Adj Close'].rolling(window=20).mean()
        bb_std = data['Adj Close'].rolling(window=20).std()
        features['BB_Upper'] = bb_middle + (2 * bb_std)
        features['BB_Lower'] = bb_middle - (2 * bb_std)
        features['BB_Position'] = (data['Adj Close'] - features['BB_Lower']) / (features['BB_Upper'] - features['BB_Lower'])
        print(f"Calculated {len(features.columns)} features")
        return features
# Example usage
pipeline = DataPipeline()
# Define universe (example: tech stocks)
universe = [
'AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'NVDA', 'TSLA', 'NFLX',
'AMD', 'INTC', 'CRM', 'ORCL', 'ADBE', 'CSCO', 'AVGO', 'QCOM'
]
# Download data
start_date = '2020-01-01'
end_date = '2024-01-01'
stock_data = pipeline.download_universe(universe, start_date, end_date)
# Save data
pipeline.save_data(stock_data, 'universe_data.csv')
print("\n" + "="*60)
print("Data Collection Complete")
print("="*60)
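The saved CSV stacks every ticker into one long table, so each date appears once per ticker. Removing duplicate dates across that whole table would therefore keep only one ticker's rows. The sketch below, on synthetic prices, shows one way to apply the cleaning steps ticker by ticker instead. The function `clean_one` and the tickers `AAA` and `BBB` are hypothetical stand-ins, not part of the pipeline class.

```python
import numpy as np
import pandas as pd

def clean_one(frame):
    """Cleaning steps for one ticker: dedupe dates, sort, fill short gaps."""
    frame = frame[~frame.index.duplicated(keep="last")].sort_index()
    return frame.ffill(limit=5).dropna()

# Synthetic long-format data: two made-up tickers sharing the same dates
dates = pd.bdate_range("2023-01-02", periods=5)
long_data = pd.concat([
    pd.DataFrame({"Adj Close": [10.0, np.nan, 10.4, 10.6, 10.8],
                  "Ticker": "AAA"}, index=dates),
    pd.DataFrame({"Adj Close": [50.0, 51.0, 52.0, 53.0, 54.0],
                  "Ticker": "BBB"}, index=dates),
])

# Deduplicating the whole table by date keeps only the last ticker's rows
naive = long_data[~long_data.index.duplicated(keep="last")]

# Cleaning per ticker keeps every ticker and fills the one-day gap in AAA
cleaned = {ticker: clean_one(rows.drop(columns="Ticker"))
           for ticker, rows in long_data.groupby("Ticker")}
```

Here `naive` contains only `BBB` rows, while `cleaned` holds a five-row frame for each ticker.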
10.3 Phase 2: Stock Screening and Selection
Building a Quantitative Screening System
import pandas as pd
import numpy as np
from scipy import stats

class StockScreener:
    """
    Multi-factor stock screening system
    """
    def __init__(self, data_dict):
        self.data = data_dict
        self.scores = None

    def calculate_metrics(self):
        """
        Calculate screening metrics for each stock
        """
        metrics = {}
        for ticker, data in self.data.items():
            if len(data) < 252:  # Need at least 1 year of data
                continue
            prices = data['Adj Close']
            returns = prices.pct_change().dropna()
            # Year-to-date window: closes in the final calendar year of the
            # sample, measured from that year's first close
            ytd_prices = prices[prices.index.year == prices.index[-1].year]
            # Calculate metrics
            metrics[ticker] = {
                # Return metrics
                'Total_Return': (prices.iloc[-1] / prices.iloc[0] - 1) * 100,
                # len(returns) is the number of daily periods in the sample
                'Annual_Return': ((prices.iloc[-1] / prices.iloc[0]) ** (252 / len(returns)) - 1) * 100,
                'YTD_Return': (ytd_prices.iloc[-1] / ytd_prices.iloc[0] - 1) * 100,
                # Risk metrics
                'Volatility': returns.std() * np.sqrt(252) * 100,
                'Downside_Vol': returns[returns < 0].std() * np.sqrt(252) * 100,
                'Max_Drawdown': self._calculate_max_drawdown(prices),
                # Risk-adjusted metrics (risk-free rate taken as 0)
                'Sharpe_Ratio': (returns.mean() * 252) / (returns.std() * np.sqrt(252)),
                'Sortino_Ratio': (returns.mean() * 252) / (returns[returns < 0].std() * np.sqrt(252)),
                # Momentum metrics
                'Momentum_1M': (prices.iloc[-1] / prices.iloc[-21] - 1) * 100,
                'Momentum_3M': (prices.iloc[-1] / prices.iloc[-63] - 1) * 100,
                'Momentum_6M': (prices.iloc[-1] / prices.iloc[-126] - 1) * 100,
                # Technical indicators (.iloc for positional access)
                'RSI': self._calculate_rsi(prices).iloc[-1],
                'Price_to_SMA_50': (prices.iloc[-1] / prices.rolling(50).mean().iloc[-1] - 1) * 100,
                'Price_to_SMA_200': (prices.iloc[-1] / prices.rolling(200).mean().iloc[-1] - 1) * 100,
                # Volume
                'Avg_Volume': data['Volume'].mean(),
                'Volume_Trend': (data['Volume'].iloc[-20:].mean() / data['Volume'].iloc[-60:-20].mean() - 1) * 100
            }
        return pd.DataFrame(metrics).T
    def _calculate_max_drawdown(self, prices):
        """Calculate maximum drawdown (a negative percentage)"""
        cumulative = (1 + prices.pct_change()).cumprod()
        running_max = cumulative.expanding().max()
        drawdown = (cumulative - running_max) / running_max
        return drawdown.min() * 100

    def _calculate_rsi(self, prices, period=14):
        """Calculate RSI"""
        delta = prices.diff()
        gain = delta.where(delta > 0, 0).rolling(window=period).mean()
        loss = -delta.where(delta < 0, 0).rolling(window=period).mean()
        rs = gain / loss
        return 100 - (100 / (1 + rs))

    def rank_stocks(self, metrics_df):
        """
        Rank stocks based on multiple factors
        """
        scores = pd.DataFrame(index=metrics_df.index)
        # Factors where a higher value is better. Max_Drawdown is stored as a
        # negative percentage, so a higher (less negative) value is better
        for col in ['Annual_Return', 'Sharpe_Ratio', 'Momentum_3M', 'Momentum_6M', 'Max_Drawdown']:
            scores[f'{col}_Score'] = metrics_df[col].rank(pct=True)
        # Factors where a lower value is better (invert the ranking)
        for col in ['Volatility']:
            scores[f'{col}_Score'] = metrics_df[col].rank(pct=True, ascending=False)
        # Calculate composite score
        scores['Composite_Score'] = scores.mean(axis=1) * 100
        # Rank by composite score
        scores['Rank'] = scores['Composite_Score'].rank(ascending=False)
        return scores.sort_values('Composite_Score', ascending=False)
    def apply_filters(self, metrics_df, filters):
        """
        Apply screening filters
        """
        filtered = metrics_df.copy()
        for metric, (min_val, max_val) in filters.items():
            if min_val is not None:
                filtered = filtered[filtered[metric] >= min_val]
            if max_val is not None:
                filtered = filtered[filtered[metric] <= max_val]
        return filtered

    def screen(self, filters=None, top_n=10):
        """
        Run complete screening process
        """
        print("Running stock screening...")
        # Calculate metrics
        metrics = self.calculate_metrics()
        print(f"Analyzed {len(metrics)} stocks")
        # Apply filters
        if filters:
            metrics = self.apply_filters(metrics, filters)
            print(f"After filters: {len(metrics)} stocks")
        # Rank stocks
        scores = self.rank_stocks(metrics)
        # Get top stocks
        top_stocks = scores.head(top_n)
        # Combine metrics and scores
        results = pd.concat([metrics.loc[top_stocks.index], scores.loc[top_stocks.index]], axis=1)
        return results
# Example usage
screener = StockScreener(stock_data)
# Define filters
filters = {
'Volatility': (None, 40), # Max 40% volatility
'Sharpe_Ratio': (0.5, None), # Min 0.5 Sharpe
'Max_Drawdown': (-30, None), # Max 30% drawdown
'Avg_Volume': (1000000, None) # Minimum liquidity
}
# Run screening
top_stocks = screener.screen(filters=filters, top_n=10)
print("\n" + "="*60)
print("TOP 10 STOCKS")
print("="*60)
print(top_stocks[['Annual_Return', 'Sharpe_Ratio', 'Volatility',
'Max_Drawdown', 'Composite_Score', 'Rank']].to_string())
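To see how percentile ranks combine into a composite score, it can help to work through a table small enough to check by hand. The sketch below uses three made-up tickers and three of the screener's metrics. It assumes that drawdown is stored as a negative percentage, so a higher (less negative) value counts as better, while lower volatility counts as better. The numbers are illustrative only.

```python
import pandas as pd

# Made-up metrics for three hypothetical tickers
toy = pd.DataFrame(
    {"Sharpe_Ratio": [1.2, 0.8, 0.5],
     "Volatility": [35.0, 25.0, 20.0],
     "Max_Drawdown": [-40.0, -25.0, -15.0]},
    index=["AAA", "BBB", "CCC"],
)

higher_is_better = ["Sharpe_Ratio", "Max_Drawdown"]
lower_is_better = ["Volatility"]

scores = pd.DataFrame(index=toy.index)
for col in higher_is_better:
    # Largest value gets percentile rank 1.0
    scores[col] = toy[col].rank(pct=True)
for col in lower_is_better:
    # Smallest value gets percentile rank 1.0
    scores[col] = toy[col].rank(pct=True, ascending=False)

# Equal-weighted average of the factor scores, on a 0-100 scale
scores["Composite"] = scores.mean(axis=1) * 100
print(scores.sort_values("Composite", ascending=False))
```

In this toy table `AAA` has the best Sharpe ratio but the worst volatility and drawdown, so the equal-weighted composite places `CCC` first. An equal-weighted average is only one possible choice, and different factor weights would change the ordering.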
10.4 Phase 3: Portfolio Construction and Optimization
Building the Optimal Portfolio
from scipy.optimize import minimize
import matplotlib.pyplot as plt

class PortfolioOptimizer:
    """
    Portfolio optimization system
    """
    def __init__(self, returns_data):
        self.returns = returns_data
        self.mean_returns = returns_data.mean() * 252
        self.cov_matrix = returns_data.cov() * 252
        self.num_assets = len(returns_data.columns)

    def portfolio_stats(self, weights):
        """Calculate portfolio statistics"""
        portfolio_return = np.sum(self.mean_returns * weights)
        portfolio_std = np.sqrt(np.dot(weights.T, np.dot(self.cov_matrix, weights)))
        sharpe = portfolio_return / portfolio_std if portfolio_std > 0 else 0
        return {
            'return': portfolio_return,
            'volatility': portfolio_std,
            'sharpe': sharpe
        }

    def negative_sharpe(self, weights):
        """Objective function for optimization"""
        return -self.portfolio_stats(weights)['sharpe']

    def optimize_sharpe(self, constraints=None):
        """Optimize for maximum Sharpe ratio"""
        # Constraints
        cons = [{'type': 'eq', 'fun': lambda x: np.sum(x) - 1}]
        if constraints:
            cons.extend(constraints)
        # Bounds (0 to 1 for long-only)
        bounds = tuple((0, 1) for _ in range(self.num_assets))
        # Initial guess (equal weights)
        init_weights = np.array([1/self.num_assets] * self.num_assets)
        # Optimize
        result = minimize(
            self.negative_sharpe,
            init_weights,
            method='SLSQP',
            bounds=bounds,
            constraints=cons
        )
        return result.x
    def optimize_min_volatility(self, target_return=None):
        """Optimize for minimum volatility"""
        def portfolio_volatility(weights):
            return self.portfolio_stats(weights)['volatility']
        # Constraints
        cons = [{'type': 'eq', 'fun': lambda x: np.sum(x) - 1}]
        # 'is not None' so that a target return of 0 is still enforced
        if target_return is not None:
            cons.append({
                'type': 'eq',
                'fun': lambda x: self.portfolio_stats(x)['return'] - target_return
            })
        bounds = tuple((0, 1) for _ in range(self.num_assets))
        init_weights = np.array([1/self.num_assets] * self.num_assets)
        result = minimize(
            portfolio_volatility,
            init_weights,
            method='SLSQP',
            bounds=bounds,
            constraints=cons
        )
        # minimize() reports failure through result.success rather than by
        # raising, so surface infeasible targets explicitly
        if not result.success:
            raise ValueError(f"Optimization failed: {result.message}")
        return result.x

    def efficient_frontier(self, num_portfolios=50):
        """Generate efficient frontier"""
        # Get min and max returns
        min_vol_weights = self.optimize_min_volatility()
        max_sharpe_weights = self.optimize_sharpe()
        min_ret = self.portfolio_stats(min_vol_weights)['return']
        max_ret = self.portfolio_stats(max_sharpe_weights)['return'] * 1.2
        target_returns = np.linspace(min_ret, max_ret, num_portfolios)
        frontier_portfolios = []
        for target in target_returns:
            try:
                weights = self.optimize_min_volatility(target_return=target)
                stats = self.portfolio_stats(weights)
                frontier_portfolios.append([
                    stats['volatility'],
                    stats['return'],
                    stats['sharpe'],
                    weights
                ])
            except ValueError:
                # Skip target returns the long-only portfolio cannot reach
                continue
        return np.array(frontier_portfolios, dtype=object)
    def visualize_portfolio(self, weights, title="Portfolio Allocation"):
        """Visualize portfolio weights"""
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
        # Pie chart
        significant_weights = weights[weights > 0.01]
        labels = [self.returns.columns[i] for i, w in enumerate(weights) if w > 0.01]
        ax1.pie(significant_weights, labels=labels, autopct='%1.1f%%',
                startangle=90, textprops={'fontsize': 10})
        ax1.set_title(title, fontsize=14, fontweight='bold')
        # Bar chart
        ax2.bar(range(len(weights)), weights, color='#2E86AB', alpha=0.7, edgecolor='black')
        ax2.set_xticks(range(len(weights)))
        ax2.set_xticklabels(self.returns.columns, rotation=45, ha='right')
        ax2.set_ylabel('Weight', fontsize=12)
        ax2.set_title('Portfolio Weights', fontsize=14, fontweight='bold')
        ax2.grid(True, alpha=0.3, axis='y')
        plt.tight_layout()
        plt.show()
# Example usage
# Get returns for top stocks
top_tickers = top_stocks.index.tolist()[:10]
returns_data = pd.DataFrame()
for ticker in top_tickers:
    if ticker in stock_data:
        returns_data[ticker] = stock_data[ticker]['Adj Close'].pct_change()
returns_data = returns_data.dropna()
# Optimize portfolio
optimizer = PortfolioOptimizer(returns_data)
# Maximum Sharpe ratio portfolio
max_sharpe_weights = optimizer.optimize_sharpe()
max_sharpe_stats = optimizer.portfolio_stats(max_sharpe_weights)
print("\n" + "="*60)
print("OPTIMAL PORTFOLIO (Maximum Sharpe Ratio)")
print("="*60)
print(f"Expected Return: {max_sharpe_stats['return']*100:.2f}%")
print(f"Volatility: {max_sharpe_stats['volatility']*100:.2f}%")
print(f"Sharpe Ratio: {max_sharpe_stats['sharpe']:.2f}")
print("\nWeights:")
# Pair each weight with the optimizer's own column order
for ticker, weight in zip(returns_data.columns, max_sharpe_weights):
    if weight > 0.01:
        print(f"  {ticker}: {weight*100:.2f}%")
# Visualize
optimizer.visualize_portfolio(max_sharpe_weights, "Optimal Portfolio (Max Sharpe)")
# Generate efficient frontier
frontier = optimizer.efficient_frontier(50)
# Plot efficient frontier
plt.figure(figsize=(12, 8))
# Plot frontier
vols = [p[0] for p in frontier]
rets = [p[1] for p in frontier]
sharpes = [p[2] for p in frontier]
scatter = plt.scatter(np.array(vols)*100, np.array(rets)*100,
c=sharpes, cmap='viridis', s=50, alpha=0.6)
plt.colorbar(scatter, label='Sharpe Ratio')
# Mark optimal portfolios
plt.scatter(max_sharpe_stats['volatility']*100, max_sharpe_stats['return']*100,
marker='*', s=500, c='red', edgecolors='black',
label='Max Sharpe', zorder=3)
# Individual assets
for ticker in top_tickers:
    ret = optimizer.mean_returns[ticker]
    vol = np.sqrt(optimizer.cov_matrix.loc[ticker, ticker])
    plt.scatter(vol*100, ret*100, marker='o', s=100,
                edgecolors='black', linewidth=1.5, label=ticker)
plt.xlabel('Volatility (%)', fontsize=13)
plt.ylabel('Expected Return (%)', fontsize=13)
plt.title('Efficient Frontier', fontsize=16, fontweight='bold', pad=20)
plt.legend(loc='best', fontsize=9)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
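A numerical optimizer is easier to trust when it can be checked against a case with a known answer. If only the budget constraint applies (weights sum to 1), the minimum-variance portfolio has the closed form w = C⁻¹1 / (1ᵀC⁻¹1), where C is the covariance matrix. The sketch below evaluates that formula for two hypothetical assets. The volatilities and correlation are made-up inputs, and `min_variance_weights` is an illustrative helper rather than part of `PortfolioOptimizer`.

```python
import numpy as np

def min_variance_weights(cov):
    """Minimum-variance weights under the budget constraint only."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # C^-1 1 without forming the inverse
    return raw / raw.sum()

# Made-up annualised inputs: vols of 20% and 30%, correlation 0.25
vols = np.array([0.20, 0.30])
corr = np.array([[1.0, 0.25],
                 [0.25, 1.0]])
cov = np.outer(vols, vols) * corr

w = min_variance_weights(cov)
port_vol = np.sqrt(w @ cov @ w)
print(f"weights: {w.round(4)}, volatility: {port_vol:.4f}")
```

For these inputs the weights are 75% and 25%, and the portfolio volatility of about 18.4% is below that of either asset. Both weights fall between 0 and 1, so the long-only bounds are not binding here and a long-only minimum-volatility solver should land on roughly the same answer.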
10.5 Phase 4: Backtesting and Performance Analysis
Comprehensive Backtesting System
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

class Backtester:
    """
    Portfolio backtesting system
    """
    def __init__(self, prices, weights, rebalance_frequency='Q'):
        self.prices = prices
        self.weights = weights
        self.rebalance_freq = rebalance_frequency
        self.results = None

    def run_backtest(self, initial_capital=100000):
        """
        Run backtest with periodic rebalancing
        """
        print("Running backtest...")
        print(f"Initial Capital: ${initial_capital:,.0f}")
        print(f"Rebalancing: {self.rebalance_freq}")
        # Rebalance on the last trading day of each period. The index of
        # resample(...).last() holds calendar period ends, which can fall on
        # non-trading days and then never match a date in the price index
        trading_days = self.prices.index.to_series()
        periods = self.prices.index.to_period(self.rebalance_freq)
        rebal_dates = set(trading_days.groupby(periods).last())
        # Initialize
        portfolio_value = pd.Series(index=self.prices.index, dtype=float)
        shares = pd.Series(self.weights * initial_capital / self.prices.iloc[0],
                           index=self.prices.columns)
        trades = []
        for date in self.prices.index:
            # Calculate portfolio value
            current_value = (shares * self.prices.loc[date]).sum()
            portfolio_value.loc[date] = current_value
            # Rebalance if needed
            if date in rebal_dates and date != self.prices.index[0]:
                target_values = self.weights * current_value
                current_values = shares * self.prices.loc[date]
                # Calculate trades
                for ticker in self.prices.columns:
                    current = current_values[ticker]
                    target = target_values[ticker]
                    trade = (target - current) / self.prices.loc[date, ticker]
                    if abs(trade) > 0.1:  # Minimum trade size (in shares)
                        shares[ticker] += trade
                        trades.append({
                            'Date': date,
                            'Ticker': ticker,
                            'Shares': trade,
                            'Price': self.prices.loc[date, ticker],
                            'Value': trade * self.prices.loc[date, ticker]
                        })
        # Calculate metrics
        returns = portfolio_value.pct_change().dropna()
        results = {
            'portfolio_value': portfolio_value,
            'returns': returns,
            'trades': pd.DataFrame(trades),
            'final_value': portfolio_value.iloc[-1],
            'total_return': (portfolio_value.iloc[-1] / initial_capital - 1) * 100,
            # len(returns) is the number of daily periods in the backtest
            'cagr': ((portfolio_value.iloc[-1] / initial_capital) **
                     (252 / len(returns)) - 1) * 100,
            'volatility': returns.std() * np.sqrt(252) * 100,
            'sharpe': (returns.mean() * 252) / (returns.std() * np.sqrt(252)),
            'max_drawdown': self._calculate_max_drawdown(portfolio_value),
            'num_trades': len(trades)
        }
        self.results = results
        return results
    def _calculate_max_drawdown(self, portfolio_value):
        """Calculate maximum drawdown"""
        cummax = portfolio_value.expanding().max()
        drawdown = (portfolio_value - cummax) / cummax
        return drawdown.min() * 100

    def print_summary(self):
        """Print backtest summary"""
        if self.results is None:
            print("Run backtest first!")
            return
        r = self.results
        print("\n" + "="*60)
        print("BACKTEST RESULTS")
        print("="*60)
        print(f"Period: {self.prices.index[0].date()} to {self.prices.index[-1].date()}")
        print(f"Initial Capital: ${r['portfolio_value'].iloc[0]:,.2f}")
        print(f"Final Value: ${r['final_value']:,.2f}")
        print(f"\nPerformance:")
        print(f"  Total Return: {r['total_return']:.2f}%")
        print(f"  CAGR: {r['cagr']:.2f}%")
        print(f"  Volatility: {r['volatility']:.2f}%")
        print(f"  Sharpe Ratio: {r['sharpe']:.2f}")
        print(f"  Max Drawdown: {r['max_drawdown']:.2f}%")
        print(f"\nTrading:")
        print(f"  Total Trades: {r['num_trades']}")
        print(f"  Rebalancing Frequency: {self.rebalance_freq}")
        print("="*60)
    def plot_performance(self):
        """Visualize backtest results"""
        if self.results is None:
            print("Run backtest first!")
            return
        fig, axes = plt.subplots(3, 1, figsize=(14, 12), sharex=True)
        # Portfolio value
        axes[0].plot(self.results['portfolio_value'].index,
                     self.results['portfolio_value'].values,
                     linewidth=2, color='#2E86AB')
        axes[0].set_ylabel('Portfolio Value ($)', fontsize=11)
        axes[0].set_title('Portfolio Value Over Time', fontsize=13, fontweight='bold')
        axes[0].grid(True, alpha=0.3)
        axes[0].yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'${y:,.0f}'))
        # Drawdown
        cummax = self.results['portfolio_value'].expanding().max()
        drawdown = (self.results['portfolio_value'] - cummax) / cummax * 100
        axes[1].fill_between(drawdown.index, 0, drawdown.values,
                             color='red', alpha=0.3)
        axes[1].plot(drawdown.index, drawdown.values,
                     linewidth=2, color='darkred')
        axes[1].set_ylabel('Drawdown (%)', fontsize=11)
        axes[1].set_title('Portfolio Drawdown', fontsize=13, fontweight='bold')
        axes[1].grid(True, alpha=0.3)
        # Rolling Sharpe (252-day)
        rolling_sharpe = (
            self.results['returns'].rolling(window=252).mean() * 252 /
            (self.results['returns'].rolling(window=252).std() * np.sqrt(252))
        )
        axes[2].plot(rolling_sharpe.index, rolling_sharpe.values,
                     linewidth=2, color='#06A77D')
        axes[2].axhline(y=0, color='red', linestyle='--', linewidth=1, alpha=0.5)
        axes[2].set_ylabel('Sharpe Ratio', fontsize=11)
        axes[2].set_xlabel('Date', fontsize=11)
        axes[2].set_title('Rolling Sharpe Ratio (252-day)', fontsize=13, fontweight='bold')
        axes[2].grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()
    def compare_to_benchmark(self, benchmark_prices, benchmark_name='Benchmark'):
        """Compare portfolio to benchmark"""
        if self.results is None:
            print("Run backtest first!")
            return
        # Align dates
        common_dates = self.results['portfolio_value'].index.intersection(
            benchmark_prices.index
        )
        port_values = self.results['portfolio_value'].loc[common_dates]
        bench_values = benchmark_prices.loc[common_dates]
        # Normalize to 100
        port_norm = (port_values / port_values.iloc[0]) * 100
        bench_norm = (bench_values / bench_values.iloc[0]) * 100
        # Plot comparison
        plt.figure(figsize=(14, 7))
        plt.plot(port_norm.index, port_norm.values,
                 linewidth=2, label='Portfolio', color='#2E86AB')
        plt.plot(bench_norm.index, bench_norm.values,
                 linewidth=2, label=benchmark_name, color='#A23B72')
        plt.xlabel('Date', fontsize=12)
        plt.ylabel('Growth of $100', fontsize=12)
        plt.title('Portfolio vs Benchmark Performance',
                  fontsize=16, fontweight='bold', pad=20)
        plt.legend(fontsize=11)
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()
        # Calculate outperformance
        port_return = (port_values.iloc[-1] / port_values.iloc[0] - 1) * 100
        bench_return = (bench_values.iloc[-1] / bench_values.iloc[0] - 1) * 100
        print(f"\nPerformance Comparison:")
        print(f"  Portfolio Return: {port_return:.2f}%")
        print(f"  {benchmark_name} Return: {bench_return:.2f}%")
        print(f"  Outperformance: {port_return - bench_return:+.2f}%")
# Example usage
# Prepare price data
price_data = pd.DataFrame()
for ticker in top_tickers:
    if ticker in stock_data:
        price_data[ticker] = stock_data[ticker]['Adj Close']
price_data = price_data.dropna()
# Run backtest
# Note: max_sharpe_weights were fitted on this same period, so the
# figures below are in-sample results
backtester = Backtester(price_data, max_sharpe_weights, rebalance_frequency='Q')
results = backtester.run_backtest(initial_capital=100000)
# Print summary
backtester.print_summary()
# Visualize
backtester.plot_performance()
# Compare to S&P 500
# auto_adjust=False keeps the 'Adj Close' column
spy_data = yf.download('^GSPC', start=price_data.index[0],
                       end=price_data.index[-1], auto_adjust=False, progress=False)
# squeeze() turns a one-column DataFrame (returned by yfinance versions
# that use MultiIndex columns) into a Series
backtester.compare_to_benchmark(spy_data['Adj Close'].squeeze(), 'S&P 500')
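Periodic rebalancing depends on picking dates that actually exist in the price index. Calendar quarter ends often fall on weekends or holidays, so one option is to take the last available trading day in each period. The sketch below illustrates this on a synthetic business-day calendar. The helper `last_trading_days` is a hypothetical name, and real exchange calendars also skip holidays that `bdate_range` does not know about.

```python
import pandas as pd

def last_trading_days(index, freq="Q"):
    """Return the last date present in `index` for each period of `freq`."""
    trading_days = index.to_series()
    periods = index.to_period(freq)
    return pd.DatetimeIndex(trading_days.groupby(periods).last())

# Synthetic calendar: Monday to Friday, 2022-10-03 through 2023-03-31
days = pd.bdate_range("2022-10-03", "2023-03-31")
rebal = last_trading_days(days)

print(rebal)
# 2022-12-31 is a Saturday, so the Q4 rebalance date is Friday 2022-12-30
print(pd.Timestamp("2022-12-31") in days)
```

Every date in `rebal` is guaranteed to be in the original index, so a membership test inside a backtest loop will match once per period.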
10.6 Phase 5: Report Generation
Creating a Professional Investment Report
from datetime import datetime
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

class ReportGenerator:
    """
    Generate professional investment reports
    """
    def __init__(self, portfolio_data, backtest_results, screening_results):
        self.portfolio = portfolio_data
        self.backtest = backtest_results
        self.screening = screening_results
        self.timestamp = datetime.now()

    def generate_pdf_report(self, filename='investment_report.pdf'):
        """
        Generate comprehensive PDF report
        """
        print(f"Generating PDF report: {filename}")
        # One builder per page, in report order
        page_builders = [
            '_create_cover_page',         # Page 1: Cover Page
            '_create_executive_summary',  # Page 2: Executive Summary
            '_create_screening_page',     # Page 3: Screening Results
            '_create_portfolio_page',     # Page 4: Portfolio Construction
            '_create_performance_page',   # Page 5: Performance Analysis
            '_create_risk_page',          # Page 6: Risk Analysis
        ]
        with PdfPages(filename) as pdf:
            for name in page_builders:
                builder = getattr(self, name, None)
                if builder is None:
                    # Pages 3-6 are left for you to implement; skip them
                    # instead of failing with AttributeError
                    print(f"  Skipping {name} (not implemented yet)")
                    continue
                builder(pdf)
        print(f"Report saved: {filename}")

    def _create_cover_page(self, pdf):
        """Create cover page"""
        fig = plt.figure(figsize=(8.5, 11))
        fig.text(0.5, 0.7, 'Quantitative Investment Analysis',
                 ha='center', fontsize=24, fontweight='bold')
        fig.text(0.5, 0.65, 'Portfolio Optimization Report',
                 ha='center', fontsize=18)
        fig.text(0.5, 0.5, f'Generated: {self.timestamp.strftime("%B %d, %Y")}',
                 ha='center', fontsize=12)
        fig.text(0.5, 0.3, 'Data Analytics & Python for Finance',
                 ha='center', fontsize=14, style='italic')
        plt.axis('off')
        pdf.savefig(fig, bbox_inches='tight')
        plt.close(fig)
    def _create_executive_summary(self, pdf):
        """Create executive summary page"""
        fig, ax = plt.subplots(figsize=(8.5, 11))
        ax.axis('off')
        summary_text = f"""
EXECUTIVE SUMMARY
{'='*70}
Investment Strategy
• Quantitative screening of tech sector stocks
• Multi-factor ranking system
• Portfolio optimization for maximum risk-adjusted returns
• Quarterly rebalancing
Key Results
• Portfolio Return: {self.backtest['total_return']:.2f}%
• Sharpe Ratio: {self.backtest['sharpe']:.2f}
• Maximum Drawdown: {self.backtest['max_drawdown']:.2f}%
• Number of Holdings: {len([w for w in self.portfolio['weights'] if w > 0.01])}
Recommendation
Based on quantitative analysis, this portfolio demonstrates strong
risk-adjusted returns with controlled downside risk. The diversified
allocation across selected technology stocks provides exposure to
growth while managing volatility.
"""
        ax.text(0.1, 0.9, summary_text, transform=ax.transAxes,
                fontsize=11, verticalalignment='top', fontfamily='monospace')
        pdf.savefig(fig, bbox_inches='tight')
        plt.close(fig)

    # Additional page creation methods would follow...
    # (screening page, portfolio page, performance page, risk page)
# Generate report
# report_gen = ReportGenerator(
# portfolio_data={'weights': max_sharpe_weights, 'tickers': top_tickers},
# backtest_results=results,
# screening_results=top_stocks
# )
# report_gen.generate_pdf_report('my_investment_report.pdf')
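The remaining report pages are left open above. As a starting point, the sketch below shows one possible shape for such a page: a function that renders label and value pairs as a table and appends it to a `PdfPages` document. The function name `add_table_page`, the output filename, and the numbers in the table are all placeholders invented for this illustration and are not results from the backtest.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

def add_table_page(pdf, title, rows):
    """Hypothetical page builder: render (label, value) rows as a table."""
    fig, ax = plt.subplots(figsize=(8.5, 11))
    ax.axis("off")
    ax.set_title(title, fontsize=14, fontweight="bold")
    ax.table(cellText=[[label, value] for label, value in rows],
             colLabels=["Metric", "Value"], loc="center")
    pdf.savefig(fig)
    plt.close(fig)

# Write a one-page demo PDF into a temporary folder
out_path = os.path.join(tempfile.mkdtemp(), "demo_report.pdf")
with PdfPages(out_path) as pdf:
    add_table_page(pdf, "Performance (placeholder numbers)",
                   [("Total Return", "12.3%"), ("Sharpe Ratio", "0.85")])
    page_count = pdf.get_pagecount()

print(f"Wrote {page_count} page(s) to {out_path}")
```

Inside `ReportGenerator`, the same body could become a method such as `_create_performance_page(self, pdf)` that reads its rows from `self.backtest`.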
10.7 Project Deliverables
Final Checklist
Code
- Data pipeline (collection, cleaning, feature engineering)
- Stock screener with multi-factor ranking
- Portfolio optimizer with constraints
- Backtesting system with rebalancing
- Visualization functions
- All code well-commented and documented
Analysis
- Exploratory data analysis of universe
- Statistical testing of selection criteria
- Correlation and risk analysis
- Factor attribution
- Sensitivity analysis of portfolio
- Comparison to benchmarks
Documentation
- README with project overview
- Code documentation and docstrings
- Jupyter notebooks with analysis
- Professional PDF report
- Presentation slides (optional)
Visualizations
- Stock performance charts
- Correlation heatmaps
- Efficient frontier
- Portfolio allocation charts
- Backtest performance graphs
- Risk metrics visualizations
Module 10 Summary
Congratulations on completing your capstone project!
What You've Built
A complete, professional-grade quantitative investment system that:
- Automatically collects and processes financial data
- Screens stocks using multiple criteria
- Constructs optimized portfolios
- Backtests strategies with periodic rebalancing
- Generates institutional-quality reports
Skills Demonstrated
Technical Skills
- Python programming and software design
- Data pipeline development
- Statistical analysis and hypothesis testing
- Financial modeling and valuation
- Portfolio theory and optimization
- Backtesting and performance attribution
Financial Skills
- Stock screening and selection
- Risk management
- Portfolio construction
- Performance measurement
- Comparative analysis
Professional Skills
- Project organization and documentation
- Clear communication of findings
- Report generation
- Reproducible research
This Project is Portfolio-Ready
You now have a complete project you can:
- Showcase to potential employers
- Include in your GitHub portfolio
- Present in interviews
- Extend with additional features
- Use as a foundation for real trading systems
Next Steps
- Refine: Add features like transaction costs, taxes, or alternative strategies
- Extend: Try different asset classes or international markets
- Deploy: Build a web dashboard or automated system
- Share: Write about your methodology and findings
- Learn: Continue to Module 11 for advanced topics
You're Now a Quantitative Analyst
This project demonstrates real quantitative analyst capabilities. You've gone from Python basics to building a complete investment system. That's a remarkable achievement.
The skills you've developed are in high demand at:
- Hedge funds and asset managers
- Investment banks
- Fintech companies
- Corporate finance departments
- Trading firms
Keep building, keep learning, and keep pushing your limits.

