Module 10: Real-World Capstone Project

Learning Objectives

By the end of this module, you will:

Build a complete end-to-end financial analysis system
Integrate all skills learned throughout the course
Create a professional investment research report
Develop an automated trading strategy with backtesting
Build an interactive portfolio dashboard
Present findings in a clear, compelling manner
Document your work professionally
Create a portfolio piece for job applications

10.1 Project Overview

The Challenge

You will build a comprehensive quantitative investment system that:

Analyzes a universe of stocks
Identifies investment opportunities
Constructs an optimized portfolio
Implements risk management
Backtests performance
Generates professional reports

This project demonstrates mastery of:

Data acquisition and cleaning
Exploratory data analysis
Statistical testing
Financial modeling
Portfolio optimization
Visualization and reporting

Project Structure

quantitative-investment-system/
│
├── data/
│   ├── raw/              # Downloaded data
│   └── processed/        # Cleaned data
│
├── notebooks/
│   ├── 01_data_collection.ipynb
│   ├── 02_exploratory_analysis.ipynb
│   ├── 03_stock_screening.ipynb
│   ├── 04_portfolio_construction.ipynb
│   └── 05_backtesting.ipynb
│
├── src/
│   ├── data_pipeline.py
│   ├── analysis.py
│   ├── portfolio.py
│   └── visualization.py
│
├── reports/
│   └── investment_report.pdf
│
└── README.md
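
The layout above can be created by hand, but a short script keeps it reproducible. The following is a minimal sketch using only the standard library; the helper name `scaffold_project` and the choice to create empty placeholder files are assumptions for illustration, not part of the course code. Notebooks and the PDF report are left out because they are produced later in the project.

```python
from pathlib import Path

# Directory and file names mirror the project tree shown above.
PROJECT_DIRS = ["data/raw", "data/processed", "notebooks", "src", "reports"]
PROJECT_FILES = [
    "src/data_pipeline.py",
    "src/analysis.py",
    "src/portfolio.py",
    "src/visualization.py",
    "README.md",
]


def scaffold_project(root):
    """Create the folder layout and empty placeholder files under `root`."""
    root = Path(root)
    for directory in PROJECT_DIRS:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file_name in PROJECT_FILES:
        # touch() leaves existing files untouched, so reruns are safe
        (root / file_name).touch(exist_ok=True)
    return root
```

Calling `scaffold_project("quantitative-investment-system")` from the parent folder builds the tree; running it a second time changes nothing.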

10.2 Phase 1: Data Collection and Preparation

Setting Up the Data Pipeline

import yfinance as yf
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import os

class DataPipeline:
    """
    Comprehensive data pipeline for financial analysis
    """
    def __init__(self, data_dir='data'):
        self.data_dir = data_dir
        self.raw_dir = os.path.join(data_dir, 'raw')
        self.processed_dir = os.path.join(data_dir, 'processed')
        
        # Create directories
        os.makedirs(self.raw_dir, exist_ok=True)
        os.makedirs(self.processed_dir, exist_ok=True)
    
    def download_universe(self, tickers, start_date, end_date):
        """
        Download data for entire universe of stocks
        """
        print(f"Downloading data for {len(tickers)} tickers...")
        
        all_data = {}
        failed_tickers = []
        
        for i, ticker in enumerate(tickers, 1):
            try:
                print(f"  [{i}/{len(tickers)}] {ticker}...", end='')
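                # auto_adjust=False keeps the 'Adj Close' column used later
                data = yf.download(ticker, start=start_date, end=end_date,
                                   auto_adjust=False, progress=False)

                # Newer yfinance versions may return MultiIndex columns
                # (field, ticker); keep only the field level
                if isinstance(data.columns, pd.MultiIndex):
                    data.columns = data.columns.get_level_values(0)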
                
                if not data.empty:
                    all_data[ticker] = data
                    print(" ✓")
                else:
                    failed_tickers.append(ticker)
                    print(" ✗ (no data)")
                    
            except Exception as e:
                failed_tickers.append(ticker)
                print(f" ✗ ({str(e)})")
        
        print(f"\nSuccessfully downloaded: {len(all_data)}/{len(tickers)}")
        if failed_tickers:
            print(f"Failed: {failed_tickers}")
        
        return all_data
    
    def save_data(self, data_dict, filename):
        """
        Save data to disk
        """
        filepath = os.path.join(self.raw_dir, filename)
        
        # Combine all data, tagging each row with its ticker
        # (assign() returns a copy, so the caller's frames are not modified)
        combined = pd.concat(
            [data.assign(Ticker=ticker) for ticker, data in data_dict.items()]
        )
        
        combined.to_csv(filepath)
        print(f"Data saved to {filepath}")
        
        return filepath
    
    def load_data(self, filename):
        """
        Load data from disk
        """
        filepath = os.path.join(self.raw_dir, filename)
        data = pd.read_csv(filepath, index_col=0, parse_dates=True)
        return data
    
    def clean_data(self, data):
        """
        Clean and prepare one ticker's data (dates must be unique,
        so run this per ticker rather than on the combined file)
        """
        print("Cleaning data...")
        
        # Remove duplicates
        data = data[~data.index.duplicated(keep='last')]
        
        # Sort by date
        data = data.sort_index()
        
        # Forward fill missing values (up to 5 days)
        # fillna(method='ffill') is deprecated in recent pandas; use ffill()
        data = data.ffill(limit=5)
        
        # Remove remaining NaN
        data = data.dropna()
        
        print(f"Cleaned data: {len(data)} rows")
        
        return data
    
    def calculate_features(self, data):
        """
        Calculate price- and volume-based technical features
        """
        print("Calculating features...")
        
        features = pd.DataFrame(index=data.index)
        
        # Price-based features
        features['Close'] = data['Adj Close']
        features['Returns'] = data['Adj Close'].pct_change()
        features['Log_Returns'] = np.log(data['Adj Close'] / data['Adj Close'].shift(1))
        
        # Moving averages
        for period in [20, 50, 200]:
            features[f'SMA_{period}'] = data['Adj Close'].rolling(window=period).mean()
            features[f'Price_to_SMA_{period}'] = data['Adj Close'] / features[f'SMA_{period}']
        
        # Volatility
        features['Volatility_20'] = features['Returns'].rolling(window=20).std()
        features['Volatility_60'] = features['Returns'].rolling(window=60).std()
        
        # Momentum
        for period in [5, 10, 20, 60]:
            features[f'Momentum_{period}'] = data['Adj Close'].pct_change(periods=period)
        
        # Volume features
        features['Volume'] = data['Volume']
        features['Volume_MA_20'] = data['Volume'].rolling(window=20).mean()
        features['Volume_Ratio'] = data['Volume'] / features['Volume_MA_20']
        
        # RSI
        delta = data['Adj Close'].diff()
        gain = delta.where(delta > 0, 0).rolling(window=14).mean()
        loss = -delta.where(delta < 0, 0).rolling(window=14).mean()
        rs = gain / loss
        features['RSI'] = 100 - (100 / (1 + rs))
        
        # Bollinger Bands
        bb_middle = data['Adj Close'].rolling(window=20).mean()
        bb_std = data['Adj Close'].rolling(window=20).std()
        features['BB_Upper'] = bb_middle + (2 * bb_std)
        features['BB_Lower'] = bb_middle - (2 * bb_std)
        features['BB_Position'] = (data['Adj Close'] - features['BB_Lower']) / (features['BB_Upper'] - features['BB_Lower'])
        
        print(f"Calculated {len(features.columns)} features")
        
        return features

# Example usage
pipeline = DataPipeline()

# Define universe (example: tech stocks)
universe = [
    'AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'NVDA', 'TSLA', 'NFLX',
    'AMD', 'INTC', 'CRM', 'ORCL', 'ADBE', 'CSCO', 'AVGO', 'QCOM'
]

# Download data
start_date = '2020-01-01'
end_date = '2024-01-01'

stock_data = pipeline.download_universe(universe, start_date, end_date)

# Save data
pipeline.save_data(stock_data, 'universe_data.csv')

print("\n" + "="*60)
print("Data Collection Complete")
print("="*60)
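
Before cleaning, it helps to see how complete each download is. The sketch below is one possible check, not part of the pipeline above: the function name `summarize_quality` is hypothetical, and the two small frames are made-up stand-ins for the per-ticker data returned by `download_universe`.

```python
import numpy as np
import pandas as pd


def summarize_quality(data_dict):
    """Return one row per ticker: date range, row count, and data gaps."""
    rows = {}
    for ticker, df in data_dict.items():
        rows[ticker] = {
            "start": df.index.min(),
            "end": df.index.max(),
            "rows": len(df),
            "missing_cells": int(df.isna().sum().sum()),
            "duplicate_dates": int(df.index.duplicated().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic stand-in for downloaded data (all values are made up)
dates = pd.bdate_range("2023-01-02", periods=5)
demo = {
    "AAA": pd.DataFrame({"Adj Close": [10.0, 10.5, np.nan, 10.7, 11.0],
                         "Volume": [100, 120, 90, 110, 105]}, index=dates),
    "BBB": pd.DataFrame({"Adj Close": [20.0, 19.5, 19.8, 20.1, 20.4],
                         "Volume": [200, 210, 190, 205, 220]}, index=dates),
}

report = summarize_quality(demo)
print(report)
```

Tickers with many missing cells or a late start date are candidates for removal from the universe before screening.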

10.3 Phase 2: Stock Screening and Selection

Building a Quantitative Screening System

import pandas as pd
import numpy as np
from scipy import stats

class StockScreener:
    """
    Multi-factor stock screening system
    """
    def __init__(self, data_dict):
        self.data = data_dict
        self.scores = None
    
    def calculate_metrics(self):
        """
        Calculate screening metrics for each stock
        """
        metrics = {}
        
        for ticker, data in self.data.items():
            if len(data) < 252:  # Need at least 1 year of data
                continue
            
            prices = data['Adj Close']
            returns = prices.pct_change().dropna()
            
            # Calculate metrics
            metrics[ticker] = {
                # Return metrics
                'Total_Return': (prices.iloc[-1] / prices.iloc[0] - 1) * 100,
                'Annual_Return': ((prices.iloc[-1] / prices.iloc[0]) ** (252/len(prices)) - 1) * 100,
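                # Return since the first trading day of the latest calendar year
                'YTD_Return': (prices.iloc[-1] / prices[prices.index.year == prices.index[-1].year].iloc[0] - 1) * 100,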
                
                # Risk metrics
                'Volatility': returns.std() * np.sqrt(252) * 100,
                'Downside_Vol': returns[returns < 0].std() * np.sqrt(252) * 100,
                'Max_Drawdown': self._calculate_max_drawdown(prices),
                
                # Risk-adjusted metrics
                'Sharpe_Ratio': (returns.mean() * 252) / (returns.std() * np.sqrt(252)),
                'Sortino_Ratio': (returns.mean() * 252) / (returns[returns < 0].std() * np.sqrt(252)),
                
                # Momentum metrics
                'Momentum_1M': (prices.iloc[-1] / prices.iloc[-21] - 1) * 100,
                'Momentum_3M': (prices.iloc[-1] / prices.iloc[-63] - 1) * 100,
                'Momentum_6M': (prices.iloc[-1] / prices.iloc[-126] - 1) * 100,
                
                # Technical indicators
                'RSI': self._calculate_rsi(prices).iloc[-1],
                'Price_to_SMA_50': (prices.iloc[-1] / prices.rolling(50).mean().iloc[-1] - 1) * 100,
                'Price_to_SMA_200': (prices.iloc[-1] / prices.rolling(200).mean().iloc[-1] - 1) * 100,
                
                # Volume
                'Avg_Volume': data['Volume'].mean(),
                'Volume_Trend': (data['Volume'].iloc[-20:].mean() / data['Volume'].iloc[-60:-20].mean() - 1) * 100
            }
        
        return pd.DataFrame(metrics).T
    
    def _calculate_max_drawdown(self, prices):
        """Calculate maximum drawdown"""
        cumulative = (1 + prices.pct_change()).cumprod()
        running_max = cumulative.expanding().max()
        drawdown = (cumulative - running_max) / running_max
        return drawdown.min() * 100
    
    def _calculate_rsi(self, prices, period=14):
        """Calculate RSI"""
        delta = prices.diff()
        gain = delta.where(delta > 0, 0).rolling(window=period).mean()
        loss = -delta.where(delta < 0, 0).rolling(window=period).mean()
        rs = gain / loss
        return 100 - (100 / (1 + rs))
    
    def rank_stocks(self, metrics_df):
        """
        Rank stocks based on multiple factors
        """
        scores = pd.DataFrame(index=metrics_df.index)
        
        # Rank each metric so that a higher score is always better
        # Higher-is-better factors (Max_Drawdown is a negative percentage,
        # so a value closer to zero ranks higher)
        for col in ['Annual_Return', 'Sharpe_Ratio', 'Momentum_3M', 'Momentum_6M', 'Max_Drawdown']:
            scores[f'{col}_Score'] = metrics_df[col].rank(pct=True)
        
        # Lower-is-better factors (invert)
        for col in ['Volatility']:
            scores[f'{col}_Score'] = (1 - metrics_df[col].rank(pct=True))
        
        # Calculate composite score
        scores['Composite_Score'] = scores.mean(axis=1) * 100
        
        # Rank by composite score
        scores['Rank'] = scores['Composite_Score'].rank(ascending=False)
        
        return scores.sort_values('Composite_Score', ascending=False)
    
    def apply_filters(self, metrics_df, filters):
        """
        Apply screening filters
        """
        filtered = metrics_df.copy()
        
        for metric, (min_val, max_val) in filters.items():
            if min_val is not None:
                filtered = filtered[filtered[metric] >= min_val]
            if max_val is not None:
                filtered = filtered[filtered[metric] <= max_val]
        
        return filtered
    
    def screen(self, filters=None, top_n=10):
        """
        Run complete screening process
        """
        print("Running stock screening...")
        
        # Calculate metrics
        metrics = self.calculate_metrics()
        print(f"Analyzed {len(metrics)} stocks")
        
        # Apply filters
        if filters:
            metrics = self.apply_filters(metrics, filters)
            print(f"After filters: {len(metrics)} stocks")
        
        # Rank stocks
        scores = self.rank_stocks(metrics)
        
        # Get top stocks
        top_stocks = scores.head(top_n)
        
        # Combine metrics and scores
        results = pd.concat([metrics.loc[top_stocks.index], scores.loc[top_stocks.index]], axis=1)
        
        return results

# Example usage
screener = StockScreener(stock_data)

# Define filters
filters = {
    'Volatility': (None, 40),  # Max 40% volatility
    'Sharpe_Ratio': (0.5, None),  # Min 0.5 Sharpe
    'Max_Drawdown': (-30, None),  # Max 30% drawdown
    'Avg_Volume': (1000000, None)  # Minimum liquidity
}

# Run screening
top_stocks = screener.screen(filters=filters, top_n=10)

print("\n" + "="*60)
print("TOP 10 STOCKS")
print("="*60)
print(top_stocks[['Annual_Return', 'Sharpe_Ratio', 'Volatility', 
                  'Max_Drawdown', 'Composite_Score', 'Rank']].to_string())
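
Factor direction is easy to get wrong when a metric such as Max_Drawdown is stored as a negative percentage. The toy example below, with made-up numbers for three hypothetical tickers, shows percentile ranking where every score points the same way (higher is better). It uses `rank(ascending=False)` for the lower-is-better factor, which is one way to invert a rank and an assumption of this sketch.

```python
import pandas as pd

# Made-up metrics for three hypothetical tickers
metrics = pd.DataFrame(
    {
        "Sharpe_Ratio": [1.2, 0.8, 0.5],        # higher is better
        "Volatility": [30.0, 20.0, 25.0],       # percent, lower is better
        "Max_Drawdown": [-35.0, -15.0, -25.0],  # percent, closer to 0 is better
    },
    index=["AAA", "BBB", "CCC"],
)

scores = pd.DataFrame(index=metrics.index)
scores["Sharpe_Score"] = metrics["Sharpe_Ratio"].rank(pct=True)
# ascending=False gives the lowest volatility the highest percentile
scores["Volatility_Score"] = metrics["Volatility"].rank(pct=True, ascending=False)
# Drawdowns are negative, so the plain rank already rewards shallow drawdowns
scores["Drawdown_Score"] = metrics["Max_Drawdown"].rank(pct=True)
scores["Composite_Score"] = scores.mean(axis=1) * 100

print(scores.sort_values("Composite_Score", ascending=False))
```

BBB comes out on top: it has the lowest volatility and the shallowest drawdown, which outweighs its mid-table Sharpe ratio.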

10.4 Phase 3: Portfolio Construction and Optimization

Building the Optimal Portfolio

from scipy.optimize import minimize
import matplotlib.pyplot as plt

class PortfolioOptimizer:
    """
    Portfolio optimization system
    """
    def __init__(self, returns_data):
        self.returns = returns_data
        self.mean_returns = returns_data.mean() * 252
        self.cov_matrix = returns_data.cov() * 252
        self.num_assets = len(returns_data.columns)
    
    def portfolio_stats(self, weights):
        """Calculate portfolio statistics"""
        portfolio_return = np.sum(self.mean_returns * weights)
        portfolio_std = np.sqrt(np.dot(weights.T, np.dot(self.cov_matrix, weights)))
        # Sharpe ratio here assumes a risk-free rate of zero
        sharpe = portfolio_return / portfolio_std if portfolio_std > 0 else 0
        
        return {
            'return': portfolio_return,
            'volatility': portfolio_std,
            'sharpe': sharpe
        }
    
    def negative_sharpe(self, weights):
        """Objective function for optimization"""
        return -self.portfolio_stats(weights)['sharpe']
    
    def optimize_sharpe(self, constraints=None):
        """Optimize for maximum Sharpe ratio"""
        # Constraints
        cons = [{'type': 'eq', 'fun': lambda x: np.sum(x) - 1}]
        if constraints:
            cons.extend(constraints)
        
        # Bounds (0 to 1 for long-only)
        bounds = tuple((0, 1) for _ in range(self.num_assets))
        
        # Initial guess (equal weights)
        init_weights = np.array([1/self.num_assets] * self.num_assets)
        
        # Optimize
        result = minimize(
            self.negative_sharpe,
            init_weights,
            method='SLSQP',
            bounds=bounds,
            constraints=cons
        )
        
        return result.x
    
    def optimize_min_volatility(self, target_return=None):
        """Optimize for minimum volatility"""
        def portfolio_volatility(weights):
            return self.portfolio_stats(weights)['volatility']
        
        # Constraints
        cons = [{'type': 'eq', 'fun': lambda x: np.sum(x) - 1}]
        
        if target_return is not None:  # a target of 0.0 is still a valid target
            cons.append({
                'type': 'eq',
                'fun': lambda x: self.portfolio_stats(x)['return'] - target_return
            })
        
        bounds = tuple((0, 1) for _ in range(self.num_assets))
        init_weights = np.array([1/self.num_assets] * self.num_assets)
        
        result = minimize(
            portfolio_volatility,
            init_weights,
            method='SLSQP',
            bounds=bounds,
            constraints=cons
        )
        
        return result.x
    
    def efficient_frontier(self, num_portfolios=50):
        """Generate efficient frontier"""
        # Target returns run from the minimum-volatility portfolio up to the
        # best single asset, the highest return a long-only portfolio can reach
        min_vol_weights = self.optimize_min_volatility()
        
        min_ret = self.portfolio_stats(min_vol_weights)['return']
        max_ret = self.mean_returns.max()
        
        target_returns = np.linspace(min_ret, max_ret, num_portfolios)
        frontier_portfolios = []
        
        for target in target_returns:
            try:
                weights = self.optimize_min_volatility(target_return=target)
                stats = self.portfolio_stats(weights)
                frontier_portfolios.append([
                    stats['volatility'],
                    stats['return'],
                    stats['sharpe'],
                    weights
                ])
            except Exception:
                continue
        
        return np.array(frontier_portfolios, dtype=object)
    
    def visualize_portfolio(self, weights, title="Portfolio Allocation"):
        """Visualize portfolio weights"""
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
        
        # Pie chart
        significant_weights = weights[weights > 0.01]
        labels = [self.returns.columns[i] for i, w in enumerate(weights) if w > 0.01]
        
        ax1.pie(significant_weights, labels=labels, autopct='%1.1f%%',
               startangle=90, textprops={'fontsize': 10})
        ax1.set_title(title, fontsize=14, fontweight='bold')
        
        # Bar chart
        ax2.bar(range(len(weights)), weights, color='#2E86AB', alpha=0.7, edgecolor='black')
        ax2.set_xticks(range(len(weights)))
        ax2.set_xticklabels(self.returns.columns, rotation=45, ha='right')
        ax2.set_ylabel('Weight', fontsize=12)
        ax2.set_title('Portfolio Weights', fontsize=14, fontweight='bold')
        ax2.grid(True, alpha=0.3, axis='y')
        
        plt.tight_layout()
        plt.show()

# Example usage
# Get returns for top stocks
top_tickers = top_stocks.index.tolist()[:10]
returns_data = pd.DataFrame()

for ticker in top_tickers:
    if ticker in stock_data:
        returns_data[ticker] = stock_data[ticker]['Adj Close'].pct_change()

returns_data = returns_data.dropna()

# Optimize portfolio
optimizer = PortfolioOptimizer(returns_data)

# Maximum Sharpe ratio portfolio
max_sharpe_weights = optimizer.optimize_sharpe()
max_sharpe_stats = optimizer.portfolio_stats(max_sharpe_weights)

print("\n" + "="*60)
print("OPTIMAL PORTFOLIO (Maximum Sharpe Ratio)")
print("="*60)
print(f"Expected Return: {max_sharpe_stats['return']*100:.2f}%")
print(f"Volatility: {max_sharpe_stats['volatility']*100:.2f}%")
print(f"Sharpe Ratio: {max_sharpe_stats['sharpe']:.2f}")

print("\nWeights:")
for ticker, weight in zip(top_tickers, max_sharpe_weights):
    if weight > 0.01:
        print(f"  {ticker}: {weight*100:.2f}%")

# Visualize
optimizer.visualize_portfolio(max_sharpe_weights, "Optimal Portfolio (Max Sharpe)")

# Generate efficient frontier
frontier = optimizer.efficient_frontier(50)

# Plot efficient frontier
plt.figure(figsize=(12, 8))

# Plot frontier
vols = [p[0] for p in frontier]
rets = [p[1] for p in frontier]
sharpes = [p[2] for p in frontier]

scatter = plt.scatter(np.array(vols)*100, np.array(rets)*100, 
                     c=sharpes, cmap='viridis', s=50, alpha=0.6)
plt.colorbar(scatter, label='Sharpe Ratio')

# Mark optimal portfolios
plt.scatter(max_sharpe_stats['volatility']*100, max_sharpe_stats['return']*100,
           marker='*', s=500, c='red', edgecolors='black', 
           label='Max Sharpe', zorder=3)

# Individual assets
for ticker in top_tickers:
    ret = optimizer.mean_returns[ticker]
    vol = np.sqrt(optimizer.cov_matrix.loc[ticker, ticker])
    plt.scatter(vol*100, ret*100, marker='o', s=100, 
               edgecolors='black', linewidth=1.5, label=ticker)

plt.xlabel('Volatility (%)', fontsize=13)
plt.ylabel('Expected Return (%)', fontsize=13)
plt.title('Efficient Frontier', fontsize=16, fontweight='bold', pad=20)
plt.legend(loc='best', fontsize=9)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
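
The `constraints` argument of `optimize_sharpe` accepts extra SLSQP constraint dictionaries, but the example above never passes any. A common use is a position cap. The sketch below shows the idea on its own, with a made-up covariance matrix and a minimum-variance objective; the helper name `max_weight_constraints` and the 40% cap are assumptions chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def max_weight_constraints(num_assets, cap):
    """One SLSQP inequality per asset, satisfied when cap - w[i] >= 0."""
    return [
        # i=i binds the current index so each lambda checks its own asset
        {'type': 'ineq', 'fun': lambda w, i=i: cap - w[i]}
        for i in range(num_assets)
    ]


# Made-up annualized covariance matrix for four uncorrelated assets
cov = np.diag([0.01, 0.04, 0.09, 0.16])
n = cov.shape[0]

constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
constraints += max_weight_constraints(n, cap=0.40)

result = minimize(
    lambda w: w @ cov @ w,      # portfolio variance
    np.full(n, 1 / n),          # start from equal weights
    method='SLSQP',
    bounds=[(0, 1)] * n,
    constraints=constraints,
    options={'ftol': 1e-12},
)
capped_weights = result.x
print(result.success, capped_weights.round(3))
```

Without the cap, the least volatile asset would take most of the portfolio; with it, the excess is spread across the others. For a simple per-asset cap, tightening `bounds` to `(0, cap)` achieves the same thing; constraint dictionaries become useful for rules that span several assets, such as sector limits.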

10.5 Phase 4: Backtesting and Performance Analysis

Comprehensive Backtesting System

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

class Backtester:
    """
    Portfolio backtesting system
    """
    def __init__(self, prices, weights, rebalance_frequency='Q'):
        self.prices = prices
        self.weights = weights
        self.rebalance_freq = rebalance_frequency
        self.results = None
    
    def run_backtest(self, initial_capital=100000):
        """
        Run backtest with periodic rebalancing
        """
        print(f"Running backtest...")
        print(f"Initial Capital: ${initial_capital:,.0f}")
        print(f"Rebalancing: {self.rebalance_freq}")
        
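        # Get rebalancing dates: the last trading day actually present in
        # each period (calendar period ends can fall on non-trading days)
        rebal_dates = set(
            self.prices.index.to_series().resample(self.rebalance_freq).last().dropna()
        )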
        
        # Initialize
        portfolio_value = pd.Series(index=self.prices.index, dtype=float)
        shares = pd.Series(self.weights * initial_capital / self.prices.iloc[0], 
                          index=self.prices.columns)
        
        trades = []
        
        for date in self.prices.index:
            # Calculate portfolio value
            current_value = (shares * self.prices.loc[date]).sum()
            portfolio_value[date] = current_value
            
            # Rebalance if needed
            if date in rebal_dates and date != self.prices.index[0]:
                target_values = self.weights * current_value
                current_values = shares * self.prices.loc[date]
                
                # Calculate trades
                for ticker in self.prices.columns:
                    current = current_values[ticker]
                    target = target_values[ticker]
                    trade = (target - current) / self.prices.loc[date, ticker]
                    
                    if abs(trade) > 0.1:  # Minimum trade size
                        shares[ticker] += trade
                        trades.append({
                            'Date': date,
                            'Ticker': ticker,
                            'Shares': trade,
                            'Price': self.prices.loc[date, ticker],
                            'Value': trade * self.prices.loc[date, ticker]
                        })
        
        # Calculate metrics
        returns = portfolio_value.pct_change().dropna()
        
        results = {
            'portfolio_value': portfolio_value,
            'returns': returns,
            'trades': pd.DataFrame(trades),
            'final_value': portfolio_value.iloc[-1],
            'total_return': (portfolio_value.iloc[-1] / initial_capital - 1) * 100,
            'cagr': ((portfolio_value.iloc[-1] / initial_capital) ** 
                    (252 / len(portfolio_value)) - 1) * 100,
            'volatility': returns.std() * np.sqrt(252) * 100,
            'sharpe': (returns.mean() * 252) / (returns.std() * np.sqrt(252)),
            'max_drawdown': self._calculate_max_drawdown(portfolio_value),
            'num_trades': len(trades)
        }
        
        self.results = results
        return results
    
    def _calculate_max_drawdown(self, portfolio_value):
        """Calculate maximum drawdown"""
        cummax = portfolio_value.expanding().max()
        drawdown = (portfolio_value - cummax) / cummax
        return drawdown.min() * 100
    
    def print_summary(self):
        """Print backtest summary"""
        if self.results is None:
            print("Run backtest first!")
            return
        
        r = self.results
        
        print("\n" + "="*60)
        print("BACKTEST RESULTS")
        print("="*60)
        print(f"Period: {self.prices.index[0].date()} to {self.prices.index[-1].date()}")
        print(f"Initial Capital: ${r['portfolio_value'].iloc[0]:,.2f}")
        print(f"Final Value: ${r['final_value']:,.2f}")
        print(f"\nPerformance:")
        print(f"  Total Return: {r['total_return']:.2f}%")
        print(f"  CAGR: {r['cagr']:.2f}%")
        print(f"  Volatility: {r['volatility']:.2f}%")
        print(f"  Sharpe Ratio: {r['sharpe']:.2f}")
        print(f"  Max Drawdown: {r['max_drawdown']:.2f}%")
        print(f"\nTrading:")
        print(f"  Total Trades: {r['num_trades']}")
        print(f"  Rebalancing Frequency: {self.rebalance_freq}")
        print("="*60)
    
    def plot_performance(self):
        """Visualize backtest results"""
        if self.results is None:
            print("Run backtest first!")
            return
        
        fig, axes = plt.subplots(3, 1, figsize=(14, 12), sharex=True)
        
        # Portfolio value
        axes[0].plot(self.results['portfolio_value'].index,
                    self.results['portfolio_value'].values,
                    linewidth=2, color='#2E86AB')
        axes[0].set_ylabel('Portfolio Value ($)', fontsize=11)
        axes[0].set_title('Portfolio Value Over Time', fontsize=13, fontweight='bold')
        axes[0].grid(True, alpha=0.3)
        axes[0].yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'${y:,.0f}'))
        
        # Drawdown
        cummax = self.results['portfolio_value'].expanding().max()
        drawdown = (self.results['portfolio_value'] - cummax) / cummax * 100
        
        axes[1].fill_between(drawdown.index, 0, drawdown.values, 
                            color='red', alpha=0.3)
        axes[1].plot(drawdown.index, drawdown.values, 
                    linewidth=2, color='darkred')
        axes[1].set_ylabel('Drawdown (%)', fontsize=11)
        axes[1].set_title('Portfolio Drawdown', fontsize=13, fontweight='bold')
        axes[1].grid(True, alpha=0.3)
        
        # Rolling Sharpe (252-day)
        rolling_sharpe = (
            self.results['returns'].rolling(window=252).mean() * 252 /
            (self.results['returns'].rolling(window=252).std() * np.sqrt(252))
        )
        
        axes[2].plot(rolling_sharpe.index, rolling_sharpe.values,
                    linewidth=2, color='#06A77D')
        axes[2].axhline(y=0, color='red', linestyle='--', linewidth=1, alpha=0.5)
        axes[2].set_ylabel('Sharpe Ratio', fontsize=11)
        axes[2].set_xlabel('Date', fontsize=11)
        axes[2].set_title('Rolling Sharpe Ratio (252-day)', fontsize=13, fontweight='bold')
        axes[2].grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
    
    def compare_to_benchmark(self, benchmark_prices, benchmark_name='Benchmark'):
        """Compare portfolio to benchmark"""
        if self.results is None:
            print("Run backtest first!")
            return
        
        # Align dates
        common_dates = self.results['portfolio_value'].index.intersection(
            benchmark_prices.index
        )
        
        port_values = self.results['portfolio_value'].loc[common_dates]
        bench_values = benchmark_prices.loc[common_dates]
        
        # Normalize to 100
        port_norm = (port_values / port_values.iloc[0]) * 100
        bench_norm = (bench_values / bench_values.iloc[0]) * 100
        
        # Plot comparison
        plt.figure(figsize=(14, 7))
        
        plt.plot(port_norm.index, port_norm.values, 
                linewidth=2, label='Portfolio', color='#2E86AB')
        plt.plot(bench_norm.index, bench_norm.values, 
                linewidth=2, label=benchmark_name, color='#A23B72')
        
        plt.xlabel('Date', fontsize=12)
        plt.ylabel('Growth of $100', fontsize=12)
        plt.title('Portfolio vs Benchmark Performance', 
                 fontsize=16, fontweight='bold', pad=20)
        plt.legend(fontsize=11)
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()
        
        # Calculate outperformance
        port_return = (port_values.iloc[-1] / port_values.iloc[0] - 1) * 100
        bench_return = (bench_values.iloc[-1] / bench_values.iloc[0] - 1) * 100
        
        print(f"\nPerformance Comparison:")
        print(f"  Portfolio Return: {port_return:.2f}%")
        print(f"  {benchmark_name} Return: {bench_return:.2f}%")
        print(f"  Outperformance: {port_return - bench_return:+.2f}%")

# Example usage
# Prepare price data
price_data = pd.DataFrame()
for ticker in top_tickers:
    if ticker in stock_data:
        price_data[ticker] = stock_data[ticker]['Adj Close']

price_data = price_data.dropna()

# Run backtest
backtester = Backtester(price_data, max_sharpe_weights, rebalance_frequency='Q')
results = backtester.run_backtest(initial_capital=100000)

# Print summary
backtester.print_summary()

# Visualize
backtester.plot_performance()

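# Compare to the S&P 500 index (yfinance treats `end` as exclusive,
# so add one day to include the final backtest date)
benchmark_data = yf.download('^GSPC', start=price_data.index[0],
                             end=price_data.index[-1] + pd.Timedelta(days=1),
                             auto_adjust=False, progress=False)
# squeeze() turns a one-column DataFrame into a Series if needed
backtester.compare_to_benchmark(benchmark_data['Adj Close'].squeeze(), 'S&P 500')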
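
Note that the weights passed to the backtester were optimized on the same 2020 to 2023 window that the backtest then replays, so the reported performance is in-sample. A simple guard is to estimate weights on an earlier window and evaluate them on a later one. The sketch below illustrates the split on synthetic returns; the function names, the inverse-volatility weighting (used here instead of the Sharpe optimizer to keep the example short), and the split date are all assumptions.

```python
import numpy as np
import pandas as pd


def split_by_date(frame, split_date):
    """Rows before split_date form the training set; the rest the test set."""
    train = frame.loc[frame.index < split_date]
    test = frame.loc[frame.index >= split_date]
    return train, test


def inverse_volatility_weights(returns):
    """Weight each asset by 1 / volatility, normalized to sum to 1."""
    inv_vol = 1.0 / returns.std()
    return inv_vol / inv_vol.sum()


# Synthetic daily returns for three hypothetical tickers
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", "2023-12-29")
returns = pd.DataFrame(
    rng.normal(0.0005, 0.015, size=(len(dates), 3)),
    index=dates, columns=["AAA", "BBB", "CCC"],
)

train, test = split_by_date(returns, pd.Timestamp("2023-01-01"))
weights = inverse_volatility_weights(train)   # estimated on 2020-2022 only

# Out-of-sample daily returns, assuming the portfolio is held at these
# fixed weights (i.e. rebalanced back to them every day)
oos_returns = test.mul(weights, axis=1).sum(axis=1)
oos_growth = (1 + oos_returns).cumprod()
print(f"Out-of-sample return: {(oos_growth.iloc[-1] - 1) * 100:.2f}%")
```

The same pattern applies to the classes in this module: fit `PortfolioOptimizer` on the training rows and pass only the test-period prices to `Backtester`.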

10.6 Phase 5: Report Generation

Creating a Professional Investment Report

from datetime import datetime
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

class ReportGenerator:
    """
    Generate professional investment reports
    """
    def __init__(self, portfolio_data, backtest_results, screening_results):
        self.portfolio = portfolio_data
        self.backtest = backtest_results
        self.screening = screening_results
        self.timestamp = datetime.now()
    
    def generate_pdf_report(self, filename='investment_report.pdf'):
        """
        Generate comprehensive PDF report
        """
        print(f"Generating PDF report: {filename}")
        
        with PdfPages(filename) as pdf:
            # Page 1: Cover Page
            self._create_cover_page(pdf)
            
            # Page 2: Executive Summary
            self._create_executive_summary(pdf)
            
            # Page 3: Screening Results
            self._create_screening_page(pdf)
            
            # Page 4: Portfolio Construction
            self._create_portfolio_page(pdf)
            
            # Page 5: Performance Analysis
            self._create_performance_page(pdf)
            
            # Page 6: Risk Analysis
            self._create_risk_page(pdf)
            
            print(f"Report saved: {filename}")
    
    def _create_cover_page(self, pdf):
        """Create cover page"""
        fig = plt.figure(figsize=(8.5, 11))
        fig.text(0.5, 0.7, 'Quantitative Investment Analysis', 
                ha='center', fontsize=24, fontweight='bold')
        fig.text(0.5, 0.65, 'Portfolio Optimization Report',
                ha='center', fontsize=18)
        fig.text(0.5, 0.5, f'Generated: {self.timestamp.strftime("%B %d, %Y")}',
                ha='center', fontsize=12)
        fig.text(0.5, 0.3, 'Data Analytics & Python for Finance',
                ha='center', fontsize=14, style='italic')
        
        plt.axis('off')
        pdf.savefig(fig, bbox_inches='tight')
        plt.close()
    
    def _create_executive_summary(self, pdf):
        """Create executive summary page"""
        fig, ax = plt.subplots(figsize=(8.5, 11))
        ax.axis('off')
        
        summary_text = f"""
EXECUTIVE SUMMARY
{'='*70}

Investment Strategy
• Quantitative screening of tech sector stocks
• Multi-factor ranking system
• Portfolio optimization for maximum risk-adjusted returns
• Quarterly rebalancing

Key Results
• Portfolio Return: {self.backtest['total_return']:.2f}%
• Sharpe Ratio: {self.backtest['sharpe']:.2f}
• Maximum Drawdown: {self.backtest['max_drawdown']:.2f}%
• Number of Holdings: {len([w for w in self.portfolio['weights'] if w > 0.01])}

Recommendation
Based on quantitative analysis, this portfolio demonstrates strong 
risk-adjusted returns with controlled downside risk. The diversified
allocation across selected technology stocks provides exposure to
growth while managing volatility.

        """
        
        ax.text(0.1, 0.9, summary_text, transform=ax.transAxes,
               fontsize=11, verticalalignment='top', fontfamily='monospace')
        
        pdf.savefig(fig, bbox_inches='tight')
        plt.close()
    
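    # generate_pdf_report() also calls _create_screening_page,
    # _create_portfolio_page, _create_performance_page, and _create_risk_page.
    # Implement them (following the pattern above) before running it;
    # otherwise the call raises AttributeError.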

# Generate report
# report_gen = ReportGenerator(
#     portfolio_data={'weights': max_sharpe_weights, 'tickers': top_tickers},
#     backtest_results=results,
#     screening_results=top_stocks
# )
# report_gen.generate_pdf_report('my_investment_report.pdf')
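
The summary page indexes the backtest results directly, so a missing key only surfaces as a KeyError partway through rendering. The sketch below is one possible pre-check, assuming the result keys produced by Backtester.run_backtest ('total_return', 'sharpe', 'max_drawdown'). The helper name format_key_results and the example numbers are hypothetical and only illustrate the idea; they are not results from this project.

```python
# Hypothetical helper: validate backtest results before building the summary text.
REQUIRED_KEYS = ("total_return", "sharpe", "max_drawdown")


def format_key_results(backtest, weights, min_weight=0.01):
    """Return the 'Key Results' lines shown on the executive summary page.

    Raises ValueError naming every missing key, instead of failing
    with a KeyError in the middle of page rendering.
    """
    missing = [key for key in REQUIRED_KEYS if key not in backtest]
    if missing:
        raise ValueError(f"backtest results missing keys: {missing}")

    # Count positions above the same 1% threshold the report uses
    holdings = sum(1 for w in weights if w > min_weight)

    return [
        f"Portfolio Return: {backtest['total_return']:.2f}%",
        f"Sharpe Ratio: {backtest['sharpe']:.2f}",
        f"Maximum Drawdown: {backtest['max_drawdown']:.2f}%",
        f"Number of Holdings: {holdings}",
    ]


# Illustrative inputs only
example = {"total_return": 42.5, "sharpe": 1.234, "max_drawdown": -18.0}
for line in format_key_results(example, [0.5, 0.3, 0.195, 0.005]):
    print(line)
```

A helper like this could be called at the top of _create_executive_summary, so that a malformed results dictionary is reported before any figure is created.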

10.7 Project Deliverables

Final Checklist

Code

Data pipeline (collection, cleaning, feature engineering)
Stock screener with multi-factor ranking
Portfolio optimizer with constraints
Backtesting system with rebalancing
Visualization functions
All code well-commented and documented

Analysis

Exploratory data analysis of universe
Statistical testing of selection criteria
Correlation and risk analysis
Factor attribution
Sensitivity analysis of portfolio
Comparison to benchmarks

Documentation

README with project overview
Code documentation and docstrings
Jupyter notebooks with analysis
Professional PDF report
Presentation slides (optional)

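Visualizations

Portfolio allocation charts (pie and bar)
Efficient frontier with individual assets
Portfolio value over time
Drawdown chart
Rolling Sharpe ratio (252-day)
Portfolio vs. benchmark comparison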

Module 10 Summary

Congratulations on completing your capstone project!

What You've Built

A complete, professional-grade quantitative investment system that:

Automatically collects and processes financial data
Screens stocks using multiple criteria
Constructs optimized portfolios
Backtests strategies with periodic rebalancing
Generates a multi-page PDF investment report

Skills Demonstrated

Technical Skills

Python programming and software design
Data pipeline development
Statistical analysis and hypothesis testing
Financial modeling and valuation
Portfolio theory and optimization
Backtesting and performance attribution

Financial Skills

Stock screening and selection
Risk management
Portfolio construction
Performance measurement
Comparative analysis

Professional Skills

Project organization and documentation
Clear communication of findings
Report generation
Reproducible research

This Project Is Portfolio-Ready

You now have a complete project you can:

Showcase to potential employers
Include in your GitHub portfolio
Present in interviews
Extend with additional features
Use as a foundation for real trading systems

Next Steps

Refine: Add features like transaction costs, taxes, or alternative strategies
Extend: Try different asset classes or international markets
Deploy: Build a web dashboard or automated system
Share: Write about your methodology and findings
Learn: Continue to Module 11 for advanced topics
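
As a starting point for the "Refine" step, the sketch below estimates the cost of one rebalance under a simple proportional cost model. The function name rebalance_cost, the 10 basis point default, and the example positions are all assumptions made for illustration; real costs depend on the broker, spreads, and trade size. The sketch also simplifies by deducting the cost after trading rather than solving for post-cost target values.

```python
# Hypothetical sketch: proportional transaction cost for a single rebalance.
def rebalance_cost(current_values, target_weights, cost_bps=10.0):
    """Estimate the cost of rebalancing to target weights.

    current_values: dict of ticker -> current position value in dollars
    target_weights: dict of ticker -> target weight (expected to sum to 1)
    cost_bps: assumed cost in basis points of the dollar value traded
    """
    total = sum(current_values.values())

    # Include tickers that are held but no longer targeted (sold in full)
    tickers = set(current_values) | set(target_weights)

    traded = 0.0
    for ticker in tickers:
        target_value = target_weights.get(ticker, 0.0) * total
        traded += abs(target_value - current_values.get(ticker, 0.0))

    cost = traded * cost_bps / 10_000
    return {
        "traded_value": traded,
        "cost": cost,
        "value_after_cost": total - cost,
    }


# Illustrative positions: a 60/40 portfolio rebalanced back to 50/50
result = rebalance_cost(
    {"AAPL": 60_000.0, "MSFT": 40_000.0},
    {"AAPL": 0.5, "MSFT": 0.5},
    cost_bps=10.0,
)
print(result)
```

In the Backtester, a cost computed this way could be subtracted from the portfolio value on each rebalancing date, which lowers the reported total return and Sharpe ratio as turnover increases.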

You're Now a Quantitative Analyst

This project demonstrates real quantitative analyst capabilities. You've gone from Python basics to building a complete investment system. That's a remarkable achievement.

The skills you've developed are in high demand at:

Hedge funds and asset managers
Investment banks
Fintech companies
Corporate finance departments
Trading firms

Keep building, keep learning, and keep pushing your limits.

Continue to Module 11: Advanced Topics & Next Steps →

Module 10: Real-World Capstone Project

Learning Objectives

By the end of this module, you will:

Build a complete end-to-end financial analysis system
Integrate all skills learned throughout the course
Create a professional investment research report
Develop an automated trading strategy with backtesting
Build an interactive portfolio dashboard
Present findings in a clear, compelling manner
Document your work professionally
Create a portfolio piece for job applications

10.1 Project Overview

The Challenge

You will build a comprehensive quantitative investment system that:

Analyzes a universe of stocks
Identifies investment opportunities
Constructs an optimized portfolio
Implements risk management
Backtests performance
Generates professional reports

This project demonstrates mastery of:

Data acquisition and cleaning
Exploratory data analysis
Statistical testing
Financial modeling
Portfolio optimization
Visualization and reporting

Project Structure

quantitative-investment-system/
│
├── data/
│   ├── raw/              # Downloaded data
│   └── processed/        # Cleaned data
│
├── notebooks/
│   ├── 01_data_collection.ipynb
│   ├── 02_exploratory_analysis.ipynb
│   ├── 03_stock_screening.ipynb
│   ├── 04_portfolio_construction.ipynb
│   └── 05_backtesting.ipynb
│
├── src/
│   ├── data_pipeline.py
│   ├── analysis.py
│   ├── portfolio.py
│   └── visualization.py
│
├── reports/
│   └── investment_report.pdf
│
└── README.md

10.2 Phase 1: Data Collection and Preparation

Setting Up the Data Pipeline

import yfinance as yf
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import os

class DataPipeline:
    """
    Comprehensive data pipeline for financial analysis
    """
    def __init__(self, data_dir='data'):
        self.data_dir = data_dir
        self.raw_dir = os.path.join(data_dir, 'raw')
        self.processed_dir = os.path.join(data_dir, 'processed')
        
        # Create directories
        os.makedirs(self.raw_dir, exist_ok=True)
        os.makedirs(self.processed_dir, exist_ok=True)
    
    def download_universe(self, tickers, start_date, end_date):
        """
        Download data for entire universe of stocks
        """
        print(f"Downloading data for {len(tickers)} tickers...")
        
        all_data = {}
        failed_tickers = []
        
        for i, ticker in enumerate(tickers, 1):
            try:
                print(f"  [{i}/{len(tickers)}] {ticker}...", end='')
                data = yf.download(ticker, start=start_date, end=end_date, 
                                 progress=False)
                
                if not data.empty:
                    all_data[ticker] = data
                    print(" ✓")
                else:
                    failed_tickers.append(ticker)
                    print(" ✗ (no data)")
                    
            except Exception as e:
                failed_tickers.append(ticker)
                print(f" ✗ ({str(e)})")
        
        print(f"\nSuccessfully downloaded: {len(all_data)}/{len(tickers)}")
        if failed_tickers:
            print(f"Failed: {failed_tickers}")
        
        return all_data
    
    def save_data(self, data_dict, filename):
        """
        Save data to disk
        """
        filepath = os.path.join(self.raw_dir, filename)
        
        # Combine all data
        combined = pd.DataFrame()
        for ticker, data in data_dict.items():
            data['Ticker'] = ticker
            combined = pd.concat([combined, data])
        
        combined.to_csv(filepath)
        print(f"Data saved to {filepath}")
        
        return filepath
    
    def load_data(self, filename):
        """
        Load data from disk
        """
        filepath = os.path.join(self.raw_dir, filename)
        data = pd.read_csv(filepath, index_col=0, parse_dates=True)
        return data
    
    def clean_data(self, data):
        """
        Clean and prepare data
        """
        print("Cleaning data...")
        
        # Remove duplicates
        data = data[~data.index.duplicated(keep='last')]
        
        # Sort by date
        data = data.sort_index()
        
        # Forward fill missing values (up to 5 days)
        data = data.fillna(method='ffill', limit=5)
        
        # Remove remaining NaN
        data = data.dropna()
        
        print(f"Cleaned data: {len(data)} rows")
        
        return data
    
    def calculate_features(self, data):
        """
        Calculate technical and fundamental features
        """
        print("Calculating features...")
        
        features = pd.DataFrame(index=data.index)
        
        # Price-based features
        features['Close'] = data['Adj Close']
        features['Returns'] = data['Adj Close'].pct_change()
        features['Log_Returns'] = np.log(data['Adj Close'] / data['Adj Close'].shift(1))
        
        # Moving averages
        for period in [20, 50, 200]:
            features[f'SMA_{period}'] = data['Adj Close'].rolling(window=period).mean()
            features[f'Price_to_SMA_{period}'] = data['Adj Close'] / features[f'SMA_{period}']
        
        # Volatility
        features['Volatility_20'] = features['Returns'].rolling(window=20).std()
        features['Volatility_60'] = features['Returns'].rolling(window=60).std()
        
        # Momentum
        for period in [5, 10, 20, 60]:
            features[f'Momentum_{period}'] = data['Adj Close'].pct_change(periods=period)
        
        # Volume features
        features['Volume'] = data['Volume']
        features['Volume_MA_20'] = data['Volume'].rolling(window=20).mean()
        features['Volume_Ratio'] = data['Volume'] / features['Volume_MA_20']
        
        # RSI
        delta = data['Adj Close'].diff()
        gain = delta.where(delta > 0, 0).rolling(window=14).mean()
        loss = -delta.where(delta < 0, 0).rolling(window=14).mean()
        rs = gain / loss
        features['RSI'] = 100 - (100 / (1 + rs))
        
        # Bollinger Bands
        bb_middle = data['Adj Close'].rolling(window=20).mean()
        bb_std = data['Adj Close'].rolling(window=20).std()
        features['BB_Upper'] = bb_middle + (2 * bb_std)
        features['BB_Lower'] = bb_middle - (2 * bb_std)
        features['BB_Position'] = (data['Adj Close'] - features['BB_Lower']) / (features['BB_Upper'] - features['BB_Lower'])
        
        print(f"Calculated {len(features.columns)} features")
        
        return features

# Example usage
pipeline = DataPipeline()

# Define universe (example: tech stocks)
universe = [
    'AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'NVDA', 'TSLA', 'NFLX',
    'AMD', 'INTC', 'CRM', 'ORCL', 'ADBE', 'CSCO', 'AVGO', 'QCOM'
]

# Download data
start_date = '2020-01-01'
end_date = '2024-01-01'

stock_data = pipeline.download_universe(universe, start_date, end_date)

# Save data
pipeline.save_data(stock_data, 'universe_data.csv')

print("\n" + "="*60)
print("Data Collection Complete")
print("="*60)

10.3 Phase 2: Stock Screening and Selection

Building a Quantitative Screening System

import pandas as pd
import numpy as np
from scipy import stats

class StockScreener:
    """
    Multi-factor stock screening system
    """
    def __init__(self, data_dict):
        self.data = data_dict
        self.scores = None
    
    def calculate_metrics(self):
        """
        Calculate screening metrics for each stock
        """
        metrics = {}
        
        for ticker, data in self.data.items():
            if len(data) < 252:  # Need at least 1 year of data
                continue
            
            prices = data['Adj Close']
            returns = prices.pct_change().dropna()
            
            # Calculate metrics
            metrics[ticker] = {
                # Return metrics
                'Total_Return': (prices.iloc[-1] / prices.iloc[0] - 1) * 100,
                'Annual_Return': ((prices.iloc[-1] / prices.iloc[0]) ** (252/len(prices)) - 1) * 100,
                'YTD_Return': (prices.iloc[-1] / prices.iloc[0] - 1) * 100,
                
                # Risk metrics
                'Volatility': returns.std() * np.sqrt(252) * 100,
                'Downside_Vol': returns[returns < 0].std() * np.sqrt(252) * 100,
                'Max_Drawdown': self._calculate_max_drawdown(prices),
                
                # Risk-adjusted metrics
                'Sharpe_Ratio': (returns.mean() * 252) / (returns.std() * np.sqrt(252)),
                'Sortino_Ratio': (returns.mean() * 252) / (returns[returns < 0].std() * np.sqrt(252)),
                
                # Momentum metrics
                'Momentum_1M': (prices.iloc[-1] / prices.iloc[-21] - 1) * 100,
                'Momentum_3M': (prices.iloc[-1] / prices.iloc[-63] - 1) * 100,
                'Momentum_6M': (prices.iloc[-1] / prices.iloc[-126] - 1) * 100,
                
                # Technical indicators
                'RSI': self._calculate_rsi(prices)[-1],
                'Price_to_SMA_50': (prices.iloc[-1] / prices.rolling(50).mean().iloc[-1] - 1) * 100,
                'Price_to_SMA_200': (prices.iloc[-1] / prices.rolling(200).mean().iloc[-1] - 1) * 100,
                
                # Volume
                'Avg_Volume': data['Volume'].mean(),
                'Volume_Trend': (data['Volume'].iloc[-20:].mean() / data['Volume'].iloc[-60:-20].mean() - 1) * 100
            }
        
        return pd.DataFrame(metrics).T
    
    def _calculate_max_drawdown(self, prices):
        """Calculate maximum drawdown"""
        cumulative = (1 + prices.pct_change()).cumprod()
        running_max = cumulative.expanding().max()
        drawdown = (cumulative - running_max) / running_max
        return drawdown.min() * 100
    
    def _calculate_rsi(self, prices, period=14):
        """Calculate RSI"""
        delta = prices.diff()
        gain = delta.where(delta > 0, 0).rolling(window=period).mean()
        loss = -delta.where(delta < 0, 0).rolling(window=period).mean()
        rs = gain / loss
        return 100 - (100 / (1 + rs))
    
    def rank_stocks(self, metrics_df):
        """
        Rank stocks based on multiple factors
        """
        scores = pd.DataFrame(index=metrics_df.index)
        
        # Rank each metric (higher is better)
        # Positive factors
        for col in ['Annual_Return', 'Sharpe_Ratio', 'Momentum_3M', 'Momentum_6M']:
            scores[f'{col}_Score'] = metrics_df[col].rank(pct=True)
        
        # Negative factors (invert)
        for col in ['Volatility', 'Max_Drawdown']:
            scores[f'{col}_Score'] = (1 - metrics_df[col].rank(pct=True))
        
        # Calculate composite score
        scores['Composite_Score'] = scores.mean(axis=1) * 100
        
        # Rank by composite score
        scores['Rank'] = scores['Composite_Score'].rank(ascending=False)
        
        return scores.sort_values('Composite_Score', ascending=False)
    
    def apply_filters(self, metrics_df, filters):
        """
        Apply screening filters
        """
        filtered = metrics_df.copy()
        
        for metric, (min_val, max_val) in filters.items():
            if min_val is not None:
                filtered = filtered[filtered[metric] >= min_val]
            if max_val is not None:
                filtered = filtered[filtered[metric] <= max_val]
        
        return filtered
    
    def screen(self, filters=None, top_n=10):
        """
        Run complete screening process
        """
        print("Running stock screening...")
        
        # Calculate metrics
        metrics = self.calculate_metrics()
        print(f"Analyzed {len(metrics)} stocks")
        
        # Apply filters
        if filters:
            metrics = self.apply_filters(metrics, filters)
            print(f"After filters: {len(metrics)} stocks")
        
        # Rank stocks
        scores = self.rank_stocks(metrics)
        
        # Get top stocks
        top_stocks = scores.head(top_n)
        
        # Combine metrics and scores
        results = pd.concat([metrics.loc[top_stocks.index], scores.loc[top_stocks.index]], axis=1)
        
        return results

# Example usage
screener = StockScreener(stock_data)

# Define filters
filters = {
    'Volatility': (None, 40),  # Max 40% volatility
    'Sharpe_Ratio': (0.5, None),  # Min 0.5 Sharpe
    'Max_Drawdown': (-30, None),  # Max 30% drawdown
    'Avg_Volume': (1000000, None)  # Minimum liquidity
}

# Run screening
top_stocks = screener.screen(filters=filters, top_n=10)

print("\n" + "="*60)
print("TOP 10 STOCKS")
print("="*60)
print(top_stocks[['Annual_Return', 'Sharpe_Ratio', 'Volatility', 
                  'Max_Drawdown', 'Composite_Score', 'Rank']].to_string())

10.4 Phase 3: Portfolio Construction and Optimization

Building the Optimal Portfolio

from scipy.optimize import minimize
import matplotlib.pyplot as plt

class PortfolioOptimizer:
    """
    Portfolio optimization system
    """
    def __init__(self, returns_data):
        self.returns = returns_data
        self.mean_returns = returns_data.mean() * 252
        self.cov_matrix = returns_data.cov() * 252
        self.num_assets = len(returns_data.columns)
    
    def portfolio_stats(self, weights):
        """Calculate portfolio statistics"""
        portfolio_return = np.sum(self.mean_returns * weights)
        portfolio_std = np.sqrt(np.dot(weights.T, np.dot(self.cov_matrix, weights)))
        sharpe = portfolio_return / portfolio_std if portfolio_std > 0 else 0
        
        return {
            'return': portfolio_return,
            'volatility': portfolio_std,
            'sharpe': sharpe
        }
    
    def negative_sharpe(self, weights):
        """Objective function for optimization"""
        return -self.portfolio_stats(weights)['sharpe']
    
    def optimize_sharpe(self, constraints=None):
        """Optimize for maximum Sharpe ratio"""
        # Constraints
        cons = [{'type': 'eq', 'fun': lambda x: np.sum(x) - 1}]
        if constraints:
            cons.extend(constraints)
        
        # Bounds (0 to 1 for long-only)
        bounds = tuple((0, 1) for _ in range(self.num_assets))
        
        # Initial guess (equal weights)
        init_weights = np.array([1/self.num_assets] * self.num_assets)
        
        # Optimize
        result = minimize(
            self.negative_sharpe,
            init_weights,
            method='SLSQP',
            bounds=bounds,
            constraints=cons
        )
        
        return result.x
    
    def optimize_min_volatility(self, target_return=None):
        """Optimize for minimum volatility"""
        def portfolio_volatility(weights):
            return self.portfolio_stats(weights)['volatility']
        
        # Constraints
        cons = [{'type': 'eq', 'fun': lambda x: np.sum(x) - 1}]
        
        if target_return:
            cons.append({
                'type': 'eq',
                'fun': lambda x: self.portfolio_stats(x)['return'] - target_return
            })
        
        bounds = tuple((0, 1) for _ in range(self.num_assets))
        init_weights = np.array([1/self.num_assets] * self.num_assets)
        
        result = minimize(
            portfolio_volatility,
            init_weights,
            method='SLSQP',
            bounds=bounds,
            constraints=cons
        )
        
        return result.x
    
    def efficient_frontier(self, num_portfolios=50):
        """Generate efficient frontier"""
        # Get min and max returns
        min_vol_weights = self.optimize_min_volatility()
        max_sharpe_weights = self.optimize_sharpe()
        
        min_ret = self.portfolio_stats(min_vol_weights)['return']
        max_ret = self.portfolio_stats(max_sharpe_weights)['return'] * 1.2
        
        target_returns = np.linspace(min_ret, max_ret, num_portfolios)
        frontier_portfolios = []
        
        for target in target_returns:
            try:
                weights = self.optimize_min_volatility(target_return=target)
                stats = self.portfolio_stats(weights)
                frontier_portfolios.append([
                    stats['volatility'],
                    stats['return'],
                    stats['sharpe'],
                    weights
                ])
            except:
                continue
        
        return np.array(frontier_portfolios, dtype=object)
    
    def visualize_portfolio(self, weights, title="Portfolio Allocation"):
        """Visualize portfolio weights"""
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
        
        # Pie chart
        significant_weights = weights[weights > 0.01]
        labels = [self.returns.columns[i] for i, w in enumerate(weights) if w > 0.01]
        
        ax1.pie(significant_weights, labels=labels, autopct='%1.1f%%',
               startangle=90, textprops={'fontsize': 10})
        ax1.set_title(title, fontsize=14, fontweight='bold')
        
        # Bar chart
        ax2.bar(range(len(weights)), weights, color='#2E86AB', alpha=0.7, edgecolor='black')
        ax2.set_xticks(range(len(weights)))
        ax2.set_xticklabels(self.returns.columns, rotation=45, ha='right')
        ax2.set_ylabel('Weight', fontsize=12)
        ax2.set_title('Portfolio Weights', fontsize=14, fontweight='bold')
        ax2.grid(True, alpha=0.3, axis='y')
        
        plt.tight_layout()
        plt.show()

# Example usage
# Get returns for top stocks
top_tickers = top_stocks.index.tolist()[:10]
returns_data = pd.DataFrame()

for ticker in top_tickers:
    if ticker in stock_data:
        returns_data[ticker] = stock_data[ticker]['Adj Close'].pct_change()

returns_data = returns_data.dropna()

# Optimize portfolio
optimizer = PortfolioOptimizer(returns_data)

# Maximum Sharpe ratio portfolio
max_sharpe_weights = optimizer.optimize_sharpe()
max_sharpe_stats = optimizer.portfolio_stats(max_sharpe_weights)

print("\n" + "="*60)
print("OPTIMAL PORTFOLIO (Maximum Sharpe Ratio)")
print("="*60)
print(f"Expected Return: {max_sharpe_stats['return']*100:.2f}%")
print(f"Volatility: {max_sharpe_stats['volatility']*100:.2f}%")
print(f"Sharpe Ratio: {max_sharpe_stats['sharpe']:.2f}")

print("\nWeights:")
for ticker, weight in zip(top_tickers, max_sharpe_weights):
    if weight > 0.01:
        print(f"  {ticker}: {weight*100:.2f}%")

# Visualize
optimizer.visualize_portfolio(max_sharpe_weights, "Optimal Portfolio (Max Sharpe)")

# Generate efficient frontier
frontier = optimizer.efficient_frontier(50)

# Plot efficient frontier
plt.figure(figsize=(12, 8))

# Plot frontier
vols = [p[0] for p in frontier]
rets = [p[1] for p in frontier]
sharpes = [p[2] for p in frontier]

scatter = plt.scatter(np.array(vols)*100, np.array(rets)*100, 
                     c=sharpes, cmap='viridis', s=50, alpha=0.6)
plt.colorbar(scatter, label='Sharpe Ratio')

# Mark optimal portfolios
plt.scatter(max_sharpe_stats['volatility']*100, max_sharpe_stats['return']*100,
           marker='*', s=500, c='red', edgecolors='black', 
           label='Max Sharpe', zorder=3)

# Individual assets
for ticker in top_tickers:
    ret = optimizer.mean_returns[ticker]
    vol = np.sqrt(optimizer.cov_matrix.loc[ticker, ticker])
    plt.scatter(vol*100, ret*100, marker='o', s=100, 
               edgecolors='black', linewidth=1.5, label=ticker)

plt.xlabel('Volatility (%)', fontsize=13)
plt.ylabel('Expected Return (%)', fontsize=13)
plt.title('Efficient Frontier', fontsize=16, fontweight='bold', pad=20)
plt.legend(loc='best', fontsize=9)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

10.5 Phase 4: Backtesting and Performance Analysis

Comprehensive Backtesting System

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

class Backtester:
    """
    Portfolio backtesting system
    """
    def __init__(self, prices, weights, rebalance_frequency='Q'):
        self.prices = prices
        self.weights = weights
        self.rebalance_freq = rebalance_frequency
        self.results = None
    
    def run_backtest(self, initial_capital=100000):
        """
        Run backtest with periodic rebalancing
        """
        print(f"Running backtest...")
        print(f"Initial Capital: ${initial_capital:,.0f}")
        print(f"Rebalancing: {self.rebalance_freq}")
        
        # Get rebalancing dates
        rebal_dates = self.prices.resample(self.rebalance_freq).last().index
        
        # Initialize
        portfolio_value = pd.Series(index=self.prices.index, dtype=float)
        shares = pd.Series(self.weights * initial_capital / self.prices.iloc[0], 
                          index=self.prices.columns)
        
        trades = []
        
        for date in self.prices.index:
            # Calculate portfolio value
            current_value = (shares * self.prices.loc[date]).sum()
            portfolio_value[date] = current_value
            
            # Rebalance if needed
            if date in rebal_dates and date != self.prices.index[0]:
                target_values = self.weights * current_value
                current_values = shares * self.prices.loc[date]
                
                # Calculate trades
                for ticker in self.prices.columns:
                    current = current_values[ticker]
                    target = target_values[ticker]
                    trade = (target - current) / self.prices.loc[date, ticker]
                    
                    if abs(trade) > 0.1:  # Minimum trade size
                        shares[ticker] += trade
                        trades.append({
                            'Date': date,
                            'Ticker': ticker,
                            'Shares': trade,
                            'Price': self.prices.loc[date, ticker],
                            'Value': trade * self.prices.loc[date, ticker]
                        })
        
        # Calculate metrics
        returns = portfolio_value.pct_change().dropna()
        
        results = {
            'portfolio_value': portfolio_value,
            'returns': returns,
            'trades': pd.DataFrame(trades),
            'final_value': portfolio_value.iloc[-1],
            'total_return': (portfolio_value.iloc[-1] / initial_capital - 1) * 100,
            'cagr': ((portfolio_value.iloc[-1] / initial_capital) ** 
                    (252 / len(portfolio_value)) - 1) * 100,
            'volatility': returns.std() * np.sqrt(252) * 100,
            'sharpe': (returns.mean() * 252) / (returns.std() * np.sqrt(252)),
            'max_drawdown': self._calculate_max_drawdown(portfolio_value),
            'num_trades': len(trades)
        }
        
        self.results = results
        return results
    
    def _calculate_max_drawdown(self, portfolio_value):
        """Calculate maximum drawdown"""
        cummax = portfolio_value.expanding().max()
        drawdown = (portfolio_value - cummax) / cummax
        return drawdown.min() * 100
    
    def print_summary(self):
        """Print backtest summary"""
        if self.results is None:
            print("Run backtest first!")
            return
        
        r = self.results
        
        print("\n" + "="*60)
        print("BACKTEST RESULTS")
        print("="*60)
        print(f"Period: {self.prices.index[0].date()} to {self.prices.index[-1].date()}")
        print(f"Initial Capital: ${r['portfolio_value'].iloc[0]:,.2f}")
        print(f"Final Value: ${r['final_value']:,.2f}")
        print(f"\nPerformance:")
        print(f"  Total Return: {r['total_return']:.2f}%")
        print(f"  CAGR: {r['cagr']:.2f}%")
        print(f"  Volatility: {r['volatility']:.2f}%")
        print(f"  Sharpe Ratio: {r['sharpe']:.2f}")
        print(f"  Max Drawdown: {r['max_drawdown']:.2f}%")
        print(f"\nTrading:")
        print(f"  Total Trades: {r['num_trades']}")
        print(f"  Rebalancing Frequency: {self.rebalance_freq}")
        print("="*60)
    
    def plot_performance(self):
        """Visualize backtest results"""
        if self.results is None:
            print("Run backtest first!")
            return
        
        fig, axes = plt.subplots(3, 1, figsize=(14, 12), sharex=True)
        
        # Portfolio value
        axes[0].plot(self.results['portfolio_value'].index,
                    self.results['portfolio_value'].values,
                    linewidth=2, color='#2E86AB')
        axes[0].set_ylabel('Portfolio Value ($)', fontsize=11)
        axes[0].set_title('Portfolio Value Over Time', fontsize=13, fontweight='bold')
        axes[0].grid(True, alpha=0.3)
        axes[0].yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'${y:,.0f}'))
        
        # Drawdown
        cummax = self.results['portfolio_value'].expanding().max()
        drawdown = (self.results['portfolio_value'] - cummax) / cummax * 100
        
        axes[1].fill_between(drawdown.index, 0, drawdown.values, 
                            color='red', alpha=0.3)
        axes[1].plot(drawdown.index, drawdown.values, 
                    linewidth=2, color='darkred')
        axes[1].set_ylabel('Drawdown (%)', fontsize=11)
        axes[1].set_title('Portfolio Drawdown', fontsize=13, fontweight='bold')
        axes[1].grid(True, alpha=0.3)
        
        # Rolling Sharpe (252-day)
        rolling_sharpe = (
            self.results['returns'].rolling(window=252).mean() * 252 /
            (self.results['returns'].rolling(window=252).std() * np.sqrt(252))
        )
        
        axes[2].plot(rolling_sharpe.index, rolling_sharpe.values,
                    linewidth=2, color='#06A77D')
        axes[2].axhline(y=0, color='red', linestyle='--', linewidth=1, alpha=0.5)
        axes[2].set_ylabel('Sharpe Ratio', fontsize=11)
        axes[2].set_xlabel('Date', fontsize=11)
        axes[2].set_title('Rolling Sharpe Ratio (252-day)', fontsize=13, fontweight='bold')
        axes[2].grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
    
    def compare_to_benchmark(self, benchmark_prices, benchmark_name='Benchmark'):
        """Compare portfolio to benchmark"""
        if self.results is None:
            print("Run backtest first!")
            return
        
        # Align dates
        common_dates = self.results['portfolio_value'].index.intersection(
            benchmark_prices.index
        )
        
        port_values = self.results['portfolio_value'].loc[common_dates]
        bench_values = benchmark_prices.loc[common_dates]
        
        # Normalize to 100
        port_norm = (port_values / port_values.iloc[0]) * 100
        bench_norm = (bench_values / bench_values.iloc[0]) * 100
        
        # Plot comparison
        plt.figure(figsize=(14, 7))
        
        plt.plot(port_norm.index, port_norm.values, 
                linewidth=2, label='Portfolio', color='#2E86AB')
        plt.plot(bench_norm.index, bench_norm.values, 
                linewidth=2, label=benchmark_name, color='#A23B72')
        
        plt.xlabel('Date', fontsize=12)
        plt.ylabel('Growth of $100', fontsize=12)
        plt.title('Portfolio vs Benchmark Performance', 
                 fontsize=16, fontweight='bold', pad=20)
        plt.legend(fontsize=11)
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()
        
        # Calculate outperformance
        port_return = (port_values.iloc[-1] / port_values.iloc[0] - 1) * 100
        bench_return = (bench_values.iloc[-1] / bench_values.iloc[0] - 1) * 100
        
        print(f"\nPerformance Comparison:")
        print(f"  Portfolio Return: {port_return:.2f}%")
        print(f"  {benchmark_name} Return: {bench_return:.2f}%")
        print(f"  Outperformance: {port_return - bench_return:+.2f}%")

# Example usage
# Prepare price data
price_data = pd.DataFrame()
for ticker in top_tickers:
    if ticker in stock_data:
        price_data[ticker] = stock_data[ticker]['Adj Close']

price_data = price_data.dropna()

# Run backtest
backtester = Backtester(price_data, max_sharpe_weights, rebalance_frequency='Q')
results = backtester.run_backtest(initial_capital=100000)

# Print summary
backtester.print_summary()

# Visualize
backtester.plot_performance()

# Compare to S&P 500
spy_data = yf.download('^GSPC', start=price_data.index[0], 
                       end=price_data.index[-1], progress=False)
backtester.compare_to_benchmark(spy_data['Adj Close'], 'S&P 500')

10.6 Phase 5: Report Generation

Creating a Professional Investment Report

from datetime import datetime
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

class ReportGenerator:
    """
    Generate professional investment reports
    """
    def __init__(self, portfolio_data, backtest_results, screening_results):
        self.portfolio = portfolio_data
        self.backtest = backtest_results
        self.screening = screening_results
        self.timestamp = datetime.now()
    
    def generate_pdf_report(self, filename='investment_report.pdf'):
        """
        Generate comprehensive PDF report
        """
        print(f"Generating PDF report: {filename}")
        
        with PdfPages(filename) as pdf:
            # Page 1: Cover Page
            self._create_cover_page(pdf)
            
            # Page 2: Executive Summary
            self._create_executive_summary(pdf)
            
            # Page 3: Screening Results
            self._create_screening_page(pdf)
            
            # Page 4: Portfolio Construction
            self._create_portfolio_page(pdf)
            
            # Page 5: Performance Analysis
            self._create_performance_page(pdf)
            
            # Page 6: Risk Analysis
            self._create_risk_page(pdf)
            
        # Report success only after the with-block has closed the file
        print(f"Report saved: {filename}")
    
    def _create_cover_page(self, pdf):
        """Create cover page"""
        fig = plt.figure(figsize=(8.5, 11))
        fig.text(0.5, 0.7, 'Quantitative Investment Analysis', 
                ha='center', fontsize=24, fontweight='bold')
        fig.text(0.5, 0.65, 'Portfolio Optimization Report',
                ha='center', fontsize=18)
        fig.text(0.5, 0.5, f'Generated: {self.timestamp.strftime("%B %d, %Y")}',
                ha='center', fontsize=12)
        fig.text(0.5, 0.3, 'Data Analytics & Python for Finance',
                ha='center', fontsize=14, style='italic')
        
        # Save the full letter-size page. bbox_inches='tight' would crop the
        # page down to the text and discard the vertical layout.
        pdf.savefig(fig)
        plt.close(fig)
    
    def _create_executive_summary(self, pdf):
        """Create executive summary page"""
        fig, ax = plt.subplots(figsize=(8.5, 11))
        ax.axis('off')
        
        # Count holdings above a 1% weight. Weights may arrive as a dict
        # keyed by ticker or as a plain sequence/array, so handle both.
        weights = self.portfolio['weights']
        weight_values = weights.values() if isinstance(weights, dict) else weights
        n_holdings = sum(1 for w in weight_values if w > 0.01)
        
        # The Recommendation paragraph is template text: revise it so that
        # it matches the figures actually reported under Key Results.
        summary_text = f"""
EXECUTIVE SUMMARY
{'='*70}

Investment Strategy
• Quantitative screening of tech sector stocks
• Multi-factor ranking system
• Portfolio optimization for maximum risk-adjusted returns
• Quarterly rebalancing

Key Results
• Portfolio Return: {self.backtest['total_return']:.2f}%
• Sharpe Ratio: {self.backtest['sharpe']:.2f}
• Maximum Drawdown: {self.backtest['max_drawdown']:.2f}%
• Number of Holdings: {n_holdings}

Recommendation
Based on quantitative analysis, this portfolio demonstrates strong 
risk-adjusted returns with controlled downside risk. The diversified
allocation across selected technology stocks provides exposure to
growth while managing volatility.

        """
        
        ax.text(0.1, 0.9, summary_text, transform=ax.transAxes,
               fontsize=11, verticalalignment='top', fontfamily='monospace')
        
        # Keep the full page size (no tight cropping) so pages stay uniform
        pdf.savefig(fig)
        plt.close(fig)
    
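    # Additional page creation methods would follow (screening page,
    # portfolio page, performance page, risk page). Until they are written,
    # these stubs keep generate_pdf_report() from raising AttributeError.
    def _create_placeholder_page(self, pdf, title):
        """Render a titled placeholder page until the real page is written"""
        fig = plt.figure(figsize=(8.5, 11))
        fig.text(0.5, 0.5, f'{title}\n(page not yet implemented)',
                ha='center', fontsize=16)
        pdf.savefig(fig)
        plt.close(fig)

    def _create_screening_page(self, pdf):
        self._create_placeholder_page(pdf, 'Screening Results')

    def _create_portfolio_page(self, pdf):
        self._create_placeholder_page(pdf, 'Portfolio Construction')

    def _create_performance_page(self, pdf):
        self._create_placeholder_page(pdf, 'Performance Analysis')

    def _create_risk_page(self, pdf):
        self._create_placeholder_page(pdf, 'Risk Analysis')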

# Generate report
# report_gen = ReportGenerator(
#     portfolio_data={'weights': max_sharpe_weights, 'tickers': top_tickers},
#     backtest_results=results,
#     screening_results=top_stocks
# )
# report_gen.generate_pdf_report('my_investment_report.pdf')
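
The executive summary reads `total_return`, `sharpe`, and `max_drawdown` from `backtest_results`. The `Backtester` may already return these keys. If it does not, the following is a sketch of how they could be derived from a sequence of daily portfolio values. The helper name `summarize_backtest`, the 252-day annualization, and the zero risk-free rate are assumptions made for illustration.

```python
import math
import statistics


def summarize_backtest(values, periods_per_year=252, risk_free_rate=0.0):
    """Compute summary metrics from a sequence of portfolio values.

    Returns a dict with total_return and max_drawdown in percent and an
    annualized Sharpe ratio, using the keys the report template reads.
    """
    if len(values) < 3:
        raise ValueError("need at least three values")
    returns = [values[i] / values[i - 1] - 1 for i in range(1, len(values))]

    # Annualized Sharpe ratio from per-period excess returns
    rf_per_period = risk_free_rate / periods_per_year
    excess = [r - rf_per_period for r in returns]
    vol = statistics.stdev(excess)
    sharpe = (statistics.mean(excess) / vol * math.sqrt(periods_per_year)
              if vol > 0 else float("nan"))

    # Maximum drawdown: largest peak-to-trough decline, as a negative percent
    peak = values[0]
    max_dd = 0.0
    for v in values:
        peak = max(peak, v)
        max_dd = min(max_dd, v / peak - 1)

    return {
        "total_return": (values[-1] / values[0] - 1) * 100,
        "sharpe": sharpe,
        "max_drawdown": max_dd * 100,
    }


# Hypothetical portfolio values for illustration only
metrics = summarize_backtest([100.0, 110.0, 99.0, 120.0])
print(metrics)
```

Returning percentages here matches the `%` suffix in the summary template. If the backtester reports fractions instead, either multiply by 100 before passing the values in or change the format strings.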

10.7 Project Deliverables

Final Checklist

Code

Data pipeline (collection, cleaning, feature engineering)
Stock screener with multi-factor ranking
Portfolio optimizer with constraints
Backtesting system with rebalancing
Visualization functions
All code well-commented and documented

Analysis

Exploratory data analysis of universe
Statistical testing of selection criteria
Correlation and risk analysis
Factor attribution
Sensitivity analysis of portfolio
Comparison to benchmarks

Documentation

README with project overview
Code documentation and docstrings
Jupyter notebooks with analysis
Professional PDF report
Presentation slides (optional)

Visualizations

Module 10 Summary

Congratulations on completing your capstone project!

What You've Built

A complete, professional-grade quantitative investment system that:

Automatically collects and processes financial data
Screens stocks using multiple criteria
Constructs optimized portfolios
Backtests strategies with periodic rebalancing
Generates institutional-quality reports

Skills Demonstrated

Technical Skills

Python programming and software design
Data pipeline development
Statistical analysis and hypothesis testing
Financial modeling and valuation
Portfolio theory and optimization
Backtesting and performance attribution

Financial Skills

Stock screening and selection
Risk management
Portfolio construction
Performance measurement
Comparative analysis

Professional Skills

Project organization and documentation
Clear communication of findings
Report generation
Reproducible research

This Project is Portfolio-Ready

You now have a complete project you can:

Showcase to potential employers
Include in your GitHub portfolio
Present in interviews
Extend with additional features
Use as a foundation for real trading systems

Next Steps

Refine: Add features like transaction costs, taxes, or alternative strategies
Extend: Try different asset classes or international markets
Deploy: Build a web dashboard or automated system
Share: Write about your methodology and findings
Learn: Continue to Module 11 for advanced topics
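
As a starting point for the transaction-cost refinement, the sketch below charges a proportional cost on the turnover at each rebalance. The function name `rebalance_cost` and the 10 basis point rate are assumptions chosen for illustration, not values from this module.

```python
def rebalance_cost(current_weights, target_weights, portfolio_value,
                   cost_rate=0.001):
    """Estimate the cost of rebalancing from current to target weights.

    Turnover is the sum of absolute weight changes. The traded dollar
    amount is turnover times portfolio value, charged at cost_rate.
    """
    tickers = set(current_weights) | set(target_weights)
    turnover = sum(
        abs(target_weights.get(t, 0.0) - current_weights.get(t, 0.0))
        for t in tickers
    )
    return turnover * portfolio_value * cost_rate


# Hypothetical weights: a drifted portfolio rebalanced back to 50/50
drifted = {"AAPL": 0.60, "MSFT": 0.40}
target = {"AAPL": 0.50, "MSFT": 0.50}
cost = rebalance_cost(drifted, target, portfolio_value=100_000)
print(f"Estimated rebalance cost: ${cost:.2f}")
```

Subtracting this amount from the portfolio value on each rebalance date is one simple way to see how sensitive the backtest results are to trading frictions.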

You're Now a Quantitative Analyst

This project demonstrates real quantitative analyst capabilities. You've gone from Python basics to building a complete investment system. That's a remarkable achievement.

The skills you've developed are in high demand at:

Hedge funds and asset managers
Investment banks
Fintech companies
Corporate finance departments
Trading firms

Keep building, keep learning, and keep pushing your limits.

Continue to Module 11: Advanced Topics & Next Steps →
