Module 3: Financial Data Sources & APIs
Learning Objectives
By the end of this module, you will:
- Understand where to find high-quality financial data
- Master working with financial APIs programmatically
- Use yfinance library for comprehensive market data
- Access economic data from Federal Reserve (FRED)
- Work with multiple data providers and their unique features
- Build automated data pipelines that update continuously
- Handle API authentication and rate limits
- Create reusable data retrieval functions
3.1 The Financial Data Landscape
Where Does Financial Data Come From?
Financial data flows from countless sources, each serving different purposes. Understanding this ecosystem helps you choose the right data for your analysis.
Primary Sources
- Exchanges: NYSE, NASDAQ, LSE—where securities actually trade
- Central Banks: Federal Reserve, ECB, providing economic indicators
- Regulatory Filings: SEC, where companies report financials
- Market Data Vendors: Bloomberg, Refinitiv, selling aggregated data
Access Methods
- Direct APIs: Real-time access through code (what we'll focus on)
- Data Aggregators: Services that compile data from multiple sources
- Downloaded Files: Historical data in CSV/Excel format
- Web Scraping: Extracting data from websites (use cautiously—respect terms of service)
Free vs. Paid Data Sources
Free Sources (Perfect for Learning and Personal Use)
- Yahoo Finance (via yfinance): Stock prices, dividends, splits
- Alpha Vantage: Stocks, forex, cryptocurrencies
- FRED (Federal Reserve): Economic indicators
- Quandl (now Nasdaq Data Link): Various financial and economic datasets
- IEX Cloud: Market data with a free tier (note: the service was retired in 2024, so check current availability)
Paid Sources (Professional Use)
- Bloomberg Terminal: $20,000+/year—industry standard
- Refinitiv Eikon: Comprehensive financial data
- FactSet: Institutional-grade data and analytics
- Polygon.io: Real-time and historical market data
For this course, we'll focus on free sources that provide professional-quality data for learning and analysis.
3.2 Deep Dive into yfinance
Why yfinance?
The yfinance library is an open-source Python package that pulls data from Yahoo Finance's publicly available endpoints. It isn't an official Yahoo API, but it's completely free, requires no registration, and provides extensive historical data for stocks, ETFs, mutual funds, currencies, and cryptocurrencies worldwide.
What yfinance Provides
- Historical price data (open, high, low, close, volume)
- Dividends and stock splits
- Company information and statistics
- Financial statements (income statement, balance sheet, cash flow)
- Options data
- Major holders and institutional holders
Installation and Import
# Install if not already installed
!pip install yfinance
# Import
import yfinance as yf
import pandas as pd
import numpy as np
The Ticker Object: Your Gateway to Data
# Create a Ticker object
aapl = yf.Ticker("AAPL")
# This object gives you access to everything about Apple stock
Downloading Historical Price Data
Basic Download
import yfinance as yf
# Download 1 month of data
data = yf.download("AAPL", period="1mo")
print(data.head())
# Download with specific dates
data = yf.download("AAPL", start="2023-01-01", end="2024-01-01")
# Download multiple stocks at once
tickers = ["AAPL", "MSFT", "GOOGL"]
data = yf.download(tickers, period="6mo")
Understanding the Data Structure
# Single stock returns a DataFrame indexed by Date with columns:
# Open, High, Low, Close, Adj Close, Volume
print(data.columns)
# Output: ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
# Note: newer yfinance versions adjust prices by default (auto_adjust=True),
# which drops 'Adj Close'; pass auto_adjust=False to keep it
# Multiple stocks returns a MultiIndex DataFrame
# Level 0: Price type (Open, High, Low, etc.)
# Level 1: Ticker symbol
# Access closing prices for all stocks
closes = data['Close']
print(closes.head())
Different Time Periods
# Valid period values
# 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
# 5 years of data
data = yf.download("AAPL", period="5y")
# Year to date
data = yf.download("AAPL", period="ytd")
# Maximum available history
data = yf.download("AAPL", period="max")
Different Time Intervals
# Valid intervals
# 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo
# Daily data (default)
data = yf.download("AAPL", period="1mo", interval="1d")
# Hourly data (limited to last 730 days)
data = yf.download("AAPL", period="1mo", interval="1h")
# Weekly data
data = yf.download("AAPL", period="2y", interval="1wk")
# 5-minute data (limited to last 60 days)
data = yf.download("AAPL", period="5d", interval="5m")
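Because intraday bars only reach back a limited window, a mismatched period/interval pair fails at request time. A small guard can catch the mistake before you hit the API (check_combo is a hypothetical helper; the day limits are approximations based on the constraints noted above):

```python
# Hypothetical helper: intraday bars only reach back a limited number
# of days (e.g., ~730 for hourly, ~60 for 5-minute, per the notes above).
# Reject an impossible period/interval combination before calling the API.
INTRADAY_MAX_DAYS = {
    "1m": 7, "2m": 60, "5m": 60, "15m": 60, "30m": 60,
    "90m": 60, "60m": 730, "1h": 730,
}
PERIOD_DAYS = {
    "1d": 1, "5d": 5, "1mo": 30, "3mo": 91, "6mo": 182,
    "1y": 365, "2y": 730, "5y": 1825, "10y": 3650,
}

def check_combo(period: str, interval: str) -> bool:
    """Return True if the period fits inside the interval's lookback window."""
    limit = INTRADAY_MAX_DAYS.get(interval)
    if limit is None:  # daily or coarser: no intraday lookback cap
        return True
    days = PERIOD_DAYS.get(period)
    if days is None:   # "ytd"/"max" would need a real date calculation
        return False
    return days <= limit

print(check_combo("5d", "5m"))  # → True
print(check_combo("2y", "5m"))  # → False: 5m bars don't go back 2 years
```

A check like this is cheap insurance when the period and interval come from user input or a config file.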
Company Information
ticker = yf.Ticker("AAPL")
# Get all available info
info = ticker.info
# Useful information includes:
print(f"Company Name: {info['longName']}")
print(f"Sector: {info['sector']}")
print(f"Industry: {info['industry']}")
print(f"Market Cap: ${info['marketCap']:,}")
print(f"P/E Ratio: {info.get('trailingPE', 'N/A')}")
print(f"Forward P/E: {info.get('forwardPE', 'N/A')}")
print(f"Dividend Yield: {info.get('dividendYield', 0) * 100:.2f}%")
print(f"52 Week High: ${info['fiftyTwoWeekHigh']}")
print(f"52 Week Low: ${info['fiftyTwoWeekLow']}")
print(f"Average Volume: {info['averageVolume']:,}")
print(f"Website: {info['website']}")
Dividends and Splits
ticker = yf.Ticker("AAPL")
# Get dividend history
dividends = ticker.dividends
print("Recent Dividends:")
print(dividends.tail(10))
# Get stock split history
splits = ticker.splits
print("\nStock Splits:")
print(splits)
# Calculate total dividends received over a period
start_date = "2023-01-01"
end_date = "2024-01-01"
recent_dividends = dividends[start_date:end_date]
total_dividends = recent_dividends.sum()
print(f"\nTotal dividends from {start_date} to {end_date}: ${total_dividends:.2f}")
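You can exercise the same slice-and-sum pattern offline with a toy Series shaped like ticker.dividends, a Series of per-share payouts indexed by ex-dividend date (the amounts below are made up):

```python
import pandas as pd

# Toy stand-in for ticker.dividends: per-share payouts indexed by
# ex-dividend date (amounts are made up)
dividends = pd.Series(
    [0.23, 0.24, 0.24, 0.24, 0.25],
    index=pd.to_datetime([
        "2022-11-10", "2023-02-10", "2023-05-12",
        "2023-08-11", "2023-11-10",
    ]),
)

# Date-string slicing works directly on the DatetimeIndex
total_2023 = dividends["2023-01-01":"2024-01-01"].sum()
print(f"Total 2023 dividends per share: ${total_2023:.2f}")  # → $0.97
```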
Financial Statements
ticker = yf.Ticker("AAPL")
# Income Statement (annual)
income_stmt = ticker.financials
print("Income Statement:")
print(income_stmt)
# Income Statement (quarterly)
quarterly_income = ticker.quarterly_financials
print("\nQuarterly Income Statement:")
print(quarterly_income)
# Balance Sheet
balance_sheet = ticker.balance_sheet
print("\nBalance Sheet:")
print(balance_sheet)
# Cash Flow Statement
cash_flow = ticker.cashflow
print("\nCash Flow:")
print(cash_flow)
# Extract specific items
revenue = income_stmt.loc['Total Revenue']
net_income = income_stmt.loc['Net Income']
print(f"\nRevenue trend: {revenue.values}")
print(f"Net Income trend: {net_income.values}")
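The financials DataFrame puts line items in rows and fiscal period end dates in columns, newest first. Here is an offline sketch of computing year-over-year revenue growth from a statement shaped that way (illustrative figures, not actual filings):

```python
import pandas as pd

# Toy income statement in the shape yfinance returns: line items as
# rows, fiscal period end dates as columns, newest first
# (illustrative figures, not actual filings)
income_stmt = pd.DataFrame(
    {
        pd.Timestamp("2023-09-30"): [383_000e6, 97_000e6],
        pd.Timestamp("2022-09-30"): [394_000e6, 99_800e6],
        pd.Timestamp("2021-09-30"): [365_800e6, 94_700e6],
    },
    index=["Total Revenue", "Net Income"],
)

revenue = income_stmt.loc["Total Revenue"]
# Columns run newest-first, so reverse into chronological order
# before computing period-over-period growth
growth = revenue.iloc[::-1].pct_change() * 100
print(growth.round(1))
```

Reversing before pct_change matters: applied to the newest-first order, the signs of every growth figure would flip.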
Practical Example: Building a Stock Screener
import yfinance as yf
import pandas as pd
def screen_stock(ticker):
    """
    Screen a stock based on fundamental criteria
    """
    try:
        stock = yf.Ticker(ticker)
        info = stock.info
        # Extract key metrics
        metrics = {
            'Ticker': ticker,
            'Name': info.get('longName', 'N/A'),
            'Price': info.get('currentPrice', 0),
            'Market Cap': info.get('marketCap', 0),
            'P/E Ratio': info.get('trailingPE', None),
            'Forward P/E': info.get('forwardPE', None),
            'PEG Ratio': info.get('pegRatio', None),
            'Dividend Yield': info.get('dividendYield', 0) * 100 if info.get('dividendYield') else 0,
            'ROE': info.get('returnOnEquity', 0) * 100 if info.get('returnOnEquity') else 0,
            'Debt to Equity': info.get('debtToEquity', 0),
            '52W High': info.get('fiftyTwoWeekHigh', 0),
            '52W Low': info.get('fiftyTwoWeekLow', 0)
        }
        return metrics
    except Exception as e:
        print(f"Error processing {ticker}: {str(e)}")
        return None

# Screen multiple stocks
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'NVDA', 'JPM', 'V', 'WMT']
print("Screening stocks...")
results = []
for ticker in tickers:
    metrics = screen_stock(ticker)
    if metrics:
        results.append(metrics)

# Create DataFrame
df = pd.DataFrame(results)

# Apply screening criteria
# Example: P/E < 30, Dividend Yield > 1%, ROE > 15%
screened = df[
    (df['P/E Ratio'] < 30) &
    (df['P/E Ratio'] > 0) &
    (df['Dividend Yield'] > 1) &
    (df['ROE'] > 15)
]
print("\nStocks Meeting Criteria:")
print(screened[['Ticker', 'Name', 'P/E Ratio', 'Dividend Yield', 'ROE']])
3.3 Alpha Vantage: Advanced Market Data
What is Alpha Vantage?
Alpha Vantage provides free APIs for historical and near-real-time data on stocks, forex, and cryptocurrencies, plus server-side technical indicators. It requires a free API key but offers some features that yfinance lacks.
Getting Your API Key
- Visit www.alphavantage.co/support/#api-key
- Enter your email and get a free API key instantly
- The free tier is rate-limited (historically 5 API calls per minute and 500 per day; Alpha Vantage has since tightened its free limits, so check the current terms)
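Under the hood, the alpha_vantage package wraps a single REST endpoint, so you can also build requests yourself. A sketch of constructing the query URL (parameter names follow Alpha Vantage's public documentation; build_url is a hypothetical helper):

```python
from urllib.parse import urlencode

# Alpha Vantage is queried through one REST endpoint, with the
# function, symbol, and API key passed as query parameters
BASE_URL = "https://www.alphavantage.co/query"

def build_url(function: str, symbol: str, api_key: str, **extra) -> str:
    """Assemble a request URL for the Alpha Vantage query endpoint."""
    params = {"function": function, "symbol": symbol, "apikey": api_key}
    params.update(extra)  # e.g., outputsize, interval
    return f"{BASE_URL}?{urlencode(params)}"

url = build_url("TIME_SERIES_DAILY", "AAPL", "demo", outputsize="compact")
print(url)
```

Fetching this URL (with a real key in place of "demo") returns JSON you could parse yourself; the client library simply does that plus the pandas conversion for you.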
Installation
!pip install alpha_vantage
Basic Usage
from alpha_vantage.timeseries import TimeSeries
import pandas as pd
# Initialize with your API key
api_key = 'YOUR_API_KEY_HERE'
ts = TimeSeries(key=api_key, output_format='pandas')
# Get daily data
data, meta_data = ts.get_daily(symbol='AAPL', outputsize='full')
print(data.head())
# Columns: 1. open, 2. high, 3. low, 4. close, 5. volume
Different Data Types
Intraday Data
# Get intraday data (1min, 5min, 15min, 30min, 60min)
data, meta_data = ts.get_intraday(symbol='AAPL', interval='5min', outputsize='full')
print(data.head())
Adjusted Data (includes dividends and splits)
data, meta_data = ts.get_daily_adjusted(symbol='AAPL', outputsize='full')
# Includes: adjusted close, dividend amount, split coefficient
# Note: Alpha Vantage has since moved this endpoint to its premium tier
print(data.columns)
Weekly and Monthly Data
# Weekly data
weekly_data, meta_data = ts.get_weekly(symbol='AAPL')
# Monthly data
monthly_data, meta_data = ts.get_monthly(symbol='AAPL')
Technical Indicators
Alpha Vantage can calculate technical indicators for you:
from alpha_vantage.techindicators import TechIndicators
ti = TechIndicators(key=api_key, output_format='pandas')
# Simple Moving Average
sma_data, meta_data = ti.get_sma(symbol='AAPL', interval='daily', time_period=20)
# RSI (Relative Strength Index)
rsi_data, meta_data = ti.get_rsi(symbol='AAPL', interval='daily', time_period=14)
# MACD
macd_data, meta_data = ti.get_macd(symbol='AAPL', interval='daily')
# Bollinger Bands
bbands_data, meta_data = ti.get_bbands(symbol='AAPL', interval='daily', time_period=20)
print("RSI values:")
print(rsi_data.head())
Handling Rate Limits
import time
from alpha_vantage.timeseries import TimeSeries
api_key = 'YOUR_API_KEY_HERE'
ts = TimeSeries(key=api_key, output_format='pandas')
def download_multiple_stocks(tickers):
    """
    Download data for multiple stocks while respecting rate limits
    """
    all_data = {}
    for i, ticker in enumerate(tickers):
        print(f"Downloading {ticker} ({i+1}/{len(tickers)})...")
        try:
            data, meta_data = ts.get_daily(symbol=ticker, outputsize='compact')
            all_data[ticker] = data
            # Wait 12 seconds between calls (5 calls per minute limit)
            if i < len(tickers) - 1:
                time.sleep(12)
        except Exception as e:
            print(f"Error downloading {ticker}: {str(e)}")
    return all_data
# Download multiple stocks
tickers = ['AAPL', 'MSFT', 'GOOGL']
data = download_multiple_stocks(tickers)
3.4 Federal Reserve Economic Data (FRED)
What is FRED?
FRED (Federal Reserve Economic Data) provides over 800,000 economic time series from 100+ sources. It's the gold standard for macroeconomic data—GDP, inflation, unemployment, interest rates, and much more.
Installation
!pip install fredapi
Getting Your API Key
- Visit fred.stlouisfed.org/docs/api/api_key.html
- Create a free account
- Request an API key
Basic Usage
from fredapi import Fred
import pandas as pd
# Initialize
api_key = 'YOUR_FRED_API_KEY'
fred = Fred(api_key=api_key)
# Get data for a series (each series has a unique ID)
# GDP (Gross Domestic Product)
gdp = fred.get_series('GDP')
print(gdp.tail())
Important Economic Indicators
GDP and Growth
# Real GDP
real_gdp = fred.get_series('GDPC1')
# GDP Growth Rate
gdp_growth = fred.get_series('A191RL1Q225SBEA')
print("Recent Real GDP:")
print(real_gdp.tail())
Inflation
# Consumer Price Index (CPI)
cpi = fred.get_series('CPIAUCSL')
# Core CPI (excluding food and energy)
core_cpi = fred.get_series('CPILFESL')
# Calculate inflation rate (year-over-year)
inflation_rate = cpi.pct_change(periods=12) * 100
print("Recent Inflation Rate:")
print(inflation_rate.tail())
Interest Rates
# Federal Funds Rate
fed_funds = fred.get_series('FEDFUNDS')
# 10-Year Treasury Rate
treasury_10y = fred.get_series('DGS10')
# 30-Year Mortgage Rate
mortgage_30y = fred.get_series('MORTGAGE30US')
print("Current Rates:")
print(f"Federal Funds Rate: {fed_funds.iloc[-1]:.2f}%")
print(f"10-Year Treasury: {treasury_10y.iloc[-1]:.2f}%")
print(f"30-Year Mortgage: {mortgage_30y.iloc[-1]:.2f}%")
Employment
# Unemployment Rate
unemployment = fred.get_series('UNRATE')
# Non-Farm Payrolls
payrolls = fred.get_series('PAYEMS')
# Labor Force Participation Rate
participation = fred.get_series('CIVPART')
print("Employment Metrics:")
print(f"Unemployment Rate: {unemployment.iloc[-1]:.1f}%")
print(f"Labor Participation: {participation.iloc[-1]:.1f}%")
Searching for Data
# Search for series
search_results = fred.search('unemployment rate')
print(search_results.head())
# The search returns a DataFrame with series information
# Most important columns: id, title, observation_start, observation_end
Practical Example: Economic Dashboard
from fredapi import Fred
import pandas as pd
import numpy as np
api_key = 'YOUR_FRED_API_KEY'
fred = Fred(api_key=api_key)
def create_economic_dashboard(start_date='2020-01-01'):
    """
    Create a comprehensive economic dashboard
    """
    print("Fetching economic data...")
    # Download key indicators
    indicators = {
        'GDP Growth': 'A191RL1Q225SBEA',
        'Unemployment': 'UNRATE',
        'Inflation (CPI)': 'CPIAUCSL',
        'Fed Funds Rate': 'FEDFUNDS',
        '10Y Treasury': 'DGS10',
        'Consumer Sentiment': 'UMCSENT'
    }
    data = {}
    for name, series_id in indicators.items():
        try:
            series = fred.get_series(series_id, observation_start=start_date)
            data[name] = series
        except Exception as e:
            print(f"Error fetching {name}: {str(e)}")
    # Create DataFrame
    df = pd.DataFrame(data)
    # Calculate year-over-year inflation rate
    if 'Inflation (CPI)' in df.columns:
        df['Inflation Rate (YoY %)'] = df['Inflation (CPI)'].pct_change(periods=12) * 100
    # Forward-fill so lower-frequency series (e.g., quarterly GDP)
    # still show their latest value in the final row
    latest = df.ffill().iloc[-1]
    print("\n" + "="*60)
    print("ECONOMIC DASHBOARD - Latest Values")
    print("="*60)
    for indicator in latest.index:
        value = latest[indicator]
        if pd.notna(value):
            print(f"{indicator:.<30} {value:.2f}")
    print("="*60)
    return df
# Create dashboard
economic_data = create_economic_dashboard()
3.5 Building a Data Pipeline
What is a Data Pipeline?
A data pipeline is an automated system that regularly fetches, processes, and stores data. Instead of manually downloading data each time you need it, a pipeline keeps your data fresh automatically.
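Before writing a full pipeline, it helps to isolate its core step: merging freshly downloaded rows into data already on disk while dropping overlapping dates. A minimal offline sketch with toy price frames:

```python
import pandas as pd

# Toy frames standing in for data already saved to disk and a fresh
# download that overlaps it by one day
existing = pd.DataFrame(
    {"Close": [100.0, 101.5, 102.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
new = pd.DataFrame(
    {"Close": [102.3, 103.1]},
    index=pd.to_datetime(["2024-01-04", "2024-01-05"]),
)

# Append, then drop duplicated dates, keeping the freshly fetched row
updated = pd.concat([existing, new])
updated = updated[~updated.index.duplicated(keep="last")].sort_index()
print(len(updated))  # → 4 rows: the overlapping Jan 4 row was replaced
```

Keeping the newer row on overlap matters because the most recent bar in a saved file may have been captured mid-session and is superseded by the next download.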
Simple Data Pipeline Example
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta
import os
class StockDataPipeline:
    """
    Automated pipeline for stock data
    """
    def __init__(self, tickers, data_folder='stock_data'):
        self.tickers = tickers
        self.data_folder = data_folder
        # Create data folder if it doesn't exist
        if not os.path.exists(data_folder):
            os.makedirs(data_folder)

    def download_data(self, period='1y'):
        """
        Download data for all tickers
        """
        print(f"Downloading data for {len(self.tickers)} stocks...")
        for ticker in self.tickers:
            try:
                print(f"  Fetching {ticker}...")
                # auto_adjust=False keeps the 'Adj Close' column
                # (newer yfinance versions adjust prices by default)
                data = yf.download(ticker, period=period, progress=False, auto_adjust=False)
                if not data.empty:
                    # Save to CSV
                    filename = f"{self.data_folder}/{ticker}.csv"
                    data.to_csv(filename)
                    print(f"  ✓ Saved {ticker} ({len(data)} rows)")
                else:
                    print(f"  ✗ No data for {ticker}")
            except Exception as e:
                print(f"  ✗ Error with {ticker}: {str(e)}")
        print("Download complete!")

    def load_data(self, ticker):
        """
        Load data for a specific ticker
        """
        filename = f"{self.data_folder}/{ticker}.csv"
        if os.path.exists(filename):
            data = pd.read_csv(filename, index_col=0, parse_dates=True)
            return data
        else:
            print(f"No data found for {ticker}")
            return None

    def update_data(self):
        """
        Update existing data with new records
        """
        print("Updating data...")
        for ticker in self.tickers:
            try:
                # Load existing data
                existing = self.load_data(ticker)
                if existing is not None:
                    # Get last date in existing data
                    last_date = existing.index[-1]
                    # Download data from the day after the last date to now
                    new_data = yf.download(
                        ticker,
                        start=last_date + timedelta(days=1),
                        progress=False,
                        auto_adjust=False
                    )
                    if not new_data.empty:
                        # Combine old and new data, dropping duplicate dates
                        updated = pd.concat([existing, new_data])
                        updated = updated[~updated.index.duplicated(keep='last')]
                        # Save updated data
                        filename = f"{self.data_folder}/{ticker}.csv"
                        updated.to_csv(filename)
                        print(f"  ✓ Updated {ticker} (+{len(new_data)} rows)")
                    else:
                        print(f"  - {ticker} already up to date")
            except Exception as e:
                print(f"  ✗ Error updating {ticker}: {str(e)}")
        print("Update complete!")

    def get_portfolio_data(self):
        """
        Load all stock data into a single DataFrame
        """
        all_data = {}
        for ticker in self.tickers:
            data = self.load_data(ticker)
            if data is not None:
                all_data[ticker] = data['Adj Close']
        # Combine into single DataFrame
        portfolio_df = pd.DataFrame(all_data)
        return portfolio_df
# Example usage
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA']
pipeline = StockDataPipeline(tickers)
# Initial download
pipeline.download_data(period='2y')
# Load specific stock
aapl_data = pipeline.load_data('AAPL')
print(aapl_data.tail())
# Get all stocks in one DataFrame
portfolio = pipeline.get_portfolio_data()
print(portfolio.head())
# Update data (run this periodically)
pipeline.update_data()
Advanced Pipeline with Error Handling and Logging
import yfinance as yf
import pandas as pd
from datetime import datetime
import os
import time
import logging

class AdvancedDataPipeline:
    """
    Production-ready data pipeline with logging and error handling
    """
    def __init__(self, tickers, data_folder='stock_data'):
        self.tickers = tickers
        self.data_folder = data_folder
        # Setup folders
        if not os.path.exists(data_folder):
            os.makedirs(data_folder)
        # Setup logging
        log_folder = 'logs'
        if not os.path.exists(log_folder):
            os.makedirs(log_folder)
        logging.basicConfig(
            level=logging.INFO,
            format='%(asctime)s - %(levelname)s - %(message)s',
            handlers=[
                logging.FileHandler(f'{log_folder}/pipeline.log'),
                logging.StreamHandler()
            ]
        )
        self.logger = logging.getLogger(__name__)

    def download_with_retry(self, ticker, period='1y', max_retries=3):
        """
        Download data with retry logic and exponential backoff
        """
        for attempt in range(max_retries):
            try:
                data = yf.download(ticker, period=period, progress=False)
                if not data.empty:
                    return data
                self.logger.warning(f"Empty data for {ticker}, attempt {attempt + 1}")
            except Exception as e:
                self.logger.error(f"Error downloading {ticker}, attempt {attempt + 1}: {str(e)}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
        return None

    def download_all(self, period='1y'):
        """
        Download all tickers with comprehensive error handling
        """
        self.logger.info(f"Starting download for {len(self.tickers)} tickers")
        success_count = 0
        fail_count = 0
        for ticker in self.tickers:
            data = self.download_with_retry(ticker, period)
            if data is not None:
                filename = f"{self.data_folder}/{ticker}.csv"
                data.to_csv(filename)
                self.logger.info(f"✓ {ticker}: Downloaded {len(data)} rows")
                success_count += 1
            else:
                self.logger.error(f"✗ {ticker}: Failed to download")
                fail_count += 1
        self.logger.info(f"Download complete: {success_count} succeeded, {fail_count} failed")
        return success_count, fail_count

    def validate_data(self, ticker):
        """
        Validate data quality
        """
        data = self.load_data(ticker)
        if data is None:
            return False
        issues = []
        # Check for missing values
        if data.isnull().any().any():
            issues.append("Contains missing values")
        # Check for duplicate dates
        if data.index.duplicated().any():
            issues.append("Contains duplicate dates")
        # Check for zero or negative prices
        if (data['Close'] <= 0).any():
            issues.append("Contains invalid prices")
        # Check data recency (should be within last 7 days)
        days_old = (datetime.now() - data.index[-1]).days
        if days_old > 7:
            issues.append(f"Data is {days_old} days old")
        if issues:
            self.logger.warning(f"{ticker} validation issues: {', '.join(issues)}")
            return False
        self.logger.info(f"{ticker} validation passed")
        return True

    def load_data(self, ticker):
        """
        Load data with error handling
        """
        filename = f"{self.data_folder}/{ticker}.csv"
        try:
            if os.path.exists(filename):
                data = pd.read_csv(filename, index_col=0, parse_dates=True)
                return data
            self.logger.warning(f"No file found for {ticker}")
            return None
        except Exception as e:
            self.logger.error(f"Error loading {ticker}: {str(e)}")
            return None

    def health_check(self):
        """
        Check pipeline health
        """
        self.logger.info("Running health check...")
        valid_count = 0
        invalid_count = 0
        for ticker in self.tickers:
            if self.validate_data(ticker):
                valid_count += 1
            else:
                invalid_count += 1
        health_status = {
            'total_tickers': len(self.tickers),
            'valid': valid_count,
            'invalid': invalid_count,
            'health_percentage': (valid_count / len(self.tickers)) * 100
        }
        self.logger.info(f"Health check: {health_status['health_percentage']:.1f}% healthy")
        return health_status
# Example usage
pipeline = AdvancedDataPipeline(['AAPL', 'MSFT', 'GOOGL', 'AMZN'])
# Download data
pipeline.download_all(period='1y')
# Check pipeline health
health = pipeline.health_check()
print(f"\nPipeline Health: {health['health_percentage']:.1f}%")
3.6 Best Practices for Working with APIs
Rate Limiting
Most free APIs have rate limits. Respect them to avoid getting blocked.
import time
def rate_limited_download(tickers, api_function, calls_per_minute=5):
    """
    Download data while respecting rate limits
    """
    delay = 60 / calls_per_minute  # seconds between calls
    results = {}
    for i, ticker in enumerate(tickers):
        print(f"Processing {ticker} ({i+1}/{len(tickers)})...")
        try:
            results[ticker] = api_function(ticker)
            # Wait between calls (except after the last one)
            if i < len(tickers) - 1:
                time.sleep(delay)
        except Exception as e:
            print(f"Error with {ticker}: {str(e)}")
    return results
Caching Data
Avoid redundant API calls by caching data locally.
import os
import pickle
from datetime import datetime, timedelta
class DataCache:
    """
    Simple caching system for API data
    """
    def __init__(self, cache_folder='cache', cache_duration_hours=24):
        self.cache_folder = cache_folder
        self.cache_duration = timedelta(hours=cache_duration_hours)
        if not os.path.exists(cache_folder):
            os.makedirs(cache_folder)

    def get(self, key):
        """
        Get data from cache if it's fresh
        """
        cache_file = f"{self.cache_folder}/{key}.pkl"
        if os.path.exists(cache_file):
            # Check if cache is still fresh
            file_time = datetime.fromtimestamp(os.path.getmtime(cache_file))
            age = datetime.now() - file_time
            if age < self.cache_duration:
                with open(cache_file, 'rb') as f:
                    print(f"✓ Loading {key} from cache")
                    return pickle.load(f)
        return None

    def set(self, key, data):
        """
        Save data to cache
        """
        cache_file = f"{self.cache_folder}/{key}.pkl"
        with open(cache_file, 'wb') as f:
            pickle.dump(data, f)
        print(f"✓ Cached {key}")

    def clear(self):
        """
        Clear all cache files
        """
        for file in os.listdir(self.cache_folder):
            os.remove(os.path.join(self.cache_folder, file))
        print("Cache cleared")
# Example usage with yfinance
cache = DataCache(cache_duration_hours=12)
def get_stock_data_cached(ticker, period='1y'):
    """
    Get stock data with caching
    """
    cache_key = f"{ticker}_{period}"
    # Try to get from cache
    data = cache.get(cache_key)
    if data is None:
        # Not in cache or expired, download fresh data
        print(f"Downloading {ticker}...")
        data = yf.download(ticker, period=period, progress=False)
        cache.set(cache_key, data)
    return data
# First call downloads data
aapl_data = get_stock_data_cached('AAPL', '1y')
# Second call uses cached data (if within cache duration)
aapl_data = get_stock_data_cached('AAPL', '1y')
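The decision at the heart of this cache—is the file on disk still fresh?—can be pulled out into a pure function and tested without touching the filesystem (is_fresh is a hypothetical helper mirroring the age check in DataCache.get):

```python
from datetime import datetime, timedelta

def is_fresh(file_time: datetime, now: datetime,
             max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True if a cache entry written at file_time is still usable."""
    return (now - file_time) < max_age

now = datetime(2024, 6, 1, 12, 0)
print(is_fresh(datetime(2024, 6, 1, 1, 0), now))    # → True (11 hours old)
print(is_fresh(datetime(2024, 5, 30, 12, 0), now))  # → False (48 hours old)
```

Passing `now` as a parameter instead of calling datetime.now() inside the function is what makes the freshness rule easy to test deterministically.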
Error Handling
Always handle errors gracefully.
def safe_api_call(function, *args, **kwargs):
    """
    Wrapper for safe API calls with error handling
    """
    try:
        return function(*args, **kwargs)
    except ConnectionError:
        print("Network connection error. Check your internet connection.")
        return None
    except TimeoutError:
        print("API request timed out. Try again later.")
        return None
    except KeyError as e:
        print(f"Data not available: {str(e)}")
        return None
    except Exception as e:
        print(f"Unexpected error: {str(e)}")
        return None
# Example usage
data = safe_api_call(yf.download, 'AAPL', period='1y')
if data is not None:
    print("Data downloaded successfully")
else:
    print("Failed to download data")
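You can watch the wrapper pattern work without any network access by applying the same idea to a function that fails in a controlled way (safe_call and lookup are illustrative stand-ins, not part of any library):

```python
def safe_call(func, *args, **kwargs):
    """Run func, turning known failure modes into a None return."""
    try:
        return func(*args, **kwargs)
    except KeyError as e:
        print(f"Data not available: {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

def lookup(info, field):
    return info[field]  # raises KeyError when the field is missing

info = {"longName": "Apple Inc."}
print(safe_call(lookup, info, "longName"))    # → Apple Inc.
print(safe_call(lookup, info, "trailingPE"))  # → None, after an error message
```

The same pattern is why `info.get(...)` appeared throughout the screener earlier: Yahoo's info dictionaries routinely omit fields, and a missing key should degrade gracefully rather than crash a batch job.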
3.7 Practice Exercises
Exercise 1: Multi-Source Data Comparison
# Your task: Compare the same stock data from different sources
# 1. Download AAPL data from yfinance
# 2. Download AAPL data from Alpha Vantage
# 3. Compare the closing prices
# 4. Calculate any differences
# 5. Visualize the comparison
# 6. Which source would you trust more and why?
Exercise 2: Economic Indicator Analysis
# Your task: Analyze relationship between economic indicators and stock market
# 1. Download S&P 500 data (^GSPC) using yfinance
# 2. Download unemployment rate from FRED
# 3. Download GDP growth from FRED
# 4. Align the data on matching dates
# 5. Calculate correlation between stock returns and economic indicators
# 6. Create a report summarizing your findings
Exercise 3: Build a Data Update System
# Your task: Create an automated data update system
# 1. Create a function that checks if data exists locally
# 2. If data exists, check if it's current (less than 1 day old)
# 3. If not current, download latest data
# 4. Add logging to track all operations
# 5. Add error handling for network failures
# 6. Test with multiple tickers
Exercise 4: Sector Performance Tracker
# Your task: Track performance across different sectors
# 1. Select 2-3 stocks from each of these sectors:
# - Technology (e.g., AAPL, MSFT, GOOGL)
# - Finance (e.g., JPM, BAC, GS)
# - Healthcare (e.g., JNJ, PFE, UNH)
# - Energy (e.g., XOM, CVX, COP)
# 2. Download 1 year of data for all stocks
# 3. Calculate returns for each stock
# 4. Calculate average return by sector
# 5. Determine which sector performed best
# 6. Save results to a CSV file
Module 3 Summary
Congratulations! You've mastered the art of accessing and managing financial data from multiple sources.
What You've Accomplished
Data Source Mastery
- Understanding the financial data ecosystem
- Working fluently with yfinance for market data
- Accessing economic indicators via FRED
- Using Alpha Vantage for advanced data
- Navigating multiple data providers
Technical Skills
- Downloading historical price data programmatically
- Accessing company fundamentals and financial statements
- Retrieving economic indicators and macroeconomic data
- Building automated data pipelines
- Implementing caching and rate limiting
- Handling API errors gracefully
Pipeline Development
- Creating reusable data retrieval functions
- Building automated update systems
- Implementing logging and monitoring
- Validating data quality
- Managing local data storage
Real-World Capabilities
You can now:
- Access virtually any financial data you need for analysis
- Build systems that keep your data fresh automatically
- Compare data across multiple sources
- Handle the complexities of real-world APIs
- Create professional-grade data pipelines
What's Next
In Module 4, we'll use all this data to perform exploratory data analysis. You'll learn to calculate returns, measure risk, analyze correlations, identify trends, and extract meaningful insights from the data you can now access effortlessly.
Before Moving Forward
Ensure you're comfortable with:
- Downloading data from multiple sources
- Understanding different data formats and structures
- Basic error handling and retry logic
- The concept of data pipelines
- Working with API keys and rate limits
Practical Advice
- Get Your API Keys: Set up accounts with Alpha Vantage and FRED now—you'll use them throughout the course
- Build Your Library: Create a collection of reusable functions for common data tasks
- Practice Daily: Try downloading different stocks, experimenting with time periods, exploring various economic indicators
- Start Small: Test your pipelines with a few tickers before scaling up
The Power You've Gained
Data access is often the biggest hurdle in financial analysis. You've just cleared that hurdle. With reliable access to professional-quality financial data, you're now equipped to perform analyses that rival those done at major financial institutions.
The data is at your fingertips. Now let's learn what to do with it.
Continue to Module 4: Exploratory Data Analysis for Finance →

