Module 3: Financial Data Sources & APIs
Learning Objectives
By the end of this module, you will:
- Understand where to find high-quality financial data
- Master working with financial APIs programmatically
- Use yfinance library for comprehensive market data
- Access economic data from Federal Reserve (FRED)
- Work with multiple data providers and their unique features
- Build automated data pipelines that update continuously
- Handle API authentication and rate limits
- Create reusable data retrieval functions
3.1 The Financial Data Landscape
Where Does Financial Data Come From?
Financial data flows from countless sources, each serving different purposes. Understanding this ecosystem helps you choose the right data for your analysis.
Primary Sources
- Exchanges: NYSE, NASDAQ, LSE—where securities actually trade
- Central Banks: Federal Reserve, ECB, providing economic indicators
- Regulatory Filings: SEC, where companies report financials
- Market Data Vendors: Bloomberg, Refinitiv, selling aggregated data
Access Methods
- Direct APIs: Real-time access through code (what we'll focus on)
- Data Aggregators: Services that compile data from multiple sources
- Downloaded Files: Historical data in CSV/Excel format
- Web Scraping: Extracting data from websites (use cautiously—respect terms of service)
Free vs. Paid Data Sources
Free Sources (Perfect for Learning and Personal Use)
- Yahoo Finance (via yfinance): Stock prices, dividends, splits
- Alpha Vantage: Stocks, forex, cryptocurrencies
- FRED (Federal Reserve): Economic indicators
- Quandl (now Nasdaq Data Link): Various financial and economic datasets
- IEX Cloud: Market data with a free tier (note: the service was retired in 2024, so check current availability)
Paid Sources (Professional Use)
- Bloomberg Terminal: $20,000+/year—industry standard
- Refinitiv Eikon: Comprehensive financial data
- FactSet: Institutional-grade data and analytics
- Polygon.io: Real-time and historical market data
For this course, we'll focus on free sources that provide professional-quality data for learning and analysis.
3.2 Deep Dive into yfinance
Why yfinance?
The yfinance library is an open-source Python package that pulls data from Yahoo Finance's publicly available endpoints. It isn't an official Yahoo API, but it's completely free, requires no registration, and provides extensive historical data for stocks, ETFs, mutual funds, currencies, and cryptocurrencies worldwide.
What yfinance Provides
- Historical price data (open, high, low, close, volume)
- Dividends and stock splits
- Company information and statistics
- Financial statements (income statement, balance sheet, cash flow)
- Options data
- Major holders and institutional holders
Installation and Import
# Install if not already installed
!pip install yfinance
# Import
import yfinance as yf
import pandas as pd
import numpy as np
The Ticker Object: Your Gateway to Data
# Create a Ticker object
aapl = yf.Ticker("AAPL")
# This object gives you access to everything about Apple stock
Downloading Historical Price Data
Basic Download
import yfinance as yf
# Download 1 month of data
data = yf.download("AAPL", period="1mo")
print(data.head())
# Download with specific dates
data = yf.download("AAPL", start="2023-01-01", end="2024-01-01")
# Download multiple stocks at once
tickers = ["AAPL", "MSFT", "GOOGL"]
data = yf.download(tickers, period="6mo")
Understanding the Data Structure
# Single stock returns a DataFrame indexed by Date with columns:
# Open, High, Low, Close, Adj Close, Volume
print(data.columns)
# Output: ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
# Note: newer yfinance versions adjust prices by default (auto_adjust=True),
# which drops 'Adj Close'; pass auto_adjust=False to keep it
# Multiple stocks returns a MultiIndex DataFrame
# Level 0: Price type (Open, High, Low, etc.)
# Level 1: Ticker symbol
# Access closing prices for all stocks
closes = data['Close']
print(closes.head())
Different Time Periods
# Valid period values
# 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
# 5 years of data
data = yf.download("AAPL", period="5y")
# Year to date
data = yf.download("AAPL", period="ytd")
# Maximum available history
data = yf.download("AAPL", period="max")
Different Time Intervals
# Valid intervals
# 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo
# Daily data (default)
data = yf.download("AAPL", period="1mo", interval="1d")
# Hourly data (limited to last 730 days)
data = yf.download("AAPL", period="1mo", interval="1h")
# Weekly data
data = yf.download("AAPL", period="2y", interval="1wk")
# 5-minute data (limited to last 60 days)
data = yf.download("AAPL", period="5d", interval="5m")
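Because intraday bars only reach back a limited window, a mismatched period/interval pair fails at request time. A small guard can catch the mistake before you hit the API (check_combo is a hypothetical helper; the day limits are approximations based on the constraints noted above):

```python
# Hypothetical helper: intraday bars only reach back a limited number
# of days (e.g., ~730 for hourly, ~60 for 5-minute, per the notes above).
# Reject an impossible period/interval combination before calling the API.
INTRADAY_MAX_DAYS = {
    "1m": 7, "2m": 60, "5m": 60, "15m": 60, "30m": 60,
    "90m": 60, "60m": 730, "1h": 730,
}
PERIOD_DAYS = {
    "1d": 1, "5d": 5, "1mo": 30, "3mo": 91, "6mo": 182,
    "1y": 365, "2y": 730, "5y": 1825, "10y": 3650,
}

def check_combo(period: str, interval: str) -> bool:
    """Return True if the period fits inside the interval's lookback window."""
    limit = INTRADAY_MAX_DAYS.get(interval)
    if limit is None:  # daily or coarser: no intraday lookback cap
        return True
    days = PERIOD_DAYS.get(period)
    if days is None:   # "ytd"/"max" would need a real date calculation
        return False
    return days <= limit

print(check_combo("5d", "5m"))  # → True
print(check_combo("2y", "5m"))  # → False: 5m bars don't go back 2 years
```

A check like this is cheap insurance when the period and interval come from user input or a config file.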
Company Information
ticker = yf.Ticker("AAPL")
# Get all available info
info = ticker.info
# Useful information includes:
print(f"Company Name: {info['longName']}")
print(f"Sector: {info['sector']}")
print(f"Industry: {info['industry']}")
print(f"Market Cap: ${info['marketCap']:,}")
print(f"P/E Ratio: {info.get('trailingPE', 'N/A')}")
print(f"Forward P/E: {info.get('forwardPE', 'N/A')}")
print(f"Dividend Yield: {info.get('dividendYield', 0) * 100:.2f}%")
print(f"52 Week High: ${info['fiftyTwoWeekHigh']}")
print(f"52 Week Low: ${info['fiftyTwoWeekLow']}")
print(f"Average Volume: {info['averageVolume']:,}")
print(f"Website: {info['website']}")
Dividends and Splits
ticker = yf.Ticker("AAPL")
# Get dividend history
dividends = ticker.dividends
print("Recent Dividends:")
print(dividends.tail(10))
# Get stock split history
splits = ticker.splits
print("\nStock Splits:")
print(splits)
# Calculate total dividends received over a period
start_date = "2023-01-01"
end_date = "2024-01-01"
recent_dividends = dividends[start_date:end_date]
total_dividends = recent_dividends.sum()
print(f"\nTotal dividends from {start_date} to {end_date}: ${total_dividends:.2f}")
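You can exercise the same slice-and-sum pattern offline with a toy Series shaped like ticker.dividends, a Series of per-share payouts indexed by ex-dividend date (the amounts below are made up):

```python
import pandas as pd

# Toy stand-in for ticker.dividends: per-share payouts indexed by
# ex-dividend date (amounts are made up)
dividends = pd.Series(
    [0.23, 0.24, 0.24, 0.24, 0.25],
    index=pd.to_datetime([
        "2022-11-10", "2023-02-10", "2023-05-12",
        "2023-08-11", "2023-11-10",
    ]),
)

# Date-string slicing works directly on the DatetimeIndex
total_2023 = dividends["2023-01-01":"2024-01-01"].sum()
print(f"Total 2023 dividends per share: ${total_2023:.2f}")  # → $0.97
```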
Financial Statements
ticker = yf.Ticker("AAPL")
# Income Statement (annual)
income_stmt = ticker.financials
print("Income Statement:")
print(income_stmt)
# Income Statement (quarterly)
quarterly_income = ticker.quarterly_financials
print("\nQuarterly Income Statement:")
print(quarterly_income)
# Balance Sheet
balance_sheet = ticker.balance_sheet
print("\nBalance Sheet:")
print(balance_sheet)
# Cash Flow Statement
cash_flow = ticker.cashflow
print("\nCash Flow:")
print(cash_flow)
# Extract specific items
revenue = income_stmt.loc['Total Revenue']
net_income = income_stmt.loc['Net Income']
print(f"\nRevenue trend: {revenue.values}")
print(f"Net Income trend: {net_income.values}")
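The financials DataFrame puts line items in rows and fiscal period end dates in columns, newest first. Here is an offline sketch of computing year-over-year revenue growth from a statement shaped that way (illustrative figures, not actual filings):

```python
import pandas as pd

# Toy income statement in the shape yfinance returns: line items as
# rows, fiscal period end dates as columns, newest first
# (illustrative figures, not actual filings)
income_stmt = pd.DataFrame(
    {
        pd.Timestamp("2023-09-30"): [383_000e6, 97_000e6],
        pd.Timestamp("2022-09-30"): [394_000e6, 99_800e6],
        pd.Timestamp("2021-09-30"): [365_800e6, 94_700e6],
    },
    index=["Total Revenue", "Net Income"],
)

revenue = income_stmt.loc["Total Revenue"]
# Columns run newest-first, so reverse into chronological order
# before computing period-over-period growth
growth = revenue.iloc[::-1].pct_change() * 100
print(growth.round(1))
```

Reversing before pct_change matters: applied to the newest-first order, the signs of every growth figure would flip.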
Practical Example: Building a Stock Screener
import yfinance as yf
import pandas as pd
def screen_stock(ticker):
    """
    Screen a stock based on fundamental criteria
    """
    try:
        stock = yf.Ticker(ticker)
        info = stock.info
        # Extract key metrics
        metrics = {
            'Ticker': ticker,
            'Name': info.get('longName', 'N/A'),
            'Price': info.get('currentPrice', 0),
            'Market Cap': info.get('marketCap', 0),
            'P/E Ratio': info.get('trailingPE', None),
            'Forward P/E': info.get('forwardPE', None),
            'PEG Ratio': info.get('pegRatio', None),
            'Dividend Yield': info.get('dividendYield', 0) * 100 if info.get('dividendYield') else 0,
            'ROE': info.get('returnOnEquity', 0) * 100 if info.get('returnOnEquity') else 0,
            'Debt to Equity': info.get('debtToEquity', 0),
            '52W High': info.get('fiftyTwoWeekHigh', 0),
            '52W Low': info.get('fiftyTwoWeekLow', 0)
        }
        return metrics
    except Exception as e:
        print(f"Error processing {ticker}: {str(e)}")
        return None

# Screen multiple stocks
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'NVDA', 'JPM', 'V', 'WMT']
print("Screening stocks...")
results = []
for ticker in tickers:
    metrics = screen_stock(ticker)
    if metrics:
        results.append(metrics)

# Create DataFrame
df = pd.DataFrame(results)

# Apply screening criteria
# Example: P/E < 30, Dividend Yield > 1%, ROE > 15%
screened = df[
    (df['P/E Ratio'] < 30) &
    (df['P/E Ratio'] > 0) &
    (df['Dividend Yield'] > 1) &
    (df['ROE'] > 15)
]
print("\nStocks Meeting Criteria:")
print(screened[['Ticker', 'Name', 'P/E Ratio', 'Dividend Yield', 'ROE']])
3.3 Alpha Vantage: Advanced Market Data
What is Alpha Vantage?
Alpha Vantage provides free APIs for historical and near-real-time data on stocks, forex, and cryptocurrencies, plus server-side technical indicators. It requires a free API key but offers some features that yfinance lacks.
Getting Your API Key
- Visit www.alphavantage.co/support/#api-key
- Enter your email and get a free API key instantly
- The free tier is rate-limited (historically 5 API calls per minute and 500 per day; Alpha Vantage has since tightened its free limits, so check the current terms)
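Under the hood, the alpha_vantage package wraps a single REST endpoint, so you can also build requests yourself. A sketch of constructing the query URL (parameter names follow Alpha Vantage's public documentation; build_url is a hypothetical helper):

```python
from urllib.parse import urlencode

# Alpha Vantage is queried through one REST endpoint, with the
# function, symbol, and API key passed as query parameters
BASE_URL = "https://www.alphavantage.co/query"

def build_url(function: str, symbol: str, api_key: str, **extra) -> str:
    """Assemble a request URL for the Alpha Vantage query endpoint."""
    params = {"function": function, "symbol": symbol, "apikey": api_key}
    params.update(extra)  # e.g., outputsize, interval
    return f"{BASE_URL}?{urlencode(params)}"

url = build_url("TIME_SERIES_DAILY", "AAPL", "demo", outputsize="compact")
print(url)
```

Fetching this URL (with a real key in place of "demo") returns JSON you could parse yourself; the client library simply does that plus the pandas conversion for you.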
Installation
!pip install alpha_vantage
Basic Usage
from alpha_vantage.timeseries import TimeSeries
import pandas as pd
# Initialize with your API key
api_key = 'YOUR_API_KEY_HERE'
ts = TimeSeries(key=api_key, output_format='pandas')
# Get daily data
data, meta_data = ts.get_daily(symbol='AAPL', outputsize='full')
print(data.head())
# Columns: 1. open, 2. high, 3. low, 4. close, 5. volume
Different Data Types
Intraday Data
# Get intraday data (1min, 5min, 15min, 30min, 60min)
data, meta_data = ts.get_intraday(symbol='AAPL', interval='5min', outputsize='full')
print(data.head())
Adjusted Data (includes dividends and splits)
data, meta_data = ts.get_daily_adjusted(symbol='AAPL', outputsize='full')
# Includes: adjusted close, dividend amount, split coefficient
# Note: Alpha Vantage has since moved this endpoint to its premium tier
print(data.columns)
Weekly and Monthly Data
# Weekly data
weekly_data, meta_data = ts.get_weekly(symbol='AAPL')
# Monthly data
monthly_data, meta_data = ts.get_monthly(symbol='AAPL')
Technical Indicators
Alpha Vantage can calculate technical indicators for you:
from alpha_vantage.techindicators import TechIndicators
ti = TechIndicators(key=api_key, output_format='pandas')
# Simple Moving Average
sma_data, meta_data = ti.get_sma(symbol='AAPL', interval='daily', time_period=20)
# RSI (Relative Strength Index)
rsi_data, meta_data = ti.get_rsi(symbol='AAPL', interval='daily', time_period=14)
# MACD
macd_data, meta_data = ti.get_macd(symbol='AAPL', interval='daily')
# Bollinger Bands
bbands_data, meta_data = ti.get_bbands(symbol='AAPL', interval='daily', time_period=20)
print("RSI values:")
print(rsi_data.head())
Handling Rate Limits
import time
from alpha_vantage.timeseries import TimeSeries
api_key = 'YOUR_API_KEY_HERE'
ts = TimeSeries(key=api_key, output_format='pandas')
def download_multiple_stocks(tickers):
    """
    Download data for multiple stocks while respecting rate limits
    """
    all_data = {}
    for i, ticker in enumerate(tickers):
        print(f"Downloading {ticker} ({i+1}/{len(tickers)})...")
        try:
            data, meta_data = ts.get_daily(symbol=ticker, outputsize='compact')
            all_data[ticker] = data
            # Wait 12 seconds between calls (5 calls per minute limit)
            if i < len(tickers) - 1:
                time.sleep(12)
        except Exception as e:
            print(f"Error downloading {ticker}: {str(e)}")
    return all_data
# Download multiple stocks
tickers = ['AAPL', 'MSFT', 'GOOGL']
data = download_multiple_stocks(tickers)
3.4 Federal Reserve Economic Data (FRED)
What is FRED?
FRED (Federal Reserve Economic Data) provides over 800,000 economic time series from 100+ sources. It's the gold standard for macroeconomic data—GDP, inflation, unemployment, interest rates, and much more.
Installation
!pip install fredapi
Getting Your API Key
- Visit fred.stlouisfed.org/docs/api/api_key.html
- Create a free account
- Request an API key
Basic Usage
from fredapi import Fred
import pandas as pd
# Initialize
api_key = 'YOUR_FRED_API_KEY'
fred = Fred(api_key=api_key)
# Get data for a series (each series has a unique ID)
# GDP (Gross Domestic Product)
gdp = fred.get_series('GDP')
print(gdp.tail())
Important Economic Indicators
GDP and Growth
# Real GDP
real_gdp = fred.get_series('GDPC1')
# GDP Growth Rate
gdp_growth = fred.get_series('A191RL1Q225SBEA')
print("Recent Real GDP:")
print(real_gdp.tail())
Inflation
# Consumer Price Index (CPI)
cpi = fred.get_series('CPIAUCSL')
# Core CPI (excluding food and energy)
core_cpi = fred.get_series('CPILFESL')
# Calculate inflation rate (year-over-year)
inflation_rate = cpi.pct_change(periods=12) * 100
print("Recent Inflation Rate:")
print(inflation_rate.tail())
Interest Rates
# Federal Funds Rate
fed_funds = fred.get_series('FEDFUNDS')
# 10-Year Treasury Rate
treasury_10y = fred.get_series('DGS10')
# 30-Year Mortgage Rate
mortgage_30y = fred.get_series('MORTGAGE30US')
print("Current Rates:")
print(f"Federal Funds Rate: {fed_funds.iloc[-1]:.2f}%")
print(f"10-Year Treasury: {treasury_10y.iloc[-1]:.2f}%")
print(f"30-Year Mortgage: {mortgage_30y.iloc[-1]:.2f}%")
Employment
# Unemployment Rate
unemployment = fred.get_series('UNRATE')
# Non-Farm Payrolls
payrolls = fred.get_series('PAYEMS')
# Labor Force Participation Rate
participation = fred.get_series('CIVPART')
print("Employment Metrics:")
print(f"Unemployment Rate: {unemployment.iloc[-1]:.1f}%")
print(f"Labor Participation: {participation.iloc[-1]:.1f}%")
Searching for Data
# Search for series
search_results = fred.search('unemployment rate')
print(search_results.head())
# The search returns a DataFrame with series information
# Most important columns: id, title, observation_start, observation_end
Practical Example: Economic Dashboard
from fredapi import Fred
import pandas as pd
import numpy as np
api_key = 'YOUR_FRED_API_KEY'
fred = Fred(api_key=api_key)
def create_economic_dashboard(start_date='2020-01-01'):
    """
    Create a comprehensive economic dashboard
    """
    print("Fetching economic data...")
    # Download key indicators
    indicators = {
        'GDP Growth': 'A191RL1Q225SBEA',
        'Unemployment': 'UNRATE',
        'Inflation (CPI)': 'CPIAUCSL',
        'Fed Funds Rate': 'FEDFUNDS',
        '10Y Treasury': 'DGS10',
        'Consumer Sentiment': 'UMCSENT'
    }
    data = {}
    for name, series_id in indicators.items():
        try:
            series = fred.get_series(series_id, observation_start=start_date)
            data[name] = series
        except Exception as e:
            print(f"Error fetching {name}: {str(e)}")
    # Create DataFrame
    df = pd.DataFrame(data)
    # Calculate year-over-year inflation rate
    if 'Inflation (CPI)' in df.columns:
        df['Inflation Rate (YoY %)'] = df['Inflation (CPI)'].pct_change(periods=12) * 100
    # Forward-fill so lower-frequency series (e.g., quarterly GDP)
    # still show their latest value in the final row
    latest = df.ffill().iloc[-1]
    print("\n" + "="*60)
    print("ECONOMIC DASHBOARD - Latest Values")
    print("="*60)
    for indicator in latest.index:
        value = latest[indicator]
        if pd.notna(value):
            print(f"{indicator:.<30} {value:.2f}")
    print("="*60)
    return df
# Create dashboard
economic_data = create_economic_dashboard()
3.5 Building a Data Pipeline
What is a Data Pipeline?
A data pipeline is an automated system that regularly fetches, processes, and stores data. Instead of manually downloading data each time you need it, a pipeline keeps your data fresh automatically.
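Before writing a full pipeline, it helps to isolate its core step: merging freshly downloaded rows into data already on disk while dropping overlapping dates. A minimal offline sketch with toy price frames:

```python
import pandas as pd

# Toy frames standing in for data already saved to disk and a fresh
# download that overlaps it by one day
existing = pd.DataFrame(
    {"Close": [100.0, 101.5, 102.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
new = pd.DataFrame(
    {"Close": [102.3, 103.1]},
    index=pd.to_datetime(["2024-01-04", "2024-01-05"]),
)

# Append, then drop duplicated dates, keeping the freshly fetched row
updated = pd.concat([existing, new])
updated = updated[~updated.index.duplicated(keep="last")].sort_index()
print(len(updated))  # → 4 rows: the overlapping Jan 4 row was replaced
```

Keeping the newer row on overlap matters because the most recent bar in a saved file may have been captured mid-session and is superseded by the next download.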
Simple Data Pipeline Example
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta
import os
class StockDataPipeline:
    """
    Automated pipeline for stock data
    """
    def __init__(self, tickers, data_folder='stock_data'):
        self.tickers = tickers
        self.data_folder = data_folder
        # Create data folder if it doesn't exist
        if not os.path.exists(data_folder):
            os.makedirs(data_folder)

    def download_data(self, period='1y'):
        """
        Download data for all tickers
        """
        print(f"Downloading data for {len(self.tickers)} stocks...")
        for ticker in self.tickers:
            try:
                print(f"  Fetching {ticker}...")
                # auto_adjust=False keeps the 'Adj Close' column
                # (newer yfinance versions adjust prices by default)
                data = yf.download(ticker, period=period, progress=False, auto_adjust=False)
                if not data.empty:
                    # Save to CSV
                    filename = f"{self.data_folder}/{ticker}.csv"
                    data.to_csv(filename)
                    print(f"  ✓ Saved {ticker} ({len(data)} rows)")
                else:
                    print(f"  ✗ No data for {ticker}")
            except Exception as e:
                print(f"  ✗ Error with {ticker}: {str(e)}")
        print("Download complete!")

    def load_data(self, ticker):
        """
        Load data for a specific ticker
        """
        filename = f"{self.data_folder}/{ticker}.csv"
        if os.path.exists(filename):
            data = pd.read_csv(filename, index_col=0, parse_dates=True)
            return data
        else:
            print(f"No data found for {ticker}")
            return None

    def update_data(self):
        """
        Update existing data with new records
        """
        print("Updating data...")
        for ticker in self.tickers:
            try:
                # Load existing data
                existing = self.load_data(ticker)
                if existing is not None:
                    # Get last date in existing data
                    last_date = existing.index[-1]
                    # Download data from the day after the last date to now
                    new_data = yf.download(
                        ticker,
                        start=last_date + timedelta(days=1),
                        progress=False,
                        auto_adjust=False
                    )
                    if not new_data.empty:
                        # Combine old and new data, dropping duplicate dates
                        updated = pd.concat([existing, new_data])
                        updated = updated[~updated.index.duplicated(keep='last')]
                        # Save updated data
                        filename = f"{self.data_folder}/{ticker}.csv"
                        updated.to_csv(filename)
                        print(f"  ✓ Updated {ticker} (+{len(new_data)} rows)")
                    else:
                        print(f"  - {ticker} already up to date")
            except Exception as e:
                print(f"  ✗ Error updating {ticker}: {str(e)}")
        print("Update complete!")

    def get_portfolio_data(self):
        """
        Load all stock data into a single DataFrame
        """
        all_data = {}
        for ticker in self.tickers:
            data = self.load_data(ticker)
            if data is not None:
                all_data[ticker] = data['Adj Close']
        # Combine into single DataFrame
        portfolio_df = pd.DataFrame(all_data)
        return portfolio_df
# Example usage
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA']
pipeline = StockDataPipeline(tickers)
# Initial download
pipeline.download_data(period='2y')
# Load specific stock
aapl_data = pipeline.load_data('AAPL')
print(aapl_data.tail())
# Get all stocks in one DataFrame
portfolio = pipeline.get_portfolio_data()
print(portfolio.head())
# Update data (run this periodically)
pipeline.update_data()
Advanced Pipeline with Error Handling and Logging
import yfinance as yf
import pandas as pd
from datetime import datetime
import os
import time
import logging

class AdvancedDataPipeline:
    """
    Production-ready data pipeline with logging and error handling
    """
    def __init__(self, tickers, data_folder='stock_data'):
        self.tickers = tickers
        self.data_folder = data_folder
        # Setup folders
        if not os.path.exists(data_folder):
            os.makedirs(data_folder)
        # Setup logging
        log_folder = 'logs'
        if not os.path.exists(log_folder):
            os.makedirs(log_folder)
        logging.basicConfig(
            level=logging.INFO,
            format='%(asctime)s - %(levelname)s - %(message)s',
            handlers=[
                logging.FileHandler(f'{log_folder}/pipeline.log'),
                logging.StreamHandler()
            ]
        )
        self.logger = logging.getLogger(__name__)

    def download_with_retry(self, ticker, period='1y', max_retries=3):
        """
        Download data with retry logic and exponential backoff
        """
        for attempt in range(max_retries):
            try:
                data = yf.download(ticker, period=period, progress=False)
                if not data.empty:
                    return data
                self.logger.warning(f"Empty data for {ticker}, attempt {attempt + 1}")
            except Exception as e:
                self.logger.error(f"Error downloading {ticker}, attempt {attempt + 1}: {str(e)}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
        return None

    def download_all(self, period='1y'):
        """
        Download all tickers with comprehensive error handling
        """
        self.logger.info(f"Starting download for {len(self.tickers)} tickers")
        success_count = 0
        fail_count = 0
        for ticker in self.tickers:
            data = self.download_with_retry(ticker, period)
            if data is not None:
                filename = f"{self.data_folder}/{ticker}.csv"
                data.to_csv(filename)
                self.logger.info(f"✓ {ticker}: Downloaded {len(data)} rows")
                success_count += 1
            else:
                self.logger.error(f"✗ {ticker}: Failed to download")
                fail_count += 1
        self.logger.info(f"Download complete: {success_count} succeeded, {fail_count} failed")
        return success_count, fail_count

    def validate_data(self, ticker):
        """
        Validate data quality
        """
        data = self.load_data(ticker)
        if data is None:
            return False
        issues = []
        # Check for missing values
        if data.isnull().any().any():
            issues.append("Contains missing values")
        # Check for duplicate dates
        if data.index.duplicated().any():
            issues.append("Contains duplicate dates")
        # Check for zero or negative prices
        if (data['Close'] <= 0).any():
            issues.append("Contains invalid prices")
        # Check data recency (should be within last 7 days)
        days_old = (datetime.now() - data.index[-1]).days
        if days_old > 7:
            issues.append(f"Data is {days_old} days old")
        if issues:
            self.logger.warning(f"{ticker} validation issues: {', '.join(issues)}")
            return False
        self.logger.info(f"{ticker} validation passed")
        return True

    def load_data(self, ticker):
        """
        Load data with error handling
        """
        filename = f"{self.data_folder}/{ticker}.csv"
        try:
            if os.path.exists(filename):
                data = pd.read_csv(filename, index_col=0, parse_dates=True)
                return data
            self.logger.warning(f"No file found for {ticker}")
            return None
        except Exception as e:
            self.logger.error(f"Error loading {ticker}: {str(e)}")
            return None

    def health_check(self):
        """
        Check pipeline health
        """
        self.logger.info("Running health check...")
        valid_count = 0
        invalid_count = 0
        for ticker in self.tickers:
            if self.validate_data(ticker):
                valid_count += 1
            else:
                invalid_count += 1
        health_status = {
            'total_tickers': len(self.tickers),
            'valid': valid_count,
            'invalid': invalid_count,
            'health_percentage': (valid_count / len(self.tickers)) * 100
        }
        self.logger.info(f"Health check: {health_status['health_percentage']:.1f}% healthy")
        return health_status
# Example usage
pipeline = AdvancedDataPipeline(['AAPL', 'MSFT', 'GOOGL', 'AMZN'])
# Download data
pipeline.download_all(period='1y')
# Check pipeline health
health = pipeline.health_check()
print(f"\nPipeline Health: {health['health_percentage']:.1f}%")
3.6 Best Practices for Working with APIs
Rate Limiting
Most free APIs have rate limits. Respect them to avoid getting blocked.
import time
def rate_limited_download(tickers, api_function, calls_per_minute=5):
    """
    Download data while respecting rate limits
    """
    delay = 60 / calls_per_minute  # seconds between calls
    results = {}
    for i, ticker in enumerate(tickers):
        print(f"Processing {ticker} ({i+1}/{len(tickers)})...")
        try:
            results[ticker] = api_function(ticker)
            # Wait between calls (except after the last one)
            if i < len(tickers) - 1:
                time.sleep(delay)
        except Exception as e:
            print(f"Error with {ticker}: {str(e)}")
    return results
Caching Data
Avoid redundant API calls by caching data locally.
import os
import pickle
from datetime import datetime, timedelta
class DataCache:
    """
    Simple caching system for API data
    """
    def __init__(self, cache_folder='cache', cache_duration_hours=24):
        self.cache_folder = cache_folder
        self.cache_duration = timedelta(hours=cache_duration_hours)
        if not os.path.exists(cache_folder):
            os.makedirs(cache_folder)

    def get(self, key):
        """
        Get data from cache if it's fresh
        """
        cache_file = f"{self.cache_folder}/{key}.pkl"
        if os.path.exists(cache_file):
            # Check if cache is still fresh
            file_time = datetime.fromtimestamp(os.path.getmtime(cache_file))
            age = datetime.now() - file_time
            if age < self.cache_duration:
                with open(cache_file, 'rb') as f:
                    print(f"✓ Loading {key} from cache")
                    return pickle.load(f)
        return None

    def set(self, key, data):
        """
        Save data to cache
        """
        cache_file = f"{self.cache_folder}/{key}.pkl"
        with open(cache_file, 'wb') as f:
            pickle.dump(data, f)
        print(f"✓ Cached {key}")

    def clear(self):
        """
        Clear all cache files
        """
        for file in os.listdir(self.cache_folder):
            os.remove(os.path.join(self.cache_folder, file))
        print("Cache cleared")
# Example usage with yfinance
cache = DataCache(cache_duration_hours=12)
def get_stock_data_cached(ticker, period='1y'):
    """
    Get stock data with caching
    """
    cache_key = f"{ticker}_{period}"
    # Try to get from cache
    data = cache.get(cache_key)
    if data is None:
        # Not in cache or expired, download fresh data
        print(f"Downloading {ticker}...")
        data = yf.download(ticker, period=period, progress=False)
        cache.set(cache_key, data)
    return data
# First call downloads data
aapl_data = get_stock_data_cached('AAPL', '1y')
# Second call uses cached data (if within cache duration)
aapl_data = get_stock_data_cached('AAPL', '1y')
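The decision at the heart of this cache—is the file on disk still fresh?—can be pulled out into a pure function and tested without touching the filesystem (is_fresh is a hypothetical helper mirroring the age check in DataCache.get):

```python
from datetime import datetime, timedelta

def is_fresh(file_time: datetime, now: datetime,
             max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True if a cache entry written at file_time is still usable."""
    return (now - file_time) < max_age

now = datetime(2024, 6, 1, 12, 0)
print(is_fresh(datetime(2024, 6, 1, 1, 0), now))    # → True (11 hours old)
print(is_fresh(datetime(2024, 5, 30, 12, 0), now))  # → False (48 hours old)
```

Passing `now` as a parameter instead of calling datetime.now() inside the function is what makes the freshness rule easy to test deterministically.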
Error Handling
Always handle errors gracefully.
def safe_api_call(function, *args, **kwargs):
    """
    Wrapper for safe API calls with error handling
    """
    try:
        return function(*args, **kwargs)
    except ConnectionError:
        print("Network connection error. Check your internet connection.")
        return None
    except TimeoutError:
        print("API request timed out. Try again later.")
        return None
    except KeyError as e:
        print(f"Data not available: {str(e)}")
        return None
    except Exception as e:
        print(f"Unexpected error: {str(e)}")
        return None
# Example usage
data = safe_api_call(yf.download, 'AAPL', period='1y')
if data is not None:
    print("Data downloaded successfully")
else:
    print("Failed to download data")
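You can watch the wrapper pattern work without any network access by applying the same idea to a function that fails in a controlled way (safe_call and lookup are illustrative stand-ins, not part of any library):

```python
def safe_call(func, *args, **kwargs):
    """Run func, turning known failure modes into a None return."""
    try:
        return func(*args, **kwargs)
    except KeyError as e:
        print(f"Data not available: {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

def lookup(info, field):
    return info[field]  # raises KeyError when the field is missing

info = {"longName": "Apple Inc."}
print(safe_call(lookup, info, "longName"))    # → Apple Inc.
print(safe_call(lookup, info, "trailingPE"))  # → None, after an error message
```

The same pattern is why `info.get(...)` appeared throughout the screener earlier: Yahoo's info dictionaries routinely omit fields, and a missing key should degrade gracefully rather than crash a batch job.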
3.7 Practice Exercises
Exercise 1: Multi-Source Data Comparison
# Your task: Compare the same stock data from different sources
# 1. Download AAPL data from yfinance
# 2. Download AAPL data from Alpha Vantage
# 3. Compare the closing prices
# 4. Calculate any differences
# 5. Visualize the comparison
# 6. Which source would you trust more and why?
Exercise 2: Economic Indicator Analysis
# Your task: Analyze relationship between economic indicators and stock market
# 1. Download S&P 500 data (^GSPC) using yfinance
# 2. Download unemployment rate from FRED
# 3. Download GDP growth from FRED
# 4. Align the data on matching dates
# 5. Calculate correlation between stock returns and economic indicators
# 6. Create a report summarizing your findings
Exercise 3: Build a Data Update System
# Your task: Create an automated data update system
# 1. Create a function that checks if data exists locally
# 2. If data exists, check if it's current (less than 1 day old)
# 3. If not current, download latest data
# 4. Add logging to track all operations
# 5. Add error handling for network failures
# 6. Test with multiple tickers
Exercise 4: Sector Performance Tracker
# Your task: Track performance across different sectors
# 1. Select 2-3 stocks from each of these sectors:
# - Technology (e.g., AAPL, MSFT, GOOGL)
# - Finance (e.g., JPM, BAC, GS)
# - Healthcare (e.g., JNJ, PFE, UNH)
# - Energy (e.g., XOM, CVX, COP)
# 2. Download 1 year of data for all stocks
# 3. Calculate returns for each stock
# 4. Calculate average return by sector
# 5. Determine which sector performed best
# 6. Save results to a CSV file
Module 3 Summary
Congratulations! You've mastered the art of accessing and managing financial data from multiple sources.
What You've Accomplished
Data Source Mastery
- Understanding the financial data ecosystem
- Working fluently with yfinance for market data
- Accessing economic indicators via FRED
- Using Alpha Vantage for advanced data
- Navigating multiple data providers
Technical Skills
- Downloading historical price data programmatically
- Accessing company fundamentals and financial statements
- Retrieving economic indicators and macroeconomic data
- Building automated data pipelines
- Implementing caching and rate limiting
- Handling API errors gracefully
Pipeline Development
- Creating reusable data retrieval functions
- Building automated update systems
- Implementing logging and monitoring
- Validating data quality
- Managing local data storage
Real-World Capabilities
You can now:
- Access virtually any financial data you need for analysis
- Build systems that keep your data fresh automatically
- Compare data across multiple sources
- Handle the complexities of real-world APIs
- Create professional-grade data pipelines
What's Next
In Module 4, we'll use all this data to perform exploratory data analysis. You'll learn to calculate returns, measure risk, analyze correlations, identify trends, and extract meaningful insights from the data you can now access effortlessly.
Before Moving Forward
Ensure you're comfortable with:
- Downloading data from multiple sources
- Understanding different data formats and structures
- Basic error handling and retry logic
- The concept of data pipelines
- Working with API keys and rate limits
Practical Advice
- Get Your API Keys: Set up accounts with Alpha Vantage and FRED now—you'll use them throughout the course
- Build Your Library: Create a collection of reusable functions for common data tasks
- Practice Daily: Try downloading different stocks, experimenting with time periods, exploring various economic indicators
- Start Small: Test your pipelines with a few tickers before scaling up
The Power You've Gained
Data access is often the biggest hurdle in financial analysis. You've just cleared that hurdle. With reliable access to professional-quality financial data, you're now equipped to perform analyses that rival those done at major financial institutions.
The data is at your fingertips. Now let's learn what to do with it.
Continue to Module 4: Exploratory Data Analysis for Finance →

