URL Parsing
URLs have many components: protocol, domain, port, path, query, and fragment. Let's parse them!
Basic URL Structure
https://www.example.com:8080/path/to/page?query=value#section
|___|   |_____________| |__||___________| |_________| |_____|
protocol    domain      port    path         query    fragment
Simple URL Matching
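As a minimal sketch of this step in Python, assume the simplest case: "http" or "https", then "://", then a host with no whitespace. The names SIMPLE_URL and text are illustrative, not part of the lesson.

```python
import re

# Assumed pattern (illustrative): "http" or "https", then "://",
# then a run of characters that are neither whitespace nor "/".
SIMPLE_URL = re.compile(r"https?://[^\s/]+")

text = "Visit https://www.example.com or http://example.org today."
matches = SIMPLE_URL.findall(text)
print(matches)
```

The optional "s" (written s?) is what lets one pattern cover both protocols.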
URLs with Paths
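One way to extend the simple match to include a path, sketched in Python. The pattern is an assumption: the path is taken to start with "/" and to end at "?", "#", or whitespace. URL_WITH_PATH and path_text are hypothetical names.

```python
import re

# Assumed pattern (illustrative): scheme and host, then an optional
# path that starts with "/" and stops before "?", "#", or whitespace.
URL_WITH_PATH = re.compile(r"https?://[^\s/?#]+(?:/[^\s?#]*)?")

path_text = "Docs live at https://www.example.com/path/to/page?query=value today."
path_match = URL_WITH_PATH.search(path_text)
print(path_match.group())
```

The non-capturing group (?:...)? makes the whole path optional, so a bare host still matches.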
Protocol Extraction
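A small Python sketch of protocol extraction with a capture group. It assumes only http and https are of interest, as in the rest of this lesson; extract_protocol is a hypothetical helper name.

```python
import re

# Group 1 holds the protocol; "^" anchors the match to the start.
# Assumption: only "http" and "https" are accepted.
PROTOCOL = re.compile(r"^(https?)://")

def extract_protocol(url):
    """Return 'http' or 'https', or None if the URL starts with neither."""
    m = PROTOCOL.match(url)
    return m.group(1) if m else None

print(extract_protocol("https://www.example.com:8080/path"))
```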
Domain Extraction
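Domain extraction could be sketched in Python like this. The pattern assumes the domain is everything after "://" up to the first ":", "/", "?", "#", or whitespace, and it does not handle credentials such as user:pass@host. extract_domain is a hypothetical helper name.

```python
import re

# Group 1: the characters after "://" up to the first delimiter.
# Assumption: no "user:pass@" prefix in front of the host.
DOMAIN = re.compile(r"://([^/:?#\s]+)")

def extract_domain(url):
    """Return the host part of the URL, or None if there is no '://'."""
    m = DOMAIN.search(url)
    return m.group(1) if m else None

print(extract_domain("https://www.example.com:8080/path/to/page"))
```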
Query Parameters
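A possible two-step approach in Python: first grab the query string after "?", then split it into key=value pairs. This is a sketch only. It does no percent-decoding and keeps just the last value for a repeated key. extract_query_params is a hypothetical helper name.

```python
import re

# Step 1: everything after "?" up to "#" or whitespace.
QUERY = re.compile(r"\?([^#\s]*)")
# Step 2: key=value pairs separated by "&".
PAIR = re.compile(r"([^&=]+)=([^&]*)")

def extract_query_params(url):
    """Return the query parameters as a dict (empty if there is no query)."""
    m = QUERY.search(url)
    if not m:
        return {}
    return dict(PAIR.findall(m.group(1)))

print(extract_query_params("https://www.example.com/search?q=regex&page=2#top"))
```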
Fragments
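The fragment is whatever follows "#" at the end of the URL. A minimal Python sketch, assuming the input is a single URL with no trailing text; extract_fragment is a hypothetical helper name.

```python
import re

# Group 1: the non-whitespace characters between "#" and the end.
# Assumption: the string holds exactly one URL and nothing after it.
FRAGMENT = re.compile(r"#(\S*)$")

def extract_fragment(url):
    """Return the fragment without the '#', or None if there is none."""
    m = FRAGMENT.search(url)
    return m.group(1) if m else None

print(extract_fragment("https://www.example.com/path/to/page?query=value#section"))
```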
Port Numbers
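A port is a run of digits after a ":" that directly follows the host. The Python sketch below assumes that layout and returns the port as an integer; extract_port is a hypothetical helper name.

```python
import re

# "://", then the host, then ":" and the digits captured in group 1.
# The host class excludes "/", so a ":" later in the path is ignored.
PORT = re.compile(r"://[^/:?#\s]+:(\d+)")

def extract_port(url):
    """Return the port as an int, or None if the URL does not state one."""
    m = PORT.search(url)
    return int(m.group(1)) if m else None

print(extract_port("https://www.example.com:8080/path/to/page"))
```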
Complete URL Parser
This captures:
- Protocol (http/https)
- Domain
- Port (optional)
- Path (optional)
- Query (optional)
- Fragment (optional)
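One pattern that captures the six parts listed above, sketched in Python with named groups. It is an illustration under simplifying assumptions (http/https only, no credentials, no IPv6 hosts), not necessarily the pattern this lesson was built around. URL_PATTERN and parse_url are hypothetical names.

```python
import re

# Verbose mode lets the pattern be split across lines with comments.
# A literal "#" outside a character class must be escaped in this mode.
URL_PATTERN = re.compile(
    r"""
    ^(?P<protocol>https?)://     # protocol: http or https
    (?P<domain>[^/:?#\s]+)       # domain: stops at the first delimiter
    (?::(?P<port>\d+))?          # optional port
    (?P<path>/[^?#\s]*)?         # optional path
    (?:\?(?P<query>[^#\s]*))?    # optional query
    (?:\#(?P<fragment>\S*))?     # optional fragment
    $
    """,
    re.VERBOSE,
)

def parse_url(url):
    """Return a dict of URL parts (missing parts are None), or None on no match."""
    m = URL_PATTERN.match(url)
    return m.groupdict() if m else None

print(parse_url("https://www.example.com:8080/path/to/page?query=value#section"))
```

Each optional component sits in its own group followed by "?", so shorter URLs such as http://example.org still match and simply report None for the parts they lack.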
Practice Playground
Try extracting:
- Protocols: https?(?=://)
- Domains: (?<=://)[^/:]+
- Ports: (?<=:)\d+(?=/)
- Paths: (?<=\w)/[^?#\s]+
- Query params: (?<=\?)[^#\s]+
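To try the five lookaround patterns above outside the playground, a short Python sketch can run them against the sample URL from the start of this lesson. The dictionary keys and variable names are illustrative.

```python
import re

URL = "https://www.example.com:8080/path/to/page?query=value#section"

# The five patterns from the list above, keyed by what they extract.
PATTERNS = {
    "protocol": r"https?(?=://)",
    "domain": r"(?<=://)[^/:]+",
    "port": r"(?<=:)\d+(?=/)",
    "path": r"(?<=\w)/[^?#\s]+",
    "query": r"(?<=\?)[^#\s]+",
}

# re.search returns the first match; every pattern matches this URL.
results = {name: re.search(p, URL).group() for name, p in PATTERNS.items()}
for name, value in results.items():
    print(f"{name}: {value}")
```

Because lookarounds do not consume characters, each result contains only the value itself, without the surrounding "://", ":", or "?".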
Key Takeaways
- URLs have many optional components
- Use groups to capture specific parts
- Lookarounds help extract values cleanly
- Real URL parsing often uses dedicated libraries
- Regex is great for simple extraction tasks
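To illustrate the point about dedicated libraries, here is a short sketch using Python's standard urllib.parse module on the same sample URL. The variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

sample = "https://www.example.com:8080/path/to/page?query=value#section"
parts = urlsplit(sample)

print(parts.scheme)    # protocol
print(parts.hostname)  # domain
print(parts.port)      # port, already converted to an int
print(parts.path)
print(parts.query)
print(parts.fragment)

# parse_qs maps each key to a list of values, since keys can repeat.
params = parse_qs(parts.query)
print(params)
```

A library like this also handles cases the regex sketches skip, such as credentials in the host part and percent-encoded query values.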

