Bird Strike: Design & Implement a Relational Database

Northeastern University, CS 5200 Database Management Systems ยท Spring 2023

MySQL R Relational Modeling
Bird strike relational database practicum

Overview

I implemented a relational workflow on published bird-strike incident data. The resulting MySQL layout uses four related tables for incidents, airports, airlines, and sky conditions: primary keys throughout, foreign keys from the fact table into the lookups, and checks that keep flight phase inside a coarse set (takeoff, landing, inflight, unknown) while requiring non-negative altitude.

The pipeline relies on R to clean dates and missing labels, map raw flight phases into those buckets, convert pilot-warned flags, escape awkward quotation marks, and bulk-load rows through RMySQL and DBI. SQL ranks states by strike count, lists airlines above the fleet-wide average, and aggregates incidents by calendar month and flight phase; R pulls that grid and plots total strikes per month. A stored procedure mirrors the same normalization rules, adds airports, airlines, or sky-condition rows when they are missing, and registers a new incident in one step.

Deliverables

  • DDL and integrity: Table definitions, foreign keys from the incident fact table to airports, airlines, and conditions, plus checks on flight phase and altitude.
  • ETL in R: Column selection, invalid-date drops, batch inserts for dimensions and facts, and synthetic key alignment before incidents load.
  • Analytical SQL: Top states by incident volume; airlines with above-average strike counts (subquery and HAVING); counts grouped by month and flight phase for plotting.
  • Visualization: Scatter plot of total bird strikes by month with labeled points.
  • Stored procedure: Encapsulates the same normalization logic, extends dimension rows when needed, and records a new strike in a single database round-trip (exercised from the notebook).
  • Documentation: Written report (PDF) and linked R Markdown source.

Technology stack

  • MySQL: Schema, constraints, analytical queries, and stored procedures.
  • R: Cleaning, key matching, batch SQL execution, and plotting.
  • RMySQL / DBI: Connection and execution from the notebook.