Try out this R project to see how one variable might affect an outcome. It’s conceivable that weather conditions could influence flight delays. How do you incorporate weather information into the assessment of delay?
One nycflights13 data frame, called weather, provides the weather data for every day and hour at each of the three origin airports. Here’s a glimpse of exactly what it has:
> glimpse(weather, 60)
Observations: 26,130
Variables: 15
$ origin     "EWR", "EWR", "EWR", "EWR", "EWR", "...
$ year       2013, 2013, 2013, 2013, 2013, 2013, ...
$ month      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ day        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ hour       0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 1...
$ temp       37.04, 37.04, 37.94, 37.94, 37.94, 3...
$ dewp       21.92, 21.92, 21.92, 23.00, 24.08, 2...
$ humid      53.97, 53.97, 52.09, 54.51, 57.04, 5...
$ wind_dir   230, 230, 230, 230, 240, 270, 250, 2...
$ wind_speed 10.35702, 13.80936, 12.65858, 13.809...
$ wind_gust  11.918651, 15.891535, 14.567241, 15....
$ precip     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ pressure   1013.9, 1013.0, 1012.6, 1012.7, 1012...
$ visib      10, 10, 10, 10, 10, 10, 10, 10, 10, ...
$ time_hour  2012-12-31 19:00:00, 2012-12-31 20:...

So the variables it has in common with flites_name_day are the first five (origin, year, month, day, and hour) and the last one (time_hour). To join the two data frames, use this code:
flites_day_weather <- flites_day %>%
  inner_join(weather, by = c("origin", "year", "month", "day", "hour", "time_hour"))

Now you can use flites_day_weather to start answering questions about departure delay and the weather. What questions will you ask? How will you answer them? What plots will you draw? What regression lines will you create? Will scale() help?
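If you want a starting point, here is one possible sketch, not the only way to go. It assumes the raw flights table from nycflights13 as a stand-in for flites_day (which is built earlier in the project), and the choice of visib, wind_speed, and precip as predictors is purely illustrative. It fits the same regression twice, once on the raw weather variables and once with scale() applied to each predictor.

```r
library(dplyr)
library(nycflights13)

# Assumption: the raw flights table stands in for flites_day, which carries
# the same six key columns used in the join.
flights_weather <- flights %>%
  inner_join(weather,
             by = c("origin", "year", "month", "day", "hour", "time_hour"))

# Keep only rows with no missing values in the variables of interest.
# The three predictors are an illustrative choice, not a recommendation.
model_data <- flights_weather %>%
  select(dep_delay, visib, wind_speed, precip) %>%
  na.omit()

# Regression of departure delay on the raw weather variables.
fit_raw <- lm(dep_delay ~ visib + wind_speed + precip, data = model_data)

# Same model with standardized predictors: each slope is now the change in
# delay (in minutes) per one standard deviation of that predictor.
fit_scaled <- lm(dep_delay ~ scale(visib) + scale(wind_speed) + scale(precip),
                 data = model_data)

coef(fit_raw)
coef(fit_scaled)
summary(fit_raw)$r.squared
```

Standardizing the predictors does not change how well the model fits, so both versions report the same R-squared. What scale() changes is the units of the slopes, which can make it easier to compare the relative size of each weather effect.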
And, when you’re all done, take a look at arrival delay (arr_delay).
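For a first look at arrival delay, a grouped summary is one simple option. The sketch below again assumes the raw flights table as a stand-in for flites_day, and visib is just an example variable. Keep in mind that weather describes conditions at the origin airport around departure time, not at the destination.

```r
library(dplyr)
library(nycflights13)

# Assumption: raw flights stands in for flites_day; visib is an example choice.
arr_by_visib <- flights %>%
  inner_join(weather,
             by = c("origin", "year", "month", "day", "hour", "time_hour")) %>%
  filter(!is.na(arr_delay), !is.na(visib)) %>%   # drop rows missing either value
  group_by(visib) %>%
  summarize(n = n(),                             # flights at this visibility
            mean_arr_delay = mean(arr_delay)) %>%
  arrange(visib)

arr_by_visib
```

Checking n alongside the mean helps you spot visibility levels that have too few flights to say much about.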