Lab 04
Advanced Data Analysis and Statistics
Pre-lab Prep
- To use the cloud computing platform Google Colab, you need a Google account and access to Google Drive. SU students can use their @g.syr.edu account.
- Students who are not familiar with Google Colab are strongly encouraged to watch this quickview video and to visit the Google Colab website and read through Welcome to Colab, Overview of Colab, and Guide to Markdown.
- Students are strongly encouraged to read through Lab 04 Instructions before class.
Lab Materials
1. Lab 04 Instructions
Please download Lab 04 Instructions and go through it before class.
2. Lab 04 Demo
- To save time setting up the coding environment and dependencies on your local computer, you can click the Open in Colab button at the top of this webpage to open it in Google Colab.
- Once you have opened it in Google Colab, log in with your Google account. Then click ‘Runtime’ in the menu bar, select ‘Change runtime type’, and change the runtime type from Python 3 to R.
#######################################################################################
# Downloading River Chemistry Time Series Data Using the dataRetrieval Package
#######################################################################################
# In the demo, the specific USGS site I am going to download Na data for
# has the site number USGS-01391500 (Saddle River at Lodi, NJ)
# let's define a variable to store the site number
siteid <- "USGS-01391500"
# In lab 04 deliverables, you need to explore the other two sites:
# USGS-01111500 (BRANCH RIVER AT FORESTDALE, RI)
# USGS-02336300 (PEACHTREE CREEK AT ATLANTA, GA)
# USGS encodes each chemical as a numeric parameter code. Calcium's code is 00915 and sodium's code is 00930
parmCd <- "00930"
# let's focus on water quality data collected from 1978 to 2018
start.date <- as.Date("1978-01-01")
end.date <- as.Date("2019-01-01")
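# NOTE: the download step is not shown in this demo listing, but demo_site must exist
# before the next step. A minimal sketch, ASSUMING the records come from the Water
# Quality Portal via dataRetrieval::readWQPqw(); the actual lab code may differ.
library(dataRetrieval)
demo_site <- readWQPqw(siteid, parmCd,
                       startDate = as.character(start.date),
                       endDate = as.character(end.date))
# Assumed derivation of the year, month, and mday columns that are selected below
demo_site$year  <- as.numeric(format(demo_site$ActivityStartDate, "%Y"))
demo_site$month <- as.numeric(format(demo_site$ActivityStartDate, "%m"))
demo_site$mday  <- as.numeric(format(demo_site$ActivityStartDate, "%d"))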
# Simplify the dataset by keeping only the most essential columns (i.e., location, sampling date, data value)
# Note: %>%, select(), and rename() come from dplyr; demo_site must already hold the
# downloaded records, including the year, month, and mday columns
library(dplyr)
demo_site <- demo_site %>%
  select(c("MonitoringLocationIdentifier", "ActivityStartDate",
           "year", "month", "mday", "ResultMeasureValue")) %>%
  rename(site_no=MonitoringLocationIdentifier, sample_dt=ActivityStartDate,
         result_va=ResultMeasureValue)
#######################################################################################
# Data Visualization and Regression Analysis
#######################################################################################
# aggregate data by year
annual_summary <- tapply(demo_site$result_va, demo_site$year, mean, na.rm=TRUE)
annual_summary <- data.frame(year=as.numeric(names(annual_summary)),
                             result_annual=annual_summary)
# plotting
plot(x = annual_summary$year,
     y = annual_summary$result_annual,
     xlab="Year",
     ylab="Annual Mean Na Concentration (mg/L)",
     main="Temporal Trend of Annual Mean Na Concentration 1978-2018",
     type="b")
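# regression analysis: fit the linear trend once, overlay it on the plot (col=2 is red),
# then print the slope, p-value, and R-squared
trend_fit <- lm(result_annual ~ year, data=annual_summary)
abline(trend_fit, col=2)
summary(trend_fit)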
# save the figure to pdf
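# NOTE: the code for this step is not shown in the demo listing. A minimal sketch using
# base R graphics: pdf() opens a PDF device, so the figure has to be redrawn while the
# device is open, and dev.off() writes the file. The file name is a hypothetical placeholder.
pdf("lab04_na_trend.pdf", width = 7, height = 5)
plot(x = annual_summary$year,
     y = annual_summary$result_annual,
     xlab = "Year",
     ylab = "Annual Mean Na Concentration (mg/L)",
     main = "Temporal Trend of Annual Mean Na Concentration 1978-2018",
     type = "b")
abline(lm(result_annual ~ year, data = annual_summary), col = 2)
dev.off()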
3. Lab 04 Deliverable
- Modify and rerun the demo code to generate the temporal plot of Na concentration for the other two sites (USGS-01111500 and USGS-02336300) and perform the regression analysis for both sites.
- Submit a single-page PDF file that includes these two plots plus 2-3 paragraphs describing the plots and what explains the difference in the temporal trend at these two sites (refer to the papers in Lab 01 and previous lectures).
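Because the same aggregation and regression have to be repeated for each site, one option is to wrap those steps in a small function. The sketch below is only an illustration, not part of the demo: `fit_annual_trend` is a hypothetical helper name, it assumes a data frame with the `year` and `result_va` columns used in the demo, and the data it runs on are synthetic, not real USGS measurements.

```r
# Hypothetical helper: average one site's records by year and fit a linear trend.
# `site_df` is assumed to contain the columns `year` and `result_va`.
fit_annual_trend <- function(site_df) {
  annual_mean <- tapply(site_df$result_va, site_df$year, mean, na.rm = TRUE)
  annual <- data.frame(year = as.numeric(names(annual_mean)),
                       result_annual = as.numeric(annual_mean))
  fit <- lm(result_annual ~ year, data = annual)
  fit_summary <- summary(fit)
  list(annual = annual,
       slope = fit_summary$coefficients["year", "Estimate"],
       p_value = fit_summary$coefficients["year", "Pr(>|t|)"],
       r_squared = fit_summary$r.squared)
}

# Synthetic data for illustration only: four samples per year with an upward trend
set.seed(1)
toy <- data.frame(year = rep(1978:2018, each = 4))
toy$result_va <- 10 + 0.5 * (toy$year - 1978) + rnorm(nrow(toy), sd = 2)

trend <- fit_annual_trend(toy)
trend$slope      # estimated change in annual mean concentration per year
trend$p_value    # p-value for the slope
trend$r_squared  # share of variance explained by the linear trend
```

Calling the function once per site would give slopes and p-values that can be compared directly when describing the two plots.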
Deliverables
Deliverables | Date Assigned | Date Due |
---|---|---|
Lab 04 (refer to the SU Blackboard website) | Thur 10/24/2024 | Thur 10/31/2024, 12:30pm ET |