diff --git a/projects/geospatial.md b/projects/geospatial.md
index bc4ebb8..c0f5d3c 100644
--- a/projects/geospatial.md
+++ b/projects/geospatial.md
@@ -9,7 +9,7 @@
 
 ## Tutorial framing
 
-Geospatial data are complex because observations are tied to coordinate systems, geometric boundaries, raster surfaces, and spatial dependence rather than arriving as independent rows in a single analysis-ready table.
+Geospatial data are complex because observations are tied to coordinate systems, geometric boundaries, raster surfaces, and spatial dependence rather than arriving as independent rows in a single, analysis-ready, table.
 
 Students should learn three main things about these data:
 1. How spatial data are represented through vector geometries, raster grids, coordinate reference systems, spatial identifiers, and formats or services such as GeoJSON, Shapefiles, GeoTIFF, WFS, and WMS.
@@ -29,22 +29,22 @@ Students should learn three main things about these data:
 ## Resources
 
 ### Data sources
-- [PDOK (Public services on the map)](https://www.pdok.nl/), specifically:
+- [PDOK (Publieke Dienstverlening Op de Kaart, Public Services On the Map)](https://www.pdok.nl/), specifically:
   - [Statistics Netherlands' areal boundaries data](https://www.pdok.nl/introductie/-/article/cbs-gebiedsindelingen)
   - [Wageningen university's land-use data](https://www.pdok.nl/introductie/-/article/landelijk-grondgebruik-nederland-lgn-)
-- [Statistics Netherlands core figures](https://www.cbs.nl/nl-nl/maatwerk/2025/40/kerncijfers-wijken-en-buurten-2025)
+- [Statistics Netherlands Key figures for districts and neighborhoods](https://www.cbs.nl/nl-nl/maatwerk/2025/40/kerncijfers-wijken-en-buurten-2025)
 
-Feel free to use different sources if you want.
+Feel free to use additional sources if you want.
 
 ### Knowledge sources
 - R packages `sf` and `terra`
 - The book [Geocomputation with R](https://r.geocompx.org/) (e.g. chapter on raster-vector interactions and data I/O)
-- Find your own resources on spatial autoregressive models: CAR.
+- Find your own resources on spatial autoregressive models: conditional autoregressive model (CAR) and simultaneously autoregressive model (SAR).
 
 ## Week-by-week
 
 ### Week 1:
-Start from raw spatial files or web services, identify the data generating process, and explain vector/raster or point/polygon structure before doing any modeling.
+Start from raw spatial files or web services, identify the data generating/collection process, and explain vector/raster or point/polygon structure before doing any modeling. Visualize the data in the most appropriate way.
 - What is the standard key identifier for municipalities in the Netherlands?
 - Can we connect directly to PDOK from R to retrieve all municipalities' boundaries? Or can we download the information?
 - Can we connect to PDOK from R to retrieve land-use information?
@@ -57,9 +57,10 @@ Prepare for the roundtable of week 2:
 ### Week 2
 Operationalize the research question by turning raw geometry-linked files into one analysis table, and document why the data were stored in that format.
 
-- How can we create a tidy dataset of municipalities with their land-use and population characteristics to perform statistical modeling?
 - What, exactly, does land-use mean?
 - What dimensions of population composition do we find relevant?
+- How can we create a tidy dataset of municipalities with their land-use and population characteristics to perform statistical modeling?
+
 
 Prepare for the roundtable of week 3:
 - Explain the main spatial operations: spatial joins, aggregation from grid or point data, etc.
@@ -71,7 +72,7 @@ Fit models, explain preprocessing decisions, and show one sensitivity check to s
 - Do we need to do some transformations, what type, GLM? Or just linear model?
 - Fit a baseline (non-spatial) model first, then test residual spatial dependence (e.g. Moran's I on residuals). Only escalate to SAR/CAR if the baseline residuals show meaningful spatial structure.
 - Which parameters, specifically, answer our research question?
-- Sensitivity check: show one Modifiable Areal Unit Problem (MAUP) sensitivity — re-run the analysis at a different aggregation level (e.g. neighbourhood vs municipality) or with a different boundary definition, and report whether the conclusion changes.
+- Sensitivity check: show one Modifiable Areal Unit Problem (MAUP) sensitivity, i.e., re-run the analysis at a different aggregation level (e.g. neighbourhood vs municipality) or with a different boundary definition, and report whether the conclusion changes.
 
 Prepare for the roundtable of week 4:
 
diff --git a/projects/networks.md b/projects/networks.md
index 1dafda6..fc74d3f 100644
--- a/projects/networks.md
+++ b/projects/networks.md
@@ -10,10 +10,10 @@
 
 ## Tutorial framing
 
-Network data are complex because observations are connected through ties, direction, weights, missing nodes, and dependence between relations rather than arriving as independent rows in a single analysis-ready table.
+Network data are complex because observations are connected through ties, direction, weights, missing nodes and ties, and dependence between relations rather than data structured as independent rows in a single analysis-ready table.
 
 Students should learn three main things about these data:
-1. How networks are represented through nodes, edges, edge lists, adjacency matrices, sparse matrices, GraphML, and choices about direction, weight, time, and isolates.
+1. How networks are represented through nodes, edges, edge lists, adjacency matrices, sparse matrices, GraphML, and how to make critical choices about direction, weight, time, and isolates.
 2. How to turn raw graph files into a clean network object while documenting what counts as a node, what counts as a tie, and which representation best matches the research question.
 3. How network dependence affects standard statistical assumptions, and how network statistics, reference models, permutation tests, or clustering can support claims about homophily, polarization, centrality, or other network structures.
 
@@ -39,28 +39,28 @@ Students should learn three main things about these data:
 
 ### Knowledge sources
 
-- C/R/Python packages `igraph`,
+- C/R/Python packages `igraph`
 - Introduction to networks
   - Chapter 0 of "A First Course in Network Science": https://github.com/CambridgeUniversityPress/FirstCourseNetworkScience/blob/master/sample/chapters/chapter0.pdf
   - App: https://javier.science/marimo_intro_networks/
 - Guide for reference models: https://pubmed.ncbi.nlm.nih.gov/34216192/
-- Observed network vs latent network: https://www.nature.com/articles/s41467-022-34267-9
+- Observed vs latent networks: https://www.nature.com/articles/s41467-022-34267-9
 
 ## Week-by-week
 
 ### Week 1:
 Begin with raw repository files and explain what the network is, who generated it, for what purpose, and the different storage formats.
 - Explain the underlying network in substantive terms: what the nodes and ties represent, and whether the graph is directed or undirected, weighted or unweighted, static or temporal.
-- What is GraphML? How does it relate to XML?
+- What is the GraphML data type? How does it relate to XML? How is this different from other network data types?
 - Are adjacency matrices sparse or dense?
-- Read about different layout algorithms.
+- Read about different visualization layout algorithms. Explore static/interactive visualization tools.
 
 Prepare for roundtable in week 2:
 
 - What is a network and why is it a useful representation of data?
-- What are the main ways to represent a network: edge lists, adjacency matrices, and XML or GraphML-like
+- What are the main ways to represent a network: edge lists, adjacency matrices, and XML or GraphML-like?
 - What are the advantages and disadvantages of adjacency matrices over edge lists? How do sparse matrices fix this and what are they?
-- How do you visualize a network?
+- How do you visualize a network? What could be the pitfalls of having your analysis based on the network visualization only?
 
 
 ### Week 2:
@@ -74,7 +74,7 @@ Operationalize the research question by turning raw graph files into a clean fil
 
 Prepare for roundtable in week 3:
 
-- Be able to describe three analyses typically done on networks (e.g. assortativity, centrality, clustering) at a conceptual level, so the rest of the class understands the landscape — but your own project should report only the one statistic and one permutation comparison committed to above.
+- Be able to describe three analyses typically done on networks (e.g. assortativity, centrality, clustering) at a conceptual level, so the rest of the class understands the landscape, but your own project should report only the one statistic and one permutation comparison committed to above.
 - Explain the selection vs influence debate in networks.