STAT 415/615 Regression - Final Project
- A good collection of real data sets suitable for this project is in the UCI Machine Learning Repository.
- A diverse collection of datasets from Hawkes, prepared for your projects and supplied with project ideas.
- A huge collection of data sets is linked from the data mining metasite KDnuggets.
- If you are interested, you can find tons of government data.
- Biomedical data from various sources are also available.
- Detailed NFL data since 1999, supported by several R and Python packages.
- Air and space exploration? Here are NASA databases.
Social Justice
- Detailed demographic data for the U.S. from the US Census Bureau
- COVID-19 data by race and ethnicity from the COVID Tracking Project
- Income disparity from the US Census Bureau
- Poverty data from the US Census Bureau
- Health insurance coverage from the US Census Bureau
- Household income from the US Census Bureau
- Race and Economic Opportunity Data Tables from the US Census Bureau
- Labor Force Statistics from the Current Population Survey from the US Bureau of Labor Statistics
- Race and Origin of Victims and Offenders, from the National Crime Victimization Survey of the US Dept. of Justice, Office of Justice Programs
- Police data on racial profiling, arrests, citations, and warnings from the US government portal Data.gov
- Unemployment, poverty, and educational attainment for U.S. states and counties from the US Dept. of Agriculture
- Data sources for studies on racial justice and health equity from the UCLA Center for the Study of Racism, Social Justice & Health.
COVID-19 data
- Humanitarian Data Exchange (HDX) is a metasite that publishes and regularly updates COVID-19 data from the World Health Organization, Metabiota, Global Health 50/50, the Assessment Capacities Project (ACAPS), and others.
Location: https://data.humdata.org/, https://data.humdata.org/event/covid-19
- Detailed, up-to-date Johns Hopkins University COVID-19 data on confirmed, recovered, tested, and fatal cases by country, state, and main outbreak location are published on HDX and GitHub.
Location: https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases,
https://github.com/CSSEGISandData/COVID-19
- GitHub hosts and updates databases and accompanying software packages on the COVID-19 pandemic.
Location:
https://github.com/datasets/covid-19,
https://github.com/github/covid19-dashboard,
https://github.com/ImperialCollegeLondon/covid19model,
https://github.com/neherlab/covid19_scenarios,
https://github.com/nytimes/covid-19-data
and by country: Italy, Japan, India, etc.
- COVID-19 data from the Centre for Evidence-Based Medicine (CEBM), University of Oxford
Questions/comments/suggestions? Write to baron@american.edu