ESP 178 Applied Research Methods
2/26/16
Class Exercise: Quantitative Analysis

Introduction:

In summer 2006, my student Ted Buehler and I conducted a survey of residents in Davis and five other cities. The purpose of the survey was to test the effect of bicycle infrastructure and bicycle culture in a community on bicycling behavior for the residents of that community while controlling for sociodemographic characteristics and personal attitudes. Our conceptual model looked something like this:

[Diagram of the conceptual model, with boxes labeled: Bicycle infrastructure, Bicycle culture, Safety concerns, Attitudes, Sociodemographics, Bicycling behavior]

The cities chosen for the study are roughly similar to Davis in terms of population and geographic size, but they vary with respect to bicycle infrastructure and culture. We surveyed a random sample of residents in each of these cities.

The dataset I've given you (a subset of the entire dataset for the survey) includes the following variables, measured at the individual level:

Biking
  DAYSBIKED   Number of days in last 7 days respondent rode a bicycle (0 to 7 days; ratio)
  BIKER       Whether respondent rode a bike within the last 7 days (0=no; 1=yes; nominal)

Perceptions of Bicycle Infrastructure in City
  Values: 1 = not at all true; 2 = somewhat true; 3 = mostly true; 4 = entirely true
  Level of measurement: Ordinal (but ratio)
  BIKELANE    Major streets have bike lanes
  WIDESTRE    Streets without bike lanes are wide enough to bike on
  BIKERACK    Stores and other destinations have bike racks
  PATHLIGH    Streets and bike paths are well lighted
  BUTTONS     Intersections have push-buttons or sensors for bicycles or pedestrians
  PATHNETW    City has a network of off-street bike paths
  FREEOFOB    Bike lanes are free of obstacles
  PATHGAPS    Bike route network has big gaps
  HILLY       Area is too hilly for easy bicycling

Perceptions of Bicycle Culture in City
  Values: 1 = strongly disagree; 2 = disagree; 3 = neutral; 4 = agree; 5 = strongly agree
  Level of measurement: Ordinal (but ratio)
  DRIVERSO    Most drivers seem oblivious to bicyclists
  DRIVERSY    Most drivers yield to bicyclists
  DRIVERSW    Most drivers watch for bicyclists at intersections
  DRIVERSS    Most people drive faster than the speed limit
  BIKERSSP    Most bicyclists look like they spend a lot of money on their bikes
  RARESHOP    Rare for people to shop for groceries on a bike
  BIKENORM    Bicycling is a normal mode of transportation for adults in this community
  BIKEPOOR    Most bicyclists look like they are too poor to own a car
  KIDSBIKE    Kids often ride their bikes around my neighborhood for fun
  BIKENOSA    Many bicyclists appear to have little regard for their personal safety

Safety Concerns
  Values: 1 = not at all concerned; 2 = somewhat concerned; 3 = very concerned
  Level of measurement: Ordinal (but ratio)
  HITBYCAR    Concerned about being hit by car
  BICYBYBI    Concerned about being hit by another bicyclist while riding bike
  BITBYDOG    Concerned about being bitten by a dog
  MUGGED      Concerned about being mugged or attacked
  CRASH       Concerned about crashing because of road hazards (e.g. uneven pavement or debris)

Attitudinal Measures
  Values: 1 = strongly disagree; 2 = disagree; 3 = neutral; 4 = agree; 5 = strongly agree
  Level of measurement: Ordinal (but ratio)
  GETEXER     It is important for me to get regular exercise
  ENJOYEXE    I enjoy physical exercise
  PRICEGAS    The price of gasoline affects the choices I make about my daily travel
  LIMITAIR    I try to limit my driving to help improve air quality
  TRAVELWA    Travel time is generally wasted time
  LIKEBIKE    I like riding a bike
  NEEDCAR     I need a car to do many of the things I like to do
  LIKEDRV     I like driving
  PREFERBI    I prefer to ride a bike rather than drive whenever possible
  LIMITDRI    I try to limit my driving as much as possible

Socio-Demographic Characteristics
  FEMALE      Pretty obvious, isn't it? (0=male, 1=female; nominal)
  AGE         Age of respondent (n/a; ratio)
  DRIVERSL    Driver's license? (0=no, 1=yes; nominal)
  BIKELIMIT   Has a physical or anxiety condition that limits or prevents bicycling (0=no, 1=yes; nominal)
  COLLEGE     Whether college degree or technical school degree/certificate or higher (0=no, 1=yes; nominal)
  INCOME      Total household income (n/a; ratio)
  RENTER      Rent current residence? (0=owner, 1=renter; nominal)
  MARRIED     Married or in a steady relationship (0=no, 1=yes; nominal)
  WORKER      Respondent works or volunteers outside the house at least one day per week (0=no, 1=yes; nominal)
  CAR         Owns or has regular access to a car (0=no, 1=yes; nominal)
  BIKE        Owns or has regular access to a bicycle (in working condition) (0=no, 1=yes; nominal)
  BIKEDASKID  Ever rode a bicycle when about the age of 12 (0=no, 1=yes; nominal)

City of Residence
  CITYORIG    City where respondent lives (1=Turlock; 2=Davis; 3=Woodland; 4=Chico; 5=Eugene; 6=Boulder; nominal)
  DAVIS       If respondent lives in Davis (0=no, 1=yes; nominal)
  EUGENE      If respondent lives in Eugene (0=no, 1=yes; nominal)
  BOULDER     If respondent lives in Boulder (0=no, 1=yes; nominal)
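To make the coding scheme concrete, here is a minimal R sketch of how integer codes like those in the list above relate to labeled categories and 0/1 indicators. The rows are invented toy values, not survey data, and the names toy, FEMALE_F and CITY_F are hypothetical. Deriving BIKER from DAYSBIKED is an assumption based on the two definitions; the handout does not say how BIKER was actually built.

```r
# Toy rows only: these values are invented for illustration, not survey data.
toy <- data.frame(
  DAYSBIKED = c(0, 3, 7, 0, 1),
  FEMALE    = c(0, 1, 1, 0, 1),
  CITYORIG  = c(2, 1, 5, 2, 6)
)

# Nominal codes become labeled factors, using the value labels listed above.
toy$FEMALE_F <- factor(toy$FEMALE, levels = c(0, 1),
                       labels = c("male", "female"))
toy$CITY_F <- factor(
  toy$CITYORIG,
  levels = 1:6,
  labels = c("Turlock", "Davis", "Woodland", "Chico", "Eugene", "Boulder")
)

# 0/1 indicators follow from the definitions (assumed derivation, see above).
toy$BIKER <- as.integer(toy$DAYSBIKED > 0)  # rode at least once in last 7 days
toy$DAVIS <- as.integer(toy$CITYORIG == 2)  # lives in Davis

str(toy)
```

Integer codes suit means and regressions, while labeled factors suit frequency tables and group comparisons, which is one plausible reason the exercise switches between two versions of the dataset.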
Exercise

We'll do two kinds of analysis with the data. First, we'll do univariate analysis, otherwise known as descriptive statistics. By looking at the characteristics of the data for each variable, we get to know the dataset a bit. Second, we'll look at the associations between variables, using either bivariate analysis (two variables) or multivariate analysis (more than two variables). Remember that the techniques we use will depend on the level of measurement, whether nominal, ordinal, or ratio. As you do the analysis, put together a one-page memo that summarizes your findings (suggestion: open a Word file and leave it open as you work on the analysis; cut and paste as needed).

We'll be using the Rcmdr ("R Commander") statistical package in the R programming environment; notes on how to do the required analyses in R Commander are included in italics, below.

0. Getting started:

a. Navigate to the Z: drive. Open the file Bike_Survey_for_ESP178_launch.r. This will open RStudio. Select/highlight all lines of code, and click the Run icon near the top of the tab. This will launch R Commander. If you are prompted whether to install additional packages, click No.

b. We must load our data into R Commander. Click the box to the right of Data set: and select DatasetIntegers.

c. Just a note about how R Commander works: in the R Script tab, you'll see a running documentation of your actions. Every operation you run (even loading the data into the program) will be documented there in the R programming language. This record can be helpful for learning to write code in R and for reproducing your work at a later date. In the R Markdown tab, you'll find a different form of documentation, which can be used to generate reports. For example, after running any operation, you may click Generate report to view the results.

1. Univariate Analysis:

Start with the univariate analysis, as a way of getting to know the data:

d.
Describe the behavior of the survey respondents in terms of level of biking, for both DAYSBIKED and BIKER. What do the distributions look like? What share of respondents bike? What is the average number of days biked?

For univariate statistics (including means, modes, medians, frequency distributions, etc.), go to Graphs on the main menu, then pick Histograms. Highlight the variable of interest under Variable (pick one), then click OK. The output should appear in a new window. Alternatively, to generate a report, select the R Markdown tab and click Generate report (html format is fine).

e. Describe the characteristics of the survey respondents. Run the basic univariate statistics for five of the socio-demographic characteristics. (Hint: look at frequency distributions for nominal and ordinal variables and means and standard deviations for ratio variables.) Anything interesting?
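For items d and e, the same kinds of univariate summaries can also be written directly in base R. The sketch below is only an illustration: it runs on simulated toy data (the data frame toy and all its values are made up, not the survey data), and it is not the code R Commander itself generates.

```r
# Simulated toy data for illustration only; not the survey data.
set.seed(178)
toy <- data.frame(
  DAYSBIKED = sample(0:7, 200, replace = TRUE,
                     prob = c(0.5, rep(0.5 / 7, 7))),
  AGE = round(runif(200, min = 18, max = 80))
)
toy$BIKER <- as.integer(toy$DAYSBIKED > 0)

# Nominal variable: frequency distribution and shares.
biker_counts <- table(toy$BIKER)
biker_share  <- prop.table(biker_counts)
print(biker_share)

# Ratio variables: mean, standard deviation, and a five-number summary.
mean_days <- mean(toy$DAYSBIKED)
sd_days   <- sd(toy$DAYSBIKED)
print(c(mean = mean_days, sd = sd_days))
print(summary(toy$AGE))

# Histogram bins centered on the whole days 0..7.
# plot = FALSE returns the bin counts; drop it to draw the chart instead.
h <- hist(toy$DAYSBIKED, breaks = seq(-0.5, 7.5, by = 1), plot = FALSE)
print(h$counts)
```

The split mirrors the hint in item e: frequency tables for nominal and ordinal variables, means and standard deviations for ratio variables.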
f. Now pick one of the other sets of independent variables: perceptions of bicycle infrastructure, perceptions of bicycle culture, safety concerns, or attitudes. Run the basic univariate statistics for the items in this set. Anything interesting?

2. Bivariate Analysis:

Now let's see what kind of association there is between the independent variables and biking behavior:

g. Let's start with the BIKER variable. Look through the potential independent variables (perceptions of bicycle infrastructure, perceptions of bicycle culture, safety concerns, attitudes, socio-demographics) and find a nominal or ordinal variable that you think explains whether someone is a biker or not (e.g. FEMALE, or CITYORIG). What is your hypothesis? To test your hypothesis, run a cross-tab with BIKER as the dependent variable (Y). Which group (e.g. men or women, if you used FEMALE, or city if you used CITYORIG) is more likely to be a biker? Is this what you expected? Are the distributions significantly different (hint: look at the significance level for the Pearson chi-square statistic)? Try this with a second nominal or ordinal independent variable.

In the Data set: box, switch to the DatasetFactors data. For chi-square, on the main menu under Statistics, select Contingency tables >> Two-way table. Under Row variable, select the appropriate dependent variable (Y-Response) and under Column variable, select an independent variable (X-Factor) and click OK. To generate a report, select the R Markdown tab and click Generate report (html format is fine).

h. Now let's see if this variable is associated with the frequency of biking, DAYSBIKED. What is your hypothesis? Compare the mean of DAYSBIKED for each group (e.g. male vs. female). Which group has the highest mean? Is this what you expected? Run a one-way ANOVA (analysis of variance) to see if the differences between these groups are statistically significant (hint: look at the significance level for the F statistic).
Try this with your second independent variable.

For ANOVA, on the main menu under Statistics, select Means >> One-way ANOVA. Under Groups select the independent variable, and under Response Variable, select the appropriate dependent variable. Click OK. To generate a report, select the R Markdown tab and click Generate report (html format is fine).

i. Now see how age works as an explanatory variable. Run a linear regression with DAYSBIKED as the dependent variable and AGE as the independent variable. Is the coefficient for the age variable statistically significant (hint: look at the significance level for the coefficient)? How much of the variation in biking levels is explained by age (hint: look at the R-square value for the model)?

In the Data set: box, switch to the DatasetIntegers data. For regression, on the main menu under Statistics, select Fit models >> Linear regression. Select the appropriate response (dependent) variable and select one or more explanatory (independent) variables. Click OK. To generate a report, select the R Markdown tab and click Generate report (html format is fine).

j. If you have time, try running a linear regression with multiple independent variables.

For regression, on the main menu under Statistics, select Fit models >> Linear model. Pick 5-6 key variables you think would explain biking frequency.
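The tests in items g through j (cross-tab with Pearson chi-square, one-way ANOVA, and linear regression) can be sketched in base R as follows. This is an illustration on simulated toy data, so the numbers it prints say nothing about the survey; the data frame toy and the factor names FEMALE_F and BIKER_F are hypothetical, and the code is not necessarily what R Commander writes to its R Script tab.

```r
# Simulated toy data for illustration only; not the survey data.
set.seed(178)
n <- 300
toy <- data.frame(
  FEMALE = rbinom(n, size = 1, prob = 0.5),
  AGE    = round(runif(n, min = 18, max = 80))
)
toy$DAYSBIKED <- pmin(7, rpois(n, lambda = 1.5))  # capped at 7 days
toy$BIKER     <- as.integer(toy$DAYSBIKED > 0)
toy$FEMALE_F  <- factor(toy$FEMALE, levels = c(0, 1),
                        labels = c("male", "female"))
toy$BIKER_F   <- factor(toy$BIKER, levels = c(0, 1), labels = c("no", "yes"))

# g. Cross-tab with the dependent variable in rows, then Pearson chi-square.
xtab <- table(BIKER = toy$BIKER_F, FEMALE = toy$FEMALE_F)
print(prop.table(xtab, margin = 2))        # column shares: bikers per group
chi <- chisq.test(xtab, correct = FALSE)   # uncorrected Pearson statistic
print(chi$p.value)

# h. Group means of DAYSBIKED, then one-way ANOVA (F statistic and p-value).
group_means <- tapply(toy$DAYSBIKED, toy$FEMALE_F, mean)
print(group_means)
anova_fit <- aov(DAYSBIKED ~ FEMALE_F, data = toy)
print(summary(anova_fit))

# i. Simple regression: coefficient significance and R-squared.
fit_age <- lm(DAYSBIKED ~ AGE, data = toy)
age_coefs <- summary(fit_age)$coefficients  # estimate, SE, t, p per term
r_squared <- summary(fit_age)$r.squared
print(age_coefs)
print(r_squared)

# j. Multiple regression: add further explanatory variables with "+".
fit_multi <- lm(DAYSBIKED ~ AGE + FEMALE_F, data = toy)
print(coef(fit_multi))
```

In each case the quantity to read is the one named in the hints: the chi-square p-value for the cross-tab, the p-value on the F statistic for ANOVA, and the coefficient p-values and R-squared for regression.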
k. After all this analysis, what do you conclude? What factors do you think are most important in explaining bicycle commuting?

3. Memo

Now that you have done some analysis, write a one-page memo that states your hypotheses and summarizes what you found, using statistics as appropriate to make your point. It helps if you cut and paste key results into a Word file as you do the analysis. You may reference the html, pdf, or docx report file generated by R Commander in your memo. Submit your memo to Matt via email by the end of class (mhamilton@ucdavis.edu). Save a copy for yourself by emailing it to yourself or putting it on a USB drive.