Description
Breakfast Cereals. Use the data for the breakfast cereals example given below to explore and summarize thedata as follows. (Note that a few records contain missing values; Since there are just a few, a simple solutionis to remove them first. You can use the Missing Data Handling utility in XLMiner.)a. Which variables are quantitative/Numerical? Which are ordinal? Which are nominal?b. Create a table with the average, median, min max, and standard deviation for each of the quantitativevariables. This can be done through Excels functions or Excels Data -> Data Analysis -> Descriptivestatistics menu.c. Use XLMiner to plot a histogram for each of the quantitative variables. Based on the histograms andsummary statistics, answer the following questions:i. Which variables have the largest variability?ii. Which variables seem skewed?iii. Are there any values that seem extreme?d. Use XLMiner to plot a side-by-side boxplot comparing the calories in hot vs. cold cereals. What doesthis plot show us?e. Compute the correlation table for the quantitative variable (use Excels Data -> Data Analysis ->Correlation menu). In addition, use XLMiner to generate a matrix plot for these variables.i. Which pair of variables is most strongly correlated?ii. How can we reduce the number of variables based on these correlations?iii. How would the correlation change if we normalized the data first?2. University Rankings. The data set on American college and university ranking contains information on 1,302American college and university offering an undergraduate program. For each university there are 17measurements that include continuous measurements (e.g., tuition and graduate rate) and categoricalmeasurements (e.g., location by state and whether it is a private or public school).a. Remove all categorical variables. Then remove all records with missing numerical measurements fromthe data set (by creating a new worksheet).b. Conduct a principal components analysis on the cleaned data and comment on the result. Should thedata be normalized? Discuss what characterizes the components you consider key.3. Salles of Toyota Corolla Cars. The file ToyotaCorolla.xls contains data on used cars on sale during latesummer of 2004 in the Netherlands. It has 1,436 records containing details on 38 attributes, including Price,Age, Kilometer, HP and other specifications. The goal will be to predict the price of a used Toyota Corollabased on its specifications.a. Identify the categorical variables.b. Explain the relationship between a categorical variable and the series of binary dummy variablesderived from it.c. How many dummy variables are required to capture the information in a categorical variable with Ncategories?d. Using XLMiners data utility, convert the categorical variables in the dataset into dummy binaries, andexplain in words, for one record, the values in the derived binary dummy.e. Use Excels correlation command (Data -> Data Analysis -> Correlation menu) to produce a correlationmatrix and XLMiners matrix plot to obtain a matrix of all scatter plots