Suppose your goal is to build a model to predict which of your customers don’t have health insurance; perhaps you want to market inexpensive health insurance packages to them. You’ve collected a dataset of customers whose health insurance status you know. You’ve also identified some customer properties that you believe help predict the probability of insurance coverage: age, employment status, income, information about residence and vehicles, and so on.
In this assignment we’ll address issues that you can discover during the data exploration/visualization phase. First you’ll treat missing values. Then you’ll apply some common data transformations and consider when they’re appropriate: converting continuous variables to discrete; normalization and rescaling; and logarithmic transformations.
Customer data can be downloaded from: custdata.RDS
1. Load the data into a data frame named custData using the readRDS() function.
If you saved the file custdata.RDS in the folder C:/tmp, pass the path "C:/tmp/custdata.RDS" to readRDS().
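A minimal sketch of the load step. The real file is not available here, so the sketch first writes a small invented data frame to a temporary .RDS file; the toy columns and the temporary path are assumptions for illustration only.

```r
# Toy stand-in for custdata.RDS (values are invented for illustration).
toy <- data.frame(age = c(34, 0, 71), income = c(52000, -100, 18000))
path <- tempfile(fileext = ".RDS")
saveRDS(toy, path)

# With the real file this would be:
#   custData <- readRDS("C:/tmp/custdata.RDS")
custData <- readRDS(path)
```

Forward slashes work in R paths on Windows, so there is no need to escape backslashes.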
2. Print the number of rows and columns in the data frame. Use the dim() function.
3. Print the column names.
4. Print the number of NAs in each column.
Hint: One way to count NAs is to pass a column to is.na() and then pass the result to sum().
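Steps 2 to 4 could look roughly like the sketch below. It runs on a small invented data frame rather than the real custData, so the numbers it prints are not the assignment's answers.

```r
# Toy data frame standing in for custData (invented values).
custData <- data.frame(
  age       = c(34, NA, 71, 45),
  income    = c(52000, 31000, NA, NA),
  gas_usage = c(40, 2, NA, 3)
)

dim(custData)    # number of rows, then number of columns
names(custData)  # column names

# is.na() marks each missing entry; colSums() totals the marks per column.
na_counts <- colSums(is.na(custData))
print(na_counts)

# Equivalent per-column form, following the sum()/is.na() hint.
na_counts_sapply <- sapply(custData, function(col) sum(is.na(col)))
```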
5. Adding New Columns to a Data Frame
The variable gas_usage mixes numeric and symbolic data: values greater than 3 are monthly gas bills, but values from 1 to 3 are special codes. In addition, gas_usage has some missing values.
The value 1 means “Gas bill included in rent or condo fee”.
The value 2 means “Gas bill included in electricity payment”.
The value 3 means “No charge or gas not used”.
One way to treat gas_usage is to convert all the special codes (1, 2, 3) to NA, and to add three new indicator variables, one for each code. For example, the indicator variable gas_with_electricity will have the value 1 whenever the original gas_usage variable had the value 2, and the value 0 otherwise.
A) Create the three new indicator variables, gas_with_rent, gas_with_electricity, and no_gas_bill. Add these indicators to the data frame custData.
Hint: Use the ifelse() function. See textbook pages 66-67 for examples.
B) Print the column names of custData to check if these new columns are added.
6. Convert Invalid Values to NA
The variable age has the problematic value 0, which probably means that the age is unknown. In addition, there are a few customers with age greater than 100, which may also be an error. However, for this project you decide to only treat the value 0 as invalid, and to assume ages greater than one hundred years are valid.
The variable income has negative values. We’ll assume for this project those values are invalid.
A) Convert invalid age and income values to NA, as if they were missing values.
B) Convert all values of gas_usage that are less than 4 to NA. (We do this because we already created three new indicators for the codes 1, 2, and 3 in the gas_usage column. We therefore label these entries as missing values, because they don’t represent a gas bill amount.)
Hint: Use the ifelse() function. See textbook pages 66-67 for examples.
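One possible way to apply the rules in step 6 with ifelse(), shown on an invented data frame. The indicator columns from step 5 are assumed to exist already, since this step overwrites the codes they are built from.

```r
# Toy data frame standing in for custData (invented values).
custData <- data.frame(
  age       = c(34, 0, 120, 58),
  income    = c(52000, -6900, 0, 31000),
  gas_usage = c(40, 1, 3, NA)
)

# Age 0 is treated as unknown; ages over 100 are left as they are.
custData$age <- ifelse(custData$age == 0, NA, custData$age)

# Negative incomes are treated as invalid; an income of 0 is kept.
custData$income <- ifelse(custData$income < 0, NA, custData$income)

# Codes 1-3 are not bill amounts, so they become NA as well.
# Entries that were already NA stay NA.
custData$gas_usage <- ifelse(custData$gas_usage < 4, NA, custData$gas_usage)
```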
7. Barcharts, Histograms, Scatter Plots
A) Plot barcharts of the predictors num_vehicles, recent_move, health_ins, marital_status, is_employed, and housing_type.
The following is the bar chart of the housing_type:
B) Plot histograms of age and income. Comment on the distribution and skewness of the data for these predictors.
C) Print the scatter plot of age versus income:
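A rough sketch of the three plot types using base R graphics. Base graphics are an assumption made here to avoid extra packages; the textbook may use a different plotting library. The data values and category labels are invented, and placing age on the x-axis is also an assumption.

```r
# Toy data frame standing in for custData (invented values and labels).
custData <- data.frame(
  housing_type = c("Rented", "Rented", "Owned", "Owned", "Rented"),
  age          = c(24, 37, 52, 68, 45),
  income       = c(18000, 42000, 95000, 30000, 61000)
)

# Bar chart: table() counts each category, barplot() draws the counts.
housing_counts <- table(custData$housing_type)
barplot(housing_counts, main = "housing_type", ylab = "count")

# Histograms of the numeric predictors.
hist(custData$age, main = "age", xlab = "age")
hist(custData$income, main = "income", xlab = "income")

# Scatter plot with age on the x-axis and income on the y-axis.
plot(custData$age, custData$income, xlab = "age", ylab = "income")
```

The same table() and barplot() pair applies to the other categorical predictors listed in part A.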
8. Density Plot and Transformation to Eliminate Skew
A) Print the density plots of income and age.
B) Is the data right-skewed or left-skewed?
C) If the data is skewed, apply a transformation to remove the skewness as much as possible.
Hint: See textbook pages 74-75.
The following is the density plot of income:
And the following is the density plot after log10() is used to transform income:
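The sketch below illustrates the idea on a short invented income vector, again with base R graphics as an assumed choice. It draws the density before and after a log10() transformation.

```r
# Invented right-skewed incomes: a few large values stretch the right tail.
income <- c(12000, 18000, 22000, 25000, 31000, 40000, 52000,
            90000, 250000, 600000)

# na.rm = TRUE lets density() skip missing values if any are present.
plot(density(income, na.rm = TRUE), main = "income")

# log10() compresses large values. It is only finite for positive inputs,
# so zero or negative incomes would need to be handled first.
log_income <- log10(income)
plot(density(log_income, na.rm = TRUE), main = "log10(income)")

# A mean well above the median is a quick numeric sign of right skew.
right_skewed <- mean(income) > median(income)
print(right_skewed)
```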
9. Convert Continuous Variable to Discrete
We would like to create the following ranges for the age predictor.
[0,25], (25,65], (65,130]
A) Use the cut() function to cut the age predictor into the ranges given above. Add the result to the data frame custData as a new predictor named ageRange.
Hint: See Listing 4.6 in the textbook, page 71.
B) Plot the bar chart of the ageRange, as shown below:
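A minimal sketch of step 9 on invented ages. The include.lowest = TRUE argument is what closes the first interval on the left, which matches the [0,25] notation above.

```r
# Toy ages standing in for custData$age (invented values).
custData <- data.frame(age = c(0, 18, 25, 26, 65, 66, 90, NA))

# Break points 0, 25, 65, 130 define three intervals that are closed on
# the right; include.lowest = TRUE also places age 0 in the first one.
custData$ageRange <- cut(custData$age,
                         breaks = c(0, 25, 65, 130),
                         include.lowest = TRUE)

# Counts per range, then the bar chart of those counts.
print(table(custData$ageRange, useNA = "ifany"))
barplot(table(custData$ageRange), main = "ageRange", ylab = "count")
```

Ages that are NA stay NA in ageRange, so they are left out of the bar chart.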
10. Imputed Value for the age Predictor
You might believe that the data is missing because the data collection failed at random, independent of the situation and of the other values. In this case, you can replace the missing values with “a reasonable estimate,” or imputed value. Statistically, one commonly used estimate is the expected value, or mean.
For the age predictor, replace all NAs with the mean of the age values that are not NA.
Caution: The R mean() function returns a double, not an integer. Make sure that you convert the result to an integer using the as.integer() function.
A) Print the mean value you found.
B) After replacing the NAs with the mean value, repeat the process from part 9 above to print the bar chart:
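Mean imputation for age might be sketched as follows, using invented ages. Note that as.integer() truncates rather than rounds.

```r
# Toy ages standing in for custData$age (invented values).
custData <- data.frame(age = c(20, NA, 41, 70, NA))

# Mean of the observed ages only; na.rm = TRUE skips the NAs.
# The mean here is 131 / 3, and as.integer() truncates it to 43.
mean_age <- as.integer(mean(custData$age, na.rm = TRUE))
print(mean_age)

# Replace each NA with the imputed value and keep the other ages.
custData$age <- ifelse(is.na(custData$age), mean_age, custData$age)
```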
Part 5 of the week 4 assignment asks you to add indicator variables (new columns) to the data frame. The assignment describes the indicator variable gas_with_electricity as follows:
For example, the indicator variable gas_with_electricity will have the value 1 whenever the original gas_usage variable had the value 2, and the value 0 otherwise.
Assume that the data frame is named custData. To add the indicator variable gas_with_electricity to custData, call ifelse() with the test custData$gas_usage == 2, the value 1 for a match, and the value 0 otherwise, and assign the result to custData$gas_with_electricity.
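A sketch of that ifelse() call on a few invented gas_usage values, including one NA to show how missing entries behave:

```r
# Toy gas_usage values standing in for the real column (invented values).
custData <- data.frame(gas_usage = c(1, 2, 3, 40, NA))

# 1 where gas_usage equals 2, 0 where it does not.
# The comparison NA == 2 is NA, so a missing gas_usage yields NA here.
custData$gas_with_electricity <- ifelse(custData$gas_usage == 2, 1, 0)

print(custData)
```

Whether a missing gas_usage should give NA or 0 in the indicator is a modeling decision that the assignment text does not settle, so check the textbook example before choosing.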
This statement adds a new column named gas_with_electricity to the custData data frame, with values of 1 or 0 based on the gas_usage column: where gas_usage is 2, gas_with_electricity is 1; otherwise it is 0.
The other two columns are added the same way.