class: title-slide

<br>
<br>
.right-panel[

# Changing Variables
## Dr. Mine Dogucu

]

---

```r
glimpse(lapd)
```

```
## Rows: 14,824
## Columns: 3
## $ job_class_title <chr> "Police Detective II", "Police Sergeant I", "Police L…
## $ employment_type <chr> "Full Time", "Full Time", "Full Time", "Full Time", "…
## $ base_pay        <dbl> 119321.60, 113270.70, 148116.00, 78676.87, 109373.63,…
```

**Goal**: Create a new variable called `base_pay_k` that represents `base_pay` in thousands of dollars.

---

```r
lapd %>% 
  mutate(base_pay_k = base_pay/1000)
```

```
## # A tibble: 14,824 x 4
##    job_class_title                  employment_type base_pay base_pay_k
##    <chr>                            <chr>              <dbl>      <dbl>
##  1 Police Detective II              Full Time        119322.      119.
##  2 Police Sergeant I                Full Time        113271.      113.
##  3 Police Lieutenant II             Full Time        148116       148.
##  4 Police Service Representative II Full Time         78677.       78.7
##  5 Police Officer III               Full Time        109374.      109.
##  6 Police Officer II                Full Time         95002.       95.0
##  7 Police Officer II                Full Time         95379.       95.4
##  8 Police Officer II                Full Time         95388.       95.4
##  9 Equipment Mechanic               Full Time         80496        80.5
## 10 Detention Officer                Full Time         69640        69.6
## # … with 14,814 more rows
```

---

```r
glimpse(lapd)
```

```
## Rows: 14,824
## Columns: 3
## $ job_class_title <chr> "Police Detective II", "Police Sergeant I", "Police L…
## $ employment_type <chr> "Full Time", "Full Time", "Full Time", "Full Time", "…
## $ base_pay        <dbl> 119321.60, 113270.70, 148116.00, 78676.87, 109373.63,…
```

**Goal**: Create a new variable called `base_pay_level` that takes the values `Less Than Median` and `Greater Than Median`. We will consider $62474 as the median (from previous lecture).

---

Let's first check to see if there is anyone earning exactly the median value.
```r
lapd %>% 
  filter(base_pay == 62474)
```

```
## # A tibble: 0 x 3
## # … with 3 variables: job_class_title <chr>, employment_type <chr>,
## #   base_pay <dbl>
```

---

```r
lapd %>% 
  mutate(base_pay_level = if_else(base_pay < 62474, 
                                  "Less Than Median", 
                                  "Greater Than Median"))
```

```
## # A tibble: 14,824 x 4
##    job_class_title                  employment_type base_pay base_pay_level
##    <chr>                            <chr>              <dbl> <chr>
##  1 Police Detective II              Full Time        119322. Greater Than Median
##  2 Police Sergeant I                Full Time        113271. Greater Than Median
##  3 Police Lieutenant II             Full Time        148116  Greater Than Median
##  4 Police Service Representative II Full Time         78677. Greater Than Median
##  5 Police Officer III               Full Time        109374. Greater Than Median
##  6 Police Officer II                Full Time         95002. Greater Than Median
##  7 Police Officer II                Full Time         95379. Greater Than Median
##  8 Police Officer II                Full Time         95388. Greater Than Median
##  9 Equipment Mechanic               Full Time         80496  Greater Than Median
## 10 Detention Officer                Full Time         69640  Greater Than Median
## # … with 14,814 more rows
```

---

```r
glimpse(lapd)
```

```
## Rows: 14,824
## Columns: 3
## $ job_class_title <chr> "Police Detective II", "Police Sergeant I", "Police L…
## $ employment_type <chr> "Full Time", "Full Time", "Full Time", "Full Time", "…
## $ base_pay        <dbl> 119321.60, 113270.70, 148116.00, 78676.87, 109373.63,…
```

**Goal**: Create a new variable called `base_pay_level` that takes the values `Less than 0`, `No Income`, `Less than Median, Greater than 0`, and `Greater than Median`. We will consider $62474 as the median (from previous lecture).

---

```r
lapd %>% 
  mutate(base_pay_level = case_when(
    base_pay < 0 ~ "Less than 0",
    base_pay == 0 ~ "No Income",
    base_pay < 62474 & base_pay > 0 ~ "Less than Median, Greater than 0",
    base_pay > 62474 ~ "Greater than Median"))
```

```
## # A tibble: 14,824 x 4
##    job_class_title                  employment_type base_pay base_pay_level
##    <chr>                            <chr>              <dbl> <chr>
##  1 Police Detective II              Full Time        119322. Greater than Median
##  2 Police Sergeant I                Full Time        113271. Greater than Median
##  3 Police Lieutenant II             Full Time        148116  Greater than Median
##  4 Police Service Representative II Full Time         78677. Greater than Median
##  5 Police Officer III               Full Time        109374. Greater than Median
##  6 Police Officer II                Full Time         95002. Greater than Median
##  7 Police Officer II                Full Time         95379. Greater than Median
##  8 Police Officer II                Full Time         95388. Greater than Median
##  9 Equipment Mechanic               Full Time         80496  Greater than Median
## 10 Detention Officer                Full Time         69640  Greater than Median
## # … with 14,814 more rows
```

---

To see what we have created, we can select only the new variable.

```r
lapd %>% 
  mutate(base_pay_level = case_when(
    base_pay < 0 ~ "Less than 0",
    base_pay == 0 ~ "No Income",
    base_pay < 62474 & base_pay > 0 ~ "Less than Median, Greater than 0",
    base_pay > 62474 ~ "Greater than Median")) %>% 
  select(base_pay_level)
```

```
## # A tibble: 14,824 x 1
##    base_pay_level
##    <chr>
##  1 Greater than Median
##  2 Greater than Median
##  3 Greater than Median
##  4 Greater than Median
##  5 Greater than Median
##  6 Greater than Median
##  7 Greater than Median
##  8 Greater than Median
##  9 Greater than Median
## 10 Greater than Median
## # … with 14,814 more rows
```

---

We can use pipes with ggplot too!
.left-panel[

```r
lapd %>% 
  mutate(base_pay_level = case_when(
    base_pay < 0 ~ "Less than 0",
    base_pay == 0 ~ "No Income",
    base_pay < 62474 & base_pay > 0 ~ "Less than Median, Greater than 0",
    base_pay > 62474 ~ "Greater than Median")) %>% 
  select(base_pay_level) %>% 
  ggplot(aes(x = base_pay_level)) +
  geom_bar()
```

]

.right-panel[

![](04c-change-variable_files/figure-html/unnamed-chunk-13-1.png)<!-- -->

]

---

```r
glimpse(lapd)
```

```
## Rows: 14,824
## Columns: 3
## $ job_class_title <chr> "Police Detective II", "Police Sergeant I", "Police L…
## $ employment_type <chr> "Full Time", "Full Time", "Full Time", "Full Time", "…
## $ base_pay        <dbl> 119321.60, 113270.70, 148116.00, 78676.87, 109373.63,…
```

**Goal**: Make `job_class_title` and `employment_type` factor variables.

---

```r
lapd %>% 
  mutate(employment_type = as.factor(employment_type),
         job_class_title = as.factor(job_class_title))
```

```
## # A tibble: 14,824 x 3
##    job_class_title                  employment_type base_pay
##    <fct>                            <fct>              <dbl>
##  1 Police Detective II              Full Time        119322.
##  2 Police Sergeant I                Full Time        113271.
##  3 Police Lieutenant II             Full Time        148116
##  4 Police Service Representative II Full Time         78677.
##  5 Police Officer III               Full Time        109374.
##  6 Police Officer II                Full Time         95002.
##  7 Police Officer II                Full Time         95379.
##  8 Police Officer II                Full Time         95388.
##  9 Equipment Mechanic               Full Time         80496
## 10 Detention Officer                Full Time         69640
## # … with 14,814 more rows
```

---

`as.factor()` - converts a vector to a factor

`as.numeric()` - converts a vector to numeric

`as.integer()` - converts a vector to integer

`as.double()` - converts a vector to double

`as.character()` - converts a vector to character

---

class: middle

Once again, we did not "save" anything into `lapd`. While we work on data cleaning, it makes sense not to "save" the intermediate data frames. Once we see the final data frame we want, we can "save" (i.e. overwrite) it.
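---

To make the "not saving" point concrete, here is a minimal sketch. It assumes the `dplyr` package is installed and uses a small made-up tibble called `toy` (hypothetical, not the `lapd` data). A pipeline on its own only prints a result; the data frame changes only when we assign with `<-`.

```r
library(dplyr)

# Hypothetical toy data standing in for lapd
toy <- tibble(employment_type = c("Full Time", "Part Time", "Full Time"))

# No assignment: the result is printed, but toy itself is unchanged
toy %>% 
  mutate(employment_type = as.factor(employment_type))

class_before <- class(toy$employment_type)  # "character"

# With assignment: toy is overwritten with the modified data frame
toy <- toy %>% 
  mutate(employment_type = as.factor(employment_type))

class_after <- class(toy$employment_type)   # "factor"
```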
---

```r
lapd <- lapd %>% 
  mutate(employment_type = as.factor(employment_type),
         job_class_title = as.factor(job_class_title),
         base_pay_level = case_when(
           base_pay < 0 ~ "Less than 0",
           base_pay == 0 ~ "No Income",
           base_pay < 62474 & base_pay > 0 ~ "Less than Median, Greater than 0",
           base_pay > 62474 ~ "Greater than Median"))
```
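---

One thing worth checking with `case_when()`: a row that matches none of the conditions gets `NA`. In `lapd` nobody earns exactly 62474, so this does not come up, but the sketch below shows what would happen. It assumes `dplyr` is installed and uses a made-up tibble called `toy` (hypothetical values, not the `lapd` data).

```r
library(dplyr)

# Hypothetical toy data: one value for each case,
# including a base pay exactly equal to the median
toy <- tibble(base_pay = c(-50, 0, 30000, 62474, 90000))

toy_levels <- toy %>% 
  mutate(base_pay_level = case_when(
    base_pay < 0 ~ "Less than 0",
    base_pay == 0 ~ "No Income",
    base_pay < 62474 & base_pay > 0 ~ "Less than Median, Greater than 0",
    base_pay > 62474 ~ "Greater than Median"))

# The row with base_pay == 62474 matches no condition, so it is NA
toy_levels %>% 
  count(base_pay_level)
```

If such rows existed, one option would be to change the last condition to `base_pay >= 62474`, or to add a separate label for them.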