Week 34 : Thanksgiving

Week 34 : Thanksgiving

# Load the packges
library(ggplot2)
library(ggthemr)
library(stringr)
library(gridExtra)
library(tidyverse)
library(tweenr)
library(gganimate)
library(kableExtra)
library(magrittr)
library(knitr)
library(readr)

#load the data
Thanksgiving<-read_csv("thanksgiving_meals.csv")

# apply the theme grape
ggthemr("grape")

#subset the people who said yes for celebrating thanksgiving
Thanksgiving_Yes<-subset(Thanksgiving,celebrate=="Yes")

#subset the people who said no for celebrating thanksgiving
Thanksgiving_No<-subset(Thanksgiving,celebrate=="No")

Data set was provided on week 34 for TidyTuesday analysis. As it is Thanksgiving week this is understandable. You can receive the data set here. There are more than 65 variables and 1058 observations. The data was acquired buy a survey conducted online and information about them are here.

GitHub Code

People who do and do not celebrate Thanksgiving

In this ThanksGiving data set 980 are celebrating, and 78 are not celebrating Thanksgiving. I will use plots to understand their composition and tables to explain them further.

Variables in consideration for this task is none other than Age, Gender, Family Income and US regions. Finally my aim is to create animated plots and interactive tables for the above variables through the help of packages gganimate and kable.

Age Distribution

First the age distribution has only 4 groups, where people who celebrate Thanksgiving in the age category of 18-29 is the very least. Highest count goes to the age category of 45-59 with 269. There are 33 missing observations and they were removed.

Considering the people who do not celebrate Thanksgiving the least count of 6 goes to category of 60+, but here the category of 18-29 has the highest count of 31. No missing observations were recorded here.

Below is an animated bar plot where the counts change for their respective 4 categories. As 90% of respondents have answered Yes for celebrating Thanksgiving and rest have answered No we can clearly see the count differences

attach(Thanksgiving_Yes)
attach(Thanksgiving_No)
# people who do not celebrate 
dont_age<-as.data.frame(summary.factor(Thanksgiving_No$age))
# people who do celebrate
do_age<-as.data.frame(summary.factor(na.omit(Thanksgiving_Yes$age)))
# people who do celebrate
data_do_age<-data.frame(group=c("18-29","30-44","45-59","60+"),
                      values=do_age$`summary.factor(na.omit(Thanksgiving_Yes$age))`,
                        frame=rep("Do Celebrate",4))
# people who do not celebrate
data_dont_age<-data.frame(group=c("18-29","30-44","45-59","60+"),
                          values=dont_age$`summary.factor(Thanksgiving_No$age)`,
                          frame=rep("Do not Celebrate",4))
# combining both
data_age<-rbind(data_do_age,data_dont_age)

# animated bar plot for people who do celebrate and who do not celebrate 
ggplot(data_age,aes(x=factor(group),values))+
  geom_bar(stat = 'identity',position = "identity")+
  ylab("Frequency")+xlab("Age Group")+
  ggtitle("Animated plot how Do and Do not people prefer \naccording to Age")+
  geom_text(aes(label=values), vjust=1)+
  transition_states(frame,transition_length = 2,state_length = 3)+
  enter_fade()+
  exit_shrink()+
  ease_aes('cubic-in-out')

detach(Thanksgiving_Yes)
detach(Thanksgiving_No)

Age with Other Factors

First table is Age vs Gender for people who celebrate Thanksgiving. All age categories have a percentage range in between 19 and 29. Highest percentage of 28.4055 is for Age category 45 - 59. Female have a higher percentage of 54.3823.

Female who are 60+ have the highest percentage of 15.2059, while lowest percentage of 8.7645 is for male in the age category of 18-29.

attach(Thanksgiving_Yes)

#kable(addmargins(table(age,gender))) %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(4,bold = T,color = "red") %>%
#  row_spec(5,bold = T,color = "red")

# table of percentages for people who do celebrate
kable(addmargins(round(prop.table(table(age,gender)),6)*100),"html") %>% 
  kable_styling("striped",full_width = F) %>%
  column_spec(4,bold = T,color = "red") %>%
  row_spec(5,bold = T,color = "red") %>%
  row_spec(0,bold = T,align = 'center')
Female Male Sum
18 - 29 10.7709 8.7645 19.5354
30 - 44 13.4108 11.4044 24.8152
45 - 59 14.9947 13.4108 28.4055
60+ 15.2059 12.0380 27.2439
Sum 54.3823 45.6177 100.0000
detach(Thanksgiving_Yes)

When considering the people who do not celebrate Thanksgiving, highest percentage of 62.8205 is for Male, while age category of 18-29 have the highest percentage of 39.7436.

Male who are in between 18 and 29 have the highest percentage of 25.6410, while Female who are above 60 have the lowest percentage of 2.5641.

attach(Thanksgiving_No)

#kable(addmargins(table(age,gender))) %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(4,bold = T,color = "red") %>%
#  row_spec(5,bold = T,color = "red")

# table of percentages for people who do not celebrate
kable(addmargins(round(prop.table(table(age,gender)),6)*100),"html") %>% 
  kable_styling("striped",full_width = F) %>%
  column_spec(4,bold = T,color = "red") %>%
  row_spec(5,bold = T,color = "red") %>%
  row_spec(0,bold = T,align = 'center')
Female Male Sum
18 - 29 14.1026 25.6410 39.7436
30 - 44 11.5385 19.2308 30.7693
45 - 59 8.9744 12.8205 21.7949
60+ 2.5641 5.1282 7.6923
Sum 37.1796 62.8205 100.0001
detach(Thanksgiving_No)

With relative to people who celebrate Thanksgiving in the Family Income category highest percentage goes to USD 25,000 to 49,999.

People who have Family Income USD 25,000 to 49,999 and age above 60 have the highest percentage of 4.96, while lowest percentage of 0.11 is for people who have Family Income in between USD 175,000 to 199,999 of age category of 18-29.

attach(Thanksgiving_Yes)

#kable(addmargins(table(age,family_income)),"html") %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(13,bold = T,color = "red") %>%
#  row_spec(5,bold = T,color = "red")

# table of percentages for people who do celebrate
kable(addmargins(round(prop.table(table(age,family_income)),6)*100),"html") %>% 
  kable_styling("striped",full_width = T,font_size = 9) %>%
  column_spec(13,bold = T,color = "red") %>%
  row_spec(5,bold = T,color = "red") %>%
  row_spec(0,bold = T,angle = 270,align = 'center')
$0 to $9,999 $10,000 to $24,999 $100,000 to $124,999 $125,000 to $149,999 $150,000 to $174,999 $175,000 to $199,999 $200,000 and up $25,000 to $49,999 $50,000 to $74,999 $75,000 to $99,999 Prefer not to answer Sum
18 - 29 3.4847 2.1119 0.9504 0.2112 0.5280 0.1056 0.6336 3.6959 2.1119 2.0063 3.6959 19.5354
30 - 44 1.3728 1.5839 2.5343 0.8448 0.6336 0.3168 1.2672 4.8574 4.0127 4.2239 3.1679 24.8153
45 - 59 0.3168 1.3728 4.1183 2.5343 2.0063 1.1616 3.1679 4.0127 3.1679 3.6959 2.8511 28.4056
60+ 0.3168 1.2672 3.9071 1.4784 0.8448 1.1616 2.9567 4.9630 4.1183 3.4847 2.7455 27.2441
Sum 5.4911 6.3358 11.5101 5.0687 4.0127 2.7456 8.0254 17.5290 13.4108 13.4108 12.4604 100.0004
detach(Thanksgiving_Yes)

Of people who do not celebrate Thanksgiving the Family Income category has the highest percentage which goes to People who prefer not to answer.

15 cells in this table are zero which is the lowest percentage that can occur, while highest percentage goes to people who are in the age category 18 -29 while Family Income is USD 0 to 9,999 and prefer not to answer.

attach(Thanksgiving_No) 

#kable(addmargins(table(age,family_income))) %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(13,bold = T,color = "red") %>%
#  row_spec(5,bold = T,color = "red")

# table of percentages for people who do not celebrate
kable(addmargins(round(prop.table(table(age,family_income)),6)*100),"html") %>% 
  kable_styling("striped",full_width = T,font_size = 9) %>%
  column_spec(13,bold = T,color = "red") %>%
  row_spec(5,bold = T,color = "red") %>%
  row_spec(0,bold = T,angle = 270,align = 'center') 
$0 to $9,999 $10,000 to $24,999 $100,000 to $124,999 $125,000 to $149,999 $150,000 to $174,999 $175,000 to $199,999 $200,000 and up $25,000 to $49,999 $50,000 to $74,999 $75,000 to $99,999 Prefer not to answer Sum
18 - 29 12.8205 2.5641 1.2821 0.0000 0.0000 0.0000 0.0000 2.5641 5.1282 2.5641 12.8205 39.7436
30 - 44 2.5641 3.8462 1.2821 0.0000 1.2821 0.0000 2.5641 8.9744 3.8462 1.2821 5.1282 30.7695
45 - 59 2.5641 2.5641 0.0000 1.2821 1.2821 1.2821 2.5641 5.1282 0.0000 3.8462 1.2821 21.7951
60+ 0.0000 1.2821 0.0000 0.0000 0.0000 0.0000 0.0000 1.2821 1.2821 0.0000 3.8462 7.6925
Sum 17.9487 10.2565 2.5642 1.2821 2.5642 1.2821 5.1282 17.9488 10.2565 7.6924 23.0770 100.0007
detach(Thanksgiving_No)

For the people who celebrate Thanksgiving highest percentage of 21.80 goes to US region of South Atlantic. While lowest percentage goes to Mountain with 4.41.

People who are from South Atlantic in the age categories of 45-59 and 60+ have the highest percentage of 6.55. While the lowest percentage of 0.64 goes to people who are in the age category of 18-29 and from East South Central.

attach(Thanksgiving_Yes)

#kable(addmargins(table(age,us_region))) %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(11,bold = T,color = "red") %>%
#  row_spec(5,bold = T,color = "red")

# table of percentages for people who do celebrate
kable(addmargins(round(prop.table(table(age,us_region)),4)*100),"html")  %>% 
  kable_styling("striped",full_width = T,font_size = 9) %>%
  column_spec(11,bold = T,color = "red") %>%
  row_spec(5,bold = T,color = "red") %>%
  row_spec(0,bold = T,align = 'center')
East North Central East South Central Middle Atlantic Mountain New England Pacific South Atlantic West North Central West South Central Sum
18 - 29 2.79 0.64 2.26 0.75 0.97 3.11 3.76 2.26 2.47 19.01
30 - 44 3.44 1.18 4.51 0.97 1.61 3.87 4.94 1.72 2.47 24.71
45 - 59 4.40 2.15 4.94 1.40 1.72 3.22 6.55 1.72 2.69 28.79
60+ 4.94 2.04 3.87 1.29 1.61 3.76 6.55 1.93 1.50 27.49
Sum 15.57 6.01 15.58 4.41 5.91 13.96 21.80 7.63 9.13 100.00
detach(Thanksgiving_Yes)

Of people who do not celebrate Thanksgiving 23.5294% are from Pacific, while lowest percentage is for people who are from New England and West North Central with 4.4118.

10 cells have zero values which is the lowest percentage value. While highest percentage of 11.7647 occurs to people from Pacific and in the age category 30-44.

attach(Thanksgiving_No)

#kable(addmargins(table(age,us_region)))  %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(11,bold = T,color = "red") %>%
#  row_spec(5,bold = T,color = "red")

# table of percentages for people who do not celebrate
kable(addmargins(round(prop.table(table(age,us_region)),6)*100),"html")  %>% 
  kable_styling("striped",full_width = T,font_size = 9) %>%
  column_spec(11,bold = T,color = "red") %>%
  row_spec(5,bold = T,color = "red") %>%
  column_spec(1,bold = T,width = '2cm') %>%
  row_spec(0,bold = T,align = 'center')
East North Central East South Central Middle Atlantic Mountain New England Pacific South Atlantic West North Central West South Central Sum
18 - 29 1.4706 2.9412 8.8235 4.4118 0.0000 7.3529 4.4118 2.9412 5.8824 38.2354
30 - 44 2.9412 0.0000 5.8824 1.4706 1.4706 11.7647 5.8824 0.0000 2.9412 32.3531
45 - 59 2.9412 2.9412 4.4118 2.9412 2.9412 1.4706 4.4118 0.0000 0.0000 22.0590
60+ 0.0000 0.0000 1.4706 0.0000 0.0000 2.9412 1.4706 1.4706 0.0000 7.3530
Sum 7.3530 5.8824 20.5883 8.8236 4.4118 23.5294 16.1766 4.4118 8.8236 100.0005
detach(Thanksgiving_No)

Gender Distribution

We have two types of gender categories in this data set which are male and female. According to the people who celebrate Thanksgiving 515 are Female, while only 432 are male. Here also there are 33 missing observations and they have been removed.

But this is not the case for those who do not celebrate Thanksgiving. Female have a count of only 29, where males have a count of 49. There were no missing observations.

attach(Thanksgiving_Yes)
attach(Thanksgiving_No)
# people who do not celebrate
dont_sex<-as.data.frame(summary.factor(Thanksgiving_No$gender))
# people who do celebrate
do_sex<-as.data.frame(summary.factor(na.omit(Thanksgiving_Yes$gender)))
# people who do celebrate
data_do_sex<-data.frame(group=c("Female","Male"),
                  values=do_sex$`summary.factor(na.omit(Thanksgiving_Yes$gender))`,
                  frame=rep("Do Celebrate",2))
# people who do not celebrate
data_dont_sex<-data.frame(group=c("Female","Male"),
                          values=dont_sex$`summary.factor(Thanksgiving_No$gender)`,
                          frame=rep("Do not Celebrate",2))
# combining both 
data_sex<-rbind(data_do_sex,data_dont_sex)

# animated plot for people who do celebrate and who do not celebrate
ggplot(data_sex,aes(group,values))+
  geom_bar(stat = 'identity',position = "identity")+
  ylab("Frequency")+xlab("Gender")+
  ggtitle("Animated plot how Do and Do not people prefer \naccording to Gender")+
  scale_y_continuous(labels= seq(0,520,10),breaks = seq(0,520,10))+
  geom_text(aes(label=values), vjust=1)+
  transition_states(frame,transition_length = 2,state_length = 3)+
  enter_fade()+
  exit_shrink()+
  ease_aes('elastic-in-out')

detach(Thanksgiving_Yes)
detach(Thanksgiving_No)

Gender with Other Factors

Of people who do celebrate Thanksgiving highest percentage of 10.14 goes to Females where Family Income is USD 25,000 to 49,999. While lowest percentage of 1.27 goes to Males of Family Income category USD 175,000 to 199,999.

attach(Thanksgiving_Yes)

#kable(addmargins(table(gender,family_income)))  %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(13,bold = T,color = "red") %>%
#  row_spec(3,bold = T,color = "red")

# table of percentages for people who do celebrate
kable(addmargins(round(prop.table(table(gender,family_income)),4)*100),"html")  %>% 
  kable_styling("striped",full_width = T,font_size = 8.75) %>%
  column_spec(13,bold = T,color = "red") %>%
  row_spec(3,bold = T,color = "red") %>%
  column_spec(1,bold = T,width = '2cm') %>%
  row_spec(0,bold = T,angle = 270,align = 'center')
$0 to $9,999 $10,000 to $24,999 $100,000 to $124,999 $125,000 to $149,999 $150,000 to $174,999 $175,000 to $199,999 $200,000 and up $25,000 to $49,999 $50,000 to $74,999 $75,000 to $99,999 Prefer not to answer Sum
Female 2.64 3.80 5.39 2.11 2.11 1.48 4.44 10.14 7.92 6.97 7.39 54.39
Male 2.85 2.53 6.12 2.96 1.90 1.27 3.59 7.39 5.49 6.44 5.07 45.61
Sum 5.49 6.33 11.51 5.07 4.01 2.75 8.03 17.53 13.41 13.41 12.46 100.00
detach(Thanksgiving_Yes)

3 cells in the below table are zero values, which is the lowest percentage value. Highest percentage of 14.1026 goes to Males who chose not to answer regarding Family Income where they do not celebrate Thanksgiving.

attach(Thanksgiving_No)

#kable(addmargins(table(gender,family_income)))  %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(13,bold = T,color = "red") %>%
#  row_spec(3,bold = T,color = "red")

# table of percentages for people who do not celebrate
kable(addmargins(round(prop.table(table(gender,family_income)),6)*100),"html")  %>% 
  kable_styling("striped",full_width = F,font_size = 10) %>%
  column_spec(13,bold = T,color = "red") %>%
  row_spec(3,bold = T,color = "red") %>%
  column_spec(1,bold = T,width = '1.5cm') %>%
  row_spec(0,bold = T,angle = 270,align = 'center')
$0 to $9,999 $10,000 to $24,999 $100,000 to $124,999 $125,000 to $149,999 $150,000 to $174,999 $175,000 to $199,999 $200,000 and up $25,000 to $49,999 $50,000 to $74,999 $75,000 to $99,999 Prefer not to answer Sum
Female 6.4103 3.8462 1.2821 1.2821 0.0000 0.0000 1.2821 5.1282 3.8462 5.1282 8.9744 37.1798
Male 11.5385 6.4103 1.2821 0.0000 2.5641 1.2821 3.8462 12.8205 6.4103 2.5641 14.1026 62.8208
Sum 17.9488 10.2565 2.5642 1.2821 2.5641 1.2821 5.1283 17.9487 10.2565 7.6923 23.0770 100.0006
detach(Thanksgiving_No)

Female from South Atlantic who celebrate Thanksgiving have a highest percentage of 12.14. Where respondents from Mountain region and Males have the lowest percentage of 1.29.

attach(Thanksgiving_Yes)

#kable(addmargins(table(gender,us_region)))  %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(11,bold = T,color = "red") %>%
#  row_spec(3,bold = T,color = "red")

# table of percentages for people who do celebrate
kable(addmargins(round(prop.table(table(gender,us_region)),4)*100),"html")  %>% 
  kable_styling("striped",full_width = F,font_size = 10) %>%
  column_spec(11,bold = T,color = "red") %>%
  row_spec(3,bold = T,color = "red") %>%
  column_spec(1,bold = T,width = '1.5cm') %>%
  row_spec(0,bold = T,align = 'center')
East North Central East South Central Middle Atlantic Mountain New England Pacific South Atlantic West North Central West South Central Sum
Female 8.16 3.33 8.59 3.11 3.33 7.30 12.14 4.19 4.40 54.55
Male 7.41 2.69 6.98 1.29 2.58 6.66 9.67 3.44 4.73 45.45
Sum 15.57 6.02 15.57 4.40 5.91 13.96 21.81 7.63 9.13 100.00
detach(Thanksgiving_Yes)

Male respondents who do not celebrate Thanksgiving where they are from Middle Atlantic have a highest percentage of 16.1765. Even though Females of West South Central have the lowest percentage of 1.4706 and Males from West North Central also have the same percentage value.

attach(Thanksgiving_No)

#kable(addmargins(table(gender,us_region)))  %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(11,bold = T,color = "red") %>%
#  row_spec(3,bold = T,color = "red")

# table of percentages for people who do not celebrate
kable(addmargins(round(prop.table(table(gender,us_region)),6)*100),"html")  %>% 
  kable_styling("striped",full_width = F,font_size = 10) %>%
  column_spec(11,bold = T,color = "red") %>%
  row_spec(3,bold = T,color = "red") %>%
  column_spec(1,bold = T,width = '1.5cm') %>%
  row_spec(0,bold = T,align = 'center')
East North Central East South Central Middle Atlantic Mountain New England Pacific South Atlantic West North Central West South Central Sum
Female 4.4118 2.9412 4.4118 4.4118 1.4706 8.8235 7.3529 2.9412 1.4706 38.2354
Male 2.9412 2.9412 16.1765 4.4118 2.9412 14.7059 8.8235 1.4706 7.3529 61.7648
Sum 7.3530 5.8824 20.5883 8.8236 4.4118 23.5294 16.1764 4.4118 8.8235 100.0002
detach(Thanksgiving_No)

Family Income Distribution

There are 11 categories when it comes to Family Income. The option of Prefer Not to answer is given and has been chosen by people who celebrate and people who do not celebrate Thanksgiving.

Considering the people the who celebrate Thanksgiving, highest count of 166 goes to the category of 25,000 to 49,999 USD. While least count goes to 175,000 to 199,999 USD and the count is 26. Further, 118 people have chosen not to answer this question. 33 Missing observations were removed.

Where as in people who do not celebrate Thanksgiving, second highest count goes to the categories of 0 to 9,999 USD and 25,000 to 49,999 USD, where the count is 14. Similarly, for the least count of 1 also there are two Family Income categories, which are 125,000 to 149,999 USD and 175,000 to 199,999 USD. Prefer not to answer is the choice of 18 respondents who participated in this survey. No missing observations were recorded.

As before here also an animated bar plot is used to explain this.

attach(Thanksgiving_Yes)
attach(Thanksgiving_No)
# people who do not celebrate
dont_FI<-as.data.frame(summary.factor(Thanksgiving_No$family_income))
# people who do celebrate
do_FI<-as.data.frame(summary.factor(na.omit(Thanksgiving_Yes$family_income)))
# people who do celebrate
data_do_FI<-data.frame(group=c("0-9,999","10,000-24,999","25,000-49,999",
                               "50,000-74,999","75,000-99,999","100,000-124,999",
                               "125,000-149,999","150,000-174,999",
                               "175,000-199,999","200,000 and up","Not to answer"),
                       values=c(52,60,166,127,127,109,48,38,26,76,118),
                       frame=rep("Do Celebrate",11))
# people who do not celebrate
data_dont_FI<-data.frame(group=c("0-9,999","10,000-24,999","25,000-49,999",
                                 "50,000-74,999","75,000-99,999","100,000-124,999",
                               "125,000-149,999","150,000-174,999","175,000-199,999",
                               "200,000 and up","Not to answer"),
                         values=c(14,8,14,8,6,2,1,2,1,4,18),
                         frame=rep("Do not Celebrate",11))
# combine the dataset
data_FI<-rbind(data_do_FI,data_dont_FI)

# animated plot for people who do celebrate and who do not celebrate
ggplot(data_FI,aes(group,values))+
  geom_bar(stat = 'identity',position = "identity")+
  ylab("Frequency")+xlab("Family Income in dollars")+
  ggtitle("Animated plot how Do and Do not people prefer \naccording to Family Income")+
  scale_y_continuous(labels= seq(0,170,10),breaks = seq(0,170,10))+
  geom_text(aes(label=values), vjust=1)+coord_flip()+
  transition_states(frame,transition_length = 2,state_length = 3)+
  enter_fade()+
  exit_shrink()+
  ease_aes('bounce-in')

detach(Thanksgiving_Yes)
detach(Thanksgiving_No)

Family Income with Other Factors

Of people who do celebrate Thanksgiving the lowest percentage of zero is for people from West North Central with Family Income USD 175,000 to 199,999, people from West South Central with Family Income USD 175,000 to 199,999, people from Mountain with Family Income USD 150,000 to 174,999 and people from Mountain with Family Income USD 175,000 to 199,999. Highest percentage of 4.9409 goes to people from South Atlantic with Family Income USD 25,000 to 49,999.

attach(Thanksgiving_Yes)

#kable(addmargins(table(family_income,us_region)))  %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(11,bold = T,color = "red") %>%
#  row_spec(12,bold = T,color = "red")

# table of percentages for people who do celebrate
kable(addmargins(round(prop.table(table(family_income,us_region)),6)*100),"html")%>% 
  kable_styling("striped",full_width = F,font_size = 10) %>%
  column_spec(11,bold = T,color = "red") %>%
  row_spec(12,bold = T,color = "red") %>%
  column_spec(1,bold = T,width = '2.5cm') %>%
  row_spec(0,bold = T,align = 'center')
East North Central East South Central Middle Atlantic Mountain New England Pacific South Atlantic West North Central West South Central Sum
$0 to $9,999 0.6445 0.2148 0.6445 0.1074 0.2148 1.0741 0.8593 0.2148 0.9667 4.9409
$10,000 to $24,999 0.8593 0.6445 0.5371 0.1074 0.3222 1.5038 1.3963 0.5371 0.3222 6.2299
$100,000 to $124,999 2.7927 0.8593 1.3963 0.4296 0.5371 1.6112 2.2556 0.3222 1.5038 11.7078
$125,000 to $149,999 0.5371 0.1074 0.6445 0.3222 0.3222 0.6445 1.8260 0.2148 0.5371 5.1558
$150,000 to $174,999 0.2148 0.2148 0.5371 0.0000 0.2148 0.9667 0.8593 0.3222 0.7519 4.0816
$175,000 to $199,999 0.6445 0.2148 1.0741 0.0000 0.2148 0.3222 0.3222 0.0000 0.0000 2.7926
$200,000 and up 0.7519 0.4296 2.0408 0.6445 0.7519 1.0741 1.1815 0.5371 0.6445 8.0559
$25,000 to $49,999 2.2556 1.1815 2.5779 0.6445 1.0741 2.1482 4.9409 1.5038 1.0741 17.4006
$50,000 to $74,999 2.5779 0.9667 2.0408 0.5371 0.5371 1.8260 2.7927 1.3963 0.8593 13.5339
$75,000 to $99,999 2.7927 0.8593 1.9334 0.8593 0.5371 1.2889 2.4705 1.2889 1.5038 13.5339
Prefer not to answer 1.5038 0.3222 2.1482 0.7519 1.1815 1.5038 2.9001 1.2889 0.9667 12.5671
Sum 15.5748 6.0149 15.5747 4.4039 5.9076 13.9635 21.8044 7.6261 9.1301 100.0000
detach(Thanksgiving_Yes)

There are a lot of cell values which have zero therefore I am not going to state them. Further, Highest percentage value of 5.8824 is from people of Middle Atlantic and Family Income categories of USD 0 to 9,999 and USD 25,000 to 49,999.

attach(Thanksgiving_No)

#kable(addmargins(table(family_income,us_region)))  %>% 
#  kable_styling("striped",full_width = F) %>%
#  column_spec(11,bold = T,color = "red") %>%
#  row_spec(12,bold = T,color = "red")

# table of percentages for people who do not celebrate
kable(addmargins(round(prop.table(table(family_income,us_region)),6)*100),"html")%>% 
  kable_styling("striped",full_width = F,font_size = 10) %>%
  column_spec(11,bold = T,color = "red") %>%
  row_spec(12,bold = T,color = "red")%>%
  column_spec(1,bold = T,width = '2.5cm') %>%
  row_spec(0,bold = T,align = 'center')
East North Central East South Central Middle Atlantic Mountain New England Pacific South Atlantic West North Central West South Central Sum
$0 to $9,999 0.0000 0.0000 5.8824 4.4118 0.0000 2.9412 2.9412 0.0000 1.4706 17.6472
$10,000 to $24,999 0.0000 1.4706 0.0000 1.4706 0.0000 1.4706 1.4706 1.4706 1.4706 8.8236
$100,000 to $124,999 0.0000 0.0000 1.4706 0.0000 0.0000 1.4706 0.0000 0.0000 0.0000 2.9412
$125,000 to $149,999 1.4706 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.4706
$150,000 to $174,999 0.0000 0.0000 2.9412 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 2.9412
$175,000 to $199,999 0.0000 1.4706 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.4706
$200,000 and up 1.4706 0.0000 0.0000 0.0000 1.4706 0.0000 2.9412 0.0000 0.0000 5.8824
$25,000 to $49,999 2.9412 0.0000 5.8824 1.4706 1.4706 4.4118 2.9412 0.0000 0.0000 19.1178
$50,000 to $74,999 1.4706 1.4706 1.4706 0.0000 0.0000 4.4118 0.0000 1.4706 1.4706 11.7648
$75,000 to $99,999 0.0000 0.0000 0.0000 1.4706 1.4706 4.4118 1.4706 0.0000 0.0000 8.8236
Prefer not to answer 0.0000 1.4706 2.9412 0.0000 0.0000 4.4118 4.4118 1.4706 4.4118 19.1178
Sum 7.3530 5.8824 20.5884 8.8236 4.4118 23.5296 16.1766 4.4118 8.8236 100.0008
detach(Thanksgiving_No)

US Region Distribution

There are 9 regions in both sides, and also both sides have missing values. People who do celebrate Thanksgiving have 49 missing values, while only 10 are missing values for people who do not celebrate Thanksgiving.

attach(Thanksgiving_Yes)
attach(Thanksgiving_No)
# people who do not celebrate
dont_USR<-as.data.frame(summary.factor(na.omit(Thanksgiving_No$us_region)))
# people who do celebrate
do_USR<-as.data.frame(summary.factor(na.omit(Thanksgiving_Yes$us_region)))
# people who do celebrate
data_do_USR<-data.frame(group=c("East North Central", "East South Central",
                                "West South Central", "West North Central",
                                "Middle Atlantic","South Atlantic", "Mountain", 
                                "New England", "Pacific"),
                       values=c(145,56,85,71,145,203,41,55,130),
                       frame=rep("Do Celebrate",9))
# people who do not celebrate
data_dont_USR<-data.frame(group=c("East North Central", "East South Central",
                                  "West South Central", "West North Central",
                                "Middle Atlantic","South Atlantic", "Mountain", 
                                "New England", "Pacific"),
                         values=c(5,4,6,3,14,11,6,3,16),
                         frame=rep("Do not Celebrate",9))
# combine both datasets
data_USR<-rbind(data_do_USR,data_dont_USR)

# animated plot for people who do celebrate and who do not celebrate
ggplot(data_USR,aes(x=str_wrap(group,7),values))+
  geom_bar(stat = 'identity',position = "identity")+
  ylab("Frequency")+xlab("US Regions")+
  ggtitle("Animated plot how Do and Do not people prefer \naccording to US Regions")+
  scale_y_continuous(labels= seq(0,210,10),breaks = seq(0,210,10))+
  geom_text(aes(label=values), vjust=1)+
  transition_states(frame,transition_length = 2,state_length = 3)+
  enter_fade()+
  exit_shrink()+
  ease_aes('cubic-in-out')

detach(Thanksgiving_Yes)
detach(Thanksgiving_No)

Conclusion

I shall conclude my findings in point form

  • We can use gganimate to make bar plots interesting and useful.

  • kable is very useful because of styling options.

Further Analysis

  • There are more than 50 variables therefore much more can be done than describing the data-set.

  • We can use advanced methods such as clustering and model fitting.

Please see that
This is my fourth post on the internet so please be kind to tolerate my mistakes in grammar and spellings. I intend to post more statistics related materials in the future and learn accordingly. Thank you for reading.

THANK YOU

Related