6 Streaming Analytics

6.1 Introduction

In this chapter, we explore whether people’s preferred streaming platform differs by age group. Specifically, we examine whether certain platforms (Netflix, Hulu, Disney+, Amazon, or another service) appeal more to specific age demographics. Using simulated survey data from 300 respondents, we compute descriptive statistics, visualize the data, and run a chi-square test of independence. We also calculate Cramér’s V to gauge the strength of the relationship between age group and platform preference, and identify which age–platform combinations contribute most to the overall chi-square statistic.

6.2 Required Packages

First, we load the packages used in this report.

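library(readxl)      # read_excel() for the .xlsx survey file
library(tidyverse)   # data manipulation; also attaches ggplot2
library(ggplot2)     # plotting (already attached by tidyverse)
library(ggthemes)    # theme_fivethirtyeight()
library(pheatmap)    # heatmap of chi-square contributions
library(rcompanion)  # cramerV()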

6.3 Data Preparation

6.3.1 Load and Setup Data

platform_age_preference <- read_excel("Streaming Services and Age.xlsx")
summary(platform_age_preference)
##     AgeCat            Platform        
##  Length:300         Length:300        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character
str(platform_age_preference)
## tibble [300 × 2] (S3: tbl_df/tbl/data.frame)
##  $ AgeCat  : chr [1:300] "18–25" "18–25" "18–25" "18–25" ...
##  $ Platform: chr [1:300] "Other" "Hulu" "Netflix" "Netflix" ...

6.3.2 Contingency Table: Age Category

table(platform_age_preference$AgeCat) 
## 
## 18–25 26–40   41+ 
##   100   100   100

6.3.3 Contingency Table: Streaming Platform

table(platform_age_preference$Platform)
## 
##  Amazon Disney+    Hulu Netflix   Other 
##      54      61      46     111      28

6.3.4 Contingency Table: Age Category × Streaming Platform

table_total <- table(platform_age_preference$AgeCat, platform_age_preference$Platform)
table_total
##        
##         Amazon Disney+ Hulu Netflix Other
##   18–25      4      22   23      47     4
##   26–40     11      25   16      41     7
##   41+       39      14    7      23    17
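Readers without the file Streaming Services and Age.xlsx can still reproduce the analysis from the counts in the table above. The sketch below rebuilds an equivalent one-row-per-respondent data frame in base R. The names `counts`, `cells`, and `rebuilt` are illustrative, and the rows come out grouped by cell rather than in the original file’s order, so the tables and tests will match but the first rows shown by str() will not.

```r
# Observed counts copied from the age-by-platform table above
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("18–25", "26–40", "41+"),
                  c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)

# One row per cell (Var1 = age group, Var2 = platform, Freq = count)
cells <- as.data.frame(as.table(counts), stringsAsFactors = FALSE)

# Expand each cell into Freq identical rows, one per respondent
rebuilt <- data.frame(
  AgeCat   = rep(cells$Var1, cells$Freq),
  Platform = rep(cells$Var2, cells$Freq)
)

nrow(rebuilt)                            # 300
table(rebuilt$AgeCat, rebuilt$Platform)  # same counts as above
```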

6.4 Visualization

6.4.1 Stacked Bar Chart: Platform Preference Within Each Age Group

stacked_graph <- ggplot(platform_age_preference, aes(x = AgeCat, fill = Platform)) +
  geom_bar(position = "fill") +
  labs(
    title = "Platform Preference Within Each Age Group",
    x = "Age Group",
    y = "Proportion",
    fill = "Platform") +
  theme_fivethirtyeight()

stacked_graph
Stacked bar chart showing the proportion of streaming platform preferences within each age group. Each bar represents an age group, with segments indicating the relative share of preferences for each platform.

6.4.1.1 Interpretation:

The stacked bar chart shows the proportion of streaming platform preferences within each age group. Netflix is the most popular platform among people aged 18–25 (47%) and 26–40 (41%). Preferences shift with age, however: among adults aged 41 and older, Amazon (39%) overtakes Netflix (23%) as the most preferred platform.
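The proportions behind the stacked bars can also be computed directly. This minimal sketch uses a hypothetical `counts` matrix rebuilt from the table in 6.3.4 and divides each row by its total with base R’s prop.table(); with margin = 1, each age group’s proportions sum to 1, which is what geom_bar(position = "fill") draws.

```r
# Observed counts copied from the contingency table in 6.3.4
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("18–25", "26–40", "41+"),
                  c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)

# margin = 1 normalizes within rows, i.e. within each age group
row_props <- prop.table(counts, margin = 1)
round(row_props, 2)
```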

6.4.2 Clustered Bar Chart: Platform Preference Counts by Age Group

clustered_graph <- ggplot(platform_age_preference, aes(x = AgeCat, fill = Platform)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Platform Preference Counts by Age Group",
    x = "Age Group",
    y = "Count",
    fill = "Platform") +
  theme_fivethirtyeight()

clustered_graph 
Clustered bar chart showing the number of respondents preferring each streaming platform across age groups. Bars within each age group represent counts for each platform, allowing for direct comparison of platform popularity by age category.

6.4.2.1 Interpretation:

The clustered bar chart shows the number of respondents in each age group who prefer each streaming platform. Netflix is preferred by the most respondents in the 18–25 (47) and 26–40 (41) age groups. For adults aged 41 and older, Amazon (39) replaces Netflix (23) as the leading choice. Because each age group contains exactly 100 respondents, these counts tell the same story as the proportions in the stacked bar chart.
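To confirm the leading platform in each age group programmatically, one option is to take the column with the largest count in each row. The sketch below applies which.max() to a hypothetical `counts` matrix copied from the table in 6.3.4. Note that which.max() returns the first maximum, so ties would need separate handling.

```r
# Observed counts copied from the contingency table in 6.3.4
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("18–25", "26–40", "41+"),
                  c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)

# Position of the largest count in each row, mapped to the platform name
top_platform <- colnames(counts)[apply(counts, 1, which.max)]
names(top_platform) <- rownames(counts)
top_platform  # 18–25: Netflix, 26–40: Netflix, 41+: Amazon
```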

6.5 Chi-Square Test of Independence

chi_total <- chisq.test(table_total)
chi_total
## 
##  Pearson's Chi-squared test
## 
## data:  table_total
## X-squared = 68.044, df = 8, p-value = 1.203e-11

6.5.0.1 Interpretation:

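The chi-square test showed a significant association between age group and streaming platform preference, χ²(8) = 68.04, p < .001. With 3 age groups and 5 platforms, the test has (3 − 1)(5 − 1) = 8 degrees of freedom, and every expected count is at least 9.33, comfortably above the common rule-of-thumb minimum of 5. We therefore reject the null hypothesis of independence: platform preference differs by age group.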
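The statistic reported by chisq.test() can be reproduced by hand, which makes the formula explicit: χ² is the sum of (O − E)²/E over all 15 cells, with (r − 1)(c − 1) degrees of freedom and an upper-tail p-value from the chi-square distribution. A sketch in base R, assuming a hypothetical `counts` matrix copied from the table in 6.3.4:

```r
# Observed counts copied from the contingency table in 6.3.4
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("18–25", "26–40", "41+"),
                  c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)

# Expected counts under independence: row total * column total / N
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

x_squared <- sum((counts - expected)^2 / expected)
dof <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(x_squared, df = dof, lower.tail = FALSE)

round(c(x_squared = x_squared, dof = dof), 3)  # 68.044 and 8
p_value                                         # about 1.2e-11
```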

6.6 Observed, Expected, and Residual Values

6.6.1 Observed Values

chi_total$observed
##        
##         Amazon Disney+ Hulu Netflix Other
##   18–25      4      22   23      47     4
##   26–40     11      25   16      41     7
##   41+       39      14    7      23    17

6.6.2 Expected Values

chi_total$expected
##        
##         Amazon  Disney+     Hulu Netflix    Other
##   18–25     18 20.33333 15.33333      37 9.333333
##   26–40     18 20.33333 15.33333      37 9.333333
##   41+       18 20.33333 15.33333      37 9.333333
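Each expected count is the row total times the column total divided by the sample size. Because every age group has exactly 100 respondents, each platform’s expected count is simply its column total divided by 3, which is why all three rows above are identical (for Amazon, 100 × 54 / 300 = 18). The sketch below, assuming a hypothetical `counts` matrix copied from the table in 6.3.4, reproduces these values and checks the rule of thumb that every expected count should be at least 5 for the chi-square approximation.

```r
# Observed counts copied from the contingency table in 6.3.4
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("18–25", "26–40", "41+"),
                  c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)

n <- sum(counts)
# outer() forms every (row total, column total) product at once
expected <- outer(rowSums(counts), colSums(counts)) / n
round(expected, 2)

min(expected)       # smallest expected count (Other: 28 / 3)
all(expected >= 5)  # TRUE
```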

6.6.3 Residual Values

chi_total$residuals
##        
##             Amazon    Disney+       Hulu    Netflix      Other
##   18–25 -3.2998316  0.3696106  1.9578900  1.6439899 -1.7457431
##   26–40 -1.6499158  1.0349098  0.1702513  0.6575959 -0.7637626
##   41+    4.9497475 -1.4045204 -2.1281413 -2.3015858  2.5095057

6.6.3.1 Interpretation:

After running the chi-square test, we examined the observed, expected, and residual values more closely. The residuals returned by chisq.test() are Pearson residuals, (O − E)/√E: positive values mean more respondents than expected under independence, and negative values mean fewer. The largest positive residual was for adults aged 41+ choosing Amazon (39 observed vs. 18 expected; residual = 4.95). Older adults also chose “Other” platforms more often than expected (17 observed vs. 9.33 expected; residual = 2.51). In contrast, the largest negative residual was for the 18–25 age group choosing Amazon: only 4 people in this group chose Amazon, even though 18 were expected (residual = -3.30).
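Pearson residuals can be recomputed directly as (O − E)/√E. chisq.test() also returns adjusted standardized residuals in its stdres component; these are approximately standard normal under independence, so cells with an absolute value above about 1.96 stand out at roughly the 5% level. This chapter does not report them, so the sketch below (with a hypothetical `counts` matrix copied from the table in 6.3.4) is an optional extension rather than part of the original analysis.

```r
# Observed counts copied from the contingency table in 6.3.4
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("18–25", "26–40", "41+"),
                  c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)
test <- chisq.test(counts)

# Pearson residuals, identical to test$residuals
pearson <- (counts - test$expected) / sqrt(test$expected)
round(pearson, 2)

# Adjusted standardized residuals and the cells exceeding |1.96|
round(test$stdres, 2)
which(abs(test$stdres) > 1.96, arr.ind = TRUE)
```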

6.7 Contributions to the Chi-Square Statistic

6.7.1 Cell Contributions

# Contribution of each cell to the chi-square statistic: (O - E)^2 / E
contributions_function <- function(x) {
  (x$observed - x$expected)^2 / x$expected
}

cell_contributions <- contributions_function(chi_total)  # full precision
contributions_total <- round(cell_contributions, 2)      # rounded for display
contributions_total
##        
##         Amazon Disney+  Hulu Netflix Other
##   18–25  10.89    0.14  3.83    2.70  3.05
##   26–40   2.72    1.07  0.03    0.43  0.58
##   41+    24.50    1.97  4.53    5.30  6.30
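Two properties make these contributions easy to sanity-check: they sum to the overall X-squared value (68.04), and each one equals the square of the corresponding Pearson residual. A brief sketch, assuming a hypothetical `counts` matrix copied from the table in 6.3.4:

```r
# Observed counts copied from the contingency table in 6.3.4
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("18–25", "26–40", "41+"),
                  c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)
test <- chisq.test(counts)

contrib <- (test$observed - test$expected)^2 / test$expected
sum(contrib)             # equals the test statistic
unname(test$statistic)

# Each contribution is the squared Pearson residual
isTRUE(all.equal(contrib, test$residuals^2, check.attributes = FALSE))  # TRUE
```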

6.7.2 Percent Contributions

# Percentage of the chi-square statistic contributed by each cell.
# Contributions are kept at full precision and rounded only at the end;
# rounding them first shifts some percentages by 0.01.
percent_contributions_function <- function(x) {
  cell_contributions <- (x$observed - x$expected)^2 / x$expected
  round((cell_contributions / x$statistic) * 100, 2)
}

percent_contributions <- percent_contributions_function(chi_total)
percent_contributions
##        
##         Amazon Disney+  Hulu Netflix Other
##   18–25  16.00    0.20  5.63    3.97  4.48
##   26–40   4.00    1.57  0.04    0.64  0.86
##   41+    36.01    2.90  6.66    7.79  9.26

6.7.3 Heatmap of Percentage Contributions by Age Group and Streaming Platform

pheatmap(percent_contributions,
         display_numbers = TRUE,
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         main = "% Contribution to Chi-Square Statistic")
Figure 6.1: Heatmap showing the percentage contribution of each age group–platform combination to the overall chi-square statistic. Cells with higher percentage values (shown in red tones) indicate combinations that contribute more strongly to the observed association between age group and streaming platform preference.

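# Percentage contributions recomputed without rounding the intermediate
# contributions (despite its name, this function does not draw a heatmap)
pheatmap_function <- function(x) {
  contributions <- ((x$observed - x$expected)^2) / x$expected
  percent_contributions <- round(((contributions / x$statistic) * 100), 2)
  return(percent_contributions)
}

pheat_total <- pheatmap_function(chi_total)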
pheat_total
##        
##         Amazon Disney+  Hulu Netflix Other
##   18–25  16.00    0.20  5.63    3.97  4.48
##   26–40   4.00    1.57  0.04    0.64  0.86
##   41+    36.01    2.90  6.66    7.79  9.26

6.7.3.1 Interpretation:

The cell that contributed most to the overall chi-square statistic is older adults (41+) choosing Amazon, which accounts for about 36% of the total χ² value: viewers aged 41 and older chose Amazon far more often than expected under independence. Younger viewers (18–25) contributed a further 16% through a lower-than-expected count for Amazon, indicating that fewer young adults prefer Amazon than independence would predict. Together, the three Amazon cells account for about 56% of the statistic (36.01 + 16.00 + 4.00), so differences in Amazon preference drive most of the association.
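Summing the unrounded percentages by column or by row shows which platform and which age group drive the association as a whole. A sketch, assuming a hypothetical `counts` matrix copied from the table in 6.3.4:

```r
# Observed counts copied from the contingency table in 6.3.4
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("18–25", "26–40", "41+"),
                  c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)
test <- chisq.test(counts)

# Unrounded percentage of the statistic contributed by each cell
pct <- 100 * (test$observed - test$expected)^2 / test$expected /
  unname(test$statistic)

round(colSums(pct), 1)  # share by platform (Amazon is about 56)
round(rowSums(pct), 1)  # share by age group
```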

6.8 Effect Size (Cramér’s V)

cramerV(table_total)
## Cramer V 
##   0.3368

6.8.0.1 Interpretation:

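A Cramér’s V of 0.34 indicates a moderate association between age category and streaming platform preference. With three age groups (so the smaller table dimension gives df* = 2), Cohen’s conventional benchmarks for small, medium, and large effects are roughly 0.07, 0.21, and 0.35, placing 0.34 at the upper end of the moderate range. In practical terms, age is noticeably associated with which platform people prefer, and older and younger viewers show distinct patterns, most clearly in their preference for Amazon.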
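Cramér’s V rescales the chi-square statistic to a 0 to 1 range: V = √(χ² / (N(k − 1))), where k is the smaller of the number of rows and columns. Here k = 3, so V = √(68.04 / 600) ≈ 0.34. The sketch below reproduces the value reported by cramerV() in base R, assuming a hypothetical `counts` matrix copied from the table in 6.3.4.

```r
# Observed counts copied from the contingency table in 6.3.4
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("18–25", "26–40", "41+"),
                  c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)
test <- chisq.test(counts)

n <- sum(counts)
k <- min(dim(counts))  # smaller of rows (3) and columns (5)
cramers_v <- sqrt(unname(test$statistic) / (n * (k - 1)))
round(cramers_v, 4)    # 0.3368
```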

6.9 Final Interpretation

The chi-square test revealed a significant association between age category and streaming platform preference, χ²(8, N = 300) = 68.04, p < .001. The largest contributions came from the 41+/Amazon and 18–25/Amazon combinations: older adults chose Amazon far more often than expected, while younger adults chose it far less often. Cramér’s V = 0.34 indicates a moderate association, so age is meaningfully related to streaming choices. In a real-world context, this suggests that streaming services could benefit from tailoring their marketing and content strategies to different age groups, since platform preferences shift with age.