ML Interview Q Series: What metrics would you analyze to assess if only top-tier creators now succeed on YouTube?
📚 Browse the full ML Interview series here.
Comprehensive Explanation
A practical approach to determine whether amateur creators are truly underperforming while only superstars thrive involves examining a broad spectrum of metrics and dissecting the results across different segments of creators. Below are the primary data points and analyses you could consider:
Historical Engagement Data
Evaluating changes over time is crucial for seeing whether amateurs once enjoyed a more equitable share of success:
Compare average and median views, watch time, likes, and comments for amateur vs. superstar channels historically (e.g., six months ago, one year ago, two years ago).
Investigate changes in the share of total views or watch time for each creator segment over time.
Look for any shifts or discontinuities that might have coincided with platform or policy changes, e.g., recommendation algorithm updates.
Creator Segmentation
You need clear definitions:
“Amateur” creators might be those below a threshold of subscribers, watch time, or total views.
“Superstar” creators might be in the top percentile of subscribers or total views. With these definitions:
Evaluate metrics within each segment separately and compare how key metrics change over time between segments.
Distribution Metrics
When concerns arise about concentration of success among a few top performers, it helps to measure how skewed performance is:
The proportion of views or watch time contributed by the top 1%, 5%, or 10% of creators.
The Gini coefficient (a classical statistic for measuring inequality) for distribution of views, watch time, or ad revenue.
Below is the Gini coefficient formula that is often used to measure inequality, with the metric values sorted in non-decreasing order:

$$G = \frac{2\sum_{i=1}^{n} i \, x_{(i)}}{n \sum_{i=1}^{n} x_{(i)}} - \frac{n+1}{n}$$

Here:
n is the total number of creators in the segment of interest.
x_{(i)} is the performance metric (e.g., total views, watch time, ad revenue) for the i-th creator after sorting in ascending order.
A larger G value indicates higher inequality in the distribution, meaning fewer creators dominate the overall performance.
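As a sanity check, here is a minimal NumPy sketch of this formula; the sample numbers are purely illustrative:

import numpy as np

def gini(values):
    """Gini coefficient of a 1-D array of non-negative metric values."""
    x = np.sort(np.asarray(values, dtype=float))  # sort ascending, as the formula requires
    n = len(x)
    ranks = np.arange(1, n + 1)                   # 1-based ranks i
    return 2 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n

# Illustrative example: five creators, one of whom dominates views
views = [100, 150, 200, 300, 1_000_000]
print(round(gini(views), 3))  # ~0.799: views are heavily concentrated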
Platform-Driven Factors
Sometimes, YouTube algorithm changes might reward different behaviors:
If the recommendation system was updated to favor more watch-time-heavy or retention-heavy videos, top-tier creators with professional equipment and editing might be able to produce content that retains viewers for longer durations.
Changes in monetization policies or content guidelines might disproportionately impact smaller creators.
Conversion and Retention Data
You can investigate how new or less established creators fare across the following funnel metrics (a brief pandas sketch follows the list):
Impressions (how often their videos appear in recommended lists).
Click-through rate (CTR) from impression to actual view.
Viewer retention (percentage of video watched).
Trends in average watch sessions or watch time growth patterns among amateurs.
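Here is a hedged sketch of that funnel comparison. The column names ('impressions', 'clicks', 'avg_pct_watched') are illustrative assumptions, not actual YouTube API fields:

import pandas as pd

# Hypothetical per-video data; these column names are assumptions
videos = pd.DataFrame({
    'creator_segment': ['amateur', 'amateur', 'superstar', 'superstar'],
    'impressions':     [10_000, 5_000, 2_000_000, 1_500_000],
    'clicks':          [300, 200, 120_000, 75_000],
    'avg_pct_watched': [0.45, 0.38, 0.55, 0.60],
})

funnel = videos.groupby('creator_segment').agg(
    impressions=('impressions', 'sum'),
    clicks=('clicks', 'sum'),
    retention=('avg_pct_watched', 'mean'),
)
funnel['ctr'] = funnel['clicks'] / funnel['impressions']
print(funnel)
# If amateur CTR holds steady but impressions shrink over time,
# the bottleneck is exposure (recommendations), not content appeal.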
Audience Demographics
A demographic analysis can reveal if amateurs are losing traction in certain viewer segments. This might indicate changes in user preference or the overall content landscape. Comparing demographic shifts in viewership for small vs. large creators helps clarify if the user base has changed in a way that favors high-profile creators.
External Influences
Industry-wide changes (e.g., more professional brand deals, production budgets, etc.) might have raised the quality bar. Amateurs may still achieve success, but the threshold for what audiences consider “watchable quality” might have increased.
Example Code Snippet for Basic Investigations
import pandas as pd
import numpy as np

# Suppose we have a DataFrame `videos` with columns:
# 'creator_id', 'creator_segment', 'views', 'watch_time', 'upload_date', ...
# 'creator_segment' is either "amateur" or "superstar".
# We'll compare average watch time over two time periods.

# Ensure upload_date is a datetime so the range filters below behave correctly
videos['upload_date'] = pd.to_datetime(videos['upload_date'])

# Filter data for an old period (e.g., one year ago) vs. a new period (recent quarter)
old_period = videos[(videos['upload_date'] >= '2022-01-01') & (videos['upload_date'] <= '2022-03-31')]
new_period = videos[(videos['upload_date'] >= '2023-01-01') & (videos['upload_date'] <= '2023-03-31')]

# Group by 'creator_segment' and compute mean watch time
old_watch_time_stats = old_period.groupby('creator_segment')['watch_time'].mean()
new_watch_time_stats = new_period.groupby('creator_segment')['watch_time'].mean()

print("Old period watch time (by segment):\n", old_watch_time_stats)
print("New period watch time (by segment):\n", new_watch_time_stats)

# We can also identify the top 5% of amateur videos by views in the old period:
amateur_old = old_period[old_period['creator_segment'] == 'amateur']
threshold_amateur = np.percentile(amateur_old['views'], 95)
top_amateurs_old = amateur_old[amateur_old['views'] >= threshold_amateur]
# ... the same cut for the new period and for superstars follows the same pattern
You could extend this with more elaborate analyses, such as computing the Gini coefficient directly (as in the sketch above) or examining which factors correlate with the observed engagement shifts.
Possible Follow-Up Questions
How do we define who is an amateur creator vs. a superstar creator?
A practical way is by using thresholds on certain metrics:
Subscriber count cutoffs (e.g., fewer than 10,000 subscribers vs. over 1 million subscribers).
Historical view count or watch time to segment creators.
Industry-based tiers (like YouTube's official Creator Awards: "Silver" at 100,000, "Gold" at 1 million, and "Diamond" at 10 million subscribers).
One challenge is ensuring the thresholds accurately capture the spirit of “amateur” vs. “superstar.” Sometimes, an “amateur” who uploads fewer but very high-quality videos might have views comparable to established creators, so you must define these categories iteratively and test the definitions for correctness.
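As a concrete illustration, the segmentation might look like the following sketch; the subscriber cutoffs are illustrative starting points to be tested iteratively, not official tiers:

import pandas as pd

channels = pd.DataFrame({
    'creator_id':  [1, 2, 3, 4],
    'subscribers': [2_500, 85_000, 450_000, 3_200_000],
})

# Illustrative cutoffs; revisit and re-test these definitions iteratively
bins = [0, 10_000, 1_000_000, float('inf')]
labels = ['amateur', 'mid-tier', 'superstar']
channels['creator_segment'] = pd.cut(channels['subscribers'], bins=bins, labels=labels)
print(channels)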
Why might it appear that only superstars do well, even if amateurs have a chance?
Several reasons might lead to the impression that only big names succeed:
Superstar creators often receive significant off-platform promotion, sponsorship, or media coverage, skewing perceived performance.
The sheer volume of new creators means many smaller creators may compete for limited attention, making it harder for each to stand out.
Public perception could be biased by trending or viral content, where superstars frequently dominate recommended feeds.
Cognitive bias: If the product manager (PM) raising the concern mostly notices top creators on the trending page or in press coverage, that salience can overshadow the actual performance of smaller channels.
Could certain algorithms or recommendation engine changes disproportionately favor large creators?
Yes. If an algorithm factors in high click-through rates and strong watch times, established creators might already have an engaged audience that helps them rank higher in recommendations. You’d want to examine:
Changes in recommended traffic over time for each segment.
If certain user engagement thresholds (likes, comments, etc.) amplify a video’s visibility, superstars likely benefit from an existing audience that helps their content trend quicker.
Why is distribution analysis (like the Gini coefficient) relevant here?
Distribution analysis reveals whether engagement (views, watch time, etc.) is getting more heavily concentrated among a smaller fraction of creators. If you observe a rising Gini coefficient or a larger share of views accrued by the top 1% or 5% of channels, it indicates increasing inequality. This provides a quantitative foundation to support or refute the claim that the big channels are dominating to a greater extent than before.
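A minimal sketch of the top-share computation, assuming a per-creator DataFrame with one metric column:

import pandas as pd

def top_share(df, metric='views', top_frac=0.01):
    """Share of `metric` captured by the top `top_frac` of creators."""
    sorted_vals = df[metric].sort_values(ascending=False)
    k = max(1, int(len(sorted_vals) * top_frac))
    return sorted_vals.iloc[:k].sum() / sorted_vals.sum()

# Illustrative example: five creators, one dominant
creators = pd.DataFrame({'views': [100, 200, 300, 400, 50_000]})
print(round(top_share(creators, top_frac=0.2), 3))  # ~0.98: top 20% holds almost all views

Computing this per quarter and plotting the trend shows directly whether concentration is rising.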
How would you handle a potential sampling bias when analyzing these metrics?
Possible sources of bias include:
Focusing only on monetized or partnered creators, which misses new or unregistered amateurs.
Variation in dataset coverage if some smaller creators are excluded or overshadowed by the ranking logic in data collection.
Seasonal or one-off temporal effects (e.g., during a pandemic, amateur creators might flourish because creators and viewers alike have more free time).
To mitigate this:
Include as many channels as possible in the analysis.
Segment by multiple definitions of “amateur,” not just subscriber count.
Consider consistent time windows for older vs. newer data to avoid seasonal effects.
If we find that amateur creators are indeed struggling, what next?
Potential steps include:
Adjusting recommendation algorithms to highlight emerging creators more frequently.
Providing platform tools and educational resources to help amateurs improve content quality, tagging, and SEO.
Experimenting with short-form or community-post features that can help smaller creators break out.
Running user outreach or marketing campaigns showcasing lesser-known talent.
How can we be sure that changes in data aren’t just due to a greater total number of content creators?
With increasing overall content, the total “view pie” might still be growing. It’s important to normalize or look at relative shares of watch time or subscribers. Even if absolute numbers look higher for everyone, the proportion of total engagement going to the top channels might have shifted. Examining how those proportions change over time is key.
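A short sketch of that normalization, using hypothetical per-period, per-segment watch-time totals:

import pandas as pd

df = pd.DataFrame({
    'period':          ['2022Q1', '2022Q1', '2023Q1', '2023Q1'],
    'creator_segment': ['amateur', 'superstar', 'amateur', 'superstar'],
    'watch_time':      [40_000, 60_000, 55_000, 145_000],
})

totals = df.groupby('period')['watch_time'].transform('sum')
df['share'] = df['watch_time'] / totals
print(df)
# Amateur watch time grew in absolute terms (40k -> 55k) but its
# share fell (40% -> 27.5%), which is the signal we care about.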
By combining these data-driven analyses, you gain clarity on whether superstars truly overshadow amateurs or whether the concern arises from perception biases, broader platform shifts, or new competitor influences.
Below are additional follow-up questions
How would you deal with outliers when comparing the performance of amateur creators to superstar creators?
Outliers can skew averages and distributions, leading to misleading conclusions. Creators with a single viral video or exceptionally high-profile collaborations might inflate performance metrics. Conversely, some channels might have anomalies such as massive drops in viewership due to content flags or community strikes. Here is how you can address them:
Winsorizing or Truncating: Replace extreme values (top/bottom 1% or 5%) with boundary values, so they don’t disproportionately affect mean calculations (see the sketch after this list).
Median and Interquartile Range: Use median or interquartile range instead of the mean, because the median is more robust to outliers. For example, if you track “median views” for amateurs over time, a single viral phenomenon won’t distort the central tendency.
Sub-group Analysis: Examine outliers as their own category. Sometimes, outlier channels are the most interesting segment to analyze separately to see what triggered their unusually high or low performance.
Pitfalls: Over-aggressive outlier handling might discard valid signals of unusual but genuine success. Under-aggressive handling might hide important changes in distribution.
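A brief sketch of winsorizing alongside robust summaries, using SciPy's winsorize; the 10% limits and the sample numbers are illustrative:

import numpy as np
from scipy.stats.mstats import winsorize

# Ten amateur channels' monthly views; one viral outlier
views = np.array([100, 120, 150, 180, 200, 220, 260, 300, 350, 5_000_000])

clipped = winsorize(views, limits=(0.1, 0.1))  # cap the top and bottom 10%
print(views.mean())      # 500188.0: dominated by the single outlier
print(clipped.mean())    # 225.0: robust to the viral video
print(np.median(views))  # 210.0: the median is unaffected either way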
If short-form content appears to drive higher engagement for amateurs, how does that influence your recommendations?
Short-form content, such as YouTube Shorts, might cater to spontaneous, discovery-based viewing:
Shift in Strategy: If amateurs excel in short-form, you could recommend that they focus on quick, snackable videos with strong hooks, optimizing for vertical display and fast viewer retention.
Compare Segments: Investigate whether the gap between amateurs and superstars is narrower for Shorts vs. long-form videos (sketched in the snippet after this list). This might indicate an opportunity for amateurs to grow if they take advantage of new formats.
User Retention Insights: Short-form’s watch-time patterns differ. You need to analyze how watch-time or completion rates for shorter videos stack up against longer ones. If amateurs can sustain quick but consistent content, they might build audience loyalty.
Pitfalls: A short-form surge might be short-lived if overshadowed by competition or if the recommendation algorithms change. Also, watch time in short-form can be high-volume but fleeting, so you must consider whether this success translates into stable subscriber growth or if it remains ephemeral.
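One hedged way to quantify the format gap, assuming a hypothetical 'format' column distinguishing Shorts from long-form uploads:

import pandas as pd

videos = pd.DataFrame({
    'creator_segment': ['amateur', 'superstar'] * 4,
    'format': ['short'] * 4 + ['long'] * 4,
    'views':  [5_000, 8_000, 6_000, 9_000, 2_000, 90_000, 3_000, 120_000],
})

medians = videos.groupby(['format', 'creator_segment'])['views'].median().unstack()
medians['superstar_to_amateur_ratio'] = medians['superstar'] / medians['amateur']
print(medians)
# A much smaller ratio for shorts than for long-form would suggest the
# format levels the playing field for smaller channels.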
How do you measure the impact of platform features (like Shorts or live streaming) on the success distribution between amateurs and superstars?
When the platform rolls out or heavily promotes certain features, it could favor specific creator segments:
A/B Testing: If you have data from regions or user groups where new features were introduced at different times, compare the performance shifts among amateurs before and after feature rollout. Check if superstars benefited more or if smaller channels got a visibility boost (a simple difference-in-differences sketch follows this list).
Attribution Modeling: Track the percentage of total views for amateurs that come via newly launched features or modules on the homepage or subscription feeds. Examine any spike or sustained uplift.
Cross-Feature Correlation: See if amateur creators who adopt new features promptly gain momentum. This can suggest that early adopters can achieve near “superstar” levels of engagement if they align with new platform trends.
Pitfalls: Confounding factors might arise if superstars are also early adopters or if the platform invests more in promoting established channels’ new feature content. You need to carefully isolate the effect of new features from the inherent popularity of big channels.
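If rollout timing varied by region, a simple difference-in-differences on segment-level averages is one hedged way to estimate the effect; the region, period, and metric values below are hypothetical:

import pandas as pd

df = pd.DataFrame({
    'region': ['A', 'A', 'B', 'B'],  # feature launched in region A only
    'period': ['pre', 'post', 'pre', 'post'],
    'amateur_avg_views': [1_000, 1_400, 1_000, 1_050],
})

pivot = df.pivot(index='region', columns='period', values='amateur_avg_views')
effect = (pivot.loc['A', 'post'] - pivot.loc['A', 'pre']) - \
         (pivot.loc['B', 'post'] - pivot.loc['B', 'pre'])
print(effect)  # +350: the rollout's estimated lift for amateurs in region A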
Suppose the data indicates that the median amateur channel has improved, but the average remains mostly unchanged. How do you interpret that discrepancy?
A difference between the median and mean signals distributional shifts (a tiny numeric illustration follows this list):
Improvement for Most: If the median is up, it means at least half of amateur channels are doing better than before. So the typical amateur is seeing better performance.
Stagnant Average: A flat or declining average might be due to extreme high-value channels (outliers) either leaving or drastically dropping in performance. It can also mean that the top few amateurs who used to inflate the mean no longer do.
Depth of Engagement: It often implies that gains are more evenly spread among the majority, but the peak creators in the amateur segment did not push the group average upward.
Pitfalls: Focusing solely on mean-based metrics can mask widespread improvements. Conversely, focusing solely on the median can hide a collapse at the high end.
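A tiny NumPy illustration of how the median can rise while the mean stays flat:

import numpy as np

before = np.array([100, 110, 120, 130, 10_000])  # one high-flyer inflates the mean
after  = np.array([150, 160, 170, 180, 9_400])   # most channels improved; the top one slipped
print(np.median(before), np.median(after))  # 120 -> 170: the typical channel improved
print(np.mean(before), np.mean(after))      # 2092 -> 2012: the mean barely moved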
How would you use qualitative data, such as surveys or direct user feedback, to validate the concern that only superstars succeed?
Quantitative metrics can show performance trends, but qualitative feedback illuminates user and creator sentiment:
Creator Surveys: Ask amateur creators if they feel overshadowed, which factors they believe limit their growth (e.g., discoverability, content style, promotion budgets).
Audience Surveys: Investigate whether viewers tend to watch superstar channels more often because of perceived quality or familiarity, or if they find it harder to discover smaller creators they might enjoy.
Interpreting Divergence: If you see strong qualitative sentiment that amateurs are not discovering audiences, but your metrics show stable or growing performance, it may indicate a perception gap or specific discoverability friction.
Pitfalls: Survey responses are often subject to selection bias (only certain creators respond) or social desirability bias. Creators might blame external factors rather than acknowledging content shortcomings. Cross-reference with actual engagement data to avoid incorrect conclusions.
If amateurs truly show declining performance, how do you distinguish between a short-term fluctuation and a sustained downward trend?
Performance dips might be seasonal or triggered by specific, transient factors:
Time Window Analysis: Break down metrics by monthly, quarterly, or yearly intervals. A brief decline could coincide with significant events (e.g., a major holiday season where fewer new creators post).
Rolling Averages: Use rolling windows (e.g., a three-month or six-month moving average) to smooth out temporary noise; a consistent downward slope suggests a longer-term trend (see the sketch after this list).
Event Correlation: Check if the dip matches a known algorithm change or a large competitor event that siphoned away audience. If performance bounces back, it was likely short-term.
Pitfalls: Over-reliance on short windows might lead to false positives, as trends can appear or vanish quickly. Conversely, overlooking short-term dips might mask new and growing issues if the environment is rapidly evolving.
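A minimal pandas sketch of the rolling-average smoothing; the three-month window and the sample series are illustrative:

import pandas as pd

# Hypothetical monthly median views for amateur channels
monthly = pd.Series(
    [900, 950, 870, 920, 800, 780, 760, 740, 700, 690, 680, 650],
    index=pd.period_range('2023-01', periods=12, freq='M'),
)

smoothed = monthly.rolling(window=3).mean()
print(smoothed)
# A persistently negative slope in the smoothed series (rather than a
# one-off dip) points to a sustained downward trend.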
If data shows that channel age strongly correlates with higher views, how do you isolate whether it’s age or popularity driving the success gap?
Channel age can be correlated with popularity simply because older channels have had more time to accumulate subscribers:
Matched Pairs or Stratification: Compare new and old channels with similar subscriber counts or similar monthly upload cadence to see if performance differences remain. If older channels outperform even at the same subscriber tier, it might be an age-related advantage such as algorithmic preference or brand recognition (a stratified comparison is sketched after this list).
Growth Rate over Time: Plot subscriber/view growth from channel start date to see if older channels grew faster at inception or if they just had more time to accumulate audience.
Pitfalls: Age might be a proxy for numerous other factors (consistent content schedule, adaptation to new features). It’s easy to confuse correlation with causation, so carefully isolate channel age from other confounders.
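A sketch of the stratified comparison, binning channels by subscriber tier and channel age; the column names and cutoffs are assumptions:

import pandas as pd

channels = pd.DataFrame({
    'age_years':   [0.5, 6.0, 0.8, 7.0, 1.0, 8.0],
    'subscribers': [9_000, 9_500, 90_000, 95_000, 900_000, 950_000],
    'avg_views':   [1_000, 1_800, 9_000, 15_000, 80_000, 140_000],
})

channels['tier'] = pd.cut(channels['subscribers'],
                          bins=[0, 10_000, 100_000, float('inf')],
                          labels=['small', 'medium', 'large'])
channels['age_group'] = pd.cut(channels['age_years'],
                               bins=[0, 2, float('inf')],
                               labels=['new', 'old'])

print(channels.groupby(['tier', 'age_group'], observed=True)['avg_views'].mean())
# If old channels out-earn new ones even within the same subscriber tier,
# channel age (not just accumulated audience) is contributing to the gap.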
Could amateurs outperform superstars in specific niches, and how would you account for category differences?
Certain specialized content (e.g., niche DIY or highly localized topics) might have less competition among established creators:
Categorical Segmentation: Tag videos by genre (gaming, vlogs, educational, etc.) and compare how amateurs fare within each. Niche categories might show a more level playing field than mainstream categories.
Demand-Supply Analysis: If an amateur covers an emerging or underserved niche, they might attract a loyal following more quickly than if they compete against massive channels in a saturated mainstream category.
Algorithmic Discovery: Niche content might receive targeted recommendations to a small but dedicated audience. This can inflate engagement metrics (like watch time) relative to the channel’s size.
Pitfalls: Category data might be incomplete or inconsistently labeled. Also, niches can shift quickly if larger creators pivot into them.
How can you determine whether improving production quality or equipment truly creates a barrier for amateurs?
One suspicion is that the platform environment now demands higher production values:
Content Quality Proxy: Use proxies for “production value,” such as video resolution, average editing cuts per minute, presence of subtitles, or multi-camera angles. Compare performance among amateurs with differing levels of production complexity.
Audience Sentiment: Look at comments or survey data to see if viewers perceive less polished videos as subpar, or if they still value authentic, raw content.
Sudden Upgrades: Track amateurs who recently upgraded equipment (e.g., started producing 4K videos, introduced better audio). See if their channel experienced a noticeable inflection in engagement.
Pitfalls: “Quality” is subjective; simple content can also go viral. Overemphasis on equipment might ignore creativity, storytelling, or personal branding, which can overshadow pure production value.
How do external factors (like new competing platforms or changing audience demographics) complicate your analysis?
External shifts can reshape the video ecosystem entirely:
Competing Platforms: If many small creators move to TikTok or other short-form platforms, YouTube’s amateur segment might shrink or face stiffer cross-platform competition.
Demographic Changes: Younger audiences may spend more time on new apps, leaving YouTube for specific types of content. This can affect amateurs who rely on that demographic.
Global Events: Major global happenings (e.g., economic recessions, political unrest) can shift content consumption patterns. A new wave of creators could appear, or existing ones might pivot content style.
Pitfalls: Overlooking these external influences can lead to attributing changes entirely to YouTube’s algorithm or feature rollout. You must analyze external data, such as market research on user behavior or platform usage trends, to confirm whether the issue lies within YouTube or in the broader ecosystem.