
Netflix & AI, Data Science, and Machine Learning

by 모오오어 2020. 7. 7.

 

 

#1

How Netflix Uses AI, Data Science, and Machine Learning 

 

Netflix’s machine learning algorithms are driven by business needs.

The presence of AI in today’s society is becoming more and more ubiquitous— particularly as large companies like Netflix, Amazon, Facebook, Spotify, and many more continually deploy AI-related solutions that directly interact (often behind the scenes) with consumers every day.

When properly applied to business problems, these AI-related solutions can provide unique solutions that scale and improve over time, creating a significant impact for both business and user. But what does it mean to “properly apply” an AI solution? Does that mean there is a wrong way? From a product perspective, the short answer is yes, and we’ll get to why that is later in this article as we dig deeper.

Overview: First, we will outline 5 use cases of data science or machine learning at Netflix. We’ll then discuss some business needs vs technical considerations a Product Manager would look at. Then we will dive a little deeper into what is perhaps the most interesting of these 5 use cases as we identify what business problem it seeks to solve.

 

5 Use Cases of AI/Data/Machine Learning at Netflix

  1. Personalization of Movie Recommendations: Users who watch A are likely to watch B. This is perhaps the best-known feature of Netflix. Netflix uses the watch history of other users with similar tastes to recommend what you may be most interested in watching next, so that you stay engaged and keep your monthly subscription.
  2. Auto-Generation and Personalization of Thumbnails / Artwork: Using thousands of video frames from an existing movie or show as a starting point for thumbnail generation, Netflix annotates these images and then ranks each one to identify which thumbnails are most likely to get your click. These calculations are based on what others who are similar to you have clicked on. One finding could be that users who like certain actors or movie genres are more likely to click thumbnails with certain actors or image attributes.
  3. Location Scouting for Movie Production (Pre-Production): Using data to help decide where and when best to shoot a movie, given constraints of scheduling (actor/crew availability), budget (venue, flight/hotel costs), and production scene requirements (day vs night shoot, the likelihood of weather event risks in a location). Notice this is more of a data science optimization problem than a machine learning model that makes predictions based on past data.
  4. Movie Editing (Post-Production): Using historical data on when quality control checks have failed in the past (for example, when subtitles were out of sync with sound or movement) to predict when a manual check is most beneficial in what could otherwise be a very time-intensive and laborious process.
  5. Streaming Quality: Using past viewing data to predict bandwidth usage, helping Netflix decide when to cache content on regional servers for faster load times during expected peak demand.
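
To make the "users who watch A are likely to watch B" idea from the first use case concrete, here is a minimal sketch that counts how often titles are watched by the same user. The watch histories and function names are invented for illustration, and simple co-occurrence counting is an assumption of this sketch, not a description of Netflix's actual recommender.

```python
from collections import Counter
from itertools import combinations

# Hypothetical watch histories: user id -> set of titles watched.
watch_history = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"A", "D"},
    "u4": {"B", "C"},
}

def co_watch_counts(histories):
    """Count how many users watched each ordered pair of titles."""
    counts = Counter()
    for titles in histories.values():
        for x, y in combinations(sorted(titles), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts

def recommend(title, histories, top_n=2):
    """Return the titles most often co-watched with `title`."""
    counts = co_watch_counts(histories)
    related = {b: n for (a, b), n in counts.items() if a == title}
    # Sort by count (descending), then by title for a stable order.
    return sorted(related, key=lambda t: (-related[t], t))[:top_n]

print(recommend("A", watch_history))
```

In this toy data, B is watched together with A by two users, so it ranks first for anyone who has just watched A.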

These 5 use cases/applications of data science or machine learning just in Netflix alone have had such a scalable impact that they have forever changed the technology landscape and user experience for millions and more to come. Adoption of these AI-related solutions is only going to get stronger over time.

But before these use cases were as commonplace as they are today and used by users like you and me, someone or some group within Netflix properly connected these AI solutions with a business need. Without this business link, these use cases would simply be pie-in-the-sky ideas sitting at the bottom of a backlog like so many other great ideas. Only through proper positioning and connection with Netflix’s core business problem did these ideas become the reality that they are today.

Netflix uses machine learning to generate many variations of high-probability click-thru image thumbnails that it relentlessly and continuously A/B tests throughout its user base — for each user and each movie — all to increase the probability that you will click and watch.

 

 

 

 

 

What is Business Need/Problem?

Notice in each of the use cases I’ve identified above, each one is associated with a specific business need, goal, or hypothesis.

This is important for any product manager: avoid the temptation of the tech enthusiast who marvels at the details of the data science or ML for intellectual reasons without clearly identifying the problem or business need, potentially using up valuable technical resources with no business impact.

At the end of the day, product managers need to properly connect a business problem to a data or machine learning solution. We want to avoid a solution in search of a problem; otherwise, the project will lose momentum within the company: engineers won’t be clear on what their North Star is, stakeholders across the organization won’t buy in and allocate the necessary resources to make the project a success, and so on.

Make sure there is a problem to which an AI solution can be directly connected

Machine learning (ML) is a potential AI solution — but we need to first define the problem before prescribing that solution.

What’s the business result we are trying to achieve with ML? Because this core business need is what drives the parameters of the ML models used, what data is collected and processed, etc. We don’t do ML to provide personalization just because it’s interesting tech — we need to link it to a business problem. Data scientists are specialists in uncovering insights from the data, but it is the product manager’s role to properly link it to a business need or problem and compare it with competing priorities.

For example, a tech enthusiast might say:

Wouldn’t it be cool if you could analyze / debate an episode using voice with Netflix — and Netflix, with data input from thousands of other users’ reactions to that episode, could respond intelligently to your comments in a back and forth 2-way dialogue?

Yes, that would be a pretty awesome use case leveraging natural language processing (NLP) to understand your post-episode commentary in context. In addition to NLP, this use case uses text to voice personalities as well as sentiment analysis of how thousands of others felt about what happened in that episode, or how they feel about a certain character. Indeed, this is a beautiful merging of multiple cutting edge technologies in one use case.

If a pilot MVP version of this showed that users who engaged with this new feature stayed longer, came back more often, or helped drive more word of mouth about Netflix, then it could warrant further resources. The initial decision to build that MVP would be a strategic call by stakeholders that depends on company strategy, not necessarily one prioritized by a metric.

But as beautiful as that user scenario is, what problem does it solve?

How does it relate to Netflix’s main problem of keeping users subscribed every month? If it’s related, what evidence (qualitative or quantitative) do we have to support that relationship?

And if this is a legitimate solution to that problem, is there a simpler version of this solution that could solve the problem equally well but be less technically complex? For example, instead of voice input and voice output, how might the lower complexity of just text input and text output affect the level of effort and the impact on user engagement?

What if a conversational AI interface without the voice part (just text) achieved 80% of the intended user engagement but only required 40% of the development effort? Would it be worth considering such an alternative route?

What business impact would such a solution have in comparison to the level of effort? How does this ratio compare with that of other competing tasks in the backlog?
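
The 80%-of-the-engagement-for-40%-of-the-effort comparison above is just a ratio, and it can help to write it down. Here is a minimal sketch, assuming impact and effort are expressed relative to the full voice version (1.0 each); the backlog entries and the `impact_per_effort` helper are hypothetical.

```python
def impact_per_effort(impact, effort):
    """Return expected impact per unit of development effort."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return impact / effort

# Hypothetical backlog: name -> (relative impact, relative effort).
backlog = {
    "voice_chat": (1.0, 1.0),  # full voice-in / voice-out version
    "text_chat": (0.8, 0.4),   # text-only version from the example above
}

# Rank backlog items by their impact-to-effort ratio, best first.
ranked = sorted(
    backlog,
    key=lambda name: impact_per_effort(*backlog[name]),
    reverse=True,
)
for name in ranked:
    print(name, round(impact_per_effort(*backlog[name]), 2))
```

Under these assumed numbers, the text-only version delivers twice the impact per unit of effort, which is the kind of comparison a PM would line up against the other items in the backlog.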

These are all product-focused questions that a PM should be asking to align technology solutions with business needs. Because ultimately, it’s the business need that drives the parameters of an ML model, not the other way around.

So let’s look once again at movie recommendations and those personalized thumbnails — what’s the problem or business goal?

Because You Watched…You’ll Love it… — What Problem Does Movie Recommendation Help Solve?

Movie Recommendations: Identifying the Problem

Here the problem is that Netflix has a huge collection of content (over 100 million different products, according to Netflix) that is constantly changing and can be overwhelming for a user to consume. Users don’t want to be frustrated in finding content relevant to their interests. So then, what is the best way to allow each user to consume that data in a way that ultimately maximizes subscription loyalty?

Product Goals include:

  • Increase/maintain viewership in terms of # minutes consumed,
  • Increase in # of titles explored, frequency of logging back in
  • Exceeding whichever minimum threshold that the company determines is a success metric
  • The overall increase in monthly subscription loyalty/decrease in subscriber cancellations

Netflix Personalized Thumbnails At Work: 2 Different Users Seeing 2 Different Images for the same Godfather movie: 1 showing a dramatic closeup of a face, the other showing a happy smiling couple.

Personalized Image Thumbnail / Artwork: Identifying the Problem

This use case is a subset of Movie Recommendations. Given that movie recommendations are provided to the user, we now have yet another business/user problem.

Problem: How (and when) do we best present that movie recommendation to the user in a way that maximizes viewership and monthly subscriber loyalty?

Well, one way to provide that recommendation is through an image thumbnail — but what kind of thumbnail do we provide? And how confident are we that tweaking an image thumbnail will affect viewership or subscriber loyalty in a positive way?

And how important is that thumbnail? Do we have data for that?

Gathering Data to Support That Hypothesis

Well, you can be assured that some product-focused individual at Netflix — at a time before 2014 — was asking these same questions internally. And that individual or group worked together (probably with UX and related stakeholders) to put together user studies or data elsewhere, to prove that there was indeed a strong link between an image thumbnail and viewership.

That was their hypothesis: that adjusting the artistic content of an image thumbnail could have a strong link to viewership.

Well, turns out, back in 2014, Netflix conducted studies showing just how important that thumbnail is:

Nick Nelson, Netflix’s global manager of creative services, explained that the company conducted research in early 2014 that found artwork was “not only the biggest influencer” for a user’s decision about what to watch, it also constituted over 82 percent of their focus while browsing Netflix.

“We also saw that users spent an average of 1.8 seconds considering each title they were presented with while on Netflix,” Nelson wrote. “We were surprised by how much impact an image had on a member finding great content, and how little time we had to capture their interest.”

A small, compelling thumbnail could mean the difference between getting you to spend the entire weekend watching Netflix’s latest Originals hit or losing interest and bouncing over to a competing service like Hulu or similar OTT streaming services like ESPN / Disney / HBO Go.

So based on studies, the hypothesis above was shown to be very true.

OK, Thumbnails Are Important. But What Exactly Do We Tweak?

And how does an unstructured data set like a bunch of image thumbnails get fed into a digital/mathematical machine learning model? We’ll answer this second question further below.

First, given how important the thumbnail was to a user’s decision to watch something, how can Netflix generate better thumbnails for each user to increase the chance that a user will watch a video?

Using the movie’s original art as the only thumbnail used for every single person most likely won’t yield the highest click rates. The business is likely leaving clicks (and viewer stream time) on the table!

What if Netflix custom created a different thumbnail for each user that is optimized to increase click rates?

What elements of an image thumbnail are within Netflix’s control and can be tweaked to increase those click rates?

Same Riverdale Movie, but two different artistic image thumbnails, based on user’s past preference for romance (sweet smiles) or thriller (serious, dramatic looks) movie genres.

Which actor(s)/character(s) should be on that thumbnail if any? How many? Which auto-generated frame or poster variation would be most enticing for a particular user to click on? What lighting works best? Filters?

What data do we have on other users’ past clicking behavior that we can draw associations from to help inform this thumbnail decision at scale?

Product goals include:

  • Increase click-thru-rates (CTR) of movie recommendations, signifying engagement
  • Test the hypothesis that higher engagement rates will lead to higher subscriber satisfaction and loyalty

So this is a really interesting problem with the image thumbnail that can have a huge impact on the likelihood that someone will click on a video and watch.

If the goal is to maximize that probability of watching by tweaking the thumbnail — what are some product decisions to consider?

Product Considerations In Personalized Image Thumbnails

We won’t dive into each of the use cases above, but let’s dive a little further into the second one: Artwork / Thumbnail Personalization

This is a data-driven personalization feature that sits on top of the Movie recommendation engine

Product Considerations

Algorithms are great, but they do have limitations. A product manager should always think ahead of possible edge-case scenarios in which the algorithm may fail to produce the best results.

  1. Each movie should ideally have a personalized thumbnail that maximizes clicks. Since Netflix has data on the clicking behavior of other people with similar interests, it is a reasonable hypothesis that if people with similar interests and watch history had a high click-thru rate on a certain thumbnail, then that thumbnail will likely perform well for a new person who hasn’t yet been recommended this movie/thumbnail.
  2. The personalized thumbnail should take into consideration the other movies that are being recommended at the same time, and what their thumbnails are. Let’s say Netflix is recommending 2 different Spiderman movies to a user side by side, and both show Spiderman facing the camera with his mask off. One is Tobey Maguire and the other is Andrew Garfield. Wouldn’t it be weird for the user to see portraits of both Maguire and Garfield as Spiderman with their masks off, side by side? That is something to account for if it ever were to occur.
    One image thumbnail could work well in isolation, but that may not be good enough when a page of a dozen thumbnails shows up. If they are all optimized to look the same way, then as a group, each one may seem less compelling. So looking at each thumbnail together with what else is being presented will be important.
  3. Data is great, but watch out for algorithms that do their job too well, resulting in unintended consequences / false positives!
    In statistics, this is called a Type I error (a false positive): here, suggesting an image thumbnail that shouldn’t be suggested.

Case in point: Just look at the example below of Like Father, a movie starring Kristen Bell. Yet, Netflix’s algorithm (arguably) made false thumbnail recommendations of supporting black actors/actresses who don’t represent what the movie was about but did experience a higher click rate among certain ethnic audiences.

Black users are seeing the thumbnail on the right, despite it not being representative of what the movie is about.

So be aware that an overly optimized/personalized experience could create a monotonous user experience that in some cases can be misleading to the user. We want to provide a healthy mix of the familiar with the unexpected but also accurately portray content to the user so they aren’t improperly misled.

Here’s another example:

Based on a high likelihood of click-thru-rates (CTRs), Netflix ended up presenting thumbnails that matched a user’s ethnicity, even when that (usually supporting) actor/actress had very little screen time in the movie.

A black user’s recommendation shows thumbnails reflecting her ethnicity — even when that thumbnail is not necessarily representative of the movie in general.

While this is a data-supported initiative, the user can plainly sense a disingenuousness here: the thumbnail misleads by not accurately representing the movie (a Type I, false positive error).

 

Of course, this algorithm will likely be fine-tuned over time, but the lesson here is don’t overdo it when capitalizing on data — apply some common sense to balance it out.

We don’t want to mislead users or make them feel they are being treated differently because of their race, for example.

  4. Lastly, the algorithm should take into consideration what thumbnail images the user previously saw in association with this movie and aim to provide a consistent, non-confusing user experience.

We want to avoid the user seeing a different thumbnail each time that movie appears. Not only would this confuse the user, but it would also make it difficult for a Product Manager to attribute a click: which image resulted in a higher click-thru-rate (CTR) when it keeps changing? PMs need to be able to properly attribute each new result to a specific change, so maintaining consistent data attribution is important.
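
One common way to keep a thumbnail stable for a given user and movie is to derive the choice from a hash of the pair instead of picking at random on every page load. The sketch below assumes that approach; the `assign_thumbnail` helper, the ids, and the variant names are all hypothetical, and nothing here is confirmed as how Netflix does it.

```python
import hashlib

def assign_thumbnail(user_id, title_id, variants):
    """Deterministically pick one variant per (user, title) pair."""
    key = f"{user_id}:{title_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as a stable bucket number.
    index = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[index]

# Hypothetical thumbnail variants for one title.
variants = ["closeup", "couple", "landscape"]
chosen = assign_thumbnail("user-42", "title-7", variants)
print(chosen)
```

Because the same inputs always hash to the same bucket, the user sees the same image on every visit, and each click can be attributed to exactly one variant.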

So those are some things a product manager would consider when designing edge case scenarios and what extreme cases of data usage can result in. Speaking of data, what specifically does Netflix work off of?

What Data Do We Have?

There are 2 parts to this:

  1. What data does Netflix use to create these personalized thumbnails/artwork?
  2. What data does Netflix use to target these custom-created thumbnails to the appropriate individual?

For the first question, consider that

  • A 1-hour episode of Stranger Things has >86,000 static video frames
  • These video frames can each individually be assigned certain attributes that are later used to filter down to the best thumbnail candidates through a set of tools and algorithms called Aesthetic Visual Analysis (AVA). This is designed to find the best custom thumbnail image out of every static frame of the video
  • Netflix Annotation — Netflix creates metadata for each frame including brightness (.67), # of faces (3), skin tones (.2), probability of nudity (.03), level of motion blur (4), symmetry (.4)
  • Netflix Image Ranking — Netflix uses the metadata from above to pick out specific images that are the highest quality (good lighting, no motion blur, probably contains some face shot of major characters from a decent angle, don’t contain unauthorized branded content, etc) and most clickable
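
The annotate-then-rank steps above can be sketched as a filter followed by a sort. The field names mirror the metadata listed above, but the thresholds and the scoring formula are assumptions made up for this example, not Netflix's AVA rules.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    frame_id: int
    brightness: float   # 0..1, higher is brighter
    num_faces: int
    nudity_prob: float  # 0..1
    motion_blur: float  # higher is blurrier

def is_eligible(frame):
    """Drop frames that are blurry, risky, or show no faces (assumed cutoffs)."""
    return (
        frame.nudity_prob < 0.1
        and frame.motion_blur <= 2.0
        and frame.num_faces >= 1
    )

def quality_score(frame):
    """Toy score: bright frames with 1 to 3 faces and little blur win."""
    face_bonus = 1.0 if 1 <= frame.num_faces <= 3 else 0.5
    return frame.brightness * face_bonus - 0.1 * frame.motion_blur

def rank_frames(frames, top_n=3):
    """Return the top_n eligible frames, best first."""
    eligible = [f for f in frames if is_eligible(f)]
    return sorted(eligible, key=quality_score, reverse=True)[:top_n]

frames = [
    Frame(1, 0.67, 3, 0.03, 4.0),  # values from the annotation example above
    Frame(2, 0.80, 1, 0.01, 0.5),
    Frame(3, 0.60, 5, 0.02, 1.0),
    Frame(4, 0.90, 2, 0.40, 0.0),
    Frame(5, 0.70, 2, 0.00, 0.0),
]
print([f.frame_id for f in rank_frames(frames, top_n=2)])
```

Note that the frame with the annotation values quoted above is filtered out here purely because of its motion blur; a different cutoff would change that.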

For the second question of what data Netflix uses to identify who to target these custom-generated thumbnails towards, consider that Netflix tracks:

  • # of movies watched, # of minutes of each show watched
  • % of completion for every video/series
  • # of upvotes, which movies were favorited, etc
  • % of overall watch content that is attributable to any specific show (and therefore level of affinity that user has to a specific show or related cast members)
  • any seasonal or weekly trends related to a user’s level of engagement, etc.
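
As a small illustration of the "% of overall watch content attributable to a specific show" signal, here is a sketch that turns minutes watched into shares. The minutes are invented, and treating the share of watch time as the affinity measure is an assumption of this example.

```python
def affinity_shares(minutes_by_show):
    """Return each show's share of the user's total watch time."""
    total = sum(minutes_by_show.values())
    if total == 0:
        return {show: 0.0 for show in minutes_by_show}
    return {show: minutes / total for show, minutes in minutes_by_show.items()}

# Hypothetical minutes watched by one user.
minutes = {"Stranger Things": 300, "Riverdale": 100, "Like Father": 100}
shares = affinity_shares(minutes)
print(max(shares, key=shares.get))
```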

Interesting to note: in mid-2018, Netflix stopped accepting user reviews as a data point, which it had previously solicited only on its website. Why? Because this “feature” actually reduced viewership, as negative reviews discouraged users from trying out a video. This is yet another example of how a business need supersedes a popular user need!

So Netflix has a TON of data on each of its customers — from videos watched to images clicked. What do they do with all that data?

How Netflix Uses Data to Construct A Universe of User Profile Interests

Well, they use it to put together a 360 profile of each user and mathematically index every user according to hundreds, possibly thousands of different attributes.

They do this to try to group people with similar interests together so they can use data from one user to help predict likely behavior of other similar users.

How does this grouping of similar user profiles work and how does a product manager make sense of the data?

Having gone through the complex math and algorithms associated with matrices, vectors, and n-dimensional feature analysis, I found the easiest way to understand how this works is through a 3D-spatial representation of 10+ dimensions.

Here’s a screenshot I took when using Google’s TensorBoard on the MNIST database of handwritten digits. It’s a plot called a t-SNE plot, effectively a 3D representation of a lot more dimensions than just 3. In this case, each point is a digit image with many more dimensions than 3 (one per pixel), colored by which of the 10 digits (0 through 9) it represents, on a 3D sphere-like coordinate system.

A t-SNE plot of the 10 digit classes in a 3D view using Google’s TensorBoard. Looks complex at first, but is quite simple.

Each hand-written digit’s position in this spatial representation can be described by a vector — a coordinate-like series of numbers across however many feature dimensions.

Likewise, with Netflix users, each user profile’s position in the above chart could be described by numerical values each representing an individual dimension of that user’s interest — including movie genre, favorite actors/actresses, movie topic, etc.

Reimagining Netflix Users in Mathematical Relation To Each Other

Let’s pretend in the digits diagram above that:

  • “6” = romantic comedy
  • “4” = thriller

If a user is labeled a “6” by Netflix, then he/she will be placed in the general vicinity of where all the other turquoise 6’s are in the above spatial representation (near the bottom).

Likewise, if a user is labeled a “4” by Netflix, then he/she will be placed in the general vicinity of where all the other magenta 4’s are in the above spatial representation (near the top).

Let’s pretend each number represents a movie genre. A user who likes Romantic Comedies (6) could mathematically be closer to someone who likes Parody (5) than someone who likes a Thriller (4).

Notice how the turquoise “6” region (romantic comedy) somewhat overlaps with the grey “5” region. This could be analogous to how users who like romantic comedies could also like parody or satire movies because they both involve laughing.

Likewise, since the magenta “4” region (thriller) is somewhat close to the pink “9” region — this pink 9 region could represent those who like action movies — mathematically closer to the thriller “4” region than the romantic comedy “6” region.

Does that make sense? So when spatially represented, the distance between two user profiles represents how similar/different their tastes are. Of course, this can get infinitely more complex when someone who likes romantic comedies also likes thrillers — but the purpose of this analogy is to show the general idea of mathematical/spatial relationships between different categories.

Interest groups that are related to each other would appear closer together and could be good predictors of what a user will like, given that the user likes something else nearby.
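
To put numbers on "closer together", here is a minimal sketch that compares taste vectors with cosine similarity. The four genre dimensions and the values are invented, and cosine similarity is just one common choice of measure, assumed here for illustration.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical taste vectors: (romantic comedy, parody, thriller, action).
romcom_fan = [0.9, 0.6, 0.1, 0.1]
parody_fan = [0.5, 0.9, 0.1, 0.2]
thriller_fan = [0.1, 0.1, 0.9, 0.7]

print(cosine_similarity(romcom_fan, parody_fan))
print(cosine_similarity(romcom_fan, thriller_fan))
```

With these made-up vectors the romantic comedy fan scores much closer to the parody fan than to the thriller fan, matching the overlap of the “6” and “5” regions described above.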

This is how Netflix, or any company leveraging ML models, turns seemingly unstructured data into numbers and creates relationships within that data. These numbers by themselves don’t make much sense, but together in relation to each other, they begin to make sense.

For the same Good Will Hunting movie below, one user identified as a comedy fan would be shown a Robin Williams (comedian) thumbnail, whereas another user identified as a romantic comedy fan would be shown a kissing thumbnail featuring Matt Damon and Minnie Driver. While not perfect, Netflix’s algorithms suggest that such level of personalization based on user profile characteristics increases the probability of click-thru rates.

So let’s summarize. A bunch of Netflix image thumbnails is a bunch of unstructured data.

But once Netflix annotates each thumbnail and assigns metadata to each one to describe what’s in that thumbnail — now we have a numeric representation of that unstructured data.

Plot that numeric representation as vectors in a 3D sphere-like space, as we did above, and Netflix can start forming relationships between data points.

Netflix then finds data points that are relatively near each other and uses them to help predict future click-thru behavior. If predictions turn out bad or good, they adjust the mathematical positioning of these characteristics accordingly until the model becomes better and better over time.

So that’s how Netflix turns unstructured data into mathematical representations. It uses the relational distance between data points as a basis for making and improving upon image thumbnail recommendations.
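
The "find nearby data points and use them to predict clicks" step can be sketched as a nearest-neighbor average. This is a deliberately simple stand-in (k-nearest neighbors over two made-up taste dimensions), not the model Netflix uses, and every data point below is invented.

```python
import math

def predict_click_probability(new_user, observed, k=3):
    """Average the click outcomes of the k users nearest to new_user.

    `observed` must be a non-empty list of (vector, clicked) pairs.
    """
    nearest = sorted(
        observed,
        key=lambda record: math.dist(new_user, record[0]),
    )[:k]
    return sum(clicked for _, clicked in nearest) / len(nearest)

# Hypothetical records: (taste vector, 1 if the user clicked the thumbnail).
observed = [
    ([0.9, 0.1], 1),
    ([0.8, 0.2], 1),
    ([0.7, 0.3], 0),
    ([0.1, 0.9], 0),
    ([0.2, 0.8], 0),
]
new_user = [0.85, 0.15]
print(predict_click_probability(new_user, observed, k=3))
```

The three closest users to the new one clicked twice out of three times, so the estimate is two thirds. As more outcomes are observed, the estimate for each neighborhood gets updated, which is the "better and better over time" loop in miniature.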

 

 

What Did Netflix Learn From All This Data?

Now that we know how Netflix turns images into numbers in a machine learning model, what are some insights Netflix has found from all the data processing and A/B tests they have conducted for so many years?

Well, besides learning the millions of individual thumbnails that converted users to loyal subscribers over time, here are a few additional things Netflix has learned for what works in terms of thumbnails:

  • Show close-ups of emotionally expressive faces
  • Show villains instead of heroes
  • Don’t show more than three characters

In Conclusion: Netflix Deployed AI (mostly) in the Right Way. Let’s Learn From Their Approach.

Netflix has done a phenomenal job of applying AI, data science, and machine learning the “right way” — using a product-based approach that focuses on business need first, then AI solution next, rather than the other way around.

When applied properly, AI can do wonders.

We’ve seen how effective AI solutions can be in personalizing the experience for the benefit of both Netflix in terms of subscriptions and users in terms of overall satisfaction.

We’ve also seen limitations of algorithms that “overdo it” and discussed specific examples in which the Netflix algorithm presented misleading thumbnails to people of color because the algorithm optimized for clicks, effectively “tricking” the users into clicking bait. This happened even when that thumbnail did not accurately represent that video.

No algorithm will be perfect in accounting for all the nuances of human experience. Algorithms designed to exploit metrics will do just that — so it is the role of the product manager to work with design or other team members to find ways to address these deficiencies in algorithms.

Going forward, the integration of AI in society as well as in the corporate enterprise space will continue to become more and more prevalent.

Technologists may tend to prescribe existing AI solutions, but the most effective way to adopt AI is the way Netflix did — from a business-driven perspective first.

Dig deep and you will see that Netflix generated supporting data before making the strategic move forward.

As the world of AI, data science, and machine learning continues to grow, we product managers can all take a lesson or two out of the Netflix playbook when it comes to properly deploying AI solutions.

 

 

 

 

 

#2

 

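The secret of Netflix's success. The biggest reason is surely that users can watch a vast amount of content, unlimited, at a low price, but user-friendly policies and structure also played a part. Netflix introduced a system that identifies the content a user prefers and then recommends similar content on that basis. Users can find and watch content that suits their tastes through the recommendation system alone, without searching for each title. We heard the secret behind Netflix's well-received recommendation system from Todd Yellin, Netflix's Vice President of Product Innovation.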


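The Two Pillars of Netflix's Recommendation System: Manual Grunt Work and Machine Learning

"Netflix is designed so that anyone, young or old, male or female, can use it. The core of the Netflix user experience is the recommendation system. If you compare Netflix to a car, the recommendation system is the engine. It is the core technology that holds up the service."

"With the arrival of the internet age, all kinds of content are pouring out: news, movies, dramas, music, and more. But with this much content, users actually feel confused. There is too much information. What on earth should they watch? Netflix's recommendation system is a technology that does this worrying on the consumer's behalf. Just because Netflix has thousands of titles does not mean a user reviews all of them. Users usually consider only 30 to 40 titles. The recommendation system finds similar content based on the 30 to 40 titles the user has watched. With Netflix, there is no need to search. Watching only what Netflix recommends is enough to be fully satisfied."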

Todd Yellin, Netflix Vice President of Product Innovation

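Netflix's recommendation system made a big impact, especially among light users (those who use the service infrequently) who binge movies to kill time when bored. Just by opening Netflix, they can see at a glance the backlog of movies that match their tastes and watch them all. A new term, 'binge-watching', was even coined to describe this craze.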

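So what is the secret of Netflix's recommendation system? Some say it finds movies that match a user's taste through sophisticated computer algorithms. That is not wrong, since machine learning is applied in Netflix's recommendation system. But the 'real secret' that Vice President Yellin revealed was something quite different: a labor-intensive, grunt-work system that mobilizes a large number of people.

"When a new title arrives at Netflix, the internal content team watches each movie, drama, or animation one by one. They then enter every tag they think is relevant to the title into an Excel spreadsheet. They enter a great many tags, in as much detail as possible."

"When users first sign up for Netflix, they pick three titles that match their taste. Based on the tags attached to those three titles, a computer algorithm finds content that suits the user's taste. Content with a higher degree of tag match is shown first. After that, the more Netflix content the user watches, the more accurate the results become. Using machine learning, Netflix's cloud computing system compares the many tags one by one and finds content that matches the user's taste. Tags are not entered only in English. Tags are localized, too. We tag in various languages according to each country's culture and language, because tastes vary enormously from country to country."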

As a result, Netflix's home screen is completely different for each user. If there are 75 million Netflix subscribers, there are 75 million Netflix home screens.
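
The tag-matching step Yellin describes (pick three titles, then surface content whose tags match best) can be sketched with a simple set-overlap score. Jaccard overlap is an assumption chosen for this example, since the interview does not say how the match is computed, and the titles and tags are hypothetical.

```python
def jaccard(a, b):
    """Return the overlap of two tag sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_by_tag_overlap(liked_tags, catalog, top_n=2):
    """Rank catalog titles by how well their tags match liked_tags."""
    scores = {title: jaccard(liked_tags, tags) for title, tags in catalog.items()}
    # Highest score first; ties broken by title for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

# Hypothetical union of tags from the three titles picked at sign-up.
liked_tags = {"romance", "comedy", "family"}

# Hypothetical catalog: title -> tags entered by the content team.
catalog = {
    "Title X": {"romance", "comedy", "new york"},
    "Title Y": {"thriller", "crime"},
    "Title Z": {"comedy", "family"},
}
print(rank_by_tag_overlap(liked_tags, catalog))
```

Because each user's liked tags differ, the same catalog produces a different ranking per user, which is the mechanism behind every subscriber getting their own home screen.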

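In addition, Netflix's recommendation system uses not only individual user data but also data from user groups (clusters) collected by region. It analyzes which genres are preferred in which regions and reflects that in local users' recommendations. For example, Japanese users, who have a strong preference for animation, are also recommended new animation, and Korean users, who have a strong preference for romantic comedies, are also recommended new romantic comedies.

"Netflix consists of two parts: the external interface and the internal algorithm. Many competing services copy Netflix's interface. But they cannot copy the algorithm. The recommendation algorithm is Netflix's core competitive edge."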

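Even a Single Poster Is Tailored to User Taste

"The user is the owner of Netflix. Employees must not make arbitrary judgment calls. That is why, after producing content, we always thoroughly reflect users' reactions (feedback)."

"The most representative example of feedback is the 'poster'. When we launched the drama 'Fuller House' a while ago, we had six poster images shown to users at random. We then collected data on which image they preferred. Inside Netflix, we expected users to prefer posters showing the lead actors, but the result was the opposite. The poster featuring the Golden Gate Bridge, the San Francisco landmark where the drama is set, was the one users chose. As soon as this data came in, we made the Golden Gate Bridge poster the one shown to all users."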


Todd Yellin, Netflix Vice President of Product Innovation, explaining Netflix's system for showing posters that match user tastes

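User tastes (in posters) can also differ by region. So Netflix collects data separately by region and displays the poster preferred in each region. It does not stop at six posters. For popular titles, it sometimes makes more. For a much-talked-about title such as 'Jessica Jones', it made more posters to widen the range of choices for users.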
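
The poster experiment described above boils down to showing variants at random, counting impressions and clicks, and promoting the winner. Here is a minimal sketch of that tally. All counts are made up, the variant names are hypothetical, and a real test would also check that the gap is statistically significant before switching everyone over.

```python
def click_through_rate(clicks, impressions):
    """Return clicks per impression, or 0.0 when nothing was shown."""
    return clicks / impressions if impressions else 0.0

def best_variant(stats):
    """Return the variant name with the highest click-through rate."""
    return max(stats, key=lambda name: click_through_rate(*stats[name]))

# Hypothetical results: variant -> (clicks, impressions).
stats = {
    "lead_cast": (120, 4000),
    "golden_gate_bridge": (180, 4000),
    "title_logo": (90, 4000),
}
winner = best_variant(stats)
print(winner, click_through_rate(*stats[winner]))
```

Keeping a separate `stats` table per region would give the per-region poster choice mentioned above.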

 

 

#3

 

 

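“Machine Learning Builds a Netflix Recommendation System with 80% Satisfaction”

There is one service that never fails to come up when people talk about content recommendation and personalized services. A place where more than 100 people come together to build new algorithmic services from the data they collect, a place that prides itself on having the world's most sophisticated recommendation algorithm, a place that proclaims it develops personalized services using the largest body of user viewing data. That place is Netflix, the global internet-based TV service company.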

 

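Carlos Gomez-Uribe, Netflix Vice President of Product Innovation

“Our dream is to create an environment where people can very easily watch videos they are likely to enjoy. We want to create an environment where you come into Netflix, instantly think ‘Oh, this is something I like’, and happily start watching.”

This is the goal of Netflix's recommendation service, as described by Carlos Gomez-Uribe, the Netflix Vice President of Product Innovation in charge of Netflix's personalization algorithms. He said his dream is for 80% of Netflix viewers to be satisfied with the recommendation system and able to watch content they like, and that the company is working hard to build a good recommendation system.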

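“There are a great many variables, but popularity and personalization cannot be left out when developing a recommendation algorithm. The basics of a recommendation service are to understand things like ‘which content is popular’, ‘which content many people watch’, ‘on which device and at what time content is being watched’, and ‘in what kind of mood content is being consumed’. I believe that if you balance these appropriately, you can build a good algorithm and offer a good recommendation service.”

Many variables go into Netflix's recommendation algorithm. Diversity is as important a variable as popularity and personalization. Rather than staying within the areas of content a user mainly watches, it keeps discovering and showing content that the user could watch and would enjoy. It also considers when, and with what timing, to present recommendations. Recommend too quickly and subscribers become confused; recommend too slowly and they may think there is nothing to watch. Beyond that, it also pays attention to the user's language and to each country's culture.

“We analyze every possible case based on user viewing information and play data. We do not simply analyze a single Netflix account; we also look at the information for each profile within a Netflix account. We also look at how content is consumed across devices such as PCs, mobile phones, and tablets.”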

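Netflix is a global service with 81 million subscribers across 190 countries. Having newly entered some 130 countries this past January, it came armed with personalized recommendations in order to provide a smooth service anywhere in the world. After a year of effort and research, Netflix developed a personalized recommendation system that subscribers worldwide can use. It helps subscribers find the content they want quickly and easily.

The recommendation service works without problems even in countries where Netflix has only just become available. It groups users globally rather than by country, and recommends content regardless of the country each subscriber lives in. It may look lacking for now, but as time passes and content accumulates, the Netflix service shows its strength.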

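For reference, all of this is run by algorithms and computing systems. There are no people in Netflix's recommendation service; the system takes their place. Of course, Netflix did not build its personalized recommendation service on algorithms from the very beginning. There was an internal view that providing recommendations from a content index, such as manually entered tags, was better than using algorithms. But this dispute ended with an experiment conducted six years ago.

“This was around when I joined Netflix six years ago. There was a discussion about how suitable it was to recommend through algorithms using statistics and machine learning. At the time, many people believed it would be better to use tags when actually making recommendations. So we ran an experiment ourselves.”

Netflix divided users into different groups and ran the experiment. The first group received a recommendation service built on tags. For the second group, the recommendation service was built on information obtained by consulting film experts, who watched a given movie and advised on which movies it resembled and what else people who like it would enjoy. For the third group, the recommendation service ignored the tag information and ran on statistics and machine learning.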

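The result: the group of users who experienced the statistics- and machine-learning-based recommendation service showed overwhelmingly higher satisfaction. It also had the fewest users canceling their Netflix subscription. This is why Netflix puts so much effort into its recommendation algorithms, including publishing papers on their development.

“We ran an A/B test as a controlled experiment. Over two to three months, we ran the experiment in the live service environment on several hundred thousand users. During this period we looked at how viewing time changed, how many of the recommended movies were watched, how many users canceled during the period, and so on. In the end, the last user group showed the best response.”

After that, Netflix did not hesitate. Based on the experience gained from operating the real service, it focused on improving the algorithm to figure out how to build a better recommendation service. On the strength of a good service, it steadily grew its membership.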
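
To show how a three-group comparison like this might be read, here is a sketch that compares cancellation rates across arms and checks one pair with a two-proportion z-test. The counts are invented purely for illustration (the interview gives no figures), and the z-test is an assumed choice of method, not something the article attributes to Netflix.

```python
import math

# Made-up counts for illustration only: arm -> (cancellations, users).
arms = {
    "tags": (5000, 100_000),
    "experts": (4900, 100_000),
    "ml": (4500, 100_000),
}

def churn_rate(cancelled, users):
    """Return the share of users who cancelled."""
    return cancelled / users

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between two proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

# The arm with the lowest cancellation rate, and how far it is from "tags".
best_arm = min(arms, key=lambda arm: churn_rate(*arms[arm]))
z = two_proportion_z(*arms["tags"], *arms["ml"])
print(best_arm, round(z, 2))
```

A z value well above roughly 1.96 suggests the gap between the two arms is unlikely to be noise at the usual 5% level, which is the kind of evidence that would settle an internal debate like the one described here.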

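“Netflix keeps growing. Once we have improved the algorithmic recommendation service, we plan to strengthen our content based on the information gained from it. Then we will be able to offer users a better recommendation service based on more content.”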

 
