How To Make Recommendation in Case Study (With Examples)

After analyzing your case study’s problem and suggesting possible courses of action, you’re now ready to conclude it on a high note.

But first, you need to write your recommendation to address the problem. In this article, we will guide you on how to make a recommendation in a case study. 

Table of Contents

  • What Is Recommendation in Case Study?
  • What Is the Purpose of Recommendation in the Case Study?
  • 1. Review Your Case Study’s Problem
  • 2. Assess Your Case Study’s Alternative Courses of Action
  • 3. Pick Your Case Study’s Best Alternative Course of Action
  • 4. Explain in Detail Why You Recommend Your Preferred Course of Action
  • Examples of Recommendations in Case Study
  • Tips and Warnings

What Is Recommendation in Case Study?

The Recommendation details your most preferred solution for your case study’s problem.

After identifying and analyzing the problem, your next step is to suggest potential solutions. You did this in the Alternative Courses of Action (ACA) section. Once you’re done writing your ACAs, you need to pick which among these ACAs is the best. The chosen course of action will be the one you’re writing in the recommendation section. 

The Recommendation portion also provides a thorough justification for selecting your most preferred solution. 

Notice how a recommendation in a case study differs from a recommendation in a research paper. In the latter, the recommendation tells your reader about potential studies that can be performed in the future to support your findings or to explore factors that you were unable to cover.

What Is the Purpose of Recommendation in the Case Study?

Your main goal in writing a case study is not only to understand the case at hand but also to think of a feasible solution. However, there are multiple ways to approach an issue. Since it’s impossible to implement all these solutions at once, you only need to pick the best one. 

The Recommendation portion tells the readers which among the potential solutions is best to implement given the constraints of an organization or business. This section allows you to introduce, defend, and explain this optimal solution. 

How To Write Recommendation in Case Study

1. Review Your Case Study’s Problem

You cannot recommend a solution if you are unable to grasp your case study’s issue. Make sure that you’re aware of the problem as well as the viewpoint from which you want to analyze it.

2. Assess Your Case Study’s Alternative Courses of Action

Once you’ve fully grasped your case study’s problem, it’s time to suggest some feasible solutions to address it. A separate section of your manuscript called the Alternative Courses of Action (ACA) is dedicated to discussing these potential solutions. 

Afterward, you need to evaluate each ACA by identifying its respective advantages and disadvantages. 

3. Pick Your Case Study’s Best Alternative Course of Action

After evaluating each proposed ACA, pick the one you’ll recommend to address the problem. All alternatives have their pros and cons so you must use your discretion in picking the best among these ACAs.

To help you decide which ACA to pick, here are some factors to consider:

  • Realistic: The organization must have sufficient knowledge, expertise, resources, and manpower to execute the recommended solution.
  • Economical: The recommended solution must be cost-effective.
  • Legal: The recommended solution must adhere to applicable laws.
  • Ethical: The recommended solution must not have moral repercussions. 
  • Timely: The recommended solution can be executed within the expected timeframe. 

You may also use a decision matrix to help you pick the best ACA [1]. This matrix allows you to rank the ACAs based on your criteria. Please refer to the examples in the next section for a Recommendation formed using a decision matrix.

4. Explain in Detail Why You Recommend Your Preferred Course of Action

Provide your justifications for why you recommend your preferred solution. You can also explain why the other alternatives were not chosen [2].

Examples of Recommendations in Case Study

To help you understand how to make recommendations in a case study, let’s take a look at some examples below.

Example 1

Case Study Problem: Lemongate Hotel is facing an overwhelming increase in the number of reservations due to a sudden implementation of a Local Government policy that boosts the city’s tourism. Although Lemongate Hotel has a sufficient area to accommodate the influx of tourists, the management is wary of the potential decline in the hotel’s quality of service while striving to meet the sudden increase in reservations.

Alternative Courses of Action:

  • ACA 1: Relax hiring qualifications to employ more hotel employees to ensure that sufficient human resources can provide quality hotel service
  • ACA 2: Increase hotel reservation fees and other costs as a response to the influx of tourists demanding hotel accommodation
  • ACA 3: Reduce privileges and hotel services enjoyed by each customer so that hotel employees will not be overwhelmed by the increase in accommodations.

Recommendation: 

Upon analysis of the problem, it is recommended to implement ACA 1. Among all suggested ACAs, this option is the easiest to execute and requires the least cost. It will also not affect potential profits or customers’ satisfaction with hotel service.

Meanwhile, implementing ACA 2 might discourage customers from making reservations because of the higher fees and push them to look for other hotels as substitutes. ACA 3 is also not recommended because reducing the hotel services and privileges offered to customers might harm the hotel’s public reputation in the long run.

The first paragraph of our sample recommendation specifies what ACA is best to implement and why.

Meanwhile, the succeeding paragraphs explain that ACA 2 and ACA 3 are not optimal solutions due to some of their limitations and potential negative impacts on the organization. 

Example 2 (with Decision Matrix)

Case Study: Last week, Pristine Footwear released its newest sneakers model for women – “Flightless.” However, the management noticed that “Flightless” had a mediocre sales performance in the previous week. For this reason, “Flightless” might be pulled out in the next few months.  The management must decide on the fate of “Flightless” with Pristine Footwear’s financial performance in mind. 

  • ACA 1: Revamp “Flightless” marketing by hiring celebrities/social media influencers to promote the product
  • ACA 2: Improve the “Flightless” current model by tweaking some features to fit current style trends
  • ACA 3: Sell “Flightless” at a lower price to encourage more customers
  • ACA 4: Stop production of “Flightless” after a couple of weeks to cut losses

Decision Matrix

Recommendation

Based on the decision matrix above [3], the best course of action for Pristine Footwear is ACA 3: selling “Flightless” shoes at a lower price to encourage more customers. This solution can be implemented immediately without requiring excessive financial resources. Since lower prices entice customers to purchase more, “Flightless” sales might improve after a price reduction.

In this example, the recommendation was formed with the help of a decision matrix. Each ACA was given a score from 1 to 4 for each criterion. Note that the criteria used depend on the priorities of an organization, so there’s no standardized way to make this matrix.

Meanwhile, the recommendation we’ve made here consists of only one paragraph. Although the matrix already revealed that ACA 3 tops the selection, we still provided a clear explanation of why it is the best. 
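The scoring idea behind a decision matrix can be sketched in a few lines of Python. This is only an illustration: the criteria are borrowed from the list of factors earlier in this article, and every score below is invented for the sketch (1 = weakest, 4 = strongest). They are not the figures from the matrix used in Example 2, and the function name and optional weights are assumptions.

```python
# Criteria taken from the factors listed earlier in the article.
CRITERIA = ["realistic", "economical", "legal", "ethical", "timely"]

# Hypothetical scores (1 = weakest, 4 = strongest) for the four "Flightless" ACAs.
# These numbers are made up for illustration only.
scores = {
    "ACA 1": {"realistic": 2, "economical": 1, "legal": 4, "ethical": 4, "timely": 2},
    "ACA 2": {"realistic": 2, "economical": 2, "legal": 4, "ethical": 4, "timely": 1},
    "ACA 3": {"realistic": 4, "economical": 3, "legal": 4, "ethical": 4, "timely": 4},
    "ACA 4": {"realistic": 3, "economical": 2, "legal": 4, "ethical": 3, "timely": 3},
}


def rank_alternatives(scores, criteria, weights=None):
    """Return (alternative, total) pairs sorted from highest to lowest total.

    `weights` is optional; when omitted, every criterion counts equally.
    """
    weights = weights or {criterion: 1 for criterion in criteria}
    totals = {
        aca: sum(row[criterion] * weights[criterion] for criterion in criteria)
        for aca, row in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_alternatives(scores, CRITERIA)
for aca, total in ranking:
    print(aca, total)
```

If an organization cares more about one criterion than the others, a weights dictionary such as `{"realistic": 1, "economical": 2, "legal": 1, "ethical": 1, "timely": 1}` can be passed in. Even then, the written recommendation still needs to explain why the top-ranked ACA is the best choice.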

Tips and Warnings

  • Recommend with persuasion [4]. You may use data and statistics to back up your claim. Another option is to show that your preferred solution fits your theoretical knowledge about the case. For instance, if your recommendation involves reducing prices to entice customers to buy higher quantities of your products, you may invoke the “law of demand” [5] as a theoretical foundation of your recommendation.
  • Be prepared to make an implementation plan. Some case study formats require an implementation plan integrated with your recommendation. Basically, the implementation plan provides a thorough guide on how to execute your chosen solution (e.g., a step-by-step plan with a schedule).
References

  1. Manalili, K. (2021–2022). Selection of Best Applicant (Unpublished master’s thesis). Bulacan Agricultural State College. Retrieved September 23, 2022, from https://www.studocu.com/ph/document/bulacan-agricultural-state-college/business-administration/case-study-human-rights/19062233
  2. How to Analyze a Case Study. (n.d.). Retrieved September 23, 2022, from https://wps.prenhall.com/bp_laudon_essbus_7/48/12303/3149605.cw/content/index.html
  3. Nguyen, C. (2022, April 13). How to Use a Decision Matrix to Assist Business Decision Making. Retrieved September 23, 2022, from https://venngage.com/blog/decision-matrix/
  4. Case Study Analysis: Examples + How-to Guide & Writing Tips. (n.d.). Retrieved September 23, 2022, from https://custom-writing.org/blog/great-case-study-analysis
  5. Hayes, A. (2022, January 08). Law of demand. Retrieved September 23, 2022, from https://www.investopedia.com/terms/l/lawofdemand.asp

Written by Jewel Kyle Fabula

Jewel Kyle Fabula is a Bachelor of Science in Economics student at the University of the Philippines Diliman. His passion for learning mathematics developed as he competed in some mathematics competitions during his Junior High School years. He loves cats, playing video games, and listening to music.

How Do Content Recommendation Platforms Capture Users’ Attention?

Last updated: July 21, 2023

As we look ahead, the future of marketing is set to be revolutionized by personalization. With rapid technological advancements paving the way for more human-centric experiences in the online world and a surge in digital behaviours post-pandemic, organizations must embrace the growing demand for personalized interactions. 

Within the vast landscape of customization, content recommendation platforms emerge as a prominent branch, holding the key to revolutionizing marketing strategies. In their simplest form, they show consumers generic suggestions based on content popularity.

However, with the advent of AI-powered content recommendation engines, businesses can now tap into product catalogues and customer data to offer highly personalized choices.

In this article, we embark on a journey into the fundamental aspects of content recommendations. We’ll delve into the intricacies of suggestion placement strategies, the advantages of adopting content recommendation approaches, and the diverse platforms that empower this innovative marketing technique.

Moreover, we’ll provide real-life examples of content suggestions to demonstrate how they can catalyze your website’s or business’s growth and success. Let’s unlock the potential of content recommendations and uncover their transformative impact on user engagement and revenue generation.

What is a content recommendation engine?

A content recommendation engine is a system that suggests relevant and personalized content to users based on their interests, preferences, and browsing behaviour.

It leverages sophisticated algorithms, often powered by artificial intelligence, to analyze user data and understand their individual needs. By matching user profiles with appropriate content, content recommendation aims to enhance user engagement, improve user experience, and increase the likelihood of repeat visits and conversions on websites or digital platforms. 

This technology is widely employed by various online platforms and websites to deliver tailored content suggestions seamlessly integrated into the user’s browsing experience.

One common manifestation of content recommendations is through related content widgets, often labelled as “recommended content” or “you might also like.” These widgets prompt users to explore additional material from the same website or other sites offering comparable content. The Wunder Ground serves as a notable example of effectively implementing content recommendations to engage users with relevant and captivating content.

How Does Content Recommendation Work?

Content recommendations operate through diverse approaches, depending on the platform and technology employed. 

1. Keyword and Tag-based Recommendations:

Certain platforms use keywords and tags to suggest relevant material to users. By analyzing user input, search queries, and content tags, the recommendation engine identifies content that aligns with the user’s interests and preferences.

2. User Interaction Tracking:

Some content recommendation tools track user interactions with website content, learning from their behaviour to make personalized suggestions during their subsequent visits. By understanding the user’s browsing history and content consumption patterns, the engine can tailor content recommendations to suit individual preferences.

3. Integration via Website Plugins:

Content suggestions can be seamlessly integrated into websites through the use of plugins. These plugins enable the presentation of related content widgets or personalized recommendation lists to users, fostering a more engaging browsing experience.

For instance, Amazon employs a content recommendation engine to suggest products based on users’ prior interactions, including viewed products, purchase history, and browsing activity. This AI-driven system enhances the user’s shopping experience by offering personalized product recommendations tailored to their specific interests and preferences.
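To make the first two approaches above more concrete, here is a minimal Python sketch that combines them: it builds an interest profile from the tags of articles a visitor has already viewed (interaction tracking) and scores unseen articles by how strongly their tags overlap with that profile (tag-based matching). The article IDs, tags, and function names are hypothetical, and real engines, including the ones named in this article, are considerably more sophisticated.

```python
from collections import Counter

# Hypothetical content catalogue: article ID -> set of tags.
ARTICLES = {
    "a1": {"seo", "content"},
    "a2": {"analytics", "seo"},
    "a3": {"email", "automation"},
    "a4": {"content", "personalization"},
}


def build_profile(viewed_ids, articles):
    """Count how often each tag appears in the articles a visitor has viewed."""
    profile = Counter()
    for article_id in viewed_ids:
        profile.update(articles[article_id])
    return profile


def recommend(viewed_ids, articles, top_n=2):
    """Suggest unseen articles whose tags overlap most with the visitor's profile."""
    profile = build_profile(viewed_ids, articles)
    candidates = []
    for article_id, tags in articles.items():
        if article_id in viewed_ids:
            continue  # do not recommend what was already read
        score = sum(profile[tag] for tag in tags)
        if score > 0:
            candidates.append((article_id, score))
    # Highest score first; ties are broken alphabetically by ID for stable output.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [article_id for article_id, _ in candidates[:top_n]]


print(recommend(["a1"], ARTICLES))
```

A visitor who has only read "a1" is offered "a2" and "a4" because each shares one tag with it, while "a3" shares none and is left out.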

What are the types of Content Recommendations?

Content recommendations can be categorized into two main types: on-page and off-page suggestions, each offering distinct benefits to website owners and users.

a) On-page Recommendations:

On-page content suggestions are strategically placed within your website, encouraging users to explore more of your material and prolong their stay. By presenting relevant content to visitors based on their current interests, on-page recommendations aim to increase the time users spend on your page and website, ultimately reducing bounce rates.

As users engage more with your content, they are more likely to share it on social media, amplifying the visibility of your business and content. This type of recommendation is particularly advantageous for websites with substantial content portfolios, ensuring that a wider audience discovers and engages with the diverse material. However, for younger websites with limited content, the full potential of on-page content recommendations might not be fully realized.

CNN provides a real-life example of on-page content recommendations, showcasing content suggestions that align with their existing on-page material, enhancing user engagement and navigation.

b) Off-page Recommendations:

Off-page content recommendations guide users to external websites, directing them to relevant content hosted on other platforms. This type of content-based recommendation is particularly beneficial for websites that rely on referral or affiliate revenue streams. 

By offering users valuable content from external sources, you can enhance their browsing experience and foster loyalty. For websites that are part of a broader publishing network, off-page recommendations can be effectively used to drive traffic to other sites within the network, creating a mutually beneficial ecosystem. 

For instance, real estate content highlighted by “magicbricks” in “timesofindia” exemplifies how off-page recommendations can promote and strengthen content across different platforms.

What are the tips to improve the impact of Content Recommendations?

1. Introduction:

While the introduction of your blog article is crucial in grabbing readers’ attention, the conclusion plays an equally vital role. It serves as the key to sustaining their interest and enticing them to return for more engaging content. In this section, we’ll explore the power of content recommendations in your blog post’s conclusion, offering seven effective ways to leave a lasting impression on your audience.

2. Summarize Your Main Point:

Incorporate content recommendations that expand on the core message of your blog article. Suggest additional resources or in-depth guides that elaborate on the topic, ensuring your readers gain a comprehensive understanding.

3. Activate the Reader’s Will to Act:

Drive engagement by persuading readers to take action. Recommend practical steps or actionable tips related to the blog’s content, motivating them to implement the insights shared.

4. Ask Readers to Share Your Post:

Encourage social sharing by explicitly asking readers to share your blog post with their network. Utilize social media buttons or catchy CTAs to make sharing effortless.

5. Connect to Another Valuable Website:

Enhance your blog’s credibility by recommending external sources, such as newspaper articles or industry books. This not only adds value but also showcases your awareness of current trends in your profession.

6. Pose a Query to Invite Feedback:

Stimulate interaction and increase user engagement by posing thought-provoking questions. Encourage readers to leave comments, sharing their thoughts and experiences on the topic.

7. Inform Readers about Upcoming Events:

Tease your audience with exciting upcoming content. Mention webinars, podcasts, or exclusive releases, enticing them to subscribe to your blog for timely updates.

8. Promote Your Business or Product:

Capitalize on related content recommendations to promote your business or product. For example, if your article discussed overcoming procrastination, link to an e-course on time management.

Where to Place Recommendations

Content Recommendation Using On-Site Widgets

A very easy tactic that is also quite successful for expanding your audience is the content recommendation widget!

With this placement, you suggest material using on-site widgets so that the content appears on the right-hand side of the page as the reader scrolls down.

Here is an example of timesofindia blog news

Key Benefits:

  • Websites that use content suggestion widgets can increase revenue.
  • Widgets that offer personalized recommendations boost conversions by 15% to 20%.
  • Increased content engagement may be achieved with the use of recommendation widgets.

Creating a Product Recommendation Popup

Popups can make or break a website’s success, depending on how they are utilized. When done right, they can transform site visitors into paying clients and foster customer loyalty over time. Carefully choosing the recommended products can significantly impact the success of popups, ultimately leading to increased conversions and repeat customers.

To ensure popups don’t disrupt the user experience, consider the timing. Schedule popup suggestions to appear at strategic moments, such as when the visitor has engaged with enough content or is about to exit the page.

Example from GetResponse: Observe how GetResponse effectively times their popups, enhancing user satisfaction and engagement.
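The timing rule described above (show the popup once the visitor has engaged with enough content, or when they are about to leave) can be expressed as a small decision function. The sketch below is in Python for readability. The thresholds, parameter names, and the idea of showing the popup only once per visit are all assumptions made for illustration, not values taken from GetResponse or any other tool.

```python
def should_show_popup(scroll_depth, seconds_on_page, exit_intent,
                      already_shown=False, min_scroll=0.6, min_seconds=30):
    """Decide whether to show a recommendation popup right now.

    scroll_depth    -- fraction of the page scrolled so far (0.0 to 1.0)
    seconds_on_page -- time the visitor has spent on the page
    exit_intent     -- True if the visitor appears to be leaving
    min_scroll and min_seconds are hypothetical thresholds to tune per site.
    """
    if already_shown:
        return False  # avoid nagging the visitor with repeated popups
    engaged = scroll_depth >= min_scroll and seconds_on_page >= min_seconds
    return engaged or exit_intent


print(should_show_popup(scroll_depth=0.8, seconds_on_page=45, exit_intent=False))
```

On a real site this check would run in the browser, but the logic stays the same: a visitor who has just arrived and is still reading near the top of the page is left alone.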

Content Recommendation Strategies

Smart content suggestion tactics assist you in providing the finest material to each visitor and providing a better experience. This will also help you discover more relevant material and increase engagement with it.

In this blog article, we will look at the top six content suggestion tactics for 2023.

1. Most Popular Content:

The number of page views or unique site visitors that a piece of content receives is frequently used to identify the most viewed content. The length of time that a piece of content spends in different categories can also indicate its popularity.

This is one of the best content suggestion tactics for engaging first-time visitors to your site. Examine how “The Week” promotes its most popular material on its website:

2. Associated Information:

Proposing similar material is a top-notch content suggestion tactic for captivating clients and new visitors. Marketers leverage content recommendation engines equipped with complex algorithms to curate personalized suggestions based on users’ past interactions.

YouTube serves as a prime example, dynamically populating related videos on the right-hand side in real time, tailoring recommendations to match the current video and enticing viewers with engaging content.

3. Last Viewed Content:

Another effective content suggestion strategy is to propose new material based on content your audience has previously read or watched.

AI-based algorithms present the most recently viewed material to the visitor first, and content suggestions are often based on viewing statistics from the previous week, month, or any other time period.

This method is used by popular websites such as Netflix and Amazon to engage their consumers.

4. Highly Individualized Recommendations:

Make sure your material is highly personalized if you want it to be viewed by as many individuals as possible.

In addition, contemporary customers expect highly customized experiences across all digital media. According to a 2021 survey, 60% of participants stated they would return to a company that provides a personalized experience.

Content recommendation engines use various data sets in addition to demographic information to segment audiences and personalize content recommendations:

  • Internet searches
  • Geographical area
  • Purchase history
  • Social interactions
  • Items currently in the shopping basket

5. People Also Viewed:

Using “People Also Viewed” as a content suggestion method frequently yields positive results.

Here, you give a user content recommendations based on what other users have watched. Here is one instance:
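The “People Also Viewed” idea can be sketched as a co-view count: for every pair of items that appear in the same browsing session, increase a counter, then recommend the items most often seen alongside the one currently on screen. The Python below is a simplified illustration with made-up session data and hypothetical function names; it is not the method of any particular platform.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical browsing sessions: each list holds the items one visitor viewed.
SESSIONS = [
    ["a", "b", "c"],
    ["a", "b"],
    ["b", "d"],
    ["a", "c"],
]


def build_co_views(sessions):
    """For each item, count how often every other item was viewed in the same session."""
    co_views = defaultdict(Counter)
    for session in sessions:
        # set() ensures a repeated view within one session is only counted once.
        for current, other in permutations(set(session), 2):
            co_views[current][other] += 1
    return co_views


def people_also_viewed(item, co_views, top_n=2):
    """Return the items most often viewed together with `item`."""
    ranked = sorted(co_views[item].items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]


co_views = build_co_views(SESSIONS)
print(people_also_viewed("b", co_views))
```

Unlike tag-based matching, this approach needs no knowledge of what the content is about; it relies only on what other visitors did.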

6. Group Content Recommendation:

Grouping material is an excellent content suggestion approach for a few reasons.

For starters, it enables you to display a greater volume of material to your audience at the same time. This is especially beneficial if you want to recommend a lot of stuff but don’t want to overload your viewers.

Bundling content also keeps your users interested. You increase their chances of finding something that interests them by offering numerous pieces of content (or items) at once.

This is how Accuweather promotes a collection of content articles covering a wide range of topics.

Conclusions

Content recommendations may boost personalisation on your website, increasing brand recognition and generating quality leads for your company. This, in turn, can enhance sales and income for your brand.

Siddarth Iyer

How to Write a Case Study: Bookmarkable Guide & Template

Braden Becker

Published: November 30, 2023

Earning the trust of prospective customers can be a struggle. Before you can even begin to expect to earn their business, you need to demonstrate your ability to deliver on what your product or service promises.

Sure, you could say that you're great at X or that you're way ahead of the competition when it comes to Y. But at the end of the day, what you really need to win new business is cold, hard proof.

One of the best ways to prove your worth is through a compelling case study. In fact, HubSpot’s 2020 State of Marketing report found that case studies are the fifth most commonly used type of content among marketers.

Below, I'll walk you through what a case study is, how to prepare for writing one, what you need to include in it, and how it can be an effective tactic. This post covers:

  • Case Study Definition
  • Case Study Templates
  • How to Write a Case Study
  • How to Format a Case Study
  • Business Case Study Examples

Case Study Definition

A case study is a specific challenge a business has faced, and the solution they've chosen to solve it. Case studies can vary greatly in length and focus on several details related to the initial challenge and applied solution, and can be presented in various forms like a video, white paper, blog post, etc.

In professional settings, it's common for a case study to tell the story of a successful business partnership between a vendor and a client. Perhaps the success you're highlighting is in the number of leads your client generated, customers closed, or revenue gained. Any one of these key performance indicators (KPIs) is an example of your company's services in action.

When done correctly, these examples of your work can chronicle the positive impact your business has on existing or previous customers and help you attract new clients.

Why write a case study? 

I know, you’re thinking, “Okay, but why do I need to write one of these?” The truth is that while case studies are a huge undertaking, they are powerful marketing tools that allow you to demonstrate the value of your product to potential customers using real-world examples. Here are a few reasons why you should write case studies.

1. Explain Complex Topics or Concepts

Case studies give you the space to break down complex concepts, ideas, and strategies and show how they can be applied in a practical way. You can use real-world examples, like an existing client, and use their story to create a compelling narrative that shows how your product solved their issue and how those strategies can be repeated to help other customers get similar successful results.  

2. Show Expertise

Case studies are a great way to demonstrate your knowledge and expertise on a given topic or industry. This is where you get the opportunity to show off your problem-solving skills and how you’ve generated successful outcomes for clients you’ve worked with. 

3. Build Trust and Credibility

In addition to showing off the attributes above, case studies are an excellent way to build credibility. They’re often filled with data and thoroughly researched, which shows readers you’ve done your homework. They can have confidence in the solutions you’ve presented because they’ve read through as you’ve explained the problem and outlined step-by-step what it took to solve it. All of these elements working together enable you to build trust with potential customers.

4. Create Social Proof

Using existing clients that have seen success working with your brand builds social proof. People are more likely to choose your brand if they know that others have found success working with you. Case studies do just that by putting your success on display for potential customers to see.

All of these attributes work together to help you gain more clients. Plus you can even use quotes from customers featured in these studies and repurpose them in other marketing content. Now that you know more about the benefits of producing a case study, let’s check out how long these documents should be. 

How long should a case study be?

The length of a case study will vary depending on the complexity of the project or topic discussed. However, as a general guideline, case studies typically range from 500 to 1,500 words. 

Whatever length you choose, it should provide a clear understanding of the challenge, the solution you implemented, and the results achieved. This may be easier said than done, but it's important to strike a balance between providing enough detail to make the case study informative and concise enough to keep the reader's interest.

The primary goal here is to effectively communicate the key points and takeaways of the case study. It’s worth noting that this shouldn’t be a wall of text. Use headings, subheadings, bullet points, charts, and other graphics to break up the content and make it more scannable for readers. We’ve also seen brands incorporate video elements into case studies listed on their site for a more engaging experience. 

Ultimately, the length of your case study should be determined by the amount of information necessary to convey the story and its impact without becoming too long. Next, let’s look at some templates to take the guesswork out of creating one. 

To help you arm your prospects with information they can trust, we've put together a step-by-step guide on how to create effective case studies for your business with free case study templates for creating your own.

And to give you more options, we’ll highlight some useful templates that serve different needs. But remember, there are endless possibilities when it comes to demonstrating the work your business has done.

1. General Case Study Template

Do you have a specific product or service that you’re trying to sell, but not enough reviews or success stories? This Product Specific case study template will help.

This template relies less on metrics, and more on highlighting the customer’s experience and satisfaction. As you follow the template instructions, you’ll be prompted to speak more about the benefits of the specific product, rather than your team’s process for working with the customer.

4. Bold Social Media Business Case Study Template

You can find templates that represent different niches, industries, or strategies that your business has found success in — like a bold social media business case study template.

In this template, you can tell the story of how your social media marketing strategy has helped you or your client through collaboration or sale of your service. Customize it to reflect the different marketing channels used in your business and show off how well your business has been able to boost traffic, engagement, follows, and more.

5. Lead Generation Business Case Study Template

case study templates: lead generation business

It’s important to note that not every case study has to be the product of a sale or customer story. Sometimes it can be an informative lesson that your own business has experienced. A great example of this is the Lead Generation Business case study template.

If you’re looking to share operational successes regarding how your team has improved processes or content, you should include the stories of different team members involved, how the solution was found, and how it has made a difference in the work your business does.

Now that we’ve discussed different templates and ideas for how to use them, let’s break down how to create your own case study with one.

  • Get started with case study templates.
  • Determine the case study's objective.
  • Establish a case study medium.
  • Find the right case study candidate.
  • Contact your candidate for permission to write about them.
  • Ensure you have all the resources you need to proceed once you get a response.
  • Download a case study email template.
  • Define the process you want to follow with the client.
  • Ensure you're asking the right questions.
  • Lay out your case study format.
  • Publish and promote your case study.

1. Get started with case study templates.

Telling your customer's story is a delicate process — you need to highlight their success while naturally incorporating your business into their story.

If you're just getting started with case studies, we recommend you download HubSpot's Case Study Templates we mentioned before to kickstart the process.

2. Determine the case study's objective.

All business case studies are designed to demonstrate the value of your services, but they can focus on several different client objectives.

Your first step when writing a case study is to determine the objective or goal of the subject you're featuring. In other words, what will the client have succeeded in doing by the end of the piece?

The client objective you focus on will depend on what you want to prove to your future customers as a result of publishing this case study.

Your case study can focus on one of the following client objectives:

  • Complying with government regulation
  • Lowering business costs
  • Becoming profitable
  • Generating more leads
  • Closing on more customers
  • Generating more revenue
  • Expanding into a new market
  • Becoming more sustainable or energy-efficient

3. Establish a case study medium.

Next, you'll determine the medium in which you'll create the case study. In other words, how will you tell this story?

Case studies don't have to be simple, written one-pagers. Using different media in your case study can allow you to promote your final piece on different channels. For example, while a written case study might just live on your website and get featured in a Facebook post, you can post an infographic case study on Pinterest and a video case study on your YouTube channel.

Here are some different case study mediums to consider:

Written Case Study

Consider writing this case study in the form of an ebook and converting it to a downloadable PDF. Then, gate the PDF behind a landing page and form for readers to fill out before downloading the piece, allowing this case study to generate leads for your business.

Video Case Study

Plan on meeting with the client and shooting an interview. Seeing the subject, in person, talk about the service you provided them can go a long way in the eyes of your potential customers.

Infographic Case Study

Use the long, vertical format of an infographic to tell your success story from top to bottom. As you progress down the infographic, emphasize major KPIs using bigger text and charts that show the successes your client has had since working with you.

Podcast Case Study

Podcasts are a platform for you to have a candid conversation with your client. This type of case study can sound more real and human to your audience — they'll know the partnership between you and your client was a genuine success.

4. Find the right case study candidate.

Writing about your previous projects requires more than picking a client and telling a story. You need permission, quotes, and a plan. To start, here are a few things to look for in potential candidates.

Product Knowledge

It helps to select a customer who's well-versed in the logistics of your product or service. That way, he or she can better speak to the value of what you offer in a way that makes sense for future customers.

Remarkable Results

Clients that have seen the best results are going to make the strongest case studies. If their own businesses have seen an exemplary ROI from your product or service, they're more likely to convey the enthusiasm that you want prospects to feel, too.

One part of this step is to choose clients who have experienced unexpected success from your product or service. When you've provided non-traditional customers — in industries that you don't usually work with, for example — with positive results, it can help to remove doubts from prospects.

Recognizable Names

While small companies can have powerful stories, bigger or more notable brands tend to lend credibility to your own. In fact, 89% of consumers say they'll buy from a brand they already recognize over a competitor, especially if they already follow them on social media.

Customers that came to you after working with a competitor help highlight your competitive advantage and might even sway decisions in your favor.

5. Contact your candidate for permission to write about them.

To get the case study candidate involved, you have to set the stage for clear and open communication. That means outlining expectations and a timeline right away — not having those is one of the biggest culprits in delayed case study creation.

Most important at this point, however, is getting your subject's approval. When first reaching out to your case study candidate, provide them with the case study's objective and format, both of which you will have come up with in steps 2 and 3 above.

To get this initial permission from your subject, put yourself in their shoes — what would they want out of this case study? Although you're writing this for your own company's benefit, your subject is far more interested in the benefit it has for them.

Benefits to Offer Your Case Study Candidate

Here are four potential benefits you can promise your case study candidate to gain their approval.

Brand Exposure

Explain to your subject to whom this case study will be exposed, and how this exposure can help increase their brand awareness both in and beyond their own industry. In the B2B sector, brand awareness can be hard to collect outside one's own market, making case studies particularly useful to a client looking to expand their name's reach.

Employee Exposure

Allow your subject to provide quotes with credits back to specific employees. When this is an option for them, their brand isn't the only thing expanding its reach — their employees can get their name out there, too. This presents your subject with networking and career development opportunities they might not have otherwise.

Product Discount

This is a more tangible incentive you can offer your case study candidate, especially if they're a current customer of yours. If they agree to be your subject, offer them a product discount — or a free trial of another product — as a thank-you for their help creating your case study.

Backlinks and Website Traffic

Here's a benefit that is sure to resonate with your subject's marketing team: If you publish your case study on your website, and your study links back to your subject's website — known as a "backlink" — this small gesture can give them website traffic from visitors who click through to your subject's website.

Additionally, a backlink from you increases your subject's page authority in the eyes of Google. This helps them rank more highly in search engine results and collect traffic from readers who are already looking for information about their industry.

6. Ensure you have all the resources you need to proceed once you get a response.

Now that you know what you’re going to offer your candidate, it’s time to prepare the resources you’ll need if and when they agree to participate, like a case study release form and success story letter.

Let's break those two down.

Case Study Release Form

This document can vary, depending on factors like the size of your business, the nature of your work, and what you intend to do with the case studies once they are completed. That said, you should typically aim to include the following in the Case Study Release Form:

  • A clear explanation of why you are creating this case study and how it will be used.
  • A statement defining the information and potentially trademarked information you expect to include about the company — things like names, logos, job titles, and pictures.
  • An explanation of what you expect from the participant, beyond the completion of the case study. For example, is this customer willing to act as a reference or share feedback, and do you have permission to pass contact information along for these purposes?
  • A note about compensation.

Success Story Letter

As noted in the sample email, this document serves as an outline for the entire case study process. Other than a brief explanation of how the customer will benefit from case study participation, you'll want to be sure to define the following steps in the Success Story Letter.

7. Download a case study email template.

While you were gathering your resources, your candidate had time to read over the proposal. Once your candidate approves of your case study, it's time to send them a release form.

A case study release form tells you what you'll need from your chosen subject, like permission to use any brand names and share the project information publicly. Kick off this process with an email that runs through exactly what they can expect from you, as well as what you need from them. To give you an idea of what that might look like, check out this sample email:

sample case study email release form template

8. Define the process you want to follow with the client.

Before you can begin the case study, you have to have a clear outline of the case study process with your client. An example of an effective outline would include the following information.

The Acceptance

First, you'll need to receive internal approval from the company's marketing team. Once approved, the Release Form should be signed and returned to you. It's also a good time to determine a timeline that meets the needs and capabilities of both teams.

The Questionnaire

To ensure that you have a productive interview — which is one of the best ways to collect information for the case study — you'll want to ask the participant to complete a questionnaire before this conversation. That will provide your team with the necessary foundation to organize the interview, and get the most out of it.

The Interview

Once the questionnaire is completed, someone on your team should reach out to the participant to schedule a 30- to 60-minute interview, which should include a series of custom questions related to the customer's experience with your product or service.

The Draft Review

After the case study is composed, you'll want to send a draft to the customer, giving them an opportunity to provide feedback and edits.

The Final Approval

Once any necessary edits are completed, send a revised copy of the case study to the customer for final approval.

Once the case study goes live, on your website or elsewhere, it's best to contact the customer with a link to the page where the case study lives. Don't be afraid to ask your participants to share these links with their own networks, as it demonstrates your ability to deliver positive results and impressive growth.

9. Ensure you're asking the right questions.

Before you execute the questionnaire and actual interview, make sure you're setting yourself up for success. A strong case study results from being prepared to ask the right questions. What do those look like? Here are a few examples to get you started:

  • What are your goals?
  • What challenges were you experiencing before purchasing our product or service?
  • What made our product or service stand out against our competitors?
  • What did your decision-making process look like?
  • How have you benefited from using our product or service? (Where applicable, always ask for data.)

Keep in mind that the questionnaire is designed to help you gain insights into what sort of strong, success-focused questions to ask during the actual interview. And once you get to that stage, we recommend that you follow the "Golden Rule of Interviewing." Sounds fancy, right? It's actually quite simple — ask open-ended questions.

If you're looking to craft a compelling story, "yes" or "no" answers won't provide the details you need. Focus on questions that invite elaboration, such as, "Can you describe ...?" or, "Tell me about ..."

In terms of the interview structure, we recommend categorizing the questions and flowing them into six specific sections that will mirror a successful case study format. Combined, they'll allow you to gather enough information to put together a rich, comprehensive study.

Open with the customer's business.

The goal of this section is to generate a better understanding of the company's current challenges and goals, and how they fit into the landscape of their industry. Sample questions might include:

  • How long have you been in business?
  • How many employees do you have?
  • What are some of the objectives of your department at this time?

Cite a problem or pain point.

To tell a compelling story, you need context. That helps match the customer's need with your solution. Sample questions might include:

  • What challenges and objectives led you to look for a solution?
  • What might have happened if you did not identify a solution?
  • Did you explore other solutions before this that did not work out? If so, what happened?

Discuss the decision process.

Exploring how the customer decided to work with you helps to guide potential customers through their own decision-making processes. Sample questions might include:

  • How did you hear about our product or service?
  • Who was involved in the selection process?
  • What was most important to you when evaluating your options?

Explain how a solution was implemented.

The focus here should be placed on the customer's experience during the onboarding process. Sample questions might include:

  • How long did it take to get up and running?
  • Did that meet your expectations?
  • Who was involved in the process?

Explain how the solution works.

The goal of this section is to better understand how the customer is using your product or service. Sample questions might include:

  • Is there a particular aspect of the product or service that you rely on most?
  • Who is using the product or service?

End with the results.

In this section, you want to uncover impressive measurable outcomes — the more numbers, the better. Sample questions might include:

  • How is the product or service helping you save time and increase productivity?
  • In what ways does that enhance your competitive advantage?
  • How much have you increased metrics X, Y, and Z?

10. Lay out your case study format.

When it comes time to take all of the information you've collected and actually turn it into something, it's easy to feel overwhelmed. Where should you start? What should you include? What's the best way to structure it?

To help you get a handle on this step, it's important to first understand that there is no one-size-fits-all when it comes to the ways you can present a case study. They can be very visual, which you'll see in some of the examples we've included below, and can sometimes be communicated mostly through video or photos, with a bit of accompanying text.

Here are the sections we suggest, which we'll cover in more detail down below:

  • Title: Keep it short. Develop a succinct but interesting project name you can give the work you did with your subject.
  • Subtitle: Use this copy to briefly elaborate on the accomplishment. What was done? The case study itself will explain how you got there.
  • Executive Summary: A 2-4 sentence summary of the entire story. You'll want to follow it with 2-3 bullet points that display metrics showcasing success.
  • About the Subject: An introduction to the person or company you served, which can be pulled from a LinkedIn Business profile or client website.
  • Challenges and Objectives: A 2-3 paragraph description of the customer's challenges, before using your product or service. This section should also include the goals or objectives the customer set out to achieve.
  • How Product/Service Helped: A 2-3 paragraph section that describes how your product or service provided a solution to their problem.
  • Results: A 2-3 paragraph testimonial that proves how your product or service specifically benefited the person or company and helped achieve its goals. Include numbers to quantify your contributions.
  • Supporting Visuals or Quotes: Pick one or two powerful quotes that you would feature at the bottom of the sections above, as well as a visual that supports the story you are telling.
  • Future Plans: Everyone likes an epilogue. Comment on what's ahead for your case study subject, whether or not those plans involve you.
  • Call to Action (CTA): Not every case study needs a CTA, but putting a passive one at the end of your case study can encourage your readers to take an action on your website after learning about the work you've done.

When laying out your case study, focus on conveying the information you've gathered in the clearest and most concise way possible. Make it easy to scan and comprehend, and be sure to provide an attractive call-to-action at the bottom that gives readers an opportunity to learn more about your product or service.

11. Publish and promote your case study.

Once you've completed your case study, it's time to publish and promote it. Some case study formats have pretty obvious promotional outlets — a video case study can go on YouTube, just as an infographic case study can go on Pinterest.

But there are still other ways to publish and promote your case study. Here are a couple of ideas:

Lead Gen in a Blog Post

As stated earlier in this article, written case studies make terrific lead-generators if you convert them into a downloadable format, like a PDF. To generate leads from your case study, consider writing a blog post that tells an abbreviated story of your client's success and asking readers to fill out a form with their name and email address if they'd like to read the rest in your PDF.

Then, promote this blog post on social media, through a Facebook post or a tweet.

Published as a Page on Your Website

As a growing business, you might need to display your case study out in the open to gain the trust of your target audience.

Rather than gating it behind a landing page, publish your case study to its own page on your website, and direct people there from your homepage with a "Case Studies" or "Testimonials" button along your homepage's top navigation bar.

Format for a Case Study

The traditional case study format includes the following parts: a title and subtitle, a client profile, a summary of the customer’s challenges and objectives, an account of how your solution helped, and a description of the results. You might also want to include supporting visuals and quotes, future plans, and calls-to-action.

1. Title

case study format: title

The title is one of the most important parts of your case study. It should draw readers in while succinctly describing the potential benefits of working with your company. To that end, your title should:

  • State the name of your customer. Right away, the reader must learn which company used your products and services. This is especially important if your customer has a recognizable brand. If you work with individuals and not companies, you may omit the name and go with professional titles: “A Marketer…”, “A CFO…”, and so forth.
  • State which product your customer used. Even if you only offer one product or service, or if your company name is the same as your product name, you should still include the name of your solution. That way, readers who are not familiar with your business can become aware of what you sell.
  • Allude to the results achieved. You don’t necessarily need to provide hard numbers, but the title needs to represent the benefits, quickly. That way, if a reader doesn’t stay to read, they can walk away with the most essential information: Your product works.

The example above, “Crunch Fitness Increases Leads and Signups With HubSpot,” achieves all three — without being wordy. Keeping your title short and sweet is also essential.

2. Subtitle

case study format: subtitle

Your subtitle is another essential part of your case study — don’t skip it, even if you think you’ve done the work with the title. In this section, include a brief summary of the challenges your customer was facing before they began to use your products and services. Then, drive the point home by reiterating the benefits your customer experienced by working with you.

The above example reads:

“Crunch Fitness was franchising rapidly when COVID-19 forced fitness clubs around the world to close their doors. But the company stayed agile by using HubSpot to increase leads and free trial signups.”

We like that the case study team expressed the urgency of the problem — opening more locations in the midst of a pandemic — and placed the focus on the customer’s ability to stay agile.

3. Executive Summary

case study format: executive summary

The executive summary should provide a snapshot of your customer, their challenges, and the benefits they enjoyed from working with you. Think it’s too much? Think again — the purpose of the case study is to emphasize, again and again, how well your product works.

The good news is that depending on your design, the executive summary can be mixed with the subtitle or with the “About the Company” section. Many times, this section doesn’t need an explicit “Executive Summary” subheading. You do need, however, to provide a convenient snapshot for readers to scan.

In the above example, ADP included information about its customer in a scannable bullet-point format, then provided two sections: “Business Challenge” and “How ADP Helped.” We love how simple and easy the format is to follow for those who are unfamiliar with ADP or its typical customer.

4. About the Company

case study format: about the company

Readers need to know and understand who your customer is. This is important for several reasons: It helps your reader potentially relate to your customer, it defines your ideal client profile (which is essential to deter poor-fit prospects who might have reached out without knowing they were a poor fit), and it gives your customer an indirect boon by subtly promoting their products and services.

Feel free to keep this section as simple as possible. You can simply copy and paste information from the company’s LinkedIn, use a quote directly from your customer, or take a more creative storytelling approach.

In the above example, HubSpot included one paragraph of description for Crunch Fitness and a few bullet points. Below, ADP tells the story of its customer using an engaging, personable technique that effectively draws readers in.

case study format: storytelling about the business

5. Challenges and Objectives

case study format: challenges and objectives

The challenges and objectives section of your case study is the place to lay out, in detail, the difficulties your customer faced prior to working with you — and what they hoped to achieve when they enlisted your help.

In this section, you can be as brief or as descriptive as you’d like, but remember: Stress the urgency of the situation. Don’t understate how much your customer needed your solution (but don’t exaggerate and lie, either). Provide contextual information as necessary. For instance, the pandemic and societal factors may have contributed to the urgency of the need.

Take the above example from design consultancy IDEO:

“Educational opportunities for adults have become difficult to access in the United States, just when they’re needed most. To counter this trend, IDEO helped the city of South Bend and the Drucker Institute launch Bendable, a community-powered platform that connects people with opportunities to learn with and from each other.”

We love how IDEO mentions the difficulties the United States faces at large, the efforts its customer is taking to address these issues, and the steps IDEO took to help.

6. How Product/Service Helped

case study format: how the service helped

This is where you get your product or service to shine. Cover the specific benefits that your customer enjoyed and the features they gleaned the most use out of. You can also go into detail about how you worked with and for your customer. Maybe you met several times before choosing the right solution, or you consulted with external agencies to create the best package for them.

Whatever the case may be, try to illustrate how easy and pain-free it is to work with the representatives at your company. After all, potential customers aren’t looking to just purchase a product. They’re looking for a dependable provider that will strive to exceed their expectations.

In the above example, IDEO describes how it partnered with research institutes and spoke with learners to create Bendable, a free educational platform. We love how it shows its proactivity and thoroughness. It makes potential customers feel that IDEO might do something similar for them.

7. Results

case study format: results

The results are essential, and the best part is that you don’t need to write the entirety of the case study before sharing them. Like HubSpot, IDEO, and ADP, you can include the results right below the subtitle or executive summary. Use data and numbers to substantiate the success of your efforts, but if you don’t have numbers, you can provide quotes from your customers.

We can’t overstate the importance of the results. In fact, if you wanted to create a short case study, you could include your title, challenge, solution (how your product helped), and result.

8. Supporting Visuals or Quotes

case study format: quote

Let your customer speak for themselves by including quotes from the representatives who directly interfaced with your company.

Visuals can also help, even if they’re stock images. On one side, they can help you convey your customer’s industry, and on the other, they can indirectly convey your successes. For instance, a picture of a happy professional — even if they’re not your customer — will communicate that your product can lead to a happy client.

In this example from IDEO, we see a man standing in a boat. IDEO’s customer is neither the man pictured nor the manufacturer of the boat, but rather Conservation International, an environmental organization. This imagery provides a visually pleasing pattern interrupt to the page, while still conveying what the case study is about.

9. Future Plans

This is optional, but including future plans can help you close on a more positive, personable note than if you were to simply include a quote or the results. In this space, you can show that your product will remain in your customer’s tech stack for years to come, or that your services will continue to be instrumental to your customer’s success.

Alternatively, if you work only on time-bound projects, you can allude to the positive impact your customer will continue to see, even years after the contract ends.

10. Call to Action (CTA)

case study format: call to action

Not every case study needs a CTA, but we’d still encourage it. Putting one at the end of your case study will encourage your readers to take an action on your website after learning about the work you've done.

It will also make it easier for them to reach out, if they’re ready to start immediately. You don’t want to lose business just because they have to scroll all the way back up to reach out to your team.

To help you visualize this case study outline, check out the case study template below, which can also be downloaded here.

You drove the results, made the connection, set the expectations, used the questionnaire to conduct a successful interview, and boiled down your findings into a compelling story. And after all of that, you're left with a little piece of sales-enabling gold: a case study.

To show you what a well-executed final product looks like, have a look at some of these marketing case study examples.

1. "Shopify Uses HubSpot CRM to Transform High Volume Sales Organization," by HubSpot

What's interesting about this case study is the way it leads with the customer. This reflects a major HubSpot value, which is to always solve for the customer first. The copy leads with a brief description of why Shopify uses HubSpot and is accompanied by a short video and some basic statistics on the company.

Notice that this case study uses mixed media. Yes, there is a short video, but it's elaborated upon in the additional text on the page. So, while case studies can use one or the other, don't be afraid to combine written copy with visuals to emphasize the project's success.

2. "New England Journal of Medicine," by Corey McPherson Nash

When branding and design studio Corey McPherson Nash showcases its work, it makes sense for it to be visual — after all, that's what they do. So in building the case study for the studio's work on the New England Journal of Medicine's integrated advertising campaign — a project that included the goal of promoting the client's digital presence — Corey McPherson Nash showed its audience what it did, rather than purely telling it.

Notice that the case study does include some light written copy — which includes the major points we've suggested — but lets the visuals do the talking, allowing users to really absorb the studio's services.

3. "Designing the Future of Urban Farming," by IDEO

Here's a design company that knows how to lead with simplicity in its case studies. As soon as the visitor arrives at the page, he or she is greeted with a big, bold photo, and two very simple columns of text — "The Challenge" and "The Outcome."

Immediately, IDEO has communicated two of the case study's major pillars. And while that's great — the company created a solution for vertical farming startup INFARM's challenge — it doesn't stop there. As the user scrolls down, those pillars are elaborated upon with comprehensive (but not overwhelming) copy that outlines what that process looked like, replete with quotes and additional visuals.

4. "Secure Wi-Fi Wins Big for Tournament," by WatchGuard

Then, there are the cases when visuals can tell almost the entire story — when executed correctly. Network security provider WatchGuard can do that through this video, which tells the story of how its services enhanced the attendee and vendor experience at the Windmill Ultimate Frisbee tournament.

5. Rock and Roll Hall of Fame Boosts Social Media Engagement and Brand Awareness with HubSpot

In the case study above, HubSpot uses photos, videos, screenshots, and helpful stats to tell the story of how the Rock and Roll Hall of Fame used the bot, CRM, and social media tools to gain brand awareness.

6. Small Desk Plant Business Ups Sales by 30% With Trello

This case study from Trello is straightforward and easy to understand. It begins by explaining the background of the company that decided to use it, what its goals were, and how it planned to use Trello to help them.

It then goes on to discuss how the software was implemented and what tasks and teams benefited from it. Towards the end, it explains the sales results that came from implementing the software and includes quotes from decision-makers at the company that implemented it.

7. Facebook's Mercedes Benz Success Story

Facebook's Success Stories page hosts a number of well-designed and easy-to-understand case studies that visually and editorially get to the bottom line quickly.

Each study begins with key stats that draw the reader in. Then it's organized by highlighting a problem or goal in the introduction, the process the company took to reach its goals, and the results. Then, in the end, Facebook notes the tools used in the case study.

Showcasing Your Work

You work hard at what you do. Now, it's time to show it to the world — and, perhaps more important, to potential customers. Before you show off the projects that make you the proudest, we hope you follow these important steps that will help you effectively communicate that work and leave all parties feeling good about it.

Editor's Note: This blog post was originally published in February 2017 but was updated for comprehensiveness and freshness in July 2021.

What is Content Recommendation? A Full Guide to What Content Recommendation is and Why it’s Important

“Content recommendation” is a term you hear pretty regularly when discussing online content marketing and promotion. Done right, recommended content can help you increase website traffic, get more leads, and boost sales, all of which are great reasons to integrate it into your promotion strategy.

In this guide, I’ll examine the ins and outs of content recommendations, including methods for recommendation placements, why content recommendation is a valuable tactic, and content recommendation platforms. I’ll also share examples of content recommendations so you can see how they can help grow your website or business.

What is Content Recommendation?

At its simplest, content recommendation is a system for suggesting content to your website’s visitors based on what they are already interested in. It’s kind of like Netflix, but for web content. It is a form of native advertising in that content recommendations integrate seamlessly with a site’s other content.

You’ll often see content recommendations in the form of a related content widget (with the title “recommended content” or “you might also like”) urging visitors to look at more content from your site or other sites with similar content. Here’s an example from The Weather Channel.

Content recommendation example - weather channel

However, as you’ll see, that’s not the only purpose of content-based recommendations.

Content recommendation works in a variety of ways via website plugins and software platforms. Some content-based recommendation tools use keywords and tags to make their suggestions, while others pay attention to how people are engaging with the content they’re looking at now, and use that as a guide for recommendations. You can also have hybrid recommendation tools that use both of those approaches.
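To make the keyword-and-tag approach concrete, here is a minimal sketch in Python. It assumes each article carries a set of tags and ranks other articles by tag overlap (Jaccard similarity). The catalogue, the tag names, and the `recommend_by_tags` helper are all hypothetical, and real content recommendation tools will use far richer signals than this.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tags two articles have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_by_tags(current: str, articles: dict[str, set[str]], k: int = 2) -> list[str]:
    """Rank every other article by tag overlap with the current one."""
    scores = [
        (jaccard(articles[current], tags), title)
        for title, tags in articles.items()
        if title != current
    ]
    # Highest overlap first; ties are broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    # Articles with no shared tags are never recommended.
    return [title for score, title in scores[:k] if score > 0]


# Hypothetical article catalogue, invented for illustration only.
ARTICLES = {
    "storm-safety": {"weather", "storms", "safety"},
    "hurricane-prep": {"weather", "storms", "hurricane", "safety"},
    "summer-recipes": {"food", "summer"},
    "heatwave-tips": {"weather", "summer", "safety"},
}

print(recommend_by_tags("storm-safety", ARTICLES))
```

An engagement-based tool would swap the tag sets for behavioural data (what visitors clicked or read next), and a hybrid tool would blend the two scores.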

An example of content recommendations in action is Europa Press, which saw clickthrough rate increase 46% and revenue increase 78% by using content recommendation widgets.

Europa press content recommendations

On vs Off-Page Recommendations

There are two main types of content recommendations: on-page and off-page. On-page recommendations draw visitors deeper into your site by recommending more of your own content for them to check out.

On-page content-based recommendations help decrease your website bounce rate and increase the time people spend on your page and website. In turn, that boosts engagement with your content and can result in more social shares, creating even more awareness of your brand and content.

If you publish a lot of valuable content, then it’s worth exploring on-page content recommendations to make sure more people see more of that content. For example, Newshub increased revenue by 168% and got 14 million pageviews for video recommendations by using Taboola’s content recommendation platform.

on page content recommendation

On the other hand, if your site is brand new and you don’t have a lot of content yet, you may not get the full benefit of on-page content recommendation.

Off-page recommendations take your visitors to other people’s websites by recommending related external content that they will find interesting. This type of content-based recommendation is useful for websites that rely on referral or affiliate income, since it can boost that income over time. It’s also useful for sites that are part of a larger publishing network, as you can use recommendations to drive traffic to other sites in the group.

When using off-page recommendations, it’s important to ensure that the content you’re recommending is high quality. If it isn’t, that can hurt your visitors’ perception of your brand. And it’s essential to avoid driving traffic to competitors’ sites, so you don’t lose customers. However, it’s simple to take care of that when setting the rules for content recommendations.

Here’s an example of off-page recommendations in action. SocioPal used off-page recommendations to drive traffic to its app download page, increasing downloads by 30%.

off page content recommendation

Hear.com had a tenfold increase in traffic to its site after inserting content recommendation placements on other sites.

Recommendation Placement

A crucial factor in achieving your goals for content recommendation is recommendation placement: where you actually put your recommendations so that your visitors will see and act on them.

One common location for article or video recommendations is at the end of a piece of content, so visitors have somewhere to go next when they’ve finished reading a blog post or article. For example, Blinkist’s placements gained the company more than 60,000 signups in six months.

content recommendations

And Placetel used branded video at the end of a page to increase brand awareness. The campaign got 80 million impressions and drove a 750% increase in native signups.

Placetel campaign

Your placements can also mimic the look of a social media site. Ynet implemented Taboola Feed and recorded an increase in CTR across all its sites.

Ynet Taboola Feed

Placing recommendations under content isn’t the only option. You can also make content recommendations via on-site widgets that slide into place when people are near the end of the page or have been on the page for a few seconds. And you can also showcase content recommendations via popups. The secret to these last two placements is timing, as you don’t want to annoy visitors by interrupting their current reading experience.

Why is Content Recommendation Important?

Content recommendation is important because it provides an easy way to personalize your visitors’ experience, so they see the content that’s most relevant to them at any time. This, in turn, helps to improve brand awareness, increase time on your site, and boost user engagement.

Content recommendation works well for both advertisers and publishers. Publishers can acquire new readers and subscribers, and understand which content is most popular. Advertisers can use content recommendations to put their sponsored content on high-quality sites, without worrying about being blocked by ad blockers.

What is a Content Recommendation Platform?

A content recommendation platform, also known as a content discovery platform, is the software that makes content recommendations work. It uses algorithms to automatically identify and recommend relevant content to website visitors based on preset criteria. For example, you can choose to recommend content on a particular topic to visitors to a certain page.

One example of a content recommendation platform is Taboola. Taboola’s content discovery functionality enables marketers and publishers to reach new audiences, generate leads, increase brand awareness and boost engagement with high-quality placements and finely targeted recommendations.

For example, Newshub used Taboola Feed to drive traffic to sponsored articles written for clients, and achieved a 168% revenue boost.

newshub taboola feed

EuroNews doubled revenue in a week, and saw a 60% increase in organic CTR, by using Taboola Feed and Taboola Newsroom to increase engagement.

euronews taboola case study

Travel site Traicy used Taboola Feed to drive a 200% increase in revenue. Organic CTR also rose by 150%.

Traicy Taboola case study

Final Thoughts

As you’ve seen, content recommendations have the potential to increase reader and visitor engagement, resulting in greater brand awareness, as well as more leads and sales. Now that you understand more about how content-based recommendations work and how they can help your brand, get in touch with Taboola to learn more.


Decoding Amazon's Recommendation System


A recommendation engine is a tool that uses machine learning to filter specific items from a larger dataset. There are different types of filtering methods, but in general, the recommendation engine uses existing user insights - what a user has previously interacted with, what products were purchased, which movies were watched, and so on, to match other items similar to these, and then makes an item recommendation. The Amazon recommendation system works in a similar way.

amazon recommendation engine

Because 'recommendations' are made based on a user's past interactions and interests, they are personalized to each user and the user is therefore very likely to find the recommended product interesting as well.

And this is not an assumption, but a proven fact. Firstly, the very act of delivering personalized content is powerful - 80 percent of consumers are more likely to purchase from a brand that delivers personalized content, according to a study by Epsilon Marketing.

Secondly, a brand that recommends products experiences higher conversion rates than a brand that doesn't: customers who click on product recommendations are 4 times more likely to add that product to cart and complete the purchase, according to a study by PracticalEcommerce.

The most potent example of the benefits of recommendations is probably Amazon. Amazon is the largest e-commerce brand in the world in terms of revenue and market share. ( Statista )

In 2021, Amazon's net revenue from e-commerce sales was US$470 billion, and about 35 percent of all sales on Amazon happen via recommendations.

This clearly elucidates the power of recommendations. In this case study, we look at how Amazon is using recommendations across its store and even off it, the technology behind the recommendation engine, and how you can implement similar strategies to increase engagement and conversions on your e-commerce store.

Decoding the Amazon Recommendation Engine

Here is a sneak peek into Amazon's efficient recommendation engine and how it works:

On-page recommendations - recommendations made on Amazon.com

User-specific product listing.

Clicking on 'My Recommendations' on the menu bar takes you to a page where Amazon displays recommended products based on:

  • Your past purchases,
  • Your past interactions (products that you have visited but not purchased, search history, and so on)

product listing

All products listed on this page are filtered by Amazon's recommendation engine based on your shopping behavior. These are products you have shown an interest in earlier but not purchased, products similar to those you have already purchased, or products similar to those you have shown an interest in but not purchased.

This is one way Amazon uses recommendations to increase engagement and conversions. This option, however, requires action from the user: they have to click and visit the recommendations page. To increase visibility, Amazon takes the recommendations to the user directly, in real time, as you will see in the next few methods listed.

Recommendations based on purchases made by similar users

On visiting the product page for an Oculus VR headset, Amazon displays the following recommendations after the product information:

user based recommendation

Here, Amazon is looking to cross-sell products that other users paired along with their Oculus VR headset purchase. Because users similar to you who previously purchased the Oculus headset also bought a travel case for it, Amazon's recommendation system presents the travel case with the assumption that you will find it useful as well.

You might find another section of recommendations, "Customers who viewed this item also viewed" along with the above one, like so:

amazon recommendation engine

Here, Amazon is filtering the recommendations by users who viewed the Oculus headset (but did not buy it) and the similar items those users then went on to view.

You might also see the following segment based on which product you are viewing:

recommendation segments

Here the products are being filtered again based on users similar to you who viewed the product you are currently viewing. 

In all these cases, the recommendation engine filters items based on two factors:

  • The features of the product,
  • User characteristics - to find other users similar to you who also purchased the product you are viewing

Cross-selling based on categories/product relationships

In this method, Amazon uses the features of the product to make a recommendation. On viewing the product page of a Dell gaming laptop, Amazon makes the following recommendations after the product details:

cross-selling on Amazon

The mouse recommended in this section is a gaming mouse, and 'gaming' is a core feature of the selected laptop. The backpack is a Dell gaming backpack for laptops, which again matches core features of the selected item.

Amazon is looking to help the user find products that accompany the current product being viewed, by filtering items based on product features and categories.

This segment is similar to the "Frequently bought together" recommendation, which pops up when you select the gaming backpack:

frequently bought together recommendations

These recommendations seem to be made based on product features and based on what other users similar to you have purchased together in one transaction. In both samples, however, product features and their category (gaming, Dell, and so on) seem to be the primary filter.
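The co-purchase half of a "Frequently bought together" signal can be sketched with simple pair counting over past orders. The snippet below is only an illustration under that assumption: the order data and the `bought_together` helper are hypothetical, it ignores the product-feature filter mentioned above, and it is not a description of how Amazon actually computes this section.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(orders: list[set[str]]) -> Counter:
    """Count how often each unordered pair of items shares an order."""
    counts: Counter = Counter()
    for order in orders:
        # Sorting makes every pair appear in one canonical (a, b) order.
        for a, b in combinations(sorted(order), 2):
            counts[(a, b)] += 1
    return counts


def bought_together(item: str, orders: list[set[str]], k: int = 2) -> list[str]:
    """Items most often bought in the same order as `item`."""
    partners: Counter = Counter()
    for (a, b), n in co_purchase_counts(orders).items():
        if a == item:
            partners[b] += n
        elif b == item:
            partners[a] += n
    ranked = sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical order history, invented for illustration only.
ORDERS = [
    {"gaming laptop", "gaming mouse", "gaming backpack"},
    {"gaming laptop", "gaming mouse"},
    {"gaming backpack", "gaming mouse"},
    {"gaming laptop", "headset"},
]

print(bought_together("gaming backpack", ORDERS))
```

A production system would typically normalize these raw counts, since very popular items co-occur with almost everything.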

Recommendations based on browsing history

On logging in to Amazon, users are generally presented with recommendations based on products they showed interest in but did not purchase:

browsing history based recommendations

The two sections "Keep shopping for" and "Pick up where you left off" are recommendations based on your browsing history - products you actually viewed but didn't purchase.

Another segment you will often see is suggestions of items similar to those in your browsing history:

amazon shopping recommendations

These recommendations are also based on your browsing history but are items similar to those you viewed.

Narrowing based on interests and offers

Amazon takes recommendations one step further by narrowing them based on available offers:

interest-based offers

Here, Amazon is recommending products based on your browsing and purchase history, but only those products that have an active discount or offer. This further increases the chances of capturing interest and making a conversion.

Up-sell recommendations

If you visit the product page of an older version of Kindle, you will see the following recommendation:

case study on content recommendation

Here, Amazon is recommending a better (and more expensive) version of the product you are viewing. The purpose of this recommendation is to upsell - to sell a better version of the product which will also bring in more revenue. The filtering system looks at product features to identify the right recommendation.

Off-page recommendations - recommendations made on platforms other than Amazon.com

Personalized emails.

Amazon uses on-page data like your purchases and browsing history to make off-page recommendations as well, and one way is through personalized emails:

amazon personalized emails

Because the user browsed for VR headsets on Amazon's store, they sent a follow-up email with a product recommendation.

The above recommendation was based on browsing history - products actually viewed. Amazon also sends personalized emails with recommendations of products similar to those viewed:

amazon emails

Display ads

Amazon runs display ads across their partner networks. The ads are not of random products but are personalised to each user based on their past purchases and browsing history.

display ads amazon

In this example, the user is seeing an ad for a digital camera on a blog page. This is because the user browsed for cameras on Amazon and Amazon then targeted the user with a personalized display ad.

How does Amazon’s recommendation engine work?

Here's what Amazon says about its recommendation system:

"We make recommendations based on your interests.

We examine the items you've purchased, items you've told us you own, and items you've rated. We compare your activity on our site with that of other customers, and using this comparison, recommend other items that may interest you in Your Amazon.

Your recommendations change regularly, based on a number of factors, including when you purchase or rate a new item, and changes in the interests of other customers like you."

The Amazon recommendation system uses different filtering methods to recommend products.

Item-to-item collaborative filtering

One is the item-to-item collaborative filtering system ( source ). For the system to work, the engine needs to collect tons of user and product data and create relationships between them. There are three relationship models that are created:

  • User-item: A user specific matrix is created that contains the data of all products they have purchased and interacted with.
  • Item-item: The item-item matrix contains a mapping of product feature similarities. A gaming laptop and gaming mouse have the relationship of being an electronic item, a computing item, a gaming product, and so on.
  • User-user: This matrix contains a mapping of the similarities in user characteristics. Two users who purchased the same product and then gave it a rating of 4, for example, are mapped together.

The item-item collaborative filtering system uses these matrices to filter out relevant products for each user. Here's a simple example illustrating how it works:

collaborative filtering

  • In this example, users Cathy, Amy and Rose are first grouped as similar in a user-user matrix based on the similarity in their past purchases, browsing history, and product rating. In this case, the similarity is in their interest in the ice cream cone (the only common product all three have purchased).
  • Next, the recommendation engine looks to create relationships between items (based on user interest and not product features). Here, because Cathy and Amy have both shown interest in a sundae and an ice cream cone, the engine maps the two together.
  • The sundae is then recommended to Rose because that item (sundae) was mapped to another item (ice cream cone) she has already shown an interest in. Hence, item-item collaborative filtering.
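The Cathy, Amy and Rose example can be reproduced in a few lines of Python. This is a deliberately simplified sketch: it skips the explicit user-user grouping step and goes straight to linking items that were bought by the same users. The purchase data and function names are hypothetical, chosen to mirror the example.

```python
from collections import defaultdict

# Hypothetical purchase histories mirroring the example above.
PURCHASES = {
    "Cathy": {"ice cream cone", "sundae"},
    "Amy": {"ice cream cone", "sundae"},
    "Rose": {"ice cream cone"},
}


def item_links(purchases):
    """Map each item to the items bought by the same users, with counts."""
    links = defaultdict(lambda: defaultdict(int))
    for items in purchases.values():
        for a in items:
            for b in items:
                if a != b:
                    links[a][b] += 1
    return links


def recommend(user, purchases):
    """Suggest items linked to what the user owns but has not bought yet."""
    owned = purchases[user]
    links = item_links(purchases)
    scores = defaultdict(int)
    for item in owned:
        for other, n in links[item].items():
            if other not in owned:
                scores[other] += n
    # Strongest link first; ties are broken alphabetically.
    return sorted(scores, key=lambda name: (-scores[name], name))


print(recommend("Rose", PURCHASES))
```

Rose owns only the ice cream cone, which is linked to the sundae through Cathy's and Amy's purchases, so the sundae is the one item suggested to her. Cathy and Amy already own both items, so they get no suggestions.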

When you click on an Oculus VR headset and Amazon recommends a travel case, here's what is happening:

  • You are a part of a user-user matrix based on a similarity in past purchases - you and other users have purchased or browsed the same products, or given the same products similar ratings.
  • The engine then tries to create product relationships based on the interactions of users within this group. In this case, many users who bought the VR headset also bought the travel case, and so they were mapped with each other.
  • When you show interest in the VR headset, you're shown the travel case as a recommendation because it is considered an item of interest by other users in your group who also bought the VR headset.

This is how most recommendations such as "Customers who bought this also bought" work on Amazon.

Content-based filtering

Amazon's recommendation engine also uses the content-based filtering system to recommend products. It utilizes the user-item and item-item matrices to achieve this.

Once you interact with a product, it looks for other products similar to this (based on feature relationships in this case) and then presents them to you. If you view a Dell gaming laptop, you will be recommended other gaming laptops based on similarity in features - CPU cores, processor type, RAM, storage capacity, and so on.

This is how the "related to items you have viewed" and other similar recommendations work.
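Content-based filtering can be sketched as a similarity search over product feature vectors. The example below assumes four numeric features per laptop (CPU cores, RAM, storage, and a gaming flag), rescales each feature to the 0 to 1 range so that storage does not dominate, and ranks products by cosine similarity. The product names, feature values, and helper functions are hypothetical; the source does not say which similarity measure Amazon uses.

```python
import math

# Hypothetical catalogue: (CPU cores, RAM in GB, storage in GB, is_gaming).
LAPTOPS = {
    "Dell gaming laptop": (8, 16, 512, 1),
    "Gaming laptop B": (8, 32, 1024, 1),
    "Office laptop C": (4, 8, 256, 0),
    "Ultrabook D": (4, 16, 512, 0),
}


def min_max_scale(rows):
    """Rescale every feature column to the 0..1 range."""
    columns = list(zip(*rows.values()))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return {
        name: tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(values, lows, highs)
        )
        for name, values in rows.items()
    }


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_items(target, rows):
    """Other products ordered from most to least similar to `target`."""
    scaled = min_max_scale(rows)
    others = [name for name in rows if name != target]
    return sorted(others, key=lambda name: (-cosine(scaled[target], scaled[name]), name))


print(similar_items("Dell gaming laptop", LAPTOPS))
```

With this data the other gaming laptop ranks first for the Dell gaming laptop, which matches the behavior described above.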

Bandit and causal inference in recommendations

Bandit and causal inference in filtering are newer models that are being researched and improved. 

The concept of 'bandits' is derived from the analogy of a gambler at a slot machine which has multiple arms, each with unknown probabilities for a payout. The gambler systematically tries each arm and analyzes the results to find the best option for the maximum payout.

new filtering models

This concept is applied by the engine to identify the best filtering algorithm for each user based on which recommended products they interact with.
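One simple bandit strategy is epsilon-greedy: most of the time pick the option with the best observed click rate, and occasionally try a random one. The sketch below treats each filtering algorithm as an arm and a click as the reward. Epsilon-greedy is used here only because it is easy to read; the source does not say which bandit algorithm Amazon uses, and the arm names and click rates are made up.

```python
import random


class EpsilonGreedy:
    """Pick the best-known arm most of the time, explore at rate epsilon."""

    def __init__(self, arms, epsilon=0.1, rng=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = {arm: 0 for arm in self.arms}
        self.estimates = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Explore: choose any arm uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # Exploit: choose the arm with the highest estimated reward.
        return max(self.arms, key=lambda arm: self.estimates[arm])

    def update(self, arm, reward):
        # Incremental running mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.estimates[arm] += (reward - self.estimates[arm]) / self.counts[arm]


def simulate(click_rates, rounds=2000, seed=0):
    """Run the bandit against hypothetical per-algorithm click rates."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(click_rates, epsilon=0.1, rng=rng)
    for _ in range(rounds):
        arm = bandit.select()
        clicked = 1.0 if rng.random() < click_rates[arm] else 0.0
        bandit.update(arm, clicked)
    return bandit


# Hypothetical algorithms and click probabilities, for illustration only.
result = simulate({"item_cf": 0.05, "content_based": 0.10, "hybrid": 0.20})
print(result.counts)
```

Over enough rounds the pull counts should concentrate on the arm with the highest true click rate, while the small exploration rate keeps testing the others in case user behavior changes.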

Causal inference in filtering is another method of improving existing algorithms. Causal inference in statistics is a method of identifying the cause of an effect. In the case of Amazon, the effect is the user clicking a product link and the engine uses causal inference to identify the factors that might have caused this action.

3 Ways Amazon uses AI for personalized product recommendations 

In-store (web and mobile) recommendations.

Amazon uses recommendations on its store to personalize the shopping experience for every consumer. Products shown on the homepage, suggested items on the product page, and so on, are all tailored to the individual user using a recommendation engine.

Amazon Alexa

Amazon's voice assistant Alexa also uses AI to collect data points and deliver personalized recommendations. For example, based on what music a user tends to listen to on Alexa, the voice assistant creates personalized playlists and suggestions. 

Amazon GO

Amazon GO is Amazon's unmanned physical store, where consumers can walk in, pick a product, and leave. These stores use cameras and AI for computer vision to track users and products being picked, scan the barcode, and then add products to the user's cart on their Amazon GO app. Users can either pay later or the money is deducted through a preselected payment method. This purchase data is attributed to the user who then gets personalized recommendations on other Amazon platforms, like Alexa and Amazon.com.

Achieving Amazon-like personalization for your e-commerce store

Create a relevant user-journey.

Amazon has created a seamless user journey that nudges the consumer forward. The homepage makes it easy for the user to find the product they want, the product page has all information laid out and then recommendations are made to upsell or cross-sell, and then users are walked through an easy to use checkout process. A great UX is important for the success of an e-commerce store.

Buy a personalized recommendation engine 

We have already established that implementing a recommendation engine will increase engagement and sales in an e-commerce store. Recommendation engines, however, are not easy to build, train and enhance.

Fortunately, there are solutions available that you can plug and use to get the benefits of recommendations right from day 1. These solutions can be implemented without any coding and they are also tailored to work for your specific store.

If you would like to know more about implementing a solution for your store, talk to us and get a demo of our recommendation engine.

Follow a simple checkout process

One of the top reasons for cart abandonment is a long and complicated checkout process.

case study on content recommendation

Amazon uses smart tactics to simplify its checkout process. One is the 'Buy Now' button, which lets you immediately purchase a product without having to go to the cart page. The other is saving information such as your address and preferred payment method, which reduces the number of steps in the checkout process.

Use a transparent return policy

A return policy gives consumers a feeling of security. It also helps users get over the last-minute resistance that often crops up when buying a product online. On the flip side, not having a return policy makes consumers suspicious of the brand, especially since major e-commerce stores like Amazon offer an easy way to return or exchange products.

Make users feel valued

Customer loyalty is a big driver of sales, and loyalty is nurtured by creating value. In fact, a report by Loyalty Lion stated that loyal consumers spend 67 percent more on average than new ones. Amazon offers its Prime customers a lot of discounts and benefits that make them feel valued and special. Implement such strategies that are personal to the user and make them feel valued by your brand.

Implementing a recommendation engine like Amazon's

Recommendations enhance the level of personalization in an e-commerce store, and we explored the different ways Amazon uses recommendations to promote, upsell and cross-sell products throughout their store. In fact, once you have interacted with products on Amazon every segment on the site is essentially based on recommendations.

Are you looking to implement a custom recommendation engine on your e-commerce store? Drop us a message for a demo of our product and we'll help you launch a fully-functional recommendation engine.


A content-based dataset recommendation system for researchers—a case study on Gene Expression Omnibus (GEO) repository


Braja Gopal Patra, Kirk Roberts, Hulin Wu, A content-based dataset recommendation system for researchers—a case study on Gene Expression Omnibus (GEO) repository, Database, Volume 2020, 2020, baaa064, https://doi.org/10.1093/database/baaa064


It is a growing trend among researchers to make their data publicly available for experimental reproducibility and data reusability. Sharing data with fellow researchers helps in increasing the visibility of the work. On the other hand, there are researchers who are inhibited by the lack of data resources. To overcome this challenge, many repositories and knowledge bases have been established to date to ease data sharing. Further, in the past two decades, there has been an exponential increase in the number of datasets added to these dataset repositories. However, most of these repositories are domain-specific, and none of them can recommend datasets to researchers/users. Naturally, it is challenging for a researcher to keep track of all the relevant repositories for potential use. Thus, a dataset recommender system that recommends datasets to a researcher based on previous publications can enhance their productivity and expedite further research. This work adopts an information retrieval (IR) paradigm for dataset recommendation. We hypothesize that two fundamental differences exist between dataset recommendation and PubMed-style biomedical IR beyond the corpus. First, instead of keywords, the query is the researcher, embodied by his or her publications. Second, to filter the relevant datasets from non-relevant ones, researchers are better represented by a set of interests, as opposed to the entire body of their research. This second approach is implemented using a non-parametric clustering technique. These clusters are used to recommend datasets for each researcher using the cosine similarity between the vector representations of publication clusters and datasets. The maximum normalized discounted cumulative gain at 10 (NDCG@10), precision at 10 (p@10) partial and p@10 strict of 0.89, 0.78 and 0.61, respectively, were obtained using the proposed method after manual evaluation by five researchers. 
To the best of our knowledge, this is the first study of its kind on content-based dataset recommendation. We hope that this system will further promote data sharing, offset the researchers’ workload in identifying the right dataset and increase the reusability of biomedical datasets.

Database URL : http://genestudy.org/recommends/#/

In the Big Data era, extensive amounts of data have been generated for scientific discoveries. However, storing, accessing, analyzing and sharing a vast amount of data are becoming major bottlenecks for scientific research. Furthermore, making a large number of public scientific data findable, accessible, interoperable and reusable is a challenging task.

The research community has devoted substantial effort to enable data sharing. Promoting existing datasets for reuse is a major initiative that gained momentum in the past decade ( 1 ). Many repositories and knowledge bases have been established for specific types of data and domains. Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/), UKBioBank (https://www.ukbiobank.ac.uk/), ImmPort (https://www.immport.org/shared/home) and TCGA (https://portal.gdc.cancer.gov/) are some examples of repositories for biomedical datasets. DATA.GOV archives the U.S. Government’s open data related to agriculture, climate, education, etc. for research use. However, a researcher looking for previous datasets on a topic still has to painstakingly visit all the individual repositories to find relevant datasets. This is a tedious and time-consuming process.

An initiative was taken by the developers of DataMed (https://datamed.org) to solve the aforementioned issues for the biomedical community by combining biomedical repositories and enhancing query searching with advanced natural language processing (NLP) techniques ( 1 , 2 ). DataMed indexes diverse categories of biomedical datasets and provides the functionality to search them ( 1 ). The research focus of this last work was retrieving datasets using a focused query. In addition, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) dataset retrieval challenge was organized in 2016 to evaluate the effectiveness of information retrieval (IR) techniques in identifying relevant biomedical datasets in DataMed ( 3 ). Among the teams that participated in this shared task, the prevalent approaches for searching datasets using a query were probabilistic or machine learning based IR ( 4 ), medical subject headings (MeSH) term based query expansion ( 5 ), word embeddings and named entity identification ( 6 ), and re-ranking ( 7 ). Similarly, a specialized search engine named Omicseq was developed for retrieving omics data ( 8 ).

Google Dataset Search (https://toolbox.google.com/datasetsearch) provides the facility to search datasets on the web, similar to DataMed. While DataMed indexes only biomedical domain data, indexing in Google Dataset Search covers data across several domains. Datasets are created and added to repositories frequently, which makes it difficult for a researcher to know and keep track of all datasets. Further, search engines such as DataMed or Google Dataset Search are helpful when the user knows what type of dataset to search for, but determining the user intent in web searches is a difficult problem due to the sparse data available concerning the searcher ( 9 ). To overcome the aforementioned problems and make dataset search more user-friendly, a dataset recommendation system based on a researcher’s profile is proposed here. The publications of researchers indicate their academic interest, and this information can be used to recommend datasets. Recommending a dataset to an appropriate researcher is a new field of research. There are many datasets available that may be useful to certain researchers for further exploration, and this important aspect of dataset recommendation has not been explored earlier.

Recommendation systems, or recommenders, are information filtering systems that apply data mining and analytics to users’ behaviors, including preferences and activities, to predict users’ interests in information, products or services. Research publications in recommendation systems can be broadly grouped as content-based or collaborative filtering recommendation systems ( 10 ). This article describes the development of a recommendation system for scholarly use. In general, developing a scholarly recommendation system is both challenging and unique because semantic information plays an important role in this context, as inputs such as title, abstract and keywords need to be considered ( 11 ). The usefulness of similar research article recommendation systems has been established by the acceptance of applications such as Google Scholar (https://scholar.google.com/), Academia.edu (https://www.academia.edu/), ResearchGate (https://www.researchgate.net/), Semantic Scholar (https://www.semanticscholar.org/) and PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) by the research community.

Dataset recommendation is a challenging task due to the following reasons. First, while standardized formats for dataset metadata exist ( 12 ), no such standard has achieved universal adoption, and researchers use their own convention to describe their datasets. Further, many datasets do not have proper metadata, which makes the prepared dataset difficult to reuse/recommend. Second, there are many dataset repositories with the same dataset in different formats, making recommendation a challenging task. Additionally, the dataset recommendation system should be scalable to the increasing number of online datasets. We cast the problem of recommending datasets to researchers as a ranking problem of datasets matched against the researcher’s individual publication(s). The recommendation system can be viewed as an IR system where the most similar datasets can be retrieved for a researcher using his/her publications.

Data linking, or identifying/clustering similar datasets, has received relatively little attention in research on recommendation systems. Previous work on this topic includes ( 13–15 ). Reference ( 13 ) defined dataset recommendation as the problem of computing a rank score for each of a set of target datasets ( D T ) so that the rank score indicates the relatedness of D T to a given source dataset ( D S ). The rank scores provide information on the likelihood of a D T to contain linking candidates for D S . Reference ( 15 ) proposed a dataset recommendation system by first creating similarity-based dataset networks, and then recommending connected datasets to users for each dataset searched. Despite the promising results, this approach suffers from the cold start problem. Here, the cold start problem refers to the user’s initial dataset selection, where the user has no idea what dataset to select/search. If a user initially chooses a wrong dataset, then the system will always recommend wrong datasets to the user.

Some experiments were performed to identify datasets shared in the biomedical literature ( 16–18 ). Reference ( 17 ) identified data shared in biomedical literature articles using regular expression patterns and machine learning algorithms. Reference ( 16 ) identified datasets in social sciences papers using a semi-automatic method. The last system reportedly performed well (F-measure of 0.83) in finding datasets in the da|ra dataset registry. Different deep learning methods were used to extract dataset mentions from publications and to link each mention text fragment to a particular dataset in the knowledge base ( 18 ). Further, a content-based system for recommending relevant literature for datasets was developed in ( 11 ) as a first step toward a literature recommendation tool.

This article proposes a dataset recommender that recommends datasets to researchers based on their publications. We collected dataset metadata (title and summary) from GEO and researcher’s publications (title, abstract and year of publication) from PubMed using name and curriculum vitae (CV) for developing a dataset recommendation system. A vector space model (VSM) is used to compare publications and datasets. We propose two novel ideas:

A method for representing researchers with multiple vectors reflecting each researcher’s diverse interests.

A system for recommending datasets to researchers based on their research vectors.

For the datasets, we focus on GEO (https://www.ncbi.nlm.nih.gov/geo/). GEO is a public repository for high-throughput microarray and next-generation sequence functional genomics data. It was found that an average of 21 datasets was added daily in the last 6 years (i.e. 2014–19). This gives a glimpse of the increasing number of datasets being made available online, considering that there are many other online data repositories as well. Many of these datasets were collected at significant expense, and most of these datasets were used only once. We believe that reusability of these datasets can be improved by recommending these to appropriate researchers.

Efforts to restructure GEO have been made by curating available metadata. In reference ( 19 ), the authors identified the important keywords present in the dataset descriptions and searched for other similar datasets. In another effort to restructure the GEO database, ReGEO (http://regeo.org/) was developed by ( 20 ), who identified important metadata such as time points and cell lines for datasets using automated NLP techniques.

We developed this dataset recommendation system for researchers as a part of the dataset reusability platform (GETc Research Platform (http://genestudy.org/)) for GEO developed at the University of Texas Health Science Center at Houston. This website recommends datasets to users based on their publications.

The rest of the article is organized in the following manner. Section  2 provides an overview of GEO datasets and researcher publications. Methods used for developing the recommendation system and evaluation techniques used in this experiment are described in Section  3 . Section  4 describes results. Section  5 provides a discussion. Finally, conclusion and future directions are discussed in Section  6 .

The proposed dataset recommendation system requires both dataset metadata and the user profile for which datasets will be recommended. We collected metadata of datasets from the GEO repository, and researcher publications from PubMed using their names and CVs. The data collection methods and summaries of data are discussed next.

GEO Datasets

GEO is one of the most popular public repositories for functional genomics data. As of December 18, 2019, there were 122 222 series of datasets available in GEO. Histograms of datasets submitted to GEO per day and per year as presented in Figure  1 showed an increasing trend of submitting datasets to GEO, which justified our selection of this repository for developing the recommendation system.

Histogram of datasets submitted to GEO based on datasets collected on December 18, 2019

Overview of dataset indexing pipeline

Statistics of datasets collected from GEO

For the present experiment, metadata such as title, summary, submission date and name of dataset creator(s) were collected from GEO and indexed in a database, as shown in Figure  2 . We also collected the PMIDs of articles associated with each dataset. However, many datasets did not have articles associated with them. The detailed information of collected datasets is presented in Table  1 . Out of a total of 122 222 GEO datasets, 89 533 had 92 884 associated articles, out of which 61 228 were unique. The maximum number of articles associated with the datasets (‘GSE15907’ and ‘GSE31312’) was 10. These articles were used to remove the publications that were not related to GEO. Further, we used the GEO-related publications for building word embeddings to be used for subsequent text normalization as outlined in Section  3 .

Researcher publications

A researcher’s academic interest can be extracted from publications, grants, talks, seminars and much more. All this information is typically available in the CV, but it is presented in the form of titles/short texts. Here, short texts imply limited information. Further, lack of standardization in CV formats poses challenges to parse the CVs. In this work, an alternative approach was undertaken, which is outlined next.

The title and year of the researcher’s publications were present in the CV. However, we required the title, abstract and year of publication for our experiment. A researcher’s list of publications (titles and abstracts) is easier to obtain from web sources such as Google Scholar, PubMed, Semantic Scholar and others. Unfortunately, the full texts of most scientific articles are not publicly available. Thus, for the present experiment, we used only the title and abstract of publications in identifying the researcher’s areas of research.

Overview of the researcher’s publication extraction system used for author disambiguation

Given a researcher, we searched the researcher’s name in PubMed using the Entrez API (https://www.ncbi.nlm.nih.gov/books/NBK25501/) and collected all the publications. Multiple researchers with exactly the same name might exist; thus, querying the name in PubMed might sometimes return publications from other researchers as well. This is a typical challenge of author disambiguation. A few attempts have been undertaken to resolve this issue, and one of them is ORCID (https://orcid.org). A researcher needs to provide an ORCID iD to access his/her ORCID details. However, to the best of our knowledge, many researchers in the biomedical domain did not have an associated ORCID account. Thus, we used a simple method to disambiguate the authors by using their CVs. Initially, the recommendation system prompts a researcher to provide his/her name and a CV (or list of publications). Next, we collected the publications (titles, names, MeSH terms and year of publication) for a researcher from PubMed by searching his/her name. To remove the publications of other authors with the same name, titles of all collected publications from PubMed were matched against the titles present in the CV. In the case of a match, publications were kept for further processing. An overview of the technique used for the researcher’s publication collection is provided in Figure  3 .
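The title-matching step can be sketched as follows. This is a minimal illustration rather than the system's actual code: the record layout, the function names and the choice of exact matching on normalized titles are assumptions, since the text only states that PubMed titles were matched against CV titles.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation/whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def filter_by_cv(pubmed_records, cv_titles):
    """Keep only the PubMed records whose normalized title also appears in the CV."""
    cv_index = {normalize_title(t) for t in cv_titles}
    return [rec for rec in pubmed_records if normalize_title(rec["title"]) in cv_index]


# Hypothetical results of a name search; the second record belongs to a namesake.
pubmed_records = [
    {"pmid": "111", "title": "Gene expression profiling of HIV-1 infected cells."},
    {"pmid": "222", "title": "Soil erosion in alpine meadows"},
]
cv_titles = ["Gene Expression Profiling of HIV-1 Infected Cells"]

kept = filter_by_cv(pubmed_records, cv_titles)
print([rec["pmid"] for rec in kept])
```

Exact matching after normalization is strict; a fuzzy comparison could tolerate small wording differences between a CV and PubMed, at the cost of occasional false matches.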

One of the limitations of the above publication collection method is that publications could not be collected if they were not listed in PubMed. However, the datasets used in the present experiments were from the biomedical domain, and publications not listed in PubMed were less pertinent to biomedical datasets. For example, someone’s biomedical interests (in PubMed) may be more reliable markers for biomedical datasets than a theoretical computer science or statistics paper. Another downside is that the researcher’s CV may not be fully up to date.

This section describes how the two main objects of interest (datasets and publications of researchers) were embedded in a vector space and then how these vectors were compared in order to make recommendations. First, both datasets and papers were treated as text objects: the text of a dataset includes its title and summary, while the text of a paper includes its title and abstract. Pre-processing was performed on both a researcher’s publications and datasets by removing the low-value stopwords, links, punctuation and junk words. Further, the nltk WordNet lemmatizer (https://www.nltk.org/_modules/nltk/stem/wordnet.html) was used to get the root forms of the words. Next, we describe the methods used for converting datasets and researchers into vectors.
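A rough sketch of the pre-processing described above is given here, using only the Python standard library. The stopword list is a tiny illustrative stand-in, dropping single-character tokens is an assumed interpretation of "junk words", and lemmatization is omitted so that the example runs without downloading WordNet data.

```python
import re
import string

# Illustrative stopword list only; a real system would use a much fuller one.
STOPWORDS = {"the", "of", "and", "in", "a", "an", "to", "for", "is", "are", "on", "with"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Lowercase, strip links and punctuation, then drop stopwords and 1-character tokens."""
    text = URL_RE.sub(" ", text.lower())
    text = text.translate(PUNCT_TABLE)
    return [tok for tok in text.split() if tok not in STOPWORDS and len(tok) > 1]


print(preprocess("Analysis of the HIV-1 genome: see https://example.org/x."))
```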

Dataset vector generation

VSMs can be built from text in a variety of ways, each of which has its distinct advantages and thus merits experimentation. For the present experiment, we used TF-IDF because it achieved better results for related literature recommendation for datasets in ( 11 ).

TF-IDF : For vocabulary W , each unique word w  ∈  W is assigned a score proportional to its frequency in the text (term frequency, TF) and its inverse frequency in the full collection (inverse document frequency, IDF). We tuned parameters such as minimum document frequency (min-df) and maximum n-gram size. For the present study, we kept maximum n-gram size = 2 (i.e. unigrams and bigrams) as including the higher n-gram increases the sparsity as well as computational complexity.

We converted each dataset into a vector using TF-IDF. For each dataset, the title and summary were preprocessed and normalized and then converted into a single vector. Finally, each publication vector (or publication cluster vector) is compared with dataset vectors to generate the recommendation score. Different methods for representing a researcher’s papers as vectors are discussed next.
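The following sketch shows one way to build TF-IDF vectors over unigrams and bigrams with a minimum document frequency cut-off. It is written in plain Python for illustration; the function names and the smoothed IDF formula are assumptions, because the text does not state which TF-IDF variant or library was used.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=2):
    """Return all n-grams up to max_n (unigrams and bigrams by default)."""
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf_vectors(docs, max_n=2, min_df=1):
    """Map each tokenized document to a sparse {term: weight} TF-IDF vector."""
    term_lists = [ngrams(doc, max_n) for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        vectors.append({
            # Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1.
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
            if df[term] >= min_df
        })
    return vectors


docs = [["hiv", "vaccine", "trial"], ["mouse", "genomics"], ["hiv", "gene", "expression"]]
vecs = tfidf_vectors(docs)
print(sorted(vecs[0]))
```

Terms that occur in fewer documents receive a larger weight, so "vaccine" outweighs "hiv" in the first vector. Raising `min_df` discards rare terms and shrinks the vocabulary.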

Researcher vector generation

Baseline method.

For the baseline method, we combined multiple text-derived paper vectors into a single researcher vector ( v r ) in the same vector space using Equation ( 1 ):

|$v_r = \frac{1}{N_r} \sum_{p \in P_r} \lambda_p v_p$| (1)

where P r is the set of papers of a researcher r ; N r is the total number of papers of that researcher, and it acts as a normalization term; v p is the vector for a single paper p using TF-IDF; λ p is a recency penalty to favor more recent papers (thus better reflecting the researcher’s current interest).

A researcher is more likely to be interested in datasets recommended for his/her current work than in datasets related to work performed a few years back. Thus, we penalized each of the paper vectors according to its year of publication, as stated in Equation ( 2 ):

|$\lambda_p = e^{-k t}$| (2)

where t is the difference between the current year and the year of publication, and k is the decay constant, which makes the weight decrease at a rate proportional to its current value. For the present study, we kept k = 0.05.
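As a sketch of how the baseline researcher vector might be computed, the code below averages sparse paper vectors after weighting each by a recency penalty. The exponential form exp(-k·t) of the penalty is an assumption inferred from the description of k and t, and the function names and data layout are hypothetical.

```python
import math
from collections import defaultdict


def recency_weight(pub_year, current_year, k=0.05):
    """Assumed exponential decay: weight = exp(-k * t), with t the paper's age in years."""
    return math.exp(-k * (current_year - pub_year))


def researcher_vector(papers, current_year, k=0.05):
    """Average recency-weighted paper vectors; papers is a list of (vector, year) pairs."""
    if not papers:
        raise ValueError("at least one paper is required")
    total = defaultdict(float)
    for vec, year in papers:
        lam = recency_weight(year, current_year, k)
        for term, weight in vec.items():
            total[term] += lam * weight
    n_papers = len(papers)  # normalization term
    return {term: weight / n_papers for term, weight in total.items()}


papers = [({"hiv": 1.0}, 2020), ({"hiv": 1.0, "mouse": 2.0}, 2010)]
v_r = researcher_vector(papers, current_year=2020)
print(v_r)
```

With k = 0.05, a ten-year-old paper keeps about 61% of its weight, so older interests fade gradually rather than disappearing.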

Multi-interest dataset recommendation (MIDR)

The baseline method for creating a researcher vector may be helpful for new researchers without many publications, whereas an established researcher may have multiple areas of expertise with multiple papers in each. Also, if the number of papers is imbalanced in multiple areas, then the above baseline method may not work. With a highly imbalanced set of publications this would obviously bias dataset recommendation to the dominant interest. For a more balanced set of interests that are highly dispersed, this mixture would result in the ‘centroid’ of these interests, which could be quite distinct from the individual interests. Both these cases are undesirable. The centroid of a researcher’s interests may not be of much interest to them (e.g. a researcher interested in mouse genomics and HIV vaccines may not be interested in mouse vaccines ).

For example, initial experiments were performed on Researcher 1 (mentioned later in Section 4), and it was observed that the recommended datasets were biased toward the single research area with the largest number of publications. Researcher 1 has a dominant number of publications on HIV, and the baseline system recommends only HIV datasets, even though Researcher 1 has multiple research areas.

A critical limitation of the above baseline approach is that researchers can have multiple areas of expertise. We can easily build multiple vectors, each corresponding to a different expertise if we know how to properly group/cluster a researcher’s papers according to expertise or topic. However, parametric methods such as k-means clustering and latent Dirichlet allocation require specifying a priori how many clusters/topics to utilize. Generalizing the number of clusters is not possible due to a varying number of publications of researchers. Instead, our insight is that the more publications a researcher has, the more interests or areas of expertise he/she likely has as well, but this should be modeled as a ‘soft’ constraint rather than a ‘hard’ constraint. We propose to employ the non-parametric Dirichlet Process Mixture Model (DPMM) ( 21 ) to cluster papers into several groups of expertise.

High level architecture of proposed dataset recommendation

DPMM : We employed a Gibbs Sampling-based Dirichlet Process Mixture Modeling for text clustering. DPMM offers the following advantages over its traditional counterparts. First, the number of text clusters need not be specified; second, it is relatively scalable and third, it is robust to outliers ( 22 ). The technique employs a collapsed Gibbs Sampling mechanism for Dirichlet process models wherein the clusters are added or removed based on the probability of a cluster associating with a given document. The scalability of the technique stems from the fact that word frequencies are used for text clustering. This reduces the computational burden significantly, considering the large number of samples associated with text processing problems. Further, the optimal number of clusters is likely to be chosen, as clusters with low association probability with documents are eliminated, and new clusters are created for documents that do not belong to selected clusters with high probability. For example, if a cluster c 1 contains five documents, each with low association probability, then the cluster c 1 is eliminated, and new clusters are initialized. In DPMM, the decision to create a new cluster is based on the number of papers to be clustered and the similarity of a given paper to previously clustered papers. Thus, researchers with many papers but few interests can still result in fewer clusters than a researcher with fewer papers but more interests. For example, our evaluation includes two researchers, one with 53 papers and one with 32; however, the DPMM resulted in five and six clusters, respectively. After clustering, we created a pseudo-researcher for each cluster using Equation ( 1 ), though one that can be tied back to the original researcher. The recommendation system uses these pseudo-researchers in its similarity calculations along the same lines as described above. Further, the α parameter was tuned to control the number of clusters ( 22 ). 
We describe tuning of the α parameter in Section 3.4.

Text normalization : Text normalization plays an important role in improving the performance of any NLP system. We also implemented text normalization techniques to improve the efficiency of the proposed clustering algorithm. We normalized similar words by grouping them together and replacing them with the most frequent words in the same word group. For example, HIV, HIV-1, HIV/AIDS and AIDS were replaced with the most frequent word HIV . For identifying similar words, we trained a word2vec model on the articles from PubMed using Gensim (https://radimrehurek.com/gensim/). The datasets are related to gene expressions, while the articles collected from PubMed contain a variety of topics related to biomedicine and life sciences which may not be suitable for building a word embedding in the current study (since some of these articles are highly unrelated to the type of information in GEO). The articles before 1998 were removed as the research on micro-array data started during that year ( 23 ). The publications related to GEO are filtered using the MeSH terms. We also developed a MeSH term classification system for those publications without MeSH terms. More details on GEO related publications filtering can be found in ( 11 ).

The similar words were identified using the most_similar function of word2vec . We only considered the top five similar words for each word using most_similar function. The normalized text was used for clustering. It was observed from the initial experiments that the text normalization improved clustering and resulted in the reduced number of clusters using DPMM.
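The normalization step can be illustrated with the sketch below. The `neighbors` dictionary is a hand-written stand-in for the top-five similar words that an embedding model would return, and the grouping rule (each word plus its listed neighbors) is a simplifying assumption about how word groups are formed.

```python
from collections import Counter


def build_normalization_map(tokens, neighbors):
    """Map each word to the most frequent word among itself and its similar words."""
    freq = Counter(tokens)
    mapping = {}
    for word in freq:
        # Only neighbors that actually occur in the corpus can replace a word.
        group = [word] + [w for w in neighbors.get(word, []) if w in freq]
        # max() returns the first maximum, so ties keep the original word.
        mapping[word] = max(group, key=freq.__getitem__)
    return mapping


tokens = ["hiv", "hiv", "hiv", "hiv-1", "aids", "hiv", "mouse"]
# Hypothetical similar-word lists standing in for embedding-model output.
neighbors = {
    "hiv": ["hiv-1", "aids"],
    "hiv-1": ["hiv", "aids"],
    "aids": ["hiv", "hiv-1"],
}

mapping = build_normalization_map(tokens, neighbors)
normalized = [mapping[t] for t in tokens]
print(normalized)
```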

Dataset recommendation

The most similar datasets can be recommended to researchers simply by comparing the cosine similarity of the researcher and dataset vectors using Equation ( 3 ):

where D is all the datasets that can be recommended to researcher r ; |$\cos(v_r, v_d)$| is the cosine similarity between researcher vector ( v r ) and dataset vector ( v d ).
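A minimal sketch of ranking datasets by cosine similarity is shown here, using sparse dictionaries as vectors. The function names and the dataset identifiers are hypothetical, and breaking ties by identifier is an added assumption to keep the output deterministic.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(researcher_vec, dataset_vecs, top_k=10):
    """Return the top_k (dataset_id, score) pairs, most similar first."""
    scored = [(cosine(researcher_vec, vec), ds_id) for ds_id, vec in dataset_vecs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(ds_id, score) for score, ds_id in scored[:top_k]]


researcher = {"hiv": 1.0, "vaccine": 1.0}
datasets = {
    "DS_A": {"hiv": 2.0},
    "DS_B": {"mouse": 1.0},
    "DS_C": {"hiv": 1.0, "vaccine": 1.0},
}
ranking = recommend(researcher, datasets)
print([ds_id for ds_id, _ in ranking])
```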

The high-level system architecture of the dataset recommendation system is shown in Figure  4 . The process is initiated when a researcher (user) submits his/her name and CV (or list of publications). The name is searched in PubMed for publication details, and the titles of publications from PubMed are then matched with the publication titles in the CV. The matched publications are clustered using DPMM to identify the research fields of the researcher. Finally, the top similar datasets are recommended using the cosine similarity between the researcher vector (or researcher’s cluster vector) and the dataset vectors. The researcher vector (or researcher’s cluster vector) is calculated using Equation ( 1 ).

Three dataset recommendation systems are evaluated in this article: a baseline method using the researcher’s vector generation method and two proposed methods using the proposed researcher’s vector generation method.

Baseline system

The baseline system uses the researcher’s vector using Equation ( 1 ) of the baseline method in Section  3.2.1 . The top datasets are recommended after calculating the cosine similarity between the researcher’s vector and dataset vectors. This system reflects only one research field for each researcher.

MIDR System

The cluster vectors are generated using a modified version of Equation ( 1 ). Here, cluster-specific research area vectors are created for each researcher, instead of a single vector for each researcher as in the baseline system. The vectors of the papers in a single cluster are multiplied by their recency factors and summed. Then, the sum is divided by the number of papers in that cluster.

This system uses multiple pseudo vectors for multiple clusters of a researcher ( ⁠|$v_{c_i}$| for i th cluster), indicating different research fields that a researcher might have, as mentioned in Section 3.2.2 .

This system compares each cluster vector with the dataset vectors and recommends the top datasets by computing the cosine similarity among them. Finally, it merges all the recommended datasets in a round-robin fashion for all the clusters, so that the researcher is able to see various datasets related to different research fields together.
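The round-robin merge can be sketched as below: one dataset is taken from each cluster's ranked list in turn until the requested number is reached. Skipping a dataset that was already taken from another cluster is an assumption, as the text does not say how duplicates are handled.

```python
from itertools import zip_longest


def round_robin_merge(ranked_lists, limit=10):
    """Interleave per-cluster rankings, taking one item per cluster in each pass."""
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):
        for ds_id in tier:
            # None marks a cluster whose list is exhausted; skip repeats as well.
            if ds_id is None or ds_id in seen:
                continue
            seen.add(ds_id)
            merged.append(ds_id)
            if len(merged) == limit:
                return merged
    return merged


# Hypothetical ranked dataset ids for three publication clusters.
cluster_rankings = [["A1", "A2", "A3"], ["B1", "A1"], ["C1"]]
print(round_robin_merge(cluster_rankings))
```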

MIDR System (Separate)

This system is an extension of our proposed MIDR system. Some researchers liked the way recommended datasets were merged. However, other researchers wanted dataset recommendations for each cluster separately. For this reason, another system was developed where the recommended datasets were shown separately for each research cluster, allowing researchers to obtain different recommended datasets for different research interests.

Number of clusters with varying α values for proposed α based on our initial evaluation. Abbreviations: P: Proposed, a: total number of clusters, b: number of clusters which contains more than one paper, c: number of clusters which contains only one paper

Tuning the α parameter

A researcher with a higher number of publications is more likely to have more research interests. In this paper, research interests are represented as clusters, expressed as vectors. A Dirichlet process is non-parametric because, in theory, there can be an infinite number of clusters. By changing the α parameter, DPMM can vary the number of clusters. The α value is inversely related to the number of clusters, i.e. decreasing the α parameter in DPMM may increase the number of output clusters. Therefore, we propose an α value, which is also inversely related to the number of research publications. Further, the α value must stabilize after a certain threshold to avoid the formation of too many clusters, and it must be generalized to the number of publications. To this end, α is calculated as follows:

where N is the total number of papers for a researcher. The α value is proposed based on manually observing the clusters and collecting feedback from different researchers. Apart from inherent requirements for setting α , Equation ( 4 ) maintains a reasonable number of clusters, which was found useful by most of the evaluators.

Different α values and their corresponding number of clusters are provided in Table  2 . The number of clusters is divided into three categories: (a) total number of clusters, (b) number of clusters that contain more than one paper and (c) number of clusters that contain only one paper. We removed the clusters with one paper and used the clusters with two or more papers for recommending datasets. We observed that the number of clusters did not entirely depend on the number of papers a researcher had. Rather, it largely reflected the number of research fields that the researcher participated in. For example, Researcher 2 had fewer publications than Researcher 1 and Researcher 3, but more clusters than either of them. This suggests that non-parametric clustering is a good technique for segmenting research areas.

No labeled, clustered publication datasets exist for automatic evaluation, and manually evaluating the clusters is time- and resource-consuming. Manual evaluation might also be biased, as it depends on the differing judgments of individual researchers. Thus, we implemented K-Means for comparison with the proposed DPMM. The automatic cluster comparison was performed using intra-cluster cosine similarity (IACCS) and inter-cluster cosine similarity (ICCS) of words and MeSH terms in the publications, separately. IACCS is the mean cosine similarity of words or MeSH terms over each pair of papers in a given cluster. Considering a cluster of size n ( |$X=\{x_1, x_2, \dots x_n\}$| ), the IACCS can be formulated using Equation ( 5 ):

|$\mathrm{IACCS}(X) = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \cos(x_i, x_j)$| (5)

where x i and x j are the lists of MeSH terms or words of the i th and j th paper, respectively, and |$\cos(x_i, x_j)$| is the cosine similarity between them. Finally, the mean IACCS was calculated from the IACCS of the individual clusters.

To calculate the inter-cluster cosine similarity (ICCS), we computed the mean cosine similarity between the words or MeSH terms of the papers in each pair of clusters. Considering n clusters ( |$c_1, c_2, \dots c_n$| ), ICCS can be formulated using Equation ( 6 ):

|$\mathrm{ICCS} = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \cos(c_i, c_j)$| (6)

where c i and c j are the lists of MeSH terms or words of all the papers in the i th and j th clusters, respectively, and |$\cos(c_i, c_j)$| is the cosine similarity between them.
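Both measures reduce to a mean pairwise cosine similarity, which the sketch below computes over term lists. Representing each term list as a raw count vector and returning 0 for a cluster with fewer than two papers are assumptions, since the text does not specify the vector weighting or that edge case.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(terms_a, terms_b):
    """Cosine similarity between two term lists, using raw term counts."""
    ca, cb = Counter(terms_a), Counter(terms_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pairwise_cosine(items):
    """Mean cosine similarity over all unordered pairs; 0.0 if there is no pair."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def iaccs(cluster):
    """Intra-cluster similarity: cluster is a list of per-paper term lists."""
    return mean_pairwise_cosine(cluster)


def iccs(clusters):
    """Inter-cluster similarity: pool the terms of each cluster, then compare clusters."""
    pooled = [[t for paper in cluster for t in paper] for cluster in clusters]
    return mean_pairwise_cosine(pooled)


clusters = [
    [["hiv", "vaccine"], ["hiv", "trial"]],
    [["mouse", "genomics"], ["mouse", "gene"]],
]
mean_iaccs = sum(iaccs(c) for c in clusters) / len(clusters)
print(mean_iaccs, iccs(clusters))
```

A good clustering yields a high mean IACCS (papers within a cluster resemble each other) together with a low ICCS (clusters are distinct from one another).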

For the baseline comparison, publication vectors are created using TF-IDF, then K-Means is used to compute the publication clusters. K-Means is a parametric unsupervised clustering method. We implemented K-Means with two and five clusters separately for comparison purposes. On the other hand, the tuning parameter proposed for DPMM resulted in a variable number of clusters for different researchers, and these clusters were used for comparison.

Recommendation system

Mean IACCS and ICCS for K-Means and DPMM (with different cluster sizes as mentioned in Table  2 ).

Because this is a novel task, no prior ground truth annotations exist for publication-driven dataset recommendation. Thus, we performed a manual evaluation for each developed dataset recommendation system. We asked researchers to rate each retrieved dataset based on their publications or publication clusters. The researchers included in this study have already worked on datasets from GEO and published papers on these datasets. The rating criterion was how likely they were to want to work on the retrieved datasets. We asked them to rate using one to three ‘stars’, with three stars being the highest score. Later, normalized discounted cumulative gain (NDCG) at 10 and Precision at 10 (P@10) were calculated to evaluate different systems. The ratings are:

1 star [not relevant] : This dataset is not useful at all.

2 star [partially relevant] : This dataset is partially relevant to the publication cluster. The researcher has already used this dataset or may work on it in the future.

3 star [most relevant] : This dataset is most relevant to the publication cluster, and the researcher wants to work on this dataset as soon as possible.

The primary evaluation metric used in this work is NDCG, which is a family of ranking measures widely used in IR applications. It has advantages compared to many other measures. First, NDCG allows each retrieved document to have a graded relevance, while most traditional ranking measures only allow binary relevance (i.e. each document is viewed as either relevant or not relevant). This enables the three-point scale to be directly incorporated into the evaluation metric. Second, NDCG involves a discount function over the rank while many other measures uniformly weight all positions. This feature is particularly important for search engines as users care about top-ranked documents much more than others ( 24 ). NDCG is calculated as follows:

where $rating(i)$ is the rating of the $i$th dataset provided by users. For the present study, we set $p = 10$ for the simplicity of manual annotation.

The NDCG@10 for the baseline and MIDR systems is calculated using the ratings of only the top ten retrieved datasets. For the MIDR system (separate), there were multiple publication clusters for a single user, and for each publication cluster we recommended datasets separately. NDCG@10 was calculated for each publication cluster using the top ten datasets and later averaged to get a final NDCG@10 for a single researcher. For NDCG@10 calculation, the 1-star, 2-star and 3-star are converted to 0, 1 and 2, respectively. We also calculated P@10 (strict and partial) for the baseline and proposed systems. Strict considers only 3-star, while partial considers both 2- and 3-star results. The results presented in this study were evaluated using a total of five researchers (with an average of 32 publications) who already worked on GEO datasets. This is admittedly a small sample size, but is large enough to draw coarse comparisons on this novel task.
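The scoring procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the DCG uses a linear gain with a log2 rank discount (one common variant; the text does not show which form was used), the ideal ranking is taken as the best reordering of the same top ten ratings, and returning 0.0 when every gain is zero is a convention of this sketch that may differ from the authors' handling.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount (assumed form)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(stars, k=10):
    """NDCG@k for 1-3 star ratings given in ranked order."""
    gains = [s - 1 for s in stars[:k]]        # 1/2/3 stars -> 0/1/2, as in the text
    ideal = dcg(sorted(gains, reverse=True))  # best reordering of the same ratings
    return dcg(gains) / ideal if ideal else 0.0  # all-zero case: sketch convention


def precision_at_k(stars, k=10, strict=False):
    """P@k: strict counts 3-star only, partial counts 2- and 3-star."""
    threshold = 3 if strict else 2
    return sum(s >= threshold for s in stars[:k]) / k


def mean_over_clusters(per_cluster_stars, k=10):
    """Average NDCG@k over a researcher's clusters, as for MIDR (separate)."""
    return sum(ndcg_at_k(s, k) for s in per_cluster_stars) / len(per_cluster_stars)


ratings = [3, 2, 3, 1, 2, 1, 1, 3, 2, 1]  # made-up ratings for ten datasets
print(ndcg_at_k(ratings))
print(precision_at_k(ratings), precision_at_k(ratings, strict=True))
```

For the made-up ratings above, six of the ten datasets have two or more stars and three have three stars, so partial and strict P@10 are 0.6 and 0.3.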

We compared DPMM clustering with K-Means as mentioned in Section 3.5. ICCS and mean IACCS values for the different clustering methods are presented in Table 3. In general, a higher mean IACCS and a lower ICCS indicate better clustering. However, this is not always the case, especially when the number of clusters is small and each cluster contains multiple publications for a single researcher. In this situation, the IACCS of an individual cluster decreases after being divided by the number of publication pairs in that cluster. Furthermore, DPMM and K-Means were comparable when the numbers of clusters produced by both were close to each other. In all cases, DPMM had a higher mean IACCS and a lower ICCS than K-Means using words. This suggests that DPMM was well-suited for clustering a researcher’s publications into multiple research fields.

Researcher-specific results of the dataset recommendation system are shown in Table 4. The results for individual researchers are listed for all the systems. Metric-specific average results for all the systems are also shown in Table 4. The baseline system did not have any publication clusters, and all publications were vectorized using Equation (1). Next, the top ten similar datasets were used to evaluate the results of the baseline system, which obtained an average NDCG@10, P@10 (P) and P@10 (S) of 0.80, 0.69 and 0.45, respectively.

The proposed MIDR system obtained an average NDCG@10, P@10 (P) and P@10 (S) of 0.89, 0.78 and 0.61, respectively. The proposed MIDR (separate) system obtained an average NDCG@10, P@10 (P) and P@10 (S) of 0.62, 0.45 and 0.31, respectively. To calculate NDCG@10 and P@10 for the proposed MIDR (separate) system, the scores of the individual clusters were calculated first, and their sum was then divided by the total number of clusters.

NDCG@10, partial and strict P@10 values of the different dataset recommendation systems based on three evaluators. Abbreviations: Partial: P; Strict: S

The proposed MIDR system performed better than the baseline system. The MIDR system recommended a variety of datasets spanning multiple clusters/research fields, whereas the baseline system recommended datasets from a single research field, the one with the maximum number of publications.

The performance of the baseline and proposed MIDR (separate) systems could not be directly compared. Evaluation of the MIDR (separate) system was performed over multiple clusters, with ten datasets recommended for each cluster. In contrast, evaluation of the baseline system was performed on only 10 datasets. For example, for Researcher 1 in Table 4, the baseline system and the MIDR (separate) system were evaluated on 10 and 50 datasets, respectively. The MIDR (separate) system had other advantages over the baseline system, despite the higher NDCG@10 of the latter. The baseline system had a bias toward a specific research field, which was eliminated in the MIDR (separate) system. For Researchers 1 and 2 in Table 4, the datasets recommended by the baseline system were found in the results of two clusters/research fields (which had the maximum number of publications) in the proposed MIDR (separate) system. However, for Researcher 3 in Table 4, the datasets recommended by the baseline system were found in the results of only one research field (with the maximum number of publications) in the proposed MIDR (separate) system.

For Researcher 1 in Table 4, there were 31 papers with HIV keywords, and those papers were not published recently. We penalized the papers according to the year of publication for all methods. However, the top datasets for the baseline method still contained ‘HIV’ or related keywords. We manually checked the top 100 results and found that they were relevant to HIV. In contrast, the proposed MIDR system clustered the publications into different groups (such as HIV, Flu/Influenza, and others), which resulted in recommendations for different research fields. Therefore, Researcher 1 had the flexibility to choose datasets after looking at the preferred clusters in the proposed MIDR or MIDR (separate) system.

Similarly, the results of the MIDR and MIDR (separate) systems could not be directly compared. Evaluation of the MIDR system was based on 10 datasets recommended for each researcher, whereas evaluation of the MIDR (separate) system was based on 10 recommended datasets for each research field (cluster), which could amount to more than 10 datasets if a researcher had more than one research field (cluster). Hence, the NDCG@10 and P@10 scores of the MIDR (separate) system were lower than those of the MIDR system.

For researchers looking to find specific types of datasets, a keyword-based IR system might be more useful. Researchers who generally wanted to find datasets related to their interests, but did not have a particular interest in mind, could benefit from our system. For instance, if a researcher wanted a regular update of datasets relevant to their interests, our method would be better suited. However, the proposed system may not be useful to early-stage researchers, who have fewer publications. They may take advantage of the available dataset retrieval systems such as DataMed, Omicseq and Google Dataset Search, or the text-based dataset search that we provide on the website.

Error analysis

For some clusters, evaluators rated all recommended datasets as one star. In most of these cases, we observed that the research field of that cluster was out of the scope of GEO. In this case, the NDCG@10 score was close to 1, but the P@10 score was 0. This may be one of the reasons why NDCG@10 scores were much higher compared to P@10 scores.

Screenshots of dataset recommendation system Researcher 1 (up) and Researcher 2 (down).


Initially, we had not identified whether the clusters were related to GEO or not. We recommended datasets for these unrelated clusters. For example, Researcher 2 in Table  4 had a paper cluster which was related to statistical image analysis. For this specific cluster, Researcher 2 rated all the recommended datasets as one star, which reduced the scores of the systems.

Later, we identified a threshold by averaging the similarity scores of publications and datasets for each cluster, and were able to remove the clusters that were not related to GEO. The threshold was set to 0.05 for the present study, i.e. a cluster was not considered for evaluation or for showing recommendations if the average similarity score of the top 10 datasets for that cluster was less than or equal to 0.05. This threshold technique improved the results of the proposed systems by 3% for Researcher 2. However, a thorough investigation of the threshold involving datasets from different biomedical domains is needed in future work.
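The thresholding rule above amounts to a simple filter. The sketch below is an illustration only: the function name, the cluster labels and the similarity scores are all hypothetical, and the scores are assumed to be the publication-to-dataset similarities already computed for each cluster.

```python
def keep_cluster(similarities, threshold=0.05, top_n=10):
    """Keep a cluster only if the mean of its top_n similarity scores exceeds threshold."""
    top = sorted(similarities, reverse=True)[:top_n]
    if not top:
        return False  # assumption: a cluster with no scored datasets is dropped
    return sum(top) / len(top) > threshold


# Hypothetical clusters with made-up similarity scores for their candidate datasets.
cluster_scores = {
    "hiv": [0.31, 0.28, 0.27, 0.22, 0.20, 0.19, 0.18, 0.15, 0.12, 0.11, 0.02],
    "image_analysis": [0.04, 0.04, 0.03, 0.03, 0.03, 0.02, 0.02, 0.02, 0.01, 0.01],
}
kept = [name for name, scores in cluster_scores.items() if keep_cluster(scores)]
print(kept)
```

With these made-up scores only the `hiv` cluster survives, mirroring how a cluster outside the scope of GEO would be excluded from evaluation and recommendation.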

Further, a dislike button for each cluster may be provided, and users may press the dislike button if that cluster is not related to GEO datasets. Later, this information can be used to build a machine learning-based system to identify and remove such clusters from further processing. This will improve the usefulness and reduce time complexity of the proposed recommendation system.

Limitations

The researchers’ names are searched in PubMed to collect their publications. Many recent conference/journal publications are not yet indexed in PubMed. Further, if most of a researcher’s publications do not belong to the biomedical domain, there is a low chance of finding those papers in PubMed. This makes the dataset recommendation task harder. Authors might later be able to include a subset of their non-PubMed articles for consideration in dataset recommendation (e.g. bioRxiv preprints), but this work is currently limited to PubMed publications only.

We used PubMed name search to find the titles of a researcher’s papers. Finally, the titles were matched with the text in the CV to get publications. If there is any typo in the CV, then that publication would be rejected from being processed in further steps. As we do not fully parse the CV, instead just performing string matching to find publications, there is a high chance of rejecting publications with small typos.
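The sensitivity to typos follows directly from exact string matching, as the sketch below shows. It is an illustration only: the function name is hypothetical, and the case folding and whitespace normalization are assumptions of this sketch rather than documented steps of the authors' pipeline.

```python
def match_titles(pubmed_titles, cv_text):
    """Return the PubMed titles that occur verbatim in the CV text.

    Assumption of this sketch: matching ignores case and runs of whitespace.
    """
    normalized_cv = " ".join(cv_text.lower().split())
    return [
        title
        for title in pubmed_titles
        if " ".join(title.lower().split()) in normalized_cv
    ]


titles = [
    "A content-based dataset recommendation system for researchers",
    "Information retrieval for biomedical datasets",
]
# The second title is misspelled in the CV ("retreival"), so it is rejected.
cv = """Publications
A content-based dataset recommendation   system for researchers. 2020.
Information retreival for biomedical datasets. 2017.
"""
print(match_titles(titles, cv))
```

A fuzzy matcher with an edit-distance tolerance would recover the misspelled entry, but that is a different technique from the exact matching described in the text.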

The manual evaluation was performed by only five researchers. For each cluster, 10 datasets were recommended, and each researcher had to evaluate an average of 40 datasets. It was a time-consuming task for evaluators to check each of the recommended datasets. The manual evaluation required human judges with expertise in the GEO datasets, who were challenging to find. Further research will entail scaling this evaluation process.

GETc Platform

We developed the GETc research platform, which recommends datasets to researchers using the proposed methods. A researcher needs to provide his/her name (as in PubMed) and CV (or list of publications) on the website. After processing his/her publications collected from PubMed, the recommendation system recommends datasets from GEO. Researchers can provide feedback on the datasets recommended by our system based on the evaluation criteria mentioned in Section 3.5. A screenshot of the dataset recommendation system is shown in Figure 5. This platform also recommends datasets based on texts/documents: the cosine similarity between the text and each dataset is calculated, and datasets with a high score are recommended to users. Apart from dataset recommendation, it can also recommend literature and collaborators for each dataset. The platform analyzes time-course datasets using a specialized analysis pipeline (http://genestudy.org/pipeline) (25). We believe that these functions implemented in the GETc platform will significantly improve the reusability of datasets.

This work is the first step toward developing a dataset recommendation tool to connect researchers to relevant datasets they may not otherwise be aware of. The maximum NDCG@10, P@10 (P) and P@10 (S) of 0.89, 0.78 and 0.61 were achieved by the proposed method (MIDR) using five evaluators. This recommendation system will hopefully lead to greater biomedical data reuse and improved scientific productivity. Similar dataset recommendation systems can be developed for different datasets from both biomedical and other domains.

The next goal is to identify the clusters that are not related to the datasets but were nevertheless used for recommendations in the present article. These clusters can be removed from further experiments. Later, we plan to implement other embedding methods and test the dataset recommendation system on a much larger number of users. A user-specific feedback-based system can be developed to remove datasets from the recommendations. Several additional dataset repositories can be added in the future. Other APIs can also be added to retrieve a more complete representation of a researcher’s publication history.

Availability:   http://genestudy.org/recommends/#/

We thank Drs. H.M, J.T.C, A.G and W.J.Z. for their help in evaluating the results and comments on designing that greatly improved the GETc research platform.

This project is mainly supported by the Center for Big Data in Health Sciences (CBD-HS) at School of Public Health, University of Texas Health Science Center at Houston (UTHealth) and partially supported by the Cancer Prevention and Research Institute of Texas (CPRIT) project RP170668 (K.R., H.W.) as well as the National Institutes of Health (NIH) (grant R00LM012104) (K.R.).

None declared.

Chen X. et al. (2018) Datamed–an open source discovery index for finding biomedical datasets. Journal of the American Medical Informatics Association, 25, 300–308. doi: 10.1093/jamia/ocx121

Roberts K. et al. (2017) Information retrieval for biomedical datasets: the 2016 biocaddie dataset retrieval challenge. Database, 2017, 1–9. doi: 10.1093/database/bax068

Cohen T. et al. (2017) A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 biocaddie dataset retrieval challenge. Database, 2017, 1–10. doi: 10.1093/database/bax061

Karisani P., Qin Z.S. and Agichtein E. et al. (2018) Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval. Database, 2018, 1–12. doi: 10.1093/database/bax104

Wright T.B., Ball D. and Hersh W. et al. (2017) Query expansion using mesh terms for dataset retrieval: Ohsu at the biocaddie 2016 dataset retrieval challenge. Database, 2017, 1–9. doi: 10.1093/database/bax065

Scerri A. et al. (2017) Elsevier’s approach to the biocaddie 2016 dataset retrieval challenge. Database, 2017, 1–12. doi: 10.1093/database/bax056

Wei W. et al. (2018) Finding relevant biomedical datasets: the UC San Diego solution for the biocaddie retrieval challenge. Database, 2018, 1–10. doi: 10.1093/database/bay017

Sun X. et al. (2017) Omicseq: a web-based search engine for exploring omics datasets. Nucleic acids research, 45, W445–W452. doi: 10.1093/nar/gkx258

Jansen B.J. et al. (2007) Determining the user intent of web search engine queries. In Proceedings of the 16th international conference on World Wide Web. ACM, Banff, Alberta, Canada, pp. 1149–1150.

Achakulvisut T. et al. (2016) Science concierge: A fast content-based recommendation system for scientific publications. PloS one, 11, e0158423. doi: 10.1371/journal.pone.0158423

Patra B.G. et al. (2020) A content-based literature recommendation system for datasets to improve data reusability. A case study on Gene Expression Omnibus (GEO) datasets. Journal of Biomedical Informatics, 104, 103399. doi: 10.1016/j.jbi.2020.103399

Sansone S.-A. et al. (2017) Dats, the data tag suite to enable discoverability of datasets. Scientific data, 4, 170059. doi: 10.1038/sdata.2017.59

Ellefi M.B. et al. (2016) Dataset recommendation for data linking: An intensional approach. In European Semantic Web Conference. Springer, Heraklion, Crete, Greece, pp. 36–51.

Nunes B.P. et al. (2013) Combining a co-occurrence-based and a semantic measure for entity linking. In Extended Semantic Web Conference. Springer, Montpellier, France, pp. 548–562.

Srivastava K.S. (2018) Predicting and recommending relevant datasets in complex environments. US Patent App. 15/721,122.

Ghavimi B. et al. (2016) Identifying and improving dataset references in social sciences full texts. arXiv preprint arXiv:1603.01774. Positioning and Power in Academic Players, Agents and Agendas, IOS Press, pp. 105–114. doi: 10.3233/978-1-61499-649-1-105

Piwowar H.A. and Chapman W.W. (2008) Identifying data sharing in biomedical literature. In AMIA Annual Symposium Proceedings. American Medical Informatics Association, Washington, D.C., USA, Vol. 2008, p. 596.

Prasad A. et al. (2019) Dataset mention extraction and classification. In Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications. Association for Computational Linguistics, Minneapolis, Minnesota, USA, pp. 31–36.

Li Z., Li J. and Yu P. et al. (2018) Geometacuration: a web-based application for accurate manual curation of gene expression omnibus metadata. Database, 2018, 1–8. doi: 10.1093/database/bay019

Chen G. et al. (2019) Restructured geo: restructuring gene expression omnibus metadata for genome dynamics analysis. Database, 2019, 1–8. doi: 10.1093/database/bay145

Neal R.M. (2000) Markov chain sampling methods for dirichlet process mixture models. Journal of computational and graphical statistics, 9, 249–265.

Yin J. and Wang J. (2016) A model-based approach for text clustering with outlier detection. In Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, Helsinki, Finland, pp. 625–636.

Lenoir T. and Giannella E. (2006) The emergence and diffusion of dna microarray technology. Journal of biomedical discovery and collaboration, 1, 11. doi: 10.1186/1747-5333-1-11

Wang Y. et al. (2013) A theoretical analysis of ndcg ranking measures. In 26th Annual Conference on Learning Theory (COLT 2013). PMLR, Princeton, NJ, USA, Vol. 8.

Carey M. et al. (2018) A big data pipeline: Identifying dynamic gene regulatory networks from time-course gene expression omnibus data with applications to influenza infection. Statistical methods in medical research, 27, 1930–1955. doi: 10.1177/0962280217746719

Author notes

Citation details: Patra,B.G., Roberts,K., Wu,H., A content-based dataset recommendation system for researchers—a case study on Gene Expression Omnibus (GEO) repository. Database (2020) Vol. 00: article ID baaa064; doi:10.1093/database/baaa064


Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations

Randomly-hashed item IDs are used ubiquitously in recommendation models. However, the representations learned from random hashing prevent generalization across similar items, causing problems in learning unseen and long-tail items, especially when the item corpus is large, power-law distributed, and evolving dynamically. In this paper, we propose using content-derived features as a replacement for random IDs. We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability. To strike a good balance between memorization and generalization, we propose to use Semantic IDs (Rajput et al., 2023), a compact discrete item representation learned from frozen content embeddings using RQ-VAE that captures the hierarchy of concepts in items, as a replacement for random item IDs. Similar to content embeddings, the compactness of Semantic IDs poses a problem of easy adaptation in recommendation models. We propose novel methods for adapting Semantic IDs in industry-scale ranking models, through hashing sub-pieces of the Semantic ID sequences. In particular, we find that the SentencePiece model (Kudo, 2018) that is commonly used in LLM tokenization outperforms manually crafted pieces such as N-grams. To this end, we evaluate our approaches in a real-world ranking model for YouTube recommendations. Our experiments demonstrate that Semantic IDs can replace the direct use of video IDs by improving the generalization ability on new and long-tail item slices without sacrificing overall model quality.

1 Introduction

Neural models with large embedding tables are widely used in industry-scale recommender systems for scoring and ranking vast collections of items. These tables, often containing millions or even billions of rows, facilitate rapid memorization of item quality by modeling randomly-hashed item identifiers. It’s worth noting that learning good item representations is crucial for personalization, as users are typically modeled as a sequence of items. Concretely, in this paper, we consider a neural ranking model in a video recommendation system at YouTube. In this model, every video gets a unique identifier referred to as video ID, which is a random string devoid of meaning. This approach is widely adopted in numerous industry-scale recommender systems (e.g., Cheng et al. (2016); Kim et al. (2007); Koren et al. (2009); Zhao et al. (2019b)).

In this paper, we study content-based item representations that can improve generalization for new and long-tail item distributions while keeping the models’ power of memorization and without sacrificing overall quality, with a focus on recommendation ranking models. A common technique for encoding item IDs is to learn one-hot embeddings. However, given an extremely large item corpus with billions of videos, learning one embedding vector per video can be resource-intensive and, more importantly, is vulnerable to the data sparsity of torso and tail items. To use a limited number of embeddings, an alternative approach is the hashing trick (Weinberger et al., 2009), which maps many items to the same row. This approach can cause random collisions when the original item IDs are not semantically meaningful. When it comes to using content embeddings from pre-trained multimodal item encoders, it is unclear whether a large item ID table can be fully replaced, due to the loss of item-level memorization. Yuan et al. (2023) show that frozen item embeddings outperformed item ID baselines for SASRec (Kang and McAuley, 2018), but not for two-tower models (Rendle et al., 2020), on datasets with a corpus of up to 150k items. In our experiments at YouTube with a much larger corpus, we observed a significant quality reduction (Section 4.2) when video IDs were replaced with content embeddings. A recent study (Ni et al., 2023) has demonstrated the effectiveness of video encoders that use end-to-end training (VideoRec) to replace video ID in recommendation models for short videos. However, this approach comes with 10-50x the computational cost of the ID baseline.

We propose a new framework for adapting content embeddings in ranking models with the flexibility of controlling generalization and memorization. Our method is based on item Semantic IDs (SIDs), which were originally proposed in TIGER (Rajput et al., 2023) as a hierarchical, sequential and compact representation for generative retrieval. The hierarchical nature of SIDs offers the flexibility of granularity control by using various levels of prefixes, and the sequential property draws the connection to subword tokenization, e.g., the SentencePiece model (SPM) (Kudo, 2018) in LLMs. Notably, TIGER (Rajput et al., 2023) uses SIDs for generative retrieval, where efficiency is not a primary consideration, while our work focuses on using Semantic IDs in resource-constrained and latency-sensitive production-scale ranking models, where hashing and adaptation through embeddings are key.

The detailed contributions are: (1) We propose two ways of adapting SIDs in recommendation models as a replacement for item IDs: n-gram and SPM. For both of them, the key idea is to create content-based hashing through sub-pieces of item SIDs, while SPM provides an approach that is learnable from the item distribution by grouping sub-pieces with variable lengths; (2) We conduct extensive experiments on the YouTube dataset to demonstrate the effectiveness of our approaches. To that end, we show that SID-based adaptation outperforms directly using content embeddings. We also demonstrate the superior performance of SPM over n-gram when using large embedding tables with the same number of embedding lookups per item; (3) We also demonstrate the productionization of SIDs for a corpus of billions of videos in YouTube with examples of meaningful and granular hierarchical relationships, along with the success of replacing video IDs in the product scenario.

2 Related Work

Embedding learning.

Recommender models rely on learning good representations of categorical features. A common technique to encode categorical features is to train embeddings using one-hot embeddings. Word2vec (Mikolov et al., 2013) popularized this in the context of language models. The hashing trick (Weinberger et al., 2009) is typically used when the cardinality is high, but it causes random collisions. Multiple hashing (Zhang et al., 2020) offers some relief but still leads to random collisions. Deep Hash Embedding (Kang et al., 2021) circumvents this problem by not maintaining embedding tables, but at the cost of increased computation in the hidden layers. In contrast, we use Semantic IDs, a compute-efficient way to avoid random collisions during embedding learning for item IDs. Semantic IDs improve generalization in recommender models by enabling collisions between semantically related items.
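The random collisions attributed to the hashing trick can be seen in a minimal sketch. The function name, the MD5-based hash and the `video_<n>` identifiers are illustrative assumptions, not the hashing scheme of any particular production system.

```python
import hashlib
from collections import defaultdict


def hashed_row(item_id: str, num_rows: int) -> int:
    """Map an item ID to an embedding-table row with a deterministic hash (MD5 here)."""
    digest = hashlib.md5(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_rows


num_rows = 8  # table far smaller than the item corpus
rows = defaultdict(list)
for i in range(100):
    item_id = f"video_{i}"  # hypothetical random-looking IDs
    rows[hashed_row(item_id, num_rows)].append(item_id)

# Items that share a row share one embedding, regardless of their content.
for row, items in sorted(rows.items()):
    print(row, items[:3], f"({len(items)} items)")
```

Because the IDs carry no meaning, which items end up sharing a row is arbitrary; the approach discussed in the text instead aims for collisions between semantically related items.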

Cold-start and content information

Content-based recommender models have been proposed to combat cold-start issues (e.g., Schein et al. (2002), Volkovs et al. (2017b)) and to enable transferable recommendations (Wang et al. (2022), Hou et al. (2022), Ni et al. (2023)). Recently, embeddings derived from content information have also become popular (e.g., DropoutNet (Volkovs et al., 2017a), CC-CC (Shi et al., 2019) and Du et al. (2020)). PinSage (Ying et al., 2018) aggregates visual, text, and engagement information to represent items. Moreover, PinnerFormer (Pancha et al., 2022) uses sequences of PinSage embeddings corresponding to item history to build a sequential recommendation model. In contrast to these efforts, our goal is to develop content-derived representations that not only generalize well but also improve performance relative to using item ID features, which is a significantly challenging task. Ni et al. (2023) introduce a large dataset of short videos and show that existing video encoders do not produce embeddings that are useful for recommendation purposes. They have successfully tackled the challenge of replacing video ID with content embeddings derived from video encoders that are trained end-to-end with the recommendation model for short videos. In a similar vein, TransRec (Wang et al., 2022) also trains end-to-end and uses multiple modality information to represent items for enabling transferable recommendations. However, both approaches significantly increase training costs, making them challenging to deploy in production. Furthermore, unlike PinnerFormer (Pancha et al., 2022), which is used for offline inference, our focus is to improve the generalization of a ranking model used for real-time inference. Therefore, approaches that significantly increase resource costs (including storage, training and serving) are infeasible to deploy in production. Semantic IDs offer an efficient compression of content embeddings into discrete tokens, making it feasible to use content signals in production recommendation systems.

Discrete representations

Several techniques exist to discretize embeddings, including VQ-VAE (Van Den Oord et al., 2017), VQ-GAN (Esser et al., 2021) and their variants used for generative modeling (e.g., Parti (Yu et al., 2022) and SoundStream (Zeghidour et al., 2021)). TIGER (Rajput et al., 2023) used RQ-VAE in the context of recommender applications. Conventional techniques like Product Quantization (Jegou et al., 2010) and its variants are used by many recommender models (e.g., MGQE (Kang et al., 2020) and Hou et al. (2022)). However, these do not offer hierarchical semantics, which we leverage in our work.

3 Proposed Approaches

3.1 Overview

Given content embeddings for a corpus of items, in contrast with the approach of directly using the embeddings as input feature, we propose an efficient two-stage approach to leverage content signal in downstream recommendation models.

Stage 1: Efficient compression of content embeddings into discrete Semantic IDs. We propose a Residual Quantization technique called RQ-VAE (Rajput et al., 2023; Lee et al., 2022; Zeghidour et al., 2021) to quantize dense content embeddings into discrete tokens that capture semantic information about videos. This compression is crucial for representing a user’s past history efficiently, because each item can be represented as a few integers rather than a high-dimensional embedding. Once trained, we freeze the RQ-VAE model and use it for training the downstream ranking model in Stage 2.

Stage 2: Training the ranking model with Semantic IDs. We use the model from Stage 1 to map each item to its Semantic ID and then train embeddings for Semantic IDs, along with the rest of the ranking model (Section 3.3). In practical scenarios, ranking models are typically trained sequentially on recently logged data.

A key design choice in our proposal is to train and then freeze the RQ-VAE model from Stage 1. The frozen RQ-VAE model generates Semantic IDs for training and serving the ranking model. Recent data may include items that may not exist in the training distribution of the RQ-VAE model. This raises a potential concern from freezing the model, which could hurt the performance of the ranking model over time. As detailed in Appendix  A.2 , our analysis of YouTube ranking models utilizing Semantic IDs derived from RQ-VAE models trained on both older and recent data reveals comparable performance, indicating the stability of learned semantic representations over time.

3.2 RQ-VAE for Semantic IDs (SIDs)

$\mathcal{L}_{rqvae}=\sum_{l=1}^{L}\beta\|\bm{r}_{l}-\text{sg}[\bm{e}_{c_{l}}]\|^{2}+\|\text{sg}[\bm{r}_{l}]-\bm{e}_{c_{l}}\|^{2}$ and sg denotes the stop-gradient operator. $\mathcal{L}_{recon}$ aims to reconstruct the content embedding $\bm{x}$. The first and second terms in $\mathcal{L}_{rqvae}$ encourage the encoder and the codebook vectors to be trained such that $\bm{r}_{l}$ and $\bm{e}_{c_{l}}$ move towards each other.
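The residual quantization step can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the encoder output and the per-level codebooks are already available as lists of floats. The function names `residual_quantize` and `rqvae_loss` and the toy codebooks are hypothetical, not the authors' implementation. Because the stop-gradient operator changes gradients but not forward values, both terms of the loss evaluate to the same squared distance here.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    z: Sequence[float], codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[Vector]]:
    """Greedy residual quantization (illustrative sketch).

    At each level l, pick the codebook vector e_{c_l} nearest to the
    current residual r_l, then pass r_l - e_{c_l} to the next level.
    Returns the codes (c_1..c_L) and the residuals (r_1..r_L).
    """
    residual = list(z)
    codes: List[int] = []
    residuals: List[Vector] = []
    for codebook in codebooks:
        residuals.append(residual)
        c = min(range(len(codebook)), key=lambda k: sq_dist(residual, codebook[k]))
        codes.append(c)
        residual = [r - e for r, e in zip(residual, codebook[c])]
    return codes, residuals


def rqvae_loss(
    residuals: Sequence[Vector],
    codes: Sequence[int],
    codebooks: Sequence[Sequence[Vector]],
    beta: float = 0.25,
) -> float:
    """Forward value of the quantization loss, summed over levels.

    sg[.] only affects gradients, so numerically both terms equal
    the squared distance between r_l and e_{c_l}.
    """
    total = 0.0
    for r, c, codebook in zip(residuals, codes, codebooks):
        d = sq_dist(r, codebook[c])
        total += beta * d + d
    return total


# Toy example with L = 2 levels and K = 2 codes per level (assumed values).
toy_codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.5, 0.0]]]
toy_codes, toy_residuals = residual_quantize([1.5, 1.0], toy_codebooks)
```

In this toy case the first level selects the code nearest to the input and the second level quantizes what is left over, so the sequence of selected indices plays the role of the Semantic ID.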

3.3 Semantic ID Representation in Ranking

In this section, we discuss how we model item representations derived from SIDs to use in ranking models. For a given item $v$, an RQ-VAE model with $L$ levels generates a SID as a sequence $(c^{v}_{1},\dots,c^{v}_{L})$. The idea of adaptation is to create subwords for hashing the SID sequence into a number of learnable embeddings. We propose two techniques for the adaptation:

SPM-based: While N-gram-based video representations offer a straightforward approach to capturing relationships between sequential codes in a Semantic ID, they suffer from two limitations. First, their reliance on fixed grouping based on predefined N-gram sizes restricts their ability to adapt to the specific characteristics of the Semantic ID corpus, leading to suboptimal embedding table lookups. Second, the number of rows in the N-gram embedding tables grows exponentially with N, imposing a significant memory burden. These challenges motivate the adaptation of Semantic IDs with SentencePiece Models (SPM) Kudo (2018), which offer a more adaptive and efficient solution for representing item content. We propose using SPM to dynamically learn Semantic ID subwords based on the distribution of impressed items. This allows variable-length subwords such that popular co-occurring codes are automatically combined into a single subword, whereas codes that rarely co-occur may fall back to unigrams. For the SPM-based representation, we learn a single embedding table where each row corresponds to a particular variable-length subpiece. By adaptively constructing subword vocabularies given a fixed embedding table size, the SPM vocabulary strikes a balance between generalization and memorization.
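To illustrate how variable-length subwords translate into embedding lookups, the sketch below segments a Semantic ID by greedy longest match against a small, hand-written vocabulary. This is a simplification and an assumption: a real SentencePiece model learns its vocabulary from data and segments probabilistically, and the names `segment_sid` and `toy_vocab` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# A subword is identified by its start level and the codes it covers.
Subword = Tuple[int, Tuple[int, ...]]


def segment_sid(sid: Sequence[int], vocab: Dict[Subword, int]) -> List[int]:
    """Greedy longest-match segmentation of a Semantic ID.

    Returns the embedding-table row of each subword. Assumes every
    single code (unigram) at every level is present in `vocab`, so
    rarely co-occurring codes can fall back to unigrams.
    """
    rows: List[int] = []
    i = 0
    while i < len(sid):
        # Try the longest span first and shrink until a vocabulary hit.
        for j in range(len(sid), i, -1):
            key = (i, tuple(sid[i:j]))
            if key in vocab:
                rows.append(vocab[key])
                i = j
                break
        else:
            raise KeyError(f"unigram {(i, sid[i])} missing from vocab")
    return rows


# Hypothetical vocabulary: unigram fallbacks plus one merged subword
# for a frequently co-occurring prefix (5, 9, 3).
toy_vocab: Dict[Subword, int] = {
    (0, (5,)): 0,
    (1, (9,)): 1,
    (2, (3,)): 2,
    (3, (7,)): 3,
    (0, (5, 9, 3)): 4,
    (2, (8,)): 5,
}

head_rows = segment_sid((5, 9, 3, 7), toy_vocab)  # popular prefix: fewer lookups
tail_rows = segment_sid((5, 9, 8, 7), toy_vocab)  # falls back to unigrams
```

The item with the popular prefix needs two lookups, while the item with a rare code combination needs one lookup per code, which mirrors the adaptive behavior described in the text.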

4 Experiments

4.1 Experimental Setup

Ranking Model. We conduct our experiments on a multitask production ranking model Tang et al. (2023); Zhao et al. (2019a), which is used for recommending the next video to watch, given the video a user is watching and the user’s past activities. This model uses O(10) million buckets for random hashing to accommodate O(100) million videos in our corpus and is trained sequentially on logged data. In the baseline, random hashing of video IDs is used for three key features: the user’s watch history, the watch video, and the candidate video to be ranked. We evaluate our methods on data that the trained model has not yet seen, allowing us to understand the performance under data-distribution shift of the video corpus.
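The random hashing baseline can be pictured as follows. This is a minimal sketch under stated assumptions: the hash function (BLAKE2b from Python's `hashlib`), the exact bucket count, and the name `hash_bucket` are illustrative choices, since the text only states that O(10) million buckets serve O(100) million videos.

```python
import hashlib

NUM_BUCKETS = 10_000_000  # assumed; the text only says O(10) million


def hash_bucket(video_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a video ID to an embedding-table row via a stable hash."""
    digest = hashlib.blake2b(video_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


# With more videos than buckets, unrelated videos must share rows.
tiny_rows = {hash_bucket(f"video-{i}", num_buckets=4) for i in range(20)}
```

Because the assignment ignores content, any two videos that collide share an embedding row regardless of how semantically different they are, which is the behavior Semantic IDs are meant to improve on.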

The inherent scale and real-time demands of ranking models necessitate embedding tables with specific characteristics to ensure efficient and effective performance. Firstly, the embedding table needs to fit easily in memory. This was one of our key considerations when deciding N in the N-gram-based Semantic ID representations. Since the number of rows in the embedding tables grows exponentially with N, we limit our analysis to $N\leq 2$ for N-gram-based representations. Secondly, the embedding lookups need to be fast to provide near-instantaneous responses to user requests. Our analysis is grounded in the above two properties.

Content Embeddings. Semantic IDs are generated using dense content embeddings. We use a video encoder to generate dense content embeddings for each YouTube video. The video encoder is a transformer model that uses VideoBERT Sun et al. (2019) as the backbone architecture, takes audio and visual features as inputs, and outputs 2048-dimensional embeddings that capture the topicality of the video. This model was trained using techniques described in Lee et al. (2020).

Experimental Settings. We compare the two proposed Semantic ID-based representations with two baseline representation techniques: directly using raw content embeddings, referred to as Dense Input, and the commonly used randomized hashed IDs, referred to as Random Hashing. Since directly using dense input embeddings as the item representation obviates the need for embedding table parameters, we also introduce additional baselines for the Dense Input approach for a fair comparison, where we increase the ranking model layers by 1.5x and 2x to study how increasing the model depth affects the ranking performance. To generate the Semantic IDs, we use a depth of $L=8$, resulting in 8 codes in the Semantic ID of each video. The codebook size for RQ-VAE was set to $K=2048$.

$N+1$)-th day. We refer to this as CTR/1D. CTR AUC and CTR/1D AUC metrics evaluate the model’s ability to generalize over time due to data distribution shifts and cold-start items, respectively. A $0.1\%$ change in CTR AUC is considered significant for our ranking model.

4.2 Performance of Semantic ID

Storing content embeddings for each video in users’ watch history is highly resource intensive. Hence, training a baseline large-scale ranking model that uses content embeddings to represent each video in users’ watch history is infeasible. To better understand which representation method performs better, we consider two settings of the ranking model. First, we compare the SID-based representation with raw content embeddings and random hashing based ID such that user history is not used as an input feature (Figure 2 ). In this setting, two video features (i.e., current and candidate video) are used as input features to the ranking model. In the second setting, we use users’ watch history as the input feature (along with current and candidate video), where the SID-based representation is compared with random hashing (Figure 3 ).

Dense Content Embedding vs. Random Hashing. We observe that directly using content embeddings (Dense Input) to replace random hashing-based IDs, without additional changes to the model architecture, does not lead to better quality. As shown in Figures 2(a)-2(b), the Dense Input baseline performs worse than the video-ID-based baseline. We hypothesize that the ranking models rely heavily on memorization from the ID-based embedding tables; replacing the embedding table with fixed dense content embeddings as a feature leads to poor CTR. To test this hypothesis, we also ran experiments with 1.5x-2x layers in the ranking model to increase the model’s memorization ability for the Dense Input baseline. We found that increasing the depth does improve quality for both overall and cold-start items compared to the random hashing baseline. In fact, the increase in CTR is higher for the Dense Input model with 2x layers than with 1.5x layers, indicating that more layers yield better memorization (overall CTR) and generalization (cold-start CTR/1D). However, increasing the number of layers can cause the serving cost to increase considerably. As discussed below, SIDs allow retaining the semantic information from raw content embeddings while still flexibly and efficiently providing memorization via learned embedding tables.

SID vs. Baselines. We compare the two types of SID representations (N-gram and SPM) with the baselines, where for N-gram-SID, we use Unigram (N=1) and Bigram (N=2). When using N-gram, the embedding table size is based on all the possible combinations for the respective N-gram, i.e., Unigram-SID has $8\times K$ rows and Bigram-SID has $4\times K^{2}$ rows. We found that both Unigram-SID and Bigram-SID lead to worse overall CTR compared to Random Hashing when the user history is not used as an input feature (Figure 2). This could be because of skew in the content in the training data, causing sparse usage of the embedding table. This issue does not occur in random hashing, since the embeddings are uniformly used due to the random assignment of videos to embeddings in the embedding table. On the other hand, when we use the user history as an input feature (Figure 3), both Unigram-SID and Bigram-SID perform much better than random hashing, likely because the videos in users’ watch history cover more diverse content, leading to more uniform usage of the embedding table. Next, we show gains from the SPM-SID-based video representations. SPM-SID consistently outperformed N-gram representations when employing larger embedding tables, which is particularly evident in the improved CTR/1D AUC metrics (see Figures 2(b) and 3(b)) and suggests greater generalization capabilities towards cold-start items. For smaller embedding table sizes, however, the picture is more nuanced: when the embedding table size is limited ($8\times K$ or $4\times K^{2}$), N-gram methods demonstrate a slight advantage over SPM-SID. This behavior can be attributed to the smaller subword vocabulary learned by SPM within these constrained table sizes, potentially hindering its ability to fully capture complex semantic relationships. Note that for most production ranking models, a large embedding table is necessary for good quality. Hence, the SPM-SID-based representation is more beneficial for large-scale production ranking models. Overall, both Bigram-SID and SPM-SID significantly outperformed random hashing in our experiments with large-scale ranking models, highlighting the importance of structured representations for capturing semantic relationships in improving cold-start video recommendations.
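The table sizes quoted above follow from simple arithmetic, which the sketch below reproduces for $L=8$ and $K=2048$. The generalization to arbitrary N and the row layout in `ngram_row_index` (one block of $K^{N}$ rows per group of N consecutive codes) are assumptions made for illustration; the text only states the Unigram and Bigram sizes.

```python
from typing import List, Sequence


def ngram_table_rows(num_levels: int, codebook_size: int, n: int) -> int:
    """Rows needed when each group of n consecutive codes gets its own
    block of codebook_size ** n rows (assumed layout)."""
    if num_levels % n != 0:
        raise ValueError("num_levels must be divisible by n")
    return (num_levels // n) * codebook_size ** n


def ngram_row_index(group: int, codes: Sequence[int], codebook_size: int) -> int:
    """Row of one N-gram: block offset plus the codes read as a base-K number."""
    index = 0
    for c in codes:
        index = index * codebook_size + c
    return group * codebook_size ** len(codes) + index


def ngram_rows_for_sid(sid: Sequence[int], codebook_size: int, n: int) -> List[int]:
    """All embedding rows looked up for one Semantic ID (len(sid) / n lookups)."""
    return [
        ngram_row_index(g, sid[g * n:(g + 1) * n], codebook_size)
        for g in range(len(sid) // n)
    ]


L, K = 8, 2048
unigram_rows = ngram_table_rows(L, K, 1)  # 8 * K
bigram_rows = ngram_table_rows(L, K, 2)   # 4 * K^2
```

With these values the Unigram table has 16,384 rows and the Bigram table has 16,777,216 rows, which shows how quickly the table grows with N and why the analysis stops at $N\leq 2$.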

Efficiency in SPM-SID vs. N-gram-SID. In contrast to N-gram-SID representations, which utilize fixed embedding table sizes, SPM-SID offers the flexibility of adapting to a given embedding table size. This adaptation is achieved through the construction of subwords directly based on the training data. Given a fixed embedding table, SPM dynamically generates subwords, each mapping to a unique table entry. This optimizes the Semantic ID representation within the size constraint, improving video representation efficiency. Moreover, SPM-SID requires fewer embedding table lookups than N-gram-SID for frequently seen videos. We plot the number of embedding lookups per video vs. the embedding table size in Figure 4. The plot highlights the adaptive nature of SPM, where the number of lookups is dynamically reduced for the head/common videos in the training data, while the average number of lookups is comparable to the fixed number of lookups in N-gram. This adaptive nature of SPM contributes to its enhanced efficiency and scalability, making it a more suitable approach for large-scale ranking models.

5 Conclusion and Future Work

This paper tackles the challenging task of removing reliance on widely used item IDs in recommendation models. Using the YouTube ranking model as a case study, we discuss the disadvantages of using item ID features in large-scale production recommendation models. Using RQ-VAE, we develop Semantic IDs for billions of YouTube videos from frozen content embeddings to capture semantically meaningful hierarchical structures across the corpus. We propose and demonstrate Semantic IDs as an effective method for replacing video IDs to improve generalization by introducing meaningful collisions.

  • Cheng et al. [2016] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, et al. Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems , pages 7–10, 2016.
  • Dhariwal et al. [2020] P. Dhariwal, H. Jun, C. Payne, J. W. Kim, A. Radford, and I. Sutskever. Jukebox: A generative model for music, 2020.
  • Du et al. [2020] X. Du, X. Wang, X. He, Z. Li, J. Tang, and T.-S. Chua. How to learn item representation for cold-start multimedia recommendation? In Proceedings of the 28th ACM International Conference on Multimedia , pages 3469–3477, 2020.
  • Esser et al. [2021] P. Esser, R. Rombach, and B. Ommer. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 12873–12883, 2021.
  • Hou et al. [2022] Y. Hou, Z. He, J. McAuley, and W. X. Zhao. Learning vector-quantized item representation for transferable sequential recommenders. arXiv preprint arXiv:2210.12316 , 2022.
  • Jegou et al. [2010] H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence , 33(1):117–128, 2010.
  • Kang and McAuley [2018] W. Kang and J. J. McAuley. Self-attentive sequential recommendation. CoRR , abs/1808.09781, 2018. URL http://arxiv.org/abs/1808.09781 .
  • Kang et al. [2020] W.-C. Kang, D. Z. Cheng, T. Chen, X. Yi, D. Lin, L. Hong, and E. H. Chi. Learning multi-granular quantized embeddings for large-vocab categorical features in recommender systems. In Companion Proceedings of the Web Conference 2020 , pages 562–566, 2020.
  • Kang et al. [2021] W.-C. Kang, D. Z. Cheng, T. Yao, X. Yi, T. Chen, L. Hong, and E. H. Chi. Learning to embed categorical features without embedding tables for recommendation. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining , pages 840–850, 2021.
  • Kim et al. [2007] D. Kim, K.-s. Kim, K.-H. Park, J.-H. Lee, and K. M. Lee. A music recommendation system with a dynamic k-means clustering algorithm. In Sixth international conference on machine learning and applications (ICMLA 2007) , pages 399–403. IEEE, 2007.
  • Koren et al. [2009] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer , 42(8):30–37, 2009.
  • Kudo [2018] T. Kudo. Subword regularization: Improving neural network translation models with multiple subword candidates, 2018.
  • Lee et al. [2022] D. Lee, C. Kim, S. Kim, M. Cho, and W.-S. Han. Autoregressive image generation using residual quantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 11523–11532, 2022.
  • Lee et al. [2020] H. Lee, J. Lee, J. Y.-H. Ng, and P. Natsev. Large scale video representation learning via relational graph clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , June 2020.
  • Mikolov et al. [2013] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems , 26, 2013.
  • Ni et al. [2023] Y. Ni, Y. Cheng, X. Liu, J. Fu, Y. Li, X. He, Y. Zhang, and F. Yuan. A content-driven micro-video recommendation dataset at scale. arXiv preprint arXiv:2309.15379 , 2023.
  • Pancha et al. [2022] N. Pancha, A. Zhai, J. Leskovec, and C. Rosenberg. Pinnerformer: Sequence modeling for user representation at pinterest. arXiv preprint arXiv:2205.04507 , 2022.
  • Rajput et al. [2023] S. Rajput, N. Mehta, A. Singh, R. Keshavan, T. Vu, L. Heldt, L. Hong, Y. Tay, V. Q. Tran, J. Samost, and M. Sathiamoorthy. Recommender systems with generative retrieval. Advances in Neural Information Processing Systems , 2023.
  • Rendle et al. [2020] S. Rendle, W. Krichene, L. Zhang, and J. Anderson. Neural collaborative filtering vs. matrix factorization revisited, 2020.
  • Schein et al. [2002] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock. Methods and metrics for cold-start recommendations. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval , pages 253–260, 2002.
  • Shi et al. [2019] S. Shi, M. Zhang, X. Yu, Y. Zhang, B. Hao, Y. Liu, and S. Ma. Adaptive feature sampling for recommendation with missing content feature values. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management , pages 1451–1460, 2019.
  • Sun et al. [2019] C. Sun, A. Myers, C. Vondrick, K. Murphy, and C. Schmid. Videobert: A joint model for video and language representation learning. CoRR , abs/1904.01766, 2019. URL http://arxiv.org/abs/1904.01766 .
  • Tang et al. [2023] J. Tang, Y. Drori, D. Chang, M. Sathiamoorthy, J. Gilmer, L. Wei, X. Yi, L. Hong, and E. H. Chi. Improving training stability for multitask ranking models in recommender systems. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining . ACM, aug 2023. doi: 10.1145/3580305.3599846 . URL https://doi.org/10.1145%2F3580305.3599846 .
  • Van Den Oord et al. [2017] A. Van Den Oord, O. Vinyals, et al. Neural discrete representation learning. Advances in neural information processing systems , 30, 2017.
  • Volkovs et al. [2017a] M. Volkovs, G. Yu, and T. Poutanen. Dropoutnet: Addressing cold start in recommender systems. Advances in neural information processing systems , 30, 2017a.
  • Volkovs et al. [2017b] M. Volkovs, G. W. Yu, and T. Poutanen. Content-based neighbor models for cold start in recommender systems. In Proceedings of the Recommender Systems Challenge 2017 , pages 1–6. 2017b.
  • Wang et al. [2022] J. Wang, F. Yuan, M. Cheng, J. M. Jose, C. Yu, B. Kong, X. He, Z. Wang, B. Hu, and Z. Li. Transrec: Learning transferable recommendation from mixture-of-modality feedback. arXiv preprint arXiv:2206.06190 , 2022.
  • Weinberger et al. [2009] K. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proceedings of the 26th annual international conference on machine learning , pages 1113–1120, 2009.
  • Ying et al. [2018] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining . ACM, jul 2018. doi: 10.1145/3219819.3219890 . URL https://doi.org/10.1145%2F3219819.3219890 .
  • Yu et al. [2022] J. Yu, Y. Xu, J. Y. Koh, T. Luong, G. Baid, Z. Wang, V. Vasudevan, A. Ku, Y. Yang, B. K. Ayan, et al. Scaling autoregressive models for content-rich text-to-image generation. arXiv preprint arXiv:2206.10789 , 2022.
  • Yuan et al. [2023] Z. Yuan, F. Yuan, Y. Song, Y. Li, J. Fu, F. Yang, Y. Pan, and Y. Ni. Where to go next for recommender systems? id- vs. modality-based recommender models revisited, 2023.
  • Zeghidour et al. [2021] N. Zeghidour, A. Luebs, A. Omran, J. Skoglund, and M. Tagliasacchi. Soundstream: An end-to-end neural audio codec. CoRR , abs/2107.03312, 2021. URL https://arxiv.org/abs/2107.03312 .
  • Zhang et al. [2020] C. Zhang, Y. Liu, Y. Xie, S. I. Ktena, A. Tejani, A. Gupta, P. K. Myana, D. Dilipkumar, S. Paul, I. Ihara, et al. Model size reduction using frequency based double hashing for recommender systems. In Proceedings of the 14th ACM Conference on Recommender Systems , pages 521–526, 2020.
  • Zhao et al. [2019a] Z. Zhao, L. Hong, L. Wei, J. Chen, A. Nath, S. Andrews, A. Kumthekar, M. Sathiamoorthy, X. Yi, and E. Chi. Recommending what video to watch next: A multitask ranking system. In Proceedings of the 13th ACM Conference on Recommender Systems , RecSys ’19, page 43–51, New York, NY, USA, 2019a. Association for Computing Machinery. ISBN 9781450362436. doi: 10.1145/3298689.3346997 . URL https://doi.org/10.1145/3298689.3346997 .
  • Zhao et al. [2019b] Z. Zhao, L. Hong, L. Wei, J. Chen, A. Nath, S. Andrews, A. Kumthekar, M. Sathiamoorthy, X. Yi, and E. Chi. Recommending what video to watch next: a multitask ranking system. In Proceedings of the 13th ACM Conference on Recommender Systems , pages 43–51, 2019b.

Appendix A Appendix

A.1 RQ-VAE Training and Serving Setup

Model Hyperparameters. For the RQ-VAE model, we use a 1-layer encoder-decoder model with dimension 256. We apply $L=8$ levels of quantization using codebook size $K=2048$ for each.

RQ-VAE Training: We train the RQ-VAE model on a random sample of impressed videos until the reconstruction loss stabilizes ($\approx$ 10s of millions of steps for our corpus). Vector quantization techniques are known to suffer from codebook collapse Dhariwal et al. [2020] during training, where the model only uses a small proportion of codebook vectors. To address this challenge, we reset unused codebook vectors at each training step to content embeddings of randomly sampled videos from within the batch Zeghidour et al. [2021], which significantly improved the codebook utilization. We use $\beta=0.25$ to compute the training loss. Once trained, we freeze the RQ-VAE model and use the encoder to produce Semantic IDs for videos.
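The codebook reset described above can be sketched as follows. This is an illustrative assumption rather than the authors' code: the function name `reset_unused_codes` is hypothetical, usage counts are assumed to be tracked per training step, and the batch vectors are assumed to have the same dimensionality as the codebook vectors.

```python
import random
from typing import List, Sequence

Vector = List[float]


def reset_unused_codes(
    codebook: Sequence[Vector],
    usage_counts: Sequence[int],
    batch_vectors: Sequence[Vector],
    rng: random.Random,
) -> List[Vector]:
    """Return a copy of the codebook in which every vector that was not
    selected in this step is replaced by a randomly sampled batch vector."""
    updated = [list(v) for v in codebook]
    for k, count in enumerate(usage_counts):
        if count == 0:
            updated[k] = list(rng.choice(batch_vectors))
    return updated


# Toy step: code 0 was used three times, code 1 was never used.
toy_codebook = [[0.0, 0.0], [9.0, 9.0]]
toy_batch = [[1.0, 2.0], [3.0, 4.0]]
refreshed = reset_unused_codes(toy_codebook, [3, 0], toy_batch, random.Random(0))
```

Moving a dead code onto a real data point gives it a chance of being selected in later steps, which is how this kind of reset counteracts codebook collapse.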

RQ-VAE Serving/Inference : As new videos get introduced into the corpus, we generate the Semantic IDs using the frozen RQ-VAE model. Semantic IDs are then stored and served similarly to other features used for ranking.

A.2 Stability of Semantic IDs over time

To study the stability of Semantic IDs, we train two RQ-VAE models, RQ-VAE v0 and RQ-VAE v1, using data 6 months apart. Figure 5 shows that the performance of the production ranking model trained on recent engagement data (using SID-3Bigram-sum) is comparable for Semantic IDs derived from RQ-VAE v0 and RQ-VAE v1. This indicates that the semantic token space for videos learned via RQ-VAE is stable for use in the downstream production ranking model over time.

A.3 Semantic IDs as hierarchy of concepts

We illustrate the hierarchy of concepts captured by Semantic IDs from the videos in our corpus. Section 4.1 details the hyper-parameters used to train the RQ-VAE model. Intuitively, we can think of Semantic IDs as forming a trie over videos, with higher levels representing coarser concepts and lower levels representing more fine-grained concepts. Figures 6 and 7 show two example sub-tries from our trained RQ-VAE model with 4 tokens that capture a hierarchy of concepts within sports and food vlogging videos.
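The trie view of Semantic IDs can be made concrete with nested dictionaries. The sketch below is a minimal illustration with hypothetical names (`build_trie`, `subtrie_size`, `VIDEOS_KEY`) and made-up video labels; it is not taken from the paper.

```python
from typing import Dict, Sequence

VIDEOS_KEY = "_videos"  # hypothetical marker key holding videos at a leaf


def build_trie(sids: Dict[str, Sequence[int]]) -> dict:
    """Insert each video under the path given by its Semantic ID codes."""
    trie: dict = {}
    for video, sid in sids.items():
        node = trie
        for code in sid:
            node = node.setdefault(code, {})
        node.setdefault(VIDEOS_KEY, []).append(video)
    return trie


def subtrie_size(trie: dict, prefix: Sequence[int]) -> int:
    """Number of videos whose Semantic ID starts with `prefix`."""
    node = trie
    for code in prefix:
        if code not in node:
            return 0
        node = node[code]
    stack, total = [node], 0
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == VIDEOS_KEY:
                total += len(child)
            else:
                stack.append(child)
    return total


# Made-up Semantic IDs with 4 tokens each.
toy_trie = build_trie({"a": (1, 2, 3, 4), "b": (1, 2, 6, 7), "c": (5, 0, 0, 0)})
```

Walking down from the root narrows the set of videos at every step, so short prefixes correspond to broad groups and longer prefixes to smaller, more specific ones.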

A.4 Similarity Analysis with Semantic ID

Table 1 shows the average pairwise cosine similarity in the content embedding space for all videos with a shared Semantic ID prefix of length $n$ and their corresponding sub-trie sizes. We consider two videos with Semantic IDs $(1,2,3,4)$ and $(1,2,6,7)$ to have a shared prefix of length $2$. We observe that as the shared prefix length increases, average pairwise cosine similarity increases while the sub-trie size decreases. These observations suggest that Semantic ID prefixes represent increasingly granular concepts as their lengths increase.
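The two quantities used in this analysis, shared prefix length and average pairwise cosine similarity, can be computed as in the sketch below. The function names and the toy embeddings are illustrative assumptions; only the prefix example $(1,2,3,4)$ vs. $(1,2,6,7)$ comes from the text.

```python
import math
from itertools import combinations
from typing import Sequence


def shared_prefix_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading codes on which two Semantic IDs agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def avg_pairwise_cosine(embeddings: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity over all unordered pairs (needs >= 2 vectors)."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


prefix_len = shared_prefix_length((1, 2, 3, 4), (1, 2, 6, 7))
toy_similarity = avg_pairwise_cosine([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```

Grouping videos by a prefix of length $n$ and applying `avg_pairwise_cosine` to each group's content embeddings would yield the kind of per-prefix-length statistic the table reports.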

Recommendation system using content filtering: A case study for college campus placement

Organizing Your Social Sciences Research Assignments

A case study research paper examines a person, place, event, condition, phenomenon, or other type of subject of analysis in order to extrapolate  key themes and results that help predict future trends, illuminate previously hidden issues that can be applied to practice, and/or provide a means for understanding an important research problem with greater clarity. A case study research paper usually examines a single subject of analysis, but case study papers can also be designed as a comparative investigation that shows relationships between two or more subjects. The methods used to study a case can rest within a quantitative, qualitative, or mixed-method investigative paradigm.

Case Studies. Writing@CSU. Colorado State University; Mills, Albert J., Gabrielle Durepos, and Eiden Wiebe, editors. Encyclopedia of Case Study Research. Thousand Oaks, CA: SAGE Publications, 2010; “What is a Case Study?” In Swanborn, Peter G. Case Study Research: What, Why and How? London: SAGE, 2010.

How to Approach Writing a Case Study Research Paper

General information about how to choose a topic to investigate can be found under the " Choosing a Research Problem " tab in the Organizing Your Social Sciences Research Paper writing guide. Review this page because it may help you identify a subject of analysis that can be investigated using a case study design.

However, identifying a case to investigate involves more than choosing the research problem . A case study encompasses a problem contextualized around the application of in-depth analysis, interpretation, and discussion, often resulting in specific recommendations for action or for improving existing conditions. As Seawright and Gerring note, practical considerations such as time and access to information can influence case selection, but these issues should not be the sole factors used in describing the methodological justification for identifying a particular case to study. Given this, selecting a case includes considering the following:

  • The case represents an unusual or atypical example of a research problem that requires more in-depth analysis? Cases often represent a topic that rests on the fringes of prior investigations because the case may provide new ways of understanding the research problem. For example, if the research problem is to identify strategies to improve policies that support girls' access to secondary education in predominantly Muslim nations, you could consider using Azerbaijan as a case study rather than selecting a more obvious nation in the Middle East. Doing so may reveal important new insights into recommending how governments in other predominantly Muslim nations can formulate policies that support improved access to education for girls.
  • The case provides important insight or illuminates a previously hidden problem? In-depth analysis of a case can be based on the hypothesis that the case study will reveal trends or issues that have not been exposed in prior research or will reveal new and important implications for practice. For example, anecdotal evidence may suggest drug use among homeless veterans is related to their patterns of travel throughout the day. Assuming prior studies have not looked at individual travel choices as a way to study access to illicit drug use, a case study that observes a homeless veteran could reveal how issues of personal mobility choices facilitate regular access to illicit drugs. Note that it is important to conduct a thorough literature review to ensure that your assumption about the need to reveal new insights or previously hidden problems is valid and evidence-based.
  • The case challenges and offers a counter-point to prevailing assumptions? Over time, research on any given topic can fall into a trap of developing assumptions based on outdated studies that are still applied to new or changing conditions or the idea that something should simply be accepted as "common sense," even though the issue has not been thoroughly tested in current practice. A case study analysis may offer an opportunity to gather evidence that challenges prevailing assumptions about a research problem and provide a new set of recommendations applied to practice that have not been tested previously. For example, perhaps there has been a long practice among scholars to apply a particular theory in explaining the relationship between two subjects of analysis. Your case could challenge this assumption by applying an innovative theoretical framework [perhaps borrowed from another discipline] to explore whether this approach offers new ways of understanding the research problem. Taking a contrarian stance is one of the most important ways that new knowledge and understanding develops from existing literature.
  • The case provides an opportunity to pursue action leading to the resolution of a problem? Another way to think about choosing a case to study is to consider how the results from investigating a particular case may result in findings that reveal ways in which to resolve an existing or emerging problem. For example, studying the case of an unforeseen incident, such as a fatal accident at a railroad crossing, can reveal hidden issues that could be applied to preventative measures that contribute to reducing the chance of accidents in the future. In this example, a case study investigating the accident could lead to a better understanding of where to strategically locate additional signals at other railroad crossings so as to better warn drivers of an approaching train, particularly when visibility is hindered by heavy rain, fog, or at night.
  • The case offers a new direction in future research? A case study can be used as a tool for an exploratory investigation that highlights the need for further research about the problem. A case can be used when there are few studies that help predict an outcome or that establish a clear understanding about how best to proceed in addressing a problem. For example, after conducting a thorough literature review [very important!], you discover that little research exists showing the ways in which women contribute to promoting water conservation in rural communities of east central Africa. A case study of how women contribute to saving water in a rural village of Uganda can lay the foundation for understanding the need for more thorough research that documents how women in their roles as cooks and family caregivers think about water as a valuable resource within their community. This example of a case study could also point to the need for scholars to build new theoretical frameworks around the topic [e.g., applying feminist theories of work and family to the issue of water conservation].

Eisenhardt, Kathleen M. “Building Theories from Case Study Research.” Academy of Management Review 14 (October 1989): 532-550; Emmel, Nick. Sampling and Choosing Cases in Qualitative Research: A Realist Approach. Thousand Oaks, CA: SAGE Publications, 2013; Gerring, John. “What Is a Case Study and What Is It Good for?” American Political Science Review 98 (May 2004): 341-354; Mills, Albert J., Gabrielle Durepos, and Eiden Wiebe, editors. Encyclopedia of Case Study Research. Thousand Oaks, CA: SAGE Publications, 2010; Seawright, Jason and John Gerring. “Case Selection Techniques in Case Study Research.” Political Research Quarterly 61 (June 2008): 294-308.

Structure and Writing Style

The purpose of a paper in the social sciences designed around a case study is to thoroughly investigate a subject of analysis in order to reveal a new understanding about the research problem and, in so doing, contribute new knowledge to what is already known from previous studies. In applied social sciences disciplines [e.g., education, social work, public administration, etc.], case studies may also be used to reveal best practices, highlight key programs, or investigate interesting aspects of professional work.

In general, the structure of a case study research paper is not all that different from a standard college-level research paper. However, there are subtle differences you should be aware of. Here are the key elements to organizing and writing a case study research paper.

I.  Introduction

As with any research paper, your introduction should serve as a roadmap for your readers to ascertain the scope and purpose of your study. The introduction to a case study research paper, however, should not only describe the research problem and its significance, but also succinctly describe why the case is being used and how it relates to addressing the problem. The two elements should be linked. With this in mind, a good introduction answers these four questions:

  • What is being studied? Describe the research problem and describe the subject of analysis [the case] you have chosen to address the problem. Explain how they are linked and what elements of the case will help to expand knowledge and understanding about the problem.
  • Why is this topic important to investigate? Describe the significance of the research problem and state why a case study design and the subject of analysis that the paper is designed around is appropriate in addressing the problem.
  • What did we know about this topic before I did this study? Provide background that helps lead the reader into the more in-depth literature review to follow. If applicable, summarize prior case study research applied to the research problem and why it fails to adequately address the problem. Describe why your case will be useful. If no prior case studies have been used to address the research problem, explain why you have selected this subject of analysis.
  • How will this study advance new knowledge or new ways of understanding? Explain why your case study will be suitable in helping to expand knowledge and understanding about the research problem.

Each of these questions should be addressed in no more than a few paragraphs. Exceptions to this can be when you are addressing a complex research problem or subject of analysis that requires more in-depth background information.

II.  Literature Review

The literature review for a case study research paper is generally structured the same as it is for any college-level research paper. The difference, however, is that the literature review is focused on providing background information and enabling historical interpretation of the subject of analysis in relation to the research problem the case is intended to address. This includes synthesizing studies that help to:

  • Place relevant works in the context of their contribution to understanding the case study being investigated. This would involve summarizing studies that have used a similar subject of analysis to investigate the research problem. If there is literature using the same or a very similar case to study, you need to explain why duplicating past research is important [e.g., conditions have changed; prior studies were conducted long ago, etc.].
  • Describe the relationship each work has to the others under consideration that informs the reader why this case is applicable. Your literature review should include a description of any works that support using the case to investigate the research problem and the underlying research questions.
  • Identify new ways to interpret prior research using the case study. If applicable, review any research that has examined the research problem using a different research design. Explain how your use of a case study design may reveal new knowledge or a new perspective, or how it can redirect research in an important new direction.
  • Resolve conflicts amongst seemingly contradictory previous studies. This refers to synthesizing any literature that points to unresolved issues of concern about the research problem and describing how the subject of analysis that forms the case study can help resolve these existing contradictions.
  • Point the way in fulfilling a need for additional research. Your review should examine any literature that lays a foundation for understanding why your case study design and the subject of analysis around which you have designed your study may reveal a new way of approaching the research problem or offer a perspective that points to the need for additional research.
  • Expose any gaps that exist in the literature that the case study could help to fill. Summarize any literature that not only shows how your subject of analysis contributes to understanding the research problem, but also how your case contributes to a new way of understanding the problem that prior research has failed to provide.
  • Locate your own research within the context of existing literature [very important!]. Collectively, your literature review should always place your case study within the larger domain of prior research about the problem. The overarching purpose of reviewing pertinent literature in a case study paper is to demonstrate that you have thoroughly identified and synthesized prior studies in relation to explaining the relevance of the case in addressing the research problem.

III.  Method

In this section, you explain why you selected a particular case [i.e., subject of analysis] and the strategy you used to identify and ultimately decide that your case was appropriate in addressing the research problem. The way you describe the methods used varies depending on the type of subject of analysis that constitutes your case study.

If your subject of analysis is an incident or event. In the social and behavioral sciences, the event or incident that represents the case to be studied is usually bounded by time and place, with a clear beginning and end and with an identifiable location or position relative to its surroundings. The subject of analysis can be a rare or critical event or it can focus on a typical or regular event. The purpose of studying a rare event is to illuminate new ways of thinking about the broader research problem or to test a hypothesis. Critical incident case studies must describe the method by which you identified the event and explain the process by which you determined the validity of this case to inform broader perspectives about the research problem or to reveal new findings. However, the event does not have to be rare or uniquely significant to support new thinking about the research problem or to challenge an existing hypothesis. For example, Walo, Bull, and Breen conducted a case study to identify and evaluate the direct and indirect economic benefits and costs of a local sports event in the City of Lismore, New South Wales, Australia. The purpose of their study was to provide new insights from measuring the impact of a typical local sports event that prior studies could not measure well because they focused on large "mega-events." Whether the event is rare or not, the methods section should include an explanation of the following characteristics of the event: a) when did it take place; b) what were the underlying circumstances leading to the event; and, c) what were the consequences of the event in relation to the research problem.

If your subject of analysis is a person. Explain why you selected this particular individual to be studied and describe what experiences they have had that provide an opportunity to advance new understandings about the research problem. Mention any background about this person which might help the reader understand the significance of their experiences that make them worthy of study. This includes describing the relationships this person has had with other people, institutions, and/or events that support using them as the subject for a case study research paper. It is particularly important to differentiate the person as the subject of analysis from others and to succinctly explain how the person relates to examining the research problem [e.g., why is one politician in a particular local election used to show an increase in voter turnout from any other candidate running in the election]. Note that these issues apply to a specific group of people used as a case study unit of analysis [e.g., a classroom of students].

If your subject of analysis is a place. In general, a case study that investigates a place suggests a subject of analysis that is unique or special in some way and that this uniqueness can be used to build new understanding or knowledge about the research problem. A case study of a place must not only describe its various attributes relevant to the research problem [e.g., physical, social, historical, cultural, economic, political], but you must state the method by which you determined that this place will illuminate new understandings about the research problem. It is also important to articulate why a particular place is being used as the case for study if similar places also exist [e.g., if you are studying patterns of homeless encampments of veterans in open spaces, explain why you are studying Echo Park in Los Angeles rather than Griffith Park]. If applicable, describe what type of human activity involving this place makes it a good choice to study [e.g., prior research suggests Echo Park has more homeless veterans].

If your subject of analysis is a phenomenon. A phenomenon refers to a fact, occurrence, or circumstance that can be studied or observed but whose cause or explanation is in question. In this sense, a phenomenon that forms your subject of analysis can encompass anything that can be observed or presumed to exist but is not fully understood. In the social and behavioral sciences, the case usually focuses on human interaction within a complex physical, social, economic, cultural, or political system. For example, the phenomenon could be the observation that many vehicles used by ISIS fighters are small trucks with English language advertisements on them. The research problem could be that ISIS fighters are difficult to combat because they are highly mobile. The research questions could be: how and by what means are these vehicles being supplied to the militants, and how might supply lines to these vehicles be cut off? How might knowing the suppliers of these trucks reveal larger networks of collaborators and financial support? A case study of a phenomenon most often encompasses an in-depth analysis of a cause and effect that is grounded in an interactive relationship between people and their environment in some way.

NOTE: The choice of the case or set of cases to study cannot appear random. Evidence that supports the method by which you identified and chose your subject of analysis should clearly support investigation of the research problem and be linked to key findings from your literature review. Be sure to cite any studies that helped you determine that the case you chose was appropriate for examining the problem.

IV.  Discussion

The main elements of your discussion section are generally the same as any research paper, but centered around interpreting and drawing conclusions about the key findings from your analysis of the case study. Note that a general social sciences research paper may contain a separate section to report findings. However, in a paper designed around a case study, it is common to combine a description of the results with the discussion about their implications. The objectives of your discussion section should include the following:

Reiterate the Research Problem/State the Major Findings Briefly reiterate the research problem you are investigating and explain why the subject of analysis around which you designed the case study was used. You should then describe the findings revealed from your study of the case using a direct, declarative, and succinct statement of the study results. Highlight any findings that were unexpected or especially profound.

Explain the Meaning of the Findings and Why They are Important Systematically explain the meaning of your case study findings and why you believe they are important. Begin this part of the section by repeating what you consider to be your most important or surprising finding first, then systematically review each finding. Be sure to thoroughly extrapolate what your analysis of the case can tell the reader about situations or conditions beyond the actual case that was studied while, at the same time, being careful not to misconstrue or conflate a finding that undermines the external validity of your conclusions.

Relate the Findings to Similar Studies No study in the social sciences is so novel or possesses such a restricted focus that it has absolutely no relation to previously published research. The discussion section should relate your case study results to those found in other studies, particularly if questions raised from prior studies served as the motivation for choosing your subject of analysis. This is important because comparing and contrasting the findings of other studies helps support the overall importance of your results and highlights how your case study design and subject of analysis differ from prior research about the topic.

Consider Alternative Explanations of the Findings Remember that the purpose of social science research is to discover and not to prove. When writing the discussion section, you should carefully consider all possible explanations revealed by the case study results, rather than just those that fit your hypothesis or prior assumptions and biases. Be alert to what the in-depth analysis of the case may reveal about the research problem, including offering a contrarian perspective to what scholars have stated in prior research if that is how the findings can be interpreted from your case.

Acknowledge the Study's Limitations You can state the study's limitations in the conclusion section of your paper, but describing the limitations of your subject of analysis in the discussion section provides an opportunity to identify the limitations and explain why they are not significant. This part of the discussion section should also note any unanswered questions or issues your case study could not address. More detailed information about how to document any limitations to your research can be found here.

Suggest Areas for Further Research Although your case study may offer important insights about the research problem, there are likely additional questions related to the problem that remain unanswered or findings that unexpectedly revealed themselves as a result of your in-depth analysis of the case. Be sure that the recommendations for further research are linked to the research problem and that you explain why your recommendations are valid in other contexts and based on the original assumptions of your study.

V.  Conclusion

As with any research paper, you should summarize your conclusion in clear, simple language; emphasize how the findings from your case study differ from or support prior research and why. Do not simply reiterate the discussion section. Provide a synthesis of key findings presented in the paper to show how these converge to address the research problem. If you haven't already done so in the discussion section, be sure to document the limitations of your case study and any need for further research.

The function of your paper's conclusion is to: 1) reiterate the main argument supported by the findings from your case study; 2) state clearly the context, background, and necessity of pursuing the research problem using a case study design in relation to an issue, controversy, or a gap found from reviewing the literature; and, 3) provide a place to persuasively and succinctly restate the significance of your research problem, given that the reader has now been presented with in-depth information about the topic.

Consider the following points to help ensure your conclusion is appropriate:

  • If the argument or purpose of your paper is complex, you may need to summarize these points for your reader.
  • If prior to your conclusion, you have not yet explained the significance of your findings or if you are proceeding inductively, use the conclusion of your paper to describe your main points and explain their significance.
  • Move from a detailed to a general level of consideration of the case study's findings that returns the topic to the context provided by the introduction or within a new context that emerges from your case study findings.

Note that, depending on the discipline you are writing in or the preferences of your professor, the concluding paragraph may contain your final reflections on the evidence presented as it applies to practice or on the essay's central research problem. However, the nature of being introspective about the subject of analysis you have investigated will depend on whether you are explicitly asked to express your observations in this way.

Problems to Avoid

Overgeneralization One of the goals of a case study is to lay a foundation for understanding broader trends and issues applied to similar circumstances. However, be careful when drawing conclusions from your case study. They must be evidence-based and grounded in the results of the study; otherwise, they are merely speculation. Looking at a prior example, it would be incorrect to state that a factor in improving girls' access to education in Azerbaijan and the policy implications this may have for improving access in other Muslim nations is due to girls' access to social media if there is no documentary evidence from your case study to indicate this. There may be anecdotal evidence that retention rates were better for girls who were engaged with social media, but this observation would only point to the need for further research and would not be a definitive finding if this was not a part of your original research agenda.

Failure to Document Limitations No case is going to reveal all that needs to be understood about a research problem. Therefore, just as you have to clearly state the limitations of a general research study, you must describe the specific limitations inherent in the subject of analysis. For example, the case of studying how women conceptualize the need for water conservation in a village in Uganda could have limited application in other cultural contexts or in areas where fresh water from rivers or lakes is plentiful and, therefore, conservation is understood more in terms of managing access rather than preserving access to a scarce resource.

Failure to Extrapolate All Possible Implications Just as you don't want to over-generalize from your case study findings, you also have to be thorough in the consideration of all possible outcomes or recommendations derived from your findings. If you do not, your reader may question the validity of your analysis, particularly if you failed to document an obvious outcome from your case study research. For example, in the case of studying the accident at the railroad crossing to evaluate where and what types of warning signals should be located, your analysis would be incomplete if you failed to take into consideration speed limit signage as well as warning signals. When designing your case study, be sure you have thoroughly addressed all aspects of the problem and do not leave gaps in your analysis that leave the reader questioning the results.

Case Studies. Writing@CSU. Colorado State University; Gerring, John. Case Study Research: Principles and Practices. New York: Cambridge University Press, 2007; Merriam, Sharan B. Qualitative Research and Case Study Applications in Education. Rev. ed. San Francisco, CA: Jossey-Bass, 1998; Miller, Lisa L. “The Use of Case Studies in Law and Social Science Research.” Annual Review of Law and Social Science 14 (2018): TBD; Mills, Albert J., Gabrielle Durepos, and Eiden Wiebe, editors. Encyclopedia of Case Study Research. Thousand Oaks, CA: SAGE Publications, 2010; Putney, LeAnn Grogan. “Case Study.” In Encyclopedia of Research Design, Neil J. Salkind, editor. (Thousand Oaks, CA: SAGE Publications, 2010), pp. 116-120; Simons, Helen. Case Study Research in Practice. London: SAGE Publications, 2009; Kratochwill, Thomas R. and Joel R. Levin, editors. Single-Case Research Design and Analysis: New Development for Psychology and Education. Hillsdale, NJ: Lawrence Erlbaum Associates, 1992; Swanborn, Peter G. Case Study Research: What, Why and How? London: SAGE, 2010; Yin, Robert K. Case Study Research: Design and Methods. 6th edition. Los Angeles, CA: SAGE Publications, 2014; Walo, Maree, Adrian Bull, and Helen Breen. “Achieving Economic Benefits at Local Events: A Case Study of a Local Sports Event.” Festival Management and Event Tourism 4 (1996): 95-106.

Writing Tip

At Least Five Misconceptions about Case Study Research

Social science case studies are often perceived as limited in their ability to create new knowledge because they are not randomly selected and findings cannot be generalized to larger populations. Flyvbjerg examines five misunderstandings about case study research and systematically "corrects" each one. To quote, these are:

  • Misunderstanding 1: General, theoretical [context-independent] knowledge is more valuable than concrete, practical [context-dependent] knowledge.
  • Misunderstanding 2: One cannot generalize on the basis of an individual case; therefore, the case study cannot contribute to scientific development.
  • Misunderstanding 3: The case study is most useful for generating hypotheses; that is, in the first stage of a total research process, whereas other methods are more suitable for hypotheses testing and theory building.
  • Misunderstanding 4: The case study contains a bias toward verification, that is, a tendency to confirm the researcher’s preconceived notions.
  • Misunderstanding 5: It is often difficult to summarize and develop general propositions and theories on the basis of specific case studies [p. 221].

While writing your paper, think introspectively about how you addressed these misconceptions, because doing so can help you strengthen the validity and reliability of your research by clarifying issues of case selection, the testing and challenging of existing assumptions, the interpretation of key findings, and the summation of case outcomes. Think of a case study research paper as a complete, in-depth narrative about the specific properties and key characteristics of your subject of analysis applied to the research problem.

Flyvbjerg, Bent. “Five Misunderstandings About Case-Study Research.” Qualitative Inquiry 12 (April 2006): 219-245.

  • Last Updated: May 31, 2024 1:46 PM
  • URL: https://libguides.usc.edu/writingguide/assignments

22 Content Marketing Examples to Inspire You

Written by Leigh McKenzie

We reviewed dozens of content marketing examples in different formats and from multiple channels. And handpicked the 22 most inspiring examples to share here with you.

Take a look at what was launched, what worked, and what you can borrow from each example to create winning content marketing campaigns of your own.

Innovative Blog Posts

Your blog isn’t just a play for search engine optimization (SEO).

Posts can be written to communicate your industry expertise. Build thought leadership. Or create link-worthy content.

Here are a few innovative content marketing examples for your blog:

1. Backlinko’s Skyscraper Technique 2.0

Backlinko’s founder, Brian Dean, wrote the “Skyscraper Technique 2.0” blog post, a sequel to his original piece introducing the skyscraper technique.

The blog previews the new landscape in which backlinks aren’t enough to win at SEO.

Brian shares his insights on creating and optimizing pages based on user intent. Referencing case studies from his own blog posts.

Skyscraper Technique 2.0

The “Skyscraper Technique 2.0” blog post was shared over 1,600 times and gained over 600 backlinks.

Here’s why:

Readers get a hands-on understanding of the Skyscraper 2.0 method with three clear steps for using it.

Each step is illustrated with multiple examples to contextualize the recommendation. Instead of simply sharing a tip, Brian shows exactly how he implemented it in his work.

Like here, he tells readers to “change the format” of a piece of content. And he uses a screenshot to visualize how he acted on this advice—changing a step-by-step case study into one with checklists.

This example makes the concept easy to understand and execute.

Change of format

Do It Your Way

Choose blog topics that are directly aligned with your expertise. Then, create blog posts on those topics using real-life examples of your work. This makes the content actionable for readers.

Further reading: Learn how to find trending topics in your industry using Google Trends

2. Buffer’s Open Salary System

Buffer is a social media management tool with a distributed, remote team. The company’s founder and CEO, Joel Gascoigne, introduced one of their most innovative policies in this Open Salary System blog post.

Buffer – Salary System

In the post, Gascoigne shares his ideas and beliefs about the policy. He reveals to readers how the open salary system emerged, evolved, and currently works.

He talks about how the team ran several experiments to calculate salaries based on the cost of living for every location. The post refers to this concept as “the Good Life Curve.”

Salary System – Good Life Curve

The piece doesn’t read like your usual blog post.

Rather, it’s an exhaustive essay that walks the reader through every logistical detail of the policy. And, in doing so, gives them a peek into Buffer’s brand values and compensation principles.

This is a great example of thought leadership content.

Replicate this idea by creating blog posts like these that share perspectives from your company founders or senior leadership. This can build more credibility for your brand. And help establish your company’s long-term strategic differentiators.

3. Hotjar’s AI vs. Human Writer Experiment

Hotjar is a behavior analytics tool that’s used to capture data around user behavior with heatmaps, session recordings, and surveys.

The team designed an experiment that would compare a human writer against ChatGPT. They published a blog post titled “Woman vs. machine” with links to two competing pieces: one written by a human writer and one by ChatGPT.

The blog sets the stage for the experiment. It evaluates the substance of the competing pieces across categories like outline, time, cost, tone, and more, to present a comparative analysis. But it doesn’t quite declare a winner.

Instead, it lays out performance metrics that the team planned to track for six months before declaring a final winner in the competition.

HotJar – AI vs human

This blog went live at the height of the AI versus human writers debate. So, it gained good momentum right from the get-go.

And then Tawni Sattler, Hotjar’s former Content Marketing Lead, posted about it on LinkedIn.

Her post blew up with over 700 reactions, nearly 100 comments, and 18 reposts.

LinkedIn – Tawni Sattler – AI vs humans

Another (likely calculated) benefit to the exposure? The follow-up blog discussing the experiment’s results shows a few of Hotjar’s features in action—like Feedback and Scroll Maps. So it drives product awareness as well.

HotJar – Reader sentiment

This campaign is a great example of opportunistic content creation. You can create content around a topic that people are already interested in. And by joining in the conversation, you stand a chance of capitalizing on its existing momentum.

To start, identify popular or even controversial topics in your industry. And write a blog post with an intriguing angle to capture this existing interest in the topic.

As a bonus, you can show your product in the content to build awareness around your use cases.

Further reading: 12 Content Marketing Trends That Will Continue in 2024

4. Olipop’s Guide to Recycling

Olipop is a soft drink company that makes sodas with plant fiber and prebiotics.

Olipop’s website has a dedicated “Learn” section where visitors can explore the brand’s origin story, read blog posts, find ingredient details, and more.

Olipop – Learn

The brand publishes blogs around its core values, like sustainability, diversity, and nutritional health.

This guide to recycling targets one of Olipop’s ideal customers: The eco-conscious buyer who makes mindful choices for the environment. Published during Earth Month, the post shares insights on the importance of recycling and quick tips to recycle items.

Olipop – How to recycle

The brand also highlights its approach to sustainability and recycling.

It’s a great example of an ecommerce brand creating content that strengthens its positioning and earns buyers’ trust.

Olipop – Recycling approach

Your company blog can be a gateway to your brand’s core identity. Give your audience an insider’s view of your business goals, values, and initiatives while educating readers on relevant topics.

You can first chalk out themes and ideas that matter to your buyers. Then, create insight-packed blogs to share actionable tips, customer stories, and your brand’s initiatives.

Viral Video Campaigns

An impressive 89% of people want to see more videos from brands.

If that statistic alone is not enough motivation to create videos, here are two examples of viral video campaigns that captured a lot of positive attention.

They show how videos can engage and delight audiences across different industries.

5. Reddit’s IPO Video

Reddit posted a video on the day of its public launch. It went viral for all the right reasons.

This 90-second video features Reddit’s mascot, Snoo. In it, several people on the Reddit team talk about what it’s like working with Snoo. And the video ends with Snoo ringing the bell at the NY Stock Exchange building.

This video is quirky and hilarious. It reflects Reddit’s personality as a social brand.

LinkedIn – Reddit – IPO video

The video is unlike what most companies would post for a major milestone like an IPO. Reddit uses humor to appear more relatable.

And it paid off!

The video received close to 2,000 reactions and over 100 reposts on LinkedIn. It also got over 39,000 views on X. The results demonstrate the power of a well-crafted video with a unique approach.

What sets Reddit’s IPO video apart is its uniquely authentic tone. This example shows that it’s okay to not play by the rules if you want to stay true to your brand personality.

Focus more on what resonates with your audience. Creating relatable content with relevance to your niche can pay off even when you’re making major announcements.

6. Hyro’s Funding Announcement

Hyro, a conversational AI company, announced its $20 million Series B funding with a hilarious video campaign.

The video playfully personifies ChatGPT and Hyro as assistants talking to a user. It perfectly exemplifies how an adaptive conversational AI platform like Hyro is different from intent-based chatbots and LLMs like ChatGPT.

LinkedIn – Hyro – Funding video

Ziv Gidron, the company’s Head of Content, explains their approach to creating such an enjoyable video for a major announcement:


“As a veteran conversational AI company in the wake of ChatGPT’s 2023 boom, we decided to ride the hype wave rather than drown in it by pointing out what makes us different—and in the context of heavily regulated industries such as healthcare, better and safer.”

This video garnered 15,000 impressions, 10,000 views, 308 clicks, 162 reactions, and a strong 5% engagement rate.

Hyro – Video comments

The Content Head shared that, in creating this video, the team found inspiration from Apple’s Mac versus PC ads.

The lesson? Find something iconic from the past and put your own creative spin on it.

But remember to work with a bigger picture in mind. A new spin on an old favorite is much more exciting than creating a copy-paste version of an old trend.

Creative Social Media Posts

Social media channels are a space where you can instantly connect with your users. Even turn them into raving fans.

Here are three amazing content marketing examples of success on socials.

7. tl;dv’s Instagram

tl;dv is an AI assistant that records, transcribes, and summarizes meetings.

The brand’s social media team includes a couple of video creators. They’re known for making short sketches enacting real-life scenarios in a SaaS company.

Their goal?

Entertain the audience > promote the product.

For example, they created an Instagram reel sharing a funny conversation about a product feature between a sales rep, customer success manager, and prospect.

Instagram – tl;dv

tl;dv’s content is comical, relatable, and worth sharing.

Ian Evans, one of the creators in tl;dv’s social team, summarizes their social media strategy in one line:


“Instead of telling people about our product, we’re commiserating with the people who could use our product.”

He also explains why creating good content matters more than self-promotional content: You have to show people you understand their struggles, joys, and frustrations. Like a best friend. So, they like hanging out with you.

LinkedIn – Ian Evans post

Most people use platforms like TikTok and Instagram for leisure. They likely don’t want to learn how your product works or what services you can deliver.

So, instead of talking just about your brand, build an audience on these channels by creating content that speaks to your users and makes them feel seen.

8. Headspace’s Instagram

Headspace is a mindfulness app designed to improve mental wellbeing. The brand’s social media strategy is rooted in its mission to empower people and improve their mental health.

On Instagram, the brand uses vibrant illustrations and reels to share uplifting messages, like you see in this grid.

Instagram – Headspace

You’ll find a mix of different kinds of content on Headspace’s Instagram:

  • Brief meditation activities
  • Expert advice on critical subjects
  • Breathing exercise reels (with millions of views)

Instagram – Headspace video

Headspace’s example shows what you can achieve when you focus on helping your audience with genuinely meaningful advice. It also offers great inspiration for creating impactful visuals.

If you want to build a design-first social media strategy, jump to Headspace’s Instagram for some inspiration.

9. HubSpot’s LinkedIn

HubSpot is a one-stop platform with tools and resources for marketing, sales, and customer service.

The brand offers another case study in which marketing content uses humor to delight an audience. And keep them coming back for more.

The team leans heavily on memes, one-liners, and hilarious posts on a supposedly serious platform like LinkedIn.

Each post gets hundreds of likes. HubSpot often goes viral on the platform.

HubSpot – LinkedIn posts

Chi Thukral, the Team Lead on HubSpot’s social team, changed the way the brand approached social media. In a podcast with Jordan Scheltgen, Thukral mentioned:

“Before I was hired, memes didn’t exist on HubSpot’s page at all. Never something they dabbled in. I joined when the Barbie movie came out. So, I pushed for the leadership to try a Barbie meme just to test it out. And that post went viral in a way that the brand had never seen before.”

LinkedIn – HubSpot – Barbie meme

HubSpot’s LinkedIn activity is proof that memes can work well for content marketing campaigns. Especially when you know your audience well enough to predict what will tickle their funny bone.

Don’t be afraid to include comical and witty content in your social media strategy. Use humor to give your audience a good laugh and stay top of mind.

To create relatable content, test what works best for your users—memes, one-liners, pop-culture references, and more.

Interactive Content Assets

Interactive content can capture people’s attention much faster than static posts can. These assets often engage users better for longer. And have a higher chance of getting shared.

Check out these interactive content marketing examples before you create one for your brand.

10. Anecdote’s AI Candle Generator

Anecdote Candles sells unique fragrances by crowdsourcing new ideas. In the same spirit, the brand created an AI candle generator to design custom candles based on people’s favorite memories and moods.

Anecdote Candles – AI Generator

This tool asks for a simple input—describe a moment. The user can keep it short or make it as detailed as they want.

AI Generator input

When they hit generate, the tool creates a new candle name and copy around the users’ response to the prompt.

Users can hit Shuffle to get multiple options and choose their favorite one.

AI Generator result

Then, they customize different details to create their own candle. And place an order.

The AI candle generator gives shoppers an immersive experience, building a candle on their own from start to finish.

Anecdote Candles – Customizer

Anecdote’s candle generator involves the buyer in the production process and gives them complete control over the end product.

You can pull off something similar to get creative ideas directly from your audience. Give them an interactive tool to share their memories, photos, or other details. Convert these inputs into a customized output for each user—like a badge, a title, or even a product to buy.

11. Slite’s Time-Saving Calculator

Slite, a knowledge base platform, has a time-saving calculator that highlights how much time and money teams could save by making knowledge easily accessible to everyone. Visitors use the sliding scale on the digital calculator to set the number of people on their team and their average monthly salary.

Then, the tool estimates how much time and money prospective customers could save by documenting knowledge in one place.
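To make the mechanics concrete, here is a minimal sketch of how a calculator like this could turn the two inputs into an estimate. Slite does not publish its formula here, so the hours saved per person and the working hours per month are assumptions chosen purely for illustration, and `estimate_savings` is a hypothetical name.

```python
from dataclasses import dataclass

# Assumed values for illustration only; they are not taken from Slite's calculator.
ASSUMED_HOURS_SAVED_PER_PERSON = 5.0   # hours saved per person per month
ASSUMED_WORK_HOURS_PER_MONTH = 160.0   # used to convert a monthly salary to an hourly rate


@dataclass
class Savings:
    hours_per_month: float
    money_per_month: float


def estimate_savings(team_size: int, avg_monthly_salary: float) -> Savings:
    """Estimate monthly time and money saved for a team (hypothetical formula)."""
    if team_size < 0 or avg_monthly_salary < 0:
        raise ValueError("team_size and avg_monthly_salary must be non-negative")
    hours = team_size * ASSUMED_HOURS_SAVED_PER_PERSON
    hourly_rate = avg_monthly_salary / ASSUMED_WORK_HOURS_PER_MONTH
    return Savings(hours_per_month=hours, money_per_month=hours * hourly_rate)


if __name__ == "__main__":
    result = estimate_savings(team_size=10, avg_monthly_salary=4800)
    print(f"{result.hours_per_month:.0f} hours and {result.money_per_month:.2f} saved per month")
```

If you build something similar, swap the assumed constants for figures you can back up with your own customer data.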

Slite – Time saving calculator

The calculator also suggests a fun activity based on the hours and money saved, like how many pancakes the person could make or how many hours they could spend meditating.

It’s an imaginative way of nudging people to save time. With Slite.

Time saving calculator – Results

If your product/service has a direct, quantifiable benefit for users, use interactive content to help them realize the possibilities.

Choose whichever content format works best in your context, like a quiz, calculator, poll, or game.

12. Coffee Bros’s Coffee-to-Water Ratio Calculator

Coffee Bros sells a wide variety of fresh-roasted coffee. The team created a coffee-to-water ratio calculator.

Why? To help anyone brew the right balance of coffee and water.

The calculator includes seven steps where users can choose any of the available options, like their brewing method, drink size, and roast level.

Coffeebros – Coffee to Water Ratio Calculator

Based on the inputs, the calculator suggests the ideal brewing time and creates a recipe table.

It’s a quick and convenient tool for coffee enthusiasts to try different types of coffee with the right recipe.

Coffee to Water Ratio Calculator – Steps

Coffee Bros targeted a fairly common pain point among their audience: How to make the perfect brew. They created an interactive tool to solve this problem and gently nudge people to explore their products.

To create something similar, identify a few problem statements where you can quickly offer a customized solution. Then, build an interactive tool—like a quiz, calculator, or something else—to understand the exact problem and share a few ways to solve it.
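As a rough sketch of the logic behind a tool like this, the snippet below maps a brewing method and a water volume to a coffee dose. The ratios in `ASSUMED_RATIOS` are illustrative placeholders, not the values Coffee Bros uses, and the function name is hypothetical. The real calculator also takes roast level and suggests a brewing time, which this sketch leaves out.

```python
# Hypothetical coffee-to-water ratios (grams of water per gram of coffee).
# Illustrative placeholders only; not the values used by Coffee Bros.
ASSUMED_RATIOS = {
    "pour_over": 16.0,
    "french_press": 15.0,
    "cold_brew": 8.0,
}


def coffee_recipe(method: str, water_ml: float) -> dict:
    """Return a simple recipe row for the chosen method and drink size."""
    if method not in ASSUMED_RATIOS:
        raise ValueError(f"Unknown brewing method: {method!r}")
    if water_ml <= 0:
        raise ValueError("water_ml must be positive")
    ratio = ASSUMED_RATIOS[method]
    # Treat 1 ml of water as 1 g, a common kitchen approximation.
    coffee_g = round(water_ml / ratio, 1)
    return {
        "method": method,
        "water_ml": water_ml,
        "coffee_g": coffee_g,
        "ratio": f"1:{ratio:g}",
    }


if __name__ == "__main__":
    print(coffee_recipe("pour_over", 320))
```

A lookup table plus one division is enough to power the first version of a tool like this; the value for the user comes from the customized answer, not from complex math.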

13. Figma’s Cost Comparison Calculator

Figma created a cost-comparison calculator to help buyers make informed decisions.

Enter information about your team size and the expected duration of your contract with a solution provider. Then, you see what each of Figma’s most popular competitors would cost you. Alongside Figma’s price.

Figma – Cost comparison calculator

The calculator gives an honest price estimation. And you can see a list of features you get if you choose Figma.

Figma – Cost comparison

Cost is one of the biggest factors among buyers. You can make a strong case for your brand with a cost calculator like this.

Show buyers exactly how much they’ll pay to use your product vs competing products. And what features they’ll get for their investment in your solution.
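The core of a cost calculator like this can be sketched in a few lines. The example below assumes simple per-seat monthly pricing and uses made-up product names and prices. It does not reflect Figma's actual pricing or how its calculator works.

```python
def compare_costs(
    team_size: int,
    contract_months: int,
    monthly_price_per_seat: dict[str, float],
) -> list[tuple[str, float]]:
    """Return (product, total cost) pairs sorted from cheapest to most expensive.

    Assumes flat per-seat monthly pricing, which is a simplification.
    """
    if team_size <= 0 or contract_months <= 0:
        raise ValueError("team_size and contract_months must be positive")
    totals = {
        name: team_size * contract_months * price
        for name, price in monthly_price_per_seat.items()
    }
    return sorted(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical prices for illustration only.
    prices = {"Your product": 15.0, "Competitor A": 20.0, "Competitor B": 18.0}
    for name, total in compare_costs(team_size=10, contract_months=12, monthly_price_per_seat=prices):
        print(f"{name}: {total:,.2f}")
```

Real pricing often includes tiers, volume discounts, and add-ons, so treat this as a starting point and keep the estimate honest.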

14. Miro’s “Look Back” Board

Miro’s “look back” board recaps all the features the team rolled out in 2023. Across several different categories.

Each section of this board is beautifully designed with a combination of visual and textual content. And focuses on a specific part of the product, like Content & Data Visualization, Workshops & Async Collaboration, and more.

Miro – Look back board

You can zoom in to see different updates in each section.

Besides screenshots, GIFs, and text blocks, these sections also include “Talktracks.” In them, members of the Miro team walk users through different updates in said section.

You can click on any of the sections in the intro segment, and the board will take you directly to that part. For example, if you want to learn more about Miro’s AI features, you can click on the artificial intelligence box in blue.

Miro board – Talktracks

Miro presents the perfect example of using your product to interact with your audience. Especially if you have a communication-centric product.

Create a virtual playground for users to choose their own adventure and interact with different elements.

You need a big-picture idea of an explorable space you want to create—like a workspace, an arcade, a carnival (like Miro’s). Then, place your content assets in different parts of that explorable space with clear instructions to interact.

High-Impact Case Studies

Case studies play a crucial role in the buyer journey. That’s why 36% of marketers consider them to be the best-performing content type.

These case study examples will give you a new way to share customer stories.

15. Gong’s First-Person Case Study

Gong, a revenue intelligence tool, created this first-person case study.

Unlike most case studies, which are authored by the business touting its solution, this one is written by Kieran Smith, the point-of-contact from the client’s team. In a first-person POV.

It opens with the story of all the roles he worked in before he finally landed at Andela (the client company showcased in the piece).

Gong – Case study – Andela

This case study deviates from the standard structure of problem-solution-impact.

Instead, it reads as his personal story. He shares all the changes he introduced to enhance reps’ knowledge and shorten the sales cycle. And one common element in all this progress was (you guessed it) Gong.

Gong – Andela details

This example shows you don’t have to follow the same old playbook when creating case studies.

Instead, focus on involving customers in the process of creating a case study. And whenever there’s a chance, write it from their perspective.

Even if you follow a more traditional case study format, take inspiration from this example by including personal details about your customers or your team to draw readers in.

16. Huckberry’s Everyday Carry Dump Videos

Huckberry is a men’s fashion brand known for its adventure gear.

The brand has a creative way of documenting customer stories through YouTube videos.

Huckberry has a YouTube playlist called “Everyday Carry Dump.” It’s a collection of over 30 videos where customers share their experiences of using different Huckberry bags.

YouTube – Huckberry playlist

Each video also focuses on a specific use case, like:

  • Camping trip
  • Biking essentials
  • Cooking on the road
  • Art and design needs

These videos capture customer stories while showcasing different products in action.

YouTube – Huckberry videos

For example, this video shows how a serial entrepreneur packs his Huckberry backpack with all his productivity needs.

Huckberry offers the perfect example of how you can share customer stories through engaging videos. These videos target several personas and offer some good inspiration for people looking to buy a Huckberry backpack.

You can invite customers to your studio and create similar videos. Or encourage/incentivize happy customers to share their experiences with your products on social media. Then, repurpose that user-generated content on your own social media, on landing pages, and in ads.

17. Mutiny’s Customer Playbooks

Mutiny is a website-personalization tool for companies to run targeted, revenue-generating campaigns.

The team used a unique concept to turn customer success stories into start-to-finish playbooks. They look like this:

Mutiny – Playbooks

Each playbook includes a problem statement, a hypothesis, and a step-by-step solution.

The solution section of each playbook lists all the steps and tactics the client followed to tackle the big challenge. This image illustrates how each page highlights the hypothesis with a remark from the client. Then, it starts outlining all the steps in the solution section.

Mutiny – Playbook steps

Stewart Hillhouse, the Head of Content at Mutiny, explains the team’s primary goal for creating these playbooks:


“Traditional case studies aren’t useful for the reader. That’s why we wanted to create playbooks that showcase how a customer could drive a meaningful impact for their company, while also giving the reader tactics and frameworks they could apply to their work (with or without Mutiny).”

He also mentions that the playbooks are the most popular content on their website. He says they have doubled their site’s conversion rate.

You can share customer stories to build social proof. And give potential customers a blueprint of how to be successful with your product.

Give buyers a framework to replicate at their organization with these customer stories. You can also synthesize these stories in visual, shorthand, or playful formats. Think: Creating a comic strip out of a case study for sharing on social media.

Interesting Infographics

Infographics are perfect for leisure reading. They’re ultra-consumable while scrolling through social media or checking all unread emails.

They offer helpful insights without overwhelming readers with too much information.

Here are a few creative examples of infographics you can draw inspiration from:

18. Postmark’s Comics

Postmark, an email delivery app, created a series of comics called Postmark Express. They explain the more technical industry terms. Like “dunning.”

A comic called “Dun Dun Dunning” shares the story of a superhero owl (Dunning) and a villain skunk (Churn). It shares a short story around these characters in the classic comic strip format with funky visuals and crisp dialogs.

Postmark – Dun Dun Dunning

It’s playful, intriguing, and unlike anything you’d expect to see when you look up “dunning emails.”

Dr. Fio Dosetto, the creator of Postmark Express and the former Head of Brand at Postmark, shares how the team introduced “sparking joy” as one of the main KPIs for this campaign.

Several people found the idea of the comics and this KPI interesting.

LinkedIn – Fio Dosetto – Comments

Comics are one of the less common content marketing examples.

They work best when you explain complex concepts with intriguing stories. And relatable characters.

19. Teal’s Job Search Cheat Sheet

Teal is a career growth platform. They created an A-to-Z cheat sheet with their best tips for people to find their next role.

It uses the bento design style to cover multiple aspects of job search. Like networking, interviews, upskilling, job applications, and more.

This design style organizes the information in an easily scannable structure. Readers can find the most crucial tips for networking and interview highlights in colored boxes.

Teal – Job Cheat Sheet

Infographics like these work well on social media platforms. When users don’t want to spend a lot of time reading big blocks of text.

You can create similar cheat sheets to help users crack the code for any particular subject. Remember to avoid cramming too much into one image. Make the information quickly scannable and easy to read.

Value-Packed Email Campaigns

Never underestimate the power of email marketing. It’s your space to nurture a long-term relationship with your audience.

Let’s look at these three great examples to level up your email marketing strategy:

20. Coda’s The Docket

Coda is a writing and documentation tool with a monthly newsletter called “The Docket.”

This is a good email marketing example to draw inspiration from because it comes with:

  • Recommendations for good reading resources
  • Five most-viewed docs created by Coda users
  • Relevant templates for the target audience

Coda – The Docket

Instead of spamming users with higher-volume, lower-value emails, Coda uses this monthly newsletter to deliver value and reinforce itself as a trusted resource.

This newsletter also features user-generated content in one section. It curates some of the most popular docs created by Coda users.

Coda newsletter – Trends

You have the opportunity to be super creative with one-off email marketing campaigns. Those are great for deals and discounts.

But for relationship-building emails, you need something more value-driven, like Coda’s newsletter.

21. Dohful’s Sunday Letters

Dohful is an artisanal cookie brand that relies heavily on email marketing to stay top of mind and drive sales.

Arushi Sachdeva founded Dohful. She sends out a weekly newsletter called The Sunday Letter. In it, she shares personal stories and company news.

In one of the editions, Sachdeva shared why she started Sunday Letters and how readers enjoyed these emails.

Dohful – Sunday Newsletter

For readers, the emails feel like a no-filter conversation.

Sachdeva gives a peek into everything about the business—launching new products, making key decisions, talking about mistakes, and more.

Like in one email, she discusses how their new flavor launch tanked, and they had to discontinue it.

Dohful – Cookie newsletter

Dohful’s newsletters are a wonderful example for small businesses to build meaningful relationships with their audience.

You can create newsletters showing some behind-the-scenes snippets from your business. Instead of simply selling them something, share your stories to make them feel more connected to your brand.

22. Podia’s Topical Email

This email by Podia is an awesome example of how brands can use trendjacking to resonate more with their audience.

In this case, the team jumped on the Taylor Swift—Travis Kelce hype train. In an email about Podia’s website builder, of all things.

The email shows the actual websites created for Taylor and Travis with Podia. And talks about the ease of using Podia’s website builder with multiple customization options to add embeds, adjust colors, spacing, and a lot more.

Podia – Newsletter

And the subject line—Do Taylor Swift and Travis Kelce have blogs on Podia?—is intriguing enough to drive opens.

When done right, trendjacking content can create a memorable experience for your audience.

Identify trends that would interest your target audience. Then, create a message that couples that trending content with information centered around their needs or pain points to truly capture their attention.

Over to You: Start Planning Your Next Big Campaign

That’s a wrap on our favorite content marketing examples. May they get your creative juices flowing the next time you’re looking for big ideas.

The success of any content campaign boils down to how well you know your audience. So, start by building user personas for each channel if you haven’t already done so. Then, brainstorm new content ideas.

Check out our Digital Content Strategy Template to document your research.


Published on 29.5.2024 in Vol 11 (2024)

Enabling Health Information Recommendation Using Crowdsourced Refinement in Web-Based Health Information Applications: User-Centered Design Approach and EndoZone Informatics Case Study


Increase in residence hall recycling focus of study by students, faculty published in journal

Recycling bins

Recent SUNY Fredonia alumni and School of Business faculty had their case study, which developed recommendations to maximize recycling in residence halls, published in the May 2024 edition of the journal Lean & Six Sigma Review.

The project was undertaken by Christopher Shepp, who earned a B.S. in Public Accountancy in December 2022, and Taylor Lemiszko, who graduated with a B.S. in Business Administration: Management in May 2023, along with Professor Reneta Barneva and Associate Professor Lisa Walters.

Their experiential learning exercise, which used Lean & Six Sigma to identify recommendations, was done in conjunction with BUAD 427: Applied Quality Operations in the Spring 2022 semester. A premise of the study was that waste generation on university campuses can be comparable to that of small towns, so even modest improvements in practices can prove impactful, Dr. Walters explained.

In the study, the student/faculty Lean & Six Sigma team analyzed a dormitory’s current recycling program and proposed improvements to meet the university’s sustainability objectives, according to the study’s abstract.

The team found that a number of improvements – some as simple as adding and relocating recycling bins, creating engaging signage and establishing a campus recycling competition – could be implemented to increase recycling volume.

“These practical findings demonstrated that LSS can be an effective way to involve students in experiential learning and environmental planning,” Walters said.

The team’s research was facilitated by Director of Environmental Health and Safety and Sustainability Sarah Laurie, Interim Co-director of Facilities Services Mark Delcamp and Director of Residence Life Kathy Forster.

As a result of sustainability studies undertaken by Walters’ classes, the left-to-right order of individual indoor bins has been changed so that the bin designated for recycled items is on the far left, where studies show it’s more likely to be selected first to dispose of recyclables. Recycling bins had previously been positioned on the far right, making them the last bin chosen.

Lean & Six Sigma Review, a peer-reviewed magazine published by the American Society for Quality, provides a holistic view of lean and Six Sigma, from the basics to the boardroom, according to its website. The journal addresses the various professional development needs of Six Sigma executives, Champions, Master Black Belts, Black Belts, Green Belts, and Yellow Belts.


Business school teaching case study: Unilever chief signals rethink on ESG

A smiling middle-aged Caucasian man in a light blue shirt in front of shelves stocked with various household cleaning products


Gabriela Salinas and Jeeva Somasundaram

In April this year, Hein Schumacher, chief executive of Unilever, announced that the company was entering a “new era for sustainability leadership”, and signalled a shift from the central priority promoted under his predecessor, Alan Jope.

While Jope saw lack of social purpose or environmental sustainability as the way to prune brands from the portfolio, Schumacher has adopted a more balanced approach between purpose and profit. He stresses that Unilever should deliver on both sustainability commitments and financial goals. This approach, which we dub “realistic sustainability”, aims to balance long- and short-term environmental goals, ambition, and delivery.

As a result, Unilever’s refreshed sustainability agenda focuses harder on fewer commitments that the company says remain “very stretching”. In practice, this entails extending deadlines for taking action as well as reducing the scale of its targets for environmental, social and governance measures.

Such backpedalling is becoming widespread, with many companies retracting their commitments to climate targets, for example. According to FactSet, a US financial data and software provider, the number of US companies in the S&P 500 index mentioning “ESG” on their earnings calls has declined sharply: from a peak of 155 in the fourth quarter of 2021 to just 29 two years later. This trend towards playing down a company’s ESG efforts, from fear of greater scrutiny or of accusations of empty claims, even has a name: “greenhushing”.

Test yourself

This is the fourth in a series of monthly business school-style teaching case studies devoted to the responsible business dilemmas faced by organisations. Read the piece and FT articles suggested at the end before considering the questions raised.

About the authors: Gabriela Salinas is an adjunct professor of marketing at IE University; Jeeva Somasundaram is an assistant professor of decision sciences in operations and technology at IE University.

The series forms part of a wider collection of FT ‘instant teaching case studies’, featured across our Business Education publications, that explore management challenges.

The change in approach is not limited to regulatory compliance and corporate reporting; it also affects consumer communications. While Jope believed that brands sold more when “guided by a purpose”, Schumacher argues that “we don’t want to force fit [purpose] on brands unnecessarily”.

His more nuanced view aligns with evidence that consumers’ responses to the sustainability and purpose communication attached to brand names depend on two key variables: the type of industry in which the brand operates; and the specific aspect of sustainability being communicated.

In terms of the sustainability message, research in the Journal of Business Ethics found consumers can be less interested when product functionality is key. Furthermore, a UK survey in 2022 found that about 15 per cent of consumers believed brands should support social causes, but nearly 60 per cent said they would rather see brand owners pay taxes and treat people fairly.

Among investors, too, “anti-purpose” and “anti-ESG” sentiment is growing. One (unnamed) leading bond fund manager even suggested to the FT that “ESG will be dead in five years”.

Media reports on the adverse impact of ESG controversies on investment are certainly now more frequent. For example, while Jope was still at the helm, the FT reported criticism of Unilever by influential fund manager Terry Smith for displaying sustainability credentials at the expense of managing the business.

Yet some executives feel under pressure to take a stand on environmental and social issues, in many cases believing they are morally obliged to do so or through a desire to improve their own reputations. This pressure may lead to a conflict with shareholders if sustainability becomes a promotional tool for managers, or for their personal social responsibility agenda, rather than creating business value.

Such opportunistic behaviours may lead to a perception that corporate sustainability policies are pursued only because of public image concerns.

Alison Taylor, at NYU Stern School of Business, recently described Unilever’s old materiality map — a visual representation of how companies assess which social and environmental factors matter most to them — to Sustainability magazine. She depicted it as an example of “baggy, vague, overambitious goals and self-aggrandising commitments that make little sense and falsely suggest a mayonnaise and soap company can solve intractable societal problems”.

In contrast, the “realism” approach of Schumacher is being promulgated as both more honest and more feasible. Former investment banker Alex Edmans, at London Business School, has coined the term “rational sustainability” to describe an approach that integrates financial principles into decision-making, and avoids using sustainability primarily for enhancing social image and reputation.

Such “rational sustainability” encompasses any business activity that creates long-term value — including product innovation, productivity enhancements, or corporate culture initiatives, regardless of whether they fall under the traditional ESG framework.

Similarly, Schumacher’s approach aims for fewer targets with greater impact, all while keeping financial objectives in sight.

Complex objectives, such as having a positive impact on the world, may be best achieved indirectly, as expounded by economist John Kay in his book, Obliquity. Schumacher’s “realistic sustainability” approach means focusing on long-term value creation, bringing customers and investors to the fore. Saving the planet begins with meaningfully helping a company’s consumers and investors. Without their support, broader sustainability efforts risk failure.

Questions for discussion

Read: Unilever has ‘lost the plot’ by fixating on sustainability, says Terry Smith

Companies take step back from making climate target promises

The real impact of the ESG backlash

Unilever’s new chief says corporate purpose can be ‘unwelcome distraction’

Unilever says new laxer environmental targets aim for ‘realism’

How should business executives incorporate ESG criteria in their commercial, investor, internal, and external communications? How can they strike a balance between purpose and profits?

How does purpose affect business and brand value? Under what circumstances or conditions can the impact of purpose be positive, neutral, or negative?

Are brands vehicles by which to drive social or environmental change? Is this the primary role of brands in the 21st century or do profits and clients’ needs come first?

Which categories or sectors might benefit most from strongly articulating and communicating a corporate purpose? Are there instances in which it might backfire?

In your opinion, is it necessary for brands to take a stance on social issues? Why or why not, and when?

IMAGES

  1. Research Recommendation Sample Pdf

  2. Tutorial 5- Content Based Recommendation System

  3. (PDF) Recommendations for Using the Case Study Method in International

  4. 12+ Case Study Examples

  5. 15+ Case Study Examples, Design Tips & Templates

  6. The 6 best content recommendation strategies

VIDEO

  1. [Webedia France] Case Study

  2. HOW TO WRITE THE CONCLUSION AND RECOMMENDATION OF CHAPTER 5

  3. What is Content Validity in Survey Research?

  4. 80 Articles, $900/Month: Simple Blogging Success Strategy!

  5. Part 3: SEO + Content Marketing Case Study for Recipe Website

  6. Why create your own hashtag on social media

COMMENTS

  1. Netflix Recommender System

    The study of recommendation systems is a branch of information filtering systems (Recommender system, 2020). Information filtering systems deal with removing unnecessary information from the data stream before it reaches a human. Recommendation systems deal with recommending a product or assigning a rating to an item.

  2. Case Study: How Netflix Uses AI to Personalize Content Recommendations

    Here are a few tips for businesses from the Netflix case study: Collect data on your customers' actions and preferences. You can use this data to tailor your marketing messages, suggest products ...

  3. Case Study: Netflix's Machine Learning Algorithm-Powered Content

    This case study delves into how Netflix utilizes advanced machine learning techniques to recommend content to users, illustrating the profound impact of AI on user experience and its contribution ...

  4. How To Make Recommendation in Case Study (With Examples)

    How To Write Recommendation in Case Study. 1. Review Your Case Study's Problem. 2. Assess Your Case Study's Alternative Courses of Action. 3. Pick Your Case Study's Best Alternative Course of Action. 4. Explain in Detail Why You Recommend Your Preferred Course of Action.

  5. Deep learning for recommender systems: A Netflix case study

    Figure 1 displays a Netflix homepage with red circles enumerating different recommendation tasks, each of which is powered by a different algorithm. For example, there is a dedicated algorithm (1) for choosing the first video to display prominently at the top of the homepage, another one for ranking already-watched videos that the user may want to continue watching (7), as well as others ...

  6. How Content Recommendation Platforms Capture Users' Attention?

    Drive engagement by persuading readers to take action. Recommend practical steps or actionable tips related to the blog's content, motivating them to implement the insights shared. 4. Request a Share of the Reader's Post: Encourage social sharing by explicitly asking readers to share your blog post with their network.

  7. Before describing the technology: A Case Study on Recommendation Engine

    Here are some use cases of Recommendation Engines: According to a recent study, 75% of consumers of Netflix watch contents that are recommended by the system. The executives of Netflix say their ...

  8. How to Write a Case Study: Bookmarkable Guide & Template

    5. Contact your candidate for permission to write about them. To get the case study candidate involved, you have to set the stage for clear and open communication. That means outlining expectations and a timeline right away — not having those is one of the biggest culprits in delayed case study creation.

  9. Deep Learning for Recommender Systems: A Netflix Case Study

    In this article, we outline some of the challenges encountered and lessons learned in using deep learning for recommender systems at Netflix. We first provide an overview of the various recommendation tasks on the Netflix service. We found that different model architectures excel at different tasks. Even though many deep-learning models can be ...

  10. UX Design case study: New feature for content recommendations on

    Users love Netflix but… there is always room for improvement. This solo project explores the idea of adding a new feature to the Netflix UI for content recommendations between family and friends as a new way to discover content on the platform. The project was completed during the month of December 2022 and is a conceptual project created in order to improve and put into practice my UX ...

  11. What is Content Recommendation?

    A content recommendation platform, also known as a content discovery platform, is the software that makes content recommendations work. It uses algorithms to automatically identify and recommend relevant content to website visitors based on preset criteria. For example, you can choose to recommend content on a particular topic to visitors to a ...

  12. An empirical study of content-based recommendation systems in mobile

    This study incorporates social network analysis and econometric models to empirically examine the impact of content-based filtering (CBF) recommendation systems on the distribution of demand in mobile app markets. The analysis results of two comprehensive panel datasets from App Store and Google Play suggest that CBF recommendations favor niche ...

  13. How to Write Recommendations in Research

    Recommendations for future research should be: Concrete and specific. Supported with a clear rationale. Directly connected to your research. Overall, strive to highlight ways other researchers can reproduce or replicate your results to draw further conclusions, and suggest different directions that future research can take, if applicable.

  14. A Complete Study of Amazon's Recommendation System

    Amazon is the largest e-commerce brand in the world in terms of revenue and market share (Statista). In 2021, Amazon's net revenue from e-commerce sales was US$470 billion, and about 35 percent of all sales on Amazon happen via recommendations. This clearly elucidates the power of recommendations. In this case study, we look at how Amazon is ...

  15. content-based dataset recommendation system for researchers—a case

    As per the best of our knowledge, this is the first study of its kind on content-based dataset recommendation. We hope that this system will further promote data sharing, offset the researchers' workload in identifying the right dataset and increase the reusability of biomedical datasets.

  16. Better Generalization with Semantic IDs: A Case Study in Ranking for

    A Case Study in Ranking for Recommendations. ... In this paper, we study content-based item representations that can improve the generalization for new and long-tail item distributions while keeping models' power of memorization without sacrificing overall quality, with a focus on recommendation ranking models. ...

  17. Recommendation system using content filtering: A case study for college

    Recommender system known as information gathering system aims at creating an algorithm which, keeps in consideration the diverse needs and varying level of competence. It offers better opportunities in project development cycle under requirement phase and design phase. Social media and Ecommerce market has tapped in the recommender system to boost its growth by providing with precise results ...

  18. What Is a Case Study?

    Case studies are good for describing, comparing, evaluating and understanding different aspects of a research problem. Table of contents. When to do a case study. Step 1: Select a case. Step 2: Build a theoretical framework. Step 3: Collect your data. Step 4: Describe and analyze the case.

  19. (PDF) A Case Study on Recommendation Systems Based on Big Data

    A Case Study on Recommendation Systems Based on Big Data: Proceedings of the Second International Conference on SCI 2018, Volume 2 January 2019 DOI: 10.1007/978-981-13-1927-3_44

  20. Writing a Case Study

    A case study is a research method that involves an in-depth analysis of a real-life phenomenon or situation. Learn how to write a case study for your social sciences research assignments with this helpful guide from USC Library. Find out how to define the case, select the data sources, analyze the evidence, and report the results.

  21. Writing a Case Study Analysis

    Identify the key problems and issues in the case study. Formulate and include a thesis statement, summarizing the outcome of your analysis in 1-2 sentences. Background. Set the scene: background information, relevant facts, and the most important issues. Demonstrate that you have researched the problems in this case study. Evaluation of the Case

  22. How To Make Recommendation in Case Study (With Examples)

    To help you understand how to make recommendations in a case study, let's take a look at some examples below. Example 1. Case Study Problem: Lemongate Hotel is facing an overwhelming increase in the number of reservations due to the sudden implementation of a local government policy that boosts the city's economy. Although Lemongate Hotel has a sufficient area to accommodate the influx of ...

  23. Case Study Methodology of Qualitative Research: Key Attributes and

    A case study is one of the most commonly used methodologies of social research. This article attempts to look into the various dimensions of a case study research strategy, the different epistemological strands which determine the particular case study type and approach adopted in the field, discusses the factors which can enhance the effectiveness of a case study research, and the debate ...

  24. How To Make Recommendation in Case Study (With Examples)

    1. Review Your Case Study's Problem. 2. Assess Your Case Study's Alternative Courses of Action. 3. Pick Your Case Study's Best Alternative Course of Action. 4. Explain in Detail Why You Recommend Your Preferred Course of Action. Examples of Recommendations in Case Study.

  25. 22 Content Marketing Examples to Inspire You

    High-Impact Case Studies. Case studies play a crucial role in the buyer journey. That's why 36% of marketers consider them to be the best-performing content type. These case study examples will give you a new way to share customer stories. 15. Gong's First-Person Case Study. Gong, a revenue intelligence tool, created this first-person case ...

  26. Technology Content Marketing Research 2024

    As in the previous year, the three most popular content types are short articles/posts (96%), case studies/customer stories (93%), and videos (90%). Eighty-two percent use thought leadership e-books/white papers, 81% use long articles/posts, 63% use data visualizations/visual content, 62% use product/technical data sheets, and 56% use research ...

  27. Enabling Health Information Recommendation Using Crowdsourced

    By harnessing user characteristics and feedback for content ranking, this methodology enables the creation of personalized recommendations that align with individual user needs within trusted health applications. ... Using a case study, we demonstrate the practical application of the proposed methodology through the implementation of ...

  28. Increase in residence hall recycling focus of study by students

    Their experiential learning exercise to study and leverage Lean & Six Sigma to identify recommendations was done in conjunction with BUAD 427: Applied Quality Operations in the Spring 2022 semester. A premise of the study was waste generation on university campuses can be comparable to that of small towns, so even modest improvements in ...

  29. Business school teaching case study: Unilever chief signals rethink on ESG

    Unilever has 'lost the plot' by fixating on sustainability, says Terry Smith. Companies take step back from making climate target promises. The real impact of the ESG backlash. Unilever's ...

  30. Application of Big Data of Appraisal System: a Case Study of News

    This study will apply big data technology to explore the resources of Chinese cultural news discourse, and deeply analyze the application of big data in the engagement system, attitude system, and graduation system, in order to provide references for related research.