How do you build a data product?
A data product is a software application or tool that uses data to help organizations make better decisions and improve their processes. Through a convenient user interface, it lets people who are not data scientists apply data science techniques such as predictive analytics, descriptive data modeling, data processing, deep learning, risk assessment, and various research methods. The main driver of adoption is that organizations want to reach their market goals through decisions informed by insights from data. Here, we provide step-by-step guidance on how to build your own data products.
May 27 2021 ● 8 min read
How to Build Great Data Products
Data-driven and machine learning-based products can be an effective way to meet consumer needs. They can also build a "data moat" that keeps competitors at bay. Google search and Amazon product recommendations are two prime examples, both of which improve as more people engage with them. However, the opportunity goes far beyond the tech giants: businesses of all sizes and industries are investing in their own data-driven products.
The lifecycle of a so-called "data product" is similar to that of a traditional product: find a way to address a key customer need, develop an initial version, test its impact, and iterate. However, the data component adds another layer of complexity. To manage it, businesses can prioritize cross-functional cooperation, evaluate and prioritize data product opportunities with a long-term perspective, and start small.
Stage 1: Identify the opportunity
Data products are a team sport
To find the best data product opportunities, the product and business perspective must be combined with the technology and data perspective. Product managers, user analysts, and business leaders have historically had the deep insight and industry knowledge required to identify the critical unmet user and business needs. Meanwhile, data scientists and engineers have a sharp eye for finding practical data-powered applications, as well as a deep understanding of what can and cannot be scaled.
Identifying and prioritizing the right data product opportunities requires bringing these two sides of the table together. A few norms can help:
Educate Data Scientists about the user and business needs:
Data scientists should be in close contact with product managers, consumer analysts, and business leaders, and part of their job should be to dig into the data to understand users and their needs.
Have data scientists serve as data evangelists:
Data scientists can share data opportunities with the rest of the organization. This can range from quick access to raw data and model performance samples in the early stages of ideation to full prototyping at a later stage.
Develop the data-savvy of product and business groups:
Individuals in various functions and industries are becoming more data-savvy, and companies can accelerate this development by investing in learning programs. The greater the data literacy of the product and business functions, the better they can collaborate with the data science and technology teams.
Give data science a seat at the table:
Data science can sit under different organizational structures (e.g., centralized or decentralized). Regardless of structure, having data science leaders participate in product and business strategy meetings can improve the creation of data products.
Prioritize with an eye on the future:
The best data products get better with age, like a fine wine. First, data products typically speed up data collection, which in turn improves the product. Consider a recommendation product powered by self-reported profile data from users. With today's sparse profile data, initial (or "cold start") recommendations may be uninspiring. However, people are more likely to complete a profile when it is used to personalize their experience, so launching recommendations will accelerate profile completion, which in turn improves the recommendations.
Second, many data products can be adapted to drive a variety of applications. This is not only about spreading expensive R&D across many use cases but also about generating network effects through shared data. If the data generated by each application flows back into the underlying data foundations, the applications improve, which leads to increased usage and thus more data collection, and the cycle continues. An example is Coursera's Skills Graph.
Stage 2: Build the product
De-risk by staging execution
In general, data products need validation of both the algorithm's functionality and user acceptance. As a result, data product developers face an intrinsic tension between how long to spend on R&D upfront and how fast to get the product out to validate that it addresses an essential need. Teams that overinvest in technical validation before validating product-market fit risk wasting R&D resources on the wrong problem or solution.
Teams that overinvest in validating consumer demand without adequate R&D risk delivering an underperforming prototype to consumers, resulting in a false negative. Teams at this end of the continuum might release an MVP with a weak model; if users don't respond well, the outcome might have been different if the application had been supported by stronger R&D.
While there is no magic formula for simultaneously validating the technology and the product-market fit, staged execution helps. Starting simple expedites both testing and the collection of valuable data. For example, when developing its Skills Graph, Coursera initially introduced a skills-based search. This application required only a small subset of the graph, but provided a wealth of additional training data. A few MVP approaches can also help minimize research time:
Lightweight models: These are usually quicker to ship and have the added advantage of being easier to explain, debug, and build upon over time. While deep learning can be powerful (and is definitely on the rise), it is not the best way to start in most situations.
External data sources: Development can be sped up by using open-source data or by purchasing or partnering for a solution. If the data the product itself produces later shows a clear signal, the product can be reworked to rely on that data as a differentiator.
Narrowing the domain: To begin, you may limit the reach of the algorithmic challenge. For example, some apps may initially be developed and released for only a subset of users or use cases.
Hand-curation: Having people either do the work you ultimately want the model to do, or at least review and refine the output of the initial model, can speed up development. This is best done with a view to how the hand-curation steps might be automated in the future to scale the product.
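To make the lightweight-model and hand-curation ideas more concrete, here is a minimal sketch of a skills-based search. It is not Coursera's implementation: the `Course` class, the `CATALOG` entries, the Jaccard overlap score, and the `review_threshold` value are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Course:
    title: str
    skills: frozenset  # hand-tagged skills (hypothetical data)


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def search(query_skills, catalog, review_threshold=0.3):
    """Rank courses by skill overlap and flag weak results for human review."""
    query = frozenset(s.lower() for s in query_skills)
    scored = [(jaccard(query, c.skills), c.title) for c in catalog]
    # Drop courses that share no skill with the query.
    scored = [(score, title) for score, title in scored if score > 0]
    # Best score first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    ranked = [title for _, title in scored]
    # Hand-curation hook: an empty or weak top result goes to a curator.
    needs_review = not scored or scored[0][0] < review_threshold
    return ranked, needs_review


# Hypothetical catalog, limited to a narrow domain on purpose.
CATALOG = [
    Course("Intro to SQL", frozenset({"sql", "databases"})),
    Course("Data Analysis with Python",
           frozenset({"python", "pandas", "statistics"})),
    Course("Machine Learning Basics",
           frozenset({"python", "statistics", "machine learning"})),
]

ranked, needs_review = search(["Python", "statistics"], CATALOG)
print(ranked, needs_review)
```

A model this simple is easy to explain and debug, and the `needs_review` flag marks the queries where a human curator adds the most value. Those reviewed cases could later serve as training data for a stronger model.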
Stage 3: Evaluate and iterate
Consider future potential when evaluating the performance of a data product
Evaluating results after a launch to make a 'Go' or 'No-go' decision is more complicated for a data product than for a quick UI tweak. That is because the data product can change significantly as more data is collected, and foundational data products enable even more features over time. Before you bin a data product that doesn't seem to be a clear winner, ask data scientists to quantify the answers to a few key questions. For example: 'how fast is the product improving organically through data collection?', 'how much low-hanging fruit is there for algorithmic improvements?', and 'what new applications will this enable in the future?'
Depending on the answers to these questions, a product with uninspiring metrics today may well be worth keeping.
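One way to put a number on the first question is to track a product metric at several data volumes and estimate how much it gains per tenfold increase in data. The sketch below is only an illustration under stated assumptions: the function name `improvement_per_tenfold`, the choice of a log-linear fit, and the numbers in `history` are all made up for the example.

```python
import math


def improvement_per_tenfold(observations):
    """Least-squares slope of a metric against log10(data volume).

    observations: list of (data_volume, metric) pairs with volume > 0.
    Returns the estimated metric gain for each 10x increase in data.
    """
    xs = [math.log10(volume) for volume, _ in observations]
    ys = [metric for _, metric in observations]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("data volumes must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up numbers: click-through rate measured at three data volumes.
history = [(1_000, 0.10), (10_000, 0.12), (100_000, 0.14)]
slope = improvement_per_tenfold(history)
print(f"Estimated gain per 10x data: {slope:.3f}")
```

A clearly positive slope suggests the product is still improving as data accumulates, which is one argument for keeping it despite modest metrics today. A flat slope suggests further gains would have to come from algorithmic work instead.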
Speed of iteration matters
Iterating on both the algorithms and the user interface is common in data products. The difficulty is determining, based on data and customer input, where the most valuable iterations lie, so that teams know which roles are responsible for driving improvements. Where algorithmic iterations are central, such as in dynamic recommendation or communication systems like Coursera's customized learning interventions, try to design the system so that data scientists can independently implement and validate new models.
By promoting cooperation between product and business leaders and data scientists, prioritizing investments with a view to the future, and starting small, businesses of all sizes can accelerate the development of robust data products that address core customer needs, drive the market, and build long-term competitive advantage.
Types of Data Products
There are several different kinds of data products. Even if we narrow down the potential products to those that meet our criteria, there is still a wide range of products. This diversity introduces new complexities to product development.
These data products can be grouped into five different categories: Raw Data, Derived Data, Algorithms, Decision Support, and Automated Decision Making.
- Raw data: We collect data and make it available in its raw form.
- Derived data: We do some of the computation on our end while supplying users with derived results.
- Algorithms: Then there are algorithms or algorithms-as-a-service. We get some data, run it through an algorithm (machine learning or otherwise), and return knowledge or insights.
- Decision support: We try to provide facts to the customer to help them make a decision, but we don't make the decision ourselves.
- Automated decision-making: Here we outsource all of the intelligence within a given domain to the product. Famous examples include Netflix's recommendations and Spotify's Discover Weekly.
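The five categories above can be illustrated on one toy dataset. The following is a minimal sketch, not a reference design: the order counts in `RAW_ORDERS`, the naive three-day-mean forecast, and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical daily order counts, exactly as collected.
RAW_ORDERS = {"mon": 120, "tue": 95, "wed": 130, "thu": 80, "fri": 150}


def raw_data():
    """Raw data: hand the records over untouched."""
    return dict(RAW_ORDERS)


def derived_data():
    """Derived data: precompute a summary so users do not have to."""
    return {"average_daily_orders": mean(RAW_ORDERS.values())}


def forecast_next_day(orders):
    """Algorithm: a naive forecast, the mean of the last three values."""
    return mean(orders[-3:])


def decision_support(stock_on_hand):
    """Decision support: surface the facts, leave the decision to a person."""
    forecast = forecast_next_day(list(RAW_ORDERS.values()))
    return {
        "forecast": forecast,
        "stock_on_hand": stock_on_hand,
        "shortfall": max(0, forecast - stock_on_hand),
    }


def automated_decision(stock_on_hand):
    """Automated decision-making: act on the forecast without a human."""
    shortfall = decision_support(stock_on_hand)["shortfall"]
    if shortfall > 0:
        return f"reorder {shortfall:.0f} units"
    return "no reorder"


print(raw_data())
print(derived_data())
print(decision_support(stock_on_hand=100))
print(automated_decision(stock_on_hand=100))
```

Each step moves more of the interpretation from the user into the product: the raw and derived functions only return data, `decision_support` presents a forecast next to the current stock, and `automated_decision` turns the same facts into an action.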
What value does a data product provide?
The right data products can help organizations and enterprises extract intelligence from their data to make predictions, reduce costs, and generate more sales. Businesses and organizations can draw this data from a wide range of sources, including data mined from customers and users, business performance indicators, and other sources.
Data products are most valuable when they meet a specific need within a market. For example, if a business wishes to create an app that recognizes a flower you hold up to your phone camera, it must build a custom tool that works with a specialized dataset of plant details. Businesses can use customizable data products to transform the abundance of data at their disposal into actionable information tailored to their individual needs. For instance, a call center can analyze the skill level and pace of its employees, and a headhunting company can track how many of its clients are finding employment.
Published on May 27 2021
WRITTEN BY
Indrė Jankutė-Carmaciu
Indrė is a copywriter at Whatagraph with extensive experience in search engine optimization and public relations. She holds a degree in International Relations, while her professional background includes different marketing and advertising niches. She manages to merge marketing strategy and public speaking while educating readers on how to automate their businesses.