Data Management Articles
You get it from your customers, your products, and your employees — data. But now that you have it, how do you sort it out? Let us help.
Article / Updated 01-26-2024
In this article you will learn:
- What is data observability?
- Why is data observability necessary for your data platform?
- What are the must-have features of a strong data observability platform?
What is data observability?
Data is increasingly important to today’s businesses, and ensuring the quality and reliability of that data is critical. High-quality data is the fuel for everything from building new products to driving accurate decision making. Data observability was created to make ensuring the quality of data easier, faster, and more scalable over the long term.
Data observability gives organizations a complete view of their data's health at every stage, from data pipelines to infrastructure, as well as delivering at-a-glance views of dependencies and relationships between datasets. By leveraging data observability, data teams can quickly identify and resolve data quality issues before they reach data consumers, effectively reducing costs, minimizing impact, and driving confidence in the data products they protect.
Why is data observability necessary?
Data downtime (time when data is incomplete, erroneous, missing, or otherwise inaccurate) can be disastrous for organizations. From misallocated budgets to broken AI models, data quality issues can wreak havoc on organizations of all kinds.
While data quality testing and monitoring are relatively common practices, data observability goes beyond the traditional methods of testing and monitoring. Data observability manages and improves data quality at scale by leveraging automated monitoring, custom rules, root cause analysis tools, and impact analysis to not only catch and resolve known data quality incidents faster but to detect and resolve unknown data quality issues as well.
Five must-have elements of a strong data observability platform
Choosing the right data observability tool can help your company avoid a host of serious and costly data quality incidents, so it’s important to know what features you should have on your shopping list. Below are five features you should look for when considering a data observability solution for your data stack.
ML-powered deep and broad data monitoring, both out-of-the-box and custom monitors
A key aspect of an effective data observability platform is its use of machine learning (ML) for data monitoring. Platforms with ML enable teams to programmatically identify data quality and performance issues, such as data freshness problems, volume issues, and schema changes, out of the box. Data observability platforms also offer the ability to create custom monitors that are tailored to your specific business needs and applied to your most critical tables, providing deep monitoring where you need it and allowing you to tackle recurrent data issues that can crop up within specific data environments.
End-to-end integrations across cloud and on-prem tooling
An effective data observability platform should work with tools both in the cloud and on-prem. This integration allows for comprehensive oversight of your data platform, from ingestion and storage to transformation and consumption. It also helps track data movement across a variety of settings, which improves the platform's ability to find and fix quality issues quickly and effectively.
Incident triaging and resolution workflows
To reduce the impact of data problems, it's important to have effective workflows for triaging and resolving incidents. A good data observability platform simplifies the steps to detect, triage, resolve, and measure data quality issues.
This usually involves automatic alerts, tools to prioritize issues by severity and impact, and robust integrations with messaging and project management tools that complement existing workflows. Efficient prioritization means data teams can concentrate on the most urgent problems, which helps decrease delays and keeps the data accurate and reliable.
Root cause and impact analysis via field-level lineage
Identifying the underlying cause of a data quality issue is essential to preventing it in the future. An effective data observability platform will provide field-level lineage, which gives an at-a-glance view of where the data came from, how it was changed, and what dependencies or data products are impacted by it. This information allows data teams to quickly understand the root cause of an issue upon detection, decide who’s responsible for resolving it, and determine who should be informed to minimize cost.
Performance monitoring: query optimization and cloud cost management
A key part of a strong data observability platform is performance monitoring, which includes improving query efficiency and managing cloud costs. This function helps you find and fix inefficient data queries and processes that could raise operating costs or slow down performance. By making queries more efficient and optimizing cloud resources, organizations can make their data operations more cost-effective and deliver greater value from their data platform at a significantly lower cost.
Pioneering the future of reliable data with data observability
Implementing a data observability platform that includes these five elements will empower your organization to reduce data downtime, improve data reliability, deliver more value for stakeholders, and foster an environment of data trust across your organization.
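The freshness, volume, and schema monitors described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the `TableSnapshot` type, the function names, and the thresholds are all hypothetical, and the checks use fixed rules where a real observability platform would typically learn expected ranges with ML.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TableSnapshot:
    """Hypothetical metadata a monitor might collect for one table."""
    name: str
    last_loaded_at: datetime
    row_count: int
    columns: dict  # column name -> type name


def check_freshness(snap, now, max_age):
    """Flag the table if its last load is older than max_age."""
    if now - snap.last_loaded_at > max_age:
        return [f"{snap.name}: stale, last loaded {snap.last_loaded_at:%Y-%m-%d %H:%M}"]
    return []


def check_volume(snap, expected_rows, tolerance=0.5):
    """Flag the table if its row count deviates from the expectation by more than tolerance."""
    if expected_rows and abs(snap.row_count - expected_rows) / expected_rows > tolerance:
        return [f"{snap.name}: volume anomaly, {snap.row_count} rows vs. ~{expected_rows} expected"]
    return []


def check_schema(snap, expected_columns):
    """Flag columns that were removed, added, or changed type."""
    issues = []
    for col in sorted(expected_columns.keys() - snap.columns.keys()):
        issues.append(f"{snap.name}: column removed: {col}")
    for col in sorted(snap.columns.keys() - expected_columns.keys()):
        issues.append(f"{snap.name}: column added: {col}")
    for col in sorted(expected_columns.keys() & snap.columns.keys()):
        if expected_columns[col] != snap.columns[col]:
            issues.append(
                f"{snap.name}: column {col} changed type "
                f"from {expected_columns[col]} to {snap.columns[col]}"
            )
    return issues


def run_monitors(snap, now, max_age, expected_rows, expected_columns):
    """Run all three checks and return the combined list of issues."""
    return (
        check_freshness(snap, now, max_age)
        + check_volume(snap, expected_rows)
        + check_schema(snap, expected_columns)
    )


if __name__ == "__main__":
    # Illustrative data only: a table loaded two days ago with far fewer rows than usual.
    snapshot = TableSnapshot(
        name="orders",
        last_loaded_at=datetime(2024, 1, 24, 12, 0),
        row_count=100,
        columns={"id": "int", "total": "float"},
    )
    for issue in run_monitors(
        snapshot,
        now=datetime(2024, 1, 26, 12, 0),
        max_age=timedelta(hours=24),
        expected_rows=1000,
        expected_columns={"id": "int", "total": "str", "email": "str"},
    ):
        print(issue)
```

Each check returns a list of human-readable issues, so the combined output could feed the alerting and triage workflow the article describes.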
Download Data Observability For Dummies to discover how data observability can help you improve your data reliability, build organizational trust, and deliver even more value from your data products.
Article / Updated 01-31-2023
To be effective at their jobs, staff in an organization want to find the data they need quickly, and they want it to be high-quality data. This means the data needs to be accurate and current. Leaders want data to provide the basis for rich insights that enable timely and informed data-driven decision-making. The legal department requires data to be handled by everyone in a manner consistent with laws and regulations. Product designers want data to inform creative decisions that align with marketplace demands and customer trends. Security professionals are tasked with ensuring that the data is appropriately protected. Undoubtedly, a wide range of stakeholders want to harness the remarkable power of data.
To achieve these and other increasingly common business demands, you need some form of data control and accountability in your enterprise. Quality results require the diligent management of your organization’s data. Data governance is all about managing data well.
Well-managed data can drive growth
Today, when data is managed well, it can drive innovation and growth and can be an enterprise’s most abundant and important lever for success. Well-managed data can be transformational, and it can support the desirable qualities of a data-driven culture. In such a culture, decisions at all levels of the organization are made using data in an informed and structured manner such that they deliver better outcomes internally and to customers.
Research confirms that most business leaders today want their organizations to be data-driven, but, according to a survey by NewVantage Partners, only around 32 percent are achieving that goal.
Successful data governance also means that data risks can be minimized, and data compliance and regulatory requirements can be met with ease. This can bring important comfort to business leaders who, in some jurisdictions, can now be personally liable for issues arising from poor data management.
Every organization manages data at some level. All businesses generate, process, use, and store data as a result of their daily operations. But there’s a huge difference between businesses that casually manage data and those that consider data to be a valuable asset and treat it accordingly. This difference is characterized by the degree to which data management is formalized.
Broadly, the discipline in which an organization acts in recognition of the value of its information assets (a fancy term for data with specific value to an organization, such as a customer or product record) is called enterprise information management (EIM). Governing and managing data well is a central enabler of EIM, which also includes using technologies and processes to elevate data to be a shared enterprise asset.
Data governance versus data management
Within the EIM space there are many terms that sound like they might mean the same thing. There is often confusion about the difference between data governance and data management. Data governance is focused on roles and responsibilities, policies, definitions, metrics, and the lifecycle of data. Data management is the technical implementation of data governance. For example, databases, data warehouses and lakes, application programming interfaces (APIs), analytics software, encryption, data crunching, and architectural design and implementation are all data management features and functions.
Data governance versus information governance
Similarly, in EIM, you may want clarity on the difference between data governance and information governance. Data governance generally focuses on data, independent of its meaning. For example, you may want to govern the security of patient data and staff data from a policy and process perspective, despite their differences. The interest here is in the data, not as much in the business context.
Information governance is entirely concerned with the meaning of the data and its relationship in terms of outcomes and value to the organization, customers, and other stakeholders.
You might experience obvious overlap between the two terms. For sure, as a data governance practitioner, to some extent you’ll be operating in both the data and information governance worlds each day. This shouldn’t present an issue as long as the strategy for data governance is well understood. My view is that understanding the context of data (a concept known as data intelligence) and the desired business outcomes complements data governance efforts in a valuable manner.
The value of data governance
If an organization considers data to be a priority and it puts in place processes and policies to leverage the data’s value and reduce data risks, that organization is demonstrating a strong commitment to data controls and accountabilities. In other words, that organization values data governance.
An increasing number of businesses value data governance; in fact, according to Anmut, a data consultancy, 91 percent of business leaders say that data is a critical part of their organization’s success.
Fundamentally, data governance is driven by a desire to increase the value of data and reduce the risks associated with it. It forces a leap from an ad hoc approach to data to one that is strategic in nature.
Some of the main advantages achieved by good data governance include:
- Improved data quality
- Expanded data value
- Increased data compliance
- Improved data-driven decision-making
- Enhanced business performance
- Greater sharing and use of data across the enterprise and externally
- Increased data availability and accessibility
- Improved data search
- Reduced risks from data-related issues
- Reduced data management costs
- Established rules for handling data
Any one of these alone is desirable, but a well-executed and maintained data governance program will deliver many of these and more.
In the absence of formalized data governance, organizations will continue to struggle to achieve these advantages and may, in fact, suffer negative consequences. For example, poor-quality data that is outdated, inaccurate, or incomplete can lead to operating inefficiencies and poor decision-making. Data governance does not emerge by chance. It’s a choice and requires organizational commitment and investment.
Cheat Sheet / Updated 01-06-2023
Customer analytics is different from many business metrics you're probably familiar with: It focuses on customers' needs rather than on the company's needs. Through customer analytics, you can understand what drives customer satisfaction, customer loyalty, and repeat purchases. You'll also understand how your customers differ or are alike and how that may affect different pricing strategies, features, and marketing campaigns.
Article / Updated 12-07-2022
In general, a data governance tool is one that assists in the creation and maintenance of policies, procedures, and processes that control how data is stored, used, and managed.
No doubt, many aspects of data governance are complex, particularly in larger organizations. Fortunately, as expected from a competitive marketplace, where there is opportunity, you will find providers and their software solutions only too willing to help.
As data has grown in its significance to every organization, particularly in just the last few years during the Cambrian explosion of data, many innovative data tools have been introduced. Some of the software has emerged from the largest technology players, such as Microsoft, Oracle, IBM, CA, Informatica, and SAP, but mid-sized companies and even startups have also entered this lucrative space.
I’m not going to list solutions here, as there’s always a risk of implying some bias or leaving out an obvious player. Plus, and this is probably the bigger reason, the marketplace is changing too fast, and any list I provide will inevitably be dated quickly.
The quantity and quality of innovative data tools recently introduced have been game-changers. The figure below is illustrative of many of the areas now addressed with software tools.
With the increasing use of technologies, such as artificial intelligence, data management, governance, and analytics (and frankly, all aspects of data science), organizations have benefitted from increased automation, better decision-making, improved efficiencies and speed, higher data quality, greater compliance, and even the ability to contribute to increased revenue.
To achieve these potential benefits, it’s certainly important for your organization to evaluate what tools may make sense.
Selecting data governance tools
Determining what tools you need, like so many things, depends on several factors.
Considerations will often include:
- Business priorities and requirements
- The suite of data tools already available in the organization
- The complexity of the data environment
- The complexity of the IT infrastructure
- Current maturity level of data governance
- A narrow or broad focus of data governance objectives
- Skill sets of the data governance team and data staff across the organization
- Available budget
- Data governance team appetite for automation and system administration
Tool requirements may emerge out of an existing pain point, like so many solutions do. But deciding on a toolset may also be the product of a requirements-gathering process that considers the items in this list and others.
Some of the common features now found in data governance tools include:
- Data discovery, collation, and cataloging: A mechanism to identify, collate, and support data set search.
- Data quality management: Tools that identify and correct flaws, cleanse, validate, and transform data.
- Master data management (MDM): This is covered earlier in the chapter in the “Master data management” section.
- Data analytics: An application to enable the discovery of insights in data.
- Reporting platform: A solution to generate all manner of business reports.
- Data visualization: An application that uses graphical elements as a way to see and understand trends, outliers, and patterns in data.
- Data glossary and dictionary: A repository that contains terms and definitions used to describe data and its usage context.
- Compliance tools: Solutions that automate and facilitate processes and procedures that support industry, legal, security, regulatory, and compliance requirements.
- Policy management: A tool that helps in the creation of policies, supports their review and approval, distributes them to impacted staff, and can track that team members have received or viewed content.
- Data lineage: A solution that identifies, maps, and explains the source and destination of data, including its origin and stops along the way.
Data lineage is also known as data provenance.
Keep in mind that some tools are designed to do one or more of these tasks really well, while other solutions try to provide an entire suite of solutions. Needs, cost, and complexity are factors when determining whether to buy a single-feature or full-suite solution.
DataOps and DevOps
A defining characteristic of the early years of the 21st century is the need to innovate at speed. In an unforgiving marketplace, organizations that are slow to improve their internal processes or cannot bring products and services to the market are at a disadvantage, which can result in business failure.
In this context, greater emphasis has been placed on finding ways to accelerate innovation and produce more frequent deliverables. With technology playing such a central role in innovation, it was observed that the teams that created solutions (primarily based on software) and those responsible for deploying and supporting the code were not aligned. These two groups, the developers and the IT operations teams, for example, reported to different leaders and had dissimilar performance goals.
Around 2007, a movement started to better integrate development and operations that was aptly named DevOps. DevOps is a reimagining of how to build and deliver solutions quickly. It incorporates automation, collaboration, communication, feedback, and iterative development cycles.
In a similar fashion, but on the premise that organizations were struggling with data volume and velocity, and the slow speed of deriving insights, it was observed that efficiencies could be gained in rethinking the lifecycle of data within the enterprise. Using the concepts and successes of DevOps, around 2014, a new approach to data analytics emerged called DataOps. Some called it DevOps for data science.
The figure below shows the data management areas that are being automated (the shaded areas) with DataOps.
Like DevOps, DataOps uses contemporary work approaches such as collaboration, tools, and automation to find efficiencies and deliver higher quality and quicker insights. You can think of DataOps as a way to kick data analytics into high gear.
Central to DataOps is the emphasis on collaboration between participants in the data value chain. This includes data analysts, data engineers, IT team members, quality control, and data governance.
In addition, like DevOps, DataOps proposes an agile approach to delivering data solutions. Instead of long periods of requirements analysis, design, and then development, work is broken into smaller chunks and priority is given to delivering value quickly and often. Cycle times are compressed, and business users get the data they need sooner.
As an example of inefficiencies in the absence of DataOps, suppose a marketing leader requests the development of a new monthly report. In an organization with a traditional development lifecycle, it can take weeks and even months to elicit and validate the requirements for the report, design and develop it, receive feedback and make changes, and then deploy it. The long cycle times lead to disappointment and missed opportunities, and they deter data requestors from even making requests.
DataOps changes the game on requests like these through a mix of agile methods, improved collaboration, and automation. Recent research revealed that many companies that embraced DataOps and agile practices were experiencing a 60 percent increase in revenues and profit growth.
DataOps can be implemented through team structuring and new processes. But it can also be facilitated through new supporting tools that include artificial intelligence and automation. A dynamic marketplace has emerged that will provide you with many options and new capabilities to accelerate your data analytics cycle times.
DataOps is a type of data governance in that it focuses on improved and faster methods to deliver more data value and quality while also considering risk. In addition, it requires the participation and support of the data governance team to help with policies, standards, quality control, and security considerations. DataOps tools can also give data governance teams new, actionable visibility to data use, flow, and challenges in the organization. Some say DataOps is the future of data governance. The evidence is certainly pointing in that direction.
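The data lineage capability listed among the common tool features in this article (identifying the source, destination, and stops along the way for a piece of data) can be pictured as a walk over a graph of fields. The following is a minimal sketch, not how any particular product works: the `LINEAGE` mapping, its field names, and the `downstream` and `upstream` helpers are all hypothetical.

```python
from collections import deque

# Hypothetical field-level lineage: each key feeds the fields listed in its value.
LINEAGE = {
    "crm.customers.email": ["warehouse.dim_customer.email"],
    "warehouse.dim_customer.email": [
        "reports.churn.contact_email",
        "ml.features.email_domain",
    ],
    "erp.orders.total": ["warehouse.fact_orders.total"],
    "warehouse.fact_orders.total": ["reports.revenue.monthly_total"],
}


def downstream(field, lineage):
    """Return every field that directly or indirectly depends on `field` (breadth-first)."""
    found = []
    visited = {field}
    queue = deque([field])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in visited:
                visited.add(child)
                found.append(child)
                queue.append(child)
    return found


def upstream(field, lineage):
    """Return every field that `field` is derived from, by walking the edges in reverse."""
    reverse = {}
    for source, targets in lineage.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return downstream(field, reverse)


if __name__ == "__main__":
    # Impact: what is affected if the CRM email field breaks?
    print(downstream("crm.customers.email", LINEAGE))
    # Origin: where does the monthly revenue total come from?
    print(upstream("reports.revenue.monthly_total", LINEAGE))
```

Walking the graph forward answers the impact question (which reports or models are affected), and walking it backward answers the origin question (where a value came from).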
Article / Updated 12-07-2022
You can’t buy a data governance program off-the-shelf. That’s actually good news. Each organization must implement a program relative to its level of interest, as well as its needs, budget, and capabilities. Even a modest effort can produce meaningful results.
Glancing at all the areas in the figure below may seem overwhelming, but not all of these elements need to be addressed (certainly not at first), and there are different degrees to which each can be pursued. As you read and learn about them in this book, you can decide what makes most sense for your organization.
Regardless of how and to what degree you implement the elements of a data governance program, you’ll need a basic set of guiding concepts and a structure in which to apply them. This is called the data governance framework.
While there are many framework variations to choose from, including ISACA’s Control Objectives for Information and Related Technologies (COBIT) IT governance framework, they share some common components that address people, process, and technology.
I’ve done the hard work of distilling down the most important qualities of a data governance framework and captured them in the figure below. These components are explored in detail in the book Data Governance For Dummies. It covers everything you need to know about how to implement a basic data governance framework.
The data governance components in this figure are not in a specific order, with the exception of leadership and strategy, which is a prerequisite for the rest of the framework.
Leadership and strategy
Your data governance program must be aligned with the strategy of the organization. For example, how can data governance support the role that data plays in enabling growth in specific markets? Data plays a role in many aspects of organizational strategy, including risk management, innovation, and operational efficiencies, so you must ensure there’s a clear alignment between these aspects and the goals of data governance.
The disconnect between business goals and data governance is the number one reason that data governance programs fail. When mapping organizational strategy to data governance, you need the support, agreement, and sponsorship of senior leadership. I’ll be blunt about this: Without full support from your organization’s leaders, your data governance efforts won’t succeed.
Roles and responsibilities
Your data governance program will only be possible with the right people doing the right things at the right time. Every data governance framework includes the identification and assignment of specific roles and responsibilities, which range from the information technology (IT) team to data stewards. While specific roles do exist, your organization must understand that data governance requires responsibilities from nearly everyone.
Policies, processes, and standards
At the heart of every data governance program are the policies, processes, and standards that guide responsibilities and support uniformity across the organization. Each of these must be designed, developed, and deployed. Depending on the size and complexity of the organization, this can take significant effort. Policies, processes, and standards must include accountability and enforcement components; otherwise it’s possible they will be dead on arrival.
Metrics
The data governance program must have a mechanism to measure whether it is delivering the expected results. Capturing metrics and delivering them to a variety of stakeholders is important for maintaining support, which includes funding. You’ll want to know if your efforts are delivering on the promise of the program. Based on the metrics, you and your team can make continuous improvements (or make radical changes) to ensure that the program is producing value.
Tools
Fortunately, a large marketplace now exists for tools in support of data governance and management.
These include tools for master data management, data catalogs, search, security, integration, analytics, and compliance. In recent years, many data science-related tools have made leaps in terms of incorporating ease-of-use and automation. What used to be complex has been democratized, empowering more team members to better manage and derive value from data.
Communications and collaboration
With the introduction of data governance and the ongoing, sometimes evolving, requirements, high-quality communications are key. This takes many forms, including in-person meetings, emails, newsletters, and workshops. Change management, in particular, requires careful attention to ensure that impacted team members understand how the changes brought about by the data governance program affect them and their obligations.
A large number of disparate stakeholders need to work together in order to effectively govern data. Collaboration is essential and can be the difference between success and failure. Good collaboration requires a positive culture that rewards teamwork. It also requires clear channels between teams, such as regular meetings. Online collaboration platforms are increasingly being used too.
Cheat Sheet / Updated 11-29-2022
This Cheat Sheet summarizes two important aspects of data governance: creating policy documents and the responsibilities of a data governance council.
Cheat Sheet / Updated 02-24-2022
The business intelligence process creates an environment for better decision-making. To make successful business decisions, you need to gain insight into business intelligence, follow the main steps of the key performance indicator (KPI) cycle, find the best source to store and process operational data, and assess and use standard business intelligence applications.
Article / Updated 08-26-2016
To the long-standing list of business capabilities, smart businesses are adding another category: data management.
How you collect quality data, capture and maintain business and customer information, extract usable information, monitor key indicators, ensure data security, and apply your findings to enhance marketing, production, productivity, growth, and product innovation has become an increasingly important aspect of running a successful business.
Businesses leverage collected data in an ever-lengthening list of ways, including, but by no means limited to, these examples:
- Increase efficiency by making customer information, production status, inventory, and other business information accessible to on-site and remote employees.
- Segment target customers in order to tailor offerings and messages accordingly.
- Offer customers predictive services; for example, social media sites present “people you may know” and online retailers provide “purchases made by others shopping for this item” suggestions.
- Anticipate customer volume based on trends from past purchase patterns and forecasted conditions; for example, forecasts for utility usage or restaurant volume and selection based on weather forecasts.
- Track, enhance, and benefit from customer on-site visits. Some retailers use a hardware device called a beacon to activate and send customized messages to downloaded apps on customer mobile devices. Others improve retail and display layouts by monitoring captured video to map customer traffic and to monitor where customers stop compared to purchase volume from that point.
- Stay on top of reviews and ratings, encouraging customer posts by following purchases with review invitations and consistently monitoring review sites for insights and possible responses.
- Create data-driven products or services that can generate revenue or deliver competitive advantages.
As an example of a data-driven revenue generator, LinkedIn bundles its user profile data into a recruiting and staffing tool purchased by headhunters and hiring companies. As an example of a value-added data-driven service, the real estate site Zillow offers a free automated home value estimation tool, Zestimate, that’s helped the site attract visitors, achieve top-of-mind awareness, and draw industry-leading visitor counts.
In your written plan, include a statement about your data management plans, including the objective of data collection, the source and ownership of data, and how you plan to apply collected data to strengthen products, marketing, and business operations.
As you develop data capabilities for your business, keep in mind that you need to either own or have permission to use the data you’re accessing and leveraging. Social media networks, for example, collect enough data to serve as their own data sources, whereas other businesses form data-gathering affiliations, purchase data, or harvest data from publicly available sources.
Article / Updated 03-26-2016
Businesses digitally store a tremendous amount of operational data, and for business intelligence to function, it needs wide-open roads between data sources. Mainframe legacy systems still form the foundation of many companies’ data centers because of their ability to process and harbor huge quantities of data, but their data is notoriously difficult to get to, as many of the legacy applications are obsolete, proprietary, or pre-standards software. Other options for data sources are:
- Enterprise Resource Planning (ERP): Often implemented throughout the organization in modules that map to specific business domains, such as supply-chain, human resources, finance, accounts payable, and so on. ERP systems store a lot of transactional data used in today’s BI environments.
- Customer Relationship Management (CRM): A common data source for business intelligence, CRM systems do just what they say: they process and store customer profile and behavior information, like purchase activity.
- E-Commerce: Web applications can act as source data systems for business intelligence platforms by feeding real-time sales activity.
Article / Updated 03-26-2016
To help your company drive smart decisions and improve the way you do business, check out this variety of forms that can provide insight into business intelligence (BI).
- Query responses: Raw data produced by the BI system, allowing the user to draw immediate conclusions
- Reports: Structured and formatted data, built as part of a scheduled event, or on the fly as an ad hoc report
- Derived Analysis: Insights produced by interpretation of a front-end system’s output, after that application has applied rules, heuristics, other business information, and context to it, such as in a dashboard or scorecard
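The three forms above can be contrasted with a small sketch that starts from the same rows each time. Everything here is illustrative: the `QUERY_RESPONSE` data, the `build_report` and `derive_scorecard` helpers, and the 90 percent warning threshold are hypothetical and not drawn from any specific BI product.

```python
# Hypothetical rows, standing in for a raw query response from a BI system.
QUERY_RESPONSE = [
    {"region": "East", "sales": 120_000, "target": 100_000},
    {"region": "West", "sales": 80_000, "target": 100_000},
]


def build_report(rows):
    """Report: the same data, structured and formatted for reading."""
    lines = [f"{'Region':<8}{'Sales':>10}{'Target':>10}"]
    for row in rows:
        lines.append(f"{row['region']:<8}{row['sales']:>10,}{row['target']:>10,}")
    return "\n".join(lines)


def derive_scorecard(rows, warn_below=0.9):
    """Derived analysis: apply a business rule (sales versus target) to rate each region."""
    scorecard = {}
    for row in rows:
        attainment = row["sales"] / row["target"]
        if attainment >= 1.0:
            status = "on target"
        elif attainment >= warn_below:
            status = "at risk"
        else:
            status = "off target"
        scorecard[row["region"]] = status
    return scorecard


if __name__ == "__main__":
    print(QUERY_RESPONSE)                    # query response: raw data
    print(build_report(QUERY_RESPONSE))      # report: formatted data
    print(derive_scorecard(QUERY_RESPONSE))  # derived analysis: rule applied
```

The raw rows leave interpretation to the reader, the report only changes presentation, and the scorecard adds a rule and context on top of the data.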