A data miner has nothing without data. And if you work in a large organization, you’ll have hundreds, perhaps thousands, of existing data resources potentially available for data mining. Every activity generates records, and those records can become your raw material. The following table shows the variety of data commonly collected across a number of business activities.
Business activity | Data collected |
---|---|
Research | Competitor product information; experimental and test data |
Manufacturing | Process data; procurement records; production records; inspection and test records |
Marketing | Competitor marketing information and sales data; campaign data; marketing cost data |
Sales | Sales activity; sales data; customer information |
Fulfillment | Packaging records; shipping records; shipping complaints |
Customer service | Customer interaction records; product and service complaints; service issues |
Technical support | Support requests; product problem reports; design and other product suggestions |
Training | Staff training records; customer training records; certification and other credentialing records |
Accounting | Bills; payments; audit records; taxes collected and paid |
That’s a pretty long list, yet it’s really only a tiny sample of the activities and related data that are already waiting somewhere within your business.
But knowing that data exists is not the same thing as being able to access and use it for data mining. For one thing, you’ll need much more specific information about exactly what internal data is relevant to the specific business problem you’re investigating. Who collects it? Who controls access? What variables (fields) are recorded, and for what range of time or activity? Where can you find documentation?
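One way to keep track of the answers to these questions is a simple inventory record per data source. The following is a minimal sketch, not a prescribed method: the `DataSource` class, its field names, and the sample values (including the dates) are all hypothetical and would need to be adapted to your own organization.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataSource:
    """One entry in a hypothetical inventory of internal data sources."""
    name: str
    collected_by: str        # Who collects it?
    access_owner: str        # Who controls access? Empty if not yet known.
    fields: list[str]        # What variables (fields) are recorded?
    start: date              # First date covered by the records
    end: date                # Last date covered by the records
    documentation: str = ""  # Where is the documentation? Empty if not yet known.


def open_questions(source: DataSource) -> list[str]:
    """Return the questions that still have no answer for this source."""
    missing = []
    if not source.collected_by:
        missing.append("Who collects it?")
    if not source.access_owner:
        missing.append("Who controls access?")
    if not source.fields:
        missing.append("What variables are recorded?")
    if not source.documentation:
        missing.append("Where is the documentation?")
    return missing


# Invented example entry: access owner and documentation are still unknown.
shipping = DataSource(
    name="shipping_records",
    collected_by="Fulfillment",
    access_owner="",
    fields=["order_id", "ship_date", "carrier"],
    start=date(2020, 1, 1),
    end=date(2023, 12, 31),
)

print(open_questions(shipping))
```

A list like this makes it easy to see which sources are ready to use and which still need a conversation with the people who manage them.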
Appreciating your own data
You and your manager might choose from a number of options when selecting which project to tackle with data mining. You always have a choice of tools. But when it comes to data, you may have no choice at all: You use the data available to you or your company right now.
You may have doubts about this data. You are sure to know something about its flaws. And you may have heard about other organizations that have larger quantities of data or different types of data than your own.
Nonetheless, your organization’s internal data, the information collected in the course of everyday business, is your most valuable resource. It’s the very best data that you can have for data mining. It is superior to all external sources in a number of ways:
Unique relevance: The data pertains to your own business, with all its distinctive characteristics. It’s about your own customers, your own products, your own business practices. Whatever you may discover in this data will clearly also be relevant to the business. Nobody will be able to reject your results with the “but our business is different” excuse.
Transparency: You know (or you can find out) the sources of your own data. No mysteries should exist about the definitions of variables, the data collection methods, the time, the place, or the people involved.
Detail: You’ll have raw data, collected in the finest possible level of detail.
Range: Your data resources cover the full scope of activity taking place in your business.
Competitive advantage: Only you have your own internal data. It is not available to your current or your upcoming competitors.
Development potential: You can build on your own data in ways that would not be possible with data from any outside source. If you want to integrate information from multiple sources, your data will contain the identifiers you need to do that.
If you want to know more about customers, you have their names and contact information, and you can refer to other records, survey them, or even call and have a personal conversation. If you need more detailed or additional data, you may be able to change a data collection practice.
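To illustrate how shared identifiers make integration possible, here is a small sketch that combines two internal record sets on a common key. It assumes both sources carry a `customer_id` field; the records, field names, and the `join_on_id` helper are invented for illustration only.

```python
# Illustrative records only; the field names and values are invented.
customers = [
    {"customer_id": 101, "name": "Ada"},
    {"customer_id": 102, "name": "Ben"},
]
complaints = [
    {"customer_id": 101, "issue": "late delivery"},
    {"customer_id": 101, "issue": "damaged box"},
    {"customer_id": 103, "issue": "wrong item"},
]


def join_on_id(left, right, key):
    """Attach each row in `right` to the row in `left` with the same key.

    Returns (joined, unmatched), where unmatched holds rows from `right`
    whose key value does not appear in `left`.
    """
    index = {row[key]: row for row in left}  # look up left rows by identifier
    joined, unmatched = [], []
    for row in right:
        match = index.get(row[key])
        if match is None:
            unmatched.append(row)
        else:
            joined.append({**match, **row})
    return joined, unmatched


joined, unmatched = join_on_id(customers, complaints, "customer_id")
for row in joined:
    print(row)
print("No matching customer:", unmatched)
```

Rows that fail to match are worth keeping in view rather than discarding, because they often point to data quality problems in one of the sources.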
Another nice thing about your own data: You own it. Any data collection costs were covered by the business unit that generated the data in the first place. You’ll pay no fees and have no licensing issues to consider when using and reusing the data. (You may face data storage and other data management issues, but that’s true for any data source.)
Your own data resources will not be perfect in every way. You might discover that some data you’d like to use has not been collected, or has been discarded. You’re bound to encounter some data quality problems. And, of course, internal data has limits — it tells you about your own organization, but not your competitors. Still, internal data will always be your primary and most valuable data resource.
Handling data with respect
Data mining, like any kind of data analysis or reporting, uses a lot of data, much more than most everyday business activities. When you access data and perform analysis, you must be careful to do so in ways that stay within your company’s guidelines and that don’t interfere with routine business processes.
Data resources can be just as precious, and just as private, as cash. Get off to the right start in data mining by treating data with respect and learning the data management and governance practices that affect your work.
Failure to follow legal requirements and good business practices for data governance can lead to serious trouble. It’s important that data not be accessed by people who should not use it, that records not be improperly changed or destroyed, and that new data you create be properly archived. Documentation is a necessity. Many of these requirements will be relevant to your work in data mining.
This may not be simple. You’ll have to find out what data is available, how to get access, and how to handle the data properly so that you don’t get in the way of others. In short, you’ll have to get involved with new things and new people. And it will be worth it, because you’ll get more done and broaden your own horizons as a result.
You’ll have to find out new things, but you won’t have to become a data governance expert. You can rely on the others in your organization who are experts in data governance and data management. Work with them constructively, and they will help you to stay within the law and to follow good data management practices.