How Data Is Collected and Why It Can Be Problematic

Updated: 2020-02-18 9:39:52
Data Science Essentials For Dummies
Because data is so valuable and users are sometimes averse to giving it up, vendors constantly find new ways to collect it. One such method comes down to spying. Microsoft, for example, was recently accused (yet again) of spying on Windows 10 users even when those users don't want their data collected.

Lest you think that Microsoft is interested solely in your computing concerns, think again. The data Microsoft admits to collecting (and there is likely more) is remarkably extensive.

Microsoft's data gathering doesn't stop with your Windows 10 actions; it also collects data through Cortana, its personal assistant. Amazon's Alexa has been accused of doing the same thing, and Google does likewise. Spying, then, is one of the trends vendors rely on, and it neither stops with Microsoft nor stops with the obvious spying sources.

It might actually be possible to write an entire book on the ways in which people spy on you, but that would make for a very paranoid book, and there are other new data collection trends to consider. You may have noticed that you now receive more email from nearly every vendor asking about the products or services you received. Everyone wants you to provide free information about your experiences in one of these forms:

  • Close-ended surveys: A close-ended survey is one in which each question has predefined answers that you select with a check mark. The advantage is greater consistency of feedback. The disadvantage is that you can't learn anything beyond the predefined answers.
  • Open-ended surveys: An open-ended survey is one in which the questions rely on text boxes in which the user enters data manually. In some cases, this form of survey enables you to find new information, but at the cost of consistency, reliability, and cleanliness of the data.
  • One-on-one interviews: Someone calls you or approaches you at a place like the mall and talks to you. When the interviewer is well trained, you obtain consistent data and can also discover new information. However, this quality comes at the cost of paying someone to obtain the information.
  • Focus groups: Three or more people meet with an interviewer to discuss a topic (including products). Because the interviewer acts as a moderator, the consistency, reliability, and cleanliness of the data remain high, and the cost per participant is lower than with one-on-one interviews. However, the data now suffers contamination from the interaction among members of the focus group.
  • Direct observation: No conversation occurs in this case; someone monitors how another party interacts with a product or service and records the responses using a script. However, because you now rely on a third party to interpret someone else's actions, the data can become contaminated by the observer's bias. In addition, if the subject of the observation is aware of being monitored, the interactions likely won't reflect reality.

These are just a few of the methods seeing greater use in data collection today; they're the tip of the iceberg. The key takeaway is that no perfect means exists for collecting some types of data, and every data collection method requires some sort of participative event.

About This Article

This article is from the book: Data Science Essentials For Dummies

About the book authors:

John Paul Mueller is a freelance author and technical editor. He has writing in his blood, having produced 100 books and more than 600 articles to date. The topics range from networking to home security and from database management to heads-down programming. John has provided technical services to both Data Based Advisor and Coast Compute magazines.

Luca Massaron is a data scientist specializing in organizing and interpreting big data and transforming it into smart data by means of the simplest and most effective data mining and machine learning techniques. Because of his job as a quantitative marketing consultant and marketing researcher, he has worked with quantitative data since 2000 for different clients and in various industries, and he is one of the top 10 Kaggle data scientists.