What Is Data Dredging?

May 13, 2020 1 min read

Data dredging, sometimes referred to as data mining, is a practice that involves analyzing large volumes of data in search of possible relationships without first forming a hypothesis about them.

In contrast, the traditional or conventional scientific method begins with a hypothesis and only then moves on to examining the data.

Sometimes conducted for unethical purposes, data dredging is a data mining process that circumvents the traditional hypothesis-first approach, which can lead to premature conclusions.

In the simplest sense, data dredging is the act of trying to extract more information from a set of data than it actually contains.

What Does Data Dredging Mean for Your Business?

Let’s imagine that you want to find out whether some subsets of your prospects or customers are more likely to upgrade than others. You test one specific variable or characteristic of your customers, and the test result seems to show statistical significance.

As a modern, data-savvy professional, you have collected a large amount of information about your customers and their characteristics, so you begin to dredge through all of the available data.

Next, you develop a hypothesis for each characteristic based on certain criteria, such as revenue bracket, geographic region, and so forth. You keep doing this until you hit the jackpot and discover that one of the hypotheses is significant: for instance, that customers in North London are more likely to upgrade.

The truth is, that conclusion isn’t necessarily reliable. If you run enough of these tests, chances are you’ll discover a correlation that seems statistically significant but is still a false positive. Calling that particular correlation statistically significant can be a fallacy, because what you are actually doing is data dredging.
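To make this concrete, here is a minimal Python sketch of the dredging scenario using simulated data, not real customer records. It assumes 20 hypothetical customer segments (standing in for regions, revenue brackets and so on) in which every customer has the same 10% chance of upgrading, and it compares each segment with everyone else using a pooled two-proportion z-test at the 0.05 level. The function names, segment count and rates are illustrative assumptions. Because no segment truly differs, every segment the sketch flags is a false positive.

```python
import math
import random


def two_proportion_p_value(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    if n_a == 0 or n_b == 0:
        return 1.0  # an empty group gives nothing to compare
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # nobody upgraded, or everybody did
    z = (hits_a / n_a - hits_b / n_b) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def dredge(num_customers=20_000, num_segments=20, upgrade_rate=0.10,
           alpha=0.05, seed=0):
    """Test every segment against the rest and keep the 'significant' ones.

    Upgrades are drawn independently of the segment, so every customer has
    the same upgrade probability and any hit is a false positive.
    """
    rng = random.Random(seed)
    size = [0] * num_segments      # customers per segment
    upgrades = [0] * num_segments  # upgrades per segment
    for _ in range(num_customers):
        s = rng.randrange(num_segments)
        size[s] += 1
        if rng.random() < upgrade_rate:
            upgrades[s] += 1
    total_upgrades = sum(upgrades)

    significant = []
    for s in range(num_segments):
        p = two_proportion_p_value(
            upgrades[s], size[s],
            total_upgrades - upgrades[s], num_customers - size[s],
        )
        if p < alpha:
            significant.append((s, p))
    return significant


if __name__ == "__main__":
    print(dredge(seed=1))
```

Calling `dredge(seed=1)` returns the (segment, p-value) pairs that cleared the threshold. Across many seeds, about one segment per run clears it on average, and which segment "wins" varies from run to run, which is the false-positive pattern described above.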

Now suppose you subject a sufficient number of hypotheses to testing. With the standard significance threshold of p = 0.05 (a 95% confidence level), you can expect, on average, one false positive for every 20 hypotheses you test, even when none of them reflects a real effect.
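The arithmetic behind this rule of thumb is short. If each test of a true null hypothesis has probability α of producing a false positive, then m tests yield m × α false positives on average, and, if the tests are independent, the chance of at least one false positive is 1 − (1 − α)^m. The minimal Python sketch below assumes independent tests, which real dredging rarely guarantees, so treat its numbers as an approximation; the function names are illustrative. For α = 0.05 and 20 tests, it gives an expected 1 false positive and roughly a 64% chance of at least one.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Average number of 'significant' results when every null hypothesis is true."""
    return num_tests * alpha


def chance_of_any_false_positive(num_tests, alpha=0.05):
    """Probability of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests


for m in (1, 20, 100):
    print(f"{m:>3} tests: expected false positives = "
          f"{expected_false_positives(m):.2f}, "
          f"P(at least one) = {chance_of_any_false_positive(m):.3f}")
```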

Almost certainly, some of these hypotheses will turn out to be statistically significant but misleading. In reality, virtually any data set with some degree of randomness can contain spurious correlations.
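The claim that random data contains false correlations is easy to check. This minimal Python sketch, using only simulated noise, fills 30 hypothetical columns with 50 independent standard-normal values each, computes the Pearson correlation of every pair, and returns the strongest one. The column and row counts and the function names are arbitrary assumptions. No column is related to any other, yet with 435 pairs to search, the strongest correlation usually looks impressive; with 50 rows, a correlation of about 0.28 in magnitude would already pass a conventional two-sided test at p < 0.05.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_pair(num_columns=30, num_rows=50, seed=0):
    """Return (r, i, j) for the most correlated pair of pure-noise columns."""
    rng = random.Random(seed)
    # Every column is independent Gaussian noise, so no real relationship exists.
    columns = [[rng.gauss(0, 1) for _ in range(num_rows)]
               for _ in range(num_columns)]
    best = (0.0, None, None)
    for i in range(num_columns):
        for j in range(i + 1, num_columns):
            r = pearson(columns[i], columns[j])
            if abs(r) > abs(best[0]):
                best = (r, i, j)
    return best


if __name__ == "__main__":
    r, i, j = strongest_spurious_pair()
    print(f"columns {i} and {j}: r = {r:.2f}")
```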

Written by Wendy

Wendy is a data-oriented marketing geek who loves reading detective fiction and trying new baking recipes. She writes articles on the latest industry updates and trends.
