Wikipedia content is licensed under the GNU Free Documentation License.
Data mining is the practice of searching large stores of data for patterns.
Used in the technical context of data warehousing, the term is neutral.
However, it also has a wider, more pejorative usage that implies imposing
patterns (and particularly causal relationships) on data where none exist.
Data mining has been defined as "The nontrivial extraction of implicit,
previously unknown, and potentially useful information from data".
It is also known as knowledge discovery in databases (KDD).
Used in the pejorative sense, "data mining" implies scanning the data for
any relationships and then, when one is found, coming up with an interesting
explanation after the fact. The problem is that large data sets almost
invariably contain some striking relationships that are peculiar to that
data and arise purely by chance. Conclusions reached this way are therefore
likely to be highly suspect. In spite of this, some exploratory work is
always required in any applied statistical analysis to get a feel for the
data, so the line between good statistical practice and data mining is
sometimes less than clear.
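The point about chance relationships can be illustrated with a small
simulation. The sketch below is only an illustration under stated
assumptions: the function names, the 40 columns of 30 observations, and the
0.36 cut-off (roughly the conventional 5% significance level for 30
observations) are all choices made for this example and do not come from any
real study. Every column is independent random noise, so any "strong"
correlation it reports is spurious by construction.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def find_strong_pairs(n_vars=40, n_obs=30, threshold=0.36, seed=0):
    """Scan every pair of independent random columns for a 'strong' correlation.

    All parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    # Each column is pure Gaussian noise, unrelated to every other column.
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    strong = []
    for i, j in combinations(range(n_vars), 2):
        r = pearson(columns[i], columns[j])
        if abs(r) >= threshold:
            strong.append((i, j, r))
    total_pairs = n_vars * (n_vars - 1) // 2
    return strong, total_pairs


strong, total_pairs = find_strong_pairs()
print(f"{len(strong)} of {total_pairs} pairs of pure noise have |r| >= 0.36")
```

With 780 pairs to examine, a few dozen would be expected to clear the
cut-off by chance alone, and each could be given a plausible-sounding
explanation after the fact. Testing a hypothesis chosen in advance, or
checking a finding against fresh data, avoids this trap.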
Here is an example. The insurance industry has found that people with poor
credit records tend to be more likely to make car insurance claims, and has
therefore modified its pricing. While this appears to be a legitimate
finding, politicians in the United States have queried its legitimacy, on
the 'common-sense' grounds that how a person handles their credit card
doesn't affect how they handle a car. So a finding that is statistically
legitimate might not hold up to public scrutiny.
A more significant danger is finding correlations that do not really exist.
An example of this is found at the investment website The Motley Fool. In
the late 1990s the website had a suggested investment portfolio known as the
Foolish Four, which was based on a data mining analysis of trends in the
stock market. Further research in the early 2000s showed that the
correlations they found were an artifact of the particular data set they
used, rather than a reflection of reality. This experience is one of many
similar false findings linked to the stock market.
There are also privacy concerns associated with data mining. For example, if
an employer has access to medical records, it may screen out people who have
diabetes or have had a heart attack. Screening out such employees will cut
insurance costs, but it creates ethical and legal problems.
There are many legitimate uses of data mining. For example, a database of
all prescription drugs taken by people can be used to find combinations of
drugs with an adverse reaction. Since a given combination may be taken by
only 100 people, with a reaction in 10 of them, any single case may not
raise a red flag. Such a database could find reactions and save lives.
However, there is huge potential for abuse of such a database.
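A minimal sketch of the kind of screening described above follows. Everything
in it is hypothetical: the record layout (a set of drug names plus a reaction
flag per patient), the drug names, the `min_patients` cut-off, and the
synthetic counts, which reuse the figure of 10 reactions among 100 people
from the text and add an invented comparison group. It simply counts, for
each pair of drugs, how many patients took both and what share of them had a
reaction.

```python
from collections import Counter
from itertools import combinations


def pair_reaction_rates(records, min_patients=50):
    """Return the reaction rate for each drug pair taken by enough patients.

    `records` is an iterable of (set_of_drug_names, had_reaction) tuples.
    This layout is an assumption made for the example.
    """
    taking = Counter()    # patients taking each pair of drugs
    reacting = Counter()  # of those, patients who had an adverse reaction
    for drugs, had_reaction in records:
        for pair in combinations(sorted(drugs), 2):
            taking[pair] += 1
            if had_reaction:
                reacting[pair] += 1
    return {
        pair: reacting[pair] / count
        for pair, count in taking.items()
        if count >= min_patients
    }


# Synthetic data: 100 patients on drug_a + drug_b, 10 of whom react,
# and 900 patients on drug_a + drug_c, 9 of whom react.
records = (
    [({"drug_a", "drug_b"}, True)] * 10
    + [({"drug_a", "drug_b"}, False)] * 90
    + [({"drug_a", "drug_c"}, True)] * 9
    + [({"drug_a", "drug_c"}, False)] * 891
)

rates = pair_reaction_rates(records)
for pair, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(pair, f"{rate:.0%}")
```

No single one of the ten cases stands out, but the aggregated rate for the
first pair is ten times that of the second. A pair flagged this way is a
lead for clinical follow-up rather than a conclusion, since scanning many
pairs will also turn up differences that are due to chance.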
In short, data mining yields information that would not otherwise be
available, but that information must be properly interpreted to be useful.
When the data collected involves individual people, there are many questions
concerning privacy, legality, and ethics.