Data Mining 2 what is it for


Data mining is the analysis of large data sets to find unique, hidden information. It became an important Information Age tool in the 1980s because of the large data sets being collected in databases. Data mining uses automated algorithms to analyze data aggregated from different data sources, and it can be used on any large data set. Healthcare collects huge volumes of data that can be analyzed with data mining techniques. The results of data mining have the potential to impact everyone, including patients, physicians, HIM and IT professionals, insurance companies, and researchers. Data analysis can benefit hospitals, government, public health, and pharmaceutical companies. It can contribute to quality of care and patient safety, clinical decisions and guidelines, accreditation and licensing, healthcare costs, and the detection of fraud and abuse. By identifying unique and actionable information, data mining is potentially a powerful tool that can add value to healthcare data.

Keywords: data mining, knowledge discovery, database, data analysis, big data, algorithms, healthcare

Data mining is a type of data analysis that processes large data sets into meaningful patterns and structures. It is the “process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad et al., 1996). The large data sets that are analyzed include “big data,” the large, often vast, volume of data that is continuously collected as a result of the ubiquitous use of computers, the creation of large databases, and increased computing power. In data mining, automated algorithms are applied to data to look for unknown correlations and patterns that cannot be found using smaller data sets or manual data analysis.

Even though the term data mining is often used synonymously with knowledge discovery, it is a subset of knowledge discovery in databases. The basic steps of data mining, or knowledge discovery, represented in Figure 1 include:

  • identifying the goals of the analysis and what data to process,
  • preprocessing the data (data cleaning, integration, and transformation),
  • matching goals with analysis methodology,
  • application of algorithms (data mining),
  • identifying and evaluating patterns,
  • reiterating the steps until useful knowledge is acquired.
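The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a method taken from the cited sources: the toy records, the function names, and the choice of pair co-occurrence counting as the "algorithm" are all assumptions made for the example. The goal assumed here is to find conditions that frequently appear together on the same record.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each record lists the conditions noted on one encounter.
RAW_RECORDS = [
    ["diabetes", "hypertension", "obesity"],
    ["diabetes", "hypertension"],
    ["asthma", None, "obesity"],   # contains a missing value
    ["Diabetes ", "obesity"],      # inconsistent case and whitespace
    [],                            # empty record
    ["hypertension", "obesity", "diabetes"],
]


def preprocess(records):
    """Clean and transform: drop missing values, normalize text, drop empty records."""
    cleaned = []
    for record in records:
        items = {item.strip().lower() for item in record if item}
        if items:
            cleaned.append(items)
    return cleaned


def mine_pairs(records):
    """Apply the algorithm: count how often each pair of items co-occurs."""
    counts = Counter()
    for items in records:
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate(counts, n_records, min_support_pct):
    """Keep pairs that appear in at least min_support_pct percent of the records."""
    return {
        pair: count
        for pair, count in counts.items()
        if count * 100 >= min_support_pct * n_records
    }


def discover(records, start_pct=80, step_pct=10, floor_pct=30):
    """Reiterate: relax the support threshold until some pattern is found."""
    cleaned = preprocess(records)
    counts = mine_pairs(cleaned)
    for pct in range(start_pct, floor_pct - 1, -step_pct):
        patterns = evaluate(counts, len(cleaned), pct)
        if patterns:
            return pct, patterns
    return None, {}


threshold, patterns = discover(RAW_RECORDS)
print(threshold, patterns)
```

Whether a pattern found this way is useful still depends on the analyst's knowledge of the data. The code can only report that two items co-occur often, not that the finding is novel or actionable.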

Data mining can be used to analyze data stored in flat files, spreadsheets, databases, or other storage formats. Data mining algorithms can be applied to structured data, like that in relational databases, as well as to semi-structured and unstructured data such as text, images, and graphs. “The important criteria for the data are not the storage format, but its applicability to the problem to be solved” (Oracle, n.d.). Understanding the data and knowing what data is available is an important part of the process. Evaluating and validating data mining results depends on the users’ knowledge of the data and their understanding of when results are unique, useful, and actionable.

When Data Mining Became Important

Data mining has its roots as far back as the 1700s in statistical methods such as Bayes’ Theorem, which relates current probability to historical probability (Li, 2016). Databases were introduced in the 1970s, and the Information Age began to see the collection of large data sets that presented the opportunity for analysis. Traditional sample-based statistical methods were inadequate for analyzing these large data sets, and algorithms were developed that would come to include data mining methodologies. The phrase “database mining” was trademarked by HNC in 1980. In 1989, Gregory Piatetsky-Shapiro identified “data mining” as part of the process called Knowledge Discovery in Databases (Li, 2016). It is now common for the terms data mining and knowledge discovery to be used interchangeably.

Large data sets have been accumulating rapidly since computer use exploded at the start of the Information Age. In 2010, Eric Schmidt of Google posited that the amount of data generated from the beginning of time through 2003 was by then being generated about every 2 days (Papp et al., 2018). In 2012, a report produced by the market intelligence and advisory company IDC indicated that worldwide data volume would increase from 1.2 zettabytes that year to 40 zettabytes, or 40 trillion gigabytes, in 2020 (IDC, 2012). IDC (2012) estimated that a third of the data produced would be usable if processed and analyzed. The huge collections of data accumulating in all fields provide the opportunity to explore data and identify information that was not previously available.
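The scale of the IDC figures can be checked with a short calculation. The sketch below assumes decimal (SI) units, where one zettabyte is 10^21 bytes and one gigabyte is 10^9 bytes. Applying the "one third usable" estimate to the 2020 volume is an assumption made only for illustration.

```python
from fractions import Fraction

# Decimal (SI) units assumed: 1 ZB = 10**21 bytes, 1 GB = 10**9 bytes.
GB_PER_ZB = 10**21 // 10**9  # 10**12 gigabytes per zettabyte

volume_2012_zb = Fraction("1.2")  # figure quoted in the text
volume_2020_zb = Fraction(40)     # figure quoted in the text

volume_2020_gb = volume_2020_zb * GB_PER_ZB
growth_factor = volume_2020_zb / volume_2012_zb
usable_2020_zb = volume_2020_zb / 3  # assumption: the one-third share applied to 2020

print(f"2020 volume: {int(volume_2020_gb):,} GB")
print(f"growth 2012 to 2020: about {float(growth_factor):.1f}x")
print(f"potentially usable in 2020: about {float(usable_2020_zb):.1f} ZB")
```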

Why Data Mining is Important in Healthcare

With the advent of the electronic health record (EHR), healthcare facilities are generating tremendous amounts of data that can be beneficial in clinical practice as well as administrative processes. In 2012, it was estimated that 40% of providers had complete EHRs and 72% had basic EHR systems (Donovan, 2019). Even before EHRs were implemented, healthcare facilities were using computers to collect administrative information, and laboratories were generating electronic data for test results and imaging. In 2018, it was estimated that healthcare organizations managed an average of 8.41 petabytes, or about 8.41 million gigabytes, of data (Donovan, 2019).

Data mining has the potential to turn this large volume of collected data into useful and actionable information. The analysis adds value to the data collected by healthcare organizations (Deloitte, 2015). Data mining can be applied to collected data that bears on quality of care and patient safety, clinical guidelines, facility policies and procedures, accreditation and licensing, federal regulations, costs of healthcare, selection of technologies, claims processing, and more.

Analyzing healthcare data with data mining techniques can help identify best practices in patient care, potentially adverse effects of medications, fraud and abuse violations, patterns of mortality and morbidity, and patterns of claim denials (Sayles & Kavanaugh-Burke, 2018; Tomar & Agarwal, 2013). These types of information are useful for both clinical and administrative applications.

Who Data Mining Impacts

Data mining is used in many fields: business, science, mathematics, sports, education, criminal justice, government, healthcare, and others. Any large collection of data can be analyzed using data mining techniques to find new, unique information that can be applied to the entity’s field. “As an application of data mining, businesses can learn more about their customers and develop more effective strategies related to various business functions” (MicroStrategy, n.d.). Healthcare organizations can likewise learn about their patients as well as the clinical and administrative processes that affect quality of care and costs. In healthcare, analyzing large volumes of data can ultimately affect patients, physicians, medical staff, administration, insurance companies, researchers, government policy makers, vendors, pharmaceutical companies, and more.

Data Mining as it Relates to the Healthcare Profession

Data mining can be used in many areas of healthcare, including hospitals, physician offices, public health agencies, government, insurance companies, research, and academia. The processes that have potential for data mining vary widely. Analyses can find information that might affect clinical decision making, administrative business decisions, privacy and security, database management, laboratory procedures, and more. Data mining can be limited by restricted access to data due to ethics, privacy, security, setting, and ownership. It can also be limited by data quality issues. Even though large volumes of data are collected in healthcare, for certain applications the data may be incomplete or inconsistent. Data quality can also be compromised when data is corrupted or noisy. Many healthcare disciplines can participate in data mining, especially HIM and IT personnel.
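The data quality problems named above (incomplete, inconsistent, and noisy data) can be screened for before any mining is attempted. The following is a minimal sketch under stated assumptions: the record layout, field names, and plausibility bounds are hypothetical and are not drawn from any cited source or real system.

```python
from datetime import date

# Hypothetical encounter records; field names are illustrative only.
RECORDS = [
    {"id": 1, "admitted": date(2020, 1, 3), "discharged": date(2020, 1, 7), "heart_rate": 72},
    {"id": 2, "admitted": date(2020, 2, 10), "discharged": None, "heart_rate": 88},
    {"id": 3, "admitted": date(2020, 3, 5), "discharged": date(2020, 3, 1), "heart_rate": 64},
    {"id": 4, "admitted": date(2020, 4, 9), "discharged": date(2020, 4, 12), "heart_rate": 720},
]

REQUIRED_FIELDS = ("admitted", "discharged", "heart_rate")
PLAUSIBLE_HEART_RATE = range(20, 301)  # assumed bounds, in beats per minute


def quality_issues(record):
    """Return a list of data quality problems found in one record."""
    issues = []

    # Incomplete: a required field is absent or empty.
    missing = [field for field in REQUIRED_FIELDS if record.get(field) is None]
    if missing:
        issues.append("incomplete: missing " + ", ".join(missing))

    # Inconsistent: two fields contradict each other.
    admitted, discharged = record.get("admitted"), record.get("discharged")
    if admitted and discharged and discharged < admitted:
        issues.append("inconsistent: discharged before admitted")

    # Noisy: a value falls outside the assumed plausible range.
    rate = record.get("heart_rate")
    if rate is not None and rate not in PLAUSIBLE_HEART_RATE:
        issues.append("noisy: implausible heart_rate")

    return issues


report = {record["id"]: quality_issues(record) for record in RECORDS}
usable_ids = [record_id for record_id, issues in report.items() if not issues]

for record_id, issues in report.items():
    print(record_id, issues or "ok")
```

In practice, the rules for what counts as plausible or consistent would come from staff who know the data, which is one reason HIM and IT personnel are well placed to take part.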

Summary / Conclusion

If small data sets and traditional statistics determine the probability that you will find a needle in a small haystack, data mining either makes the novel discovery that there is a needle in the mountainous haystack or verifies the hypothesis that the needle is present, and possibly identifies patterns that predict where needles occur. Data mining can be used in any field, including healthcare. The results of data mining can yield valuable, unique, and actionable information from healthcare data.


References

Becker’s Hospital IT. (n.d.). The rise of big data in hospitals: Opportunities behind the phenomenon. Becker’s Hospital Review.

Davoudi, S., Dooling, J. A., Glondys, B., Jones, T. D., Kadlec, L., Overgaard, S. M., Ruben, K., & Wendicke, A. (2015, October). Data quality management model (2015 update)-Retired. Journal of AHIMA, 86(10).

Deloitte. (2015). Health system analytics: The missing key to unlocking value-based care. Deloitte.

Donovan, F. (2019, March 8). Organizations see 878% health data growth rate since 2016. HIT Infrastructure.

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37-54.

IDC. (2012). The digital universe in 2020: Big data, bigger data shadows, and biggest growth in the far east. EMC.

Li, R. (2016, June). History of data mining. KDNuggets.

MicroStrategy. (n.d.). Data mining explained. MicroStrategy.

Oracle. (n.d.). Oracle database online documentation 11g Release 1 (11.1): Data mining concepts. Oracle.

Papp, L., Spielvogel, C. P., Rausch, I., Hacker, M., & Beyer, T. (2018). Personalizing medicine through hybrid imaging and medical big data analysis. Frontiers in Physics, 6, 51. https://doi.org/10.3389/fphy.2018.00051

SAS. (n.d.). Data mining: What it is and why it matters. SAS.

Sayles, N. B., & Kavanaugh-Burke, L. (2018). Introduction to information systems for health information technology. AHIMA.

Tomar, D., & Agarwal, S. (2013). A survey on data mining approaches for healthcare. International Journal of Bio-Science and Bio-Technology, 5(5), 241-266.

Data Mining 2 what is it for. (2021, Oct 08). Retrieved June 23, 2024 , from