Numerical Linear Algebra in Data Mining

Abstract: In this era of high-tech innovation, applications at almost every level and in almost every discipline produce large amounts of data. Extracting interesting knowledge from raw data, or data mining, has become an indispensable task. However, data collected from complex phenomena are usually the combined result of several interrelated variables, and these variables are themselves not precisely defined. The basic principle of data mining is to distinguish which variables are related to which, and how they are related. In many cases the numerical information is collected and stored as a data matrix. Under the assumption that the exogenous variables depend on the endogenous variables through a linear relationship, the search for 'useful' information can often be characterized as finding a 'suitable' matrix decomposition. This paper explores numerical linear algebra in data mining, and in particular how linear algebra techniques can help carry out data mining tasks. Examples from factor analysis, cluster analysis, latent semantic indexing, and link analysis demonstrate how matrix decomposition helps to uncover hidden connections, and to do so quickly. Low-rank matrix approximation is the basis for cleaning and compressing data. Other types of constraints, such as nonnegativity constraints, also apply.


Data analysis is pervasive throughout science, engineering, and business applications. An important task in almost every discipline is to analyze data in search of relationships between exogenous and endogenous variables. Data analysis poses two particular problems.

First, most information-gathering devices or methods currently have only limited bandwidth, so it is unavoidable that the collected data are often inexact. For example, signals received by antenna arrays are usually contaminated by instrumental noise; astronomical images acquired through telescopes are often blurred by atmospheric turbulence; databases prepared by document indexing are usually subject to bias from subjective judgment; and even empirical data obtained in the laboratory often fail to satisfy inherent physical constraints. Before any deductive science can be further developed and applied, it is important to first reconstruct or represent the data so that the inexactness is reduced while certain feasibility conditions are satisfied.

Second, complex systems always involve multiple variables, and the data observed from such systems represent the combined effect of these variables. When the variables are not precisely defined, the actual information contained in the original data may be overlapping and ambiguous. A reduced system model can provide fidelity close to that of the original system while facilitating the extraction of hidden knowledge that is important for decision making.

Among the various data mining techniques, classification, regression, factor analysis, and principal component analysis are some of the most commonly used methods for reducing the number of variables and detecting structure in the relationships between variables. One common ground among the various approaches to noise removal, model reduction, feasibility reconstruction, and so on is to replace the original data with a lower-dimensional representation obtained by subspace approximation. This leads to the concept of low-rank approximation. This paper treats these data mining techniques as matrix decomposition problems and presents the underlying model from the perspective of linear algebra.
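
The idea of replacing data with a lower-dimensional representation can be illustrated with a small sketch. The example below computes a rank-one approximation of a data matrix using alternating power iteration, which is chosen here only because it needs no external libraries; the text does not prescribe a particular algorithm. The matrix Y, the helper names, and the iteration count are all hypothetical choices made for illustration.

```python
import math


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def transpose(M):
    """Return the transpose of M as a list of rows."""
    return [list(col) for col in zip(*M)]


def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))


def rank_one_approximation(Y, iterations=200):
    """Return (sigma, u, v) such that sigma * u * v^T approximates Y.

    Alternating power iteration: v <- Y^T u, then u <- Y v, normalizing
    after each step. Assumes the starting vector is not orthogonal to the
    dominant left singular vector and that Y is not the zero matrix.
    """
    Yt = transpose(Y)
    u = [1.0] * len(Y)  # arbitrary nonzero starting vector (assumption)
    v = [0.0] * len(Yt)
    sigma = 0.0
    for _ in range(iterations):
        v = matvec(Yt, u)
        nv = norm(v)
        v = [x / nv for x in v]
        u = matvec(Y, v)
        sigma = norm(u)
        u = [x / sigma for x in u]
    return sigma, u, v


def outer(sigma, u, v):
    """Form the rank-one matrix sigma * u * v^T."""
    return [[sigma * ui * vj for vj in v] for ui in u]


# Hypothetical data: a rank-one matrix with two slightly perturbed entries,
# standing in for exact structure contaminated by noise.
Y = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.1],
     [3.0, 5.9, 9.0]]

sigma, u, v = rank_one_approximation(Y)
Y1 = outer(sigma, u, v)

# Frobenius-norm distance between the data and its rank-one approximation.
residual = math.sqrt(sum((y - a) ** 2
                         for row_y, row_a in zip(Y, Y1)
                         for y, a in zip(row_y, row_a)))
print("dominant singular value:", sigma)
print("approximation error:", residual)
```

In this sketch the nine stored entries are replaced by one scalar and two vectors, and the small residual is the part of the data treated as noise. For approximations of rank greater than one, a library singular value decomposition routine would be the more usual choice.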


Let Y = [y_ij] denote the data matrix to be analyzed. Each entry y_ij broadly represents the score obtained by entity j on variable i. One way to characterize the relationships among the multiple variables that contribute to the observed data Y is to assume that y_ij is a linearly weighted score of entity j over several 'factors.' We tentatively assume that the number of factors is known, but the point is precisely that these factors are to be retrieved during the mining process. The linear model therefore assumes that Y can be written as the product of two matrices, one holding the weights and one holding the factor scores.
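
Written out, the linear model described in this paragraph might take the following form. This is a sketch under assumed notation: the text names only Y and the indices i and j, so the symbols A, F, a_{ik}, f_{kj} and the dimensions n, \ell, m are introduced here for illustration.

```latex
% Assumed notation: n variables, \ell entities, m factors.
% a_{ik}: weight of factor k in variable i (assumed symbol)
% f_{kj}: score of entity j on factor k   (assumed symbol)
\[
  y_{ij} = \sum_{k=1}^{m} a_{ik} f_{kj},
  \qquad i = 1, \dots, n, \quad j = 1, \dots, \ell,
\]
% Equivalently, in matrix form:
\[
  Y = A F,
  \qquad A = [a_{ik}] \in \mathbb{R}^{n \times m},
  \quad F = [f_{kj}] \in \mathbb{R}^{m \times \ell}.
\]
```

Under this reading, mining the data amounts to recovering suitable factors A and F from the observed Y, which is a matrix decomposition problem.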



Cite this page

Numerical Linear Algebra in Data Mining. (2021, Dec 29). Retrieved June 21, 2024 , from
