To address the problem, multimodal deep learning models have been proposed .Since vectors cannot handle highly hetrogeneous data so we are using Tensors and since Stack auto encoders are single way we have to use a new stacked auto encoder.
In MDL They perform feature learning from each modality using conventional deep learning models,The deep computation model was proposed for big data feature learning based on the tensor big data representation model. In the tensor big data representation model, each object is represented by a tensor. For example, an image in the RGB space is typically represented by a third-order tensor R whc in which w, h, c denote the width, the height, and the color, respectively.
Therefore, an image with 1024 768 resolutions in the RGB space can be represented by R10247683. Furthermore, a deep computation model is constructed to learn hierarchical features for big data by stacking several TAEs. To train the parameters of the deep computation model, each TAE is first trained by an unsupervised strategy to obtain the initial parameters from bottom to top, and then some labeled samples are used as the supervised objects to fine-tune the parameters
B.Experiments on SNAE2 Dataset on Tensor Deep Learning Model The authors in paper [9] explained experiment results as follows. The SNAE2 dataset, collected from Youtube, consists of 1,800 video clips grouped into four categories: sport, new, advertisement, and entertainment. Each sample consisting of 100 frames is represented by a 4-order tensor. To evaluate the performance of the TDL model, we selected 1,500 samples as the training set and the rest as the testing set and compared our TDL model to the multimodal deep learning model. We train the TDL model and the multimodal deep learning model with different number of hidden layers for classifying the dataset. Each model is performed for five times and the classification results are shown in image below these two models produce the best classification results for the SNAE2 dataset when there are three hidden layers.
That demonstrates that three levels of representations are enough for classification on the SNAE2 dataset. Furthermore, our model performs better than SAE in most cases, proving the effectiveness of the proposed model. The classification accuracy of two models with three hidden layers is shown image below classification result with accuracy 85.7 percent on the SNEA2 while SAE produces the average classification accuracy of 81.4 percent. Furthermore, even in the worst case, the result obtained by the tensor auto-encoder with a classification accuracy of 82.1 percent is better than the average result of SAE, which demonstrates the effectiveness of the tensor deep learning model.
B. Experiments on SNAE2 Dataset on Deep Convolutional Computation Model The authors in paper explained experiment results as follows. To verify the performance of the DCCM, it is also compared with DCM and MDL. The best performance models are trained. Specially, the DCCM is the architecture of seven constrained layers and three fully connected layers. For the DCM, the structure with three hidden layers is adopted, since this structure performs best on CUAVE dataset in the work. Similarly, for the multimodal deep learning, the shared representation architecture is used. The details of the classification accuracy are shown in Table below.
As shown in Table, the average classification accuracy of DCCMis 86.2 on the SNAE2 dataset,which is higher than that of other models. Moreover, in the worst case, the classification accuracy of DCCM is slightly higher than classification results produced by other models, which shows the effectiveness of the DCCM.
C.Since Hinton proposed the deep belief network, many other deep learning models have been devised, such as stacked autoencoder, deep CNN, and their variants. And they have made some great progress in face recognition, real time search, and speech analysis. However, those models emphasize extraction of a single representation from the original data, which results in the serious restriction in capturing the hybrid features contained in the heterogeneous big data. To tackle this problem, somemodels for multimodal feature learning have been devised recently. The most representative multimodal feature learning model is bimodal deep autoencoder, which is devised by Ngiam et al. to model midlevel correlations between the audio and the visual. In the model, the authors considered three strategies: multimodal fusion, cross modality learning, and shared representation learning to discover the representations hidden in audio-visual data. Finally, the authors adopted the shared representation strategy in modeling the bimodal deep autoencoder.
A similar method is bimodal restricted Boltzmann machine proposed by Srivastava and Salakhutdinov. In this model, the authors employed states of latent variable to model the multimodal data to discover a probability density over the space of heterogeneous modalities. Different from the bimodal deep autoencoder, the multimodal deep Boltzmann machine focuses on the shared representations between images and texts. Moreover, the model can satisfy two properties. First, the similarity in the joint representation space is coincident with that in the independent raw space. Second, the model can be robust, which means that it can also learn the shared representation in multimodal space even when some modalities are missing. Recently, some novel multimodal models are proposed for heterogeneous data feature learning.
To improve the accuracy of face recognition, Ding and Tao designed the multimodal deep face representation framework, in which multimodal features are adopted to jointly model the face representation. The framework consists of two types of deep learning model: the CNN and the deep stacked autoencoder (SAE). Specially, various hidden modalities are extracted from the images by a set of well-designed CNN, and the learned features are concatenated as the input of the SAE. Hu et al. devised a novel deep model multimodal speaker naming model (DMSN) for the improvement of performance for speakers inSince Hinton proposed the deep belief network, many other deep learning models have been devised, such as stacked autoencoder, deep CNN, and their variants. And they have made some great progress in face recognition, real time search, and speech analysis.
However, those models emphasize extraction of a single representation from the original data, which results in the serious restriction in capturing the hybrid features contained in the heterogeneous big data. To tackle this problem, somemodels for multimodal feature learning have been devised recently. The most representative multimodal feature learning model is bimodal deep autoencoder, which is devised by Ngiam et al. to model midlevel correlations between the audio and the visual. In the model, the authors considered three strategies: multimodal fusion, cross modality learning, and shared representation learning to discover the representations hidden in audio-visual data.
Finally, the authors adopted the shared representation strategy in modeling the bimodal deep autoencoder. A similar method is bimodal restricted Boltzmann machine proposed by Srivastava and Salakhutdinov [10]. In this model, the authors employed states of latent variable to model the multimodal data to discover a probability density over the space of heterogeneous modalities. Different from the bimodal deep autoencoder, the multimodal deep Boltzmann machine focuses on the shared representations between images and texts.
Moreover, the model can satisfy two properties. First, the similarity in the joint representation space is coincident with that in the independent raw space. Second, the model can be robust, which means that it can also learn the shared representation in multimodal space even when some modalities are missing. Recently, some novel multimodal models are proposed for heterogeneous data feature learning. To improve the accuracy of face recognition, Ding and Tao designed the multimodal deep face representation framework, in which multimodal features are adopted to jointly model the face representation. The framework consists of two types of deep learning model: the CNN and the deep stacked autoencoder (SAE).
Specially, various hidden modalities are extracted from the images by a set of well-designed CNN, and the learned features are concatenated as the input of the SAE. Hu et al. [20] devised a novel deep model multimodal speaker naming model (DMSN) for the improvement of performance for speakers in
Learning models for exploring big data. (2022, Sep 29).
Retrieved January 11, 2025 , from
https://studydriver.com/learning-models-for-exploring-big-data/
A professional writer will make a clear, mistake-free paper for you!
Get help with your assignmentPlease check your inbox
Hi!
I'm Amy :)
I can help you save hours on your homework. Let's start by finding a writer.
Find Writer