Over the last 20 years, personal computers have evolved rapidly. Advances in both software and hardware have been huge, and the e-learning sector has inevitably been influenced as well: several tools have been developed with ever-growing capabilities, from the plain delivery of text to audio, photo and video management. [1]
But what is e-learning really, and why is it given such importance? Electronic learning (or e-learning) can be defined as the process of educating or offering knowledge via electronic means. Many researchers go a little further; Nichols (2008), for example, perceives e-learning as "pedagogy empowered by digital technology". This view assigns even greater importance to e-learning, which is probably justified considering the different ways in which e-learning is applied. In most cases there is no face-to-face interaction between the trainer and the learner, and communication is based solely on electronic means such as computers, videos, web sites, virtual reality environments, etc. In other cases e-learning is used as an addition to conventional learning in order to enrich it (blended learning); here the aforementioned means are combined with, or added to, traditional techniques (classroom interaction) to achieve the desired results. Finally, there is an intermediate form as well, in which there is no face-to-face interaction on a daily basis, but meetings between the learners and the trainer are organised from time to time in order to improve the learning process. [2]
Although there has been an explosion of computer-based multimedia applications in education in recent years (Gerlic and Jaušovec 1999), the success of e-learning applications has always been debatable. Many researchers have studied the subject, and several of them (e.g. Kazmerski and Blasko 1999, Kulik and Kulik 1991, Steyn, du Toit and Lachmann 1999) stress the advantages of e-learning over conventional learning methods. Others, though, believe that e-learning systems can prove deficient, or simply not superior to conventional ones (Merchant, Kreie, Cronan 2001). [3], [4], [5], [6], [7], [8]
Despite this debate, there is a growing trend towards e-learning processes and implementations, driven largely by the continuously expanding possibilities of the available technology. Moreover, e-learning presents certain benefits which have increased its popularity in recent years:
On the other hand, many believe that human interaction in a classroom during the learning process cannot be replaced by any other type of learning, e-learning included. Jean Barbazette, president of The Training Clinic in Seal Beach, California, believes that "Some things still can't be taught online" and that "For interpersonal skills, classroom learning usually works better". Rebecca Aronauer argues that the classroom offers immediate feedback from instructors and co-learners, which is crucial to the learning process. Nevertheless, many researchers maintain that, if e-learning is implemented correctly, it can be as efficient as conventional learning (Zhao et al 2005). Moreover, Rovai (2002) maintains that there are no significant differences between the experiences of students learning online and those learning in a classroom, and that the sense of being part of a team can be effectively simulated when the electronic course is designed appropriately. Indeed, computers are nowadays considered to be of great importance by most university students (Gunn et al 2003). [13], [14], [15], [16]
To implement e-learning, certain tools are needed; these vary significantly from one another, and each of them undertakes a certain part of the whole process. These tools can be divided into the following categories, based on the part of the process that they serve:
E-learning cannot be implemented unless the relevant infrastructure exists; this can include computers, visual and audio devices such as web cams, microphones and speakers, web servers, media servers, etc. This hardware has to communicate remotely with the corresponding equipment on the trainer's side, an issue which is addressed via networks such as intranets, extranets, VPNs or the Internet. [17], [18]
These are the tools through which the user gains access to e-learning material; the main ones are web browsers, such as Internet Explorer, Mozilla Firefox, Opera and Safari, and media players, such as Windows Media Player, QuickTime, Winamp and VLC Player. Since the Internet is the main means of connecting the trainer and the learner, web browsers are basic tools of the e-learning process; without them, web content becomes inaccessible. Media players are quite important as well, since they give the user access to the visual and audio material on which the success of an electronic course often depends.
For e-learning material to become accessible to the learner, certain tools are needed, which are called Learning Management Systems (LMSs). Their main duty is to provide the platform which offers the learning content over a network. An LMS is a piece of software that enables us to plan, deliver and manage the learning process, and it can be found in different versions with different functionality: it can be used just to keep records of the courses that the learners are interested in, or to offer complete online learning sessions along with online interaction tools through which the learners can communicate and enrich the whole process. It may or may not be web-based, but in most cases it is. [19], [20], [21], [22], [23], [24]
The basic tools for creating e-learning material are called Learning Content Management Systems (LCMSs) and are responsible for authoring and managing e-learning content. In other words, these systems are used for creating and exploiting the learning content which will later be delivered via an LMS. The main advantage of LCMSs, in contrast to LMSs, is that they allow a programmer to develop, export, import, manage or search for content that can be reused by other programmers in different projects, while keeping history and version data; this content may include text, graphics, media files, etc. In LMSs, courses cannot be developed and managed, and learning objects (small pieces of learning content) cannot be reused in other courses. It should be noted that many confuse these two terms and often refer to both as LMS; this is wrong, since, as is evident from the above, an LCMS can be considered an evolution of an LMS and offers different possibilities. It is true, however, that the functionality of an LCMS often overlaps with that of an LMS.
To simulate the classroom "feeling" effectively in an e-learning environment, various tools can be utilised. Although these tools were not initially developed for this specific purpose, when combined they can greatly enrich the learning process. They fall into two types, depending on whether the communicating individuals or parties need to be present at the same time: synchronous and asynchronous. Synchronous tools enable individuals or parties to communicate in "real time", while asynchronous tools do not. Asynchronous tools include e-mail services such as Gmail, Yahoo or Hotmail, blogs, forums, etc. Synchronous tools, on the other hand, include chat clients such as GoogleTalk or MSN, VoIP and teleconference tools such as Skype or WowPow, media players such as VLC, WinAMP or Media Player, etc. Media players can, of course, be used as asynchronous tools as well. [25]
All the above functionality, apart from that concerning hardware of course, is successfully incorporated in most modern LMS/LCMS platforms. There is a great variety of such tools from various vendors, but the most popular among the educational community these days seem to be Moodle and JoomlaLMS.
Moodle is one of the most popular tools of its kind because it is a free and open source LCMS for creating dynamic environments for educational purposes. Despite being free, it is considered highly efficient, since its modular design allows developers to add desired functionality and, in essence, tailor it to their needs. Moreover, many additional third-party plug-ins are available for free, which further enhance its modular character. The main programming language used for developing new modules is PHP, which brings an important advantage: Moodle can run on different platforms (Windows, Linux, Unix, Mac OS, etc.) without any modifications, as long as PHP is supported. [59]
JoomlaLMS emerged from the extremely popular web content management platform Joomla and, like its parent application, is based on the PHP programming language and the MySQL database system. The basic Joomla characteristics, such as modularity, extensions and templates, are still there, as they are in Moodle. The difference, however, is that JoomlaLMS is not independent (it needs Joomla to function) and, additionally, it is not a free software package. [60], [61]
As we saw in the previous paragraphs, the main tools for creating and offering knowledge are LMSs and LCMSs; the former host content which is created with the latter. Additionally, the main advantage of LCMSs is their ability to create reusable learning objects. Reusing an object offers many advantages, the main one being the time saved: developers can use already developed pieces of content in their projects, independently of each project's nature and special demands. Nevertheless, creating a learning object on a certain platform doesn't mean that it can be used efficiently on any other platform that another developer may use. This is where standards come in, their main goals being:
Until 1999, no e-learning standards were in use, and the first development attempts produced results in 2000. Since then, several organisations have been developing e-learning standards for different purposes; some of them are:
In the picture below the interconnection between the various standards is presented.
Navigation is one of the most important elements of an e-learning course: courses with problematic navigation are, most of the time, abandoned by their users, and even when they are not, the efficiency of the course is significantly reduced.
At the dawn of the e-learning era, navigation schemes were very simple and mostly linear, meaning that the trainee could just move from page to page, forwards and backwards. Nowadays, recent hardware and software developments, along with the subsequent developments in e-learning systems, make it possible to create complex navigation schemes with simultaneous and parallel access to different parts of the course, which can consist of text, images, audio, video or combinations of these, that is, multimedia. Nevertheless, besides its obvious advantages, this modern navigation approach presents some serious drawbacks as well. The main ones are the following:
So, to keep the learner focused on the learning content and distracted as little as possible by irrelevant elements, Holzinger (2000) proposes several mechanisms, such as indexes, site maps, guided tours, bookmarks, "fish-eye" views, etc. Nevertheless, such mechanisms are not enough in all cases and additional actions are often needed; these actions are defined by several standards, the most important of which is SCORM. [35], [36], [37], [38], [39], [8]
As we saw in section 1.1.3, standards are a fundamental element of the organisation of e-learning globally. A serious effort on the subject has been made by the ADL (Advanced Distributed Learning) initiative, established by the White House Office of Science and Technology Policy (OSTP) and the US Department of Defense (DoD); this effort is called SCORM.
SCORM was based on previous efforts by several organisations, such as those mentioned in paragraph 1.1.4, the main ones being:
SCORM stands for Sharable Content Object Reference Model and has been developed in order to "foster the creation of reusable learning content as "instructional objects" within a common technical framework for computer-based and Web-based learning. SCORM describes that technical framework by providing a harmonized set of guidelines, specifications and standards based on the work of several distinct e-learning specifications and standards bodies". [40]
With the implementation of SCORM, the ADL Initiative aims to "accelerate large-scale development of dynamic and cost-effective learning software and systems and to stimulate the market for these products". [40]
SCORM's basic idea is that learning content (courses, modules, etc.) can be obtained by aggregating reusable content objects. These objects can be used repeatedly on any platform, without restrictions. This uniformity is achieved through certain rules and guidelines defined in SCORM. A SCORM-compliant LMS can identify the organisation of the content without needing information regarding sequencing and navigation, since these subjects are taken care of by SCORM, provided that the course is SCORM-compliant as well. Consequently, the content objects can be reused in other environments.
For an e-learning environment to be SCORM-compliant, it has to fulfil certain general requirements set by the ADL Initiative, which are incorporated in SCORM. These requirements are called "ilities" and are the following:
SCORM assumes that the implemented e-learning environments are web-based.
This combination of the "ilities" with the web-based character of learning applications offers the following abilities:
Naturally, the above-mentioned requirements are general in character and are not the only ones incorporated in SCORM; there is a large number of guidelines and specifications. To make these easier to use, SCORM is divided into three technical books, each covering a certain subject: the Content Aggregation Model (CAM), the Run-Time Environment (RTE) and Sequencing and Navigation (SN).
The first book of SCORM (CAM) describes the content objects which, when aggregated, make up a course, module, etc., as well as ways to package these objects so that interoperability between platforms is achieved. Additionally, it proposes ways to describe these objects via metadata, so that they can be easily searched for and discovered, and ways to define sequencing rules. The objects are organised together to produce content packages, meaning courses, lessons, modules, etc.
A Content Package connects and organises content objects or aggregations of content objects. A SCORM Content Package may represent a course, a lesson, a module or may simply be a collection of related content objects.
This process of creating, discovering, aggregating and organising small pieces of content into more complex learning entities, and of defining sequencing rules for how these are going to be accessed by the learner, consists of the following:
It refers to the components of a content package and how these are organised to create it. It consists of the following elements: Assets, SCOs (Sharable Content Objects), Learning Activities, Content Organization and Content Aggregation.
Assets are the main building blocks of any learning resource and can be described as electronic representations of any kind of data that can be delivered to the user via a web browser (text, images, video, sound, etc.).
A SCO can be described as a single learning resource that can be launched by LMSs via the SCORM RTE. It can be produced by aggregating single assets or by connecting sets of assets, which in turn consist of multiple single assets. SCOs are the lowest level of content that can communicate with LMSs, which is their main difference from assets or sets of assets.
Content Organizations are collections of SCOs and represent the ways in which the learning content should be used by the learner; this is accomplished through meaningful units of instruction, the Activities.
Finally, Content Aggregation describes the process of creating sets of functionally related objects, so that these sets can be delivered to the learner during the learning experience.
Content Packaging is a process whose main objective is to ensure that the aggregated content will be able to operate on different platforms. A Content Package represents a unit of learning, meaning that it contains all the data needed for the learning content to be processed by the LMS and delivered to the learner. It consists of two basic components: the so-called Manifest and the physical files that make up the content. The Manifest is an XML file which holds data regarding the package's organisation and the corresponding resources it includes. It consists of four main components, two mandatory (Organisations and Resources) and two optional (Metadata and Sub-Manifests). The "Metadata" provide general information about the package (title, description, etc.), the "Organisations" hold the organisation (structure) of the package's resources, the "Resources" contain the resources' data, and the "Sub-Manifests" describe any stand-alone units of instruction.
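As a rough illustration of the Manifest structure described above, the following Python sketch builds a heavily simplified manifest with the standard library. The element names follow the components named in the text; the function name, identifiers, titles and file names are hypothetical, and the XML namespaces and schema declarations that a real SCORM manifest requires are deliberately omitted.

```python
import xml.etree.ElementTree as ET


def build_manifest(course_id, title, sco_files):
    """Build a simplified, non-schema-valid manifest for illustration only."""
    manifest = ET.Element("manifest", identifier=course_id)

    # Optional component: general information about the package.
    metadata = ET.SubElement(manifest, "metadata")
    ET.SubElement(metadata, "title").text = title

    # Mandatory component: the structure (organisation) of the package.
    organizations = ET.SubElement(manifest, "organizations")
    organization = ET.SubElement(organizations, "organization", identifier="org-1")

    # Mandatory component: the physical files behind each item.
    resources = ET.SubElement(manifest, "resources")

    for index, href in enumerate(sco_files, start=1):
        resource_id = f"res-{index}"  # hypothetical identifier scheme
        item = ET.SubElement(
            organization, "item",
            identifier=f"item-{index}", identifierref=resource_id,
        )
        ET.SubElement(item, "title").text = f"Unit {index}"
        resource = ET.SubElement(resources, "resource", identifier=resource_id, href=href)
        ET.SubElement(resource, "file", href=href)

    return manifest


if __name__ == "__main__":
    tree = build_manifest("course-1", "Sample course", ["unit1.html", "unit2.html"])
    print(ET.tostring(tree, encoding="unicode"))
```

Each item in the organisation points to a resource through an identifier reference, which is how the structure of the package is kept separate from the physical files.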
Metadata hold descriptive information of the content object, i.e. its properties.
This information defines the rule models which set the sequence and ordering of the content that is delivered to the learner.
The RTE describes the requirements to which LMSs should conform in order to achieve interoperability between different platforms, independently of the tools used to develop the content. In other words, it defines how an LMS launches content objects, how it communicates with them at run-time and what data are exchanged during execution.
These three activities are served by three respective components:
The Sequencing and Navigation (SN) book of SCORM focuses on defining ways to offer the learning content to the learner efficiently and in a suitable order. For this to be accomplished, and for the sequencing information to be processed at run-time, a SCORM-compliant LMS must incorporate certain elements and functionality, which are also defined in this book. The sequencing information essentially determines which learning activity is to be delivered to the learner next; each learning activity is associated with a content object. How these objects are launched by the LMS is described in the RTE book.
As mentioned in a previous paragraph, the content package holds information regarding the organization of resources. This information, however, does not cover the way in which the learning content is going to be delivered to the user, that is, sequencing and ordering, or which parts of the content will be accessible to the user and when; this information is held in the aforementioned manifest file.
Towards this goal, SCORM has adopted sets of specifications originally developed by IMS, which provide ways for the sequencing information to be incorporated in the learning process.
Some fundamental concepts in these specifications are the Learning Activity, the Activity Tree, the Activity Cluster, the Attempt, the Learning Objectives, the Sequencing Rules and the Rollup Rules.
A loose definition of the Learning Activity is that it is "a meaningful unit of instruction"; in other words, it is something the learner does as he/she goes through the course. It can be an autonomous learning unit or may comprise several sub-activities; sub-activities, in turn, may consist of second-level sub-activities, these of third-level ones, and so on. The activities and the users experiencing them can be associated with a tracking status. Each user may be allowed a predefined number of attempts at a given activity, or may be free to attempt it as many times as desired. Activities may be suspended, abandoned, exited normally, etc.; in every case, however, they must remain within the context of the parent activity.
An Activity Tree is a tree in which each node is associated with an activity and stores the sequencing information. The LMS goes through the activity tree and identifies which learning activity is to be delivered to the learner next. Generally, the sequencing information determines the order of the activities; where no such information exists, the order given in the manifest file is followed.
An Activity Cluster can be defined as a group of activities consisting of a parent activity and its first-level children (sub-activities); its main role is to help developers organize sequencing more efficiently. Whatever rules apply to the child activities apply to the parent activity as well.
Each time the user tries to execute an activity, he/she is making an Attempt. If this activity is a child of a parent activity, which in turn is the child of another parent and so on, then the attempt is reflected in all of these activities, all the way up the tree.
A Learning Activity can be associated with one or more Learning Objectives, and SCORM provides full freedom in associating activities with objectives. SCORM does not itself assign any meaning to multiple objectives, however, and the status information of an activity's objective is held locally to that activity. Status information cannot be shared unless the objectives are global; the status information of global objectives is available for sharing among several activities, either within a single Activity Tree or across multiple trees. There are two restrictions, however:
The Sequencing Rules are applied to an activity and evaluated, using tracking information associated with the activity, at specified times during different sequencing cases (in other words, different learning cases). Each rule consists of a set of conditions and a corresponding action. The action is applied only when the set of conditions evaluates to "True".
The Rollup Rules are used for evaluating the learner's progress in cluster activities. Because cluster activities have no association with content objects, information regarding the user's progress cannot be applied directly to a cluster activity. A set of zero or more rules may be applied, and the evaluation takes place during a process called Rollup, which uses the status data of the child activities to evaluate the status information of the corresponding cluster. Each rule of this type consists of a set of child activities, a set of conditions which are evaluated against the status data of these child activities, and a corresponding action, which is executed when the conditions evaluate to "True". [43]
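The idea of an LMS walking the activity tree to pick the next activity can be sketched as follows. This is a minimal illustration that assumes a plain depth-first, left-to-right order and a single "completed" flag per activity; the class and function names are hypothetical, and the actual SCORM sequencing behaviour (control modes, rules, limits) is considerably more elaborate.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Activity:
    """A node of a (simplified) activity tree."""
    title: str
    children: List["Activity"] = field(default_factory=list)
    completed: bool = False  # assumed tracking flag, for illustration


def leaves_in_order(node: Activity) -> Iterator[Activity]:
    """Yield leaf activities in depth-first, left-to-right order."""
    if not node.children:
        yield node
        return
    for child in node.children:
        yield from leaves_in_order(child)


def next_activity(root: Activity) -> Optional[Activity]:
    """Return the first leaf activity not yet completed, or None if all are done."""
    for leaf in leaves_in_order(root):
        if not leaf.completed:
            return leaf
    return None
```

Only leaf activities are returned here, on the assumption that they are the ones associated with deliverable content objects, while inner nodes merely group their children.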
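A sequencing rule of this shape (a set of conditions plus an action that fires only when the conditions hold) can be sketched in a few lines of Python. The tracking keys and the action name used below are hypothetical placeholders chosen for illustration, not the vocabulary defined by the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Tracking information for one activity, e.g. {"satisfied": True, "attempts": 2}.
Tracking = Dict[str, object]


@dataclass
class SequencingRule:
    conditions: List[Callable[[Tracking], bool]]
    action: str  # hypothetical action name, e.g. "skip"

    def evaluate(self, tracking: Tracking) -> Optional[str]:
        """Return the action if every condition is True, otherwise None."""
        if all(condition(tracking) for condition in self.conditions):
            return self.action
        return None


# Example: skip an activity that has already been satisfied at least once.
skip_if_done = SequencingRule(
    conditions=[
        lambda t: t.get("satisfied") is True,
        lambda t: t.get("attempts", 0) >= 1,
    ],
    action="skip",
)
```

This sketch combines the conditions with a logical AND; other ways of combining them would need a different `evaluate` implementation.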
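The rollup idea, deriving a cluster's status from the status data of its children, can be illustrated with the sketch below. The function name, the "all"/"any" options and the status keys are assumptions made for the example and cover only a small part of what a full rollup process would handle.

```python
def apply_rollup_rule(child_statuses, condition, required, action):
    """Return `action` if the children meet `condition`, otherwise None.

    child_statuses: list of dicts holding each child activity's status data
    condition:      function taking one child's status and returning a bool
    required:       "all" or "any" (hypothetical subset of possible options)
    action:         value to assign to the cluster when the rule fires
    """
    results = [condition(status) for status in child_statuses]
    if not results:
        return None  # no children considered, so the rule cannot fire
    met = all(results) if required == "all" else any(results)
    return action if met else None


if __name__ == "__main__":
    children = [{"completed": True}, {"completed": False}]
    # The cluster counts as completed only when every child is completed.
    print(apply_rollup_rule(children, lambda s: s["completed"], "all", "completed"))
```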
Data mining has attracted great attention in recent decades, mainly because it makes it possible to extract useful information from huge amounts of data, which in turn can be used for decision making in various fields such as research, engineering, marketing, business management, etc.
In the last 30 years (1980 onwards), information technology has made giant steps forward, and the evolution of both hardware and software has been so rapid that the available data processing capabilities have reached astronomical levels. Consequently, the quantities of data collected are correspondingly huge. Indicatively, according to research conducted by P. Lyman and H. R. Varian, "the new stored information grew about 30% a year between 1999 and 2002".
Obviously, analysing and effectively exploiting these quantities of data, although not a simple task, is an essential one, since data from which no valuable information is extracted are practically useless.
A solution to this problem is offered by data mining, which according to G. Karypis can be defined as "Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns & rules.", while according to F. Castro et al and P. Chapman et al, data mining "is not just a collection of data analysis methods, but a data analysis process that encompasses anything from data understanding, preprocessing and modeling to process evaluation and implementation." [44], [45], [46], [47]
Data mining and statistics share many characteristics, which may seem confusing at first but is perfectly natural, since data mining emerged from the combination of various disciplines such as informatics, machine learning, database systems, visualization and, finally, statistics (figure 1).
Besides the inevitable similarities between them, there is a major and fundamental difference: data mining allows the development of models which, when applied to data, can offer different views and visualisations of the results, depending on the number of dimensions used to build the models. With statistics, on the other hand, this is not possible: to obtain different views and visualisations of the results, the effort has to be repeated as many times as the number of dimensions required to reach the desired conclusions.
When data mining is applied, the data of interest often come from different sources; this is the rule for web data. Because of the inevitable differences between sources, such data cannot be mined effectively as they are. To bring the data into a suitable form for data mining, several steps have to be followed:
Usually the first four steps are very time-consuming; in fact, they may require over 60-70% of the overall process time. For this purpose the data are loaded into suitable databases, or even data warehouses when the data exist in large amounts. [49]
The data mining process is supported by several techniques or combinations of them. The fundamental ones are the following:
The search for associations in data sets is a basic data mining task. In an association process, we take a certain set of data and analyse it in order to extract patterns of association among the objects it contains. The outcome of such a process is a number of rules describing associations between database objects, which help us to reach useful conclusions. These rules are accompanied by two factors, Support and Confidence, which are measures of a rule's strength. More specifically:
The most common example of such an association process, which also explains the role of the two aforementioned factors, is that of the "market basket". In this example we try to find which products are purchased in a supermarket and how they are associated.
The rule shows that 15% of customers purchase Milk together with Chocolate and, additionally, that those who purchase Milk also buy Chocolate in 70% of cases. [51], [52]
In classification, or supervised learning, we define certain classes and rules, with each class holding certain attributes. We develop a classification model, which is then applied to unclassified data in order to group them accordingly. In other words, we extract a set of rules from existing data and then apply these rules to different but similar data sets in order to predict certain behaviours.
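To make the two factors concrete, the sketch below computes them for a handful of made-up baskets, assuming the usual definitions: the support of a rule is the fraction of all transactions that contain every item in the rule, and its confidence is the fraction of the transactions containing the left-hand side that also contain the right-hand side. The baskets are hypothetical and are not the data behind the percentages quoted above.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    matching = sum(1 for basket in transactions if items <= set(basket))
    return matching / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical market baskets, for illustration only.
baskets = [
    {"milk", "chocolate"},
    {"milk", "bread"},
    {"milk", "chocolate", "bread"},
    {"bread"},
    {"chocolate"},
]

if __name__ == "__main__":
    print("support:", support(baskets, {"milk", "chocolate"}))
    print("confidence:", confidence(baskets, {"milk"}, {"chocolate"}))
```

In these baskets, two of the five transactions contain both Milk and Chocolate, and two of the three Milk transactions also contain Chocolate, so the rule Milk → Chocolate has a support of 2/5 and a confidence of 2/3.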
For example, let's assume that we have a very simple data set like the one below, which shows the political preferences of certain population groups according to the Age and Income attributes. The fourth column holds the Class attribute:
What we want to achieve is to initiate a "learning process" and extract from this data set a classification model which, when applied to different data sets of the same nature, will provide predictions regarding the political views of the registered individuals. A rule emerging from the data set above could be that young, low-income persons tend to vote for liberal parties. When applied to different data, this rule, if it is correct, should "predict" that people with these attributes will indeed vote for liberals. The initial data set is called the training set, while the data set on which the model is evaluated is called the test set. The accuracy of the classification model can be assessed by comparing the predicted results with the actual values of the class. Generally, the longer the training period, the better the model's accuracy tends to be. [51], [52], [53]
Clustering is the process of forming clusters such that all objects belonging to a cluster conform to some common criteria, meaning that these objects are similar to a certain degree. It is often called unsupervised learning because, in contrast to supervised learning, there are no class attributes which define the grouping of the data. The discovered groups of data are called clusters, and several approaches may be followed to form them. The two most important and widely accepted approaches are partitional clustering and hierarchical clustering.
In Partitional Clustering, random points within the data set are selected as the centers of the clusters, called "centroids"; their number depends on the number of clusters that the user wants to discover. Next, the distances between the centroids and the data points are computed, each data point is assigned to the centroid closest to it, and the emerging groups (centroid plus assigned points) form the clusters. This process is iterated many times in order to improve the shape of the clusters as much as possible, and it stops only when certain pre-defined conditions are met.
In Hierarchical Clustering, a nested, tree-like sequence of clusters is produced, which is called a "dendrogram". At the top of the tree there is one cluster (the root), each internal cluster node contains child cluster nodes, and the lowest part of the tree represents single data points. The following schema depicts such a dendrogram.
Some confuse clustering with classification, because both techniques create sets of data. The difference, however, is that in classification the criteria are pre-determined by the user, while in clustering these criteria emerge from analysing the data. [52], [53]
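The training set / test set workflow can be sketched with a deliberately naive classifier that memorises the majority class for each combination of Age and Income and falls back to the overall majority class for unseen combinations. Both the classifier and the records are hypothetical illustrations; they are not the data set or the method of the original example.

```python
from collections import Counter, defaultdict


def train(training_set):
    """Learn the majority class for each (age, income) combination."""
    votes = defaultdict(Counter)
    overall = Counter()
    for age, income, label in training_set:
        votes[(age, income)][label] += 1
        overall[label] += 1
    rules = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    default = overall.most_common(1)[0][0]  # fallback for unseen combinations
    return rules, default


def predict(model, age, income):
    rules, default = model
    return rules.get((age, income), default)


def accuracy(model, test_set):
    """Share of test records whose predicted class matches the actual class."""
    correct = sum(
        1 for age, income, label in test_set if predict(model, age, income) == label
    )
    return correct / len(test_set)


# Hypothetical records: (age group, income level, class).
training_set = [
    ("young", "low", "liberal"),
    ("young", "low", "liberal"),
    ("young", "high", "liberal"),
    ("old", "high", "conservative"),
    ("old", "low", "conservative"),
]
test_set = [
    ("young", "low", "liberal"),
    ("old", "high", "conservative"),
    ("middle", "high", "conservative"),
]

if __name__ == "__main__":
    model = train(training_set)
    print("accuracy on test set:", accuracy(model, test_set))
```

The third test record has an attribute combination never seen during training, so the model falls back to its default class and gets it wrong; comparing predictions with actual classes in this way is what the accuracy measure captures.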
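The iteration described above can be sketched as a small k-means-style routine. Two details are assumptions added for the example, since the text does not spell them out: after each assignment step every centroid is moved to the mean of its assigned points, and the loop stops when the centroids no longer change or a maximum number of iterations is reached. The function names and parameters are illustrative.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iterations=100, seed=0):
    """Partition `points` into `k` clusters; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random data points as initial centroids
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step (assumed): move each centroid to the mean of its points.
        new_centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # stopping condition: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters
```

Because the initial centroids are chosen at random, different seeds can lead to different clusters on less clearly separated data, which is one reason the procedure is often run several times.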
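One common way to obtain such a tree is bottom-up (agglomerative) clustering, in which every data point starts as its own cluster and the two closest clusters are merged repeatedly until a single root remains. The sketch below assumes this bottom-up strategy, one-dimensional numeric points and single-linkage distance (the distance between the closest members of two clusters); none of these choices is prescribed by the text, and the function name is hypothetical.

```python
def single_linkage(points):
    """Agglomerative clustering of 1-D points (non-empty list).

    Returns the tree as nested tuples: a leaf is a point, an inner node is a
    pair (left_subtree, right_subtree).
    """
    # Each cluster is stored as (subtree, list_of_member_points).
    clusters = [(p, [p]) for p in points]
    while len(clusters) > 1:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i][1] for b in clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

For the points 1, 2 and 10, the points 1 and 2 are merged first and 10 joins at the root, so the nesting of the returned tuples mirrors the levels of the tree.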
The first field in which data mining found immediate application was the corporate sector. In recent years, however, owing to the increasing need for data analysis and the wide variety of available tools capable of executing highly complex data mining processes, its applications have expanded practically everywhere.
Nowadays, data mining techniques are applied in businesses, military and security offices, medical institutions, banks, educational organisations, etc.
Businesses use data mining to find potential customers and improve their marketing strategies by finding patterns in customers' buying preferences and habits.
Military or security offices analyse opponents' data or even private data (illegally in many cases) in order to extract information regarding hostile movements or terrorist attacks.
Medical institutions conducting research on genomic data, for example, deal with gigantic data sets which cannot be analysed without data mining techniques and tools.
Banks apply data mining to detect credit card fraud, e.g. by identifying the patterns of transactions related to fraudulent actions, or to reduce the risk involved in granting loans by identifying or predicting potentially untrustworthy customers. Finally, in educational systems, data mining is applicable in many fields, one of the most important being e-learning or, more accurately, web-based learning, since most e-learning courses are carried out via the Internet.
We referred above to the various advantages of e-learning in providing knowledge and education to literally any individual, regardless of time, distance or personal ability. Nevertheless, e-learning environments are still far from perfect, and continuous improvement is needed to reach the desired level. Trainers need ways to assess their courses in terms of efficiency, structure, the activities learners select, learner satisfaction, results, etc., and to obtain adequate feedback so that they can alter their courses for the better.
In the e-learning field, two types of users are of main interest: trainers and learners. The first category covers any organization that offers training courses of any kind, such as universities, enterprises and public organizations, while the second refers to anyone interested in acquiring knowledge.
Some of the data kept for each user may be: name, age, qualifications, experience (e.g. previous courses taken), course visiting frequency, time spent, grades achieved, etc. By applying data mining techniques to these data, we can extract information that helps us evaluate course content, add or remove courses, establish new programs, guide users better, identify the most popular courses, improve navigation schemes, identify groups of learners with similar behaviour, find cases where learners do not take the process seriously and just play around, etc. [48], [62]
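As a rough illustration of how such user data could be summarised, the sketch below groups a handful of learner records by course and reports enrolments, mean time spent and mean grade. The records, the field names and the course_summary function are hypothetical and exist only for this example; they do not come from any real system.

```python
from collections import defaultdict

# Hypothetical learner records; the field names are assumptions for illustration.
records = [
    {"learner": "a", "course": "Databases", "minutes": 120, "grade": 8.0},
    {"learner": "b", "course": "Databases", "minutes": 15, "grade": 4.0},
    {"learner": "c", "course": "Networks", "minutes": 90, "grade": 7.0},
    {"learner": "a", "course": "Networks", "minutes": 60, "grade": 6.0},
    {"learner": "d", "course": "Databases", "minutes": 200, "grade": 9.0},
]


def course_summary(rows):
    """Return {course: (enrolments, mean_minutes, mean_grade)}."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["course"]].append(row)
    summary = {}
    for course, items in grouped.items():
        n = len(items)
        summary[course] = (
            n,
            sum(r["minutes"] for r in items) / n,
            sum(r["grade"] for r in items) / n,
        )
    return summary


summary = course_summary(records)
# The course with the most enrolments is taken as the "most popular" one.
most_popular = max(summary, key=lambda course: summary[course][0])
```

A summary like this is only a first descriptive step; the mining techniques discussed in the text go further by looking for patterns across attributes.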
Additionally, because most e-learning courses are nowadays offered via the internet and address a global audience, the amounts of collected data are huge, so processing and managing them is a complicated issue. Data mining offers a solution: through it the data can be assessed, managed, processed and exploited so that the e-learning environment itself can be adequately assessed and improved.
The online character that e-learning has adopted in recent years leads to the conclusion that e-learning data mining is essentially applied to Web data; hence, it is called web mining. These web data may be:
Web mining adopts the same techniques as its "parent" discipline in order to mine these data. It can be divided into three main categories: Web Content Mining, Web Usage Mining and Web Structure Mining. [48]
The World Wide Web has been expanding rapidly over the last two decades, and it is becoming harder and harder for users to identify the information that interests them within such a vast pool. The main goal of web content mining is to offer users the information of interest by searching the content of the available online resources; to achieve this goal, classical data mining techniques are not always enough. In other words, web content mining takes the functionality of a search engine one step further by implementing more advanced techniques. Because web content is not organized in relational databases like offline data, and can be text, images, audio, video, metadata or hyperlinks, a relational database cannot be used in this case and different types of databases, such as multimedia databases, have to be used. The reason is that web content is not always structured like offline data: it may be unstructured (text data), semi-structured (HTML data) or structured (table data).
Moreover, web data are almost never accumulated in one place but are dispersed across heterogeneous sources; consequently, the data have to be pooled in one place (e.g. a data warehouse) in order to be organized and homogenised. [48], [54]
When Web Usage Mining is applied, data mining techniques are in essence used to discover patterns in users' web surfing activities. In practice, the data that are mined are the metadata (data about data) of these activities, which are kept in the respective logs (web logs).
The extracted patterns provide valuable information concerning users' trends and preferences when surfing the web, product marketing strategies, outcomes of promotional campaigns, etc.; this information helps web designers develop improved web applications and marketing researchers adjust their strategies accordingly. [54], [55]
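To make the idea of mining web logs more concrete, the following minimal sketch counts page-to-page transitions per user from a simplified log. The log format ("user timestamp page"), the sample lines and the transition_counts function are assumptions made for this illustration; real web server logs carry more fields and would need proper session splitting.

```python
from collections import Counter

# Hypothetical, simplified log lines: "user_id timestamp_seconds page".
log_lines = [
    "u1 0 /index",
    "u1 30 /lesson1",
    "u1 95 /quiz1",
    "u2 10 /index",
    "u2 40 /lesson1",
    "u2 70 /lesson2",
    "u3 5 /index",
    "u3 50 /lesson1",
    "u3 120 /quiz1",
]


def transition_counts(lines):
    """Count page-to-page transitions per user, in timestamp order."""
    visits = {}
    for line in lines:
        user, timestamp, page = line.split()
        visits.setdefault(user, []).append((int(timestamp), page))
    counts = Counter()
    for events in visits.values():
        pages = [page for _, page in sorted(events)]
        # Pair each page with the page visited immediately after it.
        counts.update(zip(pages, pages[1:]))
    return counts


counts = transition_counts(log_lines)
most_common_step = counts.most_common(1)[0]
```

Frequent transitions of this kind are one simple form of the navigation patterns that usage mining looks for.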
Web Structure Mining attempts to find patterns in the structure of the hyperlinks that reside within web pages and link them to one another. The main purposes it serves are the following:
All of the above share one main goal, which is to provide information for improving the structure of web pages and applications. In this respect we could argue that it is strongly related to web usage mining, since a main goal of both is to improve web structure in general. [54]
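One very simple way to look at hyperlink structure is to count how many links point to each page. The sketch below does this for a small, hypothetical site map and flags pages that no other page links to; the links dictionary and the in_degrees function are illustrative assumptions, not part of any tool mentioned in this text.

```python
from collections import Counter

# Hypothetical site map: page -> pages it links to.
links = {
    "/index": ["/lesson1", "/lesson2"],
    "/lesson1": ["/lesson2", "/quiz1"],
    "/lesson2": ["/quiz1"],
    "/quiz1": ["/index"],
    "/glossary": ["/index"],
}


def in_degrees(graph):
    """Return the number of incoming links for every page in the graph."""
    counts = Counter({page: 0 for page in graph})
    for targets in graph.values():
        counts.update(targets)
    return counts


degrees = in_degrees(links)
# Pages with no incoming links cannot be reached by following hyperlinks.
orphans = sorted(page for page, degree in degrees.items() if degree == 0)
```

In this toy example the glossary page turns out to be unreachable through links, which is the kind of structural finding that could feed into a redesign.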
The importance of data mining in extracting knowledge and assisting the decision-making process has led to the development of various tools. These tools aim to help the researcher conduct the data mining task in an easy, quick and efficient way, without needing to be fully versed in the discipline. They can be either commercial or available for free. Despite the great variety of commercial tools (BayesiaLab, Clementine, Data Miner Software Kit, DBMiner 2.0, IBM Intelligent Miner Data Mining Suite, KXEN, Oracle Data Mining (ODM), SPSS, SAS Enterprise Miner, etc.), their commercial character and their functionalities are outside the scope of this thesis. Consequently, we will focus on the free tools that are available. A few of the most widely used are the following:
RapidMiner is a Java-based application whose main characteristic is that it hosts a large number of operators (over 500); this makes it possible to use many different methods and to compare them. An additional advantage is the extensive support it offers for model building and validation. It seems to be the most powerful of the three, but its main disadvantage is a somewhat complex GUI which, albeit aesthetically pleasing, lacks user friendliness and seems harder to learn, despite the complete documentation offered on the website. RapidMiner has also adopted several WEKA algorithms.
KNIME is simpler to use than RapidMiner, but it lacks the latter's power when it comes to model building and validation. Nevertheless, for relatively simple tasks it includes all the required operators and visual components. Additionally, it can connect to and read from a database, and it can incorporate modules of the WEKA tool, a fact that significantly enhances the possibilities it offers.
WEKA sits somewhere between the two aforementioned applications, KNIME and RapidMiner. It hosts many algorithms and visualization tools, though not as many as RapidMiner, and it is relatively simple to use. Its user interface is not aesthetically at the level of the other two tools; nevertheless, it seems to be easier to use than both. The learning time required is significantly shorter than for the other two, even for novices, and the accompanying documentation seems to be more than enough. It offers direct access to databases and is able to process the results of database queries.
The primary aim of this project is to investigate how to enhance the navigation scheme of a SCORM-compliant course at TEI Piraeus using data mining techniques.
The course will be analysed by applying mining techniques to the data collected by the learning management system. The analysis will focus on identifying patterns and clusters in the data that provide information about the students' navigational behaviour. The main goal is to find navigational patterns and clusters of students who perform well or poorly. Next, a new navigation scheme will be proposed, based on the SCORM standard.
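The idea of separating students into high and low performers can be sketched with a tiny two-cluster k-means on a single attribute. The code below is a plain-Python illustration under stated assumptions: the grades are invented, the kmeans_1d function is hypothetical, and the starting centroids are simply the minimum and maximum grade. It is not the WEKA implementation and says nothing about the real course data.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain Lloyd-style k-means on scalar values with given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical final grades on a 0-10 scale.
grades = [3.0, 4.0, 4.5, 8.0, 8.5, 9.0]
centroids, clusters = kmeans_1d(grades, centroids=[min(grades), max(grades)])
low_performers, high_performers = clusters
```

With real data, several attributes (for example time spent and pages visited) would normally be clustered together rather than the grade alone.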
In order to evaluate the navigation scheme of the course, data mining techniques will be applied. The data extracted from the course's database will be mined using three different techniques so as to gain the best possible view of the students' actions: Association Rules, Classification and Clustering, with the selected algorithms being Apriori, J48 and K-Means respectively.
It is obvious that the goals described above are similar to a certain extent. Nevertheless, by using all three techniques we can obtain a more consistent view of the situation; the outcomes of the assessment will be more accurate and the possibility of errors will be significantly reduced.
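To give a feel for the association rule technique, the sketch below runs a small level-wise, Apriori-style search for frequent sets of visited pages and then computes the confidence of one rule. The sessions, the 0.5 support threshold and the function names are hypothetical choices for illustration only; the thesis itself relies on WEKA's implementation, which this code does not reproduce.

```python
from itertools import combinations

# Hypothetical sessions: the set of course pages each student visited.
sessions = [
    {"lesson1", "lesson2", "quiz1"},
    {"lesson1", "quiz1"},
    {"lesson1", "lesson2"},
    {"lesson2", "quiz1"},
    {"lesson1", "lesson2", "quiz1"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: only frequent k-itemsets are extended to size k + 1."""
    items = sorted(set().union(*transactions))
    current = [frozenset([item]) for item in items]
    frequent = {}
    while current:
        level = {s: support(s, transactions) for s in current}
        level = {s: v for s, v in level.items() if v >= min_support}
        frequent.update(level)
        # Join step: merge pairs of frequent k-itemsets that differ in one item.
        candidates = {
            a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1
        }
        # Prune step: every k-subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, len(c) - 1))
        ]
    return frequent


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


frequent = frequent_itemsets(sessions, min_support=0.5)
rule_confidence = confidence(frozenset({"lesson1"}), frozenset({"quiz1"}), sessions)
```

In this toy data every single page and every pair of pages is frequent, while the full three-page set falls below the threshold; the rule "lesson1 implies quiz1" holds in three of the four sessions that contain lesson1.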
The tool that will be used for the data mining process is WEKA. Although RapidMiner and KNIME would also be adequate for the tasks this thesis requires, WEKA was chosen because of the greater simplicity of its interface. Of the four available WEKA GUIs, the "Explorer" will be used.
After identifying the problematic navigation schemes, solutions will be proposed. The "tool" that will be deployed is SCORM. If the data mining outcome is adequately evaluated and compliance with the corresponding SCORM guidelines is achieved, the students' navigation patterns, and subsequently the efficiency of the course, are expected to improve significantly.
At this point the data of interest (attributes) have been identified in the course's database, and the respective queries have been built and executed to extract the data. The data have been exported and preprocessed (cleansing, filtering, etc.) so that they are ready to be imported and analysed in WEKA. Some data mining tasks have already been performed on test data in order to select the best algorithm settings.
What remains is to execute the main data analysis in WEKA and evaluate the results accordingly. The outcome of this process will be the identification of the main problems in the current navigational scheme. Next, the current scheme will be compared with the SCORM guidelines in order to point out any divergences and, subsequently, to make proposals so that the course's navigation scheme converges to SCORM as much as possible.
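One possible way to hand preprocessed data to WEKA is to write it in the ARFF text format that WEKA reads. The sketch below is a minimal, assumed illustration: the to_arff function, the relation name, the attribute names and the sample rows are all hypothetical, and it handles only numeric and nominal attributes without quoting or missing values.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append("@attribute %s %s" % (name, kind))
        else:
            # Nominal attributes list their allowed values inside braces.
            lines.append("@attribute %s {%s}" % (name, ",".join(kind)))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical attributes and rows, standing in for the exported course data.
arff_text = to_arff(
    "students",
    [("minutes", "numeric"), ("performance", ["low", "high"])],
    [(120, "high"), (15, "low")],
)
```

The resulting text could be saved to a file with the .arff extension and opened from the WEKA Explorer; CSV import is an alternative that avoids writing the header by hand.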
Evolution of personal computers. (2017, Jun 26). Retrieved November 21, 2024, from https://studydriver.com/evolution-of-personal-computers/