Data mining project thesis

Data Mining: Recently Published Documents

Distance Based Pattern Driven Mining for Outlier Detection in High Dimensional Big Dataset

Detection of outliers or anomalies is one of the vital issues in pattern-driven data mining. Outlier detection identifies the inconsistent behavior of individual objects. It is an important area of data mining with many applications, such as detecting credit card fraud, discovering hacking, and uncovering criminal activities. Tools are needed to uncover the critical information contained in extensive data. This paper investigates a novel method for detecting cluster outliers in a multidimensional dataset that can identify both the clusters and the outliers in datasets containing noise. The proposed method detects the groups and outliers left by the clustering process, such as irregular sets of clusters (C) and outliers (O), to improve the results. Applying the algorithm to the dataset improved the results on several parameters. For the comparative analysis, the average accuracy and the average recall are computed. The average accuracy is 74.05% for the existing COID algorithm and 77.21% for the proposed algorithm. The average recall is 81.19% for the existing algorithm and 89.51% for the proposed one, which shows that the proposed method performs better than the existing COID algorithm.
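The abstract does not spell out how the proposed method or COID scores outliers. As a rough illustration of the general distance-based idea only, the sketch below scores each point by its Euclidean distance to its k-th nearest neighbor, flags the highest-scoring points, and measures recall against known outliers. The function names, the choice of Euclidean distance, and the toy data are assumptions, not the paper's algorithm.

```python
import math

def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    A larger score means the point lies farther from its neighborhood,
    so it is more likely to be an outlier.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def top_n_outliers(points, k, n):
    """Indices of the n points with the largest k-NN distance."""
    scores = knn_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]

def recall(predicted, actual):
    """Fraction of the true outliers that were flagged."""
    actual = set(actual)
    return len(actual & set(predicted)) / len(actual)
```

For example, with two tight clusters and one far-away point, `top_n_outliers(points, 2, 1)` returns the index of the far-away point.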

Implementation of Data Mining Technology in Bonded Warehouse Inbound and Outbound Goods Trade

For taxed goods, the actual freight is generally determined by multiplying the freight allocated per kilogram by the actual outgoing weight, based on the outgoing order number on the outgoing bill. Because conventional logistics cannot keep up with the rapid logistics response that e-commerce orders require, this work discusses the implementation of data mining technology in the inbound and outbound goods trade of a bonded warehouse. Specifically, a bonded warehouse decision-making system was developed, consisting of a data warehouse, a conceptual model, an online analytical processing (OLAP) system, a human-computer interaction module, and a web data-sharing platform. The statistical query module can be used to run statistics and queries on warehousing operations. After the whole warehousing business process was optimized, obtaining the actual freight takes only 19.1 hours, nearly one third less than before optimization. This study could help create a better environment for the development of China's processing trade.
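The freight rule stated at the start of this abstract (allocated freight per kilogram multiplied by the actual outgoing weight, keyed by outgoing order number) can be sketched as below. The record layout of the outgoing bill, the single per-kilogram rate, and the function name are assumptions for illustration.

```python
from collections import defaultdict

def actual_freight(bill_lines, freight_per_kg):
    """Total actual freight per outgoing order number.

    bill_lines: iterable of (order_number, outgoing_weight_kg) pairs read
    from the outgoing bill (hypothetical record layout).
    freight_per_kg: freight allocated to each kilogram (assumed uniform).
    """
    totals = defaultdict(float)
    for order_number, weight_kg in bill_lines:
        # Actual freight = allocated freight per kg x actual outgoing weight.
        totals[order_number] += freight_per_kg * weight_kg
    return dict(totals)
```

With a rate of 2.0 per kilogram, bill lines `("OUT-001", 12.5)`, `("OUT-001", 7.5)` and `("OUT-002", 4.0)` give 40.0 for `OUT-001` and 8.0 for `OUT-002`.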

Multi-objective economic load dispatch method based on data mining technology for large coal-fired power plants

User activity classification and domain-wise ranking through social interactions

Twitter has gained significant prevalence among users across numerous domains, in most countries, and among different age groups. It serves as a real-time micro-blogging service for communication and opinion sharing. Twitter shares its data for research and study purposes through open APIs, which makes it the most suitable data source for social media analytics. Applying data mining and machine learning techniques to tweets is attracting growing interest. The most prominent challenge in social media analytics is to automatically identify and rank influencers. This research aims to detect users' topics of interest in social media and rank users by specific topics, domains, etc. A few hybrid parameters are also identified, based on the post's content, the post's metadata, the user's profile, and the user's network features, to capture different aspects of being influential; these parameters are used in the ranking algorithm. The results show that the proposed approach is effective in both classifying and ranking individuals in a cluster.
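The abstract names four groups of hybrid parameters (post content, post metadata, user profile, and user network) but not how they are combined. A minimal sketch, assuming each group has already been normalized to [0, 1] and is combined with hypothetical fixed weights, ranks users within each domain as follows; the weights, dictionary keys, and function names are illustrative only.

```python
def influence_score(features, weights):
    """Weighted sum of normalized feature-group scores (hypothetical scheme)."""
    return sum(weight * features.get(group, 0.0) for group, weight in weights.items())

def rank_by_domain(users, weights):
    """Group users by domain and sort each group by descending influence score.

    users: list of dicts with keys "name", "domain" and "features", where
    "features" maps a feature group to a score in [0, 1].
    """
    groups = {}
    for user in users:
        groups.setdefault(user["domain"], []).append(user)
    return {
        domain: [
            u["name"]
            for u in sorted(members,
                            key=lambda u: influence_score(u["features"], weights),
                            reverse=True)
        ]
        for domain, members in groups.items()
    }

# Hypothetical weights for the four feature groups named in the abstract.
WEIGHTS = {"content": 0.3, "metadata": 0.2, "profile": 0.2, "network": 0.3}
```

A learned ranking model could replace the fixed weights; the fixed sum just keeps the contribution of each feature group easy to inspect.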

A data mining analysis of COVID-19 cases in states of United States of America

Epidemic diseases can be extremely dangerous. They may have negative effects on economies, businesses, the environment, people, and the workforce. In this paper, several factors related to the COVID-19 pandemic are examined using data mining methodologies and approaches. The analysis uncovers a number of rules and insights, and the performance of the data mining algorithms is evaluated. According to the results, the JRip algorithm achieved the highest correct classification rate and the lowest root mean squared error (RMSE). Considering both classification rate and RMSE, JRip can be considered an effective method for understanding the factors related to deaths caused by the coronavirus.
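JRip is Weka's implementation of the RIPPER rule learner. The two comparison metrics the abstract relies on can be sketched as below: the classification rate (accuracy) from predicted labels, and RMSE computed from predicted class-probability vectors against one-hot targets. That RMSE definition is an assumption for illustration; Weka and other tools may normalize it differently, so these numbers need not match the paper's.

```python
import math

def accuracy(y_true, y_pred):
    """Correct classification rate: share of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse_from_probabilities(y_true, probas, classes):
    """RMSE between predicted class-probability vectors and one-hot targets.

    y_true: true labels; probas: one probability list per instance, ordered
    like `classes`. The squared error is averaged over instances and classes.
    """
    total = 0.0
    for label, dist in zip(y_true, probas):
        for cls, p in zip(classes, dist):
            target = 1.0 if cls == label else 0.0
            total += (p - target) ** 2
    return math.sqrt(total / (len(y_true) * len(classes)))
```

A perfectly confident, always-correct classifier scores an RMSE of 0.0; a classifier that assigns 0.5 to each of two classes scores 0.5.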

Exploring distributed energy generation for sustainable development: A data mining approach

A comprehensive guideline for Bengali sentiment annotation

Sentiment Analysis (SA) is a Natural Language Processing (NLP) and Information Extraction (IE) task that primarily aims to determine whether the writer's feelings are expressed positively or negatively by analyzing a large number of documents. SA is also widely studied in data mining, web mining, text mining, and information retrieval. The fundamental task in sentiment analysis is to classify the polarity of a given content as Positive, Negative, or Neutral. Although extensive research has been conducted in this area of computational linguistics, most of it has been carried out in the context of the English language. However, Bengali sentiment expression spans varying degrees of sentiment labels, which may plausibly differ from English. Therefore, it is important to develop and carry out sentiment assessment for Bengali properly. In sentiment analysis, the predictive power of an automatic model depends entirely on the quality of the dataset annotation. Bengali sentiment annotation is challenging because of the language's diverse syntactic structures and its different degrees of innate sentiment (i.e., weakly and strongly positive/negative sentiments). Thus, in this article, we propose a novel and precise guideline for researchers, linguistic experts, and referees to annotate Bengali sentences accurately, with a view to building effective datasets for efficient automatic sentiment prediction.
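Because annotation quality determines what a model can learn, a standard check on any annotation guideline is inter-annotator agreement. The abstract does not say which measure, if any, the guideline uses; the sketch below swaps in Cohen's kappa for two annotators, with hypothetical fine-grained label names such as `weak_pos` and `strong_neg`.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same, non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently,
    # each following their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)
```

A kappa of 1.0 means perfect agreement and 0.0 means agreement no better than chance; low values on the weak/strong distinctions would point to parts of a guideline that need sharper rules.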

Capturing Dynamics of Information Diffusion in SNS: A Survey of Methodology and Techniques

Studying information diffusion in SNS (Social Network Services) has remarkable significance in both academia and industry. Theoretically, it advances related subjects such as statistics, sociology, and data mining. Practically, diffusion modeling provides fundamental support for many downstream applications (e.g., public opinion monitoring, rumor source identification, and viral marketing). Tremendous effort has been devoted to understanding and quantifying information diffusion dynamics. This survey investigates and summarizes notable emerging work in diffusion modeling. We first put forward a unified information diffusion concept in terms of three components: information, user decision, and social vectors, followed by a detailed introduction of the methodologies for diffusion modeling. Then, we propose a new taxonomy adopting a hybrid philosophy (i.e., granularity and techniques) and carry out a series of comparative studies on elementary diffusion models under this taxonomy in terms of assumptions, methods, and pros and cons. We further summarize representative diffusion models in special scenarios and significant downstream tasks built on these elementary models. Finally, we discuss open issues in this field following the methodology of diffusion modeling.
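The Independent Cascade model is a classic elementary diffusion model, used here only as an example of the kind of model such a survey compares. The sketch below simulates a single cascade on a directed graph with one uniform activation probability `p`; the uniform probability and the dictionary graph format are simplifying assumptions, and real studies usually average many runs or learn per-edge probabilities.

```python
import random

def independent_cascade(graph, seeds, p, rng=None):
    """Simulate one run of the Independent Cascade diffusion model.

    graph: dict mapping each node to a list of its out-neighbors.
    seeds: initially active nodes.
    p: probability that a newly active node activates each inactive neighbor.
    Returns the set of nodes that are active when the cascade stops.
    """
    if rng is None:
        rng = random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                # Each newly active node gets exactly one chance per inactive neighbor.
                if neighbor not in active and rng.random() < p:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active
```

Averaging `len(independent_cascade(...))` over many seeded runs estimates the expected spread of a seed set, which is the quantity viral-marketing applications try to maximize.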

The Influence of E-book Teaching on the Motivation and Effectiveness of Learning Law by Using Data Mining Analysis

This paper studies the motivation for learning law, compares the teaching effectiveness of two teaching methods, e-book teaching and traditional teaching, and analyzes the influence of e-book teaching on the effectiveness of learning law using big data analysis. From the perspective of law students' psychology, e-book teaching can attract students' attention, stimulate their interest in learning, deepen their impression of knowledge while learning, expand their knowledge, and ultimately improve their performance in practical assessment. Because the sample size is small, the results may not be fully representative. The study offers a particular reference for stimulating learning motivation in law and other theoretical disciplines at colleges and universities, and provides ideas for reforming the teaching mode in higher education. The paper uses a decision tree algorithm for the data mining analysis and identifies, from the students' perspective, the factors that influence law students' learning motivation and effectiveness during the learning process.
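The abstract does not say which decision tree algorithm was used. As an illustration of how a tree surfaces influencing factors, the sketch below computes the ID3-style information gain of categorical attributes and picks the one that best separates the outcome; the attribute names and toy records are hypothetical, not the paper's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute.

    rows: one dict of attribute values per student; labels: outcome per row.
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels, attributes):
    """Attribute with the highest information gain (the root of an ID3 tree)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

Applying `best_split` recursively to each resulting subset grows the full tree; attributes chosen near the root are the strongest candidate factors.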

Intelligent Data Mining based Method for Efficient English Teaching and Cultural Analysis

The emergence of online education has greatly improved the quality of traditional English teaching. However, it only moves the teaching process from offline to online and does not really change the essence of traditional English teaching. In this work, we study an intelligent English teaching method to further improve the quality of English teaching. Specifically, a random forest is first used to analyze and extract the grammatical and syntactic features of English text. Then, a decision-tree-based method is proposed to predict grammar or syntax issues in the English text. The evaluation results indicate that the proposed method can effectively improve the accuracy of English grammar or syntax recognition.

Data Mining

Data Science
Data and Artificial Intelligence

Student theses

1 - 50 out of 258 results

3D face reconstruction using deep learning

Supervisor: Medeiros de Carvalho, R. (Supervisor 1), Gallucci, A. (Supervisor 2) & Vanschoren, J. (Supervisor 2)

Student thesis: Master

Achieving Long Term Fairness through Curiosity Driven Reinforcement Learning: How intrinsic motivation influences fairness in algorithmic decision making

Supervisor: Pechenizkiy, M. (Supervisor 1), Gajane, P. (Supervisor 2) & Kapodistria, S. (Supervisor 2)

Activity Recognition Using Deep Learning in Videos under Clinical Setting

Supervisor: Duivesteijn, W. (Supervisor 1), Papapetrou, O. (Supervisor 2), Zhang, L. (External person) (External coach) & Vasu, J. D. (External coach)

A Data Cleaning Assistant

Supervisor: Vanschoren, J. (Supervisor 1)

Student thesis: Bachelor

A Data Cleaning Assistant for Machine Learning

A deep learning approach for clustering a multi-class dataset

Supervisor: Pei, Y. (Supervisor 1), Marczak, M. (External person) (External coach) & Groen, J. (External person) (External coach)

Aerial Imagery Pixel-level Segmentation

A framework for understanding business process remaining time predictions

Supervisor: Pechenizkiy, M. (Supervisor 1) & Scheepens, R. J. (Supervisor 2)

A Hybrid Model for Pedestrian Motion Prediction

Supervisor: Pechenizkiy, M. (Supervisor 1), Muñoz Sánchez, M. (Supervisor 2), Silvas, E. (External coach) & Smit, R. M. B. (External coach)

Algorithms for center-based trajectory clustering

Supervisor: Buchin, K. (Supervisor 1) & Driemel, A. (Supervisor 2)

Allocation Decision-Making in Service Supply Chain with Deep Reinforcement Learning

Supervisor: Zhang, Y. (Supervisor 1), van Jaarsveld, W. L. (Supervisor 2), Menkovski, V. (Supervisor 2) & Lamghari-Idrissi, D. (Supervisor 2)

Analyzing Policy Gradient approaches towards Rapid Policy Transfer

An empirical study on dynamic curriculum learning in information retrieval

Supervisor: Fang, M. (Supervisor 1)

An Explainable Approach to Multi-contextual Fake News Detection

Supervisor: Pechenizkiy, M. (Supervisor 1), Pei, Y. (Supervisor 2) & Das, B. (External person) (External coach)

An exploration and evaluation of concept based interpretability methods as a measure of representation quality in neural networks

Supervisor: Menkovski, V. (Supervisor 1) & Stolikj, M. (External coach)

Anomaly detection in image data sets using disentangled representations

Supervisor: Menkovski, V. (Supervisor 1) & Tonnaer, L. M. A. (Supervisor 2)

Anomaly Detection in Polysomnography signals using AI

Supervisor: Pechenizkiy, M. (Supervisor 1), Schwanz Dias, S. (Supervisor 2) & Belur Nagaraj, S. (External person) (External coach)

Anomaly detection in text data using deep generative models

Supervisor: Menkovski, V. (Supervisor 1) & van Ipenburg, W. (External person) (External coach)

Anomaly Detection on Dynamic Graph

Supervisor: Pei, Y. (Supervisor 1), Fang, M. (Supervisor 2) & Monemizadeh, M. (Supervisor 2)

Anomaly Detection on Finite Multivariate Time Series from Semi-Automated Screwing Applications

Supervisor: Pechenizkiy, M. (Supervisor 1) & Schwanz Dias, S. (Supervisor 2)

Anomaly Detection on Multivariate Time Series Using GANs

Supervisor: Pei, Y. (Supervisor 1) & Kruizinga, P. (External person) (External coach)

Anomaly detection on vibration data

Supervisor: Hess, S. (Supervisor 1), Pechenizkiy, M. (Supervisor 2), Yakovets, N. (Supervisor 2) & Uusitalo, J. (External person) (External coach)

Application of P&ID symbol detection and classification for generation of material take-off documents (MTOs)

Supervisor: Pechenizkiy, M. (Supervisor 1), Banotra, R. (External person) (External coach) & Ya-alimadad, M. (External person) (External coach)

Applications of deep generative models to Tokamak Nuclear Fusion

Supervisor: Koelman, J. M. V. A. (Supervisor 1), Menkovski, V. (Supervisor 2), Citrin, J. (Supervisor 2) & van de Plassche, K. L. (External coach)

A Similarity Based Meta-Learning Approach to Building Pipeline Portfolios for Automated Machine Learning

Aspect-based few-shot learning

Supervisor: Menkovski, V. (Supervisor 1)

Assessing Bias and Fairness in Machine Learning through a Causal Lens

Supervisor: Pechenizkiy, M. (Supervisor 1)

Assessing fairness in anomaly detection: A framework for developing a context-aware fairness tool to assess rule-based models

Supervisor: Pechenizkiy, M. (Supervisor 1), Weerts, H. J. P. (Supervisor 2), van Ipenburg, W. (External person) (External coach) & Veldsink, J. W. (External person) (External coach)

A Study of an Open-Ended Strategy for Learning Complex Locomotion Skills

A systematic determination of metrics for classification tasks in OpenML

A universally applicable EMM framework

Supervisor: Duivesteijn, W. (Supervisor 1), van Dongen, B. F. (Supervisor 2) & Yakovets, N. (Supervisor 2)

Automated machine learning with gradient boosting and meta-learning

Automated object recognition of solar panels in aerial photographs: a case study in the Liander service area

Supervisor: Pechenizkiy, M. (Supervisor 1), Medeiros de Carvalho, R. (Supervisor 2) & Weelinck, T. (External person) (External coach)

Automatic data cleaning

Automatic scoring of short open-ended questions

Supervisor: Pechenizkiy, M. (Supervisor 1) & van Gils, S. (External coach)

Automatic Synthesis of Machine Learning Pipelines consisting of Pre-Trained Models for Multimodal Data

Automating string encoding in AutoML

Autoregressive neural networks to model electroencephalography signals

Supervisor: Vanschoren, J. (Supervisor 1), Pfundtner, S. (External person) (External coach) & Radha, M. (External coach)

Balancing Efficiency and Fairness on Ride-Hailing Platforms via Reinforcement Learning

Supervisor: Tavakol, M. (Supervisor 1), Pechenizkiy, M. (Supervisor 2) & Boon, M. A. A. (Supervisor 2)

Benchmarking Audio DeepFake Detection

Better clustering evaluation for the OpenML evaluation engine

Supervisor: Vanschoren, J. (Supervisor 1), Gijsbers, P. (Supervisor 2) & Singh, P. (Supervisor 2)

Bi-level pipeline optimization for scalable AutoML

Supervisor: Nobile, M. (Supervisor 1), Vanschoren, J. (Supervisor 1), Medeiros de Carvalho, R. (Supervisor 2) & Bliek, L. (Supervisor 2)

Block-sparse evolutionary training using weight momentum evolution: training methods for hardware efficient sparse neural networks

Supervisor: Mocanu, D. (Supervisor 1), Zhang, Y. (Supervisor 2) & Lowet, D. J. C. (External coach)

Boolean Matrix Factorization and Completion

Supervisor: Peharz, R. (Supervisor 1) & Hess, S. (Supervisor 2)

Bootstrap Hypothesis Tests for Evaluating Subgroup Descriptions in Exceptional Model Mining

Supervisor: Duivesteijn, W. (Supervisor 1) & Schouten, R. M. (Supervisor 2)

Bottom-Up Search: A Distance-Based Search Strategy for Supervised Local Pattern Mining on Multi-Dimensional Target Spaces

Supervisor: Duivesteijn, W. (Supervisor 1), Serebrenik, A. (Supervisor 2) & Kromwijk, T. J. (Supervisor 2)

Bridging the Domain-Gap in Computer Vision Tasks

Supervisor: Mocanu, D. C. (Supervisor 1) & Lowet, D. J. C. (External coach)

CCESO: Auditing AI Fairness By Comparing Counterfactual Explanations of Similar Objects

Supervisor: Pechenizkiy, M. (Supervisor 1) & Hoogland, K. (External person) (External coach)

Clean-Label Poison Attacks on Machine Learning

Supervisor: Michiels, W. P. A. J. (Supervisor 1), Schalij, F. D. (External coach) & Hess, S. (Supervisor 2)

Dissertations / Theses on the topic 'Data mining'

Mrázek, Michal. "Data mining." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2019. http://www.nusl.cz/ntk/nusl-400441.

Payyappillil, Hemambika. "Data mining framework." Morgantown, W. Va. : [West Virginia University Libraries], 2005. https://etd.wvu.edu/etd/controller.jsp?moduleName=documentdata&jsp%5FetdId=3807.

Abedjan, Ziawasch. "Improving RDF data with data mining." PhD thesis, Universität Potsdam, 2014. http://opus.kobv.de/ubp/volltexte/2014/7133/.

Liu, Tantan. "Data Mining over Hidden Data Sources." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1343313341.

Taylor, Phillip. "Data mining of vehicle telemetry data." Thesis, University of Warwick, 2015. http://wrap.warwick.ac.uk/77645/.

Sherikar, Vishnu Vardhan Reddy. "I2MAPREDUCE: DATA MINING FOR BIG DATA." CSUSB ScholarWorks, 2017. https://scholarworks.lib.csusb.edu/etd/437.

Zhang, Nan. "Privacy-preserving data mining." College Station, Tex.: Texas A&M University, 2006. http://hdl.handle.net/1969.1/ETD-TAMU-1080.

Hulten, Geoffrey. "Mining massive data streams." Thesis, University of Washington, 2005. http://hdl.handle.net/1773/6937.

Büchel, Nina. "Faktorenvorselektion im Data Mining [Factor preselection in data mining]." Berlin: Logos, 2009. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=019006997&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

Shao, Junming. "Synchronization Inspired Data Mining." Diss., LMU Munich, 2011. http://nbn-resolving.de/urn:nbn:de:bvb:19-137356.

Wang, Xiaohong. "Data mining with bilattices." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ59344.pdf.

Knobbe, Arno J. "Multi-relational data mining." Amsterdam: IOS Press, 2007. http://www.loc.gov/catdir/toc/fy0709/2006931539.html.

丁嘉慧 and Ka-wai Ting. "Time sequences: data mining." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B31226760.

Wan, Chang, and 萬暢. "Mining multi-faceted data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/197527.

García-Osorio, César. "Data mining and visualization." Thesis, University of Exeter, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.414266.

Wang, Grant J. (Grant Jenhorn) 1979. "Algorithms for data mining." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/38315.

Anwar, Muhammad Naveed. "Data mining of audiology." Thesis, University of Sunderland, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.573120.

Santos, José Carlos Almeida. "Mining protein structure data." Master's thesis, FCT - UNL, 2006. http://hdl.handle.net/10362/1130.

Garda-Osorio, Cesar. "Data mining and visualisation." Thesis, University of the West of Scotland, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.742763.

Rawles, Simon Alan. "Object-oriented data mining." Thesis, University of Bristol, 2007. http://hdl.handle.net/1983/c13bda2c-75c9-4bfa-b86b-04ac06ba0278.

Mao, Shihong. "Comparative Microarray Data Mining." Wright State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=wright1198695415.

Novák, Petr. "Data mining časových řad [Data mining of time series]." Master's thesis, Vysoká škola ekonomická v Praze, 2009. http://www.nusl.cz/ntk/nusl-72068.

Blunt, Gordon. "Mining credit card data." Thesis, n.p, 2002. http://ethos.bl.uk/.

Niggemann, Oliver. "Visual data mining of graph based data." [S.l. : s.n.], 2001. http://deposit.ddb.de/cgi-bin/dokserv?idn=962400505.

Li, Liangchun. "Web-based data visualization for data mining." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/MQ35845.pdf.

Al-Hashemi, Idrees Yousef. "Applying data mining techniques over big data." Thesis, Boston University, 2013. https://hdl.handle.net/2144/21119.

Zhou, Wubai. "Data Mining Techniques to Understand Textual Data." FIU Digital Commons, 2017. https://digitalcommons.fiu.edu/etd/3493.

Kavoosifar, Mohammad Reza. "Data Mining and Indexing Big Multimedia Data." Doctoral thesis, Politecnico di Torino, 2019. http://hdl.handle.net/11583/2742526.

Adderly, Darryl M. "Data mining meets e-commerce: using data mining to improve customer relationship management." [Gainesville, Fla.]: University of Florida, 2002. http://purl.fcla.edu/fcla/etd/UFE0000500.

Vithal, Kadam Omkar. "Novel applications of Association Rule Mining - Data Stream Mining." AUT University, 2009. http://hdl.handle.net/10292/826.

Patel, Akash. "Data Mining of Process Data in Multivariable Systems." Thesis, KTH, Skolan för elektro- och systemteknik (EES), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-201087.

Cordeiro, Robson Leonardo Ferreira. "Data mining in large sets of complex data." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-22112011-083653/.

Xiao, Xin. "Data Mining Techniques for Complex User-Generated Data." Doctoral thesis, Politecnico di Torino, 2016. http://hdl.handle.net/11583/2644046.

Tong, Suk-man Ivy. "Techniques in data stream mining." Thesis, The University of Hong Kong, 2005. http://sunzi.lib.hku.hk/hkuto/record/B34737376.

Borgelt, Christian. "Data mining with graphical models." [S.l. : s.n.], 2000. http://deposit.ddb.de/cgi-bin/dokserv?idn=962912107.

Weber, Irene. "Suchraumbeschränkung für relationales Data Mining [Search space restriction for relational data mining]." [S.l. : s.n.], 2004. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB11380447.

Maden, Engin. "Data Mining On Architecture Simulation." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/2/12611635/index.pdf.

Drwal, Maciej. "Data mining in distributed computer systems." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-5709.

Thun, Julia, and Rebin Kadouri. "Automating debugging through data mining." Thesis, KTH, Data- och elektroteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-203244.

Rahman, Sardar Muhammad Monzurur. "Data Mining Using Neural Networks." RMIT University. Electrical & Computer Engineering, 2006. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080813.094814.

Guo, Shishan. "Data mining in crystallographic databases." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0012/NQ52854.pdf.

Sun, Wenyi. "Data mining extension for economics." Diss., Columbia, Mo. : University of Missouri-Columbia, 2006. http://hdl.handle.net/10355/5869.

Papadatos, George. "Data mining for lead optimisation." Thesis, University of Sheffield, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.556989.

Rice, Simon B. "Text data mining in bioinformatics." Thesis, University of Manchester, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.488351.

Lin, Zhenmin. "Privacy Preserving Distributed Data Mining." UKnowledge, 2012. http://uknowledge.uky.edu/cs_etds/9.

Tong, Suk-man Ivy, and 湯淑敏. "Techniques in data stream mining." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2005. http://hub.hku.hk/bib/B34737376.

Luo, Man. "Data mining and classical statistics." Virtual Press, 2004. http://liblink.bsu.edu/uhtbin/catkey/1304657.

Cai, Zhongming. "Technical aspects of data mining." Thesis, Cardiff University, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.395784.

Shioda, Romy 1977. "Integer optimization in data mining." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/17579.

Lo, Ya-Chin, and 羅雅琴. "Data mining in bioinformatics -- NCBI tools for data mining." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/38227591029165701821.

Data Analytics and Machine Learning Group
TUM School of Computation, Information and Technology
Technical University of Munich

Open Topics

We offer multiple Bachelor/Master theses, Guided Research projects and IDPs in the area of data mining/machine learning. A non-exhaustive list of open topics is listed below.

If you are interested in a thesis or a guided research project, please send your CV and transcript of records to Prof. Stephan Günnemann via email and we will arrange a meeting to talk about the potential topics.

Graph Neural Networks for Spatial Transcriptomics

Type: Master's Thesis

Prerequisites:

Strong machine learning knowledge
Proficiency with Python and deep learning frameworks (PyTorch, TensorFlow, JAX)
Knowledge of graph neural networks (e.g., GCN, MPNN)
Optional: Knowledge of bioinformatics and genomics

Description:

Spatial transcriptomics is a cutting-edge field at the intersection of genomics and spatial analysis, aiming to understand gene expression patterns within the context of tissue architecture. Our project focuses on leveraging graph neural networks (GNNs) to unlock the full potential of spatial transcriptomic data. Unlike traditional methods, GNNs can effectively capture the intricate spatial relationships between cells, enabling more accurate modeling and interpretation of gene expression dynamics across tissues. We seek motivated students to explore novel GNN architectures tailored for spatial transcriptomics, with a particular emphasis on addressing challenges such as spatial heterogeneity, cell-cell interactions, and spatially varying gene expression patterns.

Contact: Filippo Guerranti, Alessandro Palma

References:

Cell clustering for spatial transcriptomics data with graph neural network
Unsupervised spatially embedded deep representation of spatial transcriptomics
SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network
DeepST: identifying spatial domains in spatial transcriptomics by deep learning
Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder

GCNG: graph convolutional networks for inferring gene interaction from spatial transcriptomics data
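GNN-based spatial transcriptomics methods typically build a graph over cells or spots from their spatial coordinates and propagate gene-expression features along it. The sketch below, in plain Python for readability, assumes a k-nearest-neighbor spatial graph and a single graph convolutional (GCN) layer with symmetric normalization; it illustrates the general pattern only and is not the architecture of any reference listed above. All function names are hypothetical.

```python
import math

def knn_adjacency(coords, k):
    """Symmetric 0/1 adjacency linking each cell to its k nearest cells in space."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(coords[i], coords[j]))[:k]
        for j in nearest:
            adj[i][j] = adj[j][i] = 1.0  # undirected edge
    return adj

def gcn_layer(adj, features, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj: n x n adjacency A; features: n x f expression matrix X;
    weight: f x h weight matrix W (fixed numbers here, learned in practice).
    """
    n, f_in, f_out = len(adj), len(weight), len(weight[0])
    # Add self-loops so each cell keeps its own expression profile.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features: (normalized A) @ X.
    agg = [[sum(norm[i][m] * features[m][c] for m in range(n)) for c in range(f_in)]
           for i in range(n)]
    # Project with W and apply ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][h] for c in range(f_in)))
             for h in range(f_out)] for i in range(n)]
```

In a real project, a library such as PyTorch Geometric would provide the sparse graph construction and learnable layers; the explicit loops here only make the neighborhood aggregation visible.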

Generative Models for Drug Discovery

Type: Master's Thesis / Guided Research

Proficiency with Python and deep learning frameworks (PyTorch or TensorFlow)
Knowledge of graph neural networks (e.g. GCN, MPNN)
No formal education in chemistry, physics or biology needed!

Effectively designing molecular geometries is essential to advancing pharmaceutical innovation, a domain that has received great attention thanks to the success of generative models. These models promise a more efficient exploration of the vast chemical space and the generation of novel compounds with specific properties by leveraging their learned representations, potentially leading to the discovery of molecules with unique properties that would otherwise go undiscovered. Our topics lie at the intersection of generative models, such as diffusion and flow matching models, and graph representation learning, e.g., graph neural networks. The focus of our projects can be model development with an emphasis on downstream tasks (e.g., diffusion guidance at inference time) and a better understanding of the limitations of existing models.

Contact: Johanna Sommer, Leon Hetzel

Equivariant Diffusion for Molecule Generation in 3D

Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation

Structure-based Drug Design with Equivariant Diffusion Models
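The topic mentions diffusion guidance at inference time. One widely used form is classifier-free guidance, which mixes a conditional and an unconditional noise prediction at each denoising step. The minimal sketch below shows only that mixing step on flat lists of numbers (for instance, flattened 3D atom coordinates); it assumes a trained model already supplies both predictions, and `guided_noise` is a hypothetical helper name, not part of any referenced codebase.

```python
def guided_noise(eps_uncond, eps_cond, w):
    """Classifier-free guidance for one denoising step.

    eps_hat = eps_uncond + w * (eps_cond - eps_uncond)
    w = 0 gives the unconditional prediction, w = 1 the conditional one,
    and w > 1 pushes samples further towards the condition.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same shape")
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

For example, `guided_noise([0.0, 1.0], [1.0, 3.0], 2.0)` returns `[2.0, 5.0]`. Some papers write the same rule as `(1 + w) * eps_cond - w * eps_uncond`, which only shifts the meaning of `w` by one.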

Efficient Machine Learning: Pruning, Quantization, Distillation, and More - DAML x Pruna AI

Type: Master's Thesis / Guided Research / Hiwi

Strong knowledge in machine learning
Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

The efficiency of machine learning algorithms is commonly evaluated by looking at target performance, speed, and memory footprint metrics. Reducing the costs associated with these metrics is of primary importance for real-world applications with limited resources (e.g., embedded systems, real-time predictions). In this project, you will work with the DAML research group and the Pruna AI startup to investigate ways to improve the efficiency of machine learning models using techniques such as pruning, quantization, distillation, and more.

Contact: Bertrand Charpentier

The Efficiency Misnomer
A Gradient Flow Framework for Analyzing Network Pruning
Distilling the Knowledge in a Neural Network
A Survey of Quantization Methods for Efficient Neural Network Inference
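To make two of the techniques named in this topic concrete, the sketch below applies unstructured magnitude pruning and symmetric per-tensor int8 quantization to a flat list of weights. It is a toy, framework-free illustration under those assumptions (real pipelines operate per layer or per channel on tensors); the function names are hypothetical, and distillation is not covered.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]
```

Each dequantized weight differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost traded for storing 8-bit codes instead of 32-bit floats.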

Deep Generative Models

Type: Master's Thesis / Guided Research

Strong machine learning and probability theory knowledge
Knowledge of generative models and their basics (e.g., Normalizing Flows, Diffusion Models, VAE)
Optional: Neural ODEs/SDEs, Optimal Transport, Measure Theory

With recent advances, such as Diffusion Models, Transformers, Normalizing Flows, Flow Matching, etc., the field of generative models has gained significant attention in the machine learning and artificial intelligence research community. However, many problems and questions remain open, and the application to complex data domains such as graphs, time series, point processes, and sets is often non-trivial. We are interested in supervising motivated students to explore and extend the capabilities of state-of-the-art generative models for various data domains.

Contact: Marcel Kollovieh, David Lüdke

Flow Matching for Generative Modeling
Auto-Encoding Variational Bayes
Denoising Diffusion Probabilistic Models
Structured Denoising Diffusion Models in Discrete State-Spaces

A Machine Learning Perspective on Corner Cases in Autonomous Driving Perception

Type: Master's Thesis

Industrial partner: BMW

Prerequisites:

Strong knowledge in machine learning
Knowledge of Semantic Segmentation
Good programming skills
Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

Description:

In autonomous driving, state-of-the-art deep neural networks are used for perception tasks such as semantic segmentation. While the environment in datasets is controlled, novel classes or unknown disturbances can occur in real-world applications. To provide safe autonomous driving, these cases must be identified.

The objective is to explore novel-class segmentation and out-of-distribution approaches for semantic segmentation in the context of corner cases for autonomous driving.

Contact: Sebastian Schmidt

References:

Segmenting Known Objects and Unseen Unknowns without Prior Knowledge
Efficient Uncertainty Estimation for Semantic Segmentation in Videos
Natural Posterior Network: Deep Bayesian Uncertainty for Exponential Family
Description of Corner Cases in Automated Driving: Goals and Challenges

Active Learning for Multi Agent 3D Object Detection

Type: Master's Thesis

Industrial partner: BMW

Knowledge in Object Detection
Excellent programming skills

In autonomous driving, state-of-the-art deep neural networks are used for perception tasks such as 3D object detection. To achieve good results, these networks often require large amounts of complex annotated training data. These annotations are costly and often redundant. Active learning is used to select the most informative samples for annotation, so that a dataset is covered with as little annotated data as possible.

The objective is to explore active learning approaches for 3D object detection using combined uncertainty- and diversity-based methods.
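A minimal sketch of how the two criteria can be combined, assuming a pool of unlabelled samples that each have a feature vector and a predicted class distribution: rank the samples by predictive entropy (uncertainty), keep the most uncertain candidates, then pick a spread-out subset with greedy k-center selection (diversity). The function names, the two-stage scheme and the Euclidean distance are illustrative assumptions, not the method of any of the referenced papers.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(features, probs, budget, pool_size):
    """Return up to `budget` sample indices chosen by uncertainty, then diversity.

    features: one feature vector per unlabelled sample.
    probs: one predicted class distribution per unlabelled sample.
    """
    # 1) Uncertainty: keep the `pool_size` samples with the highest entropy.
    ranked = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    pool = ranked[:pool_size]
    if budget <= 0 or not pool:
        return []
    # 2) Diversity (greedy k-center): start from the most uncertain sample and
    #    repeatedly add the pool member farthest from everything already selected.
    selected = [pool[0]]
    while len(selected) < min(budget, len(pool)):
        candidates = [i for i in pool if i not in selected]
        farthest = max(
            candidates,
            key=lambda i: min(math.dist(features[i], features[j]) for j in selected),
        )
        selected.append(farthest)
    return selected


# Hypothetical pool: two near-duplicate uncertain samples and one distant uncertain one.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 0.0)]
probs = [[0.5, 0.5], [0.5, 0.5], [0.6, 0.4], [0.9, 0.1]]
print(select_batch(features, probs, budget=2, pool_size=3))  # [0, 2]
```

Pure uncertainty sampling would pick the two near-duplicates 0 and 1; the diversity step replaces the redundant one with the distant sample 2.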

Exploring Diversity-based Active Learning for 3D Object Detection in Autonomous Driving
Efficient Uncertainty Estimation for Semantic Segmentation in Videos
KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection
Towards Open World Active Learning for 3D Object Detection

Graph Neural Networks

Type: Master's thesis / Bachelor's thesis / guided research

Knowledge of graph/network theory

Graph neural networks (GNNs) have recently achieved great success in a wide variety of applications, such as chemistry, reinforcement learning, knowledge graphs, traffic networks, or computer vision. These models leverage graph data by updating node representations based on messages passed between nodes connected by edges, or by transforming node representations using spectral graph properties. These approaches are very effective, but many theoretical aspects of these models remain unclear, and there are many possible extensions to improve GNNs and go beyond the nodes' direct neighbors and simple message aggregation.
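The message-passing idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: a fixed mixing weight stands in for learned weight matrices, there is no nonlinearity, and neighbours are aggregated by their mean; real GNN layers such as GCN or MPNN learn these transformations.

```python
def message_passing_step(node_feats, edges, self_weight=0.5):
    """One round of mean-aggregation message passing on an undirected graph.

    node_feats: one feature vector (list of floats) per node.
    edges: (u, v) index pairs; messages flow in both directions along each edge.
    self_weight: fixed mixing weight (a stand-in for learned parameters).
    """
    neighbors = [[] for _ in node_feats]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    updated = []
    for i, own in enumerate(node_feats):
        if neighbors[i]:
            # Aggregate: element-wise mean of the neighbours' current features.
            agg = [sum(node_feats[j][d] for j in neighbors[i]) / len(neighbors[i])
                   for d in range(len(own))]
        else:
            # An isolated node receives no messages and keeps its own state.
            agg = list(own)
        # Update: mix the node's own representation with the aggregated message.
        updated.append([self_weight * x + (1 - self_weight) * m for x, m in zip(own, agg)])
    return updated


# Path graph 0 - 1 - 2 with one scalar feature per node.
feats = [[1.0], [3.0], [5.0]]
print(message_passing_step(feats, [(0, 1), (1, 2)]))  # [[2.0], [3.0], [4.0]]
```

Each step only reaches direct neighbours; stacking k steps lets a node's representation depend on its k-hop neighbourhood.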

Contact: Simon Geisler

Semi-supervised classification with graph convolutional networks
Relational inductive biases, deep learning, and graph networks
Diffusion Improves Graph Learning
Weisfeiler and Leman go neural: Higher-order graph neural networks
Reliable Graph Neural Networks via Robust Aggregation

Physics-aware Graph Neural Networks

Type: Master's thesis / guided research

Proficiency with Python and deep learning frameworks (JAX or PyTorch)
Knowledge of graph neural networks (e.g. GCN, MPNN, SchNet)
Optional: Knowledge of machine learning on molecules and quantum chemistry

Deep learning models, especially graph neural networks (GNNs), have recently achieved great success in predicting quantum mechanical properties of molecules. There are many applications for these models, such as finding the best method of chemical synthesis or selecting candidates for drugs, construction materials, batteries, or solar cells. However, GNNs have only been proposed in recent years, and many open questions remain about how best to represent and leverage quantum mechanical properties and methods.

Contact: Nicholas Gao

Directional Message Passing for Molecular Graphs
Neural message passing for quantum chemistry
Learning to Simulate Complex Physics with Graph Networks
Ab initio solution of the many-electron Schrödinger equation with deep neural networks
Ab-Initio Potential Energy Surfaces by Pairing GNNs with Neural Wave Functions
Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

Robustness Verification for Deep Classifiers

Type: Master's thesis / Guided research

Strong machine learning knowledge (at least equivalent to IN2064 plus an advanced course on deep learning)
Strong background in mathematical optimization (preferably in a machine learning setting)
Proficiency with Python and deep learning frameworks (PyTorch or TensorFlow)
(Preferred) Knowledge of training techniques to obtain classifiers that are robust against small perturbations in data

Description: Recent work shows that deep classifiers suffer from adversarial examples: misclassified points that are very close to the training samples or even visually indistinguishable from them. This undesired behaviour constrains the deployment of promising neural-network-based classification methods in safety-critical scenarios. Therefore, new training methods should be proposed that promote (or preferably ensure) robust behaviour of the classifier around training samples.
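To make the notion of an adversarial example concrete, the sketch below applies the fast gradient sign method (FGSM, introduced in "Explaining and harnessing adversarial examples") to a plain logistic-regression classifier, where the input gradient has a closed form. The weights, the input point and the perturbation budget `eps` are hypothetical values chosen for illustration; for a deep classifier the gradient would come from backpropagation instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """p(y = 1 | x) for a logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """Fast gradient sign method against logistic regression.

    For binary cross-entropy, the loss gradient with respect to the input is
    (p - y) * w, so each feature moves by eps in the direction of its sign,
    which increases the loss on the true label y.
    """
    p = predict_proba(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


w, b = [1.0, -1.0], 0.0   # hypothetical trained weights
x, y = [0.2, 0.0], 1      # correctly classified: p(y=1|x) is about 0.55
x_adv = fgsm(x, y, w, b, eps=0.3)
print(x_adv, predict_proba(x_adv, w, b) < 0.5)  # the predicted label flips to 0
```

No feature moves by more than 0.3, yet the point crosses the decision boundary; certified defenses aim to rule out such flips within a given perturbation radius.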

Contact: Aleksei Kuvshinov

References (Background):

Intriguing properties of neural networks
Explaining and harnessing adversarial examples
SoK: Certified Robustness for Deep Neural Networks
Certified Adversarial Robustness via Randomized Smoothing
Formal guarantees on the robustness of a classifier against adversarial manipulation
Towards deep learning models resistant to adversarial attacks
Provable defenses against adversarial examples via the convex outer adversarial polytope
Certified defenses against adversarial examples
Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks

Uncertainty Estimation in Deep Learning

Type: Master's Thesis / Guided Research

Strong knowledge in probability theory

Safe prediction is a key feature in many intelligent systems. Classically, machine learning models compute output predictions regardless of the underlying uncertainty of the encountered situations. In contrast, aleatoric and epistemic uncertainty bring knowledge about undecidable and uncommon situations. The uncertainty view can substantially help to detect and explain unsafe predictions, and therefore make ML systems more robust. The goal of this project is to improve uncertainty estimation in ML models for various types of tasks.

Contact: Tom Wollschläger, Dominik Fuchsgruber, Bertrand Charpentier

Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
Predictive Uncertainty Estimation via Prior Networks
Posterior Network: Uncertainty Estimation without OOD samples via Density-based Pseudo-Counts
Evidential Deep Learning to Quantify Classification Uncertainty
Weight Uncertainty in Neural Networks

Hierarchies in Deep Learning

Type: Master's Thesis / Guided Research

Multi-scale structures are ubiquitous in real-life datasets. For example, phylogenetic nomenclature naturally reveals a hierarchical classification of species based on their evolutionary history. Learning multi-scale structures can help to reveal natural and meaningful organization in the data and to obtain compact data representations. The goal of this project is to leverage multi-scale structures to improve the speed, performance and understanding of deep learning models.

Contact: Marcel Kollovieh, Bertrand Charpentier

Tree Sampling Divergence: An Information-Theoretic Metric for Hierarchical Graph Clustering
Hierarchical Graph Representation Learning with Differentiable Pooling
Gradient-based Hierarchical Clustering
Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space

Open access
Published: 03 March 2022

Educational data mining: prediction of students' academic performance using machine learning algorithms

Mustafa Yağcı (ORCID: orcid.org/0000-0003-2911-3909)

Smart Learning Environments, volume 9, Article number: 11 (2022)

Educational data mining has become an effective tool for exploring the hidden relationships in educational data and predicting students' academic achievements. This study proposes a new model based on machine learning algorithms to predict the final exam grades of undergraduate students, taking their midterm exam grades as the source data. The performances of the random forests, neural networks, support vector machines, logistic regression, Naïve Bayes, and k-nearest neighbour algorithms were calculated and compared to predict the final exam grades of the students. The dataset consisted of the academic achievement grades of 1854 students who took the Turkish Language-I course at a state university in Turkey during the fall semester of 2019–2020. The results show that the proposed model achieved a classification accuracy of 70–75%. The predictions were made using only three types of parameters: midterm exam grades, Department data and Faculty data. Such data-driven studies are important for establishing a learning analytics framework in higher education and for contributing to decision-making processes. Finally, this study contributes to the early prediction of students at high risk of failure and identifies the most effective machine learning methods.

Introduction

The application of data mining methods in the field of education has attracted great attention in recent years. Data mining (DM) is the discovery of knowledge in data: the field of discovering new, potentially useful information or meaningful results from big data (Witten et al., 2011). It also aims to uncover new trends and patterns in large datasets by using different classification algorithms (Baker & Inventado, 2014).

Educational data mining (EDM) is the use of traditional DM methods to solve problems related to education (Baker & Yacef, 2009; cited in Fernandes et al., 2019). EDM applies DM methods to educational data such as student information, educational records, exam results, student participation in class, and how often students ask questions. In recent years, EDM has become an effective tool for identifying hidden patterns in educational data, predicting academic achievement, and improving the learning/teaching environment.

Learning analytics has gained a new dimension through the use of EDM (Waheed et al., 2020). Learning analytics covers collecting student information, better understanding the learning environment by examining and analysing it, and revealing the best student/teacher performance (Long & Siemens, 2011). Learning analytics is the collection, measurement and reporting of data about students and their contexts in order to understand and optimize learning and the environments in which it takes place. It also helps institutions develop new strategies.

Another dimension of learning analytics is predicting student academic performance, uncovering patterns of system access and navigational actions, and identifying students who are potentially at risk of failing (Waheed et al., 2020). Learning management systems (LMS), student information systems (SIS), intelligent teaching systems (ITS), MOOCs, and other web-based education systems leave digital data that can be examined to evaluate students' possible behaviour. Using EDM methods, these data can be employed to analyse the activities of successful students and of those at risk of failure, to develop corrective strategies based on student academic performance, and thereby to assist educators in developing pedagogical methods (Casquero et al., 2016; Fidalgo-Blanco et al., 2015).

The data collected on educational processes offer new opportunities to improve the learning experience and to optimize users' interaction with technological platforms (Shorfuzzaman et al., 2019). Processing educational data yields improvements in many areas, such as predicting student behaviour, analytical learning, and new approaches to education policy (Capuano & Toti, 2019; Viberg et al., 2018). This comprehensive collection of data will not only allow education authorities to make data-based policies, but will also form the basis of AI-based software that supports the learning process.

EDM enables educators to predict situations such as students dropping out of school or losing interest in a course, to analyse the internal factors affecting their performance, and to apply statistical techniques to predict students' academic performance. A variety of DM methods are employed to predict student performance and to identify slow learners and dropouts (Hardman et al., 2013; Kaur et al., 2015). Early prediction is a new approach in this field that uses assessment methods to support students by proposing appropriate corrective strategies and policies (Waheed et al., 2020).

Especially during the pandemic, learning management systems, quickly put into practice, have become an indispensable part of higher education. As students use these systems, the log records they produce have become ever more accessible (Macfadyen & Dawson, 2010; Kotsiantis et al., 2013; Saqr et al., 2017). Universities should now improve their capacity to use these data to predict academic success and ensure student progress (Bernacki et al., 2020).

As a result, EDM provides educators with new information by discovering hidden patterns in educational data. Using these patterns, some aspects of the education system can be evaluated and improved to ensure the quality of education.

In various studies on EDM, e-learning systems have been successfully analysed (Lara et al., 2014). Some studies have also classified educational data (Chakraborty et al., 2016), while others have tried to predict student performance (Fernandes et al., 2019).

Asif et al. (2017) focused on two aspects of the performance of undergraduate students using DM methods. The first aspect is predicting the academic achievement of students at the end of a four-year study program. The second is examining students' progression and combining it with the predictive results. They divided the students into two groups: low achievement and high achievement. They found that it is important for educators to focus on a small number of courses that indicate particularly good or poor performance, in order to offer timely warnings, support underperforming students, and offer advice and opportunities to high-performing students. Cruz-Jesus et al. (2020) predicted student academic performance with 16 demographic features such as age, gender, class attendance, internet access, computer possession, and the number of courses taken. Random forest, logistic regression, k-nearest neighbours and support vector machines, which are among the machine learning methods, were able to predict students’ performance with accuracy ranging from 50 to 81%.

Fernandes et al. (2019) developed a model with the demographic characteristics of the students and the achievement grades obtained from in-term activities. In that study, students' academic achievement was predicted with classification models based on Gradient Boosting Machine (GBM). The results showed that the best features for estimating achievement scores were the previous year's achievement scores and absences. The authors found that demographic characteristics such as neighbourhood, school and age information were also potential indicators of success or failure. In addition, they argued that this model could guide the development of new policies to prevent failure. Similarly, using the student data requested during registration and environmental factors, Hoffait and Schyns (2017) identified the students with the potential to fail. They found that students with potential difficulties could be classified more precisely by using DM methods. Moreover, their approach makes it possible to rank students by level of risk. Rebai et al. (2020) proposed a machine learning-based model to identify the key factors affecting the academic performance of schools and to determine the relationships between these factors. They concluded from the regression trees that the most important factors associated with higher performance were school size, competition, class size, parental pressure, and gender proportions. In addition, according to the random forest results, school size and the percentage of girls had a powerful impact on the predictive accuracy of the model.

Ahmad and Shahzadi (2018) proposed a machine learning-based model to determine whether students were at risk regarding their academic performance. Using the students' learning skills, study habits, and academic interaction features, they made predictions with a classification accuracy of 85%. The researchers concluded that the proposed model could be used to identify academically unsuccessful students. Musso et al. (2020) proposed a machine learning model based on learning strategies, perception of social support, motivation, socio-demographics, health condition, and academic performance characteristics. With this model, they predicted academic performance and dropouts. They concluded that the predictive variable with the greatest effect on predicting GPA was learning strategies, while the variable with the greatest effect on identifying dropouts was background information.

Waheed et al. (2020) designed an artificial neural network model on students' records of their navigation through the LMS. The results showed that demographics and student clickstream activities had a significant impact on student performance. Students who navigated through courses performed better. Students' participation in the learning environment was not related to their performance. However, they concluded that the deep learning model could be an important tool for the early prediction of student performance. Xu et al. (2019) determined the relationship between the internet usage behaviours of university students and their academic performance, and predicted students’ performance with machine learning methods. Their model predicted students' academic performance with a high level of accuracy. The results suggested that internet connection frequency features were positively correlated with academic performance, whereas internet traffic volume features were negatively correlated with it. In addition, they concluded that internet usage features played an important role in students' academic performance. Bernacki et al. (2020) investigated whether the log records in the learning management system alone would be sufficient to predict achievement. They concluded that the behaviour-based prediction model successfully predicted 75% of those who would need to repeat a course. They also stated that, with this model, students who might be unsuccessful in subsequent semesters could be identified and supported. Burgos et al. (2018) predicted the achievement grades that students might get in subsequent semesters and designed a tool for students who were likely to fail. They found that the number of unsuccessful students decreased by 14% compared to previous years. A comparative analysis of studies predicting academic achievement grades using machine learning methods is given in Table 1.

A review of previous research that aimed to predict academic achievement indicates that researchers have applied a range of machine learning algorithms, including multiple, probit and logistic regression, neural networks, and C4.5 and J48 decision trees. More recent studies have also used random forests (Zabriskie et al., 2019), genetic programming (Xing et al., 2015), and Naive Bayes algorithms (Ornelas & Ordonez, 2017). These models have reported high prediction accuracy.

Accurately predicting student academic performance requires a deep understanding of the factors and features that affect student results (Alshanqiti & Namoun, 2020). For this purpose, Hellas et al. (2018) reviewed 357 articles on student performance detailing the impact of 29 features. These features were mainly related to psychomotor skills such as course and pre-course performance, student participation, student demographics such as gender, high school performance, and self-regulation. In contrast, dropout rates were mainly influenced by student motivation, habits, social and financial issues, lack of progress, and career transitions.

The literature review suggests that it is necessary to improve the quality of education by predicting the academic performance of students and supporting those in the risk group. In the literature, academic performance has been predicted from many different variables: digital traces left by students on the internet (browsing, lesson time, percentage of participation) (Fernandes et al., 2019; Rubin et al., 2010; Waheed et al., 2020; Xu et al., 2019); students' demographic characteristics (gender, age, economic status, number of courses attended, internet access, etc.) (Bernacki et al., 2020; Rizvi et al., 2019; García-González & Skrita, 2019; Rebai et al., 2020; Cruz-Jesus et al., 2020; Aydemir, 2017); learning skills, study approaches, and study habits (Ahmad & Shahzadi, 2018); learning strategies, social support perception, motivation, socio-demography, health, and academic performance characteristics (Costa-Mendes et al., 2020; Gök, 2017; Kılınç, 2015; Musso et al., 2020); and homework, projects, and quizzes (Kardaş & Güvenir, 2020). In almost all models developed in such studies, prediction accuracy ranges from 70 to 95%. However, collecting and processing such a variety of data is time-consuming and requires expert knowledge. Similarly, Hoffait and Schyns (2017) suggested that collecting so much data is difficult and that socio-economic data are unnecessary. Moreover, demographic or socio-economic data may not always indicate how to prevent failure (Bernacki et al., 2020).

This study concerns predicting students’ academic achievement using grades only, without demographic characteristics or socio-economic data. It aimed to develop a new model based on machine learning algorithms to predict the final exam grades of undergraduate students from their midterm exam grades, Faculty and Department.

For this purpose, the machine learning classification algorithms with the highest performance in predicting students’ academic achievement were identified. The Turkish Language-I course was chosen because it is a compulsory course that all students enrolled in the university must take. Using this model, students’ final exam grades were predicted. Such models will enable the development of pedagogical interventions and new policies to improve students' academic performance. In this way, the number of potentially unsuccessful students can be reduced through the assessments made after each midterm.

This section describes the details of the dataset, pre-processing techniques, and machine learning algorithms employed in this study.

Educational institutions regularly store all available data about students electronically, in databases for processing. These data vary in type and volume, from students’ demographics to their academic achievements. In this study, the data were taken from the Student Information System (SIS), where all student records are stored, at a state university in Turkey. From these records, the midterm exam grades, final exam grades, Faculty, and Department of 1854 students who took the Turkish Language-I course in the 2019–2020 fall semester were selected as the dataset. Table 2 shows the distribution of students according to academic unit. The dataset is provided as Additional file 1.

Midterm and final exam grades range from 0 to 100. In this system, the end-of-semester achievement grade is calculated as 40% of the midterm exam grade plus 60% of the final exam grade. Students with an achievement grade below 60 are unsuccessful and those above 60 are successful. The midterm exam is usually held in the middle of the academic semester and the final exam at the end of the semester, approximately 9 weeks (2.5 months) apart. Thanks to the final exam predictions, there is therefore a two-and-a-half-month window for corrective action for students at risk of failing. In other words, the study investigated how strongly a student's performance in the middle of the semester determines their performance at the end of the semester.
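The grading rule is simple arithmetic and can be written down directly. The sketch below also inverts it to give the final exam grade a student needs after the midterm, which is the quantity a corrective intervention would target. It assumes that a grade of exactly 60 counts as a pass, since the text only specifies grades below and above 60; the function names are illustrative.

```python
MIDTERM_WEIGHT, FINAL_WEIGHT = 0.4, 0.6
PASS_THRESHOLD = 60.0


def achievement_grade(midterm, final):
    """End-of-semester grade from midterm and final exam grades (both 0-100)."""
    return MIDTERM_WEIGHT * midterm + FINAL_WEIGHT * final


def is_successful(midterm, final):
    # Assumption: exactly 60 counts as a pass (the text only defines < 60 and > 60).
    return achievement_grade(midterm, final) >= PASS_THRESHOLD


def min_final_needed(midterm):
    """Lowest final exam grade that brings the achievement grade up to the threshold."""
    return max(0.0, (PASS_THRESHOLD - MIDTERM_WEIGHT * midterm) / FINAL_WEIGHT)


print(achievement_grade(50, 70))  # 62.0
print(min_final_needed(30))       # about 80
```

For example, a student with a midterm grade of 30 needs at least 80 on the final, since 0.4 × 30 + 0.6 × 80 = 60.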

Data identification and collection

At this phase, it is determined from which source the data will be collected, which features of the data will be used, and whether the collected data are suitable for the purpose. Feature selection involves reducing the number of variables used to predict a particular outcome. Its goals are to facilitate the interpretability of the model, reduce complexity, increase the computational efficiency of the algorithms, and avoid overfitting.

Establishing DM model and implementation of algorithm

Random forest (RF), neural network (NN), logistic regression (LR), support vector machine (SVM), Naïve Bayes (NB) and k-nearest neighbour (kNN) algorithms were employed to predict students' academic performance. The prediction accuracy was evaluated using tenfold cross-validation. The DM process serves two main purposes. The first is to make predictions by analysing the data in the database (predictive model). The second is to describe behaviours (descriptive model). In predictive models, a model is created from data with known outcomes; this model is then used to predict the outcomes for datasets whose outcomes are unknown. In descriptive models, the patterns in the existing data are described in order to support decisions.
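The tenfold cross-validation protocol can be sketched independently of Orange. This illustration assumes the samples are already shuffled, uses a simple round-robin fold assignment (not stratified), and plugs in a k-nearest-neighbour classifier on numeric features as one of the six compared methods; Orange's own implementation and settings may differ, and the toy data are hypothetical.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training samples closest to x (squared Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def cross_val_accuracy(X, y, predict, folds=10):
    """Pooled accuracy of k-fold cross-validation; every sample is tested exactly once.

    Sample i goes to fold i % folds, so X and y are assumed to be shuffled already.
    """
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        # Train on nine folds (kNN simply stores them) and score the held-out fold.
        correct += sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical one-feature data: label 0 below 10, label 1 from 10 upwards.
X = [[float(i)] for i in range(20)]
y = [0 if i < 10 else 1 for i in range(20)]
print(cross_val_accuracy(X, y, knn_predict))  # 0.95
```

Only the sample right at the class boundary (value 10) is misclassified, because its held-out fold leaves two of its three nearest neighbours in the other class. Any of the other classifiers can be compared under the same protocol by passing a different `predict` function.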

When the focus is on analysing the causes of success or failure, statistical methods such as logistic regression and time series can be employed (Ortiz & Dehon, 2008; Arias Ortiz & Dehon, 2013). However, when the focus is on forecasting, neural networks (Delen, 2010; Vandamme et al., 2007), support vector machines (Huang & Fang, 2013), decision trees (Delen, 2011; Nandeshwar et al., 2011) and random forests (Delen, 2010; Vandamme et al., 2007) are more efficient and give more accurate results. Statistical techniques aim to create a model that can successfully predict output values from the available input data. Machine learning methods, on the other hand, automatically build a model that maps the input data to the expected target values when given a supervised optimization problem.

The performance of the model was measured with confusion-matrix-based indicators. The literature indicates that no single classifier works best for all prediction problems. Therefore, it is necessary to investigate which classifiers are most suitable for the data being analysed (Asif et al., 2017).

Experiments and results

The entire experimental phase was performed with the Orange machine learning software. Orange is a powerful, easy-to-use, component-based DM tool for expert data scientists as well as data science beginners. In Orange, data analysis is done by stacking widgets into workflows. Each widget performs a data retrieval, data pre-processing, visualization, modelling, or evaluation task. A workflow is a sequence of actions performed on the platform to accomplish a specific task. Comprehensive data analysis charts can be created by combining different components in a workflow. Figure 1 shows the designed workflow.

The workflow of the designed model

The dataset included the midterm exam grades, final exam grades, Faculty, and Department of 1854 students taking the Turkish Language-I course in the 2019–2020 fall semester. The entire dataset is provided as Additional file 1. Table 3 shows part of the dataset.

In the dataset, students' midterm exam grades, final exam grades, faculty, and department information were used as features. Each record contains the data of one student. The midterm and final exam grade variables were explained under the heading "dataset". The faculty variable represents the faculties of Kırşehir Ahi Evran University and the department variable represents the departments within those faculties. In developing the model, the midterm, faculty, and department information were used as the independent variables and the final exam grade as the dependent variable. Table 4 shows the variable model.

After the variable model was determined, the midterm and final exam grades were categorized using equal-width discretization. Table 5 shows the criteria used to convert the midterm and final exam grades into categorical format.
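Equal-width discretization splits the observed grade range into intervals of identical width. A minimal sketch follows. The cut points reported later for the confusion matrices (32.5, 55 and 77.5) are 22.5 apart, which is what four equal-width bins over [10, 100] would produce; those bounds are used below purely for illustration, since the actual minimum and maximum grades are not stated here. The sketch also assumes that a value falling exactly on a cut point belongs to the upper interval, which may not match Orange's convention.

```python
from bisect import bisect_right


def equal_width_edges(lo, hi, bins=4):
    """Boundaries that split [lo, hi] into `bins` intervals of equal width."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def discretize(value, edges):
    """Interval index for `value`; a value on an inner edge goes to the upper interval."""
    return bisect_right(edges[1:-1], value)


edges = equal_width_edges(10, 100)  # illustrative bounds (assumed, see above)
print(edges)                        # [10.0, 32.5, 55.0, 77.5, 100.0]
print([discretize(g, edges) for g in (12, 40, 55, 90)])  # [0, 1, 2, 3]
```

With these edges, a grade of 40 falls into bin 1 (32.5–55) and a grade of 55 into bin 2 (55–77.5).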

In Table 6, the values in the final column are the actual values, and the values in the RF, SVM, LR, kNN, NB, and NN columns are those predicted by the proposed model. For example, according to Table 5, std1’s actual final grade was in the range 55 to 77. While the RF, SVM, LR, NB, and NN models predicted a value in the same range (55 to 77), the kNN model predicted a value greater than 77.

Evaluation of the model performance

The performance of the model was evaluated with the confusion matrix, classification accuracy (CA), precision, recall, F-score (F1), and area under the ROC curve (AUC) metrics.

Confusion matrix

The confusion matrix shows the actual class distribution in the dataset and the numbers of correct and incorrect predictions of the model. Table 7 shows the confusion matrix. The performance of the model is calculated from the numbers of correctly and incorrectly classified instances. The rows show the actual classes of the samples in the test set, and the columns show the classes predicted by the model.

In Table 7, true positive (TP) and true negative (TN) show the number of correctly classified instances. False positive (FP) shows the number of instances predicted as 1 (positive) that actually belong to class 0 (negative). False negative (FN) shows the number of instances predicted as 0 (negative) that actually belong to class 1 (positive).

Table 8 shows the confusion matrix for the RF algorithm. In this 4 × 4 confusion matrix, the main diagonal shows the percentage of correctly predicted instances, and the off-diagonal elements show the percentage of incorrect predictions.

Table 8 shows that 84.9% of the students with an actual final grade greater than 77.5, 71.2% of those in the range 55–77.5, 65.4% of those in the range 32.5–55, and 60% of those below 32.5 were predicted correctly. The confusion matrices of the other algorithms are shown in Tables 9, 10, 11, 12, and 13.
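Row percentages like those reported for Table 8 can be derived from raw counts. The sketch below builds a confusion matrix with actual classes as rows and predicted classes as columns, then normalizes each row, so each diagonal entry becomes the percentage of that actual class predicted correctly (its recall). The toy predictions are hypothetical.

```python
def confusion_matrix(actual, predicted, labels):
    """Counts with actual classes as rows and predicted classes as columns."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def row_percentages(matrix):
    """Express each row as percentages of its actual class; the diagonal is then per-class recall."""
    return [[100.0 * c / sum(row) if sum(row) else 0.0 for c in row] for row in matrix]


labels = ["<32.5", "32.5-55", "55-77.5", ">77.5"]
actual = [">77.5", ">77.5", "55-77.5", "55-77.5", "32.5-55", "<32.5"]
predicted = [">77.5", "55-77.5", "55-77.5", "55-77.5", "55-77.5", "<32.5"]
cm = confusion_matrix(actual, predicted, labels)
print(cm)  # [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 2, 0], [0, 0, 1, 1]]
print(row_percentages(cm))
```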

Classification accuracy: CA is the ratio of the correct predictions (TP + TN) to the total number of instances (TP + TN + FP + FN).

Precision: Precision is the ratio of the number of correctly classified positive instances to the total number of instances predicted as positive. It takes values in the range [0, 1].

Recall: Recall is the ratio of the number of correctly classified positive instances to the number of all instances whose actual class is positive. Recall is also called the true positive rate. It takes values in the range [0, 1].
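For reference, the three measures defined above can be written in terms of the confusion-matrix counts (standard definitions, consistent with the text):

```latex
\mathrm{CA} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
```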

F-Criterion (F1): Precision and recall usually trade off against each other, so raising one tends to lower the other. Therefore, their harmonic mean is used to combine both into a single measure. This is called the F-criterion (F1 score).
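The sketch below computes all four measures from binary confusion-matrix counts, using F1 = 2 * Precision * Recall / (Precision + Recall). The function name and example counts are hypothetical. It covers only the binary case; for a four-band problem like this one, such metrics are commonly computed per class (one-vs-rest) and then averaged, and the exact averaging behind the reported values is not assumed here.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute CA, precision, recall and F1 from binary confusion-matrix counts.

    A ratio whose denominator is zero is reported as 0.0 (a convention chosen
    for this sketch).
    """
    total = tp + tn + fp + fn
    ca = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (true positive rate): share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"CA": ca, "precision": precision, "recall": recall, "F1": f1}


# Hypothetical counts: precision 0.75, recall 6/7, F1 0.8.
print(classification_metrics(tp=6, tn=2, fp=2, fn=1))
```

Because F1 is a harmonic mean, it stays close to the smaller of precision and recall, which is why it penalizes a model that does well on only one of them.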

Receiver operating characteristic (ROC) curve

The ROC curve plots the true positive rate (recall) against the false positive rate as the decision threshold of a classifier is varied. The area under this curve (AUC-ROC) is a widely used metric for evaluating the performance of machine learning algorithms, especially on imbalanced datasets, and summarizes how well the model distinguishes between the classes.
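As an illustration of how the ROC curve is traced, the sketch below sweeps the decision threshold over a classifier's scores for a binary task and records the (false positive rate, true positive rate) pair at each step. The function name, labels, and scores are hypothetical, and treating a score at or above the threshold as a positive prediction is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold
    over every distinct score, from strictest to most lenient."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative instance")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        # Predict positive when score >= t, then count hits and false alarms.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical binary labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_points(labels, scores))
```

The curve always starts at (0, 0), where nothing is predicted positive, and ends at (1, 1), where everything is.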

AUC: Area under the ROC curve

The larger the area, the better the algorithm distinguishes between the given classes. The ideal AUC value is 1, while a value of 0.5 corresponds to random guessing. The AUC, classification accuracy (CA), F-criterion (F1), precision, and recall values of the models are shown in Table 14.
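One way to compute the AUC without drawing the curve uses the fact that it equals the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one, with ties counting one half. The sketch below applies this pairwise definition to hypothetical binary scores; the function name is hypothetical, and how AUC was aggregated across the four grade bands in Table 14 is not assumed here.

```python
from typing import Sequence


def auc_by_ranking(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive instance gets a higher
    score than a random negative one (ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # pair ordered correctly
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 8 of the 9 positive/negative pairs are ordered correctly.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_by_ranking(labels, scores))  # 8/9
```

The double loop is quadratic in the number of instances; a sort-based rank computation scales better, but the pairwise version mirrors the definition most directly.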

The AUC values of the RF, NN, SVM, LR, NB, and kNN algorithms were 0.860, 0.863, 0.804, 0.826, 0.810, and 0.810, respectively. Their classification accuracies were 0.746, 0.746, 0.735, 0.717, 0.713, and 0.699, respectively. For example, the RF algorithm achieved 74.6% accuracy; that is, 74.6% of the test samples were assigned to the correct final-grade range.

Discussion and conclusion

This study proposes a new model based on machine learning algorithms to predict the final exam grades of undergraduate students, using their midterm exam grades as the source data. The performance of the random forests, neural networks, support vector machines, logistic regression, naïve Bayes, and k-nearest neighbour algorithms was calculated and compared for predicting students' final exam grades. The study focused on two aspects: first, predicting academic performance from previous achievement grades, and second, comparing the performance indicators of the machine learning algorithms.

The results show that the proposed model achieved a classification accuracy of 70–75%. This suggests that students' midterm exam grades are an important predictor of their final exam grades. All six algorithms (RF, NN, SVM, LR, NB, and kNN) reached accuracies in this range and can be used to predict students' final exam grades. Furthermore, the predictions were made using only three types of parameters: midterm exam grades, department data, and faculty data.

The results of this study were compared with those of studies that predicted students' academic achievement from various demographic and socio-economic variables. Hoffait and Schyns (2017) proposed a model that uses students' academic achievement in previous years to predict whether they will succeed in the courses they take in the new semester. They found that 12.2% of the students had a very high risk of failure, with a 90% confidence rate. Waheed et al. (2020) predicted students' achievement from demographic and geographic characteristics, found that these characteristics have a significant effect on academic performance, and predicted students' failure or success with 85% accuracy. Xu et al. (2019) found that internet usage data can distinguish and predict students' academic performance. Costa-Mendes et al. (2020) and Cruz-Jesus et al. (2020) predicted students' academic achievement from income, age, employment, cultural level indicators, place of residence, and socio-economic information. Similarly, Babić (2017) predicted students' performance with an accuracy of 65% to 100% using artificial neural networks, classification trees, and support vector machines.

Another result of this study was that the RF, NN, and SVM algorithms had the highest classification accuracy, while kNN had the lowest. This suggests that RF, NN, and SVM yield more accurate results when machine learning is used to predict students' academic achievement grades. These results were compared with those of studies that used machine learning algorithms to predict academic performance from various variables. For example, Hoffait and Schyns (2017) compared the performance of the LR, ANN, and RF algorithms in identifying students at high risk of academic failure based on various demographic characteristics. They ranked the algorithms from highest to lowest accuracy as LR, ANN, and RF. On the other hand, Waheed et al. (2020) found that the SVM algorithm outperformed the LR algorithm. According to Xu et al. (2019), SVM had the highest performance, followed by NN, while the decision tree had the lowest.

The proposed model predicted students' final exam grades with 73% accuracy. This suggests that academic achievement can be predicted with this model in the future. Predicting students' achievement grades in advance gives them the opportunity to review their study methods and improve their performance. The value of the proposed method is clearer considering that there are approximately 2.5 months between the midterm and final exams in higher education. Similarly, Bernacki et al. (2020) worked on an early warning model. They proposed a model that predicts students' academic achievement from their behavior data in the learning management system before the first exam. Their algorithm correctly identified 75% of the students who failed to earn the grade of B or better needed to advance to the next course. Ahmad and Shahzadi (2018) predicted students at risk of poor academic performance with 85% accuracy by evaluating their study habits, learning skills, and academic interaction features. Cruz-Jesus et al. (2020) predicted students' end-of-semester grades with 16 independent variables and concluded that students could be given the opportunity of early intervention.

In summary, students' academic performance has been predicted using different predictors, algorithms, and approaches. The results confirm that machine learning algorithms can be used to predict students' academic performance. More importantly, in this study the prediction was made only with the midterm grade, faculty, and department parameters. Teaching staff can use these results to recognize early the students who have below- or above-average academic motivation. Then, as Babić (2017) points out, they can pair students with below-average academic motivation with students with above-average academic motivation and encourage them to work in groups or on projects. In this way, students' motivation can be improved and their active participation in learning ensured. In addition, such data-driven studies can help higher education institutions establish a learning analytics framework and contribute to decision-making processes.

Future research could include other parameters as input variables and add other machine learning algorithms to the modelling process. In addition, data mining (DM) methods should be harnessed to investigate students' learning behaviors, address their problems, optimize the educational environment, and enable data-driven decision making.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

Educational data mining

Random forests

Neural networks

Support vector machines

Logistic regression

Naïve Bayes

K-nearest neighbour

Decision trees

Artificial neural networks

Extremely randomized trees

Regression trees

Multilayer perceptron neural network

Feed-forward neural network

Adaptive resonance theory mapping

Learning management systems

Student information systems

Intelligent teaching systems

Classification accuracy

Area under the ROC curve

True positive

True negative

False positive

False negative

Receiver operating characteristic

Ahmad, Z., & Shahzadi, E. (2018). Prediction of students’ academic performance using artificial neural network. Bulletin of Education and Research, 40(3), 157–164.

Alshanqiti, A., & Namoun, A. (2020). Predicting student performance and its influential factors using hybrid regression and multi-label classification. IEEE Access, 8, 203827–203844. https://doi.org/10.1109/access.2020.3036572

Arias Ortiz, E., & Dehon, C. (2013). Roads to success in the Belgian French Community’s higher education system: predictors of dropout and degree completion at the Université Libre de Bruxelles. Research in Higher Education, 54(6), 693–723. https://doi.org/10.1007/s11162-013-9290-y

Asif, R., Merceron, A., Ali, S. A., & Haider, N. G. (2017). Analyzing undergraduate students’ performance using educational data mining. Computers & Education, 113, 177–194. https://doi.org/10.1016/j.compedu.2017.05.007

Aydemir, B. (2017). Predicting academic success of vocational high school students using data mining methods [Unpublished master’s thesis]. Pamukkale University Institute of Science.

Babić, I. D. (2017). Machine learning methods in predicting the student academic motivation. Croatian Operational Research Review, 8(2), 443–461. https://doi.org/10.17535/crorr.2017.0028

Baker, R. S., & Inventado, P. S. (2014). Educational data mining and learning analytics. In Learning analytics (pp. 61–75). Springer.

Baker, R. S., & Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining, 1(1), 3–17.

Bernacki, M. L., Chavez, M. M., & Uesbeck, P. M. (2020). Predicting achievement and providing support before STEM majors begin to fail. Computers & Education, 158, 103999. https://doi.org/10.1016/j.compedu.2020.103999

Burgos, C., Campanario, M. L., De, D., Lara, J. A., Lizcano, D., & Martínez, M. A. (2018). Data mining for modeling students’ performance: A tutoring action plan to prevent academic dropout. Computers and Electrical Engineering, 66, 541–556. https://doi.org/10.1016/j.compeleceng.2017.03.005

Capuano, N., & Toti, D. (2019). Experimentation of a smart learning system for law based on knowledge discovery and cognitive computing. Computers in Human Behavior, 92, 459–467. https://doi.org/10.1016/j.chb.2018.03.034

Casquero, O., Ovelar, R., Romo, J., Benito, M., & Alberdi, M. (2016). Students’ personal networks in virtual and personal learning environments: A case study in higher education using learning analytics approach. Interactive Learning Environments, 24(1), 49–67. https://doi.org/10.1080/10494820.2013.817441

Chakraborty, B., Chakma, K., & Mukherjee, A. (2016). A density-based clustering algorithm and experiments on student dataset with noises using Rough set theory. In Proceedings of 2nd IEEE international conference on engineering and technology, ICETECH 2016, March (pp. 431–436). https://doi.org/10.1109/ICETECH.2016.7569290

Costa-Mendes, R., Oliveira, T., Castelli, M., & Cruz-Jesus, F. (2020). A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach. Education and Information Technologies, 26, 1527–1547. https://doi.org/10.1007/s10639-020-10316-y

Cruz-Jesus, F., Castelli, M., Oliveira, T., Mendes, R., Nunes, C., Sa-Velho, M., & Rosa-Louro, A. (2020). Using artificial intelligence methods to assess academic achievement in public high schools of a European Union country. Heliyon. https://doi.org/10.1016/j.heliyon.2020.e04081

Delen, D. (2010). A comparative analysis of machine learning techniques for student retention management. Decision Support Systems, 49(4), 498–506. https://doi.org/10.1016/j.dss.2010.06.003

Delen, D. (2011). Predicting student attrition with data mining methods. Journal of College Student Retention: Research, Theory and Practice, 13(1), 17–35. https://doi.org/10.2190/CS.13.1.b

Fernandes, E., Holanda, M., Victorino, M., Borges, V., Carvalho, R., & Van Erven, G. (2019). Educational data mining: Predictive analysis of academic performance of public school students in the capital of Brazil. Journal of Business Research, 94, 335–343. https://doi.org/10.1016/j.jbusres.2018.02.012

Fidalgo-Blanco, Á., Sein-Echaluce, M. L., García-Peñalvo, F. J., & Conde, M. Á. (2015). Using Learning Analytics to improve teamwork assessment. Computers in Human Behavior, 47, 149–156. https://doi.org/10.1016/j.chb.2014.11.050

García-González, J. D., & Skrita, A. (2019). Predicting academic performance based on students’ family environment: Evidence for Colombia using classification trees. Psychology, Society and Education, 11(3), 299–311. https://doi.org/10.25115/psye.v11i3.2056

Gök, M. (2017). Predicting academic achievement with machine learning methods. Gazi University Journal of Science Part C: Design and Technology, 5(3), 139–148.

Hardman, J., Paucar-Caceres, A., & Fielding, A. (2013). Predicting students’ progression in higher education by using the random forest algorithm. Systems Research and Behavioral Science, 30(2), 194–203. https://doi.org/10.1002/sres.2130

Hellas, A., Ihantola, P., Petersen, A., Ajanovski, V. V., Gutica, M., Hynninen, T., Knutas, A., Leinonen, J., Messom, C., & Liao, S. N. (2018). Predicting academic performance: a systematic literature review. In Proceedings companion of the 23rd annual ACM conference on innovation and technology in computer science education (pp. 175–199).

Hoffait, A., & Schyns, M. (2017). Early detection of university students with potential difficulties. Decision Support Systems, 101, 1–11. https://doi.org/10.1016/j.dss.2017.05.003

Huang, S., & Fang, N. (2013). Predicting student academic performance in an engineering dynamics course: A comparison of four types of predictive mathematical models. Computers & Education, 61(1), 133–145. https://doi.org/10.1016/j.compedu.2012.08.015

Kardaş, K., & Güvenir, A. (2020). Analysis of the effects of Quizzes, homeworks and projects on final exam with different machine learning techniques. EMO Journal of Scientific, 10(1), 22–29.

Kaur, P., Singh, M., & Josan, G. S. (2015). Classification and prediction based data mining algorithms to predict slow learners in education sector. Procedia Computer Science, 57, 500–508. https://doi.org/10.1016/j.procs.2015.07.372

Kılınç, Ç. (2015). Examining the effects on university student success by data mining techniques [Unpublished master’s thesis]. Eskişehir Osmangazi University Institute of Science.

Kotsiantis, S., Tselios, N., Filippidi, A., & Komis, V. (2013). Using learning analytics to identify successful learners in a blended learning course. International Journal of Technology Enhanced Learning, 5(2), 133–150. https://doi.org/10.1504/IJTEL.2013.059088

Lara, J. A., Lizcano, D., Martínez, M. A., Pazos, J., & Riera, T. (2014). A system for knowledge discovery in e-learning environments within the European Higher Education Area: Application to student data from Open University of Madrid, UDIMA. Computers & Education, 72, 23–36. https://doi.org/10.1016/j.compedu.2013.10.009

Long, P., & Siemens, G. (2011). Penetrating the fog: Analytics in learning and education. Educause Review, 46(5), 31–40.

Macfadyen, L. P., & Dawson, S. (2010). Mining LMS data to develop an “early warning system” for educators: A proof of concept. Computers & Education, 54(2), 588–599. https://doi.org/10.1016/j.compedu.2009.09.008

Musso, M. F., Hernández, C. F. R., & Cascallar, E. C. (2020). Predicting key educational outcomes in academic trajectories: A machine-learning approach. Higher Education, 80(5), 875–894. https://doi.org/10.1007/s10734-020-00520-7

Nandeshwar, A., Menzies, T., & Nelson, A. (2011). Learning patterns of university student retention. Expert Systems with Applications, 38(12), 14984–14996. https://doi.org/10.1016/j.eswa.2011.05.048

Ornelas, F., & Ordonez, C. (2017). Predicting student success: A naïve bayesian application to community college data. Technology, Knowledge and Learning, 22(3), 299–315. https://doi.org/10.1007/s10758-017-9334-z

Ortiz, E. A., & Dehon, C. (2008). What are the factors of success at University? A case study in Belgium. Cesifo Economic Studies, 54(2), 121–148. https://doi.org/10.1093/cesifo/ifn012

Rebai, S., Ben Yahia, F., & Essid, H. (2020). A graphically based machine learning approach to predict secondary schools performance in Tunisia. Socio-Economic Planning Sciences, 70, 100724. https://doi.org/10.1016/j.seps.2019.06.009

Rizvi, S., Rienties, B., & Ahmed, S. (2019). The role of demographics in online learning; A decision tree based approach. Computers & Education, 137, 32–47. https://doi.org/10.1016/j.compedu.2019.04.001

Rubin, B., Fernandes, R., Avgerinou, M. D., & Moore, J. (2010). The effect of learning management systems on student and faculty outcomes. The Internet and Higher Education, 13(1–2), 82–83. https://doi.org/10.1016/j.iheduc.2009.10.008

Saqr, M., Fors, U., & Tedre, M. (2017). How learning analytics can early predict under-achieving students in a blended medical education course. Medical Teacher, 39(7), 757–767. https://doi.org/10.1080/0142159X.2017.1309376

Shorfuzzaman, M., Hossain, M. S., Nazir, A., Muhammad, G., & Alamri, A. (2019). Harnessing the power of big data analytics in the cloud to support learning analytics in mobile learning environment. Computers in Human Behavior, 92, 578–588. https://doi.org/10.1016/j.chb.2018.07.002

Vandamme, J.-P., Meskens, N., & Superby, J.-F. (2007). Predicting academic performance by data mining methods. Education Economics, 15(4), 405–419. https://doi.org/10.1080/09645290701409939

Viberg, O., Hatakka, M., Bälter, O., & Mavroudi, A. (2018). The current landscape of learning analytics in higher education. Computers in Human Behavior, 89, 98–110. https://doi.org/10.1016/j.chb.2018.07.027

Waheed, H., Hassan, S. U., Aljohani, N. R., Hardman, J., Alelyani, S., & Nawaz, R. (2020). Predicting academic performance of students from VLE big data using deep learning models. Computers in Human Behavior, 104, 106189. https://doi.org/10.1016/j.chb.2019.106189

Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining: Practical machine learning tools and techniques (3rd ed.). Morgan Kaufmann.

Xing, W., Guo, R., Petakovic, E., & Goggins, S. (2015). Participation-based student final performance prediction model through interpretable Genetic Programming: Integrating learning analytics, educational data mining and theory. Computers in Human Behavior, 47, 168–181.

Xu, X., Wang, J., Peng, H., & Wu, R. (2019). Prediction of academic performance associated with internet usage behaviors using machine learning algorithms. Computers in Human Behavior, 98, 166–173. https://doi.org/10.1016/j.chb.2019.04.015

Zabriskie, C., Yang, J., DeVore, S., & Stewart, J. (2019). Using machine learning to predict physics course outcomes. Physical Review Physics Education Research, 15(2), 020120. https://doi.org/10.1103/PhysRevPhysEducRes.15.020120


Acknowledgements

Not applicable.

Author information

Authors and affiliations.

Kırşehir Ahi Evran University, Faculty of Engineering and Architecture, 40100, Kırşehir, Turkey

Mustafa Yağcı


Contributions

All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mustafa Yağcı.

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Yağcı, M. Educational data mining: prediction of students' academic performance using machine learning algorithms. Smart Learn. Environ. 9, 11 (2022). https://doi.org/10.1186/s40561-022-00192-z


Received : 15 November 2021

Accepted : 15 February 2022

Published : 03 March 2022

DOI : https://doi.org/10.1186/s40561-022-00192-z


Machine learning
Predicting achievement
Learning analytics
Early warning systems

50 selected papers in Data Mining and Machine Learning

Here is the list of 50 selected papers in Data Mining and Machine Learning. You can download them for your detailed reading and research. Enjoy!

Data Mining and Statistics: What’s the Connection?

Data Mining: Statistics and More?, D. Hand, American Statistician, 52(2):112-118.

Data Mining, G. Weiss and B. Davison, in Handbook of Technology Management, John Wiley and Sons, expected 2010.

From Data Mining to Knowledge Discovery in Databases, U. Fayyad, G. Piatetsky-Shapiro & P. Smyth, AI Magazine, 17(3):37-54, Fall 1996.

Mining Business Databases, Communications of the ACM, 39(11): 42-48.

10 Challenging Problems in Data Mining Research, Q. Yang and X. Wu, International Journal of Information Technology & Decision Making, Vol. 5, No. 4, 2006, 597-604.

The Long Tail, by Anderson, C., Wired magazine.

AOL’s Disturbing Glimpse Into Users’ Lives, by McCullagh, D., News.com, August 9, 2006

General Data Mining Methods and Algorithms

Top 10 Algorithms in Data Mining, X. Wu, V. Kumar, J.R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G.J. McLachlan, A. Ng, B. Liu, P.S. Yu, Z. Zhou, M. Steinbach, D. J. Hand, D. Steinberg, Knowl Inf Syst (2008) 14:1-37.

Induction of Decision Trees, R. Quinlan, Machine Learning, 1(1):81-106, 1986.

Web and Link Mining

The PageRank Citation Ranking: Bringing Order to the Web, L. Page, S. Brin, R. Motwani, T. Winograd, Technical Report, Stanford University, 1999.

The Structure and Function of Complex Networks, M. E. J. Newman, SIAM Review, 2003, 45, 167-256.

Link Mining: A New Data Mining Challenge, L. Getoor, SIGKDD Explorations, 2003, 5(1), 84-89.

Link Mining: A Survey, L. Getoor, SIGKDD Explorations, 2005, 7(2), 3-12.

Semi-supervised Learning

Semi-Supervised Learning Literature Survey, X. Zhu, Computer Sciences TR 1530, University of Wisconsin-Madison.

Introduction to Semi-Supervised Learning, in Semi-Supervised Learning (Chapter 1), O. Chapelle, B. Scholkopf, A. Zien (eds.), MIT Press, 2006. (Fordham’s library has online access to the entire text)

Learning with Labeled and Unlabeled Data, M. Seeger, University of Edinburgh (unpublished), 2002.

Person Identification in Webcam Images: An Application of Semi-Supervised Learning, M. Balcan, A. Blum, P. Choi, J. Lafferty, B. Pantano, M. Rwebangira, X. Zhu, Proceedings of the 22nd ICML Workshop on Learning with Partially Classified Training Data, 2005.

Learning from Labeled and Unlabeled Data: An Empirical Study across Techniques and Domains, N. Chawla, G. Karakoulas, Journal of Artificial Intelligence Research, 23:331-366, 2005.

Text Classification from Labeled and Unlabeled Documents using EM, K. Nigam, A. McCallum, S. Thrun, T. Mitchell, Machine Learning, 39, 103-134, 2000.

Self-taught Learning: Transfer Learning from Unlabeled Data, R. Raina, A. Battle, H. Lee, B. Packer, A. Ng, in Proceedings of the 24th International Conference on Machine Learning, 2007.

An iterative algorithm for extending learners to a semisupervised setting, M. Culp, G. Michailidis, 2007 Joint Statistical Meetings (JSM), 2007

Partially-Supervised Learning / Learning with Uncertain Class Labels

Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers, V. Sheng, F. Provost, P. Ipeirotis, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.

Logistic Regression for Partial Labels, in 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Volume III, pp. 1935-1941, 2002.

Classification with Partial Labels, N. Nguyen, R. Caruana, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.

Imprecise and Uncertain Labelling: A Solution based on Mixture Model and Belief Functions, E. Come, 2008 (PowerPoint slides).

Induction of Decision Trees from Partially Classified Data Using Belief Functions, M. Bjanger, Norwegian University of Science and Technology, 2000.

Knowledge Discovery in Large Image Databases: Dealing with Uncertainties in Ground Truth, P. Smyth, M. Burl, U. Fayyad, P. Perona, KDD Workshop 1994, AAAI Technical Report WS-94-03, pp. 109-120, 1994.

Recommender Systems

Trust No One: Evaluating Trust-based Filtering for Recommenders, J. O’Donovan and B. Smyth, In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), 2005, 1663-1665.

Trust in Recommender Systems, J. O’Donovan and B. Smyth, In Proceedings of the 10th International Conference on Intelligent User Interfaces (IUI-05), 2005, 167-174.

General resources available on this topic:

ICML 2003 Workshop: Learning from Imbalanced Data Sets II

AAAI 2000 Workshop on Learning from Imbalanced Data Sets

A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data, G. Batista, R. Prati, and M. Monard, SIGKDD Explorations, 6(1):20-29, 2004.

Class Imbalance versus Small Disjuncts, T. Jo and N. Japkowicz, SIGKDD Explorations, 6(1):40-49, 2004.

Extreme Re-balancing for SVMs: a Case Study, B. Raskutti and A. Kowalczyk, SIGKDD Explorations, 6(1):60-69, 2004.

A Multiple Resampling Method for Learning from Imbalanced Data Sets, A. Estabrooks, T. Jo, and N. Japkowicz, in Computational Intelligence, 20(1), 2004.

SMOTE: Synthetic Minority Over-sampling Technique, N. Chawla, K. Bowyer, L. Hall, and W. Kegelmeyer, Journal of Artificial Intelligence Research, 16:321-357.

Generative Oversampling for Mining Imbalanced Datasets, A. Liu, J. Ghosh, and C. Martin, Third International Conference on Data Mining (DMIN-07), 66-72.

Learning from Little: Comparison of Classifiers Given Little Training, G. Forman and I. Cohen, in 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, 161-172, 2004.

Issues in Mining Imbalanced Data Sets – A Review Paper, S. Visa and A. Ralescu, in Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, pp. 67-73, 2005.

Wrapper-based Computation and Evaluation of Sampling Methods for Imbalanced Datasets, N. Chawla, L. Hall, and A. Joshi, in Proceedings of the 1st International Workshop on Utility-based Data Mining, 24-33, 2005.

C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling, C. Drummond and R. Holte, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

C4.5 and Imbalanced Data sets: Investigating the effect of sampling method, probabilistic estimate, and decision tree structure, N. Chawla, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Class Imbalances: Are we Focusing on the Right Issue?, N. Japkowicz, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Learning when Data Sets are Imbalanced and When Costs are Unequal and Unknown, M. Maloof, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Uncertainty Sampling Methods for One-class Classifiers, P. Juszczak and R. Duin, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Active Learning

Improving Generalization with Active Learning, D. Cohn, L. Atlas, and R. Ladner, Machine Learning 15(2), 201-221, May 1994.

On Active Learning for Data Acquisition, Z. Zheng and B. Padmanabhan, In Proc. of IEEE Intl. Conf. on Data Mining, 2002.

Active Sampling for Class Probability Estimation and Ranking, M. Saar-Tsechansky and F. Provost, Machine Learning 54(2), 2004, 153-178.

The Learning-Curve Sampling Method Applied to Model-Based Clustering, C. Meek, B. Thiesson, and D. Heckerman, Journal of Machine Learning Research 2:397-418, 2002.

Active Sampling for Feature Selection, S. Veeramachaneni and P. Avesani, Third IEEE Conference on Data Mining, 2003.

Heterogeneous Uncertainty Sampling for Supervised Learning, D. Lewis and J. Catlett, In Proceedings of the 11th International Conference on Machine Learning, 148-156, 1994.

Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction, G. Weiss and F. Provost, Journal of Artificial Intelligence Research, 19:315-354, 2003.

Active Learning using Adaptive Resampling, KDD 2000, 91-98.

Cost-Sensitive Learning

Types of Cost in Inductive Concept Learning, P. Turney, In Proceedings Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning.

Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection, P. Chan and S. Stolfo, KDD 1998.


M.Tech/Ph.D Thesis Help in Chandigarh | Thesis Guidance in Chandigarh


Data Mining

How does data mining work?

A standard data mining project begins with a clear business question; the data needed to answer it is then collected and prepared for analysis.
What happens in the earlier stages determines how successful the later stages are.
Data miners should ensure the quality of the data they use as input, because poor data quality leads to poor results.
Establishing a detailed understanding of the project factors, such as the current business scenario, the project's main business goal, and the performance objectives.
Identifying the data required to address the problem and collecting it from all relevant sources.
Addressing errors such as incomplete or duplicate data, and processing the data into a format suitable for answering the research questions.
Applying algorithms to find patterns in the data.
Assessing whether and how the model's output contributes to achieving the business objective.
Iterating, since an iterative process is frequently needed to identify the method that gives the best outcome.
Making the project's findings available for real-time decision making.
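The cleaning step (handling incomplete or duplicate data) can be illustrated with a minimal Python sketch. It assumes records are tuples of hashable values with `None` marking a missing field; the function name `clean_records` and the sample rows are hypothetical, and real projects often impute missing values instead of always dropping the record.

```python
from typing import Optional

# Hypothetical record type: a tuple of hashable field values; None marks a missing value.
Record = tuple[Optional[object], ...]


def clean_records(records: list[Record]) -> list[Record]:
    """Drop incomplete records and exact duplicates, keeping first-seen order."""
    seen: set[Record] = set()
    cleaned: list[Record] = []
    for rec in records:
        if any(value is None for value in rec):
            continue  # incomplete record: at least one field is missing
        if rec in seen:
            continue  # exact duplicate of a record already kept
        seen.add(rec)
        cleaned.append(rec)
    return cleaned


raw = [
    ("alice", 34, "NY"),
    ("bob", None, "LA"),   # incomplete
    ("alice", 34, "NY"),   # duplicate
    ("carol", 29, "SF"),
]
print(clean_records(raw))  # [('alice', 34, 'NY'), ('carol', 29, 'SF')]
```

Keeping the first occurrence preserves the original order of the data, which matters when later steps (for example, time-based splits) depend on it.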

The techniques and actions listed above are repeated until the best outcomes are achieved. Our engineers and developers have extensive knowledge of the tools, techniques, and approaches used in these processes. We guarantee the best research advice on data mining thesis topics and on-schedule completion of your project. What are the important data mining tasks?

Data Mining Tasks

Data mining finds application in many ways, including description, analysis, and summarization of data, and clarifying conceptual understanding through data description.
Prediction, classification, dependency analysis, segmentation, and case-based reasoning are also important data mining tasks.
Regression – numerical data prediction (stock prices, temperatures, and total sales)
Data warehousing – business decision making and large-scale data mining
Classification – accurate prediction of target classes and their categorization
Association rule learning – market basket analysis tools that establish relationships between variables in a data set
Machine learning – statistical, probability-based decision making without explicit programming
Data analytics – digital data evaluation for business purposes
Clustering – partitioning a dataset into clusters and subclasses to analyze its natural structure
Artificial intelligence – human-like data analytics for reasoning, problem solving, learning, and planning
Data preparation and cleansing – conversion of raw data into a processed form by identifying and removing errors
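Association rule learning is usually scored with support (how often items occur together) and confidence (how often the consequent appears when the antecedent does). The brute-force Python sketch below enumerates one-item-to-one-item rules; the function names, thresholds, and shopping baskets are hypothetical, and practical tools use algorithms such as Apriori or FP-Growth to avoid checking every pair.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(A ∪ C) / support(A): how often C appears when A does."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Enumerate rules {lhs} -> {rhs} over item pairs that pass both thresholds."""
    items = sorted({item for t in transactions for item in t})
    found = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support(transactions, {lhs, rhs})
            if s >= min_support:  # s > 0 here, so support(lhs) > 0 as well
                c = confidence(transactions, {lhs}, {rhs})
                if c >= min_confidence:
                    found.append((lhs, rhs, s, c))
    return found


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for lhs, rhs, s, c in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

With these four baskets, `butter -> bread` has confidence 1.0 because every basket containing butter also contains bread, while the reverse rule is weaker.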

You can look at our website for a more in-depth look at all of these operations. We supply the data you need, as well as any additional data required for your data mining thesis topic. We provide plagiarism-free data mining thesis assistance for any new idea of your choice. Let us now discuss the stages to include when developing your data mining thesis topic.

How to work on a data mining thesis topic?

The following are the important stages or phases in developing data mining thesis topics.

First of all, you need to identify the current demand and frame the research question
The next step is defining or specifying the problem
Collection of data is the third step
Alternative solutions and designs have to be analyzed in the next step
The proposed methodology has to be designed
The system is then to be implemented

Usually, our experts help with writing code and implementing it successfully without hassle. By consistently following the above steps, you can develop one of the best data mining thesis topics of recent times. It is also important to have a clear technical idea of the tasks and techniques involved in data mining, which are listed below

Data visualization
Neural networks
Statistical modeling
Genetic algorithms and neural networks
Decision trees and induction
Discriminant analysis
Induction techniques
Association rules and data visualization
Bayesian networks
Correlation
Regression analysis
Regression analysis and regression trees
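Two of the listed techniques, correlation and regression analysis, can be sketched with the standard library alone. The hypothetical helpers below compute Pearson's correlation coefficient and an ordinary least-squares line y = intercept + slope * x; the advertising and sales figures are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return intercept, slope


ad_spend = [1, 2, 3, 4, 5]           # hypothetical units of advertising spend
sales = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical sales figures
print(pearson_r(ad_spend, sales))    # close to 1: strong positive correlation
print(fit_line(ad_spend, sales))     # intercept ≈ 0.05, slope ≈ 1.99
```

Correlation only measures the strength of a linear association; the regression line additionally gives a formula for predicting new values.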

If you want to select the best tool for your data mining project, evaluating its consistency and efficiency comes first. For this, you need enough technical data from projects executed in real time, for which you can contact us directly. Since we have successfully delivered many data mining thesis topics, we can help you find better solutions to your research issues. What should be remembered about data mining strategy?

Data mining strategies should be chosen before tools, to avoid using strategies that do not align with the project's true purpose.
The typical data mining strategy is to evaluate a variety of methods in order to select the one that best fits the situation.
Some principles can be used to choose effective strategies for data mining projects. Decision trees, for instance, are widely used because:
They are easy to handle and understand
They can work with both categorical and numerical data
They are largely unaffected by extreme values and can work with incomplete information
They can reveal interactions between variables and nonlinear relationships
They can handle noise in records
They can process huge amounts of data.
Decision trees, on the other hand, have significant drawbacks.
Many rules are often needed when the dependent variable is continuous, as in regression, and small changes in the data can result in very different tree structures.
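How a decision tree picks a split, and why a small change in the data can change the tree, can be shown with a one-feature Python sketch. It scores candidate thresholds by weighted Gini impurity, the criterion used by CART-style trees; the helper names and the age/purchase data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted Gini) for the best split 'x <= threshold' on one feature."""
    n = len(ys)
    pairs = list(zip(xs, ys))
    values = sorted(set(xs))
    best = (None, gini(ys))  # start from the unsplit parent node
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # midpoint between neighbouring distinct values
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best


ages = [22, 25, 31, 35, 47, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(ages, bought))          # (33.0, 0.0)

# Relabel a single customer (age 35) and the chosen threshold moves.
bought_changed = ["no", "no", "no", "no", "yes", "yes"]
print(best_split(ages, bought_changed))  # (41.0, 0.0)
```

A full tree applies this search recursively to each side of the split, so a shifted threshold near the root changes every subtree below it.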

All such pros and cons of various data mining aspects are discussed on our website. We provide high-quality research and thesis writing assistance. You can see proof of our skill and our unique approach in the field in the sample theses on our website. We also offer an internal review to help you feel more confident. Let us now discuss recent data mining methodologies

Current methods in Data Mining

Prediction of data (time series data mining)
Discriminant and cluster analysis
Logistic regression and segmentation
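Cluster analysis is often done with k-means. Below is a minimal sketch of Lloyd's algorithm in Python, assuming points are numeric tuples; the seeded random initialisation, the iteration cap, and the sample points are illustrative choices rather than a prescribed method.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    clusters = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        updated = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: assignments will not change any more
        centroids = updated
    return centroids, clusters


pts = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centres, groups = kmeans(pts, k=2)
print(sorted(centres))
```

k-means can converge to a poor local optimum on less clearly separated data, so it is common to run it from several initialisations and keep the result with the smallest within-cluster distances.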

Our technical specialists usually provide accurate data, thorough and detailed explanations, and technical notes for all of these processes and algorithms, so you can get all of your questions answered in one place. Our technical team is also well versed in current trends, allowing us to provide realistic explanations for all new developments. We will now talk about the latest data mining trends

Latest Trending Data Mining Thesis Topics

Visual data mining and data mining software engineering
Interaction and scalability in data mining
Exploring applications of data mining
Biological and visual data mining
Cloud computing and big data integration
Data security and protecting privacy in data mining
Novel methodologies in complex data mining
Data mining in multiple databases and rationalities
Query language standardization in data mining
Integration of MapReduce, Amazon EC2, S3, Apache Spark, and Hadoop into data mining
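The MapReduce model behind Hadoop can be illustrated in a single process. The Python sketch below mimics the map, shuffle, and reduce phases for a word count; the function names and documents are hypothetical, it is not the Hadoop or Spark API, and on a real cluster the framework distributes these phases across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in doc.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here, sum the counts."""
    return key, sum(values)


def mapreduce_word_count(docs):
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


docs = ["data mining finds patterns", "mining big data with Hadoop"]
print(mapreduce_word_count(docs))
# {'data': 2, 'mining': 2, 'finds': 1, 'patterns': 1, 'big': 1, 'with': 1, 'hadoop': 1}
```

Because each map call and each reduce call only sees its own input, both phases can run in parallel; only the shuffle needs data from every mapper.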

These are the recent trends in data mining. We recommend choosing the topic that interests you most. Having an appropriate content structure or template is essential when writing a thesis; with this in mind, we design the plan in a chronological order relevant to the study assessment. The incorporation of citations is one of the most important aspects of a thesis, so we focus not only on writing but also on citing essential sources in the text. Students frequently struggle to prepare appropriate proposals when beginning their thesis. We have years of experience in providing study support and data mining thesis writing services to the scientific community, which are widely acknowledged. We will now talk about future research directions for data mining thesis topics

Future Research Directions of Data Mining

The potential of data mining and data science seems promising, as the volume of data continues to grow.
It is expected that the total amount of data in our digital cosmos will have grown from 4.4 zettabytes to 44 zettabytes.
We'll also generate 1.7 megabytes of new data for every human being on this planet each second.
Mining algorithms have transformed as technology has advanced, and so have the tools for obtaining useful insights from data.
Once upon a time, only organizations like NASA could use their powerful computers to examine data, because the cost of producing and processing data was simply too high.
Organizations now use cloud-based data warehouses to accomplish all kinds of activities with machine learning, artificial intelligence, and deep learning.

The Internet of Things and wearable electronics, for instance, have turned connected devices into data-generating engines that provide limitless perspectives on people and organizations, provided firms can gather, store, and analyze the data quickly enough. What should you keep in mind when choosing a data mining thesis topic?

An excellent thesis topic is a broad concept that has to be developed, verified, or refuted.
Your thesis topic must capture your curiosity and engage both your supervisor and other academics.
Your thesis topic must be relevant to your studies and should be able to withstand examination.

Our engineers and experts can provide any type of research assistance on these data mining development tools. We satisfy the criteria of your university through multiple revisions, appropriate formatting and editing of your thesis, a comprehensive grammar check, and so on. As a result, you can contact us with confidence for complete assistance with your data mining thesis. What are the important data mining thesis topics?

Trending Data Mining Research Thesis Topics

Research Topics in Data Mining

Handling cost-sensitive, unbalanced, non-static data
Issues related to data mining and their solutions
Network settings in data mining and ensuring privacy, security, and integrity of data
Environmental and biological issues in data mining
Complex data mining and sequential data mining (time series data)
Data mining at higher dimensions
Multi-agent data mining and distributed data mining
High-speed data mining
Development of unified data mining theory

We currently provide full support for all parts of research study, development, and investigation, including project planning, technical advice, legitimate scientific data, thesis writing, paper publication, assignments, internal review, and many other services. As a result, you can contact us for any kind of help with your data mining thesis topics.


82 Data Mining Essay Topic Ideas & Examples


Disadvantages of Using Web 2.0 for Data Mining Applications This data can be confusing to the readers and may not be reliable. Lastly, with the use of Web 2.
Data Warehouse and Data Mining in Business The circumstances leading to the establishment and development of the concept of data warehousing was attributed to the fact that failure to have a data warehouse led to the need of putting in place large […]
The Data Mining Method in Healthcare and Education Thus, I would use data mining in both cases; however, before that, I would discover a way to improve the algorithms used for it.
Data Mining Tools and Data Mining Myths The first problem is related to keeping secret the identity of the person involved in data mining. One of the major myths regarding data mining is that it can replace domain knowledge.
Hybrid Data Mining Approach in Healthcare One of the healthcare projects that will call for the use of data mining is treatment evaluation. In this case, it is essential to realize that the main aim of health data mining is to […]
Terrorism and Data Mining Algorithms However, this is a necessary evil as the nation’s security has to be prioritized since these attacks lead to harm to a larger population compared to the infringements.
Data Mining and Its Major Advantages Thus, it is possible to conclude that data mining is a convenient and effective way of processing information, which has many advantages.
Transforming Coded and Text Data Before Data Mining However, to complete data mining, it is necessary to transform the data according to the techniques that are to be used in the process.
Data Mining and Machine Learning Algorithms The shortest distance of string between two instances defines the distance of measure. However, this is also not very clear as to which transformations are summed, and thus it aims to a probability with the […]
Summary of C4.5 Algorithm: Data Mining The C4.5 algorithm: each record in the data set should be associated with one of the offered classes; that is, one of the attributes should be treated as the class label.
Data Mining in Social Networks: Linkedin.com One of the ways to achieve the aim is to understand how users view data mining of their data on LinkedIn.
Ethnography and Data Mining in Anthropology The study of cultures is of great importance under normal circumstances to enhance the understanding of the same. Data mining is the success secret of ethnography.
Issues With Data Mining It is necessary to note that the usage of data mining helps FBI to have access to the necessary information for terrorism and crime tracking.
Large Volume Data Handling: An Efficient Data Mining Solution Data mining is the process of sorting huge amounts of data and finding the relevant data. Data mining is widely used for the maintenance of data, which helps an organization a lot in […]
Data Mining and Analytical Developments In this era, when there is a lot of information to be handled at once and little available time, it is necessary and wise to analyze data from different viewpoints and summarize […]
Levi’s Company’s Data Mining & Customer Analytics Levi, the renowned name in jeans is feeling the heat of competition from a number of other brands, which have come upon the scene well after Levi’s but today appear to be approaching Levi’s market […]
Cryptocurrency Exchange Market Prediction and Analysis Using Data Mining and Artificial Intelligence This paper aims to review the application of A.I. in the context of blockchain finance by examining scholarly articles to determine whether the A.I. algorithm can be used to analyze this financial market.
Data Mining in Healthcare: Applications and Big Data Analyze Big data analysis is among the most influential modern trends in informatics and it has applications in virtually every sphere of human life.
“Data Mining and Customer Relationship Marketing in the Banking Industry” by Chye & Gerry First of all, the article generally elaborates on the notion of customer relationship management, which is defined as “the process of predicting customer behavior and selecting actions to influence that behavior to benefit the company”.
Data Mining Techniques and Applications The use of data mining to detect disturbances in the ecosystem can help to avert problems that are destructive to the environment and to society.
Ethical Data Mining in the UAE Traffic Department The research question identified in the assignment two is considered to be the following, namely whether the implementation of the business intelligence into the working process will beneficially influence the work of the Traffic Department […]
Canadian University Dubai and Data Mining The aim of mining data in the education environment is to enhance the quality of education for the mass through proactive and knowledge-based decision-making approaches.
Data Mining and Customer Relationship Management As such, CRM not only entails the integration of marketing, sales, customer service, and supply chain capabilities of the firm to attain elevated efficiencies and effectiveness in conveying customer value, but it obliges the organization […]
E-Commerce: Mining Data for Better Business Intelligence The method allowed the use of Intel and an example to build the study and the literature on data mining for business intelligence to analyze the findings.
Ethical Implications of Data Mining by Government Institutions Critics of personal data mining insist that it infringes on the rights of an individual and results in the loss of sensitive information.
Data Mining Role in Companies The increasing adoption of data mining in various sectors illustrates the potential of the technology regarding the analysis of data by entities that seek information crucial to their operations.
Data Mining: Concepts and Methods Speed of data mining process is important as it has a role to play in the relevance of the data mined. The accuracy of data is also another factor that can be used to measure […]
Data Mining Technologies According to Han & Kamber, data mining is the process of discovering correlations, patterns, trends or relationships by searching through a large amount of data that in most circumstances is stored in repositories, business databases […]
Data Mining: A Critical Discussion In recent times, the relatively new discipline of data mining has been a subject of widely published debate in mainstream forums and academic discourses, not only due to the fact that it forms a critical […]
Commercial Uses of Data Mining Data mining process entails the use of large relational database to identify the correlation that exists in a given data. The principal role of the applications is to sift the data to identify correlations.
A Discussion on the Acceptability of Data Mining Today, more than ever before, individuals, organizations and governments have access to seemingly endless amounts of data that has been stored electronically on the World Wide Web and the Internet, and thus it makes much […]
Applying Data Mining Technology for Insurance Rate Making: Automobile Insurance Example
Applebee’s, Travelocity and Others: Data Mining for Business Decisions
Applying Data Mining Procedures to a Customer Relationship
Business Intelligence as Competitive Tool of Data Mining
Overview of Accounting Information System Data Mining
Applying Data Mining Technique to Disassembly Sequence Planning
Approach for Image Data Mining Cultural Studies
Apriori Algorithm for the Data Mining of Global Cyberspace Security Issues
Database Data Mining: The Silent Invasion of Privacy
Data Management: Data Warehousing and Data Mining
Constructive Data Mining: Modeling Consumers’ Expenditure in Venezuela
Data Mining and Its Impact on Healthcare
Innovations and Perspectives in Data Mining and Knowledge Discovery
Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection
Linking Data Mining and Anomaly Detection Techniques
Data Mining and Pattern Recognition Models for Identifying Inherited Diseases
Credit Card Fraud Detection Through Data Mining
Data Mining Approach for Direct Marketing of Banking Products
Constructive Data Mining: Modeling Argentine Broad Money Demand
Data Mining-Based Dispatching System for Solving the Pickup and Delivery Problem
Commercially Available Data Mining Tools Used in the Economic Environment
Data Mining Climate Variability as an Indicator of U.S. Natural Gas
Analysis of Data Mining in the Pharmaceutical Industry
Data Mining-Driven Analysis and Decomposition in Agent Supply Chain Management Networks
Credit Evaluation Model for Banks Using Data Mining
Data Mining for Business Intelligence: Multiple Linear Regression
Cluster Analysis for Diabetic Retinopathy Prediction Using Data Mining Techniques
Data Mining for Fraud Detection Using Invoicing Data
Jaeger Uses Data Mining to Reduce Losses From Crime and Waste
Data Mining for Industrial Engineering and Management
Business Intelligence and Data Mining – Decision Trees
Data Mining for Traffic Prediction and Intelligent Traffic Management System
Building Data Mining Applications for CRM
Data Mining Optimization Algorithms Based on the Swarm Intelligence
Big Data Mining: Challenges, Technologies, Tools, and Applications
Data Mining Solutions for the Business Environment
Overview of Big Data Mining and Business Intelligence Trends
Data Mining Techniques for Customer Relationship Management
Classification-Based Data Mining Approach for Quality Control in Wine Production
Data Mining With Local Model Specification Uncertainty
Employing Data Mining Techniques in Testing the Effectiveness of Modernization Theory
Enhancing Information Management Through Data Mining Analytics
Evaluating Feature Selection Methods for Learning in Data Mining Applications
Extracting Formations From Long Financial Time Series Using Data Mining
Financial and Banking Markets and Data Mining Techniques
Fraudulent Financial Statements and Detection Through Techniques of Data Mining
Harmful Impact Internet and Data Mining Have on Society
Informatics, Data Mining, Econometrics, and Financial Economics: A Connection
Integrating Data Mining Techniques Into Telemedicine Systems
Investigating Tobacco Usage Habits Using Data Mining Approach
IvyPanda. (2024, March 2). 82 Data Mining Essay Topic Ideas & Examples. https://ivypanda.com/essays/topic/data-mining-essay-topics/


Latest Thesis Topics in Data Mining


matlabprojects.org

Data mining is an approach for spotting anomalies in huge amounts of data. Legal data, for instance, contains the specifics of each crime, and data mining could be used to find patterns and themes in it in an attempt to forecast what will happen in the future. Machine learning and deep learning applications, such as web page recommender systems and programmable technology, are built using data mining. In this article, we provide a complete view of developing thesis topics in data mining efficiently. We shall first start with an introduction to data mining
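Spotting anomalies can be as simple as flagging values that lie far from the mean. A minimal z-score sketch in Python follows; the function name, the threshold of 2, and the transaction amounts are hypothetical, and more robust methods (median-based scores, or distance- and density-based outlier detection) are usually preferred because an extreme value inflates the very standard deviation it is measured against.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


amounts = [20, 22, 19, 21, 20, 23, 18, 500]  # hypothetical card transactions
print(zscore_outliers(amounts))  # [(7, 500)]
```

With only eight values, no point can reach a population z-score above √7 ≈ 2.65, which is why the sketch uses 2 rather than the commonly quoted 3.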

INTRODUCTION TO DATA MINING

We require data mining to extract relevant insights from imbalanced and noisy datasets, which is done in a stage-wise process as follows:

First, discard inconsistencies in the data
Then uncover patterns related to the analysis task
Then translate the data into KDD-friendly formats
Finally, visualize the accumulated data for the user.

In a nutshell, data mining is the process of autonomously examining enormous amounts of data for regularities that go far beyond basic comparison. To segment the data and estimate the likelihood of an event, data mining employs computational models in the form of algorithms. Data mining is also often called Knowledge Discovery in Databases (KDD), although strictly speaking it is the pattern-extraction step within the broader KDD process.

The following are the major characteristics of data mining

Predictions related to expected results.
Automatic pattern finding
Concentrate on big data sets, databases, and systems.
The generation of actionable and performable insights

Data mining can answer queries that are not easily answered using traditional search and reporting methods. More specifically, data mining allows users to traverse database and data warehouse architectures, data models, and database systems, assess mining trends, and visualize them in various ways. To understand the advantages of data mining, you need a clear idea of the major processes and steps involved in it.

What are the steps in the data mining process?

The topic has to be thoroughly understood and the work performed accordingly
When selecting the data set, you have to be very careful about its quality
Extracting beneficial and relevant data is the major aim of choosing any data set
You need to prepare and process the data after extracting it
Next, the data is modeled and remodeled based on user requirements
Understanding all aspects of the data is very important for detecting leakage and faults in the data processing
Once the evaluation is complete, the data can be used for analysis and other purposes

In all these steps, data mining standards, algorithms, and models play a very significant role. You can get complete informational and analytical support from our team of technical experts at any time regarding your data mining thesis, and you can always contact us for any kind of support for your thesis topics in data mining. What are the major stages of the data mining process? In chronological order, the stages of data mining include the following

Collection of data
Dimensionality reduction (PCA and SVD)
Measurement of distance
Prediction (data classification – ANN, SVM, KNN, Rules, Decision Trees and Bayesian networks)
Clustering (hierarchical, density-based, k-means, and message passing)
Association rule mining
Data interpretation
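The "measurement of distance" and "prediction" stages meet in k-nearest neighbours (KNN) classification. The Python sketch below uses Euclidean distance and a majority vote; the function name, k = 3, and the toy training points and labels are hypothetical, and in practice features are usually scaled first so that no single attribute dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "low risk"), ((1.2, 0.8), "low risk"), ((0.9, 1.1), "low risk"),
    ((5.0, 5.0), "high risk"), ((5.2, 4.8), "high risk"), ((4.9, 5.1), "high risk"),
]
print(knn_predict(train, (1.1, 1.0)))  # low risk
print(knn_predict(train, (5.1, 5.0)))  # high risk
```

Using an odd k with two classes avoids tied votes; KNN has no training phase, so all the cost is paid at prediction time when distances to every stored example are computed.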

Since our experts have more than two decades of experience in data mining research, you can get all your queries resolved with our support. The customized research support we provide includes practical explanations and demonstrations with complete technical notes and descriptions. We provide confidential research and thesis writing support for all thesis topics in data mining. Get in touch with us for reliable and high-quality data mining research guidance. Let us now talk about the skills and qualifications needed for the successful implementation of data mining projects

What kind of skills are required for a data mining project?

Analysing data to find points that support or refute claimed facts
Since the evolution of data analysis is a slow process, human data analysis skills remain largely the same, provided that all other factors are constant
Deployment of faster hardware, including even quantum computing
The ability to analyze huge amounts of autonomously collected data is very important
Improvement and accessibility of open-source software are also required for better data analysis and mining

With the help of our technical experts, qualified engineers, and experienced data analysts, you can develop all the skills listed above effectively. The standard books and benchmark references that we provide can help you choose the best thesis topics in data mining. In this regard, let us have a look at the major recent data mining thesis topics below

It is a method of designing manufacturing techniques ahead of time, determining the extraction path of every single item component or assemblage, and arranging, beginning, and ending for each important basis and setup.
As a result, we could have balanced storage of resources and stable manufacturing utilizing data mining tools.
Internet platforms use varying conceptual frameworks for managing in-depth subject knowledge and the associated datasets.
These datasets record the same parameters and phenomena across many records, so work built on earlier records can be extended to other datasets.
Instead of analyses and collections that prevent others from building on the completed project, investigations should be supplied as original data in a consistent format, for example using MATLAB simulation.
Scalable visualization and modeling platforms let the user filter and modify data, explore hypotheses, present findings, and reduce the time needed to convert records into a publishable form.
Knowledge gained from prior experiments or test cases can be applied through data mining methods to work more effectively.
We can reduce errors by reviewing earlier mistakes and applying what we have learned to achieve better outcomes.
Researchers can identify fraudsters by using a big data MapReduce approach.
This is done mainly by collecting more relevant data about a particular architecture and then analyzing it to determine whether the activity is legitimate.
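The text does not say how such a MapReduce job is structured. The sketch below is a single-process illustration of the map, shuffle and reduce stages under an assumed, hypothetical rule: flag an account whose largest transaction exceeds three times the average of its other transactions. The function names, the rule and the records are all illustrative; on Hadoop or Spark the map and reduce functions would run distributed across workers, whereas here they are ordinary Python functions.

```python
from collections import defaultdict
from statistics import mean


def map_phase(records):
    """Map: emit one (key, value) pair per record, keyed by account."""
    for account_id, amount in records:
        yield account_id, amount


def shuffle(pairs):
    """Shuffle: group every value emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, ratio=3.0):
    """Reduce: flag accounts whose largest amount exceeds `ratio` times the mean of the rest."""
    flagged = {}
    for account_id, amounts in groups.items():
        if len(amounts) < 2:
            continue  # too little history to compare against
        ordered = sorted(amounts)
        largest, rest = ordered[-1], ordered[:-1]
        if largest > ratio * mean(rest):
            flagged[account_id] = largest
    return flagged


# Hypothetical (account_id, amount) records.
records = [("A", 20), ("A", 25), ("A", 22), ("A", 400),
           ("B", 50), ("B", 55), ("B", 60)]
print(reduce_phase(shuffle(map_phase(records))))  # prints {'A': 400}
```

Keying by account is what makes the job parallel: each reducer only ever needs the history of the accounts it receives.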

We currently offer thesis writing guidance with grammar checks, internal review, and multiple revisions, so you can rely on us for your data mining thesis. In general, a master's thesis presents research evidence that demonstrates a graduate student's research and technical competence as a requirement for the degree. Although some graduate programs offer non-thesis master's options, the thesis remains the standard capstone requirement for many programs. Now that you understand what a thesis is, you can decide whether it is a good fit for your career or whether a detailed assessment is a better option.

How long is a thesis for a master’s?

A master's thesis can range anywhere from one hundred to three hundred pages, not counting the bibliography.
The length is determined by several criteria, including the topic and the research approach.
There is no single “proper” page length.
Rather, the thesis should be long enough to present all important facts clearly and concisely.

We anticipate that this trend will encourage people to spend more time refining insights rather than gathering, cleaning, and otherwise organizing the data they need. For further clarification about thesis topics in data mining, we encourage you to visit our website or contact us directly. Our experts are always happy to support you.

90, Pretham Street, Duraisamy Nagar Madurai – 625001 Tamilnadu, India

Mining Engineering Graduate Theses and Dissertations (Statler College of Engineering and Mineral Resources, WVU)

Theses/Dissertations from 2023

Development of A Hydrometallurgical Process for the Extraction of Cobalt, Manganese, and Nickel from Acid Mine Drainage Treatment Byproduct , Alejandro Agudelo Mira

Selective Recovery of Rare Earth Elements from Acid Mine Drainage Treatment Byproduct , Zeynep Cicek

Identification of Rockmass Deformation and Lithological Changes in Underground Mines by Using Slam-Based Lidar Technology , Francisco Eduardo Gil Hurtado

Analysis of the Brittle Failure Mechanism of Underground Stone Mine Pillars by Implementing Numerical Modeling in FLAC3D , Rosbel Jimenez

Analysis of the root causes of fatal injuries in the United States surface mines between 2008 and 2021 , Maria Fernanda Quintero

AUGMENTED REALITY AND MOBILE SYSTEMS FOR HEAVY EQUIPMENT OPERATORS IN SURFACE MINING , Juan David Valencia Quiceno

Theses/Dissertations from 2022

Integrated Large Discontinuity Factor, Lamodel and Stability Mapping Approach for Stone Mine Pillar Stability , Mustafa Baris Ates

Noise Exposure Trends Among Violating Coal Mines, 2000 to 2021 , Hanna Grace Davis

Calcite depression in bastnaesite-calcite flotation system using organic acids , Emmy Muhoza

Investigation of Geomechanical Behavior of Laminated Rock Mass Through Experimental and Numerical Approach , Qingwen Shi

Static Liquefaction in Tailing Dams , Jose Raul Zela Concha

Experimental and Theoretical Investigation on the Initiation Mechanism of Low-Rank Coal's Self-Heating Process , Yinan Zhang

Development of an Entry-Scale Modeling Methodology to Provide Ground Reaction Curves for Longwall Gateroad Support Evaluation , Haochen Zhao

Size effect and anisotropy on the strength of shale under compressive stress conditions , Yun Zhao

Theses/Dissertations from 2021

Evaluation of LIDAR systems for rock mass discontinuity identification in underground stone mines from 3D point cloud data , Mario Alejandro Bendezu de la Cruz

Implementing the Empirical Stone Mine Pillar Strength Equation into the Boundary Element Method Software LaModel , Samuel Escobar

Recovery of Phosphorus from Florida Phosphatic Waste Clay , Amir Eskanlou

Optimization of Operating Conditions and Design Parameters on Coal Ultra-Fine Grinding Through Kinetic Stirred Mill Tests and Numerical Modeling , Francisco Patino

The Effect of Natural Fractures on the Mechanical Behavior of Limestone Pillars: A Synthetic Rock Mass Approach Application , Mustafa Can Süner

Evaluation of Various Separation Techniques for the Removal of Actinides from A Rare Earth-Containing Solution Generated from Coarse Coal Refuse , Deniz Talan

Geology Oriented Loading Approach for Underground Coal Mines , Deniz Tuncay

Various Operational Aspects of the Extraction of Critical Minerals from Acid Mine Drainage and Its Treatment By-product , Zhongqing Xiao

Theses/Dissertations from 2020

Adaptation of Coal Mine Floor Rating (CMFR) to Eastern U.S. Coal Mines , Sena Cicek

Upstream Tailings Dam - Liquefaction , Mladen Dragic

Development, Analysis and Case Studies of Impact Resistant Steel Sets for Underground Roof Fall Rehabilitation , Dakota D. Faulkner

The influence of spatial variance on rock strength and mechanism of failure , Danqing Gao

Fundamental Studies on the Recovery of Rare Earth Elements from Acid Mine Drainage , Xue Huang

Rational drilling control parameters to reduce respirable dust during roof bolting operations , Hua Jiang

Solutions to Some Mine Subsidence Research Challenges , Jian Yang

An Interactive Mobile Equipment Task-Training with Virtual Reality , Lazar Zujovic

Theses/Dissertations from 2019

Fundamental Mechanism of Time Dependent Failure in Shale , Neel Gupta

A Critical Assessment on the Resources and Extraction of Rare Earth Elements from Acid Mine Drainage , Christopher R. Vass

Time-dependent deformation and associated failure of roof in underground mines , Yuting Xue

Theses/Dissertations from 2018

Parametric Study of Coal Liberation Behavior Using Silica Grinding Media , Adewale Wasiu Adeniji

Three-dimensional Numerical Modeling Encompassing the Stability of a Vertical Gas Well Subjected to Longwall Mining Operation - A Case Study , Bonaventura Alves Mangu Bali

Shale Characterization and Size-effect study using Scanning Electron Microscopy and X-Ray Diffraction , Debashis Das

Behaviour Of Laminated Roof Under High Horizontal Stress , Prasoon Garg

Theses/Dissertations from 2017

Optimization of Mineral Processing Circuit Design under Uncertainty , Seyed Hassan Amini

Evaluation of Ultrasonic Velocity Tests to Characterize Extraterrestrial Rock Masses , Thomas W. Edge II

A Photogrammetry Program for Physical Modeling of Subsurface Subsidence Process , Yujia Lian

An Area-Based Calculation of the Analysis of Roof Bolt Systems (ARBS) , Aanand Nandula

Developing and implementing new algorithms into the LaModel program for numerical analysis of multiple seam interactions , Mehdi Rajaeebaygi

Adapting Roof Support Methods for Anchoring Satellites on Asteroids , Grant B. Speer

Simulation of Venturi Tube Design for Column Flotation Using Computational Fluid Dynamics , Wan Wang

Theses/Dissertations from 2016

Critical Analysis of Longwall Ventilation Systems and Removal of Methane , Robert B. Krog

Implementing the Local Mine Stiffness Calculation in LaModel , Kaifang Li

Development of Emission Factors (EFs) Model for Coal Train Loading Operations , Bisleshana Brahma Prakash

Nondestructive Methods to Characterize Rock Mechanical Properties at Low-Temperature: Applications for Asteroid Capture Technologies , Kara A. Savage

Mineral Asset Valuation Under Economic Uncertainty: A Complex System for Operational Flexibility , Marcell B. B. Silveira

A Feasibility Study for the Automated Monitoring and Control of Mine Water Discharges , Christopher R. Vass

Spontaneous Combustion of South American Coal , Brunno C. C. Vieira

Calibrating LaModel for Subsidence , Jian Yang

Theses/Dissertations from 2015

Coal Quality Management Model for a Dome Storage (DS-CQMM) , Manuel Alejandro Badani Prado

Design Programs for Highwall Mining Operations , Ming Fan

Development of Drilling Control Technology to Reduce Drilling Noise during Roof Bolting Operations , Mingming Li

The Online LaModel User's & Training Manual Development & Testing , Christopher R. Newman

How to mitigate coal mine bumps through understanding the violent failure of coal specimens , Gamal Rashed

Theses/Dissertations from 2014

Effect of biaxial and triaxial stresses on coal mine shale rocks , Shrey Arora

Stability Analysis of Bleeder Entries in Underground Coal Mines Using the Displacement-Discontinuity and Finite-Difference Programs , Xu Tang

Experimental and Theoretical Studies of Kinetics and Quality Parameters to Determine Spontaneous Combustion Propensity of U.S. Coals , Xinyang Wang

Bubble Size Effects in Coal Flotation and Phosphate Reverse Flotation using a Pico-nano Bubble Generator , Yu Xiong

Integrating the LaModel and ARMPS Programs (ARMPS-LAM) , Peng Zhang

Theses/Dissertations from 2013

Column Flotation of Subbituminous Coal Using the Blend of Trimethyl Pentanediol Derivatives and Pico-Nano Bubbles , Jinxiang Chen

Applications of Surface and Subsurface Subsidence Theories to Solve Ground Control Problems , Biao Qiu

Calibrating the LaModel Program for Shallow Cover Multiple-Seam Mines , Morgan M. Sears

The Integration of a Coal Mine Emergency Communication Network into Pre-Mine Planning and Development , Mark F. Sindelar

Factors considered for increasing longwall panel width , Jack D. Trackemas

An experimental investigation of the creep behavior of an underground coalmine roof with shale formation , Priyesh Verma

Evaluation of Rope Shovel Operators in Surface Coal Mining Using a Multi-Attribute Decision-Making Model , Ivana M. Vukotic

Theses/Dissertations from 2012

Calculating the Surface Seismic Signal from a Trapped Miner , Adeniyi A. Adebisi

Comprehensive and Integrated Model for Atmospheric Status in Sealed Underground Mine Areas , Jianwei Cheng

Production and Cost Assessment of a Potential Application of Surface Miners in Coal Mining in West Virginia , Timothy A. Nolan

The Integration of Geomorphic Design into West Virginia Surface Mine Reclamation , Alison E. Sears

Truck Cycle and Delay Automated Data Collection System (TCD-ADCS) for Surface Coal Mining , Patricio G. Terrazas Prado

New Abutment Angle Concept for Underground Coal Mining , Ihsan Berk Tulu

Theses/Dissertations from 2011

Experimental analysis of the post-failure behavior of coal and rock under laboratory compression tests , Dachao Neil Nie

The influence of interface friction and w/h ratio on the violence of coal specimen failure , Simon H. Prassetyo

Theses/Dissertations from 2010

A risk management approach to pillar extraction in the Central Appalachian coalfields , Patrick R. Bucks

The Impacts of Longwall Mining on Groundwater Systems -- A Case of Cumberland Mine Panels B5 and B6 , Xinzhi Du

Evaluation of ultrafine spiral concentrators for coal cleaning , Meng Yang

Theses/Dissertations from 2009

Development of a coal reserve GIS model and estimation of the recoverability and extraction costs , Chandrakanth Reddy Apala

Application and evaluation of spiral separators for fine coal cleaning , Zhuping Che

Weak floor stability in the Illinois Basin underground coal mines , Murali M. Gadde

Design of reinforced concrete seals for underground coal mines , Rajagopala Reddy Kallu

Employing laboratory physical modeling to study the radio imaging method (RIM) , Jun Lu

Influence of cutting sequence and time effects on cutters and roof falls in underground coal mine -- numerical approach , Anil Kumar Ray

Implementing energy release rate calculations into the LaModel program , Morgan M. Sears

Modeling PDC cutter rock interaction , Ihsan Berk Tulu

Analytical determination of strain energy for the studies of coal mine bumps , Qiang Xu

Improvement of the mine fire simulation program MFIRE , Lihong Zhou

Theses/Dissertations from 2008

Program-assisted analysis of the transverse pressure capacity of block stoppings for mine ventilation control , Timothy J. Batchler

Analysis of factors affecting wireless communication systems in underground coal mines , David P. McGraw

Analysis of underground coal mine refuge shelters , Mickey D. Mitchell

Theses/Dissertations from 2007

Dolomite flotation of high magnesium phosphate ores using fatty acid soap collectors , Zhengxing Gu

Evaluation of longwall face support hydraulic supply systems , Ted M. Klemetti II

Experimental studies of electromagnetic signals to enhance radio imaging method (RIM) , William D. Monaghan

Analysis of water monitoring data for longwall panels , Joseph R. Zirkle

Theses/Dissertations from 2006

Measurements of the electrical properties of coal measure rocks , Nikolay D. Boykov

Geomechanical and weathering properties of weak roof shales in coal mines , Hakan Gurgenli

Assessment and evaluation of noise controls on roof bolting equipment and a method for predicting sound pressure levels in underground coal mining , Rudy J. Matetic


PHD RESEARCH TOPIC IN DATA MINING

PHD RESEARCH TOPIC IN DATA MINING has come into the limelight recently because of its wide scope. The word "mine" refers to extracting something. Data mining involves extracting information from a database and transforming it into a more understandable structure. It is closely associated with Knowledge Discovery in Databases (KDD), of which it forms the analysis step. Data mining serves as a foundation in all major domains. People generally want to get exactly what they are looking for, and in today's world no one has the patience to go through unwanted information to reach what they need.

This creates the need for data mining, which has become a dominant domain across all fields. From entertainment to web browsers, the impact of data mining can be felt everywhere. It has also influenced cloud computing, robotics, and many recent topics such as real-time adaptive distributed mining. Each of these is a challenging field, which makes them popular research topics in data mining.

Data-mining

Data mining draws on models such as Bayesian networks and neural networks. Its tasks include anomaly detection, dependency modelling, clustering, classification, regression, and summarization. Each task requires advanced algorithms for better efficiency and results, for which one has to consult advanced journals. For students and scholars, we have extended this help by providing a separate portal for the latest research papers.
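Of the tasks listed above, anomaly detection is the simplest to illustrate. The sketch below uses a z-score rule, one common baseline rather than a method prescribed by this text; the function name, the threshold and the sample readings are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return (index, value) pairs whose |z-score| exceeds threshold.

    z = (x - mean) / population standard deviation
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [(i, x) for i, x in enumerate(values) if abs(x - mu) / sigma > threshold]


# Hypothetical sensor readings with one injected spike.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 60]
print(zscore_outliers(readings, threshold=2.5))  # prints [(9, 60)]
```

The lower threshold in the call is deliberate: with n values, a single point's population z-score can never exceed sqrt(n - 1), so with ten readings the usual threshold of 3 would flag nothing. The outlier also inflates the mean and standard deviation it is measured against, which is why robust variants based on the median are often preferred.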

Apart from research work, data mining also has day-to-day applications in fields such as retail, telecommunications, and security. People looking to do research in data mining can also try building their own software products. We have experts in all domains who can deliver the final implementation in any form (product or project). Scholars working on a PHD RESEARCH TOPIC IN DATA MINING can get all types of support from us.

RESEARCH ISSUES IN DATA-MINING:

Security
Privacy
Data integrity
Dealing with non-static, unbalanced, and cost-sensitive data
Information network analysis
Discovery, usage, and understanding of patterns and knowledge
Stream data mining
Mining moving-object data, RFID data, and data from sensor networks
Spatiotemporal and multimedia data mining
Mining text, Web, and other unstructured data
Data-cube-oriented multidimensional online analytical mining
Visual data mining
Data mining by integrating sophisticated scientific and engineering domain knowledge
Knowledge discovery: association, classification, clustering, regression, normalization, frequent pattern generation, and pattern discovery
Performing information filtering on the web
Finding genes in DNA sequences
Helping understand trends and anomalies in economics and education
Detecting network intrusion
Business and e-commerce data
Scientific, engineering, and health care data
Characterization, discrimination, and association rule mining
Security and data source issues
Mining methodology issues
User interface issues
Decision making
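Frequent pattern generation and association rule mining, listed above, rest on two measures: support (the fraction of transactions containing an itemset) and confidence (how often the right-hand item appears when the left-hand item does). The sketch below is a simplified, pair-only version of the Apriori idea; the function name, the thresholds and the basket data are hypothetical, and full Apriori repeats the candidate-and-prune step for larger itemsets.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Mine single-item rules A -> B from frequent item pairs, Apriori style.

    support(A, B)    = count(A and B) / number of transactions
    confidence(A->B) = count(A and B) / count(A)
    """
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    item_counts = Counter(item for basket in baskets for item in basket)
    # Apriori pruning: a pair can only be frequent if both of its items are frequent.
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter()
    for basket in baskets:
        pair_counts.update(combinations(sorted(basket & frequent), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # try the rule in both directions
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, count / n, confidence))
    return sorted(rules)


# Hypothetical market baskets.
baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"],
           ["milk"], ["bread", "milk"]]
for lhs, rhs, support, confidence in association_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Confidence is computed from integer counts rather than from two rounded support values, which keeps the ratios exact for small examples like this one.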


Software & tools

1) RapidMiner
2) WEKA
3) R-Programming
4) Orange
5) KNIME
6) NLTK
7) JHepWork
8) Angoss
9) IBM SPSS
10) Oracle
11) SAS Enterprise Miner
12) STATISTICA (StatSoft)
13) Pentaho
14) Tanagra
15) Apache Mahout
16) Rattle


Software & tools description

RapidMiner–> template-based framework written in Java.
WEKA–> provides visualization and algorithms for data analysis and predictive modeling.
R-Programming–> free software programming language for statistical computing and graphics.
Orange–> open-source data visualization and data analysis tool for interactive workflows.
KNIME–> open-source data analytics, reporting, and integration platform, used for data preprocessing.
NLTK–> provides a pool of language-processing tools covering data mining, machine learning, data scraping, sentiment analysis, etc.
JHepWork–> open-source data-analysis framework used to build a data-analysis environment from open-source packages.
Angoss–> graphical user interface for a data mining environment.
IBM SPSS–> data mining and text analytics software with predictive intelligence to support decisions.
Oracle–> part of Oracle's Relational Database Management System, Enterprise Edition.
SAS Enterprise Miner–> data mining components that work as standalone functions but can also work together.
STATISTICA–> statistics and analytics program available in single-user, multi-user, enterprise server, and enterprise small business editions.
Pentaho–> comprehensive platform for data integration, business analytics, and big data.
Tanagra–> free software for academic and research purposes.
Apache Mahout–> provides free implementations of distributed, scalable machine learning algorithms on the Hadoop platform.
Rattle–> provides statistical and visual summaries of data, transforms and models data, and scores new datasets.



COMMENTS

PDF The application of data mining methods
This thesis first introduces the basic concepts of data mining, such as the definition of data mining, its basic function, common methods and basic process, and two common data mining methods, classification and clustering. Then a data mining application in network is discussed in detail, followed by a brief introduction on data mining ...
(PDF) Implementation of Data Mining Techniques for ...
Part-II of the thesis is about Implementing Data Mining Techniques in finding the trends of celebrities death causes over the past decade. The database for training is created from the public and ...
A Study of Heart Disease Diagnosis Using Machine Learning and Data Mining
3) Machine Learning algorithms allowed us to analyze clinical data, draw relationships between diagnostic variables, design the predictive model, and test it against new cases. The predictive model achieved an accuracy of 89.4 percent using the RandomForest Classifier's default settings to predict heart disease.
Used Cars Price Prediction and Valuation using Data Mining Techniques
AlShared, Abdulla, "Used Cars Price Prediction and Valuation using Data Mining Techniques" (2021). Thesis. Rochester Institute of Technology. Accessed from This Master's Project is brought to you for free and open access by the RIT Libraries. For more information, please contact [email protected]. RIT Digital Institutional Repository
MASTER'S THESIS
Data mining is an area where computer science, machine learning and statistics meet ... 1.3 Project goal The goal with this thesis can be split up as following. Concept: Understand the concept of data analysis and how these methods can be applied on small specific problems such as structured data classification. Method: After the initial research ...
Data Mining
Data Mining. Data Science; Data and Artificial Intelligence ... Fingerprint; Network; Researchers (45) Projects (2) Research output (632) Datasets (4) Prizes (17) Activities (6) Press/Media (12) Student theses (258) Student theses 1 - 50 out of 258 results ... Student thesis: Master. File. Activity Recognition Using Deep Learning in Videos ...
PDF Data mining techniques applied in educational environments:
III. Implementing a Data Mining Project The data mining projects are implemented with the aim of discovering patterns of relevant and interesting information in large volumes. This is done with the development of four phases (Virseda Benito & Carrillo, 2008), which are usually: 1. Filtering data. 2. Selection of variables. 3. Extracting ...
PDF Data Mining in Macroeconomic Data Sets
KDD Project Thesis Data Mining in Macroeconomic Data Sets Ping Chen [email protected] Machine Learning Department School of Computer Science Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, PA 15213 . 2 Abstract National Economic Input-Output (EIO) data describes the monetary transactions among
PDF Data mining in medical diagnostic support system
1 Overview of the thesis topic 1.1 Introduction to data mining Nowadays, most industrial and business fields are applying information technology to the storage and processing data, this has created a very large amount of data that stored and increased constantly. It is a good chance for mining the data in warehouse to provide use-
Dissertations / Theses: 'Data mining'
This thesis presents a data mining methodology for this problem, as well as for others in domains with similar types of data, such as human activity monitoring. It focuses on the variable selection stage of the data mining process, where inputs are chosen for models to learn from and make inferences. ... This project is an extension of ...
Open Theses
Open Topics We offer multiple Bachelor/Master theses, Guided Research projects and IDPs in the area of data mining/machine learning. A non-exhaustive list of open topics is listed below.. If you are interested in a thesis or a guided research project, please send your CV and transcript of records to Prof. Stephan Günnemann via email and we will arrange a meeting to talk about the potential ...
(PDF) Data mining techniques and methodologies
The SFB data set is a text-based dataset and data pre-processing and cleaning is a challenging task in Text and Data Mining (TDM) and Machine Learning (ML) [1], [2]. TDM is a cycle of finding ...
Data Mining Project Ideas & Thesis Topics
Data mining involves exploring and analyzing large volumes of data in order to find the patterns followed, hidden correlations, trends and an understanding of the project. This follows some special statistical and computational techniques for collecting information from such a big dataset and also in prediction, decision making and ...
Educational data mining: prediction of students' academic performance
Educational data mining has become an effective tool for exploring the hidden relationships in educational data and predicting students' academic achievements. This study proposes a new model based on machine learning algorithms to predict the final exam grades of undergraduate students, taking their midterm exam grades as the source data. The performances of the random forests, nearest ...
50 selected papers in Data Mining and Machine Learning
Active Sampling for Feature Selection, S. Veeramachaneni and P. Avesani, Third IEEE Conference on Data Mining, 2003. Heterogeneous Uncertainty Sampling for Supervised Learning, D. Lewis and J. Catlett, In Proceedings of the 11th International Conference on Machine Learning, 148-156, 1994. Learning When Training Data are Costly: The Effect of ...
Latest Research and Thesis topics in Data Mining
Topics to study in data mining. Data mining is a relatively new thing and many are not aware of this technology. This can also be a good topic for M.Tech thesis and for presentations. Following are the topics under data mining to study: Fraud Detection. Crime Rate Prediction.
Trending Data Mining Thesis Topics
Integration of MapReduce, Amazon EC2, S3, Apache Spark, and Hadoop into data mining. These are the recent trends in data mining. We insist that you choose one of the topics that interest you the most. Having an appropriate content structure or template is essential while writing a thesis.
82 Data Mining Essay Topic Ideas & Examples
Commercial Uses of Data Mining. Data mining process entails the use of large relational database to identify the correlation that exists in a given data. The principal role of the applications is to sift the data to identify correlations. A Discussion on the Acceptability of Data Mining.
Latest Thesis Topics in Data Mining
Extracting beneficial and relevant data is the major aim of choosing any data set. Step 3 - Preparation of data. You need to prepare and process the data after extracting it. Step 4 - Data modeling. Data modeling and remodelling based on the user requirement is the fourth step. Step 5 - Evaluation.
DATA MINING STUDENT PROJECTS
Data Mining Student Projects, a way to explore your knowledge with the help of top experts and versatile developers. Data mining involves mining information from large datasets. We also offer you knowledge mining, i.e. mining the best and most innovative ideas for your project from our experts. And we are also working as a backbone for the students ...
Ethiopia To Become The First African Country To Start Bitcoin Mining
On Feb 15, the Ethiopian Investment Holdings (EIH) signed an MoU with West Data Group's Center Service PLC for a $250m data mining project which includes bitcoin mining.
Thesis grows M+I gold equivalent to 4 million oz. at Lawyers-Ranch project
Credit: Thesis Gold. Thesis Gold (TSXV: TAU) has released new resource numbers for its 100%-owned Lawyers-Ranch gold project in the Toodoggone mining district in northern British Columbia. The ...
Savannah could request compulsory land acquisitions for Portuguese
The company requires around 840 hectares for its four-mine project in the Barroso region, but according to data from September 2023, it had acquired or was in process of acquiring just 93 hectares.