Data Mining
Students are given comprehensive coverage of the problems of data representation, manipulation and processing involved in extracting information from data, including big data. They will apply their working understanding to a data primer, data processing and classification. They will also become more familiar with dynamic data space partitioning using evolving clustering and data clouds. Furthermore, they will learn to monitor the quality of self-learning systems online.
Students will also gain the ability to develop software scripts that implement advanced data representation and processing, and to demonstrate the impact of these techniques on performance. In addition, they will develop a working knowledge of how to list, explain and generalise performance trade-offs. They will also understand the complexity of designing practical solutions to data representation and processing problems in terms of storage, time and computing power.
Data Science Fundamentals
This module will help you understand what the data science role entails and how a data scientist performs their job within an organisation on a day-to-day basis. Students will look at how research is performed, from formulating a hypothesis to implementing research findings, and will get to know different research strategies. They will gain an understanding of data processing, preparation and integration, and of how these enable research to be performed. Furthermore, they will learn how data science problems are tackled in an industrial setting and how the findings are communicated to people within the organisation.
Programming for Data Scientists
This module is designed both for students who are completely new to programming and for experienced programmers, and brings both groups to a high level of skill in handling complex data science problems. Beginners will learn the fundamentals of programming, while experienced students will have the opportunity to sharpen and further develop their programming skills. Students will learn data-processing techniques, including visualisation and statistical data analysis. To prepare them for the most complex data science tasks, the module also covers problem solving and the development of graphical applications. Two open-source programming languages are used: R and Python.
Statistical Foundations
This module motivates the use of statistical modelling as a tool for making inferences about a population from a sample of data. Students will be introduced to the basic terminology of statistical modelling. The similarities and differences between statistical and machine learning approaches will be discussed, laying the foundations for developing both approaches over the remaining core modules. The module covers the concepts of sampling uncertainty, statistical inference and model fitting. Sampling uncertainty is used to motivate the need for standard errors and confidence intervals. Once these core concepts have been established, linear regression and generalised linear models, two essential statistical modelling tools, will be introduced. Students will develop their understanding of these models by implementing them in the statistical software package R.
Statistical Learning
This module provides an introduction to statistical learning. General topics covered include big data, missing data and biased samples. Likelihood and cross-validation will be introduced as generic methods for fitting and selecting statistical learning models. Cross-validation requires an understanding of how data are split into calibration, training and validation samples. The focus then moves to handling regression problems for large data sets via variable reduction methods such as the Lasso and the Elastic Net. A variety of classification methods will be covered, including logistic and multinomial logistic models, regression trees, random forests, bagging and boosting. The study of classification methods culminates in neural networks, which are presented as extensions of generalised linear models. Unsupervised learning for big data is then covered, including K-means, PAM and CLARA, followed by mixture models and latent class analysis.
Optional Modules
Applied Data Mining
This module provides students with up-to-date information on current applications of data in both industry and research. It builds on the module ‘Data Mining’. Students will gain a more detailed understanding of how data is processed and applied at scale across a variety of areas.
Students will develop knowledge of different areas of science and recognise their relation to big data, in addition to understanding how large-scale challenges are being addressed with current state-of-the-art techniques. The module covers recommendation on the Social Web and its roots in social network theory and analysis. Topics include an introductory primer; user-generated content and crowd-sourced data; social networks (theories and analysis); and recommendation (collaborative filtering, content recommendation challenges, and friend recommendation/link prediction).
On completion of this module, students will be able to create scalable solutions to problems involving data from the semantic, social and scientific web. They will also be able to process networks and perform network analysis in order to identify key factors in information flow.
Building Big Data Systems
In this module we explore the architectural approaches, techniques and technologies that underpin today’s Big Data systems. The emphasis is on infrastructure, particularly large-scale enterprise systems. The module provides broad knowledge and context of systems architecture, enabling students to assess new systems technologies and to understand where they fit in the larger scheme of enterprise systems and state-of-the-art research thinking.
The module focuses on the principles of Big Data systems, which students apply using state-of-the-art technology to engineer and lead data science projects. Detailed case studies and invited industry speakers provide real-world context and form the basis for interactive seminar discussions.
Intelligent Data Analysis and Visualisation
The module provides an introduction to the fundamental methods and approaches from the interrelated areas of data mining, statistical/machine learning and intelligent data analysis. It covers the entire data analysis process: formulating a project objective, developing an understanding of the available data and other resources, statistical modelling, and performance assessment. The focus is on classification, and the module uses the R programming language.
Optimisation and Heuristics
Optimisation, sometimes called mathematical programming, has applications in many fields, including operational research, computer science, statistics, finance, engineering and the physical sciences. Commercial optimisation software is now capable of solving many industrial-scale problems to proven optimality.
The module is designed to enable students to apply optimisation techniques to business problems. Building on the introduction to optimisation in the first term, it introduces students to different problem formulations and algorithmic methods that can guide decision making in business and other organisations.
MSc Data Science Dissertation
The dissertation begins with students selecting an industry or research partner, with whom they undertake a placement in June-July. They then submit a written dissertation of 20,000-30,000 words in early September.
This is a self-study module designed to provide the foundation for the main dissertation, which is expected to be at a level considered to be of publishable quality. The project also gives students the opportunity to apply their technical skills and knowledge to current world-class research problems and to develop expert knowledge in a specific area.
The project topic varies from student to student, depending on the student's interests and the availability of research staff and industry partners.