WHU | Logo

Workshop: Data Preparation for Analytics

Course code:
Course type: MSc Course
Weekly hours:
Semester: FS 2024
Lecturer: Dr. Tobias Keller
Please note that exchange students obtain a higher number of credits in the BSc program at WHU than listed here. For further information, please contact the International Relations Office directly.

Course Objectives

“The world’s most valuable resource is no longer oil, but data” – The Economist, May 2017.

Data has the potential to create immense business value, to disrupt existing business models, and to create new ones. But like oil, data needs refinement to fulfill this potential! The time required to get data into shape for machine learning and artificial intelligence algorithms, or for statistical analysis, is often underestimated. Moreover, introductions to data science typically focus on methods and algorithms and give the necessary data preparation too little attention.

This workshop enables students to go beyond the unrealistically clean datasets provided in data science and machine learning tutorials. Instead, students learn to handle data as they would encounter it in real-life business situations, where errors, inconsistencies, incomplete records, duplicates, and many other problems are commonplace. They learn how to combine data from different sources and how to perform computations, aggregations, and other typical data preparation steps efficiently. Finally, students are introduced to the special preprocessing steps required for machine learning.
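To make the kinds of problems named above concrete, here is a minimal sketch in plain Python (standard library only). It harmonizes inconsistent labels, converts types, marks missing or malformed values, and removes duplicates. The sample data, the `COUNTRY_ALIASES` mapping, and the `parse_revenue` and `clean` functions are hypothetical illustrations, not course material.

```python
import csv
import io
from datetime import date

# Hypothetical raw export showing typical real-life problems: inconsistent
# labels and whitespace, a missing value, a malformed number, a duplicate row.
RAW = """customer_id,country,revenue,order_date
101, germany ,1200.50,2024-01-11
102,Germany,,2024-01-12
103,GER,abc,2024-01-12
101, germany ,1200.50,2024-01-11
104,France,830,2024-02-02
"""

# Hypothetical mapping that harmonizes inconsistent country labels.
COUNTRY_ALIASES = {"germany": "Germany", "ger": "Germany", "france": "France"}


def parse_revenue(text):
    """Convert to float; return None for missing or malformed values."""
    text = text.strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def clean(raw_csv):
    """Parse, normalize and de-duplicate the raw CSV text."""
    seen = set()
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        country = rec["country"].strip().lower()
        row = {
            "customer_id": int(rec["customer_id"]),
            "country": COUNTRY_ALIASES.get(country, country.title()),
            "revenue": parse_revenue(rec["revenue"]),
            "order_date": date.fromisoformat(rec["order_date"].strip()),
        }
        # Detect duplicates after normalization, so rows that differ only in
        # spelling or whitespace are caught as well.
        key = tuple(row.values())
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows


rows = clean(RAW)
for row in rows:
    print(row)
```

Mapping malformed values such as `abc` to `None` keeps the sketch short. However, it hides the difference between "missing" and "unparseable". In real projects you may prefer to flag or log such values separately.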

Completing this course gives students an edge in a labor market where most newcomers have little experience with real-life datasets. This holds especially for those aiming for a career in consulting or in other areas related to data science and artificial intelligence.

This course is also an ideal complement for students taking the courses “Managing Data Science” and “Visual Data Analysis”.

Course Contents

The course covers the typical data preparation techniques required for analytics:

  • Loading and joining data from different types of data sources
  • Data types and conversions
  • Filtering
  • Computations
  • Aggregations
  • Pivoting / reshaping
  • Handling inconsistencies and errors in the data
  • Time series operations
  • Special preprocessing operations for machine learning
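Three of the listed steps can be sketched in a few lines of plain Python: joining data from two sources, aggregating, and pivoting to a wide table. This is a minimal illustration under stated assumptions. The course description does not name the libraries it uses, and the data and the `left_join`/`pivot_sum` helpers are hypothetical. In practice these steps are typically done with a library such as pandas (`DataFrame.merge`, `DataFrame.pivot_table`).

```python
from collections import defaultdict

# Hypothetical data from two sources: an order log and a customer master table.
orders = [
    {"customer_id": 101, "month": "2024-01", "revenue": 1200.5},
    {"customer_id": 102, "month": "2024-01", "revenue": 300.0},
    {"customer_id": 101, "month": "2024-02", "revenue": 150.0},
    {"customer_id": 104, "month": "2024-02", "revenue": 830.0},
    {"customer_id": 999, "month": "2024-02", "revenue": 50.0},  # unknown customer
]
customers = {101: "Germany", 102: "Germany", 104: "France"}


def left_join(rows, lookup, key, new_field):
    """Add lookup[row[key]] to every row; unmatched keys get None (left join)."""
    return [{**row, new_field: lookup.get(row[key])} for row in rows]


def pivot_sum(rows, index, columns, values, fill_value=0.0):
    """Sum `values` per (index, columns) pair and reshape to wide format."""
    cells = defaultdict(float)
    for row in rows:
        cells[(row[index], row[columns])] += row[values]
    row_keys = list(dict.fromkeys(row[index] for row in rows))  # first-seen order
    col_keys = sorted({row[columns] for row in rows})
    # Cells with no data get fill_value; .get() does not trigger the defaultdict.
    return {
        rk: {ck: cells.get((rk, ck), fill_value) for ck in col_keys}
        for rk in row_keys
    }


joined = left_join(orders, customers, "customer_id", "country")
wide = pivot_sum(joined, index="country", columns="month", values="revenue")
for country, by_month in wide.items():
    print(country, by_month)
```

The sketch keeps the unmatched order as a `None` row instead of silently dropping it, as an inner join would. This makes gaps between the sources visible, and checking for such rows after every join is a cheap safeguard.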

Depending on questions and suggestions raised during the workshop, and on the pace of the group, we may emphasize or skip topics.

Dates and times:
  • Thursday, 11.01.2024, 09:45 - 17:00
  • Friday, 02.02.2024, 09:45 - 17:00
  • Thursday, 08.02.2024, 09:45 - 17:00
The course consists of hands-on tutorials and exercises in Python. Participants learn through examples and exercises drawn from the instructor’s experience as a data scientist, both in practice and in empirical research. Students work on their own computers; please see the requirements below for the software that needs to be installed beforehand.
Active participation