WHU | Logo

Web Mining and Cognitive Computing - (B-E-F-M)

Course code
Course type
MSc Course
Weekly Hours
FS 2024
Martin Prause
Please note that exchange students obtain a higher number of credits in the BSc program at WHU than listed here. For further information, please contact the International Relations Office directly.

Various commentators have claimed that “data is the new oil” (Economist, 2017). This analogy typically compares the economic impact of data on industry and society today with that of oil during the second industrial revolution. However, while data could be the driving force of future economic growth, its characteristics, and the skills needed to refine it into economic value, are entirely different from those that mattered during the second industrial revolution.

First, data comes in numerous shapes (variety, e.g., structured and unstructured), sizes (volume) and speeds (velocity, e.g., real-time data), which gave rise to the term Big Data. Second, while data can be replicated at almost zero cost (no transportation cost), the cost of creating and aggregating meaningful data can be substantial. Third, data per se has no economic value: extracting information or knowledge from it requires additional analytical techniques. Fourth, and finally, the use of data raises new problems concerning privacy, ownership and trade regulations.

This course focuses on methods for aggregating textual, audio-visual and numerical data of different types from different sources and for processing that data to extract valuable information from it. Typical use cases are:
1) Understanding the structure of the web as a distributed network using various protocols and standards (HTTP, SOCKS, REST, …).
2) Automatic news extraction from websites (web crawling), such as newspaper websites, data services and social networks.
3) Text analysis of PDF documents using natural language processing for classification, sentiment analysis, semantic analysis and topic modeling.
4) Cognitive data processing for visual and audio analysis for image and video classification, face and gesture recognition, voice and music pattern recognition.
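
As a rough illustration of the news-extraction use case, the following minimal sketch collects headlines and link targets from an HTML page using only the Python standard library. The sample page, the choice of `<h2>` as the headline element, and the names `HeadlineParser` and `extract_news` are assumptions made for this example; a real crawler would download pages over HTTP and would need site-specific rules.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every <h2> element and the target of every link."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self.links = []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            # Start a new, empty headline that handle_data will fill in.
            self._in_h2 = True
            self.headlines.append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.headlines[-1] += data


# Hypothetical page content; a real crawler would fetch this over HTTP.
SAMPLE_HTML = """
<html><body>
  <h2><a href="/news/1">Markets rally</a></h2>
  <h2><a href="/news/2">New data rules proposed</a></h2>
</body></html>
"""


def extract_news(html):
    """Return (headlines, links) found in the given HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines], parser.links


headlines, links = extract_news(SAMPLE_HTML)
print(headlines)
print(links)
```

The extracted links could then be queued for further crawling, which is the basic loop behind most web crawlers.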

The learning objectives are:
-Understanding the structure of the web as a distributed network using various protocols and standards (HTTP, SOCKS, REST, …).
-Analyzing and parsing different document exchange formats such as HTML, XML, and JSON using regular expressions.
-Retrieving web resources using Python and storing the data in relational databases or NoSQL databases.
-Working with and storing large amounts of data using cloud services.
-Parsing PDF and Word documents.
-Working with unstructured data such as videos, images and audio data from social networks and extracting semantic information from it.
-Applying natural language processing techniques for the classification and semantic analysis of documents.
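
Several of the objectives above can be combined in a small sketch: extracting a field from HTML with a regular expression, parsing a JSON document, and storing the result in a relational database. The sample documents, the `articles` table and the function names are hypothetical and chosen only for illustration; SQLite stands in here for whichever relational database is actually used.

```python
import json
import re
import sqlite3

# Hypothetical inputs standing in for a downloaded page and an API response.
HTML_DOC = "<html><head><title>Data is the new oil</title></head><body></body></html>"
JSON_DOC = '{"articles": [{"id": 1, "title": "Big Data"}, {"id": 2, "title": "Web Mining"}]}'


def extract_title(html):
    """Pull the <title> text with a regular expression.

    This works for simple pages but is fragile for arbitrary HTML.
    """
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def store_articles(conn, json_text):
    """Parse the JSON text and insert its articles; return the number of rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (id INTEGER PRIMARY KEY, title TEXT)"
    )
    rows = [(a["id"], a["title"]) for a in json.loads(json_text)["articles"]]
    conn.executemany("INSERT INTO articles (id, title) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
page_title = extract_title(HTML_DOC)
stored = store_articles(conn, JSON_DOC)
titles = [row[0] for row in conn.execute("SELECT title FROM articles ORDER BY id")]
print(page_title, stored, titles)
```

Parameterized queries (`?` placeholders) are used for the inserts so that scraped text cannot alter the SQL statement.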

Date Time
Tuesday, 12.03.2024 13:45 - 18:45
Thursday, 14.03.2024 15:30 - 20:30
Wednesday, 03.04.2024 15:30 - 20:30
Friday, 05.04.2024 15:30 - 20:30
Wednesday, 10.04.2024 15:30 - 20:30
1) “Python for Everybody”: https://www.py4e.com/book
2) Witten, I. H.; Frank, E. (2005): Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann Publishers.
3) Bird, S.; Klein, E.; Loper, E. (2009): Natural Language Processing with Python. O’Reilly Media.
-Pre-course self-study online program to learn Python
-Follow-along walkthroughs of the code examples
-Coding exercises
-Live data / real-world data analysis
Group Assignments (50%)
Individual case study (50%)