[ONLINE] PATC: Introduction to Big Data Analytics

Date: 01/Feb/2021 Time: 09:30 - 05/Feb/2021 Time: 13:00

Place: The course will be held online via Zoom.

Target group: For trainees with some theoretical and practical knowledge.

Cost: There is no registration fee. The course is free of charge.


Day 1 (Feb 1st)

9:30 – 13:00 Introduction to Big Data (David Carrera, Data Centric Computing Group, BSC)
The goal of this session is to introduce students to the technologies associated with Big Data: data challenges, cloud computing, processing, and the Internet of Things. An overview of the technologies will be provided, from both a technical and a business model point of view.
11:00 - 11:30 Coffee break

13:00 – 14:00 Lunch Break

 

14:00 – 16:00 Practical Data Analytics for Solving Real World Problems (Patricio Reyes, Researcher, BSC; Maria Teresa Grifa, Data Scientist, Bridgestone EMA)
Data analytics has changed the way we make decisions. The integration of advanced data analytics has brought benefits and advances to many fields, ranging from financial to medical and industrial applications. In this session we will share practical tips gained through our experience at BSC in data analytics projects. We will also discover how to overcome some of the most challenging tasks in practical data analytics.

16:00 – 16:30 Coffee break

16:30 – 18:00 Hands-on (Patricio Reyes, Researcher, BSC; Maria Teresa Grifa, Data Scientist, Bridgestone EMA)
In this session you will learn how to structure a data analytics project, by following the methodology and the concepts introduced in the previous session. We will guide you through a step-by-step process to set up data science projects and start collaborating with the members of a team.

Day 2 (Feb 2nd)

9:30 – 13:00 Big Data Management (Albert Abelló, UPC, inLab FIB and Petar Jovanovic, UPC)
Big Data has many definitions and facets. We'll pay attention to the problems we face when storing it and to how we can process it. More specifically, we'll focus on the Apache Hadoop ecosystem and its two basic components, namely HBase and the MapReduce engine.
11:00 - 11:30 Coffee break
Hands-on exercise
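
As a rough illustration of the MapReduce processing model mentioned in the abstract above, the following is a minimal single-process sketch in plain Python. It is not taken from the course material and does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example. It shows the three conceptual steps: map each input record to key/value pairs, group the pairs by key, and reduce each group to a single value.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big deal"]))
```

In a real cluster the map and reduce steps would run in parallel on different nodes and the shuffle would move data across the network; this sketch only mirrors the data flow.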

13:00 – 14:00 Lunch Break

14:00 - 16:00 NoSQL databases (Oscar Romero, Dept. of Service and Information System Engineering, UPC-BarcelonaTech)
The relational model has dominated data storage systems since the mid-1970s. However, the changing storage needs of the past decade have given rise to new models for storing data, collectively known as NoSQL. In this presentation, we will focus on two of the most common types of NoSQL databases, document-oriented databases and graph databases, and explain the use cases suitable for each of them.
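
To give a feel for the difference between the two data models named above, here is a small plain-Python sketch. It is not part of the course material and does not use any real database; the sample data (`orders`, `follows`) and the helper names (`find_by_city`, `reachable_within`) are made up for illustration. The document model keeps each record as a self-contained nested structure queried by field, while the graph model stores relationships and is queried by traversal.

```python
from collections import deque

# Document model (hypothetical data): each order is a self-contained
# nested document, so one lookup returns everything about the order.
orders = [
    {"_id": 1, "customer": {"name": "Ana", "city": "Barcelona"},
     "items": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "customer": {"name": "Bo", "city": "Madrid"},
     "items": [{"sku": "B7", "qty": 1}]},
]


def find_by_city(docs, city):
    """Return the ids of documents whose nested customer.city matches."""
    return [d["_id"] for d in docs if d["customer"]["city"] == city]


# Graph model (hypothetical data): who follows whom, as an adjacency list.
follows = {"Ana": ["Bo"], "Bo": ["Cy"], "Cy": [], "Di": ["Ana"]}


def reachable_within(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable from start in <= max_hops."""
    seen = {start}
    result = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                result.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return result


if __name__ == "__main__":
    print(find_by_city(orders, "Barcelona"))
    print(sorted(reachable_within(follows, "Ana", 2)))
```

Field lookups on whole records suit the document model, whereas multi-hop relationship queries such as "who is within two hops of Ana" are the kind of workload graph databases are designed for.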

16:00 - 16:30 Coffee break

16:30 - 18:00 Multidisciplinary research and data analytics: Cultural Heritage (Maria Cristina Marinescu / Joaquim More / Artem Rashetnikov, Computer Applications in Science & Engineering, BSC)
This session will focus on Cultural Heritage as an example of a field that can really take advantage of integrating, analyzing, and reasoning with large amounts of data from many heterogeneous sources. We will explain how to improve the quality and quantity of open metadata associated with European Cultural Heritage (CH) imagery, starting (mostly) from images of paintings and text. Our ultimate goal is to transcribe insights about culture, symbols and traditions in a knowledge representation accessible to machine learning and artificial intelligence.
 

Day 3 (Feb 3rd)

9:30 – 13:00 Data Analytics with Apache Spark (Josep Lluis Berral, Computer Sciences - Data Centric Computing, BSC)
Apache Spark has become a consolidated technology for fast, general-purpose large-scale processing, with “programmer-friendly” interfaces and official bindings for many of the most widely used languages (Java, Scala, Python and R), extensive documentation and development tools. This session introduces Apache Spark, as well as some of its core libraries for data manipulation, machine learning, data streams and graph analytics.
11:00 - 11:30 Coffee break

13:00 – 14:00 Lunch Break

14:00 – 15:30 Data Analytics with Apache Spark. Part 2 (Josep Lluis Berral, Computer Sciences - Data Centric Computing, BSC)

15:30 – 17:00 IoTwins: Modelling Mobility with Massive Amounts of Data (an H2020 European Project) (Eduardo Graells, Mobility Data Scientist, BSC)
What decisions do people make when moving in and out of places? Having an answer would allow us to design and build better and safer places to congregate and enjoy, and to make efficient use of space. In IoTwins we aim to answer this question by studying how people move in the Camp Nou stadium, through the analysis of massive amounts of data coming from sensors and mobile platforms, and the use of machine learning models and agent-based simulations.

Day 4 (Feb 4th)

9:30 – 13:00 Practical Introduction to Programming Artificial Intelligence (Jordi Torres, Emerging Technologies for Artificial Intelligence Group Manager - Computer Sciences, BSC)

ABSTRACT: The next generation of Artificial Intelligence applications requires new and demanding computing infrastructures. What do the computer systems that support artificial intelligence look like? How are they programmed?

CONTENT:

Artificial Intelligence is a Supercomputing Problem

Programming Artificial Intelligence

  • Getting Started with Deep Learning
  • Deep Learning basic concepts
  • Learning Process of a Deep Neural Network

Scaling Artificial Intelligence applications

  • Scalable AI on Parallel and Distributed Infrastructures
  • Training on Multiple GPUs
  • Training on Multiple Servers

(*) Essential prerequisites to enroll in this course: It is assumed that the student has a basic knowledge of Python and Linux before starting the course.

 

Day 5 (Feb 5th)

9:30 – 13:00 Data Visualization Theory (Luz Calvo, User Experience And Interaction Designer, BSC and Juan Felipe Gomez Celis, FrontEnd Developer, BSC)

Data Visualization Theory (1h 30m)

  1. Basic concepts
  2. Human perception
  3. Design
  4. Colour
  5. Audience / Validation / Bad practices
  6. Visualisation design process

[11:00 - 11:30 Coffee break]

Tools for data visualization (30m)

  1. Tableau
  2. Datawrapper
  3. RAWGraphs
  4. Flourish

D3.js (1h 30m)

  1. D3.js basics (theory)
  2. Case studies

 

END OF COURSE