PATC Course: Big Data Analytics
Date: 08/Feb/2016 Time: 09:00 - 11/Feb/2016 Time: 18:00
Place: The course will take place in
Barcelona Supercomputing Centre,
within the UPC Campus Nord premises.
Room VX208, Vertex building
Target group: Level: INTERMEDIATE, for trainees with some theoretical and practical knowledge. (All courses are designed for specialists with at least a 1st cycle degree or similar background experience.)
Cost: There is no registration fee. Attendees need to cover their own travel, accommodation and meal expenses.
Day 1 08/02: Introduction (Vassil Alexandrov)
Session 1: 9:30am – 1pm
Session 2: 2pm – 6pm
Day 2 09/02:
Session 1: 9:30am – 1pm Data sharing (Anna Queralt)
Coffee break 11:00 – 11:30
Session 2: 2pm – 5pm Data analytics with Apache Spark - part 1 (Mario Macias)
Day 3 10/02
Session 1: 9:30am – 1pm Data analytics with Apache Spark - part 2 (Mario Macias)
Session 2: 2pm – 6pm (Jordi Torres)
Day 4 11/02:
Session 1: 9:30am – 1pm (Alberto Abello)
Coffee break 16:00 – 16:30
Day 1 08/02: Introduction (Vassil Alexandrov)
Session 1: 9:30am – 1pm
- Data Science current trends: this session will focus on the results of the latest key studies in the area of Data Science, both in Europe and the USA, and will outline the major trends, findings and recommendations.
- Introduction to Data Science definitions and mathematical foundations.
Session 2: 2pm – 6pm
- This session will focus on several key methods and algorithms (both serial and parallel) that make it possible to discover global properties of data when dealing with Big Data:
- Network Science
- Multi Constrained and Multi-Objective Optimization
- Examples using the above approaches and a hands-on exercise
- Social Simulation Applications (Josep Casanovas)
Day 2 09/02:
Session 1: 9:30am – 1pm Data sharing (Anna Queralt)
- In this session we will provide an overview on current Open Data and data sharing approaches.
Coffee break 11:00 – 11:30
- Hands-on exercise
Session 2: 2pm – 5pm Data analytics with Apache Spark - part 1 (Mario Macias)
In recent years, Apache Spark has emerged as one of the most promising technologies for fast, general-purpose, large-scale data processing, with “programmer-friendly” interfaces, official bindings for many of the most widely used languages (Java, Scala, Python and R), extensive documentation and development tools. In addition, it can outperform other MapReduce engines by 10x to 100x. This course introduces Apache Spark, as well as some of its core libraries for data manipulation, machine learning, graph analytics, etc.
- Introduction to the core concepts of Apache Spark: RDDs and Basic Data Access.
- Hands-on: get the most frequent term from a text.
- Processing semi-structured data with Spark SQL.
- Hands-on: statistical processing of data sheets.
- Multidisciplinary research and data analytics: Smart Cities (Maria Cristina Marinescu)
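As a rough preview of the first Spark hands-on listed above (getting the most frequent term from a text), the following is a minimal sketch in plain Python, not the course material itself. It mimics the flatMap, map and reduceByKey steps that an RDD-based solution would typically chain together. The tokenizer, the function names and the sample lines are assumptions made only for illustration.

```python
from collections import Counter
from functools import reduce
import re


def tokenize(line):
    """Split a line into lowercase word tokens (a simple, assumed tokenizer)."""
    return re.findall(r"[a-z']+", line.lower())


def most_frequent_term(lines):
    """Return (term, count) for the most frequent term, or None for empty input."""
    # "flatMap" step: one flat stream of terms from all lines
    terms = (term for line in lines for term in tokenize(line))
    # "map" step: pair each term with a count of 1
    pairs = ((term, 1) for term in terms)

    # "reduceByKey" step: sum the counts per term
    def add_pair(counts, pair):
        term, n = pair
        counts[term] += n
        return counts

    counts = reduce(add_pair, pairs, Counter())
    # Final step: pick the term with the highest count
    return max(counts.items(), key=lambda kv: kv[1]) if counts else None


# Hypothetical sample text, used only to exercise the function
sample = [
    "big data needs big tools",
    "spark processes big data",
]
print(most_frequent_term(sample))  # prints ('big', 3)
```

In Spark itself, the same three steps would run in parallel across partitions of the input rather than in a single Python process.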
Day 3 10/02
Session 1: 9:30am – 1pm Data analytics with Apache Spark - part 2 (Mario Macias)
- Machine learning with Spark ML.
- Hands-on: clustering images according to their tags.
Session 2: 2pm – 6pm (Jordi Torres)
- Hello World in TensorFlow
- Hands-on exercises: beginning with basic machine learning models before moving on to a deep neural network, you will try out programming concepts as you learn them.
Day 4 11/02:
Session 1: 9:30am – 1pm (Alberto Abello)
- Big Data Management
- Big Data has many definitions and facets. We will pay attention to the problems we face in storing it and to how we can process it. More specifically, we will focus on the Apache Hadoop ecosystem and its two basic components, namely HBase and the MapReduce engine.
- Hands-on exercise
- Data Visualisation
Coffee break 16:00 – 16:30
- Hands-on exercise