M20773 Analyzing Big Data with Microsoft R
This 3-day instructor-led course teaches students to use Microsoft R Server to create and run an analysis on a large dataset, and shows how to use it in Big Data environments, such as a Hadoop or Spark cluster, or a SQL Server database.
Accredited course for Continuing Education of Pedagogical Staff
Course length: 3 days
List price: 18 200 CZK (without VAT)
This course has no dates set. If you are interested in setting a new one, please contact
Students will be able to
- Explain how Microsoft R Server and Microsoft R Client work.
- Use R Client with R Server to explore big data held in different data stores.
- Visualize data by using graphs and plots.
- Transform and clean big data sets.
- Implement options for splitting analysis jobs into parallel tasks.
- Build and evaluate regression models generated from big data.
- Create, score, and deploy partitioning models generated from big data.
- Use R in the SQL Server and Hadoop environments.
Knowledge of common statistical methods and data analysis best practices. Basic knowledge of the Microsoft Windows operating system. Working knowledge of relational databases.
This course is intended for
This course is intended for people who wish to analyze large datasets within a big data environment and for developers who need to integrate R analyses into their solutions.
All participants will get original Microsoft student materials.
Classrooms are equipped with high-performance computers with Internet access and the possibility of wireless connection.
Module 1: Microsoft R Server and R Client
- Lesson: What is Microsoft R Server
- Lesson: Using Microsoft R Client
- Lesson: The ScaleR functions
- Lab: Exploring Microsoft R Server and Microsoft R Client
Module 2: Exploring Big Data
- Lesson: Understanding ScaleR data sources
- Lesson: Reading data into an XDF object
- Lesson: Summarizing data in an XDF object
- Lab: Exploring Big Data
Module 3: Visualizing Big Data
- Lesson: Visualizing in-memory data
- Lesson: Visualizing big data
- Lab: Visualizing data
Module 4: Processing Big Data
- Lesson: Transforming Big Data
- Lesson: Managing datasets
- Lab: Processing big data
Module 5: Parallelizing Analysis Operations
- Lesson: Using the RxLocalParallel compute context with rxExec
- Lesson: Using the RevoPemaR package
- Lab: Using rxExec and RevoPemaR to parallelize operations
Module 6: Creating and Evaluating Regression Models
- Lesson: Clustering Big Data
- Lesson: Generating regression models and making predictions
- Lab: Creating a linear regression model
Module 7: Creating and Evaluating Partitioning Models
- Lesson: Creating partitioning models based on decision trees
- Lesson: Test partitioning models by making and comparing predictions
- Lab: Creating and evaluating partitioning models
Module 8: Processing Big Data in SQL Server and Hadoop
- Lesson: Using R in SQL Server
- Lesson: Using Hadoop Map/Reduce
- Lesson: Using Hadoop Spark
- Lab: Processing big data in SQL Server and Hadoop