Databricks Fundamentals & Apache Spark Core

Learn how to process big data using Databricks and Apache Spark 2.4 and 3.0.0 with the DataFrame API and Spark SQL
What you’ll learn

  • Databricks

  • Apache Spark Architecture

  • Apache Spark DataFrame API

  • Apache Spark SQL

  • Selecting and manipulating columns of a DataFrame

  • Filtering, dropping, and sorting rows of a DataFrame

  • Joining, reading, writing and partitioning DataFrames

  • Aggregating DataFrame rows

  • Working with User Defined Functions

  • Using the DataFrameWriter API
Requirements
  • Basic Scala knowledge
  • Basic SQL knowledge
Description

Welcome to this course on Databricks and Apache Spark 2.4 and 3.0.0.

Apache Spark is a big data processing framework that distributes work across a cluster of machines.
In this course, we will learn how to write Spark Applications using Scala and SQL.

Databricks is a company founded by the original creators of Apache Spark.
Databricks offers a managed and optimized version of Apache Spark that runs in the cloud.

The main focus of this course is to teach you how to use the DataFrame API & SQL to accomplish tasks such as:

  • Write and run Apache Spark code using Databricks
  • Read and Write Data from the Databricks File System – DBFS
  • Explain how Apache Spark runs on a cluster with multiple Nodes

Use the DataFrame API and SQL to perform data manipulation tasks such as:

  • Selecting, renaming and manipulating columns
  • Filtering, dropping and aggregating rows
  • Joining DataFrames
  • Creating UDFs and using them with the DataFrame API or Spark SQL
  • Writing DataFrames to external storage systems
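
To give a feel for these tasks, here is a minimal sketch in Scala. It is not taken from the course material: the sample data, the column names, the `first_letter` UDF and the `/tmp/staff_summary` output path are all hypothetical, and it assumes the `spark-sql` library is on the classpath. In a Databricks notebook the `spark` session already exists, so the builder call would not be needed there.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, count, udf}

object DataFrameSketch {

  // Builds a per-department summary from hypothetical in-memory data.
  def buildSummary(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical sample data (not from the course).
    val employees = Seq(
      (1, "ana", 10, 5200.0),
      (2, "bo", 20, 4100.0),
      (3, "cy", 10, 6100.0),
      (4, "di", 30, 3900.0)
    ).toDF("id", "name", "dept_id", "salary")

    val departments = Seq((10, "Engineering"), (20, "Sales"))
      .toDF("dept_id", "dept_name")

    // Selecting, renaming and manipulating columns.
    val shaped = employees
      .select(col("name"), col("dept_id"), col("salary"))
      .withColumnRenamed("name", "employee")
      .withColumn("salary_k", col("salary") / 1000)

    // Filtering rows, then dropping a column that is no longer needed.
    val wellPaid = shaped.filter(col("salary") > 4000).drop("salary")

    // A UDF used through the DataFrame API (hypothetical helper).
    val firstLetter = udf((s: String) => s.take(1).toUpperCase)
    val withInitial = wellPaid.withColumn("initial", firstLetter(col("employee")))

    // Joining two DataFrames on a shared column.
    val joined = withInitial.join(departments, Seq("dept_id"), "inner")

    // The same UDF logic registered for Spark SQL and used on a temp view.
    spark.udf.register("first_letter", (s: String) => s.take(1).toUpperCase)
    joined.createOrReplaceTempView("staff")
    spark.sql("SELECT employee, first_letter(employee) AS initial FROM staff").show()

    // Aggregating rows per department.
    joined
      .groupBy("dept_name")
      .agg(count("*").as("employees"), avg("salary_k").as("avg_salary_k"))
      .orderBy("dept_name")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; on Databricks, use the provided `spark`.
    val spark = SparkSession.builder()
      .appName("dataframe-sketch")
      .master("local[*]")
      .getOrCreate()

    val summary = buildSummary(spark)
    summary.show()

    // Writing the result with the DataFrameWriter API (hypothetical path).
    summary.write.mode("overwrite").parquet("/tmp/staff_summary")

    spark.stop()
  }
}
```

The inner join drops the employee whose `dept_id` has no matching department, so the summary only covers departments present in both DataFrames.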

List and explain the elements of the Apache Spark execution hierarchy, such as:

  • Jobs
  • Stages
  • Tasks
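
As a rough illustration of how these three levels relate, the sketch below builds a small aggregation and then triggers it. It is an assumption-laden example rather than course material: the object and function names are hypothetical, and it again assumes `spark-sql` on the classpath with a local session. Transformations are lazy, an action such as `collect()` triggers at least one job, a shuffle (here caused by `groupBy`) splits a job into stages, and each stage runs one task per partition.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ExecutionHierarchySketch {

  // Only transformations here: calling this function runs nothing on the cluster.
  def bucketCounts(spark: SparkSession): DataFrame = {
    // 8 input partitions, so the first stage is expected to run 8 tasks.
    val numbers = spark.range(0L, 1000000L, 1L, 8)

    numbers
      .withColumn("bucket", col("id") % 10)
      .groupBy("bucket") // requires a shuffle, which marks a stage boundary
      .count()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; on Databricks, use the provided `spark`.
    val spark = SparkSession.builder()
      .appName("execution-hierarchy-sketch")
      .master("local[4]")
      .getOrCreate()

    val counts = bucketCounts(spark)

    // The physical plan shows an Exchange operator where the shuffle happens.
    counts.explain()

    // The number of post-shuffle partitions bounds the task count of the second stage.
    println("shuffle partitions: " + spark.conf.get("spark.sql.shuffle.partitions"))

    // An action: this is the point where Spark submits a job.
    val result = counts.collect()
    println("buckets computed: " + result.length)

    spark.stop()
  }
}
```

While this runs, the Spark UI lists the job, its stages and the tasks in each stage, which is a convenient way to check the hierarchy described above.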
Who this course is for:
  • Software developers curious about big data, data engineering and data science
  • Beginner data engineers who want to learn how to work with Databricks
  • Beginner data scientists who want to learn how to work with Databricks
