Apache Spark
Lightning-fast unified analytics engine for large-scale data processing. Process massive datasets with SQL, streaming, machine learning, and graph processing.

Quick Start
Get Apache Spark up and running in minutes
Download Apache Spark
Start the Interactive Shell
Set JAVA_HOME to point to your Java installation.
Run an Example Application
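Example applications are normally launched through the spark-submit script. The sketch below only assembles such a command line and prints it; it does not run Spark. The helper name `build_spark_submit` is hypothetical, the jar path is a placeholder for whatever examples jar ships with your download, and `org.apache.spark.examples.SparkPi` is assumed to be the bundled example you want to run.

```python
import shlex


def build_spark_submit(app, main_class=None, master="local[*]", app_args=()):
    """Return the argv list for a spark-submit call (nothing is executed)."""
    # Options go before the application file; application arguments go after it.
    argv = ["./bin/spark-submit", "--master", master]
    if main_class:
        # --class names the entry point of a JVM (Scala/Java) application.
        argv += ["--class", main_class]
    argv.append(app)
    argv += [str(arg) for arg in app_args]
    return argv


# Placeholder jar path and an assumed example class; adjust both to your install.
cmd = build_spark_submit(
    "examples/jars/spark-examples.jar",
    main_class="org.apache.spark.examples.SparkPi",
    master="local[4]",  # run locally with 4 worker threads
    app_args=[100],
)

# shlex.join quotes arguments such as local[4] so the line is safe to paste into a shell.
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` later, once the paths match a real Spark installation.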
Submit applications with spark-submit. Learn more in the Submitting Applications guide.
Explore by Component
Apache Spark provides a rich set of libraries for different data processing needs
Spark SQL
Structured Streaming
MLlib
GraphX
Spark Connect
Core API
Choose Your Language
Write Spark applications in Scala, Java, Python, R, or SQL
Scala
Native Spark language with type-safe Dataset APIs and functional programming support.
Java
Full-featured Java APIs for enterprise applications with familiar syntax and tooling.
Python
PySpark brings Spark to the Python ecosystem with pandas-like APIs and native library integration.
R
SparkR enables R users to leverage Spark’s distributed computing capabilities with familiar R syntax.
SQL
Write standard SQL queries with ANSI SQL support, window functions, and advanced analytics.
Deploy Anywhere
Run Spark on your preferred cluster manager
Standalone Mode
Kubernetes
Apache YARN
Cluster Overview
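The cluster manager is usually selected through the master URL passed to Spark. As a rough illustration, the sketch below maps each option listed above to the URL form it expects. The function name `master_url` and the example hostnames are hypothetical; the Kubernetes form assumes the API server is reached over HTTPS.

```python
def master_url(manager, host=None, port=None):
    """Sketch: map a cluster manager name to the master URL form Spark expects."""
    if manager == "local":
        # Run in-process, using as many worker threads as there are cores.
        return "local[*]"
    if manager == "yarn":
        # The cluster location is read from the Hadoop/YARN configuration directory.
        return "yarn"
    if host is None:
        raise ValueError(f"{manager!r} requires a host")
    if manager == "standalone":
        # 7077 is the default port of a standalone master.
        return f"spark://{host}:{port or 7077}"
    if manager == "kubernetes":
        if port is None:
            raise ValueError("kubernetes requires the API server port")
        return f"k8s://https://{host}:{port}"
    raise ValueError(f"unknown cluster manager: {manager!r}")


# Hypothetical hosts, for illustration only.
for args in [
    ("local",),
    ("standalone", "master.example.com"),
    ("yarn",),
    ("kubernetes", "k8s.example.com", 6443),
]:
    print(args[0], "->", master_url(*args))
```

The resulting string is what would be supplied as the `--master` option of spark-submit or as the `spark.master` configuration property.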
Ready to process big data at scale?
Start building distributed data processing applications with Apache Spark. From batch processing to real-time analytics, Spark powers some of the world’s largest data workloads.