Big Data Spark Cluster

Big Data Platform - Spark Cluster

Teaching Big Data technology is important. My background is less that of a data scientist and more that of a data engineer, and I love deploying infrastructure.

Current Cluster:

8-node Spark 3.2.x research cluster
- 491 GB of RAM and 47 processors available

Current Hardware

Master Node
- 1.8 TB of storage, 32 GB of RAM, AMD FX 6100 Hexacore
Worker Node 1
- 3.6 TB of storage, 94 GB of RAM, 2x Intel Xeon E5530
Worker Node 2
- 2 TB of storage, 32 GB of RAM, 2x Intel Xeon E5530
Worker Node 3
- 4 TB of storage, 32 GB of RAM, 2x Intel Xeon E5504
Worker Node 4
- 2 TB of storage, 64 GB of RAM, 2x Intel Xeon E5530
Worker Node 5
- 2 TB of storage, 100 GB of RAM, Intel Xeon E5-2620
Worker Node 6
- 2 TB of storage, 88 GB of RAM, Intel Xeon E5-2620
Worker Node 7
- 2 TB of storage, 68 GB of RAM, Intel Xeon X5650
Worker Node 8
- 2 TB of storage, 68 GB of RAM, Intel Xeon X5650
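The workers differ widely in memory, so it is worth checking how many executors each one can host. The sketch below uses a rough rule of thumb and does not describe how Spark itself allocates resources. It assumes a fixed executor size (8 GB) and a hypothetical 4 GB reserved on each node for the OS and Spark daemons. It also ignores CPU cores and Spark's memory overhead settings. The function names are illustrative only.

```python
# Worker RAM in GB, taken from the hardware list (master node excluded).
WORKER_RAM_GB = {
    "worker1": 94, "worker2": 32, "worker3": 32, "worker4": 64,
    "worker5": 100, "worker6": 88, "worker7": 68, "worker8": 68,
}


def executors_per_worker(ram_gb, executor_gb, reserved_gb=4):
    """Count fixed-size executors that fit after reserving RAM.

    reserved_gb is an assumed allowance for the OS and daemons, not a Spark default.
    """
    usable = max(ram_gb - reserved_gb, 0)
    return usable // executor_gb


def plan(executor_gb=8, reserved_gb=4):
    """Map each worker to the number of executors it could host."""
    return {
        name: executors_per_worker(ram, executor_gb, reserved_gb)
        for name, ram in WORKER_RAM_GB.items()
    }


if __name__ == "__main__":
    for name, count in plan().items():
        print(f"{name}: {count} executor(s) of 8 GB")
```

With these assumptions, each 32 GB worker fits 3 executors and Worker Node 5 fits 12. That spread shows how uneven the nodes are when a single executor size is used across the whole cluster.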

Storage component for Spark Cluster

Datasets are stored on premises with MinIO (Min.io), an S3-compatible object storage server.
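Because MinIO speaks the S3 protocol, Spark can usually read from it through Hadoop's S3A connector using `s3a://bucket/path` URLs. This assumes the hadoop-aws package matching the cluster's Hadoop version is on the classpath. The sketch below builds the relevant Spark properties. The endpoint and credentials are hypothetical placeholders, and the helper functions are illustrative rather than part of any library.

```python
# Hypothetical endpoint; replace with the real MinIO address.
MINIO_ENDPOINT = "http://minio.example.local:9000"


def s3a_conf(endpoint, access_key, secret_key):
    """Return Spark properties that point Hadoop's S3A connector at a MinIO server."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is commonly addressed as host/bucket rather than bucket.host.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        # Enable SSL only when the endpoint uses HTTPS.
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(
            endpoint.startswith("https://")
        ).lower(),
    }


def to_spark_submit_args(conf):
    """Flatten the properties into spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(" ".join(to_spark_submit_args(s3a_conf(MINIO_ENDPOINT, "ACCESS", "SECRET"))))
```

You can pass the same key/value pairs to spark-submit or set them one at a time with `SparkSession.builder.config(key, value)`. In practice, keep the credentials out of source code.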

Software Support

The cluster supports:
- MapReduce
- Spark
- SparkR
- Spark SQL
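Both MapReduce and Spark build on the map, shuffle, and reduce pattern. The classic teaching example is word count. The sketch below simulates the three phases in a single Python process to show the data flow. It is not cluster code, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["spark and mapreduce", "Spark SQL"]))
```

On the cluster, the same pipeline is typically written in PySpark as `rdd.flatMap(...)`, then `.map(lambda w: (w, 1))`, then `.reduceByKey(operator.add)`. Spark then distributes the shuffle across the workers.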

Photo of the actual cluster, located at the Wheaton Rice Campus, Room 242.