Big Data Spark Cluster
Big Data Platform - Spark Cluster
Teaching Big Data technology is important. My background is closer to data engineering than data science, and I love deploying infrastructure.
Current Cluster:
- 8-node Spark 3.2.x research cluster
- 491 GB of RAM and 47 processors available
Current Hardware
- Master Node
  - 1.8 TB of storage, 32 GB of RAM, AMD FX-6100 hexa-core
- Worker Node 1
  - 3.6 TB of storage, 94 GB of RAM, 2x Intel Xeon E5530
- Worker Node 2
  - 2 TB of storage, 32 GB of RAM, 2x Intel Xeon E5530
- Worker Node 3
  - 4 TB of storage, 32 GB of RAM, 2x Intel Xeon E5504
- Worker Node 4
  - 2 TB of storage, 64 GB of RAM, 2x Intel Xeon E5530
- Worker Node 5
  - 2 TB of storage, 100 GB of RAM, Intel Xeon E5-2620
- Worker Node 6
  - 2 TB of storage, 88 GB of RAM, Intel Xeon E5-2620
- Worker Node 7
  - 2 TB of storage, 68 GB of RAM, Intel Xeon X5650
- Worker Node 8
  - 2 TB of storage, 68 GB of RAM, Intel Xeon X5650
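As a quick sanity check, the installed RAM in the hardware list can be tallied with a few lines of Python. This is only a sketch: the dictionary keys are made-up labels, and the values are copied from the list above. The installed total comes out higher than the 491 GB quoted as available; presumably the difference is memory held back for the operating system and cluster daemons, but that is an assumption and not something measured here.

```python
# Installed RAM per node in GB, copied from the hardware list.
# The key names are illustrative labels, not real hostnames.
ram_gb = {
    "master": 32,
    "worker1": 94,
    "worker2": 32,
    "worker3": 32,
    "worker4": 64,
    "worker5": 100,
    "worker6": 88,
    "worker7": 68,
    "worker8": 68,
}

# Workers only, since the master usually does not run executors.
worker_total = sum(gb for name, gb in ram_gb.items() if name != "master")
# Every machine, master included.
cluster_total = sum(ram_gb.values())

print(f"Worker RAM installed:  {worker_total} GB")   # 546 GB
print(f"Cluster RAM installed: {cluster_total} GB")  # 578 GB
```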
Storage component for Spark Cluster
Datasets are stored on-prem using MinIO (Min.io), an S3-compatible object storage server.
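Because the object store speaks the S3 protocol, Spark jobs can typically reach it through the Hadoop S3A connector. The sketch below builds the relevant Spark properties as a plain dictionary. The helper name `minio_s3a_conf`, the endpoint, and the credentials are all hypothetical placeholders, not the values used on this cluster.

```python
def minio_s3a_conf(endpoint, access_key, secret_key, use_ssl=False):
    """Return Spark properties that point the Hadoop S3A connector at an
    S3-compatible server. Hypothetical helper for illustration only."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Address buckets as host/bucket rather than bucket.host,
        # which is the usual setup for a self-hosted object store.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


# Placeholder values; substitute the real endpoint and credentials.
conf = minio_s3a_conf(
    "http://minio.example.internal:9000",
    "EXAMPLE_ACCESS_KEY",
    "EXAMPLE_SECRET_KEY",
)

# Print the properties, masking the secret so it never lands in logs.
for key in sorted(conf):
    shown = "****" if key.endswith("secret.key") else conf[key]
    print(f"{key}={shown}")
```

With PySpark installed and the `hadoop-aws` package on the classpath, each pair can be passed to `SparkSession.builder.config(key, value)`, after which datasets are addressed with `s3a://` paths.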
Software Support
- Supported Software
  - MapReduce
  - Spark
  - SparkR
  - Spark SQL