In this video I’ll show you how to use Python Notebooks and Apache Spark to perform a simple analysis of the Back to the Future transcript.
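To give a rough idea of what this kind of analysis involves, here is a minimal word count sketch in plain Python. It is not the notebook code from the repository. The helper names `tokenize` and `word_count` and the two sample lines are made up for illustration, and the comments note which Spark RDD operations each step would roughly correspond to.

```python
import re
from collections import Counter


def tokenize(line):
    """Lowercase a line and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", line.lower())


def word_count(lines, top_n=10):
    """Return the top_n most frequent words as (word, count) pairs."""
    # Split every line into words. In Spark this would roughly be rdd.flatMap(tokenize).
    words = (word for line in lines for word in tokenize(line))
    # Count occurrences of each word. In Spark this would roughly be
    # rdd.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b).
    counts = Counter(words)
    # Keep only the most frequent words.
    return counts.most_common(top_n)


# Hypothetical sample lines standing in for the real transcript file.
sample = [
    "Marty: Wait a minute, Doc.",
    "Doc: Roads? Where we're going, we don't need roads.",
]

for word, count in word_count(sample, top_n=3):
    print(word, count)
```

In the notebook the same steps would run over the full transcript through Spark rather than over an in-memory list.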

The source code and Docker Compose config for this tutorial can be found at:

https://github.com/markwatsonatx/tutorial-spark-notebook-wordcount

Clone the repository, run “docker-compose up -d” and you’ll be up and running!

This tutorial uses a Docker image that I created, which can be found at:

https://hub.docker.com/r/markwatsonatx/spark-notebook

The Docker Image contains Apache Spark 2.0.0-preview pre-built for Hadoop 2.7. It also includes Python 3.5 and Anaconda for running Python Notebooks.

The Dockerfile can be found at:

https://github.com/markwatsonatx/Dockerfiles/tree/master/spark-notebook-2.0.0-preview