• MySQL uses so-called “schema on write” – data must be converted and loaded into MySQL tables before it can be queried. If the data is not inside MySQL, you can’t use SQL to query it.

• Spark (as well as Hadoop/Hive) uses “schema on read” – it can apply a table structure on top of a compressed text file, for example (or any other supported input format), and treat it as a table; then we can use SQL to query that “table,” as sketched below.
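
To illustrate schema on read, here is a minimal PySpark sketch that declares a schema at read time, applies it to a gzip-compressed text file, and queries the result with SQL. The file name (pagecounts.csv.gz), the column names, and the separator are assumptions made for the example, not part of the original article.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# Declare the schema at read time; the file and columns below are hypothetical.
schema = StructType([
    StructField("page", StringType()),
    StructField("views", IntegerType()),
])

# Spark reads the gzip-compressed text file directly and applies the schema.
df = spark.read.csv("pagecounts.csv.gz", schema=schema, sep=" ")
df.createOrReplaceTempView("pagecounts")

# Plain SQL now works over what is still just a compressed text file on disk.
spark.sql("""
    SELECT page, SUM(views) AS total
    FROM pagecounts
    GROUP BY page
    ORDER BY total DESC
    LIMIT 10
""").show()

Nothing is loaded into a database first: the “table” exists only as metadata applied when the file is read, which is the essence of schema on read.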

Using Apache Spark and MySQL for Data Analysis
A discussion on how to use Apache Spark and MySQL for data analysis.

A number of easy-to-use, inexpensive, and open-source streaming-data platform components have emerged:

Apache Storm, a Hadoop-compatible add-on for rapid data transformation (open-sourced by Twitter), has been adopted by The Weather Channel, Spotify, WebMD, and Alibaba.com.

Apache Spark, a fast and general engine for large-scale data processing, supports SQL, machine learning, and streaming-data analysis.

Apache Kafka, an open-source message broker, is widely used for consuming streaming data.

Amazon Kinesis, a fully managed, cloud-based service for real-time data processing over large, distributed data streams, can continuously capture large volumes of data from streaming sources.
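
To show how two of these components fit together, here is a minimal sketch of Spark Structured Streaming consuming messages from Kafka. The broker address (localhost:9092) and topic name (events) are assumptions for illustration, and the job needs the spark-sql-kafka connector package on its classpath (for example via spark-submit --packages); it simply decodes each message value and prints it to the console.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Subscribe to a hypothetical Kafka topic; broker and topic names are assumptions.
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load())

# Kafka delivers keys and values as bytes; cast the value to a readable string.
decoded = events.selectExpr("CAST(value AS STRING) AS value")

# Print the running stream to the console (a demonstration sink only).
query = (decoded.writeStream
    .outputMode("append")
    .format("console")
    .start())

query.awaitTermination()

The same DataFrame API and SQL functions used for batch analysis apply to the streaming DataFrame, which is what makes Spark a convenient bridge between streaming sources like Kafka and relational stores like MySQL.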