If you’re reading this book, it will come as no surprise that we are in the middle of a revolution in the way data is stored and processed in the enterprise. As anyone who has been in IT for any length of time knows, the technologies and approaches behind data processing and storage are always evolving. However, in the past 10 to 15 years, the pace of change has been remarkable. We have moved from a world where almost all enterprise data was processed and analyzed using variants of SQL and was contained in some form of relational database to one in which an enterprise’s data may be found in a variety of so-called NoSQL storage engines. Each of these engines sacrifices some constraint of the relational model to achieve superior performance and scalability for a certain use case. The modern data landscape includes nonrelational key-value stores, distributed filesystems, distributed columnar databases, log stores, and document stores, in addition to traditional relational databases. The data in these systems is exploited in a multitude of ways and is processed using distributed batch processing algorithms, stream processing, massively parallel processing query engines, free-text searches, and machine learning pipelines.