The goal of this book is to introduce the reader to a powerful, flexible, and free set of data manipulation and cleansing commands. These commands were developed over decades in the Unix/Linux environment, but they are now available on any operating system with minimal setup. While all examples and scripts use bash, many of the concepts carry over to other shells (ksh, sh, csh). These include piping data between commands, regular expression substitution, and the sed and awk commands. The book is aimed at readers who are relatively new to working in a bash environment. It is also comprehensive enough to serve as a good reference and to teach a few new tricks to readers who already have some experience using shell scripts for data cleansing.
This short book contains a variety of code fragments and shell scripts for data scientists, data analysts, and anyone else who wants shell-based solutions to “clean” various types of datasets.
This book introduces basic concepts and commands in bash and then demonstrates their use in simple yet powerful shell scripts. It does not cover “pure” system administration functionality for Unix or Linux. In general, topics that are not relevant to a shell-based Data Cleaning Pocket Primer are not covered in this book.