

Build, monitor, and manage real-time data pipelines to create data engineering infrastructure efficiently using open-source Apache projects.

Key features:
- Become well-versed in data architectures, data preparation, and data optimization skills with the help of practical examples
- Design data models and learn how to extract, transform, and load (ETL) data using Python
- Schedule, automate, and monitor complex data pipelines in production

Book Description
Data engineering provides the foundation for data science and analytics, and forms an important part of all businesses. This book will help you to explore various tools and methods that are used for understanding the data engineering process using Python. The book will show you how to tackle challenges commonly faced in different aspects of data engineering. You'll start with an introduction to the basics of data engineering, along with the technologies and frameworks required to build data pipelines to work with large datasets. You'll learn how to transform and clean data and perform analytics to get the most out of your data. As you advance, you'll discover how to work with big data of varying complexity and production databases, and build data pipelines. Using real-world examples, you'll build architectures on which you'll learn how to deploy data pipelines. By the end of this Python book, you'll have gained a clear understanding of data modeling techniques, and will be able to confidently build data engineering pipelines for tracking data, running quality checks, and making necessary changes in production.
What you will learn
- Understand how data engineering supports data science workflows
- Discover how to extract data from files and databases and then clean, transform, and enrich it
- Configure processors for handling different file formats as well as both relational and NoSQL databases
- Find out how to implement a data pipeline and dashboard to visualize results
- Use staging and validation to check data before landing in the warehouse
- Build real-time pipelines with staging areas that perform validation and handle failures
- Get to grips with deploying pipelines in the production environment

Who this book is for
This book is for data analysts, ETL developers, and anyone looking to get started with or transition to the field of data engineering or refresh their knowledge of data engineering using Python. This book will also be useful for students planning to build a career in data engineering or IT professionals preparing for a transition. No previous knowledge of data engineering is required.







| Best Sellers Rank | #1,069,822 in Books; #170 in Data Warehousing (Books); #312 in Data Modeling & Design (Books); #894 in Python Programming |
| Customer Reviews | 4.1 out of 5 stars (157 reviews) |
J**L
Solid introduction to Data Engineering
Data Engineering With Python provides a solid overview of pipelining and database connections for those tasked with processing both batch and stream data flows. Not only for data miners, this book will be useful as well in a CI/CD environment using Kafka and Spark. It’s very readable and contains lots of practical, illustrative examples. Hits: solid explanations and demonstrations of Pandas, Zookeeper, Kafka, and Spark. Also introduces Great Expectations, NiFi, Airflow, and Faker, all of which are tied together in a usable demonstration environment. Pipeline implementation is thoroughly covered as well. Misses: the book could use a little fine-tuning as to Python 3; some of the instructions are rather downrev, and concepts like 311 / SeeClickFix are sort of dropped in without a lot of explanation. Also, there’s a heavy focus on SQL and almost no coverage of NoSQL databases. Overall, a good addition to the bookshelf if you’re using any of these Python packages. Readable and useful for anyone supporting data analysis.
L**Z
A Good conceptual foundation
I was hesitant to buy this book based on some reviews that I read, but I decided to give it a shot nonetheless. For context, I’m currently a Software Engineer looking to make a transition over to Data Engineering in the future. I had little knowledge of the field, so I was looking for a book to give me a bit of a foundation on it. For me, the first and second sections of the book were really good for a conceptual understanding of the techniques and jargon within DE. Especially the first section! What threw me off a little was the constant reference to NiFi. Don’t get me wrong though, I did enjoy learning about it, and I think it was useful to understand that DEs use a variety of different tools in their day-to-day work (which was a plus for me). I just think the title is slightly misleading, as Python is not always referenced. I think it got more exposure in the first section, but died down a bit in the latter half of the second section and then got referenced just a bit in the last one. Also, the Python implementation used in NiFi is Jython and not the default implementation that many probably use. Also, and I could be totally mistaken, but it seems that the most recent Jython version uses Python 2.7? It seemed a bit backwards to me to reference such an outdated version, but that’s in no way the author’s fault. I gave it 4 out of 5 stars just because some of the NiFi examples simply didn’t work even though I followed the steps correctly. This could be because things have changed with the software since the release of the book, but I’m not totally sure. In any case, I would still recommend this book to get decent conceptual knowledge of DE principles, but I would do what others suggested and look at the documentation of the tools referenced to get a more up-to-date view of them, and work on personal projects that use them to apply the knowledge taught in this book.
A**R
Returned book before finishing chapter 2
This book has a very poor flow (ironic, given what it is teaching) and is filled with errors. I wasn't even able to finish chapter 2 before I grew frustrated and requested a refund. The author makes a very large assumption that you will be using Linux and expects you to have an understanding of how to use it. This isn't a big deal if you are familiar with Linux, except the author makes no attempt to explain the flavor of Linux they are using or what sort of setup they have. It would have been better to have the reader create a VM with a specific setup so that everyone is on the same page. It also took me a long time to get NiFi up and running. There is a typo in one of the commands when you set the JAVA_HOME system variable. The version of NiFi is also very out of date, which is understandable considering this is a book; however, there is no correction on the publisher's site or the GitHub repo. Even after you get NiFi set up, the book jumps around the example and completely misses some steps on how to run your first flow. It's obvious this was a rushed book and the editing was also rushed or was not even done. You're better off finding a different book or finding online training.
P**L
Too Framework Dependent
I’m really appreciative of this author helping to teach others about data engineering. That being said, this book is too platform-dependent and doesn’t cover the fundamentals in great depth. I think this book should’ve focused more on Python, SQL, and how to model databases and build ETLs from scratch without using any tools like Airflow, even if the examples were much simpler. It would have been a better way to show the process without the abstraction that comes with advanced tools like Airflow and NiFi.
M**Y
Not the best editing
Already seeing typos and poor visual examples (e.g. the columnar format example) within the first 10 main pages. Seems to have some good overall content but a bit discouraging that I feel as though I need to reconfirm or check against some of the statements in a book that’s supposed to be beginner friendly. The saving grace is a lot of content and variety.
M**.
Don’t buy it
I think this book was relevant 3-5 years ago but unfortunately it’s now outdated. The instructions to follow the examples in the book are just outdated and I was unable to get Apache Airflow to work. Don’t buy this book!
U**R
Average
I was expecting better content and accurate code.
M**R
Great Starter for Non-Beginners
This is a very hands-on guide to building data pipelines with Python and a number of other tools that would be very useful for anyone looking to handle large data sets, or clean or enhance data. Paul does a great job of breaking down the difference between a Data Scientist and a Data Engineer while also covering areas of overlap. His breakdown of the tools that are available and a number of quick-start style tutorials would be useful to anyone who's experienced enough to understand them. This book is clearly not for beginners but is clear and concise enough to ensure that it's not only for experts either. My only real complaint is that the book skips over a whole category of NoSQL databases, namely graph databases (e.g. Neo4j, Neptune), which are a fast-growing part of data science as a whole. All in all, a very thorough introduction to data engineering that focuses mostly on SQL and would help any experienced Python developer get their toes wet in the field.
J**N
Couldn’t download NiFi
The instructions in the book for downloading NiFi did not work. I was stuck and couldn’t proceed with the rest of the book. Money wasted.
C**H
Best resource on NiFi yet
This book does a good job introducing the use of NiFi and Airflow as data orchestrators for data pipelines, as well as briefly mentioning Kafka and Spark.

NiFi
NiFi is explained in detail in this book. It covers subjects that go beyond the basics and are not easily googled. Since NiFi tutorials are rather rare compared to many other technologies, this is a valuable resource for learning it. It covers topics such as deploying to production, versioning, and monitoring. Some sources online falsely claim that versioning and automated deployment aren't possible with NiFi, but this book shows how it's done!

Airflow
Airflow is also introduced in this book, and you're shown how to set up DAGs with it. If you're mainly interested in Airflow, there are better books for you, e.g. "Data Pipelines With Apache Airflow". Nevertheless, this book will teach you how to get started with Airflow as well and how to set up simple, project-sized data pipelines from extraction to visualization.

Real-time processing
At the end of the book, Kafka and Spark are introduced as well. The content on these is not comprehensive. You will definitely need another resource to learn more about them if that's what you're interested in.

What I didn't like
This book shows you how to install the tools mentioned by hand. I would have much preferred it if a Docker setup had been used. Others pointed out that they had problems installing tools and copying examples 1:1. I didn't have such problems since I installed everything using Docker, and since Docker is often a relevant skill in Data Engineering, it would not have been out of place to use it here as well. The title can be criticized as well, since it's broader than the content. Something like "Data Pipelines with NiFi and Airflow" might have been more fitting.

Summary
I'm giving this book 5 stars since it's hard to find a good resource on NiFi and this book is it. After you read it, you will see that both Airflow and NiFi are good orchestration tools for very similar use cases.
H**E
Damaged product
The book is slightly damaged (nothing too noticeable). But since I love the content, I decided not to return it :p
E**C
The book should be called "How to Use NiFi"
If you are using NiFi, this is the right book for you. But if not, be careful about choosing the book.
A**G
Not worth the price tag
The book does a good job of introducing the reader to what a data engineer does and the different tools they use, and that’s it. If you are interested in the topic of data engineering, I believe the materials available online cover most of the topics in the book. The writer mostly discusses Apache Airflow and NiFi, but not in enough detail for the reader to understand them, and the code snapshots in the book are not very clear and are not easy on the eye.