Data Processing and Modeling with Hadoop
US$ 19.95
Language: English
ISBN: 9789391392284
Cover Page
Title Page
Copyright Page
Dedication Page
About the Author
About the Reviewer
Acknowledgement
Preface
Errata
Table of Contents
1. Understanding the Current Moment
Introduction
Structure
Objectives
A little context
Why use it?
Solving problems
Hadoop ecosystem
Building the data lake
What does the data tell us?
Conclusion
Points to remember
Questions
Multiple choice questions
Answers
2. Defining the Zones
Introduction
Structure
Objectives
Why separate data into zones?
Transition zone
RAW zone
Trusted zone
Refined zone
Where to put my Sandbox
Conclusion
Points to remember
Questions
Multiple choice questions
Answers
3. The Importance of Modeling
Introduction
Structure
Objectives
Why should we model our environment?
Data Vault 2.0
How to plan modeling
Conclusion
Points to remember
Questions
Multiple choice questions
Answers
4. Massive Parallel Processing
Introduction
Structure
Objectives
How did we arrive and where did we arrive?
What is MapReduce?
MapReduce features
Introduction to Spark
Resource Manager – YARN
Introduction to Apache Tez
Conclusion
Points to remember
Questions
Multiple choice questions
Answers
5. Doing ETL/ELT
Introduction
Structure
Objectives
Transforming data into information
Identifying enemies
Main types of transformations
Planning the rollback
Why a data mart?
Feedback
Data lake and data warehouse secrets
Conclusion
Points to remember
Questions
Multiple choice questions
Answers
6. A Little Governance
Introduction
Structure
Objectives
Governing the data
Main difficulties
What methodologies and tools to use?
Defining a deployment roadmap
Conclusion
Points to remember
Questions
Multiple choice questions
Answers
7. Talking About Security
Introduction
Structure
Objectives
Need to worry about security
The main difficulties
Making identification, authorization, and authentication
The main tools
Defining a schedule
Conclusion
Points to remember
Questions
Multiple choice questions
Answers
8. What Are the Next Steps?
Introduction
Structure
Objectives
A new era
Separating a batch from real time
Defining the visualization tools
Machine learning
New tendencies
Conclusion
Questions
Multiple choice questions
Answers
Index