Hadoop in Practice

Name: Hadoop in Practice
Author: Alex Holmes

Specifications

Paperback, 511 pages | English

Manning | 2nd edition, 2014

ISBN13: 9781617292224

Classification

Main category: Computers and informatics

Legal: Computers and informatics

Summary

It's always a good time to upgrade your Hadoop skills! ‘Hadoop in Practice, 2nd Edition’ provides a collection of 104 tested, instantly useful techniques for analyzing real-time streams, moving data securely, machine learning, managing large-scale clusters, and taming big data using Hadoop.

This completely revised edition covers changes and new features in Hadoop core, including MapReduce 2 and YARN. You'll pick up hands-on best practices for integrating Spark, Kafka, and Impala with Hadoop, and get new and updated techniques for the latest versions of Flume, Sqoop, and Mahout. In short, this is the most practical, up-to-date coverage of Hadoop available.

WHAT'S INSIDE
- Thoroughly updated for Hadoop 2
- How to write YARN applications
- Integrate real-time technologies like Storm, Impala, and Spark
- Predictive analytics using Mahout and R

Readers need to know a programming language like Java and have basic familiarity with Hadoop.

Specifications

ISBN-13: 9781617292224

Language: English

Binding: paperback

Number of pages: 511

Publisher: Manning

Edition: 2

Publication date: 12-10-2014

Main category: Computers and informatics

Table of contents

Preface
Acknowledgments
About this book
About the cover illustration

Part 1: Background and fundamentals
1. Hadoop in a heartbeat
1.1 What is Hadoop?
1.2 Getting your hands dirty with MapReduce
1.3 Summary

2. Introduction to YARN
2.1 YARN overview
2.2 YARN and MapReduce
2.3 YARN applications
2.4 Summary

Part 2: Data logistics
3. Data serialization—working with text and beyond
3.1 Understanding inputs and outputs in MapReduce
3.2 Processing common serialization formats
3.3 Big data serialization formats
3.4 Columnar storage
3.5 Custom file formats
3.6 Chapter summary

4. Organizing and optimizing data in HDFS
4.1 Data organization
4.2 Efficient storage with compression
4.3 Chapter summary

5. Moving data into and out of Hadoop
5.1 Key elements of data movement
5.2 Moving data into Hadoop
5.3 Moving data out of Hadoop
5.4 Chapter summary

Part 3: Big data patterns
6. Applying MapReduce patterns to big data
6.1 Joining
6.2 Sorting
6.3 Sampling
6.4 Chapter summary

7. Utilizing data structures and algorithms at scale
7.1 Modeling data and solving problems with graphs
7.2 Bloom filters
7.3 HyperLogLog
7.4 Chapter summary

8. Tuning, debugging, and testing
8.1 Measure, measure, measure
8.2 Tuning MapReduce
8.3 Debugging
8.4 Testing MapReduce jobs
8.5 Chapter summary

Part 4: Beyond MapReduce
9. SQL on Hadoop
9.1 Hive
9.2 Impala
9.3 Spark SQL
9.4 Chapter summary

10. Writing a YARN application
10.1 Fundamentals of building a YARN application
10.2 Building a YARN application to collect cluster statistics
10.3 Additional YARN application capabilities
10.4 YARN programming abstractions
10.5 Summary

Appendix: Installing Hadoop and friends
Index

Bonus chapters available for download
11. Integrating R and Hadoop for statistics and more
12. Predictive analytics with Mahout