Programming Hive

Data Warehouse and Query Language for Hadoop

Specifications
Paperback, 328 pages | English
O'Reilly | 1st edition, 2012
ISBN13: 9781449319335
Classification
Main category: Computers and information technology

Summary

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You'll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

- Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
- Customize data formats and storage options, from files to external databases
- Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query methods
- Learn best practices for creating user-defined functions (UDFs)
- Learn Hive patterns you should use and anti-patterns you should avoid
- Integrate Hive with other data processing programs
- Use storage handlers for NoSQL databases and other datastores
- Learn the pros and cons of running Hive on Amazon's Elastic MapReduce

Specifications

ISBN13: 9781449319335
Language: English
Binding: paperback
Number of pages: 328
Publisher: O'Reilly
Edition: 1
Publication date: 4 October 2012

About Dean Wampler

Dean Wampler is a Consultant, Trainer, and Mentor with Object Mentor, Inc. He specializes in Scala, Java, and Ruby. He works with clients on application design strategies that combine object-oriented programming, functional programming, and aspect-oriented programming. He also consults on Agile methods, like Lean and XP. Dean is a frequent speaker at industry and academic conferences on these topics. He has a Ph.D. in Physics from the University of Washington.

Table of Contents

Preface

1. Introduction
2. Getting Started
3. Data Types and File Formats
4. HiveQL: Data Definition
5. HiveQL: Data Manipulation
6. HiveQL: Queries
7. HiveQL: Views
8. HiveQL: Indexes
9. Schema Design
10. Tuning
11. Other File Formats and Compression
12. Developing
13. Functions
14. Streaming
15. Customizing Hive File and Record Formats
16. Hive Thrift Service
17. Storage Handlers and NoSQL
18. Security
19. Locking
20. Hive Integration with Oozie
21. Hive and Amazon Web Services (AWS)
22. HCatalog
23. Case Studies

Glossary
Appendix: References

Index
