- Hadoop
- Purpose
- This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
- Tools Used
The tool used is Cloudera.
- Cloudera is revolutionizing enterprise data management by offering the first unified platform for Big Data: the Enterprise Data Hub. Cloudera offers enterprises one place to store, process, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamentally new ways to derive value from their data.
- Cloudera was founded in 2008 and is currently the leading provider and supporter of Apache Hadoop for the enterprise. Cloudera also offers software for business-critical data challenges, including storage, access, management, analysis, security, and search.
- System requirements
- These 64-bit VMs require a 64-bit host OS and a virtualization product that can support a 64-bit guest OS.
- To use a VMware VM, you must use a player compatible with Workstation 8.x or higher:
- Player 4.x or higher
- Fusion 4.x or higher
- Older versions of Workstation can be used to create a new VM using the same virtual disk (VMDK file), but some features in VMware Tools are not available.
- The amount of RAM required varies by the run-time option you choose:
| CDH and Cloudera Manager Version | RAM Required by VM |
| --- | --- |
| CDH 5 (default) | 4+ GiB* |
| Cloudera Express | 8+ GiB* |
| Cloudera Enterprise (trial) | 10+ GiB* |
*Minimum recommended memory. If you are running workloads larger than the provided examples, consider allocating additional memory.
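If the VM has too little memory for the option you chose, you can raise its allocation before starting it. A hypothetical example using VirtualBox's VBoxManage tool (the VM name "Cloudera QuickStart" is an assumption; substitute the name shown in your VirtualBox library):

```shell
# Allocate 8 GiB of RAM to the VM (run while the VM is powered off).
# "Cloudera QuickStart" is a placeholder VM name, not a guaranteed default.
VBoxManage modifyvm "Cloudera QuickStart" --memory 8192
```

The same setting is also available in the VirtualBox GUI under Settings > System > Motherboard.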
- Installation
This section shows how to install the Cloudera QuickStart Virtual Machine.
- Use 7-Zip to extract the contents of the downloaded zip file.
- Run VirtualBox and then import the Cloudera VM into VirtualBox.
- Test Cloudera.
- Download WordCount.java from Sakai.
- Create a new project in Eclipse.
- Add references:
- File System: /usr/lib/hadoop/client-0.20
- File System: /usr/lib/hadoop
- File System: /usr/lib/hadoop/lib
- Create a folder named input inside the project and add a text file with some content.
- Run the program.
- Check the files in the output folder to see the result.
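For reference, the core logic of a WordCount job — a map phase that emits (word, 1) pairs and a reduce phase that sums the counts per word — can be sketched in plain Java with no Hadoop dependencies. The class and method names below are illustrative, not necessarily those in the WordCount.java file from Sakai:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of WordCount's map and reduce phases in plain Java.
// Names here are illustrative; the real job would use Hadoop's Mapper
// and Reducer classes and run over files in the input folder.
public class WordCountSketch {

    // "Map" phase: emit a (word, 1) pair for every token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // "Reduce" phase: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"hello hadoop", "hello world"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {hadoop=1, hello=2, world=1}
    }
}
```

In the real job, Hadoop performs the grouping between the two phases (the shuffle) and writes the reducer's output to the output folder checked in the last step.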
- Eclipse
- General information about Eclipse.
Eclipse (www.eclipse.org) is an open-source Integrated Development Environment (IDE) originally developed by IBM. Eclipse is popular for Java application development (Java SE and Java EE) and Android apps. It also supports C/C++, PHP, Python, Perl, and other languages and web projects via extensible plug-ins. Eclipse is cross-platform and runs on Windows, Linux, and macOS.
- Installation.
- To use Eclipse for Java programming, you first need to install the Java Development Kit (JDK). See the guide "How to Install JDK (on Windows)".
- To install Eclipse, simply unzip the downloaded file into a directory of your choice (e.g., "d:\myproject").
- Running Eclipse.
- Create a new project in Eclipse (HelloWorld).
- Write your code and then run the project.
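The HelloWorld project's single class can look like this minimal sketch:

```java
// A minimal class for the HelloWorld Eclipse project described above.
public class HelloWorld {

    // Returning the greeting from a method keeps it easy to test.
    static String greeting() {
        return "Hello, World!";
    }

    public static void main(String[] args) {
        System.out.println(greeting()); // prints "Hello, World!"
    }
}
```

In Eclipse, place this in a file named HelloWorld.java inside the project's src folder and choose Run As > Java Application.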