How to import your Hadoop code into an IDE

#Eclipse #Hadoop #code

Guys, as developers, IDEs have made our lives so easy. I remember using the Turbo C editor during my college days, the blue screen with yellow fonts, the C code, etc. etc. .. that was fun. I also remember spending half an hour fixing my code just because I wanted to rename a global variable. Shucks !! That was a waste of time.

Then I met Eclipse, my first IDE. A hefty decade of coding, and yes, still using it. It's like the shortcuts are hardcoded in me now; I do not need to think (or even use a mouse) to compile/build code, add a try-catch, generate all the getters and setters, change perspectives, find references, rename a variable, or blah blah blah!

So, coming back to our agenda, let's get our code into Eclipse now.

Have you completed Stage I?

I hope you have completed stage one of the Hadoop code compilation. If not, please take some time to get through that first. Hadoop needs some protocol buffers, which are generated during the compilation phase. So, come back here once you are done with stage one. The link is here: https://bigdatagurus.wordpress.com/2017/03/15/how-to-compile-hadoop-code-locally
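
If you want a quick sanity check before moving on, something like the commands below will do. Treat this as a rough sketch only: the exact protoc version and build flags are the ones from the stage one post, and the mvn line here is just a typical skip-tests build from the Hadoop source root.

  $ protoc --version          # the protobuf compiler set up in stage one should be on the PATH
  $ mvn -version              # Maven and the JDK should be wired up, as in the configuration below
  $ mvn install -DskipTests   # a typical skip-tests build from the source root; stage one has the exact command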

Let's start digging… I mean importing !!!

So, now you have your Hadoop code compiled and ready to be ingested into Eclipse. Here is my configuration:

  1. Eclipse Neon.2
  2. Oracle JDK 1.8
  3. $ mvn --version
    Apache Maven 3.3.9
    Maven home: /usr/share/maven
    Java version: 1.8.0_121, vendor: Oracle Corporation
    Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
    Default locale: en_IN, platform encoding: UTF-8
    OS name: "linux", version: "4.4.0-66-generic", arch: "amd64", family: "unix"
    abhay@mean-machine:~$

Check your engines and oils… I mean your Eclipse configuration

There are a couple of steps you need to do before you start importing. Like before a race, you had better check that all the oils are topped up and everything is in place. Below is what you need to do.

  1. Ensure you have the M2E plugin installed in Eclipse. If it is not installed, you can install the latest M2Eclipse release (1.7.0) by using the following update site from within Eclipse:
    http://download.eclipse.org/technology/m2e/releases

    There are also development builds available. Information on how to install those can be found here.

  2. Check your M2_REPO variable in Eclipse.
    1. Eclipse recognizes Maven's repository location using the M2_REPO variable.
    2. Go to Window -> Preferences -> Java -> Build Path -> Classpath Variables and check if M2_REPO is already added there.
      [Screenshot: M2_REPO classpath variable check]
    3. Add the tools.jar
      1. Hadoop uses some native APIs from Java and hence requires this tools.jar. Remember, in stage I, we created a link for tools.jar at /usr/lib/tools.jar (a quick recap of that link is sketched after this list). We need to tell Eclipse to use it too.
      2. Go to Window -> Preferences -> Java -> Installed JREs. Select your JRE; you should see 1.8 there.
      3. Push the “Add External JARs” button and navigate to your tools.jar. Mine is kept at /usr/lib/tools.jar (it is linked to /usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar on my machine).
        [Screenshot: adding tools.jar under Installed JREs]
    4. Formatting.
      1. Hadoop needs a specific kind of code formatting. You can download the formatter from here.
      2. Import the formatter into Eclipse.
        1. Go to Window -> Preferences -> Java -> Code Style -> Formatter and import the downloaded profile there.
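
For reference, below is a minimal sketch of the tools.jar link mentioned in step 3, assuming the OpenJDK 8 path from my configuration above. Adjust the JDK path for your machine.

  $ # recreate the stage I link so that both the build and Eclipse can find tools.jar
  $ sudo ln -s /usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar /usr/lib/tools.jar
  $ ls -l /usr/lib/tools.jar   # verify the link points at the JDK's tools.jar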

Okies, so all good, we are ready to launch

Once we have all this ready, as a step to ensure all our .proto files are converted to .java files, execute the following under the source root directory:

$ mvn generate-sources generate-test-sources

Once this succeeds, we can be sure all the .proto files have been converted to Java.
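
If you want to double-check, a quick find over the generated sources works too. This is only an illustrative spot check; the exact output paths vary per module, but the protoc output generally lands under each module's target/generated-sources and target/generated-test-sources:

  $ # count the Java files generated under the main and test generated-sources trees
  $ find . -path '*/target/generated-sources/*' -name '*.java' | wc -l
  $ find . -path '*/target/generated-test-sources/*' -name '*.java' | wc -l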

Now, we are ready to import. Just follow the steps below:

  1. Import project into Eclipse using the “Existing Maven Projects” option.
  2. The generated sources, i.e. the Java files generated from the .proto files, are not directly linked, and hence Eclipse shows a tsunami of errors. That is OK for now. We will resolve them on a case-by-case basis.
  3. Close all the projects in Eclipse.
  4. We can now open all projects one-by-one.
  5. Open hadoop-common, and click Yes when Eclipse asks you to open the related projects too.
  6. Now, you will see a lot of errors, but keep calm. These errors appear because Eclipse does not know the location of the Java files generated from the .proto files.
  7. Right click the “hadoop-common” project in Project Explorer -> Build Path -> Link Source.
  8. You need to include “generated-sources” as well as “generated-test-sources” as source folders. Ensure the “Linked Folder Location” goes all the way down to the java folder.
    [Screenshot: linking the generated-sources and generated-test-sources folders]
  9. In case you see an error like “The type package-info is already defined”, just exclude package-info.java from the source settings of the respective folder. See below:
    [Screenshot: excluding package-info.java from the source folder]
  10. Here is how my Java project looks:
    [Screenshot: the hadoop-common project layout in Eclipse]
  11. In Eclipse, you will encounter errors like “maven-resources-plugin prior to 2.4 is not supported by m2e. Use maven-resources-plugin version 2.4 or later.”
    1. This is because the latest m2e Eclipse plugin needs maven-resources-plugin 2.4 onwards, while Hadoop is still at 2.2.
    2. To resolve this issue, edit ./hadoop-project/pom.xml (search for maven-resources-plugin, or use the quick grep sketched after this list) and update the version to 2.4, similar to the snippet below:
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-resources-plugin</artifactId>
        <version>2.4</version>
      </plugin>
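
If you prefer the command line, here is a small illustrative way to locate that declaration from the Hadoop source root before editing it:

  $ # show the maven-resources-plugin declaration with a few lines of context
  $ grep -n -A 3 "maven-resources-plugin" hadoop-project/pom.xml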

After this exercise, you may still see some errors related to Maven, but we can ignore those because they are related to the M2E plugin issue mentioned above. Just ensure your referencing works using the golden shortcuts:

  1. F3 (go to a class's declaration)
  2. Ctrl + Shift + G (search references in the workspace)

Voila !

So, now you are all set to take a quick look at how the project is structured, how the code is organised, and so on. Try compiling a single project, making changes, etc.

hadoop-distcp is another project I like. Do explore it. It will give you a lot of insight.

This was the main part. From here onwards, we will see the structure of the Hadoop code, how we can debug this yellow elephant, and understand how it works.

Hope this was useful.

Thanks and regards,

Abhay Dandekar
