Class StreamingStep

java.lang.Object
com.amazonaws.services.elasticmapreduce.util.StreamingStep

public class StreamingStep extends Object
Class that makes it easy to define Hadoop Streaming steps.

See also: Hadoop Streaming

 // accessKey and secretKey are assumed to be defined elsewhere
 AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
 AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(credentials);

 HadoopJarStepConfig config = new StreamingStep()
     .withInputs("s3://elasticmapreduce/samples/wordcount/input")
     .withOutput("s3://my-bucket/output/")
     .withMapper("s3://elasticmapreduce/samples/wordcount/wordSplitter.py")
     .withReducer("aggregate")
     .toHadoopJarStepConfig();

 StepConfig wordCount = new StepConfig()
     .withName("Word Count")
     .withActionOnFailure("TERMINATE_JOB_FLOW")
     .withHadoopJarStep(config);

 RunJobFlowRequest request = new RunJobFlowRequest()
     .withName("Word Count")
     .withSteps(wordCount)
     .withLogUri("s3://log-bucket/")
     .withInstances(new JobFlowInstancesConfig()
         .withEc2KeyName("keypair")
         .withHadoopVersion("0.20")
         .withInstanceCount(5)
         .withKeepJobFlowAliveWhenNoSteps(true)
         .withMasterInstanceType("m1.small")
         .withSlaveInstanceType("m1.small"));

 RunJobFlowResult result = emr.runJobFlow(request);
 
  • Constructor Details

    • StreamingStep

      public StreamingStep()
      Creates a new default StreamingStep.
  • Method Details

    • getInputs

      public List<String> getInputs()
      Get the list of step input paths.
      Returns:
      List of step input paths.
    • setInputs

      public void setInputs(Collection<String> inputs)
      Set the list of step input paths.
      Parameters:
      inputs - Collection of step input paths.
    • withInputs

      public StreamingStep withInputs(String... inputs)
      Add more input paths to this step.
      Parameters:
      inputs - One or more input paths to add to this step.
      Returns:
      A reference to this updated object so that method calls can be chained together.
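      The descriptions above suggest that setInputs sets the whole list while withInputs adds to it. A minimal standalone sketch of that difference follows. InputsSketch is a hypothetical stand-in that only mirrors the documented method names; the replace-versus-append behavior is an assumption read from the descriptions, not the SDK's confirmed implementation, and the S3 paths are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in that mirrors the documented method names; not the SDK class.
final class InputsSketch {
    private List<String> inputs = new ArrayList<>();

    List<String> getInputs() {
        return inputs;
    }

    // Assumed behavior: replaces whatever was configured before.
    void setInputs(Collection<String> inputs) {
        this.inputs = new ArrayList<>(inputs);
    }

    // Assumed behavior: appends to the existing list and returns this for chaining.
    InputsSketch withInputs(String... inputs) {
        this.inputs.addAll(Arrays.asList(inputs));
        return this;
    }

    public static void main(String[] args) {
        InputsSketch step = new InputsSketch()
            .withInputs("s3://my-bucket/input/a")
            .withInputs("s3://my-bucket/input/b", "s3://my-bucket/input/c");
        System.out.println(step.getInputs());

        // Replace the accumulated paths with a single new one.
        step.setInputs(Arrays.asList("s3://my-bucket/input/d"));
        System.out.println(step.getInputs());
    }
}
```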
    • getOutput

      public String getOutput()
      Get the output path.
      Returns:
      Output path.
    • setOutput

      public void setOutput(String output)
      Set the output path for this step.
      Parameters:
      output - Output path.
    • withOutput

      public StreamingStep withOutput(String output)
      Set the output path for this step.
      Parameters:
      output - Output path.
      Returns:
      A reference to this updated object so that method calls can be chained together.
    • getMapper

      public String getMapper()
      Get the mapper.
      Returns:
      Mapper.
    • setMapper

      public void setMapper(String mapper)
      Set the mapper.
      Parameters:
      mapper - Mapper.
    • withMapper

      public StreamingStep withMapper(String mapper)
      Set the mapper.
      Parameters:
      mapper - Mapper.
      Returns:
      A reference to this updated object so that method calls can be chained together.
    • getReducer

      public String getReducer()
      Get the reducer.
      Returns:
      Reducer.
    • setReducer

      public void setReducer(String reducer)
      Set the reducer.
      Parameters:
      reducer - Reducer.
    • withReducer

      public StreamingStep withReducer(String reducer)
      Set the reducer.
      Parameters:
      reducer - Reducer.
      Returns:
      A reference to this updated object so that method calls can be chained together.
    • getHadoopConfig

      public Map<String,String> getHadoopConfig()
      Get the Hadoop config overrides (-D values).
      Returns:
      Hadoop config.
    • setHadoopConfig

      public void setHadoopConfig(Map<String,String> hadoopConfig)
      Set the Hadoop config overrides (-D values).
      Parameters:
      hadoopConfig - Hadoop config.
    • withHadoopConfig

      public StreamingStep withHadoopConfig(String key, String value)
      Add a Hadoop config override (-D value).
      Parameters:
      key - Hadoop configuration key.
      value - Configuration value.
      Returns:
      A reference to this updated object so that method calls can be chained together.
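      For context, Hadoop Streaming accepts configuration overrides on its command line as -D key=value. The standalone sketch below shows one way a map of overrides could be flattened into such arguments. HadoopConfigArgsSketch and toDArgs are hypothetical names, and how StreamingStep actually serializes its map is an assumption, not something this page confirms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of the AWS SDK.
final class HadoopConfigArgsSketch {

    // Turns each key/value pair into the two tokens "-D" and "key=value".
    static List<String> toDArgs(Map<String, String> hadoopConfig) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> entry : hadoopConfig.entrySet()) {
            args.add("-D");
            args.add(entry.getKey() + "=" + entry.getValue());
        }
        return args;
    }

    public static void main(String[] argv) {
        // LinkedHashMap keeps insertion order so the output is predictable.
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("mapred.reduce.tasks", "2");
        System.out.println(toDArgs(overrides));
    }
}
```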
    • toHadoopJarStepConfig

      public HadoopJarStepConfig toHadoopJarStepConfig()
      Creates the final HadoopJarStepConfig once you are done configuring the step. You can use this as you would any other HadoopJarStepConfig.
      Returns:
      HadoopJarStepConfig representing this streaming step.
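      To make the role of the configured fields more concrete, here is a standalone sketch of how they might map onto the Hadoop Streaming options -input, -output, -mapper and -reducer. StreamingArgsSketch and its toArgs method are hypothetical: the real method returns a HadoopJarStepConfig, and the argument order shown is an assumption. Returning a plain list keeps the sketch runnable without the AWS SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not the SDK class: it returns a plain argument list
// instead of a HadoopJarStepConfig so it runs without the AWS SDK.
final class StreamingArgsSketch {
    private final List<String> inputs = new ArrayList<>();
    private String output;
    private String mapper;
    private String reducer;

    StreamingArgsSketch withInputs(String... paths) {
        inputs.addAll(Arrays.asList(paths));
        return this;
    }

    StreamingArgsSketch withOutput(String path) {
        output = path;
        return this;
    }

    StreamingArgsSketch withMapper(String value) {
        mapper = value;
        return this;
    }

    StreamingArgsSketch withReducer(String value) {
        reducer = value;
        return this;
    }

    // Builds the option list; the order used here is an assumption.
    List<String> toArgs() {
        List<String> args = new ArrayList<>();
        for (String input : inputs) {
            args.add("-input");
            args.add(input);
        }
        args.addAll(Arrays.asList("-output", output, "-mapper", mapper, "-reducer", reducer));
        return args;
    }

    public static void main(String[] argv) {
        List<String> args = new StreamingArgsSketch()
            .withInputs("s3://elasticmapreduce/samples/wordcount/input")
            .withOutput("s3://my-bucket/output/")
            .withMapper("s3://elasticmapreduce/samples/wordcount/wordSplitter.py")
            .withReducer("aggregate")
            .toArgs();
        System.out.println(String.join(" ", args));
    }
}
```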