EMR
boto.emr
This module provides an interface to the Elastic MapReduce (EMR) service from AWS.
boto.emr.connection
Represents a connection to the EMR service.
class boto.emr.connection.EmrConnection(aws_access_key_id=None, aws_secret_access_key=None, is_secure=True, port=None, proxy=None, proxy_port=None, proxy_user=None, proxy_pass=None, debug=0, https_connection_factory=None, region=None, path='/')
APIVersion = '2009-03-31'
DebuggingArgs = 's3n://us-east-1.elasticmapreduce/libs/state-pusher/0.1/fetch'
DebuggingJar = 's3n://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar'
DefaultRegionEndpoint = 'elasticmapreduce.amazonaws.com'
DefaultRegionName = 'us-east-1'
ResponseError
alias of EmrResponseError
add_jobflow_steps(jobflow_id, steps)
Adds steps to a jobflow.
Parameters:
- jobflow_id (str) – The job flow id
- steps (list(boto.emr.Step)) – A list of steps to add to the job
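As a rough illustration of the call above, the sketch below wraps add_jobflow_steps in a small helper. The helper name append_steps, the function example, the job flow id "j-EXAMPLE", and the bucket path are all hypothetical placeholders, not part of boto. The boto import is kept inside example() so the helper can be exercised without boto installed or AWS credentials configured.

```python
def append_steps(conn, jobflow_id, steps):
    """Add one or more steps to an existing job flow.

    `conn` is expected to behave like boto.emr.connection.EmrConnection,
    i.e. to provide add_jobflow_steps(jobflow_id, steps).
    """
    steps = list(steps)
    if not steps:
        # Fail early instead of sending an empty request.
        raise ValueError("at least one step is required")
    return conn.add_jobflow_steps(jobflow_id, steps)


def example():
    # Assumption: boto 2.x is installed and AWS credentials are configured.
    from boto.emr.connection import EmrConnection
    from boto.emr.step import JarStep

    conn = EmrConnection()
    step = JarStep(
        name="example-jar-step",             # placeholder step name
        jar="s3n://example-bucket/job.jar",  # placeholder jar location
    )
    return append_steps(conn, "j-EXAMPLE", [step])  # placeholder job flow id
```

Calling example() would issue a real request, so it is only defined here and never invoked.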
describe_jobflow(jobflow_id)
Describes a single Elastic MapReduce job flow.
Parameters: jobflow_id (str) – The job flow id of interest
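One common use of describe_jobflow is polling until a job flow finishes. The following is a minimal sketch under two assumptions that this page does not confirm: that the returned object exposes the 'State' field as a lowercase state attribute, and that COMPLETED, FAILED and TERMINATED are the terminal state names. The helper name wait_for_jobflow and its parameters are hypothetical.

```python
import time

# Assumption: these are the terminal job flow state names reported by EMR.
TERMINAL_STATES = frozenset(["COMPLETED", "FAILED", "TERMINATED"])


def wait_for_jobflow(conn, jobflow_id, poll_seconds=30, max_polls=120,
                     sleep=time.sleep):
    """Poll describe_jobflow until the job flow reaches a terminal state.

    `conn` is expected to behave like boto.emr.connection.EmrConnection.
    Returns the terminal state name, or raises RuntimeError after
    `max_polls` attempts.
    """
    for _ in range(max_polls):
        jobflow = conn.describe_jobflow(jobflow_id)
        # Assumption: the 'State' field is available as `jobflow.state`.
        state = getattr(jobflow, "state", None)
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise RuntimeError("job flow %s did not finish in time" % jobflow_id)
```

The sleep function is injectable so the loop can be tested without waiting.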
describe_jobflows(states=None, jobflow_ids=None, created_after=None, created_before=None)
Retrieve all the Elastic MapReduce job flows on your account.
Parameters:
run_jobflow(name, log_uri, ec2_keyname=None, availability_zone=None, master_instance_type='m1.small', slave_instance_type='m1.small', num_instances=1, action_on_failure='TERMINATE_JOB_FLOW', keep_alive=False, enable_debugging=False, hadoop_version='0.18', steps=[], bootstrap_actions=[])
Runs a job flow.
Parameters:
- name (str) – Name of the job flow
- log_uri (str) – URI of the S3 bucket to place logs
- ec2_keyname (str) – EC2 key used for the instances
- availability_zone (str) – EC2 availability zone of the cluster
- master_instance_type (str) – EC2 instance type of the master
- slave_instance_type (str) – EC2 instance type of the slave nodes
- num_instances (int) – Number of instances in the Hadoop cluster
- action_on_failure (str) – Action to take if a step terminates
- keep_alive (bool) – Denotes whether the cluster should stay alive upon completion
- enable_debugging (bool) – Denotes whether AWS console debugging should be enabled
- steps (list(boto.emr.Step)) – List of steps to add with the job
Returns: The jobflow id
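To show how the parameters above fit together, here is a minimal sketch of a launcher that forwards a subset of them to run_jobflow by keyword. The helper name launch_jobflow is hypothetical. The keyword names come from the signature above, and any parameter not passed keeps the default shown there.

```python
def launch_jobflow(conn, name, log_uri, steps, num_instances=1,
                   keep_alive=False):
    """Start a job flow and return whatever run_jobflow returns.

    `conn` is expected to behave like boto.emr.connection.EmrConnection.
    According to the documentation above, the return value is the
    jobflow id.
    """
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    return conn.run_jobflow(
        name=name,
        log_uri=log_uri,
        num_instances=num_instances,
        keep_alive=keep_alive,
        steps=list(steps),  # copy so the caller's list is not shared
    )
```

Passing everything by keyword keeps the call readable given the long parameter list, and the copy of steps avoids handing the connection a list the caller may later mutate.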
boto.emr.step
class boto.emr.step.JarStep(name, jar, main_class=None, action_on_failure='TERMINATE_JOB_FLOW', step_args=None)
Custom jar step
An Elastic MapReduce step that executes a jar.
Parameters:
args()
jar()
main_class()
class boto.emr.step.Step
Jobflow Step base class
args()
Return type: list(str)
Returns: List of arguments for the step
class boto.emr.step.StreamingStep(name, mapper, reducer=None, action_on_failure='TERMINATE_JOB_FLOW', cache_files=None, cache_archives=None, step_args=None, input=None, output=None, jar='/home/hadoop/contrib/streaming/hadoop-streaming.jar')
Hadoop streaming step
A Hadoop streaming Elastic MapReduce step.
Parameters:
- name (str) – The name of the step
- mapper (str) – The mapper URI
- reducer (str) – The reducer URI
- action_on_failure (str) – An action, defined in the EMR docs, to take on failure
- cache_files (list(str)) – A list of cache files to be bundled with the job
- cache_archives (list(str)) – A list of jar archives to be bundled with the job
- step_args (list(str)) – A list of arguments to pass to the step
- input (str or a list of str) – The input URI
- output (str) – The output URI
- jar (str) – The Hadoop streaming jar. This can be either a local path on the master node or an s3:// URI.
args()
jar()
main_class()
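The StreamingStep parameters documented above can be assembled as in the sketch below. The helper names streaming_step_kwargs and build_streaming_step are hypothetical. The dictionary keys match the constructor signature above, and the input handling follows the documented "str or a list of str" type. The boto import is kept inside build_streaming_step so the argument handling can be checked without boto installed.

```python
def streaming_step_kwargs(name, mapper, reducer, input_uris, output_uri):
    """Build keyword arguments for boto.emr.step.StreamingStep.

    `input_uris` may be a single URI string or an iterable of URI strings.
    """
    if isinstance(input_uris, str):
        input_value = input_uris
    else:
        input_value = list(input_uris)
        if not input_value:
            raise ValueError("at least one input URI is required")
    return {
        "name": name,
        "mapper": mapper,
        "reducer": reducer,
        "input": input_value,
        "output": output_uri,
    }


def build_streaming_step(**kwargs):
    # Assumption: boto 2.x is installed; imported lazily on purpose.
    from boto.emr.step import StreamingStep
    return StreamingStep(**kwargs)
```

A step built this way could then be passed in the steps list of run_jobflow or add_jobflow_steps.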
boto.emr.emrobject
This module contains EMR response objects
class boto.emr.emrobject.AddInstanceGroupsResponse(connection=None)
Fields = set(['InstanceGroupIds', 'JobFlowId'])
class boto.emr.emrobject.BootstrapAction(connection=None)
Fields = set(['Path', 'Args', 'Name'])
startElement(name, attrs, connection)
class boto.emr.emrobject.EmrObject(connection=None)
Fields = set([])
endElement(name, value, connection)
startElement(name, attrs, connection)
class boto.emr.emrobject.InstanceGroup(connection=None)
Fields = set(['ReadyDateTime', 'InstanceType', 'InstanceRole', 'EndDateTime', 'InstanceRunningCount', 'State', 'BidPrice', 'Market', 'StartDateTime', 'Name', 'InstanceGroupId', 'CreationDateTime', 'InstanceRequestCount', 'LastStateChangeReason', 'LaunchGroup'])
class boto.emr.emrobject.JobFlow(connection=None)
Fields = set(['TerminationProtected', 'MasterInstanceId', 'State', 'HadoopVersion', 'LogUri', 'Ec2KeyName', 'ReadyDateTime', 'Type', 'JobFlowId', 'CreationDateTime', 'LastStateChangeReason', 'Name', 'EndDateTime', 'Value', 'InstanceCount', 'RequestId', 'StartDateTime', 'SlaveInstanceType', 'AvailabilityZone', 'MasterPublicDnsName', 'NormalizedInstanceHours', 'MasterInstanceType', 'KeepJobFlowAliveWhenNoSteps', 'Id'])
startElement(name, attrs, connection)
-