Note

You are viewing the documentation for an older version of boto (boto2).

Boto3, the next version of Boto, is now stable and recommended for general use. It can be used side-by-side with Boto in the same project, so it is easy to start using Boto3 in your existing projects as well as new projects. Going forward, API updates and all new feature work will be focused on Boto3.

For more information, see the documentation for boto3.

Cloudsearch

boto.cloudsearch

boto.cloudsearch.connect_to_region(region_name, **kw_params)
boto.cloudsearch.regions()

Get all available regions for the Amazon CloudSearch service.

Return type:list
Returns:A list of boto.regioninfo.RegionInfo

boto.cloudsearch.domain

class boto.cloudsearch.domain.Domain(layer1, data)

A CloudSearch domain.

Variables:
  • name – The name of the domain.
  • id – The internally generated unique identifier for the domain.
  • created – A boolean which is True if the domain is created. It can take several minutes to initialize a domain when CreateDomain is called. Newly created search domains are returned with a False value for created until domain creation is complete.
  • deleted – A boolean which is True if the search domain has been deleted. The system must clean up resources dedicated to the search domain when delete is called. Newly deleted search domains are returned from list_domains with a True value for deleted for several minutes until resource cleanup is complete.
  • processing – True if processing is being done to activate the current domain configuration.
  • num_searchable_docs – The number of documents that have been submitted to the domain and indexed.
  • requires_index_documents – True if index_documents needs to be called to activate the current domain configuration.
  • search_instance_count – The number of search instances that are available to process search requests.
  • search_instance_type – The instance type that is being used to process search requests.
  • search_partition_count – The number of partitions across which the search index is spread.
create_index_field(field_name, field_type, default='', facet=False, result=False, searchable=False, source_attributes=[])

Defines an IndexField, either replacing an existing definition or creating a new one.

Parameters:
  • field_name (string) – The name of a field in the search index.
  • field_type (string) – The type of field. Valid values are uint | literal | text
  • default (string or int) – The default value for the field. If the field is of type uint this should be an integer value. Otherwise, it’s a string.
  • facet (bool) – A boolean to indicate whether facets are enabled for this field or not. Does not apply to fields of type uint.
  • result (bool) – A boolean to indicate whether values of this field can be returned in search results or used in ranking. Does not apply to fields of type uint.
  • searchable (bool) – A boolean to indicate whether search is enabled for this field or not. Applies only to fields of type literal.
  • source_attributes (list of dicts) –

    An optional list of dicts that provide information about attributes for this index field. A maximum of 20 source attributes can be configured for each index field.

    Each item in the list is a dict with the following keys:

    • data_copy - The value is a dict with the following keys:
      • default - Optional default value if the source attribute
        is not specified in a document.
      • name - The name of the document source field to add
        to this IndexField.
    • data_function - Identifies the transformation to apply
      when copying data from a source attribute.
    • data_map - The value is a dict with the following keys:
      • cases - A dict that translates source field values
        to custom values.
      • default - An optional default value to use if the
        source attribute is not specified in a document.
      • name - the name of the document source field to add
        to this IndexField
    • data_trim_title - Trims common title words from a source
      document attribute when populating an IndexField. This can be used to create an IndexField you can use for sorting. The value is a dict with the following fields:
      • default - An optional default value.
      • language - An IETF RFC 4646 language code.
      • separator - The separator that follows the text to trim.
      • name - The name of the document source field to add.
Raises:

BaseException, InternalException, LimitExceededException, InvalidTypeException, ResourceNotFoundException
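As a rough illustration of the source_attributes structure described above, the sketch below builds a two-item list and enforces the documented limit of 20 entries. The helper name check_source_attributes and the document field names (title, genre) are hypothetical; only the dict keys come from the parameter description. The data_function key is left out because its valid values are not listed here.

```python
MAX_SOURCE_ATTRIBUTES = 20  # limit stated in the parameter description

# Keys named in the parameter description for each list item.
ALLOWED_KEYS = {'data_copy', 'data_function', 'data_map', 'data_trim_title'}


def check_source_attributes(source_attributes):
    """Hypothetical helper: sanity-check the list before passing it on."""
    if len(source_attributes) > MAX_SOURCE_ATTRIBUTES:
        raise ValueError('at most %d source attributes are allowed'
                         % MAX_SOURCE_ATTRIBUTES)
    for attr in source_attributes:
        unknown = set(attr) - ALLOWED_KEYS
        if unknown:
            raise ValueError('unknown keys: %s' % sorted(unknown))
    return source_attributes


source_attributes = check_source_attributes([
    # Copy the document's "title" field, falling back to a default.
    {'data_copy': {'name': 'title', 'default': 'untitled'}},
    # Translate raw "genre" values to custom values.
    {'data_map': {'name': 'genre',
                  'cases': {'sci-fi': 'science fiction'},
                  'default': 'unknown'}},
])
```

The resulting list could then be passed as the source_attributes argument of create_index_field.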

create_rank_expression(name, expression)

Create a new rank expression.

Parameters:
  • name (string) – The name of an expression computed for ranking while processing a search request.
  • expression (string) –

    The expression to evaluate for ranking or thresholding while processing a search request. The RankExpression syntax is based on JavaScript expressions and supports:

    • Integer, floating point, hex and octal literals
    • Shortcut evaluation of logical operators such that an
      expression a || b evaluates to the value a if a is true without evaluating b at all
    • JavaScript order of precedence for operators
    • Arithmetic operators: + - * / %
    • Boolean operators (including the ternary operator)
    • Bitwise operators
    • Comparison operators
    • Common mathematical functions: abs ceil erf exp floor
      lgamma ln log2 log10 max min sqrt pow
    • Trigonometric library functions: acosh acos asinh asin
      atanh atan cosh cos sinh sin tanh tan
    • Random generation of a number between 0 and 1: rand
    • Current time in epoch: time
    • The min max functions that operate on a variable argument list

    Intermediate results are calculated as double precision floating point values. The final return value of a RankExpression is automatically converted from floating point to a 32-bit unsigned integer by rounding to the nearest integer, with a natural floor of 0 and a ceiling of max(uint32_t), 4294967295. Mathematical errors such as dividing by 0 will fail during evaluation and return a value of 0.

    The source data for a RankExpression can be the name of an IndexField of type uint, another RankExpression or the reserved name text_relevance. The text_relevance source is defined to return an integer from 0 to 1000 (inclusive) to indicate how relevant a document is to the search request, taking into account repetition of search terms in the document and proximity of search terms to each other in each matching IndexField in the document.

    For more information about using rank expressions to customize ranking, see the Amazon CloudSearch Developer Guide.

Raises:

BaseException, InternalException, LimitExceededException, InvalidTypeException, ResourceNotFoundException
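The conversion of a rank expression's final value to a 32-bit unsigned integer, as described above, can be emulated locally to reason about edge cases. This is only a sketch of the documented behaviour, not code from boto or CloudSearch; the function names are hypothetical, and the handling of NaN and of exact .5 values (rounded up here) is an assumption.

```python
import math

UINT32_MAX = 4294967295  # ceiling quoted in the description


def to_rank_value(x):
    """Clamp and round a float the way the description says results are."""
    if math.isnan(x):
        return 0  # assumption: treated like a failed evaluation
    if x <= 0:
        return 0  # natural floor of 0
    if x >= UINT32_MAX:
        return UINT32_MAX  # ceiling of max(uint32_t)
    return int(math.floor(x + 0.5))  # round to nearest (half up, assumed)


def evaluate(fn):
    """Run a Python stand-in for an expression; errors yield 0."""
    try:
        return to_rank_value(fn())
    except ZeroDivisionError:
        return 0


text_relevance = 750  # example value in the documented 0..1000 range
boosted = evaluate(lambda: text_relevance * 1.5 + 0.2)
failed = evaluate(lambda: text_relevance / 0)
```

Running such a stand-in before defining an expression can help confirm that intermediate values stay inside the uint32 range.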

created
delete()

Delete this domain and all index data associated with it.

deleted
doc_service_arn
doc_service_endpoint
get_access_policies()

Return a boto.cloudsearch.option.OptionStatus object representing the currently defined access policies for the domain.

get_document_service()
get_index_fields(field_names=None)

Return a list of index fields defined for this domain.

get_rank_expressions(rank_names=None)

Return a list of rank expressions defined for this domain.

get_search_service()
get_stemming()

Return a boto.cloudsearch.option.OptionStatus object representing the currently defined stemming options for the domain.

get_stopwords()

Return a boto.cloudsearch.option.OptionStatus object representing the currently defined stopword options for the domain.

get_synonyms()

Return a boto.cloudsearch.option.OptionStatus object representing the currently defined synonym options for the domain.

id
index_documents()

Tells the search domain to start indexing its documents using the latest text processing options and IndexFields. This operation must be invoked to make options whose OptionStatus has an OptionState of RequiresIndexDocuments visible in search results.
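A minimal sketch of how the requires_index_documents attribute and index_documents() might be combined is shown below. The helper reindex_if_needed and the FakeDomain stand-in are hypothetical; the stand-in exists only so the example runs without AWS credentials, and a real boto.cloudsearch.domain.Domain would be passed in its place.

```python
def reindex_if_needed(domain):
    """Call index_documents() only when the domain reports it is required.

    `domain` is expected to behave like boto.cloudsearch.domain.Domain.
    Returns True if indexing was requested, False otherwise.
    """
    if domain.requires_index_documents:
        domain.index_documents()
        return True
    return False


class FakeDomain(object):
    """Stand-in for a Domain object, used only for this illustration."""

    def __init__(self, requires):
        self.requires_index_documents = requires
        self.calls = 0

    def index_documents(self):
        self.calls += 1
        self.requires_index_documents = False


domain = FakeDomain(requires=True)
first = reindex_if_needed(domain)   # requests indexing
second = reindex_if_needed(domain)  # nothing left to do
```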

name
num_searchable_docs
processing
requires_index_documents
search_instance_count
search_partition_count
search_service_arn
search_service_endpoint
update_from_data(data)
boto.cloudsearch.domain.handle_bool(value)

boto.cloudsearch.exceptions

boto.cloudsearch.layer1

class boto.cloudsearch.layer1.Layer1(aws_access_key_id=None, aws_secret_access_key=None, is_secure=True, host=None, port=None, proxy=None, proxy_port=None, proxy_user=None, proxy_pass=None, debug=0, https_connection_factory=None, region=None, path='/', api_version=None, security_token=None, validate_certs=True, profile_name=None)
APIVersion = '2011-02-01'
DefaultRegionEndpoint = 'cloudsearch.us-east-1.amazonaws.com'
DefaultRegionName = 'us-east-1'
create_domain(domain_name)

Create a new search domain.

Parameters:domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Raises:BaseException, InternalException, LimitExceededException
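The naming rules above can be checked locally before calling create_domain. The following is a sketch that encodes only the rules stated in this description (allowed characters and first character); the function name is hypothetical, and any length limits the service may impose are not covered.

```python
import re

# First character: lowercase letter or digit; the rest may also use hyphens.
_DOMAIN_NAME_RE = re.compile(r'[a-z0-9][a-z0-9-]*\Z')


def is_valid_domain_name(name):
    """Return True if `name` satisfies the character rules described above."""
    return bool(_DOMAIN_NAME_RE.match(name))
```

For example, is_valid_domain_name('my-domain-1') is True, while 'My_Domain' and '-leading-hyphen' are rejected.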
define_index_field(domain_name, field_name, field_type, default='', facet=False, result=False, searchable=False, source_attributes=None)

Defines an IndexField, either replacing an existing definition or creating a new one.

Parameters:
  • domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
  • field_name (string) – The name of a field in the search index.
  • field_type (string) – The type of field. Valid values are uint | literal | text
  • default (string or int) – The default value for the field. If the field is of type uint this should be an integer value. Otherwise, it’s a string.
  • facet (bool) – A boolean to indicate whether facets are enabled for this field or not. Does not apply to fields of type uint.
  • result (bool) – A boolean to indicate whether values of this field can be returned in search results or used in ranking. Does not apply to fields of type uint.
  • searchable (bool) – A boolean to indicate whether search is enabled for this field or not. Applies only to fields of type literal.
  • source_attributes (list of dicts) –

    An optional list of dicts that provide information about attributes for this index field. A maximum of 20 source attributes can be configured for each index field.

    Each item in the list is a dict with the following keys:

    • data_copy - The value is a dict with the following keys:
      • default - Optional default value if the source attribute
        is not specified in a document.
      • name - The name of the document source field to add
        to this IndexField.
    • data_function - Identifies the transformation to apply
      when copying data from a source attribute.
    • data_map - The value is a dict with the following keys:
      • cases - A dict that translates source field values
        to custom values.
      • default - An optional default value to use if the
        source attribute is not specified in a document.
      • name - the name of the document source field to add
        to this IndexField
    • data_trim_title - Trims common title words from a source
      document attribute when populating an IndexField. This can be used to create an IndexField you can use for sorting. The value is a dict with the following fields:
      • default - An optional default value.
      • language - An IETF RFC 4646 language code.
      • separator - The separator that follows the text to trim.
      • name - The name of the document source field to add.
Raises:

BaseException, InternalException, LimitExceededException, InvalidTypeException, ResourceNotFoundException

define_rank_expression(domain_name, rank_name, rank_expression)

Defines a RankExpression, either replacing an existing definition or creating a new one.

Parameters:
  • domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
  • rank_name (string) – The name of an expression computed for ranking while processing a search request.
  • rank_expression (string) –

    The expression to evaluate for ranking or thresholding while processing a search request. The RankExpression syntax is based on JavaScript expressions and supports:

    • Integer, floating point, hex and octal literals
    • Shortcut evaluation of logical operators such that an
      expression a || b evaluates to the value a if a is true without evaluating b at all
    • JavaScript order of precedence for operators
    • Arithmetic operators: + - * / %
    • Boolean operators (including the ternary operator)
    • Bitwise operators
    • Comparison operators
    • Common mathematical functions: abs ceil erf exp floor
      lgamma ln log2 log10 max min sqrt pow
    • Trigonometric library functions: acosh acos asinh asin
      atanh atan cosh cos sinh sin tanh tan
    • Random generation of a number between 0 and 1: rand
    • Current time in epoch: time
    • The min max functions that operate on a variable argument list

    Intermediate results are calculated as double precision floating point values. The final return value of a RankExpression is automatically converted from floating point to a 32-bit unsigned integer by rounding to the nearest integer, with a natural floor of 0 and a ceiling of max(uint32_t), 4294967295. Mathematical errors such as dividing by 0 will fail during evaluation and return a value of 0.

    The source data for a RankExpression can be the name of an IndexField of type uint, another RankExpression or the reserved name text_relevance. The text_relevance source is defined to return an integer from 0 to 1000 (inclusive) to indicate how relevant a document is to the search request, taking into account repetition of search terms in the document and proximity of search terms to each other in each matching IndexField in the document.

    For more information about using rank expressions to customize ranking, see the Amazon CloudSearch Developer Guide.

Raises:

BaseException, InternalException, LimitExceededException, InvalidTypeException, ResourceNotFoundException

delete_domain(domain_name)

Delete a search domain.

Parameters:domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Raises:BaseException, InternalException
delete_index_field(domain_name, field_name)

Deletes an existing IndexField from the search domain.

Parameters:
  • domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
  • field_name (string) – A string that represents the name of an index field. Field names must begin with a letter and can contain the following characters: a-z (lowercase), 0-9, and _ (underscore). Uppercase letters and hyphens are not allowed. The names “body”, “docid”, and “text_relevance” are reserved and cannot be specified as field or rank expression names.
Raises:

BaseException, InternalException, ResourceNotFoundException

delete_rank_expression(domain_name, rank_name)

Deletes an existing RankExpression from the search domain.

Parameters:
  • domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
  • rank_name (string) – Name of the RankExpression to delete.
Raises:

BaseException, InternalException, ResourceNotFoundException

describe_default_search_field(domain_name)

Describes options defining the default search field used by indexing for the search domain.

Parameters:domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Raises:BaseException, InternalException, ResourceNotFoundException
describe_domains(domain_names=None)

Describes the domains (optionally limited to one or more domains by name) owned by this account.

Parameters:domain_names (list) – Limits the response to the specified domains.
Raises:BaseException, InternalException
describe_index_fields(domain_name, field_names=None)

Describes index fields in the search domain, optionally limited to a single IndexField.

Parameters:
  • domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
  • field_names (list) – Limits the response to the specified fields.
Raises:

BaseException, InternalException, ResourceNotFoundException

describe_rank_expressions(domain_name, rank_names=None)

Describes RankExpressions in the search domain, optionally limited to a single expression.

Parameters:
  • domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
  • rank_names (list) – Limit response to the specified rank names.
Raises:

BaseException, InternalException, ResourceNotFoundException

describe_service_access_policies(domain_name)

Describes the resource-based policies controlling access to the services in this search domain.

Parameters:domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Raises:BaseException, InternalException, ResourceNotFoundException
describe_stemming_options(domain_name)

Describes stemming options used by indexing for the search domain.

Parameters:domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Raises:BaseException, InternalException, ResourceNotFoundException
describe_stopword_options(domain_name)

Describes stopword options used by indexing for the search domain.

Parameters:domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Raises:BaseException, InternalException, ResourceNotFoundException
describe_synonym_options(domain_name)

Describes synonym options used by indexing for the search domain.

Parameters:domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Raises:BaseException, InternalException, ResourceNotFoundException
get_response(doc_path, action, params, path='/', parent=None, verb='GET', list_marker=None)
index_documents(domain_name)

Tells the search domain to start scanning its documents using the latest text processing options and IndexFields. This operation must be invoked to make visible in searches any options whose OptionStatus has an OptionState of RequiresIndexDocuments.

Parameters:domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Raises:BaseException, InternalException, ResourceNotFoundException
update_default_search_field(domain_name, default_search_field)

Updates options defining the default search field used by indexing for the search domain.

Parameters:
  • domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
  • default_search_field (string) – The IndexField to use for search requests issued with the q parameter. The default is an empty string, which automatically searches all text fields.
Raises:

BaseException, InternalException, InvalidTypeException, ResourceNotFoundException

update_service_access_policies(domain_name, access_policies)

Updates the policies controlling access to the services in this search domain.

Parameters:
  • domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
  • access_policies (string) – An IAM access policy as described in The Access Policy Language in Using AWS Identity and Access Management. The maximum size of an access policy document is 100KB.
Raises:

BaseException, InternalException, LimitExceededException, ResourceNotFoundException, InvalidTypeException

update_stemming_options(domain_name, stems)

Updates stemming options used by indexing for the search domain.

Parameters:
  • domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
  • stems (string) – Maps terms to their stems. The JSON object has a single key called “stems” whose value is a dict mapping terms to their stems. The maximum size of a stemming document is 500KB. Example: {"stems": {"people": "person", "walking": "walk"}}
Raises:

BaseException, InternalException, InvalidTypeException, LimitExceededException, ResourceNotFoundException
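Since stems is passed as a JSON string, it may be convenient to build it from a Python dict. The sketch below does that with the standard json module and checks the 500KB limit; the helper name is hypothetical, and interpreting "500KB" as 500 * 1024 bytes of UTF-8 is an assumption.

```python
import json

MAX_STEMS_BYTES = 500 * 1024  # "500KB", assumed to mean 500 * 1024 bytes


def build_stems_document(mapping):
    """Serialize a term -> stem dict into the documented JSON shape."""
    doc = json.dumps({'stems': mapping}, sort_keys=True)
    if len(doc.encode('utf-8')) > MAX_STEMS_BYTES:
        raise ValueError('stemming document exceeds 500KB')
    return doc


stems = build_stems_document({'people': 'person', 'walking': 'walk'})
# stems == '{"stems": {"people": "person", "walking": "walk"}}'
```

The returned string is what would be supplied as the stems argument.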

update_stopword_options(domain_name, stopwords)

Updates stopword options used by indexing for the search domain.

Parameters:
  • domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
  • stopwords (string) – Lists stopwords in a JSON object. The object has a single key called “stopwords” whose value is an array of strings. The maximum size of a stopwords document is 10KB. Example: {"stopwords": ["a", "an", "the", "of"]}
Raises:

BaseException, InternalException, InvalidTypeException, LimitExceededException, ResourceNotFoundException

update_synonym_options(domain_name, synonyms)

Updates synonym options used by indexing for the search domain.

Parameters:
  • domain_name (string) – A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
  • synonyms (string) – Maps terms to their synonyms. The JSON object has a single key called “synonyms” whose value is a dict mapping terms to their synonyms. Each synonym is a simple string or an array of strings. The maximum size of a synonyms document is 100KB. Example: {"synonyms": {"cat": ["feline", "kitten"], "puppy": "dog"}}
Raises:

BaseException, InternalException, InvalidTypeException, LimitExceededException, ResourceNotFoundException
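Because each synonym value may be either a single string or an array of strings, a small validation step before serializing can catch malformed input early. The following sketch assumes Python 3 and a hypothetical helper name; it only checks the shape described in the parameter text.

```python
import json


def build_synonyms_document(mapping):
    """Validate and serialize a term -> synonym(s) dict."""
    for term, value in mapping.items():
        if isinstance(value, str):
            continue  # a simple string is allowed
        if (isinstance(value, (list, tuple))
                and all(isinstance(v, str) for v in value)):
            continue  # so is an array of strings
        raise TypeError('synonyms for %r must be a string or a list of '
                        'strings' % term)
    return json.dumps({'synonyms': mapping}, sort_keys=True)


synonyms = build_synonyms_document({'cat': ['feline', 'kitten'],
                                    'puppy': 'dog'})
```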

boto.cloudsearch.layer1.do_bool(val)

boto.cloudsearch.layer2

class boto.cloudsearch.layer2.Layer2(aws_access_key_id=None, aws_secret_access_key=None, is_secure=True, port=None, proxy=None, proxy_port=None, host=None, debug=0, session_token=None, region=None, validate_certs=True)
create_domain(domain_name)

Create a new CloudSearch domain and return the corresponding boto.cloudsearch.domain.Domain object.

list_domains(domain_names=None)

Return a list of boto.cloudsearch.domain.Domain objects for each domain defined in the current account.

lookup(domain_name)

Look up a single domain.

Parameters:domain_name (str) – The name of the domain to look up

Returns:Domain object, or None if the domain isn’t found
Return type:boto.cloudsearch.domain.Domain
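Because lookup returns None when the domain isn't found, it combines naturally with create_domain in a "get or create" pattern. The sketch below is one way to write it; the helper and the FakeLayer2 stand-in are hypothetical, and the stand-in only mimics the two documented methods so the example runs without AWS access.

```python
def get_or_create_domain(conn, domain_name):
    """Return the existing domain, creating it if lookup() finds nothing.

    `conn` is expected to behave like boto.cloudsearch.layer2.Layer2.
    """
    domain = conn.lookup(domain_name)
    if domain is None:
        domain = conn.create_domain(domain_name)
    return domain


class FakeLayer2(object):
    """Stand-in exposing lookup() and create_domain() for illustration."""

    def __init__(self):
        self.domains = {}
        self.created = 0

    def lookup(self, domain_name):
        return self.domains.get(domain_name)

    def create_domain(self, domain_name):
        self.created += 1
        self.domains[domain_name] = {'name': domain_name}
        return self.domains[domain_name]


conn = FakeLayer2()
first = get_or_create_domain(conn, 'demo')   # creates the domain
second = get_or_create_domain(conn, 'demo')  # finds the existing one
```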

boto.cloudsearch.optionstatus

class boto.cloudsearch.optionstatus.IndexFieldStatus(domain, data=None, refresh_fn=None, save_fn=None)
save()
class boto.cloudsearch.optionstatus.OptionStatus(domain, data=None, refresh_fn=None, save_fn=None)

Presents a combination of status fields (defined below), which are accessed as attributes, and option values, which are stored in the native Python dictionary. In this class, the option values are merged from a JSON object that is stored as the Option part of the object.

Variables:
  • domain_name – The name of the domain this option is associated with.
  • create_date – A timestamp for when this option was created.
  • state

    The state of processing a change to an option. Possible values:

    • RequiresIndexDocuments: the option’s latest value will not be visible in searches until IndexDocuments has been called and indexing is complete.
    • Processing: the option’s latest value is not yet visible in all searches but is in the process of being activated.
    • Active: the option’s latest value is completely visible.
  • update_date – A timestamp for when this option was updated.
  • update_version – A unique integer that indicates when this option was last updated.
endElement(name, value, connection)
refresh(data=None)

Refresh the local state of the object. You can either pass new state data in as the parameter data or, if that parameter is omitted, the state data will be retrieved from CloudSearch.

save()

Write the current state of the local object back to the CloudSearch service.

startElement(name, attrs, connection)
to_json()

Return the JSON representation of the options as a string.

wait_for_state(state)

Performs polling of CloudSearch to wait for the state of this object to change to the provided state.
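The general polling pattern behind a method like this can be sketched as follows. This is not boto's implementation: the function name, the interval, and the attempt limit are all assumptions, and the state source is a plain callable so the example runs without contacting CloudSearch.

```python
import time


def poll_until_state(get_state, target, interval=5.0, max_attempts=60,
                     sleep=time.sleep):
    """Call get_state() repeatedly until it returns `target`.

    Returns True if the target state was seen, False if the attempts ran out.
    """
    for _ in range(max_attempts):
        if get_state() == target:
            return True
        sleep(interval)
    return False


# Simulated sequence of states, using the values listed for `state` above.
states = iter(['RequiresIndexDocuments', 'Processing', 'Active'])
reached = poll_until_state(lambda: next(states), 'Active',
                           sleep=lambda seconds: None)  # skip real waiting
```

Bounding the number of attempts avoids waiting forever if the option never reaches the requested state.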

class boto.cloudsearch.optionstatus.RankExpressionStatus(domain, data=None, refresh_fn=None, save_fn=None)
class boto.cloudsearch.optionstatus.ServicePoliciesStatus(domain, data=None, refresh_fn=None, save_fn=None)
allow_doc_ip(ip)

Add the provided IP address or CIDR block to the list of allowable addresses for the document service.

Parameters:ip (string) – An IP address or CIDR block you wish to grant access to.
allow_search_ip(ip)

Add the provided IP address or CIDR block to the list of allowable addresses for the search service.

Parameters:ip (string) – An IP address or CIDR block you wish to grant access to.
disallow_doc_ip(ip)

Remove the provided IP address or CIDR block from the list of allowable addresses for the document service.

Parameters:ip (string) – An IP address or CIDR block you wish to revoke access from.
disallow_search_ip(ip)

Remove the provided IP address or CIDR block from the list of allowable addresses for the search service.

Parameters:ip (string) – An IP address or CIDR block you wish to revoke access from.
new_statement(arn, ip)

Returns a new policy statement that will allow access to the service described by arn by the ip specified in ip.

Parameters:
  • arn (string) – The Amazon Resource Name (ARN) of the service you wish to provide access to. This would be either the search service or the document service.
  • ip (string) – An IP address or CIDR block you wish to grant access to.
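To make the idea of an IP-based policy statement concrete, here is a sketch of what such a statement could look like in the IAM access policy language. The exact fields that new_statement produces are not documented here, so the shape below (including the wildcard action) is an assumption, and the helper name and ARN are hypothetical placeholders.

```python
import json


def make_statement(arn, ip):
    """Build an IAM-style statement allowing `ip` to access `arn`."""
    return {
        'Effect': 'Allow',
        'Action': '*',  # assumed; a narrower action may be appropriate
        'Resource': arn,
        'Condition': {'IpAddress': {'aws:SourceIp': [ip]}},
    }


# Placeholder ARN for illustration only.
example_arn = 'arn:aws:cs:us-east-1:123456789012:search/example'
statement = make_statement(example_arn, '192.0.2.0/24')
policy_json = json.dumps({'Statement': [statement]})
```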

boto.cloudsearch.search

exception boto.cloudsearch.search.CommitMismatchError
class boto.cloudsearch.search.Query(q=None, bq=None, rank=None, return_fields=None, size=10, start=0, facet=None, facet_constraints=None, facet_sort=None, facet_top_n=None, t=None)
RESULTS_PER_PAGE = 500
to_params()

Transform search parameters from instance properties to a dictionary

Return type:dict
Returns:search parameters
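The kind of transformation to_params performs can be pictured with the sketch below, which flattens a few of the constructor arguments into a request dictionary. It is not boto's code: the function name is hypothetical, and the wire-level key names (such as return-fields) and the comma-joining of lists are assumptions about the search request format.

```python
def build_search_params(q=None, bq=None, rank=None, return_fields=None,
                        size=10, start=0):
    """Flatten search options into a dict of request parameters."""
    params = {'start': start, 'size': size}
    if q:
        params['q'] = q
    if bq:
        params['bq'] = bq
    if rank:
        params['rank'] = ','.join(rank)  # e.g. ['-year', 'author']
    if return_fields:
        params['return-fields'] = ','.join(return_fields)
    return params


params = build_search_params(q='Tim', rank=['-year', 'author'],
                             return_fields=['headline'])
```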
update_size(new_size)
class boto.cloudsearch.search.SearchConnection(domain=None, endpoint=None)
build_query(q=None, bq=None, rank=None, return_fields=None, size=10, start=0, facet=None, facet_constraints=None, facet_sort=None, facet_top_n=None, t=None)
get_all_hits(query)

Get a generator to iterate over all search results

Transparently handles the paging of CloudSearch search results, so even if you have many thousands of results you can iterate over all of them in a reasonably efficient manner.

Parameters:query (boto.cloudsearch.search.Query) – A group of search criteria
Return type:generator
Returns:All docs matching query
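The transparent paging described above can be illustrated with a generator that requests one page at a time and yields individual hits. This is a sketch of the technique rather than boto's implementation; fetch_page is a hypothetical callable standing in for a search request, and the in-memory DOCS list replaces a real domain.

```python
def iter_all_hits(fetch_page, per_page=500):
    """Yield every hit, requesting pages of `per_page` results as needed.

    fetch_page(start, size) must return (hits, total_found).
    """
    start = 0
    while True:
        hits, total = fetch_page(start, per_page)
        for hit in hits:
            yield hit
        start += len(hits)
        if not hits or start >= total:
            break


# Fake backend with 1234 documents, used only for this illustration.
DOCS = ['doc%d' % i for i in range(1234)]


def fake_fetch(start, size):
    return DOCS[start:start + size], len(DOCS)


all_hits = list(iter_all_hits(fake_fetch))
```

Because results are produced lazily, a caller that stops iterating early never triggers the remaining page requests.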
get_all_paged(query, per_page)

Get a generator to iterate over all pages of search results

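Parameters:
  • query (boto.cloudsearch.search.Query) – A group of search criteria
  • per_page (int) – Number of docs in each boto.cloudsearch.search.SearchResults object
Return type:generator
Returns:Generator containing boto.cloudsearch.search.SearchResults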

get_num_hits(query)

Return the total number of hits for query

Parameters:query (boto.cloudsearch.search.Query) – a group of search criteria
Return type:int
Returns:Total number of hits for query
search(q=None, bq=None, rank=None, return_fields=None, size=10, start=0, facet=None, facet_constraints=None, facet_sort=None, facet_top_n=None, t=None)

Send a query to CloudSearch

Each search query should use at least the q or bq argument to specify the search parameter. The other options are used to specify the criteria of the search.

Parameters:
  • q (string) – A string to search the default search fields for.
  • bq (string) – A string to perform a Boolean search. This can be used to create advanced searches.
  • rank (List of strings) – A list of fields or rank expressions used to order the search results. A field can be reversed by using the - operator. ['-year', 'author']
  • return_fields (List of strings) – A list of fields which should be returned by the search. If this field is not specified, only IDs will be returned. ['headline']
  • size (int) – Number of search results to return
  • start (int) – Offset of the first search result to return (can be used for paging)
  • facet (list) – List of fields for which facets should be returned ['colour', 'size']
  • facet_constraints (dict) – Use to limit facets to specific values specified as comma-delimited strings in a Dictionary of facets {'colour': "'blue','white','red'", 'size': "big"}
  • facet_sort (dict) – Rules used to specify the order in which facet values should be returned. Allowed values are alpha, count, max, sum. Use alpha to sort alphabetically, and count to sort the facet by the number of available results. {'colour': 'alpha', 'size': 'count'}
  • facet_top_n (dict) – Dictionary of facets and number of facets to return. {'colour': 2}
  • t (dict) – Specify ranges for specific fields {'year': '2000..2005'}
Return type:boto.cloudsearch.search.SearchResults
Returns:The results of this search

The following examples all assume we have indexed a set of documents with fields: author, date, headline

A simple search looks for documents whose default text search fields contain the exact search word:

>>> search(q='Tim') # Return documents with the word Tim in them (but not Timothy)

A simple search with more keywords will return documents whose default text search fields contain the search strings together or separately.

>>> search(q='Tim apple') # Will match "tim" and "apple"

More complex searches require the boolean search operator.

Wildcard searches can be used to search for any words that start with the search string.

>>> search(bq="'Tim*'") # Return documents with words like Tim or Timothy

Search terms can also be combined. Allowed operators are “and”, “or”, “not”, “field”, “optional”, “token”, “phrase”, or “filter”.

>>> search(bq="(and 'Tim' (field author 'John Smith'))")

Facets allow you to show classification information about the search results. For example, you can retrieve the authors who have written about Tim:

>>> search(q='Tim', facet=['author'])

With facet_constraints, facet_top_n and facet_sort more complicated constraints can be specified such as returning the top author out of John Smith and Mark Smith who have a document with the word Tim in it.

>>> search(q='Tim',
...     facet=['author'],
...     facet_constraints={'author': "'John Smith','Mark Smith'"},
...     facet_top_n={'author': 1},
...     facet_sort={'author': 'count'})
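The comma-delimited, single-quoted string expected by facet_constraints is easy to get wrong by hand. A minimal sketch of building it from a Python list follows. quote_facet_values is a hypothetical helper, not part of boto, and it assumes the values contain no single quotes of their own.

```python
def quote_facet_values(values):
    """Join values into a comma-delimited string of single-quoted items."""
    # Assumption: values contain no embedded single quotes; no escaping
    # is attempted here.
    return ','.join("'%s'" % value for value in values)


authors = ['John Smith', 'Mark Smith']
facet_constraints = {'author': quote_facet_values(authors)}
```

The resulting dictionary can then be passed as the facet_constraints argument of search().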
class boto.cloudsearch.search.SearchResults(**attrs)
next_page()

Call Cloudsearch to get the next page of search results

Return type:boto.cloudsearch.search.SearchResults
Returns:the following page of search results
exception boto.cloudsearch.search.SearchServiceException

boto.cloudsearch.document

exception boto.cloudsearch.document.CommitMismatchError
class boto.cloudsearch.document.CommitResponse(response, doc_service, sdf)

Wrapper for response to Cloudsearch document batch commit.

Parameters:
Raises:
  • boto.exception.BotoServerError
  • boto.cloudsearch.document.SearchServiceException
  • boto.cloudsearch.document.EncodingError
  • boto.cloudsearch.document.ContentTooLongError

exception boto.cloudsearch.document.ContentTooLongError

Content sent for CloudSearch indexing was too long.

This usually happens when the documents queued for indexing add up to more than the limit allowed per upload batch (5MB).
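One way to stay under the batch limit is to split the queued commands into several smaller batches before uploading. The sketch below is an illustration only: split_into_batches and MAX_BATCH_BYTES are hypothetical names, the commands are treated as plain JSON-serialisable dicts, and 5MB is read as 5 * 1024 * 1024 bytes, which is an assumption.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumed reading of the 5MB limit


def split_into_batches(commands, max_bytes=MAX_BATCH_BYTES):
    """Group commands so each JSON-encoded batch stays within max_bytes.

    A single command that is larger than max_bytes on its own still ends
    up in a batch by itself; this sketch does not try to shrink it.
    """
    batches, current = [], []
    for command in commands:
        candidate = current + [command]
        size = len(json.dumps(candidate).encode('utf-8'))
        if current and size > max_bytes:
            # Adding this command would overflow: close the current batch.
            batches.append(current)
            current = [command]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


# Small limit used only to make the splitting visible.
commands = [{'id': str(i), 'fields': {'headline': 'x' * 40}} for i in range(10)]
batches = split_into_batches(commands, max_bytes=200)
```

Re-encoding the candidate batch for every command keeps the sketch short but is quadratic in the batch size; a running byte count would be cheaper for large batches.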

class boto.cloudsearch.document.DocumentServiceConnection(domain=None, endpoint=None)

A CloudSearch document service.

The DocumentServiceConnection is used to add, remove and update documents in CloudSearch. Commands are uploaded to CloudSearch in SDF (Search Data Format).

To generate an appropriate SDF, use add() to add or update documents, and delete() to remove documents.

Once the set of documents is ready to be indexed, use commit() to send the commands to CloudSearch.

If there are a lot of documents to index, it may be preferable to separate the generation of SDF data from the actual upload into CloudSearch. Retrieve the current SDF with get_sdf(). If this file is then uploaded to S3, it can be retrieved later for upload into CloudSearch using add_sdf_from_s3().

The SDF is not cleared after a commit(). If you wish to continue using the DocumentServiceConnection for another batch upload of commands, you will need to call clear_sdf() first to stop the previous batch of commands from being uploaded again.
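The add, delete, commit and clear cycle described above might be wrapped as in the following sketch. upload_batch is a hypothetical helper, not part of boto; it only calls the add(), delete(), commit() and clear_sdf() methods documented for this class, and the tuple shapes of to_add and to_delete are assumptions made for the example.

```python
def upload_batch(doc_service, to_add, to_delete):
    """Queue adds and deletes, commit them, then reset the connection.

    doc_service is expected to behave like a DocumentServiceConnection.
    to_add is an iterable of (doc_id, version, fields) tuples and
    to_delete an iterable of (doc_id, version) tuples (assumed shapes).
    """
    for doc_id, version, fields in to_add:
        doc_service.add(doc_id, version, fields)
    for doc_id, version in to_delete:
        doc_service.delete(doc_id, version)

    response = doc_service.commit()
    # commit() does not clear the SDF, so clear it explicitly before the
    # connection is reused for another batch.
    doc_service.clear_sdf()
    return response
```

clear_sdf() is only reached when commit() returns normally, so a failed batch stays queued and can be inspected or retried.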

add(_id, version, fields, lang='en')

Add a document to be processed by the DocumentService

The document will not actually be added until commit() is called

Parameters:
  • _id (string) – A unique ID used to refer to this document.
  • version (int) – Version of the document being indexed. If a file is being reindexed, the version should be higher than the existing one in CloudSearch.
  • fields (dict) – A dictionary of key-value pairs to be uploaded.
  • lang (string) – The language code the data is in. Only ‘en’ is currently supported.
add_sdf_from_s3(key_obj)

Load an SDF from S3

Using this method will result in documents added through add() and delete() being ignored.

Parameters:key_obj (boto.s3.key.Key) – An S3 key which contains an SDF
clear_sdf()

Clear the working documents from this DocumentServiceConnection

This should be used after commit() if the connection will be reused for another set of documents.

commit()

Actually send an SDF to CloudSearch for processing

If an SDF file has been explicitly loaded it will be used. Otherwise, documents added through add() and delete() will be used.

Return type:CommitResponse
Returns:A summary of documents added and deleted
delete(_id, version)

Schedule a document to be removed from the CloudSearch service

The document will not actually be scheduled for removal until commit() is called

Parameters:
  • _id (string) – The unique ID of this document.
  • version (int) – Version of the document to remove. The delete will only occur if this version number is higher than the version currently in the index.
get_sdf()

Generate the working set of documents in Search Data Format (SDF)

Return type:string
Returns:JSON-formatted string of the documents in SDF
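For orientation, the sketch below shows roughly what such a JSON string may look like. The layout (a JSON array of add and delete commands carrying type, id, version, lang and fields) is an assumption based on the CloudSearch batch format rather than something stated on this page, and the document IDs and field values are invented for illustration.

```python
import json

# Assumed SDF layout: one dict per command, collected in a list.
sdf_batch = [
    {'type': 'add', 'id': 'doc1', 'version': 1, 'lang': 'en',
     'fields': {'headline': 'Tim eats an apple', 'author': 'John Smith'}},
    {'type': 'delete', 'id': 'doc2', 'version': 2},
]

# Serialise to the JSON-formatted string form.
sdf_string = json.dumps(sdf_batch)
```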
exception boto.cloudsearch.document.EncodingError

Content sent for CloudSearch indexing was incorrectly encoded.

This usually happens when a document is marked as unicode but non-unicode characters are present.

exception boto.cloudsearch.document.SearchServiceException