ElasticSearch Guided (code) Generator

Development

Install dependencies with opam install --deps-only .

Build with make

Mapping

esgg takes as input an ES mapping (schema description) and an actual query (with syntax for variables, described below). Additional information is often needed to map ES fields into proper OCaml types. ES only supports _meta at the root level, so the annotations described here make it impossible to store the extended mapping back into ES, which is a pity. The extra information is supplied by attaching a _meta annotation object to the affected field, as follows:

    "counts": {
      "_meta": {
        "optional": true
      },
      "properties": {
        "hash": {
          "type": "long",
          "_meta": { "repr": "int64" }
        },
        "value": {
          "type": "long"
        }
      }
    },

Supported _meta attributes:

{"list":true} - property is an array (mapped to list)
{"list":"sometimes"} - property is either an array or a single element (mapped to json with a custom OCaml module wrap, which needs to be provided in scope)
{"optional":<true|false>} - property may be missing (mapped to option)
{"ignore":true} - skip property altogether
{"fields_default_optional":true} - any subfield may be missing (can be overridden by per-field optional:false)
{"repr":"int64"} - override ES type, currently the only possible value is "int64" to ensure no bits are lost (by default long is mapped to OCaml int)
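To make the attribute list more concrete, here is a minimal sketch of the kind of OCaml types the counts example corresponds to. The type names, the doc record and its tags field are hypothetical and hand-written for illustration; the code esgg actually generates may look different. The helper survives_as_int only illustrates why "repr": "int64" matters: OCaml's native int is narrower than 64 bits.

```ocaml
(* Hypothetical hand-written types mirroring the "counts" example;
   not actual esgg output. *)
type counts = {
  hash : int64;  (* "repr": "int64" keeps all 64 bits *)
  value : int;   (* a plain long maps to OCaml int by default *)
}

type doc = {
  counts : counts option;  (* "optional": true *)
  tags : string list;      (* hypothetical field annotated with "list": true *)
}

(* True when the value survives a round trip through the native int. *)
let survives_as_int (x : int64) : bool =
  Int64.equal (Int64.of_int (Int64.to_int x)) x

let describe (d : doc) : string =
  match d.counts with
  | None -> Printf.sprintf "no counts, %d tags" (List.length d.tags)
  | Some c ->
    Printf.sprintf "hash=%Ld value=%d tags=%d" c.hash c.value (List.length d.tags)

let () =
  let d = { counts = Some { hash = 42L; value = 1 }; tags = [ "a"; "b" ] } in
  print_endline (describe d)
```

Large hashes such as Int64.max_int do not survive the round trip through int, which is the situation the int64 representation is meant to avoid.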

Host types mapping

Generated code can use application types for any field. This is achieved by referencing a specific type for each field in the generated code, instead of the primitive type from the mapping, which lets the consumer of the code map it onto a custom type. For example, the field hash in the example above will have type Counts.Hash.t in the generated code. To compile the generated code, this type must be present in scope and mapped to something useful. A default mapping (which just maps everything to the corresponding primitive types) can be generated with esgg reflect <mapping name> <mapping.json>, e.g.:

esgg reflect hello_world src/mappings/hello_world.json >> src/mapping.ml

will generate the following, which should be edited manually as needed, e.g. by making Hash a module with an abstract type:

  module Counts = struct
    module Hash = Id_(Int64_)
    module Value = Id_(Long_)
  end
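As a rough illustration of "making Hash a module with an abstract type", the sketch below hides the int64 representation behind a signature. The exact interface that the generated code requires from Counts.Hash is not described in this README, so the functions shown here (of_int64, to_int64, to_string) are illustrative assumptions and not what Id_ or Int64_ provide.

```ocaml
(* Hypothetical replacement for the default Hash module.
   The signature below is an assumption made for this sketch. *)
module Hash : sig
  type t  (* abstract: callers cannot mix it up with other int64 values *)
  val of_int64 : int64 -> t
  val to_int64 : t -> int64
  val to_string : t -> string
end = struct
  type t = int64
  let of_int64 x = x
  let to_int64 x = x
  (* Fixed-width hexadecimal rendering of the 64-bit value. *)
  let to_string x = Printf.sprintf "%016Lx" x
end

let () =
  let h = Hash.of_int64 255L in
  print_endline (Hash.to_string h)
```

With the type kept abstract, application code has to go through the module's functions, so a hash cannot be passed by accident where some other 64-bit number is expected.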

Variables

Syntax for variables in template json files is as follows:

$var for regular required variable
$var? for optional variable (minimal surrounding scope is conditionally expunged)
full form $(var:hint), where hint can currently be either list or list?
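The three forms above can be summarised with a small classifier. This is only a sketch of the surface syntax as listed here, not esgg's own parser: the type and function names are made up for the example, identifier characters are not validated, and the meaning of the list? hint is not modelled.

```ocaml
type hint =
  | List       (* the "list" hint *)
  | List_opt   (* the "list?" hint *)

type var =
  | Required of string       (* $var *)
  | Optional of string       (* $var? *)
  | Hinted of string * hint  (* $(var:hint) *)

(* Classify a single token; returns None when it matches none of the forms. *)
let parse_var (s : string) : var option =
  let n = String.length s in
  if n < 2 || s.[0] <> '$' then None
  else if s.[1] = '(' then
    if s.[n - 1] <> ')' then None
    else
      match String.split_on_char ':' (String.sub s 2 (n - 3)) with
      | [ name; "list" ] when name <> "" -> Some (Hinted (name, List))
      | [ name; "list?" ] when name <> "" -> Some (Hinted (name, List_opt))
      | _ -> None
  else if s.[n - 1] = '?' then
    if n > 2 then Some (Optional (String.sub s 1 (n - 2))) else None
  else Some (Required (String.sub s 1 (n - 1)))

let () =
  match parse_var "$(ids:list)" with
  | Some (Hinted (name, _)) -> print_endline ("hinted variable " ^ name)
  | Some (Required name | Optional name) -> print_endline ("plain variable " ^ name)
  | None -> print_endline "not a variable"
```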

Configuration via `_esgg`

The _esgg field can be added to query templates to configure code generation behavior. This field is automatically filtered out before sending queries to Elasticsearch.

Supported configuration options:

{"matched_queries": true} - Include matched_queries field in output types even when _name is not explicitly present in the query template. This is useful when _name is defined inside query variables.
{"inner_hits": [ ... ]} - Declare inner hits to include in output types even if the corresponding nested queries are provided via base/shared queries. Each entry describes one nested path.

Example:

{
  "_esgg": {
    "matched_queries": true
  },
  "query": $query,
  "size": 10
}
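The filtering step mentioned above can be pictured as follows. This is a minimal sketch and not esgg's implementation: it assumes the query has already been rendered to a JSON value with $query substituted, uses a local stand-in type shaped like the values of common OCaml JSON libraries, and assumes _esgg appears only at the top level.

```ocaml
(* Local stand-in for a JSON value; a real project would use a JSON library. *)
type json =
  [ `Null
  | `Bool of bool
  | `Int of int
  | `String of string
  | `List of json list
  | `Assoc of (string * json) list ]

(* Drop the top-level "_esgg" key, leaving every other field untouched. *)
let strip_esgg (j : json) : json =
  match j with
  | `Assoc fields -> `Assoc (List.filter (fun (k, _) -> k <> "_esgg") fields)
  | other -> other

(* The example template, with a hypothetical match_all query in place of $query. *)
let rendered : json =
  `Assoc
    [ ("_esgg", `Assoc [ ("matched_queries", `Bool true) ]);
      ("query", `Assoc [ ("match_all", `Assoc []) ]);
      ("size", `Int 10) ]

let () =
  match strip_esgg rendered with
  | `Assoc fields -> List.iter (fun (k, _) -> print_endline k) fields
  | _ -> ()
```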

`_esgg.inner_hits` specification

When inner hits are defined inside a base/shared query (not visible in this template), declare them explicitly so esgg can generate typed inner_hits in the output:

{
  "_esgg": {
    "inner_hits": [
      {
        "path": "comments",            // required: nested path in the mapping
        "name": "comments",            // optional: key under inner_hits (defaults to path)
        "size": 100,                   // optional
        "from": 0,                     // optional
        "_source": ["fieldA","fieldB"],// optional: standard ES source filtering for inner hits
        "stored_fields": ["storedA"],  // optional
        "highlight": {                 // optional: ES highlight shape; fields keys are collected
          "fields": { "comments.text": {} }
        }
      }
    ]
  },
  "query": $query
}
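The rule that name defaults to path can be written down in a few lines. The record below is a hypothetical model of one inner_hits entry, reduced to the two fields involved; it is not a type exposed by esgg.

```ocaml
(* Hypothetical model of one "_esgg.inner_hits" entry (other fields omitted). *)
type inner_hit_spec = {
  path : string;         (* required: nested path in the mapping *)
  name : string option;  (* optional: key under inner_hits *)
}

(* The key under which the inner hits appear: name if given, otherwise path. *)
let effective_name (s : inner_hit_spec) : string =
  Option.value s.name ~default:s.path

let () =
  let spec = { path = "comments"; name = None } in
  print_endline (effective_name spec)
```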

Reusing shared ATD definitions

To reuse shared definitions using the -shared <file.atd> option, the atd file must have the <esgg from="..."> annotation at the top of the file. The value of the annotation must correspond to the OCaml module containing the shared definitions.

Example:

# file.atd

<esgg from="Your_ocaml_module_name">

...atd type definitions...

Elasticsearch features

TODO document what is supported

Some notes follow:

aggregations

The following aggregation types are supported:

Metric Aggregations

Bucket Aggregations

Pipeline Aggregations

Matrix Aggregations

matrix_stats - Matrix statistics

Aggregation-specific notes

filters

named
anonymous
dynamic (i.e. a variable)
partial dynamic (i.e. containing variables)
other_bucket and other_bucket_key
other_bucket with anonymous filters (ignored; the user is responsible for treating the last element of the result specially)

Dynamic (defined at runtime) filters are supported, as follows: { "filters": { "filters": $x } }. In this case the corresponding part of the output will be largely untyped: $x is assumed to be a dictionary, and the result will be represented with dictionaries. For anonymous filters (i.e. an array of filters) use $(x:list).
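Two of the cases above leave some work to the caller. The sketch below shows one way to handle them, under the assumption that anonymous filter results arrive as a list and dynamic named filter results as an association list keyed by filter name. Both representations and the function names are assumptions made for this example; plain ints stand in for bucket records.

```ocaml
(* Anonymous filters with other_bucket: the last element is the "other"
   bucket, so split it off from the regular ones. *)
let split_other (buckets : 'a list) : 'a list * 'a option =
  match List.rev buckets with
  | [] -> ([], None)
  | last :: rev_rest -> (List.rev rev_rest, Some last)

(* Dynamic named filters: look a bucket up by the name used in $x. *)
let find_bucket (name : string) (buckets : (string * 'a) list) : 'a option =
  List.assoc_opt name buckets

let () =
  let regular, other = split_other [ 10; 20; 5 ] in
  Printf.printf "%d regular buckets, other present: %b\n"
    (List.length regular) (other <> None);
  match find_bucket "errors" [ ("errors", 3); ("warnings", 7) ] with
  | Some n -> Printf.printf "errors: %d\n" n
  | None -> print_endline "no such filter"
```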

date_histogram

key_as_string is returned in output only when format is explicitly specified, to discourage fragile code.

range

A keyed aggregation expects an explicit key for each range. The from/to fields in the response are not extracted.

date_range

Same as for range aggregation.

dynamic

Specifying an aggregation as a variable ($var) yields untyped json in place of the aggregation output. This can be used as a temporary workaround for unsupported aggregation types or for truly dynamic use cases (aggregations built at run-time).
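When the output is untyped json, the caller has to navigate it by hand. The following sketch assumes the value is exposed in a shape similar to the polymorphic variants used by common OCaml JSON libraries; a local type stands in for it here, and the sample payload and helper names are hypothetical.

```ocaml
(* Local stand-in for an untyped JSON value. *)
type json =
  [ `Null
  | `Bool of bool
  | `Int of int
  | `String of string
  | `List of json list
  | `Assoc of (string * json) list ]

(* Follow a path of object keys; None if any step is missing or not an object. *)
let rec member_path (path : string list) (j : json) : json option =
  match (path, j) with
  | [], _ -> Some j
  | key :: rest, `Assoc fields -> (
      match List.assoc_opt key fields with
      | Some child -> member_path rest child
      | None -> None)
  | _ :: _, _ -> None

(* Extract an integer doc_count from an aggregation result, if present. *)
let doc_count (agg : json) : int option =
  match member_path [ "doc_count" ] agg with
  | Some (`Int n) -> Some n
  | _ -> None

(* Hypothetical aggregation output. *)
let sample : json =
  `Assoc [ ("doc_count", `Int 12); ("by_tag", `Assoc [ ("buckets", `List []) ]) ]

let () =
  match doc_count sample with
  | Some n -> Printf.printf "doc_count = %d\n" n
  | None -> print_endline "doc_count missing"
```

Returning option at every step keeps the handling of missing or unexpectedly shaped fields explicit, which is the price of skipping the generated types.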

script

Scripts are opaque, i.e. no type information is extracted and the result is json.

source filtering

exclude and include
wildcards
dynamic (i.e. a variable) NB not implemented for get and mget

Tests

make test runs regression tests in test/, verifying that the input and output atd generated from each query stay unchanged. When a change in the generated output is expected, the updated output should be committed. Tests are easy to add and fast to run.

TODO tests to verify that:

the code generated for applying input variables to a query actually compiles and produces the correct query when run
the atd description of the output (generated from the query) can indeed deserialize the ES output of that actual query

Conditions

This project is distributed under the terms of GPL Version 2. See LICENSE file for full license text.

NB the output of esgg, i.e. the generated code, is all yours of course :)
