Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support in for custom analyzers for text and overriding the default template settings for index templates #1737

Merged
merged 24 commits into from
Feb 9, 2022

Conversation

kgeller
Copy link
Contributor

@kgeller kgeller commented Jan 24, 2022

Resolves #1264
Resolves #1203

This PR encompasses two smaller tooling improvements:

  1. Adds support for the analyzer in the ECS field definitions.
  2. Now allows for users to override the default composable template settings.

Change 2 led to a change in the arguments for the generator. Instead of just --template-settings, we will now have --template-settings-legacy and --template-settings. This will be a breaking change since --template-settings will now refer to v2 index templates.

@kgeller kgeller added the 8.1.0 label Jan 24, 2022
@kgeller kgeller self-assigned this Jan 24, 2022
@kgeller
Copy link
Contributor Author

kgeller commented Jan 24, 2022

@ebeahan @djptek I added support for users to add in their search analyzer for text fields. In doing so, I was trying to decide how to handle the settings part of it. I was back and forth on whether to 1) programmatically add an empty definition, 2) add no definition, or 3) adding an empty definition to the settings starter file.

I went with choice 1, but it is definitely the most complex. Do you both agree with that approach, or is there something else I should do / am not thinking of?

@ebeahan
Copy link
Member

ebeahan commented Jan 26, 2022

In doing so, I was trying to decide how to handle the settings part of it. I was back and forth on whether to 1) programmatically add an empty definition, 2) add no definition, or 3) adding an empty definition to the settings starter file.

Would you provide some more details? I'm uncertain what exactly needs handling here.

If the user wants to declare a custom analyzer on a field, the index settings specify an analyzer. But if the user doesn't define an analyzer, that settings would be omitted, right?

@kgeller
Copy link
Contributor Author

kgeller commented Jan 26, 2022

@ebeahan

Would you provide some more details? I'm uncertain what exactly needs handling here.

I just wasn't sure how much 'control' ECS wanted to exert here. They 'key' part IMO was passing the analyzer along with the field, which we do. It's just a matter of how we handle the settings piece of the analyzer and if we want to add it for them or not or whatever.

If the user wants to declare a custom analyzer on a field, the index settings specify an analyzer. But if the user doesn't define an analyzer, that settings would be omitted, right?

Correct. Essentially I was trying to figure out how we handle that definition of the analyzer in the settings, because we have a handful of directions we can go in. I went with adding an empty definition so that the user can fill it in with whatever definition they see fit. But of course if there are no analyzers, nothing will be added. On the other end of the spectrum, we could add nothing into the settings and let the user populate it themselves.

Does that help?

@ebeahan
Copy link
Member

ebeahan commented Jan 26, 2022

we could add nothing into the settings and let the user populate it themselves.

I think I better understand. IMO, our tooling should support passing through whatever settings the user provides. I'll walk through a hypothetical scenario using this feature. Share if you see this coming together differently.

Using the domain_analzyer example from #1264, the file passed in using --template-settings would contain its definition (along with a few other index template setting):

{
  "order": 1,
  "index_patterns": [
    "your-ecs-*"
  ],
  "settings": {
    "index": {
      "mapping": {
        "total_fields": {
          "limit": "10000"
        }
      },
      "analysis": {
        "analyzer": {
          "domain_analyzer": {
            "type": "pattern",
            "pattern": "\\."
          }
        }
      },
      "refresh_interval": "5s"
    }
  }
}

The user also declares the analyzer on their field definitions. Since we've moved all text fields to match_only_text, these would either be custom fields or someone overriding ECS default types:

- name: acme
  title: ACME
  description: >
    Acme Inc. custom fields
  type: group
  fields:
    - name: account.domain
      type: text
      level: custom
      analzyer: domain_analyzer
      description: >
        Customer account description for this activity.

If a user wants a built-in analyzer, they can skip the first step and define it on the field:

  fields:
    - name: account.domain
      type: text
      level: custom
      analyzer: whitespace
      description: >
        Customer account description for this activity.

User runs the generator, and the (v1) template is created:

{
  "index_patterns": [
    "your-ecs-*"
  ],
  "mappings": {
    "date_detection": false,
    "dynamic_templates": [
      {
        "strings_as_keyword": {
          "mapping": {
            "ignore_above": 1024,
            "type": "keyword"
          },
          "match_mapping_type": "string"
        }
      }
    ],
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "acme": {
        "properties": {
          "account": {
            "properties": {
              "domain": {
                "analyzer": "domain_analyzer",
                "type": "text"
              },
              "id": {
                "ignore_above": 1024,
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  },
  "order": 1,
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "domain_analyzer": {
            "pattern": "\\.",
            "type": "pattern"
          }
        }
      },
      "mapping": {
        "total_fields": {
          "limit": "10000"
        }
      },
      "refresh_interval": "5s"
    }
  }
}

If the resulting template has a config error, I think it's okay to rely on ES to throw an exception when it's uploaded. If I typoed type in my analyzer config:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "analyzer [domain_analyzer] must specify either an analyzer type, or a tokenizer"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "analyzer [domain_analyzer] must specify either an analyzer type, or a tokenizer"
  },
  "status": 400
}

@kgeller kgeller marked this pull request as ready for review February 1, 2022 20:16
@kgeller kgeller requested a review from a team February 1, 2022 20:16
@kgeller kgeller added 8.2.0 and removed 8.1.0 labels Feb 2, 2022
Copy link
Member

@ebeahan ebeahan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generator implementation looks good. Do you see any opportunities to cover the different template generations in the unit tests?

I had some thoughts about the proposed naming.

@kgeller kgeller changed the title Adding support in for custom analyzers for text Adding support in for custom analyzers for text and overriding the default template settings for index templates Feb 3, 2022
@kgeller kgeller removed the 8.2.0 label Feb 3, 2022
@kgeller kgeller added the 8.0.0 label Feb 3, 2022
@kgeller
Copy link
Contributor Author

kgeller commented Feb 3, 2022

Generator implementation looks good. Do you see any opportunities to cover the different template generations in the unit tests?

I had some thoughts about the proposed naming.

I love your suggestions for updated naming, I just updated all of that.

As far as unit tests, I did not. But I will take a closer look.

@kgeller
Copy link
Contributor Author

kgeller commented Feb 9, 2022

@ebeahan tests added!

Copy link
Member

@ebeahan ebeahan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the USAGE.md suggestion, everything else is a nit or something we can handle outside this PR.

@@ -0,0 +1,16 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll handle it in a separate PR, but the usage-example generated files should be updated to match the new dir structure.

@@ -186,7 +166,7 @@ def entry_for(field):
if mf_type == 'keyword':
ecs_helpers.dict_copy_existing_keys(mf, mf_entry, ['normalizer', 'ignore_above'])
elif mf_type == 'text':
ecs_helpers.dict_copy_existing_keys(mf, mf_entry, ['norms'])
ecs_helpers.dict_copy_existing_keys(mf, mf_entry, ['norms', 'analyzer'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out-of-scope here, just leaving for future consideration:

#1264 only proposes adding analyzer support to multi-fields, but, in the near future, I think we could also support passing analyzer through for all text fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me

kgeller and others added 2 commits February 9, 2022 13:14
@kgeller kgeller merged commit d76ae49 into elastic:main Feb 9, 2022
@kgeller kgeller deleted the analyzer-multi-field branch February 9, 2022 18:25
kgeller added a commit to kgeller/ecs that referenced this pull request Feb 9, 2022
kgeller added a commit to kgeller/ecs that referenced this pull request Feb 9, 2022
…fault template settings for index templates (elastic#1737)

# Conflicts:
#	scripts/generators/es_template.py
kgeller added a commit to kgeller/ecs that referenced this pull request Feb 9, 2022
…fault template settings for index templates (elastic#1737)

# Conflicts:
#	CHANGELOG.next.md
#	scripts/generators/es_template.py
kgeller added a commit that referenced this pull request Feb 9, 2022
kgeller added a commit that referenced this pull request Feb 9, 2022
…the default template settings for index templates (#1737) (#1760)
kgeller added a commit that referenced this pull request Feb 9, 2022
…the default template settings for index templates (#1737) (#1761)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add analyzer multi-field capability Unable to override composable templates settings
2 participants