Event Ruler (called Ruler in the rest of this document for brevity) is a Java library that allows matching Rules to Events. An event is a list of fields, which may be given as name/value pairs or as a JSON object. A rule associates event field names with lists of possible values. There are two reasons to use Ruler:
It's easiest to explain by example.
An Event is a JSON object. Here's an example:
{
"version": "0",
"id": "ddddd4-aaaa-7777-4444-345dd43cc333",
"detail-type": "EC2 Instance State-change Notification",
"source": "aws.ec2",
"account": "012345679012",
"time": "2017-10-02T16:24:49Z",
"region": "us-east-1",
"resources": [
"arn:aws:ec2:us-east-1:123456789012:instance/i-000000aaaaaa00000"
],
"detail": {
"c-count": 5,
"d-count": 3,
"x-limit": 301.8,
"source-ip": "10.0.0.33",
"instance-id": "i-000000aaaaaa00000",
"state": "running"
}
}
You can also see this as a set of name/value pairs. For brevity, we present only a sampling. Ruler has APIs for providing events both in JSON form and as name/value pairs:
+--------------+------------------------------------------+
| name | value |
|--------------|------------------------------------------|
| source | "aws.ec2" |
| detail-type | "EC2 Instance State-change Notification" |
| detail.state | "running" |
+--------------+------------------------------------------+
Events in the JSON form may be provided in the form of a raw JSON String, or a parsed Jackson JsonNode.
The rules in this section all match the sample event above:
{
"detail-type": [ "EC2 Instance State-change Notification" ],
"resources": [ "arn:aws:ec2:us-east-1:123456789012:instance/i-000000aaaaaa00000" ],
"detail": {
"state": [ "initializing", "running" ]
}
}
This will match any event with the provided values for the resources, detail-type, and detail.state fields, ignoring any other fields in the event. It would also match if the value of detail.state had been "initializing".
Values in rules are always provided as arrays, and match if the value in the
event is one of the values provided in the array. The reference to resources
shows that if the value in the event is also an array, the rule matches if the
intersection between the event array and rule-array is non-empty.
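The array semantics described above can be sketched in a few lines of plain Java. This is only an illustration of the matching rule, not Ruler's implementation (Ruler compiles rules into a state machine); the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating the semantics only; not part of Ruler.
final class ExactMatchSketch {
    // A field matches when the rule's value list and the event's value list
    // share at least one element. A scalar event value is a one-element list.
    static boolean fieldMatches(List<String> ruleValues, List<String> eventValues) {
        return !Collections.disjoint(ruleValues, eventValues);
    }

    public static void main(String[] args) {
        List<String> ruleStates = List.of("initializing", "running");
        System.out.println(fieldMatches(ruleStates, List.of("running"))); // true
        System.out.println(fieldMatches(ruleStates, List.of("stopped"))); // false
    }
}
```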
{
"time": [ { "prefix": "2017-10-02" } ]
}
Prefix matches only work on string-valued fields.
{
"source": [ { "prefix": { "equals-ignore-case": "EC2" } } ]
}
Prefix equals-ignore-case matches only work on string-valued fields.
{
"source": [ { "suffix": "ec2" } ]
}
Suffix matches only work on string-valued fields.
{
"source": [ { "suffix": { "equals-ignore-case": "EC2" } } ]
}
Suffix equals-ignore-case matches only work on string-valued fields.
{
"source": [ { "equals-ignore-case": "EC2" } ]
}
Equals-ignore-case matches only work on string-valued fields.
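As a rough guide to what the prefix, suffix, and equals-ignore-case matchers mean, the following sketch expresses each one with standard String methods. It assumes simple per-character case folding as done by String.regionMatches; Ruler's own handling of unusual Unicode case mappings may differ. All names here are hypothetical and not part of Ruler.

```java
// Hypothetical helper illustrating the semantics only; not part of Ruler.
final class StringMatchSketch {
    static boolean prefix(String pattern, String value) {
        return value.startsWith(pattern);
    }

    static boolean suffix(String pattern, String value) {
        return value.endsWith(pattern);
    }

    static boolean equalsIgnoreCase(String pattern, String value) {
        return value.equalsIgnoreCase(pattern);
    }

    // Case-insensitive comparison of the first pattern.length() characters.
    static boolean prefixIgnoreCase(String pattern, String value) {
        return value.regionMatches(true, 0, pattern, 0, pattern.length());
    }

    // Case-insensitive comparison of the last pattern.length() characters.
    static boolean suffixIgnoreCase(String pattern, String value) {
        int start = value.length() - pattern.length();
        return start >= 0 && value.regionMatches(true, start, pattern, 0, pattern.length());
    }

    public static void main(String[] args) {
        System.out.println(prefix("2017-10-02", "2017-10-02T16:24:49Z")); // true
        System.out.println(suffixIgnoreCase("EC2", "aws.ec2"));           // true
    }
}
```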
{
"source": [ { "wildcard": "Simple*Service" } ]
}
Wildcard matches only work on string-valued fields. A single value can contain zero to many wildcard characters, but consecutive wildcard characters are not allowed. To match the asterisk character specifically, a wildcard character can be escaped with a backslash. Two consecutive backslashes (i.e. a backslash escaped with a backslash) represent the actual backslash character. A backslash escaping any character other than an asterisk or a backslash is not allowed.
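The escaping rules above can be made concrete with a small stand-alone matcher. This is a sketch of the described semantics using a simple backtracking scan; it is not how Ruler evaluates wildcards (Ruler builds them into its state machine), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the semantics only; not part of Ruler.
final class WildcardSketch {
    private static final int STAR = -1; // token for an unescaped '*'

    // Converts a pattern into tokens: a literal character code, or STAR.
    // Rejects consecutive wildcards and escapes other than \* and \\.
    static List<Integer> parse(String pattern) {
        List<Integer> tokens = new ArrayList<>();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                if (i + 1 == pattern.length()) {
                    throw new IllegalArgumentException("dangling backslash");
                }
                char next = pattern.charAt(++i);
                if (next != '*' && next != '\\') {
                    throw new IllegalArgumentException("invalid escape: \\" + next);
                }
                tokens.add((int) next); // escaped character is a literal
            } else if (c == '*') {
                if (!tokens.isEmpty() && tokens.get(tokens.size() - 1) == STAR) {
                    throw new IllegalArgumentException("consecutive wildcards");
                }
                tokens.add(STAR);
            } else {
                tokens.add((int) c);
            }
        }
        return tokens;
    }

    // Returns true if the whole value matches the pattern; '*' matches any run
    // of characters, including the empty run.
    static boolean matches(String pattern, String value) {
        List<Integer> t = parse(pattern);
        int p = 0, v = 0, starP = -1, starV = 0;
        while (v < value.length()) {
            if (p < t.size() && t.get(p) == STAR) {
                starP = p++;          // remember the wildcard position
                starV = v;            // and where it started matching
            } else if (p < t.size() && t.get(p) == value.charAt(v)) {
                p++;
                v++;
            } else if (starP >= 0) {
                p = starP + 1;        // backtrack: let the wildcard absorb one more char
                v = ++starV;
            } else {
                return false;
            }
        }
        while (p < t.size() && t.get(p) == STAR) {
            p++;                      // a trailing wildcard may match nothing
        }
        return p == t.size();
    }

    public static void main(String[] args) {
        System.out.println(matches("Simple*Service", "Simple Storage Service")); // true
        System.out.println(matches("a\\*b", "a*b"));                             // true
    }
}
```

In the Java string literal "a\\*b" the pattern text is a\*b, i.e. an escaped asterisk, so it matches only the literal value a*b.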
Anything-but matching does what the name says: matches anything except what's provided in the rule.
Anything-but works with single string and numeric values or lists, which have to contain entirely strings or entirely numerics. It also may be applied to a prefix, suffix, or equals-ignore-case match of a string or a list of strings.
Single anything-but (string, then numeric):
{
"detail": {
"state": [ { "anything-but": "initializing" } ]
}
}
{
"detail": {
"x-limit": [ { "anything-but": 123 } ]
}
}
Anything-but list (strings):
{
"detail": {
"state": [ { "anything-but": [ "stopped", "overloaded" ] } ]
}
}
Anything-but list (numbers):
{
"detail": {
"x-limit": [ { "anything-but": [ 100, 200, 300 ] } ]
}
}
Anything-but prefix:
{
"detail": {
"state": [ { "anything-but": { "prefix": "init" } } ]
}
}
Anything-but prefix list (strings):
{
"detail": {
"state": [ { "anything-but": { "prefix": [ "init", "error" ] } } ]
}
}
Anything-but suffix:
{
"detail": {
"instance-id": [ { "anything-but": { "suffix": "1234" } } ]
}
}
Anything-but suffix list (strings):
{
"detail": {
"instance-id": [ { "anything-but": { "suffix": [ "1234", "6789" ] } } ]
}
}
Anything-but-ignore-case:
{
"detail": {
"state": [ { "anything-but": {"equals-ignore-case": "Stopped" } } ]
}
}
Anything-but-ignore-case list (strings):
{
"detail": {
"state": [ { "anything-but": {"equals-ignore-case": [ "Stopped", "OverLoaded" ] } } ]
}
}
Anything-but wildcard:
{
"detail": {
"state": [ { "anything-but": { "wildcard": "*/bin/*.jar" } } ]
}
}
Anything-but wildcard list (strings):
{
"detail": {
"state": [ { "anything-but": { "wildcard": [ "*/bin/*.jar", "*/bin/*.class" ] } } ]
}
}
{
"detail": {
"c-count": [ { "numeric": [ ">", 0, "<=", 5 ] } ],
"d-count": [ { "numeric": [ "<", 10 ] } ],
"x-limit": [ { "numeric": [ "=", 3.018e2 ] } ]
}
}
Above, the references to c-count, d-count, and x-limit illustrate numeric matching, and only work with values that are JSON numbers. Numeric matching supports the same precision and range as Java's double primitive, which implements the IEEE 754 binary64 standard.
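To show how a numeric specification such as [ ">", 0, "<=", 5 ] reads, here is a minimal sketch that evaluates an alternating operator/operand list against a double. It illustrates the semantics only; Ruler does not evaluate ranges this way internally, and the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper illustrating the semantics only; not part of Ruler.
final class NumericRangeSketch {
    // spec alternates operator and operand, e.g. [">", 0, "<=", 5].
    // Every comparison in the list must hold for the value to match.
    static boolean matches(List<?> spec, double value) {
        if (spec.isEmpty() || spec.size() % 2 != 0) {
            throw new IllegalArgumentException("expected operator/operand pairs");
        }
        for (int i = 0; i < spec.size(); i += 2) {
            String op = (String) spec.get(i);
            double bound = ((Number) spec.get(i + 1)).doubleValue();
            boolean ok;
            switch (op) {
                case "=":  ok = value == bound; break;
                case "<":  ok = value < bound;  break;
                case "<=": ok = value <= bound; break;
                case ">":  ok = value > bound;  break;
                case ">=": ok = value >= bound; break;
                default: throw new IllegalArgumentException("unknown operator: " + op);
            }
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches(List.of(">", 0, "<=", 5), 5));   // true
        System.out.println(matches(List.of("=", 3.018e2), 301.8)); // true
    }
}
```

The second call succeeds because 3.018e2 and 301.8 are the same decimal value and therefore parse to the same double.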
{
"detail": {
"source-ip": [ { "cidr": "10.0.0.0/24" } ]
}
}
This also works with IPv6 addresses.
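For readers unfamiliar with CIDR notation, the following sketch shows what "address is inside the block" means for both IPv4 and IPv6 by comparing the leading prefix bits. It is an illustration built on java.net.InetAddress, not Ruler's implementation; the class and method names are hypothetical, and inputs are assumed to be IP literals (a host name would trigger a DNS lookup).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper illustrating the semantics only; not part of Ruler.
final class CidrSketch {
    // Returns true if ip lies inside the block written as "address/prefixBits".
    static boolean inCidr(String cidr, String ip) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected address/prefix: " + cidr);
        }
        byte[] network;
        byte[] address;
        try {
            network = InetAddress.getByName(parts[0]).getAddress();
            address = InetAddress.getByName(ip).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal", e);
        }
        int prefixBits = Integer.parseInt(parts[1]);
        if (prefixBits < 0 || prefixBits > network.length * 8) {
            throw new IllegalArgumentException("bad prefix length: " + prefixBits);
        }
        if (network.length != address.length) {
            return false; // IPv4 block versus IPv6 address, or the reverse
        }
        int fullBytes = prefixBits / 8;
        for (int i = 0; i < fullBytes; i++) {
            if (network[i] != address[i]) {
                return false;
            }
        }
        int remainingBits = prefixBits % 8;
        if (remainingBits == 0) {
            return true;
        }
        // Compare only the leading bits of the partially covered byte.
        int mask = (0xFF << (8 - remainingBits)) & 0xFF;
        return (network[fullBytes] & mask) == (address[fullBytes] & mask);
    }

    public static void main(String[] args) {
        System.out.println(inCidr("10.0.0.0/24", "10.0.0.33"));     // true
        System.out.println(inCidr("2001:db8::/32", "2001:db8::1")); // true
    }
}
```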
Exists matching works on the presence or absence of a field in the JSON event.
The rule below will match any event which has a detail.c-count field present.
{
"detail": {
"c-count": [ { "exists": true } ]
}
}
The rule below will match any event which has no detail.c-count field.
{
"detail": {
"c-count": [ { "exists": false } ]
}
}
Note: Exists match only works on leaf nodes. It does not work on intermediate nodes.
As an example, the above example for exists: false would match the event below:
{
"detail-type": [ "EC2 Instance State-change Notification" ],
"resources": [ "arn:aws:ec2:us-east-1:123456789012:instance/i-000000aaaaaa00000" ],
"detail": {
"state": [ "initializing", "running" ]
}
}
but would also match the event below because c-count is not a leaf node:
{
"detail-type": [ "EC2 Instance State-change Notification" ],
"resources": [ "arn:aws:ec2:us-east-1:123456789012:instance/i-000000aaaaaa00000" ],
"detail": {
"state": [ "initializing", "running" ],
"c-count" : {
"c1" : 100
}
}
}
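One way to picture the leaf-node restriction is to flatten an event into dotted leaf names and test membership. The sketch below does this for nested maps only (arrays are treated as leaves for simplicity); it is an illustration of the behaviour described above, not Ruler's code, and all names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating the semantics only; not part of Ruler.
final class ExistsSketch {
    // Collects the dotted names of all leaf fields, e.g. "detail.state".
    static Set<String> leafNames(Map<String, ?> node, String prefix) {
        Set<String> names = new LinkedHashSet<>();
        for (Map.Entry<String, ?> e : node.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) e.getValue();
                names.addAll(leafNames(child, name)); // intermediate node: recurse
            } else {
                names.add(name);                      // leaf node
            }
        }
        return names;
    }

    // "exists": true corresponds to this returning true; "exists": false to false.
    static boolean exists(Map<String, ?> event, String fieldName) {
        return leafNames(event, "").contains(fieldName);
    }

    public static void main(String[] args) {
        Map<String, Object> event = Map.of(
                "detail", Map.of("state", "running", "c-count", Map.of("c1", 100)));
        // c-count is an intermediate node here, so it does not "exist" as a leaf.
        System.out.println(exists(event, "detail.c-count"));    // false
        System.out.println(exists(event, "detail.c-count.c1")); // true
    }
}
```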
{
"time": [ { "prefix": "2017-10-02" } ],
"detail": {
"state": [ { "anything-but": "initializing" } ],
"c-count": [ { "numeric": [ ">", 0, "<=", 5 ] } ],
"d-count": [ { "numeric": [ "<", 10 ] } ],
"x-limit": [ { "anything-but": [ 100, 200, 300 ] } ],
"source-ip": [ { "cidr": "10.0.0.0/8" } ]
}
}
As the examples above show, Ruler considers a rule to match if all of the fields named in the rule match, and it considers a field to match if any of the provided field values match. In other words, Ruler applies "And" logic across all fields by default, so no "And" primitive is required.
There are two ways to reach the "Or" effects:
The "$or" primitive allows the customer to directly describe the "Or" relationship among fields in the rule.
Ruler recognizes "Or" relationship only when the rule has met all below conditions:
/src/main/software/amazon/event/ruler/Constants.java#L38
For example, the rule below will not be parsed as an "Or" relationship because "numeric" and "prefix" are Ruler reserved keywords.
{
"$or": [ {"numeric" : 123}, {"prefix": "abc"} ]
}
Otherwise, Ruler just treats "$or" as a normal field name, the same as any other string in the rule.
Normal "Or":
// Effect of "source" && ("metricName" || "namespace")
{
"source": [ "aws.cloudwatch" ],
"$or": [
{ "metricName": [ "CPUUtilization", "ReadLatency" ] },
{ "namespace": [ "AWS/EC2", "AWS/ES" ] }
]
}
Parallel "Or":
// Effect of ("metricName" || "namespace") && ("detail.source" || "detail.detail-type")
{
"$or": [
{ "metricName": [ "CPUUtilization", "ReadLatency" ] },
{ "namespace": [ "AWS/EC2", "AWS/ES" ] }
],
"detail" : {
"$or": [
{ "source": [ "aws.cloudwatch" ] },
{ "detail-type": [ "CloudWatch Alarm State Change"] }
]
}
}
"Or" has an "And" inside:
// Effect of ("source" && ("metricName" || ("metricType" && "namespace") || "scope"))
{
"source": [ "aws.cloudwatch" ],
"$or": [
{ "metricName": [ "CPUUtilization", "ReadLatency" ] },
{
"metricType": [ "MetricType" ] ,
"namespace": [ "AWS/EC2", "AWS/ES" ]
},
{ "scope": [ "Service" ] }
]
}
Nested "Or" and "And":
// Effect of ("source" && ("metricName" || ("metricType" && "namespace" && ("metricId" || "spaceId")) || "scope"))
{
"source": [ "aws.cloudwatch" ],
"$or": [
{ "metricName": [ "CPUUtilization", "ReadLatency" ] },
{
"metricType": [ "MetricType" ] ,
"namespace": [ "AWS/EC2", "AWS/ES" ],
"$or" : [
{ "metricId": [ 1234 ] },
{ "spaceId": [ 1000 ] }
]
},
{ "scope": [ "Service" ] }
]
}
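The "Effect of" comments in these examples can be read as ordinary boolean expressions. As a sketch, the rule with an "And" inside an "Or" is written out below using java.util.function.Predicate over a flat map of single-valued fields. The flat-map event shape and all names are simplifying assumptions for illustration; Ruler itself parses the JSON rule and does not use predicates like this.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper illustrating the semantics only; not part of Ruler.
final class OrSketch {
    // A field predicate: the event has `name` and its value is one of `allowed`.
    static Predicate<Map<String, String>> field(String name, String... allowed) {
        List<String> values = Arrays.asList(allowed);
        return event -> {
            String v = event.get(name);
            return v != null && values.contains(v);
        };
    }

    // "source" && ("metricName" || ("metricType" && "namespace") || "scope")
    static Predicate<Map<String, String>> exampleRule() {
        return field("source", "aws.cloudwatch").and(
                field("metricName", "CPUUtilization", "ReadLatency")
                        .or(field("metricType", "MetricType")
                                .and(field("namespace", "AWS/EC2", "AWS/ES")))
                        .or(field("scope", "Service")));
    }

    public static void main(String[] args) {
        Predicate<Map<String, String>> rule = exampleRule();
        System.out.println(rule.test(Map.of("source", "aws.cloudwatch", "scope", "Service")));         // true
        System.out.println(rule.test(Map.of("source", "aws.cloudwatch", "metricType", "MetricType"))); // false
    }
}
```

The second event fails because the inner "And" needs both metricType and namespace, and no other branch of the "Or" is satisfied.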
"$or" may already be used as a normal key in some applications (though this is likely rare). For these cases, Ruler tries its best to maintain backward compatibility. Only when all 3 conditions mentioned above are met will Ruler change behaviour, because it assumes your rule really wanted an "Or" and was misconfigured until now. For example, the rule below will keep working as a normal rule, with "$or" treated as a normal field name in the rule and event:
{
"source": [ "aws.cloudwatch" ],
"$or": {
"metricType": [ "MetricType" ] ,
"namespace": [ "AWS/EC2", "AWS/ES" ]
}
}
Refer to /src/test/data/normalRulesWithOrWording.json for more examples in which "$or" is parsed as a normal field name by Ruler.
The keyword "$or", as the "Or" relationship primitive, should not be used as a normal field name in either Events or Rules. Ruler supports legacy rules where "$or" is parsed as a normal field name to keep backward compatibility and to give teams time to migrate their legacy "$or" usage away from normal field names in their events and rules. Mixing "$or" as the "Or" primitive with "$or" as a normal field name is intentionally not supported by Ruler, to avoid awkward ambiguities around "$or".
There are two ways to use Ruler. You can compile multiple rules into a "Machine", and then use either its rulesForEvent() or its rulesForJSONEvent() method to check which of the rules match any Event. The difference between these two methods is discussed below. This discussion will use rulesForEvent() generically except where the difference matters.
Alternatively, you can use a single static boolean method to determine whether an individual event matches a particular rule.
There is a single static boolean method, Ruler.matchesRule(event, rule); both arguments are provided as JSON strings.
NOTE: There is another, deprecated method called Ruler.matches(event, rule), which should not be used because its results are inconsistent with rulesForJSONEvent() and rulesForEvent(). See the documentation on Ruler.matches(event, rule) for details.
The matching time does not depend on the number of rules. This is the best choice if you have multiple possible rules you want to select from, and especially if you have a way to store the compiled Machine.
The matching time is impacted by the degree of non-determinism caused by wildcard and anything-but-wildcard rules. Performance deteriorates as an increasing number of the wildcard rule prefixes match a theoretical worst-case event. To avoid this, wildcard rules pertaining to the same event field should avoid common prefixes leading up to their first wildcard character. If a common prefix is required, then use the minimum number of wildcard characters and limit repeating character sequences that occur following a wildcard character. MachineComplexityEvaluator can be used to evaluate a machine and determine the degree of non-determinism, or "complexity" (i.e. how many wildcard rule prefixes match a theoretical worst-case event). Here are some data points showing a typical decrease in performance for increasing complexity scores.
It is important to limit machine complexity to protect your application. There are at least two different strategies for limiting machine complexity. Which one makes more sense may depend on your application.
Strategy #1 is preferable in that it measures the actual complexity of the machine containing all the rules. When possible, this strategy should be used. The downside appears when, for example, you have a control plane that allows the creation of one rule at a time, up to a very large number. Then for each of these control plane operations, you must load all the existing rules to perform the validation. This could be very expensive. It is also prone to race conditions.
Strategy #2 is a compromise. The threshold used by strategy #2 will be lower than that of strategy #1 since it is a per-rule threshold. Let's say you want a machine's complexity, with all rules added, to be no more than 300. Then with strategy #2, for example, you could limit each single-rule machine to a complexity of 10, and allow for 30 rules containing wildcard patterns. In an absolute worst case where complexity is perfectly additive (unlikely), this would lead to a machine with a complexity of 300. The downside is that it is unlikely that the complexity will be perfectly additive, and so the number of wildcard-containing rules will likely be limited unnecessarily.
For strategy #2, depending on how rules are stored, an additional attribute may need to be added to rules to indicate which ones are nondeterministic (i.e. contain wildcard patterns) in order to limit the number of wildcard-containing rules.
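The arithmetic behind strategy #2 can be written down as a small admission check. The sketch below assumes you already know a new rule's single-rule complexity and how many wildcard-containing rules are stored; the class, method, and parameter names are hypothetical, and the limits 10 and 30 are simply the numbers from the example above.

```java
// Hypothetical helper illustrating the budgeting arithmetic; not part of Ruler.
final class ComplexityBudgetSketch {
    // Worst case if complexity were perfectly additive across wildcard rules.
    static int worstCaseComplexity(int perRuleLimit, int maxWildcardRules) {
        return Math.multiplyExact(perRuleLimit, maxWildcardRules);
    }

    // Decides whether a new rule may be added under strategy #2.
    static boolean admit(boolean containsWildcard, int ruleComplexity,
                         int existingWildcardRules, int perRuleLimit, int maxWildcardRules) {
        if (!containsWildcard) {
            return true; // deterministic rules do not consume the wildcard budget
        }
        return ruleComplexity <= perRuleLimit && existingWildcardRules < maxWildcardRules;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseComplexity(10, 30));    // 300
        System.out.println(admit(true, 10, 29, 10, 30));    // true: last slot, at the limit
        System.out.println(admit(true, 5, 30, 10, 30));     // false: wildcard rule count exhausted
    }
}
```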
The following is a code snippet illustrating how to limit complexity for a given pattern, like for strategy #2.
import static software.amazon.event.ruler.MatchType.ANYTHING_BUT_WILDCARD;
import static software.amazon.event.ruler.MatchType.WILDCARD;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import software.amazon.event.ruler.JsonRuleCompiler;
import software.amazon.event.ruler.Machine;
import software.amazon.event.ruler.MachineComplexityEvaluator;
import software.amazon.event.ruler.Patterns;

// EXCEPTION_FACTORY, ExceptionMapper, InvalidPatternException and MAX_MACHINE_COMPLEXITY
// are application-specific and are assumed to be defined elsewhere in your code base.
public class Validate {
    private void validate(String pattern, MachineComplexityEvaluator machineComplexityEvaluator) {
        // If we cannot compile, then return exception.
        List<Map<String, List<Patterns>>> compilationResult = new ArrayList<>();
        try {
            compilationResult.addAll(JsonRuleCompiler.compile(pattern));
        } catch (Exception e) {
            InvalidPatternException internalException =
                    EXCEPTION_FACTORY.invalidPatternException(e.getLocalizedMessage());
            throw ExceptionMapper.mapToModeledException(internalException);
        }

        // Validate wildcard patterns. Look for wildcard patterns out of all patterns that have been used.
        Machine machine = new Machine();
        int i = 0;
        for (Map<String, List<Patterns>> rule : compilationResult) {
            if (containsWildcard(rule)) {
                // Add rule to machine for complexity evaluation.
                machine.addPatternRule(Integer.toString(++i), rule);
            }
        }

        // Machine has all rules containing wildcard match types. See if the complexity is under the limit.
        int complexity = machine.evaluateComplexity(machineComplexityEvaluator);
        if (complexity > MAX_MACHINE_COMPLEXITY) {
            InvalidPatternException internalException =
                    EXCEPTION_FACTORY.invalidPatternException("Rule is too complex");
            throw ExceptionMapper.mapToModeledException(internalException);
        }
    }

    // Returns true if any field of the rule uses a wildcard or anything-but-wildcard pattern.
    private boolean containsWildcard(Map<String, List<Patterns>> rule) {
        for (List<Patterns> fieldPatterns : rule.values()) {
            for (Patterns fieldPattern : fieldPatterns) {
                if (fieldPattern.type() == WILDCARD || fieldPattern.type() == ANYTHING_BUT_WILDCARD) {
                    return true;
                }
            }
        }
        return false;
    }
}
The main class you'll interact with implements state-machine based rule matching. The interesting methods are:
addRule() - adds a new rule to the machine
deleteRule() - deletes a rule from the machine
rulesForEvent() / rulesForJSONEvent() - finds the rules in the machine that match an event
There are two flavors: Machine and GenericMachine<T>. Machine is simply GenericMachine<String>. The API refers to the generic type as "name", which reflects history: the String version was built first, and the strings it stored and returned were thought of as rule names.
For safety, the type used to "name" rules should be immutable. If you change the content of an object while it's being used as a rule name, this may break the operation of Ruler.
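The general hazard of mutating an object that is in use as a key can be demonstrated with the standard library alone. The sketch below uses a mutable List as the element of a HashSet; whether Ruler's internals fail in exactly this way is an assumption, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demonstration of the mutable-key hazard; not part of Ruler.
final class MutableNameSketch {
    // Stores a mutable "name", mutates it, then looks it up again.
    static boolean stillFoundAfterMutation() {
        List<String> name = new ArrayList<>(List.of("rule-1"));
        Set<List<String>> names = new HashSet<>();
        names.add(name);

        // Mutating the list changes its hashCode, so the set can no longer
        // locate the entry it stored under the old hash.
        name.add("mutated");
        return names.contains(name);
    }

    public static void main(String[] args) {
        System.out.println(stillFoundAfterMutation()); // false
    }
}
```

Using an immutable type such as String, or a class with final fields, avoids this class of problem.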
The GenericMachine and Machine constructors optionally accept a GenericMachineConfiguration object, which exposes the following configuration options.
additionalNameStateReuse (default: false): Normally, NameStates are re-used for a given key subsequence and pattern if this key subsequence and pattern have been previously added, or if a pattern has already been added for the given key subsequence. Hence, by default, NameState re-use is opportunistic. But by setting this flag to true, NameState re-use will be forced for a key subsequence. This means that the first pattern being added for a key subsequence will re-use a NameState if that key subsequence has been added before, so each key subsequence has a single NameState. This improves memory utilization exponentially in some cases but does lead to more sub-rules being stored in individual NameStates, which Ruler sometimes iterates over, which can cause a modest runtime performance regression. This defaults to false for backwards compatibility, but it is likely that all but the most latency-sensitive applications would benefit from setting this to true.
Here's a simple example. Consider:
machine.addRule("0", "{\"key1\": [\"a\", \"b\", \"c\"]}");
The pattern "a" creates a NameState, and then, even with additionalNameStateReuse=false, the second pattern ("b") and third pattern ("c") re-use that same NameState. But consider the following instead:
machine.addRule("0", "{\"key1\": [\"a\"]}");
machine.addRule("1", "{\"key1\": [\"b\"]}");
machine.addRule("2", "{\"key1\": [\"c\"]}");
Now, with additionalNameStateReuse=false, we end up with three NameStates, because the first pattern encountered for a key subsequence on each rule addition will create a new NameState. So, "a", "b", and "c" all get their own NameStates. However, with additionalNameStateReuse=true, "a" will create a new NameState, then "b" and "c" will reuse this same NameState. This is accomplished by storing that we already have a NameState for the key subsequence "key1".
Note that it doesn't matter if each addRule uses a different rule name or the same rule name.
All forms of this method have the same first argument, a String which provides the name of the Rule and is returned by rulesForEvent(). The rest of the arguments provide the name/value pairs. They may be provided in JSON as in the examples above (via a String, a Reader, an InputStream, or byte[]), or as a Map<String, List<String>>, where the keys are the field names and the values are the list of possible matches; using the example above, there would be a key named detail.state whose value would be the list containing "initializing" and "running".
Note: This method (and also deleteRule()) is synchronized, so only one thread may be updating the machine at any point in time.