Handler - extractProperties

 

Description

Extracts name-value pairs (as metadata) from the content of a semi-structured composite string product. It checks whether name-value pairs exist in the string content of each subproduct of a composite product and, if present, adds them as properties (or metadata) of that subproduct.

The output can be further filtered and processed by other agents accordingly (e.g. passIfProductAttributesMatch).

For example, if the product has its content as "location: boston" then the property "location" is added to the product with the value "boston".

With the key-value separator changed to "=" and the line delimiter changed to ";", the input "location=boston;foo=bar" would yield two name-value pairs: 1) location=boston, 2) foo=bar.

 

Configuration Variables

lineSeparator

The characters in the lineSeparator string are the delimiters for separating lines. By default the tokenizer uses the delimiter set "\n\r": the newline character and the carriage-return character.
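
The delimiter-set behaviour described above can be illustrated with a minimal Python sketch. The function name split_on_any is hypothetical and not part of the handler, and the choice to skip empty tokens (so that "\r\n" does not produce a blank line) is an assumption rather than documented behaviour.

```python
def split_on_any(text, delimiters):
    """Split text wherever ANY character of `delimiters` occurs.

    Assumption: empty tokens between adjacent delimiters are dropped.
    """
    tokens = []
    current = []
    for ch in text:
        if ch in delimiters:
            # A delimiter ends the current token, if one has been started.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


# Default delimiter set: newline and carriage return.
lines = split_on_any("location: boston\r\nfoo: bar\n", "\n\r")

# Custom delimiter set: a single ";" character.
pairs = split_on_any("location=boston;foo=bar", ";")
```

Treating each character of the string as its own delimiter means a lineSeparator such as "\n\r" handles Unix, classic Mac, and DOS line endings alike.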

keyValueSeparator

String that separates the key name from its value. The default separator is ':'. Whitespace before and after the separator is ignored, so for example the string value "location: boston" is evaluated the same as "location:boston"; either would add a property with the name "location" and assign the value "boston" to it.
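
As a rough illustration of the whitespace rule, the following Python sketch splits one line into a key and a value. The name split_key_value is hypothetical, and splitting at the first occurrence of the separator is an assumption; the handler's actual behaviour for lines containing several separators is not documented here.

```python
def split_key_value(line, key_value_separator=":"):
    """Return (key, value) for a line, or None if no separator is present.

    Assumption: the line is split at the FIRST occurrence of the separator.
    """
    key, found, value = line.partition(key_value_separator)
    if not found:
        return None
    # Whitespace around the separator (and around the line) is ignored.
    return key.strip(), value.strip()


with_space = split_key_value("location: boston")
without_space = split_key_value("location:boston")
custom = split_key_value("foo = bar", "=")
missing = split_key_value("no separator here")
```

Both "location: boston" and "location:boston" produce the same pair, which matches the equivalence stated above.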

valueSeparator

The characters in the valueSeparator string are the delimiters for separating multiple values. By default the value tokenizer uses the "|" character to separate multiple values; e.g. "location: 50.0N|30.0W" would assign an ordered list of values [ "50.0N", "30.0W" ] to a location property.

 

Product

The input product is passed through untouched except that any name-value pairs found are added to the subproduct properties (i.e., only metadata is affected; content and product structure are unaffected).

 

How it Works

This handler applies a filter to the string content of its input product in 3 steps.
  1. The filter first tokenizes the string into lines with the lineSeparator delimiters.
  2. Then, for each line, a key-value pair is identified by the occurrence of the keyValueSeparator string, which if present breaks the line into a key part and a value part. Leading and trailing whitespace is ignored for both parts. Whitespace at the start of the first value or the end of the last value can be preserved by starting the first value or ending the last value with a valueSeparator character (e.g. location:| foo bar |), which would store the value as " foo bar ".
  3. If a key-value pair is present then the value part is tokenized with the valueSeparator and the resulting value or values are assigned to the product's properties.
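
The three steps can be sketched end to end in Python as follows. This is an illustration under stated assumptions, not the handler's implementation: the names extract_properties and _tokenize are hypothetical, a plain dict stands in for the product's properties, empty tokens are skipped, the line is split at the first keyValueSeparator, and a single value is stored as a string while multiple values are stored as an ordered list.

```python
def _tokenize(text, delimiters):
    """Split on any character in `delimiters`, skipping empty tokens (assumed)."""
    tokens, current = [], []
    for ch in text:
        if ch in delimiters:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def extract_properties(content, line_separator="\n\r",
                       key_value_separator=":", value_separator="|"):
    """Return a dict of properties found in `content` (a stand-in for metadata)."""
    properties = {}
    # Step 1: tokenize the content into lines.
    for line in _tokenize(content, line_separator):
        # Step 2: look for a key-value pair and trim both parts.
        key, found, value_part = line.partition(key_value_separator)
        if not found:
            continue
        key = key.strip()
        value_part = value_part.strip()
        # Step 3: tokenize the value part; whitespace inside a token survives.
        values = _tokenize(value_part, value_separator)
        if key and values:
            properties[key] = values[0] if len(values) == 1 else values
    return properties


simple = extract_properties("location: boston")
custom = extract_properties("location=boston;foo=bar",
                            line_separator=";", key_value_separator="=")
padded = extract_properties("location:| foo bar |")
multi = extract_properties("location: 50.0N|30.0W")
```

In this sketch the whitespace-preserving form works because trimming is applied to the whole value part before tokenizing, so spaces enclosed between valueSeparator characters are never stripped.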

Revised: 2 November 1999