-
Notifications
You must be signed in to change notification settings - Fork 260
Task/add datatype error indexing #3021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: integration
Are you sure you want to change the base?
Conversation
private Map<Type,IndexedFields> errorIndexedFields = new HashMap<>(); | ||
private Map<Type,IndexedFields> errorReverseIndexedFields = new HashMap<>(); | ||
|
||
protected boolean hasErrorIndexDisallowlist = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These properties need to be moved into the IndexedFields
class and set appropriately. They are datatype-specific too.
String dataTypeString = keyParts.get(datatypeIndex); | ||
String errorDataTypeString = ERROR + "." + dataTypeString; | ||
|
||
activeDataType = TypeRegistry.getType(dataTypeString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not want to ever set activeDataType
in the setup method, or use it when parsing out the property values. We want to avoid implying that the activeDataType
should ever be set outside of the setter method for it. You should use a local variable instead to contain the parsed datatype.
// get the property's datatype | ||
// "error.<datatype>.<etc>..." | ||
List<String> keyParts = List.of(propertyKey.split("\\.")); | ||
int datatypeIndex = keyParts.indexOf(ERROR) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, your parsing of the properties assumes that a datatype is always present in the property between the first and second period. There are many different properties that can start with error
, and we want to first make sure we're looking at a possible datatype specific property before we attempt to extract and parse the datatype from the property. See the following for an example of how to first verify that a property will contain a datatype before attempting to extract it:
@Override
public void setup(Configuration config) {
Map<String,String> errorProperties = config.getPropsWithPrefix((ERROR + "."));
for (Map.Entry<String,String> entry : errorProperties.entrySet()) {
String property = entry.getKey();
// Check if the property is one that allows for configuring index/reverse index fields.
if(isIndexProperty(property)) {
// If the third segment of the property name is 'data', then the second segment contains a datatype.
String[] parts = property.split("\\.");
if(parts[2].equals("data")) {
Type dataType = TypeRegistry.getType(parts[1]);
updateDatatypeIndexFields(dataType, property, entry.getValue());
}
}
}
}
private boolean isIndexProperty(String property) {
return property.endsWith(INDEX_FIELDS) || property.endsWith(DISALLOWLIST_INDEX_FIELDS) || property.endsWith(REVERSE_INDEX_FIELDS) || property.endsWith(DISALLOWLIST_REVERSE_INDEX_FIELDS);
}
private void updateDatatypeIndexFields(Type dataType, String property, String value) {
if(property.endsWith(INDEX_FIELDS)) {
// Logic
// call sub-method
} else if(property.endsWith(DISALLOWLIST_INDEX_FIELDS)) {
// Logic
// call sub-method
} else if(property.endsWith(REVERSE_INDEX_FIELDS)) {
// Logic
// call sub-method
} else if(property.endsWith(DISALLOWLIST_REVERSE_INDEX_FIELDS)) {
// Logic
// call sub-method
} else {
throw new IllegalArgumentException("Unknown property: " + property);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you could do the following and build a map that contains only the datatype-specific properties and work with that:
@Override
public void setup(Configuration config) {
super.setup(config);
config.set(Properties.DATA_NAME, "error");
Map<Type,Map<String,String>> dataTypeSpecificProperties = getDataTypeSpecificProperties(config);
for (Map.Entry<Type,Map<String,String>> entry : dataTypeSpecificProperties.entrySet()) {
Type type = entry.getKey();
Map<String,String> properties = entry.getValue();
if(properties.containsKey(INDEX_FIELDS) && properties.containsKey(DISALLOWLIST_INDEX_FIELDS)) {
// throw exception
}
if(properties.containsKey(REVERSE_INDEX_FIELDS) && properties.containsKey(DISALLOWLIST_REVERSE_INDEX_FIELDS)) {
// throw exception
}
// Continue logic to update IndexFields for the datatype.
}
}
private Map<Type, Map<String, String>> getDataTypeSpecificProperties(Configuration config) {
Map<Type,Map<String,String>> dataTypeSpecificProperties = new HashMap<>();
Map<String,String> errorProperties = config.getPropsWithPrefix((ERROR + "."));
for (Map.Entry<String,String> entry : errorProperties.entrySet()) {
String property = entry.getKey();
// Check if the property is one that allows for configuring index/reverse index fields.
if(isIndexProperty(property)) {
// If the third segment of the property name is 'data', then the second segment contains a datatype.
String[] parts = property.split("\\.");
if(parts[2].equals("data")) {
Type dataType = TypeRegistry.getType(parts[1]);
// Get the property suffix without the 'error.<datatype>' portion.
String propertySuffix = getPropertySuffix(property);
dataTypeSpecificProperties.computeIfAbsent(dataType, k -> new HashMap<>()).put(propertySuffix, entry.getValue());
}
}
}
return dataTypeSpecificProperties;
}
private boolean isIndexProperty(String property) {
return property.endsWith(INDEX_FIELDS) || property.endsWith(DISALLOWLIST_INDEX_FIELDS) || property.endsWith(REVERSE_INDEX_FIELDS) || property.endsWith(DISALLOWLIST_REVERSE_INDEX_FIELDS);
}
private String getPropertySuffix(String property) {
if(property.endsWith(INDEX_FIELDS)) {
return INDEX_FIELDS;
} else if(property.endsWith(DISALLOWLIST_INDEX_FIELDS)) {
return DISALLOWLIST_INDEX_FIELDS;
} else if(property.endsWith(REVERSE_INDEX_FIELDS)) {
return REVERSE_INDEX_FIELDS;
} else if(property.endsWith(DISALLOWLIST_REVERSE_INDEX_FIELDS)) {
return DISALLOWLIST_REVERSE_INDEX_FIELDS;
} else {
throw new IllegalArgumentException("Unhandled property: " + property);
}
}
} | ||
|
||
private IngestHelperInterface delegate = null; | ||
private void handleIndexFields(Collection<String> errorIndexedStrings){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for this method is effectively identical to handleReverseIndexFields
. You could create a more generic version of this method that accepts a reference to the collection in the IndexedFields instance that needs to be updated in addition to the collection of fields to be updated. Same goes for handleDisallowList
methods as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering this, but having them separate would make it easier to modify one without touching the other down the line. Although I'm not sure how realistic that is to future-proof against, just depends how tightly Forward/Reverse index fields are coupled. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't foresee any eventuality where indexed fields would be treated differently from reverse indexed fields in this case. I think it's fine to treat them similarly in this context.
setHasErrorIndexDisallowlist(false); | ||
|
||
// create an instance of the ErrorIndexFields for this datatype | ||
this.errorIndexedFields.putIfAbsent(activeDataType, new IndexedFields()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a note, you can avoid having to do this by using the computeIfAbsent() method instead. For instance, you could do
IndexedFields indexedFields = this.errorIndexedFields.computeIfAbsent(dataType, (k) -> new IndexedFields());
This will lazily initialize and put an IndexedFields instance in the map for the datatype if one is not already present, and return that same instance.
Map<Type,Map<String,String>> dataTypeSpecificProperties = getDataTypeSpecificProperties(config); | ||
|
||
// if there were no datatype specific properties found | ||
if(dataTypeSpecificProperties.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not throw an exception if there are no datatype specific properties. Datatype specific properties should not be treated as required.
private static final String ERROR = "error"; | ||
private IngestHelperInterface delegate = null; | ||
|
||
private Map<Type,IndexedFields> errorIndexedFields = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two properties can be final.
private Type activeDataType; | ||
|
||
private static class IndexedFields { | ||
private Set<String> indexedFields = new HashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The collection variables here can be final.
No description provided.