Skip to content

Task/add datatype error indexing #3021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 31 commits into
base: integration
Choose a base branch
from

Conversation

SethSmucker
Copy link
Collaborator

No description provided.

private Map<Type,IndexedFields> errorIndexedFields = new HashMap<>();
private Map<Type,IndexedFields> errorReverseIndexedFields = new HashMap<>();

protected boolean hasErrorIndexDisallowlist = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These properties need to be moved into the IndexedFields class and set appropriately. They are datatype-specific too.

String dataTypeString = keyParts.get(datatypeIndex);
String errorDataTypeString = ERROR + "." + dataTypeString;

activeDataType = TypeRegistry.getType(dataTypeString);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not want to ever set activeDataType in the setup method, or use it when parsing out the property values. We want to avoid implying that the activeDataType should ever be set outside of the setter method for it. You should use a local variable instead to contain the parsed datatype.

// get the property's datatype
// "error.<datatype>.<etc>..."
List<String> keyParts = List.of(propertyKey.split("\\."));
int datatypeIndex = keyParts.indexOf(ERROR) + 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, your parsing of the properties assumes that a datatype is always present in the property between the first and second period. There are many different properties that can start with error, and we want to first make sure we're looking at a possible datatype specific property before we attempt to extract and parse the datatype from the property. See the following for an example of how to first verify that a property will contain a datatype before attempting to extract it:

    @Override
    public void setup(Configuration config) {
        Map<String,String> errorProperties = config.getPropsWithPrefix((ERROR + "."));
        for (Map.Entry<String,String> entry : errorProperties.entrySet()) {
            String property = entry.getKey();
            // Check if the property is one that allows for configuring index/reverse index fields.
            if(isIndexProperty(property)) {
                // If the third segment of the property name is 'data', then the second segment contains a datatype.
                String[] parts = property.split("\\.");
                if(parts[2].equals("data")) {
                    Type dataType = TypeRegistry.getType(parts[1]);
                    updateDatatypeIndexFields(dataType, property, entry.getValue());
                }
            }
        }
    }
    
    private boolean isIndexProperty(String property) {
        return property.endsWith(INDEX_FIELDS) || property.endsWith(DISALLOWLIST_INDEX_FIELDS) || property.endsWith(REVERSE_INDEX_FIELDS) || property.endsWith(DISALLOWLIST_REVERSE_INDEX_FIELDS);
    }
    
    private void updateDatatypeIndexFields(Type dataType, String property, String value) {
        if(property.endsWith(INDEX_FIELDS)) {
            // Logic
            // call sub-method
        } else if(property.endsWith(DISALLOWLIST_INDEX_FIELDS)) {
            // Logic
            // call sub-method
        } else if(property.endsWith(REVERSE_INDEX_FIELDS)) {
            // Logic
            // call sub-method
        } else if(property.endsWith(DISALLOWLIST_REVERSE_INDEX_FIELDS)) {
            // Logic
            // call sub-method
        } else {
            throw new IllegalArgumentException("Unknown property: " + property);
        }
    }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you could do the following and build a map that contains only the datatype-specific properties and work with that:

    @Override
    public void setup(Configuration config) {
        super.setup(config);
        config.set(Properties.DATA_NAME, "error");
        
        Map<Type,Map<String,String>> dataTypeSpecificProperties = getDataTypeSpecificProperties(config);
        for (Map.Entry<Type,Map<String,String>> entry : dataTypeSpecificProperties.entrySet()) {
            Type type = entry.getKey();
            Map<String,String> properties = entry.getValue();
            
            if(properties.containsKey(INDEX_FIELDS) && properties.containsKey(DISALLOWLIST_INDEX_FIELDS)) {
                // throw exception
            }
            
            if(properties.containsKey(REVERSE_INDEX_FIELDS) && properties.containsKey(DISALLOWLIST_REVERSE_INDEX_FIELDS)) {
                // throw exception
            }
            
            // Continue logic to update IndexFields for the datatype.
        }
        
    }
    
    private Map<Type, Map<String, String>> getDataTypeSpecificProperties(Configuration config) {
        Map<Type,Map<String,String>> dataTypeSpecificProperties = new HashMap<>();
        Map<String,String> errorProperties = config.getPropsWithPrefix((ERROR + "."));
        for (Map.Entry<String,String> entry : errorProperties.entrySet()) {
            String property = entry.getKey();
            // Check if the property is one that allows for configuring index/reverse index fields.
            if(isIndexProperty(property)) {
                // If the third segment of the property name is 'data', then the second segment contains a datatype.
                String[] parts = property.split("\\.");
                if(parts[2].equals("data")) {
                    Type dataType = TypeRegistry.getType(parts[1]);
                    // Get the property suffix without the 'error.<datatype>' portion.
                    String propertySuffix = getPropertySuffix(property);
                    dataTypeSpecificProperties.computeIfAbsent(dataType, k -> new HashMap<>()).put(propertySuffix, entry.getValue());
                }
            }
        }
        return dataTypeSpecificProperties;
    }
    
    private boolean isIndexProperty(String property) {
        return property.endsWith(INDEX_FIELDS) || property.endsWith(DISALLOWLIST_INDEX_FIELDS) || property.endsWith(REVERSE_INDEX_FIELDS) || property.endsWith(DISALLOWLIST_REVERSE_INDEX_FIELDS);
    }
    
    private String getPropertySuffix(String property) {
        if(property.endsWith(INDEX_FIELDS)) {
            return INDEX_FIELDS;
        } else if(property.endsWith(DISALLOWLIST_INDEX_FIELDS)) {
            return DISALLOWLIST_INDEX_FIELDS;
        } else if(property.endsWith(REVERSE_INDEX_FIELDS)) {
            return REVERSE_INDEX_FIELDS;
        } else if(property.endsWith(DISALLOWLIST_REVERSE_INDEX_FIELDS)) {
            return DISALLOWLIST_REVERSE_INDEX_FIELDS;
        } else {
            throw new IllegalArgumentException("Unhandled property: " + property);
        }
    }

}

private IngestHelperInterface delegate = null;
private void handleIndexFields(Collection<String> errorIndexedStrings){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for this method is effectively identical to handleReverseIndexFields. You could create a more generic version of this method that accepts a reference to the collection in the IndexedFields instance that needs to be updated in addition to the collection of fields to be updated. Same goes for handleDisallowList methods as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering this, but having them separate would make it easier to modify one without touching the other down the line. Although I'm not sure how realistic that is to future-proof against, just depends how tightly Forward/Reverse index fields are coupled. Thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't foresee any eventuality where indexed fields would be treated differently from reverse indexed fields in this case. I think it's fine to treat them similarly in this context.

setHasErrorIndexDisallowlist(false);

// create an instance of the ErrorIndexFields for this datatype
this.errorIndexedFields.putIfAbsent(activeDataType, new IndexedFields());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a note, you can avoid having to do this by using the computeIfAbsent() method instead. For instance, you could do

IndexedFields indexedFields = this.errorIndexedFields.computeIfAbsent(dataType, (k) -> new IndexedFields());

This will lazily initialize and put an IndexedFields instance in the map for the datatype if one is not already present, and return that same instance.

Map<Type,Map<String,String>> dataTypeSpecificProperties = getDataTypeSpecificProperties(config);

// if there were no datatype specific properties found
if(dataTypeSpecificProperties.isEmpty()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not throw an exception if there are no datatype specific properties. Datatype specific properties should not be treated as required.

private static final String ERROR = "error";
private IngestHelperInterface delegate = null;

private Map<Type,IndexedFields> errorIndexedFields = new HashMap<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two properties can be final.

private Type activeDataType;

private static class IndexedFields {
private Set<String> indexedFields = new HashSet<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The collection variables here can be final.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants