Task/add datatype error indexing #3021

Draft
wants to merge 33 commits into
base: integration
Changes from 27 commits
1a01ee4
Initial commit for a random test suite
apmoriarty Jun 27, 2025
b11946f
Add error constant
SethSmucker Jul 7, 2025
6f0ba0c
Seth Notes
SethSmucker Jul 7, 2025
76601e9
First bit of adding error index stuff to BIH.setup()
SethSmucker Jul 7, 2025
7697884
Moved logic to ErrorShardedIngestHelper, updated callers in BaseInges…
SethSmucker Jul 8, 2025
0abb40f
wip
SethSmucker Jul 8, 2025
f3a03fc
wip
SethSmucker Jul 9, 2025
2e420a9
Documentation
SethSmucker Jul 9, 2025
437af04
formatting
SethSmucker Jul 9, 2025
7fa2f61
Formatting
SethSmucker Jul 9, 2025
077e211
Formatting
SethSmucker Jul 9, 2025
541df58
copy the random tests
SethSmucker Jul 10, 2025
4a700e6
Laura help
SethSmucker Jul 14, 2025
ef81543
WOOO IT WORKS!!! For now
SethSmucker Jul 15, 2025
de286b5
Add activeDataType to base ingest helper.
SethSmucker Jul 16, 2025
b0f1641
wip
SethSmucker Jul 16, 2025
e5dc301
Trivial
SethSmucker Jul 16, 2025
b1a9a8b
merge
SethSmucker Jul 17, 2025
e24a387
Formatting
SethSmucker Jul 17, 2025
fd7a545
wip
SethSmucker Jul 18, 2025
e5668bf
wip
SethSmucker Jul 18, 2025
4e908b7
wip
SethSmucker Jul 18, 2025
606bf02
fancy string magic
SethSmucker Jul 21, 2025
82ce3fd
beauty
SethSmucker Jul 23, 2025
8f614c1
Add comment
SethSmucker Jul 23, 2025
a2309ed
wip
SethSmucker Jul 24, 2025
d4865be
Wip
SethSmucker Jul 25, 2025
78ade6c
Test wip
SethSmucker Jul 28, 2025
385d749
wip
SethSmucker Jul 29, 2025
fb6ec05
wip
SethSmucker Aug 1, 2025
ba7c6b3
We need to ingest multiple dt for the test. How do we do that?
SethSmucker Aug 4, 2025
6b76eb7
savestate
SethSmucker Aug 4, 2025
a946100
wip
SethSmucker Aug 4, 2025
@@ -59,7 +59,7 @@ public void setup(Configuration config) {
aliaser.setup(getType(), config);
}

-private void initType(final Configuration config) {
+protected void initType(final Configuration config) {
if (type != null && TypeRegistry.hasInstance())
return;


@@ -12,4 +12,13 @@ public interface FieldConfigHelper {
boolean isTokenizedField(String fieldName);

boolean isReverseTokenizedField(String fieldName);

/*
* SETH NOTE: Should these be documented? I'm not sure what the consensus is for abstract methods. Adding them here also forces every implementing class,
* such as XMLFieldConfigHelper, to implement them. Should they be added to those classes as well, or should we take another approach to adding
* ErrorIndexedFields?
*/
boolean isErrorIndexedField(String fieldName);

boolean isErrorReverseIndexedField(String fieldName);

}
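
One possible answer to the SETH NOTE above, offered as a sketch rather than a recommendation for this PR: Java interface `default` methods would let the two new methods live on `FieldConfigHelper` without forcing every implementer to add stubs. The interface below is a trimmed-down stand-in that contains only the methods visible in this diff, and `LegacyHelper` and `ErrorAwareHelper` are hypothetical classes invented for illustration.

```java
import java.util.Set;

final class DefaultMethodSketch {

    /** Trimmed-down stand-in for FieldConfigHelper; only the methods shown in this diff. */
    interface FieldConfigHelper {
        boolean isTokenizedField(String fieldName);

        boolean isReverseTokenizedField(String fieldName);

        // Default bodies: existing implementers compile unchanged and report "not error indexed".
        default boolean isErrorIndexedField(String fieldName) {
            return false;
        }

        default boolean isErrorReverseIndexedField(String fieldName) {
            return false;
        }
    }

    /** Hypothetical existing implementer that knows nothing about error indexing. */
    static class LegacyHelper implements FieldConfigHelper {
        @Override
        public boolean isTokenizedField(String fieldName) {
            return false;
        }

        @Override
        public boolean isReverseTokenizedField(String fieldName) {
            return false;
        }
    }

    /** Hypothetical implementer that opts in by overriding one of the defaults. */
    static class ErrorAwareHelper extends LegacyHelper {
        private final Set<String> errorIndexed;

        ErrorAwareHelper(Set<String> errorIndexed) {
            this.errorIndexed = errorIndexed;
        }

        @Override
        public boolean isErrorIndexedField(String fieldName) {
            return errorIndexed.contains(fieldName);
        }
    }

    public static void main(String[] args) {
        FieldConfigHelper legacy = new LegacyHelper();
        FieldConfigHelper aware = new ErrorAwareHelper(Set.of("FOO"));
        System.out.println(legacy.isErrorIndexedField("FOO"));
        System.out.println(aware.isErrorIndexedField("FOO"));
        System.out.println(aware.isErrorReverseIndexedField("FOO"));
    }
}
```

The trade-off is that a default of `false` silently hides implementers that should have opted in, whereas abstract methods make the compiler flag each one.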

@@ -204,6 +204,22 @@ public boolean isReverseTokenizedField(String fieldName) {
return isNoMatchReverseTokenized();
}

/*
* TODO: Implement once keeping these methods in FieldConfigHelper has been approved. Stubbed to return false for now so that this class compiles.
*/
@Override
public boolean isErrorIndexedField(String fieldName) {
return false;
}

/*
* TODO: Implement once keeping these methods in FieldConfigHelper has been approved. Stubbed to return false for now so that this class compiles.
*/
@Override
public boolean isErrorReverseIndexedField(String fieldName) {
return false;
}

public boolean isNoMatchStored() {
return noMatchStored;
}

@@ -27,6 +27,7 @@ public abstract class AbstractIngestHelper extends DataTypeHelperImpl implements
/* Map of field names to normalizers, null key is the default normalizer */
protected MaskedFieldHelper mfHelper = null;
protected Set<String> shardExclusions = new HashSet<>();

protected boolean hasIndexDisallowlist = false;
protected boolean hasReverseIndexDisallowlist = false;


@@ -140,6 +140,7 @@ public abstract class BaseIngestHelper extends AbstractIngestHelper implements C
public static final String FIELD_CONFIG_FILE = ".data.category.field.config.file";

private static final String PROPERTY_MALFORMED = " property malformed: ";

private static final Logger log = LoggerFactory.getLogger(BaseIngestHelper.class);

private Multimap<String,datawave.data.type.Type<?>> typeFieldMap = null;
@@ -262,6 +263,12 @@ public void setup(Configuration config) {
this.fieldConfigHelper = XMLFieldConfigHelper.load(fieldConfigFile, this);
}

// --- INDEX_FIELDS ---

/*
* SETH NOTE This is most likely the start of the chunk that needs to be cloned for the error index stuff.
*/

// Process the indexed fields
if (config.get(this.getType().typeName() + DISALLOWLIST_INDEX_FIELDS) != null) {
if (log.isDebugEnabled()) {
@@ -295,6 +302,12 @@ public void setup(Configuration config) {
}
}

// --- REVERSE INDEX FIELDS ---

/*
* SETH NOTE: This is what Laura was talking about: the allowlist and the disallowlist are mutually exclusive. I haven't seen this same check above for the
* non-reverse index fields. Maybe I need to take another look.
*/
// Ensure that we have only an allowlist or a disallowlist of fields to
// reverse index
if (config.get(this.getType().typeName() + DISALLOWLIST_REVERSE_INDEX_FIELDS) != null
@@ -343,6 +356,10 @@ public void setup(Configuration config) {

}

/*
* SETH NOTE Not sure if I'll need what's after this. I'll start with the above block and add to it as needed.
*/

// gather the list of all indexed fields across all types
// this list is only used for generating warnings if we are not indexing
// something that
@@ -453,7 +470,7 @@ public void setup(Configuration config) {
}
}

-private void moveToPatternMap(Set<String> in, Map<String,Pattern> out) {
+protected void moveToPatternMap(Set<String> in, Map<String,Pattern> out) {
for (Iterator<String> itr = in.iterator(); itr.hasNext();) {
String str = itr.next();
if (str.indexOf('*') != -1) {
@@ -1225,4 +1242,5 @@ public void updateDatawaveTypes(String fieldName, String typeClasses) {
}
}
}

}
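
The SETH NOTEs in `setup()` point at two pieces that an error-index version of this block would probably need: the check that only an allowlist or only a disallowlist is configured, and the wildcard handling in `moveToPatternMap`. The following is a minimal, self-contained sketch of both, not the code in this PR. It assumes a plain `Map<String,String>` in place of the Hadoop `Configuration`, the property keys in `main` are hypothetical, and the wildcard-to-regex translation is an assumption because the body of `moveToPatternMap` is elided in the hunk above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

final class FieldListSketch {

    /** Rejects a configuration that sets both an allowlist and a disallowlist for the same index. */
    static void requireExclusive(Map<String,String> config, String allowKey, String disallowKey) {
        if (config.get(allowKey) != null && config.get(disallowKey) != null) {
            throw new IllegalArgumentException("Both " + allowKey + " and " + disallowKey + " are set; configure only one");
        }
    }

    /** Splits a comma-separated property value into trimmed, non-empty field names. */
    static Set<String> parseFields(String value) {
        Set<String> fields = new HashSet<>();
        if (value != null) {
            for (String raw : value.split(",")) {
                String name = raw.trim();
                if (!name.isEmpty()) {
                    fields.add(name);
                }
            }
        }
        return fields;
    }

    /** Moves every name containing '*' out of the set and into a name-to-Pattern map. */
    static void moveToPatternMap(Set<String> in, Map<String,Pattern> out) {
        for (Iterator<String> itr = in.iterator(); itr.hasNext();) {
            String str = itr.next();
            if (str.indexOf('*') != -1) {
                itr.remove();
                // Assumed translation: quote the literal text and let each '*' match any run of characters.
                out.put(str, Pattern.compile(Pattern.quote(str).replace("*", "\\E.*\\Q")));
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical property keys, for illustration only.
        String allowKey = "mytype.data.category.error.index.allowlist";
        String disallowKey = "mytype.data.category.error.index.disallowlist";

        Map<String,String> config = new HashMap<>();
        config.put(allowKey, "FIELD_A, ERR_*, FIELD_B");

        requireExclusive(config, allowKey, disallowKey);
        Set<String> fields = parseFields(config.get(allowKey));
        Map<String,Pattern> patterns = new HashMap<>();
        moveToPatternMap(fields, patterns);

        System.out.println("exact names: " + fields);
        System.out.println("wildcard names: " + patterns.keySet());
    }
}
```

Running the exclusivity check before any parsing means a misconfigured data type fails during setup instead of silently preferring one list over the other.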