-
Notifications
You must be signed in to change notification settings - Fork 5
Connected Components Testing
We will use GTgraph
to generate small (N < 10e5) random graphs for correctness tests and some experimental tests. Currently, we are generating graphs using GTgraph-random
, an implementation of Erdos-Renyi graph generators.
Can we generate adversarial graphs to demonstrate our algo's superiority?
- Verify cut samples are valid (i.e. the edges exist) the vast majority, if not all the time [DONE]
- Verify correctness on small generated graphs [DONE]
- Verify correctness on a massive generated graph to test handling of values near the numerical limit
- Determine the rate of invalid cut samples and the respective rates of type I and II errors
- Determine the RTE/WA rate on many small generated graphs
- Benchmark memory usage against a naive connected-components implementation on small generated graphs
- Benchmark against massive graphs with simulated insert/delete streams
- Find the largest size of graph we can process with this algo (that we can't with a naive algo)
- Benchmark against other external-memory MST implementations
A note about random graph generation that we might consider for our experiments is copy/pasted below from an email from Michel Pelletier from SUNY Brockport.
Kronecker products are a great way to make graphs with a lot of different kinds of structure and degree distributions. You can make "Sierpinski" graphs or square "Menger" sponges or really any fractal construction starting from a small matrix template and "kronecker exponentiating" it. Mix in a little random selection and you get a random graphs with the distributions eroded but proportionally preserved. The degree distributions can vary quite a bit depending on your starting template, A true Sierpinsky upper triangle will be a DAG if you remove it's identity diagonal with a Zipfian distribution of outgoing degree, whereas a Menger sponge has similar properties but the "long tail" of degree distribution is truncated at some minimum amount. Here's a quick notebook I pushed a couple months ago that plays with this idea a bit. There's a lot of room for play here!
https://github.com/Graphegon/pygraphblas/blob/main/demo/Sierpinski-Graph.ipynb