Skip to content

Connected Components Testing

tenchd edited this page Apr 22, 2021 · 7 revisions

Connected Components Test Suite

We will use GTgraph to generate small (N < 10e5) random graphs for correctness tests and some experimental tests. Currently, we are generating graphs using GTgraph-random, an implementation of Erdos-Renyi graph generators. Can we generate adversarial graphs to demonstrate our algo's superiority?

Correctness Tests

  • Verify cut samples are valid (i.e. the edges exist) the vast majority, if not all the time [DONE]
  • Verify correctness on small generated graphs [DONE]
  • Verify correctness on a massive generated graph to test handling of values near the numerical limit

Experimental Tests

  • Determine the rate of invalid cut samples and the respective rates of type I and II errors
  • Determine the RTE/WA rate on many small generated graphs
  • Benchmark memory usage against a naive connected-components implementation on small generated graphs
  • Benchmark against massive graphs with simulated insert/delete streams
  • Find the largest size of graph we can process with this algo (that we can't with a naive algo)
  • Benchmark against other external-memory MST implementations

A note about random graph generation that we might consider for our experiments is copy/pasted below from an email from Michel Pelletier from SUNY Brockport.

Kronecker products are a great way to make graphs with a lot of different kinds of structure and degree distributions. You can make "Sierpinski" graphs or square "Menger" sponges or really any fractal construction starting from a small matrix template and "kronecker exponentiating" it. Mix in a little random selection and you get a random graphs with the distributions eroded but proportionally preserved. The degree distributions can vary quite a bit depending on your starting template, A true Sierpinsky upper triangle will be a DAG if you remove it's identity diagonal with a Zipfian distribution of outgoing degree, whereas a Menger sponge has similar properties but the "long tail" of degree distribution is truncated at some minimum amount. Here's a quick notebook I pushed a couple months ago that plays with this idea a bit. There's a lot of room for play here!

https://github.com/Graphegon/pygraphblas/blob/main/demo/Sierpinski-Graph.ipynb

Clone this wiki locally