This repository was archived by the owner on Jan 3, 2023. It is now read-only.

@mingshan-wang
Contributor

This PR adds a validation script for ResNet50 training with both synthetic and real data.

The TF reference results under the tfGPU/ folder were collected by running the same command as in the script on TF GPU.

Also included are a patch that makes the data loader for real data deterministic and a patch that eliminates the average_loss encapsulates in the training graph.

Contributor

@shresthamalik shresthamalik left a comment

LGTM! 👍

@avijit-nervana avijit-nervana deleted the mingshan/validate_resnet50 branch April 12, 2019 16:33
@avijit-nervana avijit-nervana restored the mingshan/validate_resnet50 branch April 12, 2019 17:14
def check_validation_results(norm_dict, metric):
    test_pass = True
    for norm in norm_dict:
        if norm_dict[norm] > 0.1:
Contributor

So if the ref accuracy is 75 and the ng accuracy is 75.3, is that a failure?

Contributor

Yes.

Contributor

This script does not compare accuracy. It compares the training loss value at every iteration.

Contributor

Let me check.

Contributor

Same thing for loss. If ref loss is 1, and we get 0.8, is the test passing?
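
To make the question concrete, here is a minimal sketch of how a threshold check like this could behave. The hunk above does not show how the norms are computed or what happens after the `if`, so everything beyond the `> 0.1` comparison is an assumption: `loss_norms` is a hypothetical helper that takes norms of the absolute per-iteration loss difference, and the body of `check_validation_results` is filled in for illustration only.

```python
import math

THRESHOLD = 0.1  # same cutoff as in the hunk under review


def loss_norms(ref_losses, ng_losses):
    """Hypothetical helper: norms of the per-iteration loss difference."""
    if len(ref_losses) != len(ng_losses):
        raise ValueError("loss sequences must have the same length")
    diffs = [abs(r - n) for r, n in zip(ref_losses, ng_losses)]
    return {
        "l1": sum(diffs),
        "l2": math.sqrt(sum(d * d for d in diffs)),
        "inf": max(diffs, default=0.0),
    }


def check_validation_results(norm_dict, metric):
    """Assumed behavior: pass only if no norm exceeds the threshold."""
    test_pass = True
    for norm in norm_dict:
        if norm_dict[norm] > THRESHOLD:
            print(f"{metric}: {norm} norm {norm_dict[norm]:.4f} exceeds {THRESHOLD}")
            test_pass = False
    return test_pass


# One iteration with a reference loss of 1.0 and an nGraph loss of 0.8.
norms = loss_norms([1.0], [0.8])
result = check_validation_results(norms, "loss")
```

Under these assumptions the 1.0 versus 0.8 case fails, because the check looks at the magnitude of the difference and not its direction. If a lower loss should pass, the comparison would need a signed or one-sided difference instead.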

    return total_loss, top1_acc, top5_acc


def parse_reference_file(filename):
Contributor

parse_reference_file and parse_training_output can be a single function. I think they are separate because one parses a file and the other parses a string. Maybe we keep the string-parsing function, read the file into a string, and reuse it.
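
A rough sketch of that suggestion follows. The real log format is not visible in this hunk, so the line layout used here (`<step> <total_loss> <top1_acc> <top5_acc>`, whitespace separated) and the `_LINE` pattern are assumptions for illustration. Only the two function names and the returned tuple come from the diff.

```python
import re

# Assumed (hypothetical) line layout: "<step> <total_loss> <top1_acc> <top5_acc>"
_NUM = r"(\d+(?:\.\d+)?)"
_LINE = re.compile(r"^\s*\d+\s+" + _NUM + r"\s+" + _NUM + r"\s+" + _NUM + r"\s*$")


def parse_training_output(text):
    """Collect per-iteration loss and accuracy values from a log string."""
    total_loss, top1_acc, top5_acc = [], [], []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue  # skip headers and any other non-matching lines
        total_loss.append(float(match.group(1)))
        top1_acc.append(float(match.group(2)))
        top5_acc.append(float(match.group(3)))
    return total_loss, top1_acc, top5_acc


def parse_reference_file(filename):
    """Read the reference file into a string and reuse the string parser."""
    with open(filename) as f:
        return parse_training_output(f.read())
```

With this shape, any change to the log format only has to be handled in `parse_training_output`.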

@avijit-nervana avijit-nervana added the Release Candidate PRs needed for the next release label Apr 18, 2019

6 participants