
Conversation

Contributor

@maldil maldil commented Jun 26, 2022

Description of the problems or issues

Is your pull request related to a problem? Please describe.
Thank you very much for your excellent work in analysiscenter/batchflow.
I am a graduate student at the University of Colorado Boulder studying best practices for evolving ML code. In our research, one of the most common evolution practices in ML code is migrating loop-based computations to vectorized ones, since this usually improves performance. We made the following changes in batchflow, which remove the for loop and use NumPy APIs instead. I carefully checked the modification to ensure that it does not break the code. I would be glad to contribute; please help me merge this.

Does your pull request fix any issue?
A possible performance issue

Description of the proposed changes

Use np.sum to compute the sum of elements rather than using inefficient Python for loops.
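
As a minimal sketch of the change (not the project's actual code), the two forms can be compared side by side. The Box class here is a hypothetical stand-in for the elements of boxes, assuming only that each one exposes a bbox tuple laid out as (x0, y0, x1, y1) and a get_text() method. The np.sum form of num_char_sum is also an assumption, since only the box_len_sum line is visible in this conversation.

```python
import numpy as np


class Box:
    """Hypothetical stand-in for one element of `boxes` (an assumption)."""

    def __init__(self, bbox, text):
        self.bbox = bbox  # assumed layout: (x0, y0, x1, y1)
        self._text = text

    def get_text(self):
        return self._text


boxes = [
    Box((0, 0, 10, 5), "hello"),
    Box((2, 0, 9, 5), "np"),
    Box((1, 0, 4, 5), "sum"),
]

# Loop-based version: a single pass that updates two running totals.
box_len_sum = 0
num_char_sum = 0
for b in boxes:
    box_len_sum = box_len_sum + b.bbox[2] - b.bbox[0]
    num_char_sum = num_char_sum + len(b.get_text())

# np.sum version: one list comprehension (one pass over boxes) per total.
box_len_sum_np = np.sum([b.bbox[2] - b.bbox[0] for b in boxes])
num_char_sum_np = np.sum([len(b.get_text()) for b in boxes])

# Both versions should produce the same totals.
assert box_len_sum == box_len_sum_np
assert num_char_sum == num_char_sum_np
```

One difference worth checking: np.sum returns a NumPy scalar rather than a Python int, and for an empty list np.sum([]) returns the float 0.0, so code that relies on integer results may want to cover that case.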

Test plan

I ran make test as described in the contribution guidelines.

@maldil maldil changed the title from "use no.sum to compute sum" to "use np.sum to compute sum" Jun 26, 2022
Contributor

@lukehsiao lukehsiao left a comment

Hi Malinda,

Is it correct to assume this is an automated PR generated from R-CPatMiner?

If so, perhaps you need to fix your tool's repo name substitution, as this is not https://github.com/analysiscenter/batchflow.

One other question: do you know the performance impact of this change? We are now iterating through boxes twice rather than once, though with much faster summing.

- for i, b in enumerate(boxes):
-     box_len_sum = box_len_sum + b.bbox[2] - b.bbox[0]
-     num_char_sum = num_char_sum + len(b.get_text())
+ box_len_sum = np.sum([b.bbox[2] - b.bbox[0] for b in boxes] )
Contributor

chore: remove the extra space before the final parenthesis

Contributor Author

This is done. Thank you.

Contributor Author

maldil commented Jun 27, 2022

Hi @lukehsiao

Yes, you got me. I am an author of the R-CPatMiner project. However, this pull request was not auto-generated by the tool; if it had been, this error would have been avoided. It is a human error: I did not pay attention to naming the project correctly. I'm sorry for the mistake.

Yes, you are correct. However, a number of studies comparing list comprehensions with Python for loops favor list comprehensions in terms of efficiency. Combining a list comprehension with np.sum may therefore yield a net performance benefit even though it increases the number of iterations. I am happy to run a performance test if you could give me an idea of what the variable boxes typically contains. I also think this update is more Pythonic and reduces the number of lines of code.
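
A performance test along these lines could be sketched as follows. Box and make_boxes are hypothetical stand-ins, since the real contents of boxes are not known here, and the list sizes and repeat count are arbitrary choices. The np.sum form of num_char_sum is assumed to mirror the box_len_sum line from the diff.

```python
import timeit

import numpy as np


class Box:
    """Hypothetical stand-in for one element of `boxes` (an assumption)."""

    def __init__(self, bbox, text):
        self.bbox = bbox  # assumed layout: (x0, y0, x1, y1)
        self._text = text

    def get_text(self):
        return self._text


def make_boxes(n):
    """Build n synthetic boxes; the widths and texts are arbitrary test data."""
    return [Box((i, 0, i + 10, 5), "x" * (i % 7)) for i in range(n)]


def sums_loop(boxes):
    """Original style: a single pass that updates both totals."""
    box_len_sum = 0
    num_char_sum = 0
    for b in boxes:
        box_len_sum = box_len_sum + b.bbox[2] - b.bbox[0]
        num_char_sum = num_char_sum + len(b.get_text())
    return box_len_sum, num_char_sum


def sums_numpy(boxes):
    """Proposed style: one list comprehension plus np.sum per total."""
    box_len_sum = np.sum([b.bbox[2] - b.bbox[0] for b in boxes])
    num_char_sum = np.sum([len(b.get_text()) for b in boxes])
    return box_len_sum, num_char_sum


for n in (10, 1_000, 10_000):
    boxes = make_boxes(n)
    # Both variants must agree before their timings are worth comparing.
    assert sums_loop(boxes) == tuple(int(v) for v in sums_numpy(boxes))
    t_loop = timeit.timeit(lambda: sums_loop(boxes), number=50)
    t_numpy = timeit.timeit(lambda: sums_numpy(boxes), number=50)
    print(f"n={n:>6}  loop={t_loop:.4f}s  np.sum={t_numpy:.4f}s")
```

The timings depend on the machine and on how large boxes is in practice, so the result for realistic input sizes is what matters here.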

Thanks again!

Contributor

@lukehsiao lukehsiao left a comment

Sounds like a great tool!

Thanks for the contribution :)

@lukehsiao lukehsiao merged commit 29c6f0f into HazyResearch:master Jun 28, 2022