Lambda estimation work from Grant Erdmann #32
Conversation
The sum of the weights is unity. Online approximation still in use.
Conflicts: lm/interpolate/train_params_main.cc
|
There are leftover merge conflict markers (>>>>) in the final code.
|
OK, the merge conflict should be fixed now.
|
Wait, why is the sum of the weights unity? There's nothing wrong with sharpening the probability distribution.
|
Now that you made me think about it, I actually remember. "Interpolation" is a misnomer if the weights do not sum to one. The math works with a non-unit sum, but then you might be extrapolating, and it would be better to call it "loglinear combination". Each LM's distribution flattens as the sum goes to zero and steepens as the sum gets large. Luckily, it's easy to get loglinear combination (if that's what you want) by making the nullspace matrix the identity. All weights being zero (or infinity) would be allowed, but contrary to our standard intuition.
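(For reference, my reading of the combination being discussed: each component LM's probability is raised to its weight, the products are multiplied together, and the result is renormalized over the vocabulary. The notation below is mine, not taken from the patch.)

```latex
P(w \mid h) = \frac{1}{Z(h)} \prod_{i} P_i(w \mid h)^{\lambda_i},
\qquad
Z(h) = \sum_{w'} \prod_{i} P_i(w' \mid h)^{\lambda_i}
```

(Scaling all \lambda_i toward zero flattens the combined distribution toward uniform, and scaling them up sharpens it, which is the behavior described above; the weights only act like interpolation weights when they sum to one.)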
|
The distribution is normalized by brute force. It will also produce probabilities greater than zero even if some weights are less than zero. Having a weight less than zero may seem odd, but consider the case where we have an LM trained on negative examples and effectively want the likelihood ratio between a good model and a bad model. The optimality criterion will generally favor non-negative weights. There is no reason for either constraint (summing to one or non-negativity).
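(As a minimal sketch of that point, not the code in this pull request: the probability table, names, and fixed vocabulary below are made up for illustration. Because exp() is strictly positive, brute-force normalization yields a proper, strictly positive distribution regardless of the signs of the weights.)

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Loglinear combination with brute-force normalization (illustrative only).
// probs[m][w] holds P_m(w | h) for model m and word w over a small, fixed
// vocabulary; every entry is assumed positive. lambdas[m] may be negative.
std::vector<double> Combine(const std::vector<std::vector<double> > &probs,
                            const std::vector<double> &lambdas) {
  std::size_t vocab = probs.front().size();
  std::vector<double> out(vocab, 0.0);
  double z = 0.0;
  for (std::size_t w = 0; w < vocab; ++w) {
    double log_score = 0.0;
    for (std::size_t m = 0; m < lambdas.size(); ++m) {
      log_score += lambdas[m] * std::log(probs[m][w]);
    }
    out[w] = std::exp(log_score);  // strictly positive, even for negative weights
    z += out[w];
  }
  for (std::size_t w = 0; w < vocab; ++w) out[w] /= z;  // brute-force normalization
  return out;
}
```

(The normalizing sum runs over the whole vocabulary, which is what makes it "brute force".)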
|
Jeremy, I've got some optimizations. Are you done with your changes?
|
Yeah, I think so - thanks!
|
@jsedoc Please push your change. I'm holding up Jeremy's change (i.e. this one) because it imposes inappropriate constraints on the lambda values.
|
@jgwinnup Can you produce an optimizer based on Dyer's code that does not impose constraints on the lambda values?
|
That is probably beyond my skill right now - the above comment about the constraints was from Grant based on the code he did last week. I can back out those changes if you want.
|
I just talked to Grant - there will be a version without the constraints ready in the morning barring any problems. |
Better output for the user.
|
OK - this latest push from Grant is set up so that there are no constraints on the lambda values, as discussed. It compiles on the 'paramwork' branch; I've tried to update it to master, but I don't have a dev environment on this machine. Let me know if there are issues.
Newer version of the tuner. It has settings that seem to improve convergence speed somewhat, and the user output is clearer. Suggested use: for a tuning corpus of thousands of lines, first tune on just a dozen or two lines, then use the resulting weights to initialize "params" for tuning the full corpus. That will probably put you in the basin of convergence for Newton's method, so you won't waste time on many bad steps. Right now doing this requires recompiling with the different initialization, but I plan to change that.
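(A minimal sketch of that suggested warm start, assuming a generic tune(corpus, initial weights) callback; WarmStartTune, Corpus, and Weights are hypothetical names, not the actual tuner interface.)

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Corpus = std::vector<std::string>;
using Weights = std::vector<double>;
using Tuner = std::function<Weights(const Corpus &, const Weights &)>;

// Tune on a small slice of the corpus first, then reuse those weights as the
// starting point for the full corpus, so Newton's method begins near the basin
// of convergence instead of spending iterations on bad early steps.
Weights WarmStartTune(const Corpus &full, std::size_t warmup_lines,
                      const Tuner &tune, const Weights &initial) {
  Corpus warmup(full.begin(),
                full.begin() + std::min<std::size_t>(warmup_lines, full.size()));
  Weights warm = tune(warmup, initial);  // cheap pass over a dozen or two lines
  return tune(full, warm);               // full pass, initialized from the warm weights
}
```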
|
Chase is integrating code on master.
Grant took a look at the estimation code and made some changes - LM query still needs to be sped up.