
Conversation

@jgwinnup
Collaborator

Grant took a look at the estimation code and made some changes; the LM query still needs to be sped up.

@kpu
Owner

kpu commented May 22, 2015

There are `>>>>` failed-merge markers in the final code.

@jgwinnup
Collaborator Author

OK, the merge conflict should be fixed now.

@kpu
Owner

kpu commented May 22, 2015

Wait, why is the sum of the weights unity? Nothing wrong with sharpening the probability distribution.

@jgwinnup
Collaborator Author

Now that you've made me think about it, I actually remember: "interpolation" is a misnomer if the weights do not sum to one.

The math works with a non-unit sum, but then you might be extrapolating, and it would be better to call it a "log-linear combination". Each LM's distribution flattens as the sum of the weights goes to zero and steepens as the sum gets large. Luckily, it's easy to get a log-linear combination (if that's what you want) by making the nullspace matrix the identity. All of the weights being zero (or infinite) would then be allowed, contrary to our standard intuition.

The nonnegativity constraint was the more annoying one. I have some additional edits in testing that should make the active-set algorithm more elegant and efficient.

You could enforce interpolation through measures other than the sum (i.e., the 1-norm, for nonnegative weights); that's just the computationally easiest choice, at least for me. You could instead require the 2-norm, the infinity-norm, or some other measure to equal unity. You would just have to exclude weight vectors with a single nonzero component that is not unity, e.g. [0 0 .5] and [0 0 1.5].
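To make the flattening/steepening behavior concrete, here is a minimal sketch (toy numbers and names, not project code) of a brute-force-normalized log-linear combination of two unigram LMs:

```python
# Toy illustration only: log-linear combination of two unigram LMs over a
# three-word vocabulary, p(w) proportional to p1(w)^w1 * p2(w)^w2,
# normalized by brute force.
import numpy as np

p1 = np.array([0.7, 0.2, 0.1])  # toy LM 1
p2 = np.array([0.4, 0.4, 0.2])  # toy LM 2

def loglinear(w1, w2):
    unnorm = p1 ** w1 * p2 ** w2
    return unnorm / unnorm.sum()

print(loglinear(0.5, 0.5))  # weights sum to 1: a proper "interpolation"
print(loglinear(0.1, 0.1))  # sum near 0: the distribution flattens
print(loglinear(2.0, 2.0))  # large sum: the distribution sharpens
```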

@kpu
Owner

kpu commented May 23, 2015

The distribution is normalized by brute force, so it produces strictly positive probabilities even when some weights are negative. A negative weight may seem odd, but consider the case where we have an LM trained on negative examples and effectively want the likelihood ratio between a good model and a bad model. The optimality criterion will generally favor non-negative weights.

There is no reason for either constraint.
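As a toy illustration of that point (hypothetical numbers, not project code): with one negative weight the combination becomes a likelihood ratio, and the brute-force normalization still yields a proper distribution.

```python
# Toy illustration only: a weight of -1 on an LM trained on negative examples
# gives the (normalized) likelihood ratio p_good / p_bad; all probabilities
# stay positive and sum to one.
import numpy as np

p_good = np.array([0.7, 0.2, 0.1])  # toy LM trained on good text
p_bad = np.array([0.2, 0.4, 0.4])   # toy LM trained on negative examples

unnorm = p_good ** 1.0 * p_bad ** -1.0
print(unnorm / unnorm.sum())  # approx [0.824, 0.118, 0.059]: still a distribution
```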

@jsedoc
Collaborator

jsedoc commented May 24, 2015

Jeremy,

I've got some optimizations; are you done with your changes?

Regards,
João


@jgwinnup
Collaborator Author

Yeah I think so - thanks!

@kpu
Owner

kpu commented May 27, 2015

@jsedoc Please push your change. I'm holding Jeremy's change (i.e. this one) up because it imposes inappropriate constraints on the lambda values.

@kpu
Owner

kpu commented May 27, 2015

@jgwinnup Can you produce an optimizer based on Dyer's code that does not impose constraints on the lambda values?

@jgwinnup
Collaborator Author

That is probably beyond my skill right now; the comment above about the constraints was from Grant, based on the code he wrote last week. I can back out those changes if you want.
@jsedoc - can you merge your changes into master as opposed to this fork?

@jgwinnup
Collaborator Author

I just talked to Grant: there will be a version without the constraints ready in the morning, barring any problems.

@jgwinnup
Collaborator Author

OK, this latest push from Grant removes the constraints on the lambda values, as discussed. It compiles on the 'paramwork' branch; I've tried to update it to master, but I don't have a dev environment on this machine. Let me know if there are issues.

jgwinnup added 3 commits May 29, 2015 08:29
Newer version of the tuner. It has settings that seem to improve convergence speed somewhat, and the user output is clearer.

Suggested use: for a tuning corpus of thousands of lines, first tune on just a dozen or two lines, then use the resulting weights to initialize "params" for tuning the full corpus. This will probably put you in the basin of convergence for Newton's method, so you won't waste time on many bad steps. Right now doing this requires recompiling with the different initialization, but I plan to change that.
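A minimal sketch of that warm-start workflow, with hypothetical names and a toy objective: the real tuner optimizes the perplexity of the combined LM with an actual Newton step, whereas here scipy's default quasi-Newton BFGS and a toy unigram log-likelihood stand in.

```python
# Hypothetical sketch of the two-stage warm start; neg_loglik, the toy LMs,
# and the counts are all made up for illustration.
import numpy as np
from scipy.optimize import minimize

lms = np.array([[0.7, 0.2, 0.1],   # toy unigram LM 1
                [0.4, 0.4, 0.2]])  # toy unigram LM 2

def neg_loglik(weights, counts):
    """Negative log-likelihood of token counts under the log-linear mixture."""
    log_unnorm = weights @ np.log(lms)  # sum_i lambda_i * log p_i(w)
    logp = log_unnorm - np.log(np.exp(log_unnorm).sum())
    return -(counts * logp).sum()

small_counts = np.array([12.0, 5.0, 3.0])       # from a dozen or two lines
full_counts = np.array([1200.0, 480.0, 320.0])  # from the full corpus

# Stage 1: cheap tuning on the small subset.
stage1 = minimize(neg_loglik, x0=np.zeros(2), args=(small_counts,))
# Stage 2: warm-start the full-corpus tuning from the stage-1 weights.
stage2 = minimize(neg_loglik, x0=stage1.x, args=(full_counts,))
print(stage2.x)
```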
@jgwinnup
Collaborator Author

jgwinnup commented Jun 1, 2015

Chase is integrating the code on master.
