-
-
Notifications
You must be signed in to change notification settings - Fork 24
Add support for 'meta' alignment operations (OP_PAD and OP_HARD_CLIP)
#64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
kescobo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for putting in the effort to add this - huge win!
I don't feel confident in the thoroughness of my review, but I don't see anything that looks obviously bad, and the tests still pass so I'm good with this. I would love for someone else that uses this functionality more regularly to take a look as well, though.
|
@jakobnissen, @CiaranOMara, could I get a review from one of you? |
Codecov ReportBase: 87.60% // Head: 88.20% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #64 +/- ##
==========================================
+ Coverage 87.60% 88.20% +0.59%
==========================================
Files 16 16
Lines 1138 1178 +40
==========================================
+ Hits 997 1039 +42
+ Misses 141 139 -2
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. β View full report at Codecov. |
CiaranOMara
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
I agree with @jakobnissen, we should postpone merging this until #44 and #72 are both in |
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
According to the SAM spec, "H can only be present as the first and/or last operation." In addition, it's not even classified as an insert operation anymore. Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
β¦tion Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
The test suite included logic for treating ends of alignment differently, but due to the loop condition, this logic was always false. Remove that logic entirely. Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
The `count_aligned` function was updated by @alyst to use the new `alnpos` field in cc8237c ("count_aligned(): optimize with alnpos"). That optimization assumes that all anchor types consume either sequence or reference positions as they consume alignment positions, which isn't true for meta-operations. Switch back to the old way, which does not make any assumptions about how many bases each operation consumes. Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
Co-authored-by: Kevin Bonham <[email protected]>
Signed-off-by: Thomas A. Christensen II <[email protected]>
20f1856 to
96f8ee7
Compare
|
Failed downstream is expected: this is a breaking change due to the inclusion of #44. |
Types of changes
This PR implements the following changes:
π Additional detail
Adds support for the
OP_PADoperation, and fixes the implementation of theOP_HARD_CLIPoperation. I have dubbed these two operations 'meta' operations, as they contain information about the origin and context of the alignment, but have no bearing on the content (i.e. reference or query) of the alignment.New features
ismetaop, which returnstrueif passedOP_PADorOP_HARD_CLIPOP_PADoperation no longer throws an errorOP_HARD_CLIPorOP_SOFT_CLIPin the wrong placeAccording to the SAM specification, these have to be present at the end of an alignment, with soft clips optionally buffering hard clips from the rest of the alignment
Changed functionality
cigar(::Alignment)now uses theAlignmentAnchor.alnposto print operation lengthOP_HARD_CLIPis no longer considered an insert operation, andisinsertop(OP_HARD_CLIP)now returnsfalseOP_PADindicates it's supported now(I know I said this above, but it's probably the most likely addition to break something, even though it does conform to spec)
Justification
BioAlignments should be able to handle all of the operations defined by the SAM specification. As it currently is, BioAlignments is unable even to parse the alignments in Section 1.1 "An example" of the specification. While the work on clips may seem out-of-scope for a change adding an operation support, the way clips work and pads work are very similar and needed to be considered together.
Example
Using query sequence "r002" from Section 1.1 of the SAM specification:
BioAlignments 2.0.0
This PR
My testing also indicates that this fixes #56,
BioAlignments 2.0.0
This PR
βοΈ Checklist
docs/src/.[UNRELEASED]section of the manually curatedCHANGELOG.mdfile for this repository.