zhengqunkoo/smith-waterman

smith-waterman

0 subscribers

X10

Align sequences of data, with sequential, vectorized, and parallel implementations

Created Mar 2019

zhengqunkoo/smith-waterman

Live activities

pushed

Expanded documentation for project setup and execution

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

Updated the README to provide more explicit instructions on setting up the X10 environment and running the Smith-Waterman implementation. By clarifying the build process and including specific command examples for various test cases, it's now easier to get the project running. This should reduce friction for anyone setting up the repository for the first time.

merged

Integrated vectorized and striped Smith-Waterman parallel implementations

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

This merge introduces high-performance vectorized and striped variations of the Smith-Waterman algorithm for sequence alignment. The update includes significant optimizations, such as replacing cell-based matrix structures with raw memory representations to reduce overhead, alongside improvements to parallel execution patterns. These additions provide researchers with multiple strategies for exploring hardware acceleration and cache-efficient sequence analysis. High Variance In Manual Vectorization Performance

pushed

Updated performance benchmark results for parvect

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

Updated the expected output and timing benchmarks for parvect across test suites including 'gap', 'men', and 'subst'. These changes reflect adjustments in the algorithm's output alignment and performance metrics. These updates ensure the regression test suite remains synchronized with the latest implementation.

pushed

Removed extra benchmark result files to keep the results set consistent and easier to interpret.

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

This change deletes duplicate or unnecessary files across several results/* benchmark directories, including gap, men, and subst variants in sequential, parallel, and vectorized runs. There’s no code change here, but it does clean up the recorded outputs so future comparisons and analysis are less noisy and less error-prone. The practical effect is a tidier benchmark corpus with fewer confusing extra entries.

pushed

Reinstated pure Long matrix optimization for the H-matrix to save memory

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

This update restores a previous memory optimization that replaces the Cell object matrix with a pure Long matrix for the H-matrix. By avoiding object allocation for each matrix cell, this significantly reduces the overall memory footprint and avoids object overhead. Ultimately, this change improves runtime performance and cache locality for the Smith-Waterman sequence alignment implementation.

pushed

Fixed array index and scoring logic to ignore padded illegal characters in Smith-Waterman vector calculations

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

When calculating score profiles for the Smith-Waterman sequence alignment, the program previously attempted to look up the alphabet index even for padded illegal characters, potentially leading to incorrect scores or errors. This update introduces a check to ignore ILLEGAL_CHAR entries during generated vector evaluations and removes an unnecessary increment in the vector count initialization. This ensures accurate sequence scoring and avoids out-of-bounds operations on padded trailing fragments.

pushed

Fixed padding function in vector-based Smith-Waterman alignment

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

The sequence padding logic has been updated for when the input sequences don't perfectly align with the vector size. The a-sequence is now properly padded with a designated illegal character, and the substitution matrix S is carefully resized to accommodate this padding. We also ensure that padding characters correctly receive a large negative score so they don't skew or artificially inflate the alignment results.

pushed

Fixed OR vector operations in Smith-Waterman parallel computation

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

This commit addresses a bug in the OR vector operation logic where elements could be incorrectly overwritten by zeros during vector shifts. By adding conditional checks before assigning shifted values, it ensures that an element ORed with a zero-element correctly retains its original value. This ensures accurate alignment score and gap penalty calculations in the parallelized algorithm.

pushed

Revert memory optimization in Smith-Waterman H-matrix

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

This update reverts a previous memory optimization that converted the H-matrix from using Cell objects to primitive Long values. It appears that stripping away the Cell object also removed necessary tracking metadata (like cell coordinates), causing issues in the alignment logic. The matrix is once again leveraging Cells, trading that intended memory efficiency back for correctness.

It's Rewind time

pushed

Reinstated the H-matrix cell indexing change in SmithWatermanParVect by reverting an earlier revert, restoring the intended coordinate handling in the vectorized Smith-Waterman path.

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

This commit effectively reapplies a previously backed-out change in the parallel vectorized Smith-Waterman implementation. Most of the diff is comment cleanup, but the meaningful part restores the max-score cell update from zero-based to one-based coordinates when constructing maxH, which affects how the best alignment position is reported. For anyone consuming alignment results from parVect, this should bring the reported H-matrix location back in line with the intended semantics.

pushed

Fixed a scoring correctness issue in the vectorized Smith-Waterman implementation

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

A missing shift operation was causing incorrect score calculations during sequence alignment. The logic for calculating the initial F-vector has been corrected by applying the necessary left shift before subtracting the gap penalty. This ensures accurate alignment scoring in the parallel vectorized version of the algorithm.

pushed

Optimize H-matrix representation in Smith-Waterman implementation to reduce memory overhead.

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

The H-matrix type in the parallel vectorized Smith-Waterman aligner was updated from an array of Cell objects to an array of raw Long values. By avoiding the instantiation of a wrapper object for every coordinate pair, this significantly cuts down the number of allocations required during execution. Ultimately, this refactor makes memory usage far more efficient when performing sequence alignments on a large scale.

pushed

Fix off-by-one coordinate error in parallel Smith-Waterman scoring

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

This update resolves an off-by-one indexing error when tracking the maximum score cell in the H-matrix for the parallel vector implementation of Smith-Waterman. It also comprehensively refactors the inline documentation describing vector operations and gap penalty logic during matrix generation. Correcting these coordinates ensures the expected origin point is captured for the alignment's backtrace phase.

Mind the off-by-ones

pushed

Clarified the Smith-Waterman vectorized alignment implementation by replacing placeholder comments with algorithm-specific documentation.

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

This change updates the inline comments in SmithWatermanParVect.x10 to better describe the actual roles of the vector registers and arrays used in the Rognes-style Smith-Waterman inner loop. The code path itself does not change, but the documentation now explains concepts like H/E/F vector updates, gap penalty handling, and how state is carried between iterations, which makes the SIMD-style logic much easier to follow and maintain. Practical effect: no runtime behavior change, but future debugging and algorithm work should be less error-prone.

pushed

Added explanatory comments to the parallel Smith-Waterman implementations to make the anti-diagonal and vectorized algorithms easier to follow.

zhengqunkoo/smith-waterman • about 7 years • view on GitHub

This change is documentation-focused: it annotates the parallel and vectorized X10 implementations with clearer comments around matrix initialization, anti-diagonal scheduling, vector register flow, gap-penalty handling, and backtracking-related state. The underlying algorithms are largely unchanged, but the code is now much easier to read and maintain, especially in the more complex SIMD-style path where temporary vectors and shifts were previously hard to reason about. Practical effect: future contributors should have a much easier time understanding and modifying the parallel alignment code.

- End of feed -