Bug fix: corrected the logic in AlignmentCount.rx2idx(), which caused incorrect alignment counts in the PMI distance calculation.
Fixed several bugs and improved performance.
Bug fixes:
- alignment.cpp, fixed a bug that caused incorrect alignment results. All of the aligned sequences were accumulated into the last sequence.
- dist_wjd.cpp, fixed a bug that caused incorrect WJD distance. wjd_form would never return 0.0 in the multi-category case.
- dist_wjd.cpp, added checks for vector lengths of multi_form_weights and multi_form_cats.
- dist_wjd.cpp, added checks for multi_form_weights sum being greater than 0.
- cost_table.cpp, added checks for symbols existing in both row and column names of the cost matrix.

Performance improvements:
- alignment.cpp, rewrote the DFS algorithm to avoid inserting at the front of a vector, which is slow.
- dist_pmi.cpp, rewrote the alignment counting logic with a new structure, AlignmentCounter. Counts are now stored in a dense vector instead of a hash map, and a vector<string> is used for string-to-index lookup. This is faster when the number of unique symbols is small, which is usually the case.
- Added quiet parameters to all distance functions to control the printing of messages and progress bars.
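The dense-vector counting idea described above can be illustrated with a minimal sketch. This is not the package's AlignmentCounter: the class name `DensePairCounter`, its methods, and the choice to fix the alphabet at construction time are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not the package's AlignmentCounter.
// Pair counts live in a flat n*n vector; symbols are mapped to
// indices by a linear search over a vector<string>.
class DensePairCounter {
public:
    explicit DensePairCounter(std::vector<std::string> symbols)
        : symbols_(std::move(symbols)),
          counts_(symbols_.size() * symbols_.size(), 0) {}

    // Linear search: cheap when the number of unique symbols is small.
    std::size_t index_of(const std::string& symbol) const {
        auto it = std::find(symbols_.begin(), symbols_.end(), symbol);
        if (it == symbols_.end()) {
            throw std::out_of_range("unknown symbol: " + symbol);
        }
        return static_cast<std::size_t>(it - symbols_.begin());
    }

    // Record one aligned pair (a, b) in row-major order.
    void add(const std::string& a, const std::string& b) {
        counts_[index_of(a) * symbols_.size() + index_of(b)] += 1;
    }

    long count(const std::string& a, const std::string& b) const {
        return counts_[index_of(a) * symbols_.size() + index_of(b)];
    }

private:
    std::vector<std::string> symbols_;  // index -> symbol
    std::vector<long> counts_;          // flat n*n count matrix
};
```

With a small alphabet, the linear search and the contiguous count matrix avoid the hashing and pointer chasing of a hash map, which is the trade-off the entry above describes.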
Several improvements have been made:
- long2squareform now fills the diagonal with 0 by default; the value can be set with the default_diag parameter.
- CostTable has been restructured to support both a normal and a fast implementation. In the fast case, get_cost does not look up the cost table but returns values directly, which speeds up classic edit distance computation.
- pw_edit_dist now:
  - check_missing_cost = FALSE.
  - default_sub_cost and default_ins_del_cost parameters.
- pw_pmi_dist.
- Added WJD distance supporting hierarchical categorical data with multiple forms.
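The normal/fast split described for CostTable can be sketched roughly as follows. The class name `CostTableSketch`, the gap symbol `"-"`, the tab-joined map key, and both constructors are assumptions for illustration; the package's actual CostTable may be organized differently.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Assumed gap symbol for insertions and deletions (hypothetical).
const char* const kGapSymbol = "-";

// Hypothetical sketch of a cost table with a normal (lookup) mode
// and a fast (constant-cost) mode.
class CostTableSketch {
public:
    // Fast mode: classic edit distance with fixed costs, no table.
    CostTableSketch(double sub_cost, double ins_del_cost)
        : fast_(true), sub_cost_(sub_cost), ins_del_cost_(ins_del_cost) {}

    // Normal mode: costs come from a user-supplied table keyed by
    // make_key(row_symbol, col_symbol).
    explicit CostTableSketch(std::unordered_map<std::string, double> costs)
        : fast_(false), sub_cost_(0.0), ins_del_cost_(0.0),
          costs_(std::move(costs)) {}

    static std::string make_key(const std::string& a, const std::string& b) {
        return a + '\t' + b;
    }

    double get_cost(const std::string& a, const std::string& b) const {
        if (fast_) {
            // No lookup: return the cost directly.
            if (a == b) return 0.0;
            return (a == kGapSymbol || b == kGapSymbol) ? ins_del_cost_
                                                        : sub_cost_;
        }
        auto it = costs_.find(make_key(a, b));
        if (it == costs_.end()) {
            throw std::out_of_range("missing cost for pair: " + a + ", " + b);
        }
        return it->second;
    }

private:
    bool fast_;
    double sub_cost_;
    double ins_del_cost_;
    std::unordered_map<std::string, double> costs_;
};
```

In the fast mode each call costs only a couple of string comparisons, whereas the normal mode pays for building a key and hashing it on every call.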