This object looks for the most likeley partitions or segments of 2D data. More...
#include <LSDMostLikelyPartitionsFinder.hpp>
Public Member Functions | |
LSDMostLikelyPartitionsFinder (int this_min_seg_length, vector< float > this_x_data, vector< float > this_y_data) | |
Create LSDMostLikelyPartitionsFinder object. More... | |
int | get_n_nodes () |
vector< float > | get_MLE_of_segments () |
vector< float > | get_x_data () |
vector< float > | get_y_data () |
void | reset_derived_data_members () |
Resets all the derived data members. More... | |
void | thin_data_target_dx_preserve_data (float dx) |
Collects data as close as possible to some target dx, but does not modify invidivual data points. More... | |
void | thin_data_target_dx_preserve_data (float dx, vector< int > &node_ref) |
Collects data as close as possible to some target dx, but does not modify invidivual data points. More... | |
LSDMostLikelyPartitionsFinder | spawn_thinned_data_target_dx_preserve_data (float dx) |
Collects data as close as possible to some target dx, but does not modify invidivual data points. More... | |
void | thin_data_skip (int N, vector< int > &node_ref) |
This thins the data by skipping elements. More... | |
void | thin_data_target_dx_linear_interpolation (float dx) |
Recasts data at fixed values of dx, using linear interpoloation. More... | |
LSDMostLikelyPartitionsFinder | spawn_thinned_data_target_dx_linear_interpolation (float dx) |
Recasts data at fixed values of dx, using linear interpoloation. More... | |
void | thin_data_monte_carlo_skip (int Mean_skip, int skip_range, vector< int > &node_ref) |
Skips nodes but using a Monte Carlo scheme that samples random points along the channel profile. More... | |
void | thin_data_monte_carlo_dchi (float mean_dchi, float variation_dchi, vector< int > &node_ref) |
Thins object based on a monte carlo approach using a mean, max and minimum dchi. More... | |
void | print_x_y_data_to_screen () |
Function for looking at the x and y data. More... | |
void | best_fit_driver_AIC_for_linear_segments (vector< float > sigma_values) |
Driver function to get best fit segments. More... | |
void | best_fit_driver_AIC_for_linear_segments (float sigma_values) |
Driver function to get best fit segments. More... | |
void | get_data_from_best_fit_lines (int node, vector< float > sigma_values, vector< float > &b_values, vector< float > &m_values, vector< float > &r2_values, vector< float > &DW_values, vector< float > &fitted_y, vector< int > &seg_lengths, float &this_MLE, int &this_n_segments, int &this_n_nodes, float &this_AIC, float &this_AICc) |
Function returns data for a given sigma value. More... | |
void | get_start_and_end_x_for_segments (vector< float > &start_x, vector< float > &end_x, vector< int > seg_lengths) |
Replaces two vectors, which have the starting and ending position of the best fit segments for a given sigma. More... | |
void | calculate_segment_matrices (float sigma) |
This function is used to calculate the slope, intercept, and likelihood of all possible linear segments along a series of data points. More... | |
void | populate_segment_matrix (int start_node, int end_node, float no_data_value, float sigma) |
This function popultes the matrices of liklihood, m and b values. More... | |
void | find_max_like_of_segments () |
Function calculates the most likeley combination of segments given the liklihood of individual segments calcualted by the calculate_segment_matrices function. More... | |
void | partition_driver_to_vecvecvec (int k) |
This function drives the partitioning algorithms. More... | |
void | partitions_with_minimum_length (int n, int k, int t, vector< int > &p) |
An integer partition algorithm. More... | |
void | partition_assign (int t, vector< int > &p) |
Function for use with the permutations this assigns values into the vecvecvec that contains all the partitioning information. More... | |
int | LSDpartitions_min (int x, int y) |
Function for use with the permutations gets the mininum of two values. More... | |
Array2D< float > | normalize_like_matrix_to_sigma_one (float sigma) |
This takes a likelihood array that has been calcualted with a given sigma value and normalizes the sigma values as though sigma was equal to 1. More... | |
vector< float > | normalize_like_vector_to_sigma_one (float sigma, vector< float > like_vector) |
Normalizes but with vector data, for use with MLE vector for segments. More... | |
void | change_normalized_like_matrix_to_new_sigma (float sigma, Array2D< float > &sig1_like_array) |
Takes a normalized likelihood array and updates the values to a new sigma value. More... | |
vector< float > | change_normalized_like_vector_to_new_sigma (float sigma, vector< float > sig1_like_vector) |
Takes a normalized likelihood vector and updates the values to a new sigma value. More... | |
vector< float > | transform_like_from_sigma1_to_sigma2 (float sigma1, vector< float > sig1_like_vector, float sigma2) |
Takes a normalized likelihood vector and updates the values to a new sigma value. More... | |
void | get_n_segments_for_various_sigma (vector< float > sigma_values) |
Function takes the normalized MLE values (normalized with sigma = 1) and returns the best fit number of segments from both the AIC and the AICc measures. More... | |
void | calculate_AIC_of_segments_with_variable_sigma (float sigma, vector< float > &AIC_of_segments, vector< float > &AICc_of_segments) |
Function calculates AIC and AICc of segments taking the maximum_MLE based on a sigma of one. More... | |
void | get_properties_of_best_fit_segments (int bestfit_segments_node, vector< float > &m_values, vector< float > &b_values, vector< float > &r2_values, vector< float > &DW_values) |
Function extracts the m, b, r^2 and DW statistic of the most likeley segments. More... | |
void | print_to_screen_most_likeley_segment_lengths () |
Function prints the most likeley segment lengths to screen. More... | |
void | print_AIC_and_AICc_to_screen (vector< float > sigma_values) |
Function prints AIC and AICc values to screen. More... | |
Protected Attributes | |
int | minimum_segment_length |
Minimum segment length. | |
vector< float > | x_data |
X data. | |
vector< float > | y_data |
Y data. | |
float | base_sigma |
The base sigma value from which the MLE of the segments is calcluated. | |
Array2D< float > | like_array |
Liklihood array. Indexed so the row is the starting node and the column is the ending node. | |
Array2D< float > | m_array |
Slope array. Indexed so the row is the starting node and the column is the ending node. | |
Array2D< float > | b_array |
Intercept array. Indexed so the row is the starting node and the column is the ending node. | |
Array2D< float > | rsquared_array |
R^2 array. Indexed so the row is the starting node and the column is the ending node. | |
Array2D< float > | DW_array |
Array of Durbin-Watson statistics to test if the residuals are autocorrelated. More... | |
vector< float > | MLE_of_segments |
Maximum likelihood of the different number of segments. | |
vector< vector< int > > | segments_for_each_n_segments |
Each element of this vector contains the most likeley segments for that number of segments. More... | |
vector< vector< vector< int > > > | partitions |
This is vecvecvec. More... | |
vector< int > | best_fit_AIC |
Vector of best fit AIC values. | |
vector< int > | best_fit_AICc |
Vector of best fit AICc values. | |
vector< vector< float > > | AIC_for_each_n_segments |
Vector of vectors of AIC values. | |
vector< vector< float > > | AICc_for_each_n_segments |
Vector of vectors of AICc values. | |
This object looks for the most likeley partitions or segments of 2D data.
|
inline |
Create LSDMostLikelyPartitionsFinder object.
this_min_seg_length | int Minimum segement length required. |
this_x_data | vector<float> X data. |
this_y_data | vector<float> Y data. |
void LSDMostLikelyPartitionsFinder::best_fit_driver_AIC_for_linear_segments | ( | vector< float > | sigma_values | ) |
Driver function to get best fit segments.
sigma_values | vector<float> a vector containing sigma values for each node |
void LSDMostLikelyPartitionsFinder::best_fit_driver_AIC_for_linear_segments | ( | float | sigma_values | ) |
Driver function to get best fit segments.
sigma_values |
void LSDMostLikelyPartitionsFinder::calculate_AIC_of_segments_with_variable_sigma | ( | float | sigma, |
vector< float > & | AIC_of_segments, | ||
vector< float > & | AICc_of_segments | ||
) |
Function calculates AIC and AICc of segments taking the maximum_MLE based on a sigma of one.
sigma | Standard deviation of error. |
AIC_of_segments | |
AICc_of_segments |
void LSDMostLikelyPartitionsFinder::calculate_segment_matrices | ( | float | sigma | ) |
This function is used to calculate the slope, intercept, and likelihood of all possible linear segments along a series of data points.
The function requires the data in x and y vectors, the maximum segment length and sigma, the standard deviation of the measured data. This will be approxamately the error in the surface elevation, although it might have to be increased simply because likelihood will tend to zero if this is too small. sigma should also be considered to contain the 'noise' inherent in channel incision so perhaps 1-5 metres is appropriate the maximum segment length is an integer: it is the number of data points used. these data points from raw chi data are irregularly spaced so two segments of the same 'length' can have different lengths in chi space. One remedey for this is a preprocessor that places the zeta vs chi data along evenly spaced points.
The routine generates three matrices. The row of the matrix is the starting node of the segment. The column of the matrix is the ending node of the segment. Thus the routine will generate a matrix that is dimension n x n where n is the number of data points.
One potential future development is to implement this using a sparse matrix from the boost mtl library to reduce the memory usage.
sigma | Standard deviation of error. |
void LSDMostLikelyPartitionsFinder::change_normalized_like_matrix_to_new_sigma | ( | float | sigma, |
Array2D< float > & | sig1_like_array | ||
) |
Takes a normalized likelihood array and updates the values to a new sigma value.
sigma | Standard deviation of error. |
sig1_like_array |
vector< float > LSDMostLikelyPartitionsFinder::change_normalized_like_vector_to_new_sigma | ( | float | sigma, |
vector< float > | sig1_like_vector | ||
) |
Takes a normalized likelihood vector and updates the values to a new sigma value.
sigma | Standard deviation of error. |
sig1_like_vector |
void LSDMostLikelyPartitionsFinder::find_max_like_of_segments | ( | ) |
Function calculates the most likeley combination of segments given the liklihood of individual segments calcualted by the calculate_segment_matrices function.
void LSDMostLikelyPartitionsFinder::get_data_from_best_fit_lines | ( | int | node, |
vector< float > | sigma_values, | ||
vector< float > & | b_values, | ||
vector< float > & | m_values, | ||
vector< float > & | r2_values, | ||
vector< float > & | DW_values, | ||
vector< float > & | fitted_y, | ||
vector< int > & | seg_lengths, | ||
float & | this_MLE, | ||
int & | this_n_segments, | ||
int & | this_n_nodes, | ||
float & | this_AIC, | ||
float & | this_AICc | ||
) |
Function returns data for a given sigma value.
this_MLE, this_n_segments and this_n_nodes are all returned so the user can combine two or more segments and get an AIC or AICc.
node | index into the sigma vector. This requires a bit of explantion: because sigma can be extracted from the product that calculates MLE as a constant we can have multiple values of sigma and recalculate MLE for any given sigma When this code was written we added a function to modify the sigma values. In this case there are no normalisation values applied so if you use a random sigma here you will get the base sigma (supplied to the main computation function) |
sigma_values | |
b_values | A vector containing the b value (intercept) of each segment |
m_values | A vector containing the m value (slope) of each segment |
r2_values | A vector containing the r^2 value of each segment |
DW_values | A vector containing the durbin-watson value of each segment |
fitted_y | A vector containing the y value of each segment |
seg_lengths | A vector with the lengths of each segment |
this_MLE | The MLE of the best fit. |
this_n_segments | The number of segments |
this_n_nodes | The number of total nodes |
this_AIC | The Aikake information criterion |
this_AICc | The corrected Aikake information criterion (needed for finite sample size) |
|
inline |
|
inline |
void LSDMostLikelyPartitionsFinder::get_n_segments_for_various_sigma | ( | vector< float > | sigma_values | ) |
Function takes the normalized MLE values (normalized with sigma = 1) and returns the best fit number of segments from both the AIC and the AICc measures.
It also returns two vector of vectors which are the AIC values for the varius values of sigma passed to the function in the sigma values vector.
sigma_values | vector of sigma values. |
void LSDMostLikelyPartitionsFinder::get_properties_of_best_fit_segments | ( | int | bestfit_segments_node, |
vector< float > & | m_values, | ||
vector< float > & | b_values, | ||
vector< float > & | r2_values, | ||
vector< float > & | DW_values | ||
) |
Function extracts the m, b, r^2 and DW statistic of the most likeley segments.
bestfit_segments_node | |
m_values | |
b_values | |
r2_values | |
DW_values |
void LSDMostLikelyPartitionsFinder::get_start_and_end_x_for_segments | ( | vector< float > & | start_x, |
vector< float > & | end_x, | ||
vector< int > | seg_lengths | ||
) |
Replaces two vectors, which have the starting and ending position of the best fit segments for a given sigma.
This gets data from (most likeley) the get_data_from_best_fit_lines function.
start_x | |
end_x | |
seg_lengths |
|
inline |
|
inline |
int LSDMostLikelyPartitionsFinder::LSDpartitions_min | ( | int | x, |
int | y | ||
) |
Function for use with the permutations gets the mininum of two values.
x | |
y |
Array2D< float > LSDMostLikelyPartitionsFinder::normalize_like_matrix_to_sigma_one | ( | float | sigma | ) |
This takes a likelihood array that has been calcualted with a given sigma value and normalizes the sigma values as though sigma was equal to 1.
sigma | Standard deviation of error. |
vector< float > LSDMostLikelyPartitionsFinder::normalize_like_vector_to_sigma_one | ( | float | sigma, |
vector< float > | like_vector | ||
) |
Normalizes but with vector data, for use with MLE vector for segments.
sigma | Standard deviation of error. |
like_vector | Likelihood vector. |
void LSDMostLikelyPartitionsFinder::partition_assign | ( | int | t, |
vector< int > & | p | ||
) |
Function for use with the permutations this assigns values into the vecvecvec that contains all the partitioning information.
t | |
p |
void LSDMostLikelyPartitionsFinder::partition_driver_to_vecvecvec | ( | int | k | ) |
This function drives the partitioning algorithms.
k | Number of elements in the partition. |
void LSDMostLikelyPartitionsFinder::partitions_with_minimum_length | ( | int | n, |
int | k, | ||
int | t, | ||
vector< int > & | p | ||
) |
An integer partition algorithm.
Algorithm and original Pascal implementation: Frank Ruskey, 1995.
Translation to C: Joe Sawada, 1997
grabbed from http://theory.cs.uvic.ca/inf/nump/NumPartition.html
adapted smm 21/12/2012
algorith described in
http://mathworld.wolfram.com/PartitionFunctionP.html
and
Skiena, S. Implementing Discrete Mathematics: Combinatorics and Graph Theory with Mathematica. Reading, MA: Addison-Wesley, 1990.
This is a further adaptation that only presents solution to the partition with segments of a minimum length it stores all the partitions.
http://www.codeguru.com/cpp/cpp/algorithms/article.php/c5123/Permutations-in-C.htm
http://www.cplusplus.com/reference/algorithm/next_permutation/
http://mdm4u1.wetpaint.com/page/4.3+Permutations+with+Some+Identical+Elements
n | |
k | |
t | |
p |
void LSDMostLikelyPartitionsFinder::populate_segment_matrix | ( | int | start_node, |
int | end_node, | ||
float | no_data_value, | ||
float | sigma | ||
) |
This function popultes the matrices of liklihood, m and b values.
It is a recursive algorithm so in fact it doesn't just get one row but drills down through all the possible starting nodes to complete the matrix.
start_node | |
end_node | |
no_data_value | No data value |
sigma | Standard deviation of error. |
void LSDMostLikelyPartitionsFinder::print_AIC_and_AICc_to_screen | ( | vector< float > | sigma_values | ) |
Function prints AIC and AICc values to screen.
void LSDMostLikelyPartitionsFinder::print_to_screen_most_likeley_segment_lengths | ( | ) |
Function prints the most likeley segment lengths to screen.
void LSDMostLikelyPartitionsFinder::print_x_y_data_to_screen | ( | ) |
Function for looking at the x and y data.
void LSDMostLikelyPartitionsFinder::reset_derived_data_members | ( | ) |
Resets all the derived data members.
Derived data members are AIC, AICc, fit statistics, liklihoods (in the liklihood matrix, etc.) Basically everything except for the x and y data.
LSDMostLikelyPartitionsFinder LSDMostLikelyPartitionsFinder::spawn_thinned_data_target_dx_linear_interpolation | ( | float | dx | ) |
Recasts data at fixed values of dx, using linear interpoloation.
dx | Target dx for thinning. |
LSDMostLikelyPartitionsFinder LSDMostLikelyPartitionsFinder::spawn_thinned_data_target_dx_preserve_data | ( | float | dx | ) |
Collects data as close as possible to some target dx, but does not modify invidivual data points.
dx | Target dx for thinning. |
void LSDMostLikelyPartitionsFinder::thin_data_monte_carlo_dchi | ( | float | mean_dchi, |
float | variation_dchi, | ||
vector< int > & | node_ref | ||
) |
Thins object based on a monte carlo approach using a mean, max and minimum dchi.
mean_dchi | |
variation_dchi | |
node_ref | An index vector of the data points that were selected. |
void LSDMostLikelyPartitionsFinder::thin_data_monte_carlo_skip | ( | int | Mean_skip, |
int | skip_range, | ||
vector< int > & | node_ref | ||
) |
Skips nodes but using a Monte Carlo scheme that samples random points along the channel profile.
Mean_skip | |
skip_range | |
node_ref | An index vector of the data points that were selected. |
void LSDMostLikelyPartitionsFinder::thin_data_skip | ( | int | N, |
vector< int > & | node_ref | ||
) |
This thins the data by skipping elements.
A positive number N means it skips N elements after each element.
A negative number -N means that after N elements it skips one element.
N | skip value. |
node_ref | An index vector of the data points that were selected. |
void LSDMostLikelyPartitionsFinder::thin_data_target_dx_linear_interpolation | ( | float | dx | ) |
Recasts data at fixed values of dx, using linear interpoloation.
dx | Target dx for thinning. |
void LSDMostLikelyPartitionsFinder::thin_data_target_dx_preserve_data | ( | float | dx | ) |
Collects data as close as possible to some target dx, but does not modify invidivual data points.
dx | Target dx for thinning. |
void LSDMostLikelyPartitionsFinder::thin_data_target_dx_preserve_data | ( | float | dx, |
vector< int > & | node_ref | ||
) |
Collects data as close as possible to some target dx, but does not modify invidivual data points.
dx | Target dx for thinning. |
node_ref | An index vector of the data points that were selected. |
vector< float > LSDMostLikelyPartitionsFinder::transform_like_from_sigma1_to_sigma2 | ( | float | sigma1, |
vector< float > | sig1_like_vector, | ||
float | sigma2 | ||
) |
Takes a normalized likelihood vector and updates the values to a new sigma value.
sigma1 | Standard deviation of error. |
sig1_like_vector | |
sigma2 | Standard deviation of error. |
|
protected |
Array of Durbin-Watson statistics to test if the residuals are autocorrelated.
Used to determine if the segment is truly linear. Values less than 1 indicate that the segment is probably not linear values < 1.5 should arouse suspicion. Indexed so the row is the starting node and the column is the ending node.
|
protected |
This is vecvecvec.
Top index references the number of segments. Second layer loops through the possible partitions of that number of segments. Third layer is the individual segment lengths.
|
protected |
Each element of this vector contains the most likeley segments for that number of segments.
So for example segments_for_each_n_segments[3] is a vector containing the lengths of the most likeley segments for 4 segments (note 0 indexing).