HiPipe
0.7.0
C++17 data pipeline with Python bindings.
Functions
template<typename Prng = std::mt19937&>
std::vector<std::size_t> hipipe::generate_groups(std::size_t size, std::vector<double> ratio, Prng&& gen = utility::random_generator)
Randomly group data into multiple clusters with a given ratio.
template<typename Prng = std::mt19937&>
std::vector<std::vector<std::size_t>> hipipe::generate_groups(std::size_t n, std::size_t size, const std::vector<double>& volatile_ratio, const std::vector<double>& fixed_ratio, Prng&& gen = utility::random_generator)
Randomly group data into multiple clusters with a given ratio.
template<typename Prng = std::mt19937&>
std::vector<std::vector<std::size_t>> hipipe::generate_groups(
    std::size_t n,
    std::size_t size,
    const std::vector<double>& volatile_ratio,
    const std::vector<double>& fixed_ratio,
    Prng&& gen = utility::random_generator)
Randomly group data into multiple clusters with a given ratio.
In this overload, multiple clusterings of the given size are generated. Some of the elements are supposed to be fixed, i.e., to have the same group assigned in all the clusterings. The rest of the data are volatile and their group may differ between the clusterings.
This function is convenient if, for example, you want to split the data into train/valid/test groups multiple times (e.g., for ensemble training or cross-validation) while keeping the same test group in all the splits.
Example:
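The following is a minimal standard-library sketch of the behavior described above, not HiPipe's actual implementation. The names `sketch_base_grouping` and `sketch_generate_groupings` are hypothetical. It assumes that volatile groups are numbered before fixed groups, that group sizes are rounded down, and that the last group with a non-zero ratio takes the remainder; the library may differ on any of these points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: one random grouping with rounded-down group sizes.
// The last group with a non-zero ratio takes the remaining elements.
template <typename Prng>
std::vector<std::size_t> sketch_base_grouping(std::size_t size,
                                              const std::vector<double>& ratio,
                                              Prng& gen)
{
    const double total = std::accumulate(ratio.begin(), ratio.end(), 0.0);
    assert(total > 0.0);
    std::size_t last = 0;
    for (std::size_t i = 0; i < ratio.size(); ++i) {
        if (ratio[i] > 0.0) last = i;
    }
    std::vector<std::size_t> groups;
    groups.reserve(size);
    for (std::size_t i = 0; i < last; ++i) {
        const auto share = static_cast<std::size_t>(
            ratio[i] * static_cast<double>(size) / total);
        groups.insert(groups.end(), std::min(share, size - groups.size()), i);
    }
    groups.insert(groups.end(), size - groups.size(), last);
    std::shuffle(groups.begin(), groups.end(), gen);
    return groups;
}

// Hypothetical stand-in for the multi-grouping overload.
template <typename Prng>
std::vector<std::vector<std::size_t>> sketch_generate_groupings(
    std::size_t n, std::size_t size,
    const std::vector<double>& volatile_ratio,
    const std::vector<double>& fixed_ratio,
    Prng&& gen)
{
    // Assumption: volatile groups get indices 0..V-1, fixed groups V..V+F-1.
    std::vector<double> all_ratio(volatile_ratio);
    all_ratio.insert(all_ratio.end(), fixed_ratio.begin(), fixed_ratio.end());

    // One base grouping decides which elements are fixed and their labels.
    const std::vector<std::size_t> base =
        sketch_base_grouping(size, all_ratio, gen);

    // Remember the positions and labels of the volatile elements.
    std::vector<std::size_t> positions, labels;
    for (std::size_t i = 0; i < base.size(); ++i) {
        if (base[i] < volatile_ratio.size()) {
            positions.push_back(i);
            labels.push_back(base[i]);
        }
    }

    // Each grouping keeps the fixed labels and reshuffles the volatile ones.
    std::vector<std::vector<std::size_t>> result(n, base);
    for (auto& grouping : result) {
        std::shuffle(labels.begin(), labels.end(), gen);
        for (std::size_t k = 0; k < positions.size(); ++k) {
            grouping[positions[k]] = labels[k];
        }
    }
    return result;
}
```

Under these assumptions, calling the sketch with n = 3, size = 10, volatile_ratio = {2, 1}, and fixed_ratio = {2} gives three groupings, each with four elements in group 0, two in group 1, and four in group 2. The elements labeled 2 occupy the same positions in all three groupings.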
Parameters:
n: The number of different groupings.
size: The size of the data, i.e., the number of elements.
volatile_ratio: The size ratios of the volatile groups (i.e., groups whose members may change between groupings).
fixed_ratio: The size ratios of the fixed groups (i.e., groups that are assigned identically in all groupings).
gen: The random generator to be used.
Definition at line 111 of file groups.hpp.
template<typename Prng = std::mt19937&>
std::vector<std::size_t> hipipe::generate_groups(
    std::size_t size,
    std::vector<double> ratio,
    Prng&& gen = utility::random_generator)
Randomly group data into multiple clusters with a given ratio.
Example:
If the ratios do not exactly split the requested number of elements, the last group with non-zero ratio gets all the remaining elements.
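The remainder rule above can be illustrated with a small standard-library sketch. This is not HiPipe's implementation: `sketch_generate_groups` is a hypothetical name, and the sketch assumes that each group's proportional share is rounded down before the last group with a non-zero ratio absorbs the leftover elements. The library's exact rounding may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for hipipe::generate_groups(size, ratio, gen).
// Assumption: group sizes are rounded down; the real rounding may differ.
template <typename Prng>
std::vector<std::size_t> sketch_generate_groups(std::size_t size,
                                                std::vector<double> ratio,
                                                Prng&& gen)
{
    // Documented preconditions: non-negative ratios with a positive sum.
    assert(std::all_of(ratio.begin(), ratio.end(),
                       [](double r) { return r >= 0.0; }));
    const double total = std::accumulate(ratio.begin(), ratio.end(), 0.0);
    assert(total > 0.0);

    // Find the last group with a non-zero ratio; it absorbs the remainder.
    std::size_t last = 0;
    for (std::size_t i = 0; i < ratio.size(); ++i) {
        if (ratio[i] > 0.0) last = i;
    }

    std::vector<std::size_t> groups;
    groups.reserve(size);
    for (std::size_t i = 0; i < last; ++i) {
        // Proportional share of group i, rounded down and clamped to what is left.
        const auto share = static_cast<std::size_t>(
            ratio[i] * static_cast<double>(size) / total);
        groups.insert(groups.end(), std::min(share, size - groups.size()), i);
    }
    // Everything not yet assigned goes to the last non-zero group.
    groups.insert(groups.end(), size - groups.size(), last);

    // Randomize which element receives which group label.
    std::shuffle(groups.begin(), groups.end(), gen);
    return groups;
}
```

Under these assumptions, a call with size 10 and ratio {2, 1, 1} labels five elements with group 0 and two with group 1. The exact share of the last group would be 2.5, so it receives the three remaining elements instead. The order of the labels depends on the generator.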
Parameters:
size: The size of the data, i.e., the number of elements.
ratio: Cluster size ratios. The ratios have to be non-negative and the sum of ratios has to be positive.
gen: The random generator to be used.
Definition at line 53 of file groups.hpp.