HiPipe  0.7.0
C++17 data pipeline with Python bindings.
Data splitting.

Functions

template<typename Prng = std::mt19937&>
std::vector< std::size_t > hipipe::generate_groups (std::size_t size, std::vector< double > ratio, Prng &&gen=utility::random_generator)
 Randomly group data into multiple clusters with a given ratio.
 
template<typename Prng = std::mt19937&>
std::vector< std::vector< std::size_t > > hipipe::generate_groups (std::size_t n, std::size_t size, const std::vector< double > &volatile_ratio, const std::vector< double > &fixed_ratio, Prng &&gen=utility::random_generator)
 Randomly group data into multiple clusters with a given ratio.
 

Detailed Description

Function Documentation

generate_groups() [1/2]

template<typename Prng = std::mt19937&>
std::vector<std::vector<std::size_t>> hipipe::generate_groups(
    std::size_t n,
    std::size_t size,
    const std::vector<double>& volatile_ratio,
    const std::vector<double>& fixed_ratio,
    Prng&& gen = utility::random_generator)

Randomly group data into multiple clusters with a given ratio.

In this overload, n clusterings of the given size are generated. Some elements are fixed, i.e., they are assigned to the same group in every clustering. The remaining elements are volatile, and their group may differ between clusterings.

This function is convenient when, for example, you want to split the data into train/valid/test groups multiple times (e.g., for ensemble training or cross-validation) while keeping the same test group in all the splits.

Example:

generate_groups(3, 5, {2, 1}, {2});
// == e.g. {{0, 2, 1, 2, 0},
//          {1, 2, 0, 2, 0},
//          {0, 2, 0, 2, 1}}
// note that group 2 is assigned to the same elements in all the groupings
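
A grouping is only a vector of group ids, so a typical next step is to collect the element positions that belong to each group. The helper below is a minimal sketch of that step. `indices_of_group` is a hypothetical name and not part of hipipe, and reading the ids as 0 = train, 1 = valid, 2 = test is only an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of hipipe: returns the positions of all
// elements that were assigned to `group` in a single grouping.
std::vector<std::size_t> indices_of_group(const std::vector<std::size_t>& groups,
                                          std::size_t group)
{
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < groups.size(); ++i) {
        if (groups[i] == group) indices.push_back(i);
    }
    return indices;
}
```

For the grouping {0, 2, 1, 2, 0}, `indices_of_group(groups, 2)` yields the positions 1 and 3. Because group 2 is fixed, the same positions would be selected from every grouping returned by one call.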
Parameters
    n               The number of different groupings.
    size            The size of the data, i.e., the number of elements.
    volatile_ratio  The ratio of volatile groups (i.e., groups that change between groupings).
    fixed_ratio     The ratio of groups that are assigned equally in all groupings.
    gen             The random generator to be used.
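
The fixed/volatile behaviour described above can be illustrated with a standalone sketch. This is not the hipipe implementation. It is one plausible scheme under stated assumptions: volatile groups take the ids 0..V-1 and fixed groups the ids after them (as in the example), group sizes are rounded down with the last non-zero group taking the remainder, and the volatile ratios have a positive sum. The names `sketch_single_grouping` and `sketch_multi_grouping` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper, not part of hipipe: one shuffled grouping of `size`
// elements; the last group with a non-zero ratio takes the remainder.
template <typename Prng>
std::vector<std::size_t> sketch_single_grouping(std::size_t size,
                                                const std::vector<double>& ratio,
                                                Prng& gen)
{
    double sum = 0;
    std::size_t last = 0;  // index of the last group with a non-zero ratio
    for (std::size_t i = 0; i < ratio.size(); ++i) {
        assert(ratio[i] >= 0);
        sum += ratio[i];
        if (ratio[i] > 0) last = i;
    }
    assert(sum > 0);

    std::vector<std::size_t> groups;
    groups.reserve(size);
    for (std::size_t g = 0; g <= last; ++g) {
        // Assumption: round down; the library may round differently.
        std::size_t count =
            static_cast<std::size_t>(ratio[g] * static_cast<double>(size) / sum);
        if (g == last) count = size - groups.size();
        groups.insert(groups.end(), count, g);
    }
    std::shuffle(groups.begin(), groups.end(), gen);
    return groups;
}

// Hypothetical sketch of the multi-grouping overload: fixed positions are
// chosen once, volatile positions are redrawn for each of the n groupings.
template <typename Prng>
std::vector<std::vector<std::size_t>> sketch_multi_grouping(
    std::size_t n, std::size_t size,
    const std::vector<double>& volatile_ratio,
    const std::vector<double>& fixed_ratio, Prng& gen)
{
    std::vector<double> full_ratio = volatile_ratio;
    full_ratio.insert(full_ratio.end(), fixed_ratio.begin(), fixed_ratio.end());
    const std::size_t n_volatile = volatile_ratio.size();

    // The initial grouping decides which positions hold fixed groups.
    const std::vector<std::size_t> initial =
        sketch_single_grouping(size, full_ratio, gen);
    const std::size_t volatile_count = static_cast<std::size_t>(std::count_if(
        initial.begin(), initial.end(),
        [n_volatile](std::size_t g) { return g < n_volatile; }));

    std::vector<std::vector<std::size_t>> all;
    for (std::size_t k = 0; k < n; ++k) {
        std::vector<std::size_t> groups = initial;
        // Redraw only the volatile positions, using the volatile ratios.
        const std::vector<std::size_t> redraw =
            sketch_single_grouping(volatile_count, volatile_ratio, gen);
        std::size_t next = 0;
        for (std::size_t& g : groups) {
            if (g < n_volatile) g = redraw[next++];
        }
        all.push_back(std::move(groups));
    }
    return all;
}
```

Under these assumptions, `sketch_multi_grouping(3, 5, {2, 1}, {2}, gen)` with a `std::mt19937 gen` returns three groupings of five elements each, in which group 2 occupies the same two positions every time, while groups 0 and 1 are redistributed over the remaining three positions.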

Definition at line 111 of file groups.hpp.

generate_groups() [2/2]

template<typename Prng = std::mt19937&>
std::vector<std::size_t> hipipe::generate_groups(
    std::size_t size,
    std::vector<double> ratio,
    Prng&& gen = utility::random_generator)

Randomly group data into multiple clusters with a given ratio.

Example:

generate_groups(10, {2, 2, 6});  // == e.g. {1, 2, 2, 1, 0, 2, 2, 2, 0, 2}

If the ratios do not exactly split the requested number of elements, the last group with non-zero ratio gets all the remaining elements.
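
The remainder rule can be made concrete with a short sketch of how ratios might translate into group sizes. This is an illustration only, not the hipipe source: `sketch_group_sizes` is a hypothetical name, and rounding each group size down is an assumption (the library may round differently). Only the rule that the last group with a non-zero ratio receives the remaining elements comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of hipipe: number of elements per group for
// the given ratios. Ratios must be non-negative with a positive sum.
std::vector<std::size_t> sketch_group_sizes(std::size_t size,
                                            const std::vector<double>& ratio)
{
    double sum = 0;
    std::size_t last = 0;  // index of the last group with a non-zero ratio
    for (std::size_t i = 0; i < ratio.size(); ++i) {
        assert(ratio[i] >= 0);
        sum += ratio[i];
        if (ratio[i] > 0) last = i;
    }
    assert(sum > 0);

    std::vector<std::size_t> sizes(ratio.size(), 0);
    std::size_t assigned = 0;
    for (std::size_t i = 0; i < last; ++i) {
        // Assumption: round down; the library may round differently.
        sizes[i] = static_cast<std::size_t>(ratio[i] * static_cast<double>(size) / sum);
        assigned += sizes[i];
    }
    // The last non-zero group takes whatever is left.
    sizes[last] = size - assigned;
    return sizes;
}
```

With this sketch, 10 elements and ratios {2, 2, 6} give sizes {2, 2, 6}, while ratios {1, 1, 1} give {3, 3, 4}, since the third group absorbs the element left over after rounding. A grouping is then obtained by repeating each group id that many times and shuffling the result with the random generator.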

Parameters
    size   The size of the data, i.e., the number of elements.
    ratio  Cluster size ratio. The ratios have to be non-negative and the sum of ratios has to be positive.
    gen    The random generator to be used.

Definition at line 53 of file groups.hpp.
