HiPipe
0.7.0
C++17 data pipeline with Python bindings.
Classes
class hipipe::stream::batch
Container for multiple columns.
class hipipe::stream::abstract_column
Abstract base class for HiPipe columns.
class hipipe::stream::column_base< ColumnName, ExampleType >
Implementation stub of a column defined by the HIPIPE_DEFINE_COLUMN macro.
Macros
#define HIPIPE_DEFINE_COLUMN(column_name_, example_type_)
Macro for fast column definition.
Typedefs
using hipipe::stream::forward_stream_t = ranges::any_view< batch_t, ranges::category::forward >
The stream itself, i.e., a range of batches.
using hipipe::stream::input_stream_t = ranges::any_view< batch_t, ranges::category::input >
The stream type after special eager operations.
Functions
template<typename... FromColumns, typename... ToColumns>
auto hipipe::stream::copy (from_t< FromColumns... > from_cols, to_t< ToColumns... > to_cols)
Copy the data from FromColumns to the respective ToColumns.
template<typename... FromColumns, typename... ByColumns, typename Fun , int Dim = 1>
auto hipipe::stream::filter (from_t< FromColumns... > f, by_t< ByColumns... > b, Fun fun, dim_t< Dim > d=dim_t< 1 >{})
Filter stream data.
template<typename... FromColumns, typename Fun , int Dim = 1>
auto hipipe::stream::for_each (from_t< FromColumns... > f, Fun fun, dim_t< Dim > d=dim_t< 1 >{})
Apply a function to a subset of stream columns.
template<typename FromColumn , typename ToColumn , typename Gen , int Dim = utility::ndims<typename ToColumn::data_type>::value - utility::ndims<std::result_of_t<Gen()>>::value>
auto hipipe::stream::generate (from_t< FromColumn > size_from, to_t< ToColumn > fill_to, Gen gen, long gendims=std::numeric_limits< long >::max(), dim_t< Dim > d=dim_t< Dim >{})
Fill the selected column using a generator (i.e., a nullary function).
template<typename FromColumn , typename MaskColumn , typename ValT = typename utility::ndim_type_t< typename FromColumn::data_type, utility::ndims<typename MaskColumn::data_type>::value>>
auto hipipe::stream::pad (from_t< FromColumn > f, mask_t< MaskColumn > m, ValT value=ValT{})
Pad the selected column to a rectangular size.
template<typename FromColumn , typename ToColumn , typename Prng = std::mt19937, typename Dist = std::uniform_real_distribution<double>, int Dim = utility::ndims<typename ToColumn::data_type>::value - utility::ndims<std::result_of_t<Dist(Prng&)>>::value>
auto hipipe::stream::random_fill (from_t< FromColumn > size_from, to_t< ToColumn > fill_to, long rnddims=std::numeric_limits< long >::max(), Dist dist=Dist{0, 1}, Prng &prng=hipipe::utility::random_generator, dim_t< Dim > d=dim_t< Dim >{})
Fill the selected column of a stream with random values.
template<typename... FromColumns, typename... ToColumns, typename Fun , int Dim = 1>
auto hipipe::stream::transform (from_t< FromColumns... > f, to_t< ToColumns... > t, Fun fun, dim_t< Dim > d=dim_t< 1 >{})
Transform a subset of hipipe columns to a different subset of hipipe columns.
template<typename... FromColumns, typename... ToColumns, typename CondColumn , typename Fun , int Dim = 1>
auto hipipe::stream::transform (from_t< FromColumns... > f, to_t< ToColumns... > t, cond_t< CondColumn > c, Fun fun, dim_t< Dim > d=dim_t< 1 >{})
Conditional transform of a subset of hipipe columns.
template<typename... FromColumns, typename... ToColumns, typename Fun , typename Prng = std::mt19937, int Dim = 1>
auto hipipe::stream::transform (from_t< FromColumns... > f, to_t< ToColumns... > t, double prob, Fun fun, Prng &prng=utility::random_generator, dim_t< Dim > d=dim_t< 1 >{})
Probabilistic transform of a subset of hipipe columns.
template<typename Rng , typename... FromColumns, int Dim = 1>
auto hipipe::stream::unpack (Rng &&rng, from_t< FromColumns... > f, dim_t< Dim > d=dim_t< 1 >{})
Unpack a stream into a tuple of ranges.
Variables
rgv::view< buffer_fn > hipipe::stream::buffer {}
Asynchronously buffers the given range.
template<typename... Columns>
rgv::view< detail::create_fn< Columns... > > hipipe::stream::create {}
Converts a data range to a HiPipe stream.
template<typename... Columns>
rgv::view< detail::drop_fn< Columns... > > hipipe::stream::drop {}
Drops columns from a stream.
template<typename... Columns>
rgv::view< detail::keep_fn< Columns... > > hipipe::stream::keep {}
Keep the specified columns in the stream, drop everything else.
rgv::view< rebatch_fn > hipipe::stream::rebatch {}
Accumulate the stream and yield batches of a different size.
#define HIPIPE_DEFINE_COLUMN(column_name_, example_type_)
Macro for fast column definition.
Under the hood, it creates a new type derived from column_base.
Definition at line 250 of file column_t.hpp.
using hipipe::stream::forward_stream_t = ranges::any_view<batch_t, ranges::category::forward>
The stream itself, i.e., a range of batches.
Unless specified otherwise, the stream transformers expect this type and return this type. Exceptions include, e.g., stream::rebatch.
Definition at line 29 of file stream_t.hpp.
using hipipe::stream::input_stream_t = ranges::any_view<batch_t, ranges::category::input>
The stream type after special eager operations.
For instance, stream::rebatch reduces the stream to an input_range and returns this type. A stream of this type cannot be transformed further.
Definition at line 37 of file stream_t.hpp.
auto hipipe::stream::copy(from_t< FromColumns... > from_cols, to_t< ToColumns... > to_cols)
Copy the data from FromColumns to the respective ToColumns.
The data from the i-th FromColumn is copied to the i-th ToColumn. Note that the ToColumns examples must be constructible from their FromColumns counterparts.
Example:
from_cols | The source columns. |
to_cols | The target columns. |
auto hipipe::stream::filter(from_t< FromColumns... > f, by_t< ByColumns... > b, Fun fun, dim_t< Dim > d = dim_t<1>{})
Filter stream data.
Example:
f | The columns to be filtered. |
b | The columns to be passed to the filtering function. Those have to be a subset of f. |
fun | The filtering function returning a boolean. |
d | The dimension in which the function is applied. Choose 0 to filter whole batches (in such a case, the f parameter is ignored). |
Definition at line 154 of file filter.hpp.
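To make the default (dimension 1) behaviour concrete, the following is a minimal sketch written against the standard library only. It does not use HiPipe; the helper name filter_examples and the representation of columns as parallel std::vector objects are assumptions made purely for illustration. The predicate sees only the "by" column, while both columns are filtered in step.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of HiPipe): keeps only those examples of two
// parallel columns for which `pred`, evaluated on `by_col`, returns true.
template <typename A, typename B, typename Pred>
void filter_examples(std::vector<A>& by_col, std::vector<B>& other_col, Pred pred)
{
    // Both columns must describe the same examples.
    assert(by_col.size() == other_col.size());

    std::size_t kept = 0;
    for (std::size_t i = 0; i < by_col.size(); ++i) {
        if (pred(by_col[i])) {
            if (kept != i) {
                // Compact the surviving example towards the front.
                by_col[kept] = std::move(by_col[i]);
                other_col[kept] = std::move(other_col[i]);
            }
            ++kept;
        }
    }
    // Drop the tail that no longer holds surviving examples.
    by_col.erase(by_col.begin() + static_cast<std::ptrdiff_t>(kept), by_col.end());
    other_col.erase(other_col.begin() + static_cast<std::ptrdiff_t>(kept), other_col.end());
}
```

Calling filter_examples(ids, values, pred) with a predicate on the ids leaves only the matching positions in both vectors, which mirrors the requirement that the "by" columns are a subset of the filtered columns.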
auto hipipe::stream::for_each(from_t< FromColumns... > f, Fun fun, dim_t< Dim > d = dim_t<1>{})
Apply a function to a subset of stream columns.
The given function is applied to a subset of columns given by FromColumns. The function is applied lazily, i.e., only when the range is iterated.
Example:
f | The columns to be extracted out of the tuple of columns and passed to fun. |
fun | The function to be applied. |
d | The dimension in which the function is applied. Choose 0 for the function to be applied to the whole batch. |
Definition at line 68 of file for_each.hpp.
auto hipipe::stream::generate(from_t< FromColumn > size_from, to_t< ToColumn > fill_to, Gen gen, long gendims = std::numeric_limits<long>::max(), dim_t< Dim > d = dim_t<Dim>{})
Fill the selected column using a generator (i.e., a nullary function).
This function uses utility::generate(). Furthermore, the column to be filled is first resized so that it has the same size as the selected source column.
Tip: If there is no column the size could be taken from, then just resize the target column manually and use it as both the from column and the to column.
Example:
size_from | The column whose size will be used to initialize the generated column. |
fill_to | The column to be filled using the generator. |
gen | The generator to be used. |
gendims | The number of generated dimensions. See utility::generate(). |
d | The dimension in which the generator is applied. E.g., if set to 1, the generator result is considered to be a single example. The default is ndims<ToColumn::data_type> - ndims<gen()>. This value has to be positive. |
Definition at line 89 of file generate.hpp.
auto hipipe::stream::pad(from_t< FromColumn > f, mask_t< MaskColumn > m, ValT value = ValT{})
Pad the selected column to a rectangular size.
Each batch is padded separately.
The mask of the padded values is created along with the padding. The mask evaluates to true on the positions with the original elements and to false on the positions of the padded elements. The mask column should be a multidimensional vector of type bool/char/int/... The dimensionality of the mask column is used to deduce how many dimensions should be padded in the source column.
This transformer internally uses utility::ndim_pad().
Example:
f | The column to be padded. |
m | The column where the mask should be stored and from which the dimension is taken. |
value | The value to pad with. |
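The relationship between the padded data and its mask can be illustrated with a small standard-library sketch. It is not HiPipe code and does not call utility::ndim_pad(); the function name pad_batch and the restriction to a two-dimensional batch (a vector of rows) are assumptions chosen to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of HiPipe): pads every row of a ragged 2-D
// batch to the length of the longest row. Returns the padded rows together
// with a mask that is true for original elements and false for padding.
template <typename T>
std::pair<std::vector<std::vector<T>>, std::vector<std::vector<bool>>>
pad_batch(const std::vector<std::vector<T>>& rows, T value = T{})
{
    // The target width is the longest row within this single batch.
    std::size_t width = 0;
    for (const auto& row : rows) width = std::max(width, row.size());

    std::vector<std::vector<T>> padded = rows;
    std::vector<std::vector<bool>> mask;
    mask.reserve(rows.size());
    for (auto& row : padded) {
        std::vector<bool> row_mask(row.size(), true);  // original elements
        row.resize(width, value);                      // append padding values
        row_mask.resize(width, false);                 // mark padding as false
        mask.push_back(std::move(row_mask));
    }
    assert(mask.size() == padded.size());
    return {std::move(padded), std::move(mask)};
}
```

Because the width is computed per call, two batches padded separately may end up with different row lengths, which matches the note above that each batch is padded on its own.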
auto hipipe::stream::random_fill(from_t< FromColumn > size_from, to_t< ToColumn > fill_to, long rnddims = std::numeric_limits<long>::max(), Dist dist = Dist{0, 1}, Prng & prng = hipipe::utility::random_generator, dim_t< Dim > d = dim_t<Dim>{})
Fill the selected column of a stream with random values.
This function uses stream::generate() and has similar semantics. That is, the column to be filled is first resized so that it has the same size as the selected source column.
Tip: If there is no column the size could be taken from, then just resize the target column manually and use it as both the from column and the to column.
Example:
size_from | The column whose size will be used to initialize the random column. |
fill_to | The column to be filled with random data. |
rnddims | The number of random dimensions. See utility::random_fill(). |
dist | The random distribution to be used. This object is copied on every use to avoid race conditions with stream::buffer(). |
prng | The random generator to be used. |
d | The dimension in which the generator is applied. E.g., if set to 1, the generator result is considered to be a single example. The default is ndims<ToColumn::data_type> - ndims<dist(prng)>. This value has to be positive. |
Definition at line 58 of file random_fill.hpp.
auto hipipe::stream::transform(from_t< FromColumns... > f, to_t< ToColumns... > t, cond_t< CondColumn > c, Fun fun, dim_t< Dim > d = dim_t<1>{})
Conditional transform of a subset of hipipe columns.
This function behaves the same as the original stream::transform(), but it accepts one extra argument denoting a column of true/false values of the same shape as the columns to be transformed. The transformation is applied only on true values and acts as an identity on false values.
Note that this can be very useful in combination with stream::random_fill() and std::bernoulli_distribution.
Example:
f | The columns to be extracted out of the tuple of columns and passed to fun. |
t | The columns where the result will be saved. Those have to already exist in the stream. |
c | The column of true/false values denoting whether the transformation should be performed or not. For false values, the transformation is an identity on the target columns. |
fun | The function to be applied. The function should return the type represented by the selected column in the given dimension. If there are multiple target columns, the function should return a tuple of the corresponding types. |
d | The dimension in which the function is applied. Choose 0 for the function to be applied to the whole batch. |
Definition at line 345 of file transform.hpp.
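The "identity on false values" rule can be sketched without HiPipe as follows. The helper name transform_where and the use of flat std::vector columns are assumptions for illustration only; the real transformer operates on stream columns and an arbitrary dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of HiPipe): writes fun(from[i]) into to[i]
// wherever cond[i] is true and leaves to[i] untouched otherwise.
template <typename From, typename To, typename Fun>
void transform_where(const std::vector<From>& from, std::vector<To>& to,
                     const std::vector<bool>& cond, Fun fun)
{
    // The condition column must have the same shape as the data columns,
    // and the target column must already exist with matching size.
    assert(from.size() == to.size() && from.size() == cond.size());

    for (std::size_t i = 0; i < from.size(); ++i) {
        if (cond[i]) {
            to[i] = fun(from[i]);
        }
        // cond[i] == false: identity on the target, nothing to do.
    }
}
```

Note that the untouched value is the existing target value, not the source value, which is why the target columns have to exist in the stream beforehand.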
auto hipipe::stream::transform(from_t< FromColumns... > f, to_t< ToColumns... > t, double prob, Fun fun, Prng & prng = utility::random_generator, dim_t< Dim > d = dim_t<1>{})
Probabilistic transform of a subset of hipipe columns.
This function behaves the same as the original stream::transform(), but it accepts one extra argument denoting the probability of transformation. If this probability is 0.0, the transformer behaves as an identity. If it is 1.0, the transformation function is always applied.
Example:
f | The columns to be extracted out of the tuple of columns and passed to fun. |
t | The columns where the result will be saved. Those have to already exist in the stream. |
prob | The probability of transformation. If the dice roll fails, the transformer applies an identity on the target columns. |
fun | The function to be applied. The function should return the type represented by the selected column in the given dimension. If there are multiple target columns, the function should return a tuple of the corresponding types. |
prng | The random generator to be used. Defaults to a thread_local std::mt19937. |
d | The dimension in which the function is applied. Choose 0 for the function to be applied to the whole batch. |
Definition at line 474 of file transform.hpp.
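A rough standard-library sketch of the "dice roll" per element is shown below. It is not HiPipe's implementation: the helper name transform_with_probability, the flat std::vector columns, and the choice of std::bernoulli_distribution for the roll are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not part of HiPipe): for each element, applies `fun`
// with probability `prob`; otherwise the target element is left unchanged.
template <typename From, typename To, typename Fun, typename Prng>
void transform_with_probability(const std::vector<From>& from, std::vector<To>& to,
                                double prob, Fun fun, Prng& prng)
{
    assert(from.size() == to.size());
    assert(prob >= 0.0 && prob <= 1.0);

    // One independent roll per element (assumed granularity of this sketch).
    std::bernoulli_distribution roll(prob);
    for (std::size_t i = 0; i < from.size(); ++i) {
        if (roll(prng)) {
            to[i] = fun(from[i]);
        }
    }
}
```

With prob set to 0.0 the target column is returned unchanged, and with 1.0 every element is transformed, matching the two boundary cases described above. Passing the generator by reference lets the caller control seeding.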
auto hipipe::stream::transform(from_t< FromColumns... > f, to_t< ToColumns... > t, Fun fun, dim_t< Dim > d = dim_t<1>{})
Transform a subset of hipipe columns to a different subset of hipipe columns.
Example:
f | The columns to be extracted out of the tuple of columns and passed to fun. |
t | The columns where the result will be saved. If the stream does not contain the selected columns, they are added to the stream. This parameter can overlap with the parameter f. |
fun | The function to be applied. The function should return the type represented by the target column in the given dimension. If there are multiple target columns, the function should return a tuple of the corresponding types. |
d | The dimension in which the function is applied. Choose 0 for the function to be applied to the whole batch. |
Definition at line 218 of file transform.hpp.
auto hipipe::stream::unpack(Rng && rng, from_t< FromColumns... > f, dim_t< Dim > d = dim_t<1>{})
Unpack a stream into a tuple of ranges.
This operation transforms the stream (i.e., a range of batches) into a tuple of the types represented by the columns. The data can be unpacked in a specific dimension and then the higher dimensions are joined together.
If there is only a single column to be unpacked, the result is a std::vector of the corresponding type. If there are multiple columns to be unpacked, the result is a tuple of std::vectors.
Example:
Definition at line 118 of file unpack.hpp.
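The shape of the result for two columns can be illustrated with a sketch that uses only the standard library. The type toy_batch and the function unpack_all are hypothetical stand-ins, not HiPipe types; the sketch corresponds to unpacking in dimension 1, where the examples of all batches are joined into one vector per column.

```cpp
#include <cassert>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical two-column batch; HiPipe's real batch type is different.
struct toy_batch {
    std::vector<int> ids;
    std::vector<double> values;
};

// Hypothetical helper (not part of HiPipe): joins the examples of all batches
// and returns one std::vector per column, packed in a tuple.
inline std::tuple<std::vector<int>, std::vector<double>>
unpack_all(const std::vector<toy_batch>& stream)
{
    std::vector<int> ids;
    std::vector<double> values;
    for (const toy_batch& b : stream) {
        // Each batch holds the same number of examples in every column.
        assert(b.ids.size() == b.values.size());
        ids.insert(ids.end(), b.ids.begin(), b.ids.end());
        values.insert(values.end(), b.values.begin(), b.values.end());
    }
    return std::make_tuple(std::move(ids), std::move(values));
}
```

The batch boundaries disappear in the output: a stream of batches of sizes 2 and 1 yields vectors of length 3.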
inline rgv::view<buffer_fn> hipipe::stream::buffer {}
Asynchronously buffers the given range.
Asynchronously evaluates the given number of elements in advance. When queried for the next element, it is already prepared. This view works for any range, not only for hipipe streams.
Note that this transformer is not lazy and instead eagerly evaluates the data in asynchronous threads. To avoid recalculation of the entire underlying range whenever, e.g., std::distance is called, this transformer intentionally changes the stream type to input_range. The downside is that no further transformations can be appended (except for stream::rebatch) and everything has to be prepared before the application of this transformer.
Definition at line 195 of file buffer.hpp.
rgv::view<detail::create_fn<Columns...> > hipipe::stream::create {}
Converts a data range to a HiPipe stream.
The value type of the input range is supposed to be either the type represented by the column to be created, or a tuple of such types if there are more columns to be created.
Example:
batch_size | The requested batch size of the new stream. |
Definition at line 119 of file create.hpp.
rgv::view<detail::drop_fn<Columns...> > hipipe::stream::drop {}
Drops columns from a stream.
Example:
rgv::view<detail::keep_fn<Columns...> > hipipe::stream::keep {}
Keep the specified columns in the stream, drop everything else.
Example:
inline rgv::view<rebatch_fn> hipipe::stream::rebatch {}
Accumulate the stream and yield batches of a different size.
The batch size of the accumulated columns is allowed to differ between batches. To make one large batch of all the data, use std::numeric_limits<std::size_t>::max().
Note that this stream transformer is not lazy and instead eagerly evaluates the batches computed by the previous stream pipeline and reorganizes the evaluated data into batches of a different size. To avoid recalculation of the entire stream whenever, e.g., std::distance is called, this transformer intentionally changes the stream type to input_range. The downside is that no further transformations or buffering can be appended and everything has to be prepared before the application of this transformer.
Definition at line 175 of file rebatch.hpp.
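The accumulate-and-reslice behaviour can be sketched with plain vectors. This is not HiPipe's implementation; the name rebatch_sketch is hypothetical, a batch is modelled as a single std::vector, and emitting a final, smaller batch for leftover examples is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper (not part of HiPipe): re-slices a sequence of batches
// of arbitrary sizes into batches of `batch_size` examples each. Leftover
// examples form a final, smaller batch (assumed behaviour of this sketch).
template <typename T>
std::vector<std::vector<T>>
rebatch_sketch(const std::vector<std::vector<T>>& batches, std::size_t batch_size)
{
    assert(batch_size > 0);

    std::vector<std::vector<T>> out;
    std::vector<T> current;
    for (const auto& batch : batches) {
        for (const T& example : batch) {
            current.push_back(example);
            if (current.size() == batch_size) {
                out.push_back(std::move(current));
                current.clear();  // reset the moved-from accumulator
            }
        }
    }
    if (!current.empty()) out.push_back(std::move(current));
    return out;
}
```

Passing std::numeric_limits<std::size_t>::max() as batch_size never triggers the inner flush, so all examples end up in one large batch, as described above. The sketch is eager by construction: it consumes the whole input before returning.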