btrdb.transformers#

A number of transformation and serialization functions have been developed so you can use the data in the format of your choice. These functions are provided in the StreamSet class.

Value transformation utilities

btrdb.transformers.arrow_to_dict(streamset, agg=None, name_callable=None)#

Returns a list of dicts for each time code with the appropriate stream data attached.

Parameters:
  • agg (List[str], default: ["mean"]) – Specify the StatPoint field or fields (e.g. aggregating function) to constrain dict keys. Must be one or more of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.

  • name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.

Note

This method is available for commercial customers with arrow-enabled servers.

btrdb.transformers.arrow_to_numpy(streamset, agg=None)#

Return a multidimensional array in the numpy format.

Parameters:

agg (List[str], default: ["mean"]) – Specify the StatPoint field or fields (e.g. aggregating function) to return for the arrays. Must be one or more of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.

Note

This method first converts to a pandas data frame then to a numpy array.

Note

This method is available for commercial customers with arrow-enabled servers.

btrdb.transformers.arrow_to_series(streamset, agg=None, name_callable=None)#

Returns a list of Pandas Series objects indexed by time

Parameters:
  • agg (List[str], default: ["mean"]) – Specify the StatPoint field or fields (e.g. aggregating function) to create the Series from. Must be one or more of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.

  • name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.

Return type:

List[pandas.Series]

Note

This method is available for commercial customers with arrow-enabled servers.

Note

If you are not performing a window or aligned_window query, the agg parameter will be ignored.

Examples

Return a list of series of raw data per stream.

>>> conn = btrdb.connect()
>>> s1 = conn.stream_from_uuid('c9fd8735-5ec5-4141-9a51-d23e1b2dfa42')
>>> s2 = conn.stream_from_uuid('9173fa70-87ab-4ac8-ac08-4fd63b910cae'
>>> streamset = btrdb.stream.StreamSet([s1,s2])
>>> streamset.filter(start=1500000000000000000, end=1500000000900000001).arrow_to_series(agg=None)
    [time
    2017-07-14 02:40:00+00:00            1.0
    2017-07-14 02:40:00.100000+00:00     2.0
    2017-07-14 02:40:00.200000+00:00     3.0
    2017-07-14 02:40:00.300000+00:00     4.0
    2017-07-14 02:40:00.400000+00:00     5.0
    2017-07-14 02:40:00.500000+00:00     6.0
    2017-07-14 02:40:00.600000+00:00     7.0
    2017-07-14 02:40:00.700000+00:00     8.0
    2017-07-14 02:40:00.800000+00:00     9.0
    2017-07-14 02:40:00.900000+00:00    10.0
    Name: new/stream/collection/foo, dtype: double[pyarrow],
    time
    2017-07-14 02:40:00+00:00            1.0
    2017-07-14 02:40:00.100000+00:00     2.0
    2017-07-14 02:40:00.200000+00:00     3.0
    2017-07-14 02:40:00.300000+00:00     4.0
    2017-07-14 02:40:00.400000+00:00     5.0
    2017-07-14 02:40:00.500000+00:00     6.0
    2017-07-14 02:40:00.600000+00:00     7.0
    2017-07-14 02:40:00.700000+00:00     8.0
    2017-07-14 02:40:00.800000+00:00     9.0
    2017-07-14 02:40:00.900000+00:00    10.0
    Name: new/stream/bar, dtype: double[pyarrow]]

A window query of 0.5seconds long.

>>> streamset.filter(start=1500000000000000000, end=1500000000900000001)
...          .windows(width=int(0.5 * 10**9))
...          .arrow_to_series(agg=["mean", "count"])
    [time
    2017-07-14 02:40:00+00:00           2.5
    2017-07-14 02:40:00.400000+00:00    6.5
    Name: new/stream/collection/foo/mean, dtype: double[pyarrow],
...
    time
    2017-07-14 02:40:00+00:00           4
    2017-07-14 02:40:00.400000+00:00    4
    Name: new/stream/collection/foo/count, dtype: uint64[pyarrow],
...
    time
    2017-07-14 02:40:00+00:00           2.5
    2017-07-14 02:40:00.400000+00:00    6.5
    Name: new/stream/bar/mean, dtype: double[pyarrow],
...
    time
    2017-07-14 02:40:00+00:00           4
    2017-07-14 02:40:00.400000+00:00    4
    Name: new/stream/bar/count, dtype: uint64[pyarrow]]
btrdb.transformers.arrow_to_dataframe(streamset, agg=None, name_callable=None) DataFrame#

Returns a Pandas DataFrame object indexed by time and using the values of a stream for each column.

Parameters:
  • agg (List[str], default: ["mean"]) – Specify the StatPoint fields (e.g. aggregating function) to create the dataframe from. Must be one or more of “min”, “mean”, “max”, “count”, “stddev”, or “all”. This argument is ignored if not using StatPoints.

  • name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.

Note

This method is available for commercial customers with arrow-enabled servers.

Examples

>>> conn = btrdb.connect()
>>> s1 = conn.stream_from_uuid('c9fd8735-5ec5-4141-9a51-d23e1b2dfa42')
>>> s2 = conn.stream_from_uuid('9173fa70-87ab-4ac8-ac08-4fd63b910cae'
>>> streamset = btrdb.stream.StreamSet([s1,s2])
>>> streamset.filter(start=1500000000000000000, end=1500000000900000001).arrow_to_dataframe()
                                        new/stream/collection/foo       new/stream/bar
    time
    2017-07-14 02:40:00+00:00                               1.0             1.0
    2017-07-14 02:40:00.100000+00:00                        2.0             2.0
    2017-07-14 02:40:00.200000+00:00                        3.0             3.0
    2017-07-14 02:40:00.300000+00:00                        4.0             4.0
    2017-07-14 02:40:00.400000+00:00                        5.0             5.0
    2017-07-14 02:40:00.500000+00:00                        6.0             6.0
    2017-07-14 02:40:00.600000+00:00                        7.0             7.0
    2017-07-14 02:40:00.700000+00:00                        8.0             8.0
    2017-07-14 02:40:00.800000+00:00                        9.0             9.0
    2017-07-14 02:40:00.900000+00:00                       10.0            10.0

Use the stream uuids as their column names instead, using a lambda function.

>>> streamset.filter(start=1500000000000000000, end=1500000000900000001)
...          .arrow_to_dataframe(
...             name_callable=lambda s: str(s.uuid)
...          )
                                  c9fd8735-5ec5-4141-9a51-d23e1b2dfa42  9173fa70-87ab-4ac8-ac08-4fd63b910cae
    time
    2017-07-14 02:40:00+00:00                                          1.0                                   1.0
    2017-07-14 02:40:00.100000+00:00                                   2.0                                   2.0
    2017-07-14 02:40:00.200000+00:00                                   3.0                                   3.0
    2017-07-14 02:40:00.300000+00:00                                   4.0                                   4.0
    2017-07-14 02:40:00.400000+00:00                                   5.0                                   5.0
    2017-07-14 02:40:00.500000+00:00                                   6.0                                   6.0
    2017-07-14 02:40:00.600000+00:00                                   7.0                                   7.0
    2017-07-14 02:40:00.700000+00:00                                   8.0                                   8.0
    2017-07-14 02:40:00.800000+00:00                                   9.0                                   9.0
    2017-07-14 02:40:00.900000+00:00                                  10.0                                  10.0

A window query, with a window width of 0.4 seconds, and only showing the mean statpoint.

>>> streamset.filter(start=1500000000000000000, end=1500000000900000001)
...          .windows(width=int(0.4*10**9))
...          .arrow_to_dataframe(agg=["mean"])
                                            new/stream/collection/foo/mean  new/stream/bar/mean
    time
    2017-07-14 02:40:00+00:00                                    2.5                  2.5
    2017-07-14 02:40:00.400000+00:00                             6.5                  6.5

A window query, with a window width of 0.4 seconds, and only showing the mean and count statpoints.

>>> streamset.filter(start=1500000000000000000, end=1500000000900000001)
...          .windows(width=int(0.4*10**9))
...          .arrow_to_dataframe(agg=["mean", "count"])
                              new/stream/collection/foo/mean  new/stream/collection/foo/count  new/stream/bar/mean  new/stream/bar/count
    time
    2017-07-14 02:40:00+00:00               2.5                                4                  2.5                     4
    2017-07-14 02:40:00.400000+00:00        6.5                                4                  6.5                     4
btrdb.transformers.arrow_to_polars(streamset, agg=None, name_callable=None)#

Returns a Polars DataFrame object with time as a column and the values of a stream for each additional column from an arrow table.

Parameters:
  • agg (List[str], default: ["mean"]) – Specify the StatPoint field or fields (e.g. aggregating function) to create the dataframe from. Must be one or multiple of “min”, “mean”, “max”, “count”, “stddev”, or “all”. This argument is ignored if not using StatPoints.

  • name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.

Note

This method is available for commercial customers with arrow-enabled servers.

btrdb.transformers.arrow_to_arrow_table(streamset)#

Return a pyarrow table of data.

Note

This method is available for commercial customers with arrow-enabled servers.

btrdb.transformers.to_dict(streamset, agg='mean', name_callable=None)#

Returns a list of OrderedDict for each time code with the appropriate stream data attached.

Parameters:
  • agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to constrain dict keys. Must be one of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.

  • name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.

Note

This method does not use the arrow -accelerated endpoints for faster and more efficient data retrieval.

btrdb.transformers.to_array(streamset, agg='mean')#

Returns a multidimensional numpy array (similar to a list of lists) containing point classes.

Parameters:

agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to return for the arrays. Must be one of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.

Note

This method does not use the arrow -accelerated endpoints for faster and more efficient data retrieval.

btrdb.transformers.to_polars(streamset, agg='mean', name_callable=None)#

Returns a Polars DataFrame object with time as a column and the values of a stream for each additional column.

Parameters:
  • agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to create the Series from. Must be one of “min”, “mean”, “max”, “count”, “stddev”, or “all”. This argument is ignored if not using StatPoints.

  • name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object. This is not compatible with agg == “all” at this time

Note

This method does not use the arrow -accelerated endpoints for faster and more efficient data retrieval.

btrdb.transformers.to_series(streamset, datetime64_index=True, agg='mean', name_callable=None)#

Returns a list of Pandas Series objects indexed by time

Parameters:
  • datetime64_index (bool) – Directs function to convert Series index to np.datetime64[ns] or leave as np.int64.

  • agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to create the Series from. Must be one of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.

  • name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.

Note

This method does not use the arrow -accelerated endpoints for faster and more efficient data retrieval.

btrdb.transformers.to_dataframe(streamset, agg='mean', name_callable=None)#

Returns a Pandas DataFrame object indexed by time and using the values of a stream for each column.

Parameters:
  • agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to create the Series from. Must be one of “min”, “mean”, “max”, “count”, “stddev”, or “all”. This argument is ignored if not using StatPoints.

  • name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object. This is not compatible with agg == “all” at this time

Note

This method does not use the arrow -accelerated endpoints for faster and more efficient data retrieval.

btrdb.transformers.to_csv(streamset, fobj, dialect=None, fieldnames=None, agg='mean', name_callable=None)#

Saves stream data as a CSV file.

Parameters:
  • fobj (str or file-like object) – Path to use for saving CSV file or a file-like object to use to write to.

  • dialect (csv.Dialect) – CSV dialect object from Python csv module. See Python’s csv module for more information.

  • fieldnames (sequence) – A sequence of strings to use as fieldnames in the CSV header. See Python’s csv module for more information.

  • agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to return when limiting results. Must be one of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.

  • name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.

Note

This method does not use the arrow -accelerated endpoints for faster and more efficient data retrieval.