btrdb.transformers#
A number of transformation and serialization functions have been developed so
you can use the data in the format of your choice. These functions are provided
in the StreamSet
class.
Value transformation utilities
- btrdb.transformers.arrow_to_dict(streamset, agg=None, name_callable=None)#
Returns a list of dicts for each time code with the appropriate stream data attached.
- Parameters:
agg (List[str], default: ["mean"]) – Specify the StatPoint field or fields (e.g. aggregating function) to constrain dict keys. Must be one or more of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.
name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.
Note
This method is available for commercial customers with arrow-enabled servers.
- btrdb.transformers.arrow_to_numpy(streamset, agg=None)#
Return a multidimensional array in the numpy format.
- Parameters:
agg (List[str], default: ["mean"]) – Specify the StatPoint field or fields (e.g. aggregating function) to return for the arrays. Must be one or more of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.
Note
This method first converts to a pandas data frame then to a numpy array.
Note
This method is available for commercial customers with arrow-enabled servers.
- btrdb.transformers.arrow_to_series(streamset, agg=None, name_callable=None)#
Returns a list of Pandas Series objects indexed by time
- Parameters:
agg (List[str], default: ["mean"]) – Specify the StatPoint field or fields (e.g. aggregating function) to create the Series from. Must be one or more of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.
name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.
- Return type:
List[pandas.Series]
Note
This method is available for commercial customers with arrow-enabled servers.
Note
If you are not performing a
window
oraligned_window
query, theagg
parameter will be ignored.Examples
Return a list of series of raw data per stream.
>>> conn = btrdb.connect() >>> s1 = conn.stream_from_uuid('c9fd8735-5ec5-4141-9a51-d23e1b2dfa42') >>> s2 = conn.stream_from_uuid('9173fa70-87ab-4ac8-ac08-4fd63b910cae' >>> streamset = btrdb.stream.StreamSet([s1,s2]) >>> streamset.filter(start=1500000000000000000, end=1500000000900000001).arrow_to_series(agg=None) [time 2017-07-14 02:40:00+00:00 1.0 2017-07-14 02:40:00.100000+00:00 2.0 2017-07-14 02:40:00.200000+00:00 3.0 2017-07-14 02:40:00.300000+00:00 4.0 2017-07-14 02:40:00.400000+00:00 5.0 2017-07-14 02:40:00.500000+00:00 6.0 2017-07-14 02:40:00.600000+00:00 7.0 2017-07-14 02:40:00.700000+00:00 8.0 2017-07-14 02:40:00.800000+00:00 9.0 2017-07-14 02:40:00.900000+00:00 10.0 Name: new/stream/collection/foo, dtype: double[pyarrow], time 2017-07-14 02:40:00+00:00 1.0 2017-07-14 02:40:00.100000+00:00 2.0 2017-07-14 02:40:00.200000+00:00 3.0 2017-07-14 02:40:00.300000+00:00 4.0 2017-07-14 02:40:00.400000+00:00 5.0 2017-07-14 02:40:00.500000+00:00 6.0 2017-07-14 02:40:00.600000+00:00 7.0 2017-07-14 02:40:00.700000+00:00 8.0 2017-07-14 02:40:00.800000+00:00 9.0 2017-07-14 02:40:00.900000+00:00 10.0 Name: new/stream/bar, dtype: double[pyarrow]]
A window query of 0.5seconds long.
>>> streamset.filter(start=1500000000000000000, end=1500000000900000001) ... .windows(width=int(0.5 * 10**9)) ... .arrow_to_series(agg=["mean", "count"]) [time 2017-07-14 02:40:00+00:00 2.5 2017-07-14 02:40:00.400000+00:00 6.5 Name: new/stream/collection/foo/mean, dtype: double[pyarrow], ... time 2017-07-14 02:40:00+00:00 4 2017-07-14 02:40:00.400000+00:00 4 Name: new/stream/collection/foo/count, dtype: uint64[pyarrow], ... time 2017-07-14 02:40:00+00:00 2.5 2017-07-14 02:40:00.400000+00:00 6.5 Name: new/stream/bar/mean, dtype: double[pyarrow], ... time 2017-07-14 02:40:00+00:00 4 2017-07-14 02:40:00.400000+00:00 4 Name: new/stream/bar/count, dtype: uint64[pyarrow]]
- btrdb.transformers.arrow_to_dataframe(streamset, agg=None, name_callable=None) DataFrame #
Returns a Pandas DataFrame object indexed by time and using the values of a stream for each column.
- Parameters:
agg (List[str], default: ["mean"]) – Specify the StatPoint fields (e.g. aggregating function) to create the dataframe from. Must be one or more of “min”, “mean”, “max”, “count”, “stddev”, or “all”. This argument is ignored if not using StatPoints.
name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.
Note
This method is available for commercial customers with arrow-enabled servers.
Examples
>>> conn = btrdb.connect() >>> s1 = conn.stream_from_uuid('c9fd8735-5ec5-4141-9a51-d23e1b2dfa42') >>> s2 = conn.stream_from_uuid('9173fa70-87ab-4ac8-ac08-4fd63b910cae' >>> streamset = btrdb.stream.StreamSet([s1,s2]) >>> streamset.filter(start=1500000000000000000, end=1500000000900000001).arrow_to_dataframe() new/stream/collection/foo new/stream/bar time 2017-07-14 02:40:00+00:00 1.0 1.0 2017-07-14 02:40:00.100000+00:00 2.0 2.0 2017-07-14 02:40:00.200000+00:00 3.0 3.0 2017-07-14 02:40:00.300000+00:00 4.0 4.0 2017-07-14 02:40:00.400000+00:00 5.0 5.0 2017-07-14 02:40:00.500000+00:00 6.0 6.0 2017-07-14 02:40:00.600000+00:00 7.0 7.0 2017-07-14 02:40:00.700000+00:00 8.0 8.0 2017-07-14 02:40:00.800000+00:00 9.0 9.0 2017-07-14 02:40:00.900000+00:00 10.0 10.0
Use the stream uuids as their column names instead, using a lambda function.
>>> streamset.filter(start=1500000000000000000, end=1500000000900000001) ... .arrow_to_dataframe( ... name_callable=lambda s: str(s.uuid) ... ) c9fd8735-5ec5-4141-9a51-d23e1b2dfa42 9173fa70-87ab-4ac8-ac08-4fd63b910cae time 2017-07-14 02:40:00+00:00 1.0 1.0 2017-07-14 02:40:00.100000+00:00 2.0 2.0 2017-07-14 02:40:00.200000+00:00 3.0 3.0 2017-07-14 02:40:00.300000+00:00 4.0 4.0 2017-07-14 02:40:00.400000+00:00 5.0 5.0 2017-07-14 02:40:00.500000+00:00 6.0 6.0 2017-07-14 02:40:00.600000+00:00 7.0 7.0 2017-07-14 02:40:00.700000+00:00 8.0 8.0 2017-07-14 02:40:00.800000+00:00 9.0 9.0 2017-07-14 02:40:00.900000+00:00 10.0 10.0
A window query, with a window width of 0.4 seconds, and only showing the
mean
statpoint.>>> streamset.filter(start=1500000000000000000, end=1500000000900000001) ... .windows(width=int(0.4*10**9)) ... .arrow_to_dataframe(agg=["mean"]) new/stream/collection/foo/mean new/stream/bar/mean time 2017-07-14 02:40:00+00:00 2.5 2.5 2017-07-14 02:40:00.400000+00:00 6.5 6.5
A window query, with a window width of 0.4 seconds, and only showing the
mean
andcount
statpoints.>>> streamset.filter(start=1500000000000000000, end=1500000000900000001) ... .windows(width=int(0.4*10**9)) ... .arrow_to_dataframe(agg=["mean", "count"]) new/stream/collection/foo/mean new/stream/collection/foo/count new/stream/bar/mean new/stream/bar/count time 2017-07-14 02:40:00+00:00 2.5 4 2.5 4 2017-07-14 02:40:00.400000+00:00 6.5 4 6.5 4
- btrdb.transformers.arrow_to_polars(streamset, agg=None, name_callable=None)#
Returns a Polars DataFrame object with time as a column and the values of a stream for each additional column from an arrow table.
- Parameters:
agg (List[str], default: ["mean"]) – Specify the StatPoint field or fields (e.g. aggregating function) to create the dataframe from. Must be one or multiple of “min”, “mean”, “max”, “count”, “stddev”, or “all”. This argument is ignored if not using StatPoints.
name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.
Note
This method is available for commercial customers with arrow-enabled servers.
- btrdb.transformers.arrow_to_arrow_table(streamset)#
Return a pyarrow table of data.
Note
This method is available for commercial customers with arrow-enabled servers.
- btrdb.transformers.to_dict(streamset, agg='mean', name_callable=None)#
Returns a list of OrderedDict for each time code with the appropriate stream data attached.
- Parameters:
agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to constrain dict keys. Must be one of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.
name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.
Note
This method does not use the
arrow
-accelerated endpoints for faster and more efficient data retrieval.
- btrdb.transformers.to_array(streamset, agg='mean')#
Returns a multidimensional numpy array (similar to a list of lists) containing point classes.
- Parameters:
agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to return for the arrays. Must be one of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.
Note
This method does not use the
arrow
-accelerated endpoints for faster and more efficient data retrieval.
- btrdb.transformers.to_polars(streamset, agg='mean', name_callable=None)#
Returns a Polars DataFrame object with time as a column and the values of a stream for each additional column.
- Parameters:
agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to create the Series from. Must be one of “min”, “mean”, “max”, “count”, “stddev”, or “all”. This argument is ignored if not using StatPoints.
name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object. This is not compatible with agg == “all” at this time
Note
This method does not use the
arrow
-accelerated endpoints for faster and more efficient data retrieval.
- btrdb.transformers.to_series(streamset, datetime64_index=True, agg='mean', name_callable=None)#
Returns a list of Pandas Series objects indexed by time
- Parameters:
datetime64_index (bool) – Directs function to convert Series index to np.datetime64[ns] or leave as np.int64.
agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to create the Series from. Must be one of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.
name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.
Note
This method does not use the
arrow
-accelerated endpoints for faster and more efficient data retrieval.
- btrdb.transformers.to_dataframe(streamset, agg='mean', name_callable=None)#
Returns a Pandas DataFrame object indexed by time and using the values of a stream for each column.
- Parameters:
agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to create the Series from. Must be one of “min”, “mean”, “max”, “count”, “stddev”, or “all”. This argument is ignored if not using StatPoints.
name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object. This is not compatible with agg == “all” at this time
Note
This method does not use the
arrow
-accelerated endpoints for faster and more efficient data retrieval.
- btrdb.transformers.to_csv(streamset, fobj, dialect=None, fieldnames=None, agg='mean', name_callable=None)#
Saves stream data as a CSV file.
- Parameters:
fobj (str or file-like object) – Path to use for saving CSV file or a file-like object to use to write to.
dialect (csv.Dialect) – CSV dialect object from Python csv module. See Python’s csv module for more information.
fieldnames (sequence) – A sequence of strings to use as fieldnames in the CSV header. See Python’s csv module for more information.
agg (str, default: "mean") – Specify the StatPoint field (e.g. aggregating function) to return when limiting results. Must be one of “min”, “mean”, “max”, “count”, or “stddev”. This argument is ignored if RawPoint values are passed into the function.
name_callable (lambda, default: lambda s: s.collection + "/" + s.name) – Specify a callable that can be used to determine the series name given a Stream object.
Note
This method does not use the
arrow
-accelerated endpoints for faster and more efficient data retrieval.