Arrow-enabled Queries#

In more recent deployments of the BTrDB platform (>=5.30.0), commercial customers also have access to additional accelerated functionality for data fetching and inserting.

Also, most StreamSet based value queries (AlignedWindows, Windows, Values) are multithreaded by default. This leads to decent performance improvements for fetching and inserting data using the standard StreamSet api without any edits by the user. Refer to The StreamSet API

In addition to these improvements to the standard API, commercial customers also have access to additional accelerated data fetching and inserting methods that can dramatically speed up their workflows.

Apache Arrow Data Format#

While keeping our standard API consistent with our Point and StatPoint python object model, we have also created additional methods that will provide this same type of data, but in a tabular format by default. Leveraging the language-agnostic columnar data format Arrow, we can transmit our timeseries data in a format that is already optimized for data analytics with well-defined schemas that take the guesswork out of the data types, timezones, etc. To learn more about these methods, please refer to the arrow_ prefixed methods for both Stream and StreamSet objects and the StreamSet transformer methods.

True Multistream Support#

Until now, there has not been a true multistream query support, our previous api and with the new edits, emulates multistream support with StreamSet s and using multithreading. However, this will still only scale to an amount of streams based on the amount of threads that the python threadpool logic can support.

Due to this, raw data queries for StreamSet s using our arrow api StreamSet.filter(start=X, end=Y,).arrow_values() will now perform true multistream queries. The platform, instead of the python client, will now quickly grab all the stream data for all streams in your streamset, and then package that back to the python client in an arrow table! This leads to data fetch speedups on the order of 10-50x based on the amount and kind of streams.