Changelog
Pantab 5.2.2 (2025-05-15)
Bug Fixes
Fixed an issue where writing temporal or binary view data in batches would yield incorrect results
Fixed an issue where dates beyond 2262-04-11 and before 1677-09-21 were not writing properly
Pantab 5.2.1 (2025-04-10)
Bug Fixes
Fixed an issue where reading a Hyper file with pantab would occasionally crash (#392)
Pantab 5.2.0 (2024-11-07)
New Features
Larger than RAM Read Support
By default, reading from a Hyper database materializes all of the query results into memory. This can be problematic when dealing with a result set that exceeds the amount of available RAM on your machine.
With pantab 5.2 we are adding support for passing result sets through the Arrow PyCapsule Interface by specifying return_type="stream" and, optionally, a chunk_size= argument. The object returned can be forwarded along to the stream reader of your choice to iterate over the result set in batches.
To illustrate, let’s take a look at code using PyArrow’s RecordBatchReader:
import pantab as pt
import pyarrow as pa

tbl = pa.table({"col1": range(4), "col2": list("abcd")})
pt.frame_to_hyper(tbl, "example.hyper", table="test")

stream = pt.frame_from_hyper(
    "example.hyper",
    table="test",
    return_type="stream",
    chunk_size=2,
)
reader = pa.RecordBatchReader.from_stream(stream)
for batch in reader:
    print(batch)
If you do not specify a chunk_size argument, Tableau should by default provide back results in 256MB blocks of data.
Note that writing larger-than-RAM datasets is already natively supported by pantab; this version brings the same consistency to the read side.
Improved Write Performance
Prior to the 5.2.0 release, pantab would first try to write to a Hyper file in your temporary directory, only copying it to the desired location when a write was successful. In cases where the file you are trying to write did not exist, this unnecessary copy would add performance overhead.
In cases where you are trying to overwrite or append to an existing file, pantab by default will still write to the temporary directory so as to avoid corrupting the existing Hyper file in case of write failure. However, if you do not care to preserve the existing file at all, even in case of subsequent write failures, you can provide the atomic=False keyword to avoid use of the temporary directory.
import pantab as pt
import pyarrow as pa

tbl = pa.table({"col1": range(4), "col2": list("abcd")})
for _ in range(100):
    pt.frame_to_hyper(tbl, "example.hyper", table="test", table_mode="append")
Prior to 5.2.0 the preceding code block would exhibit O(n) performance, with n representing the number of loop iterations. With 5.2.0 and using atomic=False, the performance is closer to O(1).
Other Features
Pantab 5.2.0 is the first release to officially provide Python 3.13 wheels
Pantab 5.1.0 (2024-10-09)
New Features
Control Hyper Database Version
Hyper files provide a "database version" that controls features and backwards compatibility. To support many of the new features in pantab 5.0.0, this value was set to a database version of "4".
However, this had the unintended consequence of making pantab-created files incompatible with older tools. To fix this, we have set the default version to 2 and added a process_params argument to the pantab API which allows you to override this setting.
import pandas as pd
import pantab as pt

# single precision float support requires database version 4+
df = pd.DataFrame(
    {"float32": pd.Series([3.14], dtype="float32")}
)
pt.frame_to_hyper(
    df,
    "example.hyper",
    table="test",
    process_params={"default_database_version": "4"},
)
Ultimately there is not one default value that will work for all users. For details specific to this parameter and its effects, please refer to Tableau’s default_database_version parameter documentation.
tableauhyperapi Package is Now an Optional Dependency
Historically we have required users to install the tableauhyperapi Python package alongside pantab. However, this can cause issues when a user installs a version of the tableauhyperapi package that is incompatible with the binaries provided already by pantab. There is no great way to prevent this via Python packaging, and it can yield subtle or even outright bugs, especially on Windows platforms.
Aside from the packaging difficulties, dropping this dependency now makes pantab a much smaller installation. While the exact amount of disk space saved will depend upon the platform, users should expect to save roughly 250 MB of disk space per installation.
If you decide to still install both, pantab will continue to work alongside the tableauhyperapi and can continue to accept the TableName / Name objects from that library. However, for Windows users in particular, we advise against installing both packages.
Other Features
Implemented write support for binary view types, which polars uses by default for strings (#340)
Implemented write support for dictionary-encoded strings (aka categoricals)
Improved performance and reduced memory usage (#343)
Bug Fixes
Fixed a bug where leading decimal places were not being transmitted properly (#342)
Pantab 5.0.0 (2024-08-21)
New Features
Support for Apple Silicon
As one of our most requested features, we are happy to announce that pantab now runs natively on Apple Silicon processors without having to use Rosetta 2. This provides significant performance and usability improvements to macOS users with M1 chips or newer. For users of Intel chips, pantab still distributes an x86 wheel.
New Data Type Support
With the 5.0.0 release, pantab now supports writing decimal data types, akin to those offered by the decimal module built into Python. For users that need precise calculations, these are highly encouraged over floating point data types.
Speaking of floating point… pantab now also offers support for single-precision types. Previously, single-precision values were cast to double precision when being written to Hyper; now the types can be fully roundtripped.
Faster Datetime Writes
In the 5.0.0 release, the performance when writing datetime types (both timezone-aware and timezone-naive) has been improved by about 20%.
Other Features
pantab no longer generates a hyperd.log file (#292)
pantab now defaults to producing Hyper files using database version 4 (#313)
Improved error message when trying to write an unsupported data type to hyper (#293)
NumPy is no longer required when building pantab (#315)
Bugfixes
Fixed issue on Unix systems where an incompatible binary installed via the tableauhyperapi package could interfere with the version bundled into pantab (#302). In theory, this issue may still affect Windows users (see #318)
Pantab 4.1.0 (2024-04-02)
New Features
Add support for dataframes that have 8 bit integers. When written to Hyper, these will be stored as 16 bit integers. (#280)
Bugfixes
Bumped the minimum version of the tableauhyperapi to 0.0.18618 to fix a bug where reading from Hyper files would throw a RuntimeError (#283)
Pantab 4.0.1 (2024-03-21)
Bugfixes
Fixed issue where dates were not properly written to Hyper (#282)
Pantab 4.0.0 (2024-03-15)
pantab 4.0 represents the most significant change to the library since it began 6 years ago. Please note 4.0 introduces breaking changes to the API. When in doubt, users should pin pantab to the 3.x series in production and test before upgrading.
New Features
Support for pandas, pyarrow, polars and more!
The original design of pantab was heavily tied to the internals of pandas. Unfortunately, the type system pandas inherited from NumPy was not an ideal match for translating into Hyper types. Since that time, the Arrow Columnar Format has helped immensely to standardize the way libraries could efficiently exchange data. As a result, pantab can support exchanging information from pandas, pyarrow and polars dataframes with relative ease.
All of the following solutions will work:
>>> import pantab as pt
>>> import pandas as pd
>>> df = pd.DataFrame({"col": [1, 2, 3]})
>>> pt.frame_to_hyper(df, "example.hyper", table="test")
>>> import pyarrow as pa
>>> tbl = pa.Table.from_arrays([pa.array([1, 2, 3])], names=["col"])
>>> pt.frame_to_hyper(tbl, "example.hyper", table="test")
>>> import polars as pl
>>> df = pl.DataFrame({"col": [1, 2, 3]})
>>> pt.frame_to_hyper(df, "example.hyper", table="test")
As far as reading is concerned, you can control the type of DataFrame you receive back via the return_type keyword. pandas remains the default:
>>> pt.frame_from_hyper("example.hyper", table="test") # pandas by default
col
0 1
1 2
2 3
>>> pt.frame_from_hyper("example.hyper", table="test", return_type="pyarrow")
pyarrow.Table
col: int64
----
col: [[1,2,3]]
>>> pt.frame_from_hyper("example.hyper", table="test", return_type="polars")
shape: (3, 1)
┌─────┐
│ col │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
│ 3 │
└─────┘
Note
Any library that implements the Arrow PyCapsule Interface can be written via pantab; reading into such a library would require explicit development
Read any Hyper file
Prior to the 4.0 release, pantab worked well as a “self-contained” system, i.e. it could roundtrip files that it itself created. However, pantab struggled to read Hyper files created by other tools. With 4.0, pantab makes a promise to be able to read any Hyper file regardless of the types therein.
Native Date/Time Support
pandas historically only had a timestamp type with nanosecond precision from the Unix epoch. Thanks to the Arrow type system, users can now write dates and even times:
>>> import datetime
>>> import pantab as pt
>>> import pyarrow as pa
>>> tbl = pa.Table.from_arrays([pa.array([datetime.date(2024, 1, 1)])], names=["col"])
>>> pt.frame_to_hyper(tbl, "example.hyper", table="test")  # this will now write dates!
Write JSON / Geography
Arrow does not have a native JSON string type nor a geography type. To work around this, you may still pass in either type as a string and use the json_columns and geo_columns arguments respectively, providing a set of the applicable column names. pantab takes care of the rest!
>>> import pantab as pt
>>> import pandas as pd
>>> df = pd.DataFrame({"json_col": ['{"foo": 42}']})
>>> pt.frame_to_hyper(df, "example.hyper", table="test", json_columns={"json_col"})
>>> import polars as pl
>>> df = pl.DataFrame({"geo_col": ["point(-122.338083 47.647528)"]})
>>> pt.frame_to_hyper(df, "example.hyper", table="test", geo_columns={"geo_col"})
Note
The Hyper API reads back geography types in a proprietary binary format. You can still write this back via pantab, but note that you cannot roundtrip WKT as in the above example
Better Performance
Reading in particular has much improved performance thanks to the new design. Compared to pantab 3.X, reads in pantab 4.0 are at least 5x faster and use only 20% of the memory
Miscellaneous
By default all columns written via pantab are assumed to be nullable. You can override this behavior by passing a set of column names to the not_null_columns argument when writing
pantab will now handle duplicate column names during reads by appending _n to each duplicate, where n represents the 0-based counter of a given column name’s occurrence
Backwards incompatible changes
The ability to provide your own existing Hyper connection or process to pantab has been removed. This was removed due to the perceived incompatibility between the 3.X and 4.X designs, and the development effort would be rather large for what is believed to be a seldom-used feature
pantab no longer reads / writes pandas Timedelta types. Users should instead use the Arrow interval types, which align more closely with the concept of an INTERVAL in the Hyper database
Bug Fixes
Fixed a segmentation fault when writing certain frames (#240)
Fixed a memory error when writing empty frames (#172)
Pantab 3.0.3 (2023-12-18)
Fixed issue with reading VARCHAR columns from Hyper files (#210)
Pantab 3.0.2 (2023-11-13)
Fixed issue with NumPy throwing RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xe
Pantab 3.0.1 (2023-10-09)
Special thanks to Abhinav Dhulipala for contributing to this release.
Fixed issue where timezone-aware datetimes with pandas >= 1.4 would write incorrect values to Hyper (#186)
Fixed issue where a query returning an empty result set from Hyper would raise ValueError (#163)
Pantab 3.0.0 (2022-09-14)
Implemented a new use_parquet keyword in frame_to_hyper, which uses Parquet as an intermediate storage solution instead of pantab’s own internal C library. This may provide a small performance boost at the cost of additional disk usage
Fixed issue where pantab was not compatible with Hyper versions 0.0.14567 and above.
Pantab 2.1.1 (2022-04-13)
Fixed a memory leak with frame_to_hyper
Fixed issue where pantab.__version__ was misreporting the version string
Pantab 2.1.0 (2021-07-02)
Special thanks to Caleb Overman for contributing to this release.
Enhancements
A new use_float_na parameter has been added to reading functions, which will convert doubles from Hyper files to the pandas Float64 extension dtype rather than using the standard NumPy float dtype (#131)
Writing Float32 and Float64 dtypes is now supported (#131)
Writing to a Hyper file is now up to 50% faster (#132)
Pantab 2.0.0 (2021-04-15)
Special thanks to Adrian Vogelsgesang for contributing to this release.
API Breaking Changes
Users may now pass an existing connection as the first argument to pantab’s read functions. As part of this, the first argument was renamed from database to source (#123)
Enhancements
Added support for Python 3.9 while dropping support for 3.6 (#122)
A new frame_from_hyper_query method has been added, providing support for executing SQL statements against a Hyper file (#118)
Users may now create their own Hyper process and pass it as an argument to the reading and writing functions (#39, #51)
The value 0001-01-01 will no longer be read as a NULL timestamp (#121)
Pantab 1.1.1 (2020-11-02)
Bugfixes
Fixed issue where pantab would throw TypeError: Column "COLUMN_NAME" has unsupported datatype TEXT when reading non-nullable string columns from Hyper (#111)
Pantab 1.1.0 (2020-04-30)
Special thanks to Adrian Vogelsgesang for contributing to this release.
Features
Added support for reading Hyper DATE columns as datetime64 objects in pandas (#94)
Bugfixes
Fixed issue where Python would crash instead of throwing an error when reading invalid records from a Hyper file (#77)
Fixed ImportError when building from source with tableauhyperapi versions 0.0.10309 and greater (#88)
Attempting to read a Hyper extract with unsupported data types will now raise a TypeError (#92)
Pantab 1.0.1 (2020-02-03)
Features
pantab will now automatically install the tableauhyperapi as a dependency when installing via pip (#83)
Pre-built wheels for manylinux configurations are now available. (#84)
Pantab 1.0.0 (2020-01-15)
Special thanks to chillerno1 for contributing to this release.
Features
pantab now supports reading/writing pandas 1.0 dtypes, namely the boolean and string dtypes. (#20)
Important
TEXT data read from a Hyper extract will be stored in a string dtype when using pandas 1.0 or greater in combination with pantab 1.0 or greater. Older versions of either tool will read the data back into an object dtype.
Bugfixes
Fixed potential segfault on systems where not all addresses can be expressed in an unsigned long long. (#52)
Pantab 0.2.3 (2020-01-02)
Bugfixes
Fixed issue where dates would roundtrip in pantab fine but would either error or be incorrect in Tableau Desktop (#66)
Pantab 0.2.2 (2019-12-25)
Bugfixes
Pantab now writes actual NULL values for datetime columns, rather than 0001-01-01 00:00:00 (#60)
Pantab 0.2.1 (2019-12-23)
Bugfixes
Fixed issue where reading a datetime column containing pd.NaT values would throw an OutOfBoundsDatetime error (#56)
Fixed issue where reading a timedelta column containing pd.NaT would throw a ValueError (#57)
Pantab 0.2.0 (2019-12-19)
Features
Improved performance when reading data from Hyper extracts (#34)
0.1.1 (2019-12-06)
A special thank you goes out to the following contributors leading up to this release:
Bugfixes
Fixed issue where source installations would error with fatal error: tableauhyperapi.h: No such file or directory (#40)
0.1.0 (2019-11-29)
pantab is officially out of beta! Thanks for all of the feedback and support of the tool so far.
Special thanks to Adrian Vogelsgesang and Jan Finis at Tableau, who offered guidance and feedback on performance improvements in this release.
Improved error messaging when attempting to write invalid data. (#19)
Write-performance of Hyper extracts has been drastically improved for larger datasets. (#31)
Less memory is now required to write DataFrames to the Hyper format. (#33)
0.0.1.b5 (2019-11-05)
Bugfixes
Fixed issue where failures during append mode (table_mode="a") would delete original Hyper file. (#17)
0.0.1.b4 (2019-11-05)
Features
frame_to_hyper and frames_to_hyper now support a table_mode keyword argument. table_mode="a" will append data to existing tables, or create them if they do not exist. The default operation of table_mode="w" will continue to fully drop / reload tables. (#14)
0.0.1.b3 (2019-11-01)
Features
Added support for nullable integer types (i.e. the “Int*” types in pandas). Current integer types will now show as NOT_NULLABLE in Hyper extracts. (#7)
Added support for reading / writing UTC timestamps, rather than only timezone-naive. (#8)
Bugfixes
Fixed issue where certain versions of pantab in combination with certain versions of the Hyper API would throw “TypeError: __init__() got an unexpected keyword argument ‘name’” when generating Hyper extracts. (#10)