Wrapper around hdf5r::guess_dtype, allowing
for the customization of string types rather than defaulting to
variable-length ASCII-encoded strings. Also encodes logicals as
H5T_INTEGER instead of H5T_LOGICAL
to ensure cross-language compatibility (controlled via
package options)
Arguments
- x
The object for which to guess the HDF5 datatype or the dimension or the number of elements
- stype
Type of string encoding to use, choose from:
- utf8
Variable-width, UTF-8
- ascii7
Fixed-width (7 bits), ASCII
- ...
Arguments passed on to
hdf5r::guess_dtypeds_dimCan explicitly set the dimension of the dataset object. For
scalar, this is one. Otherwise, this can be used so that a multi-dimensional object can be represented so that some of its dimension are in the dataset, and some are inside anH5T_ARRAYscalarShould the datatype be created so that
xcan be represented as a scalar with that datatype? This is intended to know if a vector/array should be represented as anH5T_ARRAYor not.string_lenIf a string is in the R object, the length to which the corresponding HDF5 type should be set. If it is a positive integer, the string is of that length. If it is
Inf, it is variable length. If it is set toestimate, it is set to the length of the longest string in thex.
Value
An object of class H5T
Examples
# \donttest{
# Characters can either be variable-width UTF8-encoded or
# fixed-width ASCII-encoded
srtdisk:::GuessDType(x = 'hello')
#> Class: H5T_STRING
#> Datatype: H5T_STRING {
#> STRSIZE H5T_VARIABLE;
#> STRPAD H5T_STR_NULLTERM;
#> CSET H5T_CSET_UTF8;
#> CTYPE H5T_C_S1;
#> }
srtdisk:::GuessDType(x = 'hello', stype = 'ascii7')
#> Class: H5T_STRING
#> Datatype: H5T_STRING {
#> STRSIZE 7;
#> STRPAD H5T_STR_NULLTERM;
#> CSET H5T_CSET_ASCII;
#> CTYPE H5T_C_S1;
#> }
# Data frames are a compound type; character columns follow the same rules
# as character vectors
df <- data.frame(x = c('g1', 'g2', 'g3'), y = 1, 2, 3, stringsAsFactors = FALSE)
srtdisk:::GuessDType(x = df)
#> Class: H5T_COMPOUND
#> Datatype: H5T_COMPOUND {
#> H5T_STRING {
#> STRSIZE H5T_VARIABLE;
#> STRPAD H5T_STR_NULLTERM;
#> CSET H5T_CSET_UTF8;
#> CTYPE H5T_C_S1;
#> } "x" : 0;
#> H5T_IEEE_F64LE "y" : 8;
#> H5T_IEEE_F64LE "X2" : 16;
#> H5T_IEEE_F64LE "X3" : 24;
#> }
srtdisk:::GuessDType(x = df, stype = 'ascii7')
#> Class: H5T_COMPOUND
#> Datatype: H5T_COMPOUND {
#> H5T_STRING {
#> STRSIZE 7;
#> STRPAD H5T_STR_NULLTERM;
#> CSET H5T_CSET_ASCII;
#> CTYPE H5T_C_S1;
#> } "x" : 0;
#> H5T_IEEE_F64LE "y" : 7;
#> H5T_IEEE_F64LE "X2" : 15;
#> H5T_IEEE_F64LE "X3" : 23;
#> }
# Logicals are turned into integers to ensure compatibility with Python
# TRUE evaluates to 1, FALSE to 0, and NA to 2
srtdisk:::GuessDType(x = c(TRUE, FALSE, NA))
#> Class: H5T_INTEGER
#> Datatype: H5T_STD_I32LE
# }