Supporting the “Rapi” C-language API in an R-compatible engine
Michael Sannella
RIOT 2015
TERR, Packages, and Rapi
• TERR: TIBCO® Enterprise Runtime for R
  • A commercial R-compatible statistics engine
• We want TERR to support many external packages
  • Much of the value of R lies in packages from CRAN and other repositories (Bioconductor, GitHub, etc.)
• Many packages are pure “R-language” packages
  • About 70% of CRAN packages are pure R-language
• Many packages contain C-language code
  • C-language code can access the R engine using the R engine’s C-language API (“Rapi”)
RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 2
TERR Rapi Support
• Ultimate goal: TERR supports loading and executing binary packages unchanged
• Fallback: Modify package sources and rebuild before loading
  • If necessary, we can distribute modified packages for TERR in our TRAN repository
Example: Code Using Rapi
• R code from the CRAN package “splusTimeDate”:
.Call("time_to_hour_min_sec", x, timeZoneList())
• C code:
SEXP time_to_hour_min_sec(SEXP time_vec, SEXP zone_list)
{
  ...
  PROTECT(ret = NEW_LIST(4));
  ...
  SET_VECTOR_ELT(ret, 0, PROTECT(NEW_INTEGER(lng)));
  ...
  hour_data = INTEGER(VECTOR_ELT(ret, 0));
  ...
  hour_data[i] = td.hour;
  ...
  UNPROTECT(7);
  return( ret );
}
• NEW_INTEGER macro expands to a call to the Rf_allocVector Rapi entry
Rapi Shared Libraries: R.dll, etc.
• Rapi defines hundreds of library entries
  • ATTRIB, SET_VECTOR_ELT, R_alloc, Rf_allocVector, Rf_protect, dcopy_, dgemm_, dlaic1_, etc.
  • Embedding API (used by RStudio)
  • Global variables: R_GlobalEnv, R_NaInt, etc.
• Packages link to shared library files R.dll, Rblas.dll, etc. (libR.so, etc. on Linux) that export these Rapi entries
• To support Rapi, TERR contains R.dll, etc. libraries that forward Rapi calls to the engine
TERR Rapi Support: Handles
• Observation: Rapi entries treat SEXP (R object pointer) values as opaque values passed to and from the engine
• TERR approach: Treat SEXP as an offset in a “handle table” containing pointers to internal TERR objects
• Benefit: Can convert TERR-only objects “in-place” as needed
  • Ex: Expand TERR integer sequence object to an integer vector object when Rapi code calls INTEGER to access its contents
• Works well for many packages
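The handle approach can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: every name below (`EngineObj`, `Handle`, `handle_table`, `new_int_seq`, `toy_INTEGER`) is hypothetical and is not taken from TERR or from the R headers. It shows a handle that is an index into a table, and an accessor that expands a compact sequence object in place the first time its contents are requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model; none of these names come from TERR or R. */
typedef enum { OBJ_INT_SEQ, OBJ_INT_VEC } ObjKind;

typedef struct {
    ObjKind kind;
    int from;    /* first value; meaningful only for OBJ_INT_SEQ */
    int length;  /* number of elements */
    int *data;   /* element storage; set only for OBJ_INT_VEC */
} EngineObj;

/* The opaque value handed to package code: a table offset, not a pointer. */
typedef size_t Handle;

#define MAX_HANDLES 64
static EngineObj *handle_table[MAX_HANDLES];
static size_t handle_count = 0;

/* Register an engine object and return its handle. */
Handle make_handle(EngineObj *obj) {
    assert(handle_count < MAX_HANDLES);
    handle_table[handle_count] = obj;
    return handle_count++;
}

/* Create a compact sequence from, from+1, ..., from+length-1. */
Handle new_int_seq(int from, int length) {
    EngineObj *obj = malloc(sizeof *obj);
    assert(obj != NULL && length >= 0);
    obj->kind = OBJ_INT_SEQ;
    obj->from = from;
    obj->length = length;
    obj->data = NULL;
    return make_handle(obj);
}

/* Toy analogue of INTEGER(): expand a sequence in place on first access.
   The handle value held by the caller stays valid across the conversion. */
int *toy_INTEGER(Handle h) {
    assert(h < handle_count);
    EngineObj *obj = handle_table[h];
    if (obj->kind == OBJ_INT_SEQ) {
        size_t n = obj->length > 0 ? (size_t)obj->length : 1;
        obj->data = malloc(sizeof(int) * n);
        assert(obj->data != NULL);
        for (int i = 0; i < obj->length; ++i) {
            obj->data[i] = obj->from + i;
        }
        obj->kind = OBJ_INT_VEC;
    }
    return obj->data;
}
```

Because package code only ever holds the handle, the engine is free to change what the table entry describes without invalidating anything the package has stored.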
Problem: USE_RINTERNALS
• If the C preprocessor symbol USE_RINTERNALS is defined, many Rapi function calls are redefined as macros directly accessing R object internals
• USE_RINTERNALS is used in many popular CRAN packages: Rcpp, Rserve, igraph, etc.
• TERR workaround: Make our own versions of packages without USE_RINTERNALS defined
  • Some tweaks needed to compile:
STRING_ELT(x, i) = value; // change this
SET_STRING_ELT(x, i, value); // to this
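Why the assignment form stops compiling can be seen in a toy model. The sketch below is an illustration only: `ToyStrVec`, `TOY_USE_INTERNALS`, `TOY_STRING_ELT`, and `TOY_SET_STRING_ELT` are hypothetical stand-ins, not the real Rapi definitions. When the accessor is a macro it expands to an lvalue, so it can appear on the left of `=`; when it is a function it yields only a value, so the setter has to be used instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a character vector with four slots. */
typedef struct { const char *elts[4]; } ToyStrVec;

#ifdef TOY_USE_INTERNALS
/* Macro form: expands to an lvalue, so
   `TOY_STRING_ELT(x, i) = v;` compiles. */
#define TOY_STRING_ELT(x, i) ((x)->elts[(i)])
#else
/* Function form: returns a value, so
   `TOY_STRING_ELT(x, i) = v;` is a compile error. */
const char *TOY_STRING_ELT(const ToyStrVec *x, int i) {
    assert(i >= 0 && i < 4);
    return x->elts[i];
}
#endif

/* The setter works in both configurations. */
void TOY_SET_STRING_ELT(ToyStrVec *x, int i, const char *v) {
    assert(i >= 0 && i < 4);
    x->elts[i] = v;
}
```

Code written against the setter builds whether or not the internals symbol is defined, which is why the rewrite is a safe change for both engines.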
USE_RINTERNALS More Efficient?
• Can improve efficiency of some idioms:
for (int i = 0; i<LENGTH(obj); ++i) {
  INTEGER(obj)[i] = 0;
}
• However, it is easy to rework code to reduce function calls:
int len = LENGTH(obj);
int* data = INTEGER(obj);
for (int i = 0; i<len; ++i) {
  data[i] = 0;
}
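The difference between the two loops can be made concrete by counting accessor calls in a toy model. This is a sketch, not a benchmark: `ToyIntVec`, `counted_LENGTH`, `counted_INTEGER`, and `accessor_calls` are hypothetical names standing in for function-call versions of LENGTH and INTEGER.

```c
#include <assert.h>

/* Hypothetical stand-in for an integer vector object. */
typedef struct { int length; int *data; } ToyIntVec;

/* Counts every accessor call, standing in for calls into the engine. */
static long accessor_calls = 0;

int counted_LENGTH(const ToyIntVec *v) { ++accessor_calls; return v->length; }
int *counted_INTEGER(ToyIntVec *v)     { ++accessor_calls; return v->data; }

/* First idiom: both accessors are called on every iteration. */
void zero_naive(ToyIntVec *obj) {
    for (int i = 0; i < counted_LENGTH(obj); ++i) {
        counted_INTEGER(obj)[i] = 0;
    }
}

/* Reworked idiom: each accessor is called once, before the loop. */
void zero_hoisted(ToyIntVec *obj) {
    int len = counted_LENGTH(obj);
    int *data = counted_INTEGER(obj);
    for (int i = 0; i < len; ++i) {
        data[i] = 0;
    }
}
```

For a vector of n elements the first form makes 2n + 1 accessor calls (n + 1 length checks and n data fetches), while the reworked form makes 2 regardless of n.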
Solution: New Object Layout
• The TERR team is currently reworking the TERR object layout to be more compatible with R
  • SEXP is pointer to R-compatible object
  • TERR C++ header stored before R-compatible object bytes
[Figure: new object layout, with labels SObject*, SEXP, TERR C++ Object Header, R-Compatible Header, Data]
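A layout of this kind can be sketched in plain C. The example below is a toy model under stated assumptions: the struct names and fields (`ToyEngineHeader`, `ToyRHeader`, `ToyObject`, `ToySEXP`) are hypothetical and do not reflect the real TERR or R headers. It shows an engine header stored immediately before the R-compatible part, with the package-visible pointer aimed at the R-compatible header and pointer arithmetic used to get back to the engine object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical headers; the real TERR and R headers differ. */
typedef struct { int refcount; int flags; } ToyEngineHeader;
typedef struct { int type; int length; } ToyRHeader;

/* One allocation: engine header, then R-compatible header, then data. */
typedef struct {
    ToyEngineHeader engine;
    ToyRHeader r;
    /* element data follows the struct in the same allocation */
} ToyObject;

/* What package code sees: a pointer to the R-compatible part. */
typedef ToyRHeader *ToySEXP;

/* Allocate an integer-vector-like object with zeroed data. */
ToyObject *toy_alloc_int(int length) {
    assert(length >= 0);
    ToyObject *obj = calloc(1, sizeof(ToyObject) + sizeof(int) * (size_t)length);
    assert(obj != NULL);
    obj->engine.refcount = 1;
    obj->r.length = length;
    return obj;
}

/* Engine pointer -> package-visible pointer. */
ToySEXP to_sexp(ToyObject *obj) { return &obj->r; }

/* Package-visible pointer -> engine pointer, stepping back over the header. */
ToyObject *from_sexp(ToySEXP s) {
    return (ToyObject *)((char *)s - offsetof(ToyObject, r));
}

/* Data starts right after the R-compatible header. */
int *toy_data(ToySEXP s) { return (int *)(s + 1); }
```

With this arrangement, code that reads fields at fixed offsets from the package-visible pointer never touches the engine header, while the engine can still recover its own object from any such pointer.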
New Object Layout: Issues
• Issue: List object must contain array of SEXP pointers, not TERR C++ object pointers
• Issue: TERR-only objects (like sequence objects) must be converted before exposing to Rapi code
  • Only allow R-compatible objects within “Rapi space”
  • Convert arguments to .Call
  • Convert value returned by Rapi entries Rf_eval, etc.
• Concerns: Extra complexity, object size, performance hit
Beyond USE_RINTERNALS: The data.table Package
• The data.table package exploits knowledge of engine behavior and object layout to improve performance
  • Manipulates TRUELENGTH field to reuse vectors
  • Uses TRUELENGTH field of CHARSXP for its own purposes during sorting
  • Reads object bits to access string encoding quickly without Rapi function calls
• Problem: It is coding to a particular implementation of the R engine, rather than to a well-defined API
Beyond USE_RINTERNALS: The stringi Package
• Recent discovery:
extern "C" void R_init_stringi(DllInfo* dll)
{
...
stri_set_icu_data_directory(
(char*)*(char**)(dll) /* dll->path */);
...
}
• Uses knowledge of internal DllInfo data structure
  • May break if the DllInfo structure changes
  • Would be better to get the path some other way
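The fragility of the cast can be shown with two toy structs. This is an illustration only: `ToyDllInfoV1`, `ToyDllInfoV2`, and `path_by_cast` are hypothetical, and the field order in the real DllInfo is not asserted here. The cast simply reads whatever pointer happens to be the first member, so reordering the fields silently changes what it returns.

```c
#include <assert.h>

/* Hypothetical layouts; not the real DllInfo definition. */
typedef struct { const char *path; const char *name; } ToyDllInfoV1;
typedef struct { const char *name; const char *path; } ToyDllInfoV2; /* fields reordered */

/* Same trick as the cast in the slide: treat the struct as if its
   first member were the path, without naming the field. */
const char *path_by_cast(const void *dll) {
    return *(const char *const *)dll;
}
```

With the first layout the cast yields the path; with the second it yields the name, and the compiler gives no warning in either case. Accessing a named field, or an engine-provided accessor where one exists, would fail at compile time instead of at run time.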
Challenge: TERR vs R GC Behavior
• TERR does more aggressive GC, collecting an object immediately when its reference count goes to zero
• Matrix package: code swapping dimnames
// Csparse.c:Csparse_transpose
SEXP dn = PROTECT(duplicate(GET_SLOT(x, Matrix_DimNamesSym)));
tmp = VECTOR_ELT(dn, 0);
SET_VECTOR_ELT(dn, 0, VECTOR_ELT(dn, 1)); /* GC tmp here? */
SET_VECTOR_ELT(dn, 1, tmp);
• Problem: TERR collects tmp immediately if SET_VECTOR_ELT reduces tmp reference count to zero
• Solution: Change SET_VECTOR_ELT so it won’t kill old element
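The hazard and one possible remedy can be modelled with a toy reference-counted list. This is a sketch under stated assumptions: all names (`ToyObj`, `ToyList`, `set_elt_eager`, `set_elt_deferred`, `drain_pending`, `swap_with`) are hypothetical, the `alive` flag stands in for actually freeing memory, and deferring the release of the old element is just one way to keep it from being killed, not necessarily what TERR does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference-counted object; alive == 0 stands in for "collected". */
typedef struct { int refcount; int alive; int value; } ToyObj;
typedef struct { ToyObj *elts[2]; } ToyList;

static void release(ToyObj *o) {
    if (o != NULL && --o->refcount == 0) {
        o->alive = 0; /* a real engine would free the object here */
    }
}

/* Eager variant: the old element is released immediately. */
void set_elt_eager(ToyList *l, int i, ToyObj *v) {
    ToyObj *old = l->elts[i];
    if (v != NULL) ++v->refcount;
    l->elts[i] = v;
    release(old);
}

/* Deferred variant: the old element is parked and released later. */
#define MAX_PENDING 16
static ToyObj *pending[MAX_PENDING];
static int n_pending = 0;

void set_elt_deferred(ToyList *l, int i, ToyObj *v) {
    ToyObj *old = l->elts[i];
    if (v != NULL) ++v->refcount;
    l->elts[i] = v;
    if (old != NULL) {
        assert(n_pending < MAX_PENDING);
        pending[n_pending++] = old;
    }
}

/* Release everything parked by set_elt_deferred. */
void drain_pending(void) {
    while (n_pending > 0) release(pending[--n_pending]);
}

typedef void (*SetEltFn)(ToyList *, int, ToyObj *);

/* The swap idiom from the slide. A C local such as tmp is invisible to
   the reference counts. Returns 1 if tmp was still alive when stored back. */
int swap_with(ToyList *l, SetEltFn set_elt) {
    ToyObj *tmp = l->elts[0];
    set_elt(l, 0, l->elts[1]);
    int tmp_alive = tmp->alive;
    set_elt(l, 1, tmp);
    return tmp_alive;
}
```

In this model the eager setter drops the first element's count to zero before it is stored back, while the deferred setter keeps it alive until `drain_pending` runs, by which time the list holds a reference to it again.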
Challenge: SET_TYPEOF
• SET_TYPEOF changes an object type
• Mainly used for converting pairlist to language type
• Using Rapi handles, SET_TYPEOF can change object in place
• With USE_RINTERNALS defined, SET_TYPEOF just changes type bits
  • Problem: We can’t change TERR C++ object type
• Possibility: Auto-convert pairlist object as needed in engine, if type bits are set to language type
Bad code: .Call returning void
• In R, .Call can call a function returning void
  • Ex: in RSQLite:
void rsqlite_driver_init(SEXP records_,
                         SEXP cache_)
• As long as the value is not used, no problem
• In TERR, .Call manipulates the returned object
• TERR can check for valid return object pointers, but this has a performance cost
• Possibility: Special processing of .Call where value is not used
• Possibility: Turn on/off valid-pointer check
• Q: worth supporting bad code?
TERR Rapi Support: Status
• TERR supports many packages with Rapi code, using handles to SEXP objects
• We are reworking TERR object layout to support packages that access object internals via USE_RINTERNALS
• We are dealing with a number of compatibility challenges