16
Supporting the “Rapi” C-language API in an R-compatible engine Michael Sannella [email protected] RIOT 2015

Supporting the "Rapi" C-laguage API in an R-compatible engine

Embed Size (px)

Citation preview

Supportingthe “Rapi” C-language APIin an R-compatible engine

Michael Sannella

[email protected]

RIOT 2015

TERR, Packages, and Rapi

• TERR: TIBCO® Enterprise Runtime for R • A commercial R-compatible statistics engine

• We want TERR to support many external packages• Much of the value of R lies in packages from CRAN and

other repositories (Bioconductor, Github, etc.)

• Many packages are pure “R-language” packages• About 70% of CRAN packages are pure R-language

• Many packages contain C-language code• C-language code can access the R engine using the R

engine’s C-language API (“Rapi”)

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 2

TERR Rapi Support

• Ultimate goal: TERR supports loading and executing binary packages unchanged

• Fallback: Modify package sources and rebuild before loading• If necessary, we can distribute modified packages for

TERR in our TRAN repository

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 3

Example: Code Using Rapi

• R code from the CRAN package “splusTimeDate”:

.Call("time_to_hour_min_sec", x, timeZoneList())

• C code:

SEXP time_to_hour_min_sec(SEXP time_vec,SEXP zone_list)

{...PROTECT(ret = NEW_LIST(4));...SET_VECTOR_ELT(ret, 0, PROTECT(NEW_INTEGER(lng)));...hour_data = INTEGER(VECTOR_ELT(ret, 0));...hour_data[i] = td.hour;...UNPROTECT(7);return( ret );

}

• NEW_INTEGER macro expands to a call to the Rf_allocVector Rapi entry

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 4

Rapi Shared Libraries: R.dll, etc.

• Rapi defines hundreds of library entries• ATTRIB, SET_VECTOR_ELT, R_alloc, Rf_allocVector, Rf_Protect, dcopy_, dgemm_, dlaic1_, etc.

• Embedding API (used by Rstudio)• Global variables: R_GlobalEnv, R_NaInt, etc.

• Packages link to shared library files R.dll, Rblas.dll, etc. (libR.so, etc. on Linux) that export these Rapientries

• To support Rapi, TERR contains R.dll, etc. libraries that forward Rapi calls to the engine

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 5

TERR Rapi Support: Handles

• Observation: Rapi entries treat SEXP (R object pointer) values as opaque values passed to and from the engine

• TERR approach: Treat SEXP as an offset in a “handle table” containing pointers to internal TERR objects

• Benefit: Can convert TERR-only objects “in-place” as needed• Ex: Expand TERR integer sequence object to an integer

vector object when Rapi code calls INTEGER to access its contents

• Works well for many packages

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 6

Problem: USE_RINTERNALS

• If the C constant USE_RINTERNALS is defined, many Rapi function calls are redefined as macros directly accessing R object internals

• USE_RINTERNALS is used in many popular CRAN packages: Rcpp, Rserve, igraph, etc

• TERR workaround: Make our own versions of packages without USE_RINTERNALS defined• Some tweaks needed to compile:

STRING_ELT(x, i) = value; // change this

SET_STRING_ELT(x, i, value); // to this

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 7

USE_RINTERNALS More Efficient?

• Can improve efficiency of some idioms:for (int i = 0; i<LENGTH(obj); ++i) {

INTEGER(obj)[i] = 0;

}

• However, it is easy to rework code to reduce function calls:int len = LENGTH(obj);

int* data = INTEGER(obj);

for (int i = 0; i<len; ++i) {

data[i] = 0

}

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 8

Solution: New Object Layout

• The TERR team is currently reworking the TERR object layout to be more compatible with R• SEXP is pointer to R-compatible object

• TERR C++ header stored before R-compatible object bytes

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 9

SObject* SEXP

TERR C++

Object Header

R-Compatible

Header

Data

New Object Layout: Issues

• Issue: List object must contain array of SEXP pointers, not TERR C++ object pointers

• Issue: TERR-only objects (like sequence objects) must be converted before exposing to Rapi code• Only allow R-compatible objects within “Rapi space”

• Convert arguments to .Call

• Convert value returned by Rapi entries Rf_eval, etc.

• Concerns: Extra complexity, object size, performance hit

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 10

Beyond USE_RINTERNALS:The data.table Package• The data.table package exploits knowledge of

engine behavior and object layout to improve performance• Manipulates TRUELENGTH field to reuse vectors

• Uses TRUELENGTH field of CHARSXP as for own uses during sorting

• Reads object bits to access string encoding quickly without Rapi function calls

• Problem: It is coding to a particular implementation of the R engine, rather than to a well-defined API

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 11

Beyond USE_RINTERNALS:The stringi Package• Recent discovery:

extern "C" void R_init_stringi(DllInfo* dll)

{

...

stri_set_icu_data_directory(

(char*)*(char**)(dll) /* dll->path */);

...

}

• Uses knowledge of internal DllInfo data structure• May break if DllInfo structure changed• Would be better to get path some other way

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 12

Challenge: TERR vs R GC Behavior

• TERR does more aggressive GC, collecting an object immediately when its reference count goes to zero

• Matrix package: code swapping dimnames// Csparse.c:Csparse_transposeSEXP dn = PROTECT(duplicate(GET_SLOT(x, Matrix_DimNamesSym)));tmp = VECTOR_ELT(dn, 0);SET_VECTOR_ELT(dn, 0, VECTOR_ELT(dn, 1)); /* GC tmp here? */SET_VECTOR_ELT(dn, 1, tmp);

• Problem: TERR collects tmp immediately if SET_VECTOR_ELT

reduces tmp reference count to zero

• Solution: Change SET_VECTOR_ELT so it won’t kill old element

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 13

Challenge: SET_TYPEOF

• SET_TYPEOF changes an object type

• Mainly used for converting pairlist to language type

• Using Rapi handles, SET_TYPEOF can change object in place

• With USE_RINTERNALS defined, SET_TYPEOF just changes type bits• Problem: We can’t change TERR C++ object type

• Possibility: Auto-convert pairlist object as needed in engine, if type bits are set to language type

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 14

Bad code: .Call returning void

• In R, .Call can call function returning void• Ex: in RSQLite:

void rsqlite_driver_init(SEXP records_,

SEXP cache_)

• As long as the value is not used, no problem• In TERR, .Call manipulates returned object

• TERR can check for valid return object pointers, but this has a performance cost

• Possibility: Special processing of .Call where value is not used

• Possibility: Turn on/off valid-pointer check• Q: worth supporting bad code?

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 15

TERR Rapi Support: Status

• TERR supports many packages with Rapi code, using handles to SEXP objects

• We are reworking TERR object layout to support packages that access object internals via USE_RINTERNALS

• We are dealing with a number of compatibility challenges

RIOT 2015 --- July 2015 © Copyright 2000-2015 TIBCO Software Inc. 16