Title: Computations over Distributed Data without Aggregation
Description: Implementing algorithms and fitting models when sites (possibly remote) share computation summaries rather than actual data over HTTP with a master R process (using 'opencpu', for example). A stratified Cox model and a singular value decomposition are provided; the underlying Cox model code is derived from that in the R 'survival' package. Sites may provide data via several means: CSV files, the REDCap API, etc. An extensible design allows new methods to be added in the future and includes facilities for local prototyping and testing. Web applications are provided (via 'shiny') for the implemented methods to help in designing and deploying the computations.
Authors: Balasubramanian Narasimhan [aut, cre], Marina Bendersky [aut], Sam Gross [aut], Terry M. Therneau [ctb], Thomas Lumley [ctb]
Maintainer: Balasubramanian Narasimhan <[email protected]>
License: LGPL (>= 2)
Version: 1.3-4
Built: 2024-10-29 05:02:10 UTC
Source: https://github.com/bnaras/distcomp
The function availableComputations
returns a list
of available computations with various components. The names of this list
(with no spaces) are unique canonical tags that are used throughout the
package to unambiguously refer to the type of computation; web applications
particularly rely on this list to instantiate objects. As more computations
are implemented, this list is augmented.
availableComputations()
a list with the following components for each computation:

desc: a textual description (25 chars at most)
definitionApp: the name of a function that will fire up a shiny webapp for defining the particular computation
workerApp: the name of a function that will fire up a shiny webapp for setting up a worker site for the particular computation
masterApp: the name of a function that will fire up a shiny webapp for setting up a master for the particular computation
makeDefinition: the name of a function that will return a data frame with appropriate fields needed to define the particular computation, assuming that they are populated in a global variable. This function is used by web applications to construct a definition object based on inputs specified by the users. Since the full information is often gathered incrementally by several web applications, the inputs are set in a global variable and therefore retrieved here using the function getComputationInfo
makeMaster: a function that will construct a master object for the computation given the definition and a logical flag indicating if debugging is desired
makeWorker: a function that will construct a worker object for that computation given the definition and data
availableComputations()
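For instance, a quick way to inspect the canonical tags and descriptions (a minimal sketch; the tag names shown in the comment are illustrative, not guaranteed):

comps <- availableComputations()
names(comps)                        # canonical tags, e.g. "StratifiedCoxModel", "RankKSVD"
sapply(comps, function(x) x$desc)   # the short textual descriptions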
The function availableDataSources returns the currently implemented data sources, such as CSV files, the REDCap API, etc.
availableDataSources()
a named list, each of whose elements is another list with the required fields desc (a textual description) and requiredPackages
availableDataSources()
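A minimal inspection sketch:

str(availableDataSources())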
CoxMaster objects instantiate and run a distributed Cox model fit computation. (See also CoxWorker worker objects.)
new()
Create a new CoxMaster object to instantiate and run a distributed Cox model fit.
CoxMaster$new(defn, debug = FALSE)
defn
a computation definition
debug
a flag for debugging, default FALSE
an R6 CoxMaster object
kosher()
Check if inputs and state of object are sane. For future use
CoxMaster$kosher()
TRUE
or FALSE
logLik()
Return the partial log likelihood on all data for given beta
parameter.
CoxMaster$logLik(beta)
beta
the parameter vector
a named list with three components: value
contains the value of the
log likelihood, gradient
contains the score vector, and hessian
contains
the estimated hessian matrix
addSite()
Add a url or worker object for a site for participating in the distributed computation. The worker object can be used to avoid complications in debugging remote calls during prototyping.
CoxMaster$addSite(name, url = NULL, worker = NULL)
name
of the site
url
web url of the site; exactly one of url
or worker
should be specified
worker
worker object for the site; exactly one of url
or worker
should be specified
run()
Run the distributed Cox model fit and return the estimates
CoxMaster$run(control = coxph.control())
control
parameters, same as survival::coxph.control()
a named list of beta
, var
, gradient
, iter
, and returnCode
summary()
Return the summary of the fit as a data frame
CoxMaster$summary()
a summary data frame with columns for coef, exp(coef), standard error, z-score, and p-value for each parameter in the model, following the same format as the survival package
clone()
The objects of this class are cloneable with this method.
CoxMaster$clone(deep = FALSE)
deep
Whether to make a deep clone.
See also: CoxWorker, which generates objects matched to such a master object.
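A minimal local-prototyping sketch (not from the package examples; defn and siteData stand in for a stratified Cox model definition and a site's local data frame):

## Not run:
master <- CoxMaster$new(defn = defn)
worker <- CoxWorker$new(defn = defn, data = siteData)
master$addSite(name = "site1", worker = worker)  # a worker object, rather than a url, while prototyping
result <- master$run()
master$summary()
## End(Not run)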
CoxWorker objects are worker objects at each data site of a distributed Cox model computation. (See also CoxMaster master objects.)
new()
Create a new CoxWorker
object.
CoxWorker$new(defn, data, stateful = TRUE)
defn
the computation definition
data
the local data
stateful
a boolean flag indicating if state needs to be preserved between REST calls
a new CoxWorker
object
getP()
Return the dimension of the parameter vector.
CoxWorker$getP(...)
...
other args ignored
the dimension of the parameter vector
getStateful()
Return the stateful status of the object.
CoxWorker$getStateful()
the stateful flag, TRUE
or FALSE
logLik()
Return the partial log likelihood on local data for given beta
parameter.
CoxWorker$logLik(beta, ...)
beta
the parameter vector
...
further arguments, currently unused
a named list with three components: value
contains the value of the
log likelihood, gradient
contains the score vector, and hessian
contains
the estimated hessian matrix
var()
Return the variance of the estimate for a given beta parameter on local data.
CoxWorker$var(beta, ...)
beta
the parameter vector
...
further arguments, currently unused
variance vector
kosher()
Check if inputs and state of object are sane. For future use
CoxWorker$kosher()
TRUE
or FALSE
clone()
The objects of this class are cloneable with this method.
CoxWorker$clone(deep = FALSE)
deep
Whether to make a deep clone.
See also: CoxMaster, which goes hand-in-hand with this object.
The function createHEWorkerInstance
uses a
definition identified by defnId to create the appropriate
object instance for HE computations. The instantiated object is
searched for in the instance path and loaded if already
present, otherwise it is created and assigned the instanceId
and saved under the dataFileName if the latter is specified.
This instantiated object may change state between iterations
when a computation executes.
createHEWorkerInstance(defnId, instanceId, pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL, dataFileName = NULL)
defnId: the identifier of an already defined computation
instanceId: an identifier to use for the created instance
pubkey_bits: the number of bits for the public key
pubkey_n: the n for the public key
den_bits: the number of bits for the denominator
dataFileName: a file name to use for saving the data
TRUE if everything goes well
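A sketch of a call (not run; the identifiers and parameter values are hypothetical):

## Not run:
createHEWorkerInstance(defnId = "defn-001", instanceId = "inst-001",
                       pubkey_bits = 1024, pubkey_n = "1246813...", den_bits = 256)
## End(Not run)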
This function uses an identifier (defnId
) to locate
a stored definition in the workspace to create the appropriate
object instance. The instantiated object is assigned the
instanceId and saved under the dataFileName if the latter is
not NULL
. This instantiated object may change state between
iterations when a computation executes.
createNCPInstance(name, ncpId, instanceId, pubkey_bits, pubkey_n, den_bits, dataFileName = NULL)
name: the name identifying the NC party
ncpId: the id indicating the NCP definition
instanceId: an identifier to use for the created instance
pubkey_bits: the number of bits for the public key
pubkey_n: the n for the public key
den_bits: the number of bits for the denominator used in rational approximations
dataFileName: a file name to use for saving the data
TRUE if everything goes well
The function createWorkerInstance
uses a definition identified by
defnId to create the appropriate object instance. The instantiated object is assigned
the instanceId and saved under the dataFileName if the latter is specified.
This instantiated object may change state between iterations when a computation executes.
createWorkerInstance(defnId, instanceId, pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL, dataFileName = NULL)
defnId: the identifier of an already defined computation
instanceId: an identifier to use for the created instance
pubkey_bits: the number of bits for the public key
pubkey_n: the n for the public key
den_bits: the number of bits for the denominator
dataFileName: a file name to use for saving the data
TRUE if everything goes well
This function just calls runDistcompApp() with the parameter "definition".
defineNewComputation()
the results of running the web application
The function destroyInstanceObject
deletes an object associated
with the instanceId. This is typically done after a computation completes and results
have been obtained.
destroyInstanceObject(instanceId)
instanceId: the id of the object to destroy
TRUE if everything goes well
distcomp
is a collection of methods to fit models to data that may be
distributed at various sites. The package arose as a way of addressing the
issues regarding data aggregation; by allowing sites to have control over
local data and transmitting only summaries, some privacy controls can be
maintained. Even when participants have no objections in principle to data
aggregation, it may still be useful to keep data local and expose just the
computations. For further details, please see the reference cited below.
The initial implementation consists of a stratified Cox model fit with distributed survival data and a Singular Value Decomposition of a distributed matrix. General Linear Models will soon be added. Although some sanity checks and balances are present, many more are needed to make this truly robust. We also hope that other methods will be added by users.
We make the following assumptions in the implementation:
(a) the aggregate data is logically a stacking of data at each site, i.e.,
the full data is row-partitioned into sites where the rows are observations;
(b) each site has the package distcomp installed and a workspace set up for (writeable) use by the opencpu server (see distcompSetup()); and (c) each site is exposing distcomp via an opencpu server.
The main computation happens via a master process, a script of R code, that makes calls to distcomp functions at worker sites via opencpu. This allows developers to prototype their distributed implementations on a local machine, since the opencpu package can run such a server locally on localhost ports.
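For example, a prototyping session might run everything locally (a sketch; ocpu_start_server() is from opencpu 2.x, and the workspace path is illustrative):

## Not run:
library(distcomp)
library(opencpu)
distcompSetup(workspacePath = "./workspace")  # writable workspace for definitions and instances
ocpu_start_server()                           # serve distcomp on a localhost port
## End(Not run)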
Note that distcomp
computations are not intended for speed/efficiency;
indeed, they are orders of magnitude slower. However, the models that are fit are
not meant to be recomputed often. These and other details are discussed in the
paper mentioned above.
The current implementation, particularly the stratified Cox model, makes direct use of code from survival::coxph(). That is, the underlying Cox model code is derived from that in the R survival package.
For an understanding of how this package is meant to be used, please see the documented examples and the reference.
Narasimhan B, Rubin DL, Gross SM, Bendersky M, Lavori PW (2017). Software for Distributed Computation on Medical Databases: A Demonstration Project. Journal of Statistical Software, 77(13), 1-22. doi:10.18637/jss.v077.i13
Appendix E of Modeling Survival Data: Extending the Cox Model by Terry M. Therneau and Patricia Grambsch. Springer Verlag, 2000.
The examples in system.file("doc", "examples.html", package="distcomp")
The source for the examples: system.file("doc_src", "examples.Rmd", package="distcomp")
.
The function distcompSetup
sets up a distributed computation
and configures some global parameters such as definition file names,
data file names, instance object file names, and ssl configuration parameters. The
function creates some of the necessary subdirectories if not already present and throws an error if the workspace areas are not writeable.
distcompSetup(
  workspacePath = "",
  defnPath = paste(workspacePath, "defn", sep = .Platform$file.sep),
  instancePath = paste(workspacePath, "instances", sep = .Platform$file.sep),
  defnFileName = "defn.rds",
  dataFileName = "data.rds",
  instanceFileName = "instance.rds",
  resultsCacheFileName = "results_cache.rds",
  ssl_verifyhost = 1L,
  ssl_verifypeer = 1L
)
workspacePath: a folder specifying the workspace path. This has to be writable by the opencpu process. On a cloud opencpu server on Ubuntu, for example, this requires a one-time modification of apparmor profiles to enable write permissions to this path
defnPath: the path where definition files will reside, organized by computation identifiers
instancePath: the path where instance objects will reside
defnFileName: the name for the computation definition files
dataFileName: the name for the data files
instanceFileName: the name for the instance files
resultsCacheFileName: the name for the instance results cache files for HE computations
ssl_verifyhost: integer value, usually 1L
ssl_verifypeer: integer value, usually 1L
TRUE if all is well
## Not run: distcompSetup(workspacePath = "./workspace") ## End(Not run)
The function executeHEMethod
is a homomorphic
encryption wrapper around executeMethod
. It ensures any
returned result is encrypted using the homomorphic encryption
function.
executeHEMethod(objectId, method, ...)
objectId: the (instance) identifier of the object on which to invoke a method
method: the name of the method to invoke
...: further arguments as appropriate for the method
a list containing an integer and a fractional result converted to characters
The function executeMethod
is really the heart of
distcomp. It executes an arbitrary method on an object that
has been serialized to the distcomp workspace with any
specified arguments. The result, which is dependent on the
computation that is executed, is returned. If the object needs
to save state between iterations on it, it is automatically
serialized back for the ensuing iterations.
executeMethod(objectId, method, ...)
objectId: the (instance) identifier of the object on which to invoke a method
method: the name of the method to invoke
...: further arguments as appropriate for the method
a result that depends on the computation being executed
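This is the call a master process effectively makes against a worker site's opencpu endpoint. A sketch (not run; the instance id and method arguments are hypothetical):

## Not run:
## invoke the logLik method of a serialized CoxWorker instance
executeMethod(objectId = "inst-001", method = "logLik", beta = c(0, 0))
## End(Not run)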
A hash is generated based on the contents of the object.
generateId(object, algo = "xxhash64")
object: the object for which a hash is desired
algo: the algorithm to use; the default is "xxhash64" from the digest package
the hash as a string
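A sketch (the definition fields shown are illustrative):

## Not run:
defn <- data.frame(id = "", compType = "StratifiedCoxModel",
                   projectName = "STCoxTest", stringsAsFactors = FALSE)
defn$id <- generateId(defn)  # a content-based identifier for the definition
## End(Not run)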
In distcomp, several web applications need to communicate between themselves. Since only one application is expected to be active at any time, they do so via a global store, essentially a hash table. This function retrieves the value of a name.
getComputationInfo(name)
name: the name for the object

the value for the variable, NULL if not set
The function getConfig returns the values of the configuration parameters set up by distcompSetup.
getConfig(...)
...: any further arguments
a list consisting of
workspacePath: a folder specifying the workspace path. This has to be writable by the opencpu process. On a cloud opencpu server on Ubuntu, for example, this requires a one-time modification of apparmor profiles to enable write permissions to this path
defnPath: the path where definition files will reside, organized by computation identifiers
instancePath: the path where instance objects will reside
defnFileName: the name for the computation definition files
dataFileName: the name for the data files
instanceFileName: the name for the instance files
ssl_verifyhost: integer value, usually 1L
ssl_verifypeer: integer value, usually 1L
## Not run: getConfig() ## End(Not run)
HEMaster
objects run a distributed computation based
upon a definition file that encapsulates all information
necessary to perform a computation. A master makes use of two
non-cooperating parties which communicate with sites that
perform the actual computations using local data.
den
denominator for rational arithmetic
den_bits
number of bits for denominator for rational arithmetic
new()
Create a HEMaster
object to run homomorphic encrypted computation
HEMaster$new(defn)
defn
the homomorphic computation definition
a HEMaster
object
getNC_party()
Return a list of noncooperating parties (NCPs)
HEMaster$getNC_party()
a named list of length 2 of noncooperating party information
getPubkey()
Return the public key from the public private key pair
HEMaster$getPubkey()
an R6 Pubkey
object
addNCP()
Add a noncooperating party to this master either using a url or an object in session for prototyping
HEMaster$addNCP(ncp_defn, url = NULL, ncpWorker = NULL)
ncp_defn
the definition of the NCP
url
the url for the NCP; only one of url and ncpWorker should be non-null
ncpWorker
an instantiated worker object; only one of url and ncpWorker should be non-null
run()
Run a distributed homomorphic encrypted computation and return the result
HEMaster$run(debug = FALSE)
debug
a flag for debugging, default FALSE
the result of the distributed homomorphic computation
clone()
The objects of this class are cloneable with this method.
HEMaster$clone(deep = FALSE)
deep
Whether to make a deep clone.
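A rough sketch of the flow (not run; heDefn, the NCP definitions, and the worker objects are hypothetical placeholders):

## Not run:
master <- HEMaster$new(defn = heDefn)
## exactly two non-cooperating parties are used
master$addNCP(ncp_defn = ncp1_defn, ncpWorker = ncp1_worker)
master$addNCP(ncp_defn = ncp2_defn, ncpWorker = ncp2_worker)
result <- master$run()
## End(Not run)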
HEQueryCountMaster objects instantiate and run a distributed homomorphic query count computation; they're instantiated by non-cooperating parties (NCPs). (See also HEQueryCountWorker().)
Super class: distcomp::QueryCountMaster -> HEQueryCountMaster
pubkey
the master's public key visible to everyone
pubkey_bits
the number of bits in the public key (used for reconstructing public key remotely by serializing to character)
pubkey_n
the n
for the public key used for reconstructing public key remotely
den
the denominator for rational arithmetic
den_bits
the number of bits in the denominator used for reconstructing denominator remotely
new()
Create a new HEQueryCountMaster
object.
HEQueryCountMaster$new(defn, partyNumber, debug = FALSE)
defn
the computation definition
partyNumber
the party number of the NCP that this object belongs to (1 or 2)
debug
a flag for debugging, default FALSE
a new HEQueryCountMaster
object
setParams()
Set some parameters of the HEQueryCountMaster
object for homomorphic computations
HEQueryCountMaster$setParams(pubkey_bits, pubkey_n, den_bits)
pubkey_bits
the number of bits in public key
pubkey_n
the n
for the public key
den_bits
the number of bits in the denominator (power of 2) used in rational approximations
kosher()
Check if inputs and state of object are sane. For future use
HEQueryCountMaster$kosher()
TRUE
or FALSE
queryCount()
Run the distributed query count, associate it with a token, and return the result
HEQueryCountMaster$queryCount(token)
token
a token to use as key
the partial result as a list of encrypted items with components int
and frac
cleanup()
Clean up the instance objects
HEQueryCountMaster$cleanup()
run()
Run the homomorphic encrypted distributed query count computation
HEQueryCountMaster$run(token)
token
a token to use as key
the partial result as a list of encrypted items with components int
and frac
clone()
The objects of this class are cloneable with this method.
HEQueryCountMaster$clone(deep = FALSE)
deep
Whether to make a deep clone.
See also: HEQueryCountWorker(), which goes hand-in-hand with this object.
HEQueryCountWorker objects are worker objects at each site of a distributed query count computation using homomorphic encryption. (See also HEQueryCountMaster().)
Super class: distcomp::QueryCountWorker -> HEQueryCountWorker
pubkey
the master's public key visible to everyone
den
the denominator for rational arithmetic
new()
Create a new HEQueryCountWorker object.
HEQueryCountWorker$new(defn, data, pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL)
defn
the computation definition
data
the data, which is usually the list of sites
pubkey_bits
the number of bits in public key
pubkey_n
the n
for the public key
den_bits
the number of bits in the denominator (power of 2) used in rational approximations
a new HEQueryCountWorker object
setParams()
Set some parameters for homomorphic computations
HEQueryCountWorker$setParams(pubkey_bits, pubkey_n, den_bits)
pubkey_bits
the number of bits in public key
pubkey_n
the n
for the public key
den_bits
the number of bits in the denominator (power of 2) used in rational approximations
queryCount()
Run the query count on local data and return the appropriate encrypted result to the party
HEQueryCountWorker$queryCount(partyNumber, token)
partyNumber
the NCP party number (1 or 2)
token
a token to use for identifying parts of the same computation for NCP1 and NCP2
the count as a list of encrypted items with components int
and frac
clone()
The objects of this class are cloneable with this method.
HEQueryCountWorker$clone(deep = FALSE)
deep
Whether to make a deep clone.
See also: HEQueryCountMaster(), which goes hand-in-hand with this object.
The function makeDefinition
returns a computational
definition based on current inputs (from the global store) given a
canonical computation type tag. This is a utility function for web
applications to use as input is being gathered.
makeDefinition(compType)
compType: the canonical computation type tag
a data frame corresponding to the computation type
## Not run: makeDefinition(names(availableComputations())[1]) ## End(Not run)
Instantiate a master process for HE operations
makeHEMaster(defn)
defn: the computation definition
a master object for HE operations
The function makeMaster
returns a master object
corresponding to the definition. The types of master objects
that can be created depend upon the available computations.
makeMaster(defn, partyNumber = NULL, debug = FALSE)
defn: the computation definition
partyNumber: the number of the noncooperating party, which can be optionally set if HE is desired
debug: a debug flag
a master object of the appropriate class based on the definition
Instantiate a noncooperating party (NCP).
makeNCP(ncp_defn, comp_defn, sites = list(), pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL)
ncp_defn: the NCP definition
comp_defn: the computation definition
sites: a list of sites, each entry a named list of name, url, worker
pubkey_bits: the number of bits for the public key
pubkey_n: the n for the public key
den_bits: the log to base 2 of the denominator
an NCP object
The function makeWorker
returns an object of the
appropriate type based on a computation definition and sets the
data for the object. The types of objects that can be created
depend upon the available computations.
makeWorker(defn, data, pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL)
defn: the computation definition
data: the data for the computation
pubkey_bits: the number of bits for the public key (used only for HE computations)
pubkey_n: the n for the public key (used only for HE computations)
den_bits: the number of bits for the denominator (used only for HE computations)
a worker object of the appropriate class based on the definition
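A sketch of how the generic constructors pair up (not run; defn and localData are hypothetical):

## Not run:
master <- makeMaster(defn)
worker <- makeWorker(defn, data = localData)
master$addSite(name = "site1", worker = worker)
## End(Not run)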
NCP
objects are worker objects that separate a
master process from communicating directly with the worker
processes. Typically two such objects are needed for a distributed
homomorphic computation. A master process can communicate with
NCP
objects and the NCP
objects can communicate
with worker processes. However, the two NCP
objects,
designated by numbers 1 and 2, are non-cooperating in the sense
that they don't communicate with each other and are isolated
from each other.
pubkey
the master's public key visible to everyone
pubkey_bits
the number of bits in the public key (used for reconstructing public key remotely by serializing to character)
pubkey_n
the n
for the public key used for reconstructing public key remotely
den
the denominator for rational arithmetic
den_bits
the number of bits in the denominator used for reconstructing denominator remotely
new()
Create a new NCP
object.
NCP$new(ncp_defn, comp_defn, sites = list(), pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL)
ncp_defn
the NCP definition; see example
comp_defn
the computation definition
sites
list of sites
pubkey_bits
the number of bits in public key
pubkey_n
the n
for the public key
den_bits
the number of bits in the denominator (power of 2) used in rational approximations
a new NCP
object
getStateful()
Retrieve the value of the stateful
field
NCP$getStateful()
setParams()
Set some parameters of the NCP
object for homomorphic computations
NCP$setParams(pubkey_bits, pubkey_n, den_bits)
pubkey_bits
the number of bits in public key
pubkey_n
the n
for the public key
den_bits
the number of bits in the denominator (power of 2) used in rational approximations
getSites()
Retrieve the value of the private sites
field
NCP$getSites()
setSites()
Set the value of the private sites
field
NCP$setSites(sites)
sites
the list of sites
addSite()
Add a url or worker object for a site for participating in the distributed computation. The worker object can be used to avoid complications in debugging remote calls during prototyping.
NCP$addSite(name, url = NULL, worker = NULL)
name
of the site
url
web url of the site; exactly one of url
or worker
should be specified
worker
worker object for the site; exactly one of url
or worker
should be specified
cleanupInstance()
Clean up by destroying instance objects created in workspace.
NCP$cleanupInstance(token)
token
the token for the instance
run()
Run the distributed homomorphic computation
NCP$run(token)
token
a unique token for the run, used to ensure that correct parts of cached results are returned appropriately
the result of the computation
clone()
The objects of this class are cloneable with this method.
NCP$clone(deep = FALSE)
deep
Whether to make a deep clone.
QueryCountMaster objects instantiate and run a distributed query count computation. (See also QueryCountWorker().)
new()
Create a new QueryCountMaster
object.
QueryCountMaster$new(defn, debug = FALSE)
defn
the computation definition
debug
a flag for debugging, default FALSE
a new QueryCountMaster
object
kosher()
Check if inputs and state of object are sane. For future use
QueryCountMaster$kosher()
TRUE
or FALSE
queryCount()
Run the distributed query count and return the result
QueryCountMaster$queryCount()
the count
getSites()
Retrieve the value of the private sites
field
QueryCountMaster$getSites()
addSite()
Add a url or worker object for a site for participating in the distributed computation. The worker object can be used to avoid complications in debugging remote calls during prototyping.
QueryCountMaster$addSite(name, url = NULL, worker = NULL)
name
of the site
url
web url of the site; exactly one of url
or worker
should be specified
worker
worker object for the site; exactly one of url
or worker
should be specified
run()
Run the distributed query count
QueryCountMaster$run()
the count
clone()
The objects of this class are cloneable with this method.
QueryCountMaster$clone(deep = FALSE)
deep
Whether to make a deep clone.
See also: QueryCountWorker(), which goes hand-in-hand with this object.
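A local prototyping sketch (not run; defn and siteData are hypothetical):

## Not run:
master <- QueryCountMaster$new(defn = defn)
master$addSite(name = "site1",
               worker = QueryCountWorker$new(defn = defn, data = siteData))
master$run()  # the aggregate count over all sites
## End(Not run)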
QueryCountWorker objects are worker objects at each site of a distributed query count computation. (See also QueryCountMaster().)
new()
Create a new QueryCountWorker
object.
QueryCountWorker$new(defn, data, stateful = FALSE)
defn
the computation definition
data
the local data
stateful
the statefulness flag, default FALSE
a new QueryCountWorker
object
getStateful()
Retrieve the value of the stateful
field
QueryCountWorker$getStateful()
kosher()
Check if inputs and state of object are sane. For future use
QueryCountWorker$kosher()
TRUE
or FALSE
queryCount()
Return the query count on the local data
QueryCountWorker$queryCount()
clone()
The objects of this class are cloneable with this method.
QueryCountWorker$clone(deep = FALSE)
deep
Whether to make a deep clone.
See also: QueryCountMaster(), which goes hand-in-hand with this object.
In distcomp, several web applications need to communicate between themselves. Since only one application is expected to be active at any time, they do so via a global store, essentially a hash table. This function clears the store, except for the working directory.
resetComputationInfo()
an empty list
See also: setComputationInfo(), getComputationInfo()
Web applications can define a computation, or set up worker sites or masters. This function invokes the appropriate web application depending on the task.
runDistcompApp(appType = c("definition", "setupWorker", "setupMaster"))
appType: one of three values, "definition", "setupWorker", or "setupMaster"
the results of running the web application
See also: defineNewComputation(), setupWorker(), setupMaster()
The function saveNewComputation
uses the computation definition to save
a new computation instance. This is typically done for every site that wants to participate
in a computation with its own local data. The function examines the computation definition
and uses the identifier therein to uniquely refer to the computation instance at the site.
This function is invoked (maybe remotely) on the opencpu server by
uploadNewComputation()
when a worker site is being set up.
saveNewComputation(defn, data, dataFileName = NULL)
defn: an already defined computation
data: the (local) data to use
dataFileName: a file name to use for saving the data
TRUE if everything goes well
The function saveNewNCP
uses the list of sites
definition to save a new NCP instance. This is
typically done for every pair of NCPs used in a computation. The function examines the
computation definition and uses the identifier therein to
uniquely refer to the computation instance at the site. This
function is invoked (maybe remotely) on the opencpu server by uploadNewNCP() when an NCP is being set up.
saveNewNCP(defn, comp_defn, data, dataFileName = NULL)
defn: a definition of the NCP
comp_defn: the computation definition
data: the list of sites with name and url to use
dataFileName: a file name to use for saving the data
TRUE if everything goes well
In distcomp, several web applications need to communicate between themselves. Since only one application is expected to be active at any time, they do so via a global store, essentially a hash table. This function sets a name to a value.
setComputationInfo(name, value)
name: the name for the object
value: the value for the object
invisibly returns all the name-value pairs
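A sketch of the global store in action (the name and value are illustrative):

## Not run:
setComputationInfo("projectName", "STCoxTest")
getComputationInfo("projectName")  # "STCoxTest"
getComputationInfo("noSuchName")   # NULL
resetComputationInfo()             # clear everything except the working directory
## End(Not run)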
This function just calls runDistcompApp() with the parameter "setupMaster".
setupMaster()
the results of running the web application
This function just calls runDistcompApp() with the parameter "setupWorker".
setupWorker()
the results of running the web application
SVDMaster objects instantiate and run a distributed SVD computation. (See also SVDWorker().)
new()
Create a new SVDMaster object.
SVDMaster$new(defn, debug = FALSE)
defn
a computation definition
debug
a flag for debugging, default FALSE
an R6 SVDMaster object
kosher()
Check if inputs and state of object are sane. For future use
SVDMaster$kosher()
TRUE
or FALSE
updateV()
Return an updated value for the V
vector, normalized by arg
SVDMaster$updateV(arg)
arg
the normalizing value
...
other args ignored
updated V
updateU()
Update U
and return the updated norm of U
SVDMaster$updateU(arg)
arg
the normalizing value
...
other args ignored
updated norm of U
fixFit()
Construct the residual matrix given the V vector and the d value so far
SVDMaster$fixFit(v, d)
v
the value for v
d
the value for d
result
reset()
Reset the computation state by initializing the work matrix and setting up starting values for iterating
SVDMaster$reset()
addSite()
Add a url or worker object for a site for participating in the distributed computation. The worker object can be used to avoid complications in debugging remote calls during prototyping.
SVDMaster$addSite(name, url = NULL, worker = NULL)
name
of the site
url
web url of the site; exactly one of url
or worker
should be specified
worker
worker object for the site; exactly one of url
or worker
should be specified
run()
Run the distributed SVD computation and return the result
SVDMaster$run(thr = 1e-08, max.iter = 100)
thr
the threshold for convergence, default 1e-8
max.iter
the maximum number of iterations, default 100
a named list of V
, d
summary()
Return the summary result
SVDMaster$summary()
a named list of V
, d
clone()
The objects of this class are cloneable with this method.
SVDMaster$clone(deep = FALSE)
deep
Whether to make a deep clone.
See also: SVDWorker(), which goes hand-in-hand with this object.
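A local prototyping sketch (not run; svdDefn stands in for an SVD computation definition, and x1, x2 for the sites' local matrices):

## Not run:
master <- SVDMaster$new(defn = svdDefn)
master$addSite(name = "site1", worker = SVDWorker$new(defn = svdDefn, data = x1))
master$addSite(name = "site2", worker = SVDWorker$new(defn = svdDefn, data = x2))
result <- master$run(thr = 1e-8, max.iter = 100)
result$d  # singular values of the row-stacked matrix rbind(x1, x2)
## End(Not run)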
SVDWorker objects are worker objects at each site of a distributed SVD computation. (See also SVDMaster().)
new()
Create a new SVDWorker
object.
SVDWorker$new(defn, data, stateful = TRUE)
defn
the computation definition
data
the local x
matrix
stateful
a boolean flag indicating if state needs to be preserved between REST calls, TRUE
by default
a new SVDWorker
object
reset()
Reset the computation state by initializing the work matrix and setting up starting values for iterating
SVDWorker$reset()
dimX()
Return the dimensions of the matrix
SVDWorker$dimX(...)
...
other args ignored
the dimension of the matrix
updateV()
Return an updated value for the V
vector, normalized by arg
SVDWorker$updateV(arg, ...)
arg
the normalizing value
...
other args ignored
updated V
updateU()
Update U
and return the updated norm of U
SVDWorker$updateU(arg, ...)
arg
the initial value
...
other args ignored
updated norm of U
normU()
Normalize U
vector
SVDWorker$normU(arg, ...)
arg
the normalizing value
...
other args ignored
TRUE
invisibly
fixU()
Construct the residual matrix using arg
SVDWorker$fixU(arg, ...)
arg
the value to use for residualizing
...
other args ignored
getN()
Get the number of rows of the x matrix
SVDWorker$getN()
the number of rows of the x matrix
getP()
Get the number of columns of the x matrix
SVDWorker$getP()
the number of columns of the x matrix
getStateful()
Return the stateful status of the object.
SVDWorker$getStateful()
the stateful flag, TRUE
or FALSE
kosher()
Check if inputs and state of object are sane. For future use
SVDWorker$kosher()
TRUE
or FALSE
clone()
The objects of this class are cloneable with this method.
SVDWorker$clone(deep = FALSE)
deep
Whether to make a deep clone.
See also: SVDMaster(), which goes hand-in-hand with this object.
The function uploadNewComputation
is really a remote version
of saveNewComputation()
, invoking that function on an opencpu server.
This is typically done for every site that wants to participate in a computation
with its own local data. Note that a site is always a list of at least a unique
name element (distinguishing the site from others) and a url element.
uploadNewComputation(site, defn, data)
site: a list of two items, a unique name and a url
defn: the identifier of an already defined computation
data: the (local) data to use
TRUE if everything goes well
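A sketch of setting up a worker site (not run; the url and data are hypothetical):

## Not run:
site <- list(name = "site1", url = "http://localhost:8000/ocpu")
uploadNewComputation(site, defn = defn, data = siteData)
## End(Not run)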
The function uploadNewNCP
is really a remote version
of saveNewNCP()
, invoking that function on an opencpu server.
This is typically done for the two NCPs participating in a
computation with the list of sites. Note that sites are always
a list of at least a unique name element (distinguishing the
site from others) and a url element.
uploadNewNCP(defn, comp_defn, url = NULL, worker = NULL, sites)
defn: a definition for the NCP
comp_defn: the computation definition
url: the url for the NCP; only one of url and worker can be non-null
worker: the worker for the NCP if local; only one of url and worker can be non-null
sites: a list of lists, each containing two items, a unique name and a url
TRUE if everything goes well
Once a computation is defined and worker sites are set up, this function writes the master process code. The current implementation does not allow one to mix localhost URLs with non-localhost URLs.
writeCode(defn, sites, outputFilenamePrefix)
defn: the computation definition
sites: a named list of site URLs participating in the computation
outputFilenamePrefix: the output file name prefix, using which code and data will be written
the value TRUE
if all goes well
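A sketch (not run; the definition and site urls are hypothetical):

## Not run:
sites <- list(site1 = "http://localhost:8000/ocpu",
              site2 = "http://localhost:8001/ocpu")
writeCode(defn, sites, outputFilenamePrefix = "cox_master")
## End(Not run)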