Running analyses
Here's what there is to it ...
A little background on what's happening under the hood
As explained in previous sections, running an algorithm consists of the following steps:
Send a task to the nodes in your collaboration, consisting of:
The docker image you want the nodes to run (the docker image contains the algorithm you're interested in)
Any input you might want to provide to the docker image
Each node will execute the docker image.
The code/algorithm in the docker image will have access to:
The node's data
The input that was provided in the task
The node returns the result returned by the algorithm
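The steps above can be sketched in a few lines of R. Note that the field names (`image`, `input`, `method`) and the registry URL below are illustrative assumptions, not the infrastructure's actual task format; the sketch simulates a single node locally so the flow is visible end to end:

```r
# A task, conceptually, is just a description of what the nodes should run.
# NOTE: the field names below are illustrative; the real infrastructure
# defines its own task format.
task <- list(
  image = "harbor.example.org/algorithms/column-names",  # docker image to run
  input = list(method = "colnames")                      # input passed to the image
)

# Each node executes the image with the given input against its local data
# and returns the algorithm's result; here we simulate one node's response.
simulate_node <- function(task, local_data) {
  if (task$input$method == "colnames") {
    return(colnames(local_data))
  }
}

result <- simulate_node(task, data.frame(Age = numeric(0), Time = numeric(0)))
print(result)
```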
Of course, few people would actually want to run a regression (or any other statistical analysis) this way. Therefore, the algorithms written for the infrastructure have been packaged so that using them is manageable.
So how, then?
As a researcher, you would normally fire up your (least) favorite programming language and do something like this to check which columns are in your data:
# This assumes the package devtools is installed:
devtools::install_github('mellesies/vtg.basic', subdir='src')
# Load the SEER dataset located in the vtg.basic package
data('SEER', package='vtg.basic')
# Print all the column names in the dataset
print( colnames(SEER) )
# Expected output:
#  [1] "Age"      "Race2"    "Race3"    "Mar2"     "Mar3"     "Mar4"     "Mar5"
#  [8] "Mar9"     "Hist8520" "hist8522" "hist8480" "hist8501" "hist8201" "hist8211"
# [15] "grade"    "ts"       "nne"      "npn"      "er2"      "er4"      "Time"
# [22] "Censor"
In a federated situation, however, you won't have direct access to the data. Instead, you'd have to instruct the nodes to run a docker image that returns the list of column names. Since we'd also have to communicate with the infrastructure, we need two things:
A Docker image (with software that returns the column names)
A client to facilitate communication
Fortunately, a docker image that returns the column names is already available, and using it is not too difficult:
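To give a flavor of what the researcher's side looks like, here is a minimal sketch. The client object and its `call()` method are hypothetical stand-ins, mocked locally so the example is self-contained; the real client handles authentication and communication with the server:

```r
# NOTE: 'new_mock_client' and the registry URL are hypothetical; the actual
# package provides its own client interface.

# Hypothetical client: sends a task and collects the nodes' results.
# Here it is mocked so the sketch runs without any infrastructure.
new_mock_client <- function(node_data) {
  list(
    call = function(image, input) {
      # In reality this would send a task to the server and wait for
      # results; here we just run the 'algorithm' on each node's data.
      lapply(node_data, colnames)
    }
  )
}

# Two 'nodes', each holding its own local data
nodes <- list(
  hospital_a = data.frame(Age = 1, Time = 2),
  hospital_b = data.frame(Age = 3, Censor = 0)
)

client <- new_mock_client(nodes)
results <- client$call(image = "harbor.example.org/algorithms/column-names",
                       input = list())
print(results)
```

The point of the sketch: from the researcher's perspective, a federated "what columns are there?" query reduces to one client call, and the result comes back per node rather than as a single dataset.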