Running analyses

Here's what there is to it ...

A little background on what's happening under the hood

As explained in previous sections, running an algorithm consists of the following steps:

  • Send a task to the nodes in your collaboration, consisting of:

    • The Docker image you want the nodes to run (the image contains the algorithm you're interested in)

    • Any input you might want to provide to the Docker image

  • Each node will execute the Docker image.

    • The code/algorithm in the Docker image will have access to:

      • The node's data

      • The input that was provided in the task

    • The node returns the result returned by the algorithm
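The steps above can be mimicked locally to make the flow concrete. This is a minimal sketch in plain R, not the infrastructure's actual API: the `images` registry stands in for Docker images, and `make_task`, `run_on_node`, the image names and the node names are all hypothetical names chosen for illustration.

```r
# Local simulation only: none of these names belong to the real infrastructure.

# A "registry" standing in for Docker images: image name -> algorithm function.
# Each algorithm receives the node's data and the input provided in the task.
images <- list(
  "example/row-count"   = function(data, input) nrow(data),
  "example/column-mean" = function(data, input) mean(data[[input$column]])
)

# A task bundles the image to run and any input for it.
make_task <- function(image, input = list()) {
  list(image = image, input = input)
}

# A node executes the image against its own data and returns only the result.
run_on_node <- function(task, node_data) {
  algorithm <- images[[task$image]]
  if (is.null(algorithm)) stop("Unknown image: ", task$image)
  algorithm(node_data, task$input)
}

# Two nodes, each holding data the researcher never sees directly.
nodes <- list(
  node_a = data.frame(Age = c(50, 60, 70)),
  node_b = data.frame(Age = c(40, 80))
)

# Send the same task to every node and collect the per-node results.
task <- make_task("example/column-mean", input = list(column = "Age"))
results <- lapply(nodes, function(node_data) run_on_node(task, node_data))
print(results)
```

The point of the sketch is that only the value returned by the algorithm leaves a node; the data frames themselves stay where they are.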

Of course, few people would actually want to run a regression (or any other statistical analysis) this way. The algorithms written for the infrastructure have therefore been packaged so that using them is manageable.

So how, then?

As a researcher, you would normally fire up your (least) favorite programming language and do something like this to check which columns are in your data:

# This assumes the package devtools is installed:
devtools::install_github('mellesies/vtg.basic', subdir='src')

# Load the SEER dataset located in the vtg.basic package
data('SEER', package='vtg.basic')

# Print all the column names in the dataset
print( colnames(SEER) )

# Expected output:
# [1] "Age"      "Race2"    "Race3"    "Mar2"     "Mar3"     "Mar4"     "Mar5"    
# [8] "Mar9"     "Hist8520" "hist8522" "hist8480" "hist8501" "hist8201" "hist8211"
#[15] "grade"    "ts"       "nne"      "npn"      "er2"      "er4"      "Time"    
#[22] "Censor"

In a federated setting, you don't have direct access to the data. Instead, you have to instruct the nodes to run a Docker image that returns the list of column names. Since we also have to communicate with the infrastructure, we need two things:

  • A Docker image (with software that returns the column names)

  • A client to facilitate communication
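To illustrate how these two pieces divide the work, here is a rough local sketch in plain R. It assumes nothing about the real image or client: `column_names_algorithm`, `collect_column_names` and the node names are hypothetical, and the tiny data frames are made up for the example.

```r
# Local simulation only; the function and node names are hypothetical.

# What the software inside the image might do: return the column names only,
# never the underlying rows.
column_names_algorithm <- function(data, input = NULL) {
  colnames(data)
}

# Stand-in for a client: "sends" the algorithm to every node and collects
# whatever each node returns.
collect_column_names <- function(nodes) {
  lapply(nodes, column_names_algorithm)
}

# Two nodes with made-up data that the researcher cannot inspect directly.
nodes <- list(
  hospital_a = data.frame(Age = c(61, 47), Time = c(12, 30), Censor = c(0, 1)),
  hospital_b = data.frame(Age = 55, Time = 8, Censor = 1)
)

per_node <- collect_column_names(nodes)

# Check that all nodes expose the same columns before planning an analysis.
same_columns <- all(vapply(
  per_node,
  function(x) identical(x, per_node[[1]]),
  logical(1)
))

print(per_node)
print(same_columns)
```

Comparing the per-node results is a design choice of this sketch: in a federated setting it is worth confirming that every node has the same columns before running anything more involved.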

Fortunately, a Docker image that returns the column names is already available, and using it is not too difficult.
