Client guide

This page describes the use of the rubin.nublado.client.NubladoClient class and its provided testing classes.

Installing the Client

The client can be installed from PyPI with:

pip install rubin-nublado-client

Using the Client

The NubladoClient is designed to make interaction with JupyterHub and Jupyterlab, as they are configured in the RSP environment, easy.

A particular instance of the client represents a single user in a particular RSP environment. A user, in this context, means a token (which will have a set of scopes allowing various actions within the RSP) bound to a username. Both of these will be available in the service you are writing with each request, in the X-Auth-Request-Token and X-Auth-Request-User headers on the request. You should not, in general, need to go to Gafaelfawr to extract any further information about the token, but you must be prepared to handle 401s and 403s in case the token you have is invalid, expired, or does not grant sufficient scope for the service you want to use.

Sequence of Events

A typical interaction with the client usually looks like this:

  1. Authenticate to the Hub with the auth_to_hub() method.

  2. Determine whether or not you have a running lab with is_lab_stopped().

  3. If you need to, spawn a lab with spawn_lab().

  4. Wait for the lab to spawn by looping through watch_spawn_progress() until you get a progress message indicating the lab is ready.

  5. Authenticate to the Lab with auth_to_lab()

  6. Use open_lab_session() as a context manager to interact with your lab.

  7. Do whatever it is you wanted to do with the lab; more below.

  8. When done, use stop_lab() to shut down the lab, if desired.

The client_test test in the client test suite steps through this process and may be a useful model.

Interacting With The Lab

NubladoClient provides three methods of interacting with a spawned lab. These are methods on the JupyterLabSession object you will have available inside the session context manager. They are:

  1. run_python(). This runs a string representing arbitrary Python code and returns streamed results (which is to say, stdout, within each cell). Of course, this code will run in the Lab environment specified by the session, which will be bound to a particular kernel (usually LSST within the RSP), and will have access to whatever is installed in that kernel.

  2. run_notebook(). This runs each cell of a supplied notebook using run_python(), accumulating the results and ultimately returning them as a list of cell outputs.

  3. run_notebook_via_rsp_extension(). This executes a notebook via the /rubin/execution endpoint of the RSP Jupyter Extensions. Times Square and Noteburst use this method. The extensions run within the user lab, and the execution extension, in turn, uses nbconvert to execute notebooks and return their rendered form. If you need to execute a notebook and capture output that did not go to stdout (for instance, the Javascript created by a Bokeh call, that will ultimately run in your browser), this is at present the way to do it.

Client Use Examples

The following uses the client to determine whether a user Lab is running and to start it if necessary. It does not run any code within the Lab:

"""Ensure there's a running lab for the user."""

import asyncio

import structlog

from rubin.nublado.client import NubladoClient
from rubin.nublado.client.models import (
    NubladoImageByClass,
    NubladoImageClass,
    NubladoImageSize,
    User,
)

LAB_SPAWN_TIMEOUT = 90


async def ensure_lab() -> None:
    """Start a Lab if one is not present."""
    client = NubladoClient(
        user=User(username="some-user", token="some-token"),
        base_url="https://data.example.org",
    )
    await client.auth_to_hub()
    stopped = await client.is_lab_stopped()
    if stopped:
        image = NubladoImageByClass(
            image_class=NubladoImageClass.RECOMMENDED,
            size=NubladoImageSize.Medium,
        )
        await client.spawn_lab(image)
        progress = client.watch_spawn_progress()
        async with asyncio.timeout(LAB_SPAWN_TIMEOUT):
            async for message in progress:
                if message.ready:
                    break


asyncio.run(ensure_lab())

The next example assumes that you have already done the above–that is, you know the user already has a running Lab–and that you, for some reason, want to run FizzBuzz for n=1 through 100:

"""Run Fizzbuzz in the RSP"""

import asyncio

from rubin.nublado.client import NubladoClient
from rubin.nublado.client.models import User

client = NubladoClient(
    user=User(username="some-user", token="some-token"),
    base_url="https://data.example.org",
)
FIZZBUZZ = """
i=1
accum=""
while (i<=100):
    if i>1:
        accum += ", "
    if (i%15 == 0):
        accum += "Fizz Buzz"
    elif (i%5 == 0):
        accum += "Buzz"
    elif (i%3 == 0):
        accum += "Fizz"
    else:
        accum += str(i)
    i += 1
print(accum)
"""


async def run_fizzbuzz(client: NubladoClient) -> str:
    await client.auth_to_hub()
    await client.auth_to_lab()
    async with client.open_lab_session() as lab_session:
        output = await lab_session.run_python(FIZZBUZZ)
    return output


output = asyncio.run(run_fizzbuzz(client=client))
print(output)

This will display the following:

1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, Fizz Buzz, 16, 17, Fizz, 19, Buzz, Fizz, 22, 23, Fizz, Buzz, 26, Fizz, 28, 29, Fizz Buzz, 31, 32, Fizz, 34, Buzz, Fizz, 37, 38, Fizz, Buzz, 41, Fizz, 43, 44, Fizz Buzz, 46, 47, Fizz, 49, Buzz, Fizz, 52, 53, Fizz, Buzz, 56, Fizz, 58, 59, Fizz Buzz, 61, 62, Fizz, 64, Buzz, Fizz, 67, 68, Fizz, Buzz, 71, Fizz, 73, 74, Fizz Buzz, 76, 77, Fizz, 79, Buzz, Fizz, 82, 83, Fizz, Buzz, 86, Fizz, 88, 89, Fizz Buzz, 91, 92, Fizz, 94, Buzz, Fizz, 97, 98, Fizz, Buzz

For the next two examples, we will assume that you have a notebook called HelloGoodbye.ipynb in your home directory. This notebook contains two cells. The first cell’s code is:

print("Hello, world!")

and the second cell’s code is:

print("Goodbye, world!")

Then the following will run the notebook via each method, compare their outputs, and if they are the same, print the outputs with the line number followed by a colon and a space before each one:

import asyncio
import json

from dataclasses import dataclass
from pathlib import Path

from rubin.nublado.client import NubladoClient
from rubin.nublado.client.models import NotebookExecutionResult, User


@dataclass
class NBResults:
    session_output: list[str]
    extension_output: NotebookExecutionResult


client = NubladoClient(
    user=User(username="some-user", token="some-token"),
    base_url="https://data.example.org",
)
notebook = Path("HelloGoodbye.ipynb")


async def run_notebook_both_ways(
    client: NubladoClient, notebook: Path
) -> NBResults:
    await client.auth_to_hub()
    await client.auth_to_lab()
    async with client.open_lab_session() as lab_session:
        session_output = await lab_session.run_notebook(notebook)
        extension_output = await lab_session.run_notebook_via_rsp_extension(
            path=notebook
        )
    return NBResults(
        session_output=session_output, extension_output=extension_output
    )


output = asyncio.run(run_notebook_both_ways(client=client, notebook=notebook))
obj = json.loads(output.extension_output.notebook)
cells = obj["cells"]
# Check that the output is the same from both methods.  We have to do a
# lot of work to pull the streaming output out of the cell.
outputs_from_extension: list[str] = []
for cell in cells:
    if (
        "cell_type" in cell
        and cell["cell_type"] == "code"
        and "outputs" in cell
        and cell["outputs"]
    ):
        cell_outputs = cell["outputs"]
        for outp in cell_outputs:
            if (
                "output_type" in outp
                and outp["output_type"] == "stream"
                and "name" in outp
                and outp["name"] == "stdout"
                and "text" in outp
                and outp["text"]
            ):
                text_list = outp["text"]
                for text in text_list:
                    if text:
                        outputs_from_extension.append(text)

if outputs_from_extension == output.session_output:
    for count, line in enumerate(output.session_output):
        print(f"{count+1}: {line.strip()}")

This yields:

1: Hello, World!
2: Goodbye, World!

Mocks and Testing

In the module rubin.nublado.client.testing you will find the MockJupyter class. This provides a simulation of the RSP Nublado Hub/Proxy/Controller environment, as well as a partial simulation of the Labs it spawns. The reason you would use this is to be able to meaningfully test your service without having to test against a live RSP or spin up your own RSP to test the service against. Although there are quite a few additional classes within the module, MockJupyter should be the only one you need directly, except to set up the test fixture.

Creating the Jupyter Mock Test Fixture

The rubin.nublado.client.testing.MockJupyter class is fundamentally an instance of the respx class (used for testing httpx services), with a websocket emulator patched into it.

It depends on two other fixtures: environment_url is a string, representing the base URL of the RSP environment, and filesystem is a pathlib.Path representing the home directory of the user the NubladoClient is running as. These collectively look like:

from collections.abc import Iterator
from contextlib import asynccontextmanager
from pathlib import Path
from unittest.mock import patch

import pytest
import respx
import websockets

from nublado.rubin.client.testing import (
    MockJupyter,
    MockJupyterWebSocket,
    mock_jupyter,
    mock_jupyter_websocket,
)


@pytest.fixture
def environment_url() -> str:
    return "https://data.example.org"


@pytest.fixture
def test_filesystem() -> Iterator[Path]:
    with TemporaryDirectory() as td:
        # Do whatever you need to do in order to set up the home
        # directory contents here
        yield Path(td)


@pytest.fixture
def jupyter(
    respx_mock: respx.Router,
    environment_url: str,
    test_filesystem: Path,
) -> Iterator[MockJupyter]:
    """Mock out JupyterHub and Jupyter labs."""
    jupyter_mock = mock_jupyter(
        respx_mock,
        base_url=environment_url,
        user_dir=test_filesystem,
    )

    # respx has no mechanism to mock aconnect_ws, so we have to do it
    # ourselves.
    @asynccontextmanager
    async def mock_connect(
        url: str,
        extra_headers: dict[str, str],
        max_size: int | None,
        open_timeout: int,
    ) -> AsyncIterator[MockJupyterWebSocket]:
        yield mock_jupyter_websocket(url, extra_headers, jupyter_mock)

    with patch(websockets, "connect") as mock:
        mock.side_effect = mock_connect
        yield jupyter_mock

Once you’ve done all that, all you will need to do is supply the test fixture jupyter to your unit tests along with a client to communicate with it.

The client is much simpler. The only special things you need to do with the NubladoClient are to configure it with the same environment URL your mock Jupyter has, and give it X-Auth-Request-User and X-Auth-Request-Token headers that, in real life, would come in via GafaelfawrIngress. It will make all its usual HTTP calls, which will be intercepted by the jupyter test fixture and responded to appropriately.

Mocking Payloads

The Python code being used as a client payload is expected, in the wild, to run within an RSP kernel; usually the LSST kernel, which is extremely heavyweight and has a great many features not found in a vanilla Python installation.

The MockJupyter class contains a pair of methods that enable the user to register code or notebook contents with the mock, and if the mock sees those things as execution payloads, it will reply with the registered results rather than trying to actually execute them.

These methods are register_python_result() and register_extension_result(). The first is used for mocking run_python() and run_notebook(), and the second for mocking run_notebook_via_rsp_extension(). For any case involving Python that uses modules outside the standard library, use the register methods to pre-load appropriate replies for that code.

These are generally the only two methods of MockJupyter that the service developer should use directly. All tests should then interact with the mock Jupyter service through NubladoClient, possibly with execution output mocked via registration.