NubladoSpawner

class rubin.nublado.spawner.NubladoSpawner(*args, **kwargs)

Bases: Spawner

Spawner class that sends requests to the RSP lab controller.

Rather than having JupyterHub spawn labs directly and therefore need Kubernetes permissions to manage every resource that a user’s lab environment may need, the Rubin Science Platform manages all labs in a separate privileged lab controller process. JupyterHub makes RESTful HTTP requests to that service using either its own credentials or the credentials of the user.

See SQR-066 for the full design.

Notes

This class uses a single process-global shared httpx.AsyncClient to make all of its HTTP requests, rather than using one per instantiation of the spawner class. Each user gets their own spawner, so this approach allows all requests to share a connection pool.

This client is created on first use and never shut down. To be strictly correct, it should be closed properly when the JupyterHub process is exiting, but we haven’t yet figured out how to hook into the appropriate part of the JupyterHub lifecycle to do that.
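The create-on-first-use pattern described above can be sketched as follows. This is only an illustration under stated assumptions: FakeAsyncClient is a dependency-free stand-in for httpx.AsyncClient, and get_shared_client and PerUserSpawner are hypothetical names, not part of the NubladoSpawner API.

```python
from typing import Optional


class FakeAsyncClient:
    """Stand-in for httpx.AsyncClient so the sketch has no dependencies."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        # httpx.AsyncClient exposes aclose() for an orderly shutdown.
        self.closed = True


# Process-global client, created lazily and shared by every spawner.
_CLIENT: Optional[FakeAsyncClient] = None


def get_shared_client() -> FakeAsyncClient:
    """Return the shared client, creating it on first use."""
    global _CLIENT
    if _CLIENT is None:
        _CLIENT = FakeAsyncClient()
    return _CLIENT


class PerUserSpawner:
    """Hypothetical spawner; one instance exists per user."""

    @property
    def client(self) -> FakeAsyncClient:
        # Every instance hands back the same client and connection pool.
        return get_shared_client()
```

Two spawner instances created for different users would then share one client, which is what lets their requests reuse the same connection pool.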

Parameters:

Attributes Summary

admin_token_path

Path to the Gafaelfawr token for JupyterHub itself.

controller_url

Base URL for the Nublado lab controller.

Methods Summary

get_url()

Determine the URL of a running lab.

options_form(spawner)

Retrieve the options form for this user from the lab controller.

poll()

Check if the pod is running.

progress()

Monitor the progress of a spawn.

start()

Start the user’s pod.

stop()

Delete any running pod for the user.

Attributes Documentation

admin_token_path

Path to the Gafaelfawr token for JupyterHub itself.

This token is used to authenticate to the lab controller routes that JupyterHub is allowed to call directly, such as the routes to get lab status and to delete a lab.

controller_url

Base URL for the Nublado lab controller.

All URLs for talking to the Nublado lab controller will be constructed relative to this base URL.
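As a rough illustration of how these two attributes might be used together, the sketch below joins a route onto the base URL and builds an authentication header from the token file. The helper names, the example route, and the use of a bearer Authorization header are assumptions made for this example, not taken from the NubladoSpawner source.

```python
from pathlib import Path
from typing import Dict


def controller_route(controller_url: str, route: str) -> str:
    """Join a route onto the controller base URL with exactly one slash."""
    return controller_url.rstrip("/") + "/" + route.lstrip("/")


def read_admin_token(admin_token_path: str) -> str:
    """Read the token from disk, stripping surrounding whitespace."""
    return Path(admin_token_path).read_text().strip()


def auth_headers(token: str) -> Dict[str, str]:
    """Build request headers (assumes the token is sent as a bearer token)."""
    return {"Authorization": f"Bearer {token}"}


# Hypothetical base URL and route, for illustration only.
example = controller_route("http://controller.example/nublado/", "/labs/someuser")
```

Normalizing the slashes in one helper keeps the result the same whether or not the configured base URL ends with a slash.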

Methods Documentation

async get_url()

Determine the URL of a running lab.

Returns:

URL of the lab if we can retrieve it from the lab controller, otherwise the saved URL in the spawner object.

Return type:

str

Notes

JupyterHub recommends implementing this if the spawner has some independent way to retrieve the lab URL, since it allows JupyterHub to recover if it was killed in the middle of spawning a lab and that spawn finished successfully while JupyterHub was down. This method is only called if poll returns None.

JupyterHub does not appear to do any error handling of failures of this method, so it should not raise an exception. Instead, it falls back on the stored URL and lets the probe fail if that lab does not exist.
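The fallback behavior described in these notes could look roughly like the following. This is a sketch, not the actual implementation: resolve_lab_url and the two fake fetchers are hypothetical, and the fetch callable stands in for whatever request the spawner makes to the lab controller.

```python
import asyncio
from typing import Awaitable, Callable


async def resolve_lab_url(
    fetch_url: Callable[[], Awaitable[str]], stored_url: str
) -> str:
    """Prefer the controller's answer, but never raise."""
    try:
        return await fetch_url()
    except Exception:
        # Any failure falls back to the URL saved in the spawner object.
        return stored_url


async def controller_down() -> str:
    """Hypothetical fetcher simulating an unreachable lab controller."""
    raise ConnectionError("lab controller unreachable")


async def controller_up() -> str:
    """Hypothetical fetcher simulating a successful lookup."""
    return "http://lab.example:8888"
```

Catching the broad Exception class is deliberate here, since the notes say that no failure of this method is handled by the caller.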

async options_form(spawner)

Retrieve the options form for this user from the lab controller.

Parameters:

spawner (Spawner) – Another copy of the spawner (not used). It’s not clear why JupyterHub passes this into this method.

Raises:
Return type:

str

async poll()

Check if the pod is running.

Pods that are currently being terminated are reported as not running, since we want to allow the user to immediately begin spawning a new lab. If the new spawn request outraces the pod termination, it will simply join the wait for the lab termination to complete.

Returns:

If the pod is starting or running, return None. If the pod does not exist, is being terminated, or was successfully terminated, return 0. If the pod exists in a failed state, return 1.

Return type:

int or None

Raises:

ControllerWebError – Raised on failure to talk to the lab controller or a failure response from the lab controller.

Notes

In theory, this is supposed to be the exit status of the Jupyter lab process. This isn’t something we know in the classic sense since the lab is a Kubernetes pod. We only know that something failed if the record of the lab is hanging around in a failed state, so use a simple non-zero exit status for that. Otherwise, we have no way to distinguish between a pod that was shut down without error and a pod that was stopped, so use an exit status of 0 in both cases.
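The mapping from lab state to exit status can be sketched as a small pure function. The status strings used here are hypothetical placeholders, not the lab controller's real vocabulary, and terminating labs are treated as not running, following the description above.

```python
from typing import Optional


def exit_status_for(lab_status: Optional[str]) -> Optional[int]:
    """Translate a lab status into the value poll is expected to return.

    None means "still running" to JupyterHub; an integer is treated as
    the exit status of a lab that is no longer running.
    """
    if lab_status is None:
        # No record of a lab for this user.
        return 0
    if lab_status in ("pending", "running"):
        return None
    if lab_status in ("terminating", "terminated"):
        # Reported as stopped so that the user can spawn again right away.
        return 0
    if lab_status == "failed":
        return 1
    raise ValueError(f"unknown lab status: {lab_status}")
```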

async progress()

Monitor the progress of a spawn.

This method is the internal implementation of the progress API. It provides an iterator of spawn events and then ends when the spawn succeeds or fails.

Yields:

dict – Dictionary representing the event with fields progress, containing an integer completion percentage, and message, containing a human-readable description of the event.

Return type:

AsyncIterator[dict[str, int | str]]

Notes

This method must never raise exceptions, since those will be treated as unhandled exceptions by JupyterHub. If anything fails, just stop the iterator. The method makes no HTTP calls itself; it only monitors the events created by start.

Uses the internal _start_future attribute to track when the related start method has completed.
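One way to picture this arrangement is an async generator that drains a shared event list and stops once the start future is done. The sketch below is an assumption-laden illustration: follow_events, the event list, the polling interval, and fake_start are all hypothetical, and the real class may coordinate the two methods differently.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Union

Event = Dict[str, Union[int, str]]


async def follow_events(
    events: List[Event], start_future: "asyncio.Future", interval: float = 0.005
) -> AsyncIterator[Event]:
    """Yield events as they appear; end once start has finished."""
    next_index = 0
    while True:
        # Drain everything recorded since the last pass.
        while next_index < len(events):
            yield events[next_index]
            next_index += 1
        # done() is also true when start failed, so the iterator simply
        # ends in that case rather than raising.
        if start_future.done():
            return
        await asyncio.sleep(interval)


async def demo() -> List[Event]:
    events: List[Event] = []

    async def fake_start() -> str:
        # Stand-in for the spawn: records events, then returns a lab URL.
        for percent, message in [(10, "Creating pod"), (90, "Pod running")]:
            events.append({"progress": percent, "message": message})
            await asyncio.sleep(0.01)
        return "http://lab.example:8888"

    task = asyncio.create_task(fake_start())
    seen = [event async for event in follow_events(events, task)]
    await task
    return seen
```

Because the completion check happens only after the list has been drained, with no await in between, events recorded just before the task finishes are still delivered.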

start()

Start the user’s pod.

Initiates the pod start operation and then waits for the pod to spawn by watching the event stream, converting those events into the format expected by JupyterHub and returned by progress. Returns only when the pod is running and JupyterHub should start waiting for the lab process to start responding.

Returns:

Running task monitoring the progress of the spawn. This task will be started before it is returned. When the task is complete, it will return the cluster-internal URL of the running Jupyter lab process.

Return type:

asyncio.Task

Notes

The actual work is done in _start. This is a tiny wrapper to do bookkeeping on the event stream and record the running task so that progress can notice when the task is complete and return.

It is tempting to only initiate the pod spawn here, return immediately, and then let JupyterHub follow progress via the progress API. However, this is not what JupyterHub is expecting. The entire spawn process must happen before the start method returns for the configured timeouts to work properly; once start has returned, JupyterHub only allows a much shorter timeout for the lab to fully start.

Also, JupyterHub handles exceptions from start and correctly recognizes that the pod has failed to start, but exceptions from progress are treated as uncaught exceptions and cause the UI to break. Therefore, progress must never fail and all operations that may fail must be done in start.
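The wrapper arrangement described in these notes might be sketched as below. Only the names start, _start, and _start_future come from the documentation above; the class name, the _events list, and the body of _start are hypothetical, and the real _start talks to the lab controller rather than returning a fixed URL.

```python
import asyncio
from typing import Dict, List, Optional, Union


class StartSketch:
    """Hypothetical, heavily simplified spawner."""

    def __init__(self) -> None:
        self._events: List[Dict[str, Union[int, str]]] = []
        self._start_future: Optional["asyncio.Task"] = None

    def start(self) -> "asyncio.Task":
        # Bookkeeping: reset the event list and record the running task so
        # that a progress iterator can notice when the spawn has finished.
        self._events = []
        self._start_future = asyncio.create_task(self._start())
        return self._start_future

    async def _start(self) -> str:
        # Everything that may fail belongs here, so that errors surface
        # through the task returned by start.
        self._events.append({"progress": 50, "message": "Lab pod created"})
        await asyncio.sleep(0)
        return "http://lab.example:8888"


async def demo_start() -> str:
    spawner = StartSketch()
    # Awaiting the task waits for the whole spawn to finish.
    return await spawner.start()
```

Note that asyncio.create_task requires a running event loop, so start in this sketch must be called from within a coroutine or callback on that loop.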

async stop()

Delete any running pod for the user.

If the pod does not exist, treat that as success.

Raises:

ControllerWebError – Raised on failure to talk to the lab controller or a failure response from the lab controller.

Return type:

None
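The rule that a missing pod counts as success can be illustrated with a small status check. This sketch assumes that the lab controller signals a missing lab with HTTP 404, which the text above does not state, and ControllerError is a hypothetical stand-in for ControllerWebError.

```python
class ControllerError(Exception):
    """Hypothetical stand-in for ControllerWebError."""


def check_delete_status(status_code: int) -> None:
    """Interpret the HTTP status of a delete request to the controller."""
    if status_code == 404:
        # Assumed meaning: no lab exists, so there is nothing to delete.
        return
    if status_code >= 400:
        raise ControllerError(f"lab controller returned HTTP {status_code}")
```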