{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Benchmarking a data-intensive operation (snow version)\n", "\n", "We're going to compute the zonal, time average of a product of two high-frequency fields from the CFSR dataset.\n", "\n", "Specifically we will compute\n", "\n", "$$[\\overline{vT}]$$\n", "\n", "for the month of January (one single year)\n", "\n", "I want to compare various ways of accelerating the calculation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compute environment\n", "\n", "I logged into https://jupyterlab.arcc.albany.edu, and spawned on `snow` with 8 cores and 64 GB memory\n", "\n", "This resource is only open to snow group members\n", "\n", "Python kernel is `daes_nov22`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Eager execution in memory\n", "\n", "First we do the \"normal\" style: open the datasets, execute the calculation immediately" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset>\n", "Dimensions: (time: 1460, lat: 361, lon: 720, lev: 32)\n", "Coordinates:\n", " * time (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lon (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", "Data variables:\n", " v (time, lev, lat, lon) float32 ...\n", "Attributes:\n", " description: v 1000-10 hPa\n", " year: 2021\n", " source: http://nomads.ncdc.noaa.gov/data.php?name=access#CFSR-data\n", " references: Saha, et. al., (2010)\n", " created_by: User: ab473731\n", " creation_date: Sat Jan 2 06:00:41 UTC 2021
<xarray.DataArray (lev: 32, lat: 361)>\n", "array([[-5.6487140e-03, -4.9008570e+00, 9.0137497e+01, ...,\n", " -2.2839991e+01, -1.6109011e+01, -5.6033172e-03],\n", " [-5.1496424e-02, -4.9253478e+00, 8.9656609e+01, ...,\n", " -1.0113486e+01, -8.7076464e+00, -5.2921850e-02],\n", " [-7.3452897e-02, -4.9121408e+00, 8.9197853e+01, ...,\n", " 8.5336580e+00, -6.0262263e-01, -6.8927042e-02],\n", " ...,\n", " [-2.8627560e-01, -2.4832463e-01, -1.9483636e-01, ...,\n", " 1.4787066e+01, 8.3816442e+00, -2.6661530e-01],\n", " [ 2.6218587e-01, 6.6386479e-01, 9.0554982e-01, ...,\n", " 3.2305588e+01, 1.7912203e+01, 2.3270084e-01],\n", " [-7.3782736e-01, 1.6531569e-01, 1.1146791e+00, ...,\n", " 9.4755917e+00, 6.2672334e+00, -6.2244219e-01]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0
<xarray.Dataset>\n", "Dimensions: (time: 1460, lat: 361, lon: 720, lev: 32)\n", "Coordinates:\n", " * time (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lon (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", "Data variables:\n", " t (time, lev, lat, lon) float32 dask.array<chunksize=(30, 1, 361, 720), meta=np.ndarray>\n", " v (time, lev, lat, lon) float32 dask.array<chunksize=(30, 1, 361, 720), meta=np.ndarray>\n", "Attributes:\n", " description: t 1000-10 hPa\n", " year: 2021\n", " source: http://nomads.ncdc.noaa.gov/data.php?name=access#CFSR-data\n", " references: Saha, et. al., (2010)\n", " created_by: User: ab473731\n", " creation_date: Sat Jan 2 06:00:24 UTC 2021
<xarray.DataArray (month: 12, lev: 32, lat: 361)>\n", "dask.array<transpose, shape=(12, 32, 361), dtype=float32, chunksize=(1, 1, 361), chunktype=numpy.ndarray>\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
<xarray.DataArray (lev: 32, lat: 361)>\n", "array([[-5.64870983e-03, -4.90085554e+00, 9.01375198e+01, ...,\n", " -2.28399925e+01, -1.61090107e+01, -5.60331950e-03],\n", " [-5.14964201e-02, -4.92534685e+00, 8.96566238e+01, ...,\n", " -1.01134815e+01, -8.70764828e+00, -5.29218502e-02],\n", " [-7.34528974e-02, -4.91214132e+00, 8.91978302e+01, ...,\n", " 8.53365612e+00, -6.02623105e-01, -6.89270422e-02],\n", " ...,\n", " [-2.86275655e-01, -2.48324722e-01, -1.94836378e-01, ...,\n", " 1.47870646e+01, 8.38164520e+00, -2.66615331e-01],\n", " [ 2.62185901e-01, 6.63864732e-01, 9.05549943e-01, ...,\n", " 3.23055954e+01, 1.79122047e+01, 2.32700869e-01],\n", " [-7.37827361e-01, 1.65315807e-01, 1.11467922e+00, ...,\n", " 9.47559261e+00, 6.26723337e+00, -6.22442186e-01]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " month int64 1
Client-974a185b-66f6-11ed-96c3-80000208fe80
\n", "Connection method: Cluster object | \n", "Cluster type: dask_jobqueue.SLURMCluster | \n", " \n", "
\n", " Dashboard: http://10.77.8.107:8787/status\n", " | \n", "\n", " |
407d9434
\n", "\n", " Dashboard: http://10.77.8.107:8787/status\n", " | \n", "\n", " Workers: 0\n", " | \n", "
\n", " Total threads: 0\n", " | \n", "\n", " Total memory: 0 B\n", " | \n", "
Scheduler-04bf2212-5777-4b86-9a7d-beee2a36c4b4
\n", "\n", " Comm: tcp://10.77.8.107:40409\n", " | \n", "\n", " Workers: 0\n", " | \n", "
\n", " Dashboard: http://10.77.8.107:8787/status\n", " | \n", "\n", " Total threads: 0\n", " | \n", "
\n", " Started: Just now\n", " | \n", "\n", " Total memory: 0 B\n", " | \n", "
<xarray.DataArray (time: 1460, lev: 32, lat: 361, lon: 720)>\n", "dask.array<mul, shape=(1460, 32, 361, 720), dtype=float32, chunksize=(30, 1, 361, 720), chunktype=numpy.ndarray>\n", "Coordinates:\n", " * time (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lon (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0
<xarray.DataArray (month: 12, lev: 32, lat: 361)>\n", "dask.array<transpose, shape=(12, 32, 361), dtype=float32, chunksize=(1, 1, 361), chunktype=numpy.ndarray>\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
<xarray.DataArray (lev: 32, lat: 361)>\n", "array([[-5.64870983e-03, -4.90085554e+00, 9.01375198e+01, ...,\n", " -2.28399925e+01, -1.61090107e+01, -5.60331950e-03],\n", " [-5.14964201e-02, -4.92534685e+00, 8.96566238e+01, ...,\n", " -1.01134815e+01, -8.70764828e+00, -5.29218502e-02],\n", " [-7.34528974e-02, -4.91214132e+00, 8.91978302e+01, ...,\n", " 8.53365612e+00, -6.02623105e-01, -6.89270422e-02],\n", " ...,\n", " [-2.86275655e-01, -2.48324722e-01, -1.94836378e-01, ...,\n", " 1.47870646e+01, 8.38164520e+00, -2.66615331e-01],\n", " [ 2.62185901e-01, 6.63864732e-01, 9.05549943e-01, ...,\n", " 3.23055954e+01, 1.79122047e+01, 2.32700869e-01],\n", " [-7.37827361e-01, 1.65315807e-01, 1.11467922e+00, ...,\n", " 9.47559261e+00, 6.26723337e+00, -6.22442186e-01]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " month int64 1
<xarray.DataArray (month: 12, lev: 32, lat: 361)>\n", "dask.array<transpose, shape=(12, 32, 361), dtype=float32, chunksize=(1, 32, 361), chunktype=numpy.ndarray>\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
<xarray.DataArray (lev: 32, lat: 361)>\n", "array([[-5.6487117e-03, -4.9008560e+00, 9.0137527e+01, ...,\n", " -2.2839994e+01, -1.6109011e+01, -5.6033176e-03],\n", " [-5.1496424e-02, -4.9253478e+00, 8.9656616e+01, ...,\n", " -1.0113482e+01, -8.7076483e+00, -5.2921850e-02],\n", " [-7.3452897e-02, -4.9121413e+00, 8.9197830e+01, ...,\n", " 8.5336571e+00, -6.0262257e-01, -6.8927050e-02],\n", " ...,\n", " [-2.8627566e-01, -2.4832460e-01, -1.9483612e-01, ...,\n", " 1.4787064e+01, 8.3816442e+00, -2.6661530e-01],\n", " [ 2.6218590e-01, 6.6386473e-01, 9.0554994e-01, ...,\n", " 3.2305595e+01, 1.7912203e+01, 2.3270084e-01],\n", " [-7.3782736e-01, 1.6531578e-01, 1.1146792e+00, ...,\n", " 9.4755926e+00, 6.2672329e+00, -6.2244225e-01]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " month int64 1
<xarray.DataArray (time: 1460, lev: 32, lat: 361, lon: 720)>\n", "dask.array<mul, shape=(1460, 32, 361, 720), dtype=float32, chunksize=(124, 32, 361, 720), chunktype=numpy.ndarray>\n", "Coordinates:\n", " * time (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lon (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0
<xarray.DataArray (month: 12, lev: 32, lat: 361)>\n", "dask.array<transpose, shape=(12, 32, 361), dtype=float32, chunksize=(2, 32, 361), chunktype=numpy.ndarray>\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
<xarray.DataArray (lev: 32, lat: 361)>\n", "array([[-5.6487140e-03, -4.9008555e+00, 9.0137527e+01, ...,\n", " -2.2839994e+01, -1.6109011e+01, -5.6033167e-03],\n", " [-5.1496420e-02, -4.9253469e+00, 8.9656624e+01, ...,\n", " -1.0113482e+01, -8.7076483e+00, -5.2921850e-02],\n", " [-7.3452890e-02, -4.9121413e+00, 8.9197838e+01, ...,\n", " 8.5336561e+00, -6.0262299e-01, -6.8927050e-02],\n", " ...,\n", " [-2.8627566e-01, -2.4832465e-01, -1.9483635e-01, ...,\n", " 1.4787064e+01, 8.3816452e+00, -2.6661530e-01],\n", " [ 2.6218587e-01, 6.6386473e-01, 9.0554988e-01, ...,\n", " 3.2305595e+01, 1.7912203e+01, 2.3270085e-01],\n", " [-7.3782736e-01, 1.6531581e-01, 1.1146793e+00, ...,\n", " 9.4755907e+00, 6.2672334e+00, -6.2244219e-01]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " month int64 1