{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Benchmarking a data-intensive operation (snow version)\n", "\n", "We're going to compute the zonal, time average of a product of two high-frequency fields from the CFSR dataset.\n", "\n", "Specifically we will compute\n", "\n", "$$[\\overline{vT}]$$\n", "\n", "for the month of January (one single year)\n", "\n", "I want to compare various ways of accelerating the calculation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compute environment\n", "\n", "I logged into https://jupyterlab.arcc.albany.edu, and spawned on `snow` with 8 cores and 64 GB memory\n", "\n", "This resource is only open to snow group members\n", "\n", "Python kernel is `daes_nov22`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Eager execution in memory\n", "\n", "First we do the \"normal\" style: open the datasets, execute the calculation immediately" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset>\n",
       "Dimensions:  (time: 1460, lat: 361, lon: 720, lev: 32)\n",
       "Coordinates:\n",
       "  * time     (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lon      (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n",
       "Data variables:\n",
       "    v        (time, lev, lat, lon) float32 ...\n",
       "Attributes:\n",
       "    description:    v 1000-10 hPa\n",
       "    year:           2021\n",
       "    source:         http://nomads.ncdc.noaa.gov/data.php?name=access#CFSR-data\n",
       "    references:     Saha, et. al., (2010)\n",
       "    created_by:     User: ab473731\n",
       "    creation_date:  Sat Jan  2 06:00:41 UTC 2021
" ], "text/plain": [ "\n", "Dimensions: (time: 1460, lat: 361, lon: 720, lev: 32)\n", "Coordinates:\n", " * time (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lon (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", "Data variables:\n", " v (time, lev, lat, lon) float32 ...\n", "Attributes:\n", " description: v 1000-10 hPa\n", " year: 2021\n", " source: http://nomads.ncdc.noaa.gov/data.php?name=access#CFSR-data\n", " references: Saha, et. al., (2010)\n", " created_by: User: ab473731\n", " creation_date: Sat Jan 2 06:00:41 UTC 2021" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import xarray as xr\n", "\n", "files = ['/network/daes/cfsr/data/2021/v.2021.0p5.anl.nc',\n", " '/network/daes/cfsr/data/2021/t.2021.0p5.anl.nc']\n", "\n", "v_ds = xr.open_dataset(files[0])\n", "T_ds = xr.open_dataset(files[1])\n", "v_ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Computing the product $vT$ over one month involves taking the product of two arrays of size\n", "\n", "```\n", "124 x 32 x 361 x 720\n", "```\n", "\n", "The total number of grid points for the whole year is" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12143462400" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v_ds.time.size * v_ds.lev.size * v_ds.lat.size * v_ds.lon.size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The month of January alone is about $1\\times 10^9$ points.\n", "\n", "Using 32-bit real numbers (4 bytes per numbers), this means we're taking the product of two arrays of size around 4 GB each." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Timing the eager execution" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 56.9 s, sys: 7.19 s, total: 1min 4s\n", "Wall time: 1min 8s\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (lev: 32, lat: 361)>\n",
       "array([[-5.6487140e-03, -4.9008570e+00,  9.0137497e+01, ...,\n",
       "        -2.2839991e+01, -1.6109011e+01, -5.6033172e-03],\n",
       "       [-5.1496424e-02, -4.9253478e+00,  8.9656609e+01, ...,\n",
       "        -1.0113486e+01, -8.7076464e+00, -5.2921850e-02],\n",
       "       [-7.3452897e-02, -4.9121408e+00,  8.9197853e+01, ...,\n",
       "         8.5336580e+00, -6.0262263e-01, -6.8927042e-02],\n",
       "       ...,\n",
       "       [-2.8627560e-01, -2.4832463e-01, -1.9483636e-01, ...,\n",
       "         1.4787066e+01,  8.3816442e+00, -2.6661530e-01],\n",
       "       [ 2.6218587e-01,  6.6386479e-01,  9.0554982e-01, ...,\n",
       "         3.2305588e+01,  1.7912203e+01,  2.3270084e-01],\n",
       "       [-7.3782736e-01,  1.6531569e-01,  1.1146791e+00, ...,\n",
       "         9.4755917e+00,  6.2672334e+00, -6.2244219e-01]], dtype=float32)\n",
       "Coordinates:\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0
" ], "text/plain": [ "\n", "array([[-5.6487140e-03, -4.9008570e+00, 9.0137497e+01, ...,\n", " -2.2839991e+01, -1.6109011e+01, -5.6033172e-03],\n", " [-5.1496424e-02, -4.9253478e+00, 8.9656609e+01, ...,\n", " -1.0113486e+01, -8.7076464e+00, -5.2921850e-02],\n", " [-7.3452897e-02, -4.9121408e+00, 8.9197853e+01, ...,\n", " 8.5336580e+00, -6.0262263e-01, -6.8927042e-02],\n", " ...,\n", " [-2.8627560e-01, -2.4832463e-01, -1.9483636e-01, ...,\n", " 1.4787066e+01, 8.3816442e+00, -2.6661530e-01],\n", " [ 2.6218587e-01, 6.6386479e-01, 9.0554982e-01, ...,\n", " 3.2305588e+01, 1.7912203e+01, 2.3270084e-01],\n", " [-7.3782736e-01, 1.6531569e-01, 1.1146791e+00, ...,\n", " 9.4755917e+00, 6.2672334e+00, -6.2244219e-01]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time (v_ds.v.groupby(v_ds.time.dt.month)[1] * T_ds.t.groupby(T_ds.time.dt.month)[1]).mean(dim=('lon','time'))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "v_ds.close()\n", "T_ds.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lazy execution using Dask locally" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset>\n",
       "Dimensions:  (time: 1460, lat: 361, lon: 720, lev: 32)\n",
       "Coordinates:\n",
       "  * time     (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lon      (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n",
       "Data variables:\n",
       "    t        (time, lev, lat, lon) float32 dask.array<chunksize=(30, 1, 361, 720), meta=np.ndarray>\n",
       "    v        (time, lev, lat, lon) float32 dask.array<chunksize=(30, 1, 361, 720), meta=np.ndarray>\n",
       "Attributes:\n",
       "    description:    t 1000-10 hPa\n",
       "    year:           2021\n",
       "    source:         http://nomads.ncdc.noaa.gov/data.php?name=access#CFSR-data\n",
       "    references:     Saha, et. al., (2010)\n",
       "    created_by:     User: ab473731\n",
       "    creation_date:  Sat Jan  2 06:00:24 UTC 2021
" ], "text/plain": [ "\n", "Dimensions: (time: 1460, lat: 361, lon: 720, lev: 32)\n", "Coordinates:\n", " * time (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lon (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", "Data variables:\n", " t (time, lev, lat, lon) float32 dask.array\n", " v (time, lev, lat, lon) float32 dask.array\n", "Attributes:\n", " description: t 1000-10 hPa\n", " year: 2021\n", " source: http://nomads.ncdc.noaa.gov/data.php?name=access#CFSR-data\n", " references: Saha, et. al., (2010)\n", " created_by: User: ab473731\n", " creation_date: Sat Jan 2 06:00:24 UTC 2021" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = xr.open_mfdataset(files, chunks={'time':30, 'lev': 1})\n", "ds" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (month: 12, lev: 32, lat: 361)>\n",
       "dask.array<transpose, shape=(12, 32, 361), dtype=float32, chunksize=(1, 1, 361), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n",
       "  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vT = ds.v * ds.t\n", "meanfield = vT.mean(dim='lon').groupby(ds.time.dt.month).mean()\n", "meanfield" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1min 12s, sys: 5.86 s, total: 1min 18s\n", "Wall time: 1min 5s\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (lev: 32, lat: 361)>\n",
       "array([[-5.64870983e-03, -4.90085554e+00,  9.01375198e+01, ...,\n",
       "        -2.28399925e+01, -1.61090107e+01, -5.60331950e-03],\n",
       "       [-5.14964201e-02, -4.92534685e+00,  8.96566238e+01, ...,\n",
       "        -1.01134815e+01, -8.70764828e+00, -5.29218502e-02],\n",
       "       [-7.34528974e-02, -4.91214132e+00,  8.91978302e+01, ...,\n",
       "         8.53365612e+00, -6.02623105e-01, -6.89270422e-02],\n",
       "       ...,\n",
       "       [-2.86275655e-01, -2.48324722e-01, -1.94836378e-01, ...,\n",
       "         1.47870646e+01,  8.38164520e+00, -2.66615331e-01],\n",
       "       [ 2.62185901e-01,  6.63864732e-01,  9.05549943e-01, ...,\n",
       "         3.23055954e+01,  1.79122047e+01,  2.32700869e-01],\n",
       "       [-7.37827361e-01,  1.65315807e-01,  1.11467922e+00, ...,\n",
       "         9.47559261e+00,  6.26723337e+00, -6.22442186e-01]], dtype=float32)\n",
       "Coordinates:\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n",
       "    month    int64 1
" ], "text/plain": [ "\n", "array([[-5.64870983e-03, -4.90085554e+00, 9.01375198e+01, ...,\n", " -2.28399925e+01, -1.61090107e+01, -5.60331950e-03],\n", " [-5.14964201e-02, -4.92534685e+00, 8.96566238e+01, ...,\n", " -1.01134815e+01, -8.70764828e+00, -5.29218502e-02],\n", " [-7.34528974e-02, -4.91214132e+00, 8.91978302e+01, ...,\n", " 8.53365612e+00, -6.02623105e-01, -6.89270422e-02],\n", " ...,\n", " [-2.86275655e-01, -2.48324722e-01, -1.94836378e-01, ...,\n", " 1.47870646e+01, 8.38164520e+00, -2.66615331e-01],\n", " [ 2.62185901e-01, 6.63864732e-01, 9.05549943e-01, ...,\n", " 3.23055954e+01, 1.79122047e+01, 2.32700869e-01],\n", " [-7.37827361e-01, 1.65315807e-01, 1.11467922e+00, ...,\n", " 9.47559261e+00, 6.26723337e+00, -6.22442186e-01]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " month int64 1" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time meanfield.sel(month=1).load()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "ds.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion\n", "\n", "Similar to running on `batch`, we get about the same performance either way -- giving advantage to the lazy method, because the code is nicer to work with.\n", "\n", "Snow is slower than batch -- probably just because the cpu hardware is significantly older." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lazy execution using Dask and a distributed cluster\n", "\n", "Now we'll try the same thing but farm the computation out to a dask cluster.\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
\n", "
\n", "

Client

\n", "

Client-974a185b-66f6-11ed-96c3-80000208fe80

\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
Connection method: Cluster objectCluster type: dask_jobqueue.SLURMCluster
\n", " Dashboard: http://10.77.8.107:8787/status\n", "
\n", "\n", " \n", "
\n", "

Cluster Info

\n", "
\n", "
\n", "
\n", "
\n", "

SLURMCluster

\n", "

407d9434

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", " Dashboard: http://10.77.8.107:8787/status\n", " \n", " Workers: 0\n", "
\n", " Total threads: 0\n", " \n", " Total memory: 0 B\n", "
\n", "\n", "
\n", " \n", "

Scheduler Info

\n", "
\n", "\n", "
\n", "
\n", "
\n", "
\n", "

Scheduler

\n", "

Scheduler-04bf2212-5777-4b86-9a7d-beee2a36c4b4

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", " Comm: tcp://10.77.8.107:40409\n", " \n", " Workers: 0\n", "
\n", " Dashboard: http://10.77.8.107:8787/status\n", " \n", " Total threads: 0\n", "
\n", " Started: Just now\n", " \n", " Total memory: 0 B\n", "
\n", "
\n", "
\n", "\n", "
\n", " \n", "

Workers

\n", "
\n", "\n", " \n", "\n", "
\n", "
\n", "\n", "
\n", "
\n", "
\n", "
\n", " \n", "\n", "
\n", "
" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from dask_jobqueue import SLURMCluster\n", "from dask.distributed import Client\n", "\n", "cluster = SLURMCluster(processes=8, #By default Dask will run one Python process per job. \n", " # However, you can optionally choose to cut up that job into multiple processes\n", " cores=32, # size of a single job -- typically one node of the HPC cluster\n", " memory=\"256GB\", # snow has 8 GB ram per physical core I believe\n", " walltime=\"01:00:00\",\n", " queue=\"snow\",\n", " interface=\"ib0\",\n", " )\n", "cluster.scale(1)\n", "client = Client(cluster)\n", "client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Effects of different chunking" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (time: 1460, lev: 32, lat: 361, lon: 720)>\n",
       "dask.array<mul, shape=(1460, 32, 361, 720), dtype=float32, chunksize=(30, 1, 361, 720), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * time     (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lon      (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates:\n", " * time (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lon (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = xr.open_mfdataset(files, chunks={'time':30, 'lev': 1})\n", "vT = ds.v * ds.t\n", "vT" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (month: 12, lev: 32, lat: 361)>\n",
       "dask.array<transpose, shape=(12, 32, 361), dtype=float32, chunksize=(1, 1, 361), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n",
       "  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "meanfield = vT.mean(dim='lon').groupby(ds.time.dt.month).mean()\n", "meanfield" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 2.5 s, sys: 455 ms, total: 2.96 s\n", "Wall time: 41.9 s\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (lev: 32, lat: 361)>\n",
       "array([[-5.64870983e-03, -4.90085554e+00,  9.01375198e+01, ...,\n",
       "        -2.28399925e+01, -1.61090107e+01, -5.60331950e-03],\n",
       "       [-5.14964201e-02, -4.92534685e+00,  8.96566238e+01, ...,\n",
       "        -1.01134815e+01, -8.70764828e+00, -5.29218502e-02],\n",
       "       [-7.34528974e-02, -4.91214132e+00,  8.91978302e+01, ...,\n",
       "         8.53365612e+00, -6.02623105e-01, -6.89270422e-02],\n",
       "       ...,\n",
       "       [-2.86275655e-01, -2.48324722e-01, -1.94836378e-01, ...,\n",
       "         1.47870646e+01,  8.38164520e+00, -2.66615331e-01],\n",
       "       [ 2.62185901e-01,  6.63864732e-01,  9.05549943e-01, ...,\n",
       "         3.23055954e+01,  1.79122047e+01,  2.32700869e-01],\n",
       "       [-7.37827361e-01,  1.65315807e-01,  1.11467922e+00, ...,\n",
       "         9.47559261e+00,  6.26723337e+00, -6.22442186e-01]], dtype=float32)\n",
       "Coordinates:\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n",
       "    month    int64 1
" ], "text/plain": [ "\n", "array([[-5.64870983e-03, -4.90085554e+00, 9.01375198e+01, ...,\n", " -2.28399925e+01, -1.61090107e+01, -5.60331950e-03],\n", " [-5.14964201e-02, -4.92534685e+00, 8.96566238e+01, ...,\n", " -1.01134815e+01, -8.70764828e+00, -5.29218502e-02],\n", " [-7.34528974e-02, -4.91214132e+00, 8.91978302e+01, ...,\n", " 8.53365612e+00, -6.02623105e-01, -6.89270422e-02],\n", " ...,\n", " [-2.86275655e-01, -2.48324722e-01, -1.94836378e-01, ...,\n", " 1.47870646e+01, 8.38164520e+00, -2.66615331e-01],\n", " [ 2.62185901e-01, 6.63864732e-01, 9.05549943e-01, ...,\n", " 3.23055954e+01, 1.79122047e+01, 2.32700869e-01],\n", " [-7.37827361e-01, 1.65315807e-01, 1.11467922e+00, ...,\n", " 9.47559261e+00, 6.26723337e+00, -6.22442186e-01]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " month int64 1" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time meanfield.sel(month=1).load()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "ds.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Same thing, different chunking" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "ds = xr.open_mfdataset(files, chunks={'time':10, 'lev': 32})" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (month: 12, lev: 32, lat: 361)>\n",
       "dask.array<transpose, shape=(12, 32, 361), dtype=float32, chunksize=(1, 32, 361), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n",
       "  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "meanfield = (ds.v*ds.t).mean(dim='lon').groupby(ds.time.dt.month).mean()\n", "meanfield" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 812 ms, sys: 120 ms, total: 932 ms\n", "Wall time: 29.9 s\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (lev: 32, lat: 361)>\n",
       "array([[-5.6487117e-03, -4.9008560e+00,  9.0137527e+01, ...,\n",
       "        -2.2839994e+01, -1.6109011e+01, -5.6033176e-03],\n",
       "       [-5.1496424e-02, -4.9253478e+00,  8.9656616e+01, ...,\n",
       "        -1.0113482e+01, -8.7076483e+00, -5.2921850e-02],\n",
       "       [-7.3452897e-02, -4.9121413e+00,  8.9197830e+01, ...,\n",
       "         8.5336571e+00, -6.0262257e-01, -6.8927050e-02],\n",
       "       ...,\n",
       "       [-2.8627566e-01, -2.4832460e-01, -1.9483612e-01, ...,\n",
       "         1.4787064e+01,  8.3816442e+00, -2.6661530e-01],\n",
       "       [ 2.6218590e-01,  6.6386473e-01,  9.0554994e-01, ...,\n",
       "         3.2305595e+01,  1.7912203e+01,  2.3270084e-01],\n",
       "       [-7.3782736e-01,  1.6531578e-01,  1.1146792e+00, ...,\n",
       "         9.4755926e+00,  6.2672329e+00, -6.2244225e-01]], dtype=float32)\n",
       "Coordinates:\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n",
       "    month    int64 1
" ], "text/plain": [ "\n", "array([[-5.6487117e-03, -4.9008560e+00, 9.0137527e+01, ...,\n", " -2.2839994e+01, -1.6109011e+01, -5.6033176e-03],\n", " [-5.1496424e-02, -4.9253478e+00, 8.9656616e+01, ...,\n", " -1.0113482e+01, -8.7076483e+00, -5.2921850e-02],\n", " [-7.3452897e-02, -4.9121413e+00, 8.9197830e+01, ...,\n", " 8.5336571e+00, -6.0262257e-01, -6.8927050e-02],\n", " ...,\n", " [-2.8627566e-01, -2.4832460e-01, -1.9483612e-01, ...,\n", " 1.4787064e+01, 8.3816442e+00, -2.6661530e-01],\n", " [ 2.6218590e-01, 6.6386473e-01, 9.0554994e-01, ...,\n", " 3.2305595e+01, 1.7912203e+01, 2.3270084e-01],\n", " [-7.3782736e-01, 1.6531578e-01, 1.1146792e+00, ...,\n", " 9.4755926e+00, 6.2672329e+00, -6.2244225e-01]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " month int64 1" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time meanfield.sel(month=1).load()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "ds.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### And yet another chunking" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (time: 1460, lev: 32, lat: 361, lon: 720)>\n",
       "dask.array<mul, shape=(1460, 32, 361, 720), dtype=float32, chunksize=(124, 32, 361, 720), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * time     (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lon      (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates:\n", " * time (time) datetime64[ns] 2021-01-01 ... 2021-12-31T18:00:00\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lon (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = xr.open_mfdataset(files, chunks={'time':124, 'lev': 32})\n", "vT = ds.v * ds.t\n", "vT" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (month: 12, lev: 32, lat: 361)>\n",
       "dask.array<transpose, shape=(12, 32, 361), dtype=float32, chunksize=(2, 32, 361), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n",
       "  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "meanfield = vT.mean(dim='lon').groupby(ds.time.dt.month).mean()\n", "meanfield" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1.7 s, sys: 217 ms, total: 1.91 s\n", "Wall time: 1min 13s\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (lev: 32, lat: 361)>\n",
       "array([[-5.6487140e-03, -4.9008555e+00,  9.0137527e+01, ...,\n",
       "        -2.2839994e+01, -1.6109011e+01, -5.6033167e-03],\n",
       "       [-5.1496420e-02, -4.9253469e+00,  8.9656624e+01, ...,\n",
       "        -1.0113482e+01, -8.7076483e+00, -5.2921850e-02],\n",
       "       [-7.3452890e-02, -4.9121413e+00,  8.9197838e+01, ...,\n",
       "         8.5336561e+00, -6.0262299e-01, -6.8927050e-02],\n",
       "       ...,\n",
       "       [-2.8627566e-01, -2.4832465e-01, -1.9483635e-01, ...,\n",
       "         1.4787064e+01,  8.3816452e+00, -2.6661530e-01],\n",
       "       [ 2.6218587e-01,  6.6386473e-01,  9.0554988e-01, ...,\n",
       "         3.2305595e+01,  1.7912203e+01,  2.3270085e-01],\n",
       "       [-7.3782736e-01,  1.6531581e-01,  1.1146793e+00, ...,\n",
       "         9.4755907e+00,  6.2672334e+00, -6.2244219e-01]], dtype=float32)\n",
       "Coordinates:\n",
       "  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n",
       "  * lev      (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n",
       "    month    int64 1
" ], "text/plain": [ "\n", "array([[-5.6487140e-03, -4.9008555e+00, 9.0137527e+01, ...,\n", " -2.2839994e+01, -1.6109011e+01, -5.6033167e-03],\n", " [-5.1496420e-02, -4.9253469e+00, 8.9656624e+01, ...,\n", " -1.0113482e+01, -8.7076483e+00, -5.2921850e-02],\n", " [-7.3452890e-02, -4.9121413e+00, 8.9197838e+01, ...,\n", " 8.5336561e+00, -6.0262299e-01, -6.8927050e-02],\n", " ...,\n", " [-2.8627566e-01, -2.4832465e-01, -1.9483635e-01, ...,\n", " 1.4787064e+01, 8.3816452e+00, -2.6661530e-01],\n", " [ 2.6218587e-01, 6.6386473e-01, 9.0554988e-01, ...,\n", " 3.2305595e+01, 1.7912203e+01, 2.3270085e-01],\n", " [-7.3782736e-01, 1.6531581e-01, 1.1146793e+00, ...,\n", " 9.4755907e+00, 6.2672334e+00, -6.2244219e-01]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0\n", " * lev (lev) float32 1e+03 975.0 950.0 925.0 900.0 ... 50.0 30.0 20.0 10.0\n", " month int64 1" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time meanfield.sel(month=1).load()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "ds.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results\n", "\n", "Same as for batch, the smallest wall time for this calculation was achieved using the distributed cluster and a chunking strategy `chunks={'time':10, 'lev': 32}`.\n", "\n", "For this one-month calculation, using the distributed cluster **reduces our wall time from about 65 s down to about 30 s**. \n", "\n", "That's about a factor of two speedup. \n", "\n", "Compared to batch, snow is about 1.5x slower (30 seconds vs 20 seconds)." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "cluster.close()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "daes_nov22", "language": "python", "name": "daes_nov22" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 }