{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "93434031-d7fe-4322-a9cf-41e5b8be622d",
   "metadata": {},
   "source": [
    "# Custom Kernels with `apply_ufunc`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c0164c9-f970-4206-964a-d1d8080e2b0a",
   "metadata": {},
   "source": [
    "## Overview\n",
    "### In this tutorial, you learn:\n",
    "\n",
    "* What `apply_ufunc` is and its importance in the xarray Python library.\n",
    "* The basic usage of `apply_ufunc` to apply your function to a DataArray.\n",
    "* Applying custom kernels to DataArray with CuPy\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "| Concepts | Importance | Notes |\n",
    "| --- | --- | --- |\n",
    "| [Basics of Cupy](Notebook0_Introduction) | Necessary | |\n",
    "| [Familiarity with Xarray](https://foundations.projectpythia.org/core/xarray.html) | Necessary | |\n",
    "\n",
    "- **Time to learn**: 20 minutes\n",
    "\n",
    "\n",
    "\n",
    "## What is `apply_ufunc`? \n",
    "\n",
    "`apply_ufunc` is a powerful function provided by the xarray library, which is commonly used for data manipulation in the Python programming language. This function allows users to apply universal functions (ufuncs) on xarray data structures, including DataArray, Dataset, or Variable objects. With `apply_ufunc`, users can apply arbitrary functions that are compatible with raw arrays (e.g.  NumPy or CuPy), and the function will take care of aligning the input data, looping over dimensions, and maintaining metadata.\n",
    "\n",
    "```{seealso}\n",
    "See the [Xarray tutorial material on apply_ufunc](https://tutorial.xarray.dev/advanced/apply_ufunc/simple_numpy_apply_ufunc.html) for more.\n",
    "```\n",
    "\n",
    "\n",
    "Simple functions that act independently on each value should work without any additional arguments.\n",
    "\n",
    "### Simple Example \n",
    "\n",
    "In the example below, we calculate the saturation vapor pressure by using `apply_ufunc()`.\n",
    "\n",
    "But first, let's create some sample data to work with.\n",
    "\n",
    "We'll use a 3-dimensional dataset (time, latitude, longitude) with random values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "ed7a50f8-a9f1-4a32-b864-389a6be2f4fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "import cupy as cp\n",
    "import cupy_xarray  # Adds .cupy to Xarray objects\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import xarray as xr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "5f6f83a3-36d6-493d-bfda-49f5d766ef14",
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(0)\n",
    "\n",
    "# Create the time range.\n",
    "date = pd.date_range(\"2010-01-01\", \"2020-12-31\", freq=\"M\")\n",
    "\n",
    "# Create the latitude range.\n",
    "lat = np.arange(-90, 90, 1)\n",
    "\n",
    "# Create the longitude range.\n",
    "lon = np.arange(-180, 180, 1)\n",
    "\n",
    "# Create random data\n",
    "data_np = np.random.rand(len(date), len(lat), len(lon))\n",
    "data_cp = cp.array(data_np)\n",
    "\n",
    "# -- Create DataArray with Numpy data\n",
    "data_xr_np = xr.DataArray(\n",
    "    data_np,\n",
    "    dims=[\"time\", \"lat\", \"lon\"],\n",
    "    coords=[date, lat, lon],\n",
    ")\n",
    "\n",
    "# -- Create DataArray with CuPy data\n",
    "data_xr_cp = xr.DataArray(\n",
    "    data_cp,\n",
    "    dims=[\"time\", \"lat\", \"lon\"],\n",
    "    coords=[date, lat, lon],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94eb3898-97a9-419d-a158-158e92ca7c61",
   "metadata": {},
   "source": [
    "Now, let's define our function that calculate the saturation vapor pressure using Clausius-Clapeyron equation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "b11788f6-af4f-4cc9-8bd8-662d57fbf2a2",
   "metadata": {},
   "outputs": [],
   "source": [
    "def sat_p(t):\n",
    "    # return saturation vapor pressure\n",
    "    # using Clausius-Clapeyron equation\n",
    "    return 0.611 * np.exp(17.67 * (t - 273.15) * ((t - 29.65) ** (-1)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "e1c0b119-23f9-4397-957a-ace2a607ab6f",
   "metadata": {},
   "outputs": [],
   "source": [
    "start_time_np = time.time()\n",
    "\n",
    "es_np = xr.apply_ufunc(sat_p, data_xr_np)\n",
    "\n",
    "end_time_np = time.time()\n",
    "time_np = end_time_np - start_time_np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "53419533-005b-4ceb-9ae5-2deaa316c52a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "numpy.ndarray"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(es_np.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "973e4b7e-1b2f-4fbb-a4da-a7a47b1be327",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GroupBy with Xarray DataArrays using CuPy provides a 0.22 x speedup over NumPy.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "start_time_cp = time.time()\n",
    "\n",
    "es_cp = xr.apply_ufunc(sat_p, data_xr_cp)\n",
    "\n",
    "end_time_cp = time.time()\n",
    "time_cp = end_time_cp - start_time_cp\n",
    "\n",
    "print(\n",
    "    \"apply_ufunc with Xarray DataArrays using CuPy provides a\",\n",
    "    round(time_np / time_cp, 2),\n",
    "    \"x speedup over NumPy.\\n\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8c387d9-8c95-4559-a1a7-76c02219f1a9",
   "metadata": {},
   "source": [
    "Now, what is the output type? Does `apply_ufunc` preserve the underlying data type?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "a74d9141-c219-4341-93c3-bd4b1abbd80c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "cupy.ndarray"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(es_cp.data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ca7bf10-7919-41ad-acd8-5bc617b627ba",
   "metadata": {},
   "source": [
    "```{important}\n",
    "`apply_ufunc` preserves the underlying data type.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a1bdc2b-1d49-4204-8668-1cff5c521379",
   "metadata": {},
   "source": [
    "In the timing test, you might notice not much speed-up when using CuPy. But let's run this cell another time: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "665125da-3082-4f5d-994c-41d2eac8cca6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GroupBy with Xarray DataArrays using CuPy provides a 97.87 x speedup over NumPy.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "start_time_cp = time.time()\n",
    "\n",
    "es_cp = xr.apply_ufunc(sat_p, data_xr_cp)\n",
    "\n",
    "end_time_cp = time.time()\n",
    "time_cp = end_time_cp - start_time_cp\n",
    "\n",
    "print(\n",
    "    \"apply_ufunc with Xarray DataArrays using CuPy provides a\",\n",
    "    round(time_np / time_cp, 2),\n",
    "    \"x speedup over NumPy.\\n\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9ac9b58-8d04-444c-9f25-b4b098a48848",
   "metadata": {},
   "source": [
    "Now, we can see much more speed-up using CuPy as explained in the first lesson: \n",
    "\n",
    "> When running these functions for the first time, you may experience a brief pause. This occurs as CuPy compiles the CUDA functions for the first time and cached them on disk for future use."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc3c4f8c-2077-4cdd-b86d-3a6643e38a4d",
   "metadata": {},
   "source": [
    "## Elementwise Kernel with CuPy\n",
    "\n",
    "Elementwise Kernels in CuPy allow for operations to be performed on an element-by-element basis on CuPy arrays.\n",
    "\n",
    "To create an elementwise kernel in CuPy, you need to use `cupy.ElementwiseKernel` class. This class defines a CUDA kernel which can be invoked by the `__call__` method of the instance.  \n",
    "\n",
    "This elementwise kernel takes three arguments: \n",
    "* a string defining the input type(s), \n",
    "* a string defining the output type(s), \n",
    "* and a string representing the operation to be performed, written in C syntax."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0fdf747a-cfda-4836-b69b-9abafb3842a1",
   "metadata": {},
   "source": [
    "In this example, we want to calculate Relative Humidity using Revised Magnus coefficients by Alduchov and Eskridge."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc0b24b1-3143-4fcc-81e5-d119af100bd0",
   "metadata": {},
   "source": [
    "Revised Magnus Coefficients by Alduchov and Eskridge:\n",
    "$$\n",
    "RH = \\left(\\frac{{6.112 \\cdot \\exp\\left(\\frac{{17.67 \\cdot (T_d - 273.15)}}{{T_d - 29.65}}\\right)}}{{6.112 \\cdot \\exp\\left(\\frac{{17.67 \\cdot (T - 273.15)}}{{T - 29.65}}\\right)}}\\right) \\times 100 \\%\n",
    "$$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "a5727614-4289-45e0-b5fb-42997de34d61",
   "metadata": {},
   "outputs": [],
   "source": [
    "def calculate_relative_humidity(temp, dew_point):\n",
    "    \"\"\"\n",
    "    Calculate Relative Humidity using Revised Magnus coefficients by Alduchov and Eskridge.\n",
    "\n",
    "    Args:\n",
    "        temp (float): Temperature in Celsius.\n",
    "        dew_point (float): Dew Point Temperature in Celsius.\n",
    "\n",
    "    Returns:\n",
    "        float: Relative Humidity in percentage.\n",
    "    \"\"\"\n",
    "    temp += 273.15  # Convert temperature to Kelvin\n",
    "    dew_point += 273.15  # Convert dew point temperature to Kelvin\n",
    "\n",
    "    es_temp = 6.112 * np.exp(\n",
    "        (17.67 * (dew_point - 273.15)) / (dew_point - 29.65)\n",
    "    )  # Saturation vapor pressure at dew point\n",
    "    es_dew = 6.112 * np.exp(\n",
    "        (17.67 * (temp - 273.15)) / (temp - 29.65)\n",
    "    )  # Saturation vapor pressure at temperature\n",
    "\n",
    "    relative_humidity = (es_dew / es_temp) * 100.0  # Calculate relative humidity in percentage\n",
    "    return relative_humidity"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26d10b0a-1462-4fd9-9e53-5da2f36cd952",
   "metadata": {},
   "source": [
    "But for `Elementwise` kernels we need to write it in C syntax:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62a5996b-1a3d-443b-aa73-2594c52f96ba",
   "metadata": {},
   "source": [
    "1. Set the list of input and output arguments and their data types: \n",
    "\n",
    "    * input arguments : `float32 temp`, `float32 d_temp`\n",
    "    * output arguments : `float32 rh`\n",
    "\n",
    "2. Write the code body: \n",
    "``` C\n",
    "    temp += 273.15;\n",
    "    dew_point += 273.15;\n",
    "\n",
    "    // Calculate saturation vapor pressure at dew point\n",
    "    float es_temp = 6.112 * exp((17.67 * (dew_point - 273.15)) / (dew_point - 29.65));\n",
    "\n",
    "    // Calculate saturation vapor pressure at temperature\n",
    "    float es_dew = 6.112 * exp((17.67 * (temp - 273.15)) / (temp - 29.65));\n",
    "\n",
    "    // Calculate relative humidity in percentage\n",
    "    float relative_humidity = (es_dew / es_temp) * 100.0;\n",
    "    \n",
    "```\n",
    "\n",
    "3. Define the element-wise class and set the kernel name: \n",
    "\n",
    "```\n",
    "    compute_call = cp.ElementwiseKernel(input_list, output_list, code_body, 'RH')\n",
    "\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8289b506-5905-4b8d-b099-7d45dd819584",
   "metadata": {},
   "source": [
    "Now let's test to see how this works in a real example: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "id": "24f7fdbe-4c71-40b3-84af-564fd987f96b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create random data\n",
    "data_cp = 20 * (cp.random.rand(len(date), len(lat), len(lon)))\n",
    "\n",
    "\n",
    "# -- Create Temp DataArray with CuPy data\n",
    "temp = xr.DataArray(\n",
    "    data_cp,\n",
    "    dims=[\"time\", \"lat\", \"lon\"],\n",
    "    coords=[date, lat, lon],\n",
    ")\n",
    "\n",
    "\n",
    "offset = 20 * cp.random.rand(len(date), len(lat), len(lon))\n",
    "\n",
    "# -- Create Wet Bulb Temp DataArray with CuPy data\n",
    "\n",
    "temp_wet = xr.DataArray(\n",
    "    data_cp - offset,\n",
    "    dims=[\"time\", \"lat\", \"lon\"],\n",
    "    coords=[date, lat, lon],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "id": "1fe6bf87-8ea2-4df6-8033-1c6a8debfd85",
   "metadata": {},
   "outputs": [],
   "source": [
    "input_list = \"float64 temp, float64 dew_temp\"\n",
    "output_list = \"float64 rh\"\n",
    "\n",
    "code_body = \"\"\"\n",
    "\n",
    "            // Calculate saturation vapor pressure at dew point\n",
    "            float es_temp = 6.112 * exp((17.67 * (dew_temp)) / (dew_temp - 29.65));\n",
    "\n",
    "            // Calculate saturation vapor pressure at temperature\n",
    "            float es_dew = 6.112 * exp((17.67 * (temp)) / (temp - 29.65));\n",
    "\n",
    "            // Calculate relative humidity in percentage\n",
    "            rh = (es_dew / es_temp) * 100.0;\n",
    "            \"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "id": "6b5cc07c-0de3-4f87-874b-6c2fb82e6c46",
   "metadata": {},
   "outputs": [],
   "source": [
    "## -- define the elementwise kernel:\n",
    "compute_call = cp.ElementwiseKernel(input_list, output_list, code_body, \"RH\")\n",
    "\n",
    "kernel = compute_call(data_cp, data_cp - offset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "id": "3e794157-cd72-459d-a9a7-8818c9ed6b5b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 1.24 ms, sys: 0 ns, total: 1.24 ms\n",
      "Wall time: 1.24 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "result = xr.apply_ufunc(\n",
    "    compute_call,\n",
    "    temp,\n",
    "    temp_wet,\n",
    ")\n",
    "##result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1332e6e-ab34-4859-8b4b-9265eda4572d",
   "metadata": {},
   "source": [
    "How much this computation took if we wanted to use pure Python?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "id": "0030fec5-41fa-41bb-983b-b30ccc2ff734",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 3.97 ms, sys: 0 ns, total: 3.97 ms\n",
      "Wall time: 3.98 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "relative_humidity = calculate_relative_humidity(temp, temp_wet)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c6da1f0-48f2-473b-a09f-51869611c679",
   "metadata": {},
   "source": [
    "We can see using the custom kernel method, we removed the pure Python overhead in between calculations, by creating a custom \"elementwise\" kernel that will run the entire computations on the GPU device. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95dccf03-bb64-43ef-883e-9b84081c4514",
   "metadata": {},
   "source": [
    "Congratulations! You have now uncovered how to use `apply_ufunc` with custom CUDA kernels. \n",
    "\n",
    "## Summary\n",
    "\n",
    "In this notebook, we have learned about:\n",
    "\n",
    "* What `apply_ufunc` is and its importance in the xarray Python library.\n",
    "* The basic usage of `apply_ufunc` to apply your function to a DataArray.\n",
    "* Applying custom kernels to DataArray with CuPy\n",
    "\n",
    "```{seealso}\n",
    "[Xarray apply_ufunc](https://docs.xarray.dev/en/stable/generated/xarray.apply_ufunc.html)\n",
    "[CuPy User Guide](https://docs.cupy.dev/en/stable/user_guide/index.html)  \n",
    "[Xarray User Guide](https://docs.xarray.dev/en/stable/user-guide/index.html)  \n",
    "[Cupy-Xarray Github](https://github.com/xarray-contrib/cupy-xarray.git)\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}