# Welcome to the Crypticorn Hive AI Tutorial Notebook

This notebook is designed to guide you through interacting programmatically with Crypticorn's Hive AI. Here, you'll find everything you need to get started with essential functionalities, including setting up your model, managing data, and running evaluations.

The functionality covered here is deliberately simple, but it is sufficient to set up and evaluate your model. For detailed information about Hive AI itself, see the [documentation](https://docs.crypticorn.com/crypticorn-token-aic/crypticorns-technology/institutions-developer). For the Python SDK, refer to the [Python Docs](https://docs.crypticorn.dev/sdk/python). We will use the Python client with the pydantic response type throughout this tutorial. This guide covers the following steps:

1. **[Getting started](#scrollTo=Yi4mjFrmdRak)**
2. **[Creating a model](#scrollTo=KLdVDy3sdewi)**
3. **[Downloading data](#scrollTo=HVzQLCS-dkcm)**
4. **[Exploring the data](#scrollTo=IrmeDKexdpqd)**
5. **[Training your model](#scrollTo=NCyDE3cCj8Np)**
6. **[Evaluating your model](#scrollTo=fn_m7UG4maN0)**

> **Note:** You can also create models and download data directly from the Crypticorn dashboard. However, this notebook keeps the whole workflow in one place, which matters because model evaluation can only be done programmatically.


---



> **Note:** We recommend using this notebook with the Jupyter extension for VS Code ([VSCode Marketplace](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter)).

# Getting started

Install the crypticorn package from PyPI...

In [None]:
# Installs pandas alongside crypticorn.
%pip install "crypticorn[extra]"

...and import the `AsyncClient` class and additional packages.

In [59]:
from crypticorn import AsyncClient
import pandas as pd
from sklearn.linear_model import LogisticRegression, LinearRegression

If you haven't set up an API key yet, generate one in the *Account > Developer* section of the dashboard and select all scopes containing _hive_.

In [17]:
KEY = 'YOUR-API-KEY'
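If you would rather not paste the key into the notebook, one option is to read it from an environment variable. The sketch below is only an illustration: the helper `load_api_key` and the variable name `CRYPTICORN_API_KEY` are assumptions made for this example and are not defined by the SDK.

```python
import os


def load_api_key(env_var: str = "CRYPTICORN_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice for this sketch;
    use whatever name fits your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key


# Usage in the notebook (uncomment once the variable is set):
# KEY = load_api_key()
```

This keeps the key out of the saved notebook, which helps if you share the file or commit it to version control.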

To get started, instantiate the client class. For this tutorial, we'll call the instance `aic`.


In [19]:
aic = AsyncClient(api_key=KEY)

# Creating a model


With our `aic` instance ready, we can now create our first model. In this example, we'll create a model with the coin identifier `1` and set the target to `"Tatooine"`.


> **Optional:** Want to train your model on a different coin or target? The following code fetches the latest data options. Run it even if you keep the defaults, because the next cell reads `options`.


In [None]:
options = await aic.hive.data.get_data_info()
print(f"COINS: {options.coins}")
print(f"TARGETS: {options.targets}")
print(f"FEATURE SIZES: {options.feature_sizes}")
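The cells in this notebook call `await` at the top level, which Jupyter supports. In a plain Python script the same calls need to run inside a coroutine started with `asyncio.run`. The sketch below shows that pattern with a stand-in coroutine; `fetch_options` is hypothetical and only mimics an SDK call, and its return value is a placeholder built from the coin and target named in this tutorial.

```python
import asyncio


async def fetch_options() -> dict:
    # Hypothetical stand-in for an awaitable SDK call such as get_data_info().
    await asyncio.sleep(0)
    return {"coins": ["1"], "targets": ["Tatooine"]}


async def main() -> dict:
    # Inside a coroutine, `await` works exactly as in the notebook cells.
    options = await fetch_options()
    return options


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs main() to completion, and closes the loop.
    print(asyncio.run(main()))
```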

In [None]:
from crypticorn.hive import Target, Coins, TargetType
TARGET = Target.TATOOINE
TARGET_TYPE = [target.type for target in options.targets if target.name == TARGET][0]
COIN = Coins.ENUM_1
print(f"{TARGET} is of type {TARGET_TYPE}")
print(f"{TargetType.BINARY.value} targets are used for classification, {TargetType.CONTINUOUS.value} targets are used for regression")

In [None]:
from crypticorn.hive import ModelCreate
import random
name = "unique-model-" + str(random.randint(0, 1000000))
my_model = await aic.hive.models.create_model(ModelCreate(coin_id=COIN, target=TARGET, name=name))
my_model
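The cell above derives the model name from `random.randint`, which can occasionally produce the same number twice. Assuming model names need to be unique, as the `unique-model-` prefix suggests, a UUID-based suffix makes a collision far less likely. The helper name `unique_model_name` is an assumption for this sketch.

```python
import uuid


def unique_model_name(prefix: str = "unique-model") -> str:
    """Build a model name with a random 12-character hexadecimal suffix."""
    return f"{prefix}-{uuid.uuid4().hex[:12]}"


print(unique_model_name())
```

You could then pass the result as the `name` argument when creating the model.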

Once the model has been created successfully, store its `id` in a constant. This lets you reference the model easily later in the tutorial.


In [51]:
MODEL_ID = my_model.id

# Downloading data

Next, we will download the data for the model we just created. You can specify a particular `version` of the data (or leave it empty to receive the latest version). Additionally, you can define the `feature_size`, which determines the number of features in the dataset. The default setting is `"large"`.

In [41]:
from crypticorn.hive import FeatureSize, DataVersion
FEATURE_SIZE = FeatureSize.SMALL

> **Optional:** To check the available data versions and feature sizes, run the following code snippet.
The output indicates that version `"x"` contains datasets for `coin_id` `"y"` and `"z"`, which are available in feature sizes `"a"`, `"b"`, and `"c"`.

In [None]:
options.data

To keep things simple, we'll use the latest data version and set `feature_size` to `"small"`. To experiment with other settings, uncomment one of the alternative lines instead.

> **Important:** If you download, train, and evaluate your model on an older data version (i.e., you specify a version other than the latest), the results won't be saved or appear on the leaderboard. This might still be useful for testing your model with additional data.

In [None]:
await aic.hive.data.download_data(model_id=MODEL_ID, feature_size=FEATURE_SIZE)
# await aic.hive.data.download_data(model_id=MODEL_ID, version=DataVersion.ENUM_1_DOT_0) # version 1.0, default feature size
# await aic.hive.data.download_data(model_id=MODEL_ID, feature_size=FeatureSize.MEDIUM) # latest version, feature size 'medium'
# await aic.hive.data.download_data(model_id=MODEL_ID) # latest version, default feature size
# await aic.hive.data.download_data(model_id=MODEL_ID, folder="custom_folder") # latest version, default feature size, custom folder

After a successful download, the data will be stored in the **Files** tab of this notebook. Please be patient, as it may take some time for the data to load completely.

You can check the **Files** tab to verify that the data files are present.
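If you prefer to check from code, a small helper can list which of the expected files are missing. This is a sketch under the assumption that the download produces the three file names used in the loading cell of the next section (`x_train_*`, `x_test_*`, `y_train_*`); the helper name `missing_data_files` is made up for this example.

```python
from pathlib import Path


def missing_data_files(data_dir: str, feature_size: str, target: str) -> list[str]:
    """Return the expected feather files that are not present in data_dir."""
    expected = [
        f"x_train_{feature_size}.feather",
        f"x_test_{feature_size}.feather",
        f"y_train_{target}.feather",
    ]
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


# Usage in the notebook (adjust the path to where your data was saved):
# print(missing_data_files("data/v1.0/coin_1/", FEATURE_SIZE.value, TARGET.value))
```

An empty list means all three files were found.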

# Exploring the data

Now that we have downloaded the data, it's time to get an overview. You can load the data into a DataFrame and display its structure, including the first few rows, to understand its contents better.

In [84]:
data_dir = 'data/v1.0/coin_1/' # change to the actual path
target = TARGET.value # replace with the actual target if you used a different one
feature_size = FEATURE_SIZE.value # replace with the actual feature size if you used a different one

X_train = pd.read_feather(data_dir + f'x_train_{feature_size}.feather')
X_test = pd.read_feather(data_dir + f'x_test_{feature_size}.feather')
y_train = pd.read_feather(data_dir + f'y_train_{target}.feather')

In [None]:
print(f"X_train shape: {X_train.shape}")
print(f"X_train columns: {X_train.columns.tolist()}")
print(f"X_train head:\n{X_train.head()}")
print(f"X_train describe:\n{X_train.describe()}")

In [None]:
print(f"X_test shape: {X_test.shape}")
print(f"X_test columns: {X_test.columns.tolist()}")
print(f"X_test head:\n{X_test.head()}")
print(f"X_test describe:\n{X_test.describe()}")

In [None]:
print(f"y_train shape: {y_train.shape}")
print(f"y_train columns: {y_train.columns.tolist()}")
print(f"y_train head:\n{y_train.head()}")
print(f"y_train describe:\n{y_train.describe()}")
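Beyond printing shapes and summaries, it can help to check a few basic properties before training. The sketch below assumes that `X_train` and `y_train` should have the same number of rows and that `X_train` and `X_test` share the same columns; the helper name `summarize_split` is an assumption for this example.

```python
import pandas as pd


def summarize_split(X_train: pd.DataFrame, X_test: pd.DataFrame, y_train: pd.DataFrame) -> dict:
    """Collect basic consistency checks and NaN counts for the three frames."""
    return {
        # Features and targets must line up row by row for fitting.
        "rows_match": len(X_train) == len(y_train),
        # A model fitted on X_train expects the same columns at prediction time.
        "columns_match": list(X_train.columns) == list(X_test.columns),
        "nan_counts": {
            "X_train": int(X_train.isna().sum().sum()),
            "X_test": int(X_test.isna().sum().sum()),
            "y_train": int(y_train.isna().sum().sum()),
        },
    }


# Usage in the notebook:
# print(summarize_split(X_train, X_test, y_train))
```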

# Training your model

Now, let's dive into the most important part—training the actual model! While the tutorial model will not be magical, your implementation can be.

For demonstration purposes, we will train a simple regression model to predict our continuous target, `"Tatooine"`.


In [None]:
model = LinearRegression()
model.fit(X_train, y_train.values.ravel())
y_pred = model.predict(X_test)
y_pred_df = pd.DataFrame(y_pred, columns=['Prediction'])
print("Your prediction has {} rows".format(len(y_pred_df)))
y_pred_df.head()
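If you picked a binary target instead, a classifier such as `LogisticRegression` (already imported at the top of this notebook) is the natural counterpart. The sketch below uses synthetic data so that it runs on its own; the feature names and the `*_demo_*` variables are invented for illustration. Whether Hive AI expects class labels or probabilities for binary targets is not stated in this tutorial, so treat the label output here as an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: three features and a binary label derived from them.
rng = np.random.default_rng(0)
X_demo_train = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])
y_demo_train = (X_demo_train["f1"] + X_demo_train["f2"] > 0).astype(int)
X_demo_test = pd.DataFrame(rng.normal(size=(50, 3)), columns=["f1", "f2", "f3"])

# Fit the classifier and collect the predicted class labels in a single column.
clf = LogisticRegression()
clf.fit(X_demo_train, y_demo_train)
demo_pred_df = pd.DataFrame(clf.predict(X_demo_test), columns=["Prediction"])
print(f"Your prediction has {len(demo_pred_df)} rows")
```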

> **IMPORTANT:** Ensure that your predictions contain only one column, and remember to fill or fix any NaN values (DO NOT drop them). You can name the column whatever you prefer.
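The requirements above can be checked before submitting. The sketch below is one possible approach: it verifies that there is exactly one column, that the row count matches an expected value (assumed here to be `len(X_test)`), and fills NaN values instead of dropping them. The helper name `prepare_predictions` and the default fill value of `0.0` are assumptions; choose a fill strategy that suits your model.

```python
import pandas as pd


def prepare_predictions(
    y_pred_df: pd.DataFrame, expected_rows: int, fill_value: float = 0.0
) -> pd.DataFrame:
    """Validate the prediction frame and fill NaN values without dropping rows."""
    if y_pred_df.shape[1] != 1:
        raise ValueError(f"Expected exactly one column, got {y_pred_df.shape[1]}.")
    if len(y_pred_df) != expected_rows:
        raise ValueError(f"Expected {expected_rows} rows, got {len(y_pred_df)}.")
    # fillna keeps every row, so the predictions stay aligned with the test set.
    return y_pred_df.fillna(fill_value)


# Usage in the notebook:
# y_pred_df = prepare_predictions(y_pred_df, expected_rows=len(X_test))
```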

# Evaluating your model

Now that our predictions are ready, we can submit them to Hive AI for evaluation against the internal `y_test`. The evaluation returns detailed metrics on your model's performance.

> **Important:** If you trained on an older data version, specify that version when evaluating. Otherwise the predictions are evaluated against the latest version, which can distort the reported metrics. Leave the version empty if you used the latest data.


In [None]:
res = await aic.hive.models.evaluate_model(MODEL_ID, y_pred_df.to_dict(orient="records"))
res.model_dump()
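To see what the evaluation call above receives, it helps to look at what `to_dict(orient="records")` produces: a list with one dictionary per row, keyed by column name. The small frame below is illustrative only.

```python
import pandas as pd

# A three-row stand-in for y_pred_df with the same single column.
demo_df = pd.DataFrame([0.1, 0.2, 0.3], columns=["Prediction"])

payload = demo_df.to_dict(orient="records")
print(payload)
# [{'Prediction': 0.1}, {'Prediction': 0.2}, {'Prediction': 0.3}]
```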

> **Recommendation:** It is advisable to copy the output into a [JSON viewer](https://jsonviewer.stack.hu/) for a better overview. The output will contain either regression or classification metrics, depending on the type of target you are training.
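As an alternative to an external viewer, you can pretty-print the result inside the notebook with the standard `json` module. In the sketch below, the helper name `pretty_json` is an assumption and the `example` dictionary is a made-up placeholder, not real evaluation output.

```python
import json


def pretty_json(result: dict) -> str:
    """Format a dictionary as indented JSON; non-JSON values fall back to str()."""
    return json.dumps(result, indent=2, sort_keys=True, default=str)


# Placeholder data for illustration only, not actual Hive AI metrics.
example = {"model_id": 123, "metrics": {"placeholder_metric": 0.0}}
print(pretty_json(example))

# Usage in the notebook:
# print(pretty_json(res.model_dump()))
```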

That's it! Easy, right? Well, easy aside from generating world-class predictions! :D

You've successfully learned how to interact with Crypticorn's Hive AI, from setting up your environment to training and evaluating a machine learning model.

> **Note:** You can delete your demo/tutorial model in the developer settings on the dashboard if you no longer need it.

Happy coding!