Usage

This library takes a task (such as face localization), computes the model's activations for the task data, finds the top percent of those activations, and can run the model with those activations ablated.

A task is a Pandas dataframe with two required columns and one optional column:

data: contains the data or the path to the data (you define how to fetch the data in model_forward later on).
positive: True or False. True for positive stimuli (e.g. faces); False for negative control stimuli (e.g. objects, in the face localizer case).
validation (optional): True or False. False means the row is part of localization training; True means it is part of an unseen validation set that can be used later to test performance.
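
As a minimal sketch of the column layout described above, a task can be built directly with pandas. The file paths are hypothetical placeholders, not files shipped with the library.

```python
import pandas as pd

# Hypothetical image paths; how they are opened is up to your model_forward.
task = pd.DataFrame(
    {
        "data": [
            "faces/001.jpg",
            "faces/002.jpg",
            "objects/001.jpg",
            "objects/002.jpg",
        ],
        # True for positive stimuli (faces), False for control stimuli (objects).
        "positive": [True, True, False, False],
        # Optional column: True marks rows held out for validation.
        "validation": [False, True, False, True],
    }
)

# Rows used for localization versus rows held out for later testing.
training_rows = task[~task["validation"]]
held_out_rows = task[task["validation"]]
```

Writing this dataframe to a .parquet file (for example with `task.to_parquet(...)`, which needs a Parquet engine such as pyarrow installed) should produce a file that load_task can read.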

See face_data_viewer.ipynb to view an example task.

Next, see below how to operate on the task to localize your model.

Install

pip install deeplocalizer

or

uv add deeplocalizer

Import

from deeplocalizer import DeepLocalizer # or import other functions in API below

Example

For an example that uses all of the core and visualization functions, see the face localization on ResNet example: resnet34_example.ipynb.

API

Core

load_task

load_task(filename: str) -> tuple[pd.DataFrame, pd.DataFrame] | pd.DataFrame

Loads the task from a .parquet file (dataframe) with the columns data and positive, and optionally validation.

Parameters:

Name	Type	Description	Default
`filename`	`str`	Path to the task file. The underlying file must be in Parquet format.	required

Returns:

Type	Description
`tuple[pd.DataFrame, pd.DataFrame] \| pd.DataFrame`	If there is no `validation` column, returns the dataframe as is. If there is a `validation` column, splits the dataframe and returns (task, validation).

Source code in deeplocalizer/deeplocalizer.py

def load_task(filename: str) -> tuple[pd.DataFrame, pd.DataFrame] | pd.DataFrame:
    """Loads in the task from a .parquet file (dataframe) with the columns `data` and `positive` and optionally `validation`

    Parameters:
        filename: where the task is located as a .parquet file. Underlying file must be parquet

    Returns:
        If no `validation` column, just returns the dataframe. If there is a validation column splits the return into (task, validation).
    """
    assert os.path.exists(filename), "task file must exist"

    df = pd.read_parquet(filename)
    assert_is_task(df)

    if "validation" in df.columns:
        task = df[df["validation"] == False]
        validation = df[df["validation"] == True]
        return task, validation
    else:
        return df
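
Because load_task returns either a single dataframe or a (task, validation) tuple, calling code has to handle both shapes. The helper below is a small sketch of one way to do that; `unpack_task` is a hypothetical name and not part of the library.

```python
def unpack_task(result):
    """Normalize the two return shapes of load_task.

    Returns (task, validation), where validation is None when the
    task file had no validation column.
    """
    if isinstance(result, tuple):
        task, validation = result
        return task, validation
    return result, None


# Intended use (the filename is a placeholder):
# task, validation = unpack_task(load_task("face_task.parquet"))
```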

DeepLocalizer

DeepLocalizer(task: pd.DataFrame, layers_activations: list[torch.nn.Module], model_forward: ModelForwardFunc, save_activations_func: SaveActivationsFunc = default_save_activations, ablate_activations_func: AblateActivationsFunc = default_flat_idxs_ablate, ablate_factor: float = 0.0, batch_size: int = 32)

Parameters:

Name	Type	Description	Default
`task`	`pd.DataFrame`	Pandas dataframe with columns `data` and `positive` (bool).	required
`layers_activations`	`list[torch.nn.Module]`	List of torch modules/layers whose outputs are recorded as activations.	required
`model_forward`	`ModelForwardFunc`	Callback that, given a list of data, runs model inference. Can return a single output or a tuple of outputs.	required
`save_activations_func`	`SaveActivationsFunc`	Defines how to save activations from each of the layers_activations. By default saves the entire output.	`default_save_activations`
`ablate_activations_func`	`AblateActivationsFunc`	Defines how to ablate the activations from each of the layers_activations. By default ablates based on the flattened index.	`default_flat_idxs_ablate`
`ablate_factor`	`float`	Factor by which the values to be ablated are multiplied. Defaults to 0 (full ablation).	`0.0`
`batch_size`	`int`	Number of inputs passed into model_forward at a time.	`32`
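
To make the model_forward contract concrete, here is a minimal sketch using a toy model. The model, the shape of each data item (a list of four floats), and the choice of layers are all assumptions made for illustration; a real task would typically load images or text from the paths in the data column.

```python
import torch

# Toy stand-in model; any torch.nn.Module would take its place.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)
model.eval()


def model_forward(batch):
    """Receive a list of items from the data column and run inference."""
    # Assumption: every item is already a sequence of 4 floats.
    inputs = torch.stack([torch.as_tensor(x, dtype=torch.float32) for x in batch])
    with torch.no_grad():
        return model(inputs)


# Layers whose outputs should be recorded as activations.
layers = [model[0], model[2]]

# Constructor call following the signature documented above:
# localizer = DeepLocalizer(task, layers_activations=layers, model_forward=model_forward)
```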

compute_activations

compute_activations() -> DeepLocalizer

Computes activations using the provided model_forward. Activations must be populated before the other core functions can be used.

load_activations

load_activations(filename: str, device: torch.device = 'cpu') -> DeepLocalizer

Loads activations into the DeepLocalizer from a file on disk. Activations must be populated before the other core functions can be used.

Parameters:

Name	Type	Description	Default
`filename`	`str`	The filename of a .safetensors file in which each dict key is an integer formatted as a string.	required
`device`	`torch.device`	Device to load the tensors onto. Essentially passed through to torch.tensor(..., device).	`'cpu'`

save_activations

save_activations(filename: str)

Saves activations from the DeepLocalizer to a file on disk. They can be loaded later with load_activations.

Parameters:

Name	Type	Description	Default
`filename`	`str`	The filename to save the activations to.	required

top_percent_activations

top_percent_activations(top_percent: float, transform=lambda x: torch.abs(x)) -> tuple[list[int], list[int]]

regular_model_forward

regular_model_forward(df: pd.DataFrame = None) -> tuple[ModelForwardReturn, ModelForwardReturn]

Runs the original model, with no ablation, on the positive=True rows and on the positive=False rows of the dataframe.

Parameters:

Name	Type	Description	Default
`df`	`pd.DataFrame`	the task data to run model on. Must have columns `data` and `positive` (bool). Defaults to the task provided in the constructor.	`None`

Returns:

Type	Description
`tuple[ModelForwardReturn, ModelForwardReturn]`	(Model outputs for `positive=True` rows, Model outputs for `positive=False` rows)

ablate_model_forward

ablate_model_forward(ablate_activations: list[AblateIdxs], df: pd.DataFrame = None) -> tuple[ModelForwardReturn, ModelForwardReturn]

Runs the ablated model on the positive=True rows and on the positive=False rows of the dataframe.

Parameters:

Name	Type	Description	Default
`ablate_activations`	`list[AblateIdxs]`	List with one entry per layer, where each entry holds the flat indices of that layer's outputs to ablate.	required
`df`	`pd.DataFrame`	the task data to run model on. Must have columns `data` and `positive` (bool). Defaults to the task provided in the constructor.	`None`

Returns:

Type	Description
`tuple[ModelForwardReturn, ModelForwardReturn]`	(Model outputs for `positive=True` rows, Model outputs for `positive=False` rows)

Visualization

visualize_activations

visualize_activations(activations: list[torch.Tensor], grid=None, cmap='viridis')

Source code in deeplocalizer/deeplocalizer.py

def visualize_activations(activations: list[torch.Tensor], grid=None, cmap="viridis"):
    if grid is None:
        # no grid given: show each layer's activations in its own figure
        for i, a in enumerate(activations):
            plt.title(f"Layer {i}")
            plt.imshow(squarify(a), cmap=cmap, aspect="auto")
            plt.colorbar()
            plt.show()
    else:
        fig, axes = plt.subplots(*grid, figsize=(16, 9))
        fig.suptitle("Absolute activations")
        for i, ax in enumerate(axes.flat):
            a = activations[i]
            im = ax.imshow(squarify(a), cmap=cmap, aspect="auto")
            ax.set_title(f"Layer {i}")
            plt.colorbar(im, ax=ax)
            no_ticks(ax)
        plt.show()

visualize_top_per_layer

visualize_top_per_layer(top_idxs, activations, title='Percentage Top activations per layer')

Source code in deeplocalizer/deeplocalizer.py

def visualize_top_per_layer(
    top_idxs, activations, title="Percentage Top activations per layer"
):
    import seaborn as sns
    import numpy as np

    total_lengths = [prod(a.shape) for a in activations]
    percentages = (
        np.array([len(top_idxs[i]) / l for i, l in enumerate(total_lengths)]).reshape(
            (-1, 1)
        )
        * 100
    )
    labels = np.array([f"{p[0]:.2f}%" for p in percentages]).reshape(percentages.shape)

    plt.figure(figsize=(4, 6))
    ax = sns.heatmap(
        percentages,
        cmap="inferno",
        annot=labels,
        annot_kws={"fontsize": 10},
        fmt="s",
        linecolor="white",
        linewidths=1,
    )
    ax.set(
        title=title,
        xticklabels=[],
        xticks=[],
        ylabel="Layers",
    )

    plt.show()

visualize_top_activations

visualize_top_activations(top_idxs, top_values, activations, grid=(4, 4), title='Top % activations showing')

Source code in deeplocalizer/deeplocalizer.py

def visualize_top_activations(
    top_idxs, top_values, activations, grid=(4, 4), title="Top % activations showing"
):
    fig, axes = plt.subplots(*grid, figsize=(16, 9))
    fig.suptitle(title)
    for i, ax in enumerate(axes.flat):
        a = activations[i]
        top_idx = top_idxs[i]
        top_value = top_values[i]
        visualize_top_activation(ax, top_idx, top_value, a, title=f"Layer {i}")
    plt.show()

Types

ModelForwardFunc `module-attribute`

ModelForwardFunc = Callable[[list[Any]], ModelForwardReturn]

ModelForwardReturn `module-attribute`

ModelForwardReturn = Any | tuple

SaveActivationsFunc `module-attribute`

SaveActivationsFunc = Callable[[torch.Tensor], torch.Tensor]
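
A custom function of this type receives a layer output and returns the tensor to store. The sketch below averages over spatial dimensions to shrink what is saved. It assumes that 4-D outputs are laid out as (batch, channels, height, width), and `save_channel_means` is a hypothetical name. Whether the rest of the pipeline (for instance the default flat-index ablation) works with a reduced shape is not confirmed by this page, so treat it as illustrative only.

```python
import torch


def save_channel_means(output: torch.Tensor) -> torch.Tensor:
    """Keep one value per channel instead of the full feature map."""
    # Assumption: 4-D outputs are (batch, channels, height, width).
    if output.dim() == 4:
        return output.mean(dim=(2, 3))
    # Any other shape is stored unchanged.
    return output
```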

AblateActivationsFunc `module-attribute`

AblateActivationsFunc = Callable[[torch.Tensor, AblateIdxs, float], torch.Tensor]
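
A function of this type takes a layer output, the indices to ablate, and the ablate factor, and returns the modified output. The sketch below shows one way flat-index ablation could be written. It is not the library's default_flat_idxs_ablate, and it assumes the indices refer to the flattened form of the tensor exactly as it is passed in. `scale_flat_idxs` is a hypothetical name.

```python
import torch


def scale_flat_idxs(output: torch.Tensor, idxs, factor: float) -> torch.Tensor:
    """Multiply the selected flat positions of output by factor."""
    # Work on a flattened copy so the caller's tensor is left untouched.
    flat = output.clone().reshape(-1)
    # Accept either a tensor or a plain list of ints (the AblateIdxs type).
    idx_tensor = torch.as_tensor(idxs, dtype=torch.long, device=flat.device)
    flat[idx_tensor] = flat[idx_tensor] * factor
    return flat.reshape(output.shape)


# factor=0.0 zeroes the selected units; 0.5 would halve them instead.
ablated = scale_flat_idxs(torch.ones(2, 3), [1, 5], 0.0)
```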

AblateIdxs `module-attribute`

AblateIdxs = torch.Tensor | list[int]
