unitorch.models.vit

ViTProcessor

Bases: HfImageClassificationProcessor

Processor for ViT-based image classification models.

Initializes the ViTProcessor.

Parameters:

Name	Type	Description	Default
`vision_config_path`	`str`	Path to the JSON configuration file from which the `ViTImageProcessor` is loaded.	required

Source code in src/unitorch/models/vit/processing.py

def __init__(
    self,
    vision_config_path: str,
):
    """
    Initializes the ViTProcessor.

    Args:
        vision_config_path (str): Path to the vision configuration file.
    """
    # Build the image processor from the preprocessing settings stored in the JSON file.
    vision_processor = ViTImageProcessor.from_json_file(vision_config_path)
    super().__init__(
        vision_processor=vision_processor,
    )
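As a minimal sketch of what `vision_config_path` might point to, the snippet below writes a small preprocessing JSON file and reads it back. The keys are standard `ViTImageProcessor` settings, but the values and the file name `preprocessor_config.json` are illustrative assumptions, not defaults taken from unitorch. The processor itself is only constructed if unitorch is installed.

```python
import json
import os
import tempfile

# Illustrative preprocessing settings (assumed values, not taken from unitorch).
preprocessor_config = {
    "do_resize": True,
    "size": {"height": 224, "width": 224},
    "do_rescale": True,
    "rescale_factor": 1 / 255,
    "do_normalize": True,
    "image_mean": [0.5, 0.5, 0.5],
    "image_std": [0.5, 0.5, 0.5],
}

# Write the settings to a JSON file that `vision_config_path` can point to.
# The file name is a hypothetical choice for this example.
config_dir = tempfile.mkdtemp()
vision_config_path = os.path.join(config_dir, "preprocessor_config.json")
with open(vision_config_path, "w", encoding="utf-8") as f:
    json.dump(preprocessor_config, f, indent=2)

# Read the file back to confirm it is valid JSON with the expected keys.
with open(vision_config_path, encoding="utf-8") as f:
    loaded_config = json.load(f)

# Construct the processor only if unitorch is available in this environment.
try:
    from unitorch.models.vit import ViTProcessor

    processor = ViTProcessor(vision_config_path=vision_config_path)
except ImportError:
    processor = None
```

In practice the JSON file would usually come from the checkpoint being used, so that preprocessing matches the settings the model was trained with.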

ViTForImageClassification

Bases: GenericModel

ViT model for image classification tasks.

Initializes the ViTForImageClassification model.

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the JSON file from which the `ViTConfig` is loaded.	required
`num_classes`	`Optional[int]`	Number of output classes, i.e. the output dimension of the classification head.	`1`

Source code in src/unitorch/models/vit/modeling.py

def __init__(
    self,
    config_path: str,
    num_classes: Optional[int] = 1,
):
    """
    Initializes the ViTForImageClassification model.

    Args:
        config_path (str): Path to the configuration file.
        num_classes (Optional[int]): Number of classes. Defaults to 1.
    """
    super().__init__()
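    # Load the ViT architecture hyperparameters from the JSON file.
    config = ViTConfig.from_json_file(config_path)

    # Encoder backbone plus a linear head mapping hidden_size -> num_classes.
    self.vit = ViTModel(config)
    self.classifier = nn.Linear(config.hidden_size, num_classes)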
    self.init_weights()
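The constructor above can be exercised with a small configuration file, sketched below under stated assumptions. The field names are standard `ViTConfig` fields, but the values are deliberately tiny and purely illustrative, and `classifier_param_count` is a hypothetical helper that only shows how `num_classes` determines the size of the linear head. The model is constructed only if unitorch is installed.

```python
import json
import os
import tempfile

# Illustrative ViT hyperparameters (assumed values, deliberately small).
vit_config = {
    "hidden_size": 64,
    "num_hidden_layers": 2,
    "num_attention_heads": 4,
    "intermediate_size": 128,
    "image_size": 32,
    "patch_size": 8,
    "num_channels": 3,
}

# Write the configuration to a JSON file that `config_path` can point to.
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(vit_config, f, indent=2)


def classifier_param_count(hidden_size: int, num_classes: int) -> int:
    """Parameter count of a linear layer hidden_size -> num_classes (weights plus biases)."""
    return hidden_size * num_classes + num_classes


num_classes = 10
head_params = classifier_param_count(vit_config["hidden_size"], num_classes)

# Construct the model only if unitorch is available in this environment.
try:
    from unitorch.models.vit import ViTForImageClassification

    model = ViTForImageClassification(config_path, num_classes=num_classes)
except ImportError:
    model = None
```

Because the backbone is built from the configuration alone, loading pretrained weights would be a separate step that this sketch does not cover.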

forward

forward(pixel_values: Tensor)

Forward pass of the ViTForImageClassification model.

Parameters:

Name	Type	Description	Default
`pixel_values`	`Tensor`	Input tensor of shape [batch_size, num_channels, height, width].	required

Returns:

Type	Description
`Tensor`	Output logits of shape [batch_size, num_classes].

Source code in src/unitorch/models/vit/modeling.py

def forward(
    self,
    pixel_values: torch.Tensor,
):
    """
    Forward pass of the ViTForImageClassification model.

    Args:
        pixel_values (torch.Tensor): Input tensor of shape [batch_size, num_channels, height, width].

    Returns:
        (torch.Tensor): Output logits of shape [batch_size, num_classes].
    """
    vision_outputs = self.vit(
        pixel_values=pixel_values,
    )
    # Index 1 of the ViTModel output is the pooled output, of shape [batch_size, hidden_size].
    pooled_output = vision_outputs[1]
    return self.classifier(pooled_output)
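To make the tensor shapes in the forward pass concrete, the sketch below traces them in plain Python. It assumes the standard ViT layout (non-overlapping square patches with one class token prepended to the sequence). `vit_shapes` is a hypothetical helper written for this example, and the sizes used (224x224 images, patch size 16, hidden size 768, 10 classes) are illustrative.

```python
from typing import Dict, Tuple


def vit_shapes(
    batch_size: int,
    num_channels: int,
    height: int,
    width: int,
    patch_size: int,
    hidden_size: int,
    num_classes: int,
) -> Dict[str, Tuple[int, ...]]:
    """Trace tensor shapes from pixel_values to logits for a standard ViT."""
    if height % patch_size != 0 or width % patch_size != 0:
        raise ValueError("height and width must be divisible by patch_size")

    # The image is split into a grid of non-overlapping patches.
    num_patches = (height // patch_size) * (width // patch_size)
    # One class token is prepended to the patch sequence.
    seq_len = num_patches + 1

    return {
        "pixel_values": (batch_size, num_channels, height, width),
        "last_hidden_state": (batch_size, seq_len, hidden_size),
        # The pooled output is one vector per image.
        "pooled_output": (batch_size, hidden_size),
        # The linear classifier maps hidden_size to num_classes.
        "logits": (batch_size, num_classes),
    }


shapes = vit_shapes(
    batch_size=2,
    num_channels=3,
    height=224,
    width=224,
    patch_size=16,
    hidden_size=768,
    num_classes=10,
)
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

Only the pooled vector reaches the classifier, so the logits shape depends on `batch_size` and `num_classes` but not on the image resolution.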