SmolVLM-trl-dpo-rlaif-v Demo

This is a demo Space for a version of SmolVLM fine-tuned with DPO (via TRL) on the RLAIF-V dataset.

The corresponding model is located here.

For a full tutorial on fine-tuning with DPO, check out this link.
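
For readers new to DPO (Direct Preference Optimization), here is a minimal sketch of the per-example loss it optimizes. It is illustrative only and is not the training code behind this Space. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions made for this example. Each input is the summed log-probability of a full response under either the policy being trained or the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities (hypothetical helper)."""
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the chosen response.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # computed in a numerically stable way for either sign of the margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy prefers the chosen response more than the reference does: lower loss.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
    # Policy prefers the rejected response: higher loss.
    print(dpo_loss(-3.0, -1.0, -2.0, -2.0))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2. Training pushes the margin up, so the loss falls as the policy assigns relatively more probability to the preferred response.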