This is a demo Space for a version of SmolVLM fine-tuned on the RLAIF-V dataset.
The corresponding model is located here.
For a full tutorial on fine-tuning with DPO, check out this link.