Another sampling technique worth knowing: top-k sampling, which limits the model's choices at each step to the k most likely next tokens. This helps produce more coherent and contextually appropriate responses by cutting off the long tail of unlikely tokens. For instance, setting k to 50 restricts the model to its 50 highest-probability candidates at each step. Many inference APIs expose this as a top_k parameter alongside temperature — see my earlier post on temperature adjustment!
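
To make the mechanism concrete, here is a minimal sketch of top-k sampling in plain NumPy. The function name `top_k_sample` and the example logits are illustrative, not from any particular API: we keep only the k highest logits, renormalize them with a softmax, and sample from that truncated distribution.

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample a token index from only the k highest-scoring logits."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)
    # Indices of the k largest logits (order among them doesn't matter)
    top_idx = np.argsort(logits)[-k:]
    # Softmax over just those k logits; everything else gets probability 0
    top_logits = logits[top_idx]
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    return int(rng.choice(top_idx, p=probs))

# Toy vocabulary of 5 tokens; with k=3, tokens 0 and 3 can never be picked
logits = [0.1, 5.0, 3.0, -1.0, 4.0]
token = top_k_sample(logits, k=3)
```

With k equal to the vocabulary size this reduces to ordinary temperature-1 sampling; smaller k trades diversity for coherence.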