I don’t think forcing an answer is the source of the problem you’re describing. The source actually lies in the problems the AI is taught to solve and the data it is given to solve them.
In the case of medical image analysis, the problems are always very narrowly defined (e.g. segmenting the liver from an MRI image of scanner xyz made with protocol abc) and the training data is of very high quality. If the model is going to be used in the clinic, you also have to prove how well it works.
For modern AI chatbots the problem is: add one more word to the end of a sequence that starts with a system prompt; the data provided is whatever they could scrape off the internet; and the quality control is: if it sounds good, it is good.
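To make that concrete, here is a minimal sketch of greedy next-token decoding, assuming the Hugging Face transformers library and the public gpt2 checkpoint (the prompt is just an arbitrary example). The point is that the loop always emits a token at every step, whether or not the model has any grounded basis for it.

```python
# Minimal sketch of the chatbot objective: keep appending the single most
# likely next token. Assumes the transformers library and the "gpt2" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "You are a helpful assistant. The capital of Australia is"
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):                      # add ten more tokens, one at a time
        logits = model(ids).logits           # a score for every token in the vocabulary
        next_id = logits[0, -1].argmax()     # always pick *something*, even if unsure
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(ids[0]))
```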
Comparing the two problems, it is easy to see why AI chatbots are prone to hallucination.
The real power of the LLMs on the market is not as a glorified Google, but as foundation models that serve as pretraining for the actual problems people want to solve.
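As a rough illustration of that second use, here is a hedged sketch of fine-tuning a pretrained checkpoint on a narrow, labelled task using transformers and datasets; the checkpoint name, dataset, and hyperparameters are placeholder assumptions, not anything from the comment above.

```python
# Minimal sketch of the "foundation model" use: take a pretrained checkpoint and
# fine-tune it on a narrow, curated, labelled task -- the opposite of
# "whatever they could scrape off the internet".
# Checkpoint, dataset, and hyperparameters are placeholders, not recommendations.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"       # pretrained foundation model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# A small, task-specific, labelled dataset (IMDB sentiment here, purely as a stand-in).
train_data = load_dataset("imdb", split="train[:2000]")
train_data = train_data.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train_data,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
# Whether the result is good enough for production is then a question of
# evaluation on held-out data, not of whether the output "sounds good".
```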