The hype cycle for Google’s fabulous new AI Co-Scientist tool, based on the Gemini LLM, includes a BBC headline about how José Penadés’ team at Imperial College asked the tool about a problem…
Perhaps it’s not exactly equivalent since this is an LLM, but from what I’ve learnt in my undergrad machine learning course, shouldn’t the test data be separate from the training data?
The train-test (or train-validate-test) split was one of the first few things we learnt to do.
Otherwise, the model can easily achieve 100% accuracy (or whatever the relevant metric is) simply by regurgitating its training data, which appears to be the case here.
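To make the regurgitation point concrete, here's a minimal toy sketch (hypothetical data and a deliberately dumb "model", not anything resembling Gemini): a model that simply memorizes its training examples scores perfectly when evaluated on the training set, which is exactly why the test set has to be held out.

```python
import random

# Hypothetical toy dataset: inputs paired with labels (label = x mod 3).
random.seed(0)
data = [(x, x % 3) for x in range(100)]
random.shuffle(data)

# Hold out 20% as a test set the "model" never sees during training.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# A "model" that memorizes its training data and guesses otherwise,
# mimicking an LLM regurgitating material it was trained on.
memory = {x: y for x, y in train}

def predict(x):
    return memory.get(x, random.choice([0, 1, 2]))

def accuracy(dataset):
    return sum(predict(x) == y for x, y in dataset) / len(dataset)

print(accuracy(train))  # 1.0 -- a perfect score by pure regurgitation
print(accuracy(test))   # much lower: memorization doesn't generalize
```

Evaluating only on `train` would report a flawless model; the held-out `test` set is what exposes that it has learnt nothing general. If the answer to a research question was already in the model's training corpus, asking the model that question is analogous to the first `print`, not the second.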
But that won’t trick investors into funding more of it.