This research tests whether GPT-4 can pass a UK undergraduate physics degree by applying it to all coursework and exams under a "maximal cheating" approach.
-----
https://arxiv.org/abs/2412.01312
🤔 Original Problem:
→ Traditional assessment methods in physics education face disruption from AI tools that can potentially handle complex academic tasks, raising concerns about academic integrity.
-----
🔧 Solution in this Paper:
→ The researchers tested GPT-4 on a complete BSc Physics curriculum at the University of Hull, including all examinations and coursework.
→ They employed a "maximal cheating" strategy, allowing question modification, breaking problems into sub-components, and using advanced prompting techniques.
→ The study evaluated performance across theoretical knowledge, computational tasks, laboratory work, and oral examinations.
-----
💡 Key Insights:
→ GPT-4 excels at coding tasks and single-step problems (85-100%)
→ Struggles with multi-step problems and interdisciplinary questions
→ Cannot handle laboratory work or oral examinations
→ Performs better in astronomy than classical mechanics
-----
📊 Results:
→ Overall grade: 65% (Upper Second Class)
→ Despite that mark, fails the degree because it cannot pass the laboratory components
→ Programming tasks: 85-100% success rate
→ Multi-step problems: 60-69% accuracy
→ Failed final project viva voce examination
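
To make the grade arithmetic concrete, here is a minimal Python sketch of how a 65% overall mark can sit in the Upper Second Class band and still end in a fail. It assumes the conventional UK honours bands (70+ First, 60-69 Upper Second, 50-59 Lower Second, 40-49 Third), which can vary by institution. The function name `classify` and the `required_passed` flag are hypothetical illustrations, not anything taken from the paper.

```python
def classify(overall_pct: float, required_passed: bool = True) -> str:
    """Map an overall percentage to a conventional UK honours band.

    Assumption: the band thresholds below are the commonly used ones;
    individual universities may apply different rules.
    """
    # A must-pass component (e.g. laboratory work) overrides the overall mark.
    if not required_passed:
        return "Fail (required component not passed)"

    bands = [
        (70, "First Class"),
        (60, "Upper Second Class"),
        (50, "Lower Second Class"),
        (40, "Third Class"),
    ]
    # Bands are ordered from highest to lowest, so the first match wins.
    for threshold, label in bands:
        if overall_pct >= threshold:
            return label
    return "Fail"


print(classify(65))                         # overall mark alone
print(classify(65, required_passed=False))  # same mark, laboratory work not passed
```

The second call shows the point of the results above: the overall percentage and the pass/fail outcome are decided separately.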