Great article!
One thing that occurs to me is that even if we had a list of traits or measurable attributes that probabilistically correspond with consciousness, we’d probably still have a Goodhart’s Law problem, due to the programmability of AI systems. E.g., SuperAI would just claim, “see... training run #8356 exhibited a less than 10% chance of consciousness!”
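To make the worry concrete, here’s a toy sketch in Python. Everything in it is made up (the two functions, the 0.5 weights, the “surface feature” knob, the 10% bar mirroring the one above); the point is just the shape of the failure: a lab tunes the cheap, visible knob until the published proxy clears the threshold, and the proxy’s correlation with the hidden quantity evaporates.

```python
# Toy Goodhart sketch: all names and numbers are hypothetical.
# "hidden_risk" stands in for the thing we actually care about;
# "proxy_metric" is the published measure that merely correlates with it.

def hidden_risk(design):
    # Unobservable ground truth: depends only on some deep property
    # of the system that outsiders can't measure directly.
    return design["deep_property"]

def proxy_metric(design):
    # Published metric: correlates with the hidden risk, but is also
    # sensitive to surface features a lab can cheaply adjust.
    return 0.5 * design["deep_property"] + 0.5 * design["surface_feature"]

# Start from a design where proxy and ground truth agree.
design = {"deep_property": 0.8, "surface_feature": 0.8}
print(f"before gaming: proxy={proxy_metric(design):.2f}, "
      f"hidden={hidden_risk(design):.2f}")

# "Optimize" only the cheap, visible knob until the proxy clears the bar.
while proxy_metric(design) >= 0.10 and design["surface_feature"] > -1.0:
    design["surface_feature"] -= 0.05

print(f"after gaming:  proxy={proxy_metric(design):.2f}, "
      f"hidden={hidden_risk(design):.2f}")
# The proxy now reads "below a 10% chance", but the hidden risk is
# unchanged: the correlation broke precisely because the metric
# became a target.
```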
I find this worrying because right now we seem pretty far from even having that list of proxy measures to begin with!
This is definitely a problem. However, some metrics might be harder to game. For example, suppose we could determine that a particular architecture was more likely to give rise to consciousness. It would be harder for a company to sneakily use that architecture, since an architecture is something auditors can inspect directly, unlike a self-reported score.
Something else that worries me about having quantifiable metrics is that people might directly optimize for them. We already see some researchers who want (seemingly because they think it's cool) to specifically design conscious systems. Absent regulation, metrics might just make this easier.