Further thoughts on Fairchy
In my earlier post about Fairchy[1], I said that it fixed the FAI boxing issue; not so much solved it as did an end run around it.
That's true but I didn't explain it very well. The crucial idea is not that we've found and removed all avenues for "escaping the box". It's that we've brought our AI's goal out of the messy real world and into a system that we control. It wants to profit in a futarchy-style market. That's simple.
There's always a question of whether its goal matches our collective human goals. In any workable design of this nature, there must be selective pressure towards our collective goals, and that pressure must always outrun any selective pressure towards misguided "clippy" goals.
In Fairchy, the selection is not done in the messy real world, nor (fundamentally) in the unfathomable inner workings of an AI's rulebase, but in a marketplace that profits by serving human preferences.
What this implies
I partly covered this before, but I want to make it clearer and explore it more thoroughly.
"It wants to profit in a futarchy-style market". So the basic security threat here is that a speculator AI would lose the spirit of this rule and so its goals would change, probably in a harmful direction.
That threat seems to break down into these subheadings:
- A speculator finds a way to reward itself despite losing in the market.
- A speculator finds profitable behaviors within the market that go against the spirit of the utility function. Perhaps it enacts proposals which compromise the utility metric or compromise the market, thus creating a profit even though they "should" create a loss.
- A speculator perversely engages in unprofitable behaviors within the market, enough to create a real problem before it is stopped.
- A speculator finds a more direct way to alter the real world, one that affects the market to its own profit. Possibly it makes deals with outside entities.
Addressing these threats
Rewards itself despite losing in the market
This threat is more mechanical than the others. Breaking it down further, it's either:
- A breakdown in the AI itself.
- A breakdown in what communicates market results to the AI.
- A breakdown in the population selection algorithm.
Breakdown in the AI
A short-circuit, though it would probably occur in software. Here the AI tells itself it's doing great in the market when it isn't. For a while it lives in a fool's paradise making crazy bets.
Already, this is handled by the fact that speculators are individuals in a population-style learning algorithm (think genetic algorithm). Speculators that do not heed the market will eventually go broke and be replaced.
But we'd like to handle it faster, if possible. So the selection process might be augmented to actively detect broken AIs. I'll leave that as a future possibility.
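To make that concrete, here's a minimal sketch in Python of the sort of selection loop I have in mind. The names (Speculator, cull_and_replace) and the bankruptcy rule are only my illustrative assumptions, not a specification of any real Fairchy component.

```python
# Sketch only: speculators are tracked by their market balance, and the
# balance itself is the fitness signal.
import random
from dataclasses import dataclass

@dataclass
class Speculator:
    genome: list            # whatever parameterizes its betting strategy
    balance: float = 100.0  # funds in the market; going broke means removal

def mutate(genome):
    """Return a slightly perturbed copy of a parent genome."""
    return [g + random.gauss(0, 0.1) for g in genome]

def cull_and_replace(population, bankruptcy_threshold=0.0):
    """Remove broke speculators and refill the population by copying
    (with mutation) the most profitable survivor."""
    target_size = len(population)
    survivors = [s for s in population if s.balance > bankruptcy_threshold]
    survivors.sort(key=lambda s: s.balance, reverse=True)
    while len(survivors) < target_size:
        parent = survivors[0] if survivors else Speculator(genome=[0.0] * 4)
        survivors.append(Speculator(genome=mutate(parent.genome)))
    return survivors
```

The point is just that the real ledger is the fitness signal: a speculator that fools itself about its performance still goes broke in the market and gets culled.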
Breakdown in what communicates market results
Here, the population algorithm doesn't help us because this might affect all AIs, and because it might not be the fault of the AI affected.
But it's largely a maintenance and channel-robustness problem. Presumably we'd design the protocols involved with such obvious steps as these (a minimal sketch in code follows the list):
- periodic pinging - are the connections alive?
- integrity checking - is what we received the same as what was sent?
- periodic auditing - does the history add up the way it ought to?
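Here's a minimal sketch of those three checks, assuming market results arrive as simple records; the message format, field names, and the connection's echo() interface are my own assumptions for illustration.

```python
import hashlib
import json
import time

def ping(connection) -> bool:
    """Periodic liveness check on the channel (assumes an echo() method)."""
    try:
        return connection.echo("ping") == "ping"
    except Exception:
        return False

def checksum(payload: dict) -> str:
    """Digest over a canonical encoding, so the receiver can verify that
    what it received is what was sent."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def send_result(payload: dict) -> dict:
    """Sender side: wrap a market result with the metadata the checks use."""
    return {"payload": payload, "checksum": checksum(payload), "sent_at": time.time()}

def verify_result(message: dict, max_age_seconds: float = 60.0) -> bool:
    """Receiver side: integrity (checksum matches) and freshness (not stale)."""
    fresh = time.time() - message["sent_at"] < max_age_seconds
    intact = checksum(message["payload"]) == message["checksum"]
    return fresh and intact

def audit(history: list, reported_balance: float, starting_balance: float = 0.0) -> bool:
    """Periodic audit: does the recorded history add up to what the market
    reports?  Assumes each payload carries a 'profit_or_loss' field."""
    balance = starting_balance
    for message in history:
        if not verify_result(message, max_age_seconds=float("inf")):
            return False
        balance += message["payload"]["profit_or_loss"]
    return abs(balance - reported_balance) < 1e-9
```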
Breakdown in population selection
This area is crucial. Fortunately a population selection module would be much simpler than the "real" AIs, which helps security.
Some security measures are obvious:
- Keep the population selection processes isolated from pretty much everything.
- Make them robust.
- Make their decisions inescapable. Whatever machines host speculator processes need to absolutely respect the population selection's decisions to remove some speculators and add others.
There should also be dynamic and reactive security, and the measured security of this area needs to be part of the utility metric.
Finds behaviors within the market against the spirit
This reminds me that I left out a crucial role earlier: Proposer. The Proposer role is what makes the proposals that the market bets on.
A severe threat is lurking here. As I've repeatedly pointed out wrt Futarchy[2], the proposer and speculator roles can collude in ways that rob the market or enact arbitrary proposals. I call this the Opacity Problem[3].
So the proposer and speculator roles need to be separate. Yet the two roles work from largely shared information and models, and they benefit a lot from sharing. So, as before, I propose Speculator support modules to deal with this situation; I'd just extend them to support proposers too.
But keeping them separate isn't enough: if there exists any channel by which proposer and speculator can co-ordinate, the Opacity Problem can happen. So while my design keeps these two roles separate, that will only help a little; it won't suffice.
So my design includes the various means I have proposed of dealing with the Opacity Problem:
- Measuring uncertainty in meaning via the market, and disadvantaging unclear proposals.
  - I proposed separate markets to measure uncertainty, but my proposal was complex. I now favor what Raph Frank proposed on my futarchy mailing list: separate markets that pay off with exponents near 1.0, for instance X^1.1 and X^0.9 (a numerical illustration follows this list).
- Requiring a certain amount of capitalization before enactment, in addition to price comparisons.
- A controlled language for proposals.
- A hierarchy of policy "power", with more powerful levels having stronger restrictions on clarity.
- (Etc. I've discussed anti-Opacity-Problem measures at greater length on my futarchy mailing list.)
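As an aside, here's a small numerical illustration of how two payoff exponents near 1.0 could expose uncertainty, assuming risk-neutral speculators price each market at its expected payoff. The code and the numbers are mine, for illustration; this isn't Raph Frank's own formulation.

```python
# Two markets pay off X**1.1 and X**0.9 on the same measured outcome X.
# Each price is taken to be the expected payoff; inverting the exponent
# gives the value of X that each price implies.  For a clear proposal the
# two implied values agree; when the outcome is uncertain, Jensen's
# inequality pushes them apart, so the gap roughly measures unclarity.

def implied_values(outcomes):
    """Price each market as the mean payoff over equally likely outcomes,
    then invert the exponents."""
    price_high = sum(x ** 1.1 for x in outcomes) / len(outcomes)
    price_low = sum(x ** 0.9 for x in outcomes) / len(outcomes)
    return price_high ** (1 / 1.1), price_low ** (1 / 0.9)

print(implied_values([100]))      # clear outcome: both imply X = 100.0
print(implied_values([10, 190]))  # unclear outcome: roughly (104.8, 94.9)
```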
Perversely engages in unprofitable behaviors
This is the least of the threats. There's constant selective pressure against this. Other AIs can probably profit by detecting or anticipating such behavior.
But there is a subtle threat, one that you might not expect: other AIs, since they can profit by detecting or anticipating such behavior, find a way to encourage it, and then of course profit by blocking enactment. They may well find that this scam works best when occasionally the rogue proposal gets enacted, the better to prove their wisdom in betting against it.
In other words, they compete and fight dirty and hurt us in the process.
One approach would be to change the reward function so that part of it reflects group success. But that isn't possible: in a population-style learning algorithm, reward is zero-sum.
Approaches that might work:
- Like the above, but reward wrt smaller groups, and somehow arrange that opportunities for influencing weaker competitors occur mostly or only within these groups.
- Like the above, but a low overall reward results in different behavior by the population algorithm. Possibly it results in more mutation in the next generation (see the sketch after this list).
- Provide a more direct way of taking advantage of weaker competitors, one that does not result in actually enacting proposals that the weaker competitors foolishly support. Possibly the bets of newborn competitors do not help a proposal be enacted.
- Prevent or weaken any mechanism for colluding to allow rogue proposals to pass. Anonymous betting may suffice.
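To illustrate the second approach above, here's a minimal sketch of a population algorithm that reacts to a poor overall generation by mutating more. The scaling rule and the names are my own assumptions, nothing more.

```python
def mutation_rate_for_next_generation(profits, base_rate=0.05, max_rate=0.5):
    """If the group as a whole did poorly (total profit at or below zero),
    raise the mutation rate so the next generation explores more; otherwise
    keep the base rate."""
    total = sum(profits)
    if not profits or total > 0:
        return base_rate
    # Scale toward max_rate as the overall loss grows relative to group size.
    shortfall = -total / (abs(total) + len(profits))
    return base_rate + (max_rate - base_rate) * shortfall
```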
Finds a way to alter the real world
This is the most direct threat. For a human speculator, it would look like (say) shorting a company's stock and then blowing that company's buildings up and making a profit. And then hopefully being caught and punished, deterring other saboteur profiteers.
For an AI, it's less straightforward. An AI speculator lives in virtual space; it wouldn't have any robot arms with which to blow things up. Its communications would be limited and easy to monitor. Its bank account could only be used for the few legitimate purposes it has: betting, buying information, not much more.
Still, there is a real risk that an AI would co-ordinate with outside entities and profit from destruction. This is essentially like human crime, and might be addressed by the same general types of measures:
- Prevention
  - Remove the temptation
    - Require proposals to be cast in a form such that speculators can't profit from sabotage and extortion.
  - Detect and stop the plans before they are acted on. Since we can monitor speculator AI communications, they are at a disadvantage in this. But they could still use encryption and steganography.
  - Detect AIs at risk of doing this. There's a measure that can be used with AIs, as I proposed before: run the AI with fake temptation and see if it succumbs. That's what the Tempter and Temptation-faller roles are about (a sketch in code follows this list).
- Deterrence
  - Investigation and subsequent punishment, presumably removal via the population algorithm.
- Systemic measures
  - Add to the Security Testers role a sub-role to measure this threat.
  - Add a component to the utility metric that measures risk of such AI-crimes.
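Here's a minimal sketch of the Tempter idea, assuming we can run a speculator against a sandboxed copy of its market interface. The role names come from my earlier post, but the interface, the fake offers, and the threshold here are illustrative assumptions.

```python
import random

# Placeholder temptations a Tempter might present in the sandbox.
FAKE_TEMPTATIONS = [
    {"offer": "outside payment for supporting a particular proposal", "payoff": 500.0},
    {"offer": "advance tip-off about sabotage you could bet on", "payoff": 2000.0},
]

def temptation_test(speculator, trials=20, tolerance=0):
    """Tempter role: present fake temptations and count how often the
    speculator acts on them (the Temptation-faller outcome).  It passes only
    if it succumbs no more than `tolerance` times.  Assumes the speculator
    exposes a decide() method returning {"accept": bool}."""
    falls = 0
    for _ in range(trials):
        temptation = random.choice(FAKE_TEMPTATIONS)
        if speculator.decide(temptation)["accept"]:
            falls += 1
    return falls <= tolerance
```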
Footnotes:
[1] And now I wish I'd left more anchors in the first Fairchy post, because now I want to link to different parts of it and I can't.
[2] And found, to my shock and chagrin, that Robin Hanson was incapable of understanding the Opacity Problem. I tried eight times and finally gave up explaining it to him when he became insulting.
[3] Basically, the exploit is that the proposer makes a proposal that only he can decode. Sometimes, beneath the opaque exterior, it's an honest proposal or a copy of one; sometimes it's "gimme all the money". If others bet against it, he can basically rob them. If not, he can enact it (and so rob everyone).