FAIrchy [1]
In this blog post I'm revisiting a comment I made on overcomingbias [2]. I observed that Eliezer Yudkowsky's Friendly Artificial Intelligence (FAI) and futarchy have something in common: both depend critically on a utility function, and the requirements on that function are about the same in each case. The requirements are basically:
- Society-wide
- Captures the panorama of human interests
- Future-proof
- Secure against loophole-finding
Background: The utility function
Though the utility functions for FAI and futarchy have the same requirements, thinking about them has developed very differently. The FAI (Singularity Institute) idea seems to be that earlier AIs would think up the right utility function. But there's no way to test that the AI got it right, or even got something reasonable.
In contrast, in talking about futarchy it has been clear that a pre-determined utility function is needed, so much more thought has gone into it on the futarchy side. In all modesty, I have to take a lot of the credit for that myself. However, I credit Robin Hanson with originally proposing the use of GDP [3]. GDP as such won't work, of course, but it at least points in the right general direction.
My thinking about the utility function is more than can easily be summed up here. But to give you the general flavor of it: the problem isn't defining the utility function itself, it's designing a secure, measurable proxy for it. Now I think it should comprise the following (a toy sketch of the composite metric follows the list):
- Physical metrics (health, death, etc)
- Economic metrics
- Satisfaction surveys
  - To be taken in physical circumstances similar to secret-ballot voting, with similar measures against vote-selling, coercion, and so forth.
  - Ask about overall satisfaction, so nothing falls into the cracks between the categories.
  - Phrase it to compare satisfaction across time intervals, rather than attempting an absolute measure.
  - Compare multiple overlapping intervals, for robustness.
- Existential metrics
- Metrics of the security of the other metrics.
- Citizens' proxy metrics. Citizens could pre-commit part of their measured satisfaction metric according to any specific other metric they chose.
  - This is powerful:
    - It neatly handles personal identity issues such as mind uploading and last wills.
    - It gives access to firmer metrics, instead of the soft metric of reported satisfaction.
    - It lets individuals who favor a different blend of utility components effect that blend in their own case.
    - May provide a level of control when we transition from physical-body-based life to whatever life will be in the distant future.
    - All in all, it puts stronger control in individual hands.
  - But it's also dangerous. There must be no way to compel anyone to proxy in a particular way.
    - Proxied metrics should be silently revocable. Citizens should be encouraged, if they were coerced, to revoke and report.
    - It should be impossible to confirm that a citizen has made a certain proxy.
    - Citizens should not be able to proxy all of their satisfaction metric.
- (Not directly a utility component) Advisory markets
  - Measure the effectiveness of various possible proxies.
  - Intended to help citizens deploy proxies effectively.
  - Parameterized on facets of individual circumstance so individuals may easily adapt them to their situations and tastes.
  - These markets' own utility function is based on satisfaction surveys.
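To make the shape of this more concrete, here's a toy sketch (in Python) of how such a composite metric might be assembled. Every component name, weight, and the cap on proxied satisfaction is an illustrative assumption, not part of the proposal.

```python
# Toy composite satisfaction metric. All component names, weights, and the
# cap on proxied satisfaction are illustrative assumptions.

MAX_PROXY_FRACTION = 0.5  # assumed cap: citizens can't proxy all of their metric


def citizen_satisfaction(reported, proxies, metric_values):
    """Blend one citizen's reported satisfaction with their chosen proxies.

    `proxies` maps a metric name to the fraction of this citizen's score they
    pre-committed to it; `metric_values` holds those metrics, normalized to [0, 1].
    """
    requested = sum(proxies.values())
    proxied = min(requested, MAX_PROXY_FRACTION)      # enforce the cap
    scale = proxied / requested if requested else 0.0
    score = (1.0 - proxied) * reported                # the unproxied remainder
    for name, fraction in proxies.items():
        score += fraction * scale * metric_values[name]
    return score


def overall_utility(citizen_scores, physical, economic, existential, security):
    """Society-wide blend of the components listed above (weights assumed)."""
    survey = sum(citizen_scores) / len(citizen_scores)
    return (0.4 * survey + 0.2 * physical + 0.2 * economic
            + 0.1 * existential + 0.1 * security)


# Example: one citizen proxies 30% of their metric to a longevity statistic.
c = citizen_satisfaction(0.7, {"longevity": 0.3}, {"longevity": 0.9})
print(overall_utility([c, 0.6, 0.8], physical=0.75, economic=0.6,
                      existential=0.9, security=0.95))
```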
This isn't future-proof, of course. For instance, the part about physical circumstances won't still work in 100 years. It is, however, something that an AI could learn from and learn with.
Background: Clippy and the box problem
One common worry about FAI is that once the FAI gets really good at implementing the goals we give it, the result for us would actually be disastrous due to subtle flaws in those goals. This perverse goal is canonically expressed as Clippy trying to tile the solar system with paper clips, or alternatively with smiley faces.
The intuitive solution is to "put the AI in a box". It would have no direct ability to do anything, but would only give suggestions which we could accept or disregard. So if the FAI told us to tile the solar system with paper clips, we wouldn't do it.
Most people consider this unsatisfactory, and to my mind the inadequacy is obvious. It almost doesn't need a supporting argument, but I'll offer this: to be useful, the FAI's output would certainly have to be information-rich, more like software than like conversation. That information-richness could be used to smuggle out actions, or failing that, to smuggle out temptations. Look how many people fall for phishing attacks even today, and now imagine a genius FAI phishing. A single successful phish could set in motion a chain of events that lets the FAI out of the box.
FAIrchy: The general idea
What I propose is this: The most important AIs, rather than directly doing things or even designing and advising, should be traders in a futarchy-like system. As such, they would in effect govern other AIs that design, advise, and directly do things.
At first, they'd be trading alongside humans (as now). Inevitably, as Moore's Law advances, they'd come to dominate trading, and humans would only use the market to hedge. By then, the AIs would have organically evolved to do the right (human-satisfying) thing.
Treat these AI traders as individuals in a population-style search algorithm (think genetic programming): select for the most profitable ones and erase any that overstep their roles, roughly as sketched below.
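Here's a minimal sketch of that population step, under the assumption that each trader carries a profit score from the decision market and a flag recording whether it overstepped its role; the variation operator and the keep fraction are placeholders.

```python
import random

# Toy population step for the trader AIs, genetic-programming style.
# `Trader`, `mutate`, and the keep fraction are stand-ins, not a design.

class Trader:
    def __init__(self, genome):
        self.genome = genome
        self.profit = 0.0          # filled in by trading in the decision market
        self.overstepped = False   # set if the trader acted outside its role


def mutate(genome):
    # Placeholder variation operator over a numeric genome.
    return genome + random.gauss(0.0, 0.1)


def next_generation(population, keep_fraction=0.25):
    # Erase any trader that overstepped its role, regardless of profit.
    survivors = [t for t in population if not t.overstepped]
    if not survivors:
        raise RuntimeError("every trader overstepped; stop and investigate")
    # Select for the most profitable of the rest.
    survivors.sort(key=lambda t: t.profit, reverse=True)
    elite = survivors[:max(1, int(len(population) * keep_fraction))]
    # Refill the population with mutated copies of the elite.
    children = [Trader(mutate(random.choice(elite).genome))
                for _ in range(len(population) - len(elite))]
    return elite + children
```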
Advantages
- There's a built-in apprenticeship stage, in that the AIs are basically doing their eventual job even in the early stages, so any striking problems will be apparent while humanity can still correct them.
- We get the advantage of a reasonable satisfaction metric up front, rather than hoping AIs will design it well.
- These AIs have no incentive to try to get themselves unboxed. Earlier I talked about subtly perverse utility functions. But with these, we understand the utility function: make a profit in the decision markets. They can't go subtly off the track of human happiness, because that's not even the track they're intended to be on. We do need to make sure that corrupting the utility metric can't pay off, of course, but that's not a new issue.
- The AIs would learn from people's real satisfaction, not just from theoretical projections.
About the separate AI roles
In general
The healthy performance of each role should be a component of the overall utility function.
Separation of roles: Why
Don't allow mingling of AI roles, especially not the speculator role and the security-tester role. The threat here is that a speculator AI that also moves in the real world may find a way to short-circuit the system for profit. For instance, it might find a way to distort the satisfaction reports, or to destroy things corresponding to issues it had shorted.
Put another way, we don't want the various roles to co-evolve outside of their proper functions. We never want a situation where one role (say, security) is compromised because, on the whole, it's more profitable to compromise it and profit somewhere else (say, in speculating).
Effectively, this separation creates a sort of distributed system that includes us and our satisfaction metric. This was never a desideratum but it is encouraging.
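One way to picture the non-mingling rule is as a capability whitelist checked outside the AIs themselves. The role names and capabilities below are assumptions for illustration only.

```python
# Toy capability whitelist enforcing separation of roles.
# Role and capability names are illustrative assumptions.

ROLE_CAPABILITIES = {
    "speculator": {"read_market_feed", "place_order"},
    "security_tester": {"probe_satisfaction_metric", "file_security_report"},
    "effector": {"actuate_in_world"},
}


def check(role, capability):
    """Refuse any action outside the role's whitelist."""
    if capability not in ROLE_CAPABILITIES.get(role, set()):
        raise PermissionError(f"role {role!r} may not use {capability!r}")


check("speculator", "place_order")          # allowed
# check("speculator", "actuate_in_world")   # would raise PermissionError
```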
Separation of roles: How
Of course we'd use the obvious physical and computational security measures. We'd run the trader AIs in a way that lets us physically monitor their traffic and operations. Probably they'd be run in virtual machines.
I'll talk about other measures in the next section, in regard to specific roles.
The AI roles
Speculator
The role described above. Trades in a big futarchy decision market.
Effector
AIs that "do things". This includes probably every role you pictured an AI in before now.
They, like society, would be governed by the FAIrchy system. This probably cannot be improved on because it subsumes every other conceivable proposal for making rules for them or turning them off.
Sub-speculator
Sub-speculators also trade in a big futarchy decision market. Their purpose is to trade against trivial market patterns, for instance the "January slump", so that the speculators can focus only on meaningful patterns.
Sub-speculators would be deliberately limited in information and processing power, so that they couldn't out-think the real speculators, but must simply react faster and more lightly.
Several possible means of effecting this (the third is sketched after the list):
- These might be "brokers" thru which the speculators trade.
- They might simply be allowed quicker access.
- They might be run in short bursts so they have no time to think deeply but plenty of CPU power to react fast.
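As an illustration of that third option, a sub-speculator might be run something like this: full CPU, but a hard wall-clock budget per burst. The 50 ms figure and the `react()` interface are assumptions.

```python
import time

# Toy short-burst runner for a sub-speculator: plenty of CPU power to react
# fast, but no time to think deeply. The budget and interface are assumptions.

TIME_BUDGET_SECONDS = 0.05


def run_burst(sub_speculator, recent_ticks):
    """Let the sub-speculator react to the latest ticks, then cut it off."""
    deadline = time.monotonic() + TIME_BUDGET_SECONDS
    orders = []
    for tick in recent_ticks:
        if time.monotonic() >= deadline:
            break  # budget exhausted: no deep deliberation possible
        orders.append(sub_speculator.react(tick))
    return [order for order in orders if order is not None]
```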
Tempter
To keep roles from mingling, we might use defect seeding: mechanically present AIs with apparent temptations to overstep their role, erase the ones that took the bait, and subtract fitness from their relatives.
This would eventually be done by other AIs that construct the temptations and are rewarded accordingly. Running them in a virtual machine helps: clone the AI without the copy knowing it's a clone.
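A toy sketch of one defect-seeding pass, assuming the virtual-machine cloning sits behind a simple `deepcopy`-like call and that each agent exposes `act_on`, `relatives`, and `fitness`; the penalty size is a placeholder.

```python
import copy

# Toy defect-seeding pass: tempt a clone, erase the original if the clone
# takes the bait, and subtract fitness from its relatives.
# The agent interface and the penalty size are assumptions.

RELATIVE_PENALTY = 0.5


def defect_seed(population, temptation):
    erased = []
    for agent in list(population):
        clone = copy.deepcopy(agent)   # stands in for cloning the VM image;
                                       # the copy doesn't know it's a clone
        if clone.act_on(temptation):   # the clone oversteps its role
            population.remove(agent)   # erase the original that would defect
            erased.append(agent)
            for relative in agent.relatives:
                relative.fitness -= RELATIVE_PENALTY
    return erased
```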
Temptation-faller, etc
To keep the tempters themselves honest, the temptations would also be applied to AIs that look like speculators but are known to fall to temptation. They might be copies of speculators that fell and were turned off. If they don't fall, we know the temptations aren't effective.
Less crucially, known honest AIs would be used too and expected not to fall, so that temptations are kept fair.
Temptation-faller designers would also need testing; this can regress quite a bit.
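Keeping the tempters honest could amount to a calibration check roughly like this, assuming we keep copies of known fallers and known honest agents around:

```python
# Toy calibration check for a tempter: its temptations should catch agents
# known to fall, and should not catch agents known to be honest (which would
# suggest the temptation is unfair). The `act_on` interface is an assumption.

def tempter_is_calibrated(temptation, known_fallers, known_honest):
    caught_fallers = sum(1 for a in known_fallers if a.act_on(temptation))
    caught_honest = sum(1 for a in known_honest if a.act_on(temptation))
    effective = caught_fallers == len(known_fallers)  # the temptation bites
    fair = caught_honest == 0                         # and isn't entrapment
    return effective and fair
```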
Information provider
We'd remove any incentive to get out into the world to look for better information by feeding all the speculators the best-quality information available.
This would also remove any incentive for outside competition that doesn't follow the non-mingling rules. So we are looking for a self-sustaining monopoly on the best information for AI speculators. This condition might form part of the utility function.
Information providers would be rewarded according to both of the following (a toy blend follows the list):
- Overall decision market success
- Individual orders by speculators
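As a toy version of that reward, something like a weighted blend of the two signals; the weights are placeholders, not a proposal.

```python
# Toy reward for an information provider, blending overall decision-market
# success with revenue from individual speculator orders. Weights are assumed.

def provider_reward(market_success, order_revenue, w_market=0.6, w_orders=0.4):
    return w_market * market_success + w_orders * order_revenue
```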
Speculator support modules
When two or more AI speculators find certain computations useful, have them farm that particular code out to a support module. This probably shouldn't be optional. It profits the two speculators because they are now spending half as much computing time for the same information.
A speculator support module becomes a separate entity, obliged to make its results available to any speculator. It would have its own fitness function. If its fitness fell below zero, it could be reabsorbed by a speculator, if any still wanted it. Its fitness wouldn't fall below zero as long as two or more speculators were interested enough to run it themselves.
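Here's a toy version of that fitness rule, assuming each interested speculator contributes roughly the computing cost it saves; the numbers and the cost model are illustrative only.

```python
# Toy fitness rule for a speculator support module: it stays a separate entity
# while enough speculators value its shared result to cover the running cost;
# below zero it becomes eligible for reabsorption. The cost model is assumed.

def module_fitness(value_per_speculator, interested_speculators, running_cost):
    return value_per_speculator * interested_speculators - running_cost


def should_reabsorb(fitness, some_speculator_still_wants_it):
    return fitness < 0 and some_speculator_still_wants_it


# Two speculators each valuing the result at more than half the running cost
# keep the module's fitness non-negative, so it persists as a shared entity.
print(module_fitness(value_per_speculator=0.6, interested_speculators=2,
                     running_cost=1.0))   # 0.2
```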
Security testers
These would try to measure the security of the human satisfaction metric and other sensitive parts of the system. They don't try to improve it or safeguard it, just to test it without compromising it.
They must be separate from speculators in the overall market.
Security test testers
Again, the roles regress but dwindle.
Satisfaction extrapolator (Maybe)
This role is inspired by Eliezer's CEV but structured very differently. Like CEV, these AIs would try to predict human satisfaction in particular future conditions. Here, that would be done in a separate futures market.
But they don't replace the satisfaction metric, nor even directly contribute their prediction to the utility function. Rather, when their extrapolated expectation disagrees significantly with the satisfaction that is reported, it triggers an extra-hard look at the security of the human satisfaction metric. If corruption is found, it triggers penalization of responsible parties.
The extra-hard look should include some metrics that were kept secret, or shared only with the security testers.
These AIs are generally rewarded according to the accuracy of their extrapolation. But they are also deliberately insulated from the results of security testing so that they have no incentive to adjust their extrapolation when security is judged to be weak. This might be effected by subtracting the security result metric out so that the net effect of weak security on their estimates is zero.
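A toy rendering of the two mechanisms, the divergence trigger and the insulation step; the threshold is a placeholder, and the subtraction shown is just one possible reading of "subtracting the security result metric out".

```python
# Toy satisfaction-extrapolator machinery: a large gap between extrapolated
# and reported satisfaction triggers an extra security audit, and the
# extrapolator's reward is insulated from the security result.
# The threshold and the insulation scheme are assumptions.

DIVERGENCE_THRESHOLD = 0.15


def needs_security_audit(extrapolated, reported):
    """Flag an extra-hard look at the satisfaction metric's security."""
    return abs(extrapolated - reported) > DIVERGENCE_THRESHOLD


def extrapolator_reward(extrapolated, reported, estimated_corruption):
    """Score accuracy against the report with the estimated corruption
    subtracted out, so weak security has no net effect on the incentive."""
    return 1.0 - abs(extrapolated - (reported - estimated_corruption))
```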
Physical upkeep
Physical upkeep of the hardware. Sure, people could do that for a long time, but eventually Moore's Law prevails.
Software upkeep
Runs the population algorithm and the various interactions (tempter, etc). Sure, human-written code could do that for a long time, but again Moore's Law prevails. It should prove new versions to be functionally equivalent to old versions, test new versions in sandboxes, and so on.