
22 June 2011

More on Foreseeing Existential Risks

Earlier I wrote about refuge markets1. Basically, they are an attempt to estimate existential risks, which my Fairchy design needs to measure.

But Fairchy requires more from its measure of existential risks than refuge markets alone can deliver. It needs to measure all significant existential risks, not just the ones that I am thinking of now.

Add more metrics later? Not so simple.

It's tempting to answer "We'll add those things later when we think of them". But who counts as "we"2? Once the system starts, there will be all sorts of players in it. It is not likely that they would all simultaneously agree to a redesign.

You might suppose that an existential risk would be so universally appreciated that everyone would agree to measure it well. History suggests otherwise. For example, regardless of where you stand on global warming, you can agree that one side or the other resists real measures of that existential risk.

What sort of mechanism?

Since we will need to add new existential risk metrics but can't expect to just all agree, we need a mechanism for adding them. This mechanism must have these properties:

  1. Vested interests in seeing the risk as large or small must not affect the outcome.
  2. It should measure the risks with reasonable information economy; it should neither starve for information nor spend more than the Expected Value Of Perfect Information measuring them (a toy calculation of that value follows this list).
  3. It must be flexible enough to "see" new risks; beyond that, it should aggregate understanding about new risks. This suggests a decision market solution.
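
To make property 2 concrete, here is a toy Expected Value Of Perfect Information calculation for a single risk and a single mitigation decision, written as Python. Every probability and payoff is invented purely for illustration.

  # Toy EVPI calculation: how much is perfect knowledge of one risk worth?
  # All probabilities and payoffs below are made up for illustration.
  p_catastrophe = 0.01
  p_safe = 1.0 - p_catastrophe

  payoff = {
      ("mitigate", "catastrophe"): -10,    # mitigation cost paid, disaster averted
      ("mitigate", "safe"):        -10,    # mitigation cost paid for nothing
      ("ignore",   "catastrophe"): -1000,  # unmitigated disaster
      ("ignore",   "safe"):        0,
  }
  actions = ("mitigate", "ignore")
  states = (("catastrophe", p_catastrophe), ("safe", p_safe))

  # Best expected payoff acting on current beliefs only.
  best_without_info = max(sum(p * payoff[(a, s)] for s, p in states) for a in actions)

  # Best expected payoff if we learned the true state before acting.
  best_with_info = sum(p * max(payoff[(a, s)] for a in actions) for s, p in states)

  evpi = best_with_info - best_without_info   # spend no more than this on measurement
  print(f"EVPI = {evpi:.2f} utility units")   # prints 9.90 with these numbers

With these made-up numbers, perfect knowledge of the risk is worth about 9.9 utility units, so any measurement scheme costing more than that would be over-spending in the sense of property 2.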

These requirements imply that there is some sort of overarching perspective on existential risks that this mechanism leans on. But that is circular: if we can't measure the specific existential risks, how can we hope to measure the general risk? We can hardly hope to make "our continued existence" an issue in a prediction market. For similar reasons, we can't make continued existence part of the general utility metric.

Not an answer: The personal utility metric

You might suppose, based on the central role of the individual satisfaction reports in Fairchy, that the answer simply falls out: people, preferring of course to live, would proxy part of their satisfaction report to measures of existential risk. But this does not satisfy any of the three properties above. It's really no wiser than voting.

Not an answer: Last minute awareness

There is one general source of information about existential risks: last-minute awareness.

The idea is that at the last minute, doomed people would know either "we saw it coming" or "we never saw it coming". Too late, of course. But beforehand, a prediction market would have bet on those outcomes. Using that, we could predict not only whether we will have seen it coming, but whether the existential risk metrics being contemplated would have caught it.

But even though bets would technically be settled before the end of the world, settling them a few days before the end is not much better.

One might say that, since we contemplate refuge markets, the people in the refuges could spend their winnings. But that misses the point exactly. The whole reason we need multiple measures of existential risk is that refuges would save people in some situations and not in others - think of an underground bomb shelter in a flood. For each refuge, the situations where it would save people are exactly the situations that a refuge market can already measure.

So last minute awareness adds nothing.

So what are we missing?

But we humans are aware of existential risks. Collectively we're aware of a great many, some serious, some not. We know about them right now with no special social mechanism helping us. Of course, sometimes we're way off; see whichever side of Global Warming you disagree with. But in principle, if not in widespread social practice, we can understand many of these existential risks.

If it's so hard to predict existential risks in general, how do we do it now?

The answer is that we use analysis and logic, of course. We (some of us) think rationally about these things.

Of course, it's not as simple as exhorting Fairchy citizens to "be rational". Nearly everybody already thinks they are quite rational, not to mention sensible, reasonable, and possessed of every other mental virtue.

Nor can we simply require that analysis be "scientific" or presented in the form of a scholarly paper. See Wrong by David Freedman for why experts are frequently just plain wrong in spite of all scientific posturing. For analysis to work, it must not be something that a priesthood does and presents to the rest of us.

So I believe that analysis (and its "molecular building block", logic) must be intrinsic to the decision system. If we have that, we can simply3 add analysis-predicted survival as a component of the utility function.
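
Schematically, and only as a picture of the shape (the weights here are the free parameters of footnote 3, to be set as preferences, not anything I am fixing now):

  U = \sum_i w_i\, m_i \;+\; w_a\, \hat{P}_{\mathrm{analysis}}(\mathrm{survival})

where the m_i are the utility components we already have and \hat{P}_{\mathrm{analysis}}(survival) is the survival estimate that comes out of the analysis machinery described next.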

Adding logic to the picture

Even if you follow the field, you probably haven't heard logic mentioned in connection with prediction markets or decision markets before. Analysis is seen as something that bettors do privately before placing their bets. It's not seen as something the system ought to support.

I thought about this topic years ago - starting about 1991 when Robin Hanson first told me his idea of prediction markets. I think I know how to do it. I call the idea "argument markets". In the next few posts I hope to describe this idea fully.

Footnotes:

1 My version, fixing Robin Hanson's design of them.

2 A good general rule I use in thinking about Fairchy is to not picture myself in charge of it all. I don't picture my friends and political allies in charge, either. I picture the dumbest, craziest, and evillest people I know pushing their agendas with all the tools available to them, and I picture a soulless AI following its programming to its logical conclusion. I always assume there are fools, maniacs, villains, and automatons in the mix. So it doesn't appeal to me to make it all up as "we" go along.

3 There are a few free parameters, but those are preferences, not predictions, so they can be set via the individual satisfaction metric or similar.

13 May 2011

FAIrchy diagram

I wrote yesterday about FAIrchy, my notion that combines FAI and futarchy. Here is an i* diagram that somewhat captures the system and its rationale. It's far from perfect, but captures a lot of what I was talking about.

Many details are left out, especially for peripheral roles.

Link to this diagram

Some technical comments on this diagram

I felt like I needed another type of i* goal-node to represent measurable decision-market goal components, which are goal-like but are unlike both hard and soft i* goals. Similarly I wanted a link-type that links these to measurement tasks. I used the dependency link, which seemed closest to what I want, but it's not precisely right.

There's some line-crossing. Dia's implementation of i* makes that inevitable for a large diagram.

FAIrchy1

In this blog post I'm revisiting a comment I made on overcomingbias2. I observed that Eliezer Yudkowsky's Friendly Artificial Intelligence (FAI) and futarchy have something in common: both are critically dependent on a utility function with about the same requirements. The requirements are basically:

  • Society-wide
  • Captures the panorama of human interests
  • Future-proof
  • Secure against loophole-finding

Background: The utility function

Though the utility functions for FAI and futarchy have the same requirements, thinking about them has developed very differently. The FAI (Singularity Institute) idea seems to be that earlier AIs would think up the right utility function. But there's no way to test whether such an AI got it right, or even got something reasonable.

In contrast, in talking about futarchy it's been clear that a pre-determined utility function is needed. So much more thought has gone into it from the futarchy side. In all modesty, I have to take a lot of the credit for that myself. However, I credit Robin Hanson with originally proposing using GDP3. GDP as such won't work, of course, but it is at least pointed in the right general direction.

My thinking about the utility function is more than can be easily summed up here. But to give you a general flavor of it, the problem isn't defining the utility function itself, it's designing a secure, measurable proxy for it. Now I think it should comprise:

  • Physical metrics (health, death, etc)
  • Economic metrics
  • Satisfaction surveys.
    • To be taken in physical circumstances similar to secret-ballot voting, with similar measures against vote-selling, coercion, and so forth.
    • Ask about overall satisfaction, so that nothing falls through the cracks between the categories.
    • Phrase it to compare satisfaction across time intervals, rather than attempting an absolute measure.
    • Compare multiple overlapping intervals, for robustness.
  • Existential metrics
  • Metrics of the security of the other metrics.
  • Citizens' proxy metrics. Citizens could pre-commit part of their measured satisfaction metric to any specific other metric they chose.
    • This is powerful:
      • It neatly handles personal identity issues such as mind uploading and last wills.
      • It gives access to firmer metrics, instead of the soft metric of reported satisfaction.
      • It lets individuals who favor a different blend of utility components effect that blend in their own case.
      • May provide a level of control when we transition from physical-body-based life to whatever life will be in the distant future.
      • All in all, it puts stronger control in individual hands.
    • But it's also dangerous. There must be no way to compel anyone to proxy in a particular way.
      • Proxied metrics should be silently revocable. Citizens should be encouraged, if they were coerced, to revoke and report.
      • It should be impossible to confirm that a citizen has made a certain proxy.
      • Citizens should not be able to proxy all of their satisfaction metric.
  • (Not directly a utility component) Advisory markets
    • Measure the effectiveness of various possible proxies
    • Intended to help citizens deploy proxies effectively.
    • Parameterized on facets of individual circumstance so individuals may easily adapt them to their situations and tastes.
    • These markets' own utility function is based on satisfaction surveys.

This isn't future-proof, of course. For instance, the part about physical circumstances won't still work in 100 years. It is, however, something that an AI could learn from and learn with.
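
To make the shape of this concrete, here is a minimal sketch in Python of how such a composite proxy might be wired together. Every name, weight, and cap below is a placeholder of my own, not a settled part of the design.

  # A minimal sketch of the composite proxy utility described above.
  # All names, weights, and caps are placeholders, not settled design.

  MAX_PROXY_FRACTION = 0.5   # citizens may not proxy all of their satisfaction metric

  def blended_satisfaction(reported, proxies, proxy_metrics):
      """Blend a citizen's reported satisfaction with the metrics they have
      pre-committed (proxied) part of their satisfaction metric to.
      `proxies` maps a metric name to the committed fraction."""
      committed = sum(proxies.values())
      assert committed <= MAX_PROXY_FRACTION, "citizens may not proxy everything"
      blended = (1.0 - committed) * reported
      for name, fraction in proxies.items():
          blended += fraction * proxy_metrics[name]
      return blended

  def proxy_utility(citizens, metrics, weights):
      """Overall proxy utility: mean blended satisfaction plus weighted
      physical, economic, existential, and security-of-metrics components."""
      mean_satisfaction = sum(
          blended_satisfaction(c["reported"], c["proxies"], metrics["proxyable"])
          for c in citizens) / len(citizens)
      return (weights["satisfaction"] * mean_satisfaction
              + weights["physical"]    * metrics["physical"]
              + weights["economic"]    * metrics["economic"]
              + weights["existential"] * metrics["existential"]
              + weights["security"]    * metrics["security"])

The revocation rules, the secrecy requirements, and the advisory markets would all live outside this arithmetic; the sketch only shows how the pieces would add up.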

Background: Clippy and the box problem

One common worry about FAI is that once the FAI gets really good at implementing the goals we give it, the result for us would actually be disastrous because of subtle flaws in those goals. This perverse goal is canonically expressed as Clippy trying to tile the solar system with paper clips, or alternatively with smiley faces.

The intuitive solution is to "put the AI in a box". It would have no direct ability to do anything, but would only give suggestions which we could accept or disregard. So if the FAI told us to tile the solar system with paper clips, we wouldn't do it.

This is considered unsatisfactory by most people. To my mind, that's very obvious. It almost doesn't need supporting argument, but I'll offer this: To be useful, the FAI's output would certainly have to be information-rich, more like software than like conversation. That information-richness could be used to smuggle out actions, or failing that, to smuggle out temptations. Now look how many people fall for phishing attacks even today. And now imagine a genius FAI phishing. A single successful phish could set in motion a chain of events that allows the FAI out of the box.

FAIrchy: The general idea

What I propose is this: The most important AIs, rather than directly doing things or even designing and advising, should be traders in a futarchy-like system. As such, they would in effect govern other AIs that design, advise, and directly do things.

At first, they'd be trading alongside humans (as now). Inevitably with Moore's Law they'd dominate trading, and humans would only use the market to hedge. By then, AIs would have organically evolved to do the right (human-satisfying) thing.

Treat these AI traders as individuals in a population-style search algorithm (think genetic programming). Select for the most profitable ones and erase those that overstepped their roles.
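
As a sketch of the population mechanics (the Trader interface here is hypothetical, just enough to show the loop):

  import random

  def evolve_traders(population, generations, market):
      """Population-style selection over speculator AIs: erase role-violators,
      keep the most profitable, refill with varied copies of the survivors."""
      for _ in range(generations):
          # Erase any trader caught overstepping its role (see the tempter role below).
          population = [t for t in population if not t.overstepped]
          # Rank the remainder by trading profit in the decision markets.
          population.sort(key=lambda t: t.profit(market), reverse=True)
          survivors = population[: max(1, len(population) // 2)]
          # Refill from the most profitable, with variation (as in genetic programming).
          children = [random.choice(survivors).mutate()
                      for _ in range(len(population) - len(survivors))]
          population = survivors + children
      return population

The fitness signal is nothing but trading profit; role violations are handled by the tempter machinery described later.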

Advantages

  • There's a built-in apprenticeship stage, in that the AIs are basically doing their eventual job even in the early stages, so any striking problems will be apparent while humanity can still correct them.
  • We get the advantage of a reasonable satisfaction metric up front, rather than hoping AIs will design it well.
  • These AIs have no incentive to try to get themselves unboxed. Earlier I talked about subtly perverse utility functions. But with these, we understand the utility function: make a profit in the decision markets. They can't go subtly off the track of human happiness, because that's not even the track they're intended to be on. We do need to make sure that corrupting the utility metric can't pay off, of course, but that's not a new issue.
  • The AIs would learn from people's real satisfaction, not just from theoretical projections.

About the separate AI roles

In general

The healthy performance of each role should be a component of the overall utility function.

Separation of roles: Why

Don't allow mingling of AI roles, and especially not the speculator role and the security-tester role. The threat here is that a speculator AI that also moves in the real world may find a way to short-circuit the system for profit. For instance, it might find a way to distort the satisfaction reports, or to destroy things corresponding to issues it had shorted.

Put a different way, we don't want the various roles to co-evolve outside of their proper functions. We never want a situation where one role (say, security) is compromised because, on the whole, it's more profitable to compromise it and profit somewhere else (say, in speculation).

Effectively, this separation creates a sort of distributed system that includes us and our satisfaction metric. This was never a desideratum but it is encouraging.

Separation of roles: How

Of course we'd use the obvious physical and computational security measures. We'd run the trader AIs in a way that lets us physically monitor their traffic and operations. Probably they'd be run in virtual machines.

I'll talk about other measures in the next section, in regard to specific roles.

The AI roles

Speculator

The role described above. Trades in a big futarchy decision market.

Effector

AIs that "do things". This includes probably every role you pictured an AI in before now.

They, like society, would be governed by the FAIrchy system. This probably cannot be improved on because it subsumes every other conceivable proposal for making rules for them or turning them off.

Sub-speculator

Sub-speculators also trade in a big futarchy decision market. Their purpose is to trade against trivial market patterns, for instance the "January slump", so that the speculators can focus only on meaningful patterns.

Sub-speculators would be deliberately limited in information and processing power, so that they couldn't out-think the real speculators, but must simply react faster and more lightly.

Several possible means of effecting this situation:

  • These might be "brokers" through which the speculators trade.
  • They might simply be allowed quicker access.
  • They might be run in short bursts so they have no time to think deeply but plenty of CPU power to react fast (sketched below).
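
A sketch of that third option, with an invented react interface: the sub-speculator gets a hard time budget per burst, so it can only apply fast, shallow rules.

  import time

  def run_subspeculator(react, ticks, burst_seconds=0.05):
      """Run a sub-speculator for one short burst: it may react to each market
      tick with a fast rule, but is cut off once its time budget is spent."""
      orders = []
      deadline = time.monotonic() + burst_seconds
      for tick in ticks:
          if time.monotonic() > deadline:
              break                      # out of budget - no time for deep deliberation
          order = react(tick)            # e.g. fade a predictable seasonal pattern
          if order is not None:
              orders.append(order)
      return orders

The first two options would replace the time budget with a brokerage interface or a priority rule, but the flavor is the same.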

Tempter

To keep roles from mingling, we might use defect seeding: mechanically present AIs with apparent temptations to overstep their role; erase the ones that took the bait and subtract fitness from their relatives.

This would eventually be done by other AIs that construct the temptations and are rewarded accordingly. Running the speculators in virtual machines helps: we can clone an AI without the copy knowing it's a clone.
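
A sketch of one defect-seeding round, using hypothetical Trader operations (clone, erase, relatives) in the same spirit as the earlier selection-loop sketch:

  def defect_seed(trader, temptation, penalty=1.0):
      """Clone a speculator inside its virtual machine, present the clone with a
      fabricated chance to overstep its role, and penalize the lineage if it bites."""
      clone = trader.clone()                 # the copy must not know it is a clone
      took_bait = temptation.present_to(clone)
      clone.erase()
      if took_bait:
          trader.overstepped = True          # the original gets erased by the selection loop
          for relative in trader.relatives():
              relative.fitness -= penalty    # subtract fitness from its relatives
      return took_bait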

Temptation-faller, etc

To keep the tempters themselves honest, the temptations would also be applied to AIs that look like speculators but are known to fall to temptation. They might be copies of speculators that fell and were turned off. If these known fallers didn't fall, we'd know the temptations weren't effective.

Less crucially, known honest AIs would be used too and expected not to fall, so that temptations are kept fair.

Temptation-faller designers would also need testing; this can regress quite a bit.

Information provider

We'd remove any incentive to get out into the world to look for better information by feeding all the speculators the best-quality information available.

This would also remove any incentive for outside competition that doesn't follow the non-mingling rules. So we are looking for a self-sustaining monopoly on the best information for AI speculators. This condition might form part of the utility function.

Information providers would be rewarded according to both:

  • Overall decision market success
  • Individual orders by speculators

Speculator support modules

When two or more AI speculators find certain computations useful, have them farm that particular code out to a support module. This probably shouldn't be optional. It benefits both speculators, because each now spends half as much computing time for the same information.

A speculator support module becomes a separate entity, obliged to make its results available to any speculator. It would have its own fitness function. If its fitness fell below zero, it could be reabsorbed by a speculator, if it was still wanted. Its fitness wouldn't fall below zero as long as two or more speculators were interested enough to run it themselves.
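
As a toy version of that economics (all numbers invented): the module's fitness is what speculators pay for its results minus its own running cost, and it is reabsorbed when that goes negative.

  def support_module_fitness(subscription_fees, compute_cost):
      """Fitness of a shared support module: income from subscribing speculators
      minus the cost of running the shared computation."""
      return sum(subscription_fees) - compute_cost

  # Two speculators each pay about half of what running the code alone would cost them.
  fees = {"speculator_A": 3.5, "speculator_B": 3.5}
  fitness = support_module_fitness(fees.values(), compute_cost=6.0)
  if fitness < 0:
      print("reabsorb the module into a speculator that still wants it")
  else:
      print(f"module stays independent (fitness = {fitness:.1f})")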

Security testers

These would try to measure the security of the human satisfaction metric and other sensitive parts of the system. They don't try to improve it or safeguard it, just to test it without compromising it.

They must be separate from speculators in the overall market.

Security test testers

Again, the roles regress but dwindle.

Satisfaction extrapolator (Maybe)

This role is inspired by Eliezer's CEV but structured very differently. Like CEV, these AIs would try to predict human satisfaction in particular future conditions. Here, that would be done in a separate futures market.

But they don't replace the satisfaction metric, nor even directly contribute their prediction to the utility function. Rather, when their extrapolated expectation disagrees significantly with the satisfaction that is reported, it triggers an extra-hard look at the security of the human satisfaction metric. If corruption is found, it triggers penalization of responsible parties.

The extra-hard look should include some metrics that were kept secret, or shared only with the security testers.
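
A sketch of that trigger; the threshold and the audit interface are placeholders of mine, not part of the design.

  DISAGREEMENT_THRESHOLD = 0.15   # placeholder; in practice a preference to be set

  def check_satisfaction(extrapolated, reported, audit):
      """If the extrapolators' prediction and the reported satisfaction diverge
      too far, start the extra-hard security audit, secret metrics included."""
      gap = abs(extrapolated - reported)
      if gap > DISAGREEMENT_THRESHOLD:
          audit(include_secret_metrics=True)
      return gap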

These AIs are generally rewarded according to the accuracy of their extrapolation. But they are also deliberately insulated from the results of security testing so that they have no incentive to adjust their extrapolation when security is judged to be weak. This might be effected by subtracting the security result metric out so that the net effect of weak security on their estimates is zero.

Physical upkeep

Physical upkeep of the hardware. Sure, people could do that for a long time, but eventually Moore's Law prevails.

Software upkeep

Runs the population algorithm and the various interactions (tempter, etc.). Sure, human-written code could do that for a long time, but again Moore's Law prevails. It should prove new versions functionally equivalent to old versions, test new versions in sandboxes, etc.

Footnotes:

1 I had originally called this "futurairchy" and then "futairchy", but both seemed clumsy.

2 Which was moved from overcomingbias.com to lesswrong in the great split.

3 He then proposed GDP+, but he just defines that as GDP plus unspecified other components.