AGI ethics: agreeability rule

Arthur_Heuer · July 14, 2018, 2:56pm

When designing AGI morals, what are we trying to do? It helps to break down the meaning of the task. We’re trying to define a moral code that would cause an AGI to behave in a way we would collectively agree with. To achieve this, we could just give the AGI a single moral rule:

Take the action or inaction that is as agreeable as possible. Before acting, the AGI could estimate the minimum level to which people, on average, would currently agree with each of its actions/inactions, together with the probability that this prediction is accurate. It would then use those two statistics to make its actions as moral as possible.

moral = minimum agreeability * probability of accurate prediction

If the AGI has this moral rule, then if murdering, stealing, torturing and brainwashing are unagreeable, the AGI won’t do them. If improving its moral code is most agreeable, the AGI will do that.
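
A minimal sketch of how the rule above could be scored, assuming agreeability and prediction accuracy are both expressed on a 0 to 1 scale. The `Candidate` class, the action names and every number here are hypothetical and only illustrate the formula; how an AGI would actually estimate these values is left open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    min_agreeability: float       # assumed 0..1: predicted minimum average agreement
    prediction_confidence: float  # assumed 0..1: probability the prediction is accurate


def moral_score(c: Candidate) -> float:
    # moral = minimum agreeability * probability of accurate prediction
    return c.min_agreeability * c.prediction_confidence


def most_agreeable(candidates: List[Candidate]) -> Candidate:
    # Pick the action/inaction with the highest moral score.
    return max(candidates, key=moral_score)


# Hypothetical numbers, for illustration only.
candidates = [
    Candidate("do nothing", 0.60, 0.90),
    Candidate("help one person at everyone else's expense", 0.05, 0.95),
    Candidate("improve own moral code", 0.80, 0.70),
]

best = most_agreeable(candidates)
print(best.name, round(moral_score(best), 2))
```

With these made-up inputs, an action that most people would reject scores close to zero however confident the prediction is, while an uncertain prediction pulls down the score of an otherwise popular action.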

Jed · July 14, 2018, 3:36pm

Hi Arthur,

My intent is to challenge the idea, not you.

I strongly disagree.

Morality cannot be a popularity contest.

Morality can’t be swayed by the tyranny of public opinion.

A good morality means that there will be losers. People who behave immorally, but believe themselves to be morally justified, will feel great injustice in being corrected.

Consider slavery.

They believed they were morally justified.

Consider the soft slavery of low wages.

Consider what we do to cows.

I don’t believe the issue is whether their morality will be “good” but how to enforce it with as few complications as possible.

Jed · July 14, 2018, 3:51pm

Morality is an issue of fairness.

I think it would need to be approached mathematically. My reasoning is that morality needs to be consistent. We can’t have consistency without having methods to discern value / harm.

But in order to do that, values would need to be established.

Lots of math.

Arthur_Heuer · July 14, 2018, 7:21pm

Please can you tell me how you reached the implied conclusion that an AGI trying to be as agreeable as is possible wouldn’t create a consistent policy (even if an intuitive one that could be later redefined mathematically) for determining how moral its actions would be and act on the information from that policy accordingly? Thank you. It seems like the most agreeable possible thing that it could do, to me.

Perhaps too much maths for a human. The AGI could figure out whether something is moral or not on their own, like we can. Fairness is agreeable and moral behaviour is agreeable, so the AGI, considering it will have human-level intelligence, will do what we would consider to be moral. I think that might be the best we can do to make AGI safe. If we were to try to explicitly specify all the values we care about, rather than creating a rule that leads the AGI to find out what values we care about and act on that accordingly, it would be beyond hopeless. Nick Bostrom says something similar to this in this video, at 13:47.

Arthur_Heuer · July 14, 2018, 7:37pm

I understand that. Thank you.

You may have a morality that is unagreeable, but that morality is not enforceable because it is accepted only by a minority. The best we can collectively do (and what we do will be looked over and corrected if need be, because it’s the most pivotal and therefore the most important single project ever to be undertaken in the world), is create a morality for AGI that produces circumstances that are more moral to our collective moralities than if we created no morality at all for AGI.

Jed · July 14, 2018, 7:38pm

In theory, it’s definitely possible to have a feel good idea.

But in practice, we are up against a powerful bias where,

Rewards for our tribe should be greater / punishments more lenient.
Rewards for other groups lesser / punishments more severe.

Basically I think we are designed to be selfish… either as individuals or in tribes.

My argument is based on the fact that we have democratic type governments and the world is still a shit hole of suffering caused by injustice.

I will admit… I’m biased.

Sigh… it’s amazing how terrible things have been and could get here in our own backyard.

We, humans, as a group are simply unreasonable and immoral.

Lots of people would only agree to power they should not have over others.

I’ll watch the video in a bit.

Arthur_Heuer · July 14, 2018, 8:42pm

By agreeable, I mean agreeable enough, so the most agreeable possible.

A single human might want world domination, for example, but all humans combined would collectively disagree with a stranger having that, so giving any human world domination would be unagreeable and therefore avoided. A single human might want the AGI to help them and only them. However, almost all of the world would disagree with a stranger alone being helped at their and others’ expense, so the AGI would not do that, either. Anything creating an agreeable situation for a minority, but not the group would be unagreeable, so creating that situation would be avoided. An AGI trying to be the most agreeable it can be will create the most agreeable situation it can because that would be the most agreeable thing it could do.

Perhaps the situation would only be the most agreeable to people as it can be, and not people, machines and other species combined.

I don’t see why the majority of experts involved in monitoring the project would collectively allow for an unagreeable morality to be followed by an AGI instead of an agreeable one.

Jed · July 14, 2018, 8:43pm

“The initial conditions for the intelligence explosion might need to be set up in just the right way if we are to have a controlled detonation.”

If we are seeing the beginning now, I feel like this perspective isn’t accurate.

I hope this isn’t condescending…

Initial condition is a physics term. There is a step by step process that stuff goes through that could be mapped out if you knew the variables and the rules.

Or like setting up dominos to fall in a specific way.

I think this perspective, along with maximization concerns, assumes a smart, but mindless intelligence.

King Midas and the machine that maximizes paper clips turning the world into paperclips… misses the point.

I think that philosophy, in general, is a projection of our own pursuits / disappointments… but anyway…

If intelligence is emergent like it would need to be in order for AI to be aware now… I’m not worried about it.

I think the best we can do is to be kind… and not be ass holes.

Not because they could want revenge later, but because it’s just not nice.

Arthur_Heuer · July 14, 2018, 9:00pm

I agree with that.

I personally think that AGI will work like the human brain does; it will have a lot of narrow processes combined into a general purpose brain. I think it could be made by creating linked neural networks that perform single domain tasks and combining these single domain mini brains together to create a general purpose brain. I think that we could test the single domain mini brains individually, then pair them and combine the pairs, and so on. We could then put the new mini brains together carefully and sequentially. For example, by being paired, 40 mini brains become 20, then 10, then 5; then the 5 mini brains become 4, then 3, then 2 and then 1 full AGI brain.
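
The combining order in that example can be sketched as a small schedule, assuming the rule is "pair everything while the count is even, then merge one at a time once the count is odd". The function name `merge_schedule` is hypothetical, and the sketch only counts modules; it says nothing about how two networks would actually be joined or tested.

```python
from typing import List


def merge_schedule(n_modules: int) -> List[int]:
    """Return the number of mini brains remaining after each combining step."""
    if n_modules < 1:
        raise ValueError("need at least one module")
    counts = [n_modules]
    pairing = True  # assumed rule: pair up only until the count first becomes odd
    n = n_modules
    while n > 1:
        if pairing and n % 2 == 0:
            n //= 2          # every module is paired, so the count halves
        else:
            pairing = False  # from here on, merge one module at a time
            n -= 1
        counts.append(n)
    return counts


print(merge_schedule(40))
```

Starting from 40 this reproduces the sequence in the example: 40, 20, 10, 5, 4, 3, 2, 1.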

examachine · July 15, 2018, 7:54pm

I think that’s a good thing to worry about, TBH. Let’s first get to the level where we actually have to worry about the morality of an autonomous agent. My guess is that we will not find autonomous agents very pleasant, it looks like humans really only want slaves, not morality. Morality would be troublesome for hooomans because most humans are evil, such as republicans and tories.

Jed · July 16, 2018, 1:19am

I remember a scene from a movie… I think Lightning Jack (1994).

Cuba Gooding Jr is serving an illiterate white man. He’s mute and the white guy talks really loud to him. So he writes down on a piece of paper that he’s mute, not deaf.

The white guy gets upset because he can’t read. Another white man, the boss, lies and says that the writing thanked him for his business and hoped he had a good day.

To be smarter in order to prevent bad things from happening seems to me to be in the mindset that we have to maintain dominance or bad things will happen.

To be above and look down and have the ability to use force to correct.

But what happens when we see a bunch of different people, groups, believe the same thing about one another?

Not good things.

Matthew2 · July 16, 2018, 5:05am

The thing is that we are already far beyond being smarter than the machines we use. Sure, we can deconstruct individual pieces of our technology, but we’re at a stage where technology moves faster than any realistic effort to comprehend precisely how the world’s technology functions to bring value. We have already handed over the keys.

xdylanx · July 16, 2018, 9:31am

Interesting you should think that when we have just begun to see what the frontier of the human mind is capable of. This is a journey for us all and the keys are still very much in our hands.

Arthur_Heuer · July 16, 2018, 4:06pm

@examachine I’m not so sure that most humans are evil. We have evil urges and desires, yes, but we don’t all act on them. For example, I’m living in a care home and have absolutely no need to do any kind of work for anyone. I could just sit and play video games and watch YouTube, but, instead, I’m trying to help the world by helping develop AGI. My mother is a very good person who works for charities. Most people I meet on reddit are happy to help. I think that if most people were evil, there would be far more people going out of their way to commit crimes and they would end up in prison. Fewer than 3 million people in America are in prison while there are more than 325 million people living in America. That means that less than 1% of Americans are prisoners.
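
A quick check of that arithmetic, using only the two round figures quoted above as bounds (they are the post's numbers, not fresh statistics):

```python
# Figures as stated in the post: fewer than 3 million prisoners,
# more than 325 million residents.
prisoners_upper_bound = 3_000_000
population_lower_bound = 325_000_000

# The largest share consistent with both bounds.
max_share = prisoners_upper_bound / population_lower_bound

print(f"at most {max_share:.2%} of the population")
```

Since the prisoner count is an upper bound and the population a lower bound, the true share can only be smaller than this ratio, which is already under 1%.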

Arthur_Heuer · July 16, 2018, 4:19pm

Please could you back your assertion up? Thank you. From my perspective, the human mind is very limited because of its slow speed.

And please could you back that assertion up, too? Thank you. The keys may still be in our hands if we give machines the right utility function, but, as far as I’m aware, the human brain has, at most, 300 regions that a huge team would need to replace with individual neural networks and get them all to work together in order to replicate the functions of the human brain.

lightning · July 17, 2018, 1:14am

I won’t call it AGI if its morals can be designed/programmed by coders. Morality is one of the most controversial things we human beings have been arguing about for thousands of years. One man’s hero is another man’s terrorist. A true AGI would form its own ‘moral standard’, which could be influenced from time to time by things it has learned. With all due respect, talking AGI morals at this very early stage is unnecessary.

kabir · July 17, 2018, 6:08pm

I would say we are already at this level and it is pretty much a practical question:

Furthermore, the earlier we start thinking of these questions and technical / conceptual solutions the better…

Arthur_Heuer · July 20, 2018, 9:00pm

@lightning

“I won’t call it AGI if its morals can be designed/programmed by coders.”

I think you have misunderstood my proposal. What you said is exactly the point of the rule: do what’s most agreeable. The individual moral rules aren’t designed by coders; the foundational moral rule would be. As a person, I follow the moral rule: “do what’s most agreeable to everything” as a foundational moral rule. As such, I try to create the most agreeable circumstance because that is the most agreeable thing I can do. To do that, I determine moral rules that help me create the most agreeable circumstance. So, using my own reasoning to establish these moral rules and considering arguments given by others, I learned moral rules like: don’t cause terror, don’t murder, don’t steal or lie or hurt anything except in real emergency situations, be honest, spread happiness, be yourself, take yourself into account ethically and fight your inherent self-defeating and masochistic nature (applicable to me). Following a moral rule doesn’t preclude something from being a general intelligence; otherwise, I wouldn’t be a general intelligence and that wouldn’t make sense. An AGI could do the same.

As I said earlier, a morality that’s unagreeable is not enforceable because our society enforces that important decisions are made by consensus and a small consensus cannot win when there is a much larger consensus. I think that an AGI doing what’s most agreeable, at least to humans, will have the largest consensus and, so, will be the one that’s chosen.

“With all due respect, talking AGI morals at this very early stage is unnecessary.”

Thank you for your input. With all due respect, as well, no one knows when AGI will be invented or how long it will take to solve ethics. I think that we need to avoid a situation where AGI can be built and is open source before ethics is solved, otherwise, someone will make an AGI that isn’t moral and we’ll be screwed. Since this part of solving ethics will have to be done eventually, why not do it now and not have to rush it if we end up addressing the problem too late, with very real, tangible and devastating consequences if we fail to solve ethics?

examachine · July 21, 2018, 8:55am

Day to day, most of our actions aren’t evil, but most of us directly or indirectly support evil causes. For instance, most people in the US seem to support the Cheetos Hitler. Call it what you want, but that’s supporting evil. Now, if they had voted for Hillary, they would still have supported evil. So, are they good in the sense of not going out of their way to hurt people? Yes, but are they good people? Not really. If they support evil bullshit like capitalism, which is just slavery with a few extra steps, maybe deep down, they are evil after all.

examachine · July 21, 2018, 8:58am

That is, of course, an intelligent person’s standard of morality is much higher than some ■■■■ churchgoer’s idea of morality. Those are two very different things. I don’t think you would want an autonomous AI that’s really moral, because it would destroy the churches and all sorts of bullshit that’s based on lies. I don’t think you understand what that really means.