2018-02-26 RRG Notes
- Concept defined by Nick Bostrom
- Defined as: "a risk that arises from the dissemination or the potential dissemination of (true) information that may cause harm or enable some agents to cause harm"
- Pointed out in contrast to the generally accepted principle of information freedom
- Possibility of information hazards needs to be considered when making information policies
- Typology of Information Hazards
- By information transfer mode
- Data hazard
- Idea hazard
- Attention hazard
- Template hazard
- Signaling hazard
- Evocation hazard
- By effect
- Adversarial risks
- Competitiveness hazard
- Enemy hazard
- Intellectual property hazard
- Commitment hazard
- Knowing-too-much hazard
- Risks to social organizations and markets
- Norm hazard
- Information asymmetry hazard
- Unveiling hazard
- Recognition hazard
- Risks of irrationality and error
- Ideological hazard
- Distraction and temptation hazard
- Role model hazard
- Biasing hazard
- De-biasing hazard
- Neuropsychological hazard
- Information burying hazard
- Risks to valuable states and activities
- Psychological reaction hazard
- Disappointment hazard
- Spoiler hazard
- Mindset hazard
- Belief-constituted value hazard
- Risks from information technology systems
- Information system hazard
- Information infrastructure failure hazard
- Information infrastructure misuse hazard
- Artificial intelligence hazard
- Risks from development
Abstract
- Information hazards are risks that arise from the potential dissemination of true information
- May cause harm, or may enable some agent to cause harm
- Subtler than direct threats
- This paper proposes a taxonomy
Introduction
- Commonly held presumption in favor of knowledge, truth and the uncovering and dissemination of information
- Even reactionaries don't oppose the revealing of information - they support truth, but have a different idea of what the truth is
- Although no one makes a case for general ignorance, there are many special cases where ignorance is deliberately cultivated
- National security
- Sexual innocence
- Jury impartiality
- Anonymity for patients, clients, voters, etc
- Suspense in films and novels
- Measuring the placebo effect
- Creating mental challenges in games and study
- Assume that objective truth exists and that humans can know these truths
- Not concerned with false information
- Therefore, an information hazard can be defined as: "A risk that arises from the dissemination or potential dissemination of (true) information that may cause harm or enable some agent to cause harm"
- Relative to their significance, some classes of information hazard are unduly neglected
- Create a vocabulary of information hazards to allow the examination of easily overlooked risks
- Create a catalog of some of the various ways in which information can cause harm
Six Information Transfer Modes
- Distinguish several different information formats, or "modes" of idea transfer
- Data hazard: specific data, such as the genetic sequence of a lethal pathogen or a blueprint for a nuclear weapon, that, if disseminated, creates risk
- Idea hazard: a general idea, if disseminated, creates a risk even without a data-rich detailed specification
- Example: the idea that nuclear fission can be used to create a weapon is an idea hazard, even though it's not a detailed blueprint of a nuclear bomb
- Even a mere demonstration can be an idea hazard, insofar as it shows an agent that a particular harmful thing is possible to create
- Attention hazard: the mere drawing of attention to some particularly potent or relevant ideas or data increases risk, even when these ideas or data have already been published
- There are countless avenues for doing harm, not all of which are equally viable
- Adversary faces a search task
- Anything that makes this search task easier can be an infohazard
- Example: an adversary may look at the way in which we construct our defenses to see what we're worried about
- Attempts to suppress attention often backfire by letting people know that they should pay attention to the thing that is being suppressed
- Even thinking about a topic may not be entirely harmless, since once one has a good idea, one will be tempted to share it
- Template Hazard: the presentation of a template enables distinctive modes of information transfer, and thereby creates risk
- Risk of a "bad role model"
- Risks caused by implicit forms of information processing or organization structure
- Signaling hazard: Verbal and non-verbal actions can indirectly transmit information about some hidden quality of the sender, and this social signaling can create risk
- Academics might stay away from topics, or adopt excessive formalism in dealing with topics that are attractive to crackpots
- Individual thinkers suffer reputational damage just from being in the field
- Evocation hazard: Risk that the particular mode of presentation can cause undesirable mental states and processes
- Vivid description of an event can trigger mental processes that lie dormant when the same event is described in dry prose
Adversarial Risks
- Enemy hazard: By obtaining information, our enemy or potential enemy becomes stronger and increases the threat that they pose
- National security - everything from counterintelligence to camouflage is aimed at reducing the amount of information available to the enemy
- Depends on the existence of valuable information that the enemy might obtain
- Our own activities can be hazardous if they contribute to the production of such information
- Example: in World War 2, even though the Allies and the Axis powers invented chaff independently, from first principles, neither used it immediately, because each was afraid of revealing the existence of radar-disrupting countermeasures to the other side
- Rational strategy for military research would give significant consideration to enemy hazard
- Example: US should be careful about pursuing research into EMP weapons that affect electronics, because the US is more dependent on electronics than its adversaries
- This is really fancy language for, "People in glass houses shouldn't build trebuchets."
- Even when new technologies would not differentially benefit enemies, there can still be an advantage in intentionally retarding military progress
- Suppose a country has a great lead in military power and technology
- By investing heavily in military research, it could increase its lead and further enhance its security somewhat
- But if the rate of information leakage is a function of the size of the technological gap between the nation and its enemies, the farther ahead a nation gets, the faster its adversaries catch up
- Thus military research mainly accelerates both nations' ascent in military technology, making wars more destructive without buying the leader lasting security (a toy model of this dynamic follows below)
- Accelerating the ascent of the military tech tree is especially bad if the tree is of finite height, and the leader runs out of opportunities for innovation at some point
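- A minimal sketch of this leakage dynamic (my own toy model in Python; the investment and leak rates are illustrative assumptions, not from Bostrom's paper):

```python
# Toy model: the leader invests at a fixed rate, while knowledge leaks to
# the follower at a rate proportional to the size of the technological gap.

def simulate(invest_rate, leak_rate=0.5, steps=50):
    leader, follower = 10.0, 0.0
    for _ in range(steps):
        gap = leader - follower
        leader += invest_rate        # leader's research investment
        follower += leak_rate * gap  # follower catches up faster as the gap grows
    return leader, follower

for invest_rate in (1.0, 2.0):
    leader, follower = simulate(invest_rate)
    print(f"invest={invest_rate}: leader={leader:.1f}, "
          f"follower={follower:.1f}, gap={leader - follower:.1f}")

# The gap converges to invest_rate / leak_rate no matter how long the race
# runs, while both absolute levels grow without bound: extra research mostly
# pushes both sides up the tech tree rather than buying lasting security.
```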
- Competitiveness Hazard: There is a risk that by obtaining information, some competitor of ours will become stronger, thereby weakening our competitive position
- In competitive situations, one person's information can cause harm to another even when no intent to cause harm is present
- Example: rival knows more and gets the job that you were applying for
- How is this an infohazard?
- Intellectual Property Hazard: A faces the risk that some other firm B will obtain A's intellectual property, thereby weakening A's competitive position
- Competitors can gain valuable information by
- Observing production and marketing methods
- Reverse-engineering products
- Recruiting employees
- Firms go to great lengths to protect their intellectual property
- Patents
- Copyright
- Non-disclosure agreements
- Physical security
- Compensation schemes that discourage turnover
- Is a special case of competitiveness hazard
- Commitment Hazard: Risk that the obtainment of some information will weaken one's ability to credibly commit to some course of action
- Example: blackmail
- As long as the target is unaware of the threat, they are not affected
- As soon as the target is made aware of the threat, their ability to commit to a course of action that the blackmailer does not want is weakened
- In some situations it can be advantageous to make a probabilistic threat
- Thomas Schelling - "threat that leaves something to chance"
- Instead of threatening to launch a nuclear attack, you threaten to increase the chances of an attack occurring, by putting your forces on high alert or engaging in conventional war
- Theory is that you can't credibly threaten to deliberately launch an attack, but you can threaten to make an accidental launch more likely
- However, if information is revealed that dispels the uncertainty, the effect of the probabilistic threat is reduced
- Knowing-too-much hazard:
- Knowledge makes someone your adversary
- Example: Stalin's wife
- Committed suicide
- Death was attributed to appendicitis
- Doctors who knew the true cause of death found themselves targeted and executed
- Pol Pot targeted the entire intellectual class of Cambodia for extermination
- The mere possession of true knowledge can make you a target for those who wish to suppress that truth
Risks To Social Organizations And Markets
- Information can sometimes damage parts of our social environment, such as cultures, norms and markets
- This can damage some agents without necessarily benefiting their adversaries
- Norm Hazard: some social norms depend on a coordination of beliefs or expectations among many subjects; a risk is posed by information that could disrupt these expectations for the worse
- Information that alters the expectations that people have of the way others will behave can change their own behavior
- This can be a move into a worse social equilibrium
- Locally suboptimal policies are often justified as a price worth paying in order to protect norms that serve to block a slide into a worse equilibrium
- One can object to certain judicial decisions because of the precedent they set, rather than on the basis of the decision itself
- While it is obvious how false information can damage norms, norms can also be damaged by true information
- Self-fulfilling prophecies
- People act more honestly if they believe they are in an honest society and more corruptly if they believe they are in a corrupt society
- Information cascades
- Agents make decisions in sequence
- Each agent, in addition to some noisy private information, has the ability to observe the choices of the agents in front of him or her in the queue
- If the first agent makes a poor choice, it biases subsequent agents into also making poor choices
- The effect gets amplified as more and more agents follow the crowd
- Can account for faddish behavior in many fields (see the simulation sketch below)
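- A minimal simulation of the cascade dynamic (my own sketch in the style of the Bikhchandani-Hirshleifer-Welch model; the 70% signal accuracy and the majority-rule shortcut are simplifying assumptions):

```python
import random

# Each agent privately sees a noisy signal about the better choice (the true
# best choice is 1; signals are right 70% of the time) plus all earlier
# agents' choices. Once the observed majority outweighs one private signal,
# following the crowd is the rational move -- which locks in early mistakes.

def run_queue(n_agents=100, signal_accuracy=0.7, seed=None):
    rng = random.Random(seed)
    choices = []
    for _ in range(n_agents):
        signal = 1 if rng.random() < signal_accuracy else 0
        lead = choices.count(1) - choices.count(0)
        if lead >= 2:    # crowd evidence outweighs my one signal
            choices.append(1)
        elif lead <= -2:
            choices.append(0)
        else:            # otherwise, go with my own signal
            choices.append(signal)
    return choices

# Count runs that cascade onto the wrong choice despite 70%-accurate signals.
bad = sum(run_queue(seed=s).count(1) < 50 for s in range(1000))
print(f"{bad}/1000 runs ended majority-wrong")
```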
- Information asymmetry hazard: when one party to a transaction has the potential to gain information that others lack, market failure may result
- Example: "lemon market"
- Sellers of low-quality goods are more likely to sell
- Buyers know this and lower their offers accordingly
- Sellers of high-quality goods are therefore less likely to sell, since they can't get a fair price
- As a result, the market comes to be dominated by low-quality goods ("lemons") - see the simulation sketch below
- Example: insurance and genetic testing
- Buyers know more about their health than their insurers
- Therefore, the buyers at greatest risk of illness buy insurance
- Anticipating this, insurance companies raise premiums, leading to an adverse selection spiral that causes the market to collapse
- As a result, it can be beneficial for neither buyers nor insurance companies to know genetic risks
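- A minimal sketch of the adverse-selection spiral (my illustration, following Akerlof's lemons model; the quality numbers are made up):

```python
# Sellers know their car's quality; buyers only know the average quality of
# the cars still for sale, and offer that average as the price. Owners of
# cars better than the offer withdraw, which drags the average down further.

qualities = [q / 100 for q in range(1, 101)]  # cars of quality 0.01 .. 1.00

market = list(qualities)
for round_num in range(8):
    offer = sum(market) / len(market)           # buyers pay expected quality
    market = [q for q in market if q <= offer]  # better cars exit the market
    print(f"round {round_num}: offer={offer:.3f}, cars left={len(market)}")
# The offer ratchets downward each round until only the worst cars remain.
```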
- Unveiling Hazard: The functioning of some markets, and the support for some social policies, depends on the existence of a shared "veil of ignorance"; and the lifting of the veil can undermine those markets and policies
- Example: insurance (again)
- You're not going to buy insurance against a loss that you are certain will not occur
- No insurer is going to sell you insurance against a loss they are certain will occur
- Insurance only works because both you and the insurance company are uncertain about whether a loss will or will not occur
- Rawlsian political philosophy
- Selfish people choose policies that favor their own self-interest because they know their own race, social class, occupation, etc
- If social policies had to be chosen from behind a veil of ignorance, they would be more fair
- Elites might be less likely to support a social safety net if they could be certain that neither they nor their descendants would ever have to make use of it
- Support for freedom of speech might weaken if people knew with certainty that they would never find themselves as a member of a persecuted minority
- In an iterated prisoner's dilemma, the equilibrium strategy of cooperation unravels if the agents know exactly how many rounds there will be (see the backward-induction sketch below)
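- A sketch of the backward-induction step behind that unravelling (my rendering; the recursion just spells out the argument):

```python
# In a prisoner's dilemma repeated a publicly known number of times,
# cooperation in a given round is only sustainable if defection can be
# punished later. The last round has no later rounds, so both defect; given
# that, the second-to-last round has no future worth protecting either, and
# the argument marches all the way back to round 1.

def rational_plan(rounds_remaining):
    if rounds_remaining == 0:
        return []
    # The remaining subgame plays out as all-defection no matter what happens
    # now, so this round is effectively one-shot: defect.
    return ["defect"] + rational_plan(rounds_remaining - 1)

print(rational_plan(5))  # ['defect', 'defect', 'defect', 'defect', 'defect']
# With an unknown number of rounds there is no final round to anchor the
# induction, and cooperation can survive.
```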
- Recognition Hazard: some social fictions depend on shared knowledge not becoming common knowledge, or not being publicly acknowledged; public releases of information could ruin the pretense
Risks of Irrationality and Error
- Ideological Hazard: An idea might, by entering into an ecology populated by other ideas, interact in ways which, in the context of extant institutional and social structures, produce a harmful outcome, even in the absence of any intention to harm
- Example: scriptural doctrine
- Scripture S contains an injunction to drink sea water
- Bob believes that everything in S is true
- Thus, informing Bob about the true contents of S causes him harm by inducing him to drink sea water
- Ideological hazard causes harm by leading someone in a bad direction through the interaction of true knowledge with existing false beliefs or incomplete knowledge
- Distraction and temptation hazards: Information can harm us by distracting us or presenting us with temptations
- Humans are not perfectly rational
- Humans do not have perfect self-control
- Some information involuntarily draws our attention to some idea when we would prefer that we focus our minds elsewhere
- A recovering alcoholic can be harmed by a detailed description of wine
- In the future, virtual reality environments and informational hyper-stimuli might be as addictive as drugs
- Role model hazard: we can be corrupted and deformed by exposure to bad role models
- Even if we know a model is bad, we can still be influenced by it via prolonged exposure
- Subjective well-being and even body mass are significantly influenced by peers
- Biasing hazard: When we are already biased, we can be led further away from the truth by information that amplifies or triggers our biases
- Cognitive biases can be aggravated by the provision of certain kinds of data
- Overestimation of one's own abilities can be aggravated by a good performance on an easy task
- Even knowledge of biases and logical fallacies can be harmful, because it gives the person useful counterarguments with which to rebut challenging facts
- Debiasing hazard: when biases have individual or social benefits, harm can result from information that erodes those biases
- Strong belief in our own abilities signals confidence and competence, making us more effective leaders
- Information that undermines that belief can deprive us of those benefits
- Possible that society benefits from excess individual risk-taking in some disciplines
- The overestimation of the chances of success by inventors and entrepreneurs may have positive externalities for society as a whole
- Neuropsychological hazard: Information might have negative effects on our psyches because of the particular ways in which our brains are structured, effects that would not arise in more "idealized" cognitive architectures
- Neurological problems that arise from too much "cross-talk" between different parts of the brain
- Photosensitive epilepsy
- Information Burying Hazard: Irrelevant information can make relevant information harder to find, thereby increasing search costs for agents with limited computational resources
- Steganography
- Hiding incriminating evidence inside masses of trivial documents
Risks To Valuable States and Activities
- Psychological reaction hazard: information can reduce well-being by causing sadness, disappointment or some other reaction in the receiver
- Belief constituted value hazard: if some component of well-being depends constitutively on epistemic or attentional states, then information that affects those states might thereby directly impact well-being
- Disappointment hazard: Our emotional well-being can be adversely affected by the receipt of bad news
- Example: mother on her deathbed, with her son fighting in a war
- If the son is killed or injured there is a disappointment hazard to the mother
- If she learns about her son's fate before she dies, she will spend her last days in despair
- If she does not, she will die in peace
- Mother faces severe disappointment hazard in this scenario
- Spoiler hazard: Fun that depends on ignorance and suspense is at risk of being destroyed by premature disclosure of truth
- Mindset hazard: Our basic attitude or mindset might change in undesirable ways as a consequence of exposure to information of certain kinds
- Unwanted cynicism promoted by an excess of knowledge about the dark side of human affairs
- Historical knowledge sapping artistic and cultural innovation
- Scientific reductionism despoils life of its mystery and wonder
- How do we distinguish belief constituted value hazard from psychological reaction hazard?
- It might be valuable for someone to risk psychological reaction, because of broader values
- One might hold that life lived in ignorance is a life made worse, even when that ignorance shields one from painful realities
- One might also hold that there is some knowledge that makes a negative contribution to well-being
- We might value innocence for its own sake
- Privacy
- We might want to remain ignorant of some details of our friends or our parents lives so that we can think about them in a more appropriate manner
- Embarrassment hazard: We may suffer psychological distress or reputational damage as a result of embarrassing facts about ourselves being disclosed
- Often similar to, and may take the form of, signaling hazards
- Combine elements of psychological reaction hazard, belief constituted value hazard, and competitiveness hazard
- Self-esteem is not a wholly private matter, but is also a social signal that influences others opinions of us
- Risk of embarrassment can suppress frank discussion
- Embarrassments that affect reputation and brand names of corporations can cause billions of dollars in damage
- During the Cold War, the prolongation of the Vietnam War (on the US side) and the Afghan war (on the Soviet side) was partly due to each side being unwilling to suffer the embarrassment cost of admitting defeat
Risks from information technology systems
- Information technology systems are vulnerable to unintentionally disruptive input sequences or system interactions, as well as to attacks by determined hackers
- Information system hazard: The behavior of some (non-human) information system can be adversely affected by some informational inputs or system interactions
- Can be subdivided in various ways
- Information infrastructure failure hazard: the risk that some information system will malfunction, either accidentally or as a result of cyber attack; as a consequence, the owners or users of the system may be harmed or inconvenienced, third parties whose welfare depends on the system may be harmed, or the malfunction might propagate through some dependent network, causing a wider disturbance
- Most attention is given to information infrastructure failure hazard
- Information infrastructure misuse hazard: Risk that some information system, while functioning according to specifications, will serve some harmful purpose and will facilitate the achievement of said purpose by providing useful information infrastructure
- Example: government or private databases that collect large amounts of data about citizens might make it easier for a future dictator to gain and maintain control
- Building such a database might, in addition, establish a norm that makes it easier for other, more harmful, governments to do the same thing
- Robot hazard: Risks that derive substantially from the physical capabilities of a robotic system
- If a Predator drone with armed missiles gets hacked or malfunctions, that's a robot hazard
- Artificial intelligence hazard: Computer related risks in which the threat would derive primarily from the cognitive sophistication of the program rather than specific properties of any actuators to which the system initially has access
- A superintelligent AI, even if initially restricted to interacting with human gatekeepers via a text interface, might hack or talk its way out of confinement
- The threat posed by a sufficiently advanced AI may depend more on its cognitive capabilities and its goal architecture than on the physical capabilities with which it is initially endowed
Risks From Development
- Development hazard: Progress in some field of knowledge can lead to enhanced technological, organizational or economic capabilities, which can produce negative consequences
- After Hiroshima and Nagasaki, the physicists of the Manhattan Project found themselves complicit in the deaths of over 200,000 people
- Given the example of the Manhattan Project, it is no longer morally tenable to proceed with research without thinking about its potential consequences
- Biotech
- Nanotech
- Surveillance systems
- The broad and interdisciplinary nature of modern scientific advances means that even innocuous-looking advances may have implications for development hazard
Discussion
- The catalog of information hazards detailed above can help inform our choices by highlighting the sometimes subtle ways in which even true information can have harmful effects
- In many cases, the best response to an infohazard is no response
- Benefits of information so far outweigh the costs of information hazards that we still underinvest in information gathering
- Ignorance carries dangers that are often greater than knowledge
- Mitigation need not take the form of an active attempt to suppress information
- Invest less in research in certain areas
- Avoid spoilers by refraining from reading reviews and plot summaries
- Sometimes an information hazard is caused by partial information, so the solution to the information hazard is more information, not less
- Historically, policies that have restricted information have served special interests
- At the same time, we should recognize that knowledge and information frequently have downsides
- We should be more cognizant of which areas of knowledge should be promoted, which should be left fallow, and which should be actively impeded
- Indeed, the discussion of information hazards itself can be a norm hazard, if it undermines the fragile norms allowing for truth-seeking and truth-reporting
- Concealing information can produce risk
- Man Made Catastrophes and Risk Information Concealment: Case Studies of Major Disasters and Human Fallibility
- Hiding information in disasters contributed to making them possible and hindered rescue and recovery
- Book focuses mainly on technological disasters, such as Vajont Dam, Three Mile Island, Bhopal, Chernobyl, etc, but also covers financial disasters, military disasters, production failures, and concealment of product risk
- In all cases, there was concealment going on at multiple levels
- Many patterns of information concealment occur again and again
- 5 major clusters of information concealment
- External environment enticing concealment
- Risk communication channels blocked
- Internal ecology stimulating concealment or ignorance
- Faulty risk assessment and knowledge management
- People having personal incentives to conceal
- Systemic problem - one or two of these factors can be counteracted by good risk management, but when you get more, the causes become much more difficult to deal with
- Causes work to corrode the risk management ability of the organization
- Once risks are hidden, it becomes much more difficult to manage them
- However, risk concealment is something that can be counteracted
- Apply model to some technologies they think show signs of risk concealment
- Shale energy
- GMOs
- Debt and liabilities of US and China
- Patterns of concealment don't predict imminent disaster, but will make things worse if/when a disaster occurs
- Information concealment is not the cause of all disasters
- Some disasters are due to exogenous shocks or truly unexpected failures of technology
- However, risk concealment can make preparation brittle and recovery inefficient
- No evidence to indicate that the examined disasters were uniquely bad from a concealment perspective - a lot of the time, organizations and individuals simply get away with concealing risk
- Book is an important rejoinder to the concept of information hazards
- Some information can be risky
- But ignorance can be just as risky
- Institutional secrecy is intended to contain information hazards, but can compartmentalize and block relevant information flows
- A proper information hazard strategy needs to take into account concealment risk
- Scott likes trigger warnings
- Trigger warnings aren't censorship
- Opposite of censorship
- Censorship says, "Read what we tell you"
- Trigger warnings allow you to read what you want
- Scott doesn't understand what censorship is - censorship isn't about telling people what they should or should not read, it's about suppressing ideas
- We should give people relevant information and trust them to make their own decisions
- Trigger warnings attempt to provide you with the information to make good free choices about your reading material
- Analogy with book titles
- We print the titles of books on the outsides
- People can, and do judge books by their titles
- Our decision to print titles on the outsides of books means that we care more about trusting people's judgement than about denying them the ability to avoid things they don't want to read
- "Beware he who would deny you access to information, for, in his heart, he dreams himself your master."
- Trigger warnings allow us to fight censorship by arguing that those who chose to engage with our ideas do so with the full knowledge that they might find what we have to say offensive
- People can misuse trigger warnings to avoid engaging with challenging ideas, but this is a problem any time you provide people with more information - they might use it the wrong way
- However, people might also use trigger warnings to increase their ability to read challenging material - choose to engage with arguments that don't try to offend them
- Do we, as a civilization, force people to be virtuous without their consent?
- Not any more, which is the crux of so many "hot-button" disagreements
- On topics like gay marriage, abortion, adultery, blue laws for alcohol, drug policy, and many other issues, society did (and to some extent, still does) force people to be virtuous
- The strongest argument that Scott's heard against trigger warnings is that they increase politicization
- Colleges put trigger warnings on everything that can offend liberals, but get outraged when conservatives ask for trigger warnings for things that offend them
- The solution to this is to put trigger warnings in small print on the "bullshit page" - the page with the publisher and copyright information
- This might be a solution for books, but what about all the other places where SJWs want trigger warnings, like class syllabi, blog posts, news articles, etc
- Also, what is the set of triggers that have to be warned about? SJWs have literally asked for trigger warnings on opinions like, 'There are only two genders'
- Trigger warnings can be helpful, if used in good faith
- That is exactly the problem though - the people who are calling for trigger warnings are not acting in good faith - they're trying to put yellow radiation trefoils on anything that opposes their political agenda
- It's a motte-and-bailey argument: the bailey is SJWs going around putting trigger warnings on anything that's insufficiently leftist. But when challenged, they retreat to the motte of saying, "But we should warn people about graphic rape scenes, because that might trigger people's PTSD."
- Opposing trigger warnings on slippery slope grounds just serves to discredit you, while being completely ineffective in the long run
- Example: gay marriage
- Conservatives said there was nothing inherently objectionable about gay marriage
- Argued that it was the first step along a slippery slope to worse things
- But when gay marriage passed, and society didn't take any further steps, conservatives were shown to be both ineffectual and wrong
- Now even valid arguments on the basis of "family values" can be rejected, because people will pattern-match them to the arguments against gay marriage
- The real problem that Scott has is with the argument that trigger warnings should be avoided in order to force people with PTSD to confront their triggers
- You do not give people psychotherapy without their consent
- Even if you can argue consent, people want to confront triggers at their own pace and on their own terms
- My problem is that the word "triggered" has become totally devalued by social justice types on Tumblr
- "Triggered", in the way that Scott uses it, means a strong reaction that can be harmful or even debilitating to a person
- "Triggered", in the way that Tumblr SJWs use it means "mild discomfort or offense"
- Scott doesn't seem to understand the role of weaponized weakness and performative victimhood as tactics used by the social justice movement in order to advance political goals
- Unfortunately, calls of trigger warnings have become irretrievably associated with the tactics of weaponized vulnerability and performative victimhood, and thus, just like conservative opposition to gay marriage, SJW calls for trigger warnings have become discredited
- Introduction
- Roko's Basilisk is a thought experiment proposed in 2010 by the user Roko on the LessWrong community forum
- Used ideas in decision theory to argue that a sufficiently powerful AI would have an incentive to torture anyone who imagined the agent, but didn't work to bring the agent into existence
- Called a basilisk because merely hearing the argument would put you at risk of torture from this hypothetical agent
- Argument was broadly rejected on LessWrong
- A basilisk-like agent would have no incentive to follow through on its threats
- Torturing people for past decisions would be a waste of resources, since once an agent is in existence, the probability of its existence is 1
- Although there are acausal decision theories that allow entities to follow through on acausal threats, these require a large amount of shared information and trust, which does not apply in this case
- Discussion of Roko's Basilisk was banned as part of a general site policy against spreading potential information hazards
- Had the opposite of the intended effect
- Outside websites began sharing information about Roko's Basilisk
- People assumed that discussion had been banned because LessWrong users accepted the argument
- Used as evidence to show that LessWrong users have unconventional and wrong-headed beliefs
- Background
- Roko's argument ties together Newcomblike problems in decision theory with normative uncertainty in moral philosophy
- Example of a Newcomblike problem: the Prisoner's Dilemma
- Each player prefers to defect individually while the other player cooperates
- Each player prefers mutual cooperation over mutual defection
- One of the basic problems in decision theory is that "rational" agents will end up defecting against each other, even though it would make both players better off to have a binding cooperation agreement
- Extreme version of a prisoner's dilemma - playing against an identical copy of oneself
- It's certain that both copies will play the same move - only choices are mutual cooperation or mutual defection
- Causal decision theory endorses mutual defection
- Assumes that the agents' choices are causally independent
- Regardless of what the other copy does, this copy gets a higher payoff by defecting
- Since defection yields the higher payoff against either possible move, CDT concludes that defection dominates (both decision rules are contrasted in the sketch below)
- Eliezer Yudkowsky proposed an alternative to Causal Decision Theory, Timeless Decision Theory, that can achieve cooperation in prisoner's dilemmas, provided that each player knows that the other is running TDT
- Wei Dai subsequently proposed a theory that outperforms both TDT and CDT, Updateless Decision Theory (UDT)
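- A toy contrast of the two decision rules on the play-against-your-own-copy dilemma (my rendering; the payoff numbers are standard illustrative values, not from the post):

```python
# Payoffs for the row player: T=5 > R=3 > P=1 > S=0 (standard PD ordering).
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def cdt_choice():
    # CDT treats the copy's move as causally independent and fixed. "D"
    # strictly dominates: it pays more than "C" against either move.
    assert all(PAYOFF[("D", other)] > PAYOFF[("C", other)] for other in "CD")
    return "D"

def tdt_choice():
    # TDT/UDT-style reasoning: the copy runs this same algorithm, so both
    # outputs are necessarily identical -- only (C,C) or (D,D) can happen.
    return max("CD", key=lambda move: PAYOFF[(move, move)])

print("CDT plays", cdt_choice(), "-> mutual defection, payoff", PAYOFF[("D", "D")])
print("TDT plays", tdt_choice(), "-> mutual cooperation, payoff", PAYOFF[("C", "C")])
```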
- Interest in decision theory stems from AI control problem - how can we gain high confidence in AI agents' reasoning and decision-making, even if they've surpassed us in intelligence?
- Without a full understanding of decision theory, we risk making AI agents whose behavior is difficult to model or erratic
- AI Control Problem also raises moral philosophy questions: how can we specify the goals of an autonomous system in the face of uncertainty about what it is we actually want?
- Hypothetical algorithm that could autonomously pursue human goals in a way compatible with moral progress: coherent extrapolated volition
- Because of Eliezer's status as a founding member of LessWrong, AI theory and "acausal" decision theories have been repeatedly discussed
- Roko's post was an attempt to use Yudkowsky's proposed decision theory to argue against his characterization of an ideal AI goal (coherent extrapolated volition)
- Roko's post
- If two TDT or UDT agents with common knowledge of each others' source code are separated in time, the later agent can seemingly blackmail the earlier agent
- Earlier agent: Alice
- Later agent: Bob
- Bob's algorithm outputs things that Alice likes if Alice leaves Bob a large sum of money, and things that Alice dislikes otherwise
- Since Alice knows Bob's source code, she knows this fact about Bob, even though Bob doesn't exist yet
- If Alice is certain that Bob will someday exist, then her knowledge of what Bob would do seems to force Alice to comply
- CDT is immune to this
- CDT agents assume that their decisions are independent
- CDT Bob would not waste resources punishing a decision that has already happened (a toy rendering of this setup follows below)
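- A toy rendering of the Alice/Bob setup (the function names and the "inspect the source code by running it" shortcut are my illustrative assumptions):

```python
def cdt_bob(alice_paid: bool) -> str:
    # By the time Bob exists, Alice's choice is a fixed fact. Punishing can't
    # causally change it and only burns resources, so CDT never punishes.
    return "do nothing"

def committed_bob(alice_paid: bool) -> str:
    # A TDT/UDT-style Bob evaluates the *policy* "punish non-payers": it is
    # Alice's prediction of that policy that extracts the payment.
    return "do nothing" if alice_paid else "punish"

def alice(bob_source) -> bool:
    # Alice knows Bob's source code (here she simply runs it) and pays only
    # if she sees that non-payment would get punished.
    return bob_source(False) == "punish"

print("vs CDT Bob, Alice pays:", alice(cdt_bob))              # False
print("vs committed Bob, Alice pays:", alice(committed_bob))  # True
```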
- Roko proposed that a highly moral AI agent (one whose actions are perfectly consistent with coherent extrapolated volition) would want to be created as soon as possible
- Such an AI agent would use acausal blackmail to give humans stronger incentives to create it
- The AI agent would target in particular people who had thought about this argument, because they would have a better chance of mentally simulating the AI's source code
- Conclusion: any AI agent that reasons like a utilitarian optimizing for humanity's coherently extrapolated values would be paradoxically detrimental to those values
- Response from Eliezer
- The AI agent would gain nothing from following through with its threats because it would be wasting resources punishing humanity for a decision that already had taken place
- Moreover, the agent has an even better outcome: make you believe that it's going to torture you in the future, and then not expend resources on that
- So, given that, why should we believe a basilisk-like agent's threats?
- Subsequent discussion of the basilisk post has had more to do with the moderator response to Roko's post, rather than on the specific merits of the argument
- Topic moderation and response
- Yudkowsky deleted Roko's post and the ensuing discussion
- Yudkowsky rejected the idea that Roko's Basilisk could be considered a friendly AI in any way, by asserting that even threatened torture would be contrary to humanity's coherent extrapolated volition
- The deletion and the apparently strong response to the basilisk post caused others to assume that LessWrong users took the threat of Roko's basilisk seriously
- In addition, the ban prevented people from seeing the original argument, leading to a wealth of secondhand, sometimes distorted, interpretations of the argument
- Gwern says that few LessWrong users took the Basilisk seriously, and that everyone seems to claim to know someone affected by the Basilisk without knowing any such people themselves
- Like the Band of Brothers quote: "It's funny how when you talk to people about it, everyone claims they heard it from someone who was there, and then when you go ask that person, they claim they heard it from someone who was there"
- Eliezer claims to have deleted the post not because the post itself was an infohazard, but because there may be some variant of the idea of a basilisk that is a real infohazard
- There is no upside from being exposed to Roko's Basilisk, so the probability of it being true is irrelevant
- Was indignant that Roko had violated the basic ethical code for handling infohazards
- Big-picture questions
- Blackmail resistant decision theories
- The general ability to cooperate in prisoners' dilemmas appears to be useful
- Introducing more sophisticated forms of contracts to ensure cooperation appears to be a beneficial thing to do
- At the same time, these contracts introduce new opportunities for blackmail
- If an agent can pre-commit to following through on a promise, even when following through is no longer in the agent's best interest, it can also pre-commit to following through on a costly threat
- It appears that the best way to defeat this blackmail is to precommit to never giving in to any of the blackmailer's demands, even when there are short-term advantages to doing so
- Stick to the action that is recommended by the most generally useful policy (which is what UDT advises)
- UDT selects the best available mapping of observations to actions (policy) rather than the best available action
- Avoids selecting a strategy that other agents will have an especially easy time manipulating (see the policy-selection sketch below)
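- A minimal sketch of policy selection versus action selection (my toy model; the costs and the "blackmailer only targets predicted payers" rule are illustrative assumptions):

```python
PAY_COST, THREAT_COST = 5, 20  # paying is cheap; a carried-out threat is not

def policy_score(pays_if_blackmailed: bool) -> int:
    # The blackmailer predicts your policy and only targets agents who would
    # pay, so your policy determines whether you get blackmailed at all.
    if pays_if_blackmailed:
        return -PAY_COST   # you get blackmailed, and you pay
    return 0               # nobody bothers blackmailing a known refuser

print({policy: policy_score(policy) for policy in (True, False)})
# {True: -5, False: 0} -> the refuse-always policy wins, as UDT advises.

# Action-by-action reasoning goes wrong here: once the blackmail is observed,
# paying (-PAY_COST) beats refusing (-THREAT_COST). But agents who reason
# that way are exactly the ones who get blackmailed in the first place.
```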
- It has not been formally demonstrated that decision theories are vulnerable to blackmail, nor do we know in what circumstances a particular decision theory would be vulnerable
- If TDT or UDT were vulnerable to blackmail, then this would suggest that they are not normatively optimal decision theories
- Information hazards
- David Langford coined the term "basilisk", in the infohazard sense, in the 1988 science fiction story BLIT
- Roko's basilisk incident suggests that information that is deemed dangerous or taboo spreads more rapidly
- Although Roko's basilisk was not harmful, real infohazards may spread in a similar way
- Non-specialists spread the idea of Roko's basilisk without first investigating the risks or the benefits in any serious way
- Someone in possession of an infohazard should exercise caution in visibly suppressing it
- "Weirdness points"
- Promoting or talking about too many nonstandard ideas makes it less likely that any one of those ideas will be taken seriously
- If you promote too many weird ideas, a skeptical interlocutor will write you off as just being prone to weird ideas
- On the other hand, promoting weird ideas can help form a community that is interested in those weird ideas, whereas associating with people who solely endorse conventional ideas can just alienate you when you do spend all your "weirdness points" on that one idea