Formal Behavior Testing

Evaluating theory for dynamic consistency by formalizing statements about system behavior

James Houghton

%pylab inline
import pyDOE
import pandas as pd

The process of formalizing theory as a simulation model helps to discipline a researcher’s effort, forcing them to be explicit in their assumptions and logically consistent in their theory’s structure. Similarly, the process of formalizing statements of system behavior with respect to time or as a result of varying parameters promotes rigor in claims of observable behavior of the system. The discipline of software testing has lessons for how such behavioral tests are constructed and provides a toolset for efficiently doing so. The approach has benefits for those seeking to create their own theories, and to understand theories promoted in existing literature.

Structural Consistency

Are the statements of theory structure logically consistent with one another? Are they ambiguous, missing pieces, or conflicting?

The process of modeling can be used as a personal discipline, enforcing rigor in the understanding of a theory’s structure.

  • Make assumptions explicit

  • Enforce exhaustive structural communication

  • Identify system boundaries

“Formalization helps you to recognize vague concepts and resolve contradictions. Formalization is where the real test of your understanding occurs: computers accept no hand waving arguments.”

Business Dynamics. Sterman 2000, 3.5.3

Example of Formalizing Structural Statements

“The value of achieving success [in a protest] depends on whether things might be gained by action… this includes both new advantages and avoiding harms that are currently experienced or anticipated”

Threat (and Opportunity): Popular Action and State Response in the Dynamics of Contentious Action. Goldstone and Tilly 2001

Interpretation:

Gains that would result from success = New Advantages + Harms Avoided
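
As a minimal sketch, the interpretation above can be written as a one-line function. The function and argument names are hypothetical, chosen to mirror the quoted terms; they are not taken from the model itself.

```python
def gains_from_success(new_advantages, harms_avoided):
    """Gains that would result from success, as the sum of its two components."""
    return new_advantages + harms_avoided


# Illustrative values only, in arbitrary units
example_gains = gains_from_success(new_advantages=2.0, harms_avoided=3.0)
print(example_gains)
```

Writing the statement this way forces a choice the prose leaves open: here the two components are assumed to be additive and measured in the same units.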

model diagram


Behavioral Consistency

Does the theory’s structure actually create the behavior that is described by the theory’s authors?

Two contexts:

  • Testing preconstructed theories against their assumed implied behavior

  • Testing dynamic models during construction for consistency with observed behavior

“I examine an existing theory in detail, formalizing it to investigate how well the theory accounts for the phenomena its authors set out to explain.”

Problems and paradoxes in a model of punctuated organizational change. Sastry 1997

“The purpose of the model is… to play the roles of the actors in the system and to trace out the consequences of their actions over time, thus providing a test of the theory by checking whether the assumptions can actually produce [asserted behavior].”

The growth of knowledge: Testing a theory of scientific revolutions with a formal model. Sterman 1985

Lessons from software testing

We normally express statements of behavior as ‘Reference Modes’, and then manually compare reference modes to behavior.

We can learn from explicit behavioral testing in software development:

Software Development                      | SD Modeling
------------------------------------------|--------------------------------------------
Make expected behavior of code explicit   | Make statements of system behavior explicit
Make developing software easier           | Help with model formulation
Ensure robustness to unknown user input   | Ensure robustness to uncertain parameters
Support code acceptance by client         | Create defensible statements of behavior

“Testing is an extremely creative and intellectually challenging task.”

The Art of Software Testing. Myers et al. 2012

A Software Testing Example

def test_step(self):
    """ Tests the PySD version of Vensim's `STEP` function """
    from pysd import functions  # What are we examining?

    functions.time = lambda: 5  # What are the conditions of our test?
    self.assertEqual(functions.step(1, 10), 0) # what is expected?

    functions.time = lambda: 15  # New conditions
    self.assertEqual(functions.step(1, 10), 1)  # New expectation

    functions.time = lambda: 10  # Boundary condition: exactly at the step time
    self.assertEqual(functions.step(1, 10), 1)  # The step is already active
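
The test above depends on PySD's internals. A self-contained sketch of the same pattern follows. It uses a hypothetical stand-in `step(value, tstep, time)` written for illustration (it is not PySD's implementation), and it assumes that the step takes effect at exactly `time == tstep`, as the third assertion above implies.

```python
import unittest


def step(value, tstep, time):
    """Hypothetical stand-in for a STEP function: 0 before `tstep`, `value` from `tstep` on."""
    return value if time >= tstep else 0


class TestStep(unittest.TestCase):
    def test_before_step(self):
        self.assertEqual(step(1, 10, time=5), 0)

    def test_after_step(self):
        self.assertEqual(step(1, 10, time=15), 1)

    def test_at_step_time(self):
        # Boundary case: the step is assumed to be active at exactly t == tstep
        self.assertEqual(step(1, 10, time=10), 1)


# Run the three tests and keep the result object for inspection
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestStep)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test names one condition and one expectation, so a failure points directly at the claim that did not hold.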

Behavior Over Time Tests

Within a single run of the model, assess whether the simulated behavior actually follows what the authors describe:

Claim: “[The State] may swing back and forth between concessions and repression, trying to find a combination that quells protest…” Pg. 188

Interpretation: In some cases, the regime may begin by emphasizing either repression or concessions, so that the instantaneous rate of Making concessions is greater (or less) than that of Making threats; at some later point in the simulation, the ordering of these two rates switches.

def test(row):
    # run the model with this row's parameter set, keeping only the two rates
    output = model.run(row.to_dict(),
                       return_columns=['Making concessions', 'Making threats'])
    # is there a preference for making concessions at some point?
    concession_pref = output['Making concessions'] > output['Making threats']
    # is there a preference for making threats at some point?
    threat_pref = output['Making concessions'] < output['Making threats']
    # do both preferences occur at some point in the run?
    return concession_pref.any() and threat_pref.any()

result = parameters.apply(test, axis=1)
print(any(result))
True
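
The core check can be exercised without the model. The sketch below applies the same logic to two hand-made series. The helper name `preference_switches` and the data are hypothetical, and plain lists stand in for the columns that `model.run` returns.

```python
def preference_switches(concessions, threats):
    """True if each rate exceeds the other at some point in the run."""
    pairs = list(zip(concessions, threats))
    concession_pref = any(c > t for c, t in pairs)
    threat_pref = any(c < t for c, t in pairs)
    return concession_pref and threat_pref


# The state starts with concessions, then shifts toward threats
swinging = preference_switches([3, 2, 1, 0], [0, 1, 2, 3])
# Concessions dominate throughout, so there is no swing
steady = preference_switches([3, 3, 3, 3], [1, 1, 1, 1])
print(swinging, steady)
```

Note that this check passes after a single switch. A stricter reading of "swing back and forth" might instead count how many times the ordering changes.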

Parametric Tests

  • Testing comparisons of how the system would respond to various conditions.

  • Testing model behavior over a range of values for exogenous parameters.
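
The tests here are applied row by row to a table of sampled parameter sets (`parameters`), but how that table was built is not shown. One plausible approach, given the `pyDOE` import, is Latin hypercube sampling. The sketch below implements a simple version with only the standard library. The function name, the `Concession rate` parameter, and all ranges are hypothetical.

```python
import random


def latin_hypercube(ranges, n_samples, seed=0):
    """Draw n_samples parameter sets, using each of n_samples equal strata once per parameter."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        # One uniform draw inside each stratum of the unit interval
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)  # decouple the strata across parameters
        columns[name] = [low + p * (high - low) for p in points]
    # One dict per sample, ready to pass to a model run
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


# Hypothetical ranges, for illustration only
p_ranges = {'Threat rate': (0.0, 1.0), 'Concession rate': (0.0, 1.0)}
samples = latin_hypercube(p_ranges, n_samples=5)
```

Passing the list to `pandas.DataFrame(samples)` gives a table with one row per parameter set, which can then be used with `apply(test, axis=1)`.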

Claim: Authoritarian states may … ratchet up repression too slowly and insufficiently to halt mobilization. Pg. 188

Interpretation: In some cases, slow implementation of repression may fail to stop protests that would have been successfully repressed through rapid application of the same absolute level of repression.

def test(row):
    params = row.to_dict()
    repressive_threshold = None  # repression level of the first successful run
    # sweep the threat rate from low to high
    for threat_rate in np.linspace(*p_ranges['Threat rate'], num=10):
        # merge the swept value into a copy of the parameter set
        run_params = {**params, 'Threat rate': threat_rate}
        res = model.run(run_params,
                        return_columns=['Protest',
                                        'Repressive threat Tr']).iloc[0]
        if res['Protest'] == 0:  # protest is quelled
            if repressive_threshold is None:  # first successful
                repressive_threshold = res['Repressive threat Tr']
            elif res['Repressive threat Tr'] < repressive_threshold - .0001:
                return True  # a faster rate succeeds with less absolute repression
    return False

result = parameters.apply(test, axis=1)
print(any(result))

Conclusions

  • The process of formalizing statements of system behavior with respect to time or as a result of varying parameters promotes rigor in claims of observable behavior of the system.

  • The field of software testing has lessons for how such behavioral tests are constructed and provides a toolset for efficiently doing so.

  • The approach has benefits for those seeking to create their own theories, and to understand theories promoted in existing literature.

If behavioral claims are consistent with theory:

  • Your claims are stated in a defensible way

-or-

  • You have mastery of an important theory in your field

Todo: Look for new observable implications of the theory.

If behavioral claims are not consistent with theory:

  • You found a problem with your model before anyone else did

-or-

  • You found a problem with an important theory in your field

Todo: Fix the holes.

“We must design assessment into our work from the start so we can discover errors more quickly”

All models are wrong: Reflections on becoming a systems scientist. Sterman 2002

“Let every man test his own work. Then he will be proud of his own work.”

Biblical Letter from Paul of Tarsus to the Galatians. c50 AD. ch.6 v.4