Formal Behavior Testing ======================= Evaluating theory for dynamic consistency by formalizing statements about system behavior *James Houghton* .. code:: ipython3 %pylab inline import pyDOE import pandas as pd .. parsed-literal:: Populating the interactive namespace from numpy and matplotlib The process of formalizing theory as a simulation model helps to discipline a researcher’s effort, forcing them to be explicit in their assumptions and logically consistent in their theory’s structure. Similarly, the process of formalizing statements of system behavior with respect to time or as a result of varying parameters promotes rigor in claims of observable behavior of the system. The discipline of software testing has lessons for how such behavioral tests are constructed and provides a toolset for efficiently doing so. The approach has benefits for those seeking to create their own theories, and to understand theories promoted in existing literature. Structural Consistency ---------------------- Are the statements of theory structure logically consistent with one another? Are they ambiguous, missing pieces, or conflicting? The process of modeling can be used as a personal discipline, enforcing rigor in the understanding of a theory’s structure. - Make assumptions explicit - Enforce exhaustive structural communication - Identify system boundaries “Formalization helps you to recognize vague concepts and resolve contradictions. Formalization is where the real test of your understanding occurs: computers accept no hand waving arguments.” *Business Dynamics. Sterman 2000, 3.5.3* Example of Formalizing Structural Statements ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ “The value of achieving success [in a protest] depends on whether things might be gained by action… this includes both new advantages and avoiding harms that are currently experienced or anticipated” *Threat (and Opportunity): Popular Action and State Response in the Dynamics of Contentious Action. Goldstone and Tilly 2001* **Interpretation:** ``Gains that would result from success = New Advantages + Harms Avoided`` .. figure:: Goldstone_Tilly_2001.png :alt: model diagram model diagram Behavioral Consistency ---------------------- Does the theory’s structure actually **create the behavior** that is described by the theory’s authors? Two contexts: - Testing preconstructed theories against their assumed implied behavior - Testing dynamic models during construction for consistency with observed behavior “I examine an existing theory in detail, formalizing it to investigate how well the theory accounts for the phenomena it’s authors set out to explain.” *Problems and paradoxes in a model of punctuated organizational change. Sastry 1997* “The purpose of the model is… to play the roles of the actors in the system and to trace out the consequences of their actions over time, thus providing a test of the theory by checking whether the assumptions can actually produce [asserted behavior].” *The growth of knowledge: Testing a theory of scientific revolutions with a formal model. Sterman 1985* Lessons from software testing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We normally express statements of behavior as *‘Reference Modes’*, and then manually compare reference modes to behavior. We can learn from explicit behavioral testing in software development: +---------------------------------+------------------------------------+ | Software Development | SD Modeling | +=================================+====================================+ | Make expected behavior of code | Make statements of system behavior | | explicit | explicit | +---------------------------------+------------------------------------+ | Make developing software easier | Help with model formulation | +---------------------------------+------------------------------------+ | Ensure robustness to unknown | Ensure robustness to uncertain | | user input | parameters | +---------------------------------+------------------------------------+ | Support code acceptance by | Create defensible statements of | | client | behavior | +---------------------------------+------------------------------------+ “Testing is an extremely creative and intellectually challenging task.” *The art of Software Testing - Myers et. al 2012* A Software Testing Example -------------------------- .. code:: ipython3 def test_step(self): """ Tests the PySD version of Vensim's `STEP` function """ from pysd import functions # What are we examining? functions.time = lambda: 5 # What are the conditions of our test? self.assertEqual(functions.step(1, 10), 0) # what is expected? functions.time = lambda: 15 # New conditions self.assertEqual(functions.step(1, 10), 1) # New expectation functions.time = lambda: 10 self.assertEqual(functions.step(1, 10), 1) Example: Threat (and Opportunity): Popular Action and State Response in the Dynamics of Contentious Action ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Goldstone and Tilly 2001 |Goldstone Tilly Coding| .. |Goldstone Tilly Coding| image:: Goldstone_Tilly_2001_Coding.png .. code:: ipython3 import pysd model = pysd.read_vensim("Goldstone_Tilly_2001.mdl") .. code:: ipython3 p_ranges = { 'Concession fractional adjustment':(1,2), 'Repression fractional adjustment':(1,2), 'Initial level of current threat':(0,5), 'New advantages A':(0,5), 'Threat rate':(0,2), 'Concession rate':(0,2), 'Probability of success O':(0,1), 'Repression unit cost':(0,5), 'Concession unit cost':(0,5) } # Latin Hypercube Sample of Parameter Space norm_samples = pyDOE.lhs(n=len(p_ranges), samples=500) parameters = pd.DataFrame([{key:n*(p[1]-p[0])+p[0] for n,(key,p) in zip(row, p_ranges.items())} for row in norm_samples]) .. code:: ipython3 parameters.hist(figsize=(12,6), layout=(3,3), histtype='stepfilled', alpha=.5); .. image:: testing_behavior_files/testing_behavior_19_0.png Behavior Over Time Tests ------------------------ Within a single run of the model, assessing if the behavior of the simulation actually follows that which is described by the authors: **Claim:** “[The State] may **swing back and forth** between concessions and repression, trying to find a combination that quells protest…” Pg. 188 **Interpretation:** In some cases, the regime may begin by primarily working to increase either repression or concessions, such that the instantaneous rate of ``Making concessions`` is greater (or less) than that of ``Making threats`` and then at some point in the simulation, **the relative weighting of these two parameters will switch**. .. code:: ipython3 def test(row): output = model.run(row.to_dict(), return_columns=['Making concessions','Making threats']) # is there a preference for making concessions at some point? repr_pref = output['Making concessions'] > output['Making threats'] # is there a preference for making threats at some point? threat_pref = output['Making concessions'] < output['Making threats'] # do both preferences occur at various points? return any(repr_pref) and any(threat_pref) result = parameters.apply(test, axis=1) print(any(result)) .. parsed-literal:: True Parametric Tests ---------------- - Testing comparisons of how the system would respond to various conditions. - Testing model behavior over a range of values for exogenous parameters. **Claim:** Authoritarian states may … rachet up repression **too slowly and insufficiently** to halt mobilization. Pg. 188 **Interpretation:** In some cases, slow implementation of repression may fail to stop protests that would have been successfully repressed through rapid application of the same absolute level of repression. .. code:: ipython3 def test(row): params = row.to_dict() repressive_threshold = np.NaN for threat_rate in np.linspace(*p_ranges['Threat rate'], num=10): res = model.run(params.update({'Threat rate': threat_rate}), return_columns=['Protest', 'Repressive threat Tr']).iloc[0] if res['Protest'] == 0: # protest is quelled if repressive_threshold is np.NaN: # first successful repressive_threshold = res['Repressive threat Tr'] elif (res['Repressive threat Tr'] < repressive_threshold - .0001): return True # higher rate succeeds over lower rate return False result = parameters.apply(test, axis=1) print(any(result)) Conclusions ----------- - The process of formalizing statements of system behavior with respect to time or as a result of varying parameters promotes rigor in claims of observable behavior of the system. - The field of software testing has lessons for how such behavioral tests are constructed and provides a toolset for efficiently doing so. - The approach has benefits for those seeking to create their own theories, and to understand theories promoted in existing literature. If behavioral claims *are* consistent with theory: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Your claims are stated in a defensible way -or- - You have mastery of an important theory in your field Todo: Look for new observable implications of the theory. Behavioral claims *are not* consistent with theory: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - You found a problem with your model before anyone else did -or- - You found a problem with an important theory in your field Todo: Fix the holes. “We must design assessment into our work from the start so we can discover errors more quickly” *All models are wrong: Reflections on becoming a systems scientist. Sterman 2002* “Let every man test his own work. Then he will be proud of his own work.” *Biblical Letter from Paul of Tarsus to the Galatians. c50 AD. ch.6 v.4*