Manufacturing Defects Synthetic Data
====================================

In this notebook we generate synthetic data representing measurements of
defects in a manufacturing setting.

.. code:: ipython3

    import numpy as np
    import pandas as pd

.. code:: ipython3

    # Generate synthetic data: two factors and a noise-free outcome per sample
    Factors = []
    Outcome = []
    numpoints = 2000
    for workday, time_per_task in zip(np.random.normal(loc=.3, scale=.05, size=numpoints),
                                      np.random.normal(loc=.05, scale=.01, size=numpoints)):
        Factors.append([workday, time_per_task])
        # The first term has coefficient 0, so it does not affect the outcome
        Outcome.append(0*workday**2/(time_per_task**2) + 1/time_per_task**1.5 + 1000*workday**1.5)

.. code:: ipython3

    data = pd.DataFrame(Factors, columns=['Workday', 'Time per Task'])
    data['Defect Rate'] = Outcome
    # Rescale so the largest noise-free defect rate is 0.1
    data['Defect Rate'] /= data['Defect Rate'].max()*10
    # Add Gaussian measurement noise
    data['Defect Rate'] += np.random.normal(scale=.003, size=len(data['Defect Rate']))
    data.head()

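+---+----------+---------------+-------------+
|   | Workday  | Time per Task | Defect Rate |
+===+==========+===============+=============+
| 0 | 0.303114 | 0.060810      | 0.023022    |
+---+----------+---------------+-------------+
| 1 | 0.263133 | 0.052325      | 0.023017    |
+---+----------+---------------+-------------+
| 2 | 0.230397 | 0.065387      | 0.015868    |
+---+----------+---------------+-------------+
| 3 | 0.265632 | 0.044866      | 0.032806    |
+---+----------+---------------+-------------+
| 4 | 0.298651 | 0.038648      | 0.035234    |
+---+----------+---------------+-------------+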
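
The generation step above draws a fresh random sample on every run, so the numbers in the table will not reproduce exactly. The following is a minimal sketch of the same recipe written without the explicit loop and with a fixed seed. The helper name ``make_defect_data``, its ``seed`` parameter, and the use of NumPy's ``default_rng`` generator are assumptions for illustration and are not part of the original notebook. The zero-weighted term is left out because it contributes nothing to the outcome.

```python
import numpy as np
import pandas as pd


def make_defect_data(numpoints=2000, seed=0):
    """Vectorized sketch of the synthetic defect data (hypothetical helper)."""
    # The seed is an assumption; the original notebook does not set one.
    rng = np.random.default_rng(seed)

    # Draw both factors as arrays instead of looping over pairs.
    workday = rng.normal(loc=0.3, scale=0.05, size=numpoints)
    time_per_task = rng.normal(loc=0.05, scale=0.01, size=numpoints)

    # Same functional form as the notebook, minus the zero-weighted term.
    outcome = 1 / time_per_task**1.5 + 1000 * workday**1.5

    data = pd.DataFrame({'Workday': workday, 'Time per Task': time_per_task})
    data['Defect Rate'] = outcome
    # Rescale so the largest noise-free defect rate is 0.1.
    data['Defect Rate'] /= data['Defect Rate'].max() * 10
    # Add Gaussian measurement noise.
    data['Defect Rate'] += rng.normal(scale=0.003, size=numpoints)
    return data


data = make_defect_data()
print(data.head())
```

Because this sketch uses a different random stream than ``np.random.normal``, its values will differ from the table above, but repeated calls with the same seed return identical frames.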