knowing knowledge
a blog devoted to exploring the world of research and epistemology
Sunday, September 25, 2011
"yada, yada, yada" (Seinfeld and the Hebrew roots of knowledge)
Friday, September 9, 2011
solomon's judgement (or how the p-value is crazy like a fox)
| | Hypothetical Reality #1: Drug A = Drug B | Hypothetical Reality #2: Drug A ≠ Drug B |
| --- | --- | --- |
| Hypothetical Result #1: Drug A ≠ Drug B | False Positive (this is termed the "level of significance" and is synonymous with a Type I error: observing a difference when no difference truly exists) | True Positive (this is termed "power" because it is the power to find a true difference) |
| Hypothetical Result #2: Drug A = Drug B | True Negative (this is termed the "level of confidence") | False Negative (this is the definition of a Type II error: failing to find a difference when one truly exists) |
Note that this is all hypothetical. The researcher sets the level of significance prior to the experiment; typically the level of significance is set at 0.05. From the data the researcher then computes a test statistic, from which a p-value is derived. Any analysis that yields a p-value of less than 0.05 is considered significant and adds to the evidence that a true difference exists between groups. The p-value can only be obtained by assuming Hypothetical Reality #1: that in truth there is no difference between groups who have received different treatments.

In other words, if two groups are treated with different diuretic drugs for high blood pressure, the researcher has a hunch that one drug is better than the other. To test this hunch he either must have access to the truth (which is not available in this kind of research design) or he must assume that there is no difference between the drugs and then test the observed difference against that assumption of no difference. So if he observes that his drug lowers blood pressure 10 points in the experimental group and the other drug lowers blood pressure only 5 points in the control group, then the difference between the two drugs would be five (that is: 10 − 5 = 5). Assuming that there truly is no difference (Hypothetical Reality #1 in the above contingency table), he then subtracts the assumed difference (zero) from the observed difference (five) and divides this number by the standard error of the difference between the two group means. Remember, standard error is a term used to reflect the standard deviation of the experimental/control differences around the true mean difference if the experiment were repeated with random sampling from the population a large number of times.
If the observed difference (5 in our example) is much larger than the standard error (let's say the standard error is 1 in our example), then the test statistic will be large. The statistic can then be used to calculate the probability of observing a difference of 5, or one even more extreme, when the hypothetical reality of no difference is true. This is done by locating the test statistic on the normal bell curve and finding the area under the curve in the tail beyond that value; since the entire area under the curve equals 1, that tail area is itself a probability. The result is the probability of finding the observed difference between the two groups assuming that in truth there is no difference.
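The arithmetic above can be sketched in a few lines of Python using only the standard library. The numbers (a difference of 5, a standard error of 1) come from the example in the text; treating the test statistic as normally distributed is a simplifying assumption for illustration.

```python
from statistics import NormalDist

observed_diff = 10 - 5  # drug A lowers BP 10 points, the other drug 5 points
assumed_diff = 0        # the assumed difference under "no true difference"
std_error = 1           # hypothetical standard error of the difference

# test statistic: (observed - assumed) / standard error
z = (observed_diff - assumed_diff) / std_error

# p-value: the tail area beyond the statistic on the normal curve, doubled
# to count extreme differences in either direction (a two-sided test)
p_two_sided = 2 * (1 - NormalDist().cdf(z))
print(z, p_two_sided)
```

With a statistic of 5 standard errors, the tail area is vanishingly small (far below 0.05), so the observed difference would count as highly significant.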
Sunday, August 14, 2011
martial arts, global warming, and chance.
Saturday, August 13, 2011
descriptive stats revisited
The spread of the data indicates to what degree all the data points vary away from the average. Standard deviation is a number that indicates the average distance of all data points from the average. A higher standard deviation indicates that the data spread further on either side of the average. See the lines on either side of the central average line on the graph below? The first line on either side of the average line indicates a point that is 1 standard deviation from the mean. The next line indicates a point that is 2 standard deviations away from the mean. Particular to the normal bell curve is the reality that about 68% of all data will fall within one standard deviation of the mean and about 95% within two standard deviations of the mean.
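This 68/95 rule is easy to check empirically. The sketch below draws 10,000 values from a normal distribution (the mean of 100 and standard deviation of 15 are arbitrary, chosen only for illustration) and counts how many land within one and two standard deviations of the average.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable
data = [random.gauss(100, 15) for _ in range(10_000)]

m, s = mean(data), stdev(data)

# fraction of points within 1 and 2 standard deviations of the mean
within_1sd = sum(abs(x - m) <= 1 * s for x in data) / len(data)
within_2sd = sum(abs(x - m) <= 2 * s for x in data) / len(data)
print(within_1sd, within_2sd)  # roughly 0.68 and 0.95
```

The counts come out near 68% and 95%, as the bell curve predicts; they are not exact because the sample is finite.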
The shape indicates whether the bell curve has a long tail to the left [negative skew], a long tail to the right [positive skew], a tall, narrow curve [leptokurtic], or a flat curve [platykurtic]. Any distortion of the bell curve impacts statistical calculations, as basic statistics assume a normal bell curve. See below for depictions of skewed data.
Thursday, August 4, 2011
redlegs and descriptive stats
Saturday, July 30, 2011
ents
I believe that natural human knowledge is capable of more than a mere approximation of reality. I believe that natural human knowledge can achieve a one-to-one relationship with reality; that is to say that theories of reality (i.e.: how things are) can represent specified slices of reality with perfect accuracy (e.g.: Copernicus and heliocentrism; Newton and gravitational forces, etc.). However, I also believe that the vast majority of natural knowledge produced in a given calendar year is specious at worst (i.e.: false knowledge) and an approximation of reality at best (i.e.: analogous knowledge). Here I refer not to the sort of assertions that pass for knowledge among the popular pundits; rather, I am referring to critically tested, peer-reviewed scientific knowledge. It is this high level of knowledge that ranges from false to analogous. To characterize this range of knowledge quality I have generated a list of four (4) descriptors, beginning with the most basic and contingent sort of knowledge and progressing up to true knowledge or "fact" (as science would have us call it).
Ronald Fisher: what's not to like about this statistician? The very incarnation of an Ent.
Possible. Definition: that can be; capable of existing. For example, an oval is quite possible; an oblong shape is quite possible. However, a square circle could never be a circle (i.e.: essentialistically impossible). An MRI machine that produces only rare cheese could never be an MRI machine (i.e.: nominalistically impossible). Certain things are impossible. Importantly, though, many possible things are not true. For instance, although it is possible for a live cow to go over the moon, if I claim to know that a live cow actually went over the moon last night, this would be a false knowledge claim. Although it is possible for the sun to rise in the West and set in the East, this is not the truth of what happens on planet Earth. That which is conceivably possible is not necessarily existentially so. Holding this true, there are three possible combinations of possibility and existence: existent possibility, nonexistent possibility, and nonexistent impossibility. There is no such thing as an existent impossibility.
Karl Pearson: note with what alacrity he wields his pen in noble service of statistics.
Probable. Definition: likely to occur or to be so; that can reasonably be expected or believed on the basis of the available evidence, though not proved or certain. For example, a 2002 study published in Spine journal deduced a clinical prediction rule to identify which patients with low back pain are most likely to be successfully treated with spinal manipulation. The study deduced these 5 criteria for the prediction rule: 1) pain onset less than 16 days prior to treatment; 2) no symptoms distal to the knee; 3) Fear-Avoidance Beliefs Questionnaire score less than or equal to 19; 4) one or more hypomobile segments in the lumbar spine; 5) at least one hip with more than 35 degrees of internal rotation motion. A patient must exhibit at least 4 of the above 5 traits to be considered positive on the rule. A subsequent randomized controlled trial published in 2004 compared patients who satisfy this rule with those who do not. It found that patients positive on the rule and receiving spinal manipulation had an adjusted odds ratio for successful treatment of 60.8 when compared to those negative on the rule and receiving exercise. That is to say, the odds of a successful outcome for a person who satisfies the prediction rule and is treated with spinal manipulation are 60.8 times greater than for someone negative on the rule and receiving exercise. Although odds ratios are not probabilities, they approximate relative risks closely when baseline risks are low. Consequently we can say that a person with low back pain who satisfies the prediction rule is very likely to gain significant function after only 2 treatments of spinal manipulation, and this functional gain is likely to last for 6 months. But this is a probability. This means that a small number of people who are positive on the rule and receive spinal manipulation will feel no improvement, or worsening symptoms, after treatment. So this is a practical example of probabilistic knowledge.
The above studies did not elucidate fact, they elucidated pragmatic probabilities to help clinicians, insurance companies and patients make a decision about treatment options.
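To see how an odds ratio relates to an ordinary probability, here is a minimal sketch. The 60.8 odds ratio is from the study cited above; the 10% success rate assumed for the comparison group is purely hypothetical, chosen only to make the conversion concrete.

```python
def odds(p):
    """Convert a probability to odds (e.g. 0.10 -> 1-to-9, i.e. 0.111...)."""
    return p / (1 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

odds_ratio = 60.8        # adjusted odds ratio reported in the trial
p_comparison = 0.10      # ASSUMED success rate in the comparison group

# multiply the comparison group's odds by the odds ratio, then convert back
o_treated = odds(p_comparison) * odds_ratio
p_treated = prob(o_treated)
print(round(p_treated, 2))  # -> 0.87
```

Note how the odds ratio of 60.8 does not mean an 87% chance becomes "60.8 times more probable" than 10%; it is the odds, not the probabilities, that scale by the ratio.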
Valid. Definition: sound; well grounded on principles or evidence; able to withstand criticism or objection, as an argument. The move from probable knowledge up to valid knowledge is like passing over the Natural Light beer for Kentucky Bourbon Barrel Ale. Like declining the Hostess Snack Cake to save room for Graeter's ice cream. Like operating on the spine with a high-precision Medtronic drill instead of a high-speed DeWalt house drill. Valid knowledge within medical diagnostics is produced when a new test for diagnosis is measured against a gold standard. Assessing the new test (let's say for detecting prostate cancer) against the gold standard can result in four possible outcomes: true positive, false positive, true negative, and false negative. The box below illustrates this well:
| | Gold Standard Negative (no prostate cancer present) | Gold Standard Positive (prostate cancer present) |
| --- | --- | --- |
| New Test Positive | False Positive (invalid test result) | True Positive (valid test result) |
| New Test Negative | True Negative (valid test result) | False Negative (invalid test result) |
A variety of statistics are available to describe these results; however, I will spare the reader the details of these at this time. Let the reader note, though, that validity does not measure the probability of departure from truth but rather the reality of departure from truth. In this way validity is not a guess at the truth but a true measure of the truth. The greatest mistake in popular conceptions of research occurs when studies analyzed by probabilistic statistics (i.e.: p-values and confidence intervals) are interpreted as if they utilize validity statistics (i.e.: likelihood ratios). In this manner, many studies are presented as valid when they ought truthfully to be presented as some grade (e.g.: low, moderate, high grade) of probabilistic evidence for or against a hypothesis.
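For the curious reader, two of the validity statistics mentioned above can be sketched directly from the 2x2 box. The counts below are hypothetical, invented only to show the arithmetic; they are not from any real prostate cancer test.

```python
# Hypothetical counts from comparing a new test against the gold standard
tp, fp, tn, fn = 90, 20, 180, 10

# sensitivity: proportion of gold-standard positives the new test catches
sensitivity = tp / (tp + fn)
# specificity: proportion of gold-standard negatives the new test clears
specificity = tn / (tn + fp)

# likelihood ratios: how much a positive (or negative) result shifts the odds
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

print(sensitivity, specificity, round(lr_positive, 1), round(lr_negative, 2))
```

Unlike a p-value, these numbers describe how far the new test actually departs from the gold standard, which is the sense of "validity" used in this post.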
Thomas Bayes: the right Reverend had a penchant for more than divine truth.
Veritable. Definition: true; real; actual. I am taking the name of this descriptor from the Latin word veritas: truth. How does the descriptor "veritable" differ from the descriptor "valid"? Knowledge that is veritable is perfectly true (1-for-1 correspondence with reality or logic) whereas knowledge that is described by validity is expressed as a degree of departure from truth. Given the above diagnostic example: the new prostate cancer test would be characterized as being valid if it deviates from the gold standard to only a small degree and invalid if it deviates from the gold standard to a large degree. However, the gold standard test is veritable: a container of 100% true knowledge, finding prostate cancer whenever it exists and ruling out prostate cancer whenever it does not exist. Veritable knowledge is not simply produced by a mathematical formula, but is rather arrived at through years of practice, research (statistically analyzed), critical review (e.g.: statistical assessment, construct assessment, etc.), technical reformulation, and even large scale disciplinary enactment (i.e.: use within the guild at large).
Jerzy Neyman: a fierce opponent to Fisher; his calculations were as killer as his Hitler-ish look.
One final cautionary note is due here. Having moved through this post and engaged with the proposed hierarchical categories of knowledge (from low-quality "possible" knowledge up to highest-quality "veritable" knowledge), the reader might be tempted to scorn knowledge that falls beneath "veritable" in my hierarchical scheme. The purpose of these hierarchies is not to cast aspersions upon "inferior" knowledge, but rather to appropriately characterize the levels of knowledge and their relation to truth. This has very pragmatic implications. Possible knowledge should be utilized with extreme caution and proposed lightly, with a ready willingness to believe alternative possibilities; probable knowledge should be utilized with discrimination and defended publicly with openness to evidence to the contrary; valid knowledge should be ubiquitously employed and publicly defended with great vigor and energy; veritable knowledge should be published and re-published with regular frequency, as well as proclaimed with certainty to winsomely disabuse falsehoods wherever they appear.
Next post will be devoted to a description of basic statistics and some of their mathematical formulations.