Basic Philosophy
Underlying Premises
The way the programs that make up the eXintegrator system handle and display
microarray data is based on a small number of premises about what is useful and what
is perhaps somewhat less useful.
- The value of looking at the expression of hundreds or thousands of probe sets at a
time is generally overestimated. As such these programs do their best to let the user search
and select relevant probe sets either by expression patterns or by associated annotation.
- Biological relevance is more important than statistical correctness. Most microarray
experiments are essentially fishing expeditions in which the researchers look for genes that
show some differential expression across some samples. Good statistics allow the researcher
to limit the number of alternatives to study further, but it is very rare that the statistics
are considered proof of the fact. Since the stats themselves generally do not end up as
part of the final proof, they don't actually matter that much as long as there are quick ways
to screen interesting from not so interesting genes. The limited expression pattern
of the typical experiment is only part of that screen, hence the next premise.
- Context is king. These programs aim to provide the user with context by integrating the display
of probe set annotation with probe set expression data as well as providing an 'expression'
context by incorporating data from an extended data set.
- Raw data is best. It should always be easy to view the raw data from which expression
values are calculated.
- The more people view the data, the more useful it is. Hence this is a client-server system
which does its best to make casual analyses of the data easy, painless and as fast as possible.
I've always wanted to make people look at the data with fewer preconceptions about what they are
looking for. As such I have not made a big effort to provide statistics that ask very well defined
questions, but rather ones that provide means of dealing with fuzzier approximations (which doesn't
mean there's any fuzzy logic built in; I don't even really know what that means).
- Simplicity is a wonderful thing. Ok, in the beginning it was all kind of simple, but
things change...
This is not to denigrate other means of doing things, merely to give some idea as to
why the programs behave in the manner in which they do.
Basic Operating Mode
The programs allow the user to select and/or order probe sets either by statistical
methods or by database lookups. Selecting a set of probe sets results only in an
index of these probe sets being loaded by the client application. The expression data
for a selected probe set is loaded from the server only when the client requests it.
When this happens, the server sends not only the expression data for that probe set,
but also the annotation that is available for that probe set. This allows the user
to balance the clarity of the expression data with how biologically interesting or useful
the gene represented is. The programs could be said to actively promote decisions on the
basis of 'gut-feeling' rather than statistical correctness.
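
The load-on-demand behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the eXintegrator code: the names ToyServer, ToyClient, ProbeSetRecord, select_by_annotation and fetch are all made up for the example, and the in-memory "server" stands in for the real database-backed one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProbeSetRecord:
    """Expression values and annotation, delivered together (hypothetical layout)."""
    index: int
    expression: List[float]
    annotation: Dict[str, str]


class ToyServer:
    """In-memory stand-in for the server; not the real protocol."""

    def __init__(self, records):
        self._records = {r.index: r for r in records}
        self.requests = 0  # how many times full data has been sent

    def select_by_annotation(self, term):
        # A selection returns probe set indices only, never expression data.
        term = term.lower()
        return sorted(
            i for i, r in self._records.items()
            if any(term in value.lower() for value in r.annotation.values())
        )

    def fetch(self, index):
        # Expression data and annotation travel together, on request.
        self.requests += 1
        return self._records[index]


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.selection = []  # the index of currently selected probe sets

    def select(self, term):
        self.selection = self.server.select_by_annotation(term)
        return self.selection

    def view(self, index):
        return self.server.fetch(index)


server = ToyServer([
    ProbeSetRecord(1, [0.2, 1.5, 3.1], {"gene": "Sox2", "description": "transcription factor"}),
    ProbeSetRecord(2, [1.0, 1.1, 0.9], {"gene": "Actb", "description": "beta actin"}),
    ProbeSetRecord(3, [2.2, 0.4, 0.1], {"gene": "Pax6", "description": "transcription factor"}),
])
client = ToyClient(server)

client.select("transcription")         # only the index [1, 3] reaches the client
requests_after_select = server.requests  # still 0: no expression data sent yet
record = client.view(3)                # now expression and annotation arrive together
```

The point of the split is that a large selection costs little until the user actually looks at an individual probe set.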
Almost all statistical queries work on the currently selected probe sets rather than on
the whole data set available in the database. This makes it easy to select a set of probe
sets on the basis of some annotation, and then to order or select a subset of this by some
statistical means. In addition it is possible to combine past selections using boolean
logic; the combined selection can then serve as the source of the next analysis.
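
As a rough illustration of combining selections and then running a statistic on the result, here is a small Python sketch. It assumes selections can be treated as sets of probe set indices, and it uses variance across samples purely as a stand-in statistic; the data, the variable names and the order_by_variance helper are invented for the example and are not taken from the programs themselves.

```python
from statistics import pvariance

# Hypothetical expression data: probe set index -> values across samples.
expression = {
    1: [0.2, 1.5, 3.1],
    2: [1.0, 1.1, 0.9],
    3: [2.2, 0.4, 0.1],
    4: [0.5, 0.5, 0.6],
}

# Two past selections, held as sets of probe set indices.
by_annotation = {1, 2, 3}
by_earlier_query = {2, 3, 4}

# Boolean combinations of past selections.
both = by_annotation & by_earlier_query        # AND
either = by_annotation | by_earlier_query      # OR
only_first = by_annotation - by_earlier_query  # AND NOT


def order_by_variance(selection, data):
    """Order the given selection (not the whole data set), most variable first."""
    return sorted(selection, key=lambda i: pvariance(data[i]), reverse=True)


# The combined selection is the source of the next analysis.
ranked = order_by_variance(both, expression)
```

Because the statistic only ever sees the current selection, probe set 1 and probe set 4 play no part in the final ordering here.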
To use the programs, users have to be registered with the database, and must
log in with their username and password before gaining
access to the main window of the client application.