One obvious application of the expression data is to determine the relative similarities of different cell types. There are a number of ways in which this can be done, the most common perhaps being some form of hierarchical clustering based on a distance or correlation measurement. This program provides a slightly different mechanism. First, all-against-all distances between the currently selected samples are calculated using either a direct or a modified Euclidean distance (the modification being a soft thresholding). These distances are then mapped to a two-dimensional field using a simple error minimisation algorithm. The distances are calculated by the server (as the server has the expression data in memory), but the mapping to two dimensions is carried out by the client application. To compare the samples to each other, it is advisable to follow the procedure below.
The experiment compare window lets the user perform two types of comparison. One uses Euclidean distances based directly on the expression data, whereas the other first maps the data to values between 0 and 1 using a sigmoidal function (thus creating a sort of soft thresholding) and is referred to here as the flattened comparison. The second of these seems to result in better groupings of samples, though it is not entirely clear why. It should be emphasised that these methods are quite simple, and that there are many ways of improving the calculated distances. If people are interested in doing this then let me know, and we can discuss the issues that need to be tackled and how to implement it in code (actually quite simple). For the flattened comparison there are two parameters that can be changed: the order of the sigmoid function, and sigma, a value which changes the width of the function (I don't remember the exact details offhand, but this may be described in more detail at a later stage). Press the top 'Compare' button to start a normal comparison and the lower 'Compare' button to start a flattened comparison. A new widget will be inserted into the window when the comparison is finished. This contains controls for the process of mapping the points to a two-dimensional field using the distances obtained in the comparison.
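The two comparison types can be sketched roughly as follows. This is only an illustration and not the server's actual code: the exact sigmoid used by the program is not documented here, so the Hill-type function in flatten(), the way 'order' and 'sigma' enter it, and all function names are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flatten(x, sigma=1.0, order=2):
    """Map a non-negative expression value into the range 0 to 1.

    ASSUMPTION: a Hill-type sigmoid, x^order / (x^order + sigma^order).
    The function used by the program may well be different.
    """
    xp = max(x, 0.0) ** order
    return xp / (xp + sigma ** order)


def flattened_distance(a, b, sigma=1.0, order=2):
    """Euclidean distance after soft thresholding of both profiles."""
    fa = [flatten(x, sigma, order) for x in a]
    fb = [flatten(x, sigma, order) for x in b]
    return euclidean(fa, fb)


def all_against_all(samples, dist):
    """Symmetric matrix of distances between every pair of samples."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(samples[i], samples[j])
    return d
```

With a function of this shape, two strongly expressed values that differ a lot in absolute terms both end up close to 1, so their difference contributes very little to the flattened distance. That may be part of why the flattened comparison groups samples differently from the plain one.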
The bottom of the Compare Experiments window also contains some buttons and other controls labelled 'Trace Experiments'. This is just something that I was playing around with, and in general I would suggest that you ignore it, as it doesn't appear to do anything particularly useful at the moment. These controls are artefacts of some thought experiments. I like to keep them there in case I think of a way of fixing the functions.
There is also a function that allows you to read in distances between things (can be anything) created by external programs (in a specified file format). This button is labelled 'Read Phylip Distances' as I used this a few times to display distances between proteins calculated by the Phylip suite of programs. I don't remember off hand what the requirements for the input file are, but if you're interested let me know or check the source code.
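For orientation, a reader for a simple square distance matrix in the style written by the Phylip programs might look like the sketch below. It assumes a first line giving the number of entries, followed by one line per entry holding a name and then the distances, all separated by whitespace. Whether the program's own reader accepts exactly this layout is not confirmed here, and the function name is made up for the example.

```python
def parse_phylip_distances(text):
    """Parse a square, whitespace-separated Phylip-style distance matrix.

    ASSUMPTIONS: names contain no spaces, and each matrix row sits on a
    single line. Strict Phylip files (fixed-width names, wrapped rows)
    would need more work.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])
    names, rows = [], []
    for ln in lines[1:n + 1]:
        fields = ln.split()
        names.append(fields[0])
        rows.append([float(v) for v in fields[1:]])
    if len(names) != n or any(len(r) != n for r in rows):
        raise ValueError("expected a square matrix with %d rows" % n)
    return names, rows
```

The returned matrix has the same shape as the all-against-all distances from the comparison functions, so it could be handed to the same kind of two-dimensional mapping.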
The comparison functions return a set of all-against-all distances between samples. These need to be displayed in some meaningful manner. This program uses a simple error minimisation algorithm that I've termed 'Self Organising Deltoids' in the absence of a better name (this algorithm may have been described elsewhere, but I haven't seen it; if you have, please let me know so that I can use the proper name). The algorithm works by first assigning random positions to the samples in a 2-D field, then comparing the distances between samples as represented in two dimensions to the distances obtained by comparing the expression patterns. If two samples are too close to each other (i.e. closer than the measurement obtained from the expression data) then a repulsive force is applied to the sample; if they are too far apart, an attractive force is applied. The program calculates all-against-all forces in this manner, and then moves the sample points. This is carried out iteratively until some criterion is met. Currently the program doesn't try to guess when to stop, but rather just runs 500 iterations or so. The resulting forces and movements are displayed in a window that opens when the distances are returned. Samples in this window are represented by blobs (with sample numbers superimposed), and the repulsive and attractive forces that each blob is subjected to are displayed by yellow and blue lines respectively. Naturally this algorithm does not usually find an optimal arrangement, and the blobs are colour coded to indicate the amount of stress they are under (green minimal stress, red maximal stress).
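The general idea of the mapping can be sketched in a few lines. This is a minimal illustration of the kind of iterative force-based placement described above, not the client's actual code: the linear force law, the step size, the per-point stress measure and the function name are all assumptions made for the example.

```python
import math
import random


def map_to_2d(target, iterations=500, step=0.5, seed=0):
    """Place n points in 2-D so that their distances approach target[i][j].

    Returns the final positions and a per-point stress value (the summed
    absolute distance error, which could drive a green-to-red colour code).
    """
    rng = random.Random(seed)
    n = len(target)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    stress = [0.0] * n
    scale = step / n  # assumed damping so that many points do not overshoot
    for _ in range(iterations):
        force = [[0.0, 0.0] for _ in range(n)]
        stress = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d = math.hypot(dx, dy) or 1e-9
                # err > 0: too far apart, so i is pulled towards j.
                # err < 0: too close, so i is pushed away from j.
                err = d - target[i][j]
                force[i][0] += err * dx / d
                force[i][1] += err * dy / d
                stress[i] += abs(err)
        # Move all points only after every force has been calculated.
        for i in range(n):
            pos[i][0] += scale * force[i][0]
            pos[i][1] += scale * force[i][1]
    return pos, stress
```

As a small check, three samples that are all at distance 1 from each other should settle into an equilateral triangle: map_to_2d([[0, 1, 1], [1, 0, 1], [1, 1, 0]]) returns positions whose pairwise distances are very close to 1 and stress values close to 0. With more samples an exact fit in two dimensions is usually impossible, which is why some residual stress remains.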
To start the transformation procedure, press the start button in the 'new widget' shown in the above figure. Let the mapping procedure finish before you try to do other things within the program (the actual mapping procedure runs in a separate thread, but the interactive drawing makes it difficult to use the program for other things during the mapping procedure). The procedure is finished when the blobs stop moving (or at least the forces stop changing). If there are many samples, then it is possible for samples to get caught in inappropriate areas by the repulsion of intermediate samples. If this happens, the caught samples will have very long yellow attractive lines pointing to other samples. The program allows you to drag these to a new position in the map, to view the resulting forces, and then to continue the mapping procedure (press 'Continue'). It is also possible to restart the mapping procedure from a different random seed by pressing the 'Restart' button. Alternatively, for fun, it is possible to replay the mapping procedure by pressing the 'Replay' button. Note that in order to allow the replay, the program stores all the coordinates and forces in memory. For large data sets this can rapidly take up large amounts of memory, and if you are not careful the program can crash or slow to a crawl as a result of running out of memory (the storage is actually only necessary for the replay function, which is more of a fun thing than a serious thing, so this will probably be optional at some point in the future).
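To get a feel for how quickly the replay storage can grow, here is a back-of-the-envelope estimate. It assumes that every iteration keeps one x,y position per sample and one two-component force per ordered pair of samples, each number taking 4 bytes. How the program really stores this is not described here, so treat the figures as an order-of-magnitude guess only.

```python
def replay_memory_bytes(n_samples, iterations=500, bytes_per_number=4):
    """Rough memory needed to keep every frame of the mapping for replay.

    ASSUMPTION: one 2-D position per sample and one 2-D force vector per
    ordered pair of samples are stored for each iteration.
    """
    coords = 2 * n_samples                    # x and y for each sample
    forces = 2 * n_samples * (n_samples - 1)  # 2-D vector per ordered pair
    return iterations * (coords + forces) * bytes_per_number
```

Under these assumptions 100 samples need about 40 MB for 500 iterations, while 1000 samples need about 4 GB, because the number of pairwise forces grows with the square of the number of samples.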
In addition to moving single points around the area, you can also select two groups of samples by drawing a region around those samples. The selected samples will be displayed in a different colour, indicating group membership. At the moment this selection is just an interface element, but the idea is that it should be possible to ask questions of grouped samples (e.g. what is different between this group of samples and that group). This is one of the obvious things to do in the future, but implementing the right interfaces and choosing good statistical functions is not completely trivial.
There is a contextual menu that can be accessed by right-clicking on the drawing area. This menu only has two entries at the moment. The first, 'Compare Cell Types', doesn't do anything. The second, 'Set Coordinates', is related to the 'Toggle Surface Plot' of the raw data expression plot window. I'll leave it to the user to work out how to use these functions (they are in any case more examples of thought experiments than solidly useful functions).