
    As I promised yesterday, this time we will try to skew our figure, as if it were in 3D and we were looking at it from an angle. I must admit that this is not something one would put in a publication, but on a poster or in a presentation it might lend a refreshing new, errr.., perspective to our data. Before diving into the script, here is the figure, so you can decide whether you want to read on:)



    Then, here is our script that was responsible for the figure
    reset
    xmin = 0; xmax = 10; zmin = -0.4; zmax = 0.75
    set view 60, 30
    unset key; unset colorbox
    set border 1+16+128+1024
    unset ytics; set xtics out nomirror; set ticslevel 0
    set yrange [0:-0.1]
    set zrange [zmin:zmax]
    set grid front; unset grid
    set xtics xmin,2,xmax-1
    set ztics zmin,0.2,zmax
    f(x) = exp(-x/4.0)*sin(x)
    c(x) = exp(-(x-xmax/3.0)*(x-xmax/3.0)/1.0)
    set xlabel 'Time [a.u.]'
    set label 2 'Amplitude [a.u.]' at graph -0.35, 0.3 rotate by 90
    set parametric
    set iso 3, 3
    set urange [xmin:xmax]
    set vrange [zmin:zmax]
    set table 'perspective1.dat'
    splot u, 0, v
    unset table
    set urange [xmin+0.2:xmax-0.2]
    set table 'perspective2.dat'
    splot u, 0, f(u)
    unset table
    unset parametric
    set size 1.4, 1.4
    set palette defined (0 "#7b6cff", 1 "#eeeeff")
    splot 'perspective1.dat' u 1:2:3:(c($1)) w pm3d, \
    'perspective2.dat' u ($1+0.05):2:($3-0.02) w l lw 6 lc rgb "#888888", \
    '' w l lw 6 lt 1


    The trick that we use here is to plot in 3D, and take off all those elements of the figure that we do not actually need. In the beginning, we define a couple of variables, in order to make our life a bit easier. Then we unset the colorbox and the key, and set only those borders that we want to see. If you are interested in how I came up with those numbers in the definition of the border, just issue the command
    ?border

    in the gnuplot prompt. After this, we keep unsetting things, and define our ranges. The next noteworthy command is
    set grid front; unset grid

    At first, this seems silly, but the reason is real: in the figure we have a background, which would obscure our tic marks. The way out of this problem is to push the tic marks to the front, which is achieved by setting the grid to the front. Since we do not actually need the grid, we unset it, but the setting, its front position, is still there. The tic marks inherit the position of the grid, and thus, they will be in the forefront.

    The function definitions are f(x), the function that we want to plot, and c(x), which will determine the colouring of our background. Modify them accordingly.

    We set the zlabel by hand, because it might not be possible to turn it by 90 degrees otherwise (not all terminals support it). Then we plot the background and the function, both to a file, so that it will be easier to colour them in the next step, where we plot the real thing. Note that in the plot of the background we use four columns, while there are only three in the data file. The fourth one determines the colour as a function of the position on the graph. The colour is given by the function c(x) and the palette, which we defined one line earlier. You should change these two things if you are not satisfied with the background you get. Finally, we plot the function twice: once in gray, shifted a bit to the right and down, and a second time in red. In this way, we add a shadow to our curve. If you want to improve the shadow, you should look at my post from the 30th of August.

    I should also add that we discussed a vertical skew. In case you want to skew the figure horizontally, all you have got to do is to plot on the x-y plane instead of x-z.
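
    As for the border value used in the script above, it can be read as a sum of bit flags. The sketch below spells out the four bits; the comments paraphrase the splot column of the table shown by "help border", so it is worth checking them against your own gnuplot version.

```gnuplot
# Border bits used in the script (splot meanings, paraphrased from "help border")
#    1   bottom left front edge
#   16   left vertical edge
#  128   front vertical edge
# 1024   top left front edge
set border 1 + 16 + 128 + 1024    # same as: set border 1169
```

    Together these four edges frame the single face of the box on which the curve is drawn, which is why nothing else of the 3D box shows up in the figure.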


  • 10/13/09--13:12: The shiny histograms, again
  • Do you remember those shiny histograms that we discussed some long, long time ago, at the beginning of the summer? And do you also remember how much of a hassle it was to create them, since we relied on an external gawk script, and we had to build our rectangles one by one? And have you ever wondered what all that fuss was about and whether there was an easier solution? A one-liner, perhaps? If the answer to these questions is in the affirmative, go no further! We will discuss a method of making those histograms without an external script, only with legal gnuplot commands, and in 5 lines. I understand that 5 lines is 4 lines longer than one would expect from a one-liner, but on the other hand, three out of those 5 lines are equivalent, so I don't feel so bad about this any more:)
    OK, so here is our figure


    here is our data file, which we will call 'hist.dat'
    1 2 3
    2 2 2
    4 10 3
    5 1 4
    5 6 2
    and here are our scripts, 'hist.gnu',
    reset
    unset key; set xtics nomirror; set ytics nomirror; set border front;
    div=1.1; bw = 0.9; h=1.0; BW=0.9; wd=10; LIMIT=255-wd; white = 0
    red = "#080000"; green = "#000800"; blue = "#000008"
    set auto x
    set yrange [0:11]
    set style data histogram
    set style histogram cluster gap 1
    set style fill solid
    set boxwidth bw
    set multiplot
    plot 'hist.dat' u 1 lc rgb red, '' u 2 lc rgb green, '' u 3 lc rgb blue
    unset border; set xtics format ""; set ytics format ""; set ylabel ""
    call 'hist_r.gnu'
    unset multiplot
    and 'hist_r.gnu'
    # white*pi is real-valued, so the ratio is not truncated by integer division
    bw=BW*cos(white*pi/(2.0*LIMIT)); set boxwidth bw; white=white+wd
    red = sprintf("#%02X%02X%02X", 128+white/2, white, white);
    green = sprintf("#%02X%02X%02X", white, 128+white/2, white);
    blue = sprintf("#%02X%02X%02X", white, white, 128+white/2);
    rep
    if(white<LIMIT) reread

    Then let us see what is happening here! At the beginning, we define various variables, most notably, white, red, green, and blue. The rest up to the first plot command is nothing but setting up the figure: we define the range, tell gnuplot to treat our data as histogram, set the width of the bars, and finally, set multiplot.
    There is nothing exciting in the first plot, except, that we specify the colour of the bars as

    plot 'hist.dat' u 1 lc rgb red, '' u 2 lc rgb green, '' u 3 lc rgb blue
    The strings red, green, and blue were defined at the beginning of our first script, thus, we learn here that gnuplot will accept any defined (and valid) string as the specifier of the colour. After our first plot, we unset the border, re-set the format of the xtic and ytic to empty, and do likewise with the ylabel. Should there be an xlabel, we should have to do the same there. Having done this, we call our second script, which we will dissect now. This is really nothing but a 'for' loop, that we have discussed a couple of times before. In fact, quite a few times. The first two commands re-set the widths of the bars in the next plot. Note that I set the width in such a way that it would draw the outline of a circle as we step through the values of white. This is what we increment next, mind you!

    The next three lines are basically identical: we re-define the colours, using a sprintf command in each step. If you recall how the RGB colours are defined, we have to create a string that looks like
    #00FF00
    say. This is what our sprintf command will do, returning a string of this form that depends on the value of 'white'. There is a small nuance in the leading colour channel of red, green, and blue, respectively: the base colour for red was set to
    #080000
    at the beginning, while the sprintf expression starts that channel at 128 (i.e., #800000 for white = 0), and we want to interpolate linearly between this colour and white,
    #FFFFFF
    so we have to apply the relevant linear function, but there is nothing beyond this. Obviously, if you are unhappy with the colour scheme that I have (I know full well that these are not the best colours...), this is the place where you would have to tamper with the script. When we are done with re-defining the widths and the colours, we simply replot our histogram, and keep doing that as long as the value of white is smaller than the limit that we set at the beginning. In this particular case, that limit is 245. In most cases, we do not need this many plots, by the way! For a raster plot, 10-12 steps, and for a vector format, something like 15-16 steps should be more than enough.
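
    To see what the sprintf expression produces, one can wrap it in a function and print a few values. This is only a sketch: red_of() is a hypothetical helper name that does not appear in the scripts above, and the argument is assumed to be an integer, as 'white' is in hist_r.gnu.

```gnuplot
# Hypothetical helper: the red expression from hist_r.gnu as a function of white
red_of(w) = sprintf("#%02X%02X%02X", 128 + w/2, w, w)
print red_of(0)      # #800000
print red_of(100)    # #B26464
print red_of(250)    # #FDFAFA
```

    The leading channel climbs from 128 towards 255 at half the rate of the other two, so all three channels approach white together.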

    At the end, we shouldn't forget about unsetting the multiplot. The only difficulty that I see with this figure is that it is not so straightforward to use a key. However, it is not terribly hard to come up with a solution for this problem: all we have to do is to put three vertical labels on the top of the first, second, and third column, indicating what they represent.


    A couple of days ago, Cedric asked how we could add a shadow to a pm3d surface plot. In some sense, the problem turned out to be easier than I had expected, but I am not sure that I did what he actually meant. Anyway, we could use it as our working hypothesis, and refine it, if necessary.

    The idea of putting "phong" on the surface was discussed ages ago, and I won't re-open that question here. For the shadow, we will just replot our surface (defined as z(x,y)) in a little bit strange way: instead of letting the x and y run through their corresponding ranges, we will restrict y to be equal to the maximum of the yrange. By doing so, we get this

    Here is our (short) script:
    reset
    unset colorbox; unset key
    set iso 2, 50
    set parametric; set urange [0:1]; set vrange [0:1.2]
    set table 'tb.tab'
    splot u, v, 1
    unset table
    unset parametric
    set iso 100, 100
    set xrange [-3:3]
    set yrange [-3:3]
    set zrange [0:1.2]
    set table 't.tab'
    splot exp(-x*x-y*y)+0.1*rand(0)
    unset table
    set ticslevel 0
    set pm3d
    set cbrange [0:4]
    f(x,y,z,a,b,s) = z*(exp(-(x-a)*(x-a)/s-(y-b)*(y-b)/s)/3.0+0.66)
    set palette defined (0 "#ff2222", 1 "#ffeeee", 2 "#aaaaaa", 3 "#2222ff", 4 "#8888ff")
    splot 'tb.tab' u ($1*6.0-3.0):(3):2:($2+3.0) w pm3d, \
    '' u (-3):($1*6.0-3.0):2:($2+3.0) w pm3d, \
    '' u ($1*6.0-3.0):($2*6.0-3.0):(0):(2.1) w pm3d, \
    't.tab' u ($1+1.0):(3):($3*0.7):(2.1) w pm3d,\
    '' u 1:2:3:(f($1,$2,$3,-0.5,-0.5,.8)) w pm3d
    First, we create the data that will be our background, then some dummy data, which in this particular case is a noisy Gaussian function in 2D. Then we move the surface to the bottom of the zrange by setting the ticslevel to 0. The next step is the definition of the cbrange. We need to "overdefine" this, i.e., our cbrange is much larger than the actual data range. The reason is that in this way we can use a single colour palette, and we do not have to resort to multiplot. The basic problem with multiplot is that, since we want to plot onto the same graph, we would have to re-set the border, tics, labels, and so on. By using the same plot and the same palette, we can avoid all this hassle. The price we pay is that our palette will be a bit more complicated, but we can live with that. Now we have to define our palette, which changes between red and almost white on [0:1], and between blue and a lighter blue on [3:4]. We also defined a value at 2, which we will use for colouring the shadow, but the two main ranges are [0:1] and [3:4]. Note that we define disjoint ranges, so that we do not run into trouble at the end points.
    Finally, we plot our background, and function. Pay attention to plots like
    splot 'tb.tab' u ($1*6.0-3.0):(3):2:($2+3.0) w pm3d
    This will plot a plane spanning x [-3:3] and z [0:1.2] at y=3, with a colour given by the value of ($2+3.0). This is nothing but pushing all z values in the plot into the colour range that starts at 3.

    At the very end, we plot our function's shadow, namely,
    't.tab' u ($1+1.0):(3):($3*0.7):(2.1) w pm3d
    where the x values are shifted by 1.0 (so as to give the impression that the surface is lit from the (-1:-1) direction), the y values are all restricted to 3, which is the maximum of the yrange, and the z values are multiplied by 0.7, again for the same reason. Colouring is done by using one single value, 2.1. The very last step is to plot the surface itself, using the colouring given by f(...). If you want to have another direction for the lighting, or have a tighter focus, you should change the parameters in this function.

  • 11/09/09--12:21: Patching gnuplot
  • One of the major advantages of open-source code is that if you would like to add some new features, you can easily do that. This applies to gnuplot, too; in fact, doing that does not require anything special. I have been quite inactive on this blog recently, and the reason is that Philipp Janert and I have been working on a patch to gnuplot.

    The steps of patching gnuplot are described on gnuplot's main web page. There are a number of patches uploaded to gnuplot's patch tracker, on which quite a few new features, still in the development phase, are published. It is really worthwhile to try them out, first, to provide feedback as to what is useful and what is not, and second, to help the developers to find bugs and other glitches, like what the syntax of a command should be and so on.

    Our patch is related to an old debate as to what gnuplot really is. In many places, you will find the statement that "gnuplot is a plotting utility, not a statistical analysis package". I have nothing against this statement; however, when saying so, we have to tell what we mean by plotting. So, is plotting just placing a thousand dots at positions that represent our data? Or do we want more? E.g., throwing out data points that are unreasonably far from the mean. Or showing the mean and the standard deviation? Or calling the reader's attention to some special points, like the minimum or the maximum in a data set? And many similar things. I believe plotting requires much more than just showing the measurement data: a plot makes sense only if we can point out what is to be pointed out. By the way, fitting falls into this category, and fitting has been an integral part of gnuplot for ages. The point being that the original statement (gnuplot is a plotting utility, not a statistical analysis package) has been wrong for a long time.

    The patch that I mentioned above was announced yesterday on the gnuplot development mailing list, and you can find the patch for the source and the documentation on the patch tracker. I have put a couple of examples on my gnuplot web site under patch. You can also find the full documentation.

    I would like to ask you, if you feel crafty and you can, to download the patch, try it, and let us know whether you find it useful, what else you think we could do with it, and so on. It would really help the development. Once the patch makes it to the main code, I will discuss various options on these pages.

    Just to whet your appetite, here is a figure that you could very easily make with the new patch. (You can find the code on my web site.)



    Many cheers,
    Zoltán



    In my previous post, I mentioned a patch that you can compile into gnuplot, and that should make plots with some statistical properties a bit easier. Now, the problem with that patch is that, if you don't want to, or can't take the trouble of compiling gnuplot for yourself, it is no use. However, for most things contained in the patch, there is a work-around that should function properly in gnuplot 4.2. I will discuss those now.

    The first thing I did with the statistical patch was to plot the mean, minimum and maximum of a data set. This can easily be done in the following way.
    reset
    # Produce some dummy data
    set sample 200
    set table 'stats2.dat'
    plot [0:10] 0.5+rand(0)
    unset table

    set yrange [0:2]
    unset key

    # Retrieve statistical properties
    plot 'stats2.dat' u 1:2
    min_y = GPVAL_DATA_Y_MIN
    max_y = GPVAL_DATA_Y_MAX

    f(x) = mean_y
    fit f(x) 'stats2.dat' u 1:2 via mean_y

    # Plotting the minimum and maximum ranges with a shaded background
    set label 1 gprintf("Minimum = %g", min_y) at 2, min_y-0.2
    set label 2 gprintf("Maximum = %g", max_y) at 2, max_y+0.2
    set label 3 gprintf("Mean = %g", mean_y) at 2, max_y+0.35
    plot min_y with filledcurves y1=mean_y lt 1 lc rgb "#bbbbdd", \
    max_y with filledcurves y1=mean_y lt 1 lc rgb "#bbddbb", \
    'stats2.dat' u 1:2 w p pt 7 lt 1 ps 1

    At the beginning of our script, we just produce some dummy data, and call a dummy plot. This plot does nothing but fill in the values of the minimum and maximum of the data set. Then we fit a constant function. You can convince yourself that this returns the average of the data set.

    In the plotting section, we produce three labels that tell us something about the data set, and plot the data range with shaded region. Easy enough, and in just a couple of lines, we created this figure


    Now, what should we do if we want to calculate the standard deviation? Well, we know how to calculate the average, so we will use that. Here is the script:

    reset
    set sample 200
    set table 'stats2.dat'
    plot [0:10] 0.5+rand(0)
    unset table

    set yrange [0:2]
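    unset key
    # Dummy plot: retrieves the minimum, which is used to position the labels
    plot 'stats2.dat' u 1:2
    min_y = GPVAL_DATA_Y_MIN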
    f(x) = mean_y
    fit f(x) 'stats2.dat' u 1:2 via mean_y

    stddev_y = sqrt(FIT_WSSR / (FIT_NDF + 1 ))

    # Plotting the range of standard deviation with a shaded background
    set label 1 gprintf("Mean = %g", mean_y) at 2, min_y-0.2
    set label 2 gprintf("Standard deviation = %g", stddev_y) at 2, min_y-0.35
    plot mean_y-stddev_y with filledcurves y1=mean_y lt 1 lc rgb "#bbbbdd", \
    mean_y+stddev_y with filledcurves y1=mean_y lt 1 lc rgb "#bbbbdd", \
    mean_y w l lt 3, 'stats2.dat' u 1:2 w p pt 7 lt 1 ps 1

    What we utilise here is the fact that the fit command also sets a couple of variables. One of them is the sum of the squared residuals, which is called FIT_WSSR, while another is the number of degrees of freedom, FIT_NDF. Since we fit a function with a single parameter, the number of degrees of freedom is one less than the number of data points. Therefore, if we take the square root of the sum of squared residuals divided by the number of degrees of freedom plus one, we get the standard deviation. The rest of the plot is trivial, and this script results in the following graph:
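
    If you have a newer gnuplot at hand, the two numbers can be cross-checked against the built-in stats command. This is a sketch under two assumptions: that your version is 4.6 or later (stats does not exist in 4.2), and that STATS_stddev is normalised by the number of points, as the expression above is. Run it after the script, so that mean_y and stddev_y are defined.

```gnuplot
# Assumes gnuplot 4.6 or newer; run after the fit above
stats 'stats2.dat' using 2 nooutput
print mean_y,   STATS_mean      # fitted constant vs. arithmetic mean
print stddev_y, STATS_stddev    # sqrt(FIT_WSSR/N) vs. the value reported by stats
```

    Each pair should agree closely; small differences in the last digits only reflect the convergence limit of the fit.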

    Incidentally, this can also be used for removing points that are very far from the mean. The following script takes out those data that are more than one standard deviation away from the mean.
    reset
    set sample 200
    set table 'stats2.dat'
    plot [0:10] 0.5+rand(0)
    unset table

    set yrange [0:2]
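    unset key
    # Dummy plot: retrieves the minimum, which is used to position the labels
    plot 'stats2.dat' u 1:2
    min_y = GPVAL_DATA_Y_MIN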
    f(x) = mean_y
    fit f(x) 'stats2.dat' u 1:2 via mean_y

    stddev_y = sqrt(FIT_WSSR / (FIT_NDF + 1 ))

    # Removing points based on the standard deviation
    set label 1 gprintf("Mean = %g", mean_y) at 2, min_y-0.15
    set label 2 gprintf("Sigma = %g", stddev_y) at 2, min_y-0.3
    plot mean_y w l lt 3, mean_y+stddev_y w l lt 3, mean_y-stddev_y w l lt 3, \
    'stats2.dat' u 1:(abs($2-mean_y) < stddev_y ? $2 : 1/0) w p pt 7 lt 1 ps 1
    with the corresponding figure

    Only the last line is relevant: we use the ternary operator to decide whether we want to keep the point. If the deviation from the mean is less than the standard deviation, we hold on to our data; otherwise, we replace it by 1/0, which is undefined, and gnuplot quietly ignores it. If you want to learn more about the working of the ternary operator, check out my post on the plotting of an inequality.

    We have, thus, already found a solution for two of the problems addressed in the patch. What about the third one, adding arrows to the plot at the position of the minimum or maximum, say? We can do that, too. Here is the script:

    reset
    set sample 50
    set table 'stats1.dat'
    plot [0:10] 0.5+rand(0)
    unset table

    set yrange [0:2]
    unset key
    plot 'stats1.dat' u 1:2
    min_y = GPVAL_DATA_Y_MIN
    max_y = GPVAL_DATA_Y_MAX

    plot 'stats1.dat' u ($2 == min_y ? $2 : 1/0):1
    min_pos_x = GPVAL_DATA_Y_MIN
    plot 'stats1.dat' u ($2 == max_y ? $2 : 1/0):1
    max_pos_x = GPVAL_DATA_Y_MAX

    # Automatically adding an arrow at a position that depends on the min/max
    set arrow 1 from min_pos_x, min_y-0.2 to min_pos_x, min_y-0.02 lw 0.5
    set arrow 2 from max_pos_x, max_y+0.2 to max_pos_x, max_y+0.02 lw 0.5
    set label 1 'Minimum' at min_pos_x, min_y-0.3 centre
    set label 2 'Maximum' at max_pos_x, max_y+0.3 centre
    plot 'stats1.dat' u 1:2 w p pt 6

    First, we retrieve the values of the minimum and the maximum by using a dummy plot. Having done that, we retrieve the positions of the minimum and maximum, by calling a dummy plot on the columns
    plot 'stats1.dat' u ($2 == min_y ? $2 : 1/0):1

    What this line does is substitute min_y when the second column (whose minimum we extracted before) is equal to the minimum, and an undefined value, 1/0, otherwise. The first column, i.e., the x position, is plotted on the y axis, so the minimum of this plot is nothing but the x position of the first minimum. Likewise, had we assigned
    min_pos_x = GPVAL_DATA_Y_MAX

    that would have given the position of the last minimum of the data file. Obviously, these distinctions make sense only if there is more than one minimum or maximum. Knowing the x and y positions of the minimum and maximum, we can easily set the arrows. We, thus, have the following figure


    Adding labels showing the value should not be a problem now.

    Well, this is for today. Till next time!

  • 11/22/09--11:04: Update
  • I have had some time, so I moved all recent posts to their permanent place on my web page. I have "sexed up" the homepage a bit, so, hopefully, browsing will be a tad easier. Let me know if there are any problems! (I know that there is a small glitch with the cascaded style sheets on IE6. IE8 should work without problems. Firefox 3.5 is also OK.) There is a zipped version of the complete site, if you want to read it off-line.
    Comments should still be posted here, please!



    Cheers,
    Zoltán

  • 11/29/09--10:23: Broken histograms
  • Sometimes, a histogram is just a bit awkward, for the simple reason that one or two values are extremely high compared to the rest of the graph. In the case of a standard graph, we would use a broken axis to bring all points to the same order of magnitude. We can play the same trick with histograms; in fact, it is, in some sense, even simpler than with broken axes. All we have to do is to plot a thick line at the proper position in the proper colour. This is the graph that we are going to make today


    Our data file, brokenhist.dat, is as follows
    "Jan" 1 2
    "Feb" 44 4
    "Mar" 3 1
    "Apr" 2 25
    "May" 4 5
    "June" 2 1
    and here is our script:
    reset
    blue = "#babaff"
    set xrange [-0.5:5.5]
    set yrange [0:11]
    set isosample 2, 100
    set table 'brokenhist_b.dat'
    splot 1-exp(-y/2.0)
    unset table

    unset colorbox
    set border 1
    set xtics rotate by 45 nomirror offset 0, -1
    unset ytics
    f(x) = (x < 6 ? x : (x < 30 ? x-17 : x-35) )
    g(x) = (x < 6 ? 1/0 : 6)
    set boxwidth 0.85
    set style fill solid 0.8 border -1
    set style data histograms
    set palette defined (0 "#ffffff", 1 "#babaff")
    plot 'brokenhist_b.dat' w ima,\
    'brokenhist.dat' u (f($2)) t 'Red bars',\
    '' u (f($3)) lc rgb "#00bb00" t 'Green bars', \
    '' u 0:(f($2)):2 w labels center offset 0,0.5 t '',\
    '' u ($0+0.25):(f($3)):3 w labels center offset 0,0.5 t '',\
    '' u 0:(-1):xticlabel(1) w l t '', \
    '' u ($0+0.12):(g($3)+0.12):(0.25):(0.25) w vectors lt -1 lc rgb blue lw 5 nohead t '', \
    '' u ($0-0.12):(g($2)+0.12):(0.25):(0.25) w vectors lt -1 lc rgb blue lw 5 nohead t ''
    OK, so let us look at the code! The first couple of lines are required only if you want to have some posh background. Likewise, you can drop the 'unset colorbox' line when you have a white background. We set only the bottom axis, which means that we have to unset the ytics and set the xtics to nomirror. Then we have two helper functions: f(x) compresses the values above the break, while g(x) returns the height of the break mark for the broken columns and an undefined value for all others. The definitions of these depend on where you want to have the break point in the histogram. In this particular case, I took 6, but it is arbitrary.
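
    To get a feeling for the two helper functions, it may help to print what they return for a few of the values in brokenhist.dat. The definitions below are copied from the script; only the print lines are added for illustration.

```gnuplot
f(x) = (x < 6 ? x : (x < 30 ? x-17 : x-35) )
g(x) = (x < 6 ? 1/0 : 6)
print f(4), f(25), f(44)    # 4 8 9
print g(25)                 # 6
```

    Small values pass through unchanged, while 25 and 44 are squeezed to 8 and 9, so they fit into the yrange of [0:11]. For values below 6, g(x) is undefined, hence no break mark is drawn on those columns.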

    In the next step, we set the properties of the histogram, like the width of the columns, the fill style, and the data style. We also define a palette, but this is needed for the background only. For white background, you can skip this step. You can also skip the first plot, because that is nothing but our fancy background.

    The actual plotting begins after this. We plot the two sets of columns, and also plot the data file with labels. The labels are placed at the top of each column (this is why we could do away with the y axis). We also 'plot' the axis labels, and finally, plot the two break points. Note that the plotting of the break points is automatic, once we have the definitions of the two helper functions. If you want to have a steeper cut, you could use, e.g.,

    '' u ($0+0.12):(g($3)+0.12):(0.25):(0.5) w vectors lt -1 lc rgb blue lw 5 nohead t ''
    which stretches the vectors in the vertical direction. Otherwise, we have finished the plot, there is nothing else to do.
    I should point out here that, in order to have a seamless cut, we have to use a colour for the vectors that is identical to the background at that particular point. This implies that we could not have a gradient at y=6. The background colour is virtually constant at y=6 (cf. the definition of 'brokenhist_b.dat'). While it would not be impossible to implement a cut over a gradient, I believe it is probably not worth the trouble it involves.

  • 12/01/09--11:07: Restricting fit parameters
  • Chris asked an interesting question today, namely, how one can restrict the fit range in gnuplot. What he meant by that was not the range of the data points (that is really easy, the syntax is the same as for plot), but the range of the fit parameters. In some cases, it is quite a reasonable requirement, because we might know from somewhere that certain parameter values just do not make any sense. As it turns out, it is rather easy to achieve this in gnuplot. All we have to do is to come up with a function that restricts its values to the desired range.

    After this interlude, let us see an example! We will create some data with the following gnuplot script:

    reset
    a=1.0; b=1.0; c=1.0
    f(x) = a*exp(-(x-b)*(x-b)/c/c)
    set table 'restrict.dat'
    plot [-2:4] f(x)+0.1*(rand(0)-0.5)
    unset table

    We take a Gaussian, with some noise added to it. Naturally, we would like to fit a Gaussian to this data, and in particular, f(x). But what, if our model is such that 'a' must be in the range [1.1:2], 'b' must be in the range [0.1:0.9], and 'c' must be in the range [0.5:1.5]? We just use in our fit, instead of f(x), another function, g(x), say, of the form
    g(x) = A(a)*exp(-(x-B(b))*(x-B(b))/C(c)/C(c))
    where A(a), B(b), and C(c) take care of our restrictions. These functions are somewhat arbitrary, but for better or worse, I will take the following three arcus tangents
    # Restrict a to the range of [1.1:2]
    A(x) = (2-1.1)/pi*(atan(x)+pi/2)+1.1

    # Restrict b to the range of [0.1:0.9]
    B(x) = (0.9-0.1)/pi*(atan(x)+pi/2)+0.1

    # Restrict c to the range of [0.5:1.5]
    C(x) = (1.5-0.5)/pi*(atan(x)+pi/2)+0.5
    which would look like this

    The point here is that as x runs from negative infinity to positive infinity, A(x) runs between 1.1, and 2.0, and likewise for B(x), and C(x). Then the fit goes on as it would normally. Our script is, thus, the following in its full glory:

    # Restrict a to the range of [1.1:2]
    A(x) = (2-1.1)/pi*(atan(x)+pi/2)+1.1

    # Restrict b to the range of [0.1:0.9]
    B(x) = (0.9-0.1)/pi*(atan(x)+pi/2)+0.1

    # Restrict c to the range of [0.5:1.5]
    C(x) = (1.5-0.5)/pi*(atan(x)+pi/2)+0.5

    a=0.0
    b=0.5
    c=0.9
    fit f(x) 'restrict.dat' via a, b, c

    g(x) = A(aa)*exp(-(x-B(bb))*(x-B(bb))/C(cc)/C(cc))
    aa=1.5
    bb=0.5
    cc=0.9
    fit g(x) 'restrict.dat' via aa, bb, cc

    plot f(x), g(x), 'restrict.dat' w p pt 6

    and it produces the following graph:

    At this point, we should not forget that what we are interested in is not the value of 'aa', 'bb', or 'cc', but the value of 'a', 'b', and 'c'. This means that what we have to take is A(aa), B(bb), and C(cc). If we print the values of 'a', 'b', and 'c' from the fit to f(x), the values of 'aa', 'bb', and 'cc', and the values of A(aa), B(bb), and C(cc), we get the following results
    gnuplot> pr a, b, c
    0.984221984191135 0.996600824504231 1.00765240463672

    gnuplot> pr aa, bb, cc
    -3442408.91578921 1443864.45093385 -0.201236322474146

    gnuplot> pr A(aa), B(bb), C(cc)
    1.10000008322047 0.899999823634477 0.936788734177637

    Obviously, the second print does not make too much sense, we have to compare the last one to the first one. We can here see that we got values in the ranges [1.1:2], [0.1:0.9], and [0.5:1.5], as we wanted to.
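
    Since A(x), B(x), and C(x) all share the same shape, the mapping can also be written once, together with its inverse, which is handy when you want the fit to start from a particular value of the restricted parameter. This is only a sketch: R() and Rinv() are hypothetical names that do not appear in the script above.

```gnuplot
# Hypothetical generic helpers, not part of the script above
R(x, lo, hi)    = (hi-lo)/pi*(atan(x)+pi/2) + lo    # maps any real x into (lo, hi)
Rinv(p, lo, hi) = tan(pi*(p-lo)/(hi-lo) - pi/2)     # inverse: the x that gives R = p

print R(0, 1.1, 2)          # midpoint of the range, 1.55
aa = Rinv(1.5, 1.1, 2)      # start the fit at A(aa) = 1.5
print R(aa, 1.1, 2)         # recovers 1.5 up to rounding
```

    With this notation, A(x) is R(x, 1.1, 2), and the starting values of 'aa', 'bb', and 'cc' can be chosen in terms of the quantities that we actually care about.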


    I have been waiting for this for a long time, but at long last, it has happened! The new version of gnuplot has been released with the designation 4.4. You can download the binary or the source code from the sourceforge repository. In the future, I will discuss the new things, and show what can be done with them.
    Cheers,
    Zoltán


    I have long wanted to write a blog post on this subject, but somehow, I never got the time to do it. But the time has come, and I will do it now. One of the plot styles that I actually like in Origin (no matter what I think of the software overall) is the one where the circumference of a circle is drawn in a colour different to that of the body of the circle, as in this graph


    The only glitch is that this is not supported in gnuplot. Well, this is not completely true, because one can use a new prologue for the postscript output, but that works for postscript only, nothing else. What should we do then? There are two options: one has already been discussed, in connection with the bubble charts, sometime in early September. Nevertheless, that method is feasible only when the points do not overlap. What we did there was to plot the data twice (or however many times). But the problem is that those are actually two different plots, so if the adjacent circles overlap, we will not have what we want. Just try it, if you are still not convinced! The second option is that we keep reading. So, the question is then, how could we trick gnuplot into thinking that we have only one plot, not two. Well, the short answer is that we plot only one plot, not two, and we then make sure that points in the plot are coloured properly. I think it is time to show my script, that should make everything clear.

    reset
    set table 'two.dat'
    plot [0:10] sin(x)+0.1*rand(0), cos(x)+0.1*rand(0)
    unset table

    f(x) = cos(x*pi)
    set cbrange [-1:2]; unset colorbox; set border back

    set palette defined (-1 "#000000", 1 "#ff0000", 2 "#0000ff")

    plot "< gawk '{print $0; print $0}' two.dat" using 1:2:(2+f($0)/5.0):(f($0+1)) index 0 \
    with points pt 7 ps var lt palette title 'red', \
    '' using 1:2:(2+f($0)/5.0):(f($0+1)*2) index 1 \
    with points pt 7 ps var lt palette title 'blue'

    As always, we generate some data. Note, that we plot sin(x) and cos(x), which will be in the same data file, but in two separate data blocks. This will become important later. What we need to know about data blocks is that in the data file they are separated by pairs of blank lines, and that we can ask gnuplot to plot only certain data blocks. We indicate our choice with the use of the index keyword. If you want to know more about this, issue the
    ?index
    command, and look at the data file 'two.dat'!

    OK, so we have some data, if a bit obscure at the moment. Next, we define a funny function. If you look closely, you will realise that f(x) takes the value of -1 for odd, and 1 for even numbers. In principle, any function should do here, but one has got to be a bit careful: those functions that take 1 or -1 at isolated points might not work. If you do not want to dive into the details of this, take my word for it that f(x) defined above will just be perfect for our purposes.
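    To convince ourselves of this, we can simply print a few values. The following lines are only a quick check, not part of the script; the size expression is the one used in the third column of the plot command above.

```gnuplot
f(x) = cos(x*pi)
# f alternates between 1 (even argument) and -1 (odd argument)
print f(0), f(1), f(2), f(3)
# the corresponding point sizes, 2+f/5, alternate between 2.2 and 1.8
print 2+f(0)/5.0, 2+f(1)/5.0
```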

    The next step is the definition of a colour palette with the colour range. (We take off the colour box, too, and set the border to back, but these are irrelevant nuances.) We use the palette for colouring our graph: in the first curve, the points alternate between black and red, while in the second curve, they alternate between black and blue. Thus, I have already divulged my trick: we have to duplicate our data set in such a way that every line is repeated right after itself, so that what gnuplot reads from 'two.dat' will look something like this
    ...
    9.29293 -0.93128 i
    9.29293 -0.93128 i
    9.39394 -0.978266 i
    9.39394 -0.978266 i
    9.49495 -0.923256 i
    9.49495 -0.923256 i
    ...

    For this we use an external gawk script, but it is really just one line, it could not be any simpler. Once we have duplicated our data, we plot it, and colour the points alternately black and red. Also note that we use the index keyword to choose the data block that we need. If you have only one data block, you can skip this. But not only do we colour the points differently, we also resize them: after all, if all had the same size, we would not see the ones that are plotted first. For all this machinery, we use the fact that when a 2D plot is given 4 columns, the size of the points can be determined by the third column, while the fourth column can be assigned to take care of the colour. In this case, the colour is taken from the palette that we defined above. The general form of such a plot is
    plot 'foo' using 1:2:3:4 with points pt 7 ps var lt palette
    where 'var' signifies the variable point size (ps), while palette is the colour. Note that we use our snappy f(x) function to choose the size and the colour of every second point, simply by counting the ordinal number of the particular data record, $0: for even numbers, we set the size to 2+1/5, and the colour to black (colour value of -1), while for odd numbers, the size is 2-1/5, and the colour is red (colour value of 1). If you have a single data set, the whole script would be simply
    f(x) = cos(x*pi)
    set cbrange [-1:1]; unset colorbox; set border back

    set palette defined (-1 "#000000", 1 "#ff0000")

    plot "< gawk '{print $0; print $0}' two.dat" using 1:2:(2+f($0)/5.0):(f($0+1)) \
    with points pt 7 ps var lt palette title 'red'
    Well, this is it for today. Next time I will try to show some of the new stuff in gnuplot 4.4. Cheers,
    Zoltán


    As I promised some time ago, I will discuss some of the new features in gnuplot 4.4. The first one that I would like to show is the concept of iteration in the plot command, and the concept of certain pseudo-files. If you have ever had to script the creation of your plots, you will appreciate these features.

    First, let us see what the for loop looks like in the plot command. There are two forms of it: in one, we loop through integers, while in the other, we step through a string of words. So, the iteration either looks like

    plot for [var = start : end {:increment}]

    or

    plot for [var in "some string of words"]
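
    As a minimal sketch of the two forms, using only built-in functions so that no data file is needed (the curves and titles are made up for illustration):

```gnuplot
# integer form: k runs through 1, 2, 3
plot for [k = 1:3] sin(k*x) title sprintf("sin(%d x)", k)

# string form: name takes the values "sin" and "cos" in turn
plot for [name in "sin cos"] (name eq "sin" ? sin(x) : cos(x)) title name
```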

    After this introduction, let us see how these can be used in real life. The first example that I will show is that of waterfall plots, i.e., when a couple of curves are plotted on the same graph, and they are shifted vertically. This is common practice, when one has several spectra, and wants to show the effect of some parameter on the spectra.

    reset
    f(x,a) = exp(-(x-a)*(x-a)/(1+a*0.5))+0.05*rand(0)
    title(n) = sprintf("column %d", n)
    set table 'iter.dat'
    plot [0:20] '+' using (f($1,1)):(f($1,2)):(f($1,3)):(f($1,4)):(f($1,5)):(f($1,6)) w xyerror
    unset table

    set yrange [0:15]
    plot for [i=1:6] 'iter.dat' u 0:(column(i)+2*i) w l lw 1.5 t title(i)

    I would like to walk through the script line by line, for there is something unusual in almost each line. So, after resetting the gnuplot session, we define a function, which will be a Gaussian, whose centre and width are determined by the parameter 'a'. We then plot this function to a file, 'iter.dat', 6 times, each time with a different parameter, so that the Gaussian is shifted and broadened. Note, however, that we do this by plotting a special file, '+'. This was introduced in gnuplot 4.4, and the purpose of this special file is that by invoking it, one can use the standard plot modifiers even with functions. I.e., we can specify 'using' for a function. The importance of this is that many plot styles require several columns, and we could not use those plot styles with functions without the '+' pseudo-file. Consider the following example

    reset
    unset colorbox
    unset key
    set xrange [0:10]
    set cbrange [0:1]
    plot '+' using ($1):(sin($1)):(0.5*(1.0+sin($1))) with lines lw 3 lc palette, \
    '+' using ($1):(sin($1)+2):($1/10.0) with lines lw 3 lc palette
    which produces the following graph

    We can thus colour our curve by specifying the colour in the third column of the pseudo-file. Of course, this is only one possibility, and there are many more. If one wants to plot 3D graphs, then the pseudo-file becomes '++', but the concept is the same: the two variables are denoted by $1 and $2, and the function is calculated on the grid determined by the number of samples and the corresponding data range.

    Now, back to the iteration loop! We produce 6 columns of data by plotting '+' by invoking a plot style that requires 6 columns. In this case, it is the xyerrorbars. Having created some data, we plot each column, but we call plot only once: the iteration loop does the rest. In each plot, the curve is shifted upwards, and the title is taken from the column number. For specifying the title, we use the function that we defined earlier: it takes an integer, and returns a string. At the end of the day, we have this graph

    This was an example, when we plot various columns from the same file. We can also use the iteration to plot different files. When doing so, there are two options available. One is that we simply specify the file names in a string, as below

    reset
    filenames = "first second third fourth fifth"
    plot for [file in filenames] file using 1:2 with lines
    which will plot files 'first', 'second', 'third', 'fourth', and 'fifth'. At this point, note that 'file' is a string, i.e., we can manipulate it as a string. E.g., if we wanted to, instead of 'first', 'second', etc., plot 'first.dat', 'second.dat', and so on, we would do this as
    reset
    filenames = "first second third fourth fifth"
    plot for [file in filenames] file.".dat" using 1:2 with lines

    The second option is, if the data files are numbered, e.g., if we have 'file_1.dat', 'file_2.dat', and so on, we can use the iteration over integers as follows
    reset
    filename(n) = sprintf("file_%d.dat", n)
    plot for [i=1:10] filename(i) using 1:2 with lines
    which will plot the second column versus the first column of 'file_1.dat' through 'file_10.dat'.


    Today I would like to discuss another new feature in gnuplot 4.4. This will be the notion of "inline" data manipulation. I don't really know what the proper name would be for this feature, so I will just show what it is.

    Normally, when one plots a function or file, the command has the following structure

    plot 'foo' using 1:2 with line, f(x) with line

    That was the old syntax. In the new version of gnuplot, we can insert arithmetic expressions in the plot command as follows

    f(x) = a*sin(x)
    plot a = 1.0, f(x), a = 2.0, f(x)

    Now, this has some implications. First, one has to be a bit careful, because the arithmetic expressions are separated from the actual function by a comma. However, the 'for' loop that we discussed a week ago, reads statements up to the comma, and then returns to the beginning of the statement. In other words,
    plot for [i=1:10] a = i, f(x)
    will evaluate the expression a = i ten times, and then plots f(x). At that point, the value of 'a' will be 10, therefore, we have only one plot, and that will be 10*sin(x).
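
    If the intention was to get ten curves, one way around this is to pass the parameter as a second argument of the function, so that nothing has to be assigned before the comma. A short sketch (the two-argument function g is introduced here only for illustration):

```gnuplot
g(x, a) = a*sin(x)
plot for [i = 1:10] g(x, i) title sprintf("a = %d", i)
```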

    The second implication is that the notion of a function has completely changed. What we do in a plot command now is no longer a mapping of the form
    x -> f(x)
    but rather, the evaluation of a set of instructions, one of which is the above-mentioned mapping. But the crucial point here is that the mapping is not the only allowed statement. The upshot is that "functions" have become a set of operations, and the following statement is completely legal
    f(x) = (a = a+1.0, a*sin(x))
    a = 0.0
    plot for [i=1:10] f(x)

    (It is a completely different question, whether this plot makes any sense...)

    What we should notice is the fact that now a function can have the form of
    f(x) = (statement1, statement2, statement3, return value)
    and when the function is called, statement1, statement2, statement3 are evaluated, and 'return value' is returned. We should not underestimate the significance of this! Many things can be done with this. I will show a few of them below.
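
    A quick way to see the order of evaluation is to call such a function from the command line. In this small sketch (the names calls and square are made up), the counter is updated as a side effect, and only the last expression is returned:

```gnuplot
calls = 0
square(x) = (calls = calls + 1, x*x)
print square(3)    # 9
print square(4)    # 16
print calls        # 2, the function was evaluated twice
```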

    The first thing I would like to dive into is calculating some statistics of a file. Let us see how this works!

    reset
    set table 'inline.dat'
    plot sin(x)
    unset table
    num = 0
    sum = 0.0
    sumsq = 0.0

    f(x) = (num = num+1, sum = sum+x, sumsq = sumsq+x*x, x)
    plot 'inline.dat' using 1:(f($2))

    print num, sum, sumsq
    which will print out
    100 -1.77635683940025e-15 0.295958848441
    We expected this, for the number of samples is 100 by default, and the sum should be 0 in this case.

    So, what about finding the minimum and its position in a data file? This is quite easy. All we have to do is to modify our function definition, and insert a statement that determines whether a value is minimal or not.

    reset
    set table 'inline.dat'
    plot sin(x)
    unset table

    num = 0
    min = 1000.0
    min_pos = 0
    min_pos_x = 0

    f(x,y) = ((min > y ? (min = y, min_pos_x = x, min_pos = num) : 1), num = num+1, y)
    plot 'inline.dat' using 1:(f($1, $2))

    print num, min, min_pos_x, min_pos
    which prints

    100 -0.999385 4.74747 73
    i.e., the minimum is at the 73rd record (we count from 0), at x = 4.74747, and its value is -0.999385. Note that instead of an 'if' statement, we use the ternary operator to decide whether min, min_pos_x, and min_pos should be updated.
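
    Finding the maximum works in exactly the same way, only the comparison is reversed. The following sketch mirrors the script above; the names max, max_pos, max_pos_x, and fmax are chosen here by analogy, and the starting value only has to be smaller than anything in the file:

```gnuplot
reset
set table 'inline.dat'
plot sin(x)
unset table

num = 0
max = -1000.0
max_pos = 0
max_pos_x = 0

# update the three variables whenever a larger value is met, then count the record
fmax(x,y) = ((max < y ? (max = y, max_pos_x = x, max_pos = num) : 1), num = num+1, y)
plot 'inline.dat' using 1:(fmax($1, $2))

print num, max, max_pos_x, max_pos
```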

    The implementation of calculating the standard deviation, e.g., should be trivial:
    num = 0
    sum = 0.0
    sumsq = 0.0
    f(x) = (num = num+1, sum = sum + x, sumsq = sumsq + x*x, x)
    plot 'inline.dat' using 1:(f($2))

    print num, sqrt(sumsq/num - (sum/num)*(sum/num))
    We have thus seen how the "inline" arithmetic can be used for calculating quantities, e.g., various moments, minima/maxima and their respective positions. These involve the sequential summing or inspection of the data set. But this trick with the function definition can be used for back-referencing, too. This is what we will discuss next.

    The trick is to use a construct similar to this

    backshift(x) = (prev = pres, pres = x, prev)

    which will store the last but one value in the variable 'prev', and return it. That is, the following code shifts the whole curve to the right by one
    reset
    set table 'inline.dat'
    plot sin(x)
    unset table

    pres = 0.0

    backshift(x) = (prev = pres, pres = x, ($0 > 0 ? prev : pres))
    plot 'inline.dat' using 1:(backshift($2)) with line, '' u 1:2 with line
    (In cases like this, we always have to decide what to do with the first/last data record. In this particular case, I opted for duplicating the first record - this is what happens in the ternary operator - but this is not the only possibility.) If, for some reason, you have to shift the curve by more, you do the same thing, but multiple times. E.g., the following code shifts by 2 places.

    backshift(x) = (prev1 = prev2, prev2 = prev3, prev3 = x, prev1)
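
    To see how far a chain of this kind lags behind, one can feed it a few numbers by hand. In this small check the three helper variables are initialised to 0 (an assumption, any starting value would do), and the function returns the value that was passed in two calls earlier:

```gnuplot
prev1 = 0; prev2 = 0; prev3 = 0
backshift(x) = (prev1 = prev2, prev2 = prev3, prev3 = x, prev1)
print backshift(10)   # 0  (initial value)
print backshift(20)   # 0  (initial value)
print backshift(30)   # 10
print backshift(40)   # 20
```

    Each additional variable in the chain adds one more place to the shift.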

    Once we have this option of back-referencing, we should ask the question what it can be used for. I show two examples for this.
    The first example is drawing arrows along a line given by the data set. Drawing arrows one by one is done by using
    set arrow from x1,y1 to x2,y2
    but we have to use a different method, if we want to plot the arrows from a file. Incidentally, there is a plotting style, 'with vectors', that works as
    plot 'foo' using 1:2:3:4 with vectors
    where the first two columns specify the coordinates of the beginning, and the second two columns specify the relative coordinates of the vectors. So, it works on four columns. What should we do, if we want to plot vectors from the points in a file? Well, we use the back shift that we defined above. Our script is as follows:
    reset
    unset key
    set sample 30
    set table 'arrowplot.dat'
    plot [0:3] sin(x)+0.2*rand(0)
    unset table

    px = NaN
    py = NaN
    dx(x) = (xd = x-px, px = ($0 > 0 ? x : 1/0), xd)
    dy(y) = (yd = y-py, py = ($0 > 0 ? y : 1/0), yd)

    plot 'arrowplot.dat' using (px):(py):(dx($1)):(dy($2)) with vector
    which results in the following figure:

    Note that we used the ternary operator to get rid of the very first data point. This is needed, because the arrows connect two points, that is, there will be one fewer arrow than data points.

    In the second example, we will turn this around. In my post last August, plotting the recession, I showed how the background of a plot can be changed, based on whether the curve is dropping or increasing. Let us take the following script
    reset
    set sample 20
    set table 'inline.dat'
    plot [0:10] exp(-x)+1.0+rand(0)
    unset table

    unset key

    px = 0
    py = 1000
    dx(x) = (xd = x-px, px = x, xd)
    dyn(y) = (yd = y-py, py = y, (yd < 0 ? yd : 1/0))
    dyp(y) = (yd = y-py, py = y, (yd >= 0 ? yd : 1/0))

    plot 'inline.dat' using (px):(py):(dx($1)):(dyp($2)) with vector nohead lt 1 lw 3, \
    px = 0, py = 0, '' using (px):(py):(dx($1)):(dyn($2)) with vector nohead lt 3 lw 3
    which creates the following graph
    First we produce some data; old trick. Then we take our difference functions, in this case, three of them. The first one is identical to that in the previous script. The second and the third are identical, except that the second returns a sensible value if and only if the slope is negative, while the third one returns 1/0 if the slope is negative. Then we just plot our data, making sure that we re-initialise px and py before the second plot. Simple.

    Another utilisation of the back reference can be found on gnuplot's web site, under running averages.

    Next time I will try to go a bit further, and demonstrate some other uses of the inline data processing.
    Cheers,
    Gnuplotter


    Some time ago, on the 26th of July, to be more accurate, I showed how somewhat decent-looking maps can be created with gnuplot. With the wisdom of hindsight, that was a rather ugly hack, I must say. Even worse, it seems that it is not quite fail-safe. At least, I have received reports complaining about it. Could we, then, do better? Could we, perhaps, throw out that disgusting gawk script, with all the hassle that comes with it? Could we, possibly, manage the whole affair in gnuplot? Sure, we could. And here is how, just keep reading!

    On the 17th of January, we saw that in the new version of gnuplot, functions take on a funny property, namely, they can contain algebraic statements not related to the return value. We also saw that this feature can be used to perform searches of some sort: as we "plot" a file, and step through the numbers in the file, we can assign values to variables, provided that some conditions are fulfilled. It is easy to see that in this way, we can determine the minimum or the maximum of a data file, e.g. But we can do much more than that.

    We should also recall from that old post on the map what the contour file looks like. In case you have forgotten, here is a small section of it

    # Contour 0, label:        2
    -0.391812 3.63636 2
    ...
    -0.959596 3.50978 2

    -0.959596 3.50978 2
    ...
    -0.391812 3.63636 2


    # Contour 1, label: 1.5
    -1.20098 4.51515 1.5
    -1.16162 4.54423 1.5
    -1.15982 4.54545 1.5
    ...

    What we have to realise is the following: first, contour lines belonging to the same level are not necessarily contiguous (this is quite obvious, for there is no reason why they should be), and if there is a discontinuity, it manifests itself in a single blank line in the contour file, and second, contour lines belonging to different levels are separated by two blank lines. So, in the data file above, there is one blank line between the two consecutive lines -0.959596 3.50978 2, and there are two blank lines between -0.391812 3.63636 2 and # Contour 1, label: 1.5. By the way, the third column is the value of that particular contour line.

    This observation has at least one important consequence: we can decide which contour line we want to plot, simply by using the index keyword. You might recall, that indexing the data file pulls out one data block, which is defined by a chunk of data flanked by two blank lines.

    Now, what about the labels, and the white space that they need? Well, the white space is quite easy: what we will plot is not the contour line, but a function, which returns an undefined value at the place of the white space, e.g., this one (whatever eps and xtoy mean)
    f(x,y) = ((x-x0)*(x-x0)+(y-y0)*(y-y0)*xtoy*xtoy > eps ? y : 1/0)
    Normally we would plot the contour lines as
    plot 'contour.dat' using 1:2 with lines
    but instead of this, now we will use this
    plot 'contour.dat' using 1:(f($1,$2)) with lines
    This will leave out those points which are too close to (x0, y0). And the labels? Well, that is not difficult either. Take this function
    lab(x,y) = ( (x == x0 && y == y0) ? stringcolumn(3) : "")
    and this plot
    plot 'contour.dat' using 1:2:(lab($1,$2)) with labels
    This will put the labels at (x0, y0), and even better, we haven't got to set the labels by hand, they are taken from the data file.

    So, we have seen how we can plot the contour, leave out some white space, and then put a label at that position. The only remaining question is how we determine where the label should be. And this is where we come back to our inline functions. For the sake of example, let us take this function and the accompanying plot

    g(x,y)=(((x > xl && x < xh && y > yl && y < yh) ? (x0 = x, y0 = y) : 1), 1/0)
    plot 'contour.dat' using 1:(g($1,$2))
    What on Earth does this plot do? The plot itself does absolutely nothing: it is always 1/0. However, while we are doing this, we set the value of x0, and y0, if the two arguments are not too close to the edge of the plot. This latter condition is needed, otherwise labels could fall on the border, which doesn't look particularly nice.

    By now, we have all the bits and pieces, we have only got to put them together. Let us get down to business, then!

    I will split the script into two: the first produces the dummy data, while the second does the actual plotting. So, first, the data production.
    reset
    filename = "cont.dat"
    xi = -5; xa = 0; yi = 2; ya = 5;
    xl = xi + 0.1*(xa - xi); xh = xa - 0.1*(xa-xi);
    yl = yi + 0.1*(ya - yi); yh = ya - 0.1*(ya-yi);
    xtoy = (xa-xi) / (ya-yi)
    set xrange [xi:xa]
    set yrange [yi:ya]
    set isosample 100, 100
    set table 'test.dat'
    splot sin(1.3*x)*cos(.9*y)+cos(.8*x)*sin(1.9*y)+cos(y*.2*x)
    unset table
    set cont base
    set cntrparam level incremental -3, 0.5, 3
    unset surf
    set table filename
    splot sin(1.3*x)*cos(0.9*y)+cos(.8*x)*sin(1.9*y)+cos(y*.2*x)
    unset table

    What we should pay attention to here is the definition of a handful of variables at the very beginning. Some are already obvious, like xi, xa and the like, and some will become clear in the second part. Now, the plotting takes place here

    reset
    unset key
    set macro
    set xrange [xi:xa]
    set yrange [yi:ya]

    set tics out nomirror
    set palette rgbformulae 33,13,10
    eps = 0.05

    g(x,y)=(((x > xl && x < xh && y > yl && y < yh) ? (x0 = x, y0 = y) : 1), 1/0)
    f(x,y) = ((x-x0)*(x-x0)+(y-y0)*(y-y0)*xtoy*xtoy > eps ? y : 1/0)
    lab(x,y) = ( (x == x0 && y == y0) ? stringcolumn(3) : "")

    ZERO = "x0 = xi - (xa-xi), y0 = yi - (ya-yi), b = b+1"
    SEARCH = "filename index b using 1:(g($1,$2))"
    PLOT = "filename index b using 1:(f($1,$2)) with lines lt -1 lw 1"
    LABEL = "filename index b using 1:2:(lab($1,$2)) with labels"

    b = 0
    plot 'test.dat' with image, \
    @SEARCH, @PLOT, @LABEL, @ZERO, \
    @SEARCH, @PLOT, @LABEL, @ZERO, \
    @SEARCH, @PLOT, @LABEL, @ZERO, \
    @SEARCH, @PLOT, @LABEL, @ZERO, \
    @SEARCH, @PLOT, @LABEL, @ZERO, \
    @SEARCH, @PLOT, @LABEL, @ZERO, \
    @SEARCH, @PLOT, @LABEL, @ZERO, \
    @SEARCH, @PLOT, @LABEL, @ZERO, \
    @SEARCH, @PLOT, @LABEL, @ZERO, \
    @SEARCH, @PLOT, @LABEL, @ZERO, \
    @SEARCH, @PLOT, @LABEL, @ZERO, \
    @SEARCH, @PLOT, @LABEL

    A bit convoluted, isn't it? OK, we will walk through the script, line by line.

    First is the range setting, and then ticks go to the outside, just for aesthetic reasons. We also define eps here, which determines how much "white space" we have for the labels. Then we define the three functions that we discussed above. We have already seen eps, and the meaning of it, but what about xtoy? Despite its name, this is not something to play with, rather the ratio of x to y, or more precisely, the ratio of the xrange to the yrange. This is needed, if the two ranges are of different order of magnitude, e.g., if xrange is something like [0:1], while yrange is [0:1000]. But this ratio is automatically calculated at the beginning, you haven't got to worry about it.

    After this, we define 4 macros. These are abbreviations for longer chunks of code, and make life really easier. The idea is that when confronted with a macro, gnuplot expands it as a string, and then acts accordingly. In my opinion, if written properly, macros can make the script rather readable.

    The first of the macros, ZERO, is needed, because in our SEARCH macro, which is nothing but a call to the function g(x,y), if the condition is not satisfied for a particular data block, then x0, y0 wouldn't be updated, therefore, the label would end up at the wrong position. At the same time, ZERO also increments the value of b, which determines which data block we are actually plotting. b is used in the indexing in the macros SEARCH, PLOT, and LABEL. We have already mentioned SEARCH, PLOT plots the contour with the white space at the position given by x0, and y0 (this is calculated in the SEARCH macro), and finally, LABEL places the value of the contour line at that position.

    At this point, we have defined everything, all that is left is plotting. We do it 13 times, because our zrange, or the contour lines were given between -3, and 3, with steps of 0.5. In this particular case, there are only 10 contour lines, and gnuplot will complain that the last 3 data blocks are empty, but this is not an error, only a warning. Shouldn't we look at the figure, perhaps? But of course! Here it is:



    The only thing that I should like to point out is that the white space is made for a particular contour line, but there is no guarantee that, if the contour lines are too close to each other, the label does not cover a neighbouring contour line. If that happens, I would simply suggest increasing the contour spacing by raising the increment in the set cntrparam line.

    I hope that this method proves better than the other one, and that it will be easier to use. In the next post, I will re-visit the inline functions, and show a nifty trick with them. Cheers,
    Gnuplotter


    Today I would like to touch on a vast subject, so prepare for a long post. However, I hope that the post will be worthwhile, for I want to discuss something that cannot be done in any other way. In due course, we will see how we can use gnuplot to create parametric plots from a file. What I mean by that is the following: suppose you want to plot, say, 10 similar objects, whose size is determined by the first column in a file. Of course, there are cases when one can manipulate the size, e.g., if there is a pre-defined symbol, we can use one of the columns in a file to determine the size of the symbol. As an example, we can do this

    plot 'foo' u 1:2:3 with point pt 6 ps var
    which will draw circles whose radius is given by the third column in 'foo'. This is all well, but it works for a limited number of cases only, namely, when there is a symbol to start out with. But what happens if we want to draw an object that is not a symbol, e.g., arcs of a circle, whose angle is given by one of the columns in a file, or cylinders, whose height is a variable, read from a file? As you can guess from these two suggestions, what we will do is to draw a pie, and a bar chart. I understand that we have done this a couple of times before, but this time, we will stay entirely in the realm of gnuplot, and the scripts are really short. We just have to figure out what to write in the scripts. But beyond this, I will also show how we can plot in 6 dimensions. We will plot ellipses on a plane (first 2 columns), whose two axes are given by the 3rd and 4th columns, the orientation by the 5th, and the colour by the 6th. If you are really pressed for it, you can add three more dimensions: if you draw ellipsoids in 3D, take all three axes from a file, and also the orientation, that would make 9 dimensions altogether. Quite a lot!

    So, let us get down to business! The first thing that I would like to discuss is the evaluate command. This is a really nifty way of shortening repetitive commands. Let us suppose that we want to place 10 arrows on our graph, and only the first coordinate of the arrows changes, otherwise everything is the same. Setting one arrow would read as follows
    set arrow from 0, 0 to 1, 1
    Of course, there are quite a few settings that we could specify, but this was supposed to be a minimal example. Then, the next arrow should be
    set arrow from 1, 0 to 2, 1
    and so on. What if we do not want to write this line a thousand times, and we do not want to search for the coordinate that we are to change, the first one, in this case? We could try the following
    a(x) = sprintf("set arrow from %d, 0 to %d, 1", x, x+1)
    This function takes 'x', and returns a string with all the settings and coordinates. So, we are almost done. The only thing we should do is to make gnuplot understand that we want it to treat a(x) as a command, not as a string. Enter the eval command: it takes whatever string is presented to it, and turns it into a command. Thus, the following script creates 5 arrows, all parallel to each other, and consecutively shifted to the right
    a(x) = sprintf("set arrow from %d, 0 to %d, 1", x, x+1)
    eval a(0)
    eval a(1)
    eval a(2)
    eval a(3)
    eval a(4)
    I believe, this is a much simpler and cleaner procedure, than this
    set arrow from 0, 0 to 1, 1
    set arrow from 1, 0 to 2, 1
    set arrow from 2, 0 to 3, 1
    set arrow from 3, 0 to 4, 1
    set arrow from 4, 0 to 5, 1
    I should mention here that if chunks of a command are the same, another method of abbreviating them is to use macros. Those are disabled by default, so first we have to set it. Then it works as follows
    set macro
    ST = "using 1:2 with lines lt 3 lw 3"
    plot 'foo' @ST, 'bar' @ST
    i.e., the term @ST is expanded using the definition above, therefore, this plot is equivalent to this one
    plot 'foo' using 1:2 with lines lt 3 lw 3, 'bar' using 1:2 with lines lt 3 lw 3
    but the previous one is much more readable. I would also say that using capitals for the macros is probably not a bad idea, because then they cannot be mistaken for standard gnuplot commands. This much in the way of macros!

    So, we have the evaluate command, and we have a new concept for functions. Then let us take a closer look at the following code
    a(x) = sprintf("set arrow from %d, 0 to %d, 1;\n", x, x+1)
    ARROW = ""
    f(x) = (ARROW = ARROW.a(x), x)
    plot 'foo' using 1:(f($1))
    and let us suppose that our file 'foo' contains the following 5 lines
    1
    3
    5
    7
    9
    After plotting 'foo', the string 'ARROW' will be the following
    set arrow from 1, 0 to 2, 1;
    set arrow from 3, 0 to 4, 1;
    set arrow from 5, 0 to 6, 1;
    set arrow from 7, 0 to 8, 1;
    set arrow from 9, 0 to 10, 1;
    I.e., we have a string, which contains instructions for setting 5 arrows. If, at this point, we simply evaluate this string, all 5 arrows will be set. Therefore, we have found a way of using a file to set the coordinates of an arrow. (N.B., if it was for the arrows only, we wouldn't have had to do anything, since there is a plotting style, 'with vector', as we discussed some weeks ago.)
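
    The same mechanism can be tried without any data file. In the sketch below, the string is built by hand from two calls to a(x), printed, and then evaluated; the coordinates 1 and 3 are arbitrary, and show arrow lists the arrows that have been defined.

```gnuplot
a(x) = sprintf("set arrow from %d, 0 to %d, 1;\n", x, x+1)
ARROW = ""
ARROW = ARROW.a(1)      # append the command for the first arrow
ARROW = ARROW.a(3)      # append the command for the second arrow
print ARROW             # inspect the accumulated commands
eval(ARROW)             # execute them
show arrow              # both arrows should now be listed
```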

    We will use this trick to create a parametric plot, taking parameter values from a file, first plotting the ellipses! Again, we have got to create some dummy data, and since we now need 6 columns, we will use the errorbars

    reset
    f(x) = rand(0)
    set sample 50
    set table 'ellipse.dat'
    plot [0:10] '+' using (20*f($1)):(20*f($1)):(f($1)):(f($1)):(3.14*f($1)):(f($1)) w xyerror
    unset table
    which will produce 6 columns and 50 lines. Having produced some data, let us see what we can do with it. Here is our script:

    PRINT(x, y, a, b, alpha, colour) = \
    sprintf("%f+v*(%f*cos(u)*cos(%f)-%f*sin(u)*sin(%f)), \
    %f+v*(%f*cos(u)*sin(%f)+%f*sin(u)*cos(%f)), \
    %f with pm3d", x, a, alpha, b, alpha, y, a, alpha, b, alpha, colour)
    PLOT = "splot "
    num = -1
    count(x) = (num = num+1, 1)
    g(x) = (PLOT = PLOT.PRINT($1, $2, $3, $4, $5, $6), \
    ($0 < num ? PLOT=PLOT.sprintf(",\n") : 1/0))
    plot 'ellipse.dat' u 1:(count($1))
    plot 'ellipse.dat' using 1:(g($1))

    unset key
    set parametric
    set urange [0:2*pi]
    set vrange [0:1]
    set pm3d map
    set size 0.5, 1
    eval(PLOT)

    First, we have the definition of a print function that looks rather ugly, but is quite simple. We want to plot
    a*v*cos(u), b*v*sin(u), colour
    where a, and b are the axes of the ellipse, and colour is going to specify, well, its colour. However, we want to translate the ellipse to its proper position, and we also want to rotate it by an amount given by the 5th column, so we have to apply a two-dimensional rotation on the object. Therefore, we would end up with a function similar to this
    x+v*(a*cos(u)*cos(alpha)-b*sin(u)*sin(alpha)), y + v*(a*cos(u)*sin(alpha)+b*sin(u)*cos(alpha)), colour
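    In matrix form, this is nothing but the point on the scaled ellipse, rotated by alpha and shifted to (x, y); X and Y denote the plotted coordinates, a notation introduced here only for this formula:

```latex
\begin{pmatrix} X \\ Y \end{pmatrix}
= \begin{pmatrix} x \\ y \end{pmatrix}
+ v \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}
\begin{pmatrix} a\cos u \\ b\sin u \end{pmatrix}
```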
    Now you know why that print function looked so complicated! After this, we define a string, PLOT, that we will expand as we read the file. But before that, we have to count the lines in the file. The reason for that is that successive plots must be separated by a comma, but there shouldn't be a comma after the last plot. So, we just have to know where to stop placing commas in our string. Then we define the function that does nothing useful, but concatenates the PLOT string as it reads the file. Here we use the number of lines that we determine in a dummy plot. At this point we are done with the functions, all we have to do is plotting.
    First we count, then plot g(x). At this point, we have the string that we need. We only have to set up our plot. Remember, we have a parametric plot, where the range of one of the variables is in [0:2*pi], while the other one is in [0:1]. Easy. Then we just have to evaluate our plot string, and we are done. Look what we have made here: a six-dimensional plot!
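The rotated and shifted parametrisation can also be checked numerically outside gnuplot. What follows is only a minimal sketch in Python, not part of the original script, and the names ellipse_point and on_ellipse are made up for this illustration. It evaluates the same two expressions that PRINT writes into the plot string, and confirms that, for v = 1, the points lie on an ellipse with semi-axes a and b, centred on (x, y), and rotated by alpha.

```python
import math

def ellipse_point(x, y, a, b, alpha, u, v=1.0):
    """Point of the shifted, rotated ellipse for angle u and radial fraction v."""
    px = x + v * (a * math.cos(u) * math.cos(alpha) - b * math.sin(u) * math.sin(alpha))
    py = y + v * (a * math.cos(u) * math.sin(alpha) + b * math.sin(u) * math.cos(alpha))
    return px, py

def on_ellipse(px, py, x, y, a, b, alpha):
    """Undo the shift and the rotation, then test the canonical ellipse equation."""
    dx, dy = px - x, py - y
    xr = dx * math.cos(alpha) + dy * math.sin(alpha)
    yr = -dx * math.sin(alpha) + dy * math.cos(alpha)
    return math.isclose((xr / a) ** 2 + (yr / b) ** 2, 1.0, rel_tol=1e-9)
```

With v smaller than 1 the same expressions fill the interior of the ellipse, which is why the parametric surface can be coloured by pm3d.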


    I think that this script is much less complicated than many that we have discussed in the past. Short and clear, thanks to the eval command, and the new concept of functions. Besides, we pulled off a trick that was impossible by other means. I started out saying that we would create bars and pies. I believe that, having seen the trick, it should be quite simple now, but in case you insist on seeing it, I will discuss it in my next post.


    As I promised yesterday, we will take a closer look at the pie chart once more, and see how we can utilise what we have learnt recently. I should point out here that this is not the only way of plotting a pie from a file. If you feel like building your gnuplot from source, you can check out either the CVS tree, or the patch tracker, where you can find a patch that makes it possible to plot slices. You can see a demo here. But we will try a different route here.

    First, here is our data file (it could be anything, really)

    1 1 Dolphins
    2 1 Whales
    2 0 Sharks
    3 0 Penguins
    4 1 Kiwis
    5 0 Tux

    and here is our script
    reset
    unset key; set border 0; unset tics; unset colorbox; set size 0.6,1
    set urange [0:1]
    set vrange [0:2*pi]
    set macro
    sum = 0.0
    ssum = 0.0
    n = 0
    PLOT = "splot 0, 0, 1/0 with pm3d"

    count(x) = (ssum = ssum + $1, 1)
    g(x,y,n) = \
    sprintf(", \
    u*cos((%.2f+%.2f*v)/ssum), \
    u*sin((%.2f+%.2f*v)/ssum), \
    %d @PL", x, y, x, y, n)

    f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), sum = sum+x, n = n + 1, x)

    plot 'new_pie.dat' u 1:(count($1)), '' u 1:(f($1))

    PL = "with pm3d"
    set parametric; set pm3d map;
    eval(PLOT)

    There is really nothing that we haven't discussed before: we set a couple of things at the beginning, but most importantly, the macro, and sum, ssum, and n. Then we define a string, PLOT, and three functions. The first one, count, sums the values in our file (we need this, so that we can scale the full range of angles to two pi), while g and f together write our PLOT command for later use. Note that the first plot in PLOT is actually empty, we plot 1/0. This seems a bit silly, doesn't it? Well, it does, but there is a good reason: successive plots must be separated by a comma, and if we have an empty plot at the very beginning, then we can put the commas before the plots, not after, and in this way, we needn't keep track of which plot we are actually processing. Remember, yesterday we used a separate counter, and an if statement, to determine whether we needed the comma or not. This we can avoid here.
    Next we call the two dummy plots, and finally, we evaluate our PLOT string. Oh, no! At the very end, we marvel in awe at the figure that we produced.
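To see what ends up in PLOT, here is a small sketch in Python (not gnuplot, and not part of the original script; slice_angles and build_plot are hypothetical helper names). It reproduces the bookkeeping of sum and ssum, and assembles the command with the comma in front of each slice, so that no special treatment of the last slice is needed. The @PL macro is written out as "with pm3d" here, which is an assumption about how the string looks after expansion.

```python
import math

def slice_angles(values):
    """Start and end angle (in radians) of every slice, in file order."""
    total = float(sum(values))   # plays the role of ssum
    running = 0.0                # plays the role of sum
    angles = []
    for x in values:
        start = 2 * math.pi * running / total
        end = 2 * math.pi * (running + x) / total
        angles.append((start, end))
        running += x
    return angles

def build_plot(values):
    """Assemble the splot command, placing the comma before each slice."""
    plot = "splot 0, 0, 1/0 with pm3d"   # empty first plot, as in the post
    running = 0.0
    for n, x in enumerate(values):
        plot += ", u*cos((%.2f+%.2f*v)/ssum), u*sin((%.2f+%.2f*v)/ssum), %d with pm3d" % (
            2 * math.pi * running, x, 2 * math.pi * running, x, n)
        running += x
    return plot
```

For the first column of the data file above, slice_angles([1, 2, 2, 3, 4, 5]) gives six contiguous intervals that together cover the full circle.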

    So far, so good, but what if we wanted to add labels, e.g., the value of the slice? That is really easy. All we have to do is to define a function that produces the label. Here is our updated script
    reset
    unset key; set border 0; unset tics; unset colorbox; set size 0.6,1
    set urange [0:1]
    set vrange [0:2*pi]
    set macro
    sum = 0.0
    ssum = 0.0
    n = 0
    PLOT = "splot 0, 0, 1/0 with pm3d"
    LABEL = ""

    count(x) = (ssum = ssum + $1, 1)
    g(x,y,n) = \
    sprintf(", \
    u*cos((%.2f+%.2f*v)/ssum), \
    u*sin((%.2f+%.2f*v)/ssum), \
    %d @PL", x, y, x, y, n)

    lab(alpha, x) = sprintf("set label \"%s\" at %.2f, %.2f; ", \
    x, 1.2*cos(alpha), 1.2*sin(alpha))

    f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), \
    LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, sprintf("%2.f", x)), \
    sum = sum+x, n = n + 1, x)

    plot 'new_pie.dat' u 1:(count($1)), '' u 1:(f($1))

    PL = "with pm3d"
    set parametric; set pm3d map; set border 0; unset tics; unset colorbox;
    set size 0.6,1
    eval(LABEL)
    eval(PLOT)
    We have an additional string, LABEL, which we initialise with the value "". Then we define a function that prints "set label ..." with the proper positions, and finally, we insert this function in the definition of f(x). Of course, once we have called f(x) in the plot, we have to evaluate the string LABEL. So, this is what we get

    This script can trivially be modified to print strings that are stored in our file. Just watch the following two definitions
    ...
    lab(alpha, x) = sprintf("set label \"%s\" at %.2f, %.2f centre; ", \
    x, 1.2*cos(alpha), 1.2*sin(alpha))

    f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), \
    LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, stringcolumn(3)), \
    sum = sum+x, n = n + 1, x)
    ...
    and we are done. Here is the new pie

    Now, you might wonder why we had three columns in our data file, if we didn't want to use them. Well, perhaps we wanted to, we just haven't had the time till now. So, what could we do with those ones and zeros in the second column? We will make the pie explode! It is really simple, we only have to modify three definitions in our last script

    ...
    g(x,y,n,dx,dy) = \
    sprintf(", \
    %.2f+u*cos((%.2f+%.2f*v)/ssum), \
    %.2f+u*sin((%.2f+%.2f*v)/ssum), \
    %d @PL", 0.2*dx, x, y, 0.2*dy, x, y, n)

    lab(alpha, x, r) = sprintf("set label \"%s\" at %.2f, %.2f centre; ", \
    x, (1.25+0.2*r)*cos(alpha), (1.25+0.2*r)*sin(alpha))

    f(x) = (PLOT = PLOT.g(2*pi*sum, x, n, \
    $2*cos(2*pi*sum/ssum+pi*x/ssum), \
    $2*sin(2*pi*sum/ssum+pi*x/ssum)), \
    LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, stringcolumn(3), $2), \
    sum = sum+x, n = n + 1, x)
    ...
    And here is the pie, when exploded
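The shift of the exploded slices can be tabulated with a few lines of Python. This is only an illustrative sketch, not part of the gnuplot script, and explode_offsets is a made-up name. Each flagged slice is moved by 0.2 along its bisector, whose angle is 2*pi*sum/ssum + pi*x/ssum, exactly as in the modified f(x); the flags are the second column of the data file.

```python
import math

def explode_offsets(values, flags, distance=0.2):
    """Offset (dx, dy) of every slice; unflagged slices stay at the origin."""
    total = float(sum(values))   # plays the role of ssum
    running = 0.0                # plays the role of sum
    offsets = []
    for x, flag in zip(values, flags):
        # angle of the bisector of the current slice
        mid = 2 * math.pi * running / total + math.pi * x / total
        offsets.append((distance * flag * math.cos(mid),
                        distance * flag * math.sin(mid)))
        running += x
    return offsets
```

The labels are pushed outwards along the same bisector, which is why lab() receives the flag as its third argument.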

    I should mention here that if you are not happy with the colours, it is really easy to remedy that: all we have to do is to modify the colour palette, using whatever colour combinations we like. We have covered a lot of material today. Till next time,
    Gnuplotter


    We have seen in the last couple of posts that with the new concept of functions, quite a few interesting effects can be achieved. Today I would like to show a trick that solves a problem that I discussed some time ago, when we made shiny histograms using a for loop in gnuplot. We will do the same thing here, but in two lines only. It is quick, and the results are just as good as in that case.

    So, here is my data file

    1
    3
    4
    2
    3
    5
    2
    which I will just name as 'bar.dat', and here is our script
    reset
    unset key
    set style fill solid 1.0
    set yrange [0:6]

    colour = "#080000"
    f(x,n) = (colour = sprintf("#%02X%02X%02X", 128+n/2, n, n), x)
    w(n) = 0.8*cos(n/230.0*pi/2.0)

    plot for [n=1:230:2] 'bar.dat' u 0:(f($1,n)):(w(n)) with boxes lc rgbcolor colour
    Simple enough, let us see what it does! The first four lines are just the usual settings, although the yrange is really irrelevant. I set it only for aesthetic reasons (otherwise, gnuplot would set the yrange automatically to [1:5] for the data file above, and we wouldn't see one of the columns). Then we define a variable called 'colour', which we will eventually overwrite in our function definition of f(x,n). f(x,n) returns x, so in this regard it would be absolutely useless, but while doing so, it also writes a string to 'colour'. The next function is w(n), which sets the box width in each pass, and thereby determines in what fashion our colour converges to white.
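The two helper functions are easy to tabulate outside gnuplot. The following is a minimal sketch in Python, not part of the original script; shade, width and PASSES are names chosen for this illustration, and the integer division n // 2 mirrors gnuplot's integer arithmetic for the loop variable.

```python
import math

def shade(n):
    """Colour string written into 'colour' during pass n."""
    return "#%02X%02X%02X" % (128 + n // 2, n, n)

def width(n):
    """Box width used in pass n."""
    return 0.8 * math.cos(n / 230.0 * math.pi / 2.0)

# The loop 'for [n=1:230:2]' visits n = 1, 3, ..., 229.
PASSES = list(range(1, 230, 2))
```

As n grows, the boxes become narrower and the green and blue components approach the red one, so the narrow, light boxes drawn last sit on top of the wide, dark ones.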
    Finally, we plot the data file some 115 times, each time with a smaller, and shinier box. At the end, we get something like this

    We can very easily change the direction of the light. All we have to do is define a new function that shifts the bars as we progress with our for loop. So, the new script could be something like this
    reset
    unset key
    set style fill solid 1.0
    set yrange [0:6]

    colour = "#080000"
    f(x,n) = (colour = sprintf("#%02X%02X%02X", 128+n/2, n, n), x)
    w(n) = 0.8*cos(n/230.0*pi/2.0)
    shift(x,n) = x-0.8*n/850.0

    plot for [n=1:230:2] 'bar.dat' u (shift($0,n)):(f($1,n)):(w(n)) with boxes lc rgbcolor colour
    with a result as in this graph
    It should be really easy to modify the script to accommodate more data sets. Well, this is for now. I don't actually know what I will write about next time, but I am sure that there will be something!
    Cheers,
    Zoltán

  • 03/16/10--14:02: Defining new symbols
  • Some time ago, I showed a method with which we could add a "frame" to a symbol. If you recall, what we did was to plot everything twice, and in order to duplicate our data set, we used a simple gawk script. Now, there is another way of doing this, one which does not rely on the gawk script, in fact, on any external script. I will discuss this method today. The gist of the trick is discussed in the old post, therefore, you are encouraged to cast, at least, a cursory glance at that, if you haven't yet done it.

    As I have already pointed out, we had to duplicate our data set. To be more accurate, we haven't got to duplicate anything, we have simply got to plot the data twice. Now, the difficulty is that if we do this in a primitive way, issuing the plot command twice, and taking the same data set, the points might overlap, which leads to some undesired results. So, the task is to plot the data set twice, but to plot each point twice in succession, and not the data set as a whole. For this, we will use the for loop introduced in gnuplot 4.4, and the 'every' keyword. To cut a long story short, I give my script here, and discuss it afterwards.

    reset 
    plot 'new_symbol1.dat' u 0:2
    red_n = GPVAL_DATA_X_MAX

    plot 'new_symbol2.dat' u 0:2
    blue_n = GPVAL_DATA_X_MAX

    plot 'new_symbol3.dat' u 0:2
    green_n = GPVAL_DATA_X_MAX

    parity(n) = (n/2.0 == int(n/2.0) ? 0 : 1)
    size(n) = 2 - parity(n)*0.4
    colour(n,r,g,b) = sprintf("#%02X%02X%02X", parity(n)*r, parity(n)*g, parity(n)*b)

    unset key
    set border back
    plot for [n=0:2*red_n+1] 'new_symbol1.dat' using 1:2 \
    every ::(n/2)::(n/2) with p pt 7 ps size(n) lc rgb colour(n,255,0,0) ,\
    for [n=0:2*blue_n+1] 'new_symbol2.dat' using 1:2 \
    every ::(n/2)::(n/2) with p pt 9 ps size(n) lc rgb colour(n,100,100,255) ,\
    for [n=0:2*green_n+1] 'new_symbol3.dat' using 1:2 \
    every ::(n/2)::(n/2) with p pt 5 ps size(n) lc rgb colour(n,0,150,0)
    Then, let us see what we have here! The first 6 lines are only to retrieve the number of data points in our data sets. If you know this from somewhere else, you can skip these, with the caveat that 'red_n', 'blue_n', and 'green_n' should still be defined somewhere.

    Next we define three functions, the first of which determines the parity of an integer, returning 1, if the number is odd, and 0, if it is even. The second function returns a number, depending on the parity of its argument. Surprising as it is, this function will determine the size of the symbol, when we plot. Finally, the third function returns a string, which is equal to the colour given by the triplet (r,g,b), if the first argument, 'n', is odd, and black, if the first argument is even. At this point, it should be clear that we could have defined a function that returns a different colour for even numbers.
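The order in which the points are drawn can be made explicit with a short sketch in Python (not gnuplot; the function passes and its argument num_points are hypothetical and only serve this illustration). For a file with num_points records it lists, for each value of the loop variable n, the record selected by 'every ::(n/2)::(n/2)', the symbol size, and the colour.

```python
def parity(n):
    """1 for odd n, 0 for even n."""
    return n % 2

def size(n):
    """Symbol size: the black frame (even n) is slightly larger."""
    return 2 - parity(n) * 0.4

def colour(n, r, g, b):
    """Black for even n, the requested colour for odd n."""
    return "#%02X%02X%02X" % (parity(n) * r, parity(n) * g, parity(n) * b)

def passes(num_points, rgb):
    """(record index, size, colour) for n = 0 .. 2*num_points - 1."""
    return [(n // 2, size(n), colour(n, *rgb)) for n in range(2 * num_points)]
```

Every record shows up twice in a row, first large and black, then smaller and coloured, which is what produces the framed symbol.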

    We are done with everything but the plotting, so let us do that! As you see, for each data set, we step through the points, and plot each of them not once, but twice: first in black, and then in some decent colour. At the same time, we change the symbol size, so that the black symbols are always a bit bigger than the red, blue, or green ones. Once all three plots have been called, the following graph will appear:

    We can see that the symbols overlap each other, as they should. Now, what about the keys, should we need them? Well, that requires some handwork, but it is not hard, actually. The following self-explanatory script should do
    set label 1 'Red symbols' at 1.3, 8 left
    plot for [n=0:2*red_n+1] 'new_symbol1.dat' using 1:2 \
    every ::(n/2)::(n/2) with p pt 7 ps size(n) lc rgb colour(n,255,0,0), \
    n=0, '-' using 1:2 with p pt 7 ps size(n) lc rgb colour(n,255,0,0), \
    n=1, '-' using 1:2 with p pt 7 ps size(n) lc rgb colour(n,255,0,0)
    1 8
    e
    1 8
    e

    and this produces the following figure

  • 03/16/10--16:01: Bubble plots
  • Yesterday, I discussed a method for adding an edge to an arbitrary symbol. If you recall (or roll down on this page), the idea was to trick gnuplot into plotting our data file twice, but in a way that each point was plotted twice in succession. Now, what if we plotted more times? There was really nothing special about the number 2, so there is no reason why we could not do this. But if we can, then we should, and see what comes out of it. With very small modifications, our script from yesterday can be turned into a bubble graph, like this



    So, let us see how the machinery works!

    reset
    plot 'new_bubble1.dat' u 0:2
    red_n = GPVAL_DATA_X_MAX

    plot 'new_bubble2.dat' u 0:2
    blue_n = GPVAL_DATA_X_MAX

    plot 'new_bubble3.dat' u 0:2
    green_n = GPVAL_DATA_X_MAX

    rem(x,n) = x - n*(x/n)
    size(x,n) = 3*(1-0.8*rem(x,n)/n)
    c(x,n) = floor(240.0*rem(x,n)/n)
    red(x,n) = sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n))
    blue(x,n) = sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255)
    green(x,n) = sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n))

    posx(X,x,n) = X + 0.03*rem(x,n)/n
    posy(Y,x,n) = Y + 0.03*rem(x,n)/n

    unset key
    set border back
    level = 40
    plot for [n=0:level*(red_n+1)-1] 'new_bubble1.dat' using (posx($1,n,level)):(posy($2,n,level)) \
    every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
    for [n=0:level*(blue_n+1)-1] 'new_bubble2.dat' using (posx($1,n,level)):(posy($2,n,level)) \
    every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \
    for [n=0:level*(green_n+1)-1] 'new_bubble3.dat' using (posx($1,n,level)):(posy($2,n,level)) \
    every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)
    Again, the first three plots are there for determining the sample size, and nothing more. We, thus, start out with a number of function definitions. The first one is a remainder function, the second one uses the remainder to return the size of the bubble, the third one is a simple helper function, returning values between 0 and 240, and red, blue, and green determine the colour of our bubbles. If you look carefully, you will notice that these colours are successively whiter as the remainder increases. Finally, again by making use of our remainder function, we define two position shifts: in order to give the impression that the bubbles are lit from the top right corner, we have to shift successive circles in that direction. The value of this shift is important in the sense that, if chosen too high, the circles belonging to the same data point will no longer cover each other. (This is not necessarily a tragedy, see below.)

    Then we decide to have 40 colour levels (we could have anything up to 255, although it might be a bit time consuming and unnecessary), and call our plots. The structure is the same as it was yesterday: we use a for loop for each data set, move the circles a bit, and set the colours to whiter shades. That is all.
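How a single loop variable encodes both the data point and the colour level is perhaps easiest to see in a small sketch. The following lines are Python, not gnuplot, and are not part of the original script; the integer division x // n stands in for gnuplot's integer division, and only the red and blue colour functions are reproduced.

```python
import math

def rem(x, n):
    """Remainder of x with respect to n, written as in the post."""
    return x - n * (x // n)

def size(x, n):
    """Symbol size: shrinks as the level increases."""
    return 3 * (1 - 0.8 * rem(x, n) / n)

def c(x, n):
    """Helper returning an integer between 0 and 240."""
    return math.floor(240.0 * rem(x, n) / n)

def red(x, n):
    return "#%02X%02X%02X" % (255, c(x, n), c(x, n))

def blue(x, n):
    return "#%02X%02X%02X" % (c(x, n), c(x, n), 255)

def posx(X, x, n):
    """Horizontal position: shifted to the right as the level increases."""
    return X + 0.03 * rem(x, n) / n
```

With level = 40, the value n = 83 selects record 83 // 40 = 2 and colour level rem(83, 40) = 3, so every record is drawn 40 times, each time smaller, whiter, and slightly further towards the top right.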

    Now, what happens, if we take too big a value for the shift? This, actually, might lead to interesting effects, as shown in this graph, where droplets represent the data points.




    After having seen the simplest implementation, we should ask whether it is possible to add some decorations. E.g., whether it is possible to add a thin black edge to the symbols. It is relatively simple, as the following script shows. We only have to re-define some of our functions as follows
    size(x,n) = (rem(x,n) == 0 ? 3.3 : 3*(1-0.8*rem(x,n)/n))
    c(x,n) = floor(240.0*rem(x,n)/n)
    red(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n)))
    blue(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255))
    green(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n)))

    posx(X,x,n) = (rem(x,n) < 2 ? X : X + 0.03*rem(x,n)/n)
    posy(Y,x,n) = (rem(x,n) < 2 ? Y : Y + 0.03*rem(x,n)/n)
    All these functions do is to check whether we are plotting the first round, and if so, set the colour to black. There is a small difference in the shifts, for we do not move the circles, if they are in the first or the second round. The reason is obvious, as is the result

    OK, so we can plot bubbles, with or without black circumference, but we would also like to add a legend. Well, that is simple, in fact, nothing could be simpler. Just add the following three lines to our code

    set label 1 'Red bubbles' at 9,6 left
    set label 2 'Blue bubbles' at 9,5 left
    set label 3 'Green bubbles' at 9,4 left
    and the following six
    for [n=0:level-1] 'new_bubble1.dat' using (posx(8.5,n,level)):(posy(6,n,level)) \
    every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
    for [n=0:level-1] 'new_bubble2.dat' using (posx(8.5,n,level)):(posy(5,n,level)) \
    every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \
    for [n=0:level-1] 'new_bubble3.dat' using (posx(8.5,n,level)):(posy(4,n,level)) \
    every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)
    and we are done! All we do here is to plot our data files in a silly way: we plot a single point at (8.5,6), (8.5,5), and (8.5,4). The data in the files are not actually used here; we read the files for convenience's sake only. (This trick can also be used for the post from yesterday.) There, you have it!



    The other day, I needed a couple of curved arrows on my plot, so I started to work out a method to get what I wanted. This, however, turned out to be rather interesting, so I thought that I would share the details with you.

    First, we should just define what I mean by a curved arrow. Perhaps, the easiest way to define it is to show a plot, similar to this



    In gnuplot, when one wants an arrow, one can invoke the following command:
    set arrow from 0,0 to 1,1
    or something similar. This will produce a straight arrow from (0,0) to (1,1). But what if we wanted to have an arrow which is not straight? Well, in this case, we set a very short arrow, and draw a curve separately. The key to this is to set the arrow in such a way that it is tangential to the curve at the end point. It is easy to see that the following script would do just that

    reset
    unset key
    eps = 0.001

    set style arrow 1 head filled size screen 0.03, 15, 45 lt -1
    cut(x,x1,x3) = ((x >= x1 && x <= x1 + (1.0-eps)*(x3-x1)) ? 1.0 : 1/0)
    f(x) = 0.5+(x-1)*(x-1.2)*(x-1.4)

    x1 = 0.5
    x3 = 1.95
    new_x = x1 + (1.0-eps)*(x3-x1)
    set arrow from new_x, f(new_x) to x3,f(x3) as 1
    plot [0:3] sin(x) with point ps 1 pt 6, f(x)*cut(x,x1,x3) with line lt -1

    First, we define an arrow style that we will use later. The arrow will be 0.03 screen sizes big, and the two angles determining the shape of the head are 15, and 45 degrees, respectively. Finally, we stipulate that the arrow be black, i.e. linetype -1. Then we define a window function, cut, which depends on the two end points, x1, and x3 (the reason for 3 will become clear soon), and the curve, f(x). In our plot, beyond what we actually want to plot, we will also plot f(x), but only between x1, and new_x, where new_x is a bit off with respect to the second end point. The degree of "bitness" is given by eps, which was defined at the beginning. However, before we actually plot anything, we have got to set the arrow, between new_x, f(new_x), and x3, f(x3). This construction ensures that the arrow is tangential to the curve.
    At this point, we are ready to plot, which we actually execute in the next, and last line.

    What we have created is great, but there are problems: first, we have to define our function, f(x), beforehand, we have to set the arrows by hand, and we also have to add the appropriate lines to our plot command. Quite tedious. There has got to be a better way!

    For the sake of example, let us suppose that we want a curved arrow that, say, connects (0,0) and (1,1) via a parabola that passes through the point (0.5, 0.25). If we are really pressed for it, we could do the following: First, we have to figure out the parameters of our parabola. In this case, it is quite easy, for it is nothing but x*x. Then we would draw a parabola between (0,0), and (0.99, 0.9801), and then draw an arrow from (0.99, 0.9801) to (1,1).

    First, let us see, how we figure out the parameters of our parabola! We have two end points, and a "control" point, i.e., we have to solve the following set of equations
    y1 = a*x1*x1 + b*x1 + c
    y2 = a*x2*x2 + b*x2 + c
    y3 = a*x3*x3 + b*x3 + c
    for the unknown a, b, and c. You can convince yourself that the following will do

    denom(x1, x2, x3) = x1*x1*(x2-x3) + x1*(x3*x3-x2*x2) + x2*x3*(x2-x3)
    A(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*y1 + (x3-x1)*y2 + (x1-x2)*y3 ) / denom(x1,x2,x3)
    B(x1,y1,x2,y2,x3,y3) = ( (x3*x3-x2*x2)*y1 + (x1*x1-x3*x3)*y2 + (x2*x2-x1*x1)*y3 ) / denom(x1,x2,x3)
    C(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*x2*x3*y1 + (x3-x1)*x1*x3*y2 + (x1-x2)*x1*x2*y3 ) / denom(x1,x2,x3)
    a = A(x1,y1,x2,y2,x3,y3)
    b = B(x1,y1,x2,y2,x3,y3)
    c = C(x1,y1,x2,y2,x3,y3)
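If you would rather not take these formulae on trust, they can be checked numerically. The following is a minimal sketch in Python, not gnuplot; coefficients and arrow_tail are hypothetical helper names, and eps = 0.01 is simply the value that reproduces the (0.99, 0.9801) of the example above. The functions evaluate the same expressions as denom, A, B and C, and compute the point from which the short straight arrow starts.

```python
def denom(x1, x2, x3):
    return x1 * x1 * (x2 - x3) + x1 * (x3 * x3 - x2 * x2) + x2 * x3 * (x2 - x3)

def coefficients(x1, y1, x2, y2, x3, y3):
    """Coefficients (a, b, c) of the parabola through the three points."""
    d = denom(x1, x2, x3)
    a = ((x2 - x3) * y1 + (x3 - x1) * y2 + (x1 - x2) * y3) / d
    b = ((x3 * x3 - x2 * x2) * y1 + (x1 * x1 - x3 * x3) * y2
         + (x2 * x2 - x1 * x1) * y3) / d
    c = ((x2 - x3) * x2 * x3 * y1 + (x3 - x1) * x1 * x3 * y2
         + (x1 - x2) * x1 * x2 * y3) / d
    return a, b, c

def arrow_tail(x1, y1, x2, y2, x3, y3, eps=0.01):
    """Start point of the short arrow that ends at (x3, y3)."""
    a, b, c = coefficients(x1, y1, x2, y2, x3, y3)
    new_x = x1 + (1.0 - eps) * (x3 - x1)
    return new_x, a * new_x * new_x + b * new_x + c
```

For the end points (0,0) and (1,1) with the control point (0.5, 0.25), coefficients returns a = 1 and b = c = 0 up to rounding, i.e., the parabola x*x. Note that the three x values must be distinct, otherwise denom vanishes.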

    We have done most of the hard work, the only thing that remains is how we "automate" this whole machinery, i.e., what we do, if we have several arrows that we want to set. Again, as so many times in the past, we will utilise this new notion of function definition: the fact that a function is not only an x -> f(x) mapping, but this mapping, and a set of possibly unrelated instructions. What we will do is to define a "function" that sets our arrows, and, as the supplementary instruction, augments the plot command accordingly. First, let us take the following function definition


    arrow(x1,y1,x2,y2,x3,y3) = (new_x = x1 + (1.0-eps)*(x3-x1), \
    a = A(x1,y1,x2,y2,x3,y3), b = B(x1,y1,x2,y2,x3,y3), c = C(x1,y1,x2,y2,x3,y3), \
    PLOT = PLOT.sprintf(", cut(x,%f,%f)*(%f*x*x+%f*x+%f) with lines lt -1", x1, x3, a, b, c), \
    ARROW.sprintf("set arrow from %f, %f to %f,%f as 1; ", new_x, a*new_x*new_x + b*new_x + c, x3, y3))
    and try to understand what it does! For a start, it takes 6 arguments, which are nothing but the coordinates of the end points, and the control point. Then, it defines new_x, which we have already seen in the first example. In the next step, based on the 6 input arguments, it calculates the three parameters of our parabola, and in the next line, adds the plot of this parabola to a string called PLOT. When adding to PLOT, we simply use the sprintf function. In the last line, we concatenate a string called ARROW, and another one, produced by another sprintf. It is easy to see that this sprintf returns the definition of an arrow between new_x, f(new_x), and x3, y3. We should also note that this line is the last line, which consequently means that whatever happens here is returned.

    At this point we are really done, we only have to "populate" our plot. The full script takes on the form
    reset
    unset key
    eps = 0.01

    set style arrow 1 head filled size screen 0.03, 15, 45 lt -1

    cut(x,x1,x3) = ((x >= x1 && x <= x1 + (1.0-eps)*(x3-x1)) ? 1.0 : 1/0)

    denom(x1, x2, x3) = x1*x1*(x2-x3) + x1*(x3*x3-x2*x2) + x2*x3*(x2-x3)
    A(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*y1 + (x3-x1)*y2 + (x1-x2)*y3 ) / denom(x1,x2,x3)
    B(x1,y1,x2,y2,x3,y3) = ( (x3*x3-x2*x2)*y1 + (x1*x1-x3*x3)*y2 + (x2*x2-x1*x1)*y3 ) / denom(x1,x2,x3)
    C(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*x2*x3*y1 + (x3-x1)*x1*x3*y2 + (x1-x2)*x1*x2*y3 ) / denom(x1,x2,x3)

    ARROW = ""
    PLOT = "p [0:3] sin(x) w p ps 1 pt 6"
    arrow(x1,y1,x2,y2,x3,y3) = (new_x = x1 + (1.0-eps)*(x3-x1), \
    a = A(x1,y1,x2,y2,x3,y3), b = B(x1,y1,x2,y2,x3,y3), c = C(x1,y1,x2,y2,x3,y3), \
    PLOT = PLOT.sprintf(", cut(x,%f,%f)*(%f*x*x+%f*x+%f) with lines lt -1", x1, x3, a, b, c), \
    ARROW.sprintf("set arrow from %f, %f to %f,%f as 1; ", new_x, a*new_x*new_x + b*new_x + c, x3, y3))

    ARROW = arrow(0,0,1,1.5,pi/2,1.03)
    ARROW = arrow(0,0,1,0.3,pi/2,0.97)
    eval(ARROW)
    eval(PLOT)
    which would result in the graph shown here:


    Now it is clear what PLOT was: it is nothing but the actual plot that we want to have. This is the string to which we concatenate our parabolae, one by one, every time we define a new arrow. After we have defined all our arrows, we have two strings, ARROW, and PLOT. As such, they are no good, they will become instructions only when we evaluate them. That is what we do in the last two lines.

    I would like to point out that my main reason for posting this was not that it can be used for creating curved arrows, but that this method is quite general. First, we can add to the plot, if that is needed, without having to keep track of all the tiny details. Second, the set command can be "fooled" by using the sprintf function. With the help of the string augmentation and the eval command, we can actually use parameters in our set instruction very efficiently.

    Well, this is for today. I am waiting for suggestions as to what we should discuss next time. Cheers,
    Zoltán

  • 05/09/10--11:15: Ministry of Silly Walks
  • In a comment last week, someone asked whether it was possible to draw a Marimekko plot, i.e., a histogram in which both directions contain relevant information. In other words, the question is whether we could draw a square, and populate it with rectangles in such a way that the area of the rectangles is read from a file. I thought that it should be possible, but on the way, I also found out a couple of interesting things. If you keep reading, you will see it for yourself, how we can define a vector, whose value is taken from a data file, and how we can manipulate the elements of that vector. In some sense, this is similar to the trick that we made use of, when we generated a parametric plot from a file. But we can do things in a slightly better fashion.

    First, we will need a data file, and for the sake of conformity with the question of the commenter, I will just use this

    "" France Germany Japan Nauru
    Defense 9163 4857 2648 9437
    Agriculture 3547 5378 1831 1948
    Education 7722 7445 731 9822
    Industry 4837 147 3449 6111
    "Silly walks" 3441 7297 308 7386

    (We can already deduce that with the sole exception of Japan, countries spend a large chunk of their GDP on silly walks.)

    Now, our first attempt could be this:

    reset

    file = 'marimekko.dat'

    set style data histograms
    set style histogram columnstacked
    set style fill solid border -1
    set boxwidth 1.0

    set xrange [-1:5]
    set yrange [0:5e4]

    plot newhistogram at 0, file u 2 title col, \
    newhistogram at 1, file u 3 title col, \
    newhistogram at 2, file u 4 title col, \
    newhistogram at 3, file u 5 title col
    and it should be quite obvious that this is not what we want:

    It just falls short of our expectations in every respect: There is a gap between the columns, the colours are not consistent, and the width of the columns is equal. The only reasonable thing that happened here is that we can actually set the position of the columns. This will become important later on.

    Let us try to improve on the figure, step by step. First, we will place the histograms in a multiplot, for that will make life a lot easier: this is our only way of manipulating the column width during the plot. In this spirit, our second script will be this:

    reset

    file = 'marimekko.dat'

    set style data histograms
    set style histogram columnstacked
    set style fill solid border -1
    set boxwidth 1.0

    set xrange [-1:5]
    set yrange [0:5e4]
    set multiplot

    # f is not defined anywhere in the post; the identity is assumed here
    f(x) = x
    plot newhistogram at 0, file u (f($2)) title col
    plot newhistogram at 1, file u (f($3)) title col
    plot newhistogram at 2, file u (f($4)) title col
    set boxwidth 0.3
    plot newhistogram at 2.65, file u (f($5)) title col
    unset multiplot

    This is somewhat better, for the colours are now consistent, and we also see that the last column has a different width. We also see how the positioning works: the right hand side of Japan's column is at 2.5, and since the width of Nauru's column is 0.3, its centre has got to be shifted by 0.15 with respect to 2.5. That adds up to 2.65. However, if we watch closely, we will also notice that the ytics and labels are drawn four times; after all, we have four plots. What if we unset the ytics after the first plot? Well, we would end up with this

    Rather upsetting! The problem is that once the tics are unset, the size of the figure changes, so we can no longer count on the plots' proper alignment. However, there is an easy remedy for this: all we have to do is not to unset the ytics, but to set them invisible. That is, we can do
    plot newhistogram at 0, file u (f($2)) title col
    set ytics ("      " 30000)
    plot newhistogram at 1, file u (f($3)) title col
    plot newhistogram at 2, file u (f($4)) title col
    set boxwidth 0.3
    plot newhistogram at 2.65, file u (f($5)) title col
    where we have 6 white spaces in the quote. You might wonder why on Earth 6. Well, the answer is that the label "30000" is actually " 30000", which takes up 6 characters' space. With this trick, we get

    We have already achieved quite a lot, and slowly, but surely, we are getting to our goal. Just do not despair!

    The next thing that we would need is proper scaling of the columns: we want all of them to be between 0 and 100 (%), i.e., we would have to sum all columns first, and then divide the values by the sum. And that is the snag: we have four columns, and we have to do the summing for each column independently, and before the final plots. Otherwise, our multiplot will be messed up. And this is where the array comes in handy: if we just had an array, and could retrieve values from it, we would be saved. And of course, we can do this. Let us take a small detour!

    If we think about it, the array (5, 4, 6, 7, 8) is nothing but a finite series: its first element is 5, second element is 4, and so on. But we could also look at the series as a function: a mapping from the set of natural numbers to, well, to anything. In the example above, to natural numbers. It doesn't matter. My point is that an array is a function, a function for which h(0) = 5, h(1) = 4, h(2) = 6, h(3) = 7, and h(4) = 8. As long as this is true, we do not care what h(1.1) is. We need the function's values only at integer numbers. Then the only question is how we could define this function "on the fly". Being a physicist, and a lazy man, I would propose the following:
    g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
    h(x) = 5 * g(x,0) + 4 * g(x,1) + 6 * g(x,2) + 7 * g(x,3) + 8 * g(x,4)
    g(x,a) is (apart from some numerical factors) nothing but a very primitive representation of a Dirac-delta, centred on 'a'. You can convince yourself that h(x) defined in this way fulfils the requirements above.
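The same construction can be written down in a few lines of Python, if only to convince ourselves that it really behaves as an array. This is an illustrative sketch, not gnuplot code, and make_h is a name made up here; it builds h from a list of values in the same way as the sum of g terms above.

```python
def g(x, a):
    """Primitive window centred on a: 1 close to a, 0 elsewhere."""
    return 1 if abs(x - a) < 0.1 else 0

def make_h(values):
    """Return h with h(0) = values[0], h(1) = values[1], and so on."""
    return lambda x: sum(v * g(x, i) for i, v in enumerate(values))

h = make_h([5, 4, 6, 7, 8])
```

At integer arguments, h returns the corresponding element, while between the windows, e.g., at 1.5, it is simply zero, which does no harm, since we never ask for those values.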

    After this digression, let us see what we can do with this, and issue the following commands!
    ARRAY = "h(x) = 0"
    array(x, counter) = ( ARRAY.sprintf(" + %f*g(x,%d)", x/100.0, counter+1) )
    ff(x, counter) = (($0 > 0 ? ARRAY = array(x, counter-1) : 1), x)
    plot 'marimekko.dat' using 0:(ff($2, 2))
    At this point, the variable ARRAY should look something like this
    ARRAY = "h(x) = 0 + 91.630000*g(x,2) + 35.470000*g(x,2) + 77.220000*g(x,2) + 48.370000*g(x,2) + 34.410000*g(x,2)"
    and if we evaluate it with eval, h(2) returns the sum of the numbers in the second column (apart from a factor of 100, of course, since each value is divided by 100.0 when the string is built). Note that in order to skip the first line, which is the header, we have to use the condition
    ($0 > 0 ? ARRAY = array(x, counter-1) : 1)
    which updates ARRAY only from the second record onwards ($0 is the record number, and it starts from 0). Also note that in order to get the sums of all columns, all we have to do is to call this plot once for each column, passing the column number as the counter. In the light of this, our next script could be this
    reset

    file = 'marimekko.dat'
    col = 4

    g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
    ARRAY = "h(x) = 0"
    array(x, counter) = ( ARRAY.sprintf(" + %f*g(x,%d)", x/100.0, counter) )

    ff(x, counter) = (($0 > 0 ? ARRAY = array(x, counter) : 1), x)
    plot for [i=2:col+1] 'marimekko.dat' using 0:(ff(column(i), i))

    set xrange [-1:3]
    set yrange [0:110]

    eval(ARRAY);

    set style data histograms
    set style histogram columnstacked
    set style fill solid border -1
    set boxwidth 1.0

    set multiplot
    plot newhistogram at 0, file u ($2/h(2)) title col
    set ytics ("" 20)
    plot newhistogram at 1, file u ($3/h(3)) title col
    plot newhistogram at 2, file u ($4/h(4)) title col
    set boxwidth 0.3
    plot newhistogram at 2.65, file u ($5/h(5)) title col
    unset multiplot
    and this results in this figure
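
    The string-building step used in the script above can also be tried without a data file. This is only a sketch: the three values (0.5, 0.25, 1.5) and the indices are made up for illustration, and the terms are appended by hand instead of from within a plot.

```gnuplot
g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)

# Start with an "empty" function definition, stored as a string
ARRAY = "h(x) = 0"

# Append one term per value; the values here are hypothetical
ARRAY = ARRAY.sprintf(" + %f*g(x,%d)", 0.5, 2)
ARRAY = ARRAY.sprintf(" + %f*g(x,%d)", 0.25, 2)
ARRAY = ARRAY.sprintf(" + %f*g(x,%d)", 1.5, 3)

# h(x) = 0 + 0.500000*g(x,2) + 0.250000*g(x,2) + 1.500000*g(x,3)
print ARRAY

# Turn the string into an actual function definition
eval ARRAY

print h(2)   # 0.75, the sum of the terms tagged with index 2
print h(3)   # 1.5
```

    Terms that carry the same index simply add up, and this is what turns h into a per-column sum.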

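    So, we are almost there: the columns are rescaled, and placed neatly next to each other. The only missing ingredient is the setting of the widths. But that is really easy: we only have to determine the grand total (ff can accumulate it while it builds ARRAY), and then set the width of each column to its sum divided by the grand total. Since the boxes are centred on their positions, each new position is the previous one shifted by half the width of the previous column plus half the width of the current one. Our script can, then, be modified as follows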
    reset
    file = 'marimekko.dat'
    col = 4

    total = 0.0
    g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
    ARRAY = "h(x) = 0"
    array(x, counter) = ( ARRAY.sprintf(" + %f*g(x,%d)", x/100.0, counter+1) )

    ff(x, counter) = (($0 > 0 ? ARRAY = array(x, counter-1) : 1), total = total + x, x)
    plot for [i=2:col+1] 'marimekko.dat' using 0:(ff(column(i), i))

    set xrange [-0.3:1]
    set yrange [0:110]
    eval(ARRAY);

    set style data histograms
    set style histogram columnstacked
    set style fill solid border -1

    total = total / 100.0
    position = 0.0

    set multiplot
    set boxwidth h(2)/total
    plot newhistogram at position, file u ($2/h(2)) title col

    set ytics ("" 20)
    set boxwidth h(3)/total; position = position + (h(2)+h(3))/total/2.0
    plot newhistogram at position, file u ($3/h(3)) title col

    set boxwidth h(4)/total; position = position + (h(3)+h(4))/total/2.0
    plot newhistogram at position, file u ($4/h(4)) title col

    set boxwidth h(5)/total; position = position + (h(4)+h(5))/total/2.0
    plot newhistogram at position, file u ($5/h(5)) title col
    unset multiplot
    and this is what we wanted!
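
    To see how the widths and positions come about, here is a small sketch of the same arithmetic without any plotting. The column sums in h and the value of total are made up for illustration, and the do for loop assumes gnuplot 4.6 or newer.

```gnuplot
g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)

# Hypothetical column sums for columns 2..5 (in the script, eval(ARRAY) defines h)
h(x) = 2.0*g(x,2) + 3.0*g(x,3) + 1.0*g(x,4) + 4.0*g(x,5)

# Hypothetical grand total, on the same scale as h
total = 10.0

position = 0.0
do for [i=2:5] {
    # Shift by half of the previous width plus half of the current width
    if (i > 2) { position = position + (h(i-1) + h(i))/total/2.0 }
    print sprintf("column %d: width %.2f, centre %.2f", i, h(i)/total, position)
}
# column 2: width 0.20, centre 0.00
# column 3: width 0.30, centre 0.25
# column 4: width 0.10, centre 0.45
# column 5: width 0.40, centre 0.70
```

    The widths add up to 1, and each box begins exactly where the previous one ends.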

    Adding labels to the rectangles is relatively easy: we could do the following

    plot file using (position):(l($5)):5 with labels tc rgb "#ffffff"
    where l(x) is a function that keeps track of the previous values of the column, and adds them as new values are processed. The definition of this function should be trivial.
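
    For completeness, one possible definition of l(x) is sketched below; it is only one way of doing it. The names running and scale are my own here: running holds the top of the stack so far, while scale stands for the column sum that the heights were divided by, i.e., h(5) in the script above. The function returns the middle of the current rectangle.

```gnuplot
# Hypothetical stand-in for h(5), the scaled sum of the column
scale = 0.5

# Top of the stack so far; reset it to zero before each labels plot
running = 0.0

# Add the height of the current rectangle, then step back by half of it
l(x) = (running = running + x/scale, running - 0.5*x/scale)

print l(20.0)   # 20.0: first rectangle spans 0..40
print l(30.0)   # 70.0: second rectangle spans 40..100
```

    Just as in ff, the expressions separated by the comma are evaluated one after the other, and the value of the last one is returned.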

    The last thing that I would add here is that by using macros, we can tidy up the script: we would no longer need all those long and repetitive lines. In fact, we could also add another instruction to our 'ff' function, which would generate the plot command. The advantage is that we do not have to repeat the plot commands four times: we simply put that in our for loop, and then evaluate the resulting string. I discussed this trick in my last post, so, if you are interested in the details, you can look it up there.
