## Articles on this Page

- 10/12/09--12:13: _Putting figures int...
- 10/13/09--13:12: _The shiny histogram...
- 10/16/09--12:34: _A long shadow is ca...
- 11/09/09--12:21: _Patching gnuplot
- 11/22/09--00:54: _Some basic statisti...
- 11/22/09--11:04: _Update
- 11/29/09--10:23: _Broken histograms
- 12/01/09--11:07: _Restricting fit par...
- 12/01/09--12:17: _New version of gnup...
- 12/03/09--12:15: _Defining some new p...
- 01/10/10--04:59: _Plot iterations and...
- 01/17/10--13:38: _Further new feature...
- 02/10/10--13:23: _The map, the inline...
- 02/24/10--15:01: _Plotting in 6 dimen...
- 02/25/10--13:38: _Parametric plot fro...
- 02/26/10--13:13: _Phong on histograms...
- 03/16/10--14:02: _Defining new symbols
- 03/16/10--16:01: _Bubble plots
- 04/26/10--12:01: _Bending the arrows ...
- 05/09/10--11:15: _Ministry of Silly W...

As I promised yesterday, this time we will try to skew our figure, as if it were in 3D and we were looking at it from an angle. I must admit that this is something that one would not put in a publication, unless it is a poster or presentation, perhaps. In those cases, however, it might lend a refreshing new, errr..., perspective to our data. Before diving into the script, here is the figure, so you can decide whether you want to read on :)

Then, here is our script that was responsible for the figure

reset

xmin = 0; xmax = 10; zmin = -0.4; zmax = 0.75

set view 60, 30

unset key; unset colorbox

set border 1+16+128+1024

unset ytics; set xtics out nomirror; set ticslevel 0

set yrange [0:-0.1]

set zrange [zmin:zmax]

set grid front; unset grid

set xtics xmin,2,xmax-1

set ztics zmin,0.2,zmax

f(x) = exp(-x/4.0)*sin(x)

c(x) = exp(-(x-xmax/3.0)*(x-xmax/3.0)/1.0)

set xlabel 'Time [a.u.]'

set label 2 'Amplitude [a.u.]' at graph -0.35, 0.3 rotate by 90

set parametric

set iso 3, 3

set urange [xmin:xmax]

set vrange [zmin:zmax]

set table 'perspective1.dat'

splot u, 0, v

unset table

set urange [xmin+0.2:xmax-0.2]

set table 'perspective2.dat'

splot u, 0, f(u)

unset table

unset parametric

set size 1.4, 1.4

set palette defined (0 "#7b6cff", 1 "#eeeeff")

splot 'perspective1.dat' u 1:2:3:(c($1)) w pm3d, \
'perspective2.dat' u ($1+0.05):2:($3-0.02) w l lw 6 lc rgb "#888888", \
'' w l lw 6 lt 1

The trick that we use here is to plot in 3D, and take off all those elements of the figure that we do not actually need. In the beginning, we define a couple of variables, in order to make our life a bit easier. Then we unset the colorbox and the key, and set only those borders that we want to see. If you are interested in how I came up with those numbers in the definition of the border, just issue the command

?border

in the gnuplot prompt. After this, we keep unsetting things, and define our ranges. The next noteworthy command is

set grid front; unset grid

At first, this seems silly, but the reason is real: in the figure we have a background, which would obscure our tic marks. The way out of this problem is to push the tic marks to the front, which is achieved by setting the grid to the front. Since we do not actually need the grid, we unset it, but the setting, its front position, is still there. The tic marks inherit the position of the grid, and thus, they will be in the forefront.
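For readers who would rather not look up the border numbers, here is a small Python sketch that decodes the value 1+16+128+1024 used in the script. The bit names are the ones I recall from the splot column of gnuplot's `help border` table, so treat them as an assumption and check them against your own version; the helper name `decode_border` is made up for this illustration.

```python
# Names of the splot border bits, as recalled from gnuplot's "help border"
# (an assumption: verify against the help text of your gnuplot version).
SPLOT_BORDER_BITS = {
    1: "bottom left front",
    2: "bottom left back",
    4: "bottom right front",
    8: "bottom right back",
    16: "left vertical",
    32: "back vertical",
    64: "right vertical",
    128: "front vertical",
    256: "top left back",
    512: "top right back",
    1024: "top left front",
    2048: "top right front",
}


def decode_border(value):
    """Return the names of all border edges switched on in 'value'."""
    return [name for bit, name in SPLOT_BORDER_BITS.items() if value & bit]


if __name__ == "__main__":
    for edge in decode_border(1 + 16 + 128 + 1024):
        print(edge)
```

If the table is right, the four edges that remain form the frame of the front face, which is the only part of the 3D box we want to see.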

The function definitions are f(x), the function that we want to plot, and c(x), which will determine the colouring of our background. Modify them accordingly.

We set the zlabel by hand, because it might not be possible to turn it by 90 degrees otherwise. (Not all terminals would support it.) Then we plot the background, and the function, both to a file, so that it will be easier to colour them in the next step, where we plot the real thing. Note that in the plot of the background, we use four columns, while there are only 3 in the data file. The fourth one determines the colour as the position on the graph. The colour is given by the function c(x), and the palette, which we defined one line earlier. You should change these two things, if you are not satisfied with the background you get. Finally, we plot the function twice: once in gray, a bit shifted to the right and down, and for the second time, in red. In this way, we add a shadow to our curve. If you want to improve the shadow, you should look at my post from the 30th August.

I should also add that we discussed a vertical skew. In case you want to skew the figure horizontally, all you have got to do is to plot on the x-y plane instead of x-z.

Do you remember those shiny histograms that we discussed some long, long time ago, at the beginning of the summer? And do you also remember how much of a hassle it was to create them, since we relied on an external gawk script, and we had to build our rectangles, one by one? And have you ever wondered what all that fuss was about and whether there was an easier solution? A one-liner, perhaps? If the answer to these questions lies in the affirmative, go no further! We will discuss a method of making those histograms without an external script, only with legal gnuplot commands, and in 5 lines. I understand that 5 lines is just 4 lines longer than one would expect from a one-liner, but on the other hand, three out of those 5 lines are equivalent, so I don't feel so bad about this any more :)

OK, so here is our figure

here is our data file, which we will call 'hist.dat'

1 2 3

2 2 2

4 10 3

5 1 4

5 6 2

and here are our scripts, 'hist.gnu',

reset

unset key; set xtics nomirror; set ytics nomirror; set border front;

div=1.1; bw = 0.9; h=1.0; BW=0.9; wd=10; LIMIT=255-wd; white = 0

red = "#080000"; green = "#000800"; blue = "#000008"

set auto x

set yrange [0:11]

set style data histogram

set style histogram cluster gap 1

set style fill solid

set boxwidth bw

set multiplot

plot 'hist.dat' u 1 lc rgb red, '' u 2 lc rgb green, '' u 3 lc rgb blue

unset border; set xtics format ""; set ytics format ""; set ylabel ""

call 'hist_r.gnu'

unset multiplot

and 'hist_r.gnu'

# white and LIMIT are integers, so 1.0* is needed to avoid integer division

bw=BW*cos(1.0*white/LIMIT*pi/2.0); set boxwidth bw; white=white+wd

red = sprintf("#%02X%02X%02X", 128+white/2, white, white);

green = sprintf("#%02X%02X%02X", white, 128+white/2, white);

blue = sprintf("#%02X%02X%02X", white, white, 128+white/2);

rep

if(white<LIMIT) reread

Then let us see what is happening here! At the beginning, we define various variables, most notably, white, red, green, and blue. The rest up to the first plot command is nothing but setting up the figure: we define the range, tell gnuplot to treat our data as histogram, set the width of the bars, and finally, set multiplot.

There is nothing exciting in the first plot, except, that we specify the colour of the bars as

plot 'hist.dat' u 1 lc rgb red, '' u 2 lc rgb green, '' u 3 lc rgb blue

The strings red, green, and blue were defined at the beginning of our first script, thus, we learn here that gnuplot will accept any defined (and valid) string as the specifier of the colour. After our first plot, we unset the border, re-set the format of the xtic and ytic to empty, and do likewise with the ylabel. Should there be an xlabel, we would have to do the same there. Having done this, we call our second script, which we will dissect now. This is really nothing but a 'for' loop, that we have discussed a couple of times before. In fact, quite a few times. The first two commands re-set the widths of the bars in the next plot. Note that I set the width in such a way that it would draw the outline of a circle as we step through the values of white. This is what we increment next, mind you!

The next three lines are basically identical: we re-define the colours, using a sprintf command in each step. If you recall how the RGB colours are defined, we have to create a string that looks like

#00FF00

say. This is what our sprintf command will do, returning a string of this form that depends on the value of 'white'. There are some small nuances in the corresponding colour channel of red, green, and blue, respectively, namely, that the base colour for red was

#080000

and we want to linearly interpolate between this colour, and white,

#FFFFFF

so we have to apply the relevant linear function, but there is nothing beyond this. Obviously, if you are unhappy with the colour scheme that I have (I know full well that these are not the best colours...), this is the place where you would have to tamper with the script. When we are done with re-defining the widths and the colours, we simply replot our histogram, and do that as long as the value of white is smaller than the limit that we set at the beginning, which is 245 in this particular case. In most cases, we do not need this many plots, by the way! For a raster plot, 10-12 steps, for a vector format, something like 15-16 steps should be more than enough.
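To see what one pass of 'hist_r.gnu' produces, the following Python sketch mirrors its arithmetic. It is only an illustration: the function name `step` is made up, the constants are copied from 'hist.gnu', and the ratio white/LIMIT is evaluated as a floating-point number, which is what the cosine needs in order to trace out a quarter circle.

```python
import math

# Constants copied from 'hist.gnu'
BW = 0.9
WD = 10
LIMIT = 255 - WD


def step(white):
    """Mimic one pass of 'hist_r.gnu'.

    Returns the box width, the incremented value of white, and the three
    colour strings built from it.
    """
    bw = BW * math.cos(white / LIMIT * math.pi / 2.0)  # floating-point ratio
    white += WD
    strong = 128 + white // 2  # integer division, as in gnuplot's 128+white/2
    red = "#%02X%02X%02X" % (strong, white, white)
    green = "#%02X%02X%02X" % (white, strong, white)
    blue = "#%02X%02X%02X" % (white, white, strong)
    return bw, white, red, green, blue


if __name__ == "__main__":
    white = 0
    while True:
        bw, white, red, green, blue = step(white)
        print("%.3f %s %s %s" % (bw, red, green, blue))
        if not white < LIMIT:
            break
```

The bars become narrower and paler with every pass, and the loop stops once white has reached the limit.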

At the end, we shouldn't forget about unsetting the multiplot. The only difficulty that I see with this figure is that it is not so straightforward to use a key. However, it is not terribly hard to come up with a solution for this problem: all we have to do is to put three vertical labels on the top of the first, second, and third column, indicating what they represent.

A couple of days ago, Cedric asked how we could add a shadow to a pm3d surface plot. In some sense, the problem turned out to be easier than I had expected, but I am not sure that I did what he actually meant. Anyway, we could use it as our working hypothesis, and refine it, if necessary.

The idea of putting "phong" on the surface was discussed ages ago, and I won't re-open that question here. For the shadow, we will just replot our surface (defined as z(x,y)) in a little bit strange way: instead of letting the x and y run through their corresponding ranges, we will restrict y to be equal to the maximum of the yrange. By doing so, we get this

reset

unset colorbox; unset key

set iso 2, 50

set parametric; set urange [0:1]; set vrange [0:1.2]

set table 'tb.tab'

splot u, v, 1

unset table

unset parametric

set iso 100, 100

set xrange [-3:3]

set yrange [-3:3]

set zrange [0:1.2]

set table 't.tab'

splot exp(-x*x-y*y)+0.1*rand(0)

unset table

set ticslevel 0

set pm3d

set cbrange [0:4]

f(x,y,z,a,b,s) = z*(exp(-(x-a)*(x-a)/s-(y-b)*(y-b)/s)/3.0+0.66)

set palette defined (0 "#ff2222", 1 "#ffeeee", 2 "#aaaaaa", 3 "#2222ff", 4 "#8888ff")

splot 'tb.tab' u ($1*6.0-3.0):(3):2:($2+3.0) w pm3d, \
'' u (-3):($1*6.0-3.0):2:($2+3.0) w pm3d, \
'' u ($1*6.0-3.0):($2*6.0-3.0):(0):(2.1) w pm3d, \
't.tab' u ($1+1.0):(3):($3*0.7):(2.1) w pm3d,\
'' u 1:2:3:(f($1,$2,$3,-0.5,-0.5,.8)) w pm3d

First, we create the data that will be our background, then some dummy data, which in this particular case will be a noisy Gaussian function in 2D. Then we move the surface to the bottom of the zrange, by setting the ticslevel to 0. The next step is the definition of the cbrange. We need to "overdefine" this, i.e., our cbrange is much larger than the actual data range. The reason for this is that in this way, we can use the same colour palette, and we do not have to resort to multiplot. The basic problem with multiplot is that, since we want to plot onto the same graph, we would have to re-set the border, tics, labels, and so on. By using the same plot, and same palette, we can avoid all this hassle. The price we pay is that our palette will be a bit more complicated, but we can live with that. Now, we have to define our palette, which will change between red and almost white for [0:1], and between blue, and almost white for [3:4]. We also defined a value at 2, which we will use for colouring the shadow, but the two main ranges are [0:1], and [3:4]. Note that we define disjoint ranges, thereby not running into trouble with the end points.

Finally, we plot our background, and function. Pay attention to plots like

splot 'tb.tab' u ($1*6.0-3.0):(3):2:($2+3.0) w pm3d

This will plot a plane between x [-3:3], and z [0:1] at y=3, with a colour given by the value of ($2+3.0). This is nothing but pushing all z values in the plot into the [3:4] colour range.

At the very end, we plot our function's shadow, namely,

't.tab' u ($1+1.0):(3):($3*0.7):(2.1) w pm3d

where the x values are shifted by 1.0 (so as to give the impression that the surface is lit from the (-1:-1) direction), the y values are all restricted to 3, which is the maximum of the yrange, and the z values are multiplied by 0.7, again for the same reason. Colouring is done by using one single value, 2.1. The very last step is to plot the surface itself, using the colouring given by f(...). If you want to have another direction for the lighting, or have a tighter focus, you should change the parameters in this function.
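To get a feeling for the colouring function f(...) before changing its parameters, here is a direct Python transcription of the gnuplot definition from the script. Nothing is added to the formula; only the two sample points in the usage part are arbitrary choices of mine.

```python
import math


def f(x, y, z, a, b, s):
    """Colour value of the surface: the height z, scaled by a Gaussian
    highlight centred at (a, b) with width parameter s."""
    highlight = math.exp(-(x - a) * (x - a) / s - (y - b) * (y - b) / s)
    return z * (highlight / 3.0 + 0.66)


if __name__ == "__main__":
    # At the centre of the highlight the scale factor is 1/3 + 0.66
    print(f(-0.5, -0.5, 1.0, -0.5, -0.5, 0.8))
    # Far away from the highlight the scale factor drops to 0.66
    print(f(3.0, 3.0, 1.0, -0.5, -0.5, 0.8))
```

So the height is multiplied by a factor between 0.66 and roughly 0.99: a and b move the bright spot, while a smaller s makes it tighter.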

One of the major advantages of open-source code is that if you would like to add some new features, you can easily do that. This applies to gnuplot, too, in fact, doing that does not require anything special. I have been quite inactive on this blog recently, and the reason is that Philipp Janert and I have been working on a patch to gnuplot.

The steps of patching gnuplot are described on gnuplot's main web page. There are a number of patches uploaded to gnuplot's patch tracker, on which quite a few new features, still in the development phase, are published. It is really worthwhile to try them out, first, to provide feedback as to what is useful and what is not, and second, to help the developers to find bugs and other glitches, like what the syntax of a command should be and so on.

Our patch is related to an old debate as to what gnuplot really is. In many a place, you will find the statement that "gnuplot is a plotting utility, not a statistical analysis package". I have nothing against this statement, however, when saying so, we have to tell what we mean by plotting. So, is plotting just placing a thousand dots at positions that represent our data? Or do we want more? E.g., throwing out data points that are unreasonably far from the mean. Or showing the mean, and the standard deviation? Or calling the reader's attention to some special points, like the minimum or the maximum in a data set? And many similar things. I believe plotting requires much more than just showing the measurement data: a plot makes sense only if we can point out what is to be pointed out. By the way, fitting falls into this category, and fitting has been an integral part of gnuplot for ages. The point being that the original statement (gnuplot is a plotting utility, not a statistical analysis package) has been wrong for a long time.

The patch that I mentioned above was announced yesterday on the gnuplot development mailing list and you can find the patch for the source and the documentation on patch tracker. I have put a couple of examples on my gnuplot web site under patch. You can also find the full documentation.

I would like to ask you, if you feel crafty and you can, to download the patch, try it, and let us know whether you find it useful, what else you think we could do with it, and so on. It would really help the development. Once the patch makes it to the main code, I will discuss various options on these pages.

Just to whet your appetite, here is a figure that you could very easily make with the new patch. (You can find the code on my web site.)

Many cheers,

Zoltán

The first thing I did with the statistical patch was to plot the mean, minimum and maximum of a data set. This can easily be done in the following way.

reset

# Produce some dummy data

set sample 200

set table 'stats2.dat'

plot [0:10] 0.5+rand(0)

unset table

set yrange [0:2]

unset key

# Retrieve statistical properties

plot 'stats2.dat' u 1:2

min_y = GPVAL_DATA_Y_MIN

max_y = GPVAL_DATA_Y_MAX

f(x) = mean_y

fit f(x) 'stats2.dat' u 1:2 via mean_y

# Plotting the minimum and maximum ranges with a shaded background

set label 1 gprintf("Minimum = %g", min_y) at 2, min_y-0.2

set label 2 gprintf("Maximum = %g", max_y) at 2, max_y+0.2

set label 3 gprintf("Mean = %g", mean_y) at 2, max_y+0.35

plot min_y with filledcurves y1=mean_y lt 1 lc rgb "#bbbbdd", \
max_y with filledcurves y1=mean_y lt 1 lc rgb "#bbddbb", \
'stats2.dat' u 1:2 w p pt 7 lt 1 ps 1

At the beginning of our script, we just produce some dummy data, and call a dummy plot. This plot does nothing but fills in the values of the minimum and maximum of the data set. Then we fit a constant function. You can convince yourself that this returns the average of the data set.

In the plotting section, we produce three labels that tell us something about the data set, and plot the data range with shaded region. Easy enough, and in just a couple of lines, we created this figure

Now, what should we do, if we were to calculate the standard deviation. Well, we know how to calculate the average, so we will use that. Here is the script:

reset

set sample 200

set table 'stats2.dat'

plot [0:10] 0.5+rand(0)

unset table

set yrange [0:2]

unset key

f(x) = mean_y

fit f(x) 'stats2.dat' u 1:2 via mean_y

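stddev_y = sqrt(FIT_WSSR / (FIT_NDF + 1 ))

# min_y is used for the label positions below; retrieve it with a dummy plot

plot 'stats2.dat' u 1:2

min_y = GPVAL_DATA_Y_MIN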

# Plotting the range of standard deviation with a shaded background

set label 1 gprintf("Mean = %g", mean_y) at 2, min_y-0.2

set label 2 gprintf("Standard deviation = %g", stddev_y) at 2, min_y-0.35

plot mean_y-stddev_y with filledcurves y1=mean_y lt 1 lc rgb "#bbbbdd", \
mean_y+stddev_y with filledcurves y1=mean_y lt 1 lc rgb "#bbbbdd", \
mean_y w l lt 3, 'stats2.dat' u 1:2 w p pt 7 lt 1 ps 1

What we utilise here is the fact that the fit command also sets a couple of variables. One of them is the (weighted) sum of the squared residuals, which is called FIT_WSSR, while another is the number of degrees of freedom, FIT_NDF. We know that the number of degrees of freedom is one less than the number of data points, for we fit a function with a single parameter. Therefore, if we take the square root of FIT_WSSR divided by the number of degrees of freedom plus one, we get the standard deviation. The rest of the plot is trivial, and this script results in the following graph:
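If you would like to convince yourself of this without gnuplot, here is a small Python sketch of the same bookkeeping. It assumes unit weights, so that the least-squares fit of a constant is simply the arithmetic mean; the function name `constant_fit_stats` and the sample data are made up for the illustration.

```python
import math
import statistics


def constant_fit_stats(ys):
    """Emulate what fitting f(x) = mean_y leaves behind (unit weights).

    Returns (mean_y, wssr, ndf, stddev_y), where wssr and ndf play the
    roles of FIT_WSSR and FIT_NDF.
    """
    n = len(ys)
    mean_y = sum(ys) / n                       # least-squares constant
    wssr = sum((y - mean_y) ** 2 for y in ys)  # sum of squared residuals
    ndf = n - 1                                # n points, one fit parameter
    stddev_y = math.sqrt(wssr / (ndf + 1))
    return mean_y, wssr, ndf, stddev_y


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    mean_y, wssr, ndf, stddev_y = constant_fit_stats(data)
    print(mean_y, wssr, ndf, stddev_y)
    # Compare with the population standard deviation of the library
    print(statistics.pstdev(data))
```

Note that dividing by FIT_NDF + 1, i.e., by the number of points, yields the population standard deviation; dividing by FIT_NDF alone would give the sample standard deviation instead.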

Incidentally, this can also be used for removing points that are very far from the mean. The following script takes out those data that are more than one standard deviation away from the mean.

reset

set sample 200

set table 'stats2.dat'

plot [0:10] 0.5+rand(0)

unset table

set yrange [0:2]

unset key

f(x) = mean_y

fit f(x) 'stats2.dat' u 1:2 via mean_y

stddev_y = sqrt(FIT_WSSR / (FIT_NDF + 1 ))

# min_y is used for the label positions below; retrieve it with a dummy plot

plot 'stats2.dat' u 1:2

min_y = GPVAL_DATA_Y_MIN

# Removing points based on the standard deviation

set label 1 gprintf("Mean = %g", mean_y) at 2, min_y-0.15

set label 2 gprintf("Sigma = %g", stddev_y) at 2, min_y-0.3

plot mean_y w l lt 3, mean_y+stddev_y w l lt 3, mean_y-stddev_y w l lt 3, \
'stats2.dat' u 1:(abs($2-mean_y) < stddev_y ? $2 : 1/0) w p pt 7 lt 1 ps 1

with the corresponding figure

Only the last line is relevant: we use the ternary operator to decide whether we want to keep the point. If the deviation from the mean is less than the standard deviation, we hold on to our data, otherwise, we replace it by 1/0, which is undefined, and gnuplot quietly ignores it. If you want to learn more about the working of the ternary operator, check out my post on the plotting of an inequality.
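The same selection rule can be written down in a few lines of Python, which might help if the ternary operator looks cryptic. This is only a sketch: None plays the role of gnuplot's undefined 1/0, and the helper name and the sample data are mine.

```python
import statistics


def keep_within_one_sigma(ys):
    """Replace every value that is at least one standard deviation away
    from the mean by None, and keep the rest unchanged."""
    mean_y = statistics.mean(ys)
    sigma = statistics.pstdev(ys)
    return [y if abs(y - mean_y) < sigma else None for y in ys]


if __name__ == "__main__":
    print(keep_within_one_sigma([1, 2, 3, 4, 10]))
```

For this toy data set the mean is 4 and the standard deviation is the square root of 10, so only the last value is thrown out.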

We have, thus, already found a solution for two of the problems addressed in the patch. What about the third one, adding arrows to the plot at the position of the minimum or maximum, say. We can do that, too. Here is the script:

reset

set sample 50

set table 'stats1.dat'

plot [0:10] 0.5+rand(0)

unset table

set yrange [0:2]

unset key

plot 'stats1.dat' u 1:2

min_y = GPVAL_DATA_Y_MIN

max_y = GPVAL_DATA_Y_MAX

plot 'stats1.dat' u ($2 == min_y ? $2 : 1/0):1

min_pos_x = GPVAL_DATA_Y_MIN

plot 'stats1.dat' u ($2 == max_y ? $2 : 1/0):1

max_pos_x = GPVAL_DATA_Y_MAX

# Automatically adding an arrow at a position that depends on the min/max

set arrow 1 from min_pos_x, min_y-0.2 to min_pos_x, min_y-0.02 lw 0.5

set arrow 2 from max_pos_x, max_y+0.2 to max_pos_x, max_y+0.02 lw 0.5

set label 1 'Minimum' at min_pos_x, min_y-0.3 centre

set label 2 'Maximum' at max_pos_x, max_y+0.3 centre

plot 'stats1.dat' u 1:2 w p pt 6

First, we retrieve the values of the minimum and the maximum by using a dummy plot. Having done that, we retrieve the positions of the minimum and maximum, by calling a dummy plot on the columns

plot 'stats1.dat' u ($2 == min_y ? $2 : 1/0):1

What this line does is substitute min_y, when the second column (whose minimum we extracted before) is equal to the minimum, and an undefined value, 1/0, otherwise. The minimum of this plot is nothing, but the x position of the first minimum. Likewise, had we assigned

min_pos_x = GPVAL_DATA_Y_MAX

that would have given the position of the last minimum of the data file. Obviously, these distinctions make sense only if there is more than one minimum or maximum. Knowing the x and y positions of the minimum and maximum, we can easily set the arrows. We, thus, have the following figure

Adding labels showing the value should not be a problem now.
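The logic of the two dummy plots used for locating the minimum can be mimicked in Python as follows. This is a sketch with made-up names and data: the list comprehension corresponds to the ternary expression in the using specifier, while min() and max() of the surviving x values correspond to reading GPVAL_DATA_Y_MIN and GPVAL_DATA_Y_MAX after the swapped-column plot.

```python
def minimum_positions(xs, ys):
    """Return the x positions of the first and the last minimum of ys."""
    min_y = min(ys)
    # Keep x only where y equals the minimum (everything else is "1/0")
    candidates = [x for x, y in zip(xs, ys) if y == min_y]
    return min(candidates), max(candidates)


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [0.7, 0.5, 0.9, 0.5, 0.8]
    print(minimum_positions(xs, ys))
```

The comparison with == is safe here for the same reason as in gnuplot: the minimum was taken from the very same numbers that we compare it with.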

Well, this is for today. Till next time!

I have had some time, so I moved all recent posts to their permanent place on my web page. I have "sexed up" the homepage a bit, so, hopefully, browsing will be a tad easier. Let me know, if there are any problems! (I know that there is a small glitch with the cascaded style sheets on IE6. IE8 should work without problems. Firefox 3.5 is also OK.) There is a zipped version of the complete site, if you want to read it off-line.

Comments should still be posted here, please!

Cheers,

Zoltán

Sometimes, a histogram is just a bit awkward, for the simple reason that one or two values are extremely high compared to the rest of the graph. In the case of a standard graph, we would use a broken axis to bring all points to the same order of magnitude. We can play the same trick with histograms; in fact, it is, in some sense, even simpler than the broken axes. All we have to do is to plot a thick line at the proper position in the proper colour. This is the graph that we are going to make today

Our data file, brokenhist.dat, is as follows

"Jan" 1 2

"Feb" 44 4

"Mar" 3 1

"Apr" 2 25

"May" 4 5

"June" 2 1

and here is our script:

reset

blue = "#babaff"

set xrange [-0.5:5.5]

set yrange [0:11]

set isosample 2, 100

set table 'brokenhist_b.dat'

splot 1-exp(-y/2.0)

unset table

unset colorbox

set border 1

set xtics rotate by 45 nomirror offset 0, -1

unset ytics

f(x) = (x < 6 ? x : (x < 30 ? x-17 : x-35) )

g(x) = (x < 6 ? 1/0 : 6)

set boxwidth 0.85

set style fill solid 0.8 border -1

set style data histograms

set palette defined (0 "#ffffff", 1 "#babaff")

plot 'brokenhist_b.dat' w ima,\
'brokenhist.dat' u (f($2)) t 'Red bars',\
'' u (f($3)) lc rgb "#00bb00" t 'Green bars', \
'' u 0:(f($2)):2 w labels center offset 0,0.5 t '',\
'' u ($0+0.25):(f($3)):3 w labels center offset 0,0.5 t '',\
'' u 0:(-1):xticlabel(1) w l t '', \
'' u ($0+0.12):(g($3)+0.12):(0.25):(0.25) w vectors lt -1 lc rgb blue lw 5 nohead t '', \
'' u ($0-0.12):(g($2)+0.12):(0.25):(0.25) w vectors lt -1 lc rgb blue lw 5 nohead t ''

OK, so let us look at the code! The first couple of lines are required only if you want to have some posh background. Likewise, you can drop the 'unset colorbox' line, when you have a white background. We set only the bottom axis, which means that we have to unset the ytics and set the xtics to nomirror. Then we have two helper functions. The definitions of these depend on where you want to have the break point in the histogram. In this particular case, I took 6, but it is arbitrary.

In the next step, we set the properties of the histogram, like the width of the columns, the fill style, and the data style. We also define a palette, but this is needed for the background only. For white background, you can skip this step. You can also skip the first plot, because that is nothing but our fancy background.

The actual plotting begins after this. We plot the two sets of columns, and also plot the data file with labels. The labels are placed at the top of each column (this is why we could do away with the y axis). We also 'plot' the axis labels, and finally, plot the two break points. Note that the plotting of the break points is automatic, once we have the definitions of the two helper functions. If you want to have a steeper cut, you could use, e.g.,

'' u ($0+0.12):(g($3)+0.12):(0.25):(0.5) w vectors lt -1 lc rgb blue lw 5 nohead t ''

which stretches the vectors in the vertical direction. Otherwise, we have finished the plot, there is nothing else to do.

I should point out here that, in order to have a seamless cut, we have to use a colour for the vectors that is identical to the background at that particular point. This implies that we could not have a gradient at y=6. The background colour is virtually constant at y=6 (cf. the definition of 'brokenhist_b.dat'). While it would not be impossible to implement a cut over a gradient, I believe it is probably not worth the trouble it involves.
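When you adapt the break point to your own data, it is worth checking by hand where the tall bars end up. The two helper functions of the script are transcribed to Python below; None stands in for the undefined 1/0, and the test values are taken from 'brokenhist.dat'.

```python
def f(x):
    """Compressed bar height: small values are kept, large ones are
    shifted down so that they fit into the yrange of [0:11]."""
    if x < 6:
        return x
    return x - 17 if x < 30 else x - 35


def g(x):
    """Height of the break mark: 6 for the bars that were cut,
    undefined (None) for the bars that were left alone."""
    return None if x < 6 else 6


if __name__ == "__main__":
    for value in (1, 44, 3, 2, 4, 25):
        print(value, f(value), g(value))
```

The two outliers, 44 and 25, are drawn with heights of 9 and 8, respectively, so both stay below the top of the yrange, and both receive a break mark at 6.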

Chris asked an interesting question today, namely, how one can restrict the fit range in gnuplot. What he meant by that was not the range of the data points (that is really easy, the syntax is the same as for plot), but the range of fit parameters. In some cases, it is a quite reasonable requirement, because we might know from somewhere that certain parameter values just do not make any sense. As it turns out, it is rather easy to achieve this in gnuplot. All we have to do is to come up with a function that restricts its values in the desired range.

After this interlude, let us see an example! We will create some data with the following gnuplot script:

reset

a=1.0; b=1.0; c=1.0

f(x) = a*exp(-(x-b)*(x-b)/c/c)

set table 'restrict.dat'

plot [-2:4] f(x)+0.1*(rand(0)-0.5)

unset table

We take a Gaussian, with some noise added to it. Naturally, we would like to fit a Gaussian to this data, and in particular, f(x). But what, if our model is such that 'a' must be in the range [1.1:2], 'b' must be in the range [0.1:0.9], and 'c' must be in the range [0.5:1.5]? We just use in our fit, instead of f(x), another function, g(x), say, of the form

g(x) = A(a)*exp(-(x-B(b))*(x-B(b))/C(c)/C(c))

where A(a), B(b), and C(c) take care of our restrictions. These functions are somewhat arbitrary, but for better or worse, I will take the following three arcus tangents

# Restrict a to the range of [1.1:2]

A(x) = (2-1.1)/pi*(atan(x)+pi/2)+1.1

# Restrict b to the range of [0.1:0.9]

B(x) = (0.9-0.1)/pi*(atan(x)+pi/2)+0.1

# Restrict c to the range of [0.5:1.5]

C(x) = (1.5-0.5)/pi*(atan(x)+pi/2)+0.5

which would look like this

The point here is that as x runs from negative infinity to positive infinity, A(x) runs between 1.1, and 2.0, and likewise for B(x), and C(x). Then the fit goes on as it would normally. Our script is, thus, the following in its full glory:

# Restrict a to the range of [1.1:2]

A(x) = (2-1.1)/pi*(atan(x)+pi/2)+1.1

# Restrict b to the range of [0.1:0.9]

B(x) = (0.9-0.1)/pi*(atan(x)+pi/2)+0.1

# Restrict c to the range of [0.5:1.5]

C(x) = (1.5-0.5)/pi*(atan(x)+pi/2)+0.5

a=0.0

b=0.5

c=0.9

fit f(x) 'restrict.dat' via a, b, c

g(x) = A(aa)*exp(-(x-B(bb))*(x-B(bb))/C(cc)/C(cc))

aa=1.5

bb=0.5

cc=0.9

fit g(x) 'restrict.dat' via aa, bb, cc

plot f(x), g(x), 'restrict.dat' w p pt 6

and it produces the following graph:

At this point, we should not forget that what we are interested in is not the value of 'aa', 'bb', or 'cc', but the value of 'a', 'b', and 'c'. This means that what we have to take is A(aa), B(bb), and C(cc). If you print the values of 'a', 'b', and 'c' from the fit to f(x), the value of 'aa', 'bb', and 'cc', and the value of A(aa), B(bb), and C(cc), we get the following results

gnuplot> pr a, b, c

0.984221984191135 0.996600824504231 1.00765240463672

gnuplot> pr aa, bb, cc

-3442408.91578921 1443864.45093385 -0.201236322474146

gnuplot> pr A(aa), B(bb), C(cc)

1.10000008322047 0.899999823634477 0.936788734177637

Obviously, the second print does not make too much sense, we have to compare the last one to the first one. We can here see that we got values in the ranges [1.1:2], [0.1:0.9], and [0.5:1.5], as we wanted to.
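The mapping can also be checked outside gnuplot. The Python sketch below writes the three arcus tangents as one generic helper (the name `restrict` is mine, it is not part of the gnuplot script) and feeds it the fitted values of 'aa', 'bb', and 'cc' printed above.

```python
import math


def restrict(x, lo, hi):
    """Map any real x monotonically into the open interval (lo, hi)."""
    return (hi - lo) / math.pi * (math.atan(x) + math.pi / 2) + lo


def A(x):
    return restrict(x, 1.1, 2.0)


def B(x):
    return restrict(x, 0.1, 0.9)


def C(x):
    return restrict(x, 0.5, 1.5)


if __name__ == "__main__":
    # Fitted values of aa, bb, and cc, copied from the session above
    print(A(-3442408.91578921), B(1443864.45093385), C(-0.201236322474146))
```

The huge magnitudes of 'aa' and 'bb' simply mean that the unrestricted optimum lies outside the allowed ranges, so the fit pushes A and B against the nearest boundary, 1.1 and 0.9, respectively.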

I have been waiting for this for a long time, but at long last, it has happened! The new version of gnuplot has been released with the designation 4.4. You can download the binary or the source code from the sourceforge repository. In the future, I will discuss the new things, and show what can be done with them.

Cheers,

Zoltán

I have long wanted to write a blog post on this subject, but somehow, I never got the time to do it. But the time has come, and I will do it now. One of the plot styles that I actually like in Origin (no matter what I think of the software overall) is the one where the circumference of a circle is drawn in a colour different to that of the body of the circle, as in this graph

The only glitch is that this is not supported in gnuplot. Well, this is not completely true, because one can use a new prologue for the postscript output, but that works for postscript only, nothing else. What should we do then? There are two options: one has already been discussed, in connection with the bubble charts, sometime in early September. Nevertheless, that method is feasible only when the points do not overlap. What we did there was to plot the data twice (or however many times). But the problem is that those are actually two different plots, so if the adjacent circles overlap, we will not have what we want. Just try it, if you are still not convinced! The second option is that we keep reading. So, the question is then, how could we trick gnuplot into thinking that we have only one plot, not two. Well, the short answer is that we plot only one plot, not two, and we then make sure that points in the plot are coloured properly. I think it is time to show my script, that should make everything clear.

reset

set table 'two.dat'

plot [0:10] sin(x)+0.1*rand(0), cos(x)+0.1*rand(0)

unset table

f(x) = cos(x*pi)

set cbrange [-1:2]; unset colorbox; set border back

set palette defined (-1 "#000000", 1 "#ff0000", 2 "#0000ff")

plot "< gawk '{print $0; print $0}' two.dat" using 1:2:(2+f($0)/5.0):(f($0+1)) index 0 \
with points pt 7 ps var lt palette title 'red', \
'' using 1:2:(2+f($0)/5.0):(f($0+1)*2) index 1 \
with points pt 7 ps var lt palette title 'blue'

As always, we generate some data. Note that we plot sin(x) and cos(x), which will be in the same data file, but in two separate data blocks. This will become important later. What we need to know about data blocks is that in the data file they are separated by pairs of blank lines, and that we can ask gnuplot to plot only certain data blocks. We indicate our choice with the use of the index keyword. If you want to know more about this, issue the

?index

command, and look at the data file 'two.dat'!

OK, so we have some data, if a bit obscure at the moment. Next, we define a funny function. If you look closely, you will realise that f(x) takes the value of -1 for odd, and 1 for even numbers. In principle, any function should do here, but one has got to be a bit careful: those functions that take 1 or -1 at isolated points might not work. If you do not want to dive into the details of this, take my word for it that f(x) defined above will just be perfect for our purposes.
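A quick Python check shows how f(x) alternates, and what it does to the two expressions that appear in the using specifier of the script above, 2+f($0)/5.0 and f($0+1). The helper name `size_and_colour` is made up for this sketch; 'record' stands for gnuplot's $0.

```python
import math


def f(x):
    """Same as the gnuplot definition: +1 for even, -1 for odd integers."""
    return math.cos(x * math.pi)


def size_and_colour(record):
    """Third and fourth column of the first plot for record number $0."""
    return 2 + f(record) / 5.0, f(record + 1)


if __name__ == "__main__":
    for record in range(4):
        size, colour = size_and_colour(record)
        print(record, round(size, 3), round(colour, 3))
```

Even-numbered records come out larger and with the lowest colour value, odd-numbered ones smaller and with the highest, so each duplicated pair of lines yields one big and one small point on top of each other.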

The next step is the definition of a colour palette with the colour range. (We take off the colour box, too, and set the border to back, but these are irrelevant nuances.) We use the palette for colouring our graph: in the first curve, every other point will be black and red, while in the second curve, every other point will be black and blue. Thus, I have already divulged my trick: we have to duplicate our data set in a way that every line is copied at its position, so, in 'two.dat' we will have something like this

...

9.29293 -0.93128 i

9.29293 -0.93128 i

9.39394 -0.978266 i

9.39394 -0.978266 i

9.49495 -0.923256 i

9.49495 -0.923256 i

...

For this we use an external gawk script, but it is really just one line, it could not be any simpler. Once we have duplicated our data, we plot it, and colour every other point black, and the remaining ones red. Also note that we use the index keyword to choose the data block that we need. If you have only one data block, you can skip this. But not only do we colour the points differently, we also resize them: after all, if all had the same size, we would not see the ones that are plotted first. For all this machinery, we use the fact that when a 2D plot is given 4 columns, the size of the points can be determined by the third column, while the fourth column can be assigned to take care of the colour. In this case, the colour is taken from the palette that we defined above. The general form of such a plot is

plot 'foo' using 1:2:3:4 with points pt 7 ps var lt palette

where 'var' signifies the variable point size (ps), while palette is the colour. Note that we use our snappy f(x) function to choose the size and the colour of every second point, simply by counting the ordinal number of the particular data record, $0: for even numbers, we set the size to 2+1/5, and the colour to black (colour value of -1), while for odd numbers, the size is 2-1/5, and the colour is red (colour value of 1). If you have a single data block, the whole script would be simply

f(x) = cos(x*pi)
set cbrange [-1:1]; unset colorbox; set border back
set palette defined (-1 "#000000", 1 "#ff0000")
plot "< gawk '{print $0; print $0}' two.dat" using 1:2:(2+f($0)/5.0):(f($0+1)) \
with points pt 7 ps var lt palette title 'red'

Well, this is it for today. Next time I will try to show some of the new stuff in gnuplot 4.4. Cheers,

Zoltán

As I promised some time ago, I will discuss some of the new features in gnuplot 4.4. The first one that I would like to show is the concept of iteration in the plot command, and the concept of certain pseudo-files. If you have ever had to script the creation of your plots, you will appreciate these features.

First, let us see what the for loop looks like in the plot command. There are two forms of it: in one, we loop through integers, while in the other, we step through a string of words. So, the iteration either looks like

plot for [var = start : end {:increment}]

or

plot for [var in "some string of words"]
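Before moving on to files, both forms can be tried on plain functions. The following is only a minimal sketch: the colour names and the use of strlen() to shift the curves are my own choices for illustration, not something the syntax requires.

```gnuplot
# Integer form: k runs from 1 to 3, one curve per value.
plot for [k=1:3] sin(k*x) title sprintf("sin(%dx)", k)

# String form: c takes the values "red", "green", "blue" in turn.
# The word is used both as the line colour and as the key entry;
# strlen(c) merely shifts the curves so that they do not coincide.
plot for [c in "red green blue"] sin(x + strlen(c)) lc rgb c title c
```

In both cases the loop variable can be used anywhere in the plot element where an expression of the matching type (integer or string) is allowed.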

After this introduction, let us see how these can be used in real life. The first example that I will show is that of waterfall plots, i.e., when a couple of curves are plotted on the same graph, and they are shifted vertically. This is common practice, when one has several spectra, and wants to show the effect of some parameter on the spectra.

reset

f(x,a) = exp(-(x-a)*(x-a)/(1+a*0.5))+0.05*rand(0)

title(n) = sprintf("column %d", n)

set table 'iter.dat'

plot [0:20] '+' using (f($1,1)):(f($1,2)):(f($1,3)):(f($1,4)):(f($1,5)):(f($1,6)) w xyerror

unset table

set yrange [0:15]

plot for [i=1:6] 'iter.dat' u 0:(column(i)+2*i) w l lw 1.5 t title(i)

I would like to walk through the script line by line, for there is something unusual in almost each line. So, after re-setting the gnuplot session, we define a function, which will be a Gaussian, whose centre and width are determined by the parameter 'a'. We then plot this function to a file, 'iter.dat', and do it 6 times, each time with a different parameter, so that the Gaussian is shifted, and becomes broadened. Note, however, that we do this by plotting a special file, '+'. This was introduced in gnuplot 4.4, and the purpose of this special file is that by invoking it, one can use the standard plot modifiers even with functions. I.e., we can specify 'using' for a function. The importance of this is that many plot styles require several columns, and we could not use those plot styles with functions without the '+' pseudo-file. Consider the following example

reset
unset colorbox
unset key
set xrange [0:10]
set cbrange [0:1]
plot '+' using ($1):(sin($1)):(0.5*(1.0+sin($1))) with lines lw 3 lc palette, \
'+' using ($1):(sin($1)+2):($1/10.0) with lines lw 3 lc palette

which produces the following graph

We can thus colour our curve by specifying the colour in the third column of the pseudo-file. Of course, this is only one possibility, and there are many more. If one wants to plot 3D graphs, then the pseudo-file becomes '++', but the concept is the same: the two variables are denoted by $1 and $2, and the function is calculated on the grid determined by the number of samples and the corresponding data range.

Now, back to the iteration loop! We produce 6 columns of data by plotting '+' by invoking a plot style that requires 6 columns. In this case, it is the xyerrorbars. Having created some data, we plot each column, but we call plot only once: the iteration loop does the rest. In each plot, the curve is shifted upwards, and the title is taken from the column number. For specifying the title, we use the function that we defined earlier: it takes an integer, and returns a string. At the end of the day, we have this graph

This was an example, when we plot various columns from the same file. We can also use the iteration to plot different files. When doing so, there are two options available. One is that we simply specify the file names in a string, as below

reset
filenames = "first second third fourth fifth"
plot for [file in filenames] file using 1:2 with lines

which will plot files 'first', 'second', 'third', 'fourth', and 'fifth'. At this point, note that 'file' is a string, i.e., we can manipulate it as a string. E.g., if we wanted to, instead of 'first', 'second', etc., plot 'first.dat', 'second.dat', and so on, we would do this as

reset
filenames = "first second third fourth fifth"
plot for [file in filenames] file.".dat" using 1:2 with lines

The second option is, if the data files are numbered, e.g., if we have 'file_1.dat', 'file_2.dat', and so on, we can use the iteration over integers as follows

reset
filename(n) = sprintf("file_%d.dat", n)
plot for [i=1:10] filename(i) using 1:2 with lines

which will plot the second column versus the first column of 'file_1.dat' through 'file_10.dat'.

Today I would like to discuss another new feature in gnuplot 4.4. This will be the notion of "inline" data manipulation. I don't really know what the proper name would be for this feature, so I will just show what it is.

Normally, when one plots a function or file, the command has the following structure

plot 'foo' using 1:2 with line, f(x) with line

That was the old syntax. In the new version of gnuplot, we can insert arithmetic expressions in the plot command as follows

f(x) = a*sin(x)

plot a = 1.0, f(x), a = 2.0, f(x)

Now, this has some implications. First, one has to be a bit careful, because the arithmetic expressions are separated from the actual function by a comma. However, the 'for' loop that we discussed a week ago, reads statements up to the comma, and then returns to the beginning of the statement. In other words,

plot for [i=1:10] a = i, f(x)

will evaluate the expression a = i ten times, and then plots f(x). At that point, the value of 'a' will be 10, therefore, we have only one plot, and that will be 10*sin(x).
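If ten curves were what we wanted, one way around the comma problem is to hand the amplitude to the function as a second argument, so that nothing has to be assigned inside the plot command. This is only a sketch, and the two-argument f(x, a) is introduced here purely for illustration.

```gnuplot
# The amplitude is now an argument, not a global variable.
f(x, a) = a*sin(x)

# The loop variable is passed directly, so each iteration
# produces its own curve: sin(x), 2*sin(x), ..., 10*sin(x).
plot for [i=1:10] f(x, i) title sprintf("a = %d", i)
```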

The second implication is that the notion of a function has completely changed. What we do in a plot command now is no longer a mapping of the form

x -> f(x)

but rather, the evaluation of a set of instructions, one of which is the above-mentioned mapping. But the crucial point here is that the mapping is not the only allowed statement. The upshot is that "functions" have become a set of operations, and the following statement is completely legal

f(x) = (a = a+1.0, a*sin(x))

a = 0.0

plot for [i=1:10] f(x)

(It is a completely different question, whether this plot makes any sense...)

What we should notice is the fact that now a function can have the form of

f(x) = (statement1, statement2, statement3, return value)

and when the function is called, statement1, statement2, statement3 are evaluated, and 'return value' is returned. We should not underestimate the significance of this! Many things can be done with this. I will show a few of them below.

The first thing I would like to dive into is calculating some statistics of a file. Let us see how this works!

reset
set table 'inline.dat'
plot sin(x)
unset table
num = 0
sum = 0.0
sumsq = 0.0
f(x) = (num = num+1, sum = sum+x, sumsq = sumsq+x*x, x)
plot 'inline.dat' using 1:(f($2))
print num, sum, sumsq

which will print out

100 -1.77635683940025e-15 0.295958848441

We expected this, for the number of samples is 100 by default, and the sum should be 0 in this case.

So, what about finding the minimum and its position in a data file? This is quite easy. All we have to do is to modify our function definition, and insert a statement that determines whether a value is minimal or not.

reset
set table 'inline.dat'
plot sin(x)
unset table
num = 0
min = 1000.0
min_pos = 0
min_pos_x = 0
f(x,y) = ((min > y ? (min = y, min_pos_x = x, min_pos = num) : 1), num = num+1, y)
plot 'inline.dat' using 1:(f($1, $2))
print num, min, min_pos_x, min_pos

which prints

100 -0.999385 4.74747 73

i.e., the minimum is at the 73rd record (we count from 0), at x = 4.74747, and its value is -0.999385. Note that instead of an 'if' statement, we use the ternary operator to decide whether min, min_pos_x, and min_pos should be updated.
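The same pattern finds the maximum, if we flip the comparison and start from a very small value. The script below is a sketch that mirrors the one above; the names vmax, vmax_pos, vmax_pos_x and h(x,y) are mine, chosen only to keep them apart from the variables used for the minimum.

```gnuplot
reset
set table 'inline.dat'
plot sin(x)
unset table

num = 0
vmax = -1000.0      # any value below the smallest data value will do
vmax_pos = 0
vmax_pos_x = 0

# Update the running maximum and its position whenever y exceeds it,
# then count the record and return y unchanged.
h(x,y) = ((vmax < y ? (vmax = y, vmax_pos_x = x, vmax_pos = num) : 1), num = num+1, y)

plot 'inline.dat' using 1:(h($1, $2))
print num, vmax, vmax_pos_x, vmax_pos
```

As with the minimum, the starting value must lie outside the range of the data, otherwise the first comparison could never succeed.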

The implementation of calculating the standard deviation, e.g., should be trivial:

num = 0
sum = 0.0
sumsq = 0.0
f(x) = (num = num+1, sum = sum + x, sumsq = sumsq + x*x, x)
plot 'inline.dat' using 1:(f($1))
print num, sqrt(sumsq/num - (sum/num)*(sum/num))

We have thus seen how the "inline" arithmetic can be used for calculating quantities, e.g., various moments, minima/maxima and their respective positions. These involve the sequential summing or inspection of the data set. But this trick with the function definition can be used for back-referencing, too. This is what we will discuss next.

The trick is to use a construct similar to this

backshift(x) = (prev = pres, pres = x, prev)

which will store the last but one value in the variable 'prev', and return it. That is, the following code shifts the whole curve to the right by one

reset
set table 'inline.dat'
plot sin(x)
unset table
pres = 0.0
backshift(x) = (prev = pres, pres = x, ($0 > 0 ? prev : pres))
plot 'inline.dat' using 1:(backshift($2)) with line, '' u 1:2 with line

(In cases like this, we always have to decide what to do with the first/last data record. In this particular case, I opted for duplicating the first record, this is what happens in the ternary operator, but this is not the only possibility.) If, for some reason, you have to shift the curve by more, you do the same thing, but multiple times. E.g., the following code, which chains three variables, returns the value read two records earlier, i.e., it shifts by 2 places; each additional variable in the chain shifts by one more.

backshift(x) = (prev1 = prev2, prev2 = prev3, prev3 = x, prev1)

Once we have this option of back-referencing, we should ask the question what it can be used for. I show two examples for this.

The first example is drawing arrows along a line given by the data set. Drawing arrows one by one is done by using

set arrow from x1,y1 to x2,y2

but we have to use a different method, if we want to plot the arrows from a file. Incidentally, there is a plotting style, 'with vectors', that works as

plot 'foo' using 1:2:3:4 with vectors

where the first two columns specify the coordinates of the beginning, and the second two columns specify the relative coordinates of the vectors. So, it works on four columns. What should we do, if we want to plot vectors from the points in a file? Well, we use the back shift that we defined above. Our script is as follows:

reset
unset key
set sample 30
set table 'arrowplot.dat'
plot [0:3] sin(x)+0.2*rand(0)
unset table
px = NaN
py = NaN
dx(x) = (xd = x-px, px = ($0 > 0 ? x : 1/0), xd)
dy(y) = (yd = y-py, py = ($0 > 0 ? y : 1/0), yd)
plot 'arrowplot.dat' using (px):(py):(dx($1)):(dy($2)) with vector

which results in the following figure:

Note that we used the ternary operator to get rid of the very first data point. This is needed, because the arrows connect two points, that is, there will be one less arrow, than data points.

In the second example, we will turn this around. In my post last August, plotting the recession, I showed how the background of a plot can be changed, based on whether the curve is dropping, or increasing. Let us take the following script

reset
set sample 20
set table 'inline.dat'
plot [0:10] exp(-x)+1.0+rand(0)
unset table
unset key
px = 0
py = 1000
dx(x) = (xd = x-px, px = x, xd)
dyn(y) = (yd = y-py, py = y, (yd < 0 ? yd : 1/0))
dyp(y) = (yd = y-py, py = y, (yd >= 0 ? yd : 1/0))
plot 'inline.dat' using (px):(py):(dx($1)):(dyp($2)) with vector nohead lt 1 lw 3, \
px = 0, py = 0, '' using (px):(py):(dx($1)):(dyn($2)) with vector nohead lt 3 lw 3

which creates the following graph

First we produce some data; old trick. Then we take our difference functions, in this case, three of them. The first one is identical to that in the previous script. The second and the third are identical, except that the second returns a sensible value if and only if the slope is negative, while the third one returns 1/0 if the slope is negative. Then we just plot our data, making sure that we re-initialise px and py before the second plot. Simple.

Another utilisation of the back reference can be found on gnuplot's web site, under running averages.
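To give a flavour of that idea, here is a minimal sketch of a 3-point running average built from the same back-shifting chain. It is not the script from gnuplot's web site; the function name avg3 and the variables p1, p2, p3 are mine, and the noisy test data are generated in the same way as in the scripts above.

```gnuplot
reset
set samples 50
set table 'inline.dat'
plot [0:10] sin(x)+0.3*rand(0)
unset table

p2 = 0.0
p3 = 0.0
# Shift the two previous values along, store the current one,
# and return the mean of the three. The first two records have
# no complete window, so they are returned as undefined (1/0).
avg3(x) = (p1 = p2, p2 = p3, p3 = x, ($0 > 1 ? (p1+p2+p3)/3.0 : 1/0))

plot 'inline.dat' using 1:2 with lines title 'data', \
     '' using 1:(avg3($2)) with lines title '3-point average'
```

Note that this is a trailing average: the value plotted at a given x is the mean of that record and the two before it, so the smoothed curve lags the data slightly.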

Next time I will try to go a bit further, and demonstrate some other uses of the inline data processing.

Cheers,

Gnuplotter

Some time ago, on the 26th of July, to be more accurate, I showed how somewhat decent-looking maps can be created with gnuplot. With the wisdom of hindsight, that was a rather ugly hack, I must say. Even worse, it seems that it is not quite fail-safe. At least, I have obtained reports complaining about it. Could we, then, do better? Could we, perhaps, throw out that disgusting gawk script, with all the hassle that comes with it? Could we, possibly, manage the whole affair in gnuplot? Sure, we could. And here is how, just keep reading!

On the 17th of January, we saw that in the new version of gnuplot, functions take on a funny property, namely, they can contain algebraic statements not related to the return value. We also saw that this feature can be used to perform searches of some sort: as we "plot" a file, and step through the numbers in the file, we can assign values to variables, provided that some conditions are fulfilled. It is easy to see that in this way, we can determine the minimum or the maximum of a data file, e.g. But we can do much more than that.

We should also recall from that old post on the map what the contour file looks like. In case you have forgotten, here is a small section of it

# Contour 0, label: 2

-0.391812 3.63636 2

...

-0.959596 3.50978 2

-0.959596 3.50978 2

...

-0.391812 3.63636 2

# Contour 1, label: 1.5

-1.20098 4.51515 1.5

-1.16162 4.54423 1.5

-1.15982 4.54545 1.5

...

What we have to realise is the following: first, contour lines belonging to the same level are not necessarily contiguous (this is quite obvious, for there is no reason why they should be), and if there is a discontinuity, it manifests itself in a single blank line in the contour file, and second, contour lines belonging to different levels are separated by two blank lines. So, in the data file above, there is a blank between the lines -0.959596 3.50978 2, and -0.959596 3.50978 2, and there are two blanks between -0.391812 3.63636 2, and # Contour 1, label: 1.5. By the way, the third column is the value of that particular contour line.

This observation has at least one important consequence: we can decide which contour line we want to plot, simply by using the index keyword. You might recall, that indexing the data file pulls out one data block, which is defined by a chunk of data flanked by two blank lines.
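As a quick illustration of the index keyword, the following sketch draws only the first and the third contour level. It assumes that 'contour.dat' is a contour table written by gnuplot in the format shown above, and that it contains at least three data blocks; the key titles are mine.

```gnuplot
# Data blocks are counted from 0, so index 0 is the first
# contour level in the file and index 2 is the third one.
plot 'contour.dat' index 0 using 1:2 with lines title 'block 0', \
     '' index 2 using 1:2 with lines title 'block 2'
```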

Now, what about the labels, and the white space that they need? Well, the white space is quite easy: what we will plot is not the contour line, but a function, which returns an undefined value at the place of the white space, e.g., this one (whatever eps and xtoy mean)

f(x,y) = ((x-x0)*(x-x0)+(y-y0)*(y-y0)*xtoy*xtoy > eps ? y : 1/0)

Normally we would plot the contour lines as

plot 'contour.dat' using 1:2 with lines

but instead of this, now we will use this

plot 'contour.dat' using 1:(f($1,$2)) with lines

This will leave out those points which are too close to (x0, y0). And the labels? Well, that is not difficult either. Take this function

lab(x,y) = ( (x == x0 && y == y0) ? stringcolumn(3) : "")

and this plot

plot 'contour.dat' using 1:2:(lab($1,$2)) with labels

This will put the labels at (x0, y0), and even better, we haven't got to set the labels by hand, they are taken from the data file.

So, we have seen how we can plot the contour, leave out some white space, and then put a label at that position. The only remaining question is how we determine where the label should be. And this is where we come back to our inline functions. For the sake of example, let us take this function and the accompanying plot

g(x,y)=(((x > xl && x < xh && y > yl && y < yh) ? (x0 = x, y0 = y) : 1), 1/0)

plot 'contour.dat' using 1:(g($1,$2))

What on Earth does this plot do? The plot itself does absolutely nothing: it is always 1/0. However, while we are doing this, we set the value of x0, and y0, if the two arguments are not too close to the edge of the plot. This latter condition is needed, otherwise labels could fall on the border, which doesn't look particularly nice.

By now, we have all the bits and pieces, we have only got to put them together. Let us get down to business, then!

I will split the script into two: the first produces the dummy data, while the second does the actual plotting. So, first, the data production.

reset

filename = "cont.dat"

xi = -5; xa = 0; yi = 2; ya = 5;

xl = xi + 0.1*(xa - xi); xh = xa - 0.1*(xa-xi);

yl = yi + 0.1*(ya - yi); yh = ya - 0.1*(ya-yi);

xtoy = (xa-xi) / (ya-yi)

set xrange [xi:xa]

set yrange [yi:ya]

set isosample 100, 100

set table 'test.dat'

splot sin(1.3*x)*cos(.9*y)+cos(.8*x)*sin(1.9*y)+cos(y*.2*x)

unset table

set cont base

set cntrparam level incremental -3, 0.5, 3

unset surf

set table filename

splot sin(1.3*x)*cos(0.9*y)+cos(.8*x)*sin(1.9*y)+cos(y*.2*x)

unset table

What we should pay attention to here is the definition of a handful of variables at the very beginning. Some are already obvious, like xi, xa and the like, and some will become clear in the second part. Now, the plotting takes place here

reset

unset key

set macro

set xrange [xi:xa]

set yrange [yi:ya]

set tics out nomirror

set palette rgbformulae 33,13,10

eps = 0.05

g(x,y)=(((x > xl && x < xh && y > yl && y < yh) ? (x0 = x, y0 = y) : 1), 1/0)

f(x,y) = ((x-x0)*(x-x0)+(y-y0)*(y-y0)*xtoy*xtoy > eps ? y : 1/0)

lab(x,y) = ( (x == x0 && y == y0) ? stringcolumn(3) : "")

ZERO = "x0 = xi - (xa-xi), y0 = yi - (ya-yi), b = b+1"

SEARCH = "filename index b using 1:(g($1,$2))"

PLOT = "filename index b using 1:(f($1,$2)) with lines lt -1 lw 1"

LABEL = "filename index b using 1:2:(lab($1,$2)) with labels"

b = 0

plot 'test.dat' with image, \

@SEARCH, @PLOT, @LABEL, @ZERO, \

@SEARCH, @PLOT, @LABEL, @ZERO, \

@SEARCH, @PLOT, @LABEL, @ZERO, \

@SEARCH, @PLOT, @LABEL, @ZERO, \

@SEARCH, @PLOT, @LABEL, @ZERO, \

@SEARCH, @PLOT, @LABEL, @ZERO, \

@SEARCH, @PLOT, @LABEL, @ZERO, \

@SEARCH, @PLOT, @LABEL, @ZERO, \

@SEARCH, @PLOT, @LABEL, @ZERO, \

@SEARCH, @PLOT, @LABEL, @ZERO, \

@SEARCH, @PLOT, @LABEL, @ZERO, \

@SEARCH, @PLOT, @LABEL

A bit convoluted, isn't it? OK, we will walk through the script, line by line.

First is the range setting, and then ticks go to the outside, just for aesthetic reasons. We also define eps here, which determines how much "white space" we have for the labels. Then we define the three functions that we discussed above. We have already seen eps, and the meaning of it, but what about xtoy? Despite its name, this is not something to play with, rather the ratio of x to y, or more precisely, the ratio of the xrange to the yrange. This is needed, if the two ranges are of different order of magnitude, e.g., if xrange is something like [0:1], while yrange is [0:1000]. But this ratio is automatically calculated at the beginning, you haven't got to worry about it.

After this, we define 4 macros. These are abbreviations for longer chunks of code, and make life really easier. The idea is that when confronted with a macro, gnuplot expands it as a string, and then acts accordingly. In my opinion, if written properly, macros can make the script rather readable.

The first of the macros, ZERO, is needed, because in our SEARCH macro, which is nothing but a call to the function g(x,y), if the condition is not satisfied for a particular data block, then x0, y0 wouldn't be updated, therefore, the label would end up at the wrong position. At the same time, ZERO also increments the value of b, which determines which data block we are actually plotting. b is used in the indexing in the macros SEARCH, PLOT, and LABEL. We have already mentioned SEARCH, PLOT plots the contour with the white space at the position given by x0, and y0 (this is calculated in the SEARCH macro), and finally, LABEL places the value of the contour line at that position.

At this point, we have defined everything, all that is left is plotting. We do it 13 times, because our zrange, or the contour lines were given between -3, and 3, with steps of 0.5. In this particular case, there are only 10 contour lines, and gnuplot will complain that the last 3 data blocks are empty, but this is not an error, only a warning. Shouldn't we look at the figure, perhaps? But of course! Here it is:

The only thing that I should like to point out is that the white space is made for a particular contour line, but there is no guarantee that, if the contour lines are too close to each other, the label does not cover a neighbouring contour line. If that happens, I would simply suggest increasing the contour spacing, i.e., the increment given in the set cntrparam line.

I hope that this method proves better, than the other one, and that it will be easier to use. In the next post, I will re-visit the inline functions, and show a nifty trick with them. Cheers,

Gnuplotter

Today I would like to touch on a vast subject, so prepare for a long post. However, I hope that the post will be worthwhile, for I want to discuss something that cannot be done in any other way. In due course, we will see how we can use gnuplot to create parametric plots from a file. What I mean by that is the following: suppose you want to plot, say, 10 similar objects, whose size is determined by the first column in a file. Of course, there are cases, when one can manipulate the size, e.g., if there is a pre-defined symbol, we can use one of the columns in a file to determine the size of the symbol. As an example, we can do this

plot 'foo' u 1:2:3 with point pt 6 ps var

which will draw circles whose radius is given by the third column in 'foo'. This is all well, but it works for a limited number of cases only, namely, when there is a symbol to start out with. But what happens, if we want to draw an object that is not a symbol, e.g., arcs of a circle, whose angle is given by one of the columns in a file, or cylinders, whose height is a variable, read from a file? As you can guess from these two suggestions, what we will do is to draw a pie, and a bar chart. I understand that we have done this a couple of times before, but this time, we will stay entirely in the realm of gnuplot, and the scripts are really short. We just have to figure out what to write in the scripts. But beyond this, I will also show how we can plot in 6 dimensions. We will plot ellipses on a plane (first 2 columns), whose two axes are given by the 3rd and 4th columns, the orientation by the 5th, and the colour by the 6th. If you are really pressed for it, you can add three more dimensions: if you draw ellipsoids in 3D, take all three axes from a file, and also the orientation, that would make 9 dimensions altogether. Quite a lot!

So, let us get down to business! The first thing that I would like to discuss is the evaluate command. This is a really nifty way of shortening repetitive commands. Let us suppose that we want to place 10 arrows on our graph, and only the first coordinate of the arrows changes, otherwise everything is the same. Setting one arrow would read as follows

set arrow from 0, 0 to 1, 1

Of course, there are quite a few settings that we could specify, but this was supposed to be a minimal example. Then, the next arrow should be

set arrow from 1, 0 to 2, 1

and so on. What if we do not want to write this line a thousand times, and we do not want to search for the coordinate that we are to change, the first one, in this case? We could try the following

a(x) = sprintf("set arrow from %d, 0 to %d, 1", x, x+1)

This function takes 'x', and returns a string with all the settings and coordinates. So, we are almost done. The only thing we should do is to make gnuplot understand that we want it to treat a(x) as a command, not as a string. Enter the eval command: it takes whatever string is presented to it, and turns it into a command. Thus, the following script creates 5 arrows, all parallel to each other, and consecutively shifted to the right

a(x) = sprintf("set arrow from %d, 0 to %d, 1", x, x+1)
eval a(0)
eval a(1)
eval a(2)
eval a(3)
eval a(4)

I believe, this is a much simpler and cleaner procedure, than this

set arrow from 0, 0 to 1, 1
set arrow from 1, 0 to 2, 1
set arrow from 2, 0 to 3, 1
set arrow from 3, 0 to 4, 1
set arrow from 4, 0 to 5, 1

I should mention here that if chunks of a command are the same, another method of abbreviating them is to use macros. Those are disabled by default, so first we have to set it. Then it works as follows

set macro
ST = "using 1:2 with lines lt 3 lw 3"
plot 'foo' @ST, 'bar' @ST

i.e., the term @ST is expanded using the definition above, therefore, this plot is equivalent to this one

plot 'foo' using 1:2 with lines lt 3 lw 3, 'bar' using 1:2 with lines lt 3 lw 3

but the previous one is much more readable. I would also say that using capitals for the macros is probably not a bad idea, because then they cannot be mistaken for standard gnuplot commands. This much in the way of macros!
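A side note on the five eval calls for the arrows: in later versions of gnuplot (4.6 and above, as far as I know), the 'do for' block can drive the eval command, so the calls need not be written out one by one. The sketch below assumes such a version; it will not run in 4.4. The final plot is there only so that the arrows have a graph to appear on.

```gnuplot
# Same helper as before: returns the command as a string.
a(x) = sprintf("set arrow from %d, 0 to %d, 1", x, x+1)

# Requires a gnuplot version that supports 'do for' blocks.
do for [i=0:4] {
    eval a(i)
}

# Any plot will do; the ranges are chosen to show all five arrows.
plot [0:5] [0:1] x/5.0 notitle
```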

So, we have the evaluate command, and we have a new concept for functions. Then let us take a closer look at the following code

a(x) = sprintf("set arrow from %d, 0 to %d, 1;\n", x, x+1)
ARROW = ""
f(x) = (ARROW = ARROW.a(x), x)
plot 'foo' using 1:(f($1))

and let us suppose that our file 'foo' contains the following 5 lines

1
3
5
7
9

After plotting 'foo', the string 'ARROW' will be the following

set arrow from 1, 0 to 2, 1;
set arrow from 3, 0 to 4, 1;
set arrow from 5, 0 to 6, 1;
set arrow from 7, 0 to 8, 1;
set arrow from 9, 0 to 10, 1;

I.e., we have a string, which contains instructions for setting 5 arrows. If, at this point, we simply evaluate this string, all 5 arrows will be set. Therefore, we have found a way of using a file to set the coordinates of an arrow. (N.B., if it was for the arrows only, we wouldn't have had to do anything, since there is a plotting style, 'with vector', as we discussed some weeks ago.)

We will use this trick to create a parametric plot, taking parameter values from a file, first plotting the ellipses! Again, we have got to create some dummy data, and since we now need 6 columns, we will use the errorbars

reset
f(x) = rand(0)
set sample 50
set table 'ellipse.dat'
plot [0:10] '+' using (20*f($1)):(20*f($1)):(f($1)):(f($1)):(3.14*f($1)):(f($1)) w xyerror
unset table

which will produce 6 columns and 50 lines. Having produced some data, let us see what we can do with it. Here is our script:

PRINT(x, y, a, b, alpha, colour) = \
sprintf("%f+v*(%f*cos(u)*cos(%f)-%f*sin(u)*sin(%f)), \
%f+v*(%f*cos(u)*sin(%f)+%f*sin(u)*cos(%f)), \
%f with pm3d", x, a, alpha, b, alpha, y, a, alpha, b, alpha, colour)
PLOT = "splot "
num = -1
count(x) = (num = num+1, 1)
g(x) = (PLOT = PLOT.PRINT($1, $2, $3, $4, $5, $6), \
($0 < num ? PLOT=PLOT.sprintf(",\n") : 1/0))
plot 'ellipse.dat' u 1:(count($1))
plot 'ellipse.dat' using 1:(g($1))
unset key
set parametric
set urange [0:2*pi]
set vrange [0:1]
set pm3d map
set size 0.5, 1
eval(PLOT)

First, we have the definition of a print function that looks rather ugly, but is quite simple. We want to plot

a*v*cos(u), b*v*sin(u), colour

where a, and b are the axes of the ellipse, and colour is going to specify, well, its colour. However, we want to translate the ellipse to its proper position, and we also want to rotate it by an amount given by the 5th column, so we have to apply a two-dimensional rotation on the object. Therefore, we would end up with a function similar to this

x+v*(a*cos(u)*cos(alpha)-b*sin(u)*sin(alpha)), y + v*(a*cos(u)*sin(alpha)+b*sin(u)*cos(alpha)), colour

Now you know why that print function looked so complicated! After this, we define a string, PLOT, that we will expand as we read the file. But before that, we have to count the lines in the file. The reason for that is that successive plots must be separated by a comma, but there shouldn't be a comma after the last plot. So, we just have to know where to stop placing commas in our string. Then we define the function that does nothing useful, but concatenates the PLOT string as it reads the file. Here we use the number of lines that we determine in a dummy plot. At this point we are done with the functions, all we have to do is plotting.

First we count, then plot g(x). At this point, we have the string that we need. We only have to set up our plot. Remember, we have a parametric plot, where the range of one of the variables is in [0:2*pi], while the other one is in [0:1]. Easy. Then we just have to evaluate our plot string, and we are done. Look what we have made here: a six-dimensional plot!

I think that this script is much less complicated, than many that we have discussed in the past. Short and clear, thanks to the eval command, and the new concept of functions. Besides, we pulled off a trick that was impossible by other means. I started out saying that we will create bars and pie. I believe, having seen the trick, it should be quite simple now, but in case you insist on seeing it, I will discuss it in my next post.

As I promised yesterday, we will take a closer look at the pie chart, once more, and see how we can utilise what we have learnt recently. I should point out here, that this is not the only way of plotting a pie from a file. If you feel like building your gnuplot from source, you can check out either the CVS tree, or the patch tracker, where you can find a patch that makes it possible to plot slices. You can see a demo here. But we will try a different route here.

First, here is our data file (it could be anything, really)

1 1 Dolphins

2 1 Whales

2 0 Sharks

3 0 Penguins

4 1 Kiwis

5 0 Tux

and here is our script

reset

unset key; set border 0; unset tics; unset colorbox; set size 0.6,1

set urange [0:1]

set vrange [0:2*pi]

set macro

sum = 0.0

ssum = 0.0

n = 0

PLOT = "splot 0, 0, 1/0 with pm3d"

count(x) = (ssum = ssum + $1, 1)

g(x,y,n) = \
sprintf(", \
u*cos((%.2f+%.2f*v)/ssum), \
u*sin((%.2f+%.2f*v)/ssum), \
%d @PL", x, y, x, y, n)

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), sum = sum+x, n = n + 1, x)

plot 'new_pie.dat' u 1:(count($1)), '' u 1:(f($1))

PL = "with pm3d"

set parametric; set pm3d map;

eval(PLOT)

There is really nothing that we haven't discussed before: we set a couple of things at the beginning, but most importantly, the macro, and sum, ssum, and n. Then we define a string, PLOT, and two functions. One is to sum the values in our file (we need this, so that we can scale the full range of angles to two pi), and another one, that writes our PLOT command for later use. Note that the first plot in PLOT is actually empty, we plot 1/0. This seems a bit silly, doesn't it? Well, it does, but there is a good reason: successive plots must be separated by a comma, and if we have an empty plot at the very beginning, then we can put the commas before the plots, not after, and in this way, we needn't keep track of which plot we are actually processing. Remember, yesterday we used a separate counter, and an if statement, to determine, whether we need the comma, or not. This we can avoid here.

Next we call the two dummy plots, and finally, we evaluate our PLOT string. Oh, no! At the very end, we marvel in awe at the figure that we produced.
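For those who would like to check the arithmetic outside gnuplot, here is a minimal Python sketch of the two ideas above: scaling the running sum to the full circle, and writing the comma before each plot instead of after it. The function names and the placeholder text for each slice are made up for this illustration; they do not appear in the gnuplot script.

```python
import math

# First column of the data file above.
values = [1, 2, 2, 3, 4, 5]


def slice_angles(values):
    """Return the (start, end) angle of every slice, in radians."""
    total = sum(values)          # plays the role of ssum
    running = 0.0                # plays the role of sum
    angles = []
    for value in values:
        start = 2 * math.pi * running / total
        end = 2 * math.pi * (running + value) / total
        angles.append((start, end))
        running += value
    return angles


def build_command(angles):
    """Append every slice with a leading comma to a dummy first plot."""
    command = "splot 0, 0, 1/0 with pm3d"
    for n, (start, end) in enumerate(angles):
        # Hypothetical placeholder; the real script writes the
        # parametric expressions of the slice at this point.
        command += ", <slice %d: %.2f to %.2f>" % (n, start, end)
    return command


angles = slice_angles(values)
command = build_command(angles)
print(command)
```

Because the string starts with a plot that draws nothing, the loop never has to ask whether it is handling the first slice.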

So far, so good, but what if we wanted to add labels, e.g., the value of the slice? That is really easy. All we have to do is to define a function that produces the label. Here is our updated script

reset

unset key; set border 0; unset tics; unset colorbox; set size 0.6,1

set urange [0:1]

set vrange [0:2*pi]

set macro

sum = 0.0

ssum = 0.0

n = 0

PLOT = "splot 0, 0, 1/0 with pm3d"

LABEL = ""

count(x) = (ssum = ssum + $1, 1)

g(x,y,n) = \

sprintf(", \

u*cos((%.2f+%.2f*v)/ssum), \

u*sin((%.2f+%.2f*v)/ssum), \

%d @PL", x, y, x, y, n)

lab(alpha, x) = sprintf("set label \"%s\" at %.2f, %.2f; ", \

x, 1.2*cos(alpha), 1.2*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), \

LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, sprintf("%2.f", x)), \

sum = sum+x, n = n + 1, x)

plot 'new_pie.dat' u 1:(count($1)), '' u 1:(f($1))

PL = "with pm3d"

set parametric; set pm3d map; set border 0; unset tics; unset colorbox;

set size 0.6,1

eval(LABEL)

eval(PLOT)

We have an additional string, LABEL, which we initialise with the value "". Then we define a function that prints "set label ..." with the proper positions, and finally, we insert this function in the definition of f(x). Of course, once we have called f(x) in the plot, we have to evaluate the string LABEL. So, this is what we get

This script can trivially be modified to print strings that are stored in our file. Watch just the following two lines

...

lab(alpha, x) = sprintf("set label \"%s\" at %.2f, %.2f centre; ", \

x, 1.2*cos(alpha), 1.2*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), \

LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, stringcolumn(3)), \

sum = sum+x, n = n + 1, x)

...

and we are done. Here is the new pie

Now, you might wonder why we had three columns in our data file, if we didn't want to use them. Well, perhaps we did want to, we just haven't had the time till now. So, what could we do with those ones and zeros in the second column? We will make the pie explode! It is really simple, we only have to modify three definitions in our last script

...

g(x,y,n,dx,dy) = \

sprintf(", \

%.2f+u*cos((%.2f+%.2f*v)/ssum), \

%.2f+u*sin((%.2f+%.2f*v)/ssum), \

%d @PL", 0.2*dx, x, y, 0.2*dy, x, y, n)

lab(alpha, x, r) = sprintf("set label \"%s\" at %.2f, %.2f centre; ", \

x, (1.25+0.2*r)*cos(alpha), (1.25+0.2*r)*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n, \

$2*cos(2*pi*sum/ssum+pi*x/ssum), \

$2*sin(2*pi*sum/ssum+pi*x/ssum)), \

LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, stringcolumn(3), $2), \

sum = sum+x, n = n + 1, x)

...

And here is the pie, when exploded

I should mention here that if you are not happy with the colours, it is really easy to help it: all we have to do is to modify the colour palette, using whatever colour combinations. We have covered a lot of material today. Till next time,

Gnuplotter

We have seen in the last couple of posts that with the new concept of functions, quite a few interesting effects can be achieved. Today I would like to show a trick that solves a problem that I discussed some time ago, when we made shiny histograms using a for loop in gnuplot. We will do the same thing here, but in two lines only. It is quick, and the results are just as good as in that case.

So, here is my data file

1

3

4

2

3

5

2

which I will just name 'bar.dat', and here is our script

reset

unset key

set style fill solid 1.0

set yrange [0:6]

colour = "#080000"

f(x,n) = (colour = sprintf("#%02X%02X%02X", 128+n/2, n, n), x)

w(n) = 0.8*cos(n/230.0*pi/2.0)

plot for [n=1:230:2] 'bar.dat' u 0:(f($1,n)):(w(n)) with boxes lc rgbcolor colour

Simple enough, let us see what it does! The first four lines are just the usual settings, although the yrange is really irrelevant. I set it only for aesthetic reasons (otherwise, gnuplot would set the yrange automatically to [1:5] for the data file above, and we wouldn't see one of the columns). Then we define a variable called 'colour', which we will eventually overwrite in our function definition of f(x,n). f(x,n) returns x, thus, in this regard it would be absolutely useless, but when doing so, it actually prints a string to 'colour'. The next function is w(n), which will determine in what fashion our colour will converge to white.
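If you want to see which colour and box width each pass of the loop uses, the following Python sketch reproduces the two formulas. It is only an illustration of the numbers, under the assumption that gnuplot's n/2 is an integer division for integer n; the list name 'passes' is made up here.

```python
import math


def colour(n):
    """Colour string of pass n; n // 2 mimics gnuplot's integer n/2."""
    return "#%02X%02X%02X" % (128 + n // 2, n, n)


def width(n):
    """Box width of pass n; shrinks towards zero as n approaches 230."""
    return 0.8 * math.cos(n / 230.0 * math.pi / 2.0)


# Same iteration as 'plot for [n=1:230:2]': n = 1, 3, ..., 229.
passes = [(n, colour(n), width(n)) for n in range(1, 230, 2)]

for n, col, w in passes[:3] + passes[-3:]:
    print(n, col, round(w, 4))
```

Each pass draws a narrower box in a paler shade on top of the previous one, which is where the shine comes from.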

Finally, we plot the data file some 115 times, each time with a smaller, and shinier box. At the end, we get something like this

We can very easily change the direction of the light. All we have to do is define a new function that shifts the bars as we progress with our for loop. So, the new script could be something like this

reset

unset key

set style fill solid 1.0

set yrange [0:6]

colour = "#080000"

f(x,n) = (colour = sprintf("#%02X%02X%02X", 128+n/2, n, n), x)

w(n) = 0.8*cos(n/230.0*pi/2.0)

shift(x,n) = x-0.8*n/850.0

plot for [n=1:230:2] 'bar.dat' u (shift($0,n)):(f($1,n)):(w(n)) with boxes lc rgbcolor colour

with a result as in this graph

It should be really easy to modify the script to accommodate more data sets. Well, this is for now. I don't actually know what I will write about next time, but I am sure that there will be something!

Cheers,

Zoltán

Some time ago, I showed a method with which we could add a "frame" to a symbol. If you recall, what we did was to plot everything twice, and in order to duplicate our data set, we used a simple gawk script. Now, there is another way of doing this, one which does not rely on the gawk script, in fact, on any external script. I will discuss this method today. The gist of the trick is discussed in the old post, therefore, you are encouraged to cast, at least, a cursory glance at that, if you haven't yet done it.

As I have already pointed out, we had to duplicate our data set. To be more accurate, we haven't got to duplicate anything, we have simply got to plot the data twice. Now, the difficulty is that if we do this in a primitive way, issuing the plot command twice, and taking the same data set, the points might overlap, leading to some undesired results. So, the task is to plot the data set twice, but to plot each point twice in succession, and not the data set as a whole. For this, we will use the for loop introduced in gnuplot 4.4, and the 'every' keyword. To cut a long story short, I give my script here, and discuss it afterwards.

reset

plot 'new_symbol1.dat' u 0:2

red_n = GPVAL_DATA_X_MAX

plot 'new_symbol2.dat' u 0:2

blue_n = GPVAL_DATA_X_MAX

plot 'new_symbol3.dat' u 0:2

green_n = GPVAL_DATA_X_MAX

parity(n) = (n/2.0 == int(n/2.0) ? 0 : 1)

size(n) = 2 - parity(n)*0.4

colour(n,r,g,b) = sprintf("#%02X%02X%02X", parity(n)*r, parity(n)*g, parity(n)*b)

unset key

set border back

plot for [n=0:2*red_n+1] 'new_symbol1.dat' using 1:2 \

every ::(n/2)::(n/2) with p pt 7 ps size(n) lc rgb colour(n,255,0,0) ,\

for [n=0:2*blue_n+1] 'new_symbol2.dat' using 1:2 \

every ::(n/2)::(n/2) with p pt 9 ps size(n) lc rgb colour(n,100,100,255) ,\

for [n=0:2*green_n+1] 'new_symbol3.dat' using 1:2 \

every ::(n/2)::(n/2) with p pt 5 ps size(n) lc rgb colour(n,0,150,0)

Then, let us see what we have here! The first 6 lines are only to retrieve the number of data points in our data sets. If you know this from somewhere else, you can skip these, with the caveat that 'red_n', 'blue_n', and 'green_n' should still be defined somewhere.

Next we define three functions, the first of which determines the parity of an integer, returning 1, if the number is odd, and 0, if it is even. The second function returns a number, depending on the parity of its argument. Surprising as it is, this function will determine the size of the symbol, when we plot. Finally, the third function returns a string, which is equal to the colour given by the triplet (r,g,b), if the first argument, 'n', is odd, and black, if the first argument is even. At this point, it should be clear that we could have defined a function that returns a different colour for even numbers.
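To make the order of the passes explicit, here is a small Python sketch of the same three functions. The helper 'passes' and its argument 'last_index' are invented for this illustration; 'last_index' stands for red_n and its siblings, i.e., the zero-based index of the last data point, and n // 2 mimics the integer division in every ::(n/2)::(n/2).

```python
def parity(n):
    """Return 1 for odd n and 0 for even n."""
    return n % 2


def size(n):
    """Even passes get the larger symbol, odd passes the smaller one."""
    return 2 - parity(n) * 0.4


def colour(n, r, g, b):
    """Black on even passes, the requested colour on odd passes."""
    return "#%02X%02X%02X" % (parity(n) * r, parity(n) * g, parity(n) * b)


def passes(last_index, rgb):
    """List (point index, symbol size, colour) for n = 0 .. 2*last_index+1."""
    return [(n // 2, size(n), colour(n, *rgb))
            for n in range(2 * last_index + 2)]


for entry in passes(2, (255, 0, 0)):
    print(entry)
```

Every point index appears twice in a row, first with the large black symbol and then with the smaller coloured one, and that is all the frame is.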

We are done with everything, but the plotting, so let us do that! As you see, for each data set, we step through the numbers, but not once, but twice: first plotting in black, and second, plotting with some decent colour. At the same time, we change the symbol size, so that the black symbols are always a bit bigger, than the red, blue, or green. Once all three plots have been called, the following graph will appear:

We can see that the symbols overlap each other, as they should. Now, what about the keys, should we need them? Well, that requires some handwork, but it is not hard, actually. The following self-explanatory script should do

set label 1 'Red symbols' at 1.3, 8 left

plot for [n=0:2*red_n+1] 'new_symbol1.dat' using 1:2 \

every ::(n/2)::(n/2) with p pt 7 ps size(n) lc rgb colour(n,255,0,0), \

n=0, '-' using 1:2 with p pt 7 ps size(n) lc rgb colour(n,255,0,0), \

n=1, '-' using 1:2 with p pt 7 ps size(n) lc rgb colour(n,255,0,0)

1 8

e

1 8

e

and this produces the following figure

Yesterday, I discussed a method for adding an edge to an arbitrary symbol. If you recall (or roll down on this page), the idea was to trick gnuplot into plotting our data file twice, but in a way that each point was plotted twice in succession. Now, what if we plotted more times? There was really nothing special about the number 2, so there is no reason why we could not do this. But if we can, then we should, and see what comes out of it. With very small modifications, our script from yesterday can be turned into a bubble graph, like this

So, let us see how the machinery works!

reset

plot 'new_bubble1.dat' u 0:2

red_n = GPVAL_DATA_X_MAX

plot 'new_bubble2.dat' u 0:2

blue_n = GPVAL_DATA_X_MAX

plot 'new_bubble3.dat' u 0:2

green_n = GPVAL_DATA_X_MAX

rem(x,n) = x - n*(x/n)

size(x,n) = 3*(1-0.8*rem(x,n)/n)

c(x,n) = floor(240.0*rem(x,n)/n)

red(x,n) = sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n))

blue(x,n) = sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255)

green(x,n) = sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n))

posx(X,x,n) = X + 0.03*rem(x,n)/n

posy(Y,x,n) = Y + 0.03*rem(x,n)/n

unset key

set border back

level = 40

plot for [n=0:level*(red_n+1)-1] 'new_bubble1.dat' using (posx($1,n,level)):(posy($2,n,level)) \

every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \

for [n=0:level*(blue_n+1)-1] 'new_bubble2.dat' using (posx($1,n,level)):(posy($2,n,level)) \

every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \

for [n=0:level*(green_n+1)-1] 'new_bubble3.dat' using (posx($1,n,level)):(posy($2,n,level)) \

every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)

Again, the first three plots are there for determining the sample size, and nothing more. We, thus, start out with a number of function definitions. The first one is a remainder function, the second one uses the remainder to return the size of the bubble, the third one is a simple helper function, returning values between 0 and 240, and red, blue, and green determine the colour of our bubbles. If you look carefully, you will notice that these colours are successively whiter as the remainder increases. Finally, again by making use of our remainder function, we define two position shifts: in order to give the impression that the bubbles are lit from the top right corner, we have to shift successive circles in that direction. The value of this shift is important in the sense that, if chosen too high, the circles belonging to the same data point will no longer cover each other. (This is not necessarily a tragedy, see below.)

Then we decide to have 40 colour levels (we could have anything up to 255, although it might be a bit time consuming and unnecessary), and call our plots. The structure is the same as it was yesterday: we use a for loop for each data set, move the circles a bit, and set the colours to whiter shades. That is all.
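For the curious, the following Python sketch lists what a single data point goes through during its 40 passes. It mirrors the gnuplot functions above, with x // n standing in for gnuplot's integer division; the helper 'shift' and the list 'one_bubble' are names made up for this illustration.

```python
import math


def rem(x, n):
    """Remainder of x with respect to n, built from integer division."""
    return x - n * (x // n)


def size(x, n):
    """Symbol size: 3 on the first pass, shrinking afterwards."""
    return 3 * (1 - 0.8 * rem(x, n) / n)


def c(x, n):
    """Colour component between 0 and 240, growing with the remainder."""
    return math.floor(240.0 * rem(x, n) / n)


def red(x, n):
    """Red that becomes whiter as the remainder increases."""
    return "#%02X%02X%02X" % (255, c(x, n), c(x, n))


def shift(x, n):
    """Offset added to both coordinates (towards the top right)."""
    return 0.03 * rem(x, n) / n


level = 40
# The passes that belong to the first data point: n = 0 .. level-1.
one_bubble = [(size(n, level), red(n, level), shift(n, level))
              for n in range(level)]

print(one_bubble[0])
print(one_bubble[-1])
```

The first pass is the largest, fully red and unshifted; the last one is the smallest, almost white, and sits closest to the imaginary light source.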

Now, what happens, if we take too big a value for the shift? This, actually, might lead to interesting effects, as shown in this graph, where droplets represent the data points.

After having seen the simplest implementation, we should ask whether it is possible to add some decorations. E.g., whether it is possible to add a thin black edge to the symbols. It is relatively simple, as the following script shows. We only have to re-define some of our functions as follows

size(x,n) = (rem(x,n) == 0 ? 3.3 : 3*(1-0.8*rem(x,n)/n))

c(x,n) = floor(240.0*rem(x,n)/n)

red(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n)))

blue(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255))

green(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n)))

posx(X,x,n) = (rem(x,n) < 2 ? X : X + 0.03*rem(x,n)/n)

posy(Y,x,n) = (rem(x,n) < 2 ? Y : Y + 0.03*rem(x,n)/n)

All these functions do is to check whether we are plotting the first round, and if so, set the colour to black. There is a small difference in the shifts, for we do not move the circles, if they are in the first or the second round. The reason is obvious, as is the result

OK, so we can plot bubbles, with or without black circumference, but we would also like to add a legend. Well, that is simple, in fact, nothing could be simpler. Just add the following three lines to our code

set label 1 'Red bubbles' at 9,6 left

set label 2 'Blue bubbles' at 9,5 left

set label 3 'Green bubbles' at 9,4 left

and the following six

for [n=0:level-1] 'new_bubble1.dat' using (posx(8.5,n,level)):(posy(6,n,level)) \

every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \

for [n=0:level-1] 'new_bubble2.dat' using (posx(8.5,n,level)):(posy(5,n,level)) \

every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \

for [n=0:level-1] 'new_bubble3.dat' using (posx(8.5,n,level)):(posy(4,n,level)) \

every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)

and we are done! All we do here is to plot our data files in a silly way: we plot a single point at (8.5,6), (8.5,5), and (8.5,4). The plotting of the data file does not happen in this sense, we use it for convenience's sake only. (This trick can also be used for the post from yesterday.) There, you have it!

The other day, I would have needed a couple of curved arrows on my plot, so I started to work out a method to get what I wanted. This, however, turned out to be rather interesting, so I thought that I would share the details with you.

First, we should just define what I mean by a curved arrow. Perhaps, the easiest way to define it is to show a plot, similar to this

In gnuplot, when one wants an arrow, one can invoke the following command:

set arrow from 0,0 to 1,1

or something similar. This will produce a straight arrow from (0,0) to (1,1). But what if we wanted to have an arrow which is not straight? Well, in this case, we set a very short arrow, and draw a curve separately. The key to this is to set the arrow in such a way that it is tangential to the curve at the end point. It is easy to see that the following script would just do that

reset

unset key

eps = 0.001

set style arrow 1 head filled size screen 0.03, 15, 45 lt -1

cut(x,x1,x3) = ((x >= x1 && x <= x1 + (1.0-eps)*(x3-x1)) ? 1.0 : 1/0)

f(x) = 0.5+(x-1)*(x-1.2)*(x-1.4)

x1 = 0.5

x3 = 1.95

new_x = x1 + (1.0-eps)*(x3-x1)

set arrow from new_x, f(new_x) to x3,f(x3) as 1

plot [0:3] sin(x) with point ps 1 pt 6, f(x)*cut(x,x1,new_x) with line lt -1

First, we define an arrow style that we will use later. The arrow will be 0.03 screen sizes big, and the two angles determining the shape of the head are 15, and 45 degrees, respectively. Finally, we stipulate that the arrow be black, i.e. linetype -1. Then we define a window function, cut, which depends on the two end points, x1, and x3 (the reason for 3 will become clear soon), and the curve, f(x). In our plot, beyond what we actually want to plot, we will also plot f(x), but only between x1, and new_x, where new_x is a bit off with respect to the second end point. The degree of "bitness" is given by eps, which was defined at the beginning. However, before we actually plot anything, we have got to set the arrow, between new_x, f(new_x), and x3, f(x3). This construction ensures that the arrow is tangential to the curve.

At this point, we are ready to plot, which we actually execute in the next, and last line.

What we have created is great, but there are problems: first, we have to define our function, f(x), beforehand, we have to set the arrows by hand, and we also have to add the appropriate lines to our plot command. Quite tedious. There has got to be a better way!

For the sake of example, let us suppose that we want a curved arrow that, say, connects (0,0) and (1,1) via a parabola that passes through the point (0.5, 0.25). If we are really pressed for it, we could do the following: First, we have to figure out the parameters of our parabola. In this case, it is quite easy, for it is nothing but x*x. Then we would draw a parabola between (0,0), and (0.99, 0.9801), and then draw an arrow from (0.99, 0.9801) to (1,1).

First, let us see, how we figure out the parameters of our parabola! We have two end points, and a "control" point, i.e., we have to solve the following set of equations

y1 = a*x1*x1 + b*x1 + c

y2 = a*x2*x2 + b*x2 + c

y3 = a*x3*x3 + b*x3 + c

for the unknown a, b, and c. You can convince yourself that the following will do

denom(x1, x2, x3) = x1*x1*(x2-x3) + x1*(x3*x3-x2*x2) + x2*x3*(x2-x3)

A(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*y1 + (x3-x1)*y2 + (x1-x2)*y3 ) / denom(x1,x2,x3)

B(x1,y1,x2,y2,x3,y3) = ( (x3*x3-x2*x2)*y1 + (x1*x1-x3*x3)*y2 + (x2*x2-x1*x1)*y3 ) / denom(x1,x2,x3)

C(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*x2*x3*y1 + (x3-x1)*x1*x3*y2 + (x1-x2)*x1*x2*y3 ) / denom(x1,x2,x3)

a = A(x1,y1,x2,y2,x3,y3)

b = B(x1,y1,x2,y2,x3,y3)

c = C(x1,y1,x2,y2,x3,y3)
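If you do not feel like convincing yourself on paper, a quick numerical check does the job as well. The Python sketch below evaluates the same three formulas for the example mentioned earlier, i.e., the end points (0,0) and (1,1) with the control point (0.5, 0.25), for which we expect a = 1 and b = c = 0. The function name 'parabola' and the variable 'tail_x' are made up for this illustration, and eps = 0.01 is simply the value that reproduces the (0.99, 0.9801) of the text.

```python
def denom(x1, x2, x3):
    """Common denominator; it equals (x1-x2)*(x1-x3)*(x2-x3)."""
    return x1 * x1 * (x2 - x3) + x1 * (x3 * x3 - x2 * x2) + x2 * x3 * (x2 - x3)


def parabola(x1, y1, x2, y2, x3, y3):
    """Return (a, b, c) of y = a*x*x + b*x + c through the three points."""
    d = denom(x1, x2, x3)
    a = ((x2 - x3) * y1 + (x3 - x1) * y2 + (x1 - x2) * y3) / d
    b = ((x3 * x3 - x2 * x2) * y1 + (x1 * x1 - x3 * x3) * y2
         + (x2 * x2 - x1 * x1) * y3) / d
    c = ((x2 - x3) * x2 * x3 * y1 + (x3 - x1) * x1 * x3 * y2
         + (x1 - x2) * x1 * x2 * y3) / d
    return a, b, c


# End points (0,0) and (1,1), control point (0.5, 0.25).
a, b, c = parabola(0.0, 0.0, 0.5, 0.25, 1.0, 1.0)

# Tail of the short arrow, a little before the end point.
eps = 0.01
tail_x = 0.0 + (1.0 - eps) * (1.0 - 0.0)
tail_y = a * tail_x * tail_x + b * tail_x + c

print(a, b, c)
print(tail_x, tail_y)
```

The three points must have different x coordinates, otherwise the denominator vanishes.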

We have done most of the hard work, the only thing that remains is how we "automate" this whole machinery, i.e., what do we do, if we have several arrows that we want to set. Again, as so many times in the past, we will utilise this new notion of function definition: the fact that a function is not only a x -> f(x) mapping, but this mapping, and a set of possibly unrelated instructions. What we will do is to define a "function" that sets our arrows, and, as the supplementary instruction, augments the plot command accordingly. First, let us take the following function definition

arrow(x1,y1,x2,y2,x3,y3) = (new_x = x1 + (1.0-eps)*(x3-x1), \

a = A(x1,y1,x2,y2,x3,y3), b = B(x1,y1,x2,y2,x3,y3), c = C(x1,y1,x2,y2,x3,y3), \

PLOT = PLOT.sprintf(", cut(x,%f,%f)*(%f*x*x+%f*x+%f) with lines lt -1", x1, x3, a, b, c), \

ARROW.sprintf("set arrow from %f, %f to %f,%f as 1; ", new_x, a*new_x*new_x + b*new_x + c, x3, y3))

and try to understand what it does! For a start, it takes 6 arguments, which are nothing but the coordinates of the end points, and the control point. Then, it defines new_x, which we have already seen in the first example. In the next step, based on the 6 input arguments, it calculates the three parameters of our parabola, and in the next line, adds the plot of this parabola to a string called PLOT. When adding to PLOT, we simply use the sprintf function. In the last line, we concatenate a string called ARROW, and another one, produced by another sprintf. It is easy to see that this sprintf returns the definition of an arrow between new_x, f(new_x), and x3, f(x3). We should also note that this line is the last line, which consequently means that whatever happens here is returned.

At this point we are really done, we only have to "populate" our plot. The full script takes on the form

reset

unset key

eps = 0.01

set style arrow 1 head filled size screen 0.03, 15, 45 lt -1

cut(x,x1,x3) = ((x >= x1 && x <= x1 + (1.0-eps)*(x3-x1)) ? 1.0 : 1/0)

denom(x1, x2, x3) = x1*x1*(x2-x3) + x1*(x3*x3-x2*x2) + x2*x3*(x2-x3)

A(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*y1 + (x3-x1)*y2 + (x1-x2)*y3 ) / denom(x1,x2,x3)

B(x1,y1,x2,y2,x3,y3) = ( (x3*x3-x2*x2)*y1 + (x1*x1-x3*x3)*y2 + (x2*x2-x1*x1)*y3 ) / denom(x1,x2,x3)

C(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*x2*x3*y1 + (x3-x1)*x1*x3*y2 + (x1-x2)*x1*x2*y3 ) / denom(x1,x2,x3)

ARROW = ""

PLOT = "p [0:3] sin(x) w p ps 1 pt 6"

arrow(x1,y1,x2,y2,x3,y3) = (new_x = x1 + (1.0-eps)*(x3-x1), \

a = A(x1,y1,x2,y2,x3,y3), b = B(x1,y1,x2,y2,x3,y3), c = C(x1,y1,x2,y2,x3,y3), \

PLOT = PLOT.sprintf(", cut(x,%f,%f)*(%f*x*x+%f*x+%f) with lines lt -1", x1, x3, a, b, c), \

ARROW.sprintf("set arrow from %f, %f to %f,%f as 1; ", new_x, a*new_x*new_x + b*new_x + c, x3, y3))

ARROW = arrow(0,0,1,1.5,pi/2,1.03)

ARROW = arrow(0,0,1,0.3,pi/2,0.97)

eval(ARROW)

eval(PLOT)

which would result in the graph shown here:

Now it is clear what was PLOT: it is nothing, but the actual plot that we want to have. This is the string to which we concatenate our parabolae, one by one, every time we define a new arrow. After we defined all our arrows, we have two strings, ARROW, and PLOT. As such, they are no good, they will become instructions only when we evaluate them. That is what we do in the last two lines.

I would like to point out that my main reason for posting this was not that it can be used for creating curved arrows, but that this method is quite general. First, we can add to the plot, if that is needed, without having to keep track of all the tiny details. Second, the set command can be "fooled" by using the sprintf function. With the help of the string augmentation and the eval command, we can actually use parameters in our set instruction very efficiently.

Well, this is for today. I am waiting for suggestions as to what we should discuss next time. Cheers,

Zoltán

In a comment last week, someone asked whether it was possible to draw a Marimekko plot in gnuplot.

First, we will need a data file, and for the sake of conformity with the question of the commenter, I will just use this

"" France Germany Japan Nauru

Defense 9163 4857 2648 9437

Agriculture 3547 5378 1831 1948

Education 7722 7445 731 9822

Industry 4837 147 3449 6111

"Silly walks" 3441 7297 308 7386

(We can already deduce that with the sole exception of Japan, countries spend a large chunk of their GDP on silly walk.)

Now, our first attempt could be this:

reset

file = 'marimekko.dat'

set style data histograms

set style histogram columnstacked

set style fill solid border -1

set boxwidth 1.0

set xrange [-1:5]

set yrange [0:5e4]

plot newhistogram at 0, file u 2 title col, \

newhistogram at 1, file u 3 title col, \

newhistogram at 2, file u 4 title col, \

newhistogram at 3, file u 5 title col

and it should be quite obvious that this is not what we want:

Let us try to improve on the figure, step by step. First, we will place the histograms in a multiplot, for that will make life a lot easier: this is our only way of manipulating the column width during the plot. In this spirit, our second script will be this:

reset

file = 'marimekko.dat'

set style data histograms

set style histogram columnstacked

set style fill solid border -1

set boxwidth 1.0

set xrange [-1:5]

set yrange [0:5e4]

set multiplot

plot newhistogram at 0, file u (f($2)) title col

plot newhistogram at 1, file u (f($3)) title col

plot newhistogram at 2, file u (f($4)) title col

set boxwidth 0.3

plot newhistogram at 2.65, file u (f($5)) title col

unset multiplot

This is somewhat better, for the colours are now consistent, and we also see that the last column has a different width. We also see how the positioning works: the right hand side of Japan's column is at 2.5, and since the width of Nauru's column is 0.3, its centre has got to be shifted by 0.15 with respect to 2.5. That adds up to 2.65. However, if we watch closely, we will also notice that the ytics and labels are drawn four times; after all, we have four plots. What, if we unset the ytics after the first plot? Well, we would end up with this

Rather upsetting! The problem is that once the tics are unset, the size of the figure changes, so we can no longer count on the plots' proper alignment. However, there is an easy remedy for this: all we have to do is not to unset the ytics, but to set them invisible. That is, we can do

plot newhistogram at 0, file u (f($2)) title col

set ytics ("      " 30000)

plot newhistogram at 1, file u (f($3)) title col

plot newhistogram at 2, file u (f($4)) title col

set boxwidth 0.3

plot newhistogram at 2.65, file u (f($5)) title col

where we have 6 white spaces in the quote. You might wonder why on Earth 6. Well, the answer is that the label "30000" is actually " 30000", which takes up 6 characters' space. With this trick, we get

We have already achieved quite a lot, and slowly, but surely, we are getting to our goal. Just do not despair!

The next thing that we would need is proper scaling of the columns: we want all of them to be between 0 and 100 (%), i.e., we would have to sum all columns first, and then divide the values by the sum. And that is the snag: we have four columns, and we have to do the summing for each column independently, and before the final plots. Otherwise, our multiplot will be messed up. And this is where the array comes in handy: if we just had an array, and could retrieve values from it, we would be saved. And of course, we can do this. Let us take a small detour!

If we think about it, the array (5, 4, 6, 7, 8) is nothing but a finite series: its first element is 5, second element is 4, and so on. But we could also look at the series as a function: a mapping from the set of natural numbers to, well, to anything. In the example above, to natural numbers. It doesn't matter. My point is that an array is a function, a function for which h(0) = 5, h(1) = 4, h(2) = 6, h(3) = 7, and h(4) = 8. As long as this is true, we do not care what h(1.1) is. We need the function's values only at integer numbers. Then the only question is how we could define this function "on the fly". Being a physicist, and a lazy man, I would propose the following:

g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)

h(x) = 5 * g(x,0) + 4 * g(x,1) + 6 * g(x,2) + 7 * g(x,3) + 8 * g(x,4)

g(x,a) is (apart from some numerical factors) nothing but a very primitive representation of a Dirac-delta, centred on 'a'. You can convince yourself that h(x) defined in this way fulfils the requirements above.
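The same idea can be tried out in a few lines of Python, if that is closer at hand. This is only a sketch of the principle: the helper 'make_array' is a name invented for this illustration, and it builds h from a list of values in the same way as the sum above.

```python
def g(x, a):
    """Crude 'delta': 1 if x is within 0.1 of a, and 0 otherwise."""
    return 1 if abs(x - a) < 0.1 else 0


def make_array(values):
    """Return a function h with h(i) == values[i] at the integers."""
    def h(x):
        return sum(value * g(x, index) for index, value in enumerate(values))
    return h


h = make_array([5, 4, 6, 7, 8])

print([h(i) for i in range(5)])
print(h(1.5))
```

Between the integers the function simply returns zero, which does no harm, since we never ask for those values.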

After this digression, let us see what we can do with this, and issue the following commands!

ARRAY = "h(x) = 0"

array(x, counter) = ( ARRAY.sprintf(" + %f*g(x,%d)", x/100.0, counter+1) )

ff(x, counter) = (($0 > 0 ? ARRAY = array(x, counter-1) : 1), total = total + x, x)

plot 'marimekko.dat' using 0:(ff($2, 2))

At this point, the variable ARRAY should look something like this

ARRAY = "h(x) = 0 + 91.630000*g(x,2) + 35.470000*g(x,2) + 77.220000*g(x,2) + 48.370000*g(x,2) + 34.410000*g(x,2)"

and if we evaluate it, the function value h(2) returns the sum of the numbers in the second column. (Apart from a factor of 100, of course.) Note that in order to take out the first line, which is the header, we have to use the condition

($0 > 0 ? ARRAY = array(x, counter-1) : 1)

which updates ARRAY only if we are processing the second record, at least. Also note that in order to get the sum of all columns, all we have to do is call this plot as many times as there are columns. In the light of this, our next script could be this

reset

file = 'marimekko.dat'

col = 4

g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)

ARRAY = "h(x) = 0"

array(x, counter) = ( ARRAY.sprintf(" + %f*g(x,%d)", x/100.0, counter) )

ff(x, counter) = (($0 > 0 ? ARRAY = array(x, counter) : 1), x)

plot for [i=2:col+1] 'marimekko.dat' using 0:(ff(column(i), i))

set xrange [-1:3]

set yrange [0:110]

eval(ARRAY);

set style data histograms

set style histogram columnstacked

set style fill solid border -1

set boxwidth 1.0

set multiplot

plot newhistogram at 0, file u ($2/h(2)) title col

set ytics ("" 20)

plot newhistogram at 1, file u ($3/h(3)) title col

plot newhistogram at 2, file u ($4/h(4)) title col

set boxwidth 0.3

plot newhistogram at 2.65, file u ($5/h(5)) title col

unset multiplot

and this results in this figure

So, we are almost there: the columns are rescaled, and placed neatly next to each other. The only missing ingredient is the setting of the widths. But that is really easy: we only have to determine what the grand total is, and then scale the columns accordingly. Our script can, then, be modified as follows

reset

file = 'marimekko.dat'

col = 4

total = 0.0

g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)

ARRAY = "h(x) = 0"

array(x, counter) = ( ARRAY.sprintf(" + %f*g(x,%d)", x/100.0, counter+1) )

ff(x, counter) = (($0 > 0 ? ARRAY = array(x, counter-1) : 1), total = total + x, x)

plot for [i=2:col+1] 'marimekko.dat' using 0:(ff(column(i), i))

set xrange [-0.3:1]

set yrange [0:110]

eval(ARRAY);

set style data histograms

set style histogram columnstacked

set style fill solid border -1

total = total / 100.0

position = 0.0

set multiplot

set boxwidth h(2)/total

plot newhistogram at position, file u ($2/h(2)) title col

set ytics ("" 20)

set boxwidth h(3)/total; position = position + (h(2)+h(3))/total/2.0

plot newhistogram at position, file u ($3/h(3)) title col

set boxwidth h(4)/total; position = position + (h(3)+h(4))/total/2.0

plot newhistogram at position, file u ($4/h(4)) title col

set boxwidth h(5)/total; position = position + (h(4)+h(5))/total/2.0

plot newhistogram at position, file u ($5/h(5)) title col

unset multiplot

and this is what we wanted!
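The widths and positions are easy to get wrong by a factor of two, so here is a short Python sketch that recomputes them from the data file above. It only illustrates the bookkeeping: each column's width is its share of the grand total, and each centre is the previous centre plus half of the two neighbouring widths. The variable names are made up for this illustration.

```python
names = ["France", "Germany", "Japan", "Nauru"]
columns = [
    [9163, 3547, 7722, 4837, 3441],   # France
    [4857, 5378, 7445, 147, 7297],    # Germany
    [2648, 1831, 731, 3449, 308],     # Japan
    [9437, 1948, 9822, 6111, 7386],   # Nauru
]

# Column sums and the grand total.
sums = [sum(column) for column in columns]
grand_total = sum(sums)

# Box widths: the share of each country in the grand total.
widths = [s / grand_total for s in sums]

# Box centres: the first one sits at 0, and each following one is
# shifted by half of the previous width plus half of its own width.
positions = [0.0]
for previous, current in zip(widths, widths[1:]):
    positions.append(positions[-1] + (previous + current) / 2.0)

for name, width, position in zip(names, widths, positions):
    print(name, round(width, 4), round(position, 4))
```

Since the first box is centred on zero, the whole chart spans from minus half of the first width up to one minus half of the first width.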

Adding labels to the rectangles is relatively easy: we could do the following

plot file using (position):(l($5)):5 with labels tc rgb "#ffffff"

where l(x) is a function that keeps track of the previous values of the column, and adds them as new values are processed. The definition of this function should be trivial.

The last thing that I would add here is that by using macros, we can tidy up the script: we no longer would need all those long and repetitive lines. In fact, we could also add another instruction to our 'ff' function, which would generate the plot command. The advantage of that is that in this way, we do not have to repeat the plot commands four times: we simply put that in our for loop, and then evaluate the resulting string. I discussed this trick in my last post, so, if you are interested in the details, you can look it up there.