The HARRY_READ_ME.txt file

Part 35a

And now, a brief interlude. As we've reached the stage of thinking about secondary variables, I
wondered about the CLIMAT updates, as one of the outstanding work items is to write routines to
convert CLIMAT and MCDW bulletins to CRU format (so that mergedb.for can read them). So I look at
a CLIMAT bulletin, and what's the first thing I notice? It's that there is absolutely no station
identification information apart from the WMO code. None. No lat/lon, no name, no country. Which
means that all the bells and whistles I built into mergedb (though they were needed for the db
merging of course) are surplus to requirements. The data must simply be added to whichever station
has the same number at the start, and there's no way to check it's right. I don't appear to have a
copy of an MCDW bulletin yet, only a PDF.. I wonder if that's the same? Anyway, back to the main job.

As I was examining the vap database, I noticed there was a 'wet' database. Could I not use that to
assist with rd0 generation? Well.. it's not documented, but then, none of the process is, so I might
as well bluff my way into it! Units seem to vary:

CLIMAT bulletins have day counts:

01 001 10152 10164 5 52 9 63 2 -32768 -32768 -12 20

Dave L's CLIMAT update has days x 10:

100100 7093 -867 9JAN MAYEN(NOR-NAVY) NORWAY 20002006 -7777777
2000 150 120 180 60 150 20 30 130 120 150 70 70

The existing 'wet' database (wet.0311061611.dtb) has days x 100:

10010 7093 -866 9 JAN MAYEN(NOR NAVY) NORWAY 1990 2003 -999 -999
1990-9999-9999-9999-9999 400 600 600 1800 1500 1100 800 1800

The published climatology has days x 100 as well:

Tyndall Centre grim file created on 13.01.2004 at 15:22 by Dr. Tim Mitchell
.wet = wet day frequency (days)
0.5deg lan clim:1961-90 MarkNew but adj so that wet= [Long=-180.00, 180.00] [Lati= -90.00, 90.00] [Grid X,Y= 720, 360]
[Boxes= 67420] [Years=1975-1975] [Multi= 0.0100] [Missing=-999]
Grid-ref= 1, 148
1760 1580 1790 1270 890 510 470 290 430 400 590 1160

So I guess we go with days x 100. Dave's files will have to be reformatted anyway so it's a
negligible overhead. Okaaaay..
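The rescaling above (whole day counts, days x 10, days x 100) boils down to one multiplication per source. A Python sketch of the idea, for illustration only - the actual converters here are Fortran, the source labels and the target missing-value code are my own:

```python
# Normalise monthly wet-day counts from the three sources above to the
# database convention of days x 100, preserving missing data.
MISSING_OUT = -9999  # assumed missing code in the target format

SOURCE_FACTOR = {
    "climat": 1,    # CLIMAT bulletins: whole day counts
    "dave_l": 10,   # Dave L's composites: days x 10
    "wet_db": 100,  # existing 'wet' database: already days x 100
}

def to_days_x100(value, source, missing_in=-32768):
    """Rescale one monthly value to days x 100; pass missing data through."""
    if value == missing_in:
        return MISSING_OUT
    return value * (100 // SOURCE_FACTOR[source])
```

So a CLIMAT day count of 5 becomes 500, and Dave L's 150 becomes 1500, matching the 'wet' database scale.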

Wrote dave2cru.for to convert Dave L's CLIMAT composites to CRU-format files in the appropriate
units. One problem is the significant number of stations without names or countries: they are
simply 'xxxxxxxxxx' and I'm not sure how mergedb is going to take to that! Well only one way to
find out.. so I converted the rain days data:

crua6[/cru/cruts/version_3_0/db] ./dave2cru

DAVE2CRU - convert Dave L CLIMAT composites to dtb files
Enter the CLIMAT composite to be converted: CLIMAT_MCDW_MCDW_rdy_updat_merged

Example data line from that file:
2000 150 120 180 60 150 20 30 130 120 150 70 70

Please enter a factor to apply (or 1): 10
Please enter the 3-ch parameter code: rd0

The output file will be: rd0.0708151122.dtb

3411 stations written.

Then tried to merge that into wet.0311061611.dtb, and immediately hit formatting issues - that pesky last
field has been badly abused here, taking values including:

nocode (yes, really!)

Had a quick review of mergedb; it won't be trivial to update it to treat that field as a8. So reluctantly,
changed all the 'nocode' entries to '0':

crua6[/cru/cruts/version_3_0/db/rd0] perl -pi -e 's/nocode/ 0/g' wet.0311061611.dt*

Unfortunately, that didn't solve the problems.. as there are alphanumerics in that field later on:

-712356 5492 -11782 665 SPRING CRK WOLVERINE CANADA 1969 1988 -999 307F0P9

So.. ***sigh***.. will have to alter mergedb.for to treat that field as alpha. Aaarrgghhh.
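Treating that field as alpha amounts to reading it as text and only interpreting it numerically when it happens to be numeric. A hypothetical Python sketch of the idea (mergedb itself is Fortran, reading the field with an a8 descriptor):

```python
def parse_last_field(raw):
    """Read the abused trailing field as text (Fortran a8 style).
    Returns an int where the content is numeric, otherwise the
    stripped string ('nocode', '307F0P9', ...) unchanged."""
    field = raw.strip()
    try:
        return int(field)
    except ValueError:
        return field  # alphanumeric codes pass through untouched
```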

Did that. Next problem is best summarised with an example:

* *
* *
100100 7093 -867 9 JAN MAYEN(NOR-NAVY) NORWAY 2000 2006 -999 0
* *
* This incoming station has a possible match in *
* the current database, but either the WMO code *
* or the lat/lon values differ. *
* *
* Incoming: *
100100 7093 -867 9 JAN MAYEN(NOR-NAVY) NORWAY 2000 2006 -999 0
* Potential match: *
10010 7093 -866 9 JAN MAYEN(NOR NAVY) NORWAY 1990 2003 -999 -999

Yes, the 'wet' database features old-style 5-digit WMO codes. The best approach is probably to alter
mergedb again, to multiply any 5-digit codes by 10. Not sure if there is a similar problem with 7-digit
codes, hopefully not.

Oh, more bloody delays. Modified mergedb to 'adjust' the WMO codes, fine. But then a proper run of it
just demonstrated that it's far too picky. Even a 0.01-degree difference in coordinates required ops
intervention. What we need for updates is an absolute priority for WMO codes, and only a shout if the
name or the spatial coordinates are waaay off. I am seriously considering scrapping mergedb in favour of
a version of auminmaxresync - its cloud-based approach and 'intelligent' matching is far more efficient
than mergedb's brute-force attack, as you'd expect from a program built on top of that knowledge. And it
does save all its actions. But I don't know that I have the wherewithal.. okay, I do.
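The intended update rule - absolute priority for the WMO code, and only a shout when the metadata is wildly off - might look like this in outline. A Python sketch; the tolerance is my guess, not a value from any of the programs:

```python
def match_verdict(wmo_in, lat_in, lon_in, wmo_db, lat_db, lon_db,
                  tol=0.5):  # degrees; the 'waaay off' threshold is assumed
    """Classify one incoming station against one database station."""
    if wmo_in != wmo_db:
        return "reject"  # WMO codes rule: no code match, no merge
    if abs(lat_in - lat_db) > tol or abs(lon_in - lon_db) > tol:
        return "ask"     # same code but coordinates are waaay off
    return "merge"       # small coordinate drift is tolerated silently
```

Under this rule a 0.01-degree coordinate difference merges without operator intervention, which is exactly what mergedb refused to do.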

Derived newmergedb.for from auminmaxresync.for. Should be fairly robust. Doesn't offer as many bells
and whistles as mergedb.for, but should be faster and more helpful all the same.

Well.. it works.. but the data doesn't. It's that old devil called WMO numbering again:

Comparing Update: 718000 4868 622 217 NANCY/ESSEY FRANCE 2001 2002 -999 0
..with Master: 718000 4665 -5306 28 CAPE RACE (MARS) CANADA 1920 1969 -999 -999

Now what's happened here? Well the CLIMAT numbering only gives five digits (71800) and so an extra zero
has been added to bring it up to six. Unfortunately, that's the wrong thing to do, because that's the code
of CAPE RACE. The six-digit code for NANCY/ESSEY is 071800. Mailed Phil and DL as this could be a big
problem - many of the Update stations have no other metadata!
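The padding mistake is easy to reproduce: appending the zero lands on CAPE RACE's code, while left-padding to six digits gives NANCY/ESSEY's. A Python illustration:

```python
def pad_right(code5):
    """What the update did: append a zero to the five-digit code."""
    return code5 + "0"

def pad_left(code5):
    """What it should have done: zero-fill at the left to six digits."""
    return code5.zfill(6)

# '71800' is the five-digit CLIMAT code for NANCY/ESSEY:
# pad_right gives '718000' (CAPE RACE), pad_left gives '071800' (correct)
```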

Also noticed that some of the CLIMAT data seemed to be missing, eg for NANCY/ESSEY:

718000 4868 622 217NANCY/ESSEY FRANCE 20002006 -7777777
2001-9999 110-9999-9999-9999-9999-9999 120 150 110 130 90
2002 80 160 70 70 80 30 60 120 100 130 180 140

I have the CLIMAT bulletin for 10/2006, which gives data for Rain Days (12 in this case). It doesn't seem
likely that nothing was reported after 2002.

I am now wondering whether it would be best to go back to the MCDW and CLIMAT bulletins themselves and work
directly from those.


Well, information is always useful. And I probably did know this once.. long ago. All official WMO codes
are five digits, countrycountrystationstationstation. However, we use seven-digit codes, because when no
official code is available we improvise with two extra digits. Now I can't see why we didn't leave the rest
at five digits, that would have been clear. I also can't see why, if we had to make them all seven digits,
we extended the 'legitimate' five-digit codes by multiplying by 100, instead of adding two numerically-
meaningless zeros at the most significant (left) end. But, that's what happened, and like everything else
that's the way it's staying.

So - incoming stations with WMO codes can only match stations with codes ending '00'. Put another way, for
comparison purposes any 7-digit codes ending '00' should be truncated to five digits.
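Since the 'legitimate' five-digit codes were extended by multiplying by 100, the comparison rule reduces to: strip the trailing '00' where present, and refuse to match otherwise. A Python sketch of that normalisation (the real logic lives in the Fortran merge program):

```python
def wmo_for_comparison(code7):
    """Reduce a 7-digit CRU station code to its 5-digit WMO code,
    or None when the station has no legitimate WMO code.
    Only codes ending '00' correspond to official WMO stations;
    anything else is an improvised code and can never match an
    incoming WMO-coded station."""
    if code7 % 100 == 0:
        return code7 // 100
    return None
```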

Also got the locations of the original CLIMAT and MCDW bulletins.

CLIMAT are here:

MCDW are here:

Downloaded all CLIMAT and MCDW bulletins (CLIMAT 01/2003 to 07/2007; MCDW 01/2003 to 06/2007 (with a
mysterious extra called 'ssm0302.Apr211542' - which turns out to be identical to ssm0302.fin)).

Wrote mcdw2cru.for and climat2cru.for, just guess what they do, go on..

uealogin1[/cru/cruts/version_3_0/incoming/MCDW] ./mcdw2cru

MCDW2CRU: Convert MCDW Bulletins to CRU Format

Enter the earliest MCDW file: ssm0301.fin
Enter the latest MCDW file (or for single files): ssm0706.fin

All Files Processed
tmp.0709071541.dtb: 2407 stations written
vap.0709071541.dtb: 2398 stations written
pre.0709071541.dtb: 2407 stations written
sun.0709071541.dtb: 1693 stations written

Thanks for playing! Byeee!

uealogin1[/cru/cruts/version_3_0/incoming/CLIMAT] ./climat2cru

CLIMAT2CRU: Convert MCDW Bulletins to CRU Format

Enter the earliest CLIMAT file: climat_data_200301.txt
Enter the latest CLIMAT file (or for single file): climat_data_200707.txt

All Files Processed
tmp.0709071547.dtb: 2881 stations written
vap.0709071547.dtb: 2870 stations written
pre.0709071547.dtb: 2878 stations written
sun.0709071547.dtb: 2020 stations written
tmn.0709071547.dtb: 2800 stations written
tmx.0709071547.dtb: 2800 stations written

Thanks for playing! Byeee!

Of course, it wasn't quite that simple. MCDW has an inexplicably complex format, which I'm sure will vary
over time and eventually break the converter. For instance, most text is left-justified, except the month
names for the overdue data, which are right-justified. Also, there is no missing value code, just blank
space if a value is absent. This necessitates reading everything as strings and then testing for content.
Oh, and a small amount of rain is marked 'T'.. as are small departures from the mean!!
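Reading everything as strings and testing for content might look like this. A Python sketch only - the real converter is Fortran, and mapping a trace to zero is my assumption about how 'T' should be handled:

```python
def parse_mcdw_value(field):
    """Interpret one fixed-width MCDW field that was read as text.
    MCDW has no missing-value code, so blank means absent; 'T' marks
    a trace amount (mapped to 0.0 here, an assumption); anything
    else is taken as a number."""
    field = field.strip()
    if not field:
        return None      # blank field: value absent
    if field == "T":
        return 0.0       # trace rainfall, or a trace departure from mean
    return float(field)
```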

So moan over, now we have a set of updates for the secondary databases. And, indeed for the primary ones -
except that I've already processed those, as updated by Dave L.. er.. ah well. So as I'm running stupidly
late anyway - why not find out? It's that Imp of the Perverse on my shoulder again.

Actually as I examined all the databases in the tree to work out what was wheat and what chaff, I had my
awful memory jogged quite nastily: WE NEED RAIN DAYS. So both conversion progs will need adjusting and
re-running!! Waaaaah! And frankly at 18:45 on a Friday evening.. it's not gonna happen right now.
