Given enough time, maybe I could figure this out, but I don't have time to spare with a lot of tasks ahead of me, so I'm relying on the experts here as a cheat sheet.
The problem is that I have 70 years of daily rainfall records for 12 stations. For some stations in some years, many days are missing, while for other combinations of station and year the number of measurements exceeds the number of days in a year, which suggests some pattern of double counting.
Here is the initial query; the join is only there to get the name of the station:
SELECT EXTRACT(YEAR FROM rainfall.datetime) AS year,
       rain_gauge.name,
       count(datetime)
FROM rainfall
JOIN rain_gauge ON rainfall.station_code = rain_gauge.code
GROUP BY EXTRACT(YEAR FROM rainfall.datetime), rain_gauge.name
ORDER BY count(datetime) DESC NULLS LAST, year, rain_gauge.name;
Here are the top ten results:
1996 "MOYALE, KE" 951
1994 "MOYALE, KE" 945
1995 "MOYALE, KE" 945
1997 "MOYALE, KE" 902
1993 "MOYALE, KE" 768
1974 "MOYALE, KE" 713
1973 "MOYALE, KE" 710
1980 "MOYALE, KE" 707
1962 "MOYALE, KE" 706
1976 "MOYALE, KE" 706
If count is in ascending order (lowest ten):
1975 "ELDORET INTERNATIONAL, KE" 1
1982 "ELDORET INTERNATIONAL, KE" 1
2016 "GARISSA, KE" 1
2017 "GARISSA, KE" 1
1973 "ELDORET INTERNATIONAL, KE" 2
2000 "NAIROBI DAGORETTI, KE" 2
2006 "MALINDI, KE" 2
2011 "MANDERA, KE" 2
2021 "NAIROBI DAGORETTI, KE" 2
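Counts far above 366 can only happen if some station/day combinations occur more than once. A rough check I could run to list those days is sketched below. It assumes that `datetime` may carry a time-of-day part (hence the cast to `date`) and that `station_code` identifies a station uniquely; I have not verified either assumption.

```sql
-- Sketch only: lists station/day combinations that have more than one record.
-- Assumption: duplicates may differ by time of day, so compare on the date part.
SELECT station_code,
       datetime::date AS day,
       count(*)       AS n_records
FROM rainfall
GROUP BY station_code, datetime::date
HAVING count(*) > 1
ORDER BY n_records DESC, station_code, day;
```

If this returns rows mostly for one station, the double counting is probably tied to how that station's data was loaded rather than to the measurements themselves.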
Since there are 365 days in a year, I add a HAVING condition that allows for roughly 5 missing days a year and accounts for leap years:
SELECT EXTRACT(YEAR FROM rainfall.datetime) AS year,
       rain_gauge.name,
       count(datetime)
FROM rainfall
JOIN rain_gauge ON rainfall.station_code = rain_gauge.code
GROUP BY EXTRACT(YEAR FROM rainfall.datetime), rain_gauge.name
HAVING count(datetime) > 360 AND count(datetime) < 367
ORDER BY year, rain_gauge.name, count(datetime) DESC NULLS LAST;
I get 195 rows, each a combination of year and station with a more or less complete dataset. What I need now is to extract the underlying daily records for those combinations out of the aggregate query and put them into a 'clean' table. The first rows of the result look like this:
1950 "MOYALE, KE" 365
1950 "MUSOMA, TZ" 364
1950 "TORORO, UG" 365
1951 "MOYALE, KE" 365
1951 "MUSOMA, TZ" 365
1951 "TORORO, UG" 365
1952 "MOYALE, KE" 366
1952 "MUSOMA, TZ" 366
1952 "TORORO, UG" 366
1953 "MOYALE, KE" 365
Perhaps a CTE would be the solution, or a window function, or a combination of the two. Please guide me in the right direction.
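To make the idea concrete, here is an untested sketch of the CTE route I have in mind: the aggregate query becomes a CTE of "good" station/year pairs, and the daily rows are joined back to it. The table name `rainfall_clean` is made up for illustration, and I group by `station_code` instead of the station name on the assumption that each code maps to exactly one name.

```sql
-- Sketch only: rainfall_clean is a hypothetical target table.
CREATE TABLE rainfall_clean AS
WITH complete_years AS (
    -- station/year combinations with a more or less complete record
    SELECT station_code,
           EXTRACT(YEAR FROM datetime) AS year
    FROM rainfall
    GROUP BY station_code, EXTRACT(YEAR FROM datetime)
    HAVING count(datetime) > 360 AND count(datetime) < 367
)
-- keep only the daily rows that belong to one of those combinations
SELECT r.*
FROM rainfall r
JOIN complete_years c
  ON c.station_code = r.station_code
 AND c.year = EXTRACT(YEAR FROM r.datetime);
```

I suspect the same filter could be written with `count(*) OVER (PARTITION BY station_code, EXTRACT(YEAR FROM datetime))` in a subquery and a `WHERE` on that count, but I don't know which form is preferable.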