Fast summaries
|
Developer Tools | www.onegasoft.com |
This article explains how to get summary data of a found set in a fast and flexible manner using GetSummary() function and provides detailed
information about a few possible implementations: getting a summary report in a text form, including only summaries that meet given criteria, and producing cross-tab
reports.
It's a long article, so if you'd like to have it printed get
the PDF version with FileMaker samples ( 1.1M).
This article is accompanied by
sample files ( 1.1M), that contain a tutorial file you can work with while reading the article and a completed tutorial with all the scripts
and reports ready and working. The files aren't absolutely necessary, but might be a convenient sample to look at. To get them please use the link on the left or on the bottom of the page.
The sample file contains imaginary sales data:
City Amount Date Oakland $564 05/29/2001 Raymond 276 05/08/2001 Burlington 103 04/04/2001 Burlington 763 04/10/2001 Chester 45 05/15/2001 Weston 892 04/17/2001 ... ... ...
This gives us quite a choice of possible summary and subsummary reports.
Let's try to make a simple summary report "manually". Of course, it won't be true manual labor: we will use FileMaker summary fields and functions, but won't use reports. Assume we are to send summarized data by e-mail and want to get it in a text form.
The report we need is total amount of sales in each city in a given found set. We don't know what the found set is; we need to summarize just any found set.
A straightforward approach would be to run through the found set, get Amount of each record we have and somehow summarize the values by city. It's doable, but won't it be slow? Sure it will, unless you have only a few records. Is there a better way?
Have you ever exported summarized data? Try it if you don't. Even a huge number of records, that takes awhile to find and sort is exported in seconds. It would be fine if we could calculate a summary with such a speed.
The good news is that we can. Of course, we still have to perform find and sort and they will take all the time they need, but it's unavoidable in any case. But the rest will be at the maximum possible speed.
When we make a summary report we sort records by a field we want to summarize them by. Therefore a set of data ready to be summarized contains big chunks of records with same value in this sorting field.
For example, if we had a found set of 100,000 records of 20 cities, the found set sorted by City would contain only 20 such chunks.
The main idea of making a summary "manually" is to
loop through these chunks, not the entire found
set. After all, if we were able to get all the data of a
given chunk by visiting only one record from it, we
wouldn't need the rest of the chunk. And FileMaker has a
function that does exactly that: GetSummary()
function.
How it works? It takes two parameters: the name of a summary field and the name of a break field, that is the field by which the database is sorted. For example, if we have summary field Amount Total, sort database by City, go to any "Burlington" record and use
GetSummary(Amount Total, City)
we'll have the total of all sales in Burlington in the current found set. Same is for any other city, of course.
That's good, but how are we to move through chunks, not records? It's also pretty easy. We'll use the function with another summary field Amount Count, that counts records. Then
GetSummary(Amount Count, City)
will give us a number of records in a chunk! We can use
the number to skip the whole chunk and get to the next
one using Go to record [by field
value] script step.
Let's put it all together step by step.
Step 1. Open the tutorial file. It already has basic data, summary fields and some extra fields and scripts that will be explained later. Now add two global fields: gReport (text) and gPosition (number).
Step 2. Open ScriptMaker and begin writing a script Summarize:
Perform Script [Sort By City] Set Field [gReport, ""] Set Field [gPosition, 1] Loop Go to Record/Request/Page [gPosition] End Loop
Let's discuss it a bit while it's simple: here you can see how the script initializes its variables and goes to the first record. Now we need to write our chunk-looping routine and make sure the script will leave the loop when necessary (and it's just a good practice to write loop ending statements as soon as loop is created).
Step 3. Add the following script steps:
... Loop Go to Record/Request/Page [gPosition] Set Field [gPosition, gPosition + GetSummary(Amount Count, City)] Exit Loop If [gPosition > Status(CurrentFoundCount)] End Loop
Pretty simple, isn't it? As soon as we come to a chunk (the first record of it to be precise) we get the number of records in it and calculate where the next chunk starts. When we finally come to the last chunk the same calc will evaluate to total number of records inthe found set + 1; we use this to leave the loop.
You might want to check the script's work step-by-step1. It doesn't produce any data, but it runs correctly: it visits the first record of each chunk and leaves the loop after the last one.
This is the basic method of manual working with summaries, that can satisfy many purposes depending on commands that form the functional part. Now its function is to give us a basic summary.
Step 4. Add the following script step
... Loop Go to Record/Request/Page [gPosition] Set Field [gPosition, gPosition + GetSummary(Amount Count, City)] Set Field [gReport, gReport & City & " "& GetSummary(Amount Total, City) & "¶"] Exit Loop If [gPosition > Status(CurrentFoundCount)] End Loop
The final touch should be to add grand summary at the end. To get grand summary of whole found set you should specify the summary field as the break field.
Step 5. Finally add the following script step at the end of the script:
... End Loop Set Field [gReport, gReport & "Total "& GetSummary(Amount Total, Amount Total)]
Now it's time to test this script in a big found set. Of course, the result will vary on different machines, but the overall impression is that it's pretty fast. If you test it further you'll see that the number of records in a found set has a very little effect to the speed; only the number of chunks matters.
What is the possible usage of the technique? Of course, it's useful only for summary reports and, since FileMaker has rather good capabilities of those, only for reports that beyond FileMaker's. FileMaker isn't too advanced a system (which makes us love it so much), but it definitely covers about 80-90% of typical user's need and even summary reports are often underused. But who knows what problem you'll be solving next month? May be it will require you to, say, include only those summaries that fall into a given range? To make a cross-tab report?
These samples show us two main applications of this technique. Since you can can get summaries very fast, you have time to analyze them and do something with the results, for example, exclude some data. And since you can get summaries very fast, you can format them in ways standard reports doesn't allow.
I'm going to guide you in both these areas. As a sample of analysis I'll use a standard summary report with a non-standard requirement: to include only data above a given value. As a sample of advanced formatting I'll use a cross-tab report, that is rather difficult to make in "native" FileMaker.
Sometimes you need to make a summary report that looks like standard, but requires a found set to be based on... summary data. For example, your final report must contains only summaries higher or lower than a given value or fall into a given range. Since it's impossible to search within a found set by summary values, such a report cannot be done in standard FileMaker. The Òfast summariesÓ technique fits here just right. We'll use it to make a quick summary analyzer to be run after the data are found and sorted and before the report is printed.
Let's continue with our sample file and assume we are to include only those cities, whose total amount of sales is is greater than a value, specified by global number gThreshold. The report must work with any found set, so we'll write only the part that starts when the found set it ready.
Step 1. Add a global number gThreshold.
Before adding basic stuff let's think how we are to deal with data chunks that don't fit our criteria. If we come to a chunk whose summary is below the threshold, we need to remove the whole chunk from our final report. How to do it?
There are different possibilities, but the simplest is to
omit these records from the found set (note that
Omit command doesn't change the current sort
status). If fact, it isn't simple, because FileMaker
doesn't provide us a way to omit an arbitrary number of
records. It can omit only a fixed number of them. Though
this wise decision deserves some kind words, we'd better
keep them and write a tiny script to omit any number of
records.
Since the script isn't directly related to our main
subject, I described it in appendix A. If you're
interested, you can examine it later. Now just assume we
have a script Omit (gOmitNumber) (the sample
file does) that omits a number of records specified by
global number gOmitNumber and continue.
So if we come to a chunk that is to be excluded, we omit it. Such an operation will move us right to the next chunk (if any), so we don't need to use Go to record script step. We'll use such a step if a chunk fits our condition. It means we need to rewrite our basic stuff a little:
Step 2. Add the basic stuff:
Perform Script [Sort by City]
Set Field [gPosition, 1]
Go to Record/Request/Page [gPosition]
Loop
If[GetSummary(Amount Total, City) >= gThreshold]
Set Field [gPosition, gPosition + GetSummary(Amount Count, City)]
Go to Record/Request/Page [gPosition]
End If
Exit Loop If [gPosition > Status(CurrentFoundCount)]
End Loop
This part does half of the job: it skips chunks that fit our criteria, so their data will appear in the report.
Step 3. Now let's code our main functionality:
...
If[GetSummary(Amount Total, City) >= gThreshold]
...
Else
Set Field [gOmitNumber, GetSummary(Amount Count, City)]
Peform Script[Omit (gOmitNumber)]
End If
...
Note that we don't need to change the Exit Loop
If condition; it still works just the same. That's all
and it's that simple. You can make the If
[...] statement as complicated and intellectual as you
need and be able to include and exclude report parts by
any criteria based on summary values: just create summary
fields to get the data and write formulas to analyze
them. It's not a problem to include only values that fall
into a given range or don't differ much from average or
anything like that.
Cross-tab reports are rather difficult to do in FileMaker; sometimes it's simpler to, say, import raw data into Microsoft Excel and perform analysis using its "pivot table" feature. Fast summaries provide a rather easy and fast way to have cross-tabs report right in FileMaker.
In fact it isn't so easy as two samples above: it will
take longer to explain it and the final script is going
to be about three times larger than Summary with
Threshold, but compared to other cross-tab reporting
methods it's very simple. Other methods I know typically
use a separate file that has a bunch of calculations with
aggregate functions to display data. Such methods are
rather cumbersome and not flexible in fact; for example,
they won't work with an arbitrary found set and the
number of columns is often fixed.
The principle of making a cross-tab report using "fast summaries" is similar to the first sample that gets summaries as text. A whole cross-tab report is placed into a single text field with columns created by tabs. The method not only works with any found set, it's also quite tolerant to the number of rows and even number of columns the report is going to have. Its limitations are those of FileMaker, which allows you to have up to 20 tabs and 64,000 characters in a text field. For the report it means not more than 20 columns and so many rows as would fit in 64,000 chars; seems to be enough for almost all imaginable cross-tab reports. By the way it's not a problem to split report's data into several fields, so really there are almost no limits. All this stuff would "cost" you a couple of scripts and a dozen of global fields; isn't it a good deal?
Setting tab stops in FileMaker is similar to word processing; if you haven't ever done it, please refer to "Specifying paragraph attributes and tab settings" section in FileMaker Pro Help. In the tutorial file gReport already has all the necessary tab stops.
Let's make a simple cross-tab report from our sales data. We need to put cities at the left and weeks at the top and enter the volume of sales for each city-and-week pair. We also need subsummaries by city and by week and a grand summary. The final report should look like this:
| City | Week 18 | Week 19 | Week 20 | Total |
| Arlington | 5833 | 9666 | 2011 | 17510 |
| Florence | 6796 | 6683 | 4022 | 17501 |
| Lexington | 6261 | 10879 | 2420 | 19560 |
| Raymond | 10198 | 7193 | 1563 | 18954 |
| Riverside | 4586 | 8965 | 5122 | 18673 |
| Total | 33674 | 43386 | 15138 | 92198 |
At this moment we're going to put a whole report into a single field. Note that it means all the report will be set in one font style. In most cases it's fine; later I'll briefly explain how to cut off headers and totals into separate fields to format them differently.
First let's rewrite our sample report without columns to better understand what data it contains.
| Data chunk | Summary |
| Arlington | |
| Week 18 | 5833 |
| Week 19 | 9666 |
| Week 20 | 2011 |
| Total of Arlington | 17510 |
| Florence | |
| Week 18 | 6796 |
| Week 19 | 6683 |
| Week 20 | 4022 |
| Total of Florence | 17501 |
| ... |
You can see that this is a typical subsummary report that
has no difficulties at all, except formatting. Since our
sales data contain exact date, we need one more field to
calculate the date's week. In the sample I use a
calculated field Week with the formula
Right("00000"& Int(Date/7), 6).
This gives us a fixed-length week identifier which is, as
you'll see later, a bit more
convenient.
The proper sort order for our report is that:
Will our technique work with it?
Sure it will. From the "fast summaries" point of view the data consist of big chunks of same city divided into smaller chunks of same week within this city. So the basic stuff is same: we sort our found set by city and week and then jump through chunks of data and put them into our text field. The only difference is that we need to form rows and columns. This is also simple: when we come to a new city chunk we start a new row and enter the city name; then we jump through week chunks and store their data in cells of this row, separating each entry with a tab; when we leave the last chunk we put the city summary in the last cell and finish the row.
Step 1. Add a new script with the following script steps.
Perform Script [Sort by City and Week] Set Field [gReport, ""] Set Field [gPosition, 1] Loop Exit Loop If [gPosition > Status(CurrentFoundCount)] End Loop
To handle cities and weeks we need two loops. The outer loop will form rows and the inner one will make cells within a row.
Step 2. Create a global number field gNextRow and add the following steps to the script:
Perform Script [Sort by City and Week]
Set Field [gReport, ""]
Set Field [gPosition, 1]
Set Field [gNextRow, 1]
Loop
Set Field [gNextRow, gNextRow + GetSummary(Amount Count, City)]
Loop
Go to Record/Request/Page [gPosition]
Set Field[gPosition, gPosition + GetSummary(Amount Count, Week)]
Exit Loop If [gPosition = gNextRow]
End Loop
Exit Loop If [gPosition > Status(CurrentFoundCount)]
End Loop
You can see that we use the familiar method to jump through small week chunks and introduce a slightly modified variation to catch the row ends.
The basic stuff is ready. Now we're going to make the functional part and will use tab characters to make cells. A convenient method to work with them is to create a "constant" field with tab character (the sample file already has such a field, a global text field cTab).
Step 3. Create a global number gRowTotal. Add the following steps to the script:
...
Loop
Set Field [gNextRow, gNextRow + GetSummary(Amount Count, City]
Set Field [gReport, gReport & City & cTab]
Set Field [gRowTotal, GetSummary(Amount Total, City)]
Loop
...
End Loop
Set Field [gReport, gReport & cTab & gRowTotal & "¶"]
Exit Loop If [gPosition > Status(CurrentFoundCount)]
End Loop
Since we'll be adding a row total right after we left the city chunk, its summary value will be unaccessible. That's why we cache the value in the intermediate field gRowTotal.
Step 4. Finally add the following command in the inner loop:
...
Loop Go to Record/Request/Page [gPosition] Set Field[gPosition, gPosition + GetSummary(Amount Count, Week)] Set Field [gReport, gReport & cTab & GetSummary(Amount Total, Week)] Exit Loop If [gPosition = gNextRow] End Loop ...
Is that all? Not yet. Though the script is correct and will work (you can even test it), we are still far from the finish. First, the script doesn't make the report header, the last line with subsummary values and the grand summary, but this just a minor part. The main problem comes from the fact some cells may be empty. For example, its quite likely our found set to have the following data:
| City | Week 18 | Week 19 | Week 20 | Total |
| Arlington | 0 | 9666 | 2011 | 11677 |
| Florence | 6796 | 0 | 4022 | 10818 |
In our raw data an empty cell is represented by... nothing. If a city doesn't have any sales on a given week then there won't be such records in the database. The found set for the sample above will look like this:
| Data chunk | Summary |
| Arlington | |
| Week 19 | 9666 |
| Week 20 | 2011 |
| Total of Arlington | 11677 |
| Florence | |
| Week 18 | 6796 |
| Week 20 | 4022 |
| Total of Florence | 10818 |
The corresponding data chunk simply doesn't exist, but a cell for it (with an extra tab and, perhaps, a zero or dash) must be created. It means each time we jump to a chunk we must already know which column it belongs to.
The thing we're missing is a header, that is a list of all week identifiers in a whole found set in a proper (sorted) order. If we had it, we would be able to calculate the position of a given week identifier within the list and, therefore, the week's column.
Though it might be possible to calculate the header "externally", that is from from search criteria or something like, we'll get the it from the found set itself. This method is more universal and allows us to get the report's header and footer at the same time.
The simplest way to do it is same as getting a summary in a text form. We need to sort our found set by a week, jump through week chunks and collect week identifiers and header and footer data into global fields. Then we sort the found set by city and week and make the report, using the identifiers to calculate columns and inserting the cached header and footer into appropriate places.
This isn't the fastest method, because it requires the found set to be sorted twice. Depending on the number of records and type of fields it may be rather slow. It's possible to avoid it and sort records only once; the "completed" file even contains such a script. Unfortunately it's more complex and doesn't fit well for the article, which is already long enough, so I decided not to explain it here. By the way in the sample file it works only one or two seconds faster, so it might be worth studying only if you face really time-consuming sort order. Let's continue with the slower but simpler one.
Step 5. Create global text fields gWeeks, gHeader, gFooter.
Step 6. Add the following commands to the beginning of the script:
Perform Script [Sort by Week] Set Field [gPosition, 1] Set Field [gWeeks, ""] Set Field [gHeader, "City"] Set Field [gFooter, "Total"] Loop Go to Record/Request/Page [gPosition] Set Field [gPosition, gPosition + GetSummary(Amount Count, Week] Set Field [gWeeks, gWeeks & Week & "¶")] Set Field [gHeader, gHeader & cTab & Week Display] Set Field [gFooter, gFooter & cTab & GetSummary(Amount Total, Week)] Exit Loop If [gPosition > Status(CurrentFoundCount)] End Loop Perform Script [Sort by City and Week] ...
The script is very simple and I think you don't need any explanations. When this part finishes, we'll have everything we need to make the report:
First let's look where to place the cached parts of the report:
Step 7. gHeader should be inserted into the
initial Set Field [gReport]:
... Perform Script [Sort by City and Week] Set Field [gReport, gHeader & cTab & "Total¶"] Set Field [gPosition, 1] ...
Step 8. gFooter goes to the very end of the script:
... End Loop Set Field [gReport, gReport & gFooter & cTab & GetSummary(Amount Total, Amount Total)]
To calculate a column we’ll use two global number fields gLastColumn and gColumn. The former will store the number of the last processed column, the latter will be set to the number of the current one and the difference between them will give us the number of empty chunks between the columns.
Since there could be more than one empty cell, a good solution would be to make a template of empty row: a list of zeros or dashes or whatever you're going to use to represent an empty value, separated by tabs. There should be so many zeros or dashes as there could be columns in the tab. With such a template we'll be able to insert any number of empty cells in one step. The best place to make such a row is the week-looping script we have just made.
Step 9. Create a global text field gEmptyRow. Add the following commands to the first part of the script:
Perform Script [Sort by Week] ... Set Field [gEmptyRow, ""] Loop ... Set Field [gEmptyRow, gEmptyRow & cTab & "0"] Exit Loop If [gPosition > Status(CurrentFoundCount)] End Loop ...
Step 10. Now we're ready to rewrite the main part. Create global numbers gColumn, gLastColumn, and gValue and modify the main part of the script:
...
Loop
...
Set Field [gLastColumn, 0]
Loop
...
Set Field [gValue, GetSummary(Amount Total, Week)]
Set Field [gColumn, WordCount(Left(gWeeks, Position(gWeeks, Week, 1, 1))]
Set Field [gReport, gReport & Left(gEmptyRow, (gColumn - gLastColumn - 1) * 2) & cTab & gValue]
Set Field [gLastColumn, gColumn]
...
End Loop
Set Field [gReport, gReport & Left(gEmptyRow, (WordCount(gWeeks) - gColumn) * 2)
...
End Loop
Note the formula we use to calculate a column:
WordCount(Left(gWeeks, Position(gWeeks, Week, 1, 1)))
Since our Week identifier is fixed-length, we don't need to put traditional "¶" around fields because fixed-length keys just cannot mess up. The the formula becomes simpler and far more readable. This is the advantage I told you about on page pageref.
It may happen some of the rightmost columns are empty; to handle this we repeat the empty column
check after the loop with a slightly different formula. The "magic" number 2 in these functions is just
Length(cTab & ”0”). If you’re going to use another value to represent an empty cell, for example, n/a or
none, you’ll have to change the number to get proper results.
The only purpose of the intermediate field gValue is to make the formulas simpler.
Guess what? We are ready! The cross-tab report routine has got real power! It makes the whole report with all necessary parts and cannot be fooled by missing data chunks.
We already did it; note that we have collected some values into intermediate fields and then inserted them into the final report. You can also use this method to really split a report into several fields, for example, to format them differently. Just create separate fields for header, reading column, body, both totals and grand summary and modify the corresponding script steps. You'll also have to carefully arrange the fields and set their sliding and shrinking options to make the report look nice.
If a field is empty it won't be counted by summary
function. Since we need to know a number of records in a
chunk despite of whether they contains a value in a
particular field or not, we'd better to make sure the
field we're counting does have a value. A typical
serial number or widely used constant =1
will do.
GetSummary() doesn't work with related fields!In most cases you run a report from a file that doesn't have real data from related files, only internal identifiers. In real database solution a file similar to our sample won't have city names, since they aren't guaranteed to be unique. Instead it will have city identifers, most likely serial numbers. Of course, in a report cities must be sorted by their names, not by their identifiers.
All this looks simple until you discover that
GetSummary() refuses to accept a reference to
a related field as the break field parameter. If
you try to do it, you'll get an empty result. Since this
behavior isn't documented and there's no problems with
related fields in standard summary reports, I consider it
a bug. The function just doesn't know what a related
field is; if you create a local calculation that gets
that field data and sort by it, GetSummary()
will work just fine. By the way, this is the main
workaround to this problem. Another possible solutions
are to lookup the necessary data as records are being
created (not quite safe) or to sort records twice: first
by the related filed and then by the serial and use
GetSummary() with the latter (too complex).
You know that standard FileMaker reports can be sorted by summary fields; unfortunately this sort order doesn't really sort records: it affects only printed or previewed reports. It's documented, so it should be considered a feature.
If the records were actually sorted, it would be easy to make reports of type Top N using "fast summaries" in the same manner we did the report with a threshold. Though we can still use the method to make such reports, it seems to be too much for the article.
Instead let's briefly review yet another possible method that can solve many summary report problems.
In the beginning of the article I mentioned exporting summarized data because of its remarkable speed; why not to use the exported records for reports?
Technically you'll need a dedicated "reports" file and an intermediate file to hold the data. First, you sort a found set the way you need, then export summarized data, import they into the "reports" file and work as with ordinary records. All the summaries that were so difficult to get in main files now in your full control.
Simple reports, such as those described in this article just doesn't deserve a separate file, and a complex cross-tab report won't be much easier to get, but for many advanced summary reports this is certainly an option - if you can spare one more file, of course.
There are 10 kinds of people: those that
understand binary and those that don't.
I think this idea was first time publicly released by Matt Petrowsky. We'll make a script that can omit 1, 2, 4, 8 and so on records (the "so on" is easier to understand if we rewrite the sequence as 20, 21, 22, 23,...) and then combine these commands to omit an exact number we need. It will be possible with any number, because basically it's same as writing the number as binary. Since any number can be written this way, any number of records can be omitted.
The only problem is that a script could contain only a fixed number of steps so it has to be big enough to omit any number we might need.
Create a global number field gOmitNumber. It's only purpose is to be a parameter for our script. The script itself is fairly straightforward:
If[Mod(Int(sNumber/2^0), 2)] Omit Multiple[No dialog, 1] End If
What it does? It checks if the number written as a binary digit contains 1 in the position of ones and then omits such a number of records. If the number doesn't have 1 in this place, it skips this step and continues. All the other steps are basically same:
... If[Mod(Int(sNumber/2^1), 2)] Omit Multiple[No dialog, 2] End If If[Mod(Int(var>sNumber/2^2), 2)] Omit Multiple[No dialog, 4] End If If[Mod(Int(sNumber/2^3), 2)] Omit Multiple[No dialog, 8] End If
First it goes slowly, but speeds up quite soon:
... If[Mod(Int(sNumber/2^18), 2)] Omit Multiple[No dialog, 262144] End If If[Mod(Int(sNumber/2^19), 2)] Omit Multiple[No dialog, 524288] End If If[Mod(Int(sNumber/2^20), 2)] Omit Multiple[No dialog, 1048576] End If
These 21 steps (they start from zero) should be enough for most applications. Such a script can omit any number of records from 1 to 2,097,152 (221-1) which should be ehough for most applications. If your database has more records, just add some more steps. Only ten more steps will give you more than a billion.
You can also rewrite it a bit and make each step subtract the number of records it omits from gOmitNumber and then put the last step inside a loop that should end when gOmitNumber is empty, but seems to be less clear.
Of course, you have to make sure you run the script under right circumstances: for example, if there's not enough records the script will give you weird results.
It's a long article, so if you'd like to have it printed get
the PDF version with FileMaker samples ( 1.1M).
Copyright © 2000-2004
Mikhail Edoshin
If you have problems viewing this site, please drop a note to the
webmaster