Benchmarks for QISLib_MT and QISBool (via clipextract)

Benchmarks

Benchmarking is difficult because each benchmark must produce a specific output, in a specific way, that matches our client's needs. At the same time, we are trying to collect enough data points to understand how the programs perform with GDSII files of different sizes, on different hardware, and so on.

Benchmark 1 - GDSII Input, Windows, GDSII Output

Input File - P8.gds (9 GB); layers 40, 43, 46, 47 and 49 processed simultaneously.

Operating System - Windows 7 Pro SP1 64 bit

Hardware - Intel Core i7-3930K CPU, 6 cores with HyperThreading, 3.2 GHz; 32 GB RAM; SSD system disk (500 MB/sec read/write)

Benchmark Flow

The benchmark is performed using clipextract, an application (and a library) that uses QISLib_MT to extract data from a supplied window and QISBool to clip and unionize the polygons passed to it.

The Clip Extract calling application uses both QISLib_MT and QISBool to generate many small clips from a large layout.

Step-by-Step

1. The Clip Extract executable reads a list of randomly generated 50 x 50 um windows from a file.

2. Windows are queued up and sent to one of the threaded exploders.

3. The exploder thread traverses the hierarchy, extracting any boundaries or paths that cross or lie inside the window.

4. Geometries are accumulated in a polygon buffer. Paths are converted to polygons.

5. QISBool reads from the polygon buffer and optionally clips the polygons where they cross the window edge. It can also be instructed to fully unionize all touching and overlapping polygons that remain after clipping.

6. Results are stored in another buffer, labeled V in the diagram. The input and output buffers have identical formats. QISBool itself is multithreaded.

7. If the Clip Extract application has been instructed to output GDSII, then the Clip Extract Library will format and write the contents of the V buffer to disk.

8. If the user prefers to read the clipped vectors directly from buffer V, a callback function provides a pointer to the buffer.

The diagram does not show opening the GDSII file, building the quad tree, or loading the entity data into RAM.
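
The step-by-step flow above can be sketched as a small producer/consumer pipeline. This is a minimal illustration under stated assumptions, not the clipextract implementation: the names run_pipeline, touches, boolean_op and on_clip are hypothetical, a flat Python list of polygons stands in for the GDSII hierarchy and quad tree, and a conservative bounding-box test stands in for the exploder's real selection. The Python threads show the structure only; the GIL prevents CPU-bound work from running in parallel.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in um


def touches(poly: Polygon, win: Window) -> bool:
    """Hypothetical, conservative test: does the polygon's bounding box overlap the window?"""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    wx0, wy0, wx1, wy1 = win
    return min(xs) < wx1 and max(xs) > wx0 and min(ys) < wy1 and max(ys) > wy0


def run_pipeline(windows: List[Window],
                 layout: List[Polygon],
                 n_exploders: int,
                 boolean_op: Optional[Callable[[List[Polygon], Window], List[Polygon]]],
                 on_clip: Callable[[int, List[Polygon]], None]) -> None:
    # Step 2: queue every window, then one stop sentinel per exploder thread.
    work: queue.Queue = queue.Queue()
    for item in enumerate(windows):
        work.put(item)
    for _ in range(n_exploders):
        work.put(None)

    def exploder() -> None:
        while True:
            item = work.get()
            if item is None:
                return
            idx, win = item
            # Steps 3-4: gather geometry touching the window into a polygon buffer.
            buffer = [p for p in layout if touches(p, win)]
            # Steps 5-6: optional Boolean pass (clip and/or union) fills buffer "V".
            v = boolean_op(buffer, win) if boolean_op else buffer
            # Steps 7-8: hand V to the consumer (a GDSII writer or a user callback).
            on_clip(idx, v)

    threads = [threading.Thread(target=exploder) for _ in range(n_exploders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Passing boolean_op=None corresponds to NO CLIP-NO UNION; the on_clip callback plays the role of step 8, and a GDSII writer would be one possible on_clip.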

Output Clipping and Unionization Options

The number of windows that can be processed per second depends strongly on the choice of clipping and unionization. There are three options: NO CLIP-NO UNION, CLIP-NO UNION and CLIP-UNION.

NO CLIP -- NO UNION

This puts essentially zero load on QISBool: the exploder extracts any polygon (or path) that crosses or is enclosed by the clipping window, and the clipextract library immediately converts those to GDSII. In this case one would allocate all threads to the exploder, and throughput is essentially exploder-limited. Is this output useful to the receiving system? That depends on what is being done with the GDSII.
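
In the NO CLIP-NO UNION case, converting an extracted polygon to GDSII amounts to writing a BOUNDARY element. The sketch below encodes one polygon using the GDSII stream record layout (a 2-byte big-endian record length, a 1-byte record type and a 1-byte data type, followed by the payload). The helper names record and boundary are hypothetical, and this is not the clipextract library's writer; a complete file would also need the library- and structure-level records (HEADER, BGNLIB, UNITS, BGNSTR and so on), which are omitted here.

```python
import struct
from typing import List, Tuple

# GDSII stream record types and data types used by a BOUNDARY element.
BOUNDARY, LAYER, DATATYPE, XY, ENDEL = 0x08, 0x0D, 0x0E, 0x10, 0x11
NO_DATA, INT16, INT32 = 0x00, 0x02, 0x03


def record(rtype: int, dtype: int, payload: bytes = b"") -> bytes:
    """Header = 2-byte big-endian total length (header included), record type, data type."""
    return struct.pack(">HBB", 4 + len(payload), rtype, dtype) + payload


def boundary(layer: int, datatype: int, pts: List[Tuple[int, int]]) -> bytes:
    """Encode one polygon (integer database units) as a BOUNDARY element."""
    if pts[0] != pts[-1]:
        pts = pts + [pts[0]]  # a GDSII boundary repeats its first point to close
    xy = b"".join(struct.pack(">ii", x, y) for x, y in pts)
    return (record(BOUNDARY, NO_DATA)
            + record(LAYER, INT16, struct.pack(">h", layer))
            + record(DATATYPE, INT16, struct.pack(">h", datatype))
            + record(XY, INT32, xy)
            + record(ENDEL, NO_DATA))
```

For example, boundary(40, 0, [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]) yields a 64-byte element for a square on layer 40: 4 + 6 + 6 + 44 + 4 bytes for the BOUNDARY, LAYER, DATATYPE, XY (five points) and ENDEL records.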

CLIP -- NO UNION

QISBool must examine each polygon; if part of a polygon extends past the window edge, it is clipped and healed to the edge. This takes more computation than the first case, but not as much as fully unionizing all the polygons in the window.
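
Clipping a polygon to a rectangular window can be illustrated with the textbook Sutherland-Hodgman algorithm. This is a swapped-in technique for illustration only; the source does not say which algorithm QISBool uses, and clip_to_window is a hypothetical name. Vertices are given as an open list (no repeated closing point).

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def clip_to_window(poly: List[Point], xmin: float, ymin: float,
                   xmax: float, ymax: float) -> List[Point]:
    """Sutherland-Hodgman: clip a polygon (open vertex list) to an axis-aligned window."""

    def clip_edge(pts: List[Point], inside: Callable[[Point], bool],
                  cross: Callable[[Point, Point], Point]) -> List[Point]:
        out: List[Point] = []
        for i, cur in enumerate(pts):
            prev = pts[i - 1]  # i == 0 wraps around to the last vertex
            if inside(cur):
                if not inside(prev):
                    out.append(cross(prev, cur))  # entering: add the crossing point
                out.append(cur)
            elif inside(prev):
                out.append(cross(prev, cur))      # leaving: add the crossing point
        return out

    def at_x(x: float) -> Callable[[Point, Point], Point]:
        def cross(p: Point, q: Point) -> Point:
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return cross

    def at_y(y: float) -> Callable[[Point, Point], Point]:
        def cross(p: Point, q: Point) -> Point:
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return cross

    # Clip successively against the left, right, bottom and top window edges.
    edges = [
        (lambda p: p[0] >= xmin, at_x(xmin)),
        (lambda p: p[0] <= xmax, at_x(xmax)),
        (lambda p: p[1] >= ymin, at_y(ymin)),
        (lambda p: p[1] <= ymax, at_y(ymax)),
    ]
    pts = list(poly)
    for inside, cross in edges:
        if not pts:
            break
        pts = clip_edge(pts, inside, cross)
    return pts
```

Sutherland-Hodgman is exact for convex inputs such as the rectangles that dominate this benchmark. For a concave polygon that leaves and re-enters the window, it can emit zero-width edges along the window border, which a production clipper would remove or split.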

CLIP -- UNION

This requires not only clipping the polygons but also unionizing the entire set, making it the most compute-intensive option. Because this computation grows with the number of vertices, QISBool can subdivide the problem by assigning stripes to different threads. However, those threads are taken away from the exploder and from other QISBool instances that could be running simultaneously, so the thread allocation that gives the best performance is no longer obvious.
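
The striping idea can be sketched by splitting the window into horizontal stripes and processing each stripe on its own thread. To keep the example short, each stripe computes the area of the union of axis-aligned rectangles (via coordinate compression) rather than the union polygons themselves. The function names and the rectangle-only input are assumptions for illustration, not QISBool's API or algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), axis-aligned


def union_area(rects: List[Rect]) -> float:
    """Area covered by the union of rectangles, via coordinate compression."""
    xs = sorted({v for r in rects for v in (r[0], r[2])})
    ys = sorted({v for r in rects for v in (r[1], r[3])})
    area = 0.0
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            # A grid cell is covered iff its midpoint lies in some rectangle.
            cx, cy = (xs[i] + xs[i + 1]) / 2, (ys[j] + ys[j + 1]) / 2
            if any(r[0] <= cx <= r[2] and r[1] <= cy <= r[3] for r in rects):
                area += (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
    return area


def stripe_union_area(rects: List[Rect], window: Rect, n_stripes: int) -> float:
    """Clip rectangles to the window, split it into stripes, union each stripe on a thread."""
    wx0, wy0, wx1, wy1 = window
    h = (wy1 - wy0) / n_stripes

    def stripe(k: int) -> float:
        sy0 = wy0 + k * h
        sy1 = wy1 if k == n_stripes - 1 else wy0 + (k + 1) * h
        clipped = [(max(r[0], wx0), max(r[1], sy0), min(r[2], wx1), min(r[3], sy1))
                   for r in rects]
        # Keep only pieces with positive area inside this stripe.
        return union_area([c for c in clipped if c[0] < c[2] and c[1] < c[3]])

    with ThreadPoolExecutor(max_workers=n_stripes) as pool:
        return sum(pool.map(stripe, range(n_stripes)))
```

Summing per-stripe areas works because the stripes are disjoint. A union that outputs polygons would also have to merge pieces that straddle stripe boundaries, and every thread spent on stripes is a thread not available to the exploder.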

Results for Benchmark 1

Windows - 1000 randomly selected clipping windows, each 50 x 50 um.

Polygon Density - approximately 8351 polygons per window per layer x 5 layers (almost all of these are rectangles, counted at 5 vertices per rectangle)

Reported timings do not include the GDSII scan/load times.
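
As a rough sense of scale, and approximating every polygon as a 5-vertex rectangle (the figures above say almost all of them are), the density numbers multiply out as follows:

```latex
\begin{aligned}
8351 \times 5 \text{ layers} &\approx 41{,}755 \text{ polygons per window} \\
41{,}755 \times 5 \text{ vertices} &\approx 208{,}775 \text{ vertices per window} \\
208{,}775 \times 1000 \text{ windows} &\approx 2.09 \times 10^{8} \text{ vertices in total}
\end{aligned}
```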

Effect of Output Processing on Clip Rate

a) If we are not doing any Boolean operations (NO CLIP-NO UNION), we are best off allocating all available CPU cores to the exploder.

b) Clipping is not much work for the Boolean, so we are better off with 6 exploder threads and only one Boolean thread per exploder.

c) Union is a lot of work for the Boolean, so the tables have turned: we are better off with more threads enabled for the Boolean and fewer exploder threads.
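
Because the best split depends on the output mode, one way to find it is to sweep the candidate allocations and keep the fastest. The sketch below is hypothetical: it models each exploder as feeding its own Boolean stage with b threads (so one allocation uses e * (1 + b) threads), and measure(e, b) is a placeholder for running the benchmark and returning windows per second.

```python
from typing import Callable, List, Tuple


def candidate_allocations(hw_threads: int) -> List[Tuple[int, int]]:
    """(exploders, Boolean threads per exploder) pairs that fit the thread budget."""
    return [(e, b)
            for e in range(1, hw_threads + 1)
            for b in range(hw_threads)
            if e * (1 + b) <= hw_threads]


def best_allocation(hw_threads: int,
                    measure: Callable[[int, int], float]) -> Tuple[int, int]:
    """Return the allocation with the highest measured throughput (windows/sec)."""
    return max(candidate_allocations(hw_threads), key=lambda eb: measure(*eb))
```

On the benchmark machine's 12 logical processors (6 cores with HyperThreading), candidate_allocations(12) yields 35 pairs, including (12, 0) for case a), where b = 0 means no Boolean threads, and (6, 1) for case b).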