/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 |  luams-3.c
 |
 | A summary of this and other HPC projects is in high-performance.rhtf

19961015.xxxx Create LUAMS-2 to answer a new thread on sci.fractals about
the Mandelbrot area. It seems they want to do a new GIPPP on our newer,
faster machines.

19961017.2xxx Find my postings to sci.fractals from 1993. Notable points:
- I was about to run a GIPPP myself but called it off in order to
  accommodate the distance-estimate GIPPP, which at the time was more
  mathematically interesting.
- The 1.506595 result was calculated with a dwell limit of 512*1024 and
  period detection. It was an average of 40 runs, half of which straddled
  Re=-0.75 (Seahorse Valley) and the other half of which ran right down it.
- The error term 0.000002 came from the standard deviation for the 40 runs
  divided by sqrt(40).
- An additional 0.0000056 should have been subtracted to account for the
  dwell limit 524288 that was used in the 40 runs.
- It would be nice to calculate the total number of iterations iterated in
  the course of the area estimate.
- It takes 7 years for computing power to grow enough to use a 2x2 larger
  grid.
And other things I thought of while reading:
- Given the trend in stderr, it takes a little under 20 years for computing
  power to grow enough to gain one decimal digit.
- It would be nice to compute the center-of-gravity of the Mandelbrot Set
  at the same time as the area.

19961018.14xx. Add period detection.

  grid  itmax    area      stderr    time
    32    256    1.509723  0.039114  1.60
    32    512    1.500947  0.038372  0.63
    64    512    1.516968  0.015629  2.00
    64   1024    1.512485  0.016269  2.30
   128   1024    1.510655  0.007460  7.92
   128   2048    1.508534  0.007052  8.87
   256   2048    1.508178  0.002845  33.2
   256   4096    1.507292  0.002688  36.6
   512   4096    1.507517  0.001122  141
   512   8192    1.507116  0.001097  155
  1024   8192    1.506971  0.000489  607
  1024  16384    1.506779  0.000501  661

19961018.16xx. Add test for period-2 mu-atoms, but it doesn't work. Add
skewing along Im axis and code to deal with the first row differently.

    32    256    1.530539  0.039807  1.45
    32    512    1.524781  0.038548  0.63
    64    512    1.508679  0.017331  2.00
    64   1024    1.504898  0.019480  2.32
   128   1024    1.509324  0.005810  7.90
   128   2048    1.507078  0.005430  8.88
   256   2048    1.508248  0.002059  33.2
   256   4096    1.507333  0.002191  36.6
   512   4096    1.507676  0.001369  149
   512   8192    1.507299  0.001376  166
  1024   8192    1.507195  0.000489  675

19961018.1635. Fix period-2 test. Increase number of iterations greatly to
reduce that source of error.

19961018.1717. Add printout of total aggregate iterations.

    32     4096   1.512 842 730  0.035 980 710          771,615    2.07
    64     8192   1.502 917 751  0.019 617 425        2,714,431    4.40
   128    16384   1.505 115 841  0.005 550 857       10,725,998    12.3
   256    32768   1.506 413 466  0.002 277 286       45,943,157    49.6
   512    65536   1.506 898 665  0.001 368 183      200,294,650     218
  1024   131072   1.506 801 245  0.000 503 518      877,059,533     895
  2048   262144   1.506 675 767  0.000 151 678    3,830,269,312    3737
  4096   524288   1.506 588 196  0.000 074 581   16,898,067,806   15840
  8192  1048576   1.506 595 617  0.000 027 484   74,884,551,084   67570

19961021.1248. Make m_area save to a file periodically, with filename based
on gridsize and n_max so (potentially) different projects can be started
and stopped at different times. It also has the nice side-effect that if
you run the same gridsize again after recompiling (or whatever) it returns
the area for that gridsize immediately.

  1024   262144   1.506 622 314   46,910,573
  2048   524288   1.506 286 621  213,925,295
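(Aside: the 0.000002 error term in the 19961017 notes above is the standard
error of the mean, i.e. the standard deviation of the 40 runs divided by
sqrt(40). A minimal sketch of that calculation -- not the code LUAMS uses,
which keeps running sums instead:)

  #include <math.h>
  // Mean, sample standard deviation and standard error of the mean for
  // n independent area measurements (e.g. the 40 runs from 1993).
  double std_error(const double *x, int n, double *mean_out)
  {
    double sum = 0.0, ssq = 0.0, mean;
    int i;
    for (i = 0; i < n; i++) { sum += x[i]; }
    mean = sum / n;
    for (i = 0; i < n; i++) { ssq += (x[i] - mean) * (x[i] - mean); }
    if (mean_out) { *mean_out = mean; }
    return sqrt(ssq / (n - 1)) / sqrt((double) n);  // sigma / sqrt(n)
  }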
19961021.1315. Go back to m_area_stat, which now automatically benefits
from the changes to m_area. Increase itmax again and begin a new set of
runs (Also, the skew parameter has become an integer, so two things have
changed):

     32     8192   1.511 458 647  0.037 221 915            904,123
     64    16384   1.502 640 743  0.019 431 398          3,133,053
    128    32768   1.504 840 699  0.005 686 364         11,973,129
    256    65536   1.506 341 352  0.002 261 414         51,835,404
    512   131072   1.506 886 924  0.001 371 587        225,038,584
   1024   262144   1.506 783 392  0.000 493 761        971,556,504
   2048   524288   1.506 674 263  0.000 152 765      4,248,279,832
19961022
   4096  1048576   1.506 585 067  0.000 074 146     18,687,285,458
19961023
   8192  2097152   1.506 593 810  0.000 027 584     83,170,102,744
19961028
  16384  4194304   1.506 588 027  0.000 012 606    368,748,991,052
(19961107)
  32768  8388608   1.506 591 544  0.000 004 484  1,416,385,072,113
(19961209)
  65536 16777216   1.506 592 300  0.000 001 900  3,728,239,546,503

19961028.17xx. Add nice() routine, so that it will avoid slowing down the
Mac when the mouse is being moved. This does a pretty good job of keeping
LUAMS out of the way while I'm doing development.

19961029.1550. Small changes to nice().

19961104.2xxx. rebuild Power MANDELZOOM projects, preparing to merge
pipelined iteration code into LUAMS.

19961105.1458. Finally add Jay Hill's tests for period-1 and -2 mu-atoms
(replacing my old test for the period-2 mu-atom). The iterations column
shows how much this speeds up the program:

    32    8192   1.511 458 647  0.037 221 915     276,466
    64   16384   1.502 640 743  0.019 431 398     993,155
   128   32768   1.504 840 699  0.005 686 364   4,144,440
   256   65536   1.506 341 352  0.002 261 414  17,877,456
   512  131072   1.506 886 924  0.001 371 587  79,693,035

Immediately replace the ongoing background task so it can benefit from the
speedup.

19961105.1650. Add knowledge of exact value (7 pi / 16) for period 1 and 2
mu-atoms; pixel counting only used to estimate area of the higher mu-atoms.
As expected, this lowers the standard deviation a lot at first, then
substantially less and eventually not at all:

    32    8192   1.510 347 371  0.028 337 143      276,466
    64   16384   1.501 618 093  0.013 830 156      993,155
   128   32768   1.506 875 646  0.005 529 593    4,144,440
   256   65536   1.506 404 500  0.001 983 789   17,877,456
   512  131072   1.506 870 205  0.001 167 508   79,693,035
  1024  262144   1.506 775 316  0.000 508 381  361,842,027

(This happens because the standard error has two components: that which
comes from the period-1 and -2 mu-atoms and that which comes from the rest
of the Set. The component of error from the two mu-atoms goes down as
(grid size ^ 1.0)/(grid size^2) because the mu-atoms' boundaries have
dimension 1.0; the other error term goes down slower and as a result it
dominates when the grid size is arbitrarily large.)
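(For reference, the well-known closed forms behind Jay Hill's tests: with
c = x+iy and q = (x - 1/4)^2 + y^2, the point c is inside the period-1
cardioid iff q*(q + x - 1/4) < y^2/4, and inside the period-2 disk iff
(x + 1)^2 + y^2 < 1/16. Their exact areas are 3 pi/8 and pi/16, which sum
to the 7 pi/16 used above. A minimal sketch; dwell2 in the code at the end
of this file uses a rescaled but algebraically equivalent pair of
inequalities:)

  // Nonzero if c = x+iy lies in the period-1 cardioid or period-2 disk.
  int in_period_1_or_2(double x, double y)
  {
    double q = (x - 0.25)*(x - 0.25) + y*y;
    if (q * (q + (x - 0.25)) < 0.25*y*y)    { return 1; }  // main cardioid
    if ((x + 1.0)*(x + 1.0) + y*y < 0.0625) { return 1; }  // period-2 disk
    return 0;
  }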
19961107.xxxx. Merge in pipelined iteration code and (almost) verify
accuracy. It seems to differ on a few points, probably because of the
differences in the iteration algorithms. Also, it doesn't provide a speed
improvement (yet) because it doesn't have orbit detection. (In other words,
the simple iterator with orbit detection is faster than the pipelined
iterator without orbit detection.)

19961107.17xx. Two sources of discrepancy were found: One was a bug (not
flushing the pipeline before doing the first row adjustment) and the other
results from the differences in the iteration formulas (such as using
zi2 = 2*(zr*zi + ci/2) rather than zi2 = zr*zi + ci).

The following are the new standard results to compare to (times are
approximate):

    32    8192   1.511 458 647  0.037 221 915     275,459   0.267
    64   16384   1.502 640 743  0.019 431 398     967,521   0.867
   128   32768   1.504 914 742  0.005 692 132   4,188,562   4.533
   256   65536   1.506 386 129  0.002 312 484  17,981,647   15.87
   512  131072   1.506 869 368  0.001 349 102  80,190,202   64.72

{For the next 15 months the program's performance is hindered because I
don't realize that the loop unrolling makes the period detection so
inefficient as to cancel out the benefit of iterating two points at once.
-19990520}

19961107.2039. Pipeline now includes orbit detection, but isn't any faster
than single iteration. Don't really know why...

Subject: Mandelbrot Set Area, new result
From: Munafo@prepress.pps.com
Date: 1996/12/23
Message-Id: <32BED4D5.BF8@prepress.pps.com>
Newsgroups: sci.fractals

I have continued to compute the area of the Mandelbrot Set using my method
of averaging 20 runs by the pixel-counting method, with slightly different
grid placements and grid spacings. The latest result (after about 3.7
trillion Mandelbrot iterations) is 1.50659230 +- 0.0000006.

Here are the results of a set of such averaged runs. For each grid size, 20
runs were computed and averaged together.

  grid      dwell     average        standard
  size      limit     area           deviation
  ----     -------    -------------  -------------
    32        8192    1.511 4        0.037 2
    64       16384    1.502 6        0.019 4
   128       32768    1.504 84       0.005 68
   256       65536    1.506 34       0.002 26
   512      131072    1.506 88       0.001 37
  1024      262144    1.506 783      0.000 493
  2048      524288    1.506 674      0.000 152
  4096     1048576    1.506 585 0    0.000 074 1
  8192     2097152    1.506 593 8    0.000 027 5
 16384     4194304    1.506 588 0    0.000 012 6
 32768     8388608    1.506 591 54   0.000 004 48
 65536    16777216    1.506 592 30   0.000 001 90

The standard deviation measures how much (on average) each run differs from
the average. This means that the average itself almost certainly varies
even less from the true area. For 20 samples, the standard error of the
mean is about 0.23 of the standard deviation. A safe estimate of the error
would be about 1/3 of the standard deviation.

(My old estimate, from April 13th 1993, was 1.506595 +- 0.000002. That
estimate was based on an average of 40 runs, and used a dwell limit of
524288. I should have subtracted 0.000005 to account for the error in using
such a low dwell limit; I also was a little too "liberal" in my calculation
of the expected error.)

 - Robert Munafo

{Big hiatus here -19990520}
---------------------------------------------

19971010.0055. Resurrect the project. I have just gotten it to run again
after updating to the new CodeWarrior. It was crashing because g_dp was
being set to dwell2, which caused an infinite loop (the fix was to set g_dp
to dwell3). Can't tell how fast it's running because the iteration counts
are way off due to orbit detection -- it's adding up the dwell values
returned by the pipeline, not the actual number of iterations performed.
With this new build on the 6100/80 I am starting a new set of runs:

    32    8192   1.511 458 647  0.037 221 916           1,427,000  0.367
    64   16384   1.502 640 744  0.019 431 398           7,705,483  0.983
   128   32768   1.504 914 742  0.005 692 132          53,725,852  3.667
   256   65536   1.506 386 130  0.002 312 485         389,356,427  13.72
   512  131072   1.506 869 368  0.001 349 103       2,983,154,869  56.55
  1024  262144   1.506 781 916  0.000 492 708      23,278,493,347  262.9
  2048  524288   1.506 673 891  0.000 151 484     183,890,967,883   1405
  4096 1048576   1.506 583 929  0.000 074 489   1,461,504,579,703   5106
19971011
  8192 2097152   1.506 593 644  0.000 027 695  11,658,311,598,595  59242
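(The pipelining idea, reduced to its essence: iterate two independent
points in lockstep so the FPU can overlap the two dependency chains. A
minimal sketch using the standard iteration formula; the real unrolled
loop, with orbit detection and pipeline refill, is much hairier:)

  // Iterate two points at once; d0/d1 receive the dwell, or nmax+1 if the
  // point never escaped.  (A real version would retire an escaped point
  // and feed in a fresh one rather than continuing to square it.)
  void dwell_pair(double cr0, double ci0, double cr1, double ci1,
                  long nmax, long *d0, long *d1)
  {
    double zr0 = cr0, zi0 = ci0, zr1 = cr1, zi1 = ci1, t0, t1;
    long n;
    *d0 = *d1 = nmax + 1;
    for (n = 1; n <= nmax; n++) {
      // The two updates are independent, so a pipelined FPU can overlap
      // them; that is the whole point of pairing the points.
      t0 = zr0*zr0 - zi0*zi0 + cr0;
      t1 = zr1*zr1 - zi1*zi1 + cr1;
      zi0 = 2.0*zr0*zi0 + ci0;  zr0 = t0;
      zi1 = 2.0*zr1*zi1 + ci1;  zr1 = t1;
      if ((*d0 > nmax) && (zr0*zr0 + zi0*zi0 > 4.0)) { *d0 = n; }
      if ((*d1 > nmax) && (zr1*zr1 + zi1*zi1 > 4.0)) { *d1 = n; }
    }
  }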
19971012.1454. Make it measure ticks only while it's actually calculating,
and make it record ticks in the files. This requires trashing all the files
so far, so I have to start over. While I'm at it, I also make it record the
actual number of iterations and the "simulated" number of iterations
separately, which had stopped being done right after loop detection was
added. It is somewhat disturbing to notice that the actual area values are
changing too. Probably something to do with the pipeline restart after
loading from a file -- I think perhaps it isn't flushing the pipeline
before doing saves.

    32  1.511 458 647  0.037 221 916          13,698,616     8192          460,236    0.55
    64  1.502 640 744  0.019 431 398          84,071,307    16384        1,452,625  1.1667
   128  1.504 914 742  0.005 692 132         585,681,564    32768        6,091,786  3.5833
   256  1.506 386 130  0.002 312 485       4,358,544,267    65536       30,530,867   13.05
   512  1.506 751 662  0.001 348 801      33,606,713,474   131072      165,141,228  54.817
  1024  1.506 771 394  0.000 495 651     263,814,226,416   262144    1,048,193,486   251.7
  2048  1.506 673 891  0.000 151 484   2,090,491,542,859   524288    7,455,656,801  1349.6
  4096  1.506 584 024  0.000 074 387  16,643,863,788,711  1048576   55,389,875,749  8218.1

19971012.2347. Yep, it wasn't flushing the pipeline before doing file
operations. Change format of output and begin a whole new set of runs.

    32  1.511 458 647  0.037 221 916          13,698,616     8192          460,236    0.55  5.8575 MFLOPs
    64  1.502 640 744  0.019 431 398          84,071,307    16384        1,452,625  1.1667  8.7157 MFLOPs
   128  1.504 914 742  0.005 692 132         585,681,564    32768        6,091,786    3.55  12.012 MFLOPs
   256  1.506 386 130  0.002 312 485       4,358,495,089    65536       30,543,955   13.05  16.384 MFLOPs
   512  1.506 869 368  0.001 349 103      33,607,097,921   131072      165,722,959  55.183  21.022 MFLOPs
  1024  1.506 781 916  0.000 492 708     263,815,177,892   262144    1,050,371,926  252.75  29.090 MFLOPs
  2048  1.506 673 891  0.000 151 484   2,090,497,201,424   524288    7,464,872,712    1359  38.451 MFLOPs
19971013:
  4096  1.506 584 024  0.000 074 387  16,643,919,148,551  1048576   55,472,219,207  8321.5  46.663 MFLOPs
  8192  1.506 593 644  0.000 027 695  132,836,944,645,477  2097152  428,665,503,505  57369  52.304 MFLOPs
19971018:
 16384  1.506 588 076  0.000 012 669  1,061,430,262,321,482  4194304  3,365,249,417,036  4.0937e5  57.544 MFLOPs

19971014.14xx Look over the code thoroughly, trying to figure out why the
pipeline is running only half as fast as theory indicates it should. I
don't see anything, so I'll have to do some profiling. Given the magnitude
of the discrepancy, just breaking into Macsbug a bunch of times should be
sufficient to find where it's spending half its time.

19971014.235x I check this and find that it's actually spending almost all
of its time in the unrolled loop. However, there are a lot of pipeline
stalls because the compiler reordered the instructions. After I turn off
optimization and recompile it is in fact a bit faster, but not nearly as
fast as I thought it would be. It looks like the asymptote is a little over
60 MFLOPs. Then I remember that it's double precision, and the next day I
check Power MANDELZOOM and see that it only performs at 64 MFLOPs double
anyway.

19971018.23xx. Read the notes from 19961017 and realize I still haven't
done the center-of-gravity thing. Oops.
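(The center-of-gravity measurement costs almost nothing once you are
pixel-counting anyway: accumulate the real coordinate of every in-set
pixel, and divide by the count at the end; the imaginary coordinate of the
centroid is 0 by symmetry. A sketch of the bookkeeping, which is
essentially what the g_mass / g_row_mass / g_inset variables do in the code
at the end of this file:)

  // Finish an area/center-of-gravity tally: inset pixels were found in the
  // Set, mass is the sum of their real coordinates, pix_area is the area
  // of one grid square.  Only the Im >= 0 half-plane is scanned, so the
  // area is doubled and the centroid's imaginary part is 0 by symmetry.
  void finish_tally(double mass, double inset, double pix_area,
                    double *area, double *cog_re)
  {
    *area   = inset * pix_area * 2.0;
    *cog_re = mass / inset;
  }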
    32    8192  1.511 458 647  0.037 221 916          13,698,616  -0.292 565 786  0.014 640 563          460,236   0.517   6.236 MFLOPs
    64   16384  1.502 640 744  0.019 431 398          84,071,307  -0.287 474 451  0.004 811 761        1,452,625    1.78   5.702 MFLOPs
   128   32768  1.504 914 742  0.005 692 132         585,639,290  -0.286 495 853  0.002 352 688        6,094,438    3.58  11.905 MFLOPs
   256   65536  1.506 386 130  0.002 312 485       4,358,537,348  -0.286 741 015  0.000 912 340       30,588,764    12.8  16.750 MFLOPs
   512  131072  1.506 869 368  0.001 349 103      33,607,168,662  -0.286 880 847  0.000 416 546      165,498,826    54.5  21.263 MFLOPs
  1024  262144  1.506 781 916  0.000 492 708     263,814,920,625  -0.286 801 539  0.000 173 164    1,049,624,871     246  29.910 MFLOPs
  2048  524288  1.506 673 891  0.000 151 484   2,090,496,819,135  -0.286 811 014  0.000 074 787    7,461,538,307   1,310  40.011 MFLOPs
  4096 1048576  1.506 584 024  0.000 074 387  16,643,914,697,009  -0.286 765 808  0.000 024 839   55,470,040,257   7,970  48.708 MFLOPs
  8192 2097152  1.506 593 644  0.000 027 695 132,836,936,354,151  -0.286 769 509  0.000 009 733  428,624,195,429  54,700  54.845 MFLOPs

19971021.xxx.x I get ready to post a new message to sci.fractals:

Recently I have decided to restart the computing effort I undertook about a
year ago at this time to compute the area of the Mandelbrot Set. After
reviewing the notes from that time I saw that there had been some interest
in the past in computing the Mandelbrot Set's "center of gravity" as well.
I have also done some optimizations of the code in the mean time and fixed
a few bugs, and am ready to do some long runs.

The method I use involves counting pixels on a grid. There is error in the
estimate resulting from grid placement and the fact that the grid squares
on the boundary get counted as completely in or completely out when in fact
only part of their area is in. It is hard to estimate the magnitude of this
error, so I use statistical methods to measure it. I use a bunch of grids
of slightly different grid spacings with slight vertical and horizontal
offsets, and take the mean and standard deviation of all the runs.

These are the figures I've compiled so far. Each run is the average of 20
grids for which the standard deviation is shown.

  grid   iteration   area and                         std deviation
  size   limit       center-of-gravity                (20 runs)

   256      65536    1.506 39      0.002 31      -0.286 741     0.000 912
   512     131072    1.506 87      0.001 35      -0.286 881     0.000 417
  1024     262144    1.506 782     0.000 493     -0.286 802     0.000 173
  2048     524288    1.506 674     0.000 151     -0.286 811 0   0.000 074 8
  4096    1048576    1.506 584 0   0.000 074 4   -0.286 765 8   0.000 024 8
  8192    2097152    1.506 593 6   0.000 027 7   -0.286 769 51  0.000 009 73

For each average, the standard deviation gives the expected error in each
of the 20 individual grids. The expected error of the mean is 1/sqrt(19) of
the standard deviation (this is called "the standard error of the mean"). I
use 1/3 as a conservative estimate. So, my best estimate of the
center-of-gravity is -0.286769 +- 0.000003.

A few other observations:
- My program currently runs at 8 million Mandelbrot iterations per second.
  The current model Macs are about 4 times faster. Intel machines get
  nowhere close because they don't have enough floating-point registers to
  keep the pipeline full.
- Each time I double the grid size it takes almost 8 times as long. It
  takes 7 years for the industry to increase computing power of personal
  computers by 8.
- Each time the gridsize is doubled the error goes down to about 0.4 of the
  previous value.
  It would take about 20 years for personal computers to improve enough to
  give us one more digit in the area estimate (given the same run time).
- My best area estimate is still the figure from last year:
  1.5065923 +- 0.0000006.

and then I notice that the current version is much slower than the version
I was running when I posted the results a year ago. The number of
iterations is going up by about 7.5 each time, and last year's version was
more like 4.5. I guess almost right away that it's because of the new
pipelined routines' rather poor ability to detect limit cycles.

19971021.23xx. I do some runs with the old iterator routine just to verify
this.

    32    8192  1.511 458 647  0.037 221 916          13,648,582  -0.292 565 786  0.014 640 563        275,459  0.383  5.0301 MFLOPs
    64   16384  1.502 640 744  0.019 431 398          83,832,862  -0.287 474 451  0.004 811 761        967,521  0.867  7.8146 MFLOPs
   128   32768  1.504 914 742  0.005 692 132         584,564,367  -0.286 495 853  0.002 352 688      4,188,562   2.97  9.8831 MFLOPs
   256   65536  1.506 386 130  0.002 312 485       4,354,401,151  -0.286 741 015  0.000 912 340     17,981,647   11.9  10.533 MFLOPs
   512  131072  1.506 869 368  0.001 349 103      33,588,336,186  -0.286 880 847  0.000 416 546     80,190,202   51.1  10.989 MFLOPs
  1024  262144  1.506 781 916  0.000 492 708     263,748,416,082  -0.286 801 539  0.000 173 164    362,847,780    224  11.347 MFLOPs
  2048  524288  1.506 673 891  0.000 151 484   2,090,234,952,071  -0.286 811 014  0.000 074 787  1,678,084,515  1,010  11.608 MFLOPs
  4096 1048576  1.506 584 024  0.000 074 387  16,642,875,255,072  -0.286 765 808  0.000 024 839  7,780,618,202  4,600  11.846 MFLOPs

and I quickly decide that I should try to figure out a way I can write some
code to *guess* whether each point will be in the set or not (e.g., based
on the result of the most recently completed point). It can use the double
pipeline for points guessed outside the set and the single pipeline for
points guessed in the set.
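(The guessing scheme in miniature: in-set and out-of-set points cluster, so
predict each point's fate from the previous point's outcome and route it to
whichever iterator handles that case best. A hypothetical sketch --
iterate_single and iterate_dual here stand in for the real routines:)

  extern long iterate_single(double cr, double ci); // has loop detection
  extern long iterate_dual(double cr, double ci);   // paired-point pipeline
  extern unsigned int g_itmax;

  static int guess_inside = 0;   // updated after every completed point
  long dispatch_point(double cr, double ci)
  {
    long dwell;
    if (guess_inside) {
      dwell = iterate_single(cr, ci);  // loop detection pays off inside
    } else {
      dwell = iterate_dual(cr, ci);    // dual pipeline pays off outside
    }
    guess_inside = (dwell > (long) g_itmax);
    return dwell;
  }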
19971022.01xx. Optimize the single iterator routine a bit (the rad2 test in
particular could use some work). This improves the times substantially.

    32    8192  1.511 458 647  0.037 221 916          13,664,870  -0.292 565 786  0.014 640 563        291,747  0.183  11.139 MFLOPs
    64   16384  1.502 640 744  0.019 431 398          83,882,823  -0.287 474 451  0.004 811 761      1,017,482  0.600  11.871 MFLOPs
   128   32768  1.504 914 742  0.005 692 132         584,737,298  -0.286 495 853  0.002 352 688      4,361,493   2.23  13.670 MFLOPs
   256   65536  1.506 386 130  0.002 312 485       4,355,042,622  -0.286 741 015  0.000 912 340     18,623,118   9.37  13.918 MFLOPs
   512  131072  1.506 869 368  0.001 349 103      33,590,805,167  -0.286 880 847  0.000 416 546     82,659,183   40.2  14.376 MFLOPs
  1024  262144  1.506 781 916  0.000 492 708     263,758,102,020  -0.286 801 539  0.000 173 164    372,533,718    176  14.839 MFLOPs
  2048  524288  1.506 673 891  0.000 151 484   2,090,273,319,600  -0.286 811 014  0.000 074 787  1,716,452,044    788  15.246 MFLOPs

I try making it call the dual pipeline whenever the dwell is greater than
50 but the point is not in the set; the results are encouraging at first,
and then much, much worse at higher grid sizes:

   256    65536       22,167,716     9.2 sec  16.867 MFLOPs
   512   131072      109,955,129    38.9 sec  19.769 MFLOPs
  1024   262144      623,655,741        176   24.776
  2048   524288    4,006,852,998        888   31.588
  4096  1048576   27,710,036,024      4,860   39.904
  8192  2097152  177,665,018,000     27,300   45.543

so I go back to the all-dwell1 version:

  4096  1048576  1.506 584 024  0.000 074 387      16,643,027,976,653  -0.286 765 808  0.000 024 839      7,933,339,783      3,580  15.502 MFLOPs
  8192  2097152  1.506 593 644  0.000 027 695     132,833,077,382,170  -0.286 769 509  0.000 009 733     36,987,339,092     16,300  15.852 MFLOPs
 16384  4194304  1.506 588 076  0.000 012 669   1,061,413,532,170,594  -0.286 767 267  0.000 003 204    171,493,547,507     74,400  16.138 MFLOPs
 32768  8388608  1.506 591 501  0.000 004 512   8,486,342,263,388,365  -0.286 768 373  0.000 001 774    814,136,316,927    347,000  16.442 MFLOPs
 65536 16777216  1.506 592 311  0.000 001 904  67,870,833,876,143,188  -0.286 768 423  0.000 000 510  3,844,750,451,387  1,630,000  16.529 MFLOPs

19980218 compose a new article for sci.fractals:

  Mandelbrot Set center-of-gravity is -0.28676842 +- 0.00000017

Recently I restarted the computing effort I undertook about a year ago at
this time to compute the area of the Mandelbrot Set. After reviewing the
notes from that time I saw that there had been some interest in the past in
computing the Mandelbrot Set's "center of gravity" as well. I have also
done some optimizations of the code in the mean time and fixed a few bugs,
and have started doing a long run.

The method I use involves counting pixels on a grid. There is error in the
estimate resulting from grid placement and the fact that the grid squares
on the boundary get counted as completely in or completely out when in fact
only part of their area is in. It is hard to estimate the magnitude of this
error, so I use statistical methods to measure it. I use a bunch of grids
of slightly different grid spacings with slight vertical and horizontal
offsets, and take the mean and standard deviation of all the runs.

These are the figures I've compiled so far. Each run is the average of 20
grids for which the standard deviations are shown.
  grid    iter       area and            standard     iterations
  size    limit      center-of-          deviations   time (seconds)
                     gravity             (20 runs)    performance (MFLOPs)

   256      65536    1.506 38       0.002 31              18,623,118
                    -0.286 741      0.000 912         9.4     13.918
   512     131072    1.506 87       0.001 35              82,659,183
                    -0.286 881      0.000 417        40.2     14.376
  1024     262144    1.506 782      0.000 493            372,533,718
                    -0.286 802      0.000 173         176     14.839
  2048     524288    1.506 674      0.000 151          1,716,452,044
                    -0.286 811 0    0.000 074 8       788     15.246
  4096    1048576    1.506 584 0    0.000 074 4        7,933,339,783
                    -0.286 765 8    0.000 024 8     3,580     15.502
  8192    2097152    1.506 593 6    0.000 027 7       36,987,339,092
                    -0.286 769 51   0.000 009 73   16,300     15.852
 16384    4194304    1.506 588 1    0.000 012 7      171,493,547,507
                    -0.286 767 27   0.000 003 20   74,400     16.138
 32768    8388608    1.506 591 50   0.000 004 51     814,136,316,927
                    -0.286 768 37   0.000 001 77  347,000     16.442
 65536   16777216    1.506 592 31   0.000 001 90   3,844,750,451,387
                    -0.286 768 423  0.000 000 510  1,630,000  16.529

For each average, the standard deviation gives the expected error in each
of the 20 individual grids. The expected error of the mean is 1/sqrt(19) of
the standard deviation (this is called "the standard error of the mean"). I
use 1/3 as a conservative estimate. So, my best estimate of the
center-of-gravity is -0.28676842 +- 0.00000017.

A few other observations:
- My program currently runs at about 2.35 million Mandelbrot iterations per
  second. The latest model Macs are about 4 times faster. Intel machines
  get nowhere close because they don't have enough floating-point registers
  to keep the pipeline full.
- Each time I double the grid size it takes almost 4.7 times as long. The
  current rate of progress in computer performance is about 1.56 per year;
  at that rate it takes about 3.5 years to increase computing power by 4.7.
- Each time the gridsize is doubled the error goes down to about 0.4 of the
  previous value (but it varies from 0.3 to 0.55). It would take about 8 to
  10 years for personal computers to improve enough to give us one more
  digit in the area estimate (given the same run time).
- My new program reproduces my best area estimate from late 1996:
  1.5065923 +- 0.0000006.

http://www.cecm.sfu.ca/projects/ISC/ISCmain.html gives the following for
0.286...
(and way too much output for 1.506...):

  2867682063800133 = (0250) F(44/59;39/56;1)
  2867682149714066 = (0250) F(19/21;11/13;1)
  2867682521500307 = (0404) Psi(5/12)+Psi(3/4)-Psi(19/20)
  2867682633829350 = (0001) (1/3-ln(3))^Feig1
  2867682783580957 = (0010) sum((11/3*n^3-20*n^2+157/3*n-25)/(n!+1),n=1..inf)
  2867682869758240 = (0064) sum((7/3*n^3-13*n^2+83/3*n+3)/(Fibo(n)+1),n=1..inf)
  2867682944514431 = (0326) 4^(3/4)*(12-12^(1/4))
  2867682960509936 = (0001) Catalan+2/3+GAM(17/24)
  2867682970307599 = (0192) (GAM(5/6)+1/3)/(ln(3)+4)
  2867683491264107 = (0400) sum(1/(n!+7/6*n^3-5*n^2+131/6*n-14),n=1..inf)
  2867683638369271 = (0395) sum(1/C(2*n,n)/(17/6*n^3-13*n^2+133/6*n-10),n=1..inf)
  2867683747960658 = (0013) sum((8/3*n^3-27/2*n^2+167/6*n-3)/n^(n-1),n=1..inf)
  2867683843843252 = (0259) F(3/11,9/10;4/9,4/5,1/2;1)

  Artin    = 0.3739558136192022
  gamma    = 0.57721566490153286060651209008240
  TwinPrim = 0.6601618158468695
  Catalan  = 0.9159655941772190150546035149324
  Feig2    = 2.502907875095892822283902873218215786381271376727149977336192056
  Feig1    = 4.6692016091029906718532038
  Gibbs + BesI(1,1) = 2.417096155974951

  5606298864626305 = (0001) Gibbs+3/2*Feig2
  5606312839756551 = (0001) Artin+Gibbs^Khint
  5606382942821305 = (0001) Gibbs-ln(3)^E
  8606371538195862 = (0405) Cahen/Parking
  5606401453580847 = (0192) (sin(1)+3)/(Gibbs+5)

19980302 get the new result:

  131072  33554432  1.506 591 734  0.000 000 624  18,173,551,931,685
                   -0.286 768 317  0.000 000 295  7,590,000  16.754

19980304 It seems clear that the Mandelbrot Set center of gravity is
(ln(3)-1/3)^Feig1, which comes out to about 0.2867682633829350268529586

{Another big hiatus here -19990520}
-------------------------------------

19990518.1xxx. Port the program to gcc under Linux, and call it luams-3.c.
It seems to produce fairly similar results, and runs at about 32 MFLOPs on
a 233-MHz Pentium MMX.

19990518.1820. I've gotten it up to 38 MFLOPs by changing the optimization
flags and rearranging the inner loop a bit. For some reason, optimization
makes it print different values of g_act_its. I need to figure out why in
case this represents a bug.

  Gridsize 32, itmax 8192  Area 1.5114586 +- 0.0372219  Its: 13664870
    Center of gravity -0.29256579 +- 0.01464056, Actual: 291766
  Gridsize 64, itmax 16384  Area 1.5026407 +- 0.0194314  Its: 83882823
    Center of gravity -0.28747445 +- 0.00481176, Actual: 1011791
  Gridsize 128, itmax 32768  Area 1.5049147 +- 0.0056921  Its: 584735910
    Center of gravity -0.28649585 +- 0.00235269, Actual: 4367224
  Gridsize 256, itmax 65536  Area 1.5063889 +- 0.0022895  Its: 4355035389
    Center of gravity -0.28675522 +- 0.00091069, Actual: 18674354
  Gridsize 512, itmax 131072  Area 1.5068630 +- 0.0013543  Its: 33590760535
    Center of gravity -0.28687915 +- 0.00041832, Actual: 82279430
  Gridsize 1024, itmax 262144  Area 1.5067819 +- 0.0004926  Its: 263758161877
    Center of gravity -0.28680108 +- 0.00017393, Actual: 372254099

19990519.1xxx. Add im_sqrt parameters so the inner loop doesn't have to
test for iterations overflow. This doesn't seem to speed it up as much as I
was hoping it would. I would guess this results from speculative execution
of the floating-point operations.
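(What the im_sqrt change does: the inner loop no longer compares an
iteration counter against itmax; instead the iterations are organized into
blocks of growing length, re-saving the reference point for the period test
at the top of each block (Brent's cycle-detection idea). The total trip
count of the doubly-nested loop is 2+3+...+im_sqrt, about im_sqrt^2/2, so
m_area_stat picks im_sqrt ~ sqrt(2*itmax) once per itmax and the inner loop
needs no overflow test at all. A minimal sketch of the structure:)

  // Returns 1 if c is (apparently) in the Set: either a limit cycle was
  // detected or about itmax iterations passed without escape.
  int blocked_dwell(double cr, double ci, long im_sqrt)
  {
    double zr = cr, zi = ci, sr, si, t;
    long n1, n2;
    for (n1 = 2; n1 <= im_sqrt; n1++) {
      sr = zr; si = zi;                 // reference point for this block
      for (n2 = 0; n2 < n1; n2++) {
        t  = zr*zr - zi*zi + cr;
        zi = 2.0*zr*zi + ci;
        zr = t;
        if (zr*zr + zi*zi > 4.0)      { return 0; }  // escaped
        if ((zr == sr) && (zi == si)) { return 1; }  // cycle detected
      }
    }
    return 1;  // ran out of iterations without escaping
  }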
19990519.2356. Today I make it catch the SIGTERM signal so it will
automatically stop when the system is rebooted normally, make .bashrc
launch luams at startup if it's not already running, and write a script to
find and kill the luams process to use when I need to. I also restore the
l_ticks and g_ticks variables and related benchmarking calculations, which
I removed during yesterday's port because I didn't yet know how to measure
user CPU time under Linux. All of these changes bring LUAMS back up to the
level of usability it had in the Power Macintosh incarnation, and now it's
on a more stable OS and significantly faster hardware.

19990520.xxxx Add iter_type typedef to make it easy to change the
floating-point type for benchmarking purposes. float isn't any faster than
double on a Celeron. Also, I discover that the 333-MHz Celeron is in fact a
lot faster than the 233-MHz Pentium MMX at work, despite the Linux
"BogoMIPS" values.

19990603.2430 Change prototype of main() so gcc doesn't complain.

19990628.2422 Check current state of luams-3 output (it has been running as
a background task for about 5 weeks now) and find that it has finished the
131072 gridsize and one iteration (out of 20) of the next gridsize:

  Gridsize 131072, itmax 67108864
  Area 1.506591696 +- 0.000000627  Its: 1085772250631444522
  Center of gravity -0.2867683146 +- 0.0000002961  Actual: 22749243215253
  Total computation time: 2100630400 msec, 75808 KFLOPs

It is notable that the area estimate is a little lower (by 0.000 000 030)
which I attribute to the higher itmax.

19991021 It has finished its 262144 grid:

  Gridsize 262144 (34359738368 pixels)
  ....................
  Gridsize 262144, itmax 134217728
  Area 1.506591772 +- 0.000000242  Its: 8685544402515235059
  Center of gravity -0.2867684377 +- 0.0000000748  Actual: 107475449884576
  Total computation time: 1303136528 msec, 76045 KFLOPs

The total computation time is wrong (because of overflow). The actual time
was about 9.9 million seconds or about 115 days; in real-time the program
was running continuously since 19990519. From this I derive my estimates:

  area = 1.50659177  +- 0.00000008
  c-g  = -0.28676844 +- 0.000000025

After a few weeks I add these results to my web pages but I don't post them
to a newsgroup until 2000.

{Another big hiatus here}
--------------------------------------------------

20030925 Make luams-3 a standard background task on S3. Replace
WRITE_INTERVAL with a variable, to make it cache a little more on each of
the already-finished cases each time it restarts. Increase max gridsize to
1048576 so it can be left running for a very long time without running out
of things to do. Port it to MacOS X 10.2.x (mainly involves using getrusage
instead of clock() and changes to how time is counted and printed out)

2003121x Divide std deviation by 3 when printing to more accurately reflect
the error in the area estimate.

20040903 gridsize 524288 has finished, but the work was wasted because
g_inset overflowed. Change it to 64-bit.

20040906 Fix overflow bug in g_inset that caused loss of entire 524288-size
grid results.

{Another big hiatus here}
--------------------------------------------------

20101017 Get it running again (mainly just updating the makefile after an
upgrade to MacOS 10.6 caused me to audit all of my makefiles)

20101019 After about 1.5 days as a single thread, it has completed all
gridsizes up to 131072. Add COMPILE_DATE, usage() and command-line argument
parsing. Add -slice option to permit use of multiple copies of #luams# for
concurrency (which can be run on separate machines and then merged by
copying the cache files to one location, or can all be run on the same
machine). Compute KFLOPs with floating-point math. Add check_sizes() and
update the definitions of signed16, etc. for 64-bit compilers.
(The existing cache files have to be trashed because some of the variables
were the wrong size. They are now archived as "cache-20101019.1400.zip")
I do some sanity testing with gridsize 4096, then start a set of 4 slices,
now starting over from scratch.

20101023: Change R_LO, R_HI, and I_HI; add standard deviation to output
Calculate standard error based on actual number of samples being averaged
(rather than hard-coded to 3.0 as before), and multiply by 1.96 to get the
95% confidence interval; these combine to make the new std error estimate
more conservative than before (because 1.96/sqrt(20) is greater than 1/3.0).
Std dev and std err are both now returned explicitly from m_area_stat, and
both are printed
Do not print center of gravity average if PERFECT_CARDIOID is defined
Keep track of total number of pixels tested (g_tested variable) which is
now also saved in the cache files.
This version is in "old-versions/luams-20101023.txt"

20101024 I have made several important changes over the past 24 hours, and
now have a sync issue: the cache files from the 20101019 and earlier
versions will not work with the new changes, but I want to "salvage" as
much of that cached work as I can to get some results for future
comparison. So I am going to re-do the recent changes in an order that
allows me to keep the old cache structure for as long as possible. First
few changes:
 * Add SKEW_INC_MAX define, use it in the one place where STAT_NSAMP was
   being used to compute skew_inc, so that I can change the number of
   samples without affecting the skews (and therefore the cache file names)
 * Do not print center of gravity average if PERFECT_CARDIOID is defined
 * Display PERFECT_CARDIOID and USE_FILES options after COMPILE_DATE
 * Calculate standard error based on actual number of samples being
   averaged (rather than hard-coded to 3.0 as before), and multiply by 1.96
   to get the 95% confidence interval; these combine to make the new std
   error estimate more conservative than before (because 1.96/sqrt(20) is
   greater than 1/3.0).
 * Std dev and std err are both now returned explicitly from m_area_stat,
   and both are printed
This version of the code is in "old-versions/luams-20101024.txt", and the
cache files are saved as "cache-20101024.1930.zip". I continue to make
changes, now diverging from the old cache file format:
 * Store R_LO, R_HI and I_HI in each cache file and abort with an error if
   an existing cache file has different values
 * Change R_LO, R_HI, and I_HI (as was done yesterday); the ranges are now
   (-2.02 .. 0.49) and (0.00 .. 1.15)
 * Disable PERFECT_CARDIOID (probably permanently)
 * Add -gmin and -gmax options
 * Add g_row_mass, used to accumulate the mass total one row at a time, to
   minimize the cumulative effect of roundoff error that would otherwise
   influence the precision of the answer at the larger gridsizes.
 * DEFAULT_GS_MIN is now 25; gridsizes are now centered on the nominal size
   (for example, for gridsize 128 it used to go from 128 to 147, but now it
   will go from 118 to 137). Add save_or_reload().

20101028 Finish gridsize groups from 25 up to 204800 on MP16. Stats for the
last few gridwidths are:

  Gridwidth  51200: Area 1.506592211(521),  CoG -0.286768176(129);  8223 sec.
  Gridwidth 102400: Area 1.506591964(223),  CoG -0.2867683969(465); 8912 sec.
  Gridwidth 204800: Area 1.5065919790(936), CoG -0.2867683744(266); 8912 sec.

20111228 In response to email from Warren Smith, I decide to continue with
the next gridwidth 409600 (using first 1 task, then 4) on MP16.
This should take about 7 or 8 days to complete (each task is taking just a
little over 30 hours to complete a grid, and each task is doing 5 grids).

20120105 Add a few formatting routines (f64_expon, snfu_tag, and
snformat_unc), not yet complete.

20120107 At the end of the gridwidth 409600 run started on 1228, I have
these results:

  Gridwidths 409590-409609 at Nmax 209715200....................
  Gridsize 409600, itmax 209715200
  Area 1.506591856 +- 2.54e-08 (n=20, sigma 5.80e-08)
  Center of gravity -0.2867684229 +- 1.11e-08 (n=20, sigma 2.54e-08)
  Its: -1618709677335174482; Actual: 140360269258091
  pixels in first sample: 20059349338 of 76864256651
  Total computation time: 691255 sec, 1.42 GFLOPs

This result shows a bug: the "Its:" number (effective number of iterations)
overflowed its range (which was 64-bit integer). This needs to be changed
to 128-bit or double. In addition, in order for the numbers to be right
when used with -slice, I need to make each cache file store only its own
g_agg_its and g_act_its values, not the running totals for the entire set
of grids.

20120208 Report error on fopen(filename, "w"). Eliminate use of ugly
"goto"s.

20120209 Replace g_agg_its and g_act_its with g_this_its, g_effc_its,
g_ttl_its and g_t_efc_its, all of which are now 128-bit integers. This
makes the cache files incompatible with anything I did prior to this point.
I also work on snformat_unc; get it mostly working. Change use of l_ticks
and g_ticks; add f_ticks (this parallels the changes made to the g_xxx_its
variables so that the benchmark stats work regardless of the -slice
option).

20120210 Add s128_str

20120222 Report ns_done rather than STAT_NSAMP for (n=XX) value (only
affects slice mode)

20120224 Add g_dots and display current imaginary coordinate as a sign of
progress.

FUTURE CHANGES:
 - Area of cubic Mandelbrot set (Z^3+C) and higher orders.
 - Add an argument option to override STAT_NSAMP

 | end of revision history
 |____________________________________________________________________________*/

/* #include "sys-includes.h" */
#ifdef MACOSX
# include <sys/time.h>      /* for getrusage() and struct rusage */
# include <sys/resource.h>
#else
# include <sys/time.h>
# include <sys/resource.h>
# include <unistd.h>
#endif

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <signal.h>
#include <time.h>

// Parameters that control the amount and precision of area measurement.
// Because of the way the cache files are named, any of these parameters can
// be changed and the program can be recompiled without losing earlier work
// and without clearing the cache.

// Gridsizes to calculate
#define DEFAULT_GS_MIN 25
#define DEFAULT_GS_MAX 2000000

// Iterations limit multipliers to try.
// The 19980218 results used a multiplier of 256.
#define IM_MIN 512
#define IM_MAX 512

// Number of runs to average together to determine standard deviation.
// The standard error goes down by the square root of this value, and
// goes down just a little faster when increasing the gridsize.
#define STAT_NSAMP 20

#define SKEW_INC_MAX 20

/* The real coordinates in the Mandelbrot set vary from -2.0 to
   +0.471185... and the imaginary coordinates range from +-1.122757...
   See: mrob.com/pub/muency/easternmostpoint.html
        mrob.com/pub/muency/northernmostpoint.html
   We conservatively add about 0.02 in all directions. */
#define R_LO -2.02
#define R_HI 0.49
#define I_LO 0.00
#define I_HI 1.15

typedef unsigned short unsigned16;
typedef signed int signed32;
typedef unsigned int unsigned32;
typedef signed long signed64;
typedef unsigned long unsigned64;

/* For 128-bit I probably should be converting LUAMS to C++ so I can use my
   own int128 library. However, this works in the short-term.
   According to some on the internet, something like "int128_t" or
   "__int128_t" might be provided by compilers. */
typedef signed   __attribute__((__mode__(__TI__))) signed128;
typedef unsigned __attribute__((__mode__(__TI__))) unsigned128;

typedef double iter_type;

typedef struct {
  double ar_area;
  double ar_gravity;
} area_result;

// dwell completion procedure pointer
typedef void(*dwell_cpp)(iter_type, long, long);

typedef void(*dwell_proc)(iter_type r, iter_type i, signed32 itmax,
  signed32 im_sqrt, dwell_cpp cp);

signed64 rpm_clock(void);
void dcp1(iter_type r, long dwell, long its);
void dwell1(iter_type r, iter_type i, signed32 itmax, signed32 im_sqrt,
  dwell_cpp cp);
void dwell2(iter_type r, iter_type i, signed32 itmax, signed32 im_sqrt,
  dwell_cpp cp);
area_result m_area(signed32 gridsize, signed32 itmax, signed32 im_sqrt,
  signed32 skew_v);
area_result m_area_stat(signed32 gridsize, signed32 itmax, signed32 nsamp,
  area_result *std_dev, area_result *std_err, int * ns_done);
void loop1(void);
int main(int, char **);

signed32 g_gs_min;
signed32 g_gs_max;
signed128 g_this_its;
signed128 g_effc_its;
signed128 g_ttl_its;
signed128 g_t_efc_its;
unsigned32 g_itmax;
unsigned64 g_inset;
unsigned64 g_tested;
unsigned64 g_1st_inset;
unsigned64 g_1st_tested;
double g_row_mass;
double g_mass;

// l_ticks counts the total number of ticks (microseconds on Linux) spent on
// the current grid. f_ticks is the number of ticks that have elapsed since
// the last time we wrote out to a cache file. Every time we write a file we
// transfer f_ticks into l_ticks and zero out f_ticks, so that f_ticks
// doesn't overflow.
signed64 g_ticks;
signed64 l_ticks;
signed64 f_ticks;

unsigned16 g_quit;
dwell_proc g_dp;

const double k256 = 256.0;
const double k96 = 96.0;
const double k32 = 32.0;
const double k16 = 16.0;
const double k7 = 7.0;
const iter_type k4 = 4.0;
const double k3 = 3.0;
const iter_type k2 = 2.0;
const double k1 = 1.0;
const double kpi = 3.1415926535897932;

void s128_str(char *s, signed128 x);
void s128_str(char *s, signed128 x)
{
  int d, nd;
  signed128 t, k10;

  if (x == 0) {
    *s = '0'; s++; *s = 0;
    return;
  }
  if (x < 0) {
    *s = '-'; s++;
    x = - x;
  }
  k10 = 10;
  // Extract each digit (in reverse order)
  nd = 0;
  while(x > 0) {
    t = x / k10;
    d = x - (t * k10);
    s[nd++] = ('0' + d);
    x = t;
  }
  s[nd] = 0;
  /* Un-reverse the output string */
  int i; char tc;
  for (i=0; i*2 < nd; i++) {
    tc = s[nd-1-i]; s[nd-1-i] = s[i]; s[i] = tc;
  }
}

// Define PERFECT_CARDIOID to add the exact value of the period-1 and -2
// components' areas rather than pixel-counting them. This reduces the
// computation time a little (less than you'd think!) and also reduces the
// standard error for small grid sizes, but not for big grid sizes.
//#define PERFECT_CARDIOID

#define USE_FILES // To save work in disk files, to recover from power failures

/* The write interval constants are in the same units as rpm_clock() */
#define INIT_WRITE_INTERVAL 500000LL
#define MAX_WRITE_INTERVAL 5000000LL
signed64 write_interval;

//#define DEBUG_WRITES

int g_slice;
int g_of;

/* Measure elapsed runtime in microseconds */
signed64 rpm_clock(void)
{
  signed64 rv;
  struct rusage ru;
  int err;

  err = getrusage(RUSAGE_SELF, &ru);
  if (err) {
    printf("rpm_clock err %d\n", err);
    exit(1);
  }
  rv = (signed64) ru.ru_utime.tv_sec;
  rv = (1000000LL * rv) + (signed64) ru.ru_utime.tv_usec;
  return(rv);
}

void check_sizes(void)
{
  int s;

  s = (int) sizeof(unsigned16);
  if (s != 2) {
    printf("sizeof(u16) == %d, expected size was 2\n", s);
    exit(-1);
  }
  s = (int) sizeof(unsigned32);
  if (s != 4) {
    printf("sizeof(u32) == %d, expected size was 4\n", s);
    exit(-1);
  }
  s = (int) sizeof(unsigned64);
  if (s != 8) {
    printf("sizeof(u64) == %d, expected size was 8\n", s);
    exit(-1);
  }
  s = (int) sizeof(unsigned128);
  if (s != 16) {
    printf("sizeof(u128) == %d, expected size was 16\n", s);
    exit(-1);
  }

  if (0) {
    /* Test signed128 type. Note that printf has no format selector for
       this type. */
    signed128 x;
    x = 1;
    for(s=0; s<128; s++) {
      /* Note that printf has no format selector for 128-bit integers. */
      printf("%d %10g\n", s, ((double) x));
      x += x;
    }
    exit(0);
  }
}

void fmt_1_999(char * s, double x);
void fmt_1_999(char * s, double x)
{
  strcpy(s, "");
  if (x >= 100.0) {
    sprintf(s, "%3.0f", x);
    if (strlen(s) > 3) { strcpy(s, "999"); }
  } else if (x >= 10.0) {
    sprintf(s, "%3.1f", x);
    if (strlen(s) > 4) { strcpy(s, "99.9"); }
  } else if (x >= 1.0) {
    sprintf(s, "%3.2f", x);
    if (strlen(s) > 4) { strcpy(s, "9.99"); }
  } else {
    strcpy(s, "1.00");
  }
}

void format_si(char * s, double x);
void format_si(char * s, double x)
{
  if (x < 0.0) { strcpy(s, "neg err"); }
  else if (x == 0) { strcpy(s, " 0 "); }
  else if (x >= 1.0e18) { sprintf(s, "%2.1e", x); }
  else if (x >= 1.0e15) { fmt_1_999(s, x/1.0e15); strcat(s, " P"); }
  else if (x >= 1.0e12) { fmt_1_999(s, x/1.0e12); strcat(s, " T"); }
  else if (x >= 1.0e9)  { fmt_1_999(s, x/1.0e9);  strcat(s, " G"); }
  else if (x >= 1.0e6)  { fmt_1_999(s, x/1.0e6);  strcat(s, " M"); }
  else if (x >= 1.0e3)  { fmt_1_999(s, x/1.0e3);  strcat(s, " K"); }
  else if (x >= 1.0)    { fmt_1_999(s, x);        strcat(s, " ");  }
  else { sprintf(s, "%2.1e", x); }
}

int f64_expon(double x, signed32 * expon);
int f64_expon(double x, signed32 * expon)
{
  if ((x == 0) || isnan(x) || isinf(x)) { return -1; }
  signed32 e = 0;
  x = fabs(x);
  while(x < 1.0)   { e--; x *= 10.0; }
  while(x >= 10.0) { e++; x /= 10.0; }
  if (expon) { *expon = e; }
  return 0;
}

char snfu_scratch[32];

int snfu_tag(char * s, long n, char * fmt, double x,
  char **lead, char **digits, char **trail);
/* Perform an snprintf with the given format string, and "tag" the result
   with pointers to the "leader", digits, and "trailer". */
int snfu_tag(char * s, long n, char * fmt, double x,
  char **lead, char **digits, char **trail)
{
  int l1;

  if (n <= 0) { return -1; }
  if ((lead == 0) || (digits == 0) || (trail == 0)) { return -1; }
  *lead = 0; *digits = 0; *trail = 0;

  /* %%% Here we should make sure fmt fits within the constraints we're
     willing to deal with: contains exactly one '%', has an 'e', 'f' or 'g'
     at the end, etc.
   */

  /* Get the digits, and also find out how much snprintf wants */
  l1 = snprintf(s, n, fmt, x);
  if (l1+1 > n) { return -1; }
  *lead = s;

  /* Find lead digit */
  char * ld = s;
  int gg = 1;
  while(gg) {
    if ((*ld >= '1') && (*ld <= '9')) { gg = 0; }
    else if (*ld == 0) { gg = 0; }
    else { ld++; }
  }
  if (*ld == 0) { return 0; }
  *digits = ld;

  /* Skip digits */
  while((*ld == '.') || ((*ld >= '0') && (*ld <= '9'))) {
    ld++; /* Skip a character */
  }
  *trail = ld;
  return l1;
}

int snformat_unc(char * s, long n, double x, double unc);
int snformat_unc(char * s, long n, double x, double unc)
{
  int l1, l2, i;
  const char * spec1 = "(Uncertain)";
  char ts[30];
  char * lead; char * digits; char * trail;
  char * s_start = s;

  if (fabs(unc) > fabs(x)) {
    /* The uncertainty is bigger than the value */
    if (strlen(spec1)+1 > n) { return -1; }
    strncpy(s, spec1, n);
    return strlen(s);
  }

  /* Check for 0 */
  if (x == 0.0) { s[0] = '0'; s[1] = 0; return 1; }

  /* Deal with negatives */
  if (x < 0) { *s++ = '-'; n--; x = - x; }
  if (unc < 0) { unc = - unc; }

  /* Get the digits, and also find out how much snprintf wrote */
  l1 = snfu_tag(ts, sizeof(ts), "%.17g", x, &lead, &digits, &trail);
  if (digits == 0) { s[0] = 0; return 0; }

  char t2[30];
  int gotdot = 0;
  for(i=0; (i+1 < sizeof(t2)) && (digits + i < trail); i++) {
    t2[i] = digits[i];
    if (t2[i] == '.') { gotdot = 1; }
  }
  if (gotdot == 0) { t2[i++] = '.'; }
  while (i+1 < sizeof(t2)) { t2[i++] = '0'; }
  t2[i] = 0;

  /* We want to add three uncertainty digits inside '()' plus the trailing
     null */
  l2 = l1+5;
  if (l2+1 > n) { s[0] = 0; return 0; }

  /* Determine how many lead digits are in the clear */
  int ex = 0; int eu = 0;
  f64_expon(x, &ex);
  f64_expon(unc, &eu);
  int need = ex - eu;
  /* Put at most 15 digits in the lead */
  if (need > 15) { need = 15; }

  /* Copy everything up to (and not including) the lead digit */
  int len = digits - lead;
  for(i=0; i<len; i++) { *s++ = lead[i]; }

  /* Copy the digits that are "in the clear" */
  i = 0;
  while (need > 0) {
    /* If this character is a digit, count it */
    if ((t2[i] >= '0') && (t2[i] <= '9')) { need--; }
    *s++ = t2[i];
    i++; /* Next digit */
  }
  /* Then 3 more value digits */
  need = 3;
  while (need > 0) {
    /* If this character is a digit, count it */
    if ((t2[i] >= '0') && (t2[i] <= '9')) { need--; }
    *s++ = t2[i];
    i++; /* Next digit */
  }
  *s++ = '(';

  /* Then 3 digits of the uncertainty */
  char us[30];
  char * ulead; char * udig; char * utr;
  /* Get the digits, and also find out how much snprintf wrote */
  l1 = snfu_tag(us, sizeof(us), "%.3e", unc, &ulead, &udig, &utr);
  need = 3;
  int j = 0;
  while ((need > 0) && (udig[j])) {
    /* If this character is a digit, count it */
    if ((udig[j] >= '0') && (udig[j] <= '9')) { need--; *s++ = udig[j]; }
    j++; /* Next digit */
  }
  while (need > 0) { *s++ = '0'; need--; }
  *s++ = ')';

  /* Add trail */
  j = 0;
  while(trail[j]) { *s++ = trail[j++]; }

  /* trailing null, and we be done. */
  *s = 0;
  return (int) (s - s_start);
}

void my_handler(int signal);
void my_handler(int s)
{
  g_quit = 1;
  printf("\n\nGot signal %d, exiting!", s);
}

void dcp1(iter_type r, long dwell, long act_its)
{
  g_tested++;
  if (dwell > g_itmax) {
    g_inset++;
    g_row_mass += r;
    g_dp = &dwell1; // dwell1 or dwell3
  } else if (dwell > 50) {
    g_dp = &dwell1; // dwell1 or dwell3
  } else {
    g_dp = &dwell1; // dwell1 or dwell3
  }
  g_effc_its += dwell;
  g_this_its += act_its;
}
// This is a relatively straightforward iterate routine, except that it
// detects loops.
void dwell1(iter_type r, iter_type i, signed32 itmax, signed32 im_sqrt,
  dwell_cpp d1_cp)
{
  iter_type cr, ci;
  register iter_type zr, zi;
  register iter_type sr, si;
  register iter_type zr2, zi2, rad2;
  signed32 rv_its;
  signed32 act_its;
  signed32 n1, n2;
  int escape_flag = 0;
  int cycle_flag = 0;

  rv_its = 1;
  act_its = 1;
  zr = r; zi = i;
  cr = r; ci = i / k2;
  sr = zr; si = zi;
  for (n1=2; n1<=im_sqrt; n1++) {
    sr = zr; si = zi;
    for (n2=0; n2<n1; n2++) {
      /* One step of z -> z^2 + c.  (ci holds half the imaginary part of c,
         so the imaginary update is 2*(zr*zi + ci).) */
      zr2 = zr * zr;
      zi2 = zi * zi;
      rad2 = zr2 + zi2;
      zi = k2 * (zr * zi + ci);
      zr = zr2 - zi2 + cr;
      act_its++;
      if (rad2 > k4) {
        escape_flag = 1;
        n1 = n2 = im_sqrt + 1; /* Make both loops exit */
      } else if ((zr == sr) && (zi == si)) {
        rv_its = itmax+1;
        cycle_flag = 1;
        n1 = n2 = im_sqrt + 1; /* Make both loops exit */
      }
    }
  }
  if ((cycle_flag == 0) && (escape_flag == 0)) {
    /* We did not cycle or escape, and we ran out of iterations. Return
       act_its+1 just to make sure it meets the criteria. */
    act_its++;
  }
  if (cycle_flag == 0) {
    /* We did not cycle, but might have escaped. Here I want to return the
       actual iteration count. */
    rv_its = act_its;
  }
  /* Now call the completion routine, to update the statistics for this
     grid (total number of pixels, area total, etc.) */
  (*d1_cp)(r, rv_its, act_its);
}

// dwell2 checks to see if the point is in R2a or R2.1/2a, and
// calls (*g_dp) if it isn't. (*g_dp) should point to dwell1 or
// another dwell routine.
void dwell2(iter_type r, iter_type i, signed32 itmax, signed32 im_sqrt,
  dwell_cpp cp)
{
  iter_type i2, s, s2;
  signed32 dwell, act_its;

  // Test for the period-1 cardioid (Jay Hill's test)
  i2 = i * i;
  s = r * r + i2;
  s2 = (k256*s - k96)*s + k32*r;
  if (s2 < k3) {
    /* We do not need to call the dwell routine */
  } else {
    // Test for the period-2 component
    s2 = k16*s + k32*r + k16;
    if (s2 < k1) {
      /* We do not need to call the dwell routine */
    } else {
      /* Yes we do need to call the dwell routine */
      (*g_dp)(r, i, itmax, im_sqrt, cp); // dwell1 or dwell3
      /* dwell routine calls *cp, so we don't need to do that. */
      return;
    }
  }
  /* If we get here, it means the point is either in the cardioid or the
     period-2 component (largest mu-atom). */
  act_its = 1;
  // If we're using the PERFECT_CARDIOID method, we report this
  // point as being *outside* the set because the exact areas of
  // R2a and R2.1/2a are added by m_area
#ifdef PERFECT_CARDIOID
  dwell = 1;
#else
  dwell = itmax+1;
#endif
  (*cp)(r, dwell, act_its); // dcp1
  return;
}
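/* For reference, each cache file that save_or_reload() reads and writes is
   a sequence of raw binary fields in this order (they are written one
   fwrite() at a time below, not as a struct; layout shown assuming
   iter_type == double):

     iter_type  i;            // current imaginary coordinate (resume point)
     unsigned64 g_inset;      // pixels found inside the Set so far
     unsigned64 g_tested;     // pixels tested so far
     double     g_mass;       // accumulated mass, for center of gravity
     int        firstrow;     // nonzero if still on the (special) first row
     signed128  g_this_its;   // actual iterations for this grid
     signed128  g_effc_its;   // effective iterations for this grid
     signed64   l_ticks;      // CPU microseconds spent on this grid
     iter_type  R_LO, R_HI, I_HI;  // grid ranges, checked on reload
*/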
void save_or_reload(int * loaded, signed64 * wrote_time, signed64 * now_time,
  iter_type * i, int * firstrow, char * filename);
void save_or_reload(int * loaded, signed64 * wrote_time, signed64 * now_time,
  iter_type * i, int * firstrow, char * filename)
{
  FILE *f;

  if (*loaded) {
    // Time to write?
    *now_time = rpm_clock();
    *now_time = *now_time - *wrote_time;
#ifdef USE_FILES
    if (*now_time > write_interval) {
      // Only write if we're still running "normally". This is
      // mainly so I don't have to worry too much about making
      // my loops exit gracefully.
      if (g_quit == 0) {
        iter_type tfloat;
        f_ticks += rpm_clock(); // stop timer to write value to file
        // printf("about to write; f_ticks == %ld\n", f_ticks);
        l_ticks += f_ticks;
        // printf("l_ticks now %ld\n", l_ticks);

        // write
        f = fopen(filename, "w");
        if (f == 0) {
          fprintf(stderr, "Could not open '%s' for writing.\n", filename);
          exit(-1);
        }
        fwrite(i, sizeof(*i), 1, f);
        fwrite(&g_inset, sizeof(g_inset), 1, f);
        fwrite(&g_tested, sizeof(g_tested), 1, f);
        fwrite(&g_mass, sizeof(g_mass), 1, f);
        fwrite(firstrow, sizeof(*firstrow), 1, f);
        fwrite(&g_this_its, sizeof(g_this_its), 1, f);
        fwrite(&g_effc_its, sizeof(g_effc_its), 1, f);
        fwrite(&l_ticks, sizeof(l_ticks), 1, f);
        tfloat = R_LO; fwrite(&tfloat, sizeof(tfloat), 1, f);
        tfloat = R_HI; fwrite(&tfloat, sizeof(tfloat), 1, f);
        tfloat = I_HI; fwrite(&tfloat, sizeof(tfloat), 1, f);
        fclose(f);
        *wrote_time = rpm_clock();
#ifdef DEBUG_WRITES
        printf("Wrote i = %10.7f\n", *i);
#endif
        // printf("restart f_ticks at 0\n");
        f_ticks = 0;            // zero timer
        f_ticks -= rpm_clock(); // start timer
      }
      if (write_interval < MAX_WRITE_INTERVAL) {
        write_interval += INIT_WRITE_INTERVAL;
      }
    }
#endif // USE_FILES
  } else {
    /* Not yet loaded cache file */
#ifdef USE_FILES
    f = fopen(filename, "r");
    if (f) {
      iter_type t_rlo, t_rhi, t_ihi;
      // Reload from file
      fread(i, sizeof(*i), 1, f);
      fread(&g_inset, sizeof(g_inset), 1, f);
      fread(&g_tested, sizeof(g_tested), 1, f);
      fread(&g_mass, sizeof(g_mass), 1, f);
      fread(firstrow, sizeof(*firstrow), 1, f);
      fread(&g_this_its, sizeof(g_this_its), 1, f);
      fread(&g_effc_its, sizeof(g_effc_its), 1, f);
      fread(&l_ticks, sizeof(l_ticks), 1, f);
      // printf("read l_ticks == %ld\n", l_ticks);
      fread(&t_rlo, sizeof(t_rlo), 1, f);
      fread(&t_rhi, sizeof(t_rhi), 1, f);
      fread(&t_ihi, sizeof(t_ihi), 1, f);
      fclose(f);
      /* The following checks also help ensure endianness and alignment of
         the preceding data fields */
      if (t_rlo != R_LO) {
        printf("\n");
        printf("Cannot use cache file '%s'\n", filename);
        printf("  because its R_LO value is %10g and we expect %10g\n",
          t_rlo, R_LO);
        exit(-1);
      }
      if (t_rhi != R_HI) {
        printf("\n");
        printf("Cannot use cache file '%s'\n", filename);
        printf("  because its R_HI value is %10g and we expect %10g\n",
          t_rhi, R_HI);
        exit(-1);
      }
      if (t_ihi != I_HI) {
        printf("\n");
        printf("Cannot use cache file '%s'\n", filename);
        printf("  because its I_HI value is %10g and we expect %10g\n",
          t_ihi, I_HI);
        exit(-1);
      }
#ifdef DEBUG_WRITES
      printf("Continuing from i = %10.7f\n", *i);
#endif
      f_ticks = 0;
      // Since any calculation up to this point has been replaced
      // with the cache values, we need to restart the timer at 0
      f_ticks -= rpm_clock();
    } else {
      // No cache file. Schedule a cache write 1 second from now.
      *wrote_time = rpm_clock() - write_interval + 1000000L;
    }
#endif // USE_FILES
    *loaded = 1;
  }
}

int g_num_dots;
char g_dots[STAT_NSAMP+1];

area_result m_area(signed32 gridsize, signed32 itmax, signed32 im_sqrt,
  signed32 skew_v)
{
  area_result rv;
  iter_type r, i2;
  iter_type ri_inc;
  iter_type row0_adjust;
  double pix_area;
  double result;
  double skew_d;
  iter_type i;
  int firstrow;
  char filename[200];
  signed64 wrote_time, now_time;
  int loaded;

  wrote_time = rpm_clock();
  loaded = 0;
  sprintf(filename, "cache/%ld,%ld@%ld",
    (long) gridsize, (long) skew_v, (long) itmax);
  // printf("filename: %s", filename);
  ri_inc = (R_HI - R_LO) / gridsize;
  skew_d = skew_v;
  skew_d /= 65536.0;
  skew_d *= ri_inc;
  pix_area = ri_inc*ri_inc;
  g_inset = 0;
  g_tested = 0;
  g_mass = 0;
  g_itmax = itmax;
  firstrow = 1;
  for (i = I_LO + skew_d; (i < I_HI) && (g_quit == 0); i += ri_inc) {
    save_or_reload(&loaded, &wrote_time, &now_time, &i, &firstrow, filename);
    i2 = i;
    if (firstrow) {
      i2 = (i+(ri_inc/2.0))/2.0;
    }
#ifndef DEBUG_WRITES
    if (g_quit == 0) {
      printf("\r%s%9.6f ", g_dots, i2);
      fflush(stdout);
    }
#endif
    g_row_mass = 0;
    for (r = R_LO - skew_d; (r < R_HI) && (g_quit == 0); r += ri_inc) {
      dwell2(r, i2, itmax, im_sqrt, &dcp1);
    }
    if (firstrow) {
      // printf("%qd->", g_inset);
      i2 = g_inset;
      row0_adjust = (i+(ri_inc/2.0))/ri_inc;
      i2 *= row0_adjust;
      g_inset = i2;
      g_row_mass *= row0_adjust;
      // printf("%qd ", g_inset);
      firstrow = 0;
    }
    g_mass = g_mass + g_row_mass;
  }
  result = g_inset;
  result *= pix_area;
  result *= k2;
#ifdef PERFECT_CARDIOID
  // Add the exact area of the cardioid and the period-2 component.
  // %%% this works for the area but not for g_mass, so we get the wrong
  // answer for the center of gravity
  result += (k7 * kpi)/k16;
#endif
  rv.ar_area = result;
  rv.ar_gravity = g_mass / g_inset;
  return rv;
}

area_result m_area_stat(
  signed32 gridsize,     // Initial gridsize
  signed32 itmax,        // Itmax for all samples
  signed32 nsamp,        // Number of samples to take
  area_result *std_dev,  // Sample standard deviation results will go here
  area_result *std_err,  // Standard error results will go here
  int * ns_done)         // Number of samples taken will go here
{
  double a_sum, g_sum;
  double a_sum_sqr, g_sum_sqr;
  signed32 i, skew_v, skew_inc;
  double area, gravity;
  double a_mean, g_mean;
  area_result mar;
  area_result rv;
  signed32 its, n1, n2, im_sqrt;
  signed32 gs_start, gs_end;

  gs_start = gridsize - (nsamp/2);
  gs_end = gs_start + nsamp;
  printf("Gridwidths %ld-%ld at Nmax %ld\n",
    (long) gs_start, (long) (gs_end-1), (long) itmax);
  fflush(stdout);
  g_num_dots = 0;
  *ns_done = 0;
  (*std_dev).ar_area = 0.0;
  (*std_dev).ar_gravity = 0.0;
  (*std_err).ar_area = 0.0;
  (*std_err).ar_gravity = 0.0;
  rv.ar_area = 0.0;
  rv.ar_gravity = 0.0;

  /* Compute im_sqrt.
     The algorithm is dumb but doesn't matter because we only do it once
     per unique value of itmax */
  its = 0;
  im_sqrt = 0;
  for (n1=2; n1<itmax; n1++) {
    for (n2=0; n2<n1; n2++) {
      if (its > itmax) {
        im_sqrt = n1;
        n1 = n2 = itmax; /* Make both loops end now */
      }
      its++;
    }
  }
  if (im_sqrt == 0) {
    fprintf(stderr, "Error: did not detect im_sqrt\n");
    exit(-1);
  }

  g_ttl_its = 0;
  g_t_efc_its = 0;
  a_sum = a_sum_sqr = 0;
  g_sum = g_sum_sqr = 0;
  skew_inc = 65536 / SKEW_INC_MAX; // was "nsamp"
  skew_v = skew_inc / 2;
  g_ticks = 0;
  g_1st_inset = g_1st_tested = 0;
  gridsize = gs_start;
  for (i=0; (i<nsamp) && (g_quit == 0); i++) {
    /* In slice mode, only compute the samples assigned to this slice */
    if ((g_of <= 1) || ((i % g_of) == g_slice)) {
      g_this_its = 0;
      g_effc_its = 0;
      g_dp = &dwell1;
      mar = m_area(gridsize, itmax, im_sqrt, skew_v);
      if (g_quit == 0) {
        area = mar.ar_area;
        gravity = mar.ar_gravity;
        a_sum += area;  a_sum_sqr += area * area;
        g_sum += gravity;  g_sum_sqr += gravity * gravity;
        if (*ns_done == 0) {
          g_1st_inset = g_inset;
          g_1st_tested = g_tested;
        }
        g_ttl_its += g_this_its;
        g_t_efc_its += g_effc_its;
        g_ticks += l_ticks + f_ticks;
        (*ns_done)++;
        g_dots[g_num_dots++] = '.';
        g_dots[g_num_dots] = 0;
      }
    }
    gridsize++;
    skew_v += skew_inc;
  }
  if (*ns_done > 0) {
    a_mean = a_sum / ((double) *ns_done);
    g_mean = g_sum / ((double) *ns_done);
    rv.ar_area = a_mean;
    rv.ar_gravity = g_mean;
    if (*ns_done > 1) {
      (*std_dev).ar_area =
        sqrt((a_sum_sqr - a_sum*a_mean) / ((double) (*ns_done - 1)));
      (*std_dev).ar_gravity =
        sqrt((g_sum_sqr - g_sum*g_mean) / ((double) (*ns_done - 1)));
      (*std_err).ar_area = (*std_dev).ar_area / sqrt((double) *ns_done);
      (*std_err).ar_gravity = (*std_dev).ar_gravity / sqrt((double) *ns_done);
    }
  }
  printf("\n");
  return rv;
}

void loop1(void)
{
  signed32 gs, im, itmax;
  int ns_done;
  double perf;
  area_result mar, std_dev, std_err;

  for (gs = g_gs_min; (gs <= g_gs_max) && (g_quit == 0); gs += gs) {
    for (im = IM_MIN; (im <= IM_MAX) && (g_quit == 0); im += im) {
      itmax = gs * im;
      mar = m_area_stat(gs, itmax, STAT_NSAMP, &std_dev, &std_err, &ns_done);
      if (g_of > 1) {
        printf("RUNNING AS SLICE %d of %d SLICES\n", g_slice, g_of);
      }
      if (g_of > 4) {
        /* More than 4 slices means less than 5 samples per slice */
        printf("  Not enough samples for meaningful averages.\n");
      } else if (g_of > 1) {
        printf(
          "  (These stats reflect %d samples out of the full set of %d)\n",
          ns_done, STAT_NSAMP);
        printf("\n");
      }
      if (g_of <= 4) {
        char s[30];
        /* 4 or fewer slices means we have enough samples to calculate the
           average and standard error, etc. */
        printf("  Gridsize %ld, itmax %ld\n", (long) gs, (long) itmax);
        snformat_unc(s, sizeof(s), mar.ar_area, 1.96 * std_err.ar_area);
        printf("  Area %s (n=%d, sigma %7.2e)\n", s, ns_done,
          std_dev.ar_area);
        // printf(
        //   "  Area %11.9f +- %7.2e (n=%d, sigma %7.2e)\n",
        //   mar.ar_area, 1.96 * std_err.ar_area, ns_done, std_dev.ar_area);
#ifndef PERFECT_CARDIOID
        snformat_unc(s, sizeof(s), mar.ar_gravity, 1.96*std_err.ar_gravity);
        printf("  Center of gravity %s (n=%d, sigma %7.2e)\n", s, ns_done,
          std_dev.ar_gravity);
        // printf(
        //   "  Center of gravity %13.10f +- %7.2e (n=%d, sigma %7.2e)\n",
        //   mar.ar_gravity, 1.96 * std_err.ar_gravity, ns_done,
        //   std_dev.ar_gravity);
#endif
        s128_str(s, g_t_efc_its);
        printf("  Its: %20s", s);
        s128_str(s, g_ttl_its);
        printf("; Actual: %s\n", s);
        // printf("  Its: %10.05g; Actual: %10.05g\n",
        //   ((double) g_t_efc_its), ((double) g_ttl_its));
        printf("  pixels in first sample: %lu of %lu\n",
          g_1st_inset, g_1st_tested);
        perf = ((double) g_ticks)/1.0e6; /* Seconds */
        printf("  Total computation time: %g sec", perf);
        if (perf > 0.01) {
          char s[20];
          /* FLOPs per millisecond equals KFLOPs per second.
           */
          perf = ((double) g_ttl_its) * 7.0 / perf;
          format_si(s, perf);
          printf(", %sFLOPs", s);
        } else {
          /* Ran for less than 10 milliseconds */
          printf(" (too short for KFLOPs measure)");
        }
        printf("\n");
      }
    }
    printf("\n");
  }
}

void usage(void);
void usage()
{
  fprintf(stderr, "%s",
    "NAME\n"
    "  luams - measure area of Mandelbrot Set\n"
    "\n"
    "SYNOPSIS\n"
    "  luams [-slice N/M] [-gmin A] [-gmax B]\n"
    "\n"
    "EXAMPLES\n"
    "  luams -slice 3/4     (slice number is 0-based)\n"
    "\n"
  );
}

int main(int nargs, char **argv)
{
  char ** av;
  char *arg;
  int args;

  printf("LUAMS-3, %s", COMPILE_DATE);
#ifdef PERFECT_CARDIOID
  printf(", PERFECT_CARDIOID");
#endif
#ifdef USE_FILES
  printf(", USE_FILES");
#endif
  printf("\n\n");

  check_sizes();

  /* Set defaults */
  write_interval = INIT_WRITE_INTERVAL;
  g_slice = 0;
  g_of = 1;
  g_gs_min = DEFAULT_GS_MIN;
  g_gs_max = DEFAULT_GS_MAX;

  if (nargs > 1) {
    av = argv; args = nargs;
    av++; /* skip our program name/path */
    args--;
    while(args) {
      arg = *av; av++; args--;
      if (   (strcmp(arg, "-help") == 0) || (strcmp(arg, "-h") == 0)
          || (strcmp(arg, "--help") == 0) || (strcmp(arg, "--h") == 0)) {
        usage();
        exit(0);
      } else if (strcmp(arg, "-gmin") == 0) {
        char * s;
        int na, g;
        s = *av; av++; args--;
        if (s == 0) {
          fprintf(stderr, "-gmin requires an argument\n");
          exit(-1);
        }
        na = sscanf(s, "%d", &g);
        if (na != 1) {
          fprintf(stderr, "-gmin requires an argument\n");
          exit(-1);
        }
        if (g < 32) {
          fprintf(stderr, "-gmin argument must be at least 32.\n");
          exit(-1);
        }
        g_gs_min = g;
        printf("Min gridsize %d\n", g_gs_min);
      } else if (strcmp(arg, "-gmax") == 0) {
        char * s;
        int na, g;
        s = *av; av++; args--;
        if (s == 0) {
          fprintf(stderr, "-gmax requires an argument\n");
          exit(-1);
        }
        na = sscanf(s, "%d", &g);
        if (na != 1) {
          fprintf(stderr, "-gmax requires an argument\n");
          exit(-1);
        }
        if (g < 32) {
          fprintf(stderr, "-gmax argument must be at least 32.\n");
          exit(-1);
        }
        g_gs_max = g;
        printf("Max gridsize %d\n", g_gs_max);
      } else if (strcmp(arg, "-slice") == 0) {
        char * s;
        int na, slice, of;
        s = *av; av++; args--;
        if (s == 0) {
          fprintf(stderr, "-slice requires an argument like '3/7'\n");
          exit(-1);
        }
        na = sscanf(s, "%d/%d", &slice, &of);
        if (na != 2) {
          fprintf(stderr, "-slice requires an argument like '3/7'\n");
          exit(-1);
        }
        if (of <= 0) {
          fprintf(stderr,
            "second part of -slice argument must be a positive integer.\n");
          exit(-1);
        }
        if ((slice < 0) || (slice >= of)) {
          fprintf(stderr,
            "first part of -slice argument must be in the range 0 .. %d inclusive\n",
            (int) (of-1));
          exit(-1);
        }
        if (of > STAT_NSAMP) {
          fprintf(stderr, "number of slices should not be more than %d\n",
            (int) STAT_NSAMP);
          exit(-1);
        }
        g_slice = slice;
        g_of = of;
        printf("Will run as slice %d of %d\n", g_slice, g_of);
      }
    }
  }
  if (g_gs_max < g_gs_min) {
    fprintf(stderr, "-gmax must be at least as large as -gmin.\n");
    exit(-1);
  }

  signal(SIGINT, &my_handler);
  signal(SIGTERM, &my_handler);

  loop1();
  return 0;
}