mdbtxt1
mdbtxt2
Proceed to Safety

Sloandora: A metric and UI for Integer Sequence exploration    

This page describes a software project I developed in the fall of 2009 with the help of suggestions from the seqfan mailing list.

Contents

Introduction

Overview of Design

Comparison to Other Systems

The Metric

Topic State Variables

User Interface

Results

Example Sessions

      Egyptian Fractions Example

      "Mandelbrot" Topic Example

Performance Benchmarks

Future Improvements

Current Source Code

References


Introduction

The overall goal is to provide a software tool for computer-assisted discovery of integer sequences from OEIS [5] that are related to a user's area of research. The user can choose sequences explicitly, or look at the computer's next recommendation, and tell the computer whether or not it is "interesting" (relevant to his area of research). The computer can determine (through the use of automatic techniques, possibly supplemented by input from other users or database "curators") which sequences are "likely" to be interesting given the user's indicated preferences so far.

An analogy for this type of interaction already exists in:

Overview of Design

The overall structure of my implementation is the following:

1. creating a topic (corresponding to a particular area of interest, or a field of research)

2. auto-recommending items to the user

3. User-rating items as "relevant", "not relevant", or no comment (ambivalent)

The implementation given here (source code below) uses a perl command-line interface and a C compiled tool that implements the metric. It uses a local copy of the OEIS database in the form of eisBTfry00000.txt files (which are no longer available except through special request).

Comparison to Other Systems

The OEIS has existed in various forms for over fifty years. There are a number of other tools for sequence exploration, which fulfill different objectives:

This project adds a new way to explore: using one's partial knowledge of the OEIS to quickly discover related content.

The Sloandora Metric

The "metric" is the algorithm for determining how closely related two sequences are. It is central to any assisted browsing system.

The Pandora music discovery service [3] uses a database (the Music Genome Project) that was built up with the aid of real people, who listened to music and created a "genome" for each song. Each "genome" is essentially a set of cognitive attribute tags drawn from hundreds (or possibly thousands) of different musical attributes. Using humans is necessary partly because music has to be listened to (and somehow interpreted) to discover its cognitive content, and computers aren't too good at that yet.[4]

In our case the job is made easier by the fact that the OEIS entries are essentially just plain text, and the text itself is about the right size to use as the genome.

So I have chosen a character-level concordance metric as the initial algorithm for measuring similarity of sequences. My specific implementation works as follows:

Topic State Variables

The user's browsing is centered around "topics", which represent distinct areas of interest.

Each topic has a datafile that contains its current state. This state consists of:

User Interface

The user begins by creating a "new topic" representing an area of research. To create a topic the user must select a single relevant sequence. This is the "seed sequence".

The program computes the concordance match between the seed sequence and all other sequences, and stores this information in a file representing the current state of that topic.

Then the program selects the sequence that has the highest concordance with the seed sequence, and presents it to the user. (In MacOS, I can open the sequence's web page into a broswer).

The user can then perform the following actions:

"--" or Next: give me another automatic recommendation

Y or Relevant: this sequence belongs to this topic, use it to improve your recommendations in the future.

N or Irrelevant: this sequence does not belong to this topic, avoid sequences like it in the future.

A123456: Display a sequence on request, so I can decide if it is relevant and rate it accordingly.

When the user selects Next, another sequence is recommended and no other actions are taken.

When the user selects Relevant or Irrelevant, the software performs the concordance metric between the new sequence and all other sequences in the database. The resulting concordance score is either added to or subtracted from the current scores for each sequence, and the topic state is updated.

The action of requesting a specific sequence does not affect the topic state, unless the user subsequently rates it with the Relevant or Irrelevant command.

Results

Initial testing of the metric has been performed. I used the metric described above with a 5-character token size to find the closest sequence in the OEIS to a given seed sequence:

Seed Match Score Notes
A000213 (Tribonacci numbers) A001648 0.270 Tetranacci numbers without the leading 4
A000040 (The Prime numbers) A019590 0.225 Fermat's Last Theorem
A000079 (Powers of 2) A005408 0.284 The odd numbers
A005577 (arrangements of pennies in rows) A005576 0.447 A different pennies sequence
A002808 (Composite numbers) A018252 0.357 The nonprime numbers
A006451 (Solution to a diophantine equation) A006454 0.324 Both share a reference to the article "How four dogs meet in a field".
A006887 (Kaprekar triples) A060768 0.268 Pseudo-Kaprekar triples
A100140 (Greedy Egyptian fractions) A117116 0.181 Denominators of an Egyptian Fraction for phi = (1+sqrt(5))/2

Example Sessions

Egyptian Fractions Example

Here is an example session in which the user wants to study Egyptian fractions:

bash$ ./Sloandora _ _ (_` | _ ._ ._ _| _ ._ ._ , ) | ( ) ,-| | | ( | ( ) | ,-| ~ ~ ~ `~` ~ ~ ~` ~ ~ `~`    User-Aided Integer Sequence Auto-Discovery System by Robert Munafo, with the seqfan list members    Loading database index... Select a topic: egyptian There is no topic 'egyptian'. Do you want to create one? yes Enter the A-number of a sequence to start this topic: A100140 Rating 'A100140' at 1; 2 threads: ...................................................... 95.27 CPU-seconds. Next recommendation: A117116 (score 0.179) Denominators of an Egyptian Fraction for phi = (1+sqrt(5))/2.    Command loop (type 'help' for help) Topic 'egyptian' > relevant Rating 'A117116' at 1; 2 threads: ........................................... 77.28 CPU-seconds. Next recommendation: A006525 (score 0.420) Denominator of Egyptian fraction for e-2. Topic 'egyptian' > show Current sequence is A006525; its score is 0.420 %---- A006525 %I M1553 %S 2,5,55,9999,3620211523,25838201785967533906, %T 3408847366605453091140558218322023440765 %N Denominator of Egyptian fraction for e-2. %H <a href="http://www.research.att.com/~njas/sequences/Sindx_Ed.html#Egypt">Index entries for sequences related to Egyptian fractions</a> %H Eric Weisstein's World of Mathematics, <a href="http://mathworld.wolfram.com/EgyptianFraction.html">Egyptian Fraction</a> %Y Adjacent sequences: A006522 A006523 A006524 this_sequence A006526 A006527 A006528 %Y Sequence in context: A114029 A013171 A073422 this_sequence A042161 A101151 A030244 %K nonn %O 0,1 %A njas %E More terms from H. P. Robinson.    Topic 'egyptian' > y Rating 'A006525' at 1; 2 threads: ...................................... 68.64 CPU-seconds. Next recommendation: A006524 (score 0.793) Egyptian fraction for 1/ Pi. Topic 'egyptian' > A01234 Loading A001234 (current score: 0.341) Stirling numbers of first kind. Topic 'egyptian' > ? Sloandora commands:    Show Display the entire record for the current sequence Web Display the current sequence in your web browser Y, Relevant The currently displayed sequence matches this topic N, Irrelevant The currently displayed sequence does NOT match Next, Skip Show me another sequence you think is relevant A012345 Show me sequence A012345 Best Re-select the latest recommendation Ratings List all sequences that have been rated in this topic Save Save the state of the current topic. Quit, Exit, Bye Exit Sloandora (also does "save") Help, ? Displays this help    Topic 'egyptian' > best Current recommendation: A006524 (score 0.793) Egyptian fraction for 1/ Pi. Topic 'egyptian' > y Rating 'A006524' at 1; 2 threads: .......................................... 70.63 CPU-seconds. Next recommendation: A006487 (score 1.143) Egyptian fraction for square root of 2. Topic 'egyptian' > q Saving topic 'egyptian'...done

"Mandelbrot" Topic Example

Here is another example session in which a few sequences had to be rated Irrelevant. This session was run on a system with 8 hardware threads; the user is interested in anything related to the Mandelbrot set:

bash$ ./Sloandora _ _ (_` | _ ._ ._ _| _ ._ ._ , ) | ( ) ,-| | | ( | ( ) | ,-| ~ ~ ~ `~` ~ ~ ~` ~ ~ `~`    User-Aided Integer Sequence Auto-Discovery System by Robert Munafo, with the seqfan list members    Found 0 parsed eis/DBIS files. Repartitioning eisBTfry files into 8 pieces... 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 Loading database index... Select a topic: mandel There is no topic 'mandel'. Do you want to create one? yes Enter the A-number of a sequence to start this topic: 52154 Rating 'A052154' at 1; 8 threads: ......... 66.46 CPU-seconds. Performed 3.08e+11 key comparisons in approx. 9.138 sec. with 8 threads. Next recommendation: A054671 (score 0.203) Denominators of (reduced) coefficients of Laurent series for conformal mapping from exterior of unit disk onto exterior of Mandelbrot set.    Command loop (type 'help' for help) Topic 'mandel' > show Current sequence is A054671; its score is 0.203 %---- A054671 %d 20130208.030727    %S 2,8,4,128,1,1024,16,32768,1,262144,32,4194304,1,33554432,512, %T 2147483648,1,17179869184,2048,274877906944,64,2199023255552,2048, %U 70368744177664,1,562949953421312,131072,9007199254740992,256 %N Denominators of (reduced) coefficients of Laurent series for conformal mapping from exterior of unit disk onto exterior of Mandelbrot set. %C Sum converges very slowly: 10^118 terms to get first two digits, 10^1181 for three digits. %D John H. Ewing and G. Schober, "The area of the Mandelbrot set", Numer. Math. vol. 61, pp. 59-72, 1992. %H Adam Majewski, <a href="http://republika.pl/fraktal/maxima.html">Maxima code for this sequence</a> %H Robert P. Munafo, <a href="http://www.mrob.com/pub/muency/laurentseries.html">Laurent Series</a> %H Eric Weisstein's World of Mathematics, <a href="http://mathworld.wolfram.com/MandelbrotSet.html">Mandelbrot Set</a> %p Munafo site gives Maple code. %Y Cf. A054670. %K frac,nonn %O 0,1 %A _Robert Munafo_, Apr 18 2000 %E Extended by _Eric W. Weisstein_, Nov 27, 2005 %E Definition corrected by Adam Majewski (adammaj1(AT)o2.pl), Nov 17 2006 Sequence data length: 1137 bytes.    Topic 'mandel' > auto web Auto #web# is now on Topic 'mandel' > relevant Rating 'A054671' at 1; 8 threads: ........ 63.02 CPU-seconds. Performed 2.28e+11 key comparisons in approx. 8.665 sec. with 8 threads. Next recommendation: A054670 (score 0.587) Numerators of (reduced) coefficients of Laurent series for conformal mapping from exterior of unit disk onto exterior of Mandelbrot set. http://oeis.org/A054670 Topic 'mandel' > y Rating 'A054670' at 1; 8 threads: ........ 65.52 CPU-seconds. Performed 2.72e+11 key comparisons in approx. 9.009 sec. with 8 threads. Next recommendation: A098403 (score 0.584) Decimal expansion of area of Mandelbrot set. http://oeis.org/A098403 Topic 'mandel' > ok Unknown command (Type 'help' for help) Topic 'mandel' > help Sloandora commands:    Show Display the entire record for the current sequence Web Display the current sequence in your web browser Auto show Automatically do the "Show" command for each new recommendation Auto web Automatically do the "Web" command for each new recommendation Y, Relevant The currently displayed sequence matches this topic N, Irrelevant The currently displayed sequence does NOT match Unrate Remove any rating previously applied to the current sequence Next, Skip Show me another sequence you think is relevant A012345 Show me sequence A012345 Best Re-select the latest recommendation Ratings List all sequences that have been rated in this topic Save Save the state of the current topic. Quit, Exit, Bye Exit Sloandora (also does "save") Help, ? Displays this help    Topic 'mandel' > Y Rating 'A098403' at 1; 8 threads: ....... 55.42 CPU-seconds. Performed 2.14e+11 key comparisons in approx. 7.620 sec. with 8 threads. Next recommendation: A056783 (score 0.673) Number of diamond polyominoes with n cells. http://oeis.org/A056783 Topic 'mandel' > n Rating 'A056783' at -1; 8 threads: ....... 56.30 CPU-seconds. Performed 1.75e+11 key comparisons in approx. 7.741 sec. with 8 threads. Next recommendation: A102525 (score 0.542) Decimal expansion of log(2)/log(3). http://oeis.org/A102525 Topic 'mandel' > n Rating 'A102525' at -1; 8 threads: ........ 62.83 CPU-seconds. Performed 2.26e+11 key comparisons in approx. 8.639 sec. with 8 threads. Next recommendation: A019473 (score 0.369) Number of stable n-celled patterns ("still lifes") in Conway's game of Life. http://oeis.org/A019473 Topic 'mandel' > n Rating 'A019473' at -1; 8 threads: ........ 61.78 CPU-seconds. Performed 2.41e+11 key comparisons in approx. 8.495 sec. with 8 threads. Next recommendation: A006875 (score 0.233) Non-seed mu-atoms of period n in Mandelbrot set. http://oeis.org/A006875 Topic 'mandel' > y Rating 'A006875' at 1; 8 threads: ......... 71.15 CPU-seconds. Performed 3.42e+11 key comparisons in approx. 9.783 sec. with 8 threads. Next recommendation: A006876 (score 0.537) Mu-molecules in Mandelbrot set whose seeds have period n. http://oeis.org/A006876 Topic 'mandel' > y Rating 'A006876' at 1; 8 threads: .......... 68.79 CPU-seconds. Performed 2.9e+11 key comparisons in approx. 9.459 sec. with 8 threads. Next recommendation: A006874 (score 0.765) Mu-atoms of period n on continent of Mandelbrot set. http://oeis.org/A006874 Topic 'mandel' > y Rating 'A006874' at 1; 8 threads: ......... 70.97 CPU-seconds. Performed 3.2e+11 key comparisons in approx. 9.758 sec. with 8 threads. Next recommendation: A007436 (score 0.653) Moebius transform of Fibonacci numbers. http://oeis.org/A007436 Topic 'mandel' > y Rating 'A007436' at 1; 8 threads: ........ 65.60 CPU-seconds. Performed 2.44e+11 key comparisons in approx. 9.020 sec. with 8 threads. Next recommendation: A007435 (score 0.865) Inverse Moebius transform of Fibonacci numbers 1,1,2,3,5,8,... http://oeis.org/A007435 Topic 'mandel' > A007436 Loading A007436 (current score: 1.152; rating: 1) Moebius transform of Fibonacci numbers. http://oeis.org/A007436 Topic 'mandel' > unrate Un-rating 'A007436' back to 0; 8 threads: ........ 65.55 CPU-seconds. Performed 2.44e+11 key comparisons in approx. 9.013 sec. with 8 threads. Next recommendation: A000199 (score 0.650) Coefficient of q^(2n-1) in the series expansion of Ramanujan's mock theta function f(q). http://oeis.org/A000199 Topic 'mandel' > irrelevant Rating 'A000199' at -1; 8 threads: ........ 66.61 CPU-seconds. Performed 2.59e+11 key comparisons in approx. 9.159 sec. with 8 threads. Next recommendation: A094358 (score 0.465) Squarefree products of factors of Fermat numbers (A023394). http://oeis.org/A094358 Topic 'mandel' > n Rating 'A094358' at -1; 8 threads: ......... 67.67 CPU-seconds. Performed 2.85e+11 key comparisons in approx. 9.305 sec. with 8 threads. Next recommendation: A007557 (score 0.358) Shifts left when inverse Moebius transform applied twice. http://oeis.org/A007557 Topic 'mandel' > n Rating 'A007557' at -1; 8 threads: ......... 66.88 CPU-seconds. Performed 2.67e+11 key comparisons in approx. 9.196 sec. with 8 threads. Next recommendation: A097486 (score 0.234) A relationship between Pi and the Mandelbrot set. a(n) = number of iterations of z^2 + c that c-values -.75 + x*i go through before escaping, where x = 10^(-n). lim n->inf a(n) * x = Pi. http://oeis.org/A097486 Topic 'mandel' > y Rating 'A097486' at 1; 8 threads: ........... 81.75 CPU-seconds. Performed 5.69e+11 key comparisons in approx. 11.241 sec. with 8 threads. Next recommendation: A100140 (score 0.308) Largest denominator of greedy Egyptian fraction sum for M/N. http://oeis.org/A100140 Topic 'mandel' > n Rating 'A100140' at -1; 8 threads: ......... 68.04 CPU-seconds. Performed 3e+11 key comparisons in approx. 9.356 sec. with 8 threads. Next recommendation: A005600 (score 0.213) Current estimate of decimal expansion of reciprocal of fine-structure constant alpha. http://oeis.org/A005600 Topic 'mandel' > n Rating 'A005600' at -1; 8 threads: ......... 70.38 CPU-seconds. Performed 3.61e+11 key comparisons in approx. 9.677 sec. with 8 threads. Next recommendation: A137560 (score 0.141) Triangle read by rows: coefficients of the Mandelbrot-Julia quadratic polynomials (sometimes called cycle polynomials or Pc polynomials): Pc[0]=1; Pc[1]=c; Pc[n]=nestedfunction(Pc[z^2+c],n]. http://oeis.org/A137560 Topic 'mandel' > y Rating 'A137560' at 1; 8 threads: ............. 93.98 CPU-seconds. Performed 7.42e+11 key comparisons in approx. 12.922 sec. with 8 threads. Next recommendation: A137867 (score 0.273) Triangular sequence of coefficients of the Misiurewicz polynomial which are made from the Pc Mandelbrot -Julia polynomials A137560 as: Pc(x,n)-Pc(x,m); n<>m. http://oeis.org/A137867 Topic 'mandel' > y Rating 'A137867' at 1; 8 threads: .......... 68.95 CPU-seconds. Performed 6.57e+11 key comparisons in approx. 9.481 sec. with 8 threads. Next recommendation: A136705 (score 0.378) Triangle read by rows where the n-th row gives the coefficients of the characteristic polynomial for a Fibonacci-type matrix with a=1 and b=1. http://oeis.org/A136705 Topic 'mandel' > n Rating 'A136705' at -1; 8 threads: ........ 64.16 CPU-seconds. Performed 3.95e+11 key comparisons in approx. 8.822 sec. with 8 threads. Next recommendation: A137452 (score 0.243) A triangular sequence from the coefficients an Abel polynomial given as an example in Roman:a0=1; a(x,n)=x*(x-a0**n)^(n-1). http://oeis.org/A137452 Topic 'mandel' > n Rating 'A137452' at -1; 8 threads: ........... 84.42 CPU-seconds. Performed 5.94e+11 key comparisons in approx. 11.608 sec. with 8 threads. Next recommendation: A000740 (score 0.133) Number of 2n-bead balanced binary necklaces of fundamental period 2n, equivalent to reversed complement; also Dirichlet convolution of b_n=2^(n-1) with mu(n); also number of components of Mandelbrot set corresponding to Julia sets with an attractive n-cycle. http://oeis.org/A000740 Topic 'mandel' > yes Rating 'A000740' at 1; 8 threads: ............ 91.54 CPU-seconds. Performed 6.85e+11 key comparisons in approx. 12.587 sec. with 8 threads. Next recommendation: A003095 (score 0.283) a(n) = a(n-1)^2 + 1. http://oeis.org/A003095 Topic 'mandel' > n Rating 'A003095' at -1; 8 threads: ............ 89.73 CPU-seconds. Performed 6.16e+11 key comparisons in approx. 12.338 sec. with 8 threads. Next recommendation: A076078 (score 0.126) a(n) = number of nonempty sets of distinct positive integers that have a least common multiple of n. http://oeis.org/A076078 Topic 'mandel' > n Rating 'A076078' at -1; 8 threads: .......... 80.94 CPU-seconds. Performed 5.23e+11 key comparisons in approx. 11.129 sec. with 8 threads. Next recommendation: A138061 (score 0.065) This sequence is a triangular sequence formed by the substitution: ( French sideways graph) 1->1,2;2->3;3->4;4->1; as a Markov style substitution form. The result is the differential polynomial coefficient form. ( first zero omitted). http://oeis.org/A138061 Topic 'mandel' > n Rating 'A138061' at -1; 8 threads: ......... 72.92 CPU-seconds. Performed 5.21e+11 key comparisons in approx. 10.027 sec. with 8 threads. Next recommendation: A051793 (score 0.022) a(n)=sum((-1)^i*a(i),i=n-4..n-1), a(1)=1,a(2)=1,a(3)=1,a(4)=1. http://oeis.org/A051793 Topic 'mandel' > n Rating 'A051793' at -1; 8 threads: ....... 57.43 CPU-seconds. Performed 1.95e+11 key comparisons in approx. 7.897 sec. with 8 threads. Next recommendation: A018897 (score -0.009) Weight distribution of [512,382,16] 9th-order Reed-Muller (or RM) code. http://oeis.org/A018897 Topic 'mandel' > n Rating 'A018897' at -1; 8 threads: ............................ 214.14 CPU-seconds. Performed 4.32e+12 key comparisons in approx. 29.444 sec. with 8 threads. Next recommendation: A101201 (score -0.017) Maximal number of kings in the toroidal king's graph on an n X n board such that each king is attacking no more than four other kings. http://oeis.org/A101201 Topic 'mandel' > No Rating 'A101201' at -1; 8 threads: ........... 85.25 CPU-seconds. Performed 1.72e+12 key comparisons in approx. 11.722 sec. with 8 threads. Next recommendation: A208510 (score -0.022) Triangle of coefficients of polynomials u(n,x) jointly generated with A029653; see the Formula section. http://oeis.org/A208510 Topic 'mandel' > save Saving topic 'mandel'...done Topic 'mandel' > q done bash$

Older Mandelbrot Example

Here is an older Mandelbrot session, run on a system with 16 hardware threads:

bash$ ./Sloandora _ _ (_` | _ ._ ._ _| _ ._ ._ , ) | ( ) ,-| | | ( | ( ) | ,-| ~ ~ ~ `~` ~ ~ ~` ~ ~ `~`    User-Aided Integer Sequence Auto-Discovery System by Robert Munafo, with the seqfan list members    Loading database index... Select a topic: mandelbrot There is no topic 'mandelbrot'. Do you want to create one? yes Enter the A-number of a sequence to start this topic: 52154 Rating 'A052154' at 1; 16 threads: ...... 56.91 CPU-seconds. Next recommendation: A098403 (score 0.203) Decimal expansion of area of Mandelbrot set.    Command loop (type 'help' for help) Topic 'mandelbrot' > show Current sequence is A098403; its score is 0.203 %---- A098403    %S 1,5,0,6,5,9,1 %N Decimal expansion of area of Mandelbrot set. %D Nigel Lesmoir-Godon, Will Rood and Ralph Edney, 'Introducing Fractal Geometry,' Icon Books UK & Totem Books USA, 2000, page 97. %H Kerry Mitchell, <a href="http://www.fractalus.com/kerry/articles/area/mandelbrot-area.html">A Statistical Investigation of the Area of the Mandelbrot Set</a> %H Robert P. Munafo, <a href="http://www.mrob.com/pub/muency/areaofthemandelbrotset.html">Area of the Mandelbrot Set</a>. %H Eric W. Weisstein, <a href="http://mathworld.wolfram.com/MandelbrotSet.html">Link to a section of The World of Mathematics.</a>. %H Eric W. Weisstein, <a href="http://icl.pku.edu.cn/yujs/MathWorld/math/m/m049.htm">Mandelbrot Set</a>. %H Eric W. Weisstein, <a href="http://www.ics.uci.edu/~eppstein/junkyard/mand-area.html">Mandelbrot Area</a>. %e Area = 1.50659177 +/- 0.00000008. %Y Adjacent sequences: A098400 A098401 A098402 this_sequence A098404 A098405 A098406 %Y Sequence in context: A130415 A035550 A096287 this_sequence A069206 A091685 A062824 %K cons,more,nonn %O 1,2 %A Robert G. Wilson v (rgwv(AT)rgwv.com), Sep 06 2004    Topic 'mandelbrot' > y Rating 'A098403' at 1; 16 threads: ...... 52.62 CPU-seconds. Next recommendation: A054671 (score 0.398) Denominators of (reduced) coefficients of Laurent series for conformal mapping from exterior of unit disk onto exterior of Mandelbrot set. Topic 'mandelbrot' > y Rating 'A054671' at 1; 16 threads: ....... 56.96 CPU-seconds. Next recommendation: A054670 (score 0.751) Numerators of (reduced) coefficients of Laurent series for conformal mapping from exterior of unit disk onto exterior of Mandelbrot set. Topic 'mandelbrot' > y Rating 'A054670' at 1; 16 threads: ....... 58.57 CPU-seconds. Next recommendation: A055544 (score 0.781) Total number of nodes in all rooted trees with n nodes. Topic 'mandelbrot' > n Rating 'A055544' at -1; 16 threads: ...... 50.87 CPU-seconds. Next recommendation: A090037 (score 0.547) Number of tetramagic (4-multimagic) series for squares of order n. Topic 'mandelbrot' > n Rating 'A090037' at -1; 16 threads: ...... 50.52 CPU-seconds. Next recommendation: A102525 (score 0.415) Decimal expansion of log(2)/log(3). Topic 'mandelbrot' > n Rating 'A102525' at -1; 16 threads: ...... 57.35 CPU-seconds. Next recommendation: A006874 (score 0.281) Mu-atoms of period n on continent of Mandelbrot set. Topic 'mandelbrot' > y Rating 'A006874' at 1; 16 threads: ....... 60.26 CPU-seconds. Next recommendation: A006875 (score 0.593) Non-seed mu-atoms of period n in Mandelbrot set. Topic 'mandelbrot' > y Rating 'A006875' at 1; 16 threads: ...... 49.89 CPU-seconds. Next recommendation: A006876 (score 0.932) Mu-molecules in Mandelbrot set whose seeds have period n. Topic 'mandelbrot' > show Current sequence is A006876; its score is 0.932 %---- A006876 %I M2883 %S 1,0,1,3,11,20,57,108,240,472,1013,1959,4083,8052,16315,32496, %T 65519,130464,262125,523209,1048353,2095084,4194281,8384100,16777120 %N Mu-molecules in Mandelbrot set whose seeds have period n. %D B. B. Mandelbrot, The Fractal Geometry of Nature, Freeman, NY, 1982, p. 183. %D R. Penrose, The Emperor's New Mind, Penguin Books, NY, 1991, p. 138. %H R. P. Munafo, <a href="http://www.mrob.com/pub/muency.html">Mu-Ency - The Encyclopedia of the Mandelbrot Set</a> %Y Cf. A006874, A006875. %Y Adjacent sequences: A006873 A006874 A006875 this_sequence A006877 A006878 A006879 %Y Sequence in context: A075226 A028978 A082628 this_sequence A031239 A088619 A031318 %K nonn %O 1,4 %A mrob(AT)mrob.com (Robert P Munafo)    Topic 'mandelbrot' > ratings Rated sequences in topic 'mandelbrot': Uprated: A006874 A054670 A052154 A054671 A098403 A006875 Down-rated: A102525 A055544 A090037 Topic 'mandelbrot' > q Saving topic 'mandelbrot'...done

Performance Benchmarks

Every time a sequence is rated, the concordance metric must be performed between it and all other sequences in the database. The processing time needed (measured in "CPU-seconds", number of logical CPUs times elapsed CPU time) is proportional to the size of the entire database times the log (base 2) of the size of the sequence being rated. The number of keyword comparisons per second is roughly the product of the database size and the article size. Using a 2007 version of the OEIS containing 131553 sequences (110 megabytes), a 2.33-GHz Core 2 Duo laptop takes about 20-25 seconds to rate a sequence, and an 8-core, two-socket Xeon "Nehalem" workstation takes less than 4 seconds. The current algorithm is reasonably efficient (see the source) but could still be optimized further.

For significantly larger databases one would need to abandon the concordance metric for something a little less accurate. A number of algorithms have been developed for similar applications, using Markov chains or similar functions. Spam filters typically model character sequences as predictors for subsequent characters; and the Google PageRank system performs a similar function using webpage links as predictors for the relevance of linked pages. In Sloandora, the character-sequence analysis could be used as part of a more efficient concordance metric. The PageRank approach could be applied to existing links in the OEIS database (including when one sequence links explicitly to another, and when two sequences both link to something else like a journal article).

Future Improvements

In addition to performance improvements just discussed, I imagine doing some of the following soon:

It is clear from my results so far that specific types of text (like formulas, sequence numbers, journal titles, email addresses and names of people) should be identified and tagged in different ways. The current implementation uses a generic concordance metric, which is completely "agnostic" to the purpose of the different OEIS data fields. Field-tagging is done in other text concordance applications (for example see [1] e.g. "Candidate Answer Pruning" on page 7).

A lot of this can be done fairly easily, for example, by giving certain fields (like %Y and %E lines) less importance and others (like %S and %T) more. This could be done at the user's recommendation, or by default weights that the system starts with for any newly created topic. The low-level implementation of this is fairly easy within the current concordance search algorithm.

It is relatively easy to "undo" a Relevant or Irrelevant rating, so I'll probably implement that next.

There should be options to display more than one recommendation at a time, and give the user a choice of which one to look at and/or rate. (Only a tiny additional amount of computation would be required.)

When new sequences are added to the OEIS database, the topic needs to be updated. If the updated sequence was not part of the topic definition (i.e. was never given a rating) then all we need to do is update that sequence's score (the sum of its concordance against all Y-rated items minus its concordance against all N-rated items). If the updated sequence is one of the defining sequences for the topic, then the process takes longer: First "undo" the sequence's rating; then "re-rate" the sequence using the new text.

With greater effort, the search algorithm could be improved by the use of a content-addressable index. There is no obvious way to retain the exact metric described above, but the content-addressable index could be used to model the OEIS database as a set of webpages that are linked if they share substrings, and algorithms designed for personalized webpage ranking (e.g. [6]) could provide the auto-recommendation.


Current Source Code

Sloandora runs on UNIX, Linux, or MacOS X systems with Perl and a C compiler.

The C source for the concordance metric is here.

The controlling Perl script is here.

You will also need the eisBTfry00000.txt, eisBTfry00001.txt, etc. files and I can't help you with that, but you can ask the seqfan list (or perhaps email N.J.A. Sloane directly if he already knows you).

A README file explaining how to put the pieces together is here

If you need help setting this up, contact me by email ("email me" link at the bottom of this page).


References

[1] J.H. Kil, L. Lloyd, and S. Skiena, Question Answering with Lydia, TREC-2005, Proceedings of the 2005 Text REtrieval Conference, http://trec.nist.gov/pubs/trec14/t14_proceedings.html

[2] M. Bautin and S. Skiena, Concordance-Based Entity-Oriented Search, Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (2007) 586-592.

[3] Pandora® Internet Radio, (interactive music system), http://www.pandora.com/, 2007-2010.

[4] Rob Walker, The song decoders, New York Times Magazine, 2009 Oct 14. http://www.nytimes.com/2009/10/18/magazine/18Pandora-t.html

[5] The OEIS Foundation, Inc. (http://www.oeisf.org/) has stewardship of The On-Line Encyclopedia of Integer Sequences (OEIS). The legacy (or "Classic") server is at http://oeis.org/classic, and the Wiki is at http://www.oeis.org/wiki.

[6] Christian Borgs et al., A Sublinear Time Algorithm for PageRank Computations, 2012. At arxiv:1202.2771


The On-Line Encyclopedia of Integer Sequences ("OEIS") is intellectual property of The OEIS Foundation Inc.

Pandora is a registered trademark of Pandora Media, Inc.


Robert Munafo's home pages on AWS    © 1996-2024 Robert P. Munafo.    about    contact
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Details here.

This page was written in the "embarrassingly readable" markup language RHTF, and was last updated on 2014 Dec 13. s.27