Sloandora: A metric and UI for Integer Sequence exploration
This page describes a software project I developed in the fall of 2009 with the help of suggestions from the seqfan mailing list.
Contents
Introduction
The overall goal is to provide a software tool for computerassisted discovery of integer sequences from OEIS [5] that are related to a user's area of research. It is understood that the user can look at a sequence and tell the computer whether or not it is "interesting" (relevant to his area of research) or not. It is also understood that the computer can determine (through the use of automatic techniques possibly supplemented by input from other users or database "curators") which sequences are "likely" to be interesting given the user's indicated preferences so far.
An analogy for this type of service already exists in:
 The Pandora [3] music discovery service, which helps listeners find music they might like based on cognitive attributes
 Lydia and related systems for news analysis [2]
Overview of Design
The overall structure of my implementation is the following:
 A database of items (sequences in our case)
 A metric algorithm for measuring the cognitive distance between any two items
 A user interface for:
1. creating a topic (corresponding to a particular area of interest, or a field of research)
2. autorecommending items to the user
3. Userrating items as "relevant", "not relevant", or no comment (ambivalent)
The implementation given here (source code below) uses a perl commandline interface and a C compiled tool that implements the metric. It uses a local copy of the OEIS database in the form of eisBTfry00000.txt files (which are no longer available except through special request).
Comparison to Other Systems
There are a number of other tools for sequence exploration, which fulfill different objectives:
 The OEIS "webcam" available here allows users to see recent additions.
 The Superseeker for "mystery" sequences that are not currently in OEIS.
 Simon Plouffe's GFUN package (also primarily for mystery sequences).
 The seqfan list to ask questions to other interested people.
None of these were an adequate solution for my objectives.
The Sloandora Metric
The "metric" is the algorithm for determining how closely related two sequences are. It is central to any assisted browsing system.
The Pandora music discovery service [3] uses a database (the Music Genome Project) that was built up with the aid of real people, who listened to music and created a "genome" for each song. Each "genome" is essentially a set of cognitive attribute tags drawn from hundreds (or possibly thousands) of different musical attributes. Using humans is necessary partly because music has to be listened to (and somehow interpreted) to discover its cognitive content, and computers aren't too good at that yet.[4]
In our case the job is made easier by the fact that the OEIS entries are essentially just plain text, and the text itself is about the right size to use as the genome.
So I have chosen a characterlevel concordance metric as the initial algorithm for measuring similarity of sequences. My specific implementation works as follows:
 Go through the first source text, compiling a list of all 5byte substrings that appear and their frequencies. Go through the second source text and compile the same statistics for it. Thus for each string we have a count for A and another count for B.
 For each substring we then redistribute the counts into an "AB count". AB equals the lesser of A and B, and A and B are then both decreased by this amount. For example, if "facto" appears three times in source text A and two times in source text B, then the initial scan produces A{"facto"}=3 and B{"facto"}=2; this is then adjusted to produce A{"facto"}=1, AB{"facto"}=2 and B{"facto"}=0.
 Then we compute the coverage of A over B, which measures how much source A "covers" source B. This is simply the total of all AB counts divided by the sum of AB's and B's. So for example, if source B contains 1004 bytes (and thus 1000 5character strings), and if 700 of these 5character strings occur in A, then A's coverage of B is 0.7 (or 70%). B's coverage of A is computed similarly.
 The concordance between A and B is the lesser of the two coverage values.
Topic State Variables
The user's browsing is centered around "topics", which represent distinct areas of interest.
Each topic has a datafile that contains its current state. This state consists of:
 A list of every sequence in the OEIS, with a score telling how well that sequence fits this topic
 A list of every sequence that has been recommended to the user so far
 A flag for each recommendation telling whether it was given a positive or negative rating (or neither).
User Interface
The user begins by creating a "new topic" representing an area of research. To create a topic the user must select a single relevant sequence. This is the "seed sequence".
The program computes the concordance match between the seed sequence and all other sequences, and stores this information in a file representing the current state of that topic.
Then the program selects the sequence that has the highest concordance with the seed sequence, and presents it to the user. (In MacOS, I can open the sequence's web page into a broswer).
The user can then perform the following actions:
"" or Next: give me another automatic recommendation
Y or Relevant: this sequence belongs to this topic, use it to improve your recommendations in the future.
N or Irrelevant: this sequence does not belong to this topic, avoid sequences like it in the future.
A123456: Display a sequence on request, so I can decide if it is relevant and rate it accordingly.
When the user selects Next, another sequence is recommended and no other actions are taken.
When the user selects Relevant or Irrelevant, the software performs the concordance metric between the new sequence and all other sequences in the database. The resulting concordance score is either added to or subtracted from the current scores for each sequence, and the topic state is updated.
The action of requesting a specific sequence does not affect the topic state, unless the user subsequently rates it with the Relevant or Irrelevant command.
Results
Initial testing of the metric has been performed. I used the metric described above with a 5character token size to find the closest sequence in the OEIS to a given seed sequence:

Example Sessions
Here is an example session in which the user wants to study Egyptian fractions:
bash$ ./Sloandora _ _ (_`  _ ._ ._ _ _ ._ ._ , )  ( ) ,   (  ( )  , ~ ~ ~ `~` ~ ~ ~` ~ ~ `~` UserAided Integer Sequence AutoDiscovery System by Robert Munafo, with the seqfan list members Loading database index... Select a topic: egyptian There is no topic 'egyptian'. Do you want to create one? yes Enter the Anumber of a sequence to start this topic: A100140 Rating 'A100140' at 1; 2 threads: ...................................................... 95.27 CPUseconds. Next recommendation: A117116 (score 0.179) Denominators of an Egyptian Fraction for phi = (1+sqrt(5))/2. Command loop (type 'help' for help) Topic 'egyptian' > relevant Rating 'A117116' at 1; 2 threads: ........................................... 77.28 CPUseconds. Next recommendation: A006525 (score 0.420) Denominator of Egyptian fraction for e2. Topic 'egyptian' > show Current sequence is A006525; its score is 0.420 % A006525 %I M1553 %S 2,5,55,9999,3620211523,25838201785967533906, %T 3408847366605453091140558218322023440765 %N Denominator of Egyptian fraction for e2. %H <a href="http://www.research.att.com/~njas/sequences/Sindx_Ed.html#Egypt">Index entries for sequences related to Egyptian fractions</a> %H Eric Weisstein's World of Mathematics, <a href="http://mathworld.wolfram.com/EgyptianFraction.html">Egyptian Fraction</a> %Y Adjacent sequences: A006522 A006523 A006524 this_sequence A006526 A006527 A006528 %Y Sequence in context: A114029 A013171 A073422 this_sequence A042161 A101151 A030244 %K nonn %O 0,1 %A njas %E More terms from H. P. Robinson. Topic 'egyptian' > y Rating 'A006525' at 1; 2 threads: ...................................... 68.64 CPUseconds. Next recommendation: A006524 (score 0.793) Egyptian fraction for 1/ Pi. Topic 'egyptian' > A01234 Loading A001234 (current score: 0.341) Stirling numbers of first kind. Topic 'egyptian' > ? Sloandora commands: Show Display the entire record for the current sequence Web Display the current sequence in your web browser Y, Relevant The currently displayed sequence matches this topic N, Irrelevant The currently displayed sequence does NOT match Next, Skip Show me another sequence you think is relevant A012345 Show me sequence A012345 Best Reselect the latest recommendation Ratings List all sequences that have been rated in this topic Save Save the state of the current topic. Quit, Exit, Bye Exit Sloandora (also does "save") Help, ? Displays this help Topic 'egyptian' > best Current recommendation: A006524 (score 0.793) Egyptian fraction for 1/ Pi. Topic 'egyptian' > y Rating 'A006524' at 1; 2 threads: .......................................... 70.63 CPUseconds. Next recommendation: A006487 (score 1.143) Egyptian fraction for square root of 2. Topic 'egyptian' > q Saving topic 'egyptian'...done
Here is another example session in which a few sequences had to be rated Irrelevant. This session was run on a system with 8 hardware threads; the user is interested in anything related to the Mandelbrot set:
bash$ ./Sloandora _ _ (_`  _ ._ ._ _ _ ._ ._ , )  ( ) ,   (  ( )  , ~ ~ ~ `~` ~ ~ ~` ~ ~ `~` UserAided Integer Sequence AutoDiscovery System by Robert Munafo, with the seqfan list members Found 0 parsed eis/DBIS files. Repartitioning eisBTfry files into 8 pieces... 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 Loading database index... Select a topic: mandel There is no topic 'mandel'. Do you want to create one? yes Enter the Anumber of a sequence to start this topic: 52154 Rating 'A052154' at 1; 8 threads: ......... 66.46 CPUseconds. Performed 3.08e+11 key comparisons in approx. 9.138 sec. with 8 threads. Next recommendation: A054671 (score 0.203) Denominators of (reduced) coefficients of Laurent series for conformal mapping from exterior of unit disk onto exterior of Mandelbrot set. Command loop (type 'help' for help) Topic 'mandel' > show Current sequence is A054671; its score is 0.203 % A054671 %d 20130208.030727 %S 2,8,4,128,1,1024,16,32768,1,262144,32,4194304,1,33554432,512, %T 2147483648,1,17179869184,2048,274877906944,64,2199023255552,2048, %U 70368744177664,1,562949953421312,131072,9007199254740992,256 %N Denominators of (reduced) coefficients of Laurent series for conformal mapping from exterior of unit disk onto exterior of Mandelbrot set. %C Sum converges very slowly: 10^118 terms to get first two digits, 10^1181 for three digits. %D John H. Ewing and G. Schober, "The area of the Mandelbrot set", Numer. Math. vol. 61, pp. 5972, 1992. %H Adam Majewski, <a href="http://republika.pl/fraktal/maxima.html">Maxima code for this sequence</a> %H Robert P. Munafo, <a href="http://www.mrob.com/pub/muency/laurentseries.html">Laurent Series</a> %H Eric Weisstein's World of Mathematics, <a href="http://mathworld.wolfram.com/MandelbrotSet.html">Mandelbrot Set</a> %p Munafo site gives Maple code. %Y Cf. A054670. %K frac,nonn %O 0,1 %A _Robert Munafo_, Apr 18 2000 %E Extended by _Eric W. Weisstein_, Nov 27, 2005 %E Definition corrected by Adam Majewski (adammaj1(AT)o2.pl), Nov 17 2006 Sequence data length: 1137 bytes. Topic 'mandel' > auto web Auto #web# is now on Topic 'mandel' > relevant Rating 'A054671' at 1; 8 threads: ........ 63.02 CPUseconds. Performed 2.28e+11 key comparisons in approx. 8.665 sec. with 8 threads. Next recommendation: A054670 (score 0.587) Numerators of (reduced) coefficients of Laurent series for conformal mapping from exterior of unit disk onto exterior of Mandelbrot set. http://oeis.org/A054670 Topic 'mandel' > y Rating 'A054670' at 1; 8 threads: ........ 65.52 CPUseconds. Performed 2.72e+11 key comparisons in approx. 9.009 sec. with 8 threads. Next recommendation: A098403 (score 0.584) Decimal expansion of area of Mandelbrot set. http://oeis.org/A098403 Topic 'mandel' > ok Unknown command (Type 'help' for help) Topic 'mandel' > help Sloandora commands: Show Display the entire record for the current sequence Web Display the current sequence in your web browser Auto show Automatically do the "Show" command for each new recommendation Auto web Automatically do the "Web" command for each new recommendation Y, Relevant The currently displayed sequence matches this topic N, Irrelevant The currently displayed sequence does NOT match Unrate Remove any rating previously applied to the current sequence Next, Skip Show me another sequence you think is relevant A012345 Show me sequence A012345 Best Reselect the latest recommendation Ratings List all sequences that have been rated in this topic Save Save the state of the current topic. Quit, Exit, Bye Exit Sloandora (also does "save") Help, ? Displays this help Topic 'mandel' > Y Rating 'A098403' at 1; 8 threads: ....... 55.42 CPUseconds. Performed 2.14e+11 key comparisons in approx. 7.620 sec. with 8 threads. Next recommendation: A056783 (score 0.673) Number of diamond polyominoes with n cells. http://oeis.org/A056783 Topic 'mandel' > n Rating 'A056783' at 1; 8 threads: ....... 56.30 CPUseconds. Performed 1.75e+11 key comparisons in approx. 7.741 sec. with 8 threads. Next recommendation: A102525 (score 0.542) Decimal expansion of log(2)/log(3). http://oeis.org/A102525 Topic 'mandel' > n Rating 'A102525' at 1; 8 threads: ........ 62.83 CPUseconds. Performed 2.26e+11 key comparisons in approx. 8.639 sec. with 8 threads. Next recommendation: A019473 (score 0.369) Number of stable ncelled patterns ("still lifes") in Conway's game of Life. http://oeis.org/A019473 Topic 'mandel' > n Rating 'A019473' at 1; 8 threads: ........ 61.78 CPUseconds. Performed 2.41e+11 key comparisons in approx. 8.495 sec. with 8 threads. Next recommendation: A006875 (score 0.233) Nonseed muatoms of period n in Mandelbrot set. http://oeis.org/A006875 Topic 'mandel' > y Rating 'A006875' at 1; 8 threads: ......... 71.15 CPUseconds. Performed 3.42e+11 key comparisons in approx. 9.783 sec. with 8 threads. Next recommendation: A006876 (score 0.537) Mumolecules in Mandelbrot set whose seeds have period n. http://oeis.org/A006876 Topic 'mandel' > y Rating 'A006876' at 1; 8 threads: .......... 68.79 CPUseconds. Performed 2.9e+11 key comparisons in approx. 9.459 sec. with 8 threads. Next recommendation: A006874 (score 0.765) Muatoms of period n on continent of Mandelbrot set. http://oeis.org/A006874 Topic 'mandel' > y Rating 'A006874' at 1; 8 threads: ......... 70.97 CPUseconds. Performed 3.2e+11 key comparisons in approx. 9.758 sec. with 8 threads. Next recommendation: A007436 (score 0.653) Moebius transform of Fibonacci numbers. http://oeis.org/A007436 Topic 'mandel' > y Rating 'A007436' at 1; 8 threads: ........ 65.60 CPUseconds. Performed 2.44e+11 key comparisons in approx. 9.020 sec. with 8 threads. Next recommendation: A007435 (score 0.865) Inverse Moebius transform of Fibonacci numbers 1,1,2,3,5,8,... http://oeis.org/A007435 Topic 'mandel' > A007436 Loading A007436 (current score: 1.152; rating: 1) Moebius transform of Fibonacci numbers. http://oeis.org/A007436 Topic 'mandel' > unrate Unrating 'A007436' back to 0; 8 threads: ........ 65.55 CPUseconds. Performed 2.44e+11 key comparisons in approx. 9.013 sec. with 8 threads. Next recommendation: A000199 (score 0.650) Coefficient of q^(2n1) in the series expansion of Ramanujan's mock theta function f(q). http://oeis.org/A000199 Topic 'mandel' > irrelevant Rating 'A000199' at 1; 8 threads: ........ 66.61 CPUseconds. Performed 2.59e+11 key comparisons in approx. 9.159 sec. with 8 threads. Next recommendation: A094358 (score 0.465) Squarefree products of factors of Fermat numbers (A023394). http://oeis.org/A094358 Topic 'mandel' > n Rating 'A094358' at 1; 8 threads: ......... 67.67 CPUseconds. Performed 2.85e+11 key comparisons in approx. 9.305 sec. with 8 threads. Next recommendation: A007557 (score 0.358) Shifts left when inverse Moebius transform applied twice. http://oeis.org/A007557 Topic 'mandel' > n Rating 'A007557' at 1; 8 threads: ......... 66.88 CPUseconds. Performed 2.67e+11 key comparisons in approx. 9.196 sec. with 8 threads. Next recommendation: A097486 (score 0.234) A relationship between Pi and the Mandelbrot set. a(n) = number of iterations of z^2 + c that cvalues .75 + x*i go through before escaping, where x = 10^(n). lim n>inf a(n) * x = Pi. http://oeis.org/A097486 Topic 'mandel' > y Rating 'A097486' at 1; 8 threads: ........... 81.75 CPUseconds. Performed 5.69e+11 key comparisons in approx. 11.241 sec. with 8 threads. Next recommendation: A100140 (score 0.308) Largest denominator of greedy Egyptian fraction sum for M/N. http://oeis.org/A100140 Topic 'mandel' > n Rating 'A100140' at 1; 8 threads: ......... 68.04 CPUseconds. Performed 3e+11 key comparisons in approx. 9.356 sec. with 8 threads. Next recommendation: A005600 (score 0.213) Current estimate of decimal expansion of reciprocal of finestructure constant alpha. http://oeis.org/A005600 Topic 'mandel' > n Rating 'A005600' at 1; 8 threads: ......... 70.38 CPUseconds. Performed 3.61e+11 key comparisons in approx. 9.677 sec. with 8 threads. Next recommendation: A137560 (score 0.141) Triangle read by rows: coefficients of the MandelbrotJulia quadratic polynomials (sometimes called cycle polynomials or Pc polynomials): Pc[0]=1; Pc[1]=c; Pc[n]=nestedfunction(Pc[z^2+c],n]. http://oeis.org/A137560 Topic 'mandel' > y Rating 'A137560' at 1; 8 threads: ............. 93.98 CPUseconds. Performed 7.42e+11 key comparisons in approx. 12.922 sec. with 8 threads. Next recommendation: A137867 (score 0.273) Triangular sequence of coefficients of the Misiurewicz polynomial which are made from the Pc Mandelbrot Julia polynomials A137560 as: Pc(x,n)Pc(x,m); n<>m. http://oeis.org/A137867 Topic 'mandel' > y Rating 'A137867' at 1; 8 threads: .......... 68.95 CPUseconds. Performed 6.57e+11 key comparisons in approx. 9.481 sec. with 8 threads. Next recommendation: A136705 (score 0.378) Triangle read by rows where the nth row gives the coefficients of the characteristic polynomial for a Fibonaccitype matrix with a=1 and b=1. http://oeis.org/A136705 Topic 'mandel' > n Rating 'A136705' at 1; 8 threads: ........ 64.16 CPUseconds. Performed 3.95e+11 key comparisons in approx. 8.822 sec. with 8 threads. Next recommendation: A137452 (score 0.243) A triangular sequence from the coefficients an Abel polynomial given as an example in Roman:a0=1; a(x,n)=x*(xa0**n)^(n1). http://oeis.org/A137452 Topic 'mandel' > n Rating 'A137452' at 1; 8 threads: ........... 84.42 CPUseconds. Performed 5.94e+11 key comparisons in approx. 11.608 sec. with 8 threads. Next recommendation: A000740 (score 0.133) Number of 2nbead balanced binary necklaces of fundamental period 2n, equivalent to reversed complement; also Dirichlet convolution of b_n=2^(n1) with mu(n); also number of components of Mandelbrot set corresponding to Julia sets with an attractive ncycle. http://oeis.org/A000740 Topic 'mandel' > yes Rating 'A000740' at 1; 8 threads: ............ 91.54 CPUseconds. Performed 6.85e+11 key comparisons in approx. 12.587 sec. with 8 threads. Next recommendation: A003095 (score 0.283) a(n) = a(n1)^2 + 1. http://oeis.org/A003095 Topic 'mandel' > n Rating 'A003095' at 1; 8 threads: ............ 89.73 CPUseconds. Performed 6.16e+11 key comparisons in approx. 12.338 sec. with 8 threads. Next recommendation: A076078 (score 0.126) a(n) = number of nonempty sets of distinct positive integers that have a least common multiple of n. http://oeis.org/A076078 Topic 'mandel' > n Rating 'A076078' at 1; 8 threads: .......... 80.94 CPUseconds. Performed 5.23e+11 key comparisons in approx. 11.129 sec. with 8 threads. Next recommendation: A138061 (score 0.065) This sequence is a triangular sequence formed by the substitution: ( French sideways graph) 1>1,2;2>3;3>4;4>1; as a Markov style substitution form. The result is the differential polynomial coefficient form. ( first zero omitted). http://oeis.org/A138061 Topic 'mandel' > n Rating 'A138061' at 1; 8 threads: ......... 72.92 CPUseconds. Performed 5.21e+11 key comparisons in approx. 10.027 sec. with 8 threads. Next recommendation: A051793 (score 0.022) a(n)=sum((1)^i*a(i),i=n4..n1), a(1)=1,a(2)=1,a(3)=1,a(4)=1. http://oeis.org/A051793 Topic 'mandel' > n Rating 'A051793' at 1; 8 threads: ....... 57.43 CPUseconds. Performed 1.95e+11 key comparisons in approx. 7.897 sec. with 8 threads. Next recommendation: A018897 (score 0.009) Weight distribution of [512,382,16] 9thorder ReedMuller (or RM) code. http://oeis.org/A018897 Topic 'mandel' > n Rating 'A018897' at 1; 8 threads: ............................ 214.14 CPUseconds. Performed 4.32e+12 key comparisons in approx. 29.444 sec. with 8 threads. Next recommendation: A101201 (score 0.017) Maximal number of kings in the toroidal king's graph on an n X n board such that each king is attacking no more than four other kings. http://oeis.org/A101201 Topic 'mandel' > No Rating 'A101201' at 1; 8 threads: ........... 85.25 CPUseconds. Performed 1.72e+12 key comparisons in approx. 11.722 sec. with 8 threads. Next recommendation: A208510 (score 0.022) Triangle of coefficients of polynomials u(n,x) jointly generated with A029653; see the Formula section. http://oeis.org/A208510 Topic 'mandel' > save Saving topic 'mandel'...done Topic 'mandel' > q done bash$
Here is an older Mandelbrot session, run on a system with 16 hardware threads:
bash$ ./Sloandora _ _ (_`  _ ._ ._ _ _ ._ ._ , )  ( ) ,   (  ( )  , ~ ~ ~ `~` ~ ~ ~` ~ ~ `~` UserAided Integer Sequence AutoDiscovery System by Robert Munafo, with the seqfan list members Loading database index... Select a topic: mandelbrot There is no topic 'mandelbrot'. Do you want to create one? yes Enter the Anumber of a sequence to start this topic: 52154 Rating 'A052154' at 1; 16 threads: ...... 56.91 CPUseconds. Next recommendation: A098403 (score 0.203) Decimal expansion of area of Mandelbrot set. Command loop (type 'help' for help) Topic 'mandelbrot' > show Current sequence is A098403; its score is 0.203 % A098403 %S 1,5,0,6,5,9,1 %N Decimal expansion of area of Mandelbrot set. %D Nigel LesmoirGodon, Will Rood and Ralph Edney, 'Introducing Fractal Geometry,' Icon Books UK & Totem Books USA, 2000, page 97. %H Kerry Mitchell, <a href="http://www.fractalus.com/kerry/articles/area/mandelbrotarea.html">A Statistical Investigation of the Area of the Mandelbrot Set</a> %H Robert P. Munafo, <a href="http://www.mrob.com/pub/muency/areaofthemandelbrotset.html">Area of the Mandelbrot Set</a>. %H Eric W. Weisstein, <a href="http://mathworld.wolfram.com/MandelbrotSet.html">Link to a section of The World of Mathematics.</a>. %H Eric W. Weisstein, <a href="http://icl.pku.edu.cn/yujs/MathWorld/math/m/m049.htm">Mandelbrot Set</a>. %H Eric W. Weisstein, <a href="http://www.ics.uci.edu/~eppstein/junkyard/mandarea.html">Mandelbrot Area</a>. %e Area = 1.50659177 +/ 0.00000008. %Y Adjacent sequences: A098400 A098401 A098402 this_sequence A098404 A098405 A098406 %Y Sequence in context: A130415 A035550 A096287 this_sequence A069206 A091685 A062824 %K cons,more,nonn %O 1,2 %A Robert G. Wilson v (rgwv(AT)rgwv.com), Sep 06 2004 Topic 'mandelbrot' > y Rating 'A098403' at 1; 16 threads: ...... 52.62 CPUseconds. Next recommendation: A054671 (score 0.398) Denominators of (reduced) coefficients of Laurent series for conformal mapping from exterior of unit disk onto exterior of Mandelbrot set. Topic 'mandelbrot' > y Rating 'A054671' at 1; 16 threads: ....... 56.96 CPUseconds. Next recommendation: A054670 (score 0.751) Numerators of (reduced) coefficients of Laurent series for conformal mapping from exterior of unit disk onto exterior of Mandelbrot set. Topic 'mandelbrot' > y Rating 'A054670' at 1; 16 threads: ....... 58.57 CPUseconds. Next recommendation: A055544 (score 0.781) Total number of nodes in all rooted trees with n nodes. Topic 'mandelbrot' > n Rating 'A055544' at 1; 16 threads: ...... 50.87 CPUseconds. Next recommendation: A090037 (score 0.547) Number of tetramagic (4multimagic) series for squares of order n. Topic 'mandelbrot' > n Rating 'A090037' at 1; 16 threads: ...... 50.52 CPUseconds. Next recommendation: A102525 (score 0.415) Decimal expansion of log(2)/log(3). Topic 'mandelbrot' > n Rating 'A102525' at 1; 16 threads: ...... 57.35 CPUseconds. Next recommendation: A006874 (score 0.281) Muatoms of period n on continent of Mandelbrot set. Topic 'mandelbrot' > y Rating 'A006874' at 1; 16 threads: ....... 60.26 CPUseconds. Next recommendation: A006875 (score 0.593) Nonseed muatoms of period n in Mandelbrot set. Topic 'mandelbrot' > y Rating 'A006875' at 1; 16 threads: ...... 49.89 CPUseconds. Next recommendation: A006876 (score 0.932) Mumolecules in Mandelbrot set whose seeds have period n. Topic 'mandelbrot' > show Current sequence is A006876; its score is 0.932 % A006876 %I M2883 %S 1,0,1,3,11,20,57,108,240,472,1013,1959,4083,8052,16315,32496, %T 65519,130464,262125,523209,1048353,2095084,4194281,8384100,16777120 %N Mumolecules in Mandelbrot set whose seeds have period n. %D B. B. Mandelbrot, The Fractal Geometry of Nature, Freeman, NY, 1982, p. 183. %D R. Penrose, The Emperor's New Mind, Penguin Books, NY, 1991, p. 138. %H R. P. Munafo, <a href="http://www.mrob.com/pub/muency.html">MuEncy  The Encyclopedia of the Mandelbrot Set</a> %Y Cf. A006874, A006875. %Y Adjacent sequences: A006873 A006874 A006875 this_sequence A006877 A006878 A006879 %Y Sequence in context: A075226 A028978 A082628 this_sequence A031239 A088619 A031318 %K nonn %O 1,4 %A mrob(AT)mrob.com (Robert P Munafo) Topic 'mandelbrot' > ratings Rated sequences in topic 'mandelbrot': Uprated: A006874 A054670 A052154 A054671 A098403 A006875 Downrated: A102525 A055544 A090037 Topic 'mandelbrot' > q Saving topic 'mandelbrot'...done
Performance Benchmarks
Every time a sequence is rated, the concordance metric must be performed between it and all other sequences in the database. The processing time needed (measured in "CPUseconds", number of logical CPUs times elapsed CPU time) is proportional to the size of the entire database times the log (base 2) of the size of the sequence being rated. The number of keyword comparisons per second is roughly the product of the database size and the article size. Using a 2007 version of the OEIS containing 131553 sequences (110 megabytes), a 2.33GHz Core 2 Duo laptop takes about 2025 seconds to rate a sequence, and an 8core, twosocket Xeon "Nehalem" workstation takes less than 4 seconds. The current algorithm is reasonably efficient (see the source) but could still be optimized further.
For significantly larger databases one would need to abandon the concordance metric for something a little less accurate. A number of algorithms have been developed for similar applications, using Markov chains or similar functions. Spam filters typically model character sequences as predictors for subsequent characters; and the Google PageRank system performs a similar function using webpage links as predictors for the relevance of linked pages. In Sloandora, the charactersequence analysis could be used as part of a more efficient concordance metric. The PageRank approach could be applied to existing links in the OEIS database (including when one sequence links explicitly to another, and when two sequences both link to something else like a journal article).
Future Improvements
In addition to performance improvements just discussed, I imagine doing some of the following soon:
It is clear from my results so far that specific types of text (like formulas, sequence numbers, journal titles, email addresses and names of people) should be identified and tagged in different ways. The current implementation uses a generic concordance metric, which is completely "agnostic" to the purpose of the different OEIS data fields. Fieldtagging is done in other text concordance applications (for example see [1] e.g. "Candidate Answer Pruning" on page 7).
A lot of this can be done fairly easily, for example, by giving certain fields (like %Y and %E lines) less importance and others (like %S and %T) more. This could be done at the user's recommendation, or by default weights that the system starts with for any newly created topic. The lowlevel implementation of this is fairly easy within the current concordance search algorithm.
It is relatively easy to "undo" a Relevant or Irrelevant rating, so I'll probably implement that next.
There should be options to display more than one recommendation at a time, and give the user a choice of which one to look at and/or rate. (Only a tiny additional amount of computation would be required.)
When new sequences are added to the OEIS database, the topic needs to be updated. If the updated sequence was not part of the topic definition (i.e. was never given a rating) then all we need to do is update that sequence's score (the sum of its concordance against all Yrated items minus its concordance against all Nrated items). If the updated sequence is one of the defining sequences for the topic, then the process takes longer: First "undo" the sequence's rating; then "rerate" the sequence using the new text.
With greater effort, the search algorithm could be improved by the use of a contentaddressable index. There is no obvious way to retain the exact metric described above, but the contentaddressable index could be used to model the OEIS database as a set of webpages that are linked if they share substrings, and algorithms designed for personalized webpage ranking (e.g. [6]) could provide the autorecommendation.
Current Source Code
Sloandora runs on UNIX, Linux, or MacOS X systems with Perl and a C compiler.
The C source for the concordance metric is here.
The controlling Perl script is here.
You will also need the eisBTfry00000.txt, eisBTfry00001.txt, etc. files and I can't help you with that, but you can ask the seqfan list (or perhaps email N.J.A. Sloane directly if he already knows you).
A README file explaining how to put the pieces together is here
If you need help setting this up, contact me by email ("email me" link at the bottom of this page).
[1] J.H. Kil, L. Lloyd, and S. Skiena, Question Answering with Lydia, TREC2005, Proceedings of the 2005 Text REtrieval Conference, http://trec.nist.gov/pubs/trec14/t14_proceedings.html
[3] Pandora® Internet Radio, (interactive music system), http://www.pandora.com/, 20072010.
[4] Rob Walker, The song decoders, New York Times Magazine, 2009 Oct 14. http://www.nytimes.com/2009/10/18/magazine/18Pandorat.html
[5] The OEIS Foundation, Inc. (http://www.oeisf.org/) has stewardship of The OnLine Encyclopedia of Integer Sequences (OEIS). The legacy (or "Classic") server is at http://oeis.org/classic, and the Wiki is at http://www.oeis.org/wiki.
[6] Christian Borgs et al., A Sublinear Time Algorithm for PageRank Computations, 2012. At arxiv:1202.2771
The OnLine Encyclopedia of Integer Sequences ("OEIS") is intellectual property of The OEIS Foundation Inc.
Pandora is a registered trademark of Pandora Media, Inc.
s.11