Mingmin on Hacks

My blog on some hacking thoughts.

Ruby Lazy Infinite Stream in the SICP Way

After reviewing the beautiful Streams in SICP again recently, while believing Ruby can be an acceptable Lisp, I wonder whether Ruby can express infinite streams in the same way. There are several existing implementations already, but they either have performance problem due to the lack of result caching or don’t have the expressiveness I want, so I decided to implement my own: lazy_stream. You can install it by:

1
$ gem install lazy_stream

Here I am going to explain the implementation and walk through some examples from SICP.

Streams Are Delayed Lists

A stream is just a delayed list. We do lazy evaluation in Ruby with code blocks:

Lazy Stream Definition
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
class LazyStream
  def initialize(first=nil, &proc)
    @first = first
    @proc = block_given? ? proc : lambda { LazyStream.new }
  end

  attr_reader :first

  def rest
    @rest ||= @proc.call
  end

  def empty?
    first.nil?
  end
end

module Kernel
  def lazy_stream(first=nil, &rest)
    LazyStream.new(first, &rest)
  end
end

s = lazy_stream(1) { lazy_stream(2) { lazy_stream(3) } }
s.first      #=> 1
s.rest       #=> A LazyStream object for the rest of the list [2, 3]
lazy_stream  #=> An empty LazyStream object

The rest of a stream is a closure @proc which returns another stream. We get it by calling the closure and cache it meanwhile. Caching is like the memo-proc function in SICP that caches the result of the closure, which is important to greatly reduce computational complexity as we can see later.

A stream is empty when there is no first. We return an empty stream when there is no code block given for the rest. The rest of an empty stream is also another empty stream.

We can easily implement the methods at, drop, each, map, reduce, select, take, to_a just like the methods of Array as you can see in my full implementation. Note that drop, map, select, take are lazy too and return a LazyStream object that satisfies the conditions.

Infinite Streams

With those simple structures, we can already construct a lot of cool infinite streams explicitly, like these:

Defining Streams Explicitly
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
require 'lazy_stream'

def integers_starting_from(n)
  lazy_stream(n) { integers_starting_from(n + 1) }
end

integers_starting_from(1).take(10).to_a  #=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def fibgen(a, b)
  lazy_stream(a) { fibgen(b, a + b) }
end

fibgen(0, 1).take(10).to_a  #=> [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

def sieve(stream)
  lazy_stream(stream.first) do
    sieve(stream.rest.select { |x| x % stream.first > 0 })
  end
end

def primes
  sieve(integers_starting_from(2))
end

primes.take(10).to_a  #=> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

The streams are all infinite. We define them with recursive functions which will only get evaluated when we actually need them. We can generate elements from the stream as many as we want with take and then loop them all with each or to_a.

The way we construct primes is called sieve of Eratosthenes. The short code here is barely the definition of the sieve of Eratosthenes, which shows the power and beauty of definitive programming.

Streams above were defined by specifying generating procedures that explicitly compute the stream elements one by one. An alternative way to specify streams is to take advantage of delayed evaluation to define streams implicitly. We need the help of the class methods LazyStream.add which adds multiple streams into one:

Defining Streams Implicitly
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
ones = lazy_stream(1) { ones }
ones.take(10).to_a  #=> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

integers = lazy_stream(1) { LazyStream.add(ones, integers) }
integers.take(10).to_a  #=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# fibs is a stream beginning with 0 and 1, such that the rest of the stream can
# be generated by adding fibs to itself shifted by one place:
#     1 1 2 3 5  8 13 21 ... = fibs.rest
#     0 1 1 2 3  5  8 13 ... = fibs
# 0 1 1 2 3 5 8 13 21 34 ... = fibs
fibs = lazy_stream(0) { lazy_stream(1) { LazyStream.add(fibs.rest, fibs) } }
fibs.take(10).to_a  #=> [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# The reason this definition works is that, at any point, enough of the primes
# stream has been generated to test the primality of the numbers we need to
# check next.
@primes = lazy_stream(2) { integers_starting_from(3).select(&method(:prime?)) }

def prime?(n)
  iter = -> ps do
    if ps.first**2 > n
      true
    elsif n % ps.first == 0
      false
    else
      iter.call(ps.rest)
    end
  end
  iter.call(@primes)
end

@primes.take(10).to_a  #=> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

To get n Fibonacci numbers from fibs, the time complexity is just O(n) because of the result caching of rest (we don’t need to calculate two fibs to get the final fibs because we made use of the cache that already got calculated).

Exploiting the Stream Paradigm

We can exploit the stream paradigm further like SICP did. We can define several streams to give approximations to π, with Euler’s transform and the tableau sequence accelerator:

Approximating π
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
def pi_summands(n)
  lazy_stream(1.0 / n) { pi_summands(n + 2).map(&:-@) }
end

def pi_stream
  pi_summands(1).partial_sums.scale(4)
end

pi_stream.take(8).print
# 4.0
# 2.666666666666667
# 3.466666666666667
# 2.8952380952380956
# 3.3396825396825403
# 2.9760461760461765
# 3.2837384837384844
# 3.017071817071818

def euler_transform(s)
  lazy_stream(s[2] - ((s[2] - s[1])**2 / (s[0] - 2 * s[1] + s[2]))) do
    euler_transform(s.rest)
  end
end

euler_transform(pi_stream).take(8).print
# 3.166666666666667
# 3.1333333333333337
# 3.1452380952380956
# 3.13968253968254
# 3.1427128427128435
# 3.1408813408813416
# 3.142071817071818
# 3.1412548236077655

def make_tableau(s, &transform)
  lazy_stream(s) { make_tableau(transform.call(s), &transform) }
end

def accelerated_sequence(s, &transform)
  make_tableau(s, &transform).map(&:first)
end

accelerated_sequence(pi_stream, &method(:euler_transform)).take(8).to_a
# 4.0
# 3.166666666666667
# 3.142105263157895
# 3.141599357319005
# 3.1415927140337785
# 3.1415926539752927
# 3.1415926535911765
# 3.141592653589778

The last super-acceleration is amazing. Taking 8 terms of the sequence yields the correct value of π to 14 digits. The stream formulation is particularly elegant and convenient.

Streams and Delayed Evaluation

Stream models of systems with loops may require uses of delay evaluation beyond the delay evaluation supplied by the code block of lazy_stream. For example, this figure shows a signal-processing system for solving the differential equation dy/dt=f(y) where f is a given function.

With the help of the promise gem, we can setup a delayed argument for the feedback signal in our stream definition:

Solving the differential equation dy/dt=f(y)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
require 'promise'

# integrand is a delayed argument that promises to give result after a loop
def integral(integrand, initial, dt)
  int = lazy_stream(initial) { LazyStream.add(integrand.scale(dt), int) }
end

# Setup the delayed integrand with promise
def solve(f, y0, dt)
  y = integral(promise { y.map(&f) }, y0, dt)
end

# Approximating e ~= 2.718 by computing the value at y = 1 of the solution to
# the differential equation dy/dt = y with initial condition y(0) = 1:
solve(lambda { |y| y }, 1, 0.001).at(1000)  #=> 2.716923932235896

Tail Call Optimization

Last but not the least, since the functions are all recursive here, to make them actually work for large data set, we need to enable the tail call optimization in Ruby, otherwise you will hit the stack level too deep error soon.

All the example code here can be found in my Ruby playground.