Higher-Order Go: July 2013

Wednesday, July 31, 2013

2.1: Configuration File Handling

Chapter 2 is all about breaking long if-else chains into maps of values and functions. This will allow us to separate the branching logic from the algorithm, enabling us to easily modify and extend the branching logic and even completely replace it with a different one.

In this first example we create a process that will read through a configuration file that has the form:

DIRECTIVE PARAMETERS

The algorithm runs through the file and executes the function for the DIRECTIVE passing to it the parameters defined in the file.

To start we define our types, the table that holds the branching logic and the functions that will represent the individual branches. In this case the function accepts a slice of strings and a reference to the branching logic.

type (
 DispatchFunc  func([]string, DispatchTable)
 DispatchTable map[string]DispatchFunc
)

The onReadConfig function expects a filename for its argument. It opens the file, reads each line of text, breaks it into tokens, looks up the function to execute using the dispatch table and the first token and then executes that function passing the rest of the line as parameter. It is the core algorithm of this program, but interestingly, it is fully reentrant and has the same signature as other functions in the dispatch table. It can therefore execute itself through the dispatch table in a round-about recursive way.

func onReadConfig(args []string, dispatch DispatchTable) {
 file, err := os.Open(args[0])
 if err != nil {
  panic(err.Error())
 }
 defer file.Close()
 r := bufio.NewReader(file)
 finished := false
 for !finished {
  line, err := r.ReadString('\n')
  if err == io.EOF {
   finished = true
  } else if err != nil {
   panic(err)
  }
  fields := strings.Fields(line)
  if len(fields) > 0 {
   if f, ok := dispatch[fields[0]]; ok {
    f(fields[1:], dispatch)
   }
  }
 }
}

onDefine is a meta function that defines a directive in terms of another existing directive. What it allows us to do is to define a name that executes an directive with default parameters. For example, if directive CD is defined and it expects a directory as a parameter, we can write DEFINE HOME CD /home/ in the configuration file, therefore creating a new directive HOME that simply executes directive CD with the parameters /home/. In HOP, MJD uses DEFINE to define a directive and actual Perl code in the configuration file that is then dynamically evaluated. With our statically compiled code we will have to do with a less powerful version.

func onDefine(args []string, dispatch DispatchTable) {
 var ok bool
 if _, ok = dispatch[args[0]]; ok {
  fmt.Println("Error in DEFINE: action %q already defined\n", args[0])
  return
 }
 var curaction DispatchFunc
 if curaction, ok = dispatch[args[1]]; !ok {
  fmt.Println("Error in DEFINE: curaction %q not defined\n", args[1])
  return
 }
 dispatch[args[0]] = func(args2 []string, dispatch DispatchTable) {
  curaction(args[2:], dispatch)
 }
}

The main function creates the dispatch table with four directives. CONFIG and DEFINE point to our previous functions while PRINT and CD are two very self-explanatory functions. Examples of configuration files can been seen here and here.

func main() {
 if len(os.Args) != 2 {
  fmt.Printf("Usage %s CONFIG\n", os.Args[0])
  os.Exit(0)
 }

 dispatch := DispatchTable{
  "CONFIG": onReadConfig,
  "DEFINE": onDefine,
  "PRINT": func(args []string, dispatch DispatchTable) {
   fmt.Println(strings.Join(args, " "))
  },
  "CD": func(args []string, dispatch DispatchTable) {
   fmt.Printf("Change dir to: %q\n", args[0])
  },
 }

 onReadConfig(os.Args[1:], dispatch)
}

Get the source at GitHub.

Thursday, July 18, 2013

Interlude

So chapter one (Recursion and Callbacks) is done. Chapter two (Dispatch Tables) awaits. It took more time than I expected, mostly because of problem 1.7 which I got stuck on for some time. But nevertheless, 40 pages out of 563 are done, only 8 chapters to go. Will this be finished by year's end?

1.8: Partitioning

The last problem in chapter 1 is intended to warn against the follies of reckless recursion. It works well for a small problem space but quickly blows up once we increase it. The problem is to divide a treasure (e.g. consisting of items valued at $5, $2, $4, $8 and $1) into shares.

findshare accepts a target and a list of valuables and returns a solution. If the target is 0 returns a list with zero elements and if target is less than zero or the treasure has no elements it returns a nil value. It then splits the treasure list into a first element and the tail and attempts to find a solution with that first element. If that fails, it tries again but now without the first element. Repeating this for every element in the list, we've exhausted the problem space and have found a solution if it exists at all.

func findshare(target int, treasures []int) []int {
 if target == 0 {
  return make([]int, 0)
 }
 if target < 0 || len(treasures) == 0 {
  return nil
 }
 first, rest := treasures[0:1], treasures[1:]
 solution := findshare(target-first[0], rest)
 if solution != nil {
  return append(first, solution...)
 }
 return findshare(target, rest)
}

The main function simply creates a treasure list, calculates it total value, asks for a two way split, and prints the result.

func main() {
 treasure := []int{5, 2, 4, 8, 1}
 total := 0
 for _, v := range treasure {
  total += v
 }
 share := findshare(total/2, treasure)
 fmt.Print(share)
}

Get the source at GitHub.

1.7: Html

This section took more time than I expected simply because I was hit by the double whammy of not knowing Perl well enough and not knowing Go well enough at the same time.

What I failed to appreciate in the previous section (1.5: Applications and Variations of Directory Walking) was that Perls Push method would, if it received an array to push onto a array, flatten out the second array so I would end up with a single array rather than an array of arrays. This would blow up the second use (promoting elements, see below) since I would end up with a tree of slices rather than a single slice of strings.

My second problem was that once I figured out the Push problem, I failed to figure out the proper syntax for appending slices to slices in Go. In the end Effective Go lead me to the correct solution, to append "..." to the second slice argument.

We start by declaring some types and functions for the generic walking function. These are analogues to the previous dirwalk function.

type (
 ResultType  interface{}
 TextFunc    func(n *html.Node) ResultType
 ElementFunc func(n *html.Node, results []ResultType) ResultType
)

htmlwalk is our html walker. It will accept a html.Node and call the TextFunc on text elements and ElementFunc on other elements. It will recursively call itself and check if the result is a single ResultType element or a slice of ResultType elements and append them to the final result list.

func htmlwalk(n *html.Node, textf TextFunc, elementf ElementFunc) ResultType {
 if n.Type == html.TextNode {
  return textf(n)
 }
 results := make([]ResultType, 0)
 child := n.FirstChild
 for {
  if child == nil {
   break
  }
  result := htmlwalk(child, textf, elementf)
  if multiresults, ok := result.([]ResultType); ok {
   results = append(results, multiresults...)
  } else {
   results = append(results, result)
  }
  child = child.NextSibling
 }
 return elementf(n, results)
}

We apply this function to two use cases. The first is to simply walk through the html file and return all the text stripped of tags.

Simple and very permissive typecasting function. If it fails it simply returns an empty string.

func stringValue(v interface{}) string {
 if s, ok := v.(string); ok {
  return s
 }
 return ""
}

This is called on text elments and it simply returns the text from the html node.

func untagText(n *html.Node) ResultType {
 return n.Data
}

The element function. Run through the result list and concatenate all the strings.

func untagElement(n *html.Node, results []ResultType) ResultType {
 var b bytes.Buffer
 for _, v := range results {
  s := stringValue(v)
  _, err := b.WriteString(s)
  if err != nil {
   panic(err)
  }
 }
 return b.String()
}

The second use case for the html walker is to construct code that will allow us to dig into an html document and print out only the text contained in the tag we specify. In this example we ask for the h1 tag.

First we define our tag structure (which htmlwalker will pass around as ResultType). Strings that we want to be promoted will be tagged with Keep, other strings will be tagged with Maybe.

type TagType int16

const (
 Maybe TagType = iota
 Keep
)

type tag struct {
 Type  TagType
 Value string
}

Extract the value from ResultType. It can be both either a *tag structure or a slice of ResultType.

func tagValue(v ResultType) (*tag, []ResultType) {
 if t, ok := v.(*tag); ok {
  return t, nil
 } else if t, ok := v.([]ResultType); ok {
  return nil, t
 }
 panic(fmt.Sprintf("Unknown value %v", v))
}

Any text node is a Maybe

func promoteText(n *html.Node) ResultType {
 return &tag{Maybe, n.Data}
}

This functions constructs the ElementFunc function. If the tagname is found we concatenate all the strings in the result list and promote the result to Keep. If not, we simply return the result list that htmlwalk will the flatten into the final result list.

func promoteElementIf(tagname string) func(n *html.Node, results []ResultType) ResultType {
 return func(n *html.Node, results []ResultType) ResultType {
  if n.Data == tagname {
   var b bytes.Buffer
   for _, v := range results {
    t, _ := tagValue(v)
    _, err := b.WriteString(t.Value)
    if err != nil {
     panic(err)
    }
   }
   return &tag{Keep, b.String()}
  }
  return results
 }
}

The htmlwalk function will return a list of ResultType elements. We run through it, find all elements that are tagged with Keep and concatenate them into one string that we then return.

func extractPromoted(n *html.Node) string {
 _, results := tagValue(htmlwalk(n, promoteText, promoteElementIf("h1")))

 if results != nil {
  var b bytes.Buffer
  for _, r := range results {
   tag, _ := tagValue(r)
   if tag.Type == Keep {
    _, err := b.WriteString(tag.Value + " ")
    if err != nil {
     panic(err)
    }
   }
  }
  return b.String()
 }
 return ""
}

The final main function simply tests both of these use cases. You can call it either on a local html file or supply an url using the -u flag.

var url = flag.Bool("u", false, "Use -u for url")

func main() {
 flag.Parse()

 if len(os.Args) < 2 {
  fmt.Printf("Usage %s [-u] NAME\n", os.Args[0])
  os.Exit(0)
 }

 var in io.Reader
 if *url {
  resp, err := http.Get(os.Args[2])
  if err != nil {
   panic(err)
  }
  defer resp.Body.Close()
  in = resp.Body
 } else {
  fi, err := os.Open(os.Args[1])
  if err != nil {
   panic(err)
  }
  defer fi.Close()
  in = bufio.NewReader(fi)
 }

 n, err := html.Parse(in)
 if err != nil {
  panic(err)
 }

 fmt.Println(stringValue(htmlwalk(n, untagText, untagElement)))
 fmt.Println(extractPromoted(n))
}

Get the source at GitHub.

Wednesday, July 17, 2013

1.5: Applications and Variations of Directory Walking

The dirwalk is an generic function (yes, Go lacks generics, we will get to that) that walks a directory tree and executes a supplied file function or a supplied directory function for each entry.

To start there are some types that need to be declared. ComputeType is our generic type and FileFunc and DirFunc are the two user supplied functions.

type (
 ComputeType interface{}
 FileFunc    func(f *os.File) ComputeType
 DirFunc     func(d *os.File, results []ComputeType) ComputeType
)

Next comes a utility function that is only needed so that the user can call dirwalk without either a FileFunc or a DirFunc if those are not needed. It simply returns a empty value.

func empty() ComputeType {
 var e ComputeType
 return e
}

dirwalk opens a file entry and checks if it is a regular file or a directory. If it is a regular file it calls the FileFunc on it and returns the result. If it is a directory it loops through all entries of that directory calling itself recursively on each entry, collecting the results in a slice. It then calls the DirFunc on the result collection and returns the result of the DirFunc.

func dirwalk(name string, f FileFunc, d DirFunc) ComputeType {
 file, err := os.Open(name)
 if err != nil {
  panic(err.Error())
 }
 defer file.Close()

 stat, err := file.Stat()
 if err != nil {
  panic(err.Error())
 }
 if stat.IsDir() {
  files, err := file.Readdirnames(0)
  if err != nil {
   panic(err.Error())
  }
  name = name + string(os.PathSeparator)
  results := make([]ComputeType, 0)
  for _, subfile := range files {
   results = append(results, dirwalk(name+subfile, f, d))
  }
  if d == nil {
   return empty()
  }
  return d(file, results)
 }
 if f == nil {
  return empty()
 }
 return f(file)
}

Now the lack of proper generics starts to complicate things. So far dirwalk has only used the interface{} type but the point of all this is to compute actual values. Typecasting is needed. To reduce the code the following utility function is used. It tries to typecast and panics on failure. Another way to write this function is to accept a default value and return that on failure.

func int64value(val interface{}) int64 {
 if v, ok := val.(int64); ok {
  return v
 }
 panic(fmt.Sprintf("Unexpected type: %v", val))
}

The FileFunc used here simply returns the size of the file, which is an int64 value.

func filefunc(f *os.File) ComputeType {
 stat, err := f.Stat()
 if err != nil {
  panic(err.Error())
 }
 return ComputeType(stat.Size())
}

DirFunc goes through the results slice, extracting the int64 value for each entry and returning the total.

func dirfunc(d *os.File, results []ComputeType) ComputeType {
 var total int64
 for _, v := range results {
  total += int64value(v)
 }
 return total
}

The main function simply calls dirwalk with the size calculation functions and then extracts the int64 value.

func main() {
 if len(os.Args) != 2 {
  fmt.Printf("Usage %s NAME\n", os.Args[0])
  os.Exit(0)
 }
 fmt.Println(int64value(dirwalk(os.Args[1], filefunc, dirfunc)))
}

Get the source at GitHub.

Tuesday, July 16, 2013

1.3: The Towers of Hanoi

The Towers of Hanoi is an old puzzle and quite the popular example for recursion in many languages. The legend is that the world will end once you move 64 disks.

This is type of the callback that performs the Hanoi move. An interface{} could be used here but since no state is being maintained a function type is a better choice.

type (
 MoveFunc func(n int, start, end string)
)

The Hanoi function. Moves n pegs from start to end using 3 pegs.

func hanoi(n int, pegStart, pegEnd, pegExtra string, mover MoveFunc) {
 if n == 1 {
  mover(1, pegStart, pegEnd)
 } else {
  hanoi(n-1, pegStart, pegExtra, pegEnd, mover)
  mover(n, pegStart, pegEnd)
  hanoi(n-1, pegExtra, pegEnd, pegStart, mover)
 }
}

A basic moving function. Simply prints out the actual move (e.g. Move disk 2 from "A" to "C").

func move(n int, start, end string) {
 fmt.Printf("Move disk %d from %q to %q.\n", n, start, end)
}

Kicks of the process for 4 disks.

func main() {
 hanoi(4, "A", "B", "C", move)
}

This function constructs a MoveFunc function that wraps around an actual move function. The idea here is to check if certain invariants hold before each move.

func checkmove(n int, peg string, move MoveFunc) MoveFunc {
 position := make([]string, n+1)
 for i := 1; i < len(position); i++ {
  position[i] = peg
 }
 return func(n int, start, end string) {
  if n < 1 || n > len(position)-1 {
   panic(fmt.Sprintf("Bad disk number %d, should be 1..%d", n, len(position)-1))
  }
  if position[n] != start {
   panic(fmt.Sprintf("Tried to move disk %d from %q, but it is on peg %q", n, start, position[n]))
  }
  for i := 1; i < n-1; i++ {
   if position[i] == start {
    panic(fmt.Sprintf("Can't move disk %n from %q because %n is on top of it", n, start, i))
   } else if position[i] == end {
    panic(fmt.Sprintf("Can't move disk %n to %q becasue %n is already there", n, end, i))
   }
  }
  move(n, start, end)
  position[n] = end
 }
}

Again, kicking of the process for 4 disks, now using the wrapper to construct the invariant function around the mover.

func main() {
 hanoi(4, "A", "B", "C", checkmove(4, "A", move))
}

Get the source at GitHub

What is this?

Hi.
My name is Magnús Örn Gylfason and I will be posting some Go code on this blog.

I'm doing this both as a way to learn Go and also as a bit of a homage to one of my favourite programming books, Higher-Order Perl (by Mark Jason Dominus, published by Morgan Kaufmann Publishers, Copyright 2005 by Elsevier Inc.) I highly recommend it even if you don't write Perl code (which I don't).

The code will also be available at https://github.com/mg/hog.

My native tongue is Icelandic so I will be heavily depending on Google Chrome's spelling checker to get me through this, please feel free to point out any spelling mistakes (or other kinds of mistakes) I make.

You can contact me both on Google+ and Twitter.