Strings

A common data type in programming languages, we will here tackle strings.

In this chapter, you will write a program that, given a string, returns the longest sub-string duplicated in the given string. An example will perhaps be more telling :

"banana" => "ana"
"abcd" => ""

Let's go !

Write the test first

We can see it as a tradition now, we will start with the simplest case.

;; strings-test.scm

(use-modules (srfi srfi-64)
             (strings))
             
(test-begin "harness-strings")

(test-equal "empty string has no duplicated substring"
  ""
  (longest-duplicated-substring-in ""))

(test-end "harness-strings")

Try and run the test

$ guile -L . strings-test.scm

The compilation fails (we treat this feedback in the same way as a failed test).

Write the minimal amount of code for the test to run and check the failing test output

Here is the code that removes the compilation error (we create a file in which we define the strings module):

;; strings.scm

(define-module (strings))

Re-execute the test.

$ guile -L . strings-test.scm
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling /home/jeko/Workspace/ghh-exercises/strings/strings-test.scm
;;; compiling ./strings.scm
;;; compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings.scm.go
;;; strings-test.scm:8:3: warning: possibly unbound variable `longest-duplicated-substring-in`
;;; compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings-test.scm.go
%%%% Starting test harness-strings  (Writing full log to "harness-strings.log")
strings-test.scm:6: FAIL empty string has no duplicated substring
# of unexpected failures  1

It compiles. So the previously added code has helped us to progress. However, we can still see a warning in the compilation traces :

;;; strings-test.scm:8:3: warning: possibly unbound variable `longest-duplicated-substring-in`

Let's add the minimum code to lift the warning (and I'll force the failure, so I define a procedure that does not return ""):

;; strings.scm

(define-module (strings))

(define-public (longest-duplicated-substring-in a-string)
  -1)

If you run the test again, you can see that there are no more warnings or compilation errors.

In the test report harness-strings.log, we can see that the test failed because we expected the value "" but our freshly defined procedure returns -1.

Write enough code to make it pass

;; strings.scm

(define-module (strings))

(define-public (longest-duplicated-substring-in a-string)
  "")

New value returned, the test passes !

Refactor

In the test harness, we will eliminate a duplication : the name of the test suite.

;; strings-test.scm

(use-modules (srfi srfi-64)
             (strings))

(define test-suite-name "harness-strings")

(test-begin test-suite-name)

(test-equal "empty string has no duplicated substring"
  ""
  (longest-duplicated-substring-in ""))

(test-end test-suite-name)

Write the test first

So we have a first test with a string that does not contain a duplicated sub-string. For this second test, we will start with a string which contains a duplicated sub-string, the simplest possible (only one duplicated character).

;; strings-test.scm

(use-modules (srfi srfi-64)
             (strings))

(define test-suite-name "harness-strings")

(test-begin test-suite-name)

(test-equal "empty string has no duplicated substring"
  ""
  (longest-duplicated-substring-in ""))

(test-equal "single char duplicated substring"
  "a"
  (longest-duplicated-substring-in "aa"))

(test-end test-suite-name)

It is expected that the first test will pass and the second test will fail.

Try and run the test

$ guile -L . strings-test.scm

Write the minimal amount of code for the test to run and check the failing test output

Let's look in the test report to see that this second test fails for good reason :

Test end:
  result-kind: fail
  actual-value: ""
  expected-value: "a"

So far so good! Our procedure can only return "" while our test now expects "a".

Write enough code to make it pass

;; strings.scm

(define-module (strings))

(define-public (longest-duplicated-substring-in a-string)
  (if (string-null? a-string)
      ""
      "a"))

You can run the tests again to confirm that all tests now pass!

$ guile -L . strings-test.scm
;;; note: source file ./strings.scm
;;;       newer than compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings.scm.go
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling ./strings.scm
;;; compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings.scm.go
%%%% Starting test harness-strings  (Writing full log to "harness-strings.log")
# of expected passes      2

Refactor

Nothing obvious here.

Write the test first

This second test specifies the behavior of the procedure when it is passed a string containing the same character twice. But it only tests one case. With this third test, we will generalize. In TDD jargon, this technique is called triangulation.

;; strings-test.scm

(use-modules (srfi srfi-64)
             (strings))

(define test-suite-name "harness-strings")

(test-begin test-suite-name)

(test-equal "empty string has no duplicated substring"
  ""
  (longest-duplicated-substring-in ""))

(test-equal "only twice a char has a duplicated substring"
  "a"
  (longest-duplicated-substring-in "aa"))

(test-equal "only twice a char has a duplicated substring other case"
  "z"
  (longest-duplicated-substring-in "zz"))

(test-end test-suite-name)

Try and run the test

$ guile -L . strings-test.scm                                
;;; note: source file /home/jeko/Workspace/ghh-exercises/strings/strings-test.scm
;;;       newer than compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings-test.scm.go
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling /home/jeko/Workspace/ghh-exercises/strings/strings-test.scm
;;; compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings-test.scm.go
%%%% Starting test harness-strings  (Writing full log to "harness-strings.log")
strings-test.scm:16: FAIL twice a char has a duplicated substring other case
# of expected passes      2
# of unexpected failures  1

As expected, our third test fails.

Write the minimal amount of code for the test to run and check the failing test output

Since there is no compile-time error, you can go and check that the test fails because longest-duplicated-substring-in is expected to return "z" when it only returns "a".

Test begin:
  test-name: "twice a char has a duplicated substring other case"
  source-file: "strings-test.scm"
  source-line: 16
  source-form: (test-equal "twice a char has a duplicated substring other case" "z" (longest-duplicated-substring-in "zz"))
Test end:
  result-kind: fail
  actual-value: "a"
  expected-value: "z"

Write enough code to make it pass

Instead of returning the same value "a" if the string is not empty, let's modify the code so that it returns a new string containing the first character of the input string.

;; strings.scm

(define-module (strings))

(define-public (longest-duplicated-substring-in a-string)
  (if (string-null? a-string)
      ""
      (substring a-string 0 1)))

Everything should be green now.

Refactor

The line (substring a-string 0 1) could make its intention a little more explicit. Here is my proposal:

;; strings.scm

(define-module (strings))

(define-public (longest-duplicated-substring-in a-string)
  (if (string-null? a-string)
      ""
      (string-first-char a-string)))

(define (string-first-char a-string)
  (substring a-string 0 1))

Don't forget to replay the tests to make sure I haven't introduced any regression!

Write the test first

For the last test of this chapter, we will manage the case of a string which contains two different characters. Why this test? Because it is the case which seems to me to be the simplest that our procedure does not manage yet. Each new test must be red!

;; strings-test.scm

(use-modules (srfi srfi-64)
             (strings))

(define test-suite-name "harness-strings")

(test-begin test-suite-name)

(test-equal "empty string has no duplicated substring"
  ""
  (longest-duplicated-substring-in ""))

(test-equal "twice a char as a string has a duplicated substring"
  "a"
  (longest-duplicated-substring-in "aa"))

(test-equal "only twice a char as a string has a duplicated substring other case"
  "z"
  (longest-duplicated-substring-in "zz"))

(test-equal "only two different chars as a string has not duplicated substring"
  ""
  (longest-duplicated-substring-in "bc"))

(test-end test-suite-name)

Try and run the test

$ guile -L . strings-test.scm 
;;; note: source file /home/jeko/Workspace/ghh-exercises/strings/strings-test.scm
;;;       newer than compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings-test.scm.go
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling /home/jeko/Workspace/ghh-exercises/strings/strings-test.scm
;;; compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings-test.scm.go
%%%% Starting test harness-strings  (Writing full log to "harness-strings.log")
strings-test.scm:22: FAIL only two different chars as a string has not duplicated substring
# of expected passes      3
# of unexpected failures  1

Without surprise, our new test does not pass! And that's good!

Write the minimal amount of code for the test to run and check the failing test output

No compilation problem, so we can directly look at the reason for the failure :

Test begin:
  test-name: "only two different chars as a string has not duplicated substring"
  source-file: "strings-test.scm"
  source-line: 22
  source-form: (test-equal "only two different chars as a string has not duplicated substring" "" (longest-duplicated-substring-in "bc"))
Test end:
  result-kind: fail
  actual-value: "b"
  expected-value: ""

While our new test expects us to return an empty string, our procedure returns the first character of the input string.

Write just enough code to pass the test

;; strings.scm

(define-module (strings))

(define-public (longest-duplicated-substring-in a-string)
  (if (string-null? a-string)
      ""
      (if (string=? (substring a-string 0 1) (substring a-string 1 2))
          (string-first-char a-string)
          "")))

(define (string-first-char a-string)
  (substring a-string 0 1))

And here it goes !

$ guile -L . strings-test.scm 
;;; note: source file ./strings.scm
;;;       newer than compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings.scm.go
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling ./strings.scm
;;; compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings.scm.go
%%%% Starting test harness-strings  (Writing full log to "harness-strings.log")
# of expected passes      4

Refactor

Eliminate duplication and get rid of the nested if.

;; strings.scm

(define-module (strings))

(define-public (longest-duplicated-substring-in a-string)
  (if (and (not (string-null? a-string)) (twice-char? a-string))
      (string-first-char a-string)
      ""))

(define (twice-char? a-string)
  (string=? (string-first-char a-string) (string-second-char a-string)))

(define (string-first-char a-string)
  (substring a-string 0 1))

(define (string-second-char a-string)
  (substring a-string 1 2))

Four tests, all green !

$ guile -L . strings-test.scm 
;;; note: source file ./strings.scm
;;;       newer than compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings.scm.go
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling ./strings.scm
;;; compiled /home/jeko/.cache/guile/ccache/3.0-LE-8-4.5/home/jeko/Workspace/ghh-exercises/strings/strings.scm.go
%%%% Starting test harness-strings  (Writing full log to "harness-strings.log")
# of expected passes      4

The solution is not complete and I invite you to practice by going even further. You can even send me your tests and sources if you want me to give you feedback. I'll be happy to do so!