Solving Sudoku with Prolog: A Journey into Declarative Programming

Tue, Dec 1, 2015
2mo ago
12m read
Image: Pixabay, 2016 (CC0-1.0)

	Timeline:	Oct-Nov 2015
	Languages Used:	Prolog
	School:	AIT at Budapest University of Technology and Economics
	Course:	Semantic and Declarative Technologies

How I Learned to Stop Worrying and Love Logic Programming

During my Semantic and Declarative Technologies course at AIT-Budapest, I was introduced to Prolog, a language that’s dramatically different from imperative languages like Python or Java. Instead of specifying how to achieve a goal, you describe the goal itself and the logical rules that define it. This makes it a fascinating and powerful tool for problems that are all about constraints and logic, like Sudoku.

Prolog is built on the idea of unification and backtracking. You define a set of facts and rules, and Prolog’s engine searches for solutions that satisfy them. If it hits a dead end, it backtracks and tries another path. This is, conceptually, very similar to how a human solves Sudoku: you make a guess, see if it breaks any rules, and if it does, you backtrack and try something else.

This project was my first real dive into this different way of thinking about programming. I started with a standard Sudoku solver and then moved on to a more complex variant called “Skyscraper Sudoku” to really test my understanding.

The most fun challenges

1. The Core Logic: Defining Sudoku Rules

A Sudoku puzzle is a grid where every row, column, and 3x3 sub-grid must contain all the numbers from 1 to 9 without repetition. In Prolog, we can translate these rules directly into logical predicates.

The main predicate, sudoku/2, ties everything together. It takes an incomplete grid (Grid0) and produces a solved grid (Grid).

1sudoku([], []).
2sudoku([R|Rs], Grid):-
3	zeros_gone(R, SR),
4	append([SR], GridN, Grid),
5	sudoku(Rs, GridN),
6	complete(Grid).

This might look intimidating, so let’s break it down:

The first line, sudoku([], [])., is our base case: an empty grid is already solved.
The zeros_gone/2 predicate is a helper I wrote to replace the blank entries (represented as 0) with candidate numbers that can be tested later for consistency.
append/3 reconstructs the grid with the newly variable-filled row.
sudoku(Rs, GridN) recursively processes the rest of the rows.
Finally, complete(Grid) is called to check if the generated grid is a valid Sudoku solution.

The complete/1 predicate is where the rules of Sudoku are enforced:

 1complete(Grid):-
 2	fullyfilled(Grid),
 3	consistent(Grid).
 4
 5consistent(Grid):-
 6	rows_alldiff(Grid),
 7	transpose(Grid, GridT),
 8	rows_alldiff(GridT),
 9	findall(Subgrid, subgrid(Grid, Subgrid), L),
10	(	foreach(Sub, L)
11	do	grid_alldiff(Sub)
12	).

fullyfilled/1 simply ensures there are no empty spots left.
consistent/1 is the real powerhouse. It checks three conditions:
1. rows_alldiff(Grid): All numbers in each row must be unique.
2. transpose(Grid, GridT), rows_alldiff(GridT): It cleverly transposes the grid (rows become columns) and runs the same uniqueness check, thus validating the columns.
3. subgrid/2: It defines and then checks all the 3x3 sub-grids to ensure they also contain unique numbers.

This declarative approach is the beauty of Prolog. We don’t implement a step-by-step solving algorithm. Instead, we define what a correct solution looks like, and Prolog’s engine does the hard work of finding it.

2. Upping the Ante: Skyscraper Sudoku

To push myself further, I tackled Skyscraper Sudoku. This variant adds a new layer of constraints. Imagine the numbers in the grid represent the heights of buildings. The numbers outside the grid tell you how many buildings are visible from that vantage point in that row or column. A taller building hides any shorter ones behind it.

Skyscraper Sudoku

Grid numbers represent building heights. Numbers on the perimeter tell you how many buildings are visible from that vantage point.

Loading interactive scene...

This required a more advanced approach. My first solver was a “generate-and-test” model: it would generate a complete grid and then test if it was valid. For Skyscraper Sudoku, this is incredibly inefficient. The solution was to use Constraint Logic Programming over Finite Domains (CLP(FD)), a library that turns Prolog into a powerful constraint solver.

With CLP(FD), you define the domains for your variables (e.g., integers from 1 to 9) and the constraints they must satisfy before Prolog starts its search. This prunes the search space dramatically, making the problem feasible to solve.

The key predicate here is visnum/4, which defines the “visibility” constraint:

 1% visnum(V, Dir, RC, SS): The number of elements in row or column RC
 2% of grid SS that are visible from the direction Dir is V.
 3visnum(V, n, RC, SS):- % north => left visibility, RC is column number
 4	transpose(SS, SST),
 5	nth(RC, SST, Col),
 6	visnum_left(V, Col).
 7visnum(V, w, RC, SS):- % west => left visibility, RC is row number
 8	nth(RC, SS, Row),
 9	visnum_left(V, Row).
10...

The interesting part is visnum_left/2, which calculates the visibility from one direction.

1visnum_left(V, [A,B|As]):-
2	A #< B #<=> Visible,
3	V #= V0 + Visible,
4	Visible #=> C #= B,
5	#\ Visible #=> C #= A,
6	visnum_left(V0, [C|As]).

This uses CLP(FD)’s specialized operators (#<, #<=>, #=, #=>):

A #< B #<=> Visible: Visible becomes 1 (true) if A is less than B, and 0 (false) otherwise. This determines if building B is visible past A.
V #= V0 + Visible: The total visibility V is the sum of the visibility from the rest of the list (V0) plus the current Visible state.
The next two lines cleverly determine the new “tallest building so far” (C) for the recursive call.

Using CLP(FD) felt like a superpower. It shows the real potential of logic programming for solving combinatorial problems by defining constraints and letting the solver handle the optimization.

Final Thoughts

Working with Prolog was a mind-bending and rewarding experience. It forces a different kind of problem-solving, one focused on defining logical systems rather than detailing computational steps. It’s a powerful reminder that the tools we choose can fundamentally shape how we approach a problem. For puzzles, logic, and AI, Prolog remains a uniquely elegant tool.

The complete code for both the standard and Skyscraper Sudoku solvers is embedded in this post below. Feel free to play with it and explore the world of logic programming!

More Resources

Learn X In Y Minutes, Where X=Prolog

Source Code

sudoku.pl

  1% Soeren Walls - October 14th, 2015
  2
  3:- use_module(library(lists),[append/2,transpose/2]).
  4
  5% sudoku(+Grid0, ?Grid): Grid is a complete refinement of the Sudoku
  6% grid Grid0, where the "blank" entries of Grid0 are zeros.
  7sudoku([], []).
  8sudoku([R|Rs], Grid):-
  9	zeros_gone(R, SR),
 10	append([SR], GridN, Grid),
 11	sudoku(Rs, GridN),
 12	complete(Grid).
 13
 14% zeros_gone(+L0, -L): The list L is the list L0 with all zeros
 15% replaced by positive integers that are not in L0 but are =< the
 16% length of L0. All elements of L are different from one another.
 17zeros_gone(L0, L):-
 18	length(L0, S),
 19	zeros_gone(L0, L0, L, S).
 20
 21% zeros_gone(+L0, +LC, -L, +S): The list L is the list L0 with all
 22% zeros replaced by positive integers that are not in LC but are
 23% all =< S, where S is the length of LC. The list LC is the unchanged
 24% original version of list L0 at the time of the first call to this
 25% predicate. All elements of L are also different from one another.
 26zeros_gone([], _, [], _).
 27zeros_gone([A|T], LC, L, S):-
 28	(	\+ A = 0 -> L = [A|L1]
 29	;	between_butnotin(1, S, N, LC),
 30		L = [N|L1]
 31	),
 32	zeros_gone(T, LC, L1, S),
 33	alldiff_not0(L).
 34
 35% between_butnotin(N, M, X, L): X is an integer >= N and =< M, but
 36% X is not a member of L.
 37between_butnotin(N, M, X, L):-
 38	(	nonmember(N, L) ->
 39		(	N =< M, X = N
 40		;	N < M, N0 is N+1,
 41			between_butnotin(N0, M, X, L)
 42		)
 43	;	N < M, N0 is N+1,
 44		between_butnotin(N0, M, X, L)
 45	).
 46
 47% complete(+Grid): Grid is fully filled in and consistent.
 48complete(Grid):-
 49	fullyfilled(Grid),
 50	consistent(Grid).
 51
 52% fullyfilled(+Grid): All fields of Grid contain positive integers.
 53fullyfilled([]).
 54fullyfilled([R|Rs]):-
 55	allpositive(R),
 56	fullyfilled(Rs).
 57
 58% allpositive(+List): All elements in List are positive integers.
 59allpositive([]).
 60allpositive([A|T]):-
 61	integer(A),
 62	A > 0,
 63	allpositive(T).
 64
 65% consistent(+Grid): For all areas of the Sudoku grid Grid, it holds
 66% that all positive integers in the area are distinct.
 67consistent([]).
 68consistent(Grid):-
 69	rows_alldiff(Grid),
 70	transpose(Grid, GridT),
 71	rows_alldiff(GridT),
 72	findall(Subgrid, subgrid(Grid, Subgrid), L),
 73	(	foreach(Sub, L)
 74	do	grid_alldiff(Sub)
 75	).
 76
 77% subgrid(+Grid, ?Subgrid): Subgrid is a sub-grid
 78% of Grid with size K x K, where K is the square root of the
 79% size of Grid, at (row,col) coordinates (I,J).
 80subgrid(Grid, Subgrid):-
 81	length(Grid, M),
 82	K is integer(sqrt(M)),
 83	sublist(Grid, SubRows, I, K),
 84	I mod K =:= 0,
 85	transpose(SubRows, SubCols),
 86	sublist(SubCols, SubgridT, J, K),
 87	J mod K =:= 0,
 88	transpose(Subgrid, SubgridT).
 89
 90% grid_alldiff(+Grid): All the elements in the list of lists Grid
 91% are different from each other, disregarding the number 0.
 92grid_alldiff([]).
 93grid_alldiff(Grid):-
 94	flatten(Grid, L),
 95	alldiff_not0(L).
 96
 97% flatten(+Grid, -L): L is the "flattened" list consisting
 98% of all individual elements from a list of lists called Grid.
 99flatten([], []).
100flatten([R|Rs], L):-
101	append(R, L2, L),
102	flatten(Rs, L2).
103
104% rows_alldiff(+L): In a list L consisting of sublists, each
105% sublist has elements which are all different from each other,
106% disregarding the number 0.
107rows_alldiff([]).
108rows_alldiff([R|Rs]):-
109	alldiff_not0(R),
110	rows_alldiff(Rs).
111
112% alldiff_not0(+L): The elements of list L are all different from
113% each other, disregarding the number 0.
114alldiff_not0([]).
115alldiff_not0([D|Ds]) :-
116	(	\+ D = 0 -> nonmember(D, Ds) % BIP == \+ member(D, Ds)
117	;	true
118	),
119	alldiff_not0(Ds).
120
121% sublist(+Whole, ?Part, +Before, +Length): Part is a sublist of
122% Whole such that there are Before number of elements in Whole
123% before Part and the length of Part is Length.
124sublist(Whole, Part, Before, Length):-
125	append([BL,Part,_], Whole),
126	length(BL, Before),
127	length(Part, Length).

skysudoku.pl

  1% Soeren Walls - November 15th, 2015
  2
  3:- use_module(library(clpfd)).
  4:- use_module(library(lists),[append/2,transpose/2,reverse/2]).
  5
  6% :- type spuzzle ---> ss(size,list(clue))
  7% :- type size == int.
  8% :- type clue ---> g(num,row,col) ; v(vcount,dir,rowcol).
  9% :- type row == int.
 10% :- type col == int.
 11% :- type rowcol == int.
 12% :- type num == int.
 13% :- type vcount == int.
 14% :- type dir ---> n ; e ; s ; w.
 15% :- type ssol == list(list(num)).
 16% :- pred skysudoku(spuzzle::in, ssol::out).
 17
 18% skysudoku(SP, SS): SS is a solution of the Skyscraper Sudoku puzzle SP.
 19skysudoku(ss(0,_), [[]]).
 20skysudoku(ss(1,_), [[1]]):- !.
 21skysudoku(ss(K,Clues), SS):-
 22	M #= K*K,
 23	length(SS, M),
 24	length(SST, M),
 25	transpose(SS, SST),
 26	clues(Clues, SS),
 27	grid_alldiff(SS, SST, M),
 28	SGI #= M-K,
 29	subgrids_alldiff(SS, K, SGI, SGI, M), % K=2 => (SS,2,2,2,4), K=3 => (SS,3,6,6,9)
 30	append(SS, SSF),
 31	labeling([], SSF).
 32
 33% clues(Clues, SS): SS is a Skyscraper Sudoku puzzle for which the
 34% properties in the list Clues hold true.
 35clues([], _).
 36clues([g(N,R,C)|Clues], SS):-
 37	given(N, R, C, SS),
 38	clues(Clues, SS).
 39clues([v(V,Dir,RC)|Clues], SS):-
 40	visnum(V, Dir, RC, SS),
 41	clues(Clues, SS).
 42
 43% given(+N, +R, +C, ?Grid): N appears in row R, column C of Grid.
 44% The rows and columns begin at index 1.
 45given(N, 1, C, [Row|_]):-
 46	element(C, Row, N), !.
 47given(N, R, C, [_|Rows]):-
 48	R0 #= R-1,
 49	given(N, R0, C, Rows).
 50
 51% visnum(V, Dir, RC, SS): The number of elements in row or column RC
 52% of grid SS that are visible from the direction Dir is V. The rows
 53% and columns begin at index 1.
 54visnum(V, n, RC, SS):- % north => left visibility, RC is column number
 55	transpose(SS, SST),
 56	nth(RC, SST, Col),
 57	visnum_left(V, Col).
 58visnum(V, w, RC, SS):- % west => left visibility, RC is row number
 59	nth(RC, SS, Row),
 60	visnum_left(V, Row).
 61visnum(V, s, RC, SS):- % south => right visibility, RC is column number
 62	transpose(SS, SST),
 63	nth(RC, SST, Col),
 64	visnum_right(V, Col).
 65visnum(V, e, RC, SS):- % east => right visibility, RC is row number
 66	nth(RC, SS, Row),
 67	visnum_right(V, Row).
 68
 69% nth(?N, +L, ?E): The N-th element of the proper list L is E.
 70% The head of a list is considered its 1st element.
 71nth(1, [Head|_], Head).
 72nth(N, [_|Tail], E):-
 73	nth(M, Tail, E),
 74	N #= M+1.
 75
 76% visnum_left(V, L): The number of elements in list L that are visible
 77% from the left is V.
 78visnum_left(0, []).
 79visnum_left(1, [_]).
 80visnum_left(V, [A,B|As]):-
 81	A #< B #<=> Visible,
 82	V #= V0 + Visible,
 83	Visible #=> C #= B,
 84	#\ Visible #=> C #= A,
 85	visnum_left(V0, [C|As]).
 86
 87% visnum_right(V, L): The number of elements in list L that are visible
 88% from the right is V.
 89visnum_right(V, L):-
 90	reverse(L, LR),
 91	visnum_left(V, LR).
 92
 93% grid_alldiff(+Grid, +GridT, M): The rows of Grid contain distinct
 94% integers from 1 to M. The same holds for the rows of GridT. GridT
 95% is the transpose of the grid Grid of size MxM.
 96grid_alldiff([], [], _).
 97grid_alldiff([Row|Rows], [Col|Cols], M):-
 98	length(Row, M),
 99	length(Col, M),
100	domain(Row, 1, M),
101	domain(Col, 1, M),
102	all_distinct(Row),
103	all_distinct(Col),
104	grid_alldiff(Rows, Cols, M).
105
106% subgrids_alldiff(+Grid, ?Subgrid, K, I, J, C): Subgrid is the C-th
107% sub-grid of Grid with size KxK, where K is the square root of the
108% width of Grid, at (row,col) coordinates (I,J), and all elements
109% in Subgrid are distinct integers from 1 to K^2.
110subgrids_alldiff(_, _, _, _, 0):- !.
111subgrids_alldiff(Grid, K, I, J, SGCount):-
112	M #= K*K,
113	I in 0..M,
114	J in 0..M,
115	sublist(Grid, SubRows, I, K),
116	transpose(SubRows, SubCols),
117	sublist(SubCols, SubgridT, J, K),
118	append(SubgridT, Subgrid),
119	length(Subgrid, M),
120	domain(Subgrid, 1, M),
121	all_distinct(Subgrid),
122	I #= 0 #<=> IZero,
123	IZero #=> I0 #= M-K,
124	IZero #=> J0 #= J-K,
125	#\ IZero #=> I0 #= I-K,
126	#\ IZero #=> J0 #= J,
127	SGC0 #= SGCount - 1,
128	subgrids_alldiff(Grid, K, I0, J0, SGC0).
129
130% sublist(+Whole, ?Part, +Before, +Length): Part is a sublist of
131% Whole such that there are Before number of elements in Whole
132% before Part and the length of Part is Length.
133sublist(Whole, Part, Before, Length):-
134	append([BL,Part,_], Whole),
135	length(BL, Before),
136	length(Part, Length).