Shuffle practice

2026-01-05

I'm not good at shuffling a deck of playing cards, and wanted to practice the riffle shuffle. A well-known result says that it takes about 7 riffles to randomize a 52-card deck (wikipedia, youtube). To see the full effect of a shuffle, we need to know the order of the cards before and after it. I have some tools to annotate a deck and inspect shuffles in the shuffle_practice repo. The repo has example data, which I refer to in this post.

Deck image

I started with capturing an image of all 52 cards in a deck. I laid them out on the floor against a dark background to make further image processing tasks easier.

20251008_01

These are some conventions I used for ordering a deck.

card_order

I only performed riffles, and captured a deck image after every single riffle. In the example sequence included with the repo, there are 8 images of decks, for a total of 7 riffles between the first and last deck. The image above is of the first deck, before any shuffles.

Segmentation

The next task was to segment a deck image into individual cards. I used segment-anything for this purpose. I worked on a CPU-only machine, so I used the smallest model along with a scaled-down image. The model segmented cards very well, but it also generated spurious segments.

all_masks

I got rid of unwanted segments with some simple filtering (such as expected card area and aspect ratio). This worked because I used a fixed set of playing cards, and tried to capture consistent deck images. I often captured a couple of images of the same deck, and chose one on which segmentation worked.

segmentation
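The filtering step can be sketched roughly as below. This is a minimal illustration, not the code from the repo: it assumes each mask is a dict with `area` and `bbox` (x, y, width, height) entries, which is the shape segment-anything's automatic mask generator produces. The function name and the tolerance values are hypothetical. The 1.4 aspect ratio comes from a standard 2.5 x 3.5 inch poker card.

```python
def filter_card_masks(masks, expected_area, area_tol=0.3,
                      expected_aspect=1.4, aspect_tol=0.25):
    """Keep only masks whose area and bounding-box shape look like one card.

    masks: list of dicts with "area" (pixels) and "bbox" ([x, y, w, h]).
    expected_area: typical card area in pixels for this image scale.
    Tolerances are relative (0.3 means within 30% of the expected value).
    All thresholds here are illustrative assumptions.
    """
    kept = []
    for mask in masks:
        _, _, width, height = mask["bbox"]
        if width <= 0 or height <= 0:
            continue  # degenerate box, cannot be a card
        # Use long side / short side so card orientation does not matter.
        aspect = max(width, height) / min(width, height)
        area_ok = abs(mask["area"] - expected_area) <= area_tol * expected_area
        aspect_ok = abs(aspect - expected_aspect) <= aspect_tol * expected_aspect
        if area_ok and aspect_ok:
            kept.append(mask)
    return kept
```

The bounding-box aspect ratio is only meaningful when cards are laid out roughly axis-aligned, as in the deck images here. A rotated card would need a rotated bounding box instead.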

Classification

Once I had card segments, I used open_clip for labeling the cards. Based on some evaluation (see the final deck accuracy column in the linked csv), the (SigLIP, webli) models stood out as having very good performance, without any prompt engineering. Perhaps the webli dataset includes playing cards, but I didn't confirm.

Predictions were correct for cards that were clearly visible, such as the 3h card shown below. I took deck images on a phone, and cards near the edges were consistently blurry. The SigLIP model I used often did well labeling even these, such as the 6h card.

classification_examples

I used the classification output as a seed to manually annotate the decks. Classification was pretty good (average accuracy 97% on the 8 example images), which simplified this step.

Permutation stats

With the annotation files in place, I could get an informal picture of shuffle quality. My example sequence of shuffles was far from perfect.

I found it helpful to look at permutation matrices. A permutation matrix is a square matrix of zeros and ones, with exactly one 1 in each row and each column, that maps a source sequence to a permutation of it. In all the matrices shown, the source is the starting deck (20251008_01). The three images below show the permutation matrices after 1, 2, and 3 riffles. In other words, they map deck 1 to decks 2, 3, and 4.

permutation_matrix_progression
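For readers who want to reproduce such a matrix, here is a minimal sketch. It assumes each deck is a list of unique card labels (such as "3h") in deck order. The function name and the row/column convention (rows index source positions, columns index target positions) are my assumptions for illustration and may differ from the repo.

```python
def permutation_matrix(source, target):
    """Return an n x n list-of-lists permutation matrix.

    Entry [i][j] is 1 when the card at position i in `source`
    sits at position j in `target`, and 0 otherwise.
    """
    if len(set(source)) != len(source):
        raise ValueError("source contains duplicate cards")
    if sorted(source) != sorted(target):
        raise ValueError("source and target must contain the same cards")
    position_in_target = {card: j for j, card in enumerate(target)}
    n = len(source)
    matrix = [[0] * n for _ in range(n)]
    for i, card in enumerate(source):
        matrix[i][position_in_target[card]] = 1
    return matrix


# Tiny example with four cards instead of 52.
before = ["As", "2s", "3s", "4s"]
after = ["3s", "As", "4s", "2s"]
for row in permutation_matrix(before, after):
    print(row)
```

With this convention an unshuffled deck gives the identity matrix, so ones clustered near the diagonal indicate cards that stayed close to their starting positions.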

Shuffle 6, between decks 6 and 7, was so bad that it wasn't even a riffle. I tried to follow the riffle with a bridge, but fumbled and undid most of the riffle. Perhaps my sequence of 7 riffles was effectively only 6.

bad_shuffle

The overall permutation matrix, between decks 1 and 8, looks pretty scattered, although a diagonal bias possibly still remains.

permutation_matrix_overall

In a permuted sequence, a pair of elements is inverted if their order is flipped relative to the source sequence. For a uniform random permutation of n elements, the expected number of inversions is n * (n - 1) / 4, since each of the n * (n - 1) / 2 pairs is inverted with probability 1/2. For a deck of 52 cards, that is 52 * 51 / 4 = 663. The numbers of inversions in my sequence of shuffles (each relative to the starting deck) are below.

281, 452, 583, 618, 696, 740, 714
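Counts like these can be computed with a short function such as the sketch below. It assumes decks are lists of unique card labels. The function name is hypothetical and this is not necessarily how the repo does it. A quadratic loop is plenty for 52 cards.

```python
def count_inversions(source, target):
    """Count pairs of cards whose relative order in `target`
    is flipped compared with `source`."""
    rank = {card: i for i, card in enumerate(source)}
    # Rewrite the target deck as source positions, then count
    # pairs that appear in decreasing order.
    seq = [rank[card] for card in target]
    n = len(seq)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if seq[i] > seq[j]
    )
```

An unshuffled deck has 0 inversions and a fully reversed 52-card deck has the maximum, 52 * 51 / 2 = 1326.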

I don't have anything as principled as confidence intervals, but 714 seems far enough from 663 to suggest that my riffle shuffle needed more practice.
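To put a rough scale on that gap, one option is the standard deviation of the inversion count under a uniform random permutation. The variance has a known closed form, n * (n - 1) * (2n + 5) / 72. The sketch below uses it. The function names are hypothetical, and treating the result as a z-score is an informal yardstick, not a proper test.

```python
import math


def inversion_mean_and_std(n):
    """Mean and standard deviation of the inversion count
    for a uniform random permutation of n elements."""
    mean = n * (n - 1) / 4
    variance = n * (n - 1) * (2 * n + 5) / 72
    return mean, math.sqrt(variance)


def inversion_z_score(observed, n):
    """How many standard deviations `observed` lies from the mean."""
    mean, std = inversion_mean_and_std(n)
    return (observed - mean) / std


mean, std = inversion_mean_and_std(52)
print(f"mean={mean:.0f}, std={std:.1f}, z(714)={inversion_z_score(714, 52):.2f}")
```

For 52 cards the standard deviation is about 63, which puts 714 roughly 0.8 standard deviations above 663. By this yardstick, the inversion count on its own is fairly weak evidence about shuffle quality, and a single sequence of riffles cannot settle the question.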

Conclusion

At the start of the project, I thought I would learn about random permutation statistics. Instead, I spent time working with zero-shot models for segmentation and classification, which to their credit worked very well.

The limiting factor for more statistics was data collection effort. In the final state of my code and workflow, it took me more than a couple of hours to capture, annotate, and inspect a sequence of 7 riffles, or 8 decks. I didn't have enough data instances to look into questions like the following.

I concluded this project once it became more fun, and better practice, to shuffle in service of an actual game of cards.