I made Crowd Wordle


Building Crowd Wordle: My Journey of Optimizing for Fun

I recently built Crowd Wordle, a game where multiple users vote on the next word to guess. I wanted to write about the process of building it. It’s open source and on Github


From Crowd Tetris to Crowd Wordle

Originally, I started working on Crowd Tetris, but it didn’t feel very fun. People couldn’t get creative with Tetris like they can with Wordle. In Wordle, players can send words, which feels much more creative. (as long as they don’t spam naughty words)

so I switched gears and decided to build Crowd Wordle instead.


Tech Choices

  • Backend: .NET 9 my favorite backend language.
  • Frontend: I’m not very fond of frontend work, but I decided to use Svelte 5 because I hadn’t tried the latest version with its new runes and features.
  • Realtime Communication: I started with SignalR since I’ve always used it for WebSockets in .NET.

For those unfamiliar, SignalR is a websocket library like Socket.IO but made for the .NET ecosystem. At first, I didn’t even know there were alternatives in .NET ecosystem… i will write about this later on in this post.


WebSockets & Message Optimization

I quickly got the game logic working using SignalR, but I wanted to minimize bandwidth between client and server for fun.

SignalR sends data as JSON, which felt heavy for my needs. I found the SignalR MessagePack library, which reduces message size by sending data in binary format instead of JSON.

That was nice, but I wanted to go deeper.


Packing a Word into 4 Bytes

First message we will work through is a client(user) voting for a word. a client will send 5 character string to server. the dataset i used for words contained only ascii alphabets. there were no accented characters or anything so i only need to consider a-z for character Each guess is just a 5-letter word. Since there are only 26 letters, I can represent each letter as a number from 0–25. That’s 5 bits per letter, so: 5 letters * 5 bits = 25 bits total

This means the word fits into 32 bits (4 bytes) instead of sending it as a 5-byte string. Here’s the encoding logic:

const WORD_LENGTH = 5;
const BASE_CHAR = "a".charCodeAt(0);
const BITS_PER_CHAR = 5;
const BITMASK = (1 << BITS_PER_CHAR) - 1;
const WORD_BITS = WORD_LENGTH * BITS_PER_CHAR;

export function encodeToPackedNum(word: string): number {
  if (word.length !== WORD_LENGTH) return 0;
  word = word.toLowerCase();

  let packed = 0;
  for (let i = 0; i < WORD_LENGTH; i++) {
    const value = word.charCodeAt(i) - BASE_CHAR;
    if (value < 0 || value > 25) return 0;
    packed |= (value & BITMASK) << (i * BITS_PER_CHAR);
  }
  return packed >>> 0;
}

// Send as binary:
const buffer = new ArrayBuffer(4);
const view = new DataView(buffer);
view.setUint32(0, encodedString, false);
socket?.send(buffer);

if you ask me… not a very significant data saving honestly. but still its something. actually it is 20% decrease in message size if i counted just the main message part.


Dropping SignalR for Raw WebSocket

While inspecting SignalR messages, I noticed it sent a lot of metadata (method names, invocation IDs, etc.), signalr sends more details than the actual payload and for necessary measure.

After a somewhat quick search (google search BTW. i should’ve just asked a clanker for this), I realized .NET has built-in WebSocket support. I’ve worked in .NET for almost 7 years now and of the few times I’ve used websocket, it is through signalr. i feel dumb that i didn’t knew about it before. but i guess its alright since you don’t need to know all the things :D

I rewrote the system to use raw WebSockets. It was more work because I had to manage connections manually, but now the message sent from client to server is exactly 4 bytes. huge win for me because i managed to solve the problem (not like there was any problem in fact). i don’t need to send anything else because the only message that can get sent from client to server is the word itself. so i don’t need to send message type or any other things.

On the server side, reading looks like this:

if (result.MessageType == WebSocketMessageType.Binary && result.Count == 4)
{
    var word = BinaryPrimitives.ReadUInt32BigEndian(buffer.AsSpan());
}

now word will be equal to the 4 byte packed number sent from the client. i can decode this if needed. but should i? i decided to store words in numerical form. string comparision is more expensive than numbers after all. so why convert it to string? just save everything in numbers. again, someone might ask me why i’m trying to super optimize something this basic. there is no reason. i’m just trying to do stuffs to learn about it and its fun.


Game State Flow

now the next thing would be sending messages back to client. the way i architected this game, this is the state flow:

  • WAITINGFORVOTE The game waits for the first vote.
  • VOTINGINPROGRESS Once a vote is received, a 10-second timer starts.
  • VOTES ARE EVALUATED The winning vote is selected.
  • If the game isn’t over, go back to WAITINGFORVOTE.
  • If the game ends, start a 10-second cooldown before a new game.

Server-to-Client Messages

so in total these are the messages i’ll need to send from server to client:

  • INITIALMESSAGE: Sent when the client connects, includes all details.
  • LIVEDATA: Sent every second if there’s an update (user count, total votes…)
  • VOTINGSTARTED: Notifies clients when voting begins.
  • GAMEUPDATE: Sent after each round or game end.
  • GAMESTART: New game start notification.
  • VOTESTREAM: Sent every 2 seconds to show new incoming words.
  • RESPONSE: Sent after a user votes (with error messages if needed).

so in total there are 7 possible messages. now comes time for server to client message optimization.

Binary Message Packing

i decided to pack the message as byte array and send it to client. so the way i did it is first created a bitwriter to write details to bits

private ref struct BitWriter
{
    private Span<byte> _buffer;
    private int _bytePosition;
    private ulong _bitBuffer;
    private int _bitCount;

    public BitWriter(Span<byte> buffer)
    {
        _buffer = buffer;
        _bytePosition = 0;
        _bitBuffer = 0;
        _bitCount = 0;
    }

    public void WriteBits(uint value, int bits)
    {
        _bitBuffer |= (value & ((1UL << bits) - 1)) << _bitCount;
        _bitCount += bits;

        while (_bitCount >= 8 && _bytePosition < _buffer.Length)
        {
            _buffer[_bytePosition++] = (byte)_bitBuffer;
            _bitBuffer >>= 8;
            _bitCount -= 8;
        }
    }

    public int Finish()
    {
        if (_bitCount > 0 && _bytePosition < _buffer.Length)
        {
            _buffer[_bytePosition++] = (byte)_bitBuffer;
        }
        return _bytePosition;
    }
}

Example: packing LIVEDATA

now i’ll also show how i packed one of the simplest server message. lets take an example of LIVEDATA

public static byte[] PackLiveData(in Game game, uint userCount, uint totalVotes, ReadOnlySpan<Vote> topWords)
{
    Span<byte> buffer = stackalloc byte[32];
    var writer = new BitWriter(buffer);

    writer.WriteBits((uint)ServerMessageType.LiveData, 3);
    writer.WriteBits((uint)game.State, 2);
    writer.WriteBits(userCount, 16);

    if (game.State == GameState.VotingInProgress || game.GameOver)
    {
        PackVotingData(ref writer, totalVotes, topWords);
    }

    return buffer[..writer.Finish()].ToArray();
}

estimate size of message. here i estimated it to be 32 bytes. then initialize the bitwriter write each message with how much bits it takes for that message. since there are 7 types of server messages, it can be written in 3 bits. 7<2^3

game state is just 4 possible states:

  • WaitingForVote,
  • VotingInProgress,
  • Won,
  • Lost

so 2 bit would work for this message. and then usercount which i have limited to be up to 16 bits (i’m being very optimistic here and assuming i’ll have 65k users at max while my real number will probably be enough for 5 bits). if it anyone crosses this number, it just won’t show up on UI side and there won’t be any issues in my app. (assuming my system will be able to handle that amount of user)

now if voting is in progress or game is over, i want to show up to top 3 votes. for which i have a different function.

private static void PackVotingData(ref BitWriter writer, uint totalVotes, ReadOnlySpan<Vote> topWords)
{
    writer.WriteBits(totalVotes, 16);
    writer.WriteBits((uint)topWords.Length, 2);

    foreach (ref readonly var vote in topWords)
    {
        writer.WriteBits(vote.Word, BITS_PER_WORD);
        writer.WriteBits(vote.Count, 16);
    }
}

i think its self explanatory by this point, total votes also represented by 16 bits. since i’ll only send up to 3 votes. the length of votes can be represented by 2 bits. and as for the votes itself, its just the word itself (packed in the same number as before) which takes BITS_PER_WORD or 25 bits. and the number of votes for that word which again I’ve used 16 bits to represent. now this seems kinda excessive but i’m just assuming the worst case where the hive mind decides to vote on the same word

now i convert this sequence of bits to byte array and send it to the client through websocket. on the client side, it will use similar logic to decode.

Client-Side Decoding

reading message:

first read message and convert it to a uint8array in javascript.

socket.addEventListener("message", (event) => {
	const buffer = new Uint8Array(event.data);
	handleData(buffer);
});
export function handleData(buffer: Uint8Array) {

	const messageType = getMessageType(buffer);
	switch (messageType) {
		// .. handle other message types
		case ServerMessageType.LiveData:
			const liveData = unpackLiveData(buffer);
			handleLiveDataMessage(liveData);
			break;
		default:
			console.warn("Unknown message type:", messageType);
	}
}

export function getMessageType(buffer: Uint8Array): ServerMessageType {
	const reader = new BitReader(buffer);
	return reader.readBits(3) as ServerMessageType;
}

export function unpackLiveData(buffer: Uint8Array): LiveData {
	const reader = new BitReader(buffer);

	const _ = reader.readBits(3); // Skip message type
	const gameState: GameState = reader.readBits(2) as GameState;
	const isGameOver = IsGameOver(gameState);
	const userCount = reader.readBits(16);

	let totalVotes = 0;
	let topWords: Vote[] = [];

	if (gameState === GameState.VotingInProgress || isGameOver) {
		totalVotes = reader.readBits(16);
		const topWordCount = reader.readBits(2);

		for (let i = 0; i < topWordCount; i++) {
			const packed = reader.readBits(WORD_BITS);
			const voteCount = reader.readBits(16);
			const word = decodeFromPackedNum(packed);
			topWords.push({
				Word: word,
				Count: voteCount,
				Percentage: totalVotes > 0 ? Math.round((voteCount / totalVotes) * 100) : 0
			});
		}
	}

	return {
		userCount,
		totalVotes,
		topWords,
	};
}

as you can see, i have a BitReader abstraction which does exactly opposite of BitWriter in C# Also… i just want to say here that working in typescript feels almost like working in c#. just a few JS shenanigans So, here’s the BitReader. Works in a very similar way to BitWriter

export class BitReader {
	private offset = 0;
	private bitBuffer = 0n;
	private bitCount = 0;

	constructor(private buffer: Uint8Array) { }

	readBits(bits: number): number {
		if (bits <= 0) throw new Error("bits must be > 0");
		if (bits > 53) throw new Error("Cannot read more than 53 bits at once (JS number limit)");

		while (this.bitCount < bits) {
			if (this.offset >= this.buffer.length) {
				throw new Error("Unexpected end of buffer");
			}
			this.bitBuffer |= BigInt(this.buffer[this.offset++]) << BigInt(this.bitCount);
			this.bitCount += 8;
		}

		const mask = (1n << BigInt(bits)) - 1n;
		const value = this.bitBuffer & mask;
		this.bitBuffer >>= BigInt(bits);
		this.bitCount -= bits;

		return Number(value);
	}
}

Note: if you look inside, you’ll see that i have used BigInt. I initially tried using normal JavaScript numbers, but reading some bits in some cases caused overflow because JS numbers. Switching to BigInt fixed this. It happened in a very specific scenario. I think it was because bitbuffer increases to more than 32 bits when i read data in some specific sequence. so basically that’s how i read messages…

The unnecessary crown jewel: processing logic

Here’s the most complicated part: determining letter correctness in a branchless (almost) way

public static uint ProcessWordExperimental(uint guess, uint target)
{
    // Extract letters
    uint g0 = guess & 0x1F, g1 = (guess >> 5) & 0x1F, g2 = (guess >> 10) & 0x1F;
    uint g3 = (guess >> 15) & 0x1F, g4 = (guess >> 20) & 0x1F;
    uint t0 = target & 0x1F, t1 = (target >> 5) & 0x1F, t2 = (target >> 10) & 0x1F;
    uint t3 = (target >> 15) & 0x1F, t4 = (target >> 20) & 0x1F;

    // Exact matches
    uint eq0 = (uint)-(g0 == t0 ? 1 : 0);
    uint eq1 = (uint)-(g1 == t1 ? 1 : 0);
    uint eq2 = (uint)-(g2 == t2 ? 1 : 0);
    uint eq3 = (uint)-(g3 == t3 ? 1 : 0);
    uint eq4 = (uint)-(g4 == t4 ? 1 : 0);

    // Frequency table
    ulong freq = 0;
    freq += 1UL << ((int)t0 * 3);
    freq += 1UL << ((int)t1 * 3);
    freq += 1UL << ((int)t2 * 3);
    freq += 1UL << ((int)t3 * 3);
    freq += 1UL << ((int)t4 * 3);

    // Subtract exact matches
    freq -= (eq0 & 1UL) << ((int)g0 * 3);
    freq -= (eq1 & 1UL) << ((int)g1 * 3);
    freq -= (eq2 & 1UL) << ((int)g2 * 3);
    freq -= (eq3 & 1UL) << ((int)g3 * 3);
    freq -= (eq4 & 1UL) << ((int)g4 * 3);

    // Presence checks
    ulong c0 = (freq >> ((int)g0 * 3)) & 7UL; uint pr0 = (uint)-((eq0 == 0 && c0 > 0) ? 1 : 0); freq -= ((c0 > 0 ? 1UL : 0UL) << ((int)g0 * 3));
    ulong c1 = (freq >> ((int)g1 * 3)) & 7UL; uint pr1 = (uint)-((eq1 == 0 && c1 > 0) ? 1 : 0); freq -= ((c1 > 0 ? 1UL : 0UL) << ((int)g1 * 3));
    ulong c2 = (freq >> ((int)g2 * 3)) & 7UL; uint pr2 = (uint)-((eq2 == 0 && c2 > 0) ? 1 : 0); freq -= ((c2 > 0 ? 1UL : 0UL) << ((int)g2 * 3));
    ulong c3 = (freq >> ((int)g3 * 3)) & 7UL; uint pr3 = (uint)-((eq3 == 0 && c3 > 0) ? 1 : 0); freq -= ((c3 > 0 ? 1UL : 0UL) << ((int)g3 * 3));
    ulong c4 = (freq >> ((int)g4 * 3)) & 7UL; uint pr4 = (uint)-((eq4 == 0 && c4 > 0) ? 1 : 0); freq -= ((c4 > 0 ? 1UL : 0UL) << ((int)g4 * 3));

    // Pack result (2 bits per letter)
    return ((eq0 << 1) & 2u) | (pr0 & 1u) |
           (((eq1 << 1) & 2u) | (pr1 & 1u)) << 2 |
           (((eq2 << 1) & 2u) | (pr2 & 1u)) << 4 |
           (((eq3 << 1) & 2u) | (pr3 & 1u)) << 6 |
           (((eq4 << 1) & 2u) | (pr4 & 1u)) << 8;
}

i came up with a simple initial logic myself but decided to check what that logic would look like on bitwise level so consulted Claude + ChatGPT. it tried to make it as branchless as possible but couldn’t totally. maybe compiler can do the rest. who knows. i need to study this whole thing and i need to squeeze anything remaining. but this happens in the game once every 10 seconds if at most. so its very not worth it.

what this actually taught me? a whole lot about c#. i have never used following features in, structs, stackalloc, .NET websocket without SignalR, Interlocked operations, ConcurrentDictionary, Semaphore. I do know someone somewhere will look through my code and cringe. But that’s alright… I’ll get better next time. I’ll make you proud of me. Well that’s about it… i need to study this above bitwise logic now.