Getting The Index Of An Object In An Ordered List In Firebase

September 30, 2023 Post a Comment

I'm building a leaderboard using Firebase. The player's position in the leaderboard is tracked using Firebase's priority system. At some point in my program's execution, I need to

Solution 1:

If you are processing tens or hundreds of elements and don't mind taking a bandwidth hit, see Katos answer.

If you're processing thounsands of records, you'll need to follow an approach outlined in principle in pperrin's answer. The following answer details that.

Step 1: setup Flashlight to index your leaderboard with ElasticSearch

Flashlight is a convenient node script that syncs elasticsearch with Firebase data.

Read about how to set it up here.

Step 2: modify Flashlight to allow you to pass query options to ElasticSearch

as of this writing, Flashlight gives you no way to tell ElasticSearch you're only interested in the number of documents matched and not the documents themselves.

I've submitted this pull request which uses a simple one-line fix to add this functionality. If it isn't closed by the time you read this answer, simply make the change in your copy/fork of flashlight manually.

Step 3: Perform the query!

This is the query I sent via Firebase:

{
    index: 'firebase',
    type: 'allTime',
    query: {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "range": {
                    "points": {
                        "gte": minPoints
                    }
                }
            }
        }
    },
    options: {
        "search_type": "count"
    }
};

Replace points with the name of the field tracking points for your users, and minPoints with the number of points of the user whose rank you are interested in.

The response will look something like:

{
    max_score:0,
    total:2
}

total is the number of users who have the same or greater number of points -- in other words, the user's rank!

Solution 2:

Since Firebase stores object, not arrays, the elements do not have an "index" in the list--JavaScript and by extension JSON objects are inherently unordered. As explained in Ordered Docs and demonstrated in the leaderboard example, you accomplish ordering by using priorities.

A set operation:

varref = new Firebase('URL/leaderboard');
ref.child('user1').setPriority( newPosition /*score?*/ );

A read operation:

var ref = newFirebase('URL/leaderboard');
ref.child('user1').once('value', function(snap) {
   console.log('user1 is at position', snap.getPriority());
});

Solution 3:

To get the info you want, at some point a process is going to have to enumerate the nodes to count them. So the question is then where/when the couting takes place.

Using .count() in the client will mean it is done every time it is needed, it will be pretty accurate, but procesing/traffic heavy.

If you keep a separate index of the count it will need regular refreshing, or constant updating (each insert causeing a shuffling up of the remaining entries).

Depending on the distribution and volume of your data I would be tempted to go with a background process that just updates(/rebuilds) the index every (say) ten or twenty additions. And indexes every (say) 10 positions.

"Leaderboard",$UserId = priority=$score
...

"Rank",'10' = $UserId,priority=$score"Rank",'20' = $UserId,priority=$score
...

From a score you get the rank within ten and then using a startat/endat/count on your "Leaderboard" get it down to the unit.

If your background process is monitoring the updates to the leaderboard, it could be more inteligent about its updates to the index either updating only as requried.

Solution 4:

I know this is an old question, but I just wanted to share my solutions for future reference. First of all, the Firebase ecosystem has changed quite a bit, and I'm assuming the current best practices (i.e. Firestore and serverless functions). I personally considered these solutions while building a real application, and ended up picking the scheduled approximated ranks.

Live ranks (most up-to-date, but expensive)

When preparing a user leaderboard I make a few assumptions:

The leaderboard ranks users based on a number which I'll call 'score' from now on
New users rank lowest on the leaderboard, so upon user creation, their rank is set to the total user count (with a Firebase function, which sets the rank, but also increases the 'total user' counter by 1).
Scores can only increase (with a few adaptations decreasing scores can also be supported).
Deleted users keep a 'ghost' spot on the leaderboard.

Whenever a user gets to increase their score, a Firebase function responds to this change by querying all surpassed users (whose score is >= the user's old score but < the user's new score) and have their rank decreased by 1. The user's own rank is increased by the size of the before-mentioned query.

The rank is now immediately available on client reads. However, the ranking updates inside of the proposed functions are fairly read- and write-heavy. The exact number of operations depends greatly on your application, but for my personal application a great frequency of score changes and relative closeness of scores rendered this approach too inefficient. I'm curious if anyone has found a more efficient (live) alternative.

Scheduled ranks (simplest, but expensive and periodic)

Schedule a Firebase function to simply sort the entire user collection by ascending score and write back the rank for each (in a batch update). This process can be repeated daily, or more frequent/infrequent depending on your application. For N users, the function always makes N reads and N writes.

Scheduled approximated ranks (cheapest, but non-precise and periodic)

As an alternative for the 'Scheduled ranks' option, I would suggest an approximation technique: instead of writing each user's exact rank upon for each scheduled update, the collection of users (still sorted as before) is simply split into M chunks of equal size and the scores that bound these chunks are written to a separate 'stats' collection.

So, for example: if we use M = 3 for simplicity and we read 60 users sorted by ascending score, we have three chunks of 20 users. For each of the (still sorted chunks) we get the score of the last (lowest score of chunk) and the first user (highest score of chunk) (i.e. the range that contains all scores of that chunk). Let's say that the chunk with the lowest scores has scores ranging from 20-120, the second chunk has scores from 130-180 and the chunk with the highest scores has scores 200-350. We now simply write these ranges to a 'stats' collection (the write-count is reduced to 1, no matter how many users!).

Upon rank retrieval, the user simply reads the most recent 'stats' document and approximates their percentile rank by comparing the ranges with their own score. Of course it is possible that a user scores higher than the greatest score or lower than the lowest score from the previous 'stats' update, but I would just consider them belonging to the highest scoring group and the lowest scoring group respectively.

In my own application I used M = 20 and could therefore show the user percentile ranks by 5% accuracy, and estimate even within that range using linear interpolation (for example, if the user score is 450 and falls into the 40%-45%-chunk ranging from 439-474, we estimate the user's percentile rank to be 40 + (450 - 439) / (474 - 439) * 5 = 41.57...%).

If you want to get real fancy you can also estimate exact percentile ranks by fitting your expected score distribution (e.g. normal distribution) to the measured ranges.

Note: all users DO need to read the 'stats' document to approximate their rank. However, in most applications not all users actually view the statistics (as they are either not active daily or just not interested in the stats). Personally, I also used the 'stats' document (named differently) for storing other DB values that are shared among users, so this document is already retrieved anyways. Besides that, reads are 3x cheaper than writes. Worst case scenario is 2N reads and 1 write.

javascript magazin