Recommendation System
Date: 2019/10/12 Categories: Work Tags: recsys
Papers
Discussions
Difficulties
Based on the difficulties I had implementing recommender systems, I think it boils down to two main problems: 1. Evaluating recommender systems is REALLY hard. When it comes to evaluation, recommender systems are very different from other machine learning tasks:
Let’s assume the classic methodology of collecting a dataset of events, splitting it at a certain point in time to get a training and test set, and checking precision and recall:
If your recommender system works as expected, you will influence what your users do in the future, so your test set is probably not going to represent what your users would actually do.
Having a precision and recall of 1 is actually bad. That means your recommender system was perfect, but also useless (you only recommended stuff that the user was already going to pick anyway).
One way to address this is to just use A/B tests and try to optimize some business metric (e.g. number of purchases). This is usually “good enough”, but this will make your recommender focus on sales, not on user satisfaction.
There are also some other metrics that can be used[1], but there are so many of them and some are not very practical to implement, so I guess that everyone just goes with the A/B testing approach.
2. Recommendations require an explanation
A lot of recommender algorithms are black boxes. Sure, you can write “this was recommended based on users similar to you” when you use a collaborative filtering algorithm, but that doesn’t help much.
Recommendations without an explanation are not that useful. When a friend recommends a movie to you, they’ll also tell you why. Otherwise, it’s really hard to make users trust your recommendations (especially if the recommender system recommends something outside of the user’s “comfort zone”).
I’ve noticed that a lot of sites have improved on this front though, and I do enjoy those recommendations a lot more.
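The time-split evaluation described in problem 1 can be sketched as follows. This is a minimal illustration, not a production harness: the event log, the cutoff date, and the popularity-based `recommend()` stand-in are all made up.

```python
# Sketch: precision@K / recall@K with a time-based train/test split.
# All data and the placeholder recommender below are hypothetical.
from datetime import datetime

events = [  # (user, item, timestamp) interaction log
    ("alice", "A", datetime(2019, 1, 5)),
    ("carol", "B", datetime(2019, 1, 20)),
    ("bob",   "A", datetime(2019, 2, 10)),
    ("alice", "B", datetime(2019, 3, 1)),
    ("bob",   "C", datetime(2019, 4, 2)),
]
cutoff = datetime(2019, 3, 1)
train = [(u, i) for u, i, t in events if t < cutoff]
test = [(u, i) for u, i, t in events if t >= cutoff]

def recommend(user, k=2):
    # Placeholder: most popular training items the user hasn't seen yet.
    seen = {i for u, i in train if u == user}
    popularity = {}
    for _, i in train:
        popularity[i] = popularity.get(i, 0) + 1
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    return [i for i in ranked if i not in seen][:k]

# Accumulate hits over every user that appears in the test period.
hits = total_rec = total_rel = 0
for user in {u for u, _ in test}:
    relevant = {i for u, i in test if u == user}
    recs = recommend(user)
    hits += len(set(recs) & relevant)
    total_rec += len(recs)
    total_rel += len(relevant)

precision = hits / total_rec if total_rec else 0.0
recall = hits / total_rel if total_rel else 0.0
```

Note that this is exactly the setup the text warns about: a perfect score here would only mean the recommender predicted what users were going to do anyway.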
Why there is no widespread recsys system
It is hard to sell technology to companies when they have their own teams (often using free libraries). Embedded teams see themselves as experts and will often discredit better technology. Often the only method of testing is A/B testing, and A/B tests can easily be manipulated. At Netflix, for instance (ignoring more blatant practices), P-hacking (run thousands of experiments and report the ones that worked) and HARKing (come up with a hypothesis after the results are known) are rampant. That is part of the reason the recommender has been degrading over the years.
- P-hacking: run many experiments and report only the one that looks good
- HARKing: drawing the target after shooting the arrow (Hypothesizing After the Results are Known)
Categories of Recsys
There are basically 3 types of recommender engines:
Content Based: If you can represent your products as vectors, you can compute a distance between each pair of products, which gives you item-item recommendations. You can use all kinds of embeddings to achieve this; some techniques that we tried are word2vec embeddings of user navigation, autoencoding of features using neural networks, dimensionality reduction with PCA, ALS, etc. There are lots of libs for solving these problems since it is a well-studied field, usually numpy, and for finding the neighbors we use approximate nearest neighbors from scikit-learn, because if you have millions of items, you can’t just compute the distance between all the pairs.
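A minimal numpy sketch of the content-based idea: given item vectors (wherever they came from, e.g. word2vec, an autoencoder, or PCA), item-item recommendation is just nearest neighbors in embedding space. The item names and vectors below are made up, and the brute-force similarity loop is what an ANN index replaces at millions of items.

```python
import numpy as np

# Hypothetical item embeddings: rows are items, columns are latent features.
items = ["shoes", "sandals", "laptop", "tablet"]
vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.0, 0.8, 0.9],
])

def most_similar(item, k=1):
    """Brute-force cosine similarity; swap in an ANN index at scale."""
    idx = items.index(item)
    normed = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = normed @ normed[idx]   # cosine similarity to the query item
    sims[idx] = -np.inf           # exclude the item itself
    order = np.argsort(sims)[::-1]
    return [items[i] for i in order[:k]]
```

With these toy vectors, footwear clusters with footwear and electronics with electronics, which is all an item-item recommender needs.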
Collaborative Filtering: Here you use pairs of user behavior, `<user, item, ranking>` triples. The Surprise lib in Python works well, and you have MLlib from Spark too. These techniques are called matrix factorization techniques, and they also give you an embedding of the item or the user, so you can apply the content-based techniques to find user-user and item-item recommendations alongside the user-item recommendations.

Hybrid Models: These are the models that use both behavior and the features of users and items. LightFM is a good lib that works well, but you can also model it with other tools like neural networks (https://ai.google/research/pubs/pub45530).
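To make the matrix factorization idea concrete, here is a hedged, hand-rolled sketch: learn user and item embeddings from `<user, item, rating>` triples with plain SGD. The ratings are made up, and in practice you would use Surprise, Spark MLlib, or implicit rather than this loop; the point is that the learned `P` and `Q` rows are exactly the embeddings the text says you can reuse for item-item or user-user similarity.

```python
import numpy as np

# Made-up <user, item, rating> triples (users 0-2, items 0-2).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 4.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))  # user embeddings
Q = rng.normal(scale=0.1, size=(n_items, k))  # item embeddings

lr, reg = 0.05, 0.01
for epoch in range(300):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                 # prediction error
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

# Predicted rating for a known pair should land near the observed value.
predicted = P[0] @ Q[0]
```

The dot product `P[u] @ Q[i]` predicts unseen user-item pairs as well, which is the user-item recommendation; cosine similarity over rows of `Q` gives item-item, and over rows of `P` gives user-user.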
The challenges depend on the company: recommending a small number of items to a large number of users is not the same as recommending a large number of items to a small number of users.
There is a whole specialization on Coursera that is really good: https://www.coursera.org/specializations/recommender-systems
CF
evaluation
- Evaluating Collaborative Filtering Recommender Systems
- leave-one-out cross-validation
If Jane saw Schindler’s List and gave it 8/10, delete this rating from the database, use the rest of the data to predict it, then compare the predicted score with the score she gave (using RMSE).
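The leave-one-out procedure can be sketched like this. The ratings are made up, and the item-mean predictor is a deliberately trivial stand-in for whatever model is being evaluated:

```python
import math

# Made-up (user, movie) -> rating data.
ratings = {
    ("jane", "schindlers_list"): 8.0,
    ("john", "schindlers_list"): 9.0,
    ("mary", "schindlers_list"): 7.0,
    ("jane", "matrix"): 6.0,
    ("john", "matrix"): 7.0,
}

def predict(user, item, data):
    # Trivial predictor: mean rating of the item over the remaining data.
    # A real recommender (e.g. matrix factorization) goes here.
    others = [r for (u, i), r in data.items() if i == item]
    return sum(others) / len(others)

# Leave-one-out: hide each rating in turn, predict it, accumulate RMSE.
sq_err, n = 0.0, 0
for (user, item), actual in ratings.items():
    held_out = {key: v for key, v in ratings.items() if key != (user, item)}
    pred = predict(user, item, held_out)
    sq_err += (pred - actual) ** 2
    n += 1

rmse = math.sqrt(sq_err / n)
```

Every known rating is hidden exactly once, so the RMSE measures how well the rest of the data reconstructs it.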
tools
- Spark
- Surprise
- lightfm
- implicit
- xlearn