Recommender Systems
Exam preparation notes for RS 2023/2024
Collaborative Filtering (CF) (Lecture 1)
Pure CF Approaches
- Input: User-item rating matrix
- Output Types:
- Numerical prediction of user preference
- Top-N list of recommended items
User-based Nearest-Neighbor Collaborative Filtering
- Technique:
- Given an "active user" (Alice) and an item not yet seen by Alice.
- Find a set of users (peers/nearest neighbors) who liked the same items as Alice in the past and who have rated the item.
- Use the average of their ratings to predict if Alice will like the item.
- Recommend the best-rated items Alice has not seen.
- Assumption: User preferences remain stable over time.
Measuring User Similarity
- Pearson Correlation:
$$
\text{sim}(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}
$$
$r_{a,p}$: Rating of user $a$ for item $p$; $\bar{r}_a$: Average rating of user $a$; $P$: Set of items rated by both $a$ and $b$
- Jaccard Similarity:
$$
\text{sim}(a, b) = \frac{|I_a \cap I_b|}{|I_a \cup I_b|}
$$
$I_a$ and $I_b$: Sets of items rated by users $a$ and $b$
Making Predictions
- Prediction Formula:
$$
\text{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \text{sim}(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} |\text{sim}(a, b)|}
$$
$\text{pred}(a, p)$: Predicted rating of user $a$ for item $p$; $\bar{r}_a$: Average rating of user $a$; $N$: Set of neighbors of user $a$; $\text{sim}(a, b)$: Similarity between users $a$ and $b$
- Improving Metrics:
  - Variance Weighting: give more weight to items with a high rating variance, since agreement on controversial items is more informative than agreement on universally liked ones.
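A minimal sketch of user-based prediction, assuming a small dense rating matrix with `NaN` for missing ratings; the helper names (`pearson_sim`, `predict`) and the toy matrix are illustrative, not taken from the lecture:

```python
import numpy as np

def pearson_sim(r_a, r_b):
    """Pearson correlation between two users over their co-rated items (NaN = missing)."""
    common = ~np.isnan(r_a) & ~np.isnan(r_b)
    if common.sum() < 2:
        return 0.0
    a_c = r_a[common] - np.nanmean(r_a)
    b_c = r_b[common] - np.nanmean(r_b)
    denom = np.sqrt((a_c ** 2).sum()) * np.sqrt((b_c ** 2).sum())
    return float((a_c * b_c).sum() / denom) if denom > 0 else 0.0

def predict(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who have rated the item."""
    sims = [(pearson_sim(R[user], R[b]), b) for b in range(len(R))
            if b != user and not np.isnan(R[b, item])]
    neighbors = sorted(sims, reverse=True)[:k]
    num = sum(s * (R[b, item] - np.nanmean(R[b])) for s, b in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return np.nanmean(R[user]) + num / den if den > 0 else np.nanmean(R[user])

# Toy matrix: rows = users (row 0 = Alice), columns = items, NaN = unrated
R = np.array([[5, 3, 4, 4, np.nan],
              [3, 1, 2, 3, 3],
              [4, 3, 4, 3, 5],
              [3, 3, 1, 5, 4],
              [1, 5, 5, 2, 1]], dtype=float)
print(predict(R, user=0, item=4))   # predicted rating for Alice on the unseen item
```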
Memory-based vs. Model-based Approaches
- Memory-based: Uses the rating matrix directly.
- Model-based: Builds models from data using techniques like matrix factorization.
Item-based Collaborative Filtering
- Technique:
  - Use the similarity between items (rather than users) to make predictions.
  - Cosine Similarity:
$$
\text{sim}(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \cdot |\vec{b}|}
$$
  - Adjusted Cosine Similarity (subtracts each user's average rating to compensate for different rating scales):
$$
\text{sim}(a, b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_u)^2} \sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_u)^2}}
$$
$U$: Set of users who rated both items $a$ and $b$
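A short sketch of the adjusted cosine similarity between two item columns of a rating matrix; the function name and the `NaN`-based handling of missing ratings are assumptions for illustration:

```python
import numpy as np

def adjusted_cosine(R, i, j):
    """Adjusted cosine similarity between item columns i and j of rating matrix R
    (NaN = missing); ratings are mean-centered per user to offset rating-scale differences."""
    centered = R - np.nanmean(R, axis=1, keepdims=True)   # subtract each user's mean
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])        # users who rated both items
    u, v = centered[both, i], centered[both, j]
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0

# Example with a toy matrix (same layout as the user-based example above)
R = np.array([[5, 3, 4, 4, np.nan],
              [3, 1, 2, 3, 3],
              [4, 3, 4, 3, 5]], dtype=float)
print(adjusted_cosine(R, 0, 2))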
Data Sparsity Problems
- Cold Start: too few ratings for new users and items; use default voting or hybrid methods.
- Default Voting: assign a default rating to items that only one of two users has rated, enlarging the set of co-rated items available for similarity computation.
Matrix Factorization
- Problem Formulation:
$$
\min_{P, Q} \sum_{(u, i) \in K} \left( r_{u,i} - p_u^{T} q_i \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 \right)
$$
$r_{u,i}$: Known rating; $P$: User feature matrix; $Q$: Item feature matrix; $\lambda$: Regularization parameter
- Stochastic Gradient Descent (SGD):
```python
import numpy as np

def stochastic_gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    for i in range(num_iters):
        for j in range(m):
            h = np.dot(X[j], theta)           # prediction for sample j
            loss = h - y[j]                   # prediction error
            gradient = X[j].T * loss          # gradient of the squared error for this sample
            theta = theta - alpha * gradient  # parameter update
    return theta
```
Matrix Factorization Algorithm
The slides contain cleaner pseudocode with a more intuitive derivation.
- Algorithm Steps:
```python
import numpy as np

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    Q = Q.T
    for step in range(steps):
        # SGD update for every known rating (R[i][j] > 0)
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - np.dot(P[i, :], Q[:, j])  # prediction error
                    for k in range(K):
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
        # regularized squared error over all known ratings
        e = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - np.dot(P[i, :], Q[:, j]), 2)
                    for k in range(K):
                        e = e + (beta / 2) * (pow(P[i][k], 2) + pow(Q[k][j], 2))
        if e < 0.001:
            break
    return P, Q.T
```
Hyperparameter Tuning
- Grid Search / Random Search / Bayesian Optimization / Evolutionary Algorithms
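A hedged grid-search sketch that reuses the `matrix_factorization` function shown above; the toy matrix, the `rmse` helper, and scoring on the training ratings (instead of a held-out split) are simplifications for illustration:

```python
import itertools
import numpy as np

def rmse(R, P, Q):
    """RMSE over the known ratings (R > 0) given the factorization P, Q."""
    pred = P @ Q.T
    mask = R > 0
    return np.sqrt(np.mean((R[mask] - pred[mask]) ** 2))

# Toy observed rating matrix (0 = unknown)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

best = None
for K, alpha, beta in itertools.product([2, 3], [0.0002, 0.001], [0.02, 0.1]):
    rng = np.random.default_rng(0)
    P0 = rng.random((R.shape[0], K))
    Q0 = rng.random((R.shape[1], K))
    P, Q = matrix_factorization(R, P0, Q0, K, steps=2000, alpha=alpha, beta=beta)
    score = rmse(R, P, Q)            # ideally evaluated on held-out ratings instead
    if best is None or score < best[0]:
        best = (score, {"K": K, "alpha": alpha, "beta": beta})
print(best)
```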
Advanced Collaborative Filtering Techniques
- Recursive CF:
- Apply the CF method recursively (first predict missing neighbor ratings, then use them) to cope with sparse datasets.
- Graph-based Methods:
- "Spreading activation" to use longer paths in user-item graphs.
Collaborative Filtering Issues
- Requires a user community.
- Suffers from data sparsity.
- Limited integration with other knowledge sources.
- Challenges in explaining results.
Matrix Factorization Techniques
- Problem Formulation:
  - Minimize the sum of squared errors with regularization (as above).
- Gradient Descent:
```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    for i in range(num_iters):
        h = np.dot(X, theta)              # predictions for all samples
        loss = h - y                      # per-sample errors
        gradient = np.dot(X.T, loss) / m  # average gradient over the whole batch
        theta = theta - alpha * gradient  # parameter update
    return theta
```
- Stochastic Gradient Descent:
```python
import numpy as np

def stochastic_gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    for i in range(num_iters):
        for j in range(m):                    # one update per training sample
            h = np.dot(X[j], theta)
            loss = h - y[j]
            gradient = X[j].T * loss
            theta = theta - alpha * gradient
    return theta
```
Optimizing Matrix Factorization
- Factors to consider:
- Number of latent factors
- Learning rate
- Amount of regularization
- Stopping criteria
Ranking-oriented Optimization
- Bayesian Personalized Ranking (BPR):
- Focuses on ranking correctness.
- Maximizes the margin between the predicted scores of positive (interacted) and negative (non-interacted) items.
- Uses implicit feedback for optimization.
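A sketch of a single BPR-MF update step under the standard BPR-Opt criterion; the function name, learning rate, and regularization values are illustrative:

```python
import numpy as np

def bpr_update(P, Q, u, i, j, lr=0.01, reg=0.01):
    """One BPR-MF SGD step: user u prefers observed item i over unobserved item j.
    Ascends ln(sigma(x_ui - x_uj)) with L2 regularization on the touched factors."""
    x_uij = P[u] @ Q[i] - P[u] @ Q[j]
    sig = 1.0 / (1.0 + np.exp(x_uij))        # = sigma(-x_uij), the gradient scaling factor
    p_u = P[u].copy()
    P[u] += lr * (sig * (Q[i] - Q[j]) - reg * P[u])
    Q[i] += lr * (sig * p_u - reg * Q[i])
    Q[j] += lr * (-sig * p_u - reg * Q[j])

# Training loop (sketch): repeatedly sample triples (u, i, j) with i drawn from the
# items user u interacted with and j from the remaining items, then call bpr_update.
rng = np.random.default_rng(0)
P, Q = rng.normal(size=(4, 8)), rng.normal(size=(10, 8))   # 4 users, 10 items, 8 factors
bpr_update(P, Q, u=0, i=2, j=7)
```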
Content Based Recommendation (Lecture 2)
Overview
- Content-based recommendation systems utilize item attributes and user profiles to make recommendations.
- Requires some information about the items and user preferences.
Definitions
- Content: Attributes such as genre, author, type, price, and keywords.
- User Profile: Describes what the user likes, based on previous interactions or explicit input.
Item Representation and Content
Structured vs. Unstructured Data
- Structured: Each item described by the same set of attributes.
- Unstructured: Free-text descriptions.
Examples
- Example items with attributes like title, genre, author, type, price, and keywords.
Simple Content Representation
Keyword Overlap
- Simple Approach: Compute similarity based on keyword overlap, e.g., using the Dice coefficient $\text{sim}(b_i, b_j) = \frac{2 \cdot |\text{keywords}(b_i) \cap \text{keywords}(b_j)|}{|\text{keywords}(b_i)| + |\text{keywords}(b_j)|}$.
Jaccard Similarity
- A method to measure similarity between items based on shared attributes.
Example
- Describing books with a set of keywords and calculating overlap with user profiles.
Term-Frequency - Inverse Document Frequency (TF-IDF)
TF-IDF Overview
- TF (Term Frequency): Measures how often a term appears in a document.
- IDF (Inverse Document Frequency): Reduces the weight of terms that appear in many documents.
Formula
- TF:
$$
TF(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}
$$
$f_{t,d}$: Frequency of term $t$ in document $d$; $\sum_{t'} f_{t',d}$: Total number of terms in document $d$
- IDF:
$$
IDF(t) = \log \frac{N}{n_t}
$$
$N$: Total number of documents; $n_t$: Number of documents containing term $t$
- TF-IDF:
$$
\text{TF-IDF}(t, d) = TF(t, d) \cdot IDF(t)
$$
Example
- Term frequency matrix and combined TF-IDF weights for documents.
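For illustration, a small TF-IDF example using scikit-learn (assuming it is available); the item descriptions are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up item descriptions; TF-IDF vectors can be compared to each other
# or to a user profile built from the text of liked items.
docs = [
    "detective story set in victorian london",
    "romantic comedy in modern new york",
    "victorian murder mystery with a detective",
]
vectorizer = TfidfVectorizer(stop_words="english")   # stop-word removal as discussed below
tfidf = vectorizer.fit_transform(docs)
print(cosine_similarity(tfidf[0], tfidf).round(2))   # similarity of item 0 to all items
```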
Improving the Vector Space Model
Techniques
- Stop Words Removal: Eliminate common words like "a", "the".
- Stemming: Reduce words to their root form.
- Size Cut-offs: Use top representative words to reduce noise.
Advanced Methods
- Lexical Knowledge: Feature selection using domain knowledge or tools like WordNet.
- Phrase Detection: Identify meaningful phrases rather than single words.
Content-Based RS Beyond Text
Numeric and Nominal Attributes
- Handling of attributes such as multimedia, categorical, and numeric attributes.
Example Approaches
- Fixed Weighting: TF-IDF-style weighting is questionable for attributes such as gender or age, so fixed attribute weights may be used instead.
- State-of-the-Art: Usage of attention layers and embeddings for multimedia.
Content-Based RS: Multimedia and Attribute Data
Combined Approach
- Aggregation of attribute-based and visual similarity using algorithms like Fuzzy D’Hondt.
Practical Examples
- Examples of combining similarities for recommendation tasks.
Query-Based Retrieval: Rocchio's Method
Overview
- Users provide feedback on relevant/irrelevant documents.
- The system adjusts the query based on feedback.
Formula
- Query update:
$$
Q_{i+1} = \alpha \, Q_i + \beta \frac{1}{|D^{+}|} \sum_{d \in D^{+}} d - \gamma \frac{1}{|D^{-}|} \sum_{d \in D^{-}} d
$$
$\alpha$: Weight for the original query; $\beta$: Weight for positive feedback (relevant documents $D^{+}$); $\gamma$: Weight for negative feedback (non-relevant documents $D^{-}$)
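A minimal Rocchio sketch over toy term vectors; the weights $\alpha = 1.0$, $\beta = 0.75$, $\gamma = 0.15$ are commonly used defaults, not values from the lecture:

```python
import numpy as np

def rocchio(query, pos_docs, neg_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant documents and away from non-relevant ones."""
    q = alpha * query
    if len(pos_docs):
        q = q + beta * np.mean(pos_docs, axis=0)
    if len(neg_docs):
        q = q - gamma * np.mean(neg_docs, axis=0)
    return q

# Toy 4-term vocabulary
query = np.array([1.0, 0.0, 1.0, 0.0])
pos = np.array([[1.0, 1.0, 1.0, 0.0]])   # documents marked relevant
neg = np.array([[0.0, 0.0, 0.0, 1.0]])   # documents marked non-relevant
print(rocchio(query, pos, neg))
```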
Bayesian Relevance Feedback
- Evaluate all points in the solution space based on distance to positive examples.
Explicit Decision Models
Decision Trees
- Trees are constructed from training data using item features to partition test examples.
Rule Induction
- Based on algorithms like RIPPER, effective for certain types of classification tasks.
Feature Selection
Strategies
- Use domain knowledge, frequency-based selection, and measures like mutual information.
Practical Tips
- Evaluate the utility of keywords and construct ranked lists.
Limitations and Challenges
Common Issues
- Overspecialization, ramp-up phase, and limited content extraction capabilities.
Solutions
- Use additional sources for user preferences, employ diversity re-ranking methods.
Summary
- Content-based techniques do not require a user community but may overfit and provide overly similar recommendations.
Knowledge-Based Recommendation
Basic I/O Relationship
- Systems that recommend based on explicit user needs.
Types
- Constraint-Based: Based on a set of recommendation rules.
- Case-Based: Items retrieved using similarity measures.
Examples
- Constraint-based GUI and practical use cases for critiquing in case-based systems.
Evaluation of Recommender Systems (Lecture 4)
Overview
- Evaluating recommender systems is essential to determine their effectiveness and identify areas for improvement.
- Numerous evaluation techniques exist, each suited for different application domains.
Motivation
- Key research questions include:
- Is a recommender system more efficient in terms of accuracy, user satisfaction, response time, serendipity, online conversion, ramp-up efforts, etc.?
- Do customers like or buy recommended items?
- Are customers satisfied with recommendations after purchase?
- Do recommendations increase site usage?
Scientific Requirements for Evaluation
Reproducibility
- Scientific experiments must ensure reproducibility to verify findings.
- Methodology: Describe the methodology thoroughly, follow systematic procedures, and document all decisions.
Evaluation Criteria
- Validity:
- Internal Validity: Effects are due to controlled conditions, not participant differences or environmental factors.
- External Validity: Results are generalizable to other groups or situations.
- Reliability: Consistency and absence of errors in data and measurements.
- Sensitivity: Differences between the evaluated approaches should be reflected in measurable differences in the results.
Components of Evaluation Research
Subjects
- Typically specific subgroups like online customers or web users receiving personalized recommendations.
Settings
- Lab Studies:
- Created for the study with controlled variables.
- Potential issues with participant motivation.
- Field Studies (Online Evaluation):
- Conducted in real-world environments with intrinsically motivated users.
- Can be costly.
- Offline Studies:
- Based on real-world data, analyzing after-the-fact.
- Recent improvements towards de-biasing and multi-criterial evaluation.
Measured Variables
- Independent Variables: Static throughout the experiment, controlled by the evaluation design (e.g., recommendation algorithm).
- Dependent Variables: Influenced by independent variables and control conditions.
Design Types
- Experimental: Manipulates independent variables to observe their impact on dependent variables.
- Quasi-Experimental: Lacks random assignment of subjects to treatments.
- Non-Experimental: Uses observational research methods like surveys or case studies.
- Cross-Sectional: Analyzes relations among variables measured in different groups.
- Case Studies: Combines various methods to investigate real-life contexts.
Evaluation Settings and Techniques
Online Experiments
- A/B Testing:
- Users are randomly directed to recommendations from different algorithms.
- Evaluate metrics close to the target behavior (e.g., profit for retailers, subscriptions for Netflix).
Lab Studies
- Includes preference elicitation phases and features like questionnaires or interviews to capture harder-to-detect aspects like trust and novelty.
Offline Evaluation
- Datasets: Collections of synthetic or historical user interaction data, with considerations for dataset size, sparsity, and domain.
Methodology
- Data Splits:
- Holdout: Randomly withholds a set of ratings as the test set.
- N-fold Cross-Validation: Divides data into partitions, each used once as the test set.
- Leave-P-out Validation: Withholds $p$ rated items per user as the test set; with $p = 1$ (leave-one-out), one rated item at a time is tested for each user.
Bias and Variance
- Understanding biases in data is crucial as they affect evaluation. Mitigation strategies include using temporal data splits and recording impressions.
Metrics for Evaluating Recommender Systems
Relevance and Ranking Metrics
- Precision and Recall:
- Precision: A measure of exactness, determining the fraction of relevant items retrieved out of all items retrieved.
- Recall: A measure of completeness, determining the fraction of relevant items retrieved out of all relevant items.
Rank-Based Metrics
- Normalized Discounted Cumulative Gain (nDCG): Uses a logarithmic reduction factor.
Advanced Metrics
- Mean Average Precision (MAP): Average precision values after each successful prediction.
- Novelty and Diversity:
- Novelty: Measures how new items are to the user.
- Diversity: Measures how dissimilar the recommended items are to each other (the opposite of intra-list similarity).
- Serendipity: Combines relevance and unexpectedness.
Practical Challenges and Solutions
Causality and Biases
- Evaluation must account for causality and biases in data.
- Strategies: Use temporal data splits, simulate evaluation, and apply de-biasing techniques.
Offline Evaluation Metrics
- Metrics depend on research questions and hypotheses, including relevance, rating error, novelty, diversity, coverage, and serendipity.
Summary and Best Practices
Key Takeaways
- Proper evaluation requires a combination of experimental design, methodology, and metrics tailored to specific research questions.
- Awareness of biases and selection of appropriate metrics are critical for meaningful evaluations.
Future Directions
- Explore beyond-accuracy metrics and ensure evaluations consider long-term effects and user trust.
Recap
- Relevance and Ranking Metrics:
  - Precision: $\text{Precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}$
  - Recall: $\text{Recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}$
  - F1 Score: $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
  - Mean Average Precision (MAP): mean over users of the average precision computed after each hit in the ranked list.
- Rank-Based Metrics:
  - Rank Score: rewards hits near the top of the list with an exponentially decaying (half-life) weight.
  - Liftindex: splits the ranked list into sections and weights hits by a linearly decreasing factor per section.
  - Normalized Discounted Cumulative Gain (nDCG): uses a logarithmic reduction factor,
$$
\text{nDCG@}k = \frac{\text{DCG@}k}{\text{IDCG@}k}, \qquad \text{DCG@}k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i + 1)}
$$
- Advanced Metrics:
  - Novelty: measures how unfamiliar the recommended items are to the user (often approximated via item popularity).
  - Diversity: average pairwise dissimilarity of the recommended items,
$$
\text{Diversity} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \text{diversity}(i, j)
$$
  - Serendipity: combines relevance and unexpectedness.
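A small sketch of per-user Precision@k, Recall@k, and binary-relevance nDCG@k, assuming a ranked recommendation list and a set of held-out relevant items:

```python
import numpy as np

def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and Recall@k for a single user."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k, hits / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance nDCG@k with a log2 position discount."""
    gains = [1.0 if item in relevant else 0.0 for item in recommended[:k]]
    dcg = sum(g / np.log2(pos + 2) for pos, g in enumerate(gains))
    idcg = sum(1.0 / np.log2(pos + 2) for pos in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0

recommended = ["a", "b", "c", "d", "e"]   # ranked list produced by the recommender
relevant = {"b", "e", "f"}                # held-out items the user actually liked
print(precision_recall_at_k(recommended, relevant, k=5))
print(ndcg_at_k(recommended, relevant, k=5))
```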
Linear Monotone Preference Model (Lecture 5)
Overview
- The Linear Monotone Preference Model (LMPM) is used to handle user requirements and conflicting multicriterial preferences.
- Focus on creating an ordering system that is intuitive and explainable to users.
Conflicting Requirements
- Often, no single product can satisfy all user requirements perfectly.
- The goal is to find products that come close to meeting user preferences.
Ordering and Preference Scales
Human-Intuitive Ordering
- Ordering should be explainable and relatable to users.
- Example: Likert scale, which is a common method for capturing user preferences (e.g., from 1 to 5).
User Preferences
- User preferences are often scaled and have ideal values.
- In competitions, clear winners exist, but in e-commerce, user preferences vary significantly.
- Marketing segmentation and faceted browsing are used to specify ideal values for different user segments.
Preferential Sets and Fuzzy Sets
Preferential Sets
- Preferential sets are a variant of fuzzy sets intended to model human linear preferences.
- Example: Price categories can be modeled as fully cheap, reasonable, and expensive on a linear scale from low to high.
Preference Scale
- A typical preference scale ranges from 0 to 1, where 0 is least preferred and 1 is most preferred.
Decathlon Data Example
Multi-Criterial Scale Points
- Decathlon points can be used as a basis for making different attributes commensurable.
Dominance and Pareto Ordering
- Dominance in multi-criteria decision making is similar to Pareto ordering, where an item is considered better if it is not worse in any criterion.
- Example: In a Decathlon, an athlete who scores higher in all events dominates others.
Data Cube and Preference Cube
- Data Cube: Represents raw data.
- Preference Cube: Transforms data into a preference-based format.
Sum of Points
- The sum of points can be used to make a Decathlon linear by aggregating individual event scores.
Linear Monotone Preference Model (LMPM)
Definition
- LMPM is a method to order items based on multiple criteria by transforming attributes into preference scores.
- The model aggregates these scores to provide an overall preference rating.
Aggregation Function
- The aggregation function $t$ combines individual attribute preferences into a global preference score, e.g., a weighted average $t(f_1(x), \ldots, f_m(x)) = \sum_{i} w_i \, f_i(x)$ (any monotone aggregation can be used).
Contour Lines
- Contour lines connect points with the same preference score in the preference cube and can be visualized in the data cube.
Building the LMPM Step-by-Step
Triangular Preference Functions
- Local preferences can be modeled using triangular functions: the preference degree is 1 at the user's ideal value and decreases linearly to 0 at the boundaries of the acceptable range.
Trapezoidal Preference Functions
- Trapezoidal functions provide a more flexible way to model preferences with ideal intervals: the preference degree is 1 on the whole ideal interval and decreases linearly to 0 outside it (see the sketch below).
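A sketch of triangular and trapezoidal local preference functions with a weighted-average aggregation; the attribute names, breakpoints, and weights are made-up examples:

```python
def triangular(x, left, ideal, right):
    """Local preference: 1 at the ideal value, falling linearly to 0 at the borders."""
    if x <= left or x >= right:
        return 0.0
    if x <= ideal:
        return (x - left) / (ideal - left)
    return (right - x) / (right - ideal)

def trapezoidal(x, left, ideal_lo, ideal_hi, right):
    """Local preference with an ideal interval [ideal_lo, ideal_hi] of full preference."""
    if ideal_lo <= x <= ideal_hi:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < ideal_lo:
        return (x - left) / (ideal_lo - left)
    return (right - x) / (right - ideal_hi)

def aggregate(prefs, weights):
    """Monotone aggregation t: here a weighted average of local preference degrees."""
    return sum(w * p for w, p in zip(weights, prefs)) / sum(weights)

# Hypothetical user: prefers a price around 500 and a display size of 14-16 inches
price_pref = triangular(450, left=200, ideal=500, right=900)
size_pref = trapezoidal(15.6, left=12, ideal_lo=14, ideal_hi=16, right=18)
print(aggregate([price_pref, size_pref], weights=[2, 1]))
```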
Calculating Preference Degrees
Mapping Data to Preferences
- Objects are mapped from the data cube to the preference cube based on their attribute values and user preference functions.
Aggregation and Ordering
- Overall preference is given by aggregating the local preference degrees, e.g., $t(f_1(x), \ldots, f_m(x))$.
- Contour lines in the data cube represent items with the same preference degree.
Example Calculation
- For a user with given local preference functions and aggregation, objects B, D, and E are mapped to preference degrees based on their attribute values.
Dynamical Model and Time Simulation
Moving Ideal Points
- User preferences can evolve over time, with ideal points shifting based on user interactions.
- The model can simulate this dynamic preference adjustment.
Example Sessions
- Starting with initial preferences $f_0$ and aggregation $t_0$:
- Time 0: Initial preference contour.
- Time 1: User clicks update the ideal point.
- Time 2: New ideal point after further interactions.
Aggregation and Visualization
Aggregation Function
- The aggregation function combines attribute preferences into a global preference score, ensuring Pareto ordering:
Contour Line Visualization
- Contour lines represent levels of global preference and help visualize the preference landscape in both data and preference cubes.
Practical Applications
- Used in web e-shops to order items based on user-specific preference models.
- Each user can have their own preference scales and aggregation methods, allowing for personalized recommendations.
Extensions and Practical Considerations
Handling Complex Preferences
- Extend the basic model to handle more complex preference shapes (e.g., trapezoidal, mixed shapes).
Market Segmentation
- Different user segments can have different ideal points and aggregation functions, reflecting diverse preferences.
Real-World Applications
- Applications in e-commerce, personalized recommendations, and user preference modeling.
Simulation and Extensions
- Simulations can show how preferences evolve over time and how different aggregation methods impact the ordering of items.
Querying Threshold Algorithm (Lecture 6)
Overview
- Focuses on querying techniques to retrieve the top-k results from a dataset.
- Includes various data models and algorithms for efficient querying without scanning the entire dataset.
User Preferences and Data Cube
Preference Model
- Function: the overall preference of an object $x$ is $t(f_1(x), \ldots, f_m(x))$, where $t$ is a combination (aggregation) function and $f_1, \ldots, f_m$ are local preference functions.
Visualization
- Contour lines represent different levels of preferences.
- Example:
Threshold Algorithms (FLN TA)
Web Service Motivation
- FLN data model viewed as Linear Monotone Preference Model (LMPM).
- FLN TA (Threshold Algorithm) is designed for top-k querying.
Data Model
- Objects: each object $x$ has $m$ attributes.
- Scores: grades $x_1, \ldots, x_m \in [0, 1]$, one per sorted list $L_i$.
- Combination Function: a monotone aggregation $t(x_1, \ldots, x_m)$ giving the overall grade of $x$.
Threshold Algorithm Steps
- Sorted Access:
  - Access each list $L_i$ sequentially, in decreasing order of grade.
  - For every object seen, perform random access to the other lists to find its remaining grades.
  - Compute the overall grade $t(x_1, \ldots, x_m)$.
  - Remember the top-k objects seen so far.
- Define Threshold:
  - For each list $L_i$, let $\underline{x}_i$ be the grade of the last object seen under sorted access.
  - Define the threshold $\tau = t(\underline{x}_1, \ldots, \underline{x}_m)$; stop as soon as the current top-k objects all have grades $\geq \tau$.
- Output:
  - The set $Y$ containing the top-k objects with the highest grades (see the sketch below).
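A compact TA sketch assuming $m$ lists sorted by grade and a dict simulating random access; the data and the average aggregation are illustrative:

```python
def threshold_algorithm(sorted_lists, grades, t, k):
    """FLN Threshold Algorithm sketch.
    sorted_lists: m lists of (object, grade) pairs, each sorted by grade descending
    grades:       dict object -> [grade_1, ..., grade_m]  (simulates random access)
    t:            monotone aggregation function over m grades
    """
    top_k, seen, depth = {}, set(), 0
    while True:
        last_grades = []
        for lst in sorted_lists:
            obj, grade = lst[depth]            # sorted access
            last_grades.append(grade)
            if obj not in seen:                # random access for newly seen objects
                seen.add(obj)
                top_k[obj] = t(grades[obj])
        # keep only the k best objects found so far
        top_k = dict(sorted(top_k.items(), key=lambda kv: kv[1], reverse=True)[:k])
        tau = t(last_grades)                   # threshold from the last grades seen
        if len(top_k) == k and min(top_k.values()) >= tau:
            return top_k
        depth += 1

# Hypothetical data: two score lists, average as the aggregation
lists = [[("a", 0.9), ("b", 0.8), ("c", 0.1)], [("b", 1.0), ("a", 0.5), ("c", 0.2)]]
grades = {"a": [0.9, 0.5], "b": [0.8, 1.0], "c": [0.1, 0.2]}
print(threshold_algorithm(lists, grades, t=lambda g: sum(g) / len(g), k=2))
```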
Algorithm Illustration
Example
- Sequential and random access steps illustrated graphically.
- Demonstrates computation and comparison of object grades to maintain top-k list.
Heuristics and Optimizations
Incremental Threshold Algorithm
- Adjustments to threshold algorithm for incremental improvements.
- Uses frequency and depth of sequential access to optimize performance.
FLN Data Model and Multiuser Extensions
Multiuser Model
- Extends FLN data model to multiuser scenarios.
- Overall Preference:
Proof of Correctness
FLN Theorem 4.1
- Monotone Aggregation Function: $t$ is monotone if $x_i \leq y_i$ for all $i$ implies $t(x_1, \ldots, x_m) \leq t(y_1, \ldots, y_m)$; for monotone $t$, TA correctly finds the top-k objects.
Proof Outline
- Assume an object $z$ that was never seen by the algorithm.
- Use bounds: each grade $z_i$ is at most the last grade $\underline{x}_i$ seen in list $L_i$ under sorted access, so by monotonicity $t(z_1, \ldots, z_m) \leq t(\underline{x}_1, \ldots, \underline{x}_m) = \tau$, where $\tau$ is the threshold value; hence $z$ cannot outrank any of the returned top-k objects.
No Random Access (NRA) Algorithm
NRA Algorithm Steps
- Access Each List (sorted access only; no random accesses):
  - For each item $x$ seen, calculate a worst-case (lower) bound $W(x)$ by aggregating its known grades with 0 (the minimum grade) substituted for the unknown ones.
  - For each item $x$, calculate a best-case (upper) bound $B(x)$ by substituting the last grade seen in each list for the unknown ones.
- Viability Check:
  - An object $x$ is viable if its best-case bound $B(x)$ is at least $M_k$, the $k$-th largest worst-case bound seen so far.
  - Stop when at least $k$ objects have worst-case bounds not below the best-case bound of any other object.
- Output Set:
  - The set $Y$ of these $k$ objects contains the top-k objects.
Further Notes
Correctness of NRA
- Theorem: NRA correctly finds top-k objects if the aggregation function is monotone.
Conclusion
- Comparison of different access methods, optimization strategies, and multiuser scenarios.
- Discussion on FLN TA and NRA optimality, and practical implementations.
Hybrid Recommender Systems (Lecture 7)
Overview
- Hybrid recommender systems combine various recommendation methods to enhance performance and mitigate the shortcomings of individual methods.
- Knowledge-based: Recommends items based on explicit user requirements.
- Content-based: Recommends items similar to those the user liked in the past.
- Collaborative: Recommends items that are popular among similar users.
Hybridization Designs
Hybridization Design Types
- Parallel Use: Several recommendation systems run simultaneously, and their results are combined.
- Monolithic: A single recommendation component exploits features from multiple paradigms.
- Pipelined: A sequence of recommendation systems where each system processes the output of the previous one.
Monolithic Hybridization Design
Feature Augmentation
- Content-Boosted Collaborative Filtering:
- Uses content features to create additional synthetic ratings.
- Example:
- Alice likes Item 1 and Item 3.
- Item 7 is similar to Items 1 and 3 by a degree of 0.75.
- Therefore, Alice is predicted to like Item 7 with a rating of 0.75.
- This approach reduces sparsity in the user-item rating matrix.
- Peers with more co-rated items have more influence.
- Higher confidence in predictions when the user has many ratings.
Visual Bayesian Personalized Ranking (BPR)
- A latent factor model that integrates visual features.
- Part of the item embedding is static, based on pre-trained visual feature extractors.
- Dimensionality reduction is applied to balance collaborative filtering and content-based tradeoffs.
- Trained using BPR loss to rank items effectively.
Parallelized Hybridization Design
Weighted Combination
- Combines the output of multiple recommendation systems using a weighting scheme.
- Example:
- Weights can be dynamically adjusted based on performance metrics.
- Dynamic Adjustment:
- Initial uniform weight distribution.
- Weights are adapted to minimize the prediction error for each user.
- Uses multi-armed bandit strategies to optimize weights based on empirical data.
Estimating Weights
- Empirical Bootstrapping:
- Historic data is used to compute different weightings.
- The best-performing weighting scheme is selected.
- Dynamic Adjustment Example:
- Start with uniform weights.
- Adjust weights incrementally to reduce Mean Absolute Error (MAE).
Switching and Mixed Hybridization
Switching
- Uses an oracle to decide which recommender to use based on predefined criteria.
- All weights except one are set to zero.
- Example:
- If there are too few ratings, use a knowledge-based recommender.
- Otherwise, use a collaborative filtering approach.
Mixed
- Results from different recommenders are combined at the user interface level.
- Each technique's results are presented together.
- Example:
- Thompson Sampling:
- Uses Beta distribution to model the success probability of each recommender.
- Recommender with the highest sampled value is selected for each position in the recommendation list.
Pipelined Hybridization Design
Cascade
- Successive recommenders refine the recommendations of the predecessors.
- Example:
- The first recommender removes no-go items.
- The second recommender assigns scores and refines the list.
- This method is effective in large-scale scenarios where initial filters reduce the candidate set.
Meta-Level
- A recommender system builds a model used by the subsequent system.
- Example:
- A content-based recommender builds user models with weighted term vectors.
- A collaborative filter uses these models to find similar users and make recommendations.
Limitations and Considerations
Limitations of Hybrid Strategies
- Few empirical studies compare different hybrid strategies.
- Most datasets lack the required data to compare multiple recommendation paradigms effectively.
- Considerations:
- Monolithic Hybrids: Require preprocessing but incorporate more knowledge.
- Parallel Hybrids: Need careful score matching across different recommenders.
- Pipelined Hybrids: Work well when combining antithetic approaches.
Example Formulas and Metrics
Weighted Combination Formula
$$
rec_{\text{weighted}}(u, i) = \sum_{k=1}^{n} \beta_k \cdot rec_k(u, i)
$$
where $\beta_k$ is the weight assigned to recommender $k$.
Dynamic Weight Adjustment
- Mean Absolute Error (MAE) calculation:
$$
\text{MAE} = \frac{1}{|R|} \sum_{(u, i) \in R} |\hat{r}_{u,i} - r_{u,i}|
$$
where $R$ is the set of test ratings, $\hat{r}_{u,i}$ the predicted rating, and $r_{u,i}$ the actual rating.
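A tiny sketch of the weighted parallel hybrid; the two score vectors and the weights are made up and assumed to be on a comparable (normalized) scale:

```python
import numpy as np

def weighted_hybrid(score_lists, weights):
    """rec_weighted(u, i) = sum_k beta_k * rec_k(u, i), assuming comparable score scales."""
    return sum(w * np.asarray(s) for w, s in zip(weights, score_lists))

# Made-up normalized scores of two recommenders for five candidate items
cf_scores = [0.9, 0.2, 0.6, 0.1, 0.7]
cb_scores = [0.4, 0.8, 0.5, 0.3, 0.6]
combined = weighted_hybrid([cf_scores, cb_scores], weights=[0.6, 0.4])
print(np.argsort(-combined)[:3])   # indices of the hybrid's top-3 items
```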
Reciprocal Recommender Systems (Lecture 8)
People Recommender Systems
Overview
- Focuses on recommending people, such as friends in social networks, mentors, teammates, or dating partners.
- Reciprocal recommender systems consider mutual preferences, aiming to find matches where both parties have a high likelihood of liking each other.
Reciprocal Recommender Systems in Online Dating
Modeling Likes and Dislikes
- Unified Model for Compatibility:
  - $C_{\text{pos}}(A, B)$: Compatibility based on liked attributes.
  - $C_{\text{neg}}(A, B)$: Compatibility based on disliked attributes.
- Harmonic Mean for Compatibility:
  - Penalizes large differences between the two directions of compatibility.
  - Uses a harmonic mean to aggregate the compatibility scores:
$$
\text{Compatibility}(A, B) = \frac{2 \cdot C(A \rightarrow B) \cdot C(B \rightarrow A)}{C(A \rightarrow B) + C(B \rightarrow A)}
$$
Insights from Recent Studies
- 2016 Study: Design of Reciprocal Recommendation Systems for Online Dating
- Emphasizes the importance of balancing likes and dislikes.
- Uses socio-demographic links to improve recommendations.
Practical Considerations
- Re-ranking for Active Users:
- Prioritize recently active and registered users to improve engagement.
- Pseudo Embeddings:
- Generate embeddings for new users based on metadata such as location.
- Pre-computed Recommendations:
- Store recommendations for existing users for faster query responses.
- Cold-Start Problem:
- Run inference servers for new users to generate recommendations in real-time.
People Recommender Systems in Job Recruitment
Job Recommender Systems
- Dual Stakeholders:
- Recruiters and job seekers both have preferences that need to be considered.
- Sources of Knowledge:
- Interaction data (e.g., clicks, applications).
- CVs (content-representation of users).
- Job descriptions (content-representation of items).
Effectiveness of Job Title-Based Embeddings
- Embedding Methods:
- Pre-trained Word2Vec.
- Self-trained Word2Vec.
- Self-trained Doc2Vec.
- Results:
- ItemKNN performs best when there is sufficient interactional data.
- Job title-based embeddings outperform resume-based embeddings in cold-start scenarios.
- Self-trained embeddings perform better than pre-trained ones.
Follow-Up Studies
Sentence-Pair Classification for Algorithmic Recruiting (2023)
- Addressing the Semantic Gap:
- Different language styles between resumes and job postings create a semantic gap.
- Use BERT's next-sentence-prediction feature to bridge this gap.
- Approach:
- Train a model to map resumes to job postings using sentence-pair classification.
Career Path Prediction
Task Definition
- Career Path Prediction:
- Predict the next career step based on a sequence of past experiences.
- Each experience is represented by a title, description, and ESCO occupation labels.
Approach
- Skill-Based Prediction:
- Use the ESCO ontology for skill-based matching.
- Description-Based Prediction:
- Employ contrastive learning to transform embeddings of past experiences to predict the next job title.
- Linear transformation maps previous experiences to the current job posting.
Practical Considerations and Challenges
Challenges in Reciprocal Recommender Systems
- Data Privacy:
- Ensuring user data privacy while collecting interaction data.
- Cold-Start Problem:
- Generating accurate recommendations for new users with minimal interaction data.
- Semantic Gap:
- Bridging the gap between different language styles in resumes and job postings.
Improving Recommendations
- Contrastive Learning:
- Learn to distinguish between similar and dissimilar job titles and descriptions.
- Language-Agnostic Networks:
- Develop models that can handle different languages and dialects to improve job matching.
Deep learning (Lecture 9)
Deep Learning for Recommender Systems
Why Deep Learning?
- Deep learning has achieved significant success in various fields, including image recognition (e.g., ImageNet challenge).
- Deep learning models can handle complex architectures and are universal function approximators.
Neural Networks and Learning
Neural Model
- Neuron: The basic unit in a neural network, which processes inputs and produces an output through an activation function.
Feedforward Multilayered Network
- Stochastic Gradient Descent (SGD):
$$
\theta \leftarrow \theta - \eta \, \nabla J(\theta)
$$
where $\theta$ represents the model parameters, $\eta$ is the learning rate, and $\nabla J(\theta)$ is the gradient of the cost function.
- Backpropagation:
- Used to compute gradients for each layer to update the model parameters.
- Challenges include vanishing gradients and slow learning.
Modern Deep Networks
Key Ingredients
- Rectified Linear Activation (ReLU): $f(x) = \max(0, x)$
- Dropout: A technique to prevent overfitting by randomly setting some neuron outputs to zero during training.
- Mini-batches: Using small subsets of the training data to compute gradients and update weights.
Deep Learning for Recommender Systems
Main Topics as of 2017
- Learning item embeddings
- Deep collaborative filtering
- Feature extraction from content
- Session-based recommendations with Recurrent Neural Networks (RNNs)
Research Directions
Best Practices
- Start simple and add complexity gradually.
- Optimize code for GPU/CPU performance.
- Ensure scalability and use open-source code.
- Avoid very small datasets and irrelevant tasks like rating prediction.
Item Embeddings and 2vec Models
Embeddings
- Definition: A learned real-value vector representing an entity.
- Use in Recommenders: Initialization of item representation in advanced algorithms, item-to-item recommendations.
Matrix Factorization (MF)
- Learning User and Item Embeddings: user and item vectors are learned so that their dot product approximates the observed rating or interaction, $\hat{r}_{u,i} = p_u \cdot q_i$.
Word2Vec
Continuous Bag of Words (CBOW)
- Model:
- Maximizes the probability of a target word given its context.
- Input: One-hot encoded words.
- Output: Likelihood of words given the context using softmax.
Skip-gram
- Model:
- Maximizes the probability of context words given a target word.
- Uses shared weights for input and output words.
- Negative Sampling: Efficiently trains the model by updating a subset of negative samples.
Paragraph2Vec (Doc2Vec)
- Learns representations for paragraphs/documents based on the CBOW model, adding global context.
Prod2Vec
Skip-gram on Products
- Model:
- Input: the $i$-th product purchased by the user.
- Context: Other purchases of the user.
- User embeddings added as global context.
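A Prod2Vec-style sketch using gensim's Word2Vec on purchase sequences (assuming gensim is available); product IDs and hyperparameters are illustrative:

```python
from gensim.models import Word2Vec

# Treat each user's purchase sequence as a "sentence" and each product ID as a "word",
# then train skip-gram with negative sampling (product IDs below are made up).
purchase_sequences = [
    ["p42", "p7", "p13", "p42"],
    ["p7", "p99", "p13"],
    ["p42", "p99", "p7"],
]
model = Word2Vec(
    sentences=purchase_sequences,
    vector_size=32,   # embedding dimension
    window=3,         # context window within the sequence
    sg=1,             # skip-gram
    negative=5,       # negative sampling
    min_count=1,
    epochs=50,
)
print(model.wv.most_similar("p42", topn=2))   # item-to-item neighbors
```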
Meta-Prod2Vec
- Extends Prod2Vec by incorporating item metadata into embeddings, improving recommendations by using side information.
Feature Extraction from Content
Images, Text, and Audio
- Convolutional Neural Networks (CNNs): Extract features from images.
- RNNs and BERT: Extract features from text.
- Applications include fashion, video, and product reviews.
Music Representations
Audio Features in Recommender Systems
- Integration with MF:
- Minimize the distance between music features and MF’s feature vectors.
- Replace item features with music features to minimize the original loss.
Neural Collaborative Filtering (NCF)
NeuMF (He et al. 2017)
- Uses a Multi-Layer Perceptron (MLP) instead of a simple dot product to learn more complex dependencies.
Session-Based Recommendations with RNNs
GRU4Rec (Hidasi et al. 2015)
- Network Structure:
- Input: One-hot encoded item ID.
- GRU layer(s) to process sequential data.
- Output: Scores over all items, targeting the next item in the session.
- Techniques:
- Session-parallel mini-batching.
- Output sampling to handle large item sets.
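A minimal GRU4Rec-like model sketch in PyTorch (assuming torch is available); it is not the original implementation and omits session-parallel mini-batching and output sampling:

```python
import torch
import torch.nn as nn

class GRU4RecLike(nn.Module):
    """Embedded item IDs -> GRU -> scores over all items for the next click."""

    def __init__(self, n_items, emb_dim=64, hidden_dim=100):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_items)

    def forward(self, item_ids):
        # item_ids: (batch, session_length) indices of clicked items
        h, _ = self.gru(self.item_emb(item_ids))
        return self.out(h[:, -1, :])          # next-item scores per session

model = GRU4RecLike(n_items=1000)
sessions = torch.randint(0, 1000, (8, 5))     # batch of 8 sessions, 5 clicks each
print(model(sessions).shape)                  # torch.Size([8, 1000])
```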
Follow-Up Development
- Adoption of new networks from Natural Language Processing (NLP) to Recommender Systems.
- Transformers4Rec by NVIDIA Merlin for modern deep learning in recommendations.
Utilizing More Information
Meta-Prod2Vec and Content2Vec
- Meta-Prod2Vec: Embeds metadata along with items to improve recommendations.
- Content2Vec: Uses multimodal information (e.g., images, text) to learn pairwise item similarities.
Context-Aware Recommender Systems
- Incorporate context into matrix factorization and other models for personalized recommendations.
Fairness and Bias in Recommender Systems
Addressing Biases
- Selection Bias: Users choose which items to rate, causing non-representative samples.
- Exposure Bias: Users only interact with exposed items.
- Conformity Bias: Users tend to behave similarly to others.
- Position Bias: Users interact more with items in higher positions on the recommendation list.
Outlook
Future Directions
- Complex, task-specific architectures.
- Scalability challenges and solutions (e.g., sparse EASE variants).
- Incorporating user control and large language models (e.g., conversational recommender systems).
- Emphasizing the evaluation of recommender systems in a broader ecosystem context.