Welcome to my project showcase! Here, you’ll find a collection of my most significant works in AI and data science. Each project represents a unique challenge and demonstrates my approach to solving complex problems using cutting-edge technologies.
My Projects
1. Agent 2048 Visually Masters Strategic Gameplay Through Data, Rewards, and RL
Experiment and Research (April 2025)
This project investigates whether Vision-Language Models (VLMs) can learn to play the strategic game 2048 effectively by interpreting the board state directly from visual input, guided by reinforcement learning.
Key Highlights:
- Successfully trained Qwen2.5-VL 7B to play 2048 by processing board images using GRPO
- Adapted multi-component reward functions (density, max tile, survival, format) to guide learning based on visually-informed actions
- Demonstrated that VLMs can translate visual understanding into sequential strategic decisions
- Leveraged 4-bit quantization and LoRA for efficient fine-tuning, including vision encoder adaptation
- Showcased RL’s potential for teaching VLMs complex visual reasoning and planning
Impact: This research points toward using VLMs for visually complex sequential tasks, such as robotics, navigation, and dynamic visual analysis, where understanding the visual state drives decision-making.
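GRPO is the training method behind most of the projects on this page. Its core idea can be shown in a few lines. Each completion sampled for a prompt is scored, and its advantage is its reward normalized against the mean and standard deviation of the other completions in the same group. Because of this, no learned value function is needed.

This is a minimal sketch of that one step only. It omits clipping, the KL penalty, and the policy update. The `eps` constant and the choice of population standard deviation are assumptions, since implementations differ on these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the other completions sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this detail
    # Above-average completions get positive advantages, below-average ones negative.
    # If all rewards are equal, every advantage is 0 and the group carries no signal.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four completions answering the same 2048 board.
    print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
```

For these four hypothetical rewards, the best completion receives an advantage of roughly 1.41 and the worst roughly -1.41. The two average completions receive 0.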
2. Small Model, Big Equations: Enhancing LaTeX OCR with GRPO
Experiment and Research (14 April 2025)
This project tackles the challenge of accurate LaTeX Optical Character Recognition (OCR) from images, specifically investigating whether smaller Vision Language Models (VLMs) can be trained with reinforcement learning (GRPO) to reach fidelity comparable to that of larger models.
Key Highlights:
- Successfully fine-tuned Qwen 2.5-VL (3B & 7B models) for improved LaTeX OCR using GRPO.
- Developed a novel “Validity-Gated” reward function prioritizing structural correctness (balanced braces heuristic) before rewarding textual similarity (ROUGE-L).
- Improved ROUGE-L F1 scores on a held-out test set for both models compared to their base versions (3B: +0.0640, 7B: +0.0633).
- Enhanced output reliability: Increased heuristic validity rate (to 100% for 7B) and dramatically reduced output format errors (especially for the 3B model).
- Demonstrated high efficiency: Achieved strong results using only 1000 training samples and ~2 hours of training time on 3x RTX 4090s (leveraging 4-bit quantization & LoRA).
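The "Validity-Gated" reward described above can be sketched as follows. This version makes several assumptions that are not the project's exact code:

- The validity heuristic is a simple brace-balance check.
- Textual similarity is token-level ROUGE-L F1, computed from the longest common subsequence.
- Tokens are split on whitespace.
- Invalid output earns a reward of zero.

```python
def braces_balanced(latex: str) -> bool:
    """Heuristic validity check: every '{' has a matching '}'."""
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            i += 2  # skip the escaped character, so \{, \} and \\ are not counted
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # closing brace with no open group
                return False
        i += 1
    return depth == 0


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Token-level ROUGE-L F1 via the longest common subsequence (LCS)."""
    c, r = candidate.split(), reference.split()
    if not c or not r:
        return 0.0
    # dp[i][j] = LCS length of c[:i] and r[:j]
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, ct in enumerate(c, 1):
        for j, rt in enumerate(r, 1):
            if ct == rt:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(c)][len(r)]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def validity_gated_reward(candidate: str, reference: str) -> float:
    """Structural validity first: invalid LaTeX earns nothing, valid LaTeX earns ROUGE-L F1."""
    if not braces_balanced(candidate):
        return 0.0
    return rouge_l_f1(candidate, reference)
```

Gating on validity means the model cannot collect similarity reward for near-miss output that would fail to compile. This matches the stated priority of structural correctness before textual similarity.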
Impact: This work shows that targeted RL fine-tuning can make high-quality, reliable mathematical OCR accessible even with smaller, more efficient VLMs. This has implications for scientific document processing, educational technology, and lowering the barrier to entry for advanced VLM applications.
3. Agent 2048: Teaching Language Models Strategic Gameplay Through Reinforcement Learning
Experiment and Research (April 2025)
This research demonstrates how reinforcement learning can transform language models into strategic game players, using the 2048 puzzle as a testbed for spatial reasoning and long-term planning.
Key Highlights:
- Trained a Qwen 2.5 7B model to expert-level 2048 gameplay using pure reinforcement learning
- Developed a multi-component reward system combining density optimization, survival metrics, and format compliance
- Achieved an 83% win rate (creating the 2048 tile) on expert-level boards
- Observed the model discovering novel board-management strategies on its own
- Demonstrated LLMs’ capacity for spatial reasoning and strategic foresight
Technical Details:
- Model Architecture: Qwen 2.5 7B-Instruct with LoRA (rank 16)
- Training Method: Group Relative Policy Optimization (GRPO)
- Key Reward Components:
  - Board density optimization (ratio of the sum of tile values to the number of occupied cells)
  - Valid move enforcement
  - Survival probability
  - Maximum tile progression
- Hardware: Single RTX 4090 (24GB VRAM)
- Training Data: 8,000 procedurally generated game states across 5 difficulty levels
- Evaluation Metrics: Win rate, average score, max tile distribution
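The reward components listed above can be made concrete with a small 2048 move simulator. This is a hedged sketch, not the project's code. The weights are hypothetical, and the max-tile term uses log2 progress.

Survival is approximated as "at least one legal move remains", which ignores the random tile that spawns after each move. An unchanged board counts as an invalid move and receives a fixed penalty.

```python
import math

Board = list[list[int]]  # 4x4 grid of tile values, 0 = empty cell
MOVES = ("up", "down", "left", "right")


def merge_row_left(row: list[int]) -> list[int]:
    """Slide one row to the left, merging each pair of equal neighbours at most once."""
    tiles = [v for v in row if v]
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))


def apply_move(board: Board, move: str) -> Board:
    """Return the board after a move (no random tile spawn)."""
    if move == "left":
        return [merge_row_left(r) for r in board]
    if move == "right":
        return [merge_row_left(r[::-1])[::-1] for r in board]
    cols = [list(c) for c in zip(*board)]  # work on columns for vertical moves
    if move == "up":
        moved = [merge_row_left(c) for c in cols]
    elif move == "down":
        moved = [merge_row_left(c[::-1])[::-1] for c in cols]
    else:
        raise ValueError(f"unknown move: {move!r}")
    return [list(r) for r in zip(*moved)]


def density(board: Board) -> float:
    """Sum of tile values divided by the number of occupied cells."""
    occupied = [v for row in board for v in row if v]
    return sum(occupied) / len(occupied) if occupied else 0.0


def max_tile(board: Board) -> int:
    return max(v for row in board for v in row)


def move_reward(board: Board, move: str, w_density: float = 0.5,
                w_max: float = 1.0, w_survival: float = 0.5) -> float:
    """Hypothetical weighted sum of the reward components listed above."""
    new = apply_move(board, move)
    if new == board:
        return -1.0  # invalid move: the board did not change
    density_gain = density(new) - density(board)
    max_gain = math.log2(max_tile(new)) - math.log2(max_tile(board))
    # Survival proxy: at least one legal move remains (ignores the random spawn).
    survives = any(apply_move(new, m) != new for m in MOVES)
    return w_density * density_gain + w_max * max_gain + w_survival * float(survives)
```

Merging two 2s into a 4 raises density from 2.0 to 4.0, so the density term directly rewards consolidating tiles rather than spreading them out.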
Impact: The techniques developed enable AI systems to master complex spatial reasoning tasks, with applications in logistics optimization, resource management, and automated strategy development.
4. AI as Algorithm Designer: Teaching LLMs to Improve Sorting Through Trial and Error in GRPO
Experiment and Research (March 2025)
This research explores whether language models can not only implement sorting algorithms but also devise better ones through reinforcement learning. Using Group Relative Policy Optimization (GRPO), the experiment shows how a model can improve on an already optimized algorithm: a Timsort implementation written in Python.
Key Highlights:
- Trained Qwen 2.5 (7B) models to optimize sorting algorithms using pure reinforcement learning
- Developed a sophisticated reward system that balances correctness and performance improvements
- Achieved up to 47.92x speedup over the baseline Timsort implementation
- Observed the model inventing novel hybrid sorting algorithms
- Demonstrated that LLMs can systematically improve core computing primitives
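A reward that "balances correctness and performance improvements" could look roughly like this sketch. It rests on several assumptions:

- The candidate is a Python callable that returns a new sorted list.
- Correctness is checked against `sorted()` on random inputs.
- Speed is measured with `time.perf_counter` against a baseline callable.
- The fixed penalty, the log-scaled speed bonus, and the input sizes are hypothetical choices.

In practice, model-generated code should also run in a sandbox, which is omitted here.

```python
import math
import random
import time
from typing import Callable

SortFn = Callable[[list[int]], list[int]]  # assumed contract: returns a new sorted list


def best_time(fn: SortFn, data: list[int], repeats: int = 3) -> float:
    """Best wall-clock time over several runs, each on a fresh copy of the input."""
    best = math.inf
    for _ in range(repeats):
        arr = list(data)  # fresh copy so in-place sorts never see pre-sorted input
        start = time.perf_counter()
        fn(arr)
        best = min(best, time.perf_counter() - start)
    return best


def sorting_reward(candidate: SortFn, baseline: SortFn, n_cases: int = 5,
                   size: int = 2000, seed: int = 0) -> float:
    """Hypothetical reward: correctness gate first, then a log-scaled speedup bonus."""
    rng = random.Random(seed)
    cases = [[rng.randint(-10**6, 10**6) for _ in range(size)] for _ in range(n_cases)]
    # Correctness gate: a wrong result or an exception earns a fixed penalty.
    for case in cases:
        try:
            if candidate(list(case)) != sorted(case):
                return -1.0
        except Exception:
            return -1.0
    base_t = sum(best_time(baseline, c) for c in cases)
    cand_t = sum(best_time(candidate, c) for c in cases)
    speedup = base_t / max(cand_t, 1e-9)
    # Correct code earns 1; faster code earns more, slower code less (log2 of speedup).
    return 1.0 + math.log2(speedup)
```

The log scale keeps a single very large speedup from dominating the reward signal. Taking the best of several timing runs reduces noise from the machine's background load.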
5. Teaching Language Models to Invent or Optimize Efficient Sudoku Algorithms Through Reinforcement Learning
Experiment and Research (March 2025)
This research explores whether language models can learn to not just solve, but innovate and optimize algorithms for solving Sudoku puzzles through reinforcement learning. Using a pure RL approach without cold-start data, the experiment demonstrates how AI can evolve from problem-solver to algorithm inventor.
Key Highlights:
- Trained Qwen 2.5 (3B & 7B) models to improve upon a naive baseline Sudoku solver using GRPO
- Developed a sophisticated multi-component reward system that encourages algorithmic efficiency
- Discovered interesting learning dynamics between different model sizes (3B vs 7B)
- Achieved significant speed improvements over the baseline through learned optimizations
- Demonstrated that LLMs can innovate algorithmic improvements without seeing example solutions
Technical Details:
- Pure reinforcement learning approach using Group Relative Policy Optimization (GRPO)
- LoRA fine-tuning with ranks 16-32
- Multi-component reward system including format compliance, function validation, and performance improvement
- Adaptive timeout system for different puzzle difficulties
- Comprehensive code execution and evaluation framework
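The format-compliance, function-validation, and adaptive-timeout components can be sketched without executing any model output. The sketch uses three assumptions:

- The model must answer with a fenced Python block defining a function named `solve_sudoku`. This is a hypothetical name.
- Validation uses Python's `ast` module to parse the code.
- The timeout grows linearly with the number of empty cells, using hypothetical constants.

The timeout would then be passed to whatever sandbox actually runs the solver.

```python
import ast
import re
from typing import Optional

FENCE = "`" * 3  # built indirectly so this snippet can itself sit inside a fenced block
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_code(completion: str) -> Optional[str]:
    """Format check: return the first fenced Python block in the completion, if any."""
    match = CODE_BLOCK.search(completion)
    return match.group(1) if match else None


def defines_function(code: str, name: str) -> bool:
    """Function validation without executing anything: parse and look for a top-level def."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    return any(isinstance(node, ast.FunctionDef) and node.name == name for node in tree.body)


def adaptive_timeout(puzzle: list[list[int]], base_s: float = 0.5,
                     per_empty_s: float = 0.05) -> float:
    """Give harder puzzles (more empty cells, marked 0) a longer time budget."""
    empties = sum(row.count(0) for row in puzzle)
    return base_s + per_empty_s * empties


def format_and_validation_reward(completion: str, fn_name: str = "solve_sudoku") -> float:
    """Hypothetical partial credit for the format and function-validation components."""
    code = extract_code(completion)
    if code is None:
        return 0.0  # no fenced Python block: format failure
    if not defines_function(code, fn_name):
        return 0.5  # well formatted, but no parsable function with the expected name
    return 1.0
```

Parsing with `ast` catches syntax errors and missing functions cheaply, before any time is spent running candidate code.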
The techniques developed have broader applications in algorithmic optimization, automated code improvement, and teaching AI systems to innovate better solutions to complex problems.
6. Democratizing Synthetic Data: Creating Metropolitan Drone Footage with Blender
Duration: January 2025 – Present
This project demonstrates how accessible tools like Blender can generate realistic synthetic drone footage of urban environments for training autonomous systems and computer vision applications.
Key Highlights:
- Created a metropolitan environment with realistic building layouts and vehicle traffic
- Implemented authentic drone flight physics and camera properties
- Optimized rendering performance across three RTX 4090 GPUs
- Showed that high-quality synthetic data generation is possible without an enterprise budget
The framework enables autonomous drone training, computer vision development, and urban planning visualization using consumer hardware and open-source tools.
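Realistic drone camera properties come down to standard pinhole-camera relations:

- The horizontal field of view follows from sensor width and focal length.
- The ground sampling distance (GSD) is the ground footprint of one pixel for a nadir-pointing camera at a given altitude.

In Blender, these inputs correspond to a camera's focal-length and sensor-width settings. The sketch below assumes flat ground, and the sensor, lens, and altitude values are hypothetical rather than the project's configuration.

```python
import math


def horizontal_fov_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    """Horizontal field of view of a pinhole camera, in degrees."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


def ground_sampling_distance_cm(sensor_width_mm: float, focal_length_mm: float,
                                altitude_m: float, image_width_px: int) -> float:
    """Ground footprint of one pixel (cm/px) for a nadir-pointing camera over flat ground."""
    # (sensor width / focal length) * altitude = ground width covered by the image, in metres.
    ground_width_m = sensor_width_mm / focal_length_mm * altitude_m
    return ground_width_m / image_width_px * 100  # metres -> centimetres per pixel


if __name__ == "__main__":
    # Hypothetical 13.2 mm wide sensor, 8.8 mm lens, 5472 px image width, 100 m altitude.
    print(round(horizontal_fov_deg(13.2, 8.8), 1))
    print(round(ground_sampling_distance_cm(13.2, 8.8, 100, 5472), 2))
```

With these example values, the field of view is about 73.7° and each pixel covers roughly 2.74 cm of ground. GSD scales linearly with altitude, so flight height directly controls how much detail the synthetic footage contains.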
7. Teaching Language Models to Solve Sudoku Through Reinforcement Learning
Experiment and Research (March 2025)
This experiment explores whether language models can learn structured reasoning and puzzle-solving through reinforcement learning, focusing on Sudoku as a test case for logical deduction and rule-following.
Key Highlights:
- Successfully trained a Qwen 2.5 7B model to solve Sudoku puzzles using pure reinforcement learning
- Developed a sophisticated multi-component reward system for evaluating solutions
- Discovered a minimum model-size threshold for stable learning of complex reasoning tasks
- Demonstrated that reinforcement learning can teach language models to maintain format consistency and follow logical rules
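Any Sudoku reward needs a checker for proposed solutions. The sketch below assumes 9x9 grids of integers, with 0 marking blanks in the puzzle. It verifies that the answer keeps every given clue and that all 27 rows, columns, and 3x3 boxes contain the digits 1 to 9 exactly once.

How the checker feeds into the multi-component reward is not specified above. The partial-credit `unit_score` function is therefore a hypothetical example.

```python
Grid = list[list[int]]  # 9x9, 0 = blank in the puzzle
DIGITS = set(range(1, 10))


def units(grid: Grid) -> list[list[int]]:
    """All 27 units: 9 rows, 9 columns, and 9 3x3 boxes."""
    rows = [list(r) for r in grid]
    cols = [list(c) for c in zip(*grid)]
    boxes = [
        [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return rows + cols + boxes


def respects_givens(puzzle: Grid, solution: Grid) -> bool:
    """Every clue in the puzzle must appear unchanged in the solution."""
    return all(
        p == 0 or p == s
        for prow, srow in zip(puzzle, solution)
        for p, s in zip(prow, srow)
    )


def is_valid_solution(puzzle: Grid, solution: Grid) -> bool:
    if len(solution) != 9 or any(len(row) != 9 for row in solution):
        return False  # malformed grid, e.g. from a badly formatted answer
    return respects_givens(puzzle, solution) and all(set(u) == DIGITS for u in units(solution))


def unit_score(solution: Grid) -> float:
    """Hypothetical partial credit (assumes a 9x9 grid): fraction of units holding 1-9 once each."""
    return sum(set(u) == DIGITS for u in units(solution)) / 27
```

A binary validity check gives a sparse signal. Partial credit per unit is one way a reward could still distinguish near-misses from random grids.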
The techniques developed have broader applications in programming, mathematical problem-solving, scientific reasoning, and formal verification.
8. Boosting Financial NLP with Reinforcement Learning
Experiment and Research (2025)
This project explores the application of reinforcement learning to improve the accuracy of financial sentiment analysis on social media data, specifically for identifying inflation trends.
Key Highlights:
- Demonstrated that a fine-tuned 3B parameter language model can outperform much larger models on a specific financial NLP task.
- Achieved strong classification accuracy of 84% on inflation-related tweets using Group Relative Policy Optimization (GRPO).
- Developed a reward function that integrates correctness and formatting considerations.
- The fine-tuned Qwen 2.5 3B model is competitive at pass@1, approaches near-human accuracy at pass@N with N = 6, and reaches 84% accuracy with majority voting.
Technical Details:
- Hardware: Custom-built rig with three RTX 4090 GPUs
- Model: Qwen 2.5 3B language model fine-tuned with LoRA (Low-Rank Adaptation) at rank 32
- Data: High-quality, human-labeled tweets with a balanced distribution across classes
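A reward combining correctness and formatting, together with the pass@N and majority-voting evaluation mentioned above, can be sketched as follows. The `<answer>` tag format, the three label names, and the 0.2 format bonus are hypothetical assumptions, because the project's actual prompt format and label set are not given here. pass@N is computed empirically as "at least one of N samples is correct".

```python
import re
from collections import Counter
from typing import Optional

LABELS = {"increase", "decrease", "neutral"}  # hypothetical inflation-direction labels
ANSWER_RE = re.compile(r"<answer>\s*(\w+)\s*</answer>", re.IGNORECASE)


def parse_label(completion: str) -> Optional[str]:
    """Return the tagged label if the completion follows the expected format."""
    match = ANSWER_RE.search(completion)
    if match and match.group(1).lower() in LABELS:
        return match.group(1).lower()
    return None


def reward(completion: str, gold: str) -> float:
    """Small bonus for a well-formed answer plus the main reward for the correct label."""
    label = parse_label(completion)
    format_bonus = 0.2 if label is not None else 0.0
    correct = 1.0 if label == gold else 0.0
    return format_bonus + correct


def majority_vote(completions: list[str]) -> Optional[str]:
    """Most common parsed label; ties go to the label encountered first."""
    labels = [lab for lab in map(parse_label, completions) if lab is not None]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


def pass_at_n(completions: list[str], gold: str) -> bool:
    """Empirical pass@N: at least one of the N sampled answers is correct."""
    return any(parse_label(c) == gold for c in completions)
```

Majority voting discards malformed samples before counting. This is one reason format compliance in the reward matters even when correctness carries most of the weight.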
9. Enterprise Document Intelligence Solution
Client: FS Impact Finance
Duration: Q3 2024
This project streamlined the investment analysis process for FS Impact Finance by applying advanced AI techniques to process and extract insights from documents in diverse formats.
Key Achievements:
- Reduced document processing time from 3-4 weeks to just 4 hours
- Developed custom Vision-Language Models for legacy document processing
- Scaled training data from 20 manual samples to 28,000 synthetic documents
- Implemented automated time series analysis for renewable energy investment trends
10. Intelligent E-commerce Analytics System
Client: Aybee UG
Duration: November 2024 – Present
An ongoing project focused on developing end-to-end machine learning pipelines for automated insight extraction from consumer feedback in the e-commerce sector.
Key Features:
- Automated insight extraction from consumer feedback
- Custom LLM training pipelines for processing confidential customer data
- Interactive chatbot interface with dynamic visualization capabilities
- Synthetic training data generation for fine-tuning cloud-provided LLMs
11. AI-Powered Financial Research Platform
Duration: October 2024 – Present
A personal project aimed at developing an autonomous financial analysis platform using agent-based LLMs and RAG architecture.
Project Highlights:
- Scalable infrastructure processing 1000+ financial documents daily
- Synthetic training data creation for finance-specific LLM fine-tuning
- Automated report generation with advanced visualization capabilities
- Potential for commercialization as a SaaS platform
12. Monetary Policy in the Age of Social Media: A Twitter-Based Inflation Analysis
Institution: Frankfurt School of Finance & Management
Duration: 2022 – 2024
Publication: European Finance Association 2024
This research project extracted inflation expectations from Twitter data, exploring how social media insights could inform economic trends and monetary policy decisions.
Key Achievements:
- Analyzed millions of tweets using advanced Natural Language Processing
- Published findings at the European Finance Association 2024 conference
- Paper ranked in top 10 on SSRN for four weeks
Impact: Our work demonstrates how AI can bridge social media data and economic indicators, potentially influencing future monetary policy decisions.
The Impact of These Projects
Each of these projects demonstrates my ability to apply AI and data science techniques to real-world challenges. From drastically reducing processing times in document analysis to creating scalable, automated systems for financial research, these projects showcase the transformative power of AI when applied thoughtfully to business problems.
Technologies Used Across Projects
- Large Language Models (LLMs): Llama, Mistral, Qwen
- Computer Vision: Custom Vision-Language Models
- Synthetic Data Generation: GANs, Text-to-Image Models, Unreal Engine Simulations
- Data Processing: Python, PyTorch, TensorFlow
- Cloud Infrastructure: AWS, GCP
- Database Management: SQL, NoSQL, Graph Databases
Looking to Collaborate?
If you’re interested in discussing any of these projects in more detail or exploring how similar solutions could benefit your business, I’d love to hear from you. My diverse skill set and experience in creating innovative AI solutions can help bring your ideas to life.