Resume

Lijie Li · Data Scientist

Aalto University · KTH Royal Institute of Technology

Data scientist focused on multilingual retrieval, applied ML, and AI products. This site highlights selected systems work and case studies.

AI & Machine Learning

Deep Learning · NLP · Speech Recognition · Generative Models · RAG Systems · Data Mining

Engineering & Cloud

Python · PyTorch · LangChain · Triton · Spark · Azure AI Factory · Git · Linux · Async IO

Data & Analytics

SQL · Tableau · MongoDB · SPSS · Visualization · Statistics

Focus

Research depth with product delivery

I build multilingual retrieval systems and production ML workflows, with an emphasis on measurable impact and clear evaluation.

Practice #1

Data Systems Practice

Day job energy goes into multilingual RAG stacks, speech models, and measurable retrieval governance. I prefer shipping explainable systems over publishing papers.

  • Agentic RAG orchestration
  • Knowledge graphs & KG ops
  • QLoRA + TPE fine-tuning
  • Triton + GPU tooling

Practice #2

Applied ML Delivery

Hands-on delivery of ML products: data pipelines, evaluation frameworks, and reliable deployments.

  • Retrieval evaluation & monitoring
  • Multilingual QA & reranking
  • Latency-aware inference stacks
  • Experiment design & reporting

Hybrid Case Studies

Where data products and visuals converge

Selected systems pairing measurable rigor with sensory storytelling.

AI Systems · VTT · Finland · 3rd place · 2025 AaltoAI Hackathon

Knowledge Graph Challenge on Heterogeneous Sources

Automated ingestion + semantic entity resolution with 100% traceability.

  • Hybrid search (Qdrant ANN + BM25) fused with RRF and Cross-Encoder rerankers.
  • HDBSCAN-powered entity resolution using `text-embedding-small` vectors.
  • Evaluation suite covering Hit Rate, MRR, and innovation lineage tracking.
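
The fusion and evaluation steps listed above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names, the document IDs, and the constant `k=60` (a commonly used RRF default) are assumptions, and the Qdrant and BM25 retrievers are represented only by the ranked ID lists they would return.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is an assumed default, not a value taken from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hit_rate_at_k(results, relevant, k):
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(
        1 for qid, ranked in results.items() if set(ranked[:k]) & relevant[qid]
    )
    return hits / len(results)


def mean_reciprocal_rank(results, relevant):
    """Mean of 1 / rank of the first relevant document (0 if none is found)."""
    total = 0.0
    for qid, ranked in results.items():
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant[qid]:
                total += 1.0 / rank
                break
    return total / len(results)
```

In use, the dense (ANN) and sparse (BM25) result lists for a query would be passed to `rrf_fuse([dense_ids, sparse_ids])`, and the fused list would then go to the cross-encoder reranker before being scored with the two metrics.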

AI Research · Aalto University · 2nd place

SNLP Challenge: Multilingual Speech + Toxicity

WER 0.0664 / CER 0.0123 with Wav2Vec2-BERT + SpecAugment.

  • Fine-tuned multilingual BERT with Triton acceleration and WandB tracking.
  • Benchmarked four multilingual toxicity models across English / German / Finnish.
  • Blended character-level noise defenses with balanced sampling strategies.
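
For readers unfamiliar with the headline metrics, WER and CER are both edit distances normalized by reference length. The sketch below shows the per-utterance definition under simple assumptions (whitespace tokenization, no text normalization); the challenge's own scoring script is not described here and may differ, for example by aggregating edits over the whole test set.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion from the reference
                    curr[j - 1] + 1,     # insertion into the reference
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: char-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A WER of 0.0664 therefore corresponds to roughly one word-level edit per fifteen reference words.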

Data Products · Kunshan Yuanpai Trading · China

Recommendation & Uni-cloud Platform

Reduced query latency and improved personalization for merchandising teams.

  • DBSCAN clustering + MAB exploration to surface high-value customer cohorts.
  • Optimized MongoDB schema and SQL interfaces for order + inventory ops.
  • Built Tableau dashboards to translate raw telemetry into decisions.
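
The cohort-exploration idea can be illustrated with a small multi-armed bandit. The source does not say which bandit policy was used, so this sketch assumes epsilon-greedy; the class name, the arm labels, and the reward signal are all hypothetical, with each arm standing in for one cluster produced by DBSCAN.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit over a fixed set of arms (assumed policy)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward

    def select(self):
        """Explore a random arm with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # max() returns the first arm on ties, so early picks follow arm order.
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        """Fold one observed reward into the arm's running mean."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Each round, `select()` picks a cohort to target and `update()` records the observed outcome (for example, whether a recommendation converted), so higher-value cohorts are chosen more often while the others are still sampled occasionally.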

AI Engineering · P&G · Finland · 3rd place · 2025 Junction

Agent Challenge on Automated Personalized Marketing

Multimodal n8n Agentic workflow for localized multi-channel assets.

  • Engineered a multimodal n8n agentic workflow that adapts visual elements and fits copy to each channel's text constraints, localizing assets for the target culture.
  • Implemented a self-reflective, adaptive design that uses chain-of-thought (CoT) reasoning to critique outputs iteratively and enforce safety guardrails.

Experience

Professional experience

View full CV on LinkedIn ↗

AI / Data Roles

Aug 2025 — Present

Data Scientist

Lexembed · Stockholm, Sweden

Designing multilingual knowledge engines that blend Agentic RAG, case-based reasoning, and knowledge graphs for legal intelligence teams.

  • Built a multi-hop QA flow that fuses entity extraction with graph traversals for rapid compliance research.
  • Introduced quantitative retrieval guardrails using RAGAS and automated regression suites for every release.

Aug 2023 — Mar 2024

Data Specialist (Intern)

International Digital Economy Academy · Shenzhen, China

Owned the end-to-end lifecycle for policy moderation models, from generative data augmentation to adversarial hardening and deployment.

  • Fine-tuned DeBERTaV3 with QLoRA and TPE hyperparameter search, cutting VRAM usage by 80% and improving F1 by 5 points.
  • Used TextAttack adversarial suites to harden classifiers and validated robustness with macro-F1 and MCC dashboards.
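
The two robustness metrics named above can be computed directly from predictions. This is a dependency-free sketch with hypothetical function names; it handles the binary case for MCC and is not the dashboard code itself.

```python
import math


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def mcc_binary(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Comparing both scores on clean and adversarially perturbed inputs gives a quick read on how much an attack degrades the classifier; MCC is useful alongside macro-F1 because it stays informative when classes are imbalanced.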

Availability

Open to data science roles

Based in Stockholm. Open to onsite or remote ML roles in Europe or with global teams.

Based in Stockholm · Shenzhen friendly · English / 中文