Speech AI / Generative AI / Computer Vision

Faisal Ahmed

AI researcher and developer building practical systems across language and speech, generative AI, computer vision, biomedical AI, and production ML workflows.

1 Open-source PyPI package
15 Selected projects
35K+ Tutorial views
1 IEEE publication

Profile

Applied AI with a research mindset.

I design and ship AI systems that move from experiments into usable products, with a focus on multilingual language intelligence, speech processing, perception, and reliable backend delivery.

My work spans automatic speech recognition, transformer-based prediction models, document question answering, agentic and multimodal AI, face verification, custom OCR, data pipelines, visualization apps, medical image classification, and road-safety vision systems.

  • Language & Speech: Information extraction, speech-to-text, text-to-speech, conversational AI, and RAG.
  • Generative AI: LLM applications, agentic AI systems, and multimodal generation across speech, vision, and text.
  • Computer Vision: Smart traffic monitoring, vehicle detection, OCR, virtual try-on, and pattern recognition.
  • Biomedical & Healthcare AI: Biometric recognition, facial recognition, biomedical signal processing, and medical imaging.

Experience

Engineering AI systems from prototypes to services.

Recent roles blend research, model development, backend APIs, and practical integrations for production-oriented machine learning work.

Business Automation Limited, Dhaka (November 2024 - Present)

Machine Learning Engineer

  • Developed an automatic speech recognition system for English-language workflows.
  • Built AI auto-suggestion and cluster-based remark suggestion features for a product.
  • Developed face-verification-based authentication and OCR-based extraction of user information from PDFs and images.
  • Built data archiving pipelines using Apache NiFi and Apache Airflow.
  • Created interactive data visualization web apps with Plotly Dash.
  • Developed FastAPI and PostgreSQL backend services with SMS, email, and custom PDF generation modules.

Next Solution Lab, Dhaka (February 2024 - October 2024)

AI Engineer

  • Developed a deep learning-based virtual try-on product for clothes and accessories.
  • Created OCR solutions tailored to business-specific document requirements.
  • Researched transformer-based non-English language models for text classification, question answering, and summarization.

Next Solution Lab, Dhaka (April 2022 - January 2024)

Associate AI Engineer

  • Developed industrial analog and digital meter recognition using object detection and text recognition.
  • Implemented named entity recognition for non-English documents.
  • Conducted R&D for computer vision and NLP paper implementation, product features, and model improvements.

Projects

Selected applied AI work.

A focused set of systems that show the range of my research and engineering work across LLMs, RAG, Bangla NLP, medical AI, OCR, real-time vision, and production tooling.

Document QA RAG System

  • Built a document question-answering workflow with retrieval-augmented generation.
  • Supported Bengali and English document querying with FAISS vector search, FastAPI, and Docker deployment.
LLM, RAG, Llama 3.3, FAISS, Docker

Bangla Sentence Punctuation Restoration

  • Built a Bangla punctuation restoration model using transformer-based language modeling.
  • Used Llama 3.2 for sentence correction and served the workflow through FastAPI and Docker.
Bangla NLP, BanglaBERT, Llama 3.2, FastAPI

Chattogram to Standard Bangla Conversion

  • Developed a transformer-based Seq2Seq model that converts the regional Chattogram dialect into Standard Bangla.
  • Processed data with a SentencePiece tokenizer and trained an encoder-decoder model with attention.
NLP, PyTorch, Transformers, Seq2Seq

Brain Tumor Classification Vision Transformer

  • Classified brain MRI images into four tumor categories using transfer learning.
  • Applied a ViT-based medical imaging workflow with PyTorch experimentation.
Medical AI, PyTorch, ViT, Transformer

Wrong-Side Vehicle Detection

  • Detected wrong-side vehicle movement with a YOLO-based computer vision system.
  • Added license plate extraction using OCR for downstream traffic workflows.
Vision, YOLOv10, EasyOCR, PyTorch

AI Agent MCP Server

  • Built a Python MCP server foundation for AI agent tooling and workflow integration.
  • Structured the project for agent-server experiments and reusable tool endpoints.
AI Agents, MCP, Python, Systems

GraphRAG Chatbot with FastAPI

  • Developed a chatbot workflow using GraphRAG and a FastAPI service layer.
  • Organized an LLM backend for retrieval-grounded responses and API integration.
LLM, GraphRAG, FastAPI, Python

Anomaly Detection FastAPI

  • Packaged an anomaly detection model workflow behind a FastAPI service interface.
  • Prepared API endpoints for scoring and integration with production-style ML pipelines.
ML Systems, FastAPI, Python, Anomaly Detection

Automatic Speech Recognition

  • Developed an automatic speech recognition repository for speech-to-text experimentation.
  • Focused on model workflow foundations for audio processing and language AI.
Speech, ASR, Python, NLP

Image Cryptography with Autoencoders

  • Implemented chaotic-map image encryption and decryption with autoencoder experiments.
  • Included image-processing utilities and metrics such as SSIM, NPCR, and UACI.
Deep Learning, Autoencoder, Security, Python
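
For readers unfamiliar with the encryption metrics named above, here is a minimal sketch of NPCR (number of pixels change rate) and UACI (unified average changing intensity) for two 8-bit grayscale images of equal size. It follows the standard textbook definitions and is not taken from the project repository; the function names and the nested-list image representation are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: an image is a list of rows of 8-bit grayscale values (0..255).
Image = Sequence[Sequence[int]]


def _paired_pixels(a: Image, b: Image) -> List[Tuple[int, int]]:
    """Pair up corresponding pixels, checking that both images share a shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    pairs = [(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    if not pairs:
        raise ValueError("images must contain at least one pixel")
    return pairs


def npcr(a: Image, b: Image) -> float:
    """Percentage of pixel positions whose values differ between a and b."""
    pairs = _paired_pixels(a, b)
    changed = sum(1 for pa, pb in pairs if pa != pb)
    return 100.0 * changed / len(pairs)


def uaci(a: Image, b: Image) -> float:
    """Mean absolute pixel difference, as a percentage of the 8-bit maximum (255)."""
    pairs = _paired_pixels(a, b)
    total_diff = sum(abs(pa - pb) for pa, pb in pairs)
    return 100.0 * total_diff / (255 * len(pairs))
```

In a typical evaluation, `a` and `b` would be two cipher images produced from plain images that differ in a single pixel; values of NPCR close to 100% indicate strong sensitivity to small input changes.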

Building Infrastructure Recognition

  • Studied building infrastructure recognition using deep learning methods.
  • Combined classification and object-detection experiments for infrastructure analysis.
Vision, Deep Learning, Object Detection, Research

Resume Categorization with Transformers

  • Built a transformer-based resume categorization workflow.
  • Used EDA and NLP modeling to map resume text into role categories.
NLP, Transformers, BERT, EDA

Document Similarity with Doc2Vec

  • Implemented document similarity measurement using Doc2Vec and Gensim.
  • Applied NLP feature learning for semantic comparison workflows.
NLP, Doc2Vec, Gensim, Python

Face and Eye Blink Detection

  • Built face and eye-blink detection with dlib and OpenCV.
  • Focused on real-time vision signals for monitoring and liveness-style interfaces.
Vision, OpenCV, dlib, Python
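
One common way to turn facial landmarks into a blink signal is the eye aspect ratio (EAR), sketched below. This is an illustration of the general technique, not necessarily the method used in this project. It assumes six (x, y) eye landmarks ordered as in dlib's 68-point model (outer corner, two upper-lid points, inner corner, two lower-lid points); the function names, the 0.2 threshold, and the two-frame minimum are hypothetical values chosen for the example.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|) for six eye landmarks."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    if horizontal == 0:
        raise ValueError("eye corners must not coincide")
    return vertical / (2.0 * horizontal)


def count_blinks(ear_values: Iterable[float],
                 threshold: float = 0.2,   # assumed value, tune per camera
                 min_frames: int = 2) -> int:  # assumed value
    """Count blinks as runs of at least min_frames consecutive low-EAR frames.

    A run is only counted once the eye reopens, so a closure that is still
    in progress at the end of the sequence is not counted.
    """
    blinks = 0
    closed_frames = 0
    for ear in ear_values:
        if ear < threshold:
            closed_frames += 1
        else:
            if closed_frames >= min_frames:
                blinks += 1
            closed_frames = 0
    return blinks
```

In a real-time loop, the six points per eye would come from a landmark detector run on each video frame, and the per-frame EAR (often averaged over both eyes) would be fed to `count_blinks` or an equivalent streaming counter.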

Skills

Tools for research, product, and deployment.

Languages

  • Python
  • C/C++
  • Java
  • JavaScript
  • PHP
  • LaTeX

ML & AI

  • PyTorch
  • TensorFlow
  • Keras
  • SpaCy
  • Transformers
  • LangChain

Full-Stack Development

  • FastAPI
  • Flask
  • Django
  • React.js
  • Next.js

Data Analytics

  • Plotly
  • Tableau
  • Power BI

Delivery

  • Docker
  • CI/CD
  • Apache NiFi
  • Apache Airflow
  • Redis

Databases

  • MongoDB
  • MySQL
  • PostgreSQL

Teaching

Teaching practical AI for the Bangla-speaking community.

Computer Science Tutorials

Published tutorials on YouTube with 35K+ views and 450+ subscribers.

Publication & Education

Research, academic foundation, and leadership.

Building Infrastructure Classification with Hybrid CNN Architecture

F. Ahmed, M. Mahmudul Islam and S. M. Masudul Ahsan, EICT 2021. DOI: 10.1109/EICT54103.2021.9733635.

Education

B.Sc. in Computer Science and Engineering from Khulna University of Engineering & Technology, Khulna, Bangladesh.

February 2017 - April 2022

Thesis

A Study of Building Infrastructure Recognition Using Deep Learning Methods.

Leadership

General Secretary, Software Research & Development Community of KUET.

Contact

Let us build something useful with AI.

Reach out for AI engineering, research collaboration, product prototypes, teaching, or consulting.