
Dr. Anju Jose Tom

Data Science Expert | Principal Data Modeler | Machine Learning & AI Governance | Big Data Analytics

Subject Matter Expertise in Machine Learning & Data Science | Ethical AI at Scale

I am a Data Science Leader and Principal Data Modeler at Queensland Treasury, driving data-informed policy, financial governance, and compliance automation. My expertise bridges data science, machine learning, predictive analytics, and government analytics, with proven experience in AI at scale on sensitive government data through the responsible and ethical use of AI, underpinned by strong governance knowledge. I have delivered solutions across multiple technical platforms including Azure, DevOps, Synapse, Databricks, Power BI, SQL, and SAP HANA.

I specialize in building scalable data solutions, integrating machine learning models, and driving data-driven policy insights. My work involves translating business requirements involving large and complex datasets into actionable intelligence, supporting policy decisions, and ensuring data integrity in enterprise-scale systems. I thrive in solving complex data challenges, collaborating across teams, and creating solutions that enhance efficiency, compliance, and policy effectiveness. I also bring proven expertise in rule-based AI model development at scale on sensitive big data.

I hold a PhD in Computer Vision, specialising in Machine Learning, AI, and Data Analysis, completed at the National Institute of Technology Calicut under the supervision of Dr. Sudhish N. George (awarded Best PhD Thesis in Kerala and ranked 4th best in India, 2020). From there I progressed into the role of a technical expert in government analytics. My career began as a key scientific lead at the world-renowned INRIA (French National Institute for Research in Digital Science and Technology) on the nationally important AI project *Data Repurposing (DARE)*, supervised by Dr. Thomas Maugey. I then led AI-driven defence projects at QUT with the Department of Defence and industry partner Revolution Aerospace, under the supervision of Dr. Clinton Fookes, before advancing to a leadership role as Principal Data Modeler at Queensland Revenue Organization (QRO), Queensland Treasury.

I have also been recognised by the Australian Government through the prestigious Global Talent Visa program, awarded for an internationally distinguished record of exceptional professional achievements and contributions to the data science community.

Expertise

Certifications & Training

Databricks / Uplimit

Google Cloud / Coursera

These certifications complement my project experience, providing hands-on expertise in scalable pipelines, advanced analytics, ML deployment, GenAI applications, and Responsible AI frameworks—ensuring solutions are impactful, ethical, and trustworthy.

Projects & Impact

1. Machine Learning Compliance Model for government home exemption scheme

Designed and implemented the first machine learning–driven compliance model in the department for a major Home Exemption scheme, leveraging multi-agency historical data and advanced feature engineering. The solution achieved an impressive 98% accuracy in predicting and validating exemption claims, significantly improving the efficiency of compliance checks. This initiative was highly recognised by leadership as a breakthrough in applying AI for compliance analytics inside the organization, demonstrating how data science can drive fairness, transparency, and automation in regulatory processes. Due to the confidential nature of government compliance data, further details on methodology and datasets are not disclosed, but the project highlights the transformative role of AI in public sector governance.

My role included building the proof-of-concept, collaborating with stakeholders to gather requirements, and presenting results to leadership. I also developed bulk Natural Language Processing (NLP) pipelines to process data from a large number of confidential SharePoint case folders, extracting insights from more than 29,000 official PDF and Word files.

2. Cognitive Electronic Warfare (C-EW) Project (Dept of Defence – SME Revolution Aerospace – QUT)

Low-Cost Cognitive Electronic Warfare (C-EW) System – A $1.5M Defence research project focused on developing advanced deep learning and cognitive AI systems for next-generation electronic warfare. I contributed both as a developer and project manager, leading algorithm design and prototyping, managing research objectives, coordinating with chief investigators, and mentoring junior researchers. My role also covered executive reporting, stakeholder engagement, and milestone delivery, ensuring technical progress aligned with strategic Defence goals. I represented the project team at international forums, including the Eurosatory Conference in Paris, showcasing its innovation in applied AI for Defence. Due to the classified nature of datasets and development details, further specifics cannot be disclosed, but the project demonstrated how deep learning and cognitive AI systems can transform electronic warfare capabilities.

3. RAG Chatbot Project

As an AI & Data Consultant with strong foundations in GenAI, data platforms, and domain-driven problem solving, I built a Retrieval-Augmented Generation (RAG) chatbot that can answer questions from large documents such as Wikipedia articles, organisational reports, or policy documents. This project demonstrates my ability to combine LangChain, FAISS, and Hugging Face embeddings with LLMs such as FLAN-T5-large to deliver accurate, context-grounded answers.
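As an illustration only, the retrieval step of a RAG pipeline can be sketched in plain Python. This is not the project's code: a toy bag-of-words vector stands in for the Hugging Face embeddings, a sorted list stands in for the FAISS index, and the function names (`embed`, `retrieve`, `build_prompt`) and sample chunks are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question (stand-in for a vector index lookup)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=1):
    """Ground the question in the retrieved context before it is sent to an LLM."""
    context = "\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical document chunks and question, for illustration only.
chunks = [
    "the annual report covers revenue collected during the financial year",
    "the leave policy describes entitlements for full time staff",
    "the security policy explains how confidential files are stored",
]
question = "how are confidential files stored"
print(build_prompt(question, chunks))
```

The prompt returned by `build_prompt` is what would be passed to the language model, so the answer is drawn from the retrieved passage instead of the model's memory alone.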

4. Data Repurposing (DARE) Project – French National Institute for Research in Computer Science and Automation (INRIA), France

Data Repurposing (DARE) Project – A two-year research and development initiative, funded by the French Government, to develop new compression paradigms for large-scale image and video databases. [Project Report] | [DARE Team Members]

As a Postdoctoral Fellow and Project Manager, I coordinated research delivery, liaised with multiple stakeholders, and ensured strategic alignment. This role gave me deep experience in international project management, cross-cultural collaboration, and delivery of high-impact outcomes.

5. Trust Deeds to Structured Data – Databricks Project on a Publicly Available Trust Deed Dataset

Led the development of a Databricks-based data pipeline to convert unstructured legal trust deeds into structured, queryable data. This project demonstrated how Delta Lake, Spark SQL, and NLP pipelines can extract key clauses, parties, and compliance indicators from thousands of legal documents.

The solution enabled automated compliance checks, improved audit readiness, and allowed treasury/legal teams to run analytics on previously inaccessible data. It highlights my expertise in combining cloud-scale data engineering with AI-driven document intelligence to solve real-world regulatory problems.

6. Charitable Institution Information Extraction – Spark NLP Project

Designed a large-scale Spark NLP pipeline to extract structured insights from over 29,000 PDF and Word files contained in 4000+ confidential SharePoint case folders. These documents represented notices of registration for charitable institutions, containing critical compliance and governance information.

Using Databricks and Spark NLP, I developed an automated workflow to extract and standardize key fields such as registration details, dates, and entity names. This project accelerated compliance auditing, enabled faster data-driven policy decisions, and showcased my ability to operationalize AI at scale on sensitive government data.

7. Visual Search Engine using Semantic Cosine Similarity (GenAI Project)

This Generative AI project showcases innovation in embeddings, semantic similarity, and user-preference modeling. I developed a semantic similarity embedding and sampling algorithm for large-scale image datasets that dynamically integrates user preferences with representation learning. By leveraging cosine similarity and transfer learning from nine pre-trained models (including ResNet, DenseNet, EfficientNet, MobileNet, and Xception), the system enables personalized visual search and efficient semantic sampling. Experiments on the COCO dataset highlight how transfer learning and user-centric weighting can drive accurate semantic search and efficient resource utilization.
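A minimal sketch of the general idea, assuming each pre-trained model yields one embedding per image and that user preference is expressed as a weight per model. The weighting scheme, the function names, and the two-dimensional sample vectors are hypothetical and are not taken from the published method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_score(query_emb, image_emb, weights):
    """Preference-weighted average of per-model cosine similarities."""
    total_weight = sum(weights.values())
    weighted = sum(
        weights[m] * cosine_similarity(query_emb[m], image_emb[m]) for m in weights
    )
    return weighted / total_weight


def rank_images(query_emb, gallery, weights, top_k=2):
    """Return the names of the top_k gallery images, best match first."""
    scored = [(weighted_score(query_emb, emb, weights), name) for name, emb in gallery.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical embeddings from two models; real embeddings have hundreds of dimensions.
query = {"resnet": [1.0, 0.0], "mobilenet": [0.0, 1.0]}
gallery = {
    "beach.jpg": {"resnet": [0.9, 0.1], "mobilenet": [0.2, 0.8]},
    "city.jpg": {"resnet": [0.1, 0.9], "mobilenet": [0.9, 0.1]},
    "forest.jpg": {"resnet": [0.7, 0.7], "mobilenet": [0.5, 0.5]},
}
weights = {"resnet": 0.7, "mobilenet": 0.3}  # assumed user-preference weights
print(rank_images(query, gallery, weights))
```

Changing the weights shifts which model's notion of similarity dominates the ranking, which is one simple way to let user preference steer the search.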

Visual Search Engine Diagram

8. AI-Powered Identification of Residential Buildings on Farmland (Geospatial ML Project)

This project combined Queensland Building Footprints with satellite imagery to identify residential buildings located within farmland by analysing the surrounding greenery. Using vegetation indices such as NDVI, the workflow distinguished farmland areas with high vegetation cover from non-farmland residential zones, enabling classification of buildings based on their land context. The approach provided an efficient, scalable method to validate land use patterns, improve compliance monitoring, and support evidence-based decision making for planning and revenue assurance. The project was implemented with a sample dataset, demonstrating its potential to detect misclassified properties and strengthen policy enforcement across Queensland.
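The vegetation check can be illustrated with the standard NDVI formula, NDVI = (NIR − Red) / (NIR + Red). The sketch below is an assumption-laden illustration, not the project workflow: the 0.4 threshold, the function names, and the reflectance samples are hypothetical, and a real pipeline would read pixels from satellite rasters around each building footprint.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # no signal in either band; treat as neutral
    return (nir - red) / total


def mean_ndvi(pixels):
    """Average NDVI over (nir, red) reflectance pairs."""
    values = [ndvi(nir, red) for nir, red in pixels]
    return sum(values) / len(values)


def classify_building(surrounding_pixels, threshold=0.4):
    """Label a building by the greenery around it (threshold is an assumed value)."""
    if mean_ndvi(surrounding_pixels) >= threshold:
        return "farmland"
    return "non-farmland"


# Hypothetical (NIR, Red) reflectance samples around two building footprints.
rural = [(0.60, 0.10), (0.55, 0.12), (0.50, 0.10)]
suburban = [(0.30, 0.25), (0.28, 0.26), (0.35, 0.30)]

print(classify_building(rural))
print(classify_building(suburban))
```

Dense vegetation reflects strongly in the near-infrared band and absorbs red light, so a high mean NDVI around a building suggests a farmland context.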

9. Land Valuation Predictions (ML Regression Project)

This project developed a machine learning–driven model to forecast land valuations across Queensland by integrating historical sales data, cadastral boundaries, property attributes, and economic indicators. Using advanced regression and ensemble techniques, the model captured spatial, temporal, and market trends to predict future land values with higher accuracy than traditional valuation methods. The solution enables government and policy teams to anticipate revenue shifts, guide fair taxation, and support urban planning, ultimately improving transparency, efficiency, and compliance in the land valuation process. The project was implemented with a sample dataset, demonstrating the capability of regression models to estimate property values effectively.

10. Data-Driven Credit Risk Assessment Using Machine Learning

This project built a machine learning model to predict bank loan eligibility by analysing applicant details such as income, credit history, employment type, loan amount, and collateral information. Using classification algorithms including Logistic Regression, Random Forests, and Gradient Boosting, the system identified key risk factors and generated accurate eligibility predictions. The model streamlined the loan approval process, reduced manual effort, and minimised default risks, enabling banks to make faster, data-driven lending decisions while improving customer experience and financial compliance. The project was implemented with a sample dataset, showcasing how classification models can effectively support credit risk assessment in real-world scenarios.

11. Intelligent Fiscal Forecasting with ML Ensembles

This project applied machine learning ensemble methods such as XGBoost, Random Forest, and Gradient Boosting to forecast transfer duty revenue by analysing historical transaction data, property attributes, and economic trends. The model captured complex patterns and provided more accurate revenue predictions than traditional methods, enabling fiscal planning, revenue assurance, and policy analysis. By enhancing foresight into revenue trends, the solution supports government agencies in strategic decision making and budget alignment. The project was implemented with a sample dataset, demonstrating the effectiveness of ML ensembles for financial forecasting.

12. Data-Driven Compliance Monitoring with Fraud Detection AI

This project developed an AI-based fraud detection system to identify suspicious transfer duty transactions by analysing attributes such as declared value, property type, transaction location, and payment patterns. Leveraging machine learning classification models, the system flagged high-risk cases with strong predictive accuracy, enabling early detection of fraudulent activity and supporting compliance teams in risk prioritisation. The approach improved operational efficiency, reduced financial leakage, and strengthened the governance framework for property tax administration. The project was implemented with a sample dataset, demonstrating the practical application of AI in fraud risk management.

Stakeholder Engagement

1. Govt Home Exemption ML Compliance Project – Queensland Treasury

I led end-to-end stakeholder engagement throughout the design and delivery of the well-known Home Exemption Machine Learning model within the Land Tax business line at Queensland Treasury. This included:

This engagement model ensured not only technical success (~98% accuracy across 3M+ records) but also business adoption and confidence in AI-driven compliance analytics inside Queensland Treasury.

2. Charitable Institution Information Extraction – Queensland Treasury

For the Charitable Institution Information Extraction Project, I partnered with the Royalties and Complex Assessments Division at Queensland Treasury to streamline compliance monitoring. My engagement role included:

This project demonstrated my ability to bridge AI-driven document intelligence with regulatory compliance objectives, earning trust and recognition from senior Treasury leadership.

3. Cognitive Electronic System Development (C-EW) Project – Department of Defence

In the $1.5M Cognitive Electronic Warfare (C-EW) Project, I engaged directly with Department of Defence representatives and acted as a mediator between all collaborators (Defence, Revolution Aerospace, and QUT). My responsibilities included:

This role highlighted my ability to manage multi-stakeholder engagement, resolve conflicts, and ensure delivery of high-stakes Defence AI research under strict timelines and governance requirements.

Innovation & End-to-End Solution Development Background

My fields of interest include computer vision, signal processing, and AI-driven automation, with contributions to projects funded by the Australian Department of Defence, INRIA (France), Queensland Treasury (QT), and QUT. I have published my findings and projects widely in IEEE, Springer, and Elsevier journals, with 165+ citations and an h-index of 6. Recognitions include:

Recent Selected Publications of Innovative Projects (AI/ML/DL)

See full list on Google Scholar.

2023–2024

  1. Anju Jose Tom, Sudhish N. George, “Simultaneous Video Super-Resolution and Moving Object Detection from Low-Res Surveillance Videos,” IEEE ICME, 2023. [Paper]
  2. Anju Jose Tom, Thomas Maugey, “Image Embedding and User Preference Modeling for Data Collection Sampling,” EURASIP JASP, 2023. [Paper]
  3. Anju Jose Tom, Clinton Fookes, “Dynamic User Preference Modeling with Semantic Similarity Embedding Using Adaptive Representation Learning,” SSRN Preprint, 2024. [SSRN]
  4. Williams Jayalath, Anju Jose Tom, Terry Martin, Clinton Fookes, “Enhancing Emitter Localization Accuracy Through Integration of Received Signal Strength in Direct Position Determination,” IEEE SSP, 2023. [Paper]

2020–2022

  1. Tom Bachard, Anju Jose Tom, Thomas Maugey, “Semantic Alignment for Multi-Item Compression,” IEEE ICIP, 2022. [Paper]
  2. Anju Jose Tom, Sudhish N. George, “A Three-Way Optimization Technique for Noise Robust Moving Object Detection Using Tensor Low-Rank Approximation, l1/2 and TTV Regularizations,” IEEE T-Cybernetics, 2019. [Paper]
  3. Madathil B., Sagheer S.V.M., Rahiman V., Anju Jose Tom, Francis J., George S.N., “Tensor Low-Rank Modeling and Its Applications in Signal Processing,” arXiv preprint, arXiv:1912.03435. [Preprint]

2018–2020

  1. Anju Jose Tom, Sudhish N. George, “Simultaneous Reconstruction and Moving Object Detection for Wireless Multimedia Sensor Networks,” IEEE TIP, 2020. [Paper]
  2. Anju Jose Tom, Sudhish N. George, “Video Completion and Simultaneous Moving Object Detection for Extreme Surveillance Environments,” IEEE SPL, 2019. [Paper]
  3. Anju Jose Tom, Sudhish N. George, “Tensor Total Variation Regularized Moving Object Detection for Surveillance Videos,” IEEE SPCOM, 2018. [Paper]
  4. Shijila B., Anju Jose Tom, Sudhish N. George, “Simultaneous Denoising and Moving Object Detection Using Low-Rank Approximation,” Future Generation Computer Systems, 2019. [Paper]
  5. Shijila B., Anju Jose Tom, Sudhish N. George, “Moving Object Detection by Low Rank Approximation and l1-TV Regularization on RPCA Framework,” Journal of Visual Communication and Image Representation, 2018. [Paper]