Experience#
Notable consulting projects#
The projects below represent a selection of my consulting work from 2019–2026. They are not listed in chronological order and do not capture every engagement. Together, they highlight the flexibility, ownership, and delivery focus I bring to my work.
Monitoring and Assurance for HK Development Bureau#
Context: The Government of Hong Kong oversees more than 500 active capital projects spanning a range of infrastructure and development initiatives. Managing this volume and complexity of data requires automated tools that detect anomalies and outliers, enabling timely issue identification and quicker intervention.
Contributions: As a Data Scientist, I developed and implemented a new set of forecasting models trained on historical data from projects of varying durations and budgets. Our simulations showed that these models reduced errors in forecasting cost and schedule overruns by over 20%.
Client Feedback:
"Can we extend the models to offer additional capabilities?" — DEVB management
LLM-Powered Data Analyst for Rice University#
Context: An LLM-powered data analyst enhances decision-making by quickly analyzing complex datasets, revealing patterns, and delivering actionable insights. This boosts efficiency, minimizes manual effort, and democratizes access to advanced analytics, resulting in more informed and timely decisions. The HR department at Rice University has shown interest in implementing such a system.
Contributions: As a Data Scientist, I created an agent-based system to translate natural language questions into SQL queries by integrating large language models (LLMs) with our database schema and data distribution. However, due to the limitations in LLM accuracy, we shifted to a simplified schema incorporating common table expressions (CTEs) to establish better control. This change significantly improved the neural typeahead and SQL generation for data analysts, resulting in more practical and high-quality outcomes.
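The simplified-schema idea can be illustrated with a minimal sketch. The table, column names, and the stand-in for the LLM output below are hypothetical and are not taken from the project: a CTE exposes a small curated view, and generated SQL is written only against that view.

```python
import sqlite3

# Hypothetical raw table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, dept TEXT, salary REAL, active INTEGER);
    INSERT INTO employees VALUES
        (1, 'HR', 50000, 1),
        (2, 'HR', 70000, 1),
        (3, 'IT', 90000, 0),
        (4, 'IT', 80000, 1);
""")

# The CTE plays the role of the simplified schema the LLM is prompted with.
SIMPLIFIED_SCHEMA_CTE = """
WITH active_staff AS (
    SELECT dept, salary FROM employees WHERE active = 1
)
"""


def run_generated_query(generated_select: str) -> list:
    """Prepend the curated CTE to a SELECT written against `active_staff`."""
    return conn.execute(SIMPLIFIED_SCHEMA_CTE + generated_select).fetchall()


# Stand-in for LLM output answering "What is the average salary per department?"
rows = run_generated_query(
    "SELECT dept, AVG(salary) FROM active_staff GROUP BY dept ORDER BY dept"
)
print(rows)
```

Keeping the filtering and joining logic inside the CTE means the generated query only has to get a simple SELECT right, which is one plausible reading of "better control" here.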
Client Feedback:
"Amazing job, everyone!" — Vice President of Technology Transformation and Innovation
AI in Smart Contracts for Decentralized Finance#
Context: Smart contracts can utilize the outputs of AI models to orchestrate and direct on-chain transactions. Extending the trust modeling of blockchain, how can you verify that the outputs from a machine learning inference API are trustworthy and produced by the model you're paying for, rather than being generated by a less expensive model or fabricated?
Contributions: As a Data Scientist, I contributed to the development of a robust verification framework for authenticating machine learning predictions, addressing the issue by generating cryptographic fingerprints at key inference stages. Through a concrete demonstration of the prover and verifier protocol, I showcased its applicability to various ML inference processes, including ensembles, agentic workflows, streaming, and batching of inferences.
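The protocol itself is not detailed here. As a rough sketch of the general idea only, the following chains one SHA-256 fingerprint per inference stage so that changing any stage changes every later fingerprint. The stage names, payload fields, and function names are assumptions made for illustration, and a real scheme would also need a way for the verifier to obtain the stage data independently.

```python
import hashlib
import json


def fingerprint(prev: str, stage: str, payload: dict) -> str:
    """Hash the previous fingerprint together with this stage's data."""
    record = json.dumps(
        {"prev": prev, "stage": stage, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def prove(stages: list) -> list:
    """Prover: emit one chained fingerprint per (stage, payload) pair."""
    chain, prev = [], ""
    for stage, payload in stages:
        prev = fingerprint(prev, stage, payload)
        chain.append(prev)
    return chain


def verify(stages: list, chain: list) -> bool:
    """Verifier: recompute the chain from the claimed stages and compare."""
    return prove(stages) == chain


# Hypothetical stages of a single inference call.
stages = [
    ("model", {"id": "model-a", "weights_digest": "abc123"}),
    ("input", {"prompt": "Is this transaction risky?"}),
    ("output", {"label": "low_risk"}),
]
chain = prove(stages)

# Swapping in a different model invalidates the published chain.
tampered = [("model", {"id": "cheaper-model", "weights_digest": "zzz999"})] + stages[1:]
print(verify(stages, chain), verify(tampered, chain))
```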
Client Feedback:
"Would you like to join us as Chief AI Scientist?" — CEO
Railway Construction Cost Forecasting#
Context: High Speed 2 (HS2) is a high-speed railway under construction in England. The project has become controversial due to delays and cost increases, with cost projections in the £35-45bn range. Planners need up-to-date estimates for monitoring and assurance.
Contributions: As a Data Scientist, I have trained forecasting models by leveraging empirical data from both current and past projects, reporting the forecasted project progress during mid-delivery and comparing the results with existing Reference Class Forecasting (RCF) estimates. These forecasts are being used by the senior leadership to direct future strategic moves.
Client Feedback:
"Thanks to Michele and the team for preparing the work; it presents a powerful new way of forecasting." — Risk Management Lead
Recommendation Systems for Researchers#
Context: Researchers must work out which journals are most relevant for publishing their work. Publishers are in a unique position to push recommendations.
Contributions: As a Data Engineer and Data Scientist, I researched and implemented a family of recommender systems (SQL-based probabilistic models) that are used on 400 community pages and serve 5m researchers with a CTR of 5.6%. More recently, I implemented improvements to the algorithm logic that increased the CTR by 36%.
Client Feedback:
"Would you like to join us full-time?" — Head of Product, Researcher Profile
Text Classification for Regulatory Compliance#
Context: Local Planning Authorities (LPAs) in the UK rely on written representations from the community to inform their Local Plans, which outline development needs for their area. With an average of 2,000 representations per consultation and 4 rounds of consultation per Local Plan, the volume of information can be overwhelming for both LPAs and the Planning Inspectorate tasked with examining the legality and soundness of plans.
Contributions: Streamlined text analysis for both Local Planning Authorities and Inspectors by using AI to pre-fill text categories and topics. Simulations demonstrated that the percentage of correct predictions rises from 61% to 89% as the share of verified labels grows from 10% to 50%, significantly increasing planning effectiveness.
Client Feedback:
"We really got the maximum value for money out of this project" — Senior Data Scientist at the UK Planning Inspectorate
Performance Forecasting for Megaprojects#
Context: A megaproject is a large-scale, complex venture that takes many years to develop and build, involves multiple stakeholders, and impacts millions of people. Examples include building a new train station. Effective management requires monitoring progress, forecasting delays and costs, and making budget adjustments.
Contributions: Development of an AI-based early-warning system for spotting high-risk projects, including outlier prediction and forecasting future portfolio spending. The system uses data from a total of 2,700 years of combined construction activity, with an aggregate cash flow of USD 60bn.
Client Feedback:
"This is excellent." — Head of Data Science
Contact Center Forecasting for Furniture Online Shop#
Context: In contact centers, forecasting involves predicting future demand for communication channels like phone, chat, and email, as well as determining the number of agents required to handle that volume. The process relies heavily on analyzing past data on volume, which can reveal patterns such as seasonal fluctuations and long-term growth or decline.
Contributions: Research and deployment of a family of AI models to forecast the phone contact volume, reducing the mean absolute percentage error (MAPE) by 30% with an estimated business impact of USD 5m-10m in savings per year.
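For reference, MAPE averages the absolute forecast error relative to the actual value. The sketch below uses made-up call volumes, not project data, and assumes the 30% figure is a relative reduction (for example, from a MAPE of 20% to 14%).

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)


# Illustrative daily call volumes (hypothetical numbers).
actual = [100, 100]
baseline_forecast = [80, 120]
improved_forecast = [86, 114]

baseline_mape = mape(actual, baseline_forecast)
improved_mape = mape(actual, improved_forecast)

# Relative reduction of the error metric between the two forecasts.
relative_reduction = (baseline_mape - improved_mape) / baseline_mape
print(round(baseline_mape, 2), round(improved_mape, 2), round(relative_reduction, 2))
```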
Client Feedback:
"I wholeheartedly endorse Michele and his efforts, as his contributions have been instrumental in the successful implementation and deployment of the forecast models." — Senior Manager, Machine Learning
Demand Forecasting for Textile Industry#
Context: Forecasting demand is crucial in manufacturing, particularly in the textile industry, to optimize production. The textile production process is intricate and requires careful planning and execution, taking into account factors such as staffing, logistics, and various stages of fabric production, including pre-treatments, dyeing, printing, and finishing. This process can take several months. Manufacturers aim to minimize delivery times and unsold stock to remain competitive, which means producing the appropriate quantity at the appropriate time.
Contributions: As a Data Scientist, I conducted a thorough exploratory analysis of the historical data. I identified data quality concerns and opportunities to integrate additional data sources to improve the accuracy of the demand forecasting model. I developed and evaluated the model and presented the findings in a detailed report. The technologies used for this task were AWS Forecast, Jupyter Lab, Pandas, and Matplotlib.
Client Feedback:
"Thank you very much from the entire team for helping us." — CEO, Consulting agency
Machine Learning Workshop for Mobility as a Service Provider#
Context: Mobility as a Service (MaaS) encompasses a variety of digital tools aimed at streamlining transportation for both passengers and transport operators. These tools include trip planning, booking, ticketing, payment, and updates for passengers, as well as fleet management, demand forecasting, predictive maintenance, and optimized fixed and demand-responsive transit for operators. In order to facilitate the Company's transition to a MaaS provider for transport operators, I proposed and led a workshop on this topic.
Contributions: I led a three-day remote workshop for 20 participants, comprising sales and management personnel, introducing them to the concepts of Big Data, including its motivations, architectures, and strategies for integrating with legacy systems. The workshop also covered Machine Learning, including challenges in implementing ML services and opportunities in the field of mobility. We delved into specific use cases of relevance to the Company. The workshop concluded with a discussion on best practices for managing data products.
Client Feedback:
"Clear overview of ML methods with examples applied to our business." — Pre-Sales Analyst
Discovering and Mitigating Inaccuracies in Tax Filings#
Context: In certain financial services, such as tax filing, refunds are granted to customers based on their responses to questionnaires. However, these responses may not be entirely accurate or consistent, which could impact the refund calculations. To address this issue, mitigation strategies include reducing uncertainty through bounds, providing estimates with quality guarantees, automatically correcting inconsistencies, and manually reviewing and correcting any discrepancies.
Contributions: As a Data Scientist and Machine Learning Engineer, I assisted in defining the business problem, acquiring and integrating the training data, constructing predictive models, conducting experiments with domain experts for validation, and implementing the prediction APIs in production systems. The technologies used for this project were Google AI Platform, Google App Engine, Flask, Kubernetes, Snowflake, Jupyter Lab, Scikit-learn, and various ML algorithms including regression, binary classification, multi-class classification, and quantile prediction.
Client Feedback:
"Would you like to join us full-time as Lead Data Scientist?" — Head of Data
Multi-Touch Marketing Attribution Modeling#
Context: An attribution model is a system of rules that assigns credit to various touchpoints in a customer's journey to conversion. These touchpoints may include clicking ads, visiting blog posts, using referral codes from influencers or partners, and organic search. Attribution models vary in the way they weigh the significance of different touchpoints, such as first-click, last-click, and multi-touch. Understanding a customer's behavior leading up to conversion is essential for marketing departments to effectively measure and optimize their operations and allocate their budget.
Contributions: As a Data Scientist, I developed a multi-touch attribution model that included data cleaning and integration, a tracking model for extracting information on converted visitors, an attribution model that assigns credit to visits and users, and marketing analytics to provide insights. The technologies used for this project were Piwik/Matomo data model, MySQL, PostgreSQL, SQLite, custom-built visit-user matching with data provenance, and handling of missing attribution data.
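The attribution rules named in the context can be sketched in a few lines. This is an illustration only: the function name and touchpoint labels are hypothetical, and the equal-weight ("linear") rule is just one common multi-touch scheme, not necessarily the weighting used in this project.

```python
from collections import defaultdict


def attribute(journey, model="linear"):
    """Split one conversion's credit across the touchpoints of a journey."""
    if not journey:
        raise ValueError("journey must contain at least one touchpoint")
    credit = defaultdict(float)
    if model == "first_click":
        credit[journey[0]] = 1.0
    elif model == "last_click":
        credit[journey[-1]] = 1.0
    elif model == "linear":
        # Equal share per touch; repeated channels accumulate credit.
        share = 1.0 / len(journey)
        for touchpoint in journey:
            credit[touchpoint] += share
    else:
        raise ValueError(f"unknown attribution model: {model}")
    return dict(credit)


# Hypothetical journey of one converted visitor, in chronological order.
journey = ["ad", "blog", "referral", "ad"]
for name in ("first_click", "last_click", "linear"):
    print(name, attribute(journey, name))
```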
Client Feedback:
"Thanks, Michele – what are we going to do when you finish." — Senior Performance Marketing Manager
Optimization in Public Transportation#
Context: In urban public transportation, bus routes and schedules are frequently modified to align with service demand, fleet and driver availability. These changes are typically carried out through manual processes and rely on the expertise of individuals, which can lead to suboptimal decisions being made.
Contributions: As a Data Scientist and Transport Engineer, I designed a comprehensive solution that estimates demand, simulates fleet operations to analyze various scenarios, and determines optimal adjustments to the service. The solution aims to improve key business performance indicators such as network service costs and profits, driver mileage, passenger waiting times, and CO2 emissions. Technologies used in this project include SUMO, Spark, Origin-Destination estimation, and simulation-aided optimization.
Client Feedback:
"We are delighted to work with Michele, which is helping us with the preparation of grant proposals and prototyping." — CEO
Visual Analysis of Pharma Business Processes#
Context: Pharmaceutical companies are subject to stringent regulations regarding the handling of confidential information, product development, and management of medical trials. These processes can be complex, interconnected, and hard to grasp. Analyzing the recorded logs of business processes is a valuable tool that can reveal these complexities and provide insights that can be used to drive automation and optimization through analytical services.
Contributions: As a Data Scientist, I developed an application component that allows for the visualization of business processes and emphasizes different areas of interest. This component has been integrated into a comprehensive Business Intelligence dashboard. The technologies used in this project include Node.js, Vue, TypeScript, and yFiles for diagramming.
Client Feedback:
"That's it for me, well done you." — Freelance Data Scientist with 25+ years of experience in consulting
Public Health and Contact Tracing for COVID-19#
Context: Contact tracing, when implemented consistently, can interrupt the spread of infectious diseases and is a crucial public health strategy for managing disease outbreaks. By integrating mobile phone usage data with other information, public health organizations can assess the situation on a country-wide scale, detect potential risks such as crowded areas and track the contacts of potentially infected individuals.
Contributions: As a Data Scientist and Research Engineer, I evaluated the methodology and algorithms of an existing contact tracing system. Based on my findings, I provided a series of recommendations that have been implemented, resulting in improved performance and more accurate outcomes. The technologies used in this project include Elasticsearch, Python, Pandas, data cleansing, and trajectory mining algorithms.
Client Feedback:
"Thank You, Michele!" — CEO
Adaptive Traffic Signal Analysis & Optimization#
Context: Urban traffic modeling and analysis is a critical aspect of traffic management and control. Its goal is to anticipate congestion states and suggest improvements to the traffic network. One such improvement is traffic signal control, which aims to reduce the travel time of vehicles by coordinating their movements at intersections.
Contributions: As a Data Scientist and Research Engineer, I constructed an urban traffic simulation model utilizing noisy sensor measurements and used reinforcement learning techniques to determine the best traffic light plan. The technologies used in this project include Flink, Kafka, SUMO microscopic agent-based traffic simulator, Java, Python, Jupyter Lab, NetworkX, probabilistic modeling methods, and simulator-in-the-loop optimization strategies.
Client Feedback:
"We truly appreciate the effort that Michele put into our collaboration. He is a skilled data scientist that gets things done. We will definitely consider him for other projects in the future." — Principal Research Engineer and Team Lead
City-Scale People's Movement Analytics#
Context: Mobile devices establish regular connections to the cellular network to receive messages, make calls, and transfer data. The logs generated by these connections offer a detailed view of the population's movement in urban areas and can be used for a variety of purposes, such as optimizing out-of-home advertising.
Contributions: As a Data Engineer and Data Scientist, I designed and developed a GDPR-compliant analytics service that offers population demographics information for urban areas by analyzing people's location and movements using cellular network data. The service ingests billions of records per day. The technologies used in this project include AWS EMR clusters, AWS S3, AWS Lambda, AWS CloudWatch, HDFS, Parquet, Hadoop, Spark, Scala, Python, Zeppelin notebooks and Docker.
Client Feedback:
"Michele is a very talented data scientist with excellent data engineering skills. He contributed several fundamental components of our location intelligence platform. I highly recommend Michele and would love to get the chance to work with him again." — Data Science Capability Manager for ITS Data Lab, Siemens Mobility
Location Intelligence for Retail Analytics#
Context: WiFi radio signals emitted from devices such as smartphones can be used to determine their location within indoor environments. This data can be used in retail analytics to identify high-traffic areas, evaluate the success of promotions, and inform the placement of products for improved sales.

Source: https://www.telefonica.de/news/press-releases-telefonica-germany/2017/05/advanced-data-analytics-for-a-better-shopping-experience-telefonica-next-takes-over-start-up-minodes.html
Contributions: As a Senior Data Engineer and later Lead Data Scientist, I restructured the WiFi analytics pipeline using Spark jobs, resulting in more efficient data and ML pipelines and a double-digit percentage improvement in accuracy. Technologies used included PostgreSQL, Airflow, Celery, Cassandra, AWS S3, Python, Docker, JupyterLab, Scikit-learn, Pandas, Fabric, and Flask.
Client Feedback:
"Michele is a dedicated and thoughtful data scientist. It was enjoyable to see how well he handled tasks and produced excellent, dependable, and repeatable results. Michele is a valuable asset to any company. I am pleased that they succeeded in the Lead Data Scientist role." — Alexander Müller (Lead Data Scientist), now Founder & Managing Director Workist
Cloud Robotics & Drone Charging Stations#
Context: To function properly, fully autonomous robots need remote management services such as fleet management, predictive maintenance, and teleoperation, as well as the ability to charge autonomously for extended and unattended operations.
Contributions: As the founder and CTO, I oversaw the creation, development, and launch of charging stations and protective hangars for commercial drones, along with related management and connectivity services. My efforts resulted in contracts with over 30 international clients, including the CIA, NASA, NIH, Google X, Parrot, and Stanford University. Technologies used included ROS, Python, C++, Flask, Docker, Fabric, OpenCV, and Jupyter Lab.
Client Feedback:
"Michele perfectly fits dynamic managing positions, planning the work, coaching, and coordinating the team. I highly recommend him." — Advisor, CEO & Chief Data Scientist of a consulting agency
Ph.D. Research#
My research focused on modeling and querying data with uncertainty.
Context: As a Ph.D. student, I co-authored ten papers in top-tier international conferences and journals, including SIGMOD, VLDB, EDBT, KAIS, and DKE. I worked on problems related to the management and analysis of streaming and uncertain data. Moreover, I established collaborations with prominent research groups in the area of data management, namely, with the data management and data analytics groups at IBM T.J. Watson Research Center (USA), where I spent six months (during two visits) as a visiting researcher, and the Qatar Computing Research Institute (Qatar), where I spent another three months.
Contributions:
- "Improving Classification Quality in Uncertain Graphs". Michele Dallachiesa, Charu Aggarwal, Themis Palpanas. ACM Journal of Data and Information Quality (JDIQ), 2018.
- "Similarity Using Correlation-Aware Measures". Katsiaryna Mirylenka, Michele Dallachiesa, Themis Palpanas. International Conference on Scientific and Statistical Database Management (SSDBM), Chicago, USA, 2017.
- "Correlation-Aware Distance Measures for Data Series". Katsiaryna Mirylenka, Michele Dallachiesa, Themis Palpanas. International Conference on Extending Database Technology (EDBT), Italy, 2017.
- "Top-k Nearest Neighbor Search In Uncertain Data Series". Michele Dallachiesa, Themis Palpanas, Ihab F. Ilyas. Proceedings of the VLDB Endowment (PVLDB) Journal 8(1), 2015.
- "Sliding windows over uncertain data streams". Michele Dallachiesa, Gabriela Jacques-Silva, Bugra Gedik, Kun-Lung Wu, Themis Palpanas. Knowledge and Information Systems (KAIS) Journal 45, 2015.
- "Node classification in uncertain graphs". Michele Dallachiesa, Charu Aggarwal, Themis Palpanas. International Conference on Scientific and Statistical Database Management (SSDBM), Aalborg, Denmark, 2014.
- "NADEEF: a commodity data cleaning system". Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed Elmagarmid, Ihab Ilyas, Mourad Ouzzani, Nan Tang. ACM SIGMOD International Conference on Management of Data (SIGMOD), New York City, NY, USA, 2013.
- "Identifying Streaming Frequent Items In Ad-hoc Recent Time Windows". Michele Dallachiesa, Themis Palpanas. Data & Knowledge Engineering (DKE) Journal, 2013.
- "Uncertain Time-Series Similarity: Return to the Basics". Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, Themis Palpanas. Proceedings of the VLDB Endowment (PVLDB) Journal, 2012, Turkey.
- "Similarity Matching for Uncertain Time Series: Analytical and Experimental Comparison". Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, Themis Palpanas. ACM SIGSPATIAL International Workshop on Querying and Mining Uncertain Spatio-Temporal Data (QUeST), Chicago, USA, 2011.
Advisor Feedback:
"Michele is able to grasp new ideas and concepts, and in general, learn fast. The leading abilities of Michele are evident in his work: he is capable of independent thinking, and of delivering novel and effective solutions to hard problems. He combines a very solid theoretical background with excellent practical skills." — Advisor, Prof. Themis Palpanas, Paris Descartes University