Measuring Impact Through Innovation: A Follow-Up

In our previous post, we gave a short introduction to how AI is unlocking new opportunities for supporting MSMEs and for rethinking impact measurement. As a participant in YBI's High-Flyer program, we are seeking to deliver and scale our support to MSMEs, and we aim to follow IPA's Best Bets for Emerging Opportunities for Impact at Scale as guiding principles in our work with these new technologies.

*For additional background, see our earlier article on the genesis of this experiment.

Our goal for this experiment at ONOW AI is to find a way to increase business profitability through a direct, ongoing, user-friendly tracking system for MSMEs. This system avoids traditional ledger-based methodologies and uses an AI to guide and structure a conversation in a more familiar format for the millions of businesses that find existing tools too difficult to use or of limited value to their daily operations. It also provides automated data-driven insights from data that is extracted and structured.

If you would like to try the live data prototype tool, you can test it for yourself. However, please note that it is an extremely early prototype and far from complete. We've also deployed it on a low-cost server to save our limited resources, and it likes to take a nap every 30 minutes. So if you have issues, please refresh your page and it should awake from its slumber!

While the technical development is particularly exciting for our team, we're also seeking to share tools with real MSMEs to collect feedback, guide our next steps, and highlight new opportunities that we may otherwise miss. Read on for some of the exciting achievements, challenges, and the strides we've made in this innovative journey after sharing this early prototype with MSMEs in Myanmar.

Core Components of the Experiment:

With limited time and resources, we chose a few core components to test the feasibility of our ideas before moving forward. Each of these areas is primarily targeted at creating direct value for MSMEs with an eye towards increasing user adoption and retention in the future.

AI-Guided Chat
Automated AI Parsing and Data Structuring
Automated Data Dashboard and Actionable Insights

AI-Guided Chat

A familiar, chat-based conversation that begins by asking the user for their current cash balance. From this starting point, the AI is tasked with guiding a conversation that seeks to understand the activities and dollar values that explain the change since the previous balance.
For example, if cash increased from $2000 to $2500, what happened in between that led to that outcome?
Our experience has shown that many micro businesses think in terms of cash-in-hand as a starting point. This allows them to back into the transactions that led to their outcome without needing to precisely measure each transaction.
Compared to point-of-sale (POS) systems that begin from a foundation of transactions with true-ups at the end to match cash balances, this approach works backwards from the cash balance and automatically adds a balancing "Other" category for anything missing.
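
The balance-attribution idea can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the prototype's actual code: the function name `attribute_cash_change`, the (label, signed amount) format, and the sample transactions are all hypothetical.

```python
from decimal import Decimal


def attribute_cash_change(opening, closing, reported):
    """Return the reported items plus a balancing "Other" entry.

    `reported` is a list of (label, signed_amount) pairs: inflows are
    positive, outflows are negative. (Hypothetical format for this sketch.)
    """
    opening, closing = Decimal(str(opening)), Decimal(str(closing))
    items = [(label, Decimal(str(amount))) for label, amount in reported]

    change = closing - opening
    explained = sum((amount for _, amount in items), Decimal("0"))
    unexplained = change - explained

    # Anything the owner could not recall is booked to "Other" so that
    # the entries always add up to the observed change in cash.
    if unexplained != 0:
        items.append(("Other", unexplained))
    return items


# Made-up example: cash rose from $2000 to $2500, and the owner
# recalls $900 of sales and a $300 rent payment.
entries = attribute_cash_change(2000, 2500, [("Sales", 900), ("Rent", -300)])
for label, amount in entries:
    print(f"{label}: {amount}")
```

In this made-up case the recalled items explain a $600 increase against an observed $500 increase, so the sketch books the remaining -$100 to "Other".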

Automated AI Parsing and Data Structuring

After a conversation completes, a separate AI is tasked with sorting through the unstructured and potentially noisy conversational data to format it into structured, usable data.
This process includes classifying each item as an inflow or outflow (income/expense), bucketing it into a category, extracting its value, and attaching a timestamp.
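
To make the target of this step concrete, here is a minimal sketch of what a structured record and a validation pass might look like. The field names, the `Transaction` class, the `parse_records` helper, and the sample JSON are assumptions for illustration only; the post does not specify the actual schema or the model's output format.

```python
import json
from dataclasses import dataclass
from datetime import datetime

ALLOWED_DIRECTIONS = {"inflow", "outflow"}


@dataclass
class Transaction:
    direction: str       # "inflow" (income) or "outflow" (expense)
    category: str        # bucket such as "Sales" or "Rent"
    amount: float        # non-negative value in the user's currency
    timestamp: datetime  # when the activity was reported


def parse_records(raw_json):
    """Validate model output (a JSON array) and build Transaction objects."""
    records = []
    for item in json.loads(raw_json):
        direction = item["direction"].lower()
        if direction not in ALLOWED_DIRECTIONS:
            raise ValueError(f"unknown direction: {direction!r}")
        amount = float(item["amount"])
        if amount < 0:
            raise ValueError("amount must be non-negative; direction carries the sign")
        records.append(Transaction(
            direction=direction,
            category=item["category"].strip().title(),
            amount=amount,
            timestamp=datetime.fromisoformat(item["timestamp"]),
        ))
    return records


# Made-up example of what a structuring model might return.
sample = """[
  {"direction": "inflow", "category": "sales", "amount": 900, "timestamp": "2024-05-01T09:00:00"},
  {"direction": "outflow", "category": "rent", "amount": 300, "timestamp": "2024-05-01T09:00:00"}
]"""
transactions = parse_records(sample)
```

Rejecting malformed records at this boundary keeps noisy model output from silently reaching anything built on top of the data.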

Automated Data Dashboard and Actionable Insights

From the structured data, we're able to build automated dashboards and personalized, actionable insights that are grounded in the financial data tracked by the individual.
Our belief is that tools like this can provide a valuable feedback loop that helps MSMEs see the value of improving their financial tracking discipline, because the benefit shows up directly in day-to-day operations rather than in some uncertain future hope of accessing credit that may or may not come.
*Note: This process is extremely early in our development, with many potential progressions yet to come. Our primary aim with this experiment was to test whether this was possible with existing technology, and whether users could see the potential value before we invested further time and resources in development.
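
As a rough illustration of how structured records can feed a dashboard and a simple insight, consider the sketch below. The function names, the (direction, category, amount) tuple format, and the sample figures are hypothetical, and the single rule-based insight is only a stand-in for the personalized guidance described above.

```python
from collections import defaultdict


def summarize(transactions):
    """Total inflows and outflows per category and compute net cash flow."""
    totals = {"inflow": defaultdict(float), "outflow": defaultdict(float)}
    for direction, category, amount in transactions:
        totals[direction][category] += amount
    inflow = sum(totals["inflow"].values())
    outflow = sum(totals["outflow"].values())
    return {
        "inflow_by_category": dict(totals["inflow"]),
        "outflow_by_category": dict(totals["outflow"]),
        "net_cash_flow": inflow - outflow,
    }


def largest_expense_insight(summary):
    """Produce one plain-language observation from the summary."""
    outflows = summary["outflow_by_category"]
    if not outflows:
        return "No expenses were recorded in this period."
    category, amount = max(outflows.items(), key=lambda kv: kv[1])
    share = amount / sum(outflows.values())
    return f"{category} was your largest expense ({share:.0%} of spending)."


# Made-up records in (direction, category, amount) form.
records = [
    ("inflow", "Sales", 900.0),
    ("outflow", "Rent", 300.0),
    ("outflow", "Supplies", 100.0),
]
summary = summarize(records)
insight = largest_expense_insight(summary)
```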

Progress Made

Main Achievements

User Testing in Myanmar: We conducted a test with 30 business owners in Myanmar using our live data prototype. Despite minimal explanation and guidance, 27 of the 30 participants showed interest in the tool, and 7 used it enough to give feedback.
Positive Feedback on Dashboard Visuals: One business owner expressed her appreciation for the dashboard visuals, noting that it saved her time on analysis and data visualization while giving personalized guidance that was relevant and useful to her unique situation.
Potential for Adoption: Even with an incomplete UX, the results suggest that business owners are likely to use the tool if it offers automated dashboards and data-driven insights that can impact their business quickly. Our past work has demonstrated the importance of familiar tools like Facebook Messenger or WhatsApp to our target users, so we see this as an exciting validation for further investment in the tool.

"I love to see the dashboard visual. It saved me time in doing analysis and data visualization." - Test Group Business Owner

Challenges and Solutions

Early Stage AI Implementation: The prototype and AI are in their infancy, aiming for "good enough" performance to gather feedback rather than being production-ready. We achieved approximately 80% accuracy with this approach, which is sufficient for testing but not for production.
Resource Constraints: With a small team and limited resources, we developed the experiment over six weeks. The technical team took a full-stack approach, learning new coding skills, especially in front-end deployment. We find that this short and iterative cycle gives us the ability to "ship" something useful, if incomplete. Then we can test it with real users before determining next steps.
Training and Guidance Issues: Our Myanmar business coaches were not fully trained on the tool before the demonstration, leading to some confusion. This unintentionally "double-blind" setup still provided valuable insights, but it can be greatly improved in the future.
Data Quality and Security Concerns: As with many data-driven applications, users were curious about the potential for errors or data privacy issues. Our future products will need to clearly address these areas from a quality and security perspective. 80% accuracy is okay for a prototype, but we will need far higher accuracy if we hope to help business owners on a day-to-day basis in a reliable way. We also aim to create user validation and correction tools that are missing from this iteration due to time constraints.

*In this example, the AI incorrectly duplicated the $100 found in the "Other" category. This was likely due to the complexity of the conversation, which involved inflows and outflows netting in different directions, and to the AI providing a sub-summary in the conversation that caused confusion on the back-end. It illustrates the kind of issue we need to address before we are ready to offer this solution to business owners.
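
One possible safeguard against this kind of duplication is a reconciliation check that compares the structured entries with the change in cash the user reported. The sketch below is only an assumption about how such a check could look and is not part of the prototype described here; the function name and all figures are made up rather than taken from the actual test conversation.

```python
def reconciliation_gap(opening, closing, entries):
    """Return recorded total minus the observed cash change.

    `entries` is a list of (label, signed_amount) pairs. A result of 0
    means the structured data reconciles with the reported balances.
    """
    expected = closing - opening
    recorded = sum(amount for _, amount in entries)
    return recorded - expected


# Made-up figures: the "Other" entry of -100 has been duplicated.
entries_with_duplicate = [
    ("Sales", 900),
    ("Rent", -300),
    ("Other", -100),
    ("Other", -100),
]
gap = reconciliation_gap(2000, 2500, entries_with_duplicate)
needs_review = gap != 0  # flag the record for the user to check
```

A non-zero gap does not say which entry is wrong, but it gives a cheap signal for routing a record to user review.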

Connecting the Dots

Real-World Testing

Rather than developing solely in-house, we are committed to building iteratively and deploying in small increments with real users. Due to resource constraints, this initial version was tested in English only, with non-native speakers in Myanmar. However, we have combined recent models such as Meta's Llama 3 and GPT-4o with our multi-modal approach and machine translation to allow users of many languages to benefit from these advancements in AI. We strongly believe that this real-world testing helps us to create a more impactful solution that leads to better user adoption, and ultimately better impact measurement for the ESO and economic development industry.

Tracking Impact

Impact measurement is crucial for organizations like Enterprise Support Organizations (ESOs) and funding bodies aiming to support small businesses and achieve the UN's Sustainable Development Goals (SDGs) related to economic development. Traditional tracking methods often rely on anecdotes or annual qualitative surveys, which provide little value to business owners. We have found that the automated, structured data from this approach offers a new source of granular data that tracks impact at an individual level over time. Because it delivers immediate value to the business owner, it encourages continuous engagement and, in turn, highly valuable impact measurement over an extended period. To be clear, this is designed to create new data sources that are specifically tailored to business improvement for the MSME and to impact measurement for the supporting organizations and funders.

User Experience and Validation

Despite some UX issues, we received validation from real users that an automated dashboard and advice system based on their inputs can be a powerful tool. This feedback reinforces the potential for MSMEs to regularly use our system to improve their businesses. Unlike other financial tracking systems and point-of-sale (POS) systems, our approach is built with the recognition that we should expect missing and incomplete data. AI and LLMs are creating new opportunities to build resilience to this messy data and to better fit the realities of micro businesses in developing countries without forcing rigid tools upon them.

Some Possible Next Steps:

While many of the results have been extremely exciting, we're just scratching the surface of this opportunity. During development and user testing, we found a number of areas for improvement in the versions to come. While an exhaustive list is beyond the scope of this post, a few areas include:

A more fully-integrated product for MSMEs. This experiment resulted in more of a feature test than a full product. Our future experiments aim to combine this functionality with tools such as personalized AI learning, connections with real and AI-based mentorship, and connections with more familiar tools like Messenger or WhatsApp.
An automated connection with supporting organizations. While we are aiming to create value for MSMEs directly, we recognize the value of improving impact measurement for corresponding support organizations. Stay tuned, as we are also working on creating a platform that connects MSMEs with support organizations to provide AI-guided mentorship and impact measurement.
A stronger AI-system using Agents. While this experiment used multiple features from platforms like LangChain that go beyond simple prompt-engineering, it stopped short of a full, agent-based approach, which we expect could improve performance substantially beyond the 80% we see today. This is true for both the front-end chat guidance and the back-end structuring and parsing. It could also allow users to mix-and-match between balance-attribution and transaction-based methods of financial tracking.
Income Statement and Balance Sheet Tracking. While this current version stopped at simple cash-based tracking, we could extend the application to track basic account balances in an income statement and balance sheet. This would allow us to create more powerful tools like projections and pro-formas, and to account more accurately for non-cash events like credit sales or purchases.
User-guided validation and feedback. Providing easy-to-use features for users to correct, flag, and re-enter data will be pivotal to user experience, model refinement, and usability.
Action-oriented guidance. The current data dashboard is likely not the best methodology for our target users, who often seek more action-oriented and pre-interpreted advice. More work will be needed on the output to ensure high-quality and dynamic advice for users of many business types.
Additional data sources and modalities. The current method accepts text-only input, but it's easy to imagine a system that allows users to input data through text, image, or audio, or one in which external data sources feed and augment the experience. Tools like GPT-4o are making this easier and more accessible to our small team by the day. Our current aim is to prioritize cash-based users who likely don't have access to formalized receipts or other documents that optical character recognition (OCR) systems can read. However, speakers of many of our target languages tend to be heavier users of speech-to-text, so audio input will be incredibly valuable for users in the future.
LLM Model Refinement. While our current solution mixes a variety of models including GPT-4o and Llama 3, we would like to further improve and refine our solution to balance the tradeoffs between Accuracy-Cost-Speed to provide a better output and user experience.
Scalable Business Practices. While models like GPT-4o are quick and easy to use, they can become expensive when projected to our target reach of millions of business owners worldwide. As we continue to assess product-market fit, it is likely that we will move further towards open-source LLMs such as Meta's Llama series to make the solution more viable for our target audience.

*An example of the data structuring power of GPT-4o. There are many systems that can integrate this data already. So our primary aim is not to duplicate those systems, but rather to provide multiple data entry styles for our target business owners that exist in a messy world with messy data.

Conclusion

Our journey to create a user-friendly, AI-driven financial tracking tool is early, but we're progressing in exciting ways. By enhancing our chat-based, AI-driven data collection and improving the user experience, we're trying to build a product that MSMEs are excited to use - and a product that creates a valuable source of impact measurement. Stay tuned for more updates as we continue to progress and refine our approach.

What are your thoughts on our progress? Do you have any suggestions or questions? We'd love to hear from you! Please share your comments below.