
Breaking Down Data Silos with a Data Fabric, or not? A CIO's Guide


In today's fast-paced business environment, enterprise CIOs face the persistent challenge of operational and data silos. These silos segregate data, creating barriers that inhibit its seamless flow across departments, systems, and business units. Traditionally, organizations have tried to dismantle these silos through various change initiatives, aiming to foster a data-driven culture, implement enterprise-wide data integration strategies, encourage cross-functional collaboration, and promote data governance practices. However, these projects often fall short of their goals, proving difficult to complete and leaving organizations with lingering inefficiencies.

It's essential to recognize operational and data silos as forms of operational debt and data debt, closely related to the concept of technical debt.


Just like with technical debt, it's generally more effective to bypass operational and data debt rather than mitigate them, unless there are additional business reasons to rework data and applications (like enhancing the customer experience). This is where a data fabric can be a game-changer.

A data fabric creates a hybrid integration layer that ensures changes in one system do not disrupt others. This innovative approach eliminates the need to break down operational and data silos solely to obtain actionable training data for Large Language Models (LLMs). By implementing a data fabric, organizations can streamline their data integration processes, enhance data accessibility, and foster a more agile and responsive business environment.
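To make the decoupling concrete, here is a minimal sketch (with hypothetical system and field names) of the pattern a hybrid integration layer relies on: each siloed system gets its own adapter into a shared canonical model, so a change in any one system touches only its adapter, never the consumers.

```typescript
// Canonical record shape consumed by LLM pipelines, reports, and apps.
// All names here are illustrative, not from any specific product.
interface CanonicalCustomer {
  id: string;
  email: string;
  region: string;
}

// One adapter per siloed system; consumers never see silo-specific schemas.
type Adapter<Raw> = (raw: Raw) => CanonicalCustomer;

// A legacy CRM with flat, cryptically named columns.
const fromLegacyCrm: Adapter<{ CUST_NO: number; EMAIL_ADDR: string; TERR: string }> = (r) => ({
  id: String(r.CUST_NO),
  email: r.EMAIL_ADDR,
  region: r.TERR,
});

// A newer SaaS system with nested JSON.
const fromSaasCrm: Adapter<{ id: string; contact: { email: string }; geo: string }> = (r) => ({
  id: r.id,
  email: r.contact.email,
  region: r.geo,
});

// If either source system changes, only its adapter is reworked; every
// consumer of CanonicalCustomer keeps working, which is the point of the fabric.
```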

Key Benefits of a Data Fabric

  1. Seamless Data Integration: A data fabric enables the seamless flow of data across disparate systems, providing a unified view of information. This integration is crucial for CIOs looking to harness the full potential of their organization's data assets.

  2. Reduced Operational Complexity: By bypassing the need to dismantle silos, a data fabric reduces operational complexity and minimizes the risk of disruptions. This approach allows for smoother transitions and more efficient data management.

  3. Enhanced Data Governance: Implementing a data fabric supports robust data governance practices, ensuring data quality, security, and compliance. This is particularly important for organizations dealing with sensitive information and regulatory requirements.

  4. Agility and Scalability: A data fabric provides the flexibility to adapt to changing business needs and scale operations efficiently. This agility is vital for staying competitive in a rapidly evolving market landscape.

  5. Improved Decision-Making: With a unified and accessible data infrastructure, decision-makers can leverage real-time insights to drive strategic initiatives and make informed choices.

Implementing a Data Fabric: Best Practices

  1. Assess Your Current Data Landscape: Conduct a thorough assessment of your existing data infrastructure to identify silos and areas of improvement.

  2. Define Clear Objectives: Establish clear goals for your data fabric implementation, aligned with your organization's strategic priorities.

  3. Invest in the Right Technology: Choose a data fabric solution that fits your organization's specific needs and integrates seamlessly with your existing systems.

  4. Foster a Collaborative Culture: Encourage cross-functional collaboration and buy-in from all stakeholders to ensure successful implementation and adoption.

  5. Monitor and Optimize: Continuously monitor the performance of your data fabric and make necessary adjustments to optimize its effectiveness.

Conclusion

For enterprise CIOs, overcoming the challenges posed by operational and data silos is crucial for driving innovation and achieving business goals. A data fabric offers a powerful solution, enabling seamless data integration, reducing operational complexity, and enhancing data governance. By implementing a data fabric, organizations can unlock the full potential of their data, fostering a more agile, efficient, and responsive business environment.

Embrace the power of a data fabric to transform your organization's data strategy and pave the way for a data-driven future.


Using Generative AI with Clean Data to Survive in Shark Infested Waters: Culture and Innovation (Part 4)

In part 4 of this blog post series, we discuss how culture and innovation can either drive or block adoption of data driven Generative AI at exponential scale and what to do about it.

Introduction

Innovation implies change, and this change can often be disruptive to stable organizations, teams, partner networks, and other business/social ecosystems. The rise of Generative AI with a Data Fabric built according to Lean and Agile principles is just such a disruption.

Just like in natural ecosystems, a culture of adaptability becomes more advantageous in volatile business/social ecosystems. Conversely, being highly adaptable can hinder the ability to take advantage of stability. As a result, those who have been benefiting from a culture that reinforces stability will ultimately lose out when confronted by more adaptable and agile counterparts better able to take advantage of advancements in data integration and AI, unless leadership can make their culture readily embrace adaptability and a tolerance for risk.

Considering that teams, organizations, and entire economies can be described as business/social ecosystems, there are parallels with the workings of biological ecosystems. Exploring these connections can deepen our understanding of how changes unfold in ecosystems in general. This understanding can be crucial in discerning how cultural shifts and advancements in technology impact the management and utilization of data for Artificial Intelligence (AI) applications. It can also offer valuable insights into how and when businesses integrate new data-driven AI capabilities within their operational frameworks.

The inflection point for biological and business ecosystems

When stable ecosystems face disruptions either from external factors or the internal evolution of populations, they attempt to find a new balance. This transition, however, is not a gradual one. As the disruptive process unfolds, the ecosystem reaches a tipping point, leading to an exponential acceleration of change. Once this tipping point is reached, equilibrium is lost, and populations within the ecosystem rapidly decline due to the disruption as they are replaced by more nimble or resilient competitors.

Consider the example of coral reef ecosystems. As ocean temperatures continue to rise, expansive reef systems like the Great Barrier Reef have already undergone significant decline and continue to deteriorate. I personally witnessed this change a couple of years ago when my family visited the Alligator Reef lighthouse in the Florida Keys. Having snorkeled there frequently in the mid-1990s, I could clearly see how the coral reef had transformed over the past twenty-five years.

The contrast was stark: the once abundant sea fans, bustling schools of fish, and thriving live coral had given way to a few remaining sponges, a sparse population of fish, and a seafloor scattered with bleached coral remains. Despite this disheartening reality, there is a glimmer of hope, as coral reefs are gradually starting to regenerate. Nevertheless, this regeneration process is expected to take millions of years without intervention.

Marine ecologists, such as Nichole Price of the Bigelow Laboratory for Ocean Sciences in Maine, have documented the migration of coral species towards latitudes between 20 and 35 degrees north and south of the Equator, driven by warming ocean temperatures. Concurrently, other researchers, like those at the Lirman lab at the University of Miami, are cultivating corals with genetic traits that enhance their adaptability to higher water temperatures and pollutants.

In the coral reef ecosystem, it is the specific coral species and their populations that demonstrate higher adaptability—through migration and resistance to elevated water temperatures—that are surviving compared to the previously larger but less adaptable populations. This is because there is an energy cost associated with adaptability in stable biological ecosystems, which limits the success of more adaptable organisms during times of equilibrium and provides them an advantage only when the ecosystem encounters disruptive inflection points. The same is true of social ecosystems as well.

66 million years ago – After benefiting from 165 million years of stability, dinosaurs became extinct while adaptable mammals thrived.

This dynamic of adaptability and its cost exists across various levels, from teams and business units to entire industries and the human population as a whole. For instance, the value placed on traits and behaviors demonstrating adaptability becomes more apparent when faced with an existential threat to an entire species.



In social ecosystems, inflection points are driven by fear and greed

Inflection points within all ecosystems follow an exponential pattern. In biological ecosystems, these points typically indicate an exponential surge in the rate of change among one or more species' populations. In social ecosystems, however, inflection points manifest differently, marking shifts in behavior. Above all, behavioral shifts stem from either a significant event, such as the attack on Pearl Harbor, or a culture of innovation born out of such an event, like the inception of a new company or entry into a new market, as seen during the early stages of companies like Apple, Facebook, and Netflix.

Beneath these transformations lies fear: fear of external threats, the fear of missing out (FOMO), the fear of defying cultural norms and losing organizational support, the fear of lagging behind competitors due to insufficient innovation, the fear of being perceived by others as less than capable. And greed: the desire to dominate new markets as they emerge and existing markets as they are disrupted. Exponential growth in awareness in social ecosystems creates both fear and greed in the minds of participants and drives the behavioral changes indicative of an inflection point.

The inflection point in Generative AI is not just about the exponential rise in awareness; it's about how that awareness creates these emotions. It is fear and greed that ultimately drive greater adoption and technological advancement. It encompasses the exponential growth of FOMO, the fear of job displacement due to AI, and the apprehension about losing personal and/or organizational competitiveness. Consequently, an increasing number of individuals are actively incorporating AI into their daily lives, initiating an exponential transformation in the human social ecosystem. This surge in AI usage attracts substantial funding for the development of new AI capabilities, leveraging AI as a competitive advantage that fuels additional revenue and profits.

While some may argue that excitement serves as an intrinsic catalyst for behavioral change, this notion primarily applies to a small percentage of individuals within most social ecosystems. Consider the Technology Diffusion curve, as popularized by innovation pioneer Geoffrey Moore, which emphasizes that only 2.5% of the total population typically leads such changes.

The situation is even more challenging in the realm of Information Technology (IT). IT leaders are typically selected for their ability to maintain the stability of systems, such as overseeing upgrade projects and IT business systems, rather than fostering disruptive innovation. Consequently, excitement alone cannot drive significant innovation and change within organizations, unless the majority of organizational members are carefully chosen from a pool of innovators, as often seen in startup environments. This is a key reason why large corporations, government entities, and other major organizations rarely spearhead disruptive changes. Instead, they tend to evolve gradually and incrementally over time unless catalyzed by a significant triggering event.


The rise of AI, Lean Data, and the Data Fabric

Just as external forces can disrupt ecosystems, evolutionary changes to behaviors of one or more participants can disrupt other participants as the ecosystem attempts to restore equilibrium. The rise of Artificial Intelligence is just such a change. While one might argue that AI is an evolution in humanity itself, that is not what I mean and is a question for another day. What I mean is that the use of AI by individuals represents an adaptive behavior that is in the process of disrupting all levels of social ecosystems as AI technologies evolve and adoption increases.

In biological systems, a brain, functioning as a natural form of intelligence, requires a nervous system to connect and process sensory information and carry commands to the body's muscles and organs. Similarly, an AI instance requires a "digital" nervous system that pre-processes and delivers clean data to AI instances in a way that is secure and compliant with data privacy requirements. AI also needs a way to issue commands to digital and physical systems for automation and other use cases. And all of this must occur at an exponentially higher speed and scale than traditional approaches to data integration. The data fabric (at least as I define it) is exactly that: a digital nervous system for AI.

But for Generative AI to reach its potential with sensitive enterprise data, data pipelines need to be exponentially quicker and cheaper to build and maintain. To make that possible, building a data fabric using the principles of Lean Data is critical.

Just as Lean Manufacturing principles help streamline manufacturing assembly lines, Lean Data (See my blog post on Lean Data here) is a set of principles and processes that accelerates the rate of change in building pipelines that deliver data to and from AI instances.

Generative AI’s inflection point

Since the beginning of 2023, I've had numerous discussions with technologists, CTOs, researchers, and other professionals concerning the potential disruption brought about by AI. Despite the customary excitement surrounding emerging technologies (anyone remember the Blockchain hype?), many remain doubtful about the recent buzz around AI, often pointing out that AI has been in existence for decades. What sets the current state of AI apart, however, is the increased awareness among the general public (not just data scientists) about the practical applications of AI in their daily lives. Consequently, generative AI tools such as ChatGPT are experiencing exponential growth in usage, an inflection point to be sure. This is the AI inflection point where the disruption of roles, companies, industries and societies begins in earnest.

In biological ecosystems, an inflection point occurs when an exponential function, such as birth or death rates, reaches a critical threshold, causing the growth rate to dwindle and eventually turn negative, leading to an accelerated decline in the population. In the case of coral reef ecosystems, reef structures begin to deteriorate due to a higher rate of coral organism deaths (bleaching) compared to the rate of new coral growth. This triggers an exponential process where the dwindling number of organisms are unable to reproduce, while a growing percentage of coral organisms perish due to ecosystem stress.

Similarly, in social ecosystems, a parallel exponential process unfolds, but in the opposite direction. Just as elevated water temperatures have triggered an inflection point (collapse) in coral reef ecosystems, the perception of AI as a disruptor to the existing order has prompted a shift in people's behaviors, leading to an inflection point in the disruption of various organizations and industries that make up social ecosystems.

However, depending on how one defines the ecosystem and its constituent populations, several organizational and industry ecosystems are either nearing, have already reached, or have surpassed the disruptive inflection point of AI. Moreover, the use of AI is not a singular point of disruption, as various types of AI exist at different stages of maturity and adoption. On the other hand, data fabrics have not undergone a similar inflection point. This is not because exponential growth in the availability of clean data, a critical building block for AI-driven disruption, poses no threat to the stability of industries and organizations. Rather, it is because the need for data fabrics built with the principles of Lean Data has not been broadly recognized by those charged with managing data: there has been no precipitating event, most data scientists and engineers have had little exposure to Lean/Agile principles, and most data professionals tend to be risk averse and linear in their thinking.

The Lean Data Fabric’s inflection point

In social ecosystems, the inflection point is when the realization of a disruptive change in a specific group reaches a stage where an increasing number of participants begin to take action based on that awareness, leading to exponential growth. Geoffrey Moore's concept of the "Chasm" in "Crossing the Chasm" precisely represents this inflection point, where the adoption of new technologies expands rapidly from "early adopters" to "mainstream adopters." Notably, this expansion pertains to actual users embracing the technology.

Regarding Generative AI, the term "user" now encompasses anyone with computer access and an internet connection. While it remains vital to apply Lean Data principles to construct Data Fabrics that can efficiently deliver clean data at an exponential scale as the Generative AI landscape evolves, the composition of the team is critical. Teams and organizations responsible for data integration primarily originate from the data science community, with expertise in probability and statistics, mathematics, data modeling, analysis, and artificial intelligence. However, they often lack exposure to operational disciplines such as Lean/Agile, IT operations management, and process automation. And most importantly, there is no precipitating event for most data science organizations to overcome the status quo and embrace a disruptive change in the way they manage data, at least not yet.

Many data organizations might be talking about Data Fabrics, but they are looking at them as a sustaining innovation (as a technology), not as a disruptive innovation (as a new way of doing things). This will not work for most. Clayton Christensen, a thought leader on innovation, discusses extensively in his writing that no technology is disruptive by itself. Rather, it is how the technology is employed, the business model, that makes it disruptive or sustaining. The value of the Lean/Agile Data Fabric is that it allows organizations to change the way data integrations are built; if organizations don't change the way they operate, these efforts will fail to create the value business leaders are expecting from their data integration and AI initiatives.

This is exactly why many IT organizations, 20 years after the introduction of the principles of DevOps (the application of Lean/Agile to software development), still struggle with implementation. Similarly, the Data Fabric (implemented with Lean Data) represents the application of Lean/Agile to data integration. I expect the transformation in data integration to play out in a similar way, albeit on a compressed timeline, given that the need for data fabrics to support generative AI is a far more powerful precipitating event than the one for DevOps (user-driven demand for more efficient software delivery).

Life in the food chain – Strategies for surviving and thriving

Many technology people fancy themselves "disruptors", but the reality is that technology people are just like everyone else. They're raising families and building careers to create safety for themselves and their families and trying to gain recognition in their communities. Most people get jobs at seemingly stable companies where disruption meets significant pushback. But the winds of change are blowing, driven by advances in AI and other precipitating change events. It's important to recognize that unless you are on the cusp of retirement, AI is going to disrupt your life, and you will experience much greater success if you are the disruptor (not the disruptee). But when and how to go about this?

Situational Awareness

To begin, it's critical to develop situational awareness of the stage of disruption for each AI and operational data component in each ecosystem you call home, including your team, your greater organization, and your industry. Finding answers to the questions below is a great start, but you'll also need to do your homework to understand the current state of AI, including what the possible use cases are, both current and emerging.

  • Which versions of AI, Data Fabrics, and Lean Data are reaching an inflection point in your ecosystems? You’ll need to understand this for each AI type/use case and supporting systems like Lean Data and technologies like the Data Fabric as well as how all those pieces fit together to enable business outcomes.

  • Are there disconnects where one technology component (e.g. Generative AI with enterprise data) has reached an inflection point, but supporting technologies (e.g. Lean/Agile Data Fabric) have not?

  • Team/Individual

    • How is your personal productivity being improved with the emerging technology?

    • Are others in similar roles outside the team using the new technology to greater or lesser effect?

    • Are other team members using the new technology to greater or lesser effect?

    • How resistant to change are other team members? Are they emotionally invested in the status quo?

  • Business unit/Company

    • Is the company reaching an inflection point? To what extent has fear/greed kicked in with respect to the broader leadership team in the adoption of AI for the company?

      • With AI. For the most part, business executives recognize that AI has to be part of their core business strategy, but that doesn’t necessarily mean right now.

      • With Lean/Agile Data Fabric. It's very unlikely that senior leadership recognizes that AI needs a better method of ingesting clean data if an AI strategy is to succeed, but there will be individuals (remember the 2.5%) who do, and you will need to partner with them.

  • Industry

    • Is the industry reaching (or has reached) a precipitating event where the entire industry is about to be (or is being) disrupted?

    • How would various combinations of Generative AI with Lean/Agile Data Fabrics negate the coming disruption or position your company (or team) to dominate through that disruption?

    • Does your team, unit and/or other company leadership understand the implications of the coming disruption and are they motivated to challenge the status quo as a result?

Develop your strategy – don’t push too far too fast

Now that you've established situational awareness, it's time to get to work. This is where you put together your strategy to navigate around the defenders of the status quo (or convert them) and partner with the innovators. It's critical to remember, though, that you can't expect individuals or teams to act unless they are both aware of the benefits of the new technologies and methodologies AND motivated to act despite the risk of change. Instead, the approach is to keep new technology adoption from being perceived as threatening by established individuals and teams, or to convince those participants to join the initiative if they are open to being educated on the subject and likely to have an emotional reaction once they realize what is at stake.

A couple of years ago, I had a conversation with a senior VP (now retired) at UPS who claimed Amazon was not a disruptive threat, even though Amazon represented more than 10% of UPS package deliveries and was competing directly with UPS, with little blue vans scouring suburban neighborhoods. And then there's Amazon's history of using its vendor partners' and customers' own data to compete with them directly. In such cases, it's important to recognize that such an obviously wrong opinion is rooted in emotion and is very difficult to change. It's better to work around these leaders than to spend a lot of time trying to change their opinion, at least until you can point to specific business outcomes you can give them credit for.

Also, if an organization has too many of these folks, it's probably better to think about moving on; that ecosystem is too far from an inflection point to be able to embrace change. It may also be the right choice for that organization at that point NOT to change until a precipitating event occurs, meaning you either embrace the status quo yourself or move on.

To build momentum, it’s also important to use a Lean/Agile approach for delivering business value. Few leaders at this point are willing to wait years for a waterfall project to yield fruit.

Partner with fellow disruptors

Even if you're a CEO, you can't build an organizational culture of innovation that rewards adaptability over stability simply by hiring. You have other executives whose replacement would be too disruptive to current operations, and then of course, there's the board. Consequently, it's critical to identify and partner with individuals in the current ecosystem who are open to challenging the status quo because they feel (or can be made to feel) fear, greed, or excitement about the existential threat and opportunity of using Generative AI with enterprise data. It's also critical to make identifying these attributes part of the hiring process.

Walt Carter, in his book "We Can't Stay Here: Becoming A Great Change Captain," discusses the need to bring "misfit toys" (from the 1964 TV special "Rudolph the Red-Nosed Reindeer") into your organization to support change initiatives. By this, he is referring to individuals who exhibit adaptability, a preference for collaboration over competition, and a tendency to prioritize "us" over "me": traits often penalized because most organizations prioritize stability over adaptability, yet critical for any ecosystem seeking to get ahead of being disrupted.

Sell the car not the assembly line

The first Industrial Revolution was one of the most disruptive, yet overall positive, events in human history. And it all started with the rise of enabling systems and technologies: capitalism and the corporation, machines that harnessed non-biological forms of energy for manufacturing and mining, and nationalism to provide both funding and regulation. But one of the often overlooked drivers of the first Industrial Revolution was the invention of the printing press in 1436. This invention unlocked an exponential advance in the ability of humans to communicate ideas broadly, which made it possible for social ecosystems to reach inflection points in years rather than centuries.

The disruption of the Roman Catholic ecosystem via the Protestant Reformation is one such example. When Martin Luther nailed his "Ninety-five Theses or Disputation on the Power and Efficacy of Indulgences" to the door of the Wittenberg Castle Church in 1517, that was an inflection point in a social ecosystem if there ever was one. And it was made possible because Luther was able to use the printing press to quickly and widely share his ideas in a way that made millions fearful that the Catholic Church had become too greedy and corrupt. This same type of mass communication in a more modern context is exactly why Generative AI has reached an inflection point, but Lean/Agile Data Fabrics have not. Martin Luther was able to reach outside of the clergy (a specialized group of experts unwilling to self-disrupt) to the broader population and create an emotional response in that much larger group to drive disruption in the specialist group.

Nearly 400 years later, Henry Ford achieved something similar when he disrupted the emerging automotive industry. While Ransom Olds received the first assembly line patent in 1901, the assembly line was nothing new; a similar process was being used by meatpackers in Chicago as early as the 1870s. In fact, it was a visit to a Swift and Company slaughterhouse by one of Ford's employees, William "Pa" Klann, sometime around 1906 that is widely credited with introducing Ford to the concept. Once armed with the efficiencies of the assembly line, Ford was able to build awareness in a mass market, the emerging middle class, and use that awareness to create FOMO among new middle-class consumers who wanted something that had previously been available only as a luxury good because of cost. Unfortunately for most of Ford's competition, their engineers were not adaptable enough to navigate the shift to mass production and their car companies ultimately failed.

It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change.
— Often attributed to Charles Darwin

It was the geometric growth in awareness and corresponding motivation to act in the larger ecosystem of middle-class consumers that created the disruption inflection point in the smaller ecosystem of the automotive industry. The inflection point was therefore not the advent of the assembly line in automotive manufacturing. The inflection point came when awareness that middle-class people could now afford a car grew exponentially, to the point where the emergence of the new middle-class market disrupted the automotive industry. To be successful innovating with Generative AI with data delivered by a Lean/Agile Data Fabric, you must do the same: sell the business outcome, not the assembly line.

Your results must link directly to something your team, division, and company care about; getting vertical alignment like this takes a lot of time and effort, but it is a requirement for success. The business outcomes you choose to tackle must also be ones that map most easily to each level in the hierarchy.

Run the Skunk Works strategy for ecosystems that are behind

You've already done your homework and achieved situational awareness, so you know the business outcomes possible by adopting AI. Now to navigate the landmines. While the ideal situation is one where there is a culture of innovation and everyone is motivated to collaborate and invest personally in transformation, that is an unlikely scenario. More likely, you will have some number of leaders in the organization who will try to block you because of their stake in the status quo, even if your company and/or industry is experiencing significant disruption. The best strategy here is to find and partner with one or more business champions who do want disruptive change and are willing to support your initiative from a budgetary perspective as well as work around the organizational obstacles. This will only work if your partners in data-driven AI disruption have political clout significantly greater than the naysayers and you've carefully selected projects that solve business cases for data-driven AI that:

  • Avoid interference with existing systems protected by political interests in the organization (even if they work for you)

  • Complete the initial install and first sprint in less than three months

  • Create, in the first sprint, a clear and significant business impact that you don't need to be technical to understand

  • Create integration work that is easily repurposed for other items on the business's AI and/or data wish list

Know when to move on

In some cases, you'll finish your investigation into where the people in your ecosystems stand with respect to being disrupted by AI and Lean Data via Data Fabrics and realize you cannot foster the cultural changes needed to prepare for the coming disruption. A successful strategy for avoiding ecosystem disruption with data-driven generative AI may not be possible because of the current state of culture in that ecosystem. You may have to decide to play the long game by waiting for the political landscape to shift and/or a company/industry inflection point, content yourself with working in an ecosystem on the wrong end of disruption (good for folks nearing retirement), or start the process of moving on now. Forcing the issue by fighting unwinnable political battles is not a good strategy; you'll likely lose and suffer a loss in reputation as you're pushed out anyway.

Steve Sasson, inventor of the Digital camera in 1975

This is actually a common situation. While Steve Sasson invented the digital camera in 1975, the leadership of his employer, Kodak, refused to market a digital camera until it was too late. Sasson continued working on digital cameras for Kodak until his retirement in 2009, and although Kodak earned royalties from the patent, the leadership's unwillingness to cannibalize their traditional photography business ultimately led the company to bankruptcy. Did Steve Sasson make a mistake staying at Kodak until the end? It depends on his priorities. Just like Steve Sasson, some of us will have to make hard choices depending on our values, goals, priorities, and where we are with respect to our careers.

Conclusion

It's important to recognize that unless you are on the cusp of retirement, AI is going to disrupt your life, and you will experience much greater success if you are the disruptor (not the disruptee). It all comes down to the culture of the social and business ecosystems you are a part of and when those ecosystems reach the inflection point where exponential growth in awareness turns into a change in behavior en masse.

The pace of adoption of Generative AI and Lean Data at an exponential scale requires innovators to recognize and take advantage of disruptive inflection points. This necessitates fostering a culture of innovation and aligning with individuals open to change. Understanding the role of fear and greed in driving transformation is crucial, as seen in the adoption of AI. Recognizing the need for agile methodologies in data integration underscores the importance of putting the principles of Lean Data to work building a Data Fabric that can source the clean data needed at exponential scale to take advantage of Generative AI with enterprise data.

Please share!


Tyler Johnson

Cofounder, CTO PrivOps


Using Generative AI with Clean Data to Survive in Shark Infested Waters: Lean Data (Part 3)

In part 3 of this blog post series, we discuss how a data fabric can be used to implement techniques borrowed from lean manufacturing to optimize the time required to integrate training data for LLMs and maximize business results.

Introduction

With all the hype around generative AI, it’s not surprising many organizations are incorporating AI into their strategic plans. The problem is, without clean training data, large language models (LLMs) are worthless. 

As organizations increasingly recognize the power of Artificial Intelligence (AI) in unlocking the value of data, the process of providing high-quality training data for LLMs is critical. In part 3 of this blog post series, we discuss how a data fabric can be used to implement techniques borrowed from lean manufacturing to optimize the time required to integrate training data for LLMs and maximize business results.

In many ways, the current state of data integration resembles pre-industrial manufacturing. Instead of an assembly line approach, individual "data craftsmen" (also known as data engineers) working in small teams, or in many cases single IT "heroes," build bespoke data architectures that don't scale. This is very similar to the state of software application development before the advent of DevOps.

Now that organizations are open to the idea that AI is a key component of future competitiveness, they'll soon realize that raw data is the input that AI converts into business outcomes; this fact will drive organizations to borrow concepts from lean manufacturing, essentially creating data factories of their own. "Lean Data" is closely related to "Industry 4.0," but whereas Industry 4.0 describes all cyber-physical systems, Lean Data concerns itself with the optimization of data manufacturing (a data pipeline) as part of a data factory (many data pipelines).

10 Key Concepts of Lean Data

Value

A core principle of Lean Data is to align data integration efforts with customer needs, cost optimization, and other AI-driven efforts to improve an organization's competitiveness. This is similar to Lean Manufacturing but extends beyond product value to all elements of an organization's competitive strategy. A data fabric provides a unified and holistic view of the data ecosystem, enabling organizations to focus on minimizing time to business value. Unlike a data lake or data lakehouse, a data fabric creates the agility needed to "start with the end in mind," that is, to design an AI strategy focused on business outcomes and work backward to design the AI/data system, taking advantage of the ability to change data pipelines on the fly as the business changes. Utilizing agile methodologies is a key component of value in Lean Data and will be the topic of a future blog post in this series.

 

Value Streams

An effective data fabric approach facilitates the mapping of data flow in the integration process by providing a comprehensive view of data movement across the organization. By understanding how data flows through the fabric, organizations can optimize the integration pipeline, ensuring that the right data reaches the model training stages efficiently.

 

Flow

When implementing Lean Data, data fabrics must ensure a smooth, efficient, and continuous data flow by integrating data from various sources in real time, in batch, or in between, depending on the requirements for each business outcome. In 1984, Eliyahu Goldratt introduced the concept of the "Theory of Constraints" in his seminal book, "The Goal." Connectivity is a critical limiting factor in delivering clean training data to LLMs and other data monetization efforts. To minimize these constraints, a data fabric must support the broadest set of connection methods for both legacy and modern information systems. This includes not just modern interfaces like APIs and cloud storage, but SQL databases, flat files, SFTP sites, and other legacy data communication methods. A best practice is to leverage open source JavaScript for connectivity because of the broad array of JS connectors and software development kits (SDKs) supported by IT vendors, which creates a force multiplier: third-party vendors keep components up to date through vulnerability detection and patching of their own JS connectors. We are talking integration here, not data analytics, where there are other purpose-built options in Python, R, and other programming languages.

The situation is not static; as business objectives (value) evolve, bottlenecks including physical constraints, business constraints, process constraints, and, most importantly, people constraints will emerge. This requires a data fabric approach that facilitates identifying and addressing bottlenecks as they occur with a policy-driven approach, like the software-defined and infrastructure-as-code (IaC) approaches seen in DevOps.
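As a rough illustration of the "connect to anything" point (hypothetical names, not any particular product's API), a fabric can hide modern and legacy sources behind a single connector contract. The REST example below uses the standard fetch API; the legacy example is left as a stub where a real implementation would wrap a vendor's SQL driver or an SFTP reader.

```typescript
// One contract for every source, so connectivity stops being the constraint.
interface Connector {
  name: string;
  read(): Promise<Record<string, unknown>[]>;
}

// Modern source: a REST API, read with the standard fetch API.
const apiConnector = (url: string): Connector => ({
  name: `api:${url}`,
  read: async () => {
    const response = await fetch(url);
    return (await response.json()) as Record<string, unknown>[];
  },
});

// Legacy source: a placeholder; a real connector would wrap the vendor's
// SQL driver, an SFTP client, or a flat-file reader behind the same contract.
const sqlConnector = (dsn: string, query: string): Connector => ({
  name: `sql:${dsn}`,
  read: async () => {
    throw new Error(`connect a SQL driver for ${dsn} and run: ${query}`);
  },
});

// Downstream pipeline steps only ever call Connector.read(), never the
// source-specific protocol, so adding a new source adds no downstream work.
```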

 

Pull

Lean Data enables a pull-based approach to data integration, where data is integrated on demand as required by the outcome. Instead of pushing all available data, the data fabric must have the capability to dynamically pull training data from relevant sources, either on demand (event-driven) or according to a schedule, depending on the use case. The data fabric must also be able to implement automation that enables LLMs and other data requestors to request specific data subsets, thus reducing unnecessary data processing and storage costs.
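Here is a minimal sketch of what pull-based integration can look like (names are hypothetical): a requestor declares the source, the subset of fields it needs, and whether to pull on demand or on a schedule, and the fabric never moves more than that.

```typescript
// A consumer's pull specification: which source, which fields, which trigger.
interface PullSpec {
  source: string;
  fields: string[]; // only the subset the requestor actually needs
  trigger: { kind: "on-demand" } | { kind: "schedule"; everyMs: number };
}

type Fetch = (source: string) => Promise<Record<string, unknown>[]>;

// Pull and project: nothing outside the requested subset is processed or stored.
async function pull(spec: PullSpec, fetchRows: Fetch) {
  const rows = await fetchRows(spec.source);
  return rows.map((row) =>
    Object.fromEntries(spec.fields.map((field) => [field, row[field]])),
  );
}

// On-demand (event-driven) requests run once; scheduled requests repeat.
function run(spec: PullSpec, fetchRows: Fetch) {
  if (spec.trigger.kind === "schedule") {
    setInterval(() => void pull(spec, fetchRows), spec.trigger.everyMs);
  }
  return pull(spec, fetchRows);
}
```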

 

Perfection

Lean Data promotes continuous data quality improvement by incorporating data governance and validation mechanisms. This ensures that data is accurate, reliable, and compliant with quality standards before being integrated into training datasets, leading to higher model performance. While many consider the evolution of data privacy regulations a source of data friction, it need not be, because the underlying issue is trust: a data fabric that incorporates standardized capabilities for consent management, data privacy masking, data lineage, validation, logging, and error reporting actually increases trust and facilitates both broader sharing of data and continuous improvement via agile processes.

 

Empowerment

Lean Data Empowerment is all about being able to trust your employees and stakeholders with significant tasks, including access to enterprise data through LLMs. LLMs that incorporate both foundational LLM training datasets and enterprise data are subject to data leakage that can put businesses at risk. Although vendors like Microsoft have announced LLM offerings that offer commercial protection from enterprise data leakage (like Bing Chat Enterprise), protecting only against leakage outside the organization is not enough. Users, and by extension the LLMs they use, need protection against data leakage between user roles as well. As an example, if an organization were to feed all sales data into an LLM, how would it prevent salespeople from poaching each other's leads by accessing sales data through the LLM? A data fabric must provide the ability to govern the flow of and access to sensitive data, whether accessed directly by users or by LLMs through automation.
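To make the sales example concrete, here is a minimal sketch (field and role names are purely illustrative): the fabric filters what a requestor, human or LLM, is allowed to see before any rows reach a prompt or training set.

```typescript
// Who is asking, and what rows exist. All names here are illustrative.
interface Requestor { userId: string; roles: string[]; }
interface Lead { ownerId: string; account: string; stage: string; }

// Row-level governance: reps see only their own leads; managers see all.
function visibleLeads(requestor: Requestor, leads: Lead[]): Lead[] {
  if (requestor.roles.includes("sales-manager")) {
    return leads;
  }
  return leads.filter((lead) => lead.ownerId === requestor.userId);
}

// The fabric applies this filter before any data reaches the LLM, so the
// model can answer questions about sales data without leaking leads
// between user roles.
```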

 

Standardization

In Lean Manufacturing, standardization refers to documenting steps and the sequences of those steps to create standardized tasks. Lean Data refers not just to the documentation of steps (or components) and the sequencing of those steps in data pipelines, but to the standardization of the data pipeline components themselves. By leveraging a data fabric with a minimal set of standard pipeline components, organizations can not only establish and enforce standardized data integration pipeline templates, but also drastically reduce the complexity, time, and cost required to build and maintain data pipelines, which results in reduced time to data and reduced time to decision. An effective data fabric approach will utilize a minimum set of data pipeline components and a drag-and-drop user interface (UI) that makes sequencing steps simple. By defining consistent data pipeline components, an effective data fabric approach also ensures uniformity across all data pipelines, minimizing integration complexities.
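To illustrate what a minimal set of standard components can look like (a sketch, not a description of any particular fabric product), a pipeline can be expressed as an ordered sequence of a few reusable step types, and a template is simply a named sequence of those steps.

```typescript
type Row = Record<string, unknown>;
type Step = (rows: Row[]) => Row[];

// Two of the standard, reusable component types every pipeline is built from.
const mapFields = (mapping: Record<string, string>): Step => (rows) =>
  rows.map((row) =>
    Object.fromEntries(
      Object.entries(mapping).map(([from, to]) => [to, row[from]]),
    ),
  );

const filterRows = (keep: (row: Row) => boolean): Step => (rows) => rows.filter(keep);

// Running a pipeline is just applying the steps in order.
const runPipeline = (rows: Row[], steps: Step[]): Row[] =>
  steps.reduce((acc, step) => step(acc), rows);

// A standardized "template": a named sequence of standard steps that can be
// copied and tweaked for the next data source instead of rebuilt from scratch.
const hrToTrainingTemplate: Step[] = [
  mapFields({ EMP_ID: "employeeId", DEPT: "department" }),
  filterRows((row) => row.department !== "Executive"),
];
```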

 

Just in Time

In Lean Manufacturing, just-in-time (JIT) refers to methods that reduce flow times in manufacturing systems and improve response times to customers and suppliers. While a data fabric can optimize data processing by enabling just-in-time data integration with policy-defined data pipeline components, Lean Data JIT also refers to the just-in-time creation and change management of the data integration pipelines themselves as well as their outputs. The ability to apply agile methodologies to data integration is a requirement for meeting the value principle of Lean Data; consequently, an effective data fabric approach seeks to drive the change management cost of building and maintaining data pipelines to zero where possible and to minimize it elsewhere.

 

Visual Management

A data fabric approach in support of Lean Data must offer real-time monitoring and visualization of data integration processes through intuitive user interfaces. Teams must be able to not only track data flow, processing times, and errors, but also provide reporting for the purposes of security and compliance.  This empowers IT operations to make informed decisions and address issues promptly, cybersecurity professionals to perform security audits and build in security by design, and compliance professionals to ensure compliance with relevant data privacy and security regulations through data privacy by design.

 

Efficiency (Waste)

Waste in Lean Manufacturing refers to reducing or eliminating everything that does not add value (the seven wastes): things like excess transportation costs, inventory, idle time, overprocessing, defects, etc., to improve product quality and reduce production cost and time. In Lean Data, instead of the seven wastes, we have the 10 efficiencies. As with the wastes in Lean Manufacturing, Lean Data efficiency refers to addressing traditional forms of waste: work and other costs that don't add value. In addition, it also refers to opportunities to improve efficiency not traditionally thought of as sources of waste. In other words, Lean Data Efficiency seeks to eliminate waste while minimizing rework (or technical debt remediation) and maximizing reuse. Data fabrics can play a key role in optimizing the 10 Efficiencies of Lean Data:

 

The 10 Efficiencies of Lean Data

Change Management. Data fabrics minimize the cost of change management by making all parts of data pipelines configurable and able to be automated via policies.  Data users can request schema changes via change requests that are fulfilled in minutes instead of days. A data fabric’s modular, standardized components also make it possible to quickly connect to new data sources and reuse existing work to create new data pipelines by copying existing pipelines or parts of pipelines.  

Minimize rework. To minimize rework, a data fabric approach must be able to:

  • Connect to legacy systems easily to bypass technical debt (until other business value exists that justifies refactoring that technical debt)

  • Use modular data pipeline components to minimize the amount of work required to rework existing data pipelines. This maximizes the ability of operators to implement changes by only needing to modify small parts of data pipelines instead of starting from scratch. For example, if an HR department wanted to change HR systems, the data fabric only requires a change in the connector and mapper; all other downstream data pipeline components remain unchanged (see the sketch after this list).

  • Eliminate dependencies between IT systems.  Data fabrics isolate changes between existing IT systems by not requiring data changes in existing systems of record.  Instead, the data fabric can easily transform input data into digestible output data via a low-code pipeline management interface and automation.

  • Minimize data pipeline sprawl. An effective data fabric approach requires data fabrics that have a hierarchical catalog system for managing both data pipelines and pipeline components.
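Here is a small sketch of the HR example from the list above (system, field, and helper names are hypothetical): downstream steps are written once against a canonical shape, and a vendor swap replaces only the mapper that feeds them.

```typescript
type Row = Record<string, unknown>;
type Step = (rows: Row[]) => Row[];

// Downstream components are written once against the canonical employee shape.
const downstream: Step[] = [
  (rows) => rows.filter((row) => row.active === true),   // keep current staff
  (rows) => rows.map(({ nationalId, ...rest }) => rest),  // privacy masking
];

// Only the mapper (and the connector behind it) changes when HR swaps vendors.
const oldHrMapper: Step = (rows) =>
  rows.map((row) => ({ employeeId: row.EMP_NO, active: row.STATUS === "A", nationalId: row.SSN }));

const newHrMapper: Step = (rows) =>
  rows.map((row) => ({ employeeId: row.id, active: row.status === "active", nationalId: row.taxId }));

// The rest of the pipeline is untouched by the swap.
const runPipeline = (mapper: Step, rows: Row[]): Row[] =>
  [mapper, ...downstream].reduce((acc, step) => step(acc), rows);
```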

Maximize Reuse. With a modular, composable architecture for building data pipelines, data fabrics maximize reuse by making it possible for operators to copy and modify existing data pipelines.

Talent. Data fabrics help organizations maximize the productivity of their most skilled engineers and developers by eliminating 95% (or more) of the custom development required to build data pipelines, replacing it with a low-code, drag-and-drop interface for building and managing data pipelines. As a result, organizations are able to utilize non-coding operational personnel for most data pipeline tasks. Expensive data architects and ETL developers are then able to span 100x more data pipelines.

Access Automation (Zero Trust). Given the sensitivity of data and the need to govern that data with access and identity information for both human and non-human data requestors like LLMs, manual access management processes are too inefficient to scale to the hundreds to tens of thousands of data pipelines required. Since an effective data fabric approach requires that sensitive data be governed, data fabrics with automated access management capabilities are a requirement. By integrating employee and vendor systems with identity platforms (Azure AD, Okta, etc.), requestor identity is automatically established, and data pipelines can now govern data effectively because they have accurate access and identity data at all times.
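One way to picture automated access governance (a sketch with made-up claim and policy shapes, not the API of Azure AD or Okta): the identity platform resolves the requestor's identity and group claims upstream, and each pipeline evaluates a simple dataset policy against those claims on every request.

```typescript
// Claims are issued by the identity platform and handed to the fabric;
// the pipeline never manages accounts or passwords itself.
interface Claims { subject: string; groups: string[]; }

// A dataset-level policy, evaluated automatically for every requestor,
// whether the requestor is a person or an LLM acting on their behalf.
interface DatasetPolicy { dataset: string; allowedGroups: string[]; }

function isAllowed(claims: Claims, policy: DatasetPolicy): boolean {
  return policy.allowedGroups.some((group) => claims.groups.includes(group));
}

// Example policy: only the underwriting group may pull claims history.
const claimsHistoryPolicy: DatasetPolicy = {
  dataset: "claims_history",
  allowedGroups: ["underwriting"],
};
```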

Security and Privacy Automation. In software development, DevSecOps is a term coined to refer to the idea that security is built into the application by design and from the beginning. DevOps engineers refer to this as "shift left," referring to moving security implementations and reviews to the left of a project management Gantt chart. Lean Data seeks to improve efficiency in implementing and managing cybersecurity and data privacy governance in data integrations by "shifting left" privacy and cybersecurity requirements when building data pipelines. This concept in Lean Data is referred to as "privacy and security by design." An efficient data fabric approach requires that the data fabric include standardized modular components that automate filtering in data pipelines based on consent and requestor identity.
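As a sketch of what "shifting left" can look like inside a pipeline (consent purposes and field names are illustrative), consent checks and masking run as a standard component in every pipeline rather than as a review stage at the end.

```typescript
// A record about a data subject, with illustrative fields.
interface SubjectRecord { subjectId: string; email: string; purchases: number; }

// Consent lookups would be backed by the fabric's consent-management store.
type ConsentLookup = (subjectId: string, purpose: string) => boolean;

// A standard pipeline component: drop non-consented subjects, mask identifiers.
function privacyByDesign(
  rows: SubjectRecord[],
  purpose: string,
  hasConsent: ConsentLookup,
): SubjectRecord[] {
  return rows
    .filter((row) => hasConsent(row.subjectId, purpose))
    .map((row) => ({ ...row, email: "<masked>" }));
}
```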

Data Quality. How many times have we heard CDOs and data users complain about data quality? The problem is that traditional approaches to data integration lack efficient data validation capabilities. Lean Data seeks to optimize the processes required to clean data. Data fabrics help to optimize data cleansing (a brief sketch follows the list) by:

  • Quickly building connections to data validation software services that clean data.

  • Including the capability to inject custom data checks into data pipelines.
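Here is a minimal sketch of the second point above (check names and fields are made up): custom checks are injected as ordinary pipeline components, and an external validation service would sit behind the same function shape.

```typescript
type Row = Record<string, unknown>;
// A check returns a problem description, or null when the row is clean.
type Check = (row: Row) => string | null;

const checks: Check[] = [
  (row) => (typeof row.email === "string" && row.email.includes("@") ? null : "malformed email"),
  (row) => (typeof row.amount === "number" && row.amount >= 0 ? null : "invalid amount"),
];

// Split rows into clean data for the LLM and rejects for error reporting.
function validate(rows: Row[]) {
  const clean: Row[] = [];
  const rejected: { row: Row; problems: string[] }[] = [];
  for (const row of rows) {
    const problems = checks
      .map((check) => check(row))
      .filter((problem): problem is string => problem !== null);
    if (problems.length === 0) clean.push(row);
    else rejected.push({ row, problems });
  }
  return { clean, rejected };
}
```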

Interoperability. To create efficiency, data fabrics maximize interoperability by making it possible to integrate with the widest possible set of legacy and modern systems (connectors) and by streamlining the transformation of input data models to output data models.

Transfer. Data fabrics minimize data transfer costs by making it easy to configure subsets of data (input schemas) and by making those subsets policy-defined, so operators can change pipeline input schemas on the fly and LLMs and other applications can automate input schema selection.
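To illustrate policy-defined subsets (a sketch with illustrative names, not a product feature description), the input schema lives in configuration, so narrowing what gets transferred is a policy change rather than a pipeline rewrite.

```typescript
// The input schema is data, not code, so operators (or an LLM via
// automation) can change it on the fly without touching the pipeline.
interface InputSchemaPolicy { source: string; fields: string[]; }

const orderSubset: InputSchemaPolicy = {
  source: "orders",
  fields: ["orderId", "total", "region"], // transfer only what the use case needs
};

// The fabric projects each row down to the policy's fields before transfer.
function applySchema(rows: Record<string, unknown>[], policy: InputSchemaPolicy) {
  return rows.map((row) =>
    Object.fromEntries(policy.fields.map((field) => [field, row[field]])),
  );
}
```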

Storage. Data Fabrics minimize storage costs by eliminating the need for a data lake, warehouse, or lake house in most cases by providing just-in-time access to data from the primary systems of record. Data fabrics can also be used to create new systems of record with persistent storage, but this is not an aggregated lake, lakehouse or warehouse. The only other intermediate data storage is when performance or cost constraints require data to be persisted as part of a data pipeline.

 

Conclusion

Embracing Lean Data principles with the support of a data fabric is critical for organizations seeking to unlock the true potential of AI and derive maximum value from their data assets. By aligning data integration efforts with customer needs, optimizing data flow, and ensuring continuous data quality improvement, organizations can create efficient data manufacturing processes. Lean Data's pull-based approach and standardization of pipeline components reduce complexity and time required for data integration. Agile methodologies facilitate adapting to changing business objectives and addressing bottlenecks in real-time. Through Lean Data, organizations can eliminate waste, maximize reuse, and optimize various aspects of data integration, empowering them to stay at the forefront of innovation and competitiveness in the industry.

Please share if you like this content!

Tyler Johnson

Cofounder, CTO PrivOps


Using Generative AI with Clean Data to Survive in Shark Infested Waters: Data Friction (Part 2)

We delve into sources of data friction that pose significant hurdles for utilizing enterprise data with LLMs




“Fix the wiring before you turn on the light.”




Introduction

With all the hype around generative AI, it's not surprising many organizations are incorporating AI into their strategic plans. The problem is, without clean training data, large language models (LLMs) are worthless.

As organizations increasingly recognize the power of Artificial Intelligence (AI) in unlocking the value of data, the process of providing high-quality training data for LLMs is critical. In part 2 of this blog post series, we delve into sources of data friction that pose significant hurdles for utilizing enterprise data with LLMs, discussing the nature of these sources of data friction and how organizations can utilize a data fabric to minimize data friction and fuel the success of their AI and data strategies.

Data friction can arise due to technical limitations, incompatible formats, legal or regulatory constraints, security concerns, operational inefficiencies, or lack of interoperability, among other factors. It represents a challenge to organizations seeking to leverage data as a strategic asset, as it slows down data-driven processes, introduces delays, increases costs, and undermines the overall effectiveness of data utilization. Overcoming data friction requires addressing these barriers through technological advancements, standardization efforts, policy changes, and collaborative initiatives to enable the seamless and secure movement of data throughout its lifecycle.


6 Key Sources of Data Friction

Obsolete, Outdated IT (Technical Debt)

Legacy IT systems and outdated infrastructure can impede data integration efforts for LLMs. These systems may lack the necessary compatibility, scalability, and agility to handle the volumes and complexities of data required for training models effectively. Traditionally, overcoming technical debt requires organizations to invest in modernizing their IT infrastructure, adopting cloud-based solutions, leveraging containerization, implementing robust data integration frameworks, etc. Unfortunately, these approaches rarely work for a number of reasons, notably challenges in communicating the business value of technical debt remediation and resistance to change.  Fortunately, there is a better way.  By embracing a distributed data fabric approach that emphasizes maximum interoperability and minimizes change management costs associated with data pipeline development and maintenance, organizations can significantly mitigate the impact of technical debt on AI initiatives. This approach allows them to bypass most technical debt issues and optimize data flow, resulting in a more efficient training process for LLMs.

 

Data Privacy, Security & Governance Requirements

In part 1 of this blog series, we delved into how data privacy, security, and governance are paramount concerns when integrating data into LLMs. To get the most out of their AI strategy, organizations must strike a delicate balance between obtaining valuable LLM and other ML training data, operational efficiency, and maintaining compliance with GDPR and other data privacy regulations. Embracing a data fabric that incorporates cybersecurity and data privacy by design is essential for achieving this balance.

Data Quality

The quality of training data significantly impacts the performance and accuracy of LLMs. Challenges related to data quality include inconsistencies, incompleteness, inaccuracies, and biases within datasets. To mitigate these challenges, organizations typically invest in data cleansing, preprocessing, validation, and augmentation tooling. Unfortunately, the rate of change in such tooling is enormous, and there are literally thousands of tools to choose from, offered by startups and major tech vendors alike. How do you choose the right tooling to support your AI, analytics, and digital business transformation initiatives? It's impossible to get it right: the competitive environment that drives (or should drive) IT initiatives changes just as fast as the data tooling ecosystem (or faster). But by utilizing a data fabric as a hot-pluggable backplane for data tooling, organizations can create a best-of-breed approach that future-proofs their data quality strategy and stands in stark contrast to large vendors who have a strong financial interest in limiting interoperability. This is also true when thinking about what AI tooling to adopt.

Talent Shortages

Building and maintaining data integrations for AI and other uses traditionally require skilled data engineers and IT Operations experts. However, there is a shortage of talent in these specialized fields. While organizations generally address these challenges by investing in upskilling existing teams, fostering collaborations with academic institutions, and leveraging third-party expertise through partnerships or outsourcing, these actions all come with a cost.  By utilizing a low code data fabric, organizations can:

  • Create a force multiplier for existing data pipeline development expertise by drastically reducing both the amount of coding required for data pipeline projects and the effort required for data pipeline change management.

  • Offload tasks that traditionally required data engineers, such as schema changes and data masking changes, to non-coders such as IT operations managers and analysts (see the sketch after this list).

  • Maximize the efficiency of data scientists, AI engineers and others with an order of magnitude reduction in the effort required for data pipeline changes.
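As a rough, hypothetical illustration of how masking and schema changes can move out of code and into configuration that a non-coder can edit, here is a minimal Python sketch; the field names and config layout are invented for this example.

# Hypothetical sketch: in a low-code fabric, masking rules and schema mapping live
# in declarative configuration rather than in pipeline code.
PIPELINE_CONFIG = {
    "rename": {                      # schema mapping: rename fields without code changes
        "cust_nm": "customer_name",
        "email_addr": "email",
    },
    "mask_fields": ["email"],        # adding a field here changes masking, with no redeploy
}

def apply_config(record: dict, config: dict) -> dict:
    out = {config["rename"].get(key, key): value for key, value in record.items()}
    for field in config["mask_fields"]:
        if field in out:
            out[field] = "***MASKED***"
    return out

print(apply_config({"cust_nm": "Ada", "email_addr": "ada@example.com"}, PIPELINE_CONFIG))
# -> {'customer_name': 'Ada', 'email': '***MASKED***'}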

Vendor Lock-In

Vendor lock-in occurs when organizations become overly dependent on specific technologies or platforms for data integration. This dependency limits flexibility, hampers innovation, and restricts the ability to switch vendors. The current IT industry approach hasn’t helped either. Technology vendors often create platform stickiness (lock-in) so they can:

  • Sell additional products and raise prices by controlling or limiting how their platforms work with other technologies

  • Make it more difficult to switch vendors

In addition, vendor solutions are generally designed to solve a narrow technical problem set; organizational and business process challenges are generally an afterthought. These approaches work directly against interoperability and make the process of building and maintaining data pipelines for LLM training data slower, more costly, and more likely to fail. By embracing a data fabric with a modular architecture that can connect to anything, from modern endpoints (APIs, cloud storage) to legacy systems (databases/SQL, text/CSV files, etc.), organizations gain seamless integration with multiple vendors and technologies and can adapt and evolve as rapidly as the AI and data landscape does.
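The sketch below, again hypothetical, shows what "connect to anything" can look like in practice: one small read interface covering both a legacy CSV extract and a modern JSON API, so a source can be replaced without touching anything downstream. The URL, file path, and field names are placeholders.

import csv
import json
from urllib.request import urlopen

def read_legacy_csv(path: str):
    """Legacy side: a nightly CSV extract from an on-premises system."""
    with open(path, newline="") as handle:
        yield from csv.DictReader(handle)

def read_modern_api(url: str):
    """Modern side: JSON records over HTTP from a cloud service."""
    with urlopen(url) as response:
        yield from json.load(response)

def load_records(source: dict):
    """The fabric selects a connector from config; consumers only ever see dicts."""
    if source["type"] == "csv":
        return read_legacy_csv(source["path"])
    if source["type"] == "api":
        return read_modern_api(source["url"])
    raise ValueError(f"unknown source type: {source['type']}")

# Either line below feeds the same downstream pipeline:
# records = load_records({"type": "csv", "path": "/exports/customers.csv"})
# records = load_records({"type": "api", "url": "https://example.com/customers"})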

Operational & Data Silos

Operational and data silos create barriers to data integration by segregating data and inhibiting its seamless flow across departments, systems, and business units. Traditionally, organizations attempt to break down these silos with change initiatives: shifting towards a data-driven culture, adopting enterprise-wide data integration strategies, encouraging cross-functional collaboration, fostering data sharing, implementing centralized data repositories, and promoting data governance practices. Unfortunately, these projects are difficult to complete and usually fall short. It is important to recognize operational and data silos as operational debt and data debt, close cousins to technical debt. And just like technical debt, it is generally better to bypass operational and data debt than to mitigate it. By using a data fabric, organizations create a hybrid integration layer that keeps changes in one system from affecting other systems, essentially eliminating the need to break down operational and organizational silos just to obtain actionable training data for LLMs.

Conclusion

A data fabric approach can help organizations overcome data friction and improve the training process for LLMs. It emphasizes interoperability and reduces change management costs, allowing teams to bypass much of their technical debt and integrate smoothly across different systems and formats. It also helps maintain compliance with data privacy, security, and governance requirements while extracting valuable insights. A data fabric addresses data quality challenges by providing a flexible backplane for selecting data tools, keeping strategies relevant and avoiding reliance on a single vendor. It helps organizations tackle talent shortages by reducing coding requirements and empowering existing teams. Finally, a data fabric lets organizations bypass operational and data silos, obtaining useful training data for LLMs without major organizational change. Overall, adopting a data fabric approach provides a comprehensive and efficient way to drive successful AI and data strategies.

Please share if you like this content!

Tyler Johnson

Cofounder, CTO PrivOps

Read More
Tyler Johnson

Using Generative AI with Clean Data to Survive in Shark Infested Waters: GDPR and cybersecurity (Part 1)

When it comes to training data for AI, fix the wiring before you turn on the light.






Introduction

With all the hype around generative AI, it's not surprising that many organizations are incorporating AI into their strategic plans. The problem is, without clean training data, large language models (LLMs) are worthless.

As organizations strive to harness the potential of artificial intelligence (AI), training data is critical. However, in today's data-driven landscape, data privacy and compliance regulations, such as the EU's General Data Protection Regulation (GDPR), pose massive challenges and are a significant source of data friction as organizations seek to monetize their data. There are many other sources of data friction, including organizational knowledge gaps, data and organizational silos, vendor lock-in, and technical debt, but for the purposes of this article we will focus on the importance of utilizing a data fabric for integration, security, and data privacy under GDPR, enabling organizations to obtain valuable training data for LLMs while maintaining compliance.

Key Challenges/Opportunities

  • Pseudonymization and Anonymization

  • Consent Management

  • Data Encryption and Access Control

  • Auditing and Compliance Monitoring

  • Low Code/Efficiency

Data privacy is a growing concern, and regulations like GDPR have been implemented across the globe to safeguard individuals' personal information. Compliance with these regulations is mandatory for organizations handling and processing personal data. When it comes to training Large Language Models (LLMs), organizations must adhere to the principles of data privacy, consent, and lawful processing. This is a massive challenge for most organizations because they have a mix of both legacy and modern IT systems holding sensitive data. Let's explore how a data fabric addresses key challenges related to security and data privacy.

Pseudonymization and Anonymization

Under GDPR, CCPA, and other data privacy regulations, organizations are required to protect personal data by pseudonymizing or anonymizing it. A data fabric must enable organizations to apply these techniques during the integration process by automatically replacing identifiable information with pseudonyms or removing personally identifiable information altogether. This ensures that training data used for LLMs is privacy-compliant, reducing the risk of unauthorized access or data breaches. The key is to think about change management: what is the cost of reacting to changes in the regulatory environment? Make sure any data fabric you build or buy has prebuilt data privacy components, so new integrations are compliant by design and re-work (technical debt) is minimized.
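As a minimal sketch of what pseudonymization-in-flight might look like (the key handling, field list, and function names are illustrative assumptions, not a prescription), identifying fields can be replaced with keyed, stable pseudonyms before records ever reach a training set:

import hashlib
import hmac

PSEUDONYM_KEY = b"example-key-store-in-a-vault"    # assumption: the real key comes from a secrets manager
IDENTIFYING_FIELDS = ["name", "email", "phone"]

def pseudonymize(record: dict) -> dict:
    """Replace direct identifiers with keyed pseudonyms; other fields pass through."""
    out = dict(record)
    for field in IDENTIFYING_FIELDS:
        if out.get(field):
            digest = hmac.new(PSEUDONYM_KEY, str(out[field]).encode(), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]    # stable pseudonym; not reversible without the key
    return out

print(pseudonymize({"name": "Ada Lovelace", "email": "ada@example.com", "order_total": 42}))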

Consent Management

Consent is a crucial aspect of data privacy compliance. Organizations must ensure and demonstrate they have obtained appropriate consent from individuals whose data is used for training LLMs. A data fabric must incorporate automated self-service consent management capabilities including automated masking of sensitive data unique to each data requestor. This allows organizations to track and manage consent preferences throughout the data integration process. Training data is then sourced, processed and logged in accordance with the consent given by data subjects, thereby maintaining compliance.
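A hedged sketch of the idea: before a record enters a training corpus, the fabric checks the data subject's recorded consent for that specific purpose and logs the decision. The consent store, subject IDs, and purpose names below are invented for illustration.

# Hypothetical consent store keyed by data subject and purpose.
CONSENT_STORE = {
    "user-001": {"analytics": True, "llm_training": True},
    "user-002": {"analytics": True, "llm_training": False},
}

def consented_for_training(records):
    """Yield only records whose subject consented to the llm_training purpose."""
    for rec in records:
        allowed = CONSENT_STORE.get(rec["subject_id"], {}).get("llm_training", False)
        print(f"subject={rec['subject_id']} purpose=llm_training allowed={allowed}")  # audit trail
        if allowed:
            yield rec

batch = [{"subject_id": "user-001", "text": "..."}, {"subject_id": "user-002", "text": "..."}]
print([rec["subject_id"] for rec in consented_for_training(batch)])
# -> ['user-001']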

Data Encryption and Access Control

Data security is paramount when handling personal data for LLM training. A data fabric must provide robust encryption mechanisms and automate identity and access management. By implementing encryption protocols, organizations safeguard sensitive training data, prevent unauthorized access, mitigate the risk of data breaches, and gain the fine-grained controls necessary to expose valuable data more broadly to citizen data scientists. To be truly secure while providing maximum access to data, a data fabric must follow a zero-trust model in which access management is automated, ensuring that data requestors always have exactly the right permissions. This also reduces data breach risk by eliminating the chance that over-permissioned users or "zombie" users (e.g., ex-employees and contractors) can access sensitive data.
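Here is a rough sketch of a zero-trust style access decision, where every request is evaluated against current attributes rather than standing permissions; the attribute names and policy rules are assumptions for illustration only.

from datetime import datetime, timezone

def allow_access(requestor: dict, dataset: dict) -> bool:
    """Deny by default; every condition is re-checked at the moment of access."""
    if not requestor.get("active", False):                    # "zombie" accounts get nothing
        return False
    if requestor.get("token_expiry") and requestor["token_expiry"] < datetime.now(timezone.utc):
        return False                                          # stale credentials get nothing
    if dataset["sensitivity"] == "restricted" and requestor.get("clearance") != "restricted":
        return False
    return dataset["domain"] in requestor.get("domains", [])  # least privilege by data domain

ex_employee = {"active": False, "clearance": "restricted", "domains": ["sales"]}
print(allow_access(ex_employee, {"sensitivity": "restricted", "domain": "sales"}))  # -> False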

Auditing and Compliance Monitoring

Data privacy requires organizations to demonstrate compliance and maintain records of data processing activities. A data fabric must enable comprehensive auditing and compliance monitoring, providing organizations with a centralized platform to track data integration processes, access logs, and consent management activities. This facilitates efficient compliance reporting, reducing the administrative burden on organizations.
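One way to picture this (with purely illustrative field names) is each integration step emitting a structured audit event to a central log, so compliance reporting becomes a query rather than a manual exercise:

import json
from datetime import datetime, timezone

def audit_event(actor: str, action: str, dataset: str, outcome: str) -> str:
    """Build one structured audit record, ready to ship to a central audit log."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,        # who or what touched the data (user, pipeline, service)
        "action": action,      # e.g. "read", "mask", "export_for_training"
        "dataset": dataset,
        "outcome": outcome,    # "allowed" or "denied"
    })

print(audit_event("pipeline-7", "export_for_training", "crm_contacts", "allowed"))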

Low-Code Integration for Efficiency & Scalability

As is the case with most technology projects, data integrations are traditionally project based. Because IT project requirements don’t usually take into account the effects on future work, the result is a large number of point-to-point integrations that fail to re-use prior work; each new integration project gets more expensive, more complicated, and more likely to fail. The current IT industry approach hasn’t helped either. While most vendors pay lip service to interoperability, the reality is quite different. Technology vendors create platform stickiness (lock-in) so they can:

  • Sell additional products and raise prices by controlling or limiting how their platforms work with other technologies

  • Make it more difficult to switch vendors

  • Force you to buy features you don’t want or need

Simply put, it is in the financial interest of cloud data platforms and integrators to create proprietary data structures and interfaces that make them difficult to replace when contracts end. What is needed is flexibility and efficiency, not lock-in. Organizations need a low-code, hot-pluggable data fabric for interchanging custom, open-source, and proprietary components. This is critical because organizations need to be able to swap out AI vendors and integrate new sources of training data as newer platforms emerge.

The alternatives are:

  • Build a data fabric yourself

  • Use an IT platform vendor with a walled-garden approach to data integration that limits flexibility and makes you pay for things you don't need

  • Use a best-of-breed data fabric solution that prioritizes interoperability and use of open source to create a force multiplier (like the PrivOps Matrix)

By utilizing interoperable, low-code data integration, organizations can comply with data privacy requirements in a way that scales, ensuring that only authorized and compliant data sources are used for training LLMs and other forms of Machine Learning (ML).

Conclusion

To get the most out of their AI strategy, organizations must strike a delicate balance between obtaining valuable LLM and other ML training data, operational efficiency, and maintaining compliance with GDPR and other data privacy regulations. Embracing a data fabric for integration, security, and data privacy is essential for achieving this balance. By leveraging the capabilities of a data fabric like the PrivOps Matrix, organizations can streamline data integration, ensure GDPR compliance, protect personal data, and enhance training data quality for LLMs. With these measures in place, organizations can unlock the full potential of LLMs while upholding the principles of privacy and data protection in today's data-driven world.

Tyler Johnson

Cofounder, CTO PrivOps





Read More
Development Strategies Tyler Candee

Agile Strategies for Startups: Harnessing the Power of Agility


We should consider the context in which Agile will be implemented when strategizing for startups. The objective of incorporating Agile should be to improve efficiency and productivity across the engineering department, regardless of size. We want to decrease development overhead (e.g., excessive meetings, complex workflows, redundant tasks, etc.). To better understand the benefits and challenges of using Agile in startups, let's look at various examples of good and poor implementations.

No Implementation

Without an Agile strategy, a company may struggle with a lack of flexibility, limited communication and collaboration between teams, difficulty adapting to changing customer needs or market conditions, and difficulty estimating project timelines. These struggles can lead to missed deadlines, decreased customer satisfaction, lower quality products or services, higher costs due to rework, and delays in delivery. An Agile approach provides structure that can prevent this complexity and inefficiency.

At the start of my career as a developer, I worked with a small engineering team at Far Corner Technology in Columbia, Maryland. Without any guidelines or structure to our work, we had a startup mentality that led to production and deployment issues. We tested new features in production instead of using separate testing environments and lacked structure for workloads or meetings discussing upcoming projects and initiatives. Tracking progress was also challenging since we relied on manual change logs.

Solid Implementation

A company that has a solid implementation of an Agile strategy is one that understands the value of collaboration and adaptability. Such a company will have established processes in place to ensure teams are working together efficiently, such as sprint planning meetings and retrospectives. They also understand the importance of having clear communication between team members so everyone is on the same page with tasks and goals.

At my second career job at mHelpDesk in Fairfax, Virginia, I gained hands-on experience with Agile and its practical application. As a mid-level developer on a team of 8-10 developers that continued to grow, I was introduced to proper Pull Requests and work tracking through a task management system (Jira). Through the Agile process which included Ticket Grooming, Sprint Planning, and Retrospectives, I could see the impact of my work on the customer - it was an incredibly fulfilling experience.

Additionally, I had the opportunity to take on a full stack development role building out a separate scheduling module project, which we successfully delivered within a few months. Without the Scrum meetings, we wouldn't have been able to pull that off. Through this experience, I gained an appreciation for how much more effective Scrum is than Kanban in an Agile work environment.

Corporate Implementation

A large corporation with an overkill implementation of an Agile strategy may find its teams struggling with efficiency due to too many processes and layers. These companies often require multiple meetings and reports just to approve a task or project before any development can begin, leading to long delays between the start of a project and its completion as bureaucracy takes precedence over actual progress.

Furthermore, these organizations tend not to be as open with their communication compared to companies using more streamlined approaches; this makes it difficult for team members to understand what's expected from them during each phase, which can lead to misunderstandings or missed deadlines down the line.

At my third career job with Angi in Indianapolis, Indiana, I moved into a full-blown manager position. In addition to acting as the Scrum Master for our projects, I was now managing my fellow developers. Here, I saw firsthand the downside of running Agile in a large company. Despite our team delivering work quickly and having well-defined processes that were documented and directed, we still didn't receive much recognition within the company due to so many other initiatives being undertaken at once.

There was a lot of bureaucracy to get through in order to have work accomplished and released. The processes were so precisely defined and structured that it required navigating multiple layers just to have one's voice heard. Our team ran efficiently within these processes, but we felt like a small cog in a large machine.

Startup Implementation

A startup company that utilizes a streamlined Agile strategy without bureaucracy but with structure understands the value of collaboration and adaptability. They have established processes to ensure teams work together efficiently, such as sprint planning meetings and retrospectives. Clear communication between team members ensures everyone is on the same page with tasks and goals.

This approach provides quicker delivery times due to less time spent navigating complex bureaucratic systems. It also allows for more flexibility when changes or unexpected roadblocks arise during development, since there's not as much red tape to cut through in order to make adjustments along the way. Additionally, experimentation leads to innovation within organizations as well as improved customer satisfaction due to more frequent releases with fewer bugs or other problems associated with them.

By streamlining their process while still maintaining structure, startups can get projects completed effectively without sacrificing quality. All stakeholders know what's expected from them throughout each phase which helps build trust amongst team members and keeps everyone focused on meeting deadlines rather than dealing with unnecessary paperwork or waiting for approvals from higher-ups who may be unfamiliar with how software projects should actually be run.

Conclusion

I believe there is a balance between process and relationships in Agile. Structure is important for the development process. But without a connection to the bigger picture, developers may feel like they are just working without seeing results.

When working with a team of 4 or more developers, the Scrum method can be effective as it provides structure for the development process, enables tracking of work, and allows for team discussions on the work's progress. However, the approach should not become bureaucratic, where developers are disconnected from the actual usage of their work or where changing processes takes too long.

In a startup environment, the Agile approach may need to be modified to allow for a more constant stream of work. Rather than having Sprint Planning meetings, have longer-term roadmap discussions and work on larger batches of features. Instead of scheduled retrospectives, have daily discussions on what is going well or what needs improvement.

Tracking work using a task management system and having a proper testing strategy are crucial. Testing should be done in a separate environment before going live and using a Pull Request method to review work can provide a clear history.

In conclusion, structure in the development process is important but should not take priority over the relationship with the development team. Maintaining a balance between relationships and process is crucial in the development world.


Tyler Candee

Vice-President of Engineering, PrivOps

Read More
Cloud Computing Tyler Johnson

How metaDNA™ is different than microservices, containers and APIs

There is quite a bit of buzz and confusion around microservices, containers, APIs, service meshes, and other emerging technologies related to "cloud-native" application development and data integration; unfortunately, PrivOps has been caught up in the confusion. I often get questions about our (now patented) technology, specifically metaDNA™, the core of our technology, where folks try to categorize us incorrectly.

To be clear, metaDNA™ is not an API manager (e.g. Mulesoft), a container orchestrator (e.g. OpenShift), a service mesh (e.g. Istio), an integration platform (e.g. IBM IIB), or a master data manager (e.g. Informatica); it is an entirely new category. Let me explain. (And yes, I understand that the vendors mentioned above have products that span multiple categories.)


To understand metaDNA™, first we need some context. For example, the concept of a microservice is an abstraction that is a manifestation of the interplay between modularity and atomicity (i.e., irreducibility) at the software architecture layer. There are many other abstractions at and between other layers of the technology stack, including the interface (e.g., APIs, UIs), the server (e.g., virtual machines, containers), the network (e.g., packets, protocols), the structural (e.g., object-oriented and functional constructs), the language (e.g., high-level software instructions that abstract assembly language instructions that abstract hardware operations), and so forth.

Two important questions are:

  1. Is there currently a modularity gap that sits between microservices (software architecture), functional programming and data structures?

  2. Would it matter if an abstraction filled that gap?



Is there a modularity gap that sits between microservices, functional programming and data structures? The answer is yes, which is what my metaDNA™ ontology (and the metaDNA™ catalog that implements it) attempts to remedy. For those unfamiliar with the term, an ontology is simply a structured way of describing (or, in this case, building) a class of object structures and the relationships between those objects. (More on ontologies here.) Because of its ontology, the metaDNA™ catalog serves as an abstraction layer that sits between (and unifies) microservices, functional programming and data structures, and it constitutes an entirely new paradigm for building digital technology. metaDNA™ builds on other abstractions like microservices and containers, but doesn't necessarily replace them. Like biological DNA, metaDNA™ objects have four atomic types with uniform structures. In the same way biological DNA composes lifeforms, objects from the metaDNA™ catalog compose software components (microservices) AND data structures from EVERY type of structured data. This approach creates several advantages for hybrid cloud applications, including self-referential data and applications, data-defined software applications that reconfigure based on context, policy-driven application behavior changes, and several others.
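To make the composition idea more tangible, and only as an illustration (this is not the patented metaDNA™ structure; the type names and fields are invented for this sketch), imagine a small set of uniform object types that can describe both a data structure and a processing step from the same catalog:

from dataclasses import dataclass, field
from typing import Callable, List

def identity(record: dict) -> dict:
    return record

@dataclass
class Unit:
    kind: str                                   # one of a small, fixed set of atomic types
    name: str
    children: List["Unit"] = field(default_factory=list)
    behavior: Callable[[dict], dict] = identity

# The same building blocks can describe a data structure...
customer = Unit("entity", "customer", children=[Unit("attribute", "email")])

# ...and compose a processing step, so structure and behavior share one catalog.
mask_email = Unit("transform", "mask_email", behavior=lambda rec: {**rec, "email": "***"})

print(mask_email.behavior({"email": "ada@example.com"}))    # -> {'email': '***'}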


Does it matter if an abstraction layer fills the gap between microservices (software architecture), functional programming and data structures? Absolutely, because without it, the complexity of microservices-based software architectures still grows non-linearly, even with the use of APIs, containers and service meshes. For example, the cost of point-to-point integration among legacy and modern systems grows quadratically, at roughly ½KN(N-1), where K is the cost of each integration and N is the number of systems being connected. Adding tools adds a similar non-linear cost growth. While the modularity afforded by various solutions at the API, microservice and other layers flattens the cost curve, without addressing the modularity gap between the application and data layers the curve still grows super-linearly and still runs into scalability problems, especially for situations like digital transformation that require integrating large numbers of legacy systems and edge computing (even with 5G).
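A quick back-of-the-envelope check of that cost claim, with K = 1: point-to-point integration cost grows quadratically with the number of systems, while a shared integration layer, assuming roughly one connection per system, grows linearly. The shared-layer formula here is an assumption for comparison, not a figure from the article.

def point_to_point_cost(n_systems: int, k: float) -> float:
    """Cost of wiring every system to every other system: 1/2 * K * N * (N - 1)."""
    return 0.5 * k * n_systems * (n_systems - 1)

def shared_layer_cost(n_systems: int, k: float) -> float:
    """Assumed cost of connecting each system once to a shared integration layer."""
    return k * n_systems

for n in (5, 20, 100):
    print(n, point_to_point_cost(n, k=1.0), shared_layer_cost(n, k=1.0))
# 5   10.0    5.0
# 20  190.0   20.0
# 100 4950.0  100.0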

- Tyler

Read More
Tyler Johnson

PrivOps awarded contract with US Air Force

Alpharetta, GA. — PrivOps, the leading open data fabric provider, is proud to announce that the US Air Force Small Business Innovation Research (SBIR-STTR) team has selected PrivOps in partnership with JJR Solutions, LLC in a competitive bid process, and PrivOps is officially under contract with the US Air Force. PrivOps has been tasked with creating a plan to leverage their patented data integration, automation, and application development technology to solve some of the US Air Force’s most pressing needs. (More about their recently granted patent here.) PrivOps has already obtained signed commitments from multiple organizations within the US Air Force to support PrivOps’ efforts operationalizing their platform for the Air Force’s needs. Here are some of the needs the Air Force has identified that PrivOps and JJR Solutions are working to address:

  • Provide automated, policy-driven control of registering transactions on blockchain technologies (e.g., Hyperledger) to secure software chain of custody and detect malicious code manipulation

  • Provide an event-driven service mesh that makes it possible to detect threats and other operational events and respond in near real-time (self-healing applications)

  • Implement a distributed Enterprise Information Model (EIM) to support deployment of a data aggregation and transformation system in the cloud

  • Enable a zero-trust model and Attribute-Based Access Control (ABAC) for automating data governance between modern and legacy systems to support new data analytics, multi-domain operations (MDO), and multi-domain command and control (MDC2) capabilities

  • Create cross-domain data pipelines with microservices that incorporate best-of-breed, interchangeable commercial off the shelf (COTS) and open source artificial intelligence (AI) and machine learning (ML) software solutions, making it possible to take advantage of new technologies as they become available

“We are delighted to be working with the US Air Force, and are extremely impressed by their commitment to innovation. We are also excited to be partnered with JJR Solutions, LLC in this effort and look forward to leveraging their world class expertise around data, integration, and governance. We look forward to helping the US Air Force make our warfighters more effective, safe and secure as they protect our nation” - Kit Johnson, CEO, PrivOps

About the USAF SBIR-STTR Program

AFRL and AFWERX have partnered to streamline the Small Business Innovation Research process in an attempt to speed up the experience, broaden the pool of potential applicants and decrease bureaucratic overhead. Beginning in SBIR 18.2, and now in 19.3, the Air Force has begun offering 'Special' SBIR topics that are faster, leaner and open to a broader range of innovations.

Learn more about the US Air Force’s SBIR-STTR program at https://www.afsbirsttr.af.mil/

About PrivOps

The PrivOps Matrix is a next-generation data and applications integration platform designed to optimize the process of incorporating new technologies into data flows and integrating applications and data at scale. Proprietary point-to-point and service bus integration architectures requiring specialized talent create processes that don’t scale and are difficult to support; the PrivOps Matrix multi-cloud integration platform minimizes rework and maximizes re-use with an open, scalable, and agile hot-pluggable architecture that connects best-of-breed vendor and open source solutions to both modern and legacy applications and databases in a way that is much easier to support and maintain. As a result, US Air Force information technology will adapt faster to an evolving battlespace by being able to apply agile processes to integration while combining best-of-breed tools and emerging technologies with legacy systems.

Read More
Tyler Johnson

PrivOps receives US patent 10,491,477 for the PrivOps Matrix

 

We are excited to announce that as of 12/18/2019, the PrivOps Matrix is officially patented. US patent 10,491,477, “Hybrid cloud integration fabric and ontology for integration of data, applications, and information technology infrastructure,” is confirmation of PrivOps’ technical leadership and innovation in helping organizations deal with data sprawl by making it easier to protect and monetize data wherever it lives.


By integrating, governing and automating data flows between complex systems, the PrivOps Matrix serves as the foundation for building hot-pluggable information supply chains that monetize data. We control, at scale and in real time, where sensitive data lives, how it’s processed and stored, and who or what has access, and when.

The key innovation in the Matrix data fabric is the patented metaDNA catalog. Just as biological life is built from structures defined by standard sets of genes composed with reconfigurable DNA molecules, “digital molecules” stored in the metaDNA catalog are combined to create “digital genes”. These recipes will make it possible to build self-assembling microservices, applications, integrations, and information supply chains that can be reconfigured in real time as the environment changes. The result is IT technology that will be more scalable, resilient and adaptable than anything that exists today.

This is a momentous occasion for PrivOps. Special thanks goes out to Daniel Sineway, our patent attorney at Morris, Manning & Martin, LLP and our advisors Scott Ryan (ATDC), Walt Carter (Homestar Financial), Gary Durst (USAF) and many others who have supported us so far.

Read More
Tyler Johnson

The PrivOps Matrix is Selected as a Finalist for 2019 Air Force AFWERX Multi-Domain Operations Challenge


We are honored to announce that PrivOps has been selected as a finalist in the AFWERX Multi-Domain Operations (MDO) Challenge! We will be pitching to the Air Force at their big conference, #AFWERXfusion19, in Las Vegas July 23-24. In addition, we are delighted to announce that we are partnering with JJR Solutions, LLC as they bring purpose-driven capabilities to improve the health, well-being, and security of our communities and nation. We share the same desire for our country to be strong, and knowing MDO is a top area of focus for the Air Force and Department of Defense makes us highly motivated. This competition gives us the opportunity to help our country automate data governance through trusted open-source software and our data fabric. Our solution supports real-time decision-making and allows the flow of data to be stopped instantly if a threat is detected. Very exciting!

Challenge info: https://lnkd.in/ghQeGAv

Conference info: https://lnkd.in/erkv269

Read More