Open Data #1: Basic Understanding

The disclosure of government data has become one of the key topics widely discussed and continuously implemented over the past several years. Open Government Data plays an important role in enhancing transparency, improving public services, and supporting data-driven policy development.

In Thailand, three key organizations have been working collaboratively to drive all 20 government ministries toward strengthening and improving the Government Data Catalog. These organizations include:

- Digital Government Development Agency (DGA)
- National Statistical Office (NSO)
- Office of the Public Sector Development Commission (OPDC)

This collaborative effort aims to establish a standardized and integrated government data ecosystem, enabling effective data sharing, accessibility, and utilization for the benefit of the public sector, private sector, and society as a whole.

Simply Bright System Co., Ltd. has had the opportunity to develop Open Data systems for a wide range of organizations across both the public and private sectors. Based on this experience, the company initiated the development of a series of articles to provide guidance on Open Data Platform development, as well as approaches for effectively utilizing open data to create further value and innovation.

This article serves as an introduction, beginning with key definitions and terminology related to Open Data to establish a clear and common understanding. More in-depth technical and implementation details regarding Open Data system development will be presented in subsequent articles.

Definition of Open Data

Open Data refers to data that is freely available for everyone to use, reuse, and redistribute without restrictions. There are no limitations on the type or domain of data that can be published as Open Data.
Any organization or agency can disclose data they collect, including but not limited to:

- Personnel data
- Budget and financial data
- Geographic and spatial data
- Climate and environmental data
- Indices and key performance indicators
- Reports and analytical documents

Typically, Open Data is published in the form of datasets. Each dataset consists of two main components:

- Metadata: descriptive information that explains the dataset
- Data Resources: the actual data files available for access and use

In some organizations or systems, a Data Dictionary may also be provided to describe the meaning of each data field in detail. This helps users correctly interpret and effectively utilize the data.

Metadata

Metadata refers to descriptive information that explains the background and characteristics of a dataset. It provides essential context that helps users understand, discover, and correctly utilize data. Metadata typically includes details such as:

- Dataset title
- Data-owning organization
- Responsible person or unit
- Keywords
- Description
- Objectives
- Data source
- Data format and storage method

In Thailand, the Digital Government Development Agency (DGA), in collaboration with the National Statistical Office (NSO) and the Institute for the Promotion of Big Data Analytics and Management for Government (GBDi), has developed a minimum metadata standard. This standard is based on ISO/IEC 11179 and the Dublin Core Metadata Initiative (DCMI), along with standardized document templates. These standards define the metadata structure for government datasets, enabling public sector agencies to consistently and systematically develop their data catalogs.

Government data has been categorized into five data types, each sharing 14 core metadata elements. Additional metadata elements may be required depending on the specific type of data.

Data Resources

Data Resources refer to the actual data files that are published and made available for use.
Government agencies and organizations can disclose data they collect in various file formats, depending on the nature and purpose of the data. Commonly published formats include, but are not limited to, DOC, XLS, PDF, JPEG, CSV, and RDF. The choice of file format should support accessibility, usability, and reusability for different types of users.

In addition, the Digital Government Development Agency (DGA) has defined five levels of data openness to guide government agencies in publishing data appropriately. These levels help determine the degree of accessibility and reusability of data, ensuring alignment with national open data policies and standards. (Reference: https://www.dga.or.th/document-sharing/article/35847/)

Data Catalog

Another frequently mentioned term in the context of Open Data is Data Catalog. By definition, a Data Catalog is a structured listing of all datasets owned by an organization. These datasets are grouped or classified according to categories defined by the data-owning agency. The datasets listed in a Data Catalog may include both open data and restricted (closed) data, depending on the policies and decisions of the data owner.

A Data Catalog serves as a central reference that enables organizations to manage, discover, and govern their data assets effectively, while also supporting transparency and data-driven decision-making.

In addition, the Digital Government Development Agency (DGA), the National Statistical Office (NSO), and other related agencies have provided official definitions, recommendations, and practical guidelines regarding Data Catalog implementation.
These resources are available through various channels for further study, including:

- Digital Government Development Agency (DGA): https://www.dga.or.th/document-sharing/infographic/60945/
- Thailand Open Government Data Portal: https://data.go.th/pages/about-open-data
- National Statistical Office (NSO) Help Page: https://gdhelppage.nso.go.th/p06_02_04.html
- NRCT Data Catalog FAQ: https://catalog-data.nrct.go.th/faq
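
To make the terminology above concrete, the following is a minimal sketch of how a dataset (metadata plus data resources) and a catalog-style listing might be modeled. The class names, field names, and sample values are illustrative assumptions only; they do not reproduce the official minimum metadata standard or its 14 core elements.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """One published data file (a "Data Resource" in the terms used above)."""
    name: str
    file_format: str  # e.g. "CSV", "XLS", "PDF"
    url: str


@dataclass
class Dataset:
    """A dataset: descriptive metadata plus the data resources it describes.

    The fields are a small, hypothetical subset chosen for illustration.
    """
    title: str
    organization: str
    description: str
    keywords: List[str] = field(default_factory=list)
    category: str = "uncategorized"  # grouping defined by the data owner
    is_open: bool = True  # a catalog may also list restricted datasets
    resources: List[Resource] = field(default_factory=list)


def group_by_category(datasets: List[Dataset]) -> Dict[str, List[str]]:
    """Build a catalog-style listing that maps category to dataset titles."""
    catalog: Dict[str, List[str]] = {}
    for dataset in datasets:
        catalog.setdefault(dataset.category, []).append(dataset.title)
    return catalog


def open_datasets(datasets: List[Dataset]) -> List[Dataset]:
    """Return only the datasets that the owner has chosen to disclose."""
    return [dataset for dataset in datasets if dataset.is_open]
```

In this sketch, the catalog lists every dataset regardless of its openness, while `open_datasets` selects the subset that would be published, mirroring the distinction between a Data Catalog and Open Data described above.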

Open Data #2: Introduction to CKAN

In the past, data disclosure by organizations and government agencies was typically carried out by publishing lists of datasets on custom-developed websites. These platforms were often designed for specific purposes and lacked standardized data management capabilities.

In 2015, the Digital Government Development Agency (DGA) enhanced and further developed Thailand’s Open Government Data portal, data.go.th, by adopting CKAN, an open-source data management system widely used internationally for powering data hubs and data portals. CKAN was customized and extended to align with national guidelines and operational practices, making data.go.th a reference model for government agencies seeking to implement Open Data platforms.

As a result, CKAN has become a widely adopted solution for developing government data portals and data catalog systems across the public sector in Thailand. (Reference: https://ckan.org)

What Is CKAN?

CKAN is an open-source software platform developed by the Open Knowledge Foundation (OKFN). It is designed to support governments and organizations in building and operating Open Data Portals and has been widely adopted across many countries worldwide, including the United States, the United Kingdom, the European Union, Japan, and Singapore.

Due to its flexibility, scalability, and strong community support, CKAN has become one of the most popular open-source solutions for data management and data publishing.

In addition to CKAN, there are several other Data Catalog and Open Data platforms available today, offered in both open-source and commercial models.
Examples include:

- DKAN
- Socrata
- Azure Open Datasets

There are also software platforms with similar functionalities focused on data discovery and governance, such as:

- Collibra
- OpenMetadata (open-metadata.org)
- DataHub

Each platform offers different strengths and capabilities, allowing organizations to select solutions that best align with their data governance strategies, technical requirements, and operational goals.

Why CKAN?

All open-source data management platforms have their own strengths and limitations. For example, DKAN is built on the Drupal framework, which many users are already familiar with. It is developed using PHP and MySQL, making it relatively easy to customize and maintain. However, DKAN is owned by NuCivic, a private company, which makes the long-term direction and sustainability of the software less predictable.

In contrast, CKAN is developed and maintained by a non-profit organization and is supported by a large global user base. It has a well-established and active developer community that continuously shares knowledge, extensions, and best practices. This strong community support contributes to the platform’s stability and long-term reliability.

Although CKAN is developed using Python, which has a smaller developer base in Thailand compared to PHP, it demonstrates a clearer and more trustworthy development roadmap. For these reasons, the Digital Government Development Agency (DGA) selected CKAN as the core platform for developing and enhancing Thailand’s Open Government Data portal, data.go.th. (References: https://acouch.github.io/dkan-site/, https://open-metadata.org)

Why Must Government Agencies Use CKAN?

According to Thailand’s Digital Government Standard on Open Government Data Disclosure in Digital Format (Open Government Data Guideline: DG-STD 12001:2020), government agencies are required to publish or link their open data for public access through the Open Government Data of Thailand portal at data.go.th.
This standard establishes clear guidelines and requirements to ensure transparency, accessibility, and effective data utilization across the public sector. (Reference: https://standard.dga.or.th/dg-std/2028/)

In addition, related regulations on the development of the Government Data Catalog require all government agencies to register and link their datasets through the website https://gdcatalog.go.th. (Reference: https://gdhelppage.nso.go.th/p05_01.html)

Both systems, data.go.th and gdcatalog.go.th, have been developed based on the CKAN software platform. The National Electronics and Computer Technology Center (NECTEC) plays a key role in supporting and developing data integration mechanisms, including the harvester system, which enables automatic data harvesting from government agencies that use CKAN-based platforms.

Furthermore, NECTEC has developed CKAN Open-D, a localized implementation of CKAN designed to support government agencies in Thailand. CKAN Open-D allows agencies to download, install, and use the system in alignment with national Open Data policies and the Thai context, particularly in terms of compliance with government data catalog standards. (Reference: https://www.nectec.or.th/opend/)

For these reasons, the development of Open Data platforms across all government agencies in Thailand is based on CKAN. This ensures consistency, seamless data integration, and interoperability across systems. Adopting a common platform also enables standardized data structures, making data easier to access, exchange, and reuse for further analysis, policy-making, and innovation.

CKAN and the Private Sector

In the private sector, many organizations have begun to explore the adoption of CKAN for data management purposes. Rather than focusing solely on public data disclosure, CKAN is increasingly used as a platform for internal data sharing and collaboration across departments within an organization.
By implementing CKAN as an internal data catalog or data hub, organizations can improve data discoverability, enhance data governance, and support data-driven decision-making. Use cases and practical examples of CKAN implementation in private organizations will be discussed in greater detail in upcoming articles.
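
One practical consequence of a shared platform is that every CKAN site exposes the same Action API, so the same client code can query any portal. The sketch below is an illustration under stated assumptions: it uses CKAN's `package_search` action, the base URL `https://ckan.example.org` is a placeholder (the real API root depends on how each portal is deployed), and the helper names are hypothetical. It is not the harvester mechanism mentioned above, only a simple read-only query.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str, rows: int = 5) -> str:
    """Return the package_search URL for a CKAN site rooted at base_url."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize_results(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reduce a package_search response to titles and resource formats."""
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    summaries = []
    for dataset in payload["result"]["results"]:
        # Collect the distinct, non-empty formats of the dataset's resources.
        formats = sorted(
            {r["format"] for r in dataset.get("resources", []) if r.get("format")}
        )
        title = dataset.get("title") or dataset.get("name")
        summaries.append({"title": title, "formats": formats})
    return summaries


def search_portal(base_url: str, query: str, rows: int = 5) -> List[Dict[str, Any]]:
    """Query a live CKAN portal (requires network access)."""
    url = build_search_url(base_url, query, rows)
    with urlopen(url, timeout=30) as response:
        return summarize_results(json.load(response))


if __name__ == "__main__":
    # Placeholder base URL; replace it with the API root of a real portal.
    for item in search_portal("https://ckan.example.org", "budget"):
        print(item["title"], item["formats"])
```

Keeping the URL construction and the response parsing separate from the network call makes the logic easy to test without contacting a portal.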

Open Data #3: CKAN Environment

CKAN is a web-based application developed using the Python programming language and utilizes PostgreSQL as its primary database. The CKAN environment consists of multiple components working together to support data management, discovery, and access. The key components of the CKAN environment include the following:

- NGINX: Acts as a reverse proxy, handling and routing user requests to the appropriate backend services.
- CKAN Application: Serves as the core of the system, responsible for managing datasets, metadata, and user interactions. CKAN integrates with Apache Solr for search functionality, Redis for caching, and connects to the PostgreSQL database server for data storage.
- Apache Solr: Functions as the indexing and search server, storing indexed data and supporting efficient search and retrieval of datasets based on user queries.
- Redis: Operates as a cache management system, improving system performance by reducing database load and accelerating response times.
- PostgreSQL Server: Acts as the database management system (DBMS), storing structured CKAN data, including metadata and dataset-related information.

Together, these components form a scalable and reliable architecture that enables CKAN to efficiently support Open Data portals and data catalog systems for organizations of all sizes. Therefore, installing CKAN requires the complete setup of all related environment components to ensure that the system operates properly and efficiently.

CKAN Installation

Government agencies in Thailand have published official guidelines for installing CKAN, provided by both the National Electronics and Computer Technology Center (NECTEC) and the Government Big Data Institute (GBDi). These resources offer step-by-step instructions and best practices for CKAN deployment.
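
Because CKAN depends on PostgreSQL, Apache Solr, and Redis, a quick way to confirm that an installation has all three wired in is to inspect its configuration file. The following is a minimal sketch, assuming the standard CKAN settings `sqlalchemy.url`, `solr_url`, and `ckan.redis.url` in the `[app:main]` section; the helper names are hypothetical and the connection strings in the sample are placeholders, not values from any real deployment. It checks only that the settings are present, not that the services are reachable.

```python
from configparser import ConfigParser
from typing import Dict, List

# CKAN setting name -> the environment component it points to.
REQUIRED_SETTINGS = {
    "sqlalchemy.url": "PostgreSQL",
    "solr_url": "Apache Solr",
    "ckan.redis.url": "Redis",
}

# Placeholder configuration used only to demonstrate the helpers.
SAMPLE_INI = """
[app:main]
sqlalchemy.url = postgresql://ckan_default:pass@localhost/ckan_default
solr_url = http://127.0.0.1:8983/solr/ckan
ckan.redis.url = redis://localhost:6379/0
"""


def read_service_urls(ini_text: str) -> Dict[str, str]:
    """Extract the backing-service URLs from CKAN configuration text."""
    # Interpolation is disabled because CKAN config files can contain
    # "%(here)s"-style values that configparser would try to expand.
    parser = ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["app:main"]
    return {key: section[key] for key in REQUIRED_SETTINGS if key in section}


def missing_services(ini_text: str) -> List[str]:
    """List the components whose connection setting is absent or empty."""
    urls = read_service_urls(ini_text)
    return [
        component
        for key, component in REQUIRED_SETTINGS.items()
        if not urls.get(key, "").strip()
    ]


if __name__ == "__main__":
    print(missing_services(SAMPLE_INI))
```

For a real installation, the text of the site's configuration file would be passed in place of `SAMPLE_INI`.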

Additional information can be found at the following references:

- NECTEC CKAN Installation Guide: https://gitlab.nectec.or.th/opend/installing-ckan/-/blob/master/from-package-2.9.md
- GBDi CKAN Installation Overview: https://bigdata.go.th/big-data-101/data-engineering/ckan-installation/

CKAN can be deployed in various environments, including Virtual Machines (VMs), container-based environments, or on cloud services. The choice of deployment model primarily depends on the readiness, infrastructure, and operational requirements of the system administrators.

CKAN Version

CKAN has been continuously developed and improved over time. At the time of writing, the current stable version is CKAN 2.10.1 (Reference: https://docs.ckan.org/en/2.10/maintaining/releases.html#release-types). However, most CKAN platforms currently used by government agencies in Thailand are still operating on CKAN versions 2.8.x and 2.9.x, which remain widely adopted due to system stability, compatibility with existing extensions, and operational familiarity.
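
To find out which release line a given portal runs, the version can be read from CKAN's `status_show` API action, which reports a `ckan_version` field. The sketch below is illustrative: the helper names and the base URL are assumptions, and it compares only the major and minor numbers. Versions are compared as integer tuples because a plain string comparison would wrongly rank "2.9" above "2.10".

```python
import json
from typing import Any, Dict, Tuple
from urllib.request import urlopen


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a version string such as "2.9.5" into the tuple (2, 9)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_older_than(version: str, reference: str) -> bool:
    """Compare two CKAN versions by their major and minor numbers."""
    return parse_version(version) < parse_version(reference)


def release_line(status_payload: Dict[str, Any]) -> str:
    """Return the release line (e.g. "2.9") from a status_show response."""
    major, minor = parse_version(status_payload["result"]["ckan_version"])
    return f"{major}.{minor}"


def fetch_release_line(base_url: str) -> str:
    """Ask a live CKAN site for its version (requires network access)."""
    url = f"{base_url.rstrip('/')}/api/3/action/status_show"
    with urlopen(url, timeout=30) as response:
        return release_line(json.load(response))


if __name__ == "__main__":
    # Placeholder base URL; replace it with the API root of a real portal.
    line = fetch_release_line("https://ckan.example.org")
    print(line, "is older than 2.10:", is_older_than(line, "2.10"))
```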