
GMS Medizinische Informatik, Biometrie und Epidemiologie

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)

ISSN 1860-9171


This is the English version of the article. A German version is also available.
Research Article

[Merging of comprehensive data and completeness of information on diagnosis, treatment and progression from several population-based cancer registries in Germany – initial experiences using the example of lung cancer]

 Annika Waldmann 1
Louisa Labohm 1
Hannah Baltus 1
Christine Eisfeld 2
Lina Jansen 3
Imma Löhden 4
Alice Nennecke 4
Florian Oesterling 2
Ron Pritzkuleit 5
Alexander Katalinic 1

1 Institut für Sozialmedizin und Epidemiologie, Universität zu Lübeck, Lübeck, Germany
2 Landeskrebsregister NRW gGmbH, Bochum, Germany
3 Epidemiologisches Krebsregister Baden-Württemberg, Deutsches Krebsforschungszentrum (DKFZ), Heidelberg, Germany
4 Hamburgisches Krebsregister, Freie und Hansestadt Hamburg, Behörde für Wissenschaft, Forschung, Gleichstellung und Bezirke, Hamburg, Germany
5 Krebsregister Schleswig-Holstein, Registerstelle, Lübeck, Germany

Abstract

Background: Germany has a nationwide cancer registration system. Population-based cancer registries are organised at federal state level. To date, research-driven merging of detailed (clinical) data on diagnosis, therapy and progression from several state cancer registries has only rarely been conducted. We examined the feasibility and effort of merging the data as well as the suitability of the data for describing oncological care.

Methods: Data on lung cancer (ICD-10 C34) from the diagnosis years 2016–2019 were requested from four population-based cancer registries, processed and merged. If several case-assignable reports on one therapy event were available, these were aggregated to provide the “best information” as part of the pilot study. The data were analysed in a descriptive and explorative manner.

Results: Data preparation and collation were time-consuming, but technically feasible. The analysis dataset includes over 61,000 cases of disease with follow-up. Information on at least one type of therapy (surgery, radiotherapy, systemic therapy) is available for 74%. The provided information on the details of the therapies is mostly complete or has only a few missing values.

Discussion: Despite some differences in the general conditions of cancer registration in the federal states at the time of this study, differences in data distribution and data quality in the pooled data set were small. The proportion of cases with completely missing information on therapy (26%) is high in our opinion, but comparable to that in other registry studies. Based on the available information, the main features of individual cancer therapies can be described and the information could be used for oncological health services research.


Keywords

population-based cancer registries, data merging, data quality, therapy, follow-up

Introduction

In 2013, the Act on the Further Development of Early Cancer Detection and Quality Assurance through Clinical Cancer Registries laid the basis for nationwide clinical cancer registration in Germany, which is now set out in Section 65c of the German Social Security Code V (SGB V) [1], [2]. State-specific cancer registry laws [3], [4], [5], [6] and Section 8 of the Federal Cancer Registry Data Act (BKRG) [7] regulate data processing, reporting channels and reporting occasions, among other things. The oncological basic data set (oBDS) and supplementary modules [8], [9], [10], [11] regulate the type and scope of the data to be registered. The law on the consolidation of cancer registry data from 2021 [12] forms the basis for the nationwide consolidation of part of the clinical data at the Center for Cancer Registry Data (ZfKD) into the 'nationwide clinical cancer registry dataset' [13]. As a first step, data on therapy and progression from the state cancer registries were merged for the first time in 2023 [13]. On request, the nationwide data can be provided as individual case data. These data do not include all variables of the oBDS; for example, information from the organ-specific modules, on biomarkers and on health-care providers is missing. State cancer registry data must therefore continue to be used for more in-depth analyses. In a second stage, event-related, patient-related and health-care provider-related cross-registry compilations of cancer registry data are to be realised [12]. A detailed concept for this stage is still pending.

Merging of detailed clinical data on therapy and disease progression from several German state cancer registries was initially rare – with the exception of the evaluations that are regularly carried out on the occasion of the German Cancer Congress and presented there under the term “Nationwide Oncological Quality Conference” [14]. In the meantime, individual research projects are beginning to merge national clinical data [15], [16].

In a pilot project, we wanted to test the feasibility and effort of merging the data as well as the suitability of the merged data for describing oncological care. Tumour, therapy and progression data on individuals with lung cancer were requested from four federal state cancer registries, processed and used to answer the following questions:

  • How much effort is required to merge the data?
  • Are the treatments known in the register fully documented so that they are suitable for describing oncological care?

Methods

Request of data

The federal state cancer registries of Baden-Württemberg, Hamburg, North Rhine-Westphalia and Schleswig-Holstein were contacted for the pilot project, as they have been collecting corresponding data since at least 2016. The following considerations also played a role in the selection of these registries: (1) a maximum of two different types of cancer registry software should be used, (2) registries with both large and small underlying populations should be included and (3) both city states and territorial states should contribute to the database.

At the turn of the year 2020/2021, the data request procedure was coordinated and, in addition to the registry-specific forms, a general study protocol and a list of variables were sent to the registries. The list contained the requested variables from the oBDS in the desired format (numeric, string).

Information was requested on cases of lung cancer (ICD-10 C34) from the diagnosis years 2016–2019. The patients had to be at least 18 years of age and the registry had to have minimum information on the diagnosis. In addition, the patients had to live in the respective federal state at the time of diagnosis. This restriction to so-called residence-related cases allows a systematic survey of vital status via a comparison of the registry data with registration office information or mortality data [17].

Data provision and processing

In the registries, the data is available in relational databases in two different forms. On the one hand, the databases contain “raw data” in the form of case-assignable reports, as transmitted to the registry by reporters. These reports are made for specific reporting events (e.g. diagnosis, surgery, start of radiotherapy, etc.) and usually only contain data that can be assigned to this event (e.g. surgery reports do not contain any information on concomitant systemic therapy). Several reports from different reporters/physicians may be available for one reporting occasion (e.g. a diagnosis report from a practice and a specialised clinic). On the other hand, the cancer registry databases contain so-called best-of information. These are summaries of several reports (see next section for best-of formation), which ultimately depict the course of a case of disease. A best-of dataset has various 1:n connections – 1 patient : n tumours, 1 tumour : n therapies (Figure 1 [Fig. 1]).

Figure 1: Partial data sets of the nationwide clinical cancer registration – relation of the data sets to each other as well as exemplary content and outline of the procedure for the formation of the best-of information on the basis of case-assignable reports
In the sub-dataset ‘Best-of information tumour data’, there is exactly one row in the dataset for each case. In the best-of datasets on therapy, there is exactly one row for each therapy occasion (e.g. surgery on the primary tumour, revision surgery; partial irradiation as part of radiotherapy), although a lung cancer case can occur several times in the dataset if there are several therapy occasions (1:n relation).

At the time of data application, best-of information was not yet available in the registries for all episodes of a disease course, so that in some cases case-assignable reports were transmitted and the task of best-of formation lay with the pilot project. Information on the diagnosis, i.e. the description of the primary tumour (tumour data), was provided from all registries as best-of information; this was not always the case for the treatments and courses of disease (Table 1 [Tab. 1]). The partial datasets provided were made available in “long layout” (data in a stacked structure; i.e. if a disease case was operated twice, there were two rows in the dataset for this disease case) or in “wide layout” (unstacked structure; if a disease case was operated twice, the columns with the information on the surgeries were doubled and each disease case was included in the dataset with one row). In the course of processing, all partial datasets were transferred to a stacked layout.
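The transfer from a “wide layout” to a stacked layout can be illustrated with a minimal sketch. Python is used here purely for illustration (the project itself worked in R), and the column names (`case_id`, `surgery1_date`, `surgery1_code`, ...) as well as the code values are hypothetical and do not correspond to the actual registry exports.

```python
def wide_to_long(rows, id_col="case_id", prefix="surgery", fields=("date", "code")):
    """Stack repeated therapy column groups into one row per therapy occasion."""
    long_rows = []
    for row in rows:
        n = 1
        # Walk through surgery1_*, surgery2_*, ... until no such column exists.
        while any(f"{prefix}{n}_{f}" in row for f in fields):
            values = {f: row.get(f"{prefix}{n}_{f}") for f in fields}
            # Empty column groups (case with fewer surgeries) produce no row.
            if any(v not in (None, "") for v in values.values()):
                long_rows.append({id_col: row[id_col], "seq": n, **values})
            n += 1
    return long_rows


# Hypothetical wide-layout export: case A was operated twice, case B once.
wide = [
    {"case_id": "A", "surgery1_date": "2017-03-01", "surgery1_code": "code_x",
     "surgery2_date": "2017-04-10", "surgery2_code": "code_y"},
    {"case_id": "B", "surgery1_date": "2018-06-15", "surgery1_code": "code_x",
     "surgery2_date": "", "surgery2_code": ""},
]
stacked = wide_to_long(wide)
for r in stacked:
    print(r)
```

In the stacked result, case A appears in two rows and case B in one, which matches the 1:n relation between a tumour and its therapy occasions.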

Table 1: Description of the data provided and special features of the data provision

Information on substances or protocols of systemic therapy is reported to the registries via free text fields, whereby substances and trade names of the products are permitted according to the data set description of the oBDS [11]. A substance reference list agreed upon by the cancer registries was used for the preparation of this information, which, in addition to the standardisation of substances, also allows assignment to the types of therapy (chemotherapy, immunotherapy, etc.). The information on the type of surgical intervention is entered via text fields in which so-called OPS codes (surgery and procedure codes [18]) are to be entered in accordance with the data record description [11]. An OPS reference list agreed upon by the cancer registries was also used to decide whether an OPS code indicates a surgery with tumour resection.
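How a reference list can standardise free-text entries may be sketched as follows. The lookup table is a hypothetical four-entry excerpt invented for illustration; the actual substance reference list of the cancer registries is not reproduced here, and the splitting rule for the free text is likewise an assumption.

```python
import re

# Hypothetical excerpt: free-text alias -> (standard substance, therapy type).
REFERENCE = {
    "cisplatin": ("cisplatin", "chemotherapy"),
    "nivolumab": ("nivolumab", "immunotherapy"),
    "opdivo": ("nivolumab", "immunotherapy"),      # trade name of nivolumab
    "osimertinib": ("osimertinib", "targeted therapy"),
}


def classify_substances(free_text):
    """Split a free-text field and map each token via the reference list."""
    tokens = [t.strip().lower() for t in re.split(r"[,;/+]", free_text) if t.strip()]
    matched = [REFERENCE[t] for t in tokens if t in REFERENCE]
    unmatched = [t for t in tokens if t not in REFERENCE]
    substances = sorted({substance for substance, _ in matched})
    therapy_types = sorted({therapy_type for _, therapy_type in matched})
    return substances, therapy_types, unmatched


substances, therapy_types, unmatched = classify_substances("Cisplatin + Opdivo; xyz")
print(substances, therapy_types, unmatched)
```

Tokens that cannot be matched are returned separately so that they can be reviewed manually instead of being silently dropped.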

Creation of best-of information

Data on the same information from several reporters/physicians must be combined into so-called best-of information (Figure 1 [Fig. 1]). For many years, there have been recommendations for the (epidemiological) best-of formation of diagnostic information (tumour data table), based on international guidelines and agreements of the state cancer registries, and the implementation of the recommendations has been established in the registries [17]. Since such guidelines for therapy information did not exist at the beginning of nationwide clinical cancer registration in Germany, a working group of the cancer registries has developed recommendations for best-of formation in the registries (date of adoption for the recommendations for surgery: May 7, 2019; radiotherapy: November 5, 2019; systemic therapy: January 11, 2022). If no best-of therapy information was provided by the registry, it was generated on the basis of the individual reports and the working group’s recommendations (as of spring 2021) in a time-consuming and programming-intensive process. Here, for example, the proposed “ranking” of characteristics was implemented if there were start and end reports of a therapy with discordant information on a characteristic.
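The principle of resolving discordant start and end reports by a ranking can be sketched as follows. This is not the working group's algorithm: the ranking in `VALUE_RANK`, the field names and the fallback rule (first known value wins) are assumptions made only to show the mechanism.

```python
# Hypothetical ranking of characteristic values; lower index wins.
VALUE_RANK = {"intention": ["curative", "palliative", "other", "unknown"]}


def pick(field, values):
    """Choose one value for a field from several reports."""
    known = [v for v in values if v not in (None, "")]
    if not known:
        return None
    order = VALUE_RANK.get(field)
    if order:
        # Values not in the ranking are placed behind all ranked values.
        return min(known, key=lambda v: order.index(v) if v in order else len(order))
    # Assumed fallback: first value that is not "unknown", else first value.
    informative = [v for v in known if v != "unknown"]
    return (informative or known)[0]


def best_of(reports):
    """Merge all reports on one therapy occasion into one best-of record."""
    fields = sorted({f for r in reports for f in r})
    return {f: pick(f, [r.get(f) for r in reports]) for f in fields}


start_report = {"start_date": "2018-01-10", "intention": "unknown"}
end_report = {"start_date": "2018-01-10", "end_date": "2018-03-15",
              "intention": "palliative"}
merged = best_of([start_report, end_report])
print(merged)
```

Keeping the ranking in a separate table means that a revised recommendation only requires a change to `VALUE_RANK`, not to the merging logic.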

Data processing and statistical evaluations

The data sets provided were processed, merged and evaluated using the open source statistical program R (version 4.1.3) [19]. The processing steps (including standardisation of variable names and formats and, if necessary, coding and best-of formation) were carried out separately for each registry. Both the raw datasets and the datasets resulting from processing and merging were checked for data quality and plausibility. The variables in the pooled data sets were also checked for possible registry effects, i.e. differences in the frequency distribution of the characteristic values between the registries.

To assess data homogeneity, the minimum and maximum values from the four registries are given for the percentage frequencies and medians.
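The range across registries can be computed as in the following sketch, which uses Python for illustration rather than the R code of the project. The registry labels, the flag name `has_surgery` and the toy data are hypothetical.

```python
from collections import defaultdict


def share_by_registry(rows, flag):
    """Percentage of rows per registry for which the flag is true."""
    counts = defaultdict(lambda: [0, 0])  # registry -> [flagged, total]
    for row in rows:
        counts[row["registry"]][1] += 1
        if row[flag]:
            counts[row["registry"]][0] += 1
    return {reg: 100.0 * hit / total for reg, (hit, total) in counts.items()}


def spread(shares):
    """Minimum and maximum percentage across the registries."""
    return min(shares.values()), max(shares.values())


# Toy data: two of four cases with surgery in R1, one of four in R2.
rows = (
    [{"registry": "R1", "has_surgery": v} for v in (True, True, False, False)]
    + [{"registry": "R2", "has_surgery": v} for v in (True, False, False, False)]
)
shares = share_by_registry(rows, "has_surgery")
low, high = spread(shares)
print(shares, low, high)
```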

For the analyses, cases that were only known to the registry on the basis of a death certificate (DCO case), cases with 0 days survival, cases with missing information on vital status and cases with information on therapies that occurred more than 31 days before the lung cancer diagnosis and that could be interpreted as an indication of other cancers were excluded.
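The exclusion criteria can be expressed as a simple filter. The sketch below assumes one record per case with hypothetical field names (`dco`, `vital_status`, `survival_days`, `diagnosis_date`, `therapy_dates`); the order in which the criteria are checked is also an assumption and only affects which reason is recorded.

```python
from datetime import date


def exclusion_reason(case):
    """Return the first exclusion criterion a case meets, or None."""
    if case["dco"]:
        return "DCO case"
    if case["vital_status"] is None:
        return "missing vital status"
    if case["survival_days"] == 0:
        return "0 days survival"
    # Therapy more than 31 days before diagnosis may point to another cancer.
    if any((case["diagnosis_date"] - t).days > 31 for t in case["therapy_dates"]):
        return "therapy more than 31 days before diagnosis"
    return None


cases = [
    {"id": 1, "dco": False, "vital_status": "alive", "survival_days": 400,
     "diagnosis_date": date(2017, 5, 1), "therapy_dates": [date(2017, 5, 20)]},
    {"id": 2, "dco": True, "vital_status": "dead", "survival_days": 0,
     "diagnosis_date": date(2018, 1, 1), "therapy_dates": []},
    {"id": 3, "dco": False, "vital_status": "dead", "survival_days": 90,
     "diagnosis_date": date(2016, 9, 1), "therapy_dates": [date(2016, 6, 1)]},
]
evaluation_population = [c for c in cases if exclusion_reason(c) is None]
print([c["id"] for c in evaluation_population])
```

Returning the reason rather than a plain yes/no makes it straightforward to fill a flow chart with the number of cases excluded at each step.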

Statistical hypothesis testing was not intended.

Ethics

The project was reported to the Ethics Committee of the University of Lübeck and approved (Ref. 20-483). It is registered in the German Register of Clinical Studies under the number DRKS00025080 [20].

Results

Data provision and processing

Table 1 [Tab. 1] illustrates the heterogeneity of data provision. Two registries transmitted the data promptly, while one registry took seven months. The number of partial datasets provided and their layout differed, partly due to vague specifications in the application for data transmission.

Information on surgical interventions was provided by two registries and information on radiotherapy was provided by one registry as best-of information. The information on systemic therapies or wait-and-see strategies (hereinafter referred to as “systemic therapy”) was provided by all registries as case-assignable reports.

All registries included here collect vital status systematically and in a similar way by regularly comparing cancer registry data with information from the registration office or mortality data. The last comparison for the data set used here took place in one registry in December 2019, in another in December 2020 and in the two other registries in 2021.

Data basis: Pooled data set with best-of information

After processing, best-of formation and merging, a database with six partial data sets containing a total of 70,821 cases with lung cancer was available (Table 1 [Tab. 1], Figure 2 [Fig. 2]): One dataset each with best-of information on diagnosis (and vital status), surgery, radiotherapy, systemic therapies (which can either consist of only one type of therapy such as chemotherapy or a combination of therapy types) or with follow-up information and another with information on metastases at the time of the primary tumour and during follow-up.

Figure 2: Flow chart

The analyses are based on 61,806 cases after application of the exclusion criteria (= evaluation population).

Completeness of documented therapies: Lung cancer cases with and without known therapy

In 45,465 of the 61,806 cases (73.8%), information on at least one treatment event was available. Compared to cases without information on treatment, these cases were on average 4 years younger, had a missing tumour stage less frequently, had large cell lung cancer less frequently and small cell lung cancer slightly more frequently, and died less frequently during the follow-up period (Table 2 [Tab. 2]).

Table 2: Description of lung cancer cases (ICD-10 C34, diagnosis years 2016–2019) in total and according to the presence of at least 1 treatment information
(absolute and relative frequency in relation to the respective row, unless otherwise stated)

Where treatment was known, information was most frequently available on systemic therapy alone (n=12,560; 27.6%), on a combination of radiotherapy and systemic therapy (n=10,081; 22.2%) and on surgery alone (n=8,641; 19.0%) (Figure 3 [Fig. 3]).

Figure 3: Number of cases without or with best-of information on treatment
The data are based on the 61,806 lung cancer cases remaining after application of the exclusion criteria. For each case, none, one or a combination of several types of therapy may be known in the cancer registry.
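Deriving the therapy pattern per case from the stacked best-of datasets can be sketched as follows. The table layout (one list of rows per therapy type, each row carrying a `case_id`) and the labels are assumptions made for illustration.

```python
from collections import Counter

THERAPY_TYPES = ("surgery", "radiotherapy", "systemic therapy")


def therapy_pattern(case_id, therapy_tables):
    """Name the combination of therapy types known for one case."""
    known = [
        t for t in THERAPY_TYPES
        if any(row["case_id"] == case_id for row in therapy_tables.get(t, []))
    ]
    return " + ".join(known) if known else "no therapy known"


case_ids = ["A", "B", "C", "D"]
therapy_tables = {
    "surgery": [{"case_id": "A"}, {"case_id": "A"}],   # two surgeries, one case
    "radiotherapy": [{"case_id": "B"}],
    "systemic therapy": [{"case_id": "B"}, {"case_id": "C"}],
}
patterns = Counter(therapy_pattern(c, therapy_tables) for c in case_ids)
print(patterns)
```

Because the tumour dataset defines the list of cases, a case without any row in the therapy datasets is still counted, here under "no therapy known".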

Completeness of the information on the documented therapies

In total, information on 19,917 surgical interventions was available. At least one surgical procedure was known for 27.7% of the cases in the evaluation population (Figure 2 [Fig. 2], Table 3 [Tab. 3]). This proportion varied between the registries in the range of 20.2% to 31.1%. Of all interventions, 79.2% were performed within 6 months of diagnosis (92.2% of all initial interventions). After forming the best-of information, specific information was rarely missing on the intention of the surgery (missing or “unknown”: 12.2%) and on the surgical procedures (1.0%), while information on complications of the surgery was missing or unknown in more than 50% of cases. The frequency distributions across the four registries were mostly quite similar (Table 3 [Tab. 3] – section A). Around 95% of the cases with surgery information contained valid information on the date, intention and procedures performed. This proportion varied between the registries in the range of 85.0% to 100%.

Table 3: Number of cases with best-of information and completeness of treatment information
(absolute and relative frequency, unless otherwise stated)

For 33.0% of the lung cancer cases in the evaluation population, at least one piece of information on partial radiotherapy was available (Figure 2 [Fig. 2], Table 3 [Tab. 3]). Of all (partial) irradiations, 50.6% were started within 6 months of diagnosis (80.1% of all initial partial irradiations). Information on the total dose of partial irradiation was available for more than 90%. With the exception of the information on the side of the target area (unknown: 22.8%, missing: 14.7%), the proportion of missing values was a maximum of 15%. The frequency distributions across the registries were also largely quite similar for the information from the reports on partial irradiation (Table 3 [Tab. 3] – section B). Overall, around 72% of this best-of information contained valid details on the start and end of treatment, position in relation to surgery, type of application, intention and target area (spread between the registries: 65.0%–92.7%).

At least one best-of information on systemic anti-cancer therapy could be formed for 48.4% of cases (Figure 2 [Fig. 2], Table 3 [Tab. 3]). Of all systemic therapies, 47.4% were started within 6 months of diagnosis (92.6% of all initial therapies). Information on the end date of therapy and on the reason for discontinuation was unknown or missing in around 40% of cases. In around 93% of the best-of information, at least one substance of the systemic therapy could be assigned to one of the types chemotherapy, immunotherapy or targeted therapy via the substance reference list. The information on systemic therapy across the registries was predominantly quite similar (Table 3 [Tab. 3] – section C). There were major differences in the information on the protocols of systemic therapy: both the degree of completion and the content of the field varied considerably. Information on the protocols was rarely found; instead, the information from the substances field was often repeated or supplemented, e.g. with additional substances, dosages or application types. Overall, around 88% of the best-of information on systemic therapy contained valid information on start, position in relation to the surgery, intention and substances administered (variation between the registries: 83.9%–95.2%).
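The proportion of best-of records with valid information on all key characteristics can be computed as in the following sketch. The field names and the convention that `None`, an empty string and "unknown" count as not valid are assumptions for illustration.

```python
MISSING = (None, "", "unknown")


def is_complete(record, key_fields):
    """True if every key field holds a valid (non-missing, known) value."""
    return all(record.get(f) not in MISSING for f in key_fields)


def completeness(records, key_fields):
    """Percentage of records that are complete for all key fields."""
    if not records:
        return None
    return 100.0 * sum(is_complete(r, key_fields) for r in records) / len(records)


KEY_FIELDS = ("start_date", "intention", "substances")  # hypothetical names
records = [
    {"start_date": "2017-02-01", "intention": "curative", "substances": "cisplatin"},
    {"start_date": "2017-05-11", "intention": "palliative", "substances": "nivolumab"},
    {"start_date": "2018-01-09", "intention": "unknown", "substances": "cisplatin"},
    {"start_date": "2019-03-30", "intention": "palliative", "substances": "osimertinib"},
]
share_complete = completeness(records, KEY_FIELDS)
print(share_complete)
```

A record counts as complete only if all key fields are valid, so this combined share is never higher than the completeness of the least complete single field.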

Discussion

Cancer registries in Europe are increasingly collecting and reporting treatment data, although there is currently still great heterogeneity with regard to the data collected and data quality [21]. In Germany, comprehensive data on the diagnosis, treatment and course of disease of cancers that have occurred and been treated in the catchment area of the respective federal state cancer registry are available in the treatment-related (clinical) federal state cancer registries. At the time of the study, (detailed clinical) registry data had not yet been merged on the basis of the oBDS. For this reason, we conducted a pilot project to examine the feasibility and effort involved in merging the data as well as the suitability of the data with a focus on the completeness of the information on the description of oncological care using the example of lung cancer.

Data consolidation and creation of the best-of information

At the time of project planning and implementation, there was no 'nationwide clinical cancer registry dataset' with information on therapy and progression events [13] that could be requested from the ZfKD, so the data had to be requested from the cancer registries.

In addition, there was neither a standardised application form nor a common application portal, which made the application process more difficult. In the meantime, a standardised application form has been developed by the § 65c platform (resolution number 2024/70/06), which has reduced the effort involved in applying for data. The development and permanent implementation of a common application portal for data from different cancer registries, as is currently being pursued in the AI-Care project [22], is desirable and could further reduce the effort involved in applying for data in the future.

The process of data preparation, harmonisation and consolidation was very time-consuming, taking nine months from the data application to the first version of a quality-assured database with six (partial) datasets. Contributing factors included specifications in the data application that were too vague regarding the structure and content of the partial datasets, the best-of formation for treatments, which had not yet been implemented everywhere, and the differing time spans between application and data provision. One reason for the large differences in these time spans is that, at the time of the data application, one federal state required an advisory board and a scientific expert committee to discuss any application for the transfer of individual patient-level data and to make a recommendation. In accordance with the legal basis, both committees meet at least once a year [5]; they currently meet regularly every six months. This procedure was changed in mid-2023, and currently only the scientific committee has to discuss the application in this federal state.

With regard to the structure and format of the datasets, only implicit specifications were made in the application for data provision, among other things to allow the cancer registries to fall back on any existing data extraction routines. In order to keep the amount of work involved in data standardisation and merging to a minimum, intensive consultation with the registries regarding the number, format and specific content of the requested datasets as well as the inclusion and exclusion criteria is recommended.

In many federal states, the start and end of radiotherapy and systemic therapy are separate reporting events in cancer registration. Therefore, when determining the best-of information, it must be checked whether, as expected, two reports are available for one treatment occasion and whether their information is logically consistent or, if not, which report contains any information at all or the “better” information. If there are start and end reports but differing information on a characteristic, a decision can be made on the basis of defined algorithms (e.g. using the recommendations for best-of formation developed by the cancer registries) as to which information should be included in the best-of information. The therapy data was predominantly provided (in the case of systemic therapy by all registries) in the form of case-assignable reports. The complexity of programming the best-of formation of therapy events contributed significantly to the effort involved in data preparation. As soon as the registries can provide best-of information for all therapy types, the effort will be significantly reduced. Since case-based (and not notification-based) information has been provided to the ZfKD by the cancer registries since 2022 [13], it can be assumed that best-of formation is currently largely implemented in all cancer registries and for all notification types. It remains to be seen whether this will also include the complex delineation of multiple therapies, such as first and second-line treatments.

In addition to statements on the completeness of the information, the analyses at the level of best-of information also allow statements to be made on typical treatment patterns and survival after certain therapies, provided that the therapies are reported in full. It should be borne in mind that, unlike individual clinical case files, cancer registry data with its predefined structure and defined characteristics can only approximate the clinical complexity of a case.

Completeness of the documented therapies

In our opinion, the proportion of cases with information on at least one type of therapy (74%) is currently still too low, but is comparable to the data from the SEER registry (78.5% for non-small cell lung cancer [NSCLC] and 75.1% for NSCLC in the age group 80 years and older) [23], [24]. Whether the information (the report) is simply missing in the registry at the time of data provision in cases without known therapy or whether the therapy was not carried out remains open at present. Based on an analysis from Central Europe, it can be assumed that around 5% of people with stage III NSCLC are receiving ‘best supportive care’ [25] – these are therapies that may not be consistently reported to the cancer registries. The completeness of therapy reports is to be investigated in more detail as part of a further analysis.

Completeness of the information on the documented therapies

If a therapy is known to the registry, a high degree of completeness (around 95%) of the key information can be assumed, particularly for surgical interventions. Overall, there is only a high proportion of missing values for a few variables. These include information on complications following surgery and on the side effects of radiotherapy or systemic therapy. A ‘reporting bias’ (distortion due to selective documentation) can have various causes and arise at different points in the documentation process: The occurrence of adverse events is not a separate reporting cause according to the national legal provisions of the cancer registries involved [3], [4], [5], [6]. In addition, the oBDS only provides for the reporting of adverse events that occur within 90 days of treatment, whereas in clinical reality these often occur later. Furthermore, the information in the registries involved is not a ‘minimum’ or mandatory field for reporting [26], [27], [28], so selective non-documentation of ‘no complication’ at the time of reporting can lead to information and misclassification bias. A recently published study on NSCLC and immunotherapy points to further reasons for the high proportion of missing values: In addition to the fear of discontinuing therapy, the individual assessment of severity and the feasibility of dealing with them are mainly responsible for the fact that complications or side effects of patients are not expressed to the health-care providers and thus not documented [29]. Indications of a systematic lack of information can be found in the data in the US National Cancer Database. Here, a high proportion of missing information is associated with poorer survival [30].

If the information on the end date or the reason for termination is missing in the final reports of a therapy or in the best-of information (if start and/or end reports are available) or if these final reports themselves are missing, this can be considered a quality deficit. When applying for data provision, the reason for reporting the treatments (start or end) should therefore be requested; this information can be used to distinguish whether there are quality deficits in the final report or whether final reports for the treatments are missing. For the pilot project, this information was omitted for reasons of data economy, so that the deficit can only be checked indirectly. As expected, reports on systemic therapies initiated in 2020 and 2021 showed markedly higher proportions of missing values for the reason for termination (49% and 76%, respectively) than those for therapies initiated in 2016 to 2019 (35–40%).
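If the reporting reason were available, the two kinds of deficit could be separated as in the following sketch. Since this variable was not requested in the pilot project, the field names (`reason`, `end_date`, `end_reason`) and the three status labels are purely hypothetical.

```python
UNKNOWN = (None, "", "unknown")


def end_documentation_status(reports):
    """Classify how the end of one therapy is documented in its reports."""
    end_reports = [r for r in reports if r.get("reason") == "end"]
    if not end_reports:
        return "end report missing"
    if any(r.get("end_date") and r.get("end_reason") not in UNKNOWN
           for r in end_reports):
        return "end documented"
    return "end report incomplete"


only_start = [{"reason": "start", "start_date": "2020-02-01"}]
incomplete_end = only_start + [{"reason": "end", "end_date": "2020-04-15",
                                "end_reason": "unknown"}]
full_end = only_start + [{"reason": "end", "end_date": "2020-04-15",
                          "end_reason": "regular completion"}]
statuses = [end_documentation_status(r)
            for r in (only_start, incomplete_end, full_end)]
print(statuses)
```

The first status points to a reporting gap (the final report never arrived), the second to a quality deficit within a report that was received.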

The cancer registry laws in Schleswig-Holstein and North Rhine-Westphalia stipulate reporting within 6 and in Hamburg within 8 weeks after the diagnosis or therapy becomes known; in Baden-Württemberg, reporting must take place in the following quarter at the latest [3], [4], [5], [6]. For this reason and due to the median duration of the first known systemic therapy of 64 days or a completion of therapy in 75% of all cases within 105 days, it would have been expected that the end of therapy and the reason for termination should be known when the data was provided in 2021. However, due to missing or outstanding reports from the treating physicians or due to processing backlogs in the registry, information may be missing, especially for more recent therapies.

Unfortunately, the clinics and practices involved in oncological care do not yet report treatment information to the extent and in the quality that would be desirable for a valid description. For example, 15% of the best-of information on partial irradiation currently lacks information on the target area of radiotherapy and 9% lack information on the total radiation dose. This may be due to a lack of resources in the clinics and practices, but it is also conceivable that not all information was available at the time of reporting. The pooled data sets showed that the registries differed in the strictness with which they accepted data. Depending on the registry, there are different variables in which the characteristic ‘unknown’ occurs conspicuously frequently compared to the other registries.

While the SEER data only contain a limited number of characteristics on primary therapies that were started in hospitals within the first 24 months [31] and thus do not reflect the complete picture of oncological care [32], [33], the data from the German cancer registries include not only primary therapies but also further therapies from both the inpatient and outpatient sector. This is a special feature that allows a comprehensive description of oncological care on the basis of the cancer registry data – assuming that all therapies (completeness of reporting) with complete and valid information on the characteristics defined by the oBDS (completeness of information) are included in the state cancer registries.

Data scope of the ‘nationwide clinical cancer registry dataset’ compared to data requested from the cancer registries themselves

The ‘nationwide clinical cancer registry dataset’ that can currently be applied for does not include all variables of the oBDS; for example, information on health-care providers, from the organ-specific modules and on biomarkers is missing. Furthermore, individual details on therapies such as adverse events and complications or dose information on radiotherapy and the residual status after completion of primary therapy are missing [13]. For more in-depth analyses, therefore, data requested directly from the state cancer registries must continue to be used.

Another disadvantage of the ‘nationwide clinical cancer registry dataset’ is that the earliest available diagnosis and treatment year in this dataset is 2020. If therapy and survival are to be analysed before and after the approval of novel therapeutic agents (for example, the treatment and survival of people with non-small cell lung cancer before and after approval of osimertinib in the presence of an EGFR mutation [34] or of a combination of nivolumab, ipilimumab and two cycles of platinum-based chemotherapy in metastatic NSCLC [35]), the data from the state cancer registries, in which treatment and progression data are also available for the period before 2020, must continue to be used.

Conclusion

Our pilot project showed that merging clinical data from the state cancer registries was still very time-consuming, but technically feasible. State-specific differences and unclear information in the data request made merging the data more difficult and slower. This was partly due to different application procedures and formats of the data provided, but above all to the fact that the best-of formation for the therapies has not yet been fully implemented in the registries.
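To make the idea of a best-of formation more concrete, the following is a minimal sketch and not the rule set used in this pilot study or by any registry. It assumes that several reports on the same therapy event are collapsed into one record by taking, for each field, the non-missing value from the most recent report. All field names (`case_id`, `therapy_event`, `report_date`, `start_date`, `total_dose_gy`), the priority rule and the sample values are hypothetical.

```python
from collections import defaultdict


def best_of(reports, key_fields=("case_id", "therapy_event"),
            order_field="report_date"):
    """Collapse several reports per therapy event into one record.

    Assumed rule (hypothetical): for each field, keep the value from the
    most recent report in which that field is not missing (None or "").
    """
    groups = defaultdict(list)
    for report in reports:
        groups[tuple(report[k] for k in key_fields)].append(report)

    merged = []
    for key, group in groups.items():
        # Newest report first, so its non-missing values take priority.
        group.sort(key=lambda r: r[order_field], reverse=True)
        record = dict(zip(key_fields, key))
        for report in group:
            for field, value in report.items():
                if field in key_fields or field == order_field:
                    continue
                # Fill a field only if it is still empty in the merged record.
                if value not in (None, "") and record.get(field) in (None, ""):
                    record[field] = value
        merged.append(record)
    return merged


# Hypothetical example: two reports on the same radiotherapy event.
reports = [
    {"case_id": 1, "therapy_event": "RT-1", "report_date": "2018-03-01",
     "start_date": "2018-02-10", "total_dose_gy": None},
    {"case_id": 1, "therapy_event": "RT-1", "report_date": "2018-05-15",
     "start_date": None, "total_dose_gy": 60.0},
]
result = best_of(reports)
```

In this sketch the two reports yield a single record that combines the start date from the earlier report with the dose from the later one. Real best-of rules would additionally have to handle contradictory values and reporter-specific priorities, which are not modelled here.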

Data generation in the federal states involved here is subject to partly differing (different state legislation, cancer registry software, reporting structures and registry routines) and partly common framework conditions (reporting obligation, oBDS). These should be taken into account when interpreting the data. The question of completeness or the absence of information in data fields (variables) must be carefully analysed, in particular because it is unclear whether data are missing randomly or systematically depending on the stage of disease, general condition or age or depending on the area of care (outpatient/inpatient setting).
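A first descriptive check of whether information is missing systematically could compare the share of missing values across subgroups. The sketch below is only an illustration under stated assumptions: the field names (`stage`, `systemic_therapy`) and the sample records are hypothetical and do not come from the study data.

```python
from collections import defaultdict


def missing_share_by_group(records, field, group_field):
    """Return {group: share of records in which `field` is missing}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        group = rec[group_field]
        totals[group] += 1
        # Treat None and the empty string as missing (assumed coding).
        if rec.get(field) in (None, ""):
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}


# Hypothetical records: therapy information by stage of disease.
cases = [
    {"stage": "I", "systemic_therapy": "no"},
    {"stage": "I", "systemic_therapy": "yes"},
    {"stage": "IV", "systemic_therapy": "yes"},
    {"stage": "IV", "systemic_therapy": None},
    {"stage": "IV", "systemic_therapy": ""},
    {"stage": "IV", "systemic_therapy": "yes"},
]
shares = missing_share_by_group(cases, "systemic_therapy", "stage")
```

Markedly different shares between groups (for example by stage, age group or area of care) would point towards systematic rather than random missingness. Such a comparison is descriptive only and does not replace a formal analysis of the missing-data mechanism.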

In the pooled data sets, the differences between the federal state registries in terms of data distribution and data quality were small. With reasonable effort, it was possible to generate a population-based database that, with more than 61,000 cases, is sufficiently large to allow meaningful analyses even for rare histological subgroups or age groups with few cases of disease – provided that valid and complete information is available. Even when the analysis is restricted to cases of disease with information on at least one type of therapy or to cases of disease with complete (essential) information on the therapy, the number of cases is still very high. These case numbers are comparable with those of currently ongoing long-term oncological registry studies [36].

It is to be hoped that, due to the legally required delivery to the ZfKD, best-of information for the therapies will be available in the registries in a timely manner. This will make it much easier to merge data on treatment and progression from the state cancer registries. It remains to be seen whether stage 2 of the federal law [12] will speed up the process of merging and providing event-related data by means of a central application and registry centre. But even then, good communication with the registries will be important for data evaluation and interpretation.

In our opinion, the proportion with completely missing information on treatment is still quite high at approx. 26%, but comparable to other registry studies [23], [24]. If information on the therapies is available, this information is mostly sufficiently complete. Therefore, the main features of individual therapies can be described well and used for oncological health services research. However, possible biases due to missing information must be taken into account, especially if conclusions are to be drawn for oncological care from a population perspective.

Notes

Acknowledgement

The authors would like to thank Volker Arndt from the Baden-Württemberg Cancer Registry for his critical and helpful comments on earlier versions of this manuscript.

Competing interests

The authors declare that they have no competing interests. They point out that they (with the exception of AW, LL, HB) are employed in or manage state cancer registries.

Ethics

No human studies were conducted for this article. The guidelines for the evaluation of anonymised data were adhered to. The research project was reported to the ethics committee of the University of Lübeck.


References

[1] Bundesregierung. Gesetz zur Weiterentwicklung der Krebsfrüherkennung und zur Qualitätssicherung durch klinische Krebsregister (Krebsfrüherkennungs- und -registergesetz – KFRG) vom 3. April 2013.
[2] Katalinic A, Halber M, Meyer M, Pflüger M, Eberle A, Nennecke A, Kim-Wanner SZ, Hartz T, Weitmann K, Stang A, Justenhoven C, Holleczek B, Piontek D, Wittenberg I, Heßmer A, Kraywinkel K, Spix C, Pritzkuleit R. Population-Based Clinical Cancer Registration in Germany. Cancers (Basel). 2023 Aug;15(15):3934. DOI: 10.3390/cancers15153934
[3] Landesregierung Baden-Württemberg. Gesetz über die Krebsregistrierung in Baden-Württemberg (Landeskrebsregistergesetz – LKrebsRG) vom 7. März 2006.
[4] Senat Freie und Hansestadt Hamburg. Hamburgisches Krebsregistergesetz (HmbKrebsRG) vom 27. Juni 1984.
[5] Landesregierung Nordrhein-Westfalen. Gesetz über die klinische und epidemiologische Krebsregistrierung im Land Nordrhein-Westfalen (Landeskrebsregistergesetz – LKRG NRW) vom 2. Februar 2016.
[6] Landesregierung Schleswig-Holstein. Gesetz über das Krebsregister des Landes Schleswig-Holstein (Krebsregistergesetz – KRG SH) vom 4. November 2015.
[7] Bundesregierung. Bundeskrebsregisterdatengesetz (BKRG) vom 10. August 2009 (BGBl. I S. 2702,2707) – zuletzt geändert durch Artikel 2 des Gesetzes vom 18. August 2021 (BGBl. I S. 3890).
[8] Bundesministerium für Gesundheit. Bekanntmachung: Module zur Dokumentation des Brust- und Darmkrebses in Ergänzung des aktualisierten einheitlichen onkologischen Datensatzes der Arbeitsgemeinschaft Deutscher Tumorzentren e.V. (ADT) und der Gesellschaft der epidemiologischen Krebsregister in Deutschland e.V. (GEKID) vom 28. Oktober 2015. BAnz AT. 26.11.2015;(B1):1-10.
[9] Bundesministerium für Gesundheit. Bekanntmachung: Modul zur Dokumentation des Prostatakrebses in Ergänzung des aktualisierten einheitlichen onkologischen Datensatzes der Arbeitsgemeinschaft Deutscher Tumorzentren e. V. (ADT) und der Gesellschaft der epidemiologischen Krebsregister in Deutschland e. V. (GEKID) vom 9. August 2017. BAnz AT. 29.08.2017;(B6):1-6.
[10] Bundesministerium für Gesundheit. Bekanntmachung: Modul zur Dokumentation des Malignen Melanoms in Ergänzung des einheitlichen onkologischen Basisdatensatzes der Arbeitsgemeinschaft Deutscher Tumorzentren e. V. (ADT) und der Gesellschaft der epidemiologischen Krebsregister in Deutschland e. V. (GEKID) vom 25. Mai 2020. BAnz AT. 26.06.2020;(B4):1-4.
[11] Bundesministerium für Gesundheit. Bekanntmachung: Aktualisierter einheitlicher onkologischer Basisdatensatz der Arbeitsgemeinschaft Deutscher Tumorzentren e.V. (ADT) und der Gesellschaft der epidemiologischen Krebsregister in Deutschland e.V. (GEKID) vom 10. Mai 2021. BAnz AT. 12.07.2021;(B4):1-31.
[12] Bundesregierung. Gesetz zur Zusammenführung von Krebsregisterdaten. 2021.
[13] Meisegeier S, Imhoff M, Berg K, Kraywinkel K. Bundesweiter klinischer Krebsregisterdatensatz – Datenschema und Klassifikationen (oBDS_v3.0.0.8a_RKI). Zenodo; 2023. DOI: 10.5281/zenodo.10022040
[14] Arbeitsgemeinschaft Deutscher Tumorzentren e.V. Die ADT als Vertrauensstelle für Bundesweite Qualitätskonferenzen in der Onkologie. [last accessed 2024 Dec 13]. Available from: https://adt-netzwerk.de/Vertrauensstelle/Bundesweite_Qualitaetskonferenzen/
[15] Roessler M, Schmitt J, Bobeth C, Gerken M, Kleihues-van Tol K, Reissfelder C, Rau BM, Distler M, Piso P, Günster C, Klinkhammer-Schalke M, Schoffer O, Bierbaum V. Is treatment in certified cancer centers related to better survival in patients with pancreatic cancer? Evidence from a large German cohort study. BMC Cancer. 2022 Jun;22(1):621. DOI: 10.1186/s12885-022-09731-w
[16] Rudolph C, Germer S, Nennecke A, Kusche H, Labohm L, Rath N, Rausch K, Holleczek B, Handels H, Katalinic A. Künstliche Intelligenz (KI) in der Krebsregistrierung: Methoden, Herausforderungen und erste Ergebnisse des AI-Care-Datensatzes. In: Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 867. DOI: 10.3205/24gmds742
[17] Stegmaier C, Hentschel S, Hofstädter F, Katalinic A, Tillack A, Klinkhammer-Schalke M, editors. Das Manual der Krebsregistrierung. München: Zuckschwerdt; 2019.
[18] Operationen- und Prozedurenschlüssel (OPS). [last accessed 2024 Jan 24]. Available from: https://www.bfarm.de/DE/Kodiersysteme/Klassifikationen/OPS-ICHI/OPS/_node.html
[19] R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2022.
[20] Therapie des Lungenkarzinoms – eine Versorgungsforschungsstudie auf Basis von Krebsregisterdaten nach § 65c SGB V. DRKS00025080. In: Deutsches Register Klinischer Studien (DRKS). [last accessed 2024 Jan 24]. Available from: https://drks.de/search/de/trial/DRKS00025080
[21] Giusti F, Martos C, Trama A, Bettio M, Sanvisens A, Audisio R, Arndt V, Francisci S, Dochez C, Ribes J, Fernández LP, Gavin A, Gatta G, Marcos-Gragera R, Lievens Y, Allemani C, De Angelis R, Visser O, Van Eycken L; ENCR Working Group on Treatment Data Harmonisation. Cancer treatment data available in European cancer registries: Where are we and where are we going? Front Oncol. 2023;13:1109978. DOI: 10.3389/fonc.2023.1109978
[22] Institut für Krebsepidemiologie e.V. (IKE) der Universität zu Lübeck. KI-unterstützte, versorgungsnahe Nutzung von Krebsregisterdaten – AI-CARE. [last accessed 2024 Dec 6]. Available from: https://ai-care-cancer.de/
[23] Bei Y, Chen X, Raturi VP, Liu K, Ye S, Xu Q, Lu M. Treatment patterns and outcomes change in early-stage non-small cell lung cancer in octogenarians and older: a SEER database analysis. Aging Clin Exp Res. 2021 Jan;33(1):147-56. DOI: 10.1007/s40520-020-01517-z
[24] Ganti AK, Klein AB, Cotarla I, Seal B, Chou E. Update of Incidence, Prevalence, Survival, and Initial Treatment in Patients With Non-Small Cell Lung Cancer in the US. JAMA Oncol. 2021 Dec;7(12):1824-32. DOI: 10.1001/jamaoncol.2021.4932
[25] Zemanova M, Pirker R, Petruzelka L, Zbozínkova Z, Jovanovic D, Rajer M, Bogos K, Purkalne G, Ceriman V, Chaudhary S, Richter I, Kufa J, Jakubikova L, Zemaitis M, Cernovska M, Koubkova L, Vilasova Z, Dieckmann K, Farkas A, Spasic J, Fröhlich K, Tiefenbacher A, Hollosi V, Kultan J, Kolarová I, Votruba J. Care of patients with non-small-cell lung cancer stage III – the Central European real-world experience. Radiol Oncol. 2020 May;54(2):209-20. DOI: 10.2478/raon-2020-0026
[26] Krebsregister Baden-Württemberg. Krebsregistrierung in Baden-Württemberg: Datenkatalog mit Merkmalsausprägungen nach ADT/GEKID Basisdatensatz 2.0.0 für Tumorzentren, Onkologische Schwerpunkte, Krankenhäuser und niedergelassene Ärzte. 2017.
[27] Krebsregister Schleswig-Holstein. Meldepflicht – Mindestangaben (zwingend notwendige Angaben zur Erfüllung der Vollständigkeit einer Meldung und zur Auszahlung der Meldevergütung). 2018.
[28] Landeskrebsregister NRW. Melder-Broschüre Krebsregistrierung in Nordrhein-Westfalen: Meldepflicht – Meldepflichtige Erkrankungen – Vergütung. Bochum; 2020.
[29] Martin ML, Chung H, Rydén A. Willingness to report treatment-related symptoms of immunotherapy among patients with non-small cell lung cancer. Qual Life Res. 2022 Apr;31(4):1147-55. DOI: 10.1007/s11136-021-02966-3
[30] Yang DX, Khera R, Miccio JA, Jairam V, Chang E, Yu JB, Park HS, Krumholz HM, Aneja S. Prevalence of Missing Data in the National Cancer Database and Association With Overall Survival. JAMA Netw Open. 2021 Mar;4(3):e211793. DOI: 10.1001/jamanetworkopen.2021.1793
[31] SEER Acknowledgment of Treatment Data Limitations. For the 1975-2019 Data (November 2021 Submission). [last accessed 2023 Dec 14]. Available from: https://seer.cancer.gov/data-software/documentation/seerstat/nov2021/treatment-limitations-nov2021.html
[32] Du XL, Key CR, Dickie L, Darling R, Delclos GL, Waller K, Zhang D. Information on chemotherapy and hormone therapy from tumor registry had moderate agreement with chart reviews. J Clin Epidemiol. 2006 Jan;59(1):53-60. DOI: 10.1016/j.jclinepi.2005.06.002
[33] Noone AM, Lund JL, Mariotto A, Cronin K, McNeel T, Deapen D, Warren JL. Comparison of SEER Treatment Data With Medicare Claims. Med Care. 2016 Sep;54(9):e55-64. DOI: 10.1097/MLR.0000000000000073
[34] EGFR-mutiertes (EGFRmut) NSCLC: Zulassung für Osimertinib firstline. Im Focus Onkologie. 2018;21(7):80. DOI: 10.1007/s15015-018-4126-4
[35] Zulassung für Opdivo(R) / Yervoy(R): Erstlinie für NSCLC. Dtsch Ärztebl. 2020 Nov;117(Suppl. Pneumologie & Allergologie 2):40.
[36] Nicht-interventionelle Studie (Anwendungsbeobachtung) NIS-Nr.: 356. CRISP – Clinical Research platform Into molecular testing, treatment and outcome of (non-)Small cell lung carcinoma Patients. [last accessed 2023 Dec 14]. Available from: https://www.pei.de/SharedDocs/awb/nis-0301-0400/0356.html