An AI-supported project sifts through 100 years of job advertisements

An English lady would like to accompany a family summering in Baden or Bad Ischl. In return, she is offering a family member two hours of English lessons in the morning. This newspaper ad from 1870, somewhat unusual from today’s perspective, is a tiny puzzle piece in the research project “The Emergence of a Differentiated Labor Market” in the interdisciplinary network “The Human Factor in Digital Transformation” at the University of Graz.
“In essence, a labor market entails matching, i.e. successfully linking vacancies with job seekers. We are using machine-aided optical character recognition (OCR) to trace the evolution of the labor market from the mid-19th century onwards using job ads in Austrian German-language newspapers,” explains project leader and economist Jörn Kleinert. The Austrian Science Fund FWF is supporting the research project, which has developed this “digital eyes” technology in cooperation with the Austrian National Library’s historical newspaper database (ANNO).
Pinpointing the job ad in the word salad
Eight million digitized newspaper pages, from “Arbeiterwille” to “Wiener Zeitung”, with countless individual ads from both those offering and those seeking jobs are theoretically available for analysis in the ANNO archive, the prerequisite being that the graphically designed advertisements can be automatically identified and evaluated as text. The first technical challenge for the research team was to compile suitable and correct material for training an OCR learning algorithm. The aim was to get the machine to reliably recognize very differently designed job ads and translate them into machine-readable text files.
This basic research project traces the emergence of, and continuous change in, the Austrian labor market over about 100 years, from the mid-19th to the mid-20th century, through the lens of newspaper job ads.
There were many stumbling blocks to overcome along the way in order to avoid generating nonsensical text. Job advertisements are short, often using keywords instead of full sentences, and they come in a broad range of designs depending on the year and the publication, all geared toward attracting as much attention as possible. Ad sizes and typefaces, including the blackletter Fraktur script, vary widely, as does the rendering of initials, capital letters and numerals.
A total of 35 different digitized newspapers and magazines from the Austrian National Library’s ANNO archive were selected for creating the training data set. The sample was drawn at random and spread across newspaper type, layout, page and year of publication, among other criteria. “The OCR program and the underlying artificial intelligence needed to draw from a broad array of images for learning purposes to ensure accuracy and reliability. The final training data set had to be correct; that’s why everyone in the team spent many hours checking it several times. This is the only way we can be sure that we can scan and process the archive using the same procedure,” says Kleinert. His work on the project also led him to the advertisement placed by the “English lady”.
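The article does not spell out how the pages were drawn. A minimal sketch of one plausible approach, stratified random sampling, is shown below. It assumes hypothetical page metadata: the `PageRecord` fields and the grouping into newspaper, layout, front or inner page and decade are illustrative choices, not the project’s actual scheme.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRecord:
    # Hypothetical metadata for one digitized newspaper page (illustrative only).
    paper: str    # newspaper or magazine title
    layout: str   # e.g. "3-column"
    page: int     # page number within the issue
    year: int     # year of publication


def stratum_key(p: PageRecord) -> tuple:
    # Hypothetical strata: title, layout, front vs. inner page, decade.
    position = "front" if p.page == 1 else "inner"
    return (p.paper, p.layout, position, p.year // 10 * 10)


def stratified_sample(pages, per_stratum: int, seed: int = 0) -> list:
    """Draw up to `per_stratum` pages at random from every stratum."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    strata = defaultdict(list)
    for p in pages:
        strata[stratum_key(p)].append(p)
    sample = []
    for key in sorted(strata):  # fixed order keeps results deterministic
        group = strata[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


pages = [PageRecord("Wiener Zeitung", "3-column", n, y)
         for y in range(1850, 1950, 5) for n in range(1, 9)]
training_pages = stratified_sample(pages, per_stratum=2)
```

Sampling within strata, rather than from the archive as a whole, keeps rare combinations (an early decade, an unusual layout) from being crowded out by the most common page types.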
Part of the research work also involved identifying possible biases in the data and comparing suitable text recognition and processing mechanisms. The manually cleaned and corrected data set, spanning over a hundred years, contains around 12,500 job ads. In addition, the job advertisements published in a range of newspapers during the second week of April in 1870 and in 1927 were analyzed. The Austrian National Library will make both of these gold-standard data sets available to the research community. The team expects to have converted around 1.3 million job advertisements into a machine-readable format by the time the project is finished.
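The article does not say which metric the team used to compare text recognition engines. One common way to score OCR output against a manually corrected gold standard is the character error rate (CER): the edit distance between the OCR text and the gold transcription, divided by the length of the gold text. A minimal sketch follows; the function names and the sample ad text are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(ocr_text: str, gold_text: str) -> float:
    """Character error rate of OCR output relative to a gold-standard transcription."""
    if not gold_text:
        raise ValueError("gold-standard text must not be empty")
    return levenshtein(ocr_text, gold_text) / len(gold_text)


# Hypothetical ad snippet: "in" misread as "m", and a Fraktur long s read as "f".
score = cer("Kellnerm gefucht", "Kellnerin gesucht")  # 3 edits / 17 characters
```

Because the gold text sits in the denominator, an engine that inserts many spurious characters can score a CER above 1.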
How does a differentiated labor market develop?
As an economist, Jörn Kleinert is naturally interested not only in the technical aspects of developing the data set but also in the history of the Austrian labor market itself. In 1870, there were around 100 different occupations. Many people worked in agriculture and took on whatever work was needed, acquiring their skills on the job; today we call this “learning by doing”. At the beginning of 2022, the European Skills, Competences, Qualifications and Occupations (ESCO) classification listed 3,700 unique occupations resulting from combinations of around 13,000 different skills.
“Around 175 years ago, there was no state employment office; one was not created until the turn of the century. The labor market emerged at the micro level, out of the initiative of a wide range of actors,” says Kleinert. Part of this was newspaper ads, placed both by employers with vacancies and by job seekers, which went out into the world more or less at random. Employers and trade unions were also part of the puzzle, along with private hiring agencies, many of them with dubious reputations, and what was known back then as the “Umschau”: people looking for work would turn up at factories on Monday mornings in search of paid employment for the week. Their suitability and aptitude for the task in question only became apparent on the job, and their employment was often short-lived.
From the individual to the market
Kleinert and his colleagues in the interdisciplinary network “The Human Factor in Digital Transformation” have begun an in-depth analysis of the texts’ content, work that will certainly continue for some time. Their analysis covers a range of aspects, from perceptions and scope for negotiation to the spectrum of job profiles and pay. Small evaluation programs have already been developed to analyze some of these parameters.
From 1880 onwards, newspapers were financed mainly by ads, and advertising was promoted accordingly. Kleinert sees the project not just as a source of data but also as a treasure trove for qualitative research. On the one hand, he found significant scope for negotiation in the analyzed ads that aimed to attract interested and qualified applicants. On the other hand, there were highly personalized ads, for example from someone seeking work on Monday and Wednesday afternoons. Some job profiles did not exist in 1870 but had emerged by 1927, such as typists with shorthand skills, civil engineers and electrical engineers. Dental technicians were still uncommon in 1927, but they were a genuine innovation: the job profile had not existed before, not even under a different name.
Jörn Kleinert asks, “How did we get from a world in which there were hardly any professions to today’s differentiated situation of supply and demand for labor? Today, we produce around fifty times as much as our ancestors did 200 years ago. How is that possible? Are we able to produce more, or is it more the case that we can do so much more in small, highly specialized fields?”
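The factor of fifty can be made more tangible as an average annual growth rate. A back-of-the-envelope calculation, assuming (as a simplification) steady compound growth over the whole 200 years, gives the implied rate g:

```latex
(1+g)^{200} = 50
\quad\Longrightarrow\quad
g = 50^{1/200} - 1 = e^{(\ln 50)/200} - 1 \approx e^{0.0196} - 1 \approx 0.020
```

That is roughly 2 percent a year, sustained for two centuries; actual growth was of course far from uniform over that period.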
Different skills and abilities need to be brought together through job matching, which is where the labor market comes into play. Kleinert anticipates that economists, historians and sociologists will be excited by the potential of the data, which will enable them to document and analyze the changing world of work. Artificial intelligence can also provide technical support, he stresses. “If we know exactly how to use it, it can help us. But without us, AI machines are useless.”
About the researcher
Jörn Kleinert is an economist specializing in international economics. After holding positions in Kiel and Tübingen, he accepted a professorship at the University of Graz in 2010. His research focuses on heterogeneity and differentiation as sources of economic development. The Austrian Science Fund FWF is providing around 400,000 euros in funding for the research project “The Emergence of a Differentiated Labor Market” (2022–2025).
Publications & contributions
Adam R., Venglarova K., Vogeler G.: Exploring Historical Labor Markets: Computational Approaches to Job Title Extraction, HAL preprint hal-04869347, 2025
Venglarova K., Adam R., Mölzer W. et al.: Who Advertises in Newspapers? Data Criticism in Mining Historical Job Ads, Computational Humanities Research Conference, Denmark, 2024
Kleinert J., Mölzer W.: The Emergence of the Austrian Labor Market, University of Graz, 2024