Website Institut national de l'information géographique et forestière

Training large-scale deep learning models for remote sensing image analysis requires vast amounts of labeled imagery. While Earth observation data is available in large quantities thanks to imaging programs such as the European Sentinel-2 constellation or the French SPOT and BDORTHO initiatives, this mass of data is unlabeled. Indeed, few remote sensing images are actually labeled with semantic information that can be used to train supervised deep neural networks.

To overcome this obstacle, machine learning research has turned towards the generation of synthetic datasets. The democratization of “generative” artificial intelligence has made it possible to produce large-scale labeled datasets by generating diverse images in known configurations. Procedural generation is another technique, well known in the video game industry, that has been used to quickly and efficiently synthesize large-scale 3D virtual worlds.

Scientific goals
The main goal of this Ph.D. thesis is to combine the strengths of deep learning with procedural generation based on grammars, with applications to geospatial modeling.
Procedural generation refers to algorithms for content creation, especially tailored to video games [7]. These techniques produce coherent virtual worlds [1] that can be used to model or simulate the real world. Procedural generation has regained traction in recent years because it can synthesize large amounts of labeled synthetic data, on which deep neural networks can then be trained [11, 10]. So far, procedural generation has mostly relied on four broad types of approaches: exploration [17], constraint satisfaction [18], grammars [15], and statistical learning [5, 14].

Grammars (and related formalisms, such as L-systems [21]) are particularly interesting. Indeed, they use a formal language that defines which instances are acceptable objects that can be generated. A grammar allows users to include specialized knowledge and is interpretable. However, the manual definition of grammars requires expert knowledge and often entails an iterative “try-and-retry” workflow. In comparison, learning-based procedural generation trains models to learn how to generate objects based on an existing collection. Yet, these models are not constrained and might generate objects that are not acceptable, e.g. houses without any doors. In addition, recent approaches based on deep neural networks need large-scale datasets to excel, which is often not an option when working with specialized data. Geographic entities in urban areas, typically roads and buildings, have to follow strict geometrical priors that are known to be difficult to satisfy [6].
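To make the contrast above concrete, here is a minimal sketch of grammar-based generation in Python. The grammar, its symbol names and the `generate` helper are illustrative assumptions, not part of the project: every object is obtained by successive rule applications, so every generated house contains a door by construction.

```python
import random

# Hypothetical toy grammar (an assumption for illustration only).
# Keys are non-terminals; any symbol that is not a key is a terminal.
RULES = {
    "HOUSE": [["GROUND", "ROOF"], ["GROUND", "FLOOR", "ROOF"]],
    "GROUND": [["door", "WINDOWS"], ["WINDOWS", "door"]],
    "FLOOR": [["WINDOWS"]],
    "WINDOWS": [["window"], ["window", "WINDOWS"]],
    "ROOF": [["flat_roof"], ["gable_roof"]],
}


def generate(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` by successive rule applications; return its terminals."""
    if symbol not in RULES:
        return [symbol]  # terminal symbol: emit as-is
    options = RULES[symbol]
    # Past max_depth, always take the first option, which is never
    # self-recursive in this grammar, so the expansion terminates.
    production = options[0] if depth >= max_depth else rng.choice(options)
    terminals = []
    for child in production:
        terminals.extend(generate(child, rng, depth + 1, max_depth))
    return terminals


rng = random.Random(0)  # fixed seed for reproducibility
houses = [generate("HOUSE", rng) for _ in range(5)]
for house in houses:
    print(" ".join(house))
```

Because the only rules that expand GROUND both contain a door, no sequence of rule applications can produce a house without one. An unconstrained learned generator offers no such guarantee.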

This Ph.D. therefore investigates hybrid approaches at the intersection of symbolic procedural generation, especially grammars, and deep learning. Hybrid approaches could learn from fewer examples, while being better at enforcing the expert constraints defined in a user-defined grammar [9]. We have two main goals:
1. Design generative neural architectures in which the outputs are constrained by a user-defined grammar. Doing so guarantees that the model can only generate objects that are acceptable to the user [8]. More specifically, we will investigate:
• how to constrain segmentation maps from a supervised model so that they satisfy a grammar on the spatial relationships between objects,
• how to build generative models that can only generate instances resulting from successive rule applications of a given grammar, e.g. for buildings.
2. Develop models that can infer part of or an entire grammar based on a training dataset (inverse procedural generation [12] and grammar inference). In particular, we will look for:
• methods that can automatically learn the terminal symbols of the formal language, for example using prototype learning [4, 13],
• methods that can then infer the production rules of a grammar able to produce the training instances [2].
These generation algorithms will be applied to 2D and 3D geospatial data for city generation [16, 19, 3], such as cadastral maps (building and parcel footprints), 3D building models (either manually produced or automatically extracted from Lidar point clouds) and land cover/land use maps.
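As an illustration of the first goal, the sketch below shows one possible way to restrict a generative model to the language of a grammar: at each step, the scores predicted over production rules are masked so that only rules applicable to the current symbol can be sampled. Everything here is an assumption made for illustration: the footprint grammar, the `fake_scores` function (random numbers standing in for a neural network) and the `constrained_sample` helper are hypothetical and do not describe the method that will be developed in the thesis.

```python
import math
import random

# Hypothetical production rules (left-hand side, right-hand side).
RULES = [
    ("BUILDING", ["WING"]),
    ("BUILDING", ["WING", "BUILDING"]),
    ("WING", ["rect"]),
    ("WING", ["l_shape"]),
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def fake_scores(stack, rng):
    """Stand-in for a neural network: one unconstrained score per rule."""
    return [rng.gauss(0.0, 1.0) for _ in RULES]


def constrained_sample(start, rng, max_steps=20):
    """Sample a terminal sequence using only grammar-valid rule choices."""
    stack = [start]  # symbols still to expand, next symbol on top
    output = []
    steps = 0
    while stack:
        symbol = stack.pop()
        if symbol not in NONTERMINALS:
            output.append(symbol)  # terminal: emit it
            continue
        scores = fake_scores(stack, rng)
        # Mask: keep only the rules whose left-hand side is `symbol`.
        valid = [i for i, (lhs, _) in enumerate(RULES) if lhs == symbol]
        # Once the step budget is spent, also drop self-recursive rules
        # so that the derivation is guaranteed to terminate.
        if steps >= max_steps:
            valid = [i for i in valid if symbol not in RULES[i][1]]
        # Softmax restricted to the valid rules (choices() normalizes).
        weights = [math.exp(scores[i]) for i in valid]
        choice = rng.choices(valid, weights=weights, k=1)[0]
        # Push the right-hand side so its first symbol is expanded next.
        stack.extend(reversed(RULES[choice][1]))
        steps += 1
    return output


rng = random.Random(1)
sample = constrained_sample("BUILDING", rng)
print(sample)
```

Whatever scores the model produces, the output is always a sequence of terminals derivable from BUILDING, which is the guarantee sought when constraining a generator with a user-defined grammar.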

Applicant profile

The ideal applicant has a master's-level education (M.Sc. or M.Eng.) specialized in one of the following fields: data science, video games, geoinformation. He or she has previous experience in programming, especially with the Python language ecosystem. Previous knowledge of project management tools, such as Git, is a plus. English proficiency is mandatory. French is not required, although it can help in everyday life. Although not required, prior experience with procedural generation, generative models or geospatial data is welcome.

Workplace

The National Institute for Geographic and Forest Information (IGN) is a French governmental agency under the Ministry of Ecology and Forests. Its main role is to produce and disseminate reference data and representations (paper and online maps, geovisualizations) relevant to the understanding of the French national territory, its forests, and their evolution. Thanks to its engineering school, ENSG-Géomatique, and its multidisciplinary research laboratories, the institute fosters a strong, high-level innovation culture in several fields (geodesy, forest management, photogrammetry, artificial intelligence, spatial analysis, visualization…).
The LASTIG is a joint research unit under the umbrella of IGN-ENSG and University Gustave Eiffel. The laboratory conducts fundamental and applied research in geographical information sciences and technologies. The STRUDEL team focuses on spatio-temporal structures to analyze territories. In particular, its main research topics are extracting and structuring knowledge about territories, their characteristics and their evolution, and reusing this knowledge for simulation. The team is involved in multiple research projects leveraging synthetic data for simulation, digital twins and unsupervised learning.

Contract: This is a full-time, fixed-term contract of 36 months. It is tied to student registration at the MSTIC graduate school. Salary is approximately 1800€/month.
Location: The lab is located in the ENSG engineering school at Champs-sur-Marne (77), near Paris by train (RER A).
Advantages:
• Flexible remote work after a starting period
• Sports facilities available on-site
• Sports and cultural associations available in the institute
• Access to the campus cafeteria
• 75% of the travel card paid by the employer, biking subsidies
Process: resume, on-site or remote interview including a technical test. Application online:

To apply for this job, please visit www.ign.fr.