Transform Your Ideas into Reality

Generate Images with Ease Using Your Prompt and a Reference Image


ControlNet is a Stable Diffusion feature that lets you choose which aspects of an original image to keep, such as its outlines, pose, or depth, and which to ignore. For example, it can convert a cartoon sketch into a realistic photograph that keeps the same composition.

This technology is useful, for example, for controlling poses and compositions. A control image constrains the result, and the kind of control image depends on the model being used. OpenPose is a fast keypoint detection model that extracts human poses, and Canny edge detection is another way an image can be preprocessed. With the Canny model, for instance, ControlNet takes an additional input image, detects its outlines using the Canny edge detector, saves them as a control map, and feeds that map into the model as extra conditioning.
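
The preprocessing step described above can be sketched in a few lines. This is a simplified stand-in, not the real Canny detector: it computes a Sobel gradient magnitude and applies a single threshold, skipping the smoothing, non-maximum suppression, and hysteresis stages that Canny adds. The function name `edge_control_map` and the threshold value are assumptions made for illustration.

```python
from typing import List

Gray = List[List[int]]  # grayscale image, values 0..255, stored row by row


def edge_control_map(image: Gray, threshold: int = 128) -> Gray:
    """Return a binary edge map: 255 marks an edge, 0 marks background.

    Simplified stand-in for Canny: Sobel gradient magnitude plus one
    threshold. Border pixels are left at 0.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal Sobel response (right column minus left column).
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical Sobel response (bottom row minus top row).
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            magnitude = (gx * gx + gy * gy) ** 0.5
            if magnitude >= threshold:
                edges[y][x] = 255
    return edges


# Toy input: dark left half, bright right half, so one vertical boundary.
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
control_map = edge_control_map(image)
```

The resulting `control_map` is white only along the boundary between the two halves. In a real pipeline, a map like this is what gets passed to the model alongside the text prompt.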

ControlNet offers multiple models for controlling the final output, each suited to specific cases. The available models are Canny, M-LSD, HED, Scribbles, Fake Scribbles, Semantic Segmentation, Depth, Normal Map, and OpenPose. When you upload an image, you can either apply preprocessing to convert it to the style the chosen model expects, or select none if the image is already in that style.
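
To make the two choices concrete (which model, and whether to preprocess), here is a minimal sketch of how a request might be represented. The class `ControlRequest`, the function `plan_steps`, and the field names are hypothetical and do not describe this service's actual API; only the model names come from the list above.

```python
from dataclasses import dataclass
from typing import List

# Model names taken from the list above.
SUPPORTED_MODELS = (
    "Canny", "M-LSD", "HED", "Scribbles", "Fake Scribbles",
    "Semantic Segmentation", "Depth", "Normal Map", "OpenPose",
)


@dataclass(frozen=True)
class ControlRequest:
    """Hypothetical request: a prompt, a model, and a preprocessing switch."""
    prompt: str
    model: str
    preprocess: bool = True  # False corresponds to choosing "none"


def plan_steps(request: ControlRequest) -> List[str]:
    """List the steps a request implies, rejecting unknown model names."""
    if request.model not in SUPPORTED_MODELS:
        raise ValueError(f"unknown model: {request.model!r}")
    steps = []
    if request.preprocess:
        steps.append(f"convert upload to a {request.model} control map")
    else:
        steps.append("use upload as the control map unchanged")
    steps.append(f"generate image for prompt {request.prompt!r}")
    return steps


sketch_request = ControlRequest("a cottage at dusk", "Scribbles", preprocess=False)
photo_request = ControlRequest("a cottage at dusk", "Canny")
```

Turning preprocessing off makes sense when the upload is already a control map, such as a hand-drawn scribble; a photograph would normally be preprocessed first.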

Canny is ideal for applications that require clear object boundaries, while M-LSD is suitable for precise line detection and analysis. HED offers superior edge detection and is perfect for visual arts projects. Scribbles and Fake Scribbles are ideal for creative projects and for transforming images into sketch-style illustrations. Semantic Segmentation is useful for generating images with clearly defined objects and areas, while Depth is perfect for visualizing spatial relationships between objects in an image. Normal Map supports realistic lighting and shading effects, while OpenPose is suitable for generating images featuring realistic human poses and postures. Details for each model follow.


Canny:

The Canny model excels at edge detection, identifying areas of rapid intensity change in your images. By highlighting edges, Canny brings out the most important features and shapes, producing results with clean, sharp lines. It is ideal for applications requiring clear object boundaries, such as cartography or architectural design.

M-LSD (Mobile Line Segment Detection):

M-LSD is a lightweight model that detects straight line segments in images. This makes it well suited to finding and analyzing geometric structures such as the edges of buildings and interiors. Choose M-LSD for applications in robotics, computer vision, or any project requiring precise line detection and analysis.

HED (Holistically-Nested Edge Detection):

HED offers a holistic approach to edge detection, combining the outputs of multiple convolutional layers. This allows it to learn low-level and high-level features together, leading to softer, more complete edge maps than simple gradient-based detectors produce. It is ideal for use in graphic design, photography, and other visual arts.


Scribbles:

The Scribbles model specializes in generating images based on user-drawn sketches, transforming simple lines into photorealistic scenes. By interpreting and refining your input, Scribbles brings your artistic vision to life. It is perfect for creative projects, concept development, or simply exploring the power of your imagination.

Fake Scribbles:

Fake Scribbles, unlike the regular Scribbles model, does not need a drawing from you: it automatically creates a scribble-like sketch from an uploaded image. The model analyzes your image and generates a corresponding hand-drawn-style sketch, maintaining the essence of the original while adding an artistic touch. It is ideal for transforming photographs into sketch-style illustrations or simplifying complex scenes for easier interpretation.

Semantic Segmentation:

Semantic Segmentation divides images into labeled regions according to their semantic meaning, such as sky, building, or road, which gives a clearer picture of how objects relate within a scene. Use this model to generate images with clearly defined objects and areas. The same technique is widely used in autonomous vehicles, robotics, and computer vision.
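
As a rough illustration of what a segmentation control map contains, the sketch below turns a small grid of region labels into a color-coded image and counts the size of each region. The label names, the colors in `PALETTE`, and both function names are assumptions chosen for the example, not the palette any particular segmentation model uses.

```python
from collections import Counter
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]

# Hypothetical palette: one fixed RGB color per semantic label.
PALETTE: Dict[str, Color] = {
    "sky": (135, 206, 235),
    "building": (128, 128, 128),
    "road": (64, 64, 64),
}


def colorize(label_map: List[List[str]],
             palette: Dict[str, Color] = PALETTE) -> List[List[Color]]:
    """Replace every label with its palette color to form a control map."""
    return [[palette[label] for label in row] for row in label_map]


def region_sizes(label_map: List[List[str]]) -> Dict[str, int]:
    """Count how many pixels carry each label."""
    return dict(Counter(label for row in label_map for label in row))


# Toy 3x3 scene: sky on top, a building on the left, road along the bottom.
label_map = [
    ["sky", "sky", "sky"],
    ["building", "building", "sky"],
    ["road", "road", "road"],
]
control_map = colorize(label_map)
sizes = region_sizes(label_map)
```

Because each label always maps to the same color, the generator can learn to place a given kind of object wherever that color appears in the control map.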


Depth:

The Depth model, trained using the ControlNet framework, works from a depth map estimated from the input image, so the generated result preserves the spatial relationships between objects in the scene. Use it to create images with consistent depth for 3D rendering, virtual reality, or simply exploring complex spatial scenes.
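
A depth map is usually handed to the model as a grayscale image. The sketch below shows one plausible way to normalize raw depth values into that form, assuming the convention that nearer surfaces are brighter; the function name `depth_to_control_map` and the `near_is_bright` flag are illustrative, and the real preprocessor estimates depth with a neural network rather than receiving it as input.

```python
from typing import List


def depth_to_control_map(depth: List[List[float]],
                         near_is_bright: bool = True) -> List[List[int]]:
    """Rescale raw depth values to 0..255 grayscale.

    Assumes larger input values mean farther away. With near_is_bright
    set, the nearest point maps to 255 and the farthest to 0.
    """
    values = [d for row in depth for d in row]
    nearest, farthest = min(values), max(values)
    span = farthest - nearest
    result = []
    for row in depth:
        out_row = []
        for d in row:
            # t runs from 0.0 (nearest) to 1.0 (farthest); a constant
            # depth map has no span, so every pixel is treated as nearest.
            t = (d - nearest) / span if span else 0.0
            if near_is_bright:
                t = 1.0 - t
            out_row.append(round(t * 255))
        result.append(out_row)
    return result


# Toy 2x2 scene with distances in arbitrary units.
raw_depth = [[1.0, 2.0], [4.0, 5.0]]
gray = depth_to_control_map(raw_depth)
```

Normalizing per image keeps the full grayscale range in use regardless of the scene's absolute scale, at the cost of discarding real-world distances.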

Normal Map:

The Normal Map model works from a normal map estimated from the input image. A normal map records the direction each surface faces, providing detailed surface information that captures the nuances of textures and materials. Use it to give your generated images realistic lighting and shading effects, perfect for video game development, CGI, or any project requiring intricate surface detail.
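
To show what a normal map encodes, the sketch below derives one from a small height map using central differences and packs each unit normal into an RGB color. This is a generic graphics technique offered as an assumption-laden illustration, not the estimator ControlNet's preprocessor uses; axis and sign conventions vary between tools, and `normals_from_height` and `strength` are names invented here.

```python
import math
from typing import List, Tuple

Color = Tuple[int, int, int]


def _to_byte(component: float) -> int:
    """Map a normal component in [-1, 1] to an integer in 0..255."""
    return int((component * 0.5 + 0.5) * 255 + 0.5)


def normals_from_height(height: List[List[float]],
                        strength: float = 1.0) -> List[List[Color]]:
    """Estimate per-pixel surface normals and encode them as RGB."""
    rows, cols = len(height), len(height[0])

    def h(y: int, x: int) -> float:
        # Clamp coordinates so border pixels reuse their nearest neighbor.
        return height[min(max(y, 0), rows - 1)][min(max(x, 0), cols - 1)]

    result = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            # Central differences approximate the slope along each axis.
            dx = (h(y, x + 1) - h(y, x - 1)) / 2.0
            dy = (h(y + 1, x) - h(y - 1, x)) / 2.0
            nx, ny, nz = -strength * dx, -strength * dy, 1.0
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            out_row.append((_to_byte(nx / length),
                            _to_byte(ny / length),
                            _to_byte(nz / length)))
        result.append(out_row)
    return result


flat = normals_from_height([[0.0] * 3 for _ in range(3)])
ramp = normals_from_height([[0.0, 1.0, 2.0] for _ in range(3)])
```

A flat surface encodes to the familiar bluish color of normal maps because its normal points straight out of the image; the ramp tilts the normal away from the slope, lowering the red channel.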

Human Pose:

The Human Pose model uses OpenPose to understand human body configurations, detecting key body joints and their connections. Use this model to generate images featuring realistic human poses and postures, perfect for character design, animation, or studying body mechanics.
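
The phrase "key body joints and their connections" maps naturally onto a small data structure: named keypoints plus a list of limb pairs. The sketch below uses a reduced, hypothetical joint set and confidence cutoff for illustration; a real pose detector outputs more joints, and `visible_limbs` is not a function from any library.

```python
from typing import Dict, List, Tuple

Keypoint = Tuple[float, float, float]  # x, y, detection confidence
Segment = Tuple[Tuple[float, float], Tuple[float, float]]

# Hypothetical reduced skeleton: pairs of joints joined by a limb.
LIMBS: List[Tuple[str, str]] = [
    ("head", "neck"),
    ("neck", "right_shoulder"),
    ("right_shoulder", "right_elbow"),
    ("right_elbow", "right_wrist"),
    ("neck", "left_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),
]


def visible_limbs(keypoints: Dict[str, Keypoint],
                  min_confidence: float = 0.3) -> List[Segment]:
    """Return line segments for limbs whose two joints were both detected."""
    segments = []
    for start, end in LIMBS:
        if start not in keypoints or end not in keypoints:
            continue  # joint missing entirely, e.g. outside the frame
        (x1, y1, c1), (x2, y2, c2) = keypoints[start], keypoints[end]
        if c1 >= min_confidence and c2 >= min_confidence:
            segments.append(((x1, y1), (x2, y2)))
    return segments


# Toy detection: right arm fully visible, left elbow uncertain, no left wrist.
pose = {
    "head": (50.0, 10.0, 0.9),
    "neck": (50.0, 25.0, 0.9),
    "right_shoulder": (40.0, 28.0, 0.8),
    "right_elbow": (35.0, 45.0, 0.7),
    "right_wrist": (33.0, 60.0, 0.6),
    "left_shoulder": (60.0, 28.0, 0.8),
    "left_elbow": (66.0, 45.0, 0.1),
}
skeleton = visible_limbs(pose)
```

Drawing the returned segments on a blank canvas gives the stick-figure control image that the pose model follows, which is why partially hidden limbs simply drop out of the result.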

Choose from our diverse selection of pre-trained models to create stunning images with our advanced text-to-image generators. Powered by ControlNet, each of the models described above offers different capabilities to meet your specific needs.
